Life-Saving Unix Text-Processing Tools

This is my second incomplete post that I’ve started to write about two years ago, but posting incomplete version now :) .

If you are going to parse a log file or do some text-processing, there are too many ways to do those simple tasks. But I think following commands are should-be-known list:

jot-seq:

jot is a tool used for sequential or random numbers or characters but can also help you do string formatting as well. seq can be used for similar purposes but I usually prefer jot which is much more powerful. Here are some example commands:

jot 3
This will output the numbers from 1 to 3 separated with newline:
1
2
3

jot 21 -1 1.00
This command prints 21 evenly spaced numbers increasing from -1 to 1:
-1.00
-0.90
-0.80
-0.70
-0.60
-0.50
-0.40
-0.30
-0.20
-0.10
-0.00
0.10
0.20
0.30
0.40
0.50
0.60
0.70
0.80
0.90
1.00

jot -s ” -b ‘-’ 50

will print 50 hyphens:

————————————————–

-s ‘word’ option denotes the seperator string by default it is newline ‘\n’.
-b ‘word’ option is used for printing the ‘word’ repetitively.

jot -b yes 0

will print the word ‘yes’ infinitely.

jot -w tmp%d 10 1
prints the following string:
tmp1
tmp2
tmp3
tmp4
tmp5
tmp6
tmp7
tmp8
tmp9
tmp10

-w option is used to print word with the generated data appended to it. Octal, hexadecimal, exponential, ASCII, zero padded, and right-adjusted representations are possible by using the appropriate printf(3) conversion specification inside word, in which case the data are inserted rather than appended.

An interesting use case for jot is below:

for f in `jot - 0 50 5` ; do ping -c 1 -m 50 10.0.2.$f ; done
This will print ping the every fifth ip address from 10.0.2.0 to 10.0.2.50.

PING 10.0.2.0 (10.0.2.0) 56(84) bytes of data.
Warning: Failed to set mark 50

— 10.0.2.0 ping statistics —
1 packets transmitted, 0 received, 100% packet loss, time 0ms

PING 10.0.2.5 (10.0.2.5) 56(84) bytes of data.
Warning: Failed to set mark 50
— 10.0.2.5 ping statistics —
1 packets transmitted, 0 received, 100% packet loss, time 0ms

PING 10.0.2.10 (10.0.2.10) 56(84) bytes of data.
Warning: Failed to set mark 50
— 10.0.2.10 ping statistics —
1 packets transmitted, 0 received, 100% packet loss, time 0ms

PING 10.0.2.15 (10.0.2.15) 56(84) bytes of data.
Warning: Failed to set mark 50
— 10.0.2.15 ping statistics —
1 packets transmitted, 0 received, 100% packet loss, time 0ms

PING 10.0.2.20 (10.0.2.20) 56(84) bytes of data.
Warning: Failed to set mark 50
— 10.0.2.20 ping statistics —
1 packets transmitted, 0 received, 100% packet loss, time 0ms

PING 10.0.2.25 (10.0.2.25) 56(84) bytes of data.
Warning: Failed to set mark 50
— 10.0.2.25 ping statistics —
1 packets transmitted, 0 received, 100% packet loss, time 0ms

PING 10.0.2.30 (10.0.2.30) 56(84) bytes of data.
Warning: Failed to set mark 50

— 10.0.2.30 ping statistics —
1 packets transmitted, 0 received, 100% packet loss, time 0ms

PING 10.0.2.35 (10.0.2.35) 56(84) bytes of data.
Warning: Failed to set mark 50

— 10.0.2.35 ping statistics —
1 packets transmitted, 0 received, 100% packet loss, time 0ms

PING 10.0.2.40 (10.0.2.40) 56(84) bytes of data.
Warning: Failed to set mark 50

— 10.0.2.40 ping statistics —
1 packets transmitted, 0 received, 100% packet loss, time 0ms

PING 10.0.2.45 (10.0.2.45) 56(84) bytes of data.
Warning: Failed to set mark 50

— 10.0.2.45 ping statistics —
1 packets transmitted, 0 received, 100% packet loss, time 0ms

PING 10.0.2.50 (10.0.2.50) 56(84) bytes of data.
Warning: Failed to set mark 50

— 10.0.2.50 ping statistics —
1 packets transmitted, 0 received, 100% packet loss, time 0ms

od:

od – dump files in octal and other formats. You can use it to see the non-printable characters in a file as wekkç
For example you use:

cat my_file | od -c |more

Or more clearly to see non-printable characters like tabulations, CRLF, LF line terminators in colors:

od -c | grep --color '\\.'
And the output will look like as follows:

0000000 [ D e s k t o p E n t r y ] \n
0000020 V e r s i o n = 1 . 0 \n T y p e
0000040 = L i n k \n N a m e = E x a m p
0000060 l e s \n C o m m e n t = E x a m
0000120 U b u n t u \n U R L = f i l e :
0000160 m p l e - c o n t e n t / \n I c
0000200 o n = f o l d e r \n X - U b u n
0000260 t \n

The other incredibly useful linux utilities for text processing are which I’m not going to discuss here, recommend you to do some googling about them are:

  • cut – paste/join
  • head – tail
  • awk/sed
  • uniq
  • tr
  • grep / ack
  • perl
  • tac
  • sort
  • split

Related Posts:

A few Life Saving Tips to Use with Bash

I was away for two years, I’ve just decided to check this blog again and saw some incomplete blog posts that probably wait to be completed for eternity. But I’ve just decided to post the incomplete version of them, hoping that they’ll be helpful for someone else on the web.

Alias rm such that instead of removing files for eternity, it just moves them to /tmp or thrash directory. A common solution for this (works for ubuntu and debian based linuxes):
alias rm='mv --verbose -f --backup=numbered --target-directory ~/.Trash/'

Redirect standard output to vi
e.g.: ls -alh | vi -

If no options given, to remove files that have special characters in their name:

rm -- --exclude=img*

“–” tells rm that no more options are given

Related Posts:

Online Supplementary Mathematics Materials for Machine Learning and Artificial Intelligence Courses

Recently I’ve started to see a lot of questions regarding to the “mathematics tutorials or supplementary materials for Machine Learning and AI” in the online discussions with the emergence of Stanford’s online AI and machine learning courses. As with the internet crowd, I’m going to participate these courses as well and I’ve always found the resources provided in this post as being useful for revising my math knowledge. Hence I thought that those might be useful for other folks as well. Probably a having look at only one resource in each category will be more than enough for an introductory level ML/AI course (you can skip some categories which might have less significance for an introductory level ML and AI course, e.g.: skip diff eqs, numerical analysis and study more to the probability and stats):

Linear Algebra

Calculus

Multivariable Calculus

Probability and Statistics

Differential Equations

Numerical Analysis

Intro ML

Matlab

Related Posts:

Basic Concepts in Functional Programming Part 1


In a series of posts I’ll try to summarize (only briefly) the basic concepts of functional programming.

Functional Programming

According to wikipedia the definition of functional programming is:

In computer science, functional programming is a programming paradigm that treats computation as the evaluation of mathematical functions and avoids state and mutable data. It emphasizes the application of functions, in contrast to the imperative programming style, which emphasizes changes in state.[1] Functional programming has its roots in lambda calculus, a formal system developed in the 1930s to investigate function definition, function application, and recursion. Many functional programming languages can be viewed as elaborations on the lambda calculus.

Functional programming is a style of programming which models computations as the evaluation of expressions and in functional programming the main essence of the programs is functions not objects or procedures. Functional programming requires that those functions to be first order functions  entities. Which means that those functions can be passed as an argument or manipulated like any other value in the program and it is possible to define another function within a function. There is an important feature of pure functional languages:

Referential Transparency

Referential Transparency means given a function and input, you’ll always get the same output when you pass that input to the function. There is no external state that is used in that function. Hence referential transparent function depends only on its input. For example:

1
2
3
int doubleX(int x){
  return x*2;
}

In a pure functional language, all functions are referentially transparent(pure). The counter example (referential opaqueness) of that might be:

1
2
3
4
int M=12;
int getX(int x){
  return x*M;
}

Exercise to the reader: Are mathematical functions referential transparent?

Resources

What does functional programming mean?: http://dl.dropbox.com/u/7810909/docs/what-does-fp-mean/what-does-fp-mean/html/index.html
HaskellWiki, Functional Programming: http://www.haskell.org/haskellwiki/Functional_programming
C2, Functional Programming http://c2.com/cgi/wiki?FunctionalProgramming
Stackoverflow, What is Referential Transparency? http://stackoverflow.com/questions/210835/what-is-referential-transparency

First class Functions

A programming language is said to have a first class function if it treats functions as a first class object. Which implies that it should satisfy the following conditions without recursively invoking a compiler or interpreter or otherwise metaprogramming:

  • Create new functions from preexisting functions at run-time
  • Store functions in collections
  • Use functions as arguments to other functions
  • Use functions as return values of other functions

For example javascript and python supports first class functions. An example on this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
var compose = function (f, g) {
    return function (x) {
        return f(g(x));
    };
};
 
var fn  = [Math.sin,  Math.cos,  function (x) { return Math.pow(x, 3);   }];
var inv = [Math.asin, Math.acos, function (x) { return Math.pow(x, 1/3); }];
 
(function () {
    for (var i = 0; i < 3; i++) {
        var f = compose(inv[i], fn[i]);
        print(f(0.5));    // 0.5
    }
})();

 

Resources:

Wikipedia, First class Functions: http://en.wikipedia.org/wiki/First-class_function
Rosetta Code, First class Functions: http://rosettacode.org/wiki/First-class_functions#JavaScript

Higher Order Functions

Higher-order functions are functions which take one or more functions as an input or return/output a function. It is pretty straightforward to implement them in Python:

1
2
3
4
def f(n):
	    def g(x):
	        return x + n
	    return g

In C you can use function pointers to accomplish this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
//Resource of this code is Wikipedia: http://en.wikipedia.org/wiki/Higher-order_function
//Compute the integral of f() within the interval [a,b]
double integral(double (*f)(double x), double a, double b)
{
    double  sum, dt;
    int     i;
 
    // Numerical integration: 0th order approximation
    sum = 0.0;
    dt = (b - a) / 100.0;
    for (i = 0;  i < 100;  i++)
        sum += (*f)(i/100.0 * (b - a) + a) * dt;
 
    return sum;
}

In many programming languages, map a higher-order function that applies a given function to each element of a list, returning a list of results. An example in perl is:

1
2
  #For example map in the following example translates a list of numbers to their squared values.
  my @squares = map { $_ > 5 ? ($_ * $_) : () } @numbers;

Resources

Wikipedia, Higher Order Functions: http://en.wikipedia.org/wiki/Higher-order_function
c2, Higher Order Functions http://c2.com/cgi/wiki?HigherOrderFunction

Free and Bound Variables

 

According to SICP the definition of free and bound variables are,

A bound variable:

A variable, V, is “bound in an expression”, E, if the meaning of E is unchanged by the uniform replacement of a variable, W, not occurring in E, for every occurrence of V in E.

And a free variable is,

A variable, V, is “free in an expression”, E, if the meaning of E is changed by the uniform of replacement of a variable, W, not occurring in E, for every occurrence of V in E.

In first order logic, a variable is said to occur free in a formula X if and only if it is not within the “scope” of a quantifier (existential or universal).

Basically a free variable is a notation that specifies places in an expression where substitution may take place.
Each of the variable binding operators below, binds the variable x. Hence x becomes a bound variable:

Free and Bound variables, Wikipedia: http://en.wikipedia.org/wiki/Free_variables_and_bound_variables

Free and Bound Variables, Saarland: http://www.coli.uni-saarland.de/projects/milca/esslli/html/node8.html

First-Order Logic: bound variables, free variables http://cnx.org/content/m12081/latest/

Related Posts:

  • No Related Posts

Philosophy and Computer Science

I’ve just found out an interesting MIT course on Philosophy and Theoretical Computer Science with code 6.893 and thought by Scott Aaronson. The most attractive part of this course for me was articles included in the reading list of the web page. There are great articles in that list:

Scott Aaronson, NP-complete Problems and Physical Reality
Scott Aaronson, Is P Versus NP Formally Independent?
Scott Aaronson and John Watrous, Closed Timelike Curves Make Quantum and Classical Computing Equivalent
Ethan Bernstein and Umesh Vazirani, Quantum Complexity Theory
Nick Bostrom, Are You Living In A Computer Simulation?
David Chalmers, Does A Rock Implement Every Finite-State Automaton?
David Deutsch, Quantum Mechanics Near Closed Timelike Lines
David Deutsch, Quantum theory, the Church-Turing Principle and the universal quantum computer
Jack Edmonds, Paths, Trees, and Flowers (Section 2)
Hector Levesque, Is It Enough To Get The Behaviour Right?
Christos Papadimitriou, Algorithms, Games, and the Internet
Ian Parberry, Knowledge, Understanding, and Computational Complexity
Juergen Schmidhuber, A Computer Scientist’s View of Life, the Universe, and Everything
Stuart Shieber, The Turing Test As Interactive Proof
Michael Sipser, The History and Status of the P versus NP Question
Robert Stalnaker, The Problem of Logical Omniscience I
Alan Turing, On Computable Numbers, With An Application to the Entscheidungsproblem
Alan Turing, Computing Machinery and Intelligence
Leslie Valiant, A Theory of the Learnable
Leslie Valiant, Evolvability
Avi Wigderson, P, NP, and Mathematics – A Computational Complexity Perspective
Avi Wigderson, Knowledge, Creativity, and P versus NP
Ronald de Wolf, Philosophical Applications of Computational Learning Theory

Related Posts:

Bitwise Operators and Binary Tricks for System Programming

Bitwise operators in most of the programming languages are common and inherited from the C (e.g.: &, |, ^, >>, …). These operations are critical in the low-level programming (for instance for writing drivers).

In the beginning of this post I’ll just briefly pass over them and present some tricks using them. As a programming language choice, I’ll give examples from C and python.

See the resources for more complete and detailed overview.

Two most Fundamental Bitwise operators

There are two types of conditional operator in most of the programming languages, logical one the bitwise one. For example “bitwise AND” is shown as “&” and the “logical AND” is shown with “&&”. Similarly “bitwise OR” is | and “logical OR” is “||”.

AND operator

Bitwise AND (&) is very similar to the “logical AND” but instead it works with bits only. The bitwise AND compares each bit of the first operand to the corresponding bit of the second operand. It returns 1 if and only if both of its operands are equal to 1, otherwise it returns 0.

For example 00&11 is 0.

Let’s fire up python shell and write 5&7 the result is 5 but how?

5 in binary format is 0101 and 7 is 0111 and by comparing each bit we’ll get 0101 which is 5.

We can use this notion for detecting if a bit is 0 or 1.

For example, given a bit pattern: 0011 (decimal 3), we can determine whether the second bit is 1 may be done by using a bitwise AND with a bit pattern containing 1 only in that bit:

0011 (decimal 3)
& 0010 (decimal 2)
--------------------
= 0010 (decimal 2)

Because the result 0010 is non-zero, we know the second bit in the original pattern was 1. This process is usually called as bit masking.

OR operator

A bitwise OR (|) takes two bit patterns of equal length and performs the logical inclusive OR operation on each pair of corresponding bits. For each pair, if either bit is 1, the corresponding result bit is set to 1. Otherwise, the corresponding result bit is set to 0.

For example if calculate 5&7 in python you’ll get 7. Because 5 in binary format is 0101 and 7 is 0111 and by comparing each bit with inclusive OR we’ll get 0111 which is 7.

The bitwise OR may be used to set selected bits. One example is to set a specific bit (or flag) in a register where each bit represents an individual Boolean state. For example: 0010 (decimal 2) can be considered a set of four flags. The first, second, and fourth flags are clear (0), while the third flag is set (1). The first flag may be set by performing a bitwise OR between this value and a bit pattern in which the first flag is set:

0010 (decimal 2 )
| 1000 (decimal 8 )
--------------------
= 1010 (decimal 10 )

This technique is an efficient way to deal with a number of Boolean values using the minimum of memory.

Other important bitwise operators

NOT

The bitwise NOT (~), or complement, is a unary operation that performs logical negation on each bit, forming the ones’ complement of the given binary value. Digits which were 0 become 1, and vice versa. For example:

~7 is 8. Because 7 in binary format is 0111 and by converting each bit we’ll get 1000 which is 8.

If the number is signed we’ll use two’s complement for the bitwise not operator.

XOR

The term bitwise XOR (^) stands for “exclusive OR,” and means “one or the other, but not both.” In other words, XOR compares each bit, returns 1 if and only if exactly one of its operands is 1. If both operands are 0, or both are 1, then XOR returns 0. For example:

0101 (decimal 5)
^ 0011 (decimal 3)
--------------------
= 0110 (decimal 6)

Assembly language programmers sometimes use the XOR operation as a short-cut to set the value of a register to zero. Performing XOR on a value against itself always yields zero, and on many architectures this operation requires fewer CPU clock cycles and/or fewer bytes (less precious cache-space) than loading a zero value and saving it to the register.

The bitwise XOR may be used to invert selected bits in a register (also called toggle or flip). Given the bit pattern: 0010 (decimal 2) the first and third bits may be toggled simultaneously by a bitwise XOR with a bit pattern containing 1 in the first and third positions:

0010 (decimal 2 )
^ 1010 (decimal 10 )
---------------------
= 1000 (decimal 8 )

This technique also may be used to manipulate bit patterns representing sets of Boolean states.

Bitwise shifts

There are two bitwise shift operators and they are shift left and shift right. In C, these are represented by the << and >> operators, respectively. These operations are very simple, and do exactly what they say: shift bits to the left or to the right. The syntax for a shift operation is as follows:

[integer] [operator] [number of places]

A statement of this form shifts the bits in [integer] by the number of places indicated, in the direction specified by the operator.

Left Shift

Left shift basically shifts the binary n bits to the left. A left arithmetic shift by n is equivalent to multiplying by 2^n (provided the value does not overflow). For example:

Fire up your python shell and run the following in the shell:

3 << 2

Result of this operation is 12. Because,

Decimal 3 is 0011 in binary. When you shift 0011 2 bits to the left, you'll get 1100 and that's 12.

Another example is:

x = 0000 1010 1000 0001;
x = x << 4;

After shifting 4 bits, the new x will be:

1010 1000 0001 0000

Every bit is shifted to the left by one place. When you do this, the MSB (most significant bit, remember?) of x is lost, because there isn't another place to shift it to. Similarly, after a shift left, the LSB of x will always be 0. There is no position to the right of the LSB, and so there's nothing to shift into the LSB, so it's assigned a value of 0.

Right Shift

Similar to the left shift, right shift basically shifts the binary number n bits to the left. That's actually taking two's complement of the number or dividing by 2^n. An example is:

12>>2

and result of this is 3. Because 12 in binary is 1100 and when you shift two bits you'll get 0011 which is 3.

Another example is,

x = 0110 1111 1001 0001;
x = x >> 4;

and the x in the end will be 0000 0110 1111 1001. Here, the bits are being shifted right by four places.

Some Hacks

Computing the modulus:

If the divisor is a power of 2, the modulo (%) operation can be done with:
modulus = numerator & (divisor - 1);
This is about 600% faster.

x = 131 % 4;
//equals:
x = 131 & (4 - 1);

Absolute Value

Forget Math.abs() for time critical code. Version 1 is 2500% faster than Math.abs(), and the funky bitwise version 2 is again 20% faster than version 1.

//version 1
i = x < 0 ? -x : x;

//version 2
i = (x ^ (x >> 31)) - (x >> 31);

Swap integers without a temporary variable using XOR

That's a pretty neat trick and this is about 20% faster.

int a, b, t;
a = b;
b = t;

//equals:
a ^= b;
b ^= a;
a ^= b;

Multiplication

Here is a bitwise multiplication algorithm only using bitwise shifts and addition.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
#include <stdio.h>
 
int multiply (int a, int b){
  int c = 0;
  while (b!=0) {
    if ((b & 1) != 0)
      c = c + a;
    a = a<<1;
    b = b>>1;
  }
  return c;
}
int main () {
  int a = 3, b = 8, c, d;
  c = multiply(a, b);
  d = a * b;
  if (c == d) {
    printf("Result is %d\n", c);
  }
  else {
    printf("Incorrect : %d\n", c);
  }
  return 0;
}

Detect if two integers have opposite signs


int x, y; // input values to compare signs
bool f = ((x ^ y) < 0); // true iff x and y have opposite signs

Check if an integer is even/uneven using bitwise AND


isEven = (x & 1) ^ ((-1 & 1) | ((x < 0) ? 0 : 1)));

This example is taken from this SO post.

Compute the minimum (min) or maximum (max) of two integers without branching

This one is from Bit Twiddling Hacks page:

If you know that INT_MIN <= x - y <= INT_MAX, then you can use the following, which are faster because (x - y) only needs to be evaluated once.

r = y + ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1))); // min(x, y)
r = x - ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1))); // max(x, y)

Note that the 1989 ANSI C specification doesn't specify the result of signed right-shift, so these aren't portable. If exceptions are thrown on overflows, then the values of x and y should be unsigned or cast to unsigned for the subtractions to avoid unnecessarily throwing an exception, however the right-shift needs a signed operand to produce all one bits when negative, so cast to signed there.

Sign Flipping

Sign flipping using NOT or XOR:

i = -i;
//equals
i = ~i + 1;
//or
i = (i ^ -1) + 1;

Useful Resources

Bitwise Tricks

Bitwise Operations in C

The art of Programming Fasc 1

Bitwise gems – fast integer math

Bit Twiddling Hacks

Bitwise Operations

Reciprocal Multiplication, a tutorial

Related Posts:

PHP IPv6 address to Decimal Format Conversion

Storing IP addresses in database as decimal have several advantages on storing as a string (storing, efficiency, processing… etc). But my goal was to be able to easily query if an IP address is in a block or between a given IP range. The following two functions are often handy for converting a string IPV6 address to long decimal format. These functions require php gmp to be able to generate long decimals.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
<?php
function ip2long6($ipv6) {
    $ip_n = inet_pton($ipv6);
    $bits = 15;
    // 16 x 8 bit = 128bit
    $ipv6long = 0;
    while ($bits >= 0) {
        $bin = sprintf("%08b",(ord($ip_n[$bits])));
        if ($ipv6long){
            $ipv6long = $bin.$ipv6long;
        } else {
            $ipv6long = $bin;
        }
        $bits--;
    }
    return gmp_strval(gmp_init($ipv6long,2),10);
}
 
function long2ip6($ipv6long) {
    $bin = gmp_strval(gmp_init($ipv6long,10),2);
    if (strlen($bin) < 128) {
        $pad = 128 - strlen($bin);
        for ($i = 1; $i <= $pad; $i++) {
            $bin = "0".$bin;
        }
    }
    $bits = 0;
    $ipv6 = 0;
    while ($bits <= 7) {
        $bin_part = substr($bin,($bits*16),16);
        if ($ipv6) {
            $ipv6 .= dechex(bindec($bin_part)).":";
        } else {
            $ipv6 = dechex(bindec($bin_part)).":";
        }
        $bits++;
    }
    return inet_ntop(inet_pton(substr($ipv6,0,-1)));
}
 
$ipv6 = "fe80:0:0:0:202:b3ff:fe1e:8329";
print $ipv6long =  ip2long6($ipv6)."\n";
print $ipv6 = long2ip6($ipv6long)."\n";
 
?>

Related Posts:

Traverse Files In the Source Folder for C++ Lint

Google has an awesome C++ coding style guideline and they have done a very good job on creating a very handy C++ lint based on that guideline.

I have downloaded it renamed that script to cpplint, chmod to executable and move to /usr/bin to use. (Also changed the shebang to /us/bin/env python, because default shebang uses the python2.4)

But this script solely itself can not traverse the files in a folder recursively and output the errors to a file, hence I had to write a basic bash script by myself:

1
2
3
4
5
6
7
8
9
#!/bin/bash
if [ -n "$2" ]; then
  exec 2>$2
  if [ -d "$1" ]; then
    find $1 -type f -iname \*.cc -or -iname \*.h -exec sh -c  "cpplint --filter=+build,+readability,+runtime,-whitespace --output=vs7 '{}'" \; 
  fi
else 
  echo "Please enter a log file name to output the errors to and a path for the source folder to traverse for errors: ./check_cpplint [search folder] [log file]"
fi

First parameter of this script is the path of the source code’s folder that your C++ codes are located in and the second one is the file that the errors will be written to.

BTW there is another great c++ lint called cppcheck.

Related Posts:

Learn you a haskell for Math! – Part 1

It has been for some time since I’ve used haskell for programming (in fact for my personal and work related projects I don’t use haskell. because I still feel so novice at it), so I thought that implementing some basic mathematical functions might be a good exercise. I’m planning to write 3 series, in the first part I’ll implement some of the basic calculus operators such as integration and derivative of a function. In the second part I’ll implement a few numerical analysis algorithms like newton-raphson, taylor series approximation and sieve of aristothenes. In the third and the last part I’ll present you the implementation of MCMC and gaussian processes.

If you are using haskell there are too many ways to accomplish a task. As a layman haskell-er, my approaches might be naive.

To find integration and derivation of a function you can use any of the following two different approaches:

  1. You can approach problem by using an approximation technique.
  2. You can solve by symbolically. Implementation of a symbolic solver is harder.

Integral of a Function

We’ll try to find the definite integral of a function between upper and lower values, it is basically the area under the curve:

Riemann Sum Approach

First approach that I’ll use will be riemann sum is a method for approximating the total area underneath a curve on a graph, otherwise known as an integral of a specific function.

Formal definition (From Wolfram Math):

Let a closed interval [a,b] be partitioned by points a<x_1<x_2<...<x_(n-1)<b, where the lengths of the resulting intervals between the points are denoted Deltax_1Deltax_2, …, Deltax_n. Let x_k^* be an arbitrary point in the kth subinterval. Then the quantity

 sum_(k=1)^nf(x_k^*)Deltax_k

is called a Riemann sum for a given function f(x) and partition, and the value maxDeltax_k is called the mesh size of the partition.

If the limit of the Riemann sums exists as maxDeltax_k->0, this limit is known as the Riemann integral of f(x) over the interval [a,b]. The shaded areas in the above plots show the lower and upper sums for a constant mesh size.

A general graphical view of convergence of riemann sum to a function is:

The haskell code of riemann sum is shown below. I have used tail recursion, to learn about tail recursion in haskell see this.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
epsilon_dx = 0.000001
-- integrate function from a to b
 
integrate f a b = riemannSum f a b epsilon_dx 0
 
--using a tail recursion
--integrate a function f from x to b
 
riemannSum f x b dx sum =
    if x > b
      then
        sum
      else
        riemannSum f (x + dx) b dx (sum + (f x) * dx)
 
f1 x = 1
f2 x = x**2
f3 x = x * exp(x**2)

type of the integrate function is:

1
2
*Main> :t integrate
integrate :: (Double -> Double) -> Double -> Double -> Double

A basic test via ghci:

1
2
*Main> integrate f2 0 1
0.33333283333399866

function f2 is $$f(x) = x^2$$

and the integral we are trying to calculate is:
$$\int^b_a f_2(x) dx = \int^1_0 x^2 dx= 0.33333…$$

so the real and approximated answers are very close.

Trapezoidal rule (also known as the trapezoid rule or trapezium rule) is an approximate technique for calculating the definite integral

 \int_{a}^{b} f(x)\,dx.

The trapezoidal rule works by approximating the region under the graph of the function f(x) as a trapezoid and calculating its area. It follows that

 \int_{a}^{b} f(x)\, dx \approx (b-a)\frac{f(a) + f(b)}{2}.

The graphical depiction of the definition above is:

The function f(x) (in blue) is approximated by a linear function (in red).

Rule

The haskell code for the trapezoidal rule using tail recursion (again):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
--number of steps/resolution taken to find an approximate solution
step_size = 100000
 
-- integrate function from a to b
 
integrate f a b = trapezoidalSum f a b 0 ((b-a)/step_size)
 
--use of a tail recursion
--integrate a function f from x to b
 
trapezoidalSum f x b sum delta
    | x > b = sum
    | otherwise = trapezoidalSum f (x + delta) b (sum + delta * ((f x) + (f (x + delta))) / 2) delta
 
f1 x = 1
f2 x = x**2
f3 x = x * exp(x**2)

Again type of the integrate function is same as in the riemann sum version:

1
2
*Main> :t integrate
integrate :: (Double -> Double) -> Double -> Double -> Double

An example test:

1
2
*Main> integrate f2 0 1
0.33334333344937134

 

Derivative of a Function

First derivative of function at a certain point is actually the tangent of the function at that certain point. Therefore we can write:

m = \frac{\Delta f(x)}{\Delta x} = \frac{f(x+h)-f(x)}{h}.

This expression is Newton‘s difference quotient. The derivative is the value of the difference quotient as the secant lines approach the tangent line. Formally, the derivative of the function ƒ at a is the limit

f'(a)=\lim_{h\to 0}\frac{f(a+h)-f(a)}{h}
and we can show that in the graph as follows:

 

By using currying and higher order functions implementing that derivation is straight-forward:

 

1
2
3
4
5
6
diff :: (Fractional a) => a -> (a -> a) -> (a -> a)
diff h f x = (f (x + h) - f x) / h 
 
f1 x = 1
f2 x = x**2
f3 x = x * exp(x**2)

The test result at ghci is:

1
2
*Main> diff 0.00001 f2 10
20.00000999942131

we have chosen h as 0.0001 and differentiate function $$f_2(x)$$ at point 10. Derivative of $$f_2(x)$$ is $$f’_2(x) = 2x$$ and the result at 10 is, $$f’_2(10) = 20$$ . Result that we have found with the above haskell code is very close to the exact value.

Further Reading

 

Integral Calculus in Haskell, Thoughts

Functional Differentiation, Haskell.org

 

Note: I mistakenly post the wrong revision of this post which was incomplete and realized that a few days later.

Related Posts:

Large Scale Bayesian Inference for Network Tomography

1 Introduction


Major goal of network tomography is to infer the internal characteristics of network by only using data from the end nodes. Each node can either be a computer, router or a subnetwork. Broadly speaking large-scale network inference involves estimating network parameters (can be performance or other) based on traffic measurements at a limited subset of nodes [2]. Network tomography assumes that connections can be directional where data travels from origin to destination and the connections can be uni-directional or bidirectional which changes according to the level of abstraction of the model. During the collection of measurements, some of the data can be missing. Some of the reasons of missing data might be either lossy networks, devices’ load or connectionless UDP protocol.
According to data-collection mechanism, there can be 2 types of network tomography:
• Active Tomography: Active network tomography aims to infer the network’s internal characteristics by sending probe packets (or uses traceroute) from ends.
• Passive Tomography: Passive network tomography infers network information from passive observations of regular network traffic.

Some of the problems in literature about network-tomography (NT) that we are trying to solve are:

• Link-level measurement inference: This is the characterization of latent variables like delay from the observed measurements at the links of network.
• Network topology inference: The task of inferring the network’s topology by only using the end-to-end measurements.
• Sender-Receiver traffic intensity: This is also known as traffic matrices estimation. The task is to estimate the intensity of traffic between two sources.
• Network Anomography: This is the task of inferring anomalies in the network(might be caused by malfunctioning component, malwares or attacks like DDOS) using only the data from the ends.

The mathematical characterization of network tomography problem can be stated as[1]:

Given observations $$Y (t) := [Y_1 (t) . . . Y_l (t)]$$ , over times $$t = 1 . . . T$$ and a matrix $$A(l×κ)$$ such that $$Y (t) = AX(t), ∀t$$ we want to measure the latent variables $$X(t) := [X1 (t) . . . Xκ (t)]$$ over times $$t = 1 . . . T $$. It is always $$κ ≥ l$$ and in general $$κ = O(l^2 )$$.

Many network tomography problems can be approximated by the linear (not necessarily Gaussian) model $$Y (t) = AX(t) + e$$ where X is the $$κ$$ dimensional vector of network dynamic parameters(ex: delay, loss…etc), $$Y$$ is a $$l$$ dimensional measurement vector and hence $$A$$ is the $$l × κ$$ routing matrix. It is difficult to measure the traffic matrices directly. Therefore we try to estimate it by using statistical techniques like maximum entropy[12].

 

2 Related Works


The term network tomography was first used by Vardi (1996)[11] to capture the similarities between Origin-Destination(OD) traffic matrix estimation through link counts: in network inference, it is common that one does not observe quantities of interest but their aggregations instead and this goes beyond OD estimation. Vardi, also devised the linear tomography Poisson model for OD traffic estimation and the linear form [11] is shown later to approximate other network tomography problems [2]. Network Tomography tries to infer the latent variables and usually observed variables of the data is very limited. In that sense Network Tomography is an ill-posed inverse problem. Current approaches in network tomography uses a limited number of measurements to infer the network performance parameters, by using:

• Maximum Likelihood Estimator.
• Bayesian Inference.

and assuming a prior model.

Venkata et al(2003)[7], used Bayesian inference with Gibbs sampling for only using a number of lost and number of sent packets throughout the network. But although their model could give information about bottlenecks in the network, the model’s accuracy suffers. Nowak et al [2] [3] used MCMC to estimate MAP and EM to estimate MLE of the delays. In [6] Liang et al developed a pseudo EM to maximize the pseudo log-likelihood function. In order to illustrate the pseudo likelihood proposal, they inferred the internal link delay distribution and OD matrix estimation from link measurements. M. Rabbat et al in [8] inferred the network structure from the co-occurence data. Network elements that co-occur in many observations are assumed to be closely connected. Particularly, network elements that co-occur in many observations are probably closely connected and they modeled the co-occurrence observations as independent realizations of a random walk on the graph, subjected to a random permutation which accounts for the lack of order information. Treating the permutations as missing data, Rabbat et al employed MCEM algorithm with importance sampling for estimating the random walk parameters. As a result they were able get an accurate model of the network.

 

3 Possible Approaches to Attack the Problem


Queue management is an important aspect of routers. Routers have several interfaces (inif-ingrees, outif-egress) which each link is connected to and the data comes from these interfaces processed by a FPGA or specific multi-core CPU’s. Packets from different queues are processed in a FCFS and round-robin way. Hence it is a natural way to use queuing networks to model the routers’ behavior. Internet is built on the distributed networks scattered around the world and naturally measurements in the end-points can be modeled with queuing networks and graphical models which also allows us to apply Markov chain Monte Carlo for the missing data. C Sutton et al developed a model for Bayesian inference in queuing networks [10], [9] and obtained satisfactory results in their test for the web servers. In a similar fashion, using queuing networks for network tomography might be a more realistic model. Thus I expect that it will yield better accuracy for the estimation of throughput and RTT (round trip-time). Because the large amount of data-processing we’ll need to parallelize and adapt scalable approaches for the machine learning algorithms to be used.Sensor-networks are becoming more ubiquitous and Internet is moving from client-server architecture to the P2P architecture. These networks have specific importance in network tomography. Because there is no other efficient way to have information about the networks internal characteristics. There are some challenges that network tomogoraphy for these networks are bringing to like massive data load and lack of ends. Because every node can be an end. Thus there is no specific end. It might be more efficient to get measurements from supernodes. Network coding[4] is a new technique for modeling network and it has many applications in network tomography. This model is very suitable for bayesian causality and belief propagation. The possibilities should be investigated. Apart from the network’s internal characteristics, it is possible to approximate the router’s internal characteristics like cpu load and temperature with a sampler like MCMC.

With the recent revision of SNMP protocol it is possible to get different types of measurements from MIB’s like the number of in and out connections of a host (this data can be obtained from SNMPv2 by implementing a simple SNMP agent). It is possible to predict device’s connection behavior in an evolving network similar to the social network analysis and protein-protein interaction predictions. This problem is highly structured and device’s behavior is highly correlated with the number of incoming and outgoing connections. Therefore I believe it is possible to use CRF for network’s hosts’ behavior analysis [5]. Flow exporting from routers is an expensive procedure so routers use sampling (e.g.: trajectory sampling) and the missing data problem in that case can be solved with MCMC by sampling from the posterior distribution.

References


[1] E.M. Airoldi and C.N. Faloutsos. Advances in network tomography. Citeseer, 2003.
[2] A. Coates, AO Hero Iii, R. Nowak, and B. Yu. Internet tomography. Signal Processing Magazine, IEEE, 19(3):47–65, 2002.
[3] M.J. Coates and R.D. Nowak. Network tomography for internal delay estimation. In Acoustics, Speech, and Signal Processing, 2001. Proceedings.(ICASSP’01). 2001 IEEE International Conference on, volume 6, pages 3409–3412. IEEE, 2002.
[4] C. Fragouli, J. Le Boudec, and J. Widmer. Network coding: an instant primer. Computer Communication Review, 36(1):63, 2006.
[5] T. Karagiannis, K. Papagiannaki, and M. Faloutsos. BLINC: multilevel traffic classification in the dark. In Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications,pages 229–240. ACM, 2005.
[6] G. Liang and B. Yu. Maximum pseudo likelihood estimation in network tomography. Signal Processing, IEEE Transactions on, 51(8):2043–2053, 2003.
[7] V.N. Padmanabhan, L. Qiu, and H.J. Wang. Passive network tomography using bayesian inference. In Proceedings of the 2nd ACM SIGCOMM Workshop on Internet measurement, pages 93–94. ACM, 2002.
[8] M.G. Rabbat, M.A.T. Figueiredo, and R.D. Nowak. Network inference from co-occurrences. Information Theory, IEEE Transactions on, 54(9):4053–4068, 2008.
[9] C. Sutton and M.I. Jordan. Inference and Learning in Networks of Queues. 3
[10] C. Sutton and M.I. Jordan. Bayesian Inference for Queueing Networks and Modeling of Internet Services. Arxiv preprint arXiv:1001.3355, 2010.
[11] Y. Vardi. Network Tomography: Estimating Source-Destination Traffic Intensities from Link Data. Journal of the American Statistical Association, 91(433),1996.
[12] Y. Zhang, M. Roughan, C. Lund, and D. Donoho. An information-theoretic approach to traffic matrix estimation. In Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications, pages 301–312. ACM, 2003.

Related Posts: