jueves, 24 de enero de 2013

1st Assignment: Channel Simulation

Link to the Github repository: https://github.com/synnick/infotheory

General Description:


For this assignment we had to simulate a noisy channel where binary words would be transmitted. There are some factors that will determine whether the output word after passing through the channel will be the same as the input word, like the length of the word, the probability, the generation of the word, and the probabilities of converting ones to zeros and viceversa. Here I will test all of this using a Python script and GNUplot to generate graphics.

The channel, how does it work?

The Word Generator


First, we need to have a word that will be transmitted through the channel. This is a binary word, so we could as easily as represent it as a string of 1's and 0's like "101101011' or use integers and move bytes to add zeros and ones. Now two key important things with the word is: the length and the frequency of 0's and 1's. The length of the word is obviously the number of ones and zeros that the word has (and this affects the outcome, as I will explain later). The frequency is the probability of apparition of zeros and ones, so let's say we have 0.3 probability of a '1' showing up, then if the word length is 10, running an infinite number of times the generation of words with that probability would make three '1's and seven '0's (if we do not get '1', of course the other result is '0'). Then for every bit that we are generating a Bernoulli trial will be used to determine whether a zero or one is generated.



The next thing is to "transmit" this word through a noisy channel. To simulate this we need to use the Q table, which looks like this:



Where  p is the probability of a '0' transmitting as a '1' and q the probability of a '1' transmitting as a '0'.

The probability of transmitting correctly is the difference between 1 and the probability of changing for both cases. So we could describe it easily as the following drawing:
So the parameters for the channel would be: a word, the probability of changing from one to zero, and the probability of changing from zero to one. With this parameters the channel will simulate the transmission of every bit in the binary word, running Bernoulli trials again with the p and q probabilities.

This of course would not be very significant if we run it only one time, because we are using random numbers to determinate the probability, and even if the random numbers are generated uniformly, we have to make sure we can get an average of the error in the transmission of a certain word. That so, the same word will be transmitted 30 times through the noisy channel, and we will count the number of times that the word was not transmitted successfully (or the number of times it was transmitted successfully, doesn't really matter). 

With this we get a nice average of how that certain word is transmitted, but that's just one word of that certain length. So again we will do the same with other 30 words of the same length, running them 30 times each and counting the errors. 

The length also will be varying, starting from 1 to 30. That's a lot of loops that the program has to do, but we need to do this to make sure the experiment is done correctly.

All of the above, will be run one time each for a set of different variables that will change to see what variable affects what. The variables are: The probability of generating a one, the probability of changing from one to zero, and the probability of changing from zero to one.

Code

Some people did all this with a bash/AWK combination that automatically runs the python script with varying parameters, and then plots the results. What I did is to save the desired parameters in a file, in the format:

0.5 0.2 0.2
0.5 0.4 0.4
... ... ...
Where the first column is the probability of generating a one, second the probability of changing from one to zero, and third the probability of changing from zero to one.

This will run all the tests mentioned before with said parameters saving the data in the file 'output.dat'. Then we can plot the data as we wish.



Plots

Simulating various probabilities and word length's we obtain the following plot, measuring the error:

The error for every set of variables, starts low, and goes increasing very fast. For the range between 20 and 30, the probability of error is almost always 1.

And measuring the success:

This is the opposite as before as we can guess. The success is high with small words, and goes decreasing as the size increases.


Gnuplot code



Conclusions

Pretty much everything affects the outcome of the channel, but certain combinations prove to have the most great effect. We can easily see in both graphics that the word length increases the error and decreases the success, respectively, so there is not arguing that the word length affects the outcome.

The probability at the word generation also affects because if we have a high probability of generating a '1', and a high probability of a '1' changing to '0', then there is also a high probability that many ones will become zeros. It is also possible the other way, having a low probability of generating a 1, and a low probability of changing '1' to '0', reducing the probability of error in that bit.

1 comentario: