
Communication 101: Information Theory Made REALLY SIMPLE

Claude Shannon's 1948 paper "A Mathematical Theory of Communication" is the paper that made the digital world we live in possible. Scientific American called it "The Magna Carta of the Information Age." Shannon defined modern digital communication and determined things like how much information can be transmitted over a telephone line, the effects of noise on the signal, and the measures you have to take to get a perfect signal on the other end. It made the Internet possible. Trouble is, it's tough reading: college-level material for engineers and math geeks. HOWEVER, Shannon's concepts are simple and easy to explain. In just a few minutes you'll understand them, and you'll see that any 7th grader can easily grasp them.

1. This is what a communication system looks like:

An encoder receives input and encodes a message according to the rules of the code. The code is transmitted across a communications channel.

The code is decoded by the decoder, also according to a fixed set of rules, and the message is understood on the other side.

2. What is a Code?
The dictionary defines code as "a system of symbols for communication." I define Information as "communication between an encoder and decoder using agreed-upon symbols." Here, we are interested in digital codes. The most important thing in the system is the code itself. Here are three examples and one non-example: the Genetic Code, ASCII, the U.S. ZIP code, and sunlight:

As you see here, a code is a sequence of symbols that has specific meaning. DNA has a four-letter alphabet; the letters are A, C, G and T. Three letters come together to form a word called a triplet or Codon. The Codon GGG is an instruction to make Glycine. It's very important to understand that GGG (three Guanines in a row) is not itself Glycine. It is a symbolic instruction to MAKE Glycine.
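If you like to see things concretely, here is a toy Python sketch of that symbol-to-instruction idea. The dictionary below is purely illustrative; it contains only the glycine codons discussed here, not the full genetic code:

```python
# A code is an agreed-upon mapping from symbols to meaning.
# Toy excerpt of the genetic code: these codons are instructions,
# not the amino acid itself.
genetic_code = {
    "GGG": "Glycine",
    "GGA": "Glycine",
    "GGC": "Glycine",
}

def decode_codon(codon):
    """Look up what the three-letter symbol instructs the cell to make."""
    return genetic_code[codon]

print(decode_codon("GGG"))  # Glycine -- the symbol GGG *means* "make Glycine"
```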

In other words, the message and the medium are not the same thing. The message is stored in Adenine, Cytosine, Guanine and Thymine, and those bases operate as symbols. When strung together, these symbols form instructions to build proteins; to place those proteins in specific locations; and to build assemblies such as bones, arms, eyes and blood vessels.

Interesting side note: Two other codons, GGC and GGA, also code for Glycine. Most amino acids can be represented by more than one codon, not just one. This is called redundancy. DNA has a clever mechanism for reducing copying errors, namely that most amino acids have several Codons that code for them, not just one. It means that some copying errors (GGC instead of GGG) will not cause a problem. This is very, very important; it's why DNA has survived over 3 billion years intact. More about redundancy in a minute.

ASCII is the standard computer mapping between 1s and 0s and the English alphabet. A = 1000001, B = 1000010, a = 1100001, b = 1100010, and so on for the whole English alphabet. When you punch the letter A on your keyboard, the keyboard sends a 1000001 to the computer. On the other side, 1000001 is interpreted as A and displayed on the screen. The 1s and 0s themselves are in turn coded into a series of electrical impulses or magnetic bits on a hard drive. 1000001 might be stored on your hard disk as North/South spots of magnetism, i.e. NSSSSSN. The letter A is not an electrical impulse, nor is it a magnetic field. The electrical impulse or magnetic field represents A. This is what is meant when we say "The message is independent of the medium it's stored in." You can write the word "dog" with ink on a piece of paper, but a dog is not ink or paper, and ink and paper are not a dog. A long string of DNA like ACGGGTCTTTAAGATG might build a claw, but that string of letters itself is not a claw and the claw is not a string of letters.

One of the most common questions I get about DNA and codes is:

Why isn't sunlight a code? Why isn't radioactive decay a code? Why isn't H2O a code?
Sunlight is not a code because sunlight is just a stream of photons. There is no encoder in the sun. That photon does not symbolically represent some other thing. The sun does not send out digital streams of photons that obey the laws of a code. The photon IS sunlight; it does not SAY sunlight. It does not give instructions for making sunlight. It doesn't have any instructions at all. It's just a photon. It represents nothing other than itself.

Now somebody will naturally say, "Yeah, but I can look at sunlight through a spectrum analyzer and recognize that it's sunlight." Yes, if you intelligently build a spectrum analyzer you can recognize patterns in the spectrum. However, the spectrum is not a sequence of digital symbols. Sunlight is not a code and neither is a rainbow. A rainbow is a spectrum of light and it represents nothing other than itself.

Radioactive decay is just atomic particles decomposing. There is no symbolic relationship. Same with water. Water is water. The word H2O is a code HUMANS have devised to describe water. But as a molecule it just has two atoms that we call Hydrogen and one atom that we call Oxygen. That's it. The molecule doesn't represent anything other than itself.

3. Noise and Information Entropy


One of the most important things in a communication system is noise and how well the system deals with it. Noise destroys information. When we add noise to the communication system, here's what it looks like:

Noise introduces uncertainty as to what the original message was. It was originally 1000001 (A) but the receiver doesn't know that. It might think the message was 1000100 and give you the letter D instead.

When Claude Shannon worked out the math, he found something very surprising: the formula for noise in an information system was identical to the formula for entropy in thermodynamics. Entropy is the irreversible process of useful energy becoming useless energy. The heat coming out of the exhaust pipe in your car is cooler and a whole lot less useful than the heat inside your engine, and that process is irreversible. All audio engineers know that noise is also irreversible. Once it's in your recording, you can't get it back out. It's in there for good. All you can do is try to disguise it. Also, noise NEVER improves a signal. There are a few very narrow applications in digital signal processing where noise can be put to good use (e.g. dither and noise shaping), but noise always destroys information. It never creates it.

Shannon measured information in bits, the exact same way that we measure the size of computer files. So one thing that confuses a lot of people is that when you add noise to a signal, it adds bit information to the signal and the signal appears to have more information. In one sense it does: if you add noise to a signal, the signal does contain more data. But you can't separate the noise out of the signal once it's in. Once your signal is lost, it's gone forever.
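Here is a small Python sketch of both ideas, assuming 7-bit ASCII and two arbitrarily chosen bit flips; it also shows Shannon's measure of information, H = -Σ p·log2(p), in bits:

```python
import math

# Noise as uncertainty: flip two bits of 'A' (1000001) and the receiver sees 'D' (1000100).
# The flipped bit positions are arbitrary, chosen only for illustration.
original = ord("A")            # 65 -> 1000001
noisy = original ^ 0b0000101   # noise flips two bits -> 1000100
print(format(original, "07b"), chr(original))   # 1000001 A
print(format(noisy, "07b"), chr(noisy))         # 1000100 D

# Shannon's measure of information/uncertainty, in bits:
def entropy(probabilities):
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 -- one fair coin flip carries 1 bit
print(entropy([1.0]))        # 0.0 -- a certain outcome carries no information
```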

Also, there is no such thing anywhere in engineering or computer science as a percentage of the time that noise accidentally improves a signal. Nor is there an optimum level of noise that you would want in a signal. The ideal amount of noise to have in a signal is ZERO. Shannon pointed out that the best way to combat noise was through redundancy: extra letters or numbers in the signal that help you fill in the blanks if there are missing letters. For example, "The quick brown fox jumps over the lazy dog" is still somewhat readable even if 1/3 of the letters are missing: "Th q ic br wn fo jum s ove the l zy dog". That's because the English language is about 50% redundant. You can usually figure out what the original sentence was as long as at least half the letters are still there.
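Here is a minimal sketch of how redundancy fights noise, using a toy triple-repetition code in Python. This is only the principle in miniature, not one of Shannon's actual coding schemes:

```python
# Redundancy combating noise: send every bit three times, take a majority vote.
def encode(bits):
    return [b for b in bits for _ in range(3)]        # triple every bit

def decode(bits):
    out = []
    for i in range(0, len(bits), 3):
        trio = bits[i:i + 3]
        out.append(1 if sum(trio) >= 2 else 0)        # majority vote per trio
    return out

sent = encode([1, 0, 1, 1])    # [1,1,1, 0,0,0, 1,1,1, 1,1,1]
sent[4] = 1                    # noise flips one bit in transit
print(decode(sent))            # [1, 0, 1, 1] -- the redundancy absorbs the error
```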

4. Layers of Communication
Think about this for a second: Right now, you are seeing a detailed 2-dimensional image on your computer screen. It's an image of my blog on your monitor. But ALL the information you are looking at originally came into your computer on a wire or wireless network, via a 1-dimensional stream of 1s and 0s. How does a single 1D stream of 1s and 0s get turned into a 2D or even 3D object? The answer is: Layers. One of the essential aspects of communication systems is that the codes, the encoders and the decoders have layers.

Layers operate like this: multiple encoders and decoders cascaded together:

Information is encoded from the top layer down, and it is decoded from the bottom layer up. Here's what I mean by this:

6 LAYERS OF ENCODING, TOP DOWN:
Meaning is expressed by sentences
Which are made of words
Which are made of letters
Which are made of 1s and 0s
Which are sent on a wire via electrons.

6 LAYERS OF DECODING, BOTTOM UP:
Your computer reads the 1s and 0s from the wire
And decodes them into letters
The letters form words
The words form sentences
The sentences give Meaning.

That's a linguistic explanation of layers.
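Here is a rough Python sketch of those linguistic layers, assuming 7-bit ASCII for the letters-to-bits layer:

```python
# Encode top-down: meaning -> sentence -> letters -> 1s and 0s.
sentence = "THE CAT SAT"                                    # upper layers: meaning in a sentence
bits = "".join(format(ord(ch), "07b") for ch in sentence)  # bottom symbolic layer: 1s and 0s
print(bits[:21])   # first 21 bits: the letters T, H, E

# Decode bottom-up: 1s and 0s -> letters -> words -> sentence -> meaning.
chunks = [bits[i:i + 7] for i in range(0, len(bits), 7)]
decoded = "".join(chr(int(chunk, 2)) for chunk in chunks)
print(decoded == sentence)   # True -- the meaning only reappears at the top layer
```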

Here's a more mechanical version of communication layers:


You type a message into your keyboard
The keyboard encodes the message into ASCII characters
Which go into an email
Which is encoded into a TCP/IP packet
Which is transported across the Internet via copper & fiber
The packet is read from the copper & fiber
The email is decoded from the packet
The ASCII characters are turned into letters on a screen
Which your friend reads on her computer.
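Here is a toy Python sketch of that wrapping and unwrapping. The addresses and field names are made up for illustration; real email and TCP/IP formats are far more involved, but the encapsulation pattern is the same:

```python
import json

# Each layer wraps the layer above it on the way down,
# and each layer is unwrapped in reverse order on the way up.
message = "Hi Alice"
email   = {"to": "alice@example.com", "body": message}       # email layer
packet  = {"src": "10.0.0.2", "dst": "10.0.0.9",
           "payload": json.dumps(email)}                     # packet layer
wire    = json.dumps(packet).encode("ascii")                 # bytes on the wire

# The receiver unwraps bottom-up: wire -> packet -> email -> message.
received = json.loads(json.loads(wire.decode("ascii"))["payload"])["body"]
print(received)   # Hi Alice
```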

Now I want you to notice something that's vitally important and overlooked by almost everyone:
Let's say you create a Microsoft Word document. You can edit text in Microsoft Word. But if you open the file in a plain text editor that's NOT MS Word, you just see garbage. Then if you try to modify and save it, you'll just wreck the file. If you want to successfully edit an email message, you have to edit it in an email program. If you want to change one of the information layers to make it say something different, you have to be IN that layer to change it. If you skip layers and try to make changes, you'll just wreck everything.

This is why, in computer systems, there are error checking mechanisms in almost EVERY layer. Most people don't even know they're there. There's an error checking mechanism between the keyboard and the PC. There are error checking mechanisms on the hard drives, in TCP/IP packets, in your Wi-Fi and in your email program. There are also extensive error checking systems in DNA to make sure the data isn't corrupted. Because it only takes a very small injury to the packet to irreparably damage the whole thing, to the point where it cannot be decoded at all. Even a tiny flaw in a strand of DNA can cause a birth defect, for example.
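Here is a bare-bones illustration of that kind of per-layer error check, using a toy checksum in Python (real layers use CRCs and much stronger codes):

```python
# A toy checksum: the sender appends a check byte, the receiver verifies it.
def checksum(data: bytes) -> int:
    return sum(data) % 256

payload = b"GGG builds Glycine"
packet  = payload + bytes([checksum(payload)])   # append the check byte

# In transit, noise damages one byte of the packet:
damaged = bytearray(packet)
damaged[0] ^= 0x04

body, check = bytes(damaged[:-1]), damaged[-1]
print(checksum(body) == check)   # False -- the receiver knows the packet is corrupt
```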

So let's bring this back to biology.

These very simple ideas that Claude Shannon introduced have profound and far-reaching implications for Origin of Life Research and Evolutionary Theory:
1. All communication systems rely on prior agreement between encoder and decoder, otherwise no communication can take place. This agreement between the two sides must be made in advance. It is abstract, just like the symbols of the encoding/decoding table. The agreement begins as an immaterial idea, just like the information itself is immaterial. All communication systems that we know of are designed. There are no known exceptions to this. And if you stop and think about communication, it's abstract by its very nature. It's symbolic. Symbols are immaterial. The symbols have to be chosen in advance. The meaning of the message is independent of the medium that carries it. The very existence of information overturns the materialistic worldview. Materialistic philosophy has no explanation for the existence of information.

2. Communication systems possess something that matter and energy alone do not possess: information and symbols. Information is an entity on par with matter and energy. Norbert Wiener, the MIT mathematician and father of Cybernetics, said, "Information is information, neither matter nor energy. No materialism that denies this can survive the present day."

3. All communication systems are implicitly purposeful. In DNA, the purpose of GGG is to manufacture Glycine. In ASCII, the purpose of 1000001 is to transmit the letter A. In the ZIP code, the purpose of 68450 is to make sure letters get to Tecumseh, Nebraska.

4. In communication theory, noise is always your enemy. It is NEVER your friend. Random mutation is noise. Noise is information entropy, which is the irreversible destruction of information. Therefore random mutations by definition cannot be the source of new information in evolution. There has to be a different explanation for evolution. (People constantly say to me: "But once in a while noise could introduce a beneficial mutation." To that I say, "Try it. Prove it." People who've actually done this with any real-life information system know better. Communication engineers definitely know better.)

5. Information is always created top-down, not bottom-up. Encoding is top-down and decoding is bottom-up. The highest layer is intent. The lowest layer is the matter or energy that stores the information, e.g. a CD, bases in DNA, or radio waves in a Wi-Fi network. (Electrical engineers call this the Physical Layer. The term is taken from the OSI 7-Layer Model for computer networking.)

6. Because information is created top-down, existing information has to be decoded first before it can be edited or changed in any beneficial way. Edits have to take place within the layer that they are intended to affect. Edits made on the wrong layer, or noise added, only destroy the information packet. This means that within the genome, cellular genetic engineering must also be done top-down, not bottom-up. This completely overturns the traditional Darwinian assumption of random mutation. Random mutations ALWAYS destroy Internet packets and they always destroy DNA. Beneficial mutations are engineered by the genome via intelligent algorithm, not random mutation.

Communication theory proves that living things were designed; that they are purposeful (teleological); that the information in DNA operates top-down, not bottom-up; and that evolution is internally directed by the cell and the genome, not by external damage from the environment. This turns all the former assumptions of materialistic biology upside down. Everything we know about the information age, computers and the Internet shows us that living things are designed and evolve according to an internal genetic program, not random chance.

Perry Marshall

Ascii Codes Explained


Computers deal with numbers, not letters or punctuation, and so to get them to work with text we need to represent each and every character as a number. The text files that you read and write are actually stored, loaded into memory and manipulated as a sequence of numbers. When the computer displays the data on screen as text, it changes the numbers into characters so that humans can understand it. In order for text files to be reliably stored and processed by computers, it is important that they all interpret the data in the same way. To achieve this, a standardised mapping was created that defined what numbers should be used to represent all the characters in the English language. This mapping was called the American Standard Code for Information Interchange, or ASCII for short.

Lower Ascii Table

The Ascii table is split into two sections, the lower ascii table and the upper ascii table. The lower ascii table defines all numbers between 0 and 127 inclusive. This is the officially standardised section of the Ascii table and represents all the most common characters. The numbers from 0 to 31 inclusive represent non-printing characters, meaning characters that are not directly displayed. These characters control how the data should be interpreted. They were designed for when text was processed by teletype terminals and as such needed characters to represent actions such as ringing an audible bell, and signalling the beginning or end of a data transmission. Although a lot of these characters are no longer of any use, some of them have become commonplace.

The carriage return is the character used by the Macintosh to represent a new line. On UNIX (and in some Mac OS X text files) the line feed is the standard way of designating a new line. On Windows, both a carriage return AND a line feed are used as a pair! The other commonly used control characters include NULL, which often indicates the end of a string of text; tab, which is used to assist the layout of text by indenting it; and backspace, which deletes the previous character. All the rest of the characters above 31 in the lower ascii table are printable, except for forward delete, an apparently late addition which means to delete the next character.
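You can see those control characters for yourself with a few lines of Python:

```python
# The non-printing characters mentioned above, by their ASCII numbers.
print(ord("\r"), ord("\n"))             # 13 10 -- carriage return and line feed
print(repr("one\rtwo"))                 # classic Mac line ending
print(repr("one\ntwo"))                 # UNIX line ending
print(repr("one\r\ntwo"))               # Windows uses the CR+LF pair
print(ord("\t"), ord("\0"), ord("\a"))  # 9 0 7 -- tab, NULL, the audible bell
```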

Upper Ascii Table

The upper Ascii table has not been officially standardised, and tends to vary from computer to computer and from font to font. Some characters are more standard than others, and your mileage may vary, but here's the table for the Macintosh Courier font.
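A quick way to see that lack of standardisation in practice: the same byte above 127 decodes to different characters under different legacy mappings. In this sketch, Python's mac_roman and cp1252 codecs stand in for two machines that disagree:

```python
# One byte, two meanings: upper-ASCII values were never standardised.
b = bytes([0x8E])              # the single byte 142
print(b.decode("mac_roman"))   # é  on a classic Macintosh
print(b.decode("cp1252"))      # Ž  on a Windows machine
```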

Working with Ascii
