Chapter 1
Introduction
1.1
Introduction
1.2
Information Theory
1.3
Problem Class A The Technical Problem - Problems related to the accuracy of transmission of the symbols of communication from sender to receiver. (Shannon)
Problem Class B The Semantic Problem - Problems related to the semantic aspects of a message or of information, i.e. the meaning of the message received. (Floridi & Zadeh - Fuzzy Information theory)
Problem Class C The Control Problem - Problems concerning the study of the effect of information communication on the behaviour of the receiver, i.e. does the communicated message induce the desired behaviour in the receiver or not? (Wiener, Kolmogorov, Chaitin and Vigo)
In the above classification, technical problems concern the accuracy of the transference of sets of symbols from sender to receiver. The set can be a discrete set (like written speech), a continuously varying signal (like voice or music) or a continuously varying two-dimensional pattern (like television).
The second class of problems, i.e. the semantic problem, has deep significance in daily life. Almost all human communication is based on semantic information, like language (a set of symbols which produce different meanings in different combinations) or body gestures. The semantic problem is also difficult to handle only on the basis of the technical aspects of information theory.
Class C problems are a logical extension of the problems of Class B in the sense that, after understanding the content of the message, does the receiver behave in the desired way? The question of control through information is a topic of intense research in the humanities and cognitive sciences.
Weaver (page 5) also argues that all three levels of problems are interrelated and that the technical side is at the heart of any solution we propose for the communication problem. Hence sending the message correctly and completely, without errors, is the first important step.
1.4
Information theory, from its conception, has become a potent subject of study and research. It now intersects with other areas of science, like Computer Science, Fuzzy theory, Electronics Engineering and Systems theory, to address some very old problems of various applied subjects, and in a similar manner it is merging with Cognitive Science, Psychology, Communication, Management, Information Systems and Economics to develop theories of human interaction and nature. The present review of Information Theory is an attempt to trace its history and its present status, and to see its future potential both in the Applied Sciences and in the Humanities. The structure of this review is based on the classification of Weaver given above, and the particulars follow in the subsequent chapters.
Chapter 2
The Technical Problem
In 1948 C. E. Shannon published A Mathematical Theory of Communication in the Bell System Technical Journal. It was not the first attempt at proposing a mathematical framework for communication, but it was by far the most complete and comprehensive theory of communication. According to Verdu, Shannon's discovery of the fundamental laws of data compression and transmission marks the birth of Information theory. Shannon's theory was an attempt to unify various fields like Probability, Statistics and Computer Science. The proposed theory still continues to be an inspiration for new findings in the field of information communication. In this chapter we will see the main achievements of Shannon and the applications of Shannon's theory in other fields.
2.1
Nyquist also discussed the possibility of a transmission gain obtained by replacing Morse code with an optimum code. Similar studies were conducted by Küpfmüller, Kotelnikov, Whittaker and Gabor, targeted towards signalling speed or bandwidth manipulation. R. Hartley's paper of 1928 introduced various terms like the capacity to communicate information, the rate of communication and, most importantly, a quantitative measure of information.
He defines the amount of information on the basis of the choices of messages available to the sender. If there are n selection states and s symbols available in each selection, then the amount of information H is defined by

H = n log s
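As a small illustration (the numbers below are hypothetical and not taken from Hartley's paper), the measure can be evaluated directly; the base of the logarithm merely fixes the unit of information.

import math

def hartley_information(n_selections, n_symbols, base=2):
    # Hartley's measure H = n log s, in units determined by the log base
    return n_selections * math.log(n_symbols, base)

# Example: 10 successive selections from an alphabet of 26 symbols
print(hartley_information(10, 26))   # roughly 47 bits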
In the above-mentioned attempts two features are prominent: first, information is defined statistically, on the basis of the frequencies of letters and words (which was useful for cryptography), and second, there was no study of the effect of noise (i.e. unwanted information or disturbance caused by the medium of communication).
During WWII the need for conveying information at increased rates and with better security measures was immense. There were several attempts to build a theory of communication incorporating the trade-off between transmission rate, reliability, bandwidth and signal-to-noise ratio. Prominent names were Norbert Wiener, A. N. Kolmogorov, R. A. Fisher and C. E. Shannon.
In his ground-breaking book Cybernetics, Wiener classifies the interests of his peers. He writes:
The idea of [probabilistic] information occurred at about the same time to several writers, among them the statistician R. A. Fisher, Dr. Shannon of Bell Labs and the author. Fisher's motive in studying the subject is to be found in classical statistical theory, that of Shannon in the problem of coding information and that of the author in the problem of noise and message in electrical filters. Let it be remarked parenthetically that some of my speculations in this direction attach themselves to the earlier work of Kolmogorov...
Shannon's and Wiener's works appeared almost simultaneously, delayed by the war but accepted and popularized very quickly in both the theoretical and the applied realms. Today information theory is synonymous with Shannon's communication theory; in the early years, however, it was also known as the Wiener-Shannon theory of communication.
2.2
The quantity H, defined for a source with symbol probabilities p_i by

H = -\sum_{i} p_i \log p_i

is known as Entropy. Entropy measures the average uncertainty in the message. The unit of measure of information is known as the bit. The important properties of this measure H are as follows:
1. H is continuous in the p_i.
2. H is positive, since p_i ≤ 1 implies log p_i ≤ 0.
3. H = 0 if and only if one and only one of the p_i equals one and all the remaining ones are 0, i.e. if we are certain about the outcome then the entropy is zero.
4. If all p_i are equal to 1/n then H is maximal and equal to log n, i.e. when every outcome is equally likely the entropy is maximum.
5. If x and y are two events with m and n possibilities respectively, and p(i, j) is the probability of the joint occurrence of x_i and y_j, then the entropy of the joint event is given by

H(x, y) = -\sum_{i,j} p(i, j) \log p(i, j)

while the individual entropies are

H(x) = -\sum_{i,j} p(i, j) \log \sum_{j} p(i, j)

H(y) = -\sum_{i,j} p(i, j) \log \sum_{i} p(i, j)

6. Any operation which equalizes the probabilities will increase H.
7. The conditional entropy of y is defined by

H_x(y) = -\sum_{i,j} p(i, j) \log p_i(j)

Also we can write H(x, y) = H(x) + H_x(y), where x and y are not necessarily independent.
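These properties can be checked numerically. The following Python sketch (a minimal illustration with an arbitrarily chosen 2 x 2 joint distribution) computes H(x), H(x, y) and the conditional entropy H_x(y), and verifies property 7.

import math

def entropy(probs):
    # Shannon entropy -sum p log2 p, ignoring zero-probability terms
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A hypothetical joint distribution p(i, j) over 2 x 2 outcomes
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

px = [sum(v for (i, j), v in p.items() if i == a) for a in (0, 1)]

H_xy = entropy(p.values())
H_x = entropy(px)
# Conditional entropy H_x(y) = -sum p(i, j) log2 p_i(j), with p_i(j) = p(i, j)/p(i)
H_x_y = -sum(v * math.log2(v / px[i]) for (i, j), v in p.items())

print(H_xy, H_x + H_x_y)   # the two values coincide (property 7)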
2.2.1
2.2.2
2.2.3
Shannon not only classified the sources of messages but also provided efficient ways to encode communications and to compress data on the basis of entropy. According to Shannon, H is approximately the logarithm (base 2) of the reciprocal of the probability of a typical long sequence, divided by the number of symbols N in the sequence,

H \approx \frac{\log_2 (1/p)}{N}

where p denotes the probability of such a sequence.
Coding and Data Compression
In the setting of the above result, the second, high-probability class of sequences is known as the class of typical sequences; the probability of any individual typical sequence decreases exponentially with increasing block length. This fact is useful in data compression and coding: we can neglect the atypical sequences and code the typical sequences assuming them to be equiprobable. The resulting encoding of a string of N symbols will be a string of roughly HN bits, and increasing the length makes the probability of failing to recover the signal as small as desired. The above argument provides a suboptimal coding method, so in the next theorem Shannon provides a criterion for optimal coding.
Theorem 2.2.2 (Strong Converse Source Coding Theorem) Define n(q) to be the number of most probable sequences of length N which together accumulate a total probability q; then

\lim_{N \to \infty} \frac{\log n(q)}{N} = H

where q does not equal 0 or 1.

Shannon also defines the capacity C of a discrete noiseless channel as

C = \lim_{T \to \infty} \frac{\log N(T)}{T}

where N(T) is the number of allowed signals of duration T.
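As a numerical illustration of the theorem (a sketch, not part of the original text, assuming a memoryless binary source with P(1) = 0.1): order the sequences of length N by probability, accumulate a fraction q of the total probability, and compare (log2 n(q))/N with H.

import math
from math import comb

p, N, q = 0.1, 1000, 0.5
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))   # entropy per symbol

# Order sequences by probability (for p < 0.5, fewer ones = more probable),
# accumulate probability q, and count how many sequences n(q) were needed.
mass, n_q, k = 0.0, 0, 0
while mass < q:
    mass += comb(N, k) * p**k * (1 - p)**(N - k)
    n_q += comb(N, k)
    k += 1

print(f"H = {H:.3f} bits/symbol")
print(f"log2 n(q) / N = {math.log2(n_q) / N:.3f}")   # approaches H as N grows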
2.3
In this section we shall discuss the applications of Shannon's theory and the subsequent improvements by other significant contributors. In 1949 Shannon's other paper, Communication Theory of Secrecy Systems, was made public. With these two papers together, Shannon had established a deep theoretical foundation for various branches of digital communication: data compression, data encryption and data correction (Gappmair).
Coding Theory
The fundamental problem in communication engineering is ideal encoding, i.e. how to transmit as many bits of information as possible while maintaining high reliability (Jan Kahre).
The mutual information between x and y is defined by

I_y(x) = \sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(x)p(y)}

The following relation holds for mutual information: I_y(x) = H(x) - H_y(x), where H_y(x) is the equivocation. Thus mutual information is the gain in the information coding of x when we know y, compared with when we do not know y.
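As a quick numerical check (an illustrative sketch with an arbitrary joint distribution), the definition and the relation I_y(x) = H(x) - H_y(x) can be evaluated side by side.

import math

p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}   # hypothetical p(x, y)
px = {a: sum(v for (i, j), v in p.items() if i == a) for a in (0, 1)}
py = {b: sum(v for (i, j), v in p.items() if j == b) for b in (0, 1)}

# I_y(x) = sum p(x, y) log2 [ p(x, y) / (p(x) p(y)) ]
I = sum(v * math.log2(v / (px[i] * py[j])) for (i, j), v in p.items())

H_x = -sum(v * math.log2(v) for v in px.values())
# Equivocation H_y(x) = -sum p(x, y) log2 p(x | y)
H_y_x = -sum(v * math.log2(v / py[j]) for (i, j), v in p.items())

print(I, H_x - H_y_x)   # both expressions give the same value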
Shannon completed the theory of coding with the following two theorems.

Theorem 2.3.1 (Source Coding Theorem) The number of bits necessary to uniquely describe any data source can approach the corresponding information content as closely as desired. If there are N independently and identically distributed random variables, each with entropy H, then these data can be compressed into NH bits with negligible risk of information loss (as N → ∞); conversely, if the data are compressed into fewer than NH bits, some information will be lost.
Theorem 2.3.2 (Channel Coding Theorem) The error rate of data transmitted over a band-limited noisy channel can be reduced to an arbitrarily small amount if the information rate is lower than the channel capacity.
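For instance (a standard textbook illustration, not drawn from the text under review), the capacity of a binary symmetric channel that flips each transmitted bit with probability p is C = 1 - H(p); the channel coding theorem then guarantees reliable communication at any rate below this value.

import math

def h2(p):
    # binary entropy function in bits
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    # capacity of a binary symmetric channel with crossover probability p
    return 1.0 - h2(p)

print(bsc_capacity(0.11))   # about 0.5 bit per channel use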
In this work Shannon did not propose any algorithm for encoding information, but in subsequent years the growth of digital communication saw the rise of coding theory as a separate branch of investigation in engineering and science. Shannon's channel coding theorem anticipated forward error correction schemes, such as the one invented in 1967 by A. J. Viterbi. Similarly, further data compression methods (both lossless and lossy) have been found since the emergence of mobile communication. The following figure (adapted from Gappmair) presents an overview of various coding schemes.
2.3.1
The relation between physical entropy and informational entropy goes far back in history. In 1860 Maxwell proposed a thought experiment known as Maxwell's Demon. In this experiment Maxwell imagined a bifurcated chamber with a small door operated by a "demon" which sorts fast and slow gas molecules into the two halves, apparently decreasing the thermodynamic entropy without expending work.
Rényi Entropy - The Rényi entropy is a generalization of Shannon's entropy, used for measuring information from weak signals (of lower probabilities) which are overlapped by stronger signals (of higher probabilities). We define the Rényi entropy of order α by

H_\alpha = \frac{1}{1-\alpha} \log \left( \sum_{i=1}^{n} p_i^{\alpha} \right)

Leaving aside the inclusion of α, the Rényi entropy behaves just as Shannon's entropy does (and reduces to Shannon's entropy in the limit α → 1). It attains its maximum H_α(max) = log n at p_i = 1/n, and satisfies the additivity property. The inclusion of the parameter α makes it more sensitive to selected frequencies.
Tsallis Entropy - The Tsallis entropy is defined by

T_q = \frac{1}{q-1} \left( 1 - \sum_{i=1}^{n} p_i^{q} \right)
Min-Entropy - The min-entropy is defined as the limiting value of the Rényi entropy as α → ∞. It is given by

H_\infty = -\log \max_{x \in X} p(x)
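The following sketch (illustrative, with an arbitrarily chosen distribution) computes the generalized entropies above and shows numerically that the Rényi entropy approaches the Shannon entropy as α → 1 and the min-entropy as α → ∞.

import math

def shannon(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def renyi(p, alpha):
    return math.log2(sum(x**alpha for x in p)) / (1 - alpha)

def tsallis(p, q):
    return (1 - sum(x**q for x in p)) / (q - 1)

def min_entropy(p):
    return -math.log2(max(p))

p = [0.5, 0.25, 0.125, 0.125]           # hypothetical distribution
print(shannon(p), renyi(p, 1.0001))     # Renyi -> Shannon as alpha -> 1
print(min_entropy(p), renyi(p, 200))    # Renyi -> min-entropy as alpha -> infinity
print(tsallis(p, 2))                    # Tsallis entropy of order 2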
2.3.2
The theory itself, however, left a few unanswered questions about the meaning and control aspects of communication.
Shannon declared that the mathematical/technical aspect is independent of the semantic aspect of information theory; however, to apply Shannon's theory in fields like Economics, Psychology or Biology, one cannot overlook these aspects. For Shannon the semantic aspects were set aside owing to his belief that they play no part in the technical side of the theory. Weaver, in his paper, clarified the interconnection between the technical, semantic and control sides of information theory and how they are related to each other, with the technical or mathematical aspect of communication at the heart of it all.
*************
Chapter 3
The Semantic Problem
The credit for establishing Information Theory as a discipline of study goes to C. E. Shannon and Norbert Wiener. However, making the theory famous and accessible to the general scientific community was the work of Warren Weaver, who published The Mathematics of Communication in Scientific American in 1949. Weaver defines communication in a broader sense, as a process by which one entity may affect another entity. This process may be of three types: Technical - sending and receiving messages; Semantic - the meaning of the message; and Influential - controlling the behaviour of the receiver of the message through communication.
Shannon's theory is an attempt to establish rules for the syntactic part, but it neglects the semantic and control aspects. Semantics is an important aspect of communication, especially human communication, where we have to convey our thoughts to similar entities, or have to understand the meaning of what others are communicating.
In 1938 C. W. Morris published a book on the theory of signs titled Foundations of the Theory of Signs. In it he describes the communication process through signs and symbols not only of humans but of all organisms capable of communication. Human communication depends partially on language. Language is a collection of symbols, groups of symbols and the rules for grouping the symbols. Language can be understood as a coding method which one human uses to encode his or her thoughts and communicate them. Other methods of communication are the performing arts (music, dance, acting, painting etc.) and body language.
Morris defines the process of communication through signs, semiotics, as a universal process and an interdisciplinary undertaking. Morris's foundation divides semiotics into three interrelated disciplines: syntactics, semantics and pragmatics.
3.1
Weaver writes in his paper:

The semantic problems are concerned with the identity, or satisfactorily close approximation, in the interpretation of meaning by the receiver, as compared with the intended meaning of the sender. This is a very deep and involved situation, even when one deals only with the relatively simpler problems of communicating through speech.
The above quote sums up the semantic problem adequately: it concerns understanding, on the side of the receiver, of the meaning intended by the sender. The complexity of this problem can be illustrated by a simple example. Suppose A asks B "Do you understand me?", and B replies "No"; then we cannot be certain whether B has really failed to understand the question or the meaning of the question.
Thus the semantic problem of communication can be interpreted as a problem of approximation (understanding as close to the intended meaning as possible) and of coding (encoding the understanding/meaning of one mode of communication into another mode). Weaver also introduced the elements of meaning and effectiveness into the schematic diagram proposed by Shannon (see Figure 3.1).
The schematic diagram now contains the following new blocks:
Semantic Coding - Semantic coding becomes the first coding of information into a message; it is the mode of expression of the intended meaning by the sender, generally language (but it can be any means of expression, like music, art, painting, writing etc.).
Semantic Noise - The unwanted (or essential) additions to the information content of the message that are needed to make it receivable or understandable.
3.2
3.2.1
In 1952 Yehoshua Bar-Hillel and Rudolf Carnap presented their Theory of Semantic Information. It was the first attempt to define the semantic content of a message. The highlight of this (Classical) Semantic Theory (CST) is that it uses ideas similar to Shannon's and defines measures of semantic content useful for application to technical as well as non-technical problems.
CST applies to a formal system of language, denoted by Ln. It describes a universe which includes entities, formal statements (predicates), logical connectives and rules to form compound formal statements. Statements can be basic or ordinary. A basic statement is a predicate applied to an entity (Ram is tall) and an ordinary statement is a combination of basic statements (Ram is tall and Laxman is handsome). Furthermore, we have state descriptions, which are statements about the universe in terms of a predicate applied to every entity of the universe. There are infinitely many possible state descriptions. If we take any ordinary statement and count the number of state descriptions which are made false by it, then this number gives the information content of the statement.
The more state descriptions a statement makes false, the higher its information content. A tautology (true by definition) rules out nothing, and thus contains zero information. A self-contradictory statement rules out everything, and hence has maximum information content. Statements which are logically indeterminate rule out some possibilities, and thus have some intermediate level of content. The more particular a statement is, the more possible states it rules out and the more information content it has. (The inverse notion is the Range of a statement: the number of states which are confirmed, i.e. implied, by it.)
To measure the information content, Bar-Hillel and Carnap used logical probability instead of the classical probability used by Shannon. If E is the empirical evidence for a hypothesis H_i being probably true, then the degree of support of H_i by E is known as the logical probability of H_i given E. Logical probabilities are based on logical relations between propositions. The logical probability of A is denoted by

m(A) = \frac{T_A}{T_S}

where T_A is the number of logical states in which A is true and T_S is the total number of logical states.
Using the logical probability of state descriptions, CST proposes two information measures, based on the Inverse Relationship property. These measures are defined as follows:
1. Content - Denoted by Cont(A), the content measure measures the amount of content in the logical statement A, and it is defined by Cont(A) = 1 - m(A), m(A) being the logical probability of A.
2. Information - Denoted by In(A), the information measure measures the semantic information of the logical statement, and it is defined by

In(A) = \log \frac{1}{1 - Cont(A)} = \log \frac{1}{m(A)} = -\log m(A)
Following are some properties of the above two functions:
1. log is the logarithm to base 2.
2. For a basic statement B, Cont(B) = 1/2 and In(B) = 1.
3. For a conjunction C_n of n basic statements, Cont(C_n) = 1 - (1/2)^n.
4. For a disjunction D_n of n basic statements, Cont(D_n) = (1/2)^n.
5. 0 ≤ In(A).
6. In(A) = 0 if and only if A is logically true.
7. In(A) = ∞ if and only if A is logically false.
8. In(A) is positive and finite iff A is factual.
9. If A logically implies B then In(A) ≥ In(B).
10. In(A) = In(B) iff A is logically equivalent to B.
11. In(A ∧ B) = -log Cont(¬A ∨ ¬B).
12. In(A ∨ B) = -log Cont(¬A ∧ ¬B).
13. If A and B are inductively independent then In(A ∧ B) = In(A) + In(B).
14. The relative content of B with respect to A is defined by Cont(B/A) = Cont(A ∧ B) - Cont(A).
15. If A and B are inductively independent then In(B/A) = In(B).
Bar-Hillel and Carnap also define an estimate of the amount of information. If H = {h_1, h_2, ..., h_n} is an exhaustive system of hypotheses on a given evidence e, then the estimate of the information carried by H with respect to e is given by the formula

Est(In, H, e) = \sum_{i=1}^{n} c(h_i, e) \, In(h_i/e)

where c(h_i, e) denotes the degree of confirmation of h_i on the evidence e.
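The content and information measures can be made concrete with a toy model (a sketch under the assumption of two independent basic statements, so that there are four equiprobable state descriptions); the printed values reproduce properties 2-4 above.

import math

# Toy universe: two basic statements A and B, hence 4 equiprobable state descriptions.
states = [(a, b) for a in (True, False) for b in (True, False)]

def m(statement):
    # logical probability: fraction of state descriptions in which the statement holds
    return sum(1 for s in states if statement(*s)) / len(states)

def cont(statement):
    return 1 - m(statement)

def inf(statement):
    p = m(statement)
    return math.inf if p == 0 else -math.log2(p)   # In(A) = -log m(A)

A = lambda a, b: a
A_and_B = lambda a, b: a and b
A_or_B = lambda a, b: a or b

print(cont(A), inf(A))               # 0.5, 1.0 for a basic statement
print(cont(A_and_B), inf(A_and_B))   # 0.75, 2.0 for a conjunction of 2 basic statements
print(cont(A_or_B), inf(A_or_B))     # 0.25, ~0.415 for a disjunction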
4. Generally CST applies only to a very restricted domain of formal statements; it cannot be applied to real-world languages.
5. CST assumes an ideal receiver, which can understand all the logic and all the consequences of an incoming message. Moreover, the theory has no scope for misinformation or wrong information.
6. CST assigns infinite information content to a contradiction; however, real-life situations do not confirm this.
Some improvements on CST were made by J. Hintikka, in terms of the inclusion of polyadic first-order logic and the definition of a new, non-zero measure of the information content of tautologies. Hintikka also addressed the problem of the maximum information content assigned to contradictions.
3.2.2
of the Y axis; zero then denotes complete conformity, and the left and right sides denote negative and positive discrepancies respectively. The value of θ denotes the distance of an information instance σ from a situation w (it can be read as the degree of support of w by σ). The estimation of θ is done through a metric function f which should satisfy the following properties:
1. If σ is true and confirms w completely and precisely then θ = f(σ) = 0.
2. If σ is true and confirms the complete set W then θ = f(σ) = 1.
3. If σ is false and confirms no situation at all then θ = f(σ) = -1.
4. If σ is false in some cases then 0 > θ > -1.
5. If σ is true in some cases and does not confirm any w completely then 0 < θ < 1.
According to Floridi the degree of informativeness can be estimated by a parabolic function defined by ι(σ) = 1 - θ(σ)^2 (see Figure 3.2).
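The behaviour of this parabolic function is easy to tabulate (a minimal sketch using the reconstruction ι(σ) = 1 - θ², with the symbol names as read from Floridi's account):

def informativeness(theta):
    # Floridi's degree of informativeness 1 - theta^2, with theta in [-1, 1]
    return 1 - theta**2

for theta in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(theta, informativeness(theta))
# theta = 0 (full conformity) gives maximal informativeness 1;
# theta = -1 (contradiction) and theta = +1 (tautology) give 0.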
3.2.3
is maximum.
Another way of reducing noise is to introduce semantic redundancy into the message, or to reformulate it in an equivalent way, i.e. replacing longer messages with shorter ones or adding pictures or graphics to it.
3.2.4
of the messaging. The most important work here is due to L. A. Zadeh, the founding father of Fuzzy Theory. Day-to-day communication is mostly fuzzy in the sense of meaning. Thus, to capture the fuzziness, we can use possibilistic logic instead of probabilistic logic. This is formalized as quantitative fuzzy semantics, or PRUF (Possibilistic Relational Universal Fuzzy).
Possibilistic logic applies to propositions of the type "x is F", where x is an object and F is a fuzzy subset of the universe of discourse U. PRUF uses possibilistic functions to represent such statements, which provide meaning in terms of the fuzzy set F of objects x. The benefit of possibilistic logic is that it removes the need for truth values; by using fuzzy logic we can instead use linguistic constructs for relations and logic. These attempts are dealt with at length in (Ref.).
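A minimal sketch of the idea (the membership function below for the fuzzy set "tall" is entirely made up for illustration): the proposition "x is F" induces a possibility distribution over U equal to the membership function of F, so no bivalent truth value is required.

def mu_tall(height_cm):
    # assumed membership function of the fuzzy set "tall" over U = heights in cm
    if height_cm <= 160:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 160) / 30.0

# "John is tall": the possibility that John's height equals u is mu_tall(u).
for u in (155, 170, 180, 195):
    print(u, mu_tall(u))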
3.3
The problem of semantic information is an interesting and inspiring challenge in various fields of study. Semantic theory applies to our daily life, in which we communicate meaning to similar entities. Diverse problems in Information Theory, Economics, Psychology, Mathematics and Computer Science can be dealt with using semantic information theory.
***********
Chapter 4
The Future of Information Theory
The discussion presented in the previous chapters shows the importance and applicability of Information Theory in Engineering and Technology. Problems in diverse fields can be converted into, and solved by, the methods of information theory. The subject has given birth to a new discipline of study termed Information Science, which borrows techniques from Mathematics, Physics, Computer Science and the engineering disciplines, and studies all the problems related to information transmission, collection, analysis and storage.
In this chapter we shall look at the control problem and its instances in other subjects. We shall also discuss the endeavours to define information in general, and the role of information theory in dealing with the complexity of systems and with uncertainty. The chapter closes by listing some open problems in the field.
4.1
The prime objective of communication is to influence the behaviour of the receiver. As a logical extension of the semantic problem, the control problem concerns the control of behaviour (or the influencing of it) by information. The success of an information exchange is measured by whether it generates the desired behaviour on the part of the receiver. This has applications in AI (Artificial Intelligence), Electronics and the humanities (such as Social Science and Economics).
The control problem is hard to define, because it intersects the technical problem and the semantic problem in a vague way. To generate the behaviour from the receiver, the information must first reach it (the technical problem), and the receiver must then understand it (the semantic problem).
4.1.1
In Ecology
In 1955 R. MacArthur proposed a measure of the stability of a diverse biome on the basis of ecological processes or flows. He cites the view of Eugene Odum: The amount of choice which the energy has in following the paths up through the food web is a measure of the stability of the community. If a species is highly abundant in the system but its energy is distributed in the food web among different predators, it will not affect the whole ecology of the system.
The measure of stability is calculated by the Shannon-Weaver index, given by

S = -\sum_{i} p_i \log(p_i)

where p_i = f_i/F is the contribution (fraction) of the energy flow f_i of species i to the sum F of all flows. The measure is very similar to the entropy of information theory. The theory states that a higher diversity index indicates a more stable ecological system.
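A short sketch of the computation (the flow values and species names below are made up for illustration):

import math

flows = {"grass": 120.0, "insects": 45.0, "birds": 25.0, "foxes": 10.0}   # hypothetical energy flows f_i
F = sum(flows.values())

# Shannon-Weaver diversity index S = -sum p_i log(p_i), with p_i = f_i / F
S = -sum((f / F) * math.log(f / F) for f in flows.values())
print(S)   # higher values indicate a more evenly spread (more stable) flow structure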
Operations Management
Operations management is concerned with the smooth running of the operations of any system which takes inputs, processes them and produces outputs. Information flow in this type of system is most important because it facilitates feedback and correction. The automation of production processes and the use of PLC (Programmable Logic Controller) machines to control these processes are examples of information exchange. PLC machines gather information about the production line and evaluate the possibility of a bottleneck (a stoppage of production); this makes the production line run smoothly and with fewer flaws.
In Artificial Intelligence
Artificial Intelligence deals with intelligence in machines. Intelligence can be defined as the assimilation of information and decision making within a particular context. Officially AI was born in 1956 and it has since been applied to many areas. The recent literature on AI shows an increasing trend towards producing human-level intelligence in AI systems. According to Zadeh, the capability of reasoning and making decisions on the basis of possibility and partial information is the most remarkable ability of humans. Achieving this quality in a machine is still an unachieved goal of AI.
Zadeh has also proposed a solution to this problem in terms of Computing with Words. Computing with Words (CWW) is a method of using words as computing tools (as humans do), as opposed to the bivalent logic which machines use. The fundamental idea of CWW is that words are converted into mathematical representations using fuzzy sets (Type-1 or Type-2), a CWW engine then solves the problem, and the results are converted back into words. The entire CWW approach depends on the logic of information theory, and more specifically on the semantic theory of information, because the CWW engine must understand each word and its meaning and relation to other words in order to calculate the solution.
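A highly simplified sketch of the CWW pipeline follows (the vocabulary, the triangular membership functions and the aggregation rule are all assumptions made for illustration, not Zadeh's own formulation): words are mapped to fuzzy sets, the computation is carried out on the fuzzy representations, and the result is mapped back to the closest word.

def tri(centre, width):
    # triangular Type-1 membership function over a temperature axis (deg C)
    return lambda t: max(0.0, 1.0 - abs(t - centre) / width)

vocab = {"cold": tri(5, 12), "warm": tri(20, 12), "hot": tri(35, 12)}
grid = range(-5, 51)

def encode(word):
    # word -> fuzzy set, sampled on the grid
    return [vocab[word](t) for t in grid]

def decode(fuzzy):
    # fuzzy set -> vocabulary word whose membership function is closest (L1 distance)
    return min(vocab, key=lambda w: sum(abs(a - b) for a, b in zip(fuzzy, encode(w))))

# "CWW engine": a weighted blend of the meanings of "warm" (0.25) and "hot" (0.75)
a, b = encode("warm"), encode("hot")
aggregate = [(x + 3 * y) / 4 for x, y in zip(a, b)]
print(decode(aggregate))   # -> "hot"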
In Psychology
Psychology is a complex science with no single agreed definition; it concerns the mind, mental processes and the functions of the brain. Information theory can be applied to the cognitive processes of the mind, to methods of learning and to the training process. The learning process can be defined as the gathering of information and the making of patterns of the world, known as perception and intelligence. Information theory can help in improving learning and in devising training programmes which help individuals to learn and understand better.
4.2
The field of Information Theory has progressed by leaps and bounds from its conception in the 1940s. Not only has the communication part developed rapidly, researchers are also finding answers to other problems within information theory. Following are some examples.
4.2.1
Algorithmic Information Theory (AIT) stems from the works of Gödel and Turing. The Incompleteness Theorem of Gödel states that infinitely many statements in Mathematics are true but cannot be proved. Turing, on the other hand, showed that whether a computing machine halts on a given input or continues in an infinite loop cannot, in general, be decided.
The credit for formalizing AIT is given to three mathematicians: Kolmogorov, G. Chaitin and Solomonoff. Solomonoff proposed the definitions which are used in AIT, G. Chaitin composed the framework of AIT, and Kolmogorov proved concepts related to algorithmic complexity and its measures.
4.2.2
Representational information is the information carried by a finite non-empty set of objects R about its origin, i.e. a superset S. Representational Information Theory (RIT) is thus a way of defining information in terms of subsets of a larger set. It uses elements of Category Theory and Measure Theory to define the information represented by subsets of the parent universal set.
A category is a set of objects which are related in some well-defined way (more specifically, by a boolean algebraic rule). A categorical structure is a category together with a concept function defined on it. Concept functions are useful for defining attributes of the elements in the set and for defining the logical structure of the set adequately. For example, if x = Blue, x' = Red, y = Circle and y' = Square, then the concept function given by the boolean expression xy + x'y denotes a category consisting of {blue circle, red circle}.
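The toy example above can be written out directly (a sketch; the attribute names simply follow the example in the text):

from itertools import product

objects = list(product(["Blue", "Red"], ["Circle", "Square"]))   # the parent set S

def concept(colour, shape):
    # boolean concept function xy + x'y: (Blue AND Circle) OR (Red AND Circle)
    x, y = colour == "Blue", shape == "Circle"
    return (x and y) or ((not x) and y)

category = [o for o in objects if concept(*o)]
print(category)   # [('Blue', 'Circle'), ('Red', 'Circle')]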
4.2.3
4.3.1