Chapter 1
Introduction
1.1
Introduction
1.2
Information Theory
1.3
Problem Class A The Technical Problem - Problems related to the accuracy of transmission of the symbols of communication from sender to receiver. (Shannon)
Problem Class B The Semantic Problem - Problems related to the semantic aspects of a message or of information, i.e. the meaning of the message received. (Floridi & Zadeh - Fuzzy Information theory)
Problem Class C The Control Problem - Problems concerning the study of the effect of information communication on the behaviour of the receiver, i.e. does the communicated message induce the desired behaviour in the receiver or not? (Wiener, Kolmogorov, Chaitin and Vigo)
In the above classification, technical problems concern the accuracy of the transference of sets of symbols from sender to receiver. The set can be a discrete set (like written speech), a continuously varying signal (like voice or music) or a continuously varying two-dimensional pattern (like television).
The second class of problems, i.e. the semantic problem, has deep significance in daily life. Almost all human communication is based on semantic information, like language (a set of symbols which produce different meanings in different combinations) or body gestures. The semantic problem is also difficult to handle only on the basis of the technical aspects of information theory.
Class C problems are a logical extension of the problems of Class B in the sense that, after understanding the content of the message, does the receiver behave in the desired way? The question of control through information is a topic of intense research in the humanities and cognitive sciences.
Weaver (page 5) also argues that all three levels of problems are interrelated and that the technical side is at the heart of any solution we propose for the communication problem. Hence sending the message correctly and completely, without errors, is the first important step.
1.4
Information theory, from its conception, has become a potent subject of study and research. It now intersects with other areas of science, like Computer Science, Fuzzy theory, Electronics Engineering and Systems theory, to address some very old problems of various applied subjects, and in a similar manner it is merging with Cognitive Science, Psychology, Communication, Management, Information Systems and Economics to develop theories of human interaction and nature. The present review of Information Theory is an attempt to trace its history and its present status, and to see its future potential both in the Applied Sciences and in the Humanities. The structure of this review is based on the classification of Weaver given above, and the particulars follow in the subsequent chapters.
Chapter 2
The Technical Problem
In 1948 C. E. Shannon published A Mathematical Theory of Communication in the Bell System Technical Journal. It was not the first attempt at proposing a mathematical framework for communication, but it was by far the most complete and comprehensive theory of communication. According to Verdu, Shannon's discovery of the fundamental laws of data compression and transmission marks the birth of Information theory. Shannon's theory was an attempt to unify various fields like Probability, Statistics and Computer Science. The proposed theory still continues to be an inspiration for new findings in the field of information communication. In this chapter we will see the main achievements of Shannon and the applications of Shannon's theory in other fields.
2.1
Nyquist also discussed the possibility of a transmission gain obtained by replacing Morse code with an optimum code. Similar studies were conducted by Küpfmüller, Kotelnikov, Whittaker and Gabor, targeted towards signalling speed or bandwidth manipulation. R. Hartley's paper of 1928 introduced various terms like the capacity to communicate information, the rate of communication and, most importantly, a quantitative measure of information.
He defines the amount of information on the basis of the choices of messages available to the sender. If there are n selection states and s symbols available in each selection, then the amount of information H is defined by

H = n log s
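As a small illustration (the numbers below are hypothetical and not taken from Hartley's paper), the measure can be evaluated directly; the base of the logarithm merely fixes the unit of information.

import math

def hartley_information(n_selections, n_symbols, base=2):
    # Hartley's measure H = n log s, in units determined by the log base
    return n_selections * math.log(n_symbols, base)

# Example: 10 successive selections from an alphabet of 26 symbols
print(hartley_information(10, 26))   # roughly 47 bits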
In the above-mentioned attempts two features are prominent: first, information is defined statistically, on the basis of the frequencies of letters and words (which was useful for cryptography), and second, there was no study of the effect of noise (i.e. unwanted information or disturbance caused by the medium of communication).
During WWII the need for conveying information at increased rates and with better security measures was immense. There were several attempts to build a theory of communication incorporating the trade-off between transmission rate, reliability, bandwidth and signal-to-noise ratio. Prominent names were Norbert Wiener, A. N. Kolmogorov, R. A. Fisher and C. E. Shannon.
In his ground-breaking book Cybernetics, Wiener classifies the interests of his peers. He writes:
The idea of [probabilistic] information occurred at about the same time to several writers, among them the statistician R. A. Fisher, Dr. Shannon of Bell Labs and the author. Fisher's motive in studying the subject is to be found in classical statistical theory, that of Shannon in the problem of coding information and that of the author in the problem of noise and message in electrical filters. Let it be remarked parenthetically that some of my speculations in this direction attach themselves to the earlier work of Kolmogorov...
Shannon's and Wiener's works appeared almost simultaneously, delayed by the war but accepted and popularized very quickly in both the theoretical and the applied realms. Today information theory is synonymous with Shannon's communication theory; in the early years, however, it was also known as the Wiener-Shannon theory of communication.
2.2
The quantity H, defined for a source with symbol probabilities p_i by

H = -\sum_{i} p_i \log p_i

is known as Entropy. Entropy measures the average uncertainty in the message. The unit of measure of information is known as the bit. The important properties of this measure H are as follows:
1. H is continuous in the p_i.
2. H is positive, since p_i ≤ 1 implies log p_i ≤ 0.
3. H = 0 if and only if one and only one of the p_i equals one and all the remaining ones are 0, i.e. if we are certain about the outcome then the entropy is zero.
4. If all p_i are equal to 1/n then H is maximal and equal to log n, i.e. when every outcome is equally likely the entropy is maximum.
5. If x and y are two events with m and n possibilities respectively, and p(i, j) is the probability of the joint occurrence of x_i and y_j, then the entropy of the joint event is given by

H(x, y) = -\sum_{i,j} p(i, j) \log p(i, j)

while the individual entropies are

H(x) = -\sum_{i,j} p(i, j) \log \sum_{j} p(i, j)

H(y) = -\sum_{i,j} p(i, j) \log \sum_{i} p(i, j)

6. Any operation which equalizes the probabilities will increase H.
7. The conditional entropy of y is defined by

H_x(y) = -\sum_{i,j} p(i, j) \log p_i(j)

Also we can write H(x, y) = H(x) + H_x(y), where x and y are not necessarily independent.
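These properties can be checked numerically. The following Python sketch (a minimal illustration with an arbitrarily chosen 2 x 2 joint distribution) computes H(x), H(x, y) and the conditional entropy H_x(y), and verifies property 7.

import math

def entropy(probs):
    # Shannon entropy -sum p log2 p, ignoring zero-probability terms
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A hypothetical joint distribution p(i, j) over 2 x 2 outcomes
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

px = [sum(v for (i, j), v in p.items() if i == a) for a in (0, 1)]

H_xy = entropy(p.values())
H_x = entropy(px)
# Conditional entropy H_x(y) = -sum p(i, j) log2 p_i(j), with p_i(j) = p(i, j)/p(i)
H_x_y = -sum(v * math.log2(v / px[i]) for (i, j), v in p.items())

print(H_xy, H_x + H_x_y)   # the two values coincide (property 7)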
2.2.1
2.2.2
2.2.3
Shannon not only classified the sources of messages but also provided efficient ways to encode communications and to compress data on the basis of entropy. According to Shannon, H is approximately the logarithm (base 2) of the reciprocal of the probability of a typical long sequence, divided by the number of symbols N in the sequence,

H \approx \frac{\log_2 (1/p)}{N}

where p denotes the probability of such a sequence.
Coding and Data Compression
In the setting of the above result, the second, high-probability class of sequences is known as the class of typical sequences; the probability of any individual typical sequence decreases exponentially with increasing block length. This fact is useful in data compression and coding: we can neglect the atypical sequences and code the typical sequences assuming them to be equiprobable. The resulting encoding of a string of N symbols will be a string of roughly HN bits, and increasing the length makes the probability of failing to recover the signal as small as desired. The above argument provides a suboptimal coding method, so in the next theorem Shannon provides a criterion for optimal coding.
Theorem 2.2.2 (Strong Converse Source Coding Theorem) Define n(q) to be the number of most probable sequences of length N which together accumulate a total probability q; then

\lim_{N \to \infty} \frac{\log n(q)}{N} = H

where q does not equal 0 or 1.

Shannon also defines the capacity C of a discrete noiseless channel as

C = \lim_{T \to \infty} \frac{\log N(T)}{T}

where N(T) is the number of allowed signals of duration T.
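As a numerical illustration of the theorem (a sketch, not part of the original text, assuming a memoryless binary source with P(1) = 0.1): order the sequences of length N by probability, accumulate a fraction q of the total probability, and compare (log2 n(q))/N with H.

import math
from math import comb

p, N, q = 0.1, 1000, 0.5
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))   # entropy per symbol

# Order sequences by probability (for p < 0.5, fewer ones = more probable),
# accumulate probability q, and count how many sequences n(q) were needed.
mass, n_q, k = 0.0, 0, 0
while mass < q:
    mass += comb(N, k) * p**k * (1 - p)**(N - k)
    n_q += comb(N, k)
    k += 1

print(f"H = {H:.3f} bits/symbol")
print(f"log2 n(q) / N = {math.log2(n_q) / N:.3f}")   # approaches H as N grows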
2.3
In this section we shall discuss the applications of Shannon's theory and the subsequent improvements by other significant contributors. In 1949 Shannon's other paper, Communication Theory of Secrecy Systems, was made public. With these two papers together, Shannon had established a deep theoretical foundation for various branches of digital communication: data compression, data encryption and data correction (Gappmair).
Coding Theory
The fundamental problem in communication engineering is ideal encoding, i.e. how to transmit as many bits of information as possible while maintaining high reliability (Jan Kahre).
The mutual information between x and y is defined by

I_y(x) = \sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(x)p(y)}

The following relation holds for mutual information: I_y(x) = H(x) - H_y(x), where H_y(x) is the equivocation. Thus mutual information is the gain in the information coding of x when we know y, compared with when we do not know y.
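As a quick numerical check (an illustrative sketch with an arbitrary joint distribution), the definition and the relation I_y(x) = H(x) - H_y(x) can be evaluated side by side.

import math

p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}   # hypothetical p(x, y)
px = {a: sum(v for (i, j), v in p.items() if i == a) for a in (0, 1)}
py = {b: sum(v for (i, j), v in p.items() if j == b) for b in (0, 1)}

# I_y(x) = sum p(x, y) log2 [ p(x, y) / (p(x) p(y)) ]
I = sum(v * math.log2(v / (px[i] * py[j])) for (i, j), v in p.items())

H_x = -sum(v * math.log2(v) for v in px.values())
# Equivocation H_y(x) = -sum p(x, y) log2 p(x | y)
H_y_x = -sum(v * math.log2(v / py[j]) for (i, j), v in p.items())

print(I, H_x - H_y_x)   # both expressions give the same value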
Shannon completed the theory of coding with the following two theorems.

Theorem 2.3.1 (Source Coding Theorem) The number of bits necessary to uniquely describe any data source can approach the corresponding information content as closely as desired. If there are N independently and identically distributed random variables, each with entropy H, then these data can be compressed into NH bits with negligible risk of information loss (as N → ∞); conversely, if the data are compressed into fewer than NH bits, some information will be lost.
Theorem 2.3.2 (Channel Coding Theorem) The error rate of data transmitted over a band-limited noisy channel can be reduced to an arbitrarily small amount if the information rate is lower than the channel capacity.
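For instance (a standard textbook illustration, not drawn from the text under review), the capacity of a binary symmetric channel that flips each transmitted bit with probability p is C = 1 - H(p); the channel coding theorem then guarantees reliable communication at any rate below this value.

import math

def h2(p):
    # binary entropy function in bits
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    # capacity of a binary symmetric channel with crossover probability p
    return 1.0 - h2(p)

print(bsc_capacity(0.11))   # about 0.5 bit per channel use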
In this work Shannon did not propose any algorithm for encoding information, but in subsequent years the growth of digital communication saw the rise of coding theory as a separate branch of investigation in engineering and science. Shannon's channel coding theorem anticipated forward error correction schemes, such as the one invented in 1967 by A. J. Viterbi. Similarly, further data compression methods (both lossless and lossy) have been found since the emergence of mobile communication. The following figure (adapted from Gappmair) presents an overview of various coding schemes.
2.3.1
The relation between physical entropy and informational entropy goes far back in history. In 1860 Maxwell proposed a thought experiment known as Maxwell's Demon. In this experiment Maxwell imagined a bifurcated chamber with a small door operated by a "demon" which sorts fast and slow gas molecules into the two halves, apparently decreasing the thermodynamic entropy without expending work.
Rényi Entropy - The Rényi entropy is a generalization of Shannon's entropy, used for measuring information from weak signals (of lower probabilities) which are overlapped by stronger signals (of higher probabilities). We define the Rényi entropy of order α by

H_\alpha = \frac{1}{1-\alpha} \log \left( \sum_{i=1}^{n} p_i^{\alpha} \right)

Leaving aside the inclusion of α, the Rényi entropy behaves just as Shannon's entropy does (and reduces to Shannon's entropy in the limit α → 1). It attains its maximum H_α(max) = log n at p_i = 1/n, and satisfies the additivity property. The inclusion of the parameter α makes it more sensitive to selected frequencies.
Tsallis Entropy - The Tsallis entropy is defined by

T_q = \frac{1}{q-1} \left( 1 - \sum_{i=1}^{n} p_i^{q} \right)
Min-Entropy - The min-entropy is defined as the limiting value of the Rényi entropy as α → ∞. It is given by

H_\infty = -\log \max_{x \in X} p(x)
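The following sketch (illustrative, with an arbitrarily chosen distribution) computes the generalized entropies above and shows numerically that the Rényi entropy approaches the Shannon entropy as α → 1 and the min-entropy as α → ∞.

import math

def shannon(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def renyi(p, alpha):
    return math.log2(sum(x**alpha for x in p)) / (1 - alpha)

def tsallis(p, q):
    return (1 - sum(x**q for x in p)) / (q - 1)

def min_entropy(p):
    return -math.log2(max(p))

p = [0.5, 0.25, 0.125, 0.125]           # hypothetical distribution
print(shannon(p), renyi(p, 1.0001))     # Renyi -> Shannon as alpha -> 1
print(min_entropy(p), renyi(p, 200))    # Renyi -> min-entropy as alpha -> infinity
print(tsallis(p, 2))                    # Tsallis entropy of order 2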
2.3.2
The theory itself, however, left a few unanswered questions about the meaning and control aspects of communication.
Shannon declared that the mathematical/technical aspect is independent of the semantic aspect of information theory; however, to apply Shannon's theory in fields like Economics, Psychology or Biology, one cannot overlook these aspects. For Shannon the semantic aspects were set aside owing to his belief that they play no part in the technical side of the theory. Weaver, in his paper, clarified the interconnection between the technical, semantic and control sides of information theory and how they are related to each other, with the technical or mathematical aspect of communication at the heart of it all.
*************
Chapter 3
The Semantic Problem
The credit for establishing Information Theory as a discipline of study goes to C. E. Shannon and Norbert Wiener. However, making the theory famous and accessible to the general scientific community was the work of Warren Weaver, who published The Mathematics of Communication in Scientific American in 1949. Weaver defines communication in a broader sense, as a process by which one entity may affect another entity. This process may be of three types: Technical - sending and receiving messages; Semantic - the meaning of the message; and Influential - controlling the behaviour of the receiver of the message through communication.
Shannon's theory is an attempt to establish rules for the syntactic part, but it neglects the semantic and control aspects. Semantics is an important aspect of communication, especially human communication, where we have to convey our thoughts to similar entities, or have to understand the meaning of what others are communicating.
In 1938 C. W. Morris published a book on the theory of signs titled Foundations of the Theory of Signs. In it he describes the communication process through signs and symbols not only of humans but of all organisms capable of communication. Human communication depends partially on language. Language is a collection of symbols, groups of symbols and the rules for grouping the symbols. Language can be understood as a coding method which one human uses to encode his or her thoughts and communicate them. Other methods of communication are the performing arts (music, dance, acting, painting etc.) and body language.
Morris defines the process of communication through signs, semiotics, as a universal process and an interdisciplinary undertaking. Morris's foundation divides semiotics into three interrelated disciplines: syntactics, semantics and pragmatics.
3.1
Weaver writes in his paper:

The semantic problems are concerned with the identity, or satisfactorily close approximation, in the interpretation of meaning by the receiver, as compared with the intended meaning of the sender. This is a very deep and involved situation, even when one deals only with the relatively simpler problems of communicating through speech.
The above quote sums up the semantic problem adequately: it concerns understanding, on the side of the receiver, of the meaning intended by the sender. The complexity of this problem can be illustrated by a simple example. Suppose A asks B "Do you understand me?", and B replies "No"; then we cannot be certain whether B has really failed to understand the question or the meaning of the question.
Thus the semantic problem of communication can be interpreted as a problem of approximation (understanding as close to the intended meaning as possible) and of coding (encoding the understanding/meaning of one mode of communication into another mode). Weaver also introduced the elements of meaning and effectiveness into the schematic diagram proposed by Shannon (see Figure 3.1).
The schematic diagram now contains the following new blocks:
Semantic Coding - Semantic coding becomes the first coding of information into a message; it is the mode of expression of the intended meaning by the sender, generally language (but it can be any means of expression, like music, art, painting, writing etc.).
Semantic Noise - The unwanted (or essential) additions to the information content of the message that are needed to make it receivable or understandable.
3.2
3.2.1
In 1952 Yehoshua Bar-Hillel and Rudolf Carnap presented their Theory of Semantic Information. It was the first attempt to define the semantic content of a message. The highlight of this (Classical) Semantic Theory (CST) is that it uses ideas similar to Shannon's and defines measures of semantic content useful for application to technical as well as non-technical problems.
CST applies to a formal system of language, denoted by Ln. It describes a universe which includes entities, formal statements (predicates), logical connectives and rules to form compound formal statements. Statements can be basic or ordinary. A basic statement is a predicate applied to an entity (Ram is tall) and an ordinary statement is a combination of basic statements (Ram is tall and Laxman is handsome). Furthermore, we have state descriptions, which are statements about the universe in terms of a predicate applied to every entity of the universe. There are infinitely many possible state descriptions. If we take any ordinary statement and count the number of state descriptions which are made false by it, then this number gives the information content of the statement.
The more state descriptions a statement makes false, the higher its information content. A tautology (true by definition) rules out nothing, and thus contains zero information. A self-contradictory statement rules out everything, and hence has maximum information content. Statements which are logically indeterminate rule out some possibilities, and thus have some intermediate level of content. The more particular a statement is, the more possible states it rules out and the more information content it has. (The inverse notion is the Range of a statement: the number of states which are confirmed, i.e. implied, by it.)
To measure the information content, Bar-Hillel and Carnap used logical probability instead of the classical probability used by Shannon. If E is the empirical evidence for a hypothesis H_i being probably true, then the degree of support of H_i by E is known as the logical probability of H_i given E. Logical probabilities are based on logical relations between propositions. The logical probability of A is denoted by

m(A) = \frac{T_A}{T_S}

where T_A is the number of logical states in which A is true and T_S is the total number of logical states.
Using the logical probability of state descriptions, CST proposes two information measures, based on the Inverse Relationship property. These measures are defined as follows:
1. Content - Denoted by Cont(A), the content measure measures the amount of content in the logical statement A, and it is defined by Cont(A) = 1 - m(A), m(A) being the logical probability of A.
2. Information - Denoted by In(A), the information measure measures the semantic information of the logical statement, and it is defined by

In(A) = \log \frac{1}{1 - Cont(A)} = \log \frac{1}{m(A)} = -\log m(A)
Following are some properties of the above two functions:
1. log is the logarithm to base 2.
2. For a basic statement B, Cont(B) = 1/2 and In(B) = 1.
3. For a conjunction C_n of n basic statements, Cont(C_n) = 1 - (1/2)^n.
4. For a disjunction D_n of n basic statements, Cont(D_n) = (1/2)^n.
5. 0 ≤ In(A).
6. In(A) = 0 if and only if A is logically true.
7. In(A) = ∞ if and only if A is logically false.
8. In(A) is positive and finite iff A is factual.
9. If A logically implies B then In(A) ≥ In(B).
10. In(A) = In(B) iff A is logically equivalent to B.
11. In(A ∧ B) = -log Cont(¬A ∨ ¬B).
12. In(A ∨ B) = -log Cont(¬A ∧ ¬B).
13. If A and B are inductively independent then In(A ∧ B) = In(A) + In(B).
14. The relative content of B with respect to A is defined by Cont(B/A) = Cont(A ∧ B) - Cont(A).
15. If A and B are inductively independent then In(B/A) = In(B).
Bar-Hillel and Carnap also define an estimate of the amount of information. If H = {h_1, h_2, ..., h_n} is an exhaustive system of hypotheses on a given evidence e, then the estimate of the information carried by H with respect to e is given by the formula

Est(In, H, e) = \sum_{i=1}^{n} c(h_i, e) \, In(h_i/e)

where c(h_i, e) denotes the degree of confirmation of h_i on the evidence e.
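The content and information measures can be made concrete with a toy model (a sketch under the assumption of two independent basic statements, so that there are four equiprobable state descriptions); the printed values reproduce properties 2-4 above.

import math

# Toy universe: two basic statements A and B, hence 4 equiprobable state descriptions.
states = [(a, b) for a in (True, False) for b in (True, False)]

def m(statement):
    # logical probability: fraction of state descriptions in which the statement holds
    return sum(1 for s in states if statement(*s)) / len(states)

def cont(statement):
    return 1 - m(statement)

def inf(statement):
    p = m(statement)
    return math.inf if p == 0 else -math.log2(p)   # In(A) = -log m(A)

A = lambda a, b: a
A_and_B = lambda a, b: a and b
A_or_B = lambda a, b: a or b

print(cont(A), inf(A))               # 0.5, 1.0 for a basic statement
print(cont(A_and_B), inf(A_and_B))   # 0.75, 2.0 for a conjunction of 2 basic statements
print(cont(A_or_B), inf(A_or_B))     # 0.25, ~0.415 for a disjunction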
4. Generally CST applies only to a very restricted domain of formal statements; it cannot be applied to real-world languages.
5. CST assumes an ideal receiver, which can understand all the logic and all the consequences of an incoming message. Moreover, the theory has no scope for misinformation or wrong information.
6. CST assigns infinite information content to a contradiction; however, real-life situations do not confirm this.
Some improvements on CST were made by J. Hintikka, in terms of the inclusion of polyadic first-order logic and the definition of a new, non-zero measure of the information content of tautologies. Hintikka also addressed the problem of the maximum information content assigned to contradictions.
3.2.2
of the Y axis; zero then denotes complete conformity, and the left and right sides denote negative and positive discrepancies respectively. The value of θ denotes the distance of an information instance σ from a situation w (it can be read as the degree of support of w by σ). The estimation of θ is done through a metric function f which should satisfy the following properties:
1. If σ is true and confirms w completely and precisely then θ = f(σ) = 0.
2. If σ is true and confirms the complete set W then θ = f(σ) = 1.
3. If σ is false and confirms no situation at all then θ = f(σ) = -1.
4. If σ is false in some cases then 0 > θ > -1.
5. If σ is true in some cases and does not confirm any w completely then 0 < θ < 1.
According to Floridi the degree of informativeness can be estimated by a parabolic function defined by ι(σ) = 1 - θ(σ)^2 (see Figure 3.2).
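The behaviour of this parabolic function is easy to tabulate (a minimal sketch using the reconstruction ι(σ) = 1 - θ², with the symbol names as read from Floridi's account):

def informativeness(theta):
    # Floridi's degree of informativeness 1 - theta^2, with theta in [-1, 1]
    return 1 - theta**2

for theta in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(theta, informativeness(theta))
# theta = 0 (full conformity) gives maximal informativeness 1;
# theta = -1 (contradiction) and theta = +1 (tautology) give 0.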
3.2.3
is maximum.
Another way of reducing noise is to introduce semantic redundancy into the message, or to reformulate it in an equivalent way, i.e. replacing longer messages with shorter ones or adding pictures or graphics to it.
3.2.4
of the messaging. The most important work here is due to L. A. Zadeh, the founding father of Fuzzy Theory. Day-to-day communication is mostly fuzzy in the sense of meaning. Thus, to capture the fuzziness, we can use possibilistic logic instead of probabilistic logic. This is formalized as quantitative fuzzy semantics, or PRUF (Possibilistic Relational Universal Fuzzy).
Possibilistic logic applies to propositions of the type "x is F", where x is an object and F is a fuzzy subset of the universe of discourse U. PRUF uses possibilistic functions to represent such statements, which provide meaning in terms of the fuzzy set F of objects x. The benefit of possibilistic logic is that it removes the need for truth values; by using fuzzy logic we can instead use linguistic constructs for relations and logic. These attempts are dealt with at length in (Ref.).
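A minimal sketch of the idea (the membership function below for the fuzzy set "tall" is entirely made up for illustration): the proposition "x is F" induces a possibility distribution over U equal to the membership function of F, so no bivalent truth value is required.

def mu_tall(height_cm):
    # assumed membership function of the fuzzy set "tall" over U = heights in cm
    if height_cm <= 160:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 160) / 30.0

# "John is tall": the possibility that John's height equals u is mu_tall(u).
for u in (155, 170, 180, 195):
    print(u, mu_tall(u))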
3.3
The problem of semantic information is an interesting and inspiring challenge in various fields of study. Semantic theory applies to our daily life, in which we communicate meaning to similar entities. Diverse problems in Information Theory, Economics, Psychology, Mathematics and Computer Science can be dealt with using semantic information theory.
***********
Chapter 4
The Future of Information Theory
The discussion presented in the previous chapters shows the importance and applicability of Information Theory in Engineering and Technology. Problems in diverse fields can be converted into, and solved by, the methods of information theory. The subject has given birth to a new discipline of study termed Information Science, which borrows techniques from Mathematics, Physics, Computer Science and the engineering disciplines, and studies all the problems related to information transmission, collection, analysis and storage.
In this chapter we shall look at the control problem and its instances in other subjects. We shall also discuss the endeavours to define information in general, and the role of information theory in dealing with the complexity of systems and with uncertainty. The chapter closes by listing some open problems in the field.
4.1
The prime objective of communication is to influence the behaviour of the receiver. As a logical extension of the semantic problem, the control problem concerns the control of behaviour (or the influencing of it) by information. The success of an information exchange is measured by whether it generates the desired behaviour on the part of the receiver. This has applications in AI (Artificial Intelligence), Electronics and the humanities (such as Social Science and Economics).
The control problem is hard to define, because it intersects the technical problem and the semantic problem in a vague way. To generate the behaviour from the receiver, the information must first reach it (the technical problem), and the receiver must then understand it (the semantic problem).
4.1.1
In Ecology
In 1955 R. MacArthur proposed a measure of the stability of a diverse biome on the basis of ecological processes or flows. He cites the view of Eugene Odum: The amount of choice which the energy has in following the paths up through the food web is a measure of the stability of the community. If a species is highly abundant in the system but its energy is distributed in the food web among different predators, it will not affect the whole ecology of the system.
The measure of stability is calculated by the Shannon-Weaver index, given by

S = -\sum_{i} p_i \log(p_i)

where p_i = f_i/F is the contribution (fraction) of the energy flow f_i of species i to the sum F of all flows. The measure is very similar to the entropy of information theory. The theory states that a higher diversity index indicates a more stable ecological system.
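A short sketch of the computation (the flow values and species names below are made up for illustration):

import math

flows = {"grass": 120.0, "insects": 45.0, "birds": 25.0, "foxes": 10.0}   # hypothetical energy flows f_i
F = sum(flows.values())

# Shannon-Weaver diversity index S = -sum p_i log(p_i), with p_i = f_i / F
S = -sum((f / F) * math.log(f / F) for f in flows.values())
print(S)   # higher values indicate a more evenly spread (more stable) flow structure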
Operations Management
Operations management is concerned with the smooth running of the operations of any system which takes inputs, processes them and produces outputs. Information flow in this type of system is most important because it facilitates feedback and correction. The automation of production processes and the use of PLC (Programmable Logic Controller) machines to control these processes are examples of information exchange. PLC machines gather information about the production line and evaluate the possibility of a bottleneck (a stoppage of production); this makes the production line run smoothly and with fewer flaws.
In Artificial Intelligence
Artificial Intelligence deals with intelligence in machines. Intelligence can be defined as the assimilation of information and decision making within a particular context. Officially AI was born in 1956 and it has since been applied to many areas. The recent literature on AI shows an increasing trend towards producing human-level intelligence in AI systems. According to Zadeh, the capability of reasoning and making decisions on the basis of possibility and partial information is the most remarkable ability of humans. Achieving this quality in a machine is still an unachieved goal of AI.
Zadeh has also proposed a solution to this problem in terms of Computing with Words. Computing with Words (CWW) is a method of using words as computing tools (as humans do), as opposed to the bivalent logic which machines use. The fundamental idea of CWW is that words are converted into mathematical representations using fuzzy sets (Type-1 or Type-2), a CWW engine then solves the problem, and the results are converted back into words. The entire CWW approach depends on the logic of information theory, and more specifically on the semantic theory of information, because the CWW engine must understand each word and its meaning and relation to other words in order to calculate the solution.
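A highly simplified sketch of the CWW pipeline follows (the vocabulary, the triangular membership functions and the aggregation rule are all assumptions made for illustration, not Zadeh's own formulation): words are mapped to fuzzy sets, the computation is carried out on the fuzzy representations, and the result is mapped back to the closest word.

def tri(centre, width):
    # triangular Type-1 membership function over a temperature axis (deg C)
    return lambda t: max(0.0, 1.0 - abs(t - centre) / width)

vocab = {"cold": tri(5, 12), "warm": tri(20, 12), "hot": tri(35, 12)}
grid = range(-5, 51)

def encode(word):
    # word -> fuzzy set, sampled on the grid
    return [vocab[word](t) for t in grid]

def decode(fuzzy):
    # fuzzy set -> vocabulary word whose membership function is closest (L1 distance)
    return min(vocab, key=lambda w: sum(abs(a - b) for a, b in zip(fuzzy, encode(w))))

# "CWW engine": a weighted blend of the meanings of "warm" (0.25) and "hot" (0.75)
a, b = encode("warm"), encode("hot")
aggregate = [(x + 3 * y) / 4 for x, y in zip(a, b)]
print(decode(aggregate))   # -> "hot"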
In Psychology
Psychology is a complex science with no single agreed definition; it concerns the mind, mental processes and the functions of the brain. Information theory can be applied to the cognitive processes of the mind, to methods of learning and to the training process. The learning process can be defined as the gathering of information and the making of patterns of the world, known as perception and intelligence. Information theory can help in improving learning and in devising training programmes which help individuals to learn and understand better.
4.2
The field of Information Theory has progressed by leaps and bounds from its conception in the 1940s. Not only has the communication part developed rapidly, researchers are also finding answers to other problems within information theory. Following are some examples.
4.2.1
Algorithmic Information Theory (AIT) stems from the works of Gödel and Turing. The Incompleteness Theorem of Gödel states that infinitely many statements in Mathematics are true but cannot be proved. Turing, on the other hand, showed that whether a computing machine halts on a given input or continues in an infinite loop cannot, in general, be decided.
The credit for formalizing AIT is given to three mathematicians: Kolmogorov, G. Chaitin and Solomonoff. Solomonoff proposed the definitions which are used in AIT, G. Chaitin composed the framework of AIT, and Kolmogorov proved concepts related to algorithmic complexity and its measures.
4.2.2
Representational information is the information carried by a finite non-empty set of objects R about its origin, i.e. a superset S. Representational Information Theory (RIT) is thus a way of defining information in terms of subsets of a larger set. It uses elements of Category Theory and Measure Theory to define the information represented by subsets of the parent universal set.
A category is a set of objects which are related in some well-defined way (more specifically, by a boolean algebraic rule). A categorical structure is a category together with a concept function defined on it. Concept functions are useful for defining attributes of the elements in the set and for defining the logical structure of the set adequately. For example, if x = Blue, x' = Red, y = Circle and y' = Square, then the concept function given by the boolean expression xy + x'y denotes a category consisting of {blue circle, red circle}.
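The toy example above can be written out directly (a sketch; the attribute names simply follow the example in the text):

from itertools import product

objects = list(product(["Blue", "Red"], ["Circle", "Square"]))   # the parent set S

def concept(colour, shape):
    # boolean concept function xy + x'y: (Blue AND Circle) OR (Red AND Circle)
    x, y = colour == "Blue", shape == "Circle"
    return (x and y) or ((not x) and y)

category = [o for o in objects if concept(*o)]
print(category)   # [('Blue', 'Circle'), ('Red', 'Circle')]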
4.2.3
4.3.1