Copyright © 2008, New Age International (P) Ltd., Publishers
Published by New Age International (P) Ltd., Publishers
This book deals with a novel paradigm of neural networks, called multidimensional neural
networks. It also provides a comprehensive description of a certain unified theory of control,
communication and computation. This book can serve as a textbook for an advanced course
on neural networks or computational intelligence/cybernetics. Both senior undergraduate
and graduate students can benefit from such a course. It can also serve as a reference
book for practicing engineers utilizing neural networks. Furthermore, the book can be
used as a research monograph by neural network researchers.
In the field of electrical engineering, researchers have developed sub-fields such as
control theory, communication theory and computation theory. Concepts such as logic
gates, error correcting codes and optimal control vectors arise in the computation,
communication and control theories respectively. In one dimensional systems, error
correcting codes and logic gates are known to be related to neural networks. The author,
in his research efforts, showed that the optimal control vectors (associated with a one
dimensional linear system) constitute the stable states of a neural network. Thus a unified
theory was discovered and formalized in one dimensional systems. Questioning the
possibility of logic gates operating on higher dimensional arrays resulted in the discovery
as well as formalization of the research area of multi/infinite dimensional logic theory.
The author has generalized the known relationship between one dimensional logic theory
and one dimensional neural networks to multiple dimensions. He has also generalized the
relationship between one dimensional neural networks and error correcting codes to
multiple dimensions (using a generator tensor).
On the way to unification in multidimensional systems, the author discovered and
formalized the concept of tensor state space representation of certain multidimensional
linear systems.
It is well accepted that complex valued neural networks constitute a very promising
research area. The author has proposed a novel activation function called the complex
signum function. This function has enabled the proposal of a complex valued neural
associative memory on the complex hypercube.
He has also proposed novel models of the neuron (such as a linear filter model of the synapse).
This book contains 10 chapters. The first chapter provides an introduction to the unified
theory of control, communication and computation. Chapter 2 introduces a mathematical
G. Rama Murthy
Contents
PREFACE (vii)
1. INTRODUCTION 1
LOGICAL BASIS FOR COMPUTATION 3
LOGICAL BASIS FOR CONTROL 3
LOGICAL BASIS OF COMMUNICATION 4
ADVANCED THEORY OF EVOLUTION 6
INDEX 141
CHAPTER 1
Introduction
Ever since the dawn of civilization, the homo-sapien, unlike other lower level animals,
has constantly created tools that enabled the community not only to take advantage of
the physical universe but also to develop a better understanding of physical reality through
the discovery of underlying physical laws. The homo-sapien, like other lower level animals,
had two primary necessities: metabolism and reproduction. But more important was the
obsession with other developed necessities such as art, painting, music and sculpture.
These necessities naturally led to the habit of concentration. This most important habit
enabled him, in the most advanced civilizations, to develop abstract tools utilized to study
nature. Thus the homo-sapien achieved the distinction of being a higher animal compared
to the other animals in nature.
In ancient Greece, the homo-sapien civilization was highly advanced in many matters
compared to all other civilizations. Such a lead was symbolized by the development of
the subject of mathematics in various important stages. The most significant indication of
such development is left to posterity in the form of 13 books called Euclid's Elements. These
books provide the first documented effort at the axiomatic development of a mathematical
structure, namely Euclidean geometry. The Greek and Babylonian civilizations also made
important strides in algebra: solving linear and quadratic equations and studying quadratic
homogeneous forms in two variables (for conic sections). Algebra was revived during the
Renaissance in Italy, where the solution of cubic and quartic equations was carried out by
the Italian algebraists. This constituted the intellectual and cultural heritage, along with
religious and social traditions.
To satisfy the curiosity of observing the heavens, various star constellations and
astronomical objects were classified. In navigating ships for battle as well as trade,
astronomical observations were made. These provided the first curious data related
to the natural world. In an effort to understand the non-living material universe, homo-
sapiens have devised various tools: measuring equipment, experimental equipment,
mathematical procedures, mathematical tools etc.
2 Multidimensional Neural Networks: Unified Theory
With the discovery by Copernicus that the Sun is the center of our relative motion system,
the Ptolemaic theory was permanently forsaken. It gave Galileo the motivation for
deriving empirical laws of far-reaching significance in natural philosophy/natural
science/physics. Kepler, after strenuous efforts, derived the laws of planetary motion,
leading to some of the laws of Newton. Isaac Newton formalized the laws of Galileo by
developing calculus. He also developed a theory of gravitation based on the empirical
laws of Kepler. Michael Faraday derived the empirical laws of electric and magnetic
phenomena. Though Newton's mechanical laws were successfully utilized to explain
heat phenomena and the kinetic theory of gases as being due to the mechanical motion
of molecules and atoms, they were inadequate for electrical phenomena. Maxwell
formalized Faraday's laws of electromagnetic induction, leading to his field equations.
Later, physics developed at a feverish pace.
These results in physics were paralleled by developments in other related areas such
as chemistry, biology etc. Thus, the early efforts of homo-sapiens matured into a clearer
view of the non-living world. The above description summarizes the pre-20th century
homo-sapien contributions to understanding the non-living material universe.
In making conclusive statements on the origin and evolution of physical reality,
the developments of the 20th century are more important. In that endeavor, Einstein's
general theory of relativity was one of the most important cornerstones of 20th century
physics. It enabled him to develop a general, more correct theory of gravitation,
superseding the Newtonian theory. It showed that gravitation is due to the curvature of
the space-time continuum. The general theory of relativity also showed that all natural
physical laws are invariant under non-linear transformations. This result was a significant
improvement over the special theory of relativity, where he showed that all natural physical
laws are invariant under linear Lorentz transformations. This result (in the special theory
of relativity) was achieved when Einstein realized that, due to the finiteness of the velocity
of light, one must discard the notions of absolute space and time. They must be replaced
by the notion of the space-time continuum, i.e. space and time are not independent of one
another, but are dependent. Thus, the special and general theories of relativity constrained
the form of natural physical law.
In the 20th century, along with the Theory of Relativity, Quantum Mechanics was
developed due to the efforts of M. Planck, E. Schrodinger and W. Heisenberg. This theory
showed that the electromagnetic field at the quantum level was quantized. This, along
with the wave-particle duality of light, was considered irreconcilable with the general theory
of relativity. To reconcile the general theory of relativity with various quantum theories,
Y. Nambu proposed a string model for fundamental particles and formalized the
dynamics of a light string. Utilizing the experimentally verified quantum theories of
chromodynamics, electrodynamics, and the supersymmetry of fundamental particles
(unifying Bosons and Fermions), it was possible to supersymmetrize the string model of
fundamental particles, resulting in the so-called superstring (supersymmetric string)
George Boole developed the algebra in which variables assume "true" or "false" values.
This algebra is called Boolean algebra. Certain elementary Boolean algebraic expressions
are realized in equipment called "logic gates". When logic gates are combined/co-
ordinated, arbitrary Boolean algebraic expressions can be computed. The combination of
Boolean logic gates (an assemblage with some minimum configuration of gates) and
memory elements forms an arithmetic unit. When such a unit is coupled with a control
unit, the Central Processing Unit (CPU) of a computer is realized. The CPU in association
with memory, input and output units forms a computational unit without intelligence.
This is just a machine which can be utilized to perform computational tasks quickly.
Various thought provoking modifications make it operate on data efficiently
and provide computational results related to various problems.
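The build-up from gates to an arithmetic unit can be illustrated with a minimal sketch: elementary Boolean gates, when combined, compute a composite expression such as a half-adder (the gate and function names below are illustrative, not from the book).

```python
# Minimal sketch: elementary Boolean gates and their combination into a
# half-adder (adds two one-bit values). Names are illustrative.
def AND(a, b):
    return a & b

def OR(a, b):
    return a | b

def XOR(a, b):
    return a ^ b

def half_adder(a, b):
    """Combine gates to compute a composite Boolean expression:
    returns (sum bit, carry bit)."""
    return XOR(a, b), AND(a, b)
```

Chaining two half-adders and an OR gate yields a full adder, the kind of assemblage of gates and memory elements that forms an arithmetic unit.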
Faraday discovered that a time varying electric field leads to a magnetic field, which can
be capitalized on for the motion of a neutral body. He also discovered that a time varying
magnetic field leads to an electric field inside a neutral conductor, so that a flow of current
takes place. These formed the basis of Fleming's left hand and right hand rules relating the
effects of relative motion between the electric field, the magnetic field and the conductor.
These investigations of Faraday and other scientists naturally paved the way for electric
circuits consisting of resistors, inductors and capacitors. Such initial efforts led to canonical
circuits such as the RL circuit, RC circuit, RLC circuit etc. The systems of differential
equations and their responses were computed utilizing analytical techniques. The ability
to control the motion of an arbitrary neutral object led to applications of electrical circuits
and their modifications for the control of trajectories of aircraft. Thus, automata which can
perform CONTROL tasks were generated. These control automata were primarily based
on electrical circuits and operate in continuous time, with the ability to synchronize at
discrete instants. Later, utilizing the Sampling Theorem, sampled-data control systems
operating in discrete time were developed.
The problem of communication is to convey a message from one point in space to another
as reliably as possible. The message, on being transmitted through the channel, is
changed/garbled by being subjected to various forms of disturbance (noise). By coding
the message (through the addition of redundancy), it is possible to retrieve the original
message from the received message.
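Coding through added redundancy can be sketched with the simplest possible example, a 3-fold repetition code (chosen purely for illustration; the codes treated in this book are more general): the decoder recovers the message by majority vote even when the channel flips a bit in each group of three.

```python
# Minimal sketch: 3-fold repetition code. Redundancy added by the encoder
# lets the decoder recover the original message by majority vote.
def encode(bits):
    # Repeat every message bit three times.
    return [b for bit in bits for b in (bit, bit, bit)]

def decode(received):
    # Majority vote over each group of three received bits.
    return [1 if sum(received[i:i + 3]) >= 2 else 0
            for i in range(0, len(received), 3)]

message = [1, 0, 1]
codeword = encode(message)
codeword[4] ^= 1                 # channel noise flips one bit
assert decode(codeword) == message
```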
Thus, the three problems of control, communication and computation can be described
through the illustration in Figure 1.1. From the illustration, the message that is generated
may be in continuous time or discrete time. Utilizing the Sampling Theorem, if the original
signal is band-limited, then the message can be sampled. The sampled signal forms the
message in discrete time. The message is then encoded through an encoder and
transmitted through a channel. If the channel is a waveform channel, various digital
modulation schemes are utilized in encoding. The signal, on reaching the receiver, is
demodulated through the demodulator and then decoded. This whole assembly of
hardware equipment forms the COMMUNICATION equipment.
The above summary described the efforts of engineers, scientists and mathematicians
to synthesize automata which serve the purposes of CONTROL, COMMUNICATION
AND COMPUTATION. These functions are the basis of automata that simulate living
systems. These automata model the living systems. In other words, control, communication
and computation automata, when properly assembled and co-ordinated, lead to robots
which simulate some functions of various living systems.
In the above effort at simulating the functions of living systems in machines, traditionally
the control, communication and computation automata led to sophisticated robots (which
served the purpose pretty well). Thus, the utilitarian viewpoint was partially satisfied.
But the author took a more FUNDAMENTAL approach to the problem of simulating a
are extended to multi/infinite dimensional linear systems. Also, the results developed
in one dimension for the computation of optimal control are immediately extended to
certain multi/infinite dimensional linear systems. This result, in association with the
formalization of multi/infinite dimensional logic theory and multi/infinite dimensional
coding theory (as an extension of one dimensional linear and non-linear codes), provided
the formal UNIFIED THEORY of multi/infinite dimensional linear systems. The formal
mathematical details on models of living system functions are provided in Chapters 2 to
5. These chapters provide the details on control, communication and computation
automata in multiple dimensions. Several generalized models of neural networks are
discussed in Chapters 5 to 9. The relationship between neural networks and optimal filters
is discussed in Chapter 7. In Chapter 10, an advanced theory of evolution is discussed.
Mathematical models of living system functions motivated us to take a closer look at the
functions of natural living systems observed in physical reality. In physical reality, we observe
homo-sapiens as well as lower level animals such as tigers, lions, snakes etc. It is reasoned
that some of the functions of natural living systems are misunderstood or not understood at all.
Biological living systems such as homo-sapiens lead to a biological culture. In a
biological culture that originated during the ice age in the oceans, various living species
were living in the oceans. Through some process, the two necessities of metabolism and
reproduction were developed by all living species. The homo-sapien species was
responsible for our current understanding of the various activities and functions of observed
living systems. The author hypothesizes that the homo-sapien interpretations are totally
wrong. For instance,
• Metabolism which leads to the killing of one species by another is unnecessary to
sustain life.
• The belief (like many superstitions) that death and aging are inevitable is only
partially true.
To be more precise, it should be possible to take the non-decayed organs of a living species
and, by recharging the dead cells, make it living again. Many such innovative ideas on living
systems are discussed in Chapter 10.
The only necessities of natural living systems that are observed are 'metabolism' and
'reproduction'. By and large, the only organization and community formation that we see
in natural systems other than homo-sapiens are of the following forms:
• Migratory patterns of birds
• Sharing information on the location of food
• Forming a group of families to satisfy reproductive needs
• Occasional bird songs of mutual courtship
• Occasional rituals related to protecting the members of their group etc.
The organization and culture observed in other biological systems and other natural living
systems are nowhere comparable to those observed in the homo-sapien species. But the
author hypothesizes that this marginal/poor organization is primarily due to a lack of co-
ordination, which is achieved through language. Thus, the major effort in organizing the
lower level species of living systems lies in teaching a language. Thus, the organization of
living systems other than homo-sapiens (for homo-sapien and other purposes) should
be possible.
An important part of organizing the homo-sapiens was the educational system through
an associated language. In the same spirit, by teaching some lower level animals to speak
a certain language, they could be organized/educated to understand as well as develop
science and technology. When lower level animals are organized in a zoo through
various methods, they could lead to a culture and a civilization.
Various natural living machines have developed organs/functional units due to
evolutionary needs. These functional units essentially include sensors to collect video and
audio information or, more generally, sensors to collect data on the surrounding environment
in the universe. The data gathered by the living machine from the surrounding environment
in physical reality is utilized to perform some primary functions such as metabolism,
reproduction etc. The data is processed by various functional sub-units inside the brain of
a living machine. Thus, understanding the operation of the various functional sub-units
in the brain of natural living machines leads to building artificial living machines which
are far superior in functional capabilities.
CHAPTER 2
Multi/Infinite Dimensional Neural Networks, Multi/Infinite Dimensional Logic Theory
2.1 INTRODUCTION
general class of information processing sub-units and thus the units operate on arrays whose
entries are allowed to assume multiple (not necessarily binary) values.
Automata which operate on multidimensional arrays to perform a desired operation
can be defined heuristically in many ways. In some applications, such as 3-d array/
image processing, the information processing operation can only be defined heuristically
based on the required function. But a more organized approach to defining multidimensional
logic functions has been discovered and formalized by the author. In this chapter, the author
describes the mathematical formalization of multidimensional logic units. The relationship
between multidimensional logic units and multidimensional neural networks is also
discussed. The generalization of the results to infinite dimensions is also briefly described.
Two dimensional neural networks were utilized by various researchers working in the
area of neural networks. The application of two dimensional neural networks to various
real world problems was also extensively studied. But an effective mathematical abstraction
for modeling two/multi/infinite dimensional neural networks was lacking. In this chapter,
the author demonstrates that tensors provide a mathematical abstraction to model multi/
infinite dimensional neural networks.
The contents of this chapter are summarized as follows:
A mathematical model of an arbitrary multidimensional neural network is developed. A
convergence theorem for an arbitrary multidimensional neural network represented by a
fully symmetric tensor is stated and proved. The input and output signal states of a
multidimensional logic gate/neural network are related through an energy function,
defined over the fully symmetric tensor representing the multidimensional logic gate, such
that the minimum/maximum energy states correspond to the output states of the logic
gate realizing a logic function. Similarly, a logic circuit consisting of the interconnection of
logic gates, represented by a symmetric tensor, is associated with a quadratic/higher degree
energy function. Multidimensional logic synthesis is described. Infinite dimensional logic
theory, logic synthesis are briefly discussed through the utilization of infinite dimension/
order tensors.
This chapter is organized as follows. In section 2, a mathematical model of an arbitrary
multidimensional neural network and associated terminology is developed. In section 3, a
convergence theorem for an arbitrary multidimensional neural network is proved. In section
4, the input/stable states of a multidimensional neural network are associated with the
input/output signal states of a multidimensional logic gate. A mathematical model of an
arbitrary multidimensional logic gate/circuit is described. Thus, multidimensional logic
theory, logic synthesis is formalized. In section 5, infinite dimensional logic theory, logic
synthesis are described. In section 6, the relationship between multidimensional neural
networks, multidimensional logic theories, various constrained static optimization problems
is elaborated. Various constrained optimization problems that commonly arise in various
problems are listed. Various innovative ideas in multidimensional neural networks are
briefly described. The chapter ends with concluding remarks.
The expression

Σ_{i=1}^{m} C_i X_i    (2.1)

is a homogeneous form of degree one,

Σ_{i=1}^{m} Σ_{j=1}^{m} C_{ij} X_i X_j    (2.2)

is a homogeneous form of degree two, and

Σ_{i=1}^{m} Σ_{j=1}^{m} Σ_{k=1}^{m} C_{ijk} X_i X_j X_k    (2.3)

is called a homogeneous form (BoT) of degree three, and so on. Given the components of
a tensor of order n, of dimension m, it is possible to define a homogeneous form of
degree n.
The connection structure of a one dimensional neural network, the symmetric matrix,
is naturally associated with a homogeneous quadratic form as the energy function, which
is optimized over the one dimensional hypercube. Thus, in one dimension, to utilize a
homogeneous form of degree n as the energy function, a generalized neural network is
employed, in which, at each neuron, an arbitrary algebraic threshold function is computed.
But in multiple dimensions, to describe the connection structure of a neural network, a tensor
is necessarily utilized.
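The forms (2.1) to (2.3) can be sketched numerically: a homogeneous form of degree n is the contraction of an order-n tensor against n copies of the variable vector. The NumPy/`einsum` machinery below is this sketch's own choice, not the book's.

```python
import numpy as np

# Sketch: homogeneous forms of degree 1, 2, 3 defined by tensors of
# order 1, 2, 3 and dimension m, evaluated at a point of the hypercube.
m = 4
rng = np.random.default_rng(0)
X = rng.choice([-1.0, 1.0], size=m)          # X in {-1, +1}^m
C1 = rng.standard_normal(m)                  # order-1 tensor
C2 = rng.standard_normal((m, m))             # order-2 tensor
C3 = rng.standard_normal((m, m, m))          # order-3 tensor

f1 = np.einsum('i,i->', C1, X)               # degree-1 form, as in (2.1)
f2 = np.einsum('ij,i,j->', C2, X, X)         # degree-2 (quadratic) form, (2.2)
f3 = np.einsum('ijk,i,j,k->', C3, X, X, X)   # degree-3 form, (2.3)
```

The quadratic case f2 is exactly the energy function optimized over the hypercube in the one dimensional network.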
state tensor of neuronal states at the time instant n. Thus, we first compute the outer product
of the connection tensor and the state tensor of neurons at the time instant n and perform the
contraction over all the indices (representing the neurons) connected to a chosen neuron.
This inner product operation, followed by determining its sign/parity/polarity
(positive or negative value), gives us the state tensor at time instant n+1. This procedure is
repeated at all the neurons where the state is updated.
Remark
Throughout this book, the notation "multidimensional neural network" is utilized.
The standard notation associated with tensors utilizes the term "dimension" to represent
the number of values an independent variable can assume and the term "order" to represent
the number of independent variables. Thus, the order of the state tensor represents the
number of independent dimensions in the multidimensional neural network, MN. Any
notational confusion between the terms "order" and "dimension" should be resolved from
the context.
where

H_{i1, i2,..., in}(t) = Σ_{j1=1}^{m} ... Σ_{jn=1}^{m} S_{i1,...,in; j1,...,jn} X_{j1,...,jn}(t) − T_{i1,...,in}(t)    (2.8)
The next state of the network, X_{i1,...,in}(t+1), is computed from the current state by
performing the evaluation (2.7) at a subset of the nodes of the multidimensional neural
network, to be denoted by G. The modes of operation of the network are determined by
the method by which the subset G is selected in each time interval.
If the computation is performed at a single node in any time interval, i.e. |G| = 1, then
we will say that the network is operating in a serial mode, and if |G| = m^n, then we will
say that the network is operating in a fully parallel mode. All other cases, i.e. 1 < |G| < m^n,
will be called parallel modes of operation. Unlike a one dimensional neural network, a
multidimensional neural network lends itself to various parallel modes of operation. It is
possible to choose G to be the set of neurons placed in each independent dimension or a
union of such sets. The set G can be chosen at random or according to some deterministic
rule. A state of the network is called stable if and only if
X_{i1,...,in}(t) = Sign( (S ⊗ X(t))_{i1,...,in} − T_{i1,...,in} )    (2.9)

where ⊗ denotes the inner product, i.e. the outer product followed by contraction over the
appropriate indices. Once the network reaches such a state, there is no further change in
the state of the network, no matter what the mode of operation is.
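The update rule and the stability condition (2.9) can be sketched numerically for the smallest interesting case, an order-2 state tensor (an m x m grid of neurons) with zero thresholds; the random order-4 connection tensor and the helper names here are this sketch's assumptions, not the book's notation.

```python
import numpy as np

# Sketch: serial-mode update of a multidimensional neural network with an
# m x m state tensor X and an order-4 connection tensor S (zero thresholds).
m = 3
rng = np.random.default_rng(1)
A = rng.standard_normal((m, m, m, m))
S = A + A.transpose(2, 3, 0, 1)              # S[i1,i2,j1,j2] = S[j1,j2,i1,i2]
X = rng.choice([-1.0, 1.0], size=(m, m))     # {+1, -1} state tensor

def local_field(S, X):
    # Outer product of S and X contracted over j1, j2: the inner product S ⊗ X.
    return np.einsum('abij,ij->ab', S, X)

def serial_update(S, X, i1, i2):
    # Update one chosen neuron: sign of its scalar synaptic contribution.
    X = X.copy()
    X[i1, i2] = 1.0 if local_field(S, X)[i1, i2] >= 0 else -1.0
    return X

def is_stable(S, X):
    # Condition (2.9): every neuron already agrees with the sign of its field.
    return np.array_equal(np.where(local_field(S, X) >= 0, 1.0, -1.0), X)
```

The `einsum` contraction is exactly the "outer product followed by contraction over the appropriate indices" described in the text.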
The synaptic contribution from all neurons is first determined, and its sign is taken to arrive
at the updated state of the neuron. Mathematically, this is achieved by computing the outer
product of the fully symmetric tensor S and the {+1, –1} state tensor of the multidimensional
neural network. In tensor notation, this is specified by

C_{i1,...,in; j1,...,jn} = S_{i1,...,in; j1,...,jn} X_{j1,...,jn}.    (2.10)
The total synaptic contribution at any neuron located at (i1, i2,..., in) is determined by
contracting the above outer product over all the indices {j1, j2,..., jn}, i.e. over all the
neurons connected to it through the synaptic weights determined by the components of
the fully symmetric tensor S. The resultant scalar synaptic contribution at any neuron
(i1, i2,..., in) is thus determined by the inner product operation. The sign of the resulting
scalar constitutes the updated state of the neuron. Thus, the state of any neuron
(i1, i2,..., in) of the multidimensional neural network in the serial mode of operation is
given by

X_{i1, i2,..., in}(k + 1) = Sign( Σ_{j1=1}^{m} ... Σ_{jn=1}^{m} C_{i1,...,in; j1,...,jn}(k) − T_{i1,...,in} )    (2.11)
where <,> denotes the inner product operator between compatible tensors. It is
assumed in the above specification of the energy function of the neural network MN that
the threshold at each neuron is zero. This entails no loss of generality, since by augmenting
the tensor S and the state tensor, the threshold values can be forced to be zero. It is easy
to see that this can always be done by considering a one dimensional neural network in
which the threshold at each neuron is non-zero and arriving at a network in which the
threshold at each neuron is zero by augmenting the state vector as well as the connection
matrix.
Utilizing the definition of the above energy function of the network, let
∆E = E1(t + 1) − E1(t) (the discrete time index t instead of k is used) be the difference in the
energy associated with two consecutive states (transited in the serial mode of operation of
the multidimensional neural network), and let ∆X_{i1,...,in} denote the difference between the
next state and the current state of the node at location (i1, i2,..., in) at some arbitrary time t.
Clearly,
∆X_{i1,...,in} = −2 if X_{i1,...,in}(t) = 1 and Sign(H_{i1,...,in}(t)) = −1;
∆X_{i1,...,in} = +2 if X_{i1,...,in}(t) = −1 and Sign(H_{i1,...,in}(t)) = +1;
∆X_{i1,...,in} = 0 otherwise.    (2.14)
∆E = ∆X_{i1,...,in} ( Σ_{j1}...Σ_{jn} S_{i1,...,in; j1,...,jn} X_{j1,...,jn} + Σ_{j1}...Σ_{jn} S_{j1,...,jn; i1,...,in} X_{j1,...,jn} ) + S_{i1,...,in; i1,...,in} (∆X_{i1,...,in})²
Hence, since ∆X_{i1,...,in} H_{i1,...,in} ≥ 0 and S_{i1,...,in; i1,...,in} ≥ 0, it follows that at every
time instant ∆E ≥ 0. Thus, since the energy E is bounded from above by the appropriate
norm of S, the value of the energy will converge. It is now proved in the following that
convergence of the energy implies convergence to a stable state.
Once the energy in the network has converged, it is clear from the following facts that
the network will reach a stable state after at most m^(2n) time intervals.
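The convergence argument can be checked numerically in the simplest (one dimensional) special case: under serial sign updates with a symmetric connection matrix whose diagonal entries are non-negative, the quadratic energy never decreases. The random matrix below is purely illustrative.

```python
import numpy as np

# Sketch: serial updates never decrease the energy E(X) = X^T S X when S is
# symmetric with non-negative diagonal entries, so the energy converges.
rng = np.random.default_rng(2)
n = 8
A = rng.standard_normal((n, n))
S = A + A.T                                  # symmetric connection matrix
np.fill_diagonal(S, np.abs(np.diag(S)))      # enforce S_ii >= 0

X = rng.choice([-1.0, 1.0], size=n)
energy = lambda v: v @ S @ v

previous = energy(X)
for _ in range(200):                         # serial mode: one neuron per step
    i = rng.integers(n)
    X[i] = 1.0 if S[i] @ X >= 0 else -1.0    # sign of the local field
    assert energy(X) >= previous - 1e-9      # ∆E >= 0 at every step
    previous = energy(X)
```

Since the energy is bounded above and can take only finitely many values on the hypercube, the asserted monotonicity forces convergence to a stable state.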
One dimensional logic theory as well as logic synthesis deal with information processing
logic gates and logic circuits which operate on one dimensional arrays of zeroes and ones
(or, more generally, one dimensional arrays containing finitely many symbols). The operations
performed by AND, OR, NOR, NAND, XOR gates have appropriate intuitive interpretations
in terms of the entries of the one dimensional arrays, i.e. vectors. Any effort to generalize
the one dimensional logic operations to multiple dimensions leads to various heuristic
possibilities and requires considerable ingenuity in formalizing a definition. But in the
following, utilizing the multidimensional neural network model described above, a formal/
mathematical approach to multidimensional logic theory is described.
The input and output signal states of a multidimensional logic gate are related through
an energy function. Equivalently, the multidimensional logic functions are associated with
the local optima of various energy functions defined over the set of input m-d arrays. In
view of the mathematical model of a multidimensional neural network described in section
2, it is most logical to define the minimum/maximum energy states of a multidimensional
neural network (optimizing an energy function over the multidimensional hypercube) to
correspond to the multidimensional logic gate functions operating on the input arrays.
Definition 2.1
A multidimensional logic function realized through a multidimensional logic gate (with
inputs and outputs) is defined to be the local minimum/maximum of the energy function
of an associated multidimensional neural network.
Equivalently, the local optima of the energy function of a multidimensional neural
network correspond to the logic functions that are realized through various logic gates.
The following detailed description is provided to consolidate the above definition, which
is vital to multidimensional logic theory.
The logic functions which operate on the input array are identified with the stable states
of a multidimensional neural network (in multiple independent variables, i.e. time, space
etc.). These are the transformations from the set of input states of a multidimensional neural
network to the stable states reached on iteration of the network. In other words, in multiple
independent variables, the mapping between the input states and the stable states to which
the network converges on iteration is defined to be the logic function realized by a
multidimensional logic gate.
By the proof of the convergence theorem, the logic functions are invariants of a tensor on
the multidimensional hypercube. The definition of a multidimensional logic function is
illustrated in Figure 2.1.
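The definition can also be illustrated computationally in the one dimensional special case: the realized "logic function" is the map sending each input state on the hypercube to the stable state reached by iterating the network. The particular 3-neuron connection matrix below is an arbitrary illustration, not taken from the book.

```python
import numpy as np
from itertools import product

# Sketch: the logic function realized by a (one dimensional, 3-neuron)
# network is the mapping input state -> stable state reached on iteration.
S = np.array([[ 0.0, 1.0, -1.0],
              [ 1.0, 0.0,  1.0],
              [-1.0, 1.0,  0.0]])           # symmetric, zero-diagonal
n = S.shape[0]

def converge(x):
    # Iterate full serial sweeps until the state stops changing.
    x = x.copy()
    for _ in range(50):
        previous = x.copy()
        for i in range(n):
            x[i] = 1.0 if S[i] @ x >= 0 else -1.0
        if np.array_equal(x, previous):
            break
    return x

# The realized logic function: every hypercube point maps to a stable state.
logic_function = {v: tuple(converge(np.array(v)))
                  for v in product((-1.0, 1.0), repeat=n)}
```

Each value in the mapping is a fixed point of `converge`, i.e. it satisfies the stability condition (2.9) specialized to one dimension.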
In the case of one dimensional logic theory, it has been shown that the set of stable
states of a neural network corresponds to various one dimensional logic functions (CAB).
With the definition of a multidimensional logic function stated and clarified above,
multidimensional logic synthesis is described in the following.
Theorem 2.2: Given a multidimensional logic circuit, there exists a block symmetric
tensor S representing the inter-connection structure of the multidimensional neural
networks (modeling the multidimensional logic gates). The mapping between the input
and output states of a multidimensional logic circuit corresponds to the mapping between
input tensors and the local optima of the energy function (quadratic/higher degree)
represented by the block symmetric tensor. The stable states of the interconnected
multidimensional neural networks represent the multidimensional logic functions
synthesized by the logic circuit.
The proof of the above theorem follows from the convergence theorem and is omitted
for brevity.
The classification of multidimensional logic circuits is based on the type of transitions
allowed between the states in the multidimensional state space. The types of state transitions
fall into the following forms:
(a) whether the next state reached depends only on the past state or not, as in one
dimensional logic synthesis,
(b) the type of neighbourhood of states about the current state on which the next state
reached depends. The types of neighbourhoods about the current state are classified
into a few classes. These classes are similar to those utilized in the theory of random
fields and multidimensional image processing,
(c) the classification of trajectories transited by the multidimensional neural network
or a local optimum computing circuit/scheme.
In the above discussion, we considered quadratic forms as the energy functions
(motivated by the simplest possible neural network model) optimized by the logic gates,
which when connected together lead to logic circuits. This approach toward
multidimensional logic theory motivates the definition of more 'general' switching/logic
functions as the local optima of higher degree forms over various subsets of the
multidimensional lattice (hypercube, bounded lattice etc.).
Definition 2.2
A generalized logic function (representing a generalized logic gate or generalized logic
circuit) is defined as a mapping between an m-dimensional input array and the local
optimum of a tensor based form of degree greater than or equal to two, over various
subsets of the multidimensional lattice (the multidimensional hypercube,
multidimensional bounded lattice). These local optima of the higher degree form (based
on a tensor) are realized through the stable states of a generalized multidimensional
neural network.
In (Rama 3), it is shown that the strictly generalized logic function defined above has
better properties than the ordinary logic function described in Definition 4.1. The generalized
logic function is related to a multidimensional encoder utilized for communication through
multidimensional channels.
Now, with the generalized multidimensional logic gate defined above, logic synthesis
with such logic gates involves interconnecting them in a certain topology.
This ordinary and generalized approach to multidimensional logic gate definition and
logic synthesis is depicted in Figures 2.1 to 2.3. Detailed documentation on logic synthesis
and design of future information processing machines is being pursued.
Proof: A one dimensional neural network with a state vector of infinite size is uniquely
defined by (S, T), where S is an infinite dimensional (rows as well as columns) symmetric
matrix and T is an infinite dimensional vector of thresholds at all the neurons.
The state of the neural network at time t is a vector whose components are +1 and −1. The
next state of a node is computed by

X_i(t + 1) = Sign(H_i(t)) = +1, if H_i(t) ≥ 0; −1, otherwise    (2.18)

where

H_i(t) = Σ_{j=1}^∞ S_{ji} X_j(t) − T_i    (2.19)

The entries of S are such that the infinite sum in the above expression converges.
The next state of the network, i.e. X(t+1), is computed from the current state by
performing the evaluation (2.18) at a subset of the nodes of the network, denoted by
K. The mode of operation of the network is determined by the method by which the set K
is selected at each time interval; if |K| = 1, then we say that the network is operating
in the serial mode. Without loss of generality, T = 0.
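The serial mode update (2.18)-(2.19) and the energy monotonicity argued below can be illustrated on a finite truncation of the network. The following sketch is not part of the original text; the 3-neuron symmetric matrix S and the initial state are arbitrary examples, with thresholds T = 0:

```python
def serial_update(S, x, i):
    """One serial-mode step (2.18): x_i <- Sign(H_i), with threshold T_i = 0."""
    h = sum(S[j][i] * x[j] for j in range(len(x)))   # H_i per (2.19)
    x[i] = 1 if h >= 0 else -1

def energy(S, x):
    """Quadratic energy (2.20): E = sum_ij S_ij x_i x_j."""
    n = len(x)
    return sum(S[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

# Arbitrary symmetric interconnection matrix with non-negative diagonal.
S = [[0, 1, -1],
     [1, 0, 2],
     [-1, 2, 0]]
x = [-1, 1, 1]
e0 = energy(S, x)
serial_update(S, x, 0)        # |K| = 1: serial mode
assert energy(S, x) >= e0     # energy never decreases in the serial mode
```

Any symmetric S with non-negative diagonal entries exhibits the same monotonicity; the particular numbers here only serve the illustration.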
In the following, we consider the serial mode of operation. We argue that with the
above stated updating scheme at an arbitrarily chosen neuron, the quadratic energy function

E(k) = Σ_{i=1}^∞ Σ_{j=1}^∞ S_{ij} X_i(k) X_j(k)    (2.20)

never decreases.
Without loss of generality, consider the case where all the thresholds are set to zero. It
is easy to see (set the last component of the state vector to −1 and appropriately augment
the entries of S) that for any finite L, we have

Σ_{i=1}^L Σ_{j=1}^L S_{ij} X_i(k) X_j(k) ≤ Σ_{i=1}^L Σ_{j=1}^L S_{ij} X_i(k+1) X_j(k+1)    (2.21)

by the convergence theorem for one dimensional neural networks of order L, for any
arbitrary L. Now let L tend to infinity. Hence
Σ_{i=1}^∞ Σ_{j=1}^∞ S_{ij} X_i(k) X_j(k) ≤ Σ_{i=1}^∞ Σ_{j=1}^∞ S_{ij} X_i(k+1) X_j(k+1)    (2.22)
When the infinite dimensional vector is state updated in the parallel mode, for every finite
segment of it, either there is convergence or a cycle of length 2 (at most two vectors for
which the energy values are the same) exists. Since the energy function associated with the
infinite dimensional vector is the limit of those associated with the finite segments, it is
evident that the scalar energy values converge or a cycle of length at most two exists. Q.E.D.
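The parallel-mode behaviour (convergence or a cycle of length at most 2) can be seen on a minimal finite segment. The 2-neuron matrix below is an illustrative assumption, not from the text, chosen so that fully parallel updates oscillate:

```python
def parallel_step(S, x):
    """Fully parallel mode: evaluate (2.18) at every node simultaneously."""
    n = len(x)
    h = [sum(S[j][i] * x[j] for j in range(n)) for i in range(n)]
    return [1 if hi >= 0 else -1 for hi in h]

# Two mutually inhibiting neurons: parallel updates oscillate with period 2.
S = [[0, -1],
     [-1, 0]]
x = [1, 1]
trajectory = [list(x)]
for _ in range(6):
    x = parallel_step(S, x)
    trajectory.append(list(x))
# Convergence or a cycle of length at most 2: states two steps apart agree.
assert trajectory[-1] == trajectory[-3]
```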
Now, we briefly discuss the other infinite dimensional neural networks of dimension
infinity and finite/infinite order (modeling tensor variables).
The following lemma is well known from set theory.
dimensional neural networks considered, convergence to a stable state in the serial mode
of operation is ensured (i.e. there are no cycles in the state space).
In the parallel mode of operation of the infinite dimensional neural network, by the
same reasoning as in Theorem (2.1), the network will always converge to a stable state or
to a cycle of length 2, depending on the order of the network (i.e. the cycles in the state
space are of length less than or equal to 2). Q.E.D.
As in the case of multidimensional logic theory, the above convergence theorem is
utilized as the basis to describe infinite dimensional logic theory as well as logic
synthesis. It should be noted that infinite dimensional logic synthesis has only
theoretical importance. A brief discussion of the infinite dimensional versions is provided
for the sake of completeness.
Definition 2.3
An infinite dimensional logic function realized through an infinite dimensional logic gate
(with inputs and outputs) is defined to be the local minimum/maximum of the energy
function of an associated infinite dimensional neural network. Equivalently, the local optima
of the energy function of an infinite dimensional neural network correspond to the logic
functions that are realized through various logic gates.
With the above definition of an infinite dimensional logic function, detailed results in
infinite dimensional logic synthesis are being developed along the lines of those in
multidimensional logic synthesis. A brief description is provided in the following for the
sake of completeness.
An infinite dimensional logic circuit consists of an arbitrary interconnection of
infinite dimensional logic gates. Infinite dimensional logic synthesis, as in one dimension,
involves synthesizing logic circuits for different purposes. These infinite dimensional logic
circuits have only theoretical significance. Infinite dimensional logic synthesis depends
on how the infinite dimensional logic gates are connected to one another. The structure of
interconnection determines the structure of the symmetric tensor (order and/or dimension is
infinity) representing the infinite dimensional logic circuit.
In this technical memorandum, the author for the first time associates energy functions
with the state updating scheme. The multidimensional versions of these continuous time
neural networks are discussed in (Rama 4).
2.7 CONCLUSIONS
A mathematical model of an arbitrary multidimensional neural network is described. This
model is utilized to prove the convergence theorem for multidimensional neural networks.
Utilizing the convergence theorem, multidimensional logic functions are defined and
multidimensional logic synthesis is discussed. Infinite dimensional logic synthesis is briefly
described. Various constrained static optimization problems of utility in control,
communication, computation and other applications are summarized. Several innovative
themes on one/multidimensional neural networks are summarized.
REFERENCES
(BoT) A. I. Borisenko and I. E. Tarapov, “Vector and Tensor Analysis with Applications,” Dover
Publications Inc., New York.
(BrG) J. Bruck and J. W. Goodman, “A Generalized Convergence Theorem for Neural
Networks,” IEEE Transactions on Information Theory, Vol. 34, No. 5, September 1988.
(CAB) S. T. Chakradhar, V. D. Agrawal and M. L. Bushnell, “Neural Models and Algorithms
for Digital Testing,” Kluwer Academic Publishers.
(HoT) J. J. Hopfield and D. W. Tank, “Neural Computation of Decisions in Optimization
Problems,” Biological Cybernetics, Vol. 52, pp. 141-152, 1985.
(Rama 1) Garimella Rama Murthy, “Multi/Infinite Dimensional Logic Synthesis,” Manuscript
in Preparation.
(Rama 2) Garimella Rama Murthy, “Unified Theory of Control, Communication and
Computation-Part 1,” Manuscript to be submitted to IEEE Proceedings.
(Rama 3) Garimella Rama Murthy, “Multi/Infinite Dimensional Coding Theory: Multi/Infinite
Dimensional Neural Networks: Constrained Static Optimization,” Proceedings of the 2002 IEEE
Information Theory Workshop, October 2002.
(Rama 4) Garimella Rama Murthy, “Optimal Control, Codeword, Logic Function Tensors:
Multidimensional Neural Networks,” International Journal of Systemics, Cybernetics and
Informatics, October 2006, pp. 9-17.
(Rama 5) Garimella Rama Murthy, “Signal Design for Magnetic and Optical Recording Channels:
Spectra of Bounded Functions,” Bellcore Technical Memorandum, TM-NWT-018026.
CHAPTER 3
Multi/Infinite Dimensional Coding Theory: Multi/Infinite Dimensional Neural
Networks—Constrained Static Optimization
3.1. INTRODUCTION
In the recent years, technological developments in parallel data transfer mechanisms led
to HIPPI (high performance parallel interface), SMMDS (switched multi-megabit data
service), FDDI (fiber distributed data interface). To match these high speed parallel data
transfer mechanisms, multidimensional coding theory has been originated and some ad
hoc procedures were developed for designing linear as well as non-linear codes.
Multidimensional codes are utilized to encode arrays of symbols for transmission over
a multidimensional communication channel. Thus, the central objective in multidimensional
coding theory is to design codes that can correct many errors and whose encoding/decoding
procedures are computationally efficient. A multidimensional error correcting code can
be described by an energy landscape, with the peaks of the landscape being the codewords.
The decoding of a corrupted codeword (array), which is a point in the energy landscape
that is not a peak, is equivalent to looking for the closest peak in the energy landscape. An
alternative way to describe the problem is to design a constellation which consists of a set of
points on a multidimensional lattice that are enclosed within a finite region, in such a way
that a certain optimization constraint is satisfied.
Neural network models, simulated annealing and relaxation techniques are some of the
computation models (based on optimization) that have been attracting much interest because
they seem to have properties similar to those of biological and physical systems. The standard
computation performed in a neural network is the optimization of the energy function. The state
space of a neuro-dynamical system can be described by the topography defined by the energy
function associated with the network. The connection structure of a neural network can either
be distributed on a plane or in multidimensions (Rama 2).
Thus, the field of multidimensional neural network theory and the field of
multidimensional coding theory are linked through the common thread of optimization of
possible states +1 and −1. The state of node (i1, i2,..., in) at time t is denoted by X_{i1, i2,..., in}(t).
The state of MN at time t is the tensor X_{i1, i2,..., in}(t) of dimension m and order n. The state
evolution at node (i1, i2,..., in) is computed by

X_{i1, i2,..., in}(t + 1) = Sign(H_{i1, i2,..., in}(t))    (3.1)

where

H_{i1,..., in}(t) = Σ_{j1=1}^m ... Σ_{jn=1}^m S_{i1,..., in; j1,..., jn} X_{j1, j2,..., jn}(t) − T_{i1,..., in}(t)
The next state of the network, i.e. X_{i1, i2,..., in}(t + 1), is computed from the current state by
performing the evaluation (3.1) at a subset of nodes of the multidimensional neural network,
denoted by G. The modes of operation are determined by the method by which the
subset G is selected in each time interval. If the computation (3.1) is performed at a single
node in any time interval, i.e. |G| = 1, then we say that the network is operating in the
serial mode, and if |G| = m^n, then we say that the network is operating in the fully
parallel mode. A state is called stable if and only if
X_{i1, i2,..., in}(t) = Sign(S ⊗ X_{i1,..., in}(t) − T_{i1,..., in})    (3.2)

where ⊗ denotes the inner product (the symbol is sometimes suppressed for notational brevity).
Once a neural network reaches such a state, there is no change in the state of the network
no matter what the mode of operation is.
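A minimal sketch of the evaluation (3.1), assuming dimension m = 2 and order n = 2 (so S has order 2n = 4) with thresholds T = 0; the tensor entries below are arbitrary but fully symmetric (S[i1][i2][j1][j2] = S[j1][j2][i1][i2]) and are not from the text:

```python
m = 2  # dimension; the state X is an m x m array (order n = 2)

def H(S, X, i1, i2):
    """H_{i1,i2} = sum_{j1,j2} S[i1][i2][j1][j2] * X[j1][j2], thresholds T = 0."""
    return sum(S[i1][i2][j1][j2] * X[j1][j2]
               for j1 in range(m) for j2 in range(m))

def serial_step(S, X, i1, i2):
    """Evaluation (3.1) at the single node (i1, i2): |G| = 1, serial mode."""
    X[i1][i2] = 1 if H(S, X, i1, i2) >= 0 else -1

# Fully symmetric order-4 tensor: S[i1][i2][j1][j2] == S[j1][j2][i1][i2].
S = [[[[0, 1], [1, 0]], [[1, 0], [0, 1]]],
     [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]]
X = [[1, -1], [-1, 1]]
serial_step(S, X, 0, 0)   # H_{0,0} = -2 < 0, so the node state becomes -1
```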
An important feature of the network MN is the convergence theorem stated below.
Theorem 3.1: Let MN = (S, T) be a multidimensional neural network of dimension m and
order n, where S is a fully symmetric tensor of order 2n and dimension m. The network MN
always converges to a stable state while operating in the serial mode (i.e. there are no
cycles in the state space) and to a cycle of length at most 2 while operating in the fully
parallel mode (i.e. cycles in the state space are of length at most 2).
This theorem is proved in (Rama 2). This theorem suggests the utilization of MN as a
device for performing a local search for the optimum of an energy function. In the following,
we formulate a problem that is equivalent to determining the global maximum of an energy
function and show how to map it onto a multidimensional neural network.
Definition 3.1
Let G = (V, E) be a weighted and undirected non-planar graph in multidimensions where
V denotes the set of nodes of G and E denotes the set of edges of G. Let K be the fully
symmetric tensor whose components are the weights of the edges of G.
Let V1 be a subset of V, and let V−1 = V − V1. The set of edges each of which is incident at
a node in V1 and at a node in V−1 is called a cut in G.
Definition 3.2
The weight of a cut is the sum of its edge weights. A minimum cut (MC) of a non-planar
graph/graphoid is a cut with minimum weight.
In the following, we show the equivalence between the minimum cut problem in a
graphoid (from now onwards, we also refer to the connection structure of a multidimensional
neural network as a graphoid) and the problem of maximizing the quadratic form that is the
energy function of a multidimensional neural network. Every non-planar graph, including
the connection structure of a multidimensional neural network, is a graphoid (by definition).
Theorem 3.2: Let MN = (S, T) be a multidimensional neural network with all the thresholds
being zero, i.e. T = 0. The problem of finding a state V for which the quadratic energy
function E is maximum is equivalent to finding a minimum cut in the graphoid
corresponding to MN.
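Theorem 3.2 can be checked exhaustively on a small example. For a state x in {+1, −1}^n with V1 = {i : x_i = +1}, the identity E(x) = 2(W_total − 2 W_cut) makes maximizing E equivalent to minimizing the cut. The sketch below (an illustrative assumption, not from the text) uses an ordinary 4-node weighted graph as a stand-in for a graphoid:

```python
import itertools

# Symmetric weight matrix of a weighted undirected 4-node graph
# (a stand-in for the graphoid of a multidimensional neural network).
W = [[0, -3, 1, 0],
     [-3, 0, 2, 1],
     [1, 2, 0, -4],
     [0, 1, -4, 0]]
n = len(W)

def energy(x):
    """Quadratic energy E = sum_ij W_ij x_i x_j with thresholds T = 0."""
    return sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def cut_weight(x):
    """Weight of the cut separating V1 = {i: x_i = +1} from V_-1."""
    return sum(W[i][j] for i in range(n) for j in range(i + 1, n)
               if x[i] != x[j])

states = list(itertools.product([-1, 1], repeat=n))
best = max(states, key=energy)
# E(x) = 2 * (total edge weight - 2 * cut weight), so max E <=> min cut.
assert cut_weight(best) == min(cut_weight(s) for s in states)
```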
Proof: Since T = 0, the energy function is given by

E = Σ_{i1=1}^m ... Σ_{in=1}^m Σ_{j1=1}^m ... Σ_{jn=1}^m S_{i1,..., in; j1,..., jn} X_{i1,..., in} X_{j1,..., jn}    (3.3)
non-planar graph type structure called a graphoid (not necessarily the connection structure
of a multidimensional neural network). Consider a fully symmetric tensor of dimension m
and order 2n, which is utilized to describe the connection structure of a multidimensional
neural network.
A subset of the set of edges of G can be represented by a characteristic tensor of order 2n,
with the edge between two nodes V_{i1, i2,..., in} and V_{j1, j2,..., jn} leading to an entry of +1 at the
corresponding locations in the tensor. Thus, stacking the edge characteristic tensors of a
graphoid, the incidence tensor is defined as

D_Ĝ = [ T_V̂1 ; T_V̂2 ; ... ; T_V̂n ]    (3.6)

where T_V̂i represents the tensor of the set of edges incident upon the node V̂i. It should be
noted that the incidence tensor is a blocked tensor and the above illustration is shown to
aid the imagination of the reader.
Various concepts associated with planar graphs are utilized as the basis to define the
following concepts associated with a graphoid (non-planar). They provide the notation
associated with graphoid theoretic codes.
The following lemmas are very easy to verify.
Lemma 3.1: The set of characteristic tensors that correspond to the cuts in a connection
structure G = (V, E) of a multidimensional neural network forms a linear tensor/m-d vector
(depending on the notational convenience) space over GF(2) in multidimensions, of
dimension (|V| − 1).
The linear tensor/m-d vector space that corresponds to the cuts of a graphoid Ĝ will
be called the cut space of Ĝ. Furthermore, the circuits in a graphoid also constitute a
linear tensor/vector space.
Lemma 3.2: Given a connected graphoid Ĝ = (V̂, Ê), the incidence tensor of Ĝ has rank
(|V̂| − 1). Every block tensor in D_Ĝ associated with a node is a characteristic tensor of a cut,
and every (|V̂| − 1) block tensors of D_Ĝ corresponding to different vertices/nodes of the
graphoid form a basis for the cut space of Ĝ.
Hence, given a connection structure Ĝ, the cut space of the graphoid is a
multidimensional linear block code of dimension (|V̂| − 1).
For the sake of brevity, in the following, we only consider ‘cut codes’.
Given a graphoid Ĝ, an interesting question is how to formulate the maximum
likelihood decoding (MLD) problem of the code C_Ĝ in a graphoid-theoretic language.
That is, given a graphoid Ĝ = (V̂, Ê) and a (0, 1) tensor Y of dimension m and order
2n, what is the codeword in C_Ĝ closest to Y in Hamming distance?
The following lemmas answer this question.
Hamming Distance: Given two (0, 1) m dimensional tensors of order 2n, X and Y, the
Hamming distance between them is the number of places where they differ.
This definition is motivated by transmitting a binary tensor X through a noisy
multidimensional channel, observing the output Y and counting the number of errors that have
occurred.
Lemma 3.3: Let Ĝ = (V̂, Ê) be a graphoid. Let C_Ĝ be the multidimensional code associated
with Ĝ. Let Y be a (0, 1) tensor of order 2n (dimension m). Construct a new graphoid,
denoted Ĝ_Y, by assigning weights to the edges of Ĝ as follows:

W_{i1, i2,..., in; j1,..., jn} = (−1)^{Y_{i1,..., in; j1,..., jn}}    (3.7)

where W_{i1, i2,..., in; j1,..., jn} is the weight associated with the edge (i1,..., in; j1,..., jn) in Ĝ. Then the
maximum likelihood decoding of the tensor Y with respect to C_Ĝ is equivalent to finding
the minimum cut in Ĝ_Y.
Proof: Assume the number of ones in Y is b. Let P be an arbitrary codeword in C_Ĝ. Let L_{i,j}
denote the number of positions in which P contains an i ∈ {0, 1} and Y contains a j ∈ {0, 1}.
Clearly,

b = L_{0,1} + L_{1,1}    (3.8)

Thus,

−L_{1,1} + L_{1,0} = L_{0,1} − b + L_{1,0}    (3.9)
= L_{0,1} + L_{1,0} − b    (3.10)
Minimizing the right hand side of the above expression over all P ∈ C_Ĝ is equivalent to
finding a codeword which is closest to Y. On the other hand, minimizing the left hand
side is equivalent to finding the minimum cut in Ĝ_Y. Q.E.D.
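The counting identities (3.8)-(3.10) used in the proof can be checked numerically; the binary arrays below are random stand-ins (an illustrative assumption) for a flattened codeword P and received tensor Y:

```python
import random

random.seed(7)
# Flattened binary stand-ins for a codeword P and a received tensor Y.
P = [random.randint(0, 1) for _ in range(20)]
Y = [random.randint(0, 1) for _ in range(20)]

# L[i, j] = number of positions where P holds i and Y holds j.
L = {(i, j): sum(1 for p, y in zip(P, Y) if p == i and y == j)
     for i in (0, 1) for j in (0, 1)}
b = sum(Y)  # number of ones in Y

assert b == L[(0, 1)] + L[(1, 1)]                           # (3.8)
assert -L[(1, 1)] + L[(1, 0)] == L[(0, 1)] + L[(1, 0)] - b  # (3.9)-(3.10)
```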
From the above lemma, the following theorem follows.
Theorem 3.3: Let Ĝ = (V̂, Ê) be a graphoid. Then, maximum likelihood decoding of a
tensor word Y with respect to C_Ĝ is equivalent to finding the maximum of the quadratic
energy function E of the multidimensional neural network defined by the graphoid Ĝ_Y
with all its threshold values equal to zero.
Proof: By Lemma 3.3, maximum likelihood decoding of Y with respect to C_Ĝ is equivalent
to finding the minimum cut in Ĝ_Y. By Theorem 3.2, finding the minimum cut in a graphoid
is equivalent to finding the global maximum of the (quadratic) energy function of a
multidimensional neural network defined by a graphoid with all the thresholds at each
neuronal element set to zero. Q.E.D.
Graphoid based error correcting codes are very limited since the connection structure
of a multidimensional neural network is represented by a fully symmetric tensor. This
imposes restrictions on the minimum distance of multidimensional codes. Thus, a natural
question that arises is whether the equivalence stated above in the Theorem 3.3 can be
generalized to arbitrary multidimensional linear block codes.
Graphoid codes arose naturally out of the topological properties of the connection structure
of a multidimensional neural network. The connection structure required a fully symmetric
tensor to represent it. The neural network model enabled the association of a quadratic energy
function with the fully symmetric tensor and its optimization over the multidimensional
hypercube. Thus, the encoders and decoders of graphoid codes are defined through topological
structure and the optimization of multivariate polynomials. Since an arbitrary tensor, like
the fully symmetric tensor, constitutes a linear operator, arbitrary multidimensional linear
codes, unlike graphoid codes, are first defined through their algebraic structure in the next
section. Then the maximum likelihood decoding problem of such codes is discussed.
Recent advances in high speed parallel data transfer mechanisms based on light wave/
optical networks motivated the design and analysis of multidimensional codes. Several
researchers utilized ad hoc techniques (sometimes pseudo-mathematical techniques) to
design and analyze multidimensional codes based on the extensions of the ideas in one
dimensional error control coding theory.
The author for the first time developed the idea of utilizing ‘tensor linear operator’ for
the design and analysis of multi/infinite dimensional linear as well as non-linear codes
conceived as sub-spaces over tensor spaces.
where ⊗ denotes the inner product operation (between tensors defined over a finite field)
by means of exclusive or operation between the components of outer product of tensors
(contraction over appropriate indices of the sum of products of binary variables).
The above procedure of generating the codeword tensor from an information tensor leads to
the following interesting considerations which are inherent to multidimensional code design.
In one dimension, a binary information vector of length k is encoded into a codeword
vector of length n by padding the parity bits to it. The parity check equations obtained
through the parity check matrix determine these bits. In the case of two/multidimensional
array of information bits, there are many ways to encode the array into a codeword array.
Even in the simplest two dimensional array case, by padding a border of parity bits along
the row wise as well as column wise directions, the codeword array can be generated. In
the following, this degree of freedom in multidimensional coding is formally described.
A multidimensional information array (information tensor) is mapped into a codeword
array in the following ways:
(1) An m-dimensional information tensor of order n is mapped into an m-dimensional
codeword tensor of order l (l > n),
(2) An m-dimensional information tensor of order n is mapped into a k-dimensional
codeword tensor (k > m) of order n,
(3) An m-dimensional information tensor of order n is mapped into a k-dimensional
(k > m) codeword tensor of order l (l > n).
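The simplest two dimensional case mentioned above, padding an array with a border of parity bits row wise and column wise, can be sketched as follows (a minimal 2-D product-code illustration; the 3 x 3 information array is an arbitrary example):

```python
def encode_with_parity_border(info):
    """Border a k x k binary array with row parities (extra column) and
    column parities (extra row), giving a (k+1) x (k+1) codeword array."""
    k = len(info)
    code = [row[:] + [sum(row) % 2] for row in info]
    code.append([sum(code[i][j] for i in range(k)) % 2 for j in range(k + 1)])
    return code

info = [[1, 0, 1],
        [0, 1, 1],
        [1, 1, 1]]
code = encode_with_parity_border(info)
# Every row and every column of the codeword array has even parity.
assert all(sum(row) % 2 == 0 for row in code)
assert all(sum(code[i][j] for i in range(4)) % 2 == 0 for j in range(4))
```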
For the purpose of notational convenience, in the following only encoding through the
operation (1) is utilized. It is easy to realize that by transposing the information as
well as generator tensors, operation (2) in encoding is achieved. But to encode an
information tensor into a codeword tensor through operation (3), a second generator
type tensor is utilized.
Various ideas familiar in one dimensional coding theory (parity check matrices,
primitive polynomials, basis, cosets etc.) have corresponding parallels in multi/infinite
dimensional coding theory based on the tensor linear operator defined over a finite field.
The detailed translation from one dimensional encoding/decoding algorithms to
H(X_{i1, i2,..., in}) = Σ_{i1=1}^m ... Σ_{in=1}^m P_{i1, i2,..., in} log(1/P_{i1, i2,..., in})    (3.12)
Given the basic idea of the above definition, results from one dimension are generalized
to multidimensions utilizing the principles described in (Rama 3). Complex sources such
as a Markovian source require some sophistication in defining the entropy/uncertainty of
the source. The interesting channel model in multidimensions is the discrete memoryless
channel represented through a stochastic tensor whose elements are the conditional
probabilities P_{j1,..., jl; i1,..., ik}. This corresponds to a Markov random field. Detailed theorems
are derived utilizing the principles described in (Rama 3).
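As a small numerical illustration of (3.12), assuming the logarithm is taken base 2 (entropy in bits) and using a 2 x 2 probability tensor chosen purely for the example:

```python
import math

# Probability tensor of a 2 x 2 source (entries sum to one).
P = [[0.125, 0.25],
     [0.5, 0.125]]
# H = sum_{i1,i2} P_{i1,i2} * log2(1 / P_{i1,i2}), per (3.12), in bits.
H = sum(p * math.log2(1.0 / p) for row in P for p in row)
assert abs(H - 1.75) < 1e-12
```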
With the multidimensional encoding scheme formally described, it is proved in the
following that the maximum likelihood decoding problem of a multidimensional linear
block code is equivalent to the maximization of multivariate polynomial (whose terms/
monomials are described in terms of the entries of received, generator tensors) associated
with the generator/received tensors over the multidimensional hypercube.
The essential idea in the derivation of the desired result (a generalization of Theorem
3.3 to arbitrary multidimensional linear codes) is to represent the symbols of the additive
group as symbols in the multiplicative group through the following transformation:

a → (−1)^a, i.e. 0 → 1, 1 → −1.    (3.13)
Thus, the information tensor B_{i1,..., in} is represented by the tensor X_{i1,..., in}, where the
component X_{i1,..., in} = (−1)^{B_{i1,..., in}}. The encoded codeword C_{j1,..., jl} is thus represented by the
tensor Y_{j1,..., jl}. Hence, a component of the tensor Y is given by

Y_{j1, j2,..., jl} = (−1)^{C_{j1,..., jl}} = Π_{i1=1}^m ... Π_{in=1}^m X_{i1,..., in}^{G_{i1,..., in; j1,..., jl}}    (3.14)
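The transformation (3.13)-(3.14) can be sketched with an order-1 (matrix) generator, an illustrative simplification of the tensor case, to show that mod-2 sums of bits become products of ±1 signs:

```python
B = [1, 0, 1]                  # information bits
G = [[1, 0, 1, 1],             # generator in {0, 1} (matrix case, order 1)
     [0, 1, 1, 0],
     [1, 1, 0, 1]]

# Codeword over GF(2): C_j = sum_i B_i * G[i][j] mod 2.
C = [sum(B[i] * G[i][j] for i in range(3)) % 2 for j in range(4)]

# {+1, -1} representation (3.13): a -> (-1)^a; XOR becomes multiplication.
X = [(-1) ** b for b in B]
Y = [1] * 4
for j in range(4):
    for i in range(3):
        Y[j] *= X[i] ** G[i][j]    # Y_j = prod_i X_i^{G_ij}, per (3.14)
assert Y == [(-1) ** c for c in C]  # the two representations agree
```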
Definition 3.4
In the {1, −1} representation of a multidimensional linear code, instead of a generator tensor,
given an information tensor X_{i1,..., in}, an encoding procedure X → Y is utilized, where the
tensor Y_{j1,..., jl} is such that each component Y_{j1,..., jl} is a monomial that consists of a subset of the
X_{i1,..., in}. An encoding procedure is systematic if and only if Y_{j1,..., js} = X_{j1,..., js} for 1 ≤ s ≤ n.
Definition 3.5
Let G_{i1, i2,..., in; j1, j2,..., jl} be a generator tensor of ones and zeroes. The polynomial representation
of the generator tensor G with respect to a {+1, −1} received tensor W of dimension m and
order l, denoted E_W, is
E_W(X) = W ⊗ Π_{i1=1}^m ... Π_{in=1}^m X_{i1,..., in}    (3.15)
= W ⊗ Y(X)    (3.16)

where ⊗ denotes the inner product between the tensors (i.e. the outer product of the tensors
followed by contraction over appropriate indices).
Consider the linear multidimensional block code defined by the generator tensor G (or
equivalently by the encoding procedure associated with G). The polynomial representation
of G, i.e. E_W(X), will be called the energy function of W with respect to the encoding
procedure X → Y.
To establish the connection between the energy functions (optimized by neural/
generalized neural networks over various subsets of the multidimensional lattice) and linear
multidimensional block codes, we will prove that finding the global maximum of EW (X) is
equivalent to maximum likelihood decoding of a tensor W with respect to the code C.
Proof: For a {+1, −1} information tensor X, the scalar energy function is given by

E_W(X) = W ⊗ Y(X)    (3.18)
= |{(j1, j2,..., jl): W_{j1,..., jl} = Y_{j1,..., jl}(X)}| − |{(j1,..., jl): W_{j1,..., jl} ≠ Y_{j1,..., jl}(X)}|    (3.19)
= m^l − 2 |{(j1,..., jl): W_{j1,..., jl} ≠ Y_{j1,..., jl}(X)}|
= m^l − 2 d_H(W, Y)    (3.20)

where d_H denotes the Hamming distance between the multidimensional codewords W and Y.
From the above expression, E_W(X) will achieve a maximum if and only if d_H(W, Y) achieves
a minimum. Q.E.D.
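The identity (3.20) relating the inner product to the Hamming distance is easy to verify numerically; in the sketch below N plays the role of the number of tensor components m^l, and the ±1 arrays are random stand-ins:

```python
import random

random.seed(3)
N = 16  # number of components (plays the role of m**l)
W = [random.choice([1, -1]) for _ in range(N)]
Y = [random.choice([1, -1]) for _ in range(N)]

inner = sum(w * y for w, y in zip(W, Y))            # W (x) Y, inner product
d_hamming = sum(1 for w, y in zip(W, Y) if w != y)
assert inner == N - 2 * d_hamming                   # identity (3.20)
```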
E_W(X) = Σ_{j1=1}^m ... Σ_{jl=1}^m Y_{j1,..., jl}    (3.22)
d* = (m^l − M)/2    (3.25)
The above results are being generalized to infinite dimensional codes utilizing infinite
dimension/order tensors.
Lemma 3.5: Let E(X) be the polynomial representation of the parity check tensor H^T with
respect to the all ones tensor. Then X ∈ C, the multidimensional linear block code, if and
only if E(X) = m^(n−l).
Proof: E, the polynomial representation of the parity check tensor, has m^(n−l) terms, and all
the coefficients are equal to one. Hence E = m^(n−l) if and only if all the terms are equal to
one. Q.E.D.
The above Lemma ensures that in the polynomial representation E(X), every codeword
corresponds to a global maximum (stable state). An interesting question is: does every local
maximum correspond to a codeword? This question is answered by the following theorem.
Theorem 3.5: Let C be a linear multidimensional block code, with G, H, E_C, and E as
defined above. Then E is a polynomial with the properties of E_C. That is, X corresponds to
a local maximum in E if and only if X ∈ C.
Proof: From the above Lemma, the global maximum of E is m^(n−l); thus every codeword is
a global (and thus a local) maximum. The converse follows from the fact that the tensor H
has a systematic form. Specifically, the last m^(n−l) variables in E, i.e. x_{i1, i2,..., î(n−l+1),..., în},
where the order indices î(n−l+1),..., în each assume m values, appear only in one term each.
That is, since I is an identity tensor in the parity tensor H, x_{i1,..., î(n−l+1),..., în} appears
only in the first term, and so on. Now, assume that a tensor V exists that corresponds to a
local maximum which is not a global maximum. That is, E(V) = L, where L < m^(n−l). Hence, at
least one term exists in E(V) that is not one. However, this term can be made one by flipping
the value of the index variables that appear in it. This contradicts the fact that V is
a local maximum. Q.E.D.
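Theorem 3.5 can be checked exhaustively in the one-dimensional special case. The systematic parity check matrix below, for an assumed [4, 2] binary code (an illustrative stand-in for the tensor H; "Hmat" is a hypothetical name), contributes one ±1 monomial per check row to E:

```python
import itertools

# Systematic parity check matrix of an assumed [4, 2] binary code:
# the last two columns form an identity block.
Hmat = [[1, 1, 1, 0],
        [0, 1, 0, 1]]

def E(x):
    """Polynomial representation: one {+1, -1} monomial per parity check."""
    total = 0
    for row in Hmat:
        term = 1
        for h, xj in zip(row, x):
            if h:
                term *= xj
        total += term
    return total

states = list(itertools.product([1, -1], repeat=4))
codewords = [x for x in states if E(x) == len(Hmat)]  # all checks equal +1

def is_local_max(x):
    """No single-component flip increases E."""
    for i in range(4):
        y = list(x)
        y[i] = -y[i]
        if E(y) > E(x):
            return False
    return True

local_maxima = [x for x in states if is_local_max(x)]
assert set(codewords) == set(local_maxima)   # Theorem 3.5, 1-d special case
```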
To summarize, given a linear code C, the algorithm for constructing a polynomial is as follows:
(1) Construct the systematic generator tensor of C by the standard techniques in
tensor algebra,
(2) Construct the systematic parity check tensor of C in accordance with (3.27),
(3) Construct E, which is the polynomial representation of H with respect to the all-
ones tensor. By the above Theorem 3.5, E_C = E.
In the following, generalizations of the above results are discussed. Also, some important
comments, remarks are provided.
(A) The construction just described also works for cosets of linear multidimensional
block codes. Let W be a tensor of dimension m and order (n − l) of the coefficients
of E. In the construction described above, the all-ones coefficient tensor was
chosen and it was concluded that E_C = E; it corresponds to the all-zero syndrome
tensor. Let C′ be a coset of C, and let T be the syndrome which corresponds to
C′. Utilizing the proof argument of Theorem 3.5, it can be proven that a one-to-
one correspondence exists between the local maxima of the polynomial representation
of the parity check tensor H with W = T and the tensors in the coset C′. Clearly,
the syndrome that corresponds to the code C is the all-ones tensor (by noting that
in the transformation in section 3, 0 goes to 1).
(B) The construction described in this section is a dual way of defining the maximum
likelihood decoding (MLD) problem (with respect to the one suggested in section (3)).
Consider a linear multidimensional block code defined by the parity check tensor
H. Given a tensor V, the maximum likelihood decoding (MLD) problem can be
defined as finding the local maximum in EC closest to V or, equivalently, finding
a local maximum of the energy function associated with the syndrome
(corresponding to V) that is achieved by a tensor of minimum weight.
The above results are generalized to some infinite dimension/order tensors in a
straightforward manner.
In the following section, the above results are generalized to non-binary codes.
Y_{j1,..., jn} = u^{Σ_{i1=1}^m ... Σ_{ik=1}^m (B_{i1,..., ik} G_{i1,..., ik; j1,..., jn}) mod p}    (3.32)
= Π_{i1=1}^m ... Π_{ik=1}^m u^{B_{i1,..., ik} G_{i1,..., ik; j1,..., jn}}
= Π_{i1=1}^m ... Π_{ik=1}^m X_{i1,..., ik}^{G_{i1,..., ik; j1,..., jn}}
In the first case, we consider the maximum likelihood decoding (MLD) problem with the metric being the Hamming distance between the tensors, while in the second case we consider the Lee distance.
The generalization for the case where the Hamming distance is utilized in the maximum
likelihood decoding (MLD) problem is based on the following well known Lemma.
Lemma 3.6: Let p be a prime, and let u = e^{j2π/p}. Assume k ∈ {0, 1, 2, ..., (p − 1)}. Then

(1/p) Σ_{m=0}^{p−1} u^{km} = 1, if k = 0, and 0 otherwise.        (3.34)
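The lemma is easy to verify numerically; the sketch below checks it for an assumed prime p = 7:

```python
import cmath

p = 7                              # any prime works
u = cmath.exp(2j * cmath.pi / p)   # primitive pth root of unity

for k in range(p):
    s = sum(u ** (k * m) for m in range(p)) / p
    expected = 1.0 if k == 0 else 0.0
    # (1/p) * sum of u^{km} is an indicator of k = 0 (mod p)
    assert abs(s - expected) < 1e-9
```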
The generalization is stated through the following theorem.
Theorem 3.6: Consider an (m, k; m, n) multidimensional linear block code over GF(p), with p being a prime. Let X → Y be the corresponding encoding procedure. Let E_W^p be the following multivariate polynomial representation of the generator tensor G with respect to an arbitrary received tensor W:

E_W^p(Y) = Σ_{l=0}^{p−1} ( W•_{i1,...,in} ⊗ Y_{i1,...,in} )^l        (3.35)

where W• denotes the complex conjugate of W and ⊗ denotes the inner product between the tensors. Then, the maximum likelihood decoding of W_{i1,...,in} is equivalent to finding the maximum of E_W^p(Y).
Proof: It follows by the same argument as in Theorem (3.4), adapted to the variables appearing in the polynomial E_W^p(Y), together with an application of the above Lemma. Q.E.D.
Stated in more explicit language, the essence of the above theorem leads to the following conclusion.
Given a received tensor W_{i1,...,in}, the closest codeword tensor (in Hamming distance) to W in C (the code utilized at the input to the multidimensional channel) corresponds to a tensor B if and only if

E_W(B) = Max over all tensors Y of E_W(Y) = Max over all tensors Y of Σ_{l=0}^{p−1} ( W•_{i1,...,in} ⊗ Y_{i1,...,in} )^l        (3.36)
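The equivalence in (3.36) can be illustrated by brute force. In the sketch below, a hypothetical generator matrix over GF(3) stands in for a (flattened) generator tensor, and the energy is accumulated componentwise over positions: each position contributes p when the symbols agree and 0 otherwise, so maximizing the energy over codewords minimizes the Hamming distance:

```python
import cmath
import itertools

p = 3
u = cmath.exp(2j * cmath.pi / p)
n, k = 4, 2
# hypothetical generator matrix over GF(3) (a one-dimensional stand-in for G)
G = [[1, 0, 1, 2],
     [0, 1, 2, 2]]

def encode(msg):
    # codeword component j = sum_m msg_m * G[m][j]  (mod p)
    return tuple(sum(m * g for m, g in zip(msg, col)) % p for col in zip(*G))

codewords = {encode(msg) for msg in itertools.product(range(p), repeat=k)}

def energy(w, y):
    # sum over positions and l of (conj(u^w) * u^y)^l ; equals p * (#agreements)
    total = 0.0
    for wi, yi in zip(w, y):
        z = (u ** wi).conjugate() * (u ** yi)
        total += sum((z ** l).real for l in range(p))
    return total

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

w = (1, 2, 0, 2)   # received word
best_by_energy = max(codewords, key=lambda c: energy(w, c))
best_by_distance = min(codewords, key=lambda c: hamming(w, c))
assert hamming(w, best_by_energy) == hamming(w, best_by_distance)
```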
Next, we consider the maximum likelihood decoding problem with respect to the Lee
distance. We first consider the cases where p = 3 or 5. In these cases, there are easy
expressions for the energy function. It is convenient to redefine the energy function in the
following manner:
Given an encoding procedure that maps a transmitted tensor X = (X_{i1,...,ik}) into a codeword tensor Y = (Y_{i1,...,in}), i.e.

X = (X_{i1,...,ik}) → Y = (Y_{i1,...,in})        (3.37)

and a tensor W = (W_{i1,...,in}) whose entries are pth roots of unity, we redefine the energy function as follows:

E_W(X) = Re( W•_{i1,...,in} ⊗ Y_{i1,...,in} )        (3.38)

where Re(x) denotes the real part of the complex number x, ⌊x⌋ denotes the integral part of the number x, and x• denotes the complex conjugate of x.
It should be noted that the energy function coincides with the one for p = 2 (in the case
u = –1). The definition of Lee distance is provided to facilitate the easier understanding of
further discussion.
Definition 3.6
The Lee weight of an m-dimensional tensor of order k, X = (X_{i1,...,ik}) with X_{i1,...,ik} ∈ Z_p, p a prime, is defined as

W_L = Σ_{i1=1}^{m} ... Σ_{ik=1}^{m} |X_{i1,...,ik}|        (3.39)

where

|X_{i1,...,ik}| = X_{i1,...,ik},       if 0 ≤ X_{i1,...,ik} ≤ ⌊p/2⌋
                  p − X_{i1,...,ik},   if ⌊p/2⌋ < X_{i1,...,ik} ≤ (p − 1)

The Lee distance between any two compatible tensors is defined as the Lee weight, W_L, of their difference.
With the above definition, we study the cases where p = 3, p = 5. From now, in the
following discussion, X → Y denotes the encoding procedure that defines a code
(multidimensional), and X , Y are tensors of dimension m and order k , n respectively, of
third or fifth roots of unity.
In the following, two new theorems are proved. The first one is the analogue of Theorem (3.4). It states that maximum likelihood decoding (MLD) in a ternary code is equivalent to the maximization of the energy function in (3.38). The Theorem is formally stated below:
Theorem 3.7: Let p = 3, A → B; then B is the closest multidimensional codeword (in the Hamming distance) to a received tensor word W if and only if

E_W(A) = Max_X E_W(X).        (3.40)

Proof: The proof is similar to that of Theorem (3.4) and is omitted for brevity. Q.E.D.
The proofs of Theorem (3.7) and Theorem (3.8) require the utilization of Lemma (3.6) and a clear understanding of when the energy function is maximized.
Multi/Infinite Dimensional Coding Theory: Multi/Infinite Dimensional Neural Networks 45
Theorem 3.8: Let p = 5, A → B; then B is the closest multidimensional codeword (in Lee distance) to a received tensor word W if and only if

E_W(A) = Max_X E_W(X).        (3.41)
In the theory of error control codes in one dimension, linear block codes were first extensively studied, and various problems, including the sphere packing problem, were subjected to intense theoretical investigation. The research and development led to various theoretical as well as practical encoding/decoding algorithms. Then, because it was thought that linear codes are limited from the point of view of various code parameters, such as the number of correctable errors/minimum distance (Ara), non-linear block codes were studied. The research in this direction culminated in the discovery of codes based on algebraic geometry techniques.
The encoding algorithm was generally easy from the point of view of theory as well as
physical hardware. It is the decoding algorithm which was considered difficult and was the
subject of intense investigations resulting in several decoders. The maximum likelihood
decoding (MLD) problem of linear codes and the relationship to energy functions (discussed
in the previous sections) naturally suggests a search for similar techniques to non-linear
codes. In the following, non-linear multidimensional codes are investigated.
The essential idea in generalizing the results in previous section to non-linear
multidimensional codes is to consider the representation of Boolean functions as polynomials
over the field of real numbers. In the context of one dimensional non-linear codes, part of the
discussion is known (BrB) and is repeated here for the sake of completeness. Also, the utilization of some subtle ideas associated with tensor products makes the presentation an essential aid for realizing that non-linear multidimensional codes share various features with linear codes.
Definition 3.7
A Boolean function f on n variables, is a mapping
f : {0,1}n → {0,1} (3.44)
For the present discussion, it is useful to define Boolean functions using the symbols 1
and –1 instead of the symbols 0 and 1, respectively.
Definition 3.8
A Hadamard matrix of order m, denoted by H_m, is an m × m matrix of +1's and −1's such that

H_m H_m^T = m I_m,        (3.45)

where I_m is the m × m identity matrix. The above definition is equivalent to the assertion that any two distinct rows of H_m are orthogonal.
Hadamard matrices of order 2^k exist for all k > 0. The construction is as follows:

H_1 = [1]

H_2 = [ 1   1 ]
      [ 1  −1 ]

H_{2^{n+1}} = [ H_{2^n}   H_{2^n} ]        (3.46)
              [ H_{2^n}  −H_{2^n} ]
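The Sylvester construction in (3.46) can be sketched in a few lines; the check below confirms the defining property H_m H_m^T = m I_m for m = 4:

```python
def hadamard(n):
    """Sylvester construction of a 2^n x 2^n Hadamard matrix as nested lists."""
    H = [[1]]
    for _ in range(n):
        # [[H, H], [H, -H]] doubling step
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H

H4 = hadamard(2)

# any two distinct rows are orthogonal, and each row has squared norm 4
for i in range(4):
    for j in range(4):
        dot = sum(H4[i][c] * H4[j][c] for c in range(4))
        assert dot == (4 if i == j else 0)
```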
Definition 3.9
Given a Boolean function f of order n, P_f is a polynomial (with coefficients over the field of real numbers) equivalent to f if and only if, for all vectors X ∈ {1, −1}^n,

f(X) = P_f(X).        (3.47)
An important problem that is relevant to the investigation of non-linear multidimensional
codes is the following:
Given a Boolean function f of order n, compute P_f, the polynomial which is equivalent to f.
From the results in section 3, it is evident that the components of the codeword tensor (of a linear code), in the {1, −1} representation, are Boolean functions (monomials) in the components of the information tensor.
Theorem 3.9: Let f be a Boolean function of order less than or equal to m^n (in the components of a tensor X of dimension m and order n). Let P_f be a polynomial equivalent to f. Let B denote the tensor of coefficients of P_f. Let P denote the tensor of the at most 2^{m^n} values of P_f (corresponding to the {+1, −1} assignments of the m^n components of tensor X). Then,
(1) the polynomial P_f always exists and is unique,
(2) the following relationship is satisfied:
P = G ⊗ B, where ⊗ denotes the inner product of tensors.
Proof: The proof is constructive in nature. The essential idea is to determine the
coefficients of the polynomial by solving a system of linear equations, possibly imbedded
in tensors.
First, let us consider a Boolean function f of one variable and let us determine the
coefficients of the polynomial Pf .
P_f(x) = b_0 + b_1 x        (3.48)

Evaluating the polynomial on the domain of the Boolean function, we have

P_f(1) = b_0 + b_1        (3.49)
P_f(−1) = b_0 − b_1        (3.50)

Thus, P = G ⊗ B, where G = [ +1  +1 ]        (3.51)
                            [ +1  −1 ]

G is a Hadamard matrix and B, as defined before, is the vector of coefficients of P_f.
Remark
Before proceeding with the proof, the following comparison of the similarities and differences between tensor products and matrix products is relevant. Consider a matrix G and a column vector B. The tensor product, when the variables (matrix, column vector) are treated as tensors, is given by
G ⊗ B = G_{i,j} B_k = P_{ijk}  —CONTRACTION→  P_i

P_{ijk} = [ G_11 B_1   G_11 B_2   G_12 B_1   G_12 B_2 ]        (3.52)
          [ G_21 B_1   G_21 B_2   G_22 B_1   G_22 B_2 ]

Now, we perform contraction on certain indices of the tensor product. The resulting tensor is a first order tensor. Specifically, suppose we do the contraction over the indices j, k. Then, we have

P_i = [ G_11 B_1 + G_12 B_2 ]        (3.53)
      [ G_21 B_1 + G_22 B_2 ]
Thus, the tensor product, in contrast to the matrix product allows more freedom in
summing the components over different indices (contraction over different indices in the
language of tensor algebra) of the tensor.
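The contrast between the outer product and its contraction can be sketched with numpy's einsum (the numerical values are illustrative only):

```python
import numpy as np

G = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([5.0, 6.0])

# Tensor (outer) product: a third-order tensor P_ijk = G_ij * B_k
P = np.einsum('ij,k->ijk', G, B)
assert P.shape == (2, 2, 2)

# Contraction over the indices j and k (setting k = j and summing)
# recovers the ordinary matrix-vector product
P_i = np.einsum('ijj->i', P)
assert np.allclose(P_i, G @ B)
```

The point of the sketch: the outer product retains all index pairings, and different choices of contracted indices yield different first-order tensors, of which the matrix product is only one.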
Now, we return to the original proof.
The above argument is now generalized to less than or equal to m n variables ( or arbitrary
finite/countable number of variables which are possibly the components of a tensor ) by
the method of mathematical induction.
The case m = 1, n = 1 is proved at the beginning of the proof. Since m^n is still a large (finite) number, say l, it is sufficient as well as necessary to prove the result for a finite
number l ( in the case considered, the binary variables are imbedded inside a tensor. Also,
the polynomial representing the Boolean function is expressed through inner product
operation over appropriate tensors ).
Now, as an induction hypothesis, assume that the claim is true for l variables:

P = G_{2^l} ⊗ B        (3.54)

Since every polynomial of (l + 1) variables can be written as a combination of two polynomials, each of l variables, we have

P_f(X_1, X_2, ..., X_{l+1}) = P_{f1}(X_1, X_2, ..., X_l) + X_{l+1} P_{f2}(X_1, X_2, ..., X_l)        (3.55)

There are two possibilities: either X_{l+1} = +1 or X_{l+1} = −1. Hence, by the induction hypothesis (3.54), the system of linear equations in (l + 1) variables becomes

P = [ G_{2^l}   G_{2^l} ] B        (3.56)
    [ G_{2^l}  −G_{2^l} ]

that is, P = G_{2^{l+1}} B.
Hadamard matrices are non-singular; thus, for any given f, a unique Pf exists (defined
by a vector of coefficients).
In the language of tensor algebra, the same argument holds true, except that the tensor (the one utilized to couple the coefficients of the polynomial representing a Boolean function to the values of the polynomial) can have '0' (zero) entries in addition to +1, −1 entries (when contraction is performed over the appropriate indices). The uniqueness of such a polynomial is ensured by its uniqueness as a representation of the Boolean function (from the discussion/proof above). Thus, in the tensor algebra notation, we have
P = G ⊗B (3.58)
where ⊗ denotes the inner product of two tensors. Q. E. D.
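A small sketch of the constructive proof: since Sylvester matrices are symmetric with H H^T = 2^l I, the coefficient tensor B can be recovered from the value tensor P by B = (1/2^l) H P. XOR in the {+1, −1} convention (the product x1·x2) is used as an assumed example function, so only the coefficient of the full monomial should survive:

```python
import itertools
import numpy as np

def hadamard(l):
    # Sylvester construction of H_{2^l}
    H = np.array([[1.0]])
    for _ in range(l):
        H = np.block([[H, H], [H, -H]])
    return H

l = 2
H = hadamard(l)

# values of f on {+1,-1}^2, enumerated so the ordering matches the recursion
values = np.array([x1 * x2
                   for (x1, x2) in itertools.product([1, -1], repeat=l)])

# P = H B  =>  B = H^{-1} P = (1/2^l) H P  (H symmetric, H H^T = 2^l I)
coeffs = H @ values / 2 ** l

# for XOR, only the coefficient of the full monomial x1*x2 is nonzero
assert np.allclose(coeffs, [0, 0, 0, 1])
```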
It should be clear that the above representation theory has relevance to the minimum
sum of products representation of a Boolean function. The above theory, as is easily seen
holds true, if one is interested in finding the equivalent polynomial of a Boolean function
which assumes {0,1} values. One way to see the result is by the following claim.
CLAIM: Every monomial over {1, −1} can be written as a polynomial over {0, 1} by the change of variable (BrB), x = 1 − 2u, as follows:

Π_{i=1}^{k} X_i = 1 + Σ_{i=1}^{k} (−2)^i Σ_{S_i} Π_{j∈S_i} U_j        (3.59)

where S_i ranges over the subsets of {1, 2, ..., k} of cardinality i.
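The claim is easily checked exhaustively for small k; the sketch below verifies (3.59) for k = 3 over all {0, 1} assignments:

```python
import itertools

k = 3  # verify the identity for monomials of 3 variables

for U in itertools.product([0, 1], repeat=k):
    X = [1 - 2 * u for u in U]     # change of variable x = 1 - 2u
    lhs = 1
    for x in X:
        lhs *= x
    # right side: 1 + sum over subset sizes i of (-2)^i * (sum over size-i subsets)
    rhs = 1
    for i in range(1, k + 1):
        for S in itertools.combinations(range(k), i):
            term = 1
            for j in S:
                term *= U[j]
            rhs += (-2) ** i * term
    assert lhs == rhs
```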
Σ_{j=1}^{L} X_j ⊗ A_j = 0,        Σ_{j=1}^{∞} X_j ⊗ A_j = 0        (3.62)
where X and {A_j} are tensors of compatible dimension and order such that the inner/outer
product operations are well defined. The solution techniques developed in (Rama 11)
when the linear operators are matrices are extended to the tensor linear operator case
in (Rama 6). Also, various results that are well documented in the books such as (Gol)
for matrix polynomials based on the properties of matrix linear operator are extended
to tensor linear operator. Furthermore, in one dimensional system theory, various results
are developed for systems of matrix polynomial equations utilizing only linear operator
properties of a matrix. These results are extended to systems of tensor polynomial
equations (Rama 3). In (Rama 6), the author formulates as well as solves the problem
of determination of tensor variate zeroes of multi-tensor variate polynomial, power
series equations
Σ_{i1=0}^{L} ... Σ_{im=0}^{L} X_1^{i1} ⊗ X_2^{i2} ⊗ ... ⊗ X_m^{im} ⊗ A_{i1,...,im} = 0

Σ_{i1=0}^{∞} ... Σ_{im=0}^{∞} X_1^{i1} ⊗ X_2^{i2} ⊗ ... ⊗ X_m^{im} ⊗ A_{i1,...,im} = 0        (3.63)
Various other associated results are documented in (Rama 6). It is well known that the
zeroes of a uni-variate scalar polynomial constitute a group. By utilizing the set of zeroes
of a determinental polynomial associated with the uni-variate/multi-variate (tensor
variables) polynomial, the set of tensor zeroes are divided into certain set of equivalence
classes. Thus, a group structure is imbedded onto the linear subspace of tensor zeroes of
uni-variate/multi-variate polynomial equations.
Unlike the multivariate polynomials (whose terms/monomials are based on the components of tensors) optimized in sections 3, 4, 5, 6, in view of the above results a natural question arises: do the local optima of multi-tensor variate polynomials (each variable being a tensor) over various subsets of the multidimensional (very high dimensional) lattice lead to codeword sets with better properties? When the information tensor,
generator tensor, codeword tensors are blocked into sub-tensors and the objective function
for the optimization problem over a subset of multidimensional lattice is rewritten, it is
evident that a multi-tensor variate polynomial appears. Thus, such polynomials are
subsumed in the ones considered in sections 3, 4, 5, 6.
Consider the unconstrained 0-1 nonlinear programming problem

Maximize Σ_{i=1}^{n} W_i Π_{j∈S_i} X_j        (3.64)

where S_i is a subset of {1, 2, ..., n} and X_j ∈ {0, 1}. Thus, the problem is concerned with optimizing
a multivariate polynomial, whose variables assume integer values. By the discussion, in this
section, every polynomial over {1, –1} can be transformed to an equivalent one over {0,1} by
a change of variable. It is shown in section 2, that a special case of the above problem i.e.
maximization of a quadratic form in {1, –1} variables arises in connection with the
determination of global optimum stable state of a neural network and is equivalent to the
minimum cut problem. This problem is known to be an NP hard problem.
The problem in (3.64) was studied extensively by various researchers and the main effort
concentrated in identifying the special cases which are solvable in polynomial time and in
devising approximation techniques. The most common technique for solving the
unconstrained {0, 1} program of the form in (3.64) is by transforming them to the problem of
finding the maximum weight independent set in a graph, which is an NP-hard problem. The
problem in (3.64) is transformed to the problem of finding the maximum weight independent
set by using the concept of a conflict graph of a 0-1 polynomial. In (BrB), it is shown how
decoding techniques can be utilized to maximize 0-1 nonlinear programs.
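For small instances, problem (3.64) can be solved by exhaustive enumeration, which makes the structure of the objective concrete. The weights and index sets below are illustrative assumptions only:

```python
import itertools

# A small instance of (3.64): maximize sum_i W_i * prod_{j in S_i} X_j
# over X in {0,1}^n.  Weights W and index sets S are illustrative.
n = 4
W = [3, -2, 5, 1]
S = [{0, 1}, {1, 2}, {2, 3}, {0, 3}]

def objective(X):
    # a monomial prod_{j in S_i} X_j over {0,1} is 1 iff all its variables are 1
    return sum(w * (1 if all(X[j] for j in s) else 0) for w, s in zip(W, S))

best = max(itertools.product([0, 1], repeat=n), key=objective)
```

For this instance the all-ones assignment is optimal with value 7; dropping the negative term always costs more than it saves.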
The multidimensional version of the 0-1 nonlinear programming problem in (3.64) is
given by
Maximize W ⊗ X        (3.65)
where W, X are tensors containing the known coefficients w ‘s in W and the monomials in
the variable components of the unknown tensor X. The inner product between these two
tensors provides the scalar objective function whose variables are allowed to assume only
{0, 1} or more generally finitely many values. It is shown in (Rama 6) that such an integer
programming problem can be solved utilizing the multidimensional decoding techniques
for linear block multidimensional codes. These results in operations research are avoided
here and relegated to (Rama 6).
In one/two independent dimensions, various static optimization problems are solved under
the sub-fields of optimization theory such as (a) linear programming, (b) non-linear
programming, (c) calculus of variations, (d) combinatorial optimization etc. With the
innovative idea of formulating and solving the parallel problems in multidimensions (Rama
3) through the utilization of tensor linear operator (motivated by practical applications),
vast literature in multidimensional optimization theory is generated. Various consequences of
this innovative idea of the author are fully explained in the companion research article
(Rama 3) on dynamic optimization. In the following, some innovative ideas of generic
consequence in static optimization are described.
In view of the results in section 5, the constraint set over which a multivariate
polynomial (terms of the polynomial expressed in terms of the components of a generator
tensor, received tensor in the case of MLD) is optimized is a subset of the multidimensional
lattice (or bounded lattice, say, in multidimensions) and subsumes the multidimensional
hypercube as its subset. These results naturally lead to a question as to whether it is
possible to utilize the results in sections 3, 4, 5, 6 for optimizing multivariate polynomials
over more general constraint sets in multidimensions. In the following theorems, constrained
optimization over more general constraint sets utilizing the results of sections 3, 4, 5, 6 is
discussed.
Theorem 3.11: Consider a compact set in a multidimensional metric space. The local
optimum of a multivariate polynomial (with the terms/monomials expressed in terms of
the components of tensors and assuming binary/finitely many integer values) whose
variables are allowed to assume finitely many values, over the compact set, occurs at the
union of codewords of finitely many multidimensional non-binary/binary codes.
Proof: From real/complex analysis (also topology), we have the Heine-Borel Theorem, which states that every open covering of a compact set (in the space described by multiple independent variables) has a finite sub-covering. The covering generally consists of open sets.
Remark
Suppose, the compact set/open set (in multidimensions) is covered by finitely/countably
many hyperspheres (multidimensional) and a quadratic/higher degree form is optimized.
By the spectral representation theorem, the local optima of quadratic/higher degree form
occur at the eigentensors with the eigenvalues being the corresponding values. This
corresponds to L 2 norm based optimization.
The above theorems illustrate two essential ideas of generic utility in static
optimization: (a) optimization over more general constraint sets, (b) decomposition principle.
Decomposition Principle
Consider an arbitrary constraint set over which an objective function is optimized in
one or more independent dimension variables. The constraint set is decomposed into the
union of finitely many special sets with interesting structure. Optimization of various
objective functions over the special sets has various interesting features: (a) various results
are well known, (b) the local optima have interesting structure, (c) it is thoroughly studied
etc. Utilizing these features, optimization of any objective function over the original set is
decomposed into simpler problems. The above two theorems are only illustrative.
The discovery and application of the above decomposition principle to
multidimensional constrained optimization problems naturally led the author to investigate
various other innovative ideas in static optimization.
(I) Approximation of Objective Function by Polynomials, Power Series ( other Special
Classes of Functions)
Polynomials and power series (uni/multi-variate) are very important classes of functions.
The optimization results (unconstrained as well as constrained) associated with these functions
enable one to derive the local optimum of some classes of functions over various constraint
sets invoking standard theorems (from approximation theory). For instance, the following
theorem enables deriving results on continuous objective functions utilizing polynomials:
Theorem 3.6 is utilized in association with the following theorem.
Theorem 3.13: Every continuous function over a compact set attains its maximum/minimum over the set. Moreover, every continuous function on a compact set can be approximated arbitrarily closely by polynomials (multivariate/univariate).
Also, invoking the standard theorems from approximation theory, various classes of
functions are arbitrarily closely approximated by polynomials: uni-variate/multi-variate.
Thus, when these functions are utilized as objective functions, results associated with
polynomials (derived in sections 3-6 ) are invoked.
(II) Discovery of new local/global optimization techniques
This requires utilizing either new classes of functions or new constraint sets. The
constraint set structure renders the local optima of some functions with interesting structure
and also the properties satisfied by the objective functions enables discovering efficient
techniques.
NP-Hard Problems:
In computer science, operations research and other applied/theoretical research fields,
various NP-hard problems are well identified and studied. It is well known that one NP-
hard problem is as complex ( in the terminology of complexity theory in theoretical computer
science) as any other NP-hard problem. Finding algorithms which are efficient (in terms of
complexity) for an NP-hard problem is well recognized as a difficult problem. The following
is a difficult open problem in theoretical computer science:
Problem: Does a polynomial time algorithm exist for an NP-hard problem? In other words, is the class of problems in NP the same as the class of problems in P, i.e. is P = NP?
In the following, an innovative algorithm/approach to solve various NP-hard problems
in one dimension is described. The multidimensional generalization of this algorithm/
approach to any NP-hard problem (in multidimensions) is being formalized. It is an
extension of the following results to multidimensions.
In section 2, the problem of computation of minimum cut in a graph is shown to be
equivalent to the problem of determining the global optimum of the energy function of a
neural network i.e. maximizing a quadratic form over the hypercube. It is well known that
this is an NP-hard problem. In the following, an attack on this problem is described.
Positive Definite Synaptic Weight Matrix: Determination of Global Optimum Stable State of a
Neural Network:
Consider a neural network whose synaptic weight matrix is symmetric as well as
positive definite. In the following, an algorithm to determine the global optimum stable
state of such a neural network is described.
(a) Utilizing a well known theorem in linear algebra, every positive definite symmetric matrix S can be decomposed into the following form by means of the Cholesky decomposition:
S = N NT (3.66)
where N is a lower triangular matrix.
(b) The quadratic form being optimized by the neural network over the hypercube can be expressed in the following form:

X^T S X = X^T N N^T X = Y^T Y, where Y = N^T X.        (3.67)

Since S is positive definite, X^T S X > 0 for X ≠ 0; thus Y^T Y > 0. The scalar expression for the quadratic form in terms of Y is given by Σ_{j=1}^{n} Y_j^2.
For the NP-hard problems under consideration (e.g. minimum cut computation in an undirected graph, the knapsack problem, etc.), the complexity of the algorithm is determined by
(a) The complexity of computing the Cholesky decomposition of a positive definite symmetric matrix. Since there are various polynomial time procedures for the Cholesky/spectral decomposition, computationally efficient, well studied algorithms are available,
(b) Solving the linear programming problems related to optimization of linear forms
( maximization or minimization whichever leads to a larger value for the term)
over the hypercube. It is well known that there are polynomial time algorithms
for linear programming problems.
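The Cholesky-based reduction described above can be sketched as follows, with a randomly generated positive definite matrix standing in for the synaptic weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
S = M @ M.T + n * np.eye(n)        # a positive definite symmetric matrix

# (a) Cholesky decomposition: S = N N^T with N lower triangular
N = np.linalg.cholesky(S)
assert np.allclose(N @ N.T, S)

# (b) over the hypercube, the quadratic form becomes a sum of squares:
#     X^T S X = Y^T Y with Y = N^T X
X = rng.choice([-1.0, 1.0], size=n)
Y = N.T @ X
assert np.isclose(X @ S @ X, Y @ Y)
```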
In some problems that arise in operations research, communication theory etc., the constraint set is a convex polygon/polytope (the convex hull of various finite structures, leading to convex sets bounded by hyperplanes) and a quadratic/higher degree form is optimized over the constraint set. Then, by means of a spectral/Cholesky type
decomposition of the positive definite symmetric linear operator (in one as well as
multidimensions), various linear programming problems are solved through efficient
polynomial time procedures. The complexity analysis of such procedures, and efficient algorithms for NP-hard problems in one and multiple dimensions, are being documented. When the connection matrix has other special structure, efficient algorithms are found.
finding the maximum of the energy function E of a neural network defined by the graph G (the weights on the edges of G are given by W_i = (−1)^{y_i}), with all its threshold values equal to zero.
But, it is well known that the local optima of a quadratic form over the hypersphere occur at the eigenvectors (eigentensors, in the case of a symmetric second order tensor) of the symmetric matrix associated with the quadratic form, with the value of the quadratic form being the corresponding eigenvalue. Thus, the maximum eigenvector of the symmetric matrix maximizes the quadratic form over the hypersphere, and the sign structure (the signs of the components) of the maximum eigenvector is utilized as the initial condition to run the neural network. Mathematically, let X_0 be the vector given by

X_0 = Sign(X_max), where X_max is the normalized maximum eigenvector and X_0 is the initial state in which the neural network starts. A is the symmetric connection matrix.
The analysis of the hop-and-skip algorithm is provided below.

X^T A X = (X − X_0 + X_0)^T A (X − X_0 + X_0)        (3.68)
        = (X − X_0)^T A (X − X_0) + X_0^T A X_0 + 2 X_0^T A (X − X_0)
        = λ_max + (X − X_0)^T A (X − X_0) + 2 X_0^T A (X − X_0)
        = λ_max + (X − X_0)^T A (X − X_0) + 2 λ_max X_0^T (X − X_0)
        = −λ_max + (X − X_0)^T A (X − X_0) + 2 λ_max X_0^T X        (3.69)

where the simplifications treat X_0 as an approximate maximum eigenvector, i.e. A X_0 ≈ λ_max X_0 with X_0^T X_0 = 1.
The above manipulations enable one to compare the value of the quadratic form on the
hypercube at any discrete time instant against the maximum value on the unit hypersphere.
The particular choice of initial condition, minimizes the Hamming distance between the
maximum eigenvector and the initial condition vector to run the neural network.
The set of eigenvectors of the connection matrix of neural network span the entire
space or a subspace of it. Similarly, the set of stable states/ stable vectors span the space
or a sub-space. To determine the maximum stable state, the essential idea of the above
approach is to find the vector closest to the maximum stable state and utilize it as the
initial condition to run the neural network. Detailed analysis of the algorithm is being
investigated.
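The initialization described above can be sketched as follows. The update loop is a greedy coordinate ascent on the quadratic form, used here as a stand-in for the serial-mode network dynamics; it terminates because each accepted flip strictly increases the energy over a finite state space:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                  # symmetric connection matrix, zero thresholds

# initial condition: sign structure of the maximum eigenvector
vals, vecs = np.linalg.eigh(A)     # eigenvalues in ascending order
x = np.sign(vecs[:, -1])
x[x == 0] = 1.0

def energy(v):
    return v @ A @ v

# serial updates: flip a component whenever it raises x^T A x
improved = True
while improved:
    improved = False
    for i in range(n):
        flipped = x.copy()
        flipped[i] = -flipped[i]
        if energy(flipped) > energy(x):
            x = flipped
            improved = True
# x is now a local maximum of the quadratic form over the hypercube
```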
Dynamic Optimization
In (Rama 3), certain multidimensional system, in discrete/ continuous time is described
by the following state space representation through tensors:
Discrete Time:
X(n + 1) = A(n) ⊗ X(n) + B(n) ⊗ U(n),
(3.70)
Y(n) = C(n) ⊗ X(n) + D(n) ⊗ U( n).
Continuous Time:
Ẋ(t) = A(t) ⊗ X(t) + B(t) ⊗ U(t),
(3.71)
Y(t) = C(t) ⊗ X(t) + D(t) ⊗ U(t).
where ⊗ denotes the inner product between compatible tensors in the system description
in continuous/discrete time. Utilizing this state space representation, the author formalized
a unified theory of control, communication and computation in multi/infinite dimensional
systems, first discovered in (Rama1) for one dimensional systems. This theory enabled the
author to develop a highly advanced version of the theory of the evolution of life from organic matter. In this theory, the author reasons that various body organs and functions of living systems have evolved over time and that biological systems are organic/inorganic matter based dynamical systems.
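A discrete-time update of the form (3.70) can be sketched with einsum, with the inner product ⊗ realized as contraction over a trailing index pair (the tensor shapes below are illustrative assumptions, not fixed by the theory):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical shapes: the state X is a 2nd-order tensor (3 x 3); the coupling
# tensors A, B are 4th-order so that "A ⊗ X" (contraction over two indices)
# is again a 3 x 3 tensor.
X = rng.standard_normal((3, 3))
U = rng.standard_normal((3, 3))
A = rng.standard_normal((3, 3, 3, 3))
B = rng.standard_normal((3, 3, 3, 3))

def step(X, U):
    # one discrete-time update X(n+1) = A ⊗ X(n) + B ⊗ U(n)
    return np.einsum('ijkl,kl->ij', A, X) + np.einsum('ijkl,kl->ij', B, U)

X_next = step(X, U)
assert X_next.shape == (3, 3)
```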
3.8 CONCLUSIONS
Tensor linear spaces over finite fields are utilized to describe and study the structure/
properties of multi/infinite dimensional linear codes. The three concepts: multidimensional
neural/generalized neural networks, multidimensional codes, multivariate polynomial
(terms/monomials being expressed in terms of the components of generator, other tensors)
optimization over various subsets of lattice, are related.
It is shown that (a) the problem of maximum likelihood decoding of error correcting
codes (multidimensional), (b) finding the global maximum of the energy function of neural/
generalized neural networks, and (c) solving integer/non-linear programming problems
in multidimensions are related. The equivalence is proved for binary as well as non-binary
cases. This equivalence naturally suggests utilizing the solvable cases of one problem to
the equivalent problem and vice versa. Full capitalization of equivalence leads to various
new results (Rama 6).
The programming problem of multidimensional neural networks is solved. Several
new heuristic procedures for NP-hard problems in multidimensions are suggested from
the equivalence. The decoding techniques of various (multidimensional extensions of one
dimensional codes) codes are utilized to find approximate solutions of NP-hard problems.
Various innovative results in static optimization are described. Infinite dimensional
generalization of the results is briefly described.
REFERENCES
(Ara) B. Arazi, "Common Sense Approach to the Theory of Error Correcting Codes," MIT Press.
(BoT) A.I. Borisenko and I.E. Tarapov, "Vector and Tensor Analysis with Applications," Dover Publications Inc., New York, 1968.
(BrB) J. Bruck and M. Blaum, "Neural Networks, Error Correcting Codes and Polynomials over the Binary Hypercube," IEEE Transactions on Information Theory, Vol. 35, No. 5, September 1989.
(Gaal) Gaal, "Group Theory," Academic Press, 1982.
(Gol) I. Goldberg, "Matrix Polynomials," Academic Press, 1972.
(Rama 1) Garimella Rama Murthy, "Unified Theory of Control, Communication and Computation—Part 1," manuscript to be submitted to the IEEE Proceedings.
(Rama 2) Garimella Rama Murthy, "Multi/Infinite Dimensional Neural Networks, Multi/Infinite Dimensional Logic Theory, Logic Synthesis," International Journal of Neural Systems, Vol. 15, No. 3, pp. 223-235, 2005.
(Rama 3) G. Rama Murthy, "Optimal Control, Codeword, Logic Function Tensors: Multidimensional Neural Networks," International Journal of Systemics, Cybernetics and Informatics, October 2006, pp. 9-17. See also Chapter 4.
(Rama 4) Garimella Rama Murthy, "Multi/Infinite Dimensional Logic Synthesis," manuscript to be submitted to the IEEE Transactions on Computers.
(Rama 5) Garimella Rama Murthy, "Signal Design for Magnetic and Optical Recording Channels," Bellcore Technical Memorandum, TM-NWT-018026.
(Rama 6) Garimella Rama Murthy, "Tensor Variate Polynomials/Power Series, Tensor based Functions, Tensor Algebraic Geometry: Optimization," manuscript to be submitted to the Transactions of the American Mathematical Society.
(Rama 10) Garimella Rama Murthy, "Unified Theory of Control, Communication and Computation: Dynamical Systems," manuscript in preparation.
(Rama 11) Garimella Rama Murthy, "Transient and Equilibrium Analysis of Computer Networks: Finite Memory and Matrix Geometric Recursions," Ph.D. Thesis, Purdue University, West Lafayette, Indiana.
(Rama 12) Garimella Rama Murthy, "Origin of Universe: Living/Non-Living: Grand-unification Theory of Universe," manuscript in preparation.
CHAPTER 4
Tensor State Space Representation: Multidimensional Systems
4.1 INTRODUCTION
Through the efforts of researchers in electrical engineering, linear system theory began with abstract models of arbitrary linear systems expressed as forced/unforced nth order difference equations in discrete time and differential equations in continuous time. Such representations are called input-output representations of the linear system. The evolution equations of these arbitrary systems (electrical, mechanical, chemical, hybrid) were then converted into first order difference/differential equations in the state, control, input and output vectors through state, input and output coupling matrices. Such a representation is called the state space representation. The state space equations take the following form (Gop):
Discrete Time Systems:
X(n + 1) = A(n) X(n) + B(n) U(n),
Y(n) = C(n) X(n) + D(n) U(n).
One of the main tools in the design and analysis of one dimensional linear dynamic as well as static systems is linear algebra. Motivated by practical applications in image processing and other fields, system theorists proposed various input-output models for two/multidimensional systems. Models exhibiting quarter plane causality were initially investigated from the input-output point of view (BiF) in the framework of two dimensional filter theory, where two dimensional filters are represented by proper rational functions in two indeterminates of the following type:
W(Z_1, Z_2) = \frac{\sum_{i+j \ge 1} n_{ij} Z_1^i Z_2^j}{1 + \sum_{i+j \ge 1} d_{ij} Z_1^i Z_2^j}    (4.2)
The idea of associating two dimensional state space models with two dimensional filters arose very naturally. However, from the beginning it appeared that the canonical technique based on the Nerode equivalence leads to an infinite dimensional state space. The reason was the use of a matrix as the linear operator describing the state dynamics. So, following some heuristic procedures, several finite dimensional models were introduced (BiF), in which two notions of state play different roles:
1. Local states: X(h, k) belong to a finite dimensional vector space. They enter into the state updating equation and determine the value of the output.
Y(h, k) = C X(h, k)    (4.5)
In Attasi's model, A1 and A2 are commuting matrices with A1A2 = –A0. It realizes separable filters only and constitutes an interesting second order model, as the underlying theory is very close to the one dimensional theory (BiF).
Recently, the behavior approach has been extended to two dimensional systems. Following this theory, a two dimensional system is defined by a family β of admissible functions (the behavior), defined over the discrete plane. These functions are characterized by the property of belonging to the kernel of a polynomial matrix M(Z_1, Z_2) in two variables:

β = \{ ω = \sum_{i,j \in \mathbb{Z}} w_{ij} z_1^i z_2^j : M ω = 0 \}    (4.6)
Associated with the external description provided by the behavior, different internal representations can be given by introducing so called latent variable models. State variable models constitute a particular type of latent variable model that holds the memory of the system with respect to the notion of past introduced on Z × Z. When a state description is possible, i.e. when the notions of past, present and future are allowed by the structure of β, the behavior is called Markovian. Since there is no natural direction of evolution in Z × Z, the Markovian property appears more general than the familiar quarter plane causality and has been exploited in the analysis of non-causal two dimensional dynamics.
Also, various static systems that involve simple linear transformations in multidimensional space were previously abstracted utilizing the matrix linear operator. Such systems arise in practical applications such as databases (modeling storage of multiple attribute trees), computerized tomography, etc. The techniques developed for the design and analysis of such systems were thus very elementary.
The above efforts in two/multidimensional system theory primarily utilized the matrix linear operator on an n-dimensional vector space (in one independent variable). System theorists did not realize that utilization of the tensor linear operator (in multiple dimensions) could lead to the design and analysis of a large class of multidimensional systems.
In the following areas, utilization of the tensor linear operator to describe the multi/infinite dimensional state space enables one to formulate new problems, introduce new concepts and derive new results/theorems. Some of the areas of interest where such an idea could be utilized are:
(1) Multi/Infinite dimensional computation theory,
(2) Multi/Infinite dimensional information/communication/coding theory,
(3) Multi/Infinite dimensional rate distortion theory,
(4) Multi/Infinite dimensional stochastic systems—Theory of Markov random fields,
(5) Multi/Infinite dimensional time series analysis,
(6) Multi/Infinite dimensional digital signal processing,
(7) Theory of Multi/Infinite dimensional connectionist structures—graphoids,
(8) Theory of databases utilizing multidimensional storage,
(9) Matroid theory,
(10) Multi/Infinite dimensional Game theory.
By the utilization of the idea of capturing a multidimensional state space through a
tensor linear operator, new research problems can be formulated and solved.
Remark: Notation
In tensor notation, the "dimension" of a tensor stands for the number of values each independent variable (index) assumes, whereas the "order" represents the number of independent variables, i.e. indices.
Definition
A dynamical system is linear if and only if, given any two points (scalar, vector or tensor variables) in the input space, say U1 and U2, and any two scalar (real or complex) constants C1 and C2, the following property is satisfied by the transformation L describing the dynamical system:

L(C1 U1 + C2 U2) = C1 L(U1) + C2 L(U2),  C1, C2 ∈ C or R or any field    (4.7)

If the above property is violated by the dynamical system, we call it a non-linear system.
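As an illustration, the superposition property (4.7) can be checked numerically for a transformation realized as a tensor contraction. The NumPy-based construction and all sizes below are illustrative assumptions, not from the text.

```python
import numpy as np

# A "linear dynamical map" L realized as contraction of a fixed coupling
# tensor A with the input tensor U (an assumed, illustrative construction).
rng = np.random.default_rng(0)

m, r = 3, 2                                 # dimension m, order r of inputs
A = rng.standard_normal((m,) * (2 * r))     # order-2r coupling tensor

def L(U):
    # contract the last r indices of A against all r indices of U
    return np.tensordot(A, U, axes=r)

U1 = rng.standard_normal((m,) * r)
U2 = rng.standard_normal((m,) * r)
c1, c2 = 2.5, -0.7

lhs = L(c1 * U1 + c2 * U2)
rhs = c1 * L(U1) + c2 * L(U2)
assert np.allclose(lhs, rhs)   # superposition holds: L is linear, as in (4.7)
```

Any map built purely from tensor contractions with fixed coupling tensors satisfies (4.7) exactly; introducing, say, a componentwise square would break the assertion.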
Conventionally, in multidimensional (multi-order may be more appropriate, but that term is not utilized by the author) system theory, in the case of a discrete time dynamical system (an example is provided in Section 2), the evolution is described by means of local state, local control, local input and local output variables. This is very cumbersome. In the case of certain multidimensional systems, the state space representation by means of tensors (described below) enables one to compactly capture a higher order difference equation through tensor notation.
In order to describe the tensor state space representation, the following concepts/ideas from tensor analysis are explained.
where A(n) is an m dimensional tensor of order 2r (called the state coupling tensor), X(n) is the state of the dynamical system at the discrete time index n, and X(n + 1) is the state of the system at the discrete time index n + 1. Furthermore, B(n) is an m dimensional tensor of order r + p (called the input coupling tensor), Y(n) is an output tensor of dimension m and order s, U(n) is an m dimensional input tensor of order p (varying with the discrete time index), C(n) (called the state coupling tensor to the output dynamics) is an m dimensional tensor of order (s + r), and D(n), the input coupling tensor to the output dynamics, is of dimension m and order s + p.
In the above state space description of a certain type of multidimensional discrete time dynamical system, there are r dimension variables which are inherently discrete. The evolution of the system (changes in the system parameters) occurs at discrete time instants. The notation for the index set in the state equations requires some explanation. Since the state tensor is an m dimensional tensor of order r, it will have m^r components. As the system evolves, it transits through tensors in the state space.
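The discrete-time tensor state space update described above can be sketched numerically, with the contraction product ⊗ realized via np.tensordot. All sizes (m, r, p, s) and the random coupling tensors are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of the discrete-time TSSR update with time-invariant
# couplings; "⊗" is the inner (contraction) product of compatible tensors.
rng = np.random.default_rng(1)
m, r, p, s = 3, 2, 1, 2              # dimension m; orders of state/input/output

A = rng.standard_normal((m,) * (2 * r))   # state coupling tensor, order 2r
B = rng.standard_normal((m,) * (r + p))   # input coupling tensor, order r+p
C = rng.standard_normal((m,) * (s + r))   # state-to-output coupling, order s+r
D = rng.standard_normal((m,) * (s + p))   # input-to-output coupling, order s+p

X = rng.standard_normal((m,) * r)         # state tensor: m**r components
U = rng.standard_normal((m,) * p)         # input tensor, order p

# X(n+1) = A(n) ⊗ X(n) + B(n) ⊗ U(n)
X_next = np.tensordot(A, X, axes=r) + np.tensordot(B, U, axes=p)
# Y(n)   = C(n) ⊗ X(n) + D(n) ⊗ U(n)
Y = np.tensordot(C, X, axes=r) + np.tensordot(D, U, axes=p)

assert X_next.shape == (m,) * r and Y.shape == (m,) * s
```

Note how the order bookkeeping works out: contracting the order-2r tensor A against the order-r state over r indices leaves an order-r tensor, so the state space is closed under the update, exactly as in the one dimensional vector-matrix case.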
With the summary of tensor functions of a scalar argument provided above, the dynamics of a certain type of multi/infinite dimensional continuous time/index system is described by the following state space description:
Conventional Multidimensional System State Space Representation versus Modern Tensor State Space Representation:
In Sections 2 and 3, the limitations of the way system theorists tried to represent and analyze two/multidimensional discrete time/index systems are discussed. Also, the advantages of the tensor state space representation (of a certain large class of multi/infinite dimensional systems) discovered and formalized by the author are described. The transition from the conventional mode of thinking, where the system is represented by means of multiple independent variables and local state/local control are coupled to the system dynamics by means of matrices, to the modern version, where tensor notation is utilized, requires the realization that the linear space utilized in multiple dimensions is captured through the tensor, and that the system dynamics, when described in discrete time, requires a discrete variable.
The continuous index case requires more imagination to understand the transition from the conventional approaches to the modern ones. In the conventional multidimensional system representation, partial differential equations are utilized to describe the input-output behavior as well as the state (internal description) dynamics; multiple independent variables are tracked through separate indices, leading to partial differential equations. But the utilization of the tensor linear operator and the tensor function of a scalar argument enables one to describe the dynamics of the tensor state variable as a function of one continuous time/index variable. Thus, the discrete as well as continuous multi/infinite dimensional system state space representation utilizing tensors resembles the familiar one dimensional system state space description.
The above tensor state space description reduces to the one dimensional case when the order of the tensors is one. Thus, various results developed on one dimensional linear spaces for one dimensional linear systems are readily translated to certain multi/infinite dimensional systems described through tensor linear spaces (with some care taken in pathological cases, as well as when the problem being solved depends heavily on the neighborhood set).
The state space representation of one dimensional linear systems resembles that in (4.11), (4.12). In fact, one dimensional linear systems are a very special case of the certain multi/infinite dimensional systems described through (4.11), (4.12). A natural question that arises is whether it is possible to transfer results from one dimensional systems to certain multidimensional systems described through (4.11), (4.12). It is explained in the following that such a translation is possible, provided some care is taken in deriving the results for a certain class of multi/infinite dimensional systems. Some principles which can be utilized as guidelines in deriving results for multi/infinite dimensional systems are provided below:
(1) In the case of one dimensional systems utilizing the state space representation of a linear system, if a result is derived on the system response (invoking the standard theorems in the theory of ordinary difference/differential equations), that result has a corresponding version for multi/infinite dimensional systems when the inner and outer products between the state/input/output vectors and the matrices appearing in the state space description are replaced by those between compatible tensors in multi/infinite dimensions. One must exercise care in making sure that the tensor products make sense.
(2) The tensor state space representation (rather than vectors and matrices in one
dimensional case) enables one to translate the results on controllability,
observability, stability from one dimensional linear space based dynamical systems
to certain multidimensional linear space based dynamical systems. The tensor
state space representation enables one to translate various problems for one
dimensional systems, in a one to one manner to certain multi/infinite dimensional
systems. These problems are defined utilizing the state space structure to be
linear (linear spaces in one/multi/infinite dimensions). In translating the solution
Y_{i1,...,ir}(n + 1) = B_{i1,...,ir; j1,...,jr} ⊗ Y_{j1,...,jr}(n) + V_{i1,...,ir}(n) + C_{i1,...,ir; j1,...,jr} ⊗ V_{j1,...,jr}(n − 1) + ···
The noise models W_{i1,...,ir}(n), V_{i1,...,ir}(n) are multidimensional versions of white noise.
As in one dimension, the continuous time versions of these models are based on utilizing a continuous time index t in place of the discrete time index n, and replacing the noise models in (4.13) and (4.14) by continuous time white noise or colored noise models. The formal description is avoided for brevity.
The above models (which reduce to the one dimensional models in the one dimensional case) enable one to derive various important quantities related to such stochastic processes in multi/infinite dimensions. For instance, the autocorrelation tensors and the power spectrum are derived based on well known techniques for one dimensional systems. It should be noted that the multi/infinite dimensional power spectrum estimation problem (formulated using local state etc.) was well known to be very difficult. Thus, the utilization of tensor linear operators in certain multidimensional systems enables results from one dimensional systems to be extended to certain multidimensional systems. Various interesting identities arise in the actual analysis. The details are avoided.
In the following, state space representations for arbitrary stochastic linear systems are described. In one dimension, it is well known that the widely utilized Markov chains constitute the one dimensional stochastic linear systems. Thus, there have been research efforts to extend the idea and approach to multi/infinite dimensions. As with deterministic multi/infinite dimensional linear systems, various models based on the local state approach were conventionally developed. These are traditionally called random field models. With the Tensor State Space Representation (TSSR) (of certain multidimensional systems) provided in Section 3, stochastic multi/infinite dimensional linear systems, called structured Markov random fields, are based on the tensor linear operator. In the spirit of the one dimensional approach, the multi/infinite dimensional structured Markov random fields are homogeneous stochastic linear systems, described by a difference equation of the following form in discrete time/index:
Π(n + 1) = Π(n) ⊗ P(n)    (4.15)
where Π(n) is the tensor of probabilities of the states in the state space, P (n) is the state
transition tensor of the discrete time structured Markov random field. When the structured
Markov random field is homogeneous, then P(n) = P . Both P(n), P are stochastic tensors.
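A minimal numerical sketch of the homogeneous recursion (4.15): Π is an order-r probability tensor, P an order-2r stochastic tensor, and repeated contraction drives Π to the equilibrium distribution. Constructing P by folding a row-stochastic matrix over the m**r states is an assumption made purely for illustration.

```python
import numpy as np

# Iterate Π(n+1) = Π(n) ⊗ P for a homogeneous structured Markov random field.
rng = np.random.default_rng(2)
m, r = 2, 2
N = m ** r                           # number of states in the state space

M = rng.random((N, N))
M /= M.sum(axis=1, keepdims=True)    # row-stochastic matrix on the m**r states
P = M.reshape((m,) * (2 * r))        # fold into an order-2r stochastic tensor

Pi = np.full((m,) * r, 1.0 / N)      # uniform initial probability tensor

for _ in range(1000):                # power iteration toward equilibrium
    Pi = np.tensordot(Pi, P, axes=r)

assert np.isclose(Pi.sum(), 1.0)                      # still a distribution
assert np.allclose(Pi, np.tensordot(Pi, P, axes=r))   # equilibrium reached
```

Each contraction preserves total probability, and for a positive stochastic tensor the iteration converges geometrically to the invariant distribution, mirroring the one dimensional Markov chain result.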
In continuous time, the multi/infinite dimensional structured Markov random field is described by means of a generator tensor:

dΠ(t)/dt = Π(t) ⊗ Q(t)    (4.16)

where Π(t) is the tensor of probabilities of states in the state space at time t, and Q(t) is the generator tensor of the continuous time structured Markov random field. Q(t) satisfies the properties of a generator tensor.
The equilibrium distribution of states in the discrete as well as continuous time/index structured Markov random field is derived through the spectral representation theorem of the linear operator (tensor), utilizing the eigenvalues and eigentensors of the operator.
When the state transition tensor or the generator tensor has the G/M/1-type or M/G/1-type structure (Neu), the invariant distribution of the random field has a tensor geometric form. The derivation of the form of the invariant distribution, and of efficient recursions for computing it, follows from a generalization of the results in one dimension.
In the following, state space representations for various types of multidimensional stochastic
dynamical systems that are commonly utilized in electrical engineering are discussed.
In discrete time, the multi/infinite dimensional dynamical system is described by a difference equation of the following form:

X(n + 1) = A(n) ⊗ X(n) + B(n) ⊗ U(n) + W(n),
Y(n) = C(n) ⊗ X(n) + D(n) ⊗ U(n) + V(n).    (4.17)
The tensors A(n), B(n), C(n), D(n) and the state, input and output tensors are of compatible dimension and order. The noise terms are multi/infinite dimensional extensions of the independent, identically distributed noise model in one dimension, based on the following tensor random variable/random process specification (analogous to vector random variables and vector random processes). Generally, they are zero mean tensors (each component random variable has zero mean) and, as a sequence, constitute independent tensor random variables. This is the simplest model commonly utilized in stochastic control theory (ZoP), (SaW). Utilizing the Tensor State Space Representation (TSSR), the Unified Theory of Control, Communication and Computation is formalized in (Rama 4).
Covariance tensor {W(m), W(n)} = Q(m) δ(m − n),
Covariance tensor {V(m), V(n)} = R(m) δ(m − n),    (4.18)
Covariance tensor {W(m), V(n)} = 0.
These plant noise and measurement noise models are assumed to be independent of the normal random initial state tensor X(0). The continuous time multi/infinite dimensional stochastic models utilize continuous time I.I.D. noise (as in one dimension); the state space model description adds an additive I.I.D. noise term to those described in Section 3. With the above state model, theorems in one dimensional stochastic control are extended to multi/infinite dimensions, since the matrix linear operator is replaced by the tensor linear operator. In translating the results, inner/outer products between vectors/matrices are replaced by those between tensors.
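A small simulation sketch of the state recursion in (4.17) with zero-mean I.I.D. tensor plant noise (no input, for brevity); the identity-like contractive coupling tensor and the scalar variance Q are illustrative assumptions.

```python
import numpy as np

# Simulate X(n+1) = A ⊗ X(n) + W(n), a special case of (4.17), and check
# empirically that the zero-mean IID noise yields a zero-mean state process.
rng = np.random.default_rng(3)
m, r = 2, 2
Q = 0.5                                   # scalar plant noise variance (assumed)

# A[i,j,k,l] = 0.5 * delta_ik * delta_jl, so A ⊗ X = 0.5 X (stable dynamics)
A = 0.5 * np.einsum('ik,jl->ijkl', np.eye(m), np.eye(m))

X = np.zeros((m,) * r)
samples = []
for n in range(5000):
    W = np.sqrt(Q) * rng.standard_normal((m,) * r)   # IID zero-mean noise tensor
    X = np.tensordot(A, X, axes=r) + W
    samples.append(X)

mean = np.mean(samples, axis=0)
assert np.max(np.abs(mean)) < 0.1   # each component is empirically zero-mean
```

With the contraction factor 0.5, the stationary component variance is Q/(1 − 0.25), the tensor analogue of the scalar AR(1) formula, which is how the one dimensional results carry over.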
Now, we consider a noise model which describes processes more complicated than the ones considered previously. The colored noise model considered in the ARMA time series model is a special case of the following noise model. In this model, the noise processes constitute a structured Markov random field in multi/infinite dimensions. The plant noise and measurement noise models are uncorrelated/independent. The noise models satisfy the following equations.
X(n + 1) = A(n) ⊗ X(n) + B(n) ⊗ U(n) + L(n),
Y(n) = C(n) ⊗ X(n) + D(n) ⊗ U(n) + M(n).    (4.19)
L(n), M(n) are discrete time structured Markov random fields. The fact that a Markov random field is a stochastic linear system enables one to apply stochastic dynamic programming. In the above noise model, the plant and measurement noise are the most general models that are conceivable while remaining tractable. The continuous time version of the state space model has an additive term added to those in Section 3.
With the above state space representation, various results developed in one dimensional
stochastic control theory (SaW) are extended to multi/infinite dimensional systems utilizing
the generic principles described in section 3. Thus, various recursive forms for state
estimation, filtering and prediction are translated from one dimensional systems to
multidimensional systems, particularly with the I.I.D. form of noise.
The time series model discussed at the beginning of the section, with the tensor state space representation, led the author to derive detailed linear prediction type results in multi/infinite dimensions when the noise process is white as well as colored. Thus linear prediction theory, which was so successful in theoretical as well as practical applications, is advanced (in mathematical completeness) to multi/infinite dimensions by the author with the tensor state space representation. The mathematical equations look familiar, with tensor products being utilized.
It should be noted that, using the signal and noise models described in this section, multidimensional versions of the Wiener and Kalman filters can easily be derived. Various results on estimation, prediction and control are translated from one dimension to multiple dimensions (Rama 4) (when the multidimensional system has a Tensor State Space Representation, i.e. TSSR).
In summary, various results developed in one dimensional stochastic control theory and the theory of one dimensional random processes are extended to multi/infinite dimensions through the Tensor State Space Representation.
Distributed dynamical systems are a class of systems which are, in some sense, more general than the dynamical systems considered above. They arise in various practical applications such as electrical transmission lines (distributed inductance, capacitance and resistance along the line), image models, models of tomographic images of the brain, etc.
One/multi/infinite dimensional systems in which the tensors appearing in the system dynamics vary with time are one of the simplest illustrations of distributed dynamical systems. These systems exhibit a form of non-homogeneity in the evolution of the system in the state space, i.e. a dependence on the discrete/continuous time index of the manner in which the state coupling, input coupling and output coupling tensors vary, resulting in state transitions whose distributed nature depends on the location, i.e. the discrete/continuous time index. This naturally motivates considering systems, based on practical applications, in which the state transitions in multi/infinite dimensions depend on the location. This is once again reminiscent of the conventional models of two/multidimensional signal processing. To formally provide models of distributed dynamical systems in multi/infinite dimensions, the following notation from tensor algebra/analysis is introduced.
Tensor Field:
By a tensor field, we mean a rule assigning a unique value of a tensor to each point of
a certain volume V ( V may be all of space). Let r be the radius vector of a variable point
of V with respect to the origin of some coordinate system. Then, a tensor field is indicated
by writing
A_{i1,...,in} = A_{i1,...,in}(r)    (4.21)
if the tensor is of order n. A special class of tensor fields are nonstationary fields, which are
functions of both space and time, i.e. of both the vector r and the scalar t:
φ = φ(r, t),   A = A(r, t)    (4.22)
A tensor field is said to be homogeneous if it has no spatial dependence. In this case,
the above reduces to
A = A(t) (4.23)
Tensor fields which are continuous are of utility in physical applications and in modeling
various real life dynamical systems. Non-stationary fields are of utility in modeling
distributed dynamical systems.
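The definitions in (4.21)-(4.23) can be illustrated with a small sketch: a tensor field is simply a rule mapping a position vector (and, for a nonstationary field, a time) to a tensor. The particular fields below are made-up examples, not from the text.

```python
import numpy as np

# A tensor field assigns a tensor to each point r of a volume (4.21); a
# nonstationary field also depends on the scalar t (4.22).
def field(r, t):
    # illustrative order-2 nonstationary field: time-modulated outer product
    return np.cos(t) * np.outer(r, r)

def homogeneous_field(t):
    # homogeneous field: no spatial dependence, A = A(t) as in (4.23)
    return np.cos(t) * np.eye(3)

r1 = np.array([1.0, 0.0, 0.0])
r2 = np.array([0.0, 1.0, 0.0])

assert field(r1, 0.0).shape == (3, 3)                    # order-2 tensor value
assert not np.allclose(field(r1, 0.0), field(r2, 0.0))   # varies with position
assert homogeneous_field(0.0).shape == (3, 3)            # same at every point
```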
It will be evident to an intelligent reader how the above concepts are utilized in the following models of distributed dynamical systems. In particular, tensor fields enable one to define dynamical systems over regions in higher dimensional space which are not
Y_{(i1,...,in)∈N}(k) = C_{(i1,...,in; j1,...,jn)}(k) ⊗ X_{(j1,...,jn)∈N}(k) + ···
Y_{(i1,...,in)∈N}(t) = C_{(i1,...,in; j1,...,jn)}(t) ⊗ X_{(j1,...,jn)∈N}(t) + ···
4.7 CONCLUSIONS
Utilization of the tensor linear operator associated with dynamic as well as static linear systems enables one to formulate and solve various known as well as new problems utilizing the powerful tools of tensor algebra (Rama 1). This important representation invoked by the author is hoped to have a useful effect on various scientific/mathematical fields. State space representation by tensor linear operators is discovered and formalized in (Rama 1). It is formally demonstrated how the theory of certain multidimensional systems is developed utilizing the tensor state space representation and translations of the results from one dimensional system theory. Approaches to translating one dimensional stochastic control theory to multi/infinite dimensional systems are briefly described. New state space representations for distributed dynamical systems are developed which enable translating the results from conventional state space models of multidimensional systems. Thus, in essence, the tensor linear operator based representation of static as well as dynamic systems has an important impact on various fields of scientific endeavour.
REFERENCES
(BiF) M. Bisiacco and E. Fornasini, “Optimal Control of Two Dimensional Systems,” SIAM
Journal of Control and Optimization, Vol. 28, pp. 582-601, May 1990.
(BoT) A. I. Borisenko and I. E. Tarapov, “Vector and Tensor Analysis with Applications,” Dover
Publications Inc., New York, 1968.
(Gop) M. Gopal, "Modern Control System Theory," John Wiley and Sons, New York.
(Neu) M.F. Neuts, "Matrix Geometric Solutions in Stochastic Models," The Johns Hopkins University Press, Baltimore.
(Rama 1) Garimella Rama Murthy, "Tensor State Space Representation: Multidimensional Systems," International Journal of Systemics, Cybernetics and Informatics (IJSCI), January 2007, pp. 16-23.
(Rama 2) Garimella Rama Murthy, “Multi/Infinite Dimensional Neural Networks, Multi/Infinite
Dimensional Logic Theory,” International Journal of Neural Systems, Vol. 15, No. 3, June
2005.
(Rama 3) Garimella Rama Murthy, "Multidimensional Neural Networks: Multidimensional Coding Theory: Constrained Static Optimization," Proceedings of the 2002 IEEE International Workshop on Information Theory.
(Rama 4) Garimella Rama Murthy, "Optimal Control, Codeword, Logic Function Tensors: Multidimensional Neural Networks," IJSCI, October 2006, pp. 9-17.
(SaW) Sage and White, “Optimal Control Theory,” Academic Press.
(ZoP) R. Zoppoli and T. Parisini, "Learning Techniques and Neural Networks for the Solution of N-stage Non-linear Non-quadratic Optimal Control Problems," Topics in 2-D System Theory, 1992.
CHAPTER 5
Unified Theory of Control, Communication and Computation: Multidimensional Neural Networks
5.1 INTRODUCTION
In the mid 1940s, Norbert Wiener coined the word Cybernetics for the research field dedicated to understanding the control, communication, computation and other such functions of living systems. It is well agreed that these functions of living systems are controlled by various functional sub-assemblies in the brain, synthesized through bio-chemical circuits. Research in this field was pursued by researchers in diverse disciplines. The multi-disciplinary effort advanced the literature on the subject, but no formally precise discoveries were made.
Also, starting in the 1950s, research efforts in the electrical engineering discipline led to the isolated theories of control, communication and computation. The central goals of these three fields are summarized as follows:
• The problem of communication is to convey a message from one point in space and time to another point in space and time as reliably as possible.
• The problem of control is to move a system from one point in state space to another point in state space such that a certain objective function is minimized.
• The problem of computation is to process a set of input symbols and produce another set of output symbols based on some information processing operation.
On the surface, these three problems seem unrelated to one another.
Also, in the mid 1960s, several researchers became interested in mathematical models of the nervous system. This effort was meant to complement the research in cybernetics. Hopfield/Amari succeeded in providing an abstract model of associative memory. Based on this abstract model, researchers were led to the following question, which remained unanswered.
Question: Is it true that the functional units responsible for control, communication and computation are synthesized through a network of homogeneous neurons?
Occasionally, research efforts led to establishing some relationship between the three fields. But in this chapter it is shown (with mathematical clarity and precision), consolidating the earlier efforts of other authors, that in the sense of optimization of some objective function these three problems are related to one another, leading to one form of unification. From a practical point of view, this unification contributes to the design of the brains of powerful robots.
Through the efforts of the author, Boolean logic theory was generalized to multi/infinite dimensions using an optimization approach (Rama 1). This approach led to the area of multidimensional neural networks (Rama 1). Also, using the generalization of the one dimensional results in (BrB), multidimensional linear as well as non-linear codes are related to multidimensional neural networks. Thus, using these results, the research fields of computation and communication are related through the common thread of neural networks. In this chapter, the main achievement of the author is to show that the optimal control tensors of certain multidimensional systems are synthesized as the stable states of neural networks. Thus, utilizing the results summarized in this paragraph, the Unified Theory of Control, Communication and Computation is generalized to multidimensional systems.
This chapter is organized in the following manner. In Section 2, the unification of control, communication and computation in one dimensional systems is summarized. In Section 3, the discovery and formalization of the Tensor State Space Representation of certain multidimensional systems is briefly discussed. Using this representation, optimal control tensors (under a well known criterion of optimality) are shown to constitute the stable states of a multidimensional Hopfield neural network. In Section 4, utilizing the results in (Rama 1), (Rama 2), the Unified Theory of Control, Communication and Computation in multidimensional systems is formally described. Conclusions are reported in Section 5.
operations performed by AND, OR, NOR, NAND, XOR gates have appropriate intuitive
interpretation in terms of the entries of the one dimensional arrays i.e. vectors.
Research in the area of artificial neural networks led to the problem of whether all one dimensional logic gates can be synthesized using a single layer neural network. Chakradhar et al. provided an answer to the problem: they showed that the set of stable states of a Hopfield neural network corresponds to one dimensional logic functions (CAB). Equivalently, the input and output signal states of a logic gate are related through an energy function; the outputs correspond to the stable states of the neural network (which constitute the local optima of the energy function). Thus, in a well defined sense, one dimensional neural networks and logic theory are related.
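The stable-state characterization can be sketched for a one dimensional discrete Hopfield network: with a symmetric, zero-diagonal weight matrix, a state is stable exactly when no single neuron flip lowers the energy. The Hebbian construction below is a standard textbook sketch, not the specific construction of (CAB).

```python
import numpy as np

# Stable states of a discrete Hopfield network as local minima of the
# energy E(x) = -1/2 x^T W x (zero thresholds assumed for brevity).
rng = np.random.default_rng(4)
n = 6
pattern = rng.choice([-1, 1], size=n)

# Hebbian weights storing one pattern; symmetric with zero diagonal
W = np.outer(pattern, pattern).astype(float)
np.fill_diagonal(W, 0.0)

def energy(x):
    return -0.5 * x @ W @ x

def is_stable(x):
    # x is stable iff every neuron agrees with the sign of its local field
    return bool(np.all(x * (W @ x) >= 0))

assert is_stable(pattern)   # the stored pattern is a stable state
# flipping any single neuron cannot lower the energy at a stable state
for i in range(n):
    y = pattern.copy()
    y[i] = -y[i]
    assert energy(y) >= energy(pattern)
```

Here the stored pattern's local field is (n − 1) times the pattern itself, so the sign condition holds with margin; a single flip raises the energy by 2(n − 1), which is what makes the state a local optimum of the energy function.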
In the case of one dimensional linear systems, it was shown that the state space representation of the dynamics is much better than the input-output description. Specifically, the state space representation naturally leads to concepts such as controllability and observability associated with the system.
Unfortunately, in the case of multidimensional systems there is no natural notion of causality. Thus, system theorists introduced notions such as quarter-plane causality, half-plane causality, etc. by partitioning the index set for the state variables. In contrast to these approaches, the author discovered and formalized the Tensor State Space Representation (TSSR) of CERTAIN multidimensional systems (Rama 3). It is discussed in (Rama 3) that this particular representation enables transferring results from one dimensional systems (with vector-matrix state space representation) to certain multidimensional systems.
In summary, CERTAIN multi/infinite dimensional discrete time/index dynamical
systems can be described by means of a state space description of the following form:

X(n + 1) = A(n) ⊗ X(n) + B(n) ⊗ U(n)
Y(n) = C(n) ⊗ X(n) + D(n) ⊗ U(n)        (5.1)

where ⊗ denotes the inner product operation between compatible tensors (BoT). Also in (5.1),
A(n) is an m dimensional tensor of order 2r (called the state coupling tensor), X(n) is the
state of the dynamical system at the discrete time index n, whereas X(n + 1) is the state of
the system at the discrete time index n + 1. Furthermore, B(n) is an m dimensional tensor of
order r + p (called the input coupling tensor), Y(n) is an output tensor of dimension m and
order s, and U(n) is an m dimensional input tensor of order p. C(n) (called the state coupling
tensor to the output dynamics) is an m dimensional tensor of order s + r, and D(n) (the input
coupling tensor to the output dynamics) is of dimension m and order s + p.
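One step of this recursion can be sketched with numpy.tensordot standing in for the tensor inner product ⊗; the dimension m and the orders r, p, s below are illustrative choices, not values prescribed by the text.

```python
import numpy as np

# One step of the tensor state-space recursion:
#   X(n+1) = A(n) (x) X(n) + B(n) (x) U(n),   Y(n) = C(n) (x) X(n),
# with (x) realized as contraction over the trailing r (resp. p) indices.
m, r, p, s = 2, 2, 1, 1
rng = np.random.default_rng(0)

A = rng.standard_normal((m,) * (2 * r))   # state coupling tensor, order 2r
B = rng.standard_normal((m,) * (r + p))   # input coupling tensor, order r + p
C = rng.standard_normal((m,) * (s + r))   # output coupling tensor, order s + r
X = rng.standard_normal((m,) * r)         # state tensor, order r
U = rng.standard_normal((m,) * p)         # input tensor, order p

X_next = np.tensordot(A, X, axes=r) + np.tensordot(B, U, axes=p)  # order r
Y = np.tensordot(C, X, axes=r)                                    # order s
```

Note that the compatibility conditions on the tensor orders are exactly what makes both contractions return tensors of the right order (r for the state, s for the output).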
With the above important representation of certain multidimensional systems, we
formulate and solve an important problem in optimal control of certain multidimensional
systems. The solution of the problem shows that the optimal control tensors are synthesized
as the stable states of a multidimensional Hopfield neural network (the connection structure
of the m-d Hopfield neural network is a fully symmetric tensor).
Unified Theory of Control, Communication and Computation: Multidimensional Neural Networks 83
Problem Definition
Find an admissible sequence of (realizable) input signal tensors, U(k) for k ∈ {0, 1, 2, ...}
(with each component of the tensor bounded in amplitude by unity (one) or, without loss
of generality, by a fixed constant), i.e. |U_{i1, i2, ..., ir}(k)| ≤ 1, in order to minimize the criterion

J = (−1/2) Σ_{k=0}^{kf} Y_{in, ..., i1}(k) ⊗ Y_{i1, ..., in}(k)        (5.2)
subject to
X (n +1) = A(n ) ⊗ X (n) + B(n ) ⊗ U (n) (5.3)
Y (n ) = C ( n ) ⊗ X ( n ) (5.4)
where A(n), B(n), C(n), D(n) are tensors arising in the system dynamics of the discrete
time multi/infinite dimensional system. Furthermore, X(n) is the state tensor of the system.
These tensors which arise in the system dynamics are of compatible dimensions. Without
loss of generality, a multi-input, multi-output multidimensional linear system is considered.
Let the impulse response tensor of the system be denoted by h(k, l). This is the discrete
time version of the problem given in (GoC) for CERTAIN discrete time multidimensional
systems. The open problem given in (GoC) is solved in (Rama 5).
Solution
The optimality condition is derived through the application of the maximum principle or
equivalently, the dynamic programming principle. The application of dynamic
programming enables us to derive the necessary as well as sufficient condition through
the principle of optimality in some cases.
Solving (5.7) for λ(k + 1) and substituting in (5.9), we arrive at the optimal control
sequence. When the constraint set is other than a hypercube, various well known techniques
from mathematical programming for different constraint sets, such as a convex polytope or
a convex polyhedron, are invoked in the context of quadratic programming. The cost function
is quadratic and is optimized over constraint sets such as those described previously.
With the terminal state specified, the equation (5.7) is recursed backwards to arrive at
the optimal control tensor in the case of multi/infinite dimensional systems. Thus, an
efficient computational form for solving the two point boundary value problem is
derived in the following. It should be noted that we derive the expression for λ(k + 1) in the
case of certain linear time varying multi/infinite dimensional dynamical systems:

λ(k) = −C_{jm, ..., j1}(k) ⊗ Y_{i1, ..., ip}(k) + A_{is, ..., i1}(k) ⊗ λ(k + 1)        (5.10)

starting with the terminal condition and recursing backwards.
Remark
Before we proceed further, it should be noted that the index symbols describing the
order of each tensor are chosen arbitrarily. The tensors in the above state space
representation are of compatible order to ensure that inner and outer products make
sense. Now, we return to the derivation. In the following, the notation ⊗ is utilized to
denote the inner product (BoT) between tensors of compatible order.
λ_{t1, ..., tl}(kf) = −C_{jm, ..., j1}(kf) ⊗ Y_{i1, ..., ip}(kf)        (5.11)

− ... − A_{is, ..., i1}(k+1) ⊗ A_{is, ..., i1}(k+2) ⊗ ... ⊗ A_{is, ..., i1}(k+l) ⊗ C_{jm, ..., j1}(k+l+1) ⊗ Y_{i1, ..., ip}(k+l+1)        (5.16)
Thus we have the optimal control solution for the problem, given by (utilizing (5.9))

U_{v1, ..., vr}(k) = Sign( B_{sl, ..., s1}(k) ⊗ C_{jm, ..., j1}(k+1) ⊗ Y(k+1)
    + Σ_{i=1}^{l} B_{sl, ..., s1}(k) ⊗ A_{is, ..., i1}(k+1) ⊗ ... ⊗ A_{is, ..., i1}(k+i) ⊗ C_{jm, ..., j1}(k+i+1) ⊗ Y_{i1, ..., ip}(k+i+1) )        (5.17)

Now, utilizing the definition of the impulse response tensor of the time varying linear
system, we have

U_{v1, ..., vr}(k) = Sign( B_{sl, ..., s1}(k) ⊗ C_{jm, ..., j1}(k+1) ⊗ Y(k+1) + Σ_{i=1}^{l} h(k+i+1, k) ⊗ Y(k+i+1) )        (5.18)

    = Sign( Σ_{i=0}^{l} h(k+i+1, k) ⊗ Y(k+i+1) )
86 Multidimensional Neural Networks: Unified Theory
h(·,·) in the leading factor denotes the transposed tensor of the impulse response tensor. The
term in the parenthesis is given by

Σ_{i=0}^{l} h(k+i+1, k) ⊗ Y(k+i+1) = Σ_{i=0}^{l} h(k+i+1, k) ⊗ Σ_{j=0}^{k+i+1} h(k+i+1, j) ⊗ u(j)        (5.19)
Exchanging the order of summation (with the help of the associated index grid), we have

Σ_{j=0}^{kf} Σ_{i=max{0, j−k−1}}^{kf−k−1} h(k+i+1, k) ⊗ h(k+i+1, j) ⊗ u(j)        (5.20)

U*(k) = Sign( Σ_{j=0}^{kf} Σ_{i=max{0, j−k−1}}^{kf−k−1} ( h(k+i+1, k) ⊗ h(k+i+1, j) ) ⊗ U(j) )        (5.21)
Let us define

R(k, j) = Σ_{i=max{0, j−k−1}}^{kf−k−1} ( h(k+i+1, k) ⊗ h(k+i+1, j) )        (5.22)
This is the energy density tensor of time invariant linear system obtained from the
impulse response tensor. Thus the optimal control tensor is the stable state of a
multidimensional Hopfield neural network.
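The structure of (5.21)-(5.22) can be illustrated in the simplest (order-1, i.e., vector) case: build the energy density matrix R(k, j) from an impulse response and iterate the Sign condition as a Hopfield-style update. The impulse response and horizon below are invented for illustration.

```python
import numpy as np

# Vector-case sketch of (5.21)-(5.22): energy density matrix R(k, j) from a
# hypothetical impulse response h, then a Hopfield-style Sign iteration.
kf, n = 6, 3
rng = np.random.default_rng(1)
h = {(t, k): rng.standard_normal((n, n))
     for t in range(kf + 2) for k in range(kf + 1)}

def R(k, j):
    """R(k, j) = sum_i h(k+i+1, k)^T h(k+i+1, j) over the valid horizon."""
    total = np.zeros((n, n))
    for i in range(max(0, j - k - 1), kf - k):
        total += h[(k + i + 1, k)].T @ h[(k + i + 1, j)]
    return total

U = np.sign(rng.standard_normal((kf + 1, n)))
for _ in range(20):                      # sweep the Sign condition to a fixed point
    for k in range(kf + 1):
        field = sum(R(k, j) @ U[j] for j in range(kf + 1))
        U[k] = np.where(field >= 0, 1.0, -1.0)
```

The sweep mirrors a serial-mode neural network update with R(k, j) playing the role of the connection structure; the resulting control sequence lives on the hypercube, as required by the amplitude constraint.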
subject to the constraint given in (5.26) and the input tensors are constrained to be on the
continuous time multi/infinite dimensional hypercube.
Solution:
Form the Pontryagin function (or Hamiltonian) of the problem. It is given by

H(X, U, λ, t) = (−1/2) (C(t) ⊗ X(t))_{ls, ..., l1} ⊗ (C(t) ⊗ X(t))_{l1, ..., ls} + λ_{ir, ..., i1}(t) ⊗ (A(t) ⊗ X(t) + B(t) ⊗ U(t))        (5.28)
−(d/dt) λ_{i1, ..., ir}(t) = −C_{ls, ..., l1}(t) ⊗ C_{l1, ..., ls}(t) ⊗ X_{i1, ..., ir}(t) + A_{jr, ..., j1; ir, ..., i1}(t) ⊗ λ_{i1, ..., ir}(t)

(d/dt) λ_{i1, ..., ir}(t) = −A_{jr, ..., j1; ir, ..., i1}(t) ⊗ λ_{i1, ..., ir}(t) + C_{l1, ..., ls}(t) ⊗ Y_{l1, ..., ls}(t)
(d/dt) φ^a(t, τ) = −A_{jr, ..., j1; ir, ..., i1}(t) ⊗ φ^a(t, τ)        (5.33)

φ^a(t, t) = I;  φ^a(t, tf) = φ(tf, t)
λ_{i1, ..., ir}(t) = −∫_t^{tf} φ(τ, t) ⊗ C_{ls, ..., l1}(τ) ⊗ Y_{l1, ..., ls}(τ) dτ        (5.34)
Hence, we have
−B_{jp, ..., j1; ir, ..., i1}(t) ⊗ λ_{i1, ..., ir}(t) = ∫_t^{tf} B_{jp, ..., j1; ir, ..., i1}(t) ⊗ φ(τ, t) ⊗ C_{ls, ..., l1}(τ) ⊗ Y_{l1, ..., ls}(τ) dτ        (5.35)

where

Y_{l1, ..., ls}(τ) = ∫_{t0}^{τ} C_{l1, ..., ls}(τ) ⊗ φ(τ, s) ⊗ B(s) ⊗ U(s) ds        (5.36)
Thus, we have

−B_{jp, ..., j1; ir, ..., i1}(t) ⊗ λ_{i1, ..., ir}(t)
    = ∫_t^{tf} B_{jp, ..., j1; ir, ..., i1}(t) ⊗ φ(τ, t) ⊗ C_{ls, ..., l1}(τ) ⊗ [ ∫_{t0}^{τ} C(τ) ⊗ φ(τ, s) ⊗ B(s) ⊗ U(s) ds ] dτ        (5.37)

    = ∫_{t0}^{tf} ∫_s^{tf} B_{jp, ..., i1}(t) ⊗ φ(τ, t) ⊗ C_{ls, ..., l1}(τ) ⊗ C_{l1, ..., ls}(τ) ⊗ φ(τ, s) ⊗ B_{i1, ..., jp}(s) dτ ds        (5.38)

U*(t) = Sign( ∫_{t0}^{tf} R(t, s) ⊗ U*_{j1, ..., jp}(s) ds )        (5.41)
where R(t, s) is the energy density tensor of the linear system, given by

R(t, s) = ∫_{max(t, s)}^{tf} h(τ, t) ⊗ h(τ, s) dτ
For linear time invariant multidimensional systems, the impulse response tensor h(τ, s)
depends only on the difference between its arguments/indices. Thus, the necessary
condition on the optimal control (for continuous time multidimensional systems) is given
by (5.41). It shows that the optimal control tensor is the stable state of a continuous time
(Hopfield type) neural network. The concept of a continuous time multidimensional
neural network was conceived by the author in (Rama 4). It should be noted that when
the objective function is a higher degree form (rather than a quadratic form), similar
derivations apply. Details are avoided for brevity.
In view of the results in the previous section, we briefly summarize in the following the
results discussed in (Rama 1) and (Rama 2), so that the unification (of control,
communication and computation functions) in one dimension, discussed and formalized
by the author, is seen to extend naturally to multidimensional systems.
In (Rama 2), a multidimensional linear code is associated with the energy function of a
multidimensional neural network in such a way that every local maximum of the energy
function corresponds to a codeword tensor and every codeword tensor corresponds to a
local maximum (i.e. stable state).
Unification: Now utilizing the results in Section 3 (relating optimal control tensors and
multidimensional neural networks), we readily have the unification of control,
communication and computation (through the common thread of neural networks).
Formally, the optimal control tensors, optimal multidimensional logic functions,
multidimensional codeword tensors are synthesized through the stable states of
multidimensional neural (generalized) networks.
In the above unification discussion, we only considered neural (generalized neural)
networks in discrete time. In equation (5.41), we discovered and formalized the concept of
a continuous time neural associative memory (with the energy function being a quadratic
form associated with a certain kernel).
Continuous time generalized neural networks are defined and associated with
optimal control tensors, optimal codeword tensors and optimal switching functions.
Unified theory with generalized neural networks follows in a similar fashion. Details
are avoided here for brevity.
For formal clarity, the following theorem is a comprehensive statement of the
unification of control, communication and computation functions (with a quadratic energy
function/objective function). The generalization to the case of a higher degree energy
function follows in a similar manner.
Theorem 5.1: Consider a linear time varying multidimensional system with the state space
representation provided in (5.3), (5.4). The optimal control tensor (subject to a finite
amplitude constraint, i.e. |U_{v1, ..., vr}| ≤ 1 or N), the optimal switching function (in the sense
of a transformation between an input tensor and an output tensor), and the optimal linear
multidimensional code constitute the local optima of a quadratic form in the components
of the state variable, input and output tensors. Thus, in the case of linear dynamical systems
with a quadratic energy/objective function, the optimal control tensors, optimal switching
function and optimal linear code are unified to be the local optima of a quadratic form (with
argument/index/time varying coefficient tensors for time varying systems) over the
multidimensional hypercube. Thus these local optima are synthesized as the stable states
of a neural/generalized neural network.
Proof: From (Rama 1), the stable states of a multidimensional neural network constitute
the local optima of a quadratic form with the fully symmetric connection tensor as the
weighting tensor. The convergence theorem for (infinite) multidimensional neural networks
provides a formal result. These local optima are defined to be the multidimensional logic
functions in the sense of a mapping between the input tensors and the stable state tensors.
But, from (5.23), the optimal control tensors which optimize a quadratic objective function
have the stable state structure of an interconnected multidimensional neural network with
block fully symmetric connection structure. Thus, the optimal control and optimal switching
function which optimize a quadratic objective function constitute the stable states of a
multidimensional neural network.
From (Rama 2), it is formally true that the connection structure of a multidimensional
graph-type structure (say graphoid) is associated with a multidimensional linear code through
its cut space. These cutset codes are termed graph-theoretic codes. It is also proved in (Rama
2) that maximum likelihood decoding of a corrupted word (received word) with respect to
the graphoid theoretic code is equivalent to finding the global optimum of the quadratic
energy function associated with a multidimensional neural network. Furthermore, it is shown
that a tensor constitutes a local optimum of a multi-variate polynomial (a quadratic tensor
form, associated with the parity check tensor) in the components of the input and output
tensors if and only if it is a codeword of the multidimensional linear code. Thus, associated
with the generator/parity check tensor of a graphoid theoretic code, there exists a quadratic
form (over the multidimensional hypercube) whose local optima constitute the codewords.
Hence the optimal code, optimal control and optimal switching function, which
constitute the local optima of a multi-variate quadratic form (in the components of the
state, input and output tensors), are unified to be the same. This constitutes the statement of the
unified theory of control, communication and computation in linear dynamical systems (time
varying as well as time invariant systems) with a quadratic form as the objective function.
Q. E. D.
A future revision will discuss how the unification extends to other important
functions, as well as the generalization of the results to certain infinite dimensional
systems.
5.5 CONCLUSIONS
In this chapter, based on the work of the author and earlier authors, the unification of control,
communication and computation functions (through the common thread of neural
networks) is formalized. The main contribution of the author for unification in one
dimension is to show that the optimal control vectors (in a well known optimality criterion)
constitute the stable states of a Hopfield network. The next important step was to envision
unification in multiple dimensions. Based on the concept of multidimensional neural networks
(Rama 1, Rama 2), the author was able to formally unify communication and computation
functions. Tensor State Space Representation (TSSR) conceived and formalized by the author
was utilized to prove that the optimal control tensors constitute the stable states of a
multidimensional neural network (in discrete as well as continuous time systems). With
this important result, the author was able to show that optimal codewords, optimal logic
functions and optimal control tensors constitute the stable states of a multidimensional
neural network.
REFERENCES
(BoT) A. I. Borisenko and I. E. Tarapov, “Vector and Tensor Analysis with Applications“, Dover
Publications Inc., New York, 1968.
(BrB) J. Bruck and M. Blaum, “Neural Networks, Error Correcting Codes and Polynomials Over
the Binary Hypercube“, IEEE Transactions on Information Theory, Vol. 35, No. 5, September
1989.
(CAB) S.T. Chakradhar, V.D. Agrawal and M.L. Bushnell, “Neural Models and Algorithms
for Digital Testing“, Kluwer Academic Publishers, 1991.
(GoC) B. Gopinath and T. Cover, “Open Problems in Control, Communication and Computation“,
Springer, Heidelberg, 1987.
(HoS) M. Honig and K. Stieglitz, "On Wyner's Conjecture", Bellcore Technical
Memorandum.
(Rama 1) Garimella Rama Murthy, “Multi/Infinite Dimensional Neural Networks, Multi/Infinite
Dimensional Logic Theory“, International Journal of Neural Systems, Vol.15, No.3, Pages
223-235, June 2005.
(Rama 2) Garimella Rama Murthy, “Multidimensional Coding Theory: Multidimensional Neural
Networks“, In part presented at the 2002 IEEE International Workshop on Information
Theory.
(Rama 3) G. Rama Murthy, “Tensor State Space Representation: Multidimensional Systems“,
International Journal of Systemics, Cybernetics and Informatics (IJSCI), January 2007,
pages 16-23.
(Rama 4) G. Rama Murthy, “Optimal Control, Codeword, Logic Function Tensors:
Multidimensional Neural Networks”, International Journal... (IJSCI), October 2006, Pages 9-
17.
(Rama 5) G. Rama Murthy, “Signal Design for Magnetic and Optical Recording Channels: Spectra
of Bounded Functions“, Bellcore Technical Memorandum, TM-NWT-018026, December 1990.
(RKB) G. Rama Murthy, P. Krishna Reddy and L. Behera, “Neural Network Based Optimal
Binary Filters”, submitted to Elsevier Signal Processing Journal.
(Gop) M. Gopal, "Modern Control System Theory", John Wiley and Sons, New York.
(SaW) Sage and White, “Optimum Systems Control“, Prentice-Hall Inc., Englewood Cliffs,
New Jersey 07632.
CHAPTER
6
Complex Valued Neural
Associative Memory on the
Complex Hypercube
6.1 INTRODUCTION
The Hopfield model of the neural network is designed based on the McCulloch-Pitts
neuron. In this network, the computation of an algebraic threshold function is carried out
at each node. The edge between two nodes is associated with a weight. The network can
hence be represented by a weight matrix: a symmetric matrix in which W_{i,j} represents the
weight associated with the edge connecting the neurons i and j. Since it is a symmetric
matrix (i.e., the network is represented by an undirected graph), we have W_{i,j} = W_{j,i}. The
threshold function is computed at each neuron as

V_i(t + 1) = sgn(H_i(t)) = 1, if H_i(t) ≥ 0; −1, otherwise

where

H_i(t) = Σ_{j=1}^{n} W_{j,i} V_j(t) − T_i
Here, Vi (t + 1) represents the value of the function i.e. state value at node i at time (t +1)
(which is the next time instant).
Energy function: The model is also associated with an energy function, the quadratic
form V^T(t) W V(t) (neglecting the threshold values without loss of generality), where V(t)
stands for the column vector representing the state of all neurons at time instant t. This
vector lies on the hypercube whose dimension equals the order of the synaptic weight
matrix.
Modes of operation: The Hopfield model can operate in one of two modes, serial or
fully parallel, or a combination of these. In the serial mode, the next state computation,
i.e., the evaluation of the neural network, takes place at one node (node after node) at
each time instant. In the fully parallel mode, the evaluation takes place at every node at
each time instant. A combination implies that the evaluation occurs at a group of nodes
at each time instant.
A stable state is defined as a state such that after reaching it, the network output does
not change i.e., V(t) = sgn(WV(t)).
The model results in the following convergence theorems:
Theorem 1: If the neural network is operating in the serial mode and the elements on the
diagonal of connection matrix are non-negative, the network will converge to a stable
state i.e., there are no cycles in the state space.
Theorem 2 : If the network is operating in the fully parallel mode, the network will either
converge to a stable state or to a cycle of length two i.e., it oscillates between two states in
the state space.
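Theorem 1 can be illustrated with a minimal sketch: serial updates under an arbitrary symmetric weight matrix (with its diagonal forced non-negative) reach a state V with V = sgn(WV). The weights below are random examples, not a trained network.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
W = rng.standard_normal((n, n))
W = (W + W.T) / 2                         # symmetric connection matrix
np.fill_diagonal(W, np.abs(np.diag(W)))   # non-negative diagonal (Theorem 1)

V = np.sign(rng.standard_normal(n))       # random initial state on {-1, +1}^n
changed = True
while changed:                            # serial mode: one neuron at a time
    changed = False
    for i in range(n):
        new = 1.0 if W[i] @ V >= 0 else -1.0
        if new != V[i]:
            V[i] = new
            changed = True
# At exit, V = sgn(W V), a stable state; the quadratic form V^T W V never
# decreased along the way, which is why the loop must terminate.
```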
Goals: The goals of this chapter are to consider the possibilities of implementing a complex
valued associative memory and observe the behavior of the model in the serial and the
fully parallel modes.
Remark
The concept of a complex valued neural network has been explored since the work
of [ZURADA 1996] and has been successfully applied to the fields of image processing
and pattern recognition. A collection of papers on the subject appears in [HIROSE 2003].
Following this literature, our work is based on the implementation of a newer method to
realize a complex valued neural network.
The chapter is organized into three parts. The first part of Section 2 discusses the features
of the proposed model; implications for the convergence of the network are also briefly
pointed out. The second part of the same section provides the proof technique used for
arguing the convergence properties of the discussed form of the complex valued associative
memory. The third part presents the proof of convergence and considers how it parallels
the real valued Hopfield associative memory.
The synaptic weights are complex valued and the weight matrix is Hermitian, unlike the
real valued case, where it is symmetric. The next state V(t+1) is computed as

V(t+1) = sgn(real part(WV(t))) + j sgn(imaginary part(WV(t)))        (6.1)

Thus the values of the entries of the column vector V(t+1) are always confined to the set
{1+j, 1−j, −1+j, −1−j}, unlike the real case, where the values are confined to the set
{1, −1}. Thus the total number of values V(t+1) can take, i.e., the number of points of the
"complex hypercube", equals 4^n, where n is the order of the neural network.
The energy function is thus E(t) = (V^T(t))* W V(t) (neglecting the threshold values
without loss of generality). We prove that an important property of this model is that it
converges to a stable state when operating in the serial mode, and at most to a cycle of
length 2 when operating in the fully parallel mode.
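A minimal sketch of the update rule (6.1), with a randomly generated Hermitian weight matrix standing in for a trained one:

```python
import numpy as np

def csign(z):
    """Complex signum: sign of real and imaginary parts taken separately."""
    return np.where(z.real >= 0, 1.0, -1.0) + 1j * np.where(z.imag >= 0, 1.0, -1.0)

n = 4
rng = np.random.default_rng(3)
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
W = (M + M.conj().T) / 2          # Hermitian synaptic weight matrix

V = csign(rng.standard_normal(n) + 1j * rng.standard_normal(n))
for _ in range(20):               # fully parallel iterations of (6.1)
    V = csign(W @ V)
# Every component stays on the complex hypercube {1+j, 1-j, -1+j, -1-j}.
```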
The proof technique adopted by the authors is to isolate the real and imaginary parts
of the Hermitian synaptic weight matrix and evaluate them separately. When the Hermitian
matrix is split into its real and imaginary parts, the matrix corresponding to the real part
is real symmetric and that corresponding to the imaginary part is real antisymmetric.
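This decomposition is easy to verify numerically; the Hermitian matrix below is an arbitrary example:

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
W = (M + M.conj().T) / 2       # Hermitian: W equals its conjugate transpose

W_R, W_I = W.real, W.imag
sym_ok = np.allclose(W_R, W_R.T)        # real part is symmetric
antisym_ok = np.allclose(W_I, -W_I.T)   # imaginary part is antisymmetric
```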
Remark
It is an interesting observation that the energy function, when evaluated for the real part
with the complex valued vector, behaves exactly as if it were the matrix of the real valued
neural network proposed in [HOPFIELD 1982]. That is, we have a complex valued
associative memory with a real connection matrix. The exact details of the proof follow.
E_k(t) = [V_1*(t), ..., V_k*(t), ..., V_n*(t)] W [V_1(t), ..., V_k(t), ..., V_n(t)]^T

where W = (W_{ij}) is the n × n Hermitian synaptic weight matrix.
If we break the expression for E_k(t) into two parts, E_kR(t) and E_kI(t), the real and imaginary
parts (of the energy function), they come out as:

E_kR(t) = [V_1*(t), ..., V_k*(t), ..., V_n*(t)] W_R [V_1(t), ..., V_k(t), ..., V_n(t)]^T        (6.2)

E_kI(t) = [V_1*(t), ..., V_k*(t), ..., V_n*(t)] (j W_I) [V_1(t), ..., V_k(t), ..., V_n(t)]^T

where W_R = (W_{ijR}) is the real symmetric part and W_I = (W_{ijI}) is the real antisymmetric
part (with zero diagonal) of the Hermitian weight matrix.
Evaluating the real part of (6.1) for the energy function, we have

E_kR(t) = Σ_{i=1}^{k−1} V_i*(t) [ Σ_{j=1}^{k−1} W_{ijR} V_j(t) + W_{ikR} V_k(t) + Σ_{j=k+1}^{n} W_{ijR} V_j(t) ]
    + V_k*(t) [ Σ_{j=1}^{k−1} W_{kjR} V_j(t) + W_{kkR} V_k(t) + Σ_{j=k+1}^{n} W_{kjR} V_j(t) ]
    + Σ_{i=k+1}^{n} V_i*(t) [ Σ_{j=1}^{k−1} W_{ijR} V_j(t) + W_{ikR} V_k(t) + Σ_{j=k+1}^{n} W_{ijR} V_j(t) ]
Similarly,

E_kR(t+1) = [V_1*(t), ..., V_k*(t+1), ..., V_n*(t)] W_R [V_1(t), ..., V_k(t+1), ..., V_n(t)]^T        (6.3)
Complex Valued Neural Associative Memory on the Complex Hypercube 99
The expression for E_kR(t+1) takes this form because the network is operating in the serial
mode and the updating of the function value takes place at only one node, i.e., the node
being evaluated (V_k). In the parallel mode, however, all the function values in the vector
are updated.
E_kR(t+1) = Σ_{i=1}^{k−1} V_i*(t) [ Σ_{j=1}^{k−1} W_{ijR} V_j(t) + W_{ikR} V_k(t+1) + Σ_{j=k+1}^{n} W_{ijR} V_j(t) ]
    + V_k*(t+1) [ Σ_{j=1}^{k−1} W_{kjR} V_j(t) + W_{kkR} V_k(t+1) + Σ_{j=k+1}^{n} W_{kjR} V_j(t) ]
    + Σ_{i=k+1}^{n} V_i*(t) [ Σ_{j=1}^{k−1} W_{ijR} V_j(t) + W_{ikR} V_k(t+1) + Σ_{j=k+1}^{n} W_{ijR} V_j(t) ]

Taking the difference ΔE_kR(t) = E_kR(t+1) − E_kR(t), the cross terms give

Σ_{j=1, j≠k}^{n} W_{kjR} ( V_k*(t+1) V_j(t) − V_k*(t) V_j(t) ) + Σ_{j=1, j≠k}^{n} W_{jkR} ( V_k(t+1) V_j*(t) − V_k(t) V_j*(t) )
But since the real part of the weight matrix is symmetric, W_{kjR} = W_{jkR}; thus,
ΔE_kR(t) = Σ_{j=1, j≠k}^{n} W_{kjR} ( V_k*(t+1) V_j(t) − V_k*(t) V_j(t) + V_k(t+1) V_j*(t) − V_k(t) V_j*(t) )
    + W_{kkR} ( (V_kR²(t+1) + V_kI²(t+1)) − (V_kR²(t) + V_kI²(t)) )

ΔE_kR(t) = W_{kkR} ( ΔV_kR(t) (V_kR(t+1) + V_kR(t)) + ΔV_kI(t) (V_kI(t+1) + V_kI(t)) )
    + Σ_{j=1, j≠k}^{n} W_{kjR} ( V_j(t) ΔV_k*(t) + V_j*(t) ΔV_k(t) )

ΔE_kR(t) = W_{kkR} ( ΔV_kR(t) (V_kR(t+1) + V_kR(t)) + ΔV_kI(t) (V_kI(t+1) + V_kI(t)) )
    + Σ_{j=1, j≠k}^{n} W_{kjR} ( V_j(t) ΔV_kR(t) − j V_j(t) ΔV_kI(t) + V_j*(t) ΔV_kR(t) + j V_j*(t) ΔV_kI(t) )

ΔE_kR(t) = W_{kkR} ( ΔV_kR(t) (V_kR(t+1) + V_kR(t)) + ΔV_kI(t) (V_kI(t+1) + V_kI(t)) )
    + Σ_{j=1, j≠k}^{n} 2 W_{kjR} ( V_jR(t) ΔV_kR(t) + V_jI(t) ΔV_kI(t) )

ΔE_kR(t) = 2 [ W_{kkR} V_kR(t) + Σ_{j=1, j≠k}^{n} W_{kjR} V_jR(t) ] ΔV_kR(t) + W_{kkR} ΔV_kR²(t)
    + 2 [ W_{kkR} V_kI(t) + Σ_{j=1, j≠k}^{n} W_{kjR} V_jI(t) ] ΔV_kI(t) + W_{kkR} ΔV_kI²(t)        (6.4)
If we recognize [ W_{kkR} V_kR(t) + Σ_{j=1, j≠k}^{n} W_{kjR} V_jR(t) ] and [ W_{kkR} V_kI(t) + Σ_{j=1, j≠k}^{n} W_{kjR} V_jI(t) ]
as expressions of the form H_k(t) in the real case (with some arbitrary V_jR(t) and V_jI(t)), then
from the Hopfield convergence theorem applied to the expression ΔE = 2 H_k(t) ΔV_k(t) + W_{kk} ΔV_k²(t),
each contribution is non-negative and eventually reaches zero, so that E_kR(t) is non-decreasing
and converges to a local maximum. Hence ΔE_kR(t) in the complex case also reaches zero,
and E_kR(t) likewise converges to a local maximum.
Thus it remains to evaluate the imaginary part contribution of the energy function, i.e.

E_kI(t) = [V_1*(t), ..., V_k*(t), ..., V_n*(t)] (j W_I) [V_1(t), ..., V_k(t), ..., V_n(t)]^T

where W_I has zero diagonal and W_{ijI} = −W_{jiI}.
E_kI(t) = Σ_{i=1}^{k−1} V_i*(t) [ Σ_{j=1, j≠i}^{k−1} j W_{ijI} V_j(t) + j W_{ikI} V_k(t) + Σ_{j=k+1, j≠i}^{n} j W_{ijI} V_j(t) ]
    + V_k*(t) [ Σ_{j=1}^{k−1} j W_{kjI} V_j(t) + Σ_{j=k+1}^{n} j W_{kjI} V_j(t) ]
    + Σ_{i=k+1}^{n} V_i*(t) [ Σ_{j=1, j≠i}^{k−1} j W_{ijI} V_j(t) + j W_{ikI} V_k(t) + Σ_{j=k+1, j≠i}^{n} j W_{ijI} V_j(t) ]

and similarly,

E_kI(t+1) = Σ_{i=1}^{k−1} V_i*(t) [ Σ_{j=1, j≠i}^{k−1} j W_{ijI} V_j(t) + j W_{ikI} V_k(t+1) + Σ_{j=k+1, j≠i}^{n} j W_{ijI} V_j(t) ]
    + V_k*(t+1) [ Σ_{j=1}^{k−1} j W_{kjI} V_j(t) + Σ_{j=k+1}^{n} j W_{kjI} V_j(t) ]
    + Σ_{i=k+1}^{n} V_i*(t) [ Σ_{j=1, j≠i}^{k−1} j W_{ijI} V_j(t) + j W_{ikI} V_k(t+1) + Σ_{j=k+1, j≠i}^{n} j W_{ijI} V_j(t) ]
ΔE_kI(t) = 2 Σ_{j=1, j≠k}^{n} ( W_{kjI} V_jR(t) ΔV_kI(t) − W_{kjI} V_jI(t) ΔV_kR(t) )        (6.5)

Thus, combining (6.4) and (6.5),

ΔE_k(t) = ΔE_kR(t) + ΔE_kI(t)        (6.6)

This is the expression for ΔE_k(t) in the complex valued neural network.
As one can observe from the above expression, the first term is zero when the neural
net converges to a stable state (from [BRUCK 1987]). The second term is real but may take
negative values depending on the imaginary parts of the corresponding entries of the
weight matrix. But when the first term becomes zero, i.e., when the net converges to a
stable state, ΔV_kI is zero, and hence the second term is also zero at the stable state. This
means that the energy function of a complex valued neural net converges to a stable value.
It also proves that the complex valued associative memory constrained to a real connection
matrix converges to a stable state with a behavior that matches the real valued neural net
described by Hopfield.
Graphs of the convergence of the energy function to a stable value, and of the quantity
ΔE to zero, are depicted below. The relative performance of three cases is shown: (i) complex
synaptic weight matrix and complex vectors, (ii) real valued neural network, and (iii) an
intermediary of these two, i.e., complex vectors with a real valued synaptic weight matrix.
[Figure: Convergence of the energy function E versus time t, comparing real weights with real vectors against real weights with complex vectors.]
[Figure: Energy difference ΔE versus time t for the same cases, decaying to zero.]
Thus, as seen from the figures above, in the first graph the energy function value is
greatest for the case where both the weights and the vectors are complex, because the
imaginary part of the weight matrix is non-zero and contributes to the energy value. Next
comes the case where the weights are real but the vectors are complex: although no
imaginary part of the weights contributes, the imaginary parts of the vectors increase the
energy value relative to the original Hopfield case, which in turn has the smallest local
optima of convergence.
It remains to prove that this analogy between the complex and real cases of the
Hopfield associative memory in the serial mode also extends to the fully parallel mode. It
has been observed, however, that a separate proof is not required to illustrate the behavior
of the model in the fully parallel mode: the same proof can be extended in the manner
shown by the work of [BRUCK 1987].
Before going into the needed extension, observe first the general form of the expressions
for the fully parallel mode:

E_kR(t+1) = [V_1*(t+1), ..., V_k*(t+1), ..., V_n*(t+1)] W_R [V_1(t+1), ..., V_k(t+1), ..., V_n(t+1)]^T

E_kI(t+1) = [V_1*(t+1), ..., V_k*(t+1), ..., V_n*(t+1)] W_I [V_1(t+1), ..., V_k(t+1), ..., V_n(t+1)]^T
These expressions change because the computation of the function is done at every
node of the neural net at each time instant.
Instead of evaluating the above expressions, an easier method is to define a special
neural net N′ = [W′, T′], where

W′ = [ 0  W ; W  0 ]  (a 2n × 2n block matrix)  and  T′ = [ T ; T ]
As can be seen from the matrix W′ above, N′ is a new neural net which corresponds
to a bipartite graph with 2n nodes. Let the two subsets of nodes be P1 and P2, which are
independent sets of nodes.
It has been proved in previous work that
1. for any serial mode of operation in N there exists a serial mode of operation in
N’ provided W has a non-negative diagonal.
2. there exists a serial mode of operation in N’ which is equivalent to a fully parallel
mode of operation in N.
This can be seen as follows: if N is operating in a fully parallel mode, then since P1 and P2
correspond to independent sets of nodes, this is equivalent to evaluating one node at a
time in N′, which is precisely the serial mode.
Now, since N′ is operating in the serial mode, when N′ reaches a stable state one of the
following happens.
1. The current state of operation of both the partitions P1 and P2 that correspond to
N’ may be the same which means that both P1 and P2 converge to a stable state.
2. The current states of P1 and P2 are distinct, which implies that N will oscillate
between the two states at which P1 and P2 currently are, thus converging to a
cycle of length two.
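The construction can be sketched as follows: with W′ = [[0, W], [W, 0]], a serial sweep over the second partition of N′ (with the first partition holding the current state of N) reproduces one fully parallel step of N. The weight matrix below is an arbitrary symmetric example.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
W = rng.standard_normal((n, n))
W = (W + W.T) / 2                         # symmetric weight matrix of N

# Bipartite network N' on 2n nodes: no edges inside a partition.
W_prime = np.block([[np.zeros((n, n)), W],
                    [W, np.zeros((n, n))]])

V = np.sign(rng.standard_normal(n))
parallel_step = np.where(W @ V >= 0, 1.0, -1.0)   # one fully parallel step of N

# Serial sweep over the second partition of N', first partition clamped to V.
V_prime = np.concatenate([V, V])
for i in range(n, 2 * n):
    V_prime[i] = 1.0 if W_prime[i] @ V_prime >= 0 else -1.0
# V_prime[n:] now equals the fully parallel update of N, since each node in
# the second partition only sees the (clamped) first partition.
```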
The graphs which depict the operation in the parallel mode are shown below.
[Figure: Energy function E versus time t in the fully parallel mode, oscillating between two values.]
[Figure: Energy difference ΔE versus time t, oscillating about zero.]
The above graphs depict the oscillation of the value of the energy function for a particular
case, and of the value of ΔE as it oscillates about zero.
6.4 CONCLUSIONS
From the above discussion one can observe that evaluating the signum function at each node, and thereby determining the state vector for the next instant, makes the complex valued neural network similar in behavior to the real valued one. However, the neural network designs proposed so far ([ZURADA 1996], [ZURADA 2003] and [HIROSE 2003]) have not considered implementing the complex signum function in this manner. Since applying the function shows that the network converges just as the real valued one does, it can be conveniently applied to tasks such as image processing and pattern recognition.
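As a rough illustration of the node update discussed in this chapter, the sketch below assumes the complex signum csign(z) = sign(Re z) + j·sign(Im z), which keeps states on the complex hypercube {±1 ± j}^n. The Hermitian weight matrix and initial state are illustrative choices, not taken from the text.

```python
import numpy as np

def csign(z):
    # Assumed complex signum: sign of real and imaginary parts separately,
    # with ties broken toward +1.
    re = np.where(np.asarray(z).real >= 0, 1.0, -1.0)
    im = np.where(np.asarray(z).imag >= 0, 1.0, -1.0)
    return re + 1j * im

# Illustrative Hermitian weight matrix with zero diagonal.
W = np.array([[0., 1. + 1j],
              [1. - 1j, 0.]])

v = np.array([1. + 1j, -1. - 1j])   # initial state on the complex hypercube
for node in [0, 1, 0, 1]:           # serial mode: one node at a time
    v[node] = csign(W[node] @ v)
```

After these serial updates the state is stable: re-evaluating csign at either node leaves it unchanged, mirroring the convergence behavior of the real valued network.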
106 Multidimensional Neural Networks: Unified Theory
REFERENCES
[1] [HOPFIELD 1982] J.J. Hopfield, "Neural Networks and Physical Systems with Emergent Collective Computational Abilities", Proc. Nat. Acad. Sci. USA, Vol. 79, pp. 2554–2558, 1982.
[2] [BRUCK 1987] Jehoshua Bruck and Joseph W. Goodman, "A Generalized Convergence Theorem for Neural Networks", IEEE First International Conference on Neural Networks, San Diego, CA, June 1987.
[3] [ZURADA 1996] Stanislaw Jankowski, Andrzej Lozowski and Jacek M. Zurada, "Complex-valued Multistate Neural Associative Memory", IEEE Transactions on Neural Networks, Vol. 7, No. 6, November 1996.
[4] [ZURADA 2003] Mehmet Kerem Muezzinoglu, Cuneyt Guzelis and Jacek M. Zurada, "A New Design Method for the Complex-valued Multistate Hopfield Associative Memory", IEEE Transactions on Neural Networks, Vol. 14, No. 4, July 2003.
[5] [HIROSE 2003] Akira Hirose, "Complex Valued Neural Networks: Theories and Applications", World Scientific Publishing Co., November 2003.
[6] G. Rama Murthy and D. Praveen, "Complex valued Neural Associative Memory on the Complex Hypercube", Proceedings of the 2004 IEEE Conference on Cybernetics and Intelligent Systems (CIS 2004).
CHAPTER
7
Optimal Binary Filters:
Neural Networks
7.1 INTRODUCTION
Our essential goal is to formulate and solve an optimal filtering problem. This problem is closely related to the optimal signal design problem, which is formulated and solved first. The signal design problem is an open research problem formulated by A. Wyner in continuous time [8].
The problem is to choose the input sequence maximizing the total output energy of the filter over a finite horizon, subject to
X_{k+1} = A X_k + B u_k,   X(0) = 0
Y_k = C X_k
Here A is an n × n matrix, B is an n × p matrix, and C is an m × n matrix. Single-input, single-output (SISO) as well as multi-input, multi-output (MIMO) channels are considered. Let the m × p impulse response matrix [6] be denoted by h(·).
7.2.2 Solution
The optimal control vectors which maximize the total output energy of a linear discrete
time filter over a finite horizon [0, k f ] are given by
u*_k = Sign( Σ_j R_kj u_j )
where
R_kj = Σ_{i = max{1, j − k − 1}}^{k_f − k − 1} h^T(i + 1) h(k + i + 1 − j)
and u*_k is the optimal choice. The condition provided above is a necessary condition on the optimum input signal/control. The stable states of a neural network constitute the local optimum control vectors. The global optimum stable state provides the global optimum control vector [2].
Proof: The discrete maximum principle, well known in the literature, is utilized to provide the solution. Consider a one-dimensional linear dynamical system. Its state space description is given by
X(k + 1) = A(k) X(k) + B(k) u(k),   X(0) = X_0    (7.1)
where A(k) is the state transition matrix (a second order tensor) and X(k) is the state vector (a first order tensor), i.e. an n × 1 vector. C(k) is an m × n matrix, B(k) is an n × p matrix.
In the case of certain multidimensional linear systems as well as infinite dimensional
linear systems, the state transition tensor [7], the state tensor, the input tensor [5], the
output tensor are of compatible dimension as well as order. The discrete time dynamical
system evolution (linear or non-linear) is described through tensors [7]. The inner product
between the linear operators is carried out with the standard method. To restrict oneself
to the problem considered in the present chapter, we return to the one-dimensional system
keeping in mind that the authors already made the extension to multi/infinite
dimensional systems [7].
The input sequence satisfies constraints of the form |u_k^i| ≤ 1, where u_k^i is the ith component of the input vector. Thus u_k ∈ V, a subset of R^p.
The cost function J is given below. The problem considered is to find an admissible sequence û_k, k = 0, 1, ..., k_f − 1, satisfying the constraints and minimizing the objective function J.
J = −(1/2) Σ_{k=0}^{k_f − 1} X_k^T C^T(k) C(k) X_k − (1/2) X_{k_f}^T C^T(k_f) C(k_f) X_{k_f}    (7.3)
λ_k = ∂H(x_k, u_k, λ_{k+1}, k) / ∂x_k    (7.4)
where, the Hamiltonian is given by
H(x_k, u_k, λ_{k+1}, k) = −(1/2) X_k^T C^T(k) C(k) X_k + λ_{k+1}^T [A(k) X_k + B(k) u_k]    (7.5)
This will provide the terminal condition for solving (7.6). Since the input is constrained, the optimum input must satisfy the sign condition derived below. Carrying out the backward recursion (and using Y(k) = C(k) X(k)), we have
λ_{k_f − 1} = −C^T(k_f − 1) Y(k_f − 1) − A^T(k_f − 1) C^T(k_f) Y(k_f)
λ_{k_f − 2} = −C^T(k_f − 2) Y(k_f − 2) − A^T(k_f − 2) C^T(k_f − 1) Y(k_f − 1) − A^T(k_f − 2) A^T(k_f − 1) C^T(k_f) Y(k_f)
λ_{k_f − 3} = −C^T(k_f − 3) Y(k_f − 3) − A^T(k_f − 3) C^T(k_f − 2) Y(k_f − 2) − A^T(k_f − 3) A^T(k_f − 2) C^T(k_f − 1) Y(k_f − 1) − A^T(k_f − 3) A^T(k_f − 2) A^T(k_f − 1) C^T(k_f) Y(k_f)
Thus continuing the pattern downwards, we have for the linear time invariant filters
λ_{k_f − l} = −C^T Y(k_f − l) − A^T C^T Y(k_f − l + 1) − (A^T)^2 C^T Y(k_f − l + 2) − (A^T)^3 C^T Y(k_f − l + 3) − ... − (A^T)^l C^T Y(k_f)    (7.11)
−( AT )2 CT Y(k + 3) ...
k f − k+l
−( AT ) C T Y( k + l + 1) (7.12)
* l
i =0
∑
uk = Sign BT ( AT )i C T Y ( k + i + 1)
(7.13)
* l
i =0
∑
i.e. uk = Sign h T (i + 1)Y ( k + i + 1)
(7.14)
Exchanging the order of summation (with the help of the grid in Figure 7.1), we have
u*_k = Sign( Σ_{j=0}^{k_f} Σ_{i=max{1, j−k−1}}^{k_f − k − 1} [h^T(i + 1) h(k + i + 1 − j)] u(j) )    (7.16)
[Figure 7.1: Grid of index pairs (i, j), 0 ≤ j ≤ k_f, 1 ≤ i ≤ k_f − k − 1, used to exchange the order of summation]
Let us define
R_kj = Σ_{i=max{1, j−k−1}}^{k_f − k − 1} h^T(i + 1) h(k + i + 1 − j)    (7.17)
Thus,
u*_k = Sign( Σ_{j=0}^{k_f} R_kj u(j) )    (7.18)
In the case of linear time varying systems, the above derivation still applies. It is easy to see that the optimal control vector over a finite horizon of a time varying linear system has an expression of the following form:
u*_k = Sign( Σ_j S_kj u(j) )
where S_kj is the energy density matrix of the time varying linear system. This can be stated as a theorem. The authors derived similar results for multidimensional and infinite dimensional systems [2].
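A minimal sketch of equations (7.17)–(7.18) for a scalar (SISO) channel: the energy density matrix R is assembled from an assumed impulse response h, and a locally optimal binary input is sought by running the sign recursion in serial mode, one component at a time. The horizon and h are illustrative assumptions, not values from the text.

```python
import numpy as np

kf = 8
h = np.array([0.0, 1.0, 0.5, 0.25])   # illustrative h(0)..h(3); zero beyond

def h_at(i):
    # Impulse response with zero padding outside its support.
    return h[i] if 0 <= i < len(h) else 0.0

# Energy density matrix, equation (7.17), for a scalar impulse response.
R = np.zeros((kf + 1, kf + 1))
for k in range(kf + 1):
    for j in range(kf + 1):
        for i in range(max(1, j - k - 1), kf - k):   # i = max{1, j-k-1} .. kf-k-1
            R[k, j] += h_at(i + 1) * h_at(k + i + 1 - j)

# Sign recursion (7.18) run in serial mode: update one component at a time,
# keeping the component unchanged on a zero net input.
u = np.ones(kf + 1)
for _ in range(50):
    for k in range(kf + 1):
        s = R[k] @ u
        if s != 0:
            u[k] = np.sign(s)
```

The resulting u is a binary vector; interpreting R as the connection matrix, the serial updates are exactly the serial mode of a Hopfield-type network, so u settles at a stable state corresponding to a local optimum control vector.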
The optimal filter problem is formulated below. For simplicity we consider a single-input, single-output linear filter/channel/system. It is easily seen that the extension to multi-input, multi-output systems follows in a straightforward manner. It is discussed how the solution of the optimal signal design problem also leads to a solution of this problem.
X_{k+1} = A X_k + B u_k,   X(0) = 0
Y_k = C X_k
where A is an n × n matrix, B is an n × 1 matrix, and C is a 1 × n matrix. Let the impulse response at time k be denoted by h(k).
That is, find bounded support, bounded magnitude impulse response values such that the total output energy over a finite horizon is maximized. The input is unconstrained.
7.3.2 Solution
Since convolution is a commutative operator, as far as the output of linear filter is concerned,
the roles of input and impulse response can be exchanged.
y ( n ) = u ( n ) * h ( n) = h ( n ) * u ( n )
The input and impulse response have a dual role in determining the output. Maximizing the output energy subject to a bounded magnitude, bounded support input is equivalent to maximizing it subject to a bounded magnitude, bounded support impulse response. The optimal input vector is given as the stable state of a neural network [3]. Thus the optimal input signals constitute a linear code [1], [4].
The optimal set of impulse responses constitutes the stable states of a neural network whose connection matrix is the input energy density matrix. The components of the optimal impulse response vector assume binary values; they constitute a linear code. The linear filtering operation thus reduces to a "binary filtering" operation, and the optimal binary filters are related to optimal codes matched to the input. The derivation follows by replacing the "input" by an "impulse response" finite in extent and finite in support, and involves a duplication of the effort required in the derivation of the "optimal input". Thus, linear filtering involves weighting the input values in the window by binary values. It is shown in [3] that the logic functions in multidimensions also constitute the stable states of an m-d neural network.
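The "binary filtering" operation described above can be illustrated as follows; the input sequence and the binary (±1) impulse response are made-up values, and the second computation spells out the signed accumulation over the input window.

```python
import numpy as np

u = np.array([0.5, -1.0, 2.0, 0.0, 1.5])   # illustrative input sequence
b = np.array([1.0, -1.0, 1.0])             # illustrative binary impulse response

# Binary filtering: ordinary convolution with an impulse response in {-1, +1}.
y = np.convolve(u, b)

# The same output, computed explicitly as a signed sum over the input window:
# y(n) = sum_k b(k) u(n - k).
y_manual = np.array([sum(b[k] * u[n - k]
                         for k in range(len(b)) if 0 <= n - k < len(u))
                     for n in range(len(u) + len(b) - 1)])
```

Because the weights are ±1, each output sample is obtained by additions and subtractions only, which is the sense in which linear filtering "reduces" to binary filtering.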
[Figure: Solution procedure. (i) Determine the connection matrix of the neural network (the energy density matrix). (ii) Determine the local/global optimum input signal, i.e., the stable state of the neural network, by running it in serial mode.]
7.4 CONCLUSIONS
By the duality between the optimal filtering problem and the optimal control/signal design problem, it is shown that the global optimum impulse response constitutes a stable state of a Hopfield neural network.
REFERENCES
[1] Jehoshua Bruck and Mario Blaum, "Neural Networks, Error-Correcting Codes, and Polynomials over the Binary n-Cube", IEEE Transactions on Information Theory, Vol. 35, No. 5, September 1989.
[2] G. Rama Murthy, "Optimal Control, Codeword, Logic Function Tensors: Multidimensional Neural Networks", International Journal of Systemics, Cybernetics and Informatics (IJSCI), October 2006, pp. 9–17.
[3] G. Rama Murthy, "Multi/Infinite Dimensional Neural Networks, Multi/Infinite Dimensional Logic Theory", International Journal of Neural Systems, Vol. 15, No. 3, pp. 223–235, 2005.
[4] G. Rama Murthy, "Multi/Infinite Dimensional Coding Theory: Multi/Infinite Dimensional Neural Networks: Constrained Static Optimization", Proceedings of the IEEE Information Theory Workshop, October 2002.
[5] A.I. Borisenko and I.E. Tarapov, "Vector and Tensor Analysis with Applications", Dover Publications Inc., New York, 1968.
[6] M. Gopal, "Modern Control System Theory", John Wiley and Sons, New York.
[7] G. Rama Murthy, "Tensor State Space Representation: Multidimensional Systems", International Journal of Systemics, Cybernetics and Informatics (IJSCI), January 2007, pp. 16–23.
[8] T. Cover and B. Gopinath (Eds.), "Open Problems in Communication and Computation", Springer-Verlag, 1987.
[9] A.E. Bryson and Y.C. Ho, "Applied Optimal Control: Optimization, Estimation and Control", Taylor and Francis Inc., 1995.
CHAPTER
8
Linear Filter Model of a
Synapse: Associated Novel
Real/Complex Valued Neural
Networks
8.1 INTRODUCTION
Artificial neural networks are innovated to provide models of biological neural networks.
The currently available models of neurons are utilized to build single layer ( e.g. single
layer perceptron ) as well as multi-layer neural networks (e.g. multi-layer perceptron).
These neural networks were utilized successfully in several applications. Also various
paradigms of neural networks such as radial basis functions, self-organizing memory are
innovated and utilized in applications.
In the case of conventional real valued neural networks, the inputs and outputs belong to a Euclidean space (R^n or R^m). In these neural networks, a synapse is represented/modeled by a single synaptic weight lumped at one point. These synaptic weights are updated in the training phase using one of the learning laws (for example, the perceptron learning law, the gradient rule, etc.). In the case of supervised training, these learning laws enable one to classify the input patterns into finitely many classes (based on the training samples).
Consider input signals defined on a finite support [0, T]. Thus the class of input signals belongs to a function space (defined on [0, T]). For the sake of notational convenience, let the synaptic weight functions also be defined on [0, T].
In summary, a continuous-time, real valued neuron has input signals (which are real valued
functions of time) defined over a finite support. The input signals are fed to synapses acting as
linear systems/filters and sum of responses is operated on by an activation function. Using this
model of a neuron, various feed-forward/recurrent networks of neurons are designed and studied.
This chapter is organized as follows. In section 2, continuous time perceptron model is
discussed. Also in this section, the continuous time perceptron learning law is discussed.
In section 3, abstract mathematical structure of neuronal models is discussed. In section 4,
neuronal model based on finite impulse response model of synapse is discussed. Also the
associated neural networks are proposed. In section 5, a novel continuous time associative
memory is proposed and the convergence theorem is discussed. In section 6, various
multidimensional neural network generalizations are discussed. In section 7, complex
valued neural networks based on the continuous time neuronal model are discussed. The
chapter concludes in section 8.
The area of artificial neural networks was pioneered by the efforts of McCulloch and Pitts to provide a model of the neuron. Soon it was realized that such a model of the neuron provides no training of the synaptic weights. Rosenblatt therefore proposed the model of a single perceptron as well as a single layer of perceptrons, and provided the perceptron learning law. This law was proved to converge when the input patterns are linearly separable. Later it was shown that a Multi-Layer Perceptron, a feed forward network, can be trained (using the back-propagation algorithm) to classify non-linearly separable patterns. In the following (as discussed in the Introduction), we propose a more biologically accurate model of the neuron and use it to construct various artificial neural networks.
y(t) = Sign( Σ_{i=1}^{M} a_i(t) ⊗ W_i(t) − T )    (8.1)
where ⊗ denotes the convolution operation between two time functions (and T is the threshold
at the neuron. Without loss of generality, T can be assumed to be zero ). More explicitly,
y(t) = Sign( Σ_{i=1}^{M} ∫_0^T a_i(τ) W_i(t − τ) dτ − T )    (8.2)
The successive input functions are defined over the interval [0,T]. They are fed as inputs to
the continuous time neurons at successive SLOTS.
y(t) = Sign( Σ_{i=1}^{M} a_i(t) ⊗ w_i(t) )
[Figure: Model of a continuous-time neuron with input functions a_1(t), a_2(t), ..., a_M(t) and synaptic weight functions w_1(t), w_2(t), ..., w_M(t)]
Proof: In this model of continuous time perceptron, the weights are functions of time defined
on the interval [0, T]. Thus, since the synaptic weights are functions of time, we are led to
investigating the type of convergence: (i) Pointwise or (ii) Uniform.
Suppose we fix the time point, t. The convergence of synaptic weights in (8.3) is assured
by the proof of convergence in the case of conventional perceptron. Since, the choice of
time point is arbitrary, we are assured of pointwise convergence of synaptic weights based
on training sample input functions.
It is interesting to ask under what conditions the sequence of synaptic weight functions converges UNIFORMLY. Q.E.D.
Consider the inputs to a continuous-time neuron which are defined on a finite support
[0,T]. Let the impulse responses of synapses modeled as linear filters be defined on the
finite support [0,T]. Thus, the inputs as well as synaptic weight functions belong to the
function space defined over the finite support [0,T]. We answer the following question.
Q: Under reasonable assumptions, what is the mathematical structure of the function
space defined over [0,T] ?
Let F be the set (function space) on which the following operations are well defined:
Addition, Convolution (These operations are like addition, multiplication defined on the
sets: real numbers, complex numbers).
Lemma: Let the identically zero function be the additive identity element and the delta function (δ(t) = 1 for t = 0 and δ(t) = 0 for t ≠ 0) be the multiplicative identity. Then the set F, on which the addition and convolution operations (on functions defined on [0, T]) are defined, constitutes a ring.
Proof: This involves routine verification of the ring axioms (closure of F under the addition and convolution operations on its member functions) and is omitted for brevity. Actually F is "close" to being a field, except that a multiplicative inverse does not always exist. Q.E.D.
By including such inverse elements, F can be converted into a field.
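The ring structure asserted in the lemma can be checked numerically using discrete sequences on a finite support as stand-ins for functions on [0, T], with convolution playing the role of multiplication and the unit sample playing the role of the delta identity; the particular sequences are illustrative.

```python
import numpy as np

# Illustrative members of F, represented as finitely supported sequences.
f = np.array([1.0, 2.0, -1.0])
g = np.array([0.5, 0.0, 3.0])
h = np.array([-2.0, 1.0, 1.0])
delta = np.array([1.0])            # delta(0) = 1, zero elsewhere

# Ring axioms for (F, +, convolution):
commutative  = np.allclose(np.convolve(f, g), np.convolve(g, f))
associative  = np.allclose(np.convolve(np.convolve(f, g), h),
                           np.convolve(f, np.convolve(g, h)))
distributive = np.allclose(np.convolve(f, g + h),
                           np.convolve(f, g) + np.convolve(f, h))
has_identity = np.allclose(np.convolve(f, delta), f)
```

The missing field axiom is visible here too: most sequences have no convolutional inverse that is itself finitely supported, which is exactly why F is only "close" to a field.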
Now define a vector space, G over the field. The set of input functions incorporated
into a vector belongs to G. The usual ‘multiplication’ operation is replaced by ‘convolution’.
Σ_{i=1}^{M} a_i(t) ⊗ W_i(t) = L(t)    (8.5)
• So far we have considered continuous time neural networks in which the synaptic weight function corresponds to an analog linear filter. A natural question arises whether it is possible to conceive a synapse whose impulse response corresponds to that of a digital filter, i.e. a Finite Impulse Response (FIR) filter. In the following, we consider neural networks with such a model of the synapse.
• Typically, let the discrete time input signals be considered over the finite horizon [0, 1, 2, ..., S]. For the sake of simplicity, let the length of all FIR filters modeling the synapses be the same, say T (the generalization to the case where the FIR filters have different lengths is straightforward). Thus the impulse response sequences (associated with different synapses) extend over the duration {0, 1, 2, ..., T}.
• The output of the synapse (described by an FIR filter) depends on the input signal
values over a finite horizon (depending on the length of the impulse response).
Typically the length of filter is smaller than the support of a distinct input sequence
i.e. T << S. It should be noted that the successive input sequences are of same length.
y(n) = Sign( Σ_{i=1}^{M} C^i(n) ⊗ a^i(n) )    (8.7)
     = Sign( Σ_{i=1}^{M} Σ_{k=0}^{T} C^i(k) a^i(n − k) )
where C^i(k), for k = 0, 1, ..., T, is the impulse response sequence of the ith synapse and a^i(k), for k = 0, 1, ..., S, is the ith input sequence to the neuron.
• Thus the synaptic weight sequence values (the impulse responses of the FIR filters) can be trained according to the following perceptron learning law:
C_i^{(n+1)}(k) = C_i^{(n)}(k) + η (S(k) − g(k)) a_i(k)    (8.8)
where S(k ) is the target output for the current training example, g(k ) is the output generated
by the perceptron at time k and η is a positive constant called the learning rate.
This update rule converges when the input patterns are linearly separable. Using the
same model of neuron, a multi layer perceptron is trained using a modified version of
Back Propagation Algorithm.
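A hedged sketch of the FIR-synapse neuron of equation (8.7) together with one pass of the update (8.8); the sizes, random inputs, target and learning rate are illustrative assumptions, not values from the text.

```python
import numpy as np

M, T_len, S_len = 2, 3, 8               # synapses, FIR length, input length (assumed)
rng = np.random.default_rng(0)
C = [np.zeros(T_len) for _ in range(M)] # impulse responses of the M synapses
a = [rng.standard_normal(S_len) for _ in range(M)]
target = np.sign(rng.standard_normal(S_len + T_len - 1))   # illustrative target
eta = 0.1                                # learning rate

def neuron_output(C, a):
    # Equation (8.7): y(n) = Sign( sum_i (C_i * a_i)(n) ), ties broken to +1.
    s = sum(np.convolve(Ci, ai) for Ci, ai in zip(C, a))
    return np.where(s >= 0, 1.0, -1.0)

g = neuron_output(C, a)
for i in range(M):
    for k in range(T_len):
        # Equation (8.8): C_i^(n+1)(k) = C_i^(n)(k) + eta*(S(k) - g(k))*a_i(k)
        C[i][k] += eta * (target[k] - g[k]) * a[i][k]
```

Repeating the update over the training set mirrors the conventional perceptron iteration, with each FIR tap playing the role of one lumped synaptic weight.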
It is possible to consider neuronal models in which the synapse acts as an Infinite
Impulse Response filter. Furthermore, based on such a model of neuron (synapse acting as
an FIR filter), it is possible to discuss a novel associative memory. Currently, the models of
neurons discussed (in section 2, section 4) are being compared with traditional models of
neurons [Rama5].
In addressing, the problem of signal design for magnetic/optical recording channels, Wyner
formulated an open research problem [GoC]. The problem statement is provided below.
Maximize ∫_0^T y(t)^2 dt, where y(t) is the output of the linear filter.
The author [Rama3] as well as Honig et al. [HoS] independently solved the problem.
The solution in [Rama3] is more general in the sense that we considered Multi-Input, Multi-
Output (MIMO) linear time varying filters and derived the optimal input vector. Let Y (t)
be an optimal input vector. Then it satisfies the following signed integral equation
Y(t) = Sign( ∫_0^T R(t, u) Y(u) du )    (8.9)
where R (t,u) is the energy density matrix of the multi-input, multi-output, Linear time
varying system. In the case of multi-input, multi-output, linear time invariant system, the
optimal input vector satisfies the following equation
Y(t) = Sign( ∫_0^T R(t − u) Y(u) du )    (8.10)
Consider the successive approximation scheme
Y^{(n+1)}(t) = Sign( ∫_0^T R(t − τ) Y^{(n)}(τ) dτ )    (8.11)
From practical considerations, it is necessary to know whether the above successive approximation scheme converges or not. This problem is converted into an equivalent problem by discretizing the continuous time linear system into a discrete time system. Such discretization can always be done for some types of systems (satisfying some regularity conditions) without significantly distorting the system dynamics. The standard procedure for discretizing a continuous time system is summarized in many textbooks, including Gopal's book [Gop, pp. 185–187].
With the discrete time system equivalent to the continuous time system, the argument
technique adopted for convergence is once again the energy function hill climbing in
successive iterations.
Theorem 8.1: Consider a Multi-Input, Multi-Output (MIMO), linear time-invariant
system described by the dynamics
Ẋ(t) = A X(t) + B Y(t)
Z(t) = C X(t)    (8.12)
The discrete time simulation (of the above continuous time system) of the following form
X( k+1 ) = F X( k ) + G Y (k) (8.13)
Z( k ) = H X( k ) (8.14)
can always be done. The discrete simulation is almost exact except for the error introduced
by sampling the input and that caused by the iterative procedure for evaluating the matrices.
Proof: Follows from the procedure described in Gopal (Gop, pp.185-187 ). Q.E.D.
With such a discrete time system corresponding to a continuous time system, we have
the following recursion (successive approximation scheme):
Y (n +1) ( k ) = Sign {W Y( n) (k )} for n ≥ 0, (8.15)
Where Y(k ) is the optimal control vector associated with the discrete time linear system
(obtained by discretizing a continuous time system) and W is the energy density tensor
(associated with the discrete time system). Thus we have a Hopfield network with W as
the synaptic weight matrix. Hence starting with an initial vector Y (0) (k ) , the above recursion
converges to a stable state (local optimum vector) or at most a cycle of length 2 ( by invoking
the convergence theorem associated with Hopfield neural network whose Connection
matrix is W). Q.E.D.
Thus, the above approach converts the problem of determining the convergence of the scheme in (8.11) to the corresponding problem for a discrete time linear system. The iteration is reminiscent of an L∞ version of the Neumann series. The energy function (Lyapunov function) optimized over the state trajectory of the continuous time linear system is a quadratic form [Rama1].
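The discretization step of Theorem 8.1 can be sketched as follows, assuming a zero-order hold on the input with sampling period dt; the system matrices (A, B, C) and dt are illustrative, and the matrix exponential F = exp(A·dt) and the input matrix G are computed here by truncated power series rather than a library routine.

```python
import numpy as np

# Illustrative stable continuous-time system (eigenvalues of A are -1, -2).
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])
dt = 0.01                                # sampling period (assumed)

def expm_series(M, terms=30):
    # Truncated power series for the matrix exponential exp(M).
    out, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for n in range(1, terms):
        term = term @ M / n
        out = out + term
    return out

F = expm_series(A * dt)                  # F = exp(A*dt), equation (8.13)

# G = (integral_0^dt exp(A*tau) dtau) B, integrating the series termwise:
# sum_n A^n dt^(n+1) / (n+1)!
G_int, term = np.eye(2) * dt, np.eye(2) * dt
for n in range(1, 30):
    term = term @ (A * dt) / (n + 1)
    G_int = G_int + term
G = G_int @ B
H = C                                    # output matrix is unchanged, (8.14)
```

For small dt, F is close to I + A·dt and G is close to B·dt, so the discrete model tracks the continuous dynamics; the recursion (8.15) can then be run on the resulting discrete system.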
In [BrB], various possible generalized neural networks are discussed. These neural
networks are associated with an energy function which is a higher order form than a
quadratic form (associated with a Hopfield neural network). It is very natural to formalize
associative memories which are generalizations of those discussed in this chapter.
Several generalizations of the results are documented in the technical report [Rama5].
For instance, the complex valued, continuous time associative memory is discussed in
detail in the technical report [Rama5, RaP]. For such a complex valued associative memory,
a convergence theorem is stated and proved.
8.8 CONCLUSIONS
In this chapter, models of neurons are proposed in which the synapses are considered as distributed elements rather than lumped elements. Thus, synapses are modeled as linear filters in continuous time as well as discrete time. Using these novel models of neurons, associated neural networks are proposed. Also, a novel model of associative memory is proposed, and the convergence aspects of its various modes of operation are discussed. Multidimensional generalizations of the neural networks are discussed, as are the associated complex valued neural networks.
REFERENCES
[AAV] I. N. Aizenberg, N. N. Aizenberg and J. Vandewalle, “Multi-Valued and Universal
Binary Neurons”, Kluwer Academic Publishers, 2000.
[BrB] J. Bruck and M. Blaum, “Neural Networks, Error Correcting Codes, and Polynomials over
the Binary n-Cube”, IEEE Transactions on Information Theory, pp. 976- 987, Vol. 35, No.5,
September 1989.
[GoC] T. Cover and B. Gopinath (Eds.), "Open Problems in Communication and Computation", Springer-Verlag, 1987.
[Gop] M. Gopal, “Modern Control System Theory“, John Wiley & Sons, Second Edition,
1993.
[HoS] M. Honig and K. Steiglitz, "On Wyner's Conjecture", Bellcore Technical Memorandum.
[Nit1] T. Nitta and T. Furuya: “A Complex Back-propagation Learning”, Transactions of
Information Processing Society of Japan, Vol.32, No.10, pp.1319-1329 (1991) (in Japanese).
[Nit2] T. Nitta : “An Extension of the Back-Propagation Algorithm to Complex Numbers”, Neural
Networks, Vol.10, No.8, pp.1391-1415 (1997).
[Rama1] G. Rama Murthy,” Unified Theory of Control, Communication and Computation”, To
be submitted to Proceedings of IEEE.
[Rama2] G. Rama Murthy, “Multi/Infinite Dimensional Neural Networks, Multi/Infinite
Dimensional Logic Theory“, International Journal of Neural Systems, Vol. 15, No. 3, Pages
223-235, June 2005.
[Rama3] G. Rama Murthy, “Signal Design for Magnetic/Optical Recording Channels: Spectra of
Bounded Functions“, Bellcore (Now Telcordia) Technical Memorandum, TM-NWT-018026.
[Rama4] G. Rama Murthy, “Optimal Control, Codeword, Logic Function Tensors:
Multidimensional Neural Networks“, International Journal of Systemics, Cybernetics and
Informatics, October 2006, pages 9-17.
[Rama5] G. Rama Murthy, “Linear Filter Model of Synapses: Associated Novel Real/Complex
Valued Neural Networks“, IIIT Technical Report in Preparation.
[RaP] G. Rama Murthy and D. Praveen, "Complex-Valued Neural Associative Memory on the Complex Hypercube", Proceedings of the 2004 IEEE International Conference on Cybernetics and Intelligent Systems (CIS-2004), Singapore.
CHAPTER
9
Novel Complex Valued
Neural Networks
9.1 INTRODUCTION
Starting in the 1950s, researchers tried to arrive at models of neuronal circuitry; thus the research field of artificial neural networks was born. The so-called perceptron was shown to be able to classify linearly separable patterns. Since the Exclusive-OR gate cannot be synthesized by any perceptron (as the gate outputs are not linearly separable), interest in artificial neural networks faded away. In the 1970s, it was shown that a multi-layer feed forward neural network such as a multi-layer perceptron is able to classify non-linearly separable patterns.
Living systems such as homo sapiens, lions, tigers, etc. have the ability to associate externally presented one/two/three dimensional information, such as audio signals, images and three dimensional scenes, with the information stored in the brain. This highly accurate association of information is achieved through the bio-chemical circuitry in the brain. In the 1980s, Hopfield revived interest in the area of artificial neural networks through a model of associative memory. The main contribution is a convergence theorem which shows that the artificial neural network reaches a memory/stable state starting from any arbitrary initial input (in a certain important mode of operation).
He also demonstrated several interesting variations of associative memory. In (Rama4), a
continuous-time version of associative memory is described. It is shown that the celebrated
convergence theorem in discrete time generalizes to the continuous time associative
memory. In (Rama2), the model of associative memory in one dimension (Hopfield
associative memory) is generalized to multi/infinite dimensions and the associated
convergence theorem is proven.
It was realized by researchers such as N.N. Aizenberg that the basic model of a neuron must be modified to account for complex valued inputs, complex valued synaptic weights and thresholds [AAV]. In many real world applications, complex valued input signals need to be processed by neural networks with complex synaptic weights [Hir]. Thus there is a real need to study, design and analyze such networks.
Also, in (Rama3) the results on real valued associative memories are extended to
complex valued neural networks. In [Nit1, Nit2], the celebrated back propagation
algorithm is generalized to complex valued neural networks. Also, in [Rama4], based
on a novel model of neuron, complex valued neural networks are designed. Thus, based
on the results in section 2, section 3, it is reasoned that transforming real valued signals
into complex domain and processing them in the complex domain could have many
advantages.
This chapter is organized as follows. In Section 2, Discrete Fourier Transform (DFT)
is utilized to transform a set of real/complex valued sequences into the complex valued
( frequency) domain. It is reasoned that, in a well defined sense, processing the signals
using complex valued neural networks is equivalent to processing them in real domain.
In Section 3, a novel model of continuous time neuron is discussed. The associated neural
networks (based on the novel model of neuron) are briefly outlined. In Section 4, some
important generalizations are discussed. In Section 5, some open questions are outlined.
The chapter concludes in Section 6.
In the field of Digital Signal Processing (DSP), discrete sequences are processed by discrete time circuits such as digital filters. One transform which converts time domain information into the frequency domain is called the Discrete Fourier Transform (DFT). One of the main reasons for utilizing the DFT in many applications is the existence of a fast algorithm to compute it, called the Fast Fourier Transform (FFT). In the following, we provide the mathematical expressions for the Discrete Fourier Transform (DFT) as well as the Inverse Discrete Fourier Transform (IDFT) of a discrete sequence {x_n}_{n=0}^{M−1}, i.e. {x_0, x_1, x_2, ..., x_{M−1}}.
DFT:  X(k) = Σ_{n=0}^{M−1} x(n) W_M^{kn}   for 0 ≤ k ≤ M − 1    (9.1)
IDFT: x(n) = (1/M) Σ_{k=0}^{M−1} X(k) W_M^{−kn}   for 0 ≤ n ≤ M − 1    (9.2)
where
W_M = e^{−j(2π/M)}    (9.3)
Lemma 9.1: Under Bijective Linear Transformation, linearly separable patterns in Euclidean
Space are mapped to linearly separable patterns in the transform space.
Proof: For the sake of notational convenience, we consider the patterns in a 2-dimensional
Euclidean space. Let the bijective/invertible linear transformation be T: R 2 → R 2 .
Let the original separating line (more generally hyperplane) be given by
W1 X + W2 Y = C (9.4)
T : R^2 → R^2,  (x, y) → (px + qy, rx + sy)    (9.6)
Let the linear transformation be represented by the following matrix:
[ p  q ]
[ r  s ]    (9.7)
Under this transformation, the coordinates of the separating line transform as
[ X′ ]   [ p  q ] [ X ]
[ Y′ ] = [ r  s ] [ Y ]    (9.8)
Thus we readily have
X′ = pX + qY
Y′ = rX + sY    (9.9)
On inverting the linear transformation, we have
[ X ]   [ p  q ]^{-1} [ X′ ]   [  s/d  −q/d ] [ X′ ]
[ Y ] = [ r  s ]      [ Y′ ] = [ −r/d   p/d ] [ Y′ ]    (9.10)
where d is the determinant of the matrix, d = ps − qr. We thus have
X = (s/d) X′ − (q/d) Y′
Y = (−r/d) X′ + (p/d) Y′    (9.11)
Thus, substituting for X, Y in the original separating line/hyperplane W1 X + W2 Y = C, we readily have
W1 ((s/d) X′ − (q/d) Y′) + W2 ((−r/d) X′ + (p/d) Y′) = C
Hence, if (x, y) ∈ S1, then T(x, y) = (x′, y′) ∈ S′1, where S′1 is the image of S1 under T.
• Thus we have shown that the patterns which are linearly separable in two
dimensional Euclidean space will remain linearly separable after applying a
bijective linear transformation to the samples.
• The above proof is easily generalized to samples in n-dimensional Euclidean
space (where ‘n’ is arbitrary). Q.E.D.
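Lemma 9.1 can be checked numerically. The sketch below assumes an illustrative separating line and an illustrative invertible transformation (the specific values of W1, W2, C, p, q, r, s are not from the text); the transformed line coefficients are read off from the substitution step of the proof:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2, C = 1.0, -2.0, 0.5                 # original separating line W1*X + W2*Y = C, eq. (9.4)
pts = rng.normal(size=(200, 2))
labels = np.sign(W1 * pts[:, 0] + W2 * pts[:, 1] - C)

p, q, r, s = 2.0, 1.0, 0.0, 3.0            # example transform; d = ps - qr = 6, so T is bijective
d = p * s - q * r
tpts = pts @ np.array([[p, q], [r, s]]).T  # (X', Y') per eq. (9.8)

# Transformed separating-line coefficients, from substituting (9.11) into (9.4):
W1p = (W1 * s - W2 * r) / d                # coefficient of X'
W2p = (-W1 * q + W2 * p) / d               # coefficient of Y'
tlabels = np.sign(W1p * tpts[:, 0] + W2p * tpts[:, 1] - C)
assert np.array_equal(labels, tlabels)     # each point stays on the same side of the line
```

The decision value is algebraically unchanged by the substitution, which is why every point keeps its class after the transformation.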
Consider equation (9.1) for computing the Discrete Fourier Transform of a
discrete sequence of samples {x(n) : 0 ≤ n ≤ (M − 1)}. Let the column vector containing these
samples be given by Y. Also, let the column vector containing the transformed samples, i.e.
{X(k) : 0 ≤ k ≤ (M − 1)}, be given by Z. It is clear that equation (9.1) is equivalent to the
following:
Z = F Y,   (9.14)
where F is the Discrete Fourier Transform matrix. This matrix is invertible. Hence the
transformation between the discrete sequence vectors Y and Z is bijective. Thus the above
Lemma applies.
Consider a single layer of conventional perceptrons. Let the sequence of input vectors be
{Y_1, Y_2, ..., Y_L}. The following supervised learning procedure is utilized to classify the patterns:
• Apply the DFT to the successive input training sample vectors, resulting in the
vectors {Z_1, Z_2, ..., Z_L}.
• Train a single layer of complex valued perceptrons using the transformed sample
vectors (the complex valued version of the perceptron learning law provided in [AAV]
is used).
• Apply the IDFT to arrive at the proper class of training samples.
• Utilize the trained complex valued neural network to classify the test patterns.
In view of Lemma 9.1, the above procedure converges when the training samples are
linearly separable. Thus the linearly separable test patterns are properly classified.
The above procedure is also applied to non-linearly separable patterns using a complex
valued multi-layer perceptron. The back propagation algorithm discussed in [Nit1, Nit2] is
utilized. A detailed discussion is provided in [Rama1]. It is argued by Nitta et al. that the complex
valued version of the back propagation algorithm converges faster than the real valued one. Thus,
from a computational viewpoint, the above procedure is attractive.
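A minimal sketch of the training procedure above, assuming illustrative training data; the update used here is a simplified complex analogue of the perceptron rule, not the exact learning law of [AAV]:

```python
import numpy as np

rng = np.random.default_rng(1)
M, L = 4, 30
Y = rng.normal(size=(L, M))                             # real input vectors Y_1, ..., Y_L
t = np.sign(Y @ np.array([1.0, -1.0, 0.5, 2.0]) + 0.3)  # linearly separable class labels

Z = np.fft.fft(Y, axis=1)                               # step 1: DFT of each training vector

# Step 2: train a complex-valued perceptron on the transformed vectors.
# The decision is taken on the real part of the complex inner product.
w = np.zeros(M, dtype=complex)
b = 0.0
errs = -1
for _ in range(1000):                                   # epochs; stops once all are correct
    errs = 0
    for z, target in zip(Z, t):
        y = 1.0 if np.real(np.vdot(w, z)) + b > 0 else -1.0
        if y != target:
            w += target * z                             # moves Re(w^H z) toward the target sign
            b += target
            errs += 1
    if errs == 0:
        break
assert errs == 0          # converges: by Lemma 9.1 the transformed patterns stay separable
```

This update is equivalent to running the ordinary perceptron on the stacked real and imaginary parts of Z, which is why convergence on linearly separable data carries over.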
\( y(t) = \mathrm{Sign}\left[ \sum_{j=1}^{M} a_j(t)\, w_j(t) \right] \)   (9.15)
More general activation functions (sigmoid, hyperbolic tangent, etc.) could be used. The
successive input functions are defined over the interval [0, T]. They are fed as inputs to the
continuous time neurons at successive SLOTS. For the sake of notational convenience, we
call such a neuron a continuous time perceptron.
[Figure: A continuous time perceptron. The M inputs a_1(t), a_2(t), ..., a_M(t) are multiplied by the synaptic weight functions w_1(t), w_2(t), ..., w_M(t), and the output is y(t) = Sign[ Σ_{j=1}^{M} a_j(t) w_j(t) ].]
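A minimal sketch of a continuous time perceptron, assuming the interval [0, T] is discretized on a finite grid; the particular input and weight functions are illustrative, not from the text:

```python
import numpy as np

def continuous_time_perceptron(a, w):
    """Output y(t) = Sign[ sum_{j=1}^{M} a_j(t) w_j(t) ], evaluated pointwise.

    a, w: arrays of shape (M, N) holding the M input functions and the M
    synaptic weight functions sampled on a common grid over [0, T].
    """
    return np.sign((a * w).sum(axis=0))

T, N = 1.0, 500
tgrid = np.linspace(0.0, T, N)                  # time grid for one SLOT [0, T]
a = np.vstack([np.sin(2 * np.pi * tgrid),       # M = 2 example input functions
               np.cos(2 * np.pi * tgrid)])
w = np.vstack([np.ones(N),                      # example synaptic weight functions
               0.5 * np.ones(N)])
y = continuous_time_perceptron(a, w)            # piecewise ±1 output over [0, T]
```

Replacing `np.sign` with a sigmoid or hyperbolic tangent gives the more general activation functions mentioned above.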
9.8 CONCLUSIONS
In this chapter, transforming real valued signals into the complex domain (using the DFT) and
processing them with complex valued neural networks is discussed. A novel model of the
neuron is proposed. Based on such a model, real as well as complex valued neural networks
are proposed. Some open research questions are provided.
REFERENCES
[AAV] I. N. Aizenberg, N. N. Aizenberg and J. Vandewalle, “Multi-Valued and Universal Binary Neurons”, Kluwer Academic Publishers, 2000.
[Hir] A. Hirose, “Complex Valued Neural Networks: Theories and Applications”, World Scientific Publishing Company, November 2003.
[KIM] H. Kusamichi, T. Isokawa, N. Matsui et al., “A New Scheme for Colour Night Vision by Quaternion Neural Network”, 2nd International Conference on Autonomous Robots and Agents, December 13-15, 2004, Palmerston North, New Zealand.
[Nit1] T. Nitta and T. Furuya, “A Complex Back-propagation Learning”, Transactions of Information Processing Society of Japan, Vol. 32, No. 10, pp. 1319-1329 (1991) (in Japanese).
[Nit2] T. Nitta, “An Extension of the Back-Propagation Algorithm to Complex Numbers”, Neural Networks, Vol. 10, No. 8, pp. 1391-1415 (1997).
[Rama1] G. Rama Murthy, “Unified Theory of Control, Communication and Computation”, to be submitted to Proceedings of the IEEE.
[Rama2] G. Rama Murthy, “Multi/Infinite Dimensional Neural Networks, Multi/Infinite Dimensional Logic Theory”, International Journal of Neural Systems, Vol. 15, No. 3, pp. 1-13, June 2005.
[Rama3] G. Rama Murthy and D. Praveen, “Complex-Valued Neural Associative Memory on the Complex Hypercube”, Proceedings of the 2004 IEEE International Conference on Cybernetics and Intelligent Systems (CIS-2004), Singapore.
[Rama4] G. Rama Murthy, “Linear Filter Model of Synapses: Associated Novel Real/Complex Valued Neural Networks”, IIIT Technical Report in preparation.
[Rama5] G. Rama Murthy, “Some Novel Real/Complex Valued Neural Network Models”, Proceedings of the 9th Fuzzy Days (International Conference on Computational Intelligence), September 2006, Dortmund, Germany, pp. 473-483.
CHAPTER 10
Advanced Theory of Evolution of Living Systems
Various life forms, starting with one/few cell organisms such as the amoeba, hydra etc., have
evolved from the organic mass in the oceans under certain atmospheric conditions. Some of
these organisms begin their life cycle with an ellipsoid based egg. The egg based life forms have
formed organs such as the eye and mouth due to the organic reactions taking place inside the egg.
With one/two eyes formed on the surface of the egg, and due to the rotation of the earth and the
natural terrain in the oceans, the egg was constantly drifting in the ocean. The homogeneity
of the organic mass simultaneously led to the formation of several eggs in the same
region. This led to the problem of congestion in the region (of eggs). The life forms, in order
to cope with this problem, began to develop limbs for LOCOMOTION. The remaining organs,
formed due to the natural environment/atmosphere, have similar topological features in
different species of living systems (UNIFICATION of various LIVING species). The life forms
began to jostle and, to handle the environmental needs, the ellipsoid based egg deformed to
form various shapes for the body and primitive organs (non-intelligence based). These
differences in the shape/topological features of the body and organs led to the classification of
such living systems into species such as frogs, fish, crocodiles etc.
The initial organic mass based life had no intelligence. The set of characteristics that
are common to various life forms formed over a long period of time. Some novel and
innovative concepts which are the distinguishing features of this advanced theory are
briefly described below.
(A) Principle of Equivalence of Trainability of Intelligence in all Natural Living Systems: The
observed variation of intelligence in living systems could be due to variation of
the biochemical content in the brain. But by training various living systems to
learn a language, they could be made intelligent, and a certain living animal
culture with intelligence could be developed. In essence, various lower/higher level
animals could be organized in a zoo and rendered useful to themselves as well
as to homo-sapiens.
(B) Principle of Non-Necessity of Perceived Needs of Living Systems: Various species
characteristics, such as needs, have evolved over a period of time. These cravings/needs
are genetically replicated. Some of these needs/cravings are not necessary to
sustain life. For instance, METABOLISM, which leads to the killing of one life form
by another, was an accident and is not necessary to sustain the life of various
species of organic life based machines.
(C) “Life” and “death” are identical to functioning and non-functioning machines. It
should be possible to take an organically non-decayed species form which is “dead”
due to “bleeding”, heart failure, some organic decay, malignant growth etc. and make
it living. Thus, in some sense, there is no death. In summary, unified theory sheds
a proper light on the previously less understood concepts of death and life.
(D) Reproduction, as a species need has evolved over a long period of time. This has
led to the problem of overpopulation in some parts of the planet. As is well
known, reproductive appetite could be turned off.
10.4 CONCLUSIONS
In an effort to understand non-living physical reality, various sub-fields of science such as
physics and chemistry were developed. Based on experimental observations from physical
reality, various mathematical and empirical theories were constructed to derive the laws of nature.
These laws, principles and theories on non-living physical reality were utilized to develop
science and engineering. The field of biology was developed to understand the composition,
operation and coordination of various organs/functional units of living systems in nature such
as homo-sapiens, tigers etc. The distinction between living physical reality and non-living
physical reality was very puzzling to scientists. In the mid 1940s, N. Wiener coined the
word CYBERNETICS for the field dedicated to understanding the control, communication
and computation functions of living systems.
The author pioneered the field of mathematical cybernetics by unifying the control,
communication and computation functions of living system functional units. Thus, a
mathematical model of natural living systems was developed. It is shown, in the context
of one dimensional linear dynamical systems, that the unification includes various other
functions along with control, communication and computation. By utilizing the tensor state
space representation of certain multi/infinite dimensional linear dynamical systems
discovered by the author, cybernetics results for multi/infinite dimensional systems were
developed. These results enabled the development of multi/infinite dimensional coding,
computation and system theories.
The author also made some pioneering investigations into the functions of various
natural living systems. These investigations provided the important conclusion that
living machines such as homo-sapiens, tigers etc. programmed themselves for functions
such as metabolism, sex etc. Many issues of importance to the living machines, such as
their control/coordination, diseases and programmed bad habits, are all addressed based
on a proper understanding of the theory. The advanced theory of evolution resulting from
the unified theory of control, communication and computation provides new perspectives
on nature based living systems.
In summary, in this book the author related the multidimensional logic, coding and control
theories to the concept of multidimensional neural networks (proposed by him). He
innovated a novel complex signum function and proposed a novel complex valued
associative memory. Several novel models of the neuron are proposed, and the associated real
as well as complex valued neural networks are discussed.
Index
A Arbitrary Open 54
ARMA Time Series Model 72
A codeword 91 Array 35
A Human/Animal Brain 125 Artificial 3, 107
A Multi-Layer Feed Forward Network 120 Artificial Neural Networks 81, 117, 118, 129
A Sigmoid Function 120 Associated Boundary Conditions 87
Astable State 16, 124 Associative Memory 79, 80, 125, 129
Abstract Mathematical Structure 118 Attasi’s Model 64
Abstract Model 79 Audio Signal 129
Abstract models 137 Auto-Regressive (AR) 70
Accurate 118 Auto-Regressive Moving Average (ARMA) 70
Activation Function 118, 125, 133, 134 Autocorrelation Tensors 71
Activation Functions 125 Automata 4, 10
Adaptive Neural Networks 24
Addition 120
Additive 42
B
Additive I.I.D. Noise Term 72 Back Propagation Algorithm 118, 122, 126, 130, 133
Adjoint Equations 87 Basic Model of a Neuron 129
Admissible Control Tensors 87 Basis 33
Admissible Functions 64 Behavior 64, 96
Admissible Sequence 83, 108, 109, 113 Behavior Approach 64
Advanced Theory of Evolution 137 Better Model of Neurons 117
Algebraic Geometry 45 Bijection 131
Algebraic Threshold Function 1, 29, 95, 97 Bijective Linear Transformation 131
All One Dimensional Logic Gates 81 Binary Arrays 9
All-Ones Tensor 40 Binary Codes 50
Amplitude Modulation 135 “Binary Filtering” 113
Analog Linear Filter. 121 Binary Linear Multidimensional Code, 42
Analysis 62, 63, 76 Binary Tensor 33
AND, OR , NOR, NAND, XOR Gate 17 Binary Valued Functions 123
AND, OR, NOR, NAND, XOR Gates 81 Binary Vector 123
AND, OR, NOR, NAND, XOR, NOT Gates 90 Biological Neural Networks 117
AND, OR, NOT, NAND, XOR, NOR 9 Biological Neurons 117
Animal 1 Biological Systems 135
Approximation 55 Bipartite Graph 104
Approximation Theory 55 Block Codes 107
Arbitrary Algebraic Threshold Function 12 Block Symmetric Tensor 18
G
Energy Functions 18, 19, 20, 24, 38, 45, 51, 54, 90
Energy Landscape 27
Energy Values 21
G/M/1-Type Structure 72
Entropy 36
Game-Theoretic Codes: Optimal Codes 39
Entropy/Uncertainty 37
Generalization of Back Propagation Algorithm
Equilibrium Distribution 72
120
Error Correcting Codes 31, 34, 59, 81
Generalized Logic Circuit 19
Errors 27, 39
Generalized Logic Function 19
Euclidean Space 117
Generalized Logic Gate 19
Every Codeword 91
Generalized Multidimensional Logic Gate 19
Every Local Maximum 81, 91
Generalized Multidimensional Neural Network 19
Evolution 62
Generalized Neural 91
Evolution At Node 14
Generalized Neural Network 11, 28
Evolution Equations 61
Generalized Neural Networks 124
Evolution of The System 74
Generalized/Multidimensional Neural Networks 12
Evolutionists 137
Generator Tensor 29, 34, 35, 37, 38, 40, 42, 49,
Exclusive OR 35, 129 53, 71, 72, 90
Generator Tensor, Codeword Tensors 52
F Generator Tensor G 43
Generator Tensors 35
Fast Fourier Transform 130 Generator/Information Tensor 36
Feed Forward Network 118, 134 Generator/Parity Check Tensor 92
Feed-forward/Recurrent Networks of Neurons Generator/Parity Check Tensors 36
118 Global Maximum 28, 29, 30, 34, 38, 40, 81
‘Field’, F 121 Global Optimization 55
Filter 118 Global Optimum 28, 92
Filtering 73, 107 Global Optimum Control Vector 108
Finite Dimensional Vector Space 63 Global Optimum Impulse Response 115
Finite Field 35, 42 Global Optimum Stable State 52
Finite Fields 59 Global States 63
Finite Impulse Response Filter 122 Global/Local Optimum 54
Finite Impulse Response Model 121 Graph-Theoretic Code 22, 57, 92
Finite Impulse Response Model of Synapse 118 Graphoid 29, 31, 33, 34, 92
Finite Support 118, 120, 123 Graphoid Based Codes 29
Finitely Many Classes 117 Graphoid Codes 31, 34
Fourier Laplace Transform 121 Graphoid Theoretic Codes 32
Fully Parallel Mode 14, 16, 20, 96, 103, 104 Graphs of Convergence 102
Fully Symmetric Connection Tensor 91 Group 42, 52
Fully Symmetric Tensor 10, 14, 29, 30, 82, 84
H
Fully Symmetric Tensor of 22
Fully Symmetric Tensor S 13, 15
Function 31
Hadamard Matrix 46, 47
Function Space 118, 120, 121
Half Plane Causal 75
L
Linear Tensor/Vector 32
Linear Tensor/Vector Space 32
Linear Time Invariant Continous Time System 81
Language 138
Linear Time Invariant Multidimensional System 89
Latent Variable Models 64
Linear Time Varying Multidimensional System 91
Lattice 22, 50, 59
Lattice (Unbounded Lattice) 24 Linear Time Varying Multi/Infinitedimensional
Lattice 5 Dynamical Systems 84
Learning Laws 117 Linear Time Varying System 108
Learning Rate 119, 134 Linear Time Varying Systems 112
Lee Distance 42, 43, 44, 45 Linear Time-invariant System 124
Lee Weight 44 Linear Transformation, 131
“Life” 138 Linear Transformation Groups 51
Novel Continuous Time Associative Memory 118 Optimal Binary Filters: Neural Networks 107
Novel Model of a Neuron 133 Optimal Code 92
Novel Model of Associative Memory 126 Optimal Codeword 92
Novel Model of Continuous Time Neuron 130 Optimal Codeword Vector 81
Novel Model of Neuron 130, 136 Optimal Control 5, 92, 110
Novel Models of Neurons 126 Optimal Control of Certain Multidimensional
NP-hard Problem 52, 56, 57 System 82
NP-hard Problems 55, 59 Optimal Control Problem 87, 114
Optimal Control Sequence 84
O Optimal Control Tensor 82, 84, 86, 89, 91, 92
Optimal Control Tensor Sequence 83
Objective Function 55, 57, 80, 87, 91 Optimal Control Tensors 80, 82, 87, 91, 92
Objective Function J 109 Optimal Control Vector 81, 112, 123, 124
Objective Functions 55 Optimal Control Vectors 81, 92, 108
Observability 61, 62, 82 Optimal Control/ Signal Design 115
One Dimensional 5 Optimal Filter 107
One Dimensional Arrays 9 Optimal Filter Design Problem 113
One Dimensional Arrays i.e.vectors 90 Optimal Filter Problem 113
One Dimensional Arrays of Zeroes and Ones 80 Optimal Filtering Problem 107, 114
One Dimensional Coding Theory 35 Optimal Input 113
One Dimensional Error Control Coding Theory 34 Optimal Input Vector 123
One Dimensional Error Correcting Code 81 Optimal Linear Multidimensional Code 91
One Dimensional Error Correcting Codes 90 Optimal Logic Functions 92
One Dimensional Linear Dynamic Systems 63 Optimal Logic Gate Output 81
One Dimensional Linear Space 69 Optimal Multidimensional Logic Functions 91
One Dimensional Linear System 82 Optimal Sequence 109
One Dimensional Linear Systems 69 Optimal Set of Impulse Responses 113
One Dimensional Logic Functions 18, 81 Optimal Signal Design 107
One Dimensional Logic Theory 17, 18, Optimal Switching Function 91, 92
50, 51, 80, 90 Optimality Condition 83
One Dimensional Neural Network 11, 20, 81 Optimization 23, 27, 28, 54, 55, 59, 80
One Dimensional Neural Networks 20, 80, 81 Optimization Approach 80
One Dimensional Non-Linear Codes 46 Optimization Constraint 27
One Dimensional Optimal Control Vectors 81 Optimization of Multivariate Polynomials 28, 50
One Dimensional Stochastic Linear Systems 71 Optimization of Quadratic/Higher Degree Forms 28
One Dimensional System Theory 76 Optimization Over More General Constraint Sets, 54
One Dimensional Systems 71, 80, 82 Optimum Input Signal 108
One-dimensional Linear Dynamical System 108 Optimum Stable State 56
One/Two/Three Dimensional Information 129 Order 13, 22, 29
Open Problem 83 Ordinary Difference 69
Open Questions 130 Ordinary/Partial Difference/Differential Equations 76
Open Research Problem 107, 122, 123 Organic Evolution 137
Open Set 54 Organic Life Based Machines 138
Open/Closed Sets 29 Organic Mass 137, 138
Operating in the Fully Parallel Mode 97 Organically Non-decayed 138
Optical Networks 34 Oscillate 104
Parallel Computers 9 Q
Parallel Data Transfer 34
Parallel Mode 21, 22, 97 Quadratic 84
Parallel Modes 14 Quadratic Energy 91
Parity Check Equations 35 Quadratic Energy Function 22, 31, 34, 92
Parity Check Matrices 35 Quadratic Form 23, 31, 56, 58, 86, 91, 92, 95
Parity Check Matrix 35 Quadratic Form over the Hypercube 80
Parity Check Tensor 29, 40 Quadratic Forms 12, 19
Partial Differential Equations 68, 69 Quadratic Objective Function 57, 91, 92
Pattern Recognition 96 Quadratic/Higher Degree Energy Function 10
Patterns 132 Quarter Plane Causal Distributed Dynamical
Perceptron 118, 119, 129, 134 Systems 75
Perceptron Learning Law 118, 135 Quarter Plane Causal Model 75
Perceptron Model 135 Quarter Plane Causality 63, 64, 65, 70
Planar Graphs 32 Quarter-plane Causality, Half-plane Causality 62,
Plant and Measurement Noise 73 82
Plant Noise 72 Quaternion Based Neural Networks 135
Plant Noise Model 73
Point Wise Convergence 134 R
Polynomial 46, 49
Polynomial Representation 37, 40 Random Field 72
Polynomial Time Algorithms 57 Random Field Models 71
Polynomials 46, 47 Random Process 72
Polynomials, Power Series 55 Random Variable 72
Pontryagin Function 83, 87 Rational Functions 63
Positive Definite Symmetric Matrix 56, 57 Real Anti-symmetric One 97
Positive Definite Synaptic Weight Matrix 56 Real Connection Matrix 102
Power Spectrum 71 Real Mode 100
Preciseness 80 Real Numbers 120
Prediction 73 Real Part 97, 98, 99
Prime 43 Real Symmetric 97
Real Valued Associative Memories 130 SISO Discrete Time, Linear Time Invariant
Real Valued Neural Networks 117 Systems 81
Real Valued Neuron 118 Solution of the Difference Equation 85
Real/Complex Valued Sequences 130 Space 32
Realistic Model 117 Space Representations 71
Realizable 83 Special Sets 55
Received Tensor 43, 53 Species 6
Received Tensor Word 50 Species of Living Systems 138
Received Word 45 Spectral Representation Theorem 72
Recursion 124 Spectral/Cholesky Type Decomposition 57
Redundancy 4 Sphere Packing Problem 45
Representation 38, 82 Stability 62
Reproduction 138 Stable 14, 30
Response Determination 70 Stable State 16, 20, 21, 22, 30, 40, 58, 92, 96,
Ring 120 97, 102, 104, 113, 115
Robots 3 Stable State of a Continuous Time 89
Roesser’s Model 64 Stable State of a Multidimensional Hopfield
Neural Network 86
S Stable States 17, 18, 19, 22, 80, 81, 91, 92, 108,
110, 113, 114
Samples 131 Stable States (Stable Functions) 5
Scalar Synaptic Weight 117 Stable States of a Hopfield Network 92
Second Order Models 64 Stable States of a Hopfield Neural Network 81
Separable Filters 64 Stable States of a Multidimensional Hopfield
Separating Line/Hyper Plane 132 Neural Network 82
Serial Mode 14, 20, 21, 22, 30, 96, 97, 99, 103, Stable States of a One Dimensional Neural
104 Network 81
Shannon 107 Stable States of Multidimensional Neural
Sigmoid Function 134 (Generalized) Network 91
Sign Structure 58 Stable States of Neural Network 81
Signal Design 122 Standard Theorems 55
Signal Design Problem 24 State 13, 31, 67, 95
Signed Integral Equation 123 State Coupling Tensor 67, 68, 82
Signum Function 105, 118, 120 State Equations 88
Signum or Sigmoid or Hyperbolic Tangent 133 State Estimation 73
Simulated Annealing 27 State Evolution 30
Single Input, Single Output 81, 113 State, Input 72
Single Input, Single Output Linear Time Invariant State of a Neuron 13
123 State of a Node 20
Single Layer 117, 121, 133 State of Neuron. 15
Single Layer Neural Network 81 State of Node 30
Single Layer of Perceptrons 118 State of the Dynamical System 68, 82
Single Layer Perceptron 117 State of the Network 14, 20, 30
Single Synaptic Weight 117 State of the Node 16
Single/Multi-Layer Continuous Time Neural State Response 109
Networks 125 State Space 22, 27, 30, 63, 74
State Space Description 61, 67, 82, 107, 108 Synaptic Weight Matrix 56, 124
State Space Description of a Dynamical System 75 Synaptic Weight Sequence Values 122
State Space Model 64 Synaptic Weights 13, 15, 20, 25, 97, 117, 118, 133
State Space Representation 58, 59, 61, 63, 65, Syndrome 41
66, 69, 73, 76, 82, 91 Synthesis 61, 62
State Space Representation Through Tensors 70 System 61
State Space Representations 72, 76 System Dynamics 75, 83, 87, 124
State Space Structure 69 System Theorists 63, 68
State Tensor 13 System Theory 66, 107
State Transition Tensor 71, 72, 88 System Theory Approach 107
State Transitions 74 Systematic Form 29
States of Neural Networks. 80 Systems 67, 139
Static 63 Systems Function 61
Static Optimization 23, 29, 54, 55
Static Optimization Problems 53
Static Systems 65, 70
T
Stochastic Control Theory 73, 76 Target Output 119, 122, 134
Stochastic Dynamic Programming 73 Tensor 11, 29, 32, 68
Stochastic Linear Systems 71 Tensor Algebra, 49, 76
Stochastic Models 72 Tensor Algebra Concepts 35
Stochastic Processes 71 Tensor Analysis 66
Stochastic Tensor 37 Tensor Based 29
Storage of Data 81 Tensor Based Difference/Differential Equations 76
Stress Tensor 67 Tensor Based Multivariate Polynomials 28
Structure of Optimal Control 110 Tensor Based State Space Representation 63
Structure of the local Optimum 86 Tensor Field 74
Structured Markov Random Field 71, 72, 73 Tensor Functions 74
Structured Markov Random Fields 71 Tensor Geometric Form 72
Sub-spaces 34 Tensor Inner Product (Outer Product) 35
Subsets of Multidimensional Lattice 19, 28 Tensor Linear Operator 34, 35, 62, 63, 65, 69, 76
Subsets of the Lattice 51 Tensor Linear Operators 62, 71, 76
Successive Approximation Procedure 123 Tensor Linear Space 36
Successive Approximation Scheme 123 Tensor Linear Spaces 59, 69, 70
Successive Input Functions 119, 134 Tensor of Partial Derivatives 76
Supervised Learning 135 Tensor of Probabilities of The States 71
Supervised Learning in a Function Space 135 Tensor Product 12, 48
Supervised Training 117 Tensor Products 46, 73, 76, 125, 135
Switching/Logic Functions 19 Tensor Products, Matrix Products 47
Symmetric 99 Tensor Spaces 28, 34
Symmetric Matrix 11, 20, 58, 95 Tensor State Space 82
Symmetric Tensor 10, 18, 23 Tensor State Space Description 69, 70
Synapse 117, 121 Tensor State Space Representation 62, 63, 66,
Synapses 117, 118, 126, 133 67, 68, 71, 72, 73, 76, 80, 82, 92
Synaptic Contribution 15 Tensor State Space Representations 68
Synaptic Weight 134, 135 Tensor-tensor Products 76
Synaptic Weight Function 121 Tensor-tensor Variables 70
Synaptic Weight Functions Tensors 10, 12, 38, 58, 62
118, 121, 125, 133, 134, 135 Terminal State 84