
Hindawi Publishing Corporation

International Journal of Distributed Sensor Networks


Volume 2016, Article ID 1851829, 11 pages
http://dx.doi.org/10.1155/2016/1851829

Research Article
A Multilayer Improved RBM Network Based Image Compression Method in Wireless Sensor Networks

Chunling Cheng,1 Shu Wang,1 Xingguo Chen,1 and Yanying Yang2


1 College of Computer, Nanjing University of Posts and Telecommunications, Nanjing 213001, China
2 Department of Information and Technology, Nanjing College of Forestry Police, Nanjing 210023, China

Correspondence should be addressed to Shu Wang; wangshu_njupt@foxmail.com

Received 5 November 2015; Accepted 21 January 2016

Academic Editor: Reza Malekian

Copyright © 2016 Chunling Cheng et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly
cited.

The processing capacity and power of nodes in a Wireless Sensor Network (WSN) are limited. Moreover, most image compression algorithms in WSNs are sensitive to random changes in image content or yield low image quality after the images are decoded. Therefore, an image compression method based on a multilayer Restricted Boltzmann Machine (RBM) network is proposed in this paper. An alternative iteration algorithm is also applied in the RBM to optimize the training process. The proposed image compression method is compared with a region of interest (ROI) compression method in simulations. Under the same compression ratio, the qualities of the reconstructed images are better than those of ROI. When the number of hidden units in the top RBM layer is 8, the peak signal-to-noise ratio (PSNR) of the multilayer RBM network compression method is 74.2141, much higher than that of ROI, which is 60.2093. The multilayer RBM based image compression method has better compression performance and can effectively reduce the energy consumption of image transmission in WSNs.

1. Introduction

WSNs have emerged as a new research hot spot in recent years. In a WSN, the resources of each sensor node are limited. Therefore, it is a huge challenge to reduce energy consumption and extend the lifetime of a sensor node. The energy cost of transmitting images in a WSN remains a main factor that affects the lifetime of a sensor node. To reduce the bandwidth and energy consumption of image transmission, it is necessary to propose a more effective image compression method.

Currently, the image compression algorithms in WSNs are subject to the random changes in image contents. It is unrealistic to describe the various images in the real world with only one kind of image model. To address this issue, neural networks are adopted in WSNs to compress images. As an important branch of neural networks, Deep Learning [1, 2] has many computation models. The Restricted Boltzmann Machine (RBM) [3, 4] is one of the prime models of Deep Learning. When a multilayer RBM based network is used to compress images, the quality of the compressed images has much to do with the likelihood of the training data in each RBM layer. Moreover, the training complexity of the RBM has a great effect on the energy consumption of image compression coding.

Most of the current RBM training algorithms are carried out only with large quantities of sampling using the Markov Chain Monte Carlo (MCMC) method. The average joint probability between visible units and hidden units is estimated based on these samples without calculating the normalizing parameter. However, the frequency of state transitions must be high enough to ensure that the acquired samples satisfy the target distribution when MCMC sampling is conducted. Also, large quantities of samples are needed to improve the accuracy of the estimated value, which increases the difficulty of RBM training. Aiming at the problems encountered in the current RBM training process, an alternative iteration algorithm is used in the RBM training process.

We have adopted the alternative iteration algorithm in the process of RBM training. In this algorithm, the normalizing parameter is considered as another unknown parameter. Therefore, the likelihood function can be transformed into two subfunctions. One is about the normalizing parameter

and the other is about the model distribution parameter. The model distribution parameter to be estimated is calculated alternately with the normalizing parameter and can eventually be obtained through a highly efficient training process of low complexity. This algorithm can improve the likelihood of the RBM for the training data.

Furthermore, we have used the improved RBM training process for image compression in WSNs. A multilayer improved RBM based image compression method is presented in this paper. This image compression method can extract more abstract data for coding based on the image features and has a better compression effect. In the simulations, the reconstructed image quality of the multilayer RBM network is superior to that of another image compression method under the same compression ratio, which is stated in detail in Section 5. At the same time, the proposed image compression method can reduce the energy consumption of the image data transmission process.

The rest of the paper is organized as follows. In Section 2, related work on image compression and RBM training algorithms is discussed. Section 3 presents the basic idea of the multilayer RBM network based image compression method. The RBM model and the improved RBM algorithm based on alternative iteration are depicted in Section 4. The performance of the proposed algorithm is compared with some typical algorithms in Section 5. At last, conclusions and future work are presented in Section 6.

2. Related Work

Typical image compression algorithms include time-space correlated data compression algorithms, wavelet transform based data compression algorithms, distributed data compression algorithms, and improved traditional data compression algorithms.

The space-time correlation based data compression algorithms mainly include prediction coding and linear fitting methods for time series. A prediction coding method is proposed in [5]. It can effectively estimate the source data based on its temporal correlation. However, the prediction coding based data compression method does not handle the transmission of large amounts of image data. Reference [6] proposes a curve fitting based data flow compression method. It compresses the data collected on each node and restores the data at the base station. But this method is very complex, and it does not consider the transmission delay at each sensor node. Reference [7] presents a space-time data compression technology based on a simple linear regression model. This method can eliminate data redundancy in single nodes and in collector nodes, respectively. But only the data that satisfies the error requirement is considered in this method; abnormal data is not handled.

Wavelet transform is a time-frequency analysis method that is superior to traditional signal analysis methods. Reference [8] considers the existence of stream data in the data transmission of sensor networks. It compresses data by using a wavelet transform based on data aggregation and the DC routing algorithm. In [9, 10], a ring model based distributed time-space data compression algorithm and a wavelet based self-fitting data compression algorithm are proposed. Storage efficient two-dimensional and three-dimensional continuous wavelet data compression methods are proposed in [11]. They are based on the ring model of fitting sensor network wavelet transform and on the overlapping cluster partition model, respectively. They are storage efficient and can save transmission energy in networks.

The distributed data compression algorithm is based on the fact that both centralized and decentralized information services can be implemented. The feature of a distributed data compression algorithm is that it can reduce the data amount through cooperative work among different sensor nodes. A chain model based distributed data compression algorithm is proposed in [12] based on wavelets of random lengths. This algorithm designs a chain model that is suitable for wavelet transform and for wavelet functions of random lengths.

Traditional lossless data compression methods mainly include Run Length Encoding, Huffman coding compression, dictionary compression methods, and arithmetic compression methods. These methods are mainly adopted on advanced computers or workstations. In sensor network applications, the processing capacity of each processor is limited and its memory is small. Therefore, it is essential to optimize the traditional compression algorithms. In [13], the difference between two pieces of perceptual data is encoded based on a self-fitting Huffman coding algorithm. Reference [14] proposes a region of interest (ROI) based lossy-lossless image compression method. It applies different coding compression methods to the small area that is important and to the remaining large area. In this way, the compression ratio is improved under the condition that sensitive information is preserved.

In recent years, Deep Learning (DL) has been widely used in WSNs to carry out image compression. Deep Learning extracts the characteristics of data from low to high layers by modeling the layered analysis of the human brain. However, the effect of image compression using DL is subject to the likelihood of the RBM for the training data and to the training complexity of the RBM. Therefore, an improved training algorithm based on RBM training is also proposed in this paper.

Currently, researchers have carried out much work on RBM training algorithms. In 2002, Hinton proposed a fast learning algorithm for RBMs, Contrastive Divergence (CD) [15]. This algorithm is an efficient approximate learning algorithm for RBMs. However, the RBM model acquired by the CD algorithm is not a maximum entropy model and does not reach a high likelihood for the training data [16].

In 2008, Tijmen Tieleman proposed the Persistent Contrastive Divergence (PCD) algorithm [17]. This algorithm remedies the deficiency of the CD algorithm. It has the same efficiency as the CD algorithm and does not violate maximum likelihood learning. In addition, the RBM obtained by PCD training has a more powerful pattern generation capacity. In 2009, Tieleman and Hinton further improved the PCD algorithm [18] and proposed the Fast Persistent Contrastive

Divergence (FPCD) algorithm. A group of auxiliary parameters is involved to improve the Markov chain mixing rate of PCD. This other group of parameters, called Fast Weights and denoted by W′, is learned at the same time as the ordinary RBM parameters.

Some RBM learning algorithms based on tempered MCMC sampling have also appeared in recent years. A parallel tempering algorithm for RBMs is introduced in [19]. This algorithm maintains a state for every distribution under a certain temperature. During state transition, the low temperature distribution state can be exchanged with a high temperature distribution state. In this way, there is a high chance that the low temperature distribution state can move to a distant peak; therefore, the whole distribution can be sampled. In 2014, Xu et al. proposed a tempering based MCMC method, Tempered Transition, in [20] to learn the RBM model. The main idea of Tempered Transition is to maintain the current state in the target distribution. When a new state appears, the state transition is carried out step by step from low to high temperature, by which the attraction of the current peak can be decreased. At last, a group of state transitions from high to low temperature is conducted until the temperature returns to normal. In essence, both of the above algorithms improve RBM training efficiency by adopting MCMC sampling based on tempering [21].

3. Image Compression Using Multilayer RBM Network

In a WSN, the data transmission process can be divided into two parts: the data compression encoding process and the data decoding process. The image sending and receiving process in a WSN is shown in Figure 1.

Figure 1: The image sending and receiving process (original image → preprocessing → image compression encoding → wireless data sender → communication channel → wireless data receiving → image decoding → receiving end).

The basic idea of the multilayer RBM network data compression encoding method is as follows. Firstly, an image of M × N pixels is converted into an M × N pixel matrix; then each element in this matrix is normalized into the range [0, 1] based on the mean distribution preprocessing method. We denote the changed matrix by g:

$$g = \left(g_{0\,\mathrm{row}},\, g_{1\,\mathrm{row}},\, \ldots,\, g_{(M-1)\,\mathrm{row}}\right)^{T} = \begin{bmatrix} g_{0,0} & g_{0,1} & \cdots & g_{0,N-1} \\ g_{1,0} & g_{1,1} & \cdots & g_{1,N-1} \\ \vdots & \vdots & \ddots & \vdots \\ g_{M-1,0} & g_{M-1,1} & \cdots & g_{M-1,N-1} \end{bmatrix}, \quad (1)$$

where g_{i row} denotes a row of the matrix, 0 ≤ i_row < M, and N is the number of elements in each row.

Then, the image matrix g is inputted into the multilayer RBM network. The network contains an input layer and multiple hidden layers. The connection weights and biases between the input layer units and the hidden layer units can be adjusted so as to make the hidden layer output reproduce the input of the input layer as faithfully as possible. The output of the RBM hidden layer in the first layer is inputted into the RBM in the second layer. When the number of hidden layer units is smaller than the number of input layer units, the hidden layer expresses the input of the input layer compactly. The transformation from input layer to hidden layer can thus be seen as the process of compression encoding. The process of multilayer RBM network image compression is shown in Figure 2.

Figure 2: Basic idea of image compression using a multilayer RBM (the original image matrix enters the input layer; each RBM encodes on the way up and decodes on the way down; the output layer yields the compressed image data).

The bottom input layer consists of M × N neural units. Each neural unit represents a pixel of the M × N image. The number of hidden units can be determined based on the image compression ratio.

Image decoding is the inverse of the image compression coding process. The compressed image is inputted into the topmost layer and then decoded layer by layer. At last, the bottom level outputs the original image.

The main part of the improved image compression coding method is the RBM. It is essential to improve the likelihood of the RBM for the image data so as to ensure high similarity between the original image and the image after decoding. Therefore, we have improved the training method of the RBM.
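To make this pipeline concrete, the following is a minimal sketch in Python/NumPy of the encoding and decoding passes. It assumes the per-layer RBM weights and biases have already been trained; the simple [0, 1] scaling stands in for the mean distribution preprocessing, whose exact form the paper does not spell out, and all function names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def preprocess(image):
    # Map 8-bit gray values into [0, 1] and flatten the M x N matrix
    # into an M*N input vector.
    return (image.astype(np.float64) / 255.0).ravel()

def compress(image, layers):
    # Bottom-up pass: each RBM's hidden activations feed the next RBM.
    # `layers` is a list of (W, b_hidden, a_visible) triples, one per RBM,
    # with W of shape (n_visible, n_hidden).
    h = preprocess(image)
    for W, b, _ in layers:
        h = sigmoid(h @ W + b)       # encode: input layer -> hidden layer
    return h                         # compressed code from the top layer

def decompress(code, layers, shape):
    # Top-down pass: run each RBM in reverse using its visible biases.
    v = code
    for W, _, a in reversed(layers):
        v = sigmoid(v @ W.T + a)     # decode: hidden layer -> input layer
    return v.reshape(shape)          # reconstructed image, values in [0, 1]
```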

4. An Improved RBM Algorithm Based on Alternative Iteration Algorithm

4.1. The RBM Model. An RBM can be viewed as an undirected graph model [22]. As shown in Figure 3, v is the visible layer and h is the hidden layer. W denotes the connecting weights of the edges that connect the two layers.

Figure 3: Graph of the RBM model.

Assume that there are n visible units and m hidden units, and that the states of the visible and hidden units are referred to as vectors v and h accordingly. Then, for a certain combined state (v, h), the energy of the RBM system can be defined as

$$E(v, h \mid \theta) = -\sum_{i=1}^{n} a_i v_i - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{n} \sum_{j=1}^{m} h_j w_{ij} v_i, \quad (2)$$

where θ = {w_ij, a_i, b_j}, w_ij denotes the connecting weight between visible unit i and hidden unit j, a_i is the bias of visible unit i, and b_j is the bias of hidden unit j. All the parameters are real numbers. When these parameters are decided, we can get the joint probability distribution of (v, h) based on the energy function (2):

$$P(v, h \mid \theta) = \frac{e^{-E(v, h \mid \theta)}}{Z(\theta)}, \quad (3)$$

where Z(θ) = ∑_{v,h} e^{-E(v,h|θ)} is the normalizing parameter. The object of the RBM learning process is to determine θ, the parameter fitting the training data. The traditional way of getting the value of θ is to maximize the log-likelihood function of the RBM:

$$\theta^{*} = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} \sum_{i=1}^{n_s} \log P(v^{i} \mid \theta). \quad (4)$$

Stochastic gradient ascent is usually used to get the optimal parameter θ* [23]. The critical part of this process is to calculate the partial derivatives of log P(v^i | θ) with respect to every model distribution parameter. The joint distribution of the visible and hidden units is involved, and this distribution can be determined only if Z(θ) is obtained, which requires 2^{m+n} calculations.

In the current research, approximate values of the joint distribution are obtained by sampling methods such as Gibbs sampling [24], CD, and so on. However, most of these methods have the defect that RBM training becomes very complex because of the frequent state transitions and the large quantities of sampling. We propose applying the alternative iteration algorithm to RBM training: when the model parameter cannot be calculated because of other uncertain parameters, the alternative iteration algorithm can be applied to obtain maximum estimates of these parameters through an iterative strategy.
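The sketch below illustrates equations (2) and (3) on a toy RBM whose 2^(n+m) joint states can still be enumerated, which makes explicit why computing the normalizing parameter Z(θ) exactly is infeasible at realistic sizes. The sizes and random parameters are illustrative only.

```python
import numpy as np
from itertools import product

def energy(v, h, W, a, b):
    # E(v, h | theta) from equation (2), with W of shape (n, m).
    return -(a @ v) - (b @ h) - (v @ W @ h)

def partition_function(W, a, b):
    # Z(theta) sums e^{-E} over all 2^(n+m) joint states, so exact
    # evaluation is only possible for tiny models.
    n, m = W.shape
    return sum(
        np.exp(-energy(np.array(v), np.array(h), W, a, b))
        for v in product([0, 1], repeat=n)
        for h in product([0, 1], repeat=m)
    )

# Tiny example: 3 visible and 2 hidden units are still enumerable.
rng = np.random.default_rng(0)
W, a, b = rng.normal(size=(3, 2)), rng.normal(size=3), rng.normal(size=2)
Z = partition_function(W, a, b)
v, h = np.array([1, 0, 1]), np.array([1, 0])
print(np.exp(-energy(v, h, W, a, b)) / Z)   # P(v, h | theta), equation (3)
```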
4.2. The Alternative Iteration. The alternative iteration algorithm is a common method for solving optimization problems. For instance, consider a maximization problem max f(x, y). Firstly, x is kept unchanged and y is changed to increase the function f. Then, y is kept unchanged and x is changed to increase f. The two operations are carried out alternately until f cannot increase any more.
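A minimal sketch of this alternating scheme on a toy objective (not the RBM likelihood itself) is given below; the candidate grids and the quadratic objective are illustrative assumptions.

```python
import numpy as np

def alternate_maximize(f, x0, y0, candidates_x, candidates_y, tol=1e-9):
    # Improve x with y fixed, then y with x fixed, and repeat until
    # f stops increasing (simple coordinate ascent).
    x, y, best = x0, y0, f(x0, y0)
    while True:
        x = max(candidates_x, key=lambda xc: f(xc, y))
        y = max(candidates_y, key=lambda yc: f(x, yc))
        if f(x, y) <= best + tol:
            return x, y, f(x, y)
        best = f(x, y)

# Toy concave objective with its maximum at x = 1, y = -2.
f = lambda x, y: -(x - 1.0) ** 2 - (y + 2.0) ** 2
grid = np.linspace(-5.0, 5.0, 101)
print(alternate_maximize(f, 0.0, 0.0, grid, grid))
```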
The alternative iteration algorithm is adopted in this paper to solve the problem of RBM training. The likelihood function l in RBM training is defined as

$$\max\ l(\theta, z), \quad (5)$$

where, in the analogy above, x corresponds to the normalizing parameter z and y to the model parameter θ.

The traditional way of getting the value of θ is to maximize l(θ) by maximum likelihood estimation. Generally, we intend to find the model distribution parameter θ that maximizes the likelihood function. But the normalizing parameter z is involved in this process, which makes it quite difficult to calculate the likelihood function. However, it is easier to get the model parameter when z is already known. The alternative iteration algorithm is adopted for RBM training for the first time in this paper. The problem of training an RBM can be considered as a two-parameter estimation problem. For the two groups of unknown parameters z and θ, we firstly keep θ unchanged to get the expression of z; with θ in this expression, the likelihood function of z can be obtained. Then we keep the estimated z unchanged and deduce the maximization function of θ based on the marginal distribution of the joint distribution of the visible and hidden layers in the RBM. The improved algorithm carries out the two operations alternately until the termination conditions are satisfied.

4.3. The Process of Calculating RBM Parameters by Alternative Iteration. Assume that there is a group of training samples V = {v^1, v^2, v^3, ..., v^i, ...}, v^i = (v_1^i, v_2^i, ..., v_{n_v}^i), i = 1, 2, ..., n_s, where n_v is the dimension of an input sample, equal to the number of units in the visible layer, and n_s is the number of samples. All of these samples are independent of each other. After the input of sample v^i and t iterations, the value of the model parameter is denoted by θ_t^i and z_t denotes the value of the normalizing parameter, where t ∈ (0, T) and T is the maximum number of iterations. θ^i denotes the final value of the model parameter after finishing the training with the input of sample v^i, and θ^0 is the initial input value of the model parameter.

When the algorithm begins, the model parameter is firstly initialized by θ^0. Then, we take the first sample v^1 as the input of the RBM, and we set θ_0^1 = θ^0. The normalizing parameter is then estimated and we get an initial value z_1 based on θ_0^1.

The model distribution parameter is then estimated and changed to θ_1^1 based on z_1. The above two parameters continue to be estimated alternately until the convergence condition is satisfied or the maximum number of iterations T is reached. The final value of the model parameter obtained with sample v^i is denoted by θ^i; after training on v^1, the model parameter is therefore denoted by θ^1. It is the initial value of the model parameter when inputting the second sample v^2, which means θ_0^2 = θ^1.

When sample v^i and model parameter θ_t^i are inputted, we need to consider the objective function of z_{t+1}. Assume that z_t = Z(z^i | θ_t^i) is the distribution of the normalizing parameter of the sample when θ_t^i is kept unchanged, where z^i is the normalizing parameter of sample v^i. The conditions satisfied by Z(z^i | θ_t^i) are ∑_{z^i} Z(z^i | θ_t^i) = 1 and Z(z^i | θ_t^i) ≥ 0. Because the log function is concave, we can calculate an approximate expression of z by using the Jensen inequality. We first derive the following equation:

$$\sum_{n=1}^{n_v} \log p(v_n^i; \theta_t^i) = \sum_{n=1}^{n_v} \log \sum_{z^i} p(v_n^i, z^i; \theta_t^i). \quad (6)$$

Multiplying the numerator and denominator of the right-hand fraction in (6) by Z(z^i | θ_t^i) gives

$$\sum_{n=1}^{n_v} \log p(v_n^i; \theta_t^i) = \sum_{n=1}^{n_v} \log \sum_{z^i} Z(z^i \mid \theta_t^i)\, \frac{p(v_n^i, z^i; \theta_t^i)}{Z(z^i \mid \theta_t^i)}. \quad (7)$$

From the Jensen inequality and the property of concave functions, we can deduce the following:

$$\sum_{n=1}^{n_v} \log \sum_{z^i} Z(z^i \mid \theta_t^i)\, \frac{p(v_n^i, z^i; \theta_t^i)}{Z(z^i \mid \theta_t^i)} \geq \sum_{n=1}^{n_v} \sum_{z^i} Z(z^i \mid \theta_t^i) \log \frac{p(v_n^i, z^i; \theta_t^i)}{Z(z^i \mid \theta_t^i)}. \quad (8)$$

Equation (8) holds with equality if and only if p(v_n^i, z^i; θ_t^i)/Z(z^i | θ_t^i) = c, where c is a constant independent of z^i. According to ∑_{z^i} Z(z^i | θ_t^i) = 1, we can draw the following equation:

$$Z(z^i \mid \theta_t^i) = \frac{p(v_n^i, z^i; \theta_t^i)}{\sum_{z^i} p(v_n^i, z^i; \theta_t^i)} = \frac{p(v_n^i, z^i; \theta_t^i)}{p(v_n^i; \theta_t^i)} = p(z^i \mid v_n^i; \theta_t^i). \quad (9)$$

When θ_t^i is kept unchanged, we maximize p(v^i | z_t), which is the same as maximizing ln p(v^i | z_t). Hence,

$$z_{t+1} = \arg\max_{z_t} l(z_t) = \arg\max_{z_t} \sum_{n=1}^{n_v} \ln p(v_n^i \mid z_t). \quad (10)$$

The normalizing parameter can thus be estimated and we get a value z_{t+1}.

At this point, we need to choose the equation to calculate θ_{t+1}^i. When z_{t+1} and θ_t^i are both already known, we can get the joint probability distribution of (v^i, h) based on (3):

$$p(v^i, h \mid \theta_t^i) = \frac{e^{-E(v^i, h \mid \theta_t^i)}}{Z(\theta_t^i)}, \quad (11)$$

where Z(θ_t^i) is the normalizing parameter z_{t+1} obtained above.

We can get the marginal distribution of the joint probability distribution p(v^i, h | θ_t^i) based on the derivation for the original RBM:

$$p(v^i \mid \theta_t^i) = \frac{1}{z_{t+1}} \sum_{h} e^{-E(v^i, h \mid \theta_t^i)}. \quad (12)$$

Then, we keep z_{t+1} unchanged and get a value θ_{t+1}^i of the model parameter:

$$\theta_{t+1}^i = \arg\max_{\theta_t^i} l(\theta_t^i) = \arg\max_{\theta_t^i} \sum_{n=1}^{n_v} \ln p(v_n^i \mid \theta_t^i). \quad (13)$$

However, the initial value θ^0 we assigned to the model parameter may not be suitable for the model. In that case, we can update the value of the model parameter by iterative optimization based on the alternative iteration algorithm. Thus, sample v^i can be used to estimate a value of θ^i. The θ^i obtained by training on the former sample is used as the initial value for θ^{i+1}, the model parameter to be estimated based on the next sample. The optimization operations are repeated until the termination conditions are satisfied.

The improved RBM algorithm is described in Algorithm 1.

Setting: the convergence threshold σ and the terminal threshold ε
Input: the value of the model parameter given by pretraining, θ^0; the maximum number of iterations, T; the number of hidden units, n_h
Output: the final value of the model parameter θ^i
For i = 1, 2, ..., n_s (for all samples)
    For t = 1, 2, ..., T
        Compute z_{t+1}: z_{t+1} = arg max_{z_t} l(z_t) = arg max_{z_t} ∑_{n=1}^{n_v} ln p(v_n^i | z_t)
        Compute θ_{t+1}^i: θ_{t+1}^i = arg max_{θ_t^i} l(θ_t^i) = arg max_{θ_t^i} ∑_{n=1}^{n_v} ln p(v_n^i | θ_t^i)
        Judge whether the reconstruction error of the model reaches σ
    End for
    Judge whether the difference of the likelihoods of the two RBMs defined by the adjacent parameters is within (0, ε)
End for

Algorithm 1: The RBM algorithm based on the alternative iteration algorithm.
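The control flow of Algorithm 1 can be sketched as follows. This is a structural sketch only: the two inner argmax steps (equations (10) and (13)) and the two stopping tests are passed in as callables, since the paper specifies them as optimization problems rather than procedures, and all names here are illustrative.

```python
def train_rbm_alternating(samples, theta0, estimate_z, update_theta,
                          recon_error, likelihood, T, sigma, eps):
    # `estimate_z(v, theta)` stands for equation (10) and
    # `update_theta(v, z, theta)` for equation (13).
    theta, prev_like = theta0, None
    for v in samples:                          # for all samples v^i
        for t in range(T):                     # at most T alternations
            z = estimate_z(v, theta)           # z_{t+1} with theta fixed
            theta = update_theta(v, z, theta)  # theta_{t+1} with z fixed
            if recon_error(v, theta) <= sigma: # convergence threshold sigma
                break
        like = likelihood(v, theta)
        if prev_like is not None and 0 < abs(like - prev_like) < eps:
            break                              # adjacent likelihoods within (0, eps)
        prev_like = like
    return theta
```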

5. Simulation Experiments and Results Analysis

The experiment consists of three parts: the performance analysis of the improved RBM; the analysis of the compression performance of the proposed image compression method together with the evaluation of the reconstructed image quality; and the analysis of the energy consumption in WSNs when the multilayer RBM network image compression method is used. MATLAB 2013a is used to carry out the simulations.

5.1. Performance Analysis of the Improved RBM. The datasets of our experiment are the famous handwritten digit database MNIST [25] and a toy dataset. The MNIST dataset consists of 50,000 groups of training samples and 10,000 groups of testing samples. Each group of samples consists of a grayscale image, with a resolution of 28 × 28, that contains a handwritten Arabic numeral. These Arabic numerals are labeled so that experiments with supervised learning can be conducted. Part of the data samples in the MNIST dataset is shown in Figure 4.

Figure 4: Part of the samples of the MNIST database.

Compared with the MNIST dataset, the toy dataset is simpler and lower dimensional. It consists of 10,000 images. Each image has 4 × 4 binary pixels. The dataset is generated in the same way as that mentioned in [26].

We compare the proposed algorithm with the PCD algorithm, the parallel tempering algorithm (PT-K) [27], and parallel tempering with equienergy moves (PTEE) [28] in the experiments. In PT-K, K is the number of auxiliary distributions of parallel tempering under different temperatures. The value of each temperature is usually between 0.9 and 1. The parameter in PT-K can be easily controlled, and in our experiments K is set to 5 and 10, respectively. Based on some preliminary experiments, we find that PT achieves better likelihood scores when using 10 chains than when using 5 chains. The result yielded by PTEE when using 5 chains is similar to that when using 10 chains, which means that, to some extent, PTEE is not affected by the number of Markov chains [28]. So, we show the results obtained by using PTEE and PT with 10 chains.

We evaluate their qualities by the likelihood of the RBM for the training data with two methods: the reconstruction error and enumerating the states of the hidden units.

Firstly, we compare the reconstruction errors of the four algorithms with different numbers of hidden nodes on the MNIST dataset and the toy dataset. The first 30,000 groups of samples in MNIST are divided into three parts, each including 10,000 groups of samples. The number of hidden units is set to 10, 15, 20, 25, 30, 50, 100, 200, 250, 300, and 350. The number of iterations on each part ranges from 1 to 45. The average reconstruction errors of the three parts after 15 and 30 iterations are shown, respectively, below. Then, the experiments on the toy dataset are also executed; there, we set the number of hidden units to 10, 15, 20, 25, 30, 50, 100, 150, 200, 250, and 300. The results obtained by using PT-10, PTEE-10, PCD, and the proposed algorithm are shown in Figures 5–8.

Figure 5: The reconstruction errors of the four algorithms after 15 times of iterations on the MNIST dataset.

From Figures 5 and 6, we can see that the average reconstruction error of the proposed algorithm is always smaller than those of the other three algorithms on the MNIST dataset. And we can get similar results from Figures 7 and 8.
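The reconstruction error plotted in Figures 5–8 is a 2-norm between each sample and its reconstruction, averaged over the data. A plausible implementation is sketched below; the deterministic one-step up-down pass is an assumption, since the paper does not spell out how the reconstructions are formed.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruct(v, W, a, b):
    # One deterministic up-down pass through a single RBM.
    h = sigmoid(v @ W + b)
    return sigmoid(h @ W.T + a)

def avg_reconstruction_error(V, W, a, b):
    # Average 2-norm between samples and their reconstructions, the
    # quantity plotted on the vertical axes of Figures 5-8.
    return np.mean([np.linalg.norm(v - reconstruct(v, W, a, b)) for v in V])
```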

Figures 5–8 show that the reconstruction errors of all four algorithms decrease as the number of hidden units increases. When there is a small number of hidden units, the reconstruction error obtained by the proposed algorithm is close to those of the other three algorithms. However, as the number of hidden units increases, the superiority of the proposed algorithm appears gradually. We can also see decreasing ratios of the average reconstruction errors of PT-10, PTEE-10, and our proposed algorithm compared with PCD on the MNIST and toy datasets when there are the same numbers of hidden units. When the number of hidden units is 350, after 30 iterations, the reconstruction error of the proposed algorithm is 26.60% lower than that of PCD on the MNIST dataset. Under the same conditions and compared with PCD, PT-10 is 8.20% lower and PTEE-10 is 16.64% lower.

Figure 6: The reconstruction errors of the four algorithms after 30 times of iterations on the MNIST dataset.

Figure 7: The reconstruction errors of the four algorithms after 15 times of iterations on the toy dataset.

Figure 8: The reconstruction errors of the four algorithms after 30 times of iterations on the toy dataset.

Next, a small-scale experiment with 15 hidden units is conducted on the MNIST dataset. The log-likelihood can be obtained by enumerating the states of the hidden units; therefore, high accuracy can be achieved. Figure 9 shows the average log-likelihood obtained by training each model 5 times.

Figure 9: The average likelihood of the four algorithms when there are 15 hidden units on the MNIST dataset.

Figure 9 shows that within the first 10,000 parameter updates the likelihood of the proposed algorithm is not as good as those of the other three algorithms. However, as the number of updates gradually increases, the likelihood of the proposed algorithm for the data also increases and becomes better than that of PTEE-10. When the number of updates is 30,000, the likelihood of PCD reaches its peak; it then decreases because the number of Gibbs transitions increases and the model distribution gets steeper and steeper. The PT-10 algorithm is a Monte Carlo method based on tempering. The distribution is more even when the temperature is higher, so PT-10 can overcome the difficulty of steep distributions by conducting state transitions from low to high temperatures. So it has a better effect than PCD. PTEE-10 proposes a new type of move called the equienergy move, which improves the swap rates between neighboring chains to some extent.

But after 18 ∗ 5000 parameter updates, the likelihood of PTEE-10 decreases gradually. However, the proposed algorithm will always try to skip the steep distributions by constantly increasing the number of samples. Its overall effect is comparable with PTEE-10 at the beginning, and after 4 ∗ 5000 parameter updates it has a better effect than the other three algorithms. To validate the efficiency of the proposed algorithm, experiments are also conducted on the toy dataset. Only PTEE-10 and our proposed algorithm are compared on the toy dataset, since PTEE-10 is the most competitive model against our proposed algorithm. The number of hidden units is set to 10. RBMs are trained five times via the proposed algorithm and PTEE-10. The average likelihood scores are shown in Figure 10, from which we can conclude that the proposed algorithm works better than PTEE-10.

Figure 10: The average likelihood of the two algorithms when there are 10 hidden units on the toy dataset.

Moreover, we also recorded the running time of different algorithms on one epoch and on one training sample when they are applied to the toy dataset and the MNIST dataset. All experiments were conducted on a Windows machine with an Intel® Core™ i5-3210M 2.50 GHz CPU and 8 GB RAM. Table 1 displays the results.

Table 1: The running time (in seconds) of different algorithms when they are applied to the toy dataset and the MNIST dataset.

Dataset | PCD | PT-10 | PTEE-10 | Our proposed algorithm
MNIST | 17.497 | 183.23 | 165.35 | 40.273
Toy | 0.937 | 10.562 | 8.419 | 3.017

Based on the results in Table 1, we can see that the running time of our proposed algorithm is less than those of PT-10 and PTEE-10.

From all the simulation results above, we can see that the reconstruction error of the proposed algorithm is better than those of PCD, PT-10, and PTEE-10. Under the same conditions, PT-10 and PTEE-10 perform better than PCD, but they spend tenfold time. The experiment results obtained on the MNIST dataset show that the proposed algorithm has better performance than the three other algorithms, which is also validated on the toy dataset. Moreover, the proposed algorithm only takes 2 or 3 times the time that PCD takes.

5.2. Performance Analysis of the Multilayer RBM Network Image Compression Method. In this section, the 256 × 256 Lena image is used. The compression ratio is used to evaluate the compression performance. It is the data size ratio between the compressed image and the original image, and it reflects the storage size and transmission efficiency of an image. Peak signal-to-noise ratio (PSNR) and signal-to-noise ratio (SNR) are the two main criteria for evaluating the reconstructed image quality. For an image that has M × N pixels,

$$\mathrm{SNR} = 10 \log \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} \left[x(i,j)\right]^{2}}{\sum_{i=1}^{M} \sum_{j=1}^{N} \left[x(i,j) - \hat{x}(i,j)\right]^{2}}, \qquad \mathrm{PSNR} = 10 \log \frac{255^{2}}{(1/MN) \sum_{i=1}^{M} \sum_{j=1}^{N} \left[x(i,j) - \hat{x}(i,j)\right]^{2}}, \quad (14)$$

where x(i, j) and x̂(i, j), respectively, denote the gray values of the original image and the reconstructed image.
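Reading equation (14) with base-10 logarithms and 8-bit gray values (so the peak value is 255), the two criteria can be computed as below; the random test image is illustrative only.

```python
import numpy as np

def snr_db(x, x_hat):
    # SNR per equation (14): signal power over reconstruction error power.
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum((x - x_hat) ** 2))

def psnr_db(x, x_hat, peak=255.0):
    # PSNR per equation (14): squared peak value over the mean squared error.
    return 10.0 * np.log10(peak ** 2 / np.mean((x - x_hat) ** 2))

# Example with a random 256 x 256 "image" and a slightly noisy copy.
rng = np.random.default_rng(1)
x = rng.integers(0, 256, size=(256, 256)).astype(np.float64)
x_hat = np.clip(x + rng.normal(0.0, 1.0, x.shape), 0, 255)
print(snr_db(x, x_hat), psnr_db(x, x_hat))
```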
In this experiment, the ROI image compression method [14] is compared with the proposed algorithm. The compression ratio of the ROI compression method is calculated based on the region of interest:

$$R_{\mathrm{ROI}} = \frac{S_i + S_b}{S}, \quad (15)$$

where S_i denotes the size of the interest region, S_b is the size of the background region of the reconstructed image, and S represents the original image size.

In the proposed algorithm, the compression ratio is determined by the numbers of neural units in the hidden layers:

$$R_{\mathrm{RBM}} = \frac{H_1 * H_2 * \cdots * H_{n-1} * U}{M \times N}, \quad (16)$$

where M × N is the number of units in the bottom input layer, which equals the number of pixels of an image, H_i is the number of nodes in hidden layer i of the RBM, U is the number of units in the output layer of the network, and n is the number of layers.
algorithms on one epoch and on one training sample when During the experiment, the number of hidden layer
they are applied to the toy dataset and the MNIST dataset. units 𝑈 of RBM is set to 2, 4, and 8, respectively. We
All experiments were conducted on a Windows operating compare the reconstructed image quality of ROI compression
system machine with Intel® Core i5-3210M 2.50 GHz CPU TM
algorithm with that of the proposed algorithm under the
and 8 GB RAM. Table 1 displays the results. condition that the compression ratio is unchangeable. In
Based on the results in Table 1, we can see that the running a multilayer RBM network, the middle data quantification
time of our proposed algorithm is less than PT-10 and PTEE- process will bring about some damage to image compression.
10. Therefore, the increasing number of hidden layers will make
From all the simulation results above, we can see that the reconstructed image decline in quality. So we set the
the reconstruction error of the proposed algorithm is better number of layers in RBM to 3. The experiment results are
than that of PCD, PT-10, and PTEE-10. Under the same shown in Table 2.
conditions, PT-10 and PTEE-10 perform better than PCD, but The compression ratio is in inverse proportion to the
PT-10 and PTEE-10 will spend tenfold time. However, the number of hidden units 𝑈. From the objective quality
experiment results obtained on MINST dataset show that the assessments of the reconstructed Lena image in Table 2, we

Table 2: The SNR and PSNR of the Lena image when using the multilayer RBM network compression algorithm and the interest based compression algorithm.

Methods | SNR (dB) | PSNR (dB)
U = 2
    Multilayer RBM network | 34.3152 | 49.9201
    ROI | 30.4617 | 47.1041
U = 4
    Multilayer RBM network | 42.6412 | 51.2351
    ROI | 36.7224 | 47.5036
U = 8
    Multilayer RBM network | 59.8027 | 74.2141
    ROI | 51.1349 | 60.2093

The compression ratio is in inverse proportion to the number of hidden units U. From the objective quality assessments of the reconstructed Lena image in Table 2, we can conclude that the quality of a low compression ratio image is better than that of a high compression ratio image. When the compression ratio is high, although much storage space is saved, the reconstructed image cannot describe the image texture details. From Table 2, when the number of hidden units is 8, the PSNR of the multilayer RBM network compression method is 74.2141. At this point, the visual appearance of the reconstructed Lena image is very close to that of the original image.

From Table 2, we can also conclude that the reconstructed image quality of the multilayer RBM network is superior to that of the ROI compression method under the same compression ratio. The ROI compression algorithm can compress the interest region and the background region separately and can therefore reach a high compression ratio, but the overall reconstructed image quality is not good because of the high compression ratio of the background region. In multilayer RBM networks, the compression ratio can be adjusted by setting the number of neural units in each layer of the RBM. In addition, the training process in a multilayer RBM network is layered. The pass from the bottom input layer to the first hidden layer is the first compression; the pass from the first hidden layer to the second hidden layer is the second compression, which builds on the first. The RBM in each layer compresses the image and largely removes the redundancy of the original image.

5.3. The Energy Consumption Analysis of the Wireless Sensor Network. In this section, the energy consumption of a WSN is analyzed with respect to image transmission. The energy consumed during the transmitting process can be calculated using the formula below:

$$E_{Tx} = \sum_{i}^{k} \left(2 E_{\mathrm{elec}} + \varepsilon_{\mathrm{amp}}\, g_i^{2}\right) M, \quad (17)$$

where E_elec is the line loss in the electrical circuit, ε_amp represents the amplifier parameter, k is the number of nodes, g_i is the distance between transmitting nodes, and M is the number of bits of the image to be transmitted.
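A small sketch of equation (17) follows, using the parameter settings given below in this section (E_elec = 0.5 × 10⁻⁶ EJ/bit and ε_amp = 1 × 10⁻⁹ EJ/bit/m²); the hop distances and the compression ratio in the example are illustrative.

```python
def transmission_energy(distances, M_bits, E_elec=0.5e-6, eps_amp=1e-9):
    # E_Tx per equation (17): each of the k hops costs electronics energy
    # plus amplifier energy that grows with the squared distance, for an
    # image of M_bits bits.
    return sum((2.0 * E_elec + eps_amp * g * g) * M_bits for g in distances)

# A 256 x 256 8-bit image over three 50 m hops, uncompressed versus
# compressed to an assumed ratio of 0.125.
bits = 256 * 256 * 8
print(transmission_energy([50.0, 50.0, 50.0], bits))
print(transmission_energy([50.0, 50.0, 50.0], bits * 0.125))
```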

In the simulation, when cluster head nodes receive images, they transmit these images to coding nodes to carry out compression coding. Then, the compressed image is assigned to the transmitting node. In the experiment, we calculate the energy consumption of every transmitting node. We compare the proposed algorithm with the ROI lossy-lossless image compression method under three conditions: (1) no image compression method is used; (2) only the multilayer RBM network compression method is used; (3) only the ROI compression method is used. We compare the energy consumption of the transmitting nodes under these three conditions. In conditions (2) and (3), we compare the energy consumption of the two algorithms under the same SNR.

The parameter settings are as follows: E_elec = 0.5 × 10⁻⁶ EJ/bit and ε_amp = 1 × 10⁻⁹ EJ/bit/m². When the energy consumption values of the transmitting nodes are compared, the distance between transmitting nodes ranges from 0 to 250 meters with a step size of 10 meters. The experiment results are shown in Figure 11.

Figure 11: The energy consumption of transmitting nodes.

Figure 11 shows that more energy is consumed as the transmitting distance increases. When the transmitting distances are the same, the energy consumption of the transmitting nodes using the multilayer RBM network is obviously smaller than that using the ROI compression method. Although the ROI compression method can code the interest region and the background region separately and obtain a high compression ratio, it cannot ensure a high quality of the reconstructed image. In the multilayer RBM network compression method, however, data redundancy is reduced in every layer, and therefore it achieves a high compression ratio.

We continue by investigating the relationship between image compression performance and compression energy consumption. The image compression energy consumption can be calculated using the formula below:

$$E_C = N_C * C * V_{dd}^{2}, \quad (18)$$

where N_C is the time spent during the image compression process, C is the capacitance, and V_dd is the voltage. Therefore, under the same compression environment, the compression energy consumption E_C depends only on N_C. The proposed image compression method includes an additional RBM training process, and since the RBM is multilayered, the training process is extended. However, the RBM training process need not be carried out every time images are compressed; once it is finished, it can be used for all coding nodes.
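Equation (18) is a standard dynamic-power model and transcribes directly; the numeric values below are placeholders, since the paper does not report N_C, C, or V_dd.

```python
def compression_energy(N_C, C, V_dd):
    # E_C per equation (18): compression time N_C times the switched
    # capacitance C times the square of the supply voltage V_dd.
    return N_C * C * V_dd ** 2

# Illustrative values only.
print(compression_energy(N_C=1.0e3, C=1.0e-9, V_dd=3.0))
```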
We continue by testing the total energy consumption in the WSN when the three image compression schemes above are used, respectively, and Figure 12 shows the results.

Figure 12: Total energy consumption in WSN.

Figure 12 shows that although the RBM training process extends N_C, the total energy consumption of the proposed method is superior to the other two methods as the transmitting distance increases. Based on Table 2, the proposed image compression method has better reconstructed image quality than ROI under the same compression ratio. Therefore, we can conclude that the proposed method can ensure better compression performance and smaller energy consumption at the same time.

6. Conclusions and Future Work

Image compression is an important research field in WSNs. It is difficult to find a comprehensive image compression method because of the complex features of sensor networks. A multilayer RBM network based image compression method is proposed in this paper, and an improved RBM training algorithm based on alternative iteration is presented to improve the likelihood of the RBM. However, many problems remain to be solved when using a multilayer RBM network to compress images. The multilayer RBM network can affect the delay in the sensor network. We should find a more suitable normalizing parameter function for the RBM training process. Besides, the problem of finding routing paths should also be considered. Therefore, we endeavor to find a more integrated image compression method so as to accelerate the application of WSNs in real life.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is sponsored by the Fundamental Research Funds for the Central Universities (no. LGZD201502), the Natural Science Foundation of China (nos. 61403208 and 61373139), and the Research and Innovation Projects for Graduates of Jiangsu Province (no. CXZZ12 0483).

References

[1] P. J. Sadowski, D. Whiteson, and P. Baldi, "Searching for Higgs Boson decay modes with deep learning," in Proceedings of the 28th Annual Conference on Neural Information Processing Systems (NIPS '14), pp. 2393–2401, Montreal, Canada, December 2014.
[2] X. Ding, Y. Zhang, T. Liu, and J. Duan, "Deep learning for event-driven stock prediction," in Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI '15), pp. 2327–2333, ACM, Buenos Aires, Argentina, July 2015.
[3] S. Chatzis, "Echo-state conditional restricted Boltzmann machines," in Proceedings of the 28th AAAI Conference on Artificial Intelligence, pp. 1738–1744, 2014.
[4] T. Osogami and M. Otsuka, "Restricted Boltzmann machines modeling human choice," in Proceedings of the 28th Annual Conference on Neural Information Processing Systems (NIPS '14), pp. 73–81, Montreal, Canada, December 2014.
[5] C. Zhang, G.-L. Sun, W.-X. Li, Y. Gao, and L. Lv, "Research on data compression algorithm based on prediction coding for wireless sensor network nodes," in Proceedings of the International Forum on Information Technology and Applications (IFITA '09), vol. 1, pp. 283–286, IEEE, Chengdu, China, May 2009.
[6] L. Xiang-Yu, W. Ya-Zhe, and Y. Xiao-Chun, "Facing the wireless sensor network streaming data compression technology," Computer Science, vol. 34, no. 2, pp. 141–143, 2007.
[7] L.-C. Wang and C.-X. Ma, "Based on a linear model of the space-time data compression algorithm in sensor networks," Electronics and Information Technology, vol. 32, no. 3, pp. 755–758, 2010.
[8] L. Wang and S.-W. Zhou, "Based on interval wavelet transform in hybrid entropy data compression algorithm in sensor network," Computer Applications, vol. 25, no. 11, pp. 1676–1678, 2005.
[9] Z. Si-wang, L. Ya-ping, and Z. Jian-ming, "Based on ring model of wavelet compression algorithm in sensor networks," Journal of Software, vol. 18, no. 3, pp. 669–680, 2007.
[10] Z. Tie-jun, L. Ya-ping, and Z. Si-wang, "Based on adaptive multiple modules data compression algorithm of wavelet in wireless sensor networks," Journal of Communication, vol. 30, no. 3, pp. 48–53, 2008.

[11] S.-W. Zhou, Y.-P. Lin, and S.-T. Ye, "A kind of sensor network storage effective wavelet incremental data compression algorithm," Journal of Computer Research and Development, vol. 46, no. 12, pp. 2085–2092, 2009.
[12] W.-H. Luo and J.-L. Wang, "Based on chain model of distributed wavelet compression algorithm," Computer Engineering, vol. 36, no. 16, pp. 74–76, 2010.
[13] F. Xiang-Hui, L. Shi-Ning, and D. Peng-Lei, "Adaptive nondestructive data compression system of WSN," Computer Measurement and Control, vol. 18, no. 2, pp. 463–465, 2010.
[14] N. Cai-xiang, Study on Image Data Compression Processing in Wireless Multimedia Sensor Network, Chang'an University, Xi'an, China, 2014.
[15] G. E. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Computation, vol. 14, no. 8, pp. 1771–1800, 2002.
[16] I. Sutskever and T. Tieleman, "On the convergence properties of contrastive divergence," Journal of Machine Learning Research—Proceedings Track, vol. 9, pp. 789–795, 2010.
[17] T. Tieleman, "Training restricted Boltzmann machines using approximations to the likelihood gradient," in Proceedings of the 25th International Conference on Machine Learning, pp. 1064–1071, ACM, Helsinki, Finland, July 2008.
[18] T. Tieleman and G. E. Hinton, "Using fast weights to improve persistent contrastive divergence," in Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pp. 1033–1040, ACM, June 2009.
[19] G. Desjardins, A. Courville, and Y. Bengio, "Adaptive parallel tempering for stochastic maximum likelihood learning of RBMs," in Neural Information Processing Systems (NIPS), MIT Press, 2010.
[20] J. Xu, H. Li, and S. Zhou, "Improving mixing rate with tempered transition for learning restricted Boltzmann machines," Neurocomputing, vol. 139, pp. 328–335, 2014.
[21] Y. Hu, Markov chain Monte Carlo based improvements to the learning algorithm of restricted Boltzmann machines [M.S. thesis], Shanghai Jiao Tong University, Shanghai, China, 2012.
[22] Y. Bengio, A. C. Courville, and P. Vincent, Unsupervised Feature Learning and Deep Learning: A Review and New Perspectives, Department of Computer Science and Operations Research, University of Montreal, Montreal, Canada, 2012.
[23] A. Fischer and C. Igel, "Training restricted Boltzmann machines: an introduction," Pattern Recognition, vol. 47, no. 1, pp. 25–39, 2014.
[24] A. Fischer and C. Igel, "An empirical analysis of the divergence of Gibbs sampling based learning algorithms for restricted Boltzmann machines," in Artificial Neural Networks-ICANN 2010: 20th International Conference, Thessaloniki, Greece, September 15–18, 2010, Proceedings, Part III, vol. 6354 of Lecture Notes in Computer Science, pp. 208–217, Springer, Berlin, Germany, 2010.
[25] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[26] G. Desjardins, A. Courville, and Y. Bengio, "Parallel tempering for training of restricted Boltzmann machines," Journal of Machine Learning Research Workshop & Conference Proceedings, vol. 9, pp. 145–152, 2010.
[27] K. H. Cho, T. Raiko, and A. Ilin, "Parallel tempering is efficient for learning restricted Boltzmann machines," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '10), pp. 1–8, Barcelona, Spain, July 2010.
[28] N. Ji and J. Zhang, "Parallel tempering with equi-energy moves for training of restricted Boltzmann machines," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '14), pp. 120–127, Beijing, China, July 2014.