A REPORT
ON
By
At
(July-December, 2018)
ACKNOWLEDGEMENT
I would like to express my sincere gratitude to the Lending Analytics team, who chose me
to work on their project on implementing a neural network for forecasting their collection data.
My mentors, Lakshmi Prasad sir and Prachi ma’am, guided me in learning Deep Learning concepts
and the Java language. I would like to thank Paramjeet ma’am for teaching me web concepts.
I express my deepest thanks to Pankaj Sir (Vice President) and Parag Bhise sir (Senior Vice
President) for reviewing my project weekly and giving the advice and guidance needed to make
the project more useful.
I would like to thank Ritu Arora ma’am for guiding us through the internship and reviewing our
project activity at regular intervals.
I perceive this opportunity as a big milestone in my career development. I will strive to use
the skills and knowledge I have gained in the best possible way, and I will continue to work on
improving them in order to attain my desired career objectives.
- Harshit Garg
2014A3PS0257G
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI
Practice School Division
Abstract:-
A large amount of collection data from past years can be processed through a neural network to
produce forecasts for future years. That is what this project is about: implementing a neural
network using an LSTM model with the DL4J library to forecast multiple targets of the collection data
of the Lending Analytics department.
The project is built in Java. The work is divided into four main parts:
building the neural network model, creating a dataset iterator file in Java, creating a plot utility
file in Java, and creating the data prediction file. The data is usually multivariate and the prediction
involves multiple targets.
The neural network model uses four layers: an input layer, an output layer and two hidden
layers. Three types of activation functions are used to filter the data accordingly.
The model is an LSTM model trained on past years’ data. With its two hidden layers
and memory, an LSTM model learns the trend in the data and can forecast future years.
With my mentors’ aid I analyzed passing univariate data into a neural network and classifying
the data into labeled groups.
I also learnt to pass multivariate data into the model using DL4J commands.
Ultimately, the project aims to forecast any type of multivariate data, given sufficient examples to
train the neural net.
The basic unit of computation in a neural network is the neuron, often called a node or unit. It
receives input from other nodes, or from an external source, and computes an output. Each
input has an associated weight (w), which is assigned on the basis of its importance relative to
the other inputs. The node applies a function f (defined below) to the weighted sum of its inputs, as
shown in Figure 1 below:
The above network takes numerical inputs X1 and X2 and has weights w1 and w2 associated with
those inputs. Additionally, there is another input 1 with weight b (called the Bias) associated with
it. The output Y from the neuron is computed as shown in Figure 1, that is, Y = f(w1·X1 + w2·X2 + b).
The function f is non-linear and is called the Activation Function. The purpose of the activation
function is to introduce non-linearity into the output of a neuron. This is important because most
real-world data is non-linear and we want neurons to learn these non-linear representations.
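The computation of a single neuron described above can be sketched in a few lines of Java. This is only an illustration: the class name `Neuron`, the choice of sigmoid for f, and the numeric inputs and weights are assumptions made for the example, not values from the project.

```java
final class Neuron {
    // Sigmoid is used here as one possible choice for the activation function f.
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Y = f(w1*X1 + w2*X2 + b), where the bias b is the weight on the constant input 1.
    static double output(double x1, double x2, double w1, double w2, double b) {
        double weightedSum = w1 * x1 + w2 * x2 + b * 1.0;
        return sigmoid(weightedSum);
    }

    public static void main(String[] args) {
        // Hypothetical inputs and weights, chosen only for illustration.
        double y = output(1.0, 2.0, 0.5, -0.25, 0.0);
        System.out.println("Y = " + y);
    }
}
```

With these example values the weighted sum is zero, so the sigmoid returns its midpoint value.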
An Artificial Neural Network (ANN) is a computational model that is inspired by the way
biological neural networks in the human brain process information. Artificial Neural Networks
have generated a lot of excitement in Machine Learning research and industry. Artificial
intelligence, cognitive modelling, and neural networks are information processing paradigms
inspired by the way biological neural systems process data. Artificial intelligence and cognitive
modeling try to simulate some properties of biological neural networks. In the artificial
intelligence field, artificial neural networks have been applied successfully to speech
recognition, image analysis and adaptive control, in order to construct software
agents or autonomous robots.
Activation Functions:-
Every activation function (or non-linearity) takes a single number and performs a certain fixed
mathematical operation on it. There are several activation functions you may encounter in practice:
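Sigmoid: takes a real-valued input and squashes it to the range between 0 and 1:
σ(x) = 1 / (1 + exp(−x))
tanh: takes a real-valued input and squashes it to the range [−1, 1]:
tanh(x) = 2σ(2x) − 1
ReLU: ReLU stands for Rectified Linear Unit. It takes a real-valued input and thresholds
it at zero (replaces negative values with zero):
f(x) = max(0, x)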
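The three functions above translate directly into code. The sketch below is a minimal illustration in plain Java (the class name `Activations` is hypothetical); tanh is deliberately written through the identity tanh(x) = 2σ(2x) − 1 so that it can be compared with `Math.tanh`.

```java
final class Activations {
    // Sigmoid: squashes any real number into the range (0, 1).
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // tanh expressed through the sigmoid: tanh(x) = 2 * sigmoid(2x) - 1.
    static double tanh(double x) {
        return 2.0 * sigmoid(2.0 * x) - 1.0;
    }

    // ReLU: replaces negative values with zero and leaves the rest unchanged.
    static double relu(double x) {
        return Math.max(0.0, x);
    }

    public static void main(String[] args) {
        double[] samples = {-2.0, -0.5, 0.0, 0.5, 2.0};
        for (double x : samples) {
            System.out.println(x + " -> sigmoid " + sigmoid(x)
                    + ", tanh " + tanh(x) + ", relu " + relu(x));
        }
    }
}
```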
Importance of Bias: The main function of Bias is to provide every node with a trainable constant
value (in addition to the normal inputs that the node receives).
Feedforward Neural Network
The feedforward neural network was the first and simplest type of artificial neural network
devised. It contains multiple neurons (nodes) arranged in layers. Nodes from adjacent layers
have connections or edges between them. All these connections have weights associated with
them.
1. Input Nodes – The Input nodes provide information from the outside world to the network
and are together referred to as the “Input Layer”. No computation is performed in any of
the Input nodes – they just pass on the information to the hidden nodes.
2. Hidden Nodes – The Hidden nodes have no direct connection with the outside world
(hence the name “hidden”). They perform computations and transfer information from the
input nodes to the output nodes. A collection of hidden nodes forms a “Hidden Layer”.
While a feedforward network will only have a single input layer and a single output layer,
it can have zero or multiple Hidden Layers.
3. Output Nodes – The Output nodes are collectively referred to as the “Output Layer” and
are responsible for computations and transferring information from the network to the
outside world.
In a feedforward network, the information moves in only one direction – forward – from the input
nodes, through the hidden nodes (if any) and to the output nodes. There are no cycles or loops in
the network (this property of feed forward networks is different from Recurrent Neural Networks
in which the connections between the nodes form a cycle).
1. Single Layer Perceptron – This is the simplest feedforward neural network and does not
contain any hidden layer.
2. Multi Layer Perceptron – A Multi Layer Perceptron has one or more hidden layers. We will only
discuss Multi Layer Perceptrons below since they are more useful than Single Layer Perceptrons for
practical applications today.
A Multi Layer Perceptron (MLP) contains one or more hidden layers (apart from one input and
one output layer). While a single layer perceptron can only learn linear functions, a multi layer
perceptron can also learn non – linear functions.
Figure 4 shows a multi layer perceptron with a single hidden layer. Note that all connections have
weights associated with them, but only three weights (w0, w1, w2) are shown in the figure.
Input Layer: The Input layer has three nodes. The Bias node has a value of 1. The other two
nodes take X1 and X2 as external inputs (which are numerical values depending upon the input
dataset). As discussed above, no computation is performed in the Input layer, so the outputs from
nodes in the Input layer are 1, X1 and X2 respectively, which are fed into the Hidden Layer.
Hidden Layer: The Hidden layer also has three nodes with the Bias node having an output of
1. The output of the other two nodes in the Hidden layer depends on the outputs from the Input
layer (1, X1, X2) as well as the weights associated with the connections (edges). Figure 4 shows
the output calculation for one of the hidden nodes (highlighted). Similarly, the output from other
hidden node can be calculated. Remember that f refers to the activation function. These outputs
are then fed to the nodes in the Output layer.
Figure 4: A multi layer perceptron having one hidden layer
Output Layer: The Output layer has two nodes which take inputs from the Hidden layer and
perform similar computations as shown for the highlighted hidden node. The values calculated (Y1
and Y2) as a result of these computations act as outputs of the Multi Layer Perceptron.
Given a set of features X = (x1, x2, …) and a target y, a Multi Layer Perceptron can learn the
relationship between the features and the target, for either classification or regression.
Let’s take an example to understand Multi Layer Perceptrons better. Suppose we have the following
student-marks dataset:
The two input columns show the number of hours the student has studied and the mid term marks
obtained by the student. The Final Result column can have two values 1 or 0 indicating whether
the student passed in the final term. For example, we can see that if the student studied 35 hours
and had obtained 67 marks in the mid term, he / she ended up passing the final term.
Now, suppose, we want to predict whether a student studying 25 hours and having 70 marks in the
mid term will pass the final term.
This is a binary classification problem where a multi layer perceptron can learn from the given
examples (training data) and make an informed prediction given a new data point. We will see
below how a multi layer perceptron learns such relationships.
The process by which a Multi Layer Perceptron learns is called the Backpropagation algorithm.
An ANN consists of nodes in different layers: the input layer, intermediate hidden layer(s) and the
output layer. The connections between nodes of adjacent layers have “weights” associated with
them. The goal of learning is to assign correct weights for these edges. Given an input vector, these
weights determine what the output vector is.
In supervised learning, the training set is labeled. This means, for some given inputs, we know the
desired/expected output (label).
Backpropagation-Algorithm:
Initially all the edge weights are randomly assigned. For every input in the training dataset, the
ANN is activated and its output is observed. This output is compared with the desired output that
we already know, and the error is “propagated” back to the previous layer. This error is noted and
the weights are “adjusted” accordingly. This process is repeated until the output error is below a
predetermined threshold.
Once the above algorithm terminates, we have a “learned” ANN which we consider ready to
work with “new” inputs. This ANN is said to have learned from several examples (labeled data)
and from its mistakes (error propagation).
The Multi Layer Perceptron shown in Figure 5 has two nodes in the input layer (apart from the
Bias node) which take the inputs ‘Hours Studied’ and ‘Mid Term Marks’. It also has a hidden layer
with two nodes (apart from the Bias node). The output layer has two nodes as well – the upper
node outputs the probability of ‘Pass’ while the lower node outputs the probability of ‘Fail’.
In classification tasks, we generally use a Softmax function as the Activation Function in the
Output layer of the Multi Layer Perceptron to ensure that the outputs are probabilities and they
add up to 1. The Softmax function takes a vector of arbitrary real-valued scores and squashes it to
a vector of values between zero and one that sum to one. So, in this case,
Probability(Pass) + Probability(Fail) = 1.
All weights in the network are randomly assigned. Let’s consider the hidden layer node marked V in
Figure 5 below. Assume the weights of the connections from the inputs to that node are w1, w2
and w3 (as shown).
The network then takes the first training example as input (we know that for inputs 35 and 67, the
probability of Pass is 1).
Then the output V from the node under consideration can be calculated as below (f is an activation
function such as sigmoid):
V = f(1·w1 + 35·w2 + 67·w3)
Similarly, the output from the other node in the hidden layer is also calculated. The outputs of the
two nodes in the hidden layer act as inputs to the two nodes in the output layer. This enables us to
calculate the output probabilities from the two nodes in the output layer.
Suppose the output probabilities from the two nodes in the output layer are 0.4 and 0.6 respectively
(since the weights are randomly assigned, outputs will also be random). We can see that the
calculated probabilities (0.4 and 0.6) are very far from the desired probabilities (1 and 0
respectively), hence the network in Figure 5 is said to have an ‘Incorrect Output’.
Figure 5: Forward propagation step in a multi layer perceptron
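The forward propagation step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class `ForwardPass`, the sigmoid hidden activation and every weight value are hypothetical and are not taken from the figure. The only things carried over from the text are the network shape (two inputs, two hidden nodes, two outputs, a bias on each layer) and the Softmax output.

```java
final class ForwardPass {
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Softmax: exponentiate each score and divide by the total,
    // so every output lies between 0 and 1 and the outputs sum to 1.
    static double[] softmax(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            max = Math.max(max, s);
        }
        double sum = 0.0;
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = Math.exp(scores[i] - max); // subtracting the max avoids overflow
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= sum;
        }
        return out;
    }

    // Weighted sums for one layer. weights[j][0] is the weight on the bias input 1,
    // weights[j][i + 1] is the weight from input i to node j.
    static double[] weightedSums(double[] inputs, double[][] weights) {
        double[] sums = new double[weights.length];
        for (int j = 0; j < weights.length; j++) {
            double z = weights[j][0];
            for (int i = 0; i < inputs.length; i++) {
                z += weights[j][i + 1] * inputs[i];
            }
            sums[j] = z;
        }
        return sums;
    }

    // Input -> hidden layer (sigmoid) -> output layer (softmax).
    static double[] predict(double[] inputs, double[][] hiddenW, double[][] outputW) {
        double[] hidden = weightedSums(inputs, hiddenW);
        for (int j = 0; j < hidden.length; j++) {
            hidden[j] = sigmoid(hidden[j]);
        }
        return softmax(weightedSums(hidden, outputW));
    }

    public static void main(String[] args) {
        double[] example = {35.0, 67.0}; // hours studied, mid term marks
        double[][] hiddenW = {{0.1, 0.01, -0.02}, {-0.3, 0.02, 0.01}}; // hypothetical
        double[][] outputW = {{0.0, 0.5, -0.5}, {0.0, -0.5, 0.5}};     // hypothetical
        double[] p = predict(example, hiddenW, outputW);
        System.out.println("P(Pass) = " + p[0] + ", P(Fail) = " + p[1]);
    }
}
```

Because the weights are arbitrary, the printed probabilities carry no meaning beyond showing that they always add up to 1.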
We calculate the total error at the output nodes and propagate these errors back through the network
using Backpropagation to calculate the gradients. Then we use an optimization method such
as Gradient Descent to ‘adjust’ all weights in the network with the aim of reducing the error at the
output layer. This is shown in Figure 6 below (ignore the mathematical equations in the figure
for now).
Suppose that the new weights associated with the node in consideration are w4, w5 and w6 (after
Backpropagation and adjusting weights).
Figure 6: Backward propagation and weight update step in a multi layer perceptron
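To make the weight adjustment concrete, the sketch below performs one gradient descent step for a single sigmoid neuron with a squared-error loss. This is a deliberate simplification of the network in the figures (one output node, no Softmax), and the class name `GradientDescentStep`, the scaled inputs, the starting weights and the learning rate are all assumptions chosen for illustration.

```java
final class GradientDescentStep {
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Output of one neuron; inputs[0] is the constant bias input 1.
    static double output(double[] inputs, double[] weights) {
        double z = 0.0;
        for (int i = 0; i < inputs.length; i++) {
            z += weights[i] * inputs[i];
        }
        return sigmoid(z);
    }

    // Squared error for one example: E = 0.5 * (target - output)^2.
    static double error(double[] inputs, double[] weights, double target) {
        double diff = target - output(inputs, weights);
        return 0.5 * diff * diff;
    }

    // One gradient descent step: w_i <- w_i - learningRate * dE/dw_i.
    static double[] updatedWeights(double[] inputs, double[] weights,
                                   double target, double learningRate) {
        double out = output(inputs, weights);
        // Chain rule: dE/dz = (out - target) * sigmoid'(z), with sigmoid'(z) = out * (1 - out).
        double delta = (out - target) * out * (1.0 - out);
        double[] next = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            next[i] = weights[i] - learningRate * delta * inputs[i];
        }
        return next;
    }

    public static void main(String[] args) {
        double[] inputs = {1.0, 0.35, 0.67}; // bias, scaled hours, scaled marks (hypothetical)
        double[] weights = {0.1, -0.2, 0.3}; // hypothetical starting weights
        double target = 1.0;                 // the student passed
        double before = error(inputs, weights, target);
        double[] adjusted = updatedWeights(inputs, weights, target, 0.5);
        double after = error(inputs, adjusted, target);
        System.out.println("error before = " + before + ", error after = " + after);
    }
}
```

Running the step repeatedly over all training examples is what the text calls training until the error falls below a threshold.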
If we now input the same example to the network again, the network should perform better than
before since the weights have now been adjusted to minimize the error in prediction. As shown in
Figure 7, the errors at the output nodes now reduce to [0.2, -0.2] as compared to [0.6, -0.6] earlier.
This means that our network has learnt to correctly classify our first training example.
The numbers within the neurons represent each neuron's explicit threshold (which can be
factored out so that all neurons have the same threshold, usually 1). The numbers that annotate
arrows represent the weight of the inputs. This net assumes that if the threshold is not reached,
zero (not -1) is output. The bottom layer of inputs is not always considered a real neural network
layer.
What is Time Series?
A time series is a series of data points indexed (or listed or graphed) in time order. Most
commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it
is a sequence of discrete-time data. Examples of time series are heights of ocean tides, counts
of sunspots, and the daily closing value of a stock market index.
Time series are very frequently plotted via line charts. Time series are used in statistics, signal
processing, pattern recognition, econometrics, mathematical finance, weather
forecasting, earthquake prediction, electroencephalography, control
engineering, astronomy, communications engineering, and largely in any domain of
applied science and engineering which involves temporal measurements.
Time series forecasting is the use of a model to predict future values based on previously
observed values. While regression analysis is often employed in such a way as to test theories
that the current values of one or more independent time series affect the current value of another
time series, this type of analysis of time series is not called "time series analysis", which focuses
on comparing values of a single time series or multiple dependent time series at different points
in time.[1] Interrupted time series analysis is the analysis of interventions on a single time series.
In statistics, prediction is a part of statistical inference. One particular approach to such inference
is known as predictive inference, but the prediction can be undertaken within any of the several
approaches to statistical inference. Indeed, one description of statistics is that it provides a means
of transferring knowledge about a sample of a population to the whole population, and to other
related populations, which is not necessarily the same as prediction over time. When information
is transferred across time, often to specific points in time, the process is known as forecasting.
A recurrent neural network (RNN) is a class of artificial neural network where connections
between nodes form a directed graph along a sequence. This allows it to exhibit temporal
dynamic behavior for a time sequence. Unlike feedforward neural networks, RNNs can use their
internal state (memory) to process sequences of inputs. This makes them applicable to tasks such
as unsegmented, connected handwriting recognition or speech recognition.
Because of their internal memory, RNNs are able to remember important things about the input
they received, which enables them to be very precise in predicting what’s coming next. This is the
reason why they are the preferred algorithm for sequential data like time series, speech, text,
financial data, audio, video, weather and much more: they can form a much deeper
understanding of a sequence and its context than other algorithms.
In an RNN, the information cycles through a loop. When it makes a decision, it takes into
consideration the current input and also what it has learned from the inputs it received previously.
The two images below illustrate the difference in the information flow between an RNN and a
Feed-Forward Neural Network.
A usual RNN has a short-term memory. In combination with an LSTM it also has a long-term
memory, but we will discuss this further below.
Another good way to illustrate the concept of an RNN’s memory is to explain it with an example:
Imagine we have a normal feed-forward neural network and give it the word “neuron” as an input
and it processes the word character by character. By the time it reaches the character “r”, it has
already forgotten about “n”, “e” and “u”, which makes it almost impossible for this type of neural
network to predict what character would come next. A Recurrent Neural Network is able to
remember exactly that, because of its internal memory. It produces output, copies that output and
loops it back into the network.
Recurrent Neural Networks add the immediate past to the present. Therefore a Recurrent Neural
Network has two inputs, the present and the recent past. This is important because the sequence of
data contains crucial information about what is coming next, which is why an RNN can do things
other algorithms can’t. A Feed-Forward Neural Network assigns, like all other Deep Learning
algorithms, a weight matrix to its inputs and then produces the output. RNNs apply weights to
the current and also to the previous input. Furthermore, they tweak their weights for both
through gradient descent and Backpropagation Through Time. While Feed-Forward Neural
Networks map one input to one output, RNNs can map one to many, many to many (translation)
and many to one (classifying a voice).
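The idea that an RNN combines the present input with the recent past can be illustrated with a one-unit recurrent step. The sketch below assumes the common form h_t = tanh(wx·x_t + wh·h_{t−1} + b); the class name `SimpleRnn` and the weight names wx, wh and b are illustrative assumptions, not notation from this report.

```java
final class SimpleRnn {
    // One recurrent step: the new state mixes the current input with the previous state.
    static double step(double x, double hPrev, double wx, double wh, double b) {
        return Math.tanh(wx * x + wh * hPrev + b);
    }

    // Feeds a whole sequence through the unit, starting from an empty memory (h = 0).
    static double finalState(double[] sequence, double wx, double wh, double b) {
        double h = 0.0;
        for (double x : sequence) {
            h = step(x, h, wx, wh, b);
        }
        return h;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 0.0, 0.0};
        double[] c = {0.0, 0.0, 0.0};
        // With a recurrent weight, the first input still influences the final state.
        System.out.println(finalState(a, 0.5, 0.8, 0.0) + " vs " + finalState(c, 0.5, 0.8, 0.0));
        // With wh = 0 the unit has no memory and both sequences end in the same state.
        System.out.println(finalState(a, 0.5, 0.0, 0.0) + " vs " + finalState(c, 0.5, 0.0, 0.0));
    }
}
```

The two sequences differ only in their first element, so any difference in the final state comes purely from the loop that carries the state forward.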
Backpropagation Through Time (BPTT) does Backpropagation on an unrolled Recurrent Neural
Network. Unrolling is a visualization and conceptual tool, which helps us to understand what’s
going on within the network. Most of the time when we implement a Recurrent Neural Network
in the common programming frameworks, they automatically take care of the Backpropagation
but we need to understand how it works, which enables us to troubleshoot problems that come up
during the development process.
We can view an RNN as a sequence of Neural Networks that you train one after another with
backpropagation.
The image below illustrates an unrolled RNN. On the left, we can see the RNN, which is unrolled
after the equal sign. There is no cycle after the equal sign since the different timesteps are
visualized and information gets passed from one timestep to the next. This illustration also
shows why an RNN can be seen as a sequence of Neural Networks.
Within BPTT the error is back-propagated from the last to the first timestep, while unrolling all
the timesteps. This allows calculating the error for each timestep, which allows updating the
weights. Note that BPTT can be computationally expensive when you have a high number of
timesteps.
A gradient is a partial derivative of a function with respect to its inputs. A gradient measures how much the
output of a function changes if we change the inputs a little bit.
We can also think of a gradient as the slope of a function. The higher the gradient, the steeper the
slope and the faster a model can learn. But if the slope is zero, the model stops learning. A
gradient simply measures the change in error with regard to a change in the weights.
Vanishing Gradients
We speak of “Vanishing Gradients” when the values of a gradient are too small and the model
stops learning or takes far too long to learn because of that. This was a major problem in the 1990s and
much harder to solve than exploding gradients.
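A small numerical sketch shows why gradients vanish over many timesteps. Backpropagating through each timestep multiplies the gradient by a local factor; the example assumes that factor is 0.25, which is the largest value the derivative of the sigmoid can take. The class name `VanishingGradient` and the fixed per-step factor are simplifying assumptions, since in a real network the factor varies from step to step.

```java
final class VanishingGradient {
    // Gradient that reaches a point `steps` timesteps back,
    // assuming the same local factor is applied at every step.
    static double gradientAfter(int steps, double localFactor) {
        double gradient = 1.0;
        for (int i = 0; i < steps; i++) {
            gradient *= localFactor;
        }
        return gradient;
    }

    public static void main(String[] args) {
        int[] horizons = {1, 5, 10, 20};
        for (int steps : horizons) {
            System.out.println(steps + " steps back: " + gradientAfter(steps, 0.25));
        }
    }
}
```

A factor below 1 shrinks the gradient exponentially with the number of timesteps, while a factor above 1 would make it grow in the same way, which corresponds to the exploding case.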
LSTM Networks
Long Short Term Memory networks – usually just called “LSTMs” – are a special kind of RNN,
capable of learning long-term dependencies. They were introduced by Hochreiter & Schmidhuber
(1997). They work tremendously well on a large variety of problems, and are now widely used.
LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering
information for long periods of time is practically their default behavior, not something they struggle
to learn!
All recurrent neural networks have the form of a chain of repeating modules of neural network. In
standard RNNs, this repeating module will have a very simple structure, such as a single tanh layer.
Figure 13: The repeating module in a standard RNN contains a single layer
LSTMs also have this chain like structure, but the repeating module has a different
structure. Instead of having a single neural network layer, there are four, interacting in a
very special way.
Figure 14: The repeating module in an LSTM contains four interacting layers
Notations used:-
The LSTM does have the ability to remove or add information to the cell state, carefully regulated by
structures called gates.
Gates are a way to optionally let information through. They are composed out of a sigmoid neural net
layer and a pointwise multiplication operation.
The first step in LSTM is to decide what information is going to be thrown away from the cell state.
This decision is made by a sigmoid layer called the “forget gate layer.” It looks at ht−1 and xt, and
outputs a number between 0 and 1 for each number in the cell state Ct−1. A 1 represents “completely
keep this” while a 0 represents “completely get rid of this.”
Consider the example of a language model where next word is predicted based on all the previous
ones. In such a problem, the cell state might include the gender of the present subject, so that the
correct pronouns can be used. When a new subject is seen, we want to forget the gender of the old
subject.
The next step is to decide what new information we’re going to store in the cell state. This has two
parts. First, a sigmoid layer called the “input gate layer” decides which values we’ll update. Next, a
tanh layer creates a vector of new candidate values, C̃t, that could be added to the state. In the
next step, we’ll combine these two to create an update to the state.
It’s now time to update the old cell state, Ct−1, into the new cell state Ct. The previous steps already
decided what to do, we just need to actually do it.
We multiply the old state by ft, forgetting the things we decided to forget earlier. Then we add it ∗ C̃t.
These are the new candidate values, scaled by how much we decided to update each state value.
In the case of the language model, this is where we’d actually drop the information about the old
subject’s gender and add the new information, as we decided in the previous steps.
Finally, we need to decide what we’re going to output. This output will be based on our cell state, but
will be a filtered version. First, we run a sigmoid layer which decides what parts of the cell state we’re
going to output. Then, we put the cell state through tanh (to push the values to be between −1 and 1)
and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.
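The steps described above are commonly summarised by the following equations. The weight matrices W and bias vectors b, and the symbol o_t for the output gate, are standard notation assumed here; they do not appear in the text above. The symbol ∗ in the text corresponds to the element-wise product, written \odot below.

```latex
\begin{align*}
f_t &= \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) && \text{(candidate values)} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(new cell state)} \\
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(C_t) && \text{(output)}
\end{align*}
```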
For the language model example, since it just saw a subject, it might want to output information
relevant to a verb, in case that’s what is coming next. For example, it might output whether the subject
is singular or plural, so that we know what form a verb should be conjugated into if that’s what follows
next.
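The whole sequence of steps (forget, input, update, output) can be put together in a sketch of one LSTM step. To stay readable it uses a single scalar cell instead of vectors and matrices; the class name `LstmCell` and the parameter layout are assumptions made for this illustration and are unrelated to how DL4J implements its LSTM layers.

```java
final class LstmCell {
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Pre-activation of one gate: p[0] weights h_{t-1}, p[1] weights x_t, p[2] is the bias.
    static double gateInput(double[] p, double hPrev, double x) {
        return p[0] * hPrev + p[1] * x + p[2];
    }

    // One LSTM step for a scalar cell.
    // params rows: 0 = forget gate, 1 = input gate, 2 = candidate, 3 = output gate.
    // Returns {h_t, C_t}.
    static double[] step(double x, double hPrev, double cPrev, double[][] params) {
        double f = sigmoid(gateInput(params[0], hPrev, x));           // what to keep of the old state
        double i = sigmoid(gateInput(params[1], hPrev, x));           // how much of the candidate to add
        double candidate = Math.tanh(gateInput(params[2], hPrev, x)); // new candidate value
        double o = sigmoid(gateInput(params[3], hPrev, x));           // what part of the state to output
        double c = f * cPrev + i * candidate;                         // updated cell state
        double h = o * Math.tanh(c);                                  // filtered output
        return new double[] {h, c};
    }

    public static void main(String[] args) {
        // Hypothetical parameters: a strongly open forget gate and a closed input gate
        // make the cell carry its old state forward almost unchanged.
        double[][] params = {
            {0.0, 0.0, 10.0},
            {0.0, 0.0, -10.0},
            {0.5, 0.5, 0.0},
            {0.0, 0.0, 0.0}
        };
        double[] result = step(1.0, 0.0, 2.0, params);
        System.out.println("h_t = " + result[0] + ", C_t = " + result[1]);
    }
}
```

With the forget gate near 1 and the input gate near 0, the cell state passes through nearly untouched, which is the mechanism that lets an LSTM remember information over long periods.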
Figure 21: Output Layer
Applications of LSTM include:
Robot control
Time series prediction
Speech recognition
Rhythm learning
Music composition
Grammar learning
Handwriting recognition
Human action recognition
Sign Language Translation
Protein Homology Detection
Predicting subcellular localization of proteins
Time series anomaly detection
Several prediction tasks in the area of business process management
Prediction in medical care pathways
Semantic parsing
Object Co-segmentation
DL4J Library
The DL4J library is used extensively in this forecasting project. DL4J stands for Deep Learning
for Java. Deeplearning4j provides a domain-specific language to configure deep neural networks, which
are made of multiple layers. Everything starts with a MultiLayerConfiguration, which organizes
those layers and their hyperparameters.
Hyperparameters are variables that determine how a neural network learns. They include how
many times to update the weights of the model, how to initialize those weights, which activation
function to attach to the nodes, which optimization algorithm to use, and how fast the model
should learn.
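import org.deeplearning4j.nn.api.OptimizationAlgorithm;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Sgd;
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .weightInit(WeightInit.XAVIER)           // how the weights are initialized
    .activation(Activation.RELU)             // default activation function for the layers
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
    .updater(new Sgd(0.05))                  // plain SGD with a learning rate of 0.05
    .list()                                  // the layers are added after this call
    .backprop(true)
    .build();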
Before the algorithm can start learning, data is prepared, even if a trained model is available.
Preparing data means loading it and putting it in the right shape and value range (e.g.
normalization, zero-mean and unit variance). Building these processes from scratch is error
prone, so DataVec should be used wherever possible.
Deeplearning4j works with a lot of different data types, such as images, CSV, ARFF, plain text
and, through Apache Camel integration, many other data sources.
Once the DataSetIterator is ready, which is just a pattern that describes sequential access to data,
it can be used to retrieve the data in a format suited for training a neural net model.
Normalizing Data
Neural networks work best when the data they’re fed is normalized, constrained to a range
between -1 and 1. There are several reasons for that. One is that nets are trained using gradient
descent, and their activation functions usually have an active range somewhere between -1 and
1. Even when using an activation function that doesn’t saturate quickly, it is still good practice to
constrain values to this range to improve performance.
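One simple way to constrain values to the range between -1 and 1 is min-max scaling, sketched below in plain Java. The class name `MinMaxNormalizer` is hypothetical, and this is only one possible normalization; it is not necessarily the method used in the project.

```java
final class MinMaxNormalizer {
    // Linearly rescales the values so that the smallest maps to -1 and the largest to 1.
    static double[] normalize(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        double[] scaled = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // A constant column carries no information, so it is mapped to 0.
            scaled[i] = (range == 0.0) ? 0.0 : 2.0 * (values[i] - min) / range - 1.0;
        }
        return scaled;
    }

    public static void main(String[] args) {
        double[] column = {10.0, 20.0, 30.0}; // hypothetical feature column
        for (double v : normalize(column)) {
            System.out.println(v);
        }
    }
}
```

In practice the minimum and maximum are usually computed on the training data only and then reused for the test data, so that both are scaled in the same way.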
A DataSet holds this data in several instances of INDArray: one for the features of your
examples, one for the labels and two additional ones for masking, if you are using time series
data.
An INDArray is one of the n-dimensional arrays, or tensors, used in ND4J. In the case of the
features, it is a matrix of the size Number of Examples x Number of Features. Even with only a
single example, it will have this shape.
Another important concept for deep learning is mini-batching. In order to produce accurate
results, a lot of real-world training data is often needed. Often that is more data than can fit in
available memory, so storing it in a single DataSet sometimes isn’t possible. But even if there is
enough data storage, there is another important reason not to use all of the data at once.
Since the model is trained using gradient descent, it requires a good gradient to learn how to
minimize error. Using only one example at a time will create a gradient that only takes errors
produced with the current example into consideration. This would make the learning behavior
erratic, slow down the learning, and may not even lead to a usable result.
A mini-batch should be large enough to provide a representative sample of the real world (or at
least your data). That means that it should always contain all of the classes that are to be
predicted and that the count of those classes should be distributed in approximately the same way
as they are in the overall data.
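Splitting a dataset into mini-batches can be sketched as below. The class name `MiniBatches` is hypothetical, and the sketch only cuts consecutive index ranges; it does not shuffle or balance classes, which a real pipeline would need in order to make each batch representative.

```java
import java.util.ArrayList;
import java.util.List;

final class MiniBatches {
    // Returns the example indices of each mini-batch. The last batch may be smaller.
    static List<int[]> split(int numExamples, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<int[]> batches = new ArrayList<>();
        for (int start = 0; start < numExamples; start += batchSize) {
            int end = Math.min(start + batchSize, numExamples);
            int[] batch = new int[end - start];
            for (int i = 0; i < batch.length; i++) {
                batch[i] = start + i;
            }
            batches.add(batch);
        }
        return batches;
    }

    public static void main(String[] args) {
        // Ten examples in batches of four give batches of size 4, 4 and 2.
        for (int[] batch : split(10, 4)) {
            System.out.println("batch of " + batch.length + " starting at " + batch[0]);
        }
    }
}
```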
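import org.deeplearning4j.nn.api.OptimizationAlgorithm;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction;
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
    .list(
        new DenseLayer.Builder().nIn(numInputs).nOut(numHiddenNodes)
            .activation(Activation.RELU).build(),       // hidden layer
        new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD)
            .activation(Activation.SOFTMAX)             // outputs are class probabilities
            .nIn(numHiddenNodes).nOut(numOutputs).build())
    .backprop(true)
    .build();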
Unlike other frameworks, DL4J splits the optimization algorithm from the updater algorithm.
This allows for flexibility as we seek a combination of optimizer and updater that works best for
the data and problem.
Besides the DenseLayer and OutputLayer there are several other layer types,
like GravesLSTM, ConvolutionLayer, RBM, EmbeddingLayer, etc. Using those layers we can
define simple neural networks, recurrent and convolutional networks.
Training a Model
After configuring the neural network, we have to train the model. The simplest case is to call
the .fit() method on the model built from the configuration with your DataSetIterator as an argument. This will
train the model on all the data once. A single pass over the entire dataset is called an epoch.
DL4J has several different methods for passing through the data more than just once.
The simplest way is to reset the DataSetIterator and loop over the fit call as many times as you
want. This way we can train our model for as many epochs as we think is a good fit.
The Evaluation class is used for evaluation. Slightly different methods apply to evaluating
normal feed-forward networks and recurrent networks.
DL4J provides a listener facility to help us monitor our network’s performance visually. We can
set up listeners for the model that will be called after each mini-batch is processed. One of the most
often used listeners that DL4J ships out of the box is ScoreIterationListener.
While ScoreIterationListener will simply print the current error score for our
network, HistogramIterationListener will start up a web UI that provides us with a host of
different information that we can use to fine-tune our network configuration.
Forecasting the Collection Data:-
Collection data from the company’s database is passed to the neural network to predict the values
for a day ahead.
Prediction is done for multiple targets, using many attributes such as Total amount overdue, Total
amount outstanding, Count of people and information about buckets.
The project is divided into four parts:-
1. Building a model with appropriate hidden layers and activation functions.
2. Creating a DataSetIterator file in java to process the input data.
3. Creating a plot utility function to plot the graph between actual and predicted
value.
4. Creating the Prediction file which will run with above files.
In all these scenarios, primary data would not be very dependable. Therefore, primary data
collection should be done with utmost caution and prudence. Primary data helps the researchers
in understanding the real situation of a problem. It presents the current scenario in front of the
researchers; therefore, it is more effective in taking the business decisions.
Secondary Data:
Refers to the data that is collected in the past, but can be utilized in the present scenario/research
work. The collection of secondary data requires less time in comparison to primary data.
Sample Collection Data
Here,
Bucket 1% is the percentage of total Loan cases which fall in DPD 1-30.
DPD is Days Past Due;
Bucket 2% is the percentage of total Loan cases which fall in DPD 31-60.
Self_Curing% is the percentage of total Loan cases which are sorted out before due date.
Amount_Arrer= (Total_Loan_Cases)*1000.
Recovery% = Percentage of Arrer Amount recovered.
Recovery Amount=Amount_Arrer*(Recovery%*0.01)
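The two derived columns defined above are simple arithmetic and can be written as follows. The class name `CollectionMetrics`, the method names and the sample figures are hypothetical; only the two formulas come from the definitions above.

```java
final class CollectionMetrics {
    // Amount_Arrer = (Total_Loan_Cases) * 1000
    static double amountArrear(int totalLoanCases) {
        return totalLoanCases * 1000.0;
    }

    // Recovery Amount = Amount_Arrer * (Recovery% * 0.01)
    static double recoveryAmount(double amountArrear, double recoveryPercent) {
        return amountArrear * (recoveryPercent * 0.01);
    }

    public static void main(String[] args) {
        // Hypothetical row: 50 loan cases with 20% of the arrear amount recovered.
        double arrear = amountArrear(50);
        System.out.println("Amount_Arrer = " + arrear);
        System.out.println("Recovery Amount = " + recoveryAmount(arrear, 20.0));
    }
}
```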
This sequential data from the collection department is fed to the neural model. There are 2000 such rows,
out of which 90% are used for training and the rest for testing.
So if the number of working days in a month is 22, then for testing we pass examples 1801 to 1822
to the model and the model predicts the 1823rd.
Similarly, when we pass examples 1802 to 1823, it predicts the 1824th. In this way we plot the
graph of predictions against actual data and study the deviation from the actual values.
This serves as a capacity planning tool for the collection department, which can then use it to
automatically plan capacity for the next month.
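The train/test split and the sliding window described above can be sketched as index arithmetic. The class and method names are hypothetical, row numbers are 1-based to match the text, and the actual project may organise its DataSetIterator differently.

```java
final class SlidingWindow {
    // Number of rows used for training when the first part of the data is kept for training.
    static int trainCount(int totalRows, double trainFraction) {
        return (int) Math.round(totalRows * trainFraction);
    }

    // 1-based row numbers of the window of `windowSize` rows that ends just before `targetRow`.
    static int[] windowFor(int targetRow, int windowSize) {
        int[] rows = new int[windowSize];
        for (int i = 0; i < windowSize; i++) {
            rows[i] = targetRow - windowSize + i;
        }
        return rows;
    }

    public static void main(String[] args) {
        int totalRows = 2000;
        int windowSize = 22; // working days in a month
        int firstTestRow = trainCount(totalRows, 0.9) + 1;
        System.out.println("training rows: 1 to " + (firstTestRow - 1));
        // Slide the window one row at a time over the first few test targets.
        for (int target = firstTestRow + windowSize; target < firstTestRow + windowSize + 3; target++) {
            int[] window = windowFor(target, windowSize);
            System.out.println("rows " + window[0] + " to " + window[windowSize - 1]
                    + " -> predict row " + target);
        }
    }
}
```

Each prediction is then compared with the actual value of the target row, which is what the plot of predictions against actual data shows.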
Figure 22:- Model Output
The model outputs the above graph for Bucket 1%. The red line shows the predictions and the blue line
the actual values.
The important thing to note is that the pattern of predictions is almost the same as that of the actual values,
despite some deviations.
By studying the above results, the bank can optimize its collection capacity.
Conclusion:-
For building a forecasting model using a neural network in Java, the DL4J library is used extensively.
Prior knowledge of deep learning concepts and neural networks is required.
The model is optimized by choosing appropriate activation functions and the number of
hidden layers.
If the available data is a raw dataset, it is first processed using a DataSetIterator so that it can be
fed to the network. Prediction is done with multiple attributes together. The LSTM model is trained
with thousands of examples before it can forecast data.
In the practical world, neural networks are seen as a vast subject. Many data scientists focus
solely on neural network techniques. Neural networks work particularly well on certain
classes of problems, such as image recognition. Neural network algorithms are very calculation
intensive and require highly efficient computing machines. Large datasets take a significant
amount of runtime on R. Currently, there is a lot of exciting research going on around neural
networks.