
Computer Science Coursework - Deep Learning Library

Aamir Soni

May 2017 - March 2018


Contents

1 Analysis
1.1 Introduction
1.2 Application of AI/ML libraries
1.3 Machine Learning Techniques
1.4 Neural Networks
1.5 Methods of Research
1.5.1 Coursera
1.5.2 Khan Academy
1.5.3 Medium
1.6 Current Systems
1.6.1 Tensorflow
1.6.2 Pytorch and Theano
1.7 Target Audience and Clients
1.8 Aims & Objectives
1.8.1 My Core Aims
1.8.2 Desired Objectives

2 Design
2.1 Overview
2.1.1 Algorithms
2.1.1.1 Neural Networks
2.1.1.1.1 Forward Propagation
2.1.1.1.2 Back-propagation
2.1.1.1.3 Loss-Functions
2.1.1.1.3.1 Squared-error
2.1.1.1.3.2 Soft-max cost function
2.1.1.2 Optimisers
2.1.1.2.1 Momentum Optimiser
2.1.1.2.2 RMS Optimiser
2.1.1.2.3 Adaptive Moment Estimation (Adam) Optimiser
2.1.1.3 Matrix Operations
2.1.1.3.1 Matrix Multiplication
2.1.1.3.2 Convolution
2.1.1.3.3 Matrix Transpose
2.1.1.3.4 Basic Matrix Operations
2.2 Data-Structures & Diagrams
2.3 User-Interface

3 Technical Solution
3.1 Base Classes Used
3.2 Techniques Used
3.2.1 Linear Algebra Techniques Used
3.2.2 Techniques - Neural Networks
3.2.3 Optimisation Techniques Used

4 Testing
4.1 Non-erroneous Tests
4.1.1 Matrix Class
4.1.2 Volume Class
4.1.3 Mapping Class
4.1.4 Net Class
4.1.5 Neural Networks Test
4.1.6 Screen shots
4.2 Erroneous Testing
4.2.1 Matrix Class
4.2.2 Volume Class

5 Evaluation
5.1 Meeting the Objectives
5.1.1 Core
5.1.2 Extension
5.2 Re-Interviewing Clients
5.3 Revisiting the problem

6 Code Listings
6.1 NeuralDot.Tensor
6.2 NeuralDot.Matrix
6.3 NeuralDot.Volume
6.4 NeuralDot.Layer
6.5 NeuralDot.Dense
6.6 NeuralDot.Conv
6.7 NeuralDot.Reshape
6.8 NeuralDot.MaxPool
6.9 NeuralDot.Net
6.10 NeuralDot.Mapping
6.11 NeuralDot.netData
6.12 NeuralDot.Optimiser
6.13 NeuralDot.GradientDescentOptimiser
6.14 NeuralDot.AdvancedOptimisers
6.15 NeuralDot.Momentum
6.16 NeuralDot.RMS
6.17 NeuralDot.ADAM
Analysis

1.1 Introduction
Machine Learning has recently become a very popular field in Computer Science thanks to an area known as "Deep Learning", which has allowed researchers to achieve record-breaking results in tasks such as computer vision, natural language processing, drug synthesis, health care, financial trading and physical simulation, all of which were difficult to do before the rise of Deep Learning. There are therefore many libraries online that let researchers, AI scientists and students with the relevant knowledge carry out these tasks; however, for students who are passionate but have very limited knowledge of AI, learning to use a machine learning library can be a daunting experience.
The hardest part, however, is the theory required to make full use of such a library: the mathematics behind the algorithms can demand graduate-level study, causing many students to give up before building their very first ML model.

1.2 Application of AI/ML libraries


ML libraries exist to make the process of creating an AI for a specific task as seamless as possible. The core algorithms running under the surface of these libraries include matrix operations such as the inverse, determinant and multiplication, back-propagation with its many variations for reducing a cost function, convolutions and many other computationally heavy tasks. By using these libraries, the user can focus on the important decisions, such as choosing the best parameters for their ML model. Once the user has built their ML model, they can then use it for their task.

Some examples are:


• Using a ML model for a biometric recognition system, based on faces, voices or some other feature
• Using a ML model for a game-playing agent; this could range from an AI opponent for noughts and crosses to an AI that plays Minecraft.
• Using a ML model to simulate biological systems over time; this may include natural selection, or seeing how animals compete against each other and how changes in initial parameters such as size, colour or any other attribute affect the system over time
• Using a ML model to make predictions about a chaotic system, such as the weather, stock markets or even immigration patterns.
• Using a ML model to develop a self-driving vehicle, which can include cars, aeroplanes and ships.
The most impressive part is that developing an AI model for a self-driving vehicle wouldn't be much different from developing an AI model for a simulation. This is because the basic framework behind the model is the same; the only differences are the chosen parameters, such as the model architecture, its size and the training data, but the principles remain the same. This shows how easy the process becomes when using a machine learning library.
From this we can see that the applications of AI are endless, so it is no surprise that many beginners do not want to miss out on any opportunity amid the ongoing hype around AI.

1.3 Machine Learning Techniques
All machine learning models work in much the same way, differing mainly in their inference process. Every machine learning model has two phases: the training phase and the inference phase.
Before progressing to the training phase, a machine learning engineer needs a training set. A training set is made up of two parts: the examples the model makes predictions from, and the labels for those examples. As an example, consider a home face-recognition system. If you wanted a ML model that can detect faces inside the house, the training data would be face images of the people who live in the house, and the labels would be the name attached to each face in the training set.

The training phase is as follows:


1. An inference is made using the training data

2. The outputs of the inference are compared with the labels of the training set to find the error for that training iteration, using a cost function.
3. The parameters of the ML model are updated using a back-propagation algorithm
This training phase is then repeated many times until the error is less than the acceptable bound that was set.
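To make these three steps concrete, here is a minimal, self-contained VB.NET sketch (VB being the language this library targets) that fits a single linear neuron y = wx + b to toy data by gradient descent. All names here are illustrative and not part of any library.

Module TrainingPhaseSketch
    Sub Main()
        ' Toy training set: inputs and labels generated by y = 2x + 1
        Dim xs() As Double = {0, 1, 2, 3}
        Dim ys() As Double = {1, 3, 5, 7}
        Dim w As Double = 0, b As Double = 0, lr As Double = 0.05

        For epoch As Integer = 1 To 500
            Dim dw As Double = 0, db As Double = 0, cost As Double = 0
            For i As Integer = 0 To xs.Length - 1
                Dim yHat As Double = w * xs(i) + b     ' 1. inference
                Dim err As Double = yHat - ys(i)       ' 2. compare with label
                cost += err * err / (2 * xs.Length)
                dw += err * xs(i) / xs.Length          ' gradient of cost w.r.t. w
                db += err / xs.Length                  ' gradient of cost w.r.t. b
            Next
            w -= lr * dw                               ' 3. update parameters
            b -= lr * db
            If cost < 0.0001 Then Exit For             ' acceptable error bound reached
        Next
        Console.WriteLine("w = " & w & ", b = " & b)
    End Sub
End Module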

1.4 Neural Networks


Neural Networks are, by definition, general-purpose function approximators. They are currently one of the best machine learning approaches to take, mainly because the hardware now available has made numerical computation very cheap. Being such a breakthrough, NNs have become the hottest field in AI. As a result, many beginners and enthusiasts are learning how to develop their own NN for their application; however, many of the libraries available require a strong understanding of NNs and the mathematics involved.

1.5 Methods of Research


I will be researching in several ways, ranging from online courses to articles. I will be comparing my product against other existing products in the market to see how it fares. I will also be experimenting with other machine learning libraries to gain a solid understanding of the current state of machine learning libraries and to compare their strengths and weaknesses.

1.5.1 Coursera
I will be using Coursera as it will allow me to gain a deeper understanding of machine learning. Currently, I am taking a course in deep learning which will give me the foundations and skills required to understand neural networks, and will also expose me to the different variations of NNs such as convolutional networks, recurrent neural networks and LSTM networks. By being exposed to these new algorithms, I will be able to improve the functionality of my library, making it more flexible to use, as there are constant advancements being made in the field of AI. A more flexible library means the user won't feel restricted when experimenting with new ideas, whether that is a different cost function, activation function or even a different layer.

1.5.2 Khan Academy


I will be using Khan Academy to learn linear algebra and multi-variable calculus for my neural network course.
This is because there is a lot of linear algebra involved in machine learning such as tensor products and matrix
calculus. Therefore by taking a course in linear algebra, I will be able to deepen my knowledge and develop an
insight into how the algorithms should be implemented.

1.5.3 Medium
I will also be using Medium as a method of research, as there have been many articles on Medium that explain the different machine learning algorithms. Medium will expose me to the different machine learning algorithms and the approaches people have taken. This will help me learn from other people's techniques and the approaches they took when encountering a problem.

1.6 Current Systems


Due to the current AI explosion, AI libraries have become very common, the leading one being Tensorflow, made by the Google Brain team for their machine learning research. However, Tensorflow isn't the only library out there; others include Theano, PyTorch, OpenNN and many more. Many of these libraries are made for different purposes and weren't designed with beginners in mind, and certainly not for A level students with limited knowledge of AI.

1.6.1 Tensorflow
Currently, Tensorflow is the leading AI library and is made especially for researchers and for industrial use. Tensorflow, while one of the best libraries for machine learning in general, requires extensive knowledge of everything from linear algebra to optimisation. This makes it very limiting for many students, as many of these topics are not studied at A level, and all they want is an easy-to-use machine learning library that allows them to experiment with their ML models.
In addition, Tensorflow also lacks user-friendliness, as making even a simple Neural Network requires a lot of work. This is because Tensorflow's main purpose isn't NNs but allowing the user to create a computational graph that they can experiment with. This means that when creating a simple neural network, the user needs to set up the dimensions of the matrices and the biases being used, when in fact all that should be required is a single number representing the number of neurons in each layer. After this the user also needs to mathematically define the cost function being used and create placeholders for the training set and the test set. All of this makes it very difficult for a user who isn't a confident programmer and just wants to experiment with simple NNs.

1.6.2 Pytorch and Theano


PyTorch and Theano are also among the leading AI libraries in use. However, PyTorch and Theano both face the same problem of being overly complicated for beginners, as they are aimed at intermediate AI engineers and research teams. Furthermore, PyTorch and Theano are restricted to Python, which puts up another barrier for VB users, who would also need to learn a new language, further raising the barrier for many beginners.

1.7 Target Audience and Clients


My target audience is A level students who are passionate about AI but have very limited knowledge of it. I have therefore interviewed several students and asked for their opinions on my initial design of the library.

One student, Nitish Bala, I interviewed said ”The library should be easy to use by making sure many of the
parameters such as the initial weights should already be defined and creating a NN shouldn’t require someone to
know all the maths behind it. The library should also make it easy for users to add convolutional layers and define
their own layers which can also be trained by a pre-defined gradient descent optimiser.”

5
Another student, Basim Khajwal, said ”The library should allow the user to create their own back propagation
algorithms and also allow the user to experiment with different models. The library should be easy to use by limiting
the amount of setting up required by the user such as matrix sizes and defining cost functions from definition.
Finally, users should be allowed to create many NN at the same time to compare the performance of one model to
another.”

A third student, Taha Rind, said ”The library shouldn’t be too complicated and that experimenting with different
parameters should be easy to do. Adding a dense layer should at minimum require the user to input the number of
neurons in a layer and layer activation and there should be some advanced optimisation algorithms to train these
dense networks such as momentum. Finally, the library should also allow the user to see the parameters learnt and
the gradients of the dense layers in any layer.”

A fourth student, Mujahid Mamaniat, I interviewed said "The library must offer easy manipulation of matrices by making sure many of the functions required are already implemented such as rotation, reshaping and adding/removing a column in a matrix. The library must also further allow the user to manipulate images in RGB format, which may be done through the use of lists of matrices. Finally, making a neural network should require little effort and should not be difficult to make."

Finally, a fifth student, Jamie Stirling, I interviewed said "The library should have extra focus on dense-nets as
many beginners do not understand how conv-layers work. I remember when I tried experimenting with my first ML
library. It was difficult to use as it supported many different types of layers which over complicated it, as I didn’t
understand most of what it offered, as all I knew about was dense nets. Therefore, making a dense net should be
incredibly easy as that is all many beginners in AI know about. One way in which it could be made easy would be by
setting many of the parameters optional such as the initial weights. By having many optional parameters, I think
the user would worry less about whether they have implemented the net correctly.”

From these interviews it is clear that my target audience are looking for an easy-to-use machine learning library that allows them to experiment easily with different configurations of Neural Networks. I will be focusing more on dense-nets, as it is clear that many beginners do not have a wide knowledge of NNs. I will therefore try to make it as easy as possible to build a dense net, and also add some extra functionality to dense nets, making them more practical for the user to work with. This extra functionality will include allowing the user to view the gradients of the dense layers, enabling users to watch the gradients change as the network learns. This will help the user develop an insight into NNs, easing the way for beginners in AI. Furthermore, I will also be including conv-layers, as many students will quickly learn the basics of NNs and will want to move on to complex data such as images or sound; including conv-layers will allow users to experiment with all kinds of data. However, my main focus will be on dense nets, as many beginners would not have the necessary skills to use conv nets and would just want to experiment with dense layers due to their limited knowledge. Finally, I will also be interviewing users during the making of the project, repeatedly asking which parts of the library can still be improved.

1.8 Aims & Objectives


It is clear that, although there are many machine learning libraries, many of them are out of reach for A level students due to their very limited knowledge of AI. Therefore, my project will be focused on making a machine learning library that is dedicated to deep learning and made specifically for VB users. By having a specific target market, I will be able to clearly set my aims and objectives to fulfil the users' needs.

1.8.1 My Core Aims


The core aims that I would like my project to meet are:

1. The user can manipulate matrices by joining two matrices together, splitting a given matrix, iterating through the columns of a matrix or its values, and applying a one-hot encoding function to a given matrix
2. The user can multiply, add, subtract and transpose a given matrix.

3. The user can multiply, add, subtract and transpose a list of matrices together, i.e. volumes.
4. The user can create their own Dense Neural Networks.
5. The user has a choice of which activation function to use, and can also experiment with their own activation functions.

6. The user can tune the hyperparameters, such as the learning rate, number of neurons in a layer, number of layers and loss function.
7. The user can train the network using a back-propagation algorithm.
8. The gradient descent algorithms should implement stochastic gradient descent, batch gradient descent as well as mini-batch gradient descent.
9. The user can view the weights of the network, i.e. the learned parameters of the network
10. The user can view the gradients for a specific layer in a dense-net, given the gradients for the layer above.
11. The user can add convolutional layers to their network, which they can tune by changing the hyperparameters such as the kernel dimensions, layer activation and kernel strides.

1.8.2 Desired Objectives


1. The user can choose from a wide range of optimisation algorithms such as momentum, RMS and Adam to
train their dense networks
2. The user can define their own back propagation algorithms to train their neural networks
3. The user can define their own layers

Design

2.1 Overview
For the aims to be met, it is important that we have a strong idea of what needs to be done in order to meet the objectives set. In this section, we will discuss the core algorithms, core data-structures and the user-interface of the library. The library will from now on be referred to as the NeuralDot library.

2.1.1 Algorithms
The main algorithms that will be used in the making of the NeuralDot library are as follows:
1. Neural Networks

2. Optimisation
3. Matrix Operations
I will be discussing each of these algorithms in detail separately, along with their respective pseudo-code.

2.1.1.1 Neural Networks


There are many variations of neural networks, each with its own purpose; however, the main types I will be focusing on are feed-forward neural networks and convolutional neural networks. Feed-forward NNs are the most basic type of NN and also the backbone of many of the other variations.

Throughout this paper, we will be using the standard notation to avoid any unnecessary confusion:

n_x : input size
n_y : output size
L : number of layers in the neural network
n_h^[l] : number of hidden units in layer l
m : number of examples in the data set
X ∈ R^(n_x × m) : input matrix, can also be referred to as a^[0]
x^(i) ∈ R^(n_x) : i-th example, represented as a column vector
Y ∈ R^(n_y × m) : label matrix for X
y^(i) ∈ R^(n_y) : output label for the i-th example
W^[l] ∈ R^(n_h^[l] × n_h^[l−1]) : weight matrix in layer l
b^[l] ∈ R^(n_h^[l]) : bias vector in layer l
z^[l] ∈ R^(n_h^[l]) : product vector in layer l
g^[l] : activation function in layer l
a^[l] ∈ R^(n_h^[l]) : activation vector in layer l
ŷ ∈ R^(n_y) : predicted output vector, can also be denoted a^[L]

Here is an example of a simple Neural Network, with the notation included

Figure 2.1: Neural Network Example

This network is called a two-layer network, as it has one hidden layer and one output layer.

2.1.1.1.1 Forward Propagation


Forward Propagation, also referred to as inference, is used to make a prediction given a set of inputs. The inference
process for a specific layer in a neural network works as follows:
1. The inputs are multiplied by the corresponding weights for that layer and then a bias is added, which results in the corresponding z^[l] for that layer.
2. Once z^[l] has been calculated for that layer, the activations are calculated using the corresponding g^[l] for that layer, which results in a^[l].

This process is then repeated throughout every layer until the final layer where the error is calculated. Using the
notation we have established, we can now write this mathematically as:

a^[l] = g^[l](z^[l]), where z^[l] = W^[l] a^[l−1] + b^[l]
In pseudo-code this may be written as:

Algorithm 1 Dense Forward Propagation Algorithm

net ← network being used to make the prediction
x ← mini-batch

for each layer in net do
    x = Matrix.matmul(layer.w, x) + layer.b
    x = layer.act(x)
return x
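As a concrete illustration, here is a minimal VB.NET sketch of one dense layer's forward pass, using plain arrays rather than the library's Matrix class and assuming a sigmoid activation; DenseForward is an illustrative name, not a library routine.

' One dense layer's forward pass: z = W*a + b, a = g(z)
Function DenseForward(W As Double(,), aPrev() As Double, b() As Double) As Double()
    Dim nOut As Integer = W.GetLength(0)
    Dim nIn As Integer = W.GetLength(1)
    Dim aNext(nOut - 1) As Double
    For i As Integer = 0 To nOut - 1
        Dim z As Double = b(i)
        For j As Integer = 0 To nIn - 1
            z += W(i, j) * aPrev(j)               ' weight times previous activation
        Next
        aNext(i) = 1.0 / (1.0 + Math.Exp(-z))     ' sigmoid activation g(z)
    Next
    Return aNext
End Function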

Forward propagation for the convolutional layers is the same as for the dense layers; the only difference is that instead of matrix multiplication, the operator is now cross-correlation between 3-D volumes. In pseudo-code this can be written as:

Algorithm 2 Convolution Forward Propagation Algorithm

net ← network being used to make the prediction
x ← mini-batch
stridesx ← user defined
stridesy ← user defined
padding ← user defined

for each layer in net do
    x = Volume.conv2d(layer.filter, x, stridesx, stridesy, padding) + layer.b (applying a 2-D convolution, using x as the input and filter as the kernel, with strides = (stridesx, stridesy))
    x = layer.act(x)
return x

2.1.1.1.2 Back-propagation
Back-propagation is another algorithm with many variations, which will be discussed further in section 2.1.1.2. Although there are many techniques for back-propagation, they all have the same goal: reduce the error of the network. The essence of back-propagation is therefore to find the weight matrices that approximate a given function to an appropriate degree of accuracy on a given interval. Back-propagation is a recursive algorithm that uses memoization to work backwards through the network, finding the derivative of each weight w.r.t. (with respect to) a cost. The cost being used is set before training and is different for each task. The notation that will be used in explaining back-propagation is:

E : denotes the error/cost function being used
α : denotes the learning rate used for back-prop
δ^l : denotes the derivative of the cost function with respect to the bias vector b^l, i.e. δ^l = ∂E/∂b^l
· : denotes the Hadamard product

The equations that will be used for back-propagation in a Dense layer are:

δ^l = ((W^[l+1])^T δ^[l+1]) · g′^[l](z^[l])
∂E/∂W^l = δ^l (a^[l−1])^T
∂E/∂b^l = δ^l

In summation form this may be written as:

δ^l_k = Σ_m δ^{l+1}_m w^{l+1}_{m,k}

The equations that will be used for back-propagation in a Conv layer are:

δ^l_{x,y} = δ^{l+1}_{x,y} ∗ rot180°(w^{l+1}_{x,y}) · f′(a^l_{x,y})
∂E/∂W^l = δ^l_{x,y} ∗ f(rot180°(o^{l−1}_{x,y}))
∂E/∂b^l = δ^l

In these equations, w denotes a Volume, i.e. a list of matrices with equal dimensions.
In summation form, the gradient of the error w.r.t. w^l in a conv net is:

δ^l = rot180°{ Σ_{m=0}^{k₁−1} Σ_{n=0}^{k₂−1} δ^{l+1}_{i+m,j+n} w^{l+1}_{m,n} } f′(x_{i,j})

Finally, the corresponding updates for a layer in the net are:

W := W − α ∂E/∂W
b := b − α ∂E/∂b

These updates will be applied to every layer in the network on each iteration.
These equations can be derived by extending the chain rule to matrix multiplication, i.e. by using the lemma ∂(x^T a)/∂x = a.
In pseudo-code this may be written as:

Algorithm 3 Back-propagation Algorithm - Dense Layer

net ← network being trained
x ← mini-batch
y ← labels for the mini-batch

ŷ = forwardProp(x)
error = ŷ − y
db = [error · net.gradact(net.last)] (db is a list containing the gradient w.r.t. the biases of each layer)
dw = [net.layer(−2) × db.last()] (dw is a list containing the gradient w.r.t. the weight matrices of each layer)

for each layer in net step −1 do
    db.add(Matrix.matmul(layer.w.T, db.last) · net.gradact(layer)) (adding the gradient of the loss w.r.t. b^[layer])
    dw.add(layer.act × db.last) (adding the gradient of the loss w.r.t. W^[layer])
return dw, db for the network
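The dense-layer equations above can be sketched directly in VB.NET. The following illustrative routine computes δ, dW and db for one layer and a single training example, assuming a sigmoid activation so that g′(z) = a(1 − a); all names are hypothetical, not the library's API.

' delta^l = ((W^[l+1])^T delta^[l+1]) Hadamard g'(z^[l])
Function BackpropLayer(Wnext As Double(,), deltaNext() As Double,
                       a() As Double, aPrev() As Double,
                       ByRef dW As Double(,), ByRef db() As Double) As Double()
    Dim n As Integer = a.Length
    Dim delta(n - 1) As Double
    For k As Integer = 0 To n - 1
        Dim s As Double = 0
        For m As Integer = 0 To deltaNext.Length - 1
            s += Wnext(m, k) * deltaNext(m)       ' (W^[l+1])^T delta^[l+1]
        Next
        delta(k) = s * a(k) * (1 - a(k))          ' Hadamard with g'(z^[l]) for a sigmoid
    Next
    ReDim dW(n - 1, aPrev.Length - 1)
    ReDim db(n - 1)
    For k As Integer = 0 To n - 1
        db(k) = delta(k)                          ' dE/db = delta
        For j As Integer = 0 To aPrev.Length - 1
            dW(k, j) = delta(k) * aPrev(j)        ' dE/dW = delta * (a^[l-1])^T
        Next
    Next
    Return delta                                  ' passed on to the layer below
End Function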

These back-propagation algorithms require a sample of the training data. If a single example is used for each back-propagation step, this is known as stochastic gradient descent, whereas if every example is used in the training sample, this is known as batch gradient descent. Batch gradient descent is more prone to local minima but can take less time to train the network, whereas stochastic gradient descent can take more time but is less prone to local minima. It is therefore important that the user selects something in between by splitting the training data into batches; this is known as mini-batch gradient descent. Stochastic gradient descent and batch gradient descent are then just special cases of mini-batch gradient descent with mb = 1 and mb = m respectively, where mb is the mini-batch size.
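A small sketch of how m examples might be split into mini-batches of size mb; SplitBatches is an illustrative helper (assuming Imports System.Collections.Generic), not a library routine. Calling it with mb = 1 yields stochastic gradient descent and mb = m yields batch gradient descent.

Function SplitBatches(m As Integer, mb As Integer) As List(Of Integer())
    Dim batches As New List(Of Integer())
    Dim i As Integer = 0
    While i < m
        Dim size As Integer = Math.Min(mb, m - i)   ' last batch may be smaller
        Dim batch(size - 1) As Integer
        For j As Integer = 0 To size - 1
            batch(j) = i + j                        ' indices of the examples in this batch
        Next
        batches.Add(batch)
        i += size
    End While
    Return batches
End Function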

2.1.1.1.3 Loss-Functions
The back-propagation algorithms in section 2.1.1.1.2 relied on a loss function. The loss function, also referred to as the cost function, measures how badly the network is performing. The goal of the network is therefore to minimise the loss function, which is done through the back-prop algorithm. This allows the network to learn from the data: a decreasing loss implies that the net is learning, while a converging loss implies that learning is slowing down, possibly because the net has learnt the data or has reached a global/local minimum. There is a wide array of cost functions available, but the two most commonly used are soft-max and squared error.
Minimising a function is a classic problem in calculus and relies upon the derivative of the function. Furthermore, the back-propagation algorithm in section 2.1.1.1.2 relies upon the derivative of the loss function. It is therefore necessary to be able to compute the derivative of the cost function, as it is used to back-propagate through the network.

2.1.1.1.3.1 Squared-error
The squared-error loss function measures the average of the sum of the squared error for each data-point.
In mathematical form, the squared error and its derivative may be written as:

E = (1/2m) Σ_{i=1}^{m} (ŷ^(i) − y^(i))²
dE/dŷ = ŷ − y

These results can be proved using the chain rule, i.e. the result dy/dt = (dy/dx)(dx/dt).
The squared error cost function is commonly used for regression tasks and image recognition, and is also used to train many other types of ML model besides neural networks.
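A minimal VB.NET sketch of the squared-error cost and its derivative, operating on plain vectors of predictions and labels; the function names are illustrative.

Function SquaredError(yHat() As Double, y() As Double) As Double
    Dim m As Integer = y.Length
    Dim total As Double = 0
    For i As Integer = 0 To m - 1
        total += (yHat(i) - y(i)) ^ 2        ' sum of squared errors
    Next
    Return total / (2 * m)                   ' E = (1/2m) * sum
End Function

Function SquaredErrorDerivative(yHat() As Double, y() As Double) As Double()
    Dim d(y.Length - 1) As Double
    For i As Integer = 0 To y.Length - 1
        d(i) = yHat(i) - y(i)                ' dE/dyHat = yHat - y
    Next
    Return d
End Function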

2.1.1.1.3.2 Soft-max cost function


The soft-max function, unlike the squared error, is most commonly used for multiclass classification. Multiclass classification, unlike binary classification where the data must belong to one of two sets, enables the model to classify objects into more than two groups. This is useful for recognition systems, where the model is looking out for many different objects at the same time.
The softmax loss works very differently from the squared error. It is usually composed of a loss function and an activation function, the latter being the actual softmax activation function; the loss function used alongside it is usually the cross-entropy loss function.
The softmax activation function works by exponentiating each element of the input vector and then dividing by the sum of the resulting vector. This produces a vector whose element at a given index represents the probability of the input belonging to that class.
Using mathematical notation, the softmax activation function and its respective derivative may then be written as:

ŷ_c = e^{z_c} / Σ_{c=1}^{n_y} e^{z_c}

where z is the product vector and z_c represents the c-th item in that vector. An important point to note is that, since this is the last layer, the length of this vector will be n_y.

dŷ_c/dz_c = ŷ_c (1 − ŷ_c)

Finally, the cross-entropy loss function works by taking the vector of probabilities (ŷ) for each class and applying the natural logarithm to the reciprocal of each element. Each element is then multiplied by its corresponding y_c, producing a vector whose elements are summed to give the output of the cross-entropy loss function.
Using mathematical notation, the cross-entropy loss function and its respective derivative may then be written as:

E = −Σ_{c=1}^{n_y} y_c log(ŷ_c)
dE/dŷ = ŷ − y

An interesting point to note is that the derivative for the squared error and the cross-entropy loss function is the same.
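A minimal VB.NET sketch of the softmax activation and cross-entropy loss described above, on plain vectors; the names are illustrative.

Function Softmax(z() As Double) As Double()
    Dim expSum As Double = 0
    Dim yHat(z.Length - 1) As Double
    For c As Integer = 0 To z.Length - 1
        yHat(c) = Math.Exp(z(c))             ' exponentiate each element
        expSum += yHat(c)
    Next
    For c As Integer = 0 To z.Length - 1
        yHat(c) /= expSum                    ' normalise into probabilities
    Next
    Return yHat
End Function

Function CrossEntropy(yHat() As Double, y() As Double) As Double
    Dim loss As Double = 0
    For c As Integer = 0 To y.Length - 1
        loss -= y(c) * Math.Log(yHat(c))     ' E = -sum y_c * log(yHat_c)
    Next
    Return loss
End Function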

2.1.1.2 Optimisers
As discussed in the previous section, by using matrix calculus we can formulate an algorithm to back-propagate through the neural network and find the respective derivatives for each weight matrix. Once these derivatives have been found, the most basic way to update the weights is as shown in section 2.1.1.1.2. However, there are many alternatives which have been shown to work much better. The alternative optimisation methods that I will also be implementing are:
• Momentum Optimisation

• RMS Optimiser
• Adaptive Moment Estimation (ADAM) Optimiser

2.1.1.2.1 Momentum Optimiser


Momentum is an improvement on the traditional method of back-propagation. The traditional method is very susceptible to the problem of local minima: ∂E/∂W = 0 and the network stops learning, even though the error function could still be reduced further.

Momentum is an alternative approach that uses an extra parameter, called the "momentum" term, to calculate a "velocity" variable on each update. The intuition is that when the gradient is high, i.e. the network is learning, the velocity also increases, and when the learning is slow the velocity decreases. This effect makes the network less susceptible to local minima. The pseudo-code for Momentum is shown in Algorithm 4.

Algorithm 4 Momentum

α ← user defined, default 0.01
β ← user defined, default 0.9
T ← user-defined number of iterations
t ← 0 (initialise time step)
v_dw ← 0 (momentum term for W)
v_db ← 0 (momentum term for b)

for t < T do
    for each layer in net do
        compute ∂E/∂w, ∂E/∂b using back-prop with a mini-batch

        dw = ∂E/∂w
        db = ∂E/∂b

        v_dw = β v_dw + (1 − β) dw (calculating the new momentum term for w)
        v_db = β v_db + (1 − β) db (calculating the new momentum term for b)

        W = W − α v_dw (updating the weights using the weight momentum term)
        b = b − α v_db (updating the bias using the bias momentum term)
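A sketch of a single momentum update for one parameter vector, following Algorithm 4; the caller is assumed to supply the gradient and to keep the velocity array v between calls. MomentumStep is an illustrative name.

Sub MomentumStep(w() As Double, grad() As Double, v() As Double,
                 alpha As Double, beta As Double)
    For i As Integer = 0 To w.Length - 1
        v(i) = beta * v(i) + (1 - beta) * grad(i)   ' running velocity term
        w(i) -= alpha * v(i)                        ' update using the velocity
    Next
End Sub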

2.1.1.2.2 RMS Optimiser


The RMS optimiser is very similar to the momentum optimiser. One key difference between momentum and RMS prop, however, is that RMS is not very susceptible to quick gradient changes. This is much better, as very quick gradient changes can lead to the network overshooting the local minimum and thus diverging. RMS achieves this by keeping a running average of the magnitude of the gradients and dividing the next gradient by this average, so that the gradient values are approximately normalised. This can lead to much better results and has been shown to outperform momentum in many cases. The pseudo-code for the RMS optimiser is shown in Algorithm 5.

Algorithm 5 RMS Optimiser

α ← user defined, default 0.01
β ← user defined, default 0.9
t ← 0 (initialise time step)
T ← user-defined number of iterations
s_dw ← 0 (RMS term for the weights)
s_db ← 0 (RMS term for the biases)

for t < T do
    for each layer in net do
        compute ∂E/∂w, ∂E/∂b using back-prop with a mini-batch

        dw = ∂E/∂w
        db = ∂E/∂b

        s_dw = β s_dw + (1 − β) dw²
        s_db = β s_db + (1 − β) db²

        W = W − α dw / (√s_dw + ε) (updating the weights using the normalised gradient values)
        b = b − α db / (√s_db + ε) (updating the biases using the normalised gradient values)

2.1.1.2.3 Adaptive Moment Estimation (Adam) Optimiser


Adaptive Moment Estimation, or Adam for short, is another approach regularly used to minimise the cost function. Currently, Adam is one of the most widely used optimisers, mainly due to its high performance on computer vision problems. Therefore, I will also be implementing the Adam optimisation algorithm. The respective pseudo-code for the Adam algorithm is shown in Algorithm 6.

Algorithm 6 Adam Optimiser

α ← user defined, default 0.01
β₁, β₂ ∈ [0, 1) (exponential decay rates for the moment estimates)
m_dw ← 0 (initialise 1st moment vector for the weight matrix)
v_dw ← 0 (initialise 2nd moment vector)
m_db ← 0 (initialise 1st moment vector for the bias matrix)
v_db ← 0 (initialise 2nd moment vector)
t ← 0 (initialise time step)
T ← user-defined number of iterations

for t < T do
    for each layer in net do
        compute ∂E/∂w, ∂E/∂b using back-prop with a mini-batch

        dw = ∂E/∂w
        db = ∂E/∂b

        m_dw = β₁ m_dw + (1 − β₁) dw
        v_dw = β₂ v_dw + (1 − β₂) dw²

        m_db = β₁ m_db + (1 − β₁) db
        v_db = β₂ v_db + (1 − β₂) db²

        m̂_dw = m_dw / (1 − β₁ᵗ)
        m̂_db = m_db / (1 − β₁ᵗ)

        v̂_dw = v_dw / (1 − β₂ᵗ)
        v̂_db = v_db / (1 − β₂ᵗ)

        W = W − α m̂_dw / (√v̂_dw + ε)
        b = b − α m̂_db / (√v̂_db + ε)
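A sketch of a single Adam update for one parameter vector, following Algorithm 6. Here epsilon is the usual small constant (e.g. 1e-8) and the time step t is assumed to start at 1, so the bias-correction denominators are non-zero; AdamStep is an illustrative name.

Sub AdamStep(w() As Double, grad() As Double, m() As Double, v() As Double,
             t As Integer, alpha As Double, beta1 As Double, beta2 As Double,
             epsilon As Double)
    For i As Integer = 0 To w.Length - 1
        m(i) = beta1 * m(i) + (1 - beta1) * grad(i)            ' 1st moment estimate
        v(i) = beta2 * v(i) + (1 - beta2) * grad(i) * grad(i)  ' 2nd moment estimate
        Dim mHat As Double = m(i) / (1 - beta1 ^ t)            ' bias correction
        Dim vHat As Double = v(i) / (1 - beta2 ^ t)
        w(i) -= alpha * mHat / (Math.Sqrt(vHat) + epsilon)     ' parameter update
    Next
End Sub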

2.1.1.3 Matrix Operations


Neural Networks would be impossible without matrices to vectorise their implementation. It is therefore crucial for my project to use matrices for the computationally expensive tasks. Some key algorithms I will be using are:
• Matrix Multiplication

• Convolution
• Matrix Transpose
• Basic Matrix Operations

2.1.1.3.1 Matrix Multiplication


Matrix multiplication will be used to propagate through a neural network. Being O(n³), it is crucial that we implement it efficiently, otherwise the neural network will run incredibly slowly as the number of neurons used grows. The pseudo-code for Matrix Multiplication is shown in Algorithm 7.

Algorithm 7 Matrix Multiplication

A ← user-defined parameter
B ← user-defined parameter

n = A.shape(0)
m = B.shape(1)
p = A.shape(1)

C ← Matrix(n, m) (C will be the resulting matrix of the matrix multiplication)
for 1 ≤ i ≤ n do (looping over the rows of A)
    for 1 ≤ j ≤ m do (looping over the columns of B)
        C_ij = 0
        for 1 ≤ k ≤ p do (iterating over the columns of A and rows of B, and summing the products)
            C_ij = C_ij + (A_ik ∗ B_kj)
return C

In mathematical terms this can be neatly written as:

c_ij = Σ_{k=1}^{p} a_ik b_kj    (2.1)
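A direct VB.NET implementation of Algorithm 7 on plain 2-D arrays, as a sketch of how the library might realise it; MatMul is an illustrative name.

Function MatMul(A As Double(,), B As Double(,)) As Double(,)
    Dim n As Integer = A.GetLength(0)     ' rows of A
    Dim p As Integer = A.GetLength(1)     ' columns of A = rows of B
    Dim m As Integer = B.GetLength(1)     ' columns of B
    If p <> B.GetLength(0) Then Throw New ArgumentException("Matrix dimensions do not conform for multiplication")
    Dim C(n - 1, m - 1) As Double
    For i As Integer = 0 To n - 1
        For j As Integer = 0 To m - 1
            Dim s As Double = 0
            For k As Integer = 0 To p - 1
                s += A(i, k) * B(k, j)    ' c_ij = sum over k of a_ik * b_kj
            Next
            C(i, j) = s
        Next
    Next
    Return C
End Function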

2.1.1.3.2 Convolution
Convolution is another operation that is widely used in deep learning. Convolution, or more correctly cross-correlation, is used in image recognition as it allows the network to learn a representation that is equivariant to translations. This speeds up learning, as a traditional deep net would require many training iterations to learn this kind of spatial relationship in a given image. The pseudo-code for convolution is shown in Algorithm 8.

Algorithm 8 Convolution

M ← user-defined matrix - input for the convolution operation
kernel ← user-defined matrix
stridesx ← user-defined integer
stridesy ← user-defined integer

h_kernel ← kernel.shape(0) (kernel.shape(0) returns the height of the kernel)
w_kernel ← kernel.shape(1) (kernel.shape(1) returns the width of the kernel)
h_m ← M.shape(0)
w_m ← M.shape(1)

c ← Matrix((h_m − h_kernel)/stridesy + 1, (w_m − w_kernel)/stridesx + 1) (c will be the output matrix for the convolution operation)

(The following code uses a sliding window to stride over the input M using the kernel)
for 0 ≤ i ≤ h_m − h_kernel step stridesy do
    for 0 ≤ j ≤ w_m − w_kernel step stridesx do
        c_ij = dotsum(kernel, M[i : i + h_kernel, j : j + w_kernel]) (dotsum is a function that multiplies the two matrices element-wise and then sums the elements of the resulting matrix)
return c

In convolutional networks, the input can also have a depth, meaning the convolution operation is applied to a volume instead of a matrix. The dotsum in this case is the same, but this time the kernel is multiplied by each layer of the input to produce a volume, whose elements are then summed. Furthermore, the kernel being used can itself be a volume. The process is again very similar, the only difference being that the convolution operation takes each layer of the kernel and convolves it with the input. This generates a volume as the output of the convolution operation.

In mathematical terms, a (cross-)correlation operation can be written as follows:

(f ⋆ g)[n] = Σ_{m=−∞}^{∞} f(m) g[m + n]    (2.2)
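A VB.NET sketch of the 2-D "valid" cross-correlation of Algorithm 8 on plain arrays; Conv2D is an illustrative name, not the library's Volume.conv2d.

Function Conv2D(M As Double(,), kernel As Double(,),
                stridesX As Integer, stridesY As Integer) As Double(,)
    Dim hK As Integer = kernel.GetLength(0), wK As Integer = kernel.GetLength(1)
    Dim hM As Integer = M.GetLength(0), wM As Integer = M.GetLength(1)
    Dim hOut As Integer = (hM - hK) \ stridesY + 1     ' output height
    Dim wOut As Integer = (wM - wK) \ stridesX + 1     ' output width
    Dim C(hOut - 1, wOut - 1) As Double
    For i As Integer = 0 To hOut - 1
        For j As Integer = 0 To wOut - 1
            Dim s As Double = 0
            ' dotsum: element-wise product of the kernel with the window, summed
            For a As Integer = 0 To hK - 1
                For b As Integer = 0 To wK - 1
                    s += kernel(a, b) * M(i * stridesY + a, j * stridesX + b)
                Next
            Next
            C(i, j) = s
        Next
    Next
    Return C
End Function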

2.1.1.3.3 Matrix Transpose


The transpose of a matrix is used to back-propagate through a neural network. It is used many times, so an efficient algorithm is required. I have therefore decided that, instead of physically transposing the matrix, I will keep track of its state using a variable called tState. This means that when indexing the (i, j)-th item of a matrix, the (i, j)-th item is returned if tState is set to False, and the (j, i)-th item otherwise. This provides an efficient method of transposing a matrix: traditional algorithms rearrange the values every time the transposed matrix is required, which is an O(n²) operation, whereas this approach solves the problem in O(1) time.
The corresponding pseudo-code will be:

Algorithm 9 Indexing Matrix

m ← user-defined matrix being used to index the item
i ← user-defined integer (used to select the i-th row)
j ← user-defined integer (used to select the j-th column)

if tState = False then
    return m(i, j)
else
    return m(j, i)

Therefore, the Pseudo-code for transposing a matrix will be:

Algorithm 10 Transposing Matrix

tState = NOT tState (inverting the tState boolean variable)
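A VB.NET sketch of this O(1) transpose trick; LazyMatrix is an illustrative class, not the library's Matrix.

Class LazyMatrix
    Private values(,) As Double
    Private tState As Boolean = False

    Public Sub New(v As Double(,))
        values = v
    End Sub

    ' Indexing respects the current transpose state (Algorithm 9)
    Public ReadOnly Property Item(i As Integer, j As Integer) As Double
        Get
            If Not tState Then Return values(i, j)
            Return values(j, i)
        End Get
    End Property

    ' Transposing is just a flag flip (Algorithm 10)
    Public Sub TransposeSelf()
        tState = Not tState
    End Sub
End Class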

2.1.1.3.4 Basic Matrix Operations


Some basic matrix operations that I will be using are as follows:
• Addition and subtraction of 2 matrices
• Element wise Multiplication
• DotSum of 2 matrices
• Applying a function f (x) element-wise to a matrix
• Applying an arbitrary function f (x1 , x2 ) between 2 matrices
• Reshaping a matrix - including reshaping a matrix to a volume
• Cloning a given matrix
• Adding columns and rows to a matrix
• Joining two matrices
• Max-pooling a matrix
• Rotating the items of a matrix

• Normalising matrices
The pseudo-code for the reshaping of a matrix is as follows:

Algorithm 11 Reshaping Matrix

M ← matrix being reshaped
rows ← number of rows in the resulting matrix
cols ← number of columns in the resulting matrix
i ← 0 (index into the resulting matrix)

if M.shape(0) × M.shape(1) ≠ rows × cols then
    throw exception("Matrix dimensions do not conform for reshaping") (an exception is thrown here, as the number of elements in both matrices must be the same for reshaping to take place)
C ← Matrix(rows, cols)
for each d in M do (iterating over each element in the matrix M)
    C(trunc(i/cols) + 1, (i mod cols) + 1) = d (setting the values of matrix C by placing each element row-wise)
    i += 1
return C

Algorithm 12 2-to-1 function applied element-wise on 2 matrices

X ← matrix the function will be applied on
Y ← matrix the function will be applied on

f(x, y) ← function that will be applied to both matrices
C ← matrix that will be returned, with cols = X.shape(1)
i = 0
if X.shape <> Y.shape then
    throw exception("Matrix dimensions do not conform") (an exception is thrown here, as both matrices must have the same dimensions for this function to be applied)
for each d_x, d_y in X, Y do (looping over each element in matrices X and Y)
    C(trunc(i/cols) + 1, (i mod cols) + 1) = f(d_x, d_y) (the output of the function is assigned to the matrix C, which is filled row-wise)
    i += 1
return C

Algorithm 13 1-to-1 function applied element-wise on a matrix

X ← matrix the function will be applied on

f(x) ← function that will be applied to the matrix
C ← matrix that will be returned, with cols = X.shape(1)
i = 0
for each d_x in X do (looping over each element in matrix X)
    C(trunc(i/cols) + 1, (i mod cols) + 1) = f(d_x) (the output of the function is assigned to the matrix C, which is filled row-wise)
    i += 1
return C
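Algorithms 12 and 13 translate naturally into VB.NET using delegates. Here is a sketch of the 1-to-1 case on plain arrays; Map is an illustrative name, not the library's op function.

Function Map(f As Func(Of Double, Double), X As Double(,)) As Double(,)
    Dim rows As Integer = X.GetLength(0), cols As Integer = X.GetLength(1)
    Dim C(rows - 1, cols - 1) As Double
    For i As Integer = 0 To rows - 1
        For j As Integer = 0 To cols - 1
            C(i, j) = f(X(i, j))            ' apply f to every element
        Next
    Next
    Return C
End Function

' Usage: apply a sigmoid to every element of a matrix
' Dim sig = Map(Function(x) 1.0 / (1.0 + Math.Exp(-x)), someMatrix)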

One crucial aspect of machine learning is data processing. This is extremely important, as the data needs to be processed properly for ML models to work well. I will therefore add as much functionality as I can to allow the user to process their data however they need. This will include functions such as removing columns/rows, rotating matrices, inverting the oneHot operation on a matrix, padding a matrix and many other functions that make the manipulation of data easy for the user. These functions will come in extremely handy when the user is dealing with images as data for conv-nets, which require volumes; image data is often in RGB format, which is essentially a volume of depth 3. If the user wants to manipulate and process their data to make training the net as easy as possible, it is important that the most commonly used functions are all predefined, as the most important aspect of my library is to make machine learning as easy as possible. This brings me one step closer to that goal: the user will not need to define these functions themselves, which could be a daunting task, especially for beginners, and could put many people off before they even get their hands on the machine learning itself. Furthermore, from the pseudo-code for the back-propagation and forward-propagation algorithms, it is clear that I will be using many of these matrix operations.
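As an example of such a pre-defined helper, here is a sketch of a one-hot encoding function on plain arrays; OneHot is an illustrative name, not the library's oneHot, and class labels are assumed to be 0-based.

' Label k becomes a column with a 1 in row k and 0 elsewhere, one column per example
Function OneHot(labels() As Integer, numClasses As Integer) As Double(,)
    Dim encoded(numClasses - 1, labels.Length - 1) As Double
    For i As Integer = 0 To labels.Length - 1
        encoded(labels(i), i) = 1.0        ' set the row matching the label
    Next
    Return encoded
End Function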

2.2 Data-Structures & Diagrams

Figure 2.2: Class Diagram for NeuralDot Library - Tensors (the Tensor interface and the Matrix and Volume classes, with their fields, constructors, accessors, operations and iterators)
Figure 2.3: Class Diagram for NeuralDot Library - Layers (the generic Layer(Of Tensor) interface and the Conv(Volume), MaxPool(Volume), Reshape(Matrix) and Dense(Matrix) layer classes)


Figure 2.4: Class Diagram for NeuralDot Library - Optimisers (the Optimizer base class, its GradientDescentOptimizer and AdvancedOptimisers subclasses, and the Momentum, RMS and AdamOptimizer classes)


Figure 2.5: Class Diagram for NeuralDot Library - Net (the Net, Mapping and netData classes)

The figures above show the UML for the NeuralDot library. The class Tensor is a base class for the classes Matrix and Volume, as both classes have functions in common. This base class is also necessary because the Layer class is generic in Tensor; it is therefore necessary to have Volume and Matrix inherit from Tensor, as some layers will be of type Volume and some of Matrix. The Matrix class includes all the functions, iterators and subroutines that the user may use, and the same applies to the Volume class. In addition to the subroutines, functions and iterators, I have also added shared operators such as +, -, * and / to both classes. These operators make using matrices and volumes more accessible and intuitive for beginners, making the library more user-friendly to work with.

The Layer class is a generic interface of type Tensor. The classes Dense, Conv, MaxPool and Reshape all extend Layer, as they share the same functions and are all layers. The Layer interface declares all the main functionality required by any specific layer, such as updating the layer, retrieving its parameters and so on. Apart from Dense, every class that implements Layer defines only the functions it overrides from the base interface. The class Dense, which implements Layer(Of Matrix), has two extra functions, each called gradient, which return the gradients for a particular back-prop iteration given the previous layer's gradients; the only difference between them is that one works only for the final layer in the net. These extra functions exist so that the advanced optimisation methods can be used to train dense nets, allowing the user to gain an intuition into which optimisation method works best for a particular architecture or dataset. They also allow the user to view the gradients for a particular training iteration, which is something the user asked for in the interview process. I have therefore added these extra functions to the Dense class only, as my main focus was on dense nets. A sketch of how a new layer would slot into this hierarchy is shown below.
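
To illustrate, here is a minimal sketch of a hypothetical user-defined layer. The class name and its trivial behaviour are my own, purely for illustration; it simply satisfies the Layer(Of Matrix) contract (parameters, f, clone, update and deltaUpdate), returning its input unchanged.

Public Class IdentityLayer
    Implements Layer(Of Matrix)
    'A hypothetical layer with no trainable parameters: forward propagation is the identity

    Public ReadOnly Property parameters As List(Of Tensor) Implements Layer(Of Matrix).parameters
        Get
            Return New List(Of Tensor) 'No trainable parameters to report
        End Get
    End Property

    Public Function f(ByVal x As Tensor) As Matrix Implements Layer(Of Matrix).f
        Return DirectCast(x, Matrix) 'Forward propagation leaves the input unchanged
    End Function

    Public Function clone() As Layer(Of Matrix) Implements Layer(Of Matrix).clone
        Return New IdentityLayer 'Stateless, so a fresh instance is a valid clone
    End Function

    Public Function update(ByVal learning_rate As Decimal, ByVal prev_delta As Tensor) As Tensor Implements Layer(Of Matrix).update
        Return prev_delta 'Nothing to train, so the incoming gradient passes straight through
    End Function

    Public Function update(ByVal learning_rate As Decimal, ByVal prev_delta As Tensor, ByVal ParamArray param() As Tensor) As Tensor Implements Layer(Of Matrix).update
        Return prev_delta
    End Function

    Public Sub deltaUpdate(ByVal ParamArray deltaParams() As Tensor) Implements Layer(Of Matrix).deltaUpdate
        'No parameters to update
    End Sub
End Class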

Finally, the class Net adds functionality to the network, enabling users to add layers to their net and to choose the loss function being used as well as the activations at each layer. Another piece of functionality I added to the Net class is the ability to save models in a list, so the user can keep many models. The user can then experiment with different architectures, optimisation techniques and loss functions and compare the impact each has on the overall outcome of the Net. After saving these models, the user can load a net back from the list, called checkpoints, and re-use the loaded Net.
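
As a usage sketch (the layer sizes and activations are arbitrary, and I am assuming the constructor and method signatures shown in the class diagram above, including my reading of the depth argument as the net's input size):

Dim net As New Net(2, Mapping.squared_error) 'Input depth of 2 and a squared-error loss, per the class diagram
net.AddDense(16, Mapping.relu)   'Hidden dense layer of 16 units with ReLU activation
net.AddDense(1, Mapping.sigmoid) 'Output layer

net.save()                        'Push a snapshot of the current layer stack onto the checkpoints list
Dim restored As Net = net.load(0) 'Later, retrieve checkpoint 0 and re-use the saved model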

2.3 User-Interface
Because my program is a library, it will not have a user-interface in the usual sense. However, I will make my library as easy as possible to use, as this was very clearly stated by my clients.
The main ways in which I will interact with my users are through exception handlers and the comments I have placed in my code. An important point to note about exceptions is that there are two types: ones that I purposefully placed in my code, and ones raised because the VB compiler finds an error in my code or in the user's code. The second type of error is something I will need to keep to a minimum, as the user will not know where the error is specifically. I will achieve this by thoroughly testing my library and taking care of any overflow problems that could occur anywhere within it.
Exceptions written by myself, on the other hand, will tell the user where they went wrong and give details of how to avoid the problem. These exceptions will make the library more accessible, as the user knows exactly where they went wrong.
Finally, comments are also used to guide users on how certain algorithms work and what certain variables or parameters represent. This will allow the user to feel less restricted when using my library, as they will know exactly what each function does at each stage.
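
The purposeful exceptions follow a guard-clause pattern with a descriptive message, as in this check taken from the matrix multiplication routine shown later in the Technical Solution:

If x.shape.Item2 <> y.shape.Item1 Then
    'The message tells the user exactly which rule their call broke, rather than letting
    'the failure surface later as an unexplained runtime error deep inside the library
    Throw New System.Exception("Shapes do not conform for matrix multiplication")
End If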

Technical Solution

In this section, I will explain some of the advanced algorithms used in my project and also some of the base classes that were used throughout it.

3.1 Base Classes Used

Base Class Implemented | Reason for Implementation

Tensor | The class Tensor is an abstract class, as it is never instantiated. Tensor is inherited by the classes Volume and Matrix, as both share functions and sub-routines in common. In addition, Tensor must exist because the class Layer is a generic class of type Tensor, allowing its derived classes to work with classes derived from Tensor. Furthermore, functions in other classes such as netData also use both matrices and volumes implicitly, so it is necessary to have a base class from which both Volume and Matrix can inherit.

Layer | The class Layer, unlike Optimiser and Tensor, is an interface, meaning no functions or subroutines are defined; instead it gives a list of functions and sub-routines that must be defined in the classes that extend it. This interface is extended by the Dense, Conv, Reshape and MaxPool classes. This saved time, as they share the same functions, so the interface Layer provided a template for creating these classes. Furthermore, this interface is a covariant generic interface. The reason is that the class Net stores a stack of layers defined by the user (an example of composition used in my project). If the interface Layer were not covariant, the derived classes such as Dense, Conv, Reshape and MaxPool could not be pushed onto the stack without a run-time error, so making Layer covariant solves this issue. Having Layer as a generic interface also makes intuitive sense, as the specific parameters and output type of a layer's functions depend upon the type of layer it is. Finally, one of my key extended objectives was enabling the user to create their own layers; with an interface called Layer, the user can define a new layer simply by implementing it, without having to change all the other classes just to integrate the new layer.

Optimiser | The Optimiser class is a MustInherit class that is inherited by GradientDescentOptimiser and AdvancedOptimiser. This class includes all the necessary functions a back-propagation algorithm may require, which makes it easy for users to create their own back-propagation algorithms, one of my extended objectives. The class includes key functions such as splitdata, resetParameters, calculateCost and calculateGradients, which are used by the derived classes in their back-propagation procedures by communicating with the base class. The class Optimiser also has a MustOverride function called run, which is where the actual back-propagation procedure occurs and which is used to train the net. With this abstract class, the user can create their own optimisation algorithm simply by defining the run procedure of their optimiser class.

AdvancedOptimiser | This base class was used because the optimisation methods Adam, Momentum and RMS only work on dense nets, so it was necessary to have a base class that prevents these optimisation methods from being used to train conv nets or other user-defined nets. Finally, if the user wants to create an optimisation algorithm for dense nets, they can inherit from the AdvancedOptimiser class instead of the Optimiser class.

Table 3.1: Base Classes Used

Below is the code used to create the three most important base classes in my project:
Public Interface Tensor
    'The base interface Tensor will be implemented by the Volume and the Matrix class.
    'Both Volume and Matrix are tensors and have functions in common.

    Sub print() 'Prints out the values of the Tensor. It is necessary that every child
    'implements this, as the user may want to see all the values the Tensor holds.

    Sub transposeSelf() 'Transposes the Tensor in place. This is a useful operation, as
    'transpose is used many times in deep nets, especially for back-prop.

    Function clone() As Tensor 'Used by all Tensors when cloning every layer. Returns an
    'identical Tensor, with the same values and the same state.

    Function normalize(Optional ByVal mean As Double = 0, Optional ByVal std As Double = 1) As Tensor
    'Used to normalise the values held in the Tensor.

    Function getshape() As List(Of Integer) 'Returns the shape of the Tensor as a list,
    'as tensors can have an arbitrary number of dimensions.

End Interface

Imports NeuralDot

Public Interface Layer(Of Out T As Tensor)

    'This interface Layer will be implemented by Dense, Conv, MaxPool and Reshape.
    'It holds the common functions that all these layers will use.
    'It is a generic interface of type Tensor, as all layers must operate on Tensors.

    ReadOnly Property parameters As List(Of Tensor)
    'The parameters property is common to all layers, as all layers need to output the
    'variables they are storing. This property is useful for back-prop and debugging.

    Function f(ByVal x As Tensor) As T
    'Used to forward propagate through a layer.

    Function clone() As Layer(Of T)
    'Used to clone a layer. This is useful when saving a model, as all layers
    'need to be cloned when a model is saved.

    Function update(ByVal learning_rate As Decimal, ByVal prev_delta As Tensor, ByVal ParamArray param() As Tensor) As Tensor
    'Used when a layer depends upon the previous layer's parameters. When users define
    'their own layers they may need this, depending upon how forward propagation works
    'within the layer.

    Function update(ByVal learning_rate As Decimal, ByVal prev_delta As Tensor) As Tensor
    'Updates the parameters using prev_delta, which is the gradient of the loss
    'function w.r.t. the parameters. The applicability of this function depends upon
    'how forward propagation works in the layer.

    Sub deltaUpdate(ByVal ParamArray deltaParams() As Tensor)
    'Updates the parameters being trained, using deltaParams as the respective
    'gradients computed via back-prop.

End Interface

Public MustInherit Class Optimizer

    Public ReadOnly model As Net, dataxy As IEnumerable(Of Tuple(Of Tensor, Tensor))
    Public iterations As Integer = 0, losses As New List(Of Tensor) 'The list losses stores the loss for every iteration
    'The variable model stores the Net being trained, by reference, so that when an update occurs the net itself is updated.
    'dataxy stores the data that will be used to train the net.

    Public Sub New(ByRef _net As Net, ByVal xydata As IEnumerable(Of Tuple(Of Tensor, Tensor)))
        model = _net
        dataxy = xydata
    End Sub

    MustOverride Function run(ByVal learning_rate As Decimal, ByVal printLoss As Boolean, ByVal batchSize As Integer, ByVal ParamArray param() As Decimal) As List(Of Tensor)
    'The function run is MustOverride, as every optimiser must have a method to train the net using a mini-batch.
    'There is no need to hardcode batch or stochastic gradient descent, as they are just special cases of
    'mini-batch gradient descent (batchSize = m and batchSize = 1, respectively).
    'The iterations variable describes the number of training iterations used to train the net.
    'If printLoss = True, then the loss is printed out on each training epoch.
    'param() denotes the parameters that will be used by the optimiser in training the net.
    'After each training epoch, the error is stored in a list, which is returned by this function.

    MustOverride Sub resetParameters() 'Resets the parameters being used to train the net, including the iterations variable.

    Public Function calculateCost(ByVal xydata As IEnumerable(Of Tuple(Of Tensor, Tensor))) As Matrix
        Dim temp As New Matrix(1, 1)
        For j As Integer = 0 To xydata.Count - 1
            temp += model.loss.f(model.predict(xydata(j).Item1), xydata(j).Item2)
        Next
        Return temp / xydata.Count
    End Function 'This function returns the average cost of the net using the current weights

    Public Function splitdata(ByVal batchSize As Integer) As List(Of IEnumerable(Of Tuple(Of Tensor, Tensor)))
        Dim batchdata As New List(Of IEnumerable(Of Tuple(Of Tensor, Tensor))) 'Stores the data for the separate batches
        For batchNum As Integer = 0 To dataxy.Count / batchSize - 1
            Dim temp As New List(Of Tuple(Of Tensor, Tensor)) 'Stores the examples for this particular batch
            For n As Integer = 0 To batchSize - 1
                temp.Add(dataxy(batchNum * batchSize + n))
            Next
            batchdata.Add(temp.AsEnumerable)
        Next
        Return batchdata
    End Function 'This function organises the data examples into separate batches for mini-batch gradient descent

    Public Function calculateGradients(ByVal xypoints As IEnumerable(Of Tuple(Of Tensor, Tensor))) As Tuple(Of List(Of Matrix), List(Of Matrix))
        Dim pred As Tensor
        Dim errors As New List(Of Matrix)
        For Each point In xypoints
            pred = model.predict(point.Item1)
            errors.Add(model.loss.d(pred, point.Item2) * model.netLayers.Peek.parameters.Last)
        Next
        Dim deltas As New Stack(Of Tensor)
        deltas.Push((New Volume(errors) / dataxy.Count).mean(2))

        Dim d As IEnumerable(Of Matrix) = DirectCast(model.netLayers(0), Dense).gradient(deltas.Peek)
        Dim dw As New List(Of Matrix)({d(0)})
        Dim db As New List(Of Matrix)({d(1)})
        For layer As Integer = 1 To model.netLayers.Count - 1
            Dim dlayer As IEnumerable(Of Matrix) = DirectCast(model.netLayers(layer), Dense).gradient(deltas.Peek, model.netLayers(layer - 1).parameters(0))
            dw.Add(dlayer(0))
            db.Add(dlayer(1))
            deltas.Push(db(layer))
        Next
        Return New Tuple(Of List(Of Matrix), List(Of Matrix))(dw, db)
    End Function 'This function calculates the gradients for a batch of xypoints, i.e. mini-batch gradient descent

End Class
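
Because run is the only member a new optimiser must define (along with resetParameters), creating a custom optimiser takes very little code. Below is a minimal sketch of a hypothetical optimiser performing plain gradient descent with the base class's splitdata and calculateGradients helpers; the class name and the bare update rule are my own, and, like the helpers it calls, it only applies to dense nets.

Public Class MyOptimizer
    Inherits Optimizer 'Hypothetical user-defined optimiser, for illustration only

    Public Sub New(ByRef _net As Net, ByVal xydata As IEnumerable(Of Tuple(Of Tensor, Tensor)))
        MyBase.New(_net, xydata)
    End Sub

    Public Overrides Sub resetParameters()
        iterations = 0 'No optimiser state beyond the iteration counter
    End Sub

    Public Overrides Function run(ByVal learning_rate As Decimal, ByVal printLoss As Boolean, ByVal batchSize As Integer, ByVal ParamArray param() As Decimal) As List(Of Tensor)
        For Each batch In MyBase.splitdata(batchSize)
            Dim d = MyBase.calculateGradients(batch) 'd.Item1 holds dw, d.Item2 holds db (dense nets only)
            For layer As Integer = 0 To model.netLayers.Count - 1
                'Step each layer's parameters against its gradient
                model.netLayers(layer).deltaUpdate(-learning_rate * d.Item1(layer), -learning_rate * d.Item2(layer))
            Next
        Next
        losses.Add(calculateCost(dataxy)) 'Record the loss after this epoch's updates
        iterations += 1
        Return losses
    End Function
End Class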

3.2 Techniques Used
3.2.1 Linear Algebra Techniques Used

Technique Used (& Class Implemented) | How it works | Reason for implementation

Matrix Operations (Matrix) | See section 2.1.1.3.4. | See section 2.1.1.3.4.

Volume Operations (Volume) | The Volume operations work very similarly to the Matrix operations; the only difference is that they work on volumes instead. An example is the element-wise multiplication of two Volumes. Element-wise multiplication of two matrices involves multiplying every element in one matrix with the corresponding element in the other, and the same applies for Volumes: the elements in one volume are multiplied element-wise with the elements in the other. This extends to the other operations "-", "+" and "/". | There are two main reasons why these operations were implemented in the Volume class. The first is that they are used thoroughly in the Conv class for back-propagation and also for forward propagation. They are also used when working with Volumes directly: when the user is dealing with images, sound or some other data format with a third dimension, it is much easier to manipulate the data using the Volume class, as the most common data-processing functions (normalising, rotating, splitting a portion of the volume, and so on) are already implemented. For example, a user working with images in RGB format may want to normalise the values for a conv net; having a normalising function in the Volume class means they do not need to define their own data-processing functions. Another function implemented was op(f(x), v), where f(x) is a function that takes in a matrix and returns a matrix, and v is a volume to which the function is applied layer by layer. This function allows many of the matrix operations to be extended to the Volume class.

Matrix Multiplication (Matrix) | Matrix multiplication works by multiplying the rows of one matrix with the columns of the other and summing in the process. For more information about matrix multiplication see section 2.1.1.3.1 in Design. | Matrix multiplication is used many times in dense networks to propagate through the layers.

Matrix Iteration (Matrix) | The matrix iteration algorithms work by yielding an item instead of returning a specific item and then terminating. Iteration over the columns of a matrix works by yielding each column, which the user can then loop over. The submatrix iterator works by striding a window over a matrix and yielding the elements the window passes over. | Matrix iterators are extremely useful to my library, as they save writing the same for-loops many times and thus make my code much easier to read and understand. These iterators can also be used by the user, who may want to loop over their data in matrix format. Furthermore, the submatrix iterator is used in the Matrix class by the functions conv and maxpool, and also in the MaxPool class, as it implements the for-loops those functions require.

Table 3.2: Techniques Used - Linear Algebra (part 1)


Volume Iterations (Volume) | The iterators in the Volume class work similarly to those in the Matrix class. The Volume class implements the iterators subvolume and Items. The subvolume iterator works the same way as the submatrix iterator, except that it works on volumes instead. The Items iterator works by yielding the layers of a Volume. | The reasons for implementing these iterators are the same as for the Matrix iterators: both reduce the need to write the same for-loops to iterate over the same data structure.

Volume Casts (Volume) | Volume casting is used to cast a given volume to a matrix, or to cast a matrix to a volume. | These functions are used by the Reshape class, as this layer transforms the output volume from a Conv layer into a matrix for the Dense layer that follows. For the back-prop procedure the casting is reversed, from Matrix to Volume, which is then back-propagated to the conv layers as the deltas.

Convolution (Volume) | See 2.1.1.3.2. | Convolution is used to forward propagate and also back-propagate throughout the conv classes, as it is part of the inference process for convolutional neural networks.

Table 3.3: Techniques Used - Linear Algebra (Part 2)

Below is the code for some of the matrix operations that are being used in my project:

Public Function reshape(ByVal rows As Integer, ByVal cols As Integer) As Matrix
    If rows * cols <> Me.shape.Item1 * Me.shape.Item2 Then
        Throw New System.Exception("Matrix dimensions do not conform for reshape")
    End If
    Dim result As New Matrix(rows, cols)
    Dim i As Integer = 0
    For Each d In val(Me)
        result.item(Math.Truncate(i / cols) + 1, (i Mod cols) + 1) = d
        i += 1
    Next
    Return result
End Function 'COMPLETED -- Reshapes a matrix into another matrix with shape = (rows, cols)

Public Sub transposeSelf() Implements Tensor.transposeSelf
    tState = Not tState
    shape = New Tuple(Of Integer, Integer)(Me.shape.Item2, Me.shape.Item1)
End Sub 'COMPLETED -- Swaps "ij" indexing and the indices of shape, transposing the current matrix in place without creating a new instance

Public Function clone() As Tensor Implements Tensor.clone
    Dim cloned As New Matrix(Me.shape.Item1, Me.shape.Item2)
    cloned.values = Me.values.Clone
    cloned.tState = Me.tState
    'For a matrix to be cloned, the values, tState and shape variables are required to be the same
    Return cloned
End Function 'COMPLETED -- Returns an identical matrix (a clone)

Public Shared Function conv(ByVal m As Matrix, ByVal kernel As Matrix, Optional ByVal stridesx As Integer = 1, Optional ByVal stridesy As Integer = 1, Optional ByVal padding As String = "valid") As Matrix
    Dim paddy, paddx As Integer
    If padding = "full" Then
        'For a full convolution, the matrix is first zero-padded so that every element in the
        'matrix can be used to convolve with the kernel, and then a valid convolution is applied.
        m = Matrix.padd(m, kernel.shape.Item1 - 1, kernel.shape.Item2 - 1)
        Return conv(m, kernel, stridesx, stridesy, "valid")
    End If
    If padding = "same" Then
        If ((m.shape.Item1 Mod stridesy) = 0) Then
            paddy = Math.Max(kernel.shape.Item1 - stridesy, 0)
        Else
            paddy = Math.Max(kernel.shape.Item1 - (m.shape.Item1 Mod stridesy), 0)
        End If
        If ((m.shape.Item2 Mod stridesx) = 0) Then
            paddx = Math.Max(kernel.shape.Item2 - stridesx, 0)
        Else
            paddx = Math.Max(kernel.shape.Item2 - (m.shape.Item2 Mod stridesx), 0)
        End If
        m = Matrix.addcol(m, 1, Math.Floor(paddy / 2), 1)
        m = Matrix.addcol(m, m.shape.Item1 + 1, paddy - Math.Floor(paddy / 2), 1)
        m = Matrix.addcol(m, 1, Math.Floor(paddx / 2), 0)
        m = Matrix.addcol(m, m.shape.Item2 + 1, paddx - Math.Floor(paddx / 2), 0)
        'The amount of padding done for SAME convolution follows the TensorFlow guidelines
        Return conv(m, kernel, stridesx, stridesy, "valid")
    ElseIf padding = "valid" Then
        Dim result As New Matrix(Math.Truncate((m.shape.Item1 - kernel.getshape(0)) / stridesy) + 1, Math.Truncate((m.shape.Item2 - kernel.getshape(1)) / stridesx) + 1)
        Dim i As Integer = 0
        'The following code computes the resulting convolved matrix.
        'The dot product is used, as convolution is essentially a series of dot products.
        For Each S In submatrix(m, kernel.shape.Item2, kernel.shape.Item1, stridesx, stridesy)
            result.values(Math.Truncate(i / result.shape.Item2), i Mod result.shape.Item2) = Matrix.dotsum(S, kernel)
            i += 1
        Next
        Return result
    End If
    Console.WriteLine(padding)
    Throw New System.Exception("Padding must be either valid, same or full")
End Function 'COMPLETED -- Returns the convolution after a kernel has been applied
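
As a usage sketch of the padding modes (the 4x4 input and 3x3 kernel sizes are arbitrary choices of mine): with strides of 1, "same" pads the input following the TensorFlow convention so the output keeps the input's shape, while "valid" shrinks it.

Dim m As New Matrix(4, 4) 'Input matrix, instantiated with a fixed size
Dim k As New Matrix(3, 3) 'Kernel
Dim same As Matrix = Matrix.conv(m, k, 1, 1, "same")   'Output is 4x4, the same shape as the input
Dim valid As Matrix = Matrix.conv(m, k, 1, 1, "valid") 'Output is Truncate((4 - 3) / 1) + 1 = 2, i.e. 2x2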

Public Overloads Shared Function join(ByVal m As Matrix, ByVal n As Matrix, ByVal index As Integer) As Matrix
    If m.shape.Item1 <> n.shape.Item1 Then
        'An error is thrown here if both matrices do not have the same number of rows
        Throw New System.Exception("Number of Rows must be the same for both Matrices")
    End If
    Dim result As New Matrix(m.shape.Item1, m.shape.Item2 + n.shape.Item2)
    Dim i As Integer = 0
    For k As Integer = 0 To m.shape.Item2 - 1
        If i = index Then
            i += n.shape.Item2
        End If
        For l As Integer = 0 To m.shape.Item1 - 1
            result.values(l, i) = m.item(l + 1, k + 1)
        Next
        i += 1
    Next
    For k As Integer = 0 To n.shape.Item2 - 1
        For l As Integer = 0 To m.shape.Item1 - 1
            result.values(l, index + k) = n.item(l + 1, k + 1)
        Next
    Next
    Return result
End Function 'COMPLETED -- Concatenates a Matrix (m) to another Matrix (n) at the specified axis

Public Function maxpool(ByVal kernelx As Integer, ByVal kernely As Integer, ByVal stridesx As Integer, ByVal stridesy As Integer) As Matrix
    Dim result As New Matrix(((Me.shape.Item1 - kernely) / stridesy) + 1, ((Me.shape.Item2 - kernelx) / stridesx) + 1)
    Dim i As Integer = 0
    'The following code selects the maximum element out of each submatrix
    For Each m In submatrix(Me, kernelx, kernely, stridesx, stridesy) 'submatrix is an iterator returning an IEnumerable
        result.values(Math.Truncate(i / result.shape.Item2), i Mod result.shape.Item2) = m.max()
        i += 1
    Next
    Return result
End Function 'COMPLETED -- Applies maxpooling to a matrix with a kernel of size = (kernelx, kernely)

Public Shared Function rotate(ByVal m As Matrix, Optional ByVal theta As Integer = 1) As Matrix
    If theta = 0 Then 'If theta is 0, then no rotation is required
        Return m
    Else
        'Rotating a matrix 90 degrees clockwise is the same as transposing the matrix and then reversing each row
        Dim transposed As Matrix = m.transpose
        Dim result As New Matrix(transposed.shape.Item1, 0)
        For Each col In transposed.columns.Reverse
            result = Matrix.join(result, col, result.shape.Item2)
        Next
        Return rotate(result, theta - 1) 'As theta counts multiples of 90 degrees, if theta > 1 the matrix is rotated a further theta - 1 times
    End If
End Function 'COMPLETED -- Rotates the matrix by theta * 90 degrees

Public Function oneHot(ByVal num_classes As Integer) As Matrix
    'oneHot converts a row vector into a matrix of shape (num_classes, num_samples). Each
    'corresponding item in the row vector is used to select a cell in each column of the
    'resulting matrix by placing a 1 there, whilst the rest of the items are set to 0.
    If Me.getshape(0) <> 1 Then 'Checking to see if the matrix is a row vector or not
        Throw New System.Exception("Matrix must be a row vector for one hot")
    End If
    Dim oneHotArr(num_classes - 1, Me.getshape(1) - 1) As Double
    For j As Integer = 0 To Me.getshape(1) - 1
        oneHotArr(Me.item(1, j + 1), j) = 1 'Using the items in the matrix to place a 1 in the resultant matrix
    Next
    Return New Matrix(oneHotArr)
End Function 'COMPLETED -- Returns the one-hot encoding of a matrix

Public Function invOneHot() As Matrix
    Dim result As New Matrix(1, Me.getshape(1))
    For j As Integer = 1 To Me.getshape(1)
        Dim maxval As Double = Double.MinValue 'Stores the maximum item in this column vector
        Dim pos As Integer = 0 'Stores the position of the maximum item in the vector
        For i As Integer = 1 To Me.getshape(0)
            If Me.item(i, j) > maxval Then 'If this item is greater than the current maximum, record its value and position
                maxval = Me.item(i, j)
                pos = i
            End If
        Next
        result.item(1, j) = pos - 1
    Next
    Return result
End Function 'COMPLETED -- Returns the inverse of one-hot encoding applied to a matrix

Public Function normalize(Optional ByVal mean As Double = 0, Optional ByVal std As Double = 1) As Tensor Implements Tensor.normalize
    Dim means As Matrix = Matrix.sum(Me, 0) / Me.getshape(0) 'Finds the column means
    Dim stds As Matrix = Matrix.op(AddressOf Math.Pow, (Matrix.sum(Matrix.op(AddressOf Math.Pow, Me, 2), 2), 0) - Matrix.op(AddressOf Math.Pow, means, 2) * Me.getshape(0)) / (Me.getshape(0) - 1), 0.5) 'Finds the std for each column (using the column means, not the target mean)

    Dim result As New Matrix(Me.getshape(0), Me.getshape(1))
    For i As Integer = 1 To Me.getshape(0)
        For j As Integer = 1 To Me.getshape(1)
            result.item(i, j) = (Me.item(i, j) - means.item(1, j)) / stds.item(1, j) 'Normalises each item in the matrix
        Next
    Next
    Return (result * std) + mean
End Function 'COMPLETED -- Returns the normalised version of a particular matrix

Public Shared Function matmul(ByVal x As Matrix, ByVal y As Matrix) As Matrix
    If x.shape.Item2 <> y.shape.Item1 Then
        Console.WriteLine("Shape of A is {0}, {1}. Shape of B is {2}, {3}", x.shape.Item1, x.shape.Item2, y.shape.Item1, y.shape.Item2)
        Throw New System.Exception("Shapes do not conform for matrix multiplication")
    End If
    'The following code performs matrix multiplication in the standard way
    Dim result As New Matrix(x.shape.Item1, y.shape.Item2)
    For i As Integer = 0 To x.shape.Item1 - 1
        For j As Integer = 0 To y.shape.Item2 - 1
            Dim sum As Decimal = 0
            For k = 0 To x.shape.Item2 - 1
                Try
                    sum += x.item(i + 1, k + 1) * y.item(k + 1, j + 1)
                Catch ex As Exception
                    Throw New System.Exception("Value was too large or too small.")
                End Try
            Next
            result.values(i, j) = sum
        Next
    Next
    Return result
End Function 'COMPLETED -- Returns the product of matrix multiplication

Public Shared Function dotsum(ByVal x As Matrix, ByVal y As Matrix) As Double
    Dim product As Matrix = y * x 'Element-wise multiplication of the two matrices
    Return Matrix.sum(Matrix.sum(product, 1)).item(1, 1)
End Function 'COMPLETED -- Returns the sum of the element-wise multiplication of the 2 matrices (the return type is Double, as the result is a single value)

Public Shared Operator =(ByVal a As Matrix, ByVal b As Matrix) As Boolean
    If Not a Like b Then
        Return False
    End If
    For i As Integer = 1 To a.shape.Item1
        For j As Integer = 1 To a.shape.Item2
            If a.item(i, j) <> b.item(i, j) Then 'Compares each value in the matrix
                Return False
            End If
        Next
    Next
    Return True
End Operator 'COMPLETED -- Checks if all items in the two matrices are the same

Public Overloads Iterator Function columns() As IEnumerable(Of Tensor)
    For j As Integer = 1 To Me.shape.Item2
        Dim result As New Matrix(Me.shape.Item1, 1)
        For i As Integer = 1 To Me.shape.Item1
            result.values(i - 1, 0) = Me.item(i, j) 'Copies the items of this column into a new matrix
        Next
        Yield result
    Next
End Function 'COMPLETED -- Returns an enumerable of the columns of the matrix

Public Shared Iterator Function submatrix(ByVal m As Matrix, ByVal kernelx As Integer, ByVal kernely As Integer, ByVal stridesx As Integer, ByVal stridesy As Integer) As IEnumerable(Of Matrix)
    For i As Integer = 1 To m.getshape(0) - kernely + 1 Step stridesy
        For j As Integer = 1 To m.getshape(1) - kernelx + 1 Step stridesx
            Yield m.item(i, i + kernely - 1, j, j + kernelx - 1)
        Next
    Next
End Function 'COMPLETED -- Returns a collection of matrices, by sliding a window of size (kernelx, kernely) with strides = (stridesx, stridesy)
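
As a usage sketch, the submatrix iterator lets callers slide a window over a matrix without writing the nested loops themselves (m here is assumed to be an already-constructed Matrix):

'Slide a 2x2 window with strides of 1 in both directions, printing each sub-matrix
For Each window In Matrix.submatrix(m, 2, 2, 1, 1)
    window.print()
Next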

Public Shared Iterator Function val(ByVal m As Matrix, Optional stepx As Integer = 1, Optional stepy As Integer = 1) As IEnumerable(Of Double)
    For i As Integer = 1 To m.getshape(0) Step stepy
        For j As Integer = 1 To m.getshape(1) Step stepx
            Yield m.item(i, j)
        Next
    Next
End Function 'COMPLETED -- Returns items from a matrix, using a step size of (stepx, stepy)

Below is the code for some of the volume operations that are being used in my project:

Public Property split(ByVal i_start As Integer, ByVal i_end As Integer, ByVal j_start As Integer, ByVal j_end As Integer, ByVal k As Integer) As Matrix
    Get
        Dim result As New Matrix(i_end - i_start + 1, j_end - j_start + 1)
        For i As Integer = i_start To i_end
            For j As Integer = j_start To j_end
                result.item(i - i_start + 1, j - j_start + 1) = Me.item(i, j, k)
            Next
        Next
        Return result
    End Get
    Set(ByVal value As Matrix)
        For i As Integer = i_start To i_end
            For j As Integer = j_start To j_end
                Me.item(i, j, k) = value.item(i - i_start + 1, j - j_start + 1)
            Next
        Next
    End Set
End Property 'COMPLETED -- Property used to set/select a portion of a volume

Public Function rotate(ByVal theta As Integer) As Volume
    If theta = 0 Then
        Return Me 'If theta is 0, then return the identity
    Else
        Return op(AddressOf Matrix.rotate, Me).rotate(theta - 1) 'Else rotate each layer in the Volume, then rotate by a further (theta - 1) * 90 degrees
    End If
End Function 'COMPLETED -- Rotates the Volume by theta * 90 degrees

Public Function normalize(Optional ByVal mean As Double = 0, Optional ByVal std As Double = 1) As Tensor Implements Tensor.normalize
    Dim n As Integer = Me.shape.Item1 * Me.shape.Item2 * Me.values.Count
    Dim means As Double = Me.values.Select(Function(x) Matrix.sum(x).item(1, 1)).Sum / n
    Dim stds As Double = Math.Sqrt((Me.values.Select(Function(x) Matrix.sum(x * x).item(1, 1)).Sum - (means * means * n)) / (n - 1)) 'Uses the computed mean of the volume, not the target mean
    Dim result As New List(Of Matrix)
    For Each M In Volume.Items(Me)
        result.Add(((M - means) / stds) * std + mean) 'Standardise, then rescale to the requested mean and std, matching Matrix.normalize
    Next
    Return New Volume(result)
End Function 'COMPLETED -- Returns a volume whose layers are normalised using all the elements in the volume

Public Function mean(ByVal axis As Integer) As Matrix
    Dim result As New Matrix(Me.shape.Item1, Me.shape.Item2)
    If axis = 2 Then
        For Each M In Me.values
            result += M
        Next
        Return result / Me.values.Count
    Else
        Throw New System.Exception("Only axis = 2 has been implemented for Volume mean")
    End If
End Function 'COMPLETED -- Returns the mean of a volume along a specified dimension. Currently only works for axis = 2

Public Function transpose() As Volume
    Return op(AddressOf Matrix.transpose, Me)
End Function 'COMPLETED -- Returns the transpose of each layer in the volume

Public Function clone() As Tensor Implements Tensor.clone
    Dim cloned As New List(Of Matrix)
    For Each m In Items(Me)
        cloned.Add(m)
    Next
    Return New Volume(cloned)
End Function 'COMPLETED -- Returns a clone of the current volume

Public Shared Function conv2d(ByVal v As Volume, ByVal kernels As Volume, Optional stridesx As Integer = 1, Optional stridesy As Integer = 1, Optional padding As String = "valid") As Volume
    'conv2d applies a convolution in 2 dimensions: every 2d kernel is applied to every layer in the volume.
    Dim result_values As New List(Of Matrix) : Dim all_channels As List(Of Matrix) = Items(v).ToList
    For Each k In Items(kernels)
        Dim temp As Matrix = Matrix.conv(all_channels(0), k, stridesx, stridesy, padding)
        For Each M In all_channels.GetRange(1, all_channels.Count - 1)
            temp += Matrix.conv(M, k, stridesx, stridesy, padding) 'Summing up the result of all the convolutions for this particular kernel
        Next
        result_values.Add(temp)
    Next
    Return New Volume(result_values)
End Function 'COMPLETED -- Applies a 2d convolution

Public Shared Function maxpool(ByVal filter As Volume, ByVal kernely As Integer, ByVal kernelx As Integer, ByVal stridesy As Integer, ByVal stridesx As Integer) As Volume
    Dim result As New List(Of Matrix)
    For Each M In Items(filter)
        result.Add(M.maxpool(kernelx, kernely, stridesx, stridesy))
    Next
    Return New Volume(result)
End Function 'COMPLETED -- Returns the maxpooling of a volume using a kernel of shape (kernelx, kernely) and step size = (stridesx, stridesy)

Public Shared Function op(ByVal x As Volume, ByVal f As Func(Of Matrix, Matrix, Matrix), ByVal y As Volume) As Volume
    Dim result As New List(Of Matrix)
    For i As Integer = 0 To x.values.Count - 1
        result.Add(f.Invoke(x.values(i), y.values(i)))
    Next
    Return New Volume(result)
End Function 'COMPLETED -- Applies a function f(Matrix, Matrix) -> Matrix to all the layers of the Volumes x and y

Public Shared Function op(ByVal v As Volume, ByVal f As Func(Of Matrix, Matrix, Matrix), ByVal m As Matrix) As Volume
    Dim result As New Volume(v.shape.Item1, v.shape.Item2, 0)
    For Each x In Items(v)
        result.values.Add(f.Invoke(x, m))
    Next
    Return result
End Function 'COMPLETED -- Applies a function f to the Volume v and the Matrix m layer-wise
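
As a usage sketch, because Matrix already defines the element-wise operators, op can lift them (or any user-supplied lambda of the right shape) onto whole volumes; v1 and v2 are assumed to be volumes with the same dimensions:

'Element-wise sum of two volumes, computed layer by layer via the first op overload
Dim summed As Volume = Volume.op(v1, Function(a As Matrix, b As Matrix) a + b, v2)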

Public Overloads Iterator Function Items() As IEnumerable(Of Tensor)
    For Each Matrix In Me.values
        Yield Matrix
    Next
End Function 'COMPLETED -- Returns an IEnumerable of all the layers in the Volume

Public Shared Iterator Function subvolume(ByVal v As Volume, ByVal kernelx As Integer, ByVal kernely As Integer, ByVal stridesx As Integer, ByVal stridesy As Integer) As IEnumerable(Of Volume)
    For i As Integer = 1 To v.shape.Item1 - kernely + 1 Step stridesy
        For j As Integer = 1 To v.shape.Item2 - kernelx + 1 Step stridesx
            Yield v.split(i, i + kernely - 1, j, j + kernelx - 1)
        Next
    Next
End Function 'COMPLETED -- Returns a collection of volumes, by sliding a window of size (kernelx, kernely) with strides = (stridesx, stridesy) over each individual layer

Public Shared Function cast(ByVal matrixList As List(Of Matrix), ByVal rows As Integer, ByVal cols As Integer) As List(Of Volume)
    Dim l_v As New List(Of Volume)
    For Each M In matrixList
        l_v.Add(Volume.cast(M, rows, cols))
    Next
    Return l_v
End Function 'COMPLETED -- Casts a list of matrices into volumes element-wise

Public Shared Function cast(ByVal v As Volume, ByVal rows As Integer, ByVal cols As Integer) As Matrix
    If rows * cols <> v.shape.Item1 * v.shape.Item2 * v.values.Count Then
        Throw New System.Exception("Dimensions for the matrix must be sufficient to store all the items in the Volume")
    End If
    Dim result As New Matrix(rows, cols)
    Dim i As Integer = 0
    For Each M In Volume.Items(v)
        For Each k In Matrix.val(M)
            result.item((Math.Truncate(i / result.getshape(1)) + 1), (i Mod result.getshape(1)) + 1) = k 'Assigning each element in result its corresponding value in the Volume v
            i += 1
        Next
    Next
    Return result
End Function 'COMPLETED -- Casts a Volume into a matrix of shape = (rows, cols)

Public Shared Function cast(ByVal m As Matrix, ByVal rows As Integer, ByVal cols As Integer) As Volume
    Dim result As New Volume(rows, cols, m.getshape(0) * m.getshape(1) / (rows * cols))
    Dim i As Integer = 0
    For Each d In Matrix.val(m)
        result.item(((Math.Truncate(i / cols)) Mod rows) + 1, (i Mod cols) + 1, Math.Truncate(i / (cols * rows))) = d
        i += 1
    Next
    Return result
End Function 'COMPLETED -- Casts a matrix into a volume of shape = (rows, cols)

3.2.2 Techniques - Neural Networks


Finally, the main algorithms for my library are the actual neural networks. Each neural network is made up of a stack of layers, which is stored in the Net class. Therefore, in order to implement these networks, I had to implement the specific layers first, which required defining the forward and backward propagation algorithms for each layer.

Technique Used (& Class Implemented) | How it works

DenseNetworks forward-prop | See section 2.1.1.1.1.

ConvNetworks forward-prop | See section 2.1.1.1.1.

Dense back-prop | The back-propagation algorithm for dense nets works recursively: the gradients for a layer depend upon the gradient of the layer above. Matrix multiplication takes place between the transpose of the layer above's weights and those gradients; this product is then multiplied element-wise by the gradient of the activation function, which gives the gradient of the loss w.r.t. this layer's bias. To find the gradient w.r.t. the weight matrix, the gradient w.r.t. the bias is multiplied by the transpose of the layer's input.

Convolution back-prop | The back-propagation for the convolutional layers is very complicated, due to the fact that we are dealing with volumes instead of matrices and differentiating w.r.t. the cross-correlation function. To find the gradient of the loss w.r.t. a filter in a specific CNN layer, the previous layer's gradients are multiplied element-wise by the kernel and by the derivative of the activation function that was used to produce these outputs. This produces a 3d volume, so the mean is taken across the layers of this volume, resulting in a matrix corresponding to the gradients for a specific layer in the filter volume.

Table 3.4: Techniques Used - Machine Learning
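
Written out for a single dense layer $l$ (my own notation, matching the recursive description in the table above, where $\odot$ denotes element-wise multiplication, $g$ is the layer's activation, $z^{(l)}$ its pre-activation and $a^{(l-1)}$ its input):

\[
\delta^{(l)} = \left( W^{(l+1)\,T} \, \delta^{(l+1)} \right) \odot g'\!\left(z^{(l)}\right), \qquad
\frac{\partial L}{\partial b^{(l)}} = \delta^{(l)}, \qquad
\frac{\partial L}{\partial W^{(l)}} = \delta^{(l)} \, a^{(l-1)\,T}
\]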

The code below shows the forward propagation procedure for the layers.
For the Dense Layer:

Public Function f(ByVal X As Tensor) As Matrix Implements Layer(Of Matrix).f 'Shape of X is (_units_prev, m)
    x_in = X
    z = Matrix.matmul(w, X) + b
    a = act.f(z)
    Return a
End Function 'Returns the output of the layer using the inputs fed into the layer

For the conv Layer:

Public Function f(x As Tensor) As Volume Implements Layer(Of Volume).f
    'x is the input into the layer
    x_in = x
    z = Volume.conv2d(x_in, filter, stridesx, stridesy, padding) + b
    a = Volume.op(AddressOf act.f, z)
    Return a
End Function 'Returns the output of the layer using the inputs fed into the layer

Finally, the code below shows the back-propagation procedure for the layers.
For the Dense Layer:

Public Overridable Overloads Function update(ByVal l_r As Decimal, ByVal prev_delta As Tensor, ByVal ParamArray param() As Tensor) As Tensor Implements Layer(Of Matrix).update
    Dim grads As IEnumerable(Of Matrix) = gradient(prev_delta, param) 'gradient returns (dw, db)
    w -= l_r * grads(0) : b -= l_r * grads(1) 'Parameters are updated
    Return grads(1) 'Returns db, as it will be needed for the next layer's update during back-prop
End Function 'Updates the parameters using prev_delta and param

Public Overloads Function update(ByVal l_r As Decimal, ByVal prev_delta As Tensor) As Tensor Implements Layer(Of Matrix).update
    'This update function is only used by the last layer in the Net, if it is a dense net
    Dim grads As IEnumerable(Of Matrix) = gradient(prev_delta) 'gradient returns (dw, db)
    w -= l_r * grads(0) : b -= l_r * grads(1) 'Parameters are updated
    Return grads(1) 'Returns db, as it will be needed for the next layer's update during back-prop
End Function

For the conv Layer:

Public Function update(l_r As Decimal, _prev_dz As Tensor, ParamArray param() As Tensor) As Tensor Implements Layer(Of Volume).update
    'param should be empty, as convolution doesn't require the previous weights for finding updates.
    Dim dfilter As New List(Of Matrix)
    Dim dz As Volume = _prev_dz.clone
    Dim act_d As Volume = Volume.op(AddressOf act.d, z)
    Dim dx As New List(Of Matrix)

    'The following code finds dfilter, using _prev_dz
    For filter_channel As Integer = 0 To filter.values.Count - 1
        Dim temp As New Volume(kernely, kernelx, 1)
        Dim i As Integer = 0
        For Each kernel_window In Volume.subvolume(x_in, kernelx, kernely, stridesx, stridesy)
            temp += kernel_window * dz.item(Math.Truncate(i / dz.shape.Item2) + 1, (i Mod dz.shape.Item2) + 1, filter_channel) * act_d.item(Math.Truncate(i / dz.shape.Item2) + 1, (i Mod dz.shape.Item2) + 1, filter_channel)
            i += 1
        Next
        dfilter.Add(temp.mean(2))
    Next

    'The following code finds dx using dfilter
    For dx_channel As Integer = 0 To x_in.values.Count - 1
        Dim dx_channel_sum As New Volume(x_in.shape.Item1, x_in.shape.Item2, 1)
        For f As Integer = 0 To filter.values.Count - 1
            Dim k As Integer = 0
            For i As Integer = 0 To x_in.shape.Item1 - kernely Step stridesy
                For j As Integer = 0 To x_in.shape.Item2 - kernelx Step stridesx
                    dx_channel_sum.split(i + 1, i + kernely, j + 1, j + kernelx, 0) = filter.values(f) * dz.item(Math.Truncate(k / dz.shape.Item2) + 1, (k Mod dz.shape.Item2) + 1, f)
                    k += 1
                Next
            Next
        Next
        dx.Add(dx_channel_sum.mean(2)) 'One gradient matrix per input channel
    Next

    filter -= New Volume(dfilter) * l_r
    b -= Volume.op(AddressOf Matrix.sum, _prev_dz) * l_r
    Return New Volume(dx)
End Function

3.2.3 Optimisation Techniques Used

The advanced optimisation algorithms for training dense nets were requested by my clients, so I implemented them to meet one of the needs of my target market and one of my key objectives. Implementing these algorithms also allows the user to see the difference each optimisation method makes to the net, and how changing the architecture of the net affects the optimisation method being used. Finally, these advanced optimisation methods are well-established, and they have been shown to work much better than the standard back-propagation algorithm when training dense nets; see section 2.1.1.2 for more detail.

Technique Used | Class Used In | How it works

Adam optimisation algorithm | AdamOptimiser | See 2.1.1.2.3
Momentum algorithm | Momentum | See 2.1.1.2.1
RMS algorithm | RMS | See 2.1.1.2.2
Back-prop algorithm | GradientDescentOptimiser | See 2.1.1.1.2

Table 3.5: Techniques Used - Optimisation Algorithms

The following algorithms were used to train the net: Adam, RMS, Momentum and the standard back-propagation algorithm.
Below is the code implemented for the standard back-propagation algorithm.

Public Overrides Function run(ByVal learning_rate As Decimal, ByVal printLoss As Boolean, ByVal batchSize As Integer, ByVal ParamArray param() As Decimal) As List(Of Tensor)
    Dim batches As List(Of IEnumerable(Of Tuple(Of Tensor, Tensor))) = MyBase.splitdata(batchSize) 'Splits the data into batches of the given size

    Dim pred As Tensor
    Dim errors As New List(Of Matrix)

    For Each batch In batches 'Looping over every batch
        'The following code finds the average derivative w.r.t. the output of the net for this particular mini-batch
        For Each vector In batch
            pred = model.predict(vector.Item1)
            errors.Add(model.loss.d(pred, vector.Item2) * model.netLayers.Peek.parameters.Last)
        Next
        Dim deltas As New Stack(Of Tensor)
        deltas.Push((New Volume(errors) / batch.Count).mean(2))

        'Following code updates each layer using deltas
        model.netLayers.Peek.update(learning_rate, deltas.Peek)
        For layer As Integer = 1 To model.netLayers.Count - 1
            deltas.Push(model.netLayers(layer).update(learning_rate, deltas.Peek, model.netLayers(layer - 1).parameters(0)))
        Next
    Next

    losses.Add(calculateCost(dataxy)) 'Loss is calculated using the new updates

    If printLoss Then
        Console.WriteLine("Error for epoch {0} is: ", iterations)
        losses.Last.print()
    End If
    iterations += 1
    Return losses
End Function 'Function applies mini-batch gradient descent to train the network
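
A training loop then reduces to constructing an optimiser around a net and its data and calling run once per epoch. A minimal sketch (net and data are assumed to have been prepared already; the learning rate, batch size and epoch count are arbitrary choices of mine):

Dim sgd As New GradientDescentOptimizer(net, data)
For epoch As Integer = 1 To 100
    sgd.run(0.01D, True, 32) 'Learning rate 0.01, print the loss each epoch, mini-batches of 32
Next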

Below is the code implemented for the Momentum optimisation algorithm.

Public Overrides Function run(ByVal l_r As Decimal, ByVal printLoss As Boolean, ByVal batchSize As Integer, ParamArray param() As Decimal) As List(Of Tensor)
    If param.Count <> 1 Then
        Throw New System.Exception("Momentum requires 1 parameter for training")
    End If

    Dim batches As List(Of IEnumerable(Of Tuple(Of Tensor, Tensor))) = MyBase.splitdata(batchSize) 'Splits the data into batches of the given size

    For Each batch In batches 'Looping through each batch, as we are doing mini-batch gradient descent
        Dim d As Tuple(Of List(Of Matrix), List(Of Matrix)) = MyBase.calculateGradients(batch)
        Dim dw As List(Of Matrix) = d.Item1 : Dim db As List(Of Matrix) = d.Item2 'Retrieves the gradients for w and b

        'Following code applies the momentum optimiser technique to the network
        For layer As Integer = 0 To model.netLayers.Count - 1
            v_dw(layer) = (v_dw(layer) * param(0) + (1 - param(0)) * dw(layer))
            v_db(layer) = (v_db(layer) * param(0) + (1 - param(0)) * db(layer))
            model.netLayers(layer).deltaUpdate(-l_r * v_dw(layer), -l_r * v_db(layer))
        Next
    Next

    'Following code stores the new loss after the updates have been done
    losses.Add(calculateCost(dataxy))
    If printLoss Then
        Console.WriteLine("Error for epoch {0} is: ", iterations)
        losses.Last.print()
    End If
    iterations += 1
    Return losses
End Function

Below is the code implemented for the RMS optimisation algorithm.

Public Overrides Function run(ByVal l_r As Decimal, ByVal printLoss As Boolean, ByVal batchSize As Integer, ParamArray param() As Decimal) As List(Of Tensor)
    If param.Count <> 1 Then
        Throw New System.Exception("RMS requires 1 parameter for training")
    End If

    Dim batches As List(Of IEnumerable(Of Tuple(Of Tensor, Tensor))) = MyBase.splitdata(batchSize) 'Splits the data into batches of the given size

    For Each batch In batches 'Looping through each batch, as we are doing mini-batch gradient descent
        Dim d As Tuple(Of List(Of Matrix), List(Of Matrix)) = calculateGradients(batch)
        Dim dw As List(Of Matrix) = d.Item1 : Dim db As List(Of Matrix) = d.Item2
        'Following code applies the RMS optimisation technique to the network
        For layer As Integer = 0 To model.netLayers.Count - 1
            s_dw(layer) = s_dw(layer) * param(0) + (1 - param(0)) * dw(layer) * dw(layer)
            s_db(layer) = s_db(layer) * param(0) + (1 - param(0)) * db(layer) * db(layer)
            model.netLayers(layer).deltaUpdate(-l_r * dw(layer) * (1 / (Matrix.op(Function(x) Math.Sqrt(x), s_dw(layer)) + 0.000001)), -l_r * db(layer) * (1 / (Matrix.op(Function(x) Math.Sqrt(x), s_db(layer)) + 0.000001)))
        Next
    Next

    'Following code stores the new loss after the updates have been done
    losses.Add(calculateCost(dataxy))
    If printLoss Then
        Console.WriteLine("Error for epoch {0} is: ", iterations)
        losses.Last.print()
    End If
    iterations += 1
    Return losses
End Function

Below is the code implemented for the Adam optimisation algorithm.

Public Overrides Function run(ByVal l_r As Decimal, ByVal printLoss As Boolean, ByVal batchSize As Integer, ParamArray param() As Decimal) As List(Of Tensor)
    If param.Count <> 2 Then
        Throw New System.Exception("Adam requires 2 parameters for training")
    End If

    Dim batches As List(Of IEnumerable(Of Tuple(Of Tensor, Tensor))) = MyBase.splitdata(batchSize) 'Splits the data into batches of the given size
    Dim decay_term As Decimal = Math.Sqrt(1 - Math.Pow(param(1), iterations)) / (1 - Math.Pow(param(0), iterations) + 0.000001) 'Decay term for this particular iteration

    For Each batch In batches 'Looping through each batch, as we are doing mini-batch gradient descent
        Dim d As Tuple(Of List(Of Matrix), List(Of Matrix)) = calculateGradients(batch)
        Dim dw As List(Of Matrix) = d.Item1 : Dim db As List(Of Matrix) = d.Item2

        'The following code applies the Adam optimisation technique to the network
        For layer As Integer = 0 To model.netLayers.Count - 1
            s_dw(layer) = s_dw(layer) * param(1) + (1 - param(1)) * dw(layer) * dw(layer)
            s_db(layer) = s_db(layer) * param(1) + (1 - param(1)) * db(layer) * db(layer)
            v_dw(layer) = (v_dw(layer) * param(0) + (1 - param(0)) * dw(layer))
            v_db(layer) = (v_db(layer) * param(0) + (1 - param(0)) * db(layer))
            model.netLayers(layer).deltaUpdate(-l_r * dw(layer) * (1 / (Matrix.op(Function(x) Math.Sqrt(x), s_dw(layer)) + 0.000001)), -l_r * db(layer) * (1 / (Matrix.op(Function(x) Math.Sqrt(x), s_db(layer)) + 0.000001)))
            model.netLayers(layer).deltaUpdate(-l_r * v_dw(layer) * (1 / (Matrix.op(Function(x) Math.Sqrt(x), s_dw(layer)) + 0.00001)) * decay_term, -l_r * v_db(layer) * (1 / (Matrix.op(Function(x) Math.Sqrt(x), s_db(layer)) + 0.00001)) * decay_term)
        Next
    Next

    'Following code stores the new loss after the updates have been done
    losses.Add(calculateCost(dataxy))
    If printLoss Then
        Console.WriteLine("Error for epoch {0} is: ", iterations)
        losses.Last.print()
    End If
    iterations += 1
    Return losses
End Function
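
Since run receives its hyper-parameters through the ParamArray, Adam expects its two decay rates after the batch size. A usage sketch (net and data assumed prepared; 0.9 and 0.999 are the conventional choices for the two decay rates, not values the library mandates):

Dim adam As New AdamOptimizer(net, data)
adam.run(0.001D, True, 32, 0.9D, 0.999D) 'param(0) is the momentum decay, param(1) the RMS decay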

Testing

In this section, I will test my library to make sure every function and sub-routine created works correctly. It is important that every component works correctly, otherwise it could result in an unsuccessful product that does not fulfil the needs of my target audience.

4.1 Non-erroneous Tests


4.1.1 Matrix Class
The Matrix class is essential to the project, as it is used by all my other classes. It is therefore important that the Matrix class works correctly, otherwise all the classes that depend on it would work incorrectly too. For this reason, before doing unit tests on any other class, I first carried out a unit test on the Matrix class. These are the results of the unit tests on the Matrix class:

Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | Instantiating Matrix with initial values set | Matrix instantiated with set values | As expected | N/A
2 | Instantiating Matrix with fixed size | Matrix instantiated with size set | As expected | N/A
3 | Instantiating Matrix with values ~ N(0, 1) | Matrix instantiated with all values ~ N(0, 1) | As expected | N/A

Table 4.1: Testing Matrix Constructors

Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | Returning a specific item from a matrix | Returns the selected cell from the matrix | As expected | N/A
2 | Setting a specific cell in a matrix to a specified value | The cell is set to the required value | As expected | N/A
3 | Testing get/set property to return/set the required portion of the matrix | Property returned/set correct portion of Matrix | As expected | N/A
4 | getshape function | Correct shape of Matrix is returned as a list of dimensions | As expected | N/A
5 | Print function | All values printed correctly | As expected | N/A
6 | transposeSelf sub-routine followed by print function | Variable tState is set to True and the transposed matrix is printed out | As expected | N/A
7 | Clone function | Returns new matrix with same values and same tState variable | As expected | N/A
8 | Reshape function | Returns new matrix with same values, but with the correct shape specified | As expected | N/A
9 | addcol function | Returns new matrix with extra columns | As expected | N/A
Table 4.2: Testing Matrix basic functionality

Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | Sum of 2 matrices | Matrix of correct values returned | As expected | N/A
2 | Difference of 2 matrices | Matrix of correct values returned | As expected | N/A
3 | Element-wise multiplication between 2 matrices | Matrix of correct values returned | As expected | N/A
4 | Element-wise division of 2 matrices | Matrix of correct values returned | As expected | N/A
5 | Matrix multiplication | Matrix of correct values returned | As expected | N/A
6 | Clockwise rotation of matrix | Matrix of correct values returned | As expected | N/A
7 | MaxPool operation on a matrix | Matrix of correct values returned | As expected | N/A
8 | Maximum item operation on matrix | Correct value returned | As expected | N/A
9 | Matrix convolution | Matrix of correct values returned | As expected | N/A
10 | Sum of 2 matrices using broadcasting | Matrix of correct values returned | As expected | N/A
11 | DotSum operation between two matrices | Matrix of correct values returned | As expected | N/A
12 | Equality between 2 matrices | Correct Boolean output returned | As expected | N/A
13 | Scalar multiplication between a double and matrix | Matrix of correct values returned | As expected | N/A
14 | Scalar addition between a double and matrix | Matrix of correct values returned | As expected | N/A

Table 4.3: Testing Matrix Operations

Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | Value iteration | Iterator returns an IEnumerable of each value in the matrix | As expected | N/A
2 | Column iteration | Iterator returns an IEnumerable of all the columns in the matrix | As expected | N/A
3 | Sub-matrix iteration | All sub-matrices returned for the correct kernel size and strides | Out of bound error, as the striding window went outside the matrix | The for-loop started at 0 when it was supposed to start from 1.

Table 4.4: Testing Matrix Iterators

Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | Function f(x) -> y, where x is a double, is applied on a matrix | Function returns matrix with each value being the result of the function f(x) | As expected | N/A
2 | Function f(x, y) -> z, where x and y are doubles each from a separate matrix, is applied on two matrices | Function returns matrix with each value being the result of the function f(x, y) | As expected | N/A

Table 4.5: Testing inner Matrix functions

4.1.2 Volume Class


Like the Matrix class, the Volume class is also essential to the success of my library. The Volume class is used many times throughout the Conv class, so before testing the layer classes it is important that I test the Volume class. These are the results of the unit tests carried out on the Volume class:

Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | Instantiating volume with initial values set | Volume instantiated with set values | As expected | N/A
2 | Instantiating volume with the shape explicitly defined | Volume instantiated with the shape set | As expected | N/A
3 | Instantiating volume with shape set and all values ~ N(0, 1) | Volume instantiated with set shape and all values ~ N(0, 1) | As expected | N/A

Table 4.6: Testing Volume Constructors

Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | Returning a specific item from Volume | Returns the selected cell of Volume | As expected | N/A
2 | getshape function | Returns correct shape of volume as a list of dimensions | As expected | N/A
3 | Print function | Prints out all the values in the volume | As expected | N/A
4 | Volume transpose | All layers in the volume need to be transposed | As expected | N/A
5 | Clone function | Function returns the Volume with the same values | As expected | N/A
6 | Volume transpose | All layers in the volume need to be transposed | As expected | N/A

Table 4.7: Testing Volume basic functionality

Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | Sum of 2 Volumes | Volume of correct values returned | As expected | N/A
2 | Subtraction between 2 Volumes | Volume of correct values returned | As expected | N/A
3 | Element-wise multiplication between 2 Volumes | Volume of correct values returned | As expected | N/A
4 | Element-wise division between 2 Volumes | Volume of correct values returned | As expected | N/A
5 | Matrix multiplication between 2 Volumes | Volume of correct values returned | As expected | N/A
6 | Element-wise division between 2 Volumes | Volume of correct values returned | As expected | N/A
7 | Scalar multiplication between Volume and double | Volume of correct values returned | As expected | N/A
8 | Scalar addition between Volume and double | Volume of correct values returned | As expected | N/A
9 | Convolution 2d between Volumes | Volume of correct values returned | As expected | N/A
10 | Maxpooling between 2 volumes | Volume of correct values returned | As expected | N/A

Table 4.8: Testing Volume Operations

Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | Layer iteration | Iterator returns all layers in the volume as an IEnumerable of volumes | As expected | N/A
2 | Sub-volume iteration | Iterator returns all sub-volumes, using a kernel with fixed strides, as an IEnumerable of volumes | As expected | N/A

Table 4.9: Testing Volume Iterators

Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | List-to-volume cast | Function converts a list of matrices to a volume, given the height and width | As expected | N/A
2 | Volume-to-matrix cast | Function converts a volume to a matrix, given the height and width of the matrix | As expected | N/A
3 | Matrix-to-volume cast | Function converts a matrix to a volume, given the height and width of the volume | As expected | N/A

Table 4.10: Testing Volume Casts

Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | Function f(x) -> y, where x is a double, is applied on a volume | Function returns a volume with each value being the result of the function | As expected | N/A
2 | Function f(x, y) -> z, where x and y are doubles in separate volumes, is applied on 2 volumes | Function returns a single volume with each value being the result of the function | As expected | N/A
3 | Function f(x, y) -> z, where x and y are matrices, is applied on a volume and a matrix | Function returns a single volume with each layer being the result of the function | As expected | N/A

Table 4.11: Testing inner Volume functions

4.1.3 Mapping Class


The Mapping class relies heavily on the Matrix class to function correctly, which is why the Matrix class was tested before the Mapping class. The Mapping class can be used by the user to define their own activation or loss function, or to use a predefined function for a specific layer in the network.
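To illustrate how a user can supply their own function, a sigmoid activation could be declared in the same style as the library's shared mappings (a sketch only; it assumes the two-argument Mapping constructor, taking the function and its derivative, as shown with softmax_act in the code listings of chapter 6):

Public Shared sigmoid_act As New Mapping(Function(x) 1 / (1 + Matrix.exp(-1 * x)), Function(x) sigmoid_act.f(x) * (1 - sigmoid_act.f(x))) 'sketch: sigmoid, with its derivative expressed through f itself

These are the results of the unit tests run on the Mapping class: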

Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | Instantiating a Mapping with an activation function and its derivative specified | f and d are set to the values specified | As expected | N/A
2 | Instantiating a Mapping with a loss function and its derivative specified | f and d are set to the values specified | As expected | N/A

Table 4.12: Testing Mapping Constructors

Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | Invoking the activation function defined | The correct outputs of the activation function defined are returned | As expected | N/A
2 | Invoking the gradient of the activation function defined | The correct outputs of the gradient function are returned | As expected | N/A
3 | Invoking the loss function defined | The correct outputs of the loss function defined are returned | As expected | N/A
4 | Invoking the gradient of the loss function defined | The correct outputs of the gradient of the loss function are returned | As expected | N/A
5 | Testing the shared mappings used as activation functions (linear, relu, sigmoid, tanh, swish, softmax_act) | The correct outputs and associated derivatives are returned for each activation | As expected | N/A
6 | Testing the shared mappings used as loss functions (squared error and softmax) | The correct outputs and associated derivatives are returned for both loss functions | As expected | N/A

Table 4.13: Testing Mapping Class functionality

4.1.4 Net Class


The Net class is used by the user to create a neural network and to select its properties, such as the loss function being used, the number of features of the data, the type of each layer and the activation function used on each layer. The Net class also adds functionality to the neural network, such as making predictions using a batch of inputs or saving a specific model for later use.
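As a brief illustration of the intended workflow (a sketch only: the constructor follows Table 4.14 below, while method names such as addDense and predict are illustrative stand-ins for the layer-adding and Predict functionality tested in Table 4.15):

Dim net As New Net(squared_error, 5) 'loss function and number of data features (identifier names illustrative)
net.addDense(3, relu_act) 'hypothetical call: push a 3-neuron dense layer with relu activation
net.addDense(1, sigmoid_act) 'hypothetical call: 1-neuron sigmoid output layer
Dim prediction As Matrix = net.predict(x) 'inference on a batch of inputs x

These are the results of the unit tests on the Net class: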

Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | Instantiating a Net with the loss function and data features specified | The Net's loss function and feature-count variables are set to the specified values | As expected | N/A

Table 4.14: Testing Net Constructors

Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | Adding a Dense layer with the activation function and number of neurons specified | A dense layer is pushed onto the layers stack, along with the activation function being used and the number of neurons in the layer | As expected | N/A
2 | Adding a Convolutional layer with kernel dimensions, strides and layer activation specified | A conv layer is pushed onto the layers stack, along with its associated features such as kernel size, strides and layer activation | As expected | N/A
3 | Adding a Reshape layer | A reshape layer is pushed onto the layers stack | As expected | N/A
4 | Adding a MaxPool layer | A MaxPool layer is pushed onto the layers stack, along with its associated kernel dimensions and strides | As expected | N/A
5 | Predict function | The net returns the correct output of an inference made using the layers specified | As expected | N/A

Table 4.15: Testing Net functionality

4.1.5 Neural Networks Test


The main functionality of my library is to allow users to create convolutional or dense neural networks, so it is important that I test this functionality and make sure it works correctly. Unlike the previous tests, this is an integration test, as I will be testing many classes at the same time, including the Volume, Matrix, Net, Conv, Dense, MaxPool, Reshape, Mapping, Optimiser and netData classes. This integration test will therefore also show how well the constituent parts of my project work together.
It would be incredibly difficult to verify a neural network and its back-propagation algorithm by tracing each step. However, I can still test whether the network learns from the data given, by comparing how well it performs against the data and how each back-propagation algorithm allows the network to learn. It is important to note that the conv layers are used in conjunction with the Dense layer, so it is necessary to test the dense layers before moving on to the conv layers.

I will now test the dense layers, using the squared error as my loss function and 500 training iterations. The training data will be x = Matrix.join(x1, x2), where x1 ~ N(3, 1) with x1.shape = (5, 50), and x2 ~ N(0, 2) with x2.shape = (5, 50).

This means the data-set has 100 examples with 5 features, and the network must distinguish between the two sub-sets: if an example belongs to x1, the net should predict 1; otherwise it should predict 0.
To see whether the net has learnt the data, I will test it on a test set with the same shape as x but containing different values. In these tests the net should learn the function to an appropriate degree of accuracy, with all hyper-parameters kept constant besides the mini-batch size and learning rate; for a test to be considered successful, the net should achieve a final loss of at most 0.01.
For each architecture and optimiser, I will record the average loss over 100 different runs.
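The training data itself can be generated directly with the library's Matrix API; a minimal sketch using the normal-distribution constructor and Matrix.join from the code listings in chapter 6:

Dim x1 As New Matrix(5, 50, 3, 1) '50 examples with 5 features, values ~ N(3, 1)
Dim x2 As New Matrix(5, 50, 0, 2) '50 examples with 5 features, values ~ N(0, 2)
Dim x As Matrix = Matrix.join(x1, x2, x1.getshape(1)) 'append x2's columns after x1's, giving the (5, 100) data-set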

The following table shows the results of the tests. The net architecture column describes the number of neurons in each layer and the activation used.

Test No. | Net Architecture | Optimiser | Final loss
1.0 | 4-relu, 3-relu, 1-sigmoid | GDO | 0.0099
1.1 | 4-relu, 3-relu, 1-sigmoid | ADAM | 0
1.2 | 4-relu, 3-relu, 1-sigmoid | MOM | 0.0031
1.3 | 4-relu, 3-relu, 1-sigmoid | RMS | 0.0023
2.0 | 4-tanh, 2-swish, 2-sigmoid, 1-sigmoid | GDO | 0.003
2.1 | 4-tanh, 2-swish, 2-sigmoid, 1-sigmoid | ADAM | 0
2.2 | 4-tanh, 2-swish, 2-sigmoid, 1-sigmoid | MOM | 0.007
2.3 | 4-tanh, 2-swish, 2-sigmoid, 1-sigmoid | RMS | 0.001
3.0 | 5-tanh, 3-relu, 1-sigmoid | GDO | 0.0054
3.1 | 5-tanh, 3-relu, 1-sigmoid | ADAM | 0.0019
3.2 | 5-tanh, 3-relu, 1-sigmoid | MOM | 0.0007
3.3 | 5-tanh, 3-relu, 1-sigmoid | RMS | 0.0008
4.0 | 7-swish, 1-sigmoid | GDO | 0.0002
4.1 | 7-swish, 1-sigmoid | ADAM | 0.0001
4.2 | 7-swish, 1-sigmoid | MOM | 0.0007
4.3 | 7-swish, 1-sigmoid | RMS | 0
5.0 | 1-sigmoid | GDO | 0.0099
5.1 | 1-sigmoid | ADAM | 0.0001
5.2 | 1-sigmoid | MOM | 0.0015
5.3 | 1-sigmoid | RMS | 0.0011

Table 4.16: Dense Neural Network Test

From this table we can see that the gradient descent optimisers are working as they should and that the net is calculating the gradients correctly. This integration test shows that the constituent parts function together correctly, as no exceptions occurred, and the absence of overflow suggests the algorithms were implemented carefully. The time taken for this test was 2 hours, as each test was repeated 100 times; the full set of tests was then repeated a further 8 times using different training sets, and the results were all successful.
The conv layers, unlike the dense layers, do not have these advanced optimisation methods, so I only need to test the gradient descent optimiser (GDO) for the conv layers. One important point to note is that training the conv layers can take a long time, possibly around 6 hours depending on the size of the data, if the tests are to be repeated enough times to obtain reliable results.
The data-set I will be using for the conv layers is the MNIST data set, which consists of hand-written digits. This data-set was collected for the purpose of machine learning, so by using it I will be able to achieve reliable results.
The notation used to represent the net architecture is:
• c = (f, kx, ky, sx, sy)_a denotes a convolutional layer, with a kernel of dimension (kx, ky, f), strides of (sx, sy), activation a and padding set to the default value "valid". The activations used are relu (r), sigmoid (s) and swish (sw). For example, c = (32, 2, 2, 2, 2)_r denotes a conv layer with 32 filters of size 2x2, strides of (2, 2) and relu activation.
• m = (kx, ky, sx, sy) denotes a max-pooling layer, with a kernel of dimension (kx, ky) and strides of (sx, sy).
The dense layers that follow the conv layers use the same notation as before. Finally, the softmax function is applied at the end to make a prediction for a specific class, and the loss function used is the softmax cross-entropy cost function. Due to the limited time and available computational resources, only 3 tests could be done to train a CNN using 1000 iterations; each test is repeated 200 times to obtain reliable results for the final loss. The training data consists of 50 images of size 28x28x1, and the final loss is measured on another 50 images. In these tests the network should correctly classify at least 90% of the test data.

Test No. | Net Architecture | Correctly classified images (out of 50)
1.0 | c = (32, 2, 2, 2, 2)_r, c = (64, 2, 2, 2, 2)_s, m = (2, 2, 2, 2), c = (16, 3, 3, 1, 1)_r, 10-softmaxAct | 47.6 (95.2%)
2.0 | c = (32, 3, 3, 1, 1)_s, m = (5, 5, 5, 5), c = (64, 2, 2, 2, 2)_r, 32-r, 10-softmaxAct | 48.3 (96.6%)
3.0 | c = (32, 5, 5, 1, 1)_s, m = (5, 5, 2, 2), c = (5, 5, 2, 2)_r, 5-sw, 10-softmaxAct | 46.1 (92.2%)

Table 4.17: Convolutional Neural Networks Test

During the second test an overflow exception was thrown:

Figure 4.1: Overflow Exception Handler

This overflow exception occurred because the inputs were too large before the softmax function was applied. I solved this problem by using the fact that e^{x_i} / Σ_j e^{x_j} = e^{x_i − m} / Σ_j e^{x_j − m}, where m is the maximum element of x.
Therefore, I have changed the line of code:

Public Shared softmax_act As New Mapping(Function(x) Matrix.exp(x) / Matrix.sum(Matrix.exp(x), 0).item(1, 1), Function(x) softmax_act.f(x) * (1 - softmax_act.f(x)))

to

Public Shared softmax_act As New Mapping(Function(x) Matrix.exp(x - x.max()) / Matrix.sum(Matrix.exp(x - x.max()), 0).item(1, 1), Function(x) softmax_act.f(x) * (1 - softmax_act.f(x)))

This works because both the numerator and denominator of the fraction are divided by e^{x.max()}, where x.max() is the maximum element in the matrix. Every exponent is therefore at most 0, so each exponentiated value is at most 1 and can no longer overflow. For example, for x = (1000, 1001) the naive computation of e^1000 overflows, whereas the shifted computation evaluates the harmless e^{-1} / (e^{-1} + e^{0}).

After changing this line, the nets worked perfectly well. This shows that the convolutional networks work as they should, as on average the nets classified more than 90% of the images correctly in each test, meaning the CNNs can now differentiate between images of the digits 0 to 9. Finally, the time taken to train the CNNs was approximately 19 hours, mainly because the convolution operation has time complexity O(n^4).

4.1.6 Screenshots

Figure 4.2: Corresponding code for ConvNet for test 1

Figure 4.3: Corresponding code for ConvNet for test 2

Figure 4.4: Corresponding code for ConvNet for test 3

Figure 4.5: Results of first model(out of the 200) for Test No1 on Test Set: part 1

Figure 4.6: Results of first model for Test No1 on Test Set: part 2

Figure 4.7: Results of first model for Test No2 on Test Set: part 1

Figure 4.8: Results of first model for Test No2 on Test Set: part 2
Figure 4.9: Results of first model for Test No3 on Test Set: part 1

Figure 4.10: Results of first model for Test No3 on Test Set: part 2

Below are the YouTube links to the videos for the matrix, volume and dense network tests. A video could not be provided for the convolutional neural network tests, as training the conv nets took more than 19 hours in total, which is why screenshots are provided instead.

I was also unable to record the dense-layer tests, since training each architecture 100 times on different functions for the sake of reliability takes a long time (approximately 8 hours in total). However, I can instead record a dense net learning a function while being trained with a variety of different optimisers.

The playlist below shows an example of this:


https://www.youtube.com/watch?v=8UlRKepX28g&list=PLfRtphbl3o-0VLaZzFeRyGXw8GK2RV5T7

In the playlist above, we use a dense net to learn a function. We then make the function (the data) much more difficult, and we see that the ADAM optimiser is able to train the net to learn the function to a good degree of accuracy, while the RMS optimiser is not.

Finally, the links to the videos testing some of the functions of the Matrix and Volume classes are below:

Matrix Testing video: https://www.youtube.com/watch?v=M1PnV63fa2g


Volume Testing video: https://www.youtube.com/watch?v=aO6wxGt3G58

Note: not every function was tested in the videos; however, every function was tested thoroughly in the testing section above.

4.2 Erroneous Testing
While it is important that the library works for valid inputs, it is also important to make sure it handles invalid inputs and overflows correctly. Therefore, I will be testing the Matrix class on erroneous inputs and overflows, and checking how it handles them through the use of exceptions.

4.2.1 Matrix Class


Exception handling is the only mechanism the Matrix class uses to deal with erroneous inputs. I have therefore made sure that the errors raised are detailed, so that the user understands what went wrong and can go back and change their inputs.
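For example, a shape mismatch surfaces to the user as a descriptive exception (a small sketch; the message matches the matmul listing in chapter 6):

Try
Dim a As New Matrix(2, 3)
Dim b As New Matrix(4, 4)
Dim c As Matrix = Matrix.matmul(a, b) 'inner dimensions 3 and 4 do not match
Catch ex As Exception
Console.WriteLine(ex.Message) 'prints "Shapes do not conform for matrix multiplication"
End Try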

Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | Instantiating a matrix with initial values set to integer.MaxValue numbers | System.OutOfMemoryException must occur | As expected | N/A
2 | Instantiating a matrix with a fixed size of (integer.MaxValue, integer.MaxValue) | System.OutOfMemoryException must occur | As expected | N/A

Table 4.18: Testing Matrix Constructors

Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | Returning an item from an invalid position in a matrix | Exception thrown: "Index is out of bounds for Matrix" | As expected | N/A
2 | Setting an invalid cell in a matrix to a specified value | Exception thrown: "Index is out of bounds for Matrix" | As expected | N/A
3 | Testing the get/set property to return/set an invalid portion of the matrix | Exception thrown: "Index is out of bounds for Matrix" | As expected | N/A
4 | Testing the reshape function using invalid dimensions | Exception thrown: "Matrix dimensions do not conform for reshape" | As expected | N/A
5 | Testing the addcol function to add a new column at an invalid index | Exception thrown: "Index cannot be less than one and cannot exceed the size of the dimension in which the col/row is being added" | As expected | N/A

Table 4.19: Testing Matrix basic functionality

Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | Sum of 2 matrices with invalid dimensions, i.e. shapes not conforming | Exception thrown: "Shapes do not conform for addition" | As expected | N/A
2 | Difference of 2 matrices with dimensions not conforming | Exception thrown: "Shapes do not conform for addition" | As expected | N/A
3 | Element-wise multiplication between 2 matrices with invalid dimensions | Exception thrown: "Shapes do not conform for elementwise multiplication" | As expected | N/A
4 | Element-wise division of 2 matrices with invalid dimensions | Exception thrown: "Shapes do not conform for elementwise multiplication" | As expected | N/A
5 | Matrix multiplication between 2 matrices with invalid dimensions | Exception thrown: "Shapes do not conform for matrix multiplication" | As expected | N/A
6 | DotSum operation between 2 matrices with different shapes | Exception thrown: "Shapes do not conform for matrix multiplication" | As expected | N/A

Table 4.20: Testing Matrix Operations

4.2.2 Volume Class


The Volume class, being a list of matrices, works very similarly to the Matrix class and uses many of its functions, such as addition, multiplication and many other operations. This means that erroneous testing of the matrix operations indirectly performs erroneous testing of the volume operations.

Test No. | Test | Expected Outcome | Actual Outcome | Changes Made
1 | List-to-volume cast using invalid dimensions | Exception thrown: "Dimensions for matrix must only be sufficient to store all the items in the Volume" | As expected | N/A
2 | Volume-to-matrix cast using invalid dimensions | Exception thrown: "Dimensions for matrix must only be sufficient to store all the items in the Volume" | As expected | N/A

Table 4.21: Testing Volume Casts

5 Evaluation

Finally, I will evaluate my project by looking back at the objectives, comparing the finished library against them, and determining whether the library meets the required objectives.

5.1 Meeting the Objectives


5.1.1 Core
The core objectives of the project are:
1. The user can manipulate matrices by joining two matrices together, splitting a given matrix, iterating through the columns or values of a matrix, and applying a one-hot encoding function to a given matrix.
2. The user can multiply, add, subtract and transpose a given matrix.
3. The user can multiply, add, subtract and transpose a list of matrices, i.e. volumes.
These objectives cover the linear algebra part of my project, and it is important that I meet them because these structures are used by every class in my project. I have met these objectives thoroughly, as the Matrix class offers a wide range of functions and subroutines, including:

• Elementwise add/subtract/divide/multiply
• Matrix multiplication
• Reshaping a matrix
• Transposing
• Iterators over columns, values and sub-matrices
• One-hot encoding and its inverse
• Applying a function to the items of a matrix

Likewise, the same can be said of Volume, as it offers a wide range of functions including the ones specified in the objectives. Clearly, from this we can state that these objectives have been met. A short sketch of these matrix manipulations is given below.
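The sketch uses arange, join, oneHot and the val iterator from the Matrix listing in chapter 6:

Dim m As Matrix = Matrix.arange(1, 4) 'row vector (0 1 2 3)
Dim joined As Matrix = Matrix.join(m, m, m.getshape(1)) 'concatenate two matrices column-wise
Dim encoded As Matrix = m.oneHot(4) 'one-hot encode the row vector into a 4x4 matrix
For Each v In Matrix.val(m) 'iterate through each value of the matrix
Console.WriteLine(v)
Next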

4. The user can create their own dense neural networks.

5. The user can choose which activation function to use, and can also experiment with their own activation functions.
6. The user can tune the hyper-parameters, such as the learning rate, the number of neurons in a layer, the number of layers and the loss function.
7. The user can train the network using a back-propagation algorithm.

8. The gradient descent algorithms should implement stochastic gradient descent, batch gradient descent and mini-batch gradient descent.
9. The user can view the weights of the network, i.e. the learned parameters.

10. The user can view the gradients for a specific layer in a dense net, given the gradients for the layer above.

These objectives cover the neural networks part of my project, which is its most important aspect, as this is the main functionality of the library. The project has met all of these objectives: it allows users to create dense neural networks layer by layer and, for each layer, to choose the layer activation, the number of neurons and the loss function. The user also has a choice of gradient descent optimisers, such as Adam, Momentum and RMS, or the standard back-propagation algorithm, all of which implement mini-batch gradient descent. Because mini-batch gradient descent is implemented, the user can choose their batch size, which lets them use stochastic, batch or mini-batch gradient descent (a batch size of 1 gives stochastic gradient descent, and a batch size equal to the number of training examples gives batch gradient descent). Furthermore, the library allows the user to view the parameters of a layer, including the weights, biases and outputs, and to view the gradients for a specific layer in a dense net given the gradients for the layer above.
One aspect that could be further improved is the efficiency of training, through the use of parallel programming. This would speed up a lot of the heavy computation, saving time for the user and making tests easier to run. However, due to the limited time I was not able to do this, as parallel programming introduces new problems that require a certain level of expertise to deal with.
Finally, the project meets these objectives, but if the user wanted to add many layers to a net, or have 1000 neurons in a layer, training could get out of hand, as the time required for a single back-propagation iteration would grow substantially.

11. The user can add convolutional layers to their network, which they can tune by changing hyper-parameters such as the kernel dimensions, layer activation and kernel strides.

The project meets this objective, as users can create a convolutional layer and choose its settings, including the kernel dimensions, layer activation and kernel strides.

Overall, all the core objectives have been met to a satisfactory degree, and there is very little that could be done to improve upon the project.

5.1.2 Extension
The extension objectives are:
1. The user can choose from a wide range of optimisation algorithms, such as Momentum, RMS and Adam, to train their dense networks.
2. The user can define their own back-propagation algorithms to train their neural networks.

3. The user can define their own layers.


I have met these objectives through the extensive use of OOP, with Tensor, Layer and Optimiser as base types. The user can define their own layers by inheriting from the Layer class, and likewise for the Optimiser class: both are MustInherit classes whose MustOverride members allow a user-defined class to integrate successfully into the library. Furthermore, the library lets users apply advanced optimisation techniques such as Adam, Momentum and RMS. From this it is clear that the project also meets the extension objectives set.

5.2 Re-Interviewing Clients
I re-interviewed the clients and asked them for their opinions on the final product. These are some of the responses I received:
The first student, Nitish Bala said, ”The library is extremely intuitive to use and manipulating with data-structures
such as matrices and volumes is very easy to do. The optimisation methods also work well and it is very easy to
define your own optimisation techniques. Overall, I think this library is easy to use and making dense nets has
never been any easier than this.”

The second student, Basim Khajwal said, ”The library is easy to use and offers a variety of optimisation
techniques. One particular part, I like a lot about this library is that it allows the user to define their own layers,
which I think is a big plus. Overall, I think this library is the perfect library to go to for a beginner, if they would
like to get a hands on intro to machine learning.”

The third student, Taha Rind said, ”I like the design of this library as it makes the library very obvious to use.
I quite like the fact that making a simple dense network can be done in 5 lines of code, which makes this library
appealing to use.”

The fourth student, Mujahid Mamaniat said, ”Making a neural network including a CNN is very easy to do, as
creating a network only requires setting the parameters for each layer. Another aspect I like about the library is that
although it offers many features such as enabling users to create their own layers, optimisers, activation functions
it doesn’t sacrifice out on the design of the software like many other ML libraries out there.”

Finally, Jamie Stirling said, ”I like the way the library has an extra focus on dense nets as this is what many
libraries are lacking. I also like the way the library allows users to create their own activations and loss functions,
without making the library too difficult to work with. Overall, I think this library is perfect for a beginner as it is
not complicated but at the same time offers many features which might interest the user even more.”

It is clear from this feedback that the library has been successful in delivering a machine learning library for beginners.

5.3 Revisiting the Problem


In the future, if I were to revisit the problem, I would allow the user to view graphs of how the loss changes over time; adding graphs would make the library much more interactive. I would also include graphs showing the cost for different values of the weights and biases. Another area for improvement is offering advanced optimisation methods for the convolutional layers, as training these layers with the standard back-propagation algorithm is very slow. Beyond this, there isn't much I would change: the project meets the required objectives, and the final interviews suggest that it also meets the users' needs.

The problem I am trying to solve can never be solved entirely, mainly because it is an open problem. The needs of users are constantly changing and machine learning is always advancing; it was only recently that dense nets became the go-to approach in machine learning, so in the future users may require different algorithms to suit their needs. For the current climate of machine learning, however, this solution meets the objectives set to counter the original problem: offering an easy-to-use machine learning library for beginners.

6 Code Listings

6.1 NeuralDot.Tensor

Public Interface Tensor


'The base interface Tensor will be implemented by the Volume and Matrix classes.
'Both Volume and Matrix are tensors, and they share the functions they have in common.

Sub print() 'Prints out the values of the Tensor. It is necessary that every child implements this, as the user may want to see all the values the Tensor holds
Sub transposeSelf() 'Transposes the Tensor in place. This is a useful operation, as transposes are used many times in deep nets, especially for back-prop

Function clone() As Tensor 'Used by all Tensors when cloning every layer. Returns an identical Tensor, with the same values and same state
Function normalize(Optional ByVal mean As Double = 0, Optional ByVal std As Double = 1) As Tensor 'Normalises the values in the Tensor
Function getshape() As List(Of Integer) 'Returns the shape of the Tensor as a list, since Tensors can have an arbitrary number of dimensions

End Interface

6.2 NeuralDot.Matrix

Imports NeuralDot
Public Class Matrix
Implements Tensor

Private values(,) As Double


Private shape As Tuple(Of Integer, Integer)
Private tState As Boolean = False
'The values(,) variable is a 2d array that is used to store each element of the matrix
'The shape variable describes the shape of the matrix in the form (number of rows, number of columns)
'The tState variable can either be True or False. This is used to denote whether the matrix has been transposed or not

'Constructors
Public Sub New(ByVal initial_values(,) As Double)
values = initial_values
shape = New Tuple(Of Integer, Integer)(initial_values.GetLength(0),
,→ initial_values.GetLength(1))
End Sub 'COMPLETED -- This constructor is used to initialize a matrix with initial values

Public Sub New(ByVal rows As Integer, ByVal cols As Integer)


Me.shape = New Tuple(Of Integer, Integer)(rows, cols)
ReDim values(rows - 1, cols - 1)
End Sub 'COMPLETED -- This constructor is used to create a matrix of zeros with a SET size

Public Sub New(ByVal rows As Integer, ByVal cols As Integer, ByVal mean As Decimal, ByVal
,→ std As Decimal)
Me.New(rows, cols)
For j As Integer = 0 To cols - 1
For i As Integer = 0 To rows - 1
Me.values(i, j) = Math.Round(norm(mean, std), 2)
Next
Next
End Sub 'COMPLETED -- Constructor instantiates the matrix with normally distributed values

Public Sub New(ByVal rows As Integer, ByVal cols As Integer, ByVal value As Decimal)
Me.New(rows, cols)
For i As Integer = 0 To rows - 1
For j As Integer = 0 To cols - 1
Me.values(i, j) = value
Next
Next
End Sub 'COMPLETED -- Constructor instantiates matrix with all values the same

'ALL Constructors Defined

Public Shared Function arange(ByVal rows As Integer, ByVal cols As Integer) As Matrix
Dim result As New Matrix(rows, cols)
Dim i As Double = 0
For Each d In val(result)
result.item(Math.Truncate(i / cols) + 1, (i Mod cols) + 1) = i
i += 1
Next
Return result
End Function 'COMPLETED -- Returns a matrix with all values in incrementing order. Useful for debugging and testing other functions

Public Function reshape(ByVal rows As Integer, ByVal cols As Integer) As Matrix


If rows * cols <> Me.shape.Item1 * Me.shape.Item2 Then
Throw New System.Exception("Matrix dimensions do not conform for reshape")
End If

Dim result As New Matrix(rows, cols)
Dim i As Integer = 0
For Each d In val(Me)
result.item(Math.Truncate(i / cols) + 1, (i Mod cols) + 1) = d
i += 1
Next
Return result
End Function 'COMPLETED -- Reshapes a matrix into another matrix with shape = (rows, cols)

Public Function getshape() As List(Of Integer) Implements Tensor.getshape
Return New List(Of Integer) From {Me.shape.Item1, Me.shape.Item2}
End Function 'COMPLETED -- Returns the shape of the matrix as the list (rows, cols)

Public Property item(ByVal i As Integer, ByVal j As Integer) As Double
Get
'Get is used to return a specified value from a matrix, specified by the i,j values
Try
If tState = False Then
Return values(i - 1, j - 1)
'If matrix is tranposed return j,i instead of the value i,j
Else
Return values(j - 1, i - 1)
End If
Catch ex As Exception
If ((0 < i) And (i < Me.getshape(0) + 1)) AndAlso ((0 < j) And (j < Me.getshape(1) + 1)) Then
'This is because (i, j) is within the acceptable range, meaning the exception was due to the value being too small or too large
Throw New System.Exception("Value is either too small or too large")
End If
Console.WriteLine(Me.getshape(0))
Console.WriteLine(Me.getshape(1))
Throw New System.Exception("Index is out of bounds for Matrix")
End Try
End Get
Set(value As Double)
'Set is used to set a particular element (i,j) of a matrix to a specified value
Try
If tState = False Then
values(i - 1, j - 1) = value
Else
'If the matrix is transposed then set the j,i item to be the value given
values(j - 1, i - 1) = value
End If
Catch ex As Exception
Console.WriteLine(Me.getshape(0))
Console.WriteLine(Me.getshape(1))
Throw New System.Exception("Index is out of bounds for Matrix")
End Try
End Set
End Property 'COMPLETED -- Property is used to set or get an item from a Matrix

Public Property item(ByVal i_start As Integer, ByVal i_end As Integer, ByVal j_start As Integer, ByVal j_end As Integer) As Matrix
Get
Dim result As New Matrix(i_end - i_start + 1, j_end - j_start + 1)
For i As Integer = i_start To i_end
For j As Integer = j_start To j_end
result.item(i - i_start + 1, j - j_start + 1) = Me.item(i, j)
Next
Next
Return result
'Get is used to return a Submatrix from a matrix
End Get
Set(ByVal value As Matrix)
For i As Integer = i_start To i_end
For j As Integer = j_start To j_end
Me.item(i, j) = value.item(i - i_start + 1, j - j_start + 1)
Next
Next
'Set is used to set a submatrix of a matrix
End Set
End Property 'COMPLETED -- Property is used to set or get a submatrix of a matrix

Public Overloads Sub print() Implements Tensor.print
For i As Integer = 1 To Me.shape.Item1
For j As Integer = 1 To Me.shape.Item2
Console.Write("{0} ", Me.item(i, j))
Next
Console.WriteLine("")
Next
End Sub 'COMPLETED -- Subroutine is used to print out the values of a matrix

Public Overloads Shared Sub print(ByVal MatList As IEnumerable(Of Matrix))
For Each m In MatList
m.print()
Console.WriteLine("")
Next
End Sub 'COMPLETED -- Prints out the elements of several matrices

Public Overloads Shared Sub tensor_print(ByVal vollist As IEnumerable(Of Tensor))
For Each vol In vollist
vol.print()
Next
End Sub 'COMPLETED -- Used to print out an IEnumerable of Tensors

Public Function get_values() As Double(,)
Return values
End Function 'COMPLETED -- Function returns all the values in the matrix

Public Shared Function randn(ByVal rows As Integer, ByVal cols As Integer) As Matrix
Return New Matrix(rows, cols, 0, 1)
End Function 'Completed -- Function returns a matrix of values with normal dist

Public Sub transposeSelf() Implements Tensor.transposeSelf
tState = Not tState
shape = New Tuple(Of Integer, Integer)(Me.shape.Item2, Me.shape.Item1)
End Sub 'COMPLETED -- Flips the transpose flag and swaps the indices of shape; transposes the current matrix in place without creating a new instance

Public Function transpose() As Matrix
Dim transposed As Matrix = Me.clone
transposed.transposeSelf()
Return transposed
End Function 'COMPLETED -- Function returns a transposed copy of the matrix

Public Shared Function transpose(ByVal m As Matrix) As Matrix
Dim transposed As Matrix = m.clone
transposed.transposeSelf()
Return transposed
End Function 'COMPLETED -- Function returns a transposed copy of the given matrix

Public Function clone() As Tensor Implements Tensor.clone
Dim cloned As New Matrix(Me.shape.Item1, Me.shape.Item2)
cloned.values = Me.values.Clone
cloned.tState = Me.tState
'For a matrix to be cloned, the values, tState and shape variables are required to be the same
Return cloned
End Function 'COMPLETED -- Function returns an identical matrix (a clone)

Public Shared Function remove_index(ByVal m As Matrix, ByVal row As Integer, ByVal axis As Integer) As Matrix
'If axis is 0 then a column is deleted else if axis is 1 then a row is deleted
'The row parameter is used to state which row or column is being deleted. The index starts from 0
If axis = 1 Then
m.transposeSelf()
'The matrix (m) is transposed, as removing a row from the matrix is equivalent to removing a column in the transposed matrix
End If
If axis <> 0 And axis <> 1 Then
Throw New System.Exception("Axis must be 0 or 1")
'An exception is thrown, as the axis along which to remove a column or row is limited to 0 or 1, meaning column or row respectively
ElseIf Not (0 <= row AndAlso row < m.shape.Item2) Then
Throw New System.Exception("Index was out of range")
'An exception is thrown, as the row parameter is not within the acceptable range for a row/column to be deleted
End If

Dim result As New Matrix(m.shape.Item1, m.shape.Item2 - 1) 'The output matrix will have one less column than the input matrix m
Dim index_i, index_j As Integer

'The following copies all the values of the matrix onto the result matrix. The elements of the column/row being removed are not copied
For j As Integer = 0 To m.shape.Item2 - 1
index_i = 0
If Not (j = row) Then
For i As Integer = 0 To m.shape.Item1 - 1
result.values(index_i, index_j) = m.item(i + 1, j + 1)
index_i += 1
Next
index_j += 1
End If
Next
If axis = 1 Then
result.transposeSelf() 'The matrix is transposed to turn it back into its original form. This is only done when axis = 1, as only then was the matrix transposed earlier
End If
Return result
End Function 'COMPLETED --Removes a row/col from the list

Public Shared Function conv(ByVal m As Matrix, ByVal kernel As Matrix, Optional ByVal stridesx As Integer = 1, Optional ByVal stridesy As Integer = 1, Optional ByVal padding As String = "valid") As Matrix
Dim paddy, paddx As Integer
If padding = "full" Then
'For a full convolution, the matrix is first zero-padded so that every element in the matrix can be used to convolve with the kernel, and then a valid convolution is applied
m = Matrix.padd(m, kernel.shape.Item1 - 1, kernel.shape.Item2 - 1)
Return conv(m, kernel, stridesx, stridesy, "valid")
End If
If padding = "same" Then
If ((m.shape.Item1 Mod stridesy) = 0) Then
paddy = Math.Max(kernel.shape.Item1 - stridesy, 0)
Else
paddy = Math.Max(kernel.shape.Item1 - (m.shape.Item1 Mod stridesy), 0)
End If
If ((m.shape.Item2 Mod stridesx) = 0) Then
paddx = Math.Max(kernel.shape.Item2 - stridesx, 0)
Else
paddx = Math.Max(kernel.shape.Item2 - (m.shape.Item2 Mod stridesx), 0)
End If
m = Matrix.addcol(m, 1, Math.Floor(paddy / 2), 1)
m = Matrix.addcol(m, m.shape.Item1 + 1, paddy - Math.Floor(paddy / 2), 1)
m = Matrix.addcol(m, 1, Math.Floor(paddx / 2), 0)
m = Matrix.addcol(m, m.shape.Item2 + 1, paddx - Math.Floor(paddx / 2), 0)
'The amount of padding done for SAME convolution follows the TensorFlow guidelines for the amount of padding
Return conv(m, kernel, stridesx, stridesy, "valid")
ElseIf padding = "valid" Then

Dim result As New Matrix(Math.Truncate((m.shape.Item1 - kernel.getshape(0)) / stridesy) + 1, Math.Truncate((m.shape.Item2 - kernel.getshape(1)) / stridesx) + 1)
Dim i As Integer = 0
'The following code is used to compute the resulting convolved Matrix
'The dot product is used as convolution is essentially a series of dot products
For Each S In submatrix(m, kernel.shape.Item2, kernel.shape.Item1, stridesx, stridesy)
result.values(Math.Truncate(i / result.shape.Item2), i Mod result.shape.Item2) = Matrix.dotsum(S, kernel)
i += 1
Next
Return result
End If
Console.WriteLine(padding)
Throw New System.Exception("Padding must be either valid, same or full")
End Function 'COMPLETED -- Returns the convolution after the kernel has been applied
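'Illustrative note (added for clarity, not part of the original listing): with "valid"
'padding, a 5x5 matrix convolved with a 3x3 kernel and strides of (1, 1) produces a
'matrix of shape (Truncate((5 - 3) / 1) + 1, Truncate((5 - 3) / 1) + 1) = (3, 3),
'matching the result dimensions computed above.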

Public Shared Function padd(ByVal m As Matrix, ByVal i As Integer, ByVal j As Integer) As Matrix
'This function is used to pad a matrix with zeros and is used a lot for convolutions and in general for image manipulation
Dim result As Matrix = m.clone
result = addcol(result, 1, i, 0)
result = addcol(result, result.shape.Item2 + 1, i, 0)
result = addcol(result, 1, j, 1)
result = addcol(result, result.shape.Item1 + 1, j, 1)
Return result
End Function 'COMPLETED -- Function returns the matrix padded with zeros

Public Overloads Shared Function addcol(ByVal m As Matrix, ByVal index As Integer, ByVal n As Integer, ByVal axis As Integer) As Matrix
Select Case axis
Case 1
'The matrix (m) is transposed, as adding a row to the matrix is equivalent to adding a column in the transposed matrix
Return addcol(m.transpose, index, n, 0).transpose
Case 0
If index <= 0 Or index > m.shape.Item2 + 1 Then
'The error is thrown here as the index is out of the acceptable bounds for a column to be inserted
Throw New System.Exception("Index cannot be less than one and cannot exceed the size of the dimension in which the col/row is being added")
End If
Dim output As New Matrix(m.shape.Item1, m.shape.Item2 + n) 'adds "n" extra columns
'The following code inserts all the values from the matrix (m) into the output matrix
For j As Integer = 0 To m.shape.Item1 - 1
For i As Integer = 0 To index - 2
output.values(j, i) = m.item(j + 1, i + 1)
Next

Next
For j As Integer = 0 To output.shape.Item1 - 1
For i As Integer = index To output.shape.Item2 - 1 - n + 1
output.values(j, i + n - 1) = m.item(j + 1, i)
Next
Next
Return output
Case Else
'Error is thrown here as only a column or row can only be inserted
Throw New System.Exception("axis can only be 0 or 1")
End Select
End Function 'COMPLETED -- Adds "n" number of cols/rows to a matrix

Public Overloads Shared Function join(ByVal m As Matrix, ByVal n As Matrix, ByVal index As Integer) As Matrix
If m.shape.Item1 <> n.shape.Item1 Then
'An error is thrown here if both matrices do not have the same number of rows
Throw New System.Exception("Number of Rows must be the same for both Matrices")
End If
Dim result As New Matrix(m.shape.Item1, m.shape.Item2 + n.shape.Item2)
Dim i As Integer = 0
For k As Integer = 0 To m.shape.Item2 - 1
If i = index Then
i += n.shape.Item2
End If
For l As Integer = 0 To m.shape.Item1 - 1
result.values(l, i) = m.item(l + 1, k + 1)
Next
i += 1
Next
For k As Integer = 0 To n.shape.Item2 - 1
For l As Integer = 0 To m.shape.Item1 - 1
result.values(l, index + k) = n.item(l + 1, k + 1)
Next
Next
Return result
End Function 'COMPLETED -- Concatenates a matrix (n) into another matrix (m) at the specified column index

Public Function maxpool(ByVal kernelx As Integer, ByVal kernely As Integer, ByVal stridesx As Integer, ByVal stridesy As Integer) As Matrix
Dim result As New Matrix(((Me.shape.Item1 - kernely) / stridesy) + 1, ((Me.shape.Item2 - kernelx) / stridesx) + 1)
Dim i As Integer = 0
'The following code selects the maximum element of each submatrix
For Each m In submatrix(Me, kernelx, kernely, stridesx, stridesy) 'submatrix is an iterator function returning an IEnumerable
result.values(Math.Truncate(i / result.shape.Item2), i Mod result.shape.Item2) = m.max()
i += 1
Next
Return result

End Function 'COMPLETED -- Applies max-pooling to a matrix with a kernel of size (kernelx, kernely)

Public Function max() As Double
Dim max_val As Double = Double.MinValue 'Stores the maximum number currently found; initialised to the smallest double so that all-negative matrices are handled correctly
For Each value In Me.values
If value > max_val Then
max_val = value 'If the current value is larger than max_val, then set max_val to the new maximum value
End If
Next
Return max_val
End Function 'COMPLETED -- Returns the maximum value in a Matrix

Public Shared Function rotate(ByVal m As Matrix, Optional ByVal theta As Integer = 1) As Matrix
If theta = 0 Then 'If theta is 0, then no rotation is required
Return m
Else
'Rotating a matrix 90 degrees clockwise is the same as transposing the matrix and then reversing each row
Dim transposed As Matrix = m.transpose
Dim result As New Matrix(transposed.shape.Item1, 0)
For Each col In transposed.columns.Reverse
result = Matrix.join(result, col, result.shape.Item2)
Next
Return rotate(result, theta - 1) 'As theta counts multiples of 90 degrees, if theta > 1 then return the matrix rotated a further theta - 1 times
End If
End Function 'COMPLETED -- Rotates the matrix by theta * 90
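'Illustrative note (added for clarity): rotate({{1, 2}, {3, 4}}, 1) first transposes to
'{{1, 3}, {2, 4}} and then reverses the column order, giving {{3, 1}, {4, 2}} -
'the original matrix rotated 90 degrees clockwise.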

Public Function oneHot(ByVal num_classes As Integer) As Matrix
'oneHot converts a row vector into a matrix of shape (num_classes, num_samples). Each item in the row vector selects the row, within the corresponding column of the resulting matrix, in which a 1 is placed, whilst the rest of the items are set to 0
If Me.getshape(0) <> 1 Then 'Checking to see if the matrix is a row vector or not
Throw New System.Exception("Matrix must be a row vector for one hot")
End If
Dim oneHotArr(num_classes - 1, Me.getshape(1) - 1) As Double
For j As Integer = 0 To Me.getshape(1) - 1
oneHotArr(Me.item(1, j + 1), j) = 1 'Using the items in the matrix to select where to place a 1 in the resultant matrix
Next
Return New Matrix(oneHotArr)
End Function 'COMPLETED -- Returns the onehot of a matrix
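'Illustrative note (added for clarity): the row vector (0 2 1) with num_classes = 3
'one-hot encodes to the 3x3 matrix
'   1 0 0
'   0 0 1
'   0 1 0
'where each column contains a single 1 in the row given by the corresponding label.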

Public Function invOneHot() As Matrix
Dim result As New Matrix(1, Me.getshape(1))
For j As Integer = 1 To Me.getshape(1)

Dim maxval As Double = Double.MinValue 'Variable used to store the maximum item in this column vector
Dim pos As Integer = 0 'Variable used to store the position of the maximum item in the vector
For i As Integer = 1 To Me.getshape(0)
If Me.item(i, j) > maxval Then 'If the current item is greater than the running maximum, assign maxval to that item and "pos" to its position
maxval = Me.item(i, j)
pos = i
End If
Next
result.item(1, j) = pos - 1
Next
Return result
End Function 'COMPLETED -- Returns the inverse of Onehot encoding to a matrix

Public Function normalize(Optional ByVal mean As Double = 0, Optional ByVal std As Double = 1) As Tensor Implements Tensor.normalize
Dim means As Matrix = Matrix.sum(Me, 0) / Me.getshape(0) 'Finds the column means
Dim stds As Matrix = Matrix.op(AddressOf Math.Pow, (Matrix.sum(Matrix.op(AddressOf Math.Pow, Me, 2), 0) - Matrix.op(AddressOf Math.Pow, means, 2) * Me.getshape(0)) / (Me.getshape(0) - 1), 0.5) 'Finds each column's sample standard deviation using (sum(x^2) - n * mean^2) / (n - 1)

Dim result As New Matrix(Me.getshape(0), Me.getshape(1))


For i As Integer = 1 To Me.getshape(0)
For j As Integer = 1 To Me.getshape(1)
result.item(i, j) = (Me.item(i, j) - means.item(1, j)) / stds.item(1, j)
,→ 'normalises each item in the matrix
Next
Next
Return (result * std) + mean
End Function 'COMPLETED -- Returns the normalised version of a particular matrix

Public Function broadcast(ByVal m As Matrix) As Matrix
If m.shape.Item2 = 1 Then
Dim vals(Me.shape.Item1 - 1, Me.shape.Item2 - 1) As Double
For i As Integer = 1 To Me.shape.Item1
For j As Integer = 1 To Me.shape.Item2
vals(i - 1, j - 1) = m.item(i, 1) 'Every column of the matrix is duplicated and joined together
Next
Next
Return New Matrix(vals)
ElseIf m.shape.Item1 = 1 Then
Return Me.broadcast(m.transpose).transpose
Else
Throw New System.Exception("m.getshape(1) must be 1 for broadcast")
End If
End Function 'Broadcast is used to apply an operation to two matrices that are not of the same shape. One of the matrices, however, must be a row/column vector

Public Shared Function op(ByVal f As Func(Of Double, Double), ByVal m As Matrix) As Matrix

Dim result As New Matrix(m.shape.Item1, m.shape.Item2)
For i As Integer = 1 To m.shape.Item1
For j As Integer = 1 To m.shape.Item2
result.values(i - 1, j - 1) = f.Invoke(m.item(i, j))
Next
Next
Return result
End Function 'COMPLETED -- Applies a function(double -> double) elementwise to a matrix

Public Shared Function op(ByVal f As Func(Of Double, Double, Double), ByVal m As Matrix, ByVal n As Double) As Matrix
Dim result As New Matrix(m.shape.Item1, m.shape.Item2)
For i As Integer = 1 To m.shape.Item1
For j As Integer = 1 To m.shape.Item2
result.values(i - 1, j - 1) = f.Invoke(m.item(i, j), n) 'Applies function "f(x, y)" using parameters (m_ij, n)
Next
Next
Return result
End Function 'COMPLETED -- Applies a function ((double, double) -> double) elementwise to a matrix, using "n" as the second parameter

Public Shared Function sum(ByVal m As Matrix, ByVal axis As Integer) As Matrix
Dim result As New Matrix(1, m.shape.Item2)
If axis = 0 Then
'Following code sums up all the columns of the matrix
For j As Integer = 1 To m.shape.Item2
Dim sum_ As Double = 0
For i As Integer = 1 To m.shape.Item1
sum_ += m.item(i, j)
Next
result.values(0, j - 1) = sum_
Next
ElseIf axis = 1 Then
Return sum(m.transpose, 0).transpose()
Else
Throw New System.Exception("Axis to reduce mean can only be 0 or 1")
End If
Return result
End Function 'COMPLETED -- Returns a reduced matrix by summing in the direction of "axis"

Public Shared Function sum(ByVal m As Matrix) As Matrix
Return Matrix.sum(Matrix.sum(m, 0), 1)
End Function 'COMPLETED -- Returns the sum of all the elements in the Matrix

Public Shared Function AbsSum(ByVal m As Matrix) As Matrix
Return Matrix.sum(Matrix.sum(Matrix.op(AddressOf Math.Abs, m), 0), 1)
End Function 'COMPLETED -- Returns the absolute sum of a matrix

Public Shared Function SquaredSum(ByVal m As Matrix) As Matrix
Return Matrix.sum(Matrix.sum(Matrix.op(AddressOf Math.Pow, m, 2), 0), 1)
End Function 'COMPLETED -- Returns the sum of the squares of each item in a matrix

Public Shared Function mean(ByVal m As Matrix, ByVal axis As Integer) As Matrix
Dim n As Integer = m.getshape(axis)
Return sum(m, axis) / n
End Function 'COMPLETED -- Returns a reduced matrix by calculating the mean in the direction of "axis"

Public Shared Function exp(ByVal m As Matrix) As Matrix
Return op(AddressOf Math.Exp, m)
End Function 'COMPLETED -- Exponentiates every item in the Matrix

'Defining matrix joint operations


Public Shared Function add(ByVal x As Matrix, ByVal y As Matrix, Optional broadcast As Boolean = True) As Matrix

'Following code checks for any possible broadcasting
'If one of the matrices contains only 1 item, then apply scalar addition to the matrices
If (Not (y Like x)) And broadcast Then
If x.shape.Item1 = x.shape.Item2 And x.shape.Item2 = 1 Then
Return y + x.item(1, 1)
ElseIf y.shape.Item1 = y.shape.Item2 And y.shape.Item2 = 1 Then
Return x + y.item(1, 1)
End If
'If either matrix has a dimension of length 1, then broadcast in that dimension
If x.shape.Item2 = 1 Or y.shape.Item2 = 1 Then
If x.shape.Item2 = 1 Then
Return y.broadcast(x) + y
Else
Return x.broadcast(y) + x
End If
ElseIf x.shape.Item1 = 1 Or y.shape.Item1 = 1 Then
If x.shape.Item1 = 1 Then
Return y.broadcast(x) + y
Else
Return x.broadcast(y) + x
End If
End If
Console.WriteLine("Shape of me is {0}, {1}, Shape of B is {2}, {3},",
,→ y.shape.Item1, y.shape.Item2, x.shape.Item1, x.shape.Item2)
Throw New System.Exception("Shapes do not conform for addition")
End If

'The following code is used to add two matrices together

Dim outputshape As Tuple(Of Integer, Integer) = x.shape
Dim c As New Matrix(outputshape.Item1, outputshape.Item2)
For i As Integer = 1 To x.shape.Item1
For j As Integer = 1 To x.shape.Item2
c.values(i - 1, j - 1) = y.item(i, j) + x.item(i, j)
Next
Next

Return c
End Function 'COMPLETED -- Adds two matrices together elementwise

Public Shared Function add(ByVal x As Matrix, ByVal y As Decimal) As Matrix
Dim result As New Matrix(x.shape.Item1, x.shape.Item2, y)
Return x + result
End Function 'COMPLETED -- Returns the sum of a matrix with a decimal value

Public Shared Function matmul(ByVal x As Matrix, ByVal y As Matrix) As Matrix
If x.shape.Item2 <> y.shape.Item1 Then
Console.WriteLine("Shape of A is {0}, {1}. Shape of B is {2}, {3}", x.shape.Item1,
,→ x.shape.Item2, y.shape.Item1, y.shape.Item2)
Throw New System.Exception("Shapes do not conform for matrix multiplication")
End If
'The following code, is for matrix multiplication using the standard way
Dim result As New Matrix(x.shape.Item1, y.shape.Item2)
For i As Integer = 0 To x.shape.Item1 - 1
For j As Integer = 0 To y.shape.Item2 - 1
Dim sum As Decimal = 0
For k = 0 To x.shape.Item2 - 1
Try
sum += x.item(i + 1, k + 1) * y.item(k + 1, j + 1)
Catch ex As Exception
Throw New System.Exception("Value was too large or too Small.")
End Try
Next
result.values(i, j) = sum
Next
Next
Return result
End Function 'COMPLETED -- Returns the product of Matrix Multiplication

Public Shared Function multiply(ByVal m As Matrix, ByVal c As Decimal) As Matrix
Dim result As New Matrix(m.shape.Item1, m.shape.Item2, c)
Return result * m
End Function 'COMPLETED -- Returns the result of a scalar multiplication with a constant c

Public Shared Function multiply(ByVal x As Matrix, ByVal y As Matrix) As Matrix
If Not (y Like x) Then
Console.WriteLine("y has shape {0}, {1}", y.shape.Item1, y.shape.Item2)
Console.WriteLine("x has shape {0}, {1}", x.shape.Item1, x.shape.Item2)
Throw New System.Exception("Shapes do not conform for elementwise multiplication")
End If
Dim result As New Matrix(y.shape.Item1, y.shape.Item2)
For i As Integer = 1 To y.shape.Item1
For j As Integer = 1 To y.shape.Item2
Try
result.values(i - 1, j - 1) = x.item(i, j) * y.item(i, j)
Catch ex As Exception
Throw New System.Exception("Value is either too large or too small")
End Try
Next

Next
Return result
End Function 'COMPLETED -- Returns the multiplication of two matrices elementwise

Public Shared Function dotsum(ByVal x As Matrix, ByVal y As Matrix) As Double
Dim product As Matrix = y * x
Return Matrix.sum(Matrix.sum(product, 1)).item(1, 1)
End Function 'COMPLETED -- Returns the sum of all the elementwise products of the 2 matrices

' All joint operations defined

' All operators defined below

Public Shared Operator =(ByVal a As Matrix, ByVal b As Matrix) As Boolean
If Not a Like b Then
Return False
End If
For i As Integer = 1 To a.shape.Item1
For j As Integer = 1 To a.shape.Item2
If a.item(i, j) <> b.item(i, j) Then 'Statement compares each value in the two matrices
Return False
End If
Next
Next
Return True
End Operator 'COMPLETED -- Checks whether two matrices are equal element by element

Public Shared Operator <>(ByVal a As Matrix, ByVal b As Matrix) As Boolean
Return Not a = b
End Operator 'COMPLETED -- Returns the negation of =

Public Shared Operator +(ByVal a As Matrix, ByVal b As Matrix) As Matrix
Return Matrix.add(a, b)
End Operator 'COMPLETED -- Returns the sum of two matrices elementwise

Public Shared Operator +(ByVal a As Matrix, ByVal b As Decimal) As Matrix
Return Matrix.add(a, b)
End Operator 'COMPLETED -- Returns the sum of a decimal value with a matrix elementwise

Public Shared Operator +(ByVal a As Decimal, ByVal b As Matrix) As Matrix
Return Matrix.add(b, a)
End Operator 'COMPLETED -- Returns the sum of a decimal value with a matrix elementwise

Public Shared Operator -(ByVal a As Matrix, ByVal b As Matrix) As Matrix
Return a + (-1 * b)
End Operator 'COMPLETED -- Returns the difference of two matrices elementwise

Public Shared Operator -(ByVal a As Matrix, ByVal b As Decimal) As Matrix
Return Matrix.add(a, -b)
End Operator 'COMPLETED -- Returns the difference of a matrix with a decimal number

Public Shared Operator -(ByVal a As Decimal, ByVal b As Matrix) As Matrix
Return New Matrix(b.shape.Item1, b.shape.Item2, a) - b
End Operator 'COMPLETED -- Returns the difference of a decimal number with a matrix

Public Shared Operator *(ByVal a As Matrix, ByVal b As Matrix) As Matrix
Return Matrix.multiply(a, b)
End Operator 'COMPLETED -- Returns the multiplication of two matrices elementwise

Public Shared Operator /(ByVal a As Matrix, ByVal b As Matrix) As Matrix
Return a * (1 / b)
End Operator 'Returns the elementwise division of two matrices

Public Shared Operator *(ByVal a As Matrix, ByVal b As Decimal) As Matrix
Return Matrix.multiply(a, b)
End Operator 'COMPLETED -- Returns the scalar product of a matrix with a decimal

Public Shared Operator *(ByVal a As Decimal, ByVal b As Matrix) As Matrix
Return b * a
End Operator 'COMPLETED -- Returns the scalar product of a matrix with a decimal

Public Shared Operator /(ByVal a As Matrix, ByVal b As Decimal) As Matrix
Return a * (1 / b)
End Operator 'COMPLETED -- Returns the scalar division of a matrix

Public Shared Operator Like(ByVal a As Matrix, ByVal b As Matrix) As Boolean
If a.shape.Item1 = b.shape.Item1 And a.shape.Item2 = b.shape.Item2 Then
Return True
Else
Return False
End If
End Operator 'COMPLETED -- Returns True if both a and b have the same shape

Public Function max(ByVal a As Decimal) As Matrix
Dim result As New Matrix(Me.shape.Item1, Me.shape.Item2)
For i As Integer = 1 To Me.shape.Item1
For j As Integer = 1 To Me.shape.Item2
result.values(i - 1, j - 1) = Math.Max(a, Me.item(i, j))
Next
Next
Return result
End Function 'COMPLETED -- Applies the max operator to each item in a matrix

Public Shared Operator <(ByVal a As Decimal, ByVal m As Matrix) As Matrix
Dim result As New Matrix(m.shape.Item1, m.shape.Item2)
For i As Integer = 1 To m.shape.Item1
For j As Integer = 1 To m.shape.Item2
result.item(i, j) = Int(a < m.item(i, j))
Next
Next

Return result
End Operator 'COMPLETED -- Compares the decimal a with each item of the matrix; returns 1 where a < m_ij, else 0

Public Shared Operator >(ByVal a As Decimal, ByVal m As Matrix) As Matrix


Dim result As New Matrix(m.shape.Item1, m.shape.Item2)
For i As Integer = 1 To m.shape.Item1
For j As Integer = 1 To m.shape.Item2
result.item(i, j) = If(a > m.item(i, j), 1, 0) 'If(...) avoids VB's True -> -1 coercion
Next
Next
Return result
End Operator 'COMPLETED -- Compares a value with each item in the matrix elementwise; returns 1 where the item is smaller, else 0

Public Shared Operator /(ByVal a As Decimal, ByVal m As Matrix) As Matrix


Dim result As New Matrix(m.shape.Item1, m.shape.Item2)
For i As Integer = 1 To m.shape.Item1
For j As Integer = 1 To m.shape.Item2
result.values(i - 1, j - 1) = a / m.item(i, j)
Next
Next
Return result
End Operator 'COMPLETED -- each item in the matrix is set to m_ij -> a / m_ij

'All Operators defined

'Iterators

Public Overloads Iterator Function columns() As IEnumerable(Of Tensor)


For j As Integer = 1 To Me.shape.Item2
Dim result As New Matrix(Me.shape.Item1, 1)
For i As Integer = 1 To Me.shape.Item1
result.values(i - 1, 0) = Me.item(i, j) 'Copies the items of column j into a new matrix
Next
Yield result
Next
End Function 'COMPLETED -- Returns an enumerable of columns of the matrix

Public Shared Iterator Function submatrix(ByVal m As Matrix, ByVal kernelx As Integer, ByVal kernely As Integer, ByVal stridesx As Integer, ByVal stridesy As Integer) As IEnumerable(Of Matrix)
For i As Integer = 1 To m.getshape(0) - kernely + 1 Step stridesy
For j As Integer = 1 To m.getshape(1) - kernelx + 1 Step stridesx
Yield m.item(i, i + kernely - 1, j, j + kernelx - 1)
Next
Next
End Function 'COMPLETED -- Returns a collection of matrices, by sliding a window of shape (kernelx, kernely) with strides = (stridesx, stridesy)

Public Shared Iterator Function val(ByVal m As Matrix, Optional stepx As Integer = 1, Optional stepy As Integer = 1) As IEnumerable(Of Double)
For i As Integer = 1 To m.getshape(0) Step stepy
For j As Integer = 1 To m.getshape(1) Step stepx
Yield m.item(i, j)
Next
Next
End Function 'COMPLETED -- Returns items from a matrix, using a step size of (stepx, stepy)
'All iterators defined
'Matrix Class Completed

Public Shared Function norm(ByVal mean As Decimal, ByVal std As Decimal) As Double
Randomize()
Return Math.Sqrt(-2 * Math.Log(Rnd())) * Math.Cos(2 * Math.PI * Rnd()) * std + mean
End Function 'Used to create random normal numbers using the Box-Muller transform

End Class
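
The listing above completes the Matrix class. As a quick illustration of how the operators and helpers compose, the following is a minimal usage sketch (not part of the library source); it assumes the Matrix members compile exactly as listed, including the arange and print helpers:

' Hypothetical usage sketch for the Matrix class
Dim a As New Matrix(2, 2, 1) 'A 2x2 matrix with every item set to 1
Dim b As Matrix = Matrix.arange(2, 2) 'Incrementing values, useful for debugging
Dim c As Matrix = (a + b) * 2 - 1 'Elementwise operators defined above
c.print() 'Prints the resulting 2x2 matrix
Console.WriteLine(a Like b) 'True, as both matrices have the same shape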

6.3 NeuralDot.Volume

Imports NeuralDot

Public Class Volume


Implements Tensor

Public ReadOnly values As New List(Of Matrix)


Public ReadOnly shape As Tuple(Of Integer, Integer)
'The values variable is used to store all the elements of the Volume using matrices, where each matrix represents a layer
'The shape variable denotes the shape of the Volume in the form (height, width)

'Constructors
Public Sub New(ByVal initial_values As List(Of Matrix))
values = initial_values
shape = New Tuple(Of Integer, Integer)(initial_values(0).getshape(0), initial_values(0).getshape(1))
End Sub 'This constructor is used to initialise the volume with initial values

Public Sub New(ByVal h As Integer, ByVal w As Integer, ByVal d As Integer, Optional ByVal val As Double = 0)
For i As Integer = 1 To d
values.Add(New Matrix(h, w, val))
Next
shape = New Tuple(Of Integer, Integer)(h, w)
End Sub 'Constructor creates a volume of shape = (h, w, d) with all values set to "val"

Public Sub New(ByVal h As Integer, ByVal w As Integer, ByVal d As Integer, ByVal mean As Double, ByVal std As Double)
For i As Integer = 1 To d
values.Add(New Matrix(h, w, mean, std))
Next
shape = New Tuple(Of Integer, Integer)(h, w)
End Sub 'Constructor creates a volume of shape = (h, w, d) with all values normally distributed with a set mean and std

'All Constructors defined

Public Shared Function arange(ByVal h As Integer, ByVal w As Integer, ByVal d As Integer) As Volume
Dim result As New List(Of Matrix)
For k As Integer = 1 To d
result.Add(Matrix.arange(h, w) + (k - 1) * (h * w))
Next
Return New Volume(result)
End Function 'COMPLETED -- Returns a Volume that has all its values in incrementing order. Useful for debugging

'Accessors

Public Function getshape() As List(Of Integer) Implements Tensor.getshape


Return New List(Of Integer)({Me.shape.Item1, Me.shape.Item2, Me.values.Count})
End Function 'COMPLETED -- Function returns the shape of the Volume in the form (height, width, depth)

Public Overloads Sub print() Implements Tensor.print


For Each M In Items(Me)
M.print()
Console.WriteLine("")
Next
End Sub 'COMPLETED -- Function prints out all the elements in the Volume

Public Property item(ByVal i As Integer, ByVal j As Integer, ByVal k As Integer) As Double


Get
Return Me.values(k).item(i, j)
End Get
Set(value As Double)
Me.values(k).item(i, j) = value
End Set
End Property 'COMPLETED -- Property used to select and set element (i, j, k) in a volume

Public Property split(ByVal i_start As Integer, ByVal i_end As Integer, ByVal j_start As Integer, ByVal j_end As Integer, ByVal k As Integer) As Matrix
Get
Dim result As New Matrix(i_end - i_start + 1, j_end - j_start + 1)
For i As Integer = i_start To i_end
For j As Integer = j_start To j_end
result.item(i - i_start + 1, j - j_start + 1) = Me.item(i, j, k)
Next
Next
Return result
End Get
Set(ByVal value As Matrix)

For i As Integer = i_start To i_end
For j As Integer = j_start To j_end
Me.item(i, j, k) = value.item(i - i_start + 1, j - j_start + 1)
Next
Next
End Set
End Property 'COMPLETED -- Property used to set/select a portion of a volume

Public ReadOnly Property split(ByVal i_start As Integer, ByVal i_end As Integer, ByVal j_start As Integer, ByVal j_end As Integer) As Volume
Get
Dim result As New List(Of Matrix)
For Each M In Items(Me)
result.Add(M.item(i_start, i_end, j_start, j_end))
Next
Return New Volume(result)
End Get
End Property 'COMPLETED -- Property used to return a Volume of shape (i_end - i_start + 1, j_end - j_start + 1, depth),
'where each value corresponds to an item in the indexed volume

'Volume Operations

Public Function rotate(ByVal theta As Integer) As Volume


If theta = 0 Then
Return Me 'If theta is 0, then return identity
Else
Return op(AddressOf Matrix.rotate, Me).rotate(theta - 1) 'Else rotate each layer in the Volume, then rotate by (theta-1)*90 degrees
End If
End Function 'COMPLETED -- Function rotates the Volume by theta * 90 degrees

Public Shared Function j_rotate(ByVal v As Volume) As Volume


Dim result As New Volume(v.values.Count, v.shape.Item2, v.shape.Item1)
For k As Integer = 0 To v.shape.Item1 - 1
For i As Integer = 1 To v.values.Count
For j As Integer = 1 To v.shape.Item2
result.item(i, j, k) = v.item(k + 1, j, i - 1)
Next
Next
Next
Return result
End Function 'COMPLETED -- Rotates in the j direction by 90 degrees

Public Function normalize(Optional ByVal mean As Double = 0, Optional ByVal std As Double = 1) As Tensor Implements Tensor.normalize
Dim n As Integer = Me.shape.Item1 * Me.shape.Item2 * Me.values.Count
Dim means As Double = Me.values.Select(Function(x) Matrix.sum(x).item(1, 1)).Sum / n
Dim stds As Double = Math.Sqrt((Me.values.Select(Function(x) Matrix.sum(x * x).item(1, 1)).Sum - (means * means * n)) / (n - 1)) 'Uses the computed sample mean, not the parameter
Dim result As New List(Of Matrix)
For Each M In Volume.Items(Me)
Dim result As New List(Of Matrix)
For Each M In Volume.Items(Me)

result.Add((M - means) / stds)
Next
Return New Volume(result)
End Function 'COMPLETED -- Returns a volume whose layers are normalised using all the elements in the volume

Public Function mean(ByVal axis As Integer) As Matrix


Dim result As New Matrix(Me.shape.Item1, Me.shape.Item2)
If axis = 2 Then
For Each M In Me.values
result += M
Next
Return result / Me.values.Count
Else
Throw New System.Exception("Only axis = 2 has been implemented for Volume")
End If
End Function 'COMPLETED -- Returns the mean of a volume along a specified axis. Currently only works for axis = 2

Public Function transpose() As Volume


Return op(AddressOf Matrix.transpose, Me)
End Function 'COMPLETED -- Function returns the transpose of each layer in the volume

Public Sub transposeSelf() Implements Tensor.transposeSelf


For Each m In values
m.transposeSelf()
Next
End Sub ' COMPLETED -- Sub-routine transposes each layer in the volume

Public Function clone() As Tensor Implements Tensor.clone


Dim cloned As New List(Of Matrix)
For Each m In Items(Me)
cloned.Add(m)
Next
Return New Volume(cloned)
End Function 'COMPLETED -- Function returns a clone of the current volume

'All Volume Operations Defined

'Joint Operations

Public Shared Operator +(ByVal x As Volume, ByVal y As Volume) As Volume


Return op(x, AddressOf Matrix.add, y)
End Operator 'COMPLETED -- Adds two Volumes together elementwise

Public Shared Operator +(ByVal x As Volume, ByVal y As Decimal) As Volume


Return op(x, AddressOf Matrix.add, y)
End Operator 'COMPLETED -- Adds a scalar to a volume elementwise

Public Shared Operator +(ByVal y As Decimal, ByVal x As Volume) As Volume

Return op(x, AddressOf Matrix.add, y)
End Operator 'COMPLETED -- Adds a scalar to a Volume elementwise

Public Shared Operator -(ByVal x As Volume, ByVal y As Volume) As Volume


Return op(x, AddressOf Matrix.add, -1 * y)
End Operator 'COMPLETED -- Subtracts two Volumes elementwise

Public Shared Operator -(ByVal x As Volume, ByVal y As Decimal) As Volume


Return op(x, AddressOf Matrix.add, -y)
End Operator 'COMPLETED -- Subtracts a scalar from a Volume elementwise

Public Shared Operator -(ByVal y As Decimal, ByVal x As Volume) As Volume


Return y + (-1 * x)
End Operator 'COMPLETED -- Subtracts a Volume from a scalar elementwise

Public Shared Operator *(ByVal x As Volume, ByVal y As Volume) As Volume


Return op(x, AddressOf Matrix.multiply, y)
End Operator 'COMPLETED -- Applies elementwise multiplication on two Volumes

Public Shared Operator *(ByVal x As Volume, ByVal y As Decimal) As Volume


Return op(x, AddressOf Matrix.multiply, New Volume(x.shape.Item1, x.shape.Item2, x.values.Count, y)) 'Builds a constant volume of matching shape
End Operator 'COMPLETED -- Applies elementwise multiplication between a scalar and a Volume

Public Shared Operator *(ByVal x As Decimal, ByVal y As Volume) As Volume


Return y * x
End Operator 'COMPLETED -- Applies elementwise multiplication between a scalar and a Volume

Public Shared Operator /(ByVal x As Volume, ByVal y As Double) As Volume


Return x * (1 / y)
End Operator 'COMPLETED -- Divides each element of a Volume, i.e v_ij = v_ij / y

Public Shared Function conv2d(ByVal v As Volume, ByVal kernels As Volume, Optional stridesx As Integer = 1, Optional stridesy As Integer = 1, Optional padding As String = "valid") As Volume
'conv2d applies a convolution in 2 dimensions. This means that every 2d kernel is applied to every layer in the volume.
Dim result_values As New List(Of Matrix) : Dim all_channels As List(Of Matrix) = Items(v).ToList

For Each k In Items(kernels)


Dim temp As Matrix = Matrix.conv(all_channels(0), k, stridesx, stridesy, padding)
For Each M In all_channels.GetRange(1, all_channels.Count - 1)
temp += Matrix.conv(M, k, stridesx, stridesy, padding) 'Summing up the results of all the convolutions for that particular kernel
Next
result_values.Add(temp)
Next
Return New Volume(result_values)
End Function 'COMPLETED -- Applies 2d convolution

Public Shared Function maxpool(ByVal filter As Volume, ByVal kernely As Integer, ByVal kernelx As Integer, ByVal stridesy As Integer, ByVal stridesx As Integer) As Volume
Dim result As New List(Of Matrix)
For Each M In Items(filter)
result.Add(M.maxpool(kernelx, kernely, stridesx, stridesy))
Next
Return New Volume(result)
End Function 'COMPLETED -- Returns the maxpooling of a volume using a kernel of shape (kernelx, kernely) and step size = (stridesx, stridesy)

Public Shared Function op(ByVal x As Volume, ByVal f As Func(Of Matrix, Matrix, Matrix), ByVal y As Volume) As Volume
Dim result As New List(Of Matrix)
For i As Integer = 0 To x.values.Count - 1
result.Add(f.Invoke(x.values(i), y.values(i)))
Next
Return New Volume(result)
End Function 'COMPLETED -- Applies a function f(Matrix, Matrix) -> Matrix to all the layers of the Volumes x and y

Public Shared Function op(ByVal v As Volume, ByVal f As Func(Of Matrix, Matrix, Matrix), ByVal m As Matrix) As Volume
Dim result As New Volume(v.shape.Item1, v.shape.Item2, 0)
For Each x In Items(v)
result.values.Add(f.Invoke(x, m))
Next
Return result
End Function 'COMPLETED -- Applies the function "f" to each layer of the Volume "v" with the Matrix "m"

Public Shared Function op(ByVal m As Matrix, ByVal f As Func(Of Matrix, Matrix, Matrix), ByVal v As Volume) As Volume
Return op(v, f, m)
End Function 'COMPLETED -- Applies the function "f" to each layer of the Volume "v" with the Matrix "m"

Public Shared Function op(ByVal v As Volume, ByVal f As Func(Of Matrix, Double, Matrix), ByVal y As Double) As Volume
Dim result As New Volume(v.shape.Item1, v.shape.Item2, 0)
For Each x In Items(v)
result.values.Add(f.Invoke(x, y))
Next
Return result
End Function 'COMPLETED -- Applies the function "f(v, y)" to the volume "v" and the double "y" layer-wise

Public Shared Function op(ByVal y As Double, ByVal f As Func(Of Matrix, Double, Matrix), ByVal v As Volume) As Volume
Return Volume.op(v, f, y)
End Function 'COMPLETED -- Applies the function "f(v, y)" to the volume "v" and the double "y" layer-wise

Public Shared Function op(ByVal f As Func(Of Matrix, Matrix), ByVal v As Volume) As Volume
Dim result As New Volume(v.shape.Item1, v.shape.Item2, 0)
For Each x In Items(v)
result.values.Add(f.Invoke(x))
Next
Return result
End Function 'COMPLETED -- Applies a function layer-wise to a Volume

Public Shared Function op(ByVal f As Func(Of Double, Double), ByVal v As Volume) As Volume
Dim result As New Volume(v.shape.Item1, v.shape.Item2, 0)
For Each m In Items(v)
result.values.Add(Matrix.op(f, m))
Next
Return result
End Function 'COMPLETED -- Applies a function elementwise to a Volume
'All Joint Operations Defined

'Iterators
Public Overloads Shared Iterator Function Items(ByVal v As Volume) As IEnumerable(Of Matrix)
For Each M In v.values
Yield M
Next
End Function 'COMPLETED -- Returns an IEnumerable of all the layers in the Volume

Public Overloads Iterator Function Items() As IEnumerable(Of Tensor)


For Each Matrix In Me.values
Yield Matrix
Next
End Function 'COMPLETED -- Returns an IEnumerable of all the layers in the Volume

Public Shared Iterator Function subvolume(ByVal v As Volume, ByVal kernelx As Integer, ByVal kernely As Integer, ByVal stridesx As Integer, ByVal stridesy As Integer) As IEnumerable(Of Volume)
For i As Integer = 1 To v.shape.Item1 - kernely + 1 Step stridesy
For j As Integer = 1 To v.shape.Item2 - kernelx + 1 Step stridesx
Yield v.split(i, i + kernely - 1, j, j + kernelx - 1)
Next
Next
End Function 'COMPLETED -- Returns a collection of volumes, by sliding a window of shape (kernelx, kernely) with strides = (stridesx, stridesy) over each individual layer
' All Iterators defined

'Casts
Public Shared Function cast(ByVal matrixList As List(Of Matrix), ByVal rows As Integer, ByVal cols As Integer) As List(Of Volume)
Dim l_v As New List(Of Volume)
For Each M In matrixList
l_v.Add(Volume.cast(M, rows, cols))
Next
Return l_v
End Function 'COMPLETED -- Casts a list of matrices into volumes elementwise

Public Shared Function cast(ByVal v As Volume, ByVal rows As Integer, ByVal cols As Integer) As Matrix
If rows * cols <> v.shape.Item1 * v.shape.Item2 * v.values.Count Then
Throw New System.Exception("Matrix dimensions must be sufficient to store all the items in the Volume")
End If
Dim result As New Matrix(rows, cols)
Dim i As Integer = 0
For Each M In Volume.Items(v)
For Each k In Matrix.val(M)
result.item((Math.Truncate(i / result.getshape(1)) + 1), (i Mod result.getshape(1)) + 1) = k 'Assigning each element in result its corresponding value in the Volume "v"
i += 1
Next
Next
Return result
End Function 'COMPLETED -- Function used to cast a Volume into a matrix of shape = (rows, cols)

Public Shared Function cast(ByVal m As Matrix, ByVal rows As Integer, ByVal cols As Integer) As Volume
Dim result As New Volume(rows, cols, m.getshape(0) * m.getshape(1) / (rows * cols))
Dim i As Integer = 0
For Each d In Matrix.val(m)
result.item(((Math.Truncate(i / cols)) Mod rows) + 1, (i Mod cols) + 1, Math.Truncate(i / (cols * rows))) = d
i += 1
Next
Return result
End Function 'COMPLETED -- Function casts a matrix into a volume of shape = (rows, cols)
'All Casts defined
'Volume Class Completed

End Class
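
To show how the Volume operations above fit together, here is a minimal sketch (not part of the library source), assuming the Matrix and Volume classes compile as listed:

' Hypothetical usage sketch for the Volume class
Dim v As Volume = Volume.arange(4, 4, 2) 'A 4x4x2 volume with incrementing values
Dim kernels As New Volume(2, 2, 3, 0, 1) 'Three 2x2 kernels, drawn from N(0, 1)
Dim fmaps As Volume = Volume.conv2d(v, kernels) 'One output layer per kernel
Dim pooled As Volume = Volume.maxpool(fmaps, 2, 2, 1, 1) 'Maxpools each layer
Dim flat As Matrix = Volume.cast(fmaps, fmaps.shape.Item1 * fmaps.shape.Item2 * fmaps.values.Count, 1) 'Flattens to a column vector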

6.4 NeuralDot.Layer

Public Interface Layer(Of Out T As Tensor)


'This base interface Layer will be implemented by Dense, Conv, MaxPool and Reshape
'This base interface holds the common functions that all these layers will use
'This is a generic interface of type Tensor, as all layers must operate on Tensors

ReadOnly Property parameters As List(Of Tensor)


'The parameters property is common to all layers, as every layer needs to expose the variables it is storing
'This property is useful for back-prop and debugging

Function f(ByVal x As Tensor) As T

'Function will be used to forward propagate through a layer
Function clone() As Layer(Of T)
'Function will be used to clone a layer. This is useful when saving a model, as all layers need to be cloned when saving
Function update(ByVal learning_rate As Decimal, ByVal prev_delta As Tensor, ByVal ParamArray param() As Tensor) As Tensor
'This overload is used when a layer depends upon the previous layer's parameters. Users defining their
'own layers may need it, depending upon how forward propagation works within those layers

Function update(ByVal learning_rate As Decimal, ByVal prev_delta As Tensor) As Tensor


'This function updates the parameters using prev_delta, which is the gradient of the loss function w.r.t. the parameters
'The applicability of this function depends upon how forward propagation works in the layer.

Sub deltaUpdate(ByVal ParamArray deltaParams() As Tensor)


'This sub-routine updates the parameters being trained, using deltaParams as the respective gradients from backprop
End Interface
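
To make the contract concrete, the following is a minimal sketch (hypothetical, not part of the library) of a layer that satisfies the interface while leaving its input unchanged; the members that do not apply simply pass the gradient through or do nothing, mirroring how MaxPool and Reshape handle members that do not apply to them:

' Hypothetical identity layer, showing the minimum needed to satisfy Layer(Of Matrix)
Public Class IdentityLayer
Implements Layer(Of Matrix)
Private x_in As Matrix
Public ReadOnly Property parameters As List(Of Tensor) Implements Layer(Of Matrix).parameters
Get
Return New List(Of Tensor)({x_in}) 'No trainable parameters, so only the input is exposed
End Get
End Property
Public Function f(ByVal x As Tensor) As Matrix Implements Layer(Of Matrix).f
x_in = x
Return x_in 'Forward propagation leaves the input unchanged
End Function
Public Function clone() As Layer(Of Matrix) Implements Layer(Of Matrix).clone
Return New IdentityLayer
End Function
Public Function update(ByVal learning_rate As Decimal, ByVal prev_delta As Tensor, ByVal ParamArray param() As Tensor) As Tensor Implements Layer(Of Matrix).update
Return prev_delta 'The gradient passes straight through
End Function
Public Function update(ByVal learning_rate As Decimal, ByVal prev_delta As Tensor) As Tensor Implements Layer(Of Matrix).update
Return prev_delta
End Function
Public Sub deltaUpdate(ByVal ParamArray deltaParams() As Tensor) Implements Layer(Of Matrix).deltaUpdate
'Nothing to update, as the layer holds no trainable parameters
End Sub
End Class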

6.5 NeuralDot.Dense

Imports NeuralDot

Public Class Dense


Implements Layer(Of Matrix)

Private x_in, w, z, b, a As Matrix


Public ReadOnly act As Mapping 'User defined variable
Public ReadOnly units As Integer 'User defined variable

'x_in, w, z, b, a are all variables of type matrix that will be used by the net.
'Relationship between variables: z = matmul(w, x_in) + b : a = act(z), where w has shape (units, prev_units)
'w and b are the parameters that will be optimised to lower the cost
'x_in is the input into the layer and act is the activation for this current layer

Public ReadOnly Property parameters As List(Of Tensor) Implements Layer(Of Matrix).parameters
Get
Return New List(Of Tensor)({w, b, a, z, act.d(z)})
End Get
End Property 'Property returns parameters

Public Sub New(ByVal prevUnits As Integer, ByVal layerUnits As Integer, ByVal layerAct As Mapping, ByVal mean As Double, ByVal std As Double)
units = layerUnits : act = layerAct
w = New Matrix(layerUnits, prevUnits, mean, std)
b = New Matrix(layerUnits, 1, mean, std)
End Sub 'Constructor initialises the parameters that will be tuned

Public Function f(ByVal X As Tensor) As Matrix Implements Layer(Of Matrix).f 'Shape of X is (_units_prev, m)
x_in = X
z = Matrix.matmul(w, X) + b
a = act.f(z)
Return a
End Function 'Returns the output of the layer using the inputs fed into the layer

Public Overridable Overloads Function update(ByVal l_r As Decimal, ByVal prev_delta As Tensor, ByVal ParamArray param() As Tensor) As Tensor Implements Layer(Of Matrix).update
Dim grads As IEnumerable(Of Matrix) = gradient(prev_delta, param) 'grads returns (dw, db)
w -= l_r * grads(0) : b -= l_r * grads(1) 'Parameters are being updated
Return grads(1) 'Function returns db, as it will be needed for the next layer's update during back-prop
End Function 'Function updates the parameters using prev_delta and param

Public Overloads Function update(ByVal l_r As Decimal, ByVal prev_delta As Tensor) As Tensor Implements Layer(Of Matrix).update
'This update function is only used by the last layer in the Net, if it is a dense net
Dim grads As IEnumerable(Of Matrix) = gradient(prev_delta) 'gradient function returns (dw, db)
w -= l_r * grads(0) : b -= l_r * grads(1) 'Parameters are being updated
Return grads(1) 'Function returns db, as it will be needed for the next layer's update during back-prop
End Function

Public Function gradient(prev_delta As Matrix, ByVal ParamArray param() As Tensor) As IEnumerable(Of Matrix)
'prev_delta is the rate of change w.r.t. the previous layer's output, and param is the weight parameter from the layer above
Dim delta As Matrix = Matrix.matmul(Matrix.transpose(param(0)), prev_delta) * act.d(z)
Return New List(Of Matrix)({Matrix.matmul(delta, x_in.transpose), delta})
End Function 'Function returns the gradient in the form (dw, db)

Public Function gradient(prev_delta As Matrix) As IEnumerable(Of Matrix)


Return New List(Of Matrix)({Matrix.matmul(prev_delta, x_in.transpose), prev_delta}) 'Function returns the gradient using only the prev_delta.
'Function only used if this layer is the final layer in the Net
End Function

Public Sub deltaUpdate(ByVal ParamArray deltaParams() As Tensor) Implements Layer(Of Matrix).deltaUpdate
w += deltaParams(0)
b += deltaParams(1)
'Sub allows users to make their own updates. It takes in parameters, and then updates using those parameters
End Sub

Public Overridable Function clone() As Layer(Of Matrix) Implements Layer(Of Matrix).clone
Dim cloned As New Dense(Me.w.getshape(1), units, act, 0, 0)
cloned.w = w.clone : cloned.b = b.clone
Return cloned
End Function 'Function used to clone a layer when saving a model. Note: a forward propagation must be made before training the clone, as variables such as the layer outputs will not yet have been instantiated.

End Class
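
As a small check of the forward pass, the following sketch (hypothetical, not part of the source) builds a single Dense layer and propagates a random input through it, assuming Mapping.sigmoid is defined as in section 6.10:

' Hypothetical forward pass through one Dense layer
Dim layer As New Dense(4, 3, Mapping.sigmoid, 0, 0.1) '4 inputs, 3 units, weights ~ N(0, 0.1)
Dim x As New Matrix(4, 1, 0, 1) 'A random input column vector of shape (features, 1)
Dim a As Matrix = layer.f(x) 'Computes z = matmul(w, x) + b, then a = sigmoid(z); shape (3, 1)
a.print()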

6.6 NeuralDot.Conv

Imports NeuralDot

Public Class Conv


Implements Layer(Of Volume)
Public ReadOnly act As Mapping
Private x_in, filter, z, b, a As Volume
Public ReadOnly filters_depth, kernelx, kernely, stridesx, stridesy As Integer, padding As String 'User defined variables

'x_in, filter, z, b, a are all variables, with the trainable variables being only filter and b
'Relationship between variables is: z = conv2d(x_in, filter) + b : a = act(z)
'act is the activation function being used for this layer
'kernelx, kernely denote the width and height of the kernel being applied, respectively.
'stridesx, stridesy and padding denote the properties of the type of conv2d

Public Sub New(ByVal _filters_depth As Integer, ByVal _kernelx As Integer, ByVal _kernely As Integer, ByVal _stridesx As Integer, ByVal _stridesy As Integer, ByVal _padding As String, ByVal _act As Mapping, ByVal mean As Double, ByVal std As Double)
kernelx = _kernelx : kernely = _kernely : filters_depth = _filters_depth
stridesx = _stridesx : stridesy = _stridesy : act = _act : padding = _padding
filter = New Volume(kernely, kernelx, filters_depth, mean, std) : b = New Volume(1, 1, _filters_depth, mean, std)
End Sub 'Constructor initialises the parameters that will be tuned

Public ReadOnly Property parameters As List(Of Tensor) Implements Layer(Of Volume).parameters
Get
Return New List(Of Tensor)({filter, b, a})
End Get
'Property returns the parameters of the conv_layer
End Property

Public Sub deltaUpdate(ParamArray deltaParams() As Tensor) Implements Layer(Of Volume).deltaUpdate
filter += deltaParams(0)
b += deltaParams(1)
'Sub-routine allows the user to make their own update to the layer - useful when the user wants to create/test their own optimisation algorithms
End Sub

Public Overridable Function clone() As Layer(Of Volume) Implements Layer(Of Volume).clone


Dim cloned As New Conv(filters_depth, kernelx, kernely, stridesx, stridesy, padding, act, 0, 0)
cloned.filter = filter.clone : cloned.b = b.clone
Return cloned
'Function used to clone a layer used when saving a model.
End Function

Public Function f(x As Tensor) As Volume Implements Layer(Of Volume).f


'x is the input into the layer
x_in = x
z = Volume.conv2d(x_in, filter, stridesx, stridesy, padding) + b
a = Volume.op(AddressOf act.f, z)
Return a
End Function 'Returns the output of the layer using the inputs fed into the layer

Public Function update(l_r As Decimal, prev_delta As Tensor) As Tensor Implements Layer(Of Volume).update
Throw New NotImplementedException()
End Function

Public Function update(l_r As Decimal, _prev_dz As Tensor, ParamArray param() As Tensor) As Tensor Implements Layer(Of Volume).update
'param should be empty, as convolution doesn't require the previous weights to find its updates.
Dim dfilter As New List(Of Matrix)
Dim dz As Volume = _prev_dz.clone
Dim act_d As Volume = Volume.op(AddressOf act.d, z)
Dim dx As New List(Of Matrix)

'Following code finds dfilter, using _prev_dz
For filter_channel As Integer = 0 To filter.values.Count - 1
Dim temp As New Volume(kernely, kernelx, 1)
Dim i As Integer = 0
For Each kernel_window In Volume.subvolume(x_in, kernelx, kernely, stridesx, stridesy)
temp += kernel_window * dz.item(Math.Truncate(i / dz.shape.Item2) + 1, (i Mod dz.shape.Item2) + 1, filter_channel) * act_d.item(Math.Truncate(i / dz.shape.Item2) + 1, (i Mod dz.shape.Item2) + 1, filter_channel)
i += 1
Next
dfilter.Add(temp.mean(2))
Next

'The following code finds dx using dfilter
For dx_channel As Integer = 0 To x_in.values.Count - 1
Dim dx_channel_sum As New Volume(x_in.shape.Item1, x_in.shape.Item2, 1)
For f As Integer = 0 To filter.values.Count - 1
Dim k As Integer = 0
For i As Integer = 0 To x_in.shape.Item1 - kernely Step stridesy
For j As Integer = 0 To x_in.shape.Item2 - kernelx Step stridesx
dx_channel_sum.split(i + 1, i + kernely, j + 1, j + kernelx, 0) = filter.values(f) * dz.item(Math.Truncate(k / dz.shape.Item2) + 1, (k Mod dz.shape.Item2) + 1, f)
k += 1
Next
Next
Next
dx.Add(dx_channel_sum.mean(2)) 'Added once per input channel, after summing over all filters
Next

filter -= New Volume(dfilter) * l_r
b -= Volume.op(AddressOf Matrix.sum, _prev_dz) * l_r
Return New Volume(dx)
End Function
End Class

6.7 NeuralDot.Reshape

Imports NeuralDot

Public Class Reshape


Implements Layer(Of Matrix)

Private x_in As Volume


Public ReadOnly units, v_rows, v_cols, v_depth As Integer
'x_in is the Volume variable fed into the layer
'v_rows, v_cols, v_depth depends upon the previous layer

'This reshape layer isn't limited to transforming from a conv-layer to a dense-layer; it can be used to go from one user-defined layer of type Volume to another of type Matrix

Public Sub New(ByVal _v_rows As Integer, ByVal _v_cols As Integer, ByVal _v_depth As Integer)
v_rows = _v_rows : v_cols = _v_cols : v_depth = _v_depth : units = (_v_cols * _v_rows * _v_depth)
End Sub

Public ReadOnly Property parameters As List(Of Tensor) Implements Layer(Of Matrix).parameters
Get
Return New List(Of Tensor)({x_in})
End Get
End Property 'No parameters are being trained, so only x_in (input) is returned

Public Function clone() As Layer(Of Matrix) Implements Layer(Of Matrix).clone

Dim cloned As New Reshape(v_rows, v_cols, v_depth)
Return cloned
End Function 'Function used to clone a layer used when saving a model.

Public Function f(x As Tensor) As Matrix Implements Layer(Of Matrix).f


If x.GetType <> GetType(Volume) Then
Throw New System.Exception("Input needs to be a volume for reshape")
End If
x_in = x
Return Volume.cast(x_in, x_in.shape.Item1 * x_in.shape.Item2 * x_in.values.Count, 1)
End Function 'Returns the output of the layer using the inputs fed into the layer, after applying the reshape

Public Function update_d(l_r As Decimal, prev_delta As Tensor, ParamArray param() As Tensor) As Tensor Implements Layer(Of Matrix).update
Return Volume.cast(Matrix.matmul(Matrix.transpose(param(0)), prev_delta), x_in.shape.Item1, x_in.shape.Item2)
'Function casts the prev_deltas into a volume for the back-prop of the conv layers
End Function

Public Function update(l_r As Decimal, prev_delta As Tensor) As Tensor Implements Layer(Of Matrix).update
Throw New NotImplementedException()
End Function 'Function not necessary, as no parameters are being trained

Public Sub deltaUpdate(ParamArray deltaParams() As Tensor) Implements Layer(Of Matrix).deltaUpdate
Throw New NotImplementedException()
End Sub 'Sub not necessary, as no parameters are being trained

End Class

6.8 NeuralDot.MaxPool

Imports NeuralDot

Public Class MaxPool


Implements Layer(Of Volume)

Public ReadOnly stridesx, stridesy, kernelx, kernely As Integer 'User defined


Private x_in As Volume
'x_in is the variable fed into the layer,
'No parameters are being trained and no non-linear transformation is being applied, so no variables are being tuned
'stridesx, stridesy, kernelx, kernely describe how maxpooling will occur.

Public Sub New(ByVal _kernelx As Integer, ByVal _kernely As Integer, ByVal _stridesx As Integer, ByVal _stridesy As Integer)
kernelx = _kernelx : kernely = _kernely
stridesx = _stridesx : stridesy = _stridesy
End Sub

Public ReadOnly Property parameters As List(Of Tensor) Implements Layer(Of Volume).parameters
Get
Return New List(Of Tensor)({x_in})
End Get 'No parameters are being trained, so only x_in (input) is returned
End Property

Public Sub deltaUpdate(ParamArray deltaParams() As Tensor) Implements Layer(Of Volume).deltaUpdate
Throw New NotImplementedException()
End Sub 'No updates can occur, as no parameters are being trained

Public Function clone() As Layer(Of Volume) Implements Layer(Of Volume).clone


Return New MaxPool(kernelx, kernely, stridesx, stridesy)
End Function 'Function used to clone a layer used when saving a model.

Public Function f(x As Tensor) As Volume Implements Layer(Of Volume).f


x_in = x
Return Volume.maxpool(x, kernely, kernelx, stridesy, stridesx)
End Function 'Returns the output of the layer using the inputs fed into the layer

Public Function update(l_r As Decimal, prev_delta As Tensor) As Tensor Implements Layer(Of Volume).update
Throw New NotImplementedException()
End Function

Public Function update(l_r As Decimal, prev_delta As Tensor, ParamArray param() As Tensor) As Tensor Implements Layer(Of Volume).update
Dim dh As Volume = prev_delta
Dim result As New List(Of Matrix)
Dim x_channels = Volume.Items(x_in).ToList
Dim channel_num As Integer = 0

'Following code will find the gradient w.r.t the maxpooled elements.
'This algorithm can be derived through chain-rule and matrix-algebra.
For x_channel As Integer = 0 To x_channels.Count - 1
Dim x_slices As List(Of Matrix) = Matrix.submatrix(x_channels(x_channel), kernelx, kernely, stridesx, stridesy).ToList
Dim temp As New Matrix(x_in.shape.Item1, x_in.shape.Item2)
Dim k As Integer = 0

For i As Integer = 0 To x_in.shape.Item1 - kernely Step stridesy
For j As Integer = 0 To x_in.shape.Item2 - kernelx Step stridesx
temp.item(i + 1, i + kernely, j + 1, j + kernelx) += (x_slices(k).max - 0.00001 < x_slices(k)) * dh.item(Math.Truncate(k / dh.shape.Item2) + 1, (k Mod dh.shape.Item2) + 1, x_channel)
k += 1
Next
Next
result.Add(temp)

Next

Return New Volume(result)


End Function 'Function returns the gradient of the corresponding maxpool elements
End Class

6.9 NeuralDot.Net

Imports NeuralDot

Public Class Net

Public ReadOnly netLayers As New Stack(Of Layer(Of Tensor))


Public ReadOnly checkpoints As New List(Of Tuple(Of Stack(Of Layer(Of Tensor)), DateTime))
Public ReadOnly netFeatures As Integer, loss As Mapping
'Variable netLayers is used to store every layer in the Net, via a stack of layers
'The checkpoints variable stores all the saved models along with the respective times when they were saved
'The netFeatures variable is an integer denoting the number of features of the input
'The loss variable denotes the cost function being used.

Public Sub New(ByVal features As Integer, ByVal loss_function As Mapping)


'Constructing a Net only requires the number of features and the loss function used to train the Net.
netFeatures = features
loss = loss_function
End Sub

Public Sub AddDense(ByVal units As Integer, ByVal act As Mapping, Optional mean As Double = 0, Optional std As Double = 1)
'A dense layer applies an activation after a matrix multiplication, with the initial values set by the user.
'The shape of the transformation matrix = (units, units_in_prev_layer)
If netLayers.Count = 0 Then 'If no layer set, then units_in_prev = netFeatures
netLayers.Push(New Dense(netFeatures, units, act, mean, std))
ElseIf netLayers.Peek.GetType = GetType(Reshape) Then
'If the previous layer was a conv-layer, then units_in_prev = reshape.units
netLayers.Push(New Dense(DirectCast(netLayers.Peek(), Reshape).units, units, act, mean, std))
Else
netLayers.Push(New Dense(DirectCast(netLayers.Peek(), Dense).units, units, act, mean, std))
End If
End Sub

Public Sub AddConv(ByVal filters As Integer, ByVal kernelx As Integer, ByVal kernely As Integer, ByVal stridesx As Integer, ByVal stridesy As Integer, ByVal act As Mapping, Optional ByVal padding As String = "valid", Optional ByVal mean As Double = 0, Optional ByVal std As Double = 1)
netLayers.Push(New Conv(filters, kernelx, kernely, stridesx, stridesy, padding, act, mean, std))
End Sub 'This sub-routine adds a conv-layer, with the properties defined by the user

Public Sub AddMaxPool(ByVal kernelx As Integer, ByVal kernely As Integer, ByVal stridesx As Integer, ByVal stridesy As Integer)
netLayers.Push(New MaxPool(kernelx, kernely, stridesx, stridesy))
End Sub 'This sub-routine adds a MaxPooling layer, with properties defined by the user

Public Sub AddReshape(ByVal v_rows As Integer, ByVal v_cols As Integer, ByVal v_depth As Integer)
netLayers.Push(New Reshape(v_rows, v_cols, v_depth))
End Sub 'This sub adds a reshape-layer used to reshape a volume into a matrix. This is used to go from one type of Tensor layer to another

Public Function predict(ByVal x As Tensor) As Tensor


Dim pred As Tensor = x
For Each layer In netLayers.Reverse
pred = layer.f(pred)
Next
Return pred
End Function 'Function returns prediction using the current weight values of the layers

Public Function predict(ByVal x As IEnumerable(Of Tensor)) As IEnumerable(Of Tensor)


Dim result As IEnumerable(Of Tensor) = x.Select(Function(g) predict(g))
Return result
End Function 'Function returns prediction for an enumerable of inputs

Public Sub save()


Dim ops_new As New Stack(Of Layer(Of Tensor))
For Each layer In Me.netLayers.Reverse
ops_new.Push(layer.clone)
Next
checkpoints.Add(New Tuple(Of Stack(Of Layer(Of Tensor)), Date)(ops_new, DateTime.Now.ToString("yyyy/MM/dd HH:mm:ss")))
End Sub 'Sub-routine saves the current state of the Net, along with the time saved, to the checkpoint list

Public Function load(ByVal check_point As Integer) As Net


If check_point > checkpoints.Count - 1 Or check_point < 0 Then
Throw New System.Exception("Check point is out of range")
End If
Dim prev_model As New Net(Me.netFeatures, Me.loss)
For Each layer In Me.checkpoints(check_point).Item1.Reverse
prev_model.netLayers.Push(layer.clone)
Next
Return prev_model
End Function 'Function returns Net saved.

End Class
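
Putting the layer classes together, here is a minimal sketch (hypothetical, not part of the source) of constructing and querying a small dense network; it assumes the Mapping fields defined in section 6.10:

' Hypothetical construction of a 2-16-1 dense network
Dim model As New Net(2, Mapping.squared_error) '2 input features, squared-error loss
model.AddDense(16, Mapping.tanh) 'Hidden layer of 16 units
model.AddDense(1, Mapping.sigmoid) 'Output layer of 1 unit
Dim yhat As Tensor = model.predict(New Matrix(2, 1, 0, 1)) 'Forward pass on a random input
model.save() 'Checkpoints the untrained weights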

6.10 NeuralDot.Mapping

Public Class Mapping


Public Delegate Function activation(ByVal x As Matrix) As Matrix
Public Delegate Function loss(ByVal x As Matrix, ByVal y As Matrix) As Matrix
Protected f_, d_

'activation is a delegate used to map a matrix of numbers to another matrix of numbers via a function "f" which maps a double to a double,
'i.e. f(double) -> double

'loss is a delegate used to find the error for backprop: loss(x, y) -> e,
'where "x" represents the prediction, "y" represents the true value and "e" represents the error

Public Sub New(ByVal loss_f As loss, ByVal loss_d As loss)


'The function loss_f is used to find the loss given the output and the actual value.
'The function loss_d is used to find the derivative of the loss w.r.t. the output
f_ = loss_f
d_ = loss_d
End Sub 'This constructor is used to generate a new loss function

Public Sub New(ByVal activation_f As activation, ByVal _d As activation)


'The function activation_f defines the activation function used in a layer.
'The function _d is used to find the derivative of the activation function (_f) w.r.t. its input
f_ = activation_f
d_ = _d
End Sub 'This constructor is used to generate a new activation function

Function f(ByVal x As Matrix) As Matrix


Return f_.Invoke(x)
End Function 'Returns the output using the activation function chosen, given an input "x"

Function f(ByVal x As Matrix, ByVal y As Matrix) As Matrix


Return f_.invoke(x, y)
End Function 'Returns the output using the loss functions chosen, given inputs "x,y"

Function d(ByVal x As Matrix) As Matrix


Return d_.Invoke(x)
End Function 'Function returns the derivative of the activation function chosen, for a particular "x" value

Function d(ByVal x As Matrix, ByVal y As Matrix) As Matrix


Return d_.Invoke(x, y)
End Function 'Function returns the derivative of the loss function chosen, for particular "x" and "y" values

'The following are the activation functions defined, and therefore usable by the network
Public Shared linear As New Mapping(Function(x) x, Function(x) New Matrix(x.getshape(0), x.getshape(1), 1))
Public Shared relu As New Mapping(Function(x) x.max(0), Function(x) 0 < x)
Public Shared sigmoid As New Mapping(AddressOf sigmoidAct, AddressOf sigmoidDerivative)
Public Shared tanh As New Mapping(Function(x) Matrix.op(AddressOf Math.Tanh, x), Function(x) (1 / Matrix.op(AddressOf Math.Cosh, x)) * (1 / Matrix.op(AddressOf Math.Cosh, x)))
Public Shared swish As New Mapping(Function(x) x * sigmoid.f(x), Function(x) sigmoid.f(x) + x * sigmoid.d(x)) 'd/dx x*s(x) = s(x) + x*s'(x) by the product rule
Public Shared softmax_act As New Mapping(Function(x) Matrix.exp(x - x.max()) / Matrix.sum(Matrix.exp(x - x.max()), 0).item(1, 1), Function(x) softmax_act.f(x) * (1 - softmax_act.f(x)))

'The following code for the sigmoid function is for extra speed-up, as the derivative of sigmoid is s*(1-s), where s = sigmoid(x)
Private Shared Function sigmoidAct(ByVal x As Matrix) As Matrix
Return 1 / (1 + Matrix.op(AddressOf Math.Exp, -1 * x))
End Function

Private Shared Function sigmoidDerivative(ByVal x As Matrix) As Matrix


Dim s = sigmoidAct(x)
Return s * (1 - s)
End Function

'The following are the loss functions defined, and therefore usable by the network
'The x value represents the prediction by the net, whereas the y value represents the actual value
Public Shared squared_error As New Mapping(Function(x, y) (x - y) * (x - y) * 0.5, Function(x, y) (x - y))
Public Shared softmax As New Mapping(Function(x, y) -1 * Matrix.op(AddressOf Math.Log, x) * y, Function(x, y) (x - y))

End Class
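
Because both constructors just wrap delegates, a user can define further activations without touching the class itself. For instance, a leaky ReLU (hypothetical, not part of the source) could be written using the Matrix.op helper shown above:

' Hypothetical leaky-ReLU mapping built from the activation constructor; the one-argument lambdas select the activation overload
Dim leaky_relu As New Mapping(Function(x) Matrix.op(Function(v) Math.Max(0.01 * v, v), x), Function(x) Matrix.op(Function(v) If(v > 0, 1.0, 0.01), x))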

6.11 NeuralDot.netData

Imports NeuralDot

Public Class netData


'This class is responsible for setting up the data to be used by the network, making it easier for the user to set up the training data
Public ReadOnly data As IEnumerable(Of Tuple(Of Tensor, Tensor))
Public ReadOnly xdata, ydata As IEnumerable(Of Tensor)

'The data variable stores the x and y data as tuples
'The xdata and ydata variables store the inputs and the corresponding outputs, respectively

Public Sub New(ByVal datax As IEnumerable(Of Tensor), ByVal datay As IEnumerable(Of Tensor))
data = join(datax, datay)
xdata = datax : ydata = datay
End Sub

Public Shared Function join(ByVal datax As IEnumerable(Of Tensor), ByVal datay As IEnumerable(Of Tensor)) As IEnumerable(Of Tuple(Of Tensor, Tensor))
Dim data As IEnumerable(Of Tuple(Of Tensor, Tensor)) = datax.Zip(datay, Function(first, second) New Tuple(Of Tensor, Tensor)(first, second))
Return data
End Function 'This function returns the xdata and ydata after being joined together

Public Function oneHot(ByVal numClasses As Integer) As IEnumerable(Of Matrix)


Dim encoded_y As IEnumerable(Of Matrix) = ydata.Select(Function(X) DirectCast(X, Matrix).oneHot(numClasses)) 'Applying oneHot to all items in the ydata list
Return encoded_y
End Function 'This function one-hot encodes the ydata. Useful for data manipulation

Public Function normalise(ByVal mean As Double, ByVal std As Double) As netData


Dim normed As IEnumerable(Of Tensor) = Me.xdata.Select(Function(x) x.normalize(mean, std)) 'This line normalises each x data point
Return New netData(normed, ydata)
End Function 'This function normalises the xdata, removing the need for the user to do it.

Public Shared Function toConvNetData(ByVal xfileName As String, ByVal yfileName As String, ByVal rows As Integer, ByVal cols As Integer) As netData
Dim xdata As IEnumerable(Of Tensor) = Volume.cast(toMatrix(xfileName), rows, cols)
Dim ydata As IEnumerable(Of Tensor) = toMatrix(yfileName)
Return New netData(xdata, ydata)
End Function 'This function extracts the file that stores the xdata and stores it as a list of Volumes. The ydata is stored as a list of matrices

Public Shared Function toMatrix(ByVal fileName As String) As List(Of Matrix)


Dim data As New List(Of Matrix)
Using myreader As New Microsoft.VisualBasic.FileIO.TextFieldParser(fileName) 'Using a stream reader
myreader.TextFieldType = FileIO.FieldType.Delimited : myreader.SetDelimiters(",")
Dim currentrow() As Double
While Not myreader.EndOfData
currentrow = Array.ConvertAll(myreader.ReadFields().ToArray, Function(x) Double.Parse(x)).ToArray()
Dim xval(currentrow.Count - 1, 0) As Double
Dim i As Integer = 0
For Each item In currentrow : xval(i, 0) = item : i += 1 : Next
data.Add(New Matrix(xval))
End While
End Using
Return Volume.j_rotate((New Volume(data))).values

End Function
'This function extracts data stored in a csv file and stores the data in a matrix.

End Class
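
A short sketch of preparing training data with this class (the file names here are hypothetical placeholders, not from the source):

' Hypothetical data-preparation sketch using netData
Dim xs As List(Of Matrix) = netData.toMatrix("xtrain.csv") 'One column vector per example
Dim ys As List(Of Matrix) = netData.toMatrix("ytrain.csv") 'Corresponding labels
Dim train As New netData(xs, ys) 'Pairs each input with its label
Dim normed As netData = train.normalise(0, 1) 'Normalises the inputs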

6.12 NeuralDot.Optimiser

Public MustInherit Class Optimizer


Public ReadOnly model As Net, dataxy As IEnumerable(Of Tuple(Of Tensor, Tensor))
Public iterations As Integer = 0, losses As New List(Of Tensor) 'The losses list stores the loss for every iteration
'The model variable stores the Net being trained, by reference. This means that when an update occurs, the Net is updated
'dataxy stores the data that will be used to train the net

Public Sub New(ByRef _net As Net, ByVal xydata As IEnumerable(Of Tuple(Of Tensor, Tensor)))
model = _net
dataxy = xydata
End Sub

MustOverride Function run(ByVal learning_rate As Decimal, ByVal printLoss As Boolean, ByVal batchSize As Integer, ByVal ParamArray param() As Decimal) As List(Of Tensor)
'The "run" function is MustOverride, as every optimiser must have a method to train the net using a mini-batch.
'There is no need to hardcode batch or stochastic gradient descent, as they are just special cases of mini-batch gradient descent, i.e. batchsize = m and batchsize = 1, respectively.
'The iterations variable describes the number of training iterations used to train the net
'If printLoss = True, then the loss is printed out on each training epoch
'param() denotes the parameters that will be used by the optimiser in training the net
'After each training epoch, the error is stored in a list, which is returned by this function

MustOverride Sub resetParameters() 'This sub-routine resets the parameters being used to train the net. This includes the iterations variable.

Public Function calculateCost(ByVal xydata As IEnumerable(Of Tuple(Of Tensor, Tensor))) As Matrix
Dim temp As New Matrix(1, 1)
For j As Integer = 0 To xydata.Count - 1
temp += model.loss.f(model.predict(xydata(j).Item1), xydata(j).Item2)
Next
Return temp / xydata.Count
End Function
'This function returns the average cost of the net using the current weights

Public Function splitdata(ByVal batchSize As Integer) As List(Of IEnumerable(Of Tuple(Of Tensor, Tensor)))
Dim batchdata As New List(Of IEnumerable(Of Tuple(Of Tensor, Tensor))) 'This list will store all the data for the separate batches

For batchNum As Integer = 0 To dataxy.Count / batchSize - 1
Dim temp As New List(Of Tuple(Of Tensor, Tensor)) 'The temp list will store the examples for a particular batch of gradient descent
For n As Integer = 0 To batchSize - 1
temp.Add(dataxy(batchNum * batchSize + n))
Next
batchdata.Add(temp.AsEnumerable)
Next
Return batchdata
End Function 'This function organises all the data examples into separate batches for minibatch gradient descent

Public Function calculateGradients(ByVal xypoints As IEnumerable(Of Tuple(Of Tensor, Tensor))) As Tuple(Of List(Of Matrix), List(Of Matrix))

Dim pred As Tensor
Dim errors As New List(Of Matrix)
For Each point In xypoints
pred = model.predict(point.Item1)
errors.Add(model.loss.d(pred, point.Item2) * model.netLayers.Peek.parameters.Last)
Next
Dim deltas As New Stack(Of Tensor)
deltas.Push((New Volume(errors) / xypoints.Count).mean(2)) 'Averaging over the batch being processed, not the full data set

Dim d As IEnumerable(Of Matrix) = DirectCast(model.netLayers(0), Dense).gradient(deltas.Peek)
Dim dw As New List(Of Matrix)({d(0)})
Dim db As New List(Of Matrix)({d(1)})
For layer As Integer = 1 To model.netLayers.Count - 1
Dim dlayer As IEnumerable(Of Matrix) = DirectCast(model.netLayers(layer), Dense).gradient(deltas.Peek, model.netLayers(layer - 1).parameters(0))
dw.Add(dlayer(0))
db.Add(dlayer(1))
deltas.Push(db(layer))
Next
Return New Tuple(Of List(Of Matrix), List(Of Matrix))(dw, db)
End Function 'This function calculates the gradients for a batch of xypoints, i.e. mini-batch gradient descent

End Class

6.13 NeuralDot.GradientDescentOptimiser

Public Class GradientDescentOptimizer


Inherits Optimizer

Public Sub New(ByRef _net As Net, ByVal xydata As IEnumerable(Of Tuple(Of Tensor, Tensor)))
MyBase.New(_net, xydata)

resetParameters()
End Sub

Public Overrides Sub resetParameters()


iterations = 0 : losses.Clear()
End Sub

Public Overrides Function run(ByVal learning_rate As Decimal, ByVal printLoss As Boolean, ByVal batchSize As Integer, ByVal ParamArray param() As Decimal) As List(Of Tensor)
Dim batches As List(Of IEnumerable(Of Tuple(Of Tensor, Tensor))) = MyBase.splitdata(batchSize) 'This line splits the data into the different batches

Dim pred As Tensor

For Each batch In batches 'Looping over every batch

'The following code finds the average derivative w.r.t. the output of the net for this particular mini-batch
Dim errors As New List(Of Matrix) 'Declared per batch, so gradients from earlier batches do not accumulate
For Each vector In batch
pred = model.predict(vector.Item1)
errors.Add(model.loss.d(pred, vector.Item2) * model.netLayers.Peek.parameters.Last)
Next
Dim deltas As New Stack(Of Tensor)
deltas.Push((New Volume(errors) / batch.Count).mean(2))

'Following code updates each layer using deltas
model.netLayers.Peek.update(learning_rate, deltas.Peek)
For layer As Integer = 1 To model.netLayers.Count - 1
deltas.Push(model.netLayers(layer).update(learning_rate, deltas.Peek, model.netLayers(layer - 1).parameters(0)))
Next
Next

losses.Add(calculateCost(dataxy)) 'Loss is calculated using the new updates
If printLoss Then
Console.WriteLine("Error for epoch {0} is: ", iterations)
losses.Last.print()
End If
iterations += 1
Return losses
End Function 'Function applies mini-batch gradient descent to train the network

End Class
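
A minimal training-loop sketch (hypothetical; "model" and "train" are assumed to have been set up as in the earlier examples):

' Hypothetical mini-batch training with the gradient descent optimiser
Dim sgd As New GradientDescentOptimizer(model, train.data)
For epoch As Integer = 1 To 100
sgd.run(0.1D, True, 32) 'Learning rate 0.1, print the loss, batch size 32
Next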

6.14 NeuralDot.AdvancedOptimisers

Public MustInherit Class AdvancedOptimisers


Inherits Optimizer

Public Sub New(ByRef _net As Net, ByVal xydata As IEnumerable(Of Tuple(Of Tensor, Tensor)))
MyBase.New(_net, xydata)

'The following code, makes sure that the net is a dense net only
For Each layer In _net.netLayers
If layer.GetType <> GetType(Dense) Then
Throw New System.Exception("For AdvancedOptimisers, the network must be a dense neural network")
End If
Next
model.predict(xydata.ToList(0).Item1) 'Running this line avoids object-reference errors, as any non-initialised variables are then initialised.
End Sub

Public MustOverride Overrides Function run(ByVal learning_rate As Decimal, ByVal printLoss As Boolean, ByVal batchSize As Integer, ByVal ParamArray param() As Decimal) As List(Of Tensor)
Public MustOverride Overrides Sub resetParameters()

End Class

6.15 NeuralDot.Momentum

Class Momentum
Inherits AdvancedOptimisers
Private v_dw, v_db As New List(Of Matrix)

Public Sub New(ByVal _net As Net, ByVal xydata As IEnumerable(Of Tuple(Of Tensor, Tensor)))
MyBase.New(_net, xydata)
resetParameters()
End Sub

Public Overrides Sub resetParameters()


'Following code initialises the variables required for momentum optimisation
v_dw.Clear() : v_db.Clear() : losses.Clear() : iterations = 0
For n As Integer = 0 To model.netLayers.Count - 1
Dim layer_par As List(Of Tensor) = model.netLayers(n).parameters
v_dw.Add(New Matrix(layer_par(0).getshape(0), layer_par(0).getshape(1)))
v_db.Add(New Matrix(layer_par(1).getshape(0), layer_par(1).getshape(1)))
Next
End Sub

Public Overrides Function run(ByVal l_r As Decimal, ByVal printLoss As Boolean, ByVal batchSize As Integer, ParamArray param() As Decimal) As List(Of Tensor)
If param.Count <> 1 Then
Throw New System.Exception("Momentum requires 1 parameter for training")
End If

Dim batches As List(Of IEnumerable(Of Tuple(Of Tensor, Tensor))) = MyBase.splitdata(batchSize) 'This line splits the data into the different batches

For Each batch In batches 'Looping through each batch, as we are doing mini-batch gradient descent
Dim d As Tuple(Of List(Of Matrix), List(Of Matrix)) = MyBase.calculateGradients(batch)
Dim dw As List(Of Matrix) = d.Item1 : Dim db As List(Of Matrix) = d.Item2 'This line retrieves the gradients for w & b

'Following code applies the momentum optimiser technique to the network


For layer As Integer = 0 To model.netLayers.Count - 1
v_dw(layer) = (v_dw(layer) * param(0) + (1 - param(0)) * dw(layer))
v_db(layer) = (v_db(layer) * param(0) + (1 - param(0)) * db(layer))
model.netLayers(layer).deltaUpdate(-l_r * v_dw(layer), -l_r * v_db(layer))
Next
Next

'Following code will store the new loss after the updates have been done
losses.Add(calculateCost(dataxy))
If printLoss Then
Console.WriteLine("Error for epoch {0} is: ", iterations)
losses.Last.print()
End If
iterations += 1
Return losses

End Function

End Class

6.16 NeuralDot.RMS

Class RMS
Inherits AdvancedOptimisers
Private s_dw, s_db As New List(Of Matrix)

Public Sub New(ByVal _net As Net, ByVal xydata As IEnumerable(Of Tuple(Of Tensor, Tensor)))
MyBase.New(_net, xydata)
resetParameters()
End Sub

Public Overrides Sub resetParameters()


'Following code initialises the variables required for RMS optimisation
s_dw.Clear() : s_db.Clear() : losses.Clear() : iterations = 0

For n As Integer = 0 To model.netLayers.Count - 1
Dim layer_par As List(Of Tensor) = model.netLayers(n).parameters
s_dw.Add(New Matrix(layer_par(0).getshape(0), layer_par(0).getshape(1)))
s_db.Add(New Matrix(layer_par(1).getshape(0), layer_par(1).getshape(1)))
Next
End Sub

Public Overrides Function run(ByVal l_r As Decimal, ByVal printLoss As Boolean, ByVal
,→ batchSize As Integer, ParamArray param() As Decimal) As List(Of Tensor)
If param.Count <> 1 Then
Throw New System.Exception("RMS requires 1 parameter for training")
End If

Dim batches As List(Of IEnumerable(Of Tuple(Of Tensor, Tensor))) = MyBase.splitdata(batchSize) 'This line splits the data into the different batches

For Each batch In batches 'Looping through each batch, as we are doing mini-batch gradient descent
Dim d As Tuple(Of List(Of Matrix), List(Of Matrix)) = calculateGradients(batch)
Dim dw As List(Of Matrix) = d.Item1 : Dim db As List(Of Matrix) = d.Item2
'Following code applies the RMS optimisation technique to the network
For layer As Integer = 0 To model.netLayers.Count - 1
s_dw(layer) = s_dw(layer) * param(0) + (1 - param(0)) * dw(layer) * dw(layer)
s_db(layer) = s_db(layer) * param(0) + (1 - param(0)) * db(layer) * db(layer)
model.netLayers(layer).deltaUpdate(-l_r * dw(layer) * (1 / (Matrix.op(Function(x) Math.Sqrt(x), s_dw(layer)) + 0.000001)), -l_r * db(layer) * (1 / (Matrix.op(Function(x) Math.Sqrt(x), s_db(layer)) + 0.000001)))
Next
Next

'Following code will store the new loss after the updates have been done
losses.Add(calculateCost(dataxy))
If printLoss Then
Console.WriteLine("Error for epoch {0} is: ", iterations)
losses.Last.print()
End If
iterations += 1
Return losses
End Function

End Class

6.17 NeuralDot.ADAM

Public Class AdamOptimizer


Inherits AdvancedOptimisers
Private v_dw, v_db, s_dw, s_db As New List(Of Matrix)

Public Sub New(ByVal _net As Net, ByVal xydata As IEnumerable(Of Tuple(Of Tensor, Tensor)))
MyBase.New(_net, xydata)
resetParameters()
End Sub

Public Overrides Sub resetParameters()


'Following code initialises the variables required for ADAM optimisation
v_dw.Clear() : v_db.Clear() : s_dw.Clear() : s_db.Clear() : losses.Clear() : iterations
,→ = 0
For n As Integer = 0 To model.netLayers.Count - 1
Dim layer_par As List(Of Tensor) = model.netLayers(n).parameters
v_dw.Add(New Matrix(layer_par(0).getshape(0), layer_par(0).getshape(1)))
v_db.Add(New Matrix(layer_par(1).getshape(0), layer_par(1).getshape(1)))
s_dw.Add(New Matrix(layer_par(0).getshape(0), layer_par(0).getshape(1)))
s_db.Add(New Matrix(layer_par(1).getshape(0), layer_par(1).getshape(1)))
Next
End Sub

Public Overrides Function run(ByVal l_r As Decimal, ByVal printLoss As Boolean, ByVal
,→ batchSize As Integer, ParamArray param() As Decimal) As List(Of Tensor)
If param.Count <> 2 Then
Throw New System.Exception("Adam requires 2 parameters for training")
End If

Dim batches As List(Of IEnumerable(Of Tuple(Of Tensor, Tensor))) = MyBase.splitdata(batchSize) 'This line splits the data into the different batches
Dim decay_term As Decimal = Math.Sqrt(1 - Math.Pow(param(1), iterations)) / (1 - Math.Pow(param(0), iterations) + 0.000001) 'The bias-correction (decay) term for this particular iteration

For Each batch In batches 'Looping through each batch, as we are doing mini-batch gradient descent
Dim d As Tuple(Of List(Of Matrix), List(Of Matrix)) = calculateGradients(batch)
Dim dw As List(Of Matrix) = d.Item1 : Dim db As List(Of Matrix) = d.Item2

'The following code applies Adam Optimization technique to the network


For layer As Integer = 0 To model.netLayers.Count - 1
s_dw(layer) = s_dw(layer) * param(1) + (1 - param(1)) * dw(layer) * dw(layer)
s_db(layer) = s_db(layer) * param(1) + (1 - param(1)) * db(layer) * db(layer)
v_dw(layer) = (v_dw(layer) * param(0) + (1 - param(0)) * dw(layer))
v_db(layer) = (v_db(layer) * param(0) + (1 - param(0)) * db(layer))
model.netLayers(layer).deltaUpdate(-l_r * v_dw(layer) * (1 / (Matrix.op(Function(x) Math.Sqrt(x), s_dw(layer)) + 0.00001)) * decay_term, -l_r * v_db(layer) * (1 / (Matrix.op(Function(x) Math.Sqrt(x), s_db(layer)) + 0.00001)) * decay_term) 'Single bias-corrected Adam step; the earlier duplicate, uncorrected RMS-style update has been removed
Next
Next

'Following code will store the new loss after the updates have been done
losses.Add(calculateCost(dataxy))
If printLoss Then
Console.WriteLine("Error for epoch {0} is: ", iterations)
losses.Last.print()
End If
iterations += 1
Return losses

End Function
End Class
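
Usage mirrors the other optimisers, with the two decay rates passed through param(); a sketch (hypothetical, reusing "model" and "train" from the earlier examples):

' Hypothetical Adam training run
Dim adam As New AdamOptimizer(model, train.data)
adam.run(0.001D, True, 32, 0.9D, 0.999D) 'param(0) = momentum decay, param(1) = RMS decay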
