ABBREVIATIONS
AE Autoencoder
CNN Convolutional Neural Network
DBN Deep Belief Network
DL Deep Learning
DNN Deep Neural Network
HSIs Hyperspectral Images
ReLU Rectified Linear Unit
RS Remote Sensing
SAE Stacked Autoencoder
UML Unified Modeling Language
WBS Work Breakdown Structure
CHAPTER 1
INTRODUCTION
This chapter highlights the purpose of the project and the approach we have adopted. It also
outlines the aims and objectives, as well as the scope of the project.
1.1. Introduction
Satellite images are remote sensing (RS) images. RS image classification plays an important
role in Earth observation technology that uses RS data [1, 2]. Satellite imagery classification
can pose scientific and practical challenges due to the characteristics of RS data. However,
with current trends in deep learning (DL) techniques, approaches to satellite imagery
classification with DL have achieved significant breakthroughs [3].
Deep learning is a machine learning technique that learns features and tasks directly from data.
This data can be images, text, or sound. DL architectures are characterized as artificial neural
networks, usually involving more than two layers. In DL, learning is performed through a deep,
multi-layered network of interconnected "neurons" [4].
Compared with traditional machine learning algorithms, DL networks exploit feature
representations learned exclusively from data. They do not require hand-crafted features, which
are mostly designed on the basis of domain-specific knowledge; this eliminates the problems
associated with handcrafted features. Instead of relying on shallow, manually engineered
features, DL techniques automatically learn informative representations of raw input data at
multiple levels of abstraction. Such learned features have been used successfully in many
machine vision tasks [5].
Among the available DL architectures, the one of particular interest to this project is the
convolutional neural network (CNN). CNNs are renowned for their efficiency in image
classification and have produced significant results in satellite imagery classification. A CNN
convolves learned features with input data and uses 2D convolutional layers, making it well
suited to processing 2D data such as images. A CNN works by extracting features directly from
images. The relevant features are not pretrained; they are learned while the network trains on a
collection of images. This automated feature extraction makes CNNs and other DL techniques
highly effective for computer vision tasks such as object classification.
Li et al. [5] highlight two important CNN approaches in satellite image classification: pixel-wise
classification for hyperspectral images (HSIs), and scene classification for high-resolution
aerial or satellite images. The first is concerned with identifying the category to which each pixel
in a given satellite image belongs; the second aims to automatically assign a semantic label
to each scene in the image. This project implements the second approach, using a CNN to
extract and classify features from satellite imagery by assigning semantic labels (tags).
Aim
The aim of this project is to implement a set of algorithms that utilize deep neural networks to
analyze satellite images, identify basic features, and classify the identified information through
tagging.
Objectives
In order to fulfill the aim set for this project, the model must meet the following objectives:
1. Review the literature in the areas of image processing and deep neural networks.
2. Determine the appropriate methodology for designing and implementing the
proposed model, based on an analysis of each methodology.
3. Formulate requirements based on the techniques identified in the literature.
4. Use the analyzed requirements to design the steps involved in image processing.
5. Implement the image classification processes using a suitable programming language
and an appropriate open-source library.
6. Assess the developed code by applying various testing techniques to ensure that the
test cases developed for the model conform to the requirements specification.
7. Evaluate the implemented code and judge whether the derived results achieve the
project's aims and objectives.
The literature on topics such as neural networks, image processing, and classification will be
researched and reviewed to gain information on various approaches to satellite image
classification. Based on the knowledge acquired, an appropriate development methodology
will be chosen to plan the progress of the project.
The information gathered from the literature review will be analyzed to determine the
requirements of the proposed model, on which the design of the model will be based. The
model will then be implemented, trained, and tested to ensure there are no errors. Finally, the
project will be evaluated against its aim and objectives.
The scope of this project involves reading satellite images from storage, applying different
image processing techniques to analyze them, and then making predictions in the form of tags.
The dataset used in this project is taken from Planet, a satellite imaging company. Planet
released the dataset in 2017 for a Kaggle competition, as mentioned in the introduction. It
consists of more than 100,000 images from the Amazon basin, and the competition involved
labelling the atmospheric and ground features in the images. Each image is 256 x 256 pixels
and has RGB and near-infrared channels. The images released by Planet are high-resolution
images, which facilitates the identification and classification of features using deep learning
techniques.
The dataset has 17 unique feature labels. Four of these are weather labels (clear, cloudy, haze,
and partly cloudy), exactly one of which occurs in each image. The remaining 13 are land
labels, which may co-occur with each other.
Our training set consists of 40,479 images in jpg and tiff formats. The jpg images consist of
three bands (red, blue, and green), while the tiff images consist of four bands (red, blue, green,
and near-infrared). The jpg images are in 8-bit color format (pixel values range from 0 to 255),
whereas the tiff files are in 16-bit color format (pixel values range from 0 to 65535). The test
set consists of two folders, test-jpg and test-additional, which contain 40,669 and 20,552
images respectively. The test set does not contain any labels, so we can only evaluate our
model on it by submitting a prediction file to Kaggle, which returns the F2 score of our model.
CHAPTER 2
LITERATURE REVIEW
Image classification refers to the task of extracting information from an image. Its primary
objective is to detect, identify, and classify the features occurring in an image in terms of the
class those features represent on the ground [7]. Image classification has undergone various
phases of intense research and experimentation, as evidenced by the variety of approaches,
techniques, and methods for image processing and classification. The majority of recently
documented literature shows that deep neural networks have attained tremendous results in
image classification [1,2,3,4,8]. The most commonly used deep learning models for remote
sensing or satellite image classification are convolutional neural networks, stacked
autoencoders, and deep belief networks [5].
Multiple autoencoders (AEs) can be stacked to form an SAE, which forwards the code learned
from the previous AE to the next in order to accomplish a given task [5].
In max pooling, a window slides across the input feature map with a given stride (the number
of units to move on each pass). At each step, the maximum value within the window is pooled
into an output matrix.
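As a concrete illustration (our own minimal NumPy sketch, not code from the reviewed literature), a 2x2 max pooling with stride 2 can be written as:

import numpy as np

def max_pool2d(feature_map, window=2, stride=2):
    # Slide a window across the feature map; keep the maximum of each patch
    h, w = feature_map.shape
    out_h = (h - window) // stride + 1
    out_w = (w - window) // stride + 1
    pooled = np.zeros((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = feature_map[i * stride:i * stride + window,
                                j * stride:j * stride + window]
            pooled[i, j] = patch.max()
    return pooled

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 0],
              [1, 8, 3, 4]])
print(max_pool2d(x))  # [[6 4]
                      #  [8 9]]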
Li et al. [5] summarize the learning and working process of a CNN in two stages: (a) network
training and (b) feature extraction and classification. The first stage has two parts: a forward
part and a backward part. In the forward part, the input images are fed through the network to
obtain an abstract representation, which is used to compute the loss cost with regard to the
given ground-truth labels. Based on the loss cost, the backward part computes the gradients of
each parameter of the network. All the parameters are then updated in response to the gradients
in preparation for the next forward computation cycle. After sufficient iterations of training, in
the second stage, the trained network can be used to extract deep features and classify unknown
images.
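The forward and backward parts of a single training step can be sketched as follows (a minimal illustration using TensorFlow's GradientTape; model, images, and labels are placeholders, not the project's actual objects):

import tensorflow as tf

loss_fn = tf.keras.losses.BinaryCrossentropy()  # multi-label loss
optimizer = tf.keras.optimizers.Adam()

def train_step(model, images, labels):
    # Forward part: feed images through the network and compute the loss
    # cost against the ground-truth labels
    with tf.GradientTape() as tape:
        predictions = model(images, training=True)
        loss = loss_fn(labels, predictions)
    # Backward part: compute the gradient of the loss for each parameter
    gradients = tape.gradient(loss, model.trainable_variables)
    # Update all parameters in preparation for the next forward cycle
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss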
An activation function applies a non-linear transformation to its input, making the network
capable of learning and performing more complex tasks. There are different types of activation
function; here we review three that are frequently used in neural networks, to decide which is
most appropriate for our project.
2.5.1. Sigmoid
The sigmoid function is a widely used non-linear activation function of the form
f(x) = 1/(1 + e^(-x)) [10]. This means that if we have multiple neurons with the sigmoid
function as their activation function, the output is non-linear as well. The function's range is
0 to 1, as shown in figure 2.
2.5.2. Tanh
The tanh function is a scaled version of the sigmoid function. It is of the form
tanh(x) = 2·sigmoid(2x) − 1, or written directly as tanh(x) = 2/(1 + e^(-2x)) − 1 [10].
2.5.3. Rectified Linear Unit (ReLU)
The rectified linear unit (ReLU) is the most widely used activation function. It is defined as
f(x) = max(0, x) [10]. The ReLU function is non-linear, which means we can easily
backpropagate errors and have multiple layers of neurons activated by the ReLU function.
For multi-label classification, the output layer uses a sigmoid activation so that each neuron's
value lies between 0 and 1. This means that if a particular class is present, the value of the
corresponding neuron will be close to one, and otherwise close to zero.
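For reference, the three activation functions, and the thresholding of sigmoid outputs in a multi-label setting, can be expressed in a few lines of NumPy (our own illustrative sketch):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # range (0, 1)

def tanh(x):
    return 2.0 * sigmoid(2.0 * x) - 1.0    # scaled sigmoid, range (-1, 1)

def relu(x):
    return np.maximum(0.0, x)              # zero for negatives, identity otherwise

# Multi-label output: one sigmoid per class, thresholded at 0.5
scores = np.array([2.3, -1.7, 0.4])        # raw outputs of three class neurons
print(sigmoid(scores) > 0.5)               # [ True False  True]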
CHAPTER 3
FEASIBILITY STUDY
The aim of the feasibility study is to determine whether it would be economically and
technically feasible to develop the system. The feasibility study covers the following aspects:
S/N | Hardware
1 | RAM: 16 GB
2 | CPU: Intel Core i5
4 | Storage: minimum of 10 GB
[Table: software applications used in the project and their uses (rows not recovered)]
In scheduling this project's tasks, we considered the following activities:
• Identification of all the tasks needed to complete the project.
• Breakdown of large tasks into smaller tasks.
• Determination of dependencies among different tasks.
• Allocation of resources to each task.
• Determination of the starting and ending dates for each activity.
• Determination of the critical path, the chain of activities that determines the duration
of the project.
The following sections present the schedule information for this project.
3.3.1. Project Work Breakdown Structure
The project tasks are broken down into the following phases (estimated effort in hours):
• Planning and system requirement phase (90 hrs): study of the existing system (30 hrs);
information gathering for the proposed plan (20 hrs); analysing the information (40 hrs).
• Feasibility study phase (12 hrs): economic feasibility (3 hrs); technical feasibility (3 hrs);
operational feasibility (3 hrs); schedule feasibility (3 hrs).
• System design phase (15 hrs): design diagrams (15 hrs).
• Coding phase (90 hrs): coding (90 hrs).
• Testing phase (68 hrs): unit testing (17 hrs); integration testing (17 hrs); validation
testing (17 hrs); system testing (17 hrs).
• Documentation phase (50 hrs): gathering all the documents (30 hrs); documenting (20 hrs).
3.3.2. Project Gantt Chart
A Gantt chart is a tool commonly used in project management to show project activities (tasks
or events) against time. It allows the project to be tracked and shows additional information
about the various phases and tasks of the project. Below is the Gantt chart for the project
Satellite Imagery Processing for Automated Tagging.
[Gantt chart: Planning, Requirement Analysis, Design, Coding, Testing, and Documentation
plotted against the months January through May]
CHAPTER 4
SYSTEM DESIGN
This chapter presents the two UML design diagrams used in this project: one showing user
interaction with the system, the other showing the flow of data processing within the system.
UML (Unified Modeling Language) is a widely used method for visualizing and documenting
software system designs. It includes a set of graphic notation techniques for creating visual
models of software systems, and is used to specify, visualize, modify, construct, and document
the artifacts of an object-oriented software system under development (Mishra 1997).
4.1. Use Case Diagram
[Use case diagram: the user can input data and view predictions]
Figure 4.1: Use Case Diagram for Satellite Imagery Processing for Automated Tagging
4.2. Flowchart Diagram
A flowchart is a graphical representation of the sequence of steps and decisions needed to
perform a process. Each step is represented by a symbol that illustrates the description of that
process step, and the symbols are linked together with arrows showing the direction of process
flow. Below is the flowchart representation of the processes involved in Satellite Imagery
Processing for Automated Tagging.
[Flowchart: Start → Data Acquisition → Data Preprocessing → Sampling → Training Dataset →
Exploratory Data Analysis → Model Training → Hyperparameter Optimization → Model
Evaluation → if the validation error is low, output the Final Classification Model and end;
otherwise, return to model training]
Figure 4.2: Flowchart Diagram for Satellite Imagery Processing for Automated Tagging
Below is a brief explanation of each of the processes indicated in the flowchart.
In data acquisition, we collect the data from its source for processing. The data required by
this model consists of 40,479 images in three bands (red, blue, and green), with each image
carrying one or more tags such as haze, primary, water, or agriculture. In sampling, we divide
the dataset into training and validation sets. The output of this process is the training dataset,
which is used for data exploration. This is followed by preprocessing, which involves
normalizing the data and creating batches. The preprocessed data is fed into our training
model, as sketched below.
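A minimal sketch of the normalization and batching step might look like the following (assuming the jpg images are already loaded into a NumPy array; the names are illustrative, not the project's actual code):

import numpy as np

def make_batches(images, labels, batch_size=32):
    # Normalize 8-bit pixel values from [0, 255] to [0, 1]
    images = images.astype("float32") / 255.0
    # Shuffle, then yield mini-batches for training
    indices = np.random.permutation(len(images))
    for start in range(0, len(images), batch_size):
        batch = indices[start:start + batch_size]
        yield images[batch], labels[batch]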
CHAPTER 5
IMPLEMENTATION
5.1. Data Exploration and Analysis
Data Description
Table 5.2: Data used in project for both training and testing of models
The table above describes the dataset, which consists of four folders:
1. train-jpg: jpg training images in three bands (RGB), with labels
2. train-tif-v2: tiff training images in four bands (RGB and near-infrared, NIR)
3. test-jpg: test images without labels
4. test-jpg-additional: additional test images
Analyzing tiff images:
# Import necessary libraries
import tifffile as tiff

# Read one training image and inspect its dimensions
img = tiff.imread("D:/N/satellite/data/train-tif-v2/train_30.tif")
img.shape
(256, 256, 4)
The tiff image consists of four bands: red, blue, green, and near-infrared. It uses 16-bit color
channels, i.e. the pixel intensities range from 0 to 65535, compared with jpg, which uses 8-bit
color channels.
array ([[[ 4934, 4159, 3139, 8409],
[ 4869, 4179, 3215, 8346],
[ 4941, 4172, 3192, 8218],
...,
Figure 5.2: Near infrared image
Figure 5.3: Natural RGB image
Vegetation indices:
The NDVI (Normalized Difference Vegetation Index) is used for assessing the health of
vegetation and can be calculated as:
NDVI = (NIR − Red) / (NIR + Red)
The NDWI (Normalized Difference Water Index) is used for finding water bodies and can be
calculated as:
NDWI = (Green − NIR) / (Green + NIR)
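Both indices can be computed directly from the four-band tiff loaded earlier, assuming the band order stated above (red, blue, green, near-infrared); this is an illustrative sketch:

import tifffile as tiff

# Load the 16-bit image and promote to float for the ratio arithmetic
img = tiff.imread("D:/N/satellite/data/train-tif-v2/train_30.tif").astype("float64")
red, green, nir = img[..., 0], img[..., 2], img[..., 3]

eps = 1e-10  # guard against division by zero
ndvi = (nir - red) / (nir + red + eps)
ndwi = (green - nir) / (green + nir + eps)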
The network uses two consecutive convolutional layers followed by a max pooling operation
to extract features from the input image. In the first convolutional layer, we perform a
convolution using 32 filters of size 3x3. In the second layer, we perform another convolution
using 64 filters of the same size. In the next layer, we perform a max pooling operation with a
2x2 pooling window, which downsamples the representation to 62 x 62. After max pooling,
the representation is flattened and passed through a multi-layer perceptron (MLP) to carry out
the classification task. This fully connects the flattened representation to 17 output neurons,
with the activation of each output neuron indicating whether the corresponding class is present
or not.
A ReLU activation is applied to the output of each convolutional layer, and on the final output
we apply a sigmoid activation, which gives the probability, between 0 and 1, of each class
being present.
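Putting the pieces together, the base model can be sketched in Keras as follows. This is a reconstruction from the description above, not the project's exact code; the 128x128 input size is an assumption consistent with the 62x62 feature map mentioned earlier:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(128, 128, 3)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),             # 124x124 -> 62x62
    layers.Flatten(),
    layers.Dense(17, activation="sigmoid"),  # one probability per label
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])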
In the first training run, as shown in figures 12 and 13, we noticed a gradual increase in
accuracy with a corresponding drop in the loss after each epoch. After 25 epochs we obtained
the following values for the training and validation data respectively.
Training loss: 0.1184 Training Accuracy: 0.9538
Validation loss: 0.1385 Validation Accuracy: 0.9447
Figure 5.10: Graph of learning rate
Based on the results of the first architecture, we decided to add two more convolutional
layers, consisting of 128 and 256 filters of size 3x3 respectively. In this architecture, shown
in figure 5.10, we performed max pooling after every two convolutions.
We also implemented the VGG16 model, shown in figure 5.11, in an attempt
to arrive at the model with the best accuracy; a sketch of how this can be done follows.
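The sketch below shows one way VGG16 can be adapted to this 17-label task (using Keras' built-in VGG16; the input size and the choice to train from scratch are assumptions, not details taken from the project):

from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# VGG16 convolutional base without its ImageNet classification head
base = VGG16(include_top=False, weights=None, input_shape=(128, 128, 3))
x = layers.Flatten()(base.output)
outputs = layers.Dense(17, activation="sigmoid")(x)  # multi-label head
model = models.Model(inputs=base.input, outputs=outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])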
With the VGG16 architecture we obtained the following results from training the
model:
Training loss: 0.0717 Training Accuracy: 0.9722
Validation loss: 0.0867 Validation Accuracy: 0.9666
Finally, we implemented the DenseNet architecture [11]. The approach used in this model
differs from the conventional approach to building convolutional neural networks. Usually, as
a CNN gets deeper, the path from the input layer to the output layer becomes so long that
gradients can vanish before reaching the other side. DenseNet addresses this problem by
simplifying the connectivity pattern between layers: each layer receives the feature maps of
all preceding layers as input. The figure below gives an idea of how a DenseNet architecture
is built, and the sketch that follows illustrates the dense connectivity pattern.
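The key building block can be sketched in Keras as follows (an illustration of the connectivity pattern of [11], not the project's full DenseNet); each layer's output is concatenated with all the feature maps that came before it:

from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=32):
    # Each layer receives the concatenated feature maps of all preceding layers
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)
        y = layers.Activation("relu")(y)
        y = layers.Conv2D(growth_rate, (3, 3), padding="same")(y)
        x = layers.Concatenate()([x, y])  # shortcut: reuse earlier features
    return x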
Below are the results obtained from each of the models on the training dataset.
Table 5.5: Classification report for DenseNet model on training data
Table 5.7: Classification report for Ensemble DenseNet and VGG16 model on training data
Comparing the results obtained from the four models, we observe a gradual improvement in
both training and validation accuracy, with a corresponding drop in the loss. This indicates
that the deeper the convolutional network, the higher the proportion of accurate predictions.
However, there are exceptions where an architecture with fewer layers, as in the case of the
base model, performs better than deeper ones, as we will see in the testing phase of each model
after training.
CHAPTER 6
This chapter presents the results obtained from testing each of the models we built or
implemented. Testing is done using the training dataset, which has the true labels of the
images. The available test dataset does not have true labels against which we could evaluate
the accuracy of each model's predictions on a particular image. However, we evaluate our
predictions by submitting the predicted labels to Kaggle, which returns the F2 score of the
model.
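For reference, the F2 score weights recall more heavily than precision; it can also be computed locally on labelled data with scikit-learn (a sketch with toy label vectors, not the project's data):

import numpy as np
from sklearn.metrics import fbeta_score

y_true = np.array([[1, 0, 1], [0, 1, 1]])  # ground-truth multi-label vectors
y_pred = np.array([[1, 0, 0], [0, 1, 1]])  # thresholded model predictions
print(fbeta_score(y_true, y_pred, beta=2, average="samples"))  # mean F2 per image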
[Tests 1-4: sample test images with the predicted tags for each (figures not recovered)]
Model | F2 on Validation | F2 on Kaggle Submission | #Parameters
[table rows not recovered]
6.2. Loss Function of Base Model for different Activation Functions
The table below shows the loss and accuracy achieved by the base model with each of the
different activation functions.
Activation Function | Loss | Accuracy
Sigmoid | 0.2747 | 0.9036
Tanh | 0.2559 | 0.9053
ReLU | 0.1917 | 0.9242
Table 6.6: Results for various activation functions on the base model
6.3. Discussion
Comparing the results from the base model to the ensemble model shows an across-the-board
improvement. The base model is quite good at predicting the primary and clear labels, which
makes sense since the data contains the most training examples of these types. The base models
perform moderately well on most of the common atmosphere and ground features, but score
an average F1 of 0.57 on nearly all of the rare features. This is the main deficiency of the base
models: their failure to identify rare labels that appear similar to one another. On distinctively
unique features, however, they perform quite well. This deficiency is also reflected in the
results obtained from the other models we implemented. The models' inability to make good
predictions on rare labels stems from the fact that few examples of these labels are available
in the training dataset, so the models become biased toward the labels with higher counts. To
resolve this issue, various techniques can be used to augment the rare-label examples,
including undersampling, oversampling, and synthetic sampling.
Undersampling involves randomly deleting images carrying the more frequent labels so that
the labels become closely or evenly distributed. Oversampling, on the other hand, increases
the number of images with rare labels by duplicating them with augmentation techniques such
as rotation, shifting, and cropping. Synthetic sampling uses techniques that synthetically
manufacture observations of rare labels. A minimal sketch of oversampling follows.
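The sketch below shows one way oversampling with simple flip/rotate augmentation could be implemented (our own illustration with hypothetical array names, not the project's exact code):

import numpy as np

def oversample_label(images, labels, label_idx, factor=3):
    # Find the images that carry the rare label
    rare = np.where(labels[:, label_idx] == 1)[0]
    if len(rare) == 0:
        return images, labels
    extra_imgs, extra_lbls = [], []
    for _ in range(factor - 1):
        for i in rare:
            img = images[i]
            if np.random.rand() < 0.5:
                img = np.fliplr(img)                     # horizontal flip
            img = np.rot90(img, k=np.random.randint(4))  # random 90-degree rotation
            extra_imgs.append(img)
            extra_lbls.append(labels[i])
    # Append the augmented copies to the original dataset
    return (np.concatenate([images, np.array(extra_imgs)]),
            np.concatenate([labels, np.array(extra_lbls)]))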
To address the class imbalance problem described above, we implemented the oversampling
technique. With it, we were able to improve the predictions for the oversampled class. The
figure below shows the improvement in the precision and recall of the slash_burn class.
Figure 6.5: Improvement in the training results for the oversampled class
The improvement recorded for the oversampled class shows that the overall accuracy of the
model can be improved. However, implementing this technique requires a large amount of
time, from data preparation through appropriate labelling to training. Nonetheless, this area
needs further exploration to find an appropriate and efficient way to implement these
techniques.
CONCLUSION
This report documents the project objectives, feasibility study, design diagrams, and
implementation of Satellite Imagery Processing for Automated Tagging. It highlights the
project development phases, along with some intuition about how a convolutional neural
network can be fine-tuned to improve performance and accuracy.
The aim of the project was to build a model that could automatically tag satellite images with
little or no human effort required. We achieved this by building our own base model and then
implementing high-performing architectures used in image classification. The results obtained
from the models were then evaluated with appropriate metrics.
The project has broadened our general knowledge of image processing and machine learning.
The results obtained from the various models implemented in this project indicate some
important factors to consider when building a convolutional neural network. These factors
include the size of the available dataset and how the labels are distributed within it, the depth
of the network, the choice of activation function, and the nature of the classification problem.
To be truly effective, a multi-label classifier like the one we have implemented, which attempts
to provide better information about a given location, should be able to identify activities such
as mining, logging, and slash-and-burn.
REFERENCES
[1] Li, Miao, et al. "A review of remote sensing image classification techniques: The role of spatio-
contextual information." European Journal of Remote Sensing 47.1 (2014): 389-411.
[2] Dos Santos, Jefersson A., et al. "Efficient and effective hierarchical feature propagation." IEEE
Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7.12 (2014):
4632-4643.
[3] Gardner, Daniel, and David Nichols. "Multi-label Classification of Satellite Images with Deep
Learning." http://cs231n.stanford.edu/reports/2017/pdfs/908.pdf 10/03/2019
[4] Richa Bhatia "Understanding the Difference Between Deep Learning & Machine Learning"
https://www.analyticsindiamag.com/understanding-difference-deep-learning-machine-
learning/ 13/2/2019.
[5] Li, Ying, et al. "Deep learning for remote sensing image classification: A survey." Wiley
Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8.6 (2018): e1264.
[6] Wei, Yunchao, et al. "CNN: Single-label to multi-label." arXiv preprint arXiv:1406.5726 (2014).
[7] Kalra, Kanika, Anil Kumar Goswami, and Rhythm Gupta. "A comparative study of supervised
image classification algorithms for satellite images." International Journal of Electrical,
Electronics and Data Communication 1.10 (2013): 10-16.
[8] Lu, Dengsheng, and Qihao Weng. "A survey of image classification methods and techniques
for improving classification performance." International journal of Remote sensing 28.5
(2007): 823-870.
[11] Huang, Gao, et al. "Densely connected convolutional networks." Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition (2017): 4700-4708.