
ABSTRACT

KEYWORDS: Satellite Imagery, Image Tagging, Convolutional Neural Network (CNN).


With the advancements in satellite imaging technology, there is an abundance of high-resolution,
high-quality imagery collected daily from around the globe by many satellites. Utilizing these
image resources is highly desirable for data extraction and analysis. One of the fundamental
problems in managing these growing image resources is automatic image tagging. Image tagging is
the task of assigning human-friendly tags to an image so that the semantic tags better reflect the
content of the image and therefore help users access the image. The quality of image tagging
depends on the quality of concept modeling, which builds a mapping from concepts to visual images.
Significant progress has been made on image tagging, especially in the area of deep learning. The
convolutional neural network (CNN), for instance, has proved very effective in image classification.
This project, titled Satellite Imagery Processing for Automated Tagging, aims to demonstrate how
deep learning techniques such as CNNs provide a good framework for classification and automatic
tagging of satellite imagery.

LIST OF TABLES

Table Title Page


3.1 Hardware tools ............................................................................................... .………10
3.2 Software tools .............................................................................................. .………11
3.3 Work Breakdown Structure .......................................................................... .………12
3.4 Gantt chart .................................................................................................... ….……13
5.1 Name of the image with its corresponding labels ......................................... .………17
5.2 Data used in project for model training and testing ...................................... .………18
5.3 Classification report for Base Model on training data .................................. .………25
5.4 Classification report for second Base Model on training data ...................... .………25
5.5 Classification report for DenseNet Model on training data .......................... …….…26
5.6 Classification report for VGG16 Model on training data ............................. …….…26
5.7 Classification report for Ensemble of DenseNet & VGG16 on training data………27
6.1 Predicted Labels for Test image 1 ................................................................ ….……29
6.2 Predicted Labels for Test image 2 ................................................................ ….……30
6.3 Predicted Labels for Test image 3 ................................................................ ….……31
6.4 Predicted Labels for Test image 4 ................................................................ ….……32
6.5 Results obtained from Kaggle Submission for different Models ................ ….……32
6.6 Results for various activation functions on the Base Model ...................... ….……33

LIST OF FIGURES

Figure Title Page


2.1 Three key parts of CNN................................................................................. 6
2.2 Function graph of Sigmoid ............................................................................ 7
2.3 Function graph of Tanh ................................................................................. 7
2.4 Function graph of ReLU ................................................................................ 8
4.1 Use Case Diagram ......................................................................................... 14
4.2 Flow Chart Diagram ...................................................................................... 15
5.1 Co-occurrence matrix of each label ............................................................... 18
5.2 Near infrared image ....................................................................................... 19
5.3 Natural RGB image ....................................................................................... 19
5.4 Normalized Difference Vegetation and Water Index .................................... 19
5.5 Project Base Model Architecture ................................................................... 20
5.6 Graph of accuracy and loss ............................................................................ 21
5.7 Graph of validation accuracy and loss ........................................................... 21
5.8 Graph of training accuracy and loss .............................................................. 22
5.9 Graph of validation accuracy and loss ........................................................... 22
5.10 Graph of learning rate .................................................................................... 23
6.1 Test Image 1 .................................................................................................. 28
6.2 Test Image 2 .................................................................................................. 29
6.3 Test Image 3 .................................................................................................. 30
6.4 Test Image 4 .................................................................................................. 31
6.5 Outcome of training data on the oversampled class ...................................... 34

ABBREVIATIONS

AE Autoencoder
CNN Convolutional Neural Network
DBN Deep Belief Network
DL Deep Learning
DNN Deep Neural Network
HSIs Hyperspectral Images
ReLU Rectified Linear Unit
RS Remote Sensing
SAE Stacked Autoencoder
UML Unified Modeling Language
WBS Work Breakdown Structure

CONTENTS

CERTIFICATE
EXAMINATION CERTIFICATE
DECLARATION
ACKNOWLEDGEMENT
ABSTRACT ............................................................................................................................................ i
LIST OF TABLES ................................................................................................................................ ii
LIST OF FIGURES ............................................................................................................................. iii
ABBREVIATIONS .............................................................................................................................. iv
CHAPTER 1 .......................................................................................................................................... 1
INTRODUCTION................................................................................................................................. 1
1.1. Introduction ................................................................................................................................ 1

1.2. Project Aim and Objectives ...................................................................................................... 2

1.3. Research approach..................................................................................................................... 2

1.4. Project Scope .............................................................................................................................. 3

1.5. Project Dataset ........................................................................................................................... 3

CHAPTER 2 .......................................................................................................................................... 4
LITERATURE REVIEW ................................................................................................................ 4

2.1. Image Classification ................................................................................................................... 4

2.2. Deep belief networks .................................................................................................................. 4

2.3. Stacked autoencoders ................................................................................................................ 4

2.4. Convolutional Neural Networks ............................................................................................... 5

2.4.1. Convolution Layers ............................................................................................................. 5

2.4.2. Pooling Layer ...................................................................................................................... 5

2.4.3. Fully Connected Layer ....................................................................................................... 6

2.5. Activation Function ................................................................................................................... 6

2.5.1. Sigmoid................................................................................................................................. 7

2.5.2. Tanh ..................................................................................................................................... 7

2.5.3. Rectified Linear Unit (ReLU) ............................................................................................ 8

2.6. Choice of Activation Function .................................................................................................. 8


2.7. Choosing Hyperparameters ..................................................................................................... 9

CHAPTER 3 ........................................................................................................................................ 10
FEASIBILITY STUDY ...................................................................................................................... 10
3.1. Technical Feasibility ................................................................................................................ 10

3.2. Implementation Feasibility...................................................................................................... 11

3.3. Schedule Feasibility ................................................................................................................. 11

3.3.1. Project Work Breakdown Structure ............................................................................. 12

3.3.2. Project Gantt Chart .......................................................................................................... 13

3.4. Economic Feasibility ................................................................................................ 13

CHAPTER 4 ........................................................................................................................................ 14
SYSTEM DESIGN .............................................................................................................................. 14
4.1. Use Case Diagram .................................................................................................................... 14

CHAPTER 5 ........................................................................................................................................ 17
IMPLEMENTATION ........................................................................................................................ 17
5.1. Exploratory Data Analysis ............................................................................... 17

5.2. The Base Model ........................................................................................................................ 20

5.3. Model Training......................................................................................................................... 21

TESTING AND REPORTS ............................................................................................................... 28


6.1 Test Results ................................................................................................................................ 28

6.2. Loss Function of Base Model for different Activation Functions ........................................ 33

6.3. Discussion.................................................................................................................................. 33

CONCLUSION ................................................................................................................................... 35
REFERENCES .................................................................................................................................... 36
CHAPTER 1

INTRODUCTION

This chapter highlights the purpose of the project and the approach we have adopted. It also
outlines the aims and objectives, as well as the scope of the project.

1.1. Introduction
Satellite images are remote sensing (RS) images. RS image classification plays an important
role in earth observation technology that uses RS data [1, 2]. Satellite imagery classification
can pose scientific and practical challenges due to the characteristics of RS data. However,
with the current trends in deep learning (DL) techniques, approaches to satellite imagery
classification with DL have achieved significant breakthroughs [3].
Deep learning is a machine learning technique that learns features and tasks directly from data.
This data can be images, text, or sound. DL architectures are artificial neural networks, usually
involving more than two layers. Learning is performed in DL through a deep, multi-layered
network of interconnected “neurons” [4].
Compared with traditional machine learning algorithms, DL networks exploit feature representations
learned exclusively from data. They do not require hand-crafted features that are mostly
designed on the basis of domain-specific knowledge. This eliminates the problems associated
with hand-crafted features. Instead of relying on shallow, manually engineered features, DL
techniques are able to automatically learn informative representations of raw input data with
multiple levels of abstraction. Such learned features have been used successfully in many
machine vision tasks [5].
Among the available DL architectures, the one of particular interest to this project is the
convolutional neural network (CNN). CNNs are well known for their efficiency in image
classification and have produced significant results in satellite imagery classification. A CNN
convolves learned features with input data and uses 2D convolutional layers, making it well
suited for processing 2D data such as images. A CNN works by extracting features directly from
images. The relevant features are not pretrained; they are learned while the network trains on a
collection of images. This automated feature extraction makes CNNs and other DL techniques
highly effective for computer vision tasks such as object classification.
Li et al. [5] highlight two important approaches of CNNs in satellite image classification: pixel-wise
classification for hyperspectral images (HSIs) and scene classification for high-resolution
aerial or satellite images. The first is concerned with identifying the category to which each pixel in
a given satellite image belongs, and the second aims to automatically assign a semantic label
to each scene in the image. This project implements the second approach, using a CNN to
extract and classify features from satellite imagery by assigning semantic labels (tags).

1.2. Project Aim and Objectives

Aim
The aim of this project is to implement a set of algorithms that utilize deep neural networks to
analyze satellite images, identify basic features, and classify the identified information through
tagging.

Objectives
In order to fulfill the aim set for this project, the following objectives must be met:
1. Conduct a literature review in the areas of image processing and deep neural networks.
2. Determine the appropriate methodology to be used to design and implement the
proposed model based on analysis of each methodology.
3. Formulate requirements based on techniques chosen from the literature during research.
4. Use analyzed requirements to begin designing the steps involved in image processing.
5. Implement the processes of image classification using a suitable programming language
and an appropriate open source library.
6. Assess the developed code, by applying various testing techniques to ensure that the
test cases developed for the model conform to the requirements specification.
7. Evaluate the implemented code and justify whether the derived results have achieved
the project aims and objectives.

1.3. Research approach

The literature on topics like Neural Network, Image processing and classification will be
researched and reviewed to gain information on various approaches to satellite image
classification. Based on the knowledge acquired about the subject, an appropriate development
methodology will be chosen to plan the progress of the project.

The information gathered from the literature review will be analyzed to determine the
requirements of the proposed model based on which the design of the model will be done. The
model will then be implemented, trained and tested to ensure there are no errors. Finally, the
project will be evaluated against the aim and objectives of the project.

1.4. Project Scope

The scope of this project involves reading satellite images from the memory and applying
different image processing techniques to analyze the images and then make predictions in the
form of tagging.

1.5. Project Dataset

The dataset used in this project is taken from the website of Planet (a satellite imaging company).
Planet released the dataset in 2017 for the Kaggle competition mentioned in the introduction. The
dataset consists of more than 100,000 images of the Amazon basin, and the competition involves
labelling the atmospheric conditions and ground features in the images. Each image is 256 x 256
pixels and has RGB and near-infrared channels. The images released by Planet are high-resolution
images that enable easy identification and classification of features using deep learning techniques.
The dataset has 17 unique feature labels. Among these, four are weather labels (clear, cloudy,
haze and partly cloudy), exactly one of which occurs in each image, while the remaining 13 are
land-cover labels that may co-occur with each other.
Our training set consists of 40,479 jpg and tiff images. The jpg images have three bands (red,
green and blue) while the tiff images have four bands (red, green, blue and near-infrared). The
jpg images are in 8-bit color format (pixel values range from 0 to 255) whereas the tiff files
are in 16-bit color format (pixel values range from 0 to 65535). The test set consists of two
folders, test-jpg and test-jpg-additional, which contain 40,669 and 20,522 images respectively. The
test set does not contain any labels, so we can only evaluate our model on the test set by submitting
the prediction file to Kaggle, which returns the F2 score of the model.

CHAPTER 2

LITERATURE REVIEW

In order to implement an effective deep learning algorithm for satellite image classification, it is
important to select an effective design and training procedure. This chapter reviews literature
relevant to the project topic – Satellite Imagery Processing for Automatic Tagging. Appropriate
journals, books and Internet sources have been used to gather the relevant literature. The chapter
describes the literature associated with image processing and the different techniques used in
image classification. The literature review also gives an idea of previous studies done in this field.

2.1. Image Classification

Image classification refers to the task of extracting information from an image. Its primary
objective is to detect, identify and classify the features occurring in an image in terms of the
class these features represent on the ground [7]. Image classification has undergone various
phases of intense research and experimentation, as evidenced by the variety of approaches,
techniques, and methods for image processing and classification. The majority of recent
literature shows deep neural networks attaining excellent results in image classification
[1, 2, 3, 4, 8]. The most commonly used deep learning models for remote sensing or satellite
image classification are convolutional neural networks, stacked autoencoders and deep belief
networks [5].

2.2. Deep belief networks


A Deep Belief Network (DBN) is a probabilistic generative model which provides a joint probability
distribution over observable data and labels. A DBN first takes advantage of an efficient layer-
by-layer greedy learning strategy to initialize the deep network, and then fine-tunes all of the
weights jointly with respect to the desired outputs [5].

2.3. Stacked autoencoders


A stacked autoencoder (SAE) is a deep network model consisting of multiple layers of
autoencoders (AEs), each of which is a special type of neural network used for learning efficient
encodings. Instead of training the network to predict a certain target label given inputs, an AE
is trained to reconstruct its own inputs. A single AE is not able to obtain the discriminative and
representative features of raw input data, so multiple AEs are usually stacked on top of one
another to form an SAE, which forwards the code learned by the previous AE to the next in order
to accomplish a given task [5].

2.4. Convolutional Neural Networks


The convolutional neural network (CNN) is the most popular neural network model used
for image classification problems. CNNs owe their popularity to the practical benefit of
having fewer parameters, which greatly reduces the time it takes to learn as well as the
amount of data required to train the model. Generally, a CNN consists of three key parts:
convolution layers, pooling layers, and fully connected layers [5]. Each of these parts plays
a unique role.

2.4.1. Convolution Layers


In convolution layers, the input maps are convolved with learnable filters and then passed
through the activation function to form the output feature maps. A filter is simply a matrix of
values, called weights, that are trained to detect specific features. The filter moves over each
part of the image to check whether the feature it is meant to detect is present. It is usually
expressed as a matrix of size M x M x 3. The filter carries out a convolution operation, an
element-wise product and sum between two matrices, which gives the activation map. When the
feature is present in a part of the image, the convolution operation between the filter and that
part of the image results in a high value; if the feature is not present, the resulting value is low.
The dimension of the activation map can be determined using the formula (N + 2P - F)/S + 1,
where N denotes the dimension of the input image, P the padding, F the dimension of the filter,
and S the stride.
Convolution layers introduce a weight-sharing mechanism within the same feature map, which
significantly reduces the number of parameters required. A convolution layer can take
two-dimensional (2D) images of any scale directly as input while preserving the location
information of objects in the images.
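To make the output-size formula concrete, the short sketch below applies a single convolution layer to a 256 x 256 RGB input in TensorFlow/Keras (one of the libraries listed in Chapter 3) and evaluates the formula directly; the filter count, padding and stride shown here are illustrative, not taken from the project code.

# Illustrative only: one convolution layer and the (N + 2P - F)/S + 1 formula.
import tensorflow as tf

N, P, F, S = 256, 0, 3, 1                                  # input size, padding, filter size, stride
print("Formula output size:", (N + 2 * P - F) // S + 1)   # 254

x = tf.random.normal((1, 256, 256, 3))                     # one RGB image (batch of 1)
conv = tf.keras.layers.Conv2D(filters=32, kernel_size=3, strides=1,
                              padding='valid', activation='relu')
print("Conv2D output shape:", conv(x).shape)               # (1, 254, 254, 32)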

2.4.2. Pooling Layer


The pooling layer follows the convolution layer and is used to reduce the dimensionality of
feature maps. Pooling is done for the sole purpose of reducing the spatial size of the image. To
speed up the training process and reduce the amount of memory consumed by the network, we
reduce the redundancy present in the input features using max pooling. The process in max
pooling is similar to passing a window over an image according to a set stride (how many
units to move on each pass). At each step, the maximum value within the window is pooled
into an output matrix.

2.4.3. Fully Connected Layer


In the fully connected layer, the input representation is flattened into a feature vector and passed
through a network of neurons to predict the output probabilities. The output maps of the last
convolution or pooling layer are arranged into vectors, acting as the inputs to the first fully
connected layer. The output of the final fully connected layer can be regarded as the learnt
feature extracted from the input image by the convolutional network. The classification
operation can then be implemented simply by connecting this output to a classifier [5, 9].

Figure 2.1: Three key parts of CNN

Li et al. [5] summarize the learning and working process of a CNN in two stages: (a)
network training and (b) feature extraction and classification. The first stage has two parts: a
forward pass and a backward pass. In the forward pass, the input images are fed through the
network to obtain an abstract representation, which is used to compute the loss with respect
to the given ground-truth labels. Based on the loss, the backward pass computes the gradients
of each parameter of the network, and all the parameters are then updated in response to the
gradients in preparation for the next forward computation. After sufficient iterations of training,
in the second stage, the trained network can be used to extract deep features and classify
unknown images.

2.5. Activation Function


An extremely important feature of artificial neural networks is the activation function. Activation
functions basically decide whether a neuron should be activated or not, that is, whether the
information the neuron receives is relevant or should be ignored. Activation can be understood as
“the non-linear transformation that we do over the input signal. This transformed output is then
sent to the next layer of neurons as input” [10]. The activation function applies a non-linear
transformation to the input, making the network capable of learning and performing more complex
tasks. There are different types of activation functions; we review three that are frequently used
in neural networks to decide which is most appropriate for our project.

2.5.1. Sigmoid
The sigmoid function is a widely used non-linear activation function of the form f(x) = 1/(1 + e^(-x))
[10]. If we have multiple neurons with the sigmoid function as their activation function, the
output is non-linear as well. The function ranges from 0 to 1, as shown in figure 2.2.

Figure 2.2: function graph of Sigmoid


The sigmoid function has two major problems. First, once the function falls in the region where
the gradients become very small (the gradient approaches zero), the network does not really learn.
Second, the output values only range from 0 to 1, which means the sigmoid function is not
symmetric around the origin and the values received are all positive [10]. However, there are
times when we desire the values going to the next neuron not to all be of the same sign.

2.5.2. Tanh
The tanh function is a scaled version of the sigmoid function. It is of the form
tanh(x) = 2·sigmoid(2x) − 1, or written directly, tanh(x) = 2/(1 + e^(-2x)) − 1 [10].

Figure 2.3: function graph of Tanh


Tanh works similarly to the sigmoid function but is symmetric about the origin; it ranges from -1
to 1. It solves the problem of the values all being of the same sign. All other properties are the
same as those of the sigmoid function.

2.5.3. Rectified Linear Unit (ReLU)
The ReLU (Rectified Linear Unit) function is the most widely used activation function.
It is defined as f(x) = max(0, x) [10]. The ReLU function is non-linear, which means we can
easily backpropagate the errors and have multiple layers of neurons activated by the
ReLU function.

Figure 2.4: function graph of ReLU


One of the greatest advantages ReLU has over other activation functions is that it does not
activate all neurons at the same time. Because ReLU converts all negative inputs to zero, neurons
receiving negative inputs are not activated. This makes it very computationally efficient, as only
a few neurons are activated at a time. In practice, ReLU converges about six times faster than the
tanh and sigmoid activation functions [10].
Researchers have found that ReLU layers work far better because the network is able to
train a lot faster (because of the computational efficiency) without a significant difference in
accuracy. ReLU also helps to alleviate the vanishing gradient problem, the issue where the lower
layers of the network train very slowly because the gradient decreases exponentially through
the layers.
However, ReLU falls prey to the problem of gradients moving towards zero. On the negative side
of the graph in figure 2.4, the gradient is zero. With the gradient equal to zero, the weights are
not updated during backpropagation. This limitation can act as a regularization technique which
prevents the network from overfitting, but it can also create dead neurons which never get
activated.
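For reference, the three activation functions reviewed above can be written in a few lines of NumPy; this is only an illustrative sketch and not code taken from the model.

# Sketch of the three reviewed activation functions (illustrative only).
import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x)), output in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # tanh(x) = 2*sigmoid(2x) - 1, output in (-1, 1)
    return 2.0 * sigmoid(2.0 * x) - 1.0

def relu(x):
    # f(x) = max(0, x); negative inputs become zero
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x), tanh(x), relu(x), sep="\n")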

2.6. Choice of Activation Function


In this section we state our choice of activation functions. Based on our review of the three
commonly used activation functions and the nature of the network we intend to build, we opted
for the ReLU and sigmoid activation functions. ReLU is used in the convolutional layers to
transfer information from one feature map to another; with ReLU, we minimize the probability of
losing valuable information. The sigmoid function is used for the network's output neurons. The
reason for using the sigmoid function on the output neurons is to obtain an output that gives the
probability of each label within the range of 0 to 1. This means that if a particular class is
present, the value of the corresponding neuron will be close to one, and otherwise close to zero.

2.7. Choosing Hyperparameters


Another important consideration when building a CNN model is how to determine the number of
layers to use, how many convolution layers, the filter sizes, and the values for stride and
padding. These are not trivial questions, and there is no set standard used by all researchers,
because the network depends largely on the type of data at hand. Data can vary by size,
complexity of the image, type of image processing task, and more. When looking at a dataset, one
way to think about choosing the hyperparameters is to find the right combination that creates
abstractions of the image at a proper scale.

CHAPTER 3
FEASIBILITY STUDY
The aim of the feasibility study is to determine whether it would be economically and
technically feasible to develop the system. The feasibility study is carried out from the
following aspects:

3.1. Technical Feasibility


Technical feasibility is the determination of whether it is technically feasible to
develop the software, that is, whether the model can be implemented using the current technology.
The following technical feasibility areas were probed during the feasibility study phase:
• The necessary technology for developing this model is readily available.
• The front-end tool proposed is compatible with the current hardware
configuration available.
• The back-end tool proposed has the capacity to process the data required for building
this model.
The Windows operating system is used for the development of this model. The following hardware
and software are used for developing the model.

S/N  Hardware
1    RAM: 16 GB
2    CPU: i5
3    GPU: GTX 1060
4    Storage: minimum of 10 GB

Table 3.1: Hardware tools required for the project

S/N  Software                      Application/Uses
1    Python programming language   A tool for development of programs which perform data manipulations.
2    NumPy                         A library for the Python programming language, used for efficient computation on arrays.
3    Matplotlib                    A plotting library for the Python programming language. It provides an object-oriented API for embedding plots into applications.
4    TensorFlow                    An open source artificial intelligence library used for classification, perception, understanding, discovering, prediction and creation.
5    Pandas                        A software library written for the Python programming language for data manipulation and analysis.
6    Seaborn                       A Python data visualization library, used for drawing attractive and informative statistical graphics.

Table 3.2: Software tools required for the project

3.2. Implementation Feasibility


The model can be used on any system with Python and TensorFlow installed. Once these two pieces
of software are available and the weights of the model are present on the system, the user only
needs to import the model and make predictions on an image.
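A minimal sketch of this usage is shown below. The weights file name ('model.h5'), the image path, the 128 x 128 input size and the 0.5 decision threshold are assumptions made for illustration; the label list is the set of 17 labels from the Planet dataset, whose order must match the order used during training.

# Hypothetical usage sketch: loading a saved model and tagging one image.
import numpy as np
import tensorflow as tf

LABELS = ['clear', 'cloudy', 'haze', 'partly_cloudy', 'primary', 'water',
          'agriculture', 'road', 'habitation', 'cultivation', 'bare_ground',
          'slash_burn', 'selective_logging', 'blooming', 'conventional_mine',
          'artisinal_mine', 'blow_down']   # order must match the training labels

model = tf.keras.models.load_model('model.h5')          # assumed weights file

img = tf.keras.preprocessing.image.load_img('test_1.jpg', target_size=(128, 128))
x = np.expand_dims(np.asarray(img, dtype=np.float32) / 255.0, axis=0)

probs = model.predict(x)[0]                              # one sigmoid output per label
tags = [label for label, p in zip(LABELS, probs) if p > 0.5]
print(tags)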

3.3. Schedule Feasibility


Schedule feasibility is part of project management and relates to the use of schedules, such
as Gantt charts, to plan and subsequently report progress within the project environment. In this
phase we defined the project scope and the appropriate methods for completing the project.
Next, we determined the duration of each task necessary to complete the work; these are
listed and grouped into a work breakdown structure. We also tried to optimize the project plan
to achieve an appropriate balance between resource usage and project duration so as to meet
the project objectives.

In scheduling the project tasks, we considered the following activities:
• Identification of all the tasks needed to complete the project.
• Break down of large tasks into smaller tasks.
• Determination of dependencies among different tasks.
• Allocation of resources to each task.
• Determination of the starting and ending dates for each activity.
• Determination of the critical path, the chain of activities that determines the duration
of the project.
The following sections highlight the Schedule information of this project.

3.3.1. Project Work Breakdown Structure


The work breakdown structure (WBS) is a technique that provides the necessary framework
for detailing and estimating the project cost along with guidance for schedule development and
control. Below is the work breakdown structure of Satellite Imagery Processing for Automated
Tagging.

SATELLITE IMAGERY PROCESSING FOR AUTOMATED TAGGING – 325 hours (65 days)

1. Planning and system requirement phase (90 hrs)
   • Study of existing system (30 hrs)
   • Information gathering for the proposed plan (20 hrs)
   • Analysing the information (40 hrs)
2. Feasibility study phase (12 hrs)
   • Economic feasibility (3 hrs)
   • Technical feasibility (3 hrs)
   • Operational feasibility (3 hrs)
   • Schedule feasibility (3 hrs)
3. System design phase (15 hrs)
   • Design diagrams (15 hrs)
4. Coding phase (90 hrs)
   • Coding (90 hrs)
5. Testing phase (68 hrs)
   • Unit testing (17 hrs)
   • Integration testing (17 hrs)
   • Validation testing (17 hrs)
   • System testing (17 hrs)
6. Documentation phase (50 hrs)
   • Gathering all the documents (30 hrs)
   • Documenting (20 hrs)

Table 3.3: WBS showing project schedule

3.3.2. Project Gantt Chart
A Gantt chart is a commonly used tool in project management to show project activities (tasks
or events) against time. This allows tracking of the project and showing additional information
about the various phases/tasks of the project. Below is the Gantt chart of the project Satellite
Imagery Processing for Automated Tagging.

GANTT CHART: tasks Planning, Requirement Analysis, Design, Coding, Testing and Documentation
scheduled across January to May.

Table 3.4: Project Gantt chart

3.4. Economic Feasibility


The software required for designing and implementing this model is readily available.
Similarly, the dataset required for training and testing the model is freely available on the
Planet website. As such, there are no economic constraints on obtaining the required resources
for this project.

CHAPTER 4

SYSTEM DESIGN

This chapter outlines the two UML design diagrams used in this project to indicate, on the one
hand, user interaction with the system and, on the other, the flow of data processing within the
system. UML (Unified Modelling Language) is a widely used method of visualizing and documenting
software system designs. UML includes a set of graphic notation techniques to create visual
models of software systems. It is used to specify, visualize, modify, construct and document
the artifacts of an object-oriented software system under development (Mishra 1997).

4.1. Use Case Diagram


A use case diagram is a dynamic or behavior diagram in UML. Use case diagrams model the
functionality of a system using actors and use cases. Use cases are a set of actions, services
and functions that the system needs to perform.
The use case diagram below shows the set of actions the user of this model needs to perform in
order to obtain the appropriate result for a given input. The user gives an input to the system to
process, and the system generates an output, known as a prediction, for the user to view.

(Use cases: Input Data and View Prediction, both performed by the User actor.)

Figure 4.1: Use Case Diagram for Satellite Imagery Processing for Automated Tagging

4.2. Flowchart Diagram
A flowchart is a graphical representation of the sequence of steps and decisions needed to perform
a process. Each step in the process is represented by a symbol that illustrates the description of
the process step. The flowchart symbols are linked together with arrows showing the direction of
the process flow. Below is the flowchart representation of the processes involved in Satellite
Imagery Processing for Automated Tagging.

(Flowchart: Start → Data Acquisition → Sampling → Training Dataset → Exploratory Data Analysis →
Data Preprocessing → Model Training → Hyperparameter Optimization → Model Evaluation → if the
validation error is not low, return to Model Training; otherwise → Final Classification Model → End.)

Figure 4.2: Flowchart Diagram for Satellite Imagery Processing for Automated Tagging

Below is a brief explanation of each of the processes indicated in the flowchart.

In data acquisition we collect the data from the source for processing. The data required by this
model consist of 40,479 images in three bands (red, green and blue), with each image carrying one
or multiple tags such as haze, primary, water, agriculture, etc. In sampling we divide the dataset
into training and validation sets. The output of this process is the training dataset, which is
used for data exploration. This is followed by preprocessing, which involves normalizing the data
and creating batches of data. The preprocessed data is fed into our training model.
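As a rough illustration of the preprocessing step (normalizing pixel values and creating batches), a tf.data pipeline could look like the sketch below. The dummy arrays, image size and batch size are placeholders, not the project's actual preprocessing code.

# Illustrative sketch of normalization and batching with tf.data.
import numpy as np
import tensorflow as tf

# Dummy stand-ins for the loaded and resized training images and label vectors.
images = np.random.randint(0, 256, size=(100, 128, 128, 3), dtype=np.uint8)
labels = np.random.randint(0, 2, size=(100, 17)).astype(np.float32)

def normalize(img, lbl):
    # Scale 8-bit pixel values into the [0, 1] range.
    return tf.cast(img, tf.float32) / 255.0, lbl

dataset = (tf.data.Dataset.from_tensor_slices((images, labels))
           .shuffle(buffer_size=100)
           .map(normalize)
           .batch(32))

for batch_images, batch_labels in dataset.take(1):
    print(batch_images.shape, batch_labels.shape)   # (32, 128, 128, 3) (32, 17)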

CHAPTER 5

IMPLEMENTATION

5.1. Exploratory Data Analysis


The working principle of this model can be formalized as follows:
• Our input is a training dataset that consists of 40,479 jpg and tiff images, each labeled
with one or more of the 17 different classes.
• We use this training set to train a classifier to learn what each of the classes
looks like.
• Finally, we evaluate the quality of the classifier by asking it to predict labels for a
new set of images that it has never seen before. We then compare the true labels of
these images to the ones predicted by the classifier.

Table 5.1: Name of the image with its corresponding labels

Data                 Description
Training Data jpg    jpg images (train-jpg), total size 634.68 MB, 40,479 files
Training Data tiff   tiff images (train-tif-v2), 40,479 files
Test Data one        jpg images (test-jpg), total size 637.8 MB, 40,669 files
Test Data two        jpg images (test-jpg-additional), total size 321.08 MB, 20,522 files

Table 5.2: Data used in project for both training and testing of models

The data summarized in the table above consist of four folders:
1. train-jpg: jpg images in three bands (RGB), with labels
2. train-tif-v2: tiff images in four bands (RGB and near-infrared), with labels
3. test-jpg: test images without labels
4. test-jpg-additional: additional test images without labels

Co-Occurrence Matrix of Each Labels:

Figure 5.1: Co-occurrence matrix of each label

Analyzing tiff images:

# Read one of the 4-band tiff training images
import tifffile as tiff
img = tiff.imread("D:/N/satellite/data/train-tif-v2/train_30.tif")
print(img.shape)   # (256, 256, 4)

A tiff image consists of four bands: red, blue, green and near-infrared. It uses 16-bit color
channels, i.e. the pixel intensities range from 0 to 65535, compared to jpg which uses 8-bit
color channels.

array([[[ 4934, 4159, 3139, 8409],
        [ 4869, 4179, 3215, 8346],
        [ 4941, 4172, 3192, 8218],
        ...,

Figure 5.2: Near infrared image Figure 5.3: Natural RGB image

Vegetation index:

Figure 5.4: Normalized Difference Vegetation and Water Index

The NDVI (Normalized Difference Vegetation Index) is used for assessing the state of healthy
plants and can be calculated as NDVI = (NIR − Red) / (NIR + Red).

The NDWI (Normalized Difference Water Index) is used for finding water bodies and can be
calculated as NDWI = (Green − NIR) / (Green + NIR).
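Both indices can be computed band-wise from the four-band tiff array loaded earlier. The sketch below assumes the band order described above (red, blue, green, near-infrared); it is an illustration rather than the project's exact code.

# Sketch: computing NDVI and NDWI from a 4-band tiff image.
# Band order assumed as stated in the text: red, blue, green, near-infrared.
import numpy as np
import tifffile as tiff

img = tiff.imread("D:/N/satellite/data/train-tif-v2/train_30.tif").astype(np.float64)
red, green, nir = img[:, :, 0], img[:, :, 2], img[:, :, 3]

eps = 1e-8                                   # avoid division by zero
ndvi = (nir - red) / (nir + red + eps)       # healthy vegetation gives values near 1
ndwi = (green - nir) / (green + nir + eps)   # water bodies give positive values

print(ndvi.min(), ndvi.max(), ndwi.min(), ndwi.max())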

5.2. The Base Model

Figure 5.5: Project Base Model Architecture

The network uses two consecutive convolutional layers followed by a max pooling operation
to extract features from the input image. In the first convolution layer, we perform a
convolution operation using 32 filters of size 3x3. In the second layer we perform another
convolution operation using 64 filters of the same size. In the next layer, we perform a max
pooling operation with a pooling size of 2x2, which reduces the feature maps to 62 x 62. After
the max pooling operation, the representation is flattened and passed through a multi-layer
perceptron (MLP) to carry out the classification task. This involves fully connecting the
flattened representation to 17 output neurons, with the activation of each output neuron
indicating whether the corresponding class is present or not.

A ReLU activation is applied to the output of each convolution layer, and a sigmoid activation is
applied to the final output, which gives the probability of each class being present, between
0 and 1.
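A minimal Keras sketch of this base architecture is given below. It follows the description above (two convolutions, max pooling, a flattened representation and 17 sigmoid outputs). The 128 x 128 input size is our assumption based on the 62 x 62 dimension reported after pooling, and the dense layer width of 128 is also an assumption, chosen because it is consistent with the roughly 31.5M parameters reported for Base Model 1 in Chapter 6.

# Sketch of the base model described above (input size and dense width assumed).
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2)),      # 124 x 124 feature maps -> 62 x 62
    layers.Flatten(),
    layers.Dense(128, activation='relu'),       # assumed MLP width
    layers.Dense(17, activation='sigmoid'),     # one probability per label
])
model.summary()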

5.3. Model Training


We trained our model with the binary cross-entropy loss function using the Adam optimizer. First,
we trained the model for 25 epochs. During training, we monitored the validation loss and reduced
the learning rate whenever the validation loss stopped improving. After 25 epochs we obtained a
validation loss of 0.1385 and a validation accuracy of 0.9447. We then retrained the model for
another 25 epochs; after the eighth epoch, the validation loss did not improve, so we stopped
the training of the model.
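The training procedure described above could be expressed roughly as in the sketch below: binary cross-entropy loss, the Adam optimizer, and a callback that reduces the learning rate by a factor of 0.1 when the validation loss stops improving. The model and data arrays here are small dummy stand-ins, and the batch size, patience and validation split are assumptions.

# Sketch of the training setup: binary cross-entropy, Adam, learning-rate reduction.
import numpy as np
import tensorflow as tf

# Dummy stand-ins; in the project these are the base model and preprocessed data.
model = tf.keras.Sequential([tf.keras.layers.Flatten(input_shape=(128, 128, 3)),
                             tf.keras.layers.Dense(17, activation='sigmoid')])
train_images = np.random.rand(64, 128, 128, 3).astype(np.float32)
train_labels = np.random.randint(0, 2, size=(64, 17)).astype(np.float32)

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Reduce the learning rate by a factor of 0.1 when validation loss stops improving.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss',
                                                 factor=0.1, patience=2)

model.fit(train_images, train_labels, validation_split=0.2,
          epochs=25, batch_size=32, callbacks=[reduce_lr])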
First training of the model


Figure 5.6: Graph of accuracy and loss


Figure 5.7: Graph of validation accuracy and loss

In the first training phase, as shown in figures 5.6 and 5.7, we noticed a gradual increase in
accuracy with a corresponding drop in the loss after each epoch. After 25 epochs we obtained the
following values for the training and validation data respectively.

Training loss: 0.1184 Training Accuracy: 0.9538
Validation loss: 0.1385 Validation Accuracy: 0.9447

Second training of the model


In the second training phase, shown in figures 5.8 and 5.9, the validation loss increases after the
second epoch, indicating that the optimizer overshoots and diverges from the optimum. To resolve
this, we decrease the learning rate by a factor of 0.1, which results in a decrease in validation
loss. The validation loss stops decreasing at the eighth epoch, so we halt the training process at
the tenth epoch.
We obtained the following values for training and validation data respectively in our second
training phase.
Training loss: 0.1139 Training Accuracy: 0.9558
Validation loss: 0.1202 Validation Accuracy: 0.9533


Figure 5.8: Graph of training accuracy and loss


Figure 5.9: Graph of validation accuracy and loss

Figure 5.10: Graph of learning rate

Based on the results of the first architecture, we decided to add two more convolution layers,
consisting of 128 and 256 filters of size 3x3 respectively. In this architecture, shown in
figure 5.11, we performed max pooling after every two convolutions.

Figure 5.11: Base Model Architecture with more convolution layers
The result obtained from this model shows improved accuracy as shown below
Training loss: 0.1030 Training Accuracy: 0.9597
Validation loss: 0.1170 Validation Accuracy: 0.9540

Similarly, we also implemented the VGG16 model in an attempt to arrive at a model with the
best accuracy.
With the VGG16 architecture we obtained the following results from training the
model:
Training loss: 0.0717 Training Accuracy: 0.9722
Validation loss: 0.0867 Validation Accuracy: 0.9666

Finally, we implemented the DenseNet architecture [11]. The approach used in this model differs
from the conventional approach to building convolutional neural networks. Usually, as a CNN goes
deeper, the path from the input layer to the output layer becomes so long that information and
gradients can vanish before reaching the other side.
DenseNet provides a solution to this problem by simplifying the connectivity pattern between
layers. The figure below gives an idea of how a DenseNet architecture is built.

Figure 5.12: DenseNet Architecture


Using the above architecture of DenseNet, we obtained the following result:
Training loss: 0.0598 Training Accuracy: 0.9772
Validation loss: 0.0930 Validation Accuracy: 0.9655
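A rough sketch of how a pretrained backbone such as DenseNet121 (or, analogously, VGG16) can be adapted to this 17-label problem using Keras applications is shown below. The input size, global pooling and classification head are assumptions for illustration, not the exact configuration used in the project.

# Illustrative transfer-learning sketch with a DenseNet121 backbone.
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.DenseNet121(include_top=False,
                                          weights='imagenet',
                                          input_shape=(128, 128, 3))

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation='relu'),      # assumed classification head
    layers.Dense(17, activation='sigmoid'),    # multi-label output
])
model.summary()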

Below are the results obtained from each of the models on the training dataset.

Table 5.3: Classification report for Base Model on training data

Table 5.4: Classification report for Second Base Model on training data

Table 5.5: Classification report for DenseNet Model on training data

Table 5.6: Classification report for VGG16 Model on training data

Table 5.7: Classification report for Ensemble DenseNet and VGG16 Model on training data
Comparing the results obtained from the four models, we observe a gradual improvement in the
accuracy on both training and validation data, and a corresponding drop in the loss. This
indicates that the deeper the convolutional network, the higher the chance of making accurate
predictions. However, there may be exceptions where an architecture with fewer layers, as in the
case of the base model, performs better than one with more layers, as we will notice in the
testing phase of each of the models after training.

CHAPTER 6

TESTING AND REPORTS

This chapter shows the results obtained from testing each of the models we have built or
implemented. Testing of each model is done using the training dataset, which has the true labels
of the images. The available test dataset does not have true labels with which we could evaluate
the accuracy of each model's prediction on a particular image. However, we evaluate our
predictions by submitting the predicted labels to Kaggle, which returns the F2 score of our model.
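For reference, the F2 score used by the competition weights recall more heavily than precision (beta = 2) and is averaged over images. The sketch below shows how it can be computed with scikit-learn on binary indicator matrices; the small arrays are made up for demonstration and are not project data.

# Minimal sketch: computing an F2 score for multi-label predictions.
import numpy as np
from sklearn.metrics import fbeta_score

# Binary indicator matrices: rows = images, columns = labels
# (4 labels shown for brevity; the project has 17).
y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0],
                   [1, 1, 0, 1]])
y_pred = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [1, 1, 0, 0]])

# beta = 2 weights recall higher than precision; average per image ('samples').
score = fbeta_score(y_true, y_pred, beta=2, average='samples')
print("F2 score:", score)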

6.1 Test Results

Test 1:

Figure 6.1: Test Image 1

Prediction:

Table 6.1: Predicted Labels for Test image 1

Test 2:

Figure 6.2: Test Image 2

Prediction:

Table 6.2: Predicted Labels for Test image 2

Test 3:

Figure 6.3: Test Image 3

Prediction:

Table 6.3: Predicted Labels for Test image 3

Test 4:

Figure 6.4: Test Image 4

Prediction:

Table 6.4: Predicted Labels for Test image 4

The table below shows the Kaggle submission results.

Model                              F2 on Validation   F2 on Kaggle Submission   #parameters
Base Model 1                       0.88864            0.87388                   31.51M
Base Model 2                       0.89492            0.89154                   27.94M
VGG16                              0.93763            0.92544                   14.85M
DenseNet 121                       0.92327            0.91585                   7.31M
Ensemble (VGG16 & DenseNet 121)    -                  0.92725                   -

Table 6.5: Results obtained from Kaggle Submission for different Models

6.2. Loss Function of Base Model for different Activation Functions
The table below contains the results (validation loss and validation accuracy) of the base model
for different activation functions.

Activation Function   Validation Loss   Validation Accuracy
Sigmoid               0.2747            0.9036
Tanh                  0.2559            0.9053
ReLU                  0.1917            0.9242

Table 6.6: Results for various activation functions on the Base Model

6.3. Discussion
Comparing the results from the base model to the ensemble model shows an across-the-board
improvement. The base model is quite good at predicting the primary and clear labels, which makes
sense since the data has the most training examples of these types. The base models perform
moderately well on most of the common atmosphere and ground features, but get an average F1 of
0.57 for nearly all of the rare features. This is the main deficiency of the base models: their
failure to identify rare labels that appear similar to one another. On distinctively unique
features, however, they perform quite well. This deficiency is also reflected in the results
obtained from the other models we implemented. The inability of the models to make good
predictions on rare labels is due to the fact that few examples of these labels are available in
the training dataset, so the models become biased toward the labels with larger numbers of
examples. To resolve this issue, various techniques could be used to augment the rare-label
features, such as undersampling, oversampling and synthetic sampling.
Undersampling involves randomly deleting images with the more frequent labels so that the labels
become more evenly distributed. Oversampling, on the other hand, increases the number of images
with rare labels by copying those images and applying augmentation techniques such as rotating,
shifting, cropping, etc. Synthetic sampling uses techniques to synthetically manufacture
observations of rare labels.
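A possible way to oversample a rare label such as slash_burn is sketched below, duplicating rare-label images with simple augmentation (random flips and rotations). The arrays and the label column index are illustrative only, not the preprocessing code actually used in the project.

# Illustrative oversampling sketch: duplicate rare-label images with augmentation.
import numpy as np
import tensorflow as tf

images = np.random.rand(100, 128, 128, 3).astype(np.float32)   # dummy training images
labels = np.random.randint(0, 2, size=(100, 17)).astype(np.float32)
RARE = 11                                    # assumed column index of a rare label

def augment(img):
    # Random flips and a random 90-degree rotation create a new variant of the image.
    img = tf.image.random_flip_left_right(img)
    img = tf.image.random_flip_up_down(img)
    return tf.image.rot90(img, k=np.random.randint(4)).numpy()

rare_idx = np.where(labels[:, RARE] == 1)[0]
extra_images = np.stack([augment(images[i]) for i in rare_idx])
extra_labels = labels[rare_idx]

images = np.concatenate([images, extra_images])
labels = np.concatenate([labels, extra_labels])
print(images.shape, labels.shape)            # the rare-label rows are now duplicated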

To address the class imbalance problem mentioned above, we implemented the oversampling
technique. With this technique, we were able to improve the prediction on the class which was
oversampled. The figure below shows the improvement in the precision and recall of the
slash_burn class.

Figure 6.5: Improvement in the outcome of training data on the oversampled class

The improvement recorded for the oversampled class shows that we can improve the overall
accuracy of the model. However, implementing this technique requires a large amount of time,
ranging from data preparation to appropriate labelling to training. Nonetheless, we will need to
explore this area further for an appropriate and efficient way to implement these techniques.

CONCLUSION

This report documents the project objectives, feasibility study, design diagrams and
implementation of Satellite Imagery Processing for Automatic Tagging. It highlights the project
development phases along with some intuition about how a convolutional neural network can be
fine-tuned to improve performance and accuracy.
The aim of the project was to build a model that could automatically tag satellite images with
little or no human effort required. We did this by building our own base model and then
implementing high-performing architectures used in image classification. The results obtained
from the models were then evaluated with the appropriate metrics.
The project has broadened our general knowledge of image processing and machine learning. The
results obtained from the various models implemented in this project indicate some important
factors to be considered while building a convolutional neural network. These factors include the
size of the available dataset and how the labels are distributed within it, the depth of the
network, the activation function to be used, the nature of the classification problem, etc.
For a multi-label classifier attempting to provide better information about a given location, like
the one we have implemented, to be truly effective it should be able to identify activities such
as mining, logging, and slash-and-burn.

REFERENCES

[1] Li, Miao, et al. "A review of remote sensing image classification techniques: The role of spatio-
contextual information." European Journal of Remote Sensing 47.1 (2014): 389-411.

[2] Dos Santos, Jefersson A., et al. "Efficient and effective hierarchical feature propagation." IEEE
Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7.12 (2014):
4632-4643.

[3] Gardner, Daniel, and David Nichols. "Multi-label Classification of Satellite Images with Deep
Learning." http://cs231n.stanford.edu/reports/2017/pdfs/908.pdf 10/03/2019

[4] Richa Bhatia "Understanding the Difference Between Deep Learning & Machine Learning"
https://www.analyticsindiamag.com/understanding-difference-deep-learning-machine-
learning/ 13/2/2019.

[5] Li, Ying, et al. "Deep learning for remote sensing image classification: A survey." Wiley
Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8.6 (2018): e1264.

[6] Wei, Yunchao, et al. "Cnn: Single-label to multi-label." arXiv preprint arXiv:1406.5726 (2014).

[7] Kalra, Kanika, Anil Kumar Goswami, and Rhythm Gupta. "A comparative study of supervised
image classification algorithms for satellite images." International Journal of Electrical,
Electronics and Data Communication 1.10 (2013): 10-16.

[8] Lu, Dengsheng, and Qihao Weng. "A survey of image classification methods and techniques
for improving classification performance." International journal of Remote sensing 28.5
(2007): 823-870.

[9] Dishashree Gupta, "Architecture of Convolutional Neural Networks (CNNs) demystified",
Analytics Vidhya, https://www.analyticsvidhya.com/blog/2017/05/neural-network-from-
scratch-in-python-and-r/ 10/03/2019.

[10] Dishashree Gupta, "Fundamentals of Deep Learning – Activation Functions and When to Use
Them", Analytics Vidhya, https://www.analyticsvidhya.com/blog/2017/10/fundamentals-deep-
learning-activation-functions-when-to-use-them/ 10/03/2019.

[11] Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. Densely connected convolutional
networks. InProceedings of the IEEE conference on computer vision and pattern recognition
2017 (pp. 4700-4708).

