1 author:
Vansh Jatana
SRM University
All content following this page was uploaded by Vansh Jatana on 10 May 2019.
Machine learning is the scientific study of algorithms and statistical models that computer systems
use to perform a specific task effectively without explicit instructions, relying instead on patterns
and inference. A machine is said to learn when its performance improves with experience.
Learning requires algorithms and programs that capture data and ferret out the interesting or
useful patterns.
Problem Definition
In this step we define the problem both formally and informally, and we record the
assumptions related to it. Each attribute of the data must be well defined, along with its
relations, benefits and uses, because a proper understanding of the data is very important. We
also prepare a manual solution for the problem.
Analyse Data
There are two steps in analysing data. The first is to summarise the data by examining its
structure and its distribution. When a distribution of categorical data is organized, it shows the
number or percentage of individuals in each group. When a distribution of numerical data is
organized, the values are often ordered from smallest to largest and broken into reasonably sized
groups. The second step is visualisation, in which we present the data graphically rather than
numerically. When data is visualized, it is easier to see emerging trends. It is also a powerful
way to communicate a finding, because the fast intuition it makes possible supports easier
collaboration and faster innovation.
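The two summaries and a very simple visualisation can be sketched as follows. This is a minimal
illustration using only the Python standard library; the function names, the sample values and the
bin width of 10 are assumptions made for the example and do not come from the original text.

```python
from collections import Counter

def categorical_distribution(values):
    """Return {category: (count, percentage)} for a list of labels."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}

def numerical_distribution(values, bin_width):
    """Group values into fixed-width bins [low, high), ordered from smallest to largest."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return [((low, low + bin_width), counts[low]) for low in sorted(counts)]

def text_histogram(binned):
    """Render the binned counts as rows of '#' characters, one row per bin."""
    return [f"{low}-{high} | {'#' * count}" for (low, high), count in binned]

# Hypothetical sample data, for illustration only.
colours = ["red", "blue", "red", "green", "red", "blue"]
ages = [23, 35, 31, 47, 29, 52, 38, 41]

cat_dist = categorical_distribution(colours)
num_dist = numerical_distribution(ages, 10)

for line in text_histogram(num_dist):
    print(line)
```

In practice a plotting library would normally replace the text histogram; the point here is only
that the grouped counts are what the chart displays.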
Prepare Data
Data preparation is a very important step in solving any problem. In this step we preprocess the
data and transform it. Data preprocessing is a data mining technique that involves transforming raw
data into an understandable format. Real-world data is often incomplete, inconsistent or lacking in
certain behaviors or trends, and is likely to contain many errors. Data preprocessing is a proven
method of resolving such issues. Preprocessing can be done by formatting, cleaning and
sampling. After preprocessing we transform the data, converting it from one format to another,
usually from the format of a source system into the required format of a new destination system.
This is done using scaling, decomposition and aggregation.
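A small sketch of one cleaning step and the three transformations named above is given below. It
assumes records stored as Python dictionaries, treats a missing value as None, and uses min-max
scaling and a per-group mean as the concrete choices of scaling and aggregation; all names and
sample values are hypothetical.

```python
def clean(rows):
    """Cleaning: drop any record that has a missing (None) field."""
    return [r for r in rows if all(v is not None for v in r.values())]

def min_max_scale(values):
    """Scaling: map values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values equal, nothing to spread out
    return [(v - lo) / (hi - lo) for v in values]

def decompose_date(iso_date):
    """Decomposition: split a 'YYYY-MM-DD' string into separate numeric parts."""
    year, month, day = (int(part) for part in iso_date.split("-"))
    return {"year": year, "month": month, "day": day}

def aggregate_mean(rows, key, field):
    """Aggregation: average `field` over all records sharing the same `key`."""
    groups = {}
    for r in rows:
        groups.setdefault(r[key], []).append(r[field])
    return {k: sum(v) / len(v) for k, v in groups.items()}

# Hypothetical raw records, for illustration only.
rows = [
    {"city": "A", "income": 30000},
    {"city": "B", "income": None},   # incomplete record, removed by clean()
    {"city": "A", "income": 50000},
    {"city": "B", "income": 40000},
]

cleaned = clean(rows)
scaled = min_max_scale([r["income"] for r in cleaned])
by_city = aggregate_mean(cleaned, "city", "income")
```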
Evaluate Algorithms
The evaluation of algorithms consists of the following three steps:
1. Test Harness
2. Explore and select algorithms
3. Interpret and report results
Test Harness
A test harness is a collection of software and test data used to test models during
development. It provides a consistent way to evaluate machine learning algorithms on a dataset. It
must allow different machine learning algorithms to be evaluated while the dataset,
resampling method and performance measures are kept constant.
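One possible shape for such a harness is sketched below. It assumes k-fold cross-validation as the
resampling method and accuracy as the performance measure, and it compares two deliberately simple
classifiers written for this example (a majority-class baseline and a 1-nearest-neighbour rule).
None of these choices come from the original text; the dataset is synthetic.

```python
import random
from collections import Counter

def k_fold_indices(n, k, seed=0):
    """Resampling method: split indices 0..n-1 into k shuffled folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def accuracy(y_true, y_pred):
    """Performance measure: fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def majority_class(train_X, train_y):
    """Baseline: always predict the most common training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label

def one_nearest_neighbour(train_X, train_y):
    """Predict the label of the closest training point (squared Euclidean distance)."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in train_X]
        return train_y[dists.index(min(dists))]
    return predict

def evaluate(algorithms, X, y, k=5, seed=0):
    """Score every algorithm on the same folds with the same measure."""
    folds = k_fold_indices(len(X), k, seed)  # computed once, shared by all algorithms
    results = {}
    for name, fit in algorithms.items():
        scores = []
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(len(X)) if i not in held_out]
            model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            preds = [model(X[i]) for i in test_idx]
            scores.append(accuracy([y[i] for i in test_idx], preds))
        results[name] = sum(scores) / len(scores)
    return results

# Synthetic data: two well-separated clusters of ten points each.
X = [(i, i) for i in range(10)] + [(i + 100, i + 100) for i in range(10)]
y = [0] * 10 + [1] * 10

results = evaluate(
    {"majority-class baseline": majority_class,
     "1-nearest-neighbour": one_nearest_neighbour},
    X, y,
)
```

Because the folds are built once and reused, any difference between the scores reflects the
algorithms rather than the data split.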
Improve Results
We always try to achieve higher accuracy and lower error, and for that we use feature engineering
and ensemble methods.
Feature engineering is the process of using domain knowledge of the data to create features
that make machine learning algorithms work. Feature engineering is fundamental to the
application of machine learning.
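As a small illustration, the sketch below derives two new features from a raw record. The
house-sale scenario, the field names and the idea that price per square metre and weekend sales
are informative are all assumptions chosen for the example, standing in for whatever domain
knowledge applies to a real problem.

```python
from datetime import date

def add_features(record):
    """Return a copy of the record with two derived features added."""
    enriched = dict(record)
    # Domain assumption: price relative to size is more comparable than raw price.
    enriched["price_per_sqm"] = record["price"] / record["area_sqm"]
    # Domain assumption: weekend sales may behave differently (Saturday=5, Sunday=6).
    enriched["sold_on_weekend"] = record["sold_on"].weekday() >= 5
    return enriched

# Hypothetical record, for illustration only.
sale = {"price": 300000, "area_sqm": 120, "sold_on": date(2019, 5, 11)}
features = add_features(sale)
```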
An ensemble method is a machine learning technique that combines several base models in order
to produce a single, stronger predictive model. It can be done by bagging, boosting and blending.
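Of the three approaches, bagging is the simplest to sketch: each base model is trained on a
bootstrap sample (drawn with replacement) of the training data, and the ensemble predicts by
majority vote. The example below assumes a 1-nearest-neighbour rule as the base model and eleven
models in the ensemble; both are arbitrary choices for illustration, and the data is synthetic.

```python
import random
from collections import Counter

def fit_1nn(X, y):
    """Base model: predict the label of the closest training point."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in X]
        return y[dists.index(min(dists))]
    return predict

def bagging(fit, X, y, n_models=11, seed=0):
    """Train n_models base models on bootstrap samples and combine them by vote."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: n indices drawn with replacement.
        sample = [rng.randrange(n) for _ in range(n)]
        models.append(fit([X[i] for i in sample], [y[i] for i in sample]))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]  # majority vote
    return predict

# Synthetic data: two well-separated clusters of ten points each.
X = [(i, i) for i in range(10)] + [(i + 100, i + 100) for i in range(10)]
y = [0] * 10 + [1] * 10

ensemble = bagging(fit_1nn, X, y)
low_cluster_label = ensemble((2, 3))
high_cluster_label = ensemble((105, 104))
```

An odd number of base models avoids tied votes in a two-class problem.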
Present Results
The last stage is to present our results. The presentation should include the following items:
1. Context
2. Problem
3. Solution
4. Findings
5. Limitations
6. Conclusion