
Gas Mileage Prediction

Using ANFIS model


Automobile MPG (miles per gallon) prediction
• A typical nonlinear regression problem.
– Several attributes of an automobile's profile are used to predict
another continuous attribute, the fuel consumption in MPG.
• Objectives of this problem:
– Find the input variables that contribute most to the prediction
of MPG.
– Train an adaptive network with fewer data points than would
ideally be required.
• Two problems:
– Data scarcity.
– Input space partitioning.
Training Data

Input attributes: Number of Cylinders, Displacement, Horsepower, Weight, Acceleration, Year. Output: MPG.

Car Name                    Cylinders  Displacement  Horsepower  Weight  Acceleration  Year  MPG
Chevrolet Chevelle Malibu   8          307           130         3504    12            70    18
Plymouth Duster             6          198           95          2833    15.5          70    22
Fiat 128                    4          90            75          2108    15.5          74    24
Oldsmobile Cutlass Supreme  8          260           110         4060    19            77    17
Toyota Tercel               4          89            62          2050    17.3          81    37.7
Honda Accord                4          107           75          2205    14.5          82    36
Ford Ranger                 4          120           79          2625    18.6          82    28

Data Scarcity Problem
• For a single-input problem, ideally about 10 data points are
required.
• By the same reasoning, a 6-input model would need 10^6 data
points!
• The auto-mpg data set of the UCI repository contains only 392
data instances.
• To work with this scarce data, the entire dataset is partitioned
into two sets, so that the model is validated on data it was not
trained on:
– A training set used for model building, and
– A testing set used for model validation.
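
The arithmetic behind the scarcity argument, and one possible way to make the split, can be sketched as follows. The alternating (every other row) split is an assumption for illustration; the slides do not say how the 392 instances were divided. The function names are hypothetical.

```python
def ideal_sample_count(n_inputs, points_per_input=10):
    """Points needed if each input axis is covered by `points_per_input` samples."""
    return points_per_input ** n_inputs


def alternate_split(rows):
    """Assumed split: even-indexed rows for training, odd-indexed rows for testing."""
    return rows[0::2], rows[1::2]


needed = ideal_sample_count(6)      # 10**6 points for six inputs
available = 392                     # size of the auto-mpg set quoted above
train_idx, test_idx = alternate_split(list(range(available)))

print(f"needed={needed}, available={available}")
print(f"training rows={len(train_idx)}, testing rows={len(test_idx)}")
```

The gap between the two numbers is the reason the later slides reduce the number of inputs rather than try to collect more data.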
Input Space Partitioning
• Grid partitioning on a problem with 6 inputs leads to at least
2^6 = 64 rules (two membership functions per input).
– (6+1) × 64 = 448 linear parameters are required for a
first-order Sugeno model.
• To solve this, we can:
– select the inputs with more predictive power than the others, or
– choose a tree or scatter partitioning technique.
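
The rule and parameter counts quoted above follow from two small formulas, sketched here. The function name and the default of two membership functions per input are assumptions chosen to reproduce the figures on the slide.

```python
def grid_partition_size(n_inputs, mfs_per_input=2):
    """Return (rules, linear parameters) for a grid-partitioned first-order Sugeno model.

    Each rule has a linear consequent with one coefficient per input plus a constant.
    """
    rules = mfs_per_input ** n_inputs
    linear_params = (n_inputs + 1) * rules
    return rules, linear_params


for n in (6, 3, 2):
    rules, params = grid_partition_size(n)
    print(f"{n} inputs: {rules} rules, {params} linear parameters")
```

With six inputs this gives 64 rules and 448 linear parameters; with two inputs it drops to 4 rules and 12 linear parameters, which is why input selection helps so much here.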
Finding Attributes with More Predictive Power
• The training and checking sets are used to select the set of
inputs that most influences the fuel consumption.
• We build an ANFIS model for each combination of inputs:
– Train it for one epoch.
– Report the performance achieved.
• First, we build an ANFIS model for each single input variable
and plot its error.
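
The search over input combinations described above can be sketched as follows. This is only an outline of the procedure: the model used here is a small k-nearest-neighbour regressor standing in for a one-epoch ANFIS, and sorting by training RMSE is an assumption. All function names are hypothetical. The data are the seven sample rows from the training-data table, far too few for meaningful rankings.

```python
from itertools import combinations
from math import sqrt


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def knn_fit(xs, ys, k=2):
    """Stand-in model, NOT ANFIS: predict the mean target of the k nearest training rows."""
    def predict(x):
        by_distance = sorted(
            range(len(xs)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(xs[i], x)),
        )
        nearest = by_distance[:k]
        return sum(ys[i] for i in nearest) / len(nearest)
    return predict


def select_columns(rows, cols):
    """Keep only the chosen input columns of each row."""
    return [[row[c] for c in cols] for row in rows]


def rank_input_combinations(train, check, names, n_inputs, fit=knn_fit):
    """Fit one model per input combination; sort by training RMSE (assumed criterion)."""
    y_train = [row[-1] for row in train]
    y_check = [row[-1] for row in check]
    results = []
    for cols in combinations(range(len(names)), n_inputs):
        x_train, x_check = select_columns(train, cols), select_columns(check, cols)
        model = fit(x_train, y_train)
        results.append((
            tuple(names[c] for c in cols),
            rmse([model(x) for x in x_train], y_train),
            rmse([model(x) for x in x_check], y_check),
        ))
    return sorted(results, key=lambda item: item[1])


NAMES = ["Cylinders", "Displacement", "Horsepower", "Weight", "Acceleration", "Year"]
# Sample rows from the training-data table; the last column is MPG.
ROWS = [
    [8, 307, 130, 3504, 12.0, 70, 18.0],
    [6, 198, 95, 2833, 15.5, 70, 22.0],
    [4, 90, 75, 2108, 15.5, 74, 24.0],
    [8, 260, 110, 4060, 19.0, 77, 17.0],
    [4, 89, 62, 2050, 17.3, 81, 37.7],
    [4, 107, 75, 2205, 14.5, 82, 36.0],
    [4, 120, 79, 2625, 18.6, 82, 28.0],
]
train, check = ROWS[0::2], ROWS[1::2]   # assumed alternating split

for combo, train_rmse, check_rmse in rank_input_combinations(train, check, NAMES, 2)[:3]:
    print(combo, round(train_rmse, 3), round(check_rmse, 3))
```

Passing a different `fit` callable swaps in another model without changing the search loop; comparing the training and checking RMSE of each entry is what reveals the onset of overfitting.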
Effect of Every Variable on Fuel Consumption

Fig.: RMSE of the single-input ANFIS models. Blue line: root mean square error for training data. Green line: root mean square error for testing data.

• The plot clearly shows that Weight is the most influential
variable.
• Training and testing RMSE are comparable, so there is no sign
of “overfitting”.
• We can therefore move on to models that combine more than one
input variable.
Effect of Two-Input Variable Combinations on Fuel Consumption
• “Weight” and “Displacement” are individually the most
influential attributes.
• However, in the figure, the combination of “Weight” and “Year”
has the lowest RMSE.
• Hence “Weight” and “Year” are used for the 2-input model.
• For other combinations, the onset of overfitting is observed.
Effect of Three-Input Variable Combinations on Fuel Consumption
• 'Weight', 'Year', and 'Acceleration' are selected as the best
combination of three input variables.
• However, the minimal training (and checking) error does not
decrease significantly from that of the best 2-input model.
• Therefore we will stick to the two-input ANFIS.
Training the ANFIS Model
• With the inputs fixed, training is run for 100 epochs.
• ANFIS produces a plot of RMSE for the training and checking
data.
• The minimal checking error occurs at about epoch 45.
• The checking error curve goes up after 50 epochs.
• This indicates overfitting.
• So, ideally, training should be stopped at about 45 to 50
epochs to prevent overtraining.
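
Choosing the stopping point from the checking-error curve amounts to locating its minimum, as in this minimal sketch. The curve below is made up for illustration (it is shaped to bottom out at epoch 45) and is not the curve from the slides; `best_epoch` is a hypothetical helper.

```python
def best_epoch(checking_errors):
    """Return (1-based epoch, error) where the checking error is smallest."""
    index = min(range(len(checking_errors)), key=checking_errors.__getitem__)
    return index + 1, checking_errors[index]


# Made-up checking-error curve over 100 epochs: falls, bottoms out, then rises.
curve = [3.0 + 0.0005 * (epoch - 45) ** 2 for epoch in range(1, 101)]

epoch, error = best_epoch(curve)
print(f"stop training at epoch {epoch} (checking RMSE {error:.3f})")
```

In practice the model parameters saved at that epoch would be kept, rather than those from the final epoch.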
Analyzing the ANFIS Model

Fig.: surface plot of the output against the two inputs. One region of the surface is annotated "Unavailability of training data".


Limitations and Cautions
• The elevated corners of the surface suggest that the heavier an
automobile is, the more gas-efficient it will be.
• This is counter-intuitive.
• It is an artifact of the lack of training data in that region.


Unavailability of Data

The figure shows the lack of data in the upper right corner.
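
One rough way to spot such regions before trusting the model surface is to count training points per cell of a coarse grid over the two inputs. The sketch below assumes a 3 x 3 grid and uses only the (Weight, Year) pairs of the seven sample cars listed in the training-data table; `cell_counts` is a hypothetical helper.

```python
def cell_counts(points, bins=3):
    """Count points in each cell of a bins x bins grid spanning the data range."""
    xs, ys = zip(*points)

    def bin_index(value, low, high):
        width = (high - low) / bins
        return min(int((value - low) / width), bins - 1)  # clamp the maximum value

    counts = {(i, j): 0 for i in range(bins) for j in range(bins)}
    for x, y in points:
        cell = (bin_index(x, min(xs), max(xs)), bin_index(y, min(ys), max(ys)))
        counts[cell] += 1
    return counts


# (Weight, Year) of the seven sample cars from the training-data table.
SAMPLE = [(3504, 70), (2833, 70), (2108, 74), (4060, 77),
          (2050, 81), (2205, 82), (2625, 82)]

counts = cell_counts(SAMPLE)
empty = sorted(cell for cell, n in counts.items() if n == 0)
print("cells with no training data (weight bin, year bin):", empty)
```

For these seven rows the cell with the highest weight bin and the highest year bin, (2, 2), contains no points, so any model output there is extrapolation.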


ANFIS vs Linear Regression
• Solved with linear regression, the same problem gives a root
mean square error of 3.444.
• Using ANFIS, the RMSE is 2.978.
• The ANFIS model therefore outperforms the linear regression
model, lowering the RMSE by about 13.5%.
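
The comparison rests on the RMSE metric; a minimal sketch of the metric and of the relative improvement between the two reported values follows. Only the two RMSE figures come from the slides; the function names are hypothetical.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean square error between predictions and true values."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def relative_improvement(baseline_rmse, model_rmse):
    """Fraction by which the model's RMSE is lower than the baseline's."""
    return (baseline_rmse - model_rmse) / baseline_rmse


LINEAR_RMSE = 3.444   # linear regression, as reported above
ANFIS_RMSE = 2.978    # ANFIS, as reported above

gain = relative_improvement(LINEAR_RMSE, ANFIS_RMSE)
print(f"ANFIS lowers the RMSE by {gain:.1%}")
```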
