
Experiment No. 1
_______________________________________________________________
Title: Data Preprocessing

i) Data Type Conversion


ii) Data Transformation
Objectives:

To preprocess a data set using Weka.

Requirements:
1. The Explorer
2. The Knowledge Flow interface
3. The ARFF file format

Steps of execution:

Step 1: Load the data set using Open file in the Explorer.

Step 2: Weka recognizes the attributes and the relation in the data set.

Step 3: Data Preprocessing

3.1 Data Type Conversion:

3.1.1 Select Choose -> weka.filters.unsupervised.attribute and pick the desired data type
conversion filter, for example weka.filters.unsupervised.attribute.NominalToString.
3.1.2 Click the text box immediately to the right of the Choose button. In the resulting
dialog box, enter the index of the attribute to be converted.
3.1.3 If the dialog box shows an invert selection option, make sure it is set to false, then click OK.
3.1.4 Click the Apply button to apply the filter to the data. This converts the selected
attribute and creates a new working relation.
3.1.5 Save the new working relation as an ARFF file.
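
The effect of the type conversion on the ARFF header can be sketched in Python. This is only an illustration of what changes in the file, not Weka's implementation: the helper name nominal_to_string and the shortened three-attribute header are assumptions made for the example, and attribute names containing spaces are not handled.

```python
def nominal_to_string(header_lines, index):
    """Redeclare the index-th attribute (1-based) as a string attribute.

    Hypothetical helper for illustration; it only rewrites header text.
    """
    result = []
    seen = 0
    for line in header_lines:
        if line.lower().startswith("@attribute"):
            seen += 1
            if seen == index:
                # Keep the attribute name and drop the {...} list of values.
                name = line.split()[1]
                line = f"@attribute {name} string"
        result.append(line)
    return result


# Shortened header, assumed for the example.
header = [
    "@relation Student",
    "@attribute Name {Raja,Devi,Kannan}",
    "@attribute Gender {M,F}",
    "@attribute RollNo numeric",
]

converted = nominal_to_string(header, 1)
print("\n".join(converted))
```

Only the declaration of the selected attribute changes; the data rows and the other attributes stay as they are.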
3.2 Data Transformation:
3.2.1 Select Choose -> weka.filters.unsupervised.attribute.Discretize
3.2.2 Select either first-last or the index of the desired numeric column and press Apply.
3.2.3 The filter converts the selected numeric data into categorical (nominal) data.
3.2.4 Save the work as a new relation file.
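
The discretization step can be sketched in Python as equal-width binning. This is a minimal sketch that assumes three bins of equal width between the minimum and maximum value; the function names are hypothetical and the code is not taken from Weka. The values are the temperature column of the weather relation used in this experiment.

```python
import math


def equal_width_cut_points(values, bins):
    """Return the bins - 1 interior cut points for equal-width binning."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    return [low + width * i for i in range(1, bins)]


def interval_label(value, cut_points):
    """Return a label such as '(71-78]' for the interval containing value."""
    edges = [-math.inf] + cut_points + [math.inf]
    for left, right in zip(edges, edges[1:]):
        # Intervals are open on the left and closed on the right.
        if value <= right:
            closing = ")" if right == math.inf else "]"
            return f"({left:g}-{right:g}{closing}"


temperatures = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]
cut_points = equal_width_cut_points(temperatures, bins=3)
labels = [interval_label(t, cut_points) for t in temperatures]

print(cut_points)   # [71.0, 78.0]
print(labels[:4])   # ['(78-inf)', '(78-inf)', '(78-inf)', '(-inf-71]']
```

With a minimum of 64 and a maximum of 85 the bin width is 7, which gives the cut points 71 and 78.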
Input Relation as CSV:
Name,Gender,RollNo,Mark1,Mark2,Total,Average,Class,Result
Raja,M,3401,45,60,105,52.5,Second,Pass
Devi,,3402,65,25,90,45,Second,Reappear
Kannan,M,3403,67,89,156,78,First,Pass
Ahamed,M,3404,100,34,134,67,First,Pass
Shanthi,F,3405,90,25,115,57.5,Second,Reappear
Ananthi,F,3406,45,100,145,72.5,First,Pass
Aravind,M,3407,23,65,88,44,Second,Reappear
Anvar,M,3408,74,38,112,56,Second,Pass
Lakshmi,,3409,63,78,141,70.5,First,Pass
Ramu,M,3410,43,59,102,51,Second,Pass
Sakthivel,M,3411,83,47,130,65,First,
Vijaya,F,3412,96,36,132,66,First,Pass
Kokul,M,3413,73,24,97,48.5,Second,Reappear
Ranjith,M,3414,34,45,79,39.5,Second,Pass
Prasana,M,3415,62,78,140,70,First,Pass
Abinaya,F,3416,93,47,140,70,First,Pass
Ganesh,M,3417,98,62,160,80,First,Pass
Bharathi,F,3418,87,100,187,93.5,First,Pass
Output Relations as arff
Datatype conversion
@relation Student-weka.filters.unsupervised.attribute.NominalToString-C1
@attribute Name string
@attribute Gender {M,F}
@attribute RollNo numeric
@attribute Mark1 numeric
@attribute Mark2 numeric
@attribute Total numeric
@attribute Average numeric
@attribute Class {Second,First}
@attribute Result {Pass,Reappear}
@data
Raja,M,3401,45,60,105,52.5,Second,Pass
Devi,?,3402,65,25,90,45,Second,Reappear
Kannan,M,3403,67,89,156,78,First,Pass
Ahamed,M,3404,100,34,134,67,First,Pass
Shanthi,F,3405,90,25,115,57.5,Second,Reappear
Ananthi,F,3406,45,100,145,72.5,First,Pass
Aravind,M,3407,23,65,88,44,Second,Reappear
Anvar,M,3408,74,38,112,56,Second,Pass
Lakshmi,?,3409,63,78,141,70.5,First,Pass
Ramu,M,3410,43,59,102,51,Second,Pass
Sakthivel,M,3411,83,47,130,65,First,?
Vijaya,F,3412,96,36,132,66,First,Pass
Kokul,M,3413,73,24,97,48.5,Second,Reappear
Ranjith,M,3414,34,45,79,39.5,Second,Pass
Prasana,M,3415,62,78,140,70,First,Pass
Abinaya,F,3416,93,47,140,70,First,Pass
Ganesh,M,3417,98,62,160,80,First,Pass
Bharathi,F,3418,87,100,187,93.5,First,Pass
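
Comparing the CSV input with the ARFF data rows above, the empty CSV fields appear as ? in the ARFF file. The following Python sketch shows that mapping for three of the rows; the function name csv_to_arff_rows is an assumption for illustration, and the sketch does not generate the @relation and @attribute header.

```python
import csv
import io

csv_text = """Name,Gender,RollNo,Mark1,Mark2,Total,Average,Class,Result
Raja,M,3401,45,60,105,52.5,Second,Pass
Devi,,3402,65,25,90,45,Second,Reappear
Sakthivel,M,3411,83,47,130,65,First,
"""


def csv_to_arff_rows(text):
    """Return ARFF data rows, writing ? for every empty CSV field."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the CSV header line
    return [",".join(field if field else "?" for field in row) for row in reader]


rows = csv_to_arff_rows(csv_text)
print("\n".join(rows))
```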
Input for data transformation
@relation weather
@attribute outlook {sunny,overcast,rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE,FALSE}
@attribute play {yes,no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
Output relation as arff
@relation weather-weka.filters.unsupervised.attribute.Discretize-B3-M-1.0-R2
@attribute outlook {sunny,overcast,rainy}
@attribute temperature {'\'(-inf-71]\'','\'(71-78]\'','\'(78-inf)\''}
@attribute humidity numeric
@attribute windy {TRUE,FALSE}
@attribute play {yes,no}
@data
sunny,'\'(78-inf)\'',85,FALSE,no
sunny,'\'(78-inf)\'',90,TRUE,no
overcast,'\'(78-inf)\'',86,FALSE,yes
rainy,'\'(-inf-71]\'',96,FALSE,yes
rainy,'\'(-inf-71]\'',80,FALSE,yes
rainy,'\'(-inf-71]\'',70,TRUE,no
overcast,'\'(-inf-71]\'',65,TRUE,yes
sunny,'\'(71-78]\'',95,FALSE,no
sunny,'\'(-inf-71]\'',70,FALSE,yes
rainy,'\'(71-78]\'',80,FALSE,yes
sunny,'\'(71-78]\'',70,TRUE,yes
overcast,'\'(71-78]\'',90,TRUE,yes
overcast,'\'(78-inf)\'',75,FALSE,yes
rainy,'\'(-inf-71]\'',91,TRUE,no
Output:
Data type conversion:
Data transformation:
Discretize (Numeric to Nominal using weather.numeric.arff)
Conclusion:
The data set is preprocessed with Weka using data type conversion and numeric-to-nominal
transformation.
Experiment No. 2
_______________________________________________________________
Title: Filters

i) Replace missing Values


ii) Add expression

Feature Selection

i)Filters ii) Wrapper & iii)Dimensionality reduction


Objectives:

To filter the data and fill in missing values, and to reduce multidimensional data to
lower-dimensional data.

Steps for execution:

Step 1: Load the data set using Open file in the Explorer.

Step 2: Weka recognizes the attributes and the relation in the data set.

Step 3:

3.1 Replace missing values

3.1.1 Choose weka.filters.unsupervised.attribute.ReplaceMissingWithUserConstant

3.1.2 Click the text box to the right of the Choose button to enter the replacement
values for missing attribute values.

3.1.3 Give 0 for numeric data and unknown for nominal and string data, then press
Apply.

3.1.4 Weka replaces the missing values with the given values. In the output relation
below, the missing Result value of Sakthivel remains ?, so not every attribute is affected.

3.1.5 Save the work as a new relation.
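
The replacement rule in the steps above (0 for numeric data, unknown for nominal and string data) can be sketched in Python. This is a minimal sketch and not Weka's implementation: the function name replace_missing, the use of None for a missing value, and the reduced set of columns are assumptions for the example.

```python
def replace_missing(rows, numeric_columns, numeric_constant=0,
                    nominal_constant="unknown"):
    """Return copies of the rows with every missing value (None) replaced."""
    replaced = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            if value is None:
                # Numeric columns get 0, all other columns get "unknown".
                if column in numeric_columns:
                    value = numeric_constant
                else:
                    value = nominal_constant
            new_row[column] = value
        replaced.append(new_row)
    return replaced


# Three rows of the Student relation, reduced to four columns.
students = [
    {"Name": "Devi", "Gender": None, "Mark1": 65, "Mark2": 25},
    {"Name": "Ananthi", "Gender": "F", "Mark1": 45, "Mark2": None},
    {"Name": "Vijaya", "Gender": "F", "Mark1": None, "Mark2": 36},
]

cleaned = replace_missing(students, numeric_columns={"Mark1", "Mark2"})
for row in cleaned:
    print(row)
```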

3.2 Add expression

3.2.1 Choose weka.filters.unsupervised.attribute.AddExpression

3.2.2 Click the text box to the right of the Choose button and define the expression and
the name of the new attribute.

3.2.3 Click Apply and save the result as a new relation.
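
The expression used in this experiment, ifelse(A7>=75,1,0) with the attribute name Distinction, can be sketched in Python. A7 refers to the seventh attribute, Average. The function name add_distinction is an assumption for illustration; the code only mimics the result of the filter on three rows of the Student relation.

```python
def add_distinction(rows, threshold=75):
    """Append 1 to a row if its Average (7th value) is at least the threshold, else 0."""
    average_index = 6  # A7 is 1-based, Python lists are 0-based
    return [row + [1 if row[average_index] >= threshold else 0] for row in rows]


students = [
    ["Raja", "M", 3401, 45, 60, 105, 52.5, "Second", "Pass"],
    ["Kannan", "M", 3403, 67, 89, 156, 78.0, "First", "Pass"],
    ["Ahamed", "M", 3404, 100, 34, 134, 67.0, "First", "Pass"],
]

with_distinction = add_distinction(students)
for row in with_distinction:
    print(row)
```

Only Kannan, with an average of 78, receives the value 1 for the new attribute.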


3.3 Feature Selection using Filters

Feature Selection enables reduction of noisy data.

3.3.1 Feature Selection based on Filters:

Based on preprocessing.

a.

Input relation as CSV


Name,Gender,RollNo,Mark1,Mark2,Total,Average,Class,Result
Raja,M,3401,45,60,105,52.5,Second,Pass
Devi,,3402,65,25,90,45,Second,Reappear
Kannan,M,3403,67,89,156,78,First,Pass
Ahamed,M,3404,100,34,134,67,First,Pass
Shanthi,F,3405,90,25,115,57.5,Second,Reappear
Ananthi,F,3406,45,,145,72.5,First,Pass
Aravind,M,3407,23,65,88,44,Second,Reappear
Anvar,M,3408,74,38,112,56,Second,Pass
Lakshmi,,3409,63,78,141,70.5,First,Pass
Ramu,M,3410,43,59,102,51,Second,Pass
Sakthivel,M,3411,83,47,130,65,First,
Vijaya,F,3412,,36,132,66,First,Pass
Kokul,M,3413,73,24,97,48.5,Second,Reappear
Ranjith,M,3414,34,45,79,39.5,Second,Pass
Prasana,M,3415,62,78,140,70,First,Pass
Abinaya,F,3416,93,47,140,70,First,Pass
Ganesh,M,3417,98,,160,80,First,Pass
Bharathi,F,3418,87,100,187,93.5,First,Pass
Output as arff
@relation 'Student-weka.filters.unsupervised.attribute.ReplaceMissingWithUserConstant-Afirst-last-Nunknown-R0-Fyyyy-MM-dd\'T\'HH:mm:ss-weka.filters.unsupervised.attribute.AddExpression-Eifelse(A7>=75,1,0)-NDistinction'
@attribute Name {unknown,Raja,Devi,Kannan,Ahamed,Shanthi,Ananthi,Aravind,Anvar,Lakshmi,Ramu,Sakthivel,Vijaya,Kokul,Ranjith,Prasana,Abinaya,Ganesh,Bharathi}
@attribute Gender {unknown,M,F}
@attribute RollNo numeric
@attribute Mark1 numeric
@attribute Mark2 numeric
@attribute Total numeric
@attribute Average numeric
@attribute Class {unknown,Second,First}
@attribute Result {Pass,Reappear}
@attribute Distinction numeric
@data
Raja,M,3401,45,60,105,52.5,Second,Pass,0
Devi,unknown,3402,65,25,90,45,Second,Reappear,0
Kannan,M,3403,67,89,156,78,First,Pass,1
Ahamed,M,3404,100,34,134,67,First,Pass,0
Shanthi,F,3405,90,25,115,57.5,Second,Reappear,0
Ananthi,F,3406,45,0,145,72.5,First,Pass,0
Aravind,M,3407,23,65,88,44,Second,Reappear,0
Anvar,M,3408,74,38,112,56,Second,Pass,0
Lakshmi,unknown,3409,63,78,141,70.5,First,Pass,0
Ramu,M,3410,43,59,102,51,Second,Pass,0
Sakthivel,M,3411,83,47,130,65,First,?,0
Vijaya,F,3412,0,36,132,66,First,Pass,0
Kokul,M,3413,73,24,97,48.5,Second,Reappear,0
Ranjith,M,3414,34,45,79,39.5,Second,Pass,0
Prasana,M,3415,62,78,140,70,First,Pass,0
Abinaya,F,3416,93,47,140,70,First,Pass,0
Ganesh,M,3417,98,0,160,80,First,Pass,1
Bharathi,F,3418,87,100,187,93.5,First,Pass,1

Output
Conclusion:
The missing values in the given data set are replaced with user-supplied constants, and a
new attribute defined by an expression is added for further processing.
