
INDUSTRIAL TRAINING REPORT

ON

HOUSE PRICE PREDICTION

Submitted in partial fulfillment of the
requirements for the award of the

Degree of Bachelor of Technology in Computer Science and Engineering

Submitted By:-
Name:-Drishti Gupta
University Roll No.:-8716113

SUBMITTED TO:-
Department of Computer Science and Engineering
STATE INSTITUTE OF ENGG. AND TECH., NILOKHERI (KARNAL)
Affiliated to Kurukshetra University

CERTIFICATE

DECLARATION

I, Drishti Gupta, Roll No. 8716113, a student of Bachelor of Technology (Computer
Science and Engineering), batch of 2016-20, State Institute of Engineering and
Technology, Nilokheri, hereby declare that the Summer Training project entitled “House
Price Prediction” is an original work and that it has not been submitted to any other
Institute for the award of any other degree.

Date:
Place:

DRISHTI GUPTA
Roll No.:-8716113
Computer Science and Engineering

ABSTRACT
This project explores how house prices in different counties are affected by housing
characteristics, both internal (such as the number of bathrooms and bedrooms) and
external (such as public schools’ scores or the walkability score of the neighborhood). The
training and test data are provided by Kaggle. The prediction target is the sale price of a house
given features such as area, number of bedrooms and year built. The model is evaluated
on the root mean square error (RMSE) between the predicted and the actual sale prices.

It is a good idea to first build a model on the data in its original form and record the
accuracy it achieves. This serves as a baseline against which later models can be
optimized. I used RandomForestRegressor from the scikit-learn library on the training set
as the benchmark. Random Forest is a tree-based machine learning algorithm that is
relatively robust to overfitting, because it aggregates many imperfect decision trees, each
trained on a bootstrap sample of the data. When the predictions of all the trees are averaged,
their individual errors tend to cancel out. This is called bagging.
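
The baseline described above can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic (the feature names area, bedrooms and year_built and the price formula are made up for the example, not taken from the Kaggle dataset), and the hyperparameters are arbitrary. It fits a RandomForestRegressor, computes the RMSE on a held-out split, and checks that the forest's prediction is the average of its individual trees' predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data (assumption: these three
# features and the price formula are invented for illustration only).
rng = np.random.default_rng(0)
n = 500
area = rng.uniform(500, 4000, n)
bedrooms = rng.integers(1, 6, n)
year_built = rng.integers(1950, 2011, n)
X = np.column_stack([area, bedrooms, year_built])
y = (50 * area + 10_000 * bedrooms + 300 * (year_built - 1950)
     + rng.normal(0, 10_000, n))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Baseline model: each tree is trained on a bootstrap sample of the rows.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
forest_pred = model.predict(X_test)

# Bagging: the forest's output is the mean of the per-tree predictions.
tree_preds = np.stack([tree.predict(X_test) for tree in model.estimators_])
mean_of_trees = tree_preds.mean(axis=0)

# Root mean square error between predicted and actual prices.
rmse = float(np.sqrt(mean_squared_error(y_test, forest_pred)))
print(f"Baseline RMSE: {rmse:.0f}")
```

Any later model can then be compared against this baseline RMSE on the same held-out split.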

Using the training and test datasets collected from various websites, this project applies
both the hedonic pricing model (Linear Regression) and several machine learning
algorithms, such as Random Forest (RF) and Support Vector Regression (SVR), to
predict house prices. The models’ prediction scores, as well as the ratio of overestimated
to underestimated houses, are compared against Zillow’s price estimation scores and
ratio. Results show that RF gives a better price prediction score than Zillow’s baseline on
the same dataset for Hunt County (TX), and scores close to or equal to the baseline on
three other counties. Moreover, the models reduce the ratio of overestimated to
underestimated houses from 3:2 in Zillow’s estimation to 1:1. The project also
identifies the four most important attributes for house price prediction across the counties:
assessment, comparable houses’ sold price, listed price and number of bathrooms.
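
The comparison described above can be illustrated with a minimal sketch. It is not the report's actual pipeline: the data is synthetic, the helper over_under_counts is a hypothetical name introduced here, and the SVR settings (RBF kernel, C=1e5, standardized inputs) are assumptions chosen only so the example runs. It fits the three model families named in the abstract and reports, for each, the RMSE and the number of overestimated and underestimated houses.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def over_under_counts(y_true, y_pred):
    """Count houses priced above (over) and below (under) the true value."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    over = int(np.sum(y_pred > y_true))
    under = int(np.sum(y_pred < y_true))
    return over, under


# Synthetic data (assumption: four anonymous features, made-up prices).
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(300, 4))
y = (200_000 + 150_000 * X[:, 0] + 50_000 * X[:, 1]
     + rng.normal(0, 10_000, 300))
X_train, X_test = X[:240], X[240:]
y_train, y_test = y[:240], y[240:]

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    # SVR is sensitive to feature scale, so inputs are standardized first.
    "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1e5)),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = float(np.sqrt(np.mean((pred - y_test) ** 2)))
    over, under = over_under_counts(y_test, pred)
    results[name] = (rmse, over, under)
    print(f"{name}: RMSE={rmse:.0f}, over={over}, under={under}")
```

A ratio of overestimated to underestimated houses close to 1:1 suggests the model's errors are not systematically biased in one direction.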

ACKNOWLEDGEMENT
First and foremost, we wish to express our profound gratitude to Mr. Manoj Dhiman,
Chief Mentor, TCIL-IT, Chandigarh, for giving us the opportunity to carry out our project
at TCIL-IT. It gives us great pleasure to express our sincere thanks to our trainer, Mr. Jitender
Kumar, for his invaluable guidance, support and useful suggestions at every stage of this
project work.

No words can express our deep sense of gratitude to Mr. Jitender, without whom this project
would not have turned out this way. Our heartfelt thanks to him for his immense help and
support, useful discussions and valuable recommendations throughout the course of our
project work.

We wish to thank our respected faculty and our classmates for their support.

Last but not least, we thank the Almighty for enlightening us with his blessings.

PREFACE
For the 60-day summer training we looked at several languages and chose
to learn Machine Learning with Python, because Python is easy to
manage, is object oriented and has debugging tools readily available. We then
searched for the best institute offering summer training in Python and found TCIL-IT
to be the best company working with Python, so we began our 60-day summer training at
TCIL-IT. First we learned how to write basic programs in Python. Then we started on Machine
Learning concepts with Python. Machine Learning is a field of Artificial Intelligence that
uses statistical techniques to give computer systems the ability to learn
from a given dataset without being explicitly programmed. After the 60 days of training we are
able to develop applications in Python. During the training we applied this technology to
an automation system for a house loan predictor.

Keywords: Python, Machine Learning, House price predictor.

ABOUT THE COMPANY

TCIL-IT is a leading provider of six-month and six-week industrial training in Chandigarh
for IT students. TCIL-IT is the training division of TCIL, a premier engineering
organization and a Government of India Enterprise under the Ministry of
Communication and Information Technology, under the administrative control of the
Department of Telecommunications, which was started in the year 1978. In the year
1999, ICS initiated the six-month/six-week training division with TCIL-IT, which is
managed by ICSIL in Chandigarh. This joint venture is a collaboration between Delhi State
Industrial Infrastructure Development Corporation (DSIIDC), an undertaking of the Delhi
Government, and Telecommunication Consultants India Limited (TCIL) itself.

Software Development

We provide the best and latest IT software training, which helps freshers and
corporate professionals alike to understand the latest technologies and keep pace
with them.

Instructor led campus

TCIL-IT helps new instructors get the best exposure to show their talent in the right way.

Workshops and Placement Service

At TCIL-IT, workshops are held to deepen understanding, because theory alone
is not always enough. We provide the best placement services and give our best effort
to every student.

LIST OF FIGURES

FIGURES PAGE NO.


Figure 2.1 3
Figure 2.2 5
Figure 2.3 6
Figure 2.4 9
Figure 2.5 10
Figure 2.6 11
Figure 2.7 12
Figure 2.8 13
Figure 4.1 18
Figure 4.2 18
Figure 4.3 19
Figure 4.4 19
Figure 7.1 24
Figure 7.2 24
Figure 7.3 25
Figure 7.4 25
Figure 8.1 28
Figure 8.2 29
Figure 8.3 29
Figure 8.4 30
Figure 8.5 30
Figure 8.6 31
Figure 8.7 31

LIST OF CONTENTS

PAGE NO.
DECLARATION iii
ABSTRACT iv
ACKNOWLEDGEMENT v
PREFACE vi
ABOUT COMPANY vii
CHAPTER 1 INTRODUCTION 1-2
CHAPTER 2 LITERATURE SURVEY 3-13
2.1 Python 3
2.1.1 Advantages of Python 4
2.2 Data Science 4
2.2.1 Practical Implementation of Data Science 5
2.3 Machine Learning 6
2.3.1 How Machine Learning Works? 6-7
2.3.2 Advantages of Machine Learning 7-8
2.4 Numpy 9
2.5 Pandas 10
2.6 Matplotlib 11
2.7 Scikit-Learn 12
2.7.1 Advantages of Scikit-Learn 13
2.8 Scipy 13
2.8.1 Features of Scipy 13
2.8.2 Where is Scipy used? 13
CHAPTER 3 SYSTEM REQUIREMENT SPECIFICATION 14-16
3.1 Non-Functional Requirement 14
3.1.1 Specific Requirements 14
3.2 Software Requirement 15

3.2.1 Front-End Software Requirement 15
3.2.2 Back-End Software Requirement 16
3.3 Hardware Requirement 16
CHAPTER 4 SYSTEM DESIGN AND ARCHITECTURE 17-19
4.1 System Architecture 17
4.2 Use Case Diagram 17
CHAPTER 5 SYSTEM ANALYSIS 20-21
5.1 System Objective 20
5.1.1 Relation to External Environment
5.2 Design Consideration Approach 20
5.3 Operational Concepts and Scenarios 20-21
CHAPTER 6 SYSTEM ANALYSIS TRANSFIGURATION 22
6.1 Hardware Requirements 22
CHAPTER 7 INTRODUCTION TO IDLE- DEFAULT PYTHON IDE 23-25
CHAPTER 8 SOFTWARE REQUIREMENT ANALYSIS PYTHON 26-31
CHAPTER 9 SCREENSHOT 32-42
9.1 Screenshot of Design 32-39
9.2 Screenshot of Coding Panel 40-41
CONCLUSION 43
REFERENCES 44
