House Price Prediction - Research Paper FINAL DRAFT
Abstract:
Real estate has been one of the best-known forms of investment throughout history, so the
thought of buying or selling a house crosses most people's minds at least once.
A buyer wants a modest price for a house that covers all the features they are looking
for. A seller, on the other hand, wants a sensible price for the facilities their property
provides. Answering these questions requires either extensive knowledge and expertise in
the field or a simple price prediction system.
The proposed system predicts house prices using multiple linear regression, a supervised
machine learning technique.
Keywords: house price prediction, supervised machine learning, multiple linear
regression
1. INTRODUCTION
When people want to buy a house, they usually opt for one that is affordable and has all
of the amenities they desire. The house price forecast will assist them in determining
whether the house they want to buy is worth the money. People who want to sell their
home face a similar situation. Using the house price prediction system, the seller will be
able to determine what characteristics he or she may add to the house in order to increase
the sale price.
The goal of this research is to forecast housing prices using multiple criteria. This will
provide the buyer an indication of how much money he or she would need to buy the
house of their dreams. It will also enable the seller to obtain information about the
house's true value and how he or she might optimize the profit from the sale.
There are numerous platforms that help buyers and sellers estimate the value of a home
or property. MagicBricks and 99acres are two of them. They allow the user to enter the
house's location, as well as any other features, from anywhere in India, which makes the
house price prediction system more useful.
2. LITERATURE SURVEY
Over the past few years, many studies have been conducted on the analysis and
prediction of house prices. Wilson [1] developed an artificial neural network
that helped predict future trends of house prices in England. Mark and John [2]
developed a regression model that was useful for analyzing house price trends
in an area. Tinghao [3] predicted real estate prices using an autoregressive
integrated moving average model. Zhangming [4] predicted house prices using a
back-propagation neural network model. Sampath Kumar and Santhi [5] used the
multiple linear regression technique to predict the house price of an area, and
they also predicted the increase in the price of the land after a period of one
year. Kilpatrick [6] explained how and why time series regression models are
useful for predicting house prices. Wang and Tian [7] used neural networks to
identify house price trends. Li Li and Kai-Hsuan Chu [9] also used neural
networks to predict house prices in Taipei. Instead of the usual parameters,
they used economic parameters to build their house price prediction model.
The block diagram given below summarizes the methodology followed in the paper.
Fig. 1. Block Diagram
3.1 GATHERING AND ANALYSIS
Our technique has several stages. The first stage is data collection, in which we
gathered the data from the internet. This data is used to train the machine learning
model. At this stage, the dataset is raw and unstructured. It consists of 546 rows and
12 columns. According to the dataset, prices are listed in Indian rupees and plot sizes
in square feet. The price column is the dependent variable, while the remaining columns
are the independent variables (also called features).
In the next stage, we transformed our unstructured dataset into a format suitable for
training a machine learning model. Because the multiple linear regression model is
trained on this data, all of the independent variables must store numbers rather than
text.
However, the columns driveway, recroom (recreational room), fullbase (full basement),
gashw (hot water supply), airco (central air conditioning), and prefarea (preferred area)
in our dataset contain text data in the form of yes or no. We used the 'LabelBinarizer'
class from the scikit-learn Python library to transform this into numerical data, where
'yes' is represented by the number 1 and 'no' is represented by the number 0.
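The yes/no conversion described above can be sketched as follows. This is a minimal illustration rather than the exact code used in the study: the sample rows are made up, and only two of the six binary columns are shown.

```python
import pandas as pd
from sklearn.preprocessing import LabelBinarizer

# Hypothetical sample rows; the real dataset has 546 rows.
df = pd.DataFrame({
    "driveway": ["yes", "no", "yes"],
    "airco": ["no", "no", "yes"],
})

binary_columns = ["driveway", "airco"]
for column in binary_columns:
    # LabelBinarizer sorts the classes alphabetically, so 'no' -> 0 and 'yes' -> 1.
    encoder = LabelBinarizer()
    # For two classes, fit_transform returns an (n, 1) array; ravel() flattens it.
    df[column] = encoder.fit_transform(df[column]).ravel()

print(df)
```

The mapping relies on both 'yes' and 'no' appearing in each column; a column with a single value would be encoded as all zeros.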
The stories column, which denotes the number of floors in the house, contains text data
in the form of one, two, three, and four. We used 'one hot encoding' to turn this text
data into numerical data. The stories column was split into four new columns: stories
one, stories two, stories three, and stories four. The new columns store binary values,
with 0 representing 'false' or 'no' and 1 representing 'true' or 'yes'. The original
stories column is then removed because it is no longer necessary.
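One way to perform this one hot encoding is with the pandas function get_dummies, shown below as a sketch. The use of pandas and the generated column names (stories_one and so on) are assumptions for illustration; the sample values are made up.

```python
import pandas as pd

# Hypothetical sample of the stories column.
df = pd.DataFrame({"stories": ["one", "two", "four", "three", "two"]})

# get_dummies creates one 0/1 column per distinct value
# and drops the original 'stories' column.
encoded = pd.get_dummies(df, columns=["stories"], dtype=int)

print(encoded)
```

Each row has exactly one 1 across the four new columns, marking the number of floors for that house.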
After completing all of the procedures above, all of the information in the dataset is
numerical, and the dataset is ready to be used for training.
From the figure above, we can see that price is dependent on many factors and
these factors are the independent variables/features. The model’s task will be to
calculate the coefficients (m1, m2, ..., m14) and the intercept ‘c’.
After calculating these, the model will be able to calculate the price for any
custom input.
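In other words, the model has the form price = m1*x1 + m2*x2 + ... + m14*x14 + c, where x1 to x14 are the feature values. The sketch below fits such a model with scikit-learn's LinearRegression. The choice of LinearRegression and the synthetic, noise-free data are assumptions made for illustration; the study's own data and code are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_samples, n_features = 200, 14

# Synthetic stand-in for the 14 numeric feature columns.
X = rng.normal(size=(n_samples, n_features))
true_m = rng.uniform(1.0, 5.0, size=n_features)  # made-up coefficients
true_c = 10.0                                    # made-up intercept
y = X @ true_m + true_c  # noise-free, so the fit should recover m and c

model = LinearRegression()
model.fit(X, y)

m = model.coef_       # fitted coefficients m1 ... m14
c = model.intercept_  # fitted intercept c

# Price for a custom input, computed by the model and by hand.
custom_input = rng.normal(size=(1, n_features))
predicted = model.predict(custom_input)[0]
manual = float(custom_input[0] @ m + c)
print(predicted, manual)
```

The two printed values agree, which shows that a prediction is simply the weighted sum of the features plus the intercept.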
We created a scatter plot comparing the actual prices of houses in the dataset with the
prices predicted by the model to see how well the model is working.
From the preceding graph, it can be seen that the actual price for some data points is
very near to the predicted price, indicating that the model is highly accurate for those
data points. However, the graph also reveals that the discrepancy between the actual
price and the predicted price is considerable for some data points, indicating that the
result is less accurate for those data points. In general, we can state that the model is
reasonably accurate.
Fig. 5. Scatterplot showing difference between actual price and
predicted price
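A plot of this kind can be produced with matplotlib, as sketched below. The layout (actual price on the x-axis, predicted price on the y-axis, with a reference line for perfect predictions) is one possible choice and may differ from Fig. 5. The prices are randomly generated placeholders, not results from the study.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)

# Placeholder values standing in for actual and predicted prices.
actual = rng.uniform(25_000, 190_000, size=100)
predicted = actual + rng.normal(scale=15_000, size=100)

fig, ax = plt.subplots()
ax.scatter(actual, predicted, alpha=0.6)

# Points on this line would be perfect predictions.
low, high = actual.min(), actual.max()
ax.plot([low, high], [low, high], color="red", label="perfect prediction")

ax.set_xlabel("Actual price")
ax.set_ylabel("Predicted price")
ax.legend()
# Call fig.savefig("actual_vs_predicted.png") to write the plot to a file.
```

Points close to the reference line correspond to houses whose price the model predicts well; points far from it correspond to larger errors.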
4. RESULTS
A comparison of the multiple linear regression model and various machine learning
models is shown in the diagram below.
5. CONCLUSION
In this study, we used a multiple linear regression model to forecast house prices. The
model belongs to supervised learning, which is one of the subcategories of machine
learning. All of the steps needed to build the house price prediction system were
completed. The multiple linear regression model appears to be appropriate for
forecasting housing prices.
6. REFERENCES
1. Wilson, I.D., Paris, S.D., Ware, J.A., & Jenkins, D.H. Residential Property Price Time
Series Forecasting With Neural Networks. Journal of Knowledge-Based Systems; 2002,
15: 335-341.
2. Mark, A.S., & John, W.B. Estimating Price Paths for Residential Real Estate. Journal of
Real Estate Research; 2003: 25, 277–300.
3. Tinghao. Real Estate Price Index Based on ARMA Model. Statistics and Decision;
2007, 7.
4. Zhangming, H. Research on Forecasting Real Estate Price Index Based on Neural
Networks. Journal of the Graduates Sun Yat Sen University; 2006, 27.
5. Sampathkumar, V., & Helen Santhi, M. Artificial Neural Network Modeling of Land
Price at Sowcarpet in Chennai City. International Journal of Computer Science &
Emerging Technologies; 2010, 1: 44–49.
6. Kilpatrick, J.A. Factors Influencing CBD Land Prices. Journal of Real Estate; 2000,
25: 28-29.
7. Wang, J., & Tian, P. Real Estate Price Indices Forecast by Using Wavelet Neural
Network, Computer Simulation, 2005:2.
8. Nihar Bhagat, Ankit Mohokar, Shreyash. House Price Forecasting using Data
Mining. International Journal of Computer Applications 152(2):23-26, October 2016.
9. Li Li and Kai-Hsuan Chu, “Prediction of Real Estate Price Variation Based on Economic
Parameters,” Department of Financial Management, Business School, Nankai
University, 2017.
10. Abbasov C. The prediction of the chance of selling of houses as the factor of financial
stability. In Application of Information and Communication Technologies (AICT), 2016
IEEE 10th International Conference on 2016 Oct 12 (pp. 1-4). IEEE.
11. Atharva Chogle, Priyanka Khaire, Akshata Gaud, Jinal Jain. "House Price Forecasting
using Data Mining Techniques", International Journal of Advanced Research in
Computer and Communication Engineering, Vol. 6, December 2017.
12. Banerjee D, Dutta S. Predicting the housing price direction using machine learning
techniques. In 2017 IEEE International Conference on Power, Control, Signals and
Instrumentation Engineering (ICPCSI) 2017 Sep 21 (pp. 2998-3000). IEEE.
13. Febrita RE, Alfiyatin AN, Taufiq H, Mahmud WF. Data-driven fuzzy rule extraction for
housing price prediction in Malang, East Java. In Advanced Computer Science and
Information Systems (ICACSIS), 2017 International Conference on 2017 Oct 28 (pp.
351-358). IEEE.
14. Lim WT, Wang L, Wang Y, Chang Q. Housing price prediction using neural networks. In
Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), 2016
12th International Conference on 2016 Aug 13 (pp. 518-522). IEEE.
15. Lu, Sifei, et al. "A hybrid regression technique for house prices prediction." Industrial
Engineering and Engineering Management (IEEM), 2017 IEEE International Conference
on. IEEE, 2017
16. Mukhlishin, Muhammad Fahmi, Ragil Saputra, and Adi Wibowo. "Predicting house
sale price using fuzzy logic and K-Nearest Neighbor." Informatics and Computational
Sciences (ICICoS), 2017 1st International Conference on. IEEE, 2017
17. Tan F, Cheng C, Wei Z. Time-Aware Latent Hierarchical Model for Predicting House
Prices. In 2017 IEEE International Conference on Data Mining (ICDM) 2017 Nov 1 (pp.
1111-1116). IEEE.
18. Wang JJ, Hu SG, Zhan XT, Luo Q, Yu Q, Liu Z, Chen TP, Yin Y, Hosaka S, Liu Y. Predicting
house price with a memristor-based artificial neural network. IEEE Access.
2018;June(6):(pp. 16523-16528). IEEE.
19. Yoon JH, Baldick R, Novoselac A. Dynamic demand response controller based on
real-time retail price for residential buildings. IEEE Transactions on Smart Grid. 2014
Jan;5(1):(pp. 121-9). IEEE.
20. Arulkumar V. "An Intelligent Technique for Uniquely Recognising Face and Finger
Image Using Learning Vector Quantisation (LVQ)-based Template Key Generation."
International Journal of Biomedical Engineering and Technology 26, no. 3/4 (February
2, 2018): 237-49.
21. Prof. Prachiti Deshpande. (2016). Performance Analysis of RPL Routing Protocol for
WBANs. International Journal of New Practices in Management and Engineering,
5(01), 14 - 21.
22. V Arulkumar, Charlyn Puspha Latha, Daniel Jr Dasig, "Concept of Implementing Big
Data In Smart City: Applications, Services, Data Security In Accordance With Internet
of Things and AI" International Journal of Recent Technology and Engineering 8, no. 3
23. V Arulkumar, C Selvan, V Vimalkumar, "Big data Analytics in healthcare industry. An
Analysis of Healthcare applications in Machine learning with Big data Analytics" IGI
Global Big Data Analytics for Sustainable Computing, 8, no. 3 (September 2019): 350.