Predicting Amsterdam house / real estate prices using Ordinary Least Squares-, XGBoost-, KNN-, Lasso-, Ridge-, Polynomial-, Random Forest-, and Neural Network MLP Regression (via scikit-learn)


MBKraus/Predicting_real_estate_prices_using_scikit-learn


Approach:

  • load a Pandas DataFrame containing housing data (Dec-17) retrieved with a scraper and supplemented with latitude and longitude coordinates mapped from zip code (via GeoPy)
  • do some simple data exploration / visualisation
  • remove non-numeric data, NaNs, and outliers (everything above 3 x the standard deviation of y)
  • define the explanatory variables (surface, latitude, and longitude) and the dependent variable (price in EUR)
  • split the data into train and test sets (and normalise the explanatory variables where required)
  • find the optimal model parameters using scikit-learn's GridSearchCV
  • fit the model using GridSearchCV's optimal parameters
  • evaluate estimator performance by means of 5-fold 'shuffled' nested cross-validation
  • predict cross-validated estimates of y for each data point and plot them against the true y in a scatter diagram
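The cleaning, tuning, and evaluation steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the data is synthetic, the KNN estimator and its parameter grid are arbitrary choices, and the outlier rule is read here as "more than 3 standard deviations from the mean of y", which is one possible interpretation. The column names follow the sample DataFrame shown further down.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV, KFold, cross_val_predict, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scraped listings (NOT the repository's data).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "surface": rng.uniform(30, 200, n),
    "latitude": rng.uniform(52.30, 52.42, n),
    "longitude": rng.uniform(4.80, 5.00, n),
})
df["price_new"] = 4000 * df["surface"] + rng.normal(0, 40000, n)

# Keep numeric columns, drop NaNs, and drop rows whose price lies more than
# 3 standard deviations from the mean (an assumed reading of the outlier rule).
df = df.select_dtypes("number").dropna()
y_mean, y_std = df["price_new"].mean(), df["price_new"].std()
df = df[(df["price_new"] - y_mean).abs() <= 3 * y_std]

X = df[["surface", "latitude", "longitude"]].to_numpy()
y = df["price_new"].to_numpy()

# Inner loop: GridSearchCV tunes the hyperparameter.
# Scaling sits inside the pipeline so it is refit on every training fold.
inner_cv = KFold(n_splits=5, shuffle=True, random_state=1)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=2)
model = make_pipeline(StandardScaler(), KNeighborsRegressor())
search = GridSearchCV(
    model,
    param_grid={"kneighborsregressor__n_neighbors": [5, 10, 20]},
    cv=inner_cv,
    scoring="r2",
)

# Outer loop: score the tuned estimator on folds it never saw during tuning.
nested_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="r2")
print("nested CV R^2: %.3f" % nested_scores.mean())

# Cross-validated prediction for every data point, for a predicted-vs-true plot.
y_pred = cross_val_predict(search, X, y, cv=outer_cv)
```

Passing the `GridSearchCV` object itself to `cross_val_score` is what makes the cross-validation nested: the hyperparameter search is repeated inside each outer training fold, so the reported score is not biased by tuning on the evaluation data.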

Packages required

Scores (5-fold 'shuffled' nested cross-validation - R-squared)

1. XGBoost Regression

  • Parameters: max_depth: 5, min_child_weight: 6, gamma: 0.01, colsample_bytree: 1, subsample: 0.7
  • Score: 0.887

2. Random Forest Regression

  • Parameters: max_depth: 6, max_features: None, n_estimators: 10
  • Score: 0.839

3. Polynomial Regression

  • Parameters: degree: 2
  • Score: 0.731
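In scikit-learn, polynomial regression is usually built by chaining `PolynomialFeatures` with a linear model. The sketch below assumes that construction (the source does not say how the model was assembled) and uses synthetic data with the three explanatory variables in the order surface, latitude, longitude.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic features: surface, latitude, longitude (illustrative only).
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(30, 200, 200),
    rng.uniform(52.30, 52.42, 200),
    rng.uniform(4.80, 5.00, 200),
])
# Price with a quadratic surface term, so a degree-2 model can capture it.
y = 3000 * X[:, 0] + 10 * X[:, 0] ** 2 + rng.normal(0, 20000, 200)

# Expand the features to degree 2, then fit ordinary least squares on them.
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(X, y)

# Number of columns after expansion: bias, 3 linear, 3 squares, 3 pairwise products.
n_terms = poly_model.named_steps["polynomialfeatures"].n_output_features_
print(n_terms)
```

With three input features, degree 2 already yields 10 columns, so higher degrees grow the design matrix quickly.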

4. Neural Network MLP Regression

  • Parameters: activation: relu, alpha: 0.01, hidden_layer_sizes: (10, 10), learning_rate: invscaling
  • Score: 0.715
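A hedged sketch of how these settings map onto scikit-learn's `MLPRegressor`, on synthetic data. The feature and target scaling, `max_iter`, and `random_state` are assumptions added here for a stable illustration and are not taken from the source. Note that in scikit-learn the `learning_rate` schedule is only used when `solver="sgd"`; with the default solver it has no effect.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features: surface, latitude, longitude (illustrative only).
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(30, 200, 200),
    rng.uniform(52.30, 52.42, 200),
    rng.uniform(4.80, 5.00, 200),
])
y = 4000 * X[:, 0] + rng.normal(0, 40000, 200)

mlp = MLPRegressor(
    hidden_layer_sizes=(10, 10),
    activation="relu",
    alpha=0.01,
    learning_rate="invscaling",  # ignored unless solver="sgd"
    max_iter=2000,               # assumption: enough iterations for this toy data
    random_state=0,
)

# Standardise both the features and the target; neural networks train
# poorly on raw prices in the hundreds of thousands.
model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), mlp),
    transformer=StandardScaler(),
)
model.fit(X, y)
pred = model.predict(X)

# The solver actually used by the fitted network (scikit-learn's default).
solver = model.regressor_.named_steps["mlpregressor"].solver
print(solver)
```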

5. KNN Regression

  • Parameters: n_neighbors: 10
  • Score: 0.711

6. Ordinary Least-Squares Regression

  • Parameters: None
  • Score: 0.694

7. Ridge Regression

  • Parameters: alpha: 0.01
  • Score: 0.694

8. Lasso Regression

  • Parameters: alpha: 0.01
  • Score: 0.693
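The near-identical scores for OLS, Ridge, and Lasso are what one would expect when alpha is as small as 0.01: the penalty is tiny relative to the scale of the price, so the regularised fits barely move away from the least-squares solution. The sketch below illustrates this on synthetic data; the data and the standardisation step are assumptions, not the repository's setup.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic features: surface, latitude, longitude (illustrative only).
rng = np.random.default_rng(0)
X_raw = np.column_stack([
    rng.uniform(30, 200, 200),
    rng.uniform(52.30, 52.42, 200),
    rng.uniform(4.80, 5.00, 200),
])
y = 4000 * X_raw[:, 0] + rng.normal(0, 40000, 200)
X = StandardScaler().fit_transform(X_raw)

models = {
    "ols": LinearRegression(),
    "ridge": Ridge(alpha=0.01),
    "lasso": Lasso(alpha=0.01),
}

# Fit each model and record its in-sample R^2 for a side-by-side comparison.
r2 = {name: m.fit(X, y).score(X, y) for name, m in models.items()}
for name, value in r2.items():
    print(f"{name}: {value:.4f}")
```

A larger alpha grid in the search would be needed before regularisation could noticeably change the result.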

Sample data input (Pandas DataFrame)

   surface  rooms_new  zipcode_new  price_new   latitude  longitude
0    138.0        4.0         1060     420000  40.804672 -73.963420
1    130.0        5.0         1087     550000  52.355590   5.000561
2    116.0        5.0         1061     425000  52.373044   4.837568
3     92.0        5.0         1035     349511  52.416895   4.906767
4    127.0        4.0         1013    1050000  52.396789   4.876607
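Row 0 in the sample has coordinates (40.80, -73.96), which lie far outside the Netherlands and suggest the zip-code lookup matched a location elsewhere. A simple sanity check can flag such rows before modelling. In the sketch below the bounding box around Amsterdam is a rough, hypothetical choice and the `rooms_new` column is omitted for brevity.

```python
import pandas as pd

# The five sample rows shown above (rooms_new omitted).
sample = pd.DataFrame({
    "surface": [138.0, 130.0, 116.0, 92.0, 127.0],
    "zipcode_new": [1060, 1087, 1061, 1035, 1013],
    "price_new": [420000, 550000, 425000, 349511, 1050000],
    "latitude": [40.804672, 52.355590, 52.373044, 52.416895, 52.396789],
    "longitude": [-73.963420, 5.000561, 4.837568, 4.906767, 4.876607],
})

# Hypothetical bounding box around Amsterdam (an assumption, not from the source).
LAT_MIN, LAT_MAX = 52.25, 52.45
LON_MIN, LON_MAX = 4.70, 5.10

# Flag rows whose coordinates fall outside the box.
in_city = (
    sample["latitude"].between(LAT_MIN, LAT_MAX)
    & sample["longitude"].between(LON_MIN, LON_MAX)
)
suspect = sample[~in_city]
print(suspect[["zipcode_new", "latitude", "longitude"]])
```

Since latitude and longitude are two of the three explanatory variables, mis-geocoded rows like this one can distort distance-based models such as KNN in particular.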

[Figure: Scatter plot - Surface vs. Asking Price (EUR)]

[Figure: XGBoost - Predicted prices vs. True price (EUR)]

[Figure: Random Forest - Predicted prices vs. True price (EUR)]

[Figure: Polynomial - Predicted prices vs. True price (EUR)]

[Figure: Neural Network MLP - Predicted prices vs. True price (EUR)]

[Figure: KNN - Predicted prices vs. True price (EUR)]

[Figure: OLS - Predicted prices vs. True price (EUR)]

[Figure: Lasso - Predicted prices vs. True price (EUR)]

[Figure: Ridge - Predicted prices vs. True price (EUR)]
