The problem that we need to solve is predicting the values of houses in Boston. This is a small dataset from 1978 with 506 records and 13 variables that describe the houses. The dataset can be found here under the "Boston housing" section. Below you can find the description of all 13 variables, which take into account different aspects such as crime rate, pollution, number of rooms and education:

CRIM     per capita crime rate by town
ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS    proportion of non-retail business acres per town
CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX      nitric oxides concentration (parts per 10 million)
RM       average number of rooms per dwelling
AGE      proportion of owner-occupied units built prior to 1940
DIS      weighted distances to five Boston employment centres
RAD      index of accessibility to radial highways
TAX      full-value property-tax rate per $10,000
PTRATIO  pupil-teacher ratio by town
B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT    % lower status of the population

The output, called MEDV, is the median value of the houses in thousands of dollars (the dollar value divided by 1000). Linear Regression will be used to predict these values; click the button below for an intro on Linear Regression which highlights the main concepts and formulas.

Now that we have defined the problem and the methodology, we can show how this is done in practice. I am going to use Python and a free machine learning library called Scikit-Learn. If you are interested in creating Linear Regression from scratch, you can click the button below.

The first library is pandas, which is used to handle data and load files. The second is numpy, which is especially useful for data with many dimensions and also provides many mathematical functions. Line 3 imports the Linear Regression model that we are going to use today. Line 4 imports the functions we will use to split the data into training and testing sets and to perform cross-validation. Line 5 imports a feature selection function which we will use at the end to try to reduce the number of variables. Line 6 imports the metrics that will be used to evaluate the performance of the model. Line 7 imports the matplotlib library to plot the data and the results, while line 8 imports seaborn, another data visualization library which, I believe, improves the aesthetics of the plots and will also be used for some specific plots.

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.feature_selection import RFE

Let's start with the code: Line 1 assigns the location of the dataset to a variable.
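To make the workflow described in this post more concrete, here is a minimal sketch of loading the data, splitting it, fitting a Linear Regression and evaluating it with Scikit-Learn. It is only an illustration under assumptions, not the exact code from this post: the helper names `load_housing` and `fit_and_score`, the file path, the whitespace-separated header-less file layout, the 80/20 split and the choice of MSE and R² as metrics are all hypothetical. The plotting imports (matplotlib, seaborn) and RFE are left out to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split

# Column names of the Boston housing data: 13 inputs plus the MEDV output.
COLUMNS = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
           "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV"]


def load_housing(source):
    """Load the dataset into a DataFrame.

    Assumption: `source` is a path or file-like object holding
    whitespace-separated values with no header row.
    """
    return pd.read_csv(source, sep=r"\s+", header=None, names=COLUMNS)


def fit_and_score(df, test_size=0.2, seed=0):
    """Fit a Linear Regression on a train split and evaluate it."""
    X = df.drop(columns="MEDV")  # the 13 input variables
    y = df["MEDV"]               # median house value (thousands of dollars)

    # Hold out part of the data for testing (the split ratio is an assumption).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )

    model = LinearRegression().fit(X_train, y_train)
    predictions = model.predict(X_test)

    # 5-fold cross-validation on the training data; the default score is R^2.
    cv_scores = cross_val_score(LinearRegression(), X_train, y_train, cv=5)

    return {
        "model": model,
        "mse": mean_squared_error(y_test, predictions),
        "r2": r2_score(y_test, predictions),
        "cv_r2_mean": float(np.mean(cv_scores)),
    }


# Example usage (hypothetical path):
# dataset_location = "housing.data"
# results = fit_and_score(load_housing(dataset_location))
# print(results["mse"], results["r2"], results["cv_r2_mean"])
```

Keeping the loading and the modelling in separate functions makes it easy to swap in a different file format or a different set of metrics without touching the rest.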