# Prepare and Fit Spatial Regression Models 20190222

Notebook Creator: Roy Hyunjin Han

# Train Model to Estimate Graduation Rate from Tree Count

## Train Dummy Model

In [ ]:
from sklearn.linear_model import LinearRegression
model = LinearRegression()

In [ ]:
import pandas as pd
dataset = pd.DataFrame([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
], columns=['x1', 'x2', 'y'])
dataset

In [ ]:
X = dataset[['x1', 'x2']].values
X

In [ ]:
y = dataset['y'].values
y

In [ ]:
model.fit(X, y)

In [ ]:
model.predict([[8, 9]])

In [ ]:
model.predict([
    [0, 1],
    [8, 9],
])
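
As a quick sanity check on the dummy model, the sketch below refits the same three rows and inspects the predictions. In this toy table `x2` is always `x1 + 1` and `y` is always `x1 + 2`, so inputs that keep the same pattern should be predicted as roughly `x1 + 2`. The names `toy` and `check_model` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same three rows as the dummy dataset above
toy = pd.DataFrame([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
], columns=['x1', 'x2', 'y'])

check_model = LinearRegression()
check_model.fit(toy[['x1', 'x2']].values, toy['y'].values)

# Both inputs follow the x2 = x1 + 1 pattern of the training rows
predictions = check_model.predict([
    [0, 1],
    [8, 9],
])
print(predictions)  # approximately [2, 10], up to floating point error
```

Because the two features are perfectly correlated, the individual coefficients are not uniquely determined, so this check only says something about inputs that follow the same pattern.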


## Save Dummy Model

In [ ]:
# Save using pickle
from pickle import dump
with open('dummy-model.pkl', 'wb') as f:
    dump(model, f)

In [ ]:
# Save using joblib, which is another option
import subprocess
subprocess.call('pip install joblib'.split())
from joblib import dump
dump(model, '/tmp/dummy-model.joblib')


In [ ]:
# Load using pickle
from pickle import load
with open('dummy-model.pkl', 'rb') as f:
    model = load(f)
model

In [ ]:
# Load using joblib, which is another option
# import subprocess
# subprocess.call('pip install joblib'.split())
from joblib import load
model = load('/tmp/dummy-model.joblib')
model
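
To confirm that a saved model survives the round trip, one option is to compare predictions before and after loading. The sketch below is a minimal, self-contained version that writes to a temporary folder instead of the paths used above; the names `original_model`, `restored_model` and `same_predictions` are illustrative.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression

# Refit the dummy model on the same three rows
X = np.array([[1, 2], [4, 5], [7, 8]])
y = np.array([3, 6, 9])
original_model = LinearRegression().fit(X, y)

# Save and load inside a temporary folder that is removed afterwards
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'dummy-model.pkl')
    with open(path, 'wb') as f:
        pickle.dump(original_model, f)
    with open(path, 'rb') as f:
        restored_model = pickle.load(f)

# The restored model should give the same predictions as the original
same_predictions = np.allclose(
    original_model.predict(X), restored_model.predict(X))
print(same_predictions)
```

Pickle files can execute arbitrary code when loaded, so only load models from sources you trust, ideally with the same scikit-learn version that saved them.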


## Train Example Model

In [ ]:
import pandas as pd
t


### Prepare Feature Matrices X1, X2 and Target Variable y

In [ ]:
X1 = t[[
    'Tree Count Within 100 Meters',
    'Sum of Distances from Trees Within 100 Meters',
    'Average Risk of Trees Within 100 Meters']].values
X1

In [ ]:
X2 = t[[
    'Tree Count Within 100 Meters',
    'Average Risk of Trees Within 100 Meters']].values
X2

In [ ]:
y = t['Graduation Rate']
y


### Compare Models That Use Different Features and Algorithms

You will need to choose an appropriate metric to evaluate the performance of your fitted model.

Which metric you choose depends on whether you are performing classification, clustering or regression.

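If the target variable that we want to predict is

• a category (classification), then use a classification metric like f1
• a number (regression), then use a regression metric like neg_mean_absolute_error

scikit-learn scorers treat higher values as better, so error metrics such as mean absolute error are reported as negative numbers. A score closer to zero means less error.
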
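
The two cases in the list above can be sketched with `cross_val_score` and its `scoring` argument. This example uses synthetic data from `make_classification` and `make_regression` rather than the notebook's table, and `LogisticRegression` is only a stand-in classifier chosen for illustration.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import cross_val_score

# Classification: the target is a category, so score with f1
X_cls, y_cls = make_classification(
    n_samples=60, n_features=4, random_state=0)
f1_scores = cross_val_score(
    LogisticRegression(), X_cls, y_cls, cv=3, scoring='f1')

# Regression: the target is a number, so score with negated mean absolute error
X_reg, y_reg = make_regression(
    n_samples=60, n_features=4, noise=10.0, random_state=0)
mae_scores = cross_val_score(
    LinearRegression(), X_reg, y_reg, cv=3,
    scoring='neg_mean_absolute_error')

print(f1_scores.mean())   # between 0 and 1, higher is better
print(mae_scores.mean())  # zero or negative, closer to zero is better
```
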
In [ ]:
from sklearn.model_selection import cross_val_score
models = []
scores = []

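def train(model, X):
    # Fit on all rows so the stored model is ready to use
    model.fit(X, y)
    models.append(model)
    # Score with 3-fold cross-validation; scores are negated errors
    score = cross_val_score(
        model, X, y, cv=3,
        scoring='neg_mean_absolute_error',
    ).mean()
    scores.append(score)
    return score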

In [ ]:
from sklearn.linear_model import LinearRegression
train(LinearRegression(), X1)

In [ ]:
train(LinearRegression(), X2)

In [ ]:
from sklearn.linear_model import BayesianRidge
train(BayesianRidge(), X1)

In [ ]:
train(BayesianRidge(), X2)

In [ ]:
from sklearn.svm import SVR
train(SVR(gamma='scale'), X1)

In [ ]:
from sklearn.svm import SVR
train(SVR(gamma='scale'), X2)
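
When several feature sets and algorithms are compared, it can help to keep each score next to a label so the results are easy to read side by side. The sketch below is one possible way to do that, assuming synthetic data from `make_regression` in place of the real table; the names `feature_sets`, `algorithms` and `comparison` are hypothetical and do not appear in the original notebook.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import BayesianRidge, LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

# Synthetic stand-in for the real table: 90 rows, 3 numeric features
X_all, y_fake = make_regression(
    n_samples=90, n_features=3, noise=5.0, random_state=0)
feature_sets = {
    'three features': X_all,
    'two features': X_all[:, [0, 2]],
}
# Each entry is a callable that builds a fresh, unfitted model
algorithms = {
    'LinearRegression': LinearRegression,
    'BayesianRidge': BayesianRidge,
    'SVR': lambda: SVR(gamma='scale'),
}

rows = []
for features_name, X in feature_sets.items():
    for algorithm_name, make_model in algorithms.items():
        score = cross_val_score(
            make_model(), X, y_fake, cv=3,
            scoring='neg_mean_absolute_error',
        ).mean()
        rows.append({
            'features': features_name,
            'algorithm': algorithm_name,
            'score': score,
        })

# Largest score first, which means least error first
comparison = pd.DataFrame(rows).sort_values('score', ascending=False)
print(comparison)
```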


### Choose Model with Least Error

In [ ]:
import numpy as np
# Scores are negated errors, so the largest score has the least error
best_index = np.argmax(scores)
best_index

In [ ]:
best_model = models[best_index]
best_model
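
Choosing the least error with `np.argmax` can look backwards at first. The small sketch below, using made-up scores purely for illustration, shows that the largest negated error and the smallest plain error point to the same model.

```python
import numpy as np

# Hypothetical cross-validation scores (negated mean absolute errors)
example_scores = [-4.2, -1.5, -3.0]

# Largest negated error
best_index = int(np.argmax(example_scores))

# Smallest error after flipping the sign back
example_errors = [-s for s in example_scores]
least_error_index = int(np.argmin(example_errors))

print(best_index, least_error_index)  # 1 1
```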

In [ ]:
import pickle
with open('/tmp/model.pkl', 'wb') as f:
    pickle.dump(best_model, f)