# Prepare and Fit Spatial Regression Models 20190222

Notebook Creator: Roy Hyunjin Han

# Predict Metrics by Zipcode

Here is a dummy tool template that you can use to prototype your tool. This template assumes that each row of your training dataset corresponds to a zipcode.

Note that this tool uses a dummy model. Please modify the inputs, outputs and model to fit your chosen hypothesis and training dataset.

Thanks to the following groups for making this work possible:

{zipcode_table : Zipcodes ? Specify the zipcodes for which you would like to predict metrics}

In [ ]:
# CrossCompute
zipcode_table_path = 'zipcode.csv'
target_folder = '/tmp'


In [ ]:
import pandas as pd
zipcode_table = pd.read_csv(zipcode_table_path)
zipcode_table[:3]


## Normalize Variables

Here we want to normalize tree count. We can do that by dividing tree count by zipcode area in square meters.

In [ ]:
url = 'https://data.cityofnewyork.us/download/i8iw-xf4u/application%2Fzip'

In [ ]:
# Use default projection to get zipcode area in square feet
# import geotable
# print(nyc_zipcode_table.iloc[0]['AREA'])
# print(nyc_zipcode_table.iloc[0]['geometry_object'].area)

In [ ]:
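# Get UTM projection
import geotable
utm_proj4 = geotable.load_utm_proj4(url)  # Assumes geotable can read the archive at this URL
utm_proj4

In [ ]:
# Get zipcode area in square meters
nyc_zipcode_table = geotable.load(url, target_proj4=utm_proj4)  # Reproject to UTM so that areas are in square meters
nyc_zipcode_table.iloc[0]['geometry_object'].area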
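
As a rough sanity check, the area computed in the UTM projection should be close to the square-foot area after unit conversion. Here is a minimal sketch that assumes only the definition 1 ft = 0.3048 m. The function name and the sample value are hypothetical. If the source data uses US survey feet, the result differs by a few parts per million, which does not matter for this check.

```python
# Exact by definition: 1 international foot = 0.3048 meters
METERS_PER_FOOT = 0.3048
SQUARE_METERS_PER_SQUARE_FOOT = METERS_PER_FOOT ** 2


def square_feet_to_square_meters(area_in_square_feet):
    """Convert an area from square feet to square meters."""
    return area_in_square_feet * SQUARE_METERS_PER_SQUARE_FOOT


# Hypothetical value for illustration only
example_area_in_square_feet = 1_000_000.0
print(square_feet_to_square_meters(example_area_in_square_feet))
```

If the converted value is far from the UTM area for the same zipcode, the projection step is worth a second look.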

In [ ]:
nyc_zipcode_table['Area in Square Meters'] = nyc_zipcode_table[
    'geometry_object'].apply(lambda g: g.area)
nyc_zipcode_table[['ZIPCODE', 'Area in Square Meters']][:5]

In [ ]:
# Extract relevant columns
nyc_zipcode_table = nyc_zipcode_table[['ZIPCODE', 'Area in Square Meters']].copy()

In [ ]:
# Merge tables
zipcode_table['Zipcode'] = zipcode_table['Zipcode'].astype(str)
dataset_table = pd.merge(zipcode_table, nyc_zipcode_table, left_on='Zipcode', right_on='ZIPCODE')
dataset_table[:5]
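
Note that the default merge is an inner join, so zipcodes without a matching polygon are dropped without warning. Here is a small sketch of one way to see which zipcodes failed to match, using a left merge with `indicator=True`. The toy tables and the names `checked_table` and `unmatched_zipcodes` are hypothetical.

```python
import pandas as pd

# Hypothetical toy tables; column names follow the notebook
zipcode_table = pd.DataFrame({
    'Zipcode': [10001, 10002, 99999],
    'Tree Count': [120, 80, 5],
})
nyc_zipcode_table = pd.DataFrame({
    'ZIPCODE': ['10001', '10002'],
    'Area in Square Meters': [1_600_000.0, 2_300_000.0],
})

# Cast to str so that the key dtype matches the ZIPCODE column
zipcode_table['Zipcode'] = zipcode_table['Zipcode'].astype(str)

# A left merge with indicator=True keeps unmatched rows and labels them
checked_table = pd.merge(
    zipcode_table, nyc_zipcode_table,
    left_on='Zipcode', right_on='ZIPCODE',
    how='left', indicator=True)
unmatched_zipcodes = checked_table.loc[
    checked_table['_merge'] == 'left_only', 'Zipcode'].tolist()
print(unmatched_zipcodes)
```

An empty list means every zipcode in your table found a polygon.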

In [ ]:
# Add normalized column
dataset_table['Tree Count Per Square Meter'] = dataset_table[
    'Tree Count'] / dataset_table['Area in Square Meters']
dataset_table[:5]
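
To see why normalizing matters, here is a toy sketch with hypothetical values. The second zipcode has more trees in total but a lower density, because its area is larger.

```python
import pandas as pd

# Hypothetical values for illustration only
toy_table = pd.DataFrame({
    'Zipcode': ['10001', '10002'],
    'Tree Count': [120, 150],
    'Area in Square Meters': [1_600_000.0, 3_000_000.0],
})
toy_table['Tree Count Per Square Meter'] = (
    toy_table['Tree Count'] / toy_table['Area in Square Meters'])

# Compare the ranking by raw count with the ranking by density
zipcode_with_most_trees = toy_table.loc[
    toy_table['Tree Count'].idxmax(), 'Zipcode']
zipcode_with_highest_density = toy_table.loc[
    toy_table['Tree Count Per Square Meter'].idxmax(), 'Zipcode']
print(zipcode_with_most_trees, zipcode_with_highest_density)
```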


## Run Model to Estimate Target Variable

In [ ]:
# Load model
from pickle import load
model = load(open('dummy-model.pkl', 'rb'))  # !!! Replace dummy model with your model
model

In [ ]:
# Run model
X = dataset_table[['Tree Count Per Square Meter', 'Air Pollution Value']].values
y = model.predict(X)
y
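
The cell above only calls `model.predict(X)`, so any object with a compatible `predict` method can stand in for the dummy model. Here is a minimal sketch of such a stand-in. `MeanModel` and the training values are hypothetical and are not the contents of `dummy-model.pkl`.

```python
class MeanModel:
    """Hypothetical stand-in that predicts the training mean for every row."""

    def fit(self, X, y):
        # Ignore the features and remember the mean of the target
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        # Return one prediction per row of X
        return [self.mean_ for _ in range(len(X))]


# Hypothetical training data with two features per row, as in the notebook
X_train = [[0.00007, 12.0], [0.00004, 15.0], [0.00009, 9.0]]
y_train = [70.0, 55.0, 85.0]
model = MeanModel().fit(X_train, y_train)
predictions = model.predict([[0.00005, 11.0], [0.00008, 10.0]])
print(predictions)
```

A baseline like this is also a useful yardstick when you evaluate your real model.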

In [ ]:
# Add column
dataset_table['Predicted Graduation Rate'] = y
dataset_table


## Render Table

In [ ]:
# Select columns
output_table = dataset_table[[
    'Zipcode',
    'Tree Count Per Square Meter',
    'Air Pollution Value',
    'Predicted Graduation Rate',
]].copy()

In [ ]:
# Save file to target folder to include it in the result download
target_path = target_folder + '/a.csv'
output_table.to_csv(target_path, index=False)
print(f'a_table_path = {target_path}')  # Print table_path to render table


## Render Map

In [ ]:
output_geotable = output_table.copy()  # Prevent SettingWithCopyWarning
output_geotable

In [ ]:
# Define wkt_by_zipcode
import geotable
# Specify target_proj4 to convert from NYC spatial reference to longitude and latitude
nyc_zipcode_table = geotable.load(url, target_proj4=geotable.LONGITUDE_LATITUDE_PROJ4)

In [ ]:
wkt_by_zipcode = {}
for index, row in nyc_zipcode_table.iterrows():
    zipcode = row['ZIPCODE']
    geometry = row['geometry_object']
    wkt_by_zipcode[zipcode] = geometry.wkt

In [ ]:
# Geocode zipcode polygons
output_geotable['WKT'] = output_geotable['Zipcode'].apply(
    lambda zipcode: wkt_by_zipcode.get(str(int(zipcode))))
output_geotable = output_geotable.dropna(subset=['WKT'])
output_geotable[:3]

In [ ]:
# nyc_zipcode_table[nyc_zipcode_table['ZIPCODE'] == '10019'].iloc[0]['geometry_object'].wkt

In [ ]:
# Set color for each geometry using a gradient
# output_geotable['FillReds'] = output_geotable['Predicted Graduation Rate']

In [ ]:
# Set color for each geometry using a rule
output_geotable['FillColor'] = output_geotable.apply(
    lambda row: 'r' if row['Predicted Graduation Rate'] < 50 else 'g',
    axis=1)
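
Here is the same rule as a standalone function so that you can check the edge cases. The name `get_fill_color` and its `threshold` parameter are hypothetical. Note that a rate of exactly 50 is colored green, and so is a missing (NaN) prediction, because any comparison with NaN is false.

```python
def get_fill_color(predicted_rate, threshold=50):
    """Return 'r' below the threshold and 'g' otherwise."""
    return 'r' if predicted_rate < threshold else 'g'


# Hypothetical rates, including the boundary value and a missing value
rates = [42.0, 50.0, 77.5, float('nan')]
colors = [get_fill_color(rate) for rate in rates]
print(colors)
```

If you would rather not color missing predictions green, drop those rows before applying the rule.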

In [ ]:
# See what we did
output_geotable[:3]

In [ ]:
# Save file to target folder to include it in the result download
target_path = target_folder + '/b.csv'
output_geotable.to_csv(target_path, index=False)
print(f'b_geotable_path = {target_path}')  # Print geotable_path to render map


## Render Plot

In [ ]:
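%matplotlib inline
# Choose the numeric columns to plot; adjust these to fit your dataset
axes = output_table[[
    'Tree Count Per Square Meter',
    'Air Pollution Value',
]].plot(kind='bar')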

In [ ]:
# Save file to target folder to include it in the result download
target_path = target_folder + '/c.png'
figure = axes.get_figure()
figure.savefig(target_path)
print(f'c_image_path = {target_path}')