# Introduction to Computational Analysis

 Pay Notebook Creator: Roy Hyunjin Han 0 Set Container: Numerical CPU with TINY Memory for 10 Minutes 0 Total 0
In [31]:
import pandas
import numpy as np
import os


Look at spinach prices from the US food prices dataset.

In [32]:
excelFile = pandas.ExcelFile('datasets/USA-SpinachPrices.xls')
excelFile.sheet_names

In [33]:
organicSpinachFrame = excelFile.parse('organic', index_col=0)
conventionalSpinachFrame = excelFile.parse('conventional', index_col=0)
conventionalSpinachFrame


Rearrange DataFrame into a Series with a MultiIndex.

In [34]:
organicSpinachHierarchicalSeries = organicSpinachFrame.stack()
conventionalSpinachHierarchicalSeries = conventionalSpinachFrame.stack()
conventionalSpinachHierarchicalSeries


Combine year and month into a single index.

In [35]:
organicSpinachHierarchicalSeries.index

In [36]:
organicSpinachHierarchicalSeries.index

In [37]:
from pandas import Timestamp
def parse_indexEntry(indexEntry):
return Timestamp('%d %s' % indexEntry)
parse_indexEntry(organicSpinachHierarchicalSeries.index[0])

In [38]:
def flatten_index(hierarchicalSeries):
return pandas.Series(hierarchicalSeries.values,
index=[parse_indexEntry(x) for x in hierarchicalSeries.index])
organicSpinachSeries = flatten_index(organicSpinachHierarchicalSeries)
conventionalSpinachSeries = flatten_index(conventionalSpinachHierarchicalSeries)
conventionalSpinachSeries[:5]


Compare monthly prices of organic vs conventional spinach.

In [39]:
import pylab as pl
pl.figure(figsize=(12,6))
organicSpinachSeries.plot(label='organic')
conventionalSpinachSeries.plot(label='conventional')
pl.legend()

In [40]:
organicSpinachFrame - conventionalSpinachFrame

In [41]:
organicSpinachSeries.describe()

In [42]:
organicSpinachSeries.describe() - conventionalSpinachSeries.describe()


Look at dates when the price of organic spinach was above average.

In [43]:
subIndex = organicSpinachSeries > organicSpinachSeries.mean()
organicSpinachSeries[subIndex]


Look at the price of conventional spinach for dates when the price of organic spinach was above average.

In [44]:
conventionalSpinachSeries[subIndex]


Was the price of conventional spinach above average on dates when the price of organic spinach was above average?

In [45]:
conventionalSpinachSeries[subIndex] > conventionalSpinachSeries.mean()


Assemble prices and their differences into a table and save it to a spreadsheet.

In [46]:
# Create dataFrame
spinachFrame = pandas.DataFrame.from_items([
('organic', organicSpinachSeries),
('conventional', conventionalSpinachSeries)])

In [47]:
# Add column of differences
spinachFrame['difference'] = organicSpinachSeries - conventionalSpinachSeries

In [48]:
# Save it to a spreadsheet


Convert spinachFrame from US dollars to Singapore dollars using an exchange rate of 1 USD = 1.28135 SGD.

In [49]:
# Type your solution here and press CTRL-ENTER


Compute the ratio of organic prices to conventional prices.

In [50]:
# Option 1
spinachFrame['organic'] / spinachFrame['conventional']

In [51]:
# Option 2
spinachFrame.apply(lambda row: row['organic'] / row['conventional'], axis=1)


Compute average prices by year.

In [52]:
from pandas.core.datetools import YearEnd
spinachFrame.groupby(YearEnd().rollforward).aggregate(np.mean)


Predict the price of organic spinach given the price of conventional spinach using ordinary least squares.

In [53]:
import statsmodels.api as sm
y = organicSpinachSeries[:-1]      # Exclude last price
X = conventionalSpinachSeries[:-1] # Exclude last price
model = sm.OLS(y, X)
results = model.fit()
results.params

In [54]:
results.predict(conventionalSpinachSeries[-1])