Correlating stock returns using Python


In this tutorial I'll walk you through a simple methodology to correlate various stocks against each other. We'll grab the prices of the selected stocks using the IEX api, drop them into a clean dataframe, run a correlation, and visualize our results.

(1) Import libraries, select our list of stocks to correlate

import numpy as np

import pandas as pd

#used to grab the stock prices, with IEX api

import pandas_datareader as web

from datetime import datetime

#to visualize the results

import matplotlib.pyplot as plt

import seaborn

#select start date for correlation window as well as list of tickers

start = datetime(2017, 1, 1)

symbols_list = ['AAPL', 'F', 'TWTR', 'FB', 'AAL', 'AMZN', 'GOOGL', 'GE']


(2) Pull stock prices, push into clean dataframe

#array to store prices

symbols=[]

#pull price using iex for each symbol in list defined above

for ticker in symbols_list:

r = web.DataReader(ticker, "iex", start)

# add a symbol column

r['Symbol'] = ticker

symbols.append(r)

# concatenate into df

df = pd.concat(symbols)

df = df.reset_index()

df = df[['date', 'close', 'Symbol']]

df.head()

We're now left with a table in this format:

date close symbol
2017-01-03 113.4101 AAPL
2017-01-04 113.2832 AAPL
2017-01-05 113.8593 AAPL

3) However, we want our symbols represented as columns so we'll have to pivot the dataframe:

df_pivot = df.pivot('date','symbol','close').reset_index()

df_pivot.head()


date AAL AAPL AMZN F FB GE GOOGL TWTR
0 2017-01-03 45.7092 113.4101 753.67 11.4841 116.86 30.1370 808.01 16.44
1 2017-01-04 46.1041 113.2832 757.18 12.0132 118.69 30.1465 807.77 16.86
2 2017-01-05 45.3044 113.8593 780.45 11.6483 120.67 29.9754 813.02 17.09
3 2017-01-06 45.6203 115.1286 795.99 11.6392 123.41 30.0610 825.21 17.17
4 2017-01-09 46.4792 116.1832 796.92 11.5206 124.90 29.9183 827.18 17.50

(4) Next, we can run the correlation. Using the Pandas 'corr' function to compute the Pearson correlation coeffecient between each pair of equities

corr_df = df_pivot.corr(method='pearson')

#reset symbol as index (rather than 0-X)

corr_df.head().reset_index()

del corr_df.index.name

corr_df.head(10)


symbol AAL AAPL AMZN F FB GE GOOGL TWTR
AAL 1.000000 0.235239 0.226061 0.068024 0.356063 -0.332681 0.494075 0.169487
AAPL 0.235239 1.000000 0.868763 0.184501 0.911380 -0.895210 0.903191 0.781755
AMZN 0.226061 0.868763 1.000000 0.108351 0.744732 -0.937415 0.864455 0.955373
F 0.068024 0.184501 0.108351 1.000000 0.206055 -0.216064 0.189753 0.161078
FB 0.356063 0.911380 0.744732 0.206055 1.000000 -0.814703 0.900033 0.650404
GE -0.332681 -0.895210 -0.937415 -0.216064 -0.814703 1.000000 -0.882526 -0.866871
GOOGL 0.494075 0.903191 0.864455 0.189753 0.900033 -0.882526 1.000000 0.789379
TWTR 0.169487 0.781755 0.955373 0.161078 0.650404 -0.866871 0.789379 1.000000

5) Finally, we can plot a heatmap of the correlations (with Seaborn and Matplotlib) to better visualize the results:

#take the bottom triangle since it repeats itself

mask = np.zeros_like(corr_df)

mask[np.triu_indices_from(mask)] = True

#generate plot

seaborn.heatmap(corr_df, cmap='RdYlGn', vmax=1.0, vmin=-1.0 , mask = mask, linewidths=2.5)

plt.yticks(rotation=0)

plt.xticks(rotation=90)

plt.show()


Create credentials

Here we can see that, as expected, the tech companies are generally pretty highly correlated (as indicated by dark green -- AMZN/GOOGL, FB/AAPL, etc). Conversely, Ford (F) and General Electric (GE) are either not correlated or negatively correlated with the rest of the group.


Hope you enjoyed the post. If you're looking to land a role in the data science/analytics space (or just looking to learn a bit more about it), feel free to sign up to receive a few DS interview questions per week.