PythonFinanceCorrelation

Correlating Stock Returns Using Python

Learn how to pull stock price data with Python and analyze correlations between different companies' stock returns using a Seaborn heatmap.

July 1, 2019
5 mins read

In this tutorial I'll walk you through a simple methodology to correlate various stocks against each other. We'll grab the prices of the selected stocks using python, drop them into a clean dataframe, run a correlation, and visualize our results.

(1) Import libraries, select our list of stocks to correlate

 import numpy as np
 import pandas as pd
 #used to grab the stock prices, with yahoo
 import pandas_datareader as web
 from datetime import datetime
 #to visualize the results
 import matplotlib.pyplot as plt
 import seaborn

 #select start date for correlation window as well as list of tickers
 start = datetime(2017, 1, 1)
 symbols_list = ['AAPL', 'F', 'TWTR', 'FB', 'AAL', 'AMZN', 'GOOGL', 'GE']

(2) Pull stock prices, push into clean dataframe

#array to store prices
symbols=[]

#pull price using iex for each symbol in list defined above
for ticker in symbols_list: 
    r = web.DataReader(ticker, 'yahoo', start)
    # add a symbol column
    r['Symbol'] = ticker 
    symbols.append(r)

# concatenate into df
df = pd.concat(symbols)
df = df.reset_index()
df = df[['Date', 'Close', 'Symbol']]
df.head()

We're now left with a table in this format:

dateclosesymbol
2017-01-03113.4101AAPL
2017-01-04113.2832AAPL
2017-01-05113.8593AAPL

3) However, we want our symbols represented as columns so we'll have to pivot the dataframe:

 df_pivot = df.pivot('date','symbol','close').reset_index()
 df_pivot.head()
dateAALAAPLAMZNFFBGEGOOGLTWTR
02017-01-0345.7092113.4101753.6711.4841116.8630.1370808.0116.44
12017-01-0446.1041113.2832757.1812.0132118.6930.1465807.7716.86
22017-01-0545.3044113.8593780.4511.6483120.6729.9754813.0217.09
32017-01-0645.6203115.1286795.9911.6392123.4130.0610825.2117.17
42017-01-0946.4792116.1832796.9211.5206124.9029.9183827.1817.50

(4) Next, we can run the correlation. Using the Pandas 'corr' function to compute the Pearson correlation coeffecient between each pair of equities

 corr_df = df_pivot.corr(method='pearson')
 #reset symbol as index (rather than 0-X)
 corr_df.head().reset_index()
 del corr_df.index.name
 corr_df.head(10)
symbolAALAAPLAMZNFFBGEGOOGLTWTR
AAL1.0000000.2352390.2260610.0680240.356063-0.3326810.4940750.169487
AAPL0.2352391.0000000.8687630.1845010.911380-0.8952100.9031910.781755
AMZN0.2260610.8687631.0000000.1083510.744732-0.9374150.8644550.955373
F0.0680240.1845010.1083511.0000000.206055-0.2160640.1897530.161078
FB0.3560630.9113800.7447320.2060551.000000-0.8147030.9000330.650404
GE-0.332681-0.895210-0.937415-0.216064-0.8147031.000000-0.882526-0.866871
GOOGL0.4940750.9031910.8644550.1897530.900033-0.8825261.0000000.789379
TWTR0.1694870.7817550.9553730.1610780.650404-0.8668710.7893791.000000

5) Finally, we can plot a heatmap of the correlations (with Seaborn and Matplotlib) to better visualize the results:

 #take the bottom triangle since it repeats itself
 mask = np.zeros_like(corr_df)
 mask[np.triu_indices_from(mask)] = True
 #generate plot
 seaborn.heatmap(corr_df, cmap='RdYlGn', vmax=1.0, vmin=-1.0 , mask = mask, linewidths=2.5)
 plt.yticks(rotation=0) 
 plt.xticks(rotation=90) 
 plt.show()

py corr output

Here we can see that, as expected, the tech companies are generally pretty highly correlated (as indicated by dark green -- AMZN/GOOGL, FB/AAPL, etc). Conversely, Ford (F) and General Electric (GE) are either not correlated or negatively correlated with the rest of the group.