10 years of Stock data analysis using pandas in less than 30 lines of code — Part 1
Analyzing stock data doesn’t require advanced coding skills. Using data to analyse investment or trading decisions takes emotion out of the process and often leads to better choices. Before proceeding further, please read the disclaimer.
Disclaimer: Stock prices depend on many factors, not just on past data. This article is by no means investment advice and is purely for educational purposes. Please speak to your financial advisor before making any specific investment decisions.
Here, we will analyse Indian stocks and an index, but the same analysis can be applied to stocks in any country. Yahoo Finance offers data from a wide variety of countries and exchanges.
NIFTY is a market index introduced by the NSE (National Stock Exchange). The name is a blend of “National Stock Exchange” and “Fifty”, coined by NSE on 21st April 1996. NIFTY 50 is a benchmark index and the flagship of NSE, which showcases the top 50 equity stocks traded on the exchange out of a total of 1900+ stocks.
By the end of this article, we will understand 10-year returns, which stocks correlate with Nifty, and the idea of volatility. Let’s start with some coding now.
We will import yfinance for stock data from Yahoo Finance, Pandas for data processing, Numpy for mathematical calculation and Matplotlib for visualization.
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Create a list with Nifty and Top 10 stocks.
nifty_top = ["^NSEI", "RELIANCE.NS", "HDFCBANK.NS", "INFY.NS", "HDFC.NS", "ICICIBANK.NS", "TCS.NS", "KOTAKBANK.NS", "HINDUNILVR.NS", "ITC.NS", "AXISBANK.NS"]
Create an empty list to store the stock data. A for loop reads each ticker from the list, downloads its daily data for the last 10 years from Yahoo Finance, and appends it to stock_list. We are interested only in the “Adjusted Close” price and, of course, the date. We will also drop any “Not a Number” (NaN) values using the dropna() function.
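The loop itself is not shown here, so the following is only a minimal sketch of what it might look like. The names collect_adjusted_close and fetch are hypothetical, introduced for illustration; fetch stands for whatever call returns one ticker's daily prices as a DataFrame with an “Adj Close” column.

```python
import pandas as pd


def collect_adjusted_close(tickers, fetch):
    """Return one single-column "Adj Close" DataFrame per ticker.

    `fetch` is a hypothetical callable: fetch(ticker) -> DataFrame indexed by
    date that contains an "Adj Close" column.
    """
    stock_list = []
    for ticker in tickers:
        prices = fetch(ticker)
        # Keep only the adjusted close (the date stays as the index)
        # and drop rows where the price is missing.
        stock_list.append(prices[["Adj Close"]].dropna())
    return stock_list
```

With yfinance, fetch could be something like lambda t: yf.Ticker(t).history(period="10y", auto_adjust=False), followed by stock_list = collect_adjusted_close(nifty_top, fetch). The columns returned can differ between yfinance versions, so treat that call as an assumption and check that “Adj Close” is actually present.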
Now concatenate the data. With axis=1, each stock becomes its own column, aligned on the date index.
data = pd.concat(stock_list, axis=1)
You will see something like this.
The column name “Adj Close” is repeated for every stock, so it is no help in telling them apart. Change the column names to the tickers.
data.columns = nifty_top
Calculate the 10-year returns as a percentage.
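One way to do this, sketched on a tiny made-up price table, is to compare the last row with the first. The helper name total_return_pct is hypothetical, and the sketch assumes the first and last rows have no missing values.

```python
import pandas as pd


def total_return_pct(prices):
    """Percentage change from the first to the last row of each column."""
    first = prices.iloc[0]
    last = prices.iloc[-1]
    return (last / first - 1) * 100


# Tiny made-up price table, only to show the shape of the calculation.
sample = pd.DataFrame({"AAA": [100.0, 120.0, 150.0], "BBB": [50.0, 45.0, 40.0]})
print(total_return_pct(sample))
```

On the real data this would be called as total_return_pct(data). If some stocks start trading later than others, the first row may hold NaN for them and the result for those columns will be NaN as well.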
Let’s take a break from coding now and understand some basics.
What is correlation, and how is it useful?
Our decision-making process is the result of multiple thinking patterns. Some data points are already stored in the brain; others we have to work out from the data in front of us. For example, we decide to carry an umbrella after looking at the sky because our brain is already trained on past experience. This is a correlation based on previous experience. In the stock market, correlation helps in pair trading.
Let’s calculate the correlation now. We will look at the daily returns of each stock and see whether the stocks tend to move together.
Calculate the daily change of each stock. The diff() function in Pandas calculates the difference between the current data item and the previous one.
data_diff = data.diff()
Calculate the returns as a percentage of the previous day’s price
data_percentage = (data_diff / data.shift(1)) * 100
You can get nearly the same result with the log function in Numpy. Log returns are very close to simple returns when daily moves are small.
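A minimal sketch of the log-return variant, shown next to the simple return measured against the previous day's price. The prices are made up and the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# Made-up prices for a single hypothetical stock.
prices = pd.DataFrame({"AAA": [100.0, 101.0, 99.0, 102.0]})

# Simple daily return: change relative to the previous day's price.
simple_pct = prices.pct_change() * 100

# Log daily return: natural log of the price ratio between consecutive days.
log_pct = np.log(prices / prices.shift(1)) * 100

print(pd.concat([simple_pct, log_pct], axis=1, keys=["simple", "log"]))
```

The two columns agree closely but not exactly. The gap widens as the daily move gets larger, so it matters which definition you pick if you compare your numbers with someone else's.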
Use sweetviz to quickly visualize the data.
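The sweetviz call is not shown here, so this is a sketch of the usual pattern: analyze() builds the report and show_html() writes it out. The wrapper name build_sweetviz_report and the output file name are hypothetical.

```python
def build_sweetviz_report(frame, path="stock_returns_report.html"):
    """Write a sweetviz HTML report for `frame` and return the file path."""
    # Imported inside the function because sweetviz is an optional
    # third-party package (pip install sweetviz).
    import sweetviz as sv

    report = sv.analyze(frame)
    # open_browser=False only writes the file; leave it at the default
    # to open the page in a browser straight away.
    report.show_html(path, open_browser=False)
    return path
```

For the daily returns this would be called as build_sweetviz_report(data_percentage). Some sweetviz releases are known to clash with newer NumPy versions, so if the import fails, trying a different sweetviz version is a reasonable first step.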
You will see a beautiful HTML page like the one below.
The interesting thing is “Associations”.
Looking at the color and size of the squares, ICICI Bank interestingly shows a good correlation with Nifty, and not Reliance, which has the highest weightage. Rather than rely on eyeballing the chart, let’s calculate the actual numbers.
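The numbers can be obtained with the corr() method in Pandas, which computes the pairwise Pearson correlation between columns. Below is a small sketch on made-up returns for three hypothetical tickers; on the real data the call would presumably be data_percentage.corr().

```python
import pandas as pd

# Made-up daily percentage returns for three hypothetical tickers.
returns = pd.DataFrame(
    {
        "INDEX": [0.5, -1.0, 0.8, 0.2, -0.4],
        "BANK": [0.7, -1.2, 1.0, 0.1, -0.6],
        "FMCG": [-0.1, 0.3, 0.0, 0.2, -0.2],
    }
)

# Pearson correlation between every pair of columns (values from -1 to 1).
corr_matrix = returns.corr()

# Correlation of each stock with the index, strongest first.
print(corr_matrix["INDEX"].drop("INDEX").sort_values(ascending=False))
```

A value close to 1 means the two series tend to move in the same direction on the same days, a value close to -1 means they move in opposite directions, and a value near 0 means there is little linear relationship.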
As you can see, Nifty has a good correlation with ICICI Bank (0.76). Unsurprisingly, TCS also correlates well with Infosys, as both belong to the same sector. You may now look at pair trading strategies between Nifty and ICICI Bank or between TCS and Infosys.
What is stock volatility and why should you care about it?
Volatility is the rate at which the price of a security increases or decreases for a given set of returns. For example, say you would like to travel from Point A to Point B, about 500 miles, in 10 hours. Travelling at 10, 20, 150, 100, 50, 70, 30, 40, 10 and 20 miles per hour in successive hours covers the distance in the stipulated time. So does travelling at a steady 50 miles per hour for 10 hours. In both cases, you hit the target. In the first case, however, the standard deviation of your speed is about 42.9, with a possible heart attack for your fellow passenger two hours into the journey, while in the second case it is zero. The second case is more predictable and less risky; the first is very risky and unpredictable. When it comes to the stock market, it is advisable to buy stocks that give good returns with low volatility (a low standard deviation). How to calculate stock volatility will be covered in my next article.
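The travel example can be checked in a few lines of Numpy. This is just the arithmetic from the paragraph above, with hypothetical variable names.

```python
import numpy as np

# Hourly speeds (mph) for the two journeys described above.
erratic = np.array([10, 20, 150, 100, 50, 70, 30, 40, 10, 20])
steady = np.full(10, 50)

# Both trips cover the same distance in 10 hours.
print(erratic.sum(), steady.sum())

# np.std defaults to the population standard deviation (ddof=0).
print(np.std(erratic), np.std(steady))
```

The erratic trip comes out at roughly 42.9 and the steady one at zero. Passing ddof=1 gives the sample standard deviation instead, which is about 45.2 for the erratic trip.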