Retail sales forecast using Facebook’s Prophet

Prophet is an open source project released by Facebook for forecasting time series data. It is based on an additive model in which non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. An additive model is a non-parametric regression model.

Install Prophet:

Installing Prophet is very easy (if you are lucky!). If you are using Anaconda, run:

conda install gcc

conda install -c conda-forge fbprophet

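Prophet depends on PyStan. I struggled for a few hours to sort out the version conflicts, but if you run into issues you will eventually get there. Note that releases from 1.0 onwards are published as prophet rather than fbprophet, so the package and import names change accordingly.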

I am not covering any theory. You can find how Prophet works at https://facebook.github.io/prophet/

For this analysis, I am using a retail sales dataset.

import pandas as pd
from fbprophet import Prophet

The lines above import pandas and Prophet. The lines below load the data into a dataframe.

df = pd.read_csv('/…./data/sales_data_set.csv')
df.head()

This shows the first few rows of the dataframe.

This step is not needed for the forecast, but if you would like to see which stores are in the data and how many rows each one has:

df['Store'].value_counts()

The output lists each store ID together with the number of rows it has.

For this analysis, I am taking Store 1 and Department 1. The best way to do this is to write a function; here I filter the dataframe directly.

df1 = df.loc[df['Store'] == 1]
df2 = df1.loc[df1['Dept'] == 1]

df2.head()
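Since a function is the cleaner way to pick a store and department, here is a minimal sketch of one. The name select_store_dept and the toy data are my own for illustration; only the column names Store, Dept, Date and Weekly_Sales come from the dataset used above.

```python
import pandas as pd


def select_store_dept(df, store, dept):
    """Return a copy of the rows of `df` for one store and one department."""
    # Build both conditions on the same dataframe so the mask lines up row by row.
    mask = (df['Store'] == store) & (df['Dept'] == dept)
    # .copy() gives an independent dataframe that is safe to modify later.
    return df.loc[mask].copy()


# Toy data standing in for the real sales file (values are made up).
toy = pd.DataFrame({
    'Store': [1, 1, 1, 2],
    'Dept': [1, 1, 2, 1],
    'Date': ['2010-02-05', '2010-02-12', '2010-02-05', '2010-02-05'],
    'Weekly_Sales': [100.0, 110.0, 50.0, 70.0],
})

subset = select_store_dept(toy, store=1, dept=1)
print(subset)
```

With the real data, select_store_dept(df, 1, 1) would play the role of df2, and the same call can be reused for any other store and department.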

Prophet needs only the date and sales columns. Copy these two columns into a new dataframe; the model expects them to be named 'ds' and 'y'.

# rename() returns a new dataframe, which avoids pandas' SettingWithCopyWarning
df3 = df2[['Date', 'Weekly_Sales']].rename(columns={'Date': 'ds', 'Weekly_Sales': 'y'})

df3.head()

Now fit the model: instantiate a new Prophet object, then call its fit method and pass in the historical dataframe.

m = Prophet()
m.fit(df3)

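Now build a dataframe with a column ds containing the dates for which a prediction is to be made. The sales data is weekly, so here I am predicting for the next 52 weeks (roughly one year). Without freq, make_future_dataframe adds daily steps, so periods=52 on its own would cover only 52 days. Note that 'W' produces week-ending-Sunday dates; use an anchored alias such as 'W-FRI' if your dates fall on another weekday.

future = m.make_future_dataframe(periods=52, freq='W')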
future.tail()
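To make the role of periods and the step frequency concrete, here is a rough sketch in plain pandas of what a future dataframe contains: the historical dates followed by a number of new dates at a chosen frequency. The function name sketch_future_dates and the toy dates are hypothetical, and this is an illustration of the idea rather than Prophet's own code.

```python
import pandas as pd


def sketch_future_dates(history_dates, periods, freq):
    """Return a 'ds' dataframe: the history dates plus `periods` new dates."""
    history = pd.Series(pd.to_datetime(history_dates))
    last = history.max()
    # Generate one extra step so the last observed date can be dropped.
    extra = pd.date_range(start=last, periods=periods + 1, freq=freq)
    extra = extra[extra > last][:periods]
    all_dates = pd.concat([history, pd.Series(extra)], ignore_index=True)
    return pd.DataFrame({'ds': all_dates})


# Three made-up weekly observations, all on Fridays.
toy_dates = ['2010-02-05', '2010-02-12', '2010-02-19']

# 'W-FRI' steps one week at a time and lands on Fridays.
future_sketch = sketch_future_dates(toy_dates, periods=2, freq='W-FRI')
print(future_sketch)
```

The point to take away is that periods counts steps at the given frequency, so the same number means days with a daily frequency and weeks with a weekly one.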

The predict method produces the forecast. The result, forecast, is a new dataframe which contains the date (ds), the forecasted value (yhat), and the lower and upper bounds of the uncertainty interval (yhat_lower and yhat_upper).

forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()

Now plot the data

fig1 = m.plot(forecast)

You can also plot the forecast components.

fig2 = m.plot_components(forecast)

An interactive figure of the forecast can be created with Plotly.

from fbprophet.plot import plot_plotly
import plotly.offline as py
py.init_notebook_mode()

fig = plot_plotly(m, forecast) # This returns a plotly Figure
py.iplot(fig)

Now let’s check how good this model is.

Here we do cross-validation to assess prediction performance on a horizon of 180 days, starting with 600 days of training data in the first cutoff and then making predictions every 90 days. This corresponds to 4 total forecasts.

from fbprophet.diagnostics import cross_validation
df_cv = cross_validation(m, initial='600 days', period='90 days', horizon='180 days')
df_cv.head()
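To see how initial, period and horizon interact, here is a small sketch of how I understand the cutoff dates to be chosen: start from the last date minus the horizon and step back one period at a time, keeping every cutoff that still leaves at least the initial amount of training data. The function name sketch_cutoffs and the three-year date range are hypothetical, and the library may behave differently in edge cases.

```python
from datetime import date, timedelta


def sketch_cutoffs(first_date, last_date, initial, period, horizon):
    """Return candidate cutoff dates, oldest first."""
    cutoffs = []
    # The latest cutoff must leave a full horizon of data to score against.
    cutoff = last_date - horizon
    # Keep stepping back while enough training data remains before the cutoff.
    while cutoff >= first_date + initial:
        cutoffs.append(cutoff)
        cutoff -= period
    return sorted(cutoffs)


# Hypothetical series running from 2010-01-01 to 2012-12-31.
cutoffs = sketch_cutoffs(
    first_date=date(2010, 1, 1),
    last_date=date(2012, 12, 31),
    initial=timedelta(days=600),
    period=timedelta(days=90),
    horizon=timedelta(days=180),
)
print(len(cutoffs), cutoffs)
```

Each cutoff corresponds to one forecast: the model is fit on data up to the cutoff and scored on the horizon that follows it. On your own data, the number of distinct values in df_cv['cutoff'] tells you how many forecasts were actually made.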

The statistics computed are mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), mean absolute percent error (MAPE), and coverage of the yhat_lower and yhat_upper estimates.

from fbprophet.diagnostics import performance_metrics
df_p = performance_metrics(df_cv)
df_p.head()
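For intuition about what these statistics mean, here is a minimal sketch that computes them by hand over all rows of a dataframe shaped like df_cv (columns y, yhat, yhat_lower, yhat_upper). The function name summarize_errors and the toy numbers are made up. Unlike performance_metrics, which reports the metrics by horizon over a rolling window, this sketch simply averages over every row.

```python
import numpy as np
import pandas as pd


def summarize_errors(df_cv):
    """Compute MSE, RMSE, MAE, MAPE and interval coverage over all rows."""
    err = df_cv['y'] - df_cv['yhat']
    # A row is covered when the actual value lies inside the interval.
    covered = (df_cv['y'] >= df_cv['yhat_lower']) & (df_cv['y'] <= df_cv['yhat_upper'])
    mse = (err ** 2).mean()
    return {
        'mse': mse,
        'rmse': np.sqrt(mse),
        'mae': err.abs().mean(),
        # MAPE is undefined when y is zero; the toy data avoids that case.
        'mape': (err / df_cv['y']).abs().mean(),
        'coverage': covered.mean(),
    }


# Toy cross-validation output (values are made up).
toy_cv = pd.DataFrame({
    'y': [100.0, 200.0, 400.0, 500.0],
    'yhat': [110.0, 190.0, 400.0, 450.0],
    'yhat_lower': [90.0, 170.0, 380.0, 420.0],
    'yhat_upper': [130.0, 210.0, 420.0, 480.0],
})

stats = summarize_errors(toy_cv)
print(stats)
```

In the toy data the last actual value (500) falls outside its interval (420 to 480), so coverage comes out at three rows out of four.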

Cross-validation performance metrics can be visualized with plot_cross_validation_metric from fbprophet.plot. It takes the df_cv dataframe and the name of a metric, for example metric='mape' for MAPE.

All the very best!!
