Time Series Forecasting with Prophet

14 minute read

Introduction

Time series forecasting is used in multiple business domains, such as pricing, capacity planning, inventory management, etc. Forecasting with techniques such as ARIMA requires the user to correctly determine and validate the model parameters $(p,d,q)$ . This is a multistep process that requires the user to interpret the Autocorrelation Function (ACF) and Partial Autocorrelation (PACF) plots correctly. Using the wrong model can easily lead to erroneous results.

Prophet is an open source time series forecasting library made available by Facebook’s Core Data Science team. It is available both in Python and R, and it’s syntax follow’s Scikit-learn’s train and predict model.

Prophet is built for business cases typically encounted at Facebook, but which are also encountered in other businesses:

Hourly, Daily or Weekly data with sufficient historical data
Multiple Seasonality patterns related to human behaviour (day of week, seasons)
Important holidays that are irregularly spaced (Thanksgiving, Chinese New Year, etc.)
Reasonable amount of missing data
Historical trend changes
Non linear growth trends with saturation (capacity limit, etc.)

The goal of Prophet is to product high quality forecasts for decision making out of the box without requiring the user to have expert time series forecasting knowledge. The user can intuitively intervene during the model building process by introducing known parameters such as trend changepoints due to product introduction or trend saturation values due to capacity.

Stan

Prophet uses Stan as its optimization engine to fit its model and calculate uncertainty intervals. Stan is installed along with the R or Python libraries when Prophet is installed. Stan performs Maximum a Priori (MAP) optimization by default but if sampling can be requested.

Data

According to Facebook, Prophet is able to give accurate forecasting results with it’s default settings, little or no tuning required. Let’s test out Prophet using two different datasets:

Singapore Changi Airport Passenger arrivals
Maritime Tanker Arrival Data in Singapore

We’ll use the python library as there not many time series forecasting libraries in this language whereas R has popular packages such as forecast, tseries, etc.

We’ll load the tanker data into the dataframe tanker and the flight passenger data into the dataframe air.

import pandas as pd
from fbprophet import Prophet

air=pd.read_csv('D:/total-air-passenger-arrivals-by-country.csv')

# Filter Data for Passengers from China and drop na
air=air[air.level_3=='China']
air=air.drop(['level_1','level_2','level_3'],axis=1)
air=air[(air.value!='na') & (air.value!='-')]

Prophet requires the input dataframe’s columns to be named ds and y.

air=air.rename(columns={'month':'ds','value':'y'})

Parameters

According to Facebook’s article, Prophet uses an additive model:

$y(t) = g(t) + s(t) + h(t) +\epsilon_t$

where $g(t)$ represents the trend, $s(t)$ the periodic component, $h(t)$ holiday related events and $\epsilon_t$ the error. Our data is monthly so we are unanble to model using holidays.

Let us model the air passenger arrival data first. The syntax is similar to scikit-learn with calls to the fit and predict functions. We need to make a new data frame for forecasting via the make_future_dataframe function. The parameter freq controls the frequency (e.g. ‘D’ for days, ‘M’ for months)

model_air=Prophet()
model_air.fit(air)
future_air=model_air.make_future_dataframe(periods=12, freq='M')
forecast_air=model_air.predict(future_air)

INFO:fbprophet.forecaster:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:fbprophet.forecaster:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.

Prophet automatically detected monthly data and disabled weekly and daily seasonality. We can plot the forecast by Prophet

model_air.plot(forecast_air,xlabel='Time',
                             ylabel='Visitors from CHina')

png

Multiplicative Seasonality

We see that an additive model is not suitable; the predicted fluctuation due to seasonality is constant throughout the years. In reality we observe that the fluctuation is increasing. We can ask Prophet use multiplicative model instead by setting seasonality_mode='multiplicative'.

model_air=Prophet(seasonality_mode='multiplicative')
model_air.fit(air)
future_air=model_air.make_future_dataframe(periods=12, freq='M')
forecast_air=model_air.predict(future_air)

INFO:fbprophet.forecaster:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:fbprophet.forecaster:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.

model_air.plot(forecast_air,xlabel='Time',
                             ylabel='Visitors from China')

png

The seasonality effect increases over time with a multiplicative model.

Trend Change Points

The trend in a real time series can change abruptly. Prophet attempts to detect these changes automatically using a Laplacian or double exponential prior. By default, the change points are only fitted for the 1st 80% of the time series, allowing sufficient runway for the actual forecast. In the passenger arrival data, note that there is a sharp dip in 2003 due to the SARS outbreak in Singapore. These outliers should ideally be removed. Let’s display the change points detected by Prophet:

from fbprophet.plot import add_changepoints_to_plot
fig_air=model_air.plot(forecast_air)
a=add_changepoints_to_plot(fig_air.gca(),model_air,forecast_air)

png

Visually it appears that the general trend is correct but it is being underfit. For example the decline during the 2008-2009 financial crisis is not detected. To adjust the trend change, we can use the parameter changepoint_prior_scale which is set to 0.05 by default. Increasing its value would make the trend more flexible and reduce underfitting, at the risk of overfitting. Let us set it to 0.5 as suggested by the Prophet Documentation Guide. If we want to generate uncertainty intervals for the trend and seasonality components, we need to perform full Bayesian sampling, which can be done by using the mcmc_samples parameter in Prophet.

model_air = Prophet(changepoint_prior_scale=0.5,
                    seasonality_mode='multiplicative'
                    )
forecast_air = model_air.fit(air).predict(future_air)
fig_air = model_air.plot(forecast_air)
a=add_changepoints_to_plot(fig_air.gca(),model_air,forecast_air)

INFO:fbprophet.forecaster:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:fbprophet.forecaster:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.

png

Plot Model Components

We get a better fit of the trend when increasing changepoint_prior_scale. The decline during the financial crisis can be detected. The user can also manually define the change points based on domain knowledge (e.g. when forecasting sales the analyst might be aware of new product launches, sales, etc.) Using the plot_components function we can display the components of the model:

model_air.plot_components(forecast_air)

png

We observe a piecewise linear trend. Prophet also has the ability to fit saturating trends using a logistic growth trend model. This is applicable in cases where the trend is limited by capacity, e.g. the number of Facebook users in a country would be naturally limited by the number of people with access to the internet. This is done by setting the parameter growth=logistic and defining a column called cap in the dataframe.

Cross Validation

We can perform cross validation to measure forecast error. Cut off points are selected and we train the model with data up to that point. We can then compare the prediction vs actual data over a specified time horizon. This can be done using the cross_validation function. The parameter period specifies the interval between cut off points.

from fbprophet.diagnostics import cross_validation
df_cv = cross_validation(model_air, initial='4745 days',
                         period='365 days', horizon = '720 days')

INFO:fbprophet.diagnostics:Making 16 forecasts with cutoffs between 2000-09-15 00:00:00 and 2015-09-12 00:00:00

from fbprophet.diagnostics import performance_metrics
from fbprophet.plot import plot_cross_validation_metric

df_p = performance_metrics(df_cv)
df_p.head()

	horizon	mse	rmse	mae	mape	coverage
98	78 days	2.480517e+08	15749.656257	11932.484783	0.094478	0.394737
146	78 days	2.482712e+08	15756.623644	11972.847243	0.094157	0.421053
170	78 days	2.479630e+08	15746.841395	11919.787650	0.093090	0.421053
266	79 days	2.550821e+08	15971.290468	12338.291682	0.095245	0.394737
218	79 days	2.547107e+08	15959.658037	12271.050843	0.093660	0.421053

Tanker Data - Additional Regressor

Let’s see how the Prophet performs on the oil tanker arrival data. We also explore the option of adding additional regressors. Additional regressors can be discrete like future holidays or another time series. However this time series needs to be know or forecasted separately for future dates .

In our example of tanker arrival data we limit the tanker arrival to the year 2016 and use the monthly oil price until 2018 from the World Bank as our additional regressor. The oil price is chosen as an example; in reality other parameters such as refining capacity, contango in the oil market, and storage levels might have more of an influence on the number of oil tankers visiting Singapore. Note that it is not always possible to predict these parameters in advance; this additional regressor can also be forecasted, with an associated forecast error.

tanker=pd.read_csv('D:/tanker-arrivals-breakdown-monthly.csv')

# Filter Data for Oil Tankers only
tanker=tanker[tanker.category == 'Oil Tankers']
tanker=tanker.drop(['category',
                    'gross_tonnage'],axis=1)

# rename columns
tanker = tanker.rename(columns = {'number_of_tankers': 'y',
                                  'month':'ds'})

# Load oil data
oil=pd.read_excel('D:/CMOHistoricalDataMonthly.xlsx',
                   sheet_name='Monthly Prices',
                   skiprows=6,
                   usecols=1
                   )
# Rename column and clean up date format
oil=oil.rename({'Unnamed: 0':'ds'},axis=1)
clean_oil = lambda x: x.replace('M','-')

oil['ds']=oil['ds'].map(clean_oil)

tanker=pd.merge(tanker,oil,left_on='ds',
                right_on='ds',
                how='left')
price=tanker['CRUDE_PETRO']
tanker=tanker[:180]

model_tanker = Prophet(changepoint_prior_scale=0.25)
model_tanker.add_regressor('CRUDE_PETRO')
model_tanker.fit(tanker)
future_tanker=model_tanker.make_future_dataframe(periods=20,
                                                 freq='M')
future_tanker=pd.concat([future_tanker,price],axis=1)

INFO:fbprophet.forecaster:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:fbprophet.forecaster:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.

forecast_tanker=model_tanker.predict(future_tanker)
from fbprophet.plot import add_changepoints_to_plot
fig_tanker=model_tanker.plot(forecast_tanker)
a=add_changepoints_to_plot(fig_tanker.gca(),
                           model_tanker,
                           forecast_tanker)

png

fig = model_tanker.plot_components(forecast_tanker)

png

Conclusion

Prophet is easy and intuitive to use and the components of the model are easily explainable. It also allows the incorporation of domain knowledge into the model, for example via known change points or capacity limits. The forecasts are pretty decent but in some cases certain parameters have to be tweaked compared to the default setting, which is easily done.

Share on

Twitter Facebook Google+ LinkedIn

David Ten