How to Difference a Time Series Dataset with Python
Differencing is a popular and widely used data transform for time series.
In this tutorial, you will discover how to apply the difference operation to your time series data with Python.
After completing this tutorial, you will know:
 About the differencing operation, including the configuration of the lag difference and the difference order.
 How to develop a manual implementation of the differencing operation.
 How to use the built-in Pandas differencing function.
Let’s get started.
Photo by Marcus, some rights reserved.
Why Difference Time Series Data?
Differencing is a method of transforming a time series dataset.
It can be used to remove the series' dependence on time, so-called temporal dependence. This includes structures like trends and seasonality.
Differencing can help stabilize the mean of the time series by removing changes in the level of a time series, and so eliminating (or reducing) trend and seasonality.
— Page 215, Forecasting: principles and practice
Differencing is performed by subtracting the previous observation from the current observation.
difference(t) = observation(t) - observation(t-1)

In this way, a series of differences can be calculated.
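As a small illustration of the formula above, the sketch below differences a short list of values. The numbers are made up for illustration and do not come from any dataset.

```python
# made-up observations, for illustration only
observations = [10, 12, 15, 19, 24]

# difference(t) = observation(t) - observation(t-1), starting at the second observation
differences = [observations[t] - observations[t - 1] for t in range(1, len(observations))]

print(differences)  # [2, 3, 4, 5]
```

Note that the differenced series has one fewer value than the original, because the first observation has no previous observation to subtract.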
Lag Difference
Taking the difference between consecutive observations is called a lag-1 difference.
The lag difference can be adjusted to suit the specific temporal structure.
For time series with a seasonal component, the lag may be expected to be the period (width) of the seasonality.
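A minimal sketch of a seasonal lag difference follows. It assumes a made-up series with a repeating pattern of period 4 whose level rises by 3 each cycle; the data and the variable names are illustrative only.

```python
# made-up series: a repeating 4-step pattern on top of a level that rises by 3 each cycle
pattern = [0, 5, 2, 8]
series = [pattern[t % 4] + 3 * (t // 4) for t in range(12)]

lag = 4  # assumed to equal the period of the seasonal pattern

# subtract the observation from one full cycle earlier
seasonal_diff = [series[t] - series[t - lag] for t in range(lag, len(series))]

print(series)         # [0, 5, 2, 8, 3, 8, 5, 11, 6, 11, 8, 14]
print(seasonal_diff)  # [3, 3, 3, 3, 3, 3, 3, 3]
```

With the lag set to the period, the repeating pattern cancels out and only the change in level between cycles remains.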
Difference Order
Temporal structure may still exist after performing a differencing operation, such as in the case of a nonlinear trend.
As such, the process of differencing can be repeated more than once until all temporal dependence has been removed.
The number of times that differencing is performed is called the difference order.
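To make the idea of difference order concrete, the sketch below uses a made-up series with a quadratic trend. The helper name lag1_difference() is an assumption introduced only for this illustration.

```python
# made-up series with a quadratic (non-linear) trend
series = [t * t for t in range(8)]

def lag1_difference(values):
    # subtract the previous observation from the current one
    return [values[i] - values[i - 1] for i in range(1, len(values))]

first_order = lag1_difference(series)        # trend is still present
second_order = lag1_difference(first_order)  # trend is removed

print(first_order)   # [1, 3, 5, 7, 9, 11, 13]
print(second_order)  # [2, 2, 2, 2, 2, 2]
```

The first-order difference still increases over time, whereas the second-order difference is constant, so a difference order of 2 is enough for this particular series.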
Shampoo Sales Dataset
This dataset describes the monthly number of sales of shampoo over a 3-year period.
The units are a sales count and there are 36 observations. The original dataset is credited to Makridakis, Wheelwright, and Hyndman (1998).
You can download and learn more about the dataset here.
The example below loads and creates a plot of the loaded dataset.
from datetime import datetime
from pandas import read_csv
from matplotlib import pyplot

# build a full date by prefixing '190' to the partial year stored in the file
def parser(x):
    return datetime.strptime('190'+x, '%Y-%m')

# load the single data column as a Series and convert its index to datetimes
series = read_csv('shampoo-sales.csv', header=0, index_col=0).squeeze('columns')
series.index = series.index.map(parser)
series.plot()
pyplot.show()

Running the example creates the plot that shows a clear linear trend in the data.
Shampoo Sales Dataset Plot
Manual Differencing
We can difference the dataset manually.
This involves developing a new function that creates a differenced dataset. The function would loop through a provided series and calculate the differenced values at the specified interval or lag.
The function below named difference() implements this procedure.
from pandas import Series

# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    # start at 'interval' so every value has an earlier observation to subtract
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return Series(diff)

We can see that the function is careful to begin the differenced dataset after the specified interval to ensure differenced values can, in fact, be calculated. A default interval or lag value of 1 is defined. This is a sensible default.
A further improvement would be to add a parameter for the order, that is, the number of times to perform the differencing operation.
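One possible way to add an order is sketched below. The function name difference_with_order() and its order parameter are assumptions made for this illustration, and the sketch returns a plain list rather than a Pandas Series to stay dependency-free.

```python
# hypothetical extension of difference(): 'order' is an added parameter
def difference_with_order(dataset, interval=1, order=1):
    values = list(dataset)
    for _ in range(order):
        # one differencing pass at the given lag
        values = [values[i] - values[i - interval] for i in range(interval, len(values))]
    return values

# made-up data with a quadratic trend, for illustration only
data = [t * t for t in range(8)]

print(difference_with_order(data))           # [1, 3, 5, 7, 9, 11, 13]
print(difference_with_order(data, order=2))  # [2, 2, 2, 2, 2, 2]
```

Each pass shortens the series by interval values, so the result has order * interval fewer observations than the input.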
The example below applies the manual difference() function to the Shampoo Sales dataset.
from datetime import datetime
from pandas import read_csv
from pandas import Series
from matplotlib import pyplot

# build a full date by prefixing '190' to the partial year stored in the file
def parser(x):
    return datetime.strptime('190'+x, '%Y-%m')

# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return Series(diff)

# load the single data column as a Series and convert its index to datetimes
series = read_csv('shampoo-sales.csv', header=0, index_col=0).squeeze('columns')
series.index = series.index.map(parser)
# difference the raw values and plot the result
X = series.values
diff = difference(X)
pyplot.plot(diff)
pyplot.show()

Running the example creates the differenced dataset and plots the result.
Manually Differenced Shampoo Sales Dataset
Automatic Differencing
The Pandas library provides a function to automatically calculate the difference of a dataset.
This diff() function is provided on both the Series and DataFrame objects.
Like the manually defined difference function in the previous section, it takes an argument to specify the interval or lag, in this case called the periods.
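As a quick sketch of the periods argument, the snippet below applies diff() to a small made-up Series with the default lag and with a lag of 2. The values are invented for illustration.

```python
from pandas import Series

# made-up values, for illustration only
series = Series([10, 12, 15, 19, 24, 30])

lag1 = series.diff()           # periods defaults to 1
lag2 = series.diff(periods=2)  # subtract the value two steps back

print(lag1.tolist())  # [nan, 2.0, 3.0, 4.0, 5.0, 6.0]
print(lag2.tolist())  # [nan, nan, 5.0, 7.0, 9.0, 11.0]
```

Unlike the manual function, diff() returns a series of the same length as the input and fills the first periods positions with NaN.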
The example below demonstrates how to use the built-in difference function on the Pandas Series object.
from datetime import datetime
from pandas import read_csv
from matplotlib import pyplot

# build a full date by prefixing '190' to the partial year stored in the file
def parser(x):
    return datetime.strptime('190'+x, '%Y-%m')

# load the single data column as a Series and convert its index to datetimes
series = read_csv('shampoo-sales.csv', header=0, index_col=0).squeeze('columns')
series.index = series.index.map(parser)
# lag-1 difference using the built-in Pandas function
diff = series.diff()
pyplot.plot(diff)
pyplot.show()

As in the previous section, running the example plots the differenced dataset.
A benefit of using the Pandas function, in addition to requiring less code, is that it maintains the datetime information for the differenced series.
Automatic Differenced Shampoo Sales Dataset
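The point about datetime information can be checked on a small example. The sketch below assumes a made-up monthly series; the dates and values are invented for illustration and are not the shampoo data.

```python
from pandas import Series, date_range

# made-up monthly values on a datetime index, for illustration only
index = date_range('2001-01-01', periods=5, freq='MS')
series = Series([10, 12, 15, 19, 24], index=index)

# diff() keeps the original index; the first entry is NaN, so drop it
diff = series.diff().dropna()

print(diff)
```

Each differenced value stays attached to the date of the later of the two observations it was computed from, which the manual function does not track because it returns a Series with a fresh integer index.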
Summary
In this tutorial, you discovered how to apply the difference operation to time series data with Python.
Specifically, you learned:
 About the difference operation, including the configuration of lag and order.
 How to implement the difference transform manually.
 How to use the built-in Pandas implementation of the difference transform.
Do you have any questions about differencing, or about this post?
Ask your questions in the comments below.