Mplfinance Time Axis Concerns, and Internals of Displaying or Not Displaying Non Trading periods
In this article we describe internal workings of the time axis (the x-axis) when plotting Time-Series data in general, and Financial Markets Data in particular.
This information is not necessary for most mplfinance users. The user simply provides Pandas Timestamps, or Python datetimes, or strings representing a date or datetime, and mplfinance handles the rest.
However, this article is important for mplfinance users who directly access the mplfinance Axes objects in a way that requires use of the x-axis.
As a reminder, direct access of mplfinance Axes objects is discouraged. Doing so will always require more code (see below). There are cases, however, where accessing Axes is necessary. Presently, there is an enhancement underway to make it much less necessary. In the meantime, this article can provide a better understanding of the time axis, in order to help those users who must directly access the mplfinance Axes objects in a way that requires use of the x-axis.
The first thing to understand is that Matplotlib datetimes are not the same as Pandas Timestamps nor Python datetimes (which in turn are also different from each other). Although it is relatively simple to convert from one to the other, mplfinance does this conversion for you. Users should not have to worry or think about matplotlib datetimes.
- This has implications for people who choose to work directly with the Matplotlib Figure and Axes objects when working with mplfinance.
In such cases users must be aware that the x-axis data may be matplotlib datetimes and that they may have to convert their own data.
- Notice that we said may be matplotlib datetimes: there is another wrinkle that requires mplfinance to internally deal with yet another representation or mapping of datetimes.
Financial Market Data (that one can obtain from any number of market data sources) typically does not include rows for periods when the market was closed. For example, daily data will not include data rows for weekends. Hourly data, or minute-by-minute data, for more than one day, will not include data rows for nighttime hours when the market is closed.
For example, some daily data might look like the table below. Notice that there are no rows for 03/19/2022 and 03/20/2022:
Date | Open | High | Low | Close |
---|---|---|---|---|
03/23/2022 | 446.9100 | 448.4900 | 443.7100 | 443.8000 |
03/22/2022 | 445.8600 | 450.5800 | 445.8600 | 449.5900 |
03/21/2022 | 444.3400 | 446.4600 | 440.6800 | 444.3900 |
03/18/2022 | 438.0000 | 444.8600 | 437.2200 | 444.5200 |
03/17/2022 | 433.5900 | 441.0700 | 433.1900 | 441.0700 |
Similarly, the following minute-by-minute data set contains no data from the market close at 16:00 on 3/29 until the market open at 09:30 on 3/30:
DateTime | Open | High | Low | Close |
---|---|---|---|---|
03/30 09:33 | 460.6900 | 460.8299 | 460.6000 | 460.6700 |
03/30 09:32 | 460.6250 | 460.7900 | 460.5700 | 460.6700 |
03/30 09:31 | 460.6858 | 460.7300 | 460.5400 | 460.6250 |
03/30 09:30 | 460.3400 | 460.7300 | 460.2900 | 460.6800 |
03/29 15:59 | 461.2900 | 461.6200 | 461.1100 | 461.5300 |
03/29 15:58 | 461.2500 | 461.4100 | 461.2300 | 461.3000 |
03/29 15:57 | 461.0850 | 461.2900 | 460.9200 | 461.2500 |
03/29 15:56 | 461.0550 | 461.1500 | 461.0400 | 461.0900 |
When we tell Matplotlib that the x-axis is time, Matplotlib assumes the time axis is continuous (a reasonable assumption).
This means that the time axis will display ALL times between the first time and the last time in the data set.
The problem with NOT displaying Non-Trading periods is that, mathematically, THE TIME AXIS IS NOW DISCONTINUOUS with respect to time.
As mentioned above, when we squeeze out the non-trading gaps from market data (for example we put Monday right after Friday, or we put 09:30 Tuesday right after 16:00 Monday) we create discontinuities in the time-axis.
This means that a particular small length of x-axis may correspond to 1 day at one place on the axis, but the same length of axis may correspond to 2 days or 3 days or even 4 days at another place on the same axis. (This has implications for drawing trend lines, and for interpolating between data points, as we will discuss below).
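To make the uneven spacing concrete, the short sketch below takes the five daily dates from the table above and measures the calendar time between consecutive rows. The variable names are illustrative only; nothing here is part of mplfinance.

```python
import pandas as pd

# The five trading days from the daily table above, in ascending order.
dates = pd.DatetimeIndex(
    ["2022-03-17", "2022-03-18", "2022-03-21", "2022-03-22", "2022-03-23"]
)

# Calendar time elapsed between each row and the previous one.
# With non-trading days squeezed out, consecutive rows sit one unit apart
# on the x-axis, yet the elapsed calendar time is not constant.
elapsed = dates.to_series().diff().dropna()

for date, delta in elapsed.items():
    print(f"{date.date()}: {delta.days} day(s) since previous row")
```

The row for Monday 03/21 is one step away from Friday 03/18 on the axis, even though three calendar days separate them.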
The underlying matplotlib Artists (graphics) code (as far as I know) does not support a discontinuous axis. It treats every axis within an Axes object as continuous, from its minimum to its maximum. Therefore, in order to plot a discontinuous time series with no gaps, we need to provide the Axes object with a set of continuous data to use for the x-axis. This is simple to do as long as that continuous data somehow maps to the discontinuous time series.
Mplfinance does this by using the row number of the DataFrame as the x-axis variable. For example, if the DataFrame contains 90 rows of data, then the x-axis variable will be a floating point number ranging from 0.0 to 89.0. Mplfinance then displays datetimes along the x-axis by internally maintaining a mapping between the row number and the datetime.
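The idea of a row-number axis with a row-to-datetime mapping can be pictured with a small sketch. This is not mplfinance's internal code, only an illustration that assumes a sorted DatetimeIndex; `row_to_label` and `LABEL_FORMAT` are hypothetical names.

```python
import pandas as pd

# Hypothetical data: five trading days, with the weekend of 03/19-03/20 absent.
dtindex = pd.DatetimeIndex(
    ["2022-03-17", "2022-03-18", "2022-03-21", "2022-03-22", "2022-03-23"]
)

# The x-axis variable is simply the row number: 0.0, 1.0, ..., len - 1.
x_values = [float(i) for i in range(len(dtindex))]

LABEL_FORMAT = "%Y-%m-%d"  # assumed label format, chosen for illustration

def row_to_label(x, pos=None):
    """Map an x-axis value (row number) back to a date label.

    The value is rounded to the nearest row; anything outside the data
    range gets an empty label. `pos` is unused and only present so the
    signature matches a (value, position) tick-formatter callback.
    """
    row = int(round(x))
    if row < 0 or row >= len(dtindex):
        return ""
    return dtindex[row].strftime(LABEL_FORMAT)

print(x_values)           # row numbers used as x coordinates
print(row_to_label(2.0))  # date label for the third row
```

A function with this `(value, position)` signature could be wrapped in `matplotlib.ticker.FuncFormatter` to label ticks on a row-number axis, which is roughly the kind of mapping described above.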
Most mplfinance users can be blissfully ignorant of these internal workings. The user provides Pandas Timestamps (in the form of a DatetimeIndex within a Pandas DataFrame) and mplfinance handles the rest.
In some cases (for example with `vlines` and `alines` kwargs) the user may provide not only Pandas Timestamps, but also Python datetimes, or even strings representing a Date or Timestamp (for example "03-30-2022 13:00"). Again mplfinance handles this, converting the strings or datetimes, and even handling the case where the placement of `vlines` or `alines` requires time-axis interpolation between trading points in the OHLC data. (The interested user can see the code here.)
The cases where this does affect mplfinance users are those that involve directly accessing the mplfinance Axes objects in a way that requires use of the x-axis.
In such a case, the first rule is to be aware of the `show_nontrading` setting. If this kwarg is not specified, then it defaults to `False`. The following then applies:
- If `show_nontrading` is `False`, then the x-axis variable is a floating point number representing the row number of the data in the DataFrame.
- If `show_nontrading` is `True`, then the x-axis variable is the matplotlib date.
When specifying x-axis data to an Axes object directly (i.e. not through mplfinance), the user must convert and pass the appropriate data (row number as a float, or matplotlib date).
- If a range of dates or datetimes is stored in a Pandas `DatetimeIndex`, then any Timestamp or datetime within that range can be easily converted to the floating point row number (interpolating for fractions of a row). We can do this using Pandas's ability to find one or more rows by the keys in a Pandas Index.
- The simplest way to do this is to first convert the DatetimeIndex into a Pandas Series of datetimes (Timestamps) indexed by those same datetimes (Timestamps). If `dtindex` is the DatetimeIndex, then we simply do: `dtseries = dtindex.to_series()`.
- After this, the code looks something like this:
  - Note that (for simplicity) this code does not truly interpolate, but rather takes the midpoint if the datetime falls between two rows.
  - For an example of code that linearly interpolates between rows, see function `_date_to_iloc_linear()` (in file src/mplfinance/_utils.py).
```python
def _date_to_iloc(dtseries, date):
    '''Convert a `date` to a location, given a date series w/a datetime index.
    If `date` does not exactly match a date in the series then return the
    midpoint between the two neighboring rows.
    If `date` is outside the range of dates in the series, then raise an exception.
    '''
    # Rows at or after `date`; the first of these is the upper neighbor.
    d1s = dtseries.loc[date:]
    if len(d1s) < 1:
        sdtrange = str(dtseries.iloc[0]) + ' to ' + str(dtseries.iloc[-1])
        raise ValueError('User specified line date "' + str(date) +
                         '" is beyond (greater than) range of plotted data (' + sdtrange + ').')
    d1 = d1s.index[0]
    # Rows at or before `date`; the last of these is the lower neighbor.
    d2s = dtseries.loc[:date]
    if len(d2s) < 1:
        sdtrange = str(dtseries.iloc[0]) + ' to ' + str(dtseries.iloc[-1])
        raise ValueError('User specified line date "' + str(date) +
                         '" is before (less than) range of plotted data (' + sdtrange + ').')
    d2 = d2s.index[-1]
    # If there are duplicate dates in the series, for example in a renko plot
    # then .get_loc(date) will return a slice containing all the dups, so:
    loc1 = dtseries.index.get_loc(d1)
    if isinstance(loc1, slice):
        loc1 = loc1.start
    loc2 = dtseries.index.get_loc(d2)
    if isinstance(loc2, slice):
        loc2 = loc2.stop - 1
    return (loc1 + loc2) / 2.0
```
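As a complement to the midpoint version above, here is a minimal sketch of true linear interpolation between rows. It is not the library's `_date_to_iloc_linear()`; it is an independent illustration that assumes a timezone-naive DatetimeIndex sorted in ascending order with no duplicates, and `date_to_row_linear` is a hypothetical name.

```python
import numpy as np
import pandas as pd

def date_to_row_linear(dtindex, date):
    """Return the (possibly fractional) row number of `date` within `dtindex`.

    Assumes `dtindex` is sorted ascending with no duplicates.
    Raises ValueError if `date` is outside the range of the index.
    """
    ts = pd.Timestamp(date)
    if ts < dtindex[0] or ts > dtindex[-1]:
        raise ValueError(
            f'date "{ts}" is outside the range {dtindex[0]} to {dtindex[-1]}'
        )
    # Express every timestamp as seconds elapsed since the first row, so the
    # interpolation works on plain floats.
    seconds = (dtindex - dtindex[0]).total_seconds()
    target = (ts - dtindex[0]).total_seconds()
    rows = np.arange(len(dtindex), dtype=float)
    return float(np.interp(target, seconds, rows))

dtindex = pd.DatetimeIndex(
    ["2022-03-17", "2022-03-18", "2022-03-21", "2022-03-22", "2022-03-23"]
)
print(date_to_row_linear(dtindex, "2022-03-18"))        # exact match: row 1
print(date_to_row_linear(dtindex, "2022-03-19 12:00"))  # halfway through the weekend gap
```

Because the weekend gap spans three calendar days but only one row, a timestamp halfway through that gap lands halfway between rows 1 and 2.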
- Pandas `DatetimeIndex` objects store their datetimes as Pandas `Timestamp` objects. `Timestamps` are similar to, but different from, Python `datetime` objects.
- Matplotlib dates (datetimes) are simply floating point numbers where the whole number portion is the number of days since an epoch, and the fractional portion is the fraction of a day since the start of the day (midnight). In Matplotlib 3.3 and later the default epoch is 1970-01-01 UTC; earlier versions counted days from 0001-01-01 UTC, plus one.
- AFAIK, it is not possible to convert directly from `Timestamps` to Matplotlib dates; rather, `Timestamps` must first be converted to Python `datetimes`, which are then converted to matplotlib dates (datetimes).
- The code looks something like this:
```python
import pandas as pd
import matplotlib.dates as mdates
import datetime

def _date_to_mdate(date):
    # Whether a string or Timestamp, first convert to a Python datetime
    if isinstance(date, str):
        # use pandas to convert the string to a pydatetime
        # (this could be done with module dateutil, but we're already using pandas)
        pydt = pd.to_datetime(date).to_pydatetime()
    elif isinstance(date, pd.Timestamp):
        pydt = date.to_pydatetime()
    elif isinstance(date, (datetime.datetime, datetime.date)):
        pydt = date
    else:
        return None
    # convert Python datetime to matplotlib datetime:
    return mdates.date2num(pydt)
```
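Putting the two cases together, a user who accesses the Axes directly needs an x value that matches the `show_nontrading` setting. The sketch below shows one possible way to choose between them. `x_for_axes` is a hypothetical helper (not part of mplfinance), and for brevity it only handles dates that exactly match a row when `show_nontrading` is `False`.

```python
import pandas as pd
import matplotlib.dates as mdates

def x_for_axes(date, dtindex, show_nontrading=False):
    """Return the x coordinate for `date` in the units the Axes expects.

    show_nontrading=False: the row number of `date` in `dtindex` (as a float).
                           Raises KeyError if `date` is not a row in the index.
    show_nontrading=True:  the matplotlib date (a float number of days).
    Assumes `dtindex` has no duplicate entries.
    """
    ts = pd.Timestamp(date)
    if show_nontrading:
        return float(mdates.date2num(ts.to_pydatetime()))
    return float(dtindex.get_loc(ts))

dtindex = pd.DatetimeIndex(
    ["2022-03-17", "2022-03-18", "2022-03-21", "2022-03-22", "2022-03-23"]
)
row_x = x_for_axes("2022-03-21", dtindex)                         # row number
mdate_x = x_for_axes("2022-03-21", dtindex, show_nontrading=True)  # matplotlib date
print(row_x, mdate_x)
```

Either value could then be passed to an Axes method such as `axvline(x=...)` on Axes obtained from `mpf.plot(..., returnfig=True)`, provided the plot was created with the same `show_nontrading` setting.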