Times and Dates in Python

SALOME SONYA LOMSADZE
7 min read · Mar 6, 2021

Dates are everywhere in data science and analytics, and in data generally: voting records, stock prices, sales, customer behavior, and so on. Knowing how to analyze data over time is a core skill for anyone who wants to become an expert in data-related work.
This study is divided into the following chapters:

  • Chp 1: Dates and Calendars
  • Chp 2: Combining Dates and Times
  • Chp 3: Time zones and Daylight Saving
  • Chp 4: Dates and Times in Pandas

Besides types such as strings, lists, and numbers, Python has a special class for dates: “date”, from the datetime module.

Chp 1: Dates and Calendars

# Read the pickled dataset of Florida hurricanes (a list of date objects)
import pandas as pd
from pandas import DataFrame
hurricane_list = pd.read_pickle('florida_hurricane_dates.pkl')
# Convert the list into a pandas DataFrame
df = DataFrame(hurricane_list, columns=['Date'])
from datetime import date
two_hurricanes_dates = [date(2016, 10, 7), date(2017, 6, 21)]
print(two_hurricanes_dates[0].year)
print(two_hurricanes_dates[0].month)
print(two_hurricanes_dates[0].day)

2016
10
7

Dates print in ISO format: YYYY-MM-DD. After creating date objects with date(), you can read the year, month, and day through the .year, .month, and .day attributes. For something more involved, such as the day of the week, call the .weekday() method, which returns a number from 0 (Monday) to 6 (Sunday). Months run from 1 (January) to 12 (December).
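
As a small sketch of the weekday() method described above, the snippet below reuses the two hurricane dates from the example. The weekday_names list is a helper added here for illustration only; it is not part of the datetime module.

```python
from datetime import date

# The two hurricane dates used in the example above
two_hurricanes_dates = [date(2016, 10, 7), date(2017, 6, 21)]

# Hypothetical helper: weekday() counts from 0 (Monday) to 6 (Sunday)
weekday_names = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

for d in two_hurricanes_dates:
    # Print the ISO date, the weekday number, and its name
    print(d.isoformat(), d.weekday(), weekday_names[d.weekday()])
```

This should print 4 (Friday) for the first date and 2 (Wednesday) for the second.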

How many hurricanes come early?

The Florida hurricane list contains 235 hurricanes that made landfall in Florida from 1950 to 2017. The Atlantic hurricane season begins on June 1. I want to find out how many hurricanes made landfall before then.

# Initialize early_events to zero
early_events = 0
# Loop over the dates
for hurricane in hurricane_list:
    # Check if the month is before June (6)
    if hurricane.month < 6:
        early_events = early_events + 1

print(early_events)

The answer is 10. Dates also support arithmetic and comparison: counting the days between two events, moving forward or backward by a number of days, putting dates in order, and so on.
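
The kind of date math mentioned here can be sketched with the two hurricane dates from earlier. This is only an illustration; the variable names d1, d2, and delta are chosen for this example.

```python
from datetime import date, timedelta

d1 = date(2016, 10, 7)
d2 = date(2017, 6, 21)

# Subtracting two dates gives a timedelta; .days is the gap in days
delta = d2 - d1
print(delta.days)                # 257

# Adding or subtracting a timedelta moves a date forward or backward
print(d1 + timedelta(days=30))   # 2016-11-06
print(d1 - timedelta(days=7))    # 2016-09-30

# Dates compare chronologically, which is what makes sorting work
print(d1 < d2)                   # True
```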

How many hurricanes per month?

# A dictionary to count hurricanes per month
hurricanes_months = {1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0, 11: 0, 12: 0}
# Loop over all hurricanes
for hurricane in hurricane_list:
    # Get the month
    month = hurricane.month
    # Increment the count of events for that month by one
    hurricanes_months[month] += 1
print(hurricanes_months)

{1: 0, 2: 1, 3: 0, 4: 1, 5: 8, 6: 32, 7: 21, 8: 49, 9: 70, 10: 43, 11: 9, 12: 1}
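
An alternative to the hand-built dictionary is collections.Counter from the standard library. The sketch below is a swapped-in technique, not the approach used above, and sample_dates is a tiny stand-in for hurricane_list built from dates that appear elsewhere in this article.

```python
from collections import Counter
from datetime import date

# Stand-in for hurricane_list (assumption: a list of date objects)
sample_dates = [date(2016, 10, 7), date(2017, 6, 21),
                date(1950, 8, 31), date(2017, 10, 29)]

# Count how many dates fall in each month
month_counts = Counter(d.month for d in sample_dates)

print(month_counts[10])  # 2
print(month_counts[1])   # 0, because a Counter returns zero for missing keys
```

The difference in design is that a Counter only stores months that actually occur, so months with no hurricanes do not need to be initialized by hand.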

Order Dates

To find the earliest and latest dates, either sorted() or the min() and max() functions will do:

print(sorted(hurricane_list)[0])
print(sorted(hurricane_list)[-1])

1950-08-31
2017-10-29
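
The min() and max() route avoids sorting the whole list twice. A minimal sketch, assuming the list holds date objects; sample_dates is a small stand-in for hurricane_list:

```python
from datetime import date

# Stand-in for hurricane_list (assumption: a list of date objects)
sample_dates = [date(2017, 6, 21), date(1950, 8, 31),
                date(2017, 10, 29), date(2016, 10, 7)]

# min() and max() compare dates chronologically
earliest = min(sample_dates)
latest = max(sample_dates)

print(earliest)  # 1950-08-31
print(latest)    # 2017-10-29
```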

Date to Strings

  • The strftime() method is very flexible when you want something other than ISO format: you can pass it any format string, including arbitrary literal text.
ymd=date(2021,8,4)
print(ymd.strftime("Year is %Y"))
print(ymd.strftime("Today is %Y/%m/%d"))
print(ymd.strftime("%B (%Y)"))
print(ymd.strftime("%Y-%j"))

Year is 2021
Today is 2021/08/04
August (2021)
2021-216

Don’t be surprised :) The last line shows the day of the year: %j counts days from 001 to 366, and astronomers often use this day number to avoid ambiguity.
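
When ISO format is all you need, strftime() is not required. A short sketch using the same date as above; date.fromisoformat() needs Python 3.7 or newer.

```python
from datetime import date

ymd = date(2021, 8, 4)

# isoformat() and str() both give YYYY-MM-DD
print(ymd.isoformat())            # 2021-08-04
print(str(ymd))                   # 2021-08-04

# The equivalent strftime() format string
print(ymd.strftime("%Y-%m-%d"))   # 2021-08-04

# fromisoformat() parses the ISO string back into a date
print(date.fromisoformat("2021-08-04") == ymd)  # True
```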

Chp 2: Combining Dates and Times

When you work with both dates and times, you need to import the datetime class from the datetime module. The resulting object is a datetime.datetime rather than a datetime.date.

from datetime import datetime
# First 3 arguments are exactly the same as date class.
dt=datetime(2017, 10, 1, 15, 23, 25 )
print(dt)
print(dt.replace(minute=0, second=0))

2017-10-01 15:23:25
2017-10-01 15:00:00

Capital Bikeshare

Capital Bikeshare is DC, Virginia, and Maryland’s bike share system, and the first large-scale bike-share in the US. The dataset can be downloaded here.

Each row contains the start and end date of a trip, the locations, the member type, and the bike number. First, we need to convert the ‘Start date’ and ‘End date’ columns to datetime objects in order to count the trips that start before and after noon.

bike['Start date'] = pd.to_datetime(bike['Start date'])
bike['End date'] = pd.to_datetime(bike['End date'])
bike_datetimes = bike[['End date', 'Start date']].to_dict('records')
# Create dictionary to hold results
trip_counts = {'Before Noon (AM)': 0, 'After Noon (PM)': 0}

# Loop over all events
for events in bike_datetimes:
    # Check to see if the trip starts before noon
    if events['Start date'].hour < 12:
        # Increment the counter for before noon
        trip_counts['Before Noon (AM)'] += 1
    else:
        # Increment the counter for after noon
        trip_counts['After Noon (PM)'] += 1

print(trip_counts)

{'Before Noon (AM)': 94, 'After Noon (PM)': 196}

It looks like the bike is used about twice as much after noon as before noon. One follow-up would be to see at which hours the bike is most likely to be taken out for a ride.
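
One possible way to approach that follow-up is to count trips by the hour in which they start. The sketch below uses collections.Counter and a three-row sample_trips list made up for illustration (the start times are taken from values shown elsewhere in this article); with the real data you would loop over bike_datetimes instead.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample in the same shape as bike_datetimes
sample_trips = [
    {"Start date": datetime(2017, 10, 1, 15, 23, 25)},
    {"Start date": datetime(2017, 10, 1, 15, 42, 57)},
    {"Start date": datetime(2017, 10, 2, 8, 56, 45)},
]

# Count trips by the hour (0-23) in which they start
trips_per_hour = Counter(trip["Start date"].hour for trip in sample_trips)

# most_common(1) returns a list holding the (hour, count) pair with the highest count
busiest_hour, n_trips = trips_per_hour.most_common(1)[0]
print(busiest_hour, n_trips)  # 15 2
```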

Printing and parsing datetimes (strptime())

In short, strptime() stands for “string parse time”. It takes two arguments: the string to parse and a format string that describes it. The format must match the string exactly; otherwise, strptime() raises a ValueError. Pandas very commonly reads dates as strings (object dtype), so strptime() is quite helpful during data preparation.

dt = datetime.strptime("11/30/2017 15:29:13", "%m/%d/%Y %H:%M:%S")
print(dt)

2017-11-30 15:29:13
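
To illustrate the exact-match requirement, the sketch below parses the same string successfully and then feeds strptime() an ISO-style string that does not fit the format. The names fmt, good, and mismatch_detected are chosen for this example.

```python
from datetime import datetime

fmt = "%m/%d/%Y %H:%M:%S"

# The string matches the format, so parsing succeeds
good = datetime.strptime("11/30/2017 15:29:13", fmt)
print(good)  # 2017-11-30 15:29:13

# This string uses a different layout, so strptime() raises ValueError
try:
    datetime.strptime("2017-11-30 15:29:13", fmt)
except ValueError as err:
    mismatch_detected = True
    print("Could not parse:", err)
else:
    mismatch_detected = False
```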

Unix Timestamp

Many computers store datetime information behind the scenes as the number of seconds since January 1, 1970 (UTC). This reference point is known as the Unix epoch.

# A timestamp
ts=1614694153.0
print(datetime.fromtimestamp(ts))

2021-03-02 17:09:13
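
Note that fromtimestamp() without a time zone returns the machine’s local time, so the output above depends on where the code runs (17:09:13 corresponds to a clock three hours ahead of UTC). A minimal sketch for getting the same instant in UTC on any machine, using the tz argument:

```python
from datetime import datetime, timezone

ts = 1614694153.0

# Passing tz makes the result timezone-aware and independent of the machine
utc_dt = datetime.fromtimestamp(ts, tz=timezone.utc)
print(utc_dt)                     # 2021-03-02 14:09:13+00:00

# timestamp() converts back to seconds since the Unix epoch
print(utc_dt.timestamp() == ts)   # True
```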

Creating Start and End Date Dictionary for Follow-Ups

x = bike['Start date'].to_list()
y = bike['End date'].to_list()
datetime_strings = list(zip(x, y))
print(datetime_strings)

[('2017-10-01 15:23:25', '2017-10-01 15:26:26'),
('2017-10-01 15:42:57', '2017-10-01 17:49:59')..and so on.

# Write the format string
ft = "%Y-%m-%d %H:%M:%S"
# Initialize a list to hold the pairs of datetime objects
bike_datetimes = []
# Loop over all trips
for (start, end) in datetime_strings:
    # Parse each string into a datetime object
    trip = {'start': datetime.strptime(start, ft),
            'end': datetime.strptime(end, ft)}

    # Append the trip
    bike_datetimes.append(trip)
bike_datetimes[1]

{'start': datetime.datetime(2017, 10, 1, 15, 42, 57),
'end': datetime.datetime(2017, 10, 1, 17, 49, 59)}

Parsing strings into datetime objects like this is a data-cleaning step that many more complex tasks depend on.

Working with durations

from datetime import timedelta
start = datetime(2017, 10, 2, 8, 56, 45)
print(start)
print(start + timedelta(seconds=1))
print(start + timedelta(days=1, seconds=1))

2017-10-02 08:56:45
2017-10-02 08:56:46
2017-10-03 08:56:46

timedelta accepts any number of weeks, days, hours, minutes, seconds, milliseconds, or microseconds, and the values can be negative.
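
A brief sketch of a negative timedelta and of combining several units, using the same start value as above:

```python
from datetime import datetime, timedelta

start = datetime(2017, 10, 2, 8, 56, 45)

# A negative timedelta moves backward in time
one_week_earlier = start + timedelta(weeks=-1)
print(one_week_earlier)                                 # 2017-09-25 08:56:45

# Subtracting a positive timedelta is equivalent
print(start - timedelta(weeks=1) == one_week_earlier)   # True

# Units can be mixed; total_seconds() gives the whole span in seconds
mixed = timedelta(hours=1, minutes=30)
print(mixed.total_seconds())                            # 5400.0
```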

Let’s calculate the number of seconds that the bike was out of the station for each trip.

bike_durations = []
for trip in bike_datetimes:
    duration = trip['end'] - trip['start']
    duration_inseconds = duration.total_seconds()
    bike_durations.append(duration_inseconds)

Average trip time

# What was the total duration of all trips?
total_duration = sum(bike_durations)
# What was the total number of trips?
number_of_trips = len(bike_durations)

# Divide the total duration by the number of trips
print(total_duration / number_of_trips)

It turns out to be 1178.93 seconds. To check for fishy data, let’s look at the minimum and maximum durations in seconds.

The shortest trip was -3346.0 seconds
The longest trip was 76913.0 seconds

bike[bike['duration_in_seconds'] < 0]

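Weird, huh?! But in November, clocks are moved back an hour at 2 a.m. local daylight time, so they then read 1 a.m. local standard time. A trip that starts before the change and ends after it can therefore appear to end before it started when naive datetimes are subtracted.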
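
The effect can be reproduced with a small sketch. The two wall-clock readings below are hypothetical, chosen so that their naive difference matches the -3346.0 seconds reported above; they are not taken from the dataset. The second half attaches fixed UTC offsets with datetime.timezone (covered in the next chapter), assuming UTC-4 for daylight time and UTC-5 for standard time.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical wall-clock readings on the night the clocks went back (Nov 5, 2017)
start = datetime(2017, 11, 5, 1, 56, 50)   # read before the change
end = datetime(2017, 11, 5, 1, 1, 4)       # read after the change

# Naive subtraction ignores the clock change and goes negative
naive_seconds = (end - start).total_seconds()
print(naive_seconds)   # -3346.0

# Attach fixed offsets (assumed: daylight time UTC-4, standard time UTC-5)
edt = timezone(timedelta(hours=-4))
est = timezone(timedelta(hours=-5))
aware_seconds = (end.replace(tzinfo=est) - start.replace(tzinfo=edt)).total_seconds()
print(aware_seconds)   # 254.0
```

With the offsets attached, the subtraction is done on the underlying instants, so the trip comes out as a little over four minutes long.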

Chp 3: Time Zones and Daylight Saving

Until now, the datetime and Timestamp objects we’ve worked with have been what is called “naive”: they know nothing about time zones, so they cannot be compared across different parts of the world.
Before time zones, each city set its clocks according to when the sun was directly overhead, which caused differences between cities at different longitudes. Those differences matter once you can travel or communicate quickly with someone thousands of miles away.
Governments solved this issue by declaring that all clocks within a wide area would agree on the hour, even if some were ahead of or behind their solar time. For example, the contiguous US has 4 major time zones, plus one for Alaska and another for Hawaii. Our bike data observes Eastern Time. Since we’re no longer using the sun, how do we know how to set the clock? Because the UK was the first to standardize its time, the world sets its clocks relative to the original UK standard, now known as UTC.

  • Usually, clocks west of the UK are earlier than clocks east of the UK. For example, the eastern US is UTC minus 5 hours (in winter), while India is UTC plus 5 hours and 30 minutes.

Let’s see this in code

from datetime import datetime, timedelta, timezone
# US Eastern time zone
et=timezone(timedelta(hours=-5))
# Timezone datetime
dt_et=datetime(2017,12,30,15,9,3, tzinfo=et)
dt_et_nontimezone=datetime(2017,12,30,15,9,3)
print(dt_et)
print(dt_et_nontimezone)

2017-12-30 15:09:03-05:00
2017-12-30 15:09:03

The first output now includes the UTC offset.

# India
ind=timezone(timedelta(hours=5, minutes=30))
# The Eastern time according to India Standard Time
print(dt_et.astimezone(ind))

2017-12-31 01:39:03+05:30

Same moment, different clock.

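# Attach UTC to the naive datetime, then convert it to UTC
print(dt_et_nontimezone.replace(tzinfo=timezone.utc))
print(dt_et_nontimezone.astimezone(timezone.utc))

Here the results are the same: 2017-12-30 15:09:03+00:00. That holds only when the machine’s local time zone is UTC: replace() just attaches a tzinfo without changing the clock reading, while astimezone() treats a naive datetime as local time and converts it.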
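
On a datetime that is already timezone-aware, the difference between replace() and astimezone() is easier to see. A minimal sketch reusing the Eastern-time value from above; the names relabelled and converted are chosen for this example.

```python
from datetime import datetime, timedelta, timezone

et = timezone(timedelta(hours=-5))
dt_et = datetime(2017, 12, 30, 15, 9, 3, tzinfo=et)

# replace() keeps the clock reading and swaps the label: a different moment
relabelled = dt_et.replace(tzinfo=timezone.utc)
print(relabelled)            # 2017-12-30 15:09:03+00:00

# astimezone() keeps the moment and changes the clock reading
converted = dt_et.astimezone(timezone.utc)
print(converted)             # 2017-12-30 20:09:03+00:00

# Aware datetimes compare by the instant they represent
print(converted == dt_et)    # True
print(relabelled == dt_et)   # False
```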

To be continued...


SALOME SONYA LOMSADZE

Sr. Customer Analytics, BI Developer, Experienced in SQL, Python, Qlik, B.Sc in Chemistry at Bogazici University https://github.com/sonyalomsadze