Times and Dates in Python -II

SALOME SONYA LOMSADZE
5 min read · Mar 6, 2021

Using Time Zone In Practice

All of the different time zones in the World as of 2017

pip install python-dateutil
from dateutil import tz

Time zone names in the tz database follow the 'Continent/City' format, for example 'America/New_York'.

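from datetime import datetime

# Eastern Time, looked up in the tz database
et = tz.gettz('America/New_York')

# Last ride
last_no_tz = datetime(2017, 12, 30, 15, 9, 3)
last_tz = datetime(2017, 12, 30, 15, 9, 3, tzinfo=et)
print(last_no_tz)
print(last_tz)
# First ride
first_no_tz = datetime(2017, 10, 1, 15, 23, 25)
first_tz = datetime(2017, 10, 1, 15, 23, 25, tzinfo=et)
print(first_no_tz)
print(first_tz)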
2017-12-30 15:09:03
2017-12-30 15:09:03-05:00
2017-10-01 15:23:25
2017-10-01 15:23:25-04:00

Thanks to the tz database, you do not need to specify the UTC offset yourself.

When we run the same code with the first trip date, we get a different UTC offset, because in some places the clocks change twice a year (daylight saving time).
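To make the offset change concrete, here is a minimal sketch that asks each datetime for the UTC offset in effect on its date. It assumes the rides are in Eastern Time ('America/New_York'); the variable names are only for illustration.

```python
from datetime import datetime
from dateutil import tz

# Look up Eastern Time in the tz database (assumed zone for these rides)
et = tz.gettz("America/New_York")

# One date while daylight saving time is active, one after it has ended
autumn = datetime(2017, 10, 1, 15, 23, 25, tzinfo=et)
winter = datetime(2017, 12, 30, 15, 9, 3, tzinfo=et)

# utcoffset() reports the offset that applies on each date
autumn_offset = autumn.utcoffset()
winter_offset = winter.utcoffset()
print(autumn_offset.total_seconds() / 3600)  # -4.0
print(winter_offset.total_seconds() / 3600)  # -5.0

# Converting to UTC puts both timestamps on a common clock
winter_utc = winter.astimezone(tz.UTC)
print(winter_utc.isoformat())
```

Storing or comparing timestamps in UTC avoids surprises when a data set spans a clock change.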

Chapter 4: Dates and Times in Pandas

If we do not tell pandas that the date columns hold datetimes, they are read as plain strings. To have pandas treat the 'Start date' and 'End date' columns as datetimes, we need to parse them:

import pandas as pd

bike = pd.read_csv('capital-onebike.csv', parse_dates=['Start date', 'End date'])

Now the type of the columns is datetime64. Another method is:

bike['Start date'] = pd.to_datetime(bike['Start date'],
                                    format="%Y-%m-%d %H:%M:%S")
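The two parsing routes can be compared side by side. This is a small sketch that uses a hypothetical two-row sample in place of capital-onebike.csv, so the rows and the names `parsed` and `raw` are assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical sample standing in for capital-onebike.csv
csv_text = (
    "Start date,End date,Member type\n"
    "2017-10-01 15:23:25,2017-10-01 15:26:26,Member\n"
    "2017-10-01 15:42:57,2017-10-01 17:49:59,Casual\n"
)

# Option 1: parse the date columns while reading
parsed = pd.read_csv(io.StringIO(csv_text),
                     parse_dates=["Start date", "End date"])

# Option 2: read everything as text, then convert one column explicitly
raw = pd.read_csv(io.StringIO(csv_text))
raw["Start date"] = pd.to_datetime(raw["Start date"],
                                   format="%Y-%m-%d %H:%M:%S")

print(parsed.dtypes)
print(raw.dtypes)  # 'End date' is still text here
```

Passing an explicit format is usually faster on large files and fails loudly if a row does not match.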

Did you know that pandas has a pd.read_excel(), pd.read_json(), and even a pd.read_clipboard() function to read tabular data that you've copied from a document or website? Most have date parsing functionality too.

Duration time in seconds

bike['duration_time']=bike['End date']-bike['Start date']
bike['duration_second']=(bike['End date']-bike['Start date']).dt.total_seconds()
bike.duration_second.head()

What fraction of the time was the bike out of the station?

from datetime import timedelta

print(bike['Start date'].max() - bike['Start date'].min())
89 days 23:45:38
print(bike.duration_time.sum() / timedelta(days=91))
0.04348417785917786

The bike was out about 4.3% of the time, meaning that roughly 96% of the time it was waiting at a station.
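The arithmetic behind that fraction is easy to check on a toy example. The sketch below uses two made-up rides within a single day; the data and the one-day window are assumptions, not values from the real data set.

```python
from datetime import timedelta
import pandas as pd

# Two hypothetical rides on the same day
rides = pd.DataFrame({
    "Start date": pd.to_datetime(["2017-10-01 08:00:00", "2017-10-01 20:00:00"]),
    "End date": pd.to_datetime(["2017-10-01 09:00:00", "2017-10-01 23:00:00"]),
})

# Subtracting datetime columns gives timedeltas
rides["duration_time"] = rides["End date"] - rides["Start date"]
rides["duration_second"] = rides["duration_time"].dt.total_seconds()

# Dividing one timedelta by another gives a plain float
total_out = rides["duration_time"].sum()   # 4 hours in total
window = timedelta(days=1)                 # length of the observation window
fraction_out = total_out / window
print(f"{fraction_out:.1%}")
```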

Member type and duration

bike.groupby('Member type')['duration_second'].mean()

Rides from casual members last nearly twice as long on average. You can count the rows in each group with groupby('column').size(), or call groupby('column').first() to get the first row of each group.
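Here is a small sketch of those three groupby calls on made-up data, so the numbers are easy to verify by hand. The four rides are hypothetical.

```python
import pandas as pd

# Four hypothetical rides
rides = pd.DataFrame({
    "Member type": ["Member", "Casual", "Member", "Casual"],
    "duration_second": [300.0, 1200.0, 500.0, 800.0],
})

by_type = rides.groupby("Member type")

mean_duration = by_type["duration_second"].mean()  # average per group
ride_counts = by_type.size()                       # number of rows per group
first_rows = by_type.first()                       # first row of each group

print(mean_duration)
print(ride_counts)
print(first_rows)
```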

We can also group by time with pandas resample(). Note that resample() only works on datetime-like data, either a datetime index or a datetime column passed with on=. 'M' is for months, 'Y' for years, 'D' for days. (Recent pandas versions deprecate 'M' and 'Y' in favor of 'ME' and 'YE'.)

bike.resample('M', on='Start date')['duration_second'].mean()
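A toy version of the monthly resample is sketched below. It uses the 'MS' (month start) alias rather than 'M', purely so the snippet runs without warnings on recent pandas versions; the three rides are made up.

```python
import pandas as pd

# Three hypothetical rides across two months
rides = pd.DataFrame({
    "Start date": pd.to_datetime([
        "2017-10-03 08:00:00",
        "2017-10-20 09:30:00",
        "2017-11-05 18:15:00",
    ]),
    "duration_second": [600.0, 1200.0, 300.0],
})

# Bucket rides by calendar month, labeled by the first day of the month
monthly_mean = rides.resample("MS", on="Start date")["duration_second"].mean()
print(monthly_mean)
```

The result is indexed by the start of each month, which is convenient for plotting.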

Plotting results

One big outlier occurs in the middle of October. Average rides in that period were about 25,000 seconds long, or nearly 7 hours. This is possibly a bike repair, which would make sense if it followed many days with zero rides.

How many joy rides?

Suppose we have a theory that some people take long bike rides before putting the bike back in the same dock. Let's call these rides "joyrides".

# Create joyrides
joyrides = (bike['Start station'] == bike['End station'])
# Total number of joyrides
print("{} rides were joyrides".format(joyrides.sum()))
# Median of all rides
print("The median duration overall was {:.2f} seconds"\
      .format(bike['duration_second'].median()))
# Median of joyrides
print("The median duration for joyrides was {:.2f} seconds"\
      .format(bike[joyrides]['duration_second'].median()))
6 rides were joyrides
The median duration overall was 660.00 seconds
The median duration for joyrides was 2642.50 seconds
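The joyride filter is just a boolean mask. A minimal sketch on four hypothetical rides shows how the mask is built and reused:

```python
import pandas as pd

# Four hypothetical rides; two start and end at the same station
rides = pd.DataFrame({
    "Start station": ["A", "A", "B", "C"],
    "End station": ["B", "A", "B", "A"],
    "duration_second": [400.0, 3000.0, 2000.0, 600.0],
})

# True where the ride ends where it started
joyrides = rides["Start station"] == rides["End station"]

n_joyrides = int(joyrides.sum())  # True counts as 1
median_all = rides["duration_second"].median()
median_joy = rides.loc[joyrides, "duration_second"].median()

print(n_joyrides, median_all, median_joy)  # 2 1300.0 2500.0
```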

Is it cold outside?

The average high temperature in Washington, D.C. in October (68ºF / 20ºC) is certainly higher than in December (47ºF / 8ºC). But people also travel more in December, and they work fewer days. Let's look at some trends on the graph.

Because it is colder outside in December, traveling to warmer places or staying at home sounds more attractive!

Do members and casual riders drop off at the same rate over October to December, or does one drop off faster than the other?

# Resample rides to be monthly on the basis of Start date
monthly_rides = bike.resample('M', on='Start date')['Member type']
# Take the ratio of the .value_counts() over the total number of rides
print(monthly_rides.value_counts() / monthly_rides.size())

It certainly looks like the fraction of Casual riders went down as the number of rides dropped. With a little more digging, we could figure out if keeping Member rides only would be enough to stabilize the numbers of users throughout the fall.
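A closely related way to get monthly shares is sketched below. Note that it swaps in groupby with pd.Grouper and value_counts(normalize=True) instead of dividing by size(), and uses the 'MS' alias; the six rides are made up.

```python
import pandas as pd

# Six hypothetical rides over two months
rides = pd.DataFrame({
    "Start date": pd.to_datetime([
        "2017-10-02", "2017-10-09", "2017-10-16",
        "2017-10-23", "2017-11-06", "2017-11-13",
    ]),
    "Member type": ["Member", "Casual", "Casual", "Member", "Member", "Member"],
})

# Group by calendar month, then compute each member type's share per month
monthly = rides.groupby(pd.Grouper(key="Start date", freq="MS"))["Member type"]
shares = monthly.value_counts(normalize=True)
print(shares)
```

In this toy data the casual share falls from one half in October to zero in November, which is the kind of pattern the real output is checked for.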

# Group rides by member type, and resample to the month
grouped = bike.groupby('Member type')\
              .resample('M', on='Start date')
# Print the median duration for each group
print(grouped['duration_second'].median())

It looks like casual riders consistently took longer rides, but both groups took shorter rides as the months went by.
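The same two-level grouping can be sketched on toy data. This version groups by member type and a monthly pd.Grouper in one step, which is an alternative to chaining groupby() and resample(); unlike resample(), it only lists months that actually contain rides. The data is hypothetical.

```python
import pandas as pd

# Five hypothetical rides
rides = pd.DataFrame({
    "Start date": pd.to_datetime([
        "2017-10-02", "2017-10-09", "2017-10-16", "2017-11-06", "2017-11-13",
    ]),
    "Member type": ["Member", "Member", "Casual", "Member", "Casual"],
    "duration_second": [300.0, 500.0, 1500.0, 350.0, 900.0],
})

# One group per (member type, calendar month) pair
medians = rides.groupby(
    ["Member type", pd.Grouper(key="Start date", freq="MS")]
)["duration_second"].median()
print(medians)
```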

How long per day?

# Add a column for the weekday of the start of the ride
bike['Ride start weekday'] = bike['Start date'].dt.strftime("%A")
# Print the median trip time per weekday
print(bike.groupby('Ride start weekday')['duration_second'].median())
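A quick check of the weekday grouping on made-up rides is sketched here. It uses .dt.day_name() in place of strftime("%A"), since day_name() returns English names by default regardless of the system locale.

```python
import pandas as pd

# Four hypothetical rides: two on Sundays, two on Mondays
rides = pd.DataFrame({
    "Start date": pd.to_datetime([
        "2017-10-01 10:00", "2017-10-02 08:00",
        "2017-10-08 11:00", "2017-10-09 08:30",
    ]),
    "duration_second": [1800.0, 600.0, 2200.0, 400.0],
})

# Name of the weekday on which each ride started
rides["Ride start weekday"] = rides["Start date"].dt.day_name()

median_by_day = rides.groupby("Ride start weekday")["duration_second"].median()
print(median_by_day)
```

The groups come back in alphabetical order, not calendar order, so reorder the index before plotting if that matters.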

How much time elapsed between rides?

# Shift the end dates down one row so each ride lines up with the previous ride's end; now subtract it from the start date
bike['Time since'] = bike['Start date'] - (bike['End date'].shift(1))
# Move from a timedelta to a number of seconds, which is easier to work with
bike['Time since'] = bike['Time since'].dt.total_seconds()
# Resample to the month
monthly = bike.resample('M', on='Start date')
# Print the average hours between rides each month
print(monthly['Time since'].mean() / (60 * 60))
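What shift(1) does is easiest to see on a few rows. In this sketch with three hypothetical rides, each start time is compared with the end time of the ride before it:

```python
import pandas as pd

# Three hypothetical rides, already sorted by start time
rides = pd.DataFrame({
    "Start date": pd.to_datetime([
        "2017-10-01 08:00", "2017-10-01 10:00", "2017-10-01 13:30",
    ]),
    "End date": pd.to_datetime([
        "2017-10-01 08:30", "2017-10-01 11:00", "2017-10-01 14:00",
    ]),
})

# shift(1) moves every end time down one row, so row i holds ride i-1's end
previous_end = rides["End date"].shift(1)

# Gap between the previous ride ending and this ride starting, in seconds
rides["Time since"] = (rides["Start date"] - previous_end).dt.total_seconds()
print(rides["Time since"])

# mean() skips the missing value in the first row
mean_gap_hours = rides["Time since"].mean() / (60 * 60)
print(mean_gap_hours)  # 2.0
```

The first ride has no predecessor, so its gap is NaN. The rows must be sorted by start time for the gaps to be meaningful.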

SALOME SONYA LOMSADZE

Sr. Customer Analytics , BI Developer, Experienced in SQL, Python, Qlik, B.Sc in Chemistry at Bogazici University https://github.com/sonyalomsadze