Date, Time and Time Zones

October 4, 2023 · Erik Naggum, Christopher Tyler, Dimitri Fontaine

While reading The Art of PostgreSQL, I found the section dealing with dates and time very relevant. A lot of the data I deal with at work involves dates and times, sometimes without a time zone, sometimes with one, and in different formats. This article combines notes from The Art of PostgreSQL and a web post by Erik Naggum.

TLDR: always use timestamps with time zones. If you don't store the time zone, it is very difficult to recover it afterwards. For example, you might assume a stored timestamp is local time when it is actually UTC, even if your company operates in only one time zone.
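A quick illustration of the difference (a sketch; assuming the session time zone is America/New_York):

-- timestamp stores no zone: was this local time or UTC? No way to tell later.
select '2023-10-04 12:00:00'::timestamp;
-- 2023-10-04 12:00:00

-- timestamptz interprets the input in the session time zone
-- and stores an absolute point in time
select '2023-10-04 12:00:00'::timestamptz;
-- 2023-10-04 12:00:00-04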

The Long, Painful History of Time

This section is from Erik’s article.

The measurement of time has a very long history dating back thousands of years. The basic problem with time is that we need to express both time and place whenever we want to place some event in time and space. Yet we tend to assume spatial coordinates even more than we assume temporal coordinates, and in the case of time in ordinary communication, it is simply left out entirely.

Scientific Time

Scientific time is typically easy to deal with, since the scientific community just needs to agree. For example, we have a world-wide standard for time, called Coordinated Universal Time (UTC). Common Lisp's universal time 0 is 1900-01-01 00:00 UTC, while Unix time 0 is 1970-01-01 00:00 UTC.
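For instance, PostgreSQL converts Unix epoch seconds with to_timestamp (output shown assuming a UTC session):

-- Unix time 0 is the epoch, 1970-01-01 00:00:00 UTC
select to_timestamp(0);
-- 1970-01-01 00:00:00+00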

Political Time

Political time is closely related to territory, power, and collective human irrationality. Normally we would say a political day takes 24 hours, but one day out of the year it takes only 23 hours and another day it takes 25, thanks to the wonders of daylight saving time. Yet it is this political time that most people want their computers to produce when they ask for the date or the time of day.

Key Takeaways

Basically, it can be difficult to interpret a time unless you have the location and some reference (a time zone). Even with that, it can be really complicated due to political time. There really should be no reason to figure out time zones on your own: most programming languages have a library that takes care of this for you.

Time Intervals

The rest of this article is my notes from reading Dimitri's book.

An interval describes a duration, like a month or two weeks. Keep in mind from the earlier section that it is still important to know the location, due to political time.
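As a sketch of why location matters, assume a Europe/Paris session around the spring 2023 DST change: adding an interval of '1 day' is not the same as adding '24 hours'.

set timezone = 'Europe/Paris';

-- DST started on 2023-03-26 in Paris, so this 'day' lasts only 23 hours
select '2023-03-25 12:00:00+01'::timestamptz + interval '1 day';
-- 2023-03-26 12:00:00+02 (same local clock time, 23 elapsed hours)

select '2023-03-25 12:00:00+01'::timestamptz + interval '24 hours';
-- 2023-03-26 13:00:00+02 (exactly 24 elapsed hours)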

Date/Time Processing and Querying

I needed to get the data:

> git clone git@github.com:dimitri/pgloader.git
> git log --pretty=format:'pgloader,%h,%an,%ad,%cn,%cd,"%s"' --date=short > ~/programming/taop/pg_loader_git.csv
> git clone git://git.postgresql.org/git/postgresql.git
> git log --pretty=format:'postgresql%x09%h%x09%an%x09%ad%x09%cn%x09%cd%x09"%s"' --date=short --encoding=utf-8 > ~/programming/taop/postgresql_git.csv
> awk '{gsub(/[^a-zA-Z0-9 ,"]/,"",$0); print}' pg_loader_git.csv > pg_loader_git_awk.csv
> awk '{gsub(/[^a-zA-Z0-9 ,"|\t]/,"",$0); print}' postgresql_git.csv > postgresql_git_awk.csv

I had to use tabs for the PostgreSQL log because I was having issues with commas and pipes. Then I needed to create the table and load in the data.
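One assumption in the session below: the taop schema already exists. If it doesn't, create it first:

create schema if not exists taop;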

postgres@192:training> drop table if exists taop.commitlog;
 create table taop.commitlog (
          id bigserial not null
         ,project text null
         ,hash text null
         ,author text null
         ,ats timestamptz null
         ,committer text null
         ,cts timestamptz null
         ,subject text null
         ,primary key (id)
 );
You're about to run a destructive command.
Do you want to proceed? (y/n): y
Your call!
table "commitlog" does not exist, skipping
None

DROP TABLE
CREATE TABLE
Time: 0.044s

Finally, load the data:

> cat pg_loader_git.csv | psql --no-psqlrc -U postgres -h 192.168.0.113 -p 5434 training -c "copy taop.commitlog (project, hash, author, ats, committer, cts, subject) from stdin with (format csv, header false);"
> head -n 25000 postgresql_git_awk.csv | psql --no-psqlrc -U postgres -h 192.168.0.113 -p 5434 training -c "copy taop.commitlog (project, hash, author, ats, committer, cts, subject) from stdin with (format csv, header false, delimiter E'\t');"

I still had an issue loading the PostgreSQL log even with tab delimiters, so I took the first 25,000 rows, which loaded with no issues.

postgres@192:training>select extract(year from ats) as year
         ,count(*) filter(where project = 'postgresql') as postgresql
         ,count(*) filter(where project = 'pgloader') as pgloader
     from taop.commitlog
 group by year
 order by year;
+------+------------+----------+
| year | postgresql | pgloader |
|------+------------+----------|
| 2011 | 1423       | 0        |
| 2012 | 1605       | 0        |
| 2013 | 1368       | 386      |
| 2014 | 1745       | 387      |
| 2015 | 1817       | 218      |
| 2016 | 2086       | 139      |
| 2017 | 2469       | 193      |
| 2018 | 2122       | 148      |
| 2019 | 2130       | 73       |
| 2020 | 2179       | 60       |
| 2021 | 2276       | 15       |
| 2022 | 2484       | 51       |
| 2023 | 1296       | 0        |
+------+------------+----------+
SELECT 13
Time: 0.017s

The query above looks at how many commits each project had per year. It is effectively a pivot query: each filter clause pivots the counts for one project into its own column.

We can also build a report on the distribution of commits by weekday since the beginning of the project, to guess whether contributors work on the project only on the job or also on weekends.

postgres@192:training>select extract(isodow from ats) as dow
         ,to_char(ats, 'Day') as day
         ,count(*) as commits
         ,round(100.0 * count(*) / sum(count(*)) over(), 2) as pct
         ,repeat('*', (100 * count(*) / sum(count(*)) over())::int) as hist
     from taop.commitlog
    where project = 'postgresql'
 group by dow, day
 order by dow;
+-----+-----------+---------+-------+-------------------+
| dow | day       | commits | pct   | hist              |
|-----+-----------+---------+-------+-------------------|
| 1   | Monday    | 4109    | 16.44 | ****************  |
| 2   | Tuesday   | 4127    | 16.51 | ***************** |
| 3   | Wednesday | 4351    | 17.40 | ***************** |
| 4   | Thursday  | 4190    | 16.76 | ***************** |
| 5   | Friday    | 4132    | 16.53 | ***************** |
| 6   | Saturday  | 2166    | 8.66  | *********         |
| 7   | Sunday    | 1925    | 7.70  | ********          |
+-----+-----------+---------+-------+-------------------+
SELECT 7
Time: 0.019s

These are just a couple of things you can do with PostgreSQL and timestamps.

Key Takeaways

Basically, dealing with dates, times and time zones can get complicated, and it is easy to make mistakes. PostgreSQL deals with this really well. It is easier to ensure the data is saved to PostgreSQL with proper timestamps with time zones than to depend on the programming language that is interacting with the database.
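For example, once the data is stored as timestamptz, you can render it in whatever zone you need at query time with at time zone (a sketch; the zone names here are just examples):

-- store an absolute point in time once, display it per reader
select now() at time zone 'UTC'              as utc_time
      ,now() at time zone 'America/New_York' as ny_time;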