I’ve been using SQLite a bit in some of my own projects recently, and I think it’s a great solution for small self-hosted applications, avoiding the need to run a full-blown RDBMS where it’s not really warranted. Xe’s post about JSON in SQLite talks a lot about how to use SQLite to handle JSON data and load it into other tables without having to have your application do it, which is great.
On the other hand, in 2020 I spent a lot of time with a team demolishing an old enterprise system at work, with a fairly convoluted database structure. I learned a bit about PostgreSQL’s JSON functions in the process. I’ve been meaning to write up what I learned for some time now, before I forgot it entirely, so Xe’s post was a good prompt.
It turns out that similar capabilities exist in both databases, so I’m going to translate Xe’s post more or less directly into PostgreSQL equivalents. My site doesn’t have a JSONFeed endpoint (yet, perhaps Zola will acquire one, one day), so I’m going to borrow Xe’s feed and Python script as the basis for what I do here. I’m using PostgreSQL 13.5 here. There are some neat features in 14, like array subscripting for JSON column types and things like that, which are worth reading up on.
Setting up the tables
SQLite uses text data types to store JSON in, and has operators which can work on that. However, PostgreSQL has a separate JSON datatype. In fact, it has two: json and jsonb [1]. The json type stores an exact copy of the input text, while jsonb transforms it into a binary form that is faster for many operations as well as being indexable. I’ve only ever used jsonb, and I’ll continue to do that here.
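A quick way to see the difference is to cast the same literal to both types (my own throwaway example, not part of the feed data):

select '{"b": 1, "a": 2}'::json;   -- comes back exactly as typed: {"b": 1, "a": 2}
select '{"b": 1, "a": 2}'::jsonb;  -- normalised to jsonb's internal key ordering: {"a": 2, "b": 1}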
Because PostgreSQL is a relational database server, there is some work around setting up a new database and its permissions that I’m going to skip past. There are already plenty of tutorials elsewhere for getting this up and running. Because I’m a Nix fan I used nix-shell to install PostgreSQL, as described in a previous post. For this example I’m going to assume the database is local, and the database, username and password are all mydb.
Let’s create a table, and throw some data into it. The key differences are the column types I’m using (timestamp and jsonb), and the slightly different syntax required for the default timestamp.
-- schema.sql
CREATE TABLE IF NOT EXISTS jsonfeed_raw
( feed_url TEXT PRIMARY KEY
, scrape_date TIMESTAMP NOT NULL DEFAULT NOW()
, raw JSONB NOT NULL
);
There are a couple of Python libraries for communicating with PostgreSQL, but the one I’m most familiar with is the dubiously-named psycopg2. A minimal fetch script looks something like this:
#!/usr/bin/env nix-shell
#! nix-shell -p python39 python39Packages.psycopg2 -i python
# jsonfeedfetch.py: fetch the feed and stash the raw JSON document in jsonfeed_raw
import urllib.request
import psycopg2

FEED_URL = "https://christine.website/blog.json"
raw = urllib.request.urlopen(FEED_URL).read().decode("utf-8")
conn = psycopg2.connect(host="localhost", dbname="mydb", user="mydb", password="mydb")
with conn, conn.cursor() as cur:
    cur.execute("INSERT INTO jsonfeed_raw (feed_url, raw) VALUES (%s, %s)", (FEED_URL, raw))
We can create the table from the schema above with the psql command:
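Assuming the schema is saved as schema.sql and the mydb database already exists, something like this will do it:

psql -h localhost -U mydb -d mydb -f schema.sql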
Then use the Python script to load the data:
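With the script saved as jsonfeedfetch.py (the name it goes by later in this post) and marked executable, that’s just:

chmod +x jsonfeedfetch.py
./jsonfeedfetch.py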
JSON operators
PostgreSQL has a few operators that can be used for accessing data held in JSON columns. They look a little different from the ones in SQLite. The -> operator returns a JSONB type, which can be queried further; we’ll see this in action later. However, the ->> operator will return a text representation of whatever value it retrieves. Note that the names of the JSON fields need to be quoted, otherwise PostgreSQL will think that you are referring to a column.
mydb=> select raw->'title' as title, pg_typeof(raw->'title') from jsonfeed_raw;
title | pg_typeof
-------------+-----------
"Xe's Blog" | jsonb
(1 row)
mydb=> select raw->>'title' as title, pg_typeof(raw->>'title') from jsonfeed_raw;
title | pg_typeof
-----------+-----------
Xe's Blog | text
(1 row)
You can chain these operators, and also use integers to select array elements, so you can use something like this to select the title of the first post in the feed:
mydb=> select raw->'items'->0->'title' as post_title from jsonfeed_raw;
post_title
-------------------------------
"A Tool to Aid Forgetfulness"
(1 row)
Rather than using this syntax, it’s possible to use other operators to go more directly to the data you need. The query above could also be expressed as one of the following:
-- Using a list of object keys and array indices
select raw #> '{items,0,title}' as post_title from jsonfeed_raw;
-- Using JSONPath
select jsonb_path_query(raw, '$.items[0].title') from jsonfeed_raw;
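There is also a #>> variant which, like ->>, returns text rather than jsonb, so you get the title back without the surrounding quotes:

-- Same path, but returning text instead of jsonb
select raw #>> '{items,0,title}' as post_title from jsonfeed_raw;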
Let’s go on and create the metadata table and insert the data into it. I only had to make one change to the syntax here, to the default value on the updated_at column:
CREATE TABLE IF NOT EXISTS jsonfeed_metadata
( feed_url TEXT PRIMARY KEY
, title TEXT NOT NULL
, description TEXT
, home_page_url TEXT
, updated_at TEXT NOT NULL DEFAULT NOW()
);
Populating the second table looks fairly similar, although note the different operators inside the SELECT:
INSERT INTO jsonfeed_metadata
( feed_url
, title
, description
, home_page_url
, updated_at
)
SELECT jsonfeed_raw.feed_url AS feed_url
, raw->>'title' AS title
, raw->>'description' AS description
, raw->>'home_page_url' AS home_page_url
, NOW() AS updated_at
FROM jsonfeed_raw;
Triggers
Triggers are a little more complex in PostgreSQL. It’s not currently possible to write triggers in plain SQL [2]. Instead we have to write a function in another language and call that. The easiest choice in this case is to use PL/pgSQL, which is a PostgreSQL-specific superset of SQL. That’s going to look something like this:
DROP TRIGGER IF EXISTS jsonfeed_raw_ins ON jsonfeed_raw;
DROP FUNCTION IF EXISTS jsonfeed_update;

CREATE FUNCTION jsonfeed_update() RETURNS trigger
AS $jsonfeed_update$
BEGIN
INSERT INTO jsonfeed_metadata
( feed_url
, title
, description
, home_page_url
)
VALUES ( NEW.feed_url
, NEW.raw->'title'
, NEW.raw->'description'
, NEW.raw->'home_page_url'
)
ON CONFLICT (feed_url) DO
UPDATE SET
title = NEW.raw->'title'
, description = NEW.raw->'description'
, home_page_url = NEW.raw->'home_page_url';
RETURN NULL; -- Ignored because this is an AFTER trigger
END;
$jsonfeed_update$ LANGUAGE plpgsql;
CREATE TRIGGER jsonfeed_raw_ins
AFTER INSERT OR UPDATE OR DELETE ON jsonfeed_raw
FOR EACH ROW EXECUTE FUNCTION jsonfeed_update();
This is a little more verbose, but does the same thing. Note the slightly different ON CONFLICT syntax, where we have to specify a column name unlike in the SQLite version. Speaking of conflicts, let’s also update the Python script so that it does an update on a conflict in the jsonfeed_raw table:
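The change is just the INSERT in the script becoming an upsert, along these lines (refreshing scrape_date on a conflict is my own choice here):

cur.execute(
    """
    INSERT INTO jsonfeed_raw (feed_url, raw) VALUES (%s, %s)
    ON CONFLICT (feed_url) DO UPDATE
        SET raw = EXCLUDED.raw
          , scrape_date = NOW()
    """,
    (FEED_URL, raw),
)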
Now, if we delete everything from each of the tables and run the script again, we should see data in jsonfeed_metadata.
mydb=> select * from jsonfeed_metadata;
feed_url | title | description | home_page_url | updated_at
-------------------------------------+-------------+------------------------------------------------------------+-----------------------------+-------------------------------
https://christine.website/blog.json | "Xe's Blog" | "My blog posts and rants about various technology things." | "https://christine.website" | 2022-01-12 22:15:36.920008+00
(1 row)
mydb=>
Extracting rows from the JSON
Like the json_each function in SQLite, PostgreSQL can unpack arrays of items in JSON and use those to create more rows, using a function called jsonb_array_elements. Let’s create a table to put blog posts into:
CREATE TABLE IF NOT EXISTS jsonfeed_posts
( url TEXT PRIMARY KEY
, feed_url TEXT NOT NULL
, title TEXT NOT NULL
, date_published TIMESTAMP NOT NULL
);
We can populate this table as follows:
INSERT INTO jsonfeed_posts
( url
, feed_url
, title
, date_published
)
SELECT
items->>'url' AS url
, raw->>'feed_url' AS feed_url
, items->>'title' AS title
, (items->>'date_published')::timestamp AS date_published
FROM
jsonfeed_raw
, jsonb_array_elements(jsonfeed_raw.raw->'items') as items;
We can do the same thing here as we did before, and turn this into a trigger. Again, we’ll need to use a slightly different language to do it, but it’s not going to be particularly difficult.
DROP TRIGGER IF EXISTS jsonfeed_raw_ins_posts ON jsonfeed_raw;
DROP FUNCTION IF EXISTS jsonfeed_update_posts;

CREATE FUNCTION jsonfeed_update_posts() RETURNS trigger
AS $jsonfeed_update_posts$
BEGIN
INSERT INTO jsonfeed_posts
( url
, feed_url
, title
, date_published
)
SELECT
items->>'url' AS url
, raw->>'feed_url' AS feed_url
, items->>'title' AS title
, (items->>'date_published')::timestamp AS date_published
FROM
jsonfeed_raw
, jsonb_array_elements(jsonfeed_raw.raw->'items') as items
ON CONFLICT (url) DO
UPDATE SET title = EXCLUDED.title
, date_published = EXCLUDED.date_published;
RETURN NULL;
END;
$jsonfeed_update_posts$ LANGUAGE plpgsql;
CREATE TRIGGER jsonfeed_raw_ins_posts
AFTER INSERT OR UPDATE OR DELETE ON jsonfeed_raw
FOR EACH ROW EXECUTE FUNCTION jsonfeed_update_posts();
Next time you run jsonfeedfetch.py, you should have a fully populated jsonfeed_posts table containing all of the posts from the feed.
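A couple of quick queries to check (the output will obviously depend on the state of the feed when you fetch it):

select count(*) from jsonfeed_posts;
select title, date_published from jsonfeed_posts order by date_published desc limit 5;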
[1] There are also operators that can work on JSON stored in text fields, but I’ve not used those at all.
[2] From the PostgreSQL documentation on triggers.