Output Relationships as Nested JSON with PostgreSQL
Defining the data model
Our blogging app's database defines the following tables:
posts
contains the articlecategories
contains a blog categorypost_categories
defines the categories of each post.
The entity relationship diagram of this many-to-many relation looks as follows:
For our GetPosts
API, we want to fetch posts with their respective categories
using the following structure.
interface Category {
id: number;
name: string;
}
interface Post {
id: number;
body: string;
categories: Category[];
}
The tricky part here is that a normalized SQL database
returns flat data, but our API requires categories
to be a nested property
of Post
.
A naive attempt to query structured data
We could get our desired JSON structure by making multiple queries and building the desired structure on the server, but this is error-prone, tedious, and less performant.
Generally speaking, we can assume the Postgres query optimizer, written in C, will outperform our application code.
Postgres aggregate function to output relationships as nested JSON
To set this up, we
need to join posts
and categories
.
SELECT p.post_id, p.body, c.category_id, c.name
FROM posts p
LEFT JOIN post_categories pc ON p.post_id = pc.post_id
LEFT JOIN categories c ON c.category_id = pc.category_id
If we run this, we'll get the same post for each of the post's categories. For example:
post_id | category_id |
---|---|
1 | 1 |
1 | 2 |
1 | 3 |
2 | 1 |
2 | 2 |
3 | 1 |
So if a post has four categories, then this query will return four records for each of it's categories. This isn't exactly what we want because it would require us to wrangle the data server-side to get the JSON structure we want.
What we want is one row for each post, such that the categories are returned as JSON.
post_id | category_id |
---|---|
1 | [1, 2, 3] |
2 | [1, 2] |
3 | [1] |
For this, we're going to make use of PostgreSQL's jsonb_agg
aggregate function.
We'll update our query to group by post_id
and aggregate all the categories
associated with that post_id
, and render them as JSON using jsonb_agg
.
SELECT p.post_id, p.body, jsonb_agg(c) as categories
FROM posts p
LEFT JOIN post_categories pc ON p.post_id = pc.post_id
LEFT JOIN categories c ON c.category_id = pc.category_id
GROUP BY p.post_id
When we run this query we get a new categories
field with the nested JSON. ๐
There is a slight issue though. When a post has no categories, jsonb_agg
will
return the singleton array: [null]
.
How to fix the singleton [null]
array with coalesce
For our API, we don't want to have to deal with this edge case. Instead,
we'd rather return an empty list []
for posts that don't have any categories.
Here's how we'll modify our query to handle this case:
- Filter out any element that is
null
- Coalesce the result to the empty array
[]
Here's how it'll look.
SELECT p.post_id,
p.body,
coalesce(jsonb_agg(c) FILTER (WHERE c.category_id IS NOT NULL), '[]') as categories
FROM posts p
LEFT JOIN post_categories pc ON p.post_id = pc.post_id
LEFT JOIN categories c ON c.category_id = pc.category_id
GROUP BY p.post_id
When we run this query, we'll get a single record per post,
with each post's category as a nested JSON array.
If a post doesn't have any categories, it will return []
.
Next, we'll adapt this SQL query to the knex.js
query builder
to perform this query in our application code.
Feel free to skip this section is you're not using knex.js
in your app.
Use Knex.js
to query relationships as structured JSON
For the most part we can use the standard query builder methods.
We just need to add a raw SQL statement to do the JSON aggregation piece
i.e. coalesce
, jsonb_agg
, etc.
import knex from 'knex';
async function getPosts() {
return knex
.from('posts')
.leftJoin('post_categories', 'post.post_id', 'post_categories.post_id')
.leftJoin(
'categories',
'categories.category_id',
'post_categories.category_id',
)
.groupBy('post.post_id')
.select([
'post.post_id',
'post.body',
knex.raw(
`coalesce(jsonb_agg(category) filter (where category.id is not null), '[]') as categories`,
),
]);
}
And that's all there is to it. You should now be able to:
- Understand the pitfalls of translating normalized data to structured JSON
- Use Postgres aggregate function
jsonb_agg
to query relationships as structured data - Use the
knex.js
library to perform queries withjsonb_agg