Two Exciting PostgreSQL Features To Improve NULL Handling
I recently learned about two PostgreSQL features that make handling NULL
values more sane.
NULL Values In Unique Columns
A well known but annoying weirdness with NULL
values is that NULL != NULL
, so a UNIQUE
column can still have multiple NULL
values.
(Examples use numeric id columns for simplicity, but I generally prefer more complex ids such as ULIDs.)
CREATE TABLE test (
id serial PRIMARY KEY,
value TEXT UNIQUE
);
INSERT INTO test (value) VALUES ('a');
-- This fails on the duplicate:
-- ERROR: duplicate key value violates unique constraint "test_value_key"
-- DETAIL: Key (value)=(a) already exists.
INSERT INTO test (value) VALUES ('a');
-- But this does not:
INSERT INTO test (value) VALUES (null);
INSERT INTO test (value) VALUES (null);
SELECT * from test;
id | value
----+-------
1 | a
3 |
4 |
(3 rows)
However, PostgreSQL 15 released a new feature which can change this behavior: UNIQUE NULLS NOT DISTINCT
:
CREATE TABLE test (
id serial PRIMARY KEY,
value TEXT UNIQUE NULLS NOT DISTINCT
);
-- Now this fails on the second insert:
-- ERROR: duplicate key value violates unique constraint "test_value_key"
-- DETAIL: Key (value)=(null) already exists.
INSERT INTO test (value) VALUES (null);
INSERT INTO test (value) VALUES (null);
Read more about it in the release notes: PostgreSQL Release 15 Notes
Ensuring A Single Column Has a Value
A common use case that I’ve run into is a table which has multiple foreign keys, but only one is expected to be populated. For example, say we have a notifications
table to represent notifications we send out (e.g. emails, text messages, etc). These notifications might be triggered and related to a specific entity in our system, such as an order, user, company, etc. We want to add foreign keys to represent what this notification is for, but we only want to populate one of them.
For example:
CREATE TABLE notifications (
id serial,
company_id INT REFERENCES companies (id),
order_id INT REFERENCES orders (id),
user_id INT REFERENCES users (id)
);
INSERT INTO notifications (company_id) VALUES (100);
INSERT INTO notifications (order_id) VALUES (200);
SELECT * from notifications;
id | company_id | order_id | user_id
----+------------+----------+---------
1 | 100 | |
2 | | 200 |
(2 rows)
Often, I’ll see a table like this add a constraint to ensure that at least one of the columns is populated:
ALTER TABLE notifications
ADD CONSTRAINT notifications_reference
CHECK (company_id IS NOT NULL OR order_id IS NOT NULL OR user_id IS NOT NULL);
However, this does not stop you from accidentally populating more than one column. This can happen easily if you use an Object Relational Mapper (ORM) which generates the SQL for you, and you’ve accidentally set more than one attribute of your object:
INSERT INTO notifications (company_id, order_id, user_id)
VALUES (NULL, 300, 400);
SELECT * from notifications;
id | company_id | order_id | user_id
----+------------+----------+---------
1 | 100 | |
2 | | 200 |
3 | | 300 | 400
(3 rows)
There’s a pair of PostgreSQL functions which can let us write a better constraint checks, though, called num_nulls/num_nonnulls
. This lets us check that there is only one non NULL value among a set of columns. For example:
ALTER TABLE notifications
ADD CONSTRAINT notifications_reference
CHECK (num_nonnulls(company_id, order_id, user_id) = 1);
-- Now we get an error on insert if there is more than one value:
-- ERROR: new row for relation "notifications" violates check constraint "notifications_reference"
-- DETAIL: Failing row contains (3, null, 300, 400).
INSERT INTO notifications (company_id, order_id, user_id)
VALUES (NULL, 300, 400);
-- Or if there are no values:
-- ERROR: new row for relation "notifications" violates check constraint "notifications_reference"
-- DETAIL: Failing row contains (4, null, null, null).
INSERT INTO notifications (company_id, order_id, user_id)
VALUES (NULL, NULL, NULL);
Read more about them in the docs: Comparison Functions