Historically, when I wanted to store data in a database, I (or the project/team I was on) used an incrementing integer to uniquely identify each row (e.g. the SERIAL type in PostgreSQL).
Later, many of my teams/projects switched to random or pseudorandom string identifiers. These have many advantages over incrementing integers, especially when used as public identifiers (e.g. in URLs):
- String Ids can contain extra info, such as their type (e.g. whether it’s a User Id, Payment Id, etc). This helps with debugging and support.
- Random string Ids don’t leak data size or growth the way sequential integers do (e.g. if a newly created user gets a URL like /users/4321, anyone can infer there are roughly 4,300 users).
- Typos or copy/paste errors are far less likely to produce a valid but incorrect string Id than they are with a numeric Id (e.g. 1234 -> 123).
- Sharding or splitting the dataset across databases is easier if you don’t have to worry about numeric sequences and collisions (and you can even embed shard info into the Id if desired).
One easy way to generate unique, random identifiers is by using a UUID. But lately, I’ve been using ULID types instead. ULID stands for Universally Unique Lexicographically Sortable Identifier, which is like a time sortable UUID.
ULIDs look like 01GPC4NAN03RXV2EXS7308BHJ6, and we can include extra information by prepending. For example, a Payment Id could be
Benefits from the spec:
- 128-bit compatibility with UUID
- 1.21e+24 unique ULIDs per millisecond
- Lexicographically sortable!
- Canonically encoded as a 26 character string, as opposed to the 36 character UUID
- Uses Crockford’s base32 for better efficiency and readability (5 bits per character)
- Case insensitive
- No special characters (URL safe)
- Monotonic sort order (correctly detects and handles the same millisecond)
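To make the layout behind these properties concrete, here is a minimal Python sketch of how a ULID can be assembled: a 48-bit millisecond timestamp followed by 80 random bits, encoded with Crockford’s base32. The function name `new_ulid` is my own illustration, not any library’s API, and this sketch does not implement the spec’s monotonic handling for Ids created in the same millisecond.

```python
import os
import time

# Crockford's base32 alphabet (digits plus letters, without I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(timestamp_ms=None):
    """Return a 26-character ULID string (hypothetical helper for illustration)."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000  # current Unix time in ms
    if not 0 <= timestamp_ms < 2**48:
        raise ValueError("timestamp must fit in 48 bits")
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = (timestamp_ms << 80) | randomness  # 128 bits in total
    chars = []
    for _ in range(26):  # 26 characters x 5 bits covers the 128 bits
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


if __name__ == "__main__":
    print(new_ulid())               # an Id stamped with the current time
    print(new_ulid(1_000_000_000))  # an Id stamped with a chosen time
```

Because the timestamp occupies the most significant bits, two Ids generated in different milliseconds compare in creation order as plain strings.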
A few more benefits:
- Sortable Ids are handy for things like pagination, especially when you use cursors instead of offsets (e.g. with GraphQL Pagination and Edges).
- Sortable Ids can be more performant and less fragmented in data structures and indexes (e.g. than a random UUIDv4).
- ULID Ids can replace a created_at column if desired since the time is embedded.
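As a rough illustration of the embedded timestamp, the sketch below reads the creation time back out of a ULID. It assumes the standard layout, where the first 10 characters hold the 48-bit millisecond timestamp; `ulid_timestamp` is a hypothetical helper name, not a library function.

```python
from datetime import datetime, timedelta, timezone

# Crockford's base32 alphabet (digits plus letters, without I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ulid_timestamp(ulid):
    """Return the UTC creation time embedded in a ULID string."""
    if len(ulid) != 26:
        raise ValueError("a ULID is 26 characters long")
    ms = 0
    for ch in ulid[:10].upper():  # the first 10 characters are the timestamp
        ms = ms * 32 + CROCKFORD.index(ch)  # raises ValueError on a bad character
    return EPOCH + timedelta(milliseconds=ms)


if __name__ == "__main__":
    print(ulid_timestamp("01GPC4NAN03RXV2EXS7308BHJ6"))
```

If a type prefix has been prepended to the Id, it would need to be stripped before decoding.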
And there are implementations in many languages.
ULID Tools Website
One downside of ULIDs, however, is their lack of tooling. Periodically, I’d want a quick way to generate new ULIDs. Or I’d want to parse an existing ULID and see when it was generated (since they embed the timestamp).
The site currently does 3 things:
- Generates new ULIDs at the current time
- Generates new ULIDs at a user specified time
- Decodes existing ULIDs and displays the time
In fairness, everything comes with tradeoffs and ULIDs aren’t without their faults. For example:
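- Numeric Ids take up a lot less space in the database (4 or 8 bytes for an integer, versus 16 bytes for a ULID stored in binary form or 26 characters when stored as text).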
- ULIDs are a bit long, which makes URLs super long (e.g.
- There may be cases where it’s undesirable to expose when an Id was created.
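One way to soften the storage cost is to lean on the 128-bit compatibility with UUID: the same value can be stored in a 16-byte UUID or binary column and only rendered as a 26-character string at the edges. Here is a minimal sketch of that round trip using Python’s standard `uuid` module. The helper names `ulid_to_uuid` and `uuid_to_ulid` are my own, and the resulting UUID won’t carry meaningful version bits.

```python
import uuid

# Crockford's base32 alphabet (digits plus letters, without I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_to_uuid(ulid):
    """Reinterpret a 26-character ULID as a 128-bit uuid.UUID value."""
    if len(ulid) != 26:
        raise ValueError("a ULID is 26 characters long")
    value = 0
    for ch in ulid.upper():
        value = value * 32 + CROCKFORD.index(ch)  # raises ValueError on a bad character
    if value >= 2**128:
        raise ValueError("ULID overflows 128 bits")
    return uuid.UUID(int=value)


def uuid_to_ulid(u):
    """Render a 128-bit uuid.UUID value back as a 26-character ULID string."""
    value = u.int
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


if __name__ == "__main__":
    as_uuid = ulid_to_uuid("01GPC4NAN03RXV2EXS7308BHJ6")
    print(as_uuid, len(as_uuid.bytes))  # 16 bytes when stored in binary form
    print(uuid_to_ulid(as_uuid))
```

The tradeoff is that the database then shows the Id in UUID form, so the readable ULID string has to be reconstructed when debugging.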
There’s a draft spec for new UUID versions which are time sorted (inspired by ULID and others): https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format
Maybe these will be accepted and gain widespread adoption in the future.