Some time ago I was writing some PostreSQL queries involving columns of type jsonb. For some reason I could not understand my query didn’t work as expected. I decided to dig a little bit deeper, and here is what I learned.
As we know, SQL (and PostgreSQL as well, of course) has the null
value, which is kind of tricky (for instance, it is not equal to itself). That makes sense, because SQL’s null
means “of unknown value”.
As we also know, JSON also has the null
value (or type). In JavaScript, it is equal (and even identical) to itself. This also makes sense, because JSON’s null
means “empty/no value”.
When we use PostgreSQL to query JSON values which may contain null
values, however, hilarity ensues. First of all, if you want to actually type a JSON null
in Postgres, you can say e.g. 'null'::jsonb
(let’s agree on using jsonb
and avoiding json
, and calling it just JSON). Of course, this value is different from an SQL null
. If you try to say null::jsonb
instead, you get an SQL null
, since (apparently) casting null
to anything results in null
(I could not find evidence for this in the PostgreSQL manual, but a few experiments seem to confirm this behavior, and it makes perfect sense).
Now there are at least two ways one can achieve some kind of null
when extracting parts of JSON objects: either an explicit null
is present in JSON, or the requested property is missing altogether. In the former case, we get a JSON null
, and in the latter an SQL null
, which is slightly weird given the semantic above, but I can accept it in light of the fact that SQL does not have an “empty” type/value, and there are no natural candidates for the job of representing a value not present in the JSON object (well, technically, PostgreSQL could throw an error in such a case, but it doesn’t – go figure).
Then, there are two ways one can extract something from JSON in PostgreSQL – as JSON (e.g. ->
) or as text (e.g. ->>
).
Combining the above we get the following results.
select '{"a": 1, "b": null}'::jsonb->'b';
yields a JSON null
(so something that does not satisfy the SQL “predicate” is null
). However,
select '{"a": 1, "b": null}'::jsonb->'c';
gives a proper SQL null
.
Using the text extraction operators, however, we get an SQL null both times:
select '{"a": 1, "b": null}'::jsonb->>'b' is null; select '{"a": 1, "b": null}'::jsonb->>'c' is null;
The latter is not unexpected at all – we get an SQL null
, which (after casting to text
) remains null
. The former, however, came as a surprise to me. I thought it should be the SQL string 'null'
, not the SQL null
, as casting an explicit JSON null
to text gives that string.
The programmers behind PostgreSQL didn’t agree, though. I could not understand why, and I asked a question on StackOverflow about it. After several hours I got a very detailed and interesting response. To cut a long story short, here is the rationale behind this seemingly strange behavior: if PostgreSQL returned 'null'
(i.e., a string) in response to select '{"a": 1, "b": null}'::jsonb->>'b';
, then the result would be indistinguishable from the result of select '{"a": 1,
"b": "null"}'::jsonb->>'b';
. One of the design principles of PostgreSQL, however, is that casting anything to text should give something parsable back to the original value (whenever possible).
Well, nice to learn something new.
And while we are at PostgreSQL null
and related topics, here is a nice tip I learned some time ago: if you put the line \pset null '∅'
in your .psqlrc
file, psql
will render null
values as Unicode “empty set” symbol. It is extremely useful if you need to distinguish between null
and an empty string in psql
’s output (unless you have Unicode empty set symbols in your data, of course).
And finally another tip, this time related to the jsonb
type, is the jsonb_strip_nulls
function. Given a JSONB object, it removes all properties whose values are explicit null
s (and it does it recursively, so deeper levels of hierarchy are affected, too). Note that it keeps null
s that are elements of arrays, though.