2024-05-04 Pretty printing SQL

I work with PostgreSQL on a daily basis, and I often share queries with my teammates. I like to show them nicely formatted queries instead of the mess I sometimes write;-). A bit surprisingly, I had troubles finding a nice SQL pretty-printer/formatter.

For a few years I used sqlformat, a tool written in Python. I was quite happy with it, until I noticed that it didn’t work with queries referring to jsonb columns properly. For example, this query

select jsonb_column->'property',

becomes this mess.

select jsonb_column->'property',

After I had to deal with a query which selected well over twenty or thirty properties from jsonb columns, I decided to stop and look for another solution.

This time, I found SQL Formatter. This one is written in JavaScript, and can be used both from JS code and command line. Since I’m primarily working with Node.js, this looks very good – there is a potential for logging formatted queries instead of one-line monstrosities, for example. My first use, however, is this quick-and-dirty Emacs command.

(defun sql-format-region ()
  "Format SQL in region using `sql-formatter'."
  (shell-command-on-region (region-beginning)
                           "sql-formatter -c ~/.sql-formatter.json"

enabling me to quickly format an SQL query in the active region. The ~/.sql-formatter.json file looks like this:

  "language": "postgresql",
  "keywordCase": "lower",
  "dataTypeCase": "lower",
  "functionCase": "lower"

As you can see, I’m not fond of screaming at my database;-).

(My simple command probably should use the r interactive code, but I never had the need to call it non-interactively, so there’s no benefit for me to refactoring it into something nicer.) The above query looks after formatting looks like this:

  jsonb_column -> 'property',
  jsonb_column -> 'another' -> 'subproperty',
  jsonb_column -> 'yet' -> 'another'

Maybe it’s not my favorite style, but it’s legible and consistent. Here is a more complicated query (which I shamelessly asked an LLM to write for me):

WITH cte AS (
  SELECT id, json_column->>'key1' AS value1, json_column->>'key2' AS value2
  FROM table1
SELECT cte.id, table2.column1, SUM(table2.column2) AS total
FROM cte
JOIN table2 ON cte.value1 = table2.value1
JOIN table3 ON cte.value2 = table3.value2
WHERE (cte.value1 = 'some_value' AND table2.column3 = 'another_value')
   OR (cte.value2 = 'another_value' AND table3.column4 = 'some_value')
GROUP BY cte.id, table2.column1
HAVING SUM(table2.column2) > 100
ORDER BY cte.id ASC;

and after reformatting, it looks like this:

  cte as (
      json_column ->> 'key1' as value1,
      json_column ->> 'key2' as value2
  sum(table2.column2) as total
  join table2 on cte.value1 = table2.value1
  join table3 on cte.value2 = table3.value2
    cte.value1 = 'some_value'
    and table2.column3 = 'another_value'
  or (
    cte.value2 = 'another_value'
    and table3.column4 = 'some_value'
group by
  sum(table2.column2) > 100
order by
  cte.id asc;

After I found SQL Formatter, I stumbled upon Steve Purcell’s sqlformat.el package, which is way more advanced – for example, it supports several SQL formatters. It’s entirely possible that one day I’ll ditch my command in favor of sqlformat.el, but for now it is sufficient for me. (One reason I might want to use it is that it can use even more formatters than the two I’ve mentioned, and some of them look pretty advanced and configurable.)

CategoryEnglish, CategoryBlog, CategoryPostgreSQL, CategoryEmacs