Marcin Borkowski: 2025-07-26 Finding entities with given uuids in the current project

Note: this post is a longer one, especially that not only do I show the code I’ve written, but I also discuss some design considerations that led me to it.

One of the Node.js projects I work on has some fake/seed data in JSON files. These data are inserted into the local database during development. (Your mileage may vary – you might use Knex for seeding, or you might have your seeds as csvs or even sql files with a bunch of inserts.)

Of course quite a few tables contain some foreign keys. Sometimes I open one of the JSON files of such tables in Emacs, for example to edit some data – but it is often the case that the related data (pointed to by the foreign key) should also be changed. In cases like this it is very useful to be able to find those related data quickly.

Some time ago I had the idea that it would be great if I could easily see all the data about some entity when the point is on its id. I assumed that I could somehow detect when the point is on a uuid (this one is actually easy using thing-at-point) and then use rg to find that id in my codebase (and this is where things get complicated). Of course, this won’t work very smoothly – it will also find all the places where the id is referenced, which is not what I want.

One way to partially mitigate that would be to ignore the files which contain the given uuid more than once. Here, I assume that I have one seed file per table, and that if a uuid appears more than once in a table, it must be a reference, not the key (since the latter is unique). Of course, it may still happen that a reference just happens to be unique (especially in a small dataset), so this won’t always work.

There are a few other ways I could use to solve this. One would be to devise some heuristics. For example, if a table is called user or users, the corresponding seed file is in JSON format, and some property (which may correspond to a column) is called id, or user_id, or userId, then it’s probably the primary key for this table. Needless to say, this approach seems pretty complicated and extremely fragile (though I think if could work often enough to be feasible). I’m definitely not a fan of it.

Another approach could be for my Elisp code to somehow analyze the actual database structure and check which columns are primary keys, which ones are foreign keys etc. I did not want to go down that route (although it looked promising) – I wanted my solution to be based solely on files in the repo. Connecting to the database would make it much more complicated – the database might be in a virtual machine or a container, not easily available from the host machine, or kept in various databases – in my case it’s PostgreSQL, but it could be MySQL or SQLite or whatever.

Some of you might already be thinking, why don’t I utilize an LLM to infer the database structure from the seed files? While I’m generaly rather skeptical about LLMs, I admit that this might be one of the legitimate and useful applications of them. Still, it should preferably be a local model (because I don’t want random strangers on the internet to know too much about my project, which might even be confidential), and it could be doable but tricky to make this solution fast enough to be useful.

After considering my options, I decided to go with the common route of using the human brain to solve a problem which is too difficult for a computer. This is perhaps an underappreciated method. We programmers love to automate things, but sometimes it may be a good idea to automate 80% of a thing and let the operator perform the missing 20%, preferably using an intuitive UI. This idea is better than you may think, and is actually quite common (though maybe a bit old-fashioned). A classical example would be the Emacs sentence commands. It is very useful to be able to operate on sentences when writing in a natural language, and when Emacs was written, natural language processing was too simple, too slow or both to be able to do this “correctly”. (Of course, you can’t just assume that a period ends the sentence since abbreviations do exist.) The solution Emacs went with is to expect the human writer to type two spaces after a sentence-ending period and one after an abbreviation-ending one. (A somewhat similar apprach is employed by LaTeX and uses the \@ macro.) Yet another example is my own mrr-replace-mode.

So, let’s see how this could work. I’d like to have a function, which would check if the point is on a uuid, and if yes, search for that uuid in the current project. It should make a list of all files which contain exactly one instance of that uuid (optionally including files containing more than one) and show the user the given uuid in the first of those files. It should also show some context around that uuid – for instance, if it is a user_id, the context will hopefully include the user’s name andor email. Since it may happen that this is not/ the place I’m interested in, it should allow the operator to change the displayed file easily, preferably with some “next/previous file”-style commands.

Note how I tried to make it as general as possible. For example, I did not assume that the files would be in JSON format. If I did, I could employ some trickery I mentioned above to decide which file to show first (for instance, if the searched uuid was present in a JSON file as a value of a user_id property, the code could check if there is a file called user.json with that id and show it by default). As I mentioned, tricks like that would be nice, but they would make the code a lot more complex and less general.

After I made that decision, I switched to some other projects, including my Eldoc experiments, and I realized that employing Eldoc for this would be nearly perfect. This, however, would preclude my idea about going to the next or previous file found – as far as I know, Eldoc cannot do that, it has to work fully automatically (which makes sense). This means going back to the drawing board (although I still keep other ideas in my notes).

Let’s make the solution less and more general at the same time. First of all, instead of coding everything in Elisp, I’m going to write a simple Node.js script accepting the uuid and outputting the entity found. The script will indeed connect to the database. As I said, this is less general than my first approach, since it will be now restricted to PostgreSQL databases (although it’s probably easy to adapt it to MySQL or SQLite etc.), and the database must be running and accepting connections for it to work. On the other hand, it is also more general at the same time, since the script won’t be tied to Emacs and the solution could be made to work for other editors, too. In fact, this is the very reason I want to write an external script and not an Elisp function to do that.

So, in today’s post I’ll discuss the script I wrote, and I’m going to show how to plug it into Eldoc in a future post.

The first thing we need to do is to query the database to get all the tables which have a uuid-typed primary key. I admit that I’m not good enough at PostgreSQL to write that by myself, but ChatGPT is:

select
    tc.table_schema as schema,
    tc.table_name as table,
    kcu.column_name as column
from
    information_schema.table_constraints as tc
join
    information_schema.key_column_usage as kcu
    on tc.constraint_name = kcu.constraint_name
    and tc.table_schema = kcu.table_schema
join
    information_schema.columns as c
    on c.table_schema = kcu.table_schema
    and c.table_name = kcu.table_name
    and c.column_name = kcu.column_name
where
    tc.constraint_type = 'PRIMARY KEY'
    and c.data_type = 'uuid'
order by
    tc.table_schema,
    tc.table_name;

Note that after ChatGPT wrote this for me, I looked at it carefully and it turns out that it’s actually pretty simple. Needless to say, a diligent code review written by LLMs is an obvious necessity!

The next thing is that I want to be able to look up my uuids as quickly as possible. I could make a script accepting a UUID on the command line, running the query above, and then trying to find the given UUID in all the tables it found, but that would take some time. Finding the set of tables to search already takes over 350ms, and I have about twenty tables with UUIDs as primary keys. Assuming querying each of them takes 15ms, I would have to wait about half a second on average to find a UUID. That could be fast enough, but why not make it faster? The list of tables to query won’t change too often, so getting it each time I need to find any UUID seems pretty wasteful.

Here is an idea. Instead of a script driven from the command line, I can write a REPL-type tool which – upon starting – reads the list of tables from the database into memory, and then accepts UUIDs on stdin and looks for them. To make it even faster, I made an assumption that when the user searches for several UUIDs in a row, it is reasonable to expect that they will all come from the same column, so the script remembers which one was a hit last time and starts searching from that. (If you read my blog regularly, you will certainly realize this idea is exactly the reason I wrote about interacting with external processes via their stdin and stdout before, and then showed how to make it work with processes which may be slow or less reliable.)

So, here is the script. I decided to keep the dependencies minimal, so it only uses dotenv, uuid-validate and two PostgreSQL-related modules. The whole dependency tree has less than 20 modules and a bit over 1MB, which is very little for a Node.js project (and yes, there is a bit of sarcasm here). It uses the readline library which gives me the usual readline goodies. Other interesting tidbits are:

terminal: false which disables ANSI terminal codes in the script’s output (as they were interfering with my Emacs interface);
the pg-format module which lets me build PostreSQL queries without fear of SQL injection (which couldn’t happen in this simple script anyway, but it’s a good habit and I wanted to keep it);
the format_result function which performs two tricks with the JSON it receives from the database. Firstly, it pushes all fields with the string name in their key to the front. The idea is that if a table contains a column whose name contains that very string, it probably means that it is a human-readable column (first or last name of a person, or a name of some entity), so it is a good idea to show it near the beginning. Secondly, it puts the column the UUID was found in as the next one. Then, it omits any remaining columns whose type is not a text-based one or a time-based one. This way, all names, descriptions, enums etc. will be shown as well as dates and timestamps, but not arrays, JSON(B) columns etc. The idea is that those are not needed to properly identify the row, and showing them would make the output much less legible.

#!/bin/env node

import {createInterface} from 'node:readline';
import {fileURLToPath} from 'node:url';
import {dirname} from 'node:path';

import dotenv from 'dotenv';
import {Client} from 'pg';
import uuid_validate from 'uuid-validate';
import format from 'pg-format';

const rl = createInterface({
        input: process.stdin,
        output: process.stdout,
    terminal: false,
        prompt: 'uuid:'
});

async function main() {
        const __dirname = dirname(fileURLToPath(import.meta.url));
        const env = {};
        dotenv.config({path: __dirname + '/.env', processEnv: env});
        const {HOST: host, DATABASE: database, PORT: port, USER: user, PASSWORD: password} = env;
        const client = new Client({
                host,
                database,
                port,
                user,
                password,
        });
        await client.connect();
        const columns = (await client.query(`select
                tc.table_schema as schema,
                tc.table_name as table,
                kcu.column_name as column
        from
                information_schema.table_constraints as tc
        join
                information_schema.key_column_usage as kcu
                on tc.constraint_name = kcu.constraint_name
                and tc.table_schema = kcu.table_schema
        join
                information_schema.columns as c
                on c.table_schema = kcu.table_schema
                and c.table_name = kcu.table_name
                and c.column_name = kcu.column_name
        where
                tc.constraint_type = 'PRIMARY KEY'
                and c.data_type = 'uuid'
        order by
                tc.table_schema,
                tc.table_name;
        `)).rows;
        let column_index = 0;

        async function process_uuid(uuid) {
                let starting_index = column_index;
                let result;
                do {
                        const query = format(
                                'select * from %I.%I where %I = %L',
                                columns[column_index].schema,
                                columns[column_index].table,
                                columns[column_index].column,
                                uuid,
                        );
                        result = await client.query(query);
                        if (result.rows.length > 0) {
                                break;
                        }
                        column_index = (column_index + 1) % columns.length;
                } while (starting_index !== column_index)
                if (result.rowCount > 0) {
                        return {
                                table: columns[column_index].table,
                                column: columns[column_index].column,
                                entity: result.rows[0],
                        };
                }
        }

        function format_result(row, column) {
                const entries = Object.entries(row);
                const name_entries = [];
                let id_entry;
                const rest_entries = [];
                for (const [key, value] of entries) {
                        if (/name/i.test(key)) {
                                name_entries.push([key, value]);
                        } else if (key === column) {
                                id_entry = [key, value];
                        } else if (['string', 'Date'].includes(typeof value)) {
                                rest_entries.push([key, value]);
                        }
                }
                return Object.fromEntries(
                        [...name_entries, id_entry, ...rest_entries],
                );
        }

        rl.prompt();

        rl.on('line', async(line) => {
                const uuid = line.trim();
                if (uuid_validate(uuid)) {
                        const row = await process_uuid(line);
                        if (row) {
                                console.log(JSON.stringify({
                                        table: row.table,
                                        entity: format_result(row.entity, row.column),
                                }));
                        } else {
                                console.error('not found!')
                        }
                } else {
                        console.error('Invalid uuid!');
                }
                rl.prompt();
        });

        rl.on('close', async() => {
                console.log('\nBye!');
                await client.end();
                process.exit(0);
        });
}

main();

That’s it for today!

CategoryEnglish, CategoryBlog, CategoryEmacs, CategoryPostgreSQL