Blog

For the English part of the blog, see Content AND Presentation.

2023-05-29 A PostgreSQL query to set the serial counter

Some time ago I encountered an unpleasant bug I made in a certain PostgreSQL query. Assume that you have a table with an integer primary key id, autoincremented on every insert.

create table some_table(
  id serial primary key,
  content text
);

Assume also that in the course of work, you need to “reset” this table once in a while – that is, truncate it and then populate with some data used for development and testing. Of course, after populating it, you want the “next value” of id to be set so that the next insert will use the first available integer value.

The way I usually solved that was using the following query:

select setval('some_table_id_seq', max(id)) from some_table;

It works pretty well except in one case – when the initial contents of the table are empty. It turns out that when you give null as the second argument to setval, it essentially becomes a no-op. The solution is easy:

select setval('some_table_id_seq', coalesce(max(id) + 1, 1), false) from some_table;

This way the table is ready for entering new data whether it is empty or not.

That’s it for today. Happy hacking!

CategoryEnglish, CategoryBlog, CategoryPostgreSQL

Comments on this page

2023-05-20 How to filter rows with duplicate ids in a csv

Some time ago I encountered an interesting problem. I had a CSV file (imported from a database), and I wanted to see only rows for which a certain column was duplicated. For example, like this (assume that the table is sorted by user_id, so the rows with the same ids are grouped):

Datapoint id,User id,Some more data
3,1,
1,2,
4,2,
2,3,
5,4,
6,4,
7,4,

So, imagine we have some datapoints (say, measurements), and each datapoint is associated with a user. I’d like to see only the rows showing datapoints for users who have more than one – so essentially this:

Datapoint id,User id,Some more data
1,2,
4,2,
5,4,
6,4,
7,4,

How to do that?

One way is to use the good old uniq. It turns out that you may tell it to skip N characters at the beginning of each line and stop comparing after M characters: -s N -w M. Coupled with -D (“print all duplicate lines”), we get exactly what we need (modulo the lack of the header line, which is not a big issue, and could probably be corrected with some clever invocation of pee and head). One problem with this solution is that it requires the widths of the columns to be constant. In my case it was not a problem, since they were UUIDs, but in the case of integers there would be problems.

Another solution would be to use a tool like AWK. This would probably work very well unless the CSV file contained (escaped) cells with newlines. Since AWK is line-based, and does not know proper CSV syntax, such a solution would break then (unless AWK has a way of dealing with things like that – I suppose it does not, but after all, it’s AWK…).

Yet another idea is to use a tool like Miller. I use Miller from time to time, and I’m almost sure it could handle this particular issue, but I would have to search through its manual to discover the way to do it. And I wanted a quick solution, so teaching myself more advanced AWK or Miller syntax was out of the question.

Well, I could (of course!) write some Elisp. But even though I like command-line tools, and I love Emacs, I wanted something really, really quick. So in the end, I went with the following trick with (gasp!) LibreOffice.

I introduced another column (moving all the other ones to the right), and inserted into the A2 cell a formula like this:

=OR(C2=C1,C2=C3)

and copied it to the whole A column. This way, I had true in A if, and only if, the corresponding cell in the C column had the same value as the next or previous one. Now the only thing that remained was to filter out all the false ones.

So, even though I have some issues with LibreOffice, I have to admit that there is a certain category of tasks where it’s really hard to beat a spreadsheet!

CategoryEnglish, CategoryBlog

Comments on this page

2023-05-13 Dieting with Org mode

Some time ago I started to weigh myself every day. Some people argue that this is a bad idea and you should weigh yourself once per week to avoid worrying about random fluctuations. I disagree – the way not to worry about random fluctuations is to look at moving averages instead, Hacker's Diet-style. Note: I do not necessarily endorse Hacker’s Diet as a whole, but the moving average approach is certainly a good one. For the sake of simplicity, I am going to use simple (unweighted) moving averages over one week.

I want to keep my weight data in an Org mode table (obviously). The first step is to create a capture template:

("W"
 "Weight"
 table-line
 (file+olp "/path/file.org" "Weight" "Data")
 "| %U | %? | "
 :unnarrowed t)

The next step is adding a column with a simple moving average.

| Date+time              | Weight |   MA |
|------------------------+--------+------|
| [2023-04-30 Sun 19:23] |   66.1 |    0 |
| [2023-05-01 Mon 05:13] |   66.2 |    0 |
| [2023-05-04 Thu 04:56] |   66.0 |    0 |
| [2023-05-05 Fri 05:08] |   65.8 |    0 |
| [2023-05-06 Sat 19:23] |   66.0 |    0 |
| [2023-05-07 Sun 05:13] |   65.9 |    0 |
| [2023-05-10 Wed 04:56] |   65.6 | 65.9 |
| [2023-05-12 Fri 05:15] |   65.7 | 65.9 |
| [2023-05-13 Sat 05:07] |   65.8 | 65.8 |
#+TBLFM: $3=vmean(@-6$-1..@0$-1);%.1f::@2$3=0::@3$3=0::@4$3=0::@5$3=0::@6$3=0::@7$3=0

(The data are in kilograms – this is Europe! – and completely fictitious, by the way.) Notice how I apparently forgot to weigh myself on last Thursday. I still do use a moving average over 7 last entries. While it is probably technically incorrect, I don’t care about it – for the past 2 months, there were only two days I didn’t weigh myself, for example, so I can live with data which is skewed just a tiny little bit.

By the way, the zeros in the first six data rows are there because Org table formulas cannot refer to places outside the table (and calculating the mean of 6 numbers and the word Weight doesn’t make sense, either). Of course, I could add cell formulas to put the averages of less than 7 rows in those cells, but why would I do that? As I said, I want to keep thing simple, and since I’ve been weighing myself (almost) every day for more than half a year now, I really don’t care about those first 6 datapoints.

Now I decided it would be great if I could add org-table-recalculate to some hook so that the table could be recalculated every time I capture my weight – but it turns out that Org mode already does that! Now that’s a pleasant surprise!

The last thing I’d like to have is to tell Beeminder that I weighed myself whenever I use Org capture to put a row in my table. Since Org 9.6, I can add several hooks to the capture template (which is great!) – previously I would have to put some function into org-capture-.*-finalize-hook and detect which capture template was used. So, let’s add this to my template:

:prepare-finalize (lambda ()
  (beeminder-submit-datapoint
    "weighing"
    1
    (beeminder-default-comment
      (time-to-seconds (beeminder-current-time)))))

And this is it for today. I’ve been using this code for several weeks now and it works very well for me. One more repetitive task automated!

CategoryEnglish, CategoryBlog, CategoryEmacs

Comments on this page

More...