Witam na mojej prywatnej stronie internetowej!

[If this is all Polish to you, click here: English]

Jesteś tu pierwszy raz lub nie wiesz, gdzie czegoś szukać? Przeczytaj pomoc!
Chcesz wiedzieć, dlaczego możesz mi ufać, ale nie do końca? Przeczytaj, Kto za tym stoi!
Masz zamiar skomentować coś, co napisałem ja bądź ktoś z moich znajomych? Przeczytaj o komentarzach!
Nie wiesz, kim jestem? Przeczytaj informację o mnie!
Ciekawi Cię, co ostatnio napisałem na swoim blogu? Ostatni wpis znajduje się poniżej!

Uwaga: z oczywistych powodów nie mogę zagwarantować swojej nieomylności, choć staram się o zgodność tego, co piszę, z Prawdą. Jest również oczywiste, że nie gwarantuję takiej zgodności w przypadku komentarzy. Umieszczenie linku do strony spoza niniejszego serwisu nie musi oznaczać, że podzielam poglądy autora tej strony, a jedynie, że uważam ją za wartościową z takich czy innych powodów.

Marcin ‘mbork’ Borkowski

2026-03-14 Selecting a sample of a PostgreSQL table

Today I have a nice SQL tip about sampling tables in SQL. (I’m writing about PostgreSQL as usual but this applies to other databases, too, since this feature is in the SQL standard.) Imagine you have a large table and you want to get a general feeling about what its contents look like. Saying select * from my_table limit 10 may be a nice idea, but it doesn’t always suffice. For example, if you have a table where you never delete or update rows, only insert them, that query will often just show the earliest 10 rows, and you might want to see how the contents of the table changed over time.

You can say, for example,

select * from my_table tablesample bernoulli (5)

and see about 5 percent of rows from table my_table. (It’s approximate because each row is included in the result with a probability of 5%, so depending on the RNG there may be more or fewer rows in the result.) The mysterious bernoulli keyword tells Postgres that the decision whether to include a row or not must be made for each row individually; if you use system instead, these decisions are made for pages (by default, a page contains 8kB of data, so if your row contains, say, half a kilobyte of data on average, then a page may have about 16 rows – or probably 15, since there is some overhead for the page header), which is “less random” but much faster.

One more thing worth knowing is the repeatable keyword, which allows to provide a random seed. If you say

select * from my_table tablesample bernoulli (5) repeatable (1337)

you will get the exact same 5% of rows of the underlying table every time (provided that you don’t make any changes to its contents, of course).

That’s it for today, see you next time!

CategoryEnglish, CategoryBlog, CategoryPostgreSQL

Comments on this page

More...

Strona domowa

2026-03-14 Selecting a sample of a PostgreSQL table