Postgres TOAST

TOAST aka The Oversized Attribute Storage Technique aka the best thing since sliced bread, is a technique Postgres uses to store large values. Under the normal Postgres data layout, every row is stored on 8kb pages, with no row spanning multiple pages. To support rows that can be larger than 8kb, Postgres makes use of TOAST.

If Postgres is about to store a row and the row is over 2kb, TOAST will kick in and Postgres will first attempt to shrink the row by compressing the large variable width fields of the row. If the row is still over 2kb after compressing the fields of it, Postgres will then repeatedly store the large fields outside of the row until the row is under 2kb1. To do so, Postgres splits up the compressed value into individual chunks of ~2kb2. Each of those chunks are then stored in a “TOAST table”. Every regular Postgres table has a corresponding TOAST table in which the TOASTed values are stored. A value stored in this manner is commonly referred to as “TOASTed”.

Every TOAST table has three columns. It has a column called chunk_id which is used to distinguish the specific TOASTed values each chunk is for. All chunks of the same toasted value have the same chunk_id. The chunk_id is what is stored in the original row and is what allows Postgres to determine what chunks are for a given value. The second field of a TOAST table is chunk_seq which determines the ordering of the chunks with the same chunk_id. The first chunk of a TOASTed value has chunk_seq=0, the second has chunk_seq=1 and so on. The last column is chunk_data which contains the actual data for the TOASTed value.

At query time, when a TOASTed value is needed, Postgres will use an index on the TOAST table on (chunk_id, chunk_seq) to lookup all of the chunks with a given chunk_id sorted by chunk_seq. From there, it can stitch all of the chunks back together and decompress that result to obtain the original value of the row.

Under certain circumstances, TOAST can actually make queries faster. If a TOASTed field isn’t needed to answer a query, Postgres doesn’t have to read the chunks for the TOASTed value, and can skip reading the value into memory. In some cases, this will dramatically reduce the amount of disk I/O Postgres needs to perform to answer a query.

You can actually access the TOAST table for a given table directly and inspect the values stored in it. As a demonstration, let’s create a table messages that has a single column message:

CREATE TABLE messages (message text);

We can then insert a few random strings to be TOASTed:

INSERT INTO messages
        string_agg(chr(floor(random() * 26)::int + 65), '')
        FROM generate_series(1,10000)) 
FROM generate_series(1,10);

Now that we have a table with values we know are toasted, we first need to lookup the name of the TOAST table for the messages table. We can do this by looking up the name of the corresponding TOAST table with the following query:

> SELECT reltoastrelid::regclass 
> FROM pg_class 
> WHERE relname = 'messages';
(1 row)

The query pulls information from the pg_class Postgres table, which is a table where Postgres stores metadata about tables.

Now that we have the name of the TOAST table, we can read from it just like any other table:

> SELECT * FROM pg_toast.pg_toast_59611;
 chunk_id | chunk_seq | chunk_data
    59617 |         0 | \x4c4457...
    59617 |         1 | \x424d4b...

Note that the chunk_data is the binary of the compressed version of the field, so it isn’t exactly human readable.

Overall, TOAST is a clever technique that reuses the ordinary Postgres storage technique to store larger values. It’s completely transparent to the user, yet if you really want to, you can dig into your TOAST tables and see exactly how Postgres is storing your data.

  1. If a row still does not fit, that means it must have lots of columns. It’s difficult to reach this case without an unreasonable design. I believe Postgres won’t let you create a table with columns where this is possible.
  2. The size of a chunk is such that 4 chunks can fit on a single page.

Leave a Reply

Your email address will not be published. Required fields are marked *