Although Postgres may seem magical, it really isn’t. When data is stored in Postgres, Postgres in turn stores that data in regular files in the filesystem. In this blog post, we’ll take a look at how Postgres uses files to represent data stored in the database.
First of all, each table in Postgres is represented by one or more underlying files. Each 1GB chunk of the table is stored in a separate file. It is actually pretty easy to find the underlying files for a table. To do so, you first need to find the Postgres data directory, which is the directory in which Postgres keeps all of your data. You can find where the data directory is by running SHOW DATA_DIRECTORY;. When I run it locally, I see the following:
> SHOW DATA_DIRECTORY;
        data_directory
------------------------------
 /var/lib/postgresql/9.5/main
(1 row)
Now that you know where the Postgres data directory is, you need to find where the files for the specific table you are looking for are located. To do so, you can call the pg_relation_filepath function with the name of the table. The function returns the path of the table's first file, relative to the data directory. Here is what I see when I run the command on a table I have locally:
> SELECT pg_relation_filepath('people');
 pg_relation_filepath
----------------------
 base/16387/51330
(1 row)
Together with the location of the data directory, this gives us the location of the files for the people table. All of the files are stored in /var/lib/postgresql/9.5/main/base/16387/. The first GB of the table is stored in a file called 51330, the second in a file called 51330.1, the third in 51330.2, and so on. You can actually read and write data to the files yourself, but I strongly suggest not doing so, as you will most likely wind up corrupting your database.
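To make the naming scheme concrete, here is a rough sketch that builds the list of file paths for a table from the two values above. The function name segment_paths and its parameters are made up for illustration; it only does string arithmetic and never touches the actual files.

```python
SEGMENT_SIZE = 1024 ** 3  # each file holds up to 1GB of the table


def segment_paths(data_dir, rel_path, table_bytes):
    """Return the paths of the files that would hold a table of this size."""
    # Ceiling division, with at least one file even for a tiny table.
    n_segments = max(1, -(-table_bytes // SEGMENT_SIZE))
    base = f"{data_dir}/{rel_path}"
    # The first file has no suffix; later ones are named .1, .2, and so on.
    return [base if i == 0 else f"{base}.{i}" for i in range(n_segments)]


# Values taken from the psql output above; the 3GB size is just an example.
for path in segment_paths("/var/lib/postgresql/9.5/main",
                          "base/16387/51330",
                          3 * SEGMENT_SIZE):
    print(path)
```

For a 3GB table this prints the paths ending in 51330, 51330.1, and 51330.2.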
Now that we’ve found the actual files, let’s walk through how each file is laid out. Each file is broken up into 8KB chunks, called “pages” [1]. For example, a 1.5GB table will be stored across two files and 196,608 pages [2] and look like the following:
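The arithmetic behind those numbers can be checked with a few lines of Python. This is only a sketch of the calculation; the names layout and locate_page are my own and not anything Postgres exposes.

```python
PAGE_SIZE = 8 * 1024                            # 8KB pages
SEGMENT_SIZE = 1024 ** 3                        # 1GB files
PAGES_PER_SEGMENT = SEGMENT_SIZE // PAGE_SIZE   # 131,072 pages per file


def layout(table_bytes):
    """Return (number of files, total pages) for a table of the given size."""
    pages = -(-table_bytes // PAGE_SIZE)            # ceiling division
    files = max(1, -(-pages // PAGES_PER_SEGMENT))
    return files, pages


def locate_page(page_no):
    """Map a table-wide page number to (file index, byte offset in that file)."""
    segment, page_in_segment = divmod(page_no, PAGES_PER_SEGMENT)
    return segment, page_in_segment * PAGE_SIZE


table_bytes = 3 * SEGMENT_SIZE // 2   # 1.5GB
print(layout(table_bytes))            # (2, 196608)
print(locate_page(131072))            # (1, 0): the first page of the second file
```

The first file is completely full with 131,072 pages, and the remaining 65,536 pages land in the second file.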
Each row is stored on a single page (with the exception of when a row is too large, in which case a technique called TOAST is used). Pages are the unit in which Postgres reads and writes data to the filesystem. Whenever Postgres needs to read a row from disk to answer a query, it reads the entire page the row is on. When Postgres writes to a row on a page, it writes a whole new copy of the entire page to disk at one time. Postgres operates in this way for numerous reasons which are outside of the scope of this blog post.
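To illustrate what "page at a time" means, here is a minimal sketch that reads and rewrites whole 8KB pages. It runs against an in-memory stand-in rather than a real table file, and the helpers read_page and write_page are hypothetical names, not how Postgres itself is implemented.

```python
import io

PAGE_SIZE = 8 * 1024


def read_page(f, page_in_file):
    """Read the entire page at the given index, never just part of it."""
    f.seek(page_in_file * PAGE_SIZE)
    page = f.read(PAGE_SIZE)
    if len(page) != PAGE_SIZE:
        raise ValueError("page is missing or truncated")
    return page


def write_page(f, page_in_file, page):
    """Write back a full copy of the page, even if only one row changed."""
    if len(page) != PAGE_SIZE:
        raise ValueError("a page must be exactly PAGE_SIZE bytes")
    f.seek(page_in_file * PAGE_SIZE)
    f.write(page)


# An in-memory stand-in for a table file: three pages filled with 0, 1, 2.
fake_file = io.BytesIO(b"".join(bytes([i]) * PAGE_SIZE for i in range(3)))

page = bytearray(read_page(fake_file, 1))
page[100] = 255                       # change a single byte "in a row"
write_page(fake_file, 1, bytes(page))
print(len(page), read_page(fake_file, 1)[100])  # 8192 255
```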
Pages themselves have the following format:
The header is 24 bytes and contains various metadata about the page, including a checksum and information necessary for WAL. The row offsets field is an array of pointers into the rows field, with the Nth pointer pointing to the Nth row. The offsets can be used to quickly look up an arbitrary row of a page. If we emphasize the individual rows on the page, the page winds up looking like:
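As a rough sketch, the 24-byte header can be unpacked with Python's struct module. The field names and order below follow my reading of the PageHeaderData struct in the Postgres source, and the code assumes a little-endian machine and 4-byte row offsets; treat those details as assumptions rather than a specification. The page it parses is synthetic, not one read from a real table.

```python
import struct

PAGE_SIZE = 8 * 1024
HEADER_SIZE = 24
OFFSET_SIZE = 4   # assumed size of one row offset entry

# Assumed layout: pd_lsn (2 x uint32), then pd_checksum, pd_flags, pd_lower,
# pd_upper, pd_special, pd_pagesize_version (6 x uint16), pd_prune_xid (uint32).
HEADER_FORMAT = "<IIHHHHHHI"
assert struct.calcsize(HEADER_FORMAT) == HEADER_SIZE


def parse_header(page):
    (lsn_hi, lsn_lo, checksum, flags, lower, upper,
     special, pagesize_version, prune_xid) = struct.unpack_from(HEADER_FORMAT, page)
    return {
        "lsn": (lsn_hi << 32) | lsn_lo,   # WAL position of the last change
        "checksum": checksum,
        "lower": lower,                   # where the row offsets end
        "upper": upper,                   # where the row data begins
        "offset_count": (lower - HEADER_SIZE) // OFFSET_SIZE,
        "free_bytes": upper - lower,
    }


# Build a synthetic page that claims to hold three row offsets.
page = bytearray(PAGE_SIZE)
struct.pack_into(HEADER_FORMAT, page, 0,
                 0, 1234, 0, 0, HEADER_SIZE + 3 * OFFSET_SIZE, 8000,
                 PAGE_SIZE, 0, 0)
info = parse_header(page)
print(info["offset_count"], info["free_bytes"])  # 3 7964
```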
The first thing you likely noticed is that the first rows are stored at the back of the page. That is so the offsets and the actual row data can both grow towards the middle. If a new row is inserted, we can allocate a new offset from the front of the free space, and allocate the space for the row from the back of the free space.
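The two-ended allocation described above is easy to model. The toy class below is only a sketch: ToyPage is an invented name, each offset is assumed to take 4 bytes, and details a real page has to deal with (such as alignment and deleted rows) are ignored.

```python
PAGE_SIZE = 8 * 1024
HEADER_SIZE = 24
OFFSET_SIZE = 4   # assumed size of one row offset entry


class ToyPage:
    def __init__(self):
        self.data = bytearray(PAGE_SIZE)
        self.offsets = []            # Nth entry is (start, length) of the Nth row
        self.lower = HEADER_SIZE     # front of the free space
        self.upper = PAGE_SIZE       # back of the free space

    def free_space(self):
        return self.upper - self.lower

    def insert(self, row):
        if len(row) + OFFSET_SIZE > self.free_space():
            raise ValueError("row does not fit on this page")
        # Row data is carved off the back of the free space...
        self.upper -= len(row)
        self.data[self.upper:self.upper + len(row)] = row
        # ...while the new offset is taken from the front.
        self.offsets.append((self.upper, len(row)))
        self.lower += OFFSET_SIZE
        return len(self.offsets) - 1

    def get(self, n):
        start, length = self.offsets[n]
        return bytes(self.data[start:start + length])


page = ToyPage()
page.insert(b"alice")
page.insert(b"bob")
print(page.offsets)       # [(8187, 5), (8184, 3)]
print(page.get(1))        # b'bob'
print(page.free_space())  # 8152
```

Notice that the first row sits at the very end of the page, and each later row is placed just in front of the previous one.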
Each row, in turn, has a format that looks like the following:
The header of each row is 23 bytes and includes the transaction ids for MVCC as well as other metadata about the row. Based on the table schema, each field of the row is either a fixed width type or a variable width type. If the field is fixed width, Postgres already knows how long the field is and just stores the field data directly in the row.
If the field is variable width, there are two possibilities for how the field is stored. Under normal circumstances, it is stored directly in the row, with a header detailing how large the field is. In certain special cases, or when it’s impossible to store the field directly in the row, the field is stored outside of the row using a technique called TOAST, which we will take a look at in my next post.
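The difference between fixed width and variable width fields can be sketched with a toy encoder. This is a deliberately simplified model: it skips the 23-byte row header, always uses a 4-byte length prefix for text, and ignores padding and NULLs, so it is not the real on-disk format. The names encode_row and decode_row are hypothetical.

```python
import struct


def encode_row(schema, values):
    """schema is a list of "int4" or "text"; values are the matching fields."""
    out = bytearray()
    for kind, value in zip(schema, values):
        if kind == "int4":
            # Fixed width: the schema already says this is 4 bytes.
            out += struct.pack("<i", value)
        elif kind == "text":
            # Variable width: a small header records how long the data is.
            raw = value.encode("utf-8")
            out += struct.pack("<I", len(raw)) + raw
        else:
            raise ValueError(f"unsupported type: {kind}")
    return bytes(out)


def decode_row(schema, data):
    pos, values = 0, []
    for kind in schema:
        if kind == "int4":
            (value,) = struct.unpack_from("<i", data, pos)
            pos += 4
        elif kind == "text":
            (length,) = struct.unpack_from("<I", data, pos)
            pos += 4
            value = data[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"unsupported type: {kind}")
        values.append(value)
    return values


schema = ["int4", "text"]
row = encode_row(schema, [42, "Ada"])
print(len(row))                 # 11: 4 bytes of int, 4 of length, 3 of text
print(decode_row(schema, row))  # [42, 'Ada']
```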
To recap, each row is stored on an 8KB page along with several other rows. Each page in turn is part of a 1GB file. While processing a query, when Postgres needs to fetch a row from disk, it reads the entire page the row is stored on. This is, at a high level, how Postgres represents the data stored in it on disk.