Announcing Perfalytics: Never Run EXPLAIN ANALYZE Again

Today I’m announcing the beta for Perfalytics, a tool I’ve been working on that makes it easy to analyze Postgres query performance. As the lead of the Database team at Heap, where I optimize the performance of a Postgres cluster with more than 1PB of data, I’ve done my fair share of Postgres performance analysis, and I believe the standard way of debugging a slow Postgres query is tedious and cumbersome.

Debugging a slow Postgres query usually goes something like this: given the slow query, you run EXPLAIN ANALYZE on it. This reruns the query with performance instrumentation, giving you the query plan along with how much time was spent processing each part of that plan.
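
For example (the table and column here are made up, and the plan output is illustrative):

    EXPLAIN ANALYZE
    SELECT count(*) FROM orders WHERE customer_id = 42;

                                 QUERY PLAN
    ---------------------------------------------------------------------
     Aggregate  (cost=4142.47..4142.48 rows=1 width=8) (actual time=182.331..182.332 rows=1 loops=1)
       ->  Seq Scan on orders  (cost=0.00..4142.00 rows=188 width=0) (actual time=0.021..182.301 rows=120 loops=1)
             Filter: (customer_id = 42)
             Rows Removed by Filter: 199880
     Planning Time: 0.095 ms
     Execution Time: 182.360 ms

In this (made-up) output, nearly all of the time went into a sequential scan that filtered out almost every row, which is exactly the kind of thing you want the plan to tell you.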

If you’re lucky, the slow query is reproducible and you will see why it was slow. In my experience, this is frequently not the case. All too often, by the time you run the query with EXPLAIN ANALYZE, something has changed and the query is now fast. This can happen for a number of reasons: caching, changes in the makeup of the table, decreased server load, and many other factors.

The idea behind Perfalytics is to turn this way of doing things on its head. Instead of running EXPLAIN ANALYZE after the fact to try to reproduce the slow query, why not gather the exact same information at query time? That way, if the query is slow, you can see exactly why that particular run of the query was slow.

That’s right. Perfalytics records the query plan and various query statistics for every single query at query time. This means you have all the information you need to debug why a particular query was slow. There’s no need to try to reproduce the problem after the fact.
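
If you want a taste of this approach, Postgres’s built-in auto_explain module works in a similar spirit: it logs the plan of any query that runs longer than a threshold, at query time. A minimal sketch of the relevant postgresql.conf settings:

    shared_preload_libraries = 'auto_explain'   # requires a server restart
    auto_explain.log_min_duration = '500ms'     # log plans for queries slower than this
    auto_explain.log_analyze = on               # include actual row counts and timings

The difference is that auto_explain only writes plans to the server log, while the point of Perfalytics is to collect them somewhere you can actually query and analyze.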

In addition to making it easier to debug slow queries, collecting this information has a number of other advantages. Because you are collecting it for every single query, you can look at overall trends. For example, you can programmatically classify why each slow query was slow, which lets you see how often different kinds of slow queries come up and tells you exactly where you should focus your efforts.
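
As a hypothetical sketch of what that kind of analysis could look like, suppose the collected plans landed in a queryable table (the schema here is entirely made up, not how Perfalytics actually stores things):

    -- Hypothetical schema; assumes plan is a jsonb column holding
    -- EXPLAIN (FORMAT JSON) output and duration_ms the query's runtime.
    SELECT date_trunc('day', started_at)        AS day,
           plan -> 0 -> 'Plan' ->> 'Node Type'  AS root_node,
           count(*)                             AS slow_queries
    FROM query_log
    WHERE duration_ms > 1000
    GROUP BY 1, 2
    ORDER BY day, slow_queries DESC;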

Once you try using something like Perfalytics, you’ll never want to go back to the old way of doing things.

If you want to try out Perfalytics, sign up for a beta invite.

If you have any questions, feel free to contact me at michaelmalis2@gmail.com.

3 thoughts on “Announcing Perfalytics: Never Run EXPLAIN ANALYZE Again”

  1. Interesting!

    Does it use Common Lisp inside?

    Is it possible to install on my server?

    How much of a performance hit does this instrumentation cause?

    How does it differ from the pg_stat_statements extension?

    How does it behave under high load?

    Where does it store all the gathered information?

    1. Hey Alexander,

      Excellent questions!

      > Does it use Common Lisp inside?

      At the moment, Perfalytics is written primarily in Go. I’m using it mainly because:

      a) I wanted to use something like AWS Lambda to limit the need for servers.
      b) I don’t know any of the other languages supported by AWS Lambda that well. (There’s a sketch of what a minimal Go Lambda handler looks like below.)
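
      For the curious, this is roughly how small a Go Lambda handler can be. It’s purely a sketch: the event shape and everything in it is made up for illustration, not Perfalytics’s actual code.

          package main

          import (
              "context"
              "log"

              "github.com/aws/aws-lambda-go/lambda"
          )

          // QueryEvent is a made-up event shape, for illustration only.
          type QueryEvent struct {
              Query      string  `json:"query"`
              Plan       string  `json:"plan"`
              DurationMs float64 `json:"duration_ms"`
          }

          // A real handler would persist the incoming plan; this sketch just logs it.
          func handler(ctx context.Context, e QueryEvent) error {
              log.Printf("received query that took %.1fms", e.DurationMs)
              return nil
          }

          func main() {
              lambda.Start(handler)
          }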

      > Is it possible to install on my server?

      There will be two parts to Perfalytics: the tracker, a Postgres extension that you install on your database, and the Perfalytics backend, which the tracker sends data to. Initially you won’t be able to run the backend yourself. If there’s enough interest, I may make it possible to run it yourself.
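
      Installing the tracker should look like installing any other Postgres extension. To be clear, the module name and steps below are placeholders, not the final instructions:

          # in postgresql.conf (placeholder module name; requires a restart)
          shared_preload_libraries = 'perfalytics'

          -- then, in psql:
          CREATE EXTENSION perfalytics;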

      > How much of a performance hit does this instrumentation cause?

      It varies. When I did benchmarks myself, it resulted in a ~15% increase in p95 latency, but this is highly dependent on the kind of workload you are running and the kind of hardware you are using.
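
      If you want to measure the overhead on your own workload, pgbench is the standard tool. This is a generic benchmarking recipe, not anything Perfalytics-specific:

          pgbench -i -s 50 bench_db           # initialize a scale-50 test database
          pgbench -c 16 -T 300 -l bench_db    # 16 clients for 5 minutes, logging per-transaction latencies

      Run it with and without the tracker enabled and compare the latency distributions from the -l transaction logs.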

      > How does it differ from the pg_stat_statements extension?

      pg_stat_statements collects a lot less data, which limits the analysis you can do with it. Some examples (there’s a concrete query sketch after this list):

      a) pg_stat_statements doesn’t collect query plans, so it can tell you which queries are slow, but not why they are slow.
      b) pg_stat_statements only gives you aggregates. A query may be fast on average but way slower 5% of the time, and it’s impossible to tell from pg_stat_statements whether that is happening.
      c) Because pg_stat_statements only gives you aggregates, you can’t do any time-based analysis. For example, you may want to see whether a query is faster this week than it was last week. You can’t do that with pg_stat_statements, but it is something you can do with Perfalytics.
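
      To make the contrast concrete, this is roughly all pg_stat_statements can give you: one row of aggregate statistics per normalized query (column names vary by Postgres version; these are from Postgres 13+, where older versions call them mean_time/stddev_time):

          SELECT query, calls, mean_exec_time, stddev_exec_time
          FROM pg_stat_statements
          ORDER BY mean_exec_time DESC
          LIMIT 10;

      There are no individual executions, no timestamps, and no plans in there, which is what rules out the analyses above.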

      > How does it behave under high load?

      I’m going to run a number of experiments on the tracker to see what its failure modes are.

      > Where does it store all the gathered information?

      See the answer above to “Is it possible to install on my server?”. Initially it will only support storing the data on the Perfalytics backend. If there’s enough interest, I might work on a version that you can self-host.
