Replication in PostgreSQL - basics

2015-05-19T15:21:00 BST │   Comment

People familiar with MySQL are usually a bit confused with how replication is done in PostgreSQL. This is due to the fact that there are many ways to do this depending on what you want to achieve.

Probably the most common form of replication used with MySQL is statement-based replication. I even done it myself once or twice. It's easy to set up, doesn't eat up a lot of space, can be audited with text tools. That probably all the advantages. Back in the days this was the only way to do it in MySQL.

In contrast, in PostreSQL we can choose if we want to replicate the whole database cluster1 or only some tables, do we want to use master-slave or multi-master relationship and if we want synchronous or asynchronous replication.

In this post I'll try to explain the basics about replication in Postgres. Obviously though you shouldn't treat it as an exhaustive overview.

So depending on the goal there are few options. Let me break them down for you.

Suppose you are a retailer with a portfolio of few brands each of which has it's own database. But the products are actually the same and the brands are sharing a central warehouse. So you have to have a way of keeping the products and stock tables in both databases in sync, otherwise you risk selling an item that you don't have.
There are few ways to approach this.

  1. You can have a "main" instance of products and stock writable only by a dedicated product-stock-management application, replicate them in a master-slave mode to the other database and grant read-only access to your webapp users.
  2. You can set it up like above but instead of replication use FDW2.
  3. You can use multi-master replication software with some type of conflict handling.

In the first case you would need a trigger-based replication system that would allow you to replicate only given tables without the need to synchronise both databases in full. One of the best tools there are to do this is Slony-I. I've used it in a few projects for some big clients and it was perfect. Actually, I once had a setup exactly like this running.
Slony-I is robust, proven and a breeze to administer once you get familiar with it's interfaces (I really like Slony's perl tools).

Second case uses no replication, I just mentioned it for the sake of completeness. Used a similar setup once, though postgres_fdw was quite slow with large tables.

Third case would require you to use also a trigger-based system, but with the ability to do multi-master replication. I tried Bucardo once but decided to use SymmetricDS instead. Actually I've implemented a quite complex "central warehouse" solution with it and from what I hear it's still working very well.

Another scenario might be when you have a reporting team working on the data that is generated by your webapp, but don't want them to mess up your well-balanced live database server with their heavy reporting queries.
The obvious choice would be to set up a replicated standby database and point your reporting guys at it. You don't need third-party tools to do that - a native PostgreSQL technology like streaming replication would be perfectly fine for the job. If you want a simple way of managing your setup, I would recommend repmgr. Used it myself to do exactly what I described.

While trigger-based systems are by their nature asynchronous, you can choose if you want your streaming replication to be synchronous or asynchronous.

The difference is that with asynchronous replication your database will report a transaction as commited when the master reports it complete; with synchronous replication the master will wait for slave(s) to report successful write and only then will the transaction be reported as commited to the user.

So there you have it - you can either replicate single tables or the whole cluster. The former relays on third-party trigger-based solutions and is asynchronous whilst the latter can be done with native streaming replication (or WAL shipping) and can be both asynchronous or synchronous.

  1. This term is used in Postgres for one particular installation that can consist of many databases each of which can contain may schemas. ↩

  2. Foreign Data Wrapper - more here ↩

If you like this article please consider sharing it with your friends.