Lately I have been doing a lot of learning about PostgreSQL and how it can be used to replace other databases, such as Oracle RDBMS, MySQL, etc. One of the things that I have been looking to most recently is how to make PostgreSQL highly available. In Oracle you would use RAC, however in PostgreSQL you have streaming replication, but that only leaves for a single master server and technically an unlimited number of slaves. In reality most web based applications these days are 90% read and about 10% write so having tons of slaves for read-only queries is awesome, but what if your master goes down?
That is where Patroni comes in. Patroni is a framework that handles the auto failover of the master instance of PostgreSQL between multiple servers. However Patroni alone won’t do this for you, you will need some other software as well. The other two pieces of software that I used was etcd and haproxy. etcd will be the quorum system and haproxy will be the “load balancer” so that your applications only have to have one hostname to point to.
As a small example setup, I used 3 machines running PostgreSQL 10, 1 machine running etcd and one machine running haproxy. What the rest of this post will be about is setting up the different machines and software on them.
How to setup Centos 7 + PostgreSQL 10 + Patroni
All Machines Software Installs:
The first thing to do is install CentOS 7, fully patch it and record the IP address of each machine if they are on DHCP. I used the minimal install so there wasn’t a lot of extra software on the machines.
PostgreSQL 10 Servers:
For all the PostgreSQL servers the following packages will need to be installed: gcc, python-devel, epel-release
yum install -y gcc python-devel epel-release
After you have those installed (in particular the epel-release one) you can install the following : python2-pip
yum install -y python2-pip
Next you need to add the PostgreSQL Yum Repo from: https://download.postgresql.org/pub/repos/yum/10/redhat/rhel-7-x86_64/pgdg-centos10-10-2.noarch.rpm
yum install -y https://download.postgresql.org/pub/repos/yum/10/redhat/rhel-7-x86_64/pgdg-centos10-10-2.noarch.rpm
Once you have that RPM installed you can install the following packages: postgresql10-server postgresql10
yum install -y postgresql10-server postgrseql10
etcd server:
The following packages needs to be installed on the etcd server: gcc,python-devel, epel-release
yum install -y gcc python-devel epel-release
After epel is installed, you can then install etcd:
yum install -y etcd
haproxy server:
The following packages are needed on the haproxy server: epel-release and then haproxy.
yum install -y epel-release
yum install -y haproxy
Software Configuration
Postgresql Servers
Now that the software has been installed, it is time to configure the components. (Everything here is ran as root)
We will start with the PostgreSQL Servers, mine are named pg01 (10.0.2.124), pg02 (10.0.2.125), and pg03(10.0.2.126) respectively. The following needs to be done on each of the 3 servers:
First some pip items:
pip install --upgrade setuptools pip install patroni pip install python-etcd pip install psycopg2-binary
Next we are going to create a systemd service for patroni. So edit the file /etc/systemd/system/patroni.service to contain the following:
[Unit] Description=Runners to orchestrate a high-availability PostgreSQL After=syslog.target network.target [Service] Type=simple User=postgres Group=postgres ExecStart=/bin/patroni /etc/patroni.yml KillMode=process TimeoutSec=30 Restart=no [Install] WantedBy=multi-user.targ
Next create a /etc/patroni.yml file. This file will control the startup/shutdown, etc of the postgres instance. Here is an example of mine from my pg01 server. In it, you will need to replace IP addresses for your own servers and usernames and passwords.
scope: postgres name: pg01 restapi: listen: 10.0.2.124:8008 connect_address: 10.0.2.124:8008 etcd: host: 10.0.2.128:2379 bootstrap: dcs: ttl: 30 loop_wait: 10 retry_timeout: 10 maximum_lag_on_failover: 1048576 postgresql: use_pg_rewind: true initdb: - encoding: UTF8 - data-checksums pg_hba: - host replication replicator 127.0.0.1/32 md5 - host replication replicator 10.0.2.124/0 md5 - host replication replicator 10.0.2.125/0 md5 - host replication replicator 10.0.2.126/0 md5 - host all all 0.0.0.0/0 md5 users: admin: password: admin options: - createrole - createdb postgresql: listen: 10.0.2.124:5432 bin_dir: /usr/pgsql-10/bin connect_address: 10.0.2.124:5432 data_dir: /data/patroni pgpass: /tmp/pgpass authentication: replication: username: replicator password: PASSWORD superuser: username: postgres password: PASSWORD parameters: unix_socket_directories: '.' tags: nofailover: false noloadbalance: false clonefrom: false nosync: false
etcd server
Now lets move on to the etcd server. The only thing on the etcd server that needs edited is the /etc/etcd/etcd.conf file. Here is what I changed in mine:
ETCD_LISTEN_PEER_URLS="http://10.0.2.128:2380" ETCD_LISTEN_CLIENT_URLS="http://localhost:2379,http://10.0.2.128:2379" ETCD_NAME="etcd0" ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.0.2.128:2380" ETCD_ADVERTISE_CLIENT_URLS="http://10.0.2.128:2379" ETCD_INITIAL_CLUSTER="etcd0=http://10.0.2.128:2380" ETCD_INITIAL_CLUSTER_TOKEN="cluster1" ETCD_INITIAL_CLUSTER_STATE="new"
Some of the above lines may be commented out, if so uncomment them and replace the values.
HAProxy Server
Now to config the HAProxy Server. Replace the /etc/haproxy/haproxy.cfg with the following:
global maxconn 100 log 127.0.0.1 local2 defaults log global mode tcp retries 2 timeout client 30m timeout connect 4s timeout server 30m timeout check 5s listen stats mode http bind *:7000 stats enable stats uri / listen postgres bind *:5000 option httpchk http-check expect status 200 default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions server postgresql_pg01_5432 10.0.2.124:5432 maxconn 100 check port 8008 server postgresql_pg02_5432 10.0.2.125:5432 maxconn 100 check port 8008 server postgresql_pg03_5432 10.0.2.126:5432 maxconn 100 check port 8008
N.B. The logging setup in the haproxy config requires that you set up rsyslog to allow logging to local files and to local2.*
Starting it all up
So assuming that I haven’t missed anything above, and there are no typo’s on my or your part, we can start starting things up. But first some notes. Something in SElinux breaks some of this and I haven’t had enough time to look in to it, so disable SElinux for now (setenforce 0). In addition I did not add any ports to the firewalld, as this was just a test setup, so I disabled the firewall on all 5 machines. You can leave it enabled and then using the configs above add the appropriate ports to be allowed between the appropriate machines.
So easy starts first:
Start the etcd server:
systemctl start etcd
Start the haproxy server:
systemctl start haproxy
Now for the PostgreSQL servers there are a few things to do before we start them. On each PostgreSQL host do the following:
mkdir -p /data/patroni chown -R postgres:postgres /data chmod -R 700 /data
Now we can start Patroni and see if everything works. When you start Patroni, it will start PostgreSQL in the background and create the first database and set the username and password for the postgres user to what you specify in the /etc/patroni.yml file.
systemctl start patroni
On the first host that you start Patroni on if you run systemctl status patroni you should see that it is running and active. In the /var/log/messages you should see something like :
Sep 2 19:53:08 pg01 patroni: 2018-09-02 19:53:08,628 INFO: Lock owner: pg01; I am pg01 Sep 2 19:53:08 pg01 patroni: 2018-09-02 19:53:08,636 INFO: no action. i am the leader with the lock
When you start the two slaves you should see something similar to this in the /var/log/messages:
Sep 2 19:54:17 pg02 patroni: 2018-09-02 19:54:17,828 INFO: Lock owner: pg01; I am pg02 Sep 2 19:54:17 pg02 patroni: 2018-09-02 19:54:17,829 INFO: does not have lock Sep 2 19:54:17 pg02 patroni: 2018-09-02 19:54:17,836 INFO: no action. i am a secondary and i am following a leader
Now if you switch to the web interface of HAProxy you should see something like this:
What this image shows is that the pg01 server is the active master server. If you were to shut down pg01, then one of the other two will become the new master and remain the master until you either promote another server or the new master goes down.
Now that everything is running, you would point your clients to the haproxy address on port 5000.
Closing Comments
While this describes how to setup an HA Environment for PostgreSQL, there are 2 single points of failure with this example setup. One is the etcd server, which should be clustered, the other is the haproxy which should be clustered as well. In addition I didn’t cover setting up read-only slaves which I may do at some point in the future.