CentOS 7 + PostgreSQL 10 + Patroni

Lately I have been learning a lot about PostgreSQL and how it can be used to replace other databases, such as Oracle RDBMS, MySQL, etc. One of the things I have been looking into most recently is how to make PostgreSQL highly available. In Oracle you would use RAC; PostgreSQL instead offers streaming replication, which gives you a single master server and a technically unlimited number of slaves. In reality, most web-based applications these days are roughly 90% read and 10% write, so having lots of slaves for read-only queries is awesome, but what happens if your master goes down?

That is where Patroni comes in. Patroni is a framework that handles automatic failover of the master PostgreSQL instance between multiple servers. Patroni alone won’t do this for you, though; you will need some other software as well. The other two pieces I used were etcd and haproxy: etcd serves as the quorum system, and haproxy serves as the “load balancer” so that your applications only have one hostname to point to.

As a small example setup, I used three machines running PostgreSQL 10, one machine running etcd, and one machine running haproxy. The rest of this post walks through setting up those machines and the software on them.


How to set up CentOS 7 + PostgreSQL 10 + Patroni

All Machines Software Installs:

The first thing to do is install CentOS 7, fully patch it, and record the IP address of each machine if they are on DHCP. I used the minimal install so there wasn’t a lot of extra software on the machines.
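
On a minimal install with DHCP, patching and grabbing the address looks something like this:

yum update -y
ip -4 addr show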


PostgreSQL 10 Servers:

For all the PostgreSQL servers the following packages will need to be installed: gcc, python-devel, epel-release

yum install -y gcc python-devel epel-release

After you have those installed (in particular epel-release) you can install the following: python2-pip

yum install -y python2-pip

Next you need to add the PostgreSQL Yum Repo from: https://download.postgresql.org/pub/repos/yum/10/redhat/rhel-7-x86_64/pgdg-centos10-10-2.noarch.rpm

yum install -y https://download.postgresql.org/pub/repos/yum/10/redhat/rhel-7-x86_64/pgdg-centos10-10-2.noarch.rpm

Once you have that RPM installed you can install the following packages: postgresql10-server postgresql10

yum install -y postgresql10-server postgresql10
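
A quick sanity check that the server binaries landed where Patroni will later look for them (the bin_dir we set in /etc/patroni.yml below):

/usr/pgsql-10/bin/postgres --version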


etcd server:

The following packages need to be installed on the etcd server: gcc, python-devel, epel-release

yum install -y gcc python-devel epel-release

After epel is installed, you can then install etcd:

yum install -y etcd 


haproxy server:

The following packages are needed on the haproxy server: epel-release and then haproxy.

yum install -y epel-release
yum install -y haproxy


Software Configuration

PostgreSQL Servers

Now that the software has been installed, it is time to configure the components. (Everything here is run as root.)

We will start with the PostgreSQL servers; mine are named pg01 (10.0.2.124), pg02 (10.0.2.125), and pg03 (10.0.2.126) respectively. The following needs to be done on each of the three servers:

First some pip items:

pip install --upgrade setuptools
pip install patroni
pip install python-etcd
pip install psycopg2-binary
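
pip installs the patroni entry point under /bin (the path the systemd unit below points at), so it is worth a quick check that it landed:

which patroni
patroni --version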

Next we are going to create a systemd service for Patroni. Edit the file /etc/systemd/system/patroni.service to contain the following:


[Unit]
Description=Runners to orchestrate a high-availability PostgreSQL
After=syslog.target network.target

[Service]
Type=simple

User=postgres
Group=postgres

ExecStart=/bin/patroni /etc/patroni.yml

KillMode=process

TimeoutSec=30

Restart=no

[Install]
WantedBy=multi-user.target
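
Since this is a brand-new unit file, tell systemd to re-read its units, and optionally enable Patroni so it starts at boot:

systemctl daemon-reload
systemctl enable patroni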

Next create a /etc/patroni.yml file. This file controls the startup, shutdown, and general management of the PostgreSQL instance. Here is an example from my pg01 server; in it, you will need to substitute your own IP addresses, usernames, and passwords.


scope: postgres
name: pg01

restapi:
    listen: 10.0.2.124:8008
    connect_address: 10.0.2.124:8008

etcd:
    host: 10.0.2.128:2379

bootstrap:
    dcs:
        ttl: 30
        loop_wait: 10
        retry_timeout: 10
        maximum_lag_on_failover: 1048576
        postgresql:
            use_pg_rewind: true

    initdb:
    - encoding: UTF8
    - data-checksums

    pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 10.0.2.124/32 md5
    - host replication replicator 10.0.2.125/32 md5
    - host replication replicator 10.0.2.126/32 md5
    - host all all 0.0.0.0/0 md5

    users:
        admin:
            password: admin
            options:
                - createrole
                - createdb

postgresql:
    listen: 10.0.2.124:5432
    bin_dir: /usr/pgsql-10/bin
    connect_address: 10.0.2.124:5432
    data_dir: /data/patroni
    pgpass: /tmp/pgpass
    authentication:
        replication:
            username: replicator
            password: PASSWORD
        superuser:
            username: postgres
            password: PASSWORD
    parameters:
        unix_socket_directories: '.'

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false
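
The same file goes on pg02 and pg03; only the node name and the addresses change. For example, on pg02 (10.0.2.125) the differing lines would be:

name: pg02

restapi:
    listen: 10.0.2.125:8008
    connect_address: 10.0.2.125:8008

postgresql:
    listen: 10.0.2.125:5432
    connect_address: 10.0.2.125:5432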

etcd server

Now let’s move on to the etcd server. The only thing that needs to be edited on the etcd server is the /etc/etcd/etcd.conf file. Here is what I changed in mine:

ETCD_LISTEN_PEER_URLS="http://10.0.2.128:2380"
ETCD_LISTEN_CLIENT_URLS="http://localhost:2379,http://10.0.2.128:2379"
ETCD_NAME="etcd0"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.0.2.128:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://10.0.2.128:2379"
ETCD_INITIAL_CLUSTER="etcd0=http://10.0.2.128:2380"
ETCD_INITIAL_CLUSTER_TOKEN="cluster1"
ETCD_INITIAL_CLUSTER_STATE="new"

Some of the above lines may be commented out; if so, uncomment them and replace the values.

HAProxy Server

Now to configure the HAProxy server. Replace /etc/haproxy/haproxy.cfg with the following:


global
        maxconn 100
        log     127.0.0.1 local2

defaults
        log global
        mode tcp
        retries 2
        timeout client 30m
        timeout connect 4s
        timeout server 30m
        timeout check 5s

listen stats
        mode http
        bind *:7000
        stats enable
        stats uri /

listen postgres
        bind *:5000
        option httpchk
        http-check expect status 200
        default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
        server postgresql_pg01_5432 10.0.2.124:5432 maxconn 100 check port 8008
        server postgresql_pg02_5432 10.0.2.125:5432 maxconn 100 check port 8008
        server postgresql_pg03_5432 10.0.2.126:5432 maxconn 100 check port 8008


N.B. The logging setup in the haproxy config requires rsyslog to be set up to accept local log messages and to write the local2.* facility to a file.
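
A minimal sketch of that rsyslog piece, assuming a drop-in file such as /etc/rsyslog.d/haproxy.conf (the file name and log path here are my choices, not requirements):

# Accept syslog messages over UDP on localhost, where haproxy sends them
$ModLoad imudp
$UDPServerAddress 127.0.0.1
$UDPServerRun 514

# Write everything from the local2 facility to its own file
local2.*    /var/log/haproxy.log

Restart rsyslog (systemctl restart rsyslog) after creating the file.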


Starting it all up

So assuming that I haven’t missed anything above, and there are no typos on my part or yours, we can start things up. But first, some notes. Something in SELinux breaks parts of this and I haven’t had enough time to look into it, so disable SELinux for now (setenforce 0). In addition, I did not add any ports to firewalld, as this was just a test setup, so I disabled the firewall on all five machines. You can leave it enabled and, using the configs above, allow the appropriate ports between the appropriate machines.
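
If you would rather keep firewalld enabled, the ports in play above are 5432 and 8008 on the PostgreSQL hosts, 2379 and 2380 on the etcd host, and 5000 and 7000 on the haproxy host. Opening them would look roughly like this (run the relevant lines on each machine):

# PostgreSQL hosts: postgres itself plus the Patroni REST API
firewall-cmd --permanent --add-port=5432/tcp --add-port=8008/tcp
# etcd host: client and peer ports
firewall-cmd --permanent --add-port=2379/tcp --add-port=2380/tcp
# haproxy host: the postgres front end and the stats page
firewall-cmd --permanent --add-port=5000/tcp --add-port=7000/tcp
firewall-cmd --reload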

So easy starts first:

Start the etcd server:

systemctl start etcd
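
Before moving on you can confirm etcd is answering; the etcdctl client ships with the etcd package (this assumes the v2 API, which is what the CentOS 7 package speaks by default):

etcdctl --endpoints http://10.0.2.128:2379 cluster-health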

Start the haproxy server:

systemctl start haproxy

Now for the PostgreSQL servers there are a few things to do before we start them. On each PostgreSQL host do the following:

mkdir -p /data/patroni
chown -R postgres:postgres /data
chmod -R 700 /data

Now we can start Patroni and see if everything works. When you start Patroni, it starts PostgreSQL in the background, creates the first database, and sets the postgres user’s password to what you specified in the /etc/patroni.yml file.

systemctl start patroni

On the first host that you start Patroni on, running systemctl status patroni should show it as active and running. In /var/log/messages you should see something like:

Sep 2 19:53:08 pg01 patroni: 2018-09-02 19:53:08,628 INFO: Lock owner: pg01; I am pg01
Sep 2 19:53:08 pg01 patroni: 2018-09-02 19:53:08,636 INFO: no action. i am the leader with the lock

When you start the two slaves you should see something similar to this in /var/log/messages:

Sep 2 19:54:17 pg02 patroni: 2018-09-02 19:54:17,828 INFO: Lock owner: pg01; I am pg02
Sep 2 19:54:17 pg02 patroni: 2018-09-02 19:54:17,829 INFO: does not have lock
Sep 2 19:54:17 pg02 patroni: 2018-09-02 19:54:17,836 INFO: no action. i am a secondary and i am following a leader
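
You can also ask Patroni itself for the cluster state from any of the three nodes; the patronictl tool was installed alongside patroni by pip:

patronictl -c /etc/patroni.yml list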

Now if you switch to the haproxy web interface (the stats page on port 7000) you should see something like this:

[Image: haproxy stats page showing pg01 as the only backend up]

What this image shows is that pg01 is the current master. The health check haproxy runs is against the Patroni REST API on port 8008, which returns a 200 only on the leader, so only the master is marked up. If you were to shut down pg01, one of the other two would become the new master and remain master until you either promote another server or the new master goes down.


Now that everything is running, you would point your clients to the haproxy address on port 5000.
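
For example, connecting with psql would look something like this (<haproxy-host> being a placeholder for your haproxy machine’s address, which I never listed above):

psql -h <haproxy-host> -p 5000 -U postgres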

Closing Comments

While this describes how to set up an HA environment for PostgreSQL, there are two single points of failure in this example setup: the etcd server, which should itself be clustered, and the haproxy server, which should be made redundant as well. In addition, I didn’t cover setting up read-only slaves, which I may do at some point in the future.

Splunk 7.0: the good and the bad

So I will preface this post by saying I love Splunk; it is the best log aggregation application out there.

So on with the post, and it must be a good one, right? Anyway, Splunk released version 7.0 of their Splunk Enterprise product last week during their .Conf 2017 conference, which I was an FTR at. There were a few amazing new features in it, such as the new metrics index type, which is blazingly fast. So, like all “fanboys” of anything, I decided to update my home server on Thursday night after the conference was over. This is where the fun began.

When I first started using Splunk years ago, it supported a myriad of operating systems for the servers: if you wanted Solaris, FreeBSD, AIX, HP-UX, MacOSX, Windows, or Linux, you were golden. Over the years, however, that list has been pared back to just Linux and Windows. (MacOSX is still supported, but only for the free and trial editions; basically for development and home use, not for enterprise use.)

So now that Solaris is no longer supported, I needed to switch my home system from OpenIndiana (aka OpenSolaris) over to Linux. I spun up a new CentOS 7 VM on my home server and copied all my Splunk data from the Solaris machine to the Linux one. I then removed the bin and lib directories (I use the tar installs, and those are the only places machine-specific binaries exist). With that done, I untarred the Linux Splunk 7.0 over top of my existing directory and started it up. So far everything was good, until I tried to do a search: a search over something like the last 15 minutes worked, but anything beyond that died because one of the hot buckets was corrupted. I am not sure if it happened in transit or what, so off to the fsck command to try to fix them. An hour or so later it still couldn’t fix some of them; it was getting late, so I just went to bed.
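
For reference, the repair attempt was along these lines; this is from memory of the 7.x syntax, so check the fsck help output on your version before trusting the flags:

splunk stop
splunk fsck repair --all-buckets-all-indexes
splunk start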

The next day when I returned home, I tried to log in to my Splunk instance to see how it was doing, and to my surprise I couldn’t even log in to it. It turned out the Linux host had crashed. I was dumbfounded, as I hadn’t seen an actual kernel panic like that in a while. So I restarted the machine, started Splunk back up, and everything was working again.

A few days passed, I went to check on it again, and once again it was dead. So now I was really curious. I installed the crash utilities on the host and started going through the vmcore files. Yup, each time it crashed it was splunkd that caused it. Unfortunately I don’t know much more than that about what is actually triggering it; it appears to happen at random times.
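
For anyone wanting to follow along, getting to that point looked roughly like this; the kernel debuginfo package has to match the running kernel, and debuginfo-install comes from yum-utils:

yum install -y crash yum-utils
debuginfo-install -y kernel
crash /usr/lib/debug/lib/modules/3.10.0-693.2.2.el7.x86_64/vmlinux \
      /var/crash/127.0.0.1-2017-10-01-12:26:50/vmcore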

The output of crash shows this:

KERNEL: /usr/lib/debug/lib/modules/3.10.0-693.2.2.el7.x86_64/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2017-10-01-12:26:50/vmcore [PARTIAL DUMP]
CPUS: 8
DATE: Sun Oct 1 12:26:43 2017
UPTIME: 1 days, 17:12:55
LOAD AVERAGE: 0.00, 0.01, 0.05
TASKS: 330
NODENAME: splunk
RELEASE: 3.10.0-693.2.2.el7.x86_64
VERSION: #1 SMP Tue Sep 12 22:26:13 UTC 2017
MACHINE: x86_64 (3399 Mhz)
MEMORY: 3 GB
PANIC: "double fault: 0000 [#1] SMP "
PID: 1420
COMMAND: "splunkd"
TASK: ffff8800bae91fa0 [THREAD_INFO: ffff880000120000]
CPU: 3
STATE: TASK_RUNNING (PANIC)

First few lines of the “bt” output from crash:
PID: 1420 TASK: ffff8800bae91fa0 CPU: 3 COMMAND: "splunkd"
#0 [ffff8800bfac4d88] machine_kexec at ffffffff8105c4cb
#1 [ffff8800bfac4de8] __crash_kexec at ffffffff81104a32
#2 [ffff8800bfac4eb8] crash_kexec at ffffffff81104b20
#3 [ffff8800bfac4ed0] oops_end at ffffffff816ad2b8
#4 [ffff8800bfac4ef8] die at ffffffff8102e97b
#5 [ffff8800bfac4f28] do_double_fault at ffffffff8102b6e2
#6 [ffff8800bfac4f50] double_fault at ffffffff816b6908
[exception RIP: page_fault+13]
RIP: ffffffff816ac52d RSP: ffff880000122fc8 RFLAGS: 00010092
RAX: 0000000000000ff8 RBX: 0000000000000000 RCX: ffffffff816ac2ac
RDX: 00001fffffffffff RSI: ffffffff81a73118 RDI: 0000000000000000
RBP: ffff880000123098 R8: ffffffff81911167 R9: ffffea00002e7b80
R10: ffffea00002e7b80 R11: 0000000000000000 R12: ffffffff81a73118
R13: ffffffff81a73118 R14: ffff880000120000 R15: ffff88008665a580
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
--- <exception stack> ---
#7 [ffff880000122fc8] page_fault at ffffffff816ac52d
#8 [ffff880000123048] spurious_fault at ffffffff816afd8e
#9 [ffff8800001230a0] __do_page_fault at ffffffff816b01ae
#10 [ffff880000123100] do_page_fault at ffffffff816b0325
#11 [ffff880000123130] page_fault at ffffffff816ac548
[exception RIP: spurious_fault+48]
RIP: ffffffff816afd8e RSP: ffff8800001231e8 RFLAGS: 00010002
RAX: 0000000000000ff8 RBX: 0000000000000000 RCX: ffffffff816ac2ac
RDX: 00001fffffffffff RSI: ffffffff81a73118 RDI: 0000000000000000
RBP: ffff880000123208 R8: ffffffff81911167 R9: ffffea00002e7b80
R10: ffffea00002e7b80 R11: 0000000000000000 R12: ffffffff81a73118
R13: ffffffff81a73118 R14: ffff880000120000 R15: ffff88008665a580
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
#12 [ffff880000123210] __do_page_fault at ffffffff816b01ae
#13 [ffff880000123270] do_page_fault at ffffffff816b0325
#14 [ffff8800001232a0] page_fault at ffffffff816ac548
[exception RIP: spurious_fault+48]
RIP: ffffffff816afd8e RSP: ffff880000123358 RFLAGS: 00010002
RAX: 0000000000000ff8 RBX: 0000000000000000 RCX: ffffffff816ac2ac
RDX: 00001fffffffffff RSI: ffffffff81a73118 RDI: 0000000000000000
RBP: ffff880000123378 R8: ffffffff81911167 R9: ffffea00002e7b80
R10: ffffea00002e7b80 R11: 0000000000000000 R12: ffffffff81a73118
R13: ffffffff81a73118 R14: ffff880000120000 R15: ffff88008665a580
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000

Output from the “vm” command:
PID: 1420 TASK: ffff8800bae91fa0 CPU: 3 COMMAND: "splunkd"
MM PGD RSS TOTAL_VM
ffff88008665a580 ffff8800b87f4000 324076k 864244k
VMA START END FLAGS FILE
ffff8800a5efe5e8 563d22303000 563d248b7000 8000875 /splunk/splunk/bin/splunkd
ffff8800a5efe438 563d248b7000 563d24964000 8100873 /splunk/splunk/bin/splunkd
ffff8800a5efe288 563d24964000 563d249db000 8100073
ffff880097e90438 7f4f8d800000 7f4f8ea00000 8200073
ffff8800a1f6cd80 7f4f90400000 7f4f91c00000 8200073
ffff8800a1f6ca20 7f4f923f7000 7f4f923f8000 8100070
ffff8800a1f6c948 7f4f923f8000 7f4f925f8000 8100073
ffff8800a1f6c6c0 7f4f925f8000 7f4f925f9000 8100070
ffff8800a1f6c798 7f4f925f9000 7f4f927f9000 8100073
ffff8800a1f6cca8 7f4f927f9000 7f4f927fa000 8100070
ffff8800a1f6c870 7f4f927fa000 7f4f929fa000 8100073
ffff88009ce66948 7f4f935ff000 7f4f93600000 8100070
ffff88009ce66a20 7f4f93600000 7f4f93a00000 8100073
ffff88009466aca8 7f4f93a00000 7f4f94800000 8200073
ffff88009ca98e58 7f4f949cd000 7f4f949e3000 8000075 /usr/lib64/libresolv-2.17.so
ffff88009ca98d80 7f4f949e3000 7f4f94be3000 8000070 /usr/lib64/libresolv-2.17.so
ffff88009ca99008 7f4f94be3000 7f4f94be4000 8100071 /usr/lib64/libresolv-2.17.so
ffff88009ca990e0 7f4f94be4000 7f4f94be5000 8100073 /usr/lib64/libresolv-2.17.so
ffff88009ca98f30 7f4f94be5000 7f4f94be7000 8100073
ffff88009ca98bd0 7f4f94be7000 7f4f94bec000 8000075 /usr/lib64/libnss_dns-2.17.so
ffff88009ca98af8 7f4f94bec000 7f4f94deb000 8000070 /usr/lib64/libnss_dns-2.17.so
ffff88009ca98ca8 7f4f94deb000 7f4f94dec000 8100071 /usr/lib64/libnss_dns-2.17.so
ffff88009ca991b8 7f4f94dec000 7f4f94ded000 8100073 /usr/lib64/libnss_dns-2.17.so
ffff88009ca98798 7f4f94ded000 7f4f94df9000 8000075 /usr/lib64/libnss_files-2.17.so
ffff88009ca986c0 7f4f94df9000 7f4f94ff8000 8000070 /usr/lib64/libnss_files-2.17.so
ffff88009ca98948 7f4f94ff8000 7f4f94ff9000 8100071 /usr/lib64/libnss_files-2.17.so
ffff88009ca98a20 7f4f94ff9000 7f4f94ffa000 8100073 /usr/lib64/libnss_files-2.17.so
ffff88009ca98870 7f4f94ffa000 7f4f95000000 8100073
ffff8800a1f6ce58 7f4f95000000 7f4f95600000 8200073
ffff880036735cb0 7f4f957ef000 7f4f957f0000 8100070
ffff880036735878 7f4f957f0000 7f4f959f0000 8100073
ffff880036735a28 7f4f959f0000 7f4f959f1000 8100070
ffff880036734af8 7f4f959f1000 7f4f95bf1000 8100073
ffff880036735d88 7f4f95bf1000 7f4f95bf2000 8100070

So now that means I will definitely hold off on upgrading my production servers: if this is happening on my personal one, I can only imagine what would happen to larger instances. It could also be a result of me being a fanboy and installing the .0 release of the software, which any good admin will tell you to avoid: “just say no to .0”.