master與standby的switchover

repmgr可以透過 repmgr standby switchover協助我們進行master與stanby的角色互換，由於switchover操作非常複雜，所有事前準備工作必須嚴謹看待，避免實際錯誤switchover所造成的二次災難。在開始前，請再三確認下列事項是否都先確認過：

設定好兩方節點的SSH免密碼登入，詳細請參考前面章節1的"設定SSH免密碼登入"部份。
repmgr standby switchover底層是使用pg_rewind，請確保兩邊節點的postgres使用者可搜尋並執行該指令。Debian/Ubuntu預設是無法直接執行pg_rewind，請參考章節1的"加入pg_ctl搜尋路徑部份"。
wal_log_hints = on 必須要開啟。
總是加上參數-v來執行repmgr standby switchover，確保沒有漏掉任何除錯訊息。
switchover必須執行在standby機器上，而不是master上。
為避免網路狀況不佳，導致switchover不完全，建議讓switchover, replication走獨立內部網路。
switchover只適用於只有兩個節點(master, stnadby)的情況下，若有第3個standby節點，在switchover後必須手動執行repmgr standby follow。
若有使用repmgrd做自動failover，在手動switchover之前一定要先停掉所有節點的repmgrd，否則在switchover後，standby會被自動拉起來作master。

先查看並確認目前master掛在哪個節點上，以下圖為例目前master長在node2上，請到將被提升為master的standby機器上，也就是node1上執行所有作業。

postgres@node1:~$ repmgr cluster show
Role      | Name  | Upstream | Connection String
----------+-------|----------|----------------------------------------------
* master  | node2 |          | host=192.168.252.51 user=repmgr dbname=repmgr
  standby | node1 | node2    | host=192.168.252.50 user=repmgr dbname=repmgr

[on node1]

直接執行repmgr -v standby switchover

postgres@node1:~$ repmgr -v standby switchover
NOTICE: looking for configuration file in current directory
NOTICE: looking for configuration file in /etc
NOTICE: configuration file found at: /etc/repmgr.conf
NOTICE: switching current node 1 to master server and demoting current master to standby...
NOTICE: 0 files copied to /tmp/repmgr-node2-archive
NOTICE: current master has been stopped
ERROR: connection to database failed: could not connect to server: Connection refused
    Is the server running on host "192.168.252.51" and accepting
    TCP/IP connections on port 5432?

NOTICE: promoting standby
NOTICE: promoting server using 'pg_ctl -D /var/lib/postgresql/10/main promote'
waiting for server to promote.... done
server promoted
NOTICE: STANDBY PROMOTE successful
NOTICE: Executing pg_rewind on old master server
NOTICE: 0 files copied to /var/lib/postgresql/10/main
NOTICE: restarting server using 'pg_ctl  -w -D /var/lib/postgresql/10/main -m fast restart'
pg_ctl: PID file "/var/lib/postgresql/10/main/postmaster.pid" does not exist
Is server running?
starting server anyway
ERROR: unable to restart server
NOTICE: node 2 is replicating in state "streaming"
NOTICE: replication slot "repmgr_slot_1" deleted on node 2
NOTICE: switchover was successful

Note: 從-v選項可看到一些ERROR，因switchover在某些階段，會需要暫停某一邊的服務，造成暫時無法連線的情況，但不一定是有問題，最後還是要從結果來看。

最後確認是否正確switchover，從下面結果可看出，master已從原本node2轉移到node1

postgres@node1:~$ repmgr cluster show
Role      | Name  | Upstream | Connection String
----------+-------|----------|----------------------------------------------
  standby | node2 |          | host=192.168.252.51 user=repmgr dbname=repmgr
* master  | node1 |          | host=192.168.252.50 user=repmgr dbname=repmgr

且node1也有接上Streaming Replication

postgres=# SELECT * FROM pg_replication_slots ;
   slot_name   | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 
---------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
 repmgr_slot_2 |        | physical  |        |          | f         | t      |       2443 |      |              | 0/B000AB8   | 
(1 row)

但在repmgr3.3執行完switch over後，若直接去查看repl_nodes，有時會發現原standby node的狀態沒有更新到

repmgr=# SELECT * FROM repmgr_dbc.repl_nodes ORDER BY id;
 id |  type  | upstream_node_id | cluster | name  |                   conninfo                    |   slot_name   | priority | active 
----+--------+------------------+---------+-------+-----------------------------------------------+---------------+----------+--------
  1 | master |                  | dbc     | node2 | host=192.168.252.51 user=repmgr dbname=repmgr | repmgr_slot_2 |      100 | t
  2 | master |                  | dbc     | node1 | host=192.168.252.50 user=repmgr dbname=repmgr | repmgr_slot_1 |      100 | f
(2 rows)

若遇到該狀況，請直接手動從資料表中刪除

repmgr=# DELETE FROM repmgr_dbc.repl_nodes WHERE name = 'node1';

並在switchover後的standby host重新註冊repmgr

postgres@node1:~$ repmgr standby register

2.3 switchover between master and standby

master與standby的switchover

[on node1]

results matching ""

No results matching ""