將stanby host(slave)提升為master
若master因意外導致無法正常服務,為能夠正常繼續提供服務,此時就需要將slave提升為master,確保DB能夠正常寫入資料。本章節將模擬master無法提供服務之情況,並透過repmgr standby promote功能,將slave提升為master角色,流程大致如下列:
- 確認repmgr是否運作正常
- 關閉mater postgreql服務
- 於slave手動執行repmgr standby promote,提升為mater角色
- 確認DB可正常被寫入
1.確認repmgr是否運作正常
[On master & slave]
先確認repmgr運作是否正常運作
repmgr=# SELECT * FROM repmgr_dbc.repl_nodes ORDER BY id;
id | type | upstream_node_id | cluster | name | conninfo | slot_name | priority | active
----+---------+------------------+---------+-------+-----------------------------------------------+---------------+----------+--------
1 | master | | dbc | node1 | host=192.168.252.50 user=repmgr dbname=repmgr | repmgr_slot_1 | 100 | t
2 | standby | 1 | dbc | node2 | host=192.168.252.51 user=repmgr dbname=repmgr | repmgr_slot_2 | 100 | t
(2 rows)
2.關閉mater postgreql服務
[on master]
~$ sudo service postgresql stop
or
~$ sudo pg_ctlcluster 10 main stop --force (On Debian/Ubuntu)
[on slave]
確認節點狀態
repmgr=# SELECT * FROM repmgr_dbc.repl_nodes ORDER BY id;
id | type | upstream_node_id | cluster | name | conninfo | slot_name | priority | active
----+---------+------------------+---------+-------+-----------------------------------------------+---------------+----------+--------
1 | master | | dbc | node1 | host=192.168.252.50 user=repmgr dbname=repmgr | repmgr_slot_1 | 100 | t
2 | standby | 1 | dbc | node2 | host=192.168.252.51 user=repmgr dbname=repmgr | repmgr_slot_2 | 100 | t
(2 rows)
此時在slave上會發現mater狀態仍是active,這個狀態我也疑惑了一陣子,後來在repmgr的文件中得到了幹話般解釋:
At this point the replication cluster will be in a partially disabled state with both standbys accepting read-only connections while attempting to connect to the stopped master. Note that the repl_nodes table will not yet have been updated and will still show the master as active.
3. 於slave手動執行repmgr standby promote,提升為mater角色
[on slave]
postgres@DBC-slave:/home/tw_it$ repmgr standby promote
ERROR: connection to database failed: could not connect to server: Connection refused
Is the server running on host "192.168.252.50" and accepting
TCP/IP connections on port 5432?
NOTICE: promoting standby
NOTICE: promoting server using 'pg_ctl -D /var/lib/postgresql/10/main promote'
waiting for server to promote.... done
server promoted
NOTICE: STANDBY PROMOTE successful
再次確認節點狀態,會發現原本master狀態已變為"f", slave的type也會轉變為"master"。
repmgr=# SELECT * FROM repmgr_dbc.repl_nodes ORDER BY id;
id | type | upstream_node_id | cluster | name | conninfo | slot_name | priority | active
----+--------+------------------+---------+-------+-----------------------------------------------+---------------+----------+--------
1 | master | | dbc | node1 | host=192.168.252.50 user=repmgr dbname=repmgr | repmgr_slot_1 | 100 | f
2 | master | | dbc | node2 | host=192.168.252.51 user=repmgr dbname=repmgr | repmgr_slot_2 | 100 | t
Note1: 若有出現"sh: 1: pg_ctl: not found"訊息,請參閱上一章節"加入pg_ctl搜尋路徑" Note2: repmgr會先確認master是否可以正常連線,若在master安好的狀態下,repmgr會拒絕將slave promote為master。所以第一筆輸出連線失敗的ERROR訊息是正常的。