Using Master Mirroring
There are two masters in a HAWQ cluster: a primary master and a standby master. Clients connect to the primary master, and queries can be executed only on the primary master.
You deploy the standby master, a backup or mirror of the master instance, on a different host machine from the primary master so that the cluster can tolerate a single host failure. The standby master serves as a warm standby if the primary master becomes non-operational. You create a standby master from the primary master while the primary is online.
The primary master continues to serve users while HAWQ takes a transactional snapshot of the primary master instance and deploys it to the standby master. While the snapshot is being taken and deployed, HAWQ also records changes made on the primary master. After the snapshot is in place on the standby master, HAWQ applies these recorded changes to synchronize the standby master with the primary master.
After the primary master and standby master are synchronized, HAWQ keeps the standby master up to date using two write-ahead log (WAL)-based replication processes: walsender, which runs on the primary master, and walreceiver, which runs on the standby master. Together, the two processes use WAL-based streaming replication to keep the primary and standby masters synchronized.
Since the master does not house user data, only system catalog tables are synchronized between the primary and standby masters. When these tables are updated, changes are automatically copied to the standby master to keep it current with the primary.
Figure 1: Master Mirroring in HAWQ
If the primary master fails, the replication process stops, and an administrator can activate the standby master. Upon activation of the standby master, the replicated logs reconstruct the state of the primary master at the time of the last successfully committed transaction. The activated standby then functions as the HAWQ master, accepting connections on the port specified when the standby master was initialized.
If the master fails, the administrator uses command line tools or Ambari to instruct the standby master to take over as the new primary master.
Tip: You can configure a virtual IP address for the master and standby so that client programs do not have to switch to a different network address when the ‘active’ master changes. If the master host fails, the virtual IP address can be moved to the host that is now acting as the master.
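Moving the virtual IP can be sketched with standard Linux networking tools. This is a minimal, hypothetical sketch and not a HAWQ feature: the address 192.0.2.10/24, the interface eth0, and the helper names run and claim_vip are illustrative assumptions, and the real commands require root privileges. With DRY_RUN=1 the function only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: claim a floating virtual IP on the host that is now the
# acting HAWQ master. VIP, PREFIX and IFACE are example values, not HAWQ settings.
VIP="${VIP:-192.0.2.10}"
PREFIX="${PREFIX:-24}"
IFACE="${IFACE:-eth0}"

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# claim_vip: attach the VIP to this host, then send unsolicited (gratuitous)
# ARP so that neighbours and clients refresh their ARP caches.
claim_vip() {
  run ip addr add "${VIP}/${PREFIX}" dev "${IFACE}"
  run arping -c 3 -U -I "${IFACE}" "${VIP}"
}
```

Run `DRY_RUN=1 claim_vip` on the newly activated master to review the commands before executing them for real.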
You can configure a new HAWQ system with a standby master during HAWQ’s installation process, or you can add a standby master later. This topic assumes you are adding a standby master to an existing node in your HAWQ cluster.
Add a standby master to an existing system
Ensure that the host machine for the standby master has HAWQ installed and configured:
- The gpadmin system user has been created.
- HAWQ binaries are installed.
- HAWQ environment variables are set.
- SSH keys have been exchanged.
- The HAWQ master data directory has been created.
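Several of the prerequisites above can be spot-checked with a short script before you initialize the standby. This is a sketch under stated assumptions: the function names are hypothetical, the user, install directory, and master data directory are passed in because they are site-specific, and the environment-variable check is left out because it depends on how your shells load greenplum_path.sh.

```shell
#!/usr/bin/env bash
# Sketch: local pre-flight checks to run on the prospective standby host.
# Nothing here is a HAWQ-provided tool; argument values are site-specific.

# check_standby_prereqs USER GPHOME MASTER_DATA_DIR
# Prints one line per missing prerequisite; returns 0 only if all pass.
check_standby_prereqs() {
  local user="$1" gphome="$2" datadir="$3" failed=0

  # The HAWQ system user (normally gpadmin) exists.
  if ! id "$user" >/dev/null 2>&1; then
    echo "missing: system user $user"
    failed=1
  fi

  # HAWQ binaries are installed (greenplum_path.sh ships with them).
  if [ ! -f "$gphome/greenplum_path.sh" ]; then
    echo "missing: $gphome/greenplum_path.sh"
    failed=1
  fi

  # The master data directory exists and is writable by the current user.
  if [ ! -d "$datadir" ] || [ ! -w "$datadir" ]; then
    echo "missing or not writable: $datadir"
    failed=1
  fi

  return "$failed"
}

# check_ssh_from_master STANDBY_HOST
# Run on the primary master: succeeds only if key-based SSH works without a prompt.
check_ssh_from_master() {
  ssh -o BatchMode=yes "gpadmin@$1" true
}
```

For example, run `check_standby_prereqs gpadmin /usr/local/hawq <master_data_dir>` as gpadmin on the standby host, and `check_ssh_from_master <new_standby_master>` on the primary master.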
Initialize the HAWQ master standby:
a. If you use Ambari to manage your cluster, follow the instructions in Adding a HAWQ Standby Master.
b. If you do not use Ambari, log in to the HAWQ master and initialize the HAWQ master standby node:
$ ssh gpadmin@<hawq_master>
hawq_master$ . /usr/local/hawq/greenplum_path.sh
hawq_master$ hawq init standby -s <new_standby_master>
where <new_standby_master> identifies the hostname of the standby master.
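The manual initialization step can be wrapped in a small function that refuses to run without a hostname and loads the HAWQ environment only when needed. This is a minimal sketch: the function name init_standby is hypothetical, and GPHOME defaults to the /usr/local/hawq path used in this topic, which may differ on your hosts.

```shell
#!/usr/bin/env bash
# Sketch: wrap the non-Ambari standby initialization shown above.
# GPHOME defaults to the install path used in this topic; adjust as needed.
GPHOME="${GPHOME:-/usr/local/hawq}"

# init_standby NEW_STANDBY_MASTER
# Run as gpadmin on the primary master.
init_standby() {
  local standby="$1"
  if [ -z "$standby" ]; then
    echo "usage: init_standby <new_standby_master>" >&2
    return 2
  fi
  # Load the HAWQ environment only if the hawq utility is not already on PATH.
  if ! command -v hawq >/dev/null 2>&1; then
    # shellcheck source=/dev/null
    . "$GPHOME/greenplum_path.sh" || return 1
  fi
  hawq init standby -s "$standby"
}
```

Calling `init_standby <new_standby_master>` then runs the same `hawq init standby -s` command as the manual steps.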
Check the status of master mirroring by querying the gp_master_mirroring system view. See Checking on the State of Master Mirroring for instructions.
To activate or failover to the standby master, see Failing Over to a Standby Master.
Failing Over to a Standby Master
If the primary master fails, log replication stops. You must explicitly activate the standby master in this circumstance.
Upon activation of the standby master, HAWQ reconstructs the state of the master at the time of the last successfully committed transaction.
To activate the standby master
Ensure that a standby master host has been configured for the system.
Activate the standby master:
a. If you use Ambari to manage your cluster, follow the instructions in Activating the HAWQ Standby Master.
b. If you do not use Ambari, log in to the HAWQ master and activate the HAWQ master standby node:
hawq_master$ hawq activate standby
After you activate the standby master, it becomes the active or primary master for the HAWQ cluster.
(Optional, but recommended.) Configure a new standby master. See Add a standby master to an existing system for instructions.
Check the status of the HAWQ cluster by executing the following command on the master:
hawq_master$ hawq state
The newly activated master’s status should be Active. If you configured a new standby master, its status is Passive. When a standby master is not configured, the command displays the message "-No entries found", indicating that no standby master instance is configured.
Query the gp_segment_configuration table to verify that segments have registered themselves with the new master:
hawq_master$ psql dbname -c 'SELECT * FROM gp_segment_configuration;'
Finally, check the status of master mirroring by querying the gp_master_mirroring system view. See Checking on the State of Master Mirroring for instructions.
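The non-Ambari activation and verification steps above can be chained in one shell function so that verification stops as soon as activation fails. This is a sketch only: it assumes the HAWQ environment is already loaded (for example via greenplum_path.sh), that you run it as gpadmin where you would otherwise type these commands, and that DBNAME names an existing database; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the non-Ambari failover steps above. Assumes greenplum_path.sh
# has been sourced so that hawq and psql are on PATH.

# activate_and_verify DBNAME
activate_and_verify() {
  local db="$1"
  # Promote the standby so that it becomes the active master.
  hawq activate standby || { echo "activation failed" >&2; return 1; }
  # The newly activated master should report its status as Active.
  hawq state || return 1
  # Confirm that the segments have registered with the new master.
  psql "$db" -c 'SELECT * FROM gp_segment_configuration;'
}
```

Configuring a replacement standby is deliberately left out of the function, because it needs a separate host that you choose explicitly.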
Checking on the State of Master Mirroring
To check on the status of master mirroring, query the gp_master_mirroring system view. This view provides information about the walsender process used for HAWQ master mirroring.
hawq_master$ psql dbname -c 'SELECT * FROM gp_master_mirroring;'
If a standby master has not been set up for the cluster, you will see the following output:
 summary_state  | detail_state | log_time | error_message
----------------+--------------+----------+---------------
 Not Configured |              |          |
(1 row)
If the standby is configured and in sync with the master, you will see output similar to the following:
 summary_state | detail_state |        log_time        | error_message
---------------+--------------+------------------------+---------------
 Synchronized  |              | 2016-01-22 21:53:47+00 |
(1 row)
The standby can become out of date if the log synchronization process between the master and standby has stopped or has fallen behind. If this occurs, you will observe output similar to the following after querying the gp_master_mirroring view:
  summary_state   | detail_state | log_time | error_message
------------------+--------------+----------+---------------
 Not Synchronized |              |          |
(1 row)
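The three summary_state values shown above can drive a scheduled health check. This is a minimal sketch: the exit-code mapping (0, 1, 2, 3) and the function names are our own convention, not something HAWQ defines, and it assumes psql can connect to DBNAME from the HAWQ master. The -A (unaligned) and -t (tuples only) psql options make the query print just the bare value.

```shell
#!/usr/bin/env bash
# Sketch: map gp_master_mirroring.summary_state to an exit status that a
# cron job or monitoring agent can act on. The mapping is a local convention.

# mirroring_exit_code SUMMARY_STATE
mirroring_exit_code() {
  case "$1" in
    "Synchronized")     return 0 ;;
    "Not Synchronized") return 1 ;;  # replication stopped or fell behind
    "Not Configured")   return 2 ;;  # no standby master set up
    *)                  return 3 ;;  # unexpected value
  esac
}

# check_mirroring DBNAME
check_mirroring() {
  local state
  # Query failures (for example, no connection) are reported as status 3.
  state="$(psql -At -d "$1" -c 'SELECT summary_state FROM gp_master_mirroring;')" || return 3
  echo "summary_state: ${state:-<empty>}"
  mirroring_exit_code "$state"
}
```

For example, `check_mirroring dbname; echo $?` prints the current state followed by 0 when the standby is synchronized.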
To resynchronize the standby with the master:
If you use Ambari to manage your cluster, follow the instructions in Removing the HAWQ Standby Master.
If you do not use Ambari, execute the following command on the HAWQ master:
hawq_master$ hawq init standby -n
This command stops and restarts the master and then synchronizes the standby.
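Because `hawq init standby -n` restarts the master, it can be worth guarding it so that it runs only when the standby is actually out of sync. This is a sketch under stated assumptions: the function name is hypothetical, it runs as gpadmin on the HAWQ master with the HAWQ environment loaded, and it should still be run interactively during a maintenance window rather than unattended.

```shell
#!/usr/bin/env bash
# Sketch: resynchronize only when gp_master_mirroring reports the standby as
# Not Synchronized. Note that hawq init standby -n stops and restarts the master.

# resync_standby_if_needed DBNAME
resync_standby_if_needed() {
  local state
  state="$(psql -At -d "$1" -c 'SELECT summary_state FROM gp_master_mirroring;')" || return 1
  case "$state" in
    "Synchronized")
      echo "standby already synchronized; nothing to do" ;;
    "Not Synchronized")
      hawq init standby -n ;;
    "Not Configured")
      echo "no standby master configured; see hawq init standby -s" >&2
      return 2 ;;
    *)
      echo "unexpected summary_state: $state" >&2
      return 3 ;;
  esac
}
```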