Best Practices for Operating HAWQ
This topic provides best practices for operating HAWQ, including recommendations for stopping, starting and monitoring HAWQ.
For best results in using `hawq start` and `hawq stop` to manage your HAWQ system, the following best practices are recommended.
- Issue the `CHECKPOINT` command to update and flush all data files to disk and update the log file before stopping the cluster. A checkpoint ensures that, in the event of a crash, files can be restored from the checkpoint snapshot.
- Stop the entire HAWQ system by stopping the cluster on the master host:

  ```shell
  $ hawq stop cluster
  ```
- To stop segments and kill any running queries without causing data loss or inconsistency issues, use fast or immediate mode on the cluster:

  ```shell
  $ hawq stop cluster -M fast
  $ hawq stop cluster -M immediate
  ```
- Use `hawq stop master` to stop the master only. If you cannot stop the master due to running transactions, try fast shutdown. If fast shutdown does not work, use immediate shutdown. Use immediate shutdown with caution, as it will result in a crash-recovery run when the system is restarted.

  ```shell
  $ hawq stop master -M fast
  $ hawq stop master -M immediate
  ```
- When stopping a segment or all segments, you can use the default smart mode. Using fast or immediate mode on segments has no effect, because segments are stateless.

  ```shell
  $ hawq stop segment
  $ hawq stop allsegments
  ```
- Typically you should always use `hawq start cluster` or `hawq restart cluster` to start the cluster. If you do end up using `hawq start standby|master|segment` to start nodes individually, make sure you always start the standby before the active master. Otherwise, the standby can become unsynchronized with the active master.
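The practices above can be sketched as a shell sequence. This is a sketch only: it assumes it runs on the master host as the HAWQ administrative user with passwordless `psql` access to the `postgres` database, and `hawq` may prompt for confirmation when run interactively.

```shell
# Checkpoint first so data files are flushed to disk before stopping.
psql -d postgres -c "CHECKPOINT;"

# Try a smart (default) stop; escalate to fast, then immediate, only on failure.
hawq stop cluster \
  || hawq stop cluster -M fast \
  || hawq stop cluster -M immediate   # last resort: triggers crash recovery at next start

# When starting nodes individually, always start the standby before the
# active master so the standby does not fall out of sync.
hawq start standby
hawq start master
hawq start segment    # run on each segment host
```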
Guidelines for Cluster Expansion

This topic provides some guidelines around expanding your HAWQ cluster.
There are several recommendations to keep in mind when modifying the size of your running HAWQ cluster:
- When you add a new node, install both a DataNode and a physical segment on the new node.
- After adding a new node, you should always rebalance HDFS data to maintain cluster performance.
- Adding or removing a node also necessitates an update to the HDFS metadata cache. This update will happen eventually, but can take some time. To speed the update of the metadata cache, execute `SELECT gp_metadata_cache_clear();`.
- Note that for hash distributed tables, expanding the cluster will not immediately improve performance, because hash distributed tables use a fixed number of virtual segments. To obtain better performance with hash distributed tables, you must redistribute each table across the updated cluster using either the ALTER TABLE or the CREATE TABLE AS command.
- If you are using hash tables, consider updating the `default_hash_table_bucket_number` server configuration parameter to a larger value after expanding the cluster but before redistributing the hash tables.
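The expansion steps above can be sketched as follows. This is a sketch, not a definitive procedure; the database name `mydb`, table `sales`, distribution column `id`, and bucket value `24` are illustrative assumptions.

```shell
# After adding nodes: rebalance HDFS so blocks spread across the new DataNodes.
# The threshold is the allowed deviation (in percent) from average utilization.
hdfs balancer -threshold 10

# Optionally raise the hash-table bucket count before redistributing
# (the value is illustrative; size it to the expanded cluster).
hawq config -c default_hash_table_bucket_number -v 24
hawq restart cluster

# Redistribute an existing hash-distributed table via CREATE TABLE AS.
psql -d mydb -c "CREATE TABLE sales_new AS SELECT * FROM sales DISTRIBUTED BY (id);"
psql -d mydb -c "BEGIN; DROP TABLE sales; ALTER TABLE sales_new RENAME TO sales; COMMIT;"
```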
Recommended Monitoring and Maintenance Tasks

The following table summarizes recommended monitoring and maintenance activities, the procedure for each, and corrective actions.

| Activity | Procedure | Corrective Actions |
|----------|-----------|--------------------|
| List segments that are currently down. If any rows are returned, this should generate a warning or alert.<br>Recommended frequency: run every 5 to 10 minutes | Run the following query in the `postgres` database:<br>`SELECT * FROM gp_segment_configuration WHERE status <> 'u';` | If the query returns any rows, investigate the affected hosts and restart the down segments. |
| Check the underlying platform for required maintenance or hardware failures.<br>Recommended frequency: real-time, if possible, or every 15 minutes | Set up a system check for hardware and OS errors. | If required, remove a machine from the HAWQ cluster to resolve hardware and OS issues, then add it back to the cluster. |
| Check disk space usage on volumes used for HAWQ data storage and the OS.<br>Recommended frequency: every 5 to 30 minutes | Set up a disk space check. | Free space on the system by removing some data or files. |
| Check for errors or dropped packets on the network interfaces.<br>Recommended frequency: hourly | Set up a network interface check. | Work with network and OS teams to resolve errors. |
| Check for RAID errors or degraded RAID performance.<br>Recommended frequency: every 5 minutes | Set up a RAID check. | Replace failed disks as soon as possible; work with the system administration team to resolve other RAID or controller errors. |
| Check for adequate I/O bandwidth and I/O skew.<br>Recommended frequency: when creating a cluster or when hardware issues are suspected | Run the HAWQ `hawq checkperf` utility. | The cluster may be under-specified if measured data transfer rates are well below the rated throughput of the hardware. If the machines on the cluster display an uneven performance profile, work with the system administration team to fix faulty machines. |
| Check for missing statistics on tables. | Check the `hawq_stats_missing` view in each database. | Run `ANALYZE` on tables that are missing statistics. |
| Mark deleted rows in HAWQ system catalogs (tables in the `pg_catalog` schema) so that the space they occupy can be reused.<br>Recommended frequency: daily | Vacuum each system catalog. | Vacuum system catalogs regularly to prevent bloating. |
| Update table statistics.<br>Recommended frequency: after loading data and before executing queries | Analyze user tables. | Analyze updated tables regularly so that the optimizer can produce efficient query execution plans. |
| Back up the database data.<br>Recommended frequency: daily, or as required by your backup plan | See Backing up and Restoring HAWQ Databases for a discussion of backup procedures. | Best practice is to have a current backup ready in case the database must be restored. |
| Reindex system catalogs (tables in the `pg_catalog` schema).<br>Recommended frequency: weekly, or more often if database objects are created and dropped frequently | Reindex the system catalogs in each database. | The optimizer retrieves information from the system tables to create query plans. If system tables and indexes are allowed to become bloated over time, scanning the system tables increases query execution time. |
| Ensure any bug fixes or enhancements are applied to the kernel.<br>Recommended frequency: at least every 6 months | Follow the vendor’s instructions to update the Linux kernel. | Keep the kernel current to include bug fixes and security fixes, and to avoid difficult future upgrades. |
| Install HAWQ minor releases.<br>Recommended frequency: quarterly | Always upgrade to the latest in the series. | Keep the HAWQ software current to incorporate bug fixes, performance enhancements, and feature enhancements into your HAWQ cluster. |
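Several of the scheduled checks and maintenance tasks above can be scripted for cron. The following is a hedged sketch: it assumes passwordless `psql` access, and the database name `mydb` and table `sales` are illustrative.

```shell
#!/bin/sh
# Sketch of a periodic monitoring/maintenance pass.

# Alert if any segment is not up ('u' = up in gp_segment_configuration).
down=$(psql -d postgres -tAc \
  "SELECT count(*) FROM gp_segment_configuration WHERE status <> 'u';")
if [ "$down" -gt 0 ]; then
  echo "WARNING: $down HAWQ segment(s) down" >&2
fi

# Daily catalog maintenance: vacuum a system catalog to prevent bloat.
psql -d mydb -c "VACUUM pg_catalog.pg_class;"

# Refresh statistics after loads so the optimizer plans efficiently.
psql -d mydb -c "ANALYZE sales;"
```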