Running a HAWQ Cluster
This section provides information for system administrators responsible for administering a HAWQ deployment.
You should have some knowledge of Linux/UNIX system administration, database management systems, database administration, and structured query language (SQL) to administer a HAWQ cluster. Because HAWQ is based on PostgreSQL, you should also have some familiarity with PostgreSQL. The HAWQ documentation calls out similarities between HAWQ and PostgreSQL features throughout.
HAWQ supports users with both administrative and operating privileges. The HAWQ administrator may choose to manage the HAWQ cluster using either Ambari or the command line. Managing HAWQ Using Ambari provides Ambari-specific HAWQ cluster administration procedures. Starting and Stopping HAWQ, Expanding a Cluster, and Removing a Node describe command-line procedures for specific HAWQ cluster administration tasks. Other topics in this guide apply to both Ambari- and command-line-managed HAWQ clusters.
The default HAWQ administrator user is named gpadmin. The HAWQ administrator may choose to assign administrative and/or operating HAWQ privileges to additional users. Refer to Configuring Client Authentication and Managing Roles and Privileges for additional information about HAWQ user configuration.
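Because HAWQ inherits PostgreSQL's role system, additional users are created with standard role-management statements. As a minimal sketch, the role names below (jdoe, opsadmin) and the sales table are illustrative, not part of any default installation:

```sql
-- Run as gpadmin from a psql session. Role and table names are hypothetical.
CREATE ROLE jdoe WITH LOGIN;                -- operating user who can connect
CREATE ROLE opsadmin WITH LOGIN SUPERUSER;  -- user with administrative privileges
GRANT SELECT ON TABLE sales TO jdoe;        -- object privileges are granted separately
```

Managing Roles and Privileges covers the full set of role attributes and object privileges available.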
A typical HAWQ deployment includes single HDFS and HAWQ master and standby nodes and multiple HAWQ segment and HDFS data nodes. The HAWQ cluster may also include systems running the HAWQ Extension Framework (PXF) and other Hadoop services. Refer to HAWQ Architecture and Select HAWQ Host Machines for information about the different systems in a HAWQ deployment and how they are configured.
You manage HAWQ databases at the command line using the psql utility, an interactive front-end to the HAWQ database. For the connection information clients need to reach HAWQ databases and tables, refer to Establishing a Database Session.
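Like other PostgreSQL clients, psql reads its connection parameters from the PG* environment variables. The sketch below uses a hypothetical master hostname; 5432 is the default HAWQ master port, and the command is echoed rather than executed so you can see what would run:

```shell
# Hypothetical connection settings; substitute your HAWQ master host,
# port, database, and role.
export PGHOST=hawq-master.example.com   # HAWQ master hostname (assumed)
export PGPORT=5432                      # default HAWQ master port
export PGDATABASE=postgres
export PGUSER=gpadmin

# With these set, a bare `psql` connects to the master. Echoed here
# to show the equivalent explicit invocation:
echo "psql --host=$PGHOST --port=$PGPORT --dbname=$PGDATABASE --username=$PGUSER"
```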
HAWQ Database Drivers and APIs identifies supported HAWQ database drivers and APIs for additional client access methods.
HAWQ internal data resides in HDFS. You may also require access to data in other formats and locations in your data lake. You can use HAWQ and the HAWQ Extension Framework (PXF) to access and manage both internal and external data:
- Managing Data with HAWQ discusses the basic data operations and details regarding the loading and unloading semantics for HAWQ internal tables.
- Using PXF with Unmanaged Data describes PXF, an extensible framework you may use to query data external to HAWQ.
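As a sketch of the PXF usage pattern, the following creates an external table over a delimited text file in HDFS and queries it in place. The NameNode host, file path, and column definitions are assumptions for illustration; 51200 is the default PXF port:

```sql
-- Hypothetical HDFS path and schema; HdfsTextSimple is the PXF profile
-- for plain delimited text files.
CREATE EXTERNAL TABLE ext_sales (id int, amount float8)
  LOCATION ('pxf://namenode:51200/data/sales.csv?PROFILE=HdfsTextSimple')
  FORMAT 'TEXT' (DELIMITER ',');

SELECT * FROM ext_sales;  -- data is read from HDFS at query time
```

Using PXF with Unmanaged Data documents the available profiles and connection options in detail.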
Refer to Introducing the HAWQ Operating Environment for a discussion of the HAWQ operating environment, including a procedure to set up the HAWQ environment. This section also provides an introduction to the important files and directories in a HAWQ installation.