Pivotal HDB 2.0 Release Notes

Supported Platforms

The supported platform for running Pivotal HDB 2.0 comprises:

  • Red Hat Enterprise Linux (RHEL) 6.4+ (64-bit) (See note in Known Issues and Limitations for kernel limitations.)
  • Hortonworks Data Platform (HDP) 2.3.4 or 2.4.0
  • Ambari 2.2.2 (for Ambari-based installation and management). Follow the instructions in Upgrading Ambari if you installed HDP 2.3.4 with an earlier version of Ambari.

    Note: Releases prior to Ambari 2.2.2 do not contain the necessary HAWQ service compatibility. You must follow the instructions to upgrade Ambari if you are using a version earlier than 2.2.2.

Each Pivotal HDB host machine must also meet the Apache HAWQ (Incubating) system requirements. See Apache HAWQ System Requirements for more information.

Product Support Matrix

The following table summarizes Pivotal HDB product support for current and previous versions of HDB, Hadoop, HAWQ, Ambari, and operating systems.

Pivotal HDB Version | HDP Version Requirement (Pivotal HDP and Hortonworks HDP) | Ambari Version Requirement | HAWQ Ambari Plug-in Requirement | RHEL/CentOS Version Requirement | SuSE Version Requirement
2.0     | 2.3.4, 2.4.0 | 2.2.2 | 2.0   | 6.4+ (64-bit) | n/a
1.3.1.1 | 2.2.6        | 2.0.x | 1.3.1 | 6.4+          | SLES 11 SP3
1.3.1.0 | 2.2.6        | 2.0.x | 1.3.1 | 6.4+          | SLES 11 SP3
1.3.0.3 | 2.2.4.2      | 1.7   | 1.2   | 6.4+          | SLES 11 SP3
1.3.0.2 | 2.2.4.2      | 1.7   | 1.2   | 6.4+          | SLES 11 SP3
1.3.0.1 | 2.2.4.2      | 1.7   | 1.1   | 6.4+          | n/a
1.3.0.0 | n/a          | n/a   | n/a   | n/a           | n/a

Note: RHEL/CentOS 7 is not supported.

AWS Support Requirements

Pivotal HDB is supported on Amazon Web Services (AWS) servers using either Amazon block-level instance store (Amazon uses the volume names ephemeral[0-23]) or Amazon Elastic Block Store (Amazon EBS) storage. Use long-running EC2 instances with these storage types for long-running HAWQ instances, because Spot instances can be interrupted. If you use Spot instances, minimize the risk of data loss by loading from and exporting to external storage.

Pivotal HDB 2.0 Features and Changes

Pivotal HDB 2.0 is based on Apache HAWQ (Incubating), and includes the following new features and changes in behavior as compared to Pivotal HAWQ 1.3.x.

Elastic Query Execution Runtime

HAWQ 1.x used a fixed number of segments (compute resource carriers) for queries, executing every query on all the physical segments (hosts) in the cluster regardless of the size of the query. HDB 2.0 now uses segment elasticity based on virtual segments, which are allocated and returned based on resource cost and demand. Thus, large queries use more virtual segments than smaller queries, and segments are de-allocated when they are no longer needed. The number of virtual segments allocated for query processing can be managed by implementing segment allocation policies.
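
For example, the number of virtual segments used by an individual statement can be pinned explicitly. The following is a minimal sketch that assumes the hawq_rm_stmt_nvseg and hawq_rm_stmt_vseg_memory parameters (both listed later in these notes) may be set at the session level; the table name and values are placeholders:

  -- memory quota for each virtual segment (placeholder value)
  SET hawq_rm_stmt_vseg_memory = '256mb';
  -- request exactly 8 virtual segments for subsequent statements
  SET hawq_rm_stmt_nvseg = 8;
  SELECT count(*) FROM sales;
  -- setting the value back to 0 returns allocation control to the resource manager
  SET hawq_rm_stmt_nvseg = 0;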

Resource Management and YARN Integration

HAWQ now includes a resource manager that supports multiple levels and approaches to resource management. With HDB 2.0 you can now:

  • Integrate HAWQ with the YARN global resource manager. YARN integration allows HAWQ to dynamically request resources through YARN and return those resources when the HAWQ workload decreases. This allows HDB 2.0 to integrate even more closely with the Hadoop ecosystem.
  • Manage resources at the query level. By defining resource queues, you configure resource usage and distribution across queries.
  • Define hierarchical resource queues to ease resource management tasks (see the example following this list).
  • Enforce resource limits. You can set CPU and memory usage for queries.
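
For example, a two-level resource queue hierarchy might be defined as follows. This is a sketch only; the WITH options shown (PARENT, ACTIVE_STATEMENTS, MEMORY_LIMIT_CLUSTER, CORE_LIMIT_CLUSTER) follow the Apache HAWQ resource queue DDL, and the queue names, role name, and percentages are placeholders that you should validate against the reference documentation:

  -- a child of the built-in pg_root queue that caps one department's share of the cluster
  CREATE RESOURCE QUEUE dept_etl WITH (
      PARENT='pg_root',
      ACTIVE_STATEMENTS=20,
      MEMORY_LIMIT_CLUSTER=50%,
      CORE_LIMIT_CLUSTER=50%);

  -- a leaf queue nested under dept_etl for ad hoc reporting
  CREATE RESOURCE QUEUE etl_reports WITH (
      PARENT='dept_etl',
      ACTIVE_STATEMENTS=5,
      MEMORY_LIMIT_CLUSTER=40%,
      CORE_LIMIT_CLUSTER=40%);

  -- queries from this role are then managed by the leaf queue
  ALTER ROLE report_user RESOURCE QUEUE etl_reports;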

Dynamic Cluster Sizing

HDB 2.0 completely decouples compute resources from storage, and HAWQ segments from the HAWQ master. This means that HAWQ tables do not need to be redistributed after HAWQ nodes are added or removed from the cluster. You can now dynamically add or remove HAWQ nodes.

Block Level Storage

Both Append-Only (AO) and Parquet tables now support block level storage, allowing different readers to read different parts of a data file, to achieve maximum parallelism in processing randomly distributed tables.

Single Directory Per Table

HAWQ organizes the data files for a given table in a single directory. This makes it easy to exchange data with external systems. Using a directory for each table can also simplify multi-tenancy implementations, especially with regard to access control on shared data.

Dispatcher

HDB 2.0 includes a new dispatcher that was developed to support the new elasticity features. The dispatcher can assemble executors dynamically on different hosts into groups to support query execution. These groups and the dispatcher form the basic infrastructure components that support elastic query execution.

Fault Tolerance

The HAWQ fault tolerance service is now based on heartbeats and on-demand probe protocols. HAWQ can automatically identify newly-added nodes, and can remove nodes from the cluster when they become unresponsive.

HDFS Catalog Cache

HAWQ requires information about data location in HDFS in order to decide which segments should process specific data sets. Typically HDFS is slow in handling remote procedure calls (RPC), especially when the number of concurrent requests is high. To improve HDFS RPC performance, HAWQ implements an HDFS catalog cache. The HDFS catalog cache is a caching service that the HAWQ master uses to determine the distribution of table data on HDFS.

HCatalog Integration for Simplified PXF Querying of Hive

HAWQ is now integrated with HCatalog, which simplifies PXF-based querying of Hive data. HCatalog provides Hive table metadata to PXF, so users no longer need to create external tables before querying external Hive data. You can also use the \d or \d+ command in psql to describe HCatalog tables and schemas.
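
For example, with HCatalog integration a Hive table can be queried through the built-in hcatalog namespace without first defining a PXF external table. The database and table names below are placeholders:

  -- describe a Hive table exposed through HCatalog
  \d hcatalog.default.sales_part

  -- query it directly; PXF obtains the table metadata from HCatalog
  SELECT * FROM hcatalog.default.sales_part LIMIT 10;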

Command Line Management Interface

In HDB 2.0, most management tools are consolidated under a single hawq command. The usage of this command is as follows:

 hawq <subcommand> [<object>] [<options>] [--version]

The available subcommands are:

  • activate
  • check
  • config
  • extract
  • filespace
  • init
  • load
  • restart
  • scp
  • ssh
  • ssh-exkeys
  • start
  • state
  • stop
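
For example, typical operations with the consolidated command look like the following. The object names (cluster, master) and options shown are a sketch; confirm the exact syntax with hawq --help:

  hawq start cluster          # start the HAWQ master and all segments
  hawq state                  # report the status of the running HAWQ cluster
  hawq stop cluster -M fast   # stop the cluster using fast shutdown mode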

Ambari Web Management Interface

Ambari 2.2.2 provides significantly enhanced deployment configuration and comprehensive cluster administration for HDB 2.0. New features in Ambari 2.2.2 include:

  • Installation and Configuration Presets - Installing a cluster with Ambari:
    • Performs proactive checks on component colocation rules and port conflicts
    • Calculates and sets certain parameter default values
    • Updates parameter values in dependent services such as HDFS and YARN
    • Leverages advanced UI controls for better usability
  • Configuration and Operational Checks - Ambari provides:
    • Verification of configuration parameter settings for the operating system, HAWQ, and supporting services (HDFS, YARN, Kerberos, etc.)
    • HAWQ and PXF service checks to ensure these services are operational and accessible
  • Kerberos Support - HAWQ installation and configuration supports Kerberos mode, whether Kerberos is enabled in the cluster before or after HAWQ install.
  • High Availability Support - HAWQ installation via Ambari supports high availability by:
    • Configuring HAWQ parameters for YARN HA as well as NameNode HA mode
    • Providing wizard-based activation, addition, and removal of HAWQ Standby Master component
  • Metrics Collection - The Ambari dashboard and HAWQ Service page include system metrics widgets that provide a graphical representation of resource usage on the HAWQ cluster.
  • Pre-Defined Alerts - The HAWQ and PXF services are installed with out-of-the-box alert definitions, enabling immediate monitoring of these service components.

Improved Logging for Runaway Query Termination

This release enhances the HAWQ logging mechanism used when terminating runaway queries. The logged information helps you identify the cause of query execution issues. The HAWQ runaway query termination feature helps ensure system stability by proactively terminating the largest query before the system runs out of memory. HAWQ now logs information about the terminated query, such as its memory allocation history, context information, and per-operator memory usage from the query plan. This information is written to the master and segment instance log files.
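
The point at which the runaway query detector begins terminating queries is governed by the runaway_detector_activation_percent parameter (listed in the new parameters table below). The following sketch assumes the parameter can be inspected and changed with the hawq config utility; the value 95 is a placeholder, and the parameter reference describes its exact semantics:

  hawq config -s runaway_detector_activation_percent        # show the current threshold
  hawq config -c runaway_detector_activation_percent -v 95  # example threshold (percentage of the memory limit)
  hawq restart cluster                                       # restart so the new value takes effect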

User Interface Changes

  • External Tables:
    • The file protocol is deprecated for creating a readable external table.
    • The gphdfs protocol is deprecated. Use PXF instead.
    • The ON ALL or HOST options to the EXECUTE command are deprecated. Use ON MASTER, a number, or SEGMENT segment_id to specify the segment instances that should execute the given command.
  • Creating Tables:

    • CREATE TABLE uses random (instead of hash) as the default table distribution policy.
    • The table distribution policy has changed. In HAWQ 1.3, a statement such as create table test (col int) created a hash-distributed table, distributed by the first column, and the bucketnum parameter was not used when creating tables. In HDB 2.0, hash-distributed tables use the bucketnum storage parameter, which also governs query processing on those tables.
    • To create a hash distributed table, you must include a bucketnum attribute. The following statement creates a table, “sales,” with 100 buckets. This would be similar to a Pivotal HAWQ 1.3.x hash-distributed table on 100 segments:

      create table sales(id int, profit float) with (bucketnum=100) distributed by (id);
      

    Policies for different application scenarios can be specified to optimize performance. The number of virtual segments used for query execution can now be tuned using the hawq_rm_nvseg_perquery_limit and hawq_rm_nvseg_perquery_perseg_limit parameters, in connection with the default_hash_table_bucket_number parameter, which sets the default bucketnum. For more information, see the guidelines for Virtual Segments in Query Performance.
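
    A brief sketch of these settings follows, assuming a cluster-wide change through the hawq config utility; the values shown are placeholders that should be sized for your own cluster:

      hawq config -c default_hash_table_bucket_number -v 96       # default bucketnum for new hash tables
      hawq config -c hawq_rm_nvseg_perquery_perseg_limit -v 8     # per-host cap on virtual segments for a query
      hawq restart cluster                                        # restart so the new values take effect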

  • Cluster Expansion: gpexpand is no longer used to expand the Pivotal HDB system.

  • Data Compression: The quicklz compression format has been deprecated.

  • Management Utilities: The hawq command-line utility replaces many of the utilities that were used in Pivotal HAWQ 1.3.x. hawq check (formerly gpcheck) was enhanced to check Kerberos and HA configurations for both HDFS and YARN.

  • Configuration: The main configuration file that specifies HAWQ server configuration parameters and their values is now hawq-site.xml. This configuration file is located in the $GPHOME/etc directory on all HAWQ instances and can be modified by using the hawq config utility. The same configuration file can be used cluster-wide across both the master and segments. While postgresql.conf is still included in HAWQ, any parameters defined in hawq-site.xml override the corresponding settings in postgresql.conf. For this reason, we recommend that you use only hawq-site.xml to configure your HAWQ cluster. Many of the configuration parameters used in Pivotal HAWQ 1.3.x are no longer applicable to Pivotal HDB.
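
    For example, the sketch below assumes the hawq config utility supports listing and showing parameters in addition to changing them; verify the options with hawq config --help, and treat the parameter value as a placeholder:

      hawq config -l                                        # list the parameters defined in hawq-site.xml
      hawq config -s hawq_master_address_port              # show the current value of one parameter
      hawq config -c hawq_rm_memory_limit_perseg -v 8GB    # change a parameter cluster-wide in hawq-site.xml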

  • Configuration Parameters: The following table provides a list of configuration parameters that are deprecated in HDB 2.0, as well as a list of new configuration parameters. See the Server Configuration Parameter Reference for more information.

    Deprecated Parameters:

    fips_mode
    gp_connection_send_timeout
    gp_create_table_random_default_distribution
    gp_enable_adaptive_nestloop
    gp_fastsequence
    gp_fts_probe_interval
    gp_fts_probe_threadcount
    gp_fts_probe_timeout
    gp_max_local_distributed_cache
    gp_resqueue_memory_policy
    gp_resqueue_priority
    gp_resqueue_priority_cpucores_per_segment
    gp_resqueue_priority_sweeper_interval
    gp_set_read_only
    gp_vmem_idle_resource_timeout
    gp_vmem_protect_limit
    max_statement_mem
    statement_mem
    stats_queue_level
    vacuum_cost_page_hit
    xid_warn_limit

    New Parameters:

    default_hash_table_bucket_number
    hawq_dfs_url
    hawq_global_rm_type
    hawq_master_address_host
    hawq_master_address_port
    hawq_master_directory
    hawq_master_temp_directory
    hawq_re_memory_overcommit_max
    hawq_rm_cluster_report_period
    hawq_rm_force_alterqueue_cancel_queued_request
    hawq_rm_master_port
    hawq_rm_memory_limit_perseg
    hawq_rm_min_resource_perseg
    hawq_rm_nresqueue_limit
    hawq_rm_nslice_perseg_limit
    hawq_rm_nvcore_limit_perseg
    hawq_rm_nvseg_perquery_limit
    hawq_rm_nvseg_perquery_perseg_limit
    hawq_rm_nvseg_variance_amon_seg_limit
    hawq_rm_rejectrequest_nseg_limit
    hawq_rm_resource_idle_timeout
    hawq_rm_return_percent_on_overcommit
    hawq_rm_segment_port
    hawq_rm_segment_heartbeat_interval
    hawq_rm_stmt_nvseg
    hawq_rm_stmt_vseg_memory
    hawq_rm_tolerate_nseg_limit
    hawq_rm_yarn_address
    hawq_rm_yarn_app_name
    hawq_rm_yarn_queue_name
    hawq_rm_yarn_scheduler_address
    hawq_segment_address_port
    hawq_segment_directory
    hawq_segment_temp_directory
    pxf_enable_stat_collection
    pxf_service_address
    pxf_service_port
    pxf_stat_max_fragments
    runaway_detector_activation_percent

  • PXF:

    • PXF provides advanced statistics support for HDFS files.
    • The namespace for PXF has changed from com.pivotal.pxf to org.apache.hawq.pxf. If you have any existing custom PXF plugins, you must recompile them to use the new namespace. If you have defined any PXF tables with built-in plugins, you must recreate the tables with the new package names. In this case, we recommend using the PROFILE option. See Renamed Package Reference in the Apache HAWQ documentation for a list of renamed PXF packages.
    • gphd is no longer used in the PXF directory path.
    • PXF now disallows use of the HEADER option in external tables.
    • Secure Isilon environments are now supported for PXF queries.
    • PXF no longer uses the publicstage folder for storing custom JAR files. Use the /etc/pxf/conf/pxf-public.classpath configuration file to include custom JARs for PXF.
    • All PXF log files have been consolidated under /var/log/pxf. This includes pxf-service.log and all Tomcat-related log files.
    • The Analyzer API is no longer supported. Instead, a new function in the Fragmenter API, getFragmentsStats, is used to gather initial statistics for the data source, and further queries gather sampling tuples for that data source. Custom plugins that implement their own XXXAnalyzer must now override getFragmentsStats in their custom Fragmenter.
    • A ProtocolVersion command has been added to the REST API used internally by PXF external tables. The path is <pxf-host:port>/pxf/ProtocolVersion. The output is {"version":"pxf-version"}.
    • If a port number is specified in the LOCATION clause of a CREATE EXTERNAL TABLE command with the pxf protocol, PXF connects to the PXF service at that port and fails if the connection cannot be established. When the port is omitted from the LOCATION clause, PXF connects at the High Availability (HA) name service port, 51200 by default. The HA port number can be changed by setting the pxf_service_port configuration parameter.
  • Catalog Table Changes: HDB 2.0 includes the following changes:

    • New tables:
      • gp_persistent_relfile_node
      • gp_relfile_node (formerly gp_relation_node)
    • Modified tables:

      • gp_distribution_policy
        Added columns: bucketnum
        Dropped columns: none
      • gp_persistent_relation_node
        Added columns: reserved
        Dropped columns: contentid, segment_file_num, relation_storage_manager, mirror_existence_state, mirror_data_synchronization_state, mirror_bufpool_marked_for_scan_incremental_resync, mirror_bufpool_resync_changed_page_count, mirror_bufpool_resync_ckpt_loc, mirror_bufpool_resync_ckpt_block_num, mirror_append_only_loss_eof, mirror_append_only_new_eof, relation_bufpool_kind, create_mirror_data_loss_tracking_session_num, mirror_existence_state, shared_storage
      • gp_persistent_database_node
        Added columns: none
        Dropped columns: contentid, create_mirror_data_loss_tracking_session_num, mirror_existence_state, shared_storage
      • gp_persistent_filespace_node
        Added columns: db_id (formerly db_id_1), location (formerly location_1)
        Dropped columns: contentid, create_mirror_data_loss_tracking_session_num, mirror_existence_state, shared_storage, db_id_2, location_2
      • gp_persistent_tablespace_node
        Added columns: none
        Dropped columns: contentid, create_mirror_data_loss_tracking_session_num, mirror_existence_state, shared_storage
      • gp_relfile_node (renamed from gp_relation_node)
        Added columns: none
        Dropped columns: create_mirror_data_loss_tracking_session_num, contentid
      • gp_segment_configuration
        Added columns: registration_order, failed_tmpdir_num, failed_tmpdir
        Dropped columns: dbid, content, preferred_role, mode, replication_port, san_mounts
      • pg_appendonly
        Added columns: splitsize
        Dropped columns: none
      • pg_resqueue
        Added columns: parentoid, activestats, memorylimit, corelimit, resovercommit, allocpolicy, vsegresourcequota, nvsegupperlimit, nvseglowerlimit, nvsegupperlimitperseg, nvseglowerlimitperseg, creationtime, updatetime, status
        Dropped columns: rsqcountlimit, rsqcostlimit, rsqovercommit, rsqignorecostlimit
      • pg_resqueue_status
        Added columns: rsqname, segmem, segcore, segsize, segsizemax, inusemem, inusecore, rsqholders, rsqwaiters, paused
        Dropped columns: all columns from the previous version of HAWQ
    • Deleted tables:

      • gp_fault_strategy
      • gp_fastsequence
      • pg_resourcetype
      • pg_resqueuecapability
      • pg_resqueue_status_kv UDF

Documentation Changes

This release of the Pivotal HDB documentation has been reorganized and utilizes documentation from Apache HAWQ (Incubating) for most of the core features and functionality.

Differences Compared to Apache HAWQ (Incubating)

Pivotal HDB 2.0 includes all of the functionality in Apache HAWQ (Incubating), and adds:

  • The GPORCA next generation query optimizer is used by default in Pivotal HDB. Apache HAWQ (Incubating) uses the legacy planner by default, but can be compiled with GPORCA support as an option.

Upgrade Paths

HDB 2.0 introduces numerous architectural, catalog, configuration, and management utility changes as compared to HAWQ 1.x. Because of this, there is no in-place or automated upgrade procedure to migrate a cluster from HAWQ 1.x to HDB 2.0. Instead, HAWQ 1.x users who want to migrate to HDB 2.0 will need to export all of their existing data and then import that data into a new installation of Pivotal HDB 2.0. The following guidelines should be considered before attempting an export/import migration:

  • Use your normal backup procedure (either gpfdist or PXF) to create a full backup of your cluster data. Ensure that you can reliably restore your backup to the 1.x cluster. gpfdist is recommended if you have enough local file system space to store a full backup.
  • Keep in mind that HDB 2.0 introduces new resource management options including hierarchical resource queues. Read How HAWQ Manages Resources and related topics to plan your resource management configuration in the new cluster.
  • Make backups of your HAWQ 1.x configuration files and determine any other configuration changes that you may want to make in the new HDB 2.0 system.
  • Make backup copies of any DDL scripts you use for creating content in the 1.x cluster. Modify your DDL scripts to make use of new or updated DDL syntax in HDB 2.0. For example:
    • If you create hash tables you must now specify a bucketnum attribute in the CREATE TABLE statement. Also keep in mind that CREATE TABLE uses random (instead of hash) as the default table distribution policy.
    • The namespace for PXF has changed from com.pivotal.pxf to org.apache.hawq.pxf. If you have defined any PXF tables with built-in plugins, you must recreate the tables with the new package names. In this case, we recommend using the PROFILE option, as shown in the sketch after this list. See Renamed Package Reference in the Apache HAWQ documentation for a list of renamed PXF packages.
  • If you use custom PXF plugins, you must recompile them to use the new org.apache.hawq.pxf namespace.
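
For example, a PXF external table can be re-created with a built-in profile rather than explicit plugin class names. The host, port, path, column definitions, and profile name below are placeholders to adapt to your environment:

  CREATE EXTERNAL TABLE ext_sales (id int, amount float)
      LOCATION ('pxf://namenode_host:51200/data/sales?PROFILE=HdfsTextSimple')
      FORMAT 'TEXT' (DELIMITER ',');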

If you are unable to create a separate HDB 2.0 cluster as part of your migration plan, or if you do not have enough local file system space to store a full cluster backup, contact your Pivotal representative for assistance in migrating to HDB 2.0.

Known Issues and Limitations

Operating System

  • Some Linux kernel versions later than 2.6.32 and earlier than 4.3.3 have a bug that can cause the getaddrinfo() function to hang. To avoid this issue, upgrade the kernel to version 4.3.3 or later.
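
To check whether a host may be affected, compare its running kernel version against that range, for example:

  uname -r    # kernels later than 2.6.32 and earlier than 4.3.3 may exhibit the getaddrinfo() hang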

PXF

  • PXF in a Kerberos-secured cluster requires YARN to be installed due to a dependency on YARN libraries.
  • In order for PXF to interoperate with HBase, you must manually add the PXF HBase JAR file to the HBase classpath after installation. See Post-Install Procedure for Hive and HBase on HDP.
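
A minimal sketch of that step, assuming the JAR is installed as /usr/lib/pxf/pxf-hbase.jar and that HBase reads HBASE_CLASSPATH from /etc/hbase/conf/hbase-env.sh (adjust both paths to your installation); run it on every HBase node and then restart HBase:

  # append the PXF HBase JAR to the HBase classpath
  echo 'export HBASE_CLASSPATH=${HBASE_CLASSPATH}:/usr/lib/pxf/pxf-hbase.jar' >> /etc/hbase/conf/hbase-env.sh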

PL/Perl

  • Although PL/Perl is listed as a procedural language in the pg_pltemplate system catalog, PL/Perl is not installable or usable in Pivotal HDB 2.0 due to an outdated plperl.so library in the current distribution. An updated library will be provided in the next release of HDB. If you need access to PL/Perl before the next HDB release, please contact your Pivotal Support representative for a solution.

YARN Integration

  • If you are using YARN mode for HAWQ resource management on Hortonworks HDP 2.3 and the timeline server is configured, you must set yarn.resourcemanager.system-metrics-publisher.enabled to false in yarn-site.xml. Otherwise, HAWQ may fail to register itself to YARN. This is a known problem with YARN as described in YARN-4452. YARN-4452 is fixed in HDP 2.4.
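
For example, the property can be set in yarn-site.xml (or through the YARN configuration page in Ambari) as follows:

  <property>
    <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
    <value>false</value>
  </property>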

Ambari

  • When installing HAWQ in a Kerberos-secured cluster, the installation process may report a warning or failure in Ambari if the HAWQ resource management type is switched to YARN mode during installation. The warning occurs because HAWQ cannot register with YARN until the HDFS and YARN services are restarted with the new configurations that result from the HAWQ installation process.
  • The HAWQ standby master will not work after you change the HAWQ master port number. To enable the standby master you must first remove and then re-initialize it. See Removing the HAWQ Standby Master and Activating the HAWQ Standby Master.
  • The Ambari Re-Synchronize HAWQ Standby Master service action fails if there is an active connection to the HAWQ master node. The HAWQ task output shows the error, Active connections. Aborting shutdown... If this occurs, close all active connections and then try the re-synchronize action again.
  • The Ambari Run Service Check action for HAWQ and PXF may not work properly on a secure cluster if PXF is not co-located with the YARN component.
  • In a secured cluster, if you move the YARN Resource Manager to another host you must manually update hadoop.proxyuser.yarn.hosts in the HDFS core-site.xml file to match the new Resource Manager hostname. If you do not perform this step, HAWQ segments fail to get resources from the Resource Manager.
  • The Ambari Stop HAWQ Server (Immediate Mode) service action or hawq stop -M immediate command may not stop all HAWQ master processes in some cases. Several postgres processes owned by the gpadmin user may remain active.
  • Ambari checks whether the hawq_rm_yarn_address and hawq_rm_yarn_scheduler_address values are valid when YARN HA is enabled. In clusters that use YARN HA, these properties are not used and may get out of sync with the active Resource Manager. This can lead to false warnings from Ambari if you try to change the property values.
  • Ambari does not support Custom Configuration Groups with HAWQ.
  • If you install HAWQ using Ambari 2.2.2 with the HDP 2.3 stack, before you attempt to upgrade to HDP 2.4 you must use Ambari to change the dfs.allow.truncate property to false. Ambari will display a configuration warning with this setting, but it is required in order to complete the upgrade; choose Proceed Anyway when Ambari warns you about the configured value of dfs.allow.truncate. After you complete the upgrade to HDP 2.4, change the value of dfs.allow.truncate back to true to ensure that HAWQ can operate as intended.
  • A failure in the Ambari Activate Standby Wizard can leave the hawq-site.xml file inconsistent. Re-running the wizard will continue to fail with the message: Error: UnboundLocalError: local variable 'old_standby_host_name' referenced before assignment. As a workaround, exit the Activate Standby Wizard and restart the HAWQ service to push the old configuration to the cluster. Then start the Activate Standby Wizard once again.
  • Certain HAWQ server configuration parameters related to resource enforcement are not active. Modifying the parameters has no effect in HAWQ since the resource enforcement feature is not currently supported. These parameters include hawq_re_cgroup_hierarchy_name, hawq_re_cgroup_mount_point, and hawq_re_cpu_enable. These parameters appear in the Advanced hawq-site configuration section of the Ambari management interface.

Workaround Required after Moving Namenode

If you use the Ambari Move Namenode Wizard to move a Hadoop NameNode, the Wizard does not automatically update the HAWQ configuration to reflect the change. This leaves HAWQ in a non-functional state and causes HAWQ service checks to fail with an error similar to:


2017-04-19 21:22:59,138 - SQL command executed failed: export PGPORT=5432 && source
/usr/local/hawq/greenplum_path.sh && psql -d template1 -c \\\\\"CREATE  TABLE
ambari_hawq_test (col1 int) DISTRIBUTED RANDOMLY;\\\\\"
Returncode: 1
Stdout:
Stderr: Warning: Permanently added 'ip-10-32-36-168.ore1.vpc.pivotal.io,10.32.36.168'
(RSA) to the list of known hosts.
WARNING:  could not remove relation directory 16385/1/18366: Input/output error
CONTEXT:  Dropping file-system object -- Relation Directory: '16385/1/18366'
ERROR:  could not create relation directory
hdfs://ip-10-32-36-168.ore1.vpc.pivotal.io:8020/hawq_default/16385/1/18366: Input/output error

2016-04-19 21:22:59,139 - SERVICE CHECK FAILED: HAWQ was not able to write and query from a table
2016-04-19 21:23:02,608 - ** FAILURE **: Service check failed 1 of 3 checks
stdout: /var/lib/ambari-agent/data/output-281.txt

To work around this problem, perform one of the following procedures after you complete the Move Namenode Wizard.

Workaround for Non-HA NameNode Clusters:
  1. Perform an HDFS service check to ensure that HDFS is running properly after you moved the NameNode.
  2. Use the Ambari configs.sh utility to update hawq_dfs_url to the new NameNode address. See Modify configurations on the Ambari Wiki for more information. For example:
    
    cd /var/lib/ambari-server/resources/scripts/
    ./configs.sh set {ambari_server_host} {clustername} hawq-site \
        hawq_dfs_url {new_namenode_address}:{port}/hawq_default
      
  3. Restart the HAWQ service to apply the configuration change.
  4. Use ssh to log into a HAWQ node and run the checkpoint command: psql -d template1 -c "checkpoint"
  5. Stop the HAWQ service.
  6. Copy the master directory to a backup location: cp -r MASTER_DATA_DIRECTORY /catalog/backup/location
  7. Execute this query to display all available HAWQ filespaces:
    SELECT fsname, fsedbid, fselocation FROM pg_filespace as sp,
    pg_filespace_entry as entry, pg_filesystem as fs WHERE sp.fsfsys = fs.oid
    and fs.fsysname = 'hdfs' and sp.oid = entry.fsefsoid ORDER BY
    entry.fsedbid;
    
    fsname       | fsedbid | fselocation
    -------------+---------+------------------------------------------------
    cdbfast_fs_a |       0 | hdfs://hdfs-cluster/hawq//cdbfast_fs_a
    dfs_system   |       0 | hdfs://test5:9000/hawq/hawq-1459499690
    (2 rows)
  8. Execute the hawq filespace command on each filespace that was returned by the previous query. For example:
    hawq filespace --movefilespace dfs_system --location=hdfs://new_namenode:port/hawq/hawq-1459499690
    
    hawq filespace --movefilespace cdbfast_fs_a --location=hdfs://new_namenode:port/hawq//cdbfast_fs_a
  9. If your cluster uses a HAWQ standby master, reinitialize the standby master in Ambari using the Remove Standby Wizard followed by the Add Standby Wizard.
  10. Start the HAWQ Service.
  11. Run a HAWQ service check to ensure that all tests pass.
Workaround for HA NameNode Clusters:
  1. Perform an HDFS service check to ensure that HDFS is running properly after you moved the NameNode.
  2. Use Ambari to expand Custom hdfs-client in the HAWQ Configs tab, then update the dfs.namenode.* properties to match the current NameNode configuration (see the example following this procedure).
  3. Restart the HAWQ service to apply the configuration change.
  4. Run a HAWQ service check to ensure that all tests pass.
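
For reference, the HA-related entries in a custom hdfs-client configuration typically resemble the following; the nameservice ID, NameNode IDs, hostnames, and ports are placeholders that must match your new HDFS configuration:

  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>namenode1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>namenode2.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>namenode1.example.com:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>namenode2.example.com:50070</value>
  </property>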