Pivotal HDB 2.1.0 Release Notes

Supported Platforms

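The supported platform for running Pivotal HDB 2.1.0 comprises:

  • Red Hat Enterprise Linux (RHEL) or CentOS 6.4+ (64-bit)
  • HDP 2.5 (Pivotal HDP or Hortonworks HDP)
  • Ambari 2.4.1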

Each Pivotal HDB host machine must also meet the Apache HAWQ (Incubating) system requirements. See Apache HAWQ System Requirements for more information.

Product Support Matrix

The following table summarizes Pivotal HDB product support for current and previous versions of HDB, Hadoop, HAWQ, Ambari, and operating systems.

Pivotal HDB Version | HDP Version Requirement (Pivotal HDP and Hortonworks HDP) | Ambari Version Requirement | HAWQ Ambari Plug-in Requirement | MADlib Version Requirement | RHEL/CentOS Version Requirement | SuSE Version Requirement
--------------------+---------------------------------------------------------+----------------------------+---------------------------------+----------------------------+---------------------------------+-------------------------
2.1.0.0 | 2.5 | 2.4.1 | 2.1.0 | 1.9, 1.9.1 | 6.4+ (64-bit) | n/a
2.0.1.0 | 2.4.0, 2.4.2 | 2.2.2, 2.4 | 2.0.1 | 1.9, 1.9.1 | 6.4+ (64-bit) | n/a
2.0.0.0 | 2.3.4, 2.4.0 | 2.2.2 | 2.0.0 | 1.9, 1.9.1 | 6.4+ (64-bit) | n/a
1.3.1.1 | 2.2.6 | 2.0.x | 1.3.1 | 1.7.1, 1.8, 1.9, 1.9.1 | 6.4+ | SLES 11 SP3
1.3.1.0 | 2.2.6 | 2.0.x | 1.3.1 | 1.7.1, 1.8, 1.9, 1.9.1 | 6.4+ | SLES 11 SP3
1.3.0.3 | 2.2.4.2 | 1.7 | 1.2 | 1.7.1, 1.8, 1.9, 1.9.1 | 6.4+ | SLES 11 SP3
1.3.0.2 | 2.2.4.2 | 1.7 | 1.2 | 1.7.1, 1.8, 1.9, 1.9.1 | 6.4+ | SLES 11 SP3
1.3.0.1 | 2.2.4.2 | 1.7 | 1.1 | 1.7.1, 1.8, 1.9, 1.9.1 | 6.4+ | n/a
1.3.0.0 | n/a | n/a | n/a | 1.7.1, 1.8, 1.9, 1.9.1 | n/a | n/a

Note: RHEL/CentOS 7 is not supported.

Note: If you are using Ambari 2.4.1 and you want to install both HDP and HAWQ at the same time, see Installing HDP and HDB with Ambari 2.4.1 before you begin.

Procedural Language Support Matrix

The following table summarizes component version support for Procedural Languages available in Pivotal HDB 2.x. The versions listed have been tested with HDB. Higher versions may be compatible. Please test higher versions thoroughly in your non-production environments before deploying to production.

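Pivotal HDB Version | PL/Java Java Version Requirement | PL/R R Version Requirement | PL/Perl Perl Version Requirement | PL/Python Python Version Requirement
--------------------+----------------------------------+----------------------------+----------------------------------+-------------------------------------
2.1.0.0 | 1.7 | 3.3.1 | 5.10.1 | 2.6.2
2.0.1.0 | 1.7 | 3.3.1 | 5.10.1 | 2.6.2
2.0.0.0 | 1.6, 1.7 | 3.1.0 | 5.10.1 | 2.6.2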

AWS Support Requirements

Pivotal HDB is supported on Amazon Web Services (AWS) servers using either Amazon block-level instance store (Amazon uses the volume names ephemeral[0-23]) or Amazon Elastic Block Store (Amazon EBS) storage. For long-running HAWQ instances, use long-running EC2 instances rather than Spot instances, which can be interrupted. If you use Spot instances, minimize the risk of data loss by loading data from, and exporting data to, external storage.

Pivotal HDB 2.1.0 Features and Changes

Pivotal HDB 2.1.0 is based on Apache HAWQ (Incubating), and includes the following new features and changes in behavior as compared to Pivotal HDB 2.0.1.0:

  • HDP 2.5.0 support

    HDB now supports the HDP 2.5.0 stack, upgrading from HDP 2.4.0 and HDP 2.4.2. See the HDP 2.5.0 Release Notes for details. The main change is the upgrade from Apache Hadoop 2.7.1 to 2.7.3.

  • gporca upgrade from version 1.638 to 1.684

    Many new features and bug fixes in gporca, the modular query optimizer, are now integrated with HAWQ. Refer to the gporca releases for details.

  • hawq register enhancements and support for partition tables

    The hawq register command, a new feature in HDB 2.0.1, registers and loads data files into HAWQ internal tables using table metadata defined in a YAML configuration file. hawq register now supports loading and registering 1-level partition tables.

  • PXF predicate pushdown and column projection availability

    HAWQ now makes available to the PXF service both the predicate (filter string) and the column projection information. With this feature, PXF plug-in developers can implement predicate pushdown for their custom plug-ins.

  • PXF checksum verification

    HAWQ now performs client-side checksum verification when reading blocks of data from HDFS.

Installing HDP and HDB with Ambari 2.4.1

If you are using Ambari 2.4.1 to install both HDP and HAWQ at the same time, and you want to install the very latest version of the HDP stack instead of the default version, special care must be taken. Follow these steps:

  1. After installing Ambari, start the Cluster Install Wizard and proceed until you reach the Select Version screen.
  2. On the Select Version screen, select HDP-2.5 from the list of available stack versions.
  3. While still on the Select Version screen, copy the Base URL values for the HDP-2.5 and HDP-UTILS-1.1.0.21 repositories that are listed for your operating system. Paste these values into a temporary file; you will need to restore these Base URL values later.
  4. Use the drop-down menu for HDP-2.5 to select the stack option, HDP-2.5 (Default Version Definition). Verify that the hdb-2.1.0.0 and hdb-add-ons-2.1.0.0 repositories now appear in the list of Repositories for your operating system.
  5. To install the very latest version of HDP, replace the Base URL values for the HDP-2.5 and HDP-UTILS-1.1.0.21 repositories with the values you saved to the temporary file in Step 3.
  6. Click Next to continue, and finish installing the new HDP cluster.
  7. Install and configure Pivotal HDB as described in Installing HAWQ Using Ambari.

Note: This workaround may not be required with later versions of Ambari 2.4.

HDB 2.0.x to HDB 2.1.0 Upgrade

The HDB 2.0.x to 2.1.0 Upgrade guide provides specific details on upgrading your HAWQ installation. Note: If you are upgrading from an HDB version earlier than 2.0, refer to the HDB 2.0 documentation.

Differences Compared to Apache HAWQ (Incubating)

Pivotal HDB 2.1.0 includes all of the functionality in Apache HAWQ (Incubating), and adds several bug fixes described below.

Resolved Issues

The following HAWQ issues were resolved in HDB 2.1.0.

Apache Jira | Component | Summary
------------+-----------+--------
HAWQ-583 | PXF | Extended PXF to enable plug-ins to support returning partial content from SELECT (column projection) statements
HAWQ-779 | PXF | Support more PXF filter pushdown
HAWQ-932 | PXF | HAWQ fails to query external table defined with "localhost" in URL
HAWQ-963 | PXF | Enhanced PXF to support Null operators
HAWQ-964 | PXF | Support for additional logical operators in PXF
HAWQ-992 | PXF | PXF Hive data type check in Fragmenter too restrictive
HAWQ-997 | PXF | HAWQ doesn't send PXF data type with precision
HAWQ-1006 | PXF | Fix RPM compliance in Red Hat Satellite
HAWQ-1035 | Command Line Tools | Support partition table register
HAWQ-1045 | PXF | Update PXF RPM to include virtual RPM
HAWQ-1051 | Resource Manager | Failed reverse DNS lookup causes resource manager core dump
HAWQ-1068 | Catalog | Master process panics with signal 11 when calling get_ao_compression_ratio(null)
HAWQ-1070 | PXF | Make PXF Javadoc compliant
HAWQ-1075 | PXF | Restore default behavior of client-side (PXF) checksum validation when reading blocks from HDFS
HAWQ-1076 | Catalog | Permission denied for using sequence with SELECT/USAGE privilege
HAWQ-1084 | PXF | Fixed psql crashes and "out of memory" errors that occurred while executing the \d hcatalog.* command
HAWQ-1091 | Storage | Fix HAWQ InputFormat bugs
HAWQ-1092 | Catalog, Command Line Tools | lc_collate and lc_ctype do not work after setting through hawq init
HAWQ-1094 | Fault Tolerance | SELECT on INTERNAL table returns wrong results when HDFS blocks have checksum errors
HAWQ-1099 | Command Line Tools | Output YAML file should not contain Bucketnum attribute for randomly distributed tables
HAWQ-1100 | PXF | Fix to support decimal values in PXF filter
HAWQ-1103 | PXF | Fix to send constant data type and length in filter string to PXF service
HAWQ-1104 | Command Line Tools | Add tupcount, varblockcount, and eofuncompressed values to the hawq extract YAML configuration, and make hawq register recognize these values
HAWQ-1111 | PXF | Support for IN() operator in PXF
HAWQ-1112 | Command Line Tools | Error message is not accurate when hawq register is run on a single file with a specified size larger than the actual file size
HAWQ-1113 | Command Line Tools | In force mode, hawq register fails when the files listed in the YAML file are out of order
HAWQ-1117 | Core | Resource manager crashes during database initialization after configuring with '--enable-cassert'
HAWQ-1120 | Command Line Tools | Optimize hawq register performance
HAWQ-1121 | Command Line Tools | MADlib gppkg install hangs even though the installation has finished
HAWQ-1124 | PXF | Updated the Hadoop version to 2.7.3 in gradle.properties
HAWQ-1127 | Command Line Tools | HAWQ should print an error message instead of a Python function stack when the YAML file is invalid
HAWQ-1128 | Command Line Tools | Support hawq register for tables with the same file name in different schemas
HAWQ-1129 | Procedural Language, Command Line Tools | Install PL/R into the HAWQ home directory
HAWQ-1130 | PXF | HCatalog integration now works for non-superusers
HAWQ-1133 | Command Line Tools | hawq register output should include date/time information
HAWQ-1135 | Core | MADlib: Raising exception leads to database connection termination
HAWQ-1141 | PXF | Updated PXF and default HDB stack versions
HAWQ-1143 | libhdfs3 | libhdfs3 create semantic is not consistent with the POSIX standard
HAWQ-1144 | Command Line Tools | When registering into a 2-level partition table, hawq register did not throw an error and reported success, but no data could be selected
HAWQ-1149 | Core, Catalog | gp_persistent_build_all issue in gp_relfile_node and gp_persistent_relfile_node
HAWQ-1152 | PXF | Ensured that the PXF bridge (read and write) accessor resource is closed in every scenario
HAWQ-1159 | Command Line Tools | Skip NameNode check when the NameNode is not part of the HAWQ cluster
HAWQ-1162 | Resource Manager | Resource manager does not reference dynamic minimum water level of each segment when it times out YARN containers
HAWQ-1167 | Storage | Add Parquet format column width estimate for bpchar type
HAWQ-1171 | Core | Support upgrade for hawq register
HAWQ-1174 | Resource Manager | Double type core counter of container set has precision issue

Known Issues and Limitations

MADlib Compression

Pivotal HDB 2.1.0 is compatible with MADlib 1.9 and 1.9.1. However, you must download and run a script to remove the MADlib QuickLZ compression, which is not supported in HDB 2.1.0. Run this script if you are upgrading an HDB system that contains MADlib to HDB 2.1.0, or if you are installing MADlib on HDB 2.1.0.

If you are upgrading an HDB 2.0 system that contains MADlib:

  1. Complete the Pivotal HDB 2.1.0 upgrade procedure as described in Upgrading to Pivotal HDB 2.1.0.

  2. Download and unpack the MADlib 1.9.1 binary distribution from the Pivotal HDB Download Page on Pivotal Network.

  3. Execute the remove_compression.sh script in the MADlib 1.9.1 distribution, providing the path to your existing MADlib installation:

    $ remove_compression.sh --prefix <madlib-installation-path>
    

    Note: If you do not include the --prefix option, the script uses the location ${GPHOME}/madlib.

For new MADlib installations, complete these steps after you install Pivotal HDB 2.1.0:

  1. Download and unpack the MADlib 1.9.1 binary distribution from the Pivotal HDB Download Page on Pivotal Network.

  2. Install the MADlib .gppkg file:

    $ gppkg -i <path-to>/madlib-ossv1.9.1_pv1.9.6_hawq2.0-rhel5-x86_64.gppkg
    
  3. Execute the remove_compression.sh script, optionally providing the MADlib installation path:

    $ remove_compression.sh --prefix <madlib-installation-path>
    

    Note: If you do not include the --prefix option, the script uses the location ${GPHOME}/madlib.

  4. Continue installing MADlib using the madpack install command as described in the MADlib Installation Guide. For example:

    $ madpack -p hawq install
    

Operating System

  • Some Linux kernel versions between 2.6.32 and 4.3.3 (excluding 2.6.32 and 4.3.3 themselves) have a bug that can cause the getaddrinfo() function to hang. To avoid this issue, upgrade the kernel to version 4.3.3 or later.
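To see whether a host falls inside the affected range, a version comparison against the two boundary kernels is enough. The POSIX shell sketch below is an illustration, not a Pivotal-supplied tool: the function names are hypothetical, it assumes GNU sort with -V (version ordering), and it compares the full release string reported by uname -r, so a distribution kernel such as 2.6.32-504.el6.x86_64 should sort after plain 2.6.32 and be reported as in range. Treat a positive result as a prompt to check your kernel against your OS vendor's advisories rather than as a diagnosis.

```shell
#!/bin/sh
# Sketch: flag kernels whose version lies strictly between 2.6.32 and 4.3.3,
# the range named in this known issue. Requires GNU sort (-V).

# version_lt A B: succeeds when version A sorts strictly before version B.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# kernel_in_affected_range VERSION: succeeds when 2.6.32 < VERSION < 4.3.3.
kernel_in_affected_range() {
    version_lt 2.6.32 "$1" && version_lt "$1" 4.3.3
}

# check_this_host: report on the kernel of the host it runs on (hypothetical helper).
check_this_host() {
    release=$(uname -r)
    if kernel_in_affected_range "$release"; then
        echo "Kernel $release is in the affected range; consider upgrading to 4.3.3 or later."
    else
        echo "Kernel $release is outside the affected range."
    fi
}
```

After sourcing these functions, run check_this_host on each HAWQ node.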

Command Line Tools

  • HAWQ-1213 - hawq register returns the following error when you attempt to use a YAML file to register a randomly-distributed table to a destination randomly-distributed table that you created with a non-default default_hash_table_bucket_number:

    Bucket number of <table-name> is not consistent with previous bucket number.
    

    If you wish to use this feature in HDB 2.1.0, set default_hash_table_bucket_number to 6 before creating the destination randomly-distributed table you wish to register to.

PXF

  • PXF in a Kerberos-secured cluster requires YARN to be installed due to a dependency on YARN libraries.
  • In order for PXF to interoperate with HBase, you must manually add the PXF HBase JAR file to the HBase classpath after installation. See Post-Install Procedure for Hive and HBase on HDP.
  • HAWQ-974 - When using certain PXF profiles to query larger files stored in HDFS, queries may occasionally hang or time out. This known issue will be improved in a future HDB release. Refer to Addressing PXF Memory Issues for a discussion of the configuration options available to address these issues in your PXF deployment.
  • After upgrading from HDB version 2.0.0, HCatalog access through PXF may fail with the following error:

    postgres=# \d hcatalog.default.hive_table
    ERROR:  function return row and query-specified return row do not match
    DETAIL:  Returned row contains 5 attributes, but query expects 4.
    

    To restore HCatalog access, you must update the PXF pxf_get_item_fields() function definition. Perform this procedure only if you upgraded from HDB 2.0.0.

    1. Log in to the HAWQ master node and start psql:

      $ ssh gpadmin@master
      gpadmin@master$ psql -d postgres
      
    2. List all databases except hcatalog and template0:

      postgres=# SELECT datname FROM pg_database WHERE datname NOT IN ('hcatalog', 'template0');
      
    3. Run the following commands on each database identified in Step 2 to update the pxf_get_item_fields() function definition:

      postgres=# \connect <database>
      postgres=# SET allow_system_table_mods = 'dml';
      postgres=# UPDATE pg_proc
                   SET proallargtypes = '{25,25,25,25,25,25,25}',
                       proargmodes = '{i,i,o,o,o,o,o}',
                       proargnames = '{profile,pattern,path,itemname,fieldname,fieldtype,sourcefieldtype}'
                 WHERE proname = 'pxf_get_item_fields';
      
    4. Reset your psql session:

      postgres=# RESET allow_system_table_mods;
      

      Note: Use the allow_system_table_mods server configuration parameter and identified SQL commands only in the context of this workaround. They are not otherwise supported.
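When a cluster has many databases, repeating the pxf_get_item_fields() update by hand is error-prone. The sketch below generates one psql script that issues \connect for each database name read on standard input, followed by the same SET, UPDATE, and RESET statements; the SET is repeated because each \connect starts a new session. The function name gen_pxf_fix_script and the file name fix_pxf.sql are hypothetical, and the sketch assumes database names that need no quoting. Review the generated file before running it.

```shell
#!/bin/sh
# gen_pxf_fix_script: read database names (one per line) on stdin and print a
# psql script that updates pxf_get_item_fields() in each database.
gen_pxf_fix_script() {
    while IFS= read -r db; do
        [ -n "$db" ] || continue               # skip blank lines
        printf '\\connect %s\n' "$db"          # each \connect opens a new session
        cat <<'SQL'
SET allow_system_table_mods = 'dml';
UPDATE pg_proc
   SET proallargtypes = '{25,25,25,25,25,25,25}',
       proargmodes = '{i,i,o,o,o,o,o}',
       proargnames = '{profile,pattern,path,itemname,fieldname,fieldtype,sourcefieldtype}'
 WHERE proname = 'pxf_get_item_fields';
RESET allow_system_table_mods;
SQL
    done
}

# Example use on the HAWQ master (not executed here):
#   psql -d postgres -At \
#       -c "SELECT datname FROM pg_database WHERE datname NOT IN ('hcatalog', 'template0')" \
#       | gen_pxf_fix_script > fix_pxf.sql
#   psql -d postgres -f fix_pxf.sql
```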

PL/R

The HAWQ PL/R extension is provided as a separate RPM in the hdb-add-ons-2.1.0.0 repository. The files installed by this RPM are owned by root. If you installed HAWQ via Ambari, HAWQ files are owned by gpadmin. Perform the following steps on each node in your HAWQ cluster after PL/R RPM installation to align the ownership of PL/R files:

root@hawq-node$ cd /usr/local/hawq
root@hawq-node$ chown gpadmin:gpadmin share/postgresql/contrib/plr.sql docs/contrib/README.plr lib/postgresql/plr.so

Ambari

  • In Ambari-managed clusters, use only Ambari to set system parameters. Parameters modified using the hawq config command are overwritten on Ambari startup or reconfiguration.
  • In certain configurations, the HAWQ master may fail to start in Ambari versions earlier than 2.4.2 when WebHDFS is disabled. Refer to AMBARI-18837. To work around this issue, enable WebHDFS by setting dfs.webhdfs.enabled to true in hdfs-site.xml, or contact Support.
  • When installing HAWQ in a Kerberos-secured cluster, Ambari may report a warning or failure if the HAWQ resource management type is switched to YARN mode during installation. The warning occurs because HAWQ cannot register with YARN until the HDFS and YARN services are restarted with the new configurations produced by the HAWQ installation process.
  • The HAWQ standby master will not work after you change the HAWQ master port number. To enable the standby master you must first remove and then re-initialize it. See Removing the HAWQ Standby Master and Activating the HAWQ Standby Master.
  • The Ambari Re-Synchronize HAWQ Standby Master service action fails if there is an active connection to the HAWQ master node. The HAWQ task output shows the error "Active connections. Aborting shutdown...". If this occurs, close all active connections and then try the re-synchronize action again.
  • The Ambari Run Service Check action for HAWQ and PXF may not work properly on a secure cluster if PXF is not co-located with the YARN component.
  • In a secured cluster, if you move the YARN Resource Manager to another host you must manually update hadoop.proxyuser.yarn.hosts in the HDFS core-site.xml file to match the new Resource Manager hostname. If you do not perform this step, HAWQ segments fail to get resources from the Resource Manager.
  • The Ambari Stop HAWQ Server (Immediate Mode) service action or hawq stop -M immediate command may not stop all HAWQ master processes in some cases. Several postgres processes owned by the gpadmin user may remain active.
  • Ambari checks whether the hawq_rm_yarn_address and hawq_rm_yarn_scheduler_address values are valid when YARN HA is not enabled. In clusters that use YARN HA, these properties are not used and may get out of sync with the active Resource Manager. This can lead to false warnings from Ambari if you try to change the property value.
  • Ambari does not support Custom Configuration Groups with HAWQ.
  • Certain HAWQ server configuration parameters related to resource enforcement are not active. Modifying the parameters has no effect in HAWQ since the resource enforcement feature is not currently supported. These parameters include hawq_re_cgroup_hierarchy_name, hawq_re_cgroup_mount_point, and hawq_re_cpu_enable. These parameters appear in the Advanced hawq-site configuration section of the Ambari management interface.

Workaround Required after Moving Namenode

If you use the Ambari Move Namenode Wizard to move a Hadoop NameNode, the Wizard does not automatically update the HAWQ configuration to reflect the change. This leaves HAWQ in a non-functional state and causes HAWQ service checks to fail with an error similar to:


2017-04-19 21:22:59,138 - SQL command executed failed: export PGPORT=5432 && source
/usr/local/hawq/greenplum_path.sh && psql -d template1 -c \\\\\"CREATE  TABLE
ambari_hawq_test (col1 int) DISTRIBUTED RANDOMLY;\\\\\"
Returncode: 1
Stdout:
Stderr: Warning: Permanently added 'ip-10-32-36-168.ore1.vpc.pivotal.io,10.32.36.168'
(RSA) to the list of known hosts.
WARNING:  could not remove relation directory 16385/1/18366: Input/output error
CONTEXT:  Dropping file-system object -- Relation Directory: '16385/1/18366'
ERROR:  could not create relation directory
hdfs://ip-10-32-36-168.ore1.vpc.pivotal.io:8020/hawq_default/16385/1/18366: Input/output error

2016-04-19 21:22:59,139 - SERVICE CHECK FAILED: HAWQ was not able to write and query from a table
2016-04-19 21:23:02,608 - ** FAILURE **: Service check failed 1 of 3 checks
stdout: /var/lib/ambari-agent/data/output-281.txt

To work around this problem, perform one of the following procedures after you complete the Move Namenode Wizard.

Workaround for Non-HA NameNode Clusters:
  1. Perform an HDFS service check to ensure that HDFS is running properly after you moved the NameNode.
  2. Use the Ambari configs.sh utility to update hawq_dfs_url to the new NameNode address. See Modify configurations on the Ambari Wiki for more information. For example:

    $ cd /var/lib/ambari-server/resources/scripts/
    $ ./configs.sh set {ambari_server_host} {clustername} hawq-site hawq_dfs_url {new_namenode_address}:{port}/hawq_default
    
  3. Restart the HAWQ service to apply the configuration change.

  4. Use ssh to log in to the HAWQ master node and run the checkpoint command:

    $ psql -d template1 -c "checkpoint"
    
  5. Stop the HAWQ service.

  6. The hawq_master_directory property in the $GPHOME/etc/hawq-site.xml file identifies the master data directory. Copy the master data directory to a backup location:

    $ export MDATA_DIR=/value/from/hawqsite
    $ cp -r $MDATA_DIR /catalog/backup/location
    
  7. Execute this query to display all available HAWQ filespaces:

    SELECT fsname, fsedbid, fselocation
      FROM pg_filespace AS sp, pg_filespace_entry AS entry, pg_filesystem AS fs
     WHERE sp.fsfsys = fs.oid AND fs.fsysname = 'hdfs' AND sp.oid = entry.fsefsoid
     ORDER BY entry.fsedbid;

          fsname | fsedbid | fselocation
    -------------+---------+------------------------------------------------
    cdbfast_fs_a | 0       | hdfs://hdfs-cluster/hawq//cdbfast_fs_a
    dfs_system   | 0       | hdfs://test5:9000/hawq/hawq-1459499690
    (2 rows)

  8. Run the hawq filespace command for each filespace returned by the previous query, specifying the new NameNode host and port in the --location value. For example:

    $ hawq filespace --movefilespace dfs_system --location=hdfs://new_namenode:port/hawq/hawq-1459499690
    $ hawq filespace --movefilespace cdbfast_fs_a --location=hdfs://new_namenode:port/hawq//cdbfast_fs_a

  9. If your cluster uses a HAWQ standby master, reinitialize the standby master in Ambari using the Remove Standby Wizard followed by the Add Standby Wizard.

  10. Start the HAWQ service.

  11. Run a HAWQ service check to ensure that all tests pass.
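Two steps in the non-HA procedure involve copying values by hand: reading hawq_master_directory from hawq-site.xml for the backup, and rewriting each filespace location so that it points at the new NameNode. The POSIX shell sketch below automates both under stated assumptions: hawq-site.xml uses the usual Hadoop <property>/<name>/<value> layout with each <value> on or after the line holding its <name>, and filespace rows arrive as fsname|fselocation (the psql unaligned, tuples-only format). The function names and the FILESPACE_QUERY variable are hypothetical. The sketch only prints hawq filespace commands; review them before running any.

```shell
#!/bin/sh
# get_hawq_site_property NAME FILE: print the <value> that follows <name>NAME</name>.
get_hawq_site_property() {
    awk -v n="$1" '
        index($0, "<name>" n "</name>") { found = 1 }
        found && match($0, /<value>[^<]*<\/value>/) {
            print substr($0, RSTART + 7, RLENGTH - 15)   # strip <value> and </value>
            exit
        }
    ' "$2"
}

# gen_move_commands NEW_AUTHORITY: read "fsname|hdfs://old_host:port/path" lines on
# stdin and print one hawq filespace --movefilespace command per filespace, with the
# old host:port replaced by NEW_AUTHORITY.
gen_move_commands() {
    while IFS='|' read -r fsname location; do
        [ -n "$fsname" ] || continue
        path=${location#hdfs://}   # drop the scheme
        path=${path#*/}            # drop the old authority (host:port)
        printf 'hawq filespace --movefilespace %s --location=hdfs://%s/%s\n' \
            "$fsname" "$1" "$path"
    done
}

# Example use (not executed here). FILESPACE_QUERY is assumed to hold the filespace
# query from the procedure, changed to select only fsname and fselocation:
#   export MDATA_DIR=$(get_hawq_site_property hawq_master_directory "$GPHOME/etc/hawq-site.xml")
#   cp -r "$MDATA_DIR" /catalog/backup/location
#   psql -d template1 -At -c "$FILESPACE_QUERY" | gen_move_commands new_namenode:8020
```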

Workaround for HA NameNode Clusters:
  1. Perform an HDFS service check to ensure that HDFS is running properly after you moved the NameNode.
  2. Use Ambari to expand Custom hdfs-client in the HAWQ Configs tab, then update the dfs.namenode.* properties to match the current NameNode configuration.
  3. Restart the HAWQ service to apply the configuration change.
  4. Run a HAWQ service check to ensure that all tests pass.