Accessing HBase Data
Before trying to access HBase data with PXF, verify the following:
/etc/hbase/conf/hbase-env.shconfiguration file must reference the
pxf-hbase.jar. For example,
/etc/hbase/conf/hbase-env.shshould include the line:
Note: You must restart HBase after making any changes to the HBase configuration.
PXF HBase plug-in is installed on all cluster nodes.
HBase and ZooKeeper jars are installed on all cluster nodes.
To create an external HBase table, use the following syntax:
CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name ( column_name data_type [, ...] | LIKE other_table ) LOCATION ('pxf://host[:port]/hbase-table-name?Profile=HBase') FORMAT 'CUSTOM' (Formatter='pxfwritable_import');
The HBase profile is equivalent to the following PXF parameters:
Most HAWQ external tables (PXF or others) require that the HAWQ table attributes match the source data record layout, and include all the available attributes. With HAWQ, however, you use the PXF HBase plug-in to specify the subset of HBase qualifiers that define the HAWQ PXF table. To set up a clear mapping between each attribute in the PXF table and a specific qualifier in the HBase table, you can use either direct mapping or indirect mapping. In addition, the HBase row key is handled in a special way.
You can use the HBase table row key in several ways. For example, you can see them using query results, or you can run a WHERE clause filter on a range of row key values. To use the row key in the HAWQ query, define the HAWQ table with the reserved PXF attribute
recordkey. This attribute name tells PXF to return the record key in any key-value based system and in HBase.
Note: Because HBase is byte and not character-based, you should define the recordkey as type bytea. This may result in better ability to filter data and increase performance.
CREATE EXTERNAL TABLE <tname> (recordkey bytea, ... ) LOCATION ('pxf:// ...')
Use Direct Mapping to map HAWQ table attributes to HBase qualifiers. You can specify the HBase qualifier names of interest, with column family names included, as quoted values.
For example, you have defined an HBase table called
hbase_sales with multiple column families and many qualifiers. To create a HAWQ table with these attributes:
saleidin the column family
commentsin the column family
use the following
CREATE EXTERNAL TABLE syntax:
CREATE EXTERNAL TABLE hbase_sales ( recordkey bytea, "cf1:saleid" int, "cf8:comments" varchar ) ...
The PXF HBase plug-in uses these attribute names as-is and returns the values of these HBase qualifiers.
The direct mapping method is fast and intuitive, but using indirect mapping helps to reconcile HBase qualifier names with HAWQ behavior:
- HBase qualifier names may be longer than 32 characters. HAWQ has a 32-character limit on attribute name size.
- HBase qualifier names can be binary or non-printable. HAWQ attribute names are character based.
In either case, Indirect Mapping uses a lookup table on HBase. You can create the lookup table to store all necessary lookup information. This works as a template for any future queries. The name of the lookup table must be
pxflookup and must include the column family named
Using the sales example in Direct Mapping, if our
rowkey represents the HBase table name and the
mapping column family includes the actual attribute mapping in the key value form of
<hawq attr name>=<hbase cf:qualifier>.
This example maps the
saleid qualifier in the
cf1 column family to the HAWQ
id column and the
comments qualifier in the
cf8 family to the HAWQ
The mapping assigned new names for each qualifier. You can use these names in your HAWQ table definition:
CREATE EXTERNAL TABLE hbase_sales ( recordkey bytea id int, cmts varchar ) ...
PXF automatically matches HAWQ to HBase column names when a
pxflookup table exists in HBase.