Working with PXF and External Data

HAWQ Extension Framework (PXF) is an extensible framework that allows HAWQ to query data stored in external systems.

PXF includes built-in connectors for accessing data inside HDFS files, Hive tables, and HBase tables. PXF also integrates with HCatalog to query Hive tables directly.

PXF also allows users to create custom connectors to access other parallel data stores or processing engines. To learn how to build such connectors as Java plug-ins, see PXF External Tables and API.

  • Installing PXF Plug-ins

    This topic describes how to install the built-in PXF service plug-ins required to connect PXF to HDFS, Hive, and HBase. Install the appropriate RPMs on each node in your cluster.

  • Configuring PXF

    This topic describes how to configure the PXF service.

  • Accessing HDFS File Data

    This topic describes how to access HDFS file data using PXF.
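    As a minimal sketch, a comma-delimited HDFS text file can be exposed as a readable external table through the PXF protocol. The host (namenode), port (51200), file path, and column names below are assumptions for illustration, not values from this document:

    ```sql
    -- Expose an HDFS text file as a readable external table via PXF.
    -- Host, port, path, and columns are hypothetical.
    CREATE EXTERNAL TABLE pxf_hdfs_sales (location text, month text, num_orders int)
      LOCATION ('pxf://namenode:51200/data/pxf_examples/sales.txt?PROFILE=HdfsTextSimple')
      FORMAT 'TEXT' (DELIMITER ',');

    -- Query it like any other HAWQ table.
    SELECT * FROM pxf_hdfs_sales;
    ```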

  • Accessing Hive Data

    This topic describes how to access Hive data using PXF. You have two options for querying data stored in Hive: create PXF external tables and query those, or query Hive tables directly through HAWQ's HCatalog integration. With the latter, HAWQ reads the Hive table metadata stored in HCatalog, so no external table definition is needed.
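    With the HCatalog integration, a Hive table is addressed by prefixing its database and table name with hcatalog. A hedged sketch (the database and table names here are hypothetical):

    ```sql
    -- Query a Hive table directly; HAWQ fetches the table's schema
    -- from HCatalog, so no CREATE EXTERNAL TABLE step is required.
    -- "default" and "sales_part" are placeholder names.
    SELECT * FROM hcatalog.default.sales_part;
    ```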

  • Accessing HBase Data

    This topic describes how to access HBase data using PXF.
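    For HBase, a PXF external table maps the HBase row key to the reserved recordkey column and other columns by their "column-family:qualifier" names. A sketch, assuming a PXF service at namenode:51200 and a hypothetical HBase table named orders:

    ```sql
    -- Map an HBase table. recordkey receives the HBase row key;
    -- quoted column names follow the "family:qualifier" convention.
    CREATE EXTERNAL TABLE pxf_hbase_orders (recordkey text, "cf:amount" int)
      LOCATION ('pxf://namenode:51200/orders?PROFILE=HBase')
      FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
    ```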

  • Using Profiles to Read and Write Data

    PXF profiles are collections of common metadata attributes that can be used to simplify the reading and writing of data. You can use any of the built-in profiles that come with PXF or you can create your own.
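    A built-in profile such as HdfsTextSimple is shorthand for naming the underlying Fragmenter, Accessor, and Resolver plug-in classes individually in the LOCATION URI. The sketch below shows the expanded form using class names from the Apache HAWQ HDFS plug-in; the host, port, and path are assumptions (adjacent string constants separated by a newline are concatenated, as in PostgreSQL):

    ```sql
    -- Equivalent to PROFILE=HdfsTextSimple, spelled out as the three
    -- plug-in classes a profile bundles together. Host/port/path are
    -- placeholders.
    CREATE EXTERNAL TABLE sales_no_profile (location text, month text, num_orders int)
      LOCATION ('pxf://namenode:51200/data/pxf_examples/sales.txt'
                '?FRAGMENTER=org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter'
                '&ACCESSOR=org.apache.hawq.pxf.plugins.hdfs.LineBreakAccessor'
                '&RESOLVER=org.apache.hawq.pxf.plugins.hdfs.StringPassResolver')
      FORMAT 'TEXT' (DELIMITER ',');
    ```

    Defining a custom profile that bundles your own plug-in classes lets users reference it by name instead of repeating the three class parameters in every table definition.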

  • PXF External Tables and API

    You can use the PXF API to create your own connectors to access any other type of parallel data store or processing engine.

  • Troubleshooting PXF