Using Profiles to Read and Write Data

PXF profiles are collections of common metadata attributes that can be used to simplify the reading and writing of data. You can use any of the built-in profiles that come with PXF or you can create your own.

For example, if you are writing single-line records to text files on HDFS, you could use the built-in HdfsTextSimple profile. You specify this profile when you create the PXF external table used to write the data to HDFS.
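As a rough sketch of what specifying the profile looks like, the following Python helper composes a CREATE WRITABLE EXTERNAL TABLE statement whose pxf:// LOCATION URI names PROFILE=HdfsTextSimple. The helper name writable_table_ddl, the host namenode, port 51200, the HDFS path, and the column list are hypothetical placeholders. Check the PXF external table syntax for your HAWQ release before relying on the exact DDL shape.

```python
# Sketch only: builds the DDL text for a writable PXF external table.
# Host, port, path, table, and column names are hypothetical placeholders.

# Of the built-in profiles described here, only HdfsTextSimple is documented
# as supporting writes; the others are read-only.
WRITABLE_PROFILES = {"HdfsTextSimple"}


def writable_table_ddl(table, columns, host, port, path,
                       profile="HdfsTextSimple", delimiter=","):
    """Return a CREATE WRITABLE EXTERNAL TABLE statement that uses a PXF profile."""
    if profile not in WRITABLE_PROFILES:
        raise ValueError(f"profile {profile!r} does not support writing")
    # Render the column list as "name type, name type, ...".
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    # The profile is selected through the PROFILE option of the LOCATION URI.
    location = f"pxf://{host}:{port}/{path.lstrip('/')}?PROFILE={profile}"
    return (
        f"CREATE WRITABLE EXTERNAL TABLE {table} ({cols})\n"
        f"LOCATION ('{location}')\n"
        f"FORMAT 'TEXT' (delimiter=E'{delimiter}');"
    )


if __name__ == "__main__":
    print(writable_table_ddl(
        "sales_out",
        [("location", "text"), ("month", "text"), ("num_orders", "int")],
        host="namenode", port=51200, path="/data/sales",
    ))
```

Running the script prints a three-line statement whose LOCATION is 'pxf://namenode:51200/data/sales?PROFILE=HdfsTextSimple'. Rejecting read-only profiles up front is a design choice of this sketch, not a documented PXF check.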

Built-In Profiles

PXF comes with a number of built-in profiles that group together a collection of metadata attributes. PXF built-in profiles simplify access to the following types of data storage systems:

  • HDFS File Data (Read + Write)
  • Hive (Read only)
  • HBase (Read only)

You can specify a built-in profile to read data from HDFS files, Hive tables, and HBase tables, and to write data to HDFS files.

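The built-in profiles are listed below. Each entry gives the profile name and description, followed by its Fragmenter, Accessor, and Resolver classes.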
HdfsTextSimple Read or write delimited single-line records from or to plain text files on HDFS.
  • org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter
  • org.apache.hawq.pxf.plugins.hdfs.LineBreakAccessor
  • org.apache.hawq.pxf.plugins.hdfs.StringPassResolver
HdfsTextMulti Read delimited single-line or multi-line records (with quoted linefeeds) from plain text files on HDFS. This profile is not splittable (non-parallel), so reading is slower than with HdfsTextSimple.
  • org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter
  • org.apache.hawq.pxf.plugins.hdfs.QuotedLineBreakAccessor
  • org.apache.hawq.pxf.plugins.hdfs.StringPassResolver
Hive Use this when connecting to Hive. The Hive table can use any of the available storage formats: text, RC, ORC, Sequence, or Parquet.
  • org.apache.hawq.pxf.plugins.hive.HiveDataFragmenter
  • org.apache.hawq.pxf.plugins.hive.HiveAccessor
  • org.apache.hawq.pxf.plugins.hive.HiveResolver
HiveRC Use this profile when connecting to a Hive table where each partition is stored as an RCFile; the profile is optimized for that format.
Note: The DELIMITER parameter is mandatory.
  • org.apache.hawq.pxf.plugins.hive.HiveInputFormatFragmenter
  • org.apache.hawq.pxf.plugins.hive.HiveRCFileAccessor
  • org.apache.hawq.pxf.plugins.hive.HiveColumnarSerdeResolver
HiveText Use this profile when connecting to a Hive table where each partition is stored as a text file; the profile is optimized for that format.
Note: The DELIMITER parameter is mandatory.
  • org.apache.hawq.pxf.plugins.hive.HiveInputFormatFragmenter
  • org.apache.hawq.pxf.plugins.hive.HiveLineBreakAccessor
  • org.apache.hawq.pxf.plugins.hive.HiveStringPassResolver
HBase Use this profile when connecting to an HBase data store.
  • org.apache.hawq.pxf.plugins.hbase.HBaseDataFragmenter
  • org.apache.hawq.pxf.plugins.hbase.HBaseAccessor
  • org.apache.hawq.pxf.plugins.hbase.HBaseResolver
Avro Use this profile for reading Avro files (fileName.avro).
  • org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter
  • org.apache.hawq.pxf.plugins.hdfs.AvroFileAccessor
  • org.apache.hawq.pxf.plugins.hdfs.AvroResolver
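A profile name is shorthand for the plug-in classes in its entry. The sketch below, built around a hypothetical helper named expand_profile, records the entries above as a Python dictionary and expands a profile into explicit FRAGMENTER, ACCESSOR, and RESOLVER options. It rejects HiveRC and HiveText when no DELIMITER is given. Rendering the result as an &-joined query string mirrors the pxf:// LOCATION URI, but the helper is an illustration under that assumption, not PXF's own code.

```python
# Sketch: the built-in profile entries as data, plus a hypothetical helper that
# expands a profile name into the plug-in classes it stands for.

PKG = "org.apache.hawq.pxf.plugins"

# (fragmenter, accessor, resolver), relative to PKG, copied from the entries above.
BUILTIN_PROFILES = {
    "HdfsTextSimple": ("hdfs.HdfsDataFragmenter", "hdfs.LineBreakAccessor", "hdfs.StringPassResolver"),
    "HdfsTextMulti": ("hdfs.HdfsDataFragmenter", "hdfs.QuotedLineBreakAccessor", "hdfs.StringPassResolver"),
    "Hive": ("hive.HiveDataFragmenter", "hive.HiveAccessor", "hive.HiveResolver"),
    "HiveRC": ("hive.HiveInputFormatFragmenter", "hive.HiveRCFileAccessor", "hive.HiveColumnarSerdeResolver"),
    "HiveText": ("hive.HiveInputFormatFragmenter", "hive.HiveLineBreakAccessor", "hive.HiveStringPassResolver"),
    "HBase": ("hbase.HBaseDataFragmenter", "hbase.HBaseAccessor", "hbase.HBaseResolver"),
    "Avro": ("hdfs.HdfsDataFragmenter", "hdfs.AvroFileAccessor", "hdfs.AvroResolver"),
}

# Profiles whose entries state that the DELIMITER parameter is mandatory.
NEEDS_DELIMITER = {"HiveRC", "HiveText"}


def expand_profile(profile, options=None):
    """Return 'FRAGMENTER=...&ACCESSOR=...&RESOLVER=...' plus any extra options."""
    options = dict(options or {})
    if profile not in BUILTIN_PROFILES:
        raise KeyError(f"unknown built-in profile: {profile}")
    if profile in NEEDS_DELIMITER and "DELIMITER" not in options:
        raise ValueError(f"{profile} requires the DELIMITER parameter")
    fragmenter, accessor, resolver = (f"{PKG}.{cls}" for cls in BUILTIN_PROFILES[profile])
    expanded = {"FRAGMENTER": fragmenter, "ACCESSOR": accessor, "RESOLVER": resolver}
    expanded.update(options)  # extra options keep their insertion order
    return "&".join(f"{key}={value}" for key, value in expanded.items())


if __name__ == "__main__":
    print(expand_profile("HdfsTextSimple"))
    print(expand_profile("HiveText", {"DELIMITER": r"\x2c"}))  # hex-escaped comma
```

Keeping the mapping as plain data makes it easy to compare against the classes listed for each profile; the DELIMITER check only enforces what the HiveRC and HiveText notes state.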

Adding and Updating Profiles

Administrators can add new profiles or edit the built-in profiles in /etc/pxf/conf/pxf-profiles.xml. Any profile defined in this file is then available for use.

Note: Add any JAR files that contain custom profile plug-ins to the /etc/pxf/conf/pxf-public.classpath configuration file.

Each profile has a mandatory unique name and an optional description.

In addition, each profile contains a set of plug-ins, which form an extensible set of metadata attributes.

After you make changes in pxf-profiles.xml (or any other PXF configuration file), propagate the changes to all nodes with PXF installed, and then restart the PXF service on all nodes.

Custom Profile Example

<profile>
    <name>MyCustomProfile</name>
    <description>A Custom Profile Example</description>
    <plugins>
        <fragmenter>package.name.CustomProfileFragmenter</fragmenter>
        <accessor>package.name.CustomProfileAccessor</accessor>
        <customPlugin1>package.name.MyCustomPluginValue1</customPlugin1>
        <customPlugin2>package.name.MyCustomPluginValue2</customPlugin2>
    </plugins>
</profile>
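To make the rules for profile entries concrete (a mandatory, unique name; an optional description; an open-ended set of plug-in elements), the following sketch parses the custom profile above with Python's standard xml.etree.ElementTree module. It assumes the entries sit inside a <profiles> root element, and load_profiles is a hypothetical helper, not the loader PXF itself uses.

```python
import xml.etree.ElementTree as ET

# The custom profile example, wrapped in a <profiles> root element
# (assumption: pxf-profiles.xml groups its <profile> entries this way).
PROFILES_XML = """
<profiles>
    <profile>
        <name>MyCustomProfile</name>
        <description>A Custom Profile Example</description>
        <plugins>
            <fragmenter>package.name.CustomProfileFragmenter</fragmenter>
            <accessor>package.name.CustomProfileAccessor</accessor>
            <customPlugin1>package.name.MyCustomPluginValue1</customPlugin1>
            <customPlugin2>package.name.MyCustomPluginValue2</customPlugin2>
        </plugins>
    </profile>
</profiles>
"""


def load_profiles(xml_text):
    """Map each profile name to (description or None, {plugin tag: value})."""
    root = ET.fromstring(xml_text.strip())
    profiles = {}
    for profile in root.findall("profile"):
        name = (profile.findtext("name") or "").strip()
        if not name:
            raise ValueError("every profile needs a <name>")
        if name in profiles:
            raise ValueError(f"duplicate profile name: {name}")
        description = profile.findtext("description")  # optional; None if absent
        # Accept any child of <plugins>, so custom plug-in tags pass through.
        plugins = {child.tag: (child.text or "").strip()
                   for child in profile.findall("plugins/*")}
        profiles[name] = (description, plugins)
    return profiles


if __name__ == "__main__":
    for name, (description, plugins) in load_profiles(PROFILES_XML).items():
        print(name, description, plugins)
```

Reading plug-ins with the plugins/* path rather than fixed tag names is what lets customPlugin1 and customPlugin2 pass through alongside fragmenter and accessor.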