Write the gpfdist Configuration
The gpfdist configuration is specified as a YAML 1.1 document. It defines the rules that
gpfdist uses to select a transform to apply when loading or extracting data.
The gpfdist configuration in this example contains the following items:
The input_transform.sh wrapper script, referenced in the configuration file.
The input_transform.stx Joost transformation, called from input_transform.sh.
Aside from the ordinary YAML rules, such as starting the document with three dashes (---), a
gpfdist configuration must conform to the following restrictions:
A VERSION setting must be present with the value 1.0.0.1.
A TRANSFORMATIONS setting must be present and contain one or more mappings.
Each mapping in the TRANSFORMATIONS setting must contain:
  a TYPE with the value 'input' or 'output'
  a COMMAND indicating how the transform is run.
Each mapping in the TRANSFORMATIONS setting can also contain optional settings.
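The restrictions above can be spot-checked before gpfdist is started. The following is a minimal sketch, not a gpfdist feature: check_gpfdist_config is a hypothetical helper that only greps for the required keys. It does not parse YAML and does not verify that every mapping has both a TYPE and a COMMAND.

```shell
#!/bin/bash
# check_gpfdist_config FILE
# Hypothetical helper (not part of gpfdist): a rough line-based sanity check
# that the required settings appear somewhere in the configuration file.
check_gpfdist_config() {
    file=$1

    # The document must start with three dashes.
    head -n 1 "$file" | grep -q '^---' \
        || { echo "missing leading ---"; return 1; }

    # VERSION and TRANSFORMATIONS are top-level keys, so they start in column 1.
    grep -Eq '^VERSION:[[:space:]]*[^[:space:]]+' "$file" \
        || { echo "missing VERSION"; return 1; }
    grep -Eq '^TRANSFORMATIONS:' "$file" \
        || { echo "missing TRANSFORMATIONS"; return 1; }

    # TYPE and COMMAND are nested, so they must be indented.
    grep -Eq '^[[:space:]]+TYPE:[[:space:]]*(input|output)[[:space:]]*$' "$file" \
        || { echo "missing TYPE with value input or output"; return 1; }
    grep -Eq '^[[:space:]]+COMMAND:[[:space:]]*[^[:space:]]' "$file" \
        || { echo "missing COMMAND"; return 1; }

    echo "ok"
}

# Usage (assumes the configuration file is in the current directory):
#   check_gpfdist_config config.YAML
```

A line-based check like this catches a missing key or an unindented TYPE line, which are easy mistakes to make because indentation is significant in YAML. A real YAML parser would be needed for anything stricter.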
The following gpfdist configuration, called config.YAML, applies to the prices example. The initial indentation on each line is significant and reflects the hierarchical nature of the specification. The name prices_input in the following example is referenced later when creating the table in SQL.
    ---
    VERSION: 1.0.0.1
    TRANSFORMATIONS:
      prices_input:
        TYPE:     input
        COMMAND:  /bin/bash input_transform.sh %filename%
The COMMAND setting uses a wrapper script called input_transform.sh with a %filename% placeholder. When gpfdist runs the prices_input transform, it invokes input_transform.sh with /bin/bash and replaces the %filename% placeholder with the path to the input file to transform. The input_transform.sh wrapper script contains the logic to invoke the STX transformation and return the output.
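To make the placeholder substitution concrete, the sketch below mimics it in the shell. gpfdist performs this step internally. The expand_command function and the /data/prices.xml path are hypothetical and exist only for illustration.

```shell
#!/bin/bash
# expand_command TEMPLATE PATH
# Hypothetical helper: mimics the %filename% substitution described above.
expand_command() {
    template=$1   # the COMMAND value from the configuration
    path=$2       # path of the input file to transform

    # '|' is used as the sed delimiter so slashes in the path need no escaping.
    # Assumes the path contains no '|', '&', or backslash characters.
    printf '%s\n' "$template" | sed "s|%filename%|$path|g"
}

# /data/prices.xml is a made-up path for this example.
expand_command '/bin/bash input_transform.sh %filename%' '/data/prices.xml'
# prints: /bin/bash input_transform.sh /data/prices.xml
```

The expanded command line shows that the wrapper script receives the file path as its first argument, which is why the script refers to it as $1.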
If Joost is used, the Joost STX engine must be installed.
    #!/bin/bash
    # input_transform.sh - sample input transformation,
    # demonstrating use of Java and Joost STX to convert XML into
    # text to load into HAWQ.
    # java arguments:
    #   -jar joost.jar         joost STX engine
    #   -nodecl                don't generate a <?xml?> declaration
    #   $1                     filename to process
    #   input_transform.stx    the STX transformation
    #
    # the AWK step eliminates a blank line joost emits at the end
    java \
        -jar joost.jar \
        -nodecl \
        "$1" \
        input_transform.stx \
     | awk 'NF>0'
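The final awk step can be tried on its own, without Java or Joost installed. The sketch below wraps the same awk program in a hypothetical drop_blank_lines function and feeds it made-up sample rows.

```shell
#!/bin/bash
# drop_blank_lines: the same awk filter the wrapper script applies.
# NF is the number of fields on the current line. It is 0 for empty or
# whitespace-only lines, so 'NF>0' prints only lines that carry data.
drop_blank_lines() {
    awk 'NF>0'
}

# The sample rows are invented for illustration.
printf 'row one\n\nrow two\n\n' | drop_blank_lines
# prints:
# row one
# row two
```

Filtering on NF rather than on line length also drops lines that contain only spaces or tabs.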
The input_transform.sh file uses the Joost STX engine with the AWK interpreter. The following diagram shows the process flow as gpfdist runs the transformation.