Configuration File Format
The gpfdist configuration file uses the YAML 1.1 document format and implements its own schema for defining transformation parameters. The configuration file must be a valid YAML document.
The gpfdist program processes the document in order and uses indentation (spaces) to determine the document hierarchy and the relationships of the sections to one another. White space is significant: do not use it purely for formatting, and do not use tabs.
The following is the basic structure of a configuration file.
---
VERSION: 1.0.0.1
TRANSFORMATIONS:
  transformation_name1:
    TYPE: input | output
    COMMAND: command
    CONTENT: data | paths
    SAFE: posix-regex
    STDERR: server | console
  transformation_name2:
    TYPE: input | output
    COMMAND: command
...
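As a concrete illustration of that structure, the following shell snippet writes a minimal configuration file with a single input transformation. The file name transform_config.yaml, the transformation name xml_to_text, and the script path /opt/transforms/input_transform.sh are hypothetical, and the VERSION value assumes schema version 1.0.0.1. The heredoc delimiter is quoted so the shell does not expand %filename% or any $ inside the file, and every nesting level is indented with spaces only.

```shell
#!/bin/sh
# Write a minimal gpfdist transformation config (names and paths are hypothetical).
# The quoted 'EOF' delimiter stops the shell from expanding anything inside.
cat > transform_config.yaml <<'EOF'
---
VERSION: 1.0.0.1
TRANSFORMATIONS:
  xml_to_text:
    TYPE: input
    COMMAND: /bin/bash /opt/transforms/input_transform.sh %filename%
    CONTENT: data
    STDERR: server
EOF

# YAML nesting here depends on spaces only; fail if a tab slipped in.
if grep -q "$(printf '\t')" transform_config.yaml; then
  echo "error: tab character found in transform_config.yaml" >&2
  exit 1
fi
```

The resulting file is the kind of file passed to gpfdist with its -c option (for example, gpfdist -c transform_config.yaml); the other gpfdist options depend on your deployment.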
VERSION
Required. The version of the gpfdist configuration file schema. The current version is 1.0.0.1.
TRANSFORMATIONS
Required. Begins the transformation specification section. A configuration file must have at least one transformation. When gpfdist receives a transformation request, it looks in this section for an entry with the matching transformation name.
TYPE
Required. Specifies the direction of transformation. Values are input or output.
input: gpfdist treats the standard output of the transformation process as a stream of records to load into HAWQ.
output: gpfdist treats the standard input of the transformation process as a stream of records from HAWQ to transform and write to the appropriate output.
COMMAND
Required. Specifies the command gpfdist executes to perform the transformation.
For input transformations, gpfdist invokes the command with %filename% replaced as directed by the CONTENT setting. The command is expected to open the underlying file(s) as appropriate and produce one line of TEXT for each row to load into HAWQ. The input transformation determines whether the entire content is converted to one row or to multiple rows.
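A minimal sketch of an input transformation follows, written as a hypothetical input_transform.sh with a hypothetical function transform_input. It assumes CONTENT: data, so $1 is the path gpfdist substituted for %filename%, and it assumes a made-up source format of one name=value pair per line. Each pair becomes one line of tab-delimited TEXT on standard output, which is the stream gpfdist loads into HAWQ; the tab delimiter assumes the external table uses TEXT format with its default delimiter, and the sketch does not escape backslashes or embedded tabs as real TEXT output would need to.

```shell
#!/bin/sh
# input_transform.sh (hypothetical): convert "name=value" lines to TEXT rows.
# Assumes CONTENT: data, so $1 is the path gpfdist put in place of %filename%.
transform_input() {
  src="$1"
  # Read every line, including a final line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      *=*)
        # Split at the first '=' and emit one tab-delimited row on stdout.
        printf '%s\t%s\n' "${line%%=*}" "${line#*=}"
        ;;
      '')
        # Blank lines produce no row.
        ;;
      *)
        # Diagnostics go to stderr so they never mix with the row stream.
        printf 'skipping line without "=": %s\n' "$line" >&2
        ;;
    esac
  done < "$src"
}

# gpfdist would run the script with the substituted path as its first argument.
if [ "$#" -ge 1 ]; then
  transform_input "$1"
fi
```

Here the transformation chooses one row per input line; a transformation that wanted the whole file as a single row would instead emit exactly one line for the entire input.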
For output transformations, gpfdist invokes the command with %filename% replaced as directed by the CONTENT setting. The output command is expected to open and write to the underlying file(s) as appropriate. The output transformation determines the final placement of the converted output.
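For the output direction, a minimal sketch: the hypothetical output_transform.sh below (function name transform_output, also hypothetical) reads tab-delimited TEXT rows from standard input, which is where gpfdist delivers the records from HAWQ, and writes them as comma-separated lines to the path substituted for %filename%, again assuming CONTENT: data. The comma conversion is deliberately naive and does not quote fields that contain commas.

```shell
#!/bin/sh
# output_transform.sh (hypothetical): write rows arriving on stdin to a file.
# $1 is the destination path gpfdist substitutes for %filename%.
transform_output() {
  dest="$1"
  # Replace each tab delimiter with a comma; no CSV quoting is attempted.
  tr '\t' ',' > "$dest"
}

# When run by gpfdist, the substituted path arrives as the first argument.
if [ "$#" -ge 1 ]; then
  transform_output "$1"
fi
```

The script decides where the converted data lands (here, exactly the substituted path); a different script could just as well split rows across several files.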
CONTENT
Optional. The values are data and paths. The default value is data.
When CONTENT specifies data, the text %filename% in the COMMAND section is replaced by the path to the file to read or write.
When CONTENT specifies paths, the text %filename% in the COMMAND section is replaced by the path to a temporary file that contains the list of files to read or write.
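When CONTENT is paths, the substituted argument is not a data file but a temporary file listing the files to process. The sketch below (hypothetical function transform_paths) ASSUMES that list holds one path per line; the exact layout of the temporary file is not specified in this section. It concatenates each readable listed file to standard output, as a simple input transformation might.

```shell
#!/bin/sh
# Hypothetical input transformation for CONTENT: paths.
# $1 names a temporary file that lists the files to read; this sketch
# ASSUMES one path per line.
transform_paths() {
  list="$1"
  while IFS= read -r path || [ -n "$path" ]; do
    [ -n "$path" ] || continue        # ignore blank lines
    if [ -r "$path" ]; then
      cat "$path"                     # pass each file's rows through unchanged
    else
      printf 'cannot read %s\n' "$path" >&2
    fi
  done < "$list"
}

if [ "$#" -ge 1 ]; then
  transform_paths "$1"
fi
```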
The following example COMMAND setting contains the text %filename%, which gpfdist replaces:
COMMAND: /bin/bash input_transform.sh %filename%
SAFE
Optional. A POSIX regular expression that paths must match before they are passed to the transformation. Specify SAFE when there is a concern about injection or improper interpretation of paths passed to the command. The default is no restriction on paths.
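One way to sanity-check a SAFE pattern before deploying it is to try it against sample paths with grep. The pattern and the check_path function below are hypothetical. Because this section does not say whether gpfdist applies basic or extended POSIX syntax, the pattern sticks to constructs that mean the same in both (anchors, a bracket expression, * and an escaped dot), and the sketch tests it both ways. Anchoring with ^ and $ makes the pattern constrain the whole path, and single-quoting it in the YAML keeps the backslash literal.

```shell
#!/bin/sh
# Hypothetical SAFE pattern: only .xml files directly under /data/staging/.
# Uses only syntax shared by POSIX basic and extended regular expressions.
SAFE_PATTERN='^/data/staging/[A-Za-z0-9_]*\.xml$'

# Print the config line as it would appear under a transformation entry.
printf "    SAFE: '%s'\n" "$SAFE_PATTERN"

# Report whether a candidate path matches under both BRE and ERE grep.
check_path() {
  if printf '%s\n' "$1" | grep -q -- "$SAFE_PATTERN" &&
     printf '%s\n' "$1" | grep -Eq -- "$SAFE_PATTERN"; then
    echo "allowed: $1"
  else
    echo "rejected: $1"
  fi
}

check_path /data/staging/prices_2024.xml   # prints "allowed: /data/staging/prices_2024.xml"
check_path /data/staging/../etc/passwd     # prints "rejected: /data/staging/../etc/passwd"
check_path '/data/staging/a.xml;id'        # prints "rejected: /data/staging/a.xml;id"
```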
STDERR
Optional. The values are server and console.
This setting specifies how to handle standard error output from the transformation. The default, server, specifies that gpfdist captures the standard error output from the transformation in a temporary file and sends the first 8k of that file to HAWQ as an error message. The error message appears as a SQL error.
The console value specifies that gpfdist does not redirect or transmit the standard error output from the transformation.
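The STDERR setting only makes sense if a transformation keeps its diagnostics off standard output, since standard output (for input transformations) is the row stream itself. The sketch below uses a hypothetical emit_rows function that writes rows to stdout and a warning to stderr, prints a STDERR: console fragment, and then captures the two streams separately on the local machine to confirm they never mix; it does not involve gpfdist.

```shell
#!/bin/sh
# Hypothetical transformation: rows on stdout, diagnostics on stderr.
emit_rows() {
  printf 'row1\nrow2\n'                            # data rows
  echo "warning: 1 malformed record skipped" >&2   # diagnostic, handled per STDERR
}

# Config fragment choosing console handling for this transformation.
cat <<'EOF'
    STDERR: console
EOF

# Capture each stream on its own to show they stay separate.
rows=$(emit_rows 2>/dev/null)
diag=$(emit_rows 2>&1 >/dev/null)
printf 'stdout: %s\n' "$rows"
printf 'stderr: %s\n' "$diag"
```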