About gpfdist Setup and Performance
Consider the following scenarios for optimizing your ETL network performance.
- Allow network traffic to use all ETL host Network Interface Cards (NICs) simultaneously. Run one instance of gpfdist on the ETL host, then declare the host name of each NIC in the LOCATION clause of your external table definition (see Creating External Tables - Examples).
- Divide external table data equally among multiple gpfdist instances on the ETL host. For example, on an ETL system with two NICs, run two gpfdist instances (one on each NIC) to optimize data load performance, and divide the external table data files evenly between the two gpfdist instances.
Figure: External Tables Using Multiple gpfdist Instances with Multiple NICs
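The two scenarios above might translate into external table definitions along the following lines. This is a minimal sketch, not a definitive configuration: the host names etlhost-1 and etlhost-2 (one per NIC), the ports, the directories in the comments, and the table and column names are all hypothetical, and the gpfdist instances are assumed to be already running on the ETL host.

```sql
-- Scenario 1 (sketch): one gpfdist instance reached through both NICs.
-- Assumes a single instance was started on the ETL host, for example:
--   gpfdist -d /var/load_files -p 8081 &
-- etlhost-1 and etlhost-2 are hypothetical host names, one per NIC;
-- both URLs use the same port because they reach the same instance.
CREATE EXTERNAL TABLE ext_expenses_single_instance (
    name         text,
    expense_date date,
    amount       float4,
    category     text,
    desc1        text
)
LOCATION ('gpfdist://etlhost-1:8081/*.txt',
          'gpfdist://etlhost-2:8081/*.txt')
FORMAT 'TEXT' (DELIMITER '|');

-- Scenario 2 (sketch): two gpfdist instances, one per NIC.
-- Assumes the data files were split evenly across two directories,
-- each served by its own instance on its own port, for example:
--   gpfdist -d /var/load_files1 -p 8081 &
--   gpfdist -d /var/load_files2 -p 8082 &
CREATE EXTERNAL TABLE ext_expenses_two_instances (
    name         text,
    expense_date date,
    amount       float4,
    category     text,
    desc1        text
)
LOCATION ('gpfdist://etlhost-1:8081/*.txt',
          'gpfdist://etlhost-2:8082/*.txt')
FORMAT 'TEXT' (DELIMITER '|');
```

In both definitions each URL in the LOCATION clause names a different NIC, so load traffic can be spread across the interfaces. The second form also splits the file-serving work between two gpfdist processes.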
Note: Use pipes (|) to separate formatted text when you submit files to gpfdist. HAWQ encloses comma-separated text strings in single or double quotes. gpfdist has to remove the quotes to parse the strings. Using pipes to separate formatted text avoids the extra step and improves performance.