Table Storage Model and Distribution Policy
HAWQ supports several storage models and a mix of storage models. When you create a table, you choose how to store its data. This topic explains the options for table storage and how to choose the best storage model for your workload.
Note: To simplify the creation of database tables, you can specify the default values for some table storage options with the HAWQ server configuration parameter
HAWQ provides storage orientation models of either row-oriented or Parquet tables. Evaluate performance using your own data and query workloads to determine the best alternatives.
Row-oriented storage: good for OLTP types of workloads with many iterative transactions and many columns of a single row needed all at once, so retrieving is efficient.
Note: Column-oriented storage is no longer available. Parquet storage should be used, instead.
Row-oriented storage provides the best options for the following situations:
- Updates of table data. Where you load and update the table data frequently.
- Frequent INSERTs. Where rows are frequently inserted into the table
- Number of columns requested in queries. Where you typically request all or the majority of columns in the
WHEREclause of your queries, choose a row-oriented model.
- Number of columns in the table. Row-oriented storage is most efficient when many columns are required at the same time, or when the row-size of a table is relatively small.
ALTER TABLEcommand changes the definition of a table. Use
ALTER TABLE to change table attributes such as column definitions, distribution policy, storage model, and partition structure (see also Maintaining Partitioned Tables). For example, to add a not-null constraint to a table column:
=> ALTER TABLE address ALTER COLUMN street SET NOT NULL;
ALTER TABLE provides options to change a table’s distribution policy . When the table distribution options change, the table data is redistributed on disk, which can be resource intensive. You can also redistribute table data using the existing distribution policy.
For partitioned tables, changes to the distribution policy apply recursively to the child partitions. This operation preserves the ownership and all other attributes of the table. For example, the following command redistributes the table sales across all segments using the customer_id column as the distribution key:
ALTER TABLE sales SET DISTRIBUTED BY (customer_id);
When you change the hash distribution of a table, table data is automatically redistributed. Changing the distribution policy to a random distribution does not cause the data to be redistributed. For example:
ALTER TABLE sales SET DISTRIBUTED RANDOMLY;
To redistribute table data for tables with a random distribution policy (or when the hash distribution policy has not changed) use
REORGANIZE=TRUE. Reorganizing data may be necessary to correct a data skew problem, or when segment resources are added to the system. For example, the following command redistributes table data across all segments using the current distribution policy, including random distribution.
ALTER TABLE sales SET WITH (REORGANIZE=TRUE);
DROP TABLEcommand removes tables from the database. For example:
DROP TABLE mytable;
To empty a table of rows without removing the table definition, use
TRUNCATE. For example:
DELETE FROM mytable; TRUNCATE mytable;
DROP TABLEalways removes any indexes, rules, triggers, and constraints that exist for the target table. Specify
CASCADEto drop a table that is referenced by a view.
CASCADE removes dependent views.