Monitoring a HAWQ System

You can monitor a HAWQ system using a variety of tools included with the system or available as add-ons.

Observing the day-to-day performance of a HAWQ system helps administrators understand system behavior, plan workflow, and troubleshoot problems. This chapter discusses tools for monitoring database performance and activity.

Also, be sure to review Recommended Monitoring and Maintenance Tasks for monitoring activities you can script to quickly detect problems in the system.

Monitoring System State

As a HAWQ administrator, you must monitor the system for problem events such as a segment going down or running out of disk space on a segment host. The following topics describe how to monitor the health of a HAWQ system and examine certain state information for a HAWQ system.

Checking System State

A HAWQ system comprises multiple PostgreSQL instances (the master and segments) spanning multiple machines. To monitor a HAWQ system, you need information about the system as a whole, as well as status information for the individual instances. The hawq state utility provides status information about a HAWQ system.

Viewing Master and Segment Status and Configuration

The default hawq state action is to check segment instances and show a brief status of the valid and failed segments. For example, to see a quick status of your HAWQ system, type:

$ hawq state -b

You can also display information about the HAWQ master data directory by using hawq state with the -d option:

$ hawq state -d MASTER_DIR

Checking Disk Space Usage

Checking Sizing of Distributed Databases and Tables

The hawq_toolkit administrative schema contains several views that you can use to determine the disk space usage for a distributed HAWQ database, schema, table, or index.

Viewing Disk Space Usage for a Database

To see the total size of a database (in bytes), use the hawq_size_of_database view in the hawq_toolkit administrative schema. For example:

=> SELECT * FROM hawq_toolkit.hawq_size_of_database
ORDER BY sodddatname;

Viewing Disk Space Usage for a Table

The hawq_toolkit administrative schema contains several views for checking the size of a table. The table sizing views list the table by object ID (not by name). To check the size of a table by name, you must look up the relation name (relname) in the pg_class table. For example:

=> SELECT relname AS name, sotdsize AS size, sotdtoastsize
AS toast, sotdadditionalsize AS other
FROM hawq_toolkit.hawq_size_of_table_disk AS sotd, pg_class
WHERE sotd.sotdoid=pg_class.oid ORDER BY relname;

Viewing Disk Space Usage for Indexes

The hawq_toolkit administrative schema contains a number of views for checking index sizes. To see the total size of all index(es) on a table, use the hawq_size_of_all_table_indexes view. To see the size of a particular index, use the hawq_size_of_index view. The index sizing views list tables and indexes by object ID (not by name). To check the size of an index by name, you must look up the relation name (relname) in the pg_class table. For example:

=> SELECT soisize, relname AS indexname
FROM pg_class, hawq_toolkit.hawq_size_of_index
WHERE pg_class.oid=hawq_size_of_index.soioid
AND pg_class.relkind='i';
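
Sizes such as the database size returned by hawq_size_of_database are reported in bytes, which are hard to read for large objects. The following shell helper is a minimal sketch of one way to convert a byte count to a human-readable size. The function name human_size is hypothetical and not part of HAWQ, and the psql call in the comment assumes that the size column of hawq_size_of_database is named sodddatsize.

```shell
# Convert a byte count to a human-readable size using binary units.
# human_size is an illustrative helper, not a HAWQ utility.
human_size() {
  awk -v bytes="$1" 'BEGIN {
    split("B KiB MiB GiB TiB PiB", unit, " ")
    i = 1
    # Divide by 1024 until the value fits the current unit.
    while (bytes >= 1024 && i < 6) { bytes /= 1024; i++ }
    printf "%.1f %s\n", bytes, unit[i]
  }'
}

# Possible use (assumes psql can connect and the column name sodddatsize):
#   psql -At -F ' ' -c "SELECT sodddatname, sodddatsize
#                       FROM hawq_toolkit.hawq_size_of_database" |
#     while read -r name size; do echo "$name $(human_size "$size")"; done
```

Keeping the conversion outside the query leaves the view output untouched, so the same helper can be reused for table and index sizes.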

Viewing Metadata Information about Database Objects

HAWQ tracks various metadata information in its system catalogs about the objects stored in a database, such as tables, views, indexes and so on, as well as global objects such as roles and tablespaces.

Viewing the Last Operation Performed

You can use the system views pg_stat_operations and pg_stat_partition_operations to look up actions performed on an object, such as a table. For example, to see the actions performed on a table, such as when it was created and when it was last vacuumed and analyzed:

=> SELECT schemaname as schema, objname as table,
usename as role, actionname as action,
subtype as type, statime as time
FROM pg_stat_operations
WHERE objname='cust';
 schema | table | role | action  | type  | time
--------+-------+------+---------+-------+--------------------------
  sales | cust  | main | CREATE  | TABLE | 2010-02-09 18:10:07.867977-08
  sales | cust  | main | VACUUM  |       | 2010-02-10 13:32:39.068219-08
  sales | cust  | main | ANALYZE |       | 2010-02-25 16:07:01.157168-08
(3 rows)

Viewing the Definition of an Object

To see the definition of an object, such as a table or view, you can use the \d+ meta-command when working in psql. For example, to see the definition of a table:

=> \d+ mytable

Viewing Query Workfile Usage Information

The HAWQ administrative schema hawq_toolkit contains views that display information about HAWQ workfiles. HAWQ creates workfiles on disk if it does not have sufficient memory to execute the query in memory. This information can be used for troubleshooting and tuning queries. The information in the views can also be used to specify the values for the HAWQ configuration parameters hawq_workfile_limit_per_query and hawq_workfile_limit_per_segment.

These are the views in the schema hawq_toolkit:

  • The hawq_workfile_entries view contains one row for each operator using disk space for workfiles on a segment at the current time.
  • The hawq_workfile_usage_per_query view contains one row for each query using disk space for workfiles on a segment at the current time.
  • The hawq_workfile_usage_per_segment view contains one row for each segment. Each row displays the total amount of disk space used for workfiles on the segment at the current time.

For information about using hawq_toolkit, see Using hawq_toolkit.

Viewing the Database Server Log Files

Every database instance in HAWQ (master and segments) runs a PostgreSQL database server with its own server log file. Daily log files are created in the pg_log directory of the master and each segment data directory ($GPHOME/masterdd/pg_log and $GPHOME/segmentdd/pg_log).

Log File Format

The server log files are written in comma-separated values (CSV) format. Some log entries will not have values for all log fields. For example, only log entries associated with a query worker process will have the slice_id populated. You can identify related log entries of a particular query by the query’s session identifier (gp_session_id) and command identifier (gp_command_count).
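
As a rough illustration of correlating entries by session and command identifier, the shell function below prints the log lines whose gp_session_id and gp_command_count fields match a given pair. It is a sketch under stated assumptions: the function name log_entries_for_query is hypothetical, the two fields are assumed to be adjacent on the line (with or without surrounding quotes), and the match is line-based, so continuation lines of multi-line messages are not returned.

```shell
# Print log lines that belong to one query, identified by its session
# number (the digits after "con") and command number (the digits after "cmd").
# Usage: log_entries_for_query SESSION_NUMBER COMMAND_NUMBER [FILE...]
# Reads standard input when no file is given.
log_entries_for_query() {
  session_id="$1"
  command_id="$2"
  shift 2
  # Anchor on field boundaries so that con25 does not also match con250.
  grep -E "(^|,)\"?con${session_id}\"?,\"?cmd${command_id}\"?(,|\$)" "$@"
}
```

For example, log_entries_for_query 25 3 followed by the path of a master log file would list the entries for command 3 of session 25.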

The following fields are written to the log:

 #  | Field Name         | Data Type                | Description
----+--------------------+--------------------------+------------------------------------------------
 1  | event_time         | timestamp with time zone | Time that the log entry was written to the log
 2  | user_name          | varchar(100)             | The database user name
 3  | database_name      | varchar(100)             | The database name
 4  | process_id         | varchar(10)              | The system process ID (prefixed with “p”)
 5  | thread_id          | varchar(50)              | The thread count (prefixed with “th”)
 6  | remote_host        | varchar(100)             | On the master, the hostname/address of the client machine. On the segment, the hostname/address of the master.
 7  | remote_port        | varchar(10)              | The segment or master port number
 8  | session_start_time | timestamp with time zone | Time session connection was opened
 9  | transaction_id     | int                      | Top-level transaction ID on the master. This ID is the parent of any subtransactions.
 10 | gp_session_id      | text                     | Session identifier number (prefixed with “con”)
 11 | gp_command_count   | text                     | The command number within a session (prefixed with “cmd”)
 12 | gp_segment         | text                     | The segment content identifier. The master always has a content ID of -1.
 13 | slice_id           | text                     | The slice ID (portion of the query plan being executed)
 14 | distr_tranx_id     | text                     | Distributed transaction ID
 15 | local_tranx_id     | text                     | Local transaction ID
 16 | sub_tranx_id       | text                     | Subtransaction ID
 17 | event_severity     | varchar(10)              | Values include: LOG, ERROR, FATAL, PANIC, DEBUG1, DEBUG2
 18 | sql_state_code     | varchar(10)              | SQL state code associated with the log message
 19 | event_message      | text                     | Log or error message text
 20 | event_detail       | text                     | Detail message text associated with an error or warning message
 21 | event_hint         | text                     | Hint message text associated with an error or warning message
 22 | internal_query     | text                     | The internally-generated query text
 23 | internal_query_pos | int                      | The cursor index into the internally-generated query text
 24 | event_context      | text                     | The context in which this message gets generated
 25 | debug_query_string | text                     | User-supplied query string with full detail for debugging. This string can be modified for internal use.
 26 | error_cursor_pos   | int                      | The cursor index into the query string
 27 | func_name          | text                     | The function in which this message is generated
 28 | file_name          | text                     | The internal code file where the message originated
 29 | file_line          | int                      | The line of the code file where the message originated
 30 | stack_trace        | text                     | Stack trace text associated with this message
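
Because the field positions are fixed, simple summaries can be scripted against a log file. The sketch below counts log entries by event_severity (field 17). The function name severity_counts is hypothetical, and the approach assumes that none of the first 16 fields contains an embedded comma; a full CSV parser would be more robust for fields such as event_message.

```shell
# Count log entries per severity level (field 17, event_severity).
# Usage: severity_counts [FILE...]   (reads standard input when no file is given)
severity_counts() {
  awk -F, '
    # Only treat lines that start with a timestamp as new log records;
    # continuation lines of multi-line messages are skipped.
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
      sev = $17
      gsub(/"/, "", sev)   # drop CSV quoting around the value
      count[sev]++
    }
    END { for (s in count) print s, count[s] }
  ' "$@" | sort
}
```

A sudden rise in the ERROR, FATAL, or PANIC counts between runs is a reasonable trigger for a closer look at the log.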

Searching the HAWQ Server Log Files

HAWQ provides a utility called gplogfilter that can search through a HAWQ log file for entries matching the specified criteria. By default, this utility searches through the HAWQ master log file in the default logging location. For example, to display the entries in the master log file starting after 2 pm on a certain date:

$ gplogfilter -b '2016-01-18 14:00'

To search through all segment log files simultaneously, run gplogfilter through the hawq ssh utility. For example, specify a host file (seg_host_log_file in this example) that lists the hosts to include in the session, then use gplogfilter to display the last three lines of each segment log file:

$ hawq ssh -f seg_host_log_file
=> source ~/greenplum_path.sh
=> gplogfilter -n 3 /data/hawq-install-path/segmentdd/pg_log/hawq*.csv

Using hawq_toolkit

Use HAWQ’s administrative schema hawq_toolkit to query the system catalogs, log files, and operating environment for system status information. The hawq_toolkit schema contains several views you can access using SQL commands. The hawq_toolkit schema is accessible to all database users. Some objects require superuser permissions. Use a command similar to the following to add the hawq_toolkit schema to your schema search path:

=> ALTER ROLE myrole SET search_path TO myschema,hawq_toolkit;

HAWQ Error Codes

The following section describes SQL error codes for certain database events.

SQL Standard Error Codes

The following table lists all the defined error codes. Some are not used, but are defined by the SQL standard. The error classes are also shown. For each error class there is a standard error code having the last three characters 000. This code is used only for error conditions that fall within the class but do not have any more-specific code assigned.

The PL/pgSQL condition name for each error code is the same as the phrase shown in the table, with underscores substituted for spaces. For example, code 22012, DIVISION BY ZERO, has condition name DIVISION_BY_ZERO. Condition names can be written in either upper or lower case.

Note: PL/pgSQL does not recognize warning, as opposed to error, condition names; those are classes 00, 01, and 02.
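
The naming rules above are mechanical, so they can be scripted, for example when mapping sql_state_code values from the server log to their error class. The following is a minimal sketch; the three function names are hypothetical helpers and not part of HAWQ.

```shell
# Derive the PL/pgSQL condition name from the phrase shown in the table:
# lower-case the phrase and replace spaces with underscores.
condition_name() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr ' ' '_'
}

# The error class is the first two characters of the five-character code.
sqlstate_class() {
  printf '%s\n' "$1" | cut -c1-2
}

# The generic code for a class ends in 000.
sqlstate_class_code() {
  printf '%s000\n' "$(sqlstate_class "$1")"
}
```

For instance, condition_name "DIVISION BY ZERO" yields division_by_zero, and sqlstate_class_code 22012 yields 22000, the generic Data Exception code.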

Error Code Meaning Constant
Class 00 — Successful Completion
00000 SUCCESSFUL COMPLETION successful_completion
Class 01 — Warning
01000 WARNING warning
0100C DYNAMIC RESULT SETS RETURNED dynamic_result_sets_returned
01008 IMPLICIT ZERO BIT PADDING implicit_zero_bit_padding
01003 NULL VALUE ELIMINATED IN SET FUNCTION null_value_eliminated_in_set_function
01007 PRIVILEGE NOT GRANTED privilege_not_granted
01006 PRIVILEGE NOT REVOKED privilege_not_revoked
01004 STRING DATA RIGHT TRUNCATION string_data_right_truncation
01P01 DEPRECATED FEATURE deprecated_feature
Class 02 — No Data (this is also a warning class per the SQL standard)
02000 NO DATA no_data
02001 NO ADDITIONAL DYNAMIC RESULT SETS RETURNED no_additional_dynamic_result_sets_returned
Class 03 — SQL Statement Not Yet Complete
03000 SQL STATEMENT NOT YET COMPLETE sql_statement_not_yet_complete
Class 08 — Connection Exception
08000 CONNECTION EXCEPTION connection_exception
08003 CONNECTION DOES NOT EXIST connection_does_not_exist
08006 CONNECTION FAILURE connection_failure
08001 SQLCLIENT UNABLE TO ESTABLISH SQLCONNECTION sqlclient_unable_to_establish_sqlconnection
08004 SQLSERVER REJECTED ESTABLISHMENT OF SQLCONNECTION sqlserver_rejected_establishment_of_sqlconnection
08007 TRANSACTION RESOLUTION UNKNOWN transaction_resolution_unknown
08P01 PROTOCOL VIOLATION protocol_violation
Class 09 — Triggered Action Exception
09000 TRIGGERED ACTION EXCEPTION triggered_action_exception
Class 0A — Feature Not Supported
0A000 FEATURE NOT SUPPORTED feature_not_supported
Class 0B — Invalid Transaction Initiation
0B000 INVALID TRANSACTION INITIATION invalid_transaction_initiation
Class 0F — Locator Exception
0F000 LOCATOR EXCEPTION locator_exception
0F001 INVALID LOCATOR SPECIFICATION invalid_locator_specification
Class 0L — Invalid Grantor
0L000 INVALID GRANTOR invalid_grantor
0LP01 INVALID GRANT OPERATION invalid_grant_operation
Class 0P — Invalid Role Specification
0P000 INVALID ROLE SPECIFICATION invalid_role_specification
Class 21 — Cardinality Violation
21000 CARDINALITY VIOLATION cardinality_violation
Class 22 — Data Exception
22000 DATA EXCEPTION data_exception
2202E ARRAY SUBSCRIPT ERROR array_subscript_error
22021 CHARACTER NOT IN REPERTOIRE character_not_in_repertoire
22008 DATETIME FIELD OVERFLOW datetime_field_overflow
22012 DIVISION BY ZERO division_by_zero
22005 ERROR IN ASSIGNMENT error_in_assignment
2200B ESCAPE CHARACTER CONFLICT escape_character_conflict
22022 INDICATOR OVERFLOW indicator_overflow
22015 INTERVAL FIELD OVERFLOW interval_field_overflow
2201E INVALID ARGUMENT FOR LOGARITHM invalid_argument_for_logarithm
2201F INVALID ARGUMENT FOR POWER FUNCTION invalid_argument_for_power_function
2201G INVALID ARGUMENT FOR WIDTH BUCKET FUNCTION invalid_argument_for_width_bucket_function
22018 INVALID CHARACTER VALUE FOR CAST invalid_character_value_for_cast
22007 INVALID DATETIME FORMAT invalid_datetime_format
22019 INVALID ESCAPE CHARACTER invalid_escape_character
2200D INVALID ESCAPE OCTET invalid_escape_octet
22025 INVALID ESCAPE SEQUENCE invalid_escape_sequence
22P06 NONSTANDARD USE OF ESCAPE CHARACTER nonstandard_use_of_escape_character
22010 INVALID INDICATOR PARAMETER VALUE invalid_indicator_parameter_value
22020 INVALID LIMIT VALUE invalid_limit_value
22023 INVALID PARAMETER VALUE invalid_parameter_value
2201B INVALID REGULAR EXPRESSION invalid_regular_expression
22009 INVALID TIME ZONE DISPLACEMENT VALUE invalid_time_zone_displacement_value
2200C INVALID USE OF ESCAPE CHARACTER invalid_use_of_escape_character
2200G MOST SPECIFIC TYPE MISMATCH most_specific_type_mismatch
22004 NULL VALUE NOT ALLOWED null_value_not_allowed
22002 NULL VALUE NO INDICATOR PARAMETER null_value_no_indicator_parameter
22003 NUMERIC VALUE OUT OF RANGE numeric_value_out_of_range
22026 STRING DATA LENGTH MISMATCH string_data_length_mismatch
22001 STRING DATA RIGHT TRUNCATION string_data_right_truncation
22011 SUBSTRING ERROR substring_error
22027 TRIM ERROR trim_error
22024 UNTERMINATED C STRING unterminated_c_string
2200F ZERO LENGTH CHARACTER STRING zero_length_character_string
22P01 FLOATING POINT EXCEPTION floating_point_exception
22P02 INVALID TEXT REPRESENTATION invalid_text_representation
22P03 INVALID BINARY REPRESENTATION invalid_binary_representation
22P04 BAD COPY FILE FORMAT bad_copy_file_format
22P05 UNTRANSLATABLE CHARACTER untranslatable_character
Class 23 — Integrity Constraint Violation
23000 INTEGRITY CONSTRAINT VIOLATION integrity_constraint_violation
23001 RESTRICT VIOLATION restrict_violation
23502 NOT NULL VIOLATION not_null_violation
23503 FOREIGN KEY VIOLATION foreign_key_violation
23505 UNIQUE VIOLATION unique_violation
23514 CHECK VIOLATION check_violation
Class 24 — Invalid Cursor State
24000 INVALID CURSOR STATE invalid_cursor_state
Class 25 — Invalid Transaction State
25000 INVALID TRANSACTION STATE invalid_transaction_state
25001 ACTIVE SQL TRANSACTION active_sql_transaction
25002 BRANCH TRANSACTION ALREADY ACTIVE branch_transaction_already_active
25008 HELD CURSOR REQUIRES SAME ISOLATION LEVEL held_cursor_requires_same_isolation_level
25003 INAPPROPRIATE ACCESS MODE FOR BRANCH TRANSACTION inappropriate_access_mode_for_branch_transaction
25004 INAPPROPRIATE ISOLATION LEVEL FOR BRANCH TRANSACTION inappropriate_isolation_level_for_branch_transaction
25005 NO ACTIVE SQL TRANSACTION FOR BRANCH TRANSACTION no_active_sql_transaction_for_branch_transaction
25006 READ ONLY SQL TRANSACTION read_only_sql_transaction
25007 SCHEMA AND DATA STATEMENT MIXING NOT SUPPORTED schema_and_data_statement_mixing_not_supported
25P01 NO ACTIVE SQL TRANSACTION no_active_sql_transaction
25P02 IN FAILED SQL TRANSACTION in_failed_sql_transaction
Class 26 — Invalid SQL Statement Name
26000 INVALID SQL STATEMENT NAME invalid_sql_statement_name
Class 27 — Triggered Data Change Violation
27000 TRIGGERED DATA CHANGE VIOLATION triggered_data_change_violation
Class 28 — Invalid Authorization Specification
28000 INVALID AUTHORIZATION SPECIFICATION invalid_authorization_specification
Class 2B — Dependent Privilege Descriptors Still Exist
2B000 DEPENDENT PRIVILEGE DESCRIPTORS STILL EXIST dependent_privilege_descriptors_still_exist
2BP01 DEPENDENT OBJECTS STILL EXIST dependent_objects_still_exist
Class 2D — Invalid Transaction Termination
2D000 INVALID TRANSACTION TERMINATION invalid_transaction_termination
Class 2F — SQL Routine Exception
2F000 SQL ROUTINE EXCEPTION sql_routine_exception
2F005 FUNCTION EXECUTED NO RETURN STATEMENT function_executed_no_return_statement
2F002 MODIFYING SQL DATA NOT PERMITTED modifying_sql_data_not_permitted
2F003 PROHIBITED SQL STATEMENT ATTEMPTED prohibited_sql_statement_attempted
2F004 READING SQL DATA NOT PERMITTED reading_sql_data_not_permitted
Class 34 — Invalid Cursor Name
34000 INVALID CURSOR NAME invalid_cursor_name
Class 38 — External Routine Exception
38000 EXTERNAL ROUTINE EXCEPTION external_routine_exception
38001 CONTAINING SQL NOT PERMITTED containing_sql_not_permitted
38002 MODIFYING SQL DATA NOT PERMITTED modifying_sql_data_not_permitted
38003 PROHIBITED SQL STATEMENT ATTEMPTED prohibited_sql_statement_attempted
38004 READING SQL DATA NOT PERMITTED reading_sql_data_not_permitted
Class 39 — External Routine Invocation Exception
39000 EXTERNAL ROUTINE INVOCATION EXCEPTION external_routine_invocation_exception
39001 INVALID SQLSTATE RETURNED invalid_sqlstate_returned
39004 NULL VALUE NOT ALLOWED null_value_not_allowed
39P01 TRIGGER PROTOCOL VIOLATED trigger_protocol_violated
39P02 SRF PROTOCOL VIOLATED srf_protocol_violated
Class 3B — Savepoint Exception
3B000 SAVEPOINT EXCEPTION savepoint_exception
3B001 INVALID SAVEPOINT SPECIFICATION invalid_savepoint_specification
Class 3D — Invalid Catalog Name
3D000 INVALID CATALOG NAME invalid_catalog_name
Class 3F — Invalid Schema Name
3F000 INVALID SCHEMA NAME invalid_schema_name
Class 40 — Transaction Rollback
40000 TRANSACTION ROLLBACK transaction_rollback
40002 TRANSACTION INTEGRITY CONSTRAINT VIOLATION transaction_integrity_constraint_violation
40001 SERIALIZATION FAILURE serialization_failure
40003 STATEMENT COMPLETION UNKNOWN statement_completion_unknown
40P01 DEADLOCK DETECTED deadlock_detected
Class 42 — Syntax Error or Access Rule Violation
42000 SYNTAX ERROR OR ACCESS RULE VIOLATION syntax_error_or_access_rule_violation
42601 SYNTAX ERROR syntax_error
42501 INSUFFICIENT PRIVILEGE insufficient_privilege
42846 CANNOT COERCE cannot_coerce
42803 GROUPING ERROR grouping_error
42830 INVALID FOREIGN KEY invalid_foreign_key
42602 INVALID NAME invalid_name
42622 NAME TOO LONG name_too_long
42939 RESERVED NAME reserved_name
42804 DATATYPE MISMATCH datatype_mismatch
42P18 INDETERMINATE DATATYPE indeterminate_datatype
42809 WRONG OBJECT TYPE wrong_object_type
42703 UNDEFINED COLUMN undefined_column
42883 UNDEFINED FUNCTION undefined_function
42P01 UNDEFINED TABLE undefined_table
42P02 UNDEFINED PARAMETER undefined_parameter
42704 UNDEFINED OBJECT undefined_object
42701 DUPLICATE COLUMN duplicate_column
42P03 DUPLICATE CURSOR duplicate_cursor
42P04 DUPLICATE DATABASE duplicate_database
42723 DUPLICATE FUNCTION duplicate_function
42P05 DUPLICATE PREPARED STATEMENT duplicate_prepared_statement
42P06 DUPLICATE SCHEMA duplicate_schema
42P07 DUPLICATE TABLE duplicate_table
42712 DUPLICATE ALIAS duplicate_alias
42710 DUPLICATE OBJECT duplicate_object
42702 AMBIGUOUS COLUMN ambiguous_column
42725 AMBIGUOUS FUNCTION ambiguous_function
42P08 AMBIGUOUS PARAMETER ambiguous_parameter
42P09 AMBIGUOUS ALIAS ambiguous_alias
42P10 INVALID COLUMN REFERENCE invalid_column_reference
42611 INVALID COLUMN DEFINITION invalid_column_definition
42P11 INVALID CURSOR DEFINITION invalid_cursor_definition
42P12 INVALID DATABASE DEFINITION invalid_database_definition
42P13 INVALID FUNCTION DEFINITION invalid_function_definition
42P14 INVALID PREPARED STATEMENT DEFINITION invalid_prepared_statement_definition
42P15 INVALID SCHEMA DEFINITION invalid_schema_definition
42P16 INVALID TABLE DEFINITION invalid_table_definition
42P17 INVALID OBJECT DEFINITION invalid_object_definition
Class 44 — WITH CHECK OPTION Violation
44000 WITH CHECK OPTION VIOLATION with_check_option_violation
Class 53 — Insufficient Resources
53000 INSUFFICIENT RESOURCES insufficient_resources
53100 DISK FULL disk_full
53200 OUT OF MEMORY out_of_memory
53300 TOO MANY CONNECTIONS too_many_connections
Class 54 — Program Limit Exceeded
54000 PROGRAM LIMIT EXCEEDED program_limit_exceeded
54001 STATEMENT TOO COMPLEX statement_too_complex
54011 TOO MANY COLUMNS too_many_columns
54023 TOO MANY ARGUMENTS too_many_arguments
Class 55 — Object Not In Prerequisite State
55000 OBJECT NOT IN PREREQUISITE STATE object_not_in_prerequisite_state
55006 OBJECT IN USE object_in_use
55P02 CANT CHANGE RUNTIME PARAM cant_change_runtime_param
55P03 LOCK NOT AVAILABLE lock_not_available
Class 57 — Operator Intervention
57000 OPERATOR INTERVENTION operator_intervention
57014 QUERY CANCELED query_canceled
57P01 ADMIN SHUTDOWN admin_shutdown
57P02 CRASH SHUTDOWN crash_shutdown
57P03 CANNOT CONNECT NOW cannot_connect_now
Class 58 — System Error (errors external to HAWQ)
58030 IO ERROR io_error
58P01 UNDEFINED FILE undefined_file
58P02 DUPLICATE FILE duplicate_file
Class F0 — Configuration File Error
F0000 CONFIG FILE ERROR config_file_error
F0001 LOCK FILE EXISTS lock_file_exists
Class P0 — PL/pgSQL Error
P0000 PLPGSQL ERROR plpgsql_error
P0001 RAISE EXCEPTION raise_exception
P0002 NO DATA FOUND no_data_found
P0003 TOO MANY ROWS too_many_rows
Class XX — Internal Error
XX000 INTERNAL ERROR internal_error
XX001 DATA CORRUPTED data_corrupted
XX002 INDEX CORRUPTED index_corrupted