16.10 - Tools for Troubleshooting and Administrating - Teradata Database

Teradata Database Administration

Product
Teradata Database
Release Number
16.10
Release Date
April 2018
Content Type
Administration
Publication ID
B035-1093-161K
Language
English (United States)

The following table lists some common tools you can use to administrate your system. The table also lists tools intended for use by Teradata field engineers or Teradata Database system developers. If the tools are described as for use by Teradata personnel only, do not use them unless instructed by Teradata Customer Support.

For more information on the respective tool, see the chapter or book referenced. For a list of the utilities by suggested function (such as maintenance, troubleshooting, or installation), see the “Alphabetical Listing of Utilities” in Utilities.

What to Troubleshoot or Administrate Tool and Description Reference
Administration Workstation The AWS console for an MPP installation displays the status of each physical component, including nodes, DBW consoles, and the BYNET. It provides many of the functions the System Console would for an SMP. AWS manuals
AMPs and AMP Worker Tasks The ampload utility reports AMP bottlenecks due to unavailable free AWTs.

The AWT Monitor (awtmon) utility quickly views how many AMP worker tasks are active (in use) and determines “hot AMPs” so you can troubleshoot performance problems.

awtmon is a front-end tool to the “puma -c” command and prints AWT inuse count information in a summary format.

By default, awtmon displays the AWT in-use count of the local node. With the -s option, print system-wide information and locate hot AMPs on the nodes of the entire system. When hot AMPs are identified, run awtmon on those hot nodes using pcl or psh.

You can also use the ResUsageSawt table to see AMP worker tasks in-use counts by work type and determine when AMPs enter the state of flow control.

To determine which workload is responsible for I/O skew, use the ResUsageSps table and then use DBQL to identify specific queries.

  • “AMP Load (ampload)” in Utilities
  • “AWT Monitor (awtmon)” in Utilities
  • “ResUsageSawt Table” and “ResUsageSps Table” in Resource Usage Macros and Tables
Configuration changes The Parallel Upgrade Tool (PUT) utility reboots nodes, sets up disk arrays, or changes the configuration of your system. PUT can perform configuration tasks such things as moving AMPs and PEs, changing clustering assignments, or partitioning Logical Unit Numbers (LUNs). Parallel Upgrade Tool (PUT) Reference
Data corruption and integrity Checksums check the integrity of Teradata Database disk I/O operations. A checksum is a numeric value computed from data. For a given set of data, the checksum value is always the same, provided the data is unchanged.

Because calculating checksums requires system resources and may affect system performance, system-wide checksumming is disabled by default on most platforms. Contact Teradata Customer Support if you suspect disk corruption.

Use the CHECKSUM option of CREATE TABLE, ALTER TABLE, CREATE JOIN INDEX, and CREATE HASH INDEX statements to enable checksums for a table, hash index, or join index.

“Checksum Fields” in Utilities
  The SCANDISK command of the Ferret utility checks the integrity of the Teradata Database file system. This includes the master index, cylinder index, and data block structures (including structures associated with WAL logs).

Previously halted SCANDISK operations can be restarted using a Perl script.

If further events occur after SCANDISK has been running for some time, restarting SCANDISK where it left off the last time will not find errors in previously scanned data that were caused by the most recent failures. Therefore, be very careful when deciding to restart a pending SCANDISK operation versus starting the entire operation again from the beginning.
  • “Ferret Utility (ferret)” in Utilities
  • For Teradata Customer Support personnel, see “Filer Utility (filer)” in Support Utilities
Defining vprocs and hosts The Configuration utility defines AMPs, PEs, and hosts and their interrelationships for Teradata Database. This utility is used in conjunction with the Reconfiguration utility. “Configuration Utility (config)” in Support Utilities
  The PUT utility also allows you to define and set up vprocs on the system. Parallel Upgrade Tool (PUT) Reference
Disk space The Ferret utility allows field engineers and Teradata support people to monitor and control disk usage.

The commands of this utility allow the system to combine free sectors on a cylinder, reconfigure contents on a disk leaving a specified percentage of space free for future use, report which tables qualify for packing and display disk cylinder utilization and available free cylinders.

This utility is used by Teradata Database engineers to perform routine and special diagnostics.
“Ferret Utility (ferret)” in Utilities
Update DBC utility recalculates PermSpace, SpoolSpace, and TempSpace for user DBC in DBase table. Then based on DBase values, Update DBC utility recalculates MaxPermSpace and MaxSpoolSpace in DBC.DataBaseSpace for all databases.

Use Update DBC only to correct inconsistency in the DBase or DataBaseSpace tables, which might occur as a result of rare types of system failures.

“Update DBC (updatedbc)” in Utilities
Update Space recalculates permanent, temporary, or spool spaces for one database or all databases. “Update Space (updatespace)” in Utilities
  The FixCurrentSpace SQL stored procedure performs the same functions as the Update Space utility.fixcurrentspace FixCurrentSpace Procedure
  The FixAllocatedSpace SQL stored procedure repairs the dynamic need-based allocations for databases that use global space accounting. This procedure is fast because it assumes actual current usage is correct. FixAllocatedSpace Procedure
  The DBC.DiskSpaceV view provides disk space usage information per AMP. This includes permanent and spool data by database or account for each AMP. Use this view to track large spool usage and available PERM space.
Down AMP If you receive an error that an AMP is down (for example from a MultiLoad or BTEQ job) use Vprocmanager to reboot a fatal AMP or bring a offline down AMP up. “Vproc Manager (vprocmanager)” in Utilities
Down Table The system will mark a data or index subtable as down if an error occurs in several regions of the file system structures of the table. This error could be due to hardware or bad I/O. To address or fix a down region in a subtable, do one of the following:
  • Use the DELETE ALL/DROP table statement.
  • Rebuild the table using the Table Rebuild utility.
  • Restore the table from backup.
  • Drop and recreate the index.
  • Drop and recreate the fallback subtable.

After repairing a down table, use the ALTER TABLE…RESET DOWN statement to reset a down table.

In the case of hardware errors, a fallback table can be rebuilt to copy the rows from the fallback subtable, to repair the rows in the down regions of the primary subtable, and vice versa. If the down table is rebuilt or restored from an earlier back up, the down status flag for the various subtables will be reset, and the rows in them will also be moved to unmarked data blocks and cylinders from the fallback copies.

To change the number of regions per AMP that must fail before a table is marked as down, use the MaxDownRegions field in DBS Control.

See “Table” in this table for more information.

  • “ALTER TABLE” in SQL Data Definition Language
  • “MaxDownRegions” in Utilities
Global parameters DBS Control displays and modifies the tunable global parameters in the DBS Control record. “DBS Control (dbscontrol)” in Utilities
Gateway settings The Gateway Control utility allows you to set Gateway settings such as how to handle encryption, number of session allowed for host group, time out values and more.

You can use options in the Gateway Control utility to select things like connection time out length, external authentication settings, or whether to use the system default or customer-settable defaults after a configuration or addition of new host groups and gateway vprocs, and more.

This utility is used by Teradata Database engineers to perform routine and special diagnostics.
The Gateway Global utility monitors and controls sessions for LAN-connected users. You can monitor traffic, control network sessions, display which users are using which sessions, or abort specific sessions.
Hangs or slowdowns See the topics “Sessions that are hung”, “AMPs and AMP Worker Tasks”, and “Resource usage” in this table.  
  Resource Check Tools detect hangs or slowdowns as follows:
  • mboxchk:Monitors Teradata Database response time. If the response time indicates the database is slow or unresponsive, mboxchk runs a Teradata Support Center script to gather system information, logs an event, and contacts Teradata Vital Infrastructure (TVI).
  • nodecheck: Displays local, node-level resources only. Provides summary data to syscheck for analysis.
  • syscheck: Monitors the system for signs of congestion that might lead to system slowdowns or perceived hangs and notifies you when Warning or Alert thresholds are reached.

    A text file called syscheckrc resides on each node. You can set it up to detect and report any resource that falls below a threshold.

    The resource check tools are located in the /usr/pde/bin directory.

mboxchk, nodecheck, syscheck in man pages or pdehelp
Loads into Large Tables To improve the speed of a batch INSERT/SELECT request into a large table, partition the table by transaction date, which groups the inserts into data blocks within the partition instead of throughout the table. This reduces the number of I/Os.

You can also use the DBQL utility job log table, DBC.DBQLUtilityTbl, to troubleshoot utility job performance.

Lock contentions or held locks There are several things that might cause lock contention: a pending lock, a very congested system, a number of transactions exceeding the machine capacity, or a conflict with ARC.

Careful planning of what sessions run in which order or at what time of day can help prevent lock contention. Use the EXPLAIN. Consider running a request that needs an EXCLUSIVE lock and will take a while to process during off hours.

About Lock Contentions
  The Lock Display utility shows all real-time locks in use. This includes locks for concurrency control. You can identify which transactions are blocked and which transactions are doing the blocking. “Lock Display (lokdisp)” in Utilities
  You may have locking issues on DBC.AccessRights if you are running concurrent jobs with the same userid in the same database and dropping or creating objects.

To reduce locking issues on DBC.AccessRights:

  • Run the jobs with a few different userids and distribute them randomly among the connection strings to reduce the occurrence of two userids running concurrently.
  • Determine if housekeeping needs to be performed on DBC.AccessRights for the UserID/DatabaseID involved.
  • Determine if statistics need to be collected on the Data Dictionary tables.
Teradata knowledge articles on reducing locking issues with specific applications
  The DBQL Lock Log logs in XML format in DBQLXMLLOCKTbl any lock contention longer than a user-specified threshold.

You can access the DBQL lock log through the Viewpoint lock viewer portlet or query the system table DBC.DBQLXMLLOCKTbl or the view DBC.QrylockLogXMLV.

XML Lock Log Table: DBQLXMLLockTbl.
  Showlocks utility identifies and displays all active host utility (HUT) locks placed on databases and tables by ARC operations.

HUT locks may interfere with application processing and are normally released after the utility process is complete.

If locks interfere with application processing, you can remove them by invoking the RELEASE LOCK statement. It is available through the ARC utility or as an SQL statement.

Showlocks can be started from the DB Window Supervisor screen with the command start showlocks.

  • Teradata Archive/Recovery Utility Reference
  • “Show Locks (showlocks)” in Utilities
Memory 
(insufficient memory resulting in Error 3710) Teradata recommends that you use the default value set for the MaxParseTreeSegs field in DBS Control. If you run into this error, first try simplifying your queries and/or dropping unnecessary join indexes. If this still does not work, verify the value you have set for MaxParseTreeSegs and contact Teradata Customer Support.
The optimal value depends on your system and what release and fix level you are running. Increasing MaxParseTreeSegs to a value larger than the default may allow a previously failed 3710 error to run but could result in other problems such as a long parsing time or in extreme cases, a Teradata reset.
“MaxParseTreeSegs” in Utilities
Priority of jobs TASM assigns different priorities to different types of jobs. Higher priority work receives access to CPU and I/O more frequently.

Use the ResUsageSps table to understand how CPU is being consumed by different workloads. You can also use this table to understand the pattern of database activity across the processing day.

“ResUsageSps Table” in Resource Usage Macros and Tables

Queries Use BEGIN/REPLACE QUERY LOGGING WITH XMLPLAN or use the EXPLAIN IN XML request modifier.

Alternately, use the STEPINFO option of BEGIN/REPLACE QUERY LOGGING to determine which step uses the most resources and if there are any large differences between estimated and actual CPU, I/O, and number of rows.

Manage your queries using Teradata Viewpoint. You can:

  • Filter queries against object-access or query-resource rules.
  • Control the flow of queries coming into the system for concurrency reasons and delay them if necessary.
  • Log any system exception actions.
 
Resource usage Using resource usage macros and tables to obtain information about which nodes or which vprocs are heavily using which resources. This can help you identify bottlenecks. Resource Usage Macros and Tables
  The DBQL view DBC.QryLogV reports things such as the AMP using the most CPU, the AMP with the most I/O, or maximum amount of spool used when processing a query. View of the Main Log Table: QryLog[V]
Sessions that are hung Use Query Session utility to determine the state of each session and then use Lock Display utility to find lock contentions. Troubleshooting Problems with Sessions
Session information when logged onto multiple sessions To find the Thread ID or Job ID of the session from which you are querying, first submit a SELECT SESSION;

Then after that query returns the session number n, use that number in the following query:

sel logonsource
from sessioninfovx
where sessionno =  n;

*** Query completed. One row found. One column
returned.
*** Total elapsed time was 1 second.

LogonSource
-------------------------------------------------------
(TCP/IP) 09AB 127.0.0.1 DBC    8804  USER  BTEQ  01 LSS

This helps you determine which Client process/Thread ID belongs to which session. This is especially useful if you want to find out which Thread ID belongs to which session because you are logged on to multiple sessions.

“SessionInfo[V][X]” in Data Dictionary
  Query Session utility displays the state of load utility and query sessions. Details can include statuses such as Parsing, Active, Blocked, Response, whether stored procedures are being processed, and so on. “Query Session (qrysessn)” in Utilities
Skew There are many ways to detect skew:
  • Use Teradata Viewpoint to determine which user is using the most CPU. In general, a bad query can be detected by finding users with high CPU relative to I/O access.
  • Look at the EXPLAIN text to determine the number of steps used for the query and the type of operation. You may find something like a product join in a poorly written query. You can abort the query and investigate further or fix the query.
  • To identify node-level skew, use the ResUsageSpma table, comparing maximum CPU usage to the average CPU consumed across all nodes. If parallel efficiency as shown in the ResUsage Spma table is not approaching 100%, look for the reasons for the imbalance.
  • To find AMP-level skew, there are several options. Use the ResCPUByAMP macro to find AMP-level CPU skew. You can look at the ResUsageSpvr table and compare resource usage across different vprocs. You can also look at the number of requests field in the ResUsageSps table to see if workloads on one node are actively supporting more requests than on other nodes, then drill down to AMPs. Also using the ResUsageSps table, you can examine queue wait and service time numbers to find backed up queries by workload or allocation group.
  • To find skew at the request level, you can compare DBQLogTbl fields MaxAMPCPUTime with MinAMPCPU. Because user logging is often incomplete with DBQL, ResUsage tables and macros may provide more accurate information.
  • Another option is to use PM/API to develop your own application to monitor for high CPU consumers.
  • “EXPLAIN Request Modifier” in SQL Data Manipulation Language
  • “ResUsageSps Table,” “ResUsageSpma Table,” “ResUsageSvpr table,” and “ResCPUByAMP Macros” in Resource Usage Macros and Tables
Slowdowns See the topic “Hangs or slowdowns” in this table.  
Slow queries to DBC tables If your site uses a lot of custom applications or tools that often query the DBC tables, consider collecting statistics on these tables About Collecting Statistics on Data Dictionary Tables
Space Exceeded If a space-intensive operation, such as a load utility or restoring from an archive, exceeds available perm space, two DBS Control utility fields can be set temporarily to allow operations to complete and avoid out of space errors.

The settings apply to all operations and all instances of space accounting, such as table inserts, so Teradata recommends setting these fields to a non-zero value only temporarily.

OverflowMaxPermSpace-
Percentage and OverflowMaxPermSpace-
KByteCount fields of the DBS Control utility in Utilities.
Alternatively, use global space accounting, which provides extra space to AMPs when needed, preventing space-intensive operations from failing due to lack of space. About Global Space Accounting
Spool space depletion There are several reasons that could contribute to limited spool usage:
  • Not using global space accounting, which allows extra space to AMPs when needed.
  • Poorly written queries can sometimes consume excessive spool space. Use Teradata Viewpoint to check queries before allowing them to run. You can check the EXPLAIN to see how much spool is used and prevent the system from getting overloaded with bad queries.
  • Outdated statistics can cause bad queries. The Optimizer may calculate query plans differently than it ought if statistics it uses does not accurately reflect the system. The Optimizer also determines how much spool it needs or does not need based on collected statistics. Refresh collected statistics regularly.
  • If you are using OLAP statistical functions and you run out of spool, check the syntax of your queries. For example, when using the PARTITION BY clause, if a large number of identical values in the partitioning column hash to the same AMP, this can result in out-of-spool errors. Choose a column that results in rows being distributed over a broader set of AMPs.
  • Poorly skewed data. Poor data distribution means one AMP is responsible for a larger proportion of the data. Because the spool space is divided by the number of AMPs, reaching the maximum spool space limit on one AMP means your query will fail even despite the other unused spool space on the other AMPs.
  • Poorly chosen PI that leads to AMP skew. Because the spool space is divided by the number of AMPs, reaching the maximum spool space limit on one AMP means your query will fail even despite the other unused spool space on the other AMPs.
  • Adding nodes or AMPs without reassigning spool space means spreading out the spool amongst more AMPs.
  • The limit for spool space as defined in the profile overrides the user settings. Check the limit for the profile to see if it is lower than the limit defined for the user. You can compare the values of MaxProfileSpool and MaxSpool in DBC.DiskSpace.V

Limiting spool space for new users helps reduce the impact of possibly bad queries (such as product joins). With a reasonable limit, the user will get out-of-spool messages before a bad query ties up the system. Furthermore, adding spool to a bad query will allow the query to run longer but a bad query may still not complete.

Space allocated to storage Use the Teradata Virtual Allocator Manager (TVAM) utility.

Allows you to display mapping or force migration of cylinders on the system.

Do not use the TVAM utility unless instructed by Teradata Customer Support.
Man page or online help
System crashes or failures Screen Debug and Screen Dump controls how and what your system records for crashdumps. Teradata recommends that default settings for all dumps be changed only when requested by a system support representative or Teradata Customer Support. “Control GDO Editor (ctl)” in Utilities
DBC.Software_Event_LogV is a system view that provides detailed information about errors or system failures and their related node, vproc, partition, task, function, software version, optional backtrace, diagnostic information, and so on.

The Event_Tag field of the view reports the error message number.

  • Data Dictionary
  • Messages
DUL utility saves or restores system dump tables onto tape. “Dump Unload/Load Utility (dul)” in Utilities
System recovery Recovery Manager utility monitors and reports the progress of a system recovery after a crash or a user abort. You can use this utility to monitor transaction rollbacks, cancel rollbacks, and more. “Recovery Manager (rcvmanager)” in Utilities
Table The Table Rebuild utility rebuilds tables Teradata Database cannot recover automatically. Table Rebuild can rebuild the following for an AMP:
  • The primary or fallback portion of a table
  • An entire table (both primary and fallback portions)
  • A particular range of rows of a subtable
  • The corrupted regions of a table
  • A corrupted or missing table header
  • All tables
  • All fallback tables
  • All tables in a database
  • Single tables sequentially or tables of each database in parallel.
  • “Table Rebuild (rebuild)” in Utilities
  • See also “Table inconsistencies or corruption”
Table inconsistencies or corruption CheckTable is a diagnostic tool that checks for:
  • Table and dictionary inconsistencies, such as differing table versions, ParentCount and ChildCount data, and partitioning definitions.
  • Table corruption, such as duplicate rows or unique values and data inconsistency, of primary and fallback data and stored procedure tables.
  • Inconsistencies in internal data structures such as table headers, row identifiers, secondary indexes, and reference indexes.
  • Invalid row hash values and partition numbers.

Options for CheckTable include the following:

  • Four levels of data integrity checking (PENDINGOP, ONE, TWO, and THREE). Each successive level performs additional checks that are more thorough and consequently more time consuming.
  • The DOWN ONLY option displays status information for down tables only.

    The DOWN ONLY option works for all levels except PENDINGOP.

  • The ERROR ONLY option causes CheckTable to display only skipped tables, down tables, and tables with errors and warnings. This allows you to quickly identify and address problems found by CheckTable.
  • The OUTPUT command allows you to redirect CheckTable output to a log file. The ECHO option for this command allows you to direct output to both the console and a log file.

CHECKTABLEB is a non-interactive batch mode version of CheckTable that can be run through the cnsrun utility. CHECKTABLEB is identical to CheckTable except that it is meant to run from a script.

“CheckTable (checktable)” in Utilities
  The Table Rebuild utility allows you to fix a primary or fallback copy of a table or only specific corrupted parts. When the rebuild operation successfully completes on an AMP, TableRebuild will remove the down region information from the table header. If, however, the fallback subtable has any down regions defined on it, the rebuild process aborts and the system returns an error.

You cannot rebuild a table with parts of its fallback down. While you can rebuild stored procedures, UDFs, and UDMS, you cannot rebuild a join index or hash index defined on a table being rebuilt.

  • “Table Rebuild (rebuild)” in Utilities
  • See also “Table inconsistencies or corruption”
Vprocs The Vproc Manager utility allows you to:
  • View or change vproc states
  • Initialize and boot a specific vproc
  • Initialize the storage associated with a specific vproc
  • Force a manual Teradata Database restart

A few of the vprocs statuses include:

  • FATAL: the vproc or its associated storage is not operational. The system marks it fatal because of repeated crashes or if data integrity is in danger. Multiple crashes are indicated by the crash count and indicate a repeatable condition of data corruption, bad SQL or a software bug. Contact Teradata Customer Support.
  • FATAL** : this is an “fsgwizard” data integrity situation. Contact Teradata Customer Support.
  • OFFLINE: the vproc is operational, but either set offline purposely by an operator or if, for example, all AMPs in a clique are OFFLINE, they were probably set OFFLINE by the system because none of the nodes in its clique were up. This usually occurs during a system-wide reboot of all nodes, but nodes come up at different times.
    A local crash of an OFFLINE vproc will cause the database to restart.
  • UTILITY Catchup (offline catchup): the vproc was previously down (OFFLINE or FATAL or some other state). If its fallback data was written to another vproc in its cluster, the vproc is now back up (but not online) while it catches up its fallback data.
“Vproc Manager (vprocmanager)” in Utilities
Vprocs and storage From the Supervisor screen of the Database Window, enter GET CONFIG to view the current system configuration.

To administrate vprocs and disks, use the Vproc Manager utility.

  • Database Window (xdbw)” in Utilities
  • “Vproc Manager (vprocmanager)” in Utilities
  The Query Configuration utility reports the status of vprocs managed by node or for the entire Teradata Database system. “Query Configuration (qryconfig)” in Utilities
Vproc Manager (vprocmanager) utility If vprocmanager exits with an error...
# vprocmanager
PDE is not running: Operation not permitted
PDE event 10117 reported when task is not attached 
to PDE or PDE is down. Task should stop on this
error.

then you are probably attempting to run it from a non-TPA node. Run vprocmanager from a TPA node (a node running Teradata Database software).

“Vproc Manager (vprocmanager)” in Utilities