Tools for Troubleshooting and Administrating - Teradata Database

Teradata Database Administration

Product
Teradata Database
Release Number
15.10
Language
English (United States)
Last Update
2018-10-06
Product Category
Software

The following table lists some common tools you can use to administrate your system. The table also lists tools intended for use by Teradata field engineers or Teradata Database system developers. If the tools are described as for use by Teradata personnel only, do not use them unless instructed by Teradata Customer Support.

For more information on the respective tool, see the chapter or book referenced. For a list of the utilities by suggested function (such as maintenance, troubleshooting, or installation), see the “Alphabetical Listing of Utilities” in Utilities.

 

What to Troubleshoot or Administrate

Tool and Description

Reference

Administration Workstation

The AWS console for an MPP installation displays the status of each physical component, including nodes, DBW consoles, and the BYNET. It provides many of the functions the System Console would for an SMP.

AWS manuals (Visit www.info.teradata.com
and search for the keyword “AWS”.)

AMPs and AMP Worker Tasks

The ampload utility reports AMP bottlenecks due to unavailable free AWTs.

“AMP Load (ampload)” in Utilities

AMPs and AMP Worker Tasks (continued)

The AWT Monitor (awtmon) utility lets you quickly view how many AMP worker tasks are active (in use) and identify “hot AMPs” so you can troubleshoot performance problems.

awtmon is a front-end tool to the “puma -c” command and prints AWT in-use count information in a summary format.

By default, awtmon displays the AWT in-use count of the local node. With the -s option, awtmon prints system-wide information so you can locate hot AMPs across all nodes of the system. When hot AMPs are identified, run awtmon on those nodes using pcl or psh.

You can also use the ResUsageSawt table to see AMP worker task in-use counts by work type and to determine when AMPs enter a state of flow control.

To determine which workload is responsible for I/O skew, use the ResUsageSps table and then use DBQL to identify specific queries.
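For example, a query like the following summarizes in-use counts and flow-control activity per AMP. This is a minimal sketch: the column names VprId, Available, InUseMax, and FlowCtlCnt are assumptions to verify against the ResUsageSawt table definition in Resource Usage Macros and Tables for your release.

    -- AWT in-use and flow-control activity per AMP for today
    -- (column names are assumptions; check ResUsageSawt for your release)
    SELECT TheDate, TheTime, NodeID, VprId,
           Available, InUseMax, FlowCtlCnt
    FROM DBC.ResUsageSawt
    WHERE TheDate = CURRENT_DATE
    ORDER BY FlowCtlCnt DESC, InUseMax DESC;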

  • “AWT Monitor (awtmon)” in Utilities
  • “ResUsageSawt Table” and “ResUsageSps Table” in Resource Usage Macros and Tables

    Configuration changes

    The Parallel Upgrade Tool (PUT) utility reboots nodes, sets up disk arrays, and changes the configuration of your system. PUT can perform configuration tasks such as moving AMPs and PEs, changing clustering assignments, or partitioning Logical Unit Numbers (LUNs).

    Parallel Upgrade Tool (PUT) Reference

    crashdumps

    Use crashdump analysis tools such as the Crashdumps Save Program (CSP) to view crashdumps, csppeek to check the contents of a saved crashdump, and the error logs.

    “Appendix D Handling Teradata Crashdumps: Operational DBAs” on page 553.

    Data corruption and integrity

    Checksums check the integrity of Teradata Database disk I/O operations. A checksum is a numeric value computed from data. For a given set of data, the checksum value is always the same, provided the data is unchanged.

    Because calculating checksums requires system resources and may affect system performance, system-wide checksumming is disabled by default on most platforms. Contact Teradata Customer Support if you suspect disk corruption.

    Use the CHECKSUM option of CREATE TABLE, ALTER TABLE, CREATE JOIN INDEX, and CREATE HASH INDEX statements to enable checksums for a table, hash index, or join index.
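    For example, a minimal sketch of enabling checksums on a single table (the database and table names are hypothetical, and the set of supported CHECKSUM values and the IMMEDIATE option vary by release, so verify them in SQL Data Definition Language):

      -- Sales.Orders_Chk is a hypothetical table used only for illustration
      CREATE TABLE Sales.Orders_Chk, CHECKSUM = DEFAULT
        (order_id   INTEGER,
         order_date DATE,
         amount     DECIMAL(12,2))
      PRIMARY INDEX (order_id);

      -- Enable checksums on an existing table and recalculate them immediately
      ALTER TABLE Sales.Orders_Chk, CHECKSUM = DEFAULT IMMEDIATE;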

    “Checksum Fields” in Utilities

     

    The SCANDISK command of the Ferret utility checks the integrity of the Teradata Database file system. This includes the master index, cylinder index, and data block structures (including structures associated with WAL logs).

    Previously halted SCANDISK operations can be restarted using a Perl script.

    Note: If additional failures occur after SCANDISK has been running for some time, restarting SCANDISK where it left off will not find errors that those failures caused in data that was already scanned. Therefore, consider carefully whether to restart a pending SCANDISK operation or to start the entire operation again from the beginning.

  • “Ferret Utility (ferret)” in Utilities
  • For Teradata Customer Support personnel, see “Filer Utility (filer)” in Support Utilities
    Defining vprocs and hosts

    The Configuration utility defines AMPs, PEs, and hosts and their interrelationships for Teradata Database. This utility is used in conjunction with the Reconfiguration utility.

    “Configuration Utility (config)” in Support Utilities

     

    The PUT utility also allows you to define and set up vprocs on the system.

    Parallel Upgrade Tool (PUT) Reference

    Disk space

     

     

    The Ferret utility allows field engineers and Teradata support personnel to monitor and control disk usage.

    The commands of this utility combine free sectors on a cylinder, reconfigure the contents of a disk so that a specified percentage of space remains free for future use, report which tables qualify for packing, and display disk cylinder utilization and available free cylinders.

    Note: This utility is used by Teradata Database engineers to perform routine and special diagnostics.

    “Ferret Utility (ferret)” in Utilities

    The Update DBC utility recalculates the PermSpace, SpoolSpace, and TempSpace values for user DBC in the DBase table. Then, based on the DBase values, it recalculates MaxPermSpace and MaxSpoolSpace in DBC.DataBaseSpace for all databases.

    Use Update DBC only to correct inconsistencies in the DBase or DataBaseSpace tables, which might occur as a result of rare types of system failures.

    “Update DBC (updatedbc)” in Utilities

    The Update Space utility recalculates permanent, temporary, or spool space for one database or for all databases.

    “Update Space (updatespace)” in Utilities

     

    The DBC.DiskSpaceV view provides disk space usage information per AMP. This includes permanent and spool data by database or account for each AMP. Use this view to track large spool usage and available PERM space.
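    For example, a minimal sketch (verify the exact DBC.DiskSpaceV column list in Data Dictionary for your release) that lists current spool and permanent space per AMP, largest spool consumers first:

      -- Per-AMP space usage by database, largest spool consumers first
      SELECT Vproc, DatabaseName, CurrentSpool, PeakSpool,
             CurrentPerm, MaxPerm
      FROM DBC.DiskSpaceV
      ORDER BY CurrentSpool DESC;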

  • “Querying the DiskSpaceV View to Find System Space Issues” on page 172.
  • Data Dictionary
    Down AMP

    If you receive an error that an AMP is down (for example, from a MultiLoad or BTEQ job), use the Vproc Manager utility to reboot a FATAL AMP or to bring an offline or down AMP back up.

    “Vproc Manager (vprocmanager)” in Utilities

    Down Table

    The system marks a data or index subtable as down if an error occurs in several regions of the file system structures of the table. This error could be due to a hardware problem or bad I/O. To address a down region in a subtable, do one of the following:

  • Use a DELETE ALL or DROP TABLE statement.
  • Rebuild the table using the Table Rebuild utility.
  • Restore the table from backup.
  • Drop and recreate the index.
  • Drop and recreate the fallback subtable.
  • After repairing a down table, use the ALTER TABLE…RESET DOWN statement to reset its down status (a sketch appears below).

    In the case of hardware errors, a fallback table can be rebuilt so that rows from the fallback subtable repair the rows in the down regions of the primary subtable, and vice versa. If the down table is rebuilt or restored from an earlier backup, the down status flags of the affected subtables are reset, and the rows in the down regions are moved to unmarked data blocks and cylinders from the fallback copies.
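    A minimal sketch of resetting the down status after the table has been repaired (the table name is hypothetical; see “ALTER TABLE” in SQL Data Definition Language for the full syntax):

      -- Sales.Orders is a hypothetical table; clear its down status after repair
      ALTER TABLE Sales.Orders RESET DOWN;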

    To change the number of regions per AMP that must fail before a table is marked as down, use the MaxDownRegions field in DBS Control.

    See “Table” in this table for more information.

  • “ALTER TABLE” in SQL Data Definition Language
  • “MaxDownRegions” in Utilities
    Global parameters

    DBS Control displays and modifies the tunable global parameters in the DBS Control record.

    “DBS Control (dbscontrol)” in Utilities

    Gateway settings

    The Gateway Control utility allows you to change gateway settings such as how to handle encryption, the number of sessions allowed per host group, timeout values, and more.

    You can use options in the Gateway Control utility to set values such as the connection timeout length and external authentication settings, and to choose between the system defaults and customer-settable defaults after a configuration change or the addition of new host groups and gateway vprocs.

    Note: This utility is used by Teradata Database engineers to perform routine and special diagnostics.

  • “Tools for Troubleshooting Client Connections” on page 447
  • “Gateway Control (gtwcontrol)” in Utilities
  • “Gateway Global (gtwglobal)” in Utilities

    The Gateway Global utility monitors and controls sessions for LAN-connected users. You can monitor traffic, control network sessions, display which users are using which sessions, or abort specific sessions.

    Hangs or slowdowns

    See the topics “Sessions that are hung”, “AMPs and AMP Worker Tasks”, and “Resource usage” in this table.

     

     

    Resource Check Tools detect hangs or slowdowns as follows:

  • mboxchk: Monitors Teradata Database response time. If the response time indicates the database is slow or unresponsive, mboxchk runs a Teradata Support Center script to gather system information, logs an event, and contacts Teradata Vital Infrastructure (TVI).
  • nodecheck: Displays local, node-level resources only. Provides summary data to syscheck for analysis.
  • syscheck: Monitors the system for signs of congestion that might lead to system slowdowns or perceived hangs and notifies you when Warning or Alert thresholds are reached.
  • A text file called syscheckrc resides on each node. You can set it up to detect and report any resource that falls below a threshold.

    The resource check tools are located in the /usr/pde/bin directory.

    mboxchk, nodecheck, syscheck in man pages or pdehelp

    Loads into Large Tables

    To improve the speed of a batch INSERT/SELECT request into a large table, partition the table by transaction date, which groups the inserts into data blocks within the partition instead of throughout the table. This reduces the number of I/Os.
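    For example, a minimal sketch of a table partitioned by transaction date (the table, columns, and date range are hypothetical):

      -- Sales.Txn_Hist is a hypothetical table partitioned by month of txn_date
      CREATE TABLE Sales.Txn_Hist
        (txn_id   INTEGER,
         txn_date DATE NOT NULL,
         amount   DECIMAL(12,2))
      PRIMARY INDEX (txn_id)
      PARTITION BY RANGE_N(txn_date BETWEEN DATE '2015-01-01'
                                        AND DATE '2018-12-31'
                                        EACH INTERVAL '1' MONTH);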

    You can also use the DBQL utility job log table, DBC.DBQLUtilityTbl, to troubleshoot utility job performance.

  • “Row Partitioning” on page 65
  • SQL Data Definition Language Syntax and Examples
  • “Utility Job Performance Analysis and Capacity Planning” on page 147

    Lock contentions or held locks

     

    There are several things that might cause lock contention: a pending lock, a very congested system, a number of transactions exceeding the machine capacity, or a conflict with ARC.

    Careful planning of which sessions run in which order, or at what time of day, can help prevent lock contention. Use the EXPLAIN request modifier to see which locks a request needs. Consider running a request that needs an EXCLUSIVE lock and takes a long time to process during off hours.

    “About Lock Contentions” on page 248

     

    The Lock Display utility shows all real-time locks in use. This includes locks for concurrency control. You can identify which transactions are blocked and which transactions are doing the blocking.

    “Lock Display (lokdisp)” in Utilities

     

    You may have locking issues on DBC.AccessRights if you are running concurrent jobs with the same userid in the same database and dropping or creating objects.

    To reduce locking issues on DBC.AccessRights:

  • Run the jobs with a few different userids and distribute them randomly among the connection strings to reduce the occurrence of two userids running concurrently.
  • Determine if housekeeping needs to be performed on DBC.AccessRights for the UserID/DatabaseID involved (see the sketch after this list).
  • Determine if statistics need to be collected on the Data Dictionary tables.
  • Teradata knowledge articles on reducing locking issues with specific applications
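    For the housekeeping check above, a minimal sketch (the user and database names are hypothetical; verify the DBC.AllRightsV view and its columns in Data Dictionary):

      -- 'etl_user' and 'Staging' are hypothetical names used only for illustration
      SELECT UserName, DatabaseName, COUNT(*) AS RightsRows
      FROM DBC.AllRightsV
      WHERE UserName = 'etl_user'
        AND DatabaseName = 'Staging'
      GROUP BY UserName, DatabaseName;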

     

    The DBQL lock log records, in XML format in the DBQLXMLLockTbl table, any lock contention that lasts longer than a user-specified threshold.

    You can access the DBQL lock log through the Viewpoint lock viewer portlet, or query the system table DBC.DBQLXMLLockTbl or the view DBC.QryLockLogXMLV.
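    A minimal sketch of enabling lock logging for a user and then reading the log (the user name is hypothetical, and the WITH LOCK=n option and its threshold unit should be confirmed in SQL Data Definition Language for your release):

      -- 'etl_user' is hypothetical; LOCK=n sets the contention threshold
      BEGIN QUERY LOGGING WITH LOCK=5 ON etl_user;

      -- Review logged lock contention events
      SELECT *
      FROM DBC.QryLockLogXMLV;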

    “XML Lock Log Table: DBQLXMLLockTbl” on page 347.

     

    The Showlocks utility identifies and displays all active host utility (HUT) locks placed on databases and tables by ARC operations.

    HUT locks may interfere with application processing and are normally released after the utility process is complete.

    If locks interfere with application processing, you can remove them by invoking the RELEASE LOCK statement. It is available through the ARC utility or as an SQL statement.

    Showlocks can be started from the Database Window Supervisor screen or the HUTCNS console.

  • Supervisor: start showlocks
  • HUTCNS: LOCksdisplay (entering LOC is all that is needed)
  • Teradata Archive/Recovery Utility Reference
  • “Show Locks (showlocks)” in Utilities
    Memory
    (insufficient memory resulting in Error 3710)

    Teradata recommends that you use the default value set for the MaxParseTreeSegs field in DBS Control. If you run into this error, first try simplifying your queries and/or dropping unnecessary join indexes. If this still does not work, verify the value you have set for MaxParseTreeSegs and contact Teradata Customer Support.

    Note: The optimal value depends on your system and what release and fix level you are running.

    Increasing MaxParseTreeSegs to a value larger than the default may allow a request that previously failed with error 3710 to run, but it could result in other problems, such as long parsing times or, in extreme cases, a Teradata reset.

    “MaxParseTreeSegs” in Utilities

    Priority of jobs

    Priority Scheduler assigns different priorities to different types of jobs. Higher priority work receives access to CPU and I/O more frequently.

    Use the ResUsageSps table to understand how CPU is being consumed by different workloads and, on SLES 10, to validate the relative weights given to workloads. You can also use this table to understand the pattern of database activity across the processing day.

  • “Priority Scheduler (schmon)” in Utilities
  • “ResUsageSps Table” in Resource Usage Macros and Tables

    Queries

    To capture query plans for analysis, use BEGIN/REPLACE QUERY LOGGING WITH XMLPLAN or the EXPLAIN IN XML request modifier.

    Alternatively, use the STEPINFO option of BEGIN/REPLACE QUERY LOGGING to determine which step uses the most resources and whether there are any large differences between estimated and actual CPU, I/O, and number of rows.
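    For example, a minimal sketch (the user and table names are hypothetical; see SQL Data Definition Language for the complete BEGIN QUERY LOGGING syntax):

      -- 'etl_user' is hypothetical; log XML plans and step-level detail
      BEGIN QUERY LOGGING WITH XMLPLAN, STEPINFO ON etl_user;

      -- Capture an XML plan for a single request without logging it
      EXPLAIN IN XML
      SELECT order_id, amount
      FROM Sales.Orders
      WHERE order_date = CURRENT_DATE;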

    Manage your queries using Teradata Viewpoint. You can:

  • Filter queries against object-access or query-resource rules.
  • Control the flow of queries coming into the system for concurrency reasons and delay them if necessary.
  • Log any system exception actions.
  • Determine in which Performance Group queries should run (in SLES 10).

    Resource usage

     

    Use resource usage macros and tables to obtain information about which nodes or vprocs are heavily using which resources. This can help you identify bottlenecks.

    Resource Usage Macros and Tables

     

    The DBQL view DBC.QryLogV reports metrics such as the CPU time of the busiest AMP, the I/O count of the busiest AMP, and the maximum amount of spool used when processing a query.
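    For example, a minimal sketch (the column names MaxAMPCPUTime, MaxAmpIO, and SpoolUsage are assumptions to verify against DBC.QryLogV in Data Dictionary):

      -- Logged queries ranked by the CPU time of their busiest AMP
      SELECT QueryID, UserName, MaxAMPCPUTime, MaxAmpIO, SpoolUsage
      FROM DBC.QryLogV
      ORDER BY MaxAMPCPUTime DESC;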

    “Chapter 15 Tracking Query Behavior with Database Query Logging: Operational DBAs” on page 259

    Sessions that are hung

    Use the Query Session utility to determine the state of each session, and then use the Lock Display utility to find lock contentions.

    “Troubleshooting Problems with Sessions” on page 155

    Session information when logged onto multiple sessions

    To find the Thread ID or Job ID of the session from which you are querying, first submit a SELECT SESSION;

    After that query returns the session number n, use that number in the following query:

    sel logonsource
    from sessioninfovx
    where sessionno = n;
     
    *** Query completed. One row found. One column
     returned.
    *** Total elapsed time was 1 second.
     
    LogonSource
    -------------------------------------------------------
    (TCP/IP) 09AB 127.0.0.1 DBC    8804  USER  BTEQ  01 LSS
     

    This helps you determine which client process or thread ID belongs to which session, which is especially useful when you are logged on to multiple sessions.

    “SessionInfo[V][X]” in Data Dictionary

     

    Query Session utility displays the state of load utility and query sessions. Details can include statuses such as Parsing, Active, Blocked, Response, whether stored procedures are being processed, and so on.

    “Query Session (qrysessn)” in Utilities

    Skew

    There are many ways to detect skew:

  • Use Teradata Viewpoint to determine which user is using the most CPU. In general, a bad query can be detected by finding users with high CPU relative to I/O access.
  • Look at the EXPLAIN text to determine the number of steps used for the query and the type of operation. You may find something like a product join in a poorly written query. You can abort the query and investigate further or fix the query.
  • To identify node-level skew, use the ResUsageSpma table, comparing maximum CPU usage to the average CPU consumed across all nodes. If parallel efficiency as shown in the ResUsageSpma table is not approaching 100%, look for the reasons for the imbalance.
  • To find AMP-level skew, there are several options. Use the ResCPUByAMP macro to find AMP-level CPU skew. You can look at the ResUsageSvpr table and compare resource usage across vprocs. You can also look at the number-of-requests field in the ResUsageSps table to see whether workloads on one node are actively supporting more requests than on other nodes, and then drill down to AMPs. Also using the ResUsageSps table, you can examine queue wait and service time values to find backed-up queries by workload or allocation group.
  • To find skew at the request level, compare the DBQLogTbl fields MaxAMPCPUTime and MinAMPCPU (see the sketch after this list). Because user-level DBQL logging is often incomplete, ResUsage tables and macros may provide more accurate information.
  • Another option is to use PM/API to develop your own application to monitor for high CPU consumers.
  • “EXPLAIN Request Modifier” in SQL Data Manipulation Language
  • “ResUsageSps Table,” “ResUsageSpma Table,” “ResUsageSvpr Table,” and “ResCPUByAMP Macros” in Resource Usage Macros and Tables
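    For the request-level comparison above, a minimal sketch (the MinAmpCPUTime column name and the CPU threshold are assumptions; verify the DBQL column names in Data Dictionary for your release):

      -- Rank CPU-heavy requests by the ratio of busiest to least-busy AMP
      SELECT QueryID, UserName, MaxAMPCPUTime, MinAmpCPUTime,
             MaxAMPCPUTime / NULLIFZERO(MinAmpCPUTime) AS CPUSkewRatio
      FROM DBC.QryLogV
      WHERE AMPCPUTime > 100
      ORDER BY CPUSkewRatio DESC;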

    Slowdowns

    See the topic “Hangs or slowdowns” in this table.

     

    Slow queries to DBC tables

    If your site uses many custom applications or tools that often query the DBC tables, consider collecting statistics on these tables.
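    For example, a minimal sketch (assuming your release supports collecting statistics on dictionary tables; the table and column choice here is only an illustration):

      -- Collect statistics on a frequently queried dictionary table
      COLLECT STATISTICS COLUMN (UserId, DatabaseId)
      ON DBC.AccessRights;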

    “Collecting Statistics on Data Dictionary Tables” on page 200

    Space Exceeded

    If a space-intensive operation, such as a load utility or restoring from an archive, exceeds available perm space, two DBS Control utility fields can be set temporarily to allow operations to complete and avoid out of space errors.

    The settings apply to all operations and all instances of space accounting, such as table inserts, so Teradata recommends setting these fields to a non-zero value only temporarily.

    OverflowMaxPermSpacePercentage and OverflowMaxPermSpaceKByteCount fields of the DBS Control utility in Utilities.

    Spool space depletion

    Several factors can contribute to spool space depletion:

  • Poorly written queries can sometimes consume excessive spool space. Use Teradata Viewpoint to check queries before allowing them to run. You can check the EXPLAIN to see how much spool is used and prevent the system from getting overloaded with bad queries.
  • Outdated statistics can cause bad query plans. The Optimizer may choose a different plan than it should if the statistics it uses do not accurately reflect the data. The Optimizer also determines how much spool it needs based on collected statistics. Refresh collected statistics regularly.
  • If you are using OLAP statistical functions and you run out of spool, check the syntax of your queries. For example, when using the PARTITION BY clause, if a large number of identical values in the partitioning column hash to the same AMP, this can result in out-of-spool errors. Choose a column that results in rows being distributed over a broader set of AMPs.
  • Badly skewed data. Poor data distribution means one AMP is responsible for a larger proportion of the data. Because spool space is divided by the number of AMPs, reaching the maximum spool space limit on one AMP means your query will fail even though spool space remains unused on the other AMPs.
  • A poorly chosen PI that leads to AMP skew, which has the same effect.
  • Adding nodes or AMPs without reassigning spool space spreads the spool allocation across more AMPs, so each AMP receives a smaller share.
  • The limit for spool space defined in a profile overrides the user setting. Check whether the profile limit is lower than the limit defined for the user; you can compare the values of MaxProfileSpool and MaxSpool in DBC.DiskSpaceV (see the sketch after this list).
  • Limiting spool space for new users helps reduce the impact of possibly bad queries (such as product joins). With a reasonable limit, the user gets out-of-spool messages before a bad query ties up the system. Note that adding spool to a bad query allows it to run longer, but it may still not complete.
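    For the profile-versus-user comparison above, a minimal sketch (the user name is hypothetical; verify the MaxProfileSpool and MaxSpool columns of DBC.DiskSpaceV for your release):

      -- 'etl_user' is hypothetical; compare user and profile spool limits
      SELECT DatabaseName,
             SUM(MaxSpool)        AS UserSpoolLimit,
             SUM(MaxProfileSpool) AS ProfileSpoolLimit
      FROM DBC.DiskSpaceV
      WHERE DatabaseName = 'etl_user'
      GROUP BY DatabaseName;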

  • “Troubleshooting Spool Space Problems” on page 448
  • “EXPLAIN Request Modifier” in SQL Data Manipulation Language
  • Teradata Viewpoint
  • “Ordered Analytical Functions” in SQL Functions, Operators, Expressions, and Predicates
    Space allocated to storage

    The Teradata Virtual Allocator Manager (TVAM) utility allows you to display cylinder mapping or force migration of cylinders on the system.

    Note: Do not use the TVAM utility unless instructed by Teradata Customer Support.

    Man page or online help

    System crashes or failures

     

     

    The Screen Debug and Screen Dump screens control how and what your system records for crashdumps. Teradata recommends that default settings for all dumps be changed only when requested by a system support representative or Teradata Customer Support.

    “Control GDO Editor (ctl)” in Utilities

    DBC.Software_Event_LogV is a system view that provides detailed information about errors or system failures and their related node, vproc, partition, task, function, software version, optional backtrace, diagnostic information, and so on.

    The Event_Tag field of the view reports the error message number.

  • Data Dictionary
  • Messages
    The DUL utility saves or restores system dump tables onto tape.

    “Dump Unload/Load Utility (dul, dultape)” in Utilities

    System recovery

    The Recovery Manager utility monitors and reports the progress of a system recovery after a crash or a user abort. You can use this utility to monitor transaction rollbacks, cancel rollbacks, and more.

    “Recovery Manager (rcvmanager)” in Utilities

    Table

    The Table Rebuild utility rebuilds tables Teradata Database cannot recover automatically. Table Rebuild can rebuild the following for an AMP:

  • The primary or fallback portion of a table
  • An entire table (both primary and fallback portions)
  • A particular range of rows of a subtable
  • The corrupted regions of a table
  • A corrupted or missing table header
  • All tables
  • All fallback tables
  • All tables in a database
  • Single tables sequentially or tables of each database in parallel.
  • “Table Rebuild (rebuild)” in Utilities
  • See also “Table inconsistencies or corruption”
    Table inconsistencies or corruption

    CheckTable is a diagnostic tool that checks for:

  • Table and dictionary inconsistencies, such as differing table versions, ParentCount and ChildCount data, and partitioning definitions.
  • Table corruption of primary and fallback data and of stored procedure tables, such as duplicate rows, duplicate unique values, and data inconsistencies.
  • Inconsistencies in internal data structures such as table headers, row identifiers, secondary indexes, and reference indexes.
  • Invalid row hash values and partition numbers.
    Options for CheckTable include the following:

  • Four levels of data integrity checking (PENDINGOP, ONE, TWO, and THREE). Each successive level performs additional checks that are more thorough and consequently more time consuming.
  • The DOWN ONLY option displays status information for down tables only. Note: The DOWN ONLY option works for all levels except PENDINGOP.

  • The ERROR ONLY option causes CheckTable to display only skipped tables, down tables, and tables with errors and warnings. This allows you to quickly identify and address problems found by CheckTable.
  • The OUTPUT command allows you to redirect CheckTable output to a log file. The ECHO option for this command allows you to direct output to both the console and a log file.
  • CHECKTABLEB is a non-interactive batch mode version of CheckTable that can be run through the cnsrun utility. CHECKTABLEB is identical to CheckTable except that it is meant to run from a script.

    “CheckTable (checktable)” in Utilities

     

    The Table Rebuild utility allows you to fix a primary or fallback copy of a table, or only specific corrupted parts. When the rebuild operation successfully completes on an AMP, Table Rebuild removes the down region information from the table header. If, however, the fallback subtable has any down regions defined on it, the rebuild process aborts and the system returns an error.

    You cannot rebuild a table with parts of its fallback down. While you can rebuild stored procedures, UDFs, and UDMs, you cannot rebuild a join index or hash index defined on a table being rebuilt.

  • “Table Rebuild (rebuild)” in Utilities
  • See also “Table inconsistencies or corruption”
    Vprocs

    The Vproc Manager utility allows you to:

  • View or change vproc states
  • Initialize and boot a specific vproc
  • Initialize the storage associated with a specific vproc
  • Force a manual Teradata Database restart
    A few of the vproc states include:

  • FATAL: the vproc or its associated storage is not operational. The system marks a vproc FATAL because of repeated crashes or because data integrity is in danger. Multiple crashes, indicated by the crash count, point to a repeatable condition such as data corruption, bad SQL, or a software bug. Contact Teradata Customer Support.
  • FATAL**: this is an “fsgwizard” data integrity situation. Contact Teradata Customer Support.
  • OFFLINE: the vproc is operational but was set offline, either purposely by an operator or by the system. For example, if all AMPs in a clique are OFFLINE, they were probably set OFFLINE by the system because none of the nodes in the clique were up. This usually occurs during a system-wide reboot when nodes come up at different times. Note: A local crash of an OFFLINE vproc will cause the database to restart.

  • UTILITY Catchup (offline catchup): the vproc was previously down (OFFLINE or FATAL or some other state). If its fallback data was written to another vproc in its cluster, the vproc is now back up (but not online) while it catches up its fallback data.
  • “Vproc Manager (vprocmanager)” in Utilities

    Vprocs and storage

    From the Supervisor screen of the Database Window, enter GET CONFIG to view the current system configuration.

    To administrate vprocs and disks, use the Vproc Manager utility.

  • “Database Window (xdbw)” in Utilities
  • “Vproc Manager (vprocmanager)” in Utilities

    The Query Configuration utility reports the status of vprocs managed by node or for the entire Teradata Database system.

    “Query Configuration (qryconfig)” in Utilities

    Vproc Manager (vprocmanager) utility

    If vprocmanager exits with an error...

    # vprocmanager
    PDE is not running: Operation not permitted
    PDE event 10117 reported when task is not attached to PDE or PDE is down. Task should stop on this error.

    then you are probably attempting to run it from a non-TPA node. Run vprocmanager from a TPA node (a node running Teradata Database software).

    “Vproc Manager (vprocmanager)” in Utilities