16.10 - Troubleshooting Crashdumps - Teradata Database

Teradata Database Administration

Product: Teradata Database
Release Number: 16.10
Release Date: April 2018
Content Type: Administration
Publication ID: B035-1093-161K
Language: English (United States)

The following section is intended for Teradata support personnel and lists some common questions regarding how to troubleshoot crashdumps.

  1. What are the most common errors encountered while handling crashdumps, and why do they occur?

    Question: What does error 2644 indicate?
    Answer: The DBC.Crashdumps database is full.

    Delete unwanted tables to free up space, or add more space to the Crashdumps database.

    Question: What does the error “Filesystem full” indicate?
    Answer: The raw PDE crashdump directory is full.

    Delete unwanted crashdumps from the dump directory using the clear option of the CSP utility. For more information, submit:

    csp -h

    at a Teradata command prompt.

    Question: What does the error “Control GDO maxdumps exceeded; no more dumps will be captured” indicate?
    Answer: This error occurs when another crashdump occurs after the number of dumps has already reached the maxdump parameter. As the message states, no more dumps will be captured.

    Question: I notice a crashdump with the following name:

    Crash_yyyymmddhhmmss_nn_Part

    What does this mean?

    Answer: This is a partial crashdump. A crashdump that has not yet been saved to the Crashdumps database is considered partial. Wait and give the system more time to process the crashdump. If you have waited but are still not sure whether the crashdump is okay to use, run the csppeek utility on the saved crashdump.

    If a crashdump was not saved completely in the dump directory, it is considered an incomplete crashdump.

    You can find crashdump error messages in the log for your operating system (see “Viewing Teradata Crashdump Messages”).

  2. What does it mean when “.Err” files are present?

    If you submit the following command in BTEQ:

    HELP DATABASE CRASHDUMPS;

    you might see .Err files, for example .Err1 and .Err2 files.

    This can be normal if the system is still saving a crashdump; the .Err files are cleared when the save completes. However, if the logs show that CSP did not save the crashdump successfully and the .Err files remain, something went wrong while saving the dump.

  3. What should I do if I am unable to save crashdumps?
    If a CSP save fails, always check the log first. Then check the following items.

    Check: Are your host entries properly configured?
    Description: Is the Host Group ID defined on all of the hosts?

    The hosts file needs COP entries for a client to log on to the database. The copname must be configured in the /etc/hosts file.

    In addition to the hostname entries, you must also define the hostname COP alias on each node for all crashdump tools that access crashdumps. Setup is normally done during staging or configuration of client utilities. Consider the following when setting up COP entries:
    • COP alias entries should follow standard Teradata Network CLI protocols.
    • One of the COP entries in the file must be dbccop1. CLI uses this name as its default. If dbccop1 is not in the hosts file, you get the following error:
      MOSI: ER_HOST (117): DBC name not found - possible HOSTS file problem.
      *** ERROR 204: Can't Init Things. Exiting!
    • COP entries must be numbered sequentially, because CLI stops looking for COP entries at the first COP entry that does not return successfully. For example, if the hosts file contains dbccop1, dbccop2, dbccop3, and dbccop5, CLI will not use dbccop5 to connect to the Teradata Database server, because dbccop4 is missing.

      If a Teradata Database system can connect to four COPs, the COP entries in the hosts file should look as follows:

      [IP address of NODE1] COPNAMEcop1
      [IP address of NODE2] COPNAMEcop2
      [IP address of NODE3] COPNAMEcop3
      [IP address of NODE4] COPNAMEcop4

      You can have multiple COP entries per node as long as each uses a different COPNAME. For example:

      39.64.8.9  smp001-4 dbccop1 hotwmacop1 wmahotcop1
      39.64.8.10 smp001-5 dbccop2 hotwmacop2 wmahotcop2
      39.64.8.11 smp001-6 dbccop3 hotwmacop3 wmahotcop3
      39.64.8.12 smp001-7 dbccop4 hotwmacop4 wmahotcop4

    Check: Have you made any tuning changes to throttle the number of FastLoad sessions running at any given time?
    Description: If a user FastLoad job is running and CSP attempts to log on its sessions through a PE-based host group, there may not be enough sessions available.

    Check: Does the Crashdumps database have enough space to save the dumps?
    Description: See “Configuring the Crashdumps Database” for how to check how much space Crashdumps has and how to allocate more. You may also need to delete unwanted crashdumps to make more room.

    Check: Is CSP unable to allocate more than the minimum number of sessions per node in the specified host group?
    Description: You cannot save crashdumps in this situation. Run CSP at a time when the system is more quiescent.
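To make the sequential-lookup rule for COP entries concrete, here is a minimal Python sketch. It is a hypothetical model of the behavior described above, not Teradata CLI code: the function names are illustrative, and it assumes a simplified hosts-file format (an IP address followed by a host name and aliases, with # comments). Given the hosts file text and a COPNAME prefix, it returns the COP names in the order a client would try them, stopping at the first missing number.

```python
from __future__ import annotations


def host_names(hosts_text: str) -> set[str]:
    """Collect every host name and alias from hosts-file text (lowercased)."""
    names: set[str] = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        fields = line.split()
        names.update(f.lower() for f in fields[1:])  # fields[0] is the IP address
    return names


def usable_cops(hosts_text: str, copname: str = "dbc") -> list[str]:
    """Return COP names in the order a client would try them.

    Models the rule described above: lookup starts at <copname>cop1 and
    stops at the first missing number, so a gap hides all later entries.
    """
    names = host_names(hosts_text)
    prefix = copname.lower()
    found: list[str] = []
    n = 1
    while f"{prefix}cop{n}" in names:
        found.append(f"{prefix}cop{n}")
        n += 1
    return found


if __name__ == "__main__":
    hosts = """\
10.0.0.1 node1 dbccop1
10.0.0.2 node2 dbccop2
10.0.0.3 node3 dbccop3
10.0.0.5 node5 dbccop5
"""
    # dbccop5 is never reached because dbccop4 is missing
    print(usable_cops(hosts))  # prints ['dbccop1', 'dbccop2', 'dbccop3']
```

An empty result for the default dbc prefix models the case in which dbccop1 is missing from the hosts file, which is when the MOSI ER_HOST (117) error shown above appears.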
  4. What if CSP runs out of sessions?

    To save a crashdump, the system must be able to log on at least the minimum number of sessions defined in the CSP resource file; the default is 4 sessions per node. The system also sets a maximum number of internal sessions per PE.

    If you notice that CSP runs out of sessions, run CSP at a different time when the system is more quiescent.

  5. How can I tell if a crashdump is okay to use?
    • Use the CSP utility to check the raw PDE crashdumps with the following command:
      csp -mode list -source dump

      The output will appear as follows:

      # csp -mode list
      csp: Searching for dumps in raw dump directory /var/opt/teradata/tddump
      csp: 4 dumps found, 4 dumps to process
      csp: Sel  ID (date-time-token)   Nodes Event Instigator      Status
      csp: ---  ---------------------- ----- ----- --------------- ------
      csp:  *   2008/09/09-17:03:13-02 1     10196 16384/0/(11480)
      csp:      dump_20080909_180130   1           16384           Required files are missing or corrupted
      csp:      2008/08/14-11:35:58-01 1     10196 16384/0/(15160)
      csp:  *   2008/07/11-14:41:58-01 1     10416 1/11/(28272)
      csp:  *   2008/07/01-11:58:42-05 1     10196 16384/0/(6418)
      #

      If the Status column reports “Required files are missing or corrupted” or “Dump belongs to a different PDE version,” do not use the crashdump. If the Status column is blank, the crashdump is okay to use.

    • For a saved crashdump, use the csppeek utility by submitting the following command at a Teradata command prompt:
      csppeek -i -d crashdumpname

      where crashdumpname is in the form of Crash_yyyymmdd_hhmmss_nn.

      If csppeek reports information about the crashdump, similar to the following:

      Dump (version 20) contains 1 node, 10 vprocs, 35 pids, 839 tids, 0 memos
        nodeids: 16384(I)
      
      Software versions:
      PDE    13.10.00.00
      TDBMS    13.10.00.00
      TGTW    13.10.00.00
      TCHN    13.10.00.00
      TDGSS    13.10.00.00
      
      NODE 16384 is system noun
        Dump token: Crash_080911_080707_01  instigator: 33//3268
        Caused by logevent: 10196-60 reported by /32382720 (3268/16384/0)
      
      node 16384 contains vprocs 16384(N)
       16383(P)
       16382(P)
       10238(V)
       10237(V)
       8192(G)
       3(A)
       2(A)
       1(A)
       0(A)

      This means the crashdump is okay to use. If csppeek reports nothing or reports an error, it could not read the crashdump information, which indicates a problem with the crashdump; do not use it.
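If you check many dumps, the Status-column rule above can be applied programmatically. The following Python sketch is an illustration under stated assumptions, not a Teradata tool: parse_csp_list and usable_dumps are hypothetical names, and the parser assumes the fixed-column layout shown in the sample csp -mode list output (a header line containing ID and Status, a dashed separator line, then one csp: row per dump). It reads the column offsets from the header and treats a blank Status as usable. Real output may differ between releases.

```python
from __future__ import annotations


def parse_csp_list(output: str) -> list[tuple[str, str]]:
    """Return (dump ID, Status) pairs from `csp -mode list` text output.

    Assumes the fixed-column layout of the sample listing: a header line that
    contains "ID" and "Status", a dashed separator line, then one "csp:" row
    per dump. Column offsets come from the header, so a blank Status is "".
    """
    lines = output.splitlines()
    header_idx = next(
        (i for i, ln in enumerate(lines) if " ID " in ln and "Status" in ln), None
    )
    if header_idx is None:
        raise ValueError("no csp list header line found")
    header = lines[header_idx]
    id_col = header.index(" ID ") + 1  # start of the ID column
    status_col = header.index("Status")  # start of the Status column
    rows: list[tuple[str, str]] = []
    for ln in lines[header_idx + 2:]:  # skip the header and the dashed separator
        if not ln.startswith("csp:"):
            break  # e.g. the shell prompt that follows the listing
        fields = ln[id_col:].split()
        if fields:
            rows.append((fields[0], ln[status_col:].strip()))
    return rows


def usable_dumps(output: str) -> list[str]:
    """IDs of dumps whose Status column is blank (okay to use, per the rule above)."""
    return [dump_id for dump_id, status in parse_csp_list(output) if not status]
```

Applied to the sample listing above, usable_dumps should return the four IDs with a blank Status and omit dump_20080909_180130. Reading offsets from the header, rather than splitting rows on whitespace, handles rows such as dump_20080909_180130 that leave the Event column empty.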

  6. How do I get CSP to resume saving crashdumps automatically after it reports a 2644 failure (no more space in the Crashdumps database)?

    When CSP reports this failure, CSP and cspslave go to sleep until they are awakened by the cspwake utility or by a tpareset. After you add more space to the Crashdumps database, run cspwake at a command prompt without any arguments to resume automatic dump saving. If automatic dump saving was not enabled, start CSP manually with the -force option instead.