The following section is for Teradata Support personnel and lists common questions about troubleshooting crashdumps.
- What are the most common errors encountered while handling crashdumps and why do these occur?
Question: What does an error 2644 indicate?
Answer: DBC.Crashdumps database is full. Delete unwanted tables to free up more space or add additional space to the Crashdumps database.
Question: What does an error "Filesystem full" mean?
Answer: The raw PDE crashdump directory is full. Delete unwanted crashdumps from the dump directory using the clear option of the CSP utility. For more information, submit:
csp -h
at a Teradata command prompt.
Question: What does the error "Control GDO maxdumps exceeded; no more dumps will be captured" mean?
Answer: This error occurs when the maxdump parameter has been reached and another crashdump occurs.
Question: I notice a crashdump with the following name: Crash_yyyymmddhhmmss_nn_Part
What does this mean?
Answer: This is a partial crashdump. If a crashdump is not saved to the Crashdumps database yet, it is considered a partial crashdump. Give the system more time to process the crashdump. If you have waited but are not sure the crashdump is okay to use, run the csppeek utility on the saved crashdump. If a crashdump is not saved completely in the dump directory, the crashdump is considered an incomplete crashdump.
You can find the crashdump error messages in the log for your operating system (see Viewing Teradata Crashdump Messages).
- What does it mean when there are ".Err" files present?
If you submit the following command in BTEQ:
HELP DATABASE CRASHDUMPS
you might see .Err files, for example .Err1 and .Err2 files.
This may be normal if the system is still in the process of saving a crashdump. These error files are cleared when the save completes. However, if the logs show that CSP did not successfully save the crashdump and the .Err files remain uncleared, something went wrong while saving the dump.
- What if I am unable to save crashdumps?
If a CSP save failed, always check the log first. Then check the following items:
Item to check: Are your host entries properly configured? Is the Host Group ID defined on all of the hosts?
Description: The COP entries are needed in the hosts file for a client to log on to the database. The copname must be configured in the /etc/hosts file.
In addition to the hostname entries, you must also define the hostname COP alias on each node for all the crashdumps tools accessing crashdumps. Setup is normally done during staging or configuration of client utilities. Consider the following for setting up COP entries:
- The cop alias entries should follow standard Teradata Network CLI protocols.
- One of the cop entries in the file must be dbccop1. This name is used as a default for CLI. If you do not have dbccop1 in the hosts file then you get the following error:
MOSI: ER_HOST (117): DBC name not found - possible HOSTS file problem.
*** ERROR 204: Can't Init Things. Exiting!
- The COP entries must be sequential, or CLI stops looking for COP entries at the first COP entry that does not return successfully. For example, if dbccop1, dbccop2, dbccop3, dbccop5 are entries in the hosts file, CLI will not use dbccop5 to connect to the server, because dbccop4 is missing.
If a system can connect to four COPs, the COP entries in the hosts file should look as follows:
[IP address of NODE1] COPNAMEcop1
[IP address of NODE2] COPNAMEcop2
[IP address of NODE3] COPNAMEcop3
[IP address of NODE4] COPNAMEcop4
You can have multiple COP entries per node as long as the COPNAME is different. For example:
39.64.8.9   smp001-4  dbccop1  hotwmacop1  wmahotcop1
39.64.8.10  smp001-5  dbccop2  hotwmacop2  wmahotcop2
39.64.8.11  smp001-6  dbccop3  hotwmacop3  wmahotcop3
39.64.8.12  smp001-7  dbccop4  hotwmacop4  wmahotcop4
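The sequential-COP rule described above can be checked mechanically before attempting a save. The following is a minimal Python sketch, not a Teradata tool: the function names `cop_numbers` and `first_missing_cop` are hypothetical, and the parsing assumes a conventional hosts-file layout (IP address first, then whitespace-separated names, with `#` starting a comment) and treats alias names as case-insensitive.

```python
import re

def cop_numbers(hosts_text, copname="dbc"):
    """Collect the COP numbers defined for `copname` in hosts-file text."""
    # Assumption: aliases have the form <copname>cop<N>, for example dbccop1.
    pattern = re.compile(re.escape(copname) + r"cop(\d+)", re.IGNORECASE)
    numbers = set()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # strip comments, split on whitespace
        for alias in fields[1:]:                # fields[0] is the IP address
            match = pattern.fullmatch(alias)
            if match:
                numbers.add(int(match.group(1)))
    return numbers

def first_missing_cop(numbers):
    """Return the lowest COP number, counting up from 1, that is not defined.

    Returns None when the entries run 1..N with no gap.
    """
    highest = max(numbers, default=0)
    for n in range(1, highest + 1):
        if n not in numbers:
            return n
    return None if numbers else 1

if __name__ == "__main__":
    # Hosts-file text with a gap at dbccop4, as in the example in the text.
    sample = (
        "39.64.8.9   smp001-4  dbccop1\n"
        "39.64.8.10  smp001-5  dbccop2\n"
        "39.64.8.11  smp001-6  dbccop3\n"
        "39.64.8.12  smp001-7  dbccop5\n"
    )
    found = cop_numbers(sample)
    print("defined:", sorted(found), "first missing:", first_missing_cop(found))
```

A result of 1 from `first_missing_cop` corresponds to the missing-dbccop1 case; any other non-None result marks the point after which later COP entries would be ignored.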
Item to check: Have you made any tuning changes to throttle the number of FastLoad sessions running at any given time?
Description: If a user FastLoad is running and crashdumps attempts to log on its session while using a PE-based host group, there may not be enough sessions available.
Item to check: Does the Crashdumps database have enough space to save the dumps?
Description: See Configuring the Crashdumps Database for information on how to check how much space Crashdumps has and how to allocate more space. You may also need to delete unwanted crashdumps to make more room.
Item to check: Is CSP unable to allocate more than the minimum number of sessions per node in the specified host group?
Description: You cannot save crashdumps in this situation. Find a different time window when the system is more quiescent.
- What if CSP runs out of sessions?
To save a crashdump, the system must be able to log on a minimum number of sessions as defined in the resource file for CSP. The default is 4 sessions per node. There is also a maximum number of internal sessions set on the system per PE.
If you notice that CSP runs out of sessions, run CSP at a different time when the system is more quiescent.
- How can I tell if a crashdump is okay to use?
- Use the CSP utility to check the raw PDE crashdumps with the following command:
csp -mode list -source dump
Output:
# csp -mode list
csp: Searching for dumps in raw dump directory /var/opt/teradata/tddump
csp: 4 dumps found, 4 dumps to process
csp: Sel ID (date-time-token)   Nodes Event Instigator      Status
csp: --- ---------------------- ----- ----- --------------- ------
csp:  *  2008/09/09-17:03:13-02     1 10196 16384/0/(11480)
csp:     dump_20080909_180130       1 16384 Required files are missing or corrupted
csp:     2008/08/14-11:35:58-01     1 10196 16384/0/(15160)
csp:  *  2008/07/11-14:41:58-01     1 10416 1/11/(28272)
csp:  *  2008/07/01-11:58:42-05     1 10196 16384/0/(6418)
#
If the Status column reports “Required files are missing or corrupted” or “Dump belongs to a different PDE version,” the crashdump should not be used. If the Status column is blank, however, this means that the crashdump is okay to use.
- Use the csppeek utility by submitting the following command at a Teradata command prompt for a saved crashdump:
csppeek -i -d crashdumpname
where crashdumpname is in the form of Crash_yyyymmdd_hhmmss_nn.
If csppeek reports information about the crashdump as follows:
Dump (version 20) contains 1 node, 10 vprocs, 35 pids, 839 tids, 0 memos
nodeids: 16384(I)
Software versions: PDE 13.10.00.00 TDBMS 13.10.00.00 TGTW 13.10.00.00 TCHN 13.10.00.00 TDGSS 13.10.00.00
NODE 16384 is system noun
Dump token: Crash_080911_080707_01 instigator: 33//3268
Caused by logevent: 10196-60 reported by /32382720 (3268/16384/0)
node 16384 contains vprocs 16384(N) 16383(P) 16382(P) 10238(V) 10237(V) 8192(G) 3(A) 2(A) 1(A) 0(A)
This means you can use the crashdump. If csppeek reports nothing or reports an error, it was unable to read the crashdump information, and you cannot use the crashdump.
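The Status-column rule for the CSP list output (a blank status means the crashdump is okay to use) can be applied to captured output with a short script. This is a minimal Python sketch under stated assumptions: `classify_dumps` is a hypothetical helper, the listing is assumed to use the `csp:` line prefix and dashed separator row shown in the sample output, and only the two status messages named in this section are recognized, so any other status text would be treated as blank.

```python
# The two status messages this section names as making a dump unusable.
BAD_STATUSES = (
    "Required files are missing or corrupted",
    "Dump belongs to a different PDE version",
)

def classify_dumps(listing):
    """Map each dump ID in captured list output to its status ('' means usable)."""
    results = {}
    in_table = False
    for raw in listing.splitlines():
        line = raw.strip()
        if not line.startswith("csp:"):
            continue                        # skip prompts and unrelated lines
        body = line[len("csp:"):].strip()
        if body.startswith("---"):
            in_table = True                 # data rows follow the dashed separator
            continue
        if not in_table:
            continue
        fields = body.split()
        if fields and fields[0] == "*":     # drop the selection marker
            fields = fields[1:]
        if not fields:
            continue
        status = next((s for s in BAD_STATUSES if s in body), "")
        results[fields[0]] = status
    return results

def usable_dumps(listing):
    """Return the IDs of dumps whose status was not flagged."""
    return [dump_id for dump_id, status in classify_dumps(listing).items() if not status]
```

Passing the sample listing shown earlier to `usable_dumps` would return every ID except dump_20080909_180130, which carries the "Required files are missing or corrupted" status.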
- How do I get CSP to resume automatically saving crashdumps when it reports a 2644 failure (no more space in crashdumps database)?
When CSP reports this failure, CSP and cspslave go to sleep until they are awakened by a utility named cspwake or by a tpareset. After you add more space to the Crashdumps database, you can run cspwake at a command prompt without any arguments to resume automatic dump saving. If automatic dump saving was not in use, start CSP manually with the -force option instead.