The following section is intended for Teradata support personnel and lists some common questions regarding how to troubleshoot crashdumps.
1 What are the most common errors encountered while handling crashdumps and why do these occur?
Question | Answer
What does an error 2644 indicate? | The DBC.Crashdumps database is full. Delete unwanted tables to free space, or add space to the Crashdumps database.
What does the error “Filesystem full” indicate? | The raw PDE crashdump directory is full. Delete unwanted crashdumps from the dump directory using the clear option of the CSP utility. For more information, submit csp -h at a Teradata command prompt.
What does the error “Control GDO maxdumps exceeded; no more dumps will be captured” indicate? | The limit set by the maxdump parameter has already been reached and another crashdump has occurred.
I notice a crashdump with the following name: Crash_yyyymmddhhmmss_nn_Part. What does this mean? | This is a partial crashdump. A crashdump that has not yet been saved to the Crashdumps database is considered a partial crashdump. Give the system more time to process it. If you have waited and are still not sure the crashdump is okay to use, run the csppeek utility on the saved crashdump. A crashdump that is not saved completely in the dump directory is considered an incomplete crashdump.
You can find the crashdump error messages in the log for your operating system (see “Viewing Teradata Crashdump Messages” on page 558).
2 What does it mean when there are “.Err” files present?
If you submit the following command in BTEQ:
HELP DATABASE CRASHDUMPS;
you might see .Err files, for example .Err1 and .Err2 files.
This may be normal if the system is still in the process of saving a crashdump. These error files will be cleared when the save completes. However, if the logs show that CSP did not successfully save the crashdump and the .Err files are left over (that is, they remain uncleared), this indicates that something went wrong while saving the dump.
3 What should I do if I am unable to save crashdumps?
If a CSP save failed, always check the log first. Then check the following items:
a Are your host entries properly configured? Is the Host Group ID defined on all of the hosts?
COP entries are needed in the hosts file for a client to log on to the database.
The copname must be configured in the /etc/hosts file.
In addition to the hostname entries, you must also define the hostname COP alias on each node for all the crashdump tools that access crashdumps. Setup is normally done during staging or configuration of client utilities. If the COP entries are missing or incorrect, errors such as the following can be reported:
MOSI: ER_HOST (117): DBC name not found - possible HOSTS file problem.
*** ERROR 204: Can't Init Things. Exiting!
If a Teradata Database system can connect to four COPs, the COP entries in the hosts file should look as follows:
[IP address of NODE1] COPNAMEcop1
[IP address of NODE2] COPNAMEcop2
[IP address of NODE3] COPNAMEcop3
[IP address of NODE4] COPNAMEcop4
Note: You can have multiple COP entries per node as long as the COPNAME is different. For example:
39.64.8.9 smp001-4 dbccop1 hotwmacop1 wmahotcop1
39.64.8.10 smp001-5 dbccop2 hotwmacop2 wmahotcop2
39.64.8.11 smp001-6 dbccop3 hotwmacop3 wmahotcop3
39.64.8.12 smp001-7 dbccop4 hotwmacop4 wmahotcop4
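As a rough illustration of the naming pattern above, the following Python sketch checks hosts-file text for COP aliases of the form COPNAMEcopN. The function names, the sample text, and the assumption that aliases are numbered consecutively from 1 are illustrative only and are not part of any Teradata tool.

```python
import re


def find_cop_entries(hosts_text, copname):
    """Map COP number -> IP address for aliases of the form <copname>cop<N>."""
    pattern = re.compile(re.escape(copname) + r"cop(\d+)$", re.IGNORECASE)
    found = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            match = pattern.match(name)
            if match:
                found[int(match.group(1))] = ip
    return found


def missing_cops(hosts_text, copname, expected):
    """Return the COP numbers in 1..expected that have no alias defined."""
    found = find_cop_entries(hosts_text, copname)
    return [n for n in range(1, expected + 1) if n not in found]


# Sample text copied from the example entries above.
SAMPLE = """\
39.64.8.9 smp001-4 dbccop1 hotwmacop1 wmahotcop1
39.64.8.10 smp001-5 dbccop2 hotwmacop2 wmahotcop2
39.64.8.11 smp001-6 dbccop3 hotwmacop3 wmahotcop3
39.64.8.12 smp001-7 dbccop4 hotwmacop4 wmahotcop4
"""

if __name__ == "__main__":
    print(missing_cops(SAMPLE, "dbc", 4))     # [] : all four dbc COPs defined
    print(missing_cops(SAMPLE, "wmahot", 5))  # [5] : no wmahotcop5 alias
```

To check a real system, read the contents of /etc/hosts on each node and pass the text to missing_cops together with the COP name and the number of COPs you expect.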
b Have you made any tuning changes to throttle the number of FastLoad sessions running at any given time? If a user FastLoad job is running while a crashdump save is trying to log on its sessions through a PE-based host group, there may not be enough sessions available.
c Does the Crashdumps database have enough space to save the dumps? See “Configuring the Crashdumps Database” on page 558 for information on how to check how much space Crashdumps has and how to allocate more space. You may also need to delete unwanted crashdumps to make more room.
d Can CSP allocate at least the minimum number of sessions per node in the specified host group? If it cannot, you are unable to save crashdumps. Find a different time window when the system is more quiescent.
4 What if CSP runs out of sessions?
To save a crashdump, the system must be able to log on a minimum number of sessions as defined in the resource file for CSP. The default is 4 sessions per node. There is also a maximum number of internal sessions set on the system per PE.
If you notice that CSP runs out of sessions, run CSP at a different time when the system is more quiescent.
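The session arithmetic can be sketched in Python as follows. This is a minimal example that assumes the minimum scales linearly with the number of nodes; the function names are hypothetical, and only the default of 4 sessions per node comes from the text above. The actual value is defined in the CSP resource file.

```python
DEFAULT_MIN_SESSIONS_PER_NODE = 4  # default stated in the text above


def min_sessions_needed(node_count, per_node=DEFAULT_MIN_SESSIONS_PER_NODE):
    """Minimum sessions to log on, assuming the per-node minimum applies to every node."""
    if node_count < 1 or per_node < 1:
        raise ValueError("node_count and per_node must be positive")
    return node_count * per_node


def enough_sessions(available, node_count, per_node=DEFAULT_MIN_SESSIONS_PER_NODE):
    """True if the available sessions meet the assumed minimum."""
    return available >= min_sessions_needed(node_count, per_node)


if __name__ == "__main__":
    print(min_sessions_needed(4))  # 16 sessions for a 4-node system
    print(enough_sessions(15, 4))  # False: one session short
```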
5 How can I tell if a crashdump is okay to use?
To check the crashdumps in the raw dump directory, submit:
csp -mode list -source dump
The output will appear as follows:
# csp -mode list
csp: Searching for dumps in raw dump directory /var/opt/teradata/tddump
csp: 4 dumps found, 4 dumps to process
csp: Sel ID (date-time-token) Nodes Event Instigator Status
csp: --- ---------------------- ----- ----- ------------- ------
csp: * 2008/09/09-17:03:13-02 1 10196 16384/0/(11480)
csp: dump_20080909_180130 1 16384 Required files are missing or corrupted
csp: 2008/08/14-11:35:58-01 1 10196 16384/0/(15160)
csp: * 2008/07/11-14:41:58-01 1 10416 1/11/(28272)
csp: * 2008/07/01-11:58:42-05 1 10196 16384/0/(6418)
#
If the Status column reports “Required files are missing or corrupted” or “Dump belongs to a different PDE version,” the crashdump should not be used. If the Status column is blank, however, this means that the crashdump is okay to use.
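The Status check can be sketched in Python as follows. This is an illustration under stated assumptions: the listing has been captured as text, each dump row starts with "csp:", and only the two status messages named above mark a dump as unusable. The function name classify_dumps is hypothetical, and the real output may contain other status values that this sketch does not recognize.

```python
# The two status messages named in the text above.
BAD_STATUSES = (
    "Required files are missing or corrupted",
    "Dump belongs to a different PDE version",
)


def classify_dumps(listing):
    """Return (usable, unusable) lists of dump IDs from captured listing text."""
    usable, unusable = [], []
    for line in listing.splitlines():
        if not line.startswith("csp:"):
            continue
        body = line[len("csp:"):].strip()
        if body.startswith("*"):  # selection marker in the Sel column
            body = body[1:].strip()
        fields = body.split()
        # Skip banner, header, and separator lines: in a dump row the
        # second field is the numeric Nodes column.
        if len(fields) < 3 or not fields[1].isdigit():
            continue
        dump_id = fields[0]
        if any(status in body for status in BAD_STATUSES):
            unusable.append(dump_id)
        else:
            usable.append(dump_id)
    return usable, unusable


# Sample rows copied from the listing above.
SAMPLE = """\
csp: Searching for dumps in raw dump directory /var/opt/teradata/tddump
csp: 4 dumps found, 4 dumps to process
csp: Sel ID (date-time-token) Nodes Event Instigator Status
csp: --- ---------------------- ----- ----- ------------- ------
csp: * 2008/09/09-17:03:13-02 1 10196 16384/0/(11480)
csp: dump_20080909_180130 1 16384 Required files are missing or corrupted
csp: 2008/08/14-11:35:58-01 1 10196 16384/0/(15160)
"""

if __name__ == "__main__":
    ok, bad = classify_dumps(SAMPLE)
    print("usable:", ok)
    print("do not use:", bad)
```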
To check a saved crashdump, run the csppeek utility:
csppeek -i -d crashdumpname
where crashdumpname is in the form of Crash_yyyymmdd_hhmmss_nn.
If csppeek reports information about the crashdump as follows:
Dump (version 20) contains 1 node, 10 vprocs, 35 pids, 839 tids, 0 memos
nodeids: 16384(I)
Software versions:
PDE 13.10.00.00
TDBMS 13.10.00.00
TGTW 13.10.00.00
TCHN 13.10.00.00
TDGSS 13.10.00.00
NODE 16384 is system noun
Dump token: Crash_080911_080707_01 instigator: 33//3268
Caused by logevent: 10196-60 reported by /32382720 (3268/16384/0)
node 16384 contains vprocs 16384(N)
16383(P)
16382(P)
10238(V)
10237(V)
8192(G)
3(A)
2(A)
1(A)
0(A)
Output like this means the crashdump is okay to use. If csppeek reports nothing or reports an error, it could not read the crashdump information; do not use the crashdump.
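The same check can be sketched in Python for captured csppeek output. This is a heuristic sketch only: it assumes that readable crashdumps always produce a summary line beginning with "Dump (version", as in the sample above, and that error output does not contain such a line. The function name is hypothetical.

```python
def csppeek_output_looks_usable(output):
    """Heuristic: True if the captured text contains the dump summary line."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    if not lines:
        return False  # csppeek reported nothing
    # Assumption: the summary line appears only when the dump was readable.
    return any(ln.startswith("Dump (version") for ln in lines)


if __name__ == "__main__":
    sample = (
        "Dump (version 20) contains 1 node, 10 vprocs, 35 pids, 839 tids, 0 memos\n"
        "nodeids: 16384(I)\n"
    )
    print(csppeek_output_looks_usable(sample))  # True
    print(csppeek_output_looks_usable(""))      # False
```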
6 How do I get CSP to resume automatically saving crashdumps when it reports a 2644 failure (no more space in the Crashdumps database)?
When CSP reports this failure, CSP and cspslave go to sleep until they are awakened by a utility named cspwake or by a tpareset. After you add more space to the Crashdumps database, run cspwake at a command prompt without any arguments to resume automatic dump saving. If automatic dump saving was not in use, you must instead start CSP manually with the -force option.