Wednesday, August 6, 2014

Exadata Commands to generate Logs for Troubleshooting

1) Note the exact time of the issue so that log collection can focus on that window. 

2) Collect the /var/log/messages file covering the time of the reboot from both DB nodes. 
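
A hedged one-pass alternative, assuming a dbs_group file that lists the DB node hostnames (adjust the group file and the amount of history to your environment); dcli prefixes each line with the node name, so both nodes end up in one file:

# dcli -g dbs_group -l root --serial "tail -10000 /var/log/messages" > /tmp/messages_all_dbnodes.txt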

3) OPatch 

cd $ORACLE_HOME/OPatch 

./opatch lsinventory 
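
To keep one inventory listing per node with the node name in the file name, a hedged variant (the output path is only an example):

./opatch lsinventory > /tmp/opatch_lsinv_`hostname -s`.txt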

4) Please upload diagcollection.sh output files from all DB nodes. 

# gathers the clusterware logs needed to analyze node evictions 

$GRID_HOME/bin/diagcollection.sh 
For diagcollection, refer to the CRS 10gR2/11gR1/11gR2 Diagnostic Collection Guide (Doc ID 330358.1). 

# script /tmp/diag.log 
# id 
# env 
# cd <temp-directory-with-plenty-free-space> 
# $GRID_HOME/bin/diagcollection.sh 
# exit 

The following .gz files will be generated in the current directory and need to be uploaded along with /tmp/diag.log: 

crsData_<hostname>.tar.gz, 
ocrData_<hostname>.tar.gz, 
oraData_<hostname>.tar.gz, 
os_<hostname>.tar.gz 

5) Please upload the alert.log from the DB and ASM instances on all DB nodes 
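
A hedged way to locate these, assuming the default ADR layout under $ORACLE_BASE/diag (paths vary with the diagnostic_dest setting and version):

% adrci exec="show homes"
% find $ORACLE_BASE/diag/rdbms $ORACLE_BASE/diag/asm -name 'alert_*.log'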

6) Grid home logs from all DB nodes. 

% cd $GRID_HOME/log/<hostname> 
% find . -type f | xargs grep -sl '2011-11-29' 

or 

cd $GRID_HOME/log 
tar -cvzf node1.tar.gz * 
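
When the same command is run on several nodes, a hedged variant that tags the tarball with the node name avoids the files overwriting each other during upload:

tar -cvzf /tmp/`hostname -s`_grid_log.tar.gz *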

7) Cell trace & logs 

Upload the logs from the problem cell, covering the problem time. 

% cd $ADR_BASE/diag/asm/cell/<hostname>/trace/ 
% egrep -lr 2012-09-06 * | xargs tar cvfz `hostname -s`_cell_diag.tar.gz 

8) Please upload the Storage Cell alert.log and alert history from all cell nodes: 

a.) /opt/oracle/cell/log/diag/asm/cell/{node name}/trace/alert.log 
b.) # cellcli -e list alerthistory 
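
To collect the alert history from every cell in one pass, a hedged example assuming a cell_group file listing the cell hostnames:

# dcli -g cell_group -l root "cellcli -e list alerthistory" > /tmp/alerthistory_all_cells.txt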

9) OS Watcher information: 

Location of the logs: /opt/oracle.oswatcher/osw/archive 

cd /opt/oracle.oswatcher/osw/archive 
find . -name '*12.09.05*' -exec zip /tmp/osw_`hostname`.zip {} \; 

cd /opt/oracle.oswatcher/osw/archive 
Change the date and time pattern below to cover the problem time: 
find . -name "*11.10.13.1[8-9]00*" -exec zip /tmp/osw_`hostname`_0503.zip {} \; 
Do not miss the ';' at the end of the command above. 

To get OS Watcher data for a specific date: 
cd /opt/oracle.oswatcher/osw/archive 
find . -name '*12.01.13*' -print -exec zip /tmp/osw_`hostname`.zip {} \; 
where 12 = year, 01 = month, 13 = day 
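
To narrow the collection further, the same pattern can include the hour field (the archive names embed YY.MM.DD.HHMM); a hedged example covering 14:00-15:59 on 2012-09-05:

find . -name '*12.09.05.1[4-5]00*' -print -exec zip /tmp/osw_`hostname`.zip {} \;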

10) Linux OFA & kernel information 

a) dcli -l root -c <list of dbnodes> "rpm -qa | grep ofa" 
b) dcli -l root -c <list of dbnodes> "uname -a" 
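
The same information can be gathered with a group file instead of a comma-separated node list; a hedged example assuming dbs_group lists the DB node hostnames:

# dcli -g dbs_group -l root "rpm -qa | grep -i ofa; uname -r"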

11) Linux crash file information 

# to check for the possibility of an OS-level (Linux) problem 

Running grep path /etc/kdump.conf will show the configured crash core file location. 

Please check for a vmcore/crashcore that may have been generated during the reboot. 

The cells and database servers of Oracle Exadata Database Machine are configured to generate Linux kernel crash core files when there is a Linux crash. 
Common locations are /var/crash or /u01/crashfiles on database nodes, and /var/log/oracle/crashfiles on cell nodes. 

Start in this file: 
cat /etc/kdump.conf 
Look for the uncommented filesystem (ext3) and path. 
example: 
ext3 /dev/md11 
path /crashfiles 

The location is the /crashfiles directory under the mount point of /dev/md11. 
Then find where md11 is mounted: 
df 

example: 
/dev/md11 2395452 280896 1992872 13% /var/log/oracle 

Then change to that directory: 
cd /var/log/oracle/crashfiles 
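
A hedged check before uploading, since vmcores can be very large (the path is the one found above; substitute your own):

# ls -lt /var/log/oracle/crashfiles
# du -sh /var/log/oracle/crashfiles/*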


12) SOS report 

sosreport requires root permissions to run. 

# /usr/sbin/sosreport 
sosreport runs for several minutes; depending on the system, it may take considerably longer. 

Once completed, sosreport will generate a compressed bz2 file under /tmp. 
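
A hedged non-interactive variant (options differ between sosreport versions, so verify with /usr/sbin/sosreport --help; the label exadb01 is only an example):

# /usr/sbin/sosreport --batch --name=exadb01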


13) Run the infinicheck command (from any of the DB nodes): 

# /opt/oracle.SupportTools/ibdiagtools/infinicheck -g /opt/oracle.SupportTools/onecommand/dbs_ib_group |tee /tmp/infinicheck_`hostname`.log 

where dbs_ib_group contains the hostnames/IPoIB addresses of the compute nodes that are part of the cluster. The addresses must be on the IB subnet. 

Note: The dbs_ib_group location above applies to the Oracle Sun Database Machine. For other systems, specify the location of the file explicitly. 
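
As a hedged illustration, dbs_ib_group is a plain text file with one InfiniBand hostname or IPoIB address per line; the names below are hypothetical:

# cat /opt/oracle.SupportTools/onecommand/dbs_ib_group
exadb01-priv
exadb02-priv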


14) Run sundiag.sh

The script needs to be executed as root on the Exadata Storage Server having disk problems, and sometimes also on DB nodes or other Storage Servers for other hardware issues.

When gathering sundiag.sh output across a whole rack using dcli, the outputs may end up with the same tarball name and overwrite each other when unzipped. To avoid this, use the following from DB01:

1. [root@exadb01 ~]# cd /opt/oracle.SupportTools/onecommand (or wherever the all_group file is with the list of the rack hostnames)

2. [root@exadb01 onecommand]# dcli -g all_group -l root /opt/oracle.SupportTools/sundiag.sh 2>&1
<this will take up to about 2 minutes>

3. Verify there is output in /tmp on each node:
[root@exadb01 onecommand]# dcli -g all_group -l root --serial 'ls -l /tmp/sundiag* '
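
A hedged follow-up to pull each node's sundiag tarball back to DB01 for upload (the destination directory is only an example):
[root@exadb01 onecommand]# mkdir -p /tmp/sundiag_collect
[root@exadb01 onecommand]# for h in `cat all_group`; do scp root@$h:/tmp/sundiag_*.tar.bz2 /tmp/sundiag_collect/; done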


For gathering OS Watcher data alongside sundiag: 

# /opt/oracle.SupportTools/sundiag.sh osw

Execution will create a date-stamped tar.bz2 file under /tmp (sundiag_*.tar.bz2) that includes the OS Watcher archive logs. These logs may be very large.


# /opt/oracle.SupportTools/sundiag.sh -h

Oracle Exadata Database Machine - Diagnostics Collection Tool

Version: 1.5.1_20140521

Usage: ./sundiag.sh [osw] [ilom | snapshot]
   osw      - Copy ExaWatcher or OSWatcher log files (Can be several 100MB's)
   ilom     - User level ILOM data gathering option via ipmitool, in place of
              separately using root login to get ILOM snapshot over the network.
   snapshot - Collects node ILOM snapshot- requires host root password for ILOM
              to send snapshot data over the network.


Exadata Diagnostic Collection Guide (Doc ID 1353073.2)
