1) Specify the exact time of the issue to focus on.
2) Upload the /var/log/messages file covering the time of the reboot, from both nodes (a collection sketch follows).
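A minimal collection sketch, run as root on each node; the tarball name is only a suggestion, and the rotated messages.N files are included in case the reboot window falls inside one of them:
# cd /var/log
# tar cvzf /tmp/messages_`hostname -s`.tar.gz messages*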
3) OPatch inventory
cd $ORACLE_HOME/OPatch
./opatch lsinventory
4) Please upload diagcollection.sh output files from all DB nodes.
# to collect the eviction information from the clusterware side.
$GRID_HOME/bin/diagcollection.sh
For diagcollection refer to CRS 10gR2/11gR1/11gR2 Diagnostic Collection Guide (Doc ID 330358.1)
# script /tmp/diag.log
# id
# env
# cd <temp-directory-with-plenty-free-space>
# $GRID_HOME/bin/diagcollection.sh
# exit
The following .gz files will be generated in the current directory and need to be uploaded along with /tmp/diag.log:
crsData_<hostname>.tar.gz,
ocrData_<hostname>.tar.gz,
oraData_<hostname>.tar.gz,
os_<hostname>.tar.gz
5) Please upload the alert.log from the DB and ASM instances on all DB nodes (see the sketch below).
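One way to locate and bundle them, assuming the 11g-style ADR layout under $ORACLE_BASE/diag (adjust if diagnostic_dest points elsewhere):
% find $ORACLE_BASE/diag/rdbms $ORACLE_BASE/diag/asm -name 'alert_*.log'
% find $ORACLE_BASE/diag/rdbms $ORACLE_BASE/diag/asm -name 'alert_*.log' | xargs tar cvzf /tmp/alert_logs_`hostname -s`.tar.gz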
6) Grid home logs from all DB nodes (adjust the example date to the problem time).
% cd $GRID_HOME/log/<hostname>
% find . -type f | xargs grep -sl '2011-11-29'
or
cd $GRID_HOME/log
tar -cvzf node1.tar.gz *
7) Cell trace & logs
Please upload logs from the problem cell, covering the problem time (adjust the example date below).
% cd $ADR_BASE/diag/asm/cell/<hostname>/trace/
% egrep -lr 2012-09-06 * | xargs tar cvfz `hostname -s`_cell_diag.tar.gz
8) Please upload the Storage Cell alert.log and alert history from all cell nodes (a dcli sketch for the whole rack follows).
a) /opt/oracle/cell/log/diag/asm/cell/{node name}/trace/alert.log
b) # cellcli -e list alerthistory
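To pull the alert history from every cell in one pass, a dcli sketch; it assumes a cell_group file listing one cell hostname per line, for example the one under /opt/oracle.SupportTools/onecommand (adjust the path to your environment):
# dcli -g cell_group -l root "cellcli -e list alerthistory" > /tmp/alerthistory_all_cells.txt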
9) OS Watcher information:
Location of logs: /opt/oracle.oswatcher/osw/archive
cd /opt/oracle.oswatcher/osw/archive
find . -name '*12.09.05*' -exec zip /tmp/osw_`hostname`.zip {} \;
To cover a specific time window, change the date/time pattern to match the problem time, for example:
cd /opt/oracle.oswatcher/osw/archive
find . -name "*11.10.13.1[8-9]00*" -exec zip /tmp/osw_`hostname`_0503.zip {} \;
Do not omit the trailing '\;' in these commands.
To get OS Watcher data for a specific date:
cd /opt/oracle.oswatcher/osw/archive
find . -name '*12.01.13*' -print -exec zip /tmp/osw_`hostname`.zip {} \;
where 12 = year, 01 = month, 13 = day.
10) Linux OFA & kernel information
a) dcli -l root -c <list of dbnodes> "rpm -qa | grep ofa"
b) dcli -l root -c <list of dbnodes> "uname -a"
11) Linux crash file information
# to check for the possibility of a Linux kernel problem
grep path /etc/kdump.conf gives the core file location; the manual lookup below can also be scripted, see the sketch at the end of this step.
Please check for a vmcore/crashcore that may have been generated during the reboot.
The cells and database servers of Oracle Exadata Database Machine are configured to generate Linux kernel crash core files when there is a Linux crash.
Common locations are /var/crash or /u01/crashfiles on database nodes, and /var/log/oracle/crashfiles on cell nodes.
Start with this file:
cat /etc/kdump.conf
Look for the uncommented filesystem (ext3) and path.
example:
ext3 /dev/md11
path /crashfiles
The crash location is the /crashfiles directory under the mount point of /dev/md11.
Then find where /dev/md11 is mounted:
df
example:
/dev/md11 2395452 280896 1992872 13% /var/log/oracle
Then change to that directory:
cd /var/log/oracle/crashfiles
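The same lookup can be scripted; this is only a convenience sketch built from the steps above, assuming the filesystem and path lines in /etc/kdump.conf are uncommented and the dump device is mounted:
# DUMP_DEV=`grep '^ext[34]' /etc/kdump.conf | awk '{print $2}'`
# DUMP_PATH=`grep '^path' /etc/kdump.conf | awk '{print $2}'`
# MOUNT_POINT=`df -P $DUMP_DEV | awk 'NR==2 {print $6}'`
# ls -l ${MOUNT_POINT}${DUMP_PATH}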
12) SOS report
sosreport requires root permissions to run.
# /usr/sbin/sosreport
sosreport runs for several minutes; the exact run time varies by system.
Once completed, sosreport generates a compressed .bz2 file under /tmp.
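To confirm the archive was written (the exact file name varies, and newer sosreport versions may write to /var/tmp instead of /tmp):
# ls -lh /tmp/sosreport-*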
13) Run the infinicheck command (from any of the DB nodes):
# /opt/oracle.SupportTools/ibdiagtools/infinicheck -g /opt/oracle.SupportTools/onecommand/dbs_ib_group |tee /tmp/infinicheck_`hostname`.log
where dbs_ib_group contains the hostnames/IPoIB addresses of the compute nodes that are part of the cluster; the addresses must be on the IB subnet.
Note: The dbs_ib_group location above applies to the Oracle Sun Database Machine. For other systems, specify the location of the file, as in the example below.
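For illustration only, the group file is a plain text list with one InfiniBand hostname or IPoIB address per line; the host names shown here are hypothetical:
# cat /opt/oracle.SupportTools/onecommand/dbs_ib_group
exadb01-priv   (hypothetical host name)
exadb02-priv   (hypothetical host name)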
14) Run sundiag.sh
The script needs to be executed as root on the Exadata Storage Server having disk problems, and sometimes also on DB nodes or other Storage Servers for other hardware issues.
When gathering sundiag.sh output across a whole rack using dcli, the outputs may end up with the same tarball name and overwrite each other when unpacked. To avoid this, use the following from DB01:
1. [root@exadb01 ~]# cd /opt/oracle.SupportTools/onecommand (or wherever the all_group file is with the list of the rack hostnames)
2. [root@exadb01 onecommand]# dcli -g all_group -l root /opt/oracle.SupportTools/sundiag.sh 2>&1
(this takes up to about 2 minutes)
3. Verify there is output in /tmp on each node:
[root@exadb01 onecommand]# dcli -g all_group -l root --serial 'ls -l /tmp/sundiag* '
For gathering OS Watcher data alongside sundiag:
# /opt/oracle.SupportTools/sundiag.sh osw
Execution creates a date-stamped sundiag_*.tar.bz2 file under /tmp that includes the OS Watcher archive logs. These logs may be very large.
# /opt/oracle.SupportTools/sundiag.sh -h
Oracle Exadata Database Machine - Diagnostics Collection Tool
Version: 1.5.1_20140521
Usage: ./sundiag.sh [osw] [ilom | snapshot]
osw - Copy ExaWatcher or OSWatcher log files (Can be several 100MB's)
ilom - User level ILOM data gathering option via ipmitool, in place of
separately using root login to get ILOM snapshot over the network.
snapshot - Collects node ILOM snapshot- requires host root password for ILOM
to send snapshot data over the network.
Exadata Diagnostic Collection Guide (Doc ID 1353073.2)