The KDB kernel debugger and the kdb command are useful for debugging device drivers, kernel extensions, and the kernel itself. Although they appear similar, the KDB kernel debugger and the kdb command are two separate tools:
KDB KERNEL DEBUGGER:
It is integrated into the kernel and allows full control of the system while a debugging session is in progress.
KDB COMMAND:
It is implemented as an ordinary user-space program and can be used for analyzing the following:
1. A running system: When used to analyze a running system, the kdb command opens the /dev/pmem special file, which allows direct access to the system's physical memory. The kdb command performs its own address translation internally using the same algorithms as the KDB kernel debugger.
2. A system dump file produced by a previously crashed system: A system dump contains certain critical data structures. Only the memory belonging to the process that was running on the processor that created the dump image can be included in the dump file. When you work with a system dump, any subcommands that modify memory are not valid because the system dump is merely a snapshot of the real memory in a system.
When you are analyzing a system dump file, the kdb command must be started with arguments that specify the location of the dump file and the kernel file:
# kdb /var/adm/ras/vmcore.0 /unix
(The kernel file is used by the kdb command to resolve symbol names from the dump file.)
------------------------------------
A very valuable benefit of kdb is that a device setting stored in the ODM (shown by lsattr) can be compared with the real-time value used by the running kernel.
------------------------------------
KDB SUBCOMMANDS:
help display                  lists the subcommands in the "display" context
p -?                          lists the parameters of the p subcommand with a brief description
! <command>                   shell escape (provides a convenient way to run UNIX commands without leaving kdb)
hi                            print history
lke                           list loaded extensions
pvol -M <major> -m <minor>    display physical volume info
stat                          system status info
status                        processor status
e                             exit from kdb
------------------------------------
echo vfcs fcs0 | kdb | grep num_cmd_elems shows num_cmd_elems in hex on VIO client with NPIV (compare with odm: lsattr -El fcs0)
(if you change num_cmd_elems with chdev, you can check in kdb if it really has been changed)
echo scsidisk hdisk0 | kdb | grep queue_depth shows real-time value in hex of queue_depth of given disk
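
Because kdb prints these values in hex while lsattr prints them in decimal, the comparison needs a conversion step. A minimal sketch, assuming you copy the hex value out of the kdb output by hand; the function name matches_odm and the sample values 20 and 0x14 are made up for illustration and are not taken from a real system:

```shell
# matches_odm DECIMAL HEX
# Succeeds when the decimal ODM value (from lsattr) equals the hex value
# read out of kdb. The hex value may be written with or without "0x".
matches_odm() {
    odm_value=$1
    kdb_hex=${2#0x}
    kdb_value=$(printf '%d' "0x$kdb_hex")
    [ "$odm_value" -eq "$kdb_value" ]
}

# Hypothetical values: queue_depth 20 in the ODM, 0x14 seen in kdb.
if matches_odm 20 0x14; then
    echo "kernel value matches ODM"
else
    echo "kernel value differs from ODM"
fi
```

A mismatch usually means the ODM was changed but the device has not picked up the new value yet.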
------------------------------------
Check VSCSI adapter mapping:
(run this on vio client, not on vio server)
root@bb_lpar: / # echo "cvai" | kdb | grep vscsi <--cvai is a kdb subcommand
read vscsi_scsi_ptrs OK, ptr = 0xF1000000C01A83C0
vscsi0 0x000007 0x0000000000 0x0 aix-vios1->vhost2 <--shows which vhost is used on which vio server for this client
vscsi1 0x000007 0x0000000000 0x0 aix-vios1->vhost1
vscsi2 0x000007 0x0000000000 0x0 aix-vios2->vhost2
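
If there are many adapters, the mapping can be reduced to one short line per adapter. A rough sketch, assuming the cvai lines keep the layout shown above (adapter name first, "vios->vhost" last); the function name vscsi_map is made up for this example:

```shell
# Sample lines copied from the cvai output above.
cvai_sample='vscsi0 0x000007 0x0000000000 0x0 aix-vios1->vhost2
vscsi1 0x000007 0x0000000000 0x0 aix-vios1->vhost1
vscsi2 0x000007 0x0000000000 0x0 aix-vios2->vhost2'

# vscsi_map: read cvai adapter lines on stdin and print
# "adapter vios vhost", one line per client adapter.
vscsi_map() {
    awk '$1 ~ /^vscsi[0-9]+$/ && $NF ~ /->/ {
        split($NF, target, "->")
        print $1, target[1], target[2]
    }'
}

printf '%s\n' "$cvai_sample" | vscsi_map
```

On a live VIO client the same function could be fed with: echo cvai | kdb | vscsi_map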
Check NPIV adapter mapping:
(run this on vio client, not on vio server)
root@bb_lpar: / # echo "vfcs" | kdb <--vfcs is a kdb subcommand
...
NAME ADDRESS STATE HOST HOST_ADAP OPENED NUM_ACTIVE
fcs0 0xF1000A000033A000 0x0008 aix-vios1 vfchost8 0x01 0x0000 <--shows which vfchost is used on vio server for this client
fcs1 0xF1000A0000338000 0x0008 aix-vios2 vfchost6 0x01 0x0000
------------------------------------
Check physical FC adapter setting (not in virtual environment):
(dyntrk, fc_err_recov, num_cmd_elems)
These are the settings that we would like to verify:
----------
root@bb_lpar: / # lsattr -El fscsi0| egrep 'dyntrk|fc_err_recov'
dyntrk yes Dynamic Tracking of FC Devices True
fc_err_recov fast_fail FC Fabric Event Error RECOVERY Policy True
root@bb_lpar: / # lsattr -El fcs0| grep num_cmd_elems
num_cmd_elems 200 Maximum number of COMMANDS to queue to the adapter True
----------
Verifying the settings from kernel:
1. root@bb_lpar: / # echo efscsi fscsi0 | kdb | grep efscsi_ddi
struct efscsi_ddi ddi = 0xF1000A06007FA080 <--this hex value will be used
2. root@bb_lpar: / # echo dd 0xF1000A06007FA080+20 2 | kdb <--append "+20 2" to the hex value above
... (+20 is a fixed hex offset from that address, 2 is the number of doublewords to display)
F1000A06007FA0A0: 0101020202010200 000000B400000028 ...............( <--on the specified locations you can decode the numbers there
                          FFDD     NNNNNNNN
FF = fc_err_recov: (we have "02" in this example here, which is fast_fail)
01 = delayed_fail
02 = fast_fail
DD = dyntrk: (we have "01" in this example here, which means "yes")
00 = disabled (no)
01 = enabled (yes)
NNNNNNNN = num_cmd_elems: (we have "B4" in this example here, but some calculation is still needed)
1. change to decimal value: 000000B4 --> 180
2. add 20 to the decimal number: 180 + 20 = 200
(you must always add "20" to the decimal value you get)
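
The decoding above can be scripted. This is only a sketch under assumptions: it takes FF and DD to be the 5th and 6th bytes of the first doubleword (the only place where "02" followed by "01" appears in the example) and NNNNNNNN to be the first word of the second doubleword. The function name decode_efscsi is made up, and the layout may differ between driver levels, so check it against lsattr on your own system first.

```shell
# decode_efscsi LINE
# LINE is one "dd" output line: address, two doublewords, ASCII column.
decode_efscsi() {
    dw1=$(printf '%s\n' "$1" | awk '{print $2}')
    dw2=$(printf '%s\n' "$1" | awk '{print $3}')

    ff=$(printf '%s\n' "$dw1" | cut -c9-10)    # assumed position of fc_err_recov
    dd=$(printf '%s\n' "$dw1" | cut -c11-12)   # assumed position of dyntrk
    nn=$(printf '%s\n' "$dw2" | cut -c1-8)     # first word of the second doubleword

    case $ff in
        01) recov=delayed_fail ;;
        02) recov=fast_fail ;;
        *)  recov=unknown ;;
    esac
    case $dd in
        00) dyntrk=no ;;
        01) dyntrk=yes ;;
        *)  dyntrk=unknown ;;
    esac

    echo "fc_err_recov=$recov"
    echo "dyntrk=$dyntrk"
    # Hex to decimal, then add 20 as described above.
    echo "num_cmd_elems=$(( $(printf '%d' "0x$nn") + 20 ))"
}

# The dd line from the example above.
decode_efscsi 'F1000A06007FA0A0: 0101020202010200 000000B400000028 ...............('
```

For the example line this prints fast_fail, yes and 200, which agrees with the lsattr output shown earlier.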
------------------------------------
Volume group and lv info:
The volgrp subcommand displays information about a volume group and its logical volumes.
The volgrp structure addresses are registered in the devsw table in the DSDPTR field.
(devsw: displays the entries of the device switch table)
root@bb_lpar: /dev # echo devsw | kdb | grep dsdptr | grep -v 00000000
dsdptr: F1000A0600751800 <--this will be used for "volgrp" command
dsdptr: 05A50280
dsdptr: F1000A0600751400
root@bb_lpar: /dev # echo volgrp F1000A0600751800| kdb <--displays info about given volgrp
...
VOLGRP............. F1000A0600751800
vg_eyec............ 4C564D766F6C6772 (LVMvolgr)
vg_name............ rootvg
vg_ras_name........ rootvg
vg_id.............. 00080E820000D900000001335FBB8276
vg_lock.......... @ F1000A0600751868 vg_lock............ 0000000000000000
major_num.......... 0000000A flags.............. 00040001
snapshot_copy...... 0000 partshift.......... 0012 (128M)
ltg_shift.......... 0001 (256K) open_count......... 000A
max_lvs............ 0100 max_pvs............ 0020
....
------------------------------------
Check hcheck_interval value of a disk:
1. root@bb_lpar: / # echo lke | kdb | grep pcm
59 F1000000A063D200 05A60000 00030000 02080242 /usr/lib/drivers/aixdiskpcmke <--this shows the slot number that we can use (here 59)
2. root@bb_lpar: / # echo "lke -s 59" | kdb | grep le_data
le_data........ 0000000005A80000 le_datasize.... 0000000000002828 <--this shows le_data value
(we will use this in the adevq subcommand)
3. root@bb_lpar: / # kdb
(0)> adevq
Unable to find <pcm_info>
Enter the pcm_info address (in hex): 0000000005A80000 <--the above value is given here
NAME ADDR STATE MACHINE ACTIVE_IO <--then we will see the list of hdisks
hdisk1 0xF1000A0600740400 0x0 0x 0 <--choose the address of a disk and run adevq against it
NAME ADDR STATE MACHINE ACTIVE_IO
hdisk2 0xF1000A0600740E00 0x0 0x 0
NAME ADDR STATE MACHINE ACTIVE_IO
hdisk3 0xF1000A0600741800 0x0 0x 0
NAME ADDR STATE MACHINE ACTIVE_IO
hdisk0 0xF1000A0600742200 0x0 0x 0
4. (0)> adevq 0xF1000A0600740400 | grep hcheck <--this shows the address of hcheck, which we will use
hcheck_t &hcheck = 0xF1000A0600740470
5. (0)> ahcheck 0xF1000A0600740470 | grep interval
uint interval = 0x0 <--this shows hcheck_interval value in hex (we have 0)
------------------------------------
Check for a process which is using a specific network port:
1. root@bb_lpar: / # netstat -Aan | grep 22 <--check for address of the port
f1000e000330ebb8 tcp4 0 0 *.22 *.* LISTEN
2. root@bb_lpar: / # kdb
(0)> sockinfo f1000e000330ebb8 tcpcb | grep pvproc <--feed the address into the sockinfo subcommand (grep for pvproc)
pvproc+016000 88*sshd ACTIVE 058000E 03A00A2 000000083846E480 0 0001
3. (0)> hcal 058000E <--calculate decimal value (this is the pid of the process)
Value hexa: 0058000E Value decimal: 5767182
(0)> e <--exit from kdb
4. root@bb_lpar: / # ps -fp 5767182 <--shows the process of a given pid
UID PID PPID C STIME TTY TIME CMD
root 5767182 3801250 0 May 09 - 0:00 /usr/sbin/sshd
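
The hex-to-decimal step can also be done outside kdb. A small sketch, assuming the hex pid is the fourth field of the pvproc line as in the example above; the function name pid_from_pvproc is made up:

```shell
# pid_from_pvproc LINE
# LINE is the pvproc line printed by sockinfo; the fourth field is taken
# to be the hex pid, as in the example above.
pid_from_pvproc() {
    hex_pid=$(printf '%s\n' "$1" | awk '{print $4}')
    printf '%d\n' "0x$hex_pid"
}

# The pvproc line from the example above.
pid_from_pvproc 'pvproc+016000 88*sshd ACTIVE 058000E 03A00A2 000000083846E480 0 0001'
```

The result can be passed directly to ps, for example: ps -fp $(pid_from_pvproc "$line")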