Sunday, August 5, 2012

BUG: soft lockup - CPU#11 stuck for 10s! [updatedb:20793] update db mlocate consuming 100% of cpu resources

Updatedb process consumes the 100% of resources and the NFS or iscsci storage mount points staled 

the update db process consumes log resources when the Server io is high so in the high io system we can stop the updatedb command by commenting the mlocate  file in the /etc/cron.daily 
if you check the /var/log/messages you can find the error messages 

Aug  5 04:03:57 fahmid kernel: BUG: soft lockup - CPU#11 stuck for 10s! [updatedb:20793]
Aug  5 04:03:57 fahmid kernel: CPU 11:
Aug  5 04:03:57 fahmid kernel: Modules linked in: autofs4 hidp nfs fscache nfs_acl rfcomm l2cap bluetooth lockd sunrpc bonding be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio cxgb3i cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi dm_mirror dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac ipv6 xfrm_nalgo crypto_api parport_pc lp parport st joydev sr_mod cdrom sg qla2xxx hpilo scsi_transport_fc bnx2 serio_raw pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ata_piix libata shpchp cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Aug  5 04:03:57 fahmid kernel: Pid: 20793, comm: updatedb Not tainted 2.6.18-194.el5 #1
Aug  5 04:03:57 fahmid kernel: RIP: 0010:[<ffffffff88052f98>]  [<ffffffff88052f98>] :ext3:ext3_find_entry+0x20f/0x575
Aug  5 04:03:57 fahmid kernel: RSP: 0018:ffff8106ba77fb48  EFLAGS: 00000206
Aug  5 04:03:57 fahmid kernel: RAX: 0000000000000000 RBX: ffff81025c11bc10 RCX: 000000003386148f
Aug  5 04:03:57 fahmid kernel: RDX: ffff81061cf64c00 RSI: ffff8106ba77fb28 RDI: ffff8101157326e8
Aug  5 04:03:57 fahmid kernel: RBP: 0000000000000000 R08: ffff8109bf933a48 R09: 0000000000000034
Aug  5 04:03:57 fahmid kernel: R10: ffff81063510c790 R11: 000000005019c3a3 R12: 0000000000000002
Aug  5 04:03:57 fahmid kernel: R13: ffff81025c11bc10 R14: ffffffff80064ab0 R15: ffff810b8ded8740
Aug  5 04:03:57 fahmid kernel: FS:  00002ae9f2950260(0000) GS:ffff810c1fc25cc0(0000) knlGS:0000000000000000
Aug  5 04:03:57 fahmid kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug  5 04:03:57 fahmid kernel: CR2: 000000009ff81bbc CR3: 00000001a233d000 CR4: 00000000000006e0
Aug  5 04:03:57 fahmid kernel:


Step 1 Check the top command for the highest consuming resources

[root@fahmid cron.daily]# top
top - 17:42:35 up 25 days,  8:02,  8 users,  load average: 775.61, 776.08, 775.54
Tasks: 1594 total,   6 running, 1588 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.3%us,  6.4%sy,  0.9%ni, 18.4%id, 74.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  49454804k total, 49333240k used,   121564k free,   107464k buffers
Swap: 50331636k total,  5675056k used, 44656580k free,  4303088k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  20793 root      34  19  3904  592  504 R 100.0  0.0   5139:00 updatedb
  380 oracle    21   0 3693m 1.4g 4804 S  6.2  3.0   1560:36 java
  432 oracle    20   0 3579m 1.4g 4024 S  5.6  3.0   1674:38 java

Step 2 Check the updatedb process

[root@fahmid log]# ps -ef |grep updatedb
root      6046     1  0 Aug03 ?        00:00:00 /usr/bin/updatedb -f sysfs?rootfs?bdev?proc?cpuset?binfmt_misc?debugfs?securityfs?sockfs?usbfs?pipefs?anon_inodefs?futexfs?tmpfs?inotifyfs?eventpollfs?devpts?ramfs?hugetlbfs?mqueue?rpc_pipefs?nfs?nfs4?autofs
root      6657     1  0 04:03 ?        00:00:00 /usr/bin/updatedb -f sysfs?rootfs?bdev?proc?cpuset?binfmt_misc?debugfs?securityfs?sockfs?usbfs?pipefs?anon_inodefs?futexfs?tmpfs?inotifyfs?eventpollfs?devpts?ramfs?hugetlbfs?mqueue?rpc_pipefs?nfs?nfs4?autofs
root     20793     1 99 Aug02 ?        3-14:02:05 /usr/bin/updatedb -f sysfs?rootfs?bdev?proc?cpuset?binfmt_misc?debugfs?securityfs?sockfs?usbfs?pipefs?anon_inodefs?futexfs?tmpfs?inotifyfs?eventpollfs?devpts?ramfs?hugetlbfs?mqueue?rpc_pipefs?nfs?nfs4?autofs
root     21571     1  0 Aug04 ?        00:00:00 /usr/bin/updatedb -f sysfs?rootfs?bdev?proc?cpuset?binfmt_misc?debugfs?securityfs?sockfs?usbfs?pipefs?anon_inodefs?futexfs?tmpfs?inotifyfs?eventpollfs?devpts?ramfs?hugetlbfs?mqueue?rpc_pipefs?nfs?nfs4?autofs
root     21982 20160  0 18:05 pts/16   00:00:00 grep updatedb

Step 3 Kill the process by using the hard kill

Try to kill all the process by hard killing but still it will not kill the process 

[root@fahmid log]# kill -9 6046
[root@fahmid log]# kill -9 6657
[root@fahmid log]# kill -9 20793
[root@fahmid log]# kill -9 21571
[root@fahmid log]# ps -ef |grep updatedb
root      6046     1  0 Aug03 ?        00:00:00 /usr/bin/updatedb -f sysfs?rootfs?bdev?proc?cpuset?binfmt_misc?debugfs?securityfs?sockfs?usbfs?pipefs?anon_inodefs?futexfs?tmpfs?inotifyfs?eventpollfs?devpts?ramfs?hugetlbfs?mqueue?rpc_pipefs?nfs?nfs4?autofs
root      6657     1  0 04:03 ?        00:00:00 /usr/bin/updatedb -f sysfs?rootfs?bdev?proc?cpuset?binfmt_misc?debugfs?securityfs?sockfs?usbfs?pipefs?anon_inodefs?futexfs?tmpfs?inotifyfs?eventpollfs?devpts?ramfs?hugetlbfs?mqueue?rpc_pipefs?nfs?nfs4?autofs
root     20793     1 99 Aug02 ?        3-14:05:54 /usr/bin/updatedb -f sysfs?rootfs?bdev?proc?cpuset?binfmt_misc?debugfs?securityfs?sockfs?usbfs?pipefs?anon_inodefs?futexfs?tmpfs?inotifyfs?eventpollfs?devpts?ramfs?hugetlbfs?mqueue?rpc_pipefs?nfs?nfs4?autofs
root     21571     1  0 Aug04 ?        00:00:00 /usr/bin/updatedb -f sysfs?rootfs?bdev?proc?cpuset?binfmt_misc?debugfs?securityfs?sockfs?usbfs?pipefs?anon_inodefs?futexfs?tmpfs?inotifyfs?eventpollfs?devpts?ramfs?hugetlbfs?mqueue?rpc_pipefs?nfs?nfs4?autofs
root     22232 20160  0 18:09 pts/16   00:00:00 grep updatedb

Step 4 Check the mlocate  in the /etc/cron.daily 

[root@fahmid cron.daily]# pwd
/etc/cron.daily
[root@fahmid cron.daily]# ls
0anacron  0logwatch  cups  logrotate  makewhatis.cron  mlocate.cron  prelink  rpm  tetex.cron  tmpwatch
[root@fahmid cron.daily]# vi mlocate.cron
#!/bin/sh
nodevs=$(< /proc/filesystems awk '$1 == "nodev" { print $2 }')
renice +19 -p $$ >/dev/null 2>&1
/usr/bin/updatedb -f "$nodevs"

Step 5 comment the mlocate file in the cron .daily

change the mlocate  to another name 
[root@fahmid cron.daily]# vi mlocateold.cron
#!/bin/sh
#nodevs=$(< /proc/filesystems awk '$1 == "nodev" { print $2 }')
#renice +19 -p $$ >/dev/null 2>&1
#/usr/bin/updatedb -f "$nodevs"

Step 6 Try to kill the process but still the process is alive then restart the server

[root@fahmid cron.daily]# reboot

Broadcast message from root (pts/16) (Sun Aug  5 18:35:36 2012):

The system is going down for reboot NOW!
Some times the reboot from the terminal does not work  if we had the console then try to reboot through the console
or physical reboot






No comments:

Post a Comment