Oracle Database - Enterprise Edition - Version 11.2.0.3 and later
Information in this document applies to any platform.
A node is evicted from the cluster because of a network
communication error.
The GI alert log reports the following errors, and one node
is evicted:
Node 1 GI Alert log ------------------------
CRS-1612:Network communication with node prodrac2(2) missing for 50% of
timeout interval. Removal of this node from cluster in 29.240 seconds
..
CRS-1610:Network communication with node prodrac2 (2) missing for 90% of
timeout interval. Removal of this node from cluster in 3.740 seconds
CRS-1607:Node utx2db02 is being evicted in cluster incarnation 278185525;
details at (:CSSNM00007:) in
/orabase1/app/11.2.0.3/grid_6/log/utx2db01/cssd/ocssd.log.
Node 2 GI Alert log ------------------------
CRS-1610:Network communication with node prodrac1 (1) missing for 90% of
timeout interval. Removal of this node from cluster in 3.740 seconds
CRS-1609:This node is unable to communicate with other nodes in the cluster
and is going down to preserve cluster integrity; details at (:CSSNM00008:)
in /orabase1/app/11.2.0.3/grid_6/log/utx2db02/cssd/ocssd.log.
CRS-1656:The CSS daemon is terminating due to a fatal error; Details at
(:CSSSC00012:) in /orabase1/app/11.2.0.3/grid_6/log/utx2db02/cssd/ocssd.log
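The missed-heartbeat warnings above follow a regular shape, so they can be pulled out of an alert log automatically. The following is a minimal sketch, not an Oracle tool: the regular expression is an assumption derived only from the sample lines in this note, and the function name `parse_heartbeat_warnings` is hypothetical.

```python
import re

# Pattern inferred from the CRS-1612 / CRS-1610 sample lines in this note.
# It is an assumption and may need adjusting for other message variants.
HEARTBEAT_RE = re.compile(
    r"(CRS-\d+):Network communication with node (\S+?)\s*\((\d+)\) missing for "
    r"(\d+)% of\s+timeout interval\.\s+Removal of this node from cluster in "
    r"([\d.]+) seconds"
)


def parse_heartbeat_warnings(text):
    """Return one dict per missed-heartbeat warning found in alert log text."""
    # Messages wrap across lines in the log, so collapse all whitespace first.
    flat = " ".join(text.split())
    return [
        {
            "code": m.group(1),
            "node": m.group(2),
            "node_number": int(m.group(3)),
            "percent_missed": int(m.group(4)),
            "seconds_left": float(m.group(5)),
        }
        for m in HEARTBEAT_RE.finditer(flat)
    ]
```

Feeding the function the text of the GI alert log returns the warnings in order, which makes it easy to see how quickly the countdown to eviction progressed on each node.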
top output shows the Cluster Health Monitor (CHM) daemon
ologgerd using high CPU and spinning before the reboot:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31439 root RT 0 375m 142m 58m S 161.8 0.1 359:42.49 /orabase1/app/11.2.0.3/grid_6/bin/ologgerd -m utx2db01 -r -d /orabase1/app/11.2.0.3/grid_
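To watch for this symptom before a node is evicted, a process line captured from top can be checked programmatically. This is a rough sketch under stated assumptions: it expects the default column order shown in the header above, and the helper names `parse_top_line` and `is_spinning` as well as the 100% CPU threshold are hypothetical choices, not values from the note.

```python
def parse_top_line(line):
    """Split one process line of top output into a few named fields."""
    # Column order assumed from the header in this note:
    # PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    fields = line.split(None, 11)
    if len(fields) < 12:
        raise ValueError("not a top process line: %r" % line)
    return {
        "pid": int(fields[0]),
        "user": fields[1],
        "cpu_percent": float(fields[8]),
        "mem_percent": float(fields[9]),
        "command": fields[11],  # command with its arguments
    }


def is_spinning(proc, name="ologgerd", cpu_threshold=100.0):
    """Return True if the named process is at or above the CPU threshold.

    The threshold is an illustrative assumption; pick one that suits the host.
    """
    return name in proc["command"] and proc["cpu_percent"] >= cpu_threshold
```

Applied to the ologgerd line above, `is_spinning` would flag the process because it reports 161.8% CPU.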
The call stack also shows a page allocation failure for ologgerd:
Jan 14 03:16:16 utx2db02 kernel: Free swap = 13181784kB
Jan 14 03:16:21 utx2db02 kernel: Total swap = 25165816kB
Jan 14 03:16:30 utx2db02 kernel: ologgerd: page allocation failure. order:4, mode:0xd0
Jan 14 03:16:35 utx2db02 kernel: Pid: 31475, comm: ologgerd Not tainted 2.6.32-400.21.1.el5uek #1 <<<<<<<<<<
Jan 14 03:16:40 utx2db02 kernel: Call Trace:
Jan 14 03:16:44 utx2db02 kernel: [] __alloc_pages_nodemask+0x524/0x595
Jan 14 03:17:01 utx2db02 kernel: [] kmem_getpages+0x4f/0xf4
Jan 14 03:17:05 utx2db02 kernel: [] fallback_alloc+0x12e/0x1ce
Jan 14 03:17:06 utx2db02 kernel: [] ____cache_alloc_node+0x121/0x134
Jan 14 03:17:07 utx2db02 kernel: [] kmem_cache_alloc_node_notrace+0x84/0xb9
Jan 14 03:17:09 utx2db02 kernel: [] __kmalloc_node+0x46/0x73
Jan 14 03:17:13 utx2db02 kernel: [] ? __alloc_skb+0x72/0x13d
Jan 14 03:17:13 utx2db02 kernel: [] __alloc_skb+0x72/0x13d
Jan 14 03:17:15 utx2db02 kernel: [] sk_stream_alloc_skb+0x3d/0xaf
Jan 14 03:17:16 utx2db02 kernel: [] tcp_sendmsg+0x176/0x6cf
Jan 14 03:17:16 utx2db02 kernel: [] __sock_sendmsg+0x5e/0x67
Jan 14 03:17:18 utx2db02 kernel: [] sock_sendmsg+0xcc/0xe5
Jan 14 03:17:19 utx2db02 kernel: [] ? radix_tree_delete+0xf1/0x194
Jan 14 03:17:20 utx2db02 kernel: [] ? autoremove_wake_function+0x0/0x3d
Jan 14 03:17:21 utx2db02 kernel: [] ? security_sk_alloc+0x16/0x18
Jan 14 03:17:23 utx2db02 kernel: [] ? fget_light+0x58/0x73
Jan 14 03:17:25 utx2db02 kernel: [] ? sockfd_lookup_light+0x20/0x58
Jan 14 03:17:26 utx2db02 kernel: [] sys_sendto+0x12f/0x171
Jan 14 03:17:27 utx2db02 kernel: [] ? audit_syscall_entry+0x103/0x12f
Jan 14 03:17:31 utx2db02 kernel: [] system_call_fastpath+0x16/0x1b
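The `order` value in the failure message gives the size of the contiguous block the kernel could not allocate: an order-n request asks for 2^n physically contiguous pages. The sketch below scans syslog text for such failures and computes that size. The regular expression is inferred from the single sample above, the function name is hypothetical, and the 4 KB page size is an assumption that holds on typical x86_64 Linux systems but should be verified for the host.

```python
import re

# Pattern inferred from the sample kernel message in this note (an assumption).
ALLOC_FAIL_RE = re.compile(
    r"kernel: (\S+): page allocation failure\.\s*order:(\d+), mode:(0x[0-9a-fA-F]+)"
)


def find_allocation_failures(text, page_size_kb=4):
    """Return one dict per page allocation failure found in syslog text."""
    # Collapse whitespace so messages wrapped across lines still match.
    flat = " ".join(text.split())
    failures = []
    for m in ALLOC_FAIL_RE.finditer(flat):
        order = int(m.group(2))
        failures.append({
            "process": m.group(1),
            "order": order,
            "mode": m.group(3),
            # An order-n allocation requests 2**n contiguous pages.
            "contiguous_kb": (2 ** order) * page_size_kb,
        })
    return failures
```

For the message above, order 4 corresponds to 16 contiguous pages, or 64 KB with 4 KB pages. A request of that size can fail on a fragmented system even when plenty of memory and swap are free.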
Changes: None.
ologgerd uses high CPU and performs heavy I/O on the disk
where the BDB (the Berkeley DB repository used by CHM) resides.
This is due to BUG 13867435 - OLOGGERD USING
A LOT OF RESOURCES.
Apply Patch 13867435 - OLOGGERD USING A LOT OF RESOURCES
on top of 11.2.0.3.
The bug is fixed in 11.2.0.4 GI PSU.
Here is how I handled it:
1. grid@woqurac1:/home/grid>crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               Instance Shutdown
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE
ora.crf
      1        ONLINE  ONLINE       rac1                     -- check this resource
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  OFFLINE
ora.cssdmonitor
      1        ONLINE  ONLINE       rac1
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        OFFLINE OFFLINE
ora.drivers.acfs
      1        ONLINE  ONLINE       rac1
ora.evmd
      1        ONLINE  OFFLINE
ora.gipcd
      1        ONLINE  ONLINE       rac1
ora.gpnpd
      1        ONLINE  ONLINE       rac1
ora.mdnsd
      1        ONLINE  ONLINE       rac1
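When this check is repeated across several nodes, the listing can be parsed rather than read by eye. The following is a minimal sketch that extracts the TARGET and STATE columns for each resource from saved `crsctl stat res -t -init` output. It assumes the layout shown above (a resource name line beginning with `ora.` followed by a line that starts with the instance number), and the function name `parse_init_resources` is hypothetical. The SERVER and STATE_DETAILS columns are deliberately ignored because they cannot be told apart reliably once the column alignment is lost.

```python
def parse_init_resources(text):
    """Map each resource name to its TARGET and STATE values."""
    resources = {}
    current = None  # name of the resource whose status line we expect next
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("ora."):
            current = line
        elif current and line[:1].isdigit():
            parts = line.split()
            # parts[0] is the instance number, then TARGET and STATE.
            if len(parts) >= 3:
                resources[current] = {"target": parts[1], "state": parts[2]}
            current = None
    return resources
```

With the listing above as input, looking up `"ora.crf"` in the result shows whether the CHM resource, which runs ologgerd, is currently online on the node.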