- 浏览: 4372926 次
- 性别:
- 来自: 厦门
文章分类
- 全部博客 (634)
- Oracle日常管理 (142)
- Oracle体系架构 (45)
- Oracle Tuning (52)
- Oracle故障诊断 (35)
- RAC/DG/OGG (64)
- Oracle11g New Features (48)
- DataWarehouse (15)
- SQL, PL/SQL (14)
- DB2日常管理 (9)
- Weblogic (11)
- Shell (19)
- AIX (12)
- Linux/Unix高可用性 (11)
- Linux/Unix日常管理 (66)
- Linux桌面应用 (37)
- Windows (2)
- 生活和工作 (13)
- 私人记事 (0)
- Python (9)
- CBO (15)
- Cognos (2)
- ORACLE 12c New Feature (2)
- PL/SQL (2)
- SQL (1)
- C++ (2)
- Hadoop大数据 (5)
- 机器学习 (3)
- 非技术 (1)
最新评论
-
di1984HIT:
xuexilee!!!
Oracle 11g R2 RAC高可用连接特性 – SCAN详解 -
aneyes123:
谢谢非常有用那
PL/SQL的存储过程和函数(原创) -
jcjcjc:
写的很详细
Oracle中Hint深入理解(原创) -
di1984HIT:
学习了,学习了
Linux NTP配置详解 (Network Time Protocol) -
avalonzst:
大写的赞..
AIX内存概述(原创)
CSS工作原理
在理解脑裂(Brain Split)处理过程前,有必要介绍一下Oracle RAC Css(Cluster Synchronization Services)的工作框架:
Oracle RAC CSS提供2种后台服务包括群组管理(Group Managment简称GM)和节点监控(Node Monitor简称NM),其中GM管理组(group)和锁(lock)服务。在集群中任意时刻总有一个节点会充当GM主控节点(master node)。集群中的其他节点串行地将GM请求发送到主控节点(master node),而master node将集群成员变更信息广播给集群中的其他节点。组成员关系(group membership)在每次发生集群重置(cluster reconfiguration)时发生同步。每一个节点独立地诠释集群成员变化信息。
而节点监控NM服务则负责通过skgxn(skgxn-libskgxn.a,提供节点监控的库)与其他厂商的集群软件保持节点信息的一致性。此外NM还提供对我们熟知的网络心跳(Network heartbeat)和磁盘心跳(Disk heartbeat)的维护以保证节点始终存活着。当集群成员没有正常Network heartbeat或Disk heartbeat时NM负责将成员踢出集群,被踢出集群的节点将发生节点重启(reboot)。
NM服务通过OCR中的记录(OCR中记录了Interconnect的信息)来了解其所需要监听和交互的端点,将心跳信息通过网络发送到其他集群成员。同时它也监控来自所有其他集群成员的网络心跳Network heartbeat,每一秒钟都会发生这样的网络心跳,若某个节点的网络心跳在misscount(by the way:10.2.0.1中Linux上默认misscount为60s,其他平台为30s,若使用了第三方vendor clusterware则为600s,但10.2.0.1中未引入disktimeout;10.2.0.4以后misscount为60s,disktimeout为200s;11.2以后misscount为30s:CRS-4678: Successful get misscount 30 for Cluster Synchronization Services,CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services)指定的秒数中都没有被收到的话,该节点被认为已经”死亡”了。NM还负责当其他节点加入或离开集群时初始化集群的重置(Initiates cluster reconfiguration)。
在解决脑裂的场景中,NM还会监控voting disk以了解其他的竞争子集群(subclusters)。关于子集群我们有必要介绍一下,试想我们的环境中存在大量的节点,以Oracle官方构建过的128个节点的环境为我们的想象空间,当网络故障发生时存在多种的可能性,一种可能性是全局的网络失败,即128个节点中每个节点都不能互相发生网络心跳,此时会产生多达128个的信息”孤岛”子集群。另一种可能性是局部的网络失败,128个节点中被分成多个部分,每个部分中包含多于一个的节点,这些部分就可以被称作子集群(subclusters)。当出现网络故障时子集群内部的多个节点仍能互相通信传输投票信息(vote mesg),但子集群或者孤岛节点之间已经无法通过常规的Interconnect网络交流了,这个时候NM Reconfiguration就需要用到voting disk投票磁盘。
Voting Disk
因为NM要使用voting disk来解决因为网络故障造成的通信障碍,所以需要保证voting disk在任意时刻都可以被正常访问。在正常状态下,每个节点都会进行磁盘心跳活动,具体来说就是会到投票磁盘的某个块上写入disk心跳信息,这种活动每一秒钟都会发生,同时CSS还会每秒读取一种称作”kill block”的”赐死块”,当”kill block”的内容表示本节点被驱逐出集群时,CSS会主动重启节点。
为了保证以上的磁盘心跳和读取”kill block”的活动始终正常运作CSS要求保证至少(N/2+1)个投票磁盘要被节点正常访问,这样就保证了每2个节点间总是至少有一个投票磁盘是它们都可以正常访问的,在正常情况下(注意是风平浪静的正常情况)只要节点所能访问的在线voting disk多于无法访问的voting disk,该节点都能幸福地活下去,当无法访问的voting disk多于正常的voting disk时,Cluster Communication Service进程将失败并引起节点重启。所以有一种说法认为voting disk只要有2个足以保证冗余度就可以了,没有必要有3个或以上voting disk,这种说法是错误的。Oracle推荐集群中至少要有3个voting disks。
Question:
有同学问那么voting disk 必须是奇数个呢?
Answer:
实际上我们仅仅是推荐使用奇数个vote disk ,而非必须是奇数个。10gR2中vote disk的数目上限是32个。
Question
我们可以使用2或4个vote disk吗?
Answer:
可以的。 但是2、4这样的数目在“至少(N/2+1)个投票磁盘要被节点正常访问”这一disk heartbeat的硬性算法下是不利的:
当我们使用2个vote disk 时,不能发生任意个vote disk的心跳失败
当我们使用3个vote disk 时,不能发生大于1个的vote disk心跳失败
当我们使用4个vote disk 时,不能发生大于1个的vote disk心跳失败 ,这和3个时的容错率是一样,但是因为我们有更多的vote disk,这会导致管理成本和引入的风险增长
当我们使用5个vote disk 时,不能发生大于2个的vote disk心跳失败
当我们使用6个vote disk 时,仍然不能发生大于2个的vote disk心跳失败, 同样的因为比5时多出一个 ,也会引入不合理的管理成本和风险
Question:
若节点间的网络心跳正常,且节点所能正常心跳的vote disk 大于不能正常访问的 ,如3个votedisk 时恰巧有1个vote disk 的disk heartbeat 超时,此时Brain split 会发生吗?
Answer:
这种情况即不会触发Brain Split,也不会引发节点驱逐协议(eviction protocol)。 当单个或小于(N/2+1)个的voting disk心跳失败(disk heartbeat failure)时,这种心跳失败可能是由于短期内节点访问voting disk发生I/O error错误而引起的,此时css会立刻将这些失败的voting disk标记为OFFLINE。虽然有一定数量的voting disk OFFLINE了,但是我们仍有至少(N/2+1)个投票磁盘可用,这保证了eviction protocol不会被调用,所以没有节点会被reboot重启。紧接着node monitor模块的Disk ping Monitor Thread(DPMT-clssnmDiskPMT)会重复尝试访问这些失败的OFFLINE voting disk,若这些投票磁盘变得再次可I/O访问且经过验证其上的数据也没有讹误,那么css会再次将此voting disk标记为ONLINE;但是如果在45s( 这里的45s是基于misscount和 内部算法获得的) 内仍不能正常访问相关的voting disk,那么DMPT将在cssd.log中生成警告信息,如:
CSSD]2011-11-11 20:11:20.668 >
WARNING: clssnmDiskPMT: long disk latency >(45940 ms) to voting disk (0//dev/asm-votedisk1)
假设以上发生clssnmDiskPMT警告的RAC场景共有3个voting disk,现已有一个asm-votedisk1因为I/O error或其他原因而被标记为OFFLINE,若此时再有一个votedisk也出现了问题并disk heartbeat 失败,那么节点会因为少于规定数目(2)的votedisk而引发eviction protocol,进而重启reboot。
单个或小于(N/2+1)个的voting disk心跳失败都仅仅生成警告(Warning),而非致命的错误。因为仍有绝大数量的vote disk可被访问,因此生成的警告都是非致命的,eviction protocol将不会被触发。
当实际的NM Reconfiguration集群重置情况发生时所有的active节点和正在加入集群的节点都会参与到reconfig中,那些没有应答(ack)的节点都将不再被归入新的集群关系中。实际上reconfig重置包括多个阶段:
1.初始化阶段 — reconfig manager(由集群成员号最低的节点担任)向其他节点发送启动reconfig的信号
2.投票阶段 — 节点向reconfig manager发送该节点所了解的成员关系
3.脑裂检查阶段 — reconfig manager检查是否脑裂
4.驱逐阶段 — reconfig manager驱逐非成员节点
5.更新阶段 — reconfig manager向成员节点发送权威成员关系信息
在脑裂检查阶段Reconfig Manager会找出那些没有Network Heartbeat而有Disk Heartbeat的节点,并通过Network Heartbeat(如果可能的话)和Disk Heartbeat的信息来计算所有竞争子集群(subcluster)内的节点数目,并依据以下2种因素决定哪个子集群应当存活下去:
拥有最多节点数目的子集群(Sub-cluster with largest number of Nodes)
若子集群内数目相等则为拥有最低节点号的子集群(Sub-cluster with lowest node number),举例来说在一个2节点的RAC环境中总是1号节点会获胜。
采用Stonith algorithm 的IO fencing(remote power reset)
STONITH算法是一种常用的I/O Fencing algorithm,是RAC中必要的远程关闭节点的接口。其想法十分简单,当某个节点上运行的软件希望确保本集群内的其他节点不能使用某种资源时,拔出其他节点的插座即可。这是一种简单、可靠且有些残酷的算法。Stonith 的优势是其没有特定的硬件需求,也不会限制集群的扩展性。
Oracle Clusterware的Process Monitor模块负责实现IO fencing,保证不会因节点/实例的不协调工作而产生corruption。Process Monitor的工作具体由hangcheck timer或者oprocd 完成, 在Linux平台上10.2.0.4 之前都没有oprocd的存在(其他Unix平台在10.2.0.1就有了),在安装RAC之前需要额外安装hangcheck timer软件以保证IO fencing, 到10.2.0.4 时Linux上也有了oprocd,具体见<Know about RAC Clusterware Process OPROCD> 一文。 这些负责IO fencing的进程一般都会被锁定在内存中(locked in memory)、实时运行(Real time)、休眠固定的时间(Sleep a fixed time)、以root身份运行;若进程唤醒后发现时间已经太晚,那么它会强制reboot;若这些进程发生失败,则也会重启,所以在RAC环境中oprocd是一个很重要的进程,不要失去手动去kill这个进程。
在完成脑裂检查后进入驱逐阶段,被驱逐节点会收到发送给它们的驱逐信息(如果网络可用的话),若无法发送信息则会通过写出驱逐通知到voting disk上的”kill block”来达到驱逐通知的目的。同时还会等待被驱逐节点表示其已收到驱逐通知,这种表示可能是通过网络通信的方式也可能是投票磁盘上的状态信息。
可以看到Oracle CSS中Brain Split Check时会尽可能地保证最大的一个子集群存活下来以保证RAC系统具有最高的可用性。
实际案例日志
1号节点网络失败,2,3号节点形成子集群;2,3节点通过voting disk向1号节点发起驱逐:
以下为1号节点的ocssd.log日志:
[ CSSD]2011-04-23 17:11:42.943 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 50 3.280308e-268artbeat fatal, eviction in 29.610 seconds
[ CSSD]2011-04-23 17:11:42.943 [3042950032] >TRACE: clssnmPollingThread: node vrh2 (2) is impending reconfig, flag 1037, misstime 30390
[ CSSD]2011-04-23 17:11:42.943 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 50 3.280308e-268artbeat fatal, eviction in 29.150 seconds
对2,3号节点发起misscount计时
[ CSSD]2011-04-23 17:11:42.943 [3042950032] >TRACE: clssnmPollingThread: node vrh3 (3) is impending reconfig, flag 1037, misstime 30850
[ CSSD]2011-04-23 17:11:42.943 [3042950032] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[ CSSD]2011-04-23 17:11:44.368 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 50 3.280308e-268artbeat fatal, eviction in 28.610 seconds
[ CSSD]2011-04-23 17:12:04.778 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 75 3.280308e-268artbeat fatal, eviction in 14.580 seconds
[ CSSD]2011-04-23 17:12:04.779 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 75 3.280308e-268artbeat fatal, eviction in 14.120 seconds
[ CSSD]2011-04-23 17:12:06.207 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 75 3.280308e-268artbeat fatal, eviction in 13.580 seconds
[ CSSD]2011-04-23 17:12:17.719 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 3.280308e-268artbeat fatal, eviction in 5.560 seconds
[ CSSD]2011-04-23 17:12:17.719 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 3.280308e-268artbeat fatal, eviction in 5.100 seconds
[ CSSD]2011-04-23 17:12:19.165 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 3.280308e-268artbeat fatal, eviction in 4.560 seconds
[ CSSD]2011-04-23 17:12:19.165 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 3.280308e-268artbeat fatal, eviction in 4.100 seconds
[ CSSD]2011-04-23 17:12:20.642 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 3.280308e-268artbeat fatal, eviction in 3.560 seconds
[ CSSD]2011-04-23 17:12:20.642 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 3.280308e-268artbeat fatal, eviction in 3.100 seconds
[ CSSD]2011-04-23 17:12:22.139 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 3.280308e-268artbeat fatal, eviction in 2.560 seconds
[ CSSD]2011-04-23 17:12:22.139 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 3.280308e-268artbeat fatal, eviction in 2.100 seconds
[ CSSD]2011-04-23 17:12:23.588 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 3.280308e-268artbeat fatal, eviction in 1.550 seconds
[ CSSD]2011-04-23 17:12:23.588 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 3.280308e-268artbeat fatal, eviction in 1.090 seconds
2号节点的ocssd.log日志:
[ CSSD]2011-04-23 17:11:53.054 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 50 8.910601e-269artbeat fatal, eviction in 29.800 seconds
[ CSSD]2011-04-23 17:11:53.054 [3053439888] >TRACE: clssnmPollingThread: node vrh1 (1) is impending reconfig, flag 1037, misstime 30200
[ CSSD]2011-04-23 17:11:53.054 [3053439888] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[ CSSD]2011-04-23 17:11:54.516 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 50 8.910601e-269artbeat fatal, eviction in 28.790 seconds
[ CSSD]2011-04-23 17:12:14.826 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 75 8.910601e-269artbeat fatal, eviction in 14.800 seconds
[ CSSD]2011-04-23 17:12:16.265 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 75 8.910601e-269artbeat fatal, eviction in 13.800 seconds
[ CSSD]2011-04-23 17:12:27.755 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 5.800 seconds
[ CSSD]2011-04-23 17:12:29.197 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 4.800 seconds
[ CSSD]2011-04-23 17:12:30.658 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 3.800 seconds
[ CSSD]2011-04-23 17:12:32.133 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 2.800 seconds
[ CSSD]2011-04-23 17:12:33.602 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 1.790 seconds
[ CSSD]2011-04-23 17:12:35.126 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 0.800 seconds
[ CSSD]2011-04-23 17:12:35.399 [117574544] >TRACE: clssnmHandleSync: diskTimeout set to (57000)ms
[ CSSD]2011-04-23 17:12:35.399 [117574544] >TRACE: clssnmHandleSync: Acknowledging sync: src[3] srcName[vrh3] seq[21] sync[10]
clssnmHandleSyn应答3号节点发送的同步信息
[ CSSD]2011-04-23 17:12:35.399 [5073104] >USER: NMEVENT_SUSPEND [00][00][00][0e]
发生Node Monitoring SUSPEND事件
[ CSSD]2011-04-23 17:12:35.405 [117574544] >TRACE: clssnmSendVoteInfo: node(3) syncSeqNo(10)
通过clssnmSendVoteInfo向3号节点发送投票信息Vote mesg
[ CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[ CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE: clssnmUpdateNodeState: node 1, state (3/0) unique (1303592601/1303592601) prevConuni(0) birth (9/9) (old/new)
[ CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE: clssnmDiscHelper: vrh1, node(1) connection failed, con (0xb7e80ae8), probe((nil))
[ CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE: clssnmDeactivateNode: node 1 (vrh1) left cluster
确认1号节点离开了集群cluster
[ CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE: clssnmUpdateNodeState: node 2, state (3/3) unique (1303591210/1303591210) prevConuni(0) birth (2/2) (old/new)
[ CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE: clssnmUpdateNodeState: node 3, state (3/3) unique (1303591326/1303591326) prevConuni(0) birth (3/3) (old/new)
[ CSSD]2011-04-23 17:12:35.415 [117574544] >USER: clssnmHandleUpdate: SYNC(10) from node(3) completed
[ CSSD]2011-04-23 17:12:35.416 [117574544] >USER: clssnmHandleUpdate: NODE 2 (vrh2) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2011-04-23 17:12:35.416 [117574544] >USER: clssnmHandleUpdate: NODE 3 (vrh3) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2011-04-23 17:12:35.416 [117574544] >TRACE: clssnmHandleUpdate: diskTimeout set to (200000)ms
[ CSSD]2011-04-23 17:12:35.416 [3021970320] >TRACE: clssgmReconfigThread: started for reconfig (10)
[ CSSD]2011-04-23 17:12:35.416 [3021970320] >USER: NMEVENT_RECONFIG [00][00][00][0c]
[ CSSD]2011-04-23 17:12:35.417 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock crs_version type 2
[ CSSD]2011-04-23 17:12:35.417 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(crs_version) birth(9/9)
[ CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock _ORA_CRS_FAILOVER type 3
[ CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock EVMDMAIN type 2
[ CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(EVMDMAIN) birth(9/9)
[ CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock CRSDMAIN type 2
[ CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(CRSDMAIN) birth(9/9)
[ CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock #CSS_CLSSOMON type 2
[ CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(#CSS_CLSSOMON) birth(9/9)
[ CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE: clssgmEstablishConnections: 2 nodes in cluster incarn 10
[ CSSD]2011-04-23 17:12:35.419 [3063929744] >TRACE: clssgmPeerDeactivate: node 1 (vrh1), death 10, state 0x80000000 connstate 0xa
[ CSSD]2011-04-23 17:12:35.419 [3063929744] >TRACE: clssgmPeerListener: connects done (2/2)
[ CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE: clssgmEstablishMasterNode: MASTER for 10 is node(2) birth(2)
[ CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE: clssgmMasterCMSync: Synchronizing group/lock status
[ CSSD]2011-04-23 17:12:35.428 [3021970320] >TRACE: clssgmMasterSendDBDone: group/lock status synchronization complete
[ CSSD]CLSS-3000: reconfiguration successful, incarnation 10 with 2 nodes
[ CSSD]CLSS-3001: local node number 2, master node number 2
完成reconfiguration
[ CSSD]2011-04-23 17:12:35.440 [3021970320] >TRACE: clssgmReconfigThread: completed for reconfig(10), with status(1)
以下为3号节点的ocssd.log:
[ CSSD]2011-04-23 17:12:36.303 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 50 1.867300e-268artbeat fatal, eviction in 29.220 seconds
[ CSSD]2011-04-23 17:12:36.303 [3053439888] >TRACE: clssnmPollingThread: node vrh1 (1) is impending reconfig, flag 1037, misstime 30780
[ CSSD]2011-04-23 17:12:36.303 [3053439888] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[ CSSD]2011-04-23 17:12:57.889 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 75 1.867300e-268artbeat fatal, eviction in 14.220 seconds
[ CSSD]2011-04-23 17:13:10.674 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 1.867300e-268artbeat fatal, eviction in 5.220 seconds
[ CSSD]2011-04-23 17:13:12.115 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 1.867300e-268artbeat fatal, eviction in 4.220 seconds
[ CSSD]2011-04-23 17:13:13.597 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 1.867300e-268artbeat fatal, eviction in 3.210 seconds
[ CSSD]2011-04-23 17:13:15.024 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 1.867300e-268artbeat fatal, eviction in 2.220 seconds
[ CSSD]2011-04-23 17:13:16.504 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 1.867300e-268artbeat fatal, eviction in 1.220 seconds
[ CSSD]2011-04-23 17:13:17.987 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 1.867300e-268artbeat fatal, eviction in 0.220 seconds
[ CSSD]2011-04-23 17:13:18.325 [3053439888] >TRACE: clssnmPollingThread: Eviction started for node vrh1 (1), flags 0x040d, state 3, wt4c 0
[ CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE: clssnmDoSyncUpdate: Initiating sync 10
[ CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE: clssnmDoSyncUpdate: diskTimeout set to (57000)ms
[ CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE: clssnmSetupAckWait: Ack message type (11)
[ CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE: clssnmSetupAckWait: node(2) is ALIVE
[ CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE: clssnmSetupAckWait: node(3) is ALIVE
[ CSSD]2011-04-23 17:13:18.327 [3032460176] >TRACE: clssnmSendSync: syncSeqNo(10)
[ CSSD]2011-04-23 17:13:18.329 [3032460176] >TRACE: clssnmWaitForAcks: Ack message type(11), ackCount(2)
[ CSSD]2011-04-23 17:13:18.329 [89033616] >TRACE: clssnmHandleSync: diskTimeout set to (57000)ms
[ CSSD]2011-04-23 17:13:18.329 [89033616] >TRACE: clssnmHandleSync: Acknowledging sync: src[3] srcName[vrh3] seq[21] sync[10]
[ CSSD]2011-04-23 17:13:18.330 [8136912] >USER: NMEVENT_SUSPEND [00][00][00][0e]
[ CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE: clssnmWaitForAcks: done, msg type(11)
[ CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE: clssnmDoSyncUpdate: Terminating node 1, vrh1, misstime(60010) state(5)
[ CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE: clssnmSetupAckWait: Ack message type (13)
[ CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE: clssnmSetupAckWait: node(2) is ACTIVE
[ CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE: clssnmSetupAckWait: node(3) is ACTIVE
[ CSSD]2011-04-23 17:13:18.334 [3032460176] >TRACE: clssnmWaitForAcks: Ack message type(13), ackCount(2)
[ CSSD]2011-04-23 17:13:18.335 [89033616] >TRACE: clssnmSendVoteInfo: node(3) syncSeqNo(10)
[ CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE: clssnmWaitForAcks: done, msg type(13)
以上完成了2-3节点间的Vote mesg通信,这些信息包含Node identifier,GM peer to peer listening endpoint以及
View of cluster membership。
[ CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE: clssnmCheckDskInfo: Checking disk info...
开始检测voting disk上的信息
[ CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE: clssnmCheckDskInfo: node 1, vrh1, state 5 with leader 1 has smaller cluster size 1; my cluster size 2 with leader 2
发现其他子集群,包含1号节点且1号节点为该子集群的leader,为最小子集群;3号与2号节点组成最大子集群,2号节点为leader节点
[ CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE: clssnmEvict: Start
[ CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE: clssnmEvict: Evicting node 1, vrh1, birth 9, death 10,
impendingrcfg 1, stateflags 0x40d
发起对1号节点的驱逐
[ CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE: clssnmSendShutdown: req to node 1, kill time 443294
[ CSSD]2011-04-23 17:13:18.339 [3032460176] >TRACE: clssnmDiscHelper: vrh1, node(1) connection failed, con (0xb7eaf220), probe((nil))
[ CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE: clssnmWaitOnEvictions: Start
[ CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE: clssnmWaitOnEvictions: node 1, vrh1, undead 1
[ CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE: clssnmCheckKillStatus: Node 1, vrh1, down, LATS(443144),timeout(150)
clssnmCheckKillStatus检查1号节点是否down了
[ CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE: clssnmSetupAckWait: Ack message type (15)
[ CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE: clssnmSetupAckWait: node(2) is ACTIVE
[ CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE: clssnmSetupAckWait: node(3) is ACTIVE
[ CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE: clssnmSendUpdate: syncSeqNo(10)
[ CSSD]2011-04-23 17:13:18.341 [3032460176] >TRACE: clssnmWaitForAcks: Ack message type(15), ackCount(2)
[ CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[ CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE: clssnmUpdateNodeState: node 1, state (5/0) unique (1303592601/1303592601) prevConuni(1303592601) birth (9/9) (old/new)
[ CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE: clssnmDeactivateNode: node 1 (vrh1) left cluster
[ CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE: clssnmUpdateNodeState: node 2, state (3/3) unique (1303591210/1303591210) prevConuni(0) birth (2/2) (old/new)
[ CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE: clssnmUpdateNodeState: node 3, state (3/3) unique (1303591326/1303591326) prevConuni(0) birth (3/3) (old/new)
[ CSSD]2011-04-23 17:13:18.342 [89033616] >USER: clssnmHandleUpdate: SYNC(10) from node(3) completed
[ CSSD]2011-04-23 17:13:18.342 [89033616] >USER: clssnmHandleUpdate: NODE 2 (vrh2) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2011-04-23 17:13:18.342 [89033616] >USER: clssnmHandleUpdate: NODE 3 (vrh3) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2011-04-23 17:13:18.342 [89033616] >TRACE: clssnmHandleUpdate: diskTimeout set to (200000)ms
[ CSSD]2011-04-23 17:13:18.347 [3032460176] >TRACE: clssnmWaitForAcks: done, msg type(15)
[ CSSD]2011-04-23 17:13:18.348 [3032460176] >TRACE: clssnmDoSyncUpdate: Sync 10 complete!
[ CSSD]2011-04-23 17:13:18.350 [3021970320] >TRACE: clssgmReconfigThread: started for reconfig (10)
[ CSSD]2011-04-23 17:13:18.350 [3021970320] >USER: NMEVENT_RECONFIG [00][00][00][0c]
[ CSSD]2011-04-23 17:13:18.351 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock crs_version type 2
[ CSSD]2011-04-23 17:13:18.352 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(crs_version) birth(9/9)
[ CSSD]2011-04-23 17:13:18.353 [3063929744] >TRACE: clssgmDispatchCMXMSG(): got message type(7) src(2) incarn(10) during incarn(9/9)
[ CSSD]2011-04-23 17:13:18.354 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock _ORA_CRS_FAILOVER type 3
.........................省略若干行.........................
[ CSSD]2011-04-23 17:13:18.356 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(#CSS_CLSSOMON) birth(9/9)
[ CSSD]2011-04-23 17:13:18.357 [3021970320] >TRACE: clssgmEstablishConnections: 2 nodes in cluster incarn 10
[ CSSD]2011-04-23 17:13:18.366 [3063929744] >TRACE: clssgmPeerDeactivate: node 1 (vrh1), death 10, state 0x80000000 connstate 0xa
[ CSSD]2011-04-23 17:13:18.367 [3063929744] >TRACE: clssgmHandleDBDone(): src/dest (2/65535) size(68) incarn 10
[ CSSD]2011-04-23 17:13:18.367 [3063929744] >TRACE: clssgmPeerListener: connects done (2/2)
[ CSSD]2011-04-23 17:13:18.369 [3021970320] >TRACE: clssgmEstablishMasterNode: MASTER for 10 is node(2) birth(2)
更新阶段
[ CSSD]CLSS-3000: reconfiguration successful, incarnation 10 with 2 nodes
[ CSSD]CLSS-3001: local node number 3, master node number 2
[ CSSD]2011-04-23 17:13:18.372 [3021970320] >TRACE: clssgmReconfigThread: completed for reconfig(10), with status(1)
另一场景为1号节点未加入集群,2号节点的网络失败,因2号节点的member number较小故其通过voting disk向3号节点发起驱逐
以下为2号节点的ocssd.log日志
[ CSSD]2011-04-23 17:41:48.643 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 50 8.910601e-269artbeat fatal, eviction in 29.890 seconds
[ CSSD]2011-04-23 17:41:48.643 [3053439888] >TRACE: clssnmPollingThread: node vrh3 (3) is impending reconfig, flag 1037, misstime 30110
[ CSSD]2011-04-23 17:41:48.643 [3053439888] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[ CSSD]2011-04-23 17:41:50.132 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 50 8.910601e-269artbeat fatal, eviction in 28.890 seconds
[ CSSD]2011-04-23 17:42:10.533 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 75 8.910601e-269artbeat fatal, eviction in 14.860 seconds
[ CSSD]2011-04-23 17:42:11.962 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 75 8.910601e-269artbeat fatal, eviction in 13.860 seconds
[ CSSD]2011-04-23 17:42:23.523 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 5.840 seconds
[ CSSD]2011-04-23 17:42:24.989 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 4.840 seconds
[ CSSD]2011-04-23 17:42:26.423 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 3.840 seconds
[ CSSD]2011-04-23 17:42:27.890 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 2.840 seconds
[ CSSD]2011-04-23 17:42:29.382 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 1.840 seconds
[ CSSD]2011-04-23 17:42:30.832 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 0.830 seconds
[ CSSD]2011-04-23 17:42:32.020 [3053439888] >TRACE: clssnmPollingThread: Eviction started for node vrh3 (3), flags 0x040d, state 3, wt4c 0
[ CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE: clssnmDoSyncUpdate: Initiating sync 13
[ CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE: clssnmDoSyncUpdate: diskTimeout set to (57000)ms
[ CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE: clssnmSetupAckWait: Ack message type (11)
[ CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE: clssnmSetupAckWait: node(2) is ALIVE
[ CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE: clssnmSendSync: syncSeqNo(13)
[ CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE: clssnmWaitForAcks: Ack message type(11), ackCount(1)
[ CSSD]2011-04-23 17:42:32.021 [117574544] >TRACE: clssnmHandleSync: diskTimeout set to (57000)ms
[ CSSD]2011-04-23 17:42:32.021 [117574544] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] srcName[vrh2] seq[13] sync[13]
[ CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE: clssnmWaitForAcks: done, msg type(11)
[ CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE: clssnmDoSyncUpdate: Terminating node 3, vrh3, misstime(60000) state(5)
[ CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE: clssnmSetupAckWait: Ack message type (13)
[ CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE: clssnmSetupAckWait: node(2) is ACTIVE
[ CSSD]2011-04-23 17:42:32.021 [5073104] >USER: NMEVENT_SUSPEND [00][00][00][0c]
[ CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE: clssnmWaitForAcks: Ack message type(13), ackCount(1)
[ CSSD]2011-04-23 17:42:32.022 [117574544] >TRACE: clssnmSendVoteInfo: node(2) syncSeqNo(13)
[ CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE: clssnmWaitForAcks: done, msg type(13)
[ CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE: clssnmCheckDskInfo: Checking disk info...
[ CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE: clssnmCheckDskInfo: node 3, vrh3, state 5 with leader 3 has smaller cluster size 1; my cluster size 1 with leader 2
检查voting disk后发现子集群3为最小"子集群"(3号节点的node number较2号大);2号节点为最大子集群
[ CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE: clssnmEvict: Start [ CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE: clssnmEvict: Evicting node 3, vrh3, birth 3, death 13, impendingrcfg 1, stateflags 0x40d
[ CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE: clssnmSendShutdown: req to node 3, kill time 1643084
发起对3号节点的驱逐和shutdown request
[ CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE: clssnmDiscHelper: vrh3, node(3) connection failed, con (0xb7e79bb0), probe((nil))
[ CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE: clssnmWaitOnEvictions: Start
[ CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE: clssnmWaitOnEvictions: node 3, vrh3, undead 1
[ CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE: clssnmCheckKillStatus: Node 3, vrh3, down, LATS(1642874),timeout(210)
[ CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE: clssnmSetupAckWait: Ack message type (15)
[ CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE: clssnmSetupAckWait: node(2) is ACTIVE
[ CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE: clssnmSendUpdate: syncSeqNo(13)
[ CSSD]2011-04-23 17:42:32.024 [3032460176] >TRACE: clssnmWaitForAcks: Ack message type(15), ackCount(1)
[ CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[ CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE: clssnmUpdateNodeState: node 1, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[ CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE: clssnmUpdateNodeState: node 2, state (3/3) unique (1303591210/1303591210) prevConuni(0) birth (2/2) (old/new)
[ CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE: clssnmUpdateNodeState: node 3, state (5/0) unique (1303591326/1303591326) prevConuni(1303591326) birth (3/3) (old/new)
[ CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE: clssnmDeactivateNode: node 3 (vrh3) left cluster
[ CSSD]2011-04-23 17:42:32.024 [117574544] >USER: clssnmHandleUpdate: SYNC(13) from node(2) completed
[ CSSD]2011-04-23 17:42:32.024 [117574544] >USER: clssnmHandleUpdate: NODE 2 (vrh2) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE: clssnmHandleUpdate: diskTimeout set to (200000)ms
[ CSSD]2011-04-23 17:42:32.024 [3032460176] >TRACE: clssnmWaitForAcks: done, msg type(15)
[ CSSD]2011-04-23 17:42:32.024 [3032460176] >TRACE: clssnmDoSyncUpdate: Sync 13 complete!
[ CSSD]2011-04-23 17:42:32.024 [3021970320] >TRACE: clssgmReconfigThread: started for reconfig (13)
[ CSSD]2011-04-23 17:42:32.024 [3021970320] >USER: NMEVENT_RECONFIG [00][00][00][04]
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock crs_version type 2
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(2) grock(crs_version) birth(3/3)
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock _ORA_CRS_FAILOVER type 3
................省略若干行..............
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(3) grock(#CSS_CLSSOMON) birth(3/3)
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmEstablishConnections: 1 nodes in cluster incarn 13
[ CSSD]2011-04-23 17:42:32.026 [3063929744] >TRACE: clssgmPeerDeactivate: node 3 (vrh3), death 13, state 0x0 connstate 0xf
[ CSSD]2011-04-23 17:42:32.026 [3063929744] >TRACE: clssgmPeerListener: connects done (1/1)
[ CSSD]2011-04-23 17:42:32.026 [3021970320] >TRACE: clssgmEstablishMasterNode: MASTER for 13 is node(2) birth(2)
[ CSSD]2011-04-23 17:42:32.026 [3021970320] >TRACE: clssgmMasterCMSync: Synchronizing group/lock status
[ CSSD]2011-04-23 17:42:32.026 [3021970320] >TRACE: clssgmMasterSendDBDone: group/lock status synchronization complete
[ CSSD]CLSS-3000: reconfiguration successful, incarnation 13 with 1 nodes
[ CSSD]CLSS-3001: local node number 2, master node number 2
完成reconfiguration
[ CSSD]2011-04-23 17:42:32.027 [3021970320] >TRACE: clssgmReconfigThread: completed for reconfig(13), with status(1)
以下为3号节点的ocssd.log日志:
[ CSSD]2011-04-23 17:42:33.204 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 50 1.867300e-268artbeat fatal, eviction in 29.360 seconds
[ CSSD]2011-04-23 17:42:33.204 [3053439888] >TRACE: clssnmPollingThread: node vrh2 (2) is impending reconfig, flag 1039, misstime 30640
[ CSSD]2011-04-23 17:42:33.204 [3053439888] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[ CSSD]2011-04-23 17:42:55.168 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 75 1.867300e-268artbeat fatal, eviction in 14.330 seconds
[ CSSD]2011-04-23 17:43:08.182 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 5.310 seconds
[ CSSD]2011-04-23 17:43:09.661 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 4.300 seconds
[ CSSD]2011-04-23 17:43:11.144 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 3.300 seconds
[ CSSD]2011-04-23 17:43:12.634 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 2.300 seconds
[ CSSD]2011-04-23 17:43:14.053 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 1.300 seconds
[ CSSD]2011-04-23 17:43:15.467 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 0.300 seconds
[ CSSD]2011-04-23 17:43:15.911 [3053439888] >TRACE: clssnmPollingThread: Eviction started for node vrh2 (2), flags 0x040f, state 3, wt4c 0
[ CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE: clssnmDoSyncUpdate: Initiating sync 13
[ CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE: clssnmDoSyncUpdate: diskTimeout set to (57000)ms
[ CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE: clssnmSetupAckWait: Ack message type (11)
[ CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE: clssnmSetupAckWait: node(3) is ALIVE
[ CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE: clssnmSendSync: syncSeqNo(13)
[ CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE: clssnmWaitForAcks: Ack message type(11), ackCount(1)
[ CSSD]2011-04-23 17:43:15.912 [89033616] >TRACE: clssnmHandleSync: diskTimeout set to (57000)ms
[ CSSD]2011-04-23 17:43:15.912 [89033616] >TRACE: clssnmHandleSync: Acknowledging sync: src[3] srcName[vrh3] seq[29] sync[13]
[ CSSD]2011-04-23 17:43:15.912 [8136912] >USER: NMEVENT_SUSPEND [00][00][00][0c]
[ CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE: clssnmWaitForAcks: done, msg type(11)
[ CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE: clssnmDoSyncUpdate: Terminating node 2, vrh2, misstime(60010) state(5)
[ CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE: clssnmSetupAckWait: Ack message type (13)
[ CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE: clssnmSetupAckWait: node(3) is ACTIVE
[ CSSD]2011-04-23 17:43:15.913 [89033616] >TRACE: clssnmSendVoteInfo: node(3) syncSeqNo(13)
[ CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE: clssnmWaitForAcks: Ack message type(13), ackCount(1)
[ CSSD]2011-04-23 17:43:15.913 [3032460176] >TRACE: clssnmCheckDskInfo: Checking disk info...
[ CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR: clssnmCheckDskInfo: Aborting local node to avoid splitbrain. [ CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR: : my node(3), Leader(3), Size(1) VS Node(2), Leader(2), Size(1)
读取voting disk后发现kill block,为避免split brain,自我aborting!
[ CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR: ###################################
[ CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR: clssscExit: CSSD aborting
[ CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR: ###################################
[ CSSD]--- DUMP GROCK STATE DB ---
[ CSSD]----------
[ CSSD] type 2, Id 4, Name = (crs_version)
[ CSSD] flags: 0x1000
[ CSSD] grant: count=0, type 0, wait 0
[ CSSD] Member Count =2, master 2
[ CSSD] . . . . .
[ CSSD] memberNo =2, seq 2
[ CSSD] flags = 0x0, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 3, nodeBirth 3
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 0
[ CSSD] . . . . .
[ CSSD] memberNo =1, seq 12
[ CSSD] flags = 0x1000, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 2, nodeBirth 2
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 0
[ CSSD]----------
[ CSSD]----------
[ CSSD] type 3, Id 11, Name = (_ORA_CRS_FAILOVER)
[ CSSD] flags: 0x0
[ CSSD] grant: count=1, type 3, wait 1
[ CSSD] Member Count =1, master -3
[ CSSD] . . . . .
[ CSSD] memberNo =0, seq 0
[ CSSD] flags = 0x12, granted 1
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 3, nodeBirth 3
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 0
[ CSSD]----------
[ CSSD]----------
[ CSSD] type 2, Id 2, Name = (EVMDMAIN)
[ CSSD] flags: 0x1000
[ CSSD] grant: count=0, type 0, wait 0
[ CSSD] Member Count =2, master 2
[ CSSD] . . . . .
[ CSSD] memberNo =2, seq 1
[ CSSD] flags = 0x0, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 2, nodeBirth 2
[ CSSD] privateDataSize = 508
[ CSSD] publicDataSize = 504
[ CSSD] . . . . .
[ CSSD] memberNo =3, seq 2
[ CSSD] flags = 0x0, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 3, nodeBirth 3
[ CSSD] privateDataSize = 508
[ CSSD] publicDataSize = 504
[ CSSD]----------
[ CSSD]----------
[ CSSD] type 2, Id 5, Name = (CRSDMAIN)
[ CSSD] flags: 0x1000
[ CSSD] grant: count=0, type 0, wait 0
[ CSSD] Member Count =1, master 3
[ CSSD] . . . . .
[ CSSD] memberNo =3, seq 2
[ CSSD] flags = 0x0, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 3, nodeBirth 3
[ CSSD] privateDataSize = 128
[ CSSD] publicDataSize = 128
[ CSSD]----------
[ CSSD]----------
[ CSSD] type 3, Id 12, Name = (_ORA_CRS_MEMBER_vrh1)
[ CSSD] flags: 0x0
[ CSSD] grant: count=1, type 3, wait 1
[ CSSD] Member Count =1, master -3
[ CSSD] . . . . .
[ CSSD] memberNo =0, seq 0
[ CSSD] flags = 0x12, granted 1
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 3, nodeBirth 3
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 0
[ CSSD]----------
[ CSSD]----------
[ CSSD] type 3, Id 12, Name = (_ORA_CRS_MEMBER_vrh3)
[ CSSD] flags: 0x0
[ CSSD] grant: count=1, type 3, wait 1
[ CSSD] Member Count =1, master -3
[ CSSD] . . . . .
[ CSSD] memberNo =0, seq 0
[ CSSD] flags = 0x12, granted 1
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 3, nodeBirth 3
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 0
[ CSSD]----------
[ CSSD]----------
[ CSSD] type 2, Id 3, Name = (ocr_crs)
[ CSSD] flags: 0x1000
[ CSSD] grant: count=0, type 0, wait 0
[ CSSD] Member Count =2, master 3
[ CSSD] . . . . .
[ CSSD] memberNo =3, seq 2
[ CSSD] flags = 0x0, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 3, nodeBirth 3
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 32
[ CSSD] . . . . .
[ CSSD] memberNo =2, seq 12
[ CSSD] flags = 0x1000, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 2, nodeBirth 2
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 32
[ CSSD]----------
[ CSSD]----------
[ CSSD] type 2, Id 1, Name = (#CSS_CLSSOMON)
[ CSSD] flags: 0x1000
[ CSSD] grant: count=0, type 0, wait 0
[ CSSD] Member Count =2, master 2
[ CSSD] . . . . .
[ CSSD] memberNo =2, seq 1
[ CSSD] flags = 0x1000, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 2, nodeBirth 2
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 0
[ CSSD] . . . . .
[ CSSD] memberNo =3, seq 2
[ CSSD] flags = 0x1000, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 3, nodeBirth 3
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 0
[ CSSD]----------
[ CSSD]--- END OF GROCK STATE DUMP ---
[ CSSD]------- Begin Dump -------
补充官方文档
The RM (Reconfig Manager) sends a sync message to all participating nodes. Participating nodes respond with a sync acknowledgement. After this the vote phase begins and the master sends a vote message to all participating nodes. Participating nodes repond with a vote info message containing their node identifier and GM peer to peer listening endpoint. In the split-check phase, the RM uses the voting disk to verify there is no split-brain. It finds nodes heartbeating to disk that are not connected via the network. If it finds these, it will determine which nodes are talking to which and the largest subcluster survives. For example, if we have a 5 node cluster and all of the nodes are heartbeating to the voting disk but only a group of 3 can communicate via the network and a group of 2 can communication via the network, this means we have 2 subclusters. The largest subcluster (3) would survive while the other subcluster (2) would not. After this the evict phase would evict nodes previously in the cluster but not considered members in this incarnation. In this case we would send a message to evicted nodes (if possible) and write eviction notice to a ‘kill’ block in the voting file. We would wait for the node to indicate it got the eviction notice (wait for seconds). The wait is terminated by a message or status on the voting file indicating that the node got the eviction notice. In the update phase the master sends an update message containing the definitive cluster membership and node information for all particpating nodes. The participating nodes send update acknowledgements. All members queue the reconfiguration event ot their GM.
As far as voting disks are concerned, a node must be able to access strictly more than half of the voting disks at any time. So if you want to be able to tolerate a failure of n voting disks, you must have at least 2n+1 configured. (n=1 means 3 voting disks). You can configure up to 32 voting disks, providing protection against 15 simultaneous disk failures.
Oracle recommends that customers use 3 or more voting disks in Oracle RAC 10g Release 2. Note: For best availability, the 3 voting files should be physically separate disks. It is recommended to use an odd number as 4 disks will not be any more highly available than 3 disks, 1/2 of 3 is 1.5…rounded to 2, 1/2 of 4 is 2, once we lose 2 disks, our cluster will fail with both 4 voting disks or 3 voting disks.
When interconnect breaks – keeps the largest cluster possible up, other nodes will be evicted, in 2 node cluster lowest number node remains.
Node eviction: pick a cluster node as victim to reboot.Always keep the largest cluster possible up, evicted other nodes two nodes: keep the lowest number node up and evict other
参考至:http://blog.csdn.net/liu_maclean/article/details/7171511
如有错误,欢迎指正
邮箱:czmcj@163.com
发表评论
-
Duplicate a Database Using RMAN in Oracle Database 11g Release 2
2016-11-03 15:38 945Duplicate a Database Using RMA ... -
grid control 安装总结 (三)之EMD upload error: uploadXMLFiles skipped或忘记OMS密码
2014-11-17 17:16 2777[oracle@anbob agent10g]$ bin/e ... -
Grid Control服务端看不到Agent相关信息,同时OMS状态为unknown处理
2014-11-17 17:14 2148一次Grid Control的agent是启动的,但是在GC ... -
给11gR2 RAC添加LISTENER监听器并静态注册
2014-10-22 17:52 3158之前有同学想要给11gR2的RAC添加LISTENER监听器 ... -
HOW TO REMOVE CRS AUTO START AND RESTART FOR A RAC INSTANCE
2014-10-12 20:08 1461Subject: HOW TO REMOVE CRS AU ... -
How To Enable Archive Logging In RAC Environment [ID 1186764.1]
2014-10-12 19:45 1015How To Enable Archive Logging ... -
Oracle 11g RAC使用Manual和Policy Managed方法配置
2014-10-12 17:25 8859软件环境: 操作系统:Red Hat Enterpris ... -
Moving OCR and Voting Disk to Oracle ASM Online
2014-10-12 16:35 2628Prior Oracle 11gR2, it was only ... -
Oracle 11g R2的启动与关闭(原创)
2014-10-10 15:04 27186概述 Oracle 11gR2 中对CRSD资源进行了重新 ... -
重新配置与卸载11gR2 Grid Infrastructure
2014-10-10 08:24 2290概述Oracle 11g R2 Grid Infrastru ... -
Data Guard Broker Properties
2014-10-07 16:15 967Data Guard Broker Properties ... -
Data Guard - Snapshot Standby Database配置
2014-10-07 07:39 933概述一般情况下,物理standby数据库处于mount状态 ... -
Active Data Guard
2014-10-06 19:27 870Oracle Active Data Guard is a ... -
Data Guard Broker系列之六:Fast-Start Failover
2014-10-06 18:29 3068Fast-Start Failover 基本概念 Fa ... -
Data Guard Broker系列之五:数据库角色转换
2014-10-06 18:28 1227数据库转换的基本概 ... -
Data Guard Broker系列之四:数据库管理
2014-10-06 18:27 1196数据库状态管理 数据 ... -
Data Guard Broker系列之三:Data Guard Broker配置管理
2014-10-06 16:56 1262配置data guard broker配置文件基本参数 ... -
Data Guard Broker系列之二:Data Guard Broker配置实战
2014-10-05 16:16 1577配置之前DG环境状态 测试的DG环境安装在同一个机器上, ... -
Data Guard broker系列之一:Data Guard Broker基础
2014-10-05 16:13 2272Data Guard broker概览 Data ... -
Dataguard切换保护模式(从MAXIMUM PERFORMANCE 到 maximize availability)
2014-10-05 14:52 12401. 首先查看主库当前保护模式(主库操作)SQL> s ...
相关推荐
大话Oracle_RAC__集群_高可用性_备份与恢复.pdf
oracle rac (veritas)双机集群,一套机器挂了没发现,等两套都挂了才发现都是主板坏。。。
这个文档详细的记录了重启Oracle rac的每一个步骤,重启的方法很多,这个可能不是最适合你的,但这个是有效的,这个文档是我给外行看的
Oracle RAC 19c集群搭建指南:实现高可用、高性能的数据库集群(超详细),一看就会的那种
oracle,rac,集群,可用,备份,恢复
教程名称:Oracle RAC数据库集群视频教程(10讲)课程目录:【】1.OracleRAC集群体系结构_drm【】10.测试OracleRAC数据库集群功能【】2.安装OracleRAC数据库(一)【】3.安装OracleRAC数据库(二)_drm【】4.安装...
Oracle单机通过rman迁移到RAC集群
教程名称:Oracle RAC真正应用集群系列视频教程课程目录:【】Oracle RAC真正应用集群3集【】OracleRAC真正应用集群1集【】OracleRAC真正应用集群2集【】OracleRAC真正应用集群4集【】OracleRAC真正应用集群5集 ...
大话Oracle RAC:集群、高可用性、备份与恢复(带目录清晰中文完整版)
在多数业务中,Oracle 真正应用集群 (RAC) 配置的主要业务要求是整个系统中数据库层的可伸缩性 — 这样,当用户数增加时,可将额外实例添加到该集群来分发该负载。 在 Oracle RAC 10g 中,这个特定的功能已经变得...
《大话Oracle RAC:集群 高可用性 备份与恢复》以Oracle 10g为基础,对Oracle RAC进行了全面的介绍和分析。全书分为两个部分,共14章,第1部分是集群理论篇,这部分从集群基础知识入手,通过分析集群环境和单机环境...
资源名称:深入理解Oracle RAC 12c内容简介:《深入理解Oracle RAC 12c》介绍了Oracle RAC 12c技术的方方面面,涵盖了与RAC技术相关的集群件知识、数据库知识、存储知识、网络知识,并在基于RAC的应用软件设计、优化...
oracle RAC 和 weblogic集群安装配置和部署详细文档
在 Oracle Enterprise Linux 和 iSCSI 上构建您自己的 Oracle RAC 11g 集群
Oracle RAC数据库集群PPT教案.pptx
Linux 常用命令 oracle rac 中文手册 官网翻译 oracle rac
Linux下Oracle RAC集群配置详细说明
aix-rac-1-aix操作系统必要文件集的安装与升级.pdf Oracle RAC集群aix
基于Oracle RAC 双活方案实施,如何规避脑裂风险- 最佳实践.docx