ITPUX Tech Network (ITPUX技术网)


Analysis of 5 Causes of Oracle RAC Database Cluster Instance Crashes

Posted by: 风哥 | Posted: 2014-1-21 22:35:58

The following is from the official Oracle documentation:
Top 5 RAC Instance Crash Issues

Issue #1: ORA-29770 LMHB Terminate Instance

Symptoms:
LMON (ospid: 31216) waits for event 'control file sequential read' for 88 secs.
Errors in file /oracle/base/diag/rdbms/prod/prod3/trace/prod3_lmhb_31304.trc (incident=2329):
ORA-29770: global enqueue process LMON (OSID 31216) is hung for more than 70 seconds
LMHB (ospid: 31304) is terminating the instance.

or
LMON (ospid: 8594) waits for event 'control file sequential read' for 118 secs.
ERROR: LMON is not healthy and has no heartbeat.
ERROR: LMHB (ospid: 8614) is terminating the instance.

Possible Causes:
Bug 8888434 LMHB crashes the instance with LMON waiting on controlfile read
Bug 11890804 LMHB crashes instance with ORA-29770 after long "control file sequential read" waits

Solutions:
Bug 8888434 has been fixed in 11.2.0.2+
Bug 11890804 has been fixed in 11.2.0.3+
Please refer to Document 1197674.1, Document 8888434.8 and Document 11890804.8 for more details.

Issue #2: Instance crash with ORA-481
Symptoms:
1. PMON (ospid: 12585): terminating the instance due to error 481
LMON trace shows:
Begin DRM(107) (swin 0)
* drm quiesce <kjxgmrcfg: Reconfiguration started, type 6

LMSx trace shows:
2011-07-05 10:53:44.218905 : Start affinity expansion for pkey 81885.0
2011-07-05 10:53:44.498923 : Expand failed: pkey 81885.0, 229 shadows traversed, 153 replayed 1 retries

2. PMON (ospid: 4915562): terminating the instance due to error 481
Sat Oct 01 19:21:37 2011
System state dump requested by (instance=2, osid=4915562 (PMON)), summary=[abnormal instance termination].

Possible Causes:
1. Bug 11875294 LMS gets stuck during DRM, Instance crashed with ORA-481
2. HAIP is not online on some of the cluster nodes, or HAIP is online on all cluster nodes but the HAIP addresses are not pingable

Solutions:
1. Bug 11875294 has been fixed in 11.2.0.3. The workaround is to disable read-mostly locking by setting:
_gc_read_mostly_locking=FALSE
Please refer to Document 11875294.8 for more information.

2. Fix HAIP issue per Document 1383737.1

Issue #3: ORA-600[kjbmprlst:shadow], ORA-600[kjbrref:pkey], ORA-600[kjbmocvt:rid], [kjbclose_remaster:!drm], ORA-600 [kjbrasr:pkey], instance crash
Symptoms:
RAC instance crashes with ORA-600 [kjbmprlst:shadow] or ORA-600[kjbrref:pkey], or ORA-600[kjbmocvt:rid],[kjbclose_remaster:!drm], ORA-600 [kjbrasr:pkey]

Possible Causes:
This group of ORA-600 errors is related to DRM (dynamic resource remastering) messaging or read-mostly locking. Quite a few bugs are involved:
Document 9458781.8 Missing close message to master leaves closed lock dangling crashing the instance with assorted Internal error
Document 9835264.8 ORA-600 [kjbrasr:pkey] / ORA-600 [kjbmocvt:rid] in RAC with dynamic remastering
Document 10200390.8 ORA-600[kjbclose_remaster:!drm] in RAC with fix for 9979039
Document 10121589.8 ORA-600 [kjbmprlst:shadow] can occur in RAC
Document 11785390.8 Stack corruption / incorrect behaviour possible in RAC
Document 12408350.8 ORA-600 [kjbrasr:pkey] in RAC with read mostly locking
Document 12834027.8 ORA-600 [kjbmprlst:shadow] / ORA-600 [kjbrasr:pkey] with RAC read mostly locking

Solutions:
Most of the above bugs are fixed in 11.2.0.3, so applying the 11.2.0.3 patchset should avoid them. The exception is Bug 12834027, which will be fixed in 12.1. The workaround for that bug is:

Disable DRM
or
Disable read-mostly object locking
eg: Run with "_gc_read_mostly_locking"=FALSE

Please refer to the Document number listed above for each bug's explanation and solution.

Issue #4: Dumps on kcldle / kclfplz / kcbbxsv_l2 / kclfprm using flash
Symptoms:
ORA-7445[kcldle]
ORA-7445[kclfplz]
ORA-7445[kcbbxsv_l2]
ORA-7445[kclfprm] reported in alert log

Possible Causes:
These are caused by various bugs that were closed under base Bug 12337941 Dumps on kcldle / kclfplz / kcbbxsv_l2 / kclfprm using flash

Solutions:
The bug has been fixed in 11.2.0.3. Either apply the patchset or use the workaround: disable the flash cache.
Refer to Document 12337941.8 for more details.

Issue #5: LMS gets ORA-600 [kclpdc_21] and instance crashes
Symptoms:
ORA-600[kclpdc_21] reported in alert log

Possible Causes:
Document 10040035.8  LMS gets ORA-600 [kclpdc_21] and instance crashes

Solutions:
The bug has been fixed in 11.2.0.3
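
As a rough aid for reading the fix information in Issues #1 to #5 above, the following sketch lists which of the named bugs are not yet fixed in a given release. The table only contains bugs for which the text states an explicit fixed-in version (the Issue #3 bugs described as "most ... fixed in 11.2.0.3" are left out, and Bug 12834027 is entered as 12.1). The names `FIXED_IN`, `parse_version` and `unfixed_bugs` are hypothetical, and the comparison ignores one-off patches and PSUs, so treat the result as a hint only.

```python
# Hypothetical helper, not an Oracle tool: maps each bug number named in the
# text to the first release stated to contain its fix.
FIXED_IN = {
    "8888434": (11, 2, 0, 2),   # Issue #1
    "11890804": (11, 2, 0, 3),  # Issue #1
    "11875294": (11, 2, 0, 3),  # Issue #2
    "12834027": (12, 1),        # Issue #3, stated as "will be fixed in 12.1"
    "12337941": (11, 2, 0, 3),  # Issue #4
    "10040035": (11, 2, 0, 3),  # Issue #5
}


def parse_version(text):
    """Turn a dotted version such as '11.2.0.2' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def unfixed_bugs(version_text):
    """Return the bug numbers whose stated fix release is newer than version_text."""
    version = parse_version(version_text)
    return sorted(bug for bug, fixed in FIXED_IN.items() if version < fixed)


if __name__ == "__main__":
    for release in ("11.2.0.2", "11.2.0.3"):
        print(release, unfixed_bugs(release))
```

The table is written for the 11.2 release line discussed above; running it against a 10.2 version would simply report every bug as unfixed, which says nothing about whether those bugs apply there.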

Issue for 10.2.0.5
Symptoms:
1. LMS reports ORA-600[kjccgmb:1], and the instance crashes with LMS<n>: terminating instance due to error 484
2. Instance crash with:
Received an instance abort message from instance 2 (reason 0x0)
Please check instance 2 alert and LMON trace files for detail.
LMD0: terminating instance due to error 481

Possible Causes:
1. Bug 11893577 - LMD CRASHED WITH ORA-00600 [KJCCGMB:1]
2. Bug 9577274 - 1OFF:UNABLE TO VIEW REQUEST OUTPUT AND LOG AFTER APPLYING FIX TO ISSUE IN BUG 9400041
Solutions:
1. For 10.2.0.5.0, please apply merge patch 12616787 only
2. For 10.2.0.5.5, please apply merge patch 13470618 only
At the time of writing, the patches are only available for certain platforms. It is not necessary to apply both of the above patches on any 10.2.0.5.x release.
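
The symptoms listed for each issue above can be used as search patterns when triaging an alert log. The sketch below is a minimal illustration under stated assumptions: the `SIGNATURES` table and the `classify` function are hypothetical names, the patterns are taken from the symptom lines quoted in this post, and the ORA-600 / ORA-7445 patterns also accept the zero-padded form (ORA-00600, ORA-07445) that alert logs normally print. A match only points to the section worth reading; it is not a diagnosis.

```python
import re

# Hypothetical signature table built from the symptom lines quoted above.
# Each entry pairs a compiled pattern with the section it points to.
SIGNATURES = [
    (re.compile(r"ORA-29770|LMHB \(ospid: \d+\) is terminating the instance"),
     "Issue #1"),
    (re.compile(r"terminating the instance due to error 481\b"),
     "Issue #2"),
    (re.compile(r"ORA-0*600\b.*\[(kjbmprlst:shadow|kjbrref:pkey|kjbmocvt:rid"
                r"|kjbclose_remaster:!drm|kjbrasr:pkey)\]"),
     "Issue #3"),
    (re.compile(r"ORA-0*7445\b.*\b(kcldle|kclfplz|kcbbxsv_l2|kclfprm)\b"),
     "Issue #4"),
    (re.compile(r"ORA-0*600\b.*\[kclpdc_21\]"),
     "Issue #5"),
    # The 10.2.0.5 messages quoted above read "terminating instance" (no "the").
    (re.compile(r"ORA-0*600\b.*\[kjccgmb:1\]"
                r"|terminating instance due to error 48[14]\b"),
     "Issue for 10.2.0.5"),
]


def classify(lines):
    """Return the sorted section labels whose signature appears in any line."""
    hits = set()
    for line in lines:
        for pattern, label in SIGNATURES:
            if pattern.search(line):
                hits.add(label)
    return sorted(hits)


if __name__ == "__main__":
    sample = [
        "ORA-29770: global enqueue process LMON (OSID 31216) is hung for more than 70 seconds",
        "LMHB (ospid: 31304) is terminating the instance.",
    ]
    print(classify(sample))  # ['Issue #1']
```

Telling the Issue #2 message apart from the 10.2.0.5 one relies only on the wording of the lines quoted in this post, so it is fragile; check the database version before trusting that split.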


Posted by: xjcydf909 | Posted: 2014-10-28 13:30:08
Thanks, learned something~