A database alert fired. The alert log showed the following errors:
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lgwr_26383.trc:
ORA-04021: timeout occurred while waiting to lock object 
LGWR (ospid: 26383): terminating the instance due to error 4021
Sun Mar 25 03:29:07 2018
System state dump requested by (instance=1, osid=26383 (LGWR)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_26321_20180325032907.trc
Instance terminated by LGWR, pid = 26383
Sun Mar 25 03:29:20 2018
Starting ORACLE instance (normal)

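First, bring the Data Guard (DG) standby back into service. A status check showed the database was in MOUNT state, so open it and restart managed recovery: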
SQL> alter database open;
SQL> alter database recover managed standby database using current logfile disconnect;
SQL> select open_mode from v$database;

OPEN_MODE
--------------------
READ ONLY WITH APPLY
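
Beyond the OPEN_MODE check, it can help to confirm that redo apply really is running again after the restart. The queries below are a minimal sketch, not part of the original post. They assume an 11g standby where the views v$managed_standby and v$dataguard_stats are available, and they are run on the standby.

```sql
-- Is the managed recovery process (MRP0) running?
-- With real-time apply, STATUS is typically APPLYING_LOG.
select process, status, thread#, sequence#
  from v$managed_standby
 where process like 'MRP%';

-- How far behind the primary is the standby?
select name, value
  from v$dataguard_stats
 where name in ('transport lag', 'apply lag');
```

If no MRP row comes back, managed recovery was not started and the `alter database recover managed standby database` statement above needs to be reissued.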

Next, look into the root cause. My Oracle Support (MOS) explains the issue as follows:
APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.3 and later
Information in this document applies to any platform.
SYMPTOMS

DR database crashed with the errors below.

Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=XX.XXX.XXX.XX)(PORT=54537))
WARNING: inbound connection timed out (ORA-3136)
Wed Jul 13 13:43:24 2016
Errors in file /u01/app/oracle/diag/rdbms/rxeprr_dr/RXEPRR1/trace/RXEPRR1_lgwr_31312.trc:
ORA-04021: timeout occurred while waiting to lock object
LGWR (ospid: 31312): terminating the instance due to error 4021
Wed Jul 13 13:43:24 2016
System state dump requested by (instance=1, osid=31312 (LGWR)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/rxeprr_dr/RXEPRR1/trace/RXEPRR1_diag_31221.trc
Wed Jul 13 13:43:25 2016
License high water mark = 318
Instance terminated by LGWR, pid = 31312
USER (ospid: 20898): terminating the instance
Instance terminated by USER, pid = 20898
Wed Jul 13 13:43:39 2016
Starting ORACLE instance (normal)

CHANGES

No changes

CAUSE

Bug 16717701 - ADG SHOULD GET THE INSTANCE PARSE LOCK WITH A TIMEOUT

Bug 11712267 - ACTIVE DATA GUARD DATABASE HUNG ON 'LIBRARY CACHE: MUTEX X' WAIT EVENT

LGWR trace file (RXEPRR1_lgwr_31312.trc)

2016-07-13 13:43:24.498
SESSION ID:(6709.1) 2016-07-13 13:43:24.498
CLIENT ID:() 2016-07-13 13:43:24.498
SERVICE NAME:(SYS$BACKGROUND) 2016-07-13 13:43:24.498
MODULE NAME:() 2016-07-13 13:43:24.498
ACTION NAME:() 2016-07-13 13:43:24.498

error 4021 detected in background process
ORA-04021: timeout occurred while waiting to lock object
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+1296<-kjzdicrshnfy()+364<-ksuitm()+1688<-ksbrdp()+4296<-opirip()+1680<-opidrv()+748<-sou2o()+88<-opimai_real()+276<-ssthrdmain()+316<-main()+316<-_start()+380
----- End of Abridged Call Stack Trace -----

SOLUTION

The issue matches Bug 11712267 and Bug 16717701.

Since both bugs match this case, try option (1) first. As per Bug 11712267:

Change cursor_sharing to FORCE on Active Data Guard (ADG).

Monitor your environment for some time.

If it crashes again, follow option (2).
Option (2):

As per bug description

LGWR can request DBINSTANCE lock in X mode without any timeout which can lead to a hang / deadlock.

Both fixes are already included in 11.2.0.4 but the fix is DISABLED by default.
To ENABLE the fix, set "_adg_parselock_timeout" to the number of centiseconds LGWR should wait before backing off and retrying the request.

The value is in centiseconds. There is no hard-and-fast rule for the value; at the default (0) the request never times out.
A value representing a few seconds seems reasonable: if LGWR has been stuck waiting for, say, 5 seconds, it is a reasonable guess that it is not going to get the lock.

The parameter just causes LGWR to abort the current attempt and retry. To play it safe, you can start with a higher value and decrease it later.
A higher value just means more sessions are blocked for longer in a deadlock situation.
500 seems reasonable, but I have no data to base it on.

There should be a statistic "ADG parselock X get attempts". If the timeout is set too small, that value would likely increase a lot because requests keep timing out and retrying.

This is a dynamic parameter.

Follow option (1):

Change cursor_sharing to FORCE on ADG.

If the issue reappears, follow option (2) as below.

Please set "_adg_parselock_timeout" to 500:

SQL> alter system set "_adg_parselock_timeout"=500 scope=both sid='*';
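
The note gives the timeout in centiseconds, so 500 corresponds to 5 seconds, and it mentions a statistic to watch in case the value is too small. The queries below are a hedged sketch of how both could be checked. They are not from the MOS note. The first assumes the statistic is exposed through v$sysstat under a name starting with "ADG parselock". The second reads a hidden parameter through the x$ksppi and x$ksppcv fixed tables, which requires a SYS connection.

```sql
-- Watch the parse-lock statistics on the standby. A rapidly growing
-- "get attempts" count suggests the timeout is too small and LGWR
-- keeps timing out and retrying.
select name, value
  from v$sysstat
 where name like 'ADG parselock%';

-- Show the current value of the hidden parameter (connect as SYS).
-- The value is in centiseconds; 0 means no timeout.
select i.ksppinm as name, v.ksppstvl as value
  from x$ksppi i, x$ksppcv v
 where i.indx = v.indx
   and i.ksppinm = '_adg_parselock_timeout';
```

Sampling the first query a few minutes apart and comparing the values gives a rough sense of the retry rate.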

In short:
1. Set the cursor_sharing parameter to FORCE.
2. Set "_adg_parselock_timeout" to 500:
SQL> alter system set "_adg_parselock_timeout"=500 scope=both sid='*';
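
The summary gives a command only for step 2. For step 1, a minimal sketch might look like the following. It is not taken from the MOS note. It assumes the standby runs from an spfile (needed for scope=both) and that the change should reach every RAC instance (sid='*').

```sql
-- Step 1: set cursor_sharing to FORCE on the ADG standby.
alter system set cursor_sharing=force scope=both sid='*';

-- Confirm the value now in effect.
select name, value
  from v$parameter
 where name = 'cursor_sharing';
```

As the note advises, monitor the standby for a while after this change and move on to step 2 only if the crash recurs.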

Reposted from https://blog.51cto.com/lyzbg/2090812
