Analyzing RemoteServiceException: can't deliver broadcast
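1. Background
During a recent round of continuous monkey stress testing, QA reported an application stability problem. The problem is fairly typical, and it is also something we need to guard against when writing code. I could not find the root cause on my own and finally asked my leader for help; many thanks for the generous guidance (we stayed late that Friday analyzing it until after 11 pm).
The core stack trace from the crash log is shown below (sanitized; the affected device runs Android 11, i.e. Android R, API level 30):
PS: the application package name is replaced with com.my.app throughout.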
11-28 03:57:20.326 12039 12039 E AndroidRuntime: FATAL EXCEPTION: main
11-28 03:57:20.326 12039 12039 E AndroidRuntime: Process: com.my.app, PID: 12039
11-28 03:57:20.326 12039 12039 E AndroidRuntime: android.app.RemoteServiceException: can't deliver broadcast
11-28 03:57:20.326 12039 12039 E AndroidRuntime: at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2073)
11-28 03:57:20.326 12039 12039 E AndroidRuntime: at android.os.Handler.dispatchMessage(Handler.java:111)
11-28 03:57:20.326 12039 12039 E AndroidRuntime: at android.os.Looper.loop(Looper.java:250)
11-28 03:57:20.326 12039 12039 E AndroidRuntime: at android.app.ActivityThread.main(ActivityThread.java:7860)
11-28 03:57:20.326 12039 12039 E AndroidRuntime: at java.lang.reflect.Method.invoke(Native Method)
11-28 03:57:20.326 12039 12039 E AndroidRuntime: at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:591)
11-28 03:57:20.326 12039 12039 E AndroidRuntime: at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1123)
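A quick search turned up some earlier analysis, for example:
- Crashes caused by a ViewPager with nested fragments, or by other heavy dynamic registration and unregistration of broadcast receivers: https://blog.csdn.net/qq_27381325/article/details/82811079
However, the approach described there (avoiding the Binder IPC call altogether) only sidesteps the problem. It does not explain the root cause, so the issue still needs to be followed up. And if the business scenario genuinely requires a global broadcast, switching to LocalBroadcastManager is not an option.
By the same token, in-app communication does not need broadcasts at all. Something like EventBus is enough, and the community has also extended EventBus with cross-process variants.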
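To make the contrast concrete, here is a minimal sketch of in-process delivery in plain Java. It is not the LocalBroadcastManager or EventBus API; InProcessBus and the action string are hypothetical names chosen for illustration. The point is only that registration and delivery are ordinary method calls inside one process, so no Binder transaction is involved that could fail.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical in-process bus; a sketch, not a real Android or EventBus class.
final class InProcessBus {
    private final Map<String, List<Consumer<String>>> receivers = new ConcurrentHashMap<>();

    /** Registers a receiver for an action; nothing leaves the process. */
    public void register(String action, Consumer<String> receiver) {
        receivers.computeIfAbsent(action, k -> new CopyOnWriteArrayList<>()).add(receiver);
    }

    /** Removes a previously registered receiver, if present. */
    public void unregister(String action, Consumer<String> receiver) {
        List<Consumer<String>> list = receivers.get(action);
        if (list != null) {
            list.remove(receiver);
        }
    }

    /** Delivers synchronously on the caller's thread and returns how many receivers ran. */
    public int send(String action, String payload) {
        List<Consumer<String>> list =
                receivers.getOrDefault(action, Collections.<Consumer<String>>emptyList());
        for (Consumer<String> receiver : list) {
            receiver.accept(payload);
        }
        return list.size();
    }

    public static void main(String[] args) {
        InProcessBus bus = new InProcessBus();
        Consumer<String> receiver = payload -> System.out.println("received: " + payload);
        bus.register("com.my.app.ACTION_DEMO", receiver);
        bus.send("com.my.app.ACTION_DEMO", "hello");
        bus.unregister("com.my.app.ACTION_DEMO", receiver);
    }
}
```

This only covers communication within a single process; it does nothing for a scenario that really needs a global broadcast.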
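2. Locating where the exception is thrown
At first glance this looked like a system problem, so I initially handed it over to the system team. Their reply was that the broadcast might be carrying too much data, causing the framework layer to raise the exception.
2.1 Can sendBroadcast() itself throw RemoteServiceException?
Judging by its name, RemoteServiceException is presumably related to a cross-process Binder service. The error message says that a broadcast could not be delivered, so the first step is to look at the definition of sendBroadcast(). It can indeed fail with a remote exception, although, as the code shows, that exception is a RemoteException rather than a RemoteServiceException.
- Source of android.app.ContextImpl#sendBroadcast(android.content.Intent):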
@Override
public void sendBroadcast(Intent intent) {
    warnIfCallingFromSystemProcess();
    String resolvedType = intent.resolveTypeIfNeeded(getContentResolver());
    try {
        intent.prepareToLeaveProcess(this);
        ActivityManager.getService().broadcastIntentWithFeature(
                mMainThread.getApplicationThread(), getAttributionTag(), intent, resolvedType,
                null, Activity.RESULT_OK, null, null, null, AppOpsManager.OP_NONE, null, false,
                false, getUserId());
    } catch (RemoteException e) {
        // Author's note: the Binder call declares RemoteException, so sending a
        // broadcast can itself fail. However, this is not a RemoteServiceException.
        throw e.rethrowFromSystemServer();
    }
}
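android.os.RemoteException#rethrowFromSystemServer: the rethrow helper only wraps the exception in a RuntimeException; it does not turn it into a RemoteServiceException either.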
@NonNull
public RuntimeException rethrowFromSystemServer() {
    if (this instanceof DeadObjectException) {
        throw new RuntimeException(new DeadSystemException());
    } else {
        throw new RuntimeException(this);
    }
}
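Next, check whether RemoteException and the RemoteServiceException from this crash are related by inheritance. They are not, so the exception declared here probably has little to do with the crash stack. The class hierarchy is as follows:
- android.os.RemoteException -> android.util.AndroidException -> java.lang.Exception
- android.app.RemoteServiceException -> android.util.AndroidRuntimeException -> java.lang.RuntimeException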
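The missing relationship can also be checked mechanically. The sketch below does not use the Android SDK: the four nested classes are local stand-ins that, as far as I know, mirror the AOSP inheritance chains for Android 11, and HierarchyCheck is a hypothetical name. It shows that a catch (RemoteException e) block could never see a RemoteServiceException.

```java
// Stand-ins for the Android classes; only the inheritance chains are modelled.
final class HierarchyCheck {
    static class AndroidException extends Exception {
        AndroidException(String msg) { super(msg); }
    }

    /** Stand-in for android.os.RemoteException (a checked exception). */
    static class RemoteException extends AndroidException {
        RemoteException(String msg) { super(msg); }
    }

    static class AndroidRuntimeException extends RuntimeException {
        AndroidRuntimeException(String msg) { super(msg); }
    }

    /** Stand-in for android.app.RemoteServiceException (an unchecked exception). */
    static class RemoteServiceException extends AndroidRuntimeException {
        RemoteServiceException(String msg) { super(msg); }
    }

    /** True when a catch (RemoteException e) block could handle this throwable. */
    static boolean caughtAsRemoteException(Throwable t) {
        return t instanceof RemoteException;
    }

    public static void main(String[] args) {
        Throwable crash = new RemoteServiceException("can't deliver broadcast");
        System.out.println("caught as RemoteException: " + caughtAsRemoteException(crash));
        System.out.println("is a RuntimeException: " + (crash instanceof RuntimeException));
    }
}
```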
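2.2 Searching the Google source browser for the crash message
That rules out sendBroadcast() itself throwing RemoteServiceException. The next step is still to find where the exception is raised. Searching https://cs.android.com/ directly for "can't deliver broadcast" returns hits. Google's source browser is powerful and a great tool for reading AOSP code; highly recommended.
For the Android source version, pick the one that matches the affected device.
- Tip: the version selector lets you switch between Android releases, although a single release also has many minor branches.
- Search results:
A quick pass through the results shows that the error ultimately comes from BroadcastQueue.
- com.android.server.am.BroadcastQueue#performReceiveLocked
Look at the call to app.scheduleCrash() in the catch block.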
void performReceiveLocked(ProcessRecord app, IIntentReceiver receiver,
        Intent intent, int resultCode, String data, Bundle extras,
        boolean ordered, boolean sticky, int sendingUser)
        throws RemoteException {
    // Send the intent to the receiver asynchronously using one-way binder calls.
    if (app != null) {
        if (app.thread != null) {
            // If we have an app thread, do the call through that so it is
            // correctly ordered with other one-way calls.
            try {
                // Author's note: nothing else follows in this try block, so the exception
                // caught below can only come from this call. RemoteException is very
                // common in IPC calls.
                app.thread.scheduleRegisteredReceiver(receiver, intent, resultCode,
                        data, extras, ordered, sticky, sendingUser, app.getReportedProcState());
            // TODO: Uncomment this when (b/28322359) is fixed and we aren't getting
            // DeadObjectException when the process isn't actually dead.
            //} catch (DeadObjectException ex) {
            // Failed to call into the process. It's dying so just let it die and move on.
            //    throw ex;
            } catch (RemoteException ex) {
                // Author's note: the AOSP comment below matters, although I find it hard
                // to parse (corrections welcome). I read "it" as app.thread, the Binder
                // interface implemented by the app. "wedged" literally means jammed with a
                // wedge; here it is best read as stuck or blocked. So: the call into the
                // process failed because it is either dying or stuck, and it gets killed
                // gently.
                // Failed to call into the process. It's either dying or wedged. Kill it gently.
                synchronized (mService) {
                    Slog.w(TAG, "Can't deliver broadcast to " + app.processName
                            + " (pid " + app.pid + "). Crashing it.");
                    // Author's note: the message passed to scheduleCrash() matches the one
                    // we are tracing, so this is where the exception originates.
                    app.scheduleCrash("can't deliver broadcast");
                }
                // Author's note: the RemoteException is still rethrown and propagates up
                // layer by layer to the MainHandler of ActivityManagerService. Who finally
                // handles it in Looper.loop()? Not the focus here, so left open.
                throw ex;
            }
        } else {
            // Application has died. Receiver doesn't exist.
            // Author's note: if the app process were already dead when
            // performReceiveLocked() is entered, this is the log we would expect instead.
            // Killing a process takes time, so the process can also die while
            // app.thread.scheduleRegisteredReceiver() above is in flight; the AOSP
            // comments hint at that as well.
            throw new RemoteException("app.thread must not be null");
        }
    } else {
        // Author's note: app is null, yet the receiver is still invoked? It is unclear
        // to me when a null ProcessRecord is passed in; left open.
        receiver.performReceive(intent, resultCode, data, extras, ordered,
                sticky, sendingUser);
    }
}
- com.android.server.am.ProcessRecord#scheduleCrash:
void scheduleCrash(String message) {
    // Checking killedbyAm should keep it from showing the crash dialog if the process
    // was already dead for a good / normal reason.
    if (!killedByAm) {
        if (thread != null) {
            if (pid == Process.myPid()) {
                Slog.w(TAG, "scheduleCrash: trying to crash system process!");
                return;
            }
            long ident = Binder.clearCallingIdentity();
            try {
                // Author's note: thread is the app's IApplicationThread. It refers to the
                // ActivityThread.ApplicationThread object, which is created in ActivityThread.
                thread.scheduleCrash(message);
            } catch (RemoteException e) {
                // If it's already dead our work is done. If it's wedged just kill it.
                // We won't get the crash dialog or the error reporting.
                // Author's note: this catches a failure of the Binder call on thread. If
                // even the message that tells the app to crash cannot be sent, the system
                // simply kills the app process. The reported kill reason is
                // REASON_CRASH = 4: "Application process died because of an unhandled
                // exception in Java code."
                kill("scheduleCrash for '" + message + "' failed",
                        ApplicationExitInfo.REASON_CRASH, true);
            } finally {
                Binder.restoreCallingIdentity(ident);
            }
        }
    }
}
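As shown above, ProcessRecord#scheduleCrash() hands the crash to the application through ApplicationThread. IApplicationThread is the AIDL interface that the system_server process uses to talk to application processes; its implementation, ApplicationThread, is defined in ActivityThread.java.
Note: in this scenario AMS is the client and the application process is the server. IApplicationThread is implemented in the application process, while AMS runs in system_server and holds the IApplicationThread object through ProcessRecord, which is how it invokes methods in (communicates with) the application process.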
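The order of decisions across performReceiveLocked() and ProcessRecord#scheduleCrash() can be condensed into a small plain-Java model. This is a sketch under assumptions, not framework code: AppThread, Outcome, fake() and DeliveryFallbackSketch are hypothetical names, RemoteException is a local stand-in, and the real performReceiveLocked() additionally rethrows the RemoteException after scheduling the crash, which the sketch replaces with a return value.

```java
// Simplified model of "deliver, else ask the app to crash, else kill the process".
final class DeliveryFallbackSketch {
    /** Stand-in for android.os.RemoteException (checked, like the real one). */
    static final class RemoteException extends Exception {
        RemoteException(String msg) { super(msg); }
    }

    /** Stand-in for the two IApplicationThread calls used on this path. */
    interface AppThread {
        void scheduleRegisteredReceiver(String action) throws RemoteException;
        void scheduleCrash(String message) throws RemoteException;
    }

    enum Outcome { DELIVERED, CRASH_SCHEDULED, KILLED }

    /** Mirrors the order of decisions taken on the system_server side. */
    static Outcome deliver(AppThread thread, String action) {
        try {
            thread.scheduleRegisteredReceiver(action);
            return Outcome.DELIVERED;
        } catch (RemoteException deliveryFailure) {
            try {
                // The app is asked to crash itself with this exact message.
                thread.scheduleCrash("can't deliver broadcast");
                return Outcome.CRASH_SCHEDULED;
            } catch (RemoteException crashFailure) {
                // Even the crash request failed; the real code kills the process here.
                return Outcome.KILLED;
            }
        }
    }

    /** Test double: each call fails or succeeds according to the flags. */
    static AppThread fake(boolean deliveryFails, boolean crashFails) {
        return new AppThread() {
            @Override
            public void scheduleRegisteredReceiver(String action) throws RemoteException {
                if (deliveryFails) throw new RemoteException("binder transaction failed");
            }

            @Override
            public void scheduleCrash(String message) throws RemoteException {
                if (crashFails) throw new RemoteException("binder transaction failed");
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(deliver(fake(false, false), "com.my.app.ACTION_DEMO"));
        System.out.println(deliver(fake(true, false), "com.my.app.ACTION_DEMO"));
        System.out.println(deliver(fake(true, true), "com.my.app.ACTION_DEMO"));
    }
}
```

The crash in this article corresponds to the middle case: delivery failed, but the crash request still reached the app, which is why the app shows a Java stack trace instead of being killed silently.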
- android.app.ActivityThread.ApplicationThread#scheduleCrash:
This method passes the error from the system_server process in the framework layer to the application process and uses a Handler to move it onto the main thread.
public void scheduleCrash(String msg) {
    sendMessage(H.SCHEDULE_CRASH, msg);
}
- H#handleMessage()
case SCHEDULE_CRASH:
    throw new RemoteServiceException((String) msg.obj);
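The code above throws RemoteServiceException from H#handleMessage(), which Handler.dispatchMessage() calls inside Looper.loop(); that produces exactly the stack trace shown at the beginning.
The stack trace is now fully accounted for.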
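Why the exception surfaces in the main loop rather than at the Binder call can be illustrated with a plain-Java model of a message queue. This is a sketch under assumptions: Message, the SCHEDULE_CRASH value and ScheduleCrashSketch are hypothetical stand-ins, and the model is single-threaded, whereas the real scheduleCrash() arrives on a Binder thread and the real MessageQueue is thread-safe.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Single-threaded model of "enqueue on one side, throw while draining the queue".
final class ScheduleCrashSketch {
    // Arbitrary value for illustration; not taken from the framework.
    static final int SCHEDULE_CRASH = 1;

    /** Stand-in for android.app.RemoteServiceException. */
    static final class RemoteServiceException extends RuntimeException {
        RemoteServiceException(String msg) { super(msg); }
    }

    /** Stand-in for android.os.Message: a code plus a payload. */
    static final class Message {
        final int what;
        final Object obj;

        Message(int what, Object obj) {
            this.what = what;
            this.obj = obj;
        }
    }

    private final Queue<Message> queue = new ArrayDeque<>();

    /** Plays the role of ApplicationThread#scheduleCrash: it only enqueues. */
    void scheduleCrash(String msg) {
        queue.add(new Message(SCHEDULE_CRASH, msg));
    }

    /** Plays the role of H#handleMessage. */
    private void handleMessage(Message m) {
        if (m.what == SCHEDULE_CRASH) {
            throw new RemoteServiceException((String) m.obj);
        }
    }

    /** Plays the role of Looper.loop(): drains the queue on the calling thread. */
    void loop() {
        Message m;
        while ((m = queue.poll()) != null) {
            handleMessage(m);
        }
    }

    public static void main(String[] args) {
        ScheduleCrashSketch app = new ScheduleCrashSketch();
        app.scheduleCrash("can't deliver broadcast"); // returns normally
        try {
            app.loop(); // the exception only appears here
        } catch (RemoteServiceException e) {
            System.out.println("thrown from loop(): " + e.getMessage());
        }
    }
}
```

Because the throw happens while the queue is drained, the stack trace contains only the loop and the handler, and nothing from the code that originally triggered the broadcast.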
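2.3 At which step of the overall flow is the exception thrown?
So far we know where the exception is thrown, but not whether it happens in the broadcast sending flow or in the receiving flow.
Here is a UML sequence diagram of the broadcast sending flow; if you are interested, follow along in the source (it is based on Android API 30, was drawn a long time ago, and is rather rough; apologies).
The diagram shows that, for the application process (the one that crashed), this is the receiving flow. In other words, the crash occurs while the process is receiving a broadcast: the exception is raised while AMS, through IApplicationThread, is forwarding the call to the application's onReceive() method.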
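3. Root-cause analysis
3.1 Why does app.thread.scheduleRegisteredReceiver() fail?
In other words, why does an AIDL method call from AMS to IApplicationThread fail?
In essence this is the same as writing any AIDL interface, making an IPC call through it, and getting a RemoteException.
The Binder call failed, and the log does contain a large number of blocked and failed Binder calls just before the crash.
- Binder calls fail frequently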
11-28 03:56:59.642 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.645 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.648 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.657 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.668 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.676 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.689 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.707 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.719 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.720 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.721 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.738 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.743 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.746 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.767 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.773 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.782 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.788 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.805 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.809 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.812 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.829 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.833 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.844 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.847 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.851 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.854 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.859 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.872 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.876 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.880 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.881 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.882 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.883 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.901 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.904 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.908 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.922 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.924 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.929 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.930 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.949 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.953 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.971 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.974 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.977 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.994 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:56:59.997 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:57:00.000 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:57:00.007 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:57:00.018 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:57:00.022 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:57:00.027 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:57:00.030 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:57:00.033 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:57:00.043 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:57:00.052 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:57:00.055 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
11-28 03:57:00.068 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)
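- Binder calls are blocked for a long time
- A Binder leak exhausts resources and system_server misbehaves
Following the source leads to the place where that crash is thrown: frameworks/base/core/java/android/os/BinderProxy.java#set()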
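When a log contains this many failure lines, a small script helps quantify them. The following is a minimal sketch in plain Java, assuming the logcat threadtime layout seen in the excerpts above (date, time, pid, tid, level, tag); FailedBinderCounter and its methods are hypothetical names. It counts FAILED BINDER TRANSACTION lines per pid/tid and reports the largest parcel size.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for summarizing JavaBinder failures in a logcat dump.
final class FailedBinderCounter {
    // Assumed layout: "MM-DD HH:MM:SS.mmm pid tid E JavaBinder: !!! FAILED ... (parcel size = N)"
    private static final Pattern LINE = Pattern.compile(
            "^\\S+\\s+\\S+\\s+(\\d+)\\s+(\\d+)\\s+E\\s+JavaBinder:\\s+"
                    + "!!! FAILED BINDER TRANSACTION !!!\\s+\\(parcel size = (\\d+)\\)");

    /** Returns failure counts keyed by "pid/tid", in first-seen order. */
    static Map<String, Integer> countByThread(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line.trim());
            if (m.find()) {
                counts.merge(m.group(1) + "/" + m.group(2), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns the largest parcel size among failed transactions, or -1 if none matched. */
    static int maxParcelSize(List<String> lines) {
        int max = -1;
        for (String line : lines) {
            Matcher m = LINE.matcher(line.trim());
            if (m.find()) {
                max = Math.max(max, Integer.parseInt(m.group(3)));
            }
        }
        return max;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "11-28 03:56:59.642 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)",
                "11-28 03:56:59.645 599 20783 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 232)",
                "11-28 03:57:20.326 12039 12039 E AndroidRuntime: FATAL EXCEPTION: main");
        System.out.println(countByThread(sample));
        System.out.println(maxParcelSize(sample));
    }
}
```

In the excerpt above every failure comes from the same thread of pid 599 and carries a 232-byte parcel. That is far below the roughly 1 MB Binder transaction buffer, so a single oversized payload looks like an unlikely explanation for these particular failures.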
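4. Summary
- When it is unclear where an exception comes from, search for its message on https://cs.android.com/ to locate the throw site.
- Whether the problem is complex or simple, keep the step-by-step approach and direction clear: list what could possibly raise the problem -> search the source for the exception message to locate the throw site -> work out under which conditions the source takes the error path (case A? case B? case C?) -> modify code to construct those conditions and reproduce the problem -> summarize and review the conclusions -> decide how to avoid the problem in future.
- Analyze rationally and respect the facts.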