Initialization

Watchdog is initialized in the SystemServer process, where it runs on its own thread:

    private void startBootstrapServices(@NonNull TimingsTraceAndSlog t) {
        // Start the watchdog as early as possible so we can crash the system server
        // if we deadlock during early boot
        t.traceBegin("StartWatchdog");
        final Watchdog watchdog = Watchdog.getInstance();
        watchdog.start();
        t.traceEnd();
    }

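The Watchdog class does not declare that it implements the Runnable interface, but it does define a run() method. The member variable private final Thread mThread; is initialized in the constructor with a reference to that method, and watchdog.start(); starts the thread.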
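The pattern described above can be reproduced in plain Java. The following is a minimal sketch, not the Watchdog source: MiniWatchdog and its members are hypothetical names, and the only point it illustrates is that a method reference such as this::run can be passed where a Runnable is expected, because run() takes no arguments and returns void.

```java
// Hypothetical stand-in for Watchdog: the class does not implement Runnable,
// yet its run() method becomes the body of a dedicated thread.
final class MiniWatchdog {
    private final Thread mThread;
    volatile boolean ran = false;

    MiniWatchdog() {
        // this::run is converted to a Runnable by the compiler.
        mThread = new Thread(this::run, "watchdog");
    }

    void start() {
        mThread.start();
    }

    // Waits for the thread to finish; restores the interrupt flag if interrupted.
    void join() {
        try {
            mThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    String threadName() {
        return mThread.getName();
    }

    private void run() {
        ran = true; // a real watchdog would loop here
    }
}
```

Calling start() and then join() on a MiniWatchdog leaves ran set to true, which confirms that run() executed on the thread named "watchdog".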

The constructor registers HandlerCheckers for "foreground thread", "main thread", "ui thread", "i/o thread", "display thread", "animation thread", and "surface animation thread", and adds a BinderThreadMonitor.

    private Watchdog() {
        mThread = new Thread(this::run, "watchdog");
        // Initialize handler checkers for each common thread we want to check. Note
        // that we are not currently checking the background thread, since it can
        // potentially hold longer running operations with no guarantees about the timeliness
        // of operations there.

        // The shared foreground thread is the main checker. It is where we
        // will also dispatch monitor checks and do other work.
        mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
                "foreground thread", DEFAULT_TIMEOUT);
        mHandlerCheckers.add(mMonitorChecker);
        // Add checker for main thread. We only do a quick check since there
        // can be UI running on the thread.
        mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
                "main thread", DEFAULT_TIMEOUT));
        // Add checker for shared UI thread.
        mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
                "ui thread", DEFAULT_TIMEOUT));
        // And also check IO thread.
        mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
                "i/o thread", DEFAULT_TIMEOUT));
        // And the display thread.
        mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
                "display thread", DEFAULT_TIMEOUT));
        // And the animation thread.
        mHandlerCheckers.add(new HandlerChecker(AnimationThread.getHandler(),
                "animation thread", DEFAULT_TIMEOUT));
        // And the surface animation thread.
        mHandlerCheckers.add(new HandlerChecker(SurfaceAnimationThread.getHandler(),
                "surface animation thread", DEFAULT_TIMEOUT));

        // Initialize monitor for Binder threads.
        addMonitor(new BinderThreadMonitor());

        mInterestingJavaPids.add(Process.myPid());

        // See the notes on DEFAULT_TIMEOUT.
        assert DB ||
                DEFAULT_TIMEOUT > ZygoteConnectionConstants.WRAPPED_PID_TIMEOUT_MILLIS;

        mTraceErrorLogger = new TraceErrorLogger();
    }

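AMS, PKMS, WMS, and similar services register themselves with Watchdog during construction or initialization, either as monitors or as threads to be checked: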

    public ActivityManagerService() {
        Watchdog.getInstance().addMonitor(this);
        Watchdog.getInstance().addThread(mHandler);
    }

    public PackageManagerService() {
        Watchdog.getInstance().addThread(mHandler, WATCHDOG_TIMEOUT);
    }

    public class WindowManagerService {
        public void onInitReady() {
            // Add ourself to the Watchdog monitors.
            Watchdog.getInstance().addMonitor(this);
        }
    }

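Workflow

Running on its own thread, Watchdog loops inside run(): it checks the state of every HandlerChecker, dumps diagnostic logs when a checker is stuck, and, when necessary, sends signal 9 to the current process (system_server) to kill it.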
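Before the full listing, here is a minimal plain-Java sketch of the periodic wait that sits at the center of such a loop. It is an illustration under stated assumptions: IntervalWaitSketch and waitInterval are hypothetical names, and System.nanoTime() stands in for Android's SystemClock.uptimeMillis(). The point is that Object.wait(timeout) may return early, so the remaining time is recomputed after every wake-up.

```java
// Hypothetical helper: waits for a full interval even if wait() returns early.
final class IntervalWaitSketch {
    private final Object lock = new Object();

    /** Blocks for at least intervalMs and returns the measured wait in milliseconds. */
    long waitInterval(long intervalMs) {
        final long start = System.nanoTime();
        long remaining = intervalMs;
        synchronized (lock) {
            while (remaining > 0) {
                try {
                    lock.wait(remaining);
                } catch (InterruptedException e) {
                    // Keep waiting: the loop below recomputes the remaining time.
                }
                // Recompute from the start time instead of trusting wait().
                remaining = intervalMs - (System.nanoTime() - start) / 1_000_000L;
            }
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }
}
```

A call such as new IntervalWaitSketch().waitInterval(50) returns a value of at least 50, because the loop only exits once the full interval has elapsed.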

    // Abridged excerpt: local variable declarations and some statements are omitted.
    public void run() {
        while (true) {
            synchronized (mLock) {
                // 1. Schedule a check on every HandlerChecker
                for (int i = 0; i < mHandlerCheckers.size(); i++) {
                    HandlerChecker hc = mHandlerCheckers.get(i);
                    hc.scheduleCheckLocked();
                }

                // 2. mLock.wait(timeout) puts the current thread into the waiting state;
                //    the wait time is timeout = CHECK_INTERVAL: 30s
                // NOTE: We use uptimeMillis() here because we do not want to increment the time we
                // wait while asleep. If the device is asleep then the thing that we are waiting
                // to timeout on is asleep as well and won't have a chance to run, causing a false
                // positive on when to kill things.
                long start = SystemClock.uptimeMillis();
                while (timeout > 0) {
                    Log.d(TAG, "run: timeout = " + timeout);
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    try {
                        mLock.wait(timeout);
                        // Note: mHandlerCheckers and mMonitorChecker may have changed after waiting
                    } catch (InterruptedException e) {
                        Log.wtf(TAG, e);
                    }
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
                }

                // 3. Evaluate the completion state of the HandlerCheckers
                final int waitState = evaluateCheckerCompletionLocked();
                if (waitState == COMPLETED) {
                    waitedHalf = false;
                    continue;
                } else if (waitState == WAITING) {
                    continue;
                } else if (waitState == WAITED_HALF) {
                    if (!waitedHalf) {
                        waitedHalf = true;
                        pids = new ArrayList<>(mInterestingJavaPids);
                        doWaitedHalfDump = true;
                    } else {
                        continue;
                    }
                } else {
                    // At least one HandlerChecker has timed out!
                    // something is overdue!
                    blockedCheckers = getBlockedCheckersLocked();
                    subject = describeCheckersLocked(blockedCheckers);
                    allowRestart = mAllowRestart;
                    pids = new ArrayList<>(mInterestingJavaPids);
                }
            } // END synchronized (mLock)

            // 4. Dump stack traces to the ANR directory: /data/anr
            final File finalStack = ActivityManagerService.dumpStackTraces(
                    pids, processCpuTracker, new SparseArray<>(), nativePids,
                    tracesFileException, subject);

            // 5. Write the error report to dropbox: /data/system/dropbox/
            Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
                public void run() {
                    if (mActivity != null) {
                        mActivity.addErrorToDropBox(
                                "watchdog", null, "system_server", null, null, null,
                                null, report.toString(), finalStack, null, null, null,
                                errorId);
                    }
                }
            };
            dropboxThread.start();
            try {
                dropboxThread.join(2000);  // wait up to 2 seconds for it to return.
            } catch (InterruptedException ignored) {}

            // 6. Dump blocked-thread information to the kernel log, then trigger a
            //    kernel panic; both go through /proc/sysrq-trigger
            if (crashOnWatchdog) {
                // Trigger the kernel to dump all blocked threads, and backtraces
                // on all CPUs to the kernel log
                Slog.e(TAG, "Triggering SysRq for system_server watchdog");
                doSysRq('w');
                doSysRq('l');

                // wait until the above blocked threads be dumped into kernel log
                SystemClock.sleep(3000);

                doSysRq('c');
            }

            // 7. Report the current state to the ActivityController
            IActivityController controller;
            synchronized (mLock) {
                controller = mController;
            }
            if (controller != null) {
                Slog.i(TAG, "Reporting stuck state to activity controller");
                try {
                    Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
                    // 1 = keep waiting, -1 = kill system
                    int res = controller.systemNotResponding(subject);
                    if (res >= 0) {
                        Slog.i(TAG, "Activity controller requested to coninue to wait");
                        waitedHalf = false;
                        continue;
                    }
                } catch (RemoteException e) {
                }
            }

            // 8. Decide whether to kill the current process (system_server)
            //    with Process.killProcess(Process.myPid())
            // Only kill the process if the debugger is not attached.
            if (Debug.isDebuggerConnected()) {
                debuggerWasConnected = 2;
            }
            if (debuggerWasConnected >= 2) {
                Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
            } else if (debuggerWasConnected > 0) {
                Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
            } else if (!allowRestart) {
                Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
            } else {
                Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
                WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
                Slog.w(TAG, "*** GOODBYE!");
                if (SmartTraceUtils.isPerfettoDumpEnabled()
                        && dueTime > SystemClock.uptimeMillis()) {
                    long timeDelta = dueTime - SystemClock.uptimeMillis();
                    // wait until perfetto log to be dumped completely
                    Slog.i(TAG, "Sleep " + timeDelta
                            + " ms to make sure perfetto log to be dumped completely");
                    SystemClock.sleep(timeDelta);
                }
                if (!Build.IS_USER && isCrashLoopFound()
                        && !WatchdogProperties.should_ignore_fatal_count().orElse(false)) {
                    breakCrashLoop();
                }
                Process.killProcess(Process.myPid());
                System.exit(10);
            }

            waitedHalf = false;
        }
    }
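The branches in step 3 depend on the state returned by evaluateCheckerCompletionLocked(), whose body is not shown in this article. The sketch below is one plausible way to derive such a state, assuming the thresholds sit at half of the timeout and at the full timeout, as the name WAITED_HALF suggests. CompletionStateSketch, stateOf, worstOf, and the numeric constant values are hypothetical and chosen only so that a larger number means a worse state.

```java
// Hypothetical sketch of how a per-checker state and an overall state could be derived.
final class CompletionStateSketch {
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    /** State of one checker, given how long its pending check has been outstanding. */
    static int stateOf(boolean completed, long elapsedMs, long waitMaxMs) {
        if (completed) {
            return COMPLETED;
        }
        if (elapsedMs < waitMaxMs / 2) {
            return WAITING;
        }
        if (elapsedMs < waitMaxMs) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    /** Overall state: the worst state among all checkers. */
    static int worstOf(int... states) {
        int worst = COMPLETED;
        for (int s : states) {
            worst = Math.max(worst, s);
        }
        return worst;
    }
}
```

With an assumed timeout of 60000 ms, stateOf(false, 10000, 60000) is WAITING, stateOf(false, 30000, 60000) is WAITED_HALF, and stateOf(false, 60000, 60000) is OVERDUE. Taking the worst state across checkers means a single stuck thread is enough to move the whole loop into the dump-and-kill path.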

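Detection mechanism

During initialization, Watchdog adds a number of important threads to its HandlerChecker list and then checks each monitored target through its HandlerChecker.

HandlerCheckers fall roughly into two categories: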

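  • Monitor Checker: checks Monitor objects for possible deadlocks. The monitor checks are dispatched on FgThread, and core system services such as AMS and WMS are Monitor objects.
  • Looper Checker: checks whether a thread's message queue has been busy for a long time. Watchdog's own message queue and the global Ui, Io, and Display message queues are all checked. In addition, the message queues of some important threads, such as those of AMS and PKMS, are added to the Looper Checker when the corresponding objects are initialized.

    public void addMonitor(Monitor monitor) {
        synchronized (mLock) {
            mMonitorChecker.addMonitorLocked(monitor);
        }
    }

    public void addThread(Handler thread) {
        addThread(thread, DEFAULT_TIMEOUT);
    }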

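HandlerChecker is an inner class of Watchdog and implements the Runnable interface.

As the workflow above shows, Watchdog checks thread state mainly through HandlerChecker's scheduleCheckLocked() method.

At the start of scheduleCheckLocked(), provided the previous check has completed, newly registered monitors are moved from mMonitorQueue into the member variable mMonitors. mMonitors holds all Monitor objects registered with the checker, such as AMS and WMS mentioned above.

Two lines in scheduleCheckLocked() deserve particular attention:

  1. mHandler.getLooper().getQueue().isPolling() tells whether the Looper is still alive rather than stuck. For a Looper Checker, it reports whether the thread's message queue is idle, that is, waiting for new messages. If the checker has no monitors and the queue is polling, the thread is considered healthy and no check is posted. If the monitored queue never becomes idle, a message may have been blocking it for a long time.

  2. mHandler.postAtFrontOfQueue(this); places the HandlerChecker at the front of the message queue so that it runs first. postAtFrontOfQueue(Runnable r) takes a Runnable, so the run() method implemented by HandlerChecker executes on the monitored thread that owns mHandler, where it calls monitor() on each registered Monitor. A monitor() implementation is usually trivial: it just acquires the object's own lock. If that lock is already held, monitor() blocks until the lock is released; if it is still blocked when the timeout expires, the thread has most likely deadlocked.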

    public final class HandlerChecker implements Runnable {
        public void scheduleCheckLocked() {
            if (mCompleted) {
                // Safe to update monitors in queue, Handler is not in the middle of work
                mMonitors.addAll(mMonitorQueue);
                mMonitorQueue.clear();
            }
            if ((mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling())
                    || (mPauseCount > 0)) {
                // Don't schedule until after resume OR
                // If the target looper has recently been polling, then
                // there is no reason to enqueue our checker on it since that
                // is as good as it not being deadlocked. This avoid having
                // to do a context switch to check the thread. Note that we
                // only do this if we have no monitors since those would need to
                // be executed at this point.
                mCompleted = true;
                return;
            }
            if (!mCompleted) {
                // we already have a check in flight, so no need
                return;
            }

            mCompleted = false;
            mCurrentMonitor = null;
            mStartTime = SystemClock.uptimeMillis();
            mHandler.postAtFrontOfQueue(this);
        }

        @Override
        public void run() {
            // Once we get here, we ensure that mMonitors does not change even if we call
            // #addMonitorLocked because we first add the new monitors to mMonitorQueue and
            // move them to mMonitors on the next schedule when mCompleted is true, at which
            // point we have completed execution of this method.
            final int size = mMonitors.size();
            for (int i = 0; i < size; i++) {
                synchronized (mLock) {
                    mCurrentMonitor = mMonitors.get(i);
                }
                mCurrentMonitor.monitor();
            }

            synchronized (mLock) {
                mCompleted = true;
                mCurrentMonitor = null;
            }
        }
    }
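To make the effect of postAtFrontOfQueue() concrete, the following sketch models a message queue with a plain java.util.Deque. It is not Android's Handler or MessageQueue: FrontOfQueueSketch, post, postAtFrontOfQueue, drain, and demo are hypothetical names used only to show that a checker posted at the front runs before messages that were queued earlier.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, single-threaded model of a message queue.
final class FrontOfQueueSketch {
    private final Deque<Runnable> queue = new ArrayDeque<>();

    // Ordinary messages are appended to the tail.
    void post(Runnable r) {
        queue.addLast(r);
    }

    // The checker jumps ahead of everything that is already queued.
    void postAtFrontOfQueue(Runnable r) {
        queue.addFirst(r);
    }

    // Runs every queued task in order, like one pass of a looper.
    void drain() {
        Runnable r;
        while ((r = queue.pollFirst()) != null) {
            r.run();
        }
    }

    /** Queues two messages, then a checker at the front, and records the run order. */
    static List<String> demo() {
        List<String> order = new ArrayList<>();
        FrontOfQueueSketch q = new FrontOfQueueSketch();
        q.post(() -> order.add("message A"));
        q.post(() -> order.add("message B"));
        q.postAtFrontOfQueue(() -> order.add("checker"));
        q.drain();
        return order;
    }
}
```

demo() returns the order checker, message A, message B. Jumping the queue matters for the watchdog: the check measures whether the thread can run anything at all, not how long its backlog is, although a message that is already executing still has to finish before the checker gets its turn.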
Monitor

Monitor is an inner interface of Watchdog:

    public class Watchdog {
        public interface Monitor {
            void monitor();
        }
    }

The AMS implementation of monitor():

    public class ActivityManagerService extends IActivityManager.Stub
            implements Watchdog.Monitor, BatteryStatsImpl.BatteryCallback, ActivityManagerGlobalLock {
        /** In this method we try to acquire our lock to make sure that we have not deadlocked */
        public void monitor() {
            synchronized (this) { }
        }
    }
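The empty synchronized block is the whole trick: monitor() can only return once the lock is free. The sketch below reproduces the idea in plain Java under stated assumptions. MonitorDeadlockSketch, FakeService, check, and demo are hypothetical names, a CountDownLatch with a timeout stands in for the watchdog's timed wait, and the 200 ms and 2000 ms timeouts are arbitrary.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class MonitorDeadlockSketch {
    interface Monitor {
        void monitor();
    }

    // Hypothetical service whose monitor() only tries to take its own lock.
    static final class FakeService implements Monitor {
        @Override
        public void monitor() {
            synchronized (this) { }
        }
    }

    /** Runs monitor() on a helper thread; true if it finished within timeoutMs. */
    static boolean check(Monitor m, long timeoutMs) {
        CountDownLatch done = new CountDownLatch(1);
        Thread t = new Thread(() -> {
            m.monitor();
            done.countDown();
        }, "checker");
        t.setDaemon(true);
        t.start();
        try {
            return done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Returns { result while the lock is held, result after it is released }. */
    static boolean[] demo() {
        FakeService service = new FakeService();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (service) {
                held.countDown();
                try {
                    release.await(); // simulate a thread stuck while holding the lock
                } catch (InterruptedException ignored) {
                }
            }
        }, "holder");
        holder.setDaemon(true);
        holder.start();
        try {
            held.await();
            boolean whileHeld = check(service, 200);
            release.countDown();
            holder.join();
            boolean afterRelease = check(service, 2000);
            return new boolean[] { whileHeld, afterRelease };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new boolean[] { false, false };
        }
    }
}
```

While the holder thread keeps the lock, check() times out and reports false; once the lock is released, the same check completes and reports true. This mirrors how a monitor() call that never returns is what eventually marks a HandlerChecker as overdue.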
