Android Watchdog Mechanism
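Early phone platforms usually added a hardware watchdog (WatchDog) to the device. The software had to write a value to the watchdog hardware at regular intervals to show that it had not failed (commonly called "feeding the dog"); otherwise, once the allotted time had passed, the watchdog restarted the device. Roughly, the watchdog counter is started once the system is running and counts automatically. If the watchdog is not cleared within a certain time, the counter overflows, which raises a watchdog interrupt and resets the system.
A phone is in effect a vastly more powerful microcontroller: it runs many times faster, has many times more storage, and runs a large number of threads, with all kinds of hardware and software working together. Android's SystemServer is a very complex process that hosts more than fifty services, which makes it the process most likely to run into trouble, so the threads running inside SystemServer need to be monitored.
If the hardware-watchdog approach were used, however, with every thread feeding the dog at intervals, it would waste CPU time and complicate program design. Android therefore provides the Watchdog class as a software watchdog that monitors the threads in SystemServer. When it detects a problem, Watchdog kills the SystemServer process.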
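What Watchdog does
Watchdog mainly detects two kinds of problems:
- Blocked in monitor: the monitor() implementation of a monitored object is blocked
- Blocked in handler: the message queue of a monitored thread is not processing messages
How to tell whether a thread is stuck
MessageQueue.isPolling
Monitor.monitor
---
HandlerChecker checks whether a looper is blocked
monitor checks for deadlock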
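How Watchdog works
Figure: how Watchdog works https://img-blog.csdnimg.cn/img_convert/e5c8133c7f86583251c775de4ceae9c0.jpeg
Starting Watchdog
Watchdog is initialized and started inside the SystemServer process. The run method of SystemServer registers and starts the various Android services, and this includes initializing and starting Watchdog. The code is as follows: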
final Watchdog watchdog = Watchdog.getInstance();//line: 864
watchdog.init(context, mActivityManagerService);
In the second half of startOtherServices() in SystemServer, Watchdog is started inside the callback passed to the systemReady interface of AMS (ActivityManagerService):
Watchdog.getInstance().start();//line: 1852
The Watchdog constructor
super("watchdog");
// Initialize a checker for each thread we want to watch.
// The background thread is not checked here.
// The shared foreground thread is the main checker; the monitors that
// check other state are also dispatched on it.
mMonitorChecker = new HandlerChecker(FgThread.getHandler(), "foreground thread", DEFAULT_TIMEOUT);
mHandlerCheckers.add(mMonitorChecker);
// Add a checker for the main thread.
mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()), "main thread", DEFAULT_TIMEOUT));
// Add a checker for the shared UI thread.
mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(), "ui thread", DEFAULT_TIMEOUT));
// Add a checker for the shared IO thread.
mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(), "i/o thread", DEFAULT_TIMEOUT));
// Add a checker for the shared display thread.
mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(), "display thread", DEFAULT_TIMEOUT));
// Initialize the monitor for binder threads.
addMonitor(new BinderThreadMonitor());
mOpenFdMonitor = OpenFdMonitor.create();
// See the notes on DEFAULT_TIMEOUT.
assert DB || DEFAULT_TIMEOUT > ZygoteConnectionConstants.WRAPPED_PID_TIMEOUT_MILLIS;
The Watchdog constructor creates several HandlerChecker objects and adds them to its own list of checkers.
Handlers monitored by Watchdog
Thread name | Handler | Description | Timeout |
---|---|---|---|
foreground thread | FgThread.getHandler() | Foreground thread | 60s |
main thread | new Handler(Looper.getMainLooper()) | Main thread | 60s |
ui thread | UiThread.getHandler() | UI thread | 60s |
i/o thread | IoThread.getHandler() | IO thread | 60s |
display thread | DisplayThread.getHandler() | Display thread | 60s |
PackageManager | addThread(mHandler, time) | Thread added by PackageManagerService itself | 10min |
PackageManager | addThread(mHandler, time) | Thread added by PermissionManagerService itself | 60s |
PowerManagerService | addThread(mHandler, time) | Thread added by PowerManagerService itself | 60s |
ActivityManagerService | addThread(mHandler, time) | Thread added by ActivityManagerService itself | 60s |
Monitors registered with Watchdog
Monitor | Description | Timeout |
---|---|---|
BinderThreadMonitor | Checks the binder threads | 60s |
OpenFdMonitor | Checks open file descriptors | 60s |
TvRemoteService | addMonitor(this); checks mLock | |
ActivityManagerService | addMonitor(this); checks this | |
MediaProjectionManagerService | addMonitor(this); checks mLock | |
MediaRouterService | addMonitor(this); checks mLock | |
MediaSessionService | addMonitor(this); checks mLock | |
InputManagerService | addMonitor(this); checks mInputFilterLock and calls nativeMonitor(mPtr) | |
PowerManagerService | addMonitor(this); checks mLock | |
NetworkManagementService | addMonitor(this); checks mConnector | |
StorageManagerService | addMonitor(this); checks mVold | |
WindowManagerService | addMonitor(this); checks mWindowMap | |
HandlerChecker
public final class HandlerChecker implements Runnable
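HandlerChecker checks the state of a handler thread and schedules the monitor callbacks. It works by looking at the MessageQueue of each Handler's looper to judge whether that thread is stuck. The threads in question all run inside the SystemServer process.
Watchdog builds many HandlerCheckers, which fall into two groups:
- Monitor Checker: checks for deadlocks that may occur on Monitor objects. Core system services such as AMS, PKMS and WMS are all Monitor objects.
- Looper Checker: checks whether a thread's message queue has been busy for a long time. The queues registered by Watchdog itself (foreground, main, ui, io and display) are all checked. The message queues of some other important threads, such as those of AMS and PKMS, are added as Looper Checkers when the corresponding objects are initialized.
The two groups focus on different things:
- Monitor Checker warns us not to hold the object lock of a core system service for a long time, because that blocks many other calls
- Looper Checker warns us not to occupy a message queue for a long time, because the other messages then never get handled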
The HandlerChecker constructor
public final class HandlerChecker implements Runnable {
    private final Handler mHandler;
    private final String mName;
    private final long mWaitMax;
    private final ArrayList<Monitor> mMonitors = new ArrayList<Monitor>();
    private boolean mCompleted;
    private Monitor mCurrentMonitor;
    private long mStartTime;

    HandlerChecker(Handler handler, String name, long waitMaxMillis) {
        mHandler = handler;        // handler of the monitored thread
        mName = name;              // checker name
        mWaitMax = waitMaxMillis;  // timeout
        mCompleted = true;         // completion flag, initially true
    }
}
HandlerChecker::scheduleCheckLocked
Watchdog's run() method calls this method. It is the core method of HandlerChecker and schedules the check that tells whether the checker's thread is blocked.
public void scheduleCheckLocked() {
    if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
        // If the target looper has recently been polling, then
        // there is no reason to enqueue our checker on it since that
        // is as good as it not being deadlocked.  This avoid having
        // to do a context switch to check the thread.  Note that we
        // only do this if mCheckReboot is false and we have no
        // monitors, since those would need to be executed at this point.
        mCompleted = true;
        return;
    }
    if (!mCompleted) {
        // we already have a check in flight, so no need
        return;
    }
    mCompleted = false;
    mCurrentMonitor = null;
    mStartTime = SystemClock.uptimeMillis();
    mHandler.postAtFrontOfQueue(this);
}
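- isPolling() is the key call for deciding whether the thread's Looper is healthy. If it returns true, the looper is currently polling for events, i.e. the thread is running normally; when the checker also has no monitors, mCompleted is set to true and the method returns without posting anything
- If mCompleted is false, a check is already in flight, so nothing more is scheduled
- `mHandler.postAtFrontOfQueue(this)` posts the checker itself to the front of the queue; its run() method executes afterwards
scheduleCheckLocked therefore mostly matters for mMonitorChecker. Other HandlerCheckers that have no registered monitors and whose looper is polling are not checked at all. UiThread, for example, is normally in the polling state.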
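The post-to-front idea can be tried outside Android. The following is a minimal sketch in plain Java, not the AOSP code: MiniLooper, MiniChecker and ScheduleCheckDemo are hypothetical names, a LinkedBlockingDeque stands in for the MessageQueue, and the isPolling() fast path is left out. It shows that a checker posted to the front of the queue completes quickly on a healthy thread and stays incomplete while the thread is stuck in another task.

```java
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a looper thread: runs tasks taken from the head of a deque. */
class MiniLooper extends Thread {
    final BlockingDeque<Runnable> queue = new LinkedBlockingDeque<>();

    MiniLooper() {
        super("mini-looper");
        setDaemon(true);
    }

    @Override
    public void run() {
        try {
            while (true) {
                queue.takeFirst().run();
            }
        } catch (InterruptedException e) {
            // interrupted: stop looping
        }
    }
}

/** Simplified checker: posts itself to the front of the queue and flags completion when run. */
class MiniChecker implements Runnable {
    private final MiniLooper looper;
    private boolean completed = true;

    MiniChecker(MiniLooper looper) {
        this.looper = looper;
    }

    synchronized void scheduleCheck() {
        if (!completed) {
            return; // a check is already in flight
        }
        completed = false;
        looper.queue.addFirst(this); // front of the queue, like postAtFrontOfQueue
    }

    @Override
    public synchronized void run() {
        completed = true;
        notifyAll();
    }

    /** Returns true if the in-flight check finished within timeoutMs. */
    synchronized boolean awaitCompleted(long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        try {
            while (!completed) {
                long leftNanos = deadline - System.nanoTime();
                if (leftNanos <= 0) {
                    return false;
                }
                TimeUnit.NANOSECONDS.timedWait(this, leftNanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

class ScheduleCheckDemo {
    /** Returns {healthy, whileBlocked, afterRelease}. */
    static boolean[] runDemo() {
        MiniLooper looper = new MiniLooper();
        looper.start();
        MiniChecker checker = new MiniChecker(looper);

        checker.scheduleCheck();
        boolean healthy = checker.awaitCompleted(2000);

        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        looper.queue.addLast(() -> {
            entered.countDown();
            try {
                release.await(); // simulate a handler stuck in a callback
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            entered.await();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        checker.scheduleCheck();
        boolean whileBlocked = checker.awaitCompleted(200);
        release.countDown();
        boolean afterRelease = checker.awaitCompleted(2000);
        return new boolean[] {healthy, whileBlocked, afterRelease};
    }

    public static void main(String[] args) {
        boolean[] r = runDemo();
        System.out.println("healthy=" + r[0] + " whileBlocked=" + r[1] + " afterRelease=" + r[2]);
    }
}
```

Posting to the front matters: the checker is not delayed by messages already waiting in the queue, so an incomplete check points at the task currently running rather than at a long backlog.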
MessageQueue::isPolling
mHandler.getLooper().getQueue().isPolling()
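This method tells whether the thread is stuck.
true: the looper is currently polling for events.
The method is implemented in MessageQueue. Its doc comment explains that it returns whether the looper's thread is currently polling for more work to do, and that this is a good signal that the loop is still alive.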
frameworks/base/core/java/android/os/MessageQueue.java
/**
 * Returns whether this looper's thread is currently polling for more work to do.
 * This is a good signal that the loop is still alive rather than being stuck
 * handling a callback.  Note that this method is intrinsically racy, since the
 * state of the loop can change before you get the result back.
 *
 * <p>This method is safe to call from any thread.
 *
 * @return True if the looper is currently polling for events.
 * @hide
 */
public boolean isPolling() {
    synchronized (this) {
        return isPollingLocked();
    }
}
HandlerChecker::run
@Override
public void run() {
    final int size = mMonitors.size();
    for (int i = 0 ; i < size ; i++) {
        synchronized (Watchdog.this) {
            mCurrentMonitor = mMonitors.get(i);
        }
        mCurrentMonitor.monitor();
    }
    synchronized (Watchdog.this) {
        mCompleted = true;
        mCurrentMonitor = null;
    }
}
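- run() walks the checker's own monitors and calls monitor() on each one. If a monitor blocks, mCompleted stays false.
- The for loop detects whether any registered monitor is blocked, and only mMonitorChecker actually enters it
- The other HandlerCheckers have an empty mMonitors list, so they skip the loop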
HandlerChecker::getCompletionStateLocked
public int getCompletionStateLocked() {
    if (mCompleted) {
        return COMPLETED;
    } else {
        long latency = SystemClock.uptimeMillis() - mStartTime;
        if (latency < mWaitMax/2) {
            return WAITING;
        } else if (latency < mWaitMax) {
            return WAITED_HALF;
        }
    }
    return OVERDUE;
}
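- Returns the completion state. mStartTime is first set in scheduleCheckLocked
- When Watchdog calls this while the check has not completed, the else branch computes the elapsed time and returns the matching state code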
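To make the thresholds concrete, here is a small sketch that restates the same branching as a pure function. CompletionStateDemo and completionState are illustrative names, and the numeric values of WAITING, WAITED_HALF and OVERDUE are assumed (the article only shows COMPLETED = 0); what matters is that they increase in this order.

```java
class CompletionStateDemo {
    // Assumed values; only their ordering is relied upon.
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    /** Same branching as getCompletionStateLocked, with the inputs passed in explicitly. */
    static int completionState(boolean completed, long latencyMs, long waitMaxMs) {
        if (completed) {
            return COMPLETED;
        }
        if (latencyMs < waitMaxMs / 2) {
            return WAITING;
        }
        if (latencyMs < waitMaxMs) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    public static void main(String[] args) {
        long waitMax = 60_000; // the 60 s default timeout
        for (long latency : new long[] {0, 29_999, 30_000, 59_999, 60_000}) {
            System.out.println(latency + " ms -> state " + completionState(false, latency, waitMax));
        }
    }
}
```

With a 10-minute timeout, such as the one PackageManagerService registers, the same function only reports WAITED_HALF after 5 minutes.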
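Thread states
State | Description |
---|---|
COMPLETED | The message has been handled; the thread is not blocked |
WAITING | The message has been pending for 0 to 29 seconds; keep going |
WAITED_HALF | The message has been pending for 30 to 59 seconds; the thread may be blocked, so the current stack traces are saved through AMS and monitoring continues |
OVERDUE | The message has been pending for 60 seconds or more; prepare to kill the current process. Reaching this state means the 60-second timeout has expired, and everything that follows deals with the timeout |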
HandlerThread inheritance
The Handlers passed to the HandlerCheckers here all belong to HandlerThread threads created for that purpose.
java.lang.Object
 ↳ Thread implements Runnable
   ↳ HandlerThread extends Thread
     ↳ ServiceThread extends HandlerThread
       ↳ FgThread extends ServiceThread
Threads behind the initial HandlerCheckers
public ServiceThread(String name, int priority, boolean allowIo)

private FgThread() {
    super("android.fg", android.os.Process.THREAD_PRIORITY_DEFAULT, true /*allowIo*/);
}

private UiThread() {
    super("android.ui", Process.THREAD_PRIORITY_FOREGROUND, false /*allowIo*/);
}

private IoThread() {
    super("android.io", android.os.Process.THREAD_PRIORITY_DEFAULT, true /*allowIo*/);
}

private DisplayThread() {
    // DisplayThread runs important work, but it is less important than the work
    // running in AnimationThread, so the priority is set one step lower.
    super("android.display", Process.THREAD_PRIORITY_DISPLAY + 1, false /*allowIo*/);
}
Android thread priorities
frameworks/base/core/java/android/os/Process.java
public static final int THREAD_PRIORITY_DEFAULT = 0; // default thread priority
public static final int THREAD_PRIORITY_LOWEST = 19; // lowest thread priority
public static final int THREAD_PRIORITY_BACKGROUND = 10; // recommended for background threads
public static final int THREAD_PRIORITY_FOREGROUND = -2; // threads of the UI the user is interacting with; code cannot set this priority, the system adjusts threads to it as needed
public static final int THREAD_PRIORITY_DISPLAY = -4; // also for UI-related work, but takes precedence over THREAD_PRIORITY_FOREGROUND
public static final int THREAD_PRIORITY_URGENT_DISPLAY = -8; // highest display priority, for drawing the screen and retrieving input events
public static final int THREAD_PRIORITY_AUDIO = -16; // standard priority for audio threads
public static final int THREAD_PRIORITY_URGENT_AUDIO = -19; // highest priority for audio threads, above THREAD_PRIORITY_AUDIO
public static final int THREAD_PRIORITY_MORE_FAVORABLE = -1; // slightly more favorable than THREAD_PRIORITY_DEFAULT
public static final int THREAD_PRIORITY_LESS_FAVORABLE = 1; // slightly less favorable than THREAD_PRIORITY_DEFAULT
An application sets a thread priority as follows. Some levels cannot be set by applications and are assigned by the system.
Process.setThreadPriority(Process.THREAD_PRIORITY_BACKGROUND + Process.THREAD_PRIORITY_LESS_FAVORABLE);
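As a quick check of the arithmetic, the sketch below copies a few of the constants listed above into a plain Java class. PriorityDemo and its helpers are made-up names for illustration; on a device you would use android.os.Process directly. It assumes the convention the comments above imply: a lower number means a more favorable priority.

```java
class PriorityDemo {
    // Values copied from the constants listed above.
    static final int THREAD_PRIORITY_BACKGROUND = 10;
    static final int THREAD_PRIORITY_LESS_FAVORABLE = 1;
    static final int THREAD_PRIORITY_FOREGROUND = -2;
    static final int THREAD_PRIORITY_DISPLAY = -4;

    /** The value passed to setThreadPriority in the example above. */
    static int backgroundLessFavorable() {
        return THREAD_PRIORITY_BACKGROUND + THREAD_PRIORITY_LESS_FAVORABLE;
    }

    /** The value DisplayThread passes to its superclass constructor. */
    static int displayThreadPriority() {
        return THREAD_PRIORITY_DISPLAY + 1;
    }

    /** Assumption: a lower number is a more favorable priority. */
    static boolean isMoreFavorable(int a, int b) {
        return a < b;
    }

    public static void main(String[] args) {
        System.out.println("background + less favorable = " + backgroundLessFavorable());
        System.out.println("display thread = " + displayThreadPriority());
    }
}
```

DisplayThread therefore sits between THREAD_PRIORITY_DISPLAY and THREAD_PRIORITY_FOREGROUND: one step below the former, still above the ui thread.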
describeBlockedStateLocked
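public String describeBlockedStateLocked() {
    if (mCurrentMonitor == null) {
        return "Blocked in handler on " + mName + " (" + getThread().getName() + ")";
    } else {
        return "Blocked in monitor " + mCurrentMonitor.getClass().getName()
                + " on " + mName + " (" + getThread().getName() + ")";
    }
}
Builds the message that says where the checker is blocked: in the handler or in a particular Monitor.
Monitor
Monitor is an interface. Watchdog calls its monitor() method to check whether the implementing object is blocked.
public interface Monitor {
    void monitor();
}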
Classes that implement Watchdog.Monitor
ActivityManagerService
WindowManagerService
PowerManagerService
InputManagerService
MediaSessionService
MediaRouterService
StorageManagerService
NetworkManagementService
NativeDaemonConnector
MediaProjectionManagerService
TvRemoteService
BinderThreadMonitor
OpenFdMonitor
Monitor is an interface implemented by quite a few classes. The list above is what a search of the Android 9.0 sources turns up.
Using Watchdog
All of these classes implement the interface and register themselves with Watchdog. AMS, for example:
public class ActivityManagerService extends IActivityManager.Stub
        implements Watchdog.Monitor, BatteryStatsImpl.BatteryCallback {
    ......
    public ActivityManagerService(Context systemContext) {
        ......
        Watchdog.getInstance().addMonitor(this);
        Watchdog.getInstance().addThread(mHandler);
        ......
    }
    ......
    /** In this method we try to acquire our lock to make sure that we have not deadlocked */
    public void monitor() {
        synchronized (this) { }
    }
    ......
}
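The empty synchronized block is the whole trick: monitor() returns immediately when the lock is free and hangs for as long as someone else holds it. A minimal plain-Java sketch of that behaviour follows. MonitorDemo, FakeService and monitorReturns are hypothetical names, and a helper thread with a timeout plays the part that the foreground thread and Watchdog play in the real system.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class MonitorDemo {
    interface Monitor {
        void monitor();
    }

    /** Hypothetical service whose monitor() only takes and releases its own lock. */
    static class FakeService implements Monitor {
        final Object lock = new Object();

        @Override
        public void monitor() {
            synchronized (lock) { }
        }
    }

    /** Runs monitor() on a helper thread; true if it returned within timeoutMs. */
    static boolean monitorReturns(Monitor m, long timeoutMs) {
        CountDownLatch done = new CountDownLatch(1);
        Thread t = new Thread(() -> {
            m.monitor();
            done.countDown();
        }, "checker");
        t.setDaemon(true);
        t.start();
        try {
            return done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Returns {resultWhileLockFree, resultWhileLockHeld}. */
    static boolean[] runDemo() {
        FakeService service = new FakeService();
        boolean free = monitorReturns(service, 2000);
        boolean held;
        synchronized (service.lock) { // simulate a thread that does not let go of the lock
            held = monitorReturns(service, 200);
        }
        return new boolean[] {free, held};
    }

    public static void main(String[] args) {
        boolean[] r = runDemo();
        System.out.println("lock free: " + r[0] + ", lock held: " + r[1]);
    }
}
```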
Watchdog::addThread
public void addThread(Handler thread) {
    addThread(thread, DEFAULT_TIMEOUT); // 60s
}

public void addThread(Handler thread, long timeoutMillis) {
    synchronized (this) {
        if (isAlive()) {
            throw new RuntimeException("Threads can't be added once the Watchdog is running");
        }
        final String name = thread.getLooper().getThread().getName();
        mHandlerCheckers.add(new HandlerChecker(thread, name, timeoutMillis));
    }
}
- addThread passes a thread's Handler to Watchdog, which creates a new HandlerChecker for that Handler
- The new HandlerChecker is added to the list of checkers
Watchdog::addMonitor
public void addMonitor(Monitor monitor) {
    synchronized (this) {
        if (isAlive()) {
            throw new RuntimeException("Monitors can't be added once the Watchdog is running");
        }
        mMonitorChecker.addMonitor(monitor);
    }
}
- The monitor is handed to Watchdog, which calls its monitor() method to determine whether it is blocked
- All Monitors are added to mMonitorChecker, so mMonitorChecker is the only checker that has any Monitors
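Both addThread and addMonitor refuse registrations once the Watchdog thread is alive, so services have to register before Watchdog.start() is called. The sketch below reproduces only that guard in plain Java; MiniWatchdog and RegistrationDemo are hypothetical names and the thread body just sleeps.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, stripped-down watchdog thread that only keeps a monitor list. */
class MiniWatchdog extends Thread {
    interface Monitor {
        void monitor();
    }

    private final List<Monitor> monitors = new ArrayList<>();

    MiniWatchdog() {
        super("mini-watchdog");
        setDaemon(true);
    }

    void addMonitor(Monitor monitor) {
        synchronized (this) {
            if (isAlive()) {
                throw new RuntimeException("Monitors can't be added once the Watchdog is running");
            }
            monitors.add(monitor);
        }
    }

    synchronized int monitorCount() {
        return monitors.size();
    }

    @Override
    public void run() {
        try {
            Thread.sleep(Long.MAX_VALUE); // placeholder for the real check loop
        } catch (InterruptedException e) {
            // interrupted: exit
        }
    }
}

class RegistrationDemo {
    /** Returns true if a registration attempted after start() is rejected. */
    static boolean lateRegistrationRejected() {
        MiniWatchdog watchdog = new MiniWatchdog();
        watchdog.addMonitor(() -> { }); // accepted: not started yet
        watchdog.start();
        try {
            watchdog.addMonitor(() -> { });
            return false;
        } catch (RuntimeException expected) {
            return watchdog.monitorCount() == 1;
        }
    }

    public static void main(String[] args) {
        System.out.println("late registration rejected: " + lateRegistrationRejected());
    }
}
```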
Watchdog::run()
The core method of Watchdog: it checks for thread deadlocks and blocked loopers, collects diagnostics, and kills the system_server process so that the system restarts.
@Override
public void run() {
    boolean waitedHalf = false;
    while (true) {
        final List<HandlerChecker> blockedCheckers;
        final String subject;
        final boolean allowRestart;
        int debuggerWasConnected = 0;
        synchronized (this) {
            long timeout = CHECK_INTERVAL;
            // Make sure we (re)spin the checkers that have become idle within
            // this wait-and-check interval
            for (int i=0; i<mHandlerCheckers.size(); i++) {
                // Call scheduleCheckLocked() on every HandlerChecker.
                HandlerChecker hc = mHandlerCheckers.get(i);
                hc.scheduleCheckLocked();
            }

            if (debuggerWasConnected > 0) {
                debuggerWasConnected--;
            }

            // NOTE: We use uptimeMillis() here because we do not want to increment the time we
            // wait while asleep. If the device is asleep then the thing that we are waiting
            // to timeout on is asleep as well and won't have a chance to run, causing a false
            // positive on when to kill things.
            long start = SystemClock.uptimeMillis();
            while (timeout > 0) {
                if (Debug.isDebuggerConnected()) {
                    debuggerWasConnected = 2;
                }
                try {
                    wait(timeout);
                } catch (InterruptedException e) {
                    Log.wtf(TAG, e);
                }
                if (Debug.isDebuggerConnected()) {
                    debuggerWasConnected = 2;
                }
                timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
            }

            boolean fdLimitTriggered = false;
            if (mOpenFdMonitor != null) {
                fdLimitTriggered = mOpenFdMonitor.monitor();
            }

            if (!fdLimitTriggered) {
                final int waitState = evaluateCheckerCompletionLocked();
                if (waitState == COMPLETED) {
                    // All threads are healthy; start the next round.
                    // The monitors have returned; reset
                    waitedHalf = false;
                    continue;
                } else if (waitState == WAITING) {
                    // Blocked, but for less than 30 s; keep watching.
                    // still waiting but within their configured intervals; back off and recheck
                    continue;
                } else if (waitState == WAITED_HALF) {
                    // Blocked for more than 30 s: dump some system state, then watch for another 30 s.
                    if (!waitedHalf) {
                        // We've waited half the deadlock-detection interval.  Pull a stack
                        // trace and wait another half.
                        ArrayList<Integer> pids = new ArrayList<Integer>();
                        pids.add(Process.myPid());
                        ActivityManagerService.dumpStackTraces(true, pids, null, null,
                                getInterestingNativePids());
                        waitedHalf = true;
                    }
                    continue;
                }

                // something is overdue!
                blockedCheckers = getBlockedCheckersLocked();
                subject = describeCheckersLocked(blockedCheckers);
            } else {
                blockedCheckers = Collections.emptyList();
                subject = "Open FD high water mark reached";
            }
            allowRestart = mAllowRestart;
        }

        // If we got here, that means that the system is most likely hung.
        // First collect stack traces from all threads of the system process.
        // Then kill this process so that the system will restart.
        EventLog.writeEvent(EventLogTags.WATCHDOG, subject);

        ArrayList<Integer> pids = new ArrayList<>();
        pids.add(Process.myPid());
        if (mPhonePid > 0) pids.add(mPhonePid);
        // Pass !waitedHalf so that just in case we somehow wind up here without having
        // dumped the halfway stacks, we properly re-initialize the trace file.
        final File stack = ActivityManagerService.dumpStackTraces(
                !waitedHalf, pids, null, null, getInterestingNativePids());

        // Give some extra time to make sure the stack traces get written.
        // The system's been hanging for a minute, another second or two won't hurt much.
        SystemClock.sleep(2000);

        // Trigger the kernel to dump all blocked threads, and backtraces on all CPUs to the kernel log
        doSysRq('w');
        doSysRq('l');

        // Try to add the error to the dropbox, but assuming that the ActivityManager
        // itself may be deadlocked.  (which has happened, causing this statement to
        // deadlock and the watchdog as a whole to be ineffective)
        Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
            public void run() {
                mActivity.addErrorToDropBox(
                        "watchdog", null, "system_server", null, null,
                        subject, null, stack, null);
            }
        };
        dropboxThread.start();
        try {
            dropboxThread.join(2000);  // wait up to 2 seconds for it to return.
        } catch (InterruptedException ignored) {}

        IActivityController controller;
        synchronized (this) {
            controller = mController;
        }
        if (controller != null) {
            Slog.i(TAG, "Reporting stuck state to activity controller");
            try {
                Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
                // 1 = keep waiting, -1 = kill system
                int res = controller.systemNotResponding(subject);
                if (res >= 0) {
                    Slog.i(TAG, "Activity controller requested to coninue to wait");
                    waitedHalf = false;
                    continue;
                }
            } catch (RemoteException e) {
            }
        }

        // Only kill the process if the debugger is not attached.
        if (Debug.isDebuggerConnected()) {
            debuggerWasConnected = 2;
        }
        if (debuggerWasConnected >= 2) {
            Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
        } else if (debuggerWasConnected > 0) {
            Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
        } else if (!allowRestart) {
            Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
        } else {
            Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
            WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
            Slog.w(TAG, "*** GOODBYE!");
            Process.killProcess(Process.myPid());
            System.exit(10);
        }

        waitedHalf = false;
    }
}
run() is an endless loop: it repeatedly walks all HandlerCheckers and schedules their checks, waits 30 seconds, and then evaluates their state.
Walk all HandlerCheckers and call scheduleCheckLocked on each one, which records the start time
for (int i=0; i<mHandlerCheckers.size(); i++) {
    HandlerChecker hc = mHandlerCheckers.get(i);
    hc.scheduleCheckLocked();
}
Wait 30 seconds
// Wait 30 seconds.
// uptimeMillis is used so that time spent asleep is not counted; while the phone
// sleeps, the system services sleep as well.
long start = SystemClock.uptimeMillis();
while (timeout > 0) {
    if (Debug.isDebuggerConnected()) {
        debuggerWasConnected = 2;
    }
    try {
        wait(timeout);
    } catch (InterruptedException e) {
        Log.wtf(TAG, e);
    }
    if (Debug.isDebuggerConnected()) {
        debuggerWasConnected = 2;
    }
    timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
}
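The loop recomputes the remaining timeout after every wake-up, so an early notify or an interrupt cannot shorten the interval. A minimal sketch of the same pattern in plain Java is shown below. WaitLoopDemo is a hypothetical name, and System.nanoTime() is swapped in for SystemClock.uptimeMillis(), which is only available on Android.

```java
class WaitLoopDemo {
    /** Waits on lock for at least intervalMs, re-arming after early wake-ups. Returns elapsed ms. */
    static long waitFullInterval(Object lock, long intervalMs) {
        long start = System.nanoTime();
        long timeout = intervalMs;
        synchronized (lock) {
            while (timeout > 0) {
                try {
                    lock.wait(timeout);
                } catch (InterruptedException e) {
                    // keep waiting, as the original loop does after logging
                }
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                timeout = intervalMs - elapsedMs;
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Wakes the waiter early once and returns how long the wait really took. */
    static long runDemo() {
        final Object lock = new Object();
        Thread waker = new Thread(() -> {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                return;
            }
            synchronized (lock) {
                lock.notifyAll(); // early wake-up
            }
        });
        waker.setDaemon(true);
        waker.start();
        return waitFullInterval(lock, 100);
    }

    public static void main(String[] args) {
        System.out.println("waited " + runDemo() + " ms");
    }
}
```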
Evaluate the state of the checkers: this walks all HandlerCheckers and takes the largest return value.
The largest return value is one of four cases:
- COMPLETED: the message has been handled and the thread is not blocked
- WAITING: the message has been pending for 0 to 29 seconds; keep going
- WAITED_HALF: the message has been pending for 30 to 59 seconds; the thread may be blocked, so the current stack traces are saved through AMS and monitoring continues
- OVERDUE: the message has been pending for 60 seconds or more; prepare to kill the current process. Reaching this state means the 60-second timeout has expired, and everything that follows deals with the timeout
boolean fdLimitTriggered = false;
if (mOpenFdMonitor != null) {
    fdLimitTriggered = mOpenFdMonitor.monitor();
}
if (!fdLimitTriggered) {
    final int waitState = evaluateCheckerCompletionLocked();
    if (waitState == COMPLETED) {
        // The monitors have returned; reset
        waitedHalf = false;
        continue;
    } else if (waitState == WAITING) {
        // still waiting but within their configured intervals; back off and recheck
        continue;
    } else if (waitState == WAITED_HALF) {
        if (!waitedHalf) {
            // We've waited half the deadlock-detection interval.  Pull a stack
            // trace and wait another half.
            ArrayList<Integer> pids = new ArrayList<Integer>();
            pids.add(Process.myPid());
            ActivityManagerService.dumpStackTraces(true, pids, null, null,
                    getInterestingNativePids());
            waitedHalf = true;
        }
        continue;
    }
    // something is overdue!
    blockedCheckers = getBlockedCheckersLocked();
    subject = describeCheckersLocked(blockedCheckers);
} else {
    blockedCheckers = Collections.emptyList();
    subject = "Open FD high water mark reached";
}
fdMonitor
public boolean monitor() {
    if (mFdHighWaterMark.exists()) {
        dumpOpenDescriptors();
        return true;
    }
    return false;
}
Collect information
Kill the system process
Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
Slog.w(TAG, "*** GOODBYE!");
Process.killProcess(Process.myPid());
System.exit(10);
HandlerChecker::scheduleCheckLocked
HandlerChecker::run
Watchdog::evaluateCheckerCompletionLocked
Evaluates the state of the checkers: it walks all HandlerCheckers and returns the largest state.
private int evaluateCheckerCompletionLocked() {
    int state = COMPLETED; // COMPLETED = 0
    for (int i=0; i<mHandlerCheckers.size(); i++) {
        HandlerChecker hc = mHandlerCheckers.get(i);
        state = Math.max(state, hc.getCompletionStateLocked());
    }
    return state;
}
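Because the states are ordered integers, taking the maximum means that the slowest checker decides the overall result. A small sketch under the same assumption about the constant values (only COMPLETED = 0 is shown in the article; EvaluateDemo and evaluate are illustrative names):

```java
import java.util.Arrays;
import java.util.List;

class EvaluateDemo {
    // Assumed values; only their ordering is relied upon.
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    /** Returns the worst (largest) state among the per-checker states. */
    static int evaluate(List<Integer> checkerStates) {
        int state = COMPLETED;
        for (int s : checkerStates) {
            state = Math.max(state, s);
        }
        return state;
    }

    public static void main(String[] args) {
        // Four healthy checkers and one that has been waiting for more than half its timeout.
        List<Integer> states = Arrays.asList(COMPLETED, COMPLETED, WAITED_HALF, COMPLETED, WAITING);
        System.out.println("overall state = " + evaluate(states));
    }
}
```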
HandlerChecker::getCompletionStateLocked
Watchdog::getBlockedCheckersLocked
Watchdog::describeCheckersLocked
private ArrayList<HandlerChecker> getBlockedCheckersLocked() {
    ArrayList<HandlerChecker> checkers = new ArrayList<HandlerChecker>();
    for (int i=0; i<mHandlerCheckers.size(); i++) {
        HandlerChecker hc = mHandlerCheckers.get(i);
        if (hc.isOverdueLocked()) {
            checkers.add(hc);
        }
    }
    return checkers;
}

private String describeCheckersLocked(List<HandlerChecker> checkers) {
    StringBuilder builder = new StringBuilder(128);
    for (int i=0; i<checkers.size(); i++) {
        if (builder.length() > 0) {
            builder.append(", ");
        }
        builder.append(checkers.get(i).describeBlockedStateLocked());
    }
    return builder.toString();
}
- Collects the overdue checkers and builds the description of the blocked or deadlocked threads that is then logged
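The resulting subject string is what later shows up after "WATCHDOG KILLING SYSTEM PROCESS:" in the log. The sketch below rebuilds it from plain data so the format can be seen in isolation. DescribeDemo and Blocked are hypothetical names, and a null monitorClass stands for the case where mCurrentMonitor is null.

```java
import java.util.ArrayList;
import java.util.List;

class DescribeDemo {
    /** Hypothetical snapshot of a blocked checker; monitorClass is null when no monitor was running. */
    static final class Blocked {
        final String name;
        final String threadName;
        final String monitorClass;

        Blocked(String name, String threadName, String monitorClass) {
            this.name = name;
            this.threadName = threadName;
            this.monitorClass = monitorClass;
        }

        String describe() {
            if (monitorClass == null) {
                return "Blocked in handler on " + name + " (" + threadName + ")";
            }
            return "Blocked in monitor " + monitorClass + " on " + name + " (" + threadName + ")";
        }
    }

    /** Joins the per-checker descriptions with ", ", as describeCheckersLocked does. */
    static String describeAll(List<Blocked> checkers) {
        StringBuilder builder = new StringBuilder(128);
        for (Blocked b : checkers) {
            if (builder.length() > 0) {
                builder.append(", ");
            }
            builder.append(b.describe());
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        List<Blocked> blocked = new ArrayList<>();
        blocked.add(new Blocked("main thread", "main", null));
        blocked.add(new Blocked("ui thread", "android.ui", null));
        System.out.println(describeAll(blocked));
    }
}
```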
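Note
The monitor() check looks for deadlocks between different threads, whereas the check on a service's main thread only looks at that one thread. Posting a message with sendMessage() can therefore only detect that the main thread is blocked; it cannot detect a deadlock. If some other thread holds a lock for a long time (for example through the synchronized keyword) and does not release it, a Message sent to the main thread can still be handled by its Handler as long as the main thread itself is not waiting on that lock.
How to trigger it
- Blocked in monitor: keep holding the lock that a Monitor implementation acquires in monitor(), so that it is never released
- Blocked in handler: block for a long time in the onCreate of a Service; once the timeout expires, SystemServer restarts
Logs when triggered
Two kinds of log are common: Blocked in handler and Blocked in monitor.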
Blocked in handler
11-15 06:56:39.696 24203 24902 W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in handler on main thread (main), Blocked in handler on ui thread (android.ui)
11-15 06:56:39.696 24203 24902 W Watchdog: main thread stack trace:
11-15 06:56:39.696 24203 24902 W Watchdog: at android.os.MessageQueue.nativePollOnce(Native Method)
11-15 06:56:39.696 24203 24902 W Watchdog: at android.os.MessageQueue.next(MessageQueue.java:323)
11-15 06:56:39.696 24203 24902 W Watchdog: at android.os.Looper.loop(Looper.java:142)
11-15 06:56:39.696 24203 24902 W Watchdog: at com.android.server.SystemServer.run(SystemServer.java:377)
11-15 06:56:39.696 24203 24902 W Watchdog: at com.android.server.SystemServer.main(SystemServer.java:239)
11-15 06:56:39.696 24203 24902 W Watchdog: at java.lang.reflect.Method.invoke(Native Method)
11-15 06:56:39.696 24203 24902 W Watchdog: at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:901)
11-15 06:56:39.696 24203 24902 W Watchdog: at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:791)
11-15 06:56:39.696 24203 24902 W Watchdog: ui thread stack trace:
......
Blocked in monitor
10-26 00:07:00.884 1000 17132 17312 W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in monitor com.android.server.Watchdog$BinderThreadMonitor on foreground thread (android.fg)
10-26 00:07:00.884 1000 17132 17312 W Watchdog: foreground thread stack trace:
10-26 00:07:00.885 1000 17132 17312 W Watchdog: at android.os.Binder.blockUntilThreadAvailable(Native Method)
10-26 00:07:00.885 1000 17132 17312 W Watchdog: at com.android.server.Watchdog$BinderThreadMonitor.monitor(Watchdog.java:381)
10-26 00:07:00.885 1000 17132 17312 W Watchdog: at com.android.server.Watchdog$HandlerChecker.run(Watchdog.java:353)
10-26 00:07:00.885 1000 17132 17312 W Watchdog: at android.os.Handler.handleCallback(Handler.java:873)
10-26 00:07:00.886 1000 17132 17312 W Watchdog: at android.os.Handler.dispatchMessage(Handler.java:99)
10-26 00:07:00.886 1000 17132 17312 W Watchdog: at android.os.Looper.loop(Looper.java:193)
10-26 00:07:00.886 1000 17132 17312 W Watchdog: at android.os.HandlerThread.run(HandlerThread.java:65)
10-26 00:07:00.886 1000 17132 17312 W Watchdog: at com.android.server.ServiceThread.run(ServiceThread.java:44)
10-26 00:07:00.886 1000 17132 17312 W Watchdog: *** GOODBYE!
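When triaging many such logs, the first line is the one to extract: it names every blocked thread and, for monitor blocks, the Monitor class. The sketch below pulls these out with a regular expression. WatchdogLogParser is a hypothetical helper, not an Android tool, and the pattern is derived only from the two message formats shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WatchdogLogParser {
    private static final String MARKER = "WATCHDOG KILLING SYSTEM PROCESS: ";

    // Matches "Blocked in handler on <name> (<thread>)" and
    // "Blocked in monitor <class> on <name> (<thread>)".
    private static final Pattern ENTRY = Pattern.compile(
            "Blocked in (handler|monitor (\\S+)) on (.+?) \\(([^)]+)\\)");

    /** Returns one "<thread> <- <cause>" entry per blocked checker named in the log line. */
    static List<String> blockedThreads(String logLine) {
        List<String> result = new ArrayList<>();
        int at = logLine.indexOf(MARKER);
        if (at < 0) {
            return result; // not a watchdog kill line
        }
        Matcher m = ENTRY.matcher(logLine.substring(at + MARKER.length()));
        while (m.find()) {
            String cause = m.group(2) == null ? "handler" : "monitor " + m.group(2);
            result.add(m.group(4) + " <- " + cause);
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "11-15 06:56:39.696 24203 24902 W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: "
                + "Blocked in handler on main thread (main), Blocked in handler on ui thread (android.ui)";
        for (String entry : blockedThreads(line)) {
            System.out.println(entry);
        }
    }
}
```

Note that in the "Blocked in handler" sample above the main thread's stack is sitting in nativePollOnce, i.e. the thread was idle again by the time the trace was taken, so the stack in the kill log does not always show the original cause.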
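Watchdog log analysis
Once Watchdog has detected that a SystemServer thread is deadlocked, it collects and prints diagnostic information. The code is in the run function: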
Write the event log
EventLog.writeEvent(EventLogTags.WATCHDOG, subject);
Dump stack traces
ArrayList<Integer> pids = new ArrayList<>();
pids.add(Process.myPid());
if (mPhonePid > 0) pids.add(mPhonePid);
// Pass !waitedHalf so that just in case we somehow wind up here without having
// dumped the halfway stacks, we properly re-initialize the trace file.
final File stack = ActivityManagerService.dumpStackTraces(!waitedHalf, pids, null, null, getInterestingNativePids());
// Give some extra time to make sure the stack traces get written.
// The system's been hanging for a minute, another second or two won't hurt much.
SystemClock.sleep(2000);
Dump kernel info
// Trigger the kernel to dump all blocked threads, and backtraces on all CPUs to the kernel log
doSysRq('w');
doSysRq('l');
Collect dropbox information
// Try to add the error to the dropbox, but assuming that the ActivityManager
// itself may be deadlocked.  (which has happened, causing this statement to
// deadlock and the watchdog as a whole to be ineffective)
Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
    public void run() {
        mActivity.addErrorToDropBox(
                "watchdog", null, "system_server", null, null,
                subject, null, stack, null);
    }
};
dropboxThread.start();
try {
    dropboxThread.join(2000);  // wait up to 2 seconds for it to return.
} catch (InterruptedException ignored) {}
Kill the system process: if no debugger is attached, Watchdog kills its own process
// Only kill the process if the debugger is not attached.
if (Debug.isDebuggerConnected()) {
    debuggerWasConnected = 2;
}
if (debuggerWasConnected >= 2) {
    Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
} else if (debuggerWasConnected > 0) {
    Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
} else if (!allowRestart) {
    Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
} else {
    Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
    WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
    Slog.w(TAG, "*** GOODBYE!");
    Process.killProcess(Process.myPid());
    System.exit(10);
}
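The order of the branches decides which message appears in the log, so it can help to see the decision as a pure function. In the sketch below, KillDecisionDemo and decide are hypothetical names; the conditions and messages are taken from the snippet above, and the two inputs correspond to debuggerWasConnected and allowRestart.

```java
class KillDecisionDemo {
    /** Returns the action Watchdog would take for the given inputs. */
    static String decide(int debuggerWasConnected, boolean allowRestart) {
        if (debuggerWasConnected >= 2) {
            return "skip: Debugger connected";
        } else if (debuggerWasConnected > 0) {
            return "skip: Debugger was connected";
        } else if (!allowRestart) {
            return "skip: Restart not allowed";
        }
        return "kill";
    }

    public static void main(String[] args) {
        System.out.println(decide(2, true));
        System.out.println(decide(1, true));
        System.out.println(decide(0, false));
        System.out.println(decide(0, true));
    }
}
```

An attached debugger takes precedence over everything else: even with allowRestart set, the process is only killed when no debugger has been seen.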
The dalvik.vm.stack-trace-dir property
This property points to /data/anr.
final String tracesDirProp = SystemProperties.get("dalvik.vm.stack-trace-dir", "");