Android中ANR的分析和解决
一、认识ANR
ANR的定义
ANR,是“Application Not Responding”的缩写,即“应用程序无响应”。如果你应用程序在UI线程被阻塞太长时间,就会出现ANR,通常出现ANR,系统会弹出一个提示提示框,让用户知道,该程序正在被阻塞,是否继续等待还是关闭。
ANR的类型
(1)KeyDispatchTimeout(常见)
input事件在5S内没有处理完成发生了ANR。
logcat日志关键字:Input event dispatching timed out
(2)BroadcastTimeout
前台Broadcast:onReceiver在10S内没有处理完成发生ANR。
后台Broadcast:onReceiver在60s内没有处理完成发生ANR。
logcat日志关键字:Timeout of broadcast BroadcastRecord
(3)ServiceTimeout
前台Service:onCreate,onStart,onBind等生命周期在20s内没有处理完成发生ANR。
后台Service:onCreate,onStart,onBind等生命周期在200s内没有处理完成发生ANR
logcat日志关键字:Timeout executing service
(4)ContentProviderTimeout
ContentProvider 在10S内没有处理完成发生ANR。
logcat日志关键字:timeout publishing content providers
ANR出现的原因
(1)主线程频繁进行耗时的IO操作:如数据库读写
(2)多线程操作的死锁,主线程被block;
(3)主线程被Binder 对端block;
(4)System Server中WatchDog出现ANR;
(5)service binder的连接达到上线无法和和System Server通信
(6)系统资源已耗尽(管道、CPU、IO)
如何分析ANR
(1)日志分析:ANR发生时都会在log中输出错误信息,从log中可以获得ANR的类型,CPU的使用情况,CPU使用率过高有可能是CPU饥饿导致了ANR。CPU使用率过低说明主线程被block了,如果IOwait高是因为主线程进行I/O操作造成的。
(2)traces文件分析:除了log输出外,你会发现各个应用进程和系统进程的函数堆栈信息都输出到了一个/data/anr/traces.txt的文件中,这个文件是分析ANR原因的关键文件.要获取到该文件可使用adb指令进行赋权后拉出查看调用stack。通过log、trace.text、代码结合分析ANR的成因(iowait?Memoryleak?Block?)
(3)traces文件无法分析的:不过还存在一些ANR问题,trace文件是分析不了的,例如我们的系统上,人脸识别活体攻击的时候,native算法耗尽cpu资源导致其他app无法抢占cpu时间片导致anr,假如ANR的app是你开发的,估计查到死也找不到问题所在,类似这类问题也写过简要的分析文章:
接下来我们将一步一步分析ANR,这个过程能让我们进一步明白如何找到问题、分析问题以及解决问题。
二、ANR发生时调用的关键代码
当ANR发生的时候,系统会调用如下相关的关键函数代码,来将系统当前的关键信息保存到日志当中:
framework/base/services/core/java/com/android/server/am/AppErrors.java
class AppErrors { final void appNotResponding(ProcessRecord app, ActivityRecord activity, ActivityRecord parent, boolean aboveSystem, final String annotation) { ArrayList<Integer> firstPids = new ArrayList<Integer>(5); SparseArray<Boolean> lastPids = new SparseArray<Boolean>(20); if (mService.mController != null) { try { // 0 == continue, -1 = kill process immediately int res = mService.mController.appEarlyNotResponding( app.processName, app.pid, annotation); if (res < 0 && app.pid != MY_PID) { app.kill("anr", true); } } catch (RemoteException e) { mService.mController = null; Watchdog.getInstance().setActivityController(null); } } long anrTime = SystemClock.uptimeMillis(); if (ActivityManagerService.MONITOR_CPU_USAGE) { mService.updateCpuStatsNow(); } // Unless configured otherwise, swallow ANRs in background processes & kill the process. boolean showBackground = Settings.Secure.getInt(mContext.getContentResolver(), Settings.Secure.ANR_SHOW_BACKGROUND, 0) != 0; boolean isSilentANR; synchronized (mService) { // PowerManager.reboot() can block for a long time, so ignore ANRs while shutting down. if (mService.mShuttingDown) { Slog.i(TAG, "During shutdown skipping ANR: " + app + " " + annotation); return; } else if (app.notResponding) { Slog.i(TAG, "Skipping duplicate ANR: " + app + " " + annotation); return; } else if (app.crashing) { Slog.i(TAG, "Crashing app skipping ANR: " + app + " " + annotation); return; } else if (app.killedByAm) { Slog.i(TAG, "App already killed by AM skipping ANR: " + app + " " + annotation); return; } else if (app.killed) { Slog.i(TAG, "Skipping died app ANR: " + app + " " + annotation); return; } // In case we come through here for the same app before completing // this one, mark as anring now so we will bail out. app.notResponding = true; // 记录ANR到event log EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid, app.processName, app.info.flags, annotation); // Dump thread traces as quickly as we can, starting with "interesting" processes. firstPids.add(app.pid); // Don't dump other PIDs if it's a background ANR isSilentANR = !showBackground && !isInterestingForBackgroundTraces(app); if (!isSilentANR) { int parentPid = app.pid; if (parent != null && parent.app != null && parent.app.pid > 0) { parentPid = parent.app.pid; } if (parentPid != app.pid) firstPids.add(parentPid); if (MY_PID != app.pid && MY_PID != parentPid) firstPids.add(MY_PID); for (int i = mService.mLruProcesses.size() - 1; i >= 0; i--) { ProcessRecord r = mService.mLruProcesses.get(i); if (r != null && r.thread != null) { int pid = r.pid; if (pid > 0 && pid != app.pid && pid != parentPid && pid != MY_PID) {if (r.persistent) { firstPids.add(pid); if (DEBUG_ANR) Slog.i(TAG, "Adding persistent proc: " + r);} else if (r.treatLikeActivity) { firstPids.add(pid); if (DEBUG_ANR) Slog.i(TAG, "Adding likely IME: " + r);} else { lastPids.put(pid, Boolean.TRUE); if (DEBUG_ANR) Slog.i(TAG, "Adding ANR proc: " + r);} } } } } } // 记录ANR 到 main log StringBuilder info = new StringBuilder(); info.setLength(0); info.append("ANR in ").append(app.processName); //例如:ANR in com.xxxx.moblie (进程名) if (activity != null && activity.shortComponentName != null) { info.append(" (").append(activity.shortComponentName).append(")");//例如:(com.xxxx.moblie/.ui.MainActivity) } info.append("\n");//换行 info.append("PID: ").append(app.pid).append("\n");//记录进程id if (annotation != null) { info.append("Reason: ").append(annotation).append("\n"); } if (parent != null && parent != activity) { info.append("Parent: ").append(parent.shortComponentName).append("\n"); } ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true); // don't dump native PIDs for background ANRs unless it is the process of interest String[] nativeProcs = null; if (isSilentANR) { for (int i = 0; i < NATIVE_STACKS_OF_INTEREST.length; i++) { if (NATIVE_STACKS_OF_INTEREST[i].equals(app.processName)) { nativeProcs = new String[] { app.processName }; break; } } } else { nativeProcs = NATIVE_STACKS_OF_INTEREST; } int[] pids = nativeProcs == null ? null : Process.getPidsForCommands(nativeProcs); ArrayList<Integer> nativePids = null; if (pids != null) { nativePids = new ArrayList<Integer>(pids.length); for (int i : pids) { nativePids.add(i); } } // For background ANRs, don't pass the ProcessCpuTracker to // avoid spending 1/2 second collecting stats to rank lastPids. File tracesFile = ActivityManagerService.dumpStackTraces( true, firstPids, (isSilentANR) ? null : processCpuTracker, (isSilentANR) ? null : lastPids, nativePids); String cpuInfo = null; if (ActivityManagerService.MONITOR_CPU_USAGE) { mService.updateCpuStatsNow(); synchronized (mService.mProcessCpuTracker) { cpuInfo = mService.mProcessCpuTracker.printCurrentState(anrTime); } info.append(processCpuTracker.printCurrentLoad()); info.append(cpuInfo); } info.append(processCpuTracker.printCurrentState(anrTime));//将ANR相关的错误日志信息打印出来 Slog.e(TAG, info.toString());...省略... }
通过上面的代码我们知道,当ANR发生的时候,会将ANR记录到event log和main log中。
三、查看events_log(如果没有找到该文件可以跳过这一步)
查看mobilelog文件夹下的events_log,从日志中搜索关键字:am_anr,找到出现ANR的时间点、进程PID、ANR类型。
如日志:
07-20 15:36:36.472 1000 1520 1597 I am_anr : [0,1480,com.xxxx.moblie,952680005,Input dispatching timed out (AppWindowToken{da8f666 token=Token{5501f51 ActivityRecord{15c5c78 u0 com.xxxx.moblie/.ui.MainActivity t3862}}}, Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)]
从上面的log我们可以看出: 应用com.xxxx.moblie 在07-20 15:36:36.472时间,发生了一次KeyDispatchTimeout类型的ANR,它的进程号是1480.
把关键的信息整理一下:
- ANR时间:07-20 15:36:36.472
- 进程pid:1480
- 进程名:com.xxxx.moblie
- ANR类型:KeyDispatchTimeout
我们已经知道了发生KeyDispatchTimeout的ANR是因为 input事件在5秒内没有处理完成。那么在这个时间07-20 15:36:36.472 的前5秒,也就是(15:36:30 ~15:36:31)时间段左右程序到底做了什么事情?这个简单,因为我们已经知道pid了,再搜索一下pid = 1480的日志.这些日志表示该进程所运行的轨迹,关键的日志如下:
07-20 15:36:29.749 10102 1480 1737 D moblie-Application: [Thread:17329] receive an intent from server, action=com.ttt.push.RECEIVE_MESSAGE07-20 15:36:30.136 10102 1480 1737 D moblie-Application: receiving an empty message, drop07-20 15:36:35.791 10102 1480 1766 I Adreno : QUALCOMM build : 9c9b012, I92eb381bc907-20 15:36:35.791 10102 1480 1766 I Adreno : Build Date : 12/31/1707-20 15:36:35.791 10102 1480 1766 I Adreno : OpenGL ES Shader Compiler Version: EV031.22.00.0107-20 15:36:35.791 10102 1480 1766 I Adreno : Local Branch : 07-20 15:36:35.791 10102 1480 1766 I Adreno : Remote Branch : refs/tags/AU_LINUX_ANDROID_LA.UM.6.4.R1.08.00.00.309.04907-20 15:36:35.791 10102 1480 1766 I Adreno : Remote Branch : NONE07-20 15:36:35.791 10102 1480 1766 I Adreno : Reconstruct Branch : NOTHING07-20 15:36:35.826 10102 1480 1766 I vndksupport: sphal namespace is not configured for this process. Loading /vendor/lib64/hw/gralloc.msm8998.so from the current namespace instead.07-20 15:36:36.682 10102 1480 1480 W ViewRootImpl[MainActivity]: Cancelling event due to no window focus: KeyEvent { action=ACTION_UP, keyCode=KEYCODE_PERIOD, scanCode=0, metaState=0, flags=0x28, repeatCount=0, eventTime=16099429, downTime=16099429, deviceId=-1, source=0x101 }
从上面我们可以知道,在时间 07-20 15:36:29.749 程序收到了一个action消息。
07-20 15:36:29.749 10102 1480 1737 D moblie-Application: [Thread:17329] receive an intent from server, action=com.ttt.push.RECEIVE_MESSAGE。
原来是应用com.xxxx.moblie 收到了一个推送消息(com.ttt.push.RECEIVE_MESSAGE)导致了阻塞,我们再串联一下目前所获取到的信息:在时间07-20 15:36:29.749 应用com.xxxx.moblie 收到了一下推送信息action=com.ttt.push.RECEIVE_MESSAGE发生阻塞,5秒后发生了KeyDispatchTimeout的ANR。
四、查看main_log日志信息
在分析ANR的时候,我们首先要确认是不是当时CPU很紧张、各路APP都在抢占资源,CPU无法及时响应最终导致了ANR?为了排查这种情况,我们就需要获取ANR发生时候的CPU状态参数。
接下来我们来看一个ANR模拟日志案例
07-20 15:36:58.711 1000 1520 1597 E ActivityManager: ANR in com.xxxx.moblie (com.xxxx.moblie/.ui.MainActivity) (关键字ANR in + 进程名 + Activity名称)07-20 15:36:58.711 1000 1520 1597 E ActivityManager: PID: 1480 (进程pid)07-20 15:36:58.711 1000 1520 1597 E ActivityManager: Reason: Input dispatching timed out (AppWindowToken{da8f666 token=Token{5501f51 ActivityRecord{15c5c78 u0 com.xxxx.moblie/.ui.MainActivity t3862}}}, Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)(ANR的原因,输入分发超时)07-20 15:36:58.711 1000 1520 1597 E ActivityManager: Load: 0.0 / 0.0 / 0.0 (Load表明是1分钟,5分钟,15分钟CPU的负载)07-20 15:36:58.711 1000 1520 1597 E ActivityManager: CPU usage from 20ms to 20286ms later (2018-07-20 15:36:36.170 to 2018-07-20 15:36:56.436):07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 42% 6774/pressure: 41% user + 1.4% kernel / faults: 168 minor07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 34% 142/kswapd0: 0% user + 34% kernel07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 31% 1520/system_server: 13% user + 18% kernel / faults: 58724 minor 1585 major07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 13% 29901/com.ss.android.article.news: 7.7% user + 6% kernel / faults: 56007 minor 2446 major07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 13% 32638/com.android.quicksearchbox: 9.4% user + 3.8% kernel / faults: 48999 minor 1540 major07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 11% (CPU的使用率)1480/com.xxxx.moblie: 5.2%(用户态的使用率) user + (内核态的使用率) 6.3% kernel / faults: 76401 minor 2422 major07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 8.2% 21000/kworker/u16:12: 0% user + 8.2% kernel07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 0.8% 724/mtd: 0% user + 0.8% kernel / faults: 1561 minor 9 major07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 8% 29704/kworker/u16:8: 0% user + 8% kernel07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 7.9% 24391/kworker/u16:18: 0% user + 7.9% kernel07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 7.1% 30656/kworker/u16:14: 0% user + 7.1% kernel07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 7.1% 9998/kworker/u16:4: 0% user + 7.1% kernel
通过上面所提供的案例我们可以分析出以下几点:
- ANR发生的位置是:com.xxxx.moblie/.ui.MainActivity
- com.xxxx.moblie 占用了11%的CPU,CPU的使用率并不是很高,基本可以排除CPU负载的原因
- Reason提示我们是输入分发超时导致的ANR
下面所提供的是一个ANR的真实日志案例
10-09 19:35:22.124 940 968 E ActivityManager: ANR in com.example.anrtest (com.example.anrtest/.MainActivity) // 记录ANR+进程名+Activity名称10-09 19:35:22.124 940 968 E ActivityManager: PID: 8390 //记录进程ID10-09 19:35:22.124 940 968 E ActivityManager: Reason: Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 28. Wait queue head age: 5517.5ms.)(ANR的原因,输入分发超时)10-09 19:35:22.124 940 968 E ActivityManager: Load: 2.52 / 2.57 / 2.7310-09 19:35:22.124 940 968 E ActivityManager: CPU usage from 99984ms to 0ms ago (2022-10-09 19:33:39.209 to 2022-10-09 19:35:19.194):10-09 19:35:22.124 940 968 E ActivityManager: 25% 445/surfaceflinger: 14% user + 11% kernel / faults: 11962 minor10-09 19:35:22.124 940 968 E ActivityManager: 11% 394/android.hardware.graphics.composer@2.2-service: 2.9% user + 8.2% kernel / faults: 1 minor10-09 19:35:22.124 940 968 E ActivityManager: 10% 2101/com.leapmotor.appcenter: 7.8% user + 3.1% kernel / faults: 926 minor10-09 19:35:22.124 940 968 E ActivityManager: 3.2% 1961/com.iflytek.cutefly.speechclient.hmi: 2.5% user + 0.7% kernel / faults: 2111 minor10-09 19:35:22.124 940 968 E ActivityManager: 1.8% 386/android.hardware.audio@2.0-service: 0.2% user + 1.6% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0.9% 940/system_server: 0.6% user + 0.3% kernel / faults: 5334 minor10-09 19:35:22.124 940 968 E ActivityManager: 0.7% 440/audioserver: 0.3% user + 0.3% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0.3% 277/apr_vm_cb_threa: 0% user + 0.3% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0.3% 544/leap_uhab: 0% user + 0.2% kernel / faults: 20 minor10-09 19:35:22.124 940 968 E ActivityManager: 0.3% 5219/com.leapmotor.leapmotorsoscall: 0.1% user + 0.1% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0.2% 2367/com.leapmotor.multimedia: 0% user + 0.2% kernel / faults: 9 minor10-09 19:35:22.124 940 968 E ActivityManager: 0.2% 8215/kworker/2:3: 0% user + 0.2% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0.2% 3992/adbd: 0% user + 0.2% kernel / faults: 46652 minor10-09 19:35:22.124 940 968 E ActivityManager: 0.2% 7786/com.ebanma.tinyapp: 0.2% user + 0% kernel / faults: 557 minor10-09 19:35:22.124 940 968 E ActivityManager: 0.2% 7895/com.ebanma.tinyapp:DataCenterService: 0.2% user + 0% kernel / faults: 463 minor10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 543/leap_systemsdk_service: 0% user + 0.1% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 1392/com.leapmotor.cameraaround: 0% user + 0.1% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 410/leap_vsomeip_route: 0.1% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 3554/com.leapmotor.driverecord:emergency: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 411/leap_camera_around: 0% user + 0.1% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 2915/com.leapmotor.systemupdate: 0% user + 0% kernel / faults: 5 minor10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 9/rcu_preempt: 0% user + 0.1% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 412/leap_camera_face: 0% user + 0.1% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 413/leap_camera_front: 0% user + 0.1% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 2021/com.leapmotor.facevideo: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 2554/com.leapmotor.phone: 0% user + 0.1% kernel / faults: 10 minor10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 1916/com.leapmotor.log: 0% user + 0% kernel / faults: 2 minor10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 2188/com.leapmotor.camera:front_encode: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 2211/com.leapmotor.camera:around_encode: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 3541/com.leapmotor.driverecord:trip: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0% 285/logd: 0% user + 0% kernel / faults: 3 minor10-09 19:35:22.124 940 968 E ActivityManager: 0% 325/ais_v4l2_proxy: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0% 541/leap_shutdown: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0% 2243/com.leapmotor.camera:front_push_encode: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0% 1199/com.android.systemui: 0% user + 0% kernel / faults: 576 minor10-09 19:35:22.124 940 968 E ActivityManager: 0% 3492/com.leapmotor.driverecord: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0% 251/vlog: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0% 560/installd: 0% user + 0% kernel / faults: 8 minor10-09 19:35:22.124 940 968 E ActivityManager: 0% 4106/kworker/u8:0: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0% 6191/installer: 0% user + 0% kernel / faults: 295 minor10-09 19:35:22.124 940 968 E ActivityManager: 0% 8/ksoftirqd/0: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0% 18/ksoftirqd/1: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0% 1131/com.android.car: 0% user + 0% kernel / faults: 553 minor10-09 19:35:22.124 940 968 E ActivityManager: 0% 2134/com.leapmotor.bt:bt_service: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0% 7818/kworker/u9:0: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0% 24/ksoftirqd/2: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0% 512/leap_logcat: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0% 10/rcu_sched: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0% 30/ksoftirqd/3: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0% 70/system: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0% 240/kworker/u8:10: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0% 281/jbd2/vdb-8: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0% 286/servicemanager: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0% 287/hwservicemanager: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0% 330/zygote64: 0% user + 0% kernel / faults: 153 minor10-09 19:35:22.124 940 968 E ActivityManager: 0% 409/vendor.qti.hardware.perf@1.0-service: 0% user + 0% kernel / faults: 37 minor10-09 19:35:22.124 940 968 E ActivityManager: 0% 542/leap_vsomeip_qnx_heart: 0% user + 0% kernel / faults: 3 minor10-09 19:35:22.124 940 968 E ActivityManager: 0% 654/msm_irqbalance: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0% 677/ipacm: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0% 1366/.dataservices: 0% user + 0% kernel / faults: 3 minor10-09 19:35:22.124 940 968 E ActivityManager: 0% 2665/dmesg: 0% user + 0% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0% 2893/android.process.acore: 0% user + 0% kernel / faults: 35 minor10-09 19:35:22.124 2101 2120 I motor.appcente: Wrote stack traces to '[tombstoned]'10-09 19:35:22.125 940 969 W ActivityManager: Force finishing activity com.example.anrtest/.MainActivity10-09 19:35:22.127 940 969 I ActivityManager: saveTopActivity mTmpTop=com.example.anrtest/com.example.anrtest.MainActivity
通过上面所提供的案例我们可以分析出以下几点:
- ANR发生的位置是:com.example.anrtest/.MainActivity
- CPU的使用率都不是很高,基本可以排除CPU负载的原因
- Reason提示我们是输入分发超时导致的ANR
通过上面几点我们虽然排除了CPU过度负载的可能,但我们并不能准确定位出ANR的确切位置,要想准确定位出ANR发生的确切位置,就要借助系统为了解决ANR问题而提供的终极大杀器——traces.txt文件了。
五、 traces.txt 文件分析
当APP不响应、响应慢了、或者WatchDog的监视没有得到回应时,系统就会dump出一个traces.txt文件,存放在文件目录:/data/anr/文件夹中,通过traces文件,我们可以拿到线程名、堆栈信息、线程当前状态、binder call等信息。
我们可以通过adb命令获取到该文件夹下面的所有traces文件:adb pull /data/anr
trace:Cmd line:com.xxxx.moblie"main" prio=5 tid=1 Runnable | group="main" sCount=0 dsCount=0 obj=0x73bcc7d0 self=0x7f20814c00 | sysTid=20176 nice=-10 cgrp=default sched=0/0 handle=0x7f251349b0 | state=R schedstat=( 0 0 0 ) utm=12 stm=3 core=5 HZ=100 | stack=0x7fdb75e000-0x7fdb760000 stackSize=8MB | held mutexes= "mutator lock"(shared held) // java 堆栈调用信息,可以查看调用的关系,定位到具体位置 at ttt.push.InterceptorProxy.addMiuiApplication(InterceptorProxy.java:77) at ttt.push.InterceptorProxy.create(InterceptorProxy.java:59) at android.app.Activity.onCreate(Activity.java:1041) at miui.app.Activity.onCreate(SourceFile:47) at com.xxxx.moblie.ui.b.onCreate(SourceFile:172) at com.xxxx.moblie.ui.MainActivity.onCreate(SourceFile:68) at android.app.Activity.performCreate(Activity.java:7050) at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1214) at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2807) at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:2929) at android.app.ActivityThread.-wrap11(ActivityThread.java:-1) at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1618) at android.os.Handler.dispatchMessage(Handler.java:105) at android.os.Looper.loop(Looper.java:171) at android.app.ActivityThread.main(ActivityThread.java:6699) at java.lang.reflect.Method.invoke(Native method) at com.android.internal.os.Zygote$MethodAndArgsCaller.run(Zygote.java:246) at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:783)
这里详细解析一下traces.txt里面的一些字段,看看它到底能给我们提供什么信息.
- main:main标识是主线程,如果是线程,那么命名成“Thread-X”的格式,x表示线程id,逐步递增。
- prio:线程优先级,默认是5
- tid:tid不是线程的id,是线程唯一标识ID
- group:是线程组名称
- sCount:该线程被挂起的次数
- dsCount:是线程被调试器挂起的次数
- obj:对象地址
- self:该线程Native的地址
- sysTid:是线程号(主线程的线程号和进程号相同)
- nice:是线程的调度优先级
- sched:分别标志了线程的调度策略和优先级
- cgrp:调度归属组
- handle:线程处理函数的地址。
- state:是调度状态
- schedstat:从 /proc/[pid]/task/[tid]/schedstat读出,三个值分别表示线程在cpu上执行的时间、线程的等待时间和线程执行的时间片长度,不支持这项信息的三个值都是0;
- utm:是线程用户态下使用的时间值(单位是jiffies)
- stm:是内核态下的调度时间值
- core:是最后执行这个线程的cpu核的序号。
Java的堆栈信息是我们最关心的,它能够定位到具体位置。从上面的traces,我们可以判断ttt.push.InterceptorProxy.addMiuiApplicationInterceptorProxy.java:77 导致了com.xxxx.moblie发生了ANR。这时候可以对着源码查看,找到出问题,并且解决它。
接下来再看一个真实的导致ANR的trace案例:
//开头显示进程号、ANR发生的时间点和进程名称----- pid 4972 at 2022-10-09 16:45:41 -----Cmd line: com.example.anrtest...省略...DALVIK THREADS (14)://以下是各个线程的函数堆栈信息//依次是:线程名(后面标识有daemon说明是个守护线程)、线程优先级、线程号、线程当前状态(TIMED_WAIT或SUSPENDED在anr时很常见)"Signal Catcher" daemon prio=5 tid=3 Runnable//依次是:线程组名称、suspendCount个数、debugSuspendCount个数、标记,线程的Java对象地址、线程的Native对象地址 | group="system" sCount=0 dsCount=0 flags=0 obj=0x15980100 self=0x7322e16400 //sysTid是线程号,主线程的线程号和进程号相同 | sysTid=4978 nice=0 cgrp=default sched=0/0 handle=0x731c3304f0 | state=R schedstat=( 5583230 6778645 10 ) utm=0 stm=0 core=3 HZ=100 | stack=0x731c235000-0x731c237000 stackSize=1009KB | held mutexes= "mutator lock"(shared held)//“Signal Catcher”负责接收和处理kernel发送的各种信号,例如SIGNAL_QUIT、SIGNAL_USR1等就是被该线程//接收到,这个文件的内容就是由该线程负责输出的,可以看到它的状态是RUNNABLE,不过此线程也不需要关心...省略...//主线程,当前处于休眠状态"main" prio=5 tid=1 Sleeping | group="main" sCount=1 dsCount=0 flags=1 obj=0x73e64b08 self=0x7322e14c00 | sysTid=4972 nice=-10 cgrp=default sched=0/0 handle=0x73a88cc548 | state=S schedstat=( 618097798 184039398 700 ) utm=56 stm=5 core=3 HZ=100 | stack=0x7fd05f2000-0x7fd05f4000 stackSize=8MB | held mutexes= at java.lang.Thread.sleep(Native method) - sleeping on <0x0d9c45cc> (a java.lang.Object) at java.lang.Thread.sleep(Thread.java:373) - locked <0x0d9c45cc> (a java.lang.Object) at java.lang.Thread.sleep(Thread.java:314) at android.os.SystemClock.sleep(SystemClock.java:127) at com.example.anrtest.MainActivity$1.onClick(MainActivity.java:57) at android.view.View.performClick(View.java:6597) at android.view.View.performClickInternal(View.java:6574) at android.view.View.access$3100(View.java:778) at android.view.View$PerformClick.run(View.java:25885) at android.os.Handler.handleCallback(Handler.java:873) at android.os.Handler.dispatchMessage(Handler.java:99) at android.os.Looper.loop(Looper.java:193) at android.app.ActivityThread.main(ActivityThread.java:6718) at java.lang.reflect.Method.invoke(Native method) at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493) at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:858) ...省略..."Jit thread pool worker thread 0" daemon prio=5 tid=2 Native | group="main" sCount=1 dsCount=0 flags=1 obj=0x15980000 self=0x731c029000 | sysTid=4977 nice=9 cgrp=default sched=0/0 handle=0x731c4314f0 | state=S schedstat=( 122669387 79479997 188 ) utm=8 stm=3 core=3 HZ=100 | stack=0x731c333000-0x731c335000 stackSize=1021KB | held mutexes= ...省略...//Binder线程是进程的线程池中用来处理binder请求的线程"Binder:4972_1" prio=5 tid=9 Native | group="main" sCount=1 dsCount=0 flags=1 obj=0x159829a0 self=0x731a60d000 | sysTid=4984 nice=0 cgrp=default sched=0/0 handle=0x73099b34f0 | state=S schedstat=( 12586983 41079166 22 ) utm=1 stm=0 core=3 HZ=100 | stack=0x73098b8000-0x73098ba000 stackSize=1009KB | held mutexes= ...省略...
六、ANR分析流程总结
总结一下上面我们分析ANR的主体流程:
首先我们搜索am_anr,找到出现ANR的时间点、进程PID、ANR类型、然后再找搜索PID,找前5秒左右的日志。
2、过滤ANR IN 查看CPU信息
3、接着查看traces.txt,找到java的堆栈信息定位代码位置,最后查看源码,分析与解决问题。
到这里,通过上面三个步骤我们基本就能定位出来大部分ANR的来龙去脉了。
我在接下来的第六步和第七步还继续提供了多个ANR案例供大家参考分析,有兴趣的可以看着。
七、ANR 案例整理
主线程被其他线程lock,导致死锁
waiting on <0x1cd570> (a android.os.MessageQueue)
DALVIK THREADS:"main" prio=5 tid=3 TIMED_WAIT | group="main" sCount=1 dsCount=0 s=0 obj=0x400143a8 | sysTid=691 nice=0 sched=0/0 handle=-1091117924 at java.lang.Object.wait(Native Method) - waiting on <0x1cd570> (a android.os.MessageQueue) at java.lang.Object.wait(Object.java:195) at android.os.MessageQueue.next(MessageQueue.java:144) at android.os.Looper.loop(Looper.java:110) at android.app.ActivityThread.main(ActivityThread.java:3742) at java.lang.reflect.Method.invokeNative(Native Method) at java.lang.reflect.Method.invoke(Method.java:515) at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:739) at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:497) at dalvik.system.NativeStart.main(Native Method)"Binder Thread #3" prio=5 tid=15 NATIVE | group="main" sCount=1 dsCount=0 s=0 obj=0x434e7758 | sysTid=734 nice=0 sched=0/0 handle=1733632 at dalvik.system.NativeStart.run(Native Method)"Binder Thread #2" prio=5 tid=13 NATIVE | group="main" sCount=1 dsCount=0 s=0 obj=0x1cd570 | sysTid=696 nice=0 sched=0/0 handle=1369840 at dalvik.system.NativeStart.run(Native Method)"Binder Thread #1" prio=5 tid=11 NATIVE | group="main" sCount=1 dsCount=0 s=0 obj=0x433aca10 | sysTid=695 nice=0 sched=0/0 handle=1367448 at dalvik.system.NativeStart.run(Native Method)----- end 691 -----
主线程做耗时的操作:比如数据库读写。
"main" prio=5 tid=1 Nativeheld mutexes=kernel: (couldn't read /proc/self/task/11003/stack)native: #00 pc 000492a4 /system/lib/libc.so (nanosleep+12)native: #01 pc 0002dc21 /system/lib/libc.so (usleep+52)native: #02 pc 00009cab /system/lib/libsqlite.so (???)native: #03 pc 00011119 /system/lib/libsqlite.so (???)native: #04 pc 00016455 /system/lib/libsqlite.so (???)native: #16 pc 0000fa29 /system/lib/libsqlite.so (???)native: #17 pc 0000fad7 /system/lib/libsqlite.so (sqlite3_prepare16_v2+14)native: #18 pc 0007f671 /system/lib/libandroid_runtime.so (???)native: #19 pc 002b4721 /system/framework/arm/boot-framework.oat (Java_android_database_sqlite_SQLiteConnection_nativePrepareStatement__JLjava_lang_String_2+116)at android.database.sqlite.SQLiteConnection.setWalModeFromConfiguration(SQLiteConnection.java:294)at android.database.sqlite.SQLiteConnection.open(SQLiteConnection.java:215)at android.database.sqlite.SQLiteConnection.open(SQLiteConnection.java:193)at android.database.sqlite.SQLiteConnectionPool.openConnectionLocked(SQLiteConnectionPool.java:463)at android.database.sqlite.SQLiteConnectionPool.open(SQLiteConnectionPool.java:185)at android.database.sqlite.SQLiteConnectionPool.open(SQLiteConnectionPool.java:177)at android.database.sqlite.SQLiteDatabase.openInner(SQLiteDatabase.java:808)locked <0x0db193bf> (a java.lang.Object)at android.database.sqlite.SQLiteDatabase.open(SQLiteDatabase.java:793)at android.database.sqlite.SQLiteDatabase.openDatabase(SQLiteDatabase.java:696)at android.app.ContextImpl.openOrCreateDatabase(ContextImpl.java:690)at android.content.ContextWrapper.openOrCreateDatabase(ContextWrapper.java:299)at android.database.sqlite.SQLiteOpenHelper.getDatabaseLocked(SQLiteOpenHelper.java:223)at android.database.sqlite.SQLiteOpenHelper.getWritableDatabase(SQLiteOpenHelper.java:163)locked <0x045a4a8c> (a com.xxxx.video.common.data.DataBaseHelper)at com.xxxx.video.common.data.DataBaseORM.(DataBaseORM.java:46)at com.xxxx.video.common.data.DataBaseORM.getInstance(DataBaseORM.java:53)locked <0x017095d5> (a java.lang.Class)
binder数据量过大
07-21 04:43:21.573 1000 1488 12756 E Binder : Unreasonably large binder reply buffer: on android.content.pm.BaseParceledListSlice$1@770c74f calling 1 size 388568 (data: 1, 32, 7274595)07-21 04:43:21.573 1000 1488 12756 E Binder : android.util.Log$TerribleFailure: Unreasonably large binder reply buffer: on android.content.pm.BaseParceledListSlice$1@770c74f calling 1 size 388568 (data: 1, 32, 7274595)07-21 04:43:21.607 1000 1488 2951 E Binder : Unreasonably large binder reply buffer: on android.content.pm.BaseParceledListSlice$1@770c74f calling 1 size 211848 (data: 1, 23, 7274595)07-21 04:43:21.607 1000 1488 2951 E Binder : android.util.Log$TerribleFailure: Unreasonably large binder reply buffer: on android.content.pm.BaseParceledListSlice$1@770c74f calling 1 size 211848 (data: 1, 23, 7274595)07-21 04:43:21.662 1000 1488 6258 E Binder : Unreasonably large binder reply buffer: on android.content.pm.BaseParceledListSlice$1@770c74f calling 1 size 259300 (data: 1, 33, 7274595)
binder 通信失败
07-21 06:04:35.580 <6>[32837.690321] binder: 1698:2362 transaction failed 29189/-3, size 100-0 line 304207-21 06:04:35.594 <6>[32837.704042] binder: 1765:4071 transaction failed 29189/-3, size 76-0 line 304207-21 06:04:35.899 <6>[32838.009132] binder: 1765:4067 transaction failed 29189/-3, size 224-8 line 304207-21 06:04:36.018 <6>[32838.128903] binder: 1765:2397 transaction failed 29189/-22, size 348-0 line 2916
八、模拟触发ANR事件,获取trace文件
以下的几个案例,都是我通过代码故意触发ANR所产生的trace文件。
输入的事件在5秒内没有被响应,主线程被阻塞。
输入事件5s没有响应,如onClick事件。这是anr问题的主要类型,一般开发者不会犯这样的错,凡是耗时的操作都另开线程处理,如果疏忽了看看自己的代码就知道怎么处理。
(1)模拟触发ANR:
findViewById(R.id.btn_test).setOnClickListener(new View.OnClickListener() { @Override public void onClick(View v) { SystemClock.sleep(40000);//故意响应超时导致ANR } });
(2)对应的Trace文件关键日志:
DALVIK THREADS (14): //虚拟机线程被阻塞"Signal Catcher" daemon prio=5 tid=3 Runnable | group="system" sCount=0 dsCount=0 flags=0 obj=0x15980100 self=0x7322e16400 | sysTid=4978 nice=0 cgrp=default sched=0/0 handle=0x731c3304f0 | state=R schedstat=( 5583230 6778645 10 ) utm=0 stm=0 core=3 HZ=100 | stack=0x731c235000-0x731c237000 stackSize=1009KB | held mutexes= "mutator lock"(shared held)"main" prio=5 tid=1 Sleeping //主线程被阻塞 | group="main" sCount=1 dsCount=0 flags=1 obj=0x73e64b08 self=0x7322e14c00 | sysTid=4972 nice=-10 cgrp=default sched=0/0 handle=0x73a88cc548 | state=S schedstat=( 618097798 184039398 700 ) utm=56 stm=5 core=3 HZ=100 | stack=0x7fd05f2000-0x7fd05f4000 stackSize=8MB | held mutexes= at java.lang.Thread.sleep(Native method) - sleeping on <0x0d9c45cc> (a java.lang.Object) at java.lang.Thread.sleep(Thread.java:373) - locked <0x0d9c45cc> (a java.lang.Object) at java.lang.Thread.sleep(Thread.java:314) at android.os.SystemClock.sleep(SystemClock.java:127) //代码被阻塞的关键位置 at com.example.anrtest.MainActivity$1.onClick(MainActivity.java:57) //按钮的点击事件 at android.view.View.performClick(View.java:6597) at android.view.View.performClickInternal(View.java:6574) at android.view.View.access$3100(View.java:778) at android.view.View$PerformClick.run(View.java:25885) at android.os.Handler.handleCallback(Handler.java:873) at android.os.Handler.dispatchMessage(Handler.java:99) at android.os.Looper.loop(Looper.java:193) at android.app.ActivityThread.main(ActivityThread.java:6718) at java.lang.reflect.Method.invoke(Native method) at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493) at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:858)
广播接收者的onReceive方法在10秒内没有执行完毕。
BroadcastReceiver是在程序主线程运行,而且默认情况下BroadcastReceiver的运行时间为10s,所以不能有耗时操作,如果耗时超过10s就会导致anr,从traces文件log就可以看出onReceive不能进行耗时任务。
(1)模拟触发ANR:
IntentFilter intentFilter = new IntentFilter(); intentFilter.addAction("com.anr.test"); registerReceiver(new BroadcastReceiver() { @Override public void onReceive(Context context, Intent intent) { SystemClock.sleep(40000);//故意响应超时导致ANR } }, intentFilter);
(2)对应的Trace文件关键日志:
DALVIK THREADS (14): //虚拟机线程被阻塞"Signal Catcher" daemon prio=5 tid=3 Runnable | group="system" sCount=0 dsCount=0 flags=0 obj=0x13100088 self=0x7322e16400 | sysTid=6639 nice=0 cgrp=default sched=0/0 handle=0x731c3304f0 | state=R schedstat=( 6547603 5577708 11 ) utm=0 stm=0 core=0 HZ=100 | stack=0x731c235000-0x731c237000 stackSize=1009KB | held mutexes= "mutator lock"(shared held)"main" prio=5 tid=1 Sleeping //主线程被阻塞 | group="main" sCount=1 dsCount=0 flags=1 obj=0x73e64b08 self=0x7322e14c00 | sysTid=6633 nice=-10 cgrp=default sched=0/0 handle=0x73a88cc548 | state=S schedstat=( 605006551 138044643 533 ) utm=56 stm=3 core=2 HZ=100 | stack=0x7fd05f2000-0x7fd05f4000 stackSize=8MB | held mutexes= at java.lang.Thread.sleep(Native method) - sleeping on <0x08c05759> (a java.lang.Object) at java.lang.Thread.sleep(Thread.java:373) - locked <0x08c05759> (a java.lang.Object) at java.lang.Thread.sleep(Thread.java:314) at android.os.SystemClock.sleep(SystemClock.java:127) //代码被阻塞的关键位置 at com.example.anrtest.MainActivity$3.onReceive(MainActivity.java:80) //广播接收者的onReceive方法 at android.app.LoadedApk$ReceiverDispatcher$Args.lambda$getRunnable$0(LoadedApk.java:1391) at android.app.-$$Lambda$LoadedApk$ReceiverDispatcher$Args$_BumDX2UKsnxLVrE6UJsJZkotuA.run(lambda:-1) at android.os.Handler.handleCallback(Handler.java:873) at android.os.Handler.dispatchMessage(Handler.java:99) at android.os.Looper.loop(Looper.java:193) at android.app.ActivityThread.main(ActivityThread.java:6718) at java.lang.reflect.Method.invoke(Native method) at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493) at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:858)
BroadcastReceiver是Android的消息组件,用来组件之间、应用之间的消息传递,生命周期太短也不能开子线程处理耗时任务,耗时任务一般转交给IntentService或者JobIntentService去做。
服务Service没有及时响应
Service是计算型组件,虽然在后台运行,但是本质上它也跑在主线程,如果你的服务要做任何CPU密集型(如MP3播放)或阻塞(如网络)操作,都要放在子线程中,否则耗时超过20s就会导致anr。下面我在onStartCommand方法中休眠40秒。
(1)模拟触发ANR:
@Override public int onStartCommand(Intent intent, int flags, int startId) { SystemClock.sleep(40000);//故意响应超时导致ANR return super.onStartCommand(intent, flags, startId); }
(2)对应的Trace文件关键日志:
DALVIK THREADS (13): //虚拟机线程被阻塞"Signal Catcher" daemon prio=5 tid=3 Runnable | group="system" sCount=0 dsCount=0 flags=0 obj=0x13940108 self=0x7322e16400 | sysTid=7658 nice=0 cgrp=default sched=0/0 handle=0x731c3304f0 | state=R schedstat=( 5995782 10695989 10 ) utm=0 stm=0 core=3 HZ=100 | stack=0x731c235000-0x731c237000 stackSize=1009KB | held mutexes= "mutator lock"(shared held)"main" prio=5 tid=1 Sleeping //主线程被阻塞 | group="main" sCount=1 dsCount=0 flags=1 obj=0x73e64b08 self=0x7322e14c00 | sysTid=7652 nice=-10 cgrp=default sched=0/0 handle=0x73a88cc548 | state=S schedstat=( 595004855 101962599 596 ) utm=40 stm=18 core=2 HZ=100 | stack=0x7fd05f2000-0x7fd05f4000 stackSize=8MB | held mutexes= at java.lang.Thread.sleep(Native method) - sleeping on <0x0e096f2a> (a java.lang.Object) at java.lang.Thread.sleep(Thread.java:373) - locked <0x0e096f2a> (a java.lang.Object) at java.lang.Thread.sleep(Thread.java:314) at android.os.SystemClock.sleep(SystemClock.java:127) //代码被阻塞的关键位置 at com.example.anrtest.ANRService.onStartCommand(ANRService.java:17) //服务的onStartCommand方法 at android.app.ActivityThread.handleServiceArgs(ActivityThread.java:3705) at android.app.ActivityThread.access$1600(ActivityThread.java:200) at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1688) at android.os.Handler.dispatchMessage(Handler.java:106) at android.os.Looper.loop(Looper.java:193) at android.app.ActivityThread.main(ActivityThread.java:6718) at java.lang.reflect.Method.invoke(Native method) at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493) at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:858)
来源地址:https://blog.csdn.net/abc6368765/article/details/127220609
免责声明:
① 本站未注明“稿件来源”的信息均来自网络整理。其文字、图片和音视频稿件的所属权归原作者所有。本站收集整理出于非商业性的教育和科研之目的,并不意味着本站赞同其观点或证实其内容的真实性。仅作为临时的测试数据,供内部测试之用。本站并未授权任何人以任何方式主动获取本站任何信息。
② 本站未注明“稿件来源”的临时测试数据将在测试完成后最终做删除处理。有问题或投稿请发送至: 邮箱/279061341@qq.com QQ/279061341