Contents
- Preface
- OOMMonitorInitTask.INSTANCE.init
- OOMMonitor.INSTANCE.startLoop
- super.startLoop
- call() == LoopState.Terminate
- dumpAndAnalysis
- dump
- startAnalysisService
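- Back to the startLoop method
- Summary
Preface
This article walks through the design of KOOM's Java-layer source code.
For usage, see the previous post:
[Android KOOM] A complete guide to using KOOM Java leak detection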
OOMMonitorInitTask.INSTANCE.init
OOMMonitorInitTask.INSTANCE.init(JavaLeakTestActivity.this.getApplication());
Initialization happens here. Let's see what init does:
object OOMMonitorInitTask : InitTask {
  override fun init(application: Application) {
    val config = OOMMonitorConfig.Builder()
      .setThreadThreshold(50) // 50 only for test! Please use default value!
      .setFdThreshold(300) // 300 only for test! Please use default value!
      .setHeapThreshold(0.9f) // 0.9f for test! Please use default value!
      .setVssSizeThreshold(1_000_000) // 1_000_000 for test! Please use default value!
      .setMaxOverThresholdCount(1) // 1 for test! Please use default value!
      .setAnalysisMaxTimesPerVersion(3) // Consider use default value!
      .setAnalysisPeriodPerVersion(15 * 24 * 60 * 60 * 1000) // Consider use default value!
      .setLoopInterval(5_000) // 5_000 for test! Please use default value!
      .setEnableHprofDumpAnalysis(true)
      .setHprofUploader(object : OOMHprofUploader {
        override fun upload(file: File, type: OOMHprofUploader.HprofType) {
          MonitorLog.e("OOMMonitor", "todo, upload hprof ${file.name} if necessary")
        }
      })
      .setReportUploader(object : OOMReportUploader {
        override fun upload(file: File, content: String) {
          MonitorLog.i("OOMMonitor", content)
          MonitorLog.e("OOMMonitor", "todo, upload report ${file.name} if necessary")
        }
      })
      .build()
    MonitorManager.addMonitorConfig(config)
  }
}
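As you can see, init configures a range of parameters, including the upload callbacks for the hprof file and the report.
The parameters are set with the builder pattern, and the result is then registered through MonitorManager.addMonitorConfig(config),
which shows that MonitorManager is the class that manages the monitors.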
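To make the builder pattern mentioned above concrete, here is a minimal Java sketch. It is not KOOM's code: MonitorConfigSketch, its three fields, and the default values are hypothetical and chosen only for illustration.

```java
// Hypothetical stand-in for a monitor config; field names and defaults are assumptions.
final class MonitorConfigSketch {
  final int threadThreshold;
  final float heapThreshold;
  final long loopIntervalMs;

  private MonitorConfigSketch(Builder b) {
    this.threadThreshold = b.threadThreshold;
    this.heapThreshold = b.heapThreshold;
    this.loopIntervalMs = b.loopIntervalMs;
  }

  static final class Builder {
    // Defaults chosen for illustration only.
    private int threadThreshold = 100;
    private float heapThreshold = 0.8f;
    private long loopIntervalMs = 15_000L;

    // Each setter returns the builder itself so that calls can be chained.
    Builder setThreadThreshold(int value) { this.threadThreshold = value; return this; }
    Builder setHeapThreshold(float value) { this.heapThreshold = value; return this; }
    Builder setLoopInterval(long value) { this.loopIntervalMs = value; return this; }

    // Produces an immutable config from the values collected so far.
    MonitorConfigSketch build() { return new MonitorConfigSketch(this); }
  }
}
```

A caller overrides only what it needs, for example new MonitorConfigSketch.Builder().setThreadThreshold(50).build(), and every other field keeps its default. That is the main attraction of the pattern when a config has many optional parameters.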
interface InitTask {
  fun init(application: Application)
}
This interface defines how a monitoring task is initialized. It takes an Application parameter, although the implementation above never uses it.
OOMMonitor.INSTANCE.startLoop
OOMMonitor.INSTANCE.startLoop(true, false,5_000L);
With the parameters and callbacks configured above, this call starts the loop. Let's look at what it does.
object OOMMonitor : LoopMonitor<OOMMonitorConfig>(), LifecycleEventObserver {
  @Volatile
  private var mIsLoopStarted = false
  ...
  override fun startLoop(clearQueue: Boolean, postAtFront: Boolean, delayMillis: Long) {
    throwIfNotInitialized { return }
    if (!isMainProcess()) {
      return
    }
    MonitorLog.i(TAG, "startLoop()")
    if (mIsLoopStarted) {
      return
    }
    mIsLoopStarted = true
    super.startLoop(clearQueue, postAtFront, delayMillis)
    getLoopHandler().postDelayed({ async { processOldHprofFile() } }, delayMillis)
  }
  ...
}
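First comes a process check: if the current process is not the main process, the method returns immediately. Note that isMainProcess() checks the process, not the thread, so the loop only ever starts in the app's main process.
Next, look at mIsLoopStarted, which is marked @Volatile. Volatile guarantees visibility across threads: once one thread writes the variable, reads from other threads see the new value instead of a stale cached copy.
If the loop has already started, the method returns as well. These checks are defensive code.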
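The "start only once" guard described above can be sketched in plain Java. This is a hypothetical illustration rather than KOOM code: LoopGuardSketch and startCount are made-up names, and incrementing the counter stands in for posting the loop runnable.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of an idempotent start guard; not KOOM code.
final class LoopGuardSketch {
  // volatile: a write by one thread is visible to later reads by other threads.
  private volatile boolean started = false;

  // Counts how many times the real start work ran (for demonstration only).
  final AtomicInteger startCount = new AtomicInteger();

  void startLoop() {
    if (started) {
      return; // already running, ignore repeated calls
    }
    started = true;
    startCount.incrementAndGet(); // stands in for posting the loop runnable
  }
}
```

Calling startLoop() twice on the same object leaves startCount at 1. Keep in mind that volatile gives visibility, not atomicity: two threads racing through the check at the same moment could both pass it. Where that matters, AtomicBoolean.compareAndSet(false, true) closes the window.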
super.startLoop
Now look at super.startLoop:
open fun startLoop(
  clearQueue: Boolean = true,
  postAtFront: Boolean = false,
  delayMillis: Long = 0L
) {
  if (clearQueue) getLoopHandler().removeCallbacks(mLoopRunnable)
  if (postAtFront) {
    getLoopHandler().postAtFrontOfQueue(mLoopRunnable)
  } else {
    getLoopHandler().postDelayed(mLoopRunnable, delayMillis)
  }
  mIsLoopStopped = false
}
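Everything here revolves around mLoopRunnable. If clearQueue is set, any previously posted mLoopRunnable is removed first. Then, depending on postAtFront, the runnable is either posted at the front of the message queue or posted with a delay. We will come back to this shortly; first, let's see where the Handler comes from.
Following the call leads here: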
package com.kwai.koom.base.loop

import android.os.Handler
import android.os.HandlerThread
import android.os.Process.THREAD_PRIORITY_BACKGROUND

internal object LoopThread : HandlerThread("LoopThread", THREAD_PRIORITY_BACKGROUND) {
  init {
    start()
  }

  internal val LOOP_HANDLER = Handler(LoopThread.looper)
}
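This is a HandlerThread, that is, a thread that owns its own Looper. LoopThread calls start() in its init block, so the thread is started as soon as the object is initialized.
Next, look at mLoopRunnable: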
protected open fun getLoopInterval(): Long {
  return DEFAULT_LOOP_INTERVAL
}

companion object {
  private const val DEFAULT_LOOP_INTERVAL = 1000L
}

private val mLoopRunnable = object : Runnable {
  override fun run() {
    if (call() == LoopState.Terminate) {
      return
    }
    if (mIsLoopStopped) {
      return
    }
    getLoopHandler().removeCallbacks(this)
    getLoopHandler().postDelayed(this, getLoopInterval())
  }
}
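The runnable fetches the handler and re-posts itself with postDelayed. The interval comes from getLoopInterval(), which defaults to 1 second (DEFAULT_LOOP_INTERVAL = 1000L) and is open for subclasses to override.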
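The self-rescheduling loop can be simulated without Android. The sketch below is a simplified assumption, not KOOM code: a plain ArrayDeque plays the role of the Handler's message queue, delays are ignored, everything runs on one thread, and the names LoopSketch and drain are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical single-threaded stand-in for Handler + LoopMonitor; delays are ignored.
final class LoopSketch implements Runnable {
  enum LoopState { Continue, Terminate }

  private final Deque<Runnable> queue = new ArrayDeque<>(); // plays the message queue
  private final Supplier<LoopState> call;                   // plays call()
  private boolean stopped = false;

  LoopSketch(Supplier<LoopState> call) { this.call = call; }

  @Override
  public void run() {
    if (call.get() == LoopState.Terminate) {
      return; // not re-posted, so the loop ends here
    }
    if (stopped) {
      return;
    }
    queue.remove(this); // drop any duplicate before re-posting
    queue.add(this);    // corresponds to postDelayed(this, getLoopInterval())
  }

  void startLoop() {
    queue.remove(this);
    queue.add(this);
    stopped = false;
  }

  void stopLoop() { stopped = true; }

  // Dispatches queued runnables until the queue is empty, like a Looper would.
  void drain() {
    Runnable next;
    while ((next = queue.poll()) != null) {
      next.run();
    }
  }
}
```

With a supplier that returns Continue twice and then Terminate, startLoop() followed by drain() invokes the supplier exactly three times. The third result stops the runnable from re-posting itself, which is the same mechanism OOMMonitor relies on to end its loop.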
call() == LoopState.Terminate
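This line is the key. call() runs on every iteration; if it returns LoopState.Terminate, the runnable returns without re-posting itself and the loop ends.
Here is OOMMonitor's implementation: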
override fun call(): LoopState {
  if (!sdkVersionMatch()) {
    return LoopState.Terminate
  }
  if (mHasDumped) {
    return LoopState.Terminate
  }
  return trackOOM()
}
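If the SDK version is not supported, or a dump has already been done (mHasDumped), it returns Terminate. Otherwise it goes on to trackOOM: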
private fun trackOOM(): LoopState {
  SystemInfo.refresh()
  mTrackReasons.clear()
  for (oomTracker in mOOMTrackers) {
    if (oomTracker.track()) {
      mTrackReasons.add(oomTracker.reason())
    }
  }
  if (mTrackReasons.isNotEmpty() && monitorConfig.enableHprofDumpAnalysis) {
    if (isExceedAnalysisPeriod() || isExceedAnalysisTimes()) {
      MonitorLog.e(TAG, "Triggered, but exceed analysis times or period!")
    } else {
      async {
        MonitorLog.i(TAG, "mTrackReasons:${mTrackReasons}")
        dumpAndAnalysis()
      }
    }
    return LoopState.Terminate
  }
  return LoopState.Continue
}
Look at the refresh method:
var procStatus = ProcStatus()
var lastProcStatus = ProcStatus()
var memInfo = MemInfo()
var lastMemInfo = MemInfo()
var javaHeap = JavaHeap()
var lastJavaHeap = JavaHeap()

fun refresh() {
  lastJavaHeap = javaHeap
  lastMemInfo = memInfo
  lastProcStatus = procStatus
  javaHeap = JavaHeap()
  procStatus = ProcStatus()
  memInfo = MemInfo()
  javaHeap.max = Runtime.getRuntime().maxMemory()
  javaHeap.total = Runtime.getRuntime().totalMemory()
  javaHeap.free = Runtime.getRuntime().freeMemory()
  javaHeap.used = javaHeap.total - javaHeap.free
  javaHeap.rate = 1.0f * javaHeap.used / javaHeap.max
  File("/proc/self/status").forEachLineQuietly { line ->
    if (procStatus.vssInKb != 0 && procStatus.rssInKb != 0
      && procStatus.thread != 0) return@forEachLineQuietly
    when {
      line.startsWith("VmSize") -> {
        procStatus.vssInKb = VSS_REGEX.matchValue(line)
      }
      line.startsWith("VmRSS") -> {
        procStatus.rssInKb = RSS_REGEX.matchValue(line)
      }
      line.startsWith("Threads") -> {
        procStatus.thread = THREADS_REGEX.matchValue(line)
      }
    }
  }
  File("/proc/meminfo").forEachLineQuietly { line ->
    when {
      line.startsWith("MemTotal") -> {
        memInfo.totalInKb = MEM_TOTAL_REGEX.matchValue(line)
      }
      line.startsWith("MemFree") -> {
        memInfo.freeInKb = MEM_FREE_REGEX.matchValue(line)
      }
      line.startsWith("MemAvailable") -> {
        memInfo.availableInKb = MEM_AVA_REGEX.matchValue(line)
      }
      line.startsWith("CmaTotal") -> {
        memInfo.cmaTotal = MEM_CMA_REGEX.matchValue(line)
      }
      line.startsWith("ION_heap") -> {
        memInfo.IONHeap = MEM_ION_REGEX.matchValue(line)
      }
    }
  }
  memInfo.rate = 1.0f * memInfo.availableInKb / memInfo.totalInKb
  MonitorLog.i(TAG, "----OOM Monitor Memory----")
  MonitorLog.i(TAG, "[java] max:${javaHeap.max} used ratio:${(javaHeap.rate * 100).toInt()}%")
  MonitorLog.i(TAG, "[proc] VmSize:${procStatus.vssInKb}kB VmRss:${procStatus.rssInKb}kB " + "Threads:${procStatus.thread}")
  MonitorLog.i(TAG, "[meminfo] MemTotal:${memInfo.totalInKb}kB MemFree:${memInfo.freeInKb}kB " + "MemAvailable:${memInfo.availableInKb}kB")
  MonitorLog.i(TAG, "avaliable ratio:${(memInfo.rate * 100).toInt()}% CmaTotal:${memInfo.cmaTotal}kB ION_heap:${memInfo.IONHeap}kB")
}
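SystemInfo holds several data classes for the Java heap, system memory, and process status. refresh() keeps the previous snapshot, then collects fresh values from Runtime, /proc/self/status, and /proc/meminfo, and finally logs them. Note that it only reads these files; it does not write to them.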
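As a rough illustration of how lines from /proc/self/status can be turned into numbers, here is a small Java sketch. The regular expressions are my own assumptions, because the article does not show KOOM's VSS_REGEX, RSS_REGEX, or THREADS_REGEX constants, and ProcStatusSketch is a hypothetical name.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the regexes are assumptions, not KOOM's actual constants.
final class ProcStatusSketch {
  private static final Pattern VSS = Pattern.compile("VmSize:\\s*(\\d+)\\s*kB");
  private static final Pattern RSS = Pattern.compile("VmRSS:\\s*(\\d+)\\s*kB");
  private static final Pattern THREADS = Pattern.compile("Threads:\\s*(\\d+)");

  int vssInKb;
  int rssInKb;
  int threads;

  // Returns the first captured number, or 0 when the line does not match.
  private static int matchValue(Pattern pattern, String line) {
    Matcher m = pattern.matcher(line);
    return m.find() ? Integer.parseInt(m.group(1)) : 0;
  }

  // Scans the lines once and fills in the three fields that are of interest.
  static ProcStatusSketch parse(List<String> lines) {
    ProcStatusSketch s = new ProcStatusSketch();
    for (String line : lines) {
      if (line.startsWith("VmSize")) s.vssInKb = matchValue(VSS, line);
      else if (line.startsWith("VmRSS")) s.rssInKb = matchValue(RSS, line);
      else if (line.startsWith("Threads")) s.threads = matchValue(THREADS, line);
    }
    return s;
  }

  // Same formula as javaHeap.rate in refresh(): used / max, with used = total - free.
  static float heapRate(long max, long total, long free) {
    return 1.0f * (total - free) / max;
  }
}
```

On a Linux or Android device the input could come from Files.readAllLines(Paths.get("/proc/self/status")). The heap ratio needs no file at all, since it uses only the three values reported by Runtime.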
Next, mOOMTrackers holds the individual trackers:
private val mOOMTrackers = mutableListOf(
  HeapOOMTracker(), ThreadOOMTracker(), FdOOMTracker(),
  PhysicalMemoryOOMTracker(), FastHugeMemoryOOMTracker()
)
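Their abstract parent class is:
abstract class OOMTracker : Monitor<OOMMonitorConfig>() {
  /**
   * @return true if an OOM condition was tracked, false otherwise
   */
  abstract fun track(): Boolean

  /**
   * Resets the tracking state
   */
  abstract fun reset()

  /**
   * @return the identifier of the tracked OOM condition
   */
  abstract fun reason(): String
}
How each tracker actually does its tracking is out of scope for this article and is left for a later post.
Back to trackOOM: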
mTrackReasons.clear()
for (oomTracker in mOOMTrackers) {
  if (oomTracker.track()) {
    mTrackReasons.add(oomTracker.reason())
  }
}
if (mTrackReasons.isNotEmpty() && monitorConfig.enableHprofDumpAnalysis) {
  if (isExceedAnalysisPeriod() || isExceedAnalysisTimes()) {
    MonitorLog.e(TAG, "Triggered, but exceed analysis times or period!")
  } else {
    async {
      MonitorLog.i(TAG, "mTrackReasons:${mTrackReasons}")
      dumpAndAnalysis()
    }
  }
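If a tracker reports a hit, its reason is added to mTrackReasons.
If the analysis period or the maximum number of analyses has been exceeded, an error is logged. Otherwise mTrackReasons is logged and dumpAndAnalysis runs asynchronously. In both cases the method then returns LoopState.Terminate.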
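The "ask every tracker, collect the reasons, decide whether to continue" flow can be sketched as follows. This is a simplified assumption and not KOOM code: TrackSketch, Tracker, and fixed are hypothetical names, and the checks on enableHprofDumpAnalysis and on the analysis period and times are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the tracker loop; the names echo the article but this is not KOOM code.
final class TrackSketch {
  enum LoopState { Continue, Terminate }

  interface Tracker {
    boolean track();  // true when this tracker sees its threshold exceeded
    String reason();  // short label describing what was exceeded
  }

  // Helper for the demo: a tracker with a fixed outcome.
  static Tracker fixed(boolean hit, String reason) {
    return new Tracker() {
      @Override public boolean track() { return hit; }
      @Override public String reason() { return reason; }
    };
  }

  final List<String> reasons = new ArrayList<>();
  private final List<Tracker> trackers;

  TrackSketch(List<Tracker> trackers) { this.trackers = trackers; }

  LoopState trackOnce() {
    reasons.clear();
    for (Tracker t : trackers) {
      if (t.track()) reasons.add(t.reason());
    }
    // Simplification: any hit ends the loop. The real method also checks
    // enableHprofDumpAnalysis and the analysis period/times before dumping.
    return reasons.isEmpty() ? LoopState.Continue : LoopState.Terminate;
  }
}
```

With one tracker that hits ("heap") and one that does not ("fd"), trackOnce() returns Terminate and reasons contains only "heap". With no hits it returns Continue, so the loop runnable would schedule itself again.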
Now let's take a closer look at dumpAndAnalysis:
dumpAndAnalysis
private fun dumpAndAnalysis() {
  MonitorLog.i(TAG, "dumpAndAnalysis");
  runCatching {
    if (!OOMFileManager.isSpaceEnough()) {
      MonitorLog.e(TAG, "available space not enough", true)
      return@runCatching
    }
    if (mHasDumped) {
      return
    }
    mHasDumped = true
    val date = Date()
    val jsonFile = OOMFileManager.createJsonAnalysisFile(date)
    val hprofFile = OOMFileManager.createHprofAnalysisFile(date).apply {
      createNewFile()
      setWritable(true)
      setReadable(true)
    }
    MonitorLog.i(TAG, "hprof analysis dir:$hprofAnalysisDir")
    ForkJvmHeapDumper.getInstance().run {
      dump(hprofFile.absolutePath)
    }
    MonitorLog.i(TAG, "end hprof dump", true)
    Thread.sleep(1000) // make sure file synced to disk.
    MonitorLog.i(TAG, "start hprof analysis")
    startAnalysisService(hprofFile, jsonFile, mTrackReasons.joinToString())
  }.onFailure {
    it.printStackTrace()
    MonitorLog.i(TAG, "onJvmThreshold Exception " + it.message, true)
  }
}
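This is where the output files are created: a JSON file for the analysis report and an hprof file that receives the heap dump. The dump method is the important part: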
dump
@Override
public synchronized boolean dump(String path) {
  MonitorLog.i(TAG, "dump " + path);
  if (!sdkVersionMatch()) {
    throw new UnsupportedOperationException("dump failed caused by sdk version not supported!");
  }
  init();
  if (!mLoadSuccess) {
    MonitorLog.e(TAG, "dump failed caused by so not loaded!");
    return false;
  }
  boolean dumpRes = false;
  try {
    MonitorLog.i(TAG, "before suspend and fork.");
    int pid = suspendAndFork();
    if (pid == 0) {
      // Child process
      Debug.dumpHprofData(path);
      exitProcess();
    } else if (pid > 0) {
      // Parent process
      dumpRes = resumeAndWait(pid);
      MonitorLog.i(TAG, "dump " + dumpRes + ", notify from pid " + pid);
    }
  } catch (IOException e) {
    MonitorLog.e(TAG, "dump failed caused by " + e);
    e.printStackTrace();
  }
  return dumpRes;
}
The init method:
private void init() {
  if (mLoadSuccess) {
    return;
  }
  if (loadSoQuietly("koom-fast-dump")) {
    mLoadSuccess = true;
    nativeInit();
  }
}
This loads a native .so library. The class also declares these native methods:
/**
 * Init before do dump.
 */
private native void nativeInit();

/**
 * Suspend the whole ART, and then fork a process for dumping hprof.
 *
 * @return return value of fork
 */
private native int suspendAndFork();

/**
 * Resume the whole ART, and then wait child process to notify.
 *
 * @param pid pid of child process.
 */
private native boolean resumeAndWait(int pid);

/**
 * Exit current process.
 */
private native void exitProcess();
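Next, suspendAndFork is called, which is also a native method. According to its doc comment, it suspends the whole ART runtime and then forks the current process, returning the return value of fork: 0 in the child process and the child's pid in the parent. The child process then dumps the hprof file.
As for why a separate process is forked for the dump, the screenshot above shows the reason: dumping hprof data triggers a GC, and the GC triggers a stop-the-world (STW) pause, which inevitably makes the app freeze. This is the main reason LeakCanary cannot serve as an online memory monitor, and it is the problem KOOM solves.
Once the child process has finished the dump, it calls exitProcess to exit.
If pid > 0, we are in the parent process: resumeAndWait resumes the whole ART runtime and then waits for the child process to notify it.
My explanation here is not very precise, because I cannot see the code of the .so and so cannot confirm it. If anyone knows the details, pointers are welcome and appreciated.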
startAnalysisService
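After the child process has been forked and the dump has returned, Thread.sleep(1000) runs, with the comment "make sure file synced to disk".
Next comes the analysis of the heap dump: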
private fun startAnalysisService(
  hprofFile: File,
  jsonFile: File,
  reason: String
) {
  if (hprofFile.length() == 0L) {
    hprofFile.delete()
    MonitorLog.i(TAG, "hprof file size 0", true)
    return
  }
  if (!getApplication().isForeground) {
    MonitorLog.e(TAG, "try startAnalysisService, but not foreground")
    mForegroundPendingRunnables.add(Runnable {
      startAnalysisService(
        hprofFile,
        jsonFile,
        reason
      )
    })
    return
  }
  OOMPreferenceManager.increaseAnalysisTimes()
  val extraData = AnalysisExtraData().apply {
    this.reason = reason
    this.currentPage = getApplication().currentActivity?.localClassName.orEmpty()
    this.usageSeconds = "${(SystemClock.elapsedRealtime() - mMonitorInitTime) / 1000}"
  }
  HeapAnalysisService.startAnalysisService(
    getApplication(),
    hprofFile.canonicalPath,
    jsonFile.canonicalPath,
    extraData,
    object : AnalysisReceiver.ResultCallBack {
      override fun onError() {
        MonitorLog.e(TAG, "heap analysis error, do file delete", true)
        hprofFile.delete()
        jsonFile.delete()
      }

      override fun onSuccess() {
        MonitorLog.i(TAG, "heap analysis success, do upload", true)
        val content = jsonFile.readText()
        MonitorLogger.addExceptionEvent(content, Logger.ExceptionType.OOM_STACKS)
        monitorConfig.reportUploader?.upload(jsonFile, content)
        monitorConfig.hprofUploader?.upload(hprofFile, OOMHprofUploader.HprofType.ORIGIN)
      }
    })
}
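This is where the dump is parsed and organized. An empty hprof file is deleted. If the app is not in the foreground, the call is queued in mForegroundPendingRunnables and the method returns. Otherwise HeapAnalysisService is started. On success, the report and the hprof file are handed to the uploader callbacks, which gives developers a hook for uploading them to a server. Very considerate.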
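The "defer until the app is in the foreground" idea can be sketched like this. The article does not show where mForegroundPendingRunnables is drained, so this sketch assumes a lifecycle callback does it. ForegroundGateSketch and its methods are hypothetical names and not KOOM code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of deferring work until the app is in the foreground.
final class ForegroundGateSketch {
  private boolean foreground = false;
  private final List<Runnable> pending = new ArrayList<>();

  // Runs the task now when in the foreground, otherwise keeps it for later.
  void runWhenForeground(Runnable task) {
    if (!foreground) {
      pending.add(task);
      return;
    }
    task.run();
  }

  // Assumed lifecycle callback: flush everything that was deferred.
  void onForeground() {
    foreground = true;
    List<Runnable> toRun = new ArrayList<>(pending); // copy, tasks may enqueue again
    pending.clear();
    for (Runnable r : toRun) {
      r.run();
    }
  }

  void onBackground() { foreground = false; }

  int pendingCount() { return pending.size(); }
}
```

A task submitted while in the background stays in the pending list and runs only after onForeground() is called. A task submitted afterwards runs immediately.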
That covers most of the core Java-layer logic of the KOOM framework.
Back to the startLoop method
Back in startLoop, the line after the super.startLoop call is:
getLoopHandler().postDelayed({ async { processOldHprofFile() } }, delayMillis)
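From the earlier analysis, getLoopHandler returns the Handler bound to the HandlerThread's looper. A runnable is posted to it with a delay, and inside that runnable the work is handed off through async so that it runs asynchronously.
The part to focus on is processOldHprofFile.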
object OOMMonitor : LoopMonitor<OOMMonitorConfig>(), LifecycleEventObserver {
  private const val TAG = "OOMMonitor"
  ...
  private fun processOldHprofFile() {
    MonitorLog.i(TAG, "processHprofFile")
    if (mHasProcessOldHprof) {
      return
    }
    mHasProcessOldHprof = true;
    reAnalysisHprof()
    manualDumpHprof()
  }
  ...
  private fun reAnalysisHprof() {
    for (file in hprofAnalysisDir.listFiles().orEmpty()) {
      if (!file.exists()) continue
      if (!file.name.startsWith(MonitorBuildConfig.VERSION_NAME)) {
        MonitorLog.i(TAG, "delete other version files ${file.name}")
        file.delete()
        continue
      }
      if (file.canonicalPath.endsWith(".hprof")) {
        val jsonFile = File(file.canonicalPath.replace(".hprof", ".json"))
        if (!jsonFile.exists()) {
          MonitorLog.i(TAG, "create json file and then start service")
          jsonFile.createNewFile()
          startAnalysisService(file, jsonFile, "reanalysis")
        } else {
          MonitorLog.i(
            TAG,
            if (jsonFile.length() == 0L) "last analysis isn't succeed, delete file"
            else "delete old files", true
          )
          jsonFile.delete()
          file.delete()
        }
      }
    }
  }

  private fun manualDumpHprof() {
    for (hprofFile in manualDumpDir.listFiles().orEmpty()) {
      MonitorLog.i(TAG, "manualDumpHprof upload:${hprofFile.absolutePath}")
      monitorConfig.hprofUploader?.upload(hprofFile, OOMHprofUploader.HprofType.STRIPPED)
    }
  }
}
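This code handles the files left over from earlier dumps. Files whose names do not start with the current version are deleted. An hprof file with no matching JSON file is analyzed again. If the JSON file already exists, both files are deleted. manualDumpHprof then passes every file in the manual dump directory to the hprof uploader.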
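The per-file decision in reAnalysisHprof can be expressed as a small pure function, which makes the branches easier to see. This is a hypothetical Java restatement: HprofCleanupSketch, Action, and decide are illustrative names, and the function only returns a decision without touching any file.

```java
// Hypothetical restatement of the per-file decision in reAnalysisHprof(); not KOOM code.
final class HprofCleanupSketch {
  enum Action { DELETE_OTHER_VERSION, REANALYZE, DELETE_BOTH, IGNORE }

  // fileName:   name of a file in the analysis directory
  // version:    current app version, used as the file name prefix
  // jsonExists: whether the matching .json report already exists
  static Action decide(String fileName, String version, boolean jsonExists) {
    if (!fileName.startsWith(version)) {
      return Action.DELETE_OTHER_VERSION; // left over from another app version
    }
    if (!fileName.endsWith(".hprof")) {
      return Action.IGNORE; // only hprof files drive the decision
    }
    // No report yet: the dump was never analyzed, so analyze it now.
    // Report present: the pair was processed (or failed) earlier, so delete both.
    return jsonExists ? Action.DELETE_BOTH : Action.REANALYZE;
  }
}
```

For example, with version "1.0.0", a file named "1.0.0_a.hprof" without a report yields REANALYZE, the same file with an existing report yields DELETE_BOTH, and "0.9_a.hprof" yields DELETE_OTHER_VERSION. The file names here are made up for the example.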
Summary
At this point, the two lines of code that start the monitoring have been fully analyzed:
/*
* Init OOMMonitor
*/
OOMMonitorInitTask.INSTANCE.init(JavaLeakTestActivity.this.getApplication());
OOMMonitor.INSTANCE.startLoop(true, false,5_000L);
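Two very simple lines of code, with a great deal of logic and elegant design behind them.
Often, the simpler an open-source framework is to use, the more it says about the skill of its authors. They pull the complicated logic inside the framework so that users can implement complex behavior in one or two lines.
As an online memory monitoring framework, KOOM has many excellent design ideas. This article only covers the surface-level logic of the outer layer; deeper topics will follow in later posts.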