Android Live-LocK Daemon
========================

Introduction
------------

Android Live-LocK Daemon (llkd) catches kernel deadlocks and mitigates them.

Code is structured to allow integration into another service either as part
of the main loop, or spun off as a thread should that be necessary. A default
standalone implementation is provided by the llkd component.

The 'C' interface from the libllkd component is thus:

    #include "llkd.h"
    bool llkInit(const char* threadname) /* return true if enabled */
    unsigned llkCheckMilliseconds(void)  /* ms to sleep for next check */

If a threadname is provided, a thread will be automatically spawned; otherwise
the caller must call llkCheckMilliseconds in its main loop. The function returns
the period of time before the next expected call to this handler.

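As a minimal sketch of the two integration modes described above, the helpers below take the libllkd entry points as function pointers so that the fragment compiles without libllkd. The names `host_start_llk`, `host_next_sleep_ms`, the thread name "llkd-watch" and the `fake_*` stand-ins are hypothetical and exist only for illustration; a real service would include llkd.h and pass llkInit and llkCheckMilliseconds directly.

```c
#include <stdbool.h>
#include <stddef.h>

/* Function-pointer types matching the two prototypes listed above. */
typedef bool (*llk_init_fn)(const char* threadname);
typedef unsigned (*llk_check_fn)(void);

/* Hypothetical helper: a non-NULL thread name asks the library to spawn its
 * own thread; NULL means the host drives the checks from its main loop.
 * Returns whatever the init function reports (true if enabled). */
bool host_start_llk(llk_init_fn init, bool own_thread) {
    return init(own_thread ? "llkd-watch" : NULL);
}

/* Hypothetical helper: one main-loop pass. Runs the check and returns how
 * long the host may sleep: the smaller of the period returned by the check
 * and the host's own deadline. */
unsigned host_next_sleep_ms(llk_check_fn check, unsigned host_deadline_ms) {
    unsigned llk_ms = check();
    return llk_ms < host_deadline_ms ? llk_ms : host_deadline_ms;
}

/* Stand-ins so the sketch runs without libllkd; NOT the real functions. */
bool fake_init(const char* threadname) {
    (void)threadname;
    return true;
}
unsigned fake_check(void) { return 120000u; }
```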

Operations
----------

There are two detection scenarios: persistent D or Z state, and persistent
stack signature.

If a thread is in D or Z state with no forward progress for longer than
ro.llk.timeout_ms, or ro.llk.[D|Z].timeout_ms, llkd kills the process or parent
process respectively. If another scan shows that the same process continues to
exist, the live-lock condition is confirmed and llkd panics the kernel in a
manner that provides the greatest bug-reporting detail about the condition. An
alarm-based self watchdog, set to double the expected time to flow through the
main loop, guards against llkd itself ever getting locked up. Sampling is every
ro.llk.check_ms.

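The kill-then-panic escalation described above can be sketched as a small per-thread state machine. This is a simplified illustration, not the daemon's actual code: `llk_track`, `llk_sample` and the `LLK_*` actions are hypothetical names, and the sketch assumes that being sampled again after a kill means the process still exists.

```c
#include <stdbool.h>

enum llk_action { LLK_NONE, LLK_KILL, LLK_PANIC };

/* Hypothetical per-thread bookkeeping; the real daemon tracks more. */
struct llk_track {
    unsigned stuck_ms; /* time spent in D or Z without forward progress */
    bool kill_sent;    /* a kill was already issued for this condition */
};

/* Feed one sample taken every check_ms. `state` is the scheduler state
 * letter; `progressed` says whether forward progress was seen since the
 * previous sample. */
enum llk_action llk_sample(struct llk_track* t, char state, bool progressed,
                           unsigned check_ms, unsigned timeout_ms) {
    if ((state != 'D' && state != 'Z') || progressed) {
        /* Healthy again: forget any accumulated history. */
        t->stuck_ms = 0;
        t->kill_sent = false;
        return LLK_NONE;
    }
    t->stuck_ms += check_ms;
    if (t->stuck_ms <= timeout_ms) return LLK_NONE; /* not yet "longer than" */
    if (!t->kill_sent) {
        t->kill_sent = true; /* first mitigation: kill */
        return LLK_KILL;
    }
    return LLK_PANIC; /* still present after the kill: confirmed live-lock */
}
```

With a 2 minute sample period and a 10 minute timeout, the sixth consecutive stuck sample requests a kill and the seventh requests a panic.
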
For userdebug releases only, persistent stack signature checking is enabled.
If a thread in any state but Z has a listed ro.llk.stack kernel symbol
persistently reported, even if there is forward scheduling progress, for
longer than ro.llk.timeout_ms, or ro.llk.stack.timeout_ms, llkd issues a kill
to the process. If another scan shows that the same process continues to exist,
the live-lock condition is confirmed and llkd panics the kernel. There is no
ABA detection since forward scheduling progress is allowed, thus the conditions
for the symbols are:

- Check is looking for " __symbol__+0x" or " __symbol__.cfi+0x" in
  /proc/__pid__/stack.
- The __symbol__ should be rare and short-lived enough that on a typical
  system the function is seen at most once in a sample over the timeout
  period of ro.llk.stack.timeout_ms; samples occur every ro.llk.check_ms. This
  can be the only way to prevent a false trigger as there is no ABA protection.
- The __symbol__ should be continuously present while the live-lock condition
  exists.
- The __symbol__ should be just below the function that is calling the lock
  that could contend, because if the lock is below or in the symbol function,
  the symbol will show in all affected processes, not just the one that
  caused the lockup.

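The matching rule in the first bullet can be sketched as a plain substring search. This is an illustrative sketch only: `stack_has_symbol` is a hypothetical name, and the caller is assumed to have already read the contents of /proc/__pid__/stack into a string.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if `stack_text` contains " <symbol>+0x" or " <symbol>.cfi+0x".
 * The leading space keeps "cma_alloc" from matching "my_cma_alloc". */
bool stack_has_symbol(const char* stack_text, const char* symbol) {
    char needle[128];
    int n = snprintf(needle, sizeof(needle), " %s+0x", symbol);
    if (n < 0 || (size_t)n >= sizeof(needle)) return false; /* too long */
    if (strstr(stack_text, needle) != NULL) return true;

    n = snprintf(needle, sizeof(needle), " %s.cfi+0x", symbol);
    if (n < 0 || (size_t)n >= sizeof(needle)) return false;
    return strstr(stack_text, needle) != NULL;
}
```

A persistent-signature check would call this once per sample for every listed symbol and only act when the symbol is present in all samples across the timeout period.
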
By default, llkd will not monitor init, or [kthreadd] and all that [kthreadd]
spawns. This reduces the effectiveness of llkd by limiting its coverage. If
there is value in covering [kthreadd] spawned threads, the requirement will be
that the drivers not remain in a persistent 'D' state, or that they have
mechanisms to recover the thread should it be killed externally (this is good
driver coding hygiene, and a common request for publicly reviewed kernel.org
maintained drivers). For instance, use wait_event_interruptible() instead of
wait_event(). The blacklists can be adjusted accordingly if these
conditions are met to cover kernel components. For the stack symbol checking,
there is an additional process blacklist so that we do not incite sepolicy
violations on services that block ptrace operations.

An accompanying gTest set has been added, and will set up a persistent D or Z
process, with and without forward progress, but not in a live-lock state
because that would require a buggy kernel, or a module or kernel modification
to stimulate. The test will check that llkd mitigates first by killing
the appropriate process. D state is set up by vfork() waiting for exec() in the
child process. Z state is set up by fork() and an un-waited-for child process.
It should be noted that neither of these conditions should ever happen on
Android on purpose, and llkd effectively sweeps up processes that create these
conditions. If the test can, it will reconfigure llkd to expedite the test
duration by adjusting the ro.llk.* Android properties. Tests run the D state
with some scheduling progress to ensure that ABA checking prevents false
triggers. If ABA detection is 100% reliable on the platform, then
ro.llk.killtest can be set to false; however, this will cause some of the unit
tests to panic the kernel instead of dealing with the more graceful kill
operation.

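The Z state setup mentioned above (fork() and an un-waited-for child) can be reproduced in a few lines. This is a Linux-only sketch and not the gTest code itself: `proc_state` and `zombie_child_state` are hypothetical helper names, the state letter is assumed to be read from /proc/__pid__/stat, and unlike the test the sketch reaps the child at the end so that no zombie is left behind.

```c
#define _POSIX_C_SOURCE 200809L /* for waitid() under strict C modes */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read the state letter of `pid` from /proc/<pid>/stat.
 * Returns '?' if the file cannot be read or parsed. */
char proc_state(pid_t pid) {
    char path[64];
    char buf[512];
    snprintf(path, sizeof(path), "/proc/%ld/stat", (long)pid);
    FILE* f = fopen(path, "r");
    if (!f) return '?';
    char* line = fgets(buf, sizeof(buf), f);
    fclose(f);
    if (!line) return '?';
    /* The state letter follows the last ')' that closes the comm field. */
    char* close_paren = strrchr(buf, ')');
    if (!close_paren || close_paren[1] != ' ' || close_paren[2] == '\0') {
        return '?';
    }
    return close_paren[2];
}

/* Create a zombie on purpose, sample its state, then reap it. */
char zombie_child_state(void) {
    pid_t pid = fork();
    if (pid < 0) return '?';
    if (pid == 0) _exit(0); /* child exits immediately */

    siginfo_t info;
    /* WNOWAIT blocks until the child has exited but leaves it un-reaped,
     * so it is guaranteed to be a zombie when sampled below. */
    int rc = waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT);
    char state = (rc == 0) ? proc_state(pid) : '?';
    waitpid(pid, NULL, 0); /* reap the child */
    return state;
}
```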

Android Properties
------------------

The following are the Android Properties llkd responds to.
*prop*_ms named properties are in milliseconds.
Properties that use comma (*,*) separator for lists, use a leading separator to
preserve default and add or subtract entries with (*optional*) plus (*+*) and
minus (*-*) prefixes respectively.
For these lists, the string "*false*" is synonymous with an *empty* list,
and *blank* or *missing* resorts to the specified *default* value.

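The list rules above can be made concrete with a small resolver. This is a sketch of the rules as stated in this section, not the daemon's parser: `resolve_list`, the helper names and the `LIST_MAX` buffer size are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define LIST_MAX 256 /* hypothetical buffer size for this sketch */

/* Return a pointer to `item` inside comma-separated `list`, or NULL. */
static char* find_item(char* list, const char* item) {
    size_t n = strlen(item);
    char* p = list;
    while (n && (p = strstr(p, item)) != NULL) {
        bool starts = (p == list) || (p[-1] == ',');
        bool ends = (p[n] == ',') || (p[n] == '\0');
        if (starts && ends) return p;
        p++;
    }
    return NULL;
}

static void add_item(char* list, const char* item) {
    if (!*item || find_item(list, item)) return;
    size_t len = strlen(list);
    if (len + strlen(item) + 2 > LIST_MAX) return; /* drop on overflow */
    if (len) list[len++] = ',';
    strcpy(list + len, item);
}

static void remove_item(char* list, const char* item) {
    char* p = find_item(list, item);
    if (!p) return;
    char* rest = p + strlen(item);
    if (*rest == ',') rest++; /* drop the separator after the item */
    else if (p != list) p--;  /* last item: drop the separator before it */
    memmove(p, rest, strlen(rest) + 1);
}

/* Resolve a list property. `value` is the property string (NULL if missing),
 * `deflt` the built-in default; the result is written to `out`, which must
 * hold LIST_MAX bytes. */
void resolve_list(const char* value, const char* deflt, char* out) {
    out[0] = '\0';
    if (!value || !*value) { /* blank or missing: use the default */
        snprintf(out, LIST_MAX, "%s", deflt);
        return;
    }
    if (strcmp(value, "false") == 0) return; /* "false": empty list */
    if (value[0] == ',') snprintf(out, LIST_MAX, "%s", deflt); /* keep default */

    char copy[LIST_MAX];
    snprintf(copy, sizeof(copy), "%s", value);
    for (char* tok = strtok(copy, ","); tok; tok = strtok(NULL, ",")) {
        if (tok[0] == '-') remove_item(out, tok + 1);
        else add_item(out, tok[0] == '+' ? tok + 1 : tok);
    }
}
```

For example, with a default of `init,lmkd`, the value `,+foo,-lmkd` resolves to `init,foo`, while the value `a,b` (no leading separator) replaces the default outright.
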
#### ro.config.low_ram
device is configured with limited memory.

#### ro.debuggable
device is configured for userdebug or eng build.

#### ro.llk.sysrq_t
default not ro.config.low_ram, or ro.debuggable if property is "eng".
if true do sysrq t (dump all threads).

#### ro.llk.enable
default false, allow live-lock daemon to be enabled.

#### llk.enable
default ro.llk.enable, and evaluated for eng.

#### ro.khungtask.enable
default false, allow [khungtask] daemon to be enabled.

#### khungtask.enable
default ro.khungtask.enable and evaluated for eng.

#### ro.llk.mlockall
default false, enable call to mlockall().

#### ro.khungtask.timeout
default value 12 minutes, [khungtask] maximum time limit.

#### ro.llk.timeout_ms
default 10 minutes, D or Z maximum time limit; double this value sets
the alarm watchdog for llkd.

#### ro.llk.D.timeout_ms
default ro.llk.timeout_ms, D maximum time limit.

#### ro.llk.Z.timeout_ms
default ro.llk.timeout_ms, Z maximum time limit.

#### ro.llk.stack.timeout_ms
default ro.llk.timeout_ms,
maximum time limit when checking for persistent stack symbols.
Only active on userdebug or eng builds.

#### ro.llk.check_ms
default 2 minutes, sampling period of threads for D or Z.

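The default chain among the timing properties above can be written out as arithmetic. This is only a sketch of the stated defaults: `llk_timeouts` and `llk_resolve_timeouts` are hypothetical names, a value of 0 is assumed to mean "property not set", and any clamping or validation the daemon may perform is not modeled.

```c
/* Built-in defaults quoted in the property descriptions above. */
#define LLK_TIMEOUT_MS_DEFAULT (10u * 60u * 1000u) /* 10 minutes */
#define LLK_CHECK_MS_DEFAULT (2u * 60u * 1000u)    /* 2 minutes */

struct llk_timeouts {
    unsigned timeout_ms;  /* ro.llk.timeout_ms */
    unsigned d_ms;        /* ro.llk.D.timeout_ms */
    unsigned z_ms;        /* ro.llk.Z.timeout_ms */
    unsigned stack_ms;    /* ro.llk.stack.timeout_ms */
    unsigned check_ms;    /* ro.llk.check_ms */
    unsigned watchdog_ms; /* alarm watchdog: double ro.llk.timeout_ms */
};

/* Each argument is the property value in milliseconds, or 0 if unset
 * (an assumption of this sketch). */
struct llk_timeouts llk_resolve_timeouts(unsigned timeout_ms, unsigned d_ms,
                                         unsigned z_ms, unsigned stack_ms,
                                         unsigned check_ms) {
    struct llk_timeouts t;
    t.timeout_ms = timeout_ms ? timeout_ms : LLK_TIMEOUT_MS_DEFAULT;
    t.d_ms = d_ms ? d_ms : t.timeout_ms;
    t.z_ms = z_ms ? z_ms : t.timeout_ms;
    t.stack_ms = stack_ms ? stack_ms : t.timeout_ms;
    t.check_ms = check_ms ? check_ms : LLK_CHECK_MS_DEFAULT;
    t.watchdog_ms = 2u * t.timeout_ms;
    return t;
}
```

With nothing set, this yields a 600000 ms limit for D, Z and stack checks, a 120000 ms sample period and a 1200000 ms watchdog.
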
#### ro.llk.stack
default cma_alloc,__get_user_pages,bit_wait_io,wait_on_page_bit_killable
comma separated list of kernel symbols.
Look for kernel stack symbols that if ever persistently present can
indicate a subsystem is locked up.
Beware, the check on purpose does no forward scheduling ABA detection except
by polling every ro.llk.check_ms over the period ro.llk.stack.timeout_ms, so
the stack symbol should be exceptionally rare and fleeting.
One must be convinced that it is virtually *impossible* for the symbol to show
up persistently in all samples of the stack.
Again, looks for a match for either " **symbol**+0x" or " **symbol**.cfi+0x"
in stack expansion.
Only available on userdebug or eng builds; limited privileges due to security
concerns on user builds prevent this checking.

#### ro.llk.blacklist.process
default 0,1,2 (kernel, init and [kthreadd]) plus process names
init,[kthreadd],[khungtaskd],lmkd,llkd,watchdogd,
[watchdogd],[watchdogd/0],...,[watchdogd/***get_nprocs**-1*].
Do not watch these processes. A process can be comm, cmdline or pid reference.
NB: automated default here can be larger than the current maximum property
size of 92.
NB: false is a very very very unlikely process to want to blacklist.

#### ro.llk.blacklist.parent
default 0,2,adbd&[setsid] (kernel, [kthreadd] and adbd *only for zombie setsid*).
Do not watch processes that have this parent.
An ampersand (*&*) separator is used to specify that the parent is ignored
only in combination with the target child process.
Ampersand was selected because it is never part of a process name,
however a setprop in the shell requires it to be escaped or quoted;
init rc file where this is normally specified does not have this issue.
A parent or target process can be specified as comm, cmdline or pid reference.

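The ampersand rule can be illustrated with a small matcher for a single list entry. This is a sketch under simplifying assumptions: `parent_rule_ignores` is a hypothetical name, and parent and child are compared as plain strings, whereas the text above allows comm, cmdline or pid references.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if `rule` (one entry of the parent list) says to ignore a process
 * named `child` whose parent is named `parent`. */
bool parent_rule_ignores(const char* rule, const char* parent,
                         const char* child) {
    const char* amp = strchr(rule, '&');
    if (!amp) {
        /* Plain entry: every child of this parent is ignored. */
        return strcmp(rule, parent) == 0;
    }
    /* "parent&child" entry: both halves must match. */
    size_t plen = (size_t)(amp - rule);
    return strlen(parent) == plen && strncmp(rule, parent, plen) == 0 &&
           strcmp(amp + 1, child) == 0;
}
```

With the default entry `adbd&[setsid]`, a `[setsid]` child of adbd is ignored while any other child of adbd is still watched.
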
#### ro.llk.blacklist.uid
default *empty* or false, comma separated list of uid numbers or names.
Do not watch processes that match this uid.

#### ro.llk.blacklist.process.stack
default process names init,lmkd.llkd,llkd,keystore,ueventd,apexd,logd.
This subset of processes is not monitored for live-lock stack signatures.
Also prevents the sepolicy violation associated with processes that block
ptrace, as these cannot be checked anyway.
Only active on userdebug and eng builds.


Architectural Concerns
----------------------

- The built-in [khungtask] daemon is too generic and trips on driver code that
  sits around in D state too much. Switching to S instead makes the task(s)
  killable, so the drivers should be able to resurrect them if needed.
- Properties are limited to 92 characters.
- Create kernel module and associated gTest to actually test panic.
- Create gTest to test out blacklist (ro.llk.blacklist.*properties* generally
  should not be inputs). Could require more test-only interfaces to libllkd.
- Speed up gTest using something other than ro.llk.*properties*, which should
  not be inputs as they should be baked into the product.