Canary[PLDI'21]
Practical Static Detection of Inter-thread Value-Flow Bugs [PLDI'21]
- 1. Abstract
- 2. Goal: Detect concurrent bugs correctly
- 3. Proposed Method: Canary
- 4. Implementation
- 5. Evaluation
- 6. Related Work
- 7. Conclusion
- 8. Relation to my research
1. Abstract
Problem
- Concurrent programs are error prone because of
- interleavings of threads
- state explosion
- computational complexity
-
Precise inter-thread pointer analysis is difficult because of:
- thread interference (inter-thread data dependencies)
- difficulty in exploring only the necessary interleaving space for specific properties
Proposed Method: Canary
-
A value-flow analysis framework for concurrent programs
- Build inter-thread VFG
- Convert concurrency bug detection into source-sink reachability problem
Result
- Detect 18 previously unknown concurrency bugs in famous projects
2. Goal: Detect concurrent bugs correctly
bool cond;
// Thread 1
void thread1() {
int **x = malloc(); // alloc_x
int *a = malloc(); // alloc_a
*x = a; // alloc_x -> alloc_a
fork(thread2, x);
if(cond) {
int *c = *x; // c -> {alloc_a, alloc_b}
print(*c); // May-use alloc_b...?
}
}//
// Thread 2
void thread2(int **y) {
int *b = malloc(); // alloc_b
if (!cond) {
*y = b; // alloc_x -> alloc_b
free(b); // free alloc_b
}
}Q. In this code, does use-after-free occur?
A. No, because ofif(cond) // Thread 1 and if (!cond) // Thread 2
-
(Inter-thread) Data Dependencies
bandc
-
Realizable Path (control-flow)
free(b)print(*c)
Detection Target: src-sink reachability
Canary detect concurrency bugs via src-sink properties:- There are data-dependencies between src and sink.
- There are realizable paths between src and sink.
-
Concurrency bugs
- Inter-thread use-after-free
- NULL pointer dereference
- Double free
- Information leak
- Race condition (order violation)
// Thread 1 (src) // Thread 2 (sink)
// use-after-free
free(x);
use(x);
// race condition
if (i < N)
i++;
return arr[i];3. Proposed Method: Canary
Workflow of Canary
- (Intra-thread) Data Dependence Analysis
-
Interference Dependence Analysis
- a.k.a. Inter-thread Data Dependence Analysis
-
Source-Sink Checking
- check whether the execution is realizable
Preliminaries
Memory Consistency Model
We assume that the shared memory system is sequentially consistent.
-
Sequentially Consistent(SC)
- All memory operations are executed in order.
- All memory operations are atomic.
- This assumption matches the natural expectation of us.
Language, Abstract Domain
Language:
Abstract Domain
-
No Loop
- Loop, Recursive Functions: Unrolling (2 times)
-
Side effects on the function parameters
- Introduce auxiliary variables for the objects passed into the function by references.
-
Partial SSA-form
- Direct flow about top variable \( v \) is obvious.
- We focus on indirect flow about address-taken variables \( o \).
3.1 Data Dependence Analysis: Construct VFG
Our goal:- Resolve intra-thread data dependence edges
- Compute Guard condition
- 🗝 Points-to facts of address-taken variables
Steps
- Intra-procedural path-sensitive pointer analysis
- Procedural Transfer Function Computation
- VFG Construction
1. Intra-procedural path-sensitive pointer analysis
\begin{algorithm}
\caption{Intra-procedural Pointer Analysis}
\begin{algorithmic}
\PROCEDURE{PathSensitivePTA}{function F}
\FORALL{instruction I in F}
\IF{$ l, \phi : p = alloc $}
\STATE $ p \rightarrowtail (\phi, alloc) $
\COMMENT{$\rightarrowtail$: points-to, $\phi$: path-condition}
\ELSEIF{$ l, \phi : p = q $}
\STATE $ p \rightarrowtail (\gamma \land \phi, o) , \forall (\gamma, o) \in Pts(q) $
\ELSEIF{$ l, \phi : *x = q $}
\IF{$ Pts(x) $ is a singleton}
\STATE $ IN_l \leftarrow (IN_l \backslash Pts(x)) $
\COMMENT{strong update}
\ENDIF
\STATE $ OUT_l \leftarrow IN_l \cup \{ Pts(x) \rightarrowtail (\gamma \land \phi, o ) \}, \forall (\gamma, o) \in Pts(q)$
\ELSEIF{$ l, \phi : p = *y $}
\STATE Let $o \in Pts(y) \cap IN_l$
\STATE $ p \rightarrowtail (\gamma \land \phi, o'), \forall (\gamma, o') \in Pts(o)$
\ELSEIF{$ l, \phi : (x_0, ..., x_n) = call f(v_1, ..., v_n) $}
\STATE $ x_i \rightarrowtail (\phi, Trans(F, Pts(v_i))), \forall i \in [1, n]$
\STATE \COMMENT{$Trans$: Summary of F}
\ELSEIF{$ l, \phi : fork(t, f)$}
\STATE continue
\ENDIF
\ENDFOR
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}
Example: Thread1
// IN = {}
x = malloc();
// x -> (true, alloc_x)
a = malloc();
// a -> (true, alloc_a)
*x = a;
// OUT = {alloc_x -> (true, alloc_a)}
if (cond) {
// IN = {alloc_x -> (true, alloc_a)}
c = *x;
// c -> (cond, alloc_a)
print(*c);
}void thread2(int **y)
// y -> (true, alloc_x)
b = malloc(); // alloc_b
if (!cond)
*y = b;
free(b);Q. Answer points-to relations about Thread2.
A.y -> (true, alloc_x)
b -> (true, alloc_b)
alloc_x -> (!cond, alloc_b)2. Procedural Transfer Function Computation
Summarize $F$'s points-to side-effects.Example: thread2()
// y -> o1 -> o2
void thread2(y) {
b = malloc(); // alloc_b
if (!cond) {
*y = b;
free(b);
}
}
// y -> o1 -> alloc_bSummary: $Trans(F, Pts(o1)) =$ {alloc_b}
3. VFG Construction
Basic rules for data dependencies
| Instruction | Value Flow | Guard Condition |
|---|---|---|
| $l,\phi: p = alloc$ | $alloc \rightarrow p$ | $\phi$ |
| $l,\phi : p = q$ | $q \rightarrow p$ | $\phi$ |
| \begin{array} \\ l_1, \phi_1 : *x = q; \\ l_2, \phi_2 : p = *y; \end{array} | \begin{array} \\ \exists o | (o, \alpha) \in Pts(x) \land (o, \beta) \in Pts(y) \\ q \rightarrow p \end{array} | $\phi_1 \land \phi_2 \land \alpha \land \beta$ |
Example: Value-Flow Graph
bool cond;
// Thread 1
void thread1() {
int **x = malloc(); // alloc_x
int *a = malloc(); // alloc_a
*x = a; // alloc_x -> alloc_a
fork(thread2, x);
if(cond) {
int *c = *x; // c -> {alloc_a, alloc_b}
print(*c); // May-use alloc_b...?
}
}3.2 Interference Dependence Analysis: Construct Inter-thread VFG
Our goal:- Resolve interference dependence edges
- Compute Guard condition
- 🗝 Alias relationship between different threads
Steps
- Escaped(Shared) Object Computation
- Dependence Edges Computation
- Dependence Guard Computation
- Update until we reach a fixed-point
1. Escaped Object Computation
Objects pointed-to-by escaped objects are escaped objects.
- $EspObj$: Escaped(Shared) Objects
-
$Pted$: Pointed-to-by Variables
- ex) $p \rightarrowtail o, Pts(p) = \{o\}, Pted(o) = \{p\} $
\begin{algorithm}
\caption{Escaped Object Computation}
\begin{algorithmic}
\PROCEDURE{EscapeAnalysis}{VFG}
\STATE Initialize $EspObj$
\COMMENT{objects passed to the fork calls}
\FORALL{node $n$ in VFG}
\IF{$n : *x = q$}
\STATE Let $ x \rightarrowtail o$ and $ q \rightarrowtail o'$
\IF{$o \in EspObj$}
\STATE $EspObj \leftarrow EspObj \cup \{ o' \}$
\ENDIF
\ENDIF
\ENDFOR
\FORALL{escaped object $o$ in $EspObj$}
\STATE Find the set of nodes $N$ that reachable from $o$
\STATE Let $\sigma$ the aggregated guards of edges during traversal
\FORALL{node $n$ in $N$}
\STATE $Pted(o) \leftarrow Pted(o) \cup (n, \sigma)$
\ENDFOR
\ENDFOR
\RETURN $EspObj$ and their $Pted$
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}
Example: alloc_b in Thread2
void thread2(int **y)
// EspObj = {alloc_x}
b = malloc(); // alloc_b
if (!cond)
// y -> alloc_x
// b -> alloc_b
*y = b;
// EspObj += alloc_bResults of EscapeAnalysis:
\begin{array} \\ EspObj = \{ alloc\_x, alloc\_b \} \\ Pted(alloc\_x) = \{ (true, x), (true, y) \} \\ Pted(alloc\_b) = \{ (!cond, b) \} \end{array}2. Dependence Edges Computation
\begin{algorithm}
\caption{Dependence Edges Computation}
\begin{algorithmic}
\FORALL{store node $l_1, \phi_1 : *x = q$ in $t \in T$}
\FORALL{load node $l_2, \phi_2 : p = *y$ in $t' \backslash t \in T$}
\IF{$(x, \alpha), (y, \beta) \in Pted(o), o \in EspObj$}
\STATE $\Phi_{Guard} = GuardComputation()$
\STATE $l_1: *x = q \longrightarrow_{\Phi_{Guard}} l_2: p = *y$ in VFG
\ENDIF
\ENDFOR
\ENDFOR
\end{algorithmic}
\end{algorithm}
Example:
// Thread 1 // Thread 2
c = *x; <---- *y = b;
// Pted(alloc_x) = {x, y}3. Dependence Guard Computation
\( \Phi_{Guard} = \Phi_{alias} \land \Phi_{ls} \)-
$\Phi_{Guard}$: Dependence Guard Constraint
- $\Phi_{alias}$: Alias Constraint
- $\Phi_{ls}$: Load-Store Constraint
Alias Constraint
$\Phi_{alias} = \phi_1 \land \phi_2 \land \alpha \land \beta$ where:- $l_1, \phi_1 : *x = q \longrightarrow l_2, \phi_2 : p = *y$
- $ x \rightarrowtail (\alpha, o)$
- $ y \rightarrowtail (\beta, o)$
- Conditions: $\phi_1, \phi_2, \alpha, \beta$
Example
- $cond: c = *x;$
- $!cond: *y = b;$
- $x \rightarrowtail (true, alloc_x)$
- $y \rightarrowtail (true, alloc_x)$
$ \therefore \Phi_{alias} = cond \land !cond \land true \land true$
Load-Store Constraint
$\Phi_{ls} = \bigwedge_{s, s' \in S(l)} \left( O_s < O_l \bigwedge_{\forall s' \neq s} O_{s'} < O_s \lor O_l < O_{s'} \right)$- $s$: store at $l_1$
- $l$: load at $l_2$
- $O_s < O_l$: strict partial order between $s$ and $l$
- $s'$: other stores
- $O_{s'} < O_s \lor O_l < O_{s'}$: no other stores between $s$ and $l$
Example
$ \therefore \Phi_{ls} = O_b < O_c $
Guard Constraint
\( \Phi_{Guard} = \Phi_{alias} \land \Phi_{ls} \)
Example
$ \therefore \Phi_{Guard} = cond \land !cond \land O_b < O_c $
4. Update until we reach a fixed-point
We now add new dependence edges.
Thereby, there are new $Pts$ relations (e.g., $c \rightarrowtail alloc\_b$).
We re-compute the following steps until we reach a fixed point:
- Intra-thread data dependence
- Interference dependence
3.3 Source-Sink Checking: Check Realizable Value-Flow Path
Once the interference-aware VFG is built, some concurrency bugs can be reduced to solving source-sink reachability problems over VFG.-
ex) Inter-thread use-after-free bug
- src: free statement
- sink: use statement
In this section, we address two problems to achieve precision and efficiency:
- How to address the program order violation of a value-flow path?
- How to efficiently solve the constraints for value-flow paths?
Partially-ordered Value Flows Searching
To check value-flow paths are realizable, we collect guard constraint $\Phi_{Guard}$ along the path. \begin{eqnarray} \\ && \pi = \langle v_1@l_1, v_2@l_2, ..., v_k@l_k \rangle \\ && src = v_1@l_1, sink = v_k@l_k \\ && (v_i@l_i, v_{i+1}@l_{i+1}) \text{ via either data-dependence or interference-dependence} \end{eqnarray}Example
But this orders over $\pi$ may violate the program order.
Bad Example
This path $\pi$ is realizable...?
Partial Order Constraints: Intra-thread Program Order
$\Phi_{po}(\pi) = O_{l_1} < O_{l_2} < O_{l_3} < ... < O_{l_k}$
- This constraints represent intra-thread program order.
Example
This path $\pi$ is not realizable!!
$ \because O_2 < O_3 \land O_3 < O_4 \land O_4 < O_1 \land O_1 < O_2 = false$
Summary: All Constraints
$\Phi_{all}(\pi) = \Phi_{Guard}(\pi) \land \Phi_{po}(\pi) $- $\Phi_{Guard}(\pi)$: Guard constraint via value-flow paths
- $\Phi_{po}(\pi)$: Intra-thread program order constraint
Example
Constraint Solving using SMT Solver
Finally, we check src-sink reachability by solving $\Phi_{all}$ using SMT solver.
To solve it effectively, we optimize by three means:
-
Filter out apparent contradictions
- ex) $ cond \land !cond = false$
-
The constraints on different src-sink paths are independent
- We can solve them in parallel.
- Cube-and-conquer parallel SMT solving strategy
Example
$\therefore$ This use-after-free path $\pi$ is not realizable.
4. Implementation
-
Canary: on top of LLVM framework
- Support pthread, std::thread
-
MHP(may-happen-in-parallel) analysis in interference dependence analysis
- to prune value-flow paths
- Use Steengard's analysis for first CG
- SMT Solver: Z3
5. Evaluation
-
Evaluation Points
- Comparison in Value-flow Graph Construction
- Comparison in Concurrency bug Detection
- Detected Real Concurrency Bugs
-
Analyzed Projects
- 20 Real-life open-source C/C++ projects
- ex) Firefox, MariaDB, MySQL, ...
-
Experiment Environment
- two 20 core Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
- 256GB physical memory
- Ubuntu-16.04
Comparison in Value-flow Graph Construction
-
Canary
- finish all projects
- On average,
- >15x faster than SABER
- >180x faster than FSAM
-
SABER
- Times out on 9 projects
-
FSAM
- Times out on 15 projects
- SABER uses 130G additional memory compared to Canary for the subjects larger than 100KLoc.
- FSAM uses 200G additional memory compared to Canary for the subjects larger than 50KLoc.
Summary: Canary is much more efficient.
($\because$ FSAM and SABER perform an exhaustive (Andersen) points-to analysis.)
Comparison in Concurrency bug Detection
We choose to check inter-thread use-after-free because it has become the most exploited memory errors.
Scalability
We do not compare with FSAM and SABER because they suffer from the scalability problem in VFG construction.-
Time/Memory
- Canary's cost grow almost linearly
- MySQL (3MLoc): 2.5 hours
- FireFox (9MLoc): 4.67 hours
Precision
-
Canary
- Reports: 15 bugs
- False positive rates: 26.67%
-
SABER
- Reports: 9896 bugs
- True positive in 100 sampling: 5 bugs
-
FSAM
- Reports: 586 bugs
- True positive in 100 sampling: 2 bugs
Summary: Canary is scalable and precise.
($\because$ Canary builds more precise dependence relations and takes the feasible interleaving into account.)
Detected Real Concurrency Bugs
Canary have already detected 18 concurrency bugs from dozens of open-source projects.- Famous sotfware systems: Firefox, LevelDB, Transission, ...
- 14 bugs have been fixed in the recent release versions.
- aside) The confirmation of concurrency bugs was their team effort. (It is very difficult!!)
7. Conclusion
- Goal: Detect inter-thread value-flow bugs
-
Approach: Interference-aware value-flow analysis
- Canary archives both good precision and scalability.
- Results: 18 previously-unknown concurrency bugs on famous projects
-
Future work
- Enhancements to support more synchronization semantics (e.g., lock/unlock, signal/notify)
- Relaxed memory models
- Customized decision procedures to improve the efficiency of constraint solving
8. Relation to my research
Canary (再掲)
-
Canaryは,src-sink reachabilityに帰着できる並行バグを検出する.
- e.g., Use-after-free, NULL pointer dereference, Double free, Race condition
// Thread 1 // Thread 2
free(x);
use(x);自分の研究
-
データフロー保護を並行プログラムに適用することで,src-sink reachabilityに帰着できる並行バグを検出できるのでは?
- Inter-thread VFGを構築し,スレッド間のデータフローを求める.
-
合法なデータフローを保護する.
- 並行バグを引き起こすデータフローを違法とする
- その他のスレッド間データフローを合法とする
- 要考察: 並行バグを引き起こすデータフロー
-
SoundなInterference VFGを構築
- Canaryは,ループを展開するなど一部unsoundな解析を行う
- 要考察: DFIの観点では,soundなVFGを構築したい