A Context-Sensitive Pointer Analysis Framework for Rust and Its Application to Call Graph Construction


Wei Li, UNSW, Sydney, Australia
Dongjie He, UNSW, Sydney, Australia
Yujiang Gui, UNSW, Sydney, Australia
Wenguang Chen, Ant Group, Tsinghua University, Beijing, China
Jingling Xue, UNSW, Sydney, Australia

Abstract

Existing program analysis tools for Rust lack the ability to effectively detect security vulnerabilities due to the absence of an accurate call graph and precise points-to information. We present Rupta, the first context-sensitive pointer analysis framework designed for Rust, with a particular focus on its role in constructing call graphs. Operating on Rust MIR, Rupta employs callsite-based context-sensitivity and on-the-fly call graph construction to address a range of pointer analysis challenges, including method/function calls, pointer casts, and nested structs, while preserving type information. Our assessment of Rupta against two state-of-the-art call graph construction techniques, Rurta (Rapid Type Analysis-based) and Ruscg (static dispatch-only), across 13 real-world Rust programs demonstrates its high efficiency and precision. In particular, our results reveal that Rupta surpasses Ruscg in soundness by discovering 29% more call graph edges and outperforms Rurta in precision by eliminating approximately 70% of spurious dynamic call edges. Consequently, Rupta has the potential to enhance existing security analysis tools, enabling them to identify a greater number of security vulnerabilities in Rust programs.

CCS Concepts: • Theory of computation → Program analysis; • Software and its engineering → Automated static analysis.

Keywords: Rust, Pointer Analysis, Call Graph Construction

ACM Reference Format:
Wei Li, Dongjie He, Yujiang Gui, Wenguang Chen, and Jingling Xue. 2024. A Context-Sensitive Pointer Analysis Framework for Rust and Its Application to Call Graph Construction. In Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction (CC ’24), March 2–3, 2024, Edinburgh, United Kingdom. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3640537.3641574

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
CC ’24, March 2–3, 2024, Edinburgh, United Kingdom
© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 979-8-4007-0507-6/24/03
https://doi.org/10.1145/3640537.3641574

1 Introduction

Rust, a growing system-level programming language, is increasingly employed in low-level systems development, including OSes [10, 17, 34, 40] and web browsers [2]. It ensures memory safety by segregating safe code and unsafe code, effectively eliminating memory safety bugs during compile time within safe code. However, the Rust compiler bypasses safety checks in unsafe code, potentially leading to security vulnerabilities [16, 37, 47], similar to C/C++ programs. Existing program analysis (and verification) tools [3, 6, 8, 27, 37, 44, 46] for Rust lack the ability to detect vulnerabilities due to the absence of an accurate call graph and precise points-to information, hindering their bug detection capability.

For instance, existing inter-procedural bug detection tools [6, 27, 37] focus solely on static calls at compile time, overlooking dynamic calls and thus hampering their detection capabilities. Similarly, other vulnerability detection techniques [27, 48] rely on simple heuristics for alias computation, missing critical bugs due to imprecise points-to information and call graph analysis, as acknowledged by their authors.

Despite the critical need for precise points-to information and accurate call graphs in Rust analysis tools operating on Rust’s Mid-level Intermediate Representation (MIR) for Rust programs [27, 48], there is currently no existing pointer analysis framework available.

Developing a dedicated pointer analysis for Rust MIR is imperative. However, this task presents significant challenges and requires substantial engineering efforts, primarily due to the complexity of certain language features. Simply porting existing solutions for C/C++ (e.g., SVF [41]) or Java (e.g., Qilin [21] and Doop [11]) to Rust is not feasible for several reasons. Firstly, Rust’s memory access paradigms are
as intricate as those of C/C++, but its MIR operates at a higher abstraction level than those used by SVF, necessitating a proper memory model and more complex handling during pointer analysis. Secondly, Rust supports various calls resolved through different mechanisms, such as static and dynamic dispatch. Accurately computing call targets for these diverse calls requires correct modeling of the associated semantics for each kind of call, a challenging task due to limited documentation in Rust. Thirdly, achieving precise pointer analysis and building call graphs demand maintaining accurate type information for objects in a Rust program. However, this can be difficult due to pointer casts that may interpret the same object as having different types at different program points. Finally, careful handling of nested structs is essential to preserve type information accurately during analysis.

In this paper, we introduce Rupta, the first context-sensitive pointer analysis framework designed specifically for Rust MIR. Rupta incorporates callsite-based context-sensitivity and on-the-fly call graph construction, effectively addressing earlier challenges. It offers comprehensive support for resolving various types of calls and maintaining precise type information during analysis. To handle pointer casts, Rupta adopts a novel approach, symbolizing memory objects with multiple abstractions of different types. Additionally, Rupta uses a projection-based approach, unlike the field-index-based method commonly used in C/C++ pointer analysis [36, 41], to differentiate fields in nested structs while preserving their type information. Rupta is conceived as an open-source project, built entirely from scratch using Rust, and currently comprises approximately 10,000 lines of code. It aims to function as a tool for advanced Rust analyses, enhancing the safety and security of Rust programs.

To illustrate Rupta’s effectiveness in call graph construction, we implemented two additional call graph construction tools commonly used in Rust program analysis: Rurta (a sound but less precise technique based on Rapid Type Analysis (RTA) [5] utilized in [44]) and Ruscg (a commonly used unsound technique considering only static dispatch [27, 37]). In our extensive evaluation of 13 real-world Rust projects, Rupta demonstrated its ability to provide highly precise points-to information while maintaining high efficiency. When comparing Rupta’s call graph construction capabilities with those of Rurta and Ruscg, we found that Rupta outperforms Ruscg by discovering 29% more call graph edges and is more precise than Rurta by eliminating about 70% of spurious dynamic call edges.

Our evaluation reveals a notable insight for Rust: in our framework, 1-callsite-sensitive pointer analysis offers better scalability compared to Andersen-style analysis, easily achievable through our Rupta framework. We found that many Rust Standard Library functions, especially those safe functions encapsulating unsafe code, show markedly less precision in context-insensitive analysis, leading to longer analysis times. This encourages exploration into targeted context-sensitive pointer analysis methods, potentially boosting Rust’s analysis efficiency.

In summary, this paper makes the following contributions:
• Introduction of Rupta, the first context-sensitive pointer analysis framework for Rust MIR.
• Addressing challenges in precise Rust pointer analysis and call graph construction, including complex memory models, diverse function calls, and accurate type semantics for pointer casts and nested structs.
• Extensive evaluation of Rupta to demonstrate its effectiveness in efficiently producing accurate call graphs and precise points-to information for Rust.

The rest of this paper is organized as follows. Section 2 highlights design challenges and motivates our approach. Section 3 introduces Rupta. Section 4 covers implementation details, while Section 5 presents performance evaluation. Section 6 discusses related work, and Section 7 concludes.

2 Motivation: Challenges and Insights

The Rust compiler processes Rust programs through various Intermediate Representations (IRs): High-level IR (HIR), Mid-level IR (MIR), and LLVM IR. MIR is favored for various Rust program analyses [27, 44] due to its structured format and rich type information compared to HIR and LLVM IR.

As Rust gains traction in systems development, the demand for research in pointer analysis for the language is growing. However, addressing Rust’s unique characteristics presents new challenges, especially when designing tools like Rupta to operate on Rust MIR. Achieving precision and efficiency while accommodating Rust-specific features becomes paramount. We delve into the encountered challenges during development, which involve preserving type information for each pointed-to object. These challenges include analyzing calls, modeling pointer casts, and representing object fields, such as nested structs.

2.1 Analyzing Calls

To compute precise points-to information and call graphs efficiently in Rupta, we must effectively resolve call targets. Rust relies on traits to define shared behaviors between types and abstract over implementations, as illustrated by the Shape trait in Figure 1, which provides a common area() method. These traits facilitate polymorphism, increasing flexibility and code reuse. Rust employs two dispatch mechanisms for call resolution: static dispatch and dynamic dispatch, representing two options for implementing polymorphism.

2.1.1 Static Dispatch. This enables the creation of generic functions that can operate on arguments of various types, employing generics or parametric polymorphism, akin to C++’s class templates. For example, the function sprint_area() can accept arguments of any type that implements the Shape trait, like Circle and Rectangle.
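The two dispatch mechanisms can be contrasted in a minimal sketch that reuses the Shape example of Figure 1. The function names `static_area` and `dynamic_area` are hypothetical and are not part of Rupta or of the paper's figures. The sketch uses `std::any::type_name` only to make the per-type instantiation of the generic function visible; the exact string it returns is not specified by the standard library.

```rust
use std::any::type_name;
use std::f64::consts::PI;

trait Shape {
    fn area(&self) -> f64;
}

struct Circle {
    r: f64,
}

struct Rectangle {
    w: f64,
    h: f64,
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        PI * self.r * self.r
    }
}

impl Shape for Rectangle {
    fn area(&self) -> f64 {
        self.w * self.h
    }
}

// Static dispatch (hypothetical helper): the generic is instantiated once
// per concrete `T`, so the callee of `obj.area()` is fixed at compile time.
// The returned name identifies which instantiation was executed.
fn static_area<T: Shape>(obj: &T) -> (f64, &'static str) {
    (obj.area(), type_name::<T>())
}

// Dynamic dispatch (hypothetical helper): one function body for all shapes;
// `obj.area()` is resolved at runtime through the trait object.
fn dynamic_area(obj: &dyn Shape) -> f64 {
    obj.area()
}
```

A call such as `static_area(&Circle { r: 1.0 })` has a single possible callee for `area()`, whereas a call graph builder looking at `dynamic_area` must determine which concrete types can reach `obj`.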

     1  trait Shape { fn area(&self) -> f64; }
     2  struct Circle { r: f64 }
     3  impl Shape for Circle {
     4      fn area(&self) -> f64 { PI * self.r * self.r } }
     5  struct Rectangle { w: f64, h: f64 }
     6  impl Shape for Rectangle {
     7      fn area(&self) -> f64 { self.w * self.h } }
     8  fn sprint_area<T: Shape>(obj: T) { // static dispatch
     9      print!("Area = {}", obj.area()); }
    10  fn dprint_area(obj: &dyn Shape) { // dynamic dispatch
    11      print!("Area = {}", obj.area()); }
    12  let c = Circle { r: 1.0 };
    13  let r = Rectangle { w: 2.0, h: 1.0 };
    14  sprint_area(c);
    15  sprint_area(r);
    16  dprint_area(&c as &dyn Shape);

Figure 1. An example with static and dynamic dispatch.

Generics undergo monomorphization when translating Rust MIR to a codegen IR, resulting in separate copies of functions for each concrete argument type when sprint_area() is called with Circle and Rectangle objects (lines 14–15). This is akin to invoking the following two functions:

    fn sprint_area_circle(obj: Circle) { ... }
    fn sprint_area_rectangle(obj: Rectangle) { ... }

In each function, the call to obj.area() will execute the specific implementation associated with the argument type.

Rust’s type system provides robust compile-time reasoning for generic types. This enables a pointer analysis tool on Rust MIR, such as Rupta (introduced in this paper), to analyze generic functions effectively through monomorphized duplication for various implementations.

2.1.2 Dynamic Dispatch. This enables runtime polymorphism using dynamic trait objects, which refer to values of any type that implement a given trait. The object’s specific type is only known at runtime. In Figure 1, dprint_area() is similar to sprint_area(), but its parameter obj is a reference to a dynamic trait object, indicated by the keyword "dyn". Thus, area() called on obj (line 11) is dynamically dispatched at runtime.

Dynamic trait objects are created by casting from concrete instances (e.g., line 16). Each instance of a pointer to a dynamic trait object includes a data pointer and a virtual method table (vtable) [15], enabling runtime resolution of trait-defined method calls. Achieving precise static call graphs presents challenges akin to those in C++ and Java, requiring accurate modeling of the potential concrete types a dynamic trait object may refer to at runtime.

2.1.3 Function Pointers. Rust, like C/C++, supports function pointers for dynamic runtime function invocation. These pointers refer to function items or closures with unknown identities at compile time. In Figure 2, fp in line 3 is a function pointer pointing to the function item times2, and the dynamic function call in line 4 uses fp as the operand.

     1  // function pointer
     2  fn times2(x: u32) -> u32 { x * 2 }
     3  let fp: fn(u32) -> u32 = times2;
     4  let r = fp(2);
     5  // Fn trait calls
     6  fn foo<F: Fn(u32) -> u32>(x: u32, f: F) -> u32 { f(x) }
     7  foo(2, |x| x * 2); // Closure
     8  foo(2, times2); // Function Item
     9  foo(2, fp); // Function Pointer

Figure 2. Function pointers and Fn traits.

2.1.4 Fn Traits. The Rust standard library defines three special traits: Fn, FnMut, and FnOnce, representing different function call styles. We’ll use Fn* for brevity to refer to any of these traits. An Fn* object acts as a regular function and can be used as a function call operand. In Figure 2, foo() accepts an Fn* argument and invokes it as f(x) in line 6. In Rust, any type that implements an Fn* trait must also implement the call() method defined within the trait. The Rust compiler will route a call via an Fn* trait object (e.g., f(x)) to the corresponding implementation. Similar to regular traits, Fn* traits can be dispatched statically or dynamically. In Rust, Fn* traits are automatically implemented by closures, function items and function pointers (lines 7–9 in Figure 2). Special handling is required to accurately model function calls through the Fn* traits.

In conclusion, accurately computing points-to information and building precise call graphs for Rust programs requires addressing Rust’s complex static and dynamic dispatch semantics. Handling dynamic trait objects and function pointers is also crucial. Existing pointer analysis frameworks like SVF [41] and Pinpoint [38] for C/C++, as well as Qilin [21] and Doop [11] for Java, do not inherently support Rust’s Fn* traits, function pointers, or Rust’s specific features.

2.2 Modeling Pointer Casts

Handling pointer casts, a common practice in low-level programming languages, poses a challenge in correctly inferring underlying object types. In Rust, a pointer cast statement, similar to the bitcast instruction in LLVM IR, allows reinterpretation of the underlying object into one with a different type, leading to varying object types at different points.

Existing pointer analysis frameworks for Java, such as Qilin [21] and Doop [11], do not address pointer casts between two unrelated types, as illustrated in Figure 3, since such casts are not permitted in Java. On the other hand, frameworks for C/C++, such as SVF [41], treat pointer cast statements as straightforward assignments without accurately modeling the type information of the cast object.

In Figure 3, we reuse the Shape trait and the Circle struct from Figure 1. Here, p is obtained by casting an f64 pointer referring to variable r into a Circle object pointer (line 2), and then p itself is cast to a dynamic Shape object (lines 3–4). Finally, the trait method area() is called on s (line 5).

In Rust’s pointer analysis, dynamic calls are resolved by identifying the types of objects to which the pointers of dynamic trait objects point. This approach is similar to that
used in Java pointer analysis, primarily due to the lack of vtable information in Rust MIR. Treating pointer casts as simple assignments, as in existing C/C++ pointer analysis [38, 41], would lead to the failure to resolve the call s.area(), as s would be assumed to point to r of type f64.

     1  let r = 2.0;
     2  let p = &r as *const f64 as *const Circle;
     3  let q = p as *const dyn Shape;
     4  let s: &dyn Shape = &(*q); // s is equivalent to q
     5  s.area();

Figure 3. An example with a pointer cast.

In Rust, the vtable pointer of a dynamic trait object is initialized at the cast site, depending on the concrete type from which the trait object is being cast. Thus, the vtable pointer of s actually points to the vtable corresponding to the implementation of Circle for Shape. As a result, calling Circle::area() will yield $\pi \times 2^2 = 12.56$.

To accurately model pointer casts for a given object, we propose using multiple abstractions for the underlying object, each associated with a unique type. For example, in the casting scenario from Figure 3, we create a new abstraction r’ of type Circle and associate it as the pointed-to object for pointers q and s. This approach enables Rupta to correctly resolve s.area() calls, enhancing analysis soundness and precision through type filtering [9].

2.3 Representing an Object’s Fields

The representation of an object’s fields, particularly in nested structs in Rust, greatly affects pointer analysis precision, necessitating the preservation of type information in analysis.

Rust’s memory model shares similarities with C/C++ but differs from Java. Java uses reference semantics, storing all objects on the heap. In contrast, C/C++ employs both value and reference semantics, enabling objects to reside on both the stack and the heap. Rust, by default, supports value semantics, allowing objects to contain subobjects as field components. Additionally, any object, regardless of its allocation location (stack or heap), can have its address taken and stored in a variable. These features have substantial implications for handling field information in Rust program pointer analysis.

Consider an example illustrated in Figure 4. In Figure 4a, the Outer struct contains a subobject inn of type Inner, stored directly within the Outer instance, accessible without dereferencing. In Figure 4b, ptr holds the address of obj.inn subobject, and it is then used to store the address of a into the obj.inn.x field. Precise pointer analysis requires field sensitivity to distinguish fields within the same struct.

     1  struct Inner {
     2      x: &u32,
     3      y: &u32 }
     4  struct Outer {
     5      inn: Inner }
    (a) Struct declaration

     6  let a: u32 = 2;
     7  let mut obj = Outer {
     8      inn: Inner {...} };
     9  let mut ptr = &mut obj.inn;
    10  (*ptr).x = &a;
    (b) Complex field access

    11  impl ToString for Inner { fn to_string(&self) -> String {...} }
    12  impl ToString for Outer { fn to_string(&self) -> String {...} }
    13  impl ToString for &u32 { fn to_string(&self) -> String {...} }
    14  let tostr = ptr as &dyn ToString;
    15  tostr.to_string();
    (c) Dynamic call

Figure 4. An example with nested struct types.

Existing pointer analysis frameworks for C/C++ [38, 41] use a field-index-based approach on LLVM IR, representing the $i$-th field of obj as (obj, $i$). This approach typically allows structs to share the same index with their first field, facilitating the handling of pointer casts that convert a struct pointer into its first field (e.g., casting *Outer to *Inner). However, when applied to Rust MIR, this representation introduces challenges. Rust MIR includes a wider range of type categories than LLVM IR, such as Enum and Union, making it difficult to properly represent each kind of field. Additionally, this representation creates ambiguity regarding object types. For example, the abstraction (obj, 0) can represent three different types: Outer, Inner, and &u32. This ambiguity hampers the full utilization of Rust’s precise type system and may result in precision issues.

In the scenario where the ToString trait is implemented for three types, Outer, Inner, and &u32, as depicted in Figure 4c, a complexity emerges when creating a dynamic trait object. This complexity arises when casting ptr into tostr (line 14) and then invoking the trait-defined method to_string() on it (line 15). The field-index-based approach is unable to determine which of these three implementations should be called. This ambiguity stems from the fact that the object (obj, 0) that tostr points to could potentially be any one of the three types: Outer, Inner, or &u32.

In our Rupta framework, we use a projection-based representation to distinguish object fields, identifying each subobject by its base object and full field projection. This preserves the source code’s access pattern (e.g., obj.inn and obj.inn.x) and assigns a unique type to each object. In the example from Figure 4c, our approach correctly identifies that tostr points to the obj.inn object of type Inner, ensuring accurate resolution of the dynamic call in line 15.

3 Rupta: Pointer Analysis for Rust

In this section, we introduce the core pointer analysis technique used in Rupta. We present a simple Rust language model in Section 3.1 as the foundation for our Rupta framework. The rules for context-sensitive pointer analysis in Rust are explained in Section 3.2. Finally, we discuss Rupta’s effective handling of various Rust language features in Section 3.3.

3.1 The Rust Language Model

The simple Rust language given in Figure 5 is a subset of Rust MIR. It encompasses functions and global variables, treated in a unified manner (since the initialization procedure for each global variable can be viewed as a function). Within
each function, the first local $v_0$ denotes the return value, followed by locals representing function arguments, and then user-declared variables and temporaries.

    Function    $\mathit{fn} ::= f(v_1, v_2, \ldots)\ \{ s; \}$
    Type        $\tau ::= \texttt{bool} \mid \texttt{i8} \mid \texttt{u8} \mid \texttt{i16} \mid \texttt{u16} \mid \ldots$
    Local       $v \in \{v_0, v_1, v_2, \ldots\}$
    Projection  $\eta ::= \epsilon \mid n.\eta$
    Place       $p ::= v.\eta$
    PlaceExpr   $\varrho ::= v \mid {*\varrho} \mid \varrho.n$
    Rvalue      $r ::= \varrho \mid \&\varrho \mid \varrho\ \texttt{as}\ \tau$
    Statement   $s ::= s_1; s_2 \mid \varrho = r \mid \varrho = f(\varrho_1, \varrho_2, \ldots) \mid \ldots$

Figure 5. A simplified Rust language, where $n$ represents a projection element other than a dereference for a struct.

Since Rupta is currently flow-insensitive, it only considers assignments and function calls, while ignoring control flow statements and other irrelevant Rust MIR statements. In Rust MIR, each object is represented by a Place Expression, which consists of a base variable and a list of projection elements that project out from the base variable, such as dereferencing a pointer, accessing a field, or indexing an array. To address memory representation, we introduce two abstractions: Place $p$ and PlaceExpr $\varrho$. Places are a subset of place expressions that exclude dereferences, always corresponding to a specific part of a local variable on the stack. On the other hand, a place expression may represent a memory access on the heap. If a projection does not exist (when $\eta = \epsilon$), a place $p$ degenerates simply into a local variable. For instance, in the example from Figures 4a and 4b, obj is a local variable with $\eta$ consisting of three projection elements: inn, inn.x, and inn.y, resulting in three places, obj.inn, obj.inn.x, and obj.inn.y, with respect to the base variable obj.

An assignment $\varrho = r$ replaces the value of $\varrho$ with an rvalue $r$, which may be a place expression ($\varrho$), a place expression with its address taken ($\&\varrho$), or a type cast ($\varrho\ \texttt{as}\ \tau$). Rust’s type system ensures that $\varrho$ and $r$ must have the same type.

This language model, while providing a good representation of Rust MIR, encounters challenges when handling complex high-level assignment statements. To mitigate this, we classify all statements into eight categories, as shown in Table 1. For complex assignments that do not fit these categories (e.g., $(*v_1).\eta_1 = (*v_2).\eta_2$), we break them down into simpler statements using temporary variables.

Table 1. Eight kinds of statements analyzed by Rupta.

    Statement                          Kind
    $v = \texttt{alloc}(\tau)$         Alloc
    $p_1 = \&p_2$                      AddrOf
    $p_1 = p_2$                        Assign
    $p_1 = p_2\ \texttt{as}\ \tau$     Cast
    $p = (*v).\eta$                    Load
    $(*v).\eta = p$                    Store
    $p = \&(*v).\eta$                  Gep
    $p_0 = f(p_1, p_2, \ldots)$        Call

We use more specific representations, like $p$ and $(*v).\eta$, to denote objects (or memory locations), rather than a general place expression $\varrho$. Our analysis assumes that all pointers being dereferenced are local variables. In Rust, there are two types of pointers: references (e.g., &u32) and raw pointers (e.g., *u32). Nevertheless, we treat them uniformly in our analysis without distinguishing between the two.

Let us explore the eight statement types in Table 1. An Alloc statement creates an object $o$ of type $\tau$. If $o$ is on the stack, $v$ represents $o$. For heap objects, $v$ points to $o$, and heap objects are identified by their allocation sites. All allocation function calls like alloc, realloc, alloc_zeroed, and exchange_malloc are treated as allocation sites. Heap objects default to type u8, as the return type of alloc is *u8. An AddrOf statement assigns the address of $p_2$ to another place, creating a pointer $p_1$. To read from or write into a place dereferenced from a pointer $v$, we use Load or Store statements. Assign or Cast statements copy the value from $p_2$ to $p_1$, with Cast reinterpreting it as a new type $\tau$. A Gep statement is similar to an LLVM IR GetElementPtr instruction, returning the address of a field or element in the base object pointed to by $v$. Rust has two main types of type casts: primitive (for primitive data types) and pointer (for pointer types). In pointer analysis, we only need to handle pointer casts. Some Rust MIR cast statements are categorized as Assign because they don’t reinterpret the underlying objects, such as casting from a reference to a raw pointer or from a concrete type to a dynamic trait type.

When analyzing a Call statement, we must account for inter-procedural assignments, which may include argument-passing and value-return. In the case of a function pointer call, the target function is represented as a variable.

In pointer analysis, we typically analyze statements that copy memory addresses from one pointer to another. In Rust MIR, whole structs or arrays can be copied to another variable. For example, a statement like dst = src can copy all fields or elements from the src variable to the dst variable, both of which have the type Outer. Therefore, it is important to be cautious when handling assignments in Rust MIR, as implicit pointer assignments can occur when copying pointer-type fields between structs.

3.2 The Core Analysis

For a Rust program, we define domains $F$, $V$, $T$, $N$, and $L$ as sets of functions, variables, data types, projection elements (dereferences excluded), and statement labels, respectively. Let $R = N^*$ denote the universe of projections. Thus, $V \times R$ represents the set of places, whereas $P$ and $O$ denote the sets of pointers and pointed-to memory locations, respectively.

Rupta supports context-sensitivity through cloning [21, 41], creating multiple function instances under distinct calling contexts to prevent information from one context flowing into another. It employs a $k$-callsite-sensitive pointer analysis, distinguishing the calling context of a function based on its $k$-limited call path (the last $k$ call sites, each labeled
as $l \in L$). Thus, the set of contexts is $C = L^*$. For a context $c = [l_1, \ldots, l_n] \in C$ and a call site $l \in L$, $l{::}c$ represents $[l, l_1, \ldots, l_n]$, and $\lceil c \rceil_k$ returns the $k$-limited context $[l_1, \ldots, l_k]$ for $c$. We use $\langle c, o \rangle$ and $\langle c, p \rangle$ to denote specialized instances of object $o$ and variable $p$ under context $c$, respectively.

To address pointer casts, we adopt a novel approach compared to existing pointer analysis frameworks for C/C++ [38, 41]. In our approach, a memory location involved in a cast statement is represented by multiple abstract objects, each with a distinct type. We use $o_{\succ\tau} \in O$ to denote an abstract object referring to the type variant of $o$, which is specialized to have the type $\tau$. Our pointer analysis ensures that (1) each object is associated with a single type, and (2) each pointer, excluding those for dynamic trait objects, is restricted to point only to objects of a specific type.

Our analysis computes the following points-to relation:
• $\mathit{pts} : C \times P \mapsto \wp(C \times O)$
which records the points-to information for each pointer under every possible context being analyzed.

The following auxiliary functions are also used:
• $\mathit{ctxt\_of} : F \mapsto \wp(C)$
• $\mathit{type\_of} : (P \cup O) \mapsto T$
• $\mathit{proj\_ty} : T \mapsto \wp(R \times T)$
• $\mathit{ptr\_proj} : T \mapsto \wp(R)$
• $\mathit{deref\_ty} : T \mapsto T$
• $\mathit{transmute} : C \times O \times T \mapsto O$
• $\mathit{dispatch} : C \times (F \cup P) \times P \mapsto \wp(F)$
where ctxt_of maintains function contexts, type_of records pointer or object types, proj_ty provides field projections with types (including an empty projection $\epsilon$ paired with the given type), ptr_proj fetches projections of pointer-type fields (resulting in $\epsilon$ for pointer types or $\emptyset$ if none exist), deref_ty indicates the type of a dereferenced pointer, transmute returns the type variant of an object specialized for a different type, and dispatch determines the actual function implementation for a call. More details on the last two functions can be found in Section 3.3.

Figure 6 formalizes callsite-based context-sensitive pointer analysis with inference rules. We will briefly explain each rule using the examples from Figure 4. In Section 3.3, we explore the analysis of various advanced language features, addressing the corresponding challenges from Section 2.

[AllocStack] is responsible for managing stack allocation for a local variable. It creates an abstract object for this variable and its constituent fields (if any), while recording their types. For instance, with the type Outer, we can derive

$$\text{[AllocStack]}\quad \dfrac{i : v = \texttt{alloc}(\tau) \qquad (\eta, \tau') \in \mathit{proj\_ty}(\tau)}{\mathit{type\_of}(v.\eta) = \tau'}$$

$$\text{[AllocHeap]}\quad \dfrac{i : p = \texttt{alloc}(\tau) \qquad c \in \mathit{ctxt\_of}(f_i) \qquad (\eta, \tau') \in \mathit{proj\_ty}(\tau)}{\langle c, o_i \rangle \in \mathit{pts}(c, p) \qquad \mathit{type\_of}(o_i.\eta) = \tau'}$$

$$\text{[AddrOf]}\quad \dfrac{i : p_1 = \&p_2 \qquad c \in \mathit{ctxt\_of}(f_i)}{\langle c, p_2 \rangle \in \mathit{pts}(c, p_1)}$$

$$\text{[Assign]}\quad \dfrac{i : p_1 = p_2 \qquad c \in \mathit{ctxt\_of}(f_i) \qquad \tau = \mathit{type\_of}(p_1) \qquad \eta \in \mathit{ptr\_proj}(\tau)}{\mathit{pts}(c, p_2.\eta) \subseteq \mathit{pts}(c, p_1.\eta)}$$

$$\text{[Load]}\quad \dfrac{\begin{array}{c} i : p = (*v).\eta \qquad c \in \mathit{ctxt\_of}(f_i) \\ \langle c', o \rangle \in \mathit{pts}(c, v) \qquad \tau = \mathit{type\_of}(p) \qquad \eta' \in \mathit{ptr\_proj}(\tau) \end{array}}{\mathit{pts}(c', o.\eta.\eta') \subseteq \mathit{pts}(c, p.\eta')}$$

$$\text{[Store]}\quad \dfrac{\begin{array}{c} i : (*v).\eta = p \qquad c \in \mathit{ctxt\_of}(f_i) \\ \langle c', o \rangle \in \mathit{pts}(c, v) \qquad \tau = \mathit{type\_of}(p) \qquad \eta' \in \mathit{ptr\_proj}(\tau) \end{array}}{\mathit{pts}(c, p.\eta') \subseteq \mathit{pts}(c', o.\eta.\eta')}$$

$$\text{[Gep]}\quad \dfrac{i : p = \&(*v).\eta \qquad c \in \mathit{ctxt\_of}(f_i) \qquad \langle c', o \rangle \in \mathit{pts}(c, v)}{\langle c', o.\eta \rangle \in \mathit{pts}(c, p)}$$

$$\text{[Cast]}\quad \dfrac{\begin{array}{c} i : p_1 = p_2\ \texttt{as}\ \tau \\ c \in \mathit{ctxt\_of}(f_i) \qquad \langle c', o \rangle \in \mathit{pts}(c, p_2) \qquad \tau' = \mathit{deref\_ty}(\tau) \end{array}}{o_{\succ\tau'} = \mathit{transmute}(c', o, \tau') \qquad \langle c', o_{\succ\tau'} \rangle \in \mathit{pts}(c, p_1)}$$

$$\text{[Call]}\quad \dfrac{\begin{array}{c} i : p_0 = f(p_1, \ldots, p_r) \\ c \in \mathit{ctxt\_of}(f_i) \qquad c' = \lceil i{::}c \rceil_k \qquad f' \in \mathit{dispatch}(c, f, p_1) \\ \tau_0 = \mathit{type\_of}(p_0) \qquad \tau_1 = \mathit{type\_of}(p_1) \qquad \ldots \qquad \tau_r = \mathit{type\_of}(p_r) \end{array}}{\begin{array}{c} c' \in \mathit{ctxt\_of}(f') \\ \forall \eta_0 \in \mathit{ptr\_proj}(\tau_0) : \mathit{pts}(c', v_0^{f'}.\eta_0) \subseteq \mathit{pts}(c, p_0.\eta_0) \\ \forall j \in [1, r],\ \forall \eta_j \in \mathit{ptr\_proj}(\tau_j) : \mathit{pts}(c, p_j.\eta_j) \subseteq \mathit{pts}(c', v_j^{f'}.\eta_j) \end{array}}$$

Figure 6. Rules for $k$-callsite-sensitive pointer analysis ($f_i$ is the function containing statement $i$ being analyzed).

except that the abstract object $o_i$, identified by the allocation site $i$, is added to the points-to target of the pointer $p$. [AddrOf] simply causes $p_1$ to point to the place $p_2$. [Assign] handles implicit pointer copy in an Assign statement element-wise, based on the projections of the pointer type fields of the type of $p_1$ (and $p_2$). For instance, if $p_1$ and $p_2$ are of the Outer type, $\mathit{pts}(c, p_2.\texttt{inn.x}) \subseteq \mathit{pts}(c, p_1.\texttt{inn.x})$ and $\mathit{pts}(c, p_2.\texttt{inn.y}) \subseteq \mathit{pts}(c, p_1.\texttt{inn.y})$ will be established. If $p_1$ and $p_2$ are of a pointer type, then $\mathit{ptr\_proj}(\tau) = \epsilon$, reducing [Assign] to $\mathit{pts}(c, p_2) \subseteq \mathit{pts}(c, p_1)$. [Load] and [Store] obtain every memory location $o$ that may be pointed to by $v$ from the pts function and handle Load or Store statements as an Assign statement between $o.\eta$ and $p$, for all pointer type fields of $o.\eta$ and $p$. For example, given the pointer ptr, which points to the object obj.inn, we can load it into a variable in of type Inner using in = *ptr (or store to it using (*ptr) = in). It will be processed as in.x = obj.inn.x and in.y = obj.inn.y (or obj.inn.x = in.x and obj.inn.y = in.y). Similarly, [Gep] finds any object $o$ that may be pointed to by $v$ and inserts its field $o.\eta$ into the points-to set of $p$. In
field-type pairs such as (𝜖, Outer), (inn, Inner), (inn.x,
[Cast], only a pointer cast is considered where 𝜏 is a pointer
&u32), and (inn.y, &u32) using the function proj_ty.
type. The function transmute is applied to each object 𝑜 in
Therefore, objects obj, obj.inn, obj.inn.x and obj.inn.y
the points-to set of 𝑝 2 to obtain its type variant 𝑜 ≻𝜏 ′ , where 𝜏 ′
are created and their respective types are recorded for the
is the dereference type of 𝜏. 𝑜 ≻𝜏 ′ is added to the points-to set
local variable obj. [AllocHeap] is responsible for allocating
of 𝑝 1 instead of 𝑜. [Call] applies dispatch to determine the
a heap object. It operates in a manner akin to [AllocStack]


actual implementation of a function invoked. A new context c′ is generated from the current context c and the call site i for the callee function invoked. In Rust MIR, a method call p₀ = p₁.f(p₂, ..., p_r) is translated into p₀ = f(p₁, p₂, ..., p_r), with p₁ as the first argument in the call. We identify the variables in a method f′ as $v_0^{f'}, v_1^{f'}, \ldots$. Thus, a call is modeled as assignments from its actual arguments p_j to their corresponding formal parameters $v_j^{f'}$, and from its return variable $v_0^{f'}$ to the target variable p₀ of the call statement.

Algorithm 2: static_dispatch
Input: A call's operand f
Output: A target function f′ resolved
 1  (def_id, substs) ← f
 2  if def_id is not a trait method then
 3      f′ ← f
 4  else
 5      if def_id is not an Fn* trait method then
 6          f′ ← Instance::resolve(def_id, substs)
 7      else
 8          if substs[0] is Closure(def_id′, substs′) then
 9              f′ ← (def_id′, substs′)
10          else if substs[0] is FnDef(def_id′, substs′) then
11              f″ ← (def_id′, substs′)
12              f′ ← static_dispatch(f″)
13          else if substs[0] is FnPtr then
14              f′ ← dummy_function()
15          else
16              f′ ← Instance::resolve(def_id, substs)

Employing the inference rules from Figure 6, we use the incremental worklist algorithm [22] for pointer analysis. This method maintains and updates a worklist of pointers, propagating changes through the constraint graph of pointer-related assignments (Table 1), until reaching a fixed point.
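To make the two mechanisms above concrete, the following is a minimal sketch, not Rupta's actual implementation. It shows (a) how a k-limited callee context ⌈i::c⌉_k could be built and (b) how a worklist could propagate points-to sets along subset edges until a fixed point. All names (`extend_context`, `Solver`, `add_subset_edge`) are hypothetical, pointers and objects are plain strings, and propagation is kept context-insensitive for brevity.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// A calling context: the most recent call sites, newest first.
type Context = Vec<u32>;

/// Builds the callee context: prepend call site `i` to `ctx`,
/// then keep at most `k` entries (k = 0 yields the empty context).
fn extend_context(i: u32, ctx: &[u32], k: usize) -> Context {
    let mut next = vec![i];
    next.extend(ctx.iter().copied());
    next.truncate(k);
    next
}

/// Pointers and abstract objects are named by strings in this toy model.
type Node = &'static str;
type Obj = &'static str;

#[derive(Default)]
struct Solver {
    /// Current points-to set of every pointer.
    pts: HashMap<Node, BTreeSet<Obj>>,
    /// Constraint graph: an edge src -> dst encodes pts(src) ⊆ pts(dst).
    succs: HashMap<Node, Vec<Node>>,
    /// Pointers whose points-to set (or outgoing edges) changed.
    worklist: VecDeque<Node>,
}

impl Solver {
    /// Records that `p` may point to `o`; re-queues `p` only on change.
    fn add_points_to(&mut self, p: Node, o: Obj) {
        if self.pts.entry(p).or_default().insert(o) {
            self.worklist.push_back(p);
        }
    }

    /// Adds the constraint pts(src) ⊆ pts(dst), e.g. for `dst = src`.
    fn add_subset_edge(&mut self, src: Node, dst: Node) {
        self.succs.entry(src).or_default().push(dst);
        self.worklist.push_back(src);
    }

    /// Propagates points-to facts along edges until nothing changes.
    fn solve(&mut self) {
        while let Some(n) = self.worklist.pop_front() {
            let objs: Vec<Obj> = self
                .pts
                .get(n)
                .map(|s| s.iter().copied().collect())
                .unwrap_or_default();
            let dsts = self.succs.get(n).cloned().unwrap_or_default();
            for d in dsts {
                for &o in &objs {
                    self.add_points_to(d, o);
                }
            }
        }
    }
}

fn main() {
    let mut s = Solver::default();
    s.add_points_to("p", "obj"); // p = &obj
    s.add_subset_edge("p", "q"); // q = p
    s.add_subset_edge("q", "r"); // r = q
    s.add_subset_edge("r", "q"); // q = r (a cycle; still terminates)
    s.solve();
    println!("pts(r) = {:?}", s.pts.get("r"));
    println!("context = {:?}", extend_context(7, &[3, 1], 2));
}
```

The solver terminates on cyclic constraints because a pointer is re-queued only when its points-to set actually grows. A full analysis would additionally key every pointer and object by its context and add new edges on the fly as calls are resolved.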

3.3 Support for Rust's Language Features

We delve into our approaches for addressing various pointer analysis challenges, including analyzing calls, modeling pointer casts, and representing an object's fields (among others).

3.3.1 Calls Different kinds of function calls are resolved through dispatch in our analysis according to Algorithm 1.

Algorithm 1: dispatch
Input: A context c, the call's operand f, the call's first argument p₁
Output: A set of target functions F resolved
 1  if f is statically dispatched then
 2      f′ ← static_dispatch(f)
 3      F ← {f′}
 4  else if f is dynamically dispatched then
 5      F ← dynamic_dispatch(c, f, p₁)
 6  else // f is a function pointer
 7      F ← ∅
 8      foreach f′ ∈ pts(c, f) do
 9          F ∋ static_dispatch(f′)

Static Dispatch. A statically dispatched call in Rust aims for a specific compile-time-determined implementation. However, Rust's use of generics adds complexity. In Rust MIR, generic types remain, and monomorphization transforms MIR into lower-level codegen IR. To handle static dispatch in Rupta, we perform on-the-fly monomorphization. When calling a generic function, Rupta determines the real types of the generics in a context-sensitive manner and substitutes them into the target function. In Rust MIR, a function is identified by a (def_id, substs) pair, where def_id uniquely identifies the function, and substs lists the (argument) types for generic substitution. For example, the monomorphized sprint_area() functions in Figure 1 are represented by (id_sprint_area, [Circle]) and (id_sprint_area, [Rectangle]).

To determine the target function for a call that is to be statically dispatched, Rupta uses static_dispatch given in Algorithm 2. Specifically, when resolving a call to a trait-defined method f (lines 4–16), Rupta treats the trait method as a generic function, since a default implementation can be provided within the trait and monomorphized for various types that implement this trait. In this case, the def_id of the callee corresponds to the method's definition within the trait scope, and the first element of substs represents the concrete type on which the method is called. In Figure 1, when the method area() defined in trait Shape is called on a Circle object, the def_id and substs are id_Shape::area and [Circle], respectively. To identify the exact implementation for the invocation, Rupta uses the Rust compiler's internal API, Instance::resolve(). If the trait method is implemented for the concrete type, the corresponding implementation is selected. Otherwise, the default implementation within the trait is monomorphized for the given concrete type.

Handling calls through Fn* traits (lines 8–16) involves specific treatments for closures, function items, and function pointers, corresponding to Closure, FnDef, and FnPtr in Algorithm 2. For a closure, the target function is identified as the associated closure itself. In the case of a function item, static_dispatch is utilized to further process it. When dealing with a function pointer under a Fn* trait, a new dummy function is generated to act as the target function:

    fn dummy_func<T, U>(fp: fn(T) -> U, args: T) -> U { fp(args) }

which only involves a function pointer call. The values of fp and args will come from the arguments of the Fn* trait call. All the other calls via Fn* traits are handled in line 16.

Dynamic Dispatch. Dynamic dispatch for virtual calls on dynamic trait objects necessitates knowledge of the concrete types these objects point to. Call graph edges are then dynamically identified using the gathered points-to information. Algorithm 3 details the dynamic_dispatch process for resolving targets of such calls. In these calls, the function f's def_id relates to a trait method's definition, with the first element of substs being a dynamic type, like dyn Shape in Figure 1. We simulate dynamic dispatch as static by substituting substs[0] with the type of each object the trait object's pointer references, using static_dispatch.
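The idea of simulating dynamic dispatch statically can be illustrated with a small self-contained sketch. It is an assumption-laden toy, not Rupta's code: `FnInstance` mimics the (def_id, substs) pair, and the hypothetical `ImplTable` is a stand-in lookup table that plays the role the text assigns to the compiler's Instance::resolve(); it does not call any rustc API. Types and functions are plain strings, reusing the Shape, Circle, and Rectangle names from Figure 1.

```rust
use std::collections::{BTreeSet, HashMap};

/// A function instance in the style of MIR: (def_id, substs).
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct FnInstance {
    def_id: &'static str,
    substs: Vec<&'static str>,
}

/// Toy stand-in for the compiler's resolution step (NOT the rustc API):
/// maps (trait method, concrete self type) to the implementing function.
struct ImplTable {
    impls: HashMap<(&'static str, &'static str), &'static str>,
}

impl ImplTable {
    /// Resolves a statically known call. If the table has an impl for
    /// (def_id, substs[0]) it is selected; otherwise the instance is kept,
    /// which models a plain function or a monomorphized default method.
    fn static_dispatch(&self, f: &FnInstance) -> FnInstance {
        let hit = f
            .substs
            .first()
            .and_then(|ty| self.impls.get(&(f.def_id, *ty)).copied());
        match hit {
            Some(target) => FnInstance { def_id: target, substs: vec![] },
            None => f.clone(),
        }
    }

    /// Resolves a virtual call by replacing substs[0] (assumed to be the
    /// `dyn Trait` type) with the type of every pointed-to object and
    /// dispatching each resulting instance statically.
    fn dynamic_dispatch(
        &self,
        f: &FnInstance,
        pointee_types: &[&'static str],
    ) -> BTreeSet<FnInstance> {
        pointee_types
            .iter()
            .map(|ty| {
                let mut g = f.clone();
                g.substs[0] = *ty;
                self.static_dispatch(&g)
            })
            .collect()
    }
}

fn main() {
    let mut impls = HashMap::new();
    impls.insert(("Shape::area", "Circle"), "<Circle as Shape>::area");
    impls.insert(("Shape::area", "Rectangle"), "<Rectangle as Shape>::area");
    let table = ImplTable { impls };

    // A virtual call `shape.area()` where `shape: &dyn Shape`.
    let call = FnInstance { def_id: "Shape::area", substs: vec!["dyn Shape"] };
    // Types of the objects in the receiver's points-to set.
    for target in table.dynamic_dispatch(&call, &["Circle", "Rectangle"]) {
        println!("call edge -> {}", target.def_id);
    }
}
```

The precision of the resulting call graph depends entirely on the receiver's points-to set: the fewer spurious object types it contains, the fewer spurious call edges are produced.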


Algorithm 3: dynamic_dispatch
Input: A context c, the call's operand f, the call's first argument p₁
Output: A set of target functions F resolved
 1  F ← ∅
 2  foreach ⟨c′, o⟩ ∈ pts(c, p₁) do
 3      τ ← type_of(o)
 4      replace substs[0] of f with τ
 5      F ∋ static_dispatch(f)

Function Pointers. The process of resolving a function pointer call is described in lines 8–9 of Algorithm 1. A function pointer may point to a function item or a closure. Therefore, we identify all potential targets during the pointer analysis. Then, for each potential target, we simulate dynamic dispatch statically by using static_dispatch.

Algorithm 4: transmute
Input: A context c, an abstract object o, and a cast type τ′
Output: A new abstract object o≻τ′ of type τ′
 1  o≻τ′ ← get_or_create(o, τ′)
 2  proj ← ptr_proj(type_of(o))
 3  proj′ ← ptr_proj(τ′)
 4  for η ∈ proj, η′ ∈ proj′ do
 5      if o.η, o≻τ′.η′ have the same offset then
 6          τ_η ← type_of(o.η)
 7          τ′_η′ ← type_of(o≻τ′.η′)
 8          if τ_η == τ′_η′ then
 9              Emit a new Assign stmt under context c: o.η = o≻τ′.η′
10              Emit a new Assign stmt under context c: o≻τ′.η′ = o.η
11          else
12              Emit a new Cast stmt under context c: o.η = o≻τ′.η′ as τ_η
13              Emit a new Cast stmt under context c: o≻τ′.η′ = o.η as τ′_η′
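The steps of Algorithm 4 (transmute) can be sketched in a few lines of Rust. This is an illustrative toy under stated assumptions, not Rupta's implementation: the `Analysis`, `PtrField`, and `Stmt` types are hypothetical, objects and types are strings, the context argument is omitted, the field offsets (0 and 8) are assumed rather than taken from a real layout, and constraints are emitted only when a type variant is first created.

```rust
use std::collections::HashMap;

/// A pointer-typed field of a type: projection path, byte offset, field type.
#[derive(Clone, Debug)]
struct PtrField {
    proj: &'static str,
    offset: usize,
    ty: &'static str,
}

/// Statements emitted to keep an object and its type variant in sync.
#[derive(Debug, PartialEq)]
enum Stmt {
    Assign { dst: String, src: String },
    Cast { dst: String, src: String, ty: &'static str },
}

#[derive(Default)]
struct Analysis {
    /// Stand-in for ptr_proj: pointer-typed fields of each type.
    layouts: HashMap<&'static str, Vec<PtrField>>,
    /// Cache for get_or_create: (object, type) -> variant name.
    variants: HashMap<(String, &'static str), String>,
    /// Constraints generated so far.
    emitted: Vec<Stmt>,
}

impl Analysis {
    /// Returns the variant of `obj` specialized to `new_ty`, creating it
    /// (and the mutual constraints between same-offset pointer fields)
    /// on first use.
    fn transmute(&mut self, obj: &str, obj_ty: &'static str, new_ty: &'static str) -> String {
        let key = (obj.to_string(), new_ty);
        if let Some(v) = self.variants.get(&key) {
            return v.clone();
        }
        let variant = format!("{obj}≻{new_ty}");
        self.variants.insert(key, variant.clone());

        let old = self.layouts.get(obj_ty).cloned().unwrap_or_default();
        let new = self.layouts.get(new_ty).cloned().unwrap_or_default();
        for a in &old {
            for b in &new {
                if a.offset != b.offset {
                    continue;
                }
                let o = format!("{obj}.{}", a.proj);
                let v = format!("{variant}.{}", b.proj);
                if a.ty == b.ty {
                    // Identical pointer types: plain copies in both directions.
                    self.emitted.push(Stmt::Assign { dst: o.clone(), src: v.clone() });
                    self.emitted.push(Stmt::Assign { dst: v, src: o });
                } else {
                    // Different pointer types: the fields themselves are cast.
                    self.emitted.push(Stmt::Cast { dst: o.clone(), src: v.clone(), ty: a.ty });
                    self.emitted.push(Stmt::Cast { dst: v, src: o, ty: b.ty });
                }
            }
        }
        variant
    }
}

fn main() {
    let mut a = Analysis::default();
    a.layouts.insert("Outer", vec![
        PtrField { proj: "inn.x", offset: 0, ty: "&u32" },
        PtrField { proj: "inn.y", offset: 8, ty: "&u32" },
    ]);
    a.layouts.insert("(&u32, &u32)", vec![
        PtrField { proj: "0", offset: 0, ty: "&u32" },
        PtrField { proj: "1", offset: 8, ty: "&u32" },
    ]);
    let q = a.transmute("obj", "Outer", "(&u32, &u32)");
    println!("variant: {q}");
    for s in &a.emitted {
        println!("{s:?}");
    }
}
```

With these assumed layouts, the variant's field 0 is tied to obj.inn.x and field 1 to obj.inn.y, each by a pair of reciprocal assignments. Calling transmute again with the same arguments returns the cached variant without emitting further constraints.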

3.3.2 Pointer Casts Like C++'s reinterpret_cast, these statements often pose challenges, leading to unsound (under-approximate) and imprecise (spurious) points-to information. To enhance modeling of pointer casts and achieve superior soundness and precision compared to state-of-the-art pointer analyses for C/C++ [38, 41] (Section 2.2), we propose using multiple abstract objects with distinct types to represent memory locations. Each abstract object o is initially associated with type τ. When a cast statement transforms o into a new type τ′, a new abstract object, denoted as o≻τ′, is created to represent the modified memory location.

Maintaining the correlations between an object o and its type variants o≻τ′ is crucial. For example, in the Outer struct from Figure 4a, a developer might cast a pointer to an Outer struct into a pointer to a (&u32, &u32) tuple:

    let p = &obj as *const Outer;
    let q = p as *const (&u32, &u32);

As q points to a new abstract object obj≻(&u32,&u32), it is essential to maintain the points-to information in both obj and obj≻(&u32,&u32) in sync. This ensures that obj≻(&u32,&u32).0 is associated with obj.inn.x and obj≻(&u32,&u32).1 is associated with obj.inn.y. Neglecting this synchronization can lead to unsound and imprecise results.

Algorithm 4, as applied in Cast (Figure 6), manages the transmute() function. When a memory location o undergoes a cast to type τ′, a new abstract object o≻τ′ is created, provided it does not already exist. Type coercion constraints are then set between o and o≻τ′. For each pair of pointer type fields (o.η, o≻τ′.η′) sharing the same offset, mutually reciprocal assignments are inserted, classified as Assign or Cast based on whether their pointer types are identical. This process ensures synchronization of points-to information between the original object and its transformed variants.

Our approach effectively addresses various issues caused by pointer casts, but it still encounters challenges when a pointer is cast to an integer, a problem that remains unresolved in C/C++ pointer analysis [38, 41]. Additionally, we observed that many library functions perform pointer casts to *u8 pointers and then revert them back to their original types after some operations. These intermediate steps often result in imprecise *u8 pointers that point to multiple extraneous objects, leading to unnecessary type variants, imprecise points-to relations, and spurious subobjects. To mitigate this, we impose restrictions on such casts. Specifically, we only allow a non-heap object o to be transmuted to o≻u8 and further cast to o≻τ′ if o≻τ′ has already been created before. While this approach may be unsound in rare cases, it aligns with existing techniques [7, 36] that prioritize precision and selectively support well-established code patterns rather than being overly conservative, as mentioned in [4].

Another issue arises with recursive casting:

    1 struct Foo { x: u32, y: u32 }
    2 p = &(*p).y as *const u32 as *const Foo;

where p initially points to an object foo of type Foo. Analyzing line 2 leads it to point to the abstract object (foo.y)≻Foo. However, re-analyzing this line for the new pointed-to object induces another object, ((foo.y)≻Foo.y)≻Foo, and this process repeats indefinitely, causing our analysis to fail to terminate. This issue is known as the Positive Weight Cycle (PWC) problem in pointer analysis for C/C++ [36]. To address it, we currently ignore casts that may lead to recursive casting, as such casts are typically incompatible and unsafe, stemming from the inherent imprecision of pointer analysis.

3.3.3 Representing an Object's Fields Let us explain how our projection-based approach for representing object fields enables accurate analysis for the example in Figure 4, using the rules from Figure 6. When initializing obj, we create abstract objects obj, obj.inn, obj.inn.x, and obj.inn.y through [AllocStack], with their types being Outer, Inner, &u8, and &u8, respectively. [AddrOf] handles the statement that assigns the address of obj.inn to ptr, causing obj.inn to be included in the points-to set of ptr. The casting of ptr to tostr is processed as a direct assignment by [Assign], resulting in tostr encompassing all objects pointed to by ptr. Finally, [Call] resolves the call tostr.to_string() by invoking the dispatch function (Algorithm 1), which identifies the types of the pointed-to objects of tostr according


to Algorithm 3. As tostr is found to point to a single object obj.inn with a precise type, Inner, this call is correctly resolved to Inner::to_string(), as desired.

3.3.4 Other Language Features Rupta effectively handles several other Rust features, such as Enum and Union types. Enum variants are processed as if they are fields of the enum, thus eliminating the need for specialized handling. In the case of Union types, where different fields occupy the same storage space, type coercion constraints are set among these fields, akin to the approach in Algorithm 4.

In Rust, Builder::spawn() initiates threads by accepting a generic Fn* trait object and executing its associated function in a new thread. This function, indirectly invoked via the external C function thread_start(), is not directly resolvable in Rust MIR. We have precisely modeled spawn()'s semantics, enabling analysis of the associated function.

3.4 Discussion

Rupta is sound with respect to our Rust language model, as outlined in Figure 5, except for the case of "ϱ as τ". This is because our inference rules in Figure 6 consistently over-approximate all statements analyzed. It is important to note that in practical scenarios, no pointer analysis frameworks are entirely sound due to dynamic language elements like reflection, native code, and pointer casts, as discussed in [29]. As described in Section 3.3.2, Rupta is more sound and precise in handling "ϱ as τ" than the state of the art [41].

4 Implementation

Rupta. Rupta, developed in Rust, consists of about 10K lines of code and is designed as a custom callback function for the official Rust compiler, rustc 1.63.0-nightly. It seamlessly integrates with Cargo, the official Rust package manager, and can be executed with a single command, cargo pta. By default, Rupta includes build-std, which compiles the standard library as part of the crate graph compilation. Moreover, Rupta includes dedicated handlers to enhance the precision of analyzing two standard library functions, Result::map_err and Into::into<Unique, NonNull>. Achieving similar precision with these two functions typically necessitates more in-depth context analysis.

Baselines. As the first pointer analysis framework for Rust, Rupta's call graph construction will be assessed against two state-of-the-art solutions: Ruscg (Static Call Graph for Rust) [6, 27, 37] and Rurta (Rapid Type Analysis [5] for Rust) [44]. Ruscg constructs a static call graph, omitting dynamic calls, a common approach in existing inter-procedural analysis tools [27, 37]. Rurta determines possible types for dynamic trait objects using information about dynamic trait object creation. It maintains a cache of concrete types at cast sites, allowing dynamic traits to refer to any type in the cache implementing the specified trait. Rurta also applies to function pointers, treating them as pointing to any function item or closure with the same signature encountered in assignments.

Furthermore, we obtained a context-insensitive version of Rupta, denoted Andersen [1], as a third baseline by analyzing every function under the empty context.

5 Evaluation

We demonstrate Rupta's efficiency and precision in performing k-callsite-based pointer analysis for extensive Rust programs, outperforming Ruscg and Rurta in constructing accurate call graphs. We also highlight the inefficiency of Andersen, mainly due to Rust's distinctive handling of raw pointers. The four Rupta configurations we evaluate are Andersen (context-insensitive), 1-CS (k = 1), 2-CS (k = 2), and 3-CS (k = 3). Our two key research questions are:

• RQ1. Is Rupta scalable and effective?
• RQ2. Can Rupta build precise call graphs?

Benchmarks. To evaluate the scalability, efficiency, and precision of Rupta, we conducted experiments on real-world Rust projects. We have selected 13 large open-source projects with high popularity, based on their number of downloads or stars on GitHub. Each project in our dataset includes a binary target, enabling us to start the analysis from the main entry point. The dependencies (including the Rust Standard Library) are compiled together with the project, allowing us to analyze functions within them. The repositories we selected, along with their GitHub stars, number of MIR statements (in reachable functions discovered by Rurta), and a brief description, are presented in Table 2.

Table 2. Rust projects (including stars and #MIR statements)

Project    #Stars  #Stmts  Description
atuin      10.7k   788k    Shell history management for an SQLite database
bandwhich  7.8k    333k    A terminal bandwidth utilization tool
dust       6.1k    384k    A more intuitive version of 'du'
exa        21.4k   144k    A modern replacement for 'ls'
fselect    3.5k    485k    Finding files with SQL-like queries
gitui      12.8k   776k    A fast terminal-ui for git
grin       5k      1103k   An implementation of the Mimblewimble protocol
lsd        9.7k    420k    An advanced version of 'ls' command tool
mdbook     14.4k   874k    A utility to create books from markdown files
navi       13.1k   457k    An interactive cheatsheet tool for 'cmd'
resvg      2k      458k    An SVG rendering library
rustscan   10.4k   469k    A modern take on the port scanner
zoxide     10.2k   291k    A smarter 'cd' command tool

Experimental Setting. We have conducted all our experiments on Ubuntu 20.04.5, with an Intel(R) Xeon(R) W-2275 @ 3.30GHz CPU and 512GB of RAM.

5.1 RQ1: Is Rupta Scalable and Effective?

Table 3 presents the analysis times, memory usage and average points-to sizes of Rupta for its four configurations. Rupta, the first context-sensitive pointer analysis tool for Rust, demonstrates scalable analysis times for Andersen, 1-CS, and 2-CS across all 13 projects. 1-CS can complete the analysis within 30 seconds for the majority of the projects (8 out of 13), and within 60 seconds for all except grin. Meanwhile, 2-CS manages to complete for 11 projects within 8


minutes. The analysis time and memory usage increase significantly for 3-CS, and it runs out of memory when analyzing the two largest projects (consisting of 870K and 1.1M statements). This unscalability aligns with the observation that published papers on callsite-based pointer analysis for Java and C/C++ [21, 23, 30] have not even utilized k ⩾ 3.

Table 3. Analysis times, memory usage, and average points-to sizes of Rupta in four configurations (OOM: Out of memory).

Config    Metric      atuin  bandwhich   dust    exa  fselect  gitui    grin    lsd  mdbook   navi  resvg  rustscan  zoxide
Andersen  Time (s)     71.7       22.0   17.3    1.8     37.4  171.4   565.4   27.9   361.9   21.9   31.2      52.8    10.4
Andersen  Mem (GB)     20.3        5.3    5.2    0.4     10.2   35.1    74.2    7.1    48.6    6.2    8.4       8.5     3.1
Andersen  Avg pts     198.6      130.5   97.9   31.2    202.1  393.7  1000.2  145.3   758.0  150.7  176.5     253.7    87.3
1-CS      Time (s)     49.7        9.5   10.7    2.5     31.0   39.8   193.3   13.9    56.3   12.3   19.0      20.9     7.2
1-CS      Mem (GB)     13.0        2.3    3.2    0.6      9.7    9.7    45.3    4.2    13.9    4.0    8.0       5.9     2.1
1-CS      Avg pts      22.3        9.7   11.9    6.1     37.1   13.9    72.6   11.1    23.2   12.8   33.7      17.1    12.1
2-CS      Time (s)    447.5       94.6   45.8    7.5    178.9  383.8  1506.8   64.6  1057.9   50.0   62.1     196.6    22.3
2-CS      Mem (GB)     60.4       11.6   14.2    2.1     42.9   47.9   161.2   18.4   165.8   17.0   25.6      30.2     7.8
2-CS      Avg pts      17.8        7.9    9.8    5.0     31.9   11.4    66.5    8.4    18.8   10.3   26.3      13.8     9.8
3-CS      Time (s)   9332.9     3003.4  250.3   43.5   2212.4 5103.2       -  470.2       -  258.5  321.7    4494.1   121.7
3-CS      Mem (GB)    381.2       89.0   58.9    7.4    209.3  303.3     OOM   76.1     OOM   63.0  103.0     199.0    31.2
3-CS      Avg pts      13.9        5.7    7.7    4.0     18.4    8.7       -    6.0       -    7.3   21.4      10.9     7.7

Surprisingly, 1-CS outperforms Andersen, achieving faster analysis times across all 13 benchmarks except for exa, contrary to typical outcomes in pointer analyses for Java [21, 25]. This is primarily due to encapsulating unsafe code, especially raw pointers within safe Rust code. Without context-sensitivity, analyzing many functions in the Rust Standard Library produces highly imprecise points-to results, resulting in significantly larger points-to sets for many pointers and an overall increase in analysis time. As shown in Table 3, increasing k from 1 to 3 in Rupta improves precision, evidenced by a slight decrease in the average size of points-to sets for each project. Nevertheless, the disparity between Andersen and 1-CS remains considerable, with Andersen's average points-to set sizes being an order of magnitude larger.

Let us explore why Rupta's performance benefits from context-sensitivity in practice. In Figure 7, we provide simplified definitions of two commonly used structs: NonNull and Iter, along with the constructor for Iter. NonNull plays a critical role in efficient heap memory manipulation, serving as a wrapper for raw pointers in widely used types like Box and Vec. On the other hand, Iter is used for iterating a slice. When creating a new Iter instance from a slice, the constructor first converts the &[T] slice to a pointer ptr of type *T using the as_ptr function. Then, ptr is passed as an argument to new_unchecked() to create a new NonNull object. These two functions, as_ptr() and new_unchecked(), are frequently called in a program. In context-insensitive analysis, the pointer field of the created Iter object would point to every memory location referenced by the arguments of these two functions. This could result in numerous pointed-to objects, which might further propagate, potentially causing issues with many other library functions.

    pub struct NonNull<T> { p: *const T }
    pub struct Iter<T> { ptr: NonNull<T>, end: *const T }
    impl<T> Iter<T> {
        pub fn new(slice: &[T]) -> Self {
            let ptr = slice.as_ptr();
            Self {
                ptr: NonNull::new_unchecked(ptr),
                end: ptr.add(slice.len()),
            }
        }
    }

Figure 7. An example calling for context-sensitivity.

The problem of inflated points-to targets worsens as the project's scale increases, as shown in Table 3 (with average points-to sizes reaching 758.0 for mdbook and 1000.2 for grin). However, this problem is substantially mitigated with context sensitivity, where even 1-CS proves sufficient to avoid conflating analysis results across contexts for functions like as_ptr and new_unchecked. Consequently, the average points-to size decreases significantly (from 758.0 to 23.2 for mdbook and from 1000.2 to 72.6 for grin).

5.2 RQ2: Can Rupta Build Precise Call Graphs?

Table 4 summarizes four key statistics for call graphs built using Rupta, Rurta, and Ruscg: (1) #call-edges: the number of call graph edges discovered, (2) #reach-funcs: the number of reachable functions found, (3) #dyn-sites: the number of dynamic callsites detected, and (4) #dyn-edges: the number of dynamic call edges resolved.

Let us first compare Rupta and Ruscg. On average, Rupta discovers approximately 29% more call edges and 26% more reachable functions, resulting in more sound call graphs, and consequently, enabling uncovering potentially more security vulnerabilities than Ruscg (once used by security analysis tools [3, 6, 8, 27, 37, 46]). Comparing Rupta and Rurta in Table 4, we observe that while the two tools produce similar results in terms of the number of call edges and reachable methods found, Rupta outperforms Rurta by reducing spurious dynamic edges by an average of 70%.

Our results show significant precision improvements in call graph construction with context-sensitive pointer analysis compared to Andersen, reducing dynamic edges by approximately 24% on average. Andersen exhibits notable imprecision, resulting in a substantial divergence between its call graphs and those produced by context-sensitive analyses (1-CS to 3-CS). In most cases, 1-CS to 3-CS produce nearly identical results across all metrics, with only a few benchmarks showing slightly better precision for 3-CS or 2-CS over 1-CS. Rust differs fundamentally from other popular languages like C/C++ and Java, with relatively fewer


dynamic call sites even in real-world Rust projects (Table 2). However, for Rust programs using certain language features, such as structured types and pointer casts, Rupta demonstrates good precision and efficiency. Therefore, 1-CS is generally sufficient for constructing precise call graphs for such Rust programs, and increasing the context length may have limited benefits. In contrast, increasing the context length in context-sensitive pointer analysis frameworks for Java, such as Qilin [21] and Doop [39], often yields only small precision improvements in constructing call graphs for large Java applications. Nevertheless, Rupta represents a significant advancement in pointer analysis over the state of the art for Rust. It is substantially more sound than Ruscg (discovering 29% more call graph edges) and more precise than Rurta (eliminating about 70% of spurious dynamic call edges).

Table 4. Comparing call graphs built by different analyses.

Program    Metrics         Ruscg   Rurta   Ander    1-CS    2-CS    3-CS
atuin      #call-edges     95618  127770  109755  108325  108093  108025
atuin      #reach-funcs    40862   52439   46499   46145   46014   46012
atuin      #dyn-sites          -     182     134     132     131     130
atuin      #dyn-edges          -    4805    1274    1057    1047    1042
bandwhich  #call-edges     41668   56453   54302   53094   53094   52756
bandwhich  #reach-funcs    19435   24464   24099   23869   23869   23780
bandwhich  #dyn-sites          -     284     252     245     245     242
bandwhich  #dyn-edges          -    2369     904     708     708     672
dust       #call-edges     55653   63725   62447   61290   61290   61099
dust       #reach-funcs    24067   26822   26614   26303   26303   26259
dust       #dyn-sites          -      83      75      73      73      58
dust       #dyn-edges          -    1454     684     477     477     389
exa        #call-edges     18342   24349   23109   22258   22258   22256
exa        #reach-funcs     8197   10310   10049    9864    9864    9863
exa        #dyn-sites          -      60      56      47      47      46
exa        #dyn-edges          -     993     425     256     256     255
fselect    #call-edges     61434   77496   74130   73087   73087   73074
fselect    #reach-funcs    25099   30134   29657   29493   29493   29487
fselect    #dyn-sites          -     185     181     181     181     179
fselect    #dyn-edges          -    3381    1050     878     878     872
gitui      #call-edges     75008  126161  123613  112399  112399  112394
gitui      #reach-funcs    33278   51411   50774   46887   46887   46884
gitui      #dyn-sites          -     307     271     260     260     257
gitui      #dyn-edges          -    2930    1566    1219    1219    1216
grin       #call-edges    127921  176641  160534  157163  157154       -
grin       #reach-funcs    51725   68036   63684   63040   63033       -
grin       #dyn-sites          -     623     440     436     435       -
grin       #dyn-edges          -    8062    2587    2046    2044       -
lsd        #call-edges     61870   67231   65857   64655   64655   64655
lsd        #reach-funcs    25445   26853   26647   26314   26314   26314
lsd        #dyn-sites          -      82      72      68      68      68
lsd        #dyn-edges          -    1465     684     475     475     475
mdbook     #call-edges     80512  139623  121895  121139  121128       -
mdbook     #reach-funcs    34571   56823   50995   50855   50849       -
mdbook     #dyn-sites          -     179     112     112     112       -
mdbook     #dyn-edges          -    3851    1193    1035    1032       -
navi       #call-edges     31778   74650   62651   61385   61385   61385
navi       #reach-funcs    13797   30139   25722   25382   25382   25382
navi       #dyn-sites          -      97      74      69      69      69
navi       #dyn-edges          -    1624     700     488     488     488
resvg      #call-edges     62852   71954   70109   69187   69187   69187
resvg      #reach-funcs    23288   26088   25677   25445   25445   25445
resvg      #dyn-sites          -     166     158     155     155     155
resvg      #dyn-edges          -    1232     551     444     444     444
rustscan   #call-edges     64732   78416   74859   74213   74213   74209
rustscan   #reach-funcs    28735   31755   31212   31091   31091   31089
rustscan   #dyn-sites          -     284     247     247     247     245
rustscan   #dyn-edges          -    3686    1280    1150    1150    1148
zoxide     #call-edges     33091   47394   46436   45359   45359   45359
zoxide     #reach-funcs    14425   19870   19689   19352   19352   19352
zoxide     #dyn-sites          -      76      62      49      49      49
zoxide     #dyn-edges          -    1011     500     316     316     316

6 Related Work

We review the work that is the most related to our work.

Program Analysis for Rust. Several studies have adapted existing program analysis tools for Rust. Linder et al. [28] used KLEE, a symbolic execution engine [12], for Rust verification. CRUST [43] and Kani [44] translated Rust code to C and verified it using the CBMC model checker [13]. Prusti [3] leveraged the Viper verification infrastructure [33] to support the verification of user-added specifications.

Other tools focus on Rust MIR. Qin et al. [37] developed bug detectors for use-after-free and double-lock bugs. SafeDrop [14] identifies memory corruptions using taint analysis. MIRAI and MirChecker [27] are verification tools that perform symbolic execution on Rust MIR. Rudra [6] detects memory safety bugs using Rust MIR and HIR.

Pointer Analysis for Other Languages. Pointer analysis is a complex problem in static program analysis, with numerous approaches and tools proposed for Java and C/C++. Imperative implementations like Spark [22], Wala [45], and Qilin [37] offer context-sensitive pointer analysis for Java. Declarative implementations, such as Doop [11] for Java and cclyzer [7] for C/C++, are based on a Datalog engine. SVF [41, 42] provides support for performing flow-sensitive pointer analysis for C/C++ on a sparse flow value graph.

Recent research focuses on improving scalability while maintaining precision or minimizing precision trade-offs. Selective context sensitivity has been a widely studied optimization, inspiring various techniques [18–21, 25, 26, 31, 35].

7 Conclusion

We introduce Rupta as the first context-sensitive pointer analysis framework for Rust, functioning on Rust MIR. This framework effectively addresses crucial aspects like call analysis, pointer cast modeling, and the management of nested struct fields. Through evaluations conducted on real-world projects, Rupta has proven its capability in generating precise points-to information and creating accurate call graphs. Rupta is expected to significantly enhance existing program analysis and verification tools, focusing on bug and security vulnerability detection. This advancement is aimed at improving the overall safety and reliability of Rust programs.

As the Rupta community expands, we aim to integrate various forms of context-sensitivity, like object sensitivity [32], type sensitivity [39], and flow sensitivity. This will enhance Rupta's capacity for advanced program analysis.

8 Data Availability

The research artifact associated with this paper can be found on Zenodo [24]. Rupta is set to be released as open-source software, accessible at https://rustanlys.github.io/rupta/.

9 Acknowledgements

We extend our gratitude to all reviewers for their constructive feedback. This research has been supported by ARC grants (DP210102409 and DP240103194).


References
[1] Lars Ole Andersen. 1994. Program analysis and specialization for the C programming language. Ph.D. Dissertation.
[2] Brian Anderson, Lars Bergstrom, Manish Goregaokar, Josh Matthews, Keegan McAllister, Jack Moffitt, and Simon Sapin. 2016. Engineering the Servo web browser engine using Rust. In Proceedings of the 38th International Conference on Software Engineering Companion (Austin, Texas) (ICSE ’16). Association for Computing Machinery, New York, NY, USA, 81–89. https://doi.org/10.1145/2889160.2889229
[3] Vytautas Astrauskas, Peter Müller, Federico Poli, and Alexander J. Summers. 2019. Leveraging Rust types for modular specification and verification. Proc. ACM Program. Lang. 3, OOPSLA, Article 147 (Oct 2019), 30 pages. https://doi.org/10.1145/3360573
[4] Dzintars Avots, Michael Dalton, V. Benjamin Livshits, and Monica S. Lam. 2005. Improving software security with a C pointer analysis. In Proceedings of the 27th International Conference on Software Engineering (St. Louis, MO, USA) (ICSE ’05). Association for Computing Machinery, New York, NY, USA, 332–341. https://doi.org/10.1145/1062455.1062520
[5] David F. Bacon and Peter F. Sweeney. 1996. Fast static analysis of C++ virtual function calls. In Proceedings of the 11th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (San Jose, California, USA) (OOPSLA ’96). Association for Computing Machinery, New York, NY, USA, 324–341. https://doi.org/10.1145/236337.236371
[6] Yechan Bae, Youngsuk Kim, Ammar Askar, Jungwon Lim, and Taesoo Kim. 2021. Rudra: Finding Memory Safety Bugs in Rust at the Ecosystem Scale. In Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles (Virtual Event, Germany) (SOSP ’21). Association for Computing Machinery, New York, NY, USA, 84–99. https://doi.org/10.1145/3477132.3483570
[7] George Balatsouras and Yannis Smaragdakis. 2016. Structure-sensitive points-to analysis for C and C++. In Static Analysis: 23rd International Symposium, SAS 2016, Edinburgh, UK, September 8-10, 2016, Proceedings 23. Springer, 84–104. https://doi.org/10.1007/978-3-662-53413-7_5
[8] Marek Baranowski, Shaobo He, and Zvonimir Rakamarić. 2018. Verifying Rust Programs with SMACK. In Automated Technology for Verification and Analysis: 16th International Symposium, ATVA 2018, Los Angeles, CA, USA, October 7-10, 2018, Proceedings 16. Springer, 528–535. https://doi.org/10.1007/978-3-030-01090-4_32
[9] Marc Berndl, Ondřej Lhoták, Feng Qian, Laurie Hendren, and Navindra Umanee. 2003. Points-to analysis using BDDs. In Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation (San Diego, California, USA) (PLDI ’03). Association for Computing Machinery, New York, NY, USA, 103–114. https://doi.org/10.1145/781131.781144
[10] Kevin Boos, Namitha Liyanage, Ramla Ijaz, and Lin Zhong. 2020. Theseus: an Experiment in Operating System Structure and State Management. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, 1–19. https://www.usenix.org/conference/osdi20/presentation/boos
[11] Martin Bravenboer and Yannis Smaragdakis. 2009. Strictly declarative specification of sophisticated points-to analyses. In Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications (Orlando, Florida, USA) (OOPSLA ’09). Association for Computing Machinery, New York, NY, USA, 243–262. https://doi.org/10.1145/1640089.1640108
[12] Cristian Cadar, Daniel Dunbar, and Dawson Engler. 2008. KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation (San Diego, California) (OSDI ’08). USENIX Association, USA, 209–224. https://dl.acm.org/doi/10.5555/1855741.1855756
[13] Edmund Clarke, Daniel Kroening, and Flavio Lerda. 2004. A tool for checking ANSI-C programs. In Tools and Algorithms for the Construction and Analysis of Systems: 10th International Conference, TACAS 2004, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2004, Barcelona, Spain, March 29-April 2, 2004. Proceedings 10. Springer, 168–176. https://doi.org/10.1007/978-3-540-24730-2_15
[14] Mohan Cui, Chengjun Chen, Hui Xu, and Yangfan Zhou. 2023. SafeDrop: Detecting Memory Deallocation Bugs of Rust Programs via Static Data-flow Analysis. ACM Trans. Softw. Eng. Methodol. 32, 4, Article 82 (May 2023), 21 pages. https://doi.org/10.1145/3542948
[15] Karel Driesen and Urs Hölzle. 1996. The direct cost of virtual function calls in C++. In Proceedings of the 11th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (San Jose, California, USA) (OOPSLA ’96). Association for Computing Machinery, New York, NY, USA, 306–323. https://doi.org/10.1145/236337.236369
[16] Ana Nora Evans, Bradford Campbell, and Mary Lou Soffa. 2020. Is Rust used safely by software developers? In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering (Seoul, South Korea) (ICSE ’20). Association for Computing Machinery, New York, NY, USA, 246–257. https://doi.org/10.1145/3377811.3380413
[17] Rust for Linux. 2023. The Rust for Linux Project. https://rust-for-linux.com/
[18] Dongjie He, Yujiang Gui, Wei Li, Yonggang Tao, Changwei Zou, Yulei Sui, and Jingling Xue. 2023. A Container-Usage-Pattern-Based Context Debloating Approach for Object-Sensitive Pointer Analysis. Proc. ACM Program. Lang. 7, OOPSLA2 (2023), 971–1000. https://doi.org/10.1145/3622832
[19] Dongjie He, Jingbo Lu, Yaoqing Gao, and Jingling Xue. 2021. Accelerating Object-Sensitive Pointer Analysis by Exploiting Object Containment and Reachability. In 35th European Conference on Object-Oriented Programming (ECOOP 2021) (Leibniz International Proceedings in Informatics (LIPIcs), Vol. 194), Anders Møller and Manu Sridharan (Eds.). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany, 16:1–16:31. https://doi.org/10.4230/LIPIcs.ECOOP.2021.16
[20] Dongjie He, Jingbo Lu, and Jingling Xue. 2022. Context debloating for object-sensitive pointer analysis. In Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering (Melbourne, Australia) (ASE ’21). IEEE Press, 79–91. https://doi.org/10.1109/ASE51524.2021.9678880
[21] Dongjie He, Jingbo Lu, and Jingling Xue. 2022. Qilin: A New Framework For Supporting Fine-Grained Context-Sensitivity in Java Pointer Analysis. In 36th European Conference on Object-Oriented Programming (ECOOP 2022) (Leibniz International Proceedings in Informatics (LIPIcs), Vol. 222), Karim Ali and Jan Vitek (Eds.). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany, 30:1–30:29. https://doi.org/10.4230/LIPIcs.ECOOP.2022.30
[22] Ondřej Lhoták and Laurie Hendren. 2003. Scaling Java points-to analysis using Spark. In Compiler Construction: 12th International Conference, CC 2003 Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2003 Warsaw, Poland, April 7–11, 2003 Proceedings 12. Springer, 153–169. https://doi.org/10.1007/3-540-36579-6_12
[23] Lian Li, Cristina Cifuentes, and Nathan Keynes. 2013. Precise and Scalable Context-Sensitive Pointer Analysis via Value Flow Graph. In Proceedings of the 2013 International Symposium on Memory Management (Seattle, Washington, USA) (ISMM ’13). Association for Computing Machinery, New York, NY, USA, 85–96. https://doi.org/10.1145/2491894.2466483
[24] Wei Li, Dongjie He, Yujiang Gui, Wenguang Chen, and Jingling Xue. 2024. Artifact associated to the paper "A Context-Sensitive Pointer Analysis Framework for Rust and Its Application to Call Graph Construction" published in CC’24. https://doi.org/10.5281/zenodo.10566216 artifact.
[25] Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis. 2018. Precision-guided context sensitivity for pointer analysis. Proc. ACM Program. Lang. 2, OOPSLA, Article 141 (Oct 2018), 29 pages. https://doi.org/10.1145/3276511
[26] Yue Li, Tian Tan, Anders Møller, and Yannis Smaragdakis. 2018. Scalability-first pointer analysis with self-tuning context-sensitivity. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Lake Buena Vista, FL, USA) (ESEC/FSE 2018). Association for Computing Machinery, New York, NY, USA, 129–140. https://doi.org/10.1145/3236024.3236041
[27] Zhuohua Li, Jincheng Wang, Mingshen Sun, and John C.S. Lui. 2021. MirChecker: Detecting Bugs in Rust Programs via Static Analysis. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (Virtual Event, Republic of Korea) (CCS ’21). Association for Computing Machinery, New York, NY, USA, 2183–2196. https://doi.org/10.1145/3460120.3484541
[28] Marcus Lindner, Jorge Aparicius, and Per Lindgren. 2018. No Panic! Verification of Rust Programs by Symbolic Execution. In 2018 IEEE 16th International Conference on Industrial Informatics (INDIN). 108–114. https://doi.org/10.1109/INDIN.2018.8471992
[29] Benjamin Livshits, Manu Sridharan, Yannis Smaragdakis, Ondřej Lhoták, J. Nelson Amaral, Bor-Yuh Evan Chang, Samuel Z. Guyer, Uday P. Khedker, Anders Møller, and Dimitrios Vardoulakis. 2015. In defense of soundiness: a manifesto. Commun. ACM 58, 2 (Jan 2015), 44–46. https://doi.org/10.1145/2644805
[30] Jingbo Lu, Dongjie He, and Jingling Xue. 2021. Selective Context-Sensitivity for k-CFA with CFL-Reachability. In Static Analysis: 28th International Symposium, SAS 2021, Chicago, IL, USA, October 17–19, 2021, Proceedings (Chicago, IL, USA). Springer-Verlag, Berlin, Heidelberg, 261–285. https://doi.org/10.1007/978-3-030-88806-0_13
[31] Jingbo Lu and Jingling Xue. 2019. Precision-preserving yet fast object-sensitive pointer analysis with partial context sensitivity. Proc. ACM Program. Lang. 3, OOPSLA, Article 148 (Oct 2019), 29 pages. https://doi.org/10.1145/3360574
[32] Ana Milanova, Atanas Rountev, and Barbara G. Ryder. 2002. Parameterized object sensitivity for points-to and side-effect analyses for Java. In Proceedings of the 2002 ACM SIGSOFT International Symposium on Software Testing and Analysis (Roma, Italy) (ISSTA ’02). Association for Computing Machinery, New York, NY, USA, 1–11. https://doi.org/10.1145/566172.566174
[33] Peter Müller, Malte Schwerhoff, and Alexander J. Summers. 2016. Viper: A Verification Infrastructure for Permission-Based Reasoning. In Proceedings of the 17th International Conference on Verification, Model Checking, and Abstract Interpretation - Volume 9583 (St. Petersburg, FL, USA) (VMCAI 2016). Springer-Verlag, Berlin, Heidelberg, 41–62. https://doi.org/10.1007/978-3-662-49122-5_2
[34] Vikram Narayanan, Tianjiao Huang, David Detweiler, Dan Appel, Zhaofeng Li, Gerd Zellweger, and Anton Burtsev. 2020. RedLeaf: Isolation and Communication in a Safe Operating System. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, 21–39. https://www.usenix.org/conference/osdi20/presentation/narayanan-vikram
[35] Hakjoo Oh, Wonchan Lee, Kihong Heo, Hongseok Yang, and Kwangkeun Yi. 2014. Selective context-sensitivity guided by impact pre-analysis. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (Edinburgh, United Kingdom) (PLDI ’14). Association for Computing Machinery, New York, NY, USA, 475–484. https://doi.org/10.1145/2594291.2594318
[36] David J. Pearce, Paul H.J. Kelly, and Chris Hankin. 2007. Efficient field-sensitive pointer analysis of C. ACM Trans. Program. Lang. Syst. 30, 1 (Nov 2007), 4–es. https://doi.org/10.1145/1290520.1290524
[37] Boqin Qin, Yilun Chen, Zeming Yu, Linhai Song, and Yiying Zhang. 2020. Understanding memory and thread safety practices and issues in real-world Rust programs. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (London, UK) (PLDI 2020). Association for Computing Machinery, New York, NY, USA, 763–779. https://doi.org/10.1145/3385412.3386036
[38] Qingkai Shi, Xiao Xiao, Rongxin Wu, Jinguo Zhou, Gang Fan, and Charles Zhang. 2018. Pinpoint: Fast and Precise Sparse Value Flow Analysis for Million Lines of Code. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (Philadelphia, PA, USA) (PLDI 2018). Association for Computing Machinery, New York, NY, USA, 693–706. https://doi.org/10.1145/3192366.3192418
[39] Yannis Smaragdakis, Martin Bravenboer, and Ondřej Lhoták. 2011. Pick your contexts well: understanding object-sensitivity. In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (Austin, Texas, USA) (POPL ’11). Association for Computing Machinery, New York, NY, USA, 17–30. https://doi.org/10.1145/1926385.1926390
[40] Jeff Vander Stoep and Stephen Hines. 2021. Rust in the Android platform. https://security.googleblog.com/2021/04/rust-in-android-platform.html
[41] Yulei Sui and Jingling Xue. 2016. SVF: interprocedural static value-flow analysis in LLVM. In Proceedings of the 25th International Conference on Compiler Construction (Barcelona, Spain) (CC 2016). Association for Computing Machinery, New York, NY, USA, 265–266. https://doi.org/10.1145/2892208.2892235
[42] Yulei Sui, Ding Ye, and Jingling Xue. 2012. Static memory leak detection using full-sparse value-flow analysis. In International Symposium on Software Testing and Analysis, ISSTA 2012, Minneapolis, MN, USA, July 15-20, 2012, Mats Per Erik Heimdahl and Zhendong Su (Eds.). ACM, 254–264. https://doi.org/10.1145/2338965.2336784
[43] John Toman, Stuart Pernsteiner, and Emina Torlak. 2015. CRUST: a bounded verifier for Rust. In Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (Lincoln, Nebraska) (ASE ’15). IEEE Press, 75–80. https://doi.org/10.1109/ASE.2015.77
[44] Alexa VanHattum, Daniel Schwartz-Narbonne, Nathan Chong, and Adrian Sampson. 2022. Verifying dynamic trait objects in Rust. In Proceedings of the 44th International Conference on Software Engineering: Software Engineering in Practice (Pittsburgh, Pennsylvania) (ICSE-SEIP ’22). Association for Computing Machinery, New York, NY, USA, 321–330. https://doi.org/10.1145/3510457.3513031
[45] WALA. 2023. WALA: T.J. Watson Libraries for Analysis. Retrieved April 8, 2023 from http://wala.sourceforge.net/
[46] Fabian Wolff, Aurel Bílý, Christoph Matheja, Peter Müller, and Alexander J. Summers. 2021. Modular specification and verification of closures in Rust. Proc. ACM Program. Lang. 5, OOPSLA, Article 145 (Oct 2021), 29 pages. https://doi.org/10.1145/3485522
[47] Hui Xu, Zhuangbin Chen, Mingshen Sun, Yangfan Zhou, and Michael R. Lyu. 2021. Memory-Safety Challenge Considered Solved? An In-Depth Study with All Rust CVEs. ACM Trans. Softw. Eng. Methodol. 31, 1, Article 3 (Sep 2021), 25 pages. https://doi.org/10.1145/3466642
[48] Ziyi Zhang, Boqin Qin, Yilun Chen, Linhai Song, and Yiying Zhang. 2020. VRLifeTime – An IDE Tool to Avoid Concurrency and Memory Bugs in Rust. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security (Virtual Event, USA) (CCS ’20). Association for Computing Machinery, New York, NY, USA, 2085–2087. https://doi.org/10.1145/3372297.3420024

Received 2023-11-07; accepted 2023-12-23
