The Parts-of-file File System
Article · January 2003
Source: OAI
Authors: Yoann Padioleau (Meta), Olivier Ridoux (University of Rennes)
INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE
The Parts-of-file File System
Yoann Padioleau — Olivier Ridoux
N° 4783
March 2003
THÈME 2
ISRN INRIA/RR--4783--FR+ENG
Research report
ISSN 0249-6399
Theme 2: Software engineering and symbolic computation
Lande project
Research report no. 4783, March 2003, 14 pages
Abstract: We present a new way of managing file contents, its implementation and preliminary experimental results. The goal is to permit simultaneous read/write accesses to different views on a file, in order to help in separating a user's concerns, even when they are not independent. Files are considered as mount point directories from which views on the files are accessible as subdirectories and files with read and write permissions. Views are designated with logical formulas describing desired properties of views. They contain those parts of the original file that satisfy the formulas. Properties are attached to parts of the file automatically via programs called transducers. A file system interface is used for querying views, navigating between views, and updating view contents.
Key-words: file system, view, update, query, navigation
Unité de recherche INRIA Rennes
A system for managing views of files: the Parts-of-file File System
Summary: We present a new way of managing the contents of files, its implementation, and first experimental results. The objective is to allow simultaneous read and write access to different views of a file, in order to contribute to a genuine separation of a user's concerns, even when these are not disjoint. Files are treated as mount-point directories, from which views on the files are accessible as subdirectories and as files that can be read and written. Views are specified with logical formulas, and they contain the parts of the file that satisfy these formulas. Properties are attached to parts of files via programs called transducers. A standard file system interface is used to list, explore, and update the views coherently.
Keywords: file systems, view, update, querying, navigation
Contents

1 Introduction
2 Principles
    2.1 Viewed files, views, and view file
    2.2 A running example
    2.3 Indexing
    2.4 Querying a view
    2.5 Navigating between views
    2.6 Updating
3 Algorithms and data structures
    3.1 View files
    3.2 On-the-fly indexing
    3.3 Synchronizing views
    3.4 Impacts on other mechanisms
    3.5 File operations
4 Extensions
5 Experimentation
    5.1 Implementation
    5.2 Efficiency
6 Related work
7 Applications and future directions
8 Conclusion

1 Introduction

Human-composed text files (e.g., program source files, reports, configuration files) have a life cycle that alternates text editing by a user, and processing by an application (e.g., (error checking; compilation), or (fault finding; testing)). The text editing phase requires that the files clearly separate concerns, whereas the processing phase requires that the files are easily loadable. Moreover, separation of concerns cannot be realized by any single file format, because concerns often overlap.

We will take an example from the domain of software engineering, as we believe these questions are very important to this domain. Consider first procedural programming, à la C; what source files most clearly separate are operations (i.e., functions). Each operation appears as a separate concern. They are applied to objects that are not clearly separated at all, since every operation applies to several kinds of object. Consider now object-oriented programming, à la Java; what source files most clearly separate are (classes of) objects. Now, each class of object appears as a separate concern, but operations (i.e., methods) are not clearly separated. Indeed, methods that implement a given operation are scattered in several places. In the first case, separation of operation concerns makes it easy to add a new operation, but difficult to add a new type of object. In the second case, adding a new type of object is easy, but adding a new operation is cumbersome. In both cases, the file format that is most convenient for the program processor is imposed on the user, at the price that some concerns are not clearly separated. But there are more than two families of concerns in programming. Besides objects and operations, there are non-functional requirements like security, life cycle concerns like versions, etc. No single source file format can properly separate all these concerns.

This is only an example in one domain, but similar examples exist in many domains. Very often, no single file organization can satisfy all desired separations of concerns.

Separation of concerns is also an issue for human-readable files that are not human-composed, e.g., log files, where isolating a single concern is difficult.

We propose a file management service that stores files of an arbitrary format in such a way that the user can manipulate them through projections of various concerns. This raises issues of querying and navigating inside a file to specify views, of updating a file through its views, and of concurrent view updates. The file will be called the viewed file. The task of the proposed service is to maintain coherence between the viewed file and its updatable views, and between the views themselves. In order to offer a generic service, this is realized at the operating system level. So, the generic service is the management of composite operating system objects, their parts, and the coherence of the whole.

The crux of our proposition is to use for these mechanisms the same interface as for file systems. In short, a viewed file is somewhat like a mount-point of a file system. So, the file system offers the interface for querying views, navigating between views, and updating view contents. This provides a unifying framework for application level tools, which allows them to be combined. This file system will be called the Parts-of-file File System (PofFS for short).

The plan of this article is as follows. We first present the principles of the Parts-of-file File System in Section 2. Then, we describe an implementation scheme in Section 3. We present additional features in Section 4. Section 5 describes experiments and their results. Section 6 presents related work. Finally, we present future research directions in Section 7 and conclude in Section 8.
2 Principles

2.1 Viewed files, views, and view file

A view is specified as a property of parts of a viewed file: e.g., "to be a declaration" in a program file, or "to concern variable x". A view determines a view file, which contains exactly all the parts of the viewed file that satisfy the property: e.g., all declarations of a program file. We propose a scheme by which properties are attached to parts of a file automatically via programs called transducers. So, a view file is determined by comparing the properties attached to parts of file and the property that specifies the view.

The most general property, "to be a part of a file", determines a view file that is simply a copy of the viewed file. Too specific properties, e.g., "to be a declaration and a comment", determine view files where no part of the viewed file is present. This latter kind of view is called empty.

Useful properties are somewhere between the most general property and the over-specific ones. Properties and the generality ordering form a directed acyclic graph (a DAG). Moreover, different properties may specify the same view, e.g., "to be a declaration" and "to be a declaration and not a comment" determine the same view file. We say that "no comment" is not a proper increment of "declaration". This notion is formally defined in Section 2.5. Note that "declaration" may be a proper increment of "no comment". The property "to concern variable x" is a proper increment of "declaration" only if x is not the only declared thing. Two different increments of the same property may specify the same view. The file system maintains the relations between properties, and can compute on demand all possible increments to any property that specify different views.

In summary, a view on a viewed file is specified by a property; it determines a view file that contains all parts of the viewed file that satisfy the property, and it also determines a collection of proper increments. In standard file system words, a view on a viewed file is a directory that is accessible via a path (the property). It contains a view file formed of all parts of the viewed file satisfying the property, and subdirectories that can be accessed via the proper increments.

Parts are represented by their coordinates in the viewed file: line numbers if the syntax is line-oriented (e.g., like many configuration files and log files), or stream intervals if the syntax is more structured (e.g., many source files). Coordinates are only used in the tables of PofFS; the user never sees them. Instead, the user designates views using the properties. The properties form the path to the directory where the view file resides.

Navigation among views adapts principles of hierarchical navigation. In the same way as hierarchical navigation starts from the most general directory (e.g., the root, or a homedir), and goes down the hierarchy following subdirectories, navigation among views starts from the most general view, and goes down the DAG following proper increments.

All this involves several mechanisms:
1. The indexing mechanism by which transducers decorate parts of file with properties.
2. The querying mechanism that compares the properties specifying views, and the properties that decorate parts of file.
3. The navigating mechanism by which proper increments of view properties are proposed as subdirectories.
4. The updating mechanism, as modifying a view must modify the viewed file and the other views.

The first three mechanisms inherit from mechanisms we have proposed for a Logic File System (LISFS [8]). The essence of LISFS is to mix querying and navigation for files decorated with logical properties. The main difference between LISFS and PofFS is that the terminal objects of the file system are files for LISFS, and parts of file for PofFS. So, PofFS can be seen as a Logic File System for file contents.

In the following sections, we will first describe the usage of the Parts-of-file File System with a simple example, then we will describe more precisely the different mechanisms of PofFS.

2.2 A running example

A typical shell sequence using PofFS would be:

    $1> cat -n foo.c
     1 int f(int x) {
     2 int y;
     3 assert(x > 1);
     4 y = x;
     5 fprintf(stderr, "x = %d", x);
     6 return y * 2
     7 }
     8 int f2(int z) {
     9 return z * 4
    10 }

Command 1 shows the content of the viewed C file foo.c.

    $2> poffsmount foo.c /poffs
          --transducers=c_transducer,/home/pad/my_transducer
          --meta=/tmp/poffs_tmp

Command 2 mounts it under /poffs, using a general transducer c_transducer, and the user defined my_transducer, to associate properties to parts of the file. In this example, the parts will be lines. Properties are: to which function belongs this line
(function:f), what are the variables involved in this line (var:x), or does this line have a debugging aspect, or a specification aspect. We can represent such associations by a matrix lines × properties which forms the file context (see Figure 1 for an illustration). A place (/tmp/poffs_tmp) is given to program poffsmount in order to store the file context on disk with other meta-data such as view file contents.

[Figure 1: A file context (a matrix of line numbers 1 to 10 against the properties function:f, function:f2, var:x, var:y, var:z, debugging, specification)]

    $3> cd /poffs
    $4> ls
    debugging/ specification/
    function:f/ function:f2/
    var:x/ var:y/ var:z/
    foo.c

Command 4 has two effects. First, it creates a view file foo.c which contains the parts of file corresponding to this directory. As the directory is the root of the mounted file system (/poffs in the example), no properties have been selected yet and the view will have exactly the same contents as the viewed file. Second, it presents navigation increments to the user as subdirectories (as function:f, debugging, ...). Those subdirectories correspond to properties that actually refine the current view, without making it empty.

    $5> cd function:f

Command 5 refines the view, selecting parts of the view file having the function:f property. The current directory changes to /poffs/function:f, and this path corresponds internally to a logical query, in this case function:f.

    $6> ls
    debugging/ specification/
    var:x/ var:y/
    foo.c

Again, command 6 shows how a new view file foo.c has been created, containing this time fewer lines than the viewed file, and shows how the property increments are related to the current query. Indeed, under /poffs/function:f, var:z is no longer listed as an increment, as the current view file (which contains only the code of the f function) contains no line using variable z (see Figure 2 for an illustration). Moreover, function:f is also not listed because this subdirectory does not refine the view, but just generates exactly the same view as the current one.

[Figure 2: Navigation in a file context (after cd function:f, ls lists var:x/, var:y/, debugging/ and specification/)]

    $7> cd !(debugging|specification)

Command 7 illustrates the possibilities of the query language, combining negation (written !) and disjunction (written |). The slash can be read as a conjunction, so the path /poffs/function:f/!(debugging|specification) corresponds logically to $\text{function:f} \land \lnot(\text{debugging} \lor \text{specification})$.

    $8> ls
    var:x/ var:y/
    foo.c
    $9> cat foo.c
    int f(int x) {
    int y;
    ................:1
    y = x;
    ................:2
    return y * 2
    }
    ................:3
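The view file printed by command 9 can be reproduced with a small sketch. The Python below is illustrative only and is not the PofFS implementation: the property sets are read off the running example, the query is written as a nested tuple rather than parsed from a path, and the names `LINES`, `QUERY`, `holds` and `render_view` are hypothetical.

```python
# Hypothetical in-memory model of foo.c: (content, atomic properties) per line.
LINES = [
    ("int f(int x) {",                {"function:f", "var:x"}),
    ("int y;",                        {"function:f", "var:y"}),
    ("assert(x > 1);",                {"function:f", "var:x", "specification"}),
    ("y = x;",                        {"function:f", "var:x", "var:y"}),
    ('fprintf(stderr, "x = %d", x);', {"function:f", "var:x", "debugging"}),
    ("return y * 2",                  {"function:f", "var:y"}),
    ("}",                             {"function:f"}),
    ("int f2(int z) {",               {"function:f2", "var:z"}),
    ("return z * 4",                  {"function:f2", "var:z"}),
    ("}",                             {"function:f2"}),
]

# /poffs/function:f/!(debugging|specification), with "/" read as a conjunction.
QUERY = ("and", "function:f", ("not", ("or", "debugging", "specification")))


def holds(formula, props):
    """Evaluate a propositional formula against a set of atomic properties."""
    if isinstance(formula, str):
        return formula in props
    op, *args = formula
    if op == "not":
        return not holds(args[0], props)
    if op == "and":
        return all(holds(a, props) for a in args)
    if op == "or":
        return any(holds(a, props) for a in args)
    raise ValueError("unknown connective: %r" % (op,))


def render_view(lines, query):
    """Keep matching lines; replace each run of hidden lines by one numbered mark."""
    out, mark, in_mark = [], 0, False
    for content, props in lines:
        if holds(query, props):
            out.append(content)
            in_mark = False
        elif not in_mark:
            mark += 1
            in_mark = True
            out.append("................:%d" % mark)
    return out


view = render_view(LINES, QUERY)
print("\n".join(view))
```

Representing descriptions as plain sets works here because, in the prototype, a description is a conjunction of atoms, so entailment of an atom reduces to set membership.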
Command 8 shows that the list of subdirectories has been reduced to var:x and var:y (see also Figure 3). Command 9 shows the content of the view file. Lines not satisfying the current query have been filtered out and replaced with special marks, like ...........:1

[Figure 3: Creation of a view (for cd function:f/!(debugging|specification), lines 1, 2, 4, 6 and 7 are kept; line 3, line 5, and lines 8 to 10 are replaced by marks 1, 2 and 3)]

    $10> cat foo.c | sed -e s/y/z/ > foo.c

Command 10 shows that the views are updatable views, and can be modified by any tool. The effect of this command is to replace all occurrences of y, in the current view file, with z.

    $11> ls
    var:x/ var:z/
    foo.c

Command 11 shows how modifying the view affects the property increments (compare with the result of command 8).

    $12> pwd
    /poffs/function:f/!(debugging|specification)/
    $13> cd /poffs
    $14> cat foo.c
    int f(int x) {
    int z;
    assert(x > 1);
    z = x;
    fprintf(stderr, "x = %d", x);
    return z * 2
    }
    int f2(int z) {
    return z * 4
    }

Finally, command 14 shows how modifying the view affects the other views, by propagating the modification.

2.3 Indexing

A file transducer for PofFS attaches a property to every part of a file. The definition of a part can be line-oriented; i.e., parts are lines. Alternatively, it can be structure-oriented; i.e., parts correspond to nodes of the abstract syntax tree of the viewed file. For the sake of simplicity, we will only develop the line-oriented case in this article. In fact, operating system issues do not depend on this orientation.

Several transducers can be given to the poffsmount command; PofFS will cascade them. This makes the system easily extensible. For example, to define a memory management aspect, one just has to write a script such as:

    for each line
      if line match
          (malloc|calloc|new|delete|free)
      then print "memory_management\n"
      else print "\n"

and pass it to the poffsmount command, without taking into account other transducers.

A transducer is a finite state machine, so that the indexing of a part may depend on previous lines. For example, remembering in what function body a line lies makes it possible to add the property function:f to the line. This shows that even with the line-oriented approach structured data can be analyzed.

Attached properties are written in a logic language, which is used for querying and navigating. A logic is defined by a language $F$, i.e., its formulas, and an entailment relation that is usually written $\models$. $f_1 \models f_2$ means that if $f_1$ is true, then $f_2$ must be true too. For instance, $a \land b \models a$ holds in propositional logic. Let $O$ be the set of all objects in the file (objects are parts of file), and $c(o)$ be the content of the object $o$. Transducers implement a function $d$ that associates to each object $o$ a formula $d(o)$ describing the object.

In the current prototype, the description of an object is limited to a conjunction of atomic properties (but queries can be any formulas). So, transducers actually return for each line (or token) a set of properties. This association can be represented as in Figure 1 by a matrix lines × properties.

Indexing is costly, so we have tried to limit it. At some points, a transducer may come back to its initial state. This means that parts before that point do not influence parts beyond that point. These points are recorded as synchronization points. If a file is modified at some point, and must be re-indexed, it is enough to re-index the file starting from the last synchronization point before the modified point, and ending at the first synchronization point of the modified file that is also a synchronization point of the old file.
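A line-oriented transducer with synchronization points can be sketched as follows. This is a hypothetical Python illustration, not one of the transducers shipped with PofFS: the regular expressions, the rule that a line starting with `}` returns the machine to its initial state, and the names `index` and `reindex_start` are all assumptions made for the example.

```python
import bisect
import re

# Assumed, simplified patterns: a C-like function header and memory calls.
FUNC_HEADER = re.compile(r"^\w+\s+(\w+)\s*\([^)]*\)\s*\{")
MEMORY = re.compile(r"\b(malloc|calloc|new|delete|free)\b")


def index(lines):
    """Return (properties per line, synchronization points).

    The only state is the name of the enclosing function; the machine is
    back in its initial state after a line that starts with '}'.
    """
    props, sync_points = [], [0]
    current = None  # initial state: outside any function body
    for n, line in enumerate(lines):
        m = FUNC_HEADER.match(line)
        if m:
            current = m.group(1)
        p = set()
        if current is not None:
            p.add("function:" + current)
        if MEMORY.search(line):
            p.add("memory_management")
        props.append(p)
        if line.startswith("}"):
            current = None
            sync_points.append(n + 1)  # initial state again before line n + 1
    return props, sync_points


def reindex_start(sync_points, modified_line):
    """Last synchronization point at or before the modified line (0-based)."""
    return sync_points[bisect.bisect_right(sync_points, modified_line) - 1]


SRC = [
    "int f(int x) {",
    "int *p = malloc(8);",
    "free(p);",
    "return x;",
    "}",
    "int g(void) {",
    "return 0;",
    "}",
]
props, sync_points = index(SRC)
print(props[1], sync_points, reindex_start(sync_points, 6))
```

With this sample, a change on line 6 (0-based) only requires re-running the transducer from line 5, the start of `g`; finding where re-indexing may stop additionally requires comparing the old and new synchronization points, which the sketch leaves out.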
2.4 Querying a view

Views are designated with formulas that serve as paths. Each directory contains a unique view file, which contains the set of lines satisfying the path.

In the current prototype, the logic is propositional logic. Queries are either simple atoms (e.g., $a$), negations of formulas (e.g., $\lnot a$), disjunctions of formulas (e.g., $a_1 \lor a_2 \lor a_3$), or conjunctions of formulas (e.g., $(a_1 \lor a_2) \land a_3 \land (\lnot a_4)$). The entailment relation is that of usual propositional logic. These formulas are written a, !a, a1|a2|a3, and (a1|a2)&(a3)&(!a4) in the concrete syntax. The slash is read as a conjunction. Simple atoms can be valued attributes. The query language supports comparison and matching operations on attributes valued by integers or strings: e.g., cd "function:f.*" (i.e., select all functions whose name starts with an 'f'). Descriptions must be conjunctions of atomic formulas, but any formula can serve as a query.

The query mechanism checks if a conjunction of properties satisfies a query. In a directory $p$, the view file contains $\{c(o) \mid o \in O,\ d(o) \models p\}$. E.g., in the context of foo.c, line 2 belongs to the view file of directory function:f because $\text{function:f} \land \text{var:y} \models \text{function:f}$. The $c(o)$ are always displayed in the order of the viewed file.

2.5 Navigating between views

Property increments, which make it possible to refine views, are proposed to the user as subdirectories. More formally, let $F$ be the set of all properties, let $ext(p) = \{o \in O \mid d(o) \models p\}$ (the extension of $p$), then the set of subdirectories $Sub$ in a directory $p$ is a finite subset of $I = \{f \in F \mid \emptyset \subsetneq ext(f \land p) \subsetneq ext(p)\}$. The subset is chosen to cover all different extensions $ext(f \land p)$. E.g., at step 6 of the running example, var:x is a subdirectory of function:f because $ext(\text{var:x} \land \text{function:f}) = ext(\text{var:x}) \cap ext(\text{function:f}) = \{1, 3, 4, 5\} \subsetneq ext(\text{function:f})$.

Note that $ext(f \land g) = ext(f) \cap ext(g)$, $ext(f \lor g) = ext(f) \cup ext(g)$, and $ext(\lnot f) = complement(ext(f))$.

PofFS also provides mechanisms to group related properties together, in order to reduce the number of answers in ls. For instance, directories function:f/ and function:f2/ can be grouped together under the directory function/*, making the navigation process far more convenient. Directories can also be grouped by the user in taxonomies. As these mechanisms are quite complex and inherited from LISFS, we refer to [8] for more information.

2.6 Updating

As we want to be able to update views, PofFS needs to remember what has been filtered out for back-propagating an update to the viewed file. We also need to make the missing parts explicit in the view file. Indeed, it must be visible whether a new line is added before or after a missing part. For instance, assuming a schematic view a ... b, where ... represents the missing part, a c ... b cannot be distinguished from a ... c b if the position of the missing part is not shown. So, PofFS inserts special marks in the view file everywhere parts have been filtered out. In order not to pollute the file with too many marks, only one mark is generated for consecutive missing parts. In order to designate missing parts (e.g., to move them), a unique number is associated to each mark. We will also see in Section 4 another use for this number.

So, a view file is composed of a set of lines satisfying the query, and a set of marks, internally referring to lines in the viewed file. Updating a view file involves an update of the viewed file and of the properties of each part of file. The updated viewed file is composed of a new set of lines, which are the lines that are hidden by a mark in the updated view file, and the visible lines in the updated view file. The properties of each line are updated by re-applying the transducers on this updated viewed file. Remember that, thanks to the synchronization points, re-indexing is actually done only on a section of the viewed file.

3 Algorithms and data structures

Each line of the viewed file is represented by an internal object identifier oi. The data structure is essentially a matrix object->properties stored on disk. It represents the meta-data object × properties information. It is often large, but sparse. The query and navigation mechanisms use this matrix; the indexing mechanism fills it in. To make the query and navigation algorithms more efficient, PofFS also stores on disk an inverted table property->objects. Transducers initialize the matrix, allocating fresh new objects for each line. There is also a table object->contents that associates to each object its content, and a table line->object that associates to each line number of the viewed file its internal object identifier. Indeed, the line order in the viewed file and the line order in the internal representation need not be the same. E.g., if a line is inserted in a view file, it will be added to the end of its representation. These two tables make it possible to reconstitute the content of the viewed file. Figure 4 illustrates these data structures with foo.c (see Section 2.2), where we assume that function f2 is created first, then function f, then the fprintf to stderr, and finally the assert. This scenario explains the numbering of objects in the file context.
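The role of the inverted table in navigation (Section 2.5) can be illustrated with a short sketch. The Python below is a hypothetical model, not PofFS code: objects are simply identified by their line numbers in foo.c rather than by internal identifiers, and unlike PofFS the sketch does not merge increments that yield the same extension. The names `OBJECT_PROPERTIES`, `invert` and `increments` are assumptions.

```python
# Hypothetical object->properties table for foo.c, keyed by line number.
OBJECT_PROPERTIES = {
    1: {"function:f", "var:x"},
    2: {"function:f", "var:y"},
    3: {"function:f", "var:x", "specification"},
    4: {"function:f", "var:x", "var:y"},
    5: {"function:f", "var:x", "debugging"},
    6: {"function:f", "var:y"},
    7: {"function:f"},
    8: {"function:f2", "var:z"},
    9: {"function:f2", "var:z"},
    10: {"function:f2"},
}


def invert(object_properties):
    """Build the inverted table property->objects."""
    table = {}
    for obj, props in object_properties.items():
        for p in props:
            table.setdefault(p, set()).add(obj)
    return table


PROPERTY_OBJECTS = invert(OBJECT_PROPERTIES)


def increments(current_ext):
    """Atomic properties f with: empty < ext(f) & current_ext < current_ext."""
    result = []
    for prop, objs in PROPERTY_OBJECTS.items():
        refined = objs & current_ext
        if refined and refined < current_ext:  # '<' is strict set inclusion
            result.append(prop)
    return sorted(result)


root = set(OBJECT_PROPERTIES)
in_f = PROPERTY_OBJECTS["function:f"]
filtered = in_f - PROPERTY_OBJECTS["debugging"] - PROPERTY_OBJECTS["specification"]

print(increments(root))
print(increments(in_f))
print(increments(filtered))
```

Conjunction, disjunction and negation map to set intersection, union and difference on extensions, which is why an inverted table makes both querying and the computation of increments cheap.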
The viewed file, inode(foo.c), data:

    int f(int x) {
    int y;
    assert(x > 1);
    y = x;
    fprintf(stderr, "x = %d", x);
    return y * 2
    }
    int f2(int z) {
    return z * 4
    }

The file context:

    line->object      object->contents
    l1   o4           o1   int f2(int z) {
    l2   o5           o2   return z * 4
    l3   o10          o3   }
    l4   o6           o4   int f(int x) {
    l5   o9           o5   int y;
    l6   o7           o6   y = x;
    l7   o8           o7   return y * 2
    l8   o1           o8   }
    l9   o2           o9   fprintf(stderr, "x = %d", x);
    l10  o3           o10  assert(x > 1);

    object->properties
    o1   #function:f2, #var:z
    o2   #function:f2, #var:z
    o3   #function:f2
    o4   #function:f, #var:x
    o5   #function:f, #var:y
    o6   #function:f, #var:x, #var:y
    o7   #function:f, #var:y
    o8   #function:f
    o9   #function:f, #var:x, #debugging
    o10  #function:f, #var:x, #specification

    property->objects
    #function:f2     o1, o2, o3
    #function:f      o4, o5, o6, o7, o8, o9, o10
    #var:x           o4, o6, o9, o10
    #var:y           o5, o6, o7
    #var:z           o1, o2
    #debugging       o9
    #specification   o10

A view file (cd function:f/!(debugging|specification)), inode i:

    marks             data
    m1   o10          int f(int x) {
    m2   o9           int y;
    m3   o1, o2, o3   ................:1
                      y = x;
                      ................:2
                      return y * 2
                      }
                      ................:3

Figure 4: Representation of a view

3.1 View files

Each directory corresponds to a logical formula f. Using table object->properties, the query algorithm computes the set ois of all objects that satisfy f. Every time the user reads the content of a directory, e.g., with the ls command, a new view file is created. A new inode i is allocated, but no data is created. Then its data blocks i.data and a table i.marks (associating to each mark a list of objects) are filled in on demand (e.g., command open) according to the following algorithm:

    mark = 0
    inmark = false
    foreach l in line->object
      let o = line->object[l] in
      if (o is in ois) & not inmark
        then add object->contents[o] to i.data
      else if (o is in ois) & inmark
        then inmark = false;
             add ".........:" and mark
               and object->contents[o] to i.data
      else if (o is not in ois) & not inmark
        then mark++; inmark = true;
             i.marks[mark] = [o]
      else (o is not in ois) & inmark
        then add o to i.marks[mark]

    if inmark
      then add ".........:" and mark
           to i.data

3.2 On-the-fly indexing

When the user modifies a view file, PofFS updates the content of the viewed file, and also updates the table object->properties as the properties may have changed. This work is done when the user commits a view file (e.g., every time the user saves a view file with a text editor).

Recalculating table object->properties from scratch is expensive, because it also requires recalculating the inverted table property->objects. So, we prefer to patch this table according to the modifications. Every time a modification is committed, PofFS will change only a few lines of table object->properties, instead of erasing it from disk and creating a new table.

When the user commits (in fact, a release operation) an updated view file whose inode is i, PofFS first computes the new content of the viewed file, using i.data and i.marks according to the following algorithm:
    new_content = empty content
    foreach line l in i.data
      if l contains a mark with number j
      then
        foreach o in i.marks[j]
          add object->contents[o] to new_content
      else
        add l to new_content

This yields a new content new_content. Then, PofFS builds a new table new_properties which associates properties to each line of new_content. It builds it by applying the transducers on new_content starting from the appropriate synchronization point, and by copying the old properties in object->properties for sections of the viewed file that contain no modified part and are delimited by synchronization points. Then, PofFS computes the difference between the old properties and the new ones, modifying the tables according to the following algorithm:

    done = empty array;
    foreach l in line->object do
      done[line->object[l]] = 0

    foreach l in new_properties do
      find in object->properties an object a
        such that (object->properties[a]
                    == new_properties[l])
                  && done[a] == 0
      if an object a is found
        line->object[l] = a
        object->contents[a] = new_content[l]
        done[a] = 1
      else
        allocate a new object o
        object->properties[o]
          = new_properties[l]
        line->object[l] = o
        object->contents[o] = new_content[l]
        foreach p in new_properties[l]
          add o to property->objects[p]

    foreach o in done
      if done[o] == 0 then
        free o in object->properties

This scheme saves calls to the transducers, which experiments have shown to be costly.

Note that the algorithm compares the properties and not the contents of the lines, as two lines may have the same content but not the same properties (e.g., the closing brackets in foo.c:7 and foo.c:10 form identical lines, but one will have the function:f property, and the other one function:f2).

3.3 Synchronizing views

The user can open different view files in different directories simultaneously, which introduces the concurrent view update problem because updating one view file modifies the viewed file, and so may invalidate the contents of the other view files. We choose to delegate resolution of the concurrent view update problem to applications that use PofFS. For instance, emacs already checks for concurrent updates. This requires some support from the operating system to indicate to an application that the view file it manipulates is not fresh enough. It also requires the assurance that the application will check this indication.

PofFS support for synchronization is to have a timestamp in each directory inode, and to increment a global timestamp every time a view file is successfully committed. Every time the user uses ls (in fact, a readdir operation), PofFS checks if the timestamp of this directory is equal to the global timestamp. If "yes", then the view is fresh enough and nothing has to be done; if "no", then a new view file, with a new inode, is created, and the timestamp of this directory is set to the global timestamp.

Moreover, as some applications may keep a view file open without checking if the view file is fresh enough, we simply forbid updating out-of-date view files. We achieve this by associating a timestamp to each inode representing a view file. When an application commits a view file, PofFS checks if the timestamp of this view file is equal to the global timestamp. If "yes", then the global timestamp is incremented, and tables are updated as seen in Section 3.2; if "no", an error code is returned.

Basically, this system can be seen as a classic multiple readers and single writer system.

3.4 Impacts on traditional file handling mechanisms

A user can add a line in a view where this line does not belong; e.g., adding a comment in a view file which resides in a directory with the property !comment. Saving this view file calls the indexing mechanism (described in Section 3.2), and generates an up-to-date view file in this directory (see Section 3.3). As text editors are not prepared for the fact that the content of a file after a save differs from the content just before the save, they often do not refresh the buffer with the new content. That means that the comment will remain in the view file, and in particular that a second save operation will reinsert this line a second time in the viewed file. To avoid this problem, the text editor must be configured so that every save operation refreshes the buffer with the new content. This is possible for emacs by defining a macro.
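The timestamp protocol of Section 3.3 can be summarized in a few lines. The sketch below is a hypothetical Python model under stated assumptions, not the kernel code: `Clock`, `ViewDirectory` and `StaleViewError` are invented names, the exception stands in for the error code mentioned in the text, and the regeneration of the committing view's own file after a successful commit is left to the next `readdir`.

```python
class StaleViewError(Exception):
    """Hypothetical stand-in for the error code returned on a stale commit."""


class Clock:
    """Holds the global timestamp, incremented on every successful commit."""

    def __init__(self):
        self.now = 0


class ViewDirectory:
    def __init__(self, clock):
        self.clock = clock
        self.dir_stamp = None   # timestamp stored in the directory inode
        self.file_stamp = None  # timestamp stored in the view-file inode

    def readdir(self):
        """Return True when a fresh view file (new inode) had to be created."""
        if self.dir_stamp == self.clock.now:
            return False  # the view is fresh enough, nothing to do
        self.dir_stamp = self.file_stamp = self.clock.now
        return True

    def commit(self):
        """Accept the update only if the view file is up to date."""
        if self.file_stamp != self.clock.now:
            raise StaleViewError("view file is out of date")
        self.clock.now += 1  # the tables would be patched here (Section 3.2)


clock = Clock()
a, b = ViewDirectory(clock), ViewDirectory(clock)
a.readdir()
b.readdir()
a.commit()              # succeeds and invalidates every other open view
try:
    b.commit()          # rejected: b was built before a's commit
except StaleViewError as err:
    print("rejected:", err)
print(b.readdir())      # a fresh view file is created for b
b.commit()              # now accepted
```

Any number of views may be read concurrently, but only a view built after the latest commit may write, which is the multiple readers and single writer behavior described above.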
10 Padioleau & Ridoux
3.5 File operations

We now switch from application-level operations to file system operations, to go into more detail. We use Linux VFS terminology (Virtual File System [6]).

Operation read_super is called via the user program poffsmount. It takes as parameters a file name and a list of transducers. It creates the tables of the file context, and fills them in by passing the file through the transducers. It also creates the root inode. Operation put_super (shell command umount) need not sync the file context tables; it can free them. So, all the tables that form the file context can be seen as temporary data. However, as the construction of a file context is costly, one may also keep the internal tables on disk, to reuse them when re-mounting.

Operation readdir is called via the user program ls. It takes as a parameter an inode that represents a view, and returns a list of pairs (name,inode) for every subdirectory of the view, and for the view file. Operation readdir may build the representation of the view file, though it is preferable to operate lazily, and leave it to file operation open. Operation lookup can be called via the command cd keyword. It takes as parameters an inode (the current view) and a string, which is the plain name of a file or of a property, and it returns the corresponding inode, or an error condition.

Operations lseek, read, write, and truncate are standard. For example, operation read takes as parameters an inode and a buffer to be filled in. It gets the block addresses of the contents of the file and fills in the buffer accordingly. Operation open actually builds the representation of the view file, if it does not exist yet, then it behaves as usual. Operation release causes the construction of the new contents of the viewed file, and its re-indexing, which yields a new file context. Operations mkdir, rename, create, and unlink are forbidden, because all objects in view directories are there for logical reasons: property increments, and view file. Moving them, or creating new things, would lead to incoherence.

VFS includes a cache mechanism for name lookup, which makes it possible to avoid calling the concrete operation of a file system. As VFS is used to working with traditional hierarchical file systems, the existing cache handling strategy is not prepared for the fact that modifying the content of a file has side effects on the contents of other files. However, this is exactly what happens when an update in a view file causes updates on other view files, and so on the inodes associated to those view files, as described in Section 3.3. So, PofFS invalidates the cache handling mechanism by forcing the VFS to call the concrete operation every time.

4 Extensions

Missing parts in a view file are, in effect, hidden. Being able to hide some parts of a file is a useful feature, but a user may want to see what is inside one hidden part. So, being a hidden part with number x is treated as a property. The user is allowed to specify a view by a property mark:x provided x is a mark number in the view file of the current view. E.g., in the example of Section 2.2, another shell sequence using PofFS would be:

    $10> cd mark:1
    $11> ls
    foo.c
    $12> cat foo.c
    ................:1
    fprintf(stderr, "x = %d", x);
    ................:2
    $13> cd mark:1
    $14> ls
    specification/
    var:x/ var:y/
    foo.c
    $15> cat foo.c
    int f(int x) {
    int y;
    assert(x > 1);
    y = x;
    ................:1

We implement this feature by using i.marks to compute the set of objects o_i that belong to the new view file, bypassing the query mechanism. This makes it possible to really navigate in the content of a file, adding to PofFS the advantages of hypertext systems. Note that the directory mark:1 is not special; it specifies a view, in which increments and a view file are computed as usual.

Some tools require that information remains in separate files, e.g., Java compilers force the programmer to map all classes to different files. So, we added to PofFS the ability to mount a group of files. This just requires inserting special marks in the view file to represent the file boundaries.

5 Experimentation

5.1 Implementation

The Parts-of-file File System is implemented as an extension to the Logic File System. It consists of a user-level file system based on PerlFS. EXT2 is used as an underlying file system to store file contents and meta-data. This implementation style is very convenient for prototyping, but it yields a rather slow file system. The time penalty ratio of PerlFS is about two
w.r.t. EXT2, for equal functionality. The added functionalities in PofFS/LISFS increase the time penalty. Experiments on a prototype LISFS show that navigation and querying are rather efficient, especially when compared with their application-level counterparts like command find. However, we observed that creation commands, like create, are slow. We consider this a current state of affairs that gives indications on where to refine the implementation. Both LISFS and PofFS are prototypes that are in permanent evolution.

Transducers are not proper parts of either LISFS or PofFS. Instead, both file systems offer hooks for calling user-defined transducers. Our transducers are either ad-hoc scripts or calls to indexing tools like etags. In our experiments, we have always used two transducers for each application. The first one performs full-text indexing, and is used in every application. It consists of a two-line script that extracts one attribute per word in a viewed file. The second one is specific to the application, and will be described with the application.

We ran several experiments to assess the efficiency of PofFS, both in speed and disk space, and in usability. The platform for all experiments was a Linux box running kernel 2.4, with a 2 GHz Pentium 4, 750 Mb RAM, and a 40 Gb IDE disk. In the following section, all the experiments are line-oriented.

5.2 Efficiency

A first experiment is to mount PofFS on a BibTeX file. The BibTeX file is about 8000 lines long, and contains more than 900 entries. The specific transducer extracts properties that correspond to the most frequent BibTeX fields: title, author, year, etc. The two transducers produce a total of 7061 different attributes, and an average of 16 attributes per line. Note that attributes of a line of a given BibTeX entry may be replicated on all lines of the same entry to handle the fact that the really useful unit is the entry and not the line. Then, it becomes very easy to navigate in the BibTeX file. Navigation proposes subdirectories in a way that is reminiscent of text mining: e.g., who are the most frequent co-authors of author A? What were the most prominent years for subject S? PofFS forms a file context of about 8000 objects × 7000 attributes in 122 seconds. This is done at mount time, but since internal tables are persistent, subsequent mounts only cost 0.216 seconds. The size of the file context is 7800 Kbytes. Once the file context is created, one can navigate in it, which is fast, and modify view files. Finding all co-authors of a given author (i.e., cd author:Jones; ls author:*) takes 0.25 seconds. Modifying an entry takes about 1 second. Tools exist for manipulating and editing BibTeX files, but they do not offer as many possibilities as simply mounting PofFS on a BibTeX file. The only thing that may be missing is a graphical user interface. A file browser or a windowed text editor like emacs does part of the job. This demonstrates the interest of an operating system approach; existing interfaces already work.

                                         BibTeX         Article        Program
size of specific transducer              23 LoC         29 LoC         61 LoC
size of file                             269 Kb         78 Kb          92 Kb
number of lines                          8055           1783           2651
mount time (1st/others)                  122s/0.216s    22s/0.058s     29s/0.078s
total number of different attributes     7061           2132           1498
average number of attributes per line    16             15             14
size of context file                     7800 Kb        1960 Kb        2164 Kb
save time                                0.956s         0.256s         0.331s
ls time                                  0.253s         0.100s         0.170s

Table 1: Summary of experiments (1)

                                         BibTeX         Article        Program
space overhead per line                  0.9 Kb         1.0 Kb         0.81 Kb
space overhead                           28             25             23
space overhead per attribute             1.1 Kb         0.9 Kb         1.4 Kb

Table 2: Summary of experiments (2)

Our second experiment is the editing of this article. It is composed in LaTeX as a single file of about 1700 lines. The specific transducer extracts attributes like section, subsection, comment, etc. This produces a total of 2132 attributes, and an average of 15 attributes per line. The file context is built from scratch in 22 seconds, and loaded on subsequent mounts in 0.058 seconds. It occupies 1960 Kbytes. Again, navigation and querying are fast: finding all subsections that talk about synchronization (i.e., cd contains:synchro.*; ls subsection:*) costs about 0.1 seconds. Modification takes about 0.25 seconds. LaTeX permits splitting a text into several files, but it is rather inconvenient because many operations do not cross file boundaries: e.g., searching, spell-checking, query-replacing. So, we
believe it is better to keep it all in one text file, and cut slices in it at will. Since views are operating system level objects, they can be accessed by different applications that do not know each other, and PofFS will maintain their coherence. For instance, one may detex a LaTeX file to strip off its commands, and then pass the result through a spell-checker, but it will not correct the LaTeX file. If, instead, it is a transducer that hides the LaTeX commands, the view file can be passed through the spell-checker, and it will correct the viewed file.

The last real-size experiment is the LISFS/PofFS program. It is a single Perl program of about 2600 lines. It follows a very elaborate coding discipline, in which several aspects of the final product are identified and interleaved: e.g., debugging and assertions as in Section 2.2, different aspects of the file system like security, and several versions of the same operations. The specific transducer extracts attributes that correspond to these aspects. This produces a total of 1498 attributes, and an average of 14 attributes per line. The file context is built in 29 seconds, reloaded in 0.078 seconds, and it occupies 2164 Kbytes. Modification takes about 0.3 seconds. Navigation is fast: 0.170 seconds for listing functions that use a given variable (i.e., cd var:transact; ls function:*). Using navigation and querying, one may select a slice of the source file (i.e., a configuration), and edit it or execute it. One may also consider the different facets of an aspect across the whole program. This makes it easier to make a coherent change in this aspect. A change in any view is back-propagated into the source file. No single tool offers all these services, and no split of the program into several files fits all needs.

Tables 1 and 2 summarize these three experiments. These experiments on real data are encouraging. They show that PofFS can be used in practice, as its response times are compatible with interactive usage. Only mounting from scratch is too slow. This is why it is important to keep the internal tables on disk, and reload them when re-mounting. In the latter case, mounting is fast enough. Note also that the transducer used in the three experiments permits full-text indexing, but costs one attribute per word. If it is disabled, keeping only the specific transducers, mounting is four times faster.

6 Related work

On the application side, modern text editors such as emacs or syntactic editors, and Integrated Development Environments (IDEs) as in Smalltalk, ease the manipulation of file contents. They provide query and navigation tools, and often allow hiding some parts of the program, compressing the code under an icon, on which the user can click to expand the content. The contribution of PofFS is to generalize those ideas, allowing powerful query, navigation and updating of arbitrary text data. Furthermore, these ideas are supported in a generic way at the operating system level.

There is an active research area in software engineering for manipulating programs via views. However, proposed tools in this area lack at least one of the three facets of PofFS (navigation, querying, and updating), and they are often very specific to a task (e.g., navigating in Java class hierarchies).

Still at the application level, but general purpose, archive tools (e.g., tar) provide a relation between (parts of) a file system and a single file. However, the relation is merely duplication, so that updating an archived file does not update the archive. Moreover, their navigation and querying capabilities are limited, and they do not offer an operating system API.

The view update problem is an important issue in data-base management [4]. However, the PofFS case is much simpler than the data-base case. Indeed, our view files are sets of actual parts of the viewed file. So, back-propagating updates to the viewed file only requires knowing the coordinates of the parts that form the updated view file. In data-bases, views are relations, i.e., sets of items, that are not made of items of the viewed relations. Instead, the items in the views are computed from the viewed relations. So, one would have to invert the computation to be able to back-propagate updates. Not all computations can be inverted, so updating views is only possible in restricted cases. The fact that a view update may create items that do not belong to the view (see Section 3.4) is also a data-base issue. In our context, it is solved by re-indexing. Note also that data-bases distinguish virtual views from materialized views (or snapshots). With PofFS, views are just materialized enough so that applications can see them as ordinary files, but they remain synchronized with the viewed file.

On the operating system side, much work has been done on file organization. SFS [2], HAC [3], or Nebula [1] mix querying and navigation in a file system. For instance, SFS has transducers that automatically extract properties from file contents, such as the names of the functions in a C file. This makes it easy to search for a file using a query such as cd function:foo. The contribution of PofFS is to go deeper than the file level, making it possible to go inside files, no longer selecting sets of files, but sets of parts of files.
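Returning to the view update discussion earlier in this section, the claim that back-propagation only needs the coordinates of the parts forming the view can be sketched as follows. This is a simplified Python model, not the PofFS code: the helper names are invented, the mark syntax merely imitates the `................:1` lines shown in Section 4, the last two lines of the sample file are made up, and the sketch assumes that mark lines survive editing unchanged.

```python
from typing import Callable, Dict, List, Tuple

MARK = "................:"


def make_view(lines: List[str], visible: Callable[[str], bool]
              ) -> Tuple[List[str], Dict[int, Tuple[int, int]]]:
    """Build a view file; each run of hidden lines becomes one numbered mark."""
    view: List[str] = []
    marks: Dict[int, Tuple[int, int]] = {}   # mark number -> [start, end) coordinates
    i = 0
    while i < len(lines):
        if visible(lines[i]):
            view.append(lines[i])
            i += 1
        else:
            start = i
            while i < len(lines) and not visible(lines[i]):
                i += 1
            number = len(marks) + 1
            marks[number] = (start, i)
            view.append(f"{MARK}{number}")
    return view, marks


def back_propagate(lines: List[str], edited_view: List[str],
                   marks: Dict[int, Tuple[int, int]]) -> List[str]:
    """Rebuild the viewed file: marks expand to the hidden parts they stand for."""
    result: List[str] = []
    for line in edited_view:
        if line.startswith(MARK):
            start, end = marks[int(line[len(MARK):])]
            result.extend(lines[start:end])
        else:
            result.append(line)
    return result


source = [
    "int f(int x) {",
    "int y;",
    "assert(x > 1);",
    "y = x;",
    'fprintf(stderr, "x = %d", x);',
    "return y;",   # invented for the example
    "}",           # invented for the example
]
view, marks = make_view(source, lambda l: "fprintf" in l)
edited = [l.replace("x = %d", "x is %d") for l in view]   # edit the visible part only
updated = back_propagate(source, edited, marks)
```

No computation has to be inverted here: the edited view already contains the new text of the visible parts, and the marks record where the hidden parts sit in the viewed file. What the sketch leaves out is the re-indexing step, which decides whether an edited or inserted line still belongs to the view.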
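As a rough illustration of what selecting sets of parts of files means, the following Python sketch models each part as an object carrying a set of properties, with `cd` narrowing the current view and `ls` listing the properties that still discriminate inside it. The sample data, the function names and the rule used by `ls` (keep a property only if some, but not all, objects of the view have it) are assumptions for illustration, not the PofFS implementation.

```python
from fnmatch import fnmatchcase
from typing import Dict, FrozenSet, List, Set

# Hypothetical file context: object number -> properties attached by transducers.
CONTEXT: Dict[int, FrozenSet[str]] = {
    1: frozenset({"author:Jones", "author:Smith", "year:2001"}),
    2: frozenset({"author:Jones", "year:2002"}),
    3: frozenset({"author:Smith", "author:Lee", "year:2002"}),
    4: frozenset({"author:Jones", "author:Lee", "year:2003"}),
}


def cd(view: Set[int], prop: str) -> Set[int]:
    """Narrow the view to the objects that have property `prop`."""
    return {o for o in view if prop in CONTEXT[o]}


def ls(view: Set[int], pattern: str = "*") -> List[str]:
    """List properties matching `pattern` that would strictly refine the view."""
    candidates = set().union(*(CONTEXT[o] for o in view))
    result = []
    for prop in sorted(candidates):
        extent = cd(view, prop)
        if fnmatchcase(prop, pattern) and 0 < len(extent) < len(view):
            result.append(prop)
    return result


root = set(CONTEXT)
jones = cd(root, "author:Jones")      # in the spirit of: cd author:Jones
coauthors = ls(jones, "author:*")     # in the spirit of: ls author:*
```

With this toy context, `coauthors` contains author:Lee and author:Smith but not author:Jones, since the latter holds for every object of the view and so would not refine it.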
7 Applications and future directions

An important research direction is to study the impact of PofFS in larger applications than our experiments. Another one is to make it more efficient.

In the domain of software engineering, being able to manipulate comments, documentation, debugging, specification or platform-dependent code, or other aspects such as security or memory management, to filter them out or to focus on them, provides a great improvement for managing a project. Using PofFS, the programmer can write more comments or defensive code without being afraid of polluting the program, as such information can be hidden. The programmer can also keep in source files the naive preliminary version of an optimized function, which makes the optimized one easier to understand.

Software engineering has developed rich notions of views which either are extracted from programs (e.g., slicing [10]) or guide the production of programs (e.g., UML, Aspect Oriented Programming [7, 5]). All these views can be designated by properties and serve as an effective way of manipulating programs. For example, UML diagrams could serve as guides for refinement steps that lead to a concrete program. The whole thing would be kept coherent by accessing it only through view files.

Another direction is to add more transducers to PofFS. Many services provided traditionally by etags, class browsers, call graphs, javadoc, literate programming, versioning, ... can be put in PofFS easily. Moreover, the unified interface for these services, the file system, makes new fruitful combinations possible.

System administration often involves the management of table files, for users, services, etc. These tables obey precise formats, and sometimes have a system API. One could take advantage of PofFS to offer a secure and uniform handling of these tables. This requires that every type of file comes with its dedicated transducer (as for emacs modes). For example, a user could mount PofFS on file /etc/passwd, and simply use cd and ls, or getdents, to get the information he needs, making the use of function getpwent unnecessary. An interesting aspect of this API is that it has an interactive interface via a shell. So, a programmer can first test his query under a shell, and then put it in his program. Similar techniques could be used for analysing log files.

Our perspectives for improving the efficiency of PofFS are to make the construction of the internal data-structure lazier, and to use a more direct implementation style than PerlFS, at least for file operations that are similar in PofFS and in usual file systems.

8 Conclusion

Our contribution is to have identified a conflict between applications that handle complex and structured information stored in single files, and the need to manipulate (i.e., navigate, query, and update) more elementary units. The management of file contents in general purpose file systems has not evolved that much since the development of file systems; files are considered as units, and navigation and querying are only defined at the file level. We propose to consider files as mount points to be able to navigate inside them. This raises the problem of what to do if a part of a file is updated in such a file system. We have proposed a notion of updatable views to solve this problem. It provides a unifying framework of operations like indexing, querying, navigating and updating, and a unifying system level support, under a standard interface, the file system. This is implemented as the Parts-of-file File System.

PofFS is easily extensible, and allows querying and navigation to be freely combined, which makes it possible to combine services that were kept separate in application-level tools. Such a file system provides, at the system level, services that are useful in many applications. We have used it in real-size applications like text editing and programming.

References

[1] C.M. Bowman, C. Dharap, M. Baruah, B. Camargo, and S. Potti. A File System for Information Management. In ISMM Int. Conf. Intelligent Information Management Systems, 1994.

[2] D.K. Gifford, P. Jouvelot, M.A. Sheldon, and J.W. O'Toole Jr. Semantic file systems. In 13th ACM Symp. on Operating Systems Principles, pages 16-25. ACM SIGOPS, 1991.

[3] B. Gopal and U. Manber. Integrating content-based access mechanisms with hierarchical file systems. In 3rd ACM Symp. Operating Systems Design and Implementation, pages 265-278, 1999.

[4] A. Keller. Algorithms for translating view updates into database updates for views involving selections, projections, and joins. In 4th ACM Symp. Principles of Database Systems, pages 154-163, 1985.

[5] G. Kiczales. Aspect-oriented programming. ACM Computing Surveys, 28(4):154, 1996.

[6] S.R. Kleiman. Vnodes: An architecture for multiple file system types in Sun UNIX. In USENIX Summer, pages 238-247, 1986.
[7] P.B. Kruchten. The 4 + 1 view model of architecture. IEEE Software, 12(6):42-50, 1995.

[8] Y. Padioleau and O. Ridoux. A logic file system. In USENIX Annual Technical Conference, 2003. Long version in [9].

[9] Y. Padioleau and O. Ridoux. A logic file system. Rapport de recherche 4656, Inria, 2003.

[10] F. Tip. A survey of program slicing techniques. Journal of programming languages, 3:121-189, 1995.