Firestore for App Developers
Abstract—The recent years have seen an explosive growth in web and mobile application development. Such applications typically have rapid development cycles, and their developers expect mobile-friendly features and serverless characteristics such as rapid deployment capabilities (with minimal initialization), scalability to handle workload spikes, and flexible pay-as-you-go billing. Google's Firestore is a NoSQL serverless database with real-time notification capability, and together with the Firebase ecosystem greatly simplifies common app development challenges by letting application developers focus primarily on their business logic and user experience. This paper presents the Firestore architecture, how it satisfies the aforementioned requirements, and how its real-time notification system works in tandem with Firebase client libraries to allow mobile applications to provide a smooth user experience even in the presence of network connectivity issues.

Index Terms—cloud, database, mobile and web applications, continuous queries

I. INTRODUCTION

A large amount of modern computing happens at the edge, in web browsers or on mobile devices. Deploying applications to such devices is generally easy, either via static hosting or via Google and Apple's application stores. However, many applications need some remote computing and storage, be it just for reliability or state sharing across multiple devices, for sharing information between users, or for querying datasets that are being updated by other processes. Implementing, deploying, scaling and managing this remote infrastructure remains a significant challenge, even in today's world of ubiquitous cloud services.

Firestore is a schemaless serverless database with real-time notification capabilities that greatly simplifies the development of web and mobile applications. It scales to millions of queries per second and petabytes of stored data; notable current and past users include the New York Times and BeReal, as well as a prominent social media app and a mobile game each with over a hundred million users. Importantly, at low scale QPS (queries per second) and storage consumption, Firestore costs close to nothing.

In this paper, we describe four key aspects of Firestore's success and how Firestore achieves them.

Ease of Use: Modern application development benefits from rapid iteration and deployment to production. Firestore's schemaless data model, ACID transactions, strong consistency, and index-everything default mean that developers can focus more on the data they wish to store and present to the end user without worrying about the details of the database configuration.

Fully Serverless Operation and Rapid Scaling: Some applications go viral, and that translates to difficult problems around scaling of the infrastructure with increasing QPS load, storage, and therefore costs. Firestore is truly serverless: the application developer needs to only create a (static) web page or an application, and initialize a Firestore database to enable end users to store and share data. End-user database requests are routed directly to Firestore, without the need for a dedicated server to perform access control thanks to security rules set by the developer. Firestore's API encourages usage that scales independently of the database size and traffic, and Firestore's implementation leverages Google's infrastructure (in particular, Spanner) to provide a highly available and strongly consistent database whose scale is limited only by the physical constraints of a cloud region's datacenters. Firestore's serverless pay-as-you-go pricing together with a daily free quota ensures that billing increases reflect application success; a standalone emulator allows developers to safely experiment.

Flexible, Efficient Real-time Queries: An application often needs to send fast notifications to potentially large subsets of web or mobile devices for many reasons, such as communication between users. In a Firestore-based application, these are typically coded as real-time (also known as streaming or continuous) queries [1] to the backend database. The results of a real-time query are updated by the application and presented to the end-user to reflect any pertinent change in the database. Firestore supports queries that can be efficiently executed using secondary indexes and updated in real-time from the database's write log (this is a key element of Firestore's scalability). These queries fall short of full SQL support, but are generally sufficient for the querying needs of interactive applications.

Disconnected Operation: Mobile devices can lose network connectivity for arbitrary lengths of time. The behavior of an application while the device is disconnected can often be key to its success. The Firestore database service together with Firebase client-side SDK libraries support fully disconnected operation, with automatic reconciliation on reconnection. This greatly simplifies development of mobile applications.

* The author was a Visiting Researcher at Google, and is partially supported by Project 62021002 of NSFC.
  /restaurants/one
    address: "415 Main Street",
    type = "BBQ",
    avgRating: 3.5,
    numRatings: 10,

Fig. 1: Document example for a restaurant.

  /restaurants/one/ratings/2
    rating: 3,
    userId: "UUU",
    details: {
      text: "Food was tasty but cold",
      price: "good"
    }
    time: 2022/09/01 13:23:22 GMT

Fig. 2: Document example for a restaurant rating.

B. Indexes

To scale with increase in database size, Firestore executes all queries using secondary indexes. To reduce the burden of index management, Firestore automatically defines an ascending and a descending index on each field across all documents on a per-collection basis. The automatically defined indexes for the document in Figure 2 are on rating, userId, details, details.text, details.price, and time.

Automatically defining indexes simplifies development but introduces some risks. First, a write operation becomes more expensive because it needs to update more indexes, which in turn increases latency and storage cost. Second, fields with sequentially increasing values, such as time in Figure 2, introduce hotspots that limit maximum write throughput. To address these issues, Firestore allows the customer to specify fields to exclude from automatic indexing (queries that would need the excluded index then fail).

Finally, the customer can define indexes across multiple fields, e.g., rating asc and time desc to support queries like

  select * from /restaurants/one/ratings
  where rating = 3 order by time desc.
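For illustration (this is a sketch, not a listing from the paper), the same query can be issued from the Firebase v9 modular Web SDK; the project configuration is a placeholder, and the query assumes the (rating asc, time desc) composite index above has been defined:

import { initializeApp } from 'firebase/app';
import { getFirestore, collection, query, where, orderBy, getDocs } from 'firebase/firestore';

const app = initializeApp({ /* your Firebase project config */ });
const db = getFirestore(app);

// Equality on rating, ordered by time descending; served from the
// user-defined (rating asc, time desc) index described above.
const q = query(
  collection(db, 'restaurants/one/ratings'),
  where('rating', '==', 3),
  orderBy('time', 'desc'),
);
const snapshot = await getDocs(q);
snapshot.forEach((d) => console.log(d.id, d.data()));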
C. Querying

Firestore supports point-in-time queries that are either strongly-consistent or from a recent timestamp, and strongly-consistent real-time queries. Both modes support the same query features: projections, predicate comparisons with a constant, conjunctions, orders, limits, offsets. A query can have at most one inequality predicate, which must match the first sort order. These restrictions allow Firestore's queries to be directly satisfied from its secondary indexes.

A real-time query reports a series of timestamped snapshots, where each snapshot is the strongly-consistent result of the query at that specific time. Snapshots are reported as deltas (documents added, removed, and modified) from the previous snapshot. Firestore does not guarantee reporting every snapshot of a query's result, e.g., if the query's results went through the following sequence (@ indicates the document's timestamp):

  t=10: {/restaurants/one,avgRating:3,...}@1, {/restaurants/two,avgRating:4,...}@3
  t=13: {/restaurants/one,avgRating:3,...}@1, {/restaurants/two,avgRating:3.5,...}@13
  t=19: delete /restaurants/one@19, {/restaurants/two,avgRating:3.5,...}@13

Firestore might report only snapshots time=10 and time=19, but skip reporting time=13. This flexibility in what snapshots to report gives Firestore more options on how to execute real-time queries but does not impact the typical application, which aims to display the latest accurate snapshot of a query's results.
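To make the delta-based reporting concrete, the following sketch (again using the Firebase v9 modular Web SDK, with a placeholder project configuration) registers a real-time query and logs each reported snapshot as the set of added, modified, and removed documents:

import { initializeApp } from 'firebase/app';
import { getFirestore, collection, query, orderBy, onSnapshot } from 'firebase/firestore';

const app = initializeApp({ /* your Firebase project config */ });
const db = getFirestore(app);

const q = query(collection(db, 'restaurants'), orderBy('avgRating', 'desc'));

// Each callback invocation corresponds to one reported snapshot; docChanges()
// exposes the delta from the previously reported snapshot.
const unsubscribe = onSnapshot(q, (snapshot) => {
  snapshot.docChanges().forEach((change) => {
    console.log(change.type, change.doc.ref.path, change.doc.data());
  });
});
// Later, call unsubscribe() to cancel the real-time query.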
D. Server SDKs

As Figure 4 shows, Firestore has two categories of SDK libraries, each with support for multiple programming languages: "Server" used by applications that run in privileged environments, such as GCE, GKE, Cloud Run, App Engine, and "Mobile and Web" used by applications that run on end-user devices such as mobile phones and browsers. The former category includes older Datastore-API-based Server SDKs. The Server SDKs map Firestore's data model to the target language, and provide convenient transaction abstractions, such as automatic retry with backoff.

E. Mobile and Web SDKs

One of Firestore's differentiating features is that it allows direct third-party (end-user) access, including support for disconnected operation. This feature is made possible by a combination of: abstractions within the SDKs (in particular, real-time queries), Firebase Authentication [7] which supports end-user authentication from a wide variety of identity providers (Google, Apple, Facebook, phone numbers, anonymous, etc), Firebase Security Rules [8] which is a security language for expressing fine-grained access controls, and the Firestore API. A developer can structure their application to authenticate users with their choice of identity provider(s), and then program their application using the abstractions provided by the Mobile and Web SDKs. In a typical application, the main abstractions are real-time queries to fetch the state to display and various database updates to reflect the end-user's actions. The direct update of displayed state based on the results of real-time queries greatly simplifies application development: it displays the initial state when the application is opened, it automatically updates the display when some other user changes the state, it also automatically updates the display when this end-user updates the state (avoiding the need for any update-specific display logic), it behaves reasonably when the end-user is disconnected (local updates are seen), and it automatically reflects the results of reconciliation after reconnection.

The SDKs support transactional writes based on optimistic concurrency control while connected, and blind writes at all times. With transactions, all data read by the transaction is revalidated for freshness at the time of the commit; the transaction is retried if the data fails the freshness check. We do not support third-party, lock-based pessimistic concurrency control, because it would allow a third-party to easily conduct
a denial-of-service attack on writes to a Firestore database, e.g., by holding read-locks on important documents for a long time. A blind write's "last update wins" model works well with potentially-disconnected operation, but still requires significant SDK support which is discussed in Section IV-E.

  match /restaurants/{r}/ratings/{s} {
    allow read: if request.auth != null;
    allow create: if request.auth != null &&
        request.auth.uid == request.resource.data.userId;
  }

Fig. 3: Rating security rules.

In a system that allows direct third-party access, data needs to be secured at a finer granularity than the whole database to prevent accidental or malicious updates/views. These restrictions are expressed by the customer using Firestore security rules. The example in Figure 3 allows any authenticated end-user to read a restaurant rating, and any authenticated end-user to add a restaurant rating as long as they attach their own user ID to the rating. Updates and deletes of ratings are not allowed. The grammar allows nesting of match statements and wildcards to simplify writing of rules for sub-collections. The if condition can not only check the fields of accessed documents, but also fetch and inspect fields of other database documents (e.g., check an access control list). These additional document lookups are executed in a transactionally-consistent fashion with the operation being authorized.
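For illustration, a client-side write that satisfies the create rule in Figure 3 might look as follows; this is a sketch using the Firebase v9 modular Web SDK with anonymous sign-in, and the project configuration and rating values are placeholders:

import { initializeApp } from 'firebase/app';
import { getAuth, signInAnonymously } from 'firebase/auth';
import { getFirestore, collection, addDoc, serverTimestamp } from 'firebase/firestore';

const app = initializeApp({ /* your Firebase project config */ });
const auth = getAuth(app);
const db = getFirestore(app);

// The create is allowed only because the written document carries the
// authenticated user's own uid in its userId field (see Figure 3).
const { user } = await signInAnonymously(auth);
await addDoc(collection(db, 'restaurants/one/ratings'), {
  rating: 4,
  userId: user.uid,
  time: serverTimestamp(),
});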
F. Write Triggers

Firestore allows the definition of triggers on database changes that call specific handlers in Google Cloud Functions [9]. The application developer can define follow-up actions in those handlers based on the changes to the database; the delta from that change is conveniently available in the handler. This supports processing that would otherwise be insecure or too expensive to perform on the end-user device.
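As a sketch of what such a handler can look like (using the first-generation Cloud Functions for Firebase JavaScript API; the function name and document path are illustrative, not from the paper):

// index.js of a Cloud Functions for Firebase deployment (1st-gen API).
const functions = require('firebase-functions');

// Fires on any create, update, or delete of a rating document; the before/after
// snapshots carry the delta mentioned in the text.
exports.onRatingWritten = functions.firestore
  .document('restaurants/{restaurantId}/ratings/{ratingId}')
  .onWrite((change, context) => {
    const before = change.before.exists ? change.before.data() : null;
    const after = change.after.exists ? change.after.data() : null;
    console.log('rating changed', context.params, { before, after });
    return null;
  });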
IV. ARCHITECTURE AND IMPLEMENTATION

The customer application in Figure 4 may be running on a mobile device, a web client, a VM, or a container [10]. The Firestore service is available in several geographical regions of the world; a customer picks the location of a database at creation time. Each of the four rectangles—Frontend, Backend, Metadata Cache, and Real-time Cache—comprises up to thousands of tasks that get created in each region served by Firestore. Firestore RPCs from the application get routed and distributed across the Frontend tasks in the region where the database is located, and subsequently to the Backend tasks that translate them into requests to the underlying, per-region Spanner databases.

Fig. 4: Simplified architecture diagram of the Firestore stack.

In this section, we focus on seven distinctive aspects of the architecture that are critical to the serverless nature of Firestore and its ease-of-use: global routing, billing and the free quota, multi-tenancy, writes, queries, real-time queries, and disconnected operation. We skip other aspects, such as load-balancing and monitoring, that are not as unique to Firestore compared to other cloud services.

A. Global Routing

Global routing is more important for Firestore than a typical cloud database, as many Firestore requests originate from end users who are potentially spread around the world. These requests will arrive at the closest-to-the-user Google point of presence, where Google Cloud's networking infrastructure looks up the database's location from Firestore's metadata and routes requests to a Firestore Frontend task in the database's region. Outages in this location lookup will affect requests originating locally but destined to databases in any region. This makes it critical for us to target 99.9999% availability globally for the location lookup service in order to support Firestore availabilities of 99.99% in regional deployments and 99.999% in multi-regional deployments, respectively. This availability is achieved by storing each database's location in a global Spanner database with multiple asynchronously-replicated read replicas. Location lookup uses these replicas, thereby trading off slightly reduced availability of database creation/deletion events with much greater availability of mostly-static location metadata.

B. Billing and Free Quota

A key attraction for developers is Firestore's daily free quota comprising 1 GiB of storage, 50k document reads, and 20k document writes. This lets developers experiment with Firestore and run low-usage services at little or no cost. The free quota is enforced by integration with Google's quota system; operation-based billing is done by logging the count of documents accessed by each RPC and by integration with Google's billing system that reads the logs; and storage usage is measured and billed daily. Figure 4 does not show these integrations. The next section discusses Firestore's multi-tenancy, which makes the free quota affordable.
C. Multi-tenancy and Isolation

Firestore's multi-tenant architecture is key to its serverless scalability. All its components (see Figure 4) and the underlying Spanner database components are shared across large numbers of Firestore databases. As a foundation, all components build on Google's auto-scaling infrastructure [11], so the number of tasks in a given component adjusts in response to load. Thus, idle and mostly-idle databases use extremely few resources, which makes Firestore's free quota and operation-based billing practical. Multi-tenancy however brings isolation challenges: traffic to a single Firestore database can potentially affect the performance and availability of other databases by consuming all or most of the resources in one or more components, or even worse, crashing tasks.

Solving isolation presents several challenges. First, an individual RPC is not a uniform work unit, as its cost can vary significantly—one RPC can cost a million times another—and in ways that are not predictable from RPC inspection, e.g., queries with unknown result-set size. Second, isolation needs to happen on several dimensions: most importantly Firestore Backend CPU and RAM, and Spanner CPU. Finally, database traffic can be very spiky: Firestore requires conforming traffic to grow progressively—increase at most 50% every 5 minutes, starting from a 500 QPS base [12]. Firestore is designed to handle spiky traffic and will still accept traffic that violates this rule as long as it can maintain isolation.

Each component is designed for isolation. For example, we use a fair-CPU-share [13] scheduler in our Backend tasks, keyed by database ID. We also pass the database ID as a key to Spanner's similar fair-CPU-share scheduler. Additionally, certain batch and internal workloads set custom tags on their RPCs, which allow schedulers to prioritize latency-sensitive workloads over such RPCs. We limit the result-set size and the amount of work done for a single RPC, which protects the system against problematic workloads. Firestore APIs support returning partial results for a query as well as resuming a partially-executed query. Most parts of Firestore can split out traffic even on a document granularity to redistribute load. Finally, some components do targeted load-shedding to drop excess work before auto-scaling can take effect; auto-scaling incorporates delays because short-lived traffic spikes do not merit auto-scaling. As discussed in Section VI, it is important to also have manual tools to intervene in emergencies.

D. Writes, Queries and Real-time Queries

A write to the database updates each matching secondary index used by query executions, keeping them strongly consistent with the data, and notifies the Real-time Cache to send notifications to clients with active real-time queries.

1) Spanner Representation: To keep per-database cost low, Firestore maps each database in a region to a specific directory2 within a small number of pre-initialized Spanner databases in that region3. Each directory has two tables, Entities and IndexEntries, which contain the actual Firestore database data.

2 A Spanner concept that guides sharding and placement [5].
3 Storing each Firestore database in its own Spanner database would require pre-allocating resources for millions of Spanner databases, which is prohibitively expensive with today's state of the art.

Each Firestore document is stored as a single row in the (fixed-schema) Spanner Entities table. The key-value pairs that constitute a schemaless Firestore document's contents are encoded in a protocol buffer [14] stored in a single column, and the Firestore document name (unique key) encoded as a byte-string serves as the key for that row. Spanner provides row-granular atomicity guarantees, which means that the schemaless collection hierarchy in Firestore's data model does not impose any additional locking constraints; two or more Firestore documents can be accessed concurrently independent of their position in the hierarchy.

Each Firestore index entry is stored in an inverted index: a single row in the (fixed-schema) Spanner IndexEntries table. The key of this table is an (index-id, values, name) tuple where the index-id identifies a particular index for the Firestore database, values is the byte-string encoding of the index entry's values, and name is the byte-string encoded name of the indexed Firestore document. The encoding of the n-tuple of values in values preserves the index's desired sort order. As Spanner tables, like Bigtable [4], support efficient, in-order linear scans by key, a linear scan of a range of IndexEntries rows corresponds to a linear scan of a range of the logical Firestore index.
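A purely conceptual sketch of this mapping follows; it is illustrative JavaScript only and does not reflect Spanner's actual schema, the real index-id values, or the order-preserving byte-string encoding:

// Placeholder encoder standing in for the real order-preserving byte-string encoding.
const encode = (value) => JSON.stringify(value);

const firestoreDoc = {
  name: '/restaurants/one/ratings/2',
  fields: { rating: 3, userId: 'UUU' },
};

// One row in the Entities table: key = encoded document name,
// a single column holding the encoded schemaless contents.
const entityRow = { key: encode(firestoreDoc.name), contents: encode(firestoreDoc.fields) };

// One row per automatic index entry in the IndexEntries table:
// key = (index-id, encoded values, encoded document name).
const indexEntryRows = Object.entries(firestoreDoc.fields).flatMap(([field, value]) => [
  { key: [`${field}-asc`, encode([value]), encode(firestoreDoc.name)] },
  { key: [`${field}-desc`, encode([value]), encode(firestoreDoc.name)] },
]);

console.log(entityRow, indexEntryRows);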
Firestore manages its own indexes and implements its own query engine rather than relying on Spanner's native functionality for two main reasons. First, it is not possible to define a Spanner index that matches Firestore's automatic indexing rules, nor is it possible for such a Spanner index (which has to apply to data from all Firestore databases) to accommodate the per-Firestore-database user-defined indexes and automatic index exclusions. Second, Firestore's query semantics diverge significantly from Spanner's, in particular by allowing sorting on any value including arrays and maps and sorting across fields with inconsistent types, such that a mapping to a Spanner query on the underlying Spanner database is impractical.

Building directly on top of Spanner, with a one-to-one mapping of documents and index entries to Spanner rows, yields significant benefits to Firestore: high availability, data integrity, transactional guarantees, and infinite scaling. In particular, Spanner's automatic load-based splitting and merging of rows into tablets [5] (similar to other systems' shards or partitions) that hold data for a consecutive key-range allows Firestore to scale to arbitrary read and write loads. Firestore's definition of conforming traffic [12] is designed to conservatively match Spanner's splitting behavior. Firestore's transactions map directly to Spanner transactions, which are lock-based and use two-phase-commits across tablets. Because Spanner uses multi-version concurrency control and assigns TrueTime [5] timestamps to transactions, the serializability guarantee on timestamps allows Firestore to perform lock-free
consistent (timestamp-based) reads across a database without blocking writes. The lack of consistency in many queries was a drawback for Datastore's Megastore-based implementation; an important customer mentioned consistency as a reason for migrating from Datastore to Spanner [15].

Adding or removing a Firestore secondary index requires a backfill or backremoval in the Spanner IndexEntries table. This is managed by a background service that receives index change requests, scans the Entities table for all affected documents, makes the required IndexEntries row additions or removals in Spanner, and finally marks the index change as complete. It should be noted that a query that mutates the database also makes all necessary updates to the IndexEntries table so that it conforms to an on-going backfill or backremoval.

2) Writes: We describe writes using an example update to the Restaurant application. Say an end-user adds a new rating, which also involves updating the average rating on the restaurant's document. This requires a Firestore transaction that inserts the rating document from Figure 2 and updates the numRatings and avgRatings fields of the parent restaurant document from Figure 1. The commit of the Firestore transaction is processed by the Backend as follows (a client-side sketch of this transaction appears after the steps):
1) Create a Spanner read-write transaction T.
2) In transaction T, read documents /restaurants/one and /restaurants/one/ratings/2 from the Spanner Entities table with an exclusive lock4. Verify that /restaurants/one does exist and that /restaurants/one/ratings/2 does not exist.
3) Because the request is from a third party, execute the database's write security rules (Figure 3) for /restaurants/one and /restaurants/one/ratings/2. Add the update of row /restaurants/one and the insert of row /restaurants/one/ratings/2 in Entities to transaction T.
4) Use the (cached) index definitions to compute the index entry changes for the two documents. The result is the removal of the old index entries for numRatings and avgRatings for /restaurants/one and additions of new ones for their new value, and addition of new index entries for all the fields of /restaurants/one/ratings/2. Add the corresponding row insertions and deletions in IndexEntries to transaction T, thereby ensuring Firestore indexes stay strongly consistent with the documents.
5) Pick a reasonable max commit timestamp M and start a two-phase-commit with the Real-time Cache by sending one or more Prepare RPCs with max commit timestamp M. The results of each RPC contain a minimum allowed commit timestamp mi; Section IV-D4 describes which tasks in the Real-time Cache the Backend communicates with.
6) Commit Spanner transaction T with minimum allowed timestamp max(mi) and maximum allowed timestamp M: Spanner acquires additional exclusive locks on the specific IndexEntries rows, and then atomically commits the changes to Entities and IndexEntries. This may involve updates across multiple Spanner tablets and servers. The lock acquisitions at this stage can conflict only with the read-locks from queries executing within a transaction5, as IndexEntries rows include the unique document name and the document is already under an exclusive lock.
7) Finish the two-phase-commit with the Real-time Cache by sending corresponding Accept RPCs with the outcome of the Spanner commit; at this point the Real-time Cache should have the name of each deleted document, a full copy of each inserted document, and a full copy of each modified document together with the exact changes. The Real-time Cache tracks these mutations in memory sorted in timestamp-order.

4 Sub-document granular locking is not supported because a well-designed data model has many small documents, and sub-document concurrency is unnecessary.
5 As discussed in subsubsection IV-D3, a timestamp-based query runs without locks.
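For orientation, the same transaction as seen from the application side might be written roughly as follows; this is a sketch using the Firebase v9 modular Web SDK (with a placeholder project configuration), and the averaging logic is only an assumption about how the Restaurant example maintains avgRating:

import { initializeApp } from 'firebase/app';
import { getFirestore, doc, runTransaction, serverTimestamp } from 'firebase/firestore';

const app = initializeApp({ /* your Firebase project config */ });
const db = getFirestore(app);

async function addRating(rating) {
  const restaurantRef = doc(db, 'restaurants/one');
  const ratingRef = doc(db, 'restaurants/one/ratings/2');
  await runTransaction(db, async (tx) => {
    const restaurant = await tx.get(restaurantRef); // read set is revalidated at commit time
    if (!restaurant.exists()) throw new Error('restaurant does not exist');
    const { numRatings = 0, avgRating = 0 } = restaurant.data();
    tx.update(restaurantRef, {
      numRatings: numRatings + 1,
      avgRating: (avgRating * numRatings + rating) / (numRatings + 1),
    });
    tx.set(ratingRef, { rating, userId: 'UUU', time: serverTimestamp() });
  }); // the SDK retries the function if the freshness check fails at commit
}

addRating(3);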
There are multiple points where this process can fail, with varying consequences:
• /restaurants/one does not exist, /restaurants/one/ratings/2 already exists, or the security rules deny the request: an error is returned to the user.
• The Prepare RPC fails because the Real-time Cache is unavailable (this should be rare): the write fails and an error is returned to the user.
• The Spanner commit definitively fails, e.g., due to contention or not being able to respect the maximum timestamp. The Accept RPC notifies the Real-time Cache, and an error is returned to the user.
• The Spanner commit has an unknown outcome, e.g., it times out. The Accept RPC notifies the Real-time Cache that the write outcome is unknown, which in turn discards the in-memory sequence of mutations.
• The Spanner commit is successful but the Accept RPC is not received by the Real-time Cache. The in-memory sequence of mutations is (eventually) discarded by the Real-time Cache, but the write is acknowledged to the end-user.

Insertion of documents with many fields results in a larger number of index entries that need to be added, and that translates to a Spanner transaction potentially across more tablets, which can impact commit latency. Indexing a field that increases sequentially, e.g., a document creation timestamp, implies the insertion of consecutive rows in the IndexEntries table as documents get created. This workload is inherently difficult to split.

Network latency between replicas is higher for a multi-regional deployment, and Spanner needs a quorum of replicas to agree before committing a write, leading to higher Firestore
write latency in multi-regional deployments than in regional ones.

Spanner also has a transactional messaging system that allows its user to persist information that can be used to perform asynchronous work. This system is used by the Firestore Backend to implement write triggers (subsection III-F). If an incoming request matches a trigger, the Backend persists a message with the changes to document(s), which is then asynchronously removed and delivered to the Cloud Functions service to execute the specified handler.

3) Queries: Firestore's query engine executes all queries using either a linear scan over a range of a single secondary index in the Spanner IndexEntries table, or a join of several such secondary indexes, followed by lookup of the corresponding documents in the Entities table, with no in-memory sorting, filtering, etc. For instance,

  select * from /restaurants
  where city="SF" and type="BBQ"
  order by avgRating desc

is satisfied by the secondary index (city asc, type asc, avgRating desc). Firestore's automatically defined single-field indexes support simple queries, such as

  select * from /restaurants
  where city="SF" limit 10

  select * from /restaurants
  where numRatings > 2

  select * from /restaurants
  order by avgRating desc

To reduce the need for user-defined indexes, Firestore joins existing indexes. Thus, a query like

  select * from /restaurants
  where city = "SF" and type = "BBQ"

is executed by joining automatic single-field index (city asc) with (type asc), and a query like

  select * from /restaurants
  where city="New York" and type="BBQ"
  order by avgRating desc

by joining user-defined indexes on (city asc, avgRating desc) and (type asc, avgRating desc). Selecting the ideal set of indexes to join for a query is intractable, so Firestore's query engine uses a greedy index-set selection algorithm that optimizes for the number of selected indexes. If no such set exists, Firestore returns an error message that includes a link for adding the required index via the Google Cloud Console. In practice, these error messages let developers add the required indexes during testing. We do occasionally receive support cases for query performance caused by slow index joins that are remediated by defining additional indexes.
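To connect this to the client API, the conjunction above could be issued as follows (a Web SDK sketch with a placeholder project configuration); as described in the text, Firestore can serve it by joining the automatic single-field indexes on city and type:

import { initializeApp } from 'firebase/app';
import { getFirestore, collection, query, where, getDocs } from 'firebase/firestore';

const app = initializeApp({ /* your Firebase project config */ });
const db = getFirestore(app);

// Equality conjunction over two fields; no user-defined composite index is
// needed because the single-field indexes can be joined.
const q = query(
  collection(db, 'restaurants'),
  where('city', '==', 'SF'),
  where('type', '==', 'BBQ'),
);
const snapshot = await getDocs(q);
console.log(snapshot.size, 'matching restaurants');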
The execution of a non-real-time query starts by verifying the security rules for the collection specified in the query. The query planner then uses the (cached) index definitions to pick the best index(es) for the query. The Backend reads, and, if necessary "zig-zag joins" [16], the row ranges from IndexEntries, then fetches the corresponding rows from Entities, and returns the documents.

Queries can be executed within a Firestore transaction (the Spanner-level reads of IndexEntries and Entities acquire read locks, guaranteeing consistency with other transactions) or outside a transaction using Spanner's lock-free consistent, timestamp-based reads. In the former mode, long-lived or large transactions may lead to lock contention and deadlocks that are resolved by failing and retrying such transactions. A timestamp-based query suffers no such problem.

4) Real-time Queries: A client registers one or more real-time queries via a long-lived connection with a Frontend task. The Frontend task uses this connection to deliver updates to the result sets of those queries. The updates are delivered to applications as incremental, timestamped snapshots, comprised of a delta of documents added, deleted, and modified from the prior snapshot. This section provides only a high-level overview of this system. A more detailed description of this infrastructure is outside the scope of this paper.

As Figure 5 shows, the Real-time Cache comprises 2 components—the In-memory Changelog and the Query Matcher. A separate mechanism establishes and shares consistent ownership of document-name ranges to specific Changelog and Query Matcher tasks.

Fig. 5: A more detailed diagram of the Real-time Cache.

The request/response flow for a real-time query is as follows:
1) A client creates or reuses a long-lived connection to a Frontend task and registers a new real-time query.
2) The Frontend task creates state for this real-time query and then obtains the query's initial snapshot by forwarding the query to a Backend (which runs it like any other query) to retrieve all the documents matching the query; the response includes the corresponding Spanner timestamp of that data, which we will call max-commit-version.
3) The Frontend task sends the client an initial snapshot
based on the Backend's response and records max-commit-version with the query state. All subsequent updates to this result set will be delivered to the client as incremental snapshots.
4) The Frontend task then sends one or more Subscribe RPCs to Query Matcher tasks that own the specific document-name ranges that cover the query's result set. The Subscribe RPC includes the query and the max-commit-version. This informs the Query Matcher task to register the query for matching, and to subsequently send only the document updates that match the query and that have a Spanner commit timestamp later than the max-commit-version.
5) A Changelog task forwards document updates (received via Accept RPCs from the Backend) to the Query Matcher task owning the corresponding document-name range. On receiving the document, the Query Matcher matches it with all the queries registered for that key range and sends the matched documents to the Frontend task.
6) Because updates for a single query come from the multiple Query Matcher tasks to which the query was subscribed, the Frontend task is responsible for tracking when it has received all the updates necessary to reach a consistent timestamp. Only then does it send the accumulated delta of those updates to the client as a new incremental snapshot for that query's result set; it also then updates the query's max-commit-version to that latest timestamp. Changelog tasks generate a heartbeat every few milliseconds for every idle key range; this heartbeat is crucial for the Frontend tasks to know that they have received all updates when a document-name range is otherwise idle.

A client can open many real-time queries to the same database, multiplexed over the same long-lived connection to the Frontend task. Updating these queries to inconsistent timestamps would be confusing to the end-user when the results from multiple queries may be presented together. To avoid this, queries on the same connection are only updated to a timestamp t once all queries' max-commit-version has reached at least t.

When a Changelog task receives a Prepare with max timestamp M, it responds with a minimum timestamp m. The maximum timestamp (plus a small margin) sets how long the Changelog will wait for the corresponding Accept. The Changelog knows it has a complete sequence of updates until time t once it has received Accept responses for all Prepare RPCs that it sent out with a minimum timestamp less than t. This machinery, and the timestamp processing in the Frontends, relies crucially on the globally-consistent, causally-ordered timestamps provided by Spanner [5].

The Accept indicates whether the write was successful, failed or had an unknown outcome. If successful, the Changelog task forwards the associated document updates to the Query Matcher task (that owns the name range) with the commit timestamp; a failed write is dropped. There are several error scenarios that may occur, but we discuss only one given the lack of space: if the Changelog times out while waiting for an Accept or the Accept indicates an unknown outcome, the system cannot guarantee ordering of the updates for that name range. Then, the Changelog task marks that name range as out-of-sync and signals that all the way up to all Frontend tasks with a real-time query that matches the name range. The Frontend task then aborts all accumulated state for that query and redoes the steps starting with the initial query request to the Backend. This reset is fast, and is mostly transparent to the end-user of the application. In general, this recovery method is a fail-safe mechanism to handle difficult error conditions, such as the crash/restart of a particular task. Load-balancing is achieved by dynamically changing the document-name range ownership across Changelog and Query Matcher tasks by leveraging the Slicer [17] auto-sharding framework.

E. Disconnected Operation

The Client (Mobile and Web) SDKs build a local cache of the documents accessed by the client together with the necessary local indexes. It uses the local cache to provide low latency responses to client queries without the network penalty. Mutations to documents by the client are acknowledged immediately after updating the local cache; the updates are also flushed to the Firestore API asynchronously.

The local cache is updated whenever it receives notifications over the long-lived connection it maintains with a Frontend task. A disconnected client can therefore continue to serve queries and updates using its local cache, and reconcile its local cache when it eventually reconnects with Firestore. The Client SDK is also responsible for guaranteeing consistency across the multiple real-time queries a client may have active.

Based on their privacy preferences, an end user can choose to persist their local cache. This choice affects the behavior after a device is restarted; persistence provides a warm cache as a starting point for requests to the Client SDK.
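As an illustration of how an application opts in to this behavior, the following Web SDK sketch (v9 modular API, with a placeholder project configuration and an assumed collection name) enables the persistent local cache and observes whether a snapshot was served from it:

import { initializeApp } from 'firebase/app';
import { getFirestore, enableIndexedDbPersistence, collection, onSnapshot } from 'firebase/firestore';

const app = initializeApp({ /* your Firebase project config */ });
const db = getFirestore(app);

// Opt in to an IndexedDB-backed persistent local cache; fall back to the
// default in-memory cache if persistence is unavailable (e.g., multiple tabs).
try {
  await enableIndexedDbPersistence(db);
} catch (err) {
  console.warn('persistent cache unavailable, using in-memory cache:', err.code);
}

// While disconnected, this listener is served from the local cache; results
// are reconciled automatically once connectivity returns.
onSnapshot(collection(db, 'restaurants'), { includeMetadataChanges: true }, (snap) => {
  console.log('fromCache:', snap.metadata.fromCache, 'docs:', snap.size);
});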
It should be noted that the customer is not billed for any work that can be satisfied by the local cache; only the traffic to/from the Firestore service is billed as described in subsection IV-B.

V. EVALUATION

We share some production data, show latency variance with change in important parameters, evaluate one isolation mechanism, and analyze factors that make Firestore easy to use. All data in this section is presented as relative to a median or as comparisons across the changing x-axis parameter.

A. Production Statistics

The four million Firestore production databases accessed by over a billion end-users each month are evidence of Firestore's wide adoption, ease of use and scalability. The scalability of Firestore is also demonstrated by the variability in usage patterns seen across customers, all of whom interact with the same multi-tenant Firestore tasks and Spanner databases. We present this variance as boxplots [18] in Figure 6 using values
Fig. 6: Various database properties normalized by median.

Fig. 7: Read latency for both YCSB workloads.
Fig. 9: Notification latency on a linear y-axis with increase in number of client connections.

in this graph, we also observe that the commit latency remains constant throughout the experiment because of this separation.

2) Data Shape: Two obvious properties affecting latency of Firestore writes are the size of documents being committed as well as the number of indexes being updated. The latter is particularly relevant because—unless specifically exempted—Firestore automatically indexes all fields for easy querying.

To illustrate these relationships, we ran two experiments with 10 QPS of Firestore commits, where each commit adds a single document. In the first experiment, each document comprises a single field with a varying length of single-byte characters, from 10KB to almost 1MiB, which is the maximum document size supported by Firestore. In the second experiment, each document has a varying number of numeric-value fields from 1 to 500, which results in a linear increase in the number of index entries written per commit. From a performance standpoint, for the same number of index entries, there is no significant performance difference between one large field and many small fields of the same total size, or an array or map with many elements again of the same total size: Firestore indexing flattens out fields such as arrays or maps to index each element, and therefore the corresponding performance is similar to that of a document with as many fields. The experiment was preceded by initializing the database with enough data to ensure that commits spanned multiple tablets and thus adding a single document required a distributed Spanner commit. Each data point shows latencies from a 10 minute measurement interval.

Fig. 10: Latency of document commits with increasing (a) size, via field length (b) index count, via field count. The y axes on both these graphs share the same linear scale, and are therefore directly comparable.

Figure 10a shows that the increase in latency has a base cost plus a linear, size-dependent cost (note that the x-axis is not linear). Figure 10b demonstrates that index entry count does not significantly impact latency until beyond several hundred fields. It shows that commit latency increases more slowly with the increase in number of rows—each index entry is an additional Spanner row written to the IndexEntries table. In summary, commit latencies are dominated by document and field sizes and not by field count.

C. Isolation

Subsection IV-C outlines several isolation mechanisms used to make Firestore's multi-tenancy feasible. One crucial isolation feature is fair CPU scheduling in the Firestore Backend, which prevents a single database's requests from starving other databases of CPU when requests rise quicker than automatic scaling can react. We evaluate this isolation with a small scale, fixed capacity (no automatic scaling) Firestore environment with fair CPU scheduling enabled or disabled. We send two workloads to this environment: a "culprit" database sends CPU-intensive (due to an inefficient indexing setup) queries that linearly ramp up to 500 QPS to hit scaling limits of the test environment, and a "bystander" database sends 100 QPS of single-document fetches. As Figure 11 shows, when capacity limits are reached halfway through the experiment, a lack of CPU fairness leads to a significant degradation of the bystander database's latency. The fair scheduling keeps latency impact to a minimum, leaving only a small increase in p99 latency (note the log scale). Firestore's production environment robustly handles all types of traffic spikes thanks to this isolation feature and many others, such as automatic scaling.

Fig. 11: Query latency of "bystander" database with and without fair CPU scheduling.

D. Ease of Use

Outside of the quarter million developers actively building on Firestore, there is no convenient quantitative way to showcase Firestore's ease of use. Instead, we discuss some aspects of the ease with which Firestore can be used to build a sample application and the corresponding number of lines of Javascript code. This example—illustrated as a step-by-step guide by the Firestore Web Codelab [6]—is a functional restaurant recommendation web application, which lets viewers see a list of restaurants with filtering and sorting, and view and add reviews.

The initialization steps—creating and enabling a database, setting up rudimentary access control, and a web server—are all accomplished by running a few commands. The onSnapshot() method is used to listen to the results of a query. The
developer specifies a callback that receives the initial query result snapshot. Subsequently, each time the result set changes, another call updates the snapshot. The developer can also view the changes to the result set, i.e., the documents added, deleted or modified. The result set state is populated or updated from the local cache and from updates sent by the Real-time Cache; this is all handled seamlessly by the Client SDKs.
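The pattern described here boils down to roughly the following sketch (not the Codelab's actual code; renderRestaurantList is a hypothetical UI helper and the project configuration is a placeholder):

import { initializeApp } from 'firebase/app';
import { getFirestore, collection, query, orderBy, limit, onSnapshot } from 'firebase/firestore';

const app = initializeApp({ /* your Firebase project config */ });
const db = getFirestore(app);

// Hypothetical UI helper: re-render the list from the latest snapshot.
function renderRestaurantList(docs) {
  docs.forEach((d) => console.log(d.get('name'), d.get('avgRating')));
}

const q = query(collection(db, 'restaurants'), orderBy('avgRating', 'desc'), limit(50));

// The callback fires with the initial result set and again on every change,
// whether it originates from this user, another user, or the local cache.
onSnapshot(q, (snapshot) => renderRestaurantList(snapshot.docs));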
This application is 841 lines of Javascript code, of which 92 are to access Firestore. However, this Codelab includes creation of sample restaurant data and other hardwired constants, without which the application shrinks to 572 lines, of which 66 are for reading from and writing to the Firestore database. Custom security rules (25 lines) allow reads to authenticated users and creation of new reviews as long as the review contains the authenticated user's unique identifier.

We also implemented a "Todo List" application that lets users share a todo list, add new items, mark items as done, and delete them. This requires 112 lines of Javascript code, of which 37 are for Firestore database access. Security rules (8 lines) allow everyone to read and create todo items, but only an item creator can mark an item as done or delete it.

BeReal is an example of a recently viral social application that was written using Firestore and where its ease of use has been highlighted as an advantage [21].

VI. LESSONS IN PRACTICE

We present a selection of the many practical lessons learnt from running Datastore and Firestore over more than a decade.

Backwards compatibility is essential for a cloud service that is constantly updated under live traffic. The default guarantee for updating Firestore is bug-for-bug API compatibility. Very occasional, small behavior changes are possible but only with strong motivation. When necessary, such changes are preceded by a comprehensive investigation—examples are analysis of all RPCs seen during a sufficiently long period or a scan of the entire corpus of customer data—to identify all customers that may be impacted by the change, and working directly with them to address potential risks. We twice rewrote the Firestore query planner. These rewrites were extensively tested with A/B comparison of query execution to confirm zero customer impact before rollout. The migration of Datastore from the Megastore-backed to a Spanner-backed system unavoidably reduced maximum key size, affecting a tiny number of documents for very few customers. We contacted these customers directly to ensure their data was handled correctly.

Relatedly, Firestore is not versioned, so rollouts directly and immediately impact customer traffic. Avoiding problems from these regular (weekly) rollouts requires detecting problems early and rolling back the release, if necessary. To achieve this, we rely on Google's internal rollout principles—slow code and configuration rollouts within a region and gradual rollout across regions. We use automated A/B testing of the current against the new release for some number of minutes across many metrics before the rollout is allowed to proceed. This is done at sub-region granularity because traffic patterns can differ significantly across regions.

Data integrity is a core requirement of any database. We rely both on Spanner's data integrity guarantees for data at rest, and periodic data validation jobs at both the Spanner and Firestore layers to verify the correctness of data and consistency of indexes. However, mass-produced machines themselves are unreliable [22], [23] and may corrupt in-memory data. We are actively addressing these issues through the addition of end-to-end checksums to protect in-flight RPCs.

As discussed earlier, isolation is hard, and our techniques for maintaining isolation are not always guaranteed to work, especially when confronted with sudden traffic spikes of unusual workloads that, e.g., require much more RAM than the typical RPC. We use two tools to quickly mitigate such challenges. One is a low-tech manual tool that limits the number of per-task in-flight RPCs for a given database, which has been one of our more effective mechanisms for preventing isolation failure while waiting for fixes (capacity changes or code updates) to roll out. This tool may not suffice in some cases, such as traffic triggering a bug that leads to task crashes or when limiting a customer's workload is highly undesirable. In such cases, all traffic for that database can be routed to a separate pool (of tasks) for the impacted component, thereby isolating it completely. This pool can also be configured to auto-scale to the database's traffic.

Finally, the design of the Firestore API was informed by many lessons learned from Datastore; we present one significant lesson. Our customers found the Megastore-based implementation restrictive for organizing their data, which had to be carefully organized into entity groups to support transactional updates and strongly-consistent queries. Furthermore, the write throughput to each entity group was limited. This forced most customers into using eventually-consistent queries [15]. Firestore leverages Spanner to remove all of these limitations, allowing unrestricted transactions and strongly-consistent queries, with no transaction-rate limits.
VII. RELATED WORK

Lotus Notes was the first document-oriented system, and was emulated later by what came to be known as NoSQL document stores. When it launched in 1989, Notes had many advanced features that did not then exist in RDBMSs—e.g., disconnected operation with replicated documents in a client-server environment, field-level authorization and encryption, n-gram indexing of strings enabling fuzzy searching, form-based workflows, triggers, and tree of trees indexes to support nested views with sophisticated collation options. Because Notes was aimed at small workgroup environments, it hit scaling and transactional problems when used in large enterprises as a mail system and a document store. More than a decade later, these issues were addressed with the introduction of log-based recovery and by allowing a single database to span more than one file [24]. Notes preceded the cloud and the big data revolution, so was designed for on-premise database sizes.

Microsoft's Azure Cosmos DB (originally named Azure DocumentDB [25]) is a modern day cloud-scale document store, which is similar to Firestore. While Cosmos DB uses the file system directly to persist its documents, Firestore relies on the scaling and decade-long production quality of Spanner [5]. We have discussed earlier how Firestore uses Spanner's data modeling constructs in a stylized way to store document collections and their associated indexes. It should be noted that this results in what could be characterized as unnormalized tables due to the redundant storage of collection name in every Spanner row representing a document, and index ID in every Spanner row representing an index entry; this is necessitated by our multi-tenant layout in Spanner. Cosmos DB supports a SQL dialect and JavaScript to query, using relational or hierarchical constructs, JSON documents with tunable transaction consistency levels. It also supports stored procedures and triggers, while Firestore supports triggers by integrating with Google Cloud Functions. One of Firestore's main goals is ease of mobile-friendly application development with an easy-to-understand billing model and disconnected operation for mobile clients with automatic reconciliation of parallel updates.

Amazon DynamoDB [26], [27] has its own storage engine that scales, whereas Firestore leverages Spanner. The evolution of DynamoDB from SimpleDB also differs from the evolution of Firestore from Datastore. One unique feature of DynamoDB is that the system continuously and proactively monitors availability both on the server side and on the client side. In addition to data replicas, it also supports log replicas to improve availability. It supports strongly and eventually consistent reads. For efficient metadata management, it relies on an in-memory distributed datastore called MemDS.

The original version of MongoDB did not have transactional guarantees. Now, it stores all data in BSON format (binary form of JSON), using the WiredTiger storage engine which supports transactions and tunable levels of consistency by exposing what are called writeConcern and readConcern levels that are usable with each database operation [28]. This aspect of MongoDB has necessitated the invention of a speculative execution model and data rollback algorithms. MongoDB does support secondary indexes, an ad hoc query language, complex aggregations, sharding, etc. The adaptation of the TPC-C benchmark to the document database model of MongoDB is presented in [29]. MongoDB includes Change Streams [30], which allow interested parties to subscribe to changes in one or all collections in a database. This differs from Firestore's real-time query feature, which supports listening on complex queries such that only relevant changes are streamed to the listener, and scaling to arbitrary numbers of listeners.

Couchbase [31] is a document database system that supports JSON as its data model. Based on a shared-nothing architecture, Couchbase supports indexes and declarative querying using SQL-like queries, including joins, aggregations, and nested objects. As an alternative to the original Couchstore storage engine, a new storage engine called Magma has been developed recently [32] to support write-intensive workloads. Its goal is to improve on write and space amplification.

There is a large body of work on standing or continuous queries for relational database systems, but we are unaware of work that is comparable to Firestore's real-time queries in other document database systems.

VIII. FUTURE WORK

The Firestore query API was designed to be simple for application developers, but it may become limiting as an application matures. We are working on adding query functionality while conforming to the design parameters of predictable query scaling and efficient real-time updates. These changes will require extending our billing model that is currently based on only the number of documents in the result set; a COUNT query returns a single value but may count millions of documents, and in-memory filtering may require examining and discarding many documents. However, such extensions cannot break the pay-as-you-go billing that is essential to the serverless experience.

It would be beneficial to push down more of Firestore query evaluation into Spanner for increased efficiency and reduced latency. However, translation of a Firestore query into one on the underlying Spanner schema (that represents Firestore's data) may produce a sufficiently complex query that Spanner's query planner cannot execute it efficiently. Unlocking this problem is an active area of investigation.

Some customers wish to add light schema restrictions on their previously schemaless data in some mature applications. Providing opt-in schema functionality will allow mapping those fields directly into our underlying Spanner schema, unlocking the aforementioned push-down efficiency and potentially more query functionality.

We are exploring improved isolation by selective slow-down or rejection of traffic of a given database when under memory pressure, based on the memory consumed by in-flight queries to that database. So far, we have focused on isolation mostly between databases, but Firestore customers need isolation also within their database: for example, a bug in their daily batch job should not lead to rejection of user-facing traffic. Adding support for intra-database isolation will require API-level changes, e.g., in the form of QoS indications.

IX. CONCLUSION

In this paper, we presented and analyzed the building blocks of Firestore that are key to its popularity with the application developer community. We showed how its schemaless data model, serverless operations, and simple billing model, combined with the Firebase SDKs, provide a convenient ecosystem with a low barrier of entry for developers to rapidly prototype, deploy, iterate, and sustain applications. We showed how Spanner and Google infrastructure were leveraged to allow QPS and storage scaling, and presented an overview of how the Real-time Cache and client SDKs provide a seamless experience of real-time notifications to clients even in the presence of network connectivity issues.

We thank the many Googlers and ex-Googlers who have worked to make Datastore and Firestore a success. In particular, Datastore development was successively led by Ryan Barrett, Max Ross and Alfred Fuller. The Firestore API development was led by Alfred Fuller and Andrew Lee.
REFERENCES

[1] A. Arasu, S. Babu, and J. Widom, "CQL: A language for continuous queries over streams and relations," in Database Programming Languages, 9th International Workshop, DBPL 2003, Potsdam, Germany, September 6-8, 2003, Revised Papers, ser. Lecture Notes in Computer Science, G. Lausen and D. Suciu, Eds., vol. 2921. Springer, 2003, pp. 1–19. [Online]. Available: https://doi.org/10.1007/978-3-540-24607-7_1
[2] C. R. Severance, Using Google App Engine - start building and running web apps on Google's infrastructure. O'Reilly, 2009. [Online]. Available: http://www.oreilly.de/catalog/9780596800697/index.html
[3] J. Baker, C. Bond, J. C. Corbett, J. J. Furman, A. Khorlin, J. Larson, J. Leon, Y. Li, A. Lloyd, and V. Yushprakh, "Megastore: Providing scalable, highly available storage for interactive services," in Fifth Biennial Conference on Innovative Data Systems Research, CIDR 2011, Asilomar, CA, USA, January 9-12, 2011, Online Proceedings. www.cidrdb.org, 2011, pp. 223–234. [Online]. Available: http://cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
[4] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber, "Bigtable: A distributed storage system for structured data," ACM Trans. Comput. Syst., vol. 26, no. 2, pp. 4:1–4:26, 2008. [Online]. Available: https://doi.org/10.1145/1365815.1365816
[5] J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild et al., "Spanner: Google's globally distributed database," ACM Transactions on Computer Systems (TOCS), vol. 31, no. 3, pp. 1–22, 2013.
[6] "Cloud Firestore Web Codelab," https://firebase.google.com/codelabs/firestore-web#0, accessed 2022-10-18.
[7] "Firebase Authentication," https://firebase.google.com/docs/auth, accessed 2022-10-18.
[8] "Firebase Security Rules," https://firebase.google.com/docs/rules, accessed 2022-10-18.
[9] "Google Cloud Functions," https://cloud.google.com/functions, accessed 2022-10-18.
[10] B. Burns, B. Grant, D. Oppenheimer, E. Brewer, and J. Wilkes, "Borg, omega, and kubernetes," Communications of the ACM, vol. 59, no. 5, pp. 50–57, 2016.
[11] K. Rzadca, P. Findeisen, J. Swiderski, P. Zych, P. Broniek, J. Kusmierek, P. Nowak, B. Strack, P. Witusowski, S. Hand et al., "Autopilot: workload autoscaling at google," in Proceedings of the Fifteenth European Conference on Computer Systems, 2020, pp. 1–16.
[12] "Firestore: Ramping up traffic," https://firebase.google.com/docs/firestore/best-practices#ramping_up_traffic, accessed 2022-10-18.
[13] C. A. Waldspurger and W. E. Weihl, Stride scheduling: Deterministic proportional share resource management. Massachusetts Institute of Technology, Laboratory for Computer Science, 1995.
[14] "Protocol buffers," https://developers.google.com/protocol-buffers, accessed 2022-10-18.
[15] "How Pokémon GO scales to millions of requests?" https://cloud.google.com/blog/topics/developers-practitioners/how-pok%C3%A9mon-go-scales-millions-requests, accessed 2022-10-18.
[16] L. D. Shapiro, "Join processing in database systems with large main memories," ACM Transactions on Database Systems (TODS), vol. 11, no. 3, pp. 239–264, 1986.
[17] A. Adya, D. Myers, J. Howell, J. Elson, C. Meek, V. Khemani, S. Fulger, P. Gu, L. Bhuvanagiri, J. Hunter, R. Peon, A. Shraer, A. Merchant, and K. Lev-Ari, "Slicer: Auto-sharding for datacenter applications," in OSDI'16: Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, 2016, pp. 739–753.
[18] "Box plot," https://en.wikipedia.org/wiki/Box_plot, accessed 2022-10-18.
[19] "Cloud Firestore locations," https://firebase.google.com/docs/firestore/locations#location-mr, accessed 2022-10-18.
[20] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears, "Benchmarking cloud serving systems with YCSB," in Proceedings of the 1st ACM Symposium on Cloud Computing, 2010, pp. 143–154.
[21] "BeReal builds a real and authentic social media platform on Google Cloud," https://cloud.google.com/blog/topics/startups/bereal-creates-reality-based-social-media-using-google-cloud, accessed 2022-10-18.
[22] P. H. Hochschild, P. Turner, J. C. Mogul, R. Govindaraju, P. Ranganathan, D. E. Culler, and A. Vahdat, "Cores that don't count," in Proceedings of the Workshop on Hot Topics in Operating Systems, 2021, pp. 9–16.
[23] "Silent Data Corruption at Scale," https://www.sigarch.org/silent-data-corruption-at-scale/, accessed 2022-10-18.
[24] C. Mohan, R. Barber, S. Watts, A. Somani, and M. Zaharioudakis, "Evolution of groupware for business applications: A database perspective on lotus domino/notes," in VLDB 2000, Proceedings of 26th International Conference on Very Large Data Bases, September 10-14, 2000, Cairo, Egypt, A. E. Abbadi, M. L. Brodie, S. Chakravarthy, U. Dayal, N. Kamel, G. Schlageter, and K. Whang, Eds. Morgan Kaufmann, 2000, pp. 684–687. [Online]. Available: http://www.vldb.org/conf/2000/P684.pdf
[25] D. Shukla, S. Thota, K. Raman, M. Gajendran, A. Shah, S. Ziuzin, K. Sundaram, M. G. Guajardo, A. Wawrzyniak, S. Boshra et al., "Schema-agnostic indexing with azure documentdb," Proceedings of the VLDB Endowment, vol. 8, no. 12, pp. 1668–1679, 2015.
[26] S. Sivasubramanian, "Amazon dynamodb: a seamlessly scalable non-relational database service," in Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, 2012, pp. 729–730.
[27] S. Perianayagam, A. Vig, D. Terry, S. Sivasubramanian, J. C. Sorenson III, A. Mritunjai, J. Idziorek, N. Gallagher, M. Elhemali, N. Gordon et al., "Amazon dynamodb: A scalable, predictably performant, and fully managed nosql database service," in 2022 USENIX Annual Technical Conference (USENIX ATC 22), 2022, pp. 1037–1048.
[28] W. Schultz, T. Avitabile, and A. Cabral, "Tunable consistency in mongodb," Proc. VLDB Endow., vol. 12, no. 12, pp. 2071–2081, 2019. [Online]. Available: http://www.vldb.org/pvldb/vol12/p2071-schultz.pdf
[29] A. Kamsky, "Adapting TPC-C benchmark to measure performance of multi-document transactions in mongodb," Proc. VLDB Endow., vol. 12, no. 12, pp. 2254–2262, 2019. [Online]. Available: http://www.vldb.org/pvldb/vol12/p2254-kamsky.pdf
[30] "How Do Change Streams Work in MongoDB?" https://www.mongodb.com/basics/change-streams, accessed 2022-10-18.
[31] M. A. Hubail, A. Alsuliman, M. Blow, M. J. Carey, D. Lychagin, I. Maxon, and T. Westmann, "Couchbase analytics: Noetl for scalable nosql data analysis," Proc. VLDB Endow., vol. 12, no. 12, pp. 2275–2286, 2019. [Online]. Available: http://www.vldb.org/pvldb/vol12/p2275-hubail.pdf
[32] S. Lakshman, A. Gupta, R. Suri, S. D. Lashley, J. Liang, S. Duvuru, and R. Mayuram, "Magma: A high data density storage engine used in couchbase," Proc. VLDB Endow., vol. 15, no. 12, pp. 3496–3508, 2022. [Online]. Available: https://www.vldb.org/pvldb/vol15/p3496-lakshman.pdf