# What’s Up With Processes

This is a transcript of [What's Up With
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq)
Episode 8, a 2023 video discussion between [Sharon ([email protected])
and Darin ([email protected])](https://www.youtube.com/watch?v=SD3cjzZl25I).

The transcript was automatically generated by speech-to-text software. It may
contain minor errors.

---

Chrome has a lot of process types. What is a process? What are all the types?
How do they work together? Today’s special guest to tell us more is Darin.
Darin is one of the founding members of the Chrome team, and wrote the initial
implementation of the multi-process architecture.

Notes:
- https://docs.google.com/document/d/1uXF-ncJ98LWQMN7M3NA_2oYkVmW9Vzp0v-wkJaNpsDQ/edit

Links:
- [Chrome comic](https://www.google.com/googlebooks/chrome/small_00.html)
- [What's Up With Mojo](https://www.youtube.com/watch?v=at_35qCGJPQ)
- [What's Up With Open Source](https://www.youtube.com/watch?v=zOr64ee7FV4)
- [What's Up With //content](https://www.youtube.com/watch?v=SD3cjzZl25I)
- [Life of a Process](https://www.youtube.com/watch?v=5im7SGmJxnA)
- [Chrome Compositing](https://chromium.googlesource.com/chromium/src/+/HEAD/docs/how_cc_works.md)
- [Site Isolation papers by Charlie](https://charlesreis.com/research/publications/)

---

00:00 SHARON: Hello, and welcome to "What's Up With That," the series that
demystifies all things Chrome. I'm your host, Sharon, and today, we're talking
about processes. There are so many process types in Chrome. How do they form
the multi-process architecture? What exactly is a process? Here to answer all
of that and more is today's special guest, Darin. Darin was one of the founding
members of the Chrome team and pretty much did the first implementation of the
multi-process architecture, so he is well-suited to answer all of this. Plus,
he created the IPC channels that Chrome started with. If you want to learn more
about IPC and Mojo, check out the last episode with Daniel for lots more on
that. So hello. Welcome, Darin. Welcome to the show. Thanks for being here.

00:38 DARIN: Thank you. Great to be here.

00:38 SHARON: Yeah, cool. So first question, what is a process?

00:44 DARIN: Right, so a process is the container in which applications run on
your system. Every process has its own executing set of threads, and it
also has its own memory space. That way, processes have their own independent
memory, their own independent data, and their own independent execution. The
system is multitasking across all of the processes on the system.
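
The independent memory described above can be seen in a small sketch. This is an illustration only, not Chrome code: the names `CHILD_CODE` and `run_child` are made up for the example, and it assumes a Python interpreter is available at `sys.executable`. The parent hands a value to a separate OS process, the child changes its own copy, and the parent's variable is untouched.

```python
import subprocess
import sys

# Program text for the child process (hypothetical, for illustration).
# The child receives a number on its command line and modifies its own copy.
CHILD_CODE = (
    "import sys\n"
    "counter = int(sys.argv[1])\n"
    "counter += 100\n"
    "print(counter)\n"
)


def run_child(value: int) -> int:
    """Start a separate OS process and return the number it printed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, str(value)],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


counter = 1                        # lives in the parent's memory space
child_value = run_child(counter)   # the child only ever sees a copy
print("child:", child_value, "parent:", counter)
```

The only way the two sides exchange data here is through explicit channels (the command line and a pipe), which is the same constraint that makes IPC necessary between Chrome's processes.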

01:13 SHARON: Cool. Chrome is basically an operating system that runs on top of
your operating system. So there probably are parallels between Chrome's
representation of a process and the actual operating system ones. So what are
the similarities and differences, and how do they interact?

01:30 DARIN: Well, yeah, I mean, you can talk about a lot of different things.
I mean, so Chrome is made up of multiple processes. We run different tasks in
different processes. That's done for multiple reasons. One is so that they can
run independently, so that there's performance benefits that come from the fact
that they're running independently. Back in the day, the original idea was that
it would allow us to take advantage of the operating system's preemptive
multitasking that it already has and to actually allow web pages to run
concurrently and to be managed just like any other concurrent task that the
operating system would manage. So that's the original idea there. And in that
way, this model of Chrome divided into multiple processes just allows
Chrome itself and all of the tasks that it has to really take advantage of
multi-core systems so that if you have more computing power, if you have more
cores, you have more hyperthreading going on in your system, then it's possible
for more things to happen concurrently. And Chrome's workload can be spread out
that way because Chrome is broken into all of these different processes and all
of these different threads. In that way, it's taking advantage of and mirroring
the capabilities of the OS and providing that as a substrate for web and for
browser and for how all these things work. How Chrome then has to be similar is
that also, like an OS, Chrome has to manage all this stuff. And from simple
things like how much resource should a background tab be using, should its
timers be running when it's in the background, to much more complicated things
when you talk about even should a process stay alive or not. If you look at
Chrome OS where system resources can be so limited, it's necessary, or on
mobile, necessary to terminate some of those background processes to close some
of those tabs behind the scenes, even if the application makes it look like
those tabs are still open. So the level of management is a big part of - in
that way, it's being kind of like an OS.

03:42 SHARON: Is Chrome's representation of a process, are those generally
one-to-one with a system process, depending on which system you're on -

03:48 DARIN: Absolutely.

03:48 SHARON: or is that an abstraction layer?

03:55 DARIN: No, well, absolutely when we talk about a process in Chrome, we
mean an OS process. And so we might have multiple web pages being served by
that single renderer process. We do try to spread the load across multiple
processes, but we also independently decide how many processes to actually
create. And it can be based on - there could be good reasons from, like I said,
a performance perspective to having tabs assigned across multiple processes,
but there can also be good security properties, like letting the web pages be
allocated to different processes means that those web pages are not running in
the same process, meaning they're not running in the same address space. And
from a security perspective, that has really great properties because it means
if a web page is able to tickle a bug in the rendering engine, in V8 or in
part of Blink, and somehow get a privilege escalation, like start to be able to
do things that JavaScript normally can't do, it's still going to be limited by
the capabilities of that process and what it has access to. And so if that
process has really only the data for the web page that was providing the
problematic JavaScript, well, it's not really getting access to anything it
didn't already have. And that's kind of the whole idea of process isolation and
sandboxing. And then on top of that, you limit the capabilities of that process
by really leveraging the OS process primitive and the kinds of restrictions and
capabilities that can be removed from that process to achieve an isolation for
web pages for an origin or for a set of web pages. I say set because we might
not want to allocate a process for every single tab or for every single origin
because that might just use up way too many system resources. So we have to be
thoughtful there, too.

05:50 SHARON: Yeah, so this is quite closely related to site isolation, which
isn't the topic of this video - maybe the next one. So terms that are used
often and sometimes interchangeably are multi-process architecture and process
model. So these aren't exactly the same thing, but can you explain the
difference between them and what each one is for? Because there are
similarities, but.

06:16 DARIN: Sure. I mean, I think to me, the phrase "process model," it's
talking about, what does a particular process represent, what does it do. And
then when I say multi-process architecture, I'm thinking of the whole thing.
It's all packaged up. It's a multi-process architecture to build a browser. At
the end of the day, the user is hopefully not so aware of the fact that this is
how it's built. I mean, earlier on in Chrome's history, the Windows Task Manager
didn't do a very good job of grouping processes by their parent. And so if you
opened the Task Manager at the OS level, you'd see just a spew of processes
that Chrome was responsible for. And it could be a little disconcerting for
people. A little tangent, but now more modern versions of Windows, they do kind
of group it all to the parent task. And so it's a little easier and less sort
of in-your-face that Chrome is creating all these processes. But yeah, at the
end of the day, it's just the multi-process architecture is like that's the
embodiment of the whole thing. And we have these different process types that
make up that whole thing. There's the browser process, the main one, and then a
renderer process is the name we give to the processes responsible for running
web pages. And then we have a few other process types that are part of the
puzzle, a networking process, a GPU process, utility process, and occasionally,
in the lifespan of Chrome, other types of processes. We had plugin processes,
for example, when we were hosting Flash in Chrome. And Native Client had
its own type of processes as well. So what's that all about? Really, I can go
into it if you want me to go into all the details there. But -

08:05 SHARON: Yeah, I think we'll run through - this is a, yeah, perfect segue.
We'll run through each of those process types you just mentioned and mention a
bit about what they do, how much privilege they have, maybe how many of them
there are because some of them, there's only one of. So I think it makes sense
to start with the browser process, which is the process and is often likened to
the kernel in an operating system.

08:30 DARIN: Yeah, so the browser process - kernel, operating system, broker -
these are kind of good analogies for what the browser process's role is. So it's
the application process, the main one, that starts up initially, and it's the one
that hosts the whole UI of the app. And it's going to spawn these child
processes, the renderer processes, the GPU process, and so on, to help fulfill
its goals. So very early on, we started with this design where WebKit, the
rendering engine we were using from Apple, it could be built as a COM control,
registered on the system, and loaded as a DLL. And then in order to run
that in a child process, it was using HWNDs and all the standard Win32-isms to
do its job. And we started out by just literally trying to capture a bitmap
rendering of WebKit and send it over to the browser process where we could
present that bitmap. Actually, rewind even further. The very first version took
advantage of the fact that Windows supports having HWNDs hosted in different
processes and threads. And so we literally just took that HWND from WebKit and
that child process and stuck it into the window hierarchy of the browser
process. And we drew our browser UI around it, and WebKit was there, but it was
running in a different process. And if we ever needed to tell that process to
do something, we'd just send a WM_USER event to it with PostMessage. And that's
something Windows lets you do. So it felt like a very simple toy kind of way to
try it all out. A lot of limitations to that design. Pretty quickly, we
realized we didn't want to just be in that kind of setup, and we moved to
building our own IPC channel, a pipe, so that we could communicate and really
get to the point where WebKit's running there without an HWND, without its own
Win32 windowing constructs, but instead, it's just kind of an image generator.
And we take the image that it generates, the bitmap, send it over our IPC
channel to the browser process. And the browser process is where we have our
window hierarchy, where we display that bitmap, and where we collect user input
and send it over the pipe to the renderer, where we then feed it into WebKit.

10:46 DARIN: That was the original architecture of Chrome. So in that world,
the browser process is your application process. It has all the UI. And it's
really like this glorified image viewer. And the renderer process is literally
just like it's running WebKit - now Blink. It's running the rendering engine,
and it's producing those images whenever, like, an update occurs, a layout
occurs, or some invalidation occurs. And we got a little fancy. It was producing
just the sub. It would know, oh, I really only have a small damage rect, so I
don't have to produce the whole image. I just produce a small part. And send
that over, and then we paint that into the part that the browser is retaining
an old image of. And it can update just that one part. And so that's a very
simple approach that we took when building this whole thing. And so those
renderer processes become very much just very simplistic in that they aren't
interacting with the rest of the OS in a very deep way. They are just taking
input events from this pipe and sending images back. When they need other
services like they need network access, instead of going straight to the
network from the renderer process, because we started to realize, hey, we might
want a sandbox and restrict those child processes, and also, we needed the
notion of a cookie jar that was shared across all web pages, so that if you
visit Gmail in one tab and visit Gmail in another tab, you're still logged in, we
needed the network stack to be in a unified place. So it meant that not just
would we send images up to the browser, but now we would send network requests
to the browser. And the browser would respond with the network data. And as a
result, we started to go down this path of centralizing access to system
services and resources in the browser process.
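
The damage-rect update mentioned above can be sketched in a few lines. This is a simplified model, not Chrome's implementation: the `Rect` type, the `paint_damage` function, and the list-of-rows bitmap are all assumptions made for illustration. The browser side keeps the old image and overwrites only the rectangle the renderer says has changed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Hypothetical damage rectangle, in pixel coordinates."""
    x: int
    y: int
    width: int
    height: int


def paint_damage(retained, rect, pixels):
    """Copy `pixels` (rect.height rows of rect.width values) into `retained`.

    `retained` is the full image the browser kept from the previous frame;
    everything outside `rect` is left exactly as it was.
    """
    if len(pixels) != rect.height or any(len(r) != rect.width for r in pixels):
        raise ValueError("pixel data does not match the damage rect")
    for row in range(rect.height):
        dest = retained[rect.y + row]
        dest[rect.x:rect.x + rect.width] = pixels[row]


# A 4x4 retained bitmap; the renderer reports a 2x1 damaged strip.
retained = [[0] * 4 for _ in range(4)]
damage = Rect(x=1, y=2, width=2, height=1)
paint_damage(retained, damage, [[7, 7]])
```

Only two pixel values cross the process boundary in this example instead of sixteen, which is the saving the damage rect is meant to provide.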

12:44 DARIN: It's becoming therefore like a broker to the system that the
renderer now is unable to - not unable - it's asking the browser for everything
it needs. It's communicating to the browser to get access to all the different
resources. And that allowed us to then restrict the renderer process
considerably so that it doesn't even have access if it wanted to touch the file
system, to touch the network TCP/IP implementation or any system resources. So
the sandbox really is all about how we apply those restrictions, taking away
the capabilities of a Windows process. So in the very early days, there was
just the browser process and renderer processes. And we would allow multiple
renderer processes to be created as tabs were opened. And we put some
restriction on the number of processes based on the amount of RAM that your
system would have, thinking that processes maybe have some inherent overhead,
which they do. Certainly, there's the overhead of the V8 heap that is allocated
once per process or once per isolate, if you're familiar with the details of
V8. And so, we didn't want to have so much of that kind of - so we thought
there was some limit to how many processes we should have. Later on, other
process types started to emerge. The next one that came was the Plugin
process because in order to get YouTube to work back in 2006, you needed to
support Flash. And Flash has two modes - it did. It had a windowed mode and a
windowless mode. And the difference is whether it drew itself into an HWND or
if it would just produce a bitmap itself. But regardless of what mode it was
rendering in, it still wanted direct system access, like it wanted to touch the
file system. And so if we were going to run it in our browser, it can't run in
the renderer process. It has to run somewhere else. And so, yeah, in the frenzy
of, gee, wouldn't it be nice if we could have sandboxing, it was, how the heck
are we going to sandbox and isolate plugins? Because the way plugins integrated
with WebKit is that WebKit just directly called into them and said, hey, if
it's a windowless one, give me your bitmap. I'm going to include it in my
rendering. If it's a windowless one, it also means it's dependent on WebKit to
feed it events. And so, how does that work? So we ended up building a process
type called the Plugin process type for NPAPI plugins, Netscape-style plugins,
all stuff that doesn't exist anymore. It's wonderful. And NPAPI is this
interface that was once upon a time, I want to say, kind of, like - my head is
going to some unsavory words. It was kind of pooped out by somebody at Netscape
to make Acrobat Reader work over the weekend. And then it became a stable API.
And lots of regret and sadness probably followed, but as a result, things like
Flash were created, and the web became very interesting in some ways. A
wonderful story about Flash, I think.

16:02 DARIN: But anyways, supporting that stuff meant dealing with some gnarly
frozen APIs and figuring out how to stitch all that together, and the renderer
process of WebKit would talk to something that wasn't actually in its process
that was - over, again, another IPC channel - running in a whole other process.
We wanted plugins to still not run in our browser process, but to, instead, run
in their own process so that if they crashed, they wouldn't take down the whole
browser. And Flash and other plugins were notorious for crashing. So it was a
must that they run in their own process. But we figured they couldn't be
sandboxed as tightly as the renderer, as WebKit, because they already were
accessing the system in very deep ways.

16:55 SHARON: Cool, lots of -

16:55 DARIN: Lots more processes got added later, like the networking, the GPU
process, and NaCl. I can tell the story about those, too, if you're interested.

17:08 SHARON: Oh, sure. Yeah, let's hear it.

17:08 DARIN: OK, so 2009 era, I think, maybe 2010 - I don't know - somewhere
along the way, we started building Chrome for Android. And you might recall I
described how the renderer was really kind of a glorified image viewer, or the
browser was sort of an image viewer and the renderer's job was to
produce a bitmap. And then we send it over to the browser, the browser would
draw the bitmap. Mobile systems were not going to work very well if this is the
way the drawing was going to work. If you think about how scrolling works or
worked back then, scrolling a web page back then meant telling the computer to
please memmove all the pixels, and then to draw another bitmap where pixels are
not existing yet and need to be drawn. So you do a memmove followed by a
memcpy. And so this is how original Chrome was built. If you were scrolling, it
would be, oh, we need to shift pixels, and here's the bitmap. We need to stick
in the part that's exposed. Do that all quickly, and do it over and over again.
And that kind of operation is just not good if your goal is like nice
responsive scrolling on a touch screen. Instead, the way mobile systems were
built is using GPU rendering and compositing engines powered by GPUs, so that,
instead, you are offloading a lot of that work to the GPU. So it was necessary
to restructure Chrome's rendering pipeline for mobile, at least. But because we
were doing that, we can also take advantage of it on desktop. Meanwhile, we
were also on desktop starting to invent things like WebGL. Initially, the
precursor to WebGL was this plugin called O3D, which is a 3D graphics plugin
using the wonderful plugin APIs that I talked about before. But it provided
this way to have 3D graphics scenes and build immersive kind of 3D content.
That team, at some point, switched their sights on how to make that a standard
through WebGL. Wonderful stories around that. But it also entailed figuring out
how to do OpenGL, essentially, because WebGL was just OpenGL ES, and how to do
that from a renderer, from that Blink child process, how to do it there. And
really, that meant that, OK, this process is going to be - these sandboxed
renderers are going to be generating a stream of GL commands. Where do they go?
What do we do with that? And also, we know that it's possible to write shaders
and possible to write GPU commands that can really wreck - can cause havoc, can
be problematic, can cause the system to crash your process. So we don't want
that happening in the browser process because we want the browser process to
stay up so it can [INAUDIBLE] the manager.
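
The "memmove followed by a memcpy" scroll described earlier in this answer can be sketched as follows. It is an illustration under stated assumptions, not the original Chrome code: the bitmap is modeled as a flat `bytearray` of rows, `stride` is the number of bytes per row, and the function name `scroll_up` is made up.

```python
def scroll_up(bitmap: bytearray, stride: int, dy: int, exposed: bytes) -> None:
    """Scroll the content up by `dy` rows and fill the exposed bottom strip.

    `exposed` must hold exactly `dy` rows of freshly rendered pixels.
    """
    shift = dy * stride
    if dy <= 0 or shift > len(bitmap):
        raise ValueError("dy must be between 1 and the number of rows")
    if len(exposed) != shift:
        raise ValueError("exposed strip must be exactly dy rows")
    keep = len(bitmap) - shift
    # The "memmove": slide the rows that stay visible toward the top.
    bitmap[:keep] = bitmap[shift:]
    # The "memcpy": paint the newly exposed rows at the bottom.
    bitmap[keep:] = exposed


# Three rows of three bytes each; scroll by one row.
bitmap = bytearray(b"aaabbbccc")
scroll_up(bitmap, stride=3, dy=1, exposed=b"ddd")
```

Both steps run on the CPU and touch every visible pixel on every scroll step, which is the cost that GPU compositing avoids.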

20:21 DARIN: So the GPU process was born. This will be the process that
actually talks to the OpenGL driver or DirectX under the hood via ANGLE on
Windows. And so now, we set up another pipe from the renderer over to the GPU
process, and the stream of GL commands is being sent over there. And over
there, it's talking to the driver. And if you sent something bad, the driver is
going to say no bueno and crash your process. And we would find that the
browser would see the GPU process died, and it would maybe give you a warning
or let you reload the page, and it will try again. As that's done, that's how
we therefore were able to leverage processes to give us that isolation, but
also give us that robustness, give us that capability. And that led to a lot of
complexity, but also a lot of really amazing sophistication around the
compositing engine. Chrome's CC library was born subsequently, and all these
things that have led to the modern way that we render the web on Chrome now.
Skia learned how to render to OpenGL, et cetera, and the GPU process.

21:35 DARIN: The next one that came along was the network process, which was
really born out of the idea of, gee, wouldn't it be nice to isolate the
networking code into its own process that could be more tightly sandboxed?
Because the networking stack tends to be a surface area that's accessible by
attackers. Just like the V8 JavaScript engine is parsing lots of stuff and very
exposed to attack surface from would-be attackers, the network stack, same
thing. You've got HTTP parsing and various other kinds of processing happening
very close to content that attackers can control. And so this project, a
rather elaborate project to move the networking stack out of the browser
process, out of that broker process, and, instead, into its own process and have
all the pipes, the various IPC channels, connect to there, instead, was born.
And I think this was more born in the era of Mojo IPC, where we had a more
flexible IPC system that could help support that kind of transition, but still
tons of work and quite a radical change to the flow of data and the way the
system works. Previously, just to give a little aside, when a renderer is
making a network request, the browser process acting as a broker needs to
audit, is it OK for that guy to be requesting this thing? Think about all the
kinds of rules that might be there, CSP, other kinds of things, and the
security origin privileges associated with it and what we want to allow a
renderer to actually access. Simple stuff like we support WebUI, like chrome:
pages, in the context of, they load in a renderer process, that renderer
process should be allowed to access other things from chrome:, right? But
a web page shouldn't be able to. We don't want the arbitrary web pages to be
poking around and seeing what's available in the chrome: URLs. So that's
like a simple example of where we honor that isolation. And so the browser
process having the network stack in the original incarnation of Chrome makes
sense. It can apply these rules right there. Safe browsing was integrated
there. Lots of different kinds of network filtering could be done there. Moving
that to another process was a big change because now the browser is the one that
has the smarts to do auditing, but the data and all the requests are going to
this other process. So making that work meant a lot more plumbing. And I think
complexities ensued. But it's awesome to see it happen.
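
The auditing role of the broker described above can be illustrated with a deliberately tiny policy check. Everything here is hypothetical: the `Renderer` type, the `hosts_webui` flag, and `may_request` are invented for the example, and Chrome's real checks (CSP, origin privileges, Safe Browsing, and more) are far more involved. The sketch covers only the one rule from the discussion: a renderer hosting WebUI may load chrome: resources, and an ordinary web page may not.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Renderer:
    """Hypothetical record of what the browser knows about a renderer."""
    renderer_id: int
    hosts_webui: bool  # True if this process was created for chrome: pages


def may_request(renderer: Renderer, url: str) -> bool:
    """Decide, on the trusted side, whether a renderer's request is allowed."""
    scheme = urlsplit(url).scheme.lower()
    if scheme == "chrome":
        # Only renderers the browser itself dedicated to WebUI get these.
        return renderer.hosts_webui
    # Assumed simplification: ordinary pages may fetch plain web URLs.
    return scheme in {"http", "https"}


webui = Renderer(renderer_id=1, hosts_webui=True)
web_page = Renderer(renderer_id=2, hosts_webui=False)
print(may_request(webui, "chrome://settings/"))     # allowed
print(may_request(web_page, "chrome://settings/"))  # refused
```

The key design point is that the decision uses state held by the broker (`hosts_webui`), not anything the renderer claims about itself, since a compromised renderer could lie.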

24:20 DARIN: Anyways, I mentioned Native Client. So that was a precursor to
Wasm that was a big investment by the Chrome team to find a way to bring native
code to the web in a safe, secure manner. The initial take on it was, if you're
running native code that came from the web on a system, that's scary. It could
do like anything, right? Well, no, let's restrict the process capabilities, but
even with a restricted set of capabilities, you can't necessarily restrict
everything on Windows or Mac or Linux. There's always some limitation to the
sandbox capabilities. And in many ways, the sandboxes that we implemented are
kind of just an extra level of defense. If you think about it, the JavaScript
Engine is already a sandbox, right? It already limits the capabilities. The web
rendering engine, all the different kinds of security checks throughout the
code are various forms of sandboxing. And then finally, the process in the way
we restrict its capabilities is that next, last defense. Well, running native
code with only that last defense in place is not enough. So Native Client was
designed to be native code that could be highly auditable, so
that you could make sure that it's not allowed to jump to an address that it
doesn't have code for, that it's not allowed to do things outside the set of
things that it's allowed to do. So it had a lot of complexity as well in terms
of how the process has to be set up in terms of the memory layout and various
other details, which maybe I'm happy to not remember. And - but it meant it
needed its own process type. Even though it integrated kind of like a plugin,
it couldn't just be a plugin. It needed its own process type. And there had to
be 64-bit variants and 32-bit variants, depending on the actual OS, actual
underlying hardware that you were running on, Arm versus Intel, all these
differences. So yeah, we ended up with leveraging this process model
extensively to enable these kinds of things.

26:32 DARIN: I think I mentioned the utility process. In Chrome, the utility
process is this thing you reach for when you want to do something that's
potentially - like maybe you're dealing with some untrusted input, like you
want to decode an image, or you want to run something in a process, and you
just want to make sure that if it's going to do anything, it just dies over
there and doesn't take down the whole browser process. I think some extension
install manifest parsing, maybe various other kinds of things like that, would
happen in a utility process as like a safety measure. Generally speaking,
parsing input from the web or even the Web Store or things like that, doing
that parsing in the browser process is a scary thing because you're taking
input from a third party. And if you're parsing it there, you might have a bug
in your parser, and that could lead to the most trusted process having been
compromised.
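
The "it just dies over there" idea can be sketched by parsing untrusted input in a throwaway child process. This is an analogy rather than Chrome's utility process: `PARSER_CODE` and `parse_in_child` are hypothetical names, and a plain child process gives crash isolation only, with none of the capability restrictions a real sandbox adds.

```python
import json
import subprocess
import sys

# Program text for the short-lived parser process (hypothetical example).
# It reads untrusted text on stdin and prints a small, validated result.
PARSER_CODE = (
    "import json, sys\n"
    "manifest = json.loads(sys.stdin.read())\n"
    "print(json.dumps({'name': str(manifest['name'])}))\n"
)


def parse_in_child(untrusted: str, timeout_s: float = 5.0):
    """Return the parsed result, or None if the child failed or hung."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", PARSER_CODE],
            input=untrusted,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        # The parser died; the parent just observes the failure.
        return None
    return json.loads(result.stdout)


good = parse_in_child('{"name": "demo extension", "version": "1.0"}')
bad = parse_in_child("{this is not valid input")
print(good, bad)
```

The parent still parses the child's reply, but that reply has a small fixed shape produced by its own code, which is a much narrower surface than the original third-party input.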

27:29 SHARON: Yeah, that falls into the whole Rule of Two thing, right, of
untrusted data. We have a [INAUDIBLE] process. It's in C++. The thing that we
decided to change is where it gets parsed, so.

27:44 DARIN: That's right.

27:44 SHARON: That makes sense.

27:44 DARIN: Yeah, so the sandboxed processes get used as this primitive to give
us that extra safety measure.

27:57 SHARON: So the other process type I can think of that wasn't just covered
there was extensions. Is there anything to say there?

28:02 DARIN: Sure, of course.

28:02 SHARON: Of course.

28:02 DARIN: In some ways, an extension process will show up that way in
Chrome's Task Manager, but I believe it's usually just powered by a renderer,
an ordinary renderer, because extensions have background pages or background
events. In, I guess, Manifest V2, it was background pages. In Manifest V3, it's
now just event pages or a service worker type construct. And those need a process
to run in. So the extensions get to inject some code that runs in the renderer
of the web page, usually in an isolated world, so it can see the same DOM. If
you've given the permission for the extension to read website data or to
manipulate website data, it can do that by injecting a content script that will
run in the same process as the web page that it's reading or modifying. But it
will run in an isolated JavaScript context so that it's not seeing the same
JavaScript variables and such. But it can still see the DOM. And that's meant
to give a lot of capability, but also have a little bit of protection because
it's so easy to accidentally interfere with the same JavaScript variables and
things like this. OK, so extensions have that piece that injects a content
script, but they also have a - usually, they can have this event service worker
or background page that is their central place, process place for code to run.
And so we do run that in a renderer process. And so for example, if the
extension that's injected into a page wants to get some capabilities, it would
talk to its service worker, who would then have the capability to ask for
certain extension APIs to maybe understand all the tabs that are in your
system, depending on what permissions it was granted. And then finally, with
extensions, you also have the extension button and a dropdown that can occur
there, which a web page can be drawn there by the extension. And that's going
to be hosted in a renderer process, too. But that would be a web page that
lives at a chrome-extension: URL. And so you have these different pieces
of the extension model where code from the extension can be running, and it,
via some messaging channel, can talk to the other parts of itself that run in
potentially, likely, different processes.

30:37 SHARON: You mentioned service workers there, and those are kind of
related to all this, too. So can you tell us a bit more about those?

30:43 DARIN: Yes, so - well, OK, so backing up, in the context of extensions, if
we talk about background page first, the original idea with extensions was, OK,
I'm injecting stuff into pages so I can modify things, but I also need like my
home base. I need my context where - I need a place where my persistent script
is running or where I can manage my databases, and I have just one place for
that. And it's also a place where I can get elevated permissions to access
other Chrome extension APIs. So that idea of a background page that the
extension can create that's ever present, so it's like a web page, but it's
hidden, it's in the background, and content scripts that are injected into web
pages can talk to it. So they can say, oh, I'm on this page. Give me some rules
that I should apply to it or something, depending on the nature of that
extension. OK, so but background pages are, unfortunately, persistent. And they
live for the whole life of the browser. And they use up memory. They use up
resources, even if nothing else about the extension needs doing. Even if the
extension is not loaded into any web pages, that background page is sitting
there. And so this was [INAUDIBLE] quickly realized, this is not great. This is
a waste of resources for the system. We should have some policy for how we
should close that background page down and only need to create it when
necessary. In the context of, I think, Chrome apps, which is a thing that's no
longer a thing, we created this concept called event pages, which allowed for
these background pages to be a little more transient, that come into being only
as needed, which is a much more efficient approach.

32:28 DARIN: However, when it came time to bring that to extensions, at the
same time, Service Worker had been created, which was a tool for web pages to
be able to do background event processing. So the decision was to adopt that
standards-based approach to how to do background processing. And so Service
Worker is the construct that Manifest V3 allows extensions to use for that sort
of background processing. The big difference with service workers is that they
are not web pages. They're just JavaScript. But they can listen to different
kinds of events. So just like a web worker, shared worker, service worker, they
are without UI. They are without any HTML. They just have the ability to - but
they have some functions that are given to them on the global scope that let
them talk to the outside world, to talk to the web page that created them, or
in the case of Service Worker, they actually have events they can receive to
handle network requests on behalf of the page. That's one of the main uses for
them in the context of the web. A web page would have a Service Worker, register
it with the browser to say, hey, please contact my service worker if you are
making a request for my origin. And that gives the Service Worker the
opportunity to specify what content should be used to satisfy a URL. It could
load that content out of a cache, and the Service Worker API includes APIs for
managing caches and things like this. So all of that system that was built to
kind of enable web pages to operate more robustly in the context of poor
network connectivity or to get performance improvements for applications that
are more single page applications that have a basic fixed shell that should
load out of cache and then they make network requests to the server to get the
data that populates some application UI, that model Service Worker was really
designed for. But it seemed a very good fit for extensions. And it gets us out
of the world of having these persistent extension background pages. So Manifest
V3 says, if you want your content script to have access to privileged things,
you go through a system, a Service Worker. And the Service Worker will get
spawned in a renderer process. What renderer process? You don't know. It's up
to the system. Chrome will make a decision there based on all of its usual
rules around what other origins are in that process, thinking from a security
isolation perspective, and so on, and so forth.
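
The cache-first flow described above (fixed shell from cache, data from the
network) can be sketched as a small Python model. This is a toy illustration
under stated assumptions, not the real Service Worker API: `CacheFirstWorker`,
`precache`, `on_fetch`, and `fake_network` are hypothetical names invented for
this example.

```python
from typing import Callable, Dict


class CacheFirstWorker:
    """Toy stand-in for a service worker's fetch handling (not the real API)."""

    def __init__(self, network_fetch: Callable[[str], str]) -> None:
        self._network_fetch = network_fetch  # hypothetical network call
        self._cache: Dict[str, str] = {}     # URL -> cached response body
        self.network_requests = 0

    def precache(self, url: str, body: str) -> None:
        """Store part of the fixed application shell ahead of time."""
        self._cache[url] = body

    def on_fetch(self, url: str) -> str:
        """Satisfy a request from the cache if possible, else from the network."""
        if url in self._cache:
            return self._cache[url]
        self.network_requests += 1
        return self._network_fetch(url)


def fake_network(url: str) -> str:
    """Pretend server; a real one could be slow or unreachable."""
    return f"network response for {url}"


worker = CacheFirstWorker(fake_network)
worker.precache("/shell.html", "<html>app shell</html>")

shell = worker.on_fetch("/shell.html")  # served from cache, no network needed
data = worker.on_fetch("/api/items")    # dynamic data still goes to the server
print(shell)
print(data)
```

In this model only the data request touches the network, which is why the
shell can still load under poor connectivity.
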

35:22 SHARON: Cool. A lot of these process types have been added over time as
the need for them arises. Like, oh, we want to put network stuff in a separate
process. So apart from adding more process types, what have been other big
changes to the multi-process architecture and processes in Chrome in the many
years since launch?

35:44 DARIN: The biggest one by far is the per site isolation, the site
isolation work that was done.

35:51 SHARON: We'll talk about that more next.

35:56 DARIN: Yeah, so, I mean, well, I'll just say, so Charlie Reis was an
intern on Chrome team back in the day during the pre-release period of Chrome.
And I remember the conversations where we were like, gee, wouldn't it be nice
if instead of isolating per tab, it was isolating per origin? And I
think he was doing research on that topic, too. And he had all these ideas for
this kind of a thing. And so it was really kind of very early on that we were
having these conversations. But even very early on, it was like, this is going
to be a big change, you know? No longer is it the idea that it's a big change
to the rendering engine itself, like how frames could be served by different
processes. So in order to isolate based on origin, you have to say a frame
where an ad might live would actually have to be served by the process for that
origin. And so now no longer is the whole frame tree just in one process.
That's a big change. But built on top of the infrastructure we had, it was
possible to imagine it, and it was quite a journey to get there. So that was
probably the biggest change to the architecture. But like I mentioned before,
actually, other big changes were definitely the introduction of the GPU
process, definitely the introduction of Mojo IPC. Before Mojo IPC, the way
things worked was, basically, messaging was much simpler, in some ways, easier
to understand, but also much more the case that there were these files that
really needed to know about everything in their world, like the render process
host and the render process, the render view host and the render view, the
render frame. The render frame host didn't exist then, but they came about
because of site isolation, really. But the render view, render view host became
this thing that represented the web page, and render view host in the browser,
render view in the renderer. And for any feature that required brokering out to
the browser to get access to something, essentially, the render view, the
render view host had to be participants in that because they had to be kind of
routers for that traffic. That's not very scalable. You start adding lots of
engineers, building lots of different features that need lots of different
capabilities. And these files start growing hairs and knowing about too many
things. And it becomes really hard to manage.

38:38 DARIN: On top of that, you start to have things where you say, gee, I
really wish this system could live in a different process. I mentioned the
networking process. All these events were coming through these different kinds
of crossroads of hell files. That was what I liked to call them. And in order to
take a subset of that and move it to a different process, now you have to redo
all that plumbing. And so the amount of layers of repeating yourself for
plumbing IPCs felt very out of control for - maybe how much work you had to do
to unlock a certain feature just seemed out of control. And so Mojo really was
inspired by how to eliminate a lot of that, to have a system that's more
endpoint-to-endpoint based and all the flow of data would no longer be
dependent on all of these kinds of routing classes that handled all this
routing. And instead, you could just say, I have an endpoint. I have an
endpoint over here. This one's privileged. This one's not. And if I want this
one to live over here, I can do that. I can just move it around freely. And all
the routing is taken care of for me. And so that was a big change. And there's
many artifacts in the code base that sort of reveal the old system, right?
Many ways in which the product is built still resemble that old system. The
idea that if you look at a render view, render view host, there's an ID, a
routing ID associated with that. The concept of routing IDs is not needed in
Mojo anymore because the pipe itself, the Mojo pipe is like an identifier, in
some sense. Of course, so much of our system is built up around the idea that
tabs have these render view IDs, and frames have render frame IDs, and
processes have process IDs. And so many systems deal with those integers that
it's been unthinkable to not have those anymore. But in some sense, they aren't
really needed. If we were to build things from scratch, anew, with the Mojo
system, you wouldn't need it.
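
The endpoint-to-endpoint idea can be sketched in a few lines of Python. This is
a toy model under stated assumptions, not Mojo's actual API: `Endpoint`,
`create_pipe`, and `Host` are hypothetical names, and in-memory queues stand in
for real cross-process transport. The point it illustrates is that a holder of
one endpoint keeps talking to its peer even after the peer is handed to a
different owner, with no routing ID or router class in between.

```python
import queue
from typing import Dict, Tuple


class Endpoint:
    """One end of a toy message pipe. It holds queues, not a routing ID."""

    def __init__(self, inbox: queue.Queue, outbox: queue.Queue) -> None:
        self._inbox = inbox
        self._outbox = outbox

    def send(self, message: str) -> None:
        self._outbox.put(message)

    def receive(self) -> str:
        return self._inbox.get(timeout=1)


def create_pipe() -> Tuple[Endpoint, Endpoint]:
    """Return two connected endpoints; what one sends, the other receives."""
    a_to_b: queue.Queue = queue.Queue()
    b_to_a: queue.Queue = queue.Queue()
    return Endpoint(b_to_a, a_to_b), Endpoint(a_to_b, b_to_a)


class Host:
    """Hypothetical stand-in for a process that owns some endpoints."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.endpoints: Dict[str, Endpoint] = {}

    def adopt(self, label: str, endpoint: Endpoint) -> None:
        self.endpoints[label] = endpoint

    def release(self, label: str) -> Endpoint:
        return self.endpoints.pop(label)


client_end, service_end = create_pipe()
browser, network = Host("browser"), Host("network")

browser.adopt("cookies", service_end)
client_end.send("get cookie A")

# Move the service end to another host. The client end is untouched, and
# no intermediate router class has to learn about the change.
network.adopt("cookies", browser.release("cookies"))
client_end.send("get cookie B")

received = [network.endpoints["cookies"].receive() for _ in range(2)]
print(received)
```
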

40:50 SHARON: Do you think if you were to start redesigning the whole
multi-process thing now, given how not just the internet is used, but also the
devices that are out there, I think you would probably want to have multiple
processes for things. But do you think there would be significant changes to
how the system overall is designed or put together if one were to start now?

41:16 DARIN: Well, yeah, I mean, it's always a question of where you're
starting from and what the constraints are that you're dealing with. We were
dealing with taking WebKit, which we didn't really have a lot of ownership of.
And it was open source, but we also had limited bandwidth to go and fork it and
manage that fork. And so to kind of try to create multi-process in the context
of this big significant piece that we really can't change or do much about
definitely limited us. So we had early ambitions and ideas. Like I said with
Charlie about site isolation, it wasn't going to be then that we could realize
it. It needed to be in a place where we had ownership of Blink. And not just
ownership, I mean capability to go and change it and to own the consequences of
changing it, to be able to manage that. We needed that, and we needed a lot of
other pieces. So if I'm starting over, I also have to - it's sort of like,
well, what am I starting from, right? But certainly, I feel like a lot of
lessons along the way inspired Mojo and the design there. And I feel like
that sort of system would allow for an architecture that I
think would be better in many ways. And I'm very biased because that's
something I've worked on, and it was inspired by things I saw that weren't
great about the way that we built Chrome originally, although, in many ways,
the original setup with Chrome was born of pragmatism and minimalist in many
ways, trying to achieve - Chrome was very focused on being a product first, not
a browser construction kit. And so the idea that it needed to morph into a lot
of different things wasn't there in the beginning. In the beginning, it was,
you're just building a browser for Windows XP Service Pack 2. That's it,
nothing else. OK, now Vista. You got to worry about Vista, too, sorry. But just
that's it. And then later on, you add Mac. You add Android. You add Chrome OS,
iOS, Chromecast, et cetera, et cetera. And suddenly your world is very
complicated, and the needs of this system are way more. And the value of
malleability becomes higher. Look at the investment in views, et cetera, to
allow cross-platform UI, and then Mojo to allow a much more flexible system
under the hood. So it depends on your constraints in a lot of ways.

43:43 SHARON: Yeah, that makes sense. Something you said about even now in the
code base, you can see remnants or suggestions, maybe, of how
things used to be. So one of the things that makes me think of is about the IO
and UI threads because I feel like people used to talk about those more. And
now that's maybe changing a bit. So how come these are the only times we hear
the term "thread," really, in all of this? And what are the IO and UI threads?
Can you just tell us a bit about them?

44:20 DARIN: Oh, yeah, threading is a super fun topic. Now we have all these
task runner concepts and systems for giving you a task runner that's on an
isolated thread or whatever. And systems like Mojo allow you to not really have
to do a lot of plumbing to compensate for your choice of thread where you want
something to run. You can just indicate where it should go, and that happens.
But OK, originally, the design of the system was there was a UI thread, and
that's where all the UI lives. So the HWNDs, the window handles and all the
Win32 stuff would go there. Input, painting come in there. Then there was - so
early on, I like to tell this story because one of the very first versions of
Chrome, we had just that UI thread sending data to a renderer process. And
the renderers would have their main thread where they ran JavaScript and
everything. So there was just these two threads in two different processes.
That was kind of it. In the browser process, there might have been the system
was probably doing a lot of other stuff with its networking stack and DNS
threads and such. But we weren't doing any. That wasn't us. That was probably
libraries we were using. So we had these two threads in two different processes
and an IPC channel. And so you send the input down to the renderer. The renderer
sends you a bitmap. OK, Google Maps. Imagine Google Maps. And imagine you're on
a single core, non hyper-threaded laptop. And you take your mouse, and you
click on that map, and you start dragging it around. And you expect to see the
image tiles moving around, right? But for some reason, in Chrome, on that
device, [SNAP] nothing happens. You just move your mouse around, and the image
is stuck there. You're like, what's going on? It works fine on this other
laptop. Why not on this laptop? Turns out that on that device, in that setup,
the input stream was coming in. And basically, we were sending all this input,
and the input events were taking priority in the Windows event pump over any
painting and/or reading from our IPC channels. And so, as a result, we were
just sending input events to the renderer. It was doing work, generating new
images. Those images were coming to the browser and backed up in some pipe and
not really being serviced, not really making their way. And so we kind of came
to the realization of several things. One is, we need to throttle that input
going to the renderer, but we also probably need to have some highly responsive
IO threads that could be dedicated to servicing the pipes, the channels, the
IPC channels, both in the browser and the renderer, actually. And so what was
born from that was the IO thread. And the IO thread was meant to be a highly
responsive thread for processing asynchronous IO. That's really what its name
should be - highly responsive, non-blocking IO thread - because the name IO
thread subsequently confused lots of people who wanted to do blocking IO on
that thread, like read a file or something. And we had to put in some
restrictions in the code to always let you know not to - that this function is
going to - there's certain runtime assertions if you try to use certain
blocking IO functions in base on the wrong threads. And alongside that, we
invented something called the file thread. Said, this is the thread where you
read files. This is the thread where you write files because we don't want you
doing that on the UI thread because the UI thread needs to be responsive to
user input. So don't do blocking file IO on the UI thread. Don't do it on the
IO thread either. Do it on the file thread. So -
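
The "runtime assertions on the wrong threads" idea can be sketched as follows.
This is a minimal Python model, assuming a per-thread flag checked before any
blocking call; it is not Chromium's implementation, and `disallow_blocking`,
`assert_blocking_allowed`, `read_file_blocking`, and `run_thread` are
hypothetical names made up for this example.

```python
import os
import tempfile
import threading

_state = threading.local()  # each thread gets its own copy of the flag


def disallow_blocking() -> None:
    """Mark the current thread as one that must stay responsive."""
    _state.blocking_allowed = False


def assert_blocking_allowed() -> None:
    """Fail loudly if a blocking call is made on a restricted thread."""
    if not getattr(_state, "blocking_allowed", True):
        name = threading.current_thread().name
        raise AssertionError(f"blocking IO is not allowed on the {name} thread")


def read_file_blocking(path: str) -> str:
    assert_blocking_allowed()  # runtime check before doing slow work
    with open(path, encoding="utf-8") as f:
        return f.read()


def run_thread(name: str, restricted: bool, path: str, results: dict) -> None:
    """Run one read attempt on a freshly named thread and record the outcome."""

    def body() -> None:
        if restricted:
            disallow_blocking()
        try:
            results[name] = read_file_blocking(path)
        except AssertionError as error:
            results[name] = f"rejected: {error}"

    thread = threading.Thread(target=body, name=name)
    thread.start()
    thread.join()


with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as f:
    f.write("hello")
    temp_path = f.name

results: dict = {}
run_thread("IO", restricted=True, path=temp_path, results=results)
run_thread("File", restricted=False, path=temp_path, results=results)
os.unlink(temp_path)
print(results)
```
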

48:14 SHARON: That means they're all running in the browser process.

48:20 DARIN: In the browser process. The renderer got its own IO thread, too.
So the renderer would have its main WebKit thread and its IO thread. So it was
sort of a symmetric system. You had IPC channel, which was wrapped with a class
called `ipc_channel_proxy`. These things still exist in the code base. And
ChannelProxy was a way to use an IPC channel from a different thread. But the
IPC channel would be bound to the IO thread. All of those things I just
mentioned still exist, and Mojo was built on top of those channels. But the IPC
channel provides that underlying pipe. So it's kind of IPC channel is
one-to-one with an OS pipe. Mojo has this concept of pipes which are more like
virtual pipes, and they're multiplexed over an OS pipe.
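
Multiplexing several virtual pipes over one underlying pipe can be sketched
like this. It is a toy Python model, assuming each message is framed with the
ID of the virtual pipe it belongs to; Mojo's real wire format is different, and
`OsPipe`, `Sender`, and `Receiver` are hypothetical names for illustration.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque, List, Tuple

Frame = Tuple[int, str]  # (virtual pipe id, payload)


class OsPipe:
    """Toy stand-in for the one OS-level pipe between two processes."""

    def __init__(self) -> None:
        self._frames: Deque[Frame] = deque()

    def write(self, frame: Frame) -> None:
        self._frames.append(frame)

    def drain(self) -> List[Frame]:
        frames = list(self._frames)
        self._frames.clear()
        return frames


class Sender:
    """Tags each message with its virtual pipe before writing it."""

    def __init__(self, os_pipe: OsPipe) -> None:
        self._os_pipe = os_pipe

    def send(self, pipe_id: int, payload: str) -> None:
        self._os_pipe.write((pipe_id, payload))


class Receiver:
    """Reads frames off the shared pipe and sorts them per virtual pipe."""

    def __init__(self, os_pipe: OsPipe) -> None:
        self._os_pipe = os_pipe
        self.inboxes: DefaultDict[int, List[str]] = defaultdict(list)

    def pump(self) -> None:
        for pipe_id, payload in self._os_pipe.drain():
            self.inboxes[pipe_id].append(payload)


shared = OsPipe()
sender, receiver = Sender(shared), Receiver(shared)

sender.send(1, "cookie request")
sender.send(2, "paint request")
sender.send(1, "second cookie request")
receiver.pump()
print(dict(receiver.inboxes))
```

Each virtual pipe sees only its own messages, in order, even though everything
travelled over the same shared pipe.
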

49:08 SHARON: OK. Yeah, because I think, yeah, now you hear non-blocking IO,
but I feel like maybe it's just what part of the code base you work in. But
running things, making sure things run on the right thread seems to be less of
a problem than it used to be.

49:27 DARIN: Yes. I think there's a lot of reasons for that, a lot of maturity
in the system. But also, I think some of the primitives are set up nicely so
that you can more easily have things running. In some ways, we used to have
this concept of, yeah, we very much had this. Still, in some ways, still have
this, but the idea that there is a UI thread, that there's an IO thread, and
that there is a file thread, and you pick which thread you're going to use.
Now, there's a whole pool of blocking IO threads. And you don't specifically
say, I want the file thread. You say, I have blocking IO I want to do, or give
me a - I want to put it on a thread pool. The IO thread used to be like where -
it may be still the case that some systems would just live there only because
maybe for latency reasons - like, cookies is a good example. We knew that we
wanted to be able to respond quickly to the renderer if it was querying a
cookie database. So we want to be able to service that directly on the IO
thread. And so there'd be a collection of these things that were maybe somewhat
sensitive, and we wanted to have them live and be on the IO thread. And so
that idea of some things live on the IO thread was born. But I think those
things are few. And you really have to highly justify why you should be on that
thread. And so most things don't need to be. Just be on the UI thread. It's OK.
Or structure your work so that the part that is expensive and blocking goes to
a blocking queue.
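
The "say you have blocking work, don't pick a thread" pattern can be sketched
with Python's standard thread pool. This is only an analogy, not Chromium's
task scheduler: `blocking_pool` and `load_settings_blocking` are hypothetical
names, and the sleep stands in for slow disk IO.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple

blocking_pool = ThreadPoolExecutor(
    max_workers=4, thread_name_prefix="BlockingPool"
)


def load_settings_blocking(path: str) -> Tuple[str, str]:
    """Pretend to read a file slowly; report which thread did the work."""
    time.sleep(0.05)  # stands in for slow disk IO
    return f"contents of {path}", threading.current_thread().name


# Hand the blocking part to the pool instead of naming a specific thread.
future = blocking_pool.submit(load_settings_blocking, "settings.json")

# The calling thread stays free for responsive work in the meantime.
handled_events = [f"handled {event}" for event in ("click", "keypress")]

contents, worker_name = future.result(timeout=5)
blocking_pool.shutdown()
print(handled_events, contents, worker_name)
```
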

51:00 SHARON: So partly for these threads, sometimes you see checks. Like,
check that this is running on a certain thread. But in general, is there a good
way to find out what process a certain block of code runs on? Because some
things we know - if you go to `third_party/blink`, whatever, you kind of know
that that's going to run in a renderer process, but just looking at the code,
like looking in code search, can you know where something is going to -

51:25 DARIN: [INAUDIBLE] very early on to try to deal with this. So like if you
go to the content directory, it's a good one to look at. You'll see a browser
directory, subdirectory, a renderer subdirectory, and a common directory. And
there's some other ones that have these familiar names. We use that structure
all throughout the code base for different components. So if you go to
components, components/foo, you'd see browser, renderer, common, maybe a subset
of those, depending on. And so the idea is, if it's code that should only run in
the renderer, it lives in the renderer directory. If it's code that should only
run in the browser, it lives in the browser directory. If it's code that could
run in either, it lives in the common directory. So you'll see mojom definitions
in common directories because mojom is where you define the Mojo interface
that's going to be used in both processes.
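
The directory convention can be turned into a small lookup, sketched here in
Python. It is a simplified model covering only the three directory names
mentioned above; `processes_for` is a hypothetical helper and the example paths
are made up for illustration.

```python
from pathlib import PurePosixPath
from typing import Set

# Directory name -> processes that code under it is meant to run in.
_PROCESSES_BY_DIRECTORY = {
    "browser": {"browser"},
    "renderer": {"renderer"},
    "common": {"browser", "renderer"},
}


def processes_for(path: str) -> Set[str]:
    """Return the processes a file may run in, per the naming convention.

    Checks the innermost matching directory first. An empty set means the
    convention says nothing about this path.
    """
    directories = PurePosixPath(path).parts[:-1]  # drop the file name
    for directory in reversed(directories):
        if directory in _PROCESSES_BY_DIRECTORY:
            return set(_PROCESSES_BY_DIRECTORY[directory])
    return set()


for example in (
    "content/browser/example_host.cc",
    "components/foo/renderer/example_agent.cc",
    "components/foo/common/example.mojom",
    "net/example/example.cc",
):
    print(example, sorted(processes_for(example)))
```
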

52:12 SHARON: Oh.

52:12 DARIN: Yeah, this code separation was also kind of born out
of this idea at one point in time that we might generate a totally different
binary for browser and renderer. And we used to have browsR. I'm calling it
that way because it didn't have an E at the end, so browsR with a capital R, and
then rendR or something like this. And these were the two processes, the two
executables. And they could just compile whatever code they needed for their
purpose. Like WebKit would be in the renderer, and browser would have not
WebKit. It would have other things. And so these separate directories also
helped because it was like, that's the code that's going to go into that
process literally. And fast forward when Sandbox came along, the team was like,
nope, it's got to be the same executable for both browser and renderer and
should probably be called chrome.exe instead. And then that idea kind of that
they were separate executables and separate code kind of went away. And
instead, all the code for Chrome went into just this big DLL on Windows. And
the amount of shared code between the EXE and the DLL is very small, maybe a
little bit from base and such. But yeah, this idea of tagging the directory
structure in such a way that makes it obvious of like what process this code
belongs in, I think it was a big help, and it was a good choice. And it gives
people a little clarity of where they are and what they can use.

53:49 SHARON: What about for non-browser, non-renderer processes? What about
GPU, network? How do you know that this is running on the network process versus
this is how this part of this section of the code is interacting maybe with the
network process?

54:05 DARIN: Sometimes it can be a little bit of good luck. And sometimes it
might not be as obvious. I don't think this sort of - this structure that I
described was used for plugins, so there's a plugins directory, which may still
be around in some fashion or might be mostly gone. I don't know if when the
network process transition occurred, if this annotation was really maintained.
I actually don't think it was because I don't remember seeing network
directories. But I could be wrong. There might be some of them. I'm not as
familiar with the code for the networking process. But I think this convention
has helped us a lot and would be valuable to use in more places. For GPU,
there's a lot of symmetric code, probably code that runs in all processes, but
still this convention probably would make sense. But yeah, I think that for
some of those things, when you get like into the network world or you get into
the GPU world, you're also kind of in a more focused world, a smaller world.
And there's probably many other things you have to learn about that domain.

55:16 SHARON: Yeah, the GPU stuff seems very, very difficult. And I certainly
don't know how that works. OK, so -

55:23 DARIN: [INAUDIBLE] on there.

55:23 SHARON: Yes, so when it comes to process limits and performance and all
that kind of thing, so we have process limits, but you can go over them. And
can you tell us a bit about process limits, how they work, what happens when
you reach the limit?

55:39 DARIN: Hmm, yeah. So process limits, they exist to just have a reasonable
number of processes allocated for some definition of reasonable. At least early
on, that definition was based on how much RAM you had on your system. And as
computers got more and more RAM, that definition needed to be adjusted. We
assumed some overhead for individual processes. It's probably wise to put some
limits on how many we create. The allocation of those processes, it's best to -
kind of viewed as best to distribute the tabs across them as best as we can and
the origins across them now in the site isolation world to give more isolation
between different origins, to give more isolation between the different apps.
But at some level, you run out, and you need to now allocate across the ones
that are already in use. There's some hard rules around privileged content,
like `chrome://` URLs. They should not mix with ordinary web pages. But if
push comes to shove, we'll put a whole bunch of different origins' content
together into the same process, just ordinary web pages, not trusted content.
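
The allocation rules described above can be sketched as a small Python model.
It is a simplification under stated assumptions, not Chrome's actual policy:
the limit is soft, ordinary origins share the least-loaded ordinary process
once the limit is reached, and privileged content never shares with ordinary
pages. `Process` and `Allocator` are hypothetical names for this example.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Process:
    pid: int
    privileged: bool
    origins: Set[str] = field(default_factory=set)


class Allocator:
    """Toy process allocator with a soft limit (assumed policy, see above)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.processes: List[Process] = []

    def _spawn(self, origin: str, privileged: bool) -> Process:
        process = Process(len(self.processes) + 1, privileged, {origin})
        self.processes.append(process)
        return process

    def assign(self, origin: str, privileged: bool = False) -> Process:
        # An origin that already has a process keeps using it.
        for process in self.processes:
            if origin in process.origins:
                return process
        # Under the limit, every new origin gets its own process.
        if len(self.processes) < self.limit:
            return self._spawn(origin, privileged)
        # At the limit, share only with processes of the same privilege level.
        candidates = [p for p in self.processes if p.privileged == privileged]
        if not candidates:
            # Hard rule: go over the limit rather than mix privilege levels.
            return self._spawn(origin, privileged)
        target = min(candidates, key=lambda p: len(p.origins))
        target.origins.add(origin)
        return target


allocator = Allocator(limit=2)
a = allocator.assign("https://a.example")
b = allocator.assign("https://b.example")
c = allocator.assign("https://c.example")  # limit reached: shares a process
settings = allocator.assign("chrome://settings", privileged=True)
for process in allocator.processes:
    print(process.pid, process.privileged, sorted(process.origins))
```
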

56:52 SHARON: What happens if you just open a ton of tabs with a whole bunch of
different pages open, and you're basically stress testing what Chrome can do?
What happens in that case?

57:08 DARIN: It creates a lot of processes. It uses a lot of system resources.
It uses a lot of RAM. I think that this has been, I'd say, a battle for Chrome
across a lot of its lifetime and more recently, is how to manage these extreme
cases. And increasingly, these extreme cases are not actually odd or unusual.
People do a lot of browsing. People click on a lot of links. People create a
lot of tabs. People don't really close their browsers. They just leave it
running. And they come back the next day, and they continue where they left
off. And they open more tabs, and they do more surfing. And they just collect
and collect and collect tabs. And maybe they create more windows because maybe
they have some task that they're researching, and then they get interrupted and
they come back to it later. But they start to accumulate these windows full of
things that maybe they mean to come back to. And so that problem of just having
lots and lots of stuff and lots and lots of processes, well, Chrome under the
hood is like, I'll do my best. You wanted me to do all this stuff. I'm going to
do it. Let's see what I can do. And on a system like Windows or Mac where
there's a lot of RAM maybe, Chrome's thinking, OK, you wanted me to use the
RAM. I'm going to use the RAM. You wanted all those tabs. And then even on
those systems where maybe you're running out of RAM, but there's virtual
memory, there's disk space, all right, let's use it. Let's go. And so I think
it's really quite a challenge, actually.

58:44 DARIN: The original idea of Chrome was, yeah, make it possible for web
pages to take advantage of the resources of your computer. Let it allow web
pages to be more capable because of it, and not be - the old world prior to
Chrome was single-threaded browser, all web pages on the same thread. Like, you
could have a dual core machine, and it wouldn't matter. It wouldn't make your
browser any faster. But now with Chrome, no problem. You got dual core. You got
eight cores, whatever you got. We can have all of those things saturated with
work and allow you to multitask on the web and do lots of amazing things. But I
think it's still a resource management challenge for the browser because on one
hand, you want to give that capability, but on the other hand, you also don't
want to - how much power should you be using? What if the laptop's not plugged
into the wall? What if it's just running on battery? What is the right resource
utilization for Chrome? I don't think that's a solved problem at all. There's
various systems in place to throttle the resource utilization of background
tabs. Timers, for a long time now, have been throttled, but throttling other
things. I know there was a lot of research done into freezing tabs, so
literally suspending them and not letting them do any work. But with that comes
challenges of what do you do with all the IPCs that are inbound to those
processes? They're backing up on pipes, and that's not great. If you unfreeze
them, now there's a blast of IPCs coming in that they suddenly have to service.
That doesn't seem great. Do you drop those IPCs on the floor? Probably not.
Now, the process would be in some weird state, and you might as well have to
just kill it, which, of course, is the case on systems like Chrome OS and
Android. They do have to just kill the processes because of the limits of those
devices. So, yeah, I've been a proponent of just being aggressive about killing
processes on desktop in general. I think there's some balance there that's
right. It's probably not right to keep all the tabs open, all the processes
open. We should be, I think, judicious about what we keep open, keeping the
workload reasonable, instead of making it like a, oh, yeah, I will rise to the
challenge of dealing with thousands of tabs or thousands of web pages across
100 processes, even if - maybe it's somehow possible through heroic effort to
make Chrome capable of doing such a thing in an efficient manner. But does it
mean we should? Who needs 1,000 tabs all running around doing work at once, you
know? You don't. You really don't. Nobody does.

61:32 SHARON: So this is kind of the basis of the goal for Arc, right, which
is I think it closes your tabs overnight or something. And Arc is what you work
on now and is a Chromium-based browser. So for embedders of Chromium, let's say
the browser kind, how much control do you have over how processes are used,
allocated, if you embed content? Like, are you able to just say, oh, I don't
want a network process. I will just put this all in the browser process. Can
you do that?

62:07 DARIN: Hm. You can do anything you want. It's just code. No, but as a
browser embedder, as a Chromium embedder, you're shipping Chromium. So Arc
browser ships a copy of Chromium. And Arc browser includes changes to Chromium
as needed to make it work. Of course, that's possible. Of course, you could
change a lot of stuff and make a big headache to manage it all, right? So
there's some natural limits. You don't want to change too many things, or else
you won't be able to really manage it going forward. You want to take updates
from the mainline, incorporate improvements, but you also want to preserve some
differences that you've made. Well, how do you do that? And so change
management is a challenge. So there's a natural limit to how much you want to
alter the base functionality. Instead, it's - anyways, a product like Arc is
not so much differentiating on the basis of Chromium code or content layer.
It's not really its purpose or goal. Its purpose is to differentiate at the UI
layer and with things like what you mentioned and other things as well. Yeah,
and so, of course, if one were to go down the path of could we optimize process
model better, that would be in the realm of things that would be great to
contribute to Chromium, so that it could be part of the mainline and therefore
not be something that you have to maintain yourself. That's how I would
approach it as a Chromium embedder.

63:47 SHARON: OK, that makes sense. Yeah, if it's in Chromium, you don't have
to worry about the updates, and you just get -

63:53 DARIN: Turns out there's an army of engineers who would make sure it's
never broken. You just gotta write some tests.

63:59 SHARON: Oh, wow.

63:59 DARIN: [INAUDIBLE] those tests.

64:05 SHARON: So with non-browser embedders of Chromium, like, say, Electron, I
don't know how familiar you are with that, but they presumably would have
different needs out of how Chromium works, basically. I don't know if you know
what they're doing with any processes.

64:25 DARIN: I mean, I've used VS Code. That's a famous example of a Chromium
embedder that you might not realize is using Chromium or built on top of it.
But if you open up Task Manager and you look
at VS Code, you'll see all the glorious processes under there. And so have they
or Electron or any of these, have they altered things there? Maybe. I mean,
there's some configuration one might do. If you're building an application
that's very single purpose, like VS Code or Slack or - what are some other good
examples, there's quite a few that are built on top of Chromium - they're more
single purpose towards a single app, right? Of course, VS Code is pretty
sprawling with all the things you can do in it, but at the same time, it could
be the case that they don't have the same security concerns. They don't have
the same idea of hosting content from so many different sources. So maybe they
would tune the process model a little differently. Maybe they would decide, I
don't really need as many processes because I'm managing things in a different
way. It's not a browser.

65:34 SHARON: Yeah, you're not handling all of the untrusted JavaScript of the
web that you have to be -

65:42 DARIN: Right, I'm not so worried about this part of my application dying
and then wanting to keep the rest of it still running or something because that
would still be considered a bug because part of my app died. And so some of the
reasons for multi-process architecture might be a little different.
913
66:01 SHARON: Right. And more just for fun, having worked on now an embedder of
Chromium, how has that experience been in terms of decisions that were made
when you were putting together the multi-process architecture? Are there things
where you were like, oh, no, past me, if you'd done this differently, this
would be easier now.

66:20 DARIN: I would say I'm very thankful for Mojo IPC, made it very easy to -
one thing that I've found is that it's possible to do a lot of amazing things
on top of Chromium without actually modifying Chromium. And the Content API and
Mojo IPC make a lot of that really possible. So it's a very flexible system.
There's a lot of really great hooks that let you interact with the system all
the way from extending the renderer to extending the browser. And to be able to
build stuff and layer it on top of a stable system is amazing. When I was
working on building an Android browser, I built a tracking prevention ad
blocking system for Android and was able to do it without modifying Chromium. I
thought that was amazing.

67:19 SHARON: How are you using Mojo? Because Mojo is typically going between
the processes. So if you're not really changing how the processes work, what do
you use Mojo for?

67:26 DARIN: Oh, well, in that case, it was used to communicate a rule set down
to the renderer. And then at the renderer level, I would inject a stylesheet to
do content blocking or to apply network filtering at the Blink layer. So there
is a combination of Blink Public APIs and Content Public APIs. There are
actually enough hooks to be able to filter network requests and insert
stylesheets that would apply display none to a set of DOM elements. But to
do that efficiently, it was necessary to bundle up those rules into a blob of
memory that you would just send down to the renderer process, to all renderer
processes, so they'd have it available to them so they could just directly
inspect like a big hash map of rules. And so being able to - like I said
before, when the IPC system is just like - when it's decoupled like that with
Mojo, it makes it possible to kind of graft on these systems that interact with
APIs over here, and that endpoint talks to some endpoint over here in the
browser process, which can have, like I said, rules data that it might want to
send over and that kind of thing. And so being able to build those kinds of
systems, and I think if you look at just how a lot of features in Chrome are
built, they're built very similarly, too. They build on top of the Content API
that provides the various hooks. They build on top of Blink API. Sometimes a
feature needs to live in the renderer and the browser process. Like autofill is
always the classic example of this early on in Chrome or password manager.
These are systems that need to crawl the DOM. They need to poke at the DOM.
They need to understand what's there. They need to be able to insert content or
put overlays in, or they need to be able to talk to the browser where the
actual database is, all that kind of stuff, and looking at different load
events and various things to know in the lifecycle of the page. So, yeah, I'd
say I'm thankful for a lot of these design choices along the way because I
think it's led to Chromium being so useful to so many people in so many
different ways. Obviously, it empowered building a really great browser and a
really great product, but it also has empowered a lot of follow-on innovation.
And I think that's pretty cool.

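The rule-set flow described above (bundle the rules into one blob in the
browser process, send it to every renderer, then do hash lookups and inject a
`display: none` stylesheet there) can be sketched roughly as follows. This is a
minimal illustration in Python, not Chromium code: `pack_rules`,
`RendererRules`, and the JSON blob format are hypothetical choices made for the
example, and the Mojo transport and the Blink and Content hooks are left out
entirely.

```python
import json
from urllib.parse import urlsplit


def pack_rules(hidden_selectors, blocked_hosts):
    """Browser-process side (hypothetical): serialize all rules into one blob.

    JSON bytes stand in for the "blob of memory" mentioned in the discussion.
    """
    payload = {
        "hidden_selectors": sorted(hidden_selectors),
        "blocked_hosts": sorted(blocked_hosts),
    }
    return json.dumps(payload).encode("utf-8")


class RendererRules:
    """Renderer side (hypothetical): decode the blob once, then look rules up."""

    def __init__(self, blob):
        payload = json.loads(blob.decode("utf-8"))
        self.hidden_selectors = payload["hidden_selectors"]
        # A frozenset is hash-based, so each host check is a hash lookup.
        self.blocked_hosts = frozenset(payload["blocked_hosts"])

    def should_block(self, url):
        """Network filtering: block requests whose host is in the rule set."""
        host = urlsplit(url).hostname or ""
        return host in self.blocked_hosts

    def stylesheet(self):
        """Content blocking: build CSS that hides the matching DOM elements."""
        if not self.hidden_selectors:
            return ""
        selectors = ", ".join(self.hidden_selectors)
        return selectors + " { display: none !important; }"
```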
69:53 SHARON: It is pretty cool. So Chrome was released in 2008. It is
now 2023. So as math tells, it's been 15 years. We like numbers that end in 5
and 0. So - I don't know - it's very cool. I remember when Chrome came out. And
I don't know. Do you have any -

70:08 DARIN: Yeah, for me, it's more like 17 years because we started in 2006.

70:14 SHARON: Right. So do you have any general reflections on all the stuff
that's changed in that time?

70:22 DARIN: It's wild. I have a higher density of memories from the early
days, too. It's amazing. I guess that's how memories work when everything's new
and changing so much. But yeah, no, I'm very thankful for the journey and very
thankful to have been part of it. And it was a lot of fun to work on. I mean,
prior to Chrome, when I was working on Firefox, I did a little exploration on
adding like a multi-process thing to Firefox, which I thought - just, I was
learning about how to do IPC, and I was learning - but I was doing it for what
purpose back then. I think I was just toying around with DCOM. I don't know if
anybody knows what COM is, but Microsoft's Component Object Model that was like
all the rage back then. And it allowed for like integrating different languages
together. WinRT is all built on top of this stuff now. But anyways, Mozilla had
its own version of COM called XPCOM. And wouldn't it be cool if you could have
a component that - so you could have components back then that were built in
JavaScript, and you could talk to them from C++, or they were built in C++ more
commonly, and you talked to them from JavaScript. But wouldn't it be cool if
one endpoint could be in another process? So that was something I was playing
around with in 2004 when I was still working on Firefox. And then when the
Chrome opportunity came along - maybe that was 2005 - I don't know. But when
the Chrome opportunity came along, I was like, all right, let's do it. IPC
channel was basically those ideas, but kind of more polished slightly.

72:02 SHARON: OK. Yeah, very cool. I mean, when I first started working on
Chrome stuff, someone on my team said, any time you change something in base,
that pretty much is going to get run anytime the internet gets run, which I
thought was super crazy for just some random software engineer like me to be
able to do, right? But -

72:20 DARIN: And now it's even more than that if you think about [INAUDIBLE]
code and [INAUDIBLE].

72:20 SHARON: Yeah, all the stuff. So do you ever just think about it, and
you're just like, oh, my god, wow.

72:26 DARIN: Yeah, it's pretty amazing.

72:31 SHARON: So crazy.

72:31 DARIN: It is one of the special things about working on Chromium, is
that, yes, you can have such an amazing impact with the work that you do there.

72:38 SHARON: Have there been any cases - these are just now unrelated
miscellaneous questions. But in terms of surprising usages of Chromium, be it
like maybe the base or the net stack or something, have there been any cases
where you were really surprised by like, oh, this is being used here?

72:56 DARIN: Well, for sure, the first time I heard about Electron, I was like,
oh, this is not a good idea. House of cards, you know? It just seems like it's
such a complicated system to build your app on top of, right? But at the same
time, I totally get it and appreciate it, and I understand why people would
reach for it. There's so much good sauce there, so much good stuff and so
many - there is a lot of really good infrastructure there to build on. Early
on, I kind of imagined more that things like Skia and V8 and some of the other
libraries would be the thing that people would make lots of extra use out of,
right? So I didn't quite imagine people taking the browser's framework like
this. And we absolutely didn't build it with that purpose. Pretty much every
choice along the way was highly motivated by making the Chrome team's life
better. Like, Content API was, when we came to the realization we needed it, it
was like we desperately need it. Just the complexity of Chrome was getting
unwieldy. We needed to cleave part of it and say, that is this part. We needed
to somehow draw a line in the sand and say, this is the set of concerns over
here. And so the idea that all of this could be used for other purposes is
cool, but it was never really in the initial cards. And I came from working on
Mozilla, which was, in many ways, browser construction kit first, product
second. So Chrome was very much like, let's go the other extreme - product
first, maybe a platform later. And to see it be this platform now is pretty
cool. But it's pretty far from where we started.

74:50 SHARON: Yeah, kind of - I watched some of the earlier talks you gave
about the multi-process architecture and Content, not Chrome, came up a bunch.
And things, I guess, like Electron are the result of that, right?
Where -

75:01 DARIN: Yeah, it's pretty wild. Yeah, I mean, so Mozilla built this very
elaborate system called XUL, or X-U-L, which was an XML language for doing UI.
And it's very interesting, intellectually interesting, maybe different than
XAML. XAML is way better probably in many ways. But XUL was kind of XHTML
minus, minus, with a bunch of stuff added on for like UI things. And then it
had this thing called XBL, which is a bindings language that you could do
custom bindings. And so anyways, then you build your application in JavaScript
and Firefox, Mozilla, it was all built this way. So it was like a web page
hosting a web page. The outer web page was like this XML DOM. The product
engineers working on that, in order to get some modern Windows sort of thing
to come through, they had to basically go through the rendering engine team to
get them to do something. And so it really greatly limited the ability for the
product team to actually build product. And there were so many sacred cows
around the shape of Gecko and how that structure was, that while this
cross-platform toolkit seemed glorious at first, it ended up being handcuffs
for product engineering, I think. So, yeah, Chrome started out with Win32
native UI for browser UI. You have all the choices you want to make, browser
front-end engineers. You also have to build a lot of code, but no
cross-platform toolkits. Views came later.

76:43 SHARON: Right. Well, this was great. Thank you very much. Normally, we do
a shout-out section at the end. Do you have anything - normally, it's like a
Slack channel or something like the Mojo Slack channel. I think in this case,
it's maybe - I don't know if there is a specific thing, but is there anything?

76:57 DARIN: Shout-out to all the team and the engineers making everything
great.

77:03 SHARON: All right.

77:03 DARIN: Yeah.

77:03 SHARON: Cool. Awesome. Well, thank you very much for chatting with us.
That was super cool, lots of really interesting background and good
information. So thank you very much.

77:15 DARIN: Yeah, a pleasure. Thank you so much for having me.

77:21 SHARON: Talk about threads, so IO, UI thread.

77:27 DARIN: Do I get credit for the confusingly named IO thread?

77:27 SHARON: OK, all right, we can cover that. That's cool. Yeah, why is it
called IO thread when it doesn't do IO?