Nigel Tao | 187a479 | 2023-09-28 22:30:44 | 1 | # What’s Up With Processes |
| 2 | |
| 3 | This is a transcript of [What's Up With |
| 4 | That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq) |
| 5 | Episode 8, a 2023 video discussion between [Sharon ([email protected]) |
| 6 | and Darin ([email protected])](https://www.youtube.com/watch?v=SD3cjzZl25I). |
| 7 | |
| 8 | The transcript was automatically generated by speech-to-text software. It may |
| 9 | contain minor errors. |
| 10 | |
| 11 | --- |
| 12 | |
| 13 | Chrome has a lot of process types. What is a process? What are all the types? |
| 14 | How do they work together? Today’s special guest to tell us more is Darin. |
| 15 | Darin is one of the founding members of the Chrome team, and wrote the initial |
| 16 | implementation of the multi-process architecture. |
| 17 | |
| 18 | Notes: |
| 19 | - https://docs.google.com/document/d/1uXF-ncJ98LWQMN7M3NA_2oYkVmW9Vzp0v-wkJaNpsDQ/edit |
| 20 | |
| 21 | Links: |
| 22 | - [Chrome comic](https://www.google.com/googlebooks/chrome/small_00.html) |
| 23 | - [What's Up With Mojo](https://www.youtube.com/watch?v=at_35qCGJPQ) |
| 24 | - [What's Up With Open Source](https://www.youtube.com/watch?v=zOr64ee7FV4) |
| 25 | - [What's Up With //content](https://www.youtube.com/watch?v=SD3cjzZl25I) |
| 26 | - [Life of a Process](https://www.youtube.com/watch?v=5im7SGmJxnA) |
| 27 | - [Chrome Compositing](https://chromium.googlesource.com/chromium/src/+/HEAD/docs/how_cc_works.md) |
| 28 | - [Site Isolation papers by Charlie](https://charlesreis.com/research/publications/) |
| 29 | |
| 30 | --- |
| 31 | |
| 32 | 00:00 SHARON: Hello, and welcome to "What's Up With That," the series that |
| 33 | demystifies all things Chrome. I'm your host, Sharon, and today, we're talking |
| 34 | about processes. There are so many process types in Chrome. How do they form |
| 35 | the multi-process architecture? What exactly is a process? Here to answer all |
| 36 | of that and more is today's special guest, Darin. Darin was one of the founding |
| 37 | members of the Chrome team and pretty much did the first implementation of the |
| 38 | multi-process architecture, so he is well-suited to answer all of this. Plus, |
| 39 | he created the IPC channels that Chrome started with. If you want to learn more |
| 40 | about IPC and Mojo, check out the last episode with Daniel for lots more on |
| 41 | that. So hello. Welcome, Darin. Welcome to the show. Thanks for being here. |
| 42 | |
| 43 | 00:38 DARIN: Thank you. Great to be here. |
| 44 | |
| 45 | 00:38 SHARON: Yeah, cool. So first question, what is a process? |
| 46 | |
| 47 | 00:44 DARIN: Right, so process is the container in which applications run on |
| 48 | your system. Every process has both its own executing set of threads, but it |
| 49 | also has its own memory space. That way, processes have their own independent |
| 50 | memory, their own independent data, and their own independent execution. The |
| 51 | system is multitasking across all of the processes on the system. |
| 52 | |
| 53 | 01:13 SHARON: Cool. Chrome is basically an operating system that runs on top of |
| 54 | your operating system. So there probably are parallels between Chrome's |
| 55 | representation of a process and the actual operating system ones. So what are |
| 56 | the similarities and differences, and how do they interact? |
| 57 | |
| 58 | 01:30 DARIN: Well, yeah, I mean, you can talk about a lot of different things. |
| 59 | I mean, so Chrome is made up of multiple processes. We run different tasks in |
| 60 | different processes. That's done for multiple reasons. One is so that they can |
| 61 | run independently, so that there's performance benefits that come from the fact |
| 62 | that they're running independently. Back in the day, the original idea was that |
| 63 | it would allow us to take advantage of the operating system's preemptive |
| 64 | multitasking that it already has and to actually allow web pages to run |
| 65 | concurrently and to be managed just like any other concurrent task that the |
| 66 | operating system would manage. So that's the original idea there. And in that |
| 67 | way, this model of Chrome divided into multiple processes just allows the |
| 68 | Chrome itself and all of the tasks that it has to really take advantage of |
| 69 | multi-core systems so that if you have more computing power, if you have more |
| 70 | cores, you have more hyperthreading going on in your system, then it's possible |
| 71 | for more things to happen concurrently. And Chrome's workload can be spread out |
| 72 | that way because Chrome is broken into all of these different processes and all |
| 73 | of these different threads. In that way, it's taking advantage of and mirroring |
| 74 | the capabilities of the OS and providing that as a substrate for web and for |
| 75 | browser and for how all these things work. How Chrome then has to be similar is |
| 76 | that also, like an OS, Chrome has to manage all this stuff. And from simple |
| 77 | things like how much resource should a background tab be using, should its |
| 78 | timers be running when it's in the background, to much more complicated things |
| 79 | when you talk about even should a process stay alive or not. If you look at |
| 80 | Chrome OS where system resources can be so limited, it's necessary, or on |
| 81 | mobile, necessary to terminate some of those background processes to close some |
| 82 | of those tabs behind the scenes, even if the application makes it look like |
| 83 | those tabs are still open. So the level of management is a big part of - in |
| 84 | that way, it's being kind of like an OS. |
| 85 | |
| 86 | 03:42 SHARON: Is Chrome's representation of a process, are those generally |
| 87 | one-to-one with a system process, depending on which system you're on - |
| 88 | |
| 89 | 03:48 DARIN: Absolutely. |
| 90 | |
| 91 | 03:48 SHARON: or is that an abstraction layer? |
| 92 | |
| 93 | 03:55 DARIN: No, well, absolutely when we talk about a process in Chrome, we |
| 94 | mean an OS process. And so we might have multiple web pages being served by |
| 95 | that single renderer process. We do try to spread the load across multiple |
| 96 | processes, but we also independently decide how many processes to actually |
| 97 | create. And it can be based on - there could be good reasons from, like I said, |
| 98 | a performance perspective to having tabs assigned across multiple processes, |
| 99 | but there can also be good security properties, like letting the web pages be |
| 100 | allocated to different processes means that those web pages are not running in |
| 101 | the same process, meaning they're not running in the same address space. And |
| 102 | from a security perspective, that has really great properties because it means |
| 103 | if a web page is able to tickle a bug in the rendering engine in the V8 or in |
| 104 | part of Blink and somehow get a privilege escalation, like start to be able to |
| 105 | do things that JavaScript normally can't do, it's still going to be limited by |
| 106 | the capabilities of that process and what it has access to. And so if that |
| 107 | process has really only the data for the web page that was providing the |
| 108 | problematic JavaScript, well, it's not really getting access to anything it |
| 109 | didn't already have. And that's kind of the whole idea of process isolation and |
| 110 | sandboxing. And then on top of that, you limit the capabilities of that process |
| 111 | by really leveraging the OS process primitive and the kinds of restrictions and |
| 112 | capabilities that can be removed from that process to achieve an isolation for |
| 113 | web pages for an origin or for a set of web pages. I say set because we might |
| 114 | not want to allocate a process for every single tab or for every single origin |
| 115 | because that might just use up way too many system resources. So we have to be |
| 116 | thoughtful there, too. |
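The trade-off described here, isolating sites in separate processes without creating an unbounded number of them, can be sketched in a few lines of Python. This is an illustration only, not Chrome's actual policy: the `ProcessAllocator` class, the `max_processes` cap, and the "reuse the least-loaded process" rule are all assumptions invented for the example.

```python
from collections import defaultdict


class ProcessAllocator:
    """Toy model: one renderer per site until a cap, then share (hypothetical policy)."""

    def __init__(self, max_processes):
        self.max_processes = max_processes
        self.site_to_process = {}               # site -> process id
        self.process_sites = defaultdict(set)   # process id -> sites hosted there

    def process_for(self, site):
        # A site that already has a process keeps using it.
        if site in self.site_to_process:
            return self.site_to_process[site]
        if len(self.process_sites) < self.max_processes:
            pid = len(self.process_sites)       # under the cap: create a new process
        else:
            # At the cap: share the process that currently hosts the fewest sites.
            pid = min(self.process_sites, key=lambda p: len(self.process_sites[p]))
        self.site_to_process[site] = pid
        self.process_sites[pid].add(site)
        return pid


allocator = ProcessAllocator(max_processes=2)
assignments = {s: allocator.process_for(s) for s in ["a.example", "b.example", "c.example"]}
```

With a cap of two, the third site has to share a process with one of the first two. Sharing gives up the address-space separation between those two sites, which is the cost being weighed against system resources.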
| 117 | |
| 118 | 05:50 SHARON: Yeah, so this is quite closely related to site isolation, which |
| 119 | isn't the topic of this video - maybe the next one. So terms that are used |
| 120 | often and sometimes interchangeably are multi-process architecture and process |
| 121 | model. So these aren't exactly the same thing, but I think can you explain the |
| 122 | difference between them and what each one is for? Because there are |
| 123 | similarities, but. |
| 124 | |
| 125 | 06:16 DARIN: Sure. I mean, I think to me, the phrase "process model," it's |
| 126 | talking about, what does a particular process represent, what does it do. And |
| 127 | then when I say multi-process architecture, I'm thinking of the whole thing. |
| 128 | It's all packaged up. It's a multi-process architecture to build a browser. At |
| 129 | the end of the day, user is hopefully not so aware of the fact that this is how |
| 130 | it's built. I mean, earlier on in Chrome's history, the Windows Task Manager |
| 131 | didn't do a very good job of grouping processes by their parent. And so if you |
| 132 | opened the Task Manager at the OS level, you'd see just a spew of processes |
| 133 | that Chrome was responsible for. And it could be a little disconcerting for |
| 134 | people. A little tangent, but now more modern versions of Windows, they do kind |
| 135 | of group it all to the parent task. And so it's a little easier and less sort |
| 136 | of in-your-face that Chrome is creating all these processes. But yeah, at the |
| 137 | end of the day, it's just the multi-process architecture is like that's the |
| 138 | embodiment of the whole thing. And we have these different process types that |
| 139 | make up that whole thing. There's the browser process, the main one, and then a |
| 140 | renderer process is the name we give to the processes responsible for running |
| 141 | web pages. And then we have a few other process types that are part of the |
| 142 | puzzle, a networking process, a GPU process, utility process, and occasionally, |
| 143 | in the lifespan of Chrome, other types of processes. We had plugin processes, |
| 144 | for example, when we were hosting Flash in Chrome. And the Native Client had |
| 145 | its own type of processes as well. So what's that all about? Really, I can go |
| 146 | into it if you want me to go into all the details there. But - |
| 147 | |
| 148 | 08:05 SHARON: Yeah, I think we'll run through - this is a, yeah, perfect segue. |
| 149 | We'll run through each of those process types you just mentioned and mention a |
| 150 | bit about what they do, how much privilege they have, maybe how many of them |
| 151 | there are because some of them, there's only one of. So I think it makes sense |
| 152 | to start with the browser process, which is the process and is often likened to |
| 153 | the kernel in an operating system. |
| 154 | |
| 155 | 08:30 DARIN: Yeah, so the browser process - kernel, operating system, broker - |
| 156 | these are kind of good analogies for what the browser process's role is. So it's the |
| 157 | application process, the main one, that starts up initially, and it's the one |
| 158 | that hosts the whole UI of the app. And it's going to spawn these child |
| 159 | processes, the renderer processes, the GPU process, and so on, to help fulfill |
| 160 | its goals. So very early on, we started with this design where WebKit, the |
| 161 | rendering engine we were using from Apple, it could be built as a COM control |
| 162 | and register it on the system and load it as a DLL. And then in order to run |
| 163 | that in a child process, it was using HWNDs and all the standard Win32-isms to |
| 164 | do its job. And we started out by just literally trying to capture a bitmap |
| 165 | rendering of WebKit and send it over to the browser process where we could |
| 166 | present that bitmap. Actually, rewind even further. The very first version took |
| 167 | advantage of the fact that Windows supports having HWNDs hosted in different |
| 168 | processes and threads. And so we literally just took that HWND from WebKit and |
| 169 | that child process and stuck it into the window hierarchy of the browser |
| 170 | process. And we drew our browser UI around it, and WebKit was there, but it was |
| 171 | running in a different process. And if we ever needed to tell that process to |
| 172 | do something, we just send a WM_USER event to it with PostMessage. And that's |
| 173 | something Windows lets you do. So it felt like a very simple toy kind of way to |
| 174 | try it all out. A lot of limitations to that design. Pretty quickly, we |
| 175 | realized we didn't want to just be in that kind of setup, and we moved to |
| 176 | building our own IPC channel, a pipe, so that we could communicate and really |
| 177 | get to the point where WebKit's running there without an HWND, without its own |
| 178 | Win32 windowing constructs, but instead, it's just kind of an image generator. |
| 179 | And we take the image that it generates, the bitmap, send it over our IPC |
| 180 | channel to the browser process. And the browser process is where we have our |
| 181 | window hierarchy. The browser process is where we display that bitmap, and where |
| 182 | we collect user input and send it over the pipe to the renderer, where we then |
| 183 | feed it into WebKit. |
| 184 | |
| 185 | 10:46 DARIN: That was the original architecture of Chrome. So in that world, |
| 186 | the browser process is your application process. It has all the UI. And it's |
| 187 | really like this glorified image viewer. And the renderer process is literally |
| 188 | just like it's running WebKit - now Blink. It's running the rendering engine, |
| 189 | and it's producing those images whenever, like, an update occurs, a layout |
| 190 | occurs, or some invalidation occurs. And we got a little fancy. It was producing |
| 191 | just the sub - it would know, oh, I really only have a small damage rect, so I |
| 192 | don't have to produce the whole image. I just produce a small part. And send |
| 193 | that over, and then we paint that into the part that the browser is retaining |
| 194 | an old image of. And it can update just that one part. And so that's a very |
| 195 | simple approach that we took when building this whole thing. And so those |
| 196 | renderer processes become very much just very simplistic in that they aren't |
| 197 | interacting with the rest of the OS in a very deep way. They are just taking |
| 198 | input events from this pipe and sending images back. When they need other |
| 199 | services like they need network access, instead of going straight to the |
| 200 | network from the renderer process, because we started to realize, hey, we might |
| 201 | want a sandbox and restrict those child processes, and also, we needed the |
| 202 | notion of cookie jar that was shared across all web pages, so that if you visit |
| 203 | Gmail in one tab and visit Gmail in another tab, you're still logged in, we |
| 204 | needed the network stack to be in a unified place. So it meant that not just |
| 205 | would we send images up to the browser, but now we would send network requests |
| 206 | to the browser. And the browser would respond with the network data. And as a |
| 207 | result, we started to go down this path of centralizing access to system |
| 208 | services and resources in the browser process. |
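The damage-rect update described earlier in this answer (the renderer sends only the changed region, and the browser paints it into the image it has retained) can be sketched as follows. This is a minimal Python illustration, not Chrome's code: the `apply_damage` function, the list-of-rows "bitmap", and the pixel values are assumptions made for the example.

```python
def apply_damage(backing_store, x, y, patch):
    """Copy a small patch into the retained image at (x, y); the rest is untouched."""
    for row_offset, patch_row in enumerate(patch):
        row = backing_store[y + row_offset]
        row[x:x + len(patch_row)] = patch_row


# The browser's retained copy of the last rendered frame: 4 rows x 6 columns.
backing_store = [[0] * 6 for _ in range(4)]

# The renderer reports that only a 2x2 region at (x=3, y=1) changed.
apply_damage(backing_store, x=3, y=1, patch=[[7, 7], [7, 7]])
```

Only the four damaged pixels cross the pipe and get repainted; the other twenty stay as they were in the retained image.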
| 209 | |
| 210 | 12:44 DARIN: It's becoming therefore like a broker to the system that the |
| 211 | renderer now is unable to - not unable - it's asking the browser for everything |
| 212 | it needs. It's communicating to the browser to get access to all the different |
| 213 | resources. And that allowed us to then restrict the renderer process |
| 214 | considerably so that it doesn't even have access if it wanted to touch the file |
| 215 | system, to touch the network TCP/IP implementation or any system resources. So |
| 216 | the sandbox really is all about how we apply those restrictions, taking away |
| 217 | the capabilities of a Windows process. So in the very early days, there was |
| 218 | just the browser process and renderer processes. And we would allow multiple |
| 219 | renderer processes to be created as tabs were opened. And we put some |
| 220 | restriction on the number of processes based on the amount of RAM that your |
| 221 | system would have, thinking that processes maybe have some inherent overhead, |
| 222 | which they do. Certainly, there's the overhead of the V8 heap that is allocated |
| 223 | once per process or once per isolate, if you're familiar with the details of |
| 224 | V8. And so, we didn't want to have so much of that kind of - so we thought |
| 225 | there was some limit to how many processes we should have. Later on, other |
| 226 | process types started to emerge. The next one that came was the Plugin |
| 227 | process because in order to get YouTube to work back in 2006, you needed to |
| 228 | support Flash. And Flash has two modes - it did. It had a windowed mode and a |
| 229 | windowless mode. And the difference is whether it drew itself into an HWND or |
| 230 | if it would just produce a bitmap itself. But regardless of what mode it was |
| 231 | rendering in, it still wanted direct system access, like it wanted to touch the |
| 232 | file system. And so if we were going to run it in our browser, it can't run in |
| 233 | the renderer process. It has to run somewhere else. And so, yeah, in the frenzy |
| 234 | of, gee, wouldn't it be nice if we could have sandboxing, it was, how the heck |
| 235 | are we going to sandbox and isolate plugins? Because the way plugins integrated |
| 236 | with WebKit is that WebKit just directly called into them and said, hey, if |
| 237 | it's a windowless one, give me your bitmap. I'm going to include it in my |
| 238 | rendering. If it's a windowless one, it also means it's dependent on WebKit to |
| 239 | feed it events. And so, how does that work? So we ended up building a process |
| 240 | type called the Plugin process type for NPAPI plugins, Netscape-style plugins, |
| 241 | all stuff that doesn't exist anymore. It's wonderful. And NPAPI is this |
| 242 | interface that was once upon a time, I want to say, kind of, like - my head is |
| 243 | going to some unsavory words. It was kind of pooped out by somebody at Netscape |
| 244 | to make Acrobat Reader work over the weekend. And then it became a stable API. |
| 245 | And lots of regret and sadness probably followed, but as a result, things like |
| 246 | Flash were created, and web became very interesting in some ways. A wonderful |
| 247 | story about Flash, I think. |
| 248 | |
| 249 | 16:02 DARIN: But anyways, supporting that stuff meant dealing with some gnarly |
| 250 | frozen APIs and figuring out how to stitch all that together, and the renderer |
| 251 | process of WebKit would talk to something that wasn't actually in its process |
| 252 | that was - over, again, another IPC channel - running in a whole other process. We |
| 253 | wanted plugins to still not run in our browser process, but to, instead, run in |
| 254 | their own process so that if they crashed, they wouldn't take down the whole |
| 255 | browser. And Flash and other plugins were notorious for crashing. So it was a |
| 256 | must that they run in their own process. But we figured they couldn't be |
| 257 | sandboxed as tightly as the renderer as WebKit because they already were |
| 258 | accessing the system in very deep ways. |
| 259 | |
| 260 | 16:55 SHARON: Cool, lots of - |
| 261 | |
| 262 | 16:55 DARIN: Lots more processes got added later, like the networking, the GPU |
| 263 | process, and NaCl. I can tell the story about those, too, if you're interested. |
| 264 | |
| 265 | 17:08 SHARON: Oh, sure. Yeah, let's hear it. |
| 266 | |
| 267 | 17:08 DARIN: OK, so 2009 era, I think, maybe 2010 - I don't know - somewhere |
| 268 | along the way, we started building Chrome for Android. And you might recall I |
| 269 | described how the renderer was really kind of a glorified image viewer, or the |
| 270 | browser, browser was sort of an image viewer and the renderer's job was to |
| 271 | produce a bitmap. And then we send it over to the browser, the browser would |
| 272 | draw the bitmap. Mobile systems were not going to work very well if this is the |
| 273 | way the drawing was going to work. If you think about how scrolling works or |
| 274 | worked back then, scrolling a web page back then meant telling the computer to |
| 275 | please memmove all the pixels, and then to draw another bitmap where pixels are |
| 276 | not existing yet and need to be drawn. So you do a memmove followed by a |
| 277 | memcpy. And so this is how original Chrome was built. If you were scrolling, it |
| 278 | would be, oh, we need to shift pixels, and here's the bitmap. We need to stick |
| 279 | in the part that's exposed. Do that all quickly, and do it over and over again. |
| 280 | And that kind of operation is just not good if your goal is like nice |
| 281 | responsive scrolling on a touch screen. Instead, the way mobile systems were |
| 282 | built is using GPU rendering and compositing engines powered by GPUs, so that, |
| 283 | instead, you are offloading a lot of that work to the GPU. So it was necessary |
| 284 | to restructure Chrome's rendering pipeline for mobile, at least. But because we |
| 285 | were doing that, we can also take advantage of it on desktop. Meanwhile, we |
| 286 | were also on desktop starting to invent things like WebGL. Initially, WebGL, |
| 287 | the precursor to that was this plugin called O3D, which is a 3D graphics plugin |
| 288 | using the wonderful plugin APIs that I talked about before. But it provided |
| 289 | this way to have 3D graphics scenes and build immersive kind of 3D content. |
| 290 | That team, at some point, switched their sights on how to make that a standard |
| 291 | through WebGL. Wonderful stories around that. But it also entailed figuring out |
| 292 | how to do OpenGL, essentially, because WebGL was just OpenGL ES, and how to do |
| 293 | that from a renderer, from that Blink child process, how to do it there. And |
| 294 | really, that meant that, OK, this process is going to be - these sandboxed |
| 295 | renderers are going to be generating a stream of GL commands. Where do they go? |
| 296 | What do we do with that? And also, we know that it's possible to write shaders |
| 297 | and possible to write GPU commands that can really wreck - can cause havoc, can |
| 298 | be problematic, can cause the system to crash your process. So we don't want |
| 299 | that happening in the browser process because we want the browser process to |
| 300 | stay up so it can [INAUDIBLE] the manager. |
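The software scrolling path described earlier in this answer (shift the pixels that remain visible, then copy in the newly exposed strip) can be sketched in Python. This is only an illustration of the memmove-then-memcpy pattern, not Chrome's implementation: `scroll_up`, the list-of-rows framebuffer, and the letter "pixels" are assumptions made for the example.

```python
def scroll_up(framebuffer, dy, newly_exposed_rows):
    """Scroll content up by dy rows, then fill the rows exposed at the bottom."""
    height = len(framebuffer)
    # "memmove": shift the pixels that stay visible.
    framebuffer[0:height - dy] = framebuffer[dy:height]
    # "memcpy": paint the freshly rendered rows into the exposed strip.
    framebuffer[height - dy:height] = newly_exposed_rows


framebuffer = [["a"] * 3, ["b"] * 3, ["c"] * 3, ["d"] * 3]
scroll_up(framebuffer, dy=1, newly_exposed_rows=[["e"] * 3])
```

Every scroll step touches almost every pixel on the CPU, which is the cost that moving to GPU compositing avoids.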
| 301 | |
| 302 | 20:21 DARIN: So the GPU process was born. This will be the process that |
| 303 | actually talks to the OpenGL driver or DirectX under the hood via ANGLE on |
| 304 | Windows. And so now, we set up another pipe from the renderer over to the GPU |
| 305 | process, and the stream of GL commands are being sent over there. And over |
| 306 | there, it's talking to the driver. And if you sent something bad, driver is |
| 307 | going to say no bueno and crash your process. And we would find that the |
| 308 | browser would see the GPU process died, and it would maybe give you a warning |
| 309 | or let you reload the page, and it will try again. As that's done, that's how |
| 310 | we therefore were able to leverage processes to give us that isolation, but |
| 311 | also give us that robustness, give us that capability. And that led to a lot of |
| 312 | complexity, but also a lot of really amazing sophistication around the |
| 313 | compositing engine. Chrome's cc library was born subsequently, and all these |
| 314 | things that have led to the modern way that we render the web on Chrome now. |
| 315 | Skia learned how to render to OpenGL, et cetera, and the GPU process. |
| 316 | |
| 317 | 21:35 DARIN: Next one came along was the network process, which was really born |
| 318 | out of the idea of, gee, wouldn't it be nice to isolate the networking code |
| 319 | into its own process that could be more tightly sandboxed? Because the |
| 320 | networking stack tends to be a surface area that's accessible by attackers. |
| 321 | Just like the V8 and JavaScript engine is parsing lots of stuff and very |
| 322 | exposed to attack surface from would-be attackers, the network stack, same |
| 323 | thing. You've got HTTP parsing and various other kinds of processing happening |
| 324 | very close to content that attackers can control. And so this project, quite |
| 325 | rather elaborate project to move the networking stack out of the browser |
| 326 | process out of that broker process, but to, instead, its own process and have |
| 327 | all the pipes go various IPC channels connecting to there, instead, was born. |
| 328 | And I think this was more born in the era of Mojo IPC, where we had a more |
| 329 | flexible IPC system that could help support that kind of transition, but still |
| 330 | tons of work and quite a radical change to the flow of data and the way the |
| 331 | system works. Previously, just to give a little aside, when a renderer is |
| 332 | making a network request, the browser process acting as a broker needs to |
| 333 | audit, is it OK for that guy to be requesting this thing? Think about all the |
| 334 | kinds of rules that might be there, CSP, other kinds of things, and the |
| 335 | security origin privileges associated with it and what we want to allow a |
| 336 | renderer to actually access. Simple stuff like we support WebUI, like chrome: |
| 337 | pages in the context of, they load in a renderer process, that renderer |
| 338 | process should be allowed to access other things from chrome:, right? But |
| 339 | a web page shouldn't be able to. We don't want the arbitrary web pages to be |
| 340 | poking around and seeing what's available in the chrome: URL. So that's |
| 341 | like a simple example of where we honor that isolation. And so the browser |
| 342 | process, having the network stack in the original incarnation of Chrome made |
| 343 | sense. It can apply these rules right there. Safe Browsing was integrated |
| 344 | there. Lots of different kinds of network filtering could be done there. Moving |
| 345 | that to another process was a big change because now browser is the one that |
| 346 | has the smarts to do auditing, but the data and all the requests are going to |
| 347 | this other process. So making that work meant a lot more plumbing. And I think |
| 348 | complexities ensued. But it's awesome to see it happen. |
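The kind of audit the browser process performs as a broker can be sketched as a small check. This is a simplified Python illustration, not Chrome's real logic: the renderer kinds, the `browser_allows` function, and the rule that everything else is limited to `http` and `https` are assumptions made for the example.

```python
from urllib.parse import urlsplit

# Hypothetical renderer kinds for this sketch.
WEBUI_RENDERER = "webui"
WEB_RENDERER = "web"


def browser_allows(renderer_kind, url):
    """Broker-side check: only WebUI renderers may fetch chrome: URLs."""
    scheme = urlsplit(url).scheme
    if scheme == "chrome":
        return renderer_kind == WEBUI_RENDERER
    # Simplification: ordinary web schemes are open to any renderer here.
    return scheme in ("http", "https")
```

The point is where the check runs: the decision is made by the trusted process, using what it knows about the requesting renderer, rather than by the renderer that is asking.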
| 349 | |
| 350 | 24:20 DARIN: Anyways, I mentioned Native Client. So that was a precursor to |
| 351 | Wasm that was a big investment by the Chrome team to find a way to bring native |
| 352 | code to the web in a safe, secure manner. The initial take on it was, if you're |
| 353 | running native code that came from the web on a system, that's scary. It could |
| 354 | do like anything, right? Well, no, let's restrict the process capabilities, but |
| 355 | even with a restricted set of capabilities, you can't necessarily restrict |
| 356 | everything on Windows or Mac or Linux. There's always some limitation to the |
| 357 | sandbox capabilities. And in many ways, the sandboxes that we implemented are |
| 358 | kind of just an extra level of defense. If you think about it, the JavaScript |
| 359 | Engine is already a sandbox, right? It already limits the capabilities. The web |
| 360 | rendering engine, all the different kinds of security checks throughout the |
| 361 | code are various forms of sandboxing. And then finally, the process in the way |
| 362 | we restrict its capabilities is that next last defense. Well, running native |
| 363 | code with only that last defense in place is not enough. So Native Client was |
| 364 | designed to be not only native code, but native code that could be highly auditable, so |
| 365 | that you could make sure that it's not allowed to jump to an address that it |
| 366 | doesn't have code for, that it's not allowed to do things outside the set of |
| 367 | things that it's allowed to do. So it had a lot of complexity as well in terms |
| 368 | of how the process has to be set up in terms of the memory layout and various |
| 369 | other details, which maybe I'm happy to not remember. And - but it meant it |
| 370 | needed its own process type. Even though it integrated kind of like a plugin, |
| 371 | it couldn't just be a plugin. It needed its own process type. And there had to |
| 372 | be 64-bit variants and 32-bit variants, depending on the actual OS, actual |
| 373 | underlying hardware that you were running on Arm versus Intel, all these |
| 374 | differences. So yeah, we ended up with leveraging this process model |
| 375 | extensively to enable these kinds of things. |
| 376 | |
| 377 | 26:32 DARIN: I think I mentioned the utility process. In Chrome, the utility |
| 378 | process is this thing you reach for when you want to do something that's |
| 379 | potentially - like maybe you're dealing with some untrusted input, like you |
| 380 | want to decode an image, or you want to run something in a process, and you |
| 381 | just want to make sure that if it's going to do anything, it just dies over |
| 382 | there and doesn't take down the whole browser process. I think some extension |
| 383 | install manifest parsing, maybe various other kinds of things like that, would |
| 384 | happen in a utility process as like a safety measure. Generally speaking, |
| 385 | parsing input from the web or even the Web Store or things like that, doing |
| 386 | that parsing in the browser process is a scary thing because you're taking |
| 387 | input from a third party. And if you're parsing it there, you might have a bug |
| 388 | in your parser, and that could lead to the most trusted process having been |
| 389 | compromised. |
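The pattern described here, parsing untrusted input in a separate process so that a parser failure stays over there, can be sketched with Python's standard `subprocess` module. This is an analogy, not how Chrome launches utility processes: `PARSER_SOURCE`, `parse_manifest_out_of_process`, and the `name` field are assumptions made for the example, and the child here is not sandboxed the way a real utility process would be.

```python
import json
import subprocess
import sys

# Code run in the throwaway child; it stands in for a utility process.
PARSER_SOURCE = """
import json, sys
manifest = json.loads(sys.stdin.read())   # may raise on malformed input
print(json.dumps({"name": manifest["name"]}))
"""


def parse_manifest_out_of_process(untrusted_text):
    """Parse in a child process; return the result, or None if the child failed."""
    child = subprocess.run(
        [sys.executable, "-c", PARSER_SOURCE],
        input=untrusted_text,
        capture_output=True,
        text=True,
    )
    if child.returncode != 0:
        return None   # the child died; the parent keeps running
    return json.loads(child.stdout)
```

A malformed manifest makes the child exit with an error, and the caller simply gets `None` back instead of crashing itself.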
| 390 | |
| 391 | 27:29 SHARON: Yeah, that falls into the whole Rule of Two thing, right, of |
| 392 | untrusted data. We have a [INAUDIBLE] process. It's in C++. The thing that we |
| 393 | decided to change is where it gets parsed, so. |
| 394 | |
| 395 | 27:44 DARIN: That's right. |
| 396 | |
| 397 | 27:44 SHARON: That makes sense. |
| 398 | |
| 399 | 27:44 DARIN: Yeah, so the sandbox processes get used as this primitive to give |
| 400 | us that extra safety measure. |
| 401 | |
| 402 | 27:57 SHARON: So the other process type I can think of that wasn't just covered |
| 403 | there was extensions. Is there anything to say there? |
| 404 | |
| 405 | 28:02 DARIN: Sure, of course. |
| 406 | |
| 407 | 28:02 SHARON: Of course. |
| 408 | |
| 409 | 28:02 DARIN: In some ways, an extension process will show up that way in |
| 410 | Chrome's Task Manager, but I believe it's usually just powered by a renderer, |
| 411 | an ordinary renderer, because extensions have background pages or background |
| 412 | events - in, I guess, Manifest V2, it was background pages. In Manifest V3, it's |
| 413 | now just event pages or a service worker type construct. And those need a process |
| 414 | to run in. So the extensions get to inject some code that runs in the renderer |
| 415 | of the web page, usually in an isolated world, so it can see the same DOM. If |
| 416 | you've given the permission for the extension to read website data or to |
| 417 | manipulate website data, it can do that by injecting a content script that will |
| 418 | run in the same process as the web page that it's reading or modifying. But it |
| 419 | will run in an isolated JavaScript context so that it's not seeing the same |
| 420 | JavaScript variables and such. But it can still see the DOM. And that's meant |
| 421 | to give a lot of capability, but also have a little bit of protection because |
| 422 | it's so easy to accidentally interfere with the same JavaScript variables and |
| 423 | things like this. OK, so extensions have that piece that injects a content |
| 424 | script, but they also have a - usually, they can have this event service worker |
| 425 | or background page that is their central place, process place for code to run. |
| 426 | And so we do run that in a renderer process. And so for example, if the |
| 427 | extension that's injected into a page wants to get some capabilities, it would |
| 428 | talk to its service worker, who would then have the capability to ask for |
| 429 | certain extension APIs to maybe understand all the tabs that are in your |
| 430 | system, depending on what permissions it was granted. And then finally, with |
| 431 | extensions, you also have the extension button and a dropdown that can occur |
| 432 | there, which a web page can be drawn there by the extension. And that's going |
| 433 | to be hosted in a renderer process, too. But that would be a web page that |
| 434 | lives at a chrome-extension: URL. And so you have these different pieces |
| 435 | of the extension model where code from the extension can be running, and it, |
| 436 | via some messaging channel, can talk to the other parts of itself that run in |
| 437 | potentially likely different processes. |
| 438 | |
| 439 | 30:37 SHARON: You mentioned service workers there, and those are kind of |
| 440 | related to all this, too. So can you tell us a bit more about those? |
| 441 | |
| 442 | 30:43 DARIN: Yes, so - well, OK, so backing up, in the context of extension, if |
| 443 | we talk about background page first, the original idea with extensions was, OK, |
| 444 | I'm injecting stuff into pages so I can modify things, but I also need like my |
| 445 | home base. I need my context where - I need a place where my persistent script |
| 446 | is running or where I can manage my databases, and I have just one place for |
| 447 | that. And it's also a place where I can get elevated permissions to access |
| 448 | other Chrome extension APIs. So that idea of a background page that the |
| 449 | extension can create that's ever present so it's like a web page, but it's |
| 450 | hidden, it's in the background, and content scripts that are injected into web |
| 451 | pages can talk to it. So they can say, oh, I'm on this page. Give me some rules |
| 452 | that I should apply to it or something, depending on the nature of that |
| 453 | extension. OK, so but background pages are, unfortunately, persistent. And they |
| 454 | live for the whole life of the browser. And they use up memory. They use up |
| 455 | resources, even if nothing else about the extension needs doing. Even if the |
| 456 | extension is not loaded into any web pages, that background page is sitting |
| 457 | there. And so this was [INAUDIBLE] quickly realized, this is not great. This is |
| 458 | a waste of resources for the system. We should have some policy for how we |
| 459 | should close that background page down and only need to create it when |
| 460 | necessary. In the context of, I think, Chrome apps, which is a thing that's no |
| 461 | longer a thing, we created this concept called event pages, which allowed for |
| 462 | these background pages to be a little more transient, that come into being only |
| 463 | as needed, which is a much more efficient approach. |
| 464 | |
| 465 | 32:28 DARIN: However, when it came time to bring that to extensions, at the |
| 466 | same time, Service Worker had been created, which was a tool for web pages to |
| 467 | be able to do background event processing. So the decision was to adopt that |
| 468 | standards-based approach to how to do background processing. And so Service |
| 469 | Worker is the construct that Manifest V3 allows extensions to use for that sort |
| 470 | of background processing. Big difference between service workers are that they |
| 471 | are not web pages. They're just JavaScript. But they can listen to different |
| 472 | kinds of events. So just like a web worker, shared worker, service worker, they |
| 473 | are without UI. They are without any HTML. They just have the ability to - but |
| 474 | they have some functions that are given to them on the global scope that lets |
| 475 | them talk to the outside world, to talk to the web page that created them, or |
| 476 | in the case of Service Worker, they actually have events they can receive to |
| 477 | handle network requests on behalf of the page. That's one of the main uses for |
| 478 | them in the context of the web. A web page would have a Service Worker register |
| 479 | it with the browser to say, hey, please contact my service worker if you are |
| 480 | making a request for my origin. And that gives the Service Worker the |
| 481 | opportunity to specify what content should be used to satisfy a URL. It could |
| 482 | load that content out of a cache, and the Service Worker API includes APIs for |
| 483 | managing caches and things like this. So all of that system that was built to |
| 484 | kind of enable web pages to operate more robustly in the context of poor |
| 485 | network connectivity or to get performance improvements for applications that |
| 486 | are more single page applications that have a basic fixed shell that should |
| 487 | load out of cache and then they make network requests to the server to get the |
| 488 | data that populates some application UI, that model Service Worker was really |
| 489 | designed for. But it seemed a very good fit for extensions. And it gets us out |
| 490 | of the world of having these persistent extension background pages. So Manifest |
| 491 | V3 says, if you want your content script to have access to privileged things, |
| 492 | you go through a system, a Service Worker. And the Service Worker will get |
| 493 | spawned in a renderer process. What renderer process? You don't know. It's up |
| 494 | to the system. Chrome will make a decision there based on all of its usual |
| 495 | rules around what other origins are in that process, thinking from a security |
| 496 | isolation perspective, and so on, and so forth. |
| 497 | |
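As a rough illustration of the request interception Darin describes, here is a toy Python model. It is not the real Service Worker API (which is JavaScript and event-based); `ToyServiceWorker`, `precache`, `handle_fetch`, and `fake_network` are hypothetical names invented for this sketch. It only shows the idea of answering from a cache when possible and falling back to the network otherwise.

```python
from typing import Callable, Dict, List


class ToyServiceWorker:
    """Toy stand-in for a service worker that sees requests for one origin."""

    def __init__(self, origin: str, network: Callable[[str], str]) -> None:
        self.origin = origin
        self.network = network            # hypothetical "ask the server" function
        self.cache: Dict[str, str] = {}   # URL -> cached response body
        self.log: List[str] = []          # records where each response came from

    def precache(self, url: str, body: str) -> None:
        """Store a response ahead of time, e.g. the fixed application shell."""
        self.cache[url] = body

    def handle_fetch(self, url: str) -> str:
        """Serve from the cache if possible, otherwise fetch and remember it."""
        if url in self.cache:
            self.log.append(f"cache:{url}")
            return self.cache[url]
        self.log.append(f"network:{url}")
        body = self.network(url)
        self.cache[url] = body
        return body


def fake_network(url: str) -> str:
    """Pretend server used only for this example."""
    return f"<data for {url}>"


worker = ToyServiceWorker("https://example.test", fake_network)
worker.precache("https://example.test/shell.html", "<app shell>")

shell = worker.handle_fetch("https://example.test/shell.html")
first = worker.handle_fetch("https://example.test/api/items")
second = worker.handle_fetch("https://example.test/api/items")
print(worker.log)
```

The shell is served from the cache without touching the network, which is the "fixed shell that should load out of cache" case mentioned above; the data request goes to the network once and is cached afterwards.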
| 498 | 35:22 SHARON: Cool. A lot of these process types have been added over time as |
| 499 | the need for them arises. Like, oh, we want to put network stuff in a separate |
| 500 | process. So apart from adding more process types, what have been other big |
| 501 | changes to the multi-process architecture and processes in Chrome in the many |
| 502 | years since launch? |
| 503 | |
| 504 | 35:44 DARIN: The biggest one by far is the per site isolation, the site |
| 505 | isolation work that was done. |
| 506 | |
| 507 | 35:51 SHARON: We'll talk about that more next. |
| 508 | |
| 509 | 35:56 DARIN: Yeah, so, I mean, well, I'll just say, so Charlie Reis was an |
| 510 | intern on Chrome team back in the day during the pre-release period of Chrome. |
| 511 | And I remember the conversations where we were like, gee, wouldn't it be nice |
| 512 | if instead of isolating based on per tab, it was isolating per origin? And I |
| 513 | think he was doing research on that topic, too. And he had all these ideas for |
| 514 | this kind of a thing. And so it was really kind of very early on that we were |
| 515 | having these conversations. But even very early on, it was like, this is going |
| 516 | to be a big change, you know? No longer is it the idea that it's a big change |
| 517 | to the rendering engine itself, like how frames could be served by different |
| 518 | processes. So in order to isolate based on origin, you have to say a frame |
| 519 | where an ad might live would actually have to be served by the process for that |
| 520 | origin. And so now no longer is the whole frame tree just in one process. |
| 521 | That's a big change. But built on top of the infrastructure we had, it was |
| 522 | possible to imagine it, and it was quite a journey to get there. So that was |
| 523 | probably the biggest change to the architecture. But like I mentioned before, |
| 524 | actually, other big changes were definitely the introduction of the GPU |
| 525 | process, definitely the introduction of Mojo IPC. Before Mojo IPC, the way |
| 526 | things worked was, basically, messaging was much simpler, in some ways, easier |
| 527 | to understand, but also much more the case that there were these files that |
| 528 | really needed to know about everything in their world, like the render process |
| 529 | host and the render process, the render view host and the render view, the |
| 530 | render frame. The render frame host didn't exist then, but they came about |
| 531 | because of site isolation, really. But the render view, render view host became |
| 532 | this thing that represented the web page, and render view host in the browser, |
| 533 | render view in the renderer. And for any feature that required brokering out to |
| 534 | the browser to get access to something, essentially, the render view, the |
| 535 | render view host had to be participants in that because they had to be kind of |
| 536 | routers for that traffic. That's not very scalable. You start adding lots of |
| 537 | engineers, building lots of different features that need lots of different |
| 538 | capabilities. And these files start growing hairs and knowing about too many |
| 539 | things. And it becomes really hard to manage. |
| 540 | |
| 541 | 38:38 DARIN: On top of that, you start to have things where you say, gee, I |
| 542 | really wish this system could live in a different process. I mentioned the |
| 543 | networking process. All these events were coming through these different kinds |
| 544 | of crossroads of hell files. That was how I liked to call them. And in order to |
| 545 | take a subset of that and move it to a different process, now you have to redo |
| 546 | all that plumbing. And so the amount of layers of repeating yourself for |
| 547 | plumbing IPCs felt very out of control for - maybe how much work you had to do |
| 548 | to unlock a certain feature just seemed out of control. And so Mojo really was |
| 549 | inspired by how to eliminate a lot of that, to have a system that's more |
| 550 | endpoint to endpoint-based and all the flow of data would no longer be |
| 551 | dependent on all of these kinds of routing classes that handled all this |
| 552 | routing. And instead, you could just say, I have an endpoint. I have an |
| 553 | endpoint over here. This one's privileged. This one's not. And if I want this |
| 554 | one to live over here, I can do that. I can just move it around freely. And all |
| 555 | the routing is taken care of for me. And so that was a big change. And there's |
| 556 | many artifacts in the code base that sort of reveal the old system, right? In |
| 557 | many ways in which the product is built still resembles that old system. The |
| 558 | idea that if you look at a render view, render view host, there's an ID, a |
| 559 | routing ID associated with that. The concept of routing IDs is not needed in |
| 560 | Mojo anymore because the pipe itself, the Mojo pipe is like an identifier, in |
| 561 | some sense. Of course, so much of our system is built up around the idea that |
| 562 | tabs have these render view IDs, and frames have render frame IDs, and |
| 563 | processes have process IDs. And so many systems deal with those integers that |
| 564 | it's been unthinkable to not have those anymore. But in some sense, they aren't |
| 565 | really needed. If we were to build things from scratch from anew with the Mojo |
| 566 | system, you wouldn't need it. |
| 567 | |
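The contrast between routing IDs and endpoint-to-endpoint pipes can be sketched with a toy Python model. This is not Mojo's API or Chromium's legacy IPC code; `CentralRouter`, `Endpoint`, and `create_pipe` are hypothetical names for illustration only.

```python
from collections import deque
from typing import Callable, Deque, Dict, Optional, Tuple


class CentralRouter:
    """Old style (toy): one object that has to know every routing ID."""

    def __init__(self) -> None:
        self.handlers: Dict[int, Callable[[str], str]] = {}

    def register(self, routing_id: int, handler: Callable[[str], str]) -> None:
        self.handlers[routing_id] = handler

    def dispatch(self, routing_id: int, message: str) -> str:
        # Every message passes through here and is looked up by integer ID.
        return self.handlers[routing_id](message)


class Endpoint:
    """One end of a toy message pipe; the pipe itself identifies the peer."""

    def __init__(self) -> None:
        self.inbox: Deque[str] = deque()
        self.peer: Optional["Endpoint"] = None

    def send(self, message: str) -> None:
        assert self.peer is not None, "endpoint is not connected"
        self.peer.inbox.append(message)

    def receive(self) -> str:
        return self.inbox.popleft()


def create_pipe() -> Tuple[Endpoint, Endpoint]:
    """Create two connected endpoints; no routing ID is involved."""
    a, b = Endpoint(), Endpoint()
    a.peer, b.peer = b, a
    return a, b


# Routing-ID style: the router must be taught about view 7.
router = CentralRouter()
router.register(7, lambda msg: f"view 7 got {msg}")
routed = router.dispatch(7, "paint")

# Pipe style: whoever holds the other endpoint receives the message.
renderer_end, browser_end = create_pipe()
renderer_end.send("read cookie")
piped = browser_end.receive()
print(routed, "|", piped)
```

In the pipe style, moving a feature elsewhere amounts to handing its endpoint to different code; nothing in the middle has to be updated, which is the property Darin attributes to Mojo.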
| 568 | 40:50 SHARON: Do you think if you were to start to redesign the whole |
| 569 | multi-process thing now, given how not just the internet is used, but also the |
| 570 | devices that are out there, I think you would probably want to have multiple |
| 571 | processes for things. But do you think there would be significant changes to |
| 572 | how the system overall is designed or put together if one were to start now? |
| 573 | |
| 574 | 41:16 DARIN: Well, yeah, I mean, it's always a question of where you're |
| 575 | starting from and what the constraints are that you're dealing with. We were |
| 576 | dealing with taking WebKit, which we didn't really have a lot of ownership of. |
| 577 | And it was open source, but we also had limited bandwidth to go and fork it and |
| 578 | manage that fork. And so to kind of try to create multi-process in the context |
| 579 | of this big significant piece that we really can't change or do much about |
| 580 | definitely limited us. So we had early ambitions and ideas. Like I said with |
| 581 | Charlie about site isolation, it wasn't going to be then that we could realize |
| 582 | it. It needed to be in a place where we had ownership of Blink. And not just |
| 583 | ownership, I mean capability to go and change it and to own the consequences of |
| 584 | changing it, to be able to manage that. We needed that, and we needed a lot of |
| 585 | other pieces. So if I'm starting over, I also have to - it's sort of like, |
| 586 | well, what am I starting from, right? But certainly, I feel like a lot of |
| 587 | lessons along the way inspired Mojo and the design there. And I feel like |
| 588 | that's a system that that sort of system would allow for an architecture that I |
| 589 | think would be better in many ways. And I'm very biased because that's |
| 590 | something I've worked on, and it was inspired by things I saw that weren't |
| 591 | great about the way that we built Chrome originally, although, in many ways, |
| 592 | the original setup with Chrome was born of pragmatism and minimalism in many |
| 593 | ways, trying to achieve - Chrome was very focused on being a product first, not |
| 594 | a browser construction kit. And so the idea that it needed to morph into a lot |
| 595 | of different things wasn't there in the beginning. In the beginning, it was, |
| 596 | you're just building a browser for Windows XP Service Pack 2. That's it, |
| 597 | nothing else. OK, now Vista. You got to worry about Vista, too, sorry. But just |
| 598 | that's it. And then later on, you add Mac. You add Android. You add Chrome OS, |
| 599 | iOS, Chromecast, et cetera, et cetera. And suddenly your world is very |
| 600 | complicated, and the needs of this system are way more. And the value of |
| 601 | malleability becomes higher. Look at the investment in views, et cetera, to |
| 602 | allow cross-platform UI, and then Mojo to allow a much more flexible system |
| 603 | under the hood. So it depends on your constraints in a lot of ways. |
| 604 | |
| 605 | 43:43 SHARON: Yeah, that makes sense. Something you said about even now in the |
| 606 | code base, you can see remnants or suggestions maybe of how things |
| 607 | things used to be. So one of the things that makes me think of is about the IO |
| 608 | and UI threads because I feel like people used to talk about those more. And |
| 609 | now that's maybe changing a bit. So how come these are the only times we hear |
| 610 | the term "thread," really, in all of this? And what are the IO and UI threads? |
| 611 | Can you just tell us a bit about them? |
| 612 | |
| 613 | 44:20 DARIN: Oh, yeah, threading is a super fun topic. Now we have all these |
| 614 | task runner concepts and systems for giving you a task runner that's on an |
| 615 | isolated thread or whatever. And systems like Mojo allow you to not really have |
| 616 | to do a lot of plumbing to compensate for your choice of thread where you want |
| 617 | something to run. You can just indicate where it should go, and that happens. |
| 618 | But OK, originally, the design of the system was there was a UI thread, and |
| 619 | that's where all the UI lives. So the HWNDs, the Window handles and all the |
| 620 | Win32 stuff would go there. Input and painting come in there. Then there was - so |
| 621 | early on, I like to tell this story because one of the very first versions of |
| 622 | Chrome, we had just that UI thread sending data to renderer processes. And |
| 623 | the renderers would have their main thread where they ran JavaScript and |
| 624 | everything. So there was just these two threads in two different processes. |
| 625 | That was kind of it. In the browser process, there might have been the system |
| 626 | was probably doing a lot of other stuff with its networking stack and DNS |
| 627 | threads and such. But we weren't doing any. That wasn't us. That was probably |
| 628 | libraries we were using. So we had these two threads in two different processes |
| 629 | and an IPC channel. And so you send the input down to the renderer. The renderer |
| 630 | sends you a bitmap. OK, Google Maps. Imagine Google Maps. And imagine you're on |
| 631 | a single core, non hyper-threaded laptop. And you take your mouse, and you |
| 632 | click on that map, and you start dragging it around. And you expect to see the |
| 633 | image tiles moving around, right? And but for some reason, in Chrome, on that |
| 634 | device, [SNAP] nothing happens. You just move your mouse around, and the image |
| 635 | is stuck there. You're like, what's going on? It works fine on this other |
| 636 | laptop. Why not on this laptop? Turns out that on that device, in that setup, |
| 637 | the input stream was coming in. And basically, we were sending all this input, |
| 638 | and the input events were taking priority in the Windows Event pump over any |
| 639 | painting and/or reading from our IPC channels. And so, as a result, we were |
| 640 | just sending input events to the renderer. It was doing work, generating new |
| 641 | images. Those images were coming to the browser and backed up in some pipe and |
| 642 | not really being serviced, not really making their way. And so we kind of came |
| 643 | to the realization of several things. One is, we need to throttle that input |
| 644 | going to the renderer, but we also probably need to have some highly responsive |
| 645 | IO threads that could be dedicated to servicing the pipes, the channels, the |
| 646 | IPC channels, both in the browser and the renderer, actually. And so what was |
| 647 | born from that was the IO thread. And the IO thread was meant to be highly |
| 648 | responsive thread for processing asynchronous IO. That's really what its name |
| 649 | should be - highly responsive, non-blocking IO thread - because the name IO |
| 650 | thread subsequently confused lots of people who wanted to do blocking IO on |
| 651 | that thread, like read a file or something. And we had to put in some |
| 652 | restrictions in the code to always let you know not to - that this function is |
| 653 | going to - there's certain runtime assertions if you try to use certain |
| 654 | blocking IO functions in base on the wrong threads. And alongside that, we |
| 655 | invented something called the file thread. Said, this is the thread where you |
| 656 | read files. This is the thread where you write files because we don't want you |
| 657 | doing that on the UI thread because the UI thread needs to be responsive to |
| 658 | user input. So don't do blocking file IO on the UI thread. Don't do it on the |
| 659 | IO thread either. Do it on the file thread. So - |
| 660 | |
| 661 | 48:14 SHARON: That means they're all running in the browser process. |
| 662 | |
| 663 | 48:20 DARIN: In the browser process. The renderer got its own IO thread, too. |
| 664 | So the renderer would have its main WebKit thread and its IO thread. So it was |
| 665 | sort of a symmetric system. You had IPC channel, which was wrapped with a class |
| 666 | called `ipc_channel_proxy`. These things still exist in the code base. And |
| 667 | ChannelProxy was a way to use an IPC channel from a different thread. But the |
| 668 | IPC channel would be bound to the IO thread. All of those things I just |
| 669 | mentioned still exist, and Mojo was built on top of those channels. But the IPC |
| 670 | channel provides that underlying pipe. So it's kind of IPC channel is |
| 671 | one-to-one with an OS pipe. Mojo has this concept of pipes which are more like |
| 672 | virtual pipes, and they're multiplexed over OS pipe, over an OS pipe. |
| 673 | |
| 674 | 49:08 SHARON: OK. Yeah, because I think, yeah, now you hear non-blocking IO, |
| 675 | but I feel like maybe it's just what part of the code base you work in. But |
| 676 | running things, making sure things run on the right thread seems to be less of |
| 677 | a problem than it used to be. |
| 678 | |
| 679 | 49:27 DARIN: Yes. I think there's a lot of reasons for that, a lot of maturity |
| 680 | in the system. But also, I think some of the primitives are set up nicely so |
| 681 | that you can more easily have things running. In some ways, we used to have |
| 682 | this concept of, yeah, we very much had this. Still, in some ways, still have |
| 683 | this, but the idea that there is a UI thread, that there's an IO thread, and |
| 684 | that there is a file thread, and you pick which thread you're going to use. |
| 685 | Now, there's a whole pool of blocking IO threads. And you don't specifically |
| 686 | say, I want the file thread. You say, I have blocking IO I want to do, or give |
| 687 | me a - I want to put it on a thread pool. The IO thread used to be like where - |
| 688 | it may be still the case that some systems would just live there only because |
| 689 | maybe for latency reasons - like, cookies is a good example. We knew that we |
| 690 | wanted to be able to respond quickly to the renderer if it was querying a |
| 691 | cookie database. So we want to be able to service that directly on the IO |
| 692 | thread. And so there'd be a collection of these things that were maybe somewhat |
| 693 | sensitive, and but we wanted to have them live and be on the IO thread. And so |
| 694 | that idea of some things live on the IO thread was born. But I think those |
| 695 | things are few. And you really have to highly justify why you should be on that |
| 696 | thread. And so most things don't need to be. Just be on the UI thread. It's OK. |
| 697 | Or structure your work so that the part that is expensive and blocking goes to |
| 698 | a blocking queue. |
| 699 | |
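The advice to move expensive, blocking work off the responsive thread and onto a pool can be illustrated with Python's standard `concurrent.futures` module. This is only an analogy for the pattern, not Chromium's task-runner API; `read_file_blocking` and the `blocking-io` thread name prefix are made up for this sketch.

```python
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple


def read_file_blocking(path: str) -> Tuple[str, str]:
    """Blocking file read; returns the worker thread's name and the contents."""
    with open(path, "r", encoding="utf-8") as f:
        return threading.current_thread().name, f.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "prefs.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("hello")

    # The pool plays the role of "a whole pool of blocking IO threads":
    # the caller does not pick a specific thread, it just hands off the work.
    with ThreadPoolExecutor(max_workers=2, thread_name_prefix="blocking-io") as pool:
        future = pool.submit(read_file_blocking, path)
        # The calling thread stays free for other work until it needs the result.
        worker_name, contents = future.result()

main_name = threading.current_thread().name
print(worker_name, contents, main_name)
```

The read runs on a pool thread rather than on the calling thread, so a thread that must stay responsive never blocks on the disk itself.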
| 700 | 51:00 SHARON: So partly for these threads, sometimes you see checks. Like, |
| 701 | check that this is running on a certain thread. But in general, is there a good |
| 702 | way to find out what process a certain block of code runs on? Because some |
| 703 | things we know - if you go to third_party/blink, whatever, you kind of know |
| 704 | that that's going to run in a render process, but just looking at the code, |
| 705 | like looking in code search, can you know where something is going to - |
| 706 | |
| 707 | 51:25 DARIN: [INAUDIBLE] very early on to try to deal with this. So like if you |
| 708 | go to the content directory, it's a good one to look at. You'll see a browser |
| 709 | directory, subdirectory, a renderer subdirectory, and a common directory. And |
| 710 | there's some other ones that have these familiar names. We use that structure |
| 711 | all throughout the code base for different components. So if you go to components, |
| 712 | components/foo, you'd see browser, renderer, common, maybe a subset of those, |
| 713 | depending on. And so the idea is, if it's code that should only run in the |
| 714 | renderer, it lives in the renderer directory. If it's code that should only run |
| 715 | in the browser, it lives in the browser directory. If it's code that could run |
| 716 | in either, it lives in the common directory. So you'll see mojom definitions in |
| 717 | common directories because mojom is where you define the Mojo interface that's |
| 718 | going to be used in both processes. |
| 719 | |
| 720 | 52:12 SHARON: Oh. |
| 721 | |
| 722 | 52:12 DARIN: Yeah, we also have this code separation was also kind of born out |
| 723 | of this idea at one point in time that we might generate a totally different |
| 724 | binary for browser and renderer. And we used to have browsR. I'm calling it |
| 725 | that way because it didn't have an E at the end, so browsR and capital R, and |
| 726 | then rendR or something like this. And these were the two processes, the two |
| 727 | executables. And they could just compile whatever code they needed for their |
| 728 | purpose. Like WebKit would be in the renderer, and browser would have not |
| 729 | WebKit. It would have other things. And so these separate directories also |
| 730 | helped because it was like, that's the code that's going to go into that |
| 731 | process literally. And fast forward when Sandbox came along, the team was like, |
| 732 | nope, it's got to be the same executable for both browser and renderer and |
| 733 | should probably be called chrome.exe instead. And then that idea kind of that |
| 734 | they were separate executables and separate code kind of went away. And |
| 735 | instead, all the code for Chrome went into just this big DLL on Windows. And |
| 736 | the amount of shared code between the EXE and the DLL is very small, maybe a |
| 737 | little bit from base and such. But yeah, this idea of tagging the directory |
| 738 | structure in such a way that makes it obvious of like what process this code |
| 739 | belongs in, I think it was a big help, and it was a good choice. And it gives |
| 740 | people a little clarity of where they are and what they can use. |
| 741 | |
| 742 | 53:49 SHARON: What about for non-browser renderer processes? What about GPU |
| 743 | network? How do you know that this is running on the network process versus |
| 744 | this is how this part of this section of the code is interacting maybe with the |
| 745 | network process? |
| 746 | |
| 747 | 54:05 DARIN: Sometimes it can be a little bit of good luck. And sometimes it |
| 748 | might not be as obvious. I don't think this sort of - this structure that I |
| 749 | described was used for plugins, so there's a plugins directory, which may still |
| 750 | be around in some fashion or might be mostly gone. I don't know if when the |
| 751 | network process transition occurred, if this annotation was really maintained. |
| 752 | I actually don't think it was because I don't remember seeing network |
| 753 | directories. But I could be wrong. There might be some of them. I'm not as |
| 754 | familiar with the code for the networking process. But I think this convention |
| 755 | has helped us a lot and would be valuable to use in more places. For GPU, |
| 756 | there's a lot of symmetric code, probably code that runs in all processes, but |
| 757 | still this convention probably would make sense. But yeah, I think that for |
| 758 | some of those things, when you get like into the network world or you get into |
| 759 | the GPU world, you're also kind of in a more focused world, a smaller world. |
| 760 | And there's probably many other things you have to learn about that domain. |
| 761 | |
| 762 | 55:16 SHARON: Yeah, the GPU stuff seems very, very difficult. And I certainly |
| 763 | don't know how that works. OK, so - |
| 764 | |
| 765 | 55:23 DARIN: [INAUDIBLE] on there. |
| 766 | |
| 767 | 55:23 SHARON: Yes, so when it comes to process limits and performance and all |
| 768 | that kind of thing, so we have process limits, but you can go over them. And |
| 769 | can you tell us a bit about process limits, how they work, what happens when |
| 770 | you reach the limit? |
| 771 | |
| 772 | 55:39 DARIN: Hmm, yeah. So process limits, they exist to just have a reasonable |
| 773 | number of processes allocated for some definition of reasonable. At least early |
| 774 | on, that definition was based on how much RAM you had on your system. And as |
| 775 | computers got more and more RAM, that definition needed to be adjusted. We |
| 776 | assumed some overhead for individual processes. It's probably wise to put some |
| 777 | limits on how many we create. The allocation of those processes, it's best to - |
| 778 | kind of viewed as best to distribute the tabs across them as best as we can and |
| 779 | the origins across them now and the site isolation world to give more isolation |
| 780 | between different origins, to give more isolation between the different apps. |
| 781 | But at some level, you run out, and you need to now allocate across the ones |
| 782 | that are already in use. There's some hard rules around privileged content, |
| 783 | like chrome: URLs. They should not mix with ordinary web pages. But if |
| 784 | push comes to shove, we'll put a whole bunch of different origins content |
| 785 | together into the same process, just ordinary web pages, not trusted content. |
| 786 | |
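A toy Python sketch of the allocation policy described above: one process per site until a soft limit is reached, then reuse of existing processes, with privileged content never sharing a process with ordinary pages. This is not Chrome's actual policy code. `ToyProcessAllocator` is a hypothetical name, and both the "least loaded process" tie-break and the choice to exceed the limit for privileged content are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyProcess:
    privileged: bool
    sites: List[str] = field(default_factory=list)


class ToyProcessAllocator:
    """Assigns sites to processes under a soft process limit (toy model)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.processes: List[ToyProcess] = []

    def assign(self, site: str, privileged: bool = False) -> int:
        """Return the index of the process that will host `site`."""
        # Under the limit: every site gets a fresh process.
        if len(self.processes) < self.limit:
            self.processes.append(ToyProcess(privileged, [site]))
            return len(self.processes) - 1

        # At the limit: only reuse a process with the same privilege level.
        candidates = [
            i for i, p in enumerate(self.processes) if p.privileged == privileged
        ]
        if not candidates:
            # Assumption: go over the limit rather than mix privilege levels.
            self.processes.append(ToyProcess(privileged, [site]))
            return len(self.processes) - 1

        # Assumption: pick the least loaded compatible process.
        best = min(candidates, key=lambda i: len(self.processes[i].sites))
        self.processes[best].sites.append(site)
        return best


allocator = ToyProcessAllocator(limit=2)
placements = [
    allocator.assign("https://a.test"),
    allocator.assign("https://b.test"),
    allocator.assign("https://c.test"),                   # limit hit, shares
    allocator.assign("chrome://settings", privileged=True),
    allocator.assign("https://d.test"),
]
print(placements)
```

Once the limit is hit, ordinary sites start sharing processes, while the privileged page still ends up in a process of its own.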
| 787 | 56:52 SHARON: What happens if you just open a ton of tabs with a whole bunch of |
| 788 | different pages open, and you're basically stress testing what Chrome can do? |
| 789 | What happens in that case? |
| 790 | |
| 791 | 57:08 DARIN: It creates a lot of processes. It uses a lot of system resources. |
| 792 | It uses a lot of RAM. I think that this has been, I'd say, a battle for Chrome |
| 793 | across a lot of its lifetime and more recently, is how to manage these extreme |
| 794 | cases. And increasingly, these extreme cases are not actually odd or unusual. |
| 795 | They'll do a lot of browsing. People click on a lot of links. People create a |
| 796 | lot of tabs. People don't really close their browsers. They just leave it |
| 797 | running. And they come back the next day, and they continue where they left |
| 798 | off. And they open more tabs, and they do more surfing. And they just collect |
| 799 | and collect and collect tabs. And maybe they create more windows because maybe |
| 800 | they have some task that they're researching, and then they get interrupted and |
| 801 | they come back to it later. But they start to accumulate these windows full of |
| 802 | things that maybe they mean to come back to. And so that problem of just having |
| 803 | lots and lots of stuff and lots and lots of processes, well, Chrome under the |
| 804 | hood is like, I'll do my best. You wanted me to do all this stuff. I'm going to |
| 805 | do it. Let's see what I can do. And on a system like Windows or Mac where |
| 806 | there's a lot of RAM maybe, Chrome's thinking, OK, you wanted me to use the |
| 807 | RAM. I'm going to use the RAM. You wanted all those tabs. And then even on |
| 808 | those systems where maybe you're running out of RAM, but there's virtual |
| 809 | memory, there's disk space, all right, let's use it. Let's go. And so I think |
| 810 | it's really quite a challenge, actually. |
| 811 | |
| 812 | 58:44 DARIN: The original idea of Chrome was, yeah, make it possible for web |
| 813 | pages to take advantage of the resources of your computer. Let it allow web |
| 814 | pages to be more capable because of it, and not be - the old world prior to |
| 815 | Chrome was single-threaded browser, all web pages on the same thread. Like, you |
| 816 | could have a dual core machine, and it wouldn't matter. It wouldn't make your |
| 817 | browser any faster. But now with Chrome, no problem. You got dual core. You got |
| 818 | eight cores, whatever you got. We can have all of those things saturated with |
| 819 | work and allow you to multitask on the web and do lots of amazing things. But I |
| 820 | think it's still a resource management challenge for the browser because on one |
| 821 | hand, you want to give that capability, but on the other hand, you also don't |
| 822 | want to - how much power should you be using? What if the laptop's not plugged |
| 823 | into the wall? What if it's just running on battery? What is the right resource |
| 824 | utilization for Chrome? I don't think that's a solved problem at all. There's |
| 825 | various systems in place to throttle the resource utilization of background |
| 826 | tabs. Timers, for a long time now, have been throttled, but throttling other |
| 827 | things. I know there was a lot of research done into freezing tabs, so |
| 828 | literally suspending them and not letting them do any work. But with that comes |
| 829 | challenges of what do you do with all the IPCs that are inbound to those |
| 830 | processes? They're backing up on pipes, and that's not great. If you unfreeze |
| 831 | them, now there's a blast of IPCs coming in that they suddenly have to service. |
| 832 | That doesn't seem great. Do you drop those IPCs on the floor? Probably not. |
| 833 | Now, the process would be in some weird state, and you might as well have to |
| 834 | just kill it, which, of course, is the case on systems like Chrome OS and |
| 835 | Android. They do have to just kill the processes because of the limits of those |
| 836 | devices. So, yeah, I've been a proponent of just being aggressive about killing |
| 837 | processes on desktop in general. I think there's some balance there that's |
| 838 | right. It's probably not right to keep all the tabs open, all the processes |
| 839 | open. We should be, I think, judicious about what we keep open, keeping the |
| 840 | workload reasonable, instead of making it like a, oh, yeah, I will rise to the |
| 841 | challenge of dealing with thousands of tabs or thousands of web pages across |
| 842 | 100 processes, even if - maybe it's somehow possible through heroic effort to |
| 843 | make Chrome capable of doing such a thing in an efficient manner. But does it |
| 844 | mean we should? Who needs 1,000 tabs all running around doing work at once, you |
| 845 | know? You don't. You really don't. Nobody does. |
| 846 | |
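The freezing trade-off Darin describes can be sketched as a toy model. This is a minimal illustration, not Chromium code: `TabProcess`, `max_backlog`, and the kill-on-overflow policy are all hypothetical names and assumptions chosen to show why a frozen process either faces a burst of queued IPCs on resume or ends up being killed.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TabProcess:
    """Hypothetical stand-in for a renderer process with an inbound message pipe."""

    name: str
    max_backlog: int  # assumed cap on messages queued while frozen
    frozen: bool = False
    alive: bool = True
    inbox: deque = field(default_factory=deque)
    handled: list = field(default_factory=list)

    def deliver(self, message: str) -> None:
        """Deliver one inbound message, queueing it if the process is frozen."""
        if not self.alive:
            return  # a killed process receives nothing
        if not self.frozen:
            self.handled.append(message)
            return
        if len(self.inbox) >= self.max_backlog:
            # Dropping individual messages would leave the process in an
            # inconsistent state, so this sketch kills the process instead.
            self.kill()
            return
        self.inbox.append(message)

    def freeze(self) -> None:
        self.frozen = True

    def unfreeze(self) -> int:
        """Resume and service the whole backlog at once; return the burst size."""
        self.frozen = False
        burst = len(self.inbox)
        while self.inbox:
            self.handled.append(self.inbox.popleft())
        return burst

    def kill(self) -> None:
        self.alive = False
        self.frozen = False
        self.inbox.clear()


tab = TabProcess(name="background-tab", max_backlog=3)
tab.freeze()
for i in range(3):
    tab.deliver(f"msg-{i}")
burst = tab.unfreeze()  # every queued message is serviced in one burst

tab.freeze()
for i in range(4):  # the fourth message exceeds the assumed cap
    tab.deliver(f"late-{i}")
```

In this model the first freeze ends with a burst of three messages on resume, and the second ends with the process killed once the backlog passes the cap, which mirrors the two outcomes discussed above.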
| 847 | 61:32 SHARON: So this is kind of the basis of the goal for Arc, right, which |
| 848 | is I think it closes your tabs overnight or something. And Arc is what you work |
| 849 | on now and is a Chromium-based browser. So for embedders of Chromium, let's say |
| 850 | the browser kind, how much control do you have over how processes are used, |
| 851 | allocated, if you embed content? Like, are you able to just say, oh, I don't |
| 852 | want a network process. I will just put this all in the browser process. Can |
| 853 | you do that? |
| 854 | |
| 855 | 62:07 DARIN: Hm. You can do anything you want. It's just code. No, but as a |
| 856 | browser embedder, as a Chromium embedder, you're shipping Chromium. So Arc |
| 857 | browser ships a copy of Chromium. And Arc browser includes changes to Chromium |
| 858 | as needed to make it work. Of course, that's possible. Of course, you could |
| 859 | change a lot of stuff and make a big headache to manage it all, right? So |
| 860 | there's some natural limits. You don't want to change too many things, or else |
| 861 | you won't be able to really manage it going forward. You want to take updates |
| 862 | from the mainline, incorporate improvements, but you also want to preserve some |
| 863 | differences that you've made. Well, how do you do that? And so change |
| 864 | management is a challenge. So there's a natural limit to how much you want to |
| 865 | alter the base functionality. Instead, it's - anyways, a product like Arc is |
| 866 | not so much differentiating on the basis of Chromium code or content layer. |
| 867 | It's not really its purpose or goal. Its purpose is to differentiate at the UI |
| 868 | layer and with things like what you mentioned and other things as well. Yeah, |
| 869 | and so, of course, if one were to go down the path of could we optimize process |
| 870 | model better, that would be in the realm of things that would be great to |
| 871 | contribute to Chromium, so that it could be part of the mainline and therefore |
| 872 | not be something that you have to maintain yourself. That's how I would |
| 873 | approach it as a Chromium embedder. |
| 874 | |
| 875 | 63:47 SHARON: OK, that makes sense. Yeah, if it's in Chromium, you don't have |
| 876 | to worry about the updates, and you just get - |
| 877 | |
| 878 | 63:53 DARIN: Turns out there's an army of engineers who would make sure it's |
| 879 | never broken. You just gotta write some tests. |
| 880 | |
| 881 | 63:59 SHARON: Oh, wow. |
| 882 | |
| 883 | 63:59 DARIN: [INAUDIBLE] those tests. |
| 884 | |
| 885 | 64:05 SHARON: So with non-browser embedders of Chromium, like, say, Electron, I |
| 886 | don't know how familiar you are with that, but they presumably would have |
| 887 | different needs out of how Chromium works, basically. I don't know if you know |
| 888 | what they're doing with any processes. |
| 889 | |
| 890 | 64:25 DARIN: I mean, I've used VS Code. That's a famous example of a Chromium |
| 891 | embedder that you might not realize is using Chromium or built on top of it, |
| 892 | that one might not realize that. But if you open up Task Manager and you look |
| 893 | at VS Code, you'll see all the glorious processes under there. And so have they |
| 894 | or Electron or any of these, have they altered things there? Maybe. I mean, |
| 895 | there's some configuration one might do. If you're building an application |
| 896 | that's very single purpose, like VS Code or Slack or - what are some other good |
| 897 | examples, there's quite a few that are built on top of Chromium - they're more |
| 898 | single purpose towards a single app, right? Of course, VS Code is pretty |
| 899 | sprawling with all the things you can do in it, but at the same time, it could |
| 900 | be the case that they don't have the same security concerns. They don't have |
| 901 | the same idea of hosting content from so many different sources. So maybe they |
| 902 | would tune the process model a little differently. Maybe they would decide, I |
| 903 | don't really need as many processes because I'm managing things in a different |
| 904 | way. It's not a browser. |
| 905 | |
| 906 | 65:34 SHARON: Yeah, you're not handling all of the untrusted JavaScript of the |
| 907 | web that you have to be - |
| 908 | |
| 909 | 65:42 DARIN: Right, I'm not so worried about this part of my application dying |
| 910 | and then wanting to keep the rest of it still running or something because that |
| 911 | would still be considered a bug because part of my app died. And so some of the |
| 912 | reasons for multi-process architecture might be a little different. |
| 913 | |
| 914 | 66:01 SHARON: Right. And more just for fun, having worked on now an embedder of |
| 915 | Chromium, how has that experience been in terms of decisions that were made |
| 916 | when you were putting together the multi-process architecture? Are there things |
| 917 | where you were like, oh, no, past me, if you'd done this differently, this |
| 918 | would be easier now. |
| 919 | |
| 920 | 66:20 DARIN: I would say I'm very thankful for Mojo IPC, made it very easy to - |
| 921 | one thing that I've found is that it's possible to do a lot of amazing things |
| 922 | on top of Chromium without actually modifying Chromium. And the Content API and |
| 923 | Mojo IPC make a lot of that really possible. So it's a very flexible system. |
| 924 | There's a lot of really great hooks that let you interact with the system all |
| 925 | the way from extending the renderer to extending the browser. And to be able to |
| 926 | build stuff and layer it on top of a stable system is amazing. When I was |
| 927 | working on building an Android browser, I built a tracking prevention ad |
| 928 | blocking system for Android and was able to do it without modifying Chromium. I |
| 929 | thought that was amazing. |
| 930 | |
| 931 | 67:19 SHARON: How are you using Mojo? Because Mojo is typically going between |
| 932 | the processes. So if you're not really changing how the processes work, what do |
| 933 | you use Mojo for? |
| 934 | |
| 935 | 67:26 DARIN: Oh, well, in that case, it was used to communicate a rule set down |
| 936 | to the renderer. And then at the renderer level, I would inject a stylesheet to |
| 937 | do content blocking or to apply network filtering at the Blink layer. So there |
| 938 | are a combination of Blink Public APIs and Content Public APIs. There are |
| 939 | actually enough hooks to be able to filter network requests and insert |
| 940 | stylesheets that would apply display none to a set of DOM elements. So but to |
| 941 | do that efficiently, it was necessary to bundle up those rules into a blob of |
| 942 | memory that you would just send down to the renderer process, to all renderer |
| 943 | processes, so they'd have it available to them so they could just directly inspect |
| 944 | like a big hash map of rules. And so being able to - like I said before, when |
| 945 | the IPC system is just like - when it's decoupled like that with Mojo, it makes |
| 946 | it possible to kind of graft on these systems that they interact with APIs over |
| 947 | here, and that endpoint talks to some endpoint over here in the browser |
| 948 | process, which can have, like I said, like a rules data that it might want to |
| 949 | send over and that kind of thing. And so being able to build those kinds of |
| 950 | systems, and I think if you look at just how a lot of features in Chrome are |
| 951 | built, they're built very similarly, too. They build on top of the Content API |
| 952 | that provides the various hooks. They build on top of Blink API. Sometimes a |
| 953 | feature needs to live in the renderer and the browser process. Like autofill is |
| 954 | always the classic example of this early on in Chrome or password manager. |
| 955 | These are systems that need to crawl the DOM. They need to poke at the DOM. |
| 956 | They need to understand what's there. They need to be able to insert content or |
| 957 | put overlays in, or they need to be able to talk to the browser where the |
| 958 | actual database is, all that kind of stuff, and looking at different load |
| 959 | events and various things to know in the lifecycle of the page. So, yeah, I'd |
| 960 | say I'm thankful for a lot of these design choices along the way because I |
| 961 | think it's led to Chromium being so useful to so many people in so many |
| 962 | different ways. Obviously, it empowered building a really great browser and a |
| 963 | really great product, but it also has empowered a lot of follow-on innovation. |
| 964 | And I think that's pretty cool. |
| 965 | |
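The rule-set approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation and not a Chromium API: `build_rule_blob` and `RendererRules` are hypothetical names, and a JSON byte string stands in for the "blob of memory" that the browser side would send to each renderer.

```python
import json
from urllib.parse import urlsplit


def build_rule_blob(blocked_hosts, hidden_selectors) -> bytes:
    """Browser side (hypothetical): pack all rules into one blob."""
    rules = {
        "blocked_hosts": sorted(set(blocked_hosts)),
        "hidden_selectors": sorted(set(hidden_selectors)),
    }
    return json.dumps(rules).encode("utf-8")


class RendererRules:
    """Renderer side (hypothetical): unpack the blob once, then look up locally."""

    def __init__(self, blob: bytes):
        rules = json.loads(blob.decode("utf-8"))
        # A hash-based set gives constant-time lookups per request.
        self._blocked_hosts = frozenset(rules["blocked_hosts"])
        self._hidden_selectors = rules["hidden_selectors"]

    def should_block(self, url: str) -> bool:
        """Return True if the URL's host or any parent domain is blocked."""
        host = urlsplit(url).hostname or ""
        parts = host.split(".")
        return any(
            ".".join(parts[i:]) in self._blocked_hosts for i in range(len(parts))
        )

    def stylesheet(self) -> str:
        """Build a stylesheet that hides every listed selector."""
        if not self._hidden_selectors:
            return ""
        return ", ".join(self._hidden_selectors) + " { display: none; }"


blob = build_rule_blob(["ads.example", "tracker.test"], [".ad-banner", "#sponsored"])
# Every renderer receives the same blob and builds its own lookup structure.
renderers = [RendererRules(blob) for _ in range(3)]
```

Each renderer can then answer `should_block(url)` and produce its stylesheet without a round trip to the browser process, which is the efficiency point made in the discussion.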
| 966 | 69:53 SHARON: It is pretty cool. So Chrome was released in 2008. It is |
| 967 | now 2023. So as math tells, it's been 15 years. We like numbers that end in 5 |
| 968 | and 0. So - I don't know - it's very cool. I remember when Chrome came out. And |
| 969 | I don't know. Do you have any - |
| 970 | |
| 971 | 70:08 DARIN: Yeah, for me, it's more like 17 years because we started in 2006. |
| 972 | |
| 973 | 70:14 SHARON: Right. So do you have any general reflections on all the stuff |
| 974 | that's changed in that time? |
| 975 | |
| 976 | 70:22 DARIN: It's wild. I have a higher density of memories from the early |
| 977 | days, too. It's amazing. I guess that's how memories work when everything's new |
| 978 | and changing so much. But yeah, no, I'm very thankful for the journey and very |
| 979 | thankful to have been part of it. And it was a lot of fun to work on. I mean, |
| 980 | prior to Chrome, when I was working on Firefox, I did a little exploration on |
| 981 | adding like a multi-process thing to Firefox, which I thought - just, I was |
| 982 | learning about how to do IPC, and I was learning - but I was doing it for what |
| 983 | purpose back then. I think I was just toying around with DCOM. I don't know if |
| 984 | anybody knows what COM is, but Microsoft's Component Object Model that was like |
| 985 | all the rage back then. And it allowed for like integrating different languages |
| 986 | together. WinRT is all built on top of this stuff now. But anyways, Mozilla had |
| 987 | its own version of COM called XPCOM. And wouldn't it be cool if you could have |
| 988 | a component that - so you could have components back then that were built in |
| 989 | JavaScript, and you could talk to them from C++, or they were built in C++ more |
| 990 | commonly, and you talked to them from JavaScript. But wouldn't it be cool if |
| 991 | one endpoint could be in another process? So that was something I was playing |
| 992 | around with in 2004 when I was still working on Firefox. And then when Chrome |
| 993 | opportunity came along - maybe that was 2005 - I don't know. But when the |
| 994 | Chrome opportunity came along, I was like, all right, let's do it. IPC channel |
| 995 | was basically those ideas, but kind of more polished slightly. |
| 996 | |
| 997 | 72:02 SHARON: OK. Yeah, very cool. I mean, when I first started working on |
| 998 | Chrome stuff, someone on my team said, any time you change something in base, |
| 999 | that pretty much is going to get run anytime the internet gets run, which I |
| 1000 | thought was super crazy for just some random software engineer like me to be |
| 1001 | able to do, right? But - |
| 1002 | |
| 1003 | 72:20 DARIN: And now it's even more than that if you think about [INAUDIBLE] |
| 1004 | code and [INAUDIBLE]. |
| 1005 | |
| 1006 | 72:20 SHARON: Yeah, all the stuff. So do you ever just think about it, and |
| 1007 | you're just like, oh, my god, wow. |
| 1008 | |
| 1009 | 72:26 DARIN: Yeah, it's pretty amazing. |
| 1010 | |
| 1011 | 72:31 SHARON: So crazy. |
| 1012 | |
| 1013 | 72:31 DARIN: It is one of the special things about working on Chromium, is |
| 1014 | that, yes, you can have such an amazing impact with the work that you do there. |
| 1015 | |
| 1016 | 72:38 SHARON: Have there been any cases - these are just now unrelated |
| 1017 | miscellaneous questions. But in terms of surprising usages of Chromium, be it |
| 1018 | like maybe the base or the net stack or something, have there been any cases |
| 1019 | where you were really surprised by like, oh, this is being used here? |
| 1020 | |
| 1021 | 72:56 DARIN: Well, for sure, the first time I heard about Electron, I was like, |
| 1022 | oh, this is not a good idea. House of cards, you know? It just seems like it's |
| 1023 | such a complicated system to build your app on top of, right? But at the same |
| 1024 | time, I totally get it and appreciate it, and I understand why people would |
| 1025 | reach for it. There's so much good sauce there, so much good stuff and so |
| 1026 | many - there is a lot of really good infrastructure there to build on. Early |
| 1027 | on, I kind of imagined more that things like Skia and V8 and some of the other |
| 1028 | libraries would be the thing that people would make lots of extra use out of, |
| 1029 | right? So I didn't quite imagine people taking the browser's framework like |
| 1030 | this. And we absolutely didn't build it with that purpose. Pretty much every |
| 1031 | choice along the way was highly motivated by making Chrome team's life better. |
| 1032 | Like, Content API was, when we came to the realization we needed it, it was |
| 1033 | like we desperately need it. Just the complexity of Chrome was getting |
| 1034 | unwieldy. We needed to cleave part of it and say, that is this part. We needed |
| 1035 | to somehow draw a line in the sand and say, this is the set of concerns over |
| 1036 | here. And so the idea that all of this could be used for other purposes is |
| 1037 | cool, but it was never really in the initial cards. And I came from working on |
| 1038 | Mozilla, which was, in many ways, browser construction kit first, product |
| 1039 | second. So Chrome was very much like, let's go the other extreme - product |
| 1040 | first, maybe a platform later. And to see it be this platform now is pretty |
| 1041 | cool. But it's pretty far from where we started. |
| 1042 | |
| 1043 | 74:50 SHARON: Yeah, kind of - I watched some of the earlier talks you gave |
| 1044 | about the multi-process architecture and Content, not Chrome, came up a bunch. |
| 1045 | And this is, things, I guess, like Electron are the result of that, right? |
| 1046 | Where - |
| 1047 | |
| 1048 | 75:01 DARIN: Yeah, it's pretty wild. Yeah, I mean, so Mozilla built this very |
| 1049 | elaborate system called XUL, or X-U-L, which was an XML language for doing UI. |
| 1050 | And it's very interesting, intellectually interesting, maybe different than |
| 1051 | XAML. XAML is way better probably in many ways. But XUL was kind of XHTML |
| 1052 | minus, minus, with a bunch of stuff added on for like UI things. And then it |
| 1053 | had this thing called XBL, which is a bindings language that you could do |
| 1054 | custom bindings. And so anyways, then you build your application in JavaScript |
| 1055 | and Firefox, Mozilla, it was all built this way. So it was like a web page |
| 1056 | hosting a web page. The outer web page was like this XML DOM. The product |
| 1057 | engineers working on that, in order to get some modern Windows sort of thing |
| 1058 | come through, they had to basically go through the rendering engine team to get |
| 1059 | them to do something. And so it really greatly limited the ability for product |
| 1060 | team to actually build product. And there were so many sacred cows around the |
| 1061 | shape of Gecko and how that structure was, that while this cross-platform |
| 1062 | toolkit seemed glorious at first, it ended up being handcuffs for product |
| 1063 | engineering, I think. So, yeah, Chrome started out with Win32 native UI for |
| 1064 | browser UI. You have all the choices you want to make, browser front-end |
| 1065 | engineers. You also have to build a lot of code, but no cross-platform |
| 1066 | toolkits. Views came later. |
| 1067 | |
| 1068 | 76:43 SHARON: Right. Well, this was great. Thank you very much. Normally, we do |
| 1069 | a shout-out section at the end. Do you have anything - normally, it's like a |
| 1070 | Slack channel or something like the Mojo Slack channel. I think in this case, |
| 1071 | it's maybe - I don't know if there is a specific thing, but is there anything? |
| 1072 | |
| 1073 | 76:57 DARIN: Shout-out to all the team and the engineers making everything |
| 1074 | great. |
| 1075 | |
| 1076 | 77:03 SHARON: All right. |
| 1077 | |
| 1078 | 77:03 DARIN: Yeah. |
| 1079 | |
| 1080 | 77:03 SHARON: Cool. Awesome. Well, thank you very much for chatting with us. |
| 1081 | That was super cool, lots of really interesting background and good |
| 1082 | information. So thank you very much. |
| 1083 | |
| 1084 | 77:15 DARIN: Yeah, a pleasure. Thank you so much for having me. |
| 1085 | |
| 1086 | 77:21 SHARON: Talk about threads, so IO, UI thread. |
| 1087 | |
| 1088 | 77:27 DARIN: Do I get credit for the confusingly named IO thread? |
| 1089 | |
| 1090 | 77:27 SHARON: OK, all right, we can cover that. That's cool. Yeah, why is it |
| 1091 | called IO thread when it doesn't do IO? |