Nigel Tao	187a479	2023-09-28 22:30:44	[diff] [blame]	1	# What’s Up With Processes
				2
				3	This is a transcript of [What's Up With
				4	That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq)
				5	Episode 8, a 2023 video discussion between [Sharon ([email protected])
				6	and Darin ([email protected])](https://www.youtube.com/watch?v=SD3cjzZl25I).
				7
				8	The transcript was automatically generated by speech-to-text software. It may
				9	contain minor errors.
				10
				11	---
				12
				13	Chrome has a lot of process types. What is a process? What are all the types?
				14	How do they work together? Today’s special guest to tell us more is Darin.
				15	Darin is one of the founding members of the Chrome team, and wrote the initial
				16	implementation of the multi-process architecture.
				17
				18	Notes:
				19	- https://docs.google.com/document/d/1uXF-ncJ98LWQMN7M3NA_2oYkVmW9Vzp0v-wkJaNpsDQ/edit
				20
				21	Links:
				22	- [Chrome comic](https://www.google.com/googlebooks/chrome/small_00.html)
				23	- [What's Up With Mojo](https://www.youtube.com/watch?v=at_35qCGJPQ)
				24	- [What's Up With Open Source](https://www.youtube.com/watch?v=zOr64ee7FV4)
				25	- [What's Up With //content](https://www.youtube.com/watch?v=SD3cjzZl25I)
				26	- [Life of a Process](https://www.youtube.com/watch?v=5im7SGmJxnA)
				27	- [Chrome Compositing](https://chromium.googlesource.com/chromium/src/+/HEAD/docs/how_cc_works.md)
				28	- [Site Isolation papers by Charlie](https://charlesreis.com/research/publications/)
				29
				30	---
				31
				32	00:00 SHARON: Hello, and welcome to "What's Up With That," the series that
				33	demystifies all things Chrome. I'm your host, Sharon, and today, we're talking
				34	about processes. There are so many process types in Chrome. How do they form
				35	the multi-process architecture? What exactly is a process? Here to answer all
				36	of that and more is today's special guest, Darin. Darin was one of the founding
				37	members of the Chrome team and pretty much did the first implementation of the
				38	multi-process architecture, so he is well-suited to answer all of this. Plus, he
				39	created the IPC channels that Chrome started with. If you want to learn more
				40	about IPC and Mojo, check out the last episode with Daniel for lots more on
				41	that. So hello. Welcome, Darin. Welcome to the show. Thanks for being here.
				42
				43	00:38 DARIN: Thank you. Great to be here.
				44
				45	00:38 SHARON: Yeah, cool. So first question, what is a process?
				46
				47	00:44 DARIN: Right, so process is the container in which applications run on
				48	your system. Every process has both its own executing set of threads, but it
				49	also has its own memory space. That way, processes have their own independent
				50	memory, their own independent data, and their own independent execution. The
				51	system is multitasking across all of the processes on the system.
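
The point about each process having its own memory space can be seen in a minimal Python sketch. Everything here is illustrative and not Chrome code: the `counter` dictionary and the inline child script are made-up names. The child is a separate OS process, so it receives a copy of the value and cannot touch the parent's variable.

```python
import subprocess
import sys

# State that lives in this (parent) process's memory.
counter = {"value": 0}

# The child is a separate OS process with its own address space. It receives
# a copy of the number on its command line, not a reference to `counter`.
child_code = (
    "import sys\n"
    "value = int(sys.argv[1])\n"
    "value += 100  # only changes the child's own memory\n"
    "print(value)\n"
)

result = subprocess.run(
    [sys.executable, "-c", child_code, str(counter["value"])],
    capture_output=True,
    text=True,
    check=True,
)
child_value = int(result.stdout)

print("child computed:", child_value)
print("parent still has:", counter["value"])
```

The only way the two sides exchange data here is through explicit channels (command-line arguments and a pipe for standard output), which is the same reason Chrome needs IPC between its processes.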
				52
				53	01:13 SHARON: Cool. Chrome is basically an operating system that runs on top of
				54	your operating system. So there probably are parallels between Chrome's
				55	representation of a process and the actual operating system ones. So what are
				56	the similarities and differences, and how do they interact?
				57
				58	01:30 DARIN: Well, yeah, I mean, you can talk about a lot of different things.
				59	I mean, so Chrome is made up of multiple processes. We run different tasks in
				60	different processes. That's done for multiple reasons. One is so that they can
				61	run independently, so that there's performance benefits that come from the fact
				62	that they're running independently. Back in the day, the original idea was that
				63	it would allow us to take advantage of the operating system's preemptive
				64	multitasking that it already has and to actually allow web pages to run
				65	concurrently and to be managed just like any other concurrent task that the
				66	operating system would manage. So that's the original idea there. And in that
				67	way, this model of Chrome divided into multiple processes just allows the
				68	Chrome itself and all of the tasks that it has to really take advantage of
				69	multi-core systems so that if you have more computing power, if you have more
				70	cores, you have more hyperthreading going on in your system, then it's possible
				71	for more things to happen concurrently. And Chrome's workload can be spread out
				72	that way because Chrome is broken into all of these different processes and all
				73	of these different threads. In that way, it's taking advantage of and mirroring
				74	the capabilities of the OS and providing that as a substrate for web and for
				75	browser and for how all these things work. How Chrome then has to be similar is
				76	that also, like an OS, Chrome has to manage all this stuff. And from simple
				77	things like how much resource should a background tab be using, should its
				78	timers be running when it's in the background, to much more complicated things
				79	when you talk about even should a process stay alive or not. If you look at
				80	Chrome OS where system resources can be so limited, it's necessary, or on
				81	mobile, necessary to terminate some of those background processes to close some
				82	of those tabs behind the scenes, even if the application makes it look like
				83	those tabs are still open. So the level of management is a big part of - in
				84	that way, it's being kind of like an OS.
				85
				86	03:42 SHARON: Is Chrome's representation of a process, are those generally
				87	one-to-one with a system process, depending on which system you're on -
				88
				89	03:48 DARIN: Absolutely.
				90
				91	03:48 SHARON: or is that an abstraction layer?
				92
				93	03:55 DARIN: No, well, absolutely when we talk about a process in Chrome, we
				94	mean an OS process. And so we might have multiple web pages being served by
				95	that single renderer process. We do try to spread the load across multiple
				96	processes, but we also independently decide how many processes to actually
				97	create. And it can be based on - there could be good reasons from, like I said,
				98	a performance perspective to having tabs assigned across multiple processes,
				99	but there can also be good security properties, like letting the web pages be
				100	allocated to different processes means that those web pages are not running in
				101	the same process, meaning they're not running in the same address space. And
				102	from a security perspective, that has really great properties because it means
				103	if a web page is able to tickle a bug in the rendering engine in the V8 or in
				104	part of Blink and somehow get a privilege escalation, like start to be able to
				105	do things that JavaScript normally can't do, it's still going to be limited by
				106	the capabilities of that process and what it has access to. And so if that
				107	process has really only the data for the web page that was providing the
				108	problematic JavaScript, well, it's not really getting access to anything it
				109	didn't already have. And that's kind of the whole idea of process isolation and
				110	sandboxing. And then on top of that, you limit the capabilities of that process
				111	by really leveraging the OS process primitive and the kinds of restrictions and
				112	capabilities that can be removed from that process to achieve an isolation for
				113	web pages for an origin or for a set of web pages. I say set because we might
				114	not want to allocate a process for every single tab or for every single origin
				115	because that might just use up way too many system resources. So we have to be
				116	thoughtful there, too.
				117
				118	05:50 SHARON: Yeah, so this is quite closely related to site isolation, which
				119	isn't the topic of this video - maybe the next one. So terms that are used
				120	often and sometimes interchangeably are multi-process architecture and process
				121	model. So these aren't exactly the same thing, but I think can you explain the
				122	difference between them and what each one is for? Because there are
				123	similarities, but.
				124
				125	06:16 DARIN: Sure. I mean, I think to me, the phrase "process model," it's
				126	talking about, what does a particular process represent, what does it do. And
				127	then when I say multi-process architecture, I'm thinking of the whole thing.
				128	It's all packaged up. It's a multi-process architecture to build a browser. At
				129	the end of the day, user is hopefully not so aware of the fact that this is how
				130	it's built. I mean, earlier on in Chrome's history, the Windows Task Manager
				131	didn't do a very good job of grouping processes by their parent. And so if you
				132	opened the Task Manager at the OS level, you'd see just a spew of processes
				133	that Chrome was responsible for. And it could be a little disconcerting for
				134	people. A little tangent, but now more modern versions of Windows, they do kind
				135	of group it all to the parent task. And so it's a little easier and less sort
				136	of in-your-face that Chrome is creating all these processes. But yeah, at the
				137	end of the day, it's just the multi-process architecture is like that's the
				138	embodiment of the whole thing. And we have these different process types that
				139	make up that whole thing. There's the browser process, the main one, and then a
				140	renderer process is the name we give to the processes responsible for running
				141	web pages. And then we have a few other process types that are part of the
				142	puzzle, a networking process, a GPU process, utility process, and occasionally,
				143	in the lifespan of Chrome, other types of processes. We had plugin processes,
				144	for example, when we were hosting Flash in Chrome. And the Native Client had
				145	its own type of processes as well. So what's that all about? Really, I can go
				146	into it if you want me to go into all the details there. But -
				147
				148	08:05 SHARON: Yeah, I think we'll run through - this is a, yeah, perfect segue.
				149	We'll run through each of those process types you just mentioned and mention a
				150	bit about what they do, how much privilege they have, maybe how many of them
				151	there are because some of them, there's only one of. So I think it makes sense
				152	to start with the browser process, which is the process and is often likened to
				153	the kernel in an operating system.
				154
				155	08:30 DARIN: Yeah, so the browser process: kernel, operating system, broker, these
				156	are kind of good analogies for what the browser process's role is. So it's the
				157	application process, the main one, that starts up initially, and it's the one
				158	that hosts the whole UI of the app. And it's going to spawn these child
				159	processes, the renderer processes, the GPU process, and so on, to help fulfill
				160	its goals. So very early on, we started with this design where WebKit, the
				161	rendering engine we were using from Apple, it could be built as a COM control
				162	and register it on the system and load it as a DLL. And then in order to run
				163	that in a child process, it was using HWNDs and all the standard Win32 isms to
				164	do its job. And we started out by just literally trying to capture a bitmap
				165	rendering of WebKit and send it over to the browser process where we could
				166	present that bitmap. Actually, rewind even further. The very first version took
				167	advantage of the fact that Windows supports having HWNDs hosted in different
				168	processes and threads. And so we literally just took that HWND from WebKit and
				169	that child process and stuck it into the window hierarchy of the browser
				170	process. And we drew our browser UI around it, and WebKit was there, but it was
				171	running in a different process. And if we ever needed to tell that process to
				172	do something, we just send a WM_USER message to it with PostMessage. And that's
				173	something Windows lets you do. So it felt like a very simple toy kind of way to
				174	try it all out. A lot of limitations to that design. Pretty quickly, we
				175	realized we didn't want to just be in that kind of setup, and we moved to
				176	building our own IPC channel, a pipe, so that we could communicate and really
				177	get to the point where WebKit's running there without an HWND, without its own
				178	Win32 windowing constructs, but instead, it's just kind of an image generator.
				179	And we take the image that it generates, the bitmap, send it over our IPC
				180	channel to the browser process. And the browser process is where we have our
				181	window hierarchy. The browser process is where we display that bitmap, and
				182	where we collect user input and send it through the pipe to the renderer, where
				183	we then feed it into WebKit.
				184
				185	10:46 DARIN: That was the original architecture of Chrome. So in that world,
				186	the browser process is your application process. It has all the UI. And it's
				187	really like this glorified image viewer. And the renderer process is literally
				188	just like it's running WebKit - now Blink. It's running the rendering engine,
				189	and it's producing those images whenever, like, an update occurs, a layout
				190	occurs, or some invalidation occurs. And we got a little fancy. It was producing
				191	just the sub. It would know, oh, I really only have a small damage rect, so I
				192	don't have to produce the whole image. I just produce a small part. And send
				193	that over, and then we paint that into the part that the browser is retaining
				194	an old image of. And it can update just that one part. And so that's a very
				195	simple approach that we took when building this whole thing. And so those
				196	renderer processes become very much just very simplistic in that they aren't
				197	interacting with the rest of the OS in a very deep way. They are just taking
				198	input events from this pipe and sending images back. When they need other
				199	services like they need network access, instead of going straight to the
				200	network from the renderer process, because we started to realize, hey, we might
				201	want a sandbox and restrict those child processes, and also, we needed the
				202	notion of cookie jar that was shared across all web pages, so that if you visit
				203	Gmail in one tab and visit Gmail in another tab, you're still logged in, we
				204	needed the network stack to be in a unified place. So it meant that not just
				205	would we send images up to the browser, but now we would send network requests
				206	to the browser. And the browser would respond with the network data. And as a
				207	result, we started to go down this path of centralizing access to system
				208	services and resources in the browser process.
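
The damage-rect idea described above (the renderer sends only the changed sub-rectangle, and the browser paints it into the image it retained) can be sketched as follows. This is a simplified illustration in Python, not Chrome's code: `DamageRect`, `paint_update`, and the tiny integer "pixels" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class DamageRect:
    """Position and size of the region that changed, in pixels."""
    x: int
    y: int
    width: int
    height: int


def paint_update(backing_store, rect, pixels):
    """Copy a small rectangle of new pixels into the retained image."""
    for row in range(rect.height):
        dest_row = backing_store[rect.y + row]
        dest_row[rect.x:rect.x + rect.width] = pixels[row]


# Browser side: a retained image, 4 rows by 6 columns, from an earlier paint.
backing_store = [[0] * 6 for _ in range(4)]

# Renderer side: only a region 3 wide and 2 tall changed, so only that is sent.
rect = DamageRect(x=1, y=1, width=3, height=2)
pixels = [[7, 7, 7], [7, 7, 7]]

paint_update(backing_store, rect, pixels)
for row in backing_store:
    print(row)
```

Sending 6 pixels instead of all 24 is the whole benefit: the cost of an update scales with what changed rather than with the size of the page.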
				209
				210	12:44 DARIN: It's becoming therefore like a broker to the system that the
				211	renderer now is unable to - not unable - it's asking the browser for everything
				212	it needs. It's communicating to the browser to get access to all the different
				213	resources. And that allowed us to then restrict the renderer process
				214	considerably so that it doesn't even have access if it wanted to touch the file
				215	system, to touch the network TCP/IP implementation or any system resources. So
				216	the sandbox really is all about how we apply those restrictions, taking away
				217	the capabilities of a Windows process. So in the very early days, there was
				218	just the browser process and renderer processes. And we would allow multiple
				219	renderer processes to be created as tabs were opened. And we put some
				220	restriction on the number of processes based on the amount of RAM that your
				221	system would have, thinking that processes maybe have some inherent overhead,
				222	which they do. Certainly, there's the overhead of the V8 heap that is allocated
				223	once per process or once per isolate, if you're familiar with the details of
				224	V8. And so, we didn't want to have so much of that kind of - so we thought
				225	there was some limit to how many processes we should have. Later on, other
				226	process types started to emerge. The next one that came was the Plugin
				227	process because in order to get YouTube to work back in 2006, you needed to
				228	support Flash. And Flash has two modes - it did. It had a windowed mode and a
				229	windowless mode. And the difference is whether it drew itself into an HWND or
				230	if it would just produce a bitmap itself. But regardless of what mode it was
				231	rendering in, it still wanted direct system access, like it wanted to touch the
				232	file system. And so if we were going to run it in our browser, it can't run in
				233	the renderer process. It has to run somewhere else. And so, yeah, in the frenzy
				234	of, gee, wouldn't it be nice if we could have sandboxing, it was, how the heck
				235	are we going to sandbox and isolate plugins? Because the way plugins integrated
				236	with WebKit is that WebKit just directly called into them and said, hey, if
				237	it's a windowless one, give me your bitmap. I'm going to include it in my
				238	rendering. If it's a windowless one, it also means it's dependent on WebKit to
				239	feed it events. And so, how does that work? So we ended up building a process
				240	type called the Plugin process type for NPAPI plugins, Netscape-style plugins,
				241	all stuff that doesn't exist anymore. It's wonderful. And NPAPI is this
				242	interface that was once upon a time, I want to say, kind of, like - my head is
				243	going to some unsavory words. It was kind of pooped out by somebody at Netscape
				244	to make Acrobat Reader work over the weekend. And then it became a stable API.
				245	And lots of regret and sadness probably followed, but as a result, things like
				246	Flash were created, and web became very interesting in some ways. A wonderful
				247	story about Flash, I think.
				248
				249	16:02 DARIN: But anyways, supporting that stuff meant dealing with some gnarly
				250	frozen APIs and figuring out how to stitch all that together, and the renderer
				251	process of WebKit would talk to something that wasn't actually in its process
				252	that was - or, again, another IPC channel, running a whole other process. We
				253	wanted plugins to still not run in our browser process, but to, instead, run in
				254	their own process so that if they crashed, they wouldn't take down the whole
				255	browser. And Flash and other plugins were notorious for crashing. So it was a
				256	must that they run in their own process. But we figured they couldn't be
				257	sandboxed as tightly as the renderer as WebKit because they already were
				258	accessing the system in very deep ways.
				259
				260	16:55 SHARON: Cool, lots of -
				261
				262	16:55 DARIN: Lots more processes got added later, like the networking, the GPU
				263	process, and NaCl. I can tell the story about those, too, if you're interested.
				264
				265	17:08 SHARON: Oh, sure. Yeah, let's hear it.
				266
				267	17:08 DARIN: OK, so 2009 era, I think, maybe 2010 - I don't know - somewhere
				268	along the way, we started building Chrome for Android. And you might recall I
				269	described how the renderer was really kind of a glorified image viewer, or the
				270	browser, browser was sort of an image viewer and the renderer's job was to
				271	produce a bitmap. And then we send it over to the browser, the browser would
				272	draw the bitmap. Mobile systems were not going to work very well if this is the
				273	way the drawing was going to work. If you think about how scrolling works or
				274	worked back then, scrolling a web page back then meant telling the computer to
				275	please memmove all the pixels, and then to draw another bitmap where pixels are
				276	not existing yet and need to be drawn. So you do a memmove followed by a
				277	memcpy. And so this is how original Chrome was built. If you were scrolling, it
				278	would be, oh, we need to shift pixels, and here's the bitmap. We need to stick
				279	in the part that's exposed. Do that all quickly, and do it over and over again.
				280	And that kind of operation is just not good if your goal is like nice
				281	responsive scrolling on a touch screen. Instead, the way mobile systems were
				282	built is using GPU rendering and compositing engines powered by GPUs, so that,
				283	instead, you are offloading a lot of that work to the GPU. So it was necessary
				284	to restructure Chrome's rendering pipeline for mobile, at least. But because we
				285	were doing that, we can also take advantage of it on desktop. Meanwhile, we
				286	were also on desktop starting to invent things like WebGL. Initially, WebGL,
				287	the precursor to that was this plugin called O3D, which is a 3D graphics plugin
				288	using the wonderful plugin APIs that I talked about before. But it provided
				289	this way to have 3D graphics scenes and build immersive kind of 3D content.
				290	That team, at some point, switched their sights on how to make that a standard
				291	through WebGL. Wonderful stories around that. But it also entailed figuring out
				292	how to do OpenGL, essentially, because WebGL was just OpenGL ES, and how to do
				293	that from a renderer, from that Blink child process, how to do it there. And
				294	really, that meant that, OK, this process is going to be - these sandbox
				295	renderers are going to be generating a stream of GL commands. Where do they go?
				296	What do we do with that? And also, we know that it's possible to write shaders
				297	and possible to write GPU commands that can really wreck - can cause havoc, can
				298	be problematic, can cause the system to crash your process. So we don't want
				299	that happening in the browser process because we want the browser process to
				300	stay up so it can [INAUDIBLE] the manager.
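
The software scrolling path described earlier in this answer (a memmove of the pixels that stay on screen, followed by a memcpy of the newly exposed strip) can be sketched like this. It is a simplified Python illustration under stated assumptions: one byte per pixel, a flat buffer, and a hypothetical `scroll_up` helper that is not part of Chrome.

```python
def scroll_up(pixels: bytearray, width: int, dy: int, exposed: bytes) -> None:
    """Scroll a flat, one-byte-per-pixel image up by dy rows, in place."""
    shift = dy * width
    if not 0 < shift <= len(pixels):
        raise ValueError("dy must be between 1 and the image height")
    if len(exposed) != shift:
        raise ValueError("exposed strip must contain exactly dy rows")
    keep = len(pixels) - shift
    # "memmove": slide the rows that remain visible toward the top.
    pixels[:keep] = pixels[shift:]
    # "memcpy": paste the freshly rendered rows into the gap at the bottom.
    pixels[keep:] = exposed


# Four rows of two pixels each: rows "aa", "bb", "cc", "dd".
image = bytearray(b"aabbccdd")
scroll_up(image, width=2, dy=1, exposed=b"ee")
print(image)
```

Every scroll step touches every retained pixel on the CPU, which is why this approach was a poor fit for smooth touch scrolling and why the work moved to GPU compositing.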
				301
				302	20:21 DARIN: So the GPU process was born. This will be the process that
				303	actually talks to the OpenGL driver or DirectX under the hood via ANGLE on
				304	Windows. And so now, we set up another pipe from the renderer over to the GPU
				305	process, and the stream of GL commands are being sent over there. And over
				306	there, it's talking to the driver. And if you sent something bad, driver is
				307	going to say no bueno and crash your process. And we would find that the
				308	browser would see the GPU process died, and it would maybe give you a warning
				309	or let you reload the page, and it will try again. As that's done, that's how
				310	we therefore were able to leverage processes to give us that isolation, but
				311	also give us that robustness, give us that capability. And that led to a lot of
				312	complexity, but also a lot of really amazing sophistication around the
				313	compositing engine. Chrome's cc library was born subsequently, and all these
				314	things that have led to the modern way that we render the web on Chrome now.
				315	Skia learned how to render to OpenGL, et cetera, and the GPU process.
				316
				317	21:35 DARIN: Next one came along was the network process, which was really born
				318	out of the idea of, gee, wouldn't it be nice to isolate the networking code
				319	into its own process that could be more tightly sandboxed? Because the
				320	networking stack tends to be a surface area that's accessible by attackers.
				321	Just like the V8 and JavaScript engine is parsing lots of stuff and very
				322	exposed to attack surface from would-be attackers, the network stack, same
				323	thing. You've got HTTP parsing and various other kinds of processing happening
				324	very close to content that attackers can control. And so this project, quite
				325	rather elaborate project to move the networking stack out of the browser
				326	process out of that broker process, but to, instead, its own process and have
				327	all the pipes go various IPC channels connecting to there, instead, was born.
				328	And I think this was more born in the era of Mojo IPC, where we had a more
				329	flexible IPC system that could help support that kind of transition, but still
				330	tons of work and quite a radical change to the flow of data and the way the
				331	system works. Previously, just to give a little aside, when a renderer is
				332	making a network request, the browser process acting as a broker needs to
				333	audit, is it OK for that guy to be requesting this thing? Think about all the
				334	kinds of rules that might be there, CSP, other kinds of things, and the
				335	security origin privileges associated with it and what we want to allow a
				336	renderer to actually access. Simple stuff like we support WebUI, like chrome:
				337	pages. In the context of, they load in a renderer process, that renderer
				338	process should be allowed to access other things from chrome:, right? But
				339	a web page shouldn't be able to. We don't want the arbitrary web pages to be
				340	poking around and seeing what's available at chrome: URLs. So that's
				341	like a simple example of where we honor that isolation. And so the browser
				342	process, having the network stack in the original incarnation of Chrome makes
				343	sense. It can apply these rules right there. Safe browsing was integrated
				344	there. Lots of different kinds of network filtering could be done there. Moving
				345	that to another process was a big change because now browser is the one that
				346	has the smarts to do auditing, but the data and all the requests are going to
				347	this other process. So making that work meant a lot more plumbing. And I think
				348	complexities ensued. But it's awesome to see it happen.
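
The auditing role described above, where the broker decides whether a given renderer may request a given URL, can be sketched as a single check. This is a deliberately tiny Python illustration: `may_request` and `PRIVILEGED_SCHEMES` are hypothetical names, and Chrome's real checks (CSP, origin privileges, Safe Browsing, and more) are far more involved.

```python
from urllib.parse import urlsplit

# Schemes that only pages of the same scheme may load from (assumed for the demo).
PRIVILEGED_SCHEMES = {"chrome"}


def may_request(renderer_origin: str, url: str) -> bool:
    """Return True if a renderer hosting renderer_origin may fetch url."""
    target_scheme = urlsplit(url).scheme
    if target_scheme not in PRIVILEGED_SCHEMES:
        # Ordinary web content; the other checks are omitted in this sketch.
        return True
    # Privileged content is only available to pages of the same scheme.
    return urlsplit(renderer_origin).scheme == target_scheme


print(may_request("chrome://settings", "chrome://resources/css/base.css"))
print(may_request("https://example.com", "chrome://settings"))
print(may_request("https://example.com", "https://example.com/logo.png"))
```

The key property is that the decision is made outside the renderer, so a compromised renderer cannot simply skip it.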
				349
				350	24:20 DARIN: Anyways, I mentioned Native Client. So that was a precursor to
				351	Wasm that was a big investment by the Chrome team to find a way to bring native
				352	code to the web in a safe, secure manner. The initial take on it was, if you're
				353	running native code that came from the web on a system, that's scary. It could
				354	do like anything, right? Well, no, let's restrict the process capabilities, but
				355	even with a restricted set of capabilities, you can't necessarily restrict
				356	everything on Windows or Mac or Linux. There's always some limitation to the
				357	sandbox capabilities. And in many ways, the sandboxes that we implemented are
				358	kind of just an extra level of defense. If you think about it, the JavaScript
				359	Engine is already a sandbox, right? It already limits the capabilities. The web
				360	rendering engine, all the different kinds of security checks throughout the
				361	code are various forms of sandboxing. And then finally, the process in the way
				362	we restrict its capabilities is that next last defense. Well, running native
				363	code with only that last defense in place is not enough. So Native Client was
				364	designed to be native code that could be highly auditable, so
				365	that you could make sure that it's not allowed to jump to an address that it
				366	doesn't have code for, that it's not allowed to do things outside the set of
				367	things that it's allowed to do. So it had a lot of complexity as well in terms
				368	of how the process has to be set up in terms of the memory layout and various
				369	other details, which maybe I'm happy to not remember. And - but it meant it
				370	needed its own process type. Even though it integrated kind of like a plugin,
				371	it couldn't just be a plugin. It needed its own process type. And there had to
				372	be 64-bit variants and 32-bit variants, depending on the actual OS, actual
				373	underlying hardware that you were running on Arm versus Intel, all these
				374	differences. So yeah, we ended up with leveraging this process model
				375	extensively to enable these kinds of things.
				376
				377	26:32 DARIN: I think I mentioned the utility process. In Chrome, the utility
				378	process is this thing you reach for when you want to do something that's
				379	potentially - like maybe you're dealing with some untrusted input, like you
				380	want to decode an image, or you want to run something in a process, and you
				381	just want to make sure that if it's going to do anything, it just dies over
				382	there and doesn't take down the whole browser process. I think some extension
				383	install manifest parsing, maybe various other kinds of things like that, would
				384	happen in a utility process as like a safety measure. Generally speaking,
				385	parsing input from the web or even the Web Store or things like that, doing
				386	that parsing in the browser process is a scary thing because you're taking
				387	input from a third party. And if you're parsing it there, you might have a bug
				388	in your parser, and that could lead to the most trusted process having been
				389	compromised.
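
The utility-process pattern described above (parse untrusted input somewhere that can die without taking the browser down) can be sketched with an ordinary child process. This is a minimal Python illustration, not Chrome's implementation: `parse_in_child`, the inline parser script, and the "BOOM" trigger that stands in for a parser bug are all assumptions, and no sandboxing is applied here.

```python
import json
import subprocess
import sys

# Code for the throwaway child. The "BOOM" check is a stand-in for a parser
# bug that brings the whole process down.
PARSER_CODE = """
import json, os, sys
text = sys.stdin.read()
if text.startswith("BOOM"):
    os.abort()
json.dump(json.loads(text), sys.stdout)
"""


def parse_in_child(untrusted: str):
    """Parse untrusted JSON in a child process; return None if the child fails."""
    proc = subprocess.run(
        [sys.executable, "-c", PARSER_CODE],
        input=untrusted,
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        # The child crashed or rejected the input; this process carries on.
        return None
    # The child's output is re-serialized data; a real system would send a
    # structured, validated IPC message instead of text.
    return json.loads(proc.stdout)


print(parse_in_child('{"name": "demo"}'))
print(parse_in_child("BOOM"))
print(parse_in_child("{not json"))
```

A crash in the child shows up in the parent as nothing more than a non-zero exit status, which is the robustness half of the story; the security half comes from also restricting what that child is allowed to do.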
				390
				391	27:29 SHARON: Yeah, that falls into the whole Rule of Two thing, right, of
				392	untrusted data. We have a [INAUDIBLE] process. It's in C++. The thing that we
				393	decided to change is where it gets parsed, so.
				394
				395	27:44 DARIN: That's right.
				396
				397	27:44 SHARON: That makes sense.
				398
				399	27:44 DARIN: Yeah, so the sandbox processes get used as this primitive to give
				400	us that extra safety measure.
				401
				402	27:57 SHARON: So the other process type I can think of that wasn't just covered
				403	there was extensions. Is there anything to say there?
				404
				405	28:02 DARIN: Sure, of course.
				406
				407	28:02 SHARON: Of course.
				408
				409	28:02 DARIN: In some ways, an extension process will show up that way in
				410	Chrome's Task Manager, but I believe it's usually just powered by a renderer,
				411	an ordinary renderer, because extensions have background pages or event
				412	pages. In, I guess, Manifest V2, it was background pages. In Manifest V3, it's
				413	now just event pages or a service-worker-type construct. And those need a process
				414	to run in. So the extensions get to inject some code that runs in the renderer
				415	of the web page, usually in an isolated world, so it can see the same DOM. If
				416	you've given the permission for the extension to read website data or to
				417	manipulate website data, it can do that by injecting a content script that will
				418	run in the same process as the web page that it's reading or modifying. But it
				419	will run in an isolated JavaScript context so that it's not seeing the same
				420	JavaScript variables and such. But it can still see the DOM. And that's meant
				421	to give a lot of capability, but also have a little bit of protection because
				422	it's so easy to accidentally interfere with the same JavaScript variables and
				423	things like this. OK, so extensions have that piece that injects a content
				424	script, but they also have a - usually, they can have this event service worker
				425	or background page that is their central place, process place for code to run.
				426	And so we do run that in a renderer process. And so for example, if the
				427	extension that's injected into a page wants to get some capabilities, it would
				428	talk to its service worker, who would then have the capability to ask for
				429	certain extension APIs to maybe understand all the tabs that are in your
				430	system, depending on what permissions it was granted. And then finally, with
				431	extensions, you also have the extension button and a dropdown that can occur
				432	there, where a web page can be drawn by the extension. And that's going
				433	to be hosted in a renderer process, too. But that would be a web page that
				434	lives at a `chrome-extension:` URL. And so you have these different pieces
				435	of the extension model where code from the extension can be running, and it,
				436	via some messaging channel, can talk to the other parts of itself that run in
				437	potentially likely different processes.
				438
				439	30:37 SHARON: You mentioned service workers there, and those are kind of
				440	related to all this, too. So can you tell us a bit more about those?
				441
				442	30:43 DARIN: Yes, so - well, OK, so backing up, in the context of extension, if
				443	we talk about background page first, the original idea with extensions was, OK,
				444	I'm injecting stuff into pages so I can modify things, but I also need like my
				445	home base. I need my context where - I need a place where my persistent script
				446	is running or where I can manage my databases, and I have just one place for
				447	that. And it's also a place where I can get elevated permissions to access
				448	other Chrome extension APIs. So that idea of a background page that the
				449	extension can create that's ever present so it's like a web page, but it's
				450	hidden, it's in the background, and content scripts that are injected into web
				451	pages can talk to it. So they can say, oh, I'm on this page. Give me some rules
				452	that I should apply to it or something, depending on the nature of that
				453	extension. OK, so but background pages are, unfortunately, persistent. And they
				454	live for the whole life of the browser. And they use up memory. They use up
				455	resources, even if nothing else about the extension needs doing. Even if the
				456	extension is not loaded into any web pages, that background page is sitting
				457	there. And so this was [INAUDIBLE] quickly realized, this is not great. This is
				458	a waste of resources for the system. We should have some policy for how we
				459	should close that background page down and only need to create it when
				460	necessary. In the context of, I think, Chrome apps, which is a thing that's no
				461	longer a thing, we created this concept called event pages, which allowed for
				462	these background pages to be a little more transient, that come into being only
				463	as needed, which is a much more efficient approach.
				464
				465	32:28 DARIN: However, when it came time to bring that to extensions, at the
				466	same time, Service Worker had been created, which was a tool for web pages to
				467	be able to do background event processing. So the decision was to adopt that
				468	standards-based approach to how to do background processing. And so Service
				469	Worker is the construct that Manifest V3 allows extensions to use for that sort
				470	sort of background processing. The big difference with service workers is that they
				471	are not web pages. They're just JavaScript. But they can listen to different
				472	kinds of events. So just like a web worker, shared worker, service worker, they
				473	are without UI. They are without any HTML. They just have the ability to - but
				474	they have some functions that are given to them on the global scope that lets
				475	them talk to the outside world, to talk to the web page that created them, or
				476	in the case of Service Worker, they actually have events they can receive to
				477	handle network requests on behalf of the page. That's one of the main uses for
				478	them in the context of the web. A web page would have a Service Worker register
				479	it with the browser to say, hey, please contact my service worker if you are
				480	making a request for my origin. And that gives the Service Worker the
				481	opportunity to specify what content should be used to satisfy a URL. It could
				482	load that content out of a cache, and the Service Worker API includes APIs for
				483	managing caches and things like this. So all of that system that was built to
				484	kind of enable web pages to operate more robustly in the context of poor
				485	network connectivity or to get performance improvements for applications that
				486	are more single page applications that have a basic fixed shell that should
				487	load out of cache and then they make network requests to the server to get the
				488	data that populates some application UI, that model Service Worker was really
				489	designed for. But it seemed a very good fit for extensions. And it gets us out
				490	of the world of having these persistent extension background pages. So Manifest
				491	V3 says, if you want your content script to have access to privileged things,
				492	you go through a system, a Service Worker. And the Service Worker will get
				493	spawned in a renderer process. What renderer process? You don't know. It's up
				494	to the system. Chrome will make a decision there based on all of its usual
				495	rules around what other origins are in that process, thinking from a security
				496	isolation perspective, and so on, and so forth.
				497
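A minimal sketch of the cache-first idea described above, written as a toy Python model rather than the real Service Worker JavaScript API. `ToyServiceWorker`, `handle_fetch`, and the `network` callable are hypothetical names invented for illustration; nothing here is Chromium code.

```python
from typing import Callable, Dict, List


class ToyServiceWorker:
    """Toy model: answer requests from a cache, fall back to the network."""

    def __init__(self, network: Callable[[str], str]) -> None:
        self._network = network            # hypothetical stand-in for a real fetch
        self._cache: Dict[str, str] = {}   # URL -> cached response body

    def handle_fetch(self, url: str) -> str:
        # Serve from the cache when possible, so the page still loads offline.
        if url in self._cache:
            return self._cache[url]
        body = self._network(url)
        self._cache[url] = body            # remember the response for next time
        return body


calls: List[str] = []


def fake_network(url: str) -> str:
    # Records every request that actually reaches the "network".
    calls.append(url)
    return f"body of {url}"


worker = ToyServiceWorker(fake_network)
first = worker.handle_fetch("/shell.html")
second = worker.handle_fetch("/shell.html")   # answered from the cache
```

Only the first request reaches `fake_network`; the second is satisfied locally, which is the behavior that lets an application shell load without connectivity.
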
				498	35:22 SHARON: Cool. A lot of these process types have been added over time as
				499	the need for them arises. Like, oh, we want to put network stuff in a separate
				500	process. So apart from adding more process types, what have been other big
				501	changes to the multi-process architecture and processes in Chrome in the many
				502	years since launch?
				503
				504	35:44 DARIN: The biggest one by far is the per site isolation, the site
				505	isolation work that was done.
				506
				507	35:51 SHARON: We'll talk about that more next.
				508
				509	35:56 DARIN: Yeah, so, I mean, well, I'll just say, so Charlie Reis was an
				510	intern on Chrome team back in the day during the pre-release period of Chrome.
				511	And I remember the conversations where we were like, gee, wouldn't it be nice
				512	if instead of isolating based on per tab, it was isolating per origin? And I
				513	think he was doing research on that topic, too. And he had all these ideas for
				514	this kind of a thing. And so it was really kind of very early on that we were
				515	having these conversations. But even very early on, it was like, this is going
				516	to be a big change, you know? The idea is that it's a big change
				517	to the rendering engine itself, like how frames could be served by different
				518	processes. So in order to isolate based on origin, you have to say a frame
				519	where an ad might live would actually have to be served by the process for that
				520	origin. And so now no longer is the whole frame tree just in one process.
				521	That's a big change. But built on top of the infrastructure we had, it was
				522	possible to imagine it, and it was quite a journey to get there. So that was
				523	probably the biggest change to the architecture. But like I mentioned before,
				524	actually, other big changes were definitely the introduction of the GPU
				525	process, definitely the introduction of Mojo IPC. Before Mojo IPC, the way
				526	things worked was, basically, messaging was much simpler, in some ways, easier
				527	to understand, but also much more the case that there were these files that
				528	really needed to know about everything in their world, like the render process
				529	host and the render process, the render view host and the render view, the
				530	render frame. The render frame host didn't exist then, but they came about
				531	because of site isolation, really. But the render view, render view host became
				532	this thing that represented the web page, and render view host in the browser,
				533	render view in the renderer. And for any feature that required brokering out to
				534	the browser to get access to something, essentially, the render view, the
				535	render view host had to be participants in that because they had to be kind of
				536	routers for that traffic. That's not very scalable. You start adding lots of
				537	engineers, building lots of different features that need lots of different
				538	capabilities. And these files start growing hairs and knowing about too many
				539	things. And it becomes really hard to manage.
				540
				541	38:38 DARIN: On top of that, you start to have things where you say, gee, I
				542	really wish this system could live in a different process. I mentioned the
				543	networking process. All these events were coming through these different kinds
				544	of crossroads of hell files. That was how I liked to call them. And in order to
				545	take a subset of that and move it to a different process, now you have to redo
				546	all that plumbing. And so the amount of layers of repeating yourself for
				547	plumbing IPCs felt very out of control for - maybe how much work you had to do
				548	to unlock a certain feature just seemed out of control. And so Mojo really was
				549	inspired by how to eliminate a lot of that, to have a system that's more
				550	endpoint to endpoint-based and all the flow of data would no longer be
				551	dependent on all of these kinds of routing classes that handled all this
				552	routing. And instead, you could just say, I have an endpoint. I have an
				553	endpoint over here. This one's privileged. This one's not. And if I want this
				554	one to live over here, I can do that. I can just move it around freely. And all
				555	the routing is taken care of for me. And so that was a big change. And there's
				556	many artifacts in the code base that sort of reveal the old system, right? In
				557	many ways, the way the product is built still resembles that old system. The
				558	idea that if you look at a render view, render view host, there's an ID, a
				559	routing ID associated with that. The concept of routing IDs is not needed in
				560	Mojo anymore because the pipe itself, the Mojo pipe is like an identifier, in
				561	some sense. Of course, so much of our system is built up around the idea that
				562	tabs have these render view IDs, and frames have render frame IDs, and
				563	processes have process IDs. And so many systems deal with those integers that
				564	it's been unthinkable to not have those anymore. But in some sense, they aren't
				565	really needed. If we were to build things from scratch, anew, with the Mojo
				566	system, you wouldn't need it.
				567
				568	40:50 SHARON: Do you think if you were to start to redesign the whole
				569	multi-process thing now, given how not just the internet is used, but also the
				570	devices that are out there, I think you would probably want to have multiple
				571	processes for things. But do you think there would be significant changes to
				572	how the system overall is designed or put together if one were to start now?
				573
				574	41:16 DARIN: Well, yeah, I mean, it's always a question of where you're
				575	starting from and what the constraints are that you're dealing with. We were
				576	dealing with taking WebKit, which we didn't really have a lot of ownership of.
				577	And it was open source, but we also had limited bandwidth to go and fork it and
				578	manage that fork. And so to kind of try to create multi-process in the context
				579	of this big significant piece that we really can't change or do much about
				580	definitely limited us. So we had early ambitions and ideas. Like I said with
				581	Charlie about site isolation, it wasn't going to be then that we could realize
				582	it. It needed to be in a place where we had ownership of Blink. And not just
				583	ownership, I mean capability to go and change it and to own the consequences of
				584	changing it, to be able to manage that. We needed that, and we needed a lot of
				585	other pieces. So if I'm starting over, I also have to - it's sort of like,
				586	well, what am I starting from, right? But certainly, I feel like a lot of
				587	lessons along the way inspired Mojo and the design there. And I feel like
				588	that sort of system would allow for an architecture that I
				589	think would be better in many ways. And I'm very biased because that's
				590	something I've worked on, and it was inspired by things I saw that weren't
				591	great about the way that we built Chrome originally, although, in many ways,
				592	the original setup with Chrome was born of pragmatism and minimalism in many
				593	ways, trying to achieve - Chrome was very focused on being a product first, not
				594	a browser construction kit. And so the idea that it needed to morph into a lot
				595	of different things wasn't there in the beginning. In the beginning, it was,
				596	you're just building a browser for Windows XP Service Pack 2. That's it,
				597	nothing else. OK, now Vista. You got to worry about Vista, too, sorry. But just
				598	that's it. And then later on, you add Mac. You add Android. You add Chrome OS,
				599	iOS, Chromecast, et cetera, et cetera. And suddenly your world is very
				600	complicated, and the needs of this system are way more. And the value of
				601	malleability becomes higher. Look at the investment in views, et cetera, to
				602	allow cross-platform UI, and then Mojo to allow a much more flexible system
				603	under the hood. So it depends on your constraints in a lot of ways.
				604
				605	43:43 SHARON: Yeah, that makes sense. Something you said about even now in the
				606	code base, you can see remnants or suggestions, maybe, of how
				607	things used to be. So one of the things that makes me think of is about the IO
				608	and UI threads because I feel like people used to talk about those more. And
				609	now that's maybe changing a bit. So how come these are the only times we hear
				610	the term "thread," really, in all of this? And what are the IO and UI threads?
				611	Can you just tell us a bit about those?
				612
				613	44:20 DARIN: Oh, yeah, threading is a super fun topic. Now we have all these
				614	task runner concepts and systems for giving you a task runner that's on an
				615	isolated thread or whatever. And systems like Mojo allow you to not really have
				616	to do a lot of plumbing to compensate for your choice of thread where you want
				617	something to run. You can just indicate where it should go, and that happens.
				618	But OK, originally, the design of the system was there was a UI thread, and
				619	that's where all the UI lives. So the HWNDs, the Window handles and all the
				620	Win32 stuff would go there. Input and painting come in there. Then there was - so
				621	early on, I like to tell this story because one of the very first versions of
				622	Chrome, we had just that UI thread sending data to renderer processes. And
				623	the renderers would have their main thread where they ran JavaScript and
				624	everything. So there was just these two threads in two different processes.
				625	That was kind of it. In the browser process, the system
				626	was probably doing a lot of other stuff with its networking stack and DNS
				627	threads and such. But we weren't doing any. That wasn't us. That was probably
				628	libraries we were using. So we had these two threads in two different processes
				629	and an IPC channel. And so you send the input down to the renderer. The renderer
				630	sends you a bitmap. OK, Google Maps. Imagine Google Maps. And imagine you're on
				631	a single core, non hyper-threaded laptop. And you take your mouse, and you
				632	click on that map, and you start dragging it around. And you expect to see the
				633	image tiles moving around, right? And but for some reason, in Chrome, on that
				634	device, [SNAP] nothing happens. You just move your mouse around, and the image
				635	is stuck there. You're like, what's going on? It works fine on this other
				636	laptop. Why not on this laptop? Turns out that on that device, in that setup,
				637	the input stream was coming in. And basically, we were sending all this input,
				638	and the input events were taking priority in the Windows Event pump over any
				639	painting and/or reading from our IPC channels. And so, as a result, we were
				640	just sending input events to the renderer. It was doing work, generating new
				641	images. Those images were coming to the browser and backed up in some pipe and
				642	not really being serviced, not really making their way. And so we kind of came
				643	to the realization of several things. One is, we need to throttle that input
				644	going to the renderer, but we also probably need to have some highly responsive
				645	IO threads that could be dedicated to servicing the pipes, the channels, the
				646	IPC channels, both in the browser and the renderer, actually. And so what was
				647	born from that was the IO thread. And the IO thread was meant to be a highly
				648	responsive thread for processing asynchronous IO. That's really what its name
				649	should be - highly responsive, non-blocking IO thread - because the name IO
				650	thread subsequently confused lots of people who wanted to do blocking IO on
				651	that thread, like read a file or something. And we had to put in some
				652	restrictions in the code to always let you know not to - that this function is
				653	going to - there's certain runtime assertions if you try to use certain
				654	blocking IO functions in base on the wrong threads. And alongside that, we
				655	invented something called the file thread. Said, this is the thread where you
				656	read files. This is the thread where you write files because we don't want you
				657	doing that on the UI thread because the UI thread needs to be responsive to
				658	user input. So don't do blocking file IO on the UI thread. Don't do it on the
				659	IO thread either. Do it on the file thread. So -
				660
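A toy Python model of the input throttling idea mentioned above. The transcript only says that input going to the renderer needed to be throttled; the ack-based coalescing shown here is an assumption made for illustration, and `ThrottledInputSender` and its methods are hypothetical names, not Chromium classes.

```python
from typing import List, Optional, Tuple

MouseMove = Tuple[int, int]


class ThrottledInputSender:
    """Sends at most one mouse move at a time and coalesces the rest."""

    def __init__(self) -> None:
        self.sent: List[MouseMove] = []          # events actually sent to the renderer
        self._in_flight = False                  # True while waiting for an ack
        self._pending: Optional[MouseMove] = None

    def on_mouse_move(self, pos: MouseMove) -> None:
        if self._in_flight:
            # The renderer is still busy: keep only the newest position.
            self._pending = pos
        else:
            self._send(pos)

    def on_renderer_ack(self) -> None:
        # The renderer finished the previous event; flush the newest one, if any.
        self._in_flight = False
        if self._pending is not None:
            pos, self._pending = self._pending, None
            self._send(pos)

    def _send(self, pos: MouseMove) -> None:
        self._in_flight = True
        self.sent.append(pos)


sender = ThrottledInputSender()
for x in range(5):
    sender.on_mouse_move((x, 0))   # five moves arrive before any ack
sender.on_renderer_ack()
```

With this policy the renderer sees the first move and then the most recent one, instead of a backlog of every intermediate position.
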
				661	48:14 SHARON: That means they're all running in the browser process.
				662
				663	48:20 DARIN: In the browser process. The renderer got its own IO thread, too.
				664	So the renderer would have its main WebKit thread and its IO thread. So it was
				665	sort of a symmetric system. You had IPC channel, which was wrapped with a class
				666	called `ipc_channel_proxy`. These things still exist in the code base. And
				667	ChannelProxy was a way to use an IPC channel from a different thread. But the
				668	IPC channel would be bound to the IO thread. All of those things I just
				669	mentioned still exist, and Mojo was built on top of those channels. But the IPC
				670	channel provides that underlying pipe. So it's kind of IPC channel is
				671	one-to-one with an OS pipe. Mojo has this concept of pipes which are more like
				672	virtual pipes, and they're multiplexed over an OS pipe.
				673
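The idea of several virtual pipes sharing one OS pipe can be sketched as simple framing. This is a toy Python illustration under stated assumptions: the header layout (`pipe id`, `payload length`) and the helper names are invented here and are not Mojo's actual wire format.

```python
import struct
from collections import defaultdict
from typing import DefaultDict, List

# Hypothetical frame header: 4-byte pipe id, 2-byte payload length, network order.
HEADER = struct.Struct("!IH")


def encode_frame(pipe_id: int, payload: bytes) -> bytes:
    """Tags a payload with the virtual pipe it belongs to."""
    return HEADER.pack(pipe_id, len(payload)) + payload


def demultiplex(stream: bytes) -> DefaultDict[int, List[bytes]]:
    """Splits one byte stream back into per-pipe message lists."""
    inboxes: DefaultDict[int, List[bytes]] = defaultdict(list)
    offset = 0
    while offset < len(stream):
        pipe_id, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        inboxes[pipe_id].append(stream[offset:offset + length])
        offset += length
    return inboxes


# Messages for two virtual pipes interleaved on a single underlying stream.
wire = encode_frame(1, b"cookie?") + encode_frame(2, b"paint") + encode_frame(1, b"ok")
inboxes = demultiplex(wire)
```

Each virtual pipe sees only its own messages, in order, even though everything travelled over one shared stream.
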
				674	49:08 SHARON: OK. Yeah, because I think, yeah, now you hear non-blocking IO,
				675	but I feel like maybe it's just what part of the code base you work in. But
				676	running things, making sure things run on the right thread seems to be less of
				677	a problem than it used to be.
				678
				679	49:27 DARIN: Yes. I think there's a lot of reasons for that, a lot of maturity
				680	in the system. But also, I think some of the primitives are set up nicely so
				681	that you can more easily have things running. In some ways, we used to have
				682	this concept of, yeah, we very much had this. Still, in some ways, still have
				683	this, but the idea that there is a UI thread, that there's an IO thread, and
				684	that there is a file thread, and you pick which thread you're going to use.
				685	Now, there's a whole pool of blocking IO threads. And you don't specifically
				686	say, I want the file thread. You say, I have blocking IO I want to do, or give
				687	me a - I want to put it on a thread pool. The IO thread used to be like where -
				688	it may be still the case that some systems would just live there only because
				689	maybe for latency reasons - like, cookies is a good example. We knew that we
				690	wanted to be able to respond quickly to the renderer if it was querying a
				691	cookie database. So we want to be able to service that directly on the IO
				692	thread. And so there'd be a collection of these things that were maybe somewhat
				693	sensitive, and but we wanted to have them live and be on the IO thread. And so
				694	that idea of some things live on the IO thread was born. But I think those
				695	things are few. And you really have to highly justify why you should be on that
				696	thread. And so most things don't need to be. Just be on the UI thread. It's OK.
				697	Or structure your work so that the part that is expensive and blocking goes to
				698	a blocking queue.
				699
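The pattern of keeping a responsive thread free and pushing blocking work to a pool can be sketched with Python's standard library. This is an analogy only, not Chromium's task scheduler; `ui_tasks`, `read_settings_blocking`, and `on_settings_loaded` are hypothetical names.

```python
import queue
from concurrent.futures import Future, ThreadPoolExecutor

# Stand-in for the UI thread's task queue.
ui_tasks: "queue.Queue[str]" = queue.Queue()


def read_settings_blocking() -> str:
    # Pretend this is slow, blocking file IO that must stay off the UI thread.
    return "theme=dark"


def on_settings_loaded(future: Future) -> None:
    # Runs when the work completes (normally on a pool thread);
    # hands the result back to the "UI thread" through its queue.
    ui_tasks.put(future.result())


with ThreadPoolExecutor(max_workers=4) as blocking_pool:
    # The caller does not pick a specific thread, only "somewhere blocking is OK".
    blocking_pool.submit(read_settings_blocking).add_done_callback(on_settings_loaded)

result = ui_tasks.get(timeout=5)
```

The caller never names a "file thread"; it only states that the work may block, and the pool decides where it runs.
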
				700	51:00 SHARON: So partly for these threads, sometimes you see checks. Like,
				701	check that this is running on a certain thread. But in general, is there a good
				702	way to find out what process a certain block of code runs on? Because some
				703	things we know - if you go to a third party Blink, whatever, you kind of know
				704	that that's going to run in a render process, but just looking at the code,
				705	like looking in code search, can you know where something is going to -
				706
				707	51:25 DARIN: [INAUDIBLE] very early on to try to deal with this. So like if you
				708	go to the content directory, it's a good one to look at. You'll see a browser
				709	directory, subdirectory, a renderer subdirectory, and a common directory. And
				710	there's some other ones that have these familiar names. We use that structure
				711	all throughout the code base for different components. So if you go components,
				712	components foo, you'd see browser, renderer, common, maybe a subset of those,
				713	depending on. And so the idea is, if it's code that should only run in the
				714	renderer, it lives in the renderer directory. If it's code that should only run
				715	in the browser, it lives in the browser directory. If it's code that could run
				716	in either, it lives in the common directory. So you'll see mojom definitions in
				717	common directories because mojom is where you define the Mojo interface that's
				718	going to be used in both processes.
				719
				720	52:12 SHARON: Oh.
				721
				722	52:12 DARIN: Yeah, we also have this code separation was also kind of born out
				723	of this idea at one point in time that we might generate a totally different
				724	binary for browser and renderer. And we used to have browsR. I'm calling it
				725	that way because it didn't have an E at the end, so browsR and capital R, and
				726	then rendR or something like this. And these were the two processes, the two
				727	executables. And they could just compile whatever code they needed for their
				728	purpose. Like WebKit would be in the renderer, and browser would have not
				729	WebKit. It would have other things. And so these separate directories also
				730	helped because it was like, that's the code that's going to go into that
				731	process literally. And fast forward when Sandbox came along, the team was like,
				732	nope, it's got to be the same executable for both browser and renderer and
				733	should probably be called chrome.exe instead. And then that idea kind of that
				734	they were separate executables and separate code kind of went away. And
				735	instead, all the code for Chrome went into just this big DLL on Windows. And
				736	the amount of shared code between the EXE and the DLL is very small, maybe a
				737	little bit from base and such. But yeah, this idea of tagging the directory
				738	structure in such a way that makes it obvious of like what process this code
				739	belongs in, I think it was a big help, and it was a good choice. And it gives
				740	people a little clarity of where they are and what they can use.
				741
				742	53:49 SHARON: What about for non-browser, non-renderer processes? What about GPU,
				743	network? How do you know that this is running on the network process versus
				744	this is how this part of this section of the code is interacting maybe with the
				745	network process?
				746
				747	54:05 DARIN: Sometimes it can be a little bit of good luck. And sometimes it
				748	might not be as obvious. I don't think this sort of - this structure that I
				749	described was used for plugins, so there's a plugins directory, which may still
				750	be around in some fashion or might be mostly gone. I don't know if when the
				751	network process transition occurred, if this annotation was really maintained.
				752	I actually don't think it was because I don't remember seeing network
				753	directories. But I could be wrong. There might be some of them. I'm not as
				754	familiar with the code for the networking process. But I think this convention
				755	has helped us a lot and would be valuable to use in more places. For GPU,
				756	there's a lot of symmetric code, probably code that runs in all processes, but
				757	still this convention probably would make sense. But yeah, I think that for
				758	some of those things, when you get like into the network world or you get into
				759	the GPU world, you're also kind of in a more focused world, a smaller world.
				760	And there's probably many other things you have to learn about that domain.
				761
				762	55:16 SHARON: Yeah, the GPU stuff seems very, very difficult. And I certainly
				763	don't know how that works. OK, so -
				764
				765	55:23 DARIN: [INAUDIBLE] on there.
				766
				767	55:23 SHARON: Yes, so when it comes to process limits and performance and all
				768	that kind of thing, so we have process limits, but you can go over them. And
				769	can you tell us a bit about process limits, how they work, what happens when
				770	you reach the limit?
				771
				772	55:39 DARIN: Hmm, yeah. So process limits, they exist to just have a reasonable
				773	number of processes allocated for some definition of reasonable. At least early
				774	on, that definition was based on how much RAM you had on your system. And as
				775	computers got more and more RAM, that definition needed to be adjusted. We
				776	assumed some overhead for individual processes. It's probably wise to put some
				777	limits on how many we create. The allocation of those processes, it's best to -
				778	kind of viewed as best to distribute the tabs across them as best as we can and
				779	the origins across them now in the site isolation world to give more isolation
				780	between different origins, to give more isolation between the different apps.
				781	But at some level, you run out, and you need to now allocate across the ones
				782	that are already in use. There's some hard rules around privileged content,
				783	like `chrome:` URLs. They should not mix with ordinary web pages. But if
				784	push comes to shove, we'll put a whole bunch of different origins' content
				785	together into the same process, just ordinary web pages, not trusted content.
				786
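A toy Python sketch of the allocation behavior described above: new processes until a limit is reached, then reuse for ordinary pages, while privileged pages never share. The round-robin reuse and the `assign_processes` function are assumptions made for illustration; Chrome's real policy is considerably more involved.

```python
from typing import Dict, List


def assign_processes(sites: List[str], limit: int) -> Dict[str, int]:
    """Maps each site to a process id under a soft process limit."""
    assignment: Dict[str, int] = {}
    ordinary: List[int] = []   # ids of processes hosting ordinary web content
    next_id = 0
    reuse_index = 0
    for site in sites:
        privileged = site.startswith("chrome://")
        # Privileged pages always get a fresh process, even past the limit.
        if privileged or next_id < limit or not ordinary:
            assignment[site] = next_id
            if not privileged:
                ordinary.append(next_id)
            next_id += 1
        else:
            # Over the limit: share an existing ordinary process (round-robin).
            assignment[site] = ordinary[reuse_index % len(ordinary)]
            reuse_index += 1
    return assignment


sites = [
    "https://a.example",
    "chrome://settings",
    "https://b.example",
    "https://c.example",
    "https://d.example",
]
plan = assign_processes(sites, limit=3)
```

With a limit of three, the last two sites are folded into processes that already host ordinary pages, and the `chrome://settings` process stays unshared.
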
				787	56:52 SHARON: What happens if you just open a ton of tabs with a whole bunch of
				788	different pages open, and you're basically stress testing what Chrome can do?
				789	What happens in that case?
				790
				791	57:08 DARIN: It creates a lot of processes. It uses a lot of system resources.
				792	It uses a lot of RAM. I think that this has been, I'd say, a battle for Chrome
				793	across a lot of its lifetime and more recently, is how to manage these extreme
				794	cases. And increasingly, these extreme cases are not actually odd or unusual.
				795	People do a lot of browsing. People click on a lot of links. People create a
				796	lot of tabs. People don't really close their browsers. They just leave it
				797	running. And they come back the next day, and they continue where they left
				798	off. And they open more tabs, and they do more surfing. And they just collect
				799	and collect and collect tabs. And maybe they create more windows because maybe
				800	they have some task that they're researching, and then they get interrupted and
				801	they come back to it later. But they start to accumulate these windows full of
				802	things that maybe they mean to come back to. And so that problem of just having
				803	lots and lots of stuff and lots and lots of processes, well, Chrome under the
				804	hood is like, I'll do my best. You wanted me to do all this stuff. I'm going to
				805	do it. Let's see what I can do. And on a system like Windows or Mac where
				806	there's a lot of RAM maybe, Chrome's thinking, OK, you wanted me to use the
				807	RAM. I'm going to use the RAM. You wanted all those tabs. And then even on
				808	those systems where maybe you're running out of RAM, but there's virtual
				809	memory, there's disk space, all right, let's use it. Let's go. And so I think
				810	it's really quite a challenge, actually.
				811
				812	58:44 DARIN: The original idea of Chrome was, yeah, make it possible for web
				813	pages to take advantage of the resources of your computer. Let it allow web
				814	pages to be more capable because of it, and not be - the old world prior to
				815	Chrome was single-threaded browser, all web pages on the same thread. Like, you
				816	could have a dual core machine, and it wouldn't matter. It wouldn't make your
				817	browser any faster. But now with Chrome, no problem. You got dual core. You got
				818	eight cores, whatever you got. We can have all of those things saturated with
				819	work and allow you to multitask on the web and do lots of amazing things. But I
				820	think it's still a resource management challenge for the browser because on one
				821	hand, you want to give that capability, but on the other hand, you also don't
				822	want to - how much power should you be using? What if the laptop's not plugged
				823	into the wall? What if it's just running on battery? What is the right resource
				824	utilization for Chrome? I don't think that's a solved problem at all. There's
				825	various systems in place to throttle the resource utilization of background
				826	tabs. Timers, for a long time now, have been throttled, but throttling other
				827	things. I know there was a lot of research done into freezing tabs, so
				828	literally suspending them and not letting them do any work. But with that comes
				829	challenges of what do you do with all the IPCs that are inbound to those
				830	processes? They're backing up on pipes, and that's not great. If you unfreeze
				831	them, now there's a blast of IPCs coming in that they suddenly have to service.
				832	That doesn't seem great. Do you drop those IPCs on the floor? Probably not.
				833	Now, the process would be in some weird state, and you might as well have to
				834	just kill it, which, of course, is the case on systems like Chrome OS and
				835	Android. They do have to just kill the processes because of the limits of those
				836	devices. So, yeah, I've been a proponent of just being aggressive about killing
				837	processes on desktop in general. I think there's some balance there that's
				838	right. It's probably not right to keep all the tabs open, all the processes
				839	open. We should be, I think, judicious about what we keep open, keeping the
				840	workload reasonable, instead of making it like a, oh, yeah, I will rise to the
				841	challenge of dealing with thousands of tabs or thousands of web pages across
				842	100 processes, even if - maybe it's somehow possible through heroic effort to
				843	make Chrome capable of doing such a thing in an efficient manner. But does it
				844	mean we should? Who needs 1,000 tabs all running around doing work at once, you
				845	know? You don't. You really don't. Nobody does.
				846
				847	61:32 SHARON: So this is kind of the basis of the goal for Arc, right, which
				848	is I think it closes your tabs overnight or something. And Arc is what you work
				849	on now and is a Chromium-based browser. So for embedders of Chromium, let's say
				850	the browser kind, how much control do you have over how processes are used,
				851	allocated, if you embed content? Like, are you able to just say, oh, I don't
				852	want a network process. I will just put this all in the browser process. Can
				853	you do that?
				854
				855	62:07 DARIN: Hm. You can do anything you want. It's just code. No, but as a
				856	browser embedder, as a Chromium embedder, you're shipping Chromium. So Arc
				857	browser ships a copy of Chromium. And Arc browser includes changes to Chromium
				858	as needed to make it work. Of course, that's possible. Of course, you could
				859	change a lot of stuff and make a big headache to manage it all, right? So
				860	there's some natural limits. You don't want to change too many things, or else
				861	you won't be able to really manage it going forward. You want to take updates
				862	from the mainline, incorporate improvements, but you also want to preserve some
				863	differences that you've made. Well, how do you do that? And so change
				864	management is a challenge. So there's a natural limit to how much you want to
				865	alter the base functionality. Instead, it's - anyways, a product like Arc is
				866	not so much differentiating on the basis of Chromium code or content layer.
				867	It's not really its purpose or goal. Its purpose is to differentiate at the UI
				868	layer and with things like what you mentioned and other things as well. Yeah,
				869	and so, of course, if one were to go down the path of could we optimize process
				870	model better, that would be in the realm of things that would be great to
				871	contribute to Chromium, so that it could be part of the mainline and therefore
				872	not be something that you have to maintain yourself. That's how I would
				873	approach it as a Chromium embedder.
				874
				875	63:47 SHARON: OK, that makes sense. Yeah, if it's in Chromium, you don't have
				876	to worry about the updates, and you just get -
				877
				878	63:53 DARIN: Turns out there's an army of engineers who would make sure it's
				879	never broken. You just gotta write some tests.
				880
				881	63:59 SHARON: Oh, wow.
				882
				883	63:59 DARIN: [INAUDIBLE] those tests.
				884
				885	64:05 SHARON: So with non-browser embedders of Chromium, like, say, Electron, I
				886	don't know how familiar you are with that, but they presumably would have
				887	different needs out of how Chromium works, basically. I don't know if you know
				888	what they're doing with any processes.
				889
				890	64:25 DARIN: I mean, I've used VS Code. That's a famous example of a Chromium
				891	embedder that you might not realize is using Chromium or built on top of it,
				892	that one might not realize that. But if you open up Task Manager and you look
				893	at VS Code, you'll see all the glorious processes under there. And so have they
				894	or Electron or any of these, have they altered things there? Maybe. I mean,
				895	there's some configuration one might do. If you're building an application
				896	that's very single purpose, like VS Code or Slack or - what are some other good
				897	examples, there's quite a few that are built on top of Chromium - they're more
				898	single purpose towards a single app, right? Of course, VS Code is pretty
				899	sprawling with all the things you can do in it, but at the same time, it could
				900	be the case that they don't have the same security concerns. They don't have
				901	the same idea of hosting content from so many different sources. So maybe they
				902	would tune the process model a little differently. Maybe they would decide, I
				903	don't really need as many processes because I'm managing things in a different
				904	way. It's not a browser.
				905
				906	65:34 SHARON: Yeah, you're not handling all of the untrusted JavaScript of the
				907	web that you have to be -
				908
				909	65:42 DARIN: Right, I'm not so worried about this part of my application dying
				910	and then wanting to keep the rest of it still running or something because that
				911	would still be considered a bug because part of my app died. And so some of the
				912	reasons for multi-process architecture might be a little different.
				913
				914	66:01 SHARON: Right. And more just for fun, having worked on now an embedder of
				915	Chromium, how has that experience been in terms of decisions that were made
				916	when you were putting together the multi-process architecture? Are there things
				917	where you were like, oh, no, past me, if you'd done this differently, this
				918	would be easier now.
				919
				920	66:20 DARIN: I would say I'm very thankful for Mojo IPC, made it very easy to -
				921	one thing that I've found is that it's possible to do a lot of amazing things
				922	on top of Chromium without actually modifying Chromium. And the Content API and
				923	Mojo IPC make a lot of that really possible. So it's a very flexible system.
				924	There's a lot of really great hooks that let you interact with the system all
				925	the way from extending the renderer to extending the browser. And to be able to
				926	build stuff and layer it on top of a stable system is amazing. When I was
				927	working on building an Android browser, I built a tracking prevention ad
				928	blocking system for Android and was able to do it without modifying Chromium. I
				929	thought that was amazing.
				930
				931	67:19 SHARON: How are you using Mojo? Because Mojo is typically going between
				932	the processes. So if you're not really changing how the processes work, what do
				933	you use Mojo for?
				934
				935	67:26 DARIN: Oh, well, in that case, it was used to communicate a rule set down
				936	to the renderer. And then at the renderer level, I would inject a stylesheet to
				937	do content blocking or to apply network filtering at the Blink layer. So there
				938	are a combination of Blink Public APIs and Content Public APIs. There are
				939	actually enough hooks to be able to filter network requests and insert
				940	stylesheets that would apply display none to a set of DOM elements. So but to
				941	do that efficiently, it was necessary to bundle up those rules into a blob of
				942	memory that you would just send down to the renderer process, to all renderer
				943	processes, so they'd have it available to them so they could just directly inspect
				944	like a big hash map of rules. And so being able to - like I said before, when
				945	the IPC system is just like - when it's decoupled like that with Mojo, it makes
				946	it possible to kind of graft on these systems that they interact with APIs over
				947	here, and that endpoint talks to some endpoint over here in the browser
				948	process, which can have, like I said, like a rules data that it might want to
				949	send over and that kind of thing. And so being able to build those kinds of
				950	systems, and I think if you look at just how a lot of features in Chrome are
				951	built, they're built very similarly, too. They build on top of the Content API
				952	that provides the various hooks. They build on top of Blink API. Sometimes a
				953	feature needs to live in the renderer and the browser process. Like autofill is
				954	always the classic example of this early on in Chrome or password manager.
				955	These are systems that need to crawl the DOM. They need to poke at the DOM.
				956	They need to understand what's there. They need to be able to insert content or
				957	put overlays in, or they need to be able to talk to the browser where the
				958	actual database is, all that kind of stuff, and looking at different load
				959	events and various things to know in the lifecycle of the page. So, yeah, I'd
				960	say I'm thankful for a lot of these design choices along the way because I
				961	think it's led to Chromium being so useful to so many people in so many
				962	different ways. Obviously, it empowered building a really great browser and a
				963	really great product, but it also has empowered a lot of follow-on innovation.
				964	And I think that's pretty cool.
				965
				966	69:53 SHARON: It is pretty cool. So Chrome was released in 2008. It is
				967	now 2023. So as math tells, it's been 15 years. We like numbers that end in 5
				968	and 0. So - I don't know - it's very cool. I remember when Chrome came out. And
				969	I don't know. Do you have any -
				970
				971	70:08 DARIN: Yeah, for me, it's more like 17 years because we started in 2006.
				972
				973	70:14 SHARON: Right. So do you have any general reflections on all the stuff
				974	that's changed in that time?
				975
				976	70:22 DARIN: It's wild. I have a higher density of memories from the early
				977	days, too. It's amazing. I guess that's how memories work when everything's new
				978	and changing so much. But yeah, no, I'm very thankful for the journey and very
				979	thankful to have been part of it. And it was a lot of fun to work on. I mean,
				980	prior to Chrome, when I was working on Firefox, I did a little exploration on
				981	adding like a multi-process thing to Firefox, which I thought - just, I was
				982	learning about how to do IPC, and I was learning - but I was doing it for what
				983	purpose back then. I think I was just toying around with DCOM. I don't know if
				984	anybody knows what COM is, but Microsoft's Component Object Model that was like
				985	all the rage back then. And it allowed for like integrating different languages
				986	together. WinRT is all built on top of this stuff now. But anyways, Mozilla had
				987	its own version of COM called XPCOM. And wouldn't it be cool if you could have
				988	a component that - so you could have components back then that were built in
				989	JavaScript, and you could talk to them from C++, or they were built in C++ more
				990	commonly, and you talked to them from JavaScript. But wouldn't it be cool if
				991	one endpoint could be in another process? So that was something I was playing
				992	around with in 2004 when I was still working on Firefox. And then when the Chrome
				993	opportunity came along - maybe that was 2005 - I don't know. But when the
				994	Chrome opportunity came along, I was like, all right, let's do it. IPC channel
				995	was basically those ideas, but kind of more polished slightly.
				996
				997	72:02 SHARON: OK. Yeah, very cool. I mean, when I first started working on
				998	Chrome stuff, someone on my team said, any time you change something in base,
				999	that pretty much is going to get run anytime the internet gets run, which I
				1000	thought was super crazy for just some random software engineer like me to be
				1001	able to do, right? But -
				1002
				1003	72:20 DARIN: And now it's even more than that if you think about [INAUDIBLE]
				1004	code and [INAUDIBLE].
				1005
				1006	72:20 SHARON: Yeah, all the stuff. So do you ever just think about it, and
				1007	you're just like, oh, my god, wow.
				1008
				1009	72:26 DARIN: Yeah, it's pretty amazing.
				1010
				1011	72:31 SHARON: So crazy.
				1012
				1013	72:31 DARIN: It is one of the special things about working on Chromium, is
				1014	that, yes, you can have such an amazing impact with the work that you do there.
				1015
				1016	72:38 SHARON: Have there been any cases - these are just now unrelated
				1017	miscellaneous questions. But in terms of surprising usages of Chromium, be it
				1018	like maybe the base or the net stack or something, have there been any cases
				1019	where you were really surprised by like, oh, this is being used here?
				1020
				1021	72:56 DARIN: Well, for sure, the first time I heard about Electron, I was like,
				1022	oh, this is not a good idea. House of cards, you know? It just seems like it's
				1023	such a complicated system to build your app on top of, right? But at the same
				1024	time, I totally get it and appreciate it, and I understand why people would
				1025	reach for it. There's so much good sauce there, so much good stuff and so
				1026	many - there is a lot of really good infrastructure there to build on. Early
				1027	on, I kind of imagined more that things like Skia and V8 and some of the other
				1028	libraries would be the thing that people would make lots of extra use out of,
				1029	right? So I didn't quite imagine people taking the browser's framework like
				1030	this. And we absolutely didn't build it with that purpose. Pretty much every
				1031	choice along the way was highly motivated by making Chrome team's life better.
				1032	Like, Content API was, when we came to the realization we needed it, it was
				1033	like we desperately need it. Just the complexity of Chrome was getting
				1034	unwieldy. We needed to cleave part of it and say, that is this part. We needed
				1035	to somehow draw a line in the sand and say, this is the set of concerns over
				1036	here. And so the idea that all of this could be used for other purposes is
				1037	cool, but it was never really in the initial cards. And I came from working on
				1038	Mozilla, which was, in many ways, browser construction kit first, product
				1039	second. So Chrome was very much like, let's go the other extreme - product
				1040	first, maybe a platform later. And to see it be this platform now is pretty
				1041	cool. But it's pretty far from where we started.
				1042
				1043	74:50 SHARON: Yeah, kind of - I watched some of the earlier talks you gave
				1044	about the multi-process architecture and Content, not Chrome, came up a bunch.
				1045	And this is, things, I guess, like Electron are the result of that, right?
				1046	Where -
				1047
				1048	75:01 DARIN: Yeah, it's pretty wild. Yeah, I mean, so Mozilla built this very
				1049	elaborate system called XUL, or X-U-L, which was an XML language for doing UI.
				1050	And it's very interesting, intellectually interesting, maybe different than
				1051	XAML. XAML is way better probably in many ways. But XUL was kind of XHTML
				1052	minus, minus, with a bunch of stuff added on for like UI things. And then it
				1053	had this thing called XBL, which is a bindings language that you could do
				1054	custom bindings. And so anyways, then you build your application in JavaScript
				1055	and Firefox, Mozilla, it was all built this way. So it was like a web page
				1056	hosting a web page. The outer web page was like this XML DOM. The product
				1057	engineers working on that, in order to get some modern Windows sort of thing
				1058	come through, they had to basically go through the rendering engine team to get
				1059	them to do something. And so it really greatly limited the ability for product
				1060	team to actually build product. And there were so many sacred cows around the
				1061	shape of Gecko and how that structure was, that while this cross-platform
				1062	toolkit seemed glorious at first, it ended up being handcuffs for product
				1063	engineering, I think. So, yeah, Chrome started out with Win32 native UI for
				1064	browser UI. You have all the choices you want to make, browser front-end
				1065	engineers. You also have to build a lot of code, but no cross-platform
				1066	toolkits. Views came later.
				1067
				1068	76:43 SHARON: Right. Well, this was great. Thank you very much. Normally, we do
				1069	a shout-out section at the end. Do you have anything - normally, it's like a
				1070	Slack channel or something like the Mojo Slack channel. I think in this case,
				1071	it's maybe - I don't know if there is a specific thing, but is there anything?
				1072
				1073	76:57 DARIN: Shout-out to all the team and the engineers making everything
				1074	great.
				1075
				1076	77:03 SHARON: All right.
				1077
				1078	77:03 DARIN: Yeah.
				1079
				1080	77:03 SHARON: Cool. Awesome. Well, thank you very much for chatting with us.
				1081	That was super cool, lots of really interesting background and good
				1082	information. So thank you very much.
				1083
				1084	77:15 DARIN: Yeah, a pleasure. Thank you so much for having me.
				1085
				1086	77:21 SHARON: Talk about threads, so IO, UI thread.
				1087
				1088	77:27 DARIN: Do I get credit for the confusingly named IO thread?
				1089
				1090	77:27 SHARON: OK, all right, we can cover that. That's cool. Yeah, why is it
				1091	called IO thread when it doesn't do IO?