## Life of a Navigation

Navigation is one of the main functions of a browser. It is the process
through which the user loads documents. This documentation traces the life of
a navigation from the time a URL is typed in the URL bar to the time the web
page is completely loaded.


### BeforeUnload

Once a URL is typed, the first step of a navigation is to execute the
beforeunload event handler of the previous document, if a document is already
loaded. This allows the previous document to prompt the user whether they want
to leave, to avoid losing any unsaved data. In this case, the user can cancel
the navigation and no more work will be performed.


### Network Request and Response

If there is no beforeunload handler registered, or the user agrees to proceed,
the next step is making a network request to the specified URL to retrieve the
contents of the document to be rendered. Assuming no network error is
encountered (e.g. DNS resolution error, socket connection timeout, etc.), the
server will respond with data, with the response headers coming first. The
parsed headers give enough information to determine what needs to be done
next.

The HTTP response code allows the browser process to know whether one of the
following conditions has occurred:

* A successful response follows (2xx)
* A redirect has been encountered (response 3xx)
* An HTTP level error has occurred (response 4xx, 5xx)

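The three cases above can be sketched as a small classifier. This is a minimal
standalone model, not Chromium code: `ResponseKind` and `ClassifyStatus` are
hypothetical names, and treating every 3xx code as a redirect is a
simplification taken directly from the list above.

```cpp
// Hypothetical names for illustration; these are not Chromium types.
enum class ResponseKind { kSuccess, kRedirect, kHttpError, kOther };

// Maps an HTTP status code onto the three cases described in the text.
ResponseKind ClassifyStatus(int status_code) {
  if (status_code >= 200 && status_code < 300) return ResponseKind::kSuccess;
  if (status_code >= 300 && status_code < 400) return ResponseKind::kRedirect;
  if (status_code >= 400 && status_code < 600) return ResponseKind::kHttpError;
  // 1xx and out-of-range codes are not covered by the text above.
  return ResponseKind::kOther;
}
```

For example, `ClassifyStatus(404)` yields `ResponseKind::kHttpError`.
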
There are two cases where a navigation network request can complete without
resulting in a new document being rendered. The first one is HTTP response
code 204 or 205, which tells the browser that the response was successful, but
there is no content that follows, and therefore the current document must
remain active. The other case is when the server responds with a header
indicating that the response must be treated as a download. All the data is
read by the browser and then saved to the local filesystem.

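A rough sketch of that decision follows. It assumes the download signal is a
`Content-Disposition` header whose value starts with `attachment`, which is
the usual way servers request a download; the text above does not name the
header, and `NavigationOutcome` and `DecideOutcome` are hypothetical names.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical names for illustration; these are not Chromium types.
enum class NavigationOutcome {
  kRenderDocument,
  kStayOnCurrentDocument,
  kDownload
};

// Lowercases ASCII so the header value can be compared case-insensitively.
std::string ToLowerAscii(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Decides what a final (non-redirect) response leads to.
NavigationOutcome DecideOutcome(int status_code,
                                const std::string& content_disposition) {
  // 204/205: success, but no content, so the current document stays active.
  if (status_code == 204 || status_code == 205)
    return NavigationOutcome::kStayOnCurrentDocument;
  // Assumption: a value beginning with "attachment" marks a download.
  if (ToLowerAscii(content_disposition).rfind("attachment", 0) == 0)
    return NavigationOutcome::kDownload;
  return NavigationOutcome::kRenderDocument;
}
```

Only the `kRenderDocument` outcome continues on to the later stages of the
navigation.
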
If the server responds with a redirect, the network stack makes another
request based on the HTTP response code and the Location header. The browser
continues following redirects until either an error or a successful response
is encountered.

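The redirect loop can be modelled as below. This is a sketch under stated
assumptions, not the network stack's implementation: `FakeServer` stands in
for the network, an unknown URL plays the role of a network error, `Location`
values are assumed to be absolute URLs, and the cap of 20 redirects is an
assumption of this sketch.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical types for illustration; these are not Chromium types.
struct Response {
  int status_code;
  std::string location;  // Only meaningful for 3xx responses.
};

struct FinalResult {
  std::string url;
  int status_code;
};

// Stand-in for the network: maps a URL to a canned response.
using FakeServer = std::map<std::string, Response>;

// Assumed cap on the number of redirects followed in this sketch.
constexpr int kMaxRedirects = 20;

// Follows redirects until a non-redirect response arrives. Returns
// std::nullopt on a simulated network error or when the cap is exceeded.
std::optional<FinalResult> FollowRedirects(const FakeServer& server,
                                           std::string url) {
  for (int hops = 0; hops <= kMaxRedirects; ++hops) {
    auto it = server.find(url);
    if (it == server.end()) return std::nullopt;  // Simulated network error.
    const Response& response = it->second;
    const bool is_redirect = response.status_code >= 300 &&
                             response.status_code < 400 &&
                             !response.location.empty();
    if (!is_redirect) return FinalResult{url, response.status_code};
    url = response.location;  // Issue the next request to the new location.
  }
  return std::nullopt;  // Too many redirects.
}
```

The returned URL is the one the rest of the navigation works with; it can
differ from the URL that was originally requested.
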
Once there are no more redirects, if the response is not a 204/205 or a
download, the network stack reads a small chunk of the actual response data
that the server has sent. By default this is used to perform MIME type
sniffing, to determine what type of response the server has sent. This
sniffing behavior can be suppressed by sending an “X-Content-Type-Options:
nosniff” header as part of the response headers.


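To make the effect of `nosniff` concrete, here is a toy model of sniffing. It
is deliberately much simpler than real browser sniffing: it only sniffs when
no type was declared, it knows just two signatures, and `EffectiveMimeType`
and the `application/octet-stream` fallback are assumptions of this sketch.

```cpp
#include <string>

// Returns true if `data` begins with `prefix`.
bool StartsWith(const std::string& data, const std::string& prefix) {
  return data.compare(0, prefix.size(), prefix) == 0;
}

// Toy model: picks a MIME type from the declared type, the nosniff flag,
// and the first bytes of the response body.
std::string EffectiveMimeType(const std::string& declared_type, bool nosniff,
                              const std::string& first_bytes) {
  const std::string fallback = "application/octet-stream";
  // Simplification: any declared type is trusted as-is.
  if (!declared_type.empty()) return declared_type;
  // With nosniff, the body is never inspected.
  if (nosniff) return fallback;
  // Two illustrative signatures; real sniffing checks many more.
  if (StartsWith(first_bytes, "%PDF-")) return "application/pdf";
  if (StartsWith(first_bytes, "<!DOCTYPE html") ||
      StartsWith(first_bytes, "<html"))
    return "text/html";
  return fallback;
}
```

With an empty declared type, the same body bytes produce a sniffed type when
`nosniff` is false and the fallback type when it is true.
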
### Commit

At this point the response is passed from the network stack to the browser
process to be used for rendering a new document. The browser process selects
an appropriate renderer process for the new document based on the origin and
headers of the response as well as the current process model and isolation
policy. It then sends the response to the chosen process, waiting for it to
create the document and send an acknowledgement. This acknowledgement from the
renderer process marks the _commit_ time, when the browser process changes its
security state to reflect the new document and creates a session history entry
for the previous document.

As part of creating the new document, the old document needs to be unloaded.
In navigations that stay in the same renderer process, the old document is
unloaded by Blink before the new document is created, including running any
registered unload handlers. In the case of a navigation that goes
cross-process, any unload handlers are executed in the previous document’s
process concurrently with the creation of the new document in the new process.

Once the creation of the new document is complete and the browser process
receives the commit message from the renderer process, the navigation is
complete.


### Loading

Even once navigation is complete, the user doesn't actually see the new page
yet. Most people use the word navigation to describe the act of moving from
one page to another, but in Chromium we separate that process into two phases.
So far we have described the _navigation_ phase; once the navigation has been
committed, the process moves into the _loading_ phase. Loading consists of
reading the remaining response data from the server, parsing it, rendering the
document so it is visible to the user, executing any script accompanying it,
and loading any subresources specified by the document.


The main reason for splitting into these two phases is that errors are treated
differently before and after a navigation commits. Consider the case where the
server responds with an HTTP error code. When this happens, the browser still
commits a new document, but that document is an error page. The error page is
either generated based on the HTTP response code or read as the response data
from the server. On the other hand, if a successful navigation has committed a
real document and has moved to the loading phase, it is still possible to
encounter an error, for example a network connection can be terminated or
time out. In that case the browser displays as much of the new document as it
can, without showing an error page.


### WebContentsObserver

Chromium exposes the various stages of navigation and document loading through
methods on the [WebContentsObserver] interface.

#### Navigation

* DidStartNavigation - invoked after executing the beforeunload event handler
  and before making the initial network request.
* DidRedirectNavigation - invoked every time a server redirect is encountered.
* ReadyToCommitNavigation - invoked at the time the browser process has
  determined that it will commit the navigation and has picked a renderer
  process for it, but before it has sent it to the renderer process. It is not
  invoked for same-document navigations.
* DidFinishNavigation - invoked once the navigation has committed. The commit
  can be either an error page if the server responded with an error code or a
  successful document.


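The ordering of these callbacks can be illustrated with a standalone model.
The method names below mirror the ones listed above, but everything else is an
assumption of this sketch: `NavigationObserverModel`, `RecordingObserver`, and
`SimulateNavigation` are hypothetical, and the callbacks take a plain URL
string rather than whatever arguments the real interface passes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for the navigation callbacks; not the real interface.
class NavigationObserverModel {
 public:
  virtual ~NavigationObserverModel() = default;
  virtual void DidStartNavigation(const std::string& url) {}
  virtual void DidRedirectNavigation(const std::string& url) {}
  virtual void ReadyToCommitNavigation(const std::string& url) {}
  virtual void DidFinishNavigation(const std::string& url) {}
};

// Records the name of each callback in the order it is invoked.
class RecordingObserver : public NavigationObserverModel {
 public:
  void DidStartNavigation(const std::string&) override {
    events.push_back("DidStartNavigation");
  }
  void DidRedirectNavigation(const std::string&) override {
    events.push_back("DidRedirectNavigation");
  }
  void ReadyToCommitNavigation(const std::string&) override {
    events.push_back("ReadyToCommitNavigation");
  }
  void DidFinishNavigation(const std::string&) override {
    events.push_back("DidFinishNavigation");
  }
  std::vector<std::string> events;
};

// Drives one navigation. `url_chain` holds the initial URL followed by each
// redirect target; same-document navigations skip ReadyToCommitNavigation.
void SimulateNavigation(NavigationObserverModel& observer,
                        const std::vector<std::string>& url_chain,
                        bool same_document) {
  if (url_chain.empty()) return;
  observer.DidStartNavigation(url_chain.front());
  for (std::size_t i = 1; i < url_chain.size(); ++i)
    observer.DidRedirectNavigation(url_chain[i]);
  if (!same_document) observer.ReadyToCommitNavigation(url_chain.back());
  observer.DidFinishNavigation(url_chain.back());
}
```

Running `SimulateNavigation` with one redirect and `same_document` set to
false records the four callbacks in the order they are listed above.
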
#### Loading

* DidStartLoading - invoked once per WebContents, when a navigation is about
  to start, after executing the beforeunload handler. This is equivalent to the
  browser UI starting to show a spinner or other visual indicator for
  navigation and is invoked before the DidStartNavigation method for the
  navigation.
* DOMContentLoaded - invoked per RenderFrameHost, when the document itself
  has completed loading, but before subresources may have completed loading.
* DidFinishLoad - invoked per RenderFrameHost, when the document and all of its
  subresources have finished loading.
* DidStopLoading - invoked once per WebContents, when the top-level document,
  all of its subresources, all subframes, and their subresources have completed
  loading. This is equivalent to the browser UI no longer showing a spinner or
  other visual indicator for navigation and loading.
* DidFailLoad - invoked per RenderFrameHost, when the document load failed, for
  example due to network connection termination before reading all of the
  response data.


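The difference between the per-frame and per-WebContents events can be shown
with a small bookkeeping model. It is only a sketch: `LoadingTracker` and the
integer frame ids are hypothetical, and it tracks nothing but which frames are
still loading.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical model: emits the once-per-WebContents events when the set of
// loading frames becomes non-empty and when it becomes empty again.
class LoadingTracker {
 public:
  void FrameStartedLoading(int frame_id) {
    const bool was_idle = loading_frames_.empty();
    loading_frames_.insert(frame_id);
    if (was_idle) events_.push_back("DidStartLoading");
  }

  void FrameFinishedLoading(int frame_id) {
    // Ignore frames that were not loading.
    if (loading_frames_.erase(frame_id) == 0) return;
    events_.push_back("DidFinishLoad(" + std::to_string(frame_id) + ")");
    if (loading_frames_.empty()) events_.push_back("DidStopLoading");
  }

  const std::vector<std::string>& events() const { return events_; }

 private:
  std::set<int> loading_frames_;
  std::vector<std::string> events_;
};
```

With a main frame and one subframe, the per-frame event is recorded twice,
while the start and stop events are each recorded once.
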
### Same-Document and Cross-Document Navigations

Chromium defines two types of navigations based on whether the navigation
results in a new document or not. A _cross-document_ navigation is one that
results in creating a new document to replace an existing document. This is
the type of navigation that most users are familiar with. A _same-document_
navigation does not create a new document, but rather keeps the same document
and changes state associated with it. A same-document navigation does create a
new session history entry, even though the same document remains active. This
can be the result of one of the following cases:

* Navigating to a fragment within an existing document (e.g.
  http<nolink>://foo.com/1.html#fragment)
* A document calling the history.pushState() or history.replaceState() APIs
* A session history navigation, such as going back/forward, to an existing entry
  for the same document.


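The fragment case from the list above can be approximated with a string
comparison. This is a simplified sketch with hypothetical helper names; it
compares raw strings and does no URL parsing or canonicalization.

```cpp
#include <string>

// Returns the URL with any "#fragment" suffix removed.
std::string StripFragment(const std::string& url) {
  return url.substr(0, url.find('#'));
}

// True when the target has a fragment and differs from the current URL
// only in that fragment, i.e. the navigation can stay in the same document.
bool IsFragmentNavigation(const std::string& current_url,
                          const std::string& target_url) {
  return target_url.find('#') != std::string::npos &&
         StripFragment(current_url) == StripFragment(target_url);
}
```

Navigating from `http://foo.com/1.html` to `http://foo.com/1.html#fragment`
counts as a fragment navigation here, while a target of
`http://foo.com/2.html#fragment` does not.
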
### Browser-Initiated and Renderer-Initiated Navigations

Chromium also defines two types of navigations based on which process
started the navigation: _browser-initiated_ and _renderer-initiated_. This
distinction is useful when making decisions about navigations, for example
whether an ongoing navigation needs to be cancelled or not when a new
navigation is starting. It is also used for some security decisions, such as
whether to display the target URL of the navigation in the URL bar or not.
Browser-initiated navigations are more trustworthy, as they are usually in
response to a user interaction with the UI of the browser. Renderer-initiated
navigations originate in the renderer process, which may be under the control
of an attacker.

[WebContentsObserver]: https://source.chromium.org/chromium/chromium/src/+/master:content/public/browser/web_contents_observer.h