Annotation of html5/spec/tokenization.html, revision 1.112
1.77 mike 1: <!DOCTYPE html>
1.88 mike 2: <html lang="en-US-x-Hixie"><head><title>8.2.4 Tokenization — HTML5</title><style type="text/css">
1.1 mike 3: pre { margin-left: 2em; white-space: pre-wrap; }
4: h2 { margin: 3em 0 1em 0; }
5: h3 { margin: 2.5em 0 1em 0; }
6: h4 { margin: 2.5em 0 0.75em 0; }
7: h5, h6 { margin: 2.5em 0 1em; }
8: h1 + h2, h1 + h2 + h2 { margin: 0.75em 0 0.75em; }
9: h2 + h3, h3 + h4, h4 + h5, h5 + h6 { margin-top: 0.5em; }
10: p { margin: 1em 0; }
11: hr:not(.top) { display: block; background: none; border: none; padding: 0; margin: 2em 0; height: auto; }
12: dl, dd { margin-top: 0; margin-bottom: 0; }
13: dt { margin-top: 0.75em; margin-bottom: 0.25em; clear: left; }
14: dt + dt { margin-top: 0; }
15: dd dt { margin-top: 0.25em; margin-bottom: 0; }
16: dd p { margin-top: 0; }
17: dd dl + p { margin-top: 1em; }
18: dd table + p { margin-top: 1em; }
19: p + * > li, dd li { margin: 1em 0; }
20: dt, dfn { font-weight: bold; font-style: normal; }
1.80 mike 21: i, em { font-style: italic; }
1.1 mike 22: dt dfn { font-style: italic; }
23: pre, code { font-size: inherit; font-family: monospace; font-variant: normal; }
24: pre strong { color: black; font: inherit; font-weight: bold; background: yellow; }
25: pre em { font-weight: bolder; font-style: normal; }
26: @media screen { code { color: orangered; } code :link, code :visited { color: inherit; } }
27: var sub { vertical-align: bottom; font-size: smaller; position: relative; top: 0.1em; }
28: table { border-collapse: collapse; border-style: hidden hidden none hidden; }
29: table thead, table tbody { border-bottom: solid; }
30: table tbody th:first-child { border-left: solid; }
31: table tbody th { text-align: left; }
32: table td, table th { border-left: solid; border-right: solid; border-bottom: solid thin; vertical-align: top; padding: 0.2em; }
33: blockquote { margin: 0 0 0 2em; border: 0; padding: 0; font-style: italic; }
34:
35: .bad, .bad *:not(.XXX) { color: gray; border-color: gray; background: transparent; }
36: .matrix, .matrix td { border: none; text-align: right; }
37: .matrix { margin-left: 2em; }
38: .dice-example { border-collapse: collapse; border-style: hidden solid solid hidden; border-width: thin; margin-left: 3em; }
39: .dice-example caption { width: 30em; font-size: smaller; font-style: italic; padding: 0.75em 0; text-align: left; }
40: .dice-example td, .dice-example th { border: solid thin; width: 1.35em; height: 1.05em; text-align: center; padding: 0; }
41:
42: .toc dfn, h1 dfn, h2 dfn, h3 dfn, h4 dfn, h5 dfn, h6 dfn { font: inherit; }
1.83 mike 43: img.extra, p.overview { float: right; }
1.82 mike 44: pre.idl { border: solid thin; background: #EEEEEE; color: black; padding: 0.5em 1em; position: relative; }
1.1 mike 45: pre.idl :link, pre.idl :visited { color: inherit; background: transparent; }
1.82 mike 46: pre.idl::before { content: "IDL"; font: bold small sans-serif; padding: 0.5em; background: white; position: absolute; top: 0; margin: -1px 0 0 -4em; width: 1.5em; border: thin solid; border-radius: 0 0 0 0.5em }
1.1 mike 47: pre.css { border: solid thin; background: #FFFFEE; color: black; padding: 0.5em 1em; }
48: pre.css:first-line { color: #AAAA50; }
49: dl.domintro { color: green; margin: 2em 0 2em 2em; padding: 0.5em 1em; border: none; background: #DDFFDD; }
50: hr + dl.domintro, div.impl + dl.domintro { margin-top: 2.5em; margin-bottom: 1.5em; }
51: dl.domintro dt, dl.domintro dt * { color: black; text-decoration: none; }
52: dl.domintro dd { margin: 0.5em 0 1em 2em; padding: 0; }
53: dl.domintro dd p { margin: 0.5em 0; }
1.84 mike 54: dl.domintro:before { display: table; margin: -1em -0.5em -0.5em auto; width: auto; content: 'This box is non-normative. Implementation requirements are given below this box.'; color: black; font-style: italic; border: solid 2px; background: white; padding: 0 0.25em; }
1.1 mike 55: dl.switch { padding-left: 2em; }
56: dl.switch > dt { text-indent: -1.5em; }
57: dl.switch > dt:before { content: '\21AA'; padding: 0 0.5em 0 0; display: inline-block; width: 1em; text-align: right; line-height: 0.5em; }
58: dl.triple { padding: 0 0 0 1em; }
59: dl.triple dt, dl.triple dd { margin: 0; display: inline }
60: dl.triple dt:after { content: ':'; }
61: dl.triple dd:after { content: '\A'; white-space: pre; }
62: .diff-old { text-decoration: line-through; color: silver; background: transparent; }
63: .diff-chg, .diff-new { text-decoration: underline; color: green; background: transparent; }
64: a .diff-new { border-bottom: 1px blue solid; }
65:
66: h2 { page-break-before: always; }
67: h1, h2, h3, h4, h5, h6 { page-break-after: avoid; }
68: h1 + h2, hr + h2.no-toc { page-break-before: auto; }
69:
1.44 mike 70: p > span:not([title=""]):not([class="XXX"]):not([class="impl"]):not([class="note"]),
71: li > span:not([title=""]):not([class="XXX"]):not([class="impl"]):not([class="note"]) { border-bottom: solid #9999CC; }
1.1 mike 72:
73: div.head { margin: 0 0 1em; padding: 1em 0 0 0; }
74: div.head p { margin: 0; }
75: div.head h1 { margin: 0; }
76: div.head .logo { float: right; margin: 0 1em; }
77: div.head .logo img { border: none } /* remove border from top image */
78: div.head dl { margin: 1em 0; }
79: div.head p.copyright, div.head p.alt { font-size: x-small; font-style: oblique; margin: 0; }
80:
81: body > .toc > li { margin-top: 1em; margin-bottom: 1em; }
82: body > .toc.brief > li { margin-top: 0.35em; margin-bottom: 0.35em; }
83: body > .toc > li > * { margin-bottom: 0.5em; }
84: body > .toc > li > * > li > * { margin-bottom: 0.25em; }
85: .toc, .toc li { list-style: none; }
86:
87: .brief { margin-top: 1em; margin-bottom: 1em; line-height: 1.1; }
88: .brief li { margin: 0; padding: 0; }
89: .brief li p { margin: 0; padding: 0; }
90:
91: .category-list { margin-top: -0.75em; margin-bottom: 1em; line-height: 1.5; }
92: .category-list::before { content: '\21D2\A0'; font-size: 1.2em; font-weight: 900; }
93: .category-list li { display: inline; }
94: .category-list li:not(:last-child)::after { content: ', '; }
95: .category-list li > span, .category-list li > a { text-transform: lowercase; }
96: .category-list li * { text-transform: none; } /* don't affect <code> nested in <a> */
97:
98: .XXX { color: #E50000; background: white; border: solid red; padding: 0.5em; margin: 1em 0; }
99: .XXX > :first-child { margin-top: 0; }
100: p .XXX { line-height: 3em; }
101: .annotation { border: solid thin black; background: #0C479D; color: white; position: relative; margin: 8px 0 20px 0; }
102: .annotation:before { position: absolute; left: 0; top: 0; width: 100%; height: 100%; margin: 6px -6px -6px 6px; background: #333333; z-index: -1; content: ''; }
103: .annotation :link, .annotation :visited { color: inherit; }
104: .annotation :link:hover, .annotation :visited:hover { background: transparent; }
105: .annotation span { border: none ! important; }
106: .note { color: green; background: transparent; font-family: sans-serif; }
107: .warning { color: red; background: transparent; }
108: .note, .warning { font-weight: bolder; font-style: italic; }
1.80 mike 109: .note em, .warning em, .note i, .warning i { font-style: normal; }
1.1 mike 110: p.note, div.note { padding: 0.5em 2em; }
111: span.note { padding: 0 2em; }
112: .note p:first-child, .warning p:first-child { margin-top: 0; }
113: .note p:last-child, .warning p:last-child { margin-bottom: 0; }
114: .warning:before { font-style: normal; }
115: p.note:before { content: 'Note: '; }
116: p.warning:before { content: '\26A0 Warning! '; }
117:
118: .bookkeeping:before { display: block; content: 'Bookkeeping details'; font-weight: bolder; font-style: italic; }
119: .bookkeeping { font-size: 0.8em; margin: 2em 0; }
120: .bookkeeping p { margin: 0.5em 2em; display: list-item; list-style: square; }
1.19 mike 121: .bookkeeping dt { margin: 0.5em 2em 0; }
122: .bookkeeping dd { margin: 0 3em 0.5em; }
1.1 mike 123:
124: h4 { position: relative; z-index: 3; }
125: h4 + .element, h4 + div + .element { margin-top: -2.5em; padding-top: 2em; }
126: .element {
127: background: #EEEEFF;
128: color: black;
129: margin: 0 0 1em 0.15em;
130: padding: 0 1em 0.25em 0.75em;
131: border-left: solid #9999FF 0.25em;
132: position: relative;
133: z-index: 1;
134: }
135: .element:before {
136: position: absolute;
137: z-index: 2;
138: top: 0;
139: left: -1.15em;
140: height: 2em;
141: width: 0.9em;
142: background: #EEEEFF;
143: content: ' ';
144: border-style: none none solid solid;
145: border-color: #9999FF;
146: border-width: 0.25em;
147: }
148:
149: .example { display: block; color: #222222; background: #FCFCFC; border-left: double; margin-left: 2em; padding-left: 1em; }
150: td > .example:only-child { margin: 0 0 0 0.1em; }
151:
152: ul.domTree, ul.domTree ul { padding: 0 0 0 1em; margin: 0; }
153: ul.domTree li { padding: 0; margin: 0; list-style: none; position: relative; }
154: ul.domTree li li { list-style: none; }
155: ul.domTree li:first-child::before { position: absolute; top: 0; height: 0.6em; left: -0.75em; width: 0.5em; border-style: none none solid solid; content: ''; border-width: 0.1em; }
156: ul.domTree li:not(:last-child)::after { position: absolute; top: 0; bottom: -0.6em; left: -0.75em; width: 0.5em; border-style: none none solid solid; content: ''; border-width: 0.1em; }
157: ul.domTree span { font-style: italic; font-family: serif; }
158: ul.domTree .t1 code { color: purple; font-weight: bold; }
159: ul.domTree .t2 { font-style: normal; font-family: monospace; }
160: ul.domTree .t2 .name { color: black; font-weight: bold; }
161: ul.domTree .t2 .value { color: blue; font-weight: normal; }
162: ul.domTree .t3 code, .domTree .t4 code, .domTree .t5 code { color: gray; }
163: ul.domTree .t7 code, .domTree .t8 code { color: green; }
164: ul.domTree .t10 code { color: teal; }
165:
166: body.dfnEnabled dfn { cursor: pointer; }
167: .dfnPanel {
168: display: inline;
169: position: absolute;
170: z-index: 10;
171: height: auto;
172: width: auto;
173: padding: 0.5em 0.75em;
174: font: small sans-serif, Droid Sans Fallback;
175: background: #DDDDDD;
176: color: black;
177: border: outset 0.2em;
178: }
179: .dfnPanel * { margin: 0; padding: 0; font: inherit; text-indent: 0; }
180: .dfnPanel :link, .dfnPanel :visited { color: black; }
181: .dfnPanel p { font-weight: bolder; }
182: .dfnPanel * + p { margin-top: 0.25em; }
183: .dfnPanel li { list-style-position: inside; }
184:
185: #configUI { position: absolute; z-index: 20; top: 10em; right: 1em; width: 11em; font-size: small; }
186: #configUI p { margin: 0.5em 0; padding: 0.3em; background: #EEEEEE; color: black; border: inset thin; }
187: #configUI p label { display: block; }
188: #configUI #updateUI, #configUI .loginUI { text-align: center; }
189: #configUI input[type=button] { display: block; margin: auto; }
1.17 mike 190:
1.51 mike 191: fieldset { margin: 1em; padding: 0.5em 1em; }
192: fieldset > legend + * { margin-top: 0; }
1.43 mike 193: fieldset > :last-child { margin-bottom: 0; }
1.51 mike 194: fieldset p { margin: 0.5em 0; }
195:
1.78 mike 196: </style><link href="http://www.w3.org/StyleSheets/TR/W3C-ED" rel="stylesheet" type="text/css"><style type="text/css">
1.1 mike 197:
198: .applies thead th > * { display: block; }
199: .applies thead code { display: block; }
200: .applies tbody th { white-space: nowrap; }
201: .applies td { text-align: center; }
202: .applies .yes { background: yellow; }
203:
1.20 mike 204: .matrix, .matrix td { border: hidden; text-align: right; }
1.1 mike 205: .matrix { margin-left: 2em; }
206:
207: .dice-example { border-collapse: collapse; border-style: hidden solid solid hidden; border-width: thin; margin-left: 3em; }
208: .dice-example caption { width: 30em; font-size: smaller; font-style: italic; padding: 0.75em 0; text-align: left; }
209: .dice-example td, .dice-example th { border: solid thin; width: 1.35em; height: 1.05em; text-align: center; padding: 0; }
210:
1.32 mike 211: td.eg { border-width: thin; text-align: center; }
212:
1.1 mike 213: #table-example-1 { border: solid thin; border-collapse: collapse; margin-left: 3em; }
214: #table-example-1 * { font-family: "Essays1743", serif; line-height: 1.01em; }
215: #table-example-1 caption { padding-bottom: 0.5em; }
216: #table-example-1 thead, #table-example-1 tbody { border: none; }
217: #table-example-1 th, #table-example-1 td { border: solid thin; }
218: #table-example-1 th { font-weight: normal; }
219: #table-example-1 td { border-style: none solid; vertical-align: top; }
220: #table-example-1 th { padding: 0.5em; vertical-align: middle; text-align: center; }
221: #table-example-1 tbody tr:first-child td { padding-top: 0.5em; }
222: #table-example-1 tbody tr:last-child td { padding-bottom: 1.5em; }
223: #table-example-1 tbody td:first-child { padding-left: 2.5em; padding-right: 0; width: 9em; }
224: #table-example-1 tbody td:first-child::after { content: leader(". "); }
225: #table-example-1 tbody td { padding-left: 2em; padding-right: 2em; }
226: #table-example-1 tbody td:first-child + td { width: 10em; }
227: #table-example-1 tbody td:first-child + td ~ td { width: 2.5em; }
228: #table-example-1 tbody td:first-child + td + td + td ~ td { width: 1.25em; }
229:
230: .apple-table-examples { border: none; border-collapse: separate; border-spacing: 1.5em 0em; width: 40em; margin-left: 3em; }
231: .apple-table-examples * { font-family: "Times", serif; }
232: .apple-table-examples td, .apple-table-examples th { border: none; white-space: nowrap; padding-top: 0; padding-bottom: 0; }
233: .apple-table-examples tbody th:first-child { border-left: none; width: 100%; }
234: .apple-table-examples thead th:first-child ~ th { font-size: smaller; font-weight: bolder; border-bottom: solid 2px; text-align: center; }
235: .apple-table-examples tbody th::after, .apple-table-examples tfoot th::after { content: leader(". ") }
236: .apple-table-examples tbody th, .apple-table-examples tfoot th { font: inherit; text-align: left; }
237: .apple-table-examples td { text-align: right; vertical-align: top; }
238: .apple-table-examples.e1 tbody tr:last-child td { border-bottom: solid 1px; }
239: .apple-table-examples.e1 tbody + tbody tr:last-child td { border-bottom: double 3px; }
240: .apple-table-examples.e2 th[scope=row] { padding-left: 1em; }
241: .apple-table-examples sup { line-height: 0; }
242:
243: .details-example img { vertical-align: top; }
244:
1.60 mike 245: #base64-table {
246: white-space: nowrap;
247: font-size: 0.6em;
248: column-width: 6em;
249: column-count: 5;
250: column-gap: 1em;
251: -moz-column-width: 6em;
252: -moz-column-count: 5;
253: -moz-column-gap: 1em;
254: -webkit-column-width: 6em;
255: -webkit-column-count: 5;
256: -webkit-column-gap: 1em;
257: }
258: #base64-table thead { display: none; }
259: #base64-table * { border: none; }
260: #base64-table tbody td:first-child:after { content: ':'; }
261: #base64-table tbody td:last-child { text-align: right; }
262:
1.1 mike 263: #named-character-references-table {
1.41 mike 264: white-space: nowrap;
1.1 mike 265: font-size: 0.6em;
1.41 mike 266: column-width: 30em;
1.1 mike 267: column-gap: 1em;
1.41 mike 268: -moz-column-width: 30em;
1.1 mike 269: -moz-column-gap: 1em;
1.41 mike 270: -webkit-column-width: 30em;
1.1 mike 271: -webkit-column-gap: 1em;
272: }
1.41 mike 273: #named-character-references-table > table > tbody > tr > td:first-child + td,
1.1 mike 274: #named-character-references-table > table > tbody > tr > td:last-child { text-align: center; }
275: #named-character-references-table > table > tbody > tr > td:last-child:hover > span { position: absolute; top: auto; left: auto; margin-left: 0.5em; line-height: 1.2; font-size: 5em; border: outset; padding: 0.25em 0.5em; background: white; width: 1.25em; height: auto; text-align: center; }
1.41 mike 276: #named-character-references-table > table > tbody > tr#entity-CounterClockwiseContourIntegral > td:first-child { font-size: 0.5em; }
1.1 mike 277:
1.2 mike 278: .glyph.control { color: red; }
279:
1.4 mike 280: @font-face {
281: font-family: 'Essays1743';
282: src: url('http://www.whatwg.org/specs/web-apps/current-work/fonts/Essays1743.ttf');
283: }
284: @font-face {
285: font-family: 'Essays1743';
286: font-weight: bold;
287: src: url('http://www.whatwg.org/specs/web-apps/current-work/fonts/Essays1743-Bold.ttf');
288: }
289: @font-face {
290: font-family: 'Essays1743';
291: font-style: italic;
292: src: url('http://www.whatwg.org/specs/web-apps/current-work/fonts/Essays1743-Italic.ttf');
293: }
294: @font-face {
295: font-family: 'Essays1743';
296: font-style: italic;
297: font-weight: bold;
298: src: url('http://www.whatwg.org/specs/web-apps/current-work/fonts/Essays1743-BoldItalic.ttf');
299: }
300:
1.77 mike 301: </style><link href="data:text/css," id="complete" rel="stylesheet" title="Complete specification"><link href="data:text/css,.impl%20%7B%20display:%20none;%20%7D%0Ahtml%20%7B%20border:%20solid%20yellow;%20%7D%20.domintro:before%20%7B%20display:%20none;%20%7D" id="author" rel="alternate stylesheet" title="Author documentation only"><link href="data:text/css,.impl%20%7B%20background:%20%23FFEEEE;%20%7D%20.domintro:before%20%7B%20background:%20%23FFEEEE;%20%7D" id="highlight" rel="alternate stylesheet" title="Highlight implementation requirements"><script type="text/javascript">
1.68 mike 302: function getCookie(name) {
303: var params = location.search.substr(1).split("&");
304: for (var index = 0; index < params.length; index++) {
305: if (params[index] == name)
306: return "1";
307: var data = params[index].split("=");
308: if (data[0] == name)
309: return unescape(data[1]);
310: }
311: var cookies = document.cookie.split("; ");
312: for (var index = 0; index < cookies.length; index++) {
313: var data = cookies[index].split("=");
314: if (data[0] == name)
315: return unescape(data[1]);
316: }
317: return null;
318: }
319: </script>
1.1 mike 320: <script src="link-fixup.js"></script>
1.88 mike 321: <link href="parsing.html" title="8.2 Parsing HTML documents" rel="prev">
322: <link href="index.html#contents" title="Table of contents" rel="contents">
1.70 mike 323: <link href="tree-construction.html" title="8.2.5 Tree construction" rel="next">
1.100 mike 324: </head><body onload="fixBrokenLink();" class="split chapter"><div class="head" id="head">
1.1 mike 325: <p><a href="http://www.w3.org/"><img alt="W3C" height="48" src="http://www.w3.org/Icons/w3c_home" width="72"></a></p>
1.3 mike 326:
1.1 mike 327: <h1>HTML5</h1>
1.88 mike 328: <h2 class="no-num no-toc" id="a-vocabulary-and-associated-apis-for-html-and-xhtml">A vocabulary and associated APIs for HTML and XHTML</h2>
1.112 ! mike 329: <h2 class="no-num no-toc" id="editor-s-draft-31-january-2012">Editor's Draft 31 January 2012</h2>
1.88 mike 330: </div><nav class="prev_next">
331: <a href="parsing.html">← 8.2 Parsing HTML documents</a> –
332: <a href="index.html#contents">Table of contents</a> –
333: <a href="tree-construction.html">8.2.5 Tree construction →</a>
1.1 mike 334: <ol class="toc"><li><ol><li><ol><li><a href="tokenization.html#tokenization"><span class="secno">8.2.4 </span>Tokenization</a>
1.88 mike 335: <ol><li><a href="tokenization.html#data-state"><span class="secno">8.2.4.1 </span>Data state</a></li><li><a href="tokenization.html#character-reference-in-data-state"><span class="secno">8.2.4.2 </span>Character reference in data state</a></li><li><a href="tokenization.html#rcdata-state"><span class="secno">8.2.4.3 </span>RCDATA state</a></li><li><a href="tokenization.html#character-reference-in-rcdata-state"><span class="secno">8.2.4.4 </span>Character reference in RCDATA state</a></li><li><a href="tokenization.html#rawtext-state"><span class="secno">8.2.4.5 </span>RAWTEXT state</a></li><li><a href="tokenization.html#script-data-state"><span class="secno">8.2.4.6 </span>Script data state</a></li><li><a href="tokenization.html#plaintext-state"><span class="secno">8.2.4.7 </span>PLAINTEXT state</a></li><li><a href="tokenization.html#tag-open-state"><span class="secno">8.2.4.8 </span>Tag open state</a></li><li><a href="tokenization.html#end-tag-open-state"><span class="secno">8.2.4.9 </span>End tag open state</a></li><li><a href="tokenization.html#tag-name-state"><span class="secno">8.2.4.10 </span>Tag name state</a></li><li><a href="tokenization.html#rcdata-less-than-sign-state"><span class="secno">8.2.4.11 </span>RCDATA less-than sign state</a></li><li><a href="tokenization.html#rcdata-end-tag-open-state"><span class="secno">8.2.4.12 </span>RCDATA end tag open state</a></li><li><a href="tokenization.html#rcdata-end-tag-name-state"><span class="secno">8.2.4.13 </span>RCDATA end tag name state</a></li><li><a href="tokenization.html#rawtext-less-than-sign-state"><span class="secno">8.2.4.14 </span>RAWTEXT less-than sign state</a></li><li><a href="tokenization.html#rawtext-end-tag-open-state"><span class="secno">8.2.4.15 </span>RAWTEXT end tag open state</a></li><li><a href="tokenization.html#rawtext-end-tag-name-state"><span class="secno">8.2.4.16 </span>RAWTEXT end tag name state</a></li><li><a 
href="tokenization.html#script-data-less-than-sign-state"><span class="secno">8.2.4.17 </span>Script data less-than sign state</a></li><li><a href="tokenization.html#script-data-end-tag-open-state"><span class="secno">8.2.4.18 </span>Script data end tag open state</a></li><li><a href="tokenization.html#script-data-end-tag-name-state"><span class="secno">8.2.4.19 </span>Script data end tag name state</a></li><li><a href="tokenization.html#script-data-escape-start-state"><span class="secno">8.2.4.20 </span>Script data escape start state</a></li><li><a href="tokenization.html#script-data-escape-start-dash-state"><span class="secno">8.2.4.21 </span>Script data escape start dash state</a></li><li><a href="tokenization.html#script-data-escaped-state"><span class="secno">8.2.4.22 </span>Script data escaped state</a></li><li><a href="tokenization.html#script-data-escaped-dash-state"><span class="secno">8.2.4.23 </span>Script data escaped dash state</a></li><li><a href="tokenization.html#script-data-escaped-dash-dash-state"><span class="secno">8.2.4.24 </span>Script data escaped dash dash state</a></li><li><a href="tokenization.html#script-data-escaped-less-than-sign-state"><span class="secno">8.2.4.25 </span>Script data escaped less-than sign state</a></li><li><a href="tokenization.html#script-data-escaped-end-tag-open-state"><span class="secno">8.2.4.26 </span>Script data escaped end tag open state</a></li><li><a href="tokenization.html#script-data-escaped-end-tag-name-state"><span class="secno">8.2.4.27 </span>Script data escaped end tag name state</a></li><li><a href="tokenization.html#script-data-double-escape-start-state"><span class="secno">8.2.4.28 </span>Script data double escape start state</a></li><li><a href="tokenization.html#script-data-double-escaped-state"><span class="secno">8.2.4.29 </span>Script data double escaped state</a></li><li><a href="tokenization.html#script-data-double-escaped-dash-state"><span class="secno">8.2.4.30 </span>Script data double 
escaped dash state</a></li><li><a href="tokenization.html#script-data-double-escaped-dash-dash-state"><span class="secno">8.2.4.31 </span>Script data double escaped dash dash state</a></li><li><a href="tokenization.html#script-data-double-escaped-less-than-sign-state"><span class="secno">8.2.4.32 </span>Script data double escaped less-than sign state</a></li><li><a href="tokenization.html#script-data-double-escape-end-state"><span class="secno">8.2.4.33 </span>Script data double escape end state</a></li><li><a href="tokenization.html#before-attribute-name-state"><span class="secno">8.2.4.34 </span>Before attribute name state</a></li><li><a href="tokenization.html#attribute-name-state"><span class="secno">8.2.4.35 </span>Attribute name state</a></li><li><a href="tokenization.html#after-attribute-name-state"><span class="secno">8.2.4.36 </span>After attribute name state</a></li><li><a href="tokenization.html#before-attribute-value-state"><span class="secno">8.2.4.37 </span>Before attribute value state</a></li><li><a href="tokenization.html#attribute-value-double-quoted-state"><span class="secno">8.2.4.38 </span>Attribute value (double-quoted) state</a></li><li><a href="tokenization.html#attribute-value-single-quoted-state"><span class="secno">8.2.4.39 </span>Attribute value (single-quoted) state</a></li><li><a href="tokenization.html#attribute-value-unquoted-state"><span class="secno">8.2.4.40 </span>Attribute value (unquoted) state</a></li><li><a href="tokenization.html#character-reference-in-attribute-value-state"><span class="secno">8.2.4.41 </span>Character reference in attribute value state</a></li><li><a href="tokenization.html#after-attribute-value-quoted-state"><span class="secno">8.2.4.42 </span>After attribute value (quoted) state</a></li><li><a href="tokenization.html#self-closing-start-tag-state"><span class="secno">8.2.4.43 </span>Self-closing start tag state</a></li><li><a href="tokenization.html#bogus-comment-state"><span class="secno">8.2.4.44 
</span>Bogus comment state</a></li><li><a href="tokenization.html#markup-declaration-open-state"><span class="secno">8.2.4.45 </span>Markup declaration open state</a></li><li><a href="tokenization.html#comment-start-state"><span class="secno">8.2.4.46 </span>Comment start state</a></li><li><a href="tokenization.html#comment-start-dash-state"><span class="secno">8.2.4.47 </span>Comment start dash state</a></li><li><a href="tokenization.html#comment-state"><span class="secno">8.2.4.48 </span>Comment state</a></li><li><a href="tokenization.html#comment-end-dash-state"><span class="secno">8.2.4.49 </span>Comment end dash state</a></li><li><a href="tokenization.html#comment-end-state"><span class="secno">8.2.4.50 </span>Comment end state</a></li><li><a href="tokenization.html#comment-end-bang-state"><span class="secno">8.2.4.51 </span>Comment end bang state</a></li><li><a href="tokenization.html#doctype-state"><span class="secno">8.2.4.52 </span>DOCTYPE state</a></li><li><a href="tokenization.html#before-doctype-name-state"><span class="secno">8.2.4.53 </span>Before DOCTYPE name state</a></li><li><a href="tokenization.html#doctype-name-state"><span class="secno">8.2.4.54 </span>DOCTYPE name state</a></li><li><a href="tokenization.html#after-doctype-name-state"><span class="secno">8.2.4.55 </span>After DOCTYPE name state</a></li><li><a href="tokenization.html#after-doctype-public-keyword-state"><span class="secno">8.2.4.56 </span>After DOCTYPE public keyword state</a></li><li><a href="tokenization.html#before-doctype-public-identifier-state"><span class="secno">8.2.4.57 </span>Before DOCTYPE public identifier state</a></li><li><a href="tokenization.html#doctype-public-identifier-double-quoted-state"><span class="secno">8.2.4.58 </span>DOCTYPE public identifier (double-quoted) state</a></li><li><a href="tokenization.html#doctype-public-identifier-single-quoted-state"><span class="secno">8.2.4.59 </span>DOCTYPE public identifier (single-quoted) state</a></li><li><a 
href="tokenization.html#after-doctype-public-identifier-state"><span class="secno">8.2.4.60 </span>After DOCTYPE public identifier state</a></li><li><a href="tokenization.html#between-doctype-public-and-system-identifiers-state"><span class="secno">8.2.4.61 </span>Between DOCTYPE public and system identifiers state</a></li><li><a href="tokenization.html#after-doctype-system-keyword-state"><span class="secno">8.2.4.62 </span>After DOCTYPE system keyword state</a></li><li><a href="tokenization.html#before-doctype-system-identifier-state"><span class="secno">8.2.4.63 </span>Before DOCTYPE system identifier state</a></li><li><a href="tokenization.html#doctype-system-identifier-double-quoted-state"><span class="secno">8.2.4.64 </span>DOCTYPE system identifier (double-quoted) state</a></li><li><a href="tokenization.html#doctype-system-identifier-single-quoted-state"><span class="secno">8.2.4.65 </span>DOCTYPE system identifier (single-quoted) state</a></li><li><a href="tokenization.html#after-doctype-system-identifier-state"><span class="secno">8.2.4.66 </span>After DOCTYPE system identifier state</a></li><li><a href="tokenization.html#bogus-doctype-state"><span class="secno">8.2.4.67 </span>Bogus DOCTYPE state</a></li><li><a href="tokenization.html#cdata-section-state"><span class="secno">8.2.4.68 </span>CDATA section state</a></li><li><a href="tokenization.html#tokenizing-character-references"><span class="secno">8.2.4.69 </span>Tokenizing character references</a></li></ol></li></ol></li></ol></li></ol></nav>
1.1 mike 336:
337: <div class="impl">
338:
1.29 mike 339: <h4 id="tokenization"><span class="secno">8.2.4 </span><dfn>Tokenization</dfn></h4>
1.1 mike 340:
341: <p>Implementations must act as if they used the following state
342: machine to tokenize HTML. The state machine must start in the
343: <a href="#data-state">data state</a>. Most states consume a single character,
344: which may have various side-effects, and then either switch the state
1.87 mike 345: machine to a new state to <i>reconsume</i> the same character, or
346: switch it to a new state to consume the next character, or stay
347: in the same state to consume the next character. Some states have
348: more complicated behavior and can consume several characters before
349: switching to another state. In some cases, the tokenizer state is
350: also changed by the tree construction stage.</p>
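<p>The control flow described above (consume one character, then stay, switch, or switch and reconsume) can be sketched roughly as follows. This is a toy machine, not the tokenizer defined in this section: the function name <code>runToyTokenizer</code>, the simplified <code>tagOpen</code> and <code>tagName</code> rules, and the token shapes are all assumptions made for illustration.</p>

```javascript
// Toy tokenizer: shows only the shape of the state machine loop
// (consume, switch state, reconsume). The rules are simplified and
// hypothetical; they are not the normative HTML5 state definitions.
function runToyTokenizer(input) {
  const EOF = null;
  const tokens = [];
  let state = "data"; // the machine starts in the data state
  let pos = 0;
  let name = "";

  while (true) {
    // Consume the next input character (or EOF past the end).
    const c = pos < input.length ? input[pos] : EOF;
    pos++;

    switch (state) {
      case "data":
        if (c === "<") {
          state = "tagOpen"; // switch state, consume the next character
        } else if (c === EOF) {
          tokens.push({ type: "eof" });
          return tokens;
        } else {
          tokens.push({ type: "character", data: c }); // stay in this state
        }
        break;

      case "tagOpen":
        if (c !== EOF && /[a-z]/i.test(c)) {
          name = "";
          state = "tagName";
          pos--; // reconsume the same character in the new state
        } else {
          tokens.push({ type: "character", data: "<" });
          state = "data";
          pos--; // reconsume the same character in the data state
        }
        break;

      case "tagName":
        if (c === ">") {
          tokens.push({ type: "startTag", name: name });
          state = "data";
        } else if (c === EOF) {
          state = "data"; // toy rule: drop the unfinished tag
          pos--;          // let the data state see EOF
        } else {
          name += c.toLowerCase();
        }
        break;
    }
  }
}
```

<p>Reconsuming is modelled here by stepping the position back by one, so the next loop iteration hands the same character to the new state.</p>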
1.1 mike 351:
352: <p>The exact behavior of certain states depends on the
353: <a href="parsing.html#insertion-mode">insertion mode</a> and the <a href="parsing.html#stack-of-open-elements">stack of open
354: elements</a>. Certain states also use a <dfn id="temporary-buffer"><var>temporary
355: buffer</var></dfn> to track progress.</p>
356:
357: <p>The output of the tokenization step is a series of zero or more
358: of the following tokens: DOCTYPE, start tag, end tag, comment,
359: character, end-of-file. DOCTYPE tokens have a name, a public
360: identifier, a system identifier, and a <i>force-quirks
361: flag</i>. When a DOCTYPE token is created, its name, public
362: identifier, and system identifier must be marked as missing (which
363: is a distinct state from the empty string), and the <i>force-quirks
364: flag</i> must be set to <i>off</i> (its other state is
365: <i>on</i>). Start and end tag tokens have a tag name, a
366: <i>self-closing flag</i>, and a list of attributes, each of which
367: has a name and a value. When a start or end tag token is created,
368: its <i>self-closing flag</i> must be unset (its other state is that
369: it be set), and its attributes list must be empty. Comment and
370: character tokens have data.</p>
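<p>As an illustration of the initial values required above, the following sketch creates DOCTYPE and tag tokens. The object layout, the <code>MISSING</code> sentinel, and the function names are hypothetical. The initial tag name of the empty string is also an assumption, since the text above does not specify it.</p>

```javascript
// Hypothetical sentinel: "missing" must be distinguishable from "".
const MISSING = Symbol("missing");

// A new DOCTYPE token: name and identifiers missing, force-quirks off.
function createDoctypeToken() {
  return {
    type: "DOCTYPE",
    name: MISSING,
    publicIdentifier: MISSING,
    systemIdentifier: MISSING,
    forceQuirks: false // "off"
  };
}

// A new start or end tag token: self-closing flag unset, no attributes.
function createTagToken(kind) {
  if (kind !== "startTag" && kind !== "endTag") {
    throw new TypeError("kind must be 'startTag' or 'endTag'");
  }
  return {
    type: kind,
    tagName: "",       // assumption: filled in later by the tokenizer
    selfClosing: false, // "unset"
    attributes: []      // each entry would be { name, value }
  };
}
```

<p>A symbol is used for the missing state so that a comparison with the empty string can never succeed by accident.</p>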
371:
372: <p>When a token is emitted, it must immediately be handled by the
1.70 mike 373: <a href="tree-construction.html#tree-construction">tree construction</a> stage. The tree construction stage
1.1 mike 374: can affect the state of the tokenization stage, and can insert
375: additional characters into the stream. (For example, the
1.88 mike 376: <code><a href="the-script-element.html#the-script-element">script</a></code> element can result in scripts executing and
377: using the <a href="dynamic-markup-insertion.html#dynamic-markup-insertion">dynamic markup insertion</a> APIs to insert
1.1 mike 378: characters into the stream being tokenized.)</p>
379:
380: <p>When a start tag token is emitted with its <i>self-closing
381: flag</i> set, if the flag is not <dfn id="acknowledge-self-closing-flag" title="acknowledge
382: self-closing flag">acknowledged</dfn> when it is processed by the
383: tree construction stage, that is a <a href="parsing.html#parse-error">parse error</a>.</p>
384:
385: <p>When an end tag token is emitted with attributes, that is a
386: <a href="parsing.html#parse-error">parse error</a>.</p>
387:
388: <p>When an end tag token is emitted with its <i>self-closing
389: flag</i> set, that is a <a href="parsing.html#parse-error">parse error</a>.</p>
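<p>The three emission-time checks above could be gathered into one helper, sketched here under assumptions: the <code>TagToken</code> record, the <code>acknowledged</code> argument (reported back by tree construction) and the error strings are all hypothetical.</p>

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TagToken:
    kind: str  # "start" or "end"
    tag_name: str
    self_closing: bool = False
    attributes: List[Tuple[str, str]] = field(default_factory=list)


def parse_errors_on_emit(token, acknowledged=False):
    """Return the parse errors raised by emitting the given tag token."""
    errors = []
    if token.kind == "start":
        # Only an unacknowledged self-closing flag is an error on a start tag.
        if token.self_closing and not acknowledged:
            errors.append("self-closing flag not acknowledged")
    elif token.kind == "end":
        if token.attributes:
            errors.append("end tag with attributes")
        if token.self_closing:
            errors.append("end tag with self-closing flag")
    return errors
```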
390:
391: <p>An <dfn id="appropriate-end-tag-token">appropriate end tag token</dfn> is an end tag token whose
392: tag name matches the tag name of the last start tag to have been
393: emitted from this tokenizer, if any. If no start tag has been
394: emitted from this tokenizer, then no end tag token is
395: appropriate.</p>
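<p>One way to track this is to remember the name of the most recently emitted start tag. The sketch below assumes a hypothetical <code>EndTagMatcher</code> helper; it is not part of this specification.</p>

```python
class EndTagMatcher:
    """Remembers the last emitted start tag name for this tokenizer."""

    def __init__(self):
        # None means no start tag has been emitted yet.
        self.last_start_tag_name = None

    def note_start_tag(self, tag_name):
        self.last_start_tag_name = tag_name

    def is_appropriate(self, end_tag_name):
        # With no start tag emitted so far, no end tag is appropriate.
        if self.last_start_tag_name is None:
            return False
        return end_tag_name == self.last_start_tag_name
```

<p>For example, after a <code>textarea</code> start tag has been noted, only an end tag named <code>textarea</code> is appropriate.</p>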
396:
397: <p>Before each step of the tokenizer, the user agent must first
398: check the <a href="parsing.html#parser-pause-flag">parser pause flag</a>. If it is true, then the
399: tokenizer must abort the processing of any nested invocations of the
400: tokenizer, yielding control back to the caller.</p>
401:
402: <p>The tokenizer state machine consists of the states defined in the
403: following subsections.</p>
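<p>As a rough illustration of how such a state machine can be driven, the sketch below models only four of the states defined in the following subsections (data, tag open, end tag open and tag name). It is not a conforming tokenizer: the function name, the tuple token shapes and the error counter are hypothetical, "&amp;" and U+0000 are treated as ordinary characters in the data state, and transitions into states that are not modelled raise <code>NotImplementedError</code>.</p>

```python
EOF = None
WHITESPACE = "\t\n\f "


def is_ascii_letter(c):
    return c is not EOF and ("a" <= c <= "z" or "A" <= c <= "Z")


def tokenize(text):
    """Return (tokens, parse_error_count) for the modelled subset of states."""
    tokens, errors = [], 0
    state, pos, tag = "data", 0, None
    while True:
        c = text[pos] if pos < len(text) else EOF
        pos += 1  # consume the next input character
        if state == "data":
            if c == "<":
                state = "tag open"
            elif c is EOF:
                tokens.append(("eof",))
                return tokens, errors
            else:
                # Simplification: "&" and U+0000 are emitted like any other character.
                tokens.append(("char", c))
        elif state == "tag open":
            if c == "/":
                state = "end tag open"
            elif is_ascii_letter(c):
                tag = ["start", c.lower()]  # not emitted yet
                state = "tag name"
            elif c in ("!", "?"):
                raise NotImplementedError("markup declaration and bogus comment states")
            else:
                errors += 1
                state = "data"
                tokens.append(("char", "<"))
                pos -= 1  # reconsume the current input character
        elif state == "end tag open":
            if is_ascii_letter(c):
                tag = ["end", c.lower()]  # not emitted yet
                state = "tag name"
            elif c == ">":
                errors += 1
                state = "data"
            elif c is EOF:
                errors += 1
                state = "data"
                tokens.append(("char", "<"))
                tokens.append(("char", "/"))
                pos -= 1  # reconsume the EOF character
            else:
                raise NotImplementedError("bogus comment state")
        elif state == "tag name":
            if c is EOF:
                errors += 1
                state = "data"
                pos -= 1  # reconsume the EOF character
            elif c in WHITESPACE or c == "/":
                raise NotImplementedError("attribute and self-closing states")
            elif c == ">":
                state = "data"
                tokens.append((tag[0], tag[1]))
            elif c == "\0":
                errors += 1
                tag[1] += "\ufffd"
            else:
                # Uppercase ASCII letters are lowercased; anything else is kept as is.
                tag[1] += c.lower() if "A" <= c <= "Z" else c
```

<p>Reconsuming is modelled by stepping the position back by one, so the same character (or EOF) is seen again in the new state.</p>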
404:
405:
1.70 mike 406:
1.1 mike 407:
1.90 mike 408:
1.29 mike 409: <h5 id="data-state"><span class="secno">8.2.4.1 </span><dfn>Data state</dfn></h5>
1.1 mike 410:
411: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
412:
413: <dl class="switch"><dt>U+0026 AMPERSAND (&amp;)</dt>
414: <dd>Switch to the <a href="#character-reference-in-data-state">character reference in data
415: state</a>.</dd>
416:
417: <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
418: <dd>Switch to the <a href="#tag-open-state">tag open state</a>.</dd>
419:
1.51 mike 420: <dt>U+0000 NULL</dt>
421: <dd><a href="parsing.html#parse-error">Parse error</a>. Emit the <a href="parsing.html#current-input-character">current input
422: character</a> as a character token.</dd>
423:
1.1 mike 424: <dt>EOF</dt>
425: <dd>Emit an end-of-file token.</dd>
426:
427: <dt>Anything else</dt>
428: <dd>Emit the <a href="parsing.html#current-input-character">current input character</a> as a character
1.14 mike 429: token.</dd>
1.1 mike 430:
1.29 mike 431: </dl><h5 id="character-reference-in-data-state"><span class="secno">8.2.4.2 </span><dfn>Character reference in data state</dfn></h5>
1.1 mike 432:
1.87 mike 433: <p>Switch to the <a href="#data-state">data state</a>.</p>
434:
1.1 mike 435: <p>Attempt to <a href="#consume-a-character-reference">consume a character reference</a>, with no
436: <a href="#additional-allowed-character">additional allowed character</a>.</p>
437:
1.18 mike 438: <p>If nothing is returned, emit a U+0026 AMPERSAND character (&amp;)
1.1 mike 439: token.</p>
440:
1.85 mike 441: <p>Otherwise, emit the character tokens that were returned.</p>
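<p>The fallback in this state can be sketched as follows. The <code>consume_character_reference</code> function here is a deliberately tiny, hypothetical stand-in that knows three named references only; the real algorithm is far more involved. The position argument is assumed to point just past the "&amp;" that led to this state.</p>

```python
# Hypothetical miniature lookup table; the real set of references is much larger.
TINY_TABLE = {"amp;": "&", "lt;": "<", "gt;": ">"}


def consume_character_reference(text, pos):
    """Stand-in: return (characters, new_pos), or (None, pos) if nothing matched."""
    for name, replacement in TINY_TABLE.items():
        if text.startswith(name, pos):
            return replacement, pos + len(name)
    return None, pos


def character_reference_in_data_state(text, pos):
    """Return (emitted_characters, new_pos, next_state)."""
    next_state = "data"  # the switch back happens first
    chars, new_pos = consume_character_reference(text, pos)
    if chars is None:
        # Nothing was returned: emit the ampersand itself and consume nothing.
        return ["&"], pos, next_state
    return list(chars), new_pos, next_state
```

<p>When nothing is returned the position is left unchanged, so the characters after the ampersand are tokenized normally in the data state.</p>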
1.1 mike 442:
443:
1.29 mike 444: <h5 id="rcdata-state"><span class="secno">8.2.4.3 </span><dfn>RCDATA state</dfn></h5>
1.1 mike 445:
446: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
447:
448: <dl class="switch"><dt>U+0026 AMPERSAND (&amp;)</dt>
449: <dd>Switch to the <a href="#character-reference-in-rcdata-state">character reference in RCDATA
450: state</a>.</dd>
451:
452: <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
453: <dd>Switch to the <a href="#rcdata-less-than-sign-state">RCDATA less-than sign state</a>.</dd>
454:
1.51 mike 455: <dt>U+0000 NULL</dt>
456: <dd><a href="parsing.html#parse-error">Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
457: character token.</dd>
458:
1.1 mike 459: <dt>EOF</dt>
460: <dd>Emit an end-of-file token.</dd>
461:
462: <dt>Anything else</dt>
463: <dd>Emit the <a href="parsing.html#current-input-character">current input character</a> as a character
1.14 mike 464: token.</dd>
1.1 mike 465:
466: </dl><h5 id="character-reference-in-rcdata-state"><span class="secno">8.2.4.4 </span><dfn>Character reference in RCDATA state</dfn></h5>
467:
1.87 mike 468: <p>Switch to the <a href="#rcdata-state">RCDATA state</a>.</p>
469:
1.1 mike 470: <p>Attempt to <a href="#consume-a-character-reference">consume a character reference</a>, with no
471: <a href="#additional-allowed-character">additional allowed character</a>.</p>
472:
1.18 mike 473: <p>If nothing is returned, emit a U+0026 AMPERSAND character (&amp;)
1.1 mike 474: token.</p>
475:
1.85 mike 476: <p>Otherwise, emit the character tokens that were returned.</p>
1.1 mike 477:
478:
1.29 mike 479: <h5 id="rawtext-state"><span class="secno">8.2.4.5 </span><dfn>RAWTEXT state</dfn></h5>
1.1 mike 480:
481: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
482:
483: <dl class="switch"><dt>U+003C LESS-THAN SIGN (&lt;)</dt>
484: <dd>Switch to the <a href="#rawtext-less-than-sign-state">RAWTEXT less-than sign state</a>.</dd>
485:
1.51 mike 486: <dt>U+0000 NULL</dt>
487: <dd><a href="parsing.html#parse-error">Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
488: character token.</dd>
489:
1.1 mike 490: <dt>EOF</dt>
491: <dd>Emit an end-of-file token.</dd>
492:
493: <dt>Anything else</dt>
494: <dd>Emit the <a href="parsing.html#current-input-character">current input character</a> as a character
1.14 mike 495: token.</dd>
1.1 mike 496:
1.29 mike 497: </dl><h5 id="script-data-state"><span class="secno">8.2.4.6 </span><dfn>Script data state</dfn></h5>
1.1 mike 498:
499: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
500:
501: <dl class="switch"><dt>U+003C LESS-THAN SIGN (&lt;)</dt>
502: <dd>Switch to the <a href="#script-data-less-than-sign-state">script data less-than sign state</a>.</dd>
503:
1.51 mike 504: <dt>U+0000 NULL</dt>
505: <dd><a href="parsing.html#parse-error">Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
506: character token.</dd>
507:
1.1 mike 508: <dt>EOF</dt>
509: <dd>Emit an end-of-file token.</dd>
510:
511: <dt>Anything else</dt>
512: <dd>Emit the <a href="parsing.html#current-input-character">current input character</a> as a character
1.14 mike 513: token.</dd>
1.1 mike 514:
1.29 mike 515: </dl><h5 id="plaintext-state"><span class="secno">8.2.4.7 </span><dfn>PLAINTEXT state</dfn></h5>
1.1 mike 516:
517: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
518:
1.51 mike 519: <dl class="switch"><dt>U+0000 NULL</dt>
520: <dd><a href="parsing.html#parse-error">Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
521: character token.</dd>
522:
523: <dt>EOF</dt>
1.1 mike 524: <dd>Emit an end-of-file token.</dd>
525:
526: <dt>Anything else</dt>
527: <dd>Emit the <a href="parsing.html#current-input-character">current input character</a> as a character
1.14 mike 528: token.</dd>
1.1 mike 529:
1.29 mike 530: </dl><h5 id="tag-open-state"><span class="secno">8.2.4.8 </span><dfn>Tag open state</dfn></h5>
1.1 mike 531:
532: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
533:
534: <dl class="switch"><dt>U+0021 EXCLAMATION MARK (!)</dt>
535: <dd>Switch to the <a href="#markup-declaration-open-state">markup declaration open state</a>.</dd>
536:
537: <dt>U+002F SOLIDUS (/)</dt>
538: <dd>Switch to the <a href="#end-tag-open-state">end tag open state</a>.</dd>
539:
540: <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
541: <dd>Create a new start tag token, set its tag name to the
542: lowercase version of the <a href="parsing.html#current-input-character">current input character</a> (add 0x0020 to the
543: character's code point), then switch to the <a href="#tag-name-state">tag name
544: state</a>. (Don't emit the token yet; further details will
545: be filled in before it is emitted.)</dd>
546:
547: <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
548: <dd>Create a new start tag token, set its tag name to the
549: <a href="parsing.html#current-input-character">current input character</a>, then switch to the <a href="#tag-name-state">tag
550: name state</a>. (Don't emit the token yet; further details will
551: be filled in before it is emitted.)</dd>
552:
553: <dt>U+003F QUESTION MARK (?)</dt>
554: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#bogus-comment-state">bogus
555: comment state</a>.</dd>
556:
557: <dt>Anything else</dt>
1.87 mike 558: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
559: state</a>. Emit a U+003C LESS-THAN SIGN character token.
560: Reconsume the <a href="parsing.html#current-input-character">current input character</a>.</dd>
1.1 mike 561:
562: </dl><h5 id="end-tag-open-state"><span class="secno">8.2.4.9 </span><dfn>End tag open state</dfn></h5>
563:
564: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
565:
566: <dl class="switch"><dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
567: <dd>Create a new end tag token, set its tag name to the lowercase
568: version of the <a href="parsing.html#current-input-character">current input character</a> (add 0x0020 to
569: the character's code point), then switch to the <a href="#tag-name-state">tag name
570: state</a>. (Don't emit the token yet; further details will be
571: filled in before it is emitted.)</dd>
572:
573: <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
574: <dd>Create a new end tag token, set its tag name to the
575: <a href="parsing.html#current-input-character">current input character</a>, then switch to the <a href="#tag-name-state">tag
576: name state</a>. (Don't emit the token yet; further details will
577: be filled in before it is emitted.)</dd>
578:
579: <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
580: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
581: state</a>.</dd>
582:
583: <dt>EOF</dt>
1.87 mike 584: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
585: state</a>. Emit a U+003C LESS-THAN SIGN character token and a
586: U+002F SOLIDUS character token. Reconsume the EOF character.</dd>
1.1 mike 587:
588: <dt>Anything else</dt>
589: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#bogus-comment-state">bogus
590: comment state</a>.</dd>
591:
1.29 mike 592: </dl><h5 id="tag-name-state"><span class="secno">8.2.4.10 </span><dfn>Tag name state</dfn></h5>
1.1 mike 593:
594: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
595:
1.73 mike 596: <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1 mike 597: <dt>U+000A LINE FEED (LF)</dt>
598: <dt>U+000C FORM FEED (FF)</dt>
1.70 mike 599:
1.1 mike 600: <dt>U+0020 SPACE</dt>
601: <dd>Switch to the <a href="#before-attribute-name-state">before attribute name state</a>.</dd>
602:
603: <dt>U+002F SOLIDUS (/)</dt>
604: <dd>Switch to the <a href="#self-closing-start-tag-state">self-closing start tag state</a>.</dd>
605:
606: <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
1.14 mike 607: <dd>Switch to the <a href="#data-state">data state</a>. Emit the current tag
608: token.</dd>
1.1 mike 609:
610: <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
611: <dd>Append the lowercase version of the <a href="parsing.html#current-input-character">current input
612: character</a> (add 0x0020 to the character's code point) to the
1.14 mike 613: current tag token's tag name.</dd>
1.1 mike 614:
1.51 mike 615: <dt>U+0000 NULL</dt>
616: <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
617: character to the current tag token's tag name.</dd>
618:
1.1 mike 619: <dt>EOF</dt>
1.87 mike 620: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
621: state</a>. Reconsume the EOF character.</dd>
1.1 mike 622:
623: <dt>Anything else</dt>
624: <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
1.14 mike 625: tag token's tag name.</dd>
1.1 mike 626:
1.29 mike 627: </dl><h5 id="rcdata-less-than-sign-state"><span class="secno">8.2.4.11 </span><dfn>RCDATA less-than sign state</dfn></h5>
1.70 mike 628:
1.1 mike 629:
630: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
631:
632: <dl class="switch"><dt>U+002F SOLIDUS (/)</dt>
633: <dd>Set the <var><a href="#temporary-buffer">temporary buffer</a></var> to the empty string. Switch
634: to the <a href="#rcdata-end-tag-open-state">RCDATA end tag open state</a>.</dd>
635:
636: <dt>Anything else</dt>
1.87 mike 637: <dd>Switch to the <a href="#rcdata-state">RCDATA state</a>. Emit a U+003C
638: LESS-THAN SIGN character token. Reconsume the <a href="parsing.html#current-input-character">current
639: input character</a>.</dd>
1.1 mike 640:
1.29 mike 641: </dl><h5 id="rcdata-end-tag-open-state"><span class="secno">8.2.4.12 </span><dfn>RCDATA end tag open state</dfn></h5>
1.70 mike 642:
1.1 mike 643:
644: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
645:
646: <dl class="switch"><dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
647: <dd>Create a new end tag token, and set its tag name to the
648: lowercase version of the <a href="parsing.html#current-input-character">current input character</a> (add
649: 0x0020 to the character's code point). Append the <a href="parsing.html#current-input-character">current
650: input character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>. Finally,
651: switch to the <a href="#rcdata-end-tag-name-state">RCDATA end tag name state</a>. (Don't emit
652: the token yet; further details will be filled in before it is
653: emitted.)</dd>
654:
655: <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
656: <dd>Create a new end tag token, and set its tag name to the
657: <a href="parsing.html#current-input-character">current input character</a>. Append the <a href="parsing.html#current-input-character">current
658: input character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>. Finally,
659: switch to the <a href="#rcdata-end-tag-name-state">RCDATA end tag name state</a>. (Don't emit
660: the token yet; further details will be filled in before it is
661: emitted.)</dd>
662:
663: <dt>Anything else</dt>
1.87 mike 664: <dd>Switch to the <a href="#rcdata-state">RCDATA state</a>. Emit a U+003C
665: LESS-THAN SIGN character token and a U+002F SOLIDUS character token.
666: Reconsume the <a href="parsing.html#current-input-character">current input character</a>.</dd>
1.1 mike 667:
1.29 mike 668: </dl><h5 id="rcdata-end-tag-name-state"><span class="secno">8.2.4.13 </span><dfn>RCDATA end tag name state</dfn></h5>
1.70 mike 669:
1.1 mike 670:
671: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
672:
1.73 mike 673: <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1 mike 674: <dt>U+000A LINE FEED (LF)</dt>
675: <dt>U+000C FORM FEED (FF)</dt>
1.70 mike 676:
1.1 mike 677: <dt>U+0020 SPACE</dt>
678: <dd>If the current end tag token is an <a href="#appropriate-end-tag-token">appropriate end tag
679: token</a>, then switch to the <a href="#before-attribute-name-state">before attribute name
680: state</a>. Otherwise, treat it as per the "anything else" entry
681: below.</dd>
682:
683: <dt>U+002F SOLIDUS (/)</dt>
684: <dd>If the current end tag token is an <a href="#appropriate-end-tag-token">appropriate end tag
685: token</a>, then switch to the <a href="#self-closing-start-tag-state">self-closing start tag
686: state</a>. Otherwise, treat it as per the "anything else" entry
687: below.</dd>
688:
689: <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
690: <dd>If the current end tag token is an <a href="#appropriate-end-tag-token">appropriate end tag
1.87 mike 691: token</a>, then switch to the <a href="#data-state">data state</a> and emit
692: the current tag token. Otherwise, treat it as per the "anything
1.1 mike 693: else" entry below.</dd>
694:
695: <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
696: <dd>Append the lowercase version of the <a href="parsing.html#current-input-character">current input
697: character</a> (add 0x0020 to the character's code point) to the
698: current tag token's tag name. Append the <a href="parsing.html#current-input-character">current input
1.14 mike 699: character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>.</dd>
1.1 mike 700:
701: <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
702: <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
703: tag token's tag name. Append the <a href="parsing.html#current-input-character">current input
1.14 mike 704: character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>.</dd>
1.1 mike 705:
706: <dt>Anything else</dt>
1.87 mike 707: <dd>Switch to the <a href="#rcdata-state">RCDATA state</a>. Emit a U+003C
708: LESS-THAN SIGN character token, a U+002F SOLIDUS character token,
709: and a character token for each of the characters in the
710: <var><a href="#temporary-buffer">temporary buffer</a></var> (in the order they were added to the
711: buffer). Reconsume the <a href="parsing.html#current-input-character">current input character</a>.</dd>
1.1 mike 712:
1.29 mike 713: </dl><h5 id="rawtext-less-than-sign-state"><span class="secno">8.2.4.14 </span><dfn>RAWTEXT less-than sign state</dfn></h5>
1.70 mike 714:
1.1 mike 715:
716: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
717:
718: <dl class="switch"><dt>U+002F SOLIDUS (/)</dt>
719: <dd>Set the <var><a href="#temporary-buffer">temporary buffer</a></var> to the empty string. Switch
720: to the <a href="#rawtext-end-tag-open-state">RAWTEXT end tag open state</a>.</dd>
721:
722: <dt>Anything else</dt>
1.87 mike 723: <dd>Switch to the <a href="#rawtext-state">RAWTEXT state</a>. Emit a U+003C
724: LESS-THAN SIGN character token. Reconsume the <a href="parsing.html#current-input-character">current
725: input character</a>.</dd>
1.1 mike 726:
1.29 mike 727: </dl><h5 id="rawtext-end-tag-open-state"><span class="secno">8.2.4.15 </span><dfn>RAWTEXT end tag open state</dfn></h5>
1.70 mike 728:
1.1 mike 729:
730: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
731:
732: <dl class="switch"><dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
733: <dd>Create a new end tag token, and set its tag name to the
734: lowercase version of the <a href="parsing.html#current-input-character">current input character</a> (add
735: 0x0020 to the character's code point). Append the <a href="parsing.html#current-input-character">current
736: input character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>. Finally,
737: switch to the <a href="#rawtext-end-tag-name-state">RAWTEXT end tag name state</a>. (Don't emit
738: the token yet; further details will be filled in before it is
739: emitted.)</dd>
740:
741: <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
742: <dd>Create a new end tag token, and set its tag name to the
743: <a href="parsing.html#current-input-character">current input character</a>. Append the <a href="parsing.html#current-input-character">current
744: input character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>. Finally,
745: switch to the <a href="#rawtext-end-tag-name-state">RAWTEXT end tag name state</a>. (Don't emit
746: the token yet; further details will be filled in before it is
747: emitted.)</dd>
748:
749: <dt>Anything else</dt>
1.87 mike 750: <dd>Switch to the <a href="#rawtext-state">RAWTEXT state</a>. Emit a U+003C
751: LESS-THAN SIGN character token and a U+002F SOLIDUS character
752: token. Reconsume the <a href="parsing.html#current-input-character">current input character</a>.</dd>
1.1 mike 753:
1.29 mike 754: </dl><h5 id="rawtext-end-tag-name-state"><span class="secno">8.2.4.16 </span><dfn>RAWTEXT end tag name state</dfn></h5>
1.70 mike 755:
1.1 mike 756:
757: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
758:
1.73 mike 759: <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1 mike 760: <dt>U+000A LINE FEED (LF)</dt>
761: <dt>U+000C FORM FEED (FF)</dt>
1.70 mike 762:
1.1 mike 763: <dt>U+0020 SPACE</dt>
764: <dd>If the current end tag token is an <a href="#appropriate-end-tag-token">appropriate end tag
765: token</a>, then switch to the <a href="#before-attribute-name-state">before attribute name
766: state</a>. Otherwise, treat it as per the "anything else" entry
767: below.</dd>
768:
769: <dt>U+002F SOLIDUS (/)</dt>
770: <dd>If the current end tag token is an <a href="#appropriate-end-tag-token">appropriate end tag
771: token</a>, then switch to the <a href="#self-closing-start-tag-state">self-closing start tag
772: state</a>. Otherwise, treat it as per the "anything else" entry
773: below.</dd>
774:
775: <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
776: <dd>If the current end tag token is an <a href="#appropriate-end-tag-token">appropriate end tag
1.87 mike 777: token</a>, then switch to the <a href="#data-state">data state</a> and emit
778: the current tag token. Otherwise, treat it as per the "anything
1.1 mike 779: else" entry below.</dd>
780:
781: <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
782: <dd>Append the lowercase version of the <a href="parsing.html#current-input-character">current input
783: character</a> (add 0x0020 to the character's code point) to the
784: current tag token's tag name. Append the <a href="parsing.html#current-input-character">current input
1.14 mike 785: character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>.</dd>
1.1 mike 786:
787: <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
788: <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
789: tag token's tag name. Append the <a href="parsing.html#current-input-character">current input
1.14 mike 790: character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>.</dd>
1.1 mike 791:
792: <dt>Anything else</dt>
1.87 mike 793: <dd>Switch to the <a href="#rawtext-state">RAWTEXT state</a>. Emit a U+003C
794: LESS-THAN SIGN character token, a U+002F SOLIDUS character token,
795: and a character token for each of the characters in the
796: <var><a href="#temporary-buffer">temporary buffer</a></var> (in the order they were added to the
797: buffer). Reconsume the <a href="parsing.html#current-input-character">current input character</a>.</dd>
1.1 mike 798:
1.29 mike 799: </dl><h5 id="script-data-less-than-sign-state"><span class="secno">8.2.4.17 </span><dfn>Script data less-than sign state</dfn></h5>
1.1 mike 800:
801: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
802:
803: <dl class="switch"><dt>U+002F SOLIDUS (/)</dt>
804: <dd>Set the <var><a href="#temporary-buffer">temporary buffer</a></var> to the empty string. Switch
805: to the <a href="#script-data-end-tag-open-state">script data end tag open state</a>.</dd>
806:
807: <dt>U+0021 EXCLAMATION MARK (!)</dt>
1.14 mike 808: <dd>Switch to the <a href="#script-data-escape-start-state">script data escape start state</a>. Emit
809: a U+003C LESS-THAN SIGN character token and a U+0021 EXCLAMATION
810: MARK character token.</dd>
1.1 mike 811:
812: <dt>Anything else</dt>
1.87 mike 813: <dd>Switch to the <a href="#script-data-state">script data state</a>. Emit a U+003C
814: LESS-THAN SIGN character token. Reconsume the <a href="parsing.html#current-input-character">current
815: input character</a>.</dd>
1.1 mike 816:
1.29 mike 817: </dl><h5 id="script-data-end-tag-open-state"><span class="secno">8.2.4.18 </span><dfn>Script data end tag open state</dfn></h5>
1.70 mike 818:
1.1 mike 819:
820: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
821:
822: <dl class="switch"><dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
823: <dd>Create a new end tag token, and set its tag name to the
824: lowercase version of the <a href="parsing.html#current-input-character">current input character</a> (add
825: 0x0020 to the character's code point). Append the <a href="parsing.html#current-input-character">current
826: input character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>. Finally,
827: switch to the <a href="#script-data-end-tag-name-state">script data end tag name state</a>. (Don't emit
828: the token yet; further details will be filled in before it is
829: emitted.)</dd>
830:
831: <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
832: <dd>Create a new end tag token, and set its tag name to the
833: <a href="parsing.html#current-input-character">current input character</a>. Append the <a href="parsing.html#current-input-character">current
834: input character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>. Finally,
835: switch to the <a href="#script-data-end-tag-name-state">script data end tag name state</a>. (Don't emit
836: the token yet; further details will be filled in before it is
837: emitted.)</dd>
838:
839: <dt>Anything else</dt>
1.87 mike 840: <dd>Switch to the <a href="#script-data-state">script data state</a>. Emit a U+003C
841: LESS-THAN SIGN character token and a U+002F SOLIDUS character token.
842: Reconsume the <a href="parsing.html#current-input-character">current input character</a>.</dd>
1.1 mike 843:
1.29 mike 844: </dl><h5 id="script-data-end-tag-name-state"><span class="secno">8.2.4.19 </span><dfn>Script data end tag name state</dfn></h5>
1.70 mike 845:
1.1 mike 846:
847: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
848:
1.73 mike 849: <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1 mike 850: <dt>U+000A LINE FEED (LF)</dt>
851: <dt>U+000C FORM FEED (FF)</dt>
1.70 mike 852:
1.1 mike 853: <dt>U+0020 SPACE</dt>
854: <dd>If the current end tag token is an <a href="#appropriate-end-tag-token">appropriate end tag
855: token</a>, then switch to the <a href="#before-attribute-name-state">before attribute name
856: state</a>. Otherwise, treat it as per the "anything else" entry
857: below.</dd>
858:
859: <dt>U+002F SOLIDUS (/)</dt>
860: <dd>If the current end tag token is an <a href="#appropriate-end-tag-token">appropriate end tag
861: token</a>, then switch to the <a href="#self-closing-start-tag-state">self-closing start tag
862: state</a>. Otherwise, treat it as per the "anything else" entry
863: below.</dd>
864:
865: <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
866: <dd>If the current end tag token is an <a href="#appropriate-end-tag-token">appropriate end tag
1.87 mike 867: token</a>, then switch to the <a href="#data-state">data state</a> and emit
868: the current tag token. Otherwise, treat it as per the "anything
1.1 mike 869: else" entry below.</dd>
870:
871: <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
872: <dd>Append the lowercase version of the <a href="parsing.html#current-input-character">current input
873: character</a> (add 0x0020 to the character's code point) to the
874: current tag token's tag name. Append the <a href="parsing.html#current-input-character">current input
1.14 mike 875: character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>.</dd>
1.1 mike 876:
877: <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
878: <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
879: tag token's tag name. Append the <a href="parsing.html#current-input-character">current input
1.14 mike 880: character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>.</dd>
1.1 mike 881:
882: <dt>Anything else</dt>
1.87 mike 883: <dd>Switch to the <a href="#script-data-state">script data state</a>. Emit a U+003C
884: LESS-THAN SIGN character token, a U+002F SOLIDUS character token,
885: and a character token for each of the characters in the
886: <var><a href="#temporary-buffer">temporary buffer</a></var> (in the order they were added to the
887: buffer). Reconsume the <a href="parsing.html#current-input-character">current input character</a>.</dd>
1.1 mike 888:
1.29 mike 889: </dl><h5 id="script-data-escape-start-state"><span class="secno">8.2.4.20 </span><dfn>Script data escape start state</dfn></h5>
1.1 mike 890:
891: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
892:
893: <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
1.14 mike 894: <dd>Switch to the <a href="#script-data-escape-start-dash-state">script data escape start dash
895: state</a>. Emit a U+002D HYPHEN-MINUS character token.</dd>
1.1 mike 896:
897: <dt>Anything else</dt>
1.87 mike 898: <dd>Switch to the <a href="#script-data-state">script data state</a>. Reconsume the
899: <a href="parsing.html#current-input-character">current input character</a>.</dd>
1.1 mike 900:
1.29 mike 901: </dl><h5 id="script-data-escape-start-dash-state"><span class="secno">8.2.4.21 </span><dfn>Script data escape start dash state</dfn></h5>
1.1 mike 902:
903: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
904:
905: <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
1.14 mike 906: <dd>Switch to the <a href="#script-data-escaped-dash-dash-state">script data escaped dash dash
907: state</a>. Emit a U+002D HYPHEN-MINUS character token.</dd>
1.1 mike 908:
909: <dt>Anything else</dt>
1.87 mike 910: <dd>Switch to the <a href="#script-data-state">script data state</a>. Reconsume the
911: <a href="parsing.html#current-input-character">current input character</a>.</dd>
1.1 mike 912:
1.29 mike 913: </dl><h5 id="script-data-escaped-state"><span class="secno">8.2.4.22 </span><dfn>Script data escaped state</dfn></h5>
1.1 mike 914:
915: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
916:
917: <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
1.14 mike 918: <dd>Switch to the <a href="#script-data-escaped-dash-state">script data escaped dash state</a>. Emit
919: a U+002D HYPHEN-MINUS character token.</dd>
1.1 mike 920:
921: <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
1.61 mike 922: <dd>Switch to the <a href="#script-data-escaped-less-than-sign-state">script data escaped less-than sign
923: state</a>.</dd>
1.1 mike 924:
1.51 mike 925: <dt>U+0000 NULL</dt>
926: <dd><a href="parsing.html#parse-error">Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
927: character token.</dd>
928:
1.1 mike 929: <dt>EOF</dt>
1.87 mike 930: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
931: state</a>. Reconsume the EOF character.</dd>
1.1 mike 932:
933: <dt>Anything else</dt>
934: <dd>Emit the <a href="parsing.html#current-input-character">current input character</a> as a character
1.14 mike 935: token.</dd>
1.1 mike 936:
1.29 mike 937: </dl><h5 id="script-data-escaped-dash-state"><span class="secno">8.2.4.23 </span><dfn>Script data escaped dash state</dfn></h5>
1.1 mike 938:
939: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
940:
941: <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
1.14 mike 942: <dd>Switch to the <a href="#script-data-escaped-dash-dash-state">script data escaped dash dash
943: state</a>. Emit a U+002D HYPHEN-MINUS character token.</dd>
1.1 mike 944:
945: <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
1.61 mike 946: <dd>Switch to the <a href="#script-data-escaped-less-than-sign-state">script data escaped less-than sign
947: state</a>.</dd>
1.1 mike 948:
1.51 mike 949: <dt>U+0000 NULL</dt>
950: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#script-data-escaped-state">script data
951: escaped state</a>. Emit a U+FFFD REPLACEMENT CHARACTER character
952: token.</dd>
953:
1.1 mike 954: <dt>EOF</dt>
1.87 mike 955: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
956: state</a>. Reconsume the EOF character.</dd>
1.1 mike 957:
958: <dt>Anything else</dt>
1.14 mike 959: <dd>Switch to the <a href="#script-data-escaped-state">script data escaped state</a>. Emit the
960: <a href="parsing.html#current-input-character">current input character</a> as a character token.</dd>
1.1 mike 961:
1.29 mike 962: </dl><h5 id="script-data-escaped-dash-dash-state"><span class="secno">8.2.4.24 </span><dfn>Script data escaped dash dash state</dfn></h5>
1.1 mike 963:
964: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
965:
966: <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
1.14 mike 967: <dd>Emit a U+002D HYPHEN-MINUS character token.</dd>
1.1 mike 968:
969: <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
1.61 mike 970: <dd>Switch to the <a href="#script-data-escaped-less-than-sign-state">script data escaped less-than sign
971: state</a>.</dd>
1.1 mike 972:
973: <dt>U+003E GREATER-THAN SIGN (>)</dt>
1.14 mike 974: <dd>Switch to the <a href="#script-data-state">script data state</a>. Emit a U+003E
975: GREATER-THAN SIGN character token.</dd>
1.1 mike 976:
1.51 mike 977: <dt>U+0000 NULL</dt>
978: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#script-data-escaped-state">script data
979: escaped state</a>. Emit a U+FFFD REPLACEMENT CHARACTER character
980: token.</dd>
981:
1.1 mike 982: <dt>EOF</dt>
1.87 mike 983: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
984: state</a>. Reconsume the EOF character.</dd>
1.1 mike 985:
986: <dt>Anything else</dt>
1.14 mike 987: <dd>Switch to the <a href="#script-data-escaped-state">script data escaped state</a>. Emit the
988: <a href="parsing.html#current-input-character">current input character</a> as a character token.</dd>
1.1 mike 989:
1.29 mike 990: </dl><h5 id="script-data-escaped-less-than-sign-state"><span class="secno">8.2.4.25 </span><dfn>Script data escaped less-than sign state</dfn></h5>
1.1 mike 991:
992: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
993:
994: <dl class="switch"><dt>U+002F SOLIDUS (/)</dt>
995: <dd>Set the <var><a href="#temporary-buffer">temporary buffer</a></var> to the empty string. Switch
996: to the <a href="#script-data-escaped-end-tag-open-state">script data escaped end tag open state</a>.</dd>
997:
998: <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
1.14 mike 999: <dd>Set the <var><a href="#temporary-buffer">temporary buffer</a></var> to the empty string. Append
1000: the lowercase version of the <a href="parsing.html#current-input-character">current input character</a>
1001: (add 0x0020 to the character's code point) to the <var><a href="#temporary-buffer">temporary
1.1 mike 1002: buffer</a></var>. Switch to the <a href="#script-data-double-escape-start-state">script data double escape start
1.14 mike 1003: state</a>. Emit a U+003C LESS-THAN SIGN character token and the
1004: <a href="parsing.html#current-input-character">current input character</a> as a character token.</dd>
1.1 mike 1005:
1006: <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
1.14 mike 1007: <dd>Set the <var><a href="#temporary-buffer">temporary buffer</a></var> to the empty string. Append
1008: the <a href="parsing.html#current-input-character">current input character</a> to the <var><a href="#temporary-buffer">temporary
1.1 mike 1009: buffer</a></var>. Switch to the <a href="#script-data-double-escape-start-state">script data double escape start
1.14 mike 1010: state</a>. Emit a U+003C LESS-THAN SIGN character token and the
1011: <a href="parsing.html#current-input-character">current input character</a> as a character token.</dd>
1.1 mike 1012:
1013: <dt>Anything else</dt>
1.87 mike 1014: <dd>Switch to the <a href="#script-data-escaped-state">script data escaped state</a>. Emit a U+003C
1015: LESS-THAN SIGN character token. Reconsume the <a href="parsing.html#current-input-character">current
1016: input character</a>.</dd>
1.1 mike 1017:
1.29 mike 1018: </dl><h5 id="script-data-escaped-end-tag-open-state"><span class="secno">8.2.4.26 </span><dfn>Script data escaped end tag open state</dfn></h5>
1.1 mike 1019:
1020: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1021:
1022: <dl class="switch"><dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
1023: <dd>Create a new end tag token, and set its tag name to the
1024: lowercase version of the <a href="parsing.html#current-input-character">current input character</a> (add
1025: 0x0020 to the character's code point). Append the <a href="parsing.html#current-input-character">current
1026: input character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>. Finally,
1027: switch to the <a href="#script-data-escaped-end-tag-name-state">script data escaped end tag name
1028: state</a>. (Don't emit the token yet; further details will be
1029: filled in before it is emitted.)</dd>
1030:
1031: <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
1032: <dd>Create a new end tag token, and set its tag name to the
1033: <a href="parsing.html#current-input-character">current input character</a>. Append the <a href="parsing.html#current-input-character">current
1034: input character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>. Finally,
1035: switch to the <a href="#script-data-escaped-end-tag-name-state">script data escaped end tag name
1036: state</a>. (Don't emit the token yet; further details will be
1037: filled in before it is emitted.)</dd>
1038:
1039: <dt>Anything else</dt>
1.87 mike 1040: <dd>Switch to the <a href="#script-data-escaped-state">script data escaped state</a>. Emit a
1041: U+003C LESS-THAN SIGN character token and a U+002F SOLIDUS
1042: character token. Reconsume the <a href="parsing.html#current-input-character">current input
1043: character</a>.</dd>
1.1 mike 1044:
1.29 mike 1045: </dl><h5 id="script-data-escaped-end-tag-name-state"><span class="secno">8.2.4.27 </span><dfn>Script data escaped end tag name state</dfn></h5>
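<p>As a non-normative illustration of this state, the sketch below handles one character at a time. It assumes, following the definition given elsewhere in the specification, that an end tag token is "appropriate" when its tag name matches the last start tag emitted by the tokenizer. The function names, the returned state labels and the tuple layout are hypothetical. Emitting the tag token and reconsuming the character are left to the caller.</p>

```python
# Non-normative sketch; function names and state labels are hypothetical.
WHITESPACE = {"\t", "\n", "\f", " "}


def is_appropriate(end_tag_name, last_start_tag_name):
    # Assumption: "appropriate" means the name matches the last start tag
    # that the tokenizer emitted (defined elsewhere in the specification).
    return last_start_tag_name is not None and end_tag_name == last_start_tag_name


def end_tag_name_step(ch, tag_name, temp_buffer, last_start_tag_name):
    """Return (next_state, tag_name, temp_buffer, emitted_text)."""
    if ch in WHITESPACE or ch == "/" or ch == ">":
        if is_appropriate(tag_name, last_start_tag_name):
            if ch == "/":
                return "self-closing-start-tag", tag_name, temp_buffer, ""
            if ch == ">":
                # The caller emits the current tag token here.
                return "data", tag_name, temp_buffer, ""
            return "before-attribute-name", tag_name, temp_buffer, ""
        # Not appropriate: fall through to "anything else".
    elif "A" <= ch <= "Z":
        # The tag name gets the lowercase form, the buffer keeps the original.
        return "stay", tag_name + chr(ord(ch) + 0x20), temp_buffer + ch, ""
    elif "a" <= ch <= "z":
        return "stay", tag_name + ch, temp_buffer + ch, ""
    # Anything else: the would-be end tag is flushed as text ("<", "/", then
    # the buffered characters) and the caller reconsumes ch in the escaped state.
    return "script-data-escaped", tag_name, temp_buffer, "</" + temp_buffer
```

<p>Keeping the original characters in the temporary buffer is what allows the fallback branch to emit the text exactly as it was written, regardless of case.</p>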
1.1 mike 1046:
1047: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1048:
1.73 mike 1049: <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1 mike 1050: <dt>U+000A LINE FEED (LF)</dt>
1051: <dt>U+000C FORM FEED (FF)</dt>
1.70 mike 1052:
1.1 mike 1053: <dt>U+0020 SPACE</dt>
1054: <dd>If the current end tag token is an <a href="#appropriate-end-tag-token">appropriate end tag
1055: token</a>, then switch to the <a href="#before-attribute-name-state">before attribute name
1056: state</a>. Otherwise, treat it as per the "anything else" entry
1057: below.</dd>
1058:
1059: <dt>U+002F SOLIDUS (/)</dt>
1060: <dd>If the current end tag token is an <a href="#appropriate-end-tag-token">appropriate end tag
1061: token</a>, then switch to the <a href="#self-closing-start-tag-state">self-closing start tag
1062: state</a>. Otherwise, treat it as per the "anything else" entry
1063: below.</dd>
1064:
1065: <dt>U+003E GREATER-THAN SIGN (>)</dt>
1066: <dd>If the current end tag token is an <a href="#appropriate-end-tag-token">appropriate end tag
1.87 mike 1067: token</a>, then switch to the <a href="#data-state">data state</a> and emit
1068: the current tag token. Otherwise, treat it as per the "anything
1.1 mike 1069: else" entry below.</dd>
1070:
1071: <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
1072: <dd>Append the lowercase version of the <a href="parsing.html#current-input-character">current input
1073: character</a> (add 0x0020 to the character's code point) to the
1074: current tag token's tag name. Append the <a href="parsing.html#current-input-character">current input
1.14 mike 1075: character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>.</dd>
1.1 mike 1076:
1077: <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
1078: <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
1079: tag token's tag name. Append the <a href="parsing.html#current-input-character">current input
1.14 mike 1080: character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>.</dd>
1.1 mike 1081:
1082: <dt>Anything else</dt>
1.87 mike 1083: <dd>Switch to the <a href="#script-data-escaped-state">script data escaped state</a>. Emit a
1084: U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS character
1085: token, and a character token for each of the characters in the
1086: <var><a href="#temporary-buffer">temporary buffer</a></var> (in the order they were added to the
1087: buffer). Reconsume the <a href="parsing.html#current-input-character">current input character</a>.</dd>
1.1 mike 1088:
1.29 mike 1089: </dl><h5 id="script-data-double-escape-start-state"><span class="secno">8.2.4.28 </span><dfn>Script data double escape start state</dfn></h5>
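<p>As a non-normative illustration of this state, the sketch below scans the letters that follow a less-than sign inside an escaped section and decides whether they spell "script". It is a simplification under stated assumptions: the first letter, which the preceding less-than sign state has already placed in the temporary buffer, is folded into the same loop, and the function name, state labels and return shape are hypothetical.</p>

```python
# Non-normative sketch; the function name and state labels are hypothetical.
TERMINATORS = {"\t", "\n", "\f", " ", "/", ">"}


def double_escape_start(chars):
    """Return (next_state, emitted_text, characters_consumed)."""
    temp_buffer = ""
    emitted = []
    for i, ch in enumerate(chars):
        if ch in TERMINATORS:
            # The buffer holds the lowercased name, so the comparison is
            # effectively case-insensitive.
            if temp_buffer == "script":
                state = "script-data-double-escaped"
            else:
                state = "script-data-escaped"
            emitted.append(ch)
            return state, "".join(emitted), i + 1
        if "A" <= ch <= "Z":
            temp_buffer += chr(ord(ch) + 0x20)  # lowercase by adding 0x0020
            emitted.append(ch)                  # emitted as written
        elif "a" <= ch <= "z":
            temp_buffer += ch
            emitted.append(ch)
        else:
            # Anything else is not consumed here; it is reconsumed in the
            # script data escaped state.
            return "script-data-escaped", "".join(emitted), i
    # Input ran out before a decision was reached.
    return "script-data-double-escape-start", "".join(emitted), len(chars)
```

<p>Note that the characters are emitted unchanged while only the buffer is lowercased, so the text content of the script is never altered by this check.</p>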
1.1 mike 1090:
1091: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1092:
1.73 mike 1093: <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1 mike 1094: <dt>U+000A LINE FEED (LF)</dt>
1095: <dt>U+000C FORM FEED (FF)</dt>
1.70 mike 1096:
1.1 mike 1097: <dt>U+0020 SPACE</dt>
1098: <dt>U+002F SOLIDUS (/)</dt>
1099: <dt>U+003E GREATER-THAN SIGN (>)</dt>
1.14 mike 1100: <dd>If the <var><a href="#temporary-buffer">temporary buffer</a></var> is the string "<code title="">script</code>", then switch to the <a href="#script-data-double-escaped-state">script data
1.1 mike 1101: double escaped state</a>. Otherwise, switch to the <a href="#script-data-escaped-state">script
1.14 mike 1102: data escaped state</a>. Emit the <a href="parsing.html#current-input-character">current input
1103: character</a> as a character token.</dd>
1.1 mike 1104:
1105: <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
1.14 mike 1106: <dd>Append the lowercase version of the <a href="parsing.html#current-input-character">current input
1.1 mike 1107: character</a> (add 0x0020 to the character's code point) to the
1.14 mike 1108: <var><a href="#temporary-buffer">temporary buffer</a></var>. Emit the <a href="parsing.html#current-input-character">current input
1109: character</a> as a character token.</dd>
1.1 mike 1110:
1111: <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
1.14 mike 1112: <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the
1113: <var><a href="#temporary-buffer">temporary buffer</a></var>. Emit the <a href="parsing.html#current-input-character">current input
1114: character</a> as a character token.</dd>
1.1 mike 1115:
1116: <dt>Anything else</dt>
1.87 mike 1117: <dd>Switch to the <a href="#script-data-escaped-state">script data escaped state</a>. Reconsume
1118: the <a href="parsing.html#current-input-character">current input character</a>.</dd>
1.1 mike 1119:
1.29 mike 1120: </dl><h5 id="script-data-double-escaped-state"><span class="secno">8.2.4.29 </span><dfn>Script data double escaped state</dfn></h5>
1.1 mike 1121:
1122: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1123:
1124: <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
1.14 mike 1125: <dd>Switch to the <a href="#script-data-double-escaped-dash-state">script data double escaped dash
1126: state</a>. Emit a U+002D HYPHEN-MINUS character token.</dd>
1.1 mike 1127:
1128: <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
1.61 mike 1129: <dd>Switch to the <a href="#script-data-double-escaped-less-than-sign-state">script data double escaped less-than
1.14 mike 1130: sign state</a>. Emit a U+003C LESS-THAN SIGN character
1.61 mike 1131: token.</dd>
1.1 mike 1132:
1.51 mike 1133: <dt>U+0000 NULL</dt>
1134: <dd><a href="parsing.html#parse-error">Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
1135: character token.</dd>
1136:
1.1 mike 1137: <dt>EOF</dt>
1.87 mike 1138: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1139: state</a>. Reconsume the EOF character.</dd>
1.1 mike 1140:
1141: <dt>Anything else</dt>
1142: <dd>Emit the <a href="parsing.html#current-input-character">current input character</a> as a character
1.14 mike 1143: token.</dd>
1.1 mike 1144:
1.29 mike 1145: </dl><h5 id="script-data-double-escaped-dash-state"><span class="secno">8.2.4.30 </span><dfn>Script data double escaped dash state</dfn></h5>
1.1 mike 1146:
1147: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1148:
1149: <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
1.14 mike 1150: <dd>Switch to the <a href="#script-data-double-escaped-dash-dash-state">script data double escaped dash dash
1151: state</a>. Emit a U+002D HYPHEN-MINUS character token.</dd>
1.1 mike 1152:
1153: <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
1.61 mike 1154: <dd>Switch to the <a href="#script-data-double-escaped-less-than-sign-state">script data double escaped less-than
1.14 mike 1155: sign state</a>. Emit a U+003C LESS-THAN SIGN character
1.61 mike 1156: token.</dd>
1.1 mike 1157:
1.51 mike 1158: <dt>U+0000 NULL</dt>
1159: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#script-data-double-escaped-state">script data
1160: double escaped state</a>. Emit a U+FFFD REPLACEMENT CHARACTER
1161: character token.</dd>
1162:
1.1 mike 1163: <dt>EOF</dt>
1.87 mike 1164: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1165: state</a>. Reconsume the EOF character.</dd>
1.1 mike 1166:
1167: <dt>Anything else</dt>
1.14 mike 1168: <dd>Switch to the <a href="#script-data-double-escaped-state">script data double escaped
1169: state</a>. Emit the <a href="parsing.html#current-input-character">current input character</a> as a
1170: character token.</dd>
1.1 mike 1171:
1.29 mike 1172: </dl><h5 id="script-data-double-escaped-dash-dash-state"><span class="secno">8.2.4.31 </span><dfn>Script data double escaped dash dash state</dfn></h5>
1.1 mike 1173:
1174: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1175:
1176: <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
1.14 mike 1177: <dd>Emit a U+002D HYPHEN-MINUS character token.</dd>
1.1 mike 1178:
1179: <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
1.61 mike 1180: <dd>Switch to the <a href="#script-data-double-escaped-less-than-sign-state">script data double escaped less-than
1.14 mike 1181: sign state</a>. Emit a U+003C LESS-THAN SIGN character
1.61 mike 1182: token.</dd>
1.1 mike 1183:
1184: <dt>U+003E GREATER-THAN SIGN (>)</dt>
1.14 mike 1185: <dd>Switch to the <a href="#script-data-state">script data state</a>. Emit a U+003E
1186: GREATER-THAN SIGN character token.</dd>
1.1 mike 1187:
1.51 mike 1188: <dt>U+0000 NULL</dt>
1189: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#script-data-double-escaped-state">script data
1190: double escaped state</a>. Emit a U+FFFD REPLACEMENT CHARACTER
1191: character token.</dd>
1192:
1.1 mike 1193: <dt>EOF</dt>
1.87 mike 1194: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1195: state</a>. Reconsume the EOF character.</dd>
1.1 mike 1196:
1197: <dt>Anything else</dt>
1.14 mike 1198: <dd>Switch to the <a href="#script-data-double-escaped-state">script data double escaped
1199: state</a>. Emit the <a href="parsing.html#current-input-character">current input character</a> as a
1200: character token.</dd>
1.1 mike 1201:
1.29 mike 1202: </dl><h5 id="script-data-double-escaped-less-than-sign-state"><span class="secno">8.2.4.32 </span><dfn>Script data double escaped less-than sign state</dfn></h5>
1.1 mike 1203:
1204: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1205:
1206: <dl class="switch"><dt>U+002F SOLIDUS (/)</dt>
1.14 mike 1207: <dd>Set the <var><a href="#temporary-buffer">temporary buffer</a></var> to the empty string. Switch
1208: to the <a href="#script-data-double-escape-end-state">script data double escape end state</a>. Emit a
1209: U+002F SOLIDUS character token.</dd>
1.1 mike 1210:
1211: <dt>Anything else</dt>
1.87 mike 1212: <dd>Switch to the <a href="#script-data-double-escaped-state">script data double escaped state</a>.
1213: Reconsume the <a href="parsing.html#current-input-character">current input character</a>.</dd>
1.1 mike 1214:
1.29 mike 1215: </dl><h5 id="script-data-double-escape-end-state"><span class="secno">8.2.4.33 </span><dfn>Script data double escape end state</dfn></h5>
1.1 mike 1216:
1217: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1218:
1.73 mike 1219: <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1 mike 1220: <dt>U+000A LINE FEED (LF)</dt>
1221: <dt>U+000C FORM FEED (FF)</dt>
1.70 mike 1222:
1.1 mike 1223: <dt>U+0020 SPACE</dt>
1224: <dt>U+002F SOLIDUS (/)</dt>
1225: <dt>U+003E GREATER-THAN SIGN (>)</dt>
1.14 mike 1226: <dd>If the <var><a href="#temporary-buffer">temporary buffer</a></var> is the string "<code title="">script</code>", then switch to the <a href="#script-data-escaped-state">script data
1.1 mike 1227: escaped state</a>. Otherwise, switch to the <a href="#script-data-double-escaped-state">script data
1.14 mike 1228: double escaped state</a>. Emit the <a href="parsing.html#current-input-character">current input
1229: character</a> as a character token.</dd>
1.1 mike 1230:
1231: <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
1.14 mike 1232: <dd>Append the lowercase version of the <a href="parsing.html#current-input-character">current input
1.1 mike 1233: character</a> (add 0x0020 to the character's code point) to the
1.14 mike 1234: <var><a href="#temporary-buffer">temporary buffer</a></var>. Emit the <a href="parsing.html#current-input-character">current input
1235: character</a> as a character token.</dd>
1.1 mike 1236:
1237: <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
1.14 mike 1238: <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the
1239: <var><a href="#temporary-buffer">temporary buffer</a></var>. Emit the <a href="parsing.html#current-input-character">current input
1240: character</a> as a character token.</dd>
1.1 mike 1241:
1242: <dt>Anything else</dt>
1.87 mike 1243: <dd>Switch to the <a href="#script-data-double-escaped-state">script data double escaped state</a>.
1244: Reconsume the <a href="parsing.html#current-input-character">current input character</a>.</dd>
1.1 mike 1245:
1.29 mike 1246: </dl><h5 id="before-attribute-name-state"><span class="secno">8.2.4.34 </span><dfn>Before attribute name state</dfn></h5>
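<p>As a non-normative illustration of this state, the dispatch can be sketched as a single function over the consumed character. The function name, the state labels and the returned tuple are hypothetical, EOF is omitted, and emitting the tag token on a greater-than sign is left to the caller.</p>

```python
# Non-normative sketch; the function name and state labels are hypothetical.
def before_attribute_name(ch):
    """Return (next_state, new_attribute_name_or_None, is_parse_error)."""
    if ch in ("\t", "\n", "\f", " "):
        return "before-attribute-name", None, False  # whitespace is ignored
    if ch == "/":
        return "self-closing-start-tag", None, False
    if ch == ">":
        return "data", None, False  # the caller emits the current tag token
    if "A" <= ch <= "Z":
        # Attribute names are lowercased by adding 0x0020 to the code point.
        return "attribute-name", chr(ord(ch) + 0x20), False
    if ch == "\x00":
        return "attribute-name", "\ufffd", True
    # Quotes, "<" and "=" are parse errors, but like any other character
    # they still start a new attribute whose name begins with that character.
    return "attribute-name", ch, ch in ('"', "'", "<", "=")
```

<p>The last branch shows why the "treat it as per the anything else entry" wording matters: the parse error is reported, but tokenization continues exactly as for an ordinary character.</p>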
1.1 mike 1247:
1248: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1249:
1.73 mike 1250: <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1 mike 1251: <dt>U+000A LINE FEED (LF)</dt>
1252: <dt>U+000C FORM FEED (FF)</dt>
1.70 mike 1253:
1.1 mike 1254: <dt>U+0020 SPACE</dt>
1.14 mike 1255: <dd>Ignore the character.</dd>
1.1 mike 1256:
1257: <dt>U+002F SOLIDUS (/)</dt>
1258: <dd>Switch to the <a href="#self-closing-start-tag-state">self-closing start tag state</a>.</dd>
1259:
1260: <dt>U+003E GREATER-THAN SIGN (>)</dt>
1.14 mike 1261: <dd>Switch to the <a href="#data-state">data state</a>. Emit the current tag
1262: token.</dd>
1.1 mike 1263:
1264: <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
1265: <dd>Start a new attribute in the current tag token. Set that
1266: attribute's name to the lowercase version of the <a href="parsing.html#current-input-character">current input
1267: character</a> (add 0x0020 to the character's code point), and its
1268: value to the empty string. Switch to the <a href="#attribute-name-state">attribute name
1269: state</a>.</dd>
1270:
1.51 mike 1271: <dt>U+0000 NULL</dt>
1272: <dd><a href="parsing.html#parse-error">Parse error</a>. Start a new attribute in the current
1273: tag token. Set that attribute's name to a U+FFFD REPLACEMENT
1274: CHARACTER character, and its value to the empty string. Switch to
1275: the <a href="#attribute-name-state">attribute name state</a>.</dd>
1276:
1.1 mike 1277: <dt>U+0022 QUOTATION MARK (")</dt>
1278: <dt>U+0027 APOSTROPHE (')</dt>
1279: <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
1280: <dt>U+003D EQUALS SIGN (=)</dt>
1281: <dd><a href="parsing.html#parse-error">Parse error</a>. Treat it as per the "anything else"
1282: entry below.</dd>
1283:
1284: <dt>EOF</dt>
1.87 mike 1285: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1286: state</a>. Reconsume the EOF character.</dd>
1.1 mike 1287:
1288: <dt>Anything else</dt>
1289: <dd>Start a new attribute in the current tag token. Set that
1.51 mike 1290: attribute's name to the <a href="parsing.html#current-input-character">current input character</a>, and
1291: its value to the empty string. Switch to the <a href="#attribute-name-state">attribute name
1.1 mike 1292: state</a>.</dd>
1293:
1.29 mike 1294: </dl><h5 id="attribute-name-state"><span class="secno">8.2.4.35 </span><dfn>Attribute name state</dfn></h5>
1.1 mike 1295:
1296: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1297:
1.73 mike 1298: <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1 mike 1299: <dt>U+000A LINE FEED (LF)</dt>
1300: <dt>U+000C FORM FEED (FF)</dt>
1.70 mike 1301:
1.1 mike 1302: <dt>U+0020 SPACE</dt>
1303: <dd>Switch to the <a href="#after-attribute-name-state">after attribute name state</a>.</dd>
1304:
1305: <dt>U+002F SOLIDUS (/)</dt>
1306: <dd>Switch to the <a href="#self-closing-start-tag-state">self-closing start tag state</a>.</dd>
1307:
1308: <dt>U+003D EQUALS SIGN (=)</dt>
1309: <dd>Switch to the <a href="#before-attribute-value-state">before attribute value state</a>.</dd>
1310:
1311: <dt>U+003E GREATER-THAN SIGN (>)</dt>
1.14 mike 1312: <dd>Switch to the <a href="#data-state">data state</a>. Emit the current tag
1313: token.</dd>
1.1 mike 1314:
1315: <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
1316: <dd>Append the lowercase version of the <a href="parsing.html#current-input-character">current input
1317: character</a> (add 0x0020 to the character's code point) to the
1.14 mike 1318: current attribute's name.</dd>
1.1 mike 1319:
1.51 mike 1320: <dt>U+0000 NULL</dt>
1321: <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
1322: character to the current attribute's name.</dd>
1323:
1.1 mike 1324: <dt>U+0022 QUOTATION MARK (")</dt>
1325: <dt>U+0027 APOSTROPHE (')</dt>
1326: <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
1327: <dd><a href="parsing.html#parse-error">Parse error</a>. Treat it as per the "anything else"
1328: entry below.</dd>
1329:
1330: <dt>EOF</dt>
1.87 mike 1331: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1332: state</a>. Reconsume the EOF character.</dd>
1.1 mike 1333:
1334: <dt>Anything else</dt>
1335: <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
1.14 mike 1336: attribute's name.</dd>
1.1 mike 1337:
1338: </dl><p>When the user agent leaves the attribute name state (and before
1339: emitting the tag token, if appropriate), the complete attribute's
1340: name must be compared to the other attributes on the same token;
1341: if there is already an attribute on the token with the exact same
1342: name, then this is a <a href="parsing.html#parse-error">parse error</a> and the new
1343: attribute must be dropped, along with the value that gets
1344: associated with it (if any).</p>
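<p>The duplicate check described above can be sketched as follows. This is a minimal, non-normative model: the <code>TagToken</code> class, its methods and the parse error counter are hypothetical names introduced for illustration, and attribute names are assumed to have been lowercased already by the attribute name state.</p>

```python
# Non-normative sketch; the class and method names are hypothetical.
class TagToken:
    def __init__(self, name):
        self.name = name
        self.attributes = {}   # name -> value, in insertion order
        self.parse_errors = 0

    def finish_attribute_name(self, name):
        """Called when the tokenizer leaves the attribute name state.

        Returns True if the attribute is kept, False if it must be dropped.
        """
        if name in self.attributes:
            # Exact same name already present: parse error, drop the new one.
            self.parse_errors += 1
            return False
        self.attributes[name] = ""
        return True

    def set_value(self, name, value, kept):
        # The value of a dropped attribute is still tokenized, but discarded.
        if kept:
            self.attributes[name] = value


token = TagToken("a")
kept_first = token.finish_attribute_name("href")
token.set_value("href", "first", kept_first)
kept_second = token.finish_attribute_name("href")
token.set_value("href", "second", kept_second)
```

<p>After these calls the token keeps only the first <code>href</code> value and records one parse error, which matches the rule that the later duplicate is the one dropped.</p>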
1345:
1346:
1.29 mike 1347: <h5 id="after-attribute-name-state"><span class="secno">8.2.4.36 </span><dfn>After attribute name state</dfn></h5>
1.1 mike 1348:
1349: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1350:
1.73 mike 1351: <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1 mike 1352: <dt>U+000A LINE FEED (LF)</dt>
1353: <dt>U+000C FORM FEED (FF)</dt>
1.70 mike 1354:
1.1 mike 1355: <dt>U+0020 SPACE</dt>
1.14 mike 1356: <dd>Ignore the character.</dd>
1.1 mike 1357:
1358: <dt>U+002F SOLIDUS (/)</dt>
1359: <dd>Switch to the <a href="#self-closing-start-tag-state">self-closing start tag state</a>.</dd>
1360:
1361: <dt>U+003D EQUALS SIGN (=)</dt>
1362: <dd>Switch to the <a href="#before-attribute-value-state">before attribute value state</a>.</dd>
1363:
1364: <dt>U+003E GREATER-THAN SIGN (>)</dt>
1.14 mike 1365: <dd>Switch to the <a href="#data-state">data state</a>. Emit the current tag
1366: token.</dd>
1.1 mike 1367:
1368: <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
1369: <dd>Start a new attribute in the current tag token. Set that
1370: attribute's name to the lowercase version of the <a href="parsing.html#current-input-character">current
1371: input character</a> (add 0x0020 to the character's code point),
1372: and its value to the empty string. Switch to the <a href="#attribute-name-state">attribute
1373: name state</a>.</dd>
1374:
1.51 mike 1375: <dt>U+0000 NULL</dt>
1376: <dd><a href="parsing.html#parse-error">Parse error</a>. Start a new attribute in the current
1377: tag token. Set that attribute's name to a U+FFFD REPLACEMENT
1378: CHARACTER character, and its value to the empty string. Switch to
1379: the <a href="#attribute-name-state">attribute name state</a>.</dd>
1380:
1.1 mike 1381: <dt>U+0022 QUOTATION MARK (")</dt>
1382: <dt>U+0027 APOSTROPHE (')</dt>
1383: <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
1384: <dd><a href="parsing.html#parse-error">Parse error</a>. Treat it as per the "anything else"
1385: entry below.</dd>
1386:
1387: <dt>EOF</dt>
1.87 mike 1388: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1389: state</a>. Reconsume the EOF character.</dd>
1.1 mike 1390:
1391: <dt>Anything else</dt>
1392: <dd>Start a new attribute in the current tag token. Set that
1393: attribute's name to the <a href="parsing.html#current-input-character">current input character</a>, and
1394: its value to the empty string. Switch to the <a href="#attribute-name-state">attribute name
1395: state</a>.</dd>
1396:
1.29 mike 1397: </dl><h5 id="before-attribute-value-state"><span class="secno">8.2.4.37 </span><dfn>Before attribute value state</dfn></h5>
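<p>As a non-normative illustration of this state, the sketch below shows how the first significant character after the equals sign selects one of the three attribute value states. The function name, state labels and return shape are hypothetical, and EOF is omitted.</p>

```python
# Non-normative sketch; the function name and state labels are hypothetical.
def before_attribute_value(ch):
    """Return (next_state, appended_to_value, is_parse_error, reconsume)."""
    if ch in ("\t", "\n", "\f", " "):
        return "before-attribute-value", "", False, False
    if ch == '"':
        return "attribute-value-double-quoted", "", False, False
    if ch == "'":
        return "attribute-value-single-quoted", "", False, False
    if ch == "&":
        # The ampersand is reconsumed so that the unquoted state can start
        # character reference handling itself.
        return "attribute-value-unquoted", "", False, True
    if ch == ">":
        # Missing value: parse error; the caller emits the current tag token.
        return "data", "", True, False
    if ch == "\x00":
        return "attribute-value-unquoted", "\ufffd", True, False
    # "<", "=" and "`" are parse errors but still begin an unquoted value.
    return "attribute-value-unquoted", ch, ch in ("<", "=", "`"), False
```

<p>Only the ampersand case reconsumes its character. Every other branch either discards the character or appends it to the value directly.</p>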
1.1 mike 1398:
1399: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1400:
1.73 mike 1401: <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1 mike 1402: <dt>U+000A LINE FEED (LF)</dt>
1403: <dt>U+000C FORM FEED (FF)</dt>
1.70 mike 1404:
1.1 mike 1405: <dt>U+0020 SPACE</dt>
1.14 mike 1406: <dd>Ignore the character.</dd>
1.1 mike 1407:
1408: <dt>U+0022 QUOTATION MARK (")</dt>
1409: <dd>Switch to the <a href="#attribute-value-double-quoted-state">attribute value (double-quoted) state</a>.</dd>
1410:
1411: <dt>U+0026 AMPERSAND (&)</dt>
1.87 mike 1412: <dd>Switch to the <a href="#attribute-value-unquoted-state">attribute value (unquoted) state</a>.
1413: Reconsume the <a href="parsing.html#current-input-character">current input character</a>.</dd>
1.1 mike 1414:
1415: <dt>U+0027 APOSTROPHE (')</dt>
1416: <dd>Switch to the <a href="#attribute-value-single-quoted-state">attribute value (single-quoted) state</a>.</dd>
1417:
1.51 mike 1418: <dt>U+0000 NULL</dt>
1419: <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
1420: character to the current attribute's value. Switch to the
1421: <a href="#attribute-value-unquoted-state">attribute value (unquoted) state</a>.</dd>
1422:
1.1 mike 1423: <dt>U+003E GREATER-THAN SIGN (>)</dt>
1.14 mike 1424: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1425: state</a>. Emit the current tag token.</dd>
1.1 mike 1426:
1427: <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
1428: <dt>U+003D EQUALS SIGN (=)</dt>
1429: <dt>U+0060 GRAVE ACCENT (`)</dt>
1430: <dd><a href="parsing.html#parse-error">Parse error</a>. Treat it as per the "anything else"
1431: entry below.</dd>
1432:
1433: <dt>EOF</dt>
1.87 mike 1434: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1435: state</a>. Reconsume the EOF character.</dd>
1.1 mike 1436:
1437: <dt>Anything else</dt>
1438: <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
1439: attribute's value. Switch to the <a href="#attribute-value-unquoted-state">attribute value (unquoted)
1440: state</a>.</dd>
1441:
1442: </dl><h5 id="attribute-value-double-quoted-state"><span class="secno">8.2.4.38 </span><dfn>Attribute value (double-quoted) state</dfn></h5>
1443:
1444: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1445:
1446: <dl class="switch"><dt>U+0022 QUOTATION MARK (")</dt>
1447: <dd>Switch to the <a href="#after-attribute-value-quoted-state">after attribute value (quoted)
1448: state</a>.</dd>
1449:
1450: <dt>U+0026 AMPERSAND (&)</dt>
1451: <dd>Switch to the <a href="#character-reference-in-attribute-value-state">character reference in attribute value
1452: state</a>, with the <a href="#additional-allowed-character">additional allowed character</a>
1453: being U+0022 QUOTATION MARK (").</dd>
1454:
1.51 mike 1455: <dt>U+0000 NULL</dt>
1456: <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
1457: character to the current attribute's value.</dd>
1458:
1.1 mike 1459: <dt>EOF</dt>
1.87 mike 1460: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1461: state</a>. Reconsume the EOF character.</dd>
1.1 mike 1462:
1463: <dt>Anything else</dt>
1464: <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
1.14 mike 1465: attribute's value.</dd>
1.1 mike 1466:
1467: </dl><h5 id="attribute-value-single-quoted-state"><span class="secno">8.2.4.39 </span><dfn>Attribute value (single-quoted) state</dfn></h5>
1468:
1469: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1470:
1471: <dl class="switch"><dt>U+0027 APOSTROPHE (')</dt>
1472: <dd>Switch to the <a href="#after-attribute-value-quoted-state">after attribute value (quoted)
1473: state</a>.</dd>
1474:
1475: <dt>U+0026 AMPERSAND (&)</dt>
1476: <dd>Switch to the <a href="#character-reference-in-attribute-value-state">character reference in attribute value
1477: state</a>, with the <a href="#additional-allowed-character">additional allowed character</a>
1478: being U+0027 APOSTROPHE (').</dd>
1479:
1.61 mike 1480: <dt>U+0000 NULL</dt>
1481: <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
1482: character to the current attribute's value.</dd>
1483:
1.1 mike 1484: <dt>EOF</dt>
1.87 mike 1485: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1486: state</a>. Reconsume the EOF character.</dd>
1.1 mike 1487:
1488: <dt>Anything else</dt>
1489: <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
1.14 mike 1490: attribute's value.</dd>
1.1 mike 1491:
1492: </dl><h5 id="attribute-value-unquoted-state"><span class="secno">8.2.4.40 </span><dfn>Attribute value (unquoted) state</dfn></h5>
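<p>As a non-normative illustration of this state, the sketch below consumes an unquoted value from a string. It is a simplification under stated assumptions: the end of the string stands in for EOF, character references are not expanded (the function just reports the switch to the character reference state), and the function name, state labels and return shape are hypothetical.</p>

```python
# Non-normative sketch; the function name and state labels are hypothetical.
def consume_unquoted_value(chars):
    """Return (value, next_state, parse_error_count, characters_consumed)."""
    value = []
    errors = 0
    for i, ch in enumerate(chars):
        if ch in ("\t", "\n", "\f", " "):
            return "".join(value), "before-attribute-name", errors, i + 1
        if ch == ">":
            # The caller emits the current tag token.
            return "".join(value), "data", errors, i + 1
        if ch == "&":
            # Handled by a separate state, with ">" as the additional
            # allowed character; not modelled here.
            return "".join(value), "character-reference-in-attribute-value", errors, i + 1
        if ch == "\x00":
            errors += 1
            value.append("\ufffd")
            continue
        if ch in ('"', "'", "<", "=", "`"):
            errors += 1  # parse error, but the character is still appended
        value.append(ch)
    # End of input treated as EOF: parse error, switch to the data state.
    return "".join(value), "data", errors + 1, len(chars)
```

<p>For instance, the input <code>a=b&gt;</code> yields the value <code>a=b</code> with one parse error for the stray equals sign, since that character is reported but kept.</p>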
1493:
1494: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1495:
1.73 mike 1496: <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1 mike 1497: <dt>U+000A LINE FEED (LF)</dt>
1498: <dt>U+000C FORM FEED (FF)</dt>
1.70 mike 1499:
1.1 mike 1500: <dt>U+0020 SPACE</dt>
1501: <dd>Switch to the <a href="#before-attribute-name-state">before attribute name state</a>.</dd>
1502:
1503: <dt>U+0026 AMPERSAND (&)</dt>
1504: <dd>Switch to the <a href="#character-reference-in-attribute-value-state">character reference in attribute value
1505: state</a>, with the <a href="#additional-allowed-character">additional allowed character</a>
1506: being U+003E GREATER-THAN SIGN (>).</dd>
1507:
1508: <dt>U+003E GREATER-THAN SIGN (>)</dt>
1.14 mike 1509: <dd>Switch to the <a href="#data-state">data state</a>. Emit the current tag
1510: token.</dd>
1.1 mike 1511:
1.51 mike 1512: <dt>U+0000 NULL</dt>
1513: <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
1514: character to the current attribute's value.</dd>
1515:
1.1 mike 1516: <dt>U+0022 QUOTATION MARK (")</dt>
1517: <dt>U+0027 APOSTROPHE (')</dt>
1518: <dt>U+003C LESS-THAN SIGN (<)</dt>
1519: <dt>U+003D EQUALS SIGN (=)</dt>
1520: <dt>U+0060 GRAVE ACCENT (`)</dt>
1521: <dd><a href="parsing.html#parse-error">Parse error</a>. Treat it as per the "anything else"
1522: entry below.</dd>
1523:
1524: <dt>EOF</dt>
1.87 mike 1525: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1526: state</a>. Reconsume the EOF character.</dd>
1.1 mike 1527:
1528: <dt>Anything else</dt>
1529: <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
1.14 mike 1530: attribute's value.</dd>
1.1 mike 1531:
1.29 mike 1532: </dl><h5 id="character-reference-in-attribute-value-state"><span class="secno">8.2.4.41 </span><dfn>Character reference in attribute value state</dfn></h5>
1.1 mike 1533:
1534: <p>Attempt to <a href="#consume-a-character-reference">consume a character reference</a>.</p>
1535:
1.18 mike 1536: <p>If nothing is returned, append a U+0026 AMPERSAND character
1537: (&) to the current attribute's value.</p>
1.1 mike 1538:
1.85 mike 1539: <p>Otherwise, append the returned character tokens to the current
1.1 mike 1540: attribute's value.</p>
1541:
1.27 mike 1542: <p>Finally, switch back to the attribute value state that switched
1543: into this state.</p>
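
<p>The steps above can be sketched as follows. This is only an illustrative sketch, not part of the algorithm: the function name, the <code>consume_character_reference</code> callback (assumed to return <code>None</code> when nothing is returned), the list used as the attribute value buffer, and the <code>return_state</code> string are all hypothetical.</p>

```python
from typing import Callable, List, Optional


def character_reference_in_attribute_value(
    consume_character_reference: Callable[[], Optional[str]],
    attribute_value: List[str],
    return_state: str,
) -> str:
    """Run one step of the state and return the state to switch back to."""
    # Hypothetical helper: returns None when no character reference matched.
    returned = consume_character_reference()
    if returned is None:
        # Nothing was returned: the ampersand is kept as literal text.
        attribute_value.append("&")
    else:
        # Otherwise the returned characters go into the attribute's value.
        attribute_value.append(returned)
    # Finally, switch back to the attribute value state that switched here.
    return return_state
```

<p>In this sketch the caller passes the name of the attribute value state it came from, so the single function serves the double-quoted, single-quoted, and unquoted states alike.</p>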
1.1 mike 1544:
1545:
1546: <h5 id="after-attribute-value-quoted-state"><span class="secno">8.2.4.42 </span><dfn>After attribute value (quoted) state</dfn></h5>
1547:
1548: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1549:
1.73 mike 1550: <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1 mike 1551: <dt>U+000A LINE FEED (LF)</dt>
1552: <dt>U+000C FORM FEED (FF)</dt>
1.70 mike 1553:
1.1 mike 1554: <dt>U+0020 SPACE</dt>
1555: <dd>Switch to the <a href="#before-attribute-name-state">before attribute name state</a>.</dd>
1556:
1557: <dt>U+002F SOLIDUS (/)</dt>
1558: <dd>Switch to the <a href="#self-closing-start-tag-state">self-closing start tag state</a>.</dd>
1559:
1560: <dt>U+003E GREATER-THAN SIGN (>)</dt>
1.14 mike 1561: <dd>Switch to the <a href="#data-state">data state</a>. Emit the current tag
1562: token.</dd>
1.1 mike 1563:
1564: <dt>EOF</dt>
1.87 mike 1565: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1566: state</a>. Reconsume the EOF character.</dd>
1.1 mike 1567:
1568: <dt>Anything else</dt>
1.87 mike 1569: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#before-attribute-name-state">before attribute
1570: name state</a>. Reconsume the character.</dd>
1.1 mike 1571:
1.29 mike 1572: </dl><h5 id="self-closing-start-tag-state"><span class="secno">8.2.4.43 </span><dfn>Self-closing start tag state</dfn></h5>
1.1 mike 1573:
1574: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1575:
1576: <dl class="switch"><dt>U+003E GREATER-THAN SIGN (>)</dt>
1577: <dd>Set the <i>self-closing flag</i> of the current tag
1.14 mike 1578: token. Switch to the <a href="#data-state">data state</a>. Emit the current tag
1579: token.</dd>
1.1 mike 1580:
1581: <dt>EOF</dt>
1.87 mike 1582: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1583: state</a>. Reconsume the EOF character.</dd>
1.1 mike 1584:
1585: <dt>Anything else</dt>
1.87 mike 1586: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#before-attribute-name-state">before attribute
1587: name state</a>. Reconsume the character.</dd>
1.1 mike 1588:
1.29 mike 1589: </dl><h5 id="bogus-comment-state"><span class="secno">8.2.4.44 </span><dfn>Bogus comment state</dfn></h5>
1.1 mike 1590:
1591: <p>Consume every character up to and including the first U+003E
1592: GREATER-THAN SIGN character (>) or the end of the file (EOF),
1593: whichever comes first. Emit a comment token whose data is the
1.51 mike 1594: concatenation of all the characters starting from and including the
1595: character that caused the state machine to switch into the bogus
1596: comment state, up to and including the character immediately before
1597: the last consumed character (i.e. up to the character just before
1598: the U+003E or EOF character), but with any U+0000 NULL characters
1599: replaced by U+FFFD REPLACEMENT CHARACTER characters. (If the comment
1600: was started by the end of the file (EOF), the token is empty.)</p>
1.1 mike 1601:
1602: <p>Switch to the <a href="#data-state">data state</a>.</p>
1603:
1604: <p>If the end of the file was reached, reconsume the EOF
1605: character.</p>
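
<p>As a rough illustration of the bogus comment state, the sketch below operates on a whole input string rather than a stream. The function name, the <code>start</code> index (assumed to point at the character that caused the switch into this state), and the returned tuple are assumptions made for the example only.</p>

```python
from typing import Tuple


def bogus_comment(text: str, start: int) -> Tuple[str, int, bool]:
    """Return (comment data, next input position, whether EOF was reached)."""
    # Find the first U+003E GREATER-THAN SIGN at or after `start`.
    end = text.find(">", start)
    hit_eof = end == -1
    if hit_eof:
        end = len(text)
    # The data stops just before the ">" (or EOF); NULL becomes U+FFFD.
    data = text[start:end].replace("\x00", "\ufffd")
    # The ">" itself is consumed; EOF is left in place to be reconsumed.
    next_pos = end if hit_eof else end + 1
    return data, next_pos, hit_eof
```

<p>A caller would emit a comment token with the returned data, switch to the data state, and, if the third value is true, reconsume the EOF character.</p>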
1606:
1607:
1.29 mike 1608: <h5 id="markup-declaration-open-state"><span class="secno">8.2.4.45 </span><dfn>Markup declaration open state</dfn></h5>
1.1 mike 1609:
1610: <p>If the next two characters are both U+002D HYPHEN-MINUS
1611: characters (-), consume those two characters, create a comment token
1612: whose data is the empty string, and switch to the <a href="#comment-start-state">comment
1613: start state</a>.</p>
1614:
1615: <p>Otherwise, if the next seven characters are an <a href="infrastructure.html#ascii-case-insensitive">ASCII
1616: case-insensitive</a> match for the word "DOCTYPE", then consume
1617: those characters and switch to the <a href="#doctype-state">DOCTYPE state</a>.</p>
1618:
1.86 mike 1619: <p>Otherwise, if there is a <a href="parsing.html#current-node">current node</a> and it is not
1620: an element in the <a href="namespaces.html#html-namespace-0">HTML namespace</a> and the next seven
1621: characters are a <a href="infrastructure.html#case-sensitive">case-sensitive</a> match for the string
1622: "[CDATA[" (the five uppercase letters "CDATA" with a U+005B LEFT
1623: SQUARE BRACKET character before and after), then consume those
1624: characters and switch to the <a href="#cdata-section-state">CDATA section state</a>.</p>
1.1 mike 1625:
1626: <p>Otherwise, this is a <a href="parsing.html#parse-error">parse error</a>. Switch to the
1627: <a href="#bogus-comment-state">bogus comment state</a>. The next character that is
1628: consumed, if any, is the first character that will be in the
1629: comment.</p>
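
<p>The four branches above might be sketched like this. The function name, the string state names, and the <code>current_node_is_foreign</code> flag (standing in for "there is a current node and it is not an element in the HTML namespace") are assumptions for illustration; <code>pos</code> is assumed to be the index of the first character not yet consumed.</p>

```python
from typing import Tuple


def markup_declaration_open(
    text: str, pos: int, current_node_is_foreign: bool = False
) -> Tuple[str, int, bool]:
    """Return (next state, next input position, whether a parse error occurred)."""
    # Two U+002D HYPHEN-MINUS characters start a comment.
    if text.startswith("--", pos):
        return "comment start state", pos + 2, False
    # "DOCTYPE" is matched ASCII case-insensitively.
    word = text[pos:pos + 7]
    if word.isascii() and word.lower() == "doctype":
        return "DOCTYPE state", pos + 7, False
    # "[CDATA[" is matched case-sensitively, and only in foreign content.
    if current_node_is_foreign and text.startswith("[CDATA[", pos):
        return "CDATA section state", pos + 7, False
    # Anything else is a parse error; nothing is consumed here, so the next
    # character consumed becomes the first character of the bogus comment.
    return "bogus comment state", pos, True
```

<p>Note that the last branch leaves the position unchanged, which matches the statement that the next character consumed, if any, is the first character of the comment.</p>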
1630:
1631:
1.29 mike 1632: <h5 id="comment-start-state"><span class="secno">8.2.4.46 </span><dfn>Comment start state</dfn></h5>
1.1 mike 1633:
1634: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1635:
1636: <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
1637: <dd>Switch to the <a href="#comment-start-dash-state">comment start dash state</a>.</dd>
1638:
1.51 mike 1639: <dt>U+0000 NULL</dt>
1640: <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
1641: character to the comment token's data. Switch to the <a href="#comment-state">comment
1642: state</a>.</dd>
1643:
1.1 mike 1644: <dt>U+003E GREATER-THAN SIGN (>)</dt>
1.14 mike 1645: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1.70 mike 1646: state</a>. Emit the comment token.</dd>
1.90 mike 1647:
1.1 mike 1648: <dt>EOF</dt>
1.87 mike 1649: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1650: state</a>. Emit the comment token. Reconsume the EOF character.</dd>
1.1 mike 1651:
1652: <dt>Anything else</dt>
1653: <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the comment
1654: token's data. Switch to the <a href="#comment-state">comment state</a>.</dd>
1655:
1.29 mike 1656: </dl><h5 id="comment-start-dash-state"><span class="secno">8.2.4.47 </span><dfn>Comment start dash state</dfn></h5>
1.1 mike 1657:
1658: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1659:
1660: <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
1661: <dd>Switch to the <a href="#comment-end-state">comment end state</a>.</dd>
1662:
1.51 mike 1663: <dt>U+0000 NULL</dt>
1664: <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+002D HYPHEN-MINUS
1665: character (-) and a U+FFFD REPLACEMENT CHARACTER character to the
1666: comment token's data. Switch to the <a href="#comment-state">comment
1667: state</a>.</dd>
1668:
1.1 mike 1669: <dt>U+003E GREATER-THAN SIGN (>)</dt>
1.14 mike 1670: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1671: state</a>. Emit the comment token.</dd>
1.1 mike 1672:
1673: <dt>EOF</dt>
1.87 mike 1674: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1675: state</a>. Emit the comment token. Reconsume the EOF
1676: character.</dd>
1677:
1.1 mike 1678: <dt>Anything else</dt>
1679: <dd>Append a U+002D HYPHEN-MINUS character (-) and the
1680: <a href="parsing.html#current-input-character">current input character</a> to the comment token's
1681: data. Switch to the <a href="#comment-state">comment state</a>.</dd>
1682:
1.29 mike 1683: </dl><h5 id="comment-state"><span class="secno">8.2.4.48 </span><dfn id="comment">Comment state</dfn></h5>
1.1 mike 1684:
1685: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1686:
1687: <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
1688: <dd>Switch to the <a href="#comment-end-dash-state">comment end dash state</a>.</dd>
1689:
1.51 mike 1690: <dt>U+0000 NULL</dt>
1691: <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
1692: character to the comment token's data.</dd>
1693:
1.1 mike 1694: <dt>EOF</dt>
1.87 mike 1695: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1696: state</a>. Emit the comment token. Reconsume the EOF
1697: character.</dd>
1698:
1.1 mike 1699: <dt>Anything else</dt>
1700: <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the comment
1.14 mike 1701: token's data.</dd>
1.1 mike 1702:
1.29 mike 1703: </dl><h5 id="comment-end-dash-state"><span class="secno">8.2.4.49 </span><dfn>Comment end dash state</dfn></h5>
1.1 mike 1704:
1705: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1706:
1707: <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
1708: <dd>Switch to the <a href="#comment-end-state">comment end state</a>.</dd>
1709:
1.51 mike 1710: <dt>U+0000 NULL</dt>
1711: <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+002D HYPHEN-MINUS
1712: character (-) and a U+FFFD REPLACEMENT CHARACTER character to the
1713: comment token's data. Switch to the <a href="#comment-state">comment
1714: state</a>.</dd>
1715:
1.1 mike 1716: <dt>EOF</dt>
1.87 mike 1717: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1718: state</a>. Emit the comment token. Reconsume the EOF
1719: character.</dd>
1720:
1.1 mike 1721: <dt>Anything else</dt>
1722: <dd>Append a U+002D HYPHEN-MINUS character (-) and the
1723: <a href="parsing.html#current-input-character">current input character</a> to the comment token's
1724: data. Switch to the <a href="#comment-state">comment state</a>.</dd>
1725:
1.29 mike 1726: </dl><h5 id="comment-end-state"><span class="secno">8.2.4.50 </span><dfn>Comment end state</dfn></h5>
1.1 mike 1727:
1728: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1729:
1730: <dl class="switch"><dt>U+003E GREATER-THAN SIGN (>)</dt>
1.14 mike 1731: <dd>Switch to the <a href="#data-state">data state</a>. Emit the comment
1732: token.</dd>
1.1 mike 1733:
1.51 mike 1734: <dt>U+0000 NULL</dt>
1735: <dd><a href="parsing.html#parse-error">Parse error</a>. Append two U+002D HYPHEN-MINUS
1736: characters (-) and a U+FFFD REPLACEMENT CHARACTER character to the
1737: comment token's data. Switch to the <a href="#comment-state">comment
1738: state</a>.</dd>
1739:
1.1 mike 1740: <dt>U+0021 EXCLAMATION MARK (!)</dt>
1741: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#comment-end-bang-state">comment end bang
1742: state</a>.</dd>
1743:
1744: <dt>U+002D HYPHEN-MINUS (-)</dt>
1745: <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+002D HYPHEN-MINUS
1.14 mike 1746: character (-) to the comment token's data.</dd>
1.1 mike 1747:
1748: <dt>EOF</dt>
1.87 mike 1749: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1750: state</a>. Emit the comment token. Reconsume the EOF
1751: character.</dd>
1.90 mike 1752:
1.1 mike 1753: <dt>Anything else</dt>
1754: <dd><a href="parsing.html#parse-error">Parse error</a>. Append two U+002D HYPHEN-MINUS
1755: characters (-) and the <a href="parsing.html#current-input-character">current input character</a> to the
1756: comment token's data. Switch to the <a href="#comment-state">comment
1757: state</a>.</dd>
1758:
1.29 mike 1759: </dl><h5 id="comment-end-bang-state"><span class="secno">8.2.4.51 </span><dfn>Comment end bang state</dfn></h5>
1.1 mike 1760:
1761: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1762:
1763: <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
1764: <dd>Append two U+002D HYPHEN-MINUS characters (-) and a U+0021
1765: EXCLAMATION MARK character (!) to the comment token's data. Switch
1766: to the <a href="#comment-end-dash-state">comment end dash state</a>.</dd>
1767:
1768: <dt>U+003E GREATER-THAN SIGN (>)</dt>
1.14 mike 1769: <dd>Switch to the <a href="#data-state">data state</a>. Emit the comment
1770: token.</dd>
1.1 mike 1771:
1.51 mike 1772: <dt>U+0000 NULL</dt>
1773: <dd><a href="parsing.html#parse-error">Parse error</a>. Append two U+002D HYPHEN-MINUS
1774: characters (-), a U+0021 EXCLAMATION MARK character (!), and a
1775: U+FFFD REPLACEMENT CHARACTER character to the comment token's data.
1776: Switch to the <a href="#comment-state">comment state</a>.</dd>
1777:
1.1 mike 1778: <dt>EOF</dt>
1.87 mike 1779: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1780: state</a>. Emit the comment token. Reconsume the EOF
1781: character.</dd>
1782:
1.1 mike 1783: <dt>Anything else</dt>
1784: <dd>Append two U+002D HYPHEN-MINUS characters (-), a U+0021
1785: EXCLAMATION MARK character (!), and the <a href="parsing.html#current-input-character">current input
1786: character</a> to the comment token's data. Switch to the
1787: <a href="#comment-state">comment state</a>.</dd>
1788:
1.37 mike 1789: </dl><h5 id="doctype-state"><span class="secno">8.2.4.52 </span><dfn>DOCTYPE state</dfn></h5>
1.1 mike 1790:
1791: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1792:
1.73 mike 1793: <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1 mike 1794: <dt>U+000A LINE FEED (LF)</dt>
1795: <dt>U+000C FORM FEED (FF)</dt>
1.70 mike 1796:
1.1 mike 1797: <dt>U+0020 SPACE</dt>
1798: <dd>Switch to the <a href="#before-doctype-name-state">before DOCTYPE name state</a>.</dd>
1799:
1800: <dt>EOF</dt>
1.87 mike 1801: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1802: state</a>. Create a new DOCTYPE token. Set its <i>force-quirks
1803: flag</i> to <i>on</i>. Emit the token. Reconsume the EOF
1804: character.</dd>
1.1 mike 1805:
1806: <dt>Anything else</dt>
1.87 mike 1807: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#before-doctype-name-state">before DOCTYPE
1808: name state</a>. Reconsume the character.</dd>
1.1 mike 1809:
1.37 mike 1810: </dl><h5 id="before-doctype-name-state"><span class="secno">8.2.4.53 </span><dfn>Before DOCTYPE name state</dfn></h5>
1.1 mike 1811:
1812: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1813:
1.73 mike 1814: <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1 mike 1815: <dt>U+000A LINE FEED (LF)</dt>
1816: <dt>U+000C FORM FEED (FF)</dt>
1.70 mike 1817:
1.1 mike 1818: <dt>U+0020 SPACE</dt>
1.14 mike 1819: <dd>Ignore the character.</dd>
1.1 mike 1820:
1821: <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
1822: <dd>Create a new DOCTYPE token. Set the token's name to the
1823: lowercase version of the <a href="parsing.html#current-input-character">current input character</a> (add 0x0020 to the
1824: character's code point). Switch to the <a href="#doctype-name-state">DOCTYPE name
1825: state</a>.</dd>
1826:
1.51 mike 1827: <dt>U+0000 NULL</dt>
1.72 mike 1828: <dd><a href="parsing.html#parse-error">Parse error</a>. Create a new DOCTYPE token. Set the
1829: token's name to a U+FFFD REPLACEMENT CHARACTER character. Switch to
1830: the <a href="#doctype-name-state">DOCTYPE name state</a>.</dd>
1.51 mike 1831:
1.1 mike 1832: <dt>U+003E GREATER-THAN SIGN (>)</dt>
1833: <dd><a href="parsing.html#parse-error">Parse error</a>. Create a new DOCTYPE token. Set its
1.14 mike 1834: <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#data-state">data
1835: state</a>. Emit the token.</dd>
1.1 mike 1836:
1837: <dt>EOF</dt>
1.87 mike 1838: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1839: state</a>. Create a new DOCTYPE token. Set its <i>force-quirks
1840: flag</i> to <i>on</i>. Emit the token. Reconsume the EOF
1841: character.</dd>
1.1 mike 1842:
1843: <dt>Anything else</dt>
1844: <dd>Create a new DOCTYPE token. Set the token's name to the
1845: <a href="parsing.html#current-input-character">current input character</a>. Switch to the <a href="#doctype-name-state">DOCTYPE name
1846: state</a>.</dd>
1847:
1.37 mike 1848: </dl><h5 id="doctype-name-state"><span class="secno">8.2.4.54 </span><dfn>DOCTYPE name state</dfn></h5>
1.1 mike 1849:
1850: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1851:
1.73 mike 1852: <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1 mike 1853: <dt>U+000A LINE FEED (LF)</dt>
1854: <dt>U+000C FORM FEED (FF)</dt>
1.70 mike 1855:
1.1 mike 1856: <dt>U+0020 SPACE</dt>
1857: <dd>Switch to the <a href="#after-doctype-name-state">after DOCTYPE name state</a>.</dd>
1858:
1859: <dt>U+003E GREATER-THAN SIGN (>)</dt>
1.14 mike 1860: <dd>Switch to the <a href="#data-state">data state</a>. Emit the current DOCTYPE
1861: token.</dd>
1.1 mike 1862:
1863: <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
1864: <dd>Append the lowercase version of the <a href="parsing.html#current-input-character">current input
1865: character</a> (add 0x0020 to the character's code point) to the
1.14 mike 1866: current DOCTYPE token's name.</dd>
1.1 mike 1867:
1.51 mike 1868: <dt>U+0000 NULL</dt>
1869: <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
1870: character to the current DOCTYPE token's name.</dd>
1871:
1.1 mike 1872: <dt>EOF</dt>
1.87 mike 1873: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1874: state</a>. Set the DOCTYPE token's <i>force-quirks flag</i> to
1875: <i>on</i>. Emit that DOCTYPE token. Reconsume the EOF character.</dd>
1.1 mike 1876:
1877: <dt>Anything else</dt>
1878: <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
1.14 mike 1879: DOCTYPE token's name.</dd>
1.1 mike 1880:
1.37 mike 1881: </dl><h5 id="after-doctype-name-state"><span class="secno">8.2.4.55 </span><dfn>After DOCTYPE name state</dfn></h5>
1.1 mike 1882:
1883: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1884:
1.73 mike 1885: <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1 mike 1886: <dt>U+000A LINE FEED (LF)</dt>
1887: <dt>U+000C FORM FEED (FF)</dt>
1.70 mike 1888:
1.1 mike 1889: <dt>U+0020 SPACE</dt>
1.14 mike 1890: <dd>Ignore the character.</dd>
1.1 mike 1891:
1892: <dt>U+003E GREATER-THAN SIGN (>)</dt>
1.14 mike 1893: <dd>Switch to the <a href="#data-state">data state</a>. Emit the current DOCTYPE
1894: token.</dd>
1.1 mike 1895:
1896: <dt>EOF</dt>
1.87 mike 1897: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1898: state</a>. Set the DOCTYPE token's <i>force-quirks flag</i> to
1899: <i>on</i>. Emit that DOCTYPE token. Reconsume the EOF character.</dd>
1.1 mike 1900:
1901: <dt>Anything else</dt>
1902: <dd>
1903:
1904: <p>If the six characters starting from the <a href="parsing.html#current-input-character">current input
1905: character</a> are an <a href="infrastructure.html#ascii-case-insensitive">ASCII case-insensitive</a> match
1906: for the word "PUBLIC", then consume those characters and switch to
1907: the <a href="#after-doctype-public-keyword-state">after DOCTYPE public keyword state</a>.</p>
1908:
1909: <p>Otherwise, if the six characters starting from the
1910: <a href="parsing.html#current-input-character">current input character</a> are an <a href="infrastructure.html#ascii-case-insensitive">ASCII
1911: case-insensitive</a> match for the word "SYSTEM", then consume
1912: those characters and switch to the <a href="#after-doctype-system-keyword-state">after DOCTYPE system
1913: keyword state</a>.</p>
1914:
1915: <p>Otherwise, this is a <a href="parsing.html#parse-error">parse error</a>. Set the
1916: DOCTYPE token's <i>force-quirks flag</i> to <i>on</i>. Switch to
1917: the <a href="#bogus-doctype-state">bogus DOCTYPE state</a>.</p>
1918:
1919: </dd>
1920:
1.37 mike 1921: </dl><h5 id="after-doctype-public-keyword-state"><span class="secno">8.2.4.56 </span><dfn>After DOCTYPE public keyword state</dfn></h5>
1.1 mike 1922:
1923: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1924:
1.73 mike 1925: <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1 mike 1926: <dt>U+000A LINE FEED (LF)</dt>
1927: <dt>U+000C FORM FEED (FF)</dt>
1.70 mike 1928:
1.1 mike 1929: <dt>U+0020 SPACE</dt>
1930: <dd>Switch to the <a href="#before-doctype-public-identifier-state">before DOCTYPE public identifier
1931: state</a>.</dd>
1932:
1933: <dt>U+0022 QUOTATION MARK (")</dt>
1934: <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's public
1935: identifier to the empty string (not missing), then switch to the
1936: <a href="#doctype-public-identifier-double-quoted-state">DOCTYPE public identifier (double-quoted) state</a>.</dd>
1937:
1938: <dt>U+0027 APOSTROPHE (')</dt>
1939: <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's public
1940: identifier to the empty string (not missing), then switch to the
1941: <a href="#doctype-public-identifier-single-quoted-state">DOCTYPE public identifier (single-quoted) state</a>.</dd>
1942:
1943: <dt>U+003E GREATER-THAN SIGN (>)</dt>
1944: <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
1.14 mike 1945: <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#data-state">data
1946: state</a>. Emit that DOCTYPE token.</dd>
1.1 mike 1947:
1948: <dt>EOF</dt>
1.87 mike 1949: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1950: state</a>. Set the DOCTYPE token's <i>force-quirks flag</i> to
1951: <i>on</i>. Emit that DOCTYPE token. Reconsume the EOF character.</dd>
1.1 mike 1952:
1953: <dt>Anything else</dt>
1954: <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
1955: <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#bogus-doctype-state">bogus
1956: DOCTYPE state</a>.</dd>
1957:
1.37 mike 1958: </dl><h5 id="before-doctype-public-identifier-state"><span class="secno">8.2.4.57 </span><dfn>Before DOCTYPE public identifier state</dfn></h5>
1.1 mike 1959:
1960: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1961:
1.73 mike 1962: <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1 mike 1963: <dt>U+000A LINE FEED (LF)</dt>
1964: <dt>U+000C FORM FEED (FF)</dt>
1.70 mike 1965:
1.1 mike 1966: <dt>U+0020 SPACE</dt>
1.14 mike 1967: <dd>Ignore the character.</dd>
1.1 mike 1968:
1969: <dt>U+0022 QUOTATION MARK (")</dt>
1970: <dd>Set the DOCTYPE token's public identifier to the empty string
1971: (not missing), then switch to the <a href="#doctype-public-identifier-double-quoted-state">DOCTYPE public identifier
1972: (double-quoted) state</a>.</dd>
1973:
1974: <dt>U+0027 APOSTROPHE (')</dt>
1975: <dd>Set the DOCTYPE token's public identifier to the empty string
1976: (not missing), then switch to the <a href="#doctype-public-identifier-single-quoted-state">DOCTYPE public identifier
1977: (single-quoted) state</a>.</dd>
1978:
1979: <dt>U+003E GREATER-THAN SIGN (>)</dt>
1980: <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
1.14 mike 1981: <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#data-state">data
1982: state</a>. Emit that DOCTYPE token.</dd>
1.1 mike 1983:
1984: <dt>EOF</dt>
1.87 mike 1985: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1986: state</a>. Set the DOCTYPE token's <i>force-quirks flag</i> to
1987: <i>on</i>. Emit that DOCTYPE token. Reconsume the EOF character.</dd>
1.1 mike 1988:
1989: <dt>Anything else</dt>
1990: <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
1991: <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#bogus-doctype-state">bogus
1992: DOCTYPE state</a>.</dd>
1993:
1.37 mike 1994: </dl><h5 id="doctype-public-identifier-double-quoted-state"><span class="secno">8.2.4.58 </span><dfn>DOCTYPE public identifier (double-quoted) state</dfn></h5>
1.1 mike 1995:
1996: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
1997:
1998: <dl class="switch"><dt>U+0022 QUOTATION MARK (")</dt>
1999: <dd>Switch to the <a href="#after-doctype-public-identifier-state">after DOCTYPE public identifier state</a>.</dd>
2000:
1.51 mike 2001: <dt>U+0000 NULL</dt>
2002: <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
2003: character to the current DOCTYPE token's public identifier.</dd>
2004:
1.1 mike 2005: <dt>U+003E GREATER-THAN SIGN (>)</dt>
2006: <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
1.14 mike 2007: <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#data-state">data
2008: state</a>. Emit that DOCTYPE token.</dd>
1.1 mike 2009:
2010: <dt>EOF</dt>
1.87 mike 2011: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
2012: state</a>. Set the DOCTYPE token's <i>force-quirks flag</i> to
2013: <i>on</i>. Emit that DOCTYPE token. Reconsume the EOF character.</dd>
1.1 mike 2014:
2015: <dt>Anything else</dt>
1.51 mike 2016: <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
2017: DOCTYPE token's public identifier.</dd>
1.1 mike 2018:
1.37 mike 2019: </dl><h5 id="doctype-public-identifier-single-quoted-state"><span class="secno">8.2.4.59 </span><dfn>DOCTYPE public identifier (single-quoted) state</dfn></h5>
1.1 mike 2020:
2021: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
2022:
2023: <dl class="switch"><dt>U+0027 APOSTROPHE (')</dt>
2024: <dd>Switch to the <a href="#after-doctype-public-identifier-state">after DOCTYPE public identifier state</a>.</dd>
2025:
1.51 mike 2026: <dt>U+0000 NULL</dt>
2027: <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
2028: character to the current DOCTYPE token's public identifier.</dd>
2029:
1.1 mike 2030: <dt>U+003E GREATER-THAN SIGN (>)</dt>
2031: <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
1.14 mike 2032: <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#data-state">data
2033: state</a>. Emit that DOCTYPE token.</dd>
1.1 mike 2034:
2035: <dt>EOF</dt>
1.87 mike 2036: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
2037: state</a>. Set the DOCTYPE token's <i>force-quirks flag</i> to
2038: <i>on</i>. Emit that DOCTYPE token. Reconsume the EOF character.</dd>
1.1 mike 2039:
2040: <dt>Anything else</dt>
1.51 mike 2041: <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
2042: DOCTYPE token's public identifier.</dd>
1.1 mike 2043:
1.37 mike 2044: </dl><h5 id="after-doctype-public-identifier-state"><span class="secno">8.2.4.60 </span><dfn>After DOCTYPE public identifier state</dfn></h5>
1.1 mike 2045:
2046: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
2047:
1.73 mike 2048: <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1 mike 2049: <dt>U+000A LINE FEED (LF)</dt>
2050: <dt>U+000C FORM FEED (FF)</dt>
1.70 mike 2051:
1.1 mike 2052: <dt>U+0020 SPACE</dt>
2053: <dd>Switch to the <a href="#between-doctype-public-and-system-identifiers-state">between DOCTYPE public and system
2054: identifiers state</a>.</dd>
2055:
2056: <dt>U+003E GREATER-THAN SIGN (>)</dt>
1.14 mike 2057: <dd>Switch to the <a href="#data-state">data state</a>. Emit the current DOCTYPE
2058: token.</dd>
1.1 mike 2059:
2060: <dt>U+0022 QUOTATION MARK (")</dt>
2061: <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's system
2062: identifier to the empty string (not missing), then switch to the
2063: <a href="#doctype-system-identifier-double-quoted-state">DOCTYPE system identifier (double-quoted) state</a>.</dd>
2064:
2065: <dt>U+0027 APOSTROPHE (')</dt>
2066: <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's system
2067: identifier to the empty string (not missing), then switch to the
2068: <a href="#doctype-system-identifier-single-quoted-state">DOCTYPE system identifier (single-quoted) state</a>.</dd>
2069:
2070: <dt>EOF</dt>
1.87 mike 2071: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
2072: state</a>. Set the DOCTYPE token's <i>force-quirks flag</i> to
2073: <i>on</i>. Emit that DOCTYPE token. Reconsume the EOF character.</dd>
1.1 mike 2074:
2075: <dt>Anything else</dt>
2076: <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
2077: <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#bogus-doctype-state">bogus
2078: DOCTYPE state</a>.</dd>
2079:
1.37 mike 2080: </dl><h5 id="between-doctype-public-and-system-identifiers-state"><span class="secno">8.2.4.61 </span><dfn>Between DOCTYPE public and system identifiers state</dfn></h5>
1.1 mike 2081:
2082: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
2083:
1.73 mike 2084: <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1 mike 2085: <dt>U+000A LINE FEED (LF)</dt>
2086: <dt>U+000C FORM FEED (FF)</dt>
1.70 mike 2087:
1.1 mike 2088: <dt>U+0020 SPACE</dt>
1.14 mike 2089: <dd>Ignore the character.</dd>
1.1 mike 2090:
2091: <dt>U+003E GREATER-THAN SIGN (>)</dt>
1.14 mike 2092: <dd>Switch to the <a href="#data-state">data state</a>. Emit the current DOCTYPE
2093: token.</dd>
1.1 mike 2094:
2095: <dt>U+0022 QUOTATION MARK (")</dt>
2096: <dd>Set the DOCTYPE token's system identifier to the empty string
2097: (not missing), then switch to the <a href="#doctype-system-identifier-double-quoted-state">DOCTYPE system identifier
2098: (double-quoted) state</a>.</dd>
2099:
2100: <dt>U+0027 APOSTROPHE (')</dt>
2101: <dd>Set the DOCTYPE token's system identifier to the empty string
2102: (not missing), then switch to the <a href="#doctype-system-identifier-single-quoted-state">DOCTYPE system identifier
2103: (single-quoted) state</a>.</dd>
2104:
2105: <dt>EOF</dt>
1.87 mike 2106: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
2107: state</a>. Set the DOCTYPE token's <i>force-quirks flag</i> to
2108: <i>on</i>. Emit that DOCTYPE token. Reconsume the EOF character.</dd>
1.1 mike 2109:
2110: <dt>Anything else</dt>
2111: <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
2112: <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#bogus-doctype-state">bogus
2113: DOCTYPE state</a>.</dd>
2114:
1.37 mike 2115: </dl><h5 id="after-doctype-system-keyword-state"><span class="secno">8.2.4.62 </span><dfn>After DOCTYPE system keyword state</dfn></h5>
1.1 mike 2116:
2117: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
2118:
1.73 mike 2119: <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1 mike 2120: <dt>U+000A LINE FEED (LF)</dt>
2121: <dt>U+000C FORM FEED (FF)</dt>
1.70 mike 2122:
1.1 mike 2123: <dt>U+0020 SPACE</dt>
2124: <dd>Switch to the <a href="#before-doctype-system-identifier-state">before DOCTYPE system identifier
2125: state</a>.</dd>
2126:
2127: <dt>U+0022 QUOTATION MARK (")</dt>
2128: <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's system
2129: identifier to the empty string (not missing), then switch to the
2130: <a href="#doctype-system-identifier-double-quoted-state">DOCTYPE system identifier (double-quoted) state</a>.</dd>
2131:
2132: <dt>U+0027 APOSTROPHE (')</dt>
2133: <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's system
2134: identifier to the empty string (not missing), then switch to the
2135: <a href="#doctype-system-identifier-single-quoted-state">DOCTYPE system identifier (single-quoted) state</a>.</dd>
2136:
2137: <dt>U+003E GREATER-THAN SIGN (>)</dt>
2138: <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
1.14 mike 2139: <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#data-state">data
2140: state</a>. Emit that DOCTYPE token.</dd>
1.1 mike 2141:
2142: <dt>EOF</dt>
1.87 mike 2143: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
2144: state</a>. Set the DOCTYPE token's <i>force-quirks flag</i> to
2145: <i>on</i>. Emit that DOCTYPE token. Reconsume the EOF character.</dd>
1.1 mike 2146:
2147: <dt>Anything else</dt>
2148: <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
2149: <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#bogus-doctype-state">bogus
2150: DOCTYPE state</a>.</dd>
2151:
1.37 mike 2152: </dl><h5 id="before-doctype-system-identifier-state"><span class="secno">8.2.4.63 </span><dfn>Before DOCTYPE system identifier state</dfn></h5>
1.1 mike 2153:
2154: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
2155:
1.73 mike 2156: <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1 mike 2157: <dt>U+000A LINE FEED (LF)</dt>
2158: <dt>U+000C FORM FEED (FF)</dt>
1.70 mike 2159:
1.1 mike 2160: <dt>U+0020 SPACE</dt>
1.14 mike 2161: <dd>Ignore the character.</dd>
1.1 mike 2162:
2163: <dt>U+0022 QUOTATION MARK (")</dt>
2164: <dd>Set the DOCTYPE token's system identifier to the empty string
2165: (not missing), then switch to the <a href="#doctype-system-identifier-double-quoted-state">DOCTYPE system identifier
2166: (double-quoted) state</a>.</dd>
2167:
2168: <dt>U+0027 APOSTROPHE (')</dt>
2169: <dd>Set the DOCTYPE token's system identifier to the empty string
2170: (not missing), then switch to the <a href="#doctype-system-identifier-single-quoted-state">DOCTYPE system identifier
2171: (single-quoted) state</a>.</dd>
2172:
2173: <dt>U+003E GREATER-THAN SIGN (>)</dt>
2174: <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
1.14 mike 2175: <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#data-state">data
2176: state</a>. Emit that DOCTYPE token.</dd>
1.1 mike 2177:
2178: <dt>EOF</dt>
1.87 mike 2179: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
2180: state</a>. Set the DOCTYPE token's <i>force-quirks flag</i> to
2181: <i>on</i>. Emit that DOCTYPE token. Reconsume the EOF character.</dd>
1.1 mike 2182:
2183: <dt>Anything else</dt>
2184: <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
2185: <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#bogus-doctype-state">bogus
2186: DOCTYPE state</a>.</dd>
2187:
1.37 mike 2188: </dl><h5 id="doctype-system-identifier-double-quoted-state"><span class="secno">8.2.4.64 </span><dfn>DOCTYPE system identifier (double-quoted) state</dfn></h5>
1.1 mike 2189:
2190: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
2191:
2192: <dl class="switch"><dt>U+0022 QUOTATION MARK (")</dt>
2193: <dd>Switch to the <a href="#after-doctype-system-identifier-state">after DOCTYPE system identifier
2194: state</a>.</dd>
2195:
1.51 mike 2196: <dt>U+0000 NULL</dt>
2197: <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
2198: character to the current DOCTYPE token's system identifier.</dd>
2199:
1.1 mike 2200: <dt>U+003E GREATER-THAN SIGN (>)</dt>
2201: <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
1.14 mike 2202: <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#data-state">data
2203: state</a>. Emit that DOCTYPE token.</dd>
1.1 mike 2204:
2205: <dt>EOF</dt>
1.87 mike 2206: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
2207: state</a>. Set the DOCTYPE token's <i>force-quirks flag</i> to
2208: <i>on</i>. Emit that DOCTYPE token. Reconsume the EOF character.</dd>
1.1 mike 2209:
2210: <dt>Anything else</dt>
2211: <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
1.14 mike 2212: DOCTYPE token's system identifier.</dd>
1.1 mike 2213:
1.37 mike 2214: </dl><h5 id="doctype-system-identifier-single-quoted-state"><span class="secno">8.2.4.65 </span><dfn>DOCTYPE system identifier (single-quoted) state</dfn></h5>
1.1 mike 2215:
2216: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
2217:
2218: <dl class="switch"><dt>U+0027 APOSTROPHE (')</dt>
2219: <dd>Switch to the <a href="#after-doctype-system-identifier-state">after DOCTYPE system identifier
2220: state</a>.</dd>
2221:
1.51 mike 2222: <dt>U+0000 NULL</dt>
2223: <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
2224: character to the current DOCTYPE token's system identifier.</dd>
2225:
1.1 mike 2226: <dt>U+003E GREATER-THAN SIGN (>)</dt>
2227: <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
1.14 mike 2228: <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#data-state">data
2229: state</a>. Emit that DOCTYPE token.</dd>
1.1 mike 2230:
2231: <dt>EOF</dt>
1.87 mike 2232: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
2233: state</a>. Set the DOCTYPE token's <i>force-quirks flag</i> to
2234: <i>on</i>. Emit that DOCTYPE token. Reconsume the EOF character.</dd>
1.1 mike 2235:
2236: <dt>Anything else</dt>
2237: <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
1.14 mike 2238: DOCTYPE token's system identifier.</dd>
1.1 mike 2239:
1.37 mike 2240: </dl><h5 id="after-doctype-system-identifier-state"><span class="secno">8.2.4.66 </span><dfn>After DOCTYPE system identifier state</dfn></h5>
1.1 mike 2241:
2242: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
2243:
1.73 mike 2244: <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1 mike 2245: <dt>U+000A LINE FEED (LF)</dt>
2246: <dt>U+000C FORM FEED (FF)</dt>
1.70 mike 2247:
1.1 mike 2248: <dt>U+0020 SPACE</dt>
1.14 mike 2249: <dd>Ignore the character.</dd>
1.1 mike 2250:
2251: <dt>U+003E GREATER-THAN SIGN (>)</dt>
1.14 mike 2252: <dd>Switch to the <a href="#data-state">data state</a>. Emit the current DOCTYPE
2253: token.</dd>
1.1 mike 2254:
2255: <dt>EOF</dt>
1.87 mike 2256: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
2257: state</a>. Set the DOCTYPE token's <i>force-quirks flag</i> to
2258: <i>on</i>. Emit that DOCTYPE token. Reconsume the EOF character.</dd>
1.1 mike 2259:
2260: <dt>Anything else</dt>
2261: <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#bogus-doctype-state">bogus DOCTYPE
2262: state</a>. (This does <em>not</em> set the DOCTYPE token's
2263: <i>force-quirks flag</i> to <i>on</i>.)</dd>
2264:
1.37 mike 2265: </dl><h5 id="bogus-doctype-state"><span class="secno">8.2.4.67 </span><dfn>Bogus DOCTYPE state</dfn></h5>
1.1 mike 2266:
2267: <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
2268:
2269: <dl class="switch"><dt>U+003E GREATER-THAN SIGN (>)</dt>
1.14 mike 2270: <dd>Switch to the <a href="#data-state">data state</a>. Emit the DOCTYPE
2271: token.</dd>
1.1 mike 2272:
2273: <dt>EOF</dt>
1.87 mike 2274: <dd>Switch to the <a href="#data-state">data state</a>. Emit the DOCTYPE token.
2275: Reconsume the EOF character.</dd>
1.1 mike 2276:
2277: <dt>Anything else</dt>
1.14 mike 2278: <dd>Ignore the character.</dd>
1.1 mike 2279:
1.37 mike 2280: </dl><h5 id="cdata-section-state"><span class="secno">8.2.4.68 </span><dfn>CDATA section state</dfn></h5>
1.1 mike 2281:
1.87 mike 2282: <p>Switch to the <a href="#data-state">data state</a>.</p>
2283:
1.1 mike 2284: <p>Consume every character up to the next occurrence of the three
2285: character sequence U+005D RIGHT SQUARE BRACKET U+005D RIGHT SQUARE
2286: BRACKET U+003E GREATER-THAN SIGN (<code title="">]]></code>), or the
2287: end of the file (EOF), whichever comes first. Emit a series of
2288: character tokens consisting of all the characters consumed except
2289: the matching three character sequence at the end (if one was found
1.70 mike 2290: before the end of the file).</p>
1.1 mike 2291:
2292: <p>If the end of the file was reached, reconsume the EOF
2293: character.</p>
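<p>The scan described above can be sketched as follows. This is an illustrative sketch only, not part of the algorithm: the function name <code title="">consume_cdata_section</code>, its arguments, and its return shape are hypothetical, and the input is assumed to be a fully buffered string with <code title="">pos</code> pointing just past the opening of the CDATA section.</p>

```python
def consume_cdata_section(text, pos):
    """Sketch of the CDATA section scan (names and return shape are assumptions).

    Returns (characters, new_pos, hit_eof):
      characters - list of one-character strings to emit as character tokens
      new_pos    - index just past the consumed input
      hit_eof    - True if the end of the input was reached before "]]>"
    """
    end = text.find("]]>", pos)
    if end == -1:
        # No terminator: everything up to EOF is emitted, and the caller
        # is expected to reconsume EOF in the data state.
        return list(text[pos:]), len(text), True
    # The three-character terminator is consumed but not emitted.
    return list(text[pos:end]), end + 3, False
```

<p>Under these assumptions, the caller has already switched to the data state and simply emits the returned characters one token at a time.</p>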
2294:
2295:
2296:
1.37 mike 2297: <h5 id="tokenizing-character-references"><span class="secno">8.2.4.69 </span>Tokenizing character references</h5>
1.1 mike 2298:
2299: <p>This section defines how to <dfn id="consume-a-character-reference">consume a character
2300: reference</dfn>. This definition is used when parsing character
2301: references <a href="#character-reference-in-data-state" title="character reference in data state">in
2302: text</a> and <a href="#character-reference-in-attribute-value-state" title="character reference in attribute value
2303: state">in attributes</a>.</p>
2304:
2305: <p>The behavior depends on the identity of the next character (the
2306: one immediately after the U+0026 AMPERSAND character):</p>
2307:
1.73 mike 2308: <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1 mike 2309: <dt>U+000A LINE FEED (LF)</dt>
2310: <dt>U+000C FORM FEED (FF)</dt>
1.70 mike 2311:
1.1 mike 2312: <dt>U+0020 SPACE</dt>
2313: <dt>U+003C LESS-THAN SIGN</dt>
2314: <dt>U+0026 AMPERSAND</dt>
2315: <dt>EOF</dt>
2316: <dt>The <dfn id="additional-allowed-character">additional allowed character</dfn>, if there is one</dt>
2317:
2318: <dd>Not a character reference. No characters are consumed, and
2319: nothing is returned. (This is not an error, either.)</dd>
2320:
2321:
2322: <dt>U+0023 NUMBER SIGN (#)</dt>
2323:
2324: <dd>
2325:
2326: <p>Consume the U+0023 NUMBER SIGN.</p>
2327:
2328: <p>The behavior further depends on the character after the U+0023
2329: NUMBER SIGN:</p>
2330:
2331: <dl class="switch"><dt>U+0078 LATIN SMALL LETTER X</dt>
2332: <dt>U+0058 LATIN CAPITAL LETTER X</dt>
2333:
2334: <dd>
2335:
2336: <p>Consume the X.</p>
2337:
2338: <p>Follow the steps below, but using the range of characters
2339: U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), U+0061 LATIN
2340: SMALL LETTER A to U+0066 LATIN SMALL LETTER F, and U+0041 LATIN
2341: CAPITAL LETTER A to U+0046 LATIN CAPITAL LETTER F (in other
2342: words, 0-9, A-F, a-f).</p>
2343:
2344: <p>When it comes to interpreting the number, interpret it as a
2345: hexadecimal number.</p>
2346:
2347: </dd>
2348:
2349:
2350: <dt>Anything else</dt>
2351:
2352: <dd>
2353:
2354: <p>Follow the steps below, but using the range of characters
2355: U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9).</p>
2356:
2357: <p>When it comes to interpreting the number, interpret it as a
2358: decimal number.</p>
2359:
2360: </dd>
2361:
2362: </dl><p>Consume as many characters as match the range of characters
2363: given above.</p>
2364:
2365: <p>If no characters match the range, then don't consume any
2366: characters (and unconsume the U+0023 NUMBER SIGN character and, if
2367: appropriate, the X character). This is a <a href="parsing.html#parse-error">parse
2368: error</a>; nothing is returned.</p>
2369:
2370: <p>Otherwise, if the next character is a U+003B SEMICOLON, consume
2371: that too. If it isn't, there is a <a href="parsing.html#parse-error">parse
2372: error</a>.</p>
2373:
2374: <p>If one or more characters match the range, then take them all
2375: and interpret the string of characters as a number (either
2376: hexadecimal or decimal as appropriate).</p>
2377:
2378: <p>If that number is one of the numbers in the first column of the
2379: following table, then this is a <a href="parsing.html#parse-error">parse error</a>. Find the
2380: row with that number in the first column, and return a character
2381: token for the Unicode character given in the second column of that
2382: row.</p>
2383:
1.26 mike 2384: <table id="table-charref-overrides"><thead><tr><th>Number </th><th colspan="2">Unicode character
1.1 mike 2385: </th></tr></thead><tbody><tr><td>0x00 </td><td>U+FFFD </td><td>REPLACEMENT CHARACTER
2386: </td></tr><tr><td>0x0D </td><td>U+000D </td><td>CARRIAGE RETURN (CR)
2387: </td></tr><tr><td>0x80 </td><td>U+20AC </td><td>EURO SIGN (€)
2388: </td></tr><tr><td>0x81 </td><td>U+0081 </td><td><control>
2389: </td></tr><tr><td>0x82 </td><td>U+201A </td><td>SINGLE LOW-9 QUOTATION MARK (‚)
2390: </td></tr><tr><td>0x83 </td><td>U+0192 </td><td>LATIN SMALL LETTER F WITH HOOK (ƒ)
2391: </td></tr><tr><td>0x84 </td><td>U+201E </td><td>DOUBLE LOW-9 QUOTATION MARK („)
2392: </td></tr><tr><td>0x85 </td><td>U+2026 </td><td>HORIZONTAL ELLIPSIS (…)
2393: </td></tr><tr><td>0x86 </td><td>U+2020 </td><td>DAGGER (†)
2394: </td></tr><tr><td>0x87 </td><td>U+2021 </td><td>DOUBLE DAGGER (‡)
2395: </td></tr><tr><td>0x88 </td><td>U+02C6 </td><td>MODIFIER LETTER CIRCUMFLEX ACCENT (ˆ)
2396: </td></tr><tr><td>0x89 </td><td>U+2030 </td><td>PER MILLE SIGN (‰)
2397: </td></tr><tr><td>0x8A </td><td>U+0160 </td><td>LATIN CAPITAL LETTER S WITH CARON (Š)
2398: </td></tr><tr><td>0x8B </td><td>U+2039 </td><td>SINGLE LEFT-POINTING ANGLE QUOTATION MARK (‹)
2399: </td></tr><tr><td>0x8C </td><td>U+0152 </td><td>LATIN CAPITAL LIGATURE OE (Œ)
2400: </td></tr><tr><td>0x8D </td><td>U+008D </td><td><control>
2401: </td></tr><tr><td>0x8E </td><td>U+017D </td><td>LATIN CAPITAL LETTER Z WITH CARON (Ž)
2402: </td></tr><tr><td>0x8F </td><td>U+008F </td><td><control>
2403: </td></tr><tr><td>0x90 </td><td>U+0090 </td><td><control>
2404: </td></tr><tr><td>0x91 </td><td>U+2018 </td><td>LEFT SINGLE QUOTATION MARK (‘)
2405: </td></tr><tr><td>0x92 </td><td>U+2019 </td><td>RIGHT SINGLE QUOTATION MARK (’)
2406: </td></tr><tr><td>0x93 </td><td>U+201C </td><td>LEFT DOUBLE QUOTATION MARK (“)
2407: </td></tr><tr><td>0x94 </td><td>U+201D </td><td>RIGHT DOUBLE QUOTATION MARK (”)
2408: </td></tr><tr><td>0x95 </td><td>U+2022 </td><td>BULLET (•)
2409: </td></tr><tr><td>0x96 </td><td>U+2013 </td><td>EN DASH (–)
2410: </td></tr><tr><td>0x97 </td><td>U+2014 </td><td>EM DASH (—)
2411: </td></tr><tr><td>0x98 </td><td>U+02DC </td><td>SMALL TILDE (˜)
2412: </td></tr><tr><td>0x99 </td><td>U+2122 </td><td>TRADE MARK SIGN (™)
2413: </td></tr><tr><td>0x9A </td><td>U+0161 </td><td>LATIN SMALL LETTER S WITH CARON (š)
2414: </td></tr><tr><td>0x9B </td><td>U+203A </td><td>SINGLE RIGHT-POINTING ANGLE QUOTATION MARK (›)
2415: </td></tr><tr><td>0x9C </td><td>U+0153 </td><td>LATIN SMALL LIGATURE OE (œ)
2416: </td></tr><tr><td>0x9D </td><td>U+009D </td><td><control>
2417: </td></tr><tr><td>0x9E </td><td>U+017E </td><td>LATIN SMALL LETTER Z WITH CARON (ž)
2418: </td></tr><tr><td>0x9F </td><td>U+0178 </td><td>LATIN CAPITAL LETTER Y WITH DIAERESIS (Ÿ)
1.70 mike 2419: </td></tr></tbody></table><p>Otherwise, if the number is in the range 0xD800 to 0xDFFF or is greater than 0x10FFFF, then this is a
1.61 mike 2420: <a href="parsing.html#parse-error">parse error</a>. Return a U+FFFD REPLACEMENT
2421: CHARACTER.</p>
1.1 mike 2422:
2423: <p>Otherwise, return a character token for the Unicode character
2424: whose code point is that number.
2425:
1.90 mike 2426:
2427: Additionally, if the number is in the range 0x0001 to 0x0008, 0x000E to 0x001F, 0x007F to 0x009F, 0xFDD0 to
1.1 mike 2428: 0xFDEF, or is one of 0x000B, 0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF,
2429: 0x2FFFE, 0x2FFFF, 0x3FFFE, 0x3FFFF, 0x4FFFE, 0x4FFFF, 0x5FFFE,
2430: 0x5FFFF, 0x6FFFE, 0x6FFFF, 0x7FFFE, 0x7FFFF, 0x8FFFE, 0x8FFFF,
2431: 0x9FFFE, 0x9FFFF, 0xAFFFE, 0xAFFFF, 0xBFFFE, 0xBFFFF, 0xCFFFE,
2432: 0xCFFFF, 0xDFFFE, 0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE, 0xFFFFF,
2433: 0x10FFFE, or 0x10FFFF, then this is a <a href="parsing.html#parse-error">parse
2434: error</a>.</p>
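<p>The numeric branch above can be sketched as follows. This is a minimal sketch under stated assumptions, not a normative implementation: the names <code title="">consume_numeric_reference</code>, <code title="">OVERRIDES</code>, and <code title="">is_flagged</code> are hypothetical, the input is assumed to be a fully buffered string, <code title="">pos</code> is assumed to index the U+0023 NUMBER SIGN, and a parse error is reported as a boolean rather than through an error handler.</p>

```python
# Mirrors the override table above (number -> replacement code point).
OVERRIDES = {
    0x00: 0xFFFD, 0x0D: 0x000D, 0x80: 0x20AC, 0x81: 0x0081, 0x82: 0x201A,
    0x83: 0x0192, 0x84: 0x201E, 0x85: 0x2026, 0x86: 0x2020, 0x87: 0x2021,
    0x88: 0x02C6, 0x89: 0x2030, 0x8A: 0x0160, 0x8B: 0x2039, 0x8C: 0x0152,
    0x8D: 0x008D, 0x8E: 0x017D, 0x8F: 0x008F, 0x90: 0x0090, 0x91: 0x2018,
    0x92: 0x2019, 0x93: 0x201C, 0x94: 0x201D, 0x95: 0x2022, 0x96: 0x2013,
    0x97: 0x2014, 0x98: 0x02DC, 0x99: 0x2122, 0x9A: 0x0161, 0x9B: 0x203A,
    0x9C: 0x0153, 0x9D: 0x009D, 0x9E: 0x017E, 0x9F: 0x0178,
}

HEX_DIGITS = "0123456789abcdefABCDEF"
DEC_DIGITS = "0123456789"


def is_flagged(number):
    """True for code points that are returned as-is but are a parse error."""
    return (
        0x0001 <= number <= 0x0008
        or 0x000E <= number <= 0x001F
        or 0x007F <= number <= 0x009F
        or 0xFDD0 <= number <= 0xFDEF
        or number == 0x000B
        or (number & 0xFFFE) == 0xFFFE  # U+xFFFE and U+xFFFF in every plane
    )


def consume_numeric_reference(text, pos):
    """Return (character or None, new_pos, parse_error); text[pos] must be '#'."""
    i = pos + 1
    if i < len(text) and text[i] in "xX":
        i += 1
        digits, base = HEX_DIGITS, 16
    else:
        digits, base = DEC_DIGITS, 10
    start = i
    while i < len(text) and text[i] in digits:
        i += 1
    if i == start:
        # No digits: unconsume the '#' (and the X, if any); parse error.
        return None, pos, True
    number = int(text[start:i], base)
    error = False
    if i < len(text) and text[i] == ";":
        i += 1
    else:
        error = True  # missing semicolon
    if number in OVERRIDES:
        return chr(OVERRIDES[number]), i, True
    if 0xD800 <= number <= 0xDFFF or number > 0x10FFFF:
        return "\uFFFD", i, True
    return chr(number), i, error or is_flagged(number)
```

<p>The sketch assumes the caller has already ruled out the characters listed at the start of this section (white space, U+003C, U+0026, EOF, and the additional allowed character).</p>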
2435:
2436: </dd>
2437:
2438:
2439: <dt>Anything else</dt>
2440:
2441: <dd>
2442:
2443: <p>Consume the maximum number of characters possible, with the
2444: consumed characters matching one of the identifiers in the first
2445: column of the <a href="named-character-references.html#named-character-references">named character references</a> table (in a
2446: <a href="infrastructure.html#case-sensitive">case-sensitive</a> manner).</p>
2447:
2448: <p>If no match can be made, then no characters are consumed, and
2449: nothing is returned. In this case, if the characters after the
2450: U+0026 AMPERSAND character (&) consist of a sequence of one or
2451: more characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT
2452: NINE (9), U+0061 LATIN SMALL LETTER A to U+007A LATIN SMALL LETTER
2453: Z, and U+0041 LATIN CAPITAL LETTER A to U+005A LATIN CAPITAL
2454: LETTER Z, followed by a U+003B SEMICOLON character (;), then this
2455: is a <a href="parsing.html#parse-error">parse error</a>.</p>
2456:
2457: <p>If the character reference is being consumed <a href="#character-reference-in-attribute-value-state" title="character reference in attribute value state">as part of an
2458: attribute</a>, and the last character matched is not a U+003B
2459: SEMICOLON character (;), and the next character is either a U+003D
2460: EQUALS SIGN character (=) or in the range U+0030 DIGIT ZERO (0) to
2461: U+0039 DIGIT NINE (9), U+0041 LATIN CAPITAL LETTER A to U+005A
2462: LATIN CAPITAL LETTER Z, or U+0061 LATIN SMALL LETTER A to U+007A
2463: LATIN SMALL LETTER Z, then, for historical reasons, all the
2464: characters that were matched after the U+0026 AMPERSAND character
2465: (&) must be unconsumed, and nothing is returned.</p>
1.70 mike 2466:
1.1 mike 2467:
2468: <p>Otherwise, a character reference is parsed. If the last
2469: character matched is not a U+003B SEMICOLON character (;), there
2470: is a <a href="parsing.html#parse-error">parse error</a>.</p>
2471:
1.41 mike 2472: <p>Return one or two character tokens for the character(s)
2473: corresponding to the character reference name (as given by the
2474: second column of the <a href="named-character-references.html#named-character-references">named character references</a>
2475: table).</p>
1.1 mike 2476:
2477: <div class="example">
2478:
2479: <p>If the markup contains (not in an attribute) the string <code title="">I'm &notit; I tell you</code>, the character
2480: reference is parsed as "not", as in, <code title="">I'm ¬it;
2481: I tell you</code> (and this is a parse error). But if the markup
2482: was <code title="">I'm &notin; I tell you</code>, the
2483: character reference would be parsed as "notin;", resulting in
2484: <code title="">I'm ∉ I tell you</code> (and no parse
2485: error).</p>
2486:
2487: </div>
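<p>The named branch above can be sketched as follows. This is an illustrative sketch only: <code title="">NAMED</code> is a five-entry excerpt standing in for the full named character references table, the function name and return shape are hypothetical, and <code title="">pos</code> is assumed to index the first character after the U+0026 AMPERSAND in a fully buffered string.</p>

```python
import re

# Tiny excerpt for illustration; the real table is far larger.
NAMED = {
    "amp": "&", "amp;": "&",
    "not": "\u00ac", "not;": "\u00ac",
    "notin;": "\u2209",
}

ALNUM_THEN_SEMICOLON = re.compile(r"[0-9A-Za-z]+;")


def consume_named_reference(text, pos, in_attribute=False):
    """Return (characters or None, new_pos, parse_error)."""
    longest = max(len(name) for name in NAMED)
    match = None
    # Try the longest candidate first so that "notin;" beats "not".
    for length in range(min(longest, len(text) - pos), 0, -1):
        candidate = text[pos:pos + length]
        if candidate in NAMED:  # case-sensitive lookup
            match = candidate
            break
    if match is None:
        # Nothing consumed; a parse error only for "&" + alphanumerics + ";".
        return None, pos, ALNUM_THEN_SEMICOLON.match(text, pos) is not None
    end = pos + len(match)
    if in_attribute and not match.endswith(";") and end < len(text):
        nxt = text[end]
        if nxt == "=" or (nxt.isascii() and nxt.isalnum()):
            # Historical rule: unconsume everything, return nothing.
            return None, pos, False
    return NAMED[match], end, not match.endswith(";")
```

<p>With this excerpt, the two strings from the example above give "not" with a parse error and "notin;" without one, while the same "notit;" input inside an attribute value is left untouched.</p>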
2488:
2489: </dd>
2490:
2491: </dl></div></body></html>