Annotation of html5/spec/tokenization.html, revision 1.112

1.77      mike        1: <!DOCTYPE html>
1.88      mike        2: <html lang="en-US-x-Hixie"><head><title>8.2.4 Tokenization &#8212; HTML5</title><style type="text/css">
1.1       mike        3:    pre { margin-left: 2em; white-space: pre-wrap; }
                      4:    h2 { margin: 3em 0 1em 0; }
                      5:    h3 { margin: 2.5em 0 1em 0; }
                      6:    h4 { margin: 2.5em 0 0.75em 0; }
                      7:    h5, h6 { margin: 2.5em 0 1em; }
                      8:    h1 + h2, h1 + h2 + h2 { margin: 0.75em 0 0.75em; }
                      9:    h2 + h3, h3 + h4, h4 + h5, h5 + h6 { margin-top: 0.5em; }
                     10:    p { margin: 1em 0; }
                     11:    hr:not(.top) { display: block; background: none; border: none; padding: 0; margin: 2em 0; height: auto; }
                     12:    dl, dd { margin-top: 0; margin-bottom: 0; }
                     13:    dt { margin-top: 0.75em; margin-bottom: 0.25em; clear: left; }
                     14:    dt + dt { margin-top: 0; }
                     15:    dd dt { margin-top: 0.25em; margin-bottom: 0; }
                     16:    dd p { margin-top: 0; }
                     17:    dd dl + p { margin-top: 1em; }
                     18:    dd table + p { margin-top: 1em; }
                     19:    p + * > li, dd li { margin: 1em 0; }
                     20:    dt, dfn { font-weight: bold; font-style: normal; }
1.80      mike       21:    i, em { font-style: italic; }
1.1       mike       22:    dt dfn { font-style: italic; }
                     23:    pre, code { font-size: inherit; font-family: monospace; font-variant: normal; }
                     24:    pre strong { color: black; font: inherit; font-weight: bold; background: yellow; }
                     25:    pre em { font-weight: bolder; font-style: normal; }
                     26:    @media screen { code { color: orangered; } code :link, code :visited { color: inherit; } }
                     27:    var sub { vertical-align: bottom; font-size: smaller; position: relative; top: 0.1em; }
                     28:    table { border-collapse: collapse; border-style: hidden hidden none hidden; }
                     29:    table thead, table tbody { border-bottom: solid; }
                     30:    table tbody th:first-child { border-left: solid; }
                     31:    table tbody th { text-align: left; }
                     32:    table td, table th { border-left: solid; border-right: solid; border-bottom: solid thin; vertical-align: top; padding: 0.2em; }
                     33:    blockquote { margin: 0 0 0 2em; border: 0; padding: 0; font-style: italic; }
                     34: 
                     35:    .bad, .bad *:not(.XXX) { color: gray; border-color: gray; background: transparent; }
                     36:    .matrix, .matrix td { border: none; text-align: right; }
                     37:    .matrix { margin-left: 2em; }
                     38:    .dice-example { border-collapse: collapse; border-style: hidden solid solid hidden; border-width: thin; margin-left: 3em; }
                     39:    .dice-example caption { width: 30em; font-size: smaller; font-style: italic; padding: 0.75em 0; text-align: left; }
                     40:    .dice-example td, .dice-example th { border: solid thin; width: 1.35em; height: 1.05em; text-align: center; padding: 0; }
                     41: 
                     42:    .toc dfn, h1 dfn, h2 dfn, h3 dfn, h4 dfn, h5 dfn, h6 dfn { font: inherit; }
1.83      mike       43:    img.extra, p.overview { float: right; }
1.82      mike       44:    pre.idl { border: solid thin; background: #EEEEEE; color: black; padding: 0.5em 1em; position: relative; }
1.1       mike       45:    pre.idl :link, pre.idl :visited { color: inherit; background: transparent; }
1.82      mike       46:    pre.idl::before { content: "IDL"; font: bold small sans-serif; padding: 0.5em; background: white; position: absolute; top: 0; margin: -1px 0 0 -4em; width: 1.5em; border: thin solid; border-radius: 0 0 0 0.5em }
1.1       mike       47:    pre.css { border: solid thin; background: #FFFFEE; color: black; padding: 0.5em 1em; }
                     48:    pre.css:first-line { color: #AAAA50; }
                     49:    dl.domintro { color: green; margin: 2em 0 2em 2em; padding: 0.5em 1em; border: none; background: #DDFFDD; }
                     50:    hr + dl.domintro, div.impl + dl.domintro { margin-top: 2.5em; margin-bottom: 1.5em; }
                     51:    dl.domintro dt, dl.domintro dt * { color: black; text-decoration: none; }
                     52:    dl.domintro dd { margin: 0.5em 0 1em 2em; padding: 0; }
                     53:    dl.domintro dd p { margin: 0.5em 0; }
1.84      mike       54:    dl.domintro:before { display: table; margin: -1em -0.5em -0.5em auto; width: auto; content: 'This box is non-normative. Implementation requirements are given below this box.'; color: black; font-style: italic; border: solid 2px; background: white; padding: 0 0.25em; }
1.1       mike       55:    dl.switch { padding-left: 2em; }
                     56:    dl.switch > dt { text-indent: -1.5em; }
                     57:    dl.switch > dt:before { content: '\21AA'; padding: 0 0.5em 0 0; display: inline-block; width: 1em; text-align: right; line-height: 0.5em; }
                     58:    dl.triple { padding: 0 0 0 1em; }
                     59:    dl.triple dt, dl.triple dd { margin: 0; display: inline }
                     60:    dl.triple dt:after { content: ':'; }
                     61:    dl.triple dd:after { content: '\A'; white-space: pre; }
                     62:    .diff-old { text-decoration: line-through; color: silver; background: transparent; }
                     63:    .diff-chg, .diff-new { text-decoration: underline; color: green; background: transparent; }
                     64:    a .diff-new { border-bottom: 1px blue solid; }
                     65: 
                     66:    h2 { page-break-before: always; }
                     67:    h1, h2, h3, h4, h5, h6 { page-break-after: avoid; }
                     68:    h1 + h2, hr + h2.no-toc { page-break-before: auto; }
                     69: 
1.44      mike       70:    p  > span:not([title=""]):not([class="XXX"]):not([class="impl"]):not([class="note"]),
                     71:    li > span:not([title=""]):not([class="XXX"]):not([class="impl"]):not([class="note"]) { border-bottom: solid #9999CC; }
1.1       mike       72: 
                     73:    div.head { margin: 0 0 1em; padding: 1em 0 0 0; }
                     74:    div.head p { margin: 0; }
                     75:    div.head h1 { margin: 0; }
                     76:    div.head .logo { float: right; margin: 0 1em; }
                     77:    div.head .logo img { border: none } /* remove border from top image */
                     78:    div.head dl { margin: 1em 0; }
                     79:    div.head p.copyright, div.head p.alt { font-size: x-small; font-style: oblique; margin: 0; }
                     80: 
                     81:    body > .toc > li { margin-top: 1em; margin-bottom: 1em; }
                     82:    body > .toc.brief > li { margin-top: 0.35em; margin-bottom: 0.35em; }
                     83:    body > .toc > li > * { margin-bottom: 0.5em; }
                     84:    body > .toc > li > * > li > * { margin-bottom: 0.25em; }
                     85:    .toc, .toc li { list-style: none; }
                     86: 
                     87:    .brief { margin-top: 1em; margin-bottom: 1em; line-height: 1.1; }
                     88:    .brief li { margin: 0; padding: 0; }
                     89:    .brief li p { margin: 0; padding: 0; }
                     90: 
                     91:    .category-list { margin-top: -0.75em; margin-bottom: 1em; line-height: 1.5; }
                     92:    .category-list::before { content: '\21D2\A0'; font-size: 1.2em; font-weight: 900; }
                     93:    .category-list li { display: inline; }
                     94:    .category-list li:not(:last-child)::after { content: ', '; }
                     95:    .category-list li > span, .category-list li > a { text-transform: lowercase; }
                     96:    .category-list li * { text-transform: none; } /* don't affect <code> nested in <a> */
                     97: 
                     98:    .XXX { color: #E50000; background: white; border: solid red; padding: 0.5em; margin: 1em 0; }
                     99:    .XXX > :first-child { margin-top: 0; }
                    100:    p .XXX { line-height: 3em; }
                    101:    .annotation { border: solid thin black; background: #0C479D; color: white; position: relative; margin: 8px 0 20px 0; }
                    102:    .annotation:before { position: absolute; left: 0; top: 0; width: 100%; height: 100%; margin: 6px -6px -6px 6px; background: #333333; z-index: -1; content: ''; }
                    103:    .annotation :link, .annotation :visited { color: inherit; }
                    104:    .annotation :link:hover, .annotation :visited:hover { background: transparent; }
                    105:    .annotation span { border: none ! important; }
                    106:    .note { color: green; background: transparent; font-family: sans-serif; }
                    107:    .warning { color: red; background: transparent; }
                    108:    .note, .warning { font-weight: bolder; font-style: italic; }
1.80      mike      109:    .note em, .warning em, .note i, .warning i { font-style: normal; }
1.1       mike      110:    p.note, div.note { padding: 0.5em 2em; }
                    111:    span.note { padding: 0 2em; }
                    112:    .note p:first-child, .warning p:first-child { margin-top: 0; }
                    113:    .note p:last-child, .warning p:last-child { margin-bottom: 0; }
                    114:    .warning:before { font-style: normal; }
                    115:    p.note:before { content: 'Note: '; }
                    116:    p.warning:before { content: '\26A0 Warning! '; }
                    117: 
                    118:    .bookkeeping:before { display: block; content: 'Bookkeeping details'; font-weight: bolder; font-style: italic; }
                    119:    .bookkeeping { font-size: 0.8em; margin: 2em 0; }
                    120:    .bookkeeping p { margin: 0.5em 2em; display: list-item; list-style: square; }
1.19      mike      121:    .bookkeeping dt { margin: 0.5em 2em 0; }
                    122:    .bookkeeping dd { margin: 0 3em 0.5em; }
1.1       mike      123: 
                    124:    h4 { position: relative; z-index: 3; }
                    125:    h4 + .element, h4 + div + .element { margin-top: -2.5em; padding-top: 2em; }
                    126:    .element {
                    127:      background: #EEEEFF;
                    128:      color: black;
                    129:      margin: 0 0 1em 0.15em;
                    130:      padding: 0 1em 0.25em 0.75em;
                    131:      border-left: solid #9999FF 0.25em;
                    132:      position: relative;
                    133:      z-index: 1;
                    134:    }
                    135:    .element:before {
                    136:      position: absolute;
                    137:      z-index: 2;
                    138:      top: 0;
                    139:      left: -1.15em;
                    140:      height: 2em;
                    141:      width: 0.9em;
                    142:      background: #EEEEFF;
                    143:      content: ' ';
                    144:      border-style: none none solid solid;
                    145:      border-color: #9999FF;
                    146:      border-width: 0.25em;
                    147:    }
                    148: 
                    149:    .example { display: block; color: #222222; background: #FCFCFC; border-left: double; margin-left: 2em; padding-left: 1em; }
                    150:    td > .example:only-child { margin: 0 0 0 0.1em; }
                    151: 
                    152:    ul.domTree, ul.domTree ul { padding: 0 0 0 1em; margin: 0; }
                    153:    ul.domTree li { padding: 0; margin: 0; list-style: none; position: relative; }
                    154:    ul.domTree li li { list-style: none; }
                    155:    ul.domTree li:first-child::before { position: absolute; top: 0; height: 0.6em; left: -0.75em; width: 0.5em; border-style: none none solid solid; content: ''; border-width: 0.1em; }
                    156:    ul.domTree li:not(:last-child)::after { position: absolute; top: 0; bottom: -0.6em; left: -0.75em; width: 0.5em; border-style: none none solid solid; content: ''; border-width: 0.1em; }
                    157:    ul.domTree span { font-style: italic; font-family: serif; }
                    158:    ul.domTree .t1 code { color: purple; font-weight: bold; }
                    159:    ul.domTree .t2 { font-style: normal; font-family: monospace; }
                    160:    ul.domTree .t2 .name { color: black; font-weight: bold; }
                    161:    ul.domTree .t2 .value { color: blue; font-weight: normal; }
                    162:    ul.domTree .t3 code, .domTree .t4 code, .domTree .t5 code { color: gray; }
                    163:    ul.domTree .t7 code, .domTree .t8 code { color: green; }
                    164:    ul.domTree .t10 code { color: teal; }
                    165: 
                    166:    body.dfnEnabled dfn { cursor: pointer; }
                    167:    .dfnPanel {
                    168:      display: inline;
                    169:      position: absolute;
                    170:      z-index: 10;
                    171:      height: auto;
                    172:      width: auto;
                    173:      padding: 0.5em 0.75em;
                    174:      font: small sans-serif, Droid Sans Fallback;
                    175:      background: #DDDDDD;
                    176:      color: black;
                    177:      border: outset 0.2em;
                    178:    }
                    179:    .dfnPanel * { margin: 0; padding: 0; font: inherit; text-indent: 0; }
                    180:    .dfnPanel :link, .dfnPanel :visited { color: black; }
                    181:    .dfnPanel p { font-weight: bolder; }
                    182:    .dfnPanel * + p { margin-top: 0.25em; }
                    183:    .dfnPanel li { list-style-position: inside; }
                    184: 
                    185:    #configUI { position: absolute; z-index: 20; top: 10em; right: 1em; width: 11em; font-size: small; }
                    186:    #configUI p { margin: 0.5em 0; padding: 0.3em; background: #EEEEEE; color: black; border: inset thin; }
                    187:    #configUI p label { display: block; }
                    188:    #configUI #updateUI, #configUI .loginUI { text-align: center; }
                    189:    #configUI input[type=button] { display: block; margin: auto; }
1.17      mike      190: 
1.51      mike      191:    fieldset { margin: 1em; padding: 0.5em 1em; }
                    192:    fieldset > legend + * { margin-top: 0; }
1.43      mike      193:    fieldset > :last-child { margin-bottom: 0; }
1.51      mike      194:    fieldset p { margin: 0.5em 0; }
                    195: 
1.78      mike      196:   </style><link href="http://www.w3.org/StyleSheets/TR/W3C-ED" rel="stylesheet" type="text/css"><style type="text/css">
1.1       mike      197: 
                    198:    .applies thead th > * { display: block; }
                    199:    .applies thead code { display: block; }
                    200:    .applies tbody th { white-space: nowrap; }
                    201:    .applies td { text-align: center; }
                    202:    .applies .yes { background: yellow; }
                    203: 
1.20      mike      204:    .matrix, .matrix td { border: hidden; text-align: right; }
1.1       mike      205:    .matrix { margin-left: 2em; }
                    206: 
                    207:    .dice-example { border-collapse: collapse; border-style: hidden solid solid hidden; border-width: thin; margin-left: 3em; }
                    208:    .dice-example caption { width: 30em; font-size: smaller; font-style: italic; padding: 0.75em 0; text-align: left; }
                    209:    .dice-example td, .dice-example th { border: solid thin; width: 1.35em; height: 1.05em; text-align: center; padding: 0; }
                    210: 
1.32      mike      211:    td.eg { border-width: thin; text-align: center; }
                    212: 
1.1       mike      213:    #table-example-1 { border: solid thin; border-collapse: collapse; margin-left: 3em; }
                    214:    #table-example-1 * { font-family: "Essays1743", serif; line-height: 1.01em; }
                    215:    #table-example-1 caption { padding-bottom: 0.5em; }
                    216:    #table-example-1 thead, #table-example-1 tbody { border: none; }
                    217:    #table-example-1 th, #table-example-1 td { border: solid thin; }
                    218:    #table-example-1 th { font-weight: normal; }
                    219:    #table-example-1 td { border-style: none solid; vertical-align: top; }
                    220:    #table-example-1 th { padding: 0.5em; vertical-align: middle; text-align: center; }
                    221:    #table-example-1 tbody tr:first-child td { padding-top: 0.5em; }
                    222:    #table-example-1 tbody tr:last-child td { padding-bottom: 1.5em; }
                    223:    #table-example-1 tbody td:first-child { padding-left: 2.5em; padding-right: 0; width: 9em; }
                    224:    #table-example-1 tbody td:first-child::after { content: leader(". "); }
                    225:    #table-example-1 tbody td { padding-left: 2em; padding-right: 2em; }
                    226:    #table-example-1 tbody td:first-child + td { width: 10em; }
                    227:    #table-example-1 tbody td:first-child + td ~ td { width: 2.5em; }
                    228:    #table-example-1 tbody td:first-child + td + td + td ~ td { width: 1.25em; }
                    229: 
                    230:    .apple-table-examples { border: none; border-collapse: separate; border-spacing: 1.5em 0em; width: 40em; margin-left: 3em; }
                    231:    .apple-table-examples * { font-family: "Times", serif; }
                    232:    .apple-table-examples td, .apple-table-examples th { border: none; white-space: nowrap; padding-top: 0; padding-bottom: 0; }
                    233:    .apple-table-examples tbody th:first-child { border-left: none; width: 100%; }
                    234:    .apple-table-examples thead th:first-child ~ th { font-size: smaller; font-weight: bolder; border-bottom: solid 2px; text-align: center; }
                    235:    .apple-table-examples tbody th::after, .apple-table-examples tfoot th::after { content: leader(". ") }
                    236:    .apple-table-examples tbody th, .apple-table-examples tfoot th { font: inherit; text-align: left; }
                    237:    .apple-table-examples td { text-align: right; vertical-align: top; }
                    238:    .apple-table-examples.e1 tbody tr:last-child td { border-bottom: solid 1px; }
                    239:    .apple-table-examples.e1 tbody + tbody tr:last-child td { border-bottom: double 3px; }
                    240:    .apple-table-examples.e2 th[scope=row] { padding-left: 1em; }
                    241:    .apple-table-examples sup { line-height: 0; }
                    242: 
                    243:    .details-example img { vertical-align: top; }
                    244: 
1.60      mike      245:    #base64-table {
                    246:      white-space: nowrap;
                    247:      font-size: 0.6em;
                    248:      column-width: 6em;
                    249:      column-count: 5;
                    250:      column-gap: 1em;
                    251:      -moz-column-width: 6em;
                    252:      -moz-column-count: 5;
                    253:      -moz-column-gap: 1em;
                    254:      -webkit-column-width: 6em;
                    255:      -webkit-column-count: 5;
                    256:      -webkit-column-gap: 1em;
                    257:    }
                    258:    #base64-table thead { display: none; }
                    259:    #base64-table * { border: none; }
                    260:    #base64-table tbody td:first-child:after { content: ':'; }
                    261:    #base64-table tbody td:last-child { text-align: right; }
                    262: 
1.1       mike      263:    #named-character-references-table {
1.41      mike      264:      white-space: nowrap;
1.1       mike      265:      font-size: 0.6em;
1.41      mike      266:      column-width: 30em;
1.1       mike      267:      column-gap: 1em;
1.41      mike      268:      -moz-column-width: 30em;
1.1       mike      269:      -moz-column-gap: 1em;
1.41      mike      270:      -webkit-column-width: 30em;
1.1       mike      271:      -webkit-column-gap: 1em;
                    272:    }
1.41      mike      273:    #named-character-references-table > table > tbody > tr > td:first-child + td,
1.1       mike      274:    #named-character-references-table > table > tbody > tr > td:last-child { text-align: center; }
                    275:    #named-character-references-table > table > tbody > tr > td:last-child:hover > span { position: absolute; top: auto; left: auto; margin-left: 0.5em; line-height: 1.2; font-size: 5em; border: outset; padding: 0.25em 0.5em; background: white; width: 1.25em; height: auto; text-align: center; }
1.41      mike      276:    #named-character-references-table > table > tbody > tr#entity-CounterClockwiseContourIntegral > td:first-child { font-size: 0.5em; }
1.1       mike      277: 
1.2       mike      278:    .glyph.control { color: red; }
                    279: 
1.4       mike      280:    @font-face {
                    281:      font-family: 'Essays1743';
                    282:      src: url('http://www.whatwg.org/specs/web-apps/current-work/fonts/Essays1743.ttf');
                    283:    }
                    284:    @font-face {
                    285:      font-family: 'Essays1743';
                    286:      font-weight: bold;
                    287:      src: url('http://www.whatwg.org/specs/web-apps/current-work/fonts/Essays1743-Bold.ttf');
                    288:    }
                    289:    @font-face {
                    290:      font-family: 'Essays1743';
                    291:      font-style: italic;
                    292:      src: url('http://www.whatwg.org/specs/web-apps/current-work/fonts/Essays1743-Italic.ttf');
                    293:    }
                    294:    @font-face {
                    295:      font-family: 'Essays1743';
                    296:      font-style: italic;
                    297:      font-weight: bold;
                    298:      src: url('http://www.whatwg.org/specs/web-apps/current-work/fonts/Essays1743-BoldItalic.ttf');
                    299:    }
                    300: 
1.77      mike      301:   </style><link href="data:text/css," id="complete" rel="stylesheet" title="Complete specification"><link href="data:text/css,.impl%20%7B%20display:%20none;%20%7D%0Ahtml%20%7B%20border:%20solid%20yellow;%20%7D%20.domintro:before%20%7B%20display:%20none;%20%7D" id="author" rel="alternate stylesheet" title="Author documentation only"><link href="data:text/css,.impl%20%7B%20background:%20%23FFEEEE;%20%7D%20.domintro:before%20%7B%20background:%20%23FFEEEE;%20%7D" id="highlight" rel="alternate stylesheet" title="Highlight implementation requirements"><script type="text/javascript">
1.68      mike      302:    function getCookie(name) {
                    303:      var params = location.search.substr(1).split("&");
                    304:      for (var index = 0; index < params.length; index++) {
                    305:        if (params[index] == name)
                    306:          return "1";
                    307:        var data = params[index].split("=");
                    308:        if (data[0] == name)
                    309:          return unescape(data[1]);
                    310:      }
                    311:      var cookies = document.cookie.split("; ");
                    312:      for (var index = 0; index < cookies.length; index++) {
                    313:        var data = cookies[index].split("=");
                    314:        if (data[0] == name)
                    315:          return unescape(data[1]);
                    316:      }
                    317:      return null;
                    318:    }
                    319:   </script>
1.1       mike      320:   <script src="link-fixup.js"></script>
1.88      mike      321:   <link href="parsing.html" title="8.2 Parsing HTML documents" rel="prev">
                    322:   <link href="index.html#contents" title="Table of contents" rel="contents">
1.70      mike      323:   <link href="tree-construction.html" title="8.2.5 Tree construction" rel="next">
1.100     mike      324:   </head><body onload="fixBrokenLink();" class="split chapter"><div class="head" id="head">
1.1       mike      325:    <p><a href="http://www.w3.org/"><img alt="W3C" height="48" src="http://www.w3.org/Icons/w3c_home" width="72"></a></p>
1.3       mike      326: 
1.1       mike      327:    <h1>HTML5</h1>
1.88      mike      328:    <h2 class="no-num no-toc" id="a-vocabulary-and-associated-apis-for-html-and-xhtml">A vocabulary and associated APIs for HTML and XHTML</h2>
1.112   ! mike      329:    <h2 class="no-num no-toc" id="editor-s-draft-31-january-2012">Editor's Draft 31 January 2012</h2>
1.88      mike      330:    </div><nav class="prev_next">
                    331:    <a href="parsing.html">&#8592; 8.2 Parsing HTML documents</a> &#8211;
                    332:    <a href="index.html#contents">Table of contents</a> &#8211;
                    333:    <a href="tree-construction.html">8.2.5 Tree construction &#8594;</a>
1.1       mike      334:   <ol class="toc"><li><ol><li><ol><li><a href="tokenization.html#tokenization"><span class="secno">8.2.4 </span>Tokenization</a>
1.88      mike      335:       <ol><li><a href="tokenization.html#data-state"><span class="secno">8.2.4.1 </span>Data state</a></li><li><a href="tokenization.html#character-reference-in-data-state"><span class="secno">8.2.4.2 </span>Character reference in data state</a></li><li><a href="tokenization.html#rcdata-state"><span class="secno">8.2.4.3 </span>RCDATA state</a></li><li><a href="tokenization.html#character-reference-in-rcdata-state"><span class="secno">8.2.4.4 </span>Character reference in RCDATA state</a></li><li><a href="tokenization.html#rawtext-state"><span class="secno">8.2.4.5 </span>RAWTEXT state</a></li><li><a href="tokenization.html#script-data-state"><span class="secno">8.2.4.6 </span>Script data state</a></li><li><a href="tokenization.html#plaintext-state"><span class="secno">8.2.4.7 </span>PLAINTEXT state</a></li><li><a href="tokenization.html#tag-open-state"><span class="secno">8.2.4.8 </span>Tag open state</a></li><li><a href="tokenization.html#end-tag-open-state"><span class="secno">8.2.4.9 </span>End tag open state</a></li><li><a href="tokenization.html#tag-name-state"><span class="secno">8.2.4.10 </span>Tag name state</a></li><li><a href="tokenization.html#rcdata-less-than-sign-state"><span class="secno">8.2.4.11 </span>RCDATA less-than sign state</a></li><li><a href="tokenization.html#rcdata-end-tag-open-state"><span class="secno">8.2.4.12 </span>RCDATA end tag open state</a></li><li><a href="tokenization.html#rcdata-end-tag-name-state"><span class="secno">8.2.4.13 </span>RCDATA end tag name state</a></li><li><a href="tokenization.html#rawtext-less-than-sign-state"><span class="secno">8.2.4.14 </span>RAWTEXT less-than sign state</a></li><li><a href="tokenization.html#rawtext-end-tag-open-state"><span class="secno">8.2.4.15 </span>RAWTEXT end tag open state</a></li><li><a href="tokenization.html#rawtext-end-tag-name-state"><span class="secno">8.2.4.16 </span>RAWTEXT end tag name state</a></li><li><a 
href="tokenization.html#script-data-less-than-sign-state"><span class="secno">8.2.4.17 </span>Script data less-than sign state</a></li><li><a href="tokenization.html#script-data-end-tag-open-state"><span class="secno">8.2.4.18 </span>Script data end tag open state</a></li><li><a href="tokenization.html#script-data-end-tag-name-state"><span class="secno">8.2.4.19 </span>Script data end tag name state</a></li><li><a href="tokenization.html#script-data-escape-start-state"><span class="secno">8.2.4.20 </span>Script data escape start state</a></li><li><a href="tokenization.html#script-data-escape-start-dash-state"><span class="secno">8.2.4.21 </span>Script data escape start dash state</a></li><li><a href="tokenization.html#script-data-escaped-state"><span class="secno">8.2.4.22 </span>Script data escaped state</a></li><li><a href="tokenization.html#script-data-escaped-dash-state"><span class="secno">8.2.4.23 </span>Script data escaped dash state</a></li><li><a href="tokenization.html#script-data-escaped-dash-dash-state"><span class="secno">8.2.4.24 </span>Script data escaped dash dash state</a></li><li><a href="tokenization.html#script-data-escaped-less-than-sign-state"><span class="secno">8.2.4.25 </span>Script data escaped less-than sign state</a></li><li><a href="tokenization.html#script-data-escaped-end-tag-open-state"><span class="secno">8.2.4.26 </span>Script data escaped end tag open state</a></li><li><a href="tokenization.html#script-data-escaped-end-tag-name-state"><span class="secno">8.2.4.27 </span>Script data escaped end tag name state</a></li><li><a href="tokenization.html#script-data-double-escape-start-state"><span class="secno">8.2.4.28 </span>Script data double escape start state</a></li><li><a href="tokenization.html#script-data-double-escaped-state"><span class="secno">8.2.4.29 </span>Script data double escaped state</a></li><li><a href="tokenization.html#script-data-double-escaped-dash-state"><span class="secno">8.2.4.30 </span>Script data double 
escaped dash state</a></li><li><a href="tokenization.html#script-data-double-escaped-dash-dash-state"><span class="secno">8.2.4.31 </span>Script data double escaped dash dash state</a></li><li><a href="tokenization.html#script-data-double-escaped-less-than-sign-state"><span class="secno">8.2.4.32 </span>Script data double escaped less-than sign state</a></li><li><a href="tokenization.html#script-data-double-escape-end-state"><span class="secno">8.2.4.33 </span>Script data double escape end state</a></li><li><a href="tokenization.html#before-attribute-name-state"><span class="secno">8.2.4.34 </span>Before attribute name state</a></li><li><a href="tokenization.html#attribute-name-state"><span class="secno">8.2.4.35 </span>Attribute name state</a></li><li><a href="tokenization.html#after-attribute-name-state"><span class="secno">8.2.4.36 </span>After attribute name state</a></li><li><a href="tokenization.html#before-attribute-value-state"><span class="secno">8.2.4.37 </span>Before attribute value state</a></li><li><a href="tokenization.html#attribute-value-double-quoted-state"><span class="secno">8.2.4.38 </span>Attribute value (double-quoted) state</a></li><li><a href="tokenization.html#attribute-value-single-quoted-state"><span class="secno">8.2.4.39 </span>Attribute value (single-quoted) state</a></li><li><a href="tokenization.html#attribute-value-unquoted-state"><span class="secno">8.2.4.40 </span>Attribute value (unquoted) state</a></li><li><a href="tokenization.html#character-reference-in-attribute-value-state"><span class="secno">8.2.4.41 </span>Character reference in attribute value state</a></li><li><a href="tokenization.html#after-attribute-value-quoted-state"><span class="secno">8.2.4.42 </span>After attribute value (quoted) state</a></li><li><a href="tokenization.html#self-closing-start-tag-state"><span class="secno">8.2.4.43 </span>Self-closing start tag state</a></li><li><a href="tokenization.html#bogus-comment-state"><span class="secno">8.2.4.44 
</span>Bogus comment state</a></li><li><a href="tokenization.html#markup-declaration-open-state"><span class="secno">8.2.4.45 </span>Markup declaration open state</a></li><li><a href="tokenization.html#comment-start-state"><span class="secno">8.2.4.46 </span>Comment start state</a></li><li><a href="tokenization.html#comment-start-dash-state"><span class="secno">8.2.4.47 </span>Comment start dash state</a></li><li><a href="tokenization.html#comment-state"><span class="secno">8.2.4.48 </span>Comment state</a></li><li><a href="tokenization.html#comment-end-dash-state"><span class="secno">8.2.4.49 </span>Comment end dash state</a></li><li><a href="tokenization.html#comment-end-state"><span class="secno">8.2.4.50 </span>Comment end state</a></li><li><a href="tokenization.html#comment-end-bang-state"><span class="secno">8.2.4.51 </span>Comment end bang state</a></li><li><a href="tokenization.html#doctype-state"><span class="secno">8.2.4.52 </span>DOCTYPE state</a></li><li><a href="tokenization.html#before-doctype-name-state"><span class="secno">8.2.4.53 </span>Before DOCTYPE name state</a></li><li><a href="tokenization.html#doctype-name-state"><span class="secno">8.2.4.54 </span>DOCTYPE name state</a></li><li><a href="tokenization.html#after-doctype-name-state"><span class="secno">8.2.4.55 </span>After DOCTYPE name state</a></li><li><a href="tokenization.html#after-doctype-public-keyword-state"><span class="secno">8.2.4.56 </span>After DOCTYPE public keyword state</a></li><li><a href="tokenization.html#before-doctype-public-identifier-state"><span class="secno">8.2.4.57 </span>Before DOCTYPE public identifier state</a></li><li><a href="tokenization.html#doctype-public-identifier-double-quoted-state"><span class="secno">8.2.4.58 </span>DOCTYPE public identifier (double-quoted) state</a></li><li><a href="tokenization.html#doctype-public-identifier-single-quoted-state"><span class="secno">8.2.4.59 </span>DOCTYPE public identifier (single-quoted) state</a></li><li><a 
href="tokenization.html#after-doctype-public-identifier-state"><span class="secno">8.2.4.60 </span>After DOCTYPE public identifier state</a></li><li><a href="tokenization.html#between-doctype-public-and-system-identifiers-state"><span class="secno">8.2.4.61 </span>Between DOCTYPE public and system identifiers state</a></li><li><a href="tokenization.html#after-doctype-system-keyword-state"><span class="secno">8.2.4.62 </span>After DOCTYPE system keyword state</a></li><li><a href="tokenization.html#before-doctype-system-identifier-state"><span class="secno">8.2.4.63 </span>Before DOCTYPE system identifier state</a></li><li><a href="tokenization.html#doctype-system-identifier-double-quoted-state"><span class="secno">8.2.4.64 </span>DOCTYPE system identifier (double-quoted) state</a></li><li><a href="tokenization.html#doctype-system-identifier-single-quoted-state"><span class="secno">8.2.4.65 </span>DOCTYPE system identifier (single-quoted) state</a></li><li><a href="tokenization.html#after-doctype-system-identifier-state"><span class="secno">8.2.4.66 </span>After DOCTYPE system identifier state</a></li><li><a href="tokenization.html#bogus-doctype-state"><span class="secno">8.2.4.67 </span>Bogus DOCTYPE state</a></li><li><a href="tokenization.html#cdata-section-state"><span class="secno">8.2.4.68 </span>CDATA section state</a></li><li><a href="tokenization.html#tokenizing-character-references"><span class="secno">8.2.4.69 </span>Tokenizing character references</a></li></ol></li></ol></li></ol></li></ol></nav>
1.1       mike      336: 
                    337:   <div class="impl">
                    338: 
1.29      mike      339:   <h4 id="tokenization"><span class="secno">8.2.4 </span><dfn>Tokenization</dfn></h4>
1.1       mike      340: 
                    341:   <p>Implementations must act as if they used the following state
                    342:   machine to tokenize HTML. The state machine must start in the
                    343:   <a href="#data-state">data state</a>. Most states consume a single character,
                    344:   which may have various side-effects, and either switch the state
1.87      mike      345:   machine to a new state to <i>reconsume</i> the same character, or
                    346:   switch it to a new state to consume the next character, or stay
                    347:   in the same state to consume the next character. Some states have
                    348:   more complicated behavior and can consume several characters before
                    349:   switching to another state. In some cases, the tokenizer state is
                    350:   also changed by the tree construction stage.</p>
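
  <p>As an illustration only, the three kinds of transition described
  above (switch and reconsume, switch and consume the next character,
  stay and consume the next character) can be sketched as a small
  driver loop. The names <code>tokenize</code>, <code>STATES</code>
  and the <code>EOF</code> sentinel are assumptions of this sketch,
  not part of this specification. Only a few branches of the data
  state and the tag open state are modelled; the omitted branches
  either raise an exception or are noted in comments.</p>

```python
EOF = None  # sentinel for end of input (an assumption of this sketch)


def data_state(ch, out):
    # The U+0026 and U+0000 branches of the data state are omitted here.
    if ch == "<":
        return "tag open", False      # switch state, consume next character
    if ch is EOF:
        out.append(("eof",))
        return "data", False
    out.append(("char", ch))
    return "data", False              # stay in state, consume next character


def tag_open_state(ch, out):
    # Only the "anything else" branch is modelled in this sketch.
    if ch is not EOF and (ch in "!/?" or ch.isascii() and ch.isalpha()):
        raise NotImplementedError("branch omitted from this sketch")
    out.append(("parse-error",))
    out.append(("char", "<"))
    return "data", True               # switch state, reconsume same character


STATES = {"data": data_state, "tag open": tag_open_state}


def tokenize(text):
    """Run the state machine over text and return the emitted items."""
    out = []
    state = "data"                    # the machine starts in the data state
    pos = 0
    while True:
        ch = text[pos] if pos < len(text) else EOF
        state, reconsume = STATES[state](ch, out)
        if not reconsume:
            pos += 1                  # advance only when the character was used up
        if out and out[-1] == ("eof",):
            return out
```

  <p>Returning a reconsume flag from each handler keeps the input
  position in one place, so a handler never has to "push back" a
  character itself.</p>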
1.1       mike      351: 
                    352:   <p>The exact behavior of certain states depends on the
                    353:   <a href="parsing.html#insertion-mode">insertion mode</a> and the <a href="parsing.html#stack-of-open-elements">stack of open
                    354:   elements</a>. Certain states also use a <dfn id="temporary-buffer"><var>temporary
                    355:   buffer</var></dfn> to track progress.</p>
                    356: 
                    357:   <p>The output of the tokenization step is a series of zero or more
                    358:   of the following tokens: DOCTYPE, start tag, end tag, comment,
                    359:   character, end-of-file. DOCTYPE tokens have a name, a public
                    360:   identifier, a system identifier, and a <i>force-quirks
                    361:   flag</i>. When a DOCTYPE token is created, its name, public
                    362:   identifier, and system identifier must be marked as missing (which
                    363:   is a distinct state from the empty string), and the <i>force-quirks
                    364:   flag</i> must be set to <i>off</i> (its other state is
                    365:   <i>on</i>). Start and end tag tokens have a tag name, a
                    366:   <i>self-closing flag</i>, and a list of attributes, each of which
                    367:   has a name and a value. When a start or end tag token is created,
                    368:   its <i>self-closing flag</i> must be unset (its other state is that
                    369:   it be set), and its attributes list must be empty. Comment and
                    370:   character tokens have data.</p>
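
  <p>The token types and their initial values described above could be
  modelled roughly as follows. This is a minimal sketch: the class and
  field names are hypothetical, and <code>None</code> is used here to
  represent "missing", which keeps it distinct from the empty
  string.</p>

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Doctype:
    # None stands for "missing", which is distinct from the empty string.
    name: Optional[str] = None
    public_id: Optional[str] = None
    system_id: Optional[str] = None
    force_quirks: bool = False        # False models "off", True models "on"


@dataclass
class Attribute:
    name: str
    value: str


@dataclass
class Tag:
    kind: str                         # "start" or "end"
    name: str = ""
    self_closing: bool = False        # unset when the token is created
    # default_factory gives every token its own, initially empty, list
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class Comment:
    data: str = ""


@dataclass
class Character:
    data: str


@dataclass
class EndOfFile:
    pass
```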
                    371: 
                    372:   <p>When a token is emitted, it must immediately be handled by the
1.70      mike      373:   <a href="tree-construction.html#tree-construction">tree construction</a> stage. The tree construction stage
1.1       mike      374:   can affect the state of the tokenization stage, and can insert
                    375:   additional characters into the stream. (For example, the
1.88      mike      376:   <code><a href="the-script-element.html#the-script-element">script</a></code> element can result in scripts executing and
                    377:   using the <a href="dynamic-markup-insertion.html#dynamic-markup-insertion">dynamic markup insertion</a> APIs to insert
1.1       mike      378:   characters into the stream being tokenized.)</p>
                    379: 
                    380:   <p>When a start tag token is emitted with its <i>self-closing
                    381:   flag</i> set, if the flag is not <dfn id="acknowledge-self-closing-flag" title="acknowledge
                    382:   self-closing flag">acknowledged</dfn> when it is processed by the
                    383:   tree construction stage, that is a <a href="parsing.html#parse-error">parse error</a>.</p>
                    384: 
                    385:   <p>When an end tag token is emitted with attributes, that is a
                    386:   <a href="parsing.html#parse-error">parse error</a>.</p>
                    387: 
                    388:   <p>When an end tag token is emitted with its <i>self-closing
                    389:   flag</i> set, that is a <a href="parsing.html#parse-error">parse error</a>.</p>
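
  <p>The three parse error conditions for emitted tag tokens stated
  above can be summarised in one check. This is a sketch under
  assumptions: the function name, its parameters, and the error
  strings are hypothetical, and <code>acknowledged</code> stands for
  whether the tree construction stage acknowledged the self-closing
  flag while processing the token.</p>

```python
def parse_errors_for_emitted_tag(kind, self_closing, attributes, acknowledged=False):
    """Return a list of parse error descriptions for an emitted tag token.

    kind is "start" or "end"; attributes is the token's attribute list.
    """
    errors = []
    if kind == "start" and self_closing and not acknowledged:
        errors.append("self-closing flag not acknowledged")
    if kind == "end" and attributes:
        errors.append("end tag emitted with attributes")
    if kind == "end" and self_closing:
        errors.append("end tag emitted with self-closing flag set")
    return errors
```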
                    390: 
                    391:   <p>An <dfn id="appropriate-end-tag-token">appropriate end tag token</dfn> is an end tag token whose
                    392:   tag name matches the tag name of the last start tag to have been
                    393:   emitted from this tokenizer, if any. If no start tag has been
                    394:   emitted from this tokenizer, then no end tag token is
                    395:   appropriate.</p>
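
  <p>A minimal sketch of the appropriate end tag token test, assuming
  the tokenizer remembers the tag name of the last start tag it
  emitted (or <code>None</code> if it has emitted none). The function
  and parameter names are hypothetical.</p>

```python
from typing import Optional


def is_appropriate_end_tag(end_tag_name: str, last_start_tag_name: Optional[str]) -> bool:
    """Compare an end tag's name with the last emitted start tag's name."""
    if last_start_tag_name is None:
        # No start tag has been emitted, so no end tag token is appropriate.
        return False
    return end_tag_name == last_start_tag_name
```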
                    396: 
                    397:   <p>Before each step of the tokenizer, the user agent must first
                    398:   check the <a href="parsing.html#parser-pause-flag">parser pause flag</a>. If it is true, then the
                    399:   tokenizer must abort the processing of any nested invocations of the
                    400:   tokenizer, yielding control back to the caller.</p>
                    401: 
                    402:   <p>The tokenizer state machine consists of the states defined in the
                    403:   following subsections.</p>
                    404: 
                    405: 
1.70      mike      406:   
1.1       mike      407: 
1.90      mike      408: 
1.29      mike      409:   <h5 id="data-state"><span class="secno">8.2.4.1 </span><dfn>Data state</dfn></h5>
1.1       mike      410: 
                    411:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                    412: 
                    413:   <dl class="switch"><dt>U+0026 AMPERSAND (&amp;)</dt>
                    414:    <dd>Switch to the <a href="#character-reference-in-data-state">character reference in data
                    415:    state</a>.</dd>
                    416: 
                    417:    <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
                    418:    <dd>Switch to the <a href="#tag-open-state">tag open state</a>.</dd>
                    419: 
1.51      mike      420:    <dt>U+0000 NULL</dt>
                    421:    <dd><a href="parsing.html#parse-error">Parse error</a>. Emit the <a href="parsing.html#current-input-character">current input
                    422:    character</a> as a character token.</dd>
                    423: 
1.1       mike      424:    <dt>EOF</dt>
                    425:    <dd>Emit an end-of-file token.</dd>
                    426: 
                    427:    <dt>Anything else</dt>
                    428:    <dd>Emit the <a href="parsing.html#current-input-character">current input character</a> as a character
1.14      mike      429:    token.</dd>
1.1       mike      430: 
1.29      mike      431:   </dl><h5 id="character-reference-in-data-state"><span class="secno">8.2.4.2 </span><dfn>Character reference in data state</dfn></h5>
1.1       mike      432: 
1.87      mike      433:   <p>Switch to the <a href="#data-state">data state</a>.</p>
                    434: 
1.1       mike      435:   <p>Attempt to <a href="#consume-a-character-reference">consume a character reference</a>, with no
                    436:   <a href="#additional-allowed-character">additional allowed character</a>.</p>
                    437: 
1.18      mike      438:   <p>If nothing is returned, emit a U+0026 AMPERSAND character (&amp;)
1.1       mike      439:   token.</p>
                    440: 
1.85      mike      441:   <p>Otherwise, emit the character tokens that were returned.</p>
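
  <p>The steps of the character reference in data state can be
  sketched as below. The helper
  <code>consume_character_reference</code> is a deliberately tiny,
  hypothetical stand-in that recognises only three named references;
  it is not the algorithm defined under "Tokenizing character
  references". The <code>emit</code> callback and the position
  handling are likewise assumptions of this sketch.</p>

```python
# Tiny stand-in table; the real set of character references is far larger.
NAMED = {"amp;": "&", "lt;": "<", "gt;": ">"}


def consume_character_reference(text, pos):
    """Hypothetical stand-in. pos points just after the '&'.

    Returns (characters, new_position), or None if nothing was consumed.
    """
    for name, chars in NAMED.items():
        if text.startswith(name, pos):
            return chars, pos + len(name)
    return None


def character_reference_in_data_state(text, pos, emit):
    next_state = "data"               # switch to the data state first
    result = consume_character_reference(text, pos)
    if result is None:
        emit("&")                     # nothing returned: emit U+0026 itself
        return next_state, pos
    chars, pos = result
    for ch in chars:                  # otherwise emit what was returned
        emit(ch)
    return next_state, pos
```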
1.1       mike      442: 
                    443: 
1.29      mike      444:   <h5 id="rcdata-state"><span class="secno">8.2.4.3 </span><dfn>RCDATA state</dfn></h5>
1.1       mike      445: 
                    446:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                    447: 
                    448:   <dl class="switch"><dt>U+0026 AMPERSAND (&amp;)</dt>
                    449:    <dd>Switch to the <a href="#character-reference-in-rcdata-state">character reference in RCDATA
                    450:    state</a>.</dd>
                    451: 
                    452:    <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
                    453:    <dd>Switch to the <a href="#rcdata-less-than-sign-state">RCDATA less-than sign state</a>.</dd>
                    454: 
1.51      mike      455:    <dt>U+0000 NULL</dt>
                    456:    <dd><a href="parsing.html#parse-error">Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
                    457:    character token.</dd>
                    458: 
1.1       mike      459:    <dt>EOF</dt>
                    460:    <dd>Emit an end-of-file token.</dd>
                    461: 
                    462:    <dt>Anything else</dt>
                    463:    <dd>Emit the <a href="parsing.html#current-input-character">current input character</a> as a character
1.14      mike      464:    token.</dd>
1.1       mike      465: 
                    466:   </dl><h5 id="character-reference-in-rcdata-state"><span class="secno">8.2.4.4 </span><dfn>Character reference in RCDATA state</dfn></h5>
                    467: 
1.87      mike      468:   <p>Switch to the <a href="#rcdata-state">RCDATA state</a>.</p>
                    469: 
1.1       mike      470:   <p>Attempt to <a href="#consume-a-character-reference">consume a character reference</a>, with no
                    471:   <a href="#additional-allowed-character">additional allowed character</a>.</p>
                    472: 
1.18      mike      473:   <p>If nothing is returned, emit a U+0026 AMPERSAND character (&amp;)
1.1       mike      474:   token.</p>
                    475: 
1.85      mike      476:   <p>Otherwise, emit the character tokens that were returned.</p>
1.1       mike      477: 
                    478: 
1.29      mike      479:   <h5 id="rawtext-state"><span class="secno">8.2.4.5 </span><dfn>RAWTEXT state</dfn></h5>
1.1       mike      480: 
                    481:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                    482: 
                    483:   <dl class="switch"><dt>U+003C LESS-THAN SIGN (&lt;)</dt>
                    484:    <dd>Switch to the <a href="#rawtext-less-than-sign-state">RAWTEXT less-than sign state</a>.</dd>
                    485: 
1.51      mike      486:    <dt>U+0000 NULL</dt>
                    487:    <dd><a href="parsing.html#parse-error">Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
                    488:    character token.</dd>
                    489: 
1.1       mike      490:    <dt>EOF</dt>
                    491:    <dd>Emit an end-of-file token.</dd>
                    492: 
                    493:    <dt>Anything else</dt>
                    494:    <dd>Emit the <a href="parsing.html#current-input-character">current input character</a> as a character
1.14      mike      495:    token.</dd>
1.1       mike      496: 
1.29      mike      497:   </dl><h5 id="script-data-state"><span class="secno">8.2.4.6 </span><dfn>Script data state</dfn></h5>
1.1       mike      498: 
                    499:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                    500: 
                    501:   <dl class="switch"><dt>U+003C LESS-THAN SIGN (&lt;)</dt>
                    502:    <dd>Switch to the <a href="#script-data-less-than-sign-state">script data less-than sign state</a>.</dd>
                    503: 
1.51      mike      504:    <dt>U+0000 NULL</dt>
                    505:    <dd><a href="parsing.html#parse-error">Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
                    506:    character token.</dd>
                    507: 
1.1       mike      508:    <dt>EOF</dt>
                    509:    <dd>Emit an end-of-file token.</dd>
                    510: 
                    511:    <dt>Anything else</dt>
                    512:    <dd>Emit the <a href="parsing.html#current-input-character">current input character</a> as a character
1.14      mike      513:    token.</dd>
1.1       mike      514: 
1.29      mike      515:   </dl><h5 id="plaintext-state"><span class="secno">8.2.4.7 </span><dfn>PLAINTEXT state</dfn></h5>
1.1       mike      516: 
                    517:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                    518: 
1.51      mike      519:   <dl class="switch"><dt>U+0000 NULL</dt>
                    520:    <dd><a href="parsing.html#parse-error">Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
                    521:    character token.</dd>
                    522: 
                    523:    <dt>EOF</dt>
1.1       mike      524:    <dd>Emit an end-of-file token.</dd>
                    525: 
                    526:    <dt>Anything else</dt>
                    527:    <dd>Emit the <a href="parsing.html#current-input-character">current input character</a> as a character
1.14      mike      528:    token.</dd>
1.1       mike      529: 
1.29      mike      530:   </dl><h5 id="tag-open-state"><span class="secno">8.2.4.8 </span><dfn>Tag open state</dfn></h5>
1.1       mike      531: 
                    532:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                    533: 
                    534:   <dl class="switch"><dt>U+0021 EXCLAMATION MARK (!)</dt>
                    535:    <dd>Switch to the <a href="#markup-declaration-open-state">markup declaration open state</a>.</dd>
                    536: 
                    537:    <dt>U+002F SOLIDUS (/)</dt>
                    538:    <dd>Switch to the <a href="#end-tag-open-state">end tag open state</a>.</dd>
                    539: 
                    540:    <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
                    541:    <dd>Create a new start tag token, set its tag name to the
                    542:    lowercase version of the <a href="parsing.html#current-input-character">current input character</a> (add 0x0020 to the
                    543:    character's code point), then switch to the <a href="#tag-name-state">tag name
                    544:    state</a>. (Don't emit the token yet; further details will
                    545:    be filled in before it is emitted.)</dd>
                    546: 
                    547:    <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
                    548:    <dd>Create a new start tag token, set its tag name to the
                    549:    <a href="parsing.html#current-input-character">current input character</a>, then switch to the <a href="#tag-name-state">tag
                    550:    name state</a>. (Don't emit the token yet; further details will
                    551:    be filled in before it is emitted.)</dd>
                    552: 
                    553:    <dt>U+003F QUESTION MARK (?)</dt>
                    554:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#bogus-comment-state">bogus
                    555:    comment state</a>.</dd>
                    556: 
                    557:    <dt>Anything else</dt>
1.87      mike      558:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                    559:    state</a>. Emit a U+003C LESS-THAN SIGN character token.
                    560:    Reconsume the <a href="parsing.html#current-input-character">current input character</a>.</dd>
1.1       mike      561: 
                    562:   </dl><h5 id="end-tag-open-state"><span class="secno">8.2.4.9 </span><dfn>End tag open state</dfn></h5>
                    563: 
                    564:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                    565: 
                    566:   <dl class="switch"><dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
                    567:    <dd>Create a new end tag token, set its tag name to the lowercase
                    568:    version of the <a href="parsing.html#current-input-character">current input character</a> (add 0x0020 to
                    569:    the character's code point), then switch to the <a href="#tag-name-state">tag name
                    570:    state</a>. (Don't emit the token yet; further details will be
                    571:    filled in before it is emitted.)</dd>
                    572: 
                    573:    <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
                    574:    <dd>Create a new end tag token, set its tag name to the
                    575:    <a href="parsing.html#current-input-character">current input character</a>, then switch to the <a href="#tag-name-state">tag
                    576:    name state</a>. (Don't emit the token yet; further details will
                    577:    be filled in before it is emitted.)</dd>
                    578: 
                    579:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
                    580:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                    581:    state</a>.</dd>
                    582: 
                    583:    <dt>EOF</dt>
1.87      mike      584:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                    585:    state</a>. Emit a U+003C LESS-THAN SIGN character token and a
                    586:    U+002F SOLIDUS character token. Reconsume the EOF character.</dd>
1.1       mike      587: 
                    588:    <dt>Anything else</dt>
                    589:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#bogus-comment-state">bogus
                    590:    comment state</a>.</dd>
                    591: 
1.29      mike      592:   </dl><h5 id="tag-name-state"><span class="secno">8.2.4.10 </span><dfn>Tag name state</dfn></h5>
1.1       mike      593: 
                    594:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                    595: 
1.73      mike      596:   <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1       mike      597:    <dt>U+000A LINE FEED (LF)</dt>
                    598:    <dt>U+000C FORM FEED (FF)</dt>
1.70      mike      599:    
1.1       mike      600:    <dt>U+0020 SPACE</dt>
                    601:    <dd>Switch to the <a href="#before-attribute-name-state">before attribute name state</a>.</dd>
                    602: 
                    603:    <dt>U+002F SOLIDUS (/)</dt>
                    604:    <dd>Switch to the <a href="#self-closing-start-tag-state">self-closing start tag state</a>.</dd>
                    605: 
                    606:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
1.14      mike      607:    <dd>Switch to the <a href="#data-state">data state</a>. Emit the current tag
                    608:    token.</dd>
1.1       mike      609: 
                    610:    <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
                    611:    <dd>Append the lowercase version of the <a href="parsing.html#current-input-character">current input
                    612:    character</a> (add 0x0020 to the character's code point) to the
1.14      mike      613:    current tag token's tag name.</dd>
1.1       mike      614: 
1.51      mike      615:    <dt>U+0000 NULL</dt>
                    616:    <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
                    617:    character to the current tag token's tag name.</dd>
                    618: 
1.1       mike      619:    <dt>EOF</dt>
1.87      mike      620:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                    621:    state</a>. Reconsume the EOF character.</dd>
1.1       mike      622: 
                    623:    <dt>Anything else</dt>
                    624:    <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
1.14      mike      625:    tag token's tag name.</dd>
1.1       mike      626: 
1.29      mike      627:   </dl><h5 id="rcdata-less-than-sign-state"><span class="secno">8.2.4.11 </span><dfn>RCDATA less-than sign state</dfn></h5>
1.70      mike      628:   
1.1       mike      629: 
                    630:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                    631: 
                    632:   <dl class="switch"><dt>U+002F SOLIDUS (/)</dt>
                    633:    <dd>Set the <var><a href="#temporary-buffer">temporary buffer</a></var> to the empty string. Switch
                    634:    to the <a href="#rcdata-end-tag-open-state">RCDATA end tag open state</a>.</dd>
                    635: 
                    636:    <dt>Anything else</dt>
1.87      mike      637:    <dd>Switch to the <a href="#rcdata-state">RCDATA state</a>. Emit a U+003C
                    638:    LESS-THAN SIGN character token. Reconsume the <a href="parsing.html#current-input-character">current
                    639:    input character</a>.</dd>
1.1       mike      640: 
1.29      mike      641:   </dl><h5 id="rcdata-end-tag-open-state"><span class="secno">8.2.4.12 </span><dfn>RCDATA end tag open state</dfn></h5>
1.70      mike      642:   
1.1       mike      643: 
                    644:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                    645: 
                    646:   <dl class="switch"><dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
                    647:    <dd>Create a new end tag token, and set its tag name to the
                    648:    lowercase version of the <a href="parsing.html#current-input-character">current input character</a> (add
                    649:    0x0020 to the character's code point). Append the <a href="parsing.html#current-input-character">current
                    650:    input character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>. Finally,
                    651:    switch to the <a href="#rcdata-end-tag-name-state">RCDATA end tag name state</a>. (Don't emit
                    652:    the token yet; further details will be filled in before it is
                    653:    emitted.)</dd>
                    654: 
                    655:    <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
                    656:    <dd>Create a new end tag token, and set its tag name to the
                    657:    <a href="parsing.html#current-input-character">current input character</a>. Append the <a href="parsing.html#current-input-character">current
                    658:    input character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>. Finally,
                    659:    switch to the <a href="#rcdata-end-tag-name-state">RCDATA end tag name state</a>. (Don't emit
                    660:    the token yet; further details will be filled in before it is
                    661:    emitted.)</dd>
                    662: 
                    663:    <dt>Anything else</dt>
1.87      mike      664:    <dd>Switch to the <a href="#rcdata-state">RCDATA state</a>. Emit a U+003C
                    665:    LESS-THAN SIGN character token and a U+002F SOLIDUS character token.
                    666:    Reconsume the <a href="parsing.html#current-input-character">current input character</a>.</dd>
1.1       mike      667: 
1.29      mike      668:   </dl><h5 id="rcdata-end-tag-name-state"><span class="secno">8.2.4.13 </span><dfn>RCDATA end tag name state</dfn></h5>
1.70      mike      669:   
1.1       mike      670: 
                    671:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                    672: 
1.73      mike      673:   <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1       mike      674:    <dt>U+000A LINE FEED (LF)</dt>
                    675:    <dt>U+000C FORM FEED (FF)</dt>
1.70      mike      676:    
1.1       mike      677:    <dt>U+0020 SPACE</dt>
                    678:    <dd>If the current end tag token is an <a href="#appropriate-end-tag-token">appropriate end tag
                    679:    token</a>, then switch to the <a href="#before-attribute-name-state">before attribute name
                    680:    state</a>. Otherwise, treat it as per the "anything else" entry
                    681:    below.</dd>
                    682: 
                    683:    <dt>U+002F SOLIDUS (/)</dt>
                    684:    <dd>If the current end tag token is an <a href="#appropriate-end-tag-token">appropriate end tag
                    685:    token</a>, then switch to the <a href="#self-closing-start-tag-state">self-closing start tag
                    686:    state</a>. Otherwise, treat it as per the "anything else" entry
                    687:    below.</dd>
                    688: 
                    689:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
                    690:    <dd>If the current end tag token is an <a href="#appropriate-end-tag-token">appropriate end tag
1.87      mike      691:    token</a>, then switch to the <a href="#data-state">data state</a> and emit
                    692:    the current tag token. Otherwise, treat it as per the "anything
1.1       mike      693:    else" entry below.</dd>
                    694: 
                    695:    <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
                    696:    <dd>Append the lowercase version of the <a href="parsing.html#current-input-character">current input
                    697:    character</a> (add 0x0020 to the character's code point) to the
                    698:    current tag token's tag name. Append the <a href="parsing.html#current-input-character">current input
1.14      mike      699:    character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>.</dd>
1.1       mike      700: 
                    701:    <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
                    702:    <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
                    703:    tag token's tag name. Append the <a href="parsing.html#current-input-character">current input
1.14      mike      704:    character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>.</dd>
1.1       mike      705: 
                    706:    <dt>Anything else</dt>
1.87      mike      707:    <dd>Switch to the <a href="#rcdata-state">RCDATA state</a>. Emit a U+003C
                    708:    LESS-THAN SIGN character token, a U+002F SOLIDUS character token,
                    709:    and a character token for each of the characters in the
                    710:    <var><a href="#temporary-buffer">temporary buffer</a></var> (in the order they were added to the
                    711:    buffer). Reconsume the <a href="parsing.html#current-input-character">current input character</a>.</dd>
1.1       mike      712: 
1.29      mike      713:   </dl><h5 id="rawtext-less-than-sign-state"><span class="secno">8.2.4.14 </span><dfn>RAWTEXT less-than sign state</dfn></h5>
1.70      mike      714:   
1.1       mike      715: 
                    716:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                    717: 
                    718:   <dl class="switch"><dt>U+002F SOLIDUS (/)</dt>
                    719:    <dd>Set the <var><a href="#temporary-buffer">temporary buffer</a></var> to the empty string. Switch
                    720:    to the <a href="#rawtext-end-tag-open-state">RAWTEXT end tag open state</a>.</dd>
                    721: 
                    722:    <dt>Anything else</dt>
1.87      mike      723:    <dd>Switch to the <a href="#rawtext-state">RAWTEXT state</a>. Emit a U+003C
                    724:    LESS-THAN SIGN character token. Reconsume the <a href="parsing.html#current-input-character">current
                    725:    input character</a>.</dd>
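
Non-normative sketch of the RAWTEXT less-than sign state above, assuming a tokenizer that handles one character per call. The function name and the shape of the returned tuple are hypothetical and are not part of this specification.

```python
def rawtext_less_than_sign(ch):
    """One step of the RAWTEXT less-than sign state.

    Returns (next_state, emitted_characters, reset_temporary_buffer, reconsume).
    `ch` is the character just consumed, or None at EOF.
    """
    if ch == "/":
        # Possible end tag: clear the temporary buffer and look for a name.
        return ("RAWTEXT end tag open", [], True, False)
    # Anything else: the "<" was literal text, so emit it and hand `ch`
    # back to the RAWTEXT state.
    return ("RAWTEXT", ["<"], False, True)
```

For example, `rawtext_less_than_sign("b")` reports that a literal "<" is emitted and that "b" is reconsumed in the RAWTEXT state.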
1.1       mike      726: 
1.29      mike      727:   </dl><h5 id="rawtext-end-tag-open-state"><span class="secno">8.2.4.15 </span><dfn>RAWTEXT end tag open state</dfn></h5>
1.70      mike      728:   
1.1       mike      729: 
                    730:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                    731: 
                    732:   <dl class="switch"><dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
                    733:    <dd>Create a new end tag token, and set its tag name to the
                    734:    lowercase version of the <a href="parsing.html#current-input-character">current input character</a> (add
                    735:    0x0020 to the character's code point). Append the <a href="parsing.html#current-input-character">current
                    736:    input character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>. Finally,
                    737:    switch to the <a href="#rawtext-end-tag-name-state">RAWTEXT end tag name state</a>. (Don't emit
                    738:    the token yet; further details will be filled in before it is
                    739:    emitted.)</dd>
                    740: 
                    741:    <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
                    742:    <dd>Create a new end tag token, and set its tag name to the
                    743:    <a href="parsing.html#current-input-character">current input character</a>. Append the <a href="parsing.html#current-input-character">current
                    744:    input character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>. Finally,
                    745:    switch to the <a href="#rawtext-end-tag-name-state">RAWTEXT end tag name state</a>. (Don't emit
                    746:    the token yet; further details will be filled in before it is
                    747:    emitted.)</dd>
                    748: 
                    749:    <dt>Anything else</dt>
1.87      mike      750:    <dd>Switch to the <a href="#rawtext-state">RAWTEXT state</a>. Emit a U+003C
                    751:    LESS-THAN SIGN character token and a U+002F SOLIDUS character
                    752:    token. Reconsume the <a href="parsing.html#current-input-character">current input character</a>.</dd>
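
Non-normative sketch of the RAWTEXT end tag open state above. It assumes the temporary buffer is empty on entry, as the RAWTEXT less-than sign state leaves it. The function name and return shape are hypothetical.

```python
def rawtext_end_tag_open(ch):
    """One step of the RAWTEXT end tag open state.

    Returns (next_state, tag_name, temporary_buffer, emitted, reconsume).
    `ch` is the character just consumed, or None at EOF.
    """
    if ch is not None and "A" <= ch <= "Z":
        # The tag name is lowercased (code point + 0x0020); the temporary
        # buffer keeps the character exactly as it appeared in the input.
        return ("RAWTEXT end tag name", chr(ord(ch) + 0x20), ch, [], False)
    if ch is not None and "a" <= ch <= "z":
        return ("RAWTEXT end tag name", ch, ch, [], False)
    # Anything else: "</" was literal text; no end tag token is created.
    return ("RAWTEXT", None, "", ["<", "/"], True)
```

The token is only created here, not emitted; the end tag name state decides later whether it is emitted or discarded.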
1.1       mike      753: 
1.29      mike      754:   </dl><h5 id="rawtext-end-tag-name-state"><span class="secno">8.2.4.16 </span><dfn>RAWTEXT end tag name state</dfn></h5>
1.70      mike      755:   
1.1       mike      756: 
                    757:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                    758: 
1.73      mike      759:   <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1       mike      760:    <dt>U+000A LINE FEED (LF)</dt>
                    761:    <dt>U+000C FORM FEED (FF)</dt>
1.70      mike      762:    
1.1       mike      763:    <dt>U+0020 SPACE</dt>
                    764:    <dd>If the current end tag token is an <a href="#appropriate-end-tag-token">appropriate end tag
                    765:    token</a>, then switch to the <a href="#before-attribute-name-state">before attribute name
                    766:    state</a>. Otherwise, treat it as per the "anything else" entry
                    767:    below.</dd>
                    768: 
                    769:    <dt>U+002F SOLIDUS (/)</dt>
                    770:    <dd>If the current end tag token is an <a href="#appropriate-end-tag-token">appropriate end tag
                    771:    token</a>, then switch to the <a href="#self-closing-start-tag-state">self-closing start tag
                    772:    state</a>. Otherwise, treat it as per the "anything else" entry
                    773:    below.</dd>
                    774: 
                    775:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
                    776:    <dd>If the current end tag token is an <a href="#appropriate-end-tag-token">appropriate end tag
1.87      mike      777:    token</a>, then switch to the <a href="#data-state">data state</a> and emit
                    778:    the current tag token. Otherwise, treat it as per the "anything
1.1       mike      779:    else" entry below.</dd>
                    780: 
                    781:    <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
                    782:    <dd>Append the lowercase version of the <a href="parsing.html#current-input-character">current input
                    783:    character</a> (add 0x0020 to the character's code point) to the
                    784:    current tag token's tag name. Append the <a href="parsing.html#current-input-character">current input
1.14      mike      785:    character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>.</dd>
1.1       mike      786: 
                    787:    <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
                    788:    <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
                    789:    tag token's tag name. Append the <a href="parsing.html#current-input-character">current input
1.14      mike      790:    character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>.</dd>
1.1       mike      791: 
                    792:    <dt>Anything else</dt>
1.87      mike      793:    <dd>Switch to the <a href="#rawtext-state">RAWTEXT state</a>. Emit a U+003C
                    794:    LESS-THAN SIGN character token, a U+002F SOLIDUS character token,
                    795:    and a character token for each of the characters in the
                    796:    <var><a href="#temporary-buffer">temporary buffer</a></var> (in the order they were added to the
                    797:    buffer). Reconsume the <a href="parsing.html#current-input-character">current input character</a>.</dd>
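
Non-normative sketch of the RAWTEXT end tag name state above. It assumes that "appropriate end tag token" (defined elsewhere in this specification) can be modelled as a comparison between the tag name collected so far and the name of the last start tag emitted. All names and the return shape are hypothetical.

```python
def rawtext_end_tag_name(ch, tag_name, buffer, last_start_tag):
    """One step of the RAWTEXT end tag name state.

    Returns (next_state, tag_name, temporary_buffer, emitted, reconsume).
    `ch` is the character just consumed, or None at EOF.
    """
    appropriate = tag_name == last_start_tag
    if appropriate and ch in ("\t", "\n", "\f", " "):
        return ("before attribute name", tag_name, buffer, [], False)
    if appropriate and ch == "/":
        return ("self-closing start tag", tag_name, buffer, [], False)
    if appropriate and ch == ">":
        return ("data", tag_name, buffer, [("end tag", tag_name)], False)
    if ch is not None and "A" <= ch <= "Z":
        # Lowercase goes into the tag name, the original into the buffer.
        return ("RAWTEXT end tag name", tag_name + ch.lower(), buffer + ch, [], False)
    if ch is not None and "a" <= ch <= "z":
        return ("RAWTEXT end tag name", tag_name + ch, buffer + ch, [], False)
    # Anything else (including the three cases above when the token is not
    # appropriate): replay "</" and the buffered characters as text.
    return ("RAWTEXT", tag_name, buffer, ["<", "/"] + list(buffer), True)
```

Because the temporary buffer keeps the original case, the characters replayed on the fallback path match the input exactly.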
1.1       mike      798: 
1.29      mike      799:   </dl><h5 id="script-data-less-than-sign-state"><span class="secno">8.2.4.17 </span><dfn>Script data less-than sign state</dfn></h5>
1.1       mike      800: 
                    801:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                    802: 
                    803:   <dl class="switch"><dt>U+002F SOLIDUS (/)</dt>
                    804:    <dd>Set the <var><a href="#temporary-buffer">temporary buffer</a></var> to the empty string. Switch
                    805:    to the <a href="#script-data-end-tag-open-state">script data end tag open state</a>.</dd>
                    806: 
                    807:    <dt>U+0021 EXCLAMATION MARK (!)</dt>
1.14      mike      808:    <dd>Switch to the <a href="#script-data-escape-start-state">script data escape start state</a>. Emit
                    809:    a U+003C LESS-THAN SIGN character token and a U+0021 EXCLAMATION
                    810:    MARK character token.</dd>
1.1       mike      811: 
                    812:    <dt>Anything else</dt>
1.87      mike      813:    <dd>Switch to the <a href="#script-data-state">script data state</a>. Emit a U+003C
                    814:    LESS-THAN SIGN character token. Reconsume the <a href="parsing.html#current-input-character">current
                    815:    input character</a>.</dd>
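
Non-normative sketch of the script data less-than sign state above. Unlike its RAWTEXT counterpart it has an extra entry for "!", which is the first step towards recognising "<!--" inside a script. The function name and return shape are hypothetical.

```python
def script_data_less_than_sign(ch):
    """One step of the script data less-than sign state.

    Returns (next_state, emitted_characters, reset_temporary_buffer, reconsume).
    `ch` is the character just consumed, or None at EOF.
    """
    if ch == "/":
        return ("script data end tag open", [], True, False)
    if ch == "!":
        # "<!" is emitted immediately; "!" is consumed, not reconsumed.
        return ("script data escape start", ["<", "!"], False, False)
    # Anything else: literal "<", then reprocess `ch` as script data.
    return ("script data", ["<"], False, True)
```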
1.1       mike      816: 
1.29      mike      817:   </dl><h5 id="script-data-end-tag-open-state"><span class="secno">8.2.4.18 </span><dfn>Script data end tag open state</dfn></h5>
1.70      mike      818:   
1.1       mike      819: 
                    820:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                    821: 
                    822:   <dl class="switch"><dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
                    823:    <dd>Create a new end tag token, and set its tag name to the
                    824:    lowercase version of the <a href="parsing.html#current-input-character">current input character</a> (add
                    825:    0x0020 to the character's code point). Append the <a href="parsing.html#current-input-character">current
                    826:    input character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>. Finally,
                    827:    switch to the <a href="#script-data-end-tag-name-state">script data end tag name state</a>. (Don't emit
                    828:    the token yet; further details will be filled in before it is
                    829:    emitted.)</dd>
                    830: 
                    831:    <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
                    832:    <dd>Create a new end tag token, and set its tag name to the
                    833:    <a href="parsing.html#current-input-character">current input character</a>. Append the <a href="parsing.html#current-input-character">current
                    834:    input character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>. Finally,
                    835:    switch to the <a href="#script-data-end-tag-name-state">script data end tag name state</a>. (Don't emit
                    836:    the token yet; further details will be filled in before it is
                    837:    emitted.)</dd>
                    838: 
                    839:    <dt>Anything else</dt>
1.87      mike      840:    <dd>Switch to the <a href="#script-data-state">script data state</a>. Emit a U+003C
                    841:    LESS-THAN SIGN character token and a U+002F SOLIDUS character token.
                    842:    Reconsume the <a href="parsing.html#current-input-character">current input character</a>.</dd>
1.1       mike      843: 
1.29      mike      844:   </dl><h5 id="script-data-end-tag-name-state"><span class="secno">8.2.4.19 </span><dfn>Script data end tag name state</dfn></h5>
1.70      mike      845:   
1.1       mike      846: 
                    847:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                    848: 
1.73      mike      849:   <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1       mike      850:    <dt>U+000A LINE FEED (LF)</dt>
                    851:    <dt>U+000C FORM FEED (FF)</dt>
1.70      mike      852:    
1.1       mike      853:    <dt>U+0020 SPACE</dt>
                    854:    <dd>If the current end tag token is an <a href="#appropriate-end-tag-token">appropriate end tag
                    855:    token</a>, then switch to the <a href="#before-attribute-name-state">before attribute name
                    856:    state</a>. Otherwise, treat it as per the "anything else" entry
                    857:    below.</dd>
                    858: 
                    859:    <dt>U+002F SOLIDUS (/)</dt>
                    860:    <dd>If the current end tag token is an <a href="#appropriate-end-tag-token">appropriate end tag
                    861:    token</a>, then switch to the <a href="#self-closing-start-tag-state">self-closing start tag
                    862:    state</a>. Otherwise, treat it as per the "anything else" entry
                    863:    below.</dd>
                    864: 
                    865:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
                    866:    <dd>If the current end tag token is an <a href="#appropriate-end-tag-token">appropriate end tag
1.87      mike      867:    token</a>, then switch to the <a href="#data-state">data state</a> and emit
                    868:    the current tag token. Otherwise, treat it as per the "anything
1.1       mike      869:    else" entry below.</dd>
                    870: 
                    871:    <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
                    872:    <dd>Append the lowercase version of the <a href="parsing.html#current-input-character">current input
                    873:    character</a> (add 0x0020 to the character's code point) to the
                    874:    current tag token's tag name. Append the <a href="parsing.html#current-input-character">current input
1.14      mike      875:    character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>.</dd>
1.1       mike      876: 
                    877:    <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
                    878:    <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
                    879:    tag token's tag name. Append the <a href="parsing.html#current-input-character">current input
1.14      mike      880:    character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>.</dd>
1.1       mike      881: 
                    882:    <dt>Anything else</dt>
1.87      mike      883:    <dd>Switch to the <a href="#script-data-state">script data state</a>. Emit a U+003C
                    884:    LESS-THAN SIGN character token, a U+002F SOLIDUS character token,
                    885:    and a character token for each of the characters in the
                    886:    <var><a href="#temporary-buffer">temporary buffer</a></var> (in the order they were added to the
                    887:    buffer). Reconsume the <a href="parsing.html#current-input-character">current input character</a>.</dd>
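
Non-normative, end-to-end sketch of the script data end tag open and end tag name states above, run over the characters that follow "</". It assumes the last start tag emitted was `script` and models "appropriate end tag token" as a name comparison. The function name and return shape are hypothetical.

```python
def script_end_tag(text, last_start_tag="script"):
    """Run the two script data end tag states over `text`.

    Returns (next_state, emitted, consumed), where `consumed` counts the
    characters of `text` that were consumed and not reconsumed.
    """
    tag_name, buffer = "", ""
    for i, ch in enumerate(text):
        if "A" <= ch <= "Z" or "a" <= ch <= "z":
            tag_name += ch.lower()   # lowercase version for the token
            buffer += ch             # original character for replay
            continue
        if i > 0 and tag_name == last_start_tag:
            if ch in "\t\n\f ":
                return ("before attribute name", [], i + 1)
            if ch == "/":
                return ("self-closing start tag", [], i + 1)
            if ch == ">":
                return ("data", [("end tag", tag_name)], i + 1)
        # Anything else: replay "</" plus the buffer, reconsume `ch`.
        return ("script data", ["<", "/"] + list(buffer), i)
    # End of input is handled by the "anything else" entries as well.
    return ("script data", ["<", "/"] + list(buffer), len(text))
```

With this sketch, `script_end_tag("script>")` yields an end tag token, while `script_end_tag("Scr ")` falls back to script data and replays `<`, `/`, `S`, `c`, `r` as character tokens.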
1.1       mike      888: 
1.29      mike      889:   </dl><h5 id="script-data-escape-start-state"><span class="secno">8.2.4.20 </span><dfn>Script data escape start state</dfn></h5>
1.1       mike      890: 
                    891:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                    892: 
                    893:   <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
1.14      mike      894:    <dd>Switch to the <a href="#script-data-escape-start-dash-state">script data escape start dash
                    895:    state</a>. Emit a U+002D HYPHEN-MINUS character token.</dd>
1.1       mike      896: 
                    897:    <dt>Anything else</dt>
1.87      mike      898:    <dd>Switch to the <a href="#script-data-state">script data state</a>. Reconsume the
                    899:    <a href="parsing.html#current-input-character">current input character</a>.</dd>
1.1       mike      900: 
1.29      mike      901:   </dl><h5 id="script-data-escape-start-dash-state"><span class="secno">8.2.4.21 </span><dfn>Script data escape start dash state</dfn></h5>
1.1       mike      902: 
                    903:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                    904: 
                    905:   <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
1.14      mike      906:    <dd>Switch to the <a href="#script-data-escaped-dash-dash-state">script data escaped dash dash
                    907:    state</a>. Emit a U+002D HYPHEN-MINUS character token.</dd>
1.1       mike      908: 
                    909:    <dt>Anything else</dt>
1.87      mike      910:    <dd>Switch to the <a href="#script-data-state">script data state</a>. Reconsume the
                    911:    <a href="parsing.html#current-input-character">current input character</a>.</dd>
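
Non-normative sketch of the script data escape start and escape start dash states above, run over the characters that follow "<!". Together they recognise the rest of "<!--". The function name and return shape are hypothetical.

```python
def after_less_than_bang(text):
    """Run the two escape start states over `text`.

    Returns (next_state, emitted_characters, consumed).
    """
    emitted = []
    # Script data escape start state.
    if text[:1] != "-":
        return ("script data", emitted, 0)      # reconsume in script data
    emitted.append("-")
    # Script data escape start dash state.
    if text[1:2] != "-":
        return ("script data", emitted, 1)      # reconsume in script data
    emitted.append("-")
    return ("script data escaped dash dash", emitted, 2)
```

The escaped mode is entered via the dash dash state, so an immediate ">" (as in "<!-->") returns straight to the script data state.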
1.1       mike      912: 
1.29      mike      913:   </dl><h5 id="script-data-escaped-state"><span class="secno">8.2.4.22 </span><dfn>Script data escaped state</dfn></h5>
1.1       mike      914: 
                    915:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                    916: 
                    917:   <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
1.14      mike      918:    <dd>Switch to the <a href="#script-data-escaped-dash-state">script data escaped dash state</a>. Emit
                    919:    a U+002D HYPHEN-MINUS character token.</dd>
1.1       mike      920: 
                    921:    <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
1.61      mike      922:    <dd>Switch to the <a href="#script-data-escaped-less-than-sign-state">script data escaped less-than sign
                    923:    state</a>.</dd>
1.1       mike      924: 
1.51      mike      925:    <dt>U+0000 NULL</dt>
                    926:    <dd><a href="parsing.html#parse-error">Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
                    927:    character token.</dd>
                    928: 
1.1       mike      929:    <dt>EOF</dt>
1.87      mike      930:    <dd>Switch to the <a href="#data-state">data state</a>. <a href="parsing.html#parse-error">Parse
                    931:    error</a>. Reconsume the EOF character.</dd>
1.1       mike      932: 
                    933:    <dt>Anything else</dt>
                    934:    <dd>Emit the <a href="parsing.html#current-input-character">current input character</a> as a character
1.14      mike      935:    token.</dd>
1.1       mike      936: 
1.29      mike      937:   </dl><h5 id="script-data-escaped-dash-state"><span class="secno">8.2.4.23 </span><dfn>Script data escaped dash state</dfn></h5>
1.1       mike      938: 
                    939:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                    940: 
                    941:   <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
1.14      mike      942:    <dd>Switch to the <a href="#script-data-escaped-dash-dash-state">script data escaped dash dash
                    943:    state</a>. Emit a U+002D HYPHEN-MINUS character token.</dd>
1.1       mike      944: 
                    945:    <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
1.61      mike      946:    <dd>Switch to the <a href="#script-data-escaped-less-than-sign-state">script data escaped less-than sign
                    947:    state</a>.</dd>
1.1       mike      948: 
1.51      mike      949:    <dt>U+0000 NULL</dt>
                    950:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#script-data-escaped-state">script data
                    951:    escaped state</a>. Emit a U+FFFD REPLACEMENT CHARACTER character
                    952:    token.</dd>
                    953: 
1.1       mike      954:    <dt>EOF</dt>
1.87      mike      955:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                    956:    state</a>. Reconsume the EOF character.</dd>
1.1       mike      957: 
                    958:    <dt>Anything else</dt>
1.14      mike      959:    <dd>Switch to the <a href="#script-data-escaped-state">script data escaped state</a>. Emit the
                    960:    <a href="parsing.html#current-input-character">current input character</a> as a character token.</dd>
1.1       mike      961: 
1.29      mike      962:   </dl><h5 id="script-data-escaped-dash-dash-state"><span class="secno">8.2.4.24 </span><dfn>Script data escaped dash dash state</dfn></h5>
1.1       mike      963: 
                    964:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                    965: 
                    966:   <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
1.14      mike      967:    <dd>Emit a U+002D HYPHEN-MINUS character token.</dd>
1.1       mike      968: 
                    969:    <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
1.61      mike      970:    <dd>Switch to the <a href="#script-data-escaped-less-than-sign-state">script data escaped less-than sign
                    971:    state</a>.</dd>
1.1       mike      972: 
                    973:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
1.14      mike      974:    <dd>Switch to the <a href="#script-data-state">script data state</a>. Emit a U+003E
                    975:    GREATER-THAN SIGN character token.</dd>
1.1       mike      976: 
1.51      mike      977:    <dt>U+0000 NULL</dt>
                    978:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#script-data-escaped-state">script data
                    979:    escaped state</a>. Emit a U+FFFD REPLACEMENT CHARACTER character
                    980:    token.</dd>
                    981: 
1.1       mike      982:    <dt>EOF</dt>
1.87      mike      983:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                    984:    state</a>. Reconsume the EOF character.</dd>
1.1       mike      985: 
                    986:    <dt>Anything else</dt>
1.14      mike      987:    <dd>Switch to the <a href="#script-data-escaped-state">script data escaped state</a>. Emit the
                    988:    <a href="parsing.html#current-input-character">current input character</a> as a character token.</dd>
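
Non-normative sketch covering the script data escaped, escaped dash, and escaped dash dash states above in one function, since the three differ only in how they treat "-" and ">". All names and the return shape are hypothetical.

```python
ESCAPED = "script data escaped"
DASH = "script data escaped dash"
DASH_DASH = "script data escaped dash dash"
LESS_THAN = "script data escaped less-than sign"

def escaped_step(state, ch):
    """One step of any of the three escaped states.

    Returns (next_state, emitted_characters, parse_error, reconsume).
    `ch` is the character just consumed, or None at EOF.
    """
    if ch == "-":
        # One dash moves ESCAPED to DASH; further dashes reach or stay in DASH_DASH.
        nxt = DASH if state == ESCAPED else DASH_DASH
        return (nxt, ["-"], False, False)
    if ch == "<":
        return (LESS_THAN, [], False, False)
    if ch == ">" and state == DASH_DASH:
        # "-->" closes the escaped section.
        return ("script data", [">"], False, False)
    if ch == "\0":
        return (ESCAPED, ["\ufffd"], True, False)
    if ch is None:
        return ("data", [], True, True)
    return (ESCAPED, [ch], False, False)
```

Note that ">" is only special after two or more dashes; in the other two states it is emitted as an ordinary character.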
1.1       mike      989: 
1.29      mike      990:   </dl><h5 id="script-data-escaped-less-than-sign-state"><span class="secno">8.2.4.25 </span><dfn>Script data escaped less-than sign state</dfn></h5>
1.1       mike      991: 
                    992:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                    993: 
                    994:   <dl class="switch"><dt>U+002F SOLIDUS (/)</dt>
                    995:    <dd>Set the <var><a href="#temporary-buffer">temporary buffer</a></var> to the empty string. Switch
                    996:    to the <a href="#script-data-escaped-end-tag-open-state">script data escaped end tag open state</a>.</dd>
                    997: 
                    998:    <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
1.14      mike      999:    <dd>Set the <var><a href="#temporary-buffer">temporary buffer</a></var> to the empty string. Append
                   1000:    the lowercase version of the <a href="parsing.html#current-input-character">current input character</a>
                   1001:    (add 0x0020 to the character's code point) to the <var><a href="#temporary-buffer">temporary
1.1       mike     1002:    buffer</a></var>. Switch to the <a href="#script-data-double-escape-start-state">script data double escape start
1.14      mike     1003:    state</a>. Emit a U+003C LESS-THAN SIGN character token and the
                   1004:    <a href="parsing.html#current-input-character">current input character</a> as a character token.</dd>
1.1       mike     1005: 
                   1006:    <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
1.14      mike     1007:    <dd>Set the <var><a href="#temporary-buffer">temporary buffer</a></var> to the empty string. Append
                   1008:    the <a href="parsing.html#current-input-character">current input character</a> to the <var><a href="#temporary-buffer">temporary
1.1       mike     1009:    buffer</a></var>. Switch to the <a href="#script-data-double-escape-start-state">script data double escape start
1.14      mike     1010:    state</a>. Emit a U+003C LESS-THAN SIGN character token and the
                   1011:    <a href="parsing.html#current-input-character">current input character</a> as a character token.</dd>
1.1       mike     1012: 
                   1013:    <dt>Anything else</dt>
1.87      mike     1014:    <dd>Switch to the <a href="#script-data-escaped-state">script data escaped state</a>. Emit a U+003C
                   1015:    LESS-THAN SIGN character token. Reconsume the <a href="parsing.html#current-input-character">current
                   1016:    input character</a>.</dd>
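
Non-normative sketch of the first three entries of the script data escaped less-than sign state above, showing how the temporary buffer is seeded. The "Anything else" entry is deliberately left out of the sketch. The function name and return shape are hypothetical.

```python
def escaped_less_than_sign(ch):
    """Solidus and letter entries of the escaped less-than sign state.

    Returns (next_state, temporary_buffer, emitted_characters), or None
    when `ch` falls under the "Anything else" entry (not modelled here).
    """
    if ch == "/":
        # Possible end tag: start from an empty temporary buffer.
        return ("script data escaped end tag open", "", [])
    if ch is not None and ("A" <= ch <= "Z" or "a" <= ch <= "z"):
        # Possible nested "<script": the buffer gets the lowercase letter,
        # while the emitted character keeps its original case.
        return ("script data double escape start", ch.lower(), ["<", ch])
    return None
```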
1.1       mike     1017: 
1.29      mike     1018:   </dl><h5 id="script-data-escaped-end-tag-open-state"><span class="secno">8.2.4.26 </span><dfn>Script data escaped end tag open state</dfn></h5>
1.1       mike     1019: 
                   1020:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1021: 
                   1022:   <dl class="switch"><dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
                   1023:    <dd>Create a new end tag token, and set its tag name to the
                   1024:    lowercase version of the <a href="parsing.html#current-input-character">current input character</a> (add
                   1025:    0x0020 to the character's code point). Append the <a href="parsing.html#current-input-character">current
                   1026:    input character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>. Finally,
                   1027:    switch to the <a href="#script-data-escaped-end-tag-name-state">script data escaped end tag name
                   1028:    state</a>. (Don't emit the token yet; further details will be
                   1029:    filled in before it is emitted.)</dd>
                   1030: 
                   1031:    <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
                   1032:    <dd>Create a new end tag token, and set its tag name to the
                   1033:    <a href="parsing.html#current-input-character">current input character</a>. Append the <a href="parsing.html#current-input-character">current
                   1034:    input character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>. Finally,
                   1035:    switch to the <a href="#script-data-escaped-end-tag-name-state">script data escaped end tag name
                   1036:    state</a>. (Don't emit the token yet; further details will be
                   1037:    filled in before it is emitted.)</dd>
                   1038: 
                   1039:    <dt>Anything else</dt>
1.87      mike     1040:    <dd>Switch to the <a href="#script-data-escaped-state">script data escaped state</a>. Emit a
                   1041:    U+003C LESS-THAN SIGN character token and a U+002F SOLIDUS
                   1042:    character token. Reconsume the <a href="parsing.html#current-input-character">current input
                   1043:    character</a>.</dd>
1.1       mike     1044: 
1.29      mike     1045:   </dl><h5 id="script-data-escaped-end-tag-name-state"><span class="secno">8.2.4.27 </span><dfn>Script data escaped end tag name state</dfn></h5>
1.1       mike     1046: 
                   1047:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1048: 
1.73      mike     1049:   <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1       mike     1050:    <dt>U+000A LINE FEED (LF)</dt>
                   1051:    <dt>U+000C FORM FEED (FF)</dt>
1.70      mike     1052:    
1.1       mike     1053:    <dt>U+0020 SPACE</dt>
                   1054:    <dd>If the current end tag token is an <a href="#appropriate-end-tag-token">appropriate end tag
                   1055:    token</a>, then switch to the <a href="#before-attribute-name-state">before attribute name
                   1056:    state</a>. Otherwise, treat it as per the "anything else" entry
                   1057:    below.</dd>
                   1058: 
                   1059:    <dt>U+002F SOLIDUS (/)</dt>
                   1060:    <dd>If the current end tag token is an <a href="#appropriate-end-tag-token">appropriate end tag
                   1061:    token</a>, then switch to the <a href="#self-closing-start-tag-state">self-closing start tag
                   1062:    state</a>. Otherwise, treat it as per the "anything else" entry
                   1063:    below.</dd>
                   1064: 
                   1065:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
                   1066:    <dd>If the current end tag token is an <a href="#appropriate-end-tag-token">appropriate end tag
1.87      mike     1067:    token</a>, then switch to the <a href="#data-state">data state</a> and emit
                   1068:    the current tag token. Otherwise, treat it as per the "anything
1.1       mike     1069:    else" entry below.</dd>
                   1070: 
                   1071:    <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
                   1072:    <dd>Append the lowercase version of the <a href="parsing.html#current-input-character">current input
                   1073:    character</a> (add 0x0020 to the character's code point) to the
                   1074:    current tag token's tag name. Append the <a href="parsing.html#current-input-character">current input
1.14      mike     1075:    character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>.</dd>
1.1       mike     1076: 
                   1077:    <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
                   1078:    <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
                   1079:    tag token's tag name. Append the <a href="parsing.html#current-input-character">current input
1.14      mike     1080:    character</a> to the <var><a href="#temporary-buffer">temporary buffer</a></var>.</dd>
1.1       mike     1081: 
                   1082:    <dt>Anything else</dt>
1.87      mike     1083:    <dd>Switch to the <a href="#script-data-escaped-state">script data escaped state</a>. Emit a
                   1084:    U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS character
                   1085:    token, and a character token for each of the characters in the
                   1086:    <var><a href="#temporary-buffer">temporary buffer</a></var> (in the order they were added to the
                   1087:    buffer). Reconsume the <a href="parsing.html#current-input-character">current input character</a>.</dd>
1.1       mike     1088: 
1.29      mike     1089:   </dl><h5 id="script-data-double-escape-start-state"><span class="secno">8.2.4.28 </span><dfn>Script data double escape start state</dfn></h5>
1.1       mike     1090: 
                   1091:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1092: 
1.73      mike     1093:   <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1       mike     1094:    <dt>U+000A LINE FEED (LF)</dt>
                   1095:    <dt>U+000C FORM FEED (FF)</dt>
1.70      mike     1096:    
1.1       mike     1097:    <dt>U+0020 SPACE</dt>
                   1098:    <dt>U+002F SOLIDUS (/)</dt>
                   1099:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
1.14      mike     1100:    <dd>If the <var><a href="#temporary-buffer">temporary buffer</a></var> is the string "<code title="">script</code>", then switch to the <a href="#script-data-double-escaped-state">script data
1.1       mike     1101:    double escaped state</a>. Otherwise, switch to the <a href="#script-data-escaped-state">script
1.14      mike     1102:    data escaped state</a>. Emit the <a href="parsing.html#current-input-character">current input
                   1103:    character</a> as a character token.</dd>
1.1       mike     1104: 
                   1105:    <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
1.14      mike     1106:    <dd>Append the lowercase version of the <a href="parsing.html#current-input-character">current input
1.1       mike     1107:    character</a> (add 0x0020 to the character's code point) to the
1.14      mike     1108:    <var><a href="#temporary-buffer">temporary buffer</a></var>. Emit the <a href="parsing.html#current-input-character">current input
                   1109:    character</a> as a character token.</dd>
1.1       mike     1110: 
                   1111:    <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
1.14      mike     1112:    <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the
                   1113:    <var><a href="#temporary-buffer">temporary buffer</a></var>. Emit the <a href="parsing.html#current-input-character">current input
                   1114:    character</a> as a character token.</dd>
1.1       mike     1115: 
                   1116:    <dt>Anything else</dt>
1.87      mike     1117:    <dd>Switch to the <a href="#script-data-escaped-state">script data escaped state</a>. Reconsume
                   1118:    the <a href="parsing.html#current-input-character">current input character</a>.</dd>
1.1       mike     1119: 
1.29      mike     1120:   </dl><h5 id="script-data-double-escaped-state"><span class="secno">8.2.4.29 </span><dfn>Script data double escaped state</dfn></h5>
1.1       mike     1121: 
                   1122:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1123: 
                   1124:   <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
1.14      mike     1125:    <dd>Switch to the <a href="#script-data-double-escaped-dash-state">script data double escaped dash
                   1126:    state</a>. Emit a U+002D HYPHEN-MINUS character token.</dd>
1.1       mike     1127: 
                   1128:    <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
1.61      mike     1129:    <dd>Switch to the <a href="#script-data-double-escaped-less-than-sign-state">script data double escaped less-than
1.14      mike     1130:    sign state</a>. Emit a U+003C LESS-THAN SIGN character
1.61      mike     1131:    token.</dd>
1.1       mike     1132: 
1.51      mike     1133:    <dt>U+0000 NULL</dt>
                   1134:    <dd><a href="parsing.html#parse-error">Parse error</a>. Emit a U+FFFD REPLACEMENT CHARACTER
                   1135:    character token.</dd>
                   1136: 
1.1       mike     1137:    <dt>EOF</dt>
1.87      mike     1138:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   1139:    state</a>. Reconsume the EOF character.</dd>
1.1       mike     1140: 
                   1141:    <dt>Anything else</dt>
                   1142:    <dd>Emit the <a href="parsing.html#current-input-character">current input character</a> as a character
1.14      mike     1143:    token.</dd>
1.1       mike     1144: 
1.29      mike     1145:   </dl><h5 id="script-data-double-escaped-dash-state"><span class="secno">8.2.4.30 </span><dfn>Script data double escaped dash state</dfn></h5>
1.1       mike     1146: 
                   1147:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1148: 
                   1149:   <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
1.14      mike     1150:    <dd>Switch to the <a href="#script-data-double-escaped-dash-dash-state">script data double escaped dash dash
                   1151:    state</a>. Emit a U+002D HYPHEN-MINUS character token.</dd>
1.1       mike     1152: 
                   1153:    <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
1.61      mike     1154:    <dd>Switch to the <a href="#script-data-double-escaped-less-than-sign-state">script data double escaped less-than
1.14      mike     1155:    sign state</a>. Emit a U+003C LESS-THAN SIGN character
1.61      mike     1156:    token.</dd>
1.1       mike     1157: 
1.51      mike     1158:    <dt>U+0000 NULL</dt>
                   1159:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#script-data-double-escaped-state">script data
                   1160:    double escaped state</a>. Emit a U+FFFD REPLACEMENT CHARACTER
                   1161:    character token.</dd>
                   1162: 
1.1       mike     1163:    <dt>EOF</dt>
1.87      mike     1164:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   1165:    state</a>. Reconsume the EOF character.</dd>
1.1       mike     1166: 
                   1167:    <dt>Anything else</dt>
1.14      mike     1168:    <dd>Switch to the <a href="#script-data-double-escaped-state">script data double escaped
                   1169:    state</a>. Emit the <a href="parsing.html#current-input-character">current input character</a> as a
                   1170:    character token.</dd>
1.1       mike     1171: 
1.29      mike     1172:   </dl><h5 id="script-data-double-escaped-dash-dash-state"><span class="secno">8.2.4.31 </span><dfn>Script data double escaped dash dash state</dfn></h5>
1.1       mike     1173: 
                   1174:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1175: 
                   1176:   <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
1.14      mike     1177:    <dd>Emit a U+002D HYPHEN-MINUS character token.</dd>
1.1       mike     1178: 
                   1179:    <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
1.61      mike     1180:    <dd>Switch to the <a href="#script-data-double-escaped-less-than-sign-state">script data double escaped less-than
1.14      mike     1181:    sign state</a>. Emit a U+003C LESS-THAN SIGN character
1.61      mike     1182:    token.</dd>
1.1       mike     1183: 
                   1184:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
1.14      mike     1185:    <dd>Switch to the <a href="#script-data-state">script data state</a>. Emit a U+003E
                   1186:    GREATER-THAN SIGN character token.</dd>
1.1       mike     1187: 
1.51      mike     1188:    <dt>U+0000 NULL</dt>
                   1189:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#script-data-double-escaped-state">script data
                   1190:    double escaped state</a>. Emit a U+FFFD REPLACEMENT CHARACTER
                   1191:    character token.</dd>
                   1192: 
1.1       mike     1193:    <dt>EOF</dt>
1.87      mike     1194:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   1195:    state</a>. Reconsume the EOF character.</dd>
1.1       mike     1196: 
                   1197:    <dt>Anything else</dt>
1.14      mike     1198:    <dd>Switch to the <a href="#script-data-double-escaped-state">script data double escaped
                   1199:    state</a>. Emit the <a href="parsing.html#current-input-character">current input character</a> as a
                   1200:    character token.</dd>
1.1       mike     1201: 
1.29      mike     1202:   </dl><h5 id="script-data-double-escaped-less-than-sign-state"><span class="secno">8.2.4.32 </span><dfn>Script data double escaped less-than sign state</dfn></h5>
1.1       mike     1203: 
                   1204:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1205: 
                   1206:   <dl class="switch"><dt>U+002F SOLIDUS (/)</dt>
1.14      mike     1207:    <dd>Set the <var><a href="#temporary-buffer">temporary buffer</a></var> to the empty string. Switch
                   1208:    to the <a href="#script-data-double-escape-end-state">script data double escape end state</a>. Emit a
                   1209:    U+002F SOLIDUS character token.</dd>
1.1       mike     1210: 
                   1211:    <dt>Anything else</dt>
1.87      mike     1212:    <dd>Switch to the <a href="#script-data-double-escaped-state">script data double escaped state</a>.
                   1213:    Reconsume the <a href="parsing.html#current-input-character">current input character</a>.</dd>
1.1       mike     1214: 
1.29      mike     1215:   </dl><h5 id="script-data-double-escape-end-state"><span class="secno">8.2.4.33 </span><dfn>Script data double escape end state</dfn></h5>
1.1       mike     1216: 
                   1217:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1218: 
1.73      mike     1219:   <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1       mike     1220:    <dt>U+000A LINE FEED (LF)</dt>
                   1221:    <dt>U+000C FORM FEED (FF)</dt>
1.70      mike     1222:    
1.1       mike     1223:    <dt>U+0020 SPACE</dt>
                   1224:    <dt>U+002F SOLIDUS (/)</dt>
                   1225:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
1.14      mike     1226:    <dd>If the <var><a href="#temporary-buffer">temporary buffer</a></var> is the string "<code title="">script</code>", then switch to the <a href="#script-data-escaped-state">script data
1.1       mike     1227:    escaped state</a>. Otherwise, switch to the <a href="#script-data-double-escaped-state">script data
1.14      mike     1228:    double escaped state</a>. Emit the <a href="parsing.html#current-input-character">current input
                   1229:    character</a> as a character token.</dd>
1.1       mike     1230: 
                   1231:    <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
1.14      mike     1232:    <dd>Append the lowercase version of the <a href="parsing.html#current-input-character">current input
1.1       mike     1233:    character</a> (add 0x0020 to the character's code point) to the
1.14      mike     1234:    <var><a href="#temporary-buffer">temporary buffer</a></var>. Emit the <a href="parsing.html#current-input-character">current input
                   1235:    character</a> as a character token.</dd>
1.1       mike     1236: 
                   1237:    <dt>U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z</dt>
1.14      mike     1238:    <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the
                   1239:    <var><a href="#temporary-buffer">temporary buffer</a></var>. Emit the <a href="parsing.html#current-input-character">current input
                   1240:    character</a> as a character token.</dd>
1.1       mike     1241: 
                   1242:    <dt>Anything else</dt>
1.87      mike     1243:    <dd>Switch to the <a href="#script-data-double-escaped-state">script data double escaped state</a>.
                   1244:    Reconsume the <a href="parsing.html#current-input-character">current input character</a>.</dd>
1.1       mike     1245: 
1.29      mike     1246:   </dl><h5 id="before-attribute-name-state"><span class="secno">8.2.4.34 </span><dfn>Before attribute name state</dfn></h5>
1.1       mike     1247: 
                   1248:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1249: 
1.73      mike     1250:   <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1       mike     1251:    <dt>U+000A LINE FEED (LF)</dt>
                   1252:    <dt>U+000C FORM FEED (FF)</dt>
1.70      mike     1253:    
1.1       mike     1254:    <dt>U+0020 SPACE</dt>
1.14      mike     1255:    <dd>Ignore the character.</dd>
1.1       mike     1256: 
                   1257:    <dt>U+002F SOLIDUS (/)</dt>
                   1258:    <dd>Switch to the <a href="#self-closing-start-tag-state">self-closing start tag state</a>.</dd>
                   1259: 
                   1260:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
1.14      mike     1261:    <dd>Switch to the <a href="#data-state">data state</a>. Emit the current tag
                   1262:    token.</dd>
1.1       mike     1263: 
                   1264:    <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
                   1265:    <dd>Start a new attribute in the current tag token. Set that
                   1266:    attribute's name to the lowercase version of the <a href="parsing.html#current-input-character">current input
                   1267:    character</a> (add 0x0020 to the character's code point), and its
                   1268:    value to the empty string. Switch to the <a href="#attribute-name-state">attribute name
                   1269:    state</a>.</dd>
                   1270: 
1.51      mike     1271:    <dt>U+0000 NULL</dt>
                   1272:    <dd><a href="parsing.html#parse-error">Parse error</a>. Start a new attribute in the current
                   1273:    tag token. Set that attribute's name to a U+FFFD REPLACEMENT
                   1274:    CHARACTER character, and its value to the empty string. Switch to
                   1275:    the <a href="#attribute-name-state">attribute name state</a>.</dd>
                   1276: 
1.1       mike     1277:    <dt>U+0022 QUOTATION MARK (")</dt>
                   1278:    <dt>U+0027 APOSTROPHE (')</dt>
                   1279:    <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
                   1280:    <dt>U+003D EQUALS SIGN (=)</dt>
                   1281:    <dd><a href="parsing.html#parse-error">Parse error</a>. Treat it as per the "anything else"
                   1282:    entry below.</dd>
                   1283: 
                   1284:    <dt>EOF</dt>
1.87      mike     1285:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   1286:    state</a>. Reconsume the EOF character.</dd>
1.1       mike     1287: 
                   1288:    <dt>Anything else</dt>
                   1289:    <dd>Start a new attribute in the current tag token. Set that
1.51      mike     1290:    attribute's name to the <a href="parsing.html#current-input-character">current input character</a>, and
                   1291:    its value to the empty string. Switch to the <a href="#attribute-name-state">attribute name
1.1       mike     1292:    state</a>.</dd>
                   1293: 
1.29      mike     1294:   </dl><h5 id="attribute-name-state"><span class="secno">8.2.4.35 </span><dfn>Attribute name state</dfn></h5>
1.1       mike     1295: 
                   1296:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1297: 
1.73      mike     1298:   <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1       mike     1299:    <dt>U+000A LINE FEED (LF)</dt>
                   1300:    <dt>U+000C FORM FEED (FF)</dt>
1.70      mike     1301:    
1.1       mike     1302:    <dt>U+0020 SPACE</dt>
                   1303:    <dd>Switch to the <a href="#after-attribute-name-state">after attribute name state</a>.</dd>
                   1304: 
                   1305:    <dt>U+002F SOLIDUS (/)</dt>
                   1306:    <dd>Switch to the <a href="#self-closing-start-tag-state">self-closing start tag state</a>.</dd>
                   1307: 
                   1308:    <dt>U+003D EQUALS SIGN (=)</dt>
                   1309:    <dd>Switch to the <a href="#before-attribute-value-state">before attribute value state</a>.</dd>
                   1310: 
                   1311:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
1.14      mike     1312:    <dd>Switch to the <a href="#data-state">data state</a>. Emit the current tag
                   1313:    token.</dd>
1.1       mike     1314: 
                   1315:    <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
                   1316:    <dd>Append the lowercase version of the <a href="parsing.html#current-input-character">current input
                   1317:    character</a> (add 0x0020 to the character's code point) to the
1.14      mike     1318:    current attribute's name.</dd>
1.1       mike     1319: 
1.51      mike     1320:    <dt>U+0000 NULL</dt>
                   1321:    <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
                   1322:    character to the current attribute's name.</dd>
                   1323: 
1.1       mike     1324:    <dt>U+0022 QUOTATION MARK (")</dt>
                   1325:    <dt>U+0027 APOSTROPHE (')</dt>
                   1326:    <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
                   1327:    <dd><a href="parsing.html#parse-error">Parse error</a>. Treat it as per the "anything else"
                   1328:    entry below.</dd>
                   1329: 
                   1330:    <dt>EOF</dt>
1.87      mike     1331:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   1332:    state</a>. Reconsume the EOF character.</dd>
1.1       mike     1333: 
                   1334:    <dt>Anything else</dt>
                   1335:    <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
1.14      mike     1336:    attribute's name.</dd>
1.1       mike     1337: 
                   1338:   </dl><p>When the user agent leaves the attribute name state (and before
                   1339:   emitting the tag token, if appropriate), the complete attribute's
                   1340:   name must be compared to the other attributes on the same token;
                   1341:   if there is already an attribute on the token with the exact same
                   1342:   name, then this is a <a href="parsing.html#parse-error">parse error</a> and the new
                   1343:   attribute must be dropped, along with the value that gets
                   1344:   associated with it (if any).</p>
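  <p>A minimal, non-normative sketch of this duplicate check in Python follows. It assumes attributes are modelled as dictionaries with <code title="">name</code> and <code title="">value</code> keys, and that a pending attribute is only attached to the token once its name is complete; both are choices made for this illustration, as are the function name and the <code title="">errors</code> list.</p>

```python
def leave_attribute_name_state(attributes, pending, errors):
    """Run the duplicate-name check when the attribute name is complete.

    attributes: attributes already kept on the current tag token.
    pending:    the attribute whose name has just been finished.
    errors:     list collecting parse errors (hypothetical reporting hook).
    """
    name = pending["name"]
    if any(existing["name"] == name for existing in attributes):
        # Parse error: the new attribute is dropped. Its value may still be
        # tokenized afterwards, but it never reaches the token.
        errors.append("duplicate attribute: " + name)
        pending["dropped"] = True
    else:
        attributes.append(pending)
    return pending
```

  <p>Because the tokenizer has already lowercased U+0041 to U+005A while building the name, a plain string comparison is enough in this sketch; the first attribute with a given name is the one that is kept.</p>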
                   1345: 
                   1346: 
1.29      mike     1347:   <h5 id="after-attribute-name-state"><span class="secno">8.2.4.36 </span><dfn>After attribute name state</dfn></h5>
1.1       mike     1348: 
                   1349:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1350: 
1.73      mike     1351:   <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1       mike     1352:    <dt>U+000A LINE FEED (LF)</dt>
                   1353:    <dt>U+000C FORM FEED (FF)</dt>
1.70      mike     1354:    
1.1       mike     1355:    <dt>U+0020 SPACE</dt>
1.14      mike     1356:    <dd>Ignore the character.</dd>
1.1       mike     1357: 
                   1358:    <dt>U+002F SOLIDUS (/)</dt>
                   1359:    <dd>Switch to the <a href="#self-closing-start-tag-state">self-closing start tag state</a>.</dd>
                   1360: 
                   1361:    <dt>U+003D EQUALS SIGN (=)</dt>
                   1362:    <dd>Switch to the <a href="#before-attribute-value-state">before attribute value state</a>.</dd>
                   1363: 
                   1364:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
1.14      mike     1365:    <dd>Switch to the <a href="#data-state">data state</a>. Emit the current tag
                   1366:    token.</dd>
1.1       mike     1367: 
                   1368:    <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
                   1369:    <dd>Start a new attribute in the current tag token. Set that
                   1370:    attribute's name to the lowercase version of the <a href="parsing.html#current-input-character">current
                   1371:    input character</a> (add 0x0020 to the character's code point),
                   1372:    and its value to the empty string. Switch to the <a href="#attribute-name-state">attribute
                   1373:    name state</a>.</dd>
                   1374: 
1.51      mike     1375:    <dt>U+0000 NULL</dt>
                   1376:    <dd><a href="parsing.html#parse-error">Parse error</a>. Start a new attribute in the current
                   1377:    tag token. Set that attribute's name to a U+FFFD REPLACEMENT
                   1378:    CHARACTER character, and its value to the empty string. Switch to
                   1379:    the <a href="#attribute-name-state">attribute name state</a>.</dd>
                   1380: 
1.1       mike     1381:    <dt>U+0022 QUOTATION MARK (")</dt>
                   1382:    <dt>U+0027 APOSTROPHE (')</dt>
                   1383:    <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
                   1384:    <dd><a href="parsing.html#parse-error">Parse error</a>. Treat it as per the "anything else"
                   1385:    entry below.</dd>
                   1386: 
                   1387:    <dt>EOF</dt>
1.87      mike     1388:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   1389:    state</a>. Reconsume the EOF character.</dd>
1.1       mike     1390: 
                   1391:    <dt>Anything else</dt>
                   1392:    <dd>Start a new attribute in the current tag token. Set that
                   1393:    attribute's name to the <a href="parsing.html#current-input-character">current input character</a>, and
                   1394:    its value to the empty string. Switch to the <a href="#attribute-name-state">attribute name
                   1395:    state</a>.</dd>
                   1396: 
1.29      mike     1397:   </dl><h5 id="before-attribute-value-state"><span class="secno">8.2.4.37 </span><dfn>Before attribute value state</dfn></h5>
1.1       mike     1398: 
                   1399:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1400: 
1.73      mike     1401:   <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1       mike     1402:    <dt>U+000A LINE FEED (LF)</dt>
                   1403:    <dt>U+000C FORM FEED (FF)</dt>
1.70      mike     1404:    
1.1       mike     1405:    <dt>U+0020 SPACE</dt>
1.14      mike     1406:    <dd>Ignore the character.</dd>
1.1       mike     1407: 
                   1408:    <dt>U+0022 QUOTATION MARK (")</dt>
                   1409:    <dd>Switch to the <a href="#attribute-value-double-quoted-state">attribute value (double-quoted) state</a>.</dd>
                   1410: 
                   1411:    <dt>U+0026 AMPERSAND (&amp;)</dt>
1.87      mike     1412:    <dd>Switch to the <a href="#attribute-value-unquoted-state">attribute value (unquoted) state</a>.
                   1413:    Reconsume the <a href="parsing.html#current-input-character">current input character</a>.</dd>
1.1       mike     1414: 
                   1415:    <dt>U+0027 APOSTROPHE (')</dt>
                   1416:    <dd>Switch to the <a href="#attribute-value-single-quoted-state">attribute value (single-quoted) state</a>.</dd>
                   1417: 
1.51      mike     1418:    <dt>U+0000 NULL</dt>
                   1419:    <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
                   1420:    character to the current attribute's value. Switch to the
                   1421:    <a href="#attribute-value-unquoted-state">attribute value (unquoted) state</a>.</dd>
                   1422: 
1.1       mike     1423:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
1.14      mike     1424:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   1425:    state</a>. Emit the current tag token.</dd>
1.1       mike     1426: 
                   1427:    <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
                   1428:    <dt>U+003D EQUALS SIGN (=)</dt>
                   1429:    <dt>U+0060 GRAVE ACCENT (`)</dt>
                   1430:    <dd><a href="parsing.html#parse-error">Parse error</a>. Treat it as per the "anything else"
                   1431:    entry below.</dd>
                   1432: 
                   1433:    <dt>EOF</dt>
1.87      mike     1434:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   1435:    state</a>. Reconsume the EOF character.</dd>
1.1       mike     1436: 
                   1437:    <dt>Anything else</dt>
                   1438:    <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
                   1439:    attribute's value. Switch to the <a href="#attribute-value-unquoted-state">attribute value (unquoted)
                   1440:    state</a>.</dd>
                   1441: 
                   1442:   </dl><h5 id="attribute-value-double-quoted-state"><span class="secno">8.2.4.38 </span><dfn>Attribute value (double-quoted) state</dfn></h5>
                   1443: 
                   1444:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1445: 
                   1446:   <dl class="switch"><dt>U+0022 QUOTATION MARK (")</dt>
                   1447:    <dd>Switch to the <a href="#after-attribute-value-quoted-state">after attribute value (quoted)
                   1448:    state</a>.</dd>
                   1449: 
                   1450:    <dt>U+0026 AMPERSAND (&amp;)</dt>
                   1451:    <dd>Switch to the <a href="#character-reference-in-attribute-value-state">character reference in attribute value
                   1452:    state</a>, with the <a href="#additional-allowed-character">additional allowed character</a>
                   1453:    being U+0022 QUOTATION MARK (").</dd>
                   1454: 
1.51      mike     1455:    <dt>U+0000 NULL</dt>
                   1456:    <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
                   1457:    character to the current attribute's value.</dd>
                   1458: 
1.1       mike     1459:    <dt>EOF</dt>
1.87      mike     1460:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   1461:    state</a>. Reconsume the EOF character.</dd>
1.1       mike     1462: 
                   1463:    <dt>Anything else</dt>
                   1464:    <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
1.14      mike     1465:    attribute's value.</dd>
1.1       mike     1466: 
                   1467:   </dl><h5 id="attribute-value-single-quoted-state"><span class="secno">8.2.4.39 </span><dfn>Attribute value (single-quoted) state</dfn></h5>
                   1468: 
                   1469:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1470: 
                   1471:   <dl class="switch"><dt>U+0027 APOSTROPHE (')</dt>
                   1472:    <dd>Switch to the <a href="#after-attribute-value-quoted-state">after attribute value (quoted)
                   1473:    state</a>.</dd>
                   1474: 
                   1475:    <dt>U+0026 AMPERSAND (&amp;)</dt>
                   1476:    <dd>Switch to the <a href="#character-reference-in-attribute-value-state">character reference in attribute value
                   1477:    state</a>, with the <a href="#additional-allowed-character">additional allowed character</a>
                   1478:    being U+0027 APOSTROPHE (').</dd>
                   1479: 
1.61      mike     1480:    <dt>U+0000 NULL</dt>
                   1481:    <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
                   1482:    character to the current attribute's value.</dd>
                   1483: 
1.1       mike     1484:    <dt>EOF</dt>
1.87      mike     1485:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   1486:    state</a>. Reconsume the EOF character.</dd>
1.1       mike     1487: 
                   1488:    <dt>Anything else</dt>
                   1489:    <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
1.14      mike     1490:    attribute's value.</dd>
1.1       mike     1491: 
                   1492:   </dl><h5 id="attribute-value-unquoted-state"><span class="secno">8.2.4.40 </span><dfn>Attribute value (unquoted) state</dfn></h5>
                   1493: 
                   1494:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1495: 
1.73      mike     1496:   <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1       mike     1497:    <dt>U+000A LINE FEED (LF)</dt>
                   1498:    <dt>U+000C FORM FEED (FF)</dt>
1.70      mike     1499:    
1.1       mike     1500:    <dt>U+0020 SPACE</dt>
                   1501:    <dd>Switch to the <a href="#before-attribute-name-state">before attribute name state</a>.</dd>
                   1502: 
                   1503:    <dt>U+0026 AMPERSAND (&amp;)</dt>
                   1504:    <dd>Switch to the <a href="#character-reference-in-attribute-value-state">character reference in attribute value
                   1505:    state</a>, with the <a href="#additional-allowed-character">additional allowed character</a>
                   1506:    being U+003E GREATER-THAN SIGN (&gt;).</dd>
                   1507: 
                   1508:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
1.14      mike     1509:    <dd>Switch to the <a href="#data-state">data state</a>. Emit the current tag
                   1510:    token.</dd>
1.1       mike     1511: 
1.51      mike     1512:    <dt>U+0000 NULL</dt>
                   1513:    <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
                   1514:    character to the current attribute's value.</dd>
                   1515: 
1.1       mike     1516:    <dt>U+0022 QUOTATION MARK (")</dt>
                   1517:    <dt>U+0027 APOSTROPHE (')</dt>
                   1518:    <dt>U+003C LESS-THAN SIGN (&lt;)</dt>
                   1519:    <dt>U+003D EQUALS SIGN (=)</dt>
                   1520:    <dt>U+0060 GRAVE ACCENT (`)</dt>
                   1521:    <dd><a href="parsing.html#parse-error">Parse error</a>. Treat it as per the "anything else"
                   1522:    entry below.</dd>
                   1523: 
                   1524:    <dt>EOF</dt>
1.87      mike     1525:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   1526:    state</a>. Reconsume the EOF character.</dd>
1.1       mike     1527: 
                   1528:    <dt>Anything else</dt>
                   1529:    <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
1.14      mike     1530:    attribute's value.</dd>
1.1       mike     1531: 
1.29      mike     1532:   </dl><h5 id="character-reference-in-attribute-value-state"><span class="secno">8.2.4.41 </span><dfn>Character reference in attribute value state</dfn></h5>
1.1       mike     1533: 
                   1534:   <p>Attempt to <a href="#consume-a-character-reference">consume a character reference</a>.</p>
                   1535: 
1.18      mike     1536:   <p>If nothing is returned, append a U+0026 AMPERSAND character
                   1537:   (&amp;) to the current attribute's value.</p>
1.1       mike     1538: 
1.85      mike     1539:   <p>Otherwise, append the characters of the returned character tokens to the
1.1       mike     1540:   current attribute's value.</p>
                   1541: 
1.27      mike     1542:   <p>Finally, switch back to the attribute value state that switched
                   1543:   into this state.</p>
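  <p>The steps of this state can be sketched, non-normatively, as the Python function below. The callable <code title="">consume_character_reference</code> stands in for the "consume a character reference" algorithm and is assumed here to return either <code title="">None</code> or a string of characters; that interface, like the other names, is an assumption of this sketch.</p>

```python
def character_reference_in_attribute_value(attr, return_state,
                                           consume_character_reference):
    """Append a character reference, or a literal "&", to the value.

    attr:         the current attribute, a dict with a "value" key.
    return_state: the attribute value state that switched into this state.
    """
    result = consume_character_reference()
    if result is None:
        # Nothing was returned: keep the ampersand as literal text.
        attr["value"] += "&"
    else:
        attr["value"] += result
    # Finally, switch back to the state that switched into this one.
    return return_state
```

  <p>Passing the originating state in as an argument is one way to implement "switch back"; an implementation could equally keep it in a tokenizer field.</p>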
1.1       mike     1544: 
                   1545: 
                   1546:   <h5 id="after-attribute-value-quoted-state"><span class="secno">8.2.4.42 </span><dfn>After attribute value (quoted) state</dfn></h5>
                   1547: 
                   1548:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1549: 
1.73      mike     1550:   <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1       mike     1551:    <dt>U+000A LINE FEED (LF)</dt>
                   1552:    <dt>U+000C FORM FEED (FF)</dt>
1.70      mike     1553:    
1.1       mike     1554:    <dt>U+0020 SPACE</dt>
                   1555:    <dd>Switch to the <a href="#before-attribute-name-state">before attribute name state</a>.</dd>
                   1556: 
                   1557:    <dt>U+002F SOLIDUS (/)</dt>
                   1558:    <dd>Switch to the <a href="#self-closing-start-tag-state">self-closing start tag state</a>.</dd>
                   1559: 
                   1560:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
1.14      mike     1561:    <dd>Switch to the <a href="#data-state">data state</a>. Emit the current tag
                   1562:    token.</dd>
1.1       mike     1563: 
                   1564:    <dt>EOF</dt>
1.87      mike     1565:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   1566:    state</a>. Reconsume the EOF character.</dd>
1.1       mike     1567: 
                   1568:    <dt>Anything else</dt>
1.87      mike     1569:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#before-attribute-name-state">before attribute
                   1570:    name state</a>. Reconsume the character.</dd>
1.1       mike     1571: 
1.29      mike     1572:   </dl><h5 id="self-closing-start-tag-state"><span class="secno">8.2.4.43 </span><dfn>Self-closing start tag state</dfn></h5>
1.1       mike     1573: 
                   1574:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1575: 
                   1576:   <dl class="switch"><dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
                   1577:    <dd>Set the <i>self-closing flag</i> of the current tag
1.14      mike     1578:    token. Switch to the <a href="#data-state">data state</a>. Emit the current tag
                   1579:    token.</dd>
1.1       mike     1580: 
                   1581:    <dt>EOF</dt>
1.87      mike     1582:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   1583:    state</a>. Reconsume the EOF character.</dd>
1.1       mike     1584: 
                   1585:    <dt>Anything else</dt>
1.87      mike     1586:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#before-attribute-name-state">before attribute
                   1587:    name state</a>. Reconsume the character.</dd>
1.1       mike     1588: 
1.29      mike     1589:   </dl><h5 id="bogus-comment-state"><span class="secno">8.2.4.44 </span><dfn>Bogus comment state</dfn></h5>
1.1       mike     1590: 
                   1591:   <p>Consume every character up to and including the first U+003E
                   1592:   GREATER-THAN SIGN character (&gt;) or the end of the file (EOF),
                   1593:   whichever comes first. Emit a comment token whose data is the
1.51      mike     1594:   concatenation of all the characters starting from and including the
                   1595:   character that caused the state machine to switch into the bogus
                   1596:   comment state, up to and including the character immediately before
                   1597:   the last consumed character (i.e. up to the character just before
                   1598:   the U+003E or EOF character), but with any U+0000 NULL characters
                   1599:   replaced by U+FFFD REPLACEMENT CHARACTER characters. (If the comment
                   1600:   was started by the end of the file (EOF), the token is empty.)</p>
1.1       mike     1601: 
                   1602:   <p>Switch to the <a href="#data-state">data state</a>.</p>
                   1603: 
                   1604:   <p>If the end of the file was reached, reconsume the EOF
                   1605:   character.</p>
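
  <p>As a rough sketch of the bogus comment state, the following helper
  assumes the input is available as a single string and that
  <code>start</code> is the index of the character that caused the
  switch into this state. The function name, the parameters, and the
  returned tuple are hypothetical and chosen only for the example.</p>

```python
def consume_bogus_comment(text, start):
    """Sketch of the bogus comment state over an in-memory string.

    Returns (comment_data, next_position, reached_eof). All three names
    are assumptions made for this illustration.
    """
    end = text.find(">", start)
    reached_eof = end == -1
    if reached_eof:
        # No U+003E found: everything up to EOF belongs to the comment.
        end = len(text)
    # The data stops just before the U+003E (or EOF); NULL characters
    # are replaced by U+FFFD REPLACEMENT CHARACTER.
    data = text[start:end].replace("\x00", "\ufffd")
    # The U+003E itself is consumed. At EOF nothing more is consumed,
    # so the caller can reconsume EOF in the data state.
    next_position = end if reached_eof else end + 1
    return data, next_position, reached_eof
```

  <p>When <code>start</code> is already at the end of the input, the
  returned data is the empty string, matching the parenthetical note
  above about a comment started by EOF.</p>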
                   1606: 
                   1607: 
1.29      mike     1608:   <h5 id="markup-declaration-open-state"><span class="secno">8.2.4.45 </span><dfn>Markup declaration open state</dfn></h5>
1.1       mike     1609: 
                   1610:   <p>If the next two characters are both U+002D HYPHEN-MINUS
                   1611:   characters (-), consume those two characters, create a comment token
                   1612:   whose data is the empty string, and switch to the <a href="#comment-start-state">comment
                   1613:   start state</a>.</p>
                   1614: 
                   1615:   <p>Otherwise, if the next seven characters are an <a href="infrastructure.html#ascii-case-insensitive">ASCII
                   1616:   case-insensitive</a> match for the word "DOCTYPE", then consume
                   1617:   those characters and switch to the <a href="#doctype-state">DOCTYPE state</a>.</p>
                   1618: 
1.86      mike     1619:   <p>Otherwise, if there is a <a href="parsing.html#current-node">current node</a> and it is not
                   1620:   an element in the <a href="namespaces.html#html-namespace-0">HTML namespace</a> and the next seven
                   1621:   characters are a <a href="infrastructure.html#case-sensitive">case-sensitive</a> match for the string
                   1622:   "[CDATA[" (the five uppercase letters "CDATA" with a U+005B LEFT
                   1623:   SQUARE BRACKET character before and after), then consume those
                   1624:   characters and switch to the <a href="#cdata-section-state">CDATA section state</a>.</p>
1.1       mike     1625: 
                   1626:   <p>Otherwise, this is a <a href="parsing.html#parse-error">parse error</a>. Switch to the
                   1627:   <a href="#bogus-comment-state">bogus comment state</a>. The next character that is
                   1628:   consumed, if any, is the first character that will be in the
                   1629:   comment.</p>
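
  <p>The order of the checks above can be sketched as follows. The
  sketch assumes the input is a string indexed by <code>pos</code>, and
  it reduces the "current node is not an element in the HTML namespace"
  condition to a boolean parameter; both choices, along with the
  function name and the state names returned as strings, are
  assumptions made for the example.</p>

```python
def markup_declaration_open(text, pos, current_node_is_foreign=False):
    """Sketch of the markup declaration open state.

    Returns (next_state_name, new_position). Reporting the parse error
    in the final branch is left to the caller.
    """
    if text.startswith("--", pos):
        # Two hyphens: an empty comment token is created by the caller.
        return "comment start", pos + 2

    candidate = text[pos:pos + 7]
    # ASCII case-insensitive match: only ASCII letters may fold.
    if candidate.isascii() and candidate.upper() == "DOCTYPE":
        return "DOCTYPE", pos + 7

    # "[CDATA[" is matched case-sensitively, and only when there is a
    # current node outside the HTML namespace.
    if current_node_is_foreign and candidate == "[CDATA[":
        return "CDATA section", pos + 7

    # Parse error: nothing is consumed, so the next character consumed
    # becomes the first character of the bogus comment.
    return "bogus comment", pos
```

  <p>Note that in this sketch <code>"[CDATA["</code> in ordinary HTML
  content falls through to the bogus comment branch, because the
  namespace condition is not met.</p>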
                   1630: 
                   1631: 
1.29      mike     1632:   <h5 id="comment-start-state"><span class="secno">8.2.4.46 </span><dfn>Comment start state</dfn></h5>
1.1       mike     1633: 
                   1634:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1635: 
                   1636:   <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
                   1637:    <dd>Switch to the <a href="#comment-start-dash-state">comment start dash state</a>.</dd>
                   1638: 
1.51      mike     1639:    <dt>U+0000 NULL</dt>
                   1640:    <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
                   1641:    character to the comment token's data. Switch to the <a href="#comment-state">comment
                   1642:    state</a>.</dd>
                   1643: 
1.1       mike     1644:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
1.14      mike     1645:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
1.70      mike     1646:    state</a>. Emit the comment token.</dd> 
1.90      mike     1647: 
1.1       mike     1648:    <dt>EOF</dt>
1.87      mike     1649:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   1650:    state</a>. Emit the comment token. Reconsume the EOF character.</dd>
1.1       mike     1651: 
                   1652:    <dt>Anything else</dt>
                   1653:    <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the comment
                   1654:    token's data. Switch to the <a href="#comment-state">comment state</a>.</dd>
                   1655: 
1.29      mike     1656:   </dl><h5 id="comment-start-dash-state"><span class="secno">8.2.4.47 </span><dfn>Comment start dash state</dfn></h5>
1.1       mike     1657: 
                   1658:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1659: 
                   1660:   <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
                   1661:    <dd>Switch to the <a href="#comment-end-state">comment end state</a>.</dd>
                   1662: 
1.51      mike     1663:    <dt>U+0000 NULL</dt>
                   1664:    <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+002D HYPHEN-MINUS
                   1665:    character (-) and a U+FFFD REPLACEMENT CHARACTER character to the
                   1666:    comment token's data. Switch to the <a href="#comment-state">comment
                   1667:    state</a>.</dd>
                   1668: 
1.1       mike     1669:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
1.14      mike     1670:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   1671:    state</a>. Emit the comment token.</dd>
1.1       mike     1672: 
                   1673:    <dt>EOF</dt>
1.87      mike     1674:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   1675:    state</a>. Emit the comment token. Reconsume the EOF
                   1676:    character.</dd> 
                   1677: 
1.1       mike     1678:    <dt>Anything else</dt>
                   1679:    <dd>Append a U+002D HYPHEN-MINUS character (-) and the
                   1680:    <a href="parsing.html#current-input-character">current input character</a> to the comment token's
                   1681:    data. Switch to the <a href="#comment-state">comment state</a>.</dd>
                   1682: 
1.29      mike     1683:   </dl><h5 id="comment-state"><span class="secno">8.2.4.48 </span><dfn id="comment">Comment state</dfn></h5>
1.1       mike     1684: 
                   1685:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1686: 
                   1687:   <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
                   1688:    <dd>Switch to the <a href="#comment-end-dash-state">comment end dash state</a>.</dd>
                   1689: 
1.51      mike     1690:    <dt>U+0000 NULL</dt>
                   1691:    <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
                   1692:    character to the comment token's data.</dd>
                   1693: 
1.1       mike     1694:    <dt>EOF</dt>
1.87      mike     1695:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   1696:    state</a>. Emit the comment token. Reconsume the EOF
                   1697:    character.</dd> 
                   1698: 
1.1       mike     1699:    <dt>Anything else</dt>
                   1700:    <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the comment
1.14      mike     1701:    token's data.</dd>
1.1       mike     1702: 
1.29      mike     1703:   </dl><h5 id="comment-end-dash-state"><span class="secno">8.2.4.49 </span><dfn>Comment end dash state</dfn></h5>
1.1       mike     1704: 
                   1705:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1706: 
                   1707:   <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
                   1708:    <dd>Switch to the <a href="#comment-end-state">comment end state</a>.</dd>
                   1709: 
1.51      mike     1710:    <dt>U+0000 NULL</dt>
                   1711:    <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+002D HYPHEN-MINUS
                   1712:    character (-) and a U+FFFD REPLACEMENT CHARACTER character to the
                   1713:    comment token's data. Switch to the <a href="#comment-state">comment
                   1714:    state</a>.</dd>
                   1715: 
1.1       mike     1716:    <dt>EOF</dt>
1.87      mike     1717:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   1718:    state</a>. Emit the comment token. Reconsume the EOF
                   1719:    character.</dd> 
                   1720: 
1.1       mike     1721:    <dt>Anything else</dt>
                   1722:    <dd>Append a U+002D HYPHEN-MINUS character (-) and the
                   1723:    <a href="parsing.html#current-input-character">current input character</a> to the comment token's
                   1724:    data. Switch to the <a href="#comment-state">comment state</a>.</dd>
                   1725: 
1.29      mike     1726:   </dl><h5 id="comment-end-state"><span class="secno">8.2.4.50 </span><dfn>Comment end state</dfn></h5>
1.1       mike     1727: 
                   1728:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1729: 
                   1730:   <dl class="switch"><dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
1.14      mike     1731:    <dd>Switch to the <a href="#data-state">data state</a>. Emit the comment
                   1732:    token.</dd>
1.1       mike     1733: 
1.51      mike     1734:    <dt>U+0000 NULL</dt>
                   1735:    <dd><a href="parsing.html#parse-error">Parse error</a>. Append two U+002D HYPHEN-MINUS
                   1736:    characters (-) and a U+FFFD REPLACEMENT CHARACTER character to the
                   1737:    comment token's data. Switch to the <a href="#comment-state">comment
                   1738:    state</a>.</dd>
                   1739: 
1.1       mike     1740:    <dt>U+0021 EXCLAMATION MARK (!)</dt>
                   1741:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#comment-end-bang-state">comment end bang
                   1742:    state</a>.</dd>
                   1743: 
                   1744:    <dt>U+002D HYPHEN-MINUS (-)</dt>
                   1745:    <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+002D HYPHEN-MINUS
1.14      mike     1746:    character (-) to the comment token's data.</dd>
1.1       mike     1747: 
                   1748:    <dt>EOF</dt>
1.87      mike     1749:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   1750:    state</a>. Emit the comment token. Reconsume the EOF
                   1751:    character.</dd> 
1.90      mike     1752: 
1.1       mike     1753:    <dt>Anything else</dt>
                   1754:    <dd><a href="parsing.html#parse-error">Parse error</a>. Append two U+002D HYPHEN-MINUS
                   1755:    characters (-) and the <a href="parsing.html#current-input-character">current input character</a> to the
                   1756:    comment token's data. Switch to the <a href="#comment-state">comment
                   1757:    state</a>.</dd>
                   1758: 
1.29      mike     1759:   </dl><h5 id="comment-end-bang-state"><span class="secno">8.2.4.51 </span><dfn>Comment end bang state</dfn></h5>
1.1       mike     1760: 
                   1761:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1762: 
                   1763:   <dl class="switch"><dt>U+002D HYPHEN-MINUS (-)</dt>
                   1764:    <dd>Append two U+002D HYPHEN-MINUS characters (-) and a U+0021
                   1765:    EXCLAMATION MARK character (!) to the comment token's data. Switch
                   1766:    to the <a href="#comment-end-dash-state">comment end dash state</a>.</dd>
                   1767: 
                   1768:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
1.14      mike     1769:    <dd>Switch to the <a href="#data-state">data state</a>. Emit the comment
                   1770:    token.</dd>
1.1       mike     1771: 
1.51      mike     1772:    <dt>U+0000 NULL</dt>
                   1773:    <dd><a href="parsing.html#parse-error">Parse error</a>. Append two U+002D HYPHEN-MINUS
                   1774:    characters (-), a U+0021 EXCLAMATION MARK character (!), and a
                   1775:    U+FFFD REPLACEMENT CHARACTER character to the comment token's data.
                   1776:    Switch to the <a href="#comment-state">comment state</a>.</dd>
                   1777: 
1.1       mike     1778:    <dt>EOF</dt>
1.87      mike     1779:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   1780:    state</a>. Emit the comment token. Reconsume the EOF
                   1781:    character.</dd> 
                   1782: 
1.1       mike     1783:    <dt>Anything else</dt>
                   1784:    <dd>Append two U+002D HYPHEN-MINUS characters (-), a U+0021
                   1785:    EXCLAMATION MARK character (!), and the <a href="parsing.html#current-input-character">current input
                   1786:    character</a> to the comment token's data. Switch to the
                   1787:    <a href="#comment-state">comment state</a>.</dd>
                   1788: 
1.37      mike     1789:   </dl><h5 id="doctype-state"><span class="secno">8.2.4.52 </span><dfn>DOCTYPE state</dfn></h5>
1.1       mike     1790: 
                   1791:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1792: 
1.73      mike     1793:   <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1       mike     1794:    <dt>U+000A LINE FEED (LF)</dt>
                   1795:    <dt>U+000C FORM FEED (FF)</dt>
1.70      mike     1796:    
1.1       mike     1797:    <dt>U+0020 SPACE</dt>
                   1798:    <dd>Switch to the <a href="#before-doctype-name-state">before DOCTYPE name state</a>.</dd>
                   1799: 
                   1800:    <dt>EOF</dt>
1.87      mike     1801:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   1802:    state</a>. Create a new DOCTYPE token. Set its <i>force-quirks
                   1803:    flag</i> to <i>on</i>. Emit the token. Reconsume the EOF
                   1804:    character.</dd>
1.1       mike     1805: 
                   1806:    <dt>Anything else</dt>
1.87      mike     1807:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#before-doctype-name-state">before DOCTYPE
                   1808:    name state</a>. Reconsume the character.</dd>
1.1       mike     1809: 
1.37      mike     1810:   </dl><h5 id="before-doctype-name-state"><span class="secno">8.2.4.53 </span><dfn>Before DOCTYPE name state</dfn></h5>
1.1       mike     1811: 
                   1812:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1813: 
1.73      mike     1814:   <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1       mike     1815:    <dt>U+000A LINE FEED (LF)</dt>
                   1816:    <dt>U+000C FORM FEED (FF)</dt>
1.70      mike     1817:    
1.1       mike     1818:    <dt>U+0020 SPACE</dt>
1.14      mike     1819:    <dd>Ignore the character.</dd>
1.1       mike     1820: 
                   1821:    <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
                   1822:    <dd>Create a new DOCTYPE token. Set the token's name to the
                   1823:    lowercase version of the <a href="parsing.html#current-input-character">current input character</a> (add 0x0020 to the
                   1824:    character's code point). Switch to the <a href="#doctype-name-state">DOCTYPE name
                   1825:    state</a>.</dd>
                   1826: 
1.51      mike     1827:    <dt>U+0000 NULL</dt>
1.72      mike     1828:    <dd><a href="parsing.html#parse-error">Parse error</a>. Create a new DOCTYPE token. Set the
                   1829:    token's name to a U+FFFD REPLACEMENT CHARACTER character. Switch to
                   1830:    the <a href="#doctype-name-state">DOCTYPE name state</a>.</dd>
1.51      mike     1831: 
1.1       mike     1832:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
                   1833:    <dd><a href="parsing.html#parse-error">Parse error</a>. Create a new DOCTYPE token. Set its
1.14      mike     1834:    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#data-state">data
                   1835:    state</a>. Emit the token.</dd>
1.1       mike     1836: 
                   1837:    <dt>EOF</dt>
1.87      mike     1838:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   1839:    state</a>. Create a new DOCTYPE token. Set its <i>force-quirks
                   1840:    flag</i> to <i>on</i>. Emit the token. Reconsume the EOF
                   1841:    character.</dd>
1.1       mike     1842: 
                   1843:    <dt>Anything else</dt>
                   1844:    <dd>Create a new DOCTYPE token. Set the token's name to the
                   1845:    <a href="parsing.html#current-input-character">current input character</a>. Switch to the <a href="#doctype-name-state">DOCTYPE name
                   1846:    state</a>.</dd>
                   1847: 
1.37      mike     1848:   </dl><h5 id="doctype-name-state"><span class="secno">8.2.4.54 </span><dfn>DOCTYPE name state</dfn></h5>
1.1       mike     1849: 
                   1850:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1851: 
1.73      mike     1852:   <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1       mike     1853:    <dt>U+000A LINE FEED (LF)</dt>
                   1854:    <dt>U+000C FORM FEED (FF)</dt>
1.70      mike     1855:    
1.1       mike     1856:    <dt>U+0020 SPACE</dt>
                   1857:    <dd>Switch to the <a href="#after-doctype-name-state">after DOCTYPE name state</a>.</dd>
                   1858: 
                   1859:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
1.14      mike     1860:    <dd>Switch to the <a href="#data-state">data state</a>. Emit the current DOCTYPE
                   1861:    token.</dd>
1.1       mike     1862: 
                   1863:    <dt>U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z</dt>
                   1864:    <dd>Append the lowercase version of the <a href="parsing.html#current-input-character">current input
                   1865:    character</a> (add 0x0020 to the character's code point) to the
1.14      mike     1866:    current DOCTYPE token's name.</dd>
1.1       mike     1867: 
1.51      mike     1868:    <dt>U+0000 NULL</dt>
                   1869:    <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
                   1870:    character to the current DOCTYPE token's name.</dd>
                   1871: 
1.1       mike     1872:    <dt>EOF</dt>
1.87      mike     1873:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   1874:    state</a>. Set the DOCTYPE token's <i>force-quirks flag</i> to
                   1875:    <i>on</i>. Emit that DOCTYPE token. Reconsume the EOF character.</dd>
1.1       mike     1876: 
                   1877:    <dt>Anything else</dt>
                   1878:    <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
1.14      mike     1879:    DOCTYPE token's name.</dd>
1.1       mike     1880: 
1.37      mike     1881:   </dl><h5 id="after-doctype-name-state"><span class="secno">8.2.4.55 </span><dfn>After DOCTYPE name state</dfn></h5>
1.1       mike     1882: 
                   1883:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1884: 
1.73      mike     1885:   <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1       mike     1886:    <dt>U+000A LINE FEED (LF)</dt>
                   1887:    <dt>U+000C FORM FEED (FF)</dt>
1.70      mike     1888:    
1.1       mike     1889:    <dt>U+0020 SPACE</dt>
1.14      mike     1890:    <dd>Ignore the character.</dd>
1.1       mike     1891: 
                   1892:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
1.14      mike     1893:    <dd>Switch to the <a href="#data-state">data state</a>. Emit the current DOCTYPE
                   1894:    token.</dd>
1.1       mike     1895: 
                   1896:    <dt>EOF</dt>
1.87      mike     1897:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   1898:    state</a>. Set the DOCTYPE token's <i>force-quirks flag</i> to
                   1899:    <i>on</i>. Emit that DOCTYPE token. Reconsume the EOF character.</dd>
1.1       mike     1900: 
                   1901:    <dt>Anything else</dt>
                   1902:    <dd>
                   1903: 
                   1904:     <p>If the six characters starting from the <a href="parsing.html#current-input-character">current input
                   1905:     character</a> are an <a href="infrastructure.html#ascii-case-insensitive">ASCII case-insensitive</a> match
                   1906:     for the word "PUBLIC", then consume those characters and switch to
                   1907:     the <a href="#after-doctype-public-keyword-state">after DOCTYPE public keyword state</a>.</p>
                   1908: 
                   1909:     <p>Otherwise, if the six characters starting from the
                   1910:     <a href="parsing.html#current-input-character">current input character</a> are an <a href="infrastructure.html#ascii-case-insensitive">ASCII
                   1911:     case-insensitive</a> match for the word "SYSTEM", then consume
                   1912:     those characters and switch to the <a href="#after-doctype-system-keyword-state">after DOCTYPE system
                   1913:     keyword state</a>.</p>
                   1914: 
                   1915:     <p>Otherwise, this is a <a href="parsing.html#parse-error">parse error</a>. Set the
                   1916:     DOCTYPE token's <i>force-quirks flag</i> to <i>on</i>. Switch to
                   1917:     the <a href="#bogus-doctype-state">bogus DOCTYPE state</a>.</p>
                   1918: 
                   1919:    </dd>
                   1920: 
1.37      mike     1921:   </dl><h5 id="after-doctype-public-keyword-state"><span class="secno">8.2.4.56 </span><dfn>After DOCTYPE public keyword state</dfn></h5>
1.1       mike     1922: 
                   1923:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1924: 
1.73      mike     1925:   <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1       mike     1926:    <dt>U+000A LINE FEED (LF)</dt>
                   1927:    <dt>U+000C FORM FEED (FF)</dt>
1.70      mike     1928:    
1.1       mike     1929:    <dt>U+0020 SPACE</dt>
                   1930:    <dd>Switch to the <a href="#before-doctype-public-identifier-state">before DOCTYPE public identifier
                   1931:    state</a>.</dd>
                   1932: 
                   1933:    <dt>U+0022 QUOTATION MARK (")</dt>
                   1934:    <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's public
                   1935:    identifier to the empty string (not missing), then switch to the
                   1936:    <a href="#doctype-public-identifier-double-quoted-state">DOCTYPE public identifier (double-quoted) state</a>.</dd>
                   1937: 
                   1938:    <dt>U+0027 APOSTROPHE (')</dt>
                   1939:    <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's public
                   1940:    identifier to the empty string (not missing), then switch to the
                   1941:    <a href="#doctype-public-identifier-single-quoted-state">DOCTYPE public identifier (single-quoted) state</a>.</dd>
                   1942: 
                   1943:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
                   1944:    <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
1.14      mike     1945:    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#data-state">data
                   1946:    state</a>. Emit that DOCTYPE token.</dd>
1.1       mike     1947: 
                   1948:    <dt>EOF</dt>
1.87      mike     1949:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   1950:    state</a>. Set the DOCTYPE token's <i>force-quirks flag</i> to
                   1951:    <i>on</i>. Emit that DOCTYPE token. Reconsume the EOF character.</dd>
1.1       mike     1952: 
                   1953:    <dt>Anything else</dt>
                   1954:    <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
                   1955:    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#bogus-doctype-state">bogus
                   1956:    DOCTYPE state</a>.</dd>
                   1957: 
1.37      mike     1958:   </dl><h5 id="before-doctype-public-identifier-state"><span class="secno">8.2.4.57 </span><dfn>Before DOCTYPE public identifier state</dfn></h5>
1.1       mike     1959: 
                   1960:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1961: 
1.73      mike     1962:   <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1       mike     1963:    <dt>U+000A LINE FEED (LF)</dt>
                   1964:    <dt>U+000C FORM FEED (FF)</dt>
1.70      mike     1965:    
1.1       mike     1966:    <dt>U+0020 SPACE</dt>
1.14      mike     1967:    <dd>Ignore the character.</dd>
1.1       mike     1968: 
                   1969:    <dt>U+0022 QUOTATION MARK (")</dt>
                   1970:    <dd>Set the DOCTYPE token's public identifier to the empty string
                   1971:    (not missing), then switch to the <a href="#doctype-public-identifier-double-quoted-state">DOCTYPE public identifier
                   1972:    (double-quoted) state</a>.</dd>
                   1973: 
                   1974:    <dt>U+0027 APOSTROPHE (')</dt>
                   1975:    <dd>Set the DOCTYPE token's public identifier to the empty string
                   1976:    (not missing), then switch to the <a href="#doctype-public-identifier-single-quoted-state">DOCTYPE public identifier
                   1977:    (single-quoted) state</a>.</dd>
                   1978: 
                   1979:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
                   1980:    <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
1.14      mike     1981:    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#data-state">data
                   1982:    state</a>. Emit that DOCTYPE token.</dd>
1.1       mike     1983: 
                   1984:    <dt>EOF</dt>
1.87      mike     1985:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   1986:    state</a>. Set the DOCTYPE token's <i>force-quirks flag</i> to
                   1987:    <i>on</i>. Emit that DOCTYPE token. Reconsume the EOF character.</dd>
1.1       mike     1988: 
                   1989:    <dt>Anything else</dt>
                   1990:    <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
                   1991:    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#bogus-doctype-state">bogus
                   1992:    DOCTYPE state</a>.</dd>
                   1993: 
1.37      mike     1994:   </dl><h5 id="doctype-public-identifier-double-quoted-state"><span class="secno">8.2.4.58 </span><dfn>DOCTYPE public identifier (double-quoted) state</dfn></h5>
1.1       mike     1995: 
                   1996:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   1997: 
                   1998:   <dl class="switch"><dt>U+0022 QUOTATION MARK (")</dt>
                   1999:    <dd>Switch to the <a href="#after-doctype-public-identifier-state">after DOCTYPE public identifier state</a>.</dd>
                   2000: 
1.51      mike     2001:    <dt>U+0000 NULL</dt>
                   2002:    <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
                   2003:    character to the current DOCTYPE token's public identifier.</dd>
                   2004: 
1.1       mike     2005:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
                   2006:    <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
1.14      mike     2007:    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#data-state">data
                   2008:    state</a>. Emit that DOCTYPE token.</dd>
1.1       mike     2009: 
                   2010:    <dt>EOF</dt>
1.87      mike     2011:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   2012:    state</a>. Set the DOCTYPE token's <i>force-quirks flag</i> to
                   2013:    <i>on</i>. Emit that DOCTYPE token. Reconsume the EOF character.</dd>
1.1       mike     2014: 
                   2015:    <dt>Anything else</dt>
1.51      mike     2016:    <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
                   2017:    DOCTYPE token's public identifier.</dd>
1.1       mike     2018: 
1.37      mike     2019:   </dl><h5 id="doctype-public-identifier-single-quoted-state"><span class="secno">8.2.4.59 </span><dfn>DOCTYPE public identifier (single-quoted) state</dfn></h5>
1.1       mike     2020: 
                   2021:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   2022: 
                   2023:   <dl class="switch"><dt>U+0027 APOSTROPHE (')</dt>
                   2024:    <dd>Switch to the <a href="#after-doctype-public-identifier-state">after DOCTYPE public identifier state</a>.</dd>
                   2025: 
1.51      mike     2026:    <dt>U+0000 NULL</dt>
                   2027:    <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
                   2028:    character to the current DOCTYPE token's public identifier.</dd>
                   2029: 
1.1       mike     2030:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
                   2031:    <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
1.14      mike     2032:    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#data-state">data
                   2033:    state</a>. Emit that DOCTYPE token.</dd>
1.1       mike     2034: 
                   2035:    <dt>EOF</dt>
1.87      mike     2036:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   2037:    state</a>. Set the DOCTYPE token's <i>force-quirks flag</i> to
                   2038:    <i>on</i>. Emit that DOCTYPE token. Reconsume the EOF character.</dd>
1.1       mike     2039: 
                   2040:    <dt>Anything else</dt>
1.51      mike     2041:    <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
                   2042:    DOCTYPE token's public identifier.</dd>
1.1       mike     2043: 
1.37      mike     2044:   </dl><h5 id="after-doctype-public-identifier-state"><span class="secno">8.2.4.60 </span><dfn>After DOCTYPE public identifier state</dfn></h5>
1.1       mike     2045: 
                   2046:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   2047: 
1.73      mike     2048:   <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1       mike     2049:    <dt>U+000A LINE FEED (LF)</dt>
                   2050:    <dt>U+000C FORM FEED (FF)</dt>
1.70      mike     2051:    
1.1       mike     2052:    <dt>U+0020 SPACE</dt>
                   2053:    <dd>Switch to the <a href="#between-doctype-public-and-system-identifiers-state">between DOCTYPE public and system
                   2054:    identifiers state</a>.</dd>
                   2055: 
                   2056:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
1.14      mike     2057:    <dd>Switch to the <a href="#data-state">data state</a>. Emit the current DOCTYPE
                   2058:    token.</dd>
1.1       mike     2059: 
                   2060:    <dt>U+0022 QUOTATION MARK (")</dt>
                   2061:    <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's system
                   2062:    identifier to the empty string (not missing), then switch to the
                   2063:    <a href="#doctype-system-identifier-double-quoted-state">DOCTYPE system identifier (double-quoted) state</a>.</dd>
                   2064: 
                   2065:    <dt>U+0027 APOSTROPHE (')</dt>
                   2066:    <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's system
                   2067:    identifier to the empty string (not missing), then switch to the
                   2068:    <a href="#doctype-system-identifier-single-quoted-state">DOCTYPE system identifier (single-quoted) state</a>.</dd>
                   2069: 
                   2070:    <dt>EOF</dt>
1.87      mike     2071:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   2072:    state</a>. Set the DOCTYPE token's <i>force-quirks flag</i> to
                   2073:    <i>on</i>. Emit that DOCTYPE token. Reconsume the EOF character.</dd>
1.1       mike     2074: 
                   2075:    <dt>Anything else</dt>
                   2076:    <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
                   2077:    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#bogus-doctype-state">bogus
                   2078:    DOCTYPE state</a>.</dd>
                   2079: 
1.37      mike     2080:   </dl><h5 id="between-doctype-public-and-system-identifiers-state"><span class="secno">8.2.4.61 </span><dfn>Between DOCTYPE public and system identifiers state</dfn></h5>
1.1       mike     2081: 
                   2082:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   2083: 
1.73      mike     2084:   <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1       mike     2085:    <dt>U+000A LINE FEED (LF)</dt>
                   2086:    <dt>U+000C FORM FEED (FF)</dt>
1.70      mike     2087:    
1.1       mike     2088:    <dt>U+0020 SPACE</dt>
1.14      mike     2089:    <dd>Ignore the character.</dd>
1.1       mike     2090: 
                   2091:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
1.14      mike     2092:    <dd>Switch to the <a href="#data-state">data state</a>. Emit the current DOCTYPE
                   2093:    token.</dd>
1.1       mike     2094: 
                   2095:    <dt>U+0022 QUOTATION MARK (")</dt>
                   2096:    <dd>Set the DOCTYPE token's system identifier to the empty string
                   2097:    (not missing), then switch to the <a href="#doctype-system-identifier-double-quoted-state">DOCTYPE system identifier
                   2098:    (double-quoted) state</a>.</dd>
                   2099: 
                   2100:    <dt>U+0027 APOSTROPHE (')</dt>
                   2101:    <dd>Set the DOCTYPE token's system identifier to the empty string
                   2102:    (not missing), then switch to the <a href="#doctype-system-identifier-single-quoted-state">DOCTYPE system identifier
                   2103:    (single-quoted) state</a>.</dd>
                   2104: 
                   2105:    <dt>EOF</dt>
1.87      mike     2106:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   2107:    state</a>. Set the DOCTYPE token's <i>force-quirks flag</i> to
                   2108:    <i>on</i>. Emit that DOCTYPE token. Reconsume the EOF character.</dd>
1.1       mike     2109: 
                   2110:    <dt>Anything else</dt>
                   2111:    <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
                   2112:    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#bogus-doctype-state">bogus
                   2113:    DOCTYPE state</a>.</dd>
                   2114: 
1.37      mike     2115:   </dl><h5 id="after-doctype-system-keyword-state"><span class="secno">8.2.4.62 </span><dfn>After DOCTYPE system keyword state</dfn></h5>
1.1       mike     2116: 
                   2117:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   2118: 
1.73      mike     2119:   <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1       mike     2120:    <dt>U+000A LINE FEED (LF)</dt>
                   2121:    <dt>U+000C FORM FEED (FF)</dt>
1.70      mike     2122:    
1.1       mike     2123:    <dt>U+0020 SPACE</dt>
                   2124:    <dd>Switch to the <a href="#before-doctype-system-identifier-state">before DOCTYPE system identifier
                   2125:    state</a>.</dd>
                   2126: 
                   2127:    <dt>U+0022 QUOTATION MARK (")</dt>
                   2128:    <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's system
                   2129:    identifier to the empty string (not missing), then switch to the
                   2130:    <a href="#doctype-system-identifier-double-quoted-state">DOCTYPE system identifier (double-quoted) state</a>.</dd>
                   2131: 
                   2132:    <dt>U+0027 APOSTROPHE (')</dt>
                   2133:    <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's system
                   2134:    identifier to the empty string (not missing), then switch to the
                   2135:    <a href="#doctype-system-identifier-single-quoted-state">DOCTYPE system identifier (single-quoted) state</a>.</dd>
                   2136: 
                   2137:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
                   2138:    <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
1.14      mike     2139:    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#data-state">data
                   2140:    state</a>. Emit that DOCTYPE token.</dd>
1.1       mike     2141: 
                   2142:    <dt>EOF</dt>
1.87      mike     2143:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   2144:    state</a>. Set the DOCTYPE token's <i>force-quirks flag</i> to
                   2145:    <i>on</i>. Emit that DOCTYPE token. Reconsume the EOF character.</dd>
1.1       mike     2146: 
                   2147:    <dt>Anything else</dt>
                   2148:    <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
                   2149:    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#bogus-doctype-state">bogus
                   2150:    DOCTYPE state</a>.</dd>
                   2151: 
1.37      mike     2152:   </dl><h5 id="before-doctype-system-identifier-state"><span class="secno">8.2.4.63 </span><dfn>Before DOCTYPE system identifier state</dfn></h5>
1.1       mike     2153: 
                   2154:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   2155: 
1.73      mike     2156:   <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1       mike     2157:    <dt>U+000A LINE FEED (LF)</dt>
                   2158:    <dt>U+000C FORM FEED (FF)</dt>
1.70      mike     2159:    
1.1       mike     2160:    <dt>U+0020 SPACE</dt>
1.14      mike     2161:    <dd>Ignore the character.</dd>
1.1       mike     2162: 
                   2163:    <dt>U+0022 QUOTATION MARK (")</dt>
                   2164:    <dd>Set the DOCTYPE token's system identifier to the empty string
                   2165:    (not missing), then switch to the <a href="#doctype-system-identifier-double-quoted-state">DOCTYPE system identifier
                   2166:    (double-quoted) state</a>.</dd>
                   2167: 
                   2168:    <dt>U+0027 APOSTROPHE (')</dt>
                   2169:    <dd>Set the DOCTYPE token's system identifier to the empty string
                   2170:    (not missing), then switch to the <a href="#doctype-system-identifier-single-quoted-state">DOCTYPE system identifier
                   2171:    (single-quoted) state</a>.</dd>
                   2172: 
                   2173:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
                   2174:    <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
1.14      mike     2175:    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#data-state">data
                   2176:    state</a>. Emit that DOCTYPE token.</dd>
1.1       mike     2177: 
                   2178:    <dt>EOF</dt>
1.87      mike     2179:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   2180:    state</a>. Set the DOCTYPE token's <i>force-quirks flag</i> to
                   2181:    <i>on</i>. Emit that DOCTYPE token. Reconsume the EOF character.</dd>
1.1       mike     2182: 
                   2183:    <dt>Anything else</dt>
                   2184:    <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
                   2185:    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#bogus-doctype-state">bogus
                   2186:    DOCTYPE state</a>.</dd>
                   2187: 
1.37      mike     2188:   </dl><h5 id="doctype-system-identifier-double-quoted-state"><span class="secno">8.2.4.64 </span><dfn>DOCTYPE system identifier (double-quoted) state</dfn></h5>
1.1       mike     2189: 
                   2190:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   2191: 
                   2192:   <dl class="switch"><dt>U+0022 QUOTATION MARK (")</dt>
                   2193:    <dd>Switch to the <a href="#after-doctype-system-identifier-state">after DOCTYPE system identifier
                   2194:    state</a>.</dd>
                   2195: 
1.51      mike     2196:    <dt>U+0000 NULL</dt>
                   2197:    <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
                   2198:    character to the current DOCTYPE token's system identifier.</dd>
                   2199: 
1.1       mike     2200:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
                   2201:    <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
1.14      mike     2202:    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#data-state">data
                   2203:    state</a>. Emit that DOCTYPE token.</dd>
1.1       mike     2204: 
                   2205:    <dt>EOF</dt>
1.87      mike     2206:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   2207:    state</a>. Set the DOCTYPE token's <i>force-quirks flag</i> to
                   2208:    <i>on</i>. Emit that DOCTYPE token. Reconsume the EOF character.</dd>
1.1       mike     2209: 
                   2210:    <dt>Anything else</dt>
                   2211:    <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
1.14      mike     2212:    DOCTYPE token's system identifier.</dd>
1.1       mike     2213: 
1.37      mike     2214:   </dl><h5 id="doctype-system-identifier-single-quoted-state"><span class="secno">8.2.4.65 </span><dfn>DOCTYPE system identifier (single-quoted) state</dfn></h5>
1.1       mike     2215: 
                   2216:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   2217: 
                   2218:   <dl class="switch"><dt>U+0027 APOSTROPHE (')</dt>
                   2219:    <dd>Switch to the <a href="#after-doctype-system-identifier-state">after DOCTYPE system identifier
                   2220:    state</a>.</dd>
                   2221: 
1.51      mike     2222:    <dt>U+0000 NULL</dt>
                   2223:    <dd><a href="parsing.html#parse-error">Parse error</a>. Append a U+FFFD REPLACEMENT CHARACTER
                   2224:    character to the current DOCTYPE token's system identifier.</dd>
                   2225: 
1.1       mike     2226:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
                   2227:    <dd><a href="parsing.html#parse-error">Parse error</a>. Set the DOCTYPE token's
1.14      mike     2228:    <i>force-quirks flag</i> to <i>on</i>. Switch to the <a href="#data-state">data
                   2229:    state</a>. Emit that DOCTYPE token.</dd>
1.1       mike     2230: 
                   2231:    <dt>EOF</dt>
1.87      mike     2232:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   2233:    state</a>. Set the DOCTYPE token's <i>force-quirks flag</i> to
                   2234:    <i>on</i>. Emit that DOCTYPE token. Reconsume the EOF character.</dd>
1.1       mike     2235: 
                   2236:    <dt>Anything else</dt>
                   2237:    <dd>Append the <a href="parsing.html#current-input-character">current input character</a> to the current
1.14      mike     2238:    DOCTYPE token's system identifier.</dd>
1.1       mike     2239: 
1.37      mike     2240:   </dl><h5 id="after-doctype-system-identifier-state"><span class="secno">8.2.4.66 </span><dfn>After DOCTYPE system identifier state</dfn></h5>
1.1       mike     2241: 
                   2242:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   2243: 
1.73      mike     2244:   <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1       mike     2245:    <dt>U+000A LINE FEED (LF)</dt>
                   2246:    <dt>U+000C FORM FEED (FF)</dt>
1.70      mike     2247:    
1.1       mike     2248:    <dt>U+0020 SPACE</dt>
1.14      mike     2249:    <dd>Ignore the character.</dd>
1.1       mike     2250: 
                   2251:    <dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
1.14      mike     2252:    <dd>Switch to the <a href="#data-state">data state</a>. Emit the current DOCTYPE
                   2253:    token.</dd>
1.1       mike     2254: 
                   2255:    <dt>EOF</dt>
1.87      mike     2256:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#data-state">data
                   2257:    state</a>. Set the DOCTYPE token's <i>force-quirks flag</i> to
                   2258:    <i>on</i>. Emit that DOCTYPE token. Reconsume the EOF character.</dd>
1.1       mike     2259: 
                   2260:    <dt>Anything else</dt>
                   2261:    <dd><a href="parsing.html#parse-error">Parse error</a>. Switch to the <a href="#bogus-doctype-state">bogus DOCTYPE
                   2262:    state</a>. (This does <em>not</em> set the DOCTYPE token's
                   2263:    <i>force-quirks flag</i> to <i>on</i>.)</dd>
                   2264: 
1.37      mike     2265:   </dl><h5 id="bogus-doctype-state"><span class="secno">8.2.4.67 </span><dfn>Bogus DOCTYPE state</dfn></h5>
1.1       mike     2266: 
                   2267:   <p>Consume the <a href="parsing.html#next-input-character">next input character</a>:</p>
                   2268: 
                   2269:   <dl class="switch"><dt>U+003E GREATER-THAN SIGN (&gt;)</dt>
1.14      mike     2270:    <dd>Switch to the <a href="#data-state">data state</a>. Emit the DOCTYPE
                   2271:    token.</dd>
1.1       mike     2272: 
                   2273:    <dt>EOF</dt>
1.87      mike     2274:    <dd>Switch to the <a href="#data-state">data state</a>. Emit the DOCTYPE token.
                   2275:    Reconsume the EOF character.</dd>
1.1       mike     2276: 
                   2277:    <dt>Anything else</dt>
1.14      mike     2278:    <dd>Ignore the character.</dd>
1.1       mike     2279: 
1.37      mike     2280:   </dl><h5 id="cdata-section-state"><span class="secno">8.2.4.68 </span><dfn>CDATA section state</dfn></h5>
1.1       mike     2281: 
1.87      mike     2282:   <p>Switch to the <a href="#data-state">data state</a>.</p>
                   2283: 
1.1       mike     2284:   <p>Consume every character up to the next occurrence of the
                   2285:   three-character sequence U+005D RIGHT SQUARE BRACKET U+005D RIGHT SQUARE
                   2286:   BRACKET U+003E GREATER-THAN SIGN (<code title="">]]&gt;</code>), or the
                   2287:   end of the file (EOF), whichever comes first. Emit a series of
                   2288:   character tokens consisting of all the characters consumed except
                   2289:   the matching three-character sequence at the end (if one was found
1.70      mike     2290:   before the end of the file).</p>
1.1       mike     2291: 
                   2292:   <p>If the end of the file was reached, reconsume the EOF
                   2293:   character.</p>
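
  <p>The CDATA section rule above can be sketched as follows. This is a
  minimal illustration, not part of the specification: the function name
  <code>consume_cdata_section</code>, its string-and-offset interface, and
  its return shape are assumptions made for the example.</p>

```python
def consume_cdata_section(text, pos):
    """Sketch of the CDATA section state (hypothetical helper).

    `text` is the whole input and `pos` the offset of the first character
    inside the CDATA section. Returns (emitted, new_pos, reached_eof);
    each character of `emitted` stands for one character token.
    """
    end = text.find("]]>", pos)
    if end == -1:
        # No terminator before EOF: everything left is emitted, and the
        # caller reconsumes EOF in the data state.
        return text[pos:], len(text), True
    # The matching "]]>" is consumed but not emitted.
    return text[pos:end], end + 3, False
```

  <p>Note that a lone <code title="">]]</code> without a following
  <code title="">&gt;</code> is ordinary content in this sketch, as in the
  rule above.</p>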
                   2294: 
                   2295: 
                   2296: 
1.37      mike     2297:   <h5 id="tokenizing-character-references"><span class="secno">8.2.4.69 </span>Tokenizing character references</h5>
1.1       mike     2298: 
                   2299:   <p>This section defines how to <dfn id="consume-a-character-reference">consume a character
                   2300:   reference</dfn>. This definition is used when parsing character
                   2301:   references <a href="#character-reference-in-data-state" title="character reference in data state">in
                   2302:   text</a> and <a href="#character-reference-in-attribute-value-state" title="character reference in attribute value
                   2303:   state">in attributes</a>.</p>
                   2304: 
                   2305:   <p>The behavior depends on the identity of the next character (the
                   2306:   one immediately after the U+0026 AMPERSAND character):</p>
                   2307: 
1.73      mike     2308:   <dl class="switch"><dt>U+0009 CHARACTER TABULATION (tab)</dt>
1.1       mike     2309:    <dt>U+000A LINE FEED (LF)</dt>
                   2310:    <dt>U+000C FORM FEED (FF)</dt>
1.70      mike     2311:    
1.1       mike     2312:    <dt>U+0020 SPACE</dt>
                   2313:    <dt>U+003C LESS-THAN SIGN</dt>
                   2314:    <dt>U+0026 AMPERSAND</dt>
                   2315:    <dt>EOF</dt>
                   2316:    <dt>The <dfn id="additional-allowed-character">additional allowed character</dfn>, if there is one</dt>
                   2317: 
                   2318:    <dd>Not a character reference. No characters are consumed, and
                   2319:    nothing is returned. (This is not an error, either.)</dd>
                   2320: 
                   2321: 
                   2322:    <dt>U+0023 NUMBER SIGN (#)</dt>
                   2323: 
                   2324:    <dd>
                   2325: 
                   2326:     <p>Consume the U+0023 NUMBER SIGN.</p>
                   2327: 
                   2328:     <p>The behavior further depends on the character after the U+0023
                   2329:     NUMBER SIGN:</p>
                   2330: 
                   2331:     <dl class="switch"><dt>U+0078 LATIN SMALL LETTER X</dt>
                   2332:      <dt>U+0058 LATIN CAPITAL LETTER X</dt>
                   2333: 
                   2334:      <dd>
                   2335: 
                   2336:       <p>Consume the X.</p>
                   2337: 
                   2338:       <p>Follow the steps below, but using the range of characters
                   2339:       U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), U+0061 LATIN
                   2340:       SMALL LETTER A to U+0066 LATIN SMALL LETTER F, and U+0041 LATIN
                   2341:       CAPITAL LETTER A to U+0046 LATIN CAPITAL LETTER F (in other
                   2342:       words, 0-9, A-F, a-f).</p>
                   2343: 
                   2344:       <p>When it comes to interpreting the number, interpret it as a
                   2345:       hexadecimal number.</p>
                   2346: 
                   2347:      </dd>
                   2348: 
                   2349: 
                   2350:      <dt>Anything else</dt>
                   2351: 
                   2352:      <dd>
                   2353: 
                   2354:       <p>Follow the steps below, but using the range of characters
                   2355:       U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9).</p>
                   2356: 
                   2357:       <p>When it comes to interpreting the number, interpret it as a
                   2358:       decimal number.</p>
                   2359: 
                   2360:      </dd>
                   2361: 
                   2362:     </dl><p>Consume as many characters as match the range of characters
                   2363:     given above.</p>
                   2364: 
                   2365:     <p>If no characters match the range, then don't consume any
                   2366:     characters (and unconsume the U+0023 NUMBER SIGN character and, if
                   2367:     appropriate, the X character). This is a <a href="parsing.html#parse-error">parse
                   2368:     error</a>; nothing is returned.</p>
                   2369: 
                   2370:     <p>Otherwise, if the next character is a U+003B SEMICOLON, consume
                   2371:     that too. If it isn't, there is a <a href="parsing.html#parse-error">parse
                   2372:     error</a>.</p>
                   2373: 
                   2374:     <p>If one or more characters match the range, then take them all
                   2375:     and interpret the string of characters as a number (either
                   2376:     hexadecimal or decimal as appropriate).</p>
                   2377: 
                   2378:     <p>If that number is one of the numbers in the first column of the
                   2379:     following table, then this is a <a href="parsing.html#parse-error">parse error</a>. Find the
                   2380:     row with that number in the first column, and return a character
                   2381:     token for the Unicode character given in the second column of that
                   2382:     row.</p>
                   2383: 
1.26      mike     2384:     <table id="table-charref-overrides"><thead><tr><th>Number </th><th colspan="2">Unicode character
1.1       mike     2385:      </th></tr></thead><tbody><tr><td>0x00 </td><td>U+FFFD </td><td>REPLACEMENT CHARACTER
                   2386:       </td></tr><tr><td>0x0D </td><td>U+000D </td><td>CARRIAGE RETURN (CR)
                   2387:       </td></tr><tr><td>0x80 </td><td>U+20AC </td><td>EURO SIGN (&#8364;)
                   2388:       </td></tr><tr><td>0x81 </td><td>U+0081 </td><td>&lt;control&gt;
                   2389:       </td></tr><tr><td>0x82 </td><td>U+201A </td><td>SINGLE LOW-9 QUOTATION MARK (&#8218;)
                   2390:       </td></tr><tr><td>0x83 </td><td>U+0192 </td><td>LATIN SMALL LETTER F WITH HOOK (&#402;)
                   2391:       </td></tr><tr><td>0x84 </td><td>U+201E </td><td>DOUBLE LOW-9 QUOTATION MARK (&#8222;)
                   2392:       </td></tr><tr><td>0x85 </td><td>U+2026 </td><td>HORIZONTAL ELLIPSIS (&#8230;)
                   2393:       </td></tr><tr><td>0x86 </td><td>U+2020 </td><td>DAGGER (&#8224;)
                   2394:       </td></tr><tr><td>0x87 </td><td>U+2021 </td><td>DOUBLE DAGGER (&#8225;)
                   2395:       </td></tr><tr><td>0x88 </td><td>U+02C6 </td><td>MODIFIER LETTER CIRCUMFLEX ACCENT (&#710;)
                   2396:       </td></tr><tr><td>0x89 </td><td>U+2030 </td><td>PER MILLE SIGN (&#8240;)
                   2397:       </td></tr><tr><td>0x8A </td><td>U+0160 </td><td>LATIN CAPITAL LETTER S WITH CARON (&#352;)
                   2398:       </td></tr><tr><td>0x8B </td><td>U+2039 </td><td>SINGLE LEFT-POINTING ANGLE QUOTATION MARK (&#8249;)
                   2399:       </td></tr><tr><td>0x8C </td><td>U+0152 </td><td>LATIN CAPITAL LIGATURE OE (&#338;)
                   2400:       </td></tr><tr><td>0x8D </td><td>U+008D </td><td>&lt;control&gt;
                   2401:       </td></tr><tr><td>0x8E </td><td>U+017D </td><td>LATIN CAPITAL LETTER Z WITH CARON (&#381;)
                   2402:       </td></tr><tr><td>0x8F </td><td>U+008F </td><td>&lt;control&gt;
                   2403:       </td></tr><tr><td>0x90 </td><td>U+0090 </td><td>&lt;control&gt;
                   2404:       </td></tr><tr><td>0x91 </td><td>U+2018 </td><td>LEFT SINGLE QUOTATION MARK (&#8216;)
                   2405:       </td></tr><tr><td>0x92 </td><td>U+2019 </td><td>RIGHT SINGLE QUOTATION MARK (&#8217;)
                   2406:       </td></tr><tr><td>0x93 </td><td>U+201C </td><td>LEFT DOUBLE QUOTATION MARK (&#8220;)
                   2407:       </td></tr><tr><td>0x94 </td><td>U+201D </td><td>RIGHT DOUBLE QUOTATION MARK (&#8221;)
                   2408:       </td></tr><tr><td>0x95 </td><td>U+2022 </td><td>BULLET (&#8226;)
                   2409:       </td></tr><tr><td>0x96 </td><td>U+2013 </td><td>EN DASH (&#8211;)
                   2410:       </td></tr><tr><td>0x97 </td><td>U+2014 </td><td>EM DASH (&#8212;)
                   2411:       </td></tr><tr><td>0x98 </td><td>U+02DC </td><td>SMALL TILDE (&#732;)
                   2412:       </td></tr><tr><td>0x99 </td><td>U+2122 </td><td>TRADE MARK SIGN (&#8482;)
                   2413:       </td></tr><tr><td>0x9A </td><td>U+0161 </td><td>LATIN SMALL LETTER S WITH CARON (&#353;)
                   2414:       </td></tr><tr><td>0x9B </td><td>U+203A </td><td>SINGLE RIGHT-POINTING ANGLE QUOTATION MARK (&#8250;)
                   2415:       </td></tr><tr><td>0x9C </td><td>U+0153 </td><td>LATIN SMALL LIGATURE OE (&#339;)
                   2416:       </td></tr><tr><td>0x9D </td><td>U+009D </td><td>&lt;control&gt;
                   2417:       </td></tr><tr><td>0x9E </td><td>U+017E </td><td>LATIN SMALL LETTER Z WITH CARON (&#382;)
                   2418:       </td></tr><tr><td>0x9F </td><td>U+0178 </td><td>LATIN CAPITAL LETTER Y WITH DIAERESIS (&#376;)
1.70      mike     2419:     </td></tr></tbody></table><p>Otherwise, if the number is in the range 0xD800 to 0xDFFF or is greater than 0x10FFFF, then this is a
1.61      mike     2420:     <a href="parsing.html#parse-error">parse error</a>. Return a U+FFFD REPLACEMENT
                   2421:     CHARACTER.</p>
1.1       mike     2422: 
                   2423:     <p>Otherwise, return a character token for the Unicode character
                   2424:     whose code point is that number.
                   2425: 
1.90      mike     2426:     
                   2427:    If the number is in the range 0x0001 to 0x0008, 0x000E to 0x001F, 0x007F to 0x009F, 0xFDD0 to
1.1       mike     2428:     0xFDEF, or is one of 0x000B, 0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF,
                   2429:     0x2FFFE, 0x2FFFF, 0x3FFFE, 0x3FFFF, 0x4FFFE, 0x4FFFF, 0x5FFFE,
                   2430:     0x5FFFF, 0x6FFFE, 0x6FFFF, 0x7FFFE, 0x7FFFF, 0x8FFFE, 0x8FFFF,
                   2431:     0x9FFFE, 0x9FFFF, 0xAFFFE, 0xAFFFF, 0xBFFFE, 0xBFFFF, 0xCFFFE,
                   2432:     0xCFFFF, 0xDFFFE, 0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE, 0xFFFFF,
                   2433:     0x10FFFE, or 0x10FFFF, then this is a <a href="parsing.html#parse-error">parse
                   2434:     error</a>.</p>
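
    <div class="example">

     <p>The numeric branch above can be sketched in Python as follows.
     This is an illustrative sketch, not normative text. The name
     <code title="">resolve_numeric_reference</code> and the returned
     (character, parse error flag) pair are hypothetical. The
     replacement table holds only the rows 0x93 to 0x9F, and the sketch
     assumes that a match in that table is itself a parse error.</p>

```python
# Partial replacement table: only rows 0x93 to 0x9F are reproduced here.
REPLACEMENTS = {
    0x93: 0x201C, 0x94: 0x201D, 0x95: 0x2022, 0x96: 0x2013,
    0x97: 0x2014, 0x98: 0x02DC, 0x99: 0x2122, 0x9A: 0x0161,
    0x9B: 0x203A, 0x9C: 0x0153, 0x9D: 0x009D, 0x9E: 0x017E,
    0x9F: 0x0178,
}


def resolve_numeric_reference(number):
    """Return (character, is_parse_error) for an already-parsed number."""
    # Assumption: a hit in the replacement table is a parse error.
    if number in REPLACEMENTS:
        return chr(REPLACEMENTS[number]), True

    # Surrogates and values past the Unicode range become U+FFFD.
    if 0xD800 <= number <= 0xDFFF or number > 0x10FFFF:
        return "\uFFFD", True

    # Otherwise the character is returned as-is; some code points are
    # still flagged as parse errors.
    is_error = (
        0x0001 <= number <= 0x0008
        or 0x000E <= number <= 0x001F
        or 0x007F <= number <= 0x009F
        or 0xFDD0 <= number <= 0xFDEF
        or number == 0x000B
        # 0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF, ..., 0x10FFFE, 0x10FFFF
        or (number & 0xFFFE) == 0xFFFE
    )
    return chr(number), is_error
```

     <p>The bit mask is a compact way to express the long list of
     code points ending in FFFE or FFFF; it relies on the earlier
     range check having already excluded values above 0x10FFFF.</p>

    </div>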
                   2435: 
                   2436:    </dd>
                   2437: 
                   2438: 
                   2439:    <dt>Anything else</dt>
                   2440: 
                   2441:    <dd>
                   2442: 
                   2443:     <p>Consume the maximum number of characters possible, with the
                   2444:     consumed characters matching one of the identifiers in the first
                   2445:     column of the <a href="named-character-references.html#named-character-references">named character references</a> table (in a
                   2446:     <a href="infrastructure.html#case-sensitive">case-sensitive</a> manner).</p>
                   2447: 
                   2448:     <p>If no match can be made, then no characters are consumed, and
                   2449:     nothing is returned. In this case, if the characters after the
                   2450:     U+0026 AMPERSAND character (&amp;) consist of a sequence of one or
                   2451:     more characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT
                   2452:     NINE (9), U+0061 LATIN SMALL LETTER A to U+007A LATIN SMALL LETTER
                   2453:     Z, and U+0041 LATIN CAPITAL LETTER A to U+005A LATIN CAPITAL
                   2454:     LETTER Z, followed by a U+003B SEMICOLON character (;), then this
                   2455:     is a <a href="parsing.html#parse-error">parse error</a>.</p>
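
    <div class="example">

     <p>The parse error condition for an unmatched name can be sketched
     as a single anchored pattern. This is an illustrative sketch; the
     function name <code title="">unmatched_reference_is_parse_error</code>
     is hypothetical, and its argument is assumed to be the input that
     follows the U+0026 AMPERSAND character, in a case where no named
     character reference matched.</p>

```python
import re

# One or more ASCII digits or letters, immediately followed by ";".
ALNUM_THEN_SEMICOLON = re.compile(r"[0-9A-Za-z]+;")


def unmatched_reference_is_parse_error(after_ampersand):
    """Apply only after matching against the named reference table failed."""
    return ALNUM_THEN_SEMICOLON.match(after_ampersand) is not None
```

    </div>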
                   2456: 
                   2457:     <p>If the character reference is being consumed <a href="#character-reference-in-attribute-value-state" title="character reference in attribute value state">as part of an
                   2458:     attribute</a>, and the last character matched is not a U+003B
                   2459:     SEMICOLON character (;), and the next character is either a U+003D
                   2460:     EQUALS SIGN character (=) or in the range U+0030 DIGIT ZERO (0) to
                   2461:     U+0039 DIGIT NINE (9), U+0041 LATIN CAPITAL LETTER A to U+005A
                   2462:     LATIN CAPITAL LETTER Z, or U+0061 LATIN SMALL LETTER A to U+007A
                   2463:     LATIN SMALL LETTER Z, then, for historical reasons, all the
                   2464:     characters that were matched after the U+0026 AMPERSAND character
                   2465:     (&amp;) must be unconsumed, and nothing is returned.</p>
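
    <div class="example">

     <p>The historical attribute rule can be sketched as a predicate
     over the matched name and the character that follows it. This is
     an illustrative sketch; the function name
     <code title="">must_unconsume</code> and its parameters are
     hypothetical, and an empty string is assumed to stand for the end
     of the input.</p>

```python
import string

# ASCII digits and letters only; str.isalnum() would also accept
# non-ASCII characters, which the rule does not cover.
ASCII_ALNUM = set(string.ascii_letters + string.digits)


def must_unconsume(matched_name, next_char, in_attribute):
    """Return True when the matched characters must be put back."""
    if not in_attribute:
        return False
    if matched_name.endswith(";"):
        return False
    return next_char == "=" or next_char in ASCII_ALNUM
```

     <p>Under these assumptions, an attribute value such as
     <code title="">?a=1&amp;not=2</code> keeps its literal
     <code title="">&amp;not</code>, because the matched name lacks a
     semicolon and is followed by an equals sign.</p>

    </div>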
1.70      mike     2466:     
1.1       mike     2467: 
                   2468:     <p>Otherwise, a character reference is parsed. If the last
                   2469:     character matched is not a U+003B SEMICOLON character (;), there
                   2470:     is a <a href="parsing.html#parse-error">parse error</a>.</p>
                   2471: 
1.41      mike     2472:     <p>Return one or two character tokens for the character(s)
                   2473:     corresponding to the character reference name (as given by the
                   2474:     second column of the <a href="named-character-references.html#named-character-references">named character references</a>
                   2475:     table).</p>
1.1       mike     2476: 
                   2477:     <div class="example">
                   2478: 
                   2479:      <p>If the markup contains (not in an attribute) the string <code title="">I'm &amp;notit; I tell you</code>, the character
                   2480:      reference is parsed as "not", as in, <code title="">I'm &#172;it;
                   2481:      I tell you</code> (and this is a parse error). But if the markup
                   2482:      were <code title="">I'm &amp;notin; I tell you</code>, the
                   2483:      character reference would be parsed as "notin;", resulting in
                   2484:      <code title="">I'm &#8713; I tell you</code> (and no parse
                   2485:      error).</p>
                   2486: 
                   2487:     </div>
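
    <div class="example">

     <p>The longest-match behavior in the example above can be
     sketched as follows. This is an illustrative sketch, not a full
     implementation: the function name
     <code title="">consume_named_reference</code> is hypothetical, the
     table is a tiny assumed subset of the named character references
     table, and the attribute-specific rule is deliberately left
     out.</p>

```python
# Tiny assumed subset of the named character references table.
NAMED_REFERENCES = {
    "amp": "&",
    "amp;": "&",
    "not": "\u00AC",
    "not;": "\u00AC",
    "notin;": "\u2209",
}


def consume_named_reference(after_ampersand):
    """Return (characters, consumed_length, is_parse_error), or None.

    Matching is case-sensitive and prefers the longest identifier.
    """
    longest = max(len(name) for name in NAMED_REFERENCES)
    # Try the longest candidate prefix first, then shrink it.
    for length in range(min(longest, len(after_ampersand)), 0, -1):
        candidate = after_ampersand[:length]
        if candidate in NAMED_REFERENCES:
            missing_semicolon = not candidate.endswith(";")
            return NAMED_REFERENCES[candidate], length, missing_semicolon
    # No match: nothing is consumed and nothing is returned.
    return None
```

     <p>With this subset, the input <code title="">notit; I tell
     you</code> matches "not" with a parse error, while
     <code title="">notin; I tell you</code> matches "notin;" without
     one.</p>

    </div>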
                   2488: 
                   2489:    </dd>
                   2490: 
                   2491:   </dl></div></body></html>
