
Commit 5cefb6c

bpo-25324: Move the description of tokenize tokens to token.rst. (#1911)
1 parent 6260d9f commit 5cefb6c

2 files changed: +39 −39 lines changed

Doc/library/token.rst (+30 −11)

@@ -101,18 +101,37 @@ The token constants are:
           AWAIT
           ASYNC
           ERRORTOKEN
-          COMMENT
-          NL
-          ENCODING
           N_TOKENS
           NT_OFFSET
 
-.. versionchanged:: 3.5
-   Added :data:`AWAIT` and :data:`ASYNC` tokens. Starting with
-   Python 3.7, "async" and "await" will be tokenized as :data:`NAME`
-   tokens, and :data:`AWAIT` and :data:`ASYNC` will be removed.
 
-.. versionchanged:: 3.7
-   Added :data:`COMMENT`, :data:`NL` and :data:`ENCODING` to bring
-   the tokens in the C code in line with the tokens needed in
-   :mod:`tokenize` module. These tokens aren't used by the C tokenizer.
+The following token type values aren't used by the C tokenizer but are needed for
+the :mod:`tokenize` module.
+
+.. data:: COMMENT
+
+   Token value used to indicate a comment.
+
+
+.. data:: NL
+
+   Token value used to indicate a non-terminating newline. The
+   :data:`NEWLINE` token indicates the end of a logical line of Python code;
+   ``NL`` tokens are generated when a logical line of code is continued over
+   multiple physical lines.
+
+
+.. data:: ENCODING
+
+   Token value that indicates the encoding used to decode the source bytes
+   into text. The first token returned by :func:`tokenize.tokenize` will
+   always be an ``ENCODING`` token.
+
+
+.. versionchanged:: 3.5
+   Added :data:`AWAIT` and :data:`ASYNC` tokens. Starting with
+   Python 3.7, "async" and "await" will be tokenized as :data:`NAME`
+   tokens, and :data:`AWAIT` and :data:`ASYNC` will be removed.
+
+.. versionchanged:: 3.7
+   Added :data:`COMMENT`, :data:`NL` and :data:`ENCODING` tokens.
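
For orientation (not part of the commit): a minimal sketch of where the three relocated values appear in a token stream, using only documented :mod:`tokenize` API; the sample source string is invented for illustration.

    import io
    import tokenize

    # A bracketed expression continued over two physical lines, plus a
    # comment.  The continuation line ends in NL rather than NEWLINE,
    # the comment arrives as a COMMENT token, and the first token the
    # generator yields is always ENCODING.
    source = b"x = [1,\n     2]  # trailing comment\n"
    for tok in tokenize.tokenize(io.BytesIO(source).readline):
        print(tokenize.tok_name[tok.type], repr(tok.string))

The printed names, in order: ENCODING, NAME, OP, OP, NUMBER, OP, NL, NUMBER, OP, COMMENT, NEWLINE, ENDMARKER.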

Doc/library/tokenize.rst (+9 −28)

@@ -17,7 +17,7 @@ as well, making it useful for implementing "pretty-printers," including
 colorizers for on-screen displays.
 
 To simplify token stream handling, all :ref:`operators` and :ref:`delimiters`
-tokens are returned using the generic :data:`token.OP` token type. The exact
+tokens are returned using the generic :data:`~token.OP` token type. The exact
 type can be determined by checking the ``exact_type`` property on the
 :term:`named tuple` returned from :func:`tokenize.tokenize`.
 
@@ -44,7 +44,7 @@ The primary entry point is a :term:`generator`:
 
 The returned :term:`named tuple` has an additional property named
 ``exact_type`` that contains the exact operator type for
-:data:`token.OP` tokens. For all other token types ``exact_type``
+:data:`~token.OP` tokens. For all other token types ``exact_type``
 equals the named tuple ``type`` field.
 
 .. versionchanged:: 3.1
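
As a reader's aid (not in the diff): a small sketch of the ``exact_type`` behavior described above; the one-line source is invented.

    import io
    import tokenize

    # For OP tokens, ``type`` is the generic token.OP while ``exact_type``
    # names the specific operator; for every other token the two match.
    for tok in tokenize.tokenize(io.BytesIO(b"a + b\n").readline):
        print(tokenize.tok_name[tok.type], tokenize.tok_name[tok.exact_type])

The ``+`` operator prints as ``OP PLUS``; all other rows print the same name twice.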
@@ -58,26 +58,7 @@ The primary entry point is a :term:`generator`:
 
 
 All constants from the :mod:`token` module are also exported from
-:mod:`tokenize`, as are three additional token type values:
-
-.. data:: COMMENT
-
-   Token value used to indicate a comment.
-
-
-.. data:: NL
-
-   Token value used to indicate a non-terminating newline. The NEWLINE token
-   indicates the end of a logical line of Python code; NL tokens are generated
-   when a logical line of code is continued over multiple physical lines.
-
-
-.. data:: ENCODING
-
-   Token value that indicates the encoding used to decode the source bytes
-   into text. The first token returned by :func:`.tokenize` will always be an
-   ENCODING token.
-
+:mod:`tokenize`.
 
 Another function is provided to reverse the tokenization process. This is
 useful for creating tools that tokenize a script, modify the token stream, and
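
A quick sanity check of the re-export relationship that hunk relies on (my illustration, not the patch's): after this change the :mod:`tokenize`-only values exist on the :mod:`token` side as well and compare equal.

    import token
    import tokenize

    # Every constant is importable from either module; with this commit
    # COMMENT, NL and ENCODING are defined in token too.
    assert tokenize.NAME == token.NAME
    assert tokenize.COMMENT == token.COMMENT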
@@ -96,8 +77,8 @@ write back the modified script.
     token type and token string as the spacing between tokens (column
     positions) may change.
 
-    It returns bytes, encoded using the ENCODING token, which is the first
-    token sequence output by :func:`.tokenize`.
+    It returns bytes, encoded using the :data:`~token.ENCODING` token, which
+    is the first token sequence output by :func:`.tokenize`.
 
 
 :func:`.tokenize` needs to detect the encoding of source files it tokenizes. The
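
A round-trip sketch of the behavior the rewritten sentence describes (sample source invented): with full five-tuples, :func:`tokenize.untokenize` returns bytes encoded per the leading ENCODING token.

    import io
    import tokenize

    source = b"x = 1 + 2\n"
    tokens = list(tokenize.tokenize(io.BytesIO(source).readline))
    result = tokenize.untokenize(tokens)
    # untokenize consumes the leading ENCODING token and uses it to
    # encode its output, so the result is bytes, not str.
    assert isinstance(result, bytes)
    assert result == source  # exact round trip with full five-tuples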
@@ -115,7 +96,7 @@ function it uses to do this is available:
 
     It detects the encoding from the presence of a UTF-8 BOM or an encoding
     cookie as specified in :pep:`263`. If both a BOM and a cookie are present,
-    but disagree, a SyntaxError will be raised. Note that if the BOM is found,
+    but disagree, a :exc:`SyntaxError` will be raised. Note that if the BOM is found,
     ``'utf-8-sig'`` will be returned as an encoding.
 
     If no encoding is specified, then the default of ``'utf-8'`` will be
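
To make the :exc:`SyntaxError` fix concrete, a sketch of :func:`tokenize.detect_encoding` on invented buffers: one with a :pep:`263` cookie, one where a UTF-8 BOM and the cookie disagree.

    import io
    import tokenize

    blob = b"# -*- coding: latin-1 -*-\nname = 'caf\xe9'\n"
    encoding, lines = tokenize.detect_encoding(io.BytesIO(blob).readline)
    print(encoding)  # 'iso-8859-1' -- the normalized name for latin-1
    print(lines)     # the raw line(s) consumed while detecting

    # A UTF-8 BOM plus a non-UTF-8 cookie is the disagreement case:
    bom_conflict = b"\xef\xbb\xbf# -*- coding: latin-1 -*-\npass\n"
    try:
        tokenize.detect_encoding(io.BytesIO(bom_conflict).readline)
    except SyntaxError as exc:
        print("conflict:", exc)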
@@ -147,8 +128,8 @@ function it uses to do this is available:
     3
 
 Note that unclosed single-quoted strings do not cause an error to be
-raised. They are tokenized as ``ERRORTOKEN``, followed by the tokenization of
-their contents.
+raised. They are tokenized as :data:`~token.ERRORTOKEN`, followed by the
+tokenization of their contents.
 
 
 .. _tokenize-cli:
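
A sketch of the unclosed-string behavior just described (invented one-liner; this reflects the tokenizer documented here, as later CPython versions raise instead):

    import io
    import tokenize

    # The lone opening quote comes back as ERRORTOKEN, then 'abc' is
    # tokenized as an ordinary NAME.
    for tok in tokenize.tokenize(io.BytesIO(b"x = 'abc\n").readline):
        print(tokenize.tok_name[tok.type], repr(tok.string))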
@@ -260,7 +241,7 @@ the name of the token, and the final column is the value of the token (if any)
     4,11-4,12: NEWLINE '\n'
     5,0-5,0: ENDMARKER ''
 
-The exact token type names can be displayed using the ``-e`` option:
+The exact token type names can be displayed using the :option:`-e` option:
 
 .. code-block:: sh
 