|
110 | 110 | \indextext{line splicing}% |
111 | 111 | If the first translation character is \unicode{feff}{byte order mark}, |
112 | 112 | it is deleted. |
113 | | -Each sequence of a backslash character (\textbackslash) |
| 113 | +Each sequence of a backslash character (\unicode{005c}{reverse solidus}) |
114 | 114 | immediately followed by |
115 | | -zero or more whitespace characters other than new-line followed by |
| 115 | +zero or more \grammarterm{whitespace-character}s other than new-line followed by |
116 | 116 | a new-line character is deleted, splicing |
117 | 117 | physical source lines to form \defnx{logical source lines}{source line!logical}. Only the last |
118 | 118 | backslash on any physical source line shall be eligible for being part |
|
126 | 126 | shall be processed as if an additional new-line character were appended |
127 | 127 | to the file. |
128 | 128 |
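The splicing rule above can be illustrated in ordinary C++. This is a minimal sketch that assumes a C++17 or later compiler; the names `spliced_value` and `spliced_text` are hypothetical. Each backslash is the last character on its physical line, so phase 2 joins the lines before any tokens are formed, even in the middle of an identifier or a string literal.

```cpp
#include <string_view>

// Phase 2 deletes each backslash that is immediately followed by a new-line,
// so the next two physical lines form one logical line:
//     constexpr int spliced_value = 42;
constexpr int spliced_\
value = 42;

// A splice can also join the halves of a string literal. The continuation
// starts in column 0 because any leading spaces would become part of it.
constexpr std::string_view spliced_text = "hello, \
world";

static_assert(spliced_value == 42);
static_assert(spliced_text == "hello, world");
```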
|
129 | | -\item The source file is decomposed into preprocessing |
130 | | -tokens\iref{lex.pptoken} and sequences of whitespace characters |
131 | | -(including comments). A source file shall not end in a partial |
| 129 | +\item |
| 130 | +\indextext{whitespace}% |
| 131 | +\indextext{comment}% |
| 132 | +\indextext{token!preprocessing}% |
| 133 | +The source file is decomposed into preprocessing |
| 134 | +tokens\iref{lex.pptoken} and whitespace\iref{lex.whitespace} (sequences of \grammarterm{whitespace-character}s |
| 135 | +and comments). A source file shall not end in a partial |
132 | 136 | preprocessing token or in a partial comment. |
133 | 137 | \begin{footnote} |
134 | 138 | A partial preprocessing |
|
140 | 144 | would arise from a source file ending with an unclosed \tcode{/*} |
141 | 145 | comment. |
142 | 146 | \end{footnote} |
143 | | -Each comment\iref{lex.comment} is replaced by one space character. New-line characters are |
144 | | -retained. Whether each nonempty sequence of whitespace characters other |
145 | | -than new-line is retained or replaced by one space character is |
| 147 | +Each comment\iref{lex.comment} is replaced by one \unicode{0020}{space} character. New-line characters are |
| 148 | +retained. Whether each nonempty sequence of \grammarterm{whitespace-character}s other |
| 149 | +than new-line is retained or replaced by one \unicode{0020}{space} character is |
146 | 150 | unspecified. |
147 | 151 | As characters from the source file are consumed |
148 | 152 | to form the next preprocessing token |
|
178 | 182 | \item |
179 | 183 | Adjacent \grammarterm{string-literal} tokens are concatenated\iref{lex.string}. |
180 | 184 |
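Phases 3 and 6 can be observed together through the preprocessor's `#` operator. In this sketch (the macro `STRINGIZE` and the variable names are hypothetical), the comment inside the macro argument is replaced by one space in phase 3, stringizing turns the whitespace between the argument's tokens into a single space, and phase 6 concatenates the adjacent string literals.

```cpp
#include <string_view>

// Hypothetical helper macro: turns its argument into a string literal.
#define STRINGIZE(x) #x

// Phase 3 replaces the comment with one space character, so the argument
// is `a` and `b` separated by whitespace; # turns that whitespace into a
// single space. Phase 6 then concatenates "a b" and "!".
constexpr std::string_view joined = STRINGIZE(a/* comment */b) "!";

// Any run of whitespace between argument tokens also becomes one space.
constexpr std::string_view collapsed = STRINGIZE(a     b);
```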
|
181 | | -\item Whitespace characters separating tokens are no longer |
182 | | -significant. Each preprocessing token is converted into a |
183 | | -token\iref{lex.token}. The resulting tokens |
184 | | -constitute a \defn{translation unit} and |
| 185 | +\item |
| 186 | +Each preprocessing token is converted into a token\iref{lex.token}. |
| 187 | +Any \grammarterm{whitespace-character}s separating tokens are no longer significant. |
| 188 | +The resulting tokens constitute a \defn{translation unit} and |
185 | 189 | are syntactically and |
186 | 190 | semantically analyzed and translated. |
187 | 191 | \begin{note} |
|
467 | 471 | None of these names or aliases have leading or trailing spaces. |
468 | 472 | \end{note} |
469 | 473 |
|
470 | | -\rSec1[lex.comment]{Comments} |
| 474 | +\rSec1[lex.whitespace]{Whitespace} |
| 475 | +\indextext{whitespace|(}% |
| 476 | + |
| 477 | +\rSec2[lex.whitechar]{Whitespace characters} |
| 478 | + |
| 479 | +\indextext{character!whitespace|(}% |
| 480 | +\begin{bnf} |
| 481 | +\nontermdef{whitespace-character}\br |
| 482 | + \unicode{0009}{character tabulation}\br |
| 483 | + \textnormal{new-line}\br |
| 484 | + \unicode{000b}{line tabulation}\br |
| 485 | + \unicode{000c}{form feed}\br |
| 486 | +  \unicode{0020}{space} |
| 487 | +\end{bnf} |
| 488 | + |
| 489 | +\pnum |
| 490 | +\begin{note} |
| 491 | +Whitespace characters are used to separate elements of the \Cpp grammar. |
| 492 | +\end{note} |
| 493 | +\indextext{character!whitespace|)} |
| 494 | + |
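The whitespace-character grammar above can be mirrored by a small classifier. This is a sketch, not part of the proposed wording: the function name `is_whitespace_character` is hypothetical, and it assumes the abstract new-line has already been mapped to `'\n'`.

```cpp
// Hypothetical helper: true exactly for the whitespace-character
// alternatives, assuming new-line is represented by '\n'.
constexpr bool is_whitespace_character(char c) {
    switch (c) {
    case '\t':  // U+0009 character tabulation
    case '\n':  // new-line (assumed to be '\n')
    case '\v':  // U+000B line tabulation
    case '\f':  // U+000C form feed
    case ' ':   // U+0020 space
        return true;
    default:
        return false;
    }
}
```

Unlike `std::isspace` in the `"C"` locale, the sketch rejects `'\r'`, since U+000D carriage return is not among the listed alternatives.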
| 495 | +\rSec2[lex.comment]{Comments} |
471 | 496 |
|
472 | 497 | \pnum |
473 | 498 | \indextext{comment|(}% |
|
477 | 502 | characters \tcode{*/}. These comments do not nest. |
478 | 503 | \indextext{comment!\tcode{//}}% |
479 | 504 | The characters \tcode{//} start a comment, which terminates immediately before the |
480 | | -next new-line character. If there is a form-feed or a vertical-tab |
481 | | -character in such a comment, only whitespace characters shall appear |
| 505 | +next new-line character. If there is a \unicode{000c}{form feed} or a \unicode{000b}{line tabulation} |
| 506 | +character in such a comment, only \grammarterm{whitespace-character}s shall appear |
482 | 507 | between it and the new-line that terminates the comment; no diagnostic |
483 | 508 | is required. |
484 | 509 | \begin{note} |
|
489 | 514 | \tcode{/*} comment. |
490 | 515 | \end{note} |
491 | 516 | \indextext{comment|)} |
| 517 | +\indextext{whitespace|)}% |
492 | 518 |
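With comments now grouped under whitespace, a scanner that advances from one preprocessing token to the next has to skip both whitespace-characters and comments. The following is a minimal sketch under stated assumptions: `skip_whitespace` and `is_ws_char` are hypothetical names, line splicing (phase 2) is assumed to have been done already, new-line is assumed to be `'\n'`, and the function is assumed to be called only at token boundaries, so it never looks inside literals.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: the whitespace-character set, with new-line as '\n'.
constexpr bool is_ws_char(char c) {
    return c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == ' ';
}

// Returns the index of the first character at or after `pos` that is not
// part of whitespace (whitespace-characters and comments), or
// std::string_view::npos if a /* comment is never closed.
constexpr std::size_t skip_whitespace(std::string_view src, std::size_t pos) {
    while (pos < src.size()) {
        if (is_ws_char(src[pos])) {
            ++pos;
        } else if (src.substr(pos, 2) == "//") {
            // A // comment ends immediately before the next new-line.
            std::size_t nl = src.find('\n', pos + 2);
            pos = (nl == std::string_view::npos) ? src.size() : nl;
        } else if (src.substr(pos, 2) == "/*") {
            // A /* comment ends at the first */; comments do not nest.
            std::size_t end = src.find("*/", pos + 2);
            if (end == std::string_view::npos) {
                return std::string_view::npos;
            }
            pos = end + 2;
        } else {
            break;  // start of the next preprocessing token
        }
    }
    return pos;
}
```

Returning `npos` for an unclosed `/*` mirrors the rule that a source file shall not end in a partial comment; a real implementation would issue a diagnostic instead.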
|
493 | 519 | \rSec1[lex.pptoken]{Preprocessing tokens} |
494 | 520 |
|
|
506 | 532 | string-literal\br |
507 | 533 | user-defined-string-literal\br |
508 | 534 | preprocessing-op-or-punc\br |
509 | | - \textnormal{each non-whitespace character that cannot be one of the above} |
| 535 | +  \textnormal{each character that is not a \grammarterm{whitespace-character} and cannot be one of the above} |
510 | 536 | \end{bnf} |
511 | 537 |
|
512 | 538 | \pnum |
|
520 | 546 | (\grammarterm{import-keyword}, \grammarterm{module-keyword}, and \grammarterm{export-keyword}), |
521 | 547 | identifiers, preprocessing numbers, character literals (including user-defined character |
522 | 548 | literals), string literals (including user-defined string literals), preprocessing |
523 | | -operators and punctuators, and single non-whitespace characters that do not lexically |
| 549 | +operators and punctuators, and single characters that are not \grammarterm{whitespace-character}s and do not lexically |
524 | 550 | match the other preprocessing token categories. |
525 | 551 | If a \unicode{0027}{apostrophe} or a \unicode{0022}{quotation mark} character |
526 | 552 | matches the last category, the program is ill-formed. |
527 | 553 | If any character not in the basic character set matches the last category, |
528 | 554 | the program is ill-formed. |
529 | 555 | Preprocessing tokens can be separated by |
530 | 556 | \indextext{whitespace}% |
531 | | -whitespace; |
| 557 | +whitespace\iref{lex.whitespace}; |
532 | 558 | \indextext{comment}% |
533 | | -this consists of comments\iref{lex.comment}, or whitespace characters |
534 | | -(\unicode{0020}{space}, |
535 | | -\unicode{0009}{character tabulation}, |
536 | | -new-line, |
537 | | -\unicode{000b}{line tabulation}, and |
538 | | -\unicode{000c}{form feed}), or both. |
| 559 | +this consists of comments, \grammarterm{whitespace-character}s, or both. |
539 | 560 | As described in \ref{cpp}, in certain |
540 | 561 | circumstances during translation phase 4, whitespace (or the absence |
541 | 562 | thereof) serves as more than preprocessing token separation. Whitespace |
|
826 | 847 | \end{footnote} |
827 | 848 | operators, and other separators. |
828 | 849 | \indextext{whitespace}% |
829 | | -Blanks, horizontal and vertical tabs, newlines, formfeeds, and comments |
830 | | -(collectively, ``whitespace''), as described below, are ignored except |
831 | | -as they serve to separate tokens. |
| 850 | +Whitespace\iref{lex.whitespace} is ignored except to separate tokens. |
832 | 851 | \begin{note} |
833 | 852 | Whitespace can separate otherwise adjacent identifiers, keywords, numeric |
834 | 853 | literals, and alternative tokens containing alphabetic characters. |
|
1790 | 1809 | \begin{bnf} |
1791 | 1810 | \nontermdef{d-char}\br |
1792 | 1811 | \textnormal{any member of the basic character set except:}\br |
1793 | | - \bnfindent\textnormal{\unicode{0020}{space}, \unicode{0028}{left parenthesis}, \unicode{0029}{right parenthesis}, \unicode{005c}{reverse solidus},}\br |
1794 | | - \bnfindent\textnormal{\unicode{0009}{character tabulation}, \unicode{000b}{line tabulation}, \unicode{000c}{form feed}, and new-line} |
| 1812 | + \bnfindent\textnormal{a \grammarterm{whitespace-character}, \unicode{0028}{left parenthesis}, \unicode{0029}{right parenthesis},}\br |
| 1813 | + \bnfindent\textnormal{and \unicode{005c}{reverse solidus}} |
1795 | 1814 | \end{bnf} |
1796 | 1815 |
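The d-char restrictions matter because the delimiter is what lets a raw string literal contain `)"`. A short C++17 sketch; the name `body` and the delimiter `sep` are arbitrary choices for illustration:

```cpp
#include <string_view>

// The delimiter `sep` consists only of d-chars, so the literal ends at the
// first )sep" and the body may itself contain )" without ending early.
constexpr std::string_view body = R"sep(say "hi" :-)")sep";

// A delimiter may not contain a whitespace-character, a parenthesis, or a
// backslash, so a form such as R"a b(...)a b" would be ill-formed.
```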
|
1797 | 1816 | \pnum |
|