2.1 Character Set
1
{character set} The only characters allowed outside of comments are the
graphic_characters and format_effectors.
1.a
Ramification: Any character, including an other_control_function, is allowed
in a comment.
1.b
Note that this rule doesn't really
have much force, since the implementation can represent characters in
the source in any way it sees fit. For example, an implementation could
simply define that what seems to be a non-graphic, non-format-effector
character is actually a representation of the space character.
1.c
Discussion: It is our intent to
follow the terminology of ISO 10646 BMP where appropriate, and to remain compatible
with the character classifications defined in A.3,
``Character Handling''. Note that our definition for
graphic_character is more inclusive than
that of ISO 10646-1.
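As an informal illustration of paragraph 1, the following Python sketch reports the positions in one source line that hold a character not allowed outside a comment. It rests on several assumptions that are not part of the standard: the source has already been decoded into BMP code positions, the non-graphic code positions are the ranges listed in Note 1 of this clause, and comments are found by a simplified scan that skips string literals but does not handle character literals such as '"'. The names comment_start and illegal_positions are hypothetical.

```python
# Format effectors per paragraph 13: HT, LF, VT, FF, CR.
FORMAT_EFFECTORS = {0x09, 0x0A, 0x0B, 0x0C, 0x0D}


def is_allowed_outside_comment(cp):
    """True for a format_effector or a graphic_character (ranges from Note 1)."""
    if cp in FORMAT_EFFECTORS:
        return True
    if cp > 0xFFFF:
        return False  # outside the BMP, so outside the standard-mode repertoire
    non_graphic = cp <= 0x1F or 0x7F <= cp <= 0x9F or cp >= 0xFFFE
    return not non_graphic


def comment_start(line):
    """Index of the '--' that opens a comment, or len(line) if there is none.

    Simplification (assumption): only string literals are skipped; a doubled
    quotation mark inside a string toggles the state twice, which is harmless.
    """
    in_string = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_string = not in_string
        elif not in_string and line.startswith("--", i):
            return i
    return len(line)


def illegal_positions(line):
    """Indices, before any comment, of characters that paragraph 1 disallows."""
    end = comment_start(line)
    return [i for i, ch in enumerate(line[:end])
            if not is_allowed_outside_comment(ord(ch))]
```

For example, illegal_positions('X := 1; -- bell \x07 here') returns an empty list because the control character sits inside the comment, whereas illegal_positions('X :=\x07 1;') returns [4].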
Syntax
2
character ::= graphic_character | format_effector | other_control_function
3
graphic_character ::= identifier_letter | digit | space_character | special_character
Static Semantics
4
The character repertoire for the text of an Ada
program consists of the collection of characters called the Basic Multilingual
Plane (BMP) of the ISO 10646 Universal Multiple-Octet Coded Character
Set, plus a set of format_effectors
and, in comments only, a set of other_control_functions;
the coded representation for these characters is implementation defined
[(it need not be a representation defined within ISO 10646-1)].
4.a
Implementation defined: The
coded representation for the text of an Ada program.
5
The description of the language definition in
this International Standard uses the graphic symbols defined for Row
00: Basic Latin and Row 00: Latin-1 Supplement of the ISO 10646 BMP;
these correspond to the graphic symbols of ISO 8859-1 (Latin-1); no graphic
symbols are used in this International Standard for characters outside
of Row 00 of the BMP. The actual set of graphic symbols used by an implementation
for the visual representation of the text of an Ada program is not specified.
{unspecified [partial]}
6
The categories of
characters are defined as follows:
7
- {identifier_letter} identifier_letter:
  upper_case_identifier_letter | lower_case_identifier_letter
7.a
Discussion: We use identifier_letter
instead of simply letter because
ISO 10646 BMP includes many other characters that would generally be
considered "letters."
8
- {upper_case_identifier_letter} upper_case_identifier_letter:
  Any character of Row 00 of ISO 10646 BMP whose name begins ``Latin Capital
  Letter''.
9
- {lower_case_identifier_letter} lower_case_identifier_letter:
  Any character of Row 00 of ISO 10646 BMP whose name begins ``Latin Small
  Letter''.
9.a/1
This paragraph was deleted. To be honest: {8652/0001} The
above rules do not include the ligatures Æ and æ. However, the intent
is to include these characters as identifier letters. This problem was pointed
out by a comment from the Netherlands.
10
- {digit} digit:
  One of the characters 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9.
11
- {space_character} space_character:
  The character of ISO 10646 BMP named ``Space''.
12
- {special_character} special_character:
  Any character of the ISO 10646 BMP that is not reserved for a control
  function, and is not the space_character, an identifier_letter, or a digit.
12.a
Ramification: Note that
the no break space and soft hyphen are special_characters,
and therefore graphic_characters.
They are not the same characters as space and hyphen-minus.
13
- {format_effector} format_effector:
  The control functions of ISO 6429 called character tabulation (HT), line
  tabulation (VT), carriage return (CR), line feed (LF), and form feed
  (FF). {control character: See also format_effector}
14
- {other_control_function} other_control_function:
  Any control function, other than a format_effector, that is allowed in a
  comment; the set of other_control_functions allowed in comments is
  implementation defined. {control character: See also other_control_function}
14.a
Implementation defined: The
control functions allowed in comments.
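As an informal illustration of the category definitions in paragraphs 7 through 14, the following Python sketch assigns a single character to one of the categories. Several points are assumptions rather than anything the standard prescribes: character names are taken from Python's unicodedata module, which may differ from the names in the edition of ISO 10646 that this clause cites; every control function in 0000 - 001F and 007F - 009F other than a format_effector is reported as an other_control_function, although the set actually allowed in comments is implementation defined; and the labels "not_a_character" and "outside_bmp" are hypothetical additions for code positions that fall in no category.

```python
import unicodedata

# Format effectors per paragraph 13.
FORMAT_EFFECTORS = {0x09: "HT", 0x0A: "LF", 0x0B: "VT", 0x0C: "FF", 0x0D: "CR"}


def classify(ch):
    """Return the name of the category that the one-character string ch falls in."""
    cp = ord(ch)
    if cp > 0xFFFF:
        return "outside_bmp"             # hypothetical label, not a category
    if cp in FORMAT_EFFECTORS:
        return "format_effector"
    if cp <= 0x1F or 0x7F <= cp <= 0x9F:
        return "other_control_function"  # assumption: all are allowed in comments
    if cp >= 0xFFFE:
        return "not_a_character"         # hypothetical label, not a category
    if ch == " ":
        return "space_character"
    if "0" <= ch <= "9":
        return "digit"
    if cp <= 0xFF:                       # Row 00 of the BMP
        name = unicodedata.name(ch, "")
        if name.startswith("LATIN CAPITAL LETTER"):
            return "upper_case_identifier_letter"
        if name.startswith("LATIN SMALL LETTER"):
            return "lower_case_identifier_letter"
    return "special_character"
```

With the names that unicodedata supplies, classify("Æ") yields "upper_case_identifier_letter", which agrees with the intent stated in 9.a/1, and classify("\u00a0") (no-break space) yields "special_character", as 12.a notes. A letter outside Row 00, such as "Ω", is also reported as a special_character.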
15
{names of special_characters} {special_character (names)} The following names
are used when referring to certain special_characters: {quotation mark}
{number sign} {ampersand} {apostrophe} {tick} {left parenthesis}
{right parenthesis} {asterisk} {multiply} {plus sign} {comma} {hyphen-minus}
{minus} {full stop} {dot} {point} {solidus} {divide} {colon} {semicolon}
{less-than sign} {equals sign} {greater-than sign} {low line} {underline}
{vertical line} {left square bracket} {right square bracket}
{left curly bracket} {right curly bracket}
15.a
Discussion: These are the
ones that play a special role in the syntax of Ada 95, or in the syntax
rules; we don't bother to define names for all characters. The first
name given is the name from ISO 10646-1; the subsequent names, if any,
are those used within the standard, depending on context.
symbol | name                   | symbol | name
"      | quotation mark         | :      | colon
#      | number sign            | ;      | semicolon
&      | ampersand              | <      | less-than sign
'      | apostrophe, tick       | =      | equals sign
(      | left parenthesis       | >      | greater-than sign
)      | right parenthesis      | _      | low line, underline
*      | asterisk, multiply     | |      | vertical line
+      | plus sign              | [      | left square bracket
,      | comma                  | ]      | right square bracket
-      | hyphen-minus, minus    | {      | left curly bracket
.      | full stop, dot, point  | }      | right curly bracket
/      | solidus, divide        |        |
Implementation Permissions
16
In a nonstandard mode, the implementation may
support a different character repertoire[; in particular, the set of
characters that are considered identifier_letters
can be extended or changed to conform to local conventions].
16.a
Ramification: If an implementation
supports other character sets, it defines which characters fall into
each category, such as ``identifier_letter,''
and what the corresponding rules of this section are, such as which characters
are allowed in the text of a program.
17
1 Every code position of ISO 10646 BMP that is not reserved for a control
function is defined to be a graphic_character by this International Standard.
This includes all code positions other than 0000 - 001F, 007F - 009F, and
FFFE - FFFF.
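The ranges in this note translate directly into a membership test. The Python sketch below is only an illustration; the names NON_GRAPHIC_RANGES and is_graphic_character are hypothetical, and code positions are assumed to be given as integers in 0000 - FFFF.

```python
# Code positions that the note excludes from graphic_character (inclusive ranges).
NON_GRAPHIC_RANGES = [(0x0000, 0x001F), (0x007F, 0x009F), (0xFFFE, 0xFFFF)]


def is_graphic_character(cp):
    """True if BMP code position cp is a graphic_character per the note."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("code position is outside the BMP")
    return not any(lo <= cp <= hi for lo, hi in NON_GRAPHIC_RANGES)


# Number of BMP code positions that are graphic_characters under this rule:
# 65536 total, minus 32 + 33 + 2 excluded positions.
graphic_count = sum(is_graphic_character(cp) for cp in range(0x10000))
```

Under these ranges, graphic_count is 65469.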
18
2 The language does not specify
the source representation of programs.
18.a
Discussion: Any source
representation is valid so long as the implementer can produce an (information-preserving)
algorithm for translating in both directions between the representation
and the standard character set. (For example, every character in the
standard character set has to be representable, even if the output devices
attached to a given computer cannot print all of those characters properly.)
From a practical point of view, every implementer will have to provide
some way to process the ACVC. It is the intent to allow source representations,
such as parse trees, that are not even linear sequences of characters.
It is also the intent to allow different fonts: reserved words might
be in bold face, and that should be irrelevant to the semantics.
Extensions to Ada 83
18.b
{extensions to Ada 83}
Ada 95 allows 8-bit and 16-bit characters, as well
as implementation-specified character sets.
Wording Changes from Ada 83
18.c
The syntax rules in this clause are modified
to remove the emphasis on basic characters vs. others. (In this day and age,
there is no need to point out that you can write programs without using (for
example) lower case letters.) In particular, character
(representing all characters usable outside comments) is added, and basic_graphic_character,
other_special_character, and basic_character
are removed. Special_character is expanded
to include Ada 83's other_special_character,
as well as new 8-bit characters not present in Ada 83. Note that the term ``basic
letter'' is used in A.3, ``Character
Handling'' to refer to letters without diacritical marks.
18.d
Character names now come from
ISO 10646.
18.e
We use identifier_letter rather than letter since ISO 10646
BMP includes many "letters" that are not permitted in identifiers
(in the standard mode).