JS Lexical Grammar

This page describes JavaScript's lexical grammar. The source text of
ECMAScript scripts is scanned from left to right and converted into a
sequence of input elements, which are tokens, control characters, line
terminators, comments, or white space. ECMAScript also defines certain
keywords and literals and has rules for automatic insertion of semicolons
to end statements.
Control characters
Control characters have no visual representation but are used to control
the interpretation of the text.
Unicode format-control characters

Code point  Name                   Abbreviation  Description
U+200C      Zero width non-joiner  <ZWNJ>        Placed between characters to prevent being connected into ligatures in certain languages (Wikipedia).
U+200D      Zero width joiner      <ZWJ>         Placed between characters that would not normally be connected in order to cause the characters to be rendered using their connected form in certain languages (Wikipedia).
U+FEFF      Byte order mark        <BOM>         Used at the start of the script to mark it as Unicode and the text's byte order (Wikipedia).
White space
White space characters improve the readability of source text and
separate tokens from each other. These characters are usually
unnecessary for the functionality of the code. Minification tools are often
used to remove white space in order to reduce the amount of data that
needs to be transferred.
White space characters

Code point  Name                          Abbreviation  Description                                        Escape sequence
U+0009      Character tabulation          <HT>          Horizontal tabulation                              \t
U+000B      Line tabulation               <VT>          Vertical tabulation                                \v
U+000C      Form feed                     <FF>          Page breaking control character (Wikipedia).       \f
U+0020      Space                         <SP>          Normal space
U+00A0      No-break space                <NBSP>        Normal space, but no point at which a line may break
Others      Other Unicode space characters  <USP>       Spaces in Unicode on Wikipedia
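As a minimal sketch of how these characters separate tokens, the following builds the expression 1 + 2 with a different white space character between the tokens each time and evaluates it. The variable names are illustrative, and new Function() is used here only so that the escaped characters end up in the source text being parsed.

```javascript
// Each entry is one of the white space characters from the table above.
const separators = ['\t', '\v', '\f', ' ', '\u00A0'];

// Build source text such as "return 1<ws>+<ws>2" and evaluate it.
const results = separators.map((ws) => {
  const source = 'return 1' + ws + '+' + ws + '2';
  return new Function(source)();
});

console.log(results.every((value) => value === 3)); // true

// The same characters are matched by \s in regular expressions.
const allMatchWhiteSpaceClass = separators.every((ws) => /\s/.test(ws));
console.log(allMatchWhiteSpaceClass); // true
```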
Line terminators
In addition to white space characters, line terminator characters are used
to improve the readability of the source text. However, in some cases, line
terminators can influence the execution of JavaScript code as there are a
few places where they are forbidden. Line terminators also affect the
process of automatic semicolon insertion. Line terminators are matched
by the \s class in regular expressions.
Only the following Unicode code points are treated as line terminators in
ECMAScript; other line breaking characters are treated as white space (for
example, Next Line, NEL, U+0085 is considered as white space).
Line terminator characters

Code point  Name                 Abbreviation  Description                                             Escape sequence
U+000A      Line Feed            <LF>          New line character in UNIX systems.                     \n
U+000D      Carriage Return      <CR>          New line character in Commodore and early Mac systems.  \r
U+2028      Line Separator       <LS>          Wikipedia
U+2029      Paragraph Separator  <PS>          Wikipedia
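The sketch below checks two of the behaviors described in this section, under the assumption that a single-line comment is a convenient way to observe where a line ends: each of the four line terminators is matched by \s, and each one ends a // comment so that the statement after it still runs. The names are illustrative only.

```javascript
const lineTerminators = ['\n', '\r', '\u2028', '\u2029'];

// All four line terminators are matched by the \s class.
const matchedByS = lineTerminators.map((ch) => /\s/.test(ch));
console.log(matchedByS.every(Boolean)); // true

// Each one also terminates a single-line comment, so the return
// statement that follows it is parsed and executed.
const endsComment = lineTerminators.map((ch) => {
  const source = '// comment' + ch + 'return "reached";';
  return new Function(source)() === 'reached';
});
console.log(endsComment.every(Boolean)); // true
```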
Comments
Comments are used to add hints, notes, suggestions, or warnings to
JavaScript code. This can make it easier to read and understand. They can
also be used to disable code to prevent it from being executed; this can
be a valuable debugging tool.
JavaScript has two ways of writing comments in its code.
The first way is the // comment; this makes all text following it on the
same line into a comment. For example:
function comment() {
// This is a one line JavaScript comment
console.log('Hello world!');
}
comment();
The second way is the /* */ style, which is much more flexible.
For example, you can use it on a single line:
function comment() {
  /* This is a one line JavaScript comment */
  console.log('Hello world!');
}
comment();
You can also make multiple-line comments, like this:
function comment() {
  /* This comment spans multiple lines. Notice
     that we don't need to end the comment until we're done. */
  console.log('Hello world!');
}
comment();
You can also use it in the middle of a line, if you wish, although this can
make your code harder to read so it should be used with caution:
function comment(x) {
console.log('Hello ' + x /* insert the value of x */ + ' !');
}
comment('world');
In addition, you can use it to disable code to prevent it from running, by
wrapping code in a comment, like this:
function comment() {
  /* console.log('Hello world!'); */
}
comment();
In this case, the console.log() call is never issued, since it's inside a
comment. Any number of lines of code can be disabled this way.
Keywords
Reserved keywords as of ECMAScript 2015
break
case
catch
class
const
continue
debugger
default
delete
do
else
export
extends
finally
for
function
if
import
in
instanceof
new
return
super
switch
this
throw
try
typeof
var
void
while
with
yield
Future reserved keywords

The following are reserved as future keywords by the ECMAScript
specification. They have no special functionality at present, but they might
at some future time, so they cannot be used as identifiers.
These are always reserved:
enum
The following are only reserved when they are found in strict mode code:
implements
interface
let
package
private
protected
public
static
The following are only reserved when they are found in module code:
await
Future reserved keywords in older standards
The following are reserved as future keywords by older ECMAScript
specifications (ECMAScript 1 through 3).
abstract
boolean
byte
char
double
final
float
goto
int
long
native
short
synchronized
throws
transient
volatile
Additionally, the literals null, true, and false cannot be used as
identifiers in ECMAScript.
Reserved word usage

Reserved words actually only apply to Identifiers (vs. IdentifierNames).
As described in es5.github.com/#A.1, these are all IdentifierNames, which
do not exclude ReservedWords:
a.import
a['import']
a = { import: 'test' }
On the other hand, the following is illegal because it's an Identifier, which
is an IdentifierName without the reserved words. Identifiers are used
for FunctionDeclaration, FunctionExpression,
VariableDeclaration and so on. IdentifierNames are used
for MemberExpression, CallExpression and so on.
function import() {} // Illegal.
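The distinction can be checked with a short sketch. The object a mirrors the snippets above; new Function() is used only so that the illegal declaration can be parsed at run time and its error observed instead of stopping the whole script.

```javascript
// Reserved words are valid IdentifierNames, so they work as property names.
const a = { import: 'test' };
console.log(a.import);    // test
console.log(a['import']); // test

// They are not valid Identifiers, so a function declaration named
// `import` is rejected when the source text is parsed.
let declarationError = null;
try {
  new Function('function import() {}');
} catch (err) {
  declarationError = err;
}
console.log(declarationError instanceof SyntaxError); // true
```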
Literals
Null literal
See also null for more information.
null
Boolean literal
See also Boolean for more information.
true
false
Numeric literals
Decimal
1234567890
42
// Caution when using with a leading zero:
0888 // 888 parsed as decimal
0777 // parsed as octal, 511 in decimal
Note that decimal literals can start with a zero (0) followed by another
decimal digit, but if all digits after the leading 0 are smaller than 8, the
number is interpreted as an octal number. This won't throw in JavaScript;
see bug 957513. See also the page about parseInt().
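A small sketch of the leading-zero behavior follows. It evaluates the literals through new Function(), whose bodies are non-strict by default, so the example behaves the same whether or not the surrounding code is strict. The strict mode check is an addition not covered by the text above: in strict mode code the legacy octal form is a SyntaxError.

```javascript
// Non-strict code: the leading-zero forms parse as described above.
const decimalWithLeadingZero = new Function('return 0888')(); // 888
const legacyOctal = new Function('return 0777')();            // 511

// Strict mode code rejects the legacy octal form at parse time.
let strictError = null;
try {
  new Function('"use strict"; return 0777');
} catch (err) {
  strictError = err;
}
console.log(strictError instanceof SyntaxError); // true

// Converting strings does not apply the legacy octal rule.
const fromNumber = Number('0777');        // 777
const fromParseInt = parseInt('0777', 8); // 511, with an explicit radix
```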
Binary
Binary number syntax uses a leading zero followed by a lowercase or
uppercase Latin letter "B" (0b or 0B). Because this syntax is new in
ECMAScript 2015, see the browser compatibility table below. If the digits
after the 0b are not 0 or 1, the following SyntaxError is thrown: "Missing
binary digits after 0b".
var FLT_SIGNBIT = 0b10000000000000000000000000000000; // 2147483648
var FLT_EXPONENT = 0b01111111100000000000000000000000; // 2139095040
var FLT_MANTISSA = 0B00000000011111111111111111111111; // 8388607
Octal
Octal number syntax uses a leading zero followed by a lowercase or
uppercase Latin letter "O" (0o or 0O). Because this syntax is new in
ECMAScript 2015, see the browser compatibility table below. If the digits
after the 0o are outside the range (01234567), the
following SyntaxError is thrown: "Missing octal digits after 0o".
var n = 0O755; // 493
var m = 0o644; // 420
// Also possible with just a leading zero (see note about decimals above)
0755
0644
Hexadecimal
Hexadecimal number syntax uses a leading zero followed by a lowercase
or uppercase Latin letter "X" (0x or 0X). If the digits after 0x are outside
the range (0123456789ABCDEF), the following SyntaxError is thrown:
"Identifier starts immediately after numeric literal".
0xFFFFFFFFFFFFFFFFF // 295147905179352830000
0x123456789ABCDEF // 81985529216486900
0XA // 10
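The three prefixed forms can be compared side by side in a short sketch. The helper failsToParse is hypothetical and exists only for this illustration; it reports whether a piece of source text is rejected with a SyntaxError. The exact error messages vary between engines, so only the error type is checked.

```javascript
const fromBinary = 0b101; // 5
const fromOctal = 0o755;  // 493
const fromHex = 0XA;      // 10

// Hypothetical helper: true if the source text fails to parse.
function failsToParse(source) {
  try {
    new Function(source);
    return false;
  } catch (err) {
    return err instanceof SyntaxError;
  }
}

// A digit outside the allowed range is rejected at parse time.
const invalid = ['return 0b2', 'return 0o8', 'return 0xG'].map(failsToParse);
console.log(invalid.every(Boolean)); // true

// Large hexadecimal literals can exceed the range of exactly
// representable integers, which is why the values above look rounded.
const isExact = Number.isSafeInteger(0x123456789ABCDEF);
console.log(isExact); // false
```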
Object literals
See also Object and Object initializer for more information.
var o = { a: 'foo', b: 'bar', c: 42 };
// shorthand notation. New in ES2015
var a = 'foo', b = 'bar', c = 42;
var o = {a, b, c};
// instead of
var o = { a: a, b: b, c: c };
Array literals
See also Array for more information.
[1954, 1974, 1990, 2014]
String literals
'foo'
"bar"
Hexadecimal escape sequences
'\xA9' // "©"
Unicode escape sequences
The Unicode escape sequences consist of exactly four hexadecimal digits
following \u. Each one specifies a single UTF-16 code unit. For code
points between 0 and FFFF, the digits are identical to the code point.
Higher code points require two escape sequences representing the
surrogate pair used to encode the character; the surrogate pair is
distinct from the code point.
'\u00A9' // "©"
Unicode code point escapes
New in ECMAScript 2015. With Unicode code point escapes, any character
can be escaped using hexadecimal numbers so that it is possible to use
Unicode code points up to 0x10FFFF. With simple Unicode escapes it is
often necessary to write the surrogate halves separately to achieve the
same.
See also String.fromCodePoint() or String.prototype.codePointAt().
'\u{2F804}'
// the same with simple Unicode escapes
'\uD87E\uDC04'
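As a quick check of the equivalence shown above, this sketch compares the two spellings and inspects the resulting string with the methods mentioned in this section. The variable names are illustrative.

```javascript
const viaCodePointEscape = '\u{2F804}';
const viaSurrogatePair = '\uD87E\uDC04';

// Both spellings produce the same string.
console.log(viaCodePointEscape === viaSurrogatePair); // true

// The string holds two UTF-16 code units but a single code point.
console.log(viaCodePointEscape.length); // 2
console.log(viaCodePointEscape.codePointAt(0).toString(16)); // 2f804

// String.fromCodePoint() builds the same string from the number.
console.log(String.fromCodePoint(0x2F804) === viaCodePointEscape); // true
```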
Regular expression literals
See also RegExp for more information.
/ab+c/g
// An "empty" regular expression literal
// The empty non-capturing group is necessary
// to avoid ambiguity with single-line comments.
/(?:)/
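The reason for the empty non-capturing group can be seen in a small sketch: two slashes in a row would start a single-line comment, so an empty pattern has to be written another way. The example assumes the RegExp constructor as the alternative route to an empty pattern; the variable names are illustrative.

```javascript
// The constructor accepts an empty pattern directly.
const emptyFromConstructor = new RegExp('');

// Its source is reported in the literal-safe form.
console.log(emptyFromConstructor.source); // (?:)

// The literal form and the constructed form both match any string.
const literalMatches = /(?:)/.test('anything');
const constructedMatches = emptyFromConstructor.test('anything');
console.log(literalMatches, constructedMatches); // true true
```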
Template literals
See also template strings for more information.
`string text`
`string text line 1
string text line 2`
`string text ${expression} string text`
tag `string text ${expression} string text`
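The four forms above can be made concrete with a short sketch. The value of expression and the tag function are hypothetical stand-ins; this tag simply returns the pieces it receives so they can be inspected.

```javascript
const expression = 6 * 7;

// Substitutions are evaluated and converted to strings.
const plain = `string text ${expression} string text`;
console.log(plain); // string text 42 string text

// A line break inside the literal becomes part of the string.
const multiLine = `string text line 1
string text line 2`;
console.log(multiLine.split('\n').length); // 2

// Hypothetical tag function: receives the literal text segments
// and the substitution values as separate arguments.
function tag(strings, ...values) {
  return { strings: [...strings], values };
}
const tagged = tag`string text ${expression} string text`;
console.log(tagged.values[0]); // 42
```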
Automatic semicolon insertion

Some JavaScript statements must be terminated with semicolons and are
therefore affected by automatic semicolon insertion (ASI):
Empty statement
let, const, variable statement
import, export, module declaration
Expression statement
debugger
continue, break, throw
return
The ECMAScript specification mentions three rules of semicolon insertion.
1. A semicolon is inserted before a token that the grammar does not allow
at that position, when the token is preceded by a line terminator or is "}".
{ 1
2 } 3
// is transformed by ASI into
{ 1
;2 ;} 3;
2. A semicolon is inserted at the end, when the end of the input stream
of tokens is detected and the parser is unable to parse the single input
stream as a complete program.
Here ++ is not treated as a postfix operator applying to variable b,
because a line terminator occurs between b and ++.
a = b
++c
// is transformed by ASI into
a = b;
++c;
3. A semicolon is inserted at the end, when a statement with restricted
productions in the grammar is followed by a line terminator. These
statements with "no LineTerminator here" rules are:
PostfixExpressions (++ and --)
continue
break
return
yield, yield*
module
return
a + b
// is transformed by ASI into
return;
a + b;
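The effect of these rules on running code can be observed in a short sketch. The function name restricted and the starting values are assumptions chosen for the illustration; the semicolons after the expressions under test are deliberately left out.

```javascript
// A line terminator after `return` ends the statement, so the
// expression on the next line is never part of the return value.
function restricted() {
  return
  1 + 2;
}
console.log(restricted()); // undefined

// `++` on its own line is parsed as a prefix operator on c,
// not as a postfix operator on b.
let a;
let b = 1;
let c = 1;
a = b
++c
console.log(a, b, c); // 1 1 2
```

Putting the returned expression on the same line as return, or wrapping it in parentheses that open on that line, avoids the first surprise.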
