feat(0.0.1): update spec to pair compiler features
parent 9299d9ff0b
commit cc565982a7
@@ -33,8 +33,14 @@ The compiler will have 5 phases:

 ## Source code representation

-Source code must be ASCII encoded. However, bytes inside string literals
-are treated as-is, and sent over to PHP without modification.
+Source code must be UTF-8 encoded.
+Any non-ASCII bytes appearing within a string literal in source code
+carry their UTF-8 meaning into the content of the string;
+the bytes are not modified by the compiler.

+Furthermore, THP only recognizes LF as a line terminator.
+Using CRLF will lead to a compiler error.


 ## Grammar syntax
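The source-representation rules in the hunk above (UTF-8 input, LF-only line terminators) can be illustrated with a minimal Python sketch. This is not the compiler's actual implementation: `SourceError` and `load_source` are hypothetical names, and the handling of a lone CR is left out because the spec text only mentions CRLF.

```python
class SourceError(Exception):
    """Hypothetical error type; the compiler's real diagnostics are not shown in the spec."""


def load_source(raw: bytes) -> str:
    """Decode raw source bytes as UTF-8 and reject CRLF line terminators."""
    try:
        # bytes.decode is strict by default, so invalid UTF-8 raises.
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise SourceError(
            f"source is not valid UTF-8 (byte offset {exc.start})"
        ) from exc

    crlf_at = text.find("\r\n")
    if crlf_at != -1:
        # Lines are 1-based: count the LFs that precede the offending CRLF.
        line = text.count("\n", 0, crlf_at) + 1
        raise SourceError(
            f"CRLF line terminator on line {line}; only LF is accepted"
        )
    return text
```

Non-ASCII characters inside string literals pass through unchanged because the decoded text is returned as-is; only the encoding and the line terminators are checked.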
@@ -188,6 +194,10 @@ decimal_digit = "0".."9"
 binary_digit = "0" | "1"
 octal_digit = "0".."7"
 hex_digit = "0".."9" | "a".."f" | "A".."F"

+operator_char = "+" | "-" | "=" | "*" | "!" | "/" | "|"
+              | "@" | "#" | "$" | "~" | "%" | "&" | "?"
+              | "<" | ">" | "^" | "." | ":"
 ```

 ## Tokens
@@ -227,9 +237,31 @@ Float = decimal_digit+, ".", decimal_digit+, scientific_notation?
 scientific_notation = "e", ("+" | "-"), decimal_digit+
 ```

+## Identifier & Datatypes

+```ebnf
+Identifier = (underscore | lowercase_letter), identifier_letter*

+identifier_letter = underscore | lowercase_letter | uppercase_letter | decimal_digit
+```

+```ebnf
+Datatype = uppercase_letter, identifier_letter*
+```

+## Operator

+If two or more operator characters appear together, they count as a single operator. That is,
+`+-` always becomes a single token, never two separate `+` and `-` tokens. The lexer has no
+knowledge of any specific operator.

+```ebnf
+Operator = operator_char+
+```

+## Comments

+At this time, only single-line comments are allowed.
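To make the `Identifier`, `Datatype` and `Operator` rules from this hunk concrete, here is a rough Python sketch of a scanner for just those three token kinds. It assumes that `lowercase_letter` is `a`..`z`, `uppercase_letter` is `A`..`Z` and `underscore` is `_` (their definitions fall outside the hunks shown), and that spaces, tabs and LF separate tokens. `scan` and `TOKEN_RE` are illustrative names, not part of the spec, and numbers, strings and comments are deliberately not handled.

```python
import re

# The operator_char set, copied from the grammar hunk above.
OPERATOR_CHARS = "+-=*!/|@#$~%&?<>^.:"

# One named group per token kind; letter ranges are an assumption (ASCII only).
TOKEN_RE = re.compile(
    r"(?P<Identifier>[_a-z][_a-zA-Z0-9]*)"
    r"|(?P<Datatype>[A-Z][_a-zA-Z0-9]*)"
    r"|(?P<Operator>[" + re.escape(OPERATOR_CHARS) + r"]+)"
)


def scan(text: str) -> list[tuple[str, str]]:
    """Return (kind, lexeme) pairs for identifiers, datatypes and operators."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] in " \t\n":  # whitespace handling is an assumption
            pos += 1
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(
                f"unexpected character {text[pos]!r} at offset {pos}"
            )
        # `operator_char+` is greedy, so adjacent operator characters
        # always end up in a single Operator token.
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

With this sketch, `scan("a +- b")` returns `[("Identifier", "a"), ("Operator", "+-"), ("Identifier", "b")]`: the scanner never splits `+-`, and deciding whether `+-` is a meaningful operator is left to a later phase.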