feat: update the language spec
parent
a95346f7ee
commit
33e51cd5cc
@ -36,7 +36,7 @@ and RegExp:
|
|||||||
; comments
|
; comments
|
||||||
|
|
||||||
syntax = concatenation
|
syntax = concatenation
|
||||||
concatenation = alternation grouping
|
concatenation = production1, production2
|
||||||
|
|
||||||
alternation = "a" | "b"
|
alternation = "a" | "b"
|
||||||
| "c"
|
| "c"
|
||||||
@ -65,8 +65,7 @@ The compiler consists of 5 common phases:
|
|||||||
## Source Code representation
|
## Source Code representation
|
||||||
|
|
||||||
Source code is encoded in UTF-8, and a single UTF-8 codepoint is
|
Source code is encoded in UTF-8, and a single UTF-8 codepoint is
|
||||||
a single character. As THP is implemented using the Rust programming
|
a single character.
|
||||||
language, rules around Rust's UTF-8 usage are followed.
|
|
||||||
|
|
||||||
|
|
||||||
## Basic characters
|
## Basic characters
|
||||||
@ -88,9 +87,102 @@ uppercase_letter = "A".."Z"
|
|||||||
```
|
```
|
||||||
|
|
||||||
|
|
||||||
|
## Whitespace & Automatic semicolon insertion
|
||||||
|
|
||||||
|
This section is being reworked as part of the Zig rewrite of the compiler.
|
||||||
|
|
||||||
|
THP is whitespace insensitive. However, THP has special rules
|
||||||
|
when handling statement termination in order to not use
|
||||||
|
semicolons.
|
||||||
|
|
||||||
|
Certain statements have clearly defined markers of termination.
|
||||||
|
For example, an `if` statement always has braces `{}`, so
|
||||||
|
the closing brace `}` is the terminator. The same applies to
|
||||||
|
parentheses, square brackets, etc.
|
||||||
|
|
||||||
|
Other statements require an explicit terminator. For example,
|
||||||
|
the assignment statement:
|
||||||
|
|
||||||
|
<Code thpcode={`
|
||||||
|
val computation = 123 + 456 // how to detect if the statement ends here
|
||||||
|
* 789 // or extends up to here?
|
||||||
|
`} />
|
||||||
|
|
||||||
|
In other languages a semicolon would be used to signal the end of the
|
||||||
|
statement:
|
||||||
|
|
||||||
|
```c
|
||||||
|
int computation = 123 + 456
|
||||||
|
* 789;
|
||||||
|
```
|
||||||
|
|
||||||
|
THP does not use semicolons. Instead, THP has one strict rule and one exception
|
||||||
|
to the rule:
|
||||||
|
|
||||||
|
### All statements end with a newline
|
||||||
|
|
||||||
|
Regardless of indentation or other whitespace, every statement ends
|
||||||
|
with a newline.
|
||||||
|
|
||||||
|
<Code thpcode={`
|
||||||
|
val compute = 1 + 2 * 3 / 4
|
||||||
|
// statement ends here ↑
|
||||||
|
`} />
|
||||||
|
|
||||||
|
As mentioned before, this does not affect statements that have clear delimiters.
|
||||||
|
For example, the following code will work as expected:
|
||||||
|
|
||||||
|
<Code thpcode={`
|
||||||
|
val compute = my_function(
|
||||||
|
param1,
|
||||||
|
param2,
|
||||||
|
) / 64
|
||||||
|
// ↑ statement ends here
|
||||||
|
`} />
|
||||||
|
|
||||||
|
In a way, the parentheses "disable" the rule.
|
||||||
|
|
||||||
|
But how can a statement span multiple lines?
|
||||||
|
|
||||||
|
### Exception: operator on the next line
|
||||||
|
|
||||||
|
If the next line begins with an operator, the statement on the previous line
|
||||||
|
continues.
|
||||||
|
|
||||||
|
For example:
|
||||||
|
|
||||||
|
<Code thpcode={`
|
||||||
|
val computation = 123 + 456
|
||||||
|
* 789
|
||||||
|
// ↑ statement ends here, and there is a single statement
|
||||||
|
`} />
|
||||||
|
|
||||||
|
This holds regardless of indentation:
|
||||||
|
|
||||||
|
<Code thpcode={`
|
||||||
|
// weird indentation:
|
||||||
|
|
||||||
|
val computation = 123 + 456
|
||||||
|
* 789
|
||||||
|
// ↑ statement still ends here
|
||||||
|
`} />
|
||||||
|
|
||||||
|
What is important is that an operator begins the new line.
|
||||||
|
If the operator is left on the previous line, this will not work:
|
||||||
|
|
||||||
|
<Code thpcode={`
|
||||||
|
// statement ends here ↓, and now there is a syntax error (dangling operator)
|
||||||
|
val computation = 123 + 456 *
|
||||||
|
789
|
||||||
|
// ↑ this is a different statement
|
||||||
|
`} />
|
||||||
|
|
||||||
|
For this, the parser must look ahead by one token. This is the only place where the parser
|
||||||
|
does so.
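
The rule and its exception can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the compiler's implementation: it works on whole lines instead of tokens, the `OPERATOR_STARTS` set and the `split_statements` name are hypothetical (the spec only says "any operator"), only `()` and `[]` are tracked as delimiters, and string literals and `{}` blocks are not modeled.

```python
# Hypothetical operator set; the spec only says "any operator".
OPERATOR_STARTS = ("+", "-", "*", "/", "%", "<", ">", "=", "!", "&", "|")


def split_statements(source):
    """Split source text into statements using the newline rule and its exception."""
    # Strip `//` comments and drop blank lines (string literals are not handled).
    lines = [line.split("//", 1)[0].strip() for line in source.splitlines()]
    lines = [line for line in lines if line]

    statements = []
    current = []  # lines collected for the statement being built
    depth = 0     # nesting level of open parentheses and square brackets

    for index, line in enumerate(lines):
        current.append(line)
        depth += sum(line.count(c) for c in "([")
        depth -= sum(line.count(c) for c in ")]")

        # Look ahead one line: a leading operator continues the statement.
        next_line = lines[index + 1] if index + 1 < len(lines) else ""
        continues = next_line.startswith(OPERATOR_STARTS)

        # The newline ends the statement unless a delimiter is still open
        # or the next line begins with an operator.
        if depth <= 0 and not continues:
            statements.append(" ".join(current))
            current, depth = [], 0

    if current:
        statements.append(" ".join(current))
    return statements
```

With this sketch, `123 + 456` followed by a line starting with `* 789` yields a single statement, while leaving the `*` at the end of the first line yields two statements, the first of which has a dangling operator.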
|
||||||
|
|
||||||
|
|
||||||
## Whitespace
|
|
||||||
|
## Old Whitespace rules
|
||||||
|
|
||||||
THP is partially whitespace sensitive. It uses the following tokens: Indent, Dedent & NewLine
|
THP is partially whitespace sensitive. It uses the following tokens: Indent, Dedent & NewLine
|
||||||
to determine when an expression spans multiple lines.
|
to determine when an expression spans multiple lines.
|
||||||
|
@ -15,14 +15,15 @@ Int = hexadecimal_number
|
|||||||
| decimal_number
|
| decimal_number
|
||||||
|
|
||||||
hexadecimal_number = "0", ("x" | "X"), hexadecimal_digit+
|
hexadecimal_number = "0", ("x" | "X"), hexadecimal_digit+
|
||||||
decimal_number = decimal_digit+
|
|
||||||
octal_number = "0", ("o" | "O"), octal_digit+
|
octal_number = "0", ("o" | "O"), octal_digit+
|
||||||
binary_number = "0", "b", binary_digit +
|
binary_number = "0", ("b" | "B"), binary_digit+
|
||||||
|
decimal_number = "1".."9", decimal_digit*
|
||||||
```
|
```
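
As a rough cross-check of the updated grammar, the four productions can be transcribed into regular expressions. This is a sketch, not the lexer: `classify_int` and `INT_PATTERNS` are hypothetical names, and the digit classes are assumed to be the usual ASCII sets because their definitions are outside this hunk.

```python
import re

# One pattern per production, transcribed from the EBNF above.
INT_PATTERNS = {
    "hexadecimal": re.compile(r"0[xX][0-9a-fA-F]+"),
    "octal": re.compile(r"0[oO][0-7]+"),
    "binary": re.compile(r"0[bB][01]+"),
    "decimal": re.compile(r"[1-9][0-9]*"),
}


def classify_int(text):
    """Return the name of the production that matches `text`, or None."""
    for kind, pattern in INT_PATTERNS.items():
        if pattern.fullmatch(text):
            return kind
    return None
```

Read literally, `decimal_number = "1".."9", decimal_digit*` also rejects a lone `0`, so `classify_int("0")` returns `None` in this transcription.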
|
||||||
|
|
||||||
<Code thpcode={`
|
<Code thpcode={`
|
||||||
12345
|
12345
|
||||||
01234 // This is a decimal number, not an octal number
|
01234 // This is an error, a decimal number can't have
|
||||||
|
// leading zeroes
|
||||||
0o755 // This is octal
|
0o755 // This is octal
|
||||||
0b0110
|
0b0110
|
||||||
0xff25
|
0xff25
|
||||||
@ -31,17 +32,15 @@ binary_number = "0", "b", binary_digit +
|
|||||||
|
|
||||||
`TODO`: Allow underscores `_` between any number: `1_000_000`.
|
`TODO`: Allow underscores `_` between any number: `1_000_000`.
|
||||||
|
|
||||||
`TODO`: Make it an error to have a number start with a leading zero,
|
|
||||||
to eliminate confusion with proper octal and legacy PHP octal.
|
|
||||||
|
|
||||||
|
|
||||||
## Float
|
## Float
|
||||||
|
|
||||||
```ebnf
|
```ebnf
|
||||||
Float = decimal_number, ".", decimal_number+, scientific_notation?
|
Float = decimal_digit+, ".", decimal_digit+, scientific_notation?
|
||||||
| decimal_number, scientific_notation
|
| decimal_digit+, scientific_notation
|
||||||
|
|
||||||
scientific_notation = "e", ("+" | "-"), decimal_number
|
scientific_notation = "e", ("+" | "-"), decimal_digit+
|
||||||
```
|
```
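
The revised `Float` production can be checked the same way. The sketch below transcribes the grammar as written, so the exponent sign is mandatory and only a lowercase `e` is accepted; `is_float` and `FLOAT_PATTERN` are hypothetical names, and `decimal_digit` is assumed to be `0` to `9`.

```python
import re

# Float = decimal_digit+, ".", decimal_digit+, scientific_notation?
#       | decimal_digit+, scientific_notation
# scientific_notation = "e", ("+" | "-"), decimal_digit+
FLOAT_PATTERN = re.compile(
    r"[0-9]+\.[0-9]+(?:e[+-][0-9]+)?"  # digits, dot, digits, optional exponent
    r"|[0-9]+e[+-][0-9]+"              # digits followed by a required exponent
)


def is_float(text):
    """Return True if `text` matches the Float production as written."""
    return FLOAT_PATTERN.fullmatch(text) is not None
```

Under this reading, `1.5`, `1.5e+10` and `2e-3` are accepted, while `1e10`, `1.` and `.5` are not.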
|
||||||
|
|
||||||
<Code thpcode={`
|
<Code thpcode={`
|
||||||
|