feat: update the language spec

Araozu 2024-11-17 08:35:47 -05:00
parent a95346f7ee
commit 33e51cd5cc
2 changed files with 103 additions and 12 deletions


@@ -36,7 +36,7 @@ and RegExp:
; comments
syntax = concatenation
concatenation = alternation grouping
concatenation = production1, production2
alternation = "a" | "b"
| "c"
@@ -65,8 +65,7 @@ The compiler consists of 5 common phases:
## Source Code representation
Source code is encoded in UTF-8, and a single Unicode code point is
a single character. As THP is implemented using the Rust programming
language, rules around Rust's UTF-8 usage are followed.
a single character.
## Basic characters
@@ -88,9 +87,102 @@ uppercase_letter = "A".."Z"
```
## Whitespace & Automatic semicolon insertion
This section is being reworked on the Zig rewrite of the compiler.
THP is whitespace insensitive. However, THP has special rules
when handling statement termination in order to not use
semicolons.
Certain statements have clearly defined markers of termination.
For example, an `if` statement always has braces `{}`, so
the closing brace `}` is the terminator. The same applies to
parentheses, square brackets, etc.
Other statements require an explicit terminator. For example,
the assignment statement:
<Code thpcode={`
val computation = 123 + 456 // how to detect if the statement ends here
* 789 // or extends up to here?
`} />
In other languages a semicolon would be used to signal the end of the
statement:
```c
int computation = 123 + 456
* 789;
```
THP does not use semicolons. Instead, THP has one strict rule and one exception
to that rule:
### All statements end with a newline
Regardless of indentation or other whitespace, every statement ends
with a newline.
<Code thpcode={`
val compute = 1 + 2 * 3 / 4
// statement ends here ↑
`} />
As mentioned before, this does not affect statements that have clear delimiters.
For example, the following code will work as expected:
<Code thpcode={`
val compute = my_function(
param1,
param2,
) / 64
// ↑ statement ends here
`} />
In a way, the parentheses "disable" the rule.
But how can a statement span multiple lines?
### Exception: operator on the next line
If the next line begins with any operator, the statement of the previous line
continues.
For example:
<Code thpcode={`
val computation = 123 + 456
* 789
// ↑ statement ends here, and there is a single statement
`} />
This holds regardless of indentation:
<Code thpcode={`
// weird indentation:
val computation = 123 + 456
* 789
// ↑ statement still ends here
`} />
What is important is that an operator begins the new line.
If the operator is left on the previous line, this will not work:
<Code thpcode={`
// statement ends here ↓, and now there is a syntax error (dangling operator)
val computation = 123 + 456 *
789
// ↑ this is a different statement
`} />
To support this, the parser must look ahead one token. This is the only place
where the parser does so.
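The rule and its exception can be sketched in a few lines of Python. This is only an illustration, not the compiler's implementation: `split_statements`, `strip_comment` and the `OPERATOR_CHARS` set are hypothetical names, the operator set is assumed (the spec says "any operator" without listing them), the look-ahead inspects the first character of the next line instead of a real token, comment-only lines are skipped before look-ahead, and only parentheses and square brackets are tracked (brace-delimited blocks are out of scope).

```python
# Assumed operator start characters; the spec does not enumerate the operators.
OPERATOR_CHARS = frozenset("+-*/%<>=!&|")


def strip_comment(line: str) -> str:
    """Drop a `//` comment (naive: does not account for string literals)."""
    return line.split("//", 1)[0].strip()


def split_statements(source: str) -> list:
    """Group source lines into statements using the newline rule."""
    # Assumption: blank and comment-only lines are skipped before look-ahead.
    lines = [s for s in (strip_comment(raw) for raw in source.splitlines()) if s]
    statements, current, depth = [], [], 0
    for i, line in enumerate(lines):
        current.append(line)
        # Open parentheses or square brackets "disable" the newline rule.
        depth += sum(line.count(c) for c in "([")
        depth -= sum(line.count(c) for c in ")]")
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        # Exception: a next line that begins with an operator continues the statement.
        continues = depth > 0 or next_line[:1] in OPERATOR_CHARS
        if not continues:
            statements.append(" ".join(current))
            current = []
    if current:  # unterminated input, e.g. an unclosed parenthesis
        statements.append(" ".join(current))
    return statements
```

With this sketch, `123 + 456` followed by a line starting with `* 789` is grouped into one statement, while a trailing `*` at the end of a line leaves `789` as a separate statement, matching the dangling-operator case above.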
## Whitespace
## Old Whitespace rules
THP is partially whitespace sensitive. It uses the following tokens: Indent, Dedent & NewLine
to determine when an expression spans multiple lines.


@@ -15,14 +15,15 @@ Int = hexadecimal_number
| decimal_number
hexadecimal_number = "0", ("x" | "X"), hexadecimal_digit+
decimal_number = decimal_digit+
octal_number = "0", ("o" | "O"), octal_digit+
binary_number = "0", "b", binary_digit +
binary_number = "0", ("b" | "B"), binary_digit+
decimal_number = "1".."9", decimal_digit*
```
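As a rough check of the updated integer rules, the productions can be transcribed into regular expressions. This is a sketch under assumptions: `classify_int` and `INT_PATTERNS` are hypothetical names, and `hexadecimal_digit` is assumed to be `0-9`, `a-f`, `A-F`, since its definition is not shown here. The patterns follow the rules exactly as written, so a lone `0` is not matched by `decimal_number`.

```python
import re
from typing import Optional

# One pattern per production, transcribed from the EBNF above.
INT_PATTERNS = {
    "hexadecimal": re.compile(r"0[xX][0-9a-fA-F]+"),  # digit set is an assumption
    "octal": re.compile(r"0[oO][0-7]+"),
    "binary": re.compile(r"0[bB][01]+"),
    "decimal": re.compile(r"[1-9][0-9]*"),  # no leading zero allowed
}


def classify_int(literal: str) -> Optional[str]:
    """Return the name of the production that matches, or None if none does."""
    for kind, pattern in INT_PATTERNS.items():
        if pattern.fullmatch(literal):
            return kind
    return None
```

Under these patterns `01234` is rejected, while `0o755`, `0b0110` and `0xff25` are classified as octal, binary and hexadecimal respectively.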
<Code thpcode={`
12345
01234 // This is a decimal number, not an octal number
01234 // This is an error: a decimal number can't have
// leading zeroes
0o755 // This is octal
0b0110
0xff25
@@ -31,17 +32,15 @@ binary_number = "0", "b", binary_digit +
`TODO`: Allow underscores `_` between any number: `1_000_000`.
`TODO`: Make it an error to have a number start with a leading zero,
to eliminate confusion with proper octal and legacy PHP octal.
## Float
```ebnf
Float = decimal_number, ".", decimal_number+, scientific_notation?
| decimal_number, scientific_notation
Float = decimal_digit+, ".", decimal_digit+, scientific_notation?
| decimal_digit+, scientific_notation
scientific_notation = "e", ("+" | "-"), decimal_number
scientific_notation = "e", ("+" | "-"), decimal_digit+
```
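The `decimal_digit+` version of the Float rule can likewise be transcribed into a single regular expression. This is a sketch with hypothetical names (`FLOAT_RE`, `is_float`); it follows the rule as written, so the exponent requires a lowercase `e` and an explicit sign.

```python
import re

# Float = decimal_digit+ "." decimal_digit+ scientific_notation?
#       | decimal_digit+ scientific_notation
# scientific_notation = "e" ("+" | "-") decimal_digit+
FLOAT_RE = re.compile(r"[0-9]+(?:\.[0-9]+(?:e[+-][0-9]+)?|e[+-][0-9]+)")


def is_float(literal: str) -> bool:
    """Check whether the whole literal matches the Float rule."""
    return FLOAT_RE.fullmatch(literal) is not None
```

Note that a literal such as `1e10` is rejected by this transcription, because the grammar makes the sign in `scientific_notation` mandatory.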
<Code thpcode={`