From 33e51cd5cc7f9da7989bd8459888ff63e8da7c2b Mon Sep 17 00:00:00 2001
From: Araozu
Date: Sun, 17 Nov 2024 08:35:47 -0500
Subject: [PATCH] feat: update the language spec

---
 src/pages/spec/index.mdx          | 100 ++++++++++++++++++++++++++++--
 src/pages/spec/tokens/numbers.mdx |  15 +++--
 2 files changed, 103 insertions(+), 12 deletions(-)

diff --git a/src/pages/spec/index.mdx b/src/pages/spec/index.mdx
index 2307d86..64344e5 100644
--- a/src/pages/spec/index.mdx
+++ b/src/pages/spec/index.mdx
@@ -36,7 +36,7 @@ and RegExp:
 ; comments
 
 syntax = concatenation
 
-concatenation = alternation grouping
+concatenation = production1, production2
 
 alternation = "a" | "b" | "c"
@@ -65,8 +65,7 @@ The compiler consists of 5 common phases:
 
 ## Source Code representation
 
 Source code is encoded in UTF-8, and a single UTF-8 codepoint is
-a single character. As THP is implemented using the Rust programming
-language, rules around Rust's UTF-8 usage are followed.
+a single character.
 
 ## Basic characters
@@ -88,9 +87,102 @@ uppercase_letter = "A".."Z"
 ```
 
+## Whitespace & Automatic semicolon insertion
+
+This section is being reworked in the Zig rewrite of the compiler.
+
+THP is whitespace insensitive, except for newlines: it has special rules
+for statement termination so that semicolons are not needed.
+
+Some statements have clearly defined termination markers.
+For example, an `if` statement always has braces `{}`, so
+the closing brace `}` is the terminator. The same applies to
+parentheses, square brackets, etc.
+
+Other statements require an explicit terminator. For example,
+the assignment statement:
+
+
+
+In other languages a semicolon would be used to signal the end of the
+statement:
+
+```c
+int computation = 123 + 456
+* 789;
+```
+
+THP does not use semicolons. Instead, it has one strict rule and one exception
+to that rule:
+
+### All statements end with a newline
+
+Regardless of indentation or other whitespace, every statement ends
+with a newline.
+
+
+
+As mentioned before, this does not affect statements that have clear delimiters.
+For example, the following code works as expected:
+
+
+
+In a way, the parentheses "disable" the rule.
+
+So how can a statement span multiple lines?
+
+### Exception: operator on the next line
+
+If the next line begins with any operator, the statement on the previous line
+continues.
+
+For example:
+
+
+
+This holds regardless of indentation:
+
+
+
+What matters is that an operator begins the new line.
+If the operator is left at the end of the previous line, this will not work:
+
+
+
+To support this, the parser must look ahead 1 token. This is the only place
+where the parser does so.
 
-## Whitespace
+
+## Old Whitespace rules
 
 THP is partially whitespace sensitive. It uses the following tokens:
 Indent, Dedent & NewLine to determine when an expression spans multiple lines.
diff --git a/src/pages/spec/tokens/numbers.mdx b/src/pages/spec/tokens/numbers.mdx
index cb17f6c..892efbc 100644
--- a/src/pages/spec/tokens/numbers.mdx
+++ b/src/pages/spec/tokens/numbers.mdx
@@ -15,14 +15,15 @@ Int = hexadecimal_number | decimal_number
 hexadecimal_number = "0", ("x" | "X"), hexadecimal_digit+
-decimal_number = decimal_digit+
 octal_number = "0", ("o" | "O"), octal_digit+
-binary_number = "0", "b", binary_digit
+
+binary_number = "0", ("b" | "B"), binary_digit+
+decimal_number = "1".."9", decimal_digit*
 ```
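The newline rule and its operator exception from the `index.mdx` change can be sketched as a small statement splitter. This is a minimal illustration in Python, not THP's compiler: the token set, `tokenize` and `split_statements` are hypothetical, only `(` and `[` are modeled as delimiters that suspend the newline rule, and braces are not modeled at all.

```python
import re

# Hypothetical, simplified token set for illustration; THP's real lexer differs.
OPERATORS = {"+", "-", "*", "/", "="}
OPENING = {"(", "["}
CLOSING = {")", "]"}

TOKEN_RE = re.compile(
    r"(?P<newline>\r?\n\s*)"    # a newline plus any blank lines/indentation after it
    r"|(?P<space>[ \t\r]+)"     # other whitespace is insignificant
    r"|(?P<number>\d+)"
    r"|(?P<ident>[A-Za-z_]\w*)"
    r"|(?P<punct>[-+*/=()\[\]])"
)


def tokenize(source):
    """Split source into tokens; each run of newlines becomes one newline token."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup == "newline":
            tokens.append("\n")
        elif match.lastgroup != "space":
            tokens.append(match.group())
        pos = match.end()
    return tokens


def split_statements(tokens):
    """Group tokens into statements using the newline rule and its exception."""
    statements, current, depth = [], [], 0
    for i, tok in enumerate(tokens):
        if tok == "\n":
            if depth > 0:
                # Inside ( ) or [ ] the closing delimiter ends the construct,
                # so newlines do not terminate anything.
                continue
            # Exception, using 1 token of look-ahead: if the next line starts
            # with an operator, the current statement continues.
            next_tok = tokens[i + 1] if i + 1 < len(tokens) else None
            if next_tok in OPERATORS:
                continue
            # Strict rule: otherwise the newline ends the statement.
            if current:
                statements.append(current)
                current = []
            continue
        if tok in OPENING:
            depth += 1
        elif tok in CLOSING:
            depth -= 1
        current.append(tok)
    if current:
        statements.append(current)
    return statements


if __name__ == "__main__":
    source = "total = 123 + 456\n    * 789\nprint(total)\n"
    for statement in split_statements(tokenize(source)):
        print(" ".join(statement))
    # total = 123 + 456 * 789
    # print ( total )
```

Collapsing consecutive blank lines into one newline token keeps the exception a true 1-token look-ahead: the token after a newline is always the first token of the next non-blank line. With `total = 123 +` followed by `456` on its own line, the sketch produces two statements, matching the "will not work" case. A consequence of the rule as written is that a line starting with `-` always continues the previous statement.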
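The revised `numbers.mdx` rules map naturally onto one regular expression per alternative. The Python sketch below is only an illustration: the digit classes are assumptions, since `hexadecimal_digit`, `octal_digit`, `binary_digit` and `decimal_digit` are defined outside this hunk, and `classify_int`/`parse_int` are hypothetical helpers, not part of THP.

```python
import re

# Assumed digit classes: hexadecimal_digit = 0-9a-fA-F, octal_digit = 0-7,
# binary_digit = 0-1, decimal_digit = 0-9 (definitions not shown in the hunk).
INT_RULES = [
    ("hexadecimal_number", re.compile(r"0[xX][0-9a-fA-F]+")),  # "0", ("x" | "X"), hexadecimal_digit+
    ("octal_number", re.compile(r"0[oO][0-7]+")),              # "0", ("o" | "O"), octal_digit+
    ("binary_number", re.compile(r"0[bB][01]+")),              # "0", ("b" | "B"), binary_digit+
    ("decimal_number", re.compile(r"[1-9][0-9]*")),            # "1".."9", decimal_digit*
]


def classify_int(literal):
    """Return the name of the rule that matches the whole literal, or None."""
    for name, pattern in INT_RULES:
        if pattern.fullmatch(literal):
            return name
    return None


def parse_int(literal):
    """Validate against the rules above, then compute the integer value."""
    if classify_int(literal) is None:
        raise ValueError(f"not a valid Int literal: {literal!r}")
    # int(..., 0) accepts the same 0x/0o/0b prefixes, in either case.
    return int(literal, 0)


if __name__ == "__main__":
    for text in ["0xFF", "0o17", "0B101", "42", "0123", "0b"]:
        print(text, classify_int(text))
    # 0xFF hexadecimal_number
    # 0o17 octal_number
    # 0B101 binary_number
    # 42 decimal_number
    # 0123 None
    # 0b None
```

Read literally, the new `decimal_number` rule rejects leading zeros such as `0123`, and none of the four rules matches a lone `0`; whether `0` is handled elsewhere in `numbers.mdx` is not visible in this hunk.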