# Complete TSLX Block Matching Rules

## Basic Principles
- After `<?tslx>`, the parser enters a `tslx_block`; by default everything inside it is text (`tslx_content`)
- The scanner uses lookahead to decide how `<?tsl` and `<?=` are handled

---

## Matching Rules

### Rule 1: Paired `<?tsl ... ?>`

**Condition**: `<?tsl` is encountered and a matching `?>` can be found

**Behavior**: matched as a `tsl_statement_block`

**Token sequence**: `tsl_statement_start_tag` → `statement...` → `tsl_statement_end_tag`

**Result**: the `tslx_block` continues

---

### Rule 2: Paired `<?= ... ?>`

**Condition**: `<?=` is encountered and a matching `?>` can be found

**Behavior**: matched as a `tsl_expression_block`

**Token sequence**: `tsl_expression_start_tag` → `expression` → `tsl_expression_end_tag`

**Result**: the `tslx_block` continues

---

### Rule 3: Unpaired `<?tsl` (key feature)

**Condition**: `<?tsl` is encountered but no matching `?>` can be found

**Behavior**:
- The scanner looks ahead and determines that no `?>` follows
- **The `<?tsl` itself is recognized as the `tslx_end_tag`** (no extra token is generated)
- The `tslx_block` ends
- The parser leaves the `tslx_block` and continues parsing what follows (e.g. `a := 1;`)

**Token sequence**:
```
tslx_end_tag: "<?tsl"    ← the text of this token is exactly "<?tsl"
```

**Result**: the `tslx_block` ends; what follows is handled by the other rules of `root`

---

### Rule 4: Unpaired `<?=` (error case)

**Condition**: `<?=` is encountered but no matching `?>` can be found

**Behavior**:
- The scanner returns `tsl_expression_start_tag`
- The parser expects an `expression` followed by `tsl_expression_end_tag`
- The missing `?>` causes a **syntax error**

**Result**: the parser reports an error

---

### Rule 5: Reaching EOF

**Condition**: the end of the file is reached and no unpaired `<?tsl` was encountered before it

**Behavior**: a `tslx_end_tag` is generated at the EOF position

**Result**: the `tslx_block` ends normally

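Rules 1 to 5 can be condensed into a single decision made at each position inside a `tslx_block`. The sketch below is only a string-based model of that decision, not the real external scanner: the function name `classifyAt` is hypothetical, and it assumes the lookahead for `?>` runs all the way to EOF (the lookahead strategy is still an open choice in this document).

```javascript
// Hypothetical model of Rules 1-5. `src` is the whole input, `pos` is the
// current position inside a tslx_block. Assumes unlimited lookahead for "?>".
function classifyAt(src, pos) {
  // Rule 5: at EOF the block is closed by a generated tslx_end_tag.
  if (pos >= src.length) return "tslx_end_tag";

  // Rules 2 and 4: "<?=" always opens an expression block; if "?>" is
  // missing, the parser (not the scanner) reports the error later.
  if (src.startsWith("<?=", pos)) return "tsl_expression_start_tag";

  if (src.startsWith("<?tsl", pos)) {
    const paired = src.indexOf("?>", pos + "<?tsl".length) !== -1;
    // Rule 1: paired, so it opens a statement block.
    // Rule 3: unpaired, so "<?tsl" itself becomes the tslx_end_tag.
    return paired ? "tsl_statement_start_tag" : "tslx_end_tag";
  }

  // Default: everything else is plain text.
  return "tslx_content";
}

console.log(classifyAt("<?tsl echo 1; ?>", 0));
console.log(classifyAt("<?tsl\na := 1;", 0));
```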
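---

## Scanner Lookahead Strategy

The scanner has to decide whether a matching `?>` follows a `<?tsl` or `<?=`:

### Recommended strategies (pick one):

**Strategy A: scan to the next newline or semicolon**
```
<?tsl         ← scan forward
a := 1;       ← no ?> before the ;, so it is judged unpaired
```
- Pros: quick decision; matches the intuition of single-line statements
- Cons: multi-line statements may be misjudged

**Strategy B: scan to the next `<?` or EOF**
```
<?tsl         ← scan forward
a := 1;
<?tsl ...     ← no ?> before the next <?, so it is judged unpaired
```
- Pros: supports multi-line statements
- Cons: the scan distance can be long

**Strategy C: scan a fixed distance (e.g. 1000 characters)**
```
<?tsl         ← look ahead N characters
...           ← if no ?> is found within that range, it is judged unpaired
```
- Pros: predictable performance
- Cons: very long paired blocks may be misjudged

**Current implementation**: _[fill in the strategy you chose]_
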
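To compare the three strategies side by side, here is a minimal sketch of each as a plain string function. All three names are hypothetical, `from` is assumed to be the index just after the opening tag, and the real scanner would work on a character stream rather than a string. One thing this literal reading makes visible: Strategy A, as worded, stops at the first `;`, so it would judge a one-line block such as `<?tsl echo 1; ?>` as unpaired.

```javascript
// Strategy A (literal reading): stop at the first newline or semicolon.
function pairedByLineOrSemicolon(src, from) {
  for (let i = from; i < src.length; i++) {
    if (src.startsWith("?>", i)) return true;
    if (src[i] === "\n" || src[i] === ";") return false;
  }
  return false; // reached EOF without seeing "?>"
}

// Strategy B: stop at the next "<?" or at EOF.
function pairedByNextOpenTag(src, from) {
  for (let i = from; i < src.length; i++) {
    if (src.startsWith("?>", i)) return true;
    if (src.startsWith("<?", i)) return false;
  }
  return false; // reached EOF without seeing "?>"
}

// Strategy C: look at most `limit` characters ahead (1000 is the
// example value from the text, not a tuned constant).
function pairedWithinLimit(src, from, limit = 1000) {
  return src.slice(from, from + limit).includes("?>");
}

const start = "<?tsl".length;
const oneLine = "<?tsl echo 1; ?>";
console.log(pairedByLineOrSemicolon(oneLine, start)); // false: ";" comes before "?>"
console.log(pairedByNextOpenTag(oneLine, start));     // true
console.log(pairedWithinLimit(oneLine, start));       // true
```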
---

## Worked Examples

### Example 1: Unpaired `<?tsl`

#### Input:
```typescript
<?tslx>
aaaa
<?tsl
a := 1;
```

#### Token sequence:
```
1. tslx_tag           "<?tslx>"
2. tslx_content       "\naaaa\n"
3. tslx_end_tag       "<?tsl"     ← the <?tsl itself is this token
4. identifier         "a"
5. :=                 ":="
6. number             "1"
7. ;                  ";"
```

#### AST structure:
```
root
├── tslx_block
│   ├── tslx_tag: "<?tslx>"
│   ├── tslx_content: "\naaaa\n"
│   └── tslx_end_tag: "<?tsl"
└── var_declaration
    ├── name: "a"
    └── value: 1
```

#### Notes:
- `<?tsl` ends the `tslx_block`
- `a := 1;` lies **outside** the `tslx_block` and is matched by `var_declaration` under `root`

---

### Example 2: Everything paired + EOF

#### Input:
```typescript
<?tslx>
aaaa
<?tsl echo 1; ?>
bbb
```

#### Token sequence:
```
1. tslx_tag                   "<?tslx>"
2. tslx_content               "\naaaa\n"
3. tsl_statement_start_tag    "<?tsl"
4. identifier                 "echo"
5. number                     "1"
6. ;                          ";"
7. tsl_statement_end_tag      "?>"
8. tslx_content               "\nbbb\n"
9. tslx_end_tag               ""          ← generated at the EOF position
```

#### AST structure:
```
root
└── tslx_block
    ├── tslx_tag: "<?tslx>"
    ├── tslx_content: "\naaaa\n"
    ├── tsl_statement_block
    │   └── expression_statement
    │       └── call_expression: echo(1)
    ├── tslx_content: "\nbbb\n"
    └── tslx_end_tag: (EOF)
```

---

### Example 3: Unpaired `<?=` (error)

#### Input:
```typescript
<?tslx>
<?=
a + 1
```

#### Token sequence:
```
1. tslx_tag                    "<?tslx>"
2. tslx_content                "\n"
3. tsl_expression_start_tag    "<?="
4. identifier                  "a"
5. +                           "+"
6. number                      "1"
[ERROR] expected tsl_expression_end_tag (?>), but reached EOF
```

#### Result:
**Syntax error**: `<?=` must have a matching `?>`

---

### Example 4: Mixed usage

#### Input:
```typescript
<?tslx>
text1
<?tsl echo "hello"; ?>
text2
<?= 1 + 1 ?>
text3
<?tsl
var x := 1;
```

#### Token sequence:
```
1.  tslx_tag                    "<?tslx>"
2.  tslx_content                "\ntext1\n"
3.  tsl_statement_start_tag     "<?tsl"
4.  ... (echo "hello")
5.  tsl_statement_end_tag       "?>"
6.  tslx_content                "\ntext2\n"
7.  tsl_expression_start_tag    "<?="
8.  ... (1 + 1)
9.  tsl_expression_end_tag      "?>"
10. tslx_content                "\ntext3\n"
11. tslx_end_tag                "<?tsl"     ← unpaired, ends the tslx_block
12. identifier                  "var"
13. ... (x := 1)
```

#### AST structure:
```
root
├── tslx_block
│   ├── tslx_tag
│   ├── tslx_content: "\ntext1\n"
│   ├── tsl_statement_block: echo("hello")
│   ├── tslx_content: "\ntext2\n"
│   ├── tsl_expression_block: 1 + 1
│   ├── tslx_content: "\ntext3\n"
│   └── tslx_end_tag: "<?tsl"
└── var_declaration: x := 1
```

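The block-level token sequences in the examples above can be reproduced with a small string-based model. This is a sketch under stated assumptions, not the actual scanner: `scanTslxBlock`, the `code` token kind, and the `rest` field are hypothetical names; the contents of each paired tag are collapsed into one `code` entry instead of being tokenized; the lookahead for `?>` is assumed to run to EOF; and throwing a `SyntaxError` stands in for the parser error of Rule 4.

```javascript
// Hypothetical block-level model of a tslx_block. Returns the tokens of the
// block plus `rest`, the source left for the other rules of `root`.
function scanTslxBlock(src) {
  const OPEN = "<?tslx>";
  if (!src.startsWith(OPEN)) throw new Error("input must start with <?tslx>");
  const tokens = [["tslx_tag", OPEN]];
  let pos = OPEN.length;

  while (true) {
    // Plain text runs up to the next "<?tsl" or "<?=", or to EOF.
    const hits = [src.indexOf("<?tsl", pos), src.indexOf("<?=", pos)]
      .filter((i) => i !== -1);
    const next = hits.length > 0 ? Math.min(...hits) : src.length;
    if (next > pos) tokens.push(["tslx_content", src.slice(pos, next)]);

    if (next === src.length) {
      tokens.push(["tslx_end_tag", ""]); // Rule 5: generated at EOF
      return { tokens, rest: "" };
    }

    const isExpr = src.startsWith("<?=", next);
    const open = isExpr ? "<?=" : "<?tsl";
    const close = src.indexOf("?>", next + open.length);

    if (close === -1) {
      // Rule 4: an unpaired "<?=" is an error.
      if (isExpr) throw new SyntaxError("`<?=` requires a matching `?>`");
      // Rule 3: an unpaired "<?tsl" is itself the tslx_end_tag.
      tokens.push(["tslx_end_tag", open]);
      return { tokens, rest: src.slice(next + open.length) };
    }

    // Rules 1 and 2: a paired tag becomes start tag, body, end tag.
    const prefix = isExpr ? "tsl_expression" : "tsl_statement";
    tokens.push([`${prefix}_start_tag`, open]);
    tokens.push(["code", src.slice(next + open.length, close).trim()]);
    tokens.push([`${prefix}_end_tag`, "?>"]);
    pos = close + 2;
  }
}

// Example 1 from the text: the unpaired "<?tsl" closes the block.
const example1 = scanTslxBlock("<?tslx>\naaaa\n<?tsl\na := 1;\n");
console.log(example1.tokens.map(([kind]) => kind).join(" "));
console.log(JSON.stringify(example1.rest));
```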
---

## Suggested Changes to Grammar.js

### Current definition:
```javascript
tslx_block: ($) =>
  prec.right(
    seq(
      $.tslx_tag,
      repeat(
        choice(
          $.tslx_content,
          $._tsl_statement_block,
          $._tsl_expression_block,
          $.tslx_end_tag // ❌ Problem: matching can continue after the end tag
        ),
      ),
    ),
  ),
```

### Suggested change:
```javascript
tslx_block: ($) =>
  prec.right(
    seq(
      $.tslx_tag,
      repeat(
        choice(
          $.tslx_content,
          $._tsl_statement_block,
          $._tsl_expression_block,
        ),
      ),
      $.tslx_end_tag // ✅ Moved outside repeat so it acts as the terminator
    ),
  ),
```

**Rationale**:
- `tslx_end_tag` is the **terminator** of `tslx_block` and should not sit inside the loop body
- This guarantees that nothing such as `tslx_content` is matched after `tslx_end_tag`

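The difference between the two rule shapes can be illustrated by modeling each as a regular expression over token kinds. This is only an approximation for illustration: it ignores `prec.right` and whatever `root` accepts after the block, and the one-letter abbreviations and helper names are hypothetical.

```javascript
// One letter per node kind; hypothetical abbreviations for this sketch.
const ABBREVIATIONS = {
  tslx_tag: "T",
  tslx_content: "c",
  tsl_statement_block: "s",
  tsl_expression_block: "e",
  tslx_end_tag: "E",
};

function encode(kinds) {
  return kinds
    .map((kind) => {
      if (!(kind in ABBREVIATIONS)) throw new Error(`unknown kind: ${kind}`);
      return ABBREVIATIONS[kind];
    })
    .join("");
}

// Current shape: the end tag is just one more choice inside repeat().
const currentShape = /^T[cseE]*$/;
// Suggested shape: the end tag follows repeat() exactly once.
const proposedShape = /^T[cse]*E$/;

const wellFormed = encode([
  "tslx_tag", "tslx_content", "tsl_statement_block", "tslx_end_tag",
]);
const contentAfterEnd = encode([
  "tslx_tag", "tslx_content", "tslx_end_tag", "tslx_content",
]);

console.log(currentShape.test(wellFormed), proposedShape.test(wellFormed));
// → true true
console.log(currentShape.test(contentAfterEnd), proposedShape.test(contentAfterEnd));
// → true false
```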
---

## Key Takeaways

1. ✅ `<?tsl` plays a **dual role**:
   - a matching `?>` exists → `tsl_statement_start_tag`
   - no matching `?>` → `tslx_end_tag`

2. ✅ `<?=` **must be paired**; otherwise it is a syntax error

3. ✅ When `<?tsl` acts as the `tslx_end_tag`, the token's text is exactly `"<?tsl"`

4. ✅ The scanner uses **lookahead** to decide whether a matching `?>` exists

5. ✅ In the grammar, `tslx_end_tag` should be the **terminator** of `tslx_block`

6. ✅ At EOF a `tslx_end_tag` is generated automatically so the `tslx_block` ends normally