
# 2.3 Processing Characters

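Go source files are UTF-8 encoded, and identifiers such as variable names may contain any Unicode letter (Chinese characters, for example), not just ASCII. The logic that decodes characters from the raw byte stream lives in `source.go`. Its main struct is: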

```go
type source struct {
	in   io.Reader                        // input source
	errh func(line, col uint, msg string) // error handler

	buf       []byte // buffer holding the bytes read from in
	ioerr     error  // I/O error hit while reading from in into buf, if any
	b, r, e   int    // indices into buf: b starts the segment being collected (-1 if none), r follows ch, e ends the buffered data
	line, col uint   // source position of the current character
	ch        rune   // most recently read UTF-8 character
	chw       int    // byte width of ch: 1 for ASCII, more than 1 for e.g. Chinese characters
}
```
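To see the byte widths that `chw` records, the standard `unicode/utf8` package can report them directly. The program below is only an illustration (the `describe` helper is hypothetical and not part of `source.go`): for each character it prints the quoted character, its code point, its decimal value, and its UTF-8 width, which is 1 for ASCII and 3 for characters such as `中` or `“`.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe is a hypothetical helper: for each rune in s it returns the quoted
// rune, its code point, its decimal value, and its UTF-8 byte width
// (the quantity source.chw stores for source.ch).
func describe(s string) []string {
	var out []string
	for _, r := range s { // ranging over a string decodes UTF-8 runes
		out = append(out, fmt.Sprintf("%q U+%04X %d %d", r, r, r, utf8.RuneLen(r)))
	}
	return out
}

func main() {
	for _, line := range describe("aA 中“_") {
		fmt.Println(line)
	}
}
```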

Two of its methods are central:

```go
func (s *source) nextch() { /* ... */ }

func (s *source) fill() { /* ... */ }
```

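`nextch` reads the next character; its source shows that it relies on `unicode/utf8` to decode the next UTF-8 character from `buf`. `fill` refills the buffer `buf` from `in`. Because `source` always holds the current read state, `nextch` returns nothing: it updates the fields of `source` instead, and the current character is available directly in `source.ch`.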

Let's check this module's behavior with a unit test: in the same directory, create a `source_test.go` file containing the following test:

```go
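package syntax

import (
	"bytes"
	"fmt"
	"testing"
)

func TestSource(t *testing.T) {
	var s source
	var buf bytes.Buffer
	buf.WriteString("abcdeABCDE 中文来了“ _")

	// Report any error the source encounters, together with its position.
	s.init(&buf, func(line, col uint, msg string) {
		fmt.Printf("Error: [msg: %s, line: %d, col: %d]\n", msg, line, col)
	})

	// Load the input into s.buf; this short string fits in a single fill here.
	s.fill()

	// Decode one character per iteration until the read index r reaches the end e.
	for s.r != s.e {
		s.nextch()
		fmt.Printf("%v,", s.ch)
	}
	fmt.Println()
}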
```

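Running the test (for example with `go test -v -run TestSource`) shows that each Chinese character, like the curly quotation mark `“`, is correctly decoded into a single code point: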

> `97,98,99,100,101,65,66,67,68,69,32,20013,25991,26469,20102,8220,32,95,`