Reading the Go context Source Code

2025-10-31 · 3215 words · ~ 16 min read

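Go 1.7 brought the context package into the standard library. The core problem it solves: carrying deadlines, cancellation signals, and request-scoped key-value data across goroutines along the call chain of a single request. This article walks through the implementation layer by layer, based on the Go 1.26.3 source (src/context/context.go), without skipping details.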

1. The Context interface

type Context interface {
    Deadline() (deadline time.Time, ok bool)
    Done() <-chan struct{}
    Err() error
    Value(key any) any
}

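Semantics and contracts of the four methods:

Deadline returns the deadline. ok == false means no deadline is set. Successive calls return the same result, so callers may cache it.

Done returns a receive-only channel. When the context is canceled the channel is closed, and every goroutine waiting on it is woken. A few special cases:

Background() and TODO() always return nil from Done(), meaning they are never canceled.
The channel may be closed asynchronously, after the cancel function has returned (source comment: "The close of the Done channel may happen asynchronously, after the cancel function returns").
Using Done() in a select statement is the standard pattern. It is safe even for a nil channel, because a receive from a nil channel blocks forever and that case is simply never selected.

Err returns nil while Done is not yet closed. Once Done is closed it returns a non-nil error; for the standard context types this is either Canceled or DeadlineExceeded. After Err returns a non-nil error, every later call returns the same error.

Value looks up key along the context chain and returns nil if nothing is found. The key must be of a comparable type; the official recommendation is to use an unexported type defined in your own package so keys from different packages cannot collide.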

The two predefined errors

var Canceled = errors.New("context canceled")

var DeadlineExceeded error = deadlineExceededError{}

type deadlineExceededError struct{}

func (deadlineExceededError) Error() string   { return "context deadline exceeded" }
func (deadlineExceededError) Timeout() bool   { return true }
func (deadlineExceededError) Temporary() bool { return true }

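DeadlineExceeded is not a plain errors.New value but a custom type that implements Timeout() bool and Temporary() bool. This keeps it compatible with the net.Error interface: networking code commonly uses these two methods to tell a retryable timeout from a permanent failure, so a context timeout can flow through existing network error handling unchanged.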

2. Root contexts: Background and TODO

type emptyCtx struct{}

func (emptyCtx) Deadline() (deadline time.Time, ok bool) { return }
func (emptyCtx) Done() <-chan struct{}                    { return nil }
func (emptyCtx) Err() error                              { return nil }
func (emptyCtx) Value(key any) any                       { return nil }

type backgroundCtx struct{ emptyCtx }
type todoCtx struct{ emptyCtx }

func (backgroundCtx) String() string { return "context.Background" }
func (todoCtx) String() string       { return "context.TODO" }

func Background() Context { return backgroundCtx{} }
func TODO() Context       { return todoCtx{} }

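A few design details:

emptyCtx is an empty struct and occupies no memory. Background() and TODO() return a value type on every call, so nothing is allocated on the heap.

backgroundCtx and todoCtx both embed emptyCtx and behave identically; the difference is purely semantic:

Background is the root of every context tree, used in main, in tests, and at service entry points.
TODO is a placeholder meaning the caller has not yet decided which context to pass. It makes unfinished context plumbing easy to spot, both for readers and for static analysis tools.

Both implement String() so that debug output prints a readable name rather than an uninformative default representation.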

3. cancelCtx: the core of the cancellation mechanism

3.1 Related type definitions

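// canceler is an internal interface. The source comment lists *cancelCtx and
// *timerCtx as implementations; *afterFuncCtx (section 6.3) satisfies it too.
type canceler interface {
    cancel(removeFromParent bool, err, cause error)
    Done() <-chan struct{}
}

// cancelCtxKey is the sentinel key through which a cancelCtx exposes itself.
var cancelCtxKey int

// closedchan is a package-level, already-closed channel used as an optimization.
var closedchan = make(chan struct{})
func init() { close(closedchan) }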

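canceler is an internal interface: its cancel method is unexported, so no type outside the package can implement it. It exists solely to manage the cancellation tree.

cancelCtxKey is a package-level variable of type int, but what actually serves as the key is its address, &cancelCtxKey. The pointer is unique and the variable is unexported, so no other package can produce an equal key. cancelCtx.Value(&cancelCtxKey) returns the cancelCtx itself; this is the mechanism the package uses internally to walk the cancellation tree, and parentCancelCtx relies on it later.

closedchan is a global channel closed in init. Its purpose: when a cancelCtx is canceled and Done() has never been called (the done field is still empty), closedchan is stored into done directly, avoiding the cost of creating a channel only to close it immediately.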

3.2 The cancelCtx struct

type cancelCtx struct {
    Context                              // parent context; set by propagateCancel

    mu       sync.Mutex                 // protects the fields below
    done     atomic.Value               // holds a chan struct{}, created lazily
    children map[canceler]struct{}      // child contexts; set to nil by the first cancel
    err      atomic.Value               // stores Canceled or DeadlineExceeded on cancel
    cause    error                      // set on cancel; equals err unless a custom cause was supplied
}

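What each field is responsible for:

Context (embedded field): points to the parent context. Value lookups are delegated to it, and it is also how the node finds its ancestors when it is canceled. Note that this is the interface type, not *cancelCtx, so any Context implementation can be embedded.

mu: a mutex that keeps done, children, err, and cause consistent with one another. Although err is an atomic.Value, it is still written while holding mu, so that "set err, close the done channel, clear children" happens as a single critical section.

done: holds a chan struct{}. It is an atomic.Value rather than a plain channel field so that the read path can skip the lock (an atomic load), which reduces contention. It is created lazily: the channel is allocated only on the first call to Done(), so a context whose Done channel is never observed saves one allocation.

children: a map of child nodes. The key is a canceler interface value (in practice a *cancelCtx, *timerCtx, or *afterFuncCtx pointer) and the value is an empty struct. On cancel, the map is iterated to cancel every child recursively, then set to nil to release memory.

err: also an atomic.Value, which lets Err() do a fast check without taking the lock.

cause: introduced together with WithCancelCause. After a plain WithCancel cancellation, cause equals err; with WithCancelCause it can hold any error the caller supplies.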

3.3 WithCancel and withCancel

func WithCancel(parent Context) (ctx Context, cancel CancelFunc) {
    c := withCancel(parent)
    return c, func() { c.cancel(true, Canceled, nil) }
}

func withCancel(parent Context) *cancelCtx {
    if parent == nil {
        panic("cannot create context from nil parent")
    }
    c := &cancelCtx{}
    c.propagateCancel(parent, c)
    return c
}

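Why WithCancel is split into two layers: withCancel (lowercase) returns a *cancelCtx and is shared by WithCancel and WithCancelCause; WithCancel (uppercase) is the public API, which returns the *cancelCtx as a Context interface value and builds the CancelFunc closure. (timerCtx does not go through withCancel; WithDeadlineCause embeds a cancelCtx and calls propagateCancel itself, as section 4.2 shows.)

The CancelFunc closure calls c.cancel(true, Canceled, nil). The first argument, true, means "on cancel, remove this node from the parent's children map".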

3.4 cancelCtx.Value()

func (c *cancelCtx) Value(key any) any {
    if key == &cancelCtxKey {
        return c
    }
    return value(c.Context, key)
}

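When key is &cancelCtxKey, Value returns c itself (the *cancelCtx). This is a self-identification mechanism: propagateCancel (through parentCancelCtx) calls parent.Value(&cancelCtxKey) to find the nearest *cancelCtx up the chain and then registers the child in that node's children map, which makes cancellation propagation cheap. Any other key is delegated to the parent context.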

3.5 cancelCtx.Done()

func (c *cancelCtx) Done() <-chan struct{} {
    d := c.done.Load()
    if d != nil {
        return d.(chan struct{})
    }
    c.mu.Lock()
    defer c.mu.Unlock()
    d = c.done.Load()
    if d == nil {
        d = make(chan struct{})
        c.done.Store(d)
    }
    return d.(chan struct{})
}

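This is the classic double-checked locking pattern, in three steps:

Step 1: an atomic load, the lock-free fast path. If the channel already exists (true for most calls), return it without locking.

Step 2: take the lock and check again. Another goroutine may have created the channel between the two checks; the second check prevents creating it twice.

Step 3: only after confirming nil, create the channel and store it.

Why store the channel in an atomic.Value instead of a plain chan struct{} field? Because the fast path reads the field without holding mu while another goroutine may be writing it under mu. An unsynchronized read of a plain field concurrent with a write is a data race under the Go memory model, whatever the hardware does; atomic.Value makes the lock-free read well defined.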
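
The lazy channel plus the closedchan shortcut can be reproduced in isolation. The following is a simplified sketch, not the package's code: lazySignal, Fire, and closedChan are hypothetical names, and the explicit closed flag stands in for the err field that cancelCtx uses for its idempotency check.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// closedChan is a reusable, already-closed channel (it mirrors closedchan).
var closedChan = func() chan struct{} {
	c := make(chan struct{})
	close(c)
	return c
}()

// lazySignal is a simplified stand-in for the done handling in cancelCtx:
// the channel is created on first use and closed at most once.
type lazySignal struct {
	mu     sync.Mutex
	done   atomic.Value // holds a chan struct{}
	closed bool
}

// Done returns the channel, creating it on the first call
// (double-checked locking: atomic load, then re-check under the lock).
func (s *lazySignal) Done() <-chan struct{} {
	if d := s.done.Load(); d != nil {
		return d.(chan struct{}) // fast path, no lock
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	d := s.done.Load()
	if d == nil {
		d = make(chan struct{})
		s.done.Store(d)
	}
	return d.(chan struct{})
}

// Fire closes the channel, or installs closedChan if Done was never called.
func (s *lazySignal) Fire() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.closed {
		return // idempotent
	}
	s.closed = true
	if d, _ := s.done.Load().(chan struct{}); d == nil {
		s.done.Store(closedChan)
	} else {
		close(d)
	}
}

func main() {
	var a, b lazySignal
	ch := a.Done() // the channel is allocated here
	a.Fire()
	b.Fire() // Done was never called: closedChan is reused, nothing is allocated
	_, open := <-ch
	fmt.Println(open, b.Done() == closedChan) // false true
}
```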

3.6 cancelCtx.Err()

func (c *cancelCtx) Err() error {
    // An atomic load is ~5x faster than a mutex, which can matter in tight loops.
    if err := c.err.Load(); err != nil {
        // Ensure the done channel has been closed before returning a non-nil error.
        <-c.Done()
        return err.(error)
    }
    return nil
}

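The fast path of Err() is an atomic load with no lock, roughly 5x faster than taking the mutex according to the source comment. That matters when code checks for cancellation repeatedly in a loop, for example while processing streaming data.

The line <-c.Done() looks odd, but it simply waits for the done channel to be closed. cancel() sets err first and closes the channel second. If Err() runs after err is set but before the channel is closed, there is a brief window in which err is non-nil while Done() is still open. <-c.Done() guarantees that by the time a caller receives a non-nil error, the Done channel is already closed, so the two are consistent.

This cannot block forever: err is set only inside cancel(), which closes the channel (or installs closedchan) before releasing mu. So <-c.Done() either returns immediately or waits the very short time until cancel() gets there.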

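3.7 propagateCancel: propagating cancellation downward

This is the most intricate function in the package. It establishes the cancellation relationship between a parent and a child context: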

func (c *cancelCtx) propagateCancel(parent Context, child canceler) {
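    c.Context = parent  // record the parent; Value() lookups walk upward through it

    done := parent.Done()
    if done == nil {
        return // parent can never be canceled (e.g. Background); nothing to register
    }

    // Non-blocking check: has the parent already been canceled?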
    select {
    case <-done:
        child.cancel(false, parent.Err(), Cause(parent))
        return
    default:
    }

    // Path 1: the parent is a standard *cancelCtx (or derives from one)
    if p, ok := parentCancelCtx(parent); ok {
        p.mu.Lock()
        if err := p.err.Load(); err != nil {
            // Re-check under the lock: the parent may have been canceled after the select above
            child.cancel(false, err.(error), p.cause)
        } else {
            if p.children == nil {
                p.children = make(map[canceler]struct{})
            }
            p.children[child] = struct{}{}
        }
        p.mu.Unlock()
        return
    }

    // Path 2: the parent implements an AfterFunc method
    if a, ok := parent.(afterFuncer); ok {
        c.mu.Lock()
        stop := a.AfterFunc(func() {
            child.cancel(false, parent.Err(), Cause(parent))
        })
        c.Context = stopCtx{
            Context: parent,
            stop:    stop,
        }
        c.mu.Unlock()
        return
    }

    // Path 3: fallback, start a goroutine to watch both contexts
    goroutines.Add(1)
    go func() {
        select {
        case <-parent.Done():
            child.cancel(false, parent.Err(), Cause(parent))
        case <-child.Done():
        }
    }()
}

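The three paths are tried in order, from cheapest to most expensive:

Path 1 (the most common): the parent is a *cancelCtx or a context derived from one. The child is added directly to the parent's children map, which costs almost nothing. How parentCancelCtx decides this is covered in the next section.

Note the second check here. Between the outer select (which saw the parent as not canceled) and acquiring the lock in path 1 there is a gap in which the parent may be canceled. After locking, p.err.Load() is checked again, and if the parent is already canceled the child is canceled on the spot. This is what makes the registration correct under concurrency.

Path 2: the parent implements the afterFuncer interface (it has an AfterFunc(func()) func() bool method). Cancellation is observed through a registered callback, with no extra goroutine. Note that c.Context is rewritten to a stopCtx, which wraps the parent context together with the function that unregisters the callback, so that removeChild can later call stop() to clean up the registration.

Path 3 (the most expensive): the parent is a custom Context implementation with no standard *cancelCtx to attach to and no AfterFunc method. The only option is a dedicated goroutine that waits on both the parent's and the child's Done channels and acts on whichever closes first. If the parent closes first, the child is canceled; if the child closes first (for example because it was canceled explicitly), the goroutine just exits, so it does not leak.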

3.8 parentCancelCtx: recognizing a standard cancelCtx

func parentCancelCtx(parent Context) (*cancelCtx, bool) {
    done := parent.Done()
    if done == closedchan || done == nil {
        return nil, false  // already canceled or never canceled; nothing to do
    }
    p, ok := parent.Value(&cancelCtxKey).(*cancelCtx)
    if !ok {
        return nil, false  // no *cancelCtx anywhere up the chain
    }
    pdone, _ := p.done.Load().(chan struct{})
    if pdone != done {
        return nil, false  // channels differ: a custom context wraps the cancelCtx and replaced its Done channel
    }
    return p, true
}

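This function decides whether parent is a *cancelCtx whose children map can be manipulated directly.

Step 1: if parent.Done() is closedchan (already canceled) or nil (never canceled), there is nothing to register, so it returns false.

Step 2: parent.Value(&cancelCtxKey) walks up the context chain to the nearest *cancelCtx. If the chain contains none, it returns false.

Step 3 (the key one): compare p.done.Load() with the channel returned by parent.Done(). If a custom context wraps a cancelCtx but overrides Done() to return a different channel, the two values differ. A mismatch means the cancellation signal is intercepted by the custom implementation, so the package must not bypass it and touch the children map directly; propagateCancel then falls through to path 2 or, failing that, the goroutine of path 3.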

3.9 cancelCtx.cancel(): performing the cancellation

func (c *cancelCtx) cancel(removeFromParent bool, err, cause error) {
    if err == nil {
        panic("context: internal error: missing cancel error")
    }
    if cause == nil {
        cause = err  // no cause given: cause defaults to err
    }
    c.mu.Lock()
    if c.err.Load() != nil {
        c.mu.Unlock()
        return  // idempotent: already canceled
    }
    c.err.Store(err)   // 1. store the error; Err() starts returning non-nil
    c.cause = cause    // 2. store the cause
    d, _ := c.done.Load().(chan struct{})
    if d == nil {
        c.done.Store(closedchan)  // Done() was never called: store the pre-closed channel
    } else {
        close(d)  // close the existing channel, waking every waiter
    }
    for child := range c.children {
        // NOTE: acquires the child's lock while holding the parent's lock
        child.cancel(false, err, cause)  // recursively cancel every child
    }
    c.children = nil  // drop the map so the GC can reclaim the children
    c.mu.Unlock()

    if removeFromParent {
        removeChild(c.Context, c)  // remove this node from the parent's children map
    }
}

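Each step in this order is there for a reason:

Why Store(err) before closing the channel? The contract is that once Done is closed, Err() must return a non-nil error. Writing err first upholds that: anyone who sees the channel closed is guaranteed to see err. In the opposite order, Err() could run just after the channel is closed, still find err unset, and return nil. The remaining window (err set, channel not yet closed) is covered by the <-c.Done() inside Err().

Why the closedchan optimization? If Done() was never called, the done field is empty. Rather than create a channel for the sole purpose of closing it, cancel stores the pre-closed closedchan. A later call to Done() returns this closed channel, and a receive from it returns immediately.

Why do children get cancel(false, ...) instead of true? When the parent is canceled it sets its own children to nil, so a child has no need to delete itself from a map that is about to be dropped. Passing true would also be incorrect: removeChild locks the parent's mu, which the parent is still holding at that point, and sync.Mutex is not reentrant.

Why is removeChild called after unlocking? removeChild has to take the parent's lock. Calling it while still holding the node's own lock would create the lock order "child holds its own lock, then tries to take the parent's lock", which can deadlock against "parent holds its lock, then tries to take the child's lock" during recursive cancellation. Unlocking first avoids the problem.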

3.10 removeChild: detaching from the parent

func removeChild(parent Context, child canceler) {
    if s, ok := parent.(stopCtx); ok {
        s.stop()  // registered via path 2: unregister the callback with stop()
        return
    }
    p, ok := parentCancelCtx(parent)
    if !ok {
        return  // parent is not a standard cancelCtx (the path 3 goroutine exits on its own)
    }
    p.mu.Lock()
    if p.children != nil {
        delete(p.children, child)
    }
    p.mu.Unlock()
}

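removeChild handles cleanup for each of the three paths:

Path 2 (AfterFunc): parent is a stopCtx, so call stop() to unregister the callback.
Path 1 (children map): delete the node from the parent's children map.
Path 3 (goroutine): nothing to do; the goroutine exits by itself once it sees the child's Done channel closed.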

4. timerCtx: a context with a deadline

4.1 The struct

type timerCtx struct {
    cancelCtx             // embedded to reuse the cancellation machinery
    timer    *time.Timer  // accessed under cancelCtx.mu
    deadline time.Time
}

func (c *timerCtx) Deadline() (deadline time.Time, ok bool) {
    return c.deadline, true  // ok is always true
}

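By embedding cancelCtx, timerCtx reuses the cancellation logic wholesale. It adds only the timer and deadline fields, overrides Deadline(), and (as section 4.3 shows) wraps cancel() so that the timer is stopped.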

4.2 WithDeadline and WithDeadlineCause

func WithDeadline(parent Context, d time.Time) (Context, CancelFunc) {
    return WithDeadlineCause(parent, d, nil)
}

func WithDeadlineCause(parent Context, d time.Time, cause error) (Context, CancelFunc) {
    if parent == nil {
        panic("cannot create context from nil parent")
    }
    // Optimization: if the parent's deadline is earlier than d, the new deadline can never fire first,
    // so fall back to WithCancel and create no timer
    if cur, ok := parent.Deadline(); ok && cur.Before(d) {
        return WithCancel(parent)
    }
    c := &timerCtx{deadline: d}
    c.cancelCtx.propagateCancel(parent, c)
    dur := time.Until(d)
    if dur <= 0 {
        // the deadline has already passed: cancel immediately
        c.cancel(true, DeadlineExceeded, cause)
        return c, func() { c.cancel(false, Canceled, nil) }
    }
    c.mu.Lock()
    defer c.mu.Unlock()
    if c.err.Load() == nil {
        // use time.AfterFunc to trigger cancellation at the deadline
        c.timer = time.AfterFunc(dur, func() {
            c.cancel(true, DeadlineExceeded, cause)
        })
    }
    return c, func() { c.cancel(true, Canceled, nil) }
}

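Falling back to WithCancel when the parent's deadline is earlier: this is a significant optimization. If the child's deadline is later than the parent's, the parent's timeout cancels the child anyway (through the children map), so the child's own timer could never fire first. Rather than create a timer that will never matter, the function falls back to WithCancel and skips allocating a timerCtx and its timer, and later stopping it.

propagateCancel is called before the timer is created. If the parent is already canceled, propagateCancel cancels the child immediately, the subsequent c.err.Load() == nil check fails, and no timer is started.

How the two cancel paths differ: when the timer fires it calls c.cancel(true, DeadlineExceeded, cause), so the error is DeadlineExceeded; calling the returned cancel function runs c.cancel(true, Canceled, nil), so the error is Canceled. Callers can use ctx.Err() to tell a timeout from a manual cancellation.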

4.3 timerCtx.cancel()

func (c *timerCtx) cancel(removeFromParent bool, err, cause error) {
    c.cancelCtx.cancel(false, err, cause)  // run the embedded cancelCtx logic first (close the channel, cancel children)
    if removeFromParent {
        removeChild(c.cancelCtx.Context, c)
    }
    c.mu.Lock()
    if c.timer != nil {
        c.timer.Stop()  // stop the timer so it cannot fire a now-pointless cancel
        c.timer = nil   // set to nil so the GC can reclaim the Timer
    }
    c.mu.Unlock()
}

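It calls cancelCtx.cancel first (passing false as the first argument, because timerCtx.cancel performs removeChild itself) and then stops the timer.

Why stop the timer on a manual cancel too? After a manual cancel the context is already canceled, but the timer is still pending. When it eventually fires it calls c.cancel again, hits the idempotency check at the top of cancel(), and returns. That is functionally harmless but wastes resources; timer.Stop() releases the timer early.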

4.4 WithTimeout

func WithTimeout(parent Context, timeout time.Duration) (Context, CancelFunc) {
    return WithDeadline(parent, time.Now().Add(timeout))
}

func WithTimeoutCause(parent Context, timeout time.Duration, cause error) (Context, CancelFunc) {
    return WithDeadlineCause(parent, time.Now().Add(timeout), cause)
}

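WithTimeout is a thin wrapper around WithDeadline that converts a relative duration into an absolute time. The two are equivalent; which one to use depends on whether the caller knows "how long from now" or "until what moment".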

5. valueCtx: a context that carries a key-value pair

5.1 The struct and WithValue

type valueCtx struct {
    Context        // parent context; Value() delegates to it on a miss
    key, val any
}

func WithValue(parent Context, key, val any) Context {
    if parent == nil {
        panic("cannot create context from nil parent")
    }
    if key == nil {
        panic("nil key")
    }
    if !reflectlite.TypeOf(key).Comparable() {
        panic("key is not comparable")
    }
    return &valueCtx{parent, key, val}
}

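Validation in WithValue: the key must not be nil and must be of a comparable type (otherwise the == comparison during lookup would panic). The official recommendation is to define an unexported type in your own package for keys (not string or any other built-in type), so keys from different packages cannot collide.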

func (c *valueCtx) Value(key any) any {
    if c.key == key {
        return c.val
    }
    return value(c.Context, key)  // miss: delegate to the parent
}

Each WithValue call produces a new valueCtx node; repeated calls form a chain:

Background -> valueCtx{key=A} -> valueCtx{key=B} -> cancelCtx -> valueCtx{key=C}

5.2 The value() function: walking the chain

func value(c Context, key any) any {
    for {
        switch ctx := c.(type) {
        case *valueCtx:
            if key == ctx.key {
                return ctx.val
            }
            c = ctx.Context  // miss: keep walking up
        case *cancelCtx:
            if key == &cancelCtxKey {
                return c  // cancelCtx identifies itself
            }
            c = ctx.Context
        case withoutCancelCtx:
            if key == &cancelCtxKey {
                return nil  // WithoutCancel stops Cause from tracing further up
            }
            c = ctx.c
        case *timerCtx:
            if key == &cancelCtxKey {
                return &ctx.cancelCtx
            }
            c = ctx.Context
        case backgroundCtx, todoCtx:
            return nil  // reached the root without a match
        default:
            return c.Value(key)  // custom context: delegate to its own Value()
        }
    }
}

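A loop rather than recursion: Go does not perform tail-call optimization, so recursion over a long chain would consume a lot of stack. The loop uses constant stack space.

This function is the hub of chain traversal: every standard context type is handled in this one switch, so each type does not need its own copy of the "delegate upward" logic. The default branch is the fallback for custom contexts.

withoutCancelCtx returns nil for &cancelCtxKey: this implements the rule that WithoutCancel cuts off Cause tracing. Cause() looks for a cancelCtx with c.Value(&cancelCtxKey); once the walk reaches a withoutCancelCtx it gets nil, so a canceled ancestor above the WithoutCancel boundary is never found and its cause is never reported for the detached context.

Lookup is O(n) in the length of the chain. That is one more reason to follow the official advice not to use context values for optional parameters: the longer the chain, the slower every Value() call.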

6. Additions in Go 1.20 and later

6.1 WithCancelCause and Cause()

type CancelCauseFunc func(cause error)

func WithCancelCause(parent Context) (ctx Context, cancel CancelCauseFunc) {
    c := withCancel(parent)
    return c, func(cause error) { c.cancel(true, Canceled, cause) }
}

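The only difference from WithCancel: the cancel function takes an error as the cause, so the caller can pass a custom error. ctx.Err() still returns Canceled (its semantics are unchanged), while context.Cause(ctx) returns the cause the caller passed in.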

func Cause(c Context) error {
    err := c.Err()
    if err == nil {
        return nil  // the context has not been canceled
    }
    if cc, ok := c.Value(&cancelCtxKey).(*cancelCtx); ok {
        cc.mu.Lock()
        cause := cc.cause
        cc.mu.Unlock()
        if cause != nil {
            return cause
        }
    }
    return err  // no cancelCtx found, or cause is nil: fall back to Err()
}

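Cause finds the nearest *cancelCtx through &cancelCtxKey and reads cause under the lock. Why lock? The cause field is a plain error, not an atomic.Value, so it must be read while holding mu.

Use case: telling apart different reasons for cancellation, such as "canceled by the user", "timed out", or "canceled because a downstream service failed":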

ctx, cancel := context.WithCancelCause(parent)
defer cancel(nil)  // on normal completion pass nil; Cause then returns Canceled

// if the cancellation is due to a downstream error:
cancel(fmt.Errorf("downstream service %s failed: %w", svc, err))

6.2 WithoutCancel (Go 1.21)

type withoutCancelCtx struct {
    c Context
}

func WithoutCancel(parent Context) Context {
    if parent == nil {
        panic("cannot create context from nil parent")
    }
    return withoutCancelCtx{parent}
}

func (withoutCancelCtx) Deadline() (deadline time.Time, ok bool) { return }
func (withoutCancelCtx) Done() <-chan struct{}                    { return nil }
func (withoutCancelCtx) Err() error                              { return nil }
func (c withoutCancelCtx) Value(key any) any                     { return value(c, key) }

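withoutCancelCtx always returns nil from both Done() and Err(): even if the parent context is canceled, this context never notices. Value(), however, calls the package-level value() function, so lookups continue up the chain as usual and the context inherits all the data stored in its parent (a trace ID, for example).

Note that Value passes c (the withoutCancelCtx itself) rather than c.c (the parent). That is because value() has a dedicated case for withoutCancelCtx: for &cancelCtxKey it returns nil, which cuts the chain that Cause would otherwise follow.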

The net/http transport uses it when dialing a connection (src/net/http/transport.go):

// Original comment from the source:
// Detach from the request context's cancellation signal.
// The dial should proceed even if the request is canceled,
// because a future request may be able to make use of the connection.
// We retain the request context's values.
dialCtx, dialCancel := context.WithCancel(context.WithoutCancel(ctx))

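The dial inherits trace information (through Value) but not the cancellation signal: an established connection can go into the connection pool for other requests to reuse, so it should not be torn down just because the current request was canceled.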

6.3 AfterFunc (Go 1.21)

type afterFuncer interface {
    AfterFunc(func()) func() bool
}

type afterFuncCtx struct {
    cancelCtx
    once sync.Once  // either starts running f or stops f from running, whichever comes first
    f    func()
}

func AfterFunc(ctx Context, f func()) (stop func() bool) {
    a := &afterFuncCtx{f: f}
    a.cancelCtx.propagateCancel(ctx, a)
    return func() bool {
        stopped := false
        a.once.Do(func() {
            stopped = true
        })
        if stopped {
            a.cancel(true, Canceled, nil)
        }
        return stopped
    }
}

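AfterFunc registers a callback: after the context is canceled, f runs in its own goroutine. The sync.Once makes running f and a successful stop mutually exclusive:

If the context is canceled first, afterFuncCtx.cancel runs go a.f() through once.Do; a later stop call finds once.Do already used, does nothing, and returns false.
If stop is called first, it sets stopped to true through once.Do; afterwards the once.Do inside cancel no longer runs go a.f().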

func (a *afterFuncCtx) cancel(removeFromParent bool, err, cause error) {
    a.cancelCtx.cancel(false, err, cause)  // mark this node itself as canceled
    if removeFromParent {
        removeChild(a.Context, a)
    }
    a.once.Do(func() {
        go a.f()  // run f in its own goroutine
    })
}

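What the return value of stop means:

true: the call prevented f from running; f will not be executed.
false: f has already been started, or an earlier stop call already prevented it.

Note that stop returns without waiting for f to finish. If the caller needs to know that f has completed, it must synchronize with f itself (for example with a channel or a WaitGroup).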
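
Both orderings can be exercised directly. In the sketch below, stopBeforeCancel and cancelBeforeStop are hypothetical helper names; the second one uses a channel to wait for f, since stop itself never waits.

```go
package main

import (
	"context"
	"fmt"
)

// stopBeforeCancel stops the callback before the context is canceled and
// returns the results of two consecutive stop calls.
func stopBeforeCancel() (first, second bool) {
	ctx, cancel := context.WithCancel(context.Background())
	stop := context.AfterFunc(ctx, func() { panic("f must not run") })
	first = stop()  // true: f was prevented from running
	cancel()        // has no effect on f any more
	second = stop() // false: f had already been stopped
	return first, second
}

// cancelBeforeStop cancels first, waits for f, and returns what stop reports.
func cancelBeforeStop() bool {
	ctx, cancel := context.WithCancel(context.Background())
	ran := make(chan struct{})
	stop := context.AfterFunc(ctx, func() { close(ran) })
	cancel()
	<-ran         // stop does not wait for f, so synchronize explicitly
	return stop() // false: f has already been started
}

func main() {
	fmt.Println(stopBeforeCancel()) // true false
	fmt.Println(cancelBeforeStop()) // false
}
```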

stopCtx is a helper type for the AfterFunc mechanism:

type stopCtx struct {
    Context
    stop func() bool
}

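When AfterFunc is registered through path 2 (the afterFuncer branch of propagateCancel), the child's c.Context is rewritten to a stopCtx that holds the stop function for unregistering the callback. When the child is later canceled explicitly (its cancel function is called), removeChild detects the stopCtx and calls stop(), which unregisters the callback so the parent no longer holds on to it or invokes it needlessly when the parent itself is canceled.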

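7. Real-world usage

The examples below are taken from the Go standard library source rather than invented; some are condensed.

Scenario 1: HTTP request timeouts (net/http + database/sql)

An HTTP handler obtains the request-scoped context from r.Context(); the server cancels it when the client's connection closes, when the request is canceled, or when ServeHTTP returns. The right approach is to derive child contexts from the request context rather than from context.Background() (from src/database/sql/example_service_test.go):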

func (s *Service) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    // Health check: 1-second timeout, also canceled if the client disconnects
    ctx, cancel := context.WithTimeout(r.Context(), 1*time.Second)
    defer cancel()
    if err := s.db.PingContext(ctx); err != nil {
        http.Error(w, fmt.Sprintf("db down: %v", err), http.StatusFailedDependency)
        return
    }

    // Long query: 60-second timeout, derived from the request context.
    // When the client disconnects, r.Context() is canceled, the derived ctx follows, and the query is aborted.
    ctx, cancel = context.WithTimeout(r.Context(), 60*time.Second)
    defer cancel()
    rows, err := s.db.QueryContext(ctx, "select p.name from people as p where p.active = true;")
    // ...

    // Asynchronous action with side effects: do not inherit the request context.
    // A client disconnect should not interrupt the transaction, so start from context.Background().
    ctx, cancel = context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()
    tx, err := s.db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelSerializable})
    // ...
}

Scenario 2: preventing goroutine leaks (context/example_test.go)

From the official example in src/context/example_test.go:

gen := func(ctx context.Context) <-chan int {
    dst := make(chan int)
    go func() {
        n := 1
        for {
            select {
            case <-ctx.Done():
                return  // exit once the context is canceled, so the goroutine does not leak
            case dst <- n:
                n++
            }
        }
    }()
    return dst
}

ctx, cancel := context.WithCancel(context.Background())
defer cancel()  // cancel the context when the consumer returns, telling the producer goroutine to exit

for n := range gen(ctx) {
    fmt.Println(n)
    if n == 5 {
        break  // early break: without defer cancel, gen's goroutine would block forever
    }
}

Scenario 3: AfterFunc canceling network I/O (net/fd_unix.go)

From src/net/fd_unix.go, while a TCP connection is being established:

stop := context.AfterFunc(ctx, func() {
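    // on cancellation, force the write deadline into the past so waitWrite returns immediately
    _ = fd.pfd.SetWriteDeadline(aLongTimeAgo)
})
defer func() {
    if !stop() && ret == nil {
        // stop() returned false: the AfterFunc callback has already started,
        // but connect succeeded (ret == nil); the connection is poisoned (deadline in the past) and unusable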
        ret = mapErr(ctx.Err())
    }
}()

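AfterFunc is used here instead of a watcher goroutine because it needs no goroutine parked for the duration of the connect: a goroutine is started only if the context is actually canceled, and calling stop() unregisters the callback once the connect attempt is over.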

Scenario 4: AfterFunc canceling a TLS handshake (crypto/tls/conn.go)

From src/crypto/tls/conn.go:

handshakeCtx, cancel := context.WithCancel(ctx)
defer cancel()

if ctx.Done() != nil {
    stop := context.AfterFunc(ctx, func() {
        _ = c.conn.Close()  // on cancellation, close the underlying connection to interrupt the handshake
    })
    defer func() {
        if !stop() {
            // the AfterFunc callback fired and the connection is closed; return the context error to the caller
            ret = ctx.Err()
        }
    }()
}

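Scenario 5: merging the cancellation signals of two contexts (context/example_test.go)

From the official example in src/context/example_test.go, which uses AfterFunc to merge the cancellation signals of two contexts: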

mergeCancel := func(ctx, cancelCtx context.Context) (context.Context, context.CancelFunc) {
    ctx, cancel := context.WithCancelCause(ctx)
    stop := context.AfterFunc(cancelCtx, func() {
        cancel(context.Cause(cancelCtx))  // pass the second context's cause through
    })
    return ctx, func() {
        stop()
        cancel(context.Canceled)
    }
}

ctx1, cancel1 := context.WithCancelCause(context.Background())
defer cancel1(errors.New("ctx1 canceled"))

ctx2, cancel2 := context.WithCancelCause(context.Background())

mergedCtx, mergedCancel := mergeCancel(ctx1, ctx2)
defer mergedCancel()

cancel2(errors.New("ctx2 canceled"))
<-mergedCtx.Done()
fmt.Println(context.Cause(mergedCtx))  // prints: ctx2 canceled

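8. Usage rules

The official documentation (the package comment in src/context/context.go) lays down a few rules:

Do not store a context inside a struct; pass it explicitly as the first parameter of a function, named ctx. (See https://go.dev/blog/context-and-structs)
Do not pass a nil context; if unsure, pass context.TODO().
Use WithValue only for request-scoped data (such as a trace ID or authentication information), not for passing optional parameters.
The cancel function must be called, usually via defer cancel() right after it is obtained. go vet checks that a CancelFunc is used on every control-flow path.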

9. Overview of internal types

Context (interface)
├── emptyCtx (struct, value type, zero allocation)
│   ├── backgroundCtx     <- Background()
│   └── todoCtx           <- TODO()
├── cancelCtx (struct, pointer)     <- WithCancel / WithCancelCause
│   ├── timerCtx (struct, pointer)  <- WithDeadline / WithTimeout
│   └── afterFuncCtx (struct, pointer)  <- used internally by AfterFunc
├── withoutCancelCtx (struct, value type)  <- WithoutCancel (Go 1.21)
└── valueCtx (struct, pointer)      <- WithValue

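Helper types:

stopCtx: after a path 2 (AfterFunc) registration it replaces the c.Context field and holds the unregister function
canceler (internal interface): implemented by *cancelCtx, *timerCtx, and *afterFuncCtx; used to manage the cancellation tree

Cancellation flows only downward (parent canceled, then child canceled), never upward. Value lookup walks the chain upward until it reaches the root. These two flows in opposite directions share one linked structure, connected through the embedded Context field.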
