为何使用for-range遍历map时是乱序的
在阅读Go设计与实现的for-range章节时,了解到使用for-range
循环遍历map时,获取的值是乱序的,并非按照存入顺序。
于是产生了疑问:为什么要乱序?
1. for-range 如何遍历 map
使用for-range
遍历map时,会在runtime.mapiterinit
函数中,初始化开始遍历的元素:
// mapiterinit initializes the hiter struct used for ranging over maps.
// The hiter struct pointed to by 'it' is allocated on the stack
// by the compilers order pass or on the heap by reflect_mapiterinit.
// Both need to have zeroed hiter since the struct contains pointers.
func mapiterinit(t *maptype, h *hmap, it *hiter) {
...
// decide where to start
r := uintptr(fastrand())
if h.B > 31-bucketCntBits {
r += uintptr(fastrand()) << 31
}
it.startBucket = r & bucketMask(h.B)
it.offset = uint8(r >> h.B & (bucketCnt - 1))
// iterator state
it.bucket = it.startBucket
...
}
其中的通过r := uintptr(fastrand())
获取随机数并用于计算遍历的首个存储桶的位置,所以每次遍历时的起始位置是随机的。
2. 为何要随机遍历
根据 The Go Blog: Go maps in action: Iteration order 中的描述:
When iterating over a map with a range loop, the iteration order is not specified and is not guaranteed to be the same from one iteration to the next. Since the release of Go 1.0, the runtime has randomized map iteration order. Programmers had begun to rely on the stable iteration order of early versions of Go, which varied between implementations, leading to portability bugs. If you require a stable iteration order you must maintain a separate data structure that specifies that order.
Go 1 Release Notes: Iterating in maps中的描述:
The old language specification did not define the order of iteration for maps, and in practice it differed across hardware platforms. This caused tests that iterated over maps to be fragile and non-portable, with the unpleasant property that a test might always pass on one machine but break on another.
In Go 1, the order in which elements are visited when iterating over a map using a for range statement is defined to be unpredictable, even if the same loop is run multiple times with the same map. Code should not assume that the elements are visited in any particular order.
This change means that code that depends on iteration order is very likely to break early and be fixed long before it becomes a problem. Just as important, it allows the map implementation to ensure better map balancing even when programs are using range loops to select an element from a map.
总结来说为了避免开发者依赖哈希表的遍历顺序,因为哈希表的实现可能会改变,不同的实现方式会可能导致原来的依赖遍历顺序的程序失效。
个人认为,因为哈希映射的过程本身就是随机的,若想要在遍历时按照某种固定的顺序获取元素,需要额外的内存和时间开销,会降低哈希表的性能。
encoding/json
如何保证输出的一致性
3. 当使用函数json.Marshal()
将哈希表编码成json
格式时,每次的输出都是一样的。
但是 Golang 中哈希表的遍历是随机的,这是如何做到的 ?
当函数func Marshal(v any)
传入的参数类型为哈希表时,会选择对应的编码函数 json.mapEncoder.encode
:
func (me mapEncoder) encode(e *encodeState, v reflect.Value, opts encOpts) {
...
// Extract and sort the keys.
sv := make([]reflectWithString, v.Len())
mi := v.MapRange()
for i := 0; mi.Next(); i++ {
sv[i].k = mi.Key()
sv[i].v = mi.Value()
if err := sv[i].resolve(); err != nil {
e.error(fmt.Errorf("json: encoding error for type %q: %q", v.Type().String(), err.Error()))
}
}
sort.Slice(sv, func(i, j int) bool { return sv[i].ks < sv[j].ks })
...
}
由上可以看到,在获取哈希表中的所有元素之后,将entry
按照key
进行了排序,所以对于同一个哈希表来说,能够确保编码的结果的一致性。