Go Language Series: Arrays and Slices
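Arrays are one of the most basic concepts in any programming language, but in Go you work with slices far more often. A slice is a small struct that wraps an array: it holds a pointer to an array element, plus the slice's length and capacity. When the length equals the capacity, appending another element forces the slice to grow. Many languages have dynamic arrays that grow this way, such as vector in C++ and ArrayList in Java, and the growth rules are similar: typically the capacity doubles while it is small, and later grows by a factor x (usually between 1.25 and 1.5). The threshold at which the strategy switches, and the exact factors, differ between languages and between versions of the same language.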
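The relationship between length, capacity and reallocation can be seen in a few lines. The following is a minimal sketch; the function name shareThenSplit is made up for illustration. While spare capacity remains, append writes into the existing backing array; once the length equals the capacity, append allocates a new array and the two slices stop sharing data.

```go
package main

import "fmt"

// shareThenSplit (hypothetical helper) returns a[0] as observed after an
// append that fits in the spare capacity, and again after an append that
// has to allocate a new backing array.
func shareThenSplit() (afterShared, afterSplit int) {
	a := make([]int, 3, 4) // len 3, cap 4: one free slot

	b := append(a, 1) // fits in the spare capacity, so b shares a's array
	b[0] = 10         // visible through a as well
	afterShared = a[0]

	c := append(b, 2) // len(b) == cap(b), so append allocates a new array
	c[0] = 20         // only the new array changes
	afterSplit = a[0]

	return afterShared, afterSplit
}

func main() {
	shared, split := shareThenSplit()
	fmt.Println(shared, split) // 10 10
}
```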
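One more point matters a lot when solving algorithm exercises: Go passes function arguments by value only; there is no pass by reference. An array argument is therefore copied in full, so changes made inside the function do not affect the caller's array. A slice argument is also copied, but the copy is shallow: only the slice header is duplicated, so the pointer inside the callee's slice refers to the same backing array as the caller's. Modifying elements through the slice inside the function therefore changes the values the caller sees. This article verifies these points through the source code and through experiments.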
Source code analysis
The analysis is based on Go 1.21. The slice source code lives in slice.go in the runtime package. The struct is defined as follows:
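type slice struct {
	// unsafe.Pointer is Go's generic pointer: it converts to and from any
	// pointer type, much like void* in C.
	// Go also has uintptr, an integer type wide enough to hold an address.
	// It supports arithmetic, but the GC does not treat it as a pointer.
	array unsafe.Pointer
	len   int // current number of elements
	cap   int // capacity of the backing memory, in elements
}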
When a slice is created with make, the underlying runtime function is:
func makeslice(et *_type, len, cap int) unsafe.Pointer {
	// Check whether the requested allocation is too large; if so, panic.
	mem, overflow := math.MulUintptr(et.Size_, uintptr(cap))
	if overflow || mem > maxAlloc || len < 0 || len > cap {
		// NOTE: Produce a 'len out of range' error instead of a
		// 'cap out of range' error when someone does make([]T, bignumber).
		// 'cap out of range' is true too, but since the cap is only being
		// supplied implicitly, saying len is clearer.
		// See golang.org/issue/4085.
		mem, overflow := math.MulUintptr(et.Size_, uintptr(len))
		if overflow || mem > maxAlloc || len < 0 {
			panicmakeslicelen()
		}
		panicmakeslicecap()
	}
	// Allocate the memory. Small objects are served from the free lists in
	// the per-P cache; objects larger than 32 KB are allocated directly
	// from the heap.
	return mallocgc(mem, et, true)
}
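The two panic paths in makeslice can be triggered from user code when the sizes are only known at run time (constant sizes that are out of range are rejected by the compiler instead). The following is a small sketch; tryMake is a hypothetical helper that turns the panic into an error so the message can be printed. The exact wording of the messages comes from the runtime and may differ between Go versions.

```go
package main

import "fmt"

// tryMake (hypothetical helper) calls make with run-time sizes and converts
// a runtime panic into an error.
func tryMake(length, capacity int) (s []int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	return make([]int, length, capacity), nil
}

func main() {
	// A valid request.
	s, err := tryMake(2, 4)
	fmt.Println(len(s), cap(s), err)

	// Negative length: the "len out of range" path.
	_, err = tryMake(-1, 4)
	fmt.Println(err)

	// Length greater than capacity: the "cap out of range" path.
	_, err = tryMake(5, 3)
	fmt.Println(err)
}
```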
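Next, the append operation. append is a builtin: the compiler expands it inline and calls runtime.growslice when the capacity is insufficient, so there is no single function body to read. The reflect package's Append performs the same steps in readable form: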
// Append appends the values x to a slice s and returns the resulting slice.
// As in Go, each x's value must be assignable to the slice's element type.
func Append(s Value, x ...Value) Value {
	s.mustBe(Slice) // type check
	n := s.Len()
	s = s.extendSlice(len(x)) // extend the length, growing the slice if needed
	for i, v := range x { // assign the new elements
		s.Index(n + i).Set(v)
	}
	return s
}
Now for growslice, the function that actually performs the growth:
func growslice(oldPtr unsafe.Pointer, newLen, oldCap, num int, et *_type) slice {
	oldLen := newLen - num // length before the append
	if raceenabled {
		callerpc := getcallerpc()
		racereadrangepc(oldPtr, uintptr(oldLen*int(et.Size_)), callerpc, abi.FuncPCABIInternal(growslice))
	}
	if msanenabled {
		msanread(oldPtr, uintptr(oldLen*int(et.Size_)))
	}
	if asanenabled {
		asanread(oldPtr, uintptr(oldLen*int(et.Size_)))
	}
	if newLen < 0 { // argument check
		panic(errorString("growslice: len out of range"))
	}
	if et.Size_ == 0 { // zero-sized elements
		// append should not create a slice with nil pointer but non-zero len.
		// We assume that append doesn't need to preserve oldPtr in this case.
		// Every zero-byte allocation starts at zerobase.
		return slice{unsafe.Pointer(&zerobase), newLen, newLen}
	}
	// Compute the new capacity.
	newcap := oldCap
	doublecap := newcap + newcap
	// If the required length exceeds twice the old capacity, use the
	// required length directly.
	// Example: s := []int{1, 2, 3}; s1 := append(s, s...)
	// Here newLen == doublecap == 6, so the doubling branch below applies
	// and cap(s1) is 6.
	if newLen > doublecap {
		newcap = newLen
	} else {
		const threshold = 256
		if oldCap < threshold { // old capacity below 256: double it
			newcap = doublecap
		} else {
			// Check 0 < newcap to detect overflow
			// and prevent an infinite loop.
			// The loop keeps growing until newcap is at least newLen.
			for 0 < newcap && newcap < newLen {
				// Transition from growing 2x for small slices
				// to growing 1.25x for large slices. This formula
				// gives a smooth-ish transition between the two.
				// The factor lies between 1.25 and 2; the larger the
				// capacity, the closer it gets to 1.25.
				newcap += (newcap + 3*threshold) / 4
			}
			// Set newcap to the requested cap when
			// the newcap calculation overflowed.
			if newcap <= 0 {
				newcap = newLen
			}
		}
	}
	// The rest deals with size-class rounding and overflow checks.
	var overflow bool
	var lenmem, newlenmem, capmem uintptr
	// Specialize for common values of et.Size.
	// For 1 we don't need any division/multiplication.
	// For goarch.PtrSize, compiler will optimize division/multiplication into a shift by a constant.
	// For powers of 2, use a variable shift.
	switch {
	case et.Size_ == 1:
		lenmem = uintptr(oldLen)
		newlenmem = uintptr(newLen)
		// roundupsize rounds the request up to an allocator size class, so
		// capmem is at least the requested size and the final capacity may
		// exceed the newcap computed above.
		capmem = roundupsize(uintptr(newcap))
		overflow = uintptr(newcap) > maxAlloc
		newcap = int(capmem)
	case et.Size_ == goarch.PtrSize:
		lenmem = uintptr(oldLen) * goarch.PtrSize
		newlenmem = uintptr(newLen) * goarch.PtrSize
		capmem = roundupsize(uintptr(newcap) * goarch.PtrSize)
		overflow = uintptr(newcap) > maxAlloc/goarch.PtrSize
		newcap = int(capmem / goarch.PtrSize)
	case isPowerOfTwo(et.Size_):
		var shift uintptr
		if goarch.PtrSize == 8 {
			// Mask shift for better code generation.
			shift = uintptr(sys.TrailingZeros64(uint64(et.Size_))) & 63
		} else {
			shift = uintptr(sys.TrailingZeros32(uint32(et.Size_))) & 31
		}
		lenmem = uintptr(oldLen) << shift
		newlenmem = uintptr(newLen) << shift
		capmem = roundupsize(uintptr(newcap) << shift)
		overflow = uintptr(newcap) > (maxAlloc >> shift)
		newcap = int(capmem >> shift)
		capmem = uintptr(newcap) << shift
	default:
		lenmem = uintptr(oldLen) * et.Size_
		newlenmem = uintptr(newLen) * et.Size_
		capmem, overflow = math.MulUintptr(et.Size_, uintptr(newcap))
		capmem = roundupsize(capmem)
		newcap = int(capmem / et.Size_)
		capmem = uintptr(newcap) * et.Size_
	}
	// The check of overflow in addition to capmem > maxAlloc is needed
	// to prevent an overflow which can be used to trigger a segfault
	// on 32bit architectures with this example program:
	//
	// type T [1<<27 + 1]int64
	//
	// var d T
	// var s []T
	//
	// func main() {
	//   s = append(s, d, d, d, d)
	//   print(len(s), "\n")
	// }
	if overflow || capmem > maxAlloc {
		panic(errorString("growslice: len out of range"))
	}
	// Allocate the new backing memory.
	var p unsafe.Pointer
	if et.PtrBytes == 0 {
		p = mallocgc(capmem, nil, false)
		// The append() that calls growslice is going to overwrite from oldLen to newLen.
		// Only clear the part that will not be overwritten.
		// The reflect_growslice() that calls growslice will manually clear
		// the region not cleared here.
		memclrNoHeapPointers(add(p, newlenmem), capmem-newlenmem)
	} else {
		// Note: can't use rawmem (which avoids zeroing of memory), because then GC can scan uninitialized memory.
		p = mallocgc(capmem, et, true)
		if lenmem > 0 && writeBarrier.enabled {
			// Only shade the pointers in oldPtr since we know the destination slice p
			// only contains nil pointers because it has been cleared during alloc.
			// (This part concerns the GC write barrier.)
			bulkBarrierPreWriteSrcOnly(uintptr(p), uintptr(oldPtr), lenmem-et.Size_+et.PtrBytes)
		}
	}
	memmove(p, oldPtr, lenmem) // copy the old elements
	return slice{p, newLen, newcap}
}
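In older versions of Go (before 1.18) the threshold was 1024: below it the capacity doubled, and above it the capacity grew by a factor of 1.25. Newer versions smooth the transition. The threshold is 256; below it the capacity doubles, and otherwise the loop applies newcap += (newcap + 3*threshold) / 4, so the growth factor starts at 2 and gradually approaches 1.25. Note that if the required length exceeds twice the old capacity, the new capacity is simply the required length. In the large-slice branch, (newcap + 3*threshold) / 4 is added repeatedly until the new capacity is at least the required length. Finally, when the memory is actually allocated, the request is rounded up to an allocator size class, so the resulting capacity can be larger than the value computed above.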
Experiments
Experiment 1: slice growth
The code is as follows:
package main

import (
	"fmt"
)

func main() {
	arr := []int{1, 2, 3}
	arr = append(arr, arr...)
	fmt.Printf("len(arr)=%d, cap(arr)=%d\n", len(arr), cap(arr)) // 6, 6
	arr1 := make([]int, 16)
	arr1 = append(arr1, 1)
	fmt.Printf("len(arr1)=%d, cap(arr1)=%d\n", len(arr1), cap(arr1)) // 17, 32
	arr2 := make([]byte, 500)
	arr2 = append(arr2, 1)
	fmt.Printf("len(arr2)=%d, cap(arr2)=%d\n", len(arr2), cap(arr2))
	// 501, 896 (the formula gives 817; size-class rounding raises it to 896)
}
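The output matches the values in the comments: length 6 and capacity 6, length 17 and capacity 32, length 501 and capacity 896.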
Experiment 2: arrays and slices as function arguments
The code is as follows:
package main

import (
	"fmt"
)

func main() {
	arr := [3]int{1, 2, 3} // a length in the brackets declares an array
	s := arr[1:]
	f(s, arr)
	fmt.Printf("arr=%+v", arr) // arr=[1 -1 3]
}

func f(s []int, arr [3]int) {
	s[0] = -1
	arr[0] = -1
}
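The output is arr=[1 -1 3]. First, the element at index 0 did not become -1, which shows that an array passed as a function argument is a copy, and changes to the copy do not affect the original array. Second, the element at index 1 did become -1. This shows two things. Slicing an array produces a slice whose backing store is the original array, so the two share data. And although a slice argument is also copied, the copied pointer still refers to the same memory, so changes made to the underlying data through the slice affect the original data.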
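A related consequence of copying the slice header is that an append inside the callee changes only the callee's length and, if the slice had to grow, only the callee's pointer. The sketch below uses two hypothetical functions, appendInCallee and appendAndReturn, to show both cases. Returning the new slice is the usual way to make the append visible to the caller.

```go
package main

import "fmt"

// appendInCallee (hypothetical) appends to its own copy of the slice header
// and then writes to index 0 through that copy.
func appendInCallee(s []int) {
	s = append(s, 99) // updates only the callee's header
	s[0] = -1
}

// appendAndReturn (hypothetical) hands the updated header back to the caller.
func appendAndReturn(s []int) []int {
	return append(s, 99)
}

func main() {
	a := make([]int, 2, 4) // spare capacity: append writes into the shared array
	appendInCallee(a)
	fmt.Println(a, len(a)) // [-1 0] 2
	fmt.Println(a[:3])     // [-1 0 99]

	b := []int{1, 2} // len == cap: append in the callee reallocates
	appendInCallee(b)
	fmt.Println(b) // [1 2]

	b = appendAndReturn(b)
	fmt.Println(b) // [1 2 99]
}
```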