I was unable to find a clear answer online about the performance implications of using a standard for loop (e.g., for i := 0; i < len(arr); i++ {}) versus a range-for loop (e.g., for idx, val := range arr {}) in Go. By analyzing the assembler output, I determined that as of Go 1.14 the two produce similar CPU instructions for common 64-bit Intel/AMD CPUs.
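The range-for loop produces two additional lines of assembly:
        movq "".arr+8(SP), AX
        pcdata $0, $0
This brings the total line count for function count2 to 13, compared to 11 for function count1. Note that pcdata is a pseudo-instruction that carries metadata for the garbage collector, not a CPU instruction, so only the extra movq is actually executed.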
I used https://go.godbolt.org/, at the suggestion of dominikh on Freenode IRC #golang, to map the Go functions to the corresponding assembler output.
My code for an apples-to-apples comparison between the two:
package main

func count1(arr []int) {
	for i := 0; i < len(arr); i++ {
		_ = i
		_ = arr[i]
	}
}

func count2(arr []int) {
	for i, v := range arr {
		_ = i
		_ = v
	}
}

func main() {
	arr := make([]int, 100)
	for i := 0; i < len(arr); i++ {
		arr[i] = i
	}
	count1(arr)
	count2(arr)
}
count1() produces:
        pcdata $0, $0
        pcdata $1, $1
        movq "".arr+16(SP), AX
        xorl CX, CX
        jmp count1_pc12
count1_pc9:
        incq CX
count1_pc12:
        cmpq CX, AX
        jlt count1_pc9
        pcdata $0, $-1
        pcdata $1, $-1
        ret
count2() produces:
        pcdata $0, $1
        pcdata $1, $1
        movq "".arr+8(SP), AX
        pcdata $0, $0
        movq 8(AX), AX
        xorl CX, CX
        jmp count2_pc16
count2_pc13:
        incq CX
count2_pc16:
        cmpq CX, AX
        jlt count2_pc13
        pcdata $0, $-1
        pcdata $1, $-1
        ret
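count2 produces more assembler instructions, which may suggest it is slower. However, instruction count alone does not determine speed: in the listings above, the extra instructions run once, before the loop starts, and the loop bodies themselves (incq, cmpq, jlt) are identical. Understanding the real performance implications requires benchmarks, which I will leave for a future article.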