x86 - BTB size for Haswell, Sandy Bridge, Ivy Bridge, and Skylake?

Is there a way to determine, or a resource where I can find, the branch target buffer (BTB) size for Haswell, Sandy Bridge, Ivy Bridge, and Skylake Intel processors?

Check the software optimization resources by Agner Fog, http://www.agner.org/optimize/

The BTB should be covered in "The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers", http://www.agner.org/optimize/microarchitecture.pdf

3.7 Branch prediction in Intel Sandy Bridge and Ivy Bridge

BTB organization. The branch target buffer in Sandy Bridge is bigger than in Nehalem according to unofficial rumors. It is unknown whether it has one level, as in Core 2 and earlier processors, or two levels as in Nehalem. It can handle a maximum of four call instructions per 16 bytes of code. Conditional jumps are less efficient if there are more than 3 branch instructions per 16 bytes of code.

3.8 Branch prediction in Intel Haswell, Broadwell and Skylake

BTB organization. The organization of the branch target buffer is unknown. It appears to be reasonably big.
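The Sandy Bridge / Ivy Bridge note quoted above (conditional jumps become less efficient with more than 3 branch instructions per 16 bytes of code) can be checked mechanically if you have a list of branch addresses, for example from a disassembly. The following is only a minimal sketch: it assumes that "16 bytes of code" means aligned 16-byte blocks and that a branch belongs to the block containing its first byte. Neither assumption is confirmed by the manual, and the addresses are made up for illustration.

```python
from collections import Counter


def branches_per_block(branch_addresses, block_size=16):
    """Count branch instructions per aligned block (assumption: aligned blocks)."""
    # Round each address down to the start of its block and count per block.
    return Counter(addr // block_size * block_size for addr in branch_addresses)


def crowded_blocks(branch_addresses, limit=3, block_size=16):
    """Return {block_start: count} for blocks holding more than `limit` branches."""
    counts = branches_per_block(branch_addresses, block_size)
    return {base: n for base, n in counts.items() if n > limit}


# Hypothetical branch addresses, not taken from any real binary.
addrs = [0x1000, 0x1004, 0x1008, 0x100C, 0x1010, 0x1020]
print(crowded_blocks(addrs))  # {4096: 4}
```

Blocks reported by `crowded_blocks` are candidates for padding or reordering; whether that actually helps has to be measured on the target CPU.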

Intel may describe such data in the "Intel 64 and IA-32 Architectures Optimization Reference Manual", http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html, around section "3.4.1 Branch Prediction Optimization", but there are still no sizes given.

It may look strange, but there was no information about the BTB in CPUID in 1998-2000: http://www.installaware.com/forums/oldattachments/02142006163/tstcpuid.c (by Gerald J. Heim, University of Tübingen, Germany). And it is still not listed in http://www.felixcloutier.com/x86/cpuid.html or in public materials from Intel workers. From the comments in tstcpuid.c:

 * This table describes the possible cache and TLB configurations
 * as documented by Intel. AMD doesn't use this but gives
 * exact cache layout data on CPUID 0x8000000x.
 *
 * max_cache_features_iterations limits the possible cache information
 * to 80 bytes (of which 16 bytes are used in a generic Pentium II).
 * With 80 possible caches we are on the safe side for 1 or 2 years.
 *
 * Strange enough, no BHT, BTB or return stack data is given this way...

There should be performance monitoring unit (PMU) counters for the BTB, and there are experiments that estimate the BTB size by running special test programs; check http://xania.org/201602/haswell-and-ivy-btb by Matt Godbolt:

conclusions

From these results, it seems Ivy Bridge (and therefore Sandy Bridge) uses pretty much the same strategy for BTB lookups of unconditional branches, albeit with a larger table size: 4096 entries split over 1024 sets of 4 ways.

For Haswell, it seems a new approach to determining the sets has been taken, along with a new approach to evicting entries.
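To make the "4096 entries, 1024 sets, 4 ways" figure concrete, here is a toy model of a set-associative BTB. It is a sketch under loudly stated assumptions, not a description of the real hardware: the set index is assumed to be address bits 4 and up modulo the number of sets, the full address is used as the tag, and eviction is assumed to be plain LRU. The quoted conclusions only give the sizes; the actual index and replacement functions are not documented, and for Haswell they appear to differ.

```python
from collections import OrderedDict


class ToyBTB:
    """Toy set-associative BTB; index function and LRU policy are assumptions."""

    def __init__(self, sets=1024, ways=4, index_shift=4):
        self.sets, self.ways, self.index_shift = sets, ways, index_shift
        # One ordered dict per set: oldest (least recently used) entry first.
        self.table = [OrderedDict() for _ in range(sets)]

    def set_index(self, address):
        # Assumed index: drop the low bits, then wrap around the set count.
        return (address >> self.index_shift) % self.sets

    def lookup_and_update(self, address, target):
        """Return True on a hit; record the branch target either way."""
        entries = self.table[self.set_index(address)]
        hit = address in entries
        if hit:
            entries.move_to_end(address)   # mark as most recently used
        elif len(entries) >= self.ways:
            entries.popitem(last=False)    # evict the least recently used entry
        entries[address] = target
        return hit


def hits_in_second_pass(btb, addresses):
    """Warm the BTB with one pass, then count hits on a second pass."""
    for a in addresses:
        btb.lookup_and_update(a, a + 2)
    return sum(btb.lookup_and_update(a, a + 2) for a in addresses)


stride = 1024 << 4  # branches this far apart land in the same set in this model
print(ToyBTB().sets * ToyBTB().ways)                                     # 4096
print(hits_in_second_pass(ToyBTB(), [i * stride for i in range(4)]))    # 4
print(hits_in_second_pass(ToyBTB(), [i * stride for i in range(5)]))    # 0
```

In this model, four branches that alias to one set all fit, while a fifth makes the set thrash under LRU. A sharp cliff of that kind, at a particular branch count and spacing, is what a measurement-based experiment looks for.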

And more of his posts on branch prediction and events:

http://xania.org/201602/bpu-part-one (Static branch prediction on newer Intel processors)
http://xania.org/201602/bpu-part-two (Branch prediction - part two)
http://xania.org/201602/bpu-part-three (The BTB in contemporary Intel chips)
http://xania.org/201602/bpu-part-four (The Branch Target Buffer, part 2)

His code is public and based on Agner's tests: https://github.com/mattgodbolt/agner, in particular https://github.com/mattgodbolt/agner/blob/master/tests/btb_size.py and https://github.com/mattgodbolt/agner/blob/master/tests/branch.py
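The linked scripts are the reference. As a rough illustration of the kind of test program such an experiment runs, the sketch below emits NASM-style text for a chain of unconditional jumps, one per `spacing` bytes. This is not Godbolt's or Agner's generator: the function name, label names and parameters are hypothetical, and assembling the text, running it in a loop, and reading a misprediction counter are all left out. The idea is to vary `count` and `spacing` and watch where mispredictions start to rise.

```python
def jump_chain(count, spacing):
    """Return NASM-style text for `count` unconditional jumps, one per `spacing` bytes.

    Hypothetical helper for illustration; label names are arbitrary.
    """
    # NASM's align needs a power of two; 8 bytes leaves room for a 5-byte near jmp.
    if spacing < 8 or spacing & (spacing - 1):
        raise ValueError("spacing must be a power of two >= 8")
    lines = []
    for i in range(count):
        lines.append(f"align {spacing}")
        lines.append(f"branch_{i}:")
        lines.append(f"    jmp branch_{i + 1}")
    # Final label so the last jump has a target.
    lines.append(f"align {spacing}")
    lines.append(f"branch_{count}:")
    return "\n".join(lines)


print(jump_chain(3, 32))
```

Sweeping `count` over powers of two for a few values of `spacing` gives a grid of measurements; a jump in mispredictions at a given count hints at the capacity, and its dependence on spacing hints at how sets are indexed.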
