Is there a way to determine, or a resource where I can find, the branch target buffer size for Haswell, Sandy Bridge, Ivy Bridge, and Skylake Intel processors?
Check the software optimization resources by Agner Fog: http://www.agner.org/optimize/
The BTB should be covered in "The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers", http://www.agner.org/optimize/microarchitecture.pdf
3.7 Branch prediction in Intel Sandy Bridge and Ivy Bridge
BTB organization. The branch target buffer in Sandy Bridge is bigger than in Nehalem according to unofficial rumors. It is unknown whether it has one level, as in Core 2 and earlier processors, or two levels as in Nehalem. It can handle a maximum of 4 call instructions per 16 bytes of code. Conditional jumps are less efficient if there are more than 3 branch instructions per 16 bytes of code.
3.8 Branch prediction in Intel Haswell, Broadwell and Skylake
BTB organization. The organization of the branch target buffer is unknown. It appears to be reasonably big.
Intel may describe such data in the "Intel 64 and IA-32 Architectures Optimization Reference Manual" (http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html), around section "3.4.1 Branch Prediction Optimization", but there are still no sizes given.
It may look strange, but there was no information about the BTB in CPUID back in 1998-2000: http://www.installaware.com/forums/oldattachments/02142006163/tstcpuid.c (by Gerald J. Heim, University of Tübingen, Germany). It is still not listed in http://www.felixcloutier.com/x86/cpuid.html or in public materials from Intel employees. The relevant comment in tstcpuid.c reads:
 * The table describes all possible cache and TLB configurations
 * as documented by Intel. AMD doesn't use this but gives
 * exact cache layout data on CPUID 0x8000000x.
 *
 * MAX_CACHE_FEATURES_ITERATIONS limits the possible cache information
 * to 80 bytes (of which 16 bytes are used in generic pentii2).
 * With 80 possible caches we are on the safe side for 1 or 2 years.
 *
 * Strange enough no BHT, BTB or return stack data is given this way...
There should be performance monitoring unit (PMU) counters for the BTB, and there are experiments that measure the BTB size by running special test programs. See http://xania.org/201602/haswell-and-ivy-btb by Matt Godbolt:
Conclusions
From these results, it seems Ivy Bridge (and therefore Sandy Bridge) uses pretty much the same strategy for BTB lookups of unconditional branches, albeit with a larger table size: 4096 entries split over 1024 sets of 4 ways.
For Haswell, it seems a new approach to determining the sets has been taken, along with a new approach to evicting entries.
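As a quick check of the arithmetic in the quoted conclusion: 4096 entries at 4 ways gives 1024 sets, which would need 10 index bits if the set were selected by a plain power-of-two index. That indexing scheme is an assumption made here for illustration only (the posts suggest Haswell in particular does something different), and `btb_geometry` is a hypothetical helper name.

```python
def btb_geometry(entries, ways):
    """Return (sets, index_bits) for a set-associative table.

    Assumes a simple power-of-two set index; real hardware may hash
    address bits instead, so treat index_bits as a lower bound sketch.
    """
    sets, remainder = divmod(entries, ways)
    if remainder:
        raise ValueError("entries must be a multiple of ways")
    index_bits = sets.bit_length() - 1
    if 1 << index_bits != sets:
        raise ValueError("number of sets is not a power of two")
    return sets, index_bits


# Ivy Bridge figures from the quoted conclusion.
print(btb_geometry(4096, 4))  # (1024, 10)
```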
He has more posts about branch prediction and related events:
- http://xania.org/201602/bpu-part-one Static branch prediction on newer Intel processors
- http://xania.org/201602/bpu-part-two Branch prediction - part two
- http://xania.org/201602/bpu-part-three BTB in contemporary Intel chips
- http://xania.org/201602/bpu-part-four Branch Target Buffer, part 2
His code is public and based on Agner's tests: https://github.com/mattgodbolt/agner (see https://github.com/mattgodbolt/agner/blob/master/tests/btb_size.py and https://github.com/mattgodbolt/agner/blob/master/tests/branch.py).
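The idea behind such size tests can be illustrated with a toy model: loop over N branches repeatedly and watch for the N at which steady-state misses appear. The sketch below is not Godbolt's or Agner's code and does not touch real hardware. It simulates a set-associative table with LRU replacement, and the indexing (address bits above a 16-byte granule, modulo the number of sets) is purely an assumption, as are all function and parameter names.

```python
from collections import OrderedDict

GRANULE_SHIFT = 4  # assumption: set index taken from address bits above 16 bytes


def count_misses(num_branches, num_sets, num_ways, stride=16, passes=3):
    """Misses in the final pass over `num_branches` branches spaced `stride` bytes apart."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for _ in range(passes):
        misses = 0  # only the last pass (steady state) is reported
        for i in range(num_branches):
            addr = i * stride
            ways = sets[(addr >> GRANULE_SHIFT) % num_sets]
            if addr in ways:
                ways.move_to_end(addr)        # mark as most recently used
            else:
                misses += 1
                ways[addr] = True
                if len(ways) > num_ways:
                    ways.popitem(last=False)  # evict least recently used
    return misses


def estimate_capacity(num_sets, num_ways, max_branches=16384):
    """Largest power-of-two branch count that runs with no steady-state misses."""
    largest_ok = 0
    n = 1
    while n <= max_branches:
        if count_misses(n, num_sets, num_ways) == 0:
            largest_ok = n
        n *= 2
    return largest_ok


# Toy table with the Ivy Bridge geometry quoted above: 1024 sets of 4 ways.
print(estimate_capacity(1024, 4))            # 4096
print(count_misses(4096, 1024, 4, stride=32))  # 4096: only half the sets are used
```

In this model, doubling the branch spacing halves the usable capacity because only every other set is touched. The real tests vary branch count and alignment in a similar way and read misprediction counters from the PMU, which is how set and way counts can be inferred.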