diff options
Diffstat (limited to 'doc/binutils.txt')
-rw-r--r-- | doc/binutils.txt | 321 |
1 files changed, 0 insertions, 321 deletions
diff --git a/doc/binutils.txt b/doc/binutils.txt deleted file mode 100644 index 382cac183..000000000 --- a/doc/binutils.txt +++ /dev/null @@ -1,321 +0,0 @@ -+ Fixes to make: - - -Bdirect: cuml. dlopen times: -0.676526 -0.695816 -0.703797 - normal: culm dlopen times: -1.811801 -1.830609 -1.859336 - -kcachgrind: 10million L1 cache misses, 2.3million L2 misses -LD_DEBUG=stats - 64k named relocations -=160 L1 cache misses each, & 31 L2 misses each. - -Afterwards: - Linking 'only' ~15% of the time (vs. 50%) - + still the largest single thing though. - - - + suse-dynsort: FIXME: need to free this data too ... - - -The dynsort harness: -zdynsort, -hashvals: -UnSorted -Iter 39.66 ms -Iter 39.60 ms -Iter 41.17 ms -Iter 39.49 ms -Sorted -Iter 14.18 ms -Iter 14.22 ms -Iter 14.12 ms -Iter 15.06 ms - ie. a ~60% speedup [ no benefit from -Bdirect either ;-] - --Wl,-Bdirect -Wl,-z dynsort -Wl,-hashvals - -** Generating a new '.suse.hashvals' section ** - DT_SUSE_HASHVALS 0x60000000d 0x6ffff000 - + Random no. in range: - SUSE_LO 6cbdd030 - SUSE_HI 6cbdd031 - -VCL diffs: - Iter 192.02 ms - Iter 192.23 ms - Iter 192.22 ms - - Iter 94.25 ms - Iter 93.78 ms - Iter 94.96 ms - - - + Easy to output - + re-factoring glibc is a *total* PITA ... - - dl-reloc.c - calls 'RESOLVE_MAP' - -(gdb) p map->l_info[0x4b] -$2 = (Elf32_Dyn *) 0x0 -(gdb) p *map -$3 = {l_addr = 1073872896, l_name = 0x400009a8 "/usr/lib/libstdc++.so.6", - -** - - -Improving binutils' symbol & reloc sorting efficiency: - -// Improved binutils symbol sorting for cache efficiency ... - + Unfortunately -Wl,-combreloc exports & re-imports - the relocs before sorting them [ most odd ] - - + We sort .dynsym nicely (by elf hash) - + we fail to sort .dynstr - - in sequence of suffix order [!?] - + elf-strtab.c (_bfd_elf_strtab_finalize) - + does a sort - but only on a copy ... - + also assigns hard addresses [!] - + ultimately in bfd_strtab_add order ... - + can we sort earlier & influence that ? - + red herring: - + '_bfd_elf_export_symbol' ... - + _bfd_elf_merge_symbol (?) - + *Unfortunately* - + the sym table is already built & swapped out (?) - - + we want relocs sorted by absolute elf hash value - + Unfortunately - we don't have the symbol foo here - + which *really* sucks ... - + The relocs are typically just copied from the source - though so ... _bfd_elf_link_output_relocs ... - + the SymStrTab + contains all the symbols - + in the right order - given an index each - + unfortunately inaccessible by 'index' ... - + Back the strhash with the actual string data ? [!?] - + thus avoiding foos ? - - + and we want symbols sorted by % hash_table_size - + HACK: _bfd_elf_link_renumber_dynsyms [!?] - + sort as we renumber ? - + bfd_elf_final_link - + elf_link_hash_traverse ... elf_link_output_extsym - + [ elf_sort_symbol ? ] - - ** Could we have a flat array of entries ? - + we need Index [ it's a byte index ... ] - + urgh ... - + st_name is a byte offset into foo - - - ** Problems: - + dynsym index order is: (bfd_elf_final_link): - + dummy_symbol [elf_link_output_sym ...] - + elf_link_hash_traverse (elf_link_output_extsym...) - localsyms = true - + abfd->sections order - + local dynsyms ... - + elf_link_hash_traverse (elf_link_output_extsym) - localsyms = false - ** To Do: - + 'simply' - pre-quicksort on - (elf_link_hash_entry): - ** h->u.elf_hash_value % hash_size [!?] - + iterate over that instead of the hash ... - + also sorts the strings by elf hash incidentally :-) - - + bucketcount = elf_hash_table (finfo->info)->bucketcount; - + bucket = h->u.elf_hash_value % bucketcount; - - + compute_bucket_count ... - + builds 'hashcodes' list ... - + used only locally ... - - ** What about _bfd_elf_link_renumber_dynsyms ... - + what about hash table removals - - surely they change the order of traversal ? - -** Hack elf_link_hash_traverse - + in 2 modes: pre-sort & post sort. - -** We want to sort before _bfd_elf_link_renumber_dynsyms: - from bfd_elf_size_dynamic_sections ... - + bucketcount calculated on line 5787 - + bfd_elf_size_dynamic_sections [4987->5816] - + renumber_dynsyms is line 5715 - + [urk] - -dltest.c - simple dlopen 100 times of libvcl: -Iter 0.12241 us -Iter 0.12255 us -Iter 0.12828 us - -sorted alphabetically: -Iter 0.12222 us -Iter 0.12506 us -Iter 0.12137 us - -just libvcl sorted by elf_hash % bucketcount: -Iter 0.11994 us -Iter 0.11950 us -Iter 0.11959 us - -more libvcl ldd output re-linked: - sal, libutl, libsot - -unchanged: -Iter 0.36873 us -Iter 0.36727 us -Iter 0.3777 us -Iter 0.39592 us -Iter 0.41514 us -- outlier -Iter 0.38233 us - -Avg: 0.3784 ( + 0.3845 with outlier) - -changed: -Iter 0.35362 us -Iter 0.35221 us -Iter 0.3547 us -Iter 0.37611 us -Iter 0.38871 us -Iter 0.37706 us - -Avg: 0.3671 - -** sorting relocs by elf hash - + store the hash value as we write it out ?! - + store an array of 'h' pointers per reloc ? -... - + then we have an array of foo ... - + or ? ... symbol foo... - + OR - store the symbol strings in a chunk - instead & index by offsets ? - + can realloc take that ? - + can we work out the size in advance ? - + can we get a very-slow prototype ? - + foo: - + just read-in the elf string tab anyway. - + it got emitted by bfd_stringtab_emit just above anyway. - - + relocations - point into dynsym ? - if so - horay ;-) - + we can use the previous sort / foo table too ? - + otherwise we screwed that up nastily [?] - + hmm. - - + relocs must have a dynsym ptr - since that's - what we accelerate -Bdirect with ... - -** TODO - + Check that we do elide partially matching symbols ... - + run 'make check' (?) - + 3 unexpected failures - + all related to having re-sorted syms / relocs - -+ Issues: - + qsort 'global' closure ? - + Pre % the elf symbol hash data with the - bucket count ! :-) - + add an elf_link_hash_entry pointer and pass - that (if we can) from bfd_elf_link_record_dynamic_symbol ? - + then we can avoid re-calculating the symbol hash etc. - also get the foo to baa more nicely ... (?) - - -** Way better: - + 30% faster ... - [ for my 1 test ;-] - -** For just libsvx: - each avg of 10 runs: - -Unsorted: -Iter 549 ms -Iter 552 ms -Iter 550 ms - Avg: 550ms - -Sorted: -Iter 536 ms -Iter 540 ms -Iter 540 ms - Avg: 539ms - - -** For libvcl: avg of 50 runs - + with 5 of it's deps re-linked: - -Unsorted: -Iter 88.45 ms -Iter 88.27 ms -Iter 89.89 ms -Iter 88.65 ms - + 88.82ms - -Sorted: -Iter 85.11 ms -Iter 85.22 ms -Iter 85.43 ms -Iter 84.57 ms - + 85.08ms - - -** Analysis of the test harness: - -L2 data read misses: - -do_lookup_x 362k -> 130k -strcmp 159k -> 53k -_dl_elf_hash 158k -> 147k - -L1 data read misses: - -_dl_relocate_obj 1649k -> 587k -do_lookup_x 923k -> 295k -_dl_elf_hash 187k -> 162k -strcmp 216k -> 80k - -michael@linux:/opt/OOInstall/program> for a in 0 1 2 3 4 5 6 7 8 9; do /opt/gcc/src/dynsort-test/dltest 10 ./libsvx680li.so; done -Iter 982.57 ms -Iter 1028.00 ms -Iter 1029.86 ms -Iter 1019.55 ms -Iter 1047.68 ms -Iter 1031.04 ms -Iter 1051.30 ms -Iter 1029.33 ms -Iter 1016.63 ms -Iter 615.35 ms -michael@linux:/opt/OOInstall.sorted/program> for a in 0 1 2 3 4 5 6 7 8 9; do /opt/gcc/src/dynsort-test/dltest 10 ./libsvx680li.so; done -Iter 938.20 ms -Iter 962.02 ms -Iter 941.60 ms -Iter 944.48 ms -Iter 931.99 ms -Iter 942.54 ms -Iter 953.35 ms -Iter 939.51 ms -Iter 962.91 ms -Iter 949.63 ms - - -Not having 'UNDEF' slots in the chain: -no slots: -Iter 3753.08 ms -Iter 3602.53 ms -Iter 3654.38 ms -Iter 3575.02 ms -Iter 3661.06 ms -Iter 3659.29 ms -slots: -Iter 3526.86 ms -Iter 3634.79 ms -Iter 3666.99 ms -Iter 3595.30 ms -Iter 3685.82 ms -Iter 3681.00 ms - -michael@linux:/opt/OOInstall/program> objdump -T libsvx680li.so | grep '\*UND\*' | wc -l -5219 -michael@linux:/opt/OOInstall/program> objdump -T libsvx680li.so | grep -v '\*UND\*' | wc -l -48289 |