summaryrefslogtreecommitdiff
path: root/doc/binutils.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/binutils.txt')
-rw-r--r--doc/binutils.txt321
1 files changed, 0 insertions, 321 deletions
diff --git a/doc/binutils.txt b/doc/binutils.txt
deleted file mode 100644
index 382cac183..000000000
--- a/doc/binutils.txt
+++ /dev/null
@@ -1,321 +0,0 @@
-+ Fixes to make:
-
- -Bdirect: cuml. dlopen times:
-0.676526
-0.695816
-0.703797
- normal: culm dlopen times:
-1.811801
-1.830609
-1.859336
-
-kcachgrind: 10million L1 cache misses, 2.3million L2 misses
-LD_DEBUG=stats - 64k named relocations
-=160 L1 cache misses each, & 31 L2 misses each.
-
-Afterwards:
- Linking 'only' ~15% of the time (vs. 50%)
- + still the largest single thing though.
-
-
- + suse-dynsort: FIXME: need to free this data too ...
-
-
-The dynsort harness: -zdynsort, -hashvals:
-UnSorted
-Iter 39.66 ms
-Iter 39.60 ms
-Iter 41.17 ms
-Iter 39.49 ms
-Sorted
-Iter 14.18 ms
-Iter 14.22 ms
-Iter 14.12 ms
-Iter 15.06 ms
- ie. a ~60% speedup [ no benefit from -Bdirect either ;-]
-
--Wl,-Bdirect -Wl,-z dynsort -Wl,-hashvals
-
-** Generating a new '.suse.hashvals' section **
- DT_SUSE_HASHVALS 0x60000000d 0x6ffff000
- + Random no. in range:
- SUSE_LO 6cbdd030
- SUSE_HI 6cbdd031
-
-VCL diffs:
- Iter 192.02 ms
- Iter 192.23 ms
- Iter 192.22 ms
-
- Iter 94.25 ms
- Iter 93.78 ms
- Iter 94.96 ms
-
-
- + Easy to output
- + re-factoring glibc is a *total* PITA ...
-
- dl-reloc.c - calls 'RESOLVE_MAP'
-
-(gdb) p map->l_info[0x4b]
-$2 = (Elf32_Dyn *) 0x0
-(gdb) p *map
-$3 = {l_addr = 1073872896, l_name = 0x400009a8 "/usr/lib/libstdc++.so.6",
-
-**
-
-
-Improving binutils' symbol & reloc sorting efficiency:
-
-// Improved binutils symbol sorting for cache efficiency ...
- + Unfortunately -Wl,-combreloc exports & re-imports
- the relocs before sorting them [ most odd ]
-
- + We sort .dynsym nicely (by elf hash)
- + we fail to sort .dynstr
- - in sequence of suffix order [!?]
- + elf-strtab.c (_bfd_elf_strtab_finalize)
- + does a sort - but only on a copy ...
- + also assigns hard addresses [!]
- + ultimately in bfd_strtab_add order ...
- + can we sort earlier & influence that ?
- + red herring:
- + '_bfd_elf_export_symbol' ...
- + _bfd_elf_merge_symbol (?)
- + *Unfortunately*
- + the sym table is already built & swapped out (?)
-
- + we want relocs sorted by absolute elf hash value
- + Unfortunately - we don't have the symbol foo here
- + which *really* sucks ...
- + The relocs are typically just copied from the source
- though so ... _bfd_elf_link_output_relocs ...
- + the SymStrTab + contains all the symbols
- + in the right order - given an index each
- + unfortunately inaccessible by 'index' ...
- + Back the strhash with the actual string data ? [!?]
- + thus avoiding foos ?
-
- + and we want symbols sorted by % hash_table_size
- + HACK: _bfd_elf_link_renumber_dynsyms [!?]
- + sort as we renumber ?
- + bfd_elf_final_link
- + elf_link_hash_traverse ... elf_link_output_extsym
- + [ elf_sort_symbol ? ]
-
- ** Could we have a flat array of entries ?
- + we need Index [ it's a byte index ... ]
- + urgh ...
- + st_name is a byte offset into foo
-
-
- ** Problems:
- + dynsym index order is: (bfd_elf_final_link):
- + dummy_symbol [elf_link_output_sym ...]
- + elf_link_hash_traverse (elf_link_output_extsym...)
- localsyms = true
- + abfd->sections order
- + local dynsyms ...
- + elf_link_hash_traverse (elf_link_output_extsym)
- localsyms = false
- ** To Do:
- + 'simply' - pre-quicksort on
- (elf_link_hash_entry):
- ** h->u.elf_hash_value % hash_size [!?]
- + iterate over that instead of the hash ...
- + also sorts the strings by elf hash incidentally :-)
-
- + bucketcount = elf_hash_table (finfo->info)->bucketcount;
- + bucket = h->u.elf_hash_value % bucketcount;
-
- + compute_bucket_count ...
- + builds 'hashcodes' list ...
- + used only locally ...
-
- ** What about _bfd_elf_link_renumber_dynsyms ...
- + what about hash table removals
- - surely they change the order of traversal ?
-
-** Hack elf_link_hash_traverse
- + in 2 modes: pre-sort & post sort.
-
-** We want to sort before _bfd_elf_link_renumber_dynsyms:
- from bfd_elf_size_dynamic_sections ...
- + bucketcount calculated on line 5787
- + bfd_elf_size_dynamic_sections [4987->5816]
- + renumber_dynsyms is line 5715
- + [urk]
-
-dltest.c - simple dlopen 100 times of libvcl:
-Iter 0.12241 us
-Iter 0.12255 us
-Iter 0.12828 us
-
-sorted alphabetically:
-Iter 0.12222 us
-Iter 0.12506 us
-Iter 0.12137 us
-
-just libvcl sorted by elf_hash % bucketcount:
-Iter 0.11994 us
-Iter 0.11950 us
-Iter 0.11959 us
-
-more libvcl ldd output re-linked:
- sal, libutl, libsot
-
-unchanged:
-Iter 0.36873 us
-Iter 0.36727 us
-Iter 0.3777 us
-Iter 0.39592 us
-Iter 0.41514 us -- outlier
-Iter 0.38233 us
-
-Avg: 0.3784 ( + 0.3845 with outlier)
-
-changed:
-Iter 0.35362 us
-Iter 0.35221 us
-Iter 0.3547 us
-Iter 0.37611 us
-Iter 0.38871 us
-Iter 0.37706 us
-
-Avg: 0.3671
-
-** sorting relocs by elf hash
- + store the hash value as we write it out ?!
- + store an array of 'h' pointers per reloc ?
-...
- + then we have an array of foo ...
- + or ? ... symbol foo...
- + OR - store the symbol strings in a chunk
- instead & index by offsets ?
- + can realloc take that ?
- + can we work out the size in advance ?
- + can we get a very-slow prototype ?
- + foo:
- + just read-in the elf string tab anyway.
- + it got emitted by bfd_stringtab_emit just above anyway.
-
- + relocations - point into dynsym ? - if so - horay ;-)
- + we can use the previous sort / foo table too ?
- + otherwise we screwed that up nastily [?]
- + hmm.
-
- + relocs must have a dynsym ptr - since that's
- what we accelerate -Bdirect with ...
-
-** TODO
- + Check that we do elide partially matching symbols ...
- + run 'make check' (?)
- + 3 unexpected failures
- + all related to having re-sorted syms / relocs
-
-+ Issues:
- + qsort 'global' closure ?
- + Pre % the elf symbol hash data with the
- bucket count ! :-)
- + add an elf_link_hash_entry pointer and pass
- that (if we can) from bfd_elf_link_record_dynamic_symbol ?
- + then we can avoid re-calculating the symbol hash etc.
- also get the foo to baa more nicely ... (?)
-
-
-** Way better:
- + 30% faster ...
- [ for my 1 test ;-]
-
-** For just libsvx: - each avg of 10 runs:
-
-Unsorted:
-Iter 549 ms
-Iter 552 ms
-Iter 550 ms
- Avg: 550ms
-
-Sorted:
-Iter 536 ms
-Iter 540 ms
-Iter 540 ms
- Avg: 539ms
-
-
-** For libvcl: avg of 50 runs
- + with 5 of it's deps re-linked:
-
-Unsorted:
-Iter 88.45 ms
-Iter 88.27 ms
-Iter 89.89 ms
-Iter 88.65 ms
- + 88.82ms
-
-Sorted:
-Iter 85.11 ms
-Iter 85.22 ms
-Iter 85.43 ms
-Iter 84.57 ms
- + 85.08ms
-
-
-** Analysis of the test harness:
-
-L2 data read misses:
-
-do_lookup_x 362k -> 130k
-strcmp 159k -> 53k
-_dl_elf_hash 158k -> 147k
-
-L1 data read misses:
-
-_dl_relocate_obj 1649k -> 587k
-do_lookup_x 923k -> 295k
-_dl_elf_hash 187k -> 162k
-strcmp 216k -> 80k
-
-michael@linux:/opt/OOInstall/program> for a in 0 1 2 3 4 5 6 7 8 9; do /opt/gcc/src/dynsort-test/dltest 10 ./libsvx680li.so; done
-Iter 982.57 ms
-Iter 1028.00 ms
-Iter 1029.86 ms
-Iter 1019.55 ms
-Iter 1047.68 ms
-Iter 1031.04 ms
-Iter 1051.30 ms
-Iter 1029.33 ms
-Iter 1016.63 ms
-Iter 615.35 ms
-michael@linux:/opt/OOInstall.sorted/program> for a in 0 1 2 3 4 5 6 7 8 9; do /opt/gcc/src/dynsort-test/dltest 10 ./libsvx680li.so; done
-Iter 938.20 ms
-Iter 962.02 ms
-Iter 941.60 ms
-Iter 944.48 ms
-Iter 931.99 ms
-Iter 942.54 ms
-Iter 953.35 ms
-Iter 939.51 ms
-Iter 962.91 ms
-Iter 949.63 ms
-
-
-Not having 'UNDEF' slots in the chain:
-no slots:
-Iter 3753.08 ms
-Iter 3602.53 ms
-Iter 3654.38 ms
-Iter 3575.02 ms
-Iter 3661.06 ms
-Iter 3659.29 ms
-slots:
-Iter 3526.86 ms
-Iter 3634.79 ms
-Iter 3666.99 ms
-Iter 3595.30 ms
-Iter 3685.82 ms
-Iter 3681.00 ms
-
-michael@linux:/opt/OOInstall/program> objdump -T libsvx680li.so | grep '\*UND\*' | wc -l
-5219
-michael@linux:/opt/OOInstall/program> objdump -T libsvx680li.so | grep -v '\*UND\*' | wc -l
-48289