Lair Of The Multimedia Guru

2009-12-27

ld.so GNU linker/loader

ld.so is the little thing that loads all your shared libs and binds all the referenced symbols to their definitions, well, more or less at least. I think a better name for it would be "Russian roulette linker": it binds symbols correctly most of the time, but if it fails you don't want to be anywhere close to its line of fire ;).

The problems start with a mistake, an oversight, or let me call it a typo in the ELF spec. More precisely, every reference in every object, be that the application or a lib, will be resolved to the first matching symbol definition. The search order is a breadth-first search starting from the application. This doesn't look wrong at first, and that's why I call it an oversight or typo. The problem shows up, for example, if your application links to libz1 and libpng1 while libpng1 links to libz2: all references to libz2 in libpng1 will be resolved with things from libz1, and if we now assume these two are incompatible … boom. Had the ELF spec required a breadth-first search starting from the object that contains the reference, this problem and all the ridiculous mess that I describe below would not exist.
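To make the lookup order concrete, here is a toy model in Python (not ld.so's actual code) of the libz/libpng example. Each object lists the symbols it defines and the libs it depends on, the search scope is a breadth-first walk starting at a given object, and a reference binds to the first object in that scope that defines the symbol. The object and symbol names are made up for illustration, and the model ignores LD_PRELOAD, dlopen() and DT_SYMBOLIC.

```python
from collections import deque

# Hypothetical dependency graph for the example above: every object lists the
# symbols it defines and the libs it depends on, in link order.
OBJECTS = {
    "app":     {"defines": {"main"},     "needs": ["libz1", "libpng1"]},
    "libpng1": {"defines": {"png_read"}, "needs": ["libz2"]},
    "libz1":   {"defines": {"inflate"},  "needs": []},
    "libz2":   {"defines": {"inflate"},  "needs": []},
}


def search_order(objects, root):
    """Breadth-first walk of the dependency graph starting at root."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        name = queue.popleft()
        order.append(name)
        for dep in objects[name]["needs"]:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return order


def resolve(objects, symbol, start):
    """Bind symbol to the first object in the BFS order from start that defines it."""
    for name in search_order(objects, start):
        if symbol in objects[name]["defines"]:
            return name
    return None


# ELF rule: the search always starts at the application, so libpng1's
# reference to inflate ends up in libz1.
print(resolve(OBJECTS, "inflate", "app"))      # libz1
# The order argued for above: start at the object making the reference.
print(resolve(OBJECTS, "inflate", "libpng1"))  # libz2
```

The only difference between the two calls is where the walk starts; that single parameter is the whole complaint.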

As the ELF spec requires this quite impractical behavior, one would expect some option to enable a more practical search order. Sadly there is none in GNU ld.so; instead, people recommend using symbol versioning as a workaround.

Symbol versioning, the cure-all in a world free of those who wrote GNU ld.so.

Symbol versioning, as implemented by Sun and later GNU, has various purposes, like finding the lowest version of a lib an application needs, but in GNU's world it serves a much more important purpose: working around GNU ld.so's fantastic misdesign.

The idea is that if everyone and everything uses symbol versioning, then in our little example above libpng1 will use symbols from libz2 while the application uses symbols from libz1, and things actually work.
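A toy model of what versioning buys in that example: references now carry a version name next to the symbol name, and a definition only satisfies a reference whose version matches. The search order is still the same breadth-first scope starting at the application. The version names ZLIB_1 and ZLIB_2 are hypothetical (real ones come from each library's version script), and each object is simplified to define at most one version per symbol.

```python
# Global search scope in breadth-first order: (object, {symbol: version}).
# Version names are hypothetical.
SCOPE = [
    ("app",     {}),
    ("libz1",   {"inflate": "ZLIB_1"}),
    ("libpng1", {"png_read": "PNG_1"}),
    ("libz2",   {"inflate": "ZLIB_2"}),
]


def resolve_versioned(scope, symbol, version):
    """Bind a (symbol, version) reference to the first object defining exactly that."""
    for name, defs in scope:
        if symbol in defs and defs[symbol] == version:
            return name
    return None


print(resolve_versioned(SCOPE, "inflate", "ZLIB_1"))  # libz1: the application's reference
print(resolve_versioned(SCOPE, "inflate", "ZLIB_2"))  # libz2: libpng1's reference
```

The scope is identical for both references; only the version tag keeps libpng1 from picking up libz1's inflate.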

The bad news is that symbol versioning is off by default; it can be enabled by using a version script or --default-symver.

The worse news is that if it wasn't enabled in an already released binary, you can't enable it later without hell breaking loose, due to the ld.so bugs I describe below.

The Bugs in ld.so

Sorry for the long intro above, but the whole thing really is a convoluted mess that is hard to explain …

If you add versioning to an existing library without bumping the soname, you will hit ld.so bug #1: an assertion failure when an application compiled against the versioned lib is linked at runtime against a lib without versioning:
Inconsistency detected by ld.so: do-lookup.h: 115: check_match: Assertion `version->filename == ((void *)0) || ! _dl_name_match_p (version->filename, map)' failed!
That one can be fixed with a single-line change, by commenting the assert out, or, as Debian & Ubuntu prefer, by adding hundreds of dependencies (one for each package that uses the changed lib).

This bug is funny because if you rename the lib, load it with LD_PRELOAD and hex-edit the filename out of it, it works and the assertion no longer fires.

But let's look at bug #2: ld.so will satisfy unversioned symbol references with the first symbol found, versioned or not. This makes the coexistence of versioned and unversioned libs pretty much like walking through a minefield. More specifically, if you thought you could just turn versioning on together with an ABI and soname bump, think again. It doesn't work: because the lib with the old soname didn't use versioning, the linker will bind references that were intended for the old lib to the new one if the new one happens to come first in the list of dependencies.

And due to bug #3, ld.so will satisfy versioned symbol references with the first symbol found, versioned or not, unless the filename matches some test, in which case bug #1 will end ld.so's life and also that of your application. Together with bug #2 this means that if there's a lib with versioning and one without loaded, your references, versioned or not, can end up bound to either, depending on luck. The only thing you should not expect is that both will be bound correctly, because I think that can't happen, at least not without very obscure tricks.
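A toy model of how bugs #1 to #3 combine, written from the description above rather than from ld.so's source. Definitions are (object, symbol, version) tuples in search order, with None meaning unversioned; a reference carries an optional version plus the file it expected that version from. libA0 stands for an old unversioned build and libA1 for a versioned one, loosely following the test names further down. One assumption not stated in the text: a definition carrying a different version is skipped rather than accepted.

```python
# Definitions in global search order: (object, symbol, version or None).
# libA0 = old build without versioning, libA1 = build with version A_1
# (hypothetical names, loosely modelled on the test below).
OLD_FIRST = [("libA0", "f", None), ("libA1", "f", "A_1")]
NEW_FIRST = [("libA1", "f", "A_1"), ("libA0", "f", None)]


def resolve_as_described(scope, symbol, version=None, needed_file=None):
    """Toy model of the binding rules described above, not ld.so's code.

    version: version the reference asks for (None = unversioned reference).
    needed_file: object the linker recorded that version requirement against.
    """
    for obj, sym, ver in scope:
        if sym != symbol:
            continue
        if version is None:
            return obj  # bug #2: first definition wins, versioned or not
        if ver is None:
            # bug #3: an unversioned definition satisfies a versioned
            # reference, unless it lives in the very file the version was
            # expected from; then the check_match assertion fires (bug #1).
            if needed_file == obj:
                raise AssertionError("check_match: assertion failed")
            return obj
        if ver == version:
            return obj
        # Assumption: a definition with a different version is skipped.
    return None


# libB0 was linked against the versioned libA1, but libA0 comes first:
print(resolve_as_described(OLD_FIRST, "f", "A_1", "libA1"))  # libA0
# The app's unversioned reference was meant for libA0, but libA1 comes first:
print(resolve_as_described(NEW_FIRST, "f"))                  # libA1
# libA1 turns out to be an unversioned build: ld.so gives up.
try:
    resolve_as_described([("libA1", "f", None)], "f", "A_1", "libA1")
except AssertionError as err:
    print("abort:", err)
```

Whichever lib comes first, one of the two references is bound to the wrong place, and the third case shows the filename check turning a wrong binding into a crash.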

The solution

Fix ld.so, obviously; anything else is pure insanity, and I don't even want to think further about what breaks and how it could be worked around. Here's a simple proof-of-concept patch, but keep in mind that it has only been lightly tested and that it does not fix the root problem: searching is still done in the silly order, it's just that references will now preferably be resolved with matching definitions, and without random filename-related assert(0)s.
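The preference the patch describes can be sketched in a toy model: keep the search order as is, first look for a definition whose version matches the reference exactly (an unversioned reference matching an unversioned definition), and only fall back to first-by-name behavior when there is no exact match, with no filename-based assertion. This illustrates the idea only and is not the patch's actual logic; the fallback rule and the skipping of mismatched versions are assumptions, and libA0/libA1/A_1 are hypothetical names.

```python
# Toy data: (object, symbol, version or None) in global search order.
OLD_FIRST = [("libA0", "f", None), ("libA1", "f", "A_1")]
NEW_FIRST = [("libA1", "f", "A_1"), ("libA0", "f", None)]


def resolve_preferring_match(scope, symbol, version=None):
    """Sketch of 'prefer matching definitions'; search order is unchanged.

    Pass 1: first definition whose version equals the reference's version
            (None == None, so unversioned references prefer unversioned defs).
    Pass 2 (assumed fallback): first unversioned definition for a versioned
            reference, or first definition of any kind for an unversioned one.
    No filename-based assertion in either pass.
    """
    candidates = [(obj, ver) for obj, sym, ver in scope if sym == symbol]
    for obj, ver in candidates:
        if ver == version:
            return obj
    for obj, ver in candidates:
        if ver is None or version is None:
            return obj
    return None


print(resolve_preferring_match(OLD_FIRST, "f", "A_1"))               # libA1
print(resolve_preferring_match(NEW_FIRST, "f"))                      # libA0
print(resolve_preferring_match([("libA1", "f", None)], "f", "A_1"))  # libA1
```

With either load order each reference now gets the definition it was built against, and a stale unversioned libA1 is used instead of aborting the program.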

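A quick test and demonstration of the difference, first with the stock /lib/ld-2.9.so and then with the patched ld.so from an eglibc-2.10.2 build tree (I'd upload a tgz of the code if WordPress didn't disallow uploading tgz files):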

./compileX.sh /lib/ld-2.9.so
Test for introducing versioning into A0 and a new A1 with versioning
Trying 6 A0:0 A1:2 B0:4 App:0
libA0 B:libB0 libA0 libA0
Trying 7 A0:1 A1:2 B0:4 App:0
libA0 B:libB0 libA1 libA0
Trying 14 A0:0 A1:2 B0:4 App:8
./app: ./libA0.so: no version information available (required by ./app)
Inconsistency detected by ld.so: do-lookup.h: 115: check_match: Assertion `version->filename == ((void *)0) || ! _dl_name_match_p (version->filename, map)' failed!
Trying 15 A0:1 A1:2 B0:4 App:8
libA0 B:libB0 libA1 libA0
Test for introducing versioning into A1 and a new A0 with versioning
Trying 9 A0:1 A1:0 B0:0 App:8
libA0 B:libB0 libA0 libA0
Trying 11 A0:1 A1:2 B0:0 App:8
libA0 B:libB0 libA0 libA0
Trying 13 A0:1 A1:0 B0:4 App:8
./app: ./libA1.so: no version information available (required by ./libB0.so)
Inconsistency detected by ld.so: do-lookup.h: 115: check_match: Assertion `version->filename == ((void *)0) || ! _dl_name_match_p (version->filename, map)' failed!
Trying 15 A0:1 A1:2 B0:4 App:8
libA0 B:libB0 libA1 libA0

./compileX.sh ~/libcugh/eglibc-2.10.2/build-tree/amd64-libc/elf/ld.so
Test for introducing versioning into A0 and a new A1 with versioning
Trying 6 A0:0 A1:2 B0:4 App:0
libA0 B:libB0 libA1 libA0
Trying 7 A0:1 A1:2 B0:4 App:0
libA0 B:libB0 libA1 libA0
Trying 14 A0:0 A1:2 B0:4 App:8
./app: ./libA0.so: no version information available (required by ./app)
libA0 B:libB0 libA1 libA0
Trying 15 A0:1 A1:2 B0:4 App:8
libA0 B:libB0 libA1 libA0
Test for introducing versioning into A1 and a new A0 with versioning
Trying 9 A0:1 A1:0 B0:0 App:8
libA0 B:libB0 libA1 libA0
Trying 11 A0:1 A1:2 B0:0 App:8
libA0 B:libB0 libA1 libA0
Trying 13 A0:1 A1:0 B0:4 App:8
./app: ./libA1.so: no version information available (required by ./libB0.so)
libA0 B:libB0 libA1 libA0
Trying 15 A0:1 A1:2 B0:4 App:8
libA0 B:libB0 libA1 libA0

Filed under: GNU — Michael @ 03:31

4 Comments »

  1. Thanks for pointing out the hideousness of the beast. While I haven’t gotten around to it and don’t know if I will, I’ve considered putting together an ld.so replacement either from scratch or based on the uclibc ld.so. That, and trying to retrofit glibc-compatible headers/abi onto my libc to get a drop-in replacement that’ll work with binary distributions. =)

    Comment by Rich — 2010-03-04 @ 10:39

  2. I just wanna know if you filed this bug with the maintainers and if a complete solution is being worked on.

    Comment by Bruno — 2010-04-28 @ 13:25

  3. > I just wanna know if you filed this bug with the maintainers and if a
    > complete solution is being worked on.

    I thought Reinhard Tartler would be submitting this upstream. The reason why I didn't do it myself is that I don't like upstream (Drepper, AFAIK) at all, and I'm not a diplomatic person, so I feared that my submitting this would degenerate into a flamewar ending in denial and rejection, which would be pointless.

    Comment by Michael — 2010-05-01 @ 18:52

