Once again, it has been ages since I have touched this site. And once again I promise to be more active. … what’s up? Oh, yeah! BLOG!
I have recently been looking into the glibc resolver code.
It started out like any other troubleshooting effort, just trying to get a good foothold and identify where things could go wrong and how to ensure they went right. Once I got in the code… it was a real “took the red pill” sort of moment.
I often deal with how the resolver is configured, but had never needed to consider where it lived. As it turns out, I have been imagining a sort of PFM bubble whenever I thought about name resolution on *nix-based operating systems. I generally understood that name resolution was not handled by the kernel, but I also never imagined that resolution occurred purely in user space. I’m not sure what I imagined wedged between user space and the kernel. I think I envisioned a sort of stateful shared resolver living under the mystic veil of glibc. As you might guess, that is not the case. Every process runs its own resolver.
The nscd process helps out, using its own user-land resolver to provide resolution services over a local unix domain socket. Every other process’s resolver can then forgo doing the work itself and just pass requests off to nscd, which may have already done the lookup within the result’s TTL, eliminating the need to resolve the same name for multiple processes. Shazam! A stateful shared resolver! Except I knew nscd was a completely optional service and still imagined a non-caching single resolver living somewhere.
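To make the "one shared cache, many clients" idea concrete, here is a minimal Python sketch of a TTL-bounded lookup cache. It is not nscd's code or protocol; the class name, the `resolve` callback, and the example host name and address are all made up for illustration.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical upstream lookup: name -> (addresses, ttl_seconds).
Resolve = Callable[[str], Tuple[List[str], float]]


class SharedLookupCache:
    """Toy stand-in for a caching daemon: one cache shared by many clients."""

    def __init__(self, resolve: Resolve,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve
        self._clock = clock
        # name -> (addresses, time at which the entry expires)
        self._entries: Dict[str, Tuple[List[str], float]] = {}
        self.upstream_lookups = 0

    def lookup(self, name: str) -> List[str]:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]          # still within the TTL: no upstream work
        addresses, ttl = self._resolve(name)
        self.upstream_lookups += 1
        self._entries[name] = (addresses, now + ttl)
        return addresses


if __name__ == "__main__":
    # Fake upstream resolver with a 60 second TTL (assumed values).
    cache = SharedLookupCache(lambda name: (["192.0.2.10"], 60.0))
    cache.lookup("db01.example.com")  # first client: goes upstream
    cache.lookup("db01.example.com")  # second client: served from the cache
    print("upstream lookups:", cache.upstream_lookups)
```

Two clients asking for the same name inside the TTL cost one upstream lookup instead of two, which is the whole appeal.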
It is, indeed, every process for itself. The gethostbyname and getaddrinfo functions (along with some group and user related resolvers) create an instance of the resolver entirely within the process. res_init(), or more accurately one of its internal calls (__res_maybe_init() is maybe my favorite), is called, initializing the resolver. The initialization involves reading /etc/resolv.conf to load the search suffixes, nameservers, and other configs. This could very well be the last time that information is ever read by the process. This is the source of the trouble I was trying to… shoot.
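The read-once behavior is easier to see in a small model. The Python sketch below is not glibc's implementation: `parse_resolv_conf` handles only the `nameserver`, `search`, `domain`, and `options` keywords and ignores limits such as the maximum number of nameservers, and `ProcessResolver` is a hypothetical stand-in for a process that loads its configuration on first use and never looks at the file again.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ResolverConfig:
    nameservers: List[str] = field(default_factory=list)
    search: List[str] = field(default_factory=list)
    options: List[str] = field(default_factory=list)


def parse_resolv_conf(text: str) -> ResolverConfig:
    """Parse a simplified subset of the resolv.conf format."""
    cfg = ResolverConfig()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":      # blank line or comment
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            cfg.nameservers.append(values[0])
        elif keyword in ("search", "domain"):
            cfg.search = values              # the last search/domain line wins
        elif keyword == "options":
            cfg.options.extend(values)
    return cfg


class ProcessResolver:
    """Models a process that reads its resolver config once, on first use."""

    def __init__(self, read_file: Callable[[], str]) -> None:
        self._read_file = read_file
        self._config: Optional[ResolverConfig] = None

    def config(self) -> ResolverConfig:
        if self._config is None:             # only the first call touches the file
            self._config = parse_resolv_conf(self._read_file())
        return self._config


if __name__ == "__main__":
    # A dict stands in for the file on disk (all values are made up).
    disk = {"resolv.conf": "nameserver 192.0.2.53\nsearch old.example.com\n"}
    long_running = ProcessResolver(lambda: disk["resolv.conf"])
    long_running.config()                    # initialized before the edit

    disk["resolv.conf"] = ("nameserver 192.0.2.53\n"
                           "search old.example.com new.example.com\n")
    fresh = ProcessResolver(lambda: disk["resolv.conf"])

    print("long-running:", long_running.config().search)
    print("fresh:       ", fresh.config().search)
```

In this model, the process that initialized before the edit keeps the old search list for as long as it lives, while one started afterwards sees the new suffix.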
… wobbly transition to flashback …
Changes needed to be made to resolv.conf to add a search suffix. After the change, nscd was restarted and the server seemed completely functional. Command line tests worked. Our PHP code running under Apache httpd was now able to resolve the hosts with the new suffix. All was well.
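For anyone wondering what a search suffix actually buys you, here is a rough Python sketch of how a search list turns a short name into candidate names to try. It follows the ordering described in the resolv.conf man page for the `ndots` option (default 1), but it is a simplification and not the resolver's real code; the function name and the example.com domains are invented for illustration.

```python
from typing import List


def candidate_names(name: str, search: List[str], ndots: int = 1) -> List[str]:
    """Return the fully qualified names to try, in order (simplified)."""
    if name.endswith("."):
        return [name]                         # already absolute: no suffixes
    as_is = name + "."
    with_suffixes = [f"{name}.{suffix.rstrip('.')}." for suffix in search]
    if name.count(".") >= ndots:
        return [as_is] + with_suffixes        # enough dots: try as-is first
    return with_suffixes + [as_is]            # short name: suffixes first


if __name__ == "__main__":
    search = ["old.example.com", "new.example.com"]
    for candidate in candidate_names("db01", search):
        print(candidate)
```

A process whose resolver never loaded the new suffix never generates the corresponding candidate at all, so the short name can only fail for it.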
Are you sure I can’t just take the blue pill?
Days later we started seeing periodic “unknown host” errors from PHP applications.
Why would it have worked initially, but start failing a couple days later?
I will update this post in a couple days with what we found.
In the meantime, here are a couple of links to other interesting information on DNS resolution:
So far I’ve only run into the nss_resinit module to address the fact that changes to resolv.conf aren’t automagically loaded into the resolver.
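The general idea of noticing a changed file and reloading it can be sketched in a few lines of Python. To be clear, this is not nss_resinit's code and makes no claim about how that module works internally; `ReloadingFile` is a hypothetical helper that compares the file's modification time and size on every access.

```python
import os
import tempfile
from typing import Optional, Tuple


class ReloadingFile:
    """Re-reads a file whenever its mtime or size differs from the last read."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._stamp: Optional[Tuple[int, int]] = None   # (mtime_ns, size)
        self._text = ""
        self.reloads = 0

    def text(self) -> str:
        st = os.stat(self._path)
        stamp = (st.st_mtime_ns, st.st_size)
        if stamp != self._stamp:              # first use, or the file changed
            with open(self._path, "r", encoding="utf-8") as handle:
                self._text = handle.read()
            self._stamp = stamp
            self.reloads += 1
        return self._text


def demo() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "resolv.conf")   # scratch copy, not /etc
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("search old.example.com\n")
        conf = ReloadingFile(path)
        print(conf.text().strip())
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("search old.example.com new.example.com\n")
        print(conf.text().strip())            # change noticed, file re-read


if __name__ == "__main__":
    demo()
```

The cost is one stat call per access, which is the trade being made for never serving a stale search list.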