My customer saw very strange behaviour of DNS packets. Some of the packets were dropped at the firewall. Since this caused a short delay I was asked to do some analysis. It turned out I had to debug an IPv4/IPv6 glibc problem. But first things first…
What happened exactly? Name resolution on the Linux host always had some mysterious delay. To test it I used the getent command like this:
# getent passwd
The following figure illustrates the problem the host ran into:
The packets in detail:
- The Linux host sent two DNS requests to the DNS server behind a firewall, asking for the AAAA and the A records of another host.
- The DNS server answered both packets. The firewall let the first packet pass but dropped the second packet.
- After one second the Linux host sent out the two requests again. But this time it queried for the AAAA record, waited for the answer and only then sent out the request for the A record. This time the firewall let both answer packets pass.
All in all, very strange behaviour on the part of the firewall. But only at first glance!
A detailed investigation showed that the Linux host sent out all packets from the same source port. Naturally, the source and destination IP addresses and the destination port (udp/53) were identical as well.
So when the first request packet passed the firewall, the firewall added an entry to its state table with these four criteria (source IP, source port, destination IP, destination port). The second request packet did not trigger a second entry in the state table since it did not differ from the existing entry. The firewall let the first answer packet pass and deleted the entry from the state table. The second answer packet no longer matched any entry and was, correctly, dropped by the firewall.
After a little investigation (thanks Robert!) I found that glibc offers an option to disable this behaviour:
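The state table logic described above can be sketched as a small simulation. This is only a toy model, not the behaviour of any particular firewall product: the class names, the IP addresses and the port number are made up for illustration, and the only rule modelled is "one entry per four-tuple, deleted on the first reply".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """The four values the state entry is keyed on (example values only)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class ToyFirewall:
    """Toy stateful filter: one entry per flow, removed on the first reply."""

    def __init__(self):
        self.states = set()

    def outbound(self, flow):
        # A second request on an identical flow adds nothing new,
        # because the entry already exists in the set.
        self.states.add(flow)

    def inbound_reply(self, flow):
        # For simplicity the reply is looked up by the flow of the request
        # it answers. It passes only if an entry exists, which is then deleted.
        if flow in self.states:
            self.states.remove(flow)
            return "pass"
        return "drop"


def simulate(parallel):
    """Return the firewall verdicts for the AAAA and A replies."""
    fw = ToyFirewall()
    flow = Flow("192.0.2.10", 40000, "198.51.100.53", 53)  # hypothetical
    verdicts = []
    if parallel:
        fw.outbound(flow)  # AAAA request
        fw.outbound(flow)  # A request, same source port
        verdicts.append(fw.inbound_reply(flow))
        verdicts.append(fw.inbound_reply(flow))
    else:
        fw.outbound(flow)  # AAAA request
        verdicts.append(fw.inbound_reply(flow))
        fw.outbound(flow)  # A request, sent only after the first answer
        verdicts.append(fw.inbound_reply(flow))
    return verdicts


print(simulate(parallel=True))   # ['pass', 'drop']
print(simulate(parallel=False))  # ['pass', 'pass']
```

The parallel case reproduces the dropped second answer, while the sequential case matches what the host did on its retry one second later.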
single-request (since glibc 2.10) sets RES_SNGLKUP in _res.options. By default, glibc performs IPv4 and IPv6 lookups in parallel since version 2.9. Some appliance DNS servers cannot handle these queries properly and make the requests time out. This option disables the behavior and makes glibc perform the IPv6 and IPv4 requests sequentially (at the cost of some slowdown of the resolving process).
Adding the single-request option to resolv.conf solved the problem.
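For reference, a resolv.conf with the option set might look roughly like the following. The nameserver address is a placeholder from the documentation range, not the customer's actual server; only the options line is the relevant part.

```
# /etc/resolv.conf
# placeholder DNS server address, replace with the real one
nameserver 192.0.2.53
options single-request
```

Running the getent test again afterwards is a quick way to check that the delay is gone.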