[Solved] intermittent DNS lookup failures in Debian 12 on AWS

Linux Kernel, Network, and Services configuration.
andyholtmacc
Posts: 5
Joined: 2024-04-04 15:24

[Solved] intermittent DNS lookup failures in Debian 12 on AWS

#1 Post by andyholtmacc »

Hello all. First post time. Apologies in advance if this question seems a bit vague, but I'm not sure where to look for evidence of this issue.

This is initially a request to see if anyone's seeing anything similar!

Circumstances are:
- We run workloads in AWS, historically on a weird mix of Amazon Linux, CentOS and Ubuntu.
- I want to gradually migrate to using Debian, using official Debian AMIs, as we retire servers and introduce new services.
- I have a few servers now running Debian 12 x86_64, I thought with no issues - but ...
- A service I've just migrated from an Ubuntu 16.04 box, a Python app, occasionally reports DNS lookup failures when trying to establish a new PostgreSQL connection to the endpoint of an AWS RDS database.
- The app is Indico (see https://getindico.io/), so not one we write ourselves. It uses psycopg2 as the database client library.
- The errors look like this:

Code:

(psycopg2.OperationalError) could not translate host name "indico.cluster-example.eu-west-2.rds.amazonaws.com" to address: Name or service not known
- By and large, the app is working fine, so it is clearly usually able to talk to the database, but we get a few tens of these errors every day in the app's log files.
- We never saw this with the old Indico software on the Ubuntu server.
- This prompted me to check our test Debian 12 box, also running a test version of this Indico service - and lo and behold, it has also had occurrences of this DNS error - but hardly any, since the test service isn't really used.
- All our servers use the AWS VPC DNS resolver.
- As far as I can tell, we've not seen any DNS failures of any kind on any of our other servers, many of which talk to databases, only these two new Debian 12 boxes.

I have been discussing this over on the Indico forums, and the feeling is that it is not likely to be some new unknown issue in Indico or psycopg2 - I agree with this.

So, currently, I have a DNS service (AWS's VPC DNS resolver) that I expect to be totally reliable, being used to look up a name (the AWS RDS endpoint) that I would expect to always return an answer, by an OS (Debian 12) and a DB client library (psycopg2) that I would be amazed to find had unknown new bugs in the area of DNS lookups.

My Debian 12 servers have /etc/resolv.conf as a link to /run/systemd/resolve/resolv.conf, as expected, and it contains something like:

Code:

# This is /run/systemd/resolve/resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known uplink DNS servers. This file lists all configured search domains.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 10.211.32.2
search .
This looks normal. The boxes use DHCP, as is usual in AWS.

Can anyone suggest anything I can do to attempt to diagnose this?

Is there any more information I could provide or check, which might help?
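A read-only sketch for collecting the basics that decide how glibc resolves a hostname on such a box: the sources on the `hosts:` line of /etc/nsswitch.conf, where /etc/resolv.conf points, and whether systemd-resolved is running. It assumes standard Debian paths; `hosts_sources` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Read-only snapshot of the local settings that decide how glibc resolves
# a hostname. Standard Debian paths are assumed.

# Print the sources listed on the "hosts:" line of an nsswitch.conf file.
hosts_sources() {
    awk '$1 == "hosts:" { $1 = ""; sub(/^ +/, ""); print }' "${1:-/etc/nsswitch.conf}"
}

echo "nsswitch.conf hosts: $(hosts_sources /etc/nsswitch.conf)"

if [ -L /etc/resolv.conf ]; then
    echo "/etc/resolv.conf is a link to $(readlink /etc/resolv.conf)"
elif [ -e /etc/resolv.conf ]; then
    echo "/etc/resolv.conf is a regular file"
else
    echo "/etc/resolv.conf is missing"
fi
# Show only the directives that matter for lookups (comments skipped).
grep -E '^(nameserver|search|options)' /etc/resolv.conf 2>/dev/null

# Empty output (e.g. no systemctl on this host) is reported as "unknown".
state=$(systemctl is-active systemd-resolved 2>/dev/null)
echo "systemd-resolved: ${state:-unknown}"
```

Posting this output alongside the error messages makes it easier for others to see whether systemd-resolved sits in the lookup path.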

Many thanks if you read this far! 8) Andy
Last edited by andyholtmacc on 2024-04-10 12:11, edited 3 times in total.

Aki
Global Moderator
Posts: 3082
Joined: 2014-07-20 18:12
Location: Europe

Re: intermittent DNS lookup failures in Debian 12 on AWS

#2 Post by Aki »

Hello,

According to your previous message, your Debian 12 installation uses systemd-resolved as DNS client (network name resolution to local applications).

I don't know if it is the default configuration used by AWS Debian images, but, as far as I know, this is not the default Debian 12 configuration.

A quick search led me to a previous thread in the forum: and to systemd issues:
Hope this helps.
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org
⠈⠳⣄⠀

flaviovs
Posts: 1
Joined: 2024-04-05 21:59

Re: intermittent DNS lookup failures in Debian 12 on AWS

#3 Post by flaviovs »

Aki wrote: 2024-04-05 16:11 According to your previous message, your Debian 12 installation uses systemd-resolved as DNS client (network name resolution to local applications).

I don't know if it is the default configuration used by AWS Debian images, but, as far as I know, this is not the default Debian 12 configuration.
(...)
Indeed, as of today, Bookworm images on AWS still use systemd-resolved, even though it's not the default resolver in Debian. There's already a bug report about that.

andyholtmacc
Posts: 5
Joined: 2024-04-04 15:24

Re: intermittent DNS lookup failures in Debian 12 on AWS

#4 Post by andyholtmacc »

Thanks for the responses, both. Not sure how I didn't find that thread [1] when I searched this category before posting; I think I just searched for DNS in the title, doh! Anyway, it matches what I see, and the workaround, for me, of simply purging systemd-resolved makes the issue go away. [Edit: DO NOT USE THIS WORKAROUND. SEE MY SUBSEQUENT POST BELOW.]

Code:

apt purge --auto-remove systemd-resolved
This package certainly seems to be installed and enabled by default in the official Debian AWS EC2 AMIs, but doesn't seem to be essential.

The symptoms seem to be limited to resolving CNAMEs, or at least I only see the issue with CNAMEs and not with A records. Also, forcing IPv4 (I tested using nc and then nc -4) stops the issue from happening.

[1] viewtopic.php?t=157142
Last edited by andyholtmacc on 2024-04-11 07:51, edited 1 time in total.

Aki
Global Moderator
Posts: 3082
Joined: 2014-07-20 18:12
Location: Europe

Re: intermittent DNS lookup failures in Debian 12 on AWS

#5 Post by Aki »

Hello,
andyholtmacc wrote: 2024-04-09 09:12 [..] simply purging systemd-resolved, makes the issue go away [..]
I'm glad you sorted it out. :)

Please mark the discussion as "solved" by manually adding the text tag "[Solved]" at the beginning of the subject of the first message, i.e.:
[Solved] Intermittent DNS lookup failures in Debian 12 on AWS
Happy Debian!

andyholtmacc
Posts: 5
Joined: 2024-04-04 15:24

Re: intermittent DNS lookup failures in Debian 12 on AWS

#6 Post by andyholtmacc »

Not quite solved, and it looks like I was too hasty in simply purging systemd-resolved. After a reboot, that leaves me with no working /etc/resolv.conf: it remains a link to /run/systemd/resolve/resolv.conf, but that file is gone. So, I can easily create a new /etc/resolv.conf by hand, or reinstall systemd-resolved but disable it. Basically, clean up the mess I left. :-)
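A minimal sketch of the first clean-up option, replacing the dangling link with a static file. The helper `write_static_resolv_conf` and the script name are hypothetical, and the nameserver is simply the VPC resolver address from the resolv.conf in the first post. A static file will not follow DHCP changes, so treat it as a stopgap rather than a proper configuration.

```shell
#!/bin/sh
# Hypothetical helper: replace /etc/resolv.conf (e.g. a dangling link left
# behind by purging systemd-resolved) with a static one-line file.
# Needs root when pointed at /etc/resolv.conf.
write_static_resolv_conf() {
    path=$1
    nameserver=$2
    tmp="$path.tmp.$$"
    printf 'nameserver %s\n' "$nameserver" > "$tmp" || return 1
    # mv renames over the old directory entry, so a symlink is replaced
    # rather than written through.
    mv -f "$tmp" "$path"
}

# Example (as root), using the VPC resolver address from the first post:
#   sh fix-resolv.sh /etc/resolv.conf 10.211.32.2
if [ "$#" -eq 2 ]; then
    write_static_resolv_conf "$1" "$2"
fi
```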

However, I would like to understand more about how Debian 12 is supposed to be configured for DNS name resolution if systemd-resolved isn't in use. I am aware of the information here [1], which mentions that systemd-resolved is now a separate package and isn't installed by default. So, OK, on a clean install without systemd-resolved, how does DNS resolution work, and which package sets it up? Using dpkg, I can't see that /etc/resolv.conf is claimed by any package.

I also note that when I re-install systemd-resolved, it restores /etc/resolv.conf as a new link, but now it's to /run/systemd/resolve/stub-resolv.conf instead of to /run/systemd/resolve/resolv.conf, as it was when I first started investigating this. Would love to know how this is chosen or configured.

[1] https://www.debian.org/releases/bookwor ... d-resolved

andyholtmacc
Posts: 5
Joined: 2024-04-04 15:24

Re: intermittent DNS lookup failures in Debian 12 on AWS

#7 Post by andyholtmacc »

OK this seems to be a rabbit hole I wish I wasn't in.

Having /etc/resolv.conf as a link to /run/systemd/resolve/stub-resolv.conf seems to be the recommended setup, and is indicated in the output of resolvectl status by:

Code:

resolv.conf mode: stub
However, when starting with a clean Debian 12 EC2 from the official AMI, we get the link /etc/resolv.conf pointing at /run/systemd/resolve/resolv.conf, and this is indicated in the output of resolvectl status by:

Code:

resolv.conf mode: uplink
If I then remove and then re-install systemd-resolved, the link /etc/resolv.conf is set to point at /run/systemd/resolve/stub-resolv.conf and we're in stub mode.

The mode reported by resolvectl is apparently inferred from whether /etc/resolv.conf is a file or a link, and where the link points. It would be nice to know how that mode was chosen for us.
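To make the file-versus-link rule concrete, here is a rough approximation of how /etc/resolv.conf maps onto the mode names resolvectl prints. The mapping of link targets to `stub`, `uplink` and `static` appears to follow the /etc/resolv.conf section of the systemd-resolved.service(8) man page; systemd-resolved's own check may differ in detail, and `resolv_conf_mode` is a hypothetical helper, not part of systemd.

```shell
#!/bin/sh
# Rough approximation (not systemd's actual code) of the "resolv.conf mode"
# that `resolvectl status` reports, based only on what /etc/resolv.conf is.
resolv_conf_mode() {
    f=${1:-/etc/resolv.conf}
    # -e follows symlinks, so a dangling link (target gone) counts as missing.
    if [ ! -e "$f" ]; then
        echo missing
        return
    fi
    if [ -L "$f" ]; then
        # readlink prints the raw target, which may be relative (../run/...).
        case "$(readlink "$f")" in
            */run/systemd/resolve/stub-resolv.conf) echo stub; return ;;
            */run/systemd/resolve/resolv.conf) echo uplink; return ;;
            */usr/lib/systemd/resolv.conf) echo static; return ;;
        esac
    fi
    # A regular file, or a link elsewhere: managed by something else.
    echo foreign
}

echo "approximate resolv.conf mode: $(resolv_conf_mode /etc/resolv.conf)"
```

By this rule, the purge-and-reboot state described earlier (a link whose target no longer exists) would count as `missing`.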

In any case, stub mode is just as bad as uplink mode when it comes to the CNAME resolution issue. I have now gone down the route of keeping systemd-resolved installed, but editing /etc/nsswitch.conf by hand to remove resolve [!UNAVAIL=return], which leaves the hosts line below. The change seems to take effect as soon as the file is modified:

Code:

hosts:          files dns myhostname
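The same hand edit can be scripted. Below is a sketch using sed that removes `resolve [!UNAVAIL=return]` from the `hosts:` line only and keeps a `.bak` copy of the original. It assumes the line previously read `hosts: files resolve [!UNAVAIL=return] dns myhostname`, which is inferred from the result shown above. `remove_nss_resolve` is a hypothetical name, and the author later pointed to a different, "proper" workaround on the systemd issue tracker.

```shell
#!/bin/sh
# Sketch: drop the nss-resolve entry from the hosts: line of nsswitch.conf,
# leaving every other line untouched. sed -i.bak keeps FILE.bak as a backup.
remove_nss_resolve() {
    file=${1:-/etc/nsswitch.conf}
    sed -i.bak '/^hosts:/ s/[[:space:]]*resolve[[:space:]]*\[!UNAVAIL=return]//' "$file"
}

# Only edit a file when one is named explicitly (root needed for /etc).
if [ "$#" -ge 1 ]; then
    remove_nss_resolve "$1"
fi
```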

andyholtmacc
Posts: 5
Joined: 2024-04-04 15:24

Re: [Solved] intermittent DNS lookup failures in Debian 12 on AWS

#8 Post by andyholtmacc »

Maybe my last post on the subject. Over on the GitHub bug https://github.com/systemd/systemd/issues/29069, I've been informed of the proper workaround.

Aki
Global Moderator
Posts: 3082
Joined: 2014-07-20 18:12
Location: Europe

Re: [Solved] intermittent DNS lookup failures in Debian 12 on AWS

#9 Post by Aki »

andyholtmacc wrote: 2024-04-11 08:12 Maybe my last post on the subject. Over on the GitHub bug https://github.com/systemd/systemd/issues/29069, I've been informed of the proper workaround.
Thanks for reporting back.
