The @researchfairy noticed [1] that something's wrong with PubMed so I did a little investigating with the help of my favourite command line tools, host(1), traceroute(1) and RIPE's BGPlay tool.

The hostname for pubmed is pubmed.ncbi.nlm.nih.gov. The DNS zone is ncbi.nlm.nih.gov.

DNS zones have serial numbers. That's how secondary nameservers figure out that something has changed and that they should fetch a new copy of the zone to serve. By convention, the serial number is a date followed by a sequence number.

$ host -t soa ncbi.nlm.nih.gov
ncbi.nlm.nih.gov has SOA record dns1-ncbi.ncbi.nlm.nih.gov. systems.ncbi.nlm.nih.gov. 2025022701 10800 5400 2419200 82800

The serial, 2025022701, suggests that the zone was last changed on 27 February 2025, a few days ago. So it's not a DNS change that led to this problem.
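
As a rough sketch of that convention, the shell function below splits a ten-digit serial into its date and sequence parts. The name decode_serial is made up for illustration, and it assumes the zone really does follow the YYYYMMDDnn convention, which nothing in the DNS enforces.

```sh
# Split a YYYYMMDDnn zone serial into a date and a sequence number.
# decode_serial is a hypothetical helper, not part of host(1).
decode_serial() {
    serial=$1
    case $serial in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "not a YYYYMMDDnn serial: $serial" >&2; return 1 ;;
    esac
    date_part=${serial%??}      # first eight digits: YYYYMMDD
    seq=${serial#????????}      # last two digits: sequence number
    year=${date_part%????}
    rest=${date_part#????}      # MMDD
    month=${rest%??}
    day=${rest#??}
    echo "$year-$month-$day change $seq"
}

# Example (needs network access); the serial is the 7th field of host's output:
# decode_serial "$(host -t soa ncbi.nlm.nih.gov | awk '{print $7}')"
```

Calling decode_serial 2025022701 prints 2025-02-27 change 01.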

That zone has seven nameservers. Rather a lot, but not unusual for an old government system,

$ host -t ns ncbi.nlm.nih.gov
ncbi.nlm.nih.gov name server ns.nih.gov.
ncbi.nlm.nih.gov name server ns2.nih.gov.
ncbi.nlm.nih.gov name server ns3.nih.gov.
ncbi.nlm.nih.gov name server lhcns1.nlm.nih.gov.
ncbi.nlm.nih.gov name server lhcns2.nlm.nih.gov.
ncbi.nlm.nih.gov name server dns1-ncbi.ncbi.nlm.nih.gov.
ncbi.nlm.nih.gov name server dns2-ncbi.ncbi.nlm.nih.gov.

Asking these nameservers directly for the address of pubmed, we find that the ones ending with nlm.nih.gov work fine,

$ host -4 -t a pubmed.ncbi.nlm.nih.gov lhcns1.nlm.nih.gov.
pubmed.ncbi.nlm.nih.gov has address 34.107.134.59

but asking any of the first three does not work:

$ host -4 -t a pubmed.ncbi.nlm.nih.gov ns.nih.gov.
;; communications error to 128.231.128.251#53: timed out
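
Rather than asking each nameserver by hand, a small loop can do it. This is only a sketch: check_nameservers is a made-up helper, and it assumes that host(1) exits with a non-zero status when it gets no answer and that your version of host supports -W to set the timeout in seconds.

```sh
# Ask each listed nameserver for the A record of a name and report
# which ones answer. check_nameservers is a hypothetical helper.
check_nameservers() {
    name=$1
    shift
    for ns in "$@"; do
        # -W 3 waits up to three seconds for a reply (assumes BIND's host).
        if host -4 -W 3 -t a "$name" "$ns" >/dev/null 2>&1; then
            echo "$ns: ok"
        else
            echo "$ns: FAILED"
        fi
    done
}

# Example (needs network access):
# check_nameservers pubmed.ncbi.nlm.nih.gov ns.nih.gov. ns2.nih.gov. ns3.nih.gov. lhcns1.nlm.nih.gov.
```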

What is wrong with the NIH nameservers?

To be continued...

[1] https://scholar.social/@researchfairy/114089685773663683

So, three of the seven nameservers for the ncbi.nlm.nih.gov zone are broken. If you try to look at the web site, you stand roughly a 3/7 chance of encountering something that is broken.

The nameservers have the following addresses:

$ host ns.nih.gov
ns.nih.gov has address 128.231.128.251
$ host ns2.nih.gov
ns2.nih.gov has address 128.231.64.1
$ host ns3.nih.gov
ns3.nih.gov has address 165.112.4.230

We can see that the addresses for the first two both start with 128.231, so we might guess that they are relatively near each other. This can be confirmed using traceroute. Go ahead and open a terminal and try it out!

$ traceroute 128.231.128.251
$ traceroute 128.231.64.1

For me, the paths (the sequences of routers between me and those addresses) look broadly the same.

The other one is different,

$ traceroute 165.112.4.230

it goes a totally different way. So whatever problem is happening is not limited to a single site or datacentre.
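
To compare paths less by eye, you can save the output of traceroute -n to files and list the hops they share. This is a sketch, assuming the usual traceroute -n layout where each hop line starts with the hop number followed by an IPv4 address. The helper common_hops and the file names are made up for illustration.

```sh
# Print the router addresses from the second trace that also appear
# in the first. common_hops is a hypothetical helper.
common_hops() {
    awk '
        # Keep only hop lines: a hop number, then a dotted IPv4 address.
        # Header lines and unanswered hops (* * *) are skipped.
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+(\.[0-9]+)+$/ {
            if (FILENAME == ARGV[1]) seen[$2] = 1
            else if ($2 in seen) print $2
        }
    ' "$1" "$2"
}

# Example (needs network access):
# traceroute -n 128.231.128.251 > ns1.trace
# traceroute -n 128.231.64.1    > ns2.trace
# common_hops ns1.trace ns2.trace
```

Two paths that share most of their hops probably end up in the same place. Two that share only your own first few routers probably do not.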

Now we can turn to RIPE for some help.

Let us inspect the last address, https://stat.ripe.net/widget/bgplay#resource=165.112.4.230

This is very peculiar. Notice how that network is announced from two different places. And there seems to be a partition: they are not (visibly) connected to each other! This is not normal. I attach a screenshot. There is also some volatility, shown as path changes, the yellow bars at the bottom.

Looking at ns.nih.gov, https://stat.ripe.net/widget/bgplay#resource=128.231.128.251

this appears more coherent, but volatile. The network was withdrawn about 45 minutes before I took the second screenshot, about an hour ago, and then re-announced.

RIPE's BGPlay is very nice: you can time travel and replay this incident as observed from the Internet. It takes a bit of background knowledge to decode what is going on though.

Someone is doing networking... Badly...

@researchfairy

@ww @researchfairy When you want to run a resilient DNS service over a wide geographic area that also performs well, you can announce routes for the same network from different regions, and only one will answer. This is called anycast and is how all the big operators do it (Google, Cloudflare, Quad9, etc.) and how their 1 or 2 IPs can serve the world so efficiently.
1/2

@ww @researchfairy

Not saying that’s what’s going on here, but it’s a possibility. I doubt young DOGE SWEs have ever heard of anycast and it’s super easy to fuck it up.