The @researchfairy noticed [1] that something's wrong with PubMed, so I did a little investigating with the help of my favourite command line tools, host(1) and traceroute(1), and RIPE's BGPlay tool.

The hostname for pubmed is pubmed.ncbi.nlm.nih.gov. The DNS zone is ncbi.nlm.nih.gov.

DNS zones have serial numbers. That's how secondary nameservers figure out whether something has changed and they should fetch a new copy of the zone to serve. By convention, the serial number is a date followed by a sequence number.

$ host -t soa ncbi.nlm.nih.gov
ncbi.nlm.nih.gov has SOA record dns1-ncbi.ncbi.nlm.nih.gov. systems.ncbi.nlm.nih.gov. 2025022701 10800 5400 2419200 82800

The serial, 2025022701, suggests that the zone was last changed on 27 February, a few days ago. So it's not a DNS change that led to this problem.
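
To make the date convention concrete, here is a minimal Python sketch that splits a serial like the one above into its date and sequence parts. It assumes the zone follows the YYYYMMDDnn convention, which is common but not required (a serial is really just a number), and the function name decode_serial is made up for illustration.

```python
from datetime import date

def decode_serial(serial):
    """Split a YYYYMMDDnn zone serial into (date, sequence number).

    Assumes the date-based convention; raises ValueError if the serial
    is not ten digits long or the leading digits are not a valid date.
    """
    text = str(serial)
    if len(text) != 10:
        raise ValueError(f"not a YYYYMMDDnn serial: {serial}")
    day = date(int(text[0:4]), int(text[4:6]), int(text[6:8]))
    return day, int(text[8:10])

changed, sequence = decode_serial(2025022701)
print(changed.isoformat(), sequence)  # 2025-02-27 1
```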

That zone has seven nameservers. Rather a lot, but not unusual for an old government system,

$ host -t ns ncbi.nlm.nih.gov
ncbi.nlm.nih.gov name server ns.nih.gov.
ncbi.nlm.nih.gov name server ns2.nih.gov.
ncbi.nlm.nih.gov name server ns3.nih.gov.
ncbi.nlm.nih.gov name server lhcns1.nlm.nih.gov.
ncbi.nlm.nih.gov name server lhcns2.nlm.nih.gov.
ncbi.nlm.nih.gov name server dns1-ncbi.ncbi.nlm.nih.gov.
ncbi.nlm.nih.gov name server dns2-ncbi.ncbi.nlm.nih.gov.

Asking these nameservers directly for the address of pubmed, we find that the ones ending with nlm.nih.gov work fine,

$ host -4 -t a pubmed.ncbi.nlm.nih.gov lhcns1.nlm.nih.gov.
pubmed.ncbi.nlm.nih.gov has address 34.107.134.59

but asking any of the first three does not work:

$ host -4 -t a pubmed.ncbi.nlm.nih.gov ns.nih.gov.
;; communications error to 128.231.128.251#53: timed out

What is wrong with the NIH nameservers?

To be continued...

[1] https://scholar.social/@researchfairy/114089685773663683

So, three of the seven nameservers for ncbi.nlm.nih.gov, the zone that pubmed.ncbi.nlm.nih.gov lives in, are broken. If you try to look at the web site, you stand a 3/7 chance of your lookup landing on a nameserver that is broken.
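
As a back-of-the-envelope check on that 3/7 figure: it is the chance that one query lands on a broken server. Resolvers usually try another server after a timeout, so the sketch below also works out the chance that several tries in a row all fail. It assumes a resolver that picks uniformly at random among the seven and never repeats a server, which is a simplification; real resolvers often favour servers that answered quickly before.

```python
from fractions import Fraction

def all_tries_fail(total, broken, tries):
    """Chance that `tries` distinct servers, picked at random, are all broken."""
    chance = Fraction(1)
    for i in range(tries):
        # One fewer broken server and one fewer server overall on each retry.
        chance *= Fraction(broken - i, total - i)
    return chance

for tries in (1, 2, 3):
    print(tries, all_tries_fail(7, 3, tries))
# 1 3/7
# 2 1/7
# 3 1/35
```

Under those assumptions a lookup rarely fails outright, but each broken server it hits costs a timeout, which would feel like a slow or flaky site.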

The nameservers have the following addresses:

$ host ns.nih.gov
ns.nih.gov has address 128.231.128.251
$ host ns2.nih.gov
ns2.nih.gov has address 128.231.64.1
$ host ns3.nih.gov
ns3.nih.gov has address 165.112.4.230

We can see that the addresses for the first two both start with 128.231 and might guess that they are relatively close to each other. This can be confirmed using traceroute. Go ahead and open a terminal and try it out!

traceroute 128.231.128.251
traceroute 128.231.64.1

For me, the paths (the sequences of routers between me and those addresses) look broadly the same.

The other one is different,

traceroute 165.112.4.230

it goes a totally different way. So whatever problem is happening is not limited to a single site or datacentre.
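
The "starts with 128.231" grouping can also be checked mechanically. This is only a sketch using Python's ipaddress module: the /16 below is my assumption standing in for "the first two numbers match", and the prefix actually announced in BGP may be a different size. Sharing a prefix says nothing definite about physical location, which is why traceroute is the real test.

```python
import ipaddress

# Addresses of the three NIH nameservers, as returned by host(1) above.
nameservers = {
    "ns.nih.gov": "128.231.128.251",
    "ns2.nih.gov": "128.231.64.1",
    "ns3.nih.gov": "165.112.4.230",
}

# Assumption: a /16 stands in for "starts with 128.231".
block = ipaddress.ip_network("128.231.0.0/16")

inside = {name: ipaddress.ip_address(addr) in block
          for name, addr in nameservers.items()}
for name, result in inside.items():
    print(name, result)
# ns.nih.gov True
# ns2.nih.gov True
# ns3.nih.gov False
```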

Now we can turn to RIPE for some help.

Let us inspect the last address, https://stat.ripe.net/widget/bgplay#resource=165.112.4.230

This is very peculiar. Notice how that network is announced from two different places. And there seems to be a partition: they are not (visibly) connected to each other! This is not normal. I attach a screenshot. There is also some volatility, shown as path changes, the yellow bars at the bottom.

Looking at ns.nih.gov, https://stat.ripe.net/widget/bgplay#resource=128.231.128.251

this appears more coherent, but volatile. The network was withdrawn about 45 minutes before I took the second screenshot, about an hour ago. And then re-announced.

RIPE's BGPlay is very nice: you can time travel and replay this incident as observed from the Internet. It takes a bit of background knowledge to decode what is going on, though.

Someone is doing networking... Badly...

@researchfairy

As of this morning, maybe late yesterday depending on your timezone, PubMed should work again, and so should cancer.gov.

$ host -t soa ncbi.nlm.nih.gov
ncbi.nlm.nih.gov has SOA record dns1-ncbi.ncbi.nlm.nih.gov. systems.ncbi.nlm.nih.gov. 2025030201 32400 5400 2419200 82800
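
To see exactly what differs from the SOA record at the start of the thread, here is a small Python sketch that parses both lines of host(1) output and compares them field by field. The field names are the standard SOA ones; the parse_soa helper is hypothetical and only understands the "has SOA record" line format shown here.

```python
SOA_FIELDS = ("mname", "rname", "serial", "refresh", "retry", "expire", "minimum")

def parse_soa(line):
    """Parse '<zone> has SOA record <mname> <rname> <five numbers>'."""
    values = line.split(" has SOA record ", 1)[1].split()
    record = dict(zip(SOA_FIELDS, values))
    for key in SOA_FIELDS[2:]:
        record[key] = int(record[key])  # serial and the four timers
    return record

before = parse_soa("ncbi.nlm.nih.gov has SOA record dns1-ncbi.ncbi.nlm.nih.gov. "
                   "systems.ncbi.nlm.nih.gov. 2025022701 10800 5400 2419200 82800")
after = parse_soa("ncbi.nlm.nih.gov has SOA record dns1-ncbi.ncbi.nlm.nih.gov. "
                  "systems.ncbi.nlm.nih.gov. 2025030201 32400 5400 2419200 82800")

changed = {key: (before[key], after[key])
           for key in SOA_FIELDS if before[key] != after[key]}
print(changed)
# {'serial': (2025022701, 2025030201), 'refresh': (10800, 32400)}
```

So the serial now carries the date 2 March, and the refresh timer went from 3 hours to 9 hours.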

The domain ncbi.nlm.nih.gov is now handled only by the NLM nameservers without the top level NIH ones in the mix:

$ host -t ns ncbi.nlm.nih.gov
ncbi.nlm.nih.gov name server lhcns1.nlm.nih.gov.
ncbi.nlm.nih.gov name server lhcns2.nlm.nih.gov.
ncbi.nlm.nih.gov name server dns1-ncbi.ncbi.nlm.nih.gov.
ncbi.nlm.nih.gov name server dns2-ncbi.ncbi.nlm.nih.gov.
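
For the record, a quick Python sketch of the difference between the two NS answers in this thread. The names are copied from the host(1) output; nothing here queries the network.

```python
# NS set for ncbi.nlm.nih.gov during the incident.
before = {
    "ns.nih.gov.", "ns2.nih.gov.", "ns3.nih.gov.",
    "lhcns1.nlm.nih.gov.", "lhcns2.nlm.nih.gov.",
    "dns1-ncbi.ncbi.nlm.nih.gov.", "dns2-ncbi.ncbi.nlm.nih.gov.",
}
# NS set after the fix.
after = {
    "lhcns1.nlm.nih.gov.", "lhcns2.nlm.nih.gov.",
    "dns1-ncbi.ncbi.nlm.nih.gov.", "dns2-ncbi.ncbi.nlm.nih.gov.",
}

removed = sorted(before - after)
added = sorted(after - before)
print(removed)  # ['ns.nih.gov.', 'ns2.nih.gov.', 'ns3.nih.gov.']
print(added)    # []
```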

Interestingly, cancer.gov nameservers are the NIH ones,

$ host -t ns cancer.gov
cancer.gov name server ns.nih.gov.
cancer.gov name server ns2.nih.gov.
cancer.gov name server ns3.nih.gov.

and it seems to work just fine now.

So, on Saturday, someone broke part of the NIH network, which took down the NIH nameservers, apparently unintentionally given that they fixed it by late Sunday.

This just looks like a fairly high profile configuration error that took a while to fix. What they were actually trying to do we do not know...

Big Balls' Fat Fingers? Who knows?

@researchfairy @briankrebs

@ww @researchfairy @briankrebs is it notable that the IP address for TLD NIH, for example, is different now to the previous address before the borking?
@aleciabatson @researchfairy @briankrebs is it? i didn't take a note of the ip address for nih.gov itself (e.g. their web server) but it seems to be within NIH's own network... or maybe I misunderstand the question?
@ww @researchfairy @briankrebs I had looked up nih at the top of the downtime and the IP then was at a 23.41.x.x address. Once access resumed, it’s at 104.100.168.147.

@aleciabatson @researchfairy @briankrebs that's odd. If I ask Google's public resolver, I see:

$ host nih.gov 8.8.8.8
nih.gov has address 156.40.212.210

the addresses you mention belong to Akamai, a content delivery network.

It is possible they are giving different answers inside and outside the country. But if I use a VPN to appear to be in the USA, I get the same answer, so I can't reproduce what you are seeing.

It is also possible your ISP has some sort of caching arrangement where they intercept your traffic.

Which nameserver are you using?

@ww @researchfairy @briankrebs a Whois search indicates NS.NIH.GOV (97 domains), NS2.NIH.GOV (also 97 domains), and NS3.NIH.GOV (97 domains) are now registered at 104.100.168.147.

@aleciabatson

I do not understand what you are looking at. Those nameservers do not have Akamai addresses, e.g.

$ whois -h whois.nic.gov ns.nih.gov
Server Name: ns.nih.gov
IP Address: 128.231.128.251
Registrar: get.gov
Registrar WHOIS Server: whois.nic.gov
Registrar URL: https://get.gov
>>> Last update of WHOIS database: 2025-03-04T11:15:36Z <<<

It is possible that some of the domains that they are responsible for are served by Akamai, but none of the important ones that we have looked at.

Could you give some examples from these 97 domains that you think have to do with that 104.100.168.147 address?

@ww @aleciabatson Sorry about resurrecting an old thread, but do you think this could be related to the absolute torrent of abuse journals and society websites are getting from overly aggressive AI scrapers? Maybe someone was trying to add a caching layer & messed it up?
@williamgunn @ww I suspect what occurred in this case is registrations were centralized/moved, thereby causing downtime during DNS repointing and the change in IP addresses.