"Commercial services exist that sell you zone data, but it seems to me that this data ought to be public, so I excluded ccTLDs from my analysis for the time being."
It is worth considering that if DNS data is public it can easily be mirrored, e.g., users can maintain their own copies. Using locally stored DNS data is a rather easy way to bypass the "centralisation" discussed by the author. I have been doing this for over 20 years.
Apparently some folks, like the people running ccTLD registries, believe that allowing the public to know IP addresses for large numbers of domains is a "security" issue. But it is not just the registries. For example, another source of DNS data is public scans such as the ones at opendata.rapid7.com. Recently, opendata.rapid7.com decided that this data should no longer be public.
It reminds me of telephone books and "unlisted" numbers. DNS seems to have no equivalent.
"But if you look closer, you'll notice that many of the nameservers are in the same domain, so if we then flatten the whole thing, we see a bit more of a centralization. "
Before I started using a localhost-bound forward proxy instead of DNS to map domain to IP for applications, I used to run localhost-bound authoritative nameservers.1
By putting A record data into a local zone file and querying that, non-recursively, instead of using recursive DNS and a cache, I could eliminate many unnecessary lookups.
To gather the A record data I used non-recursive queries, not caches. (Which BTW is an excellent way to discover overly complex and brittle DNS configurations.)
One project I was working on was putting all domains using popular registrars into the same zones. For example, imagine example-domain.com uses ns.registrar-nameserver.com.2 Why query 192.5.6.30, a .com nameserver IP, to discover the nameserver for example-domain.com if I already have a list of every domain that uses ns.registrar-nameserver.com? I can put this list of domains into a local zone. Then I can just query ns.registrar-nameserver.com directly and skip remote access to 192.5.6.30 port 53. I found that the number of domains changing nameservers frequently was relatively small, and was not enough to affect the utility of all the static DNS data.
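The grouping step described above might look something like the following. This is only a minimal sketch, not the setup actually used: the sample records, the function names `group_by_nameserver` and `nameserver_for`, and the simplified "owner NS nameserver" line format are all assumptions made for illustration. Real zone files carry TTLs, classes and relative names that this does not handle.

```python
from collections import defaultdict

# Hypothetical sample of NS records in a simplified "<owner> NS <nameserver>" form.
SAMPLE_NS_RECORDS = """\
example-domain.com. NS ns.registrar-nameserver.com.
another-domain.com. NS ns.registrar-nameserver.com.
third-domain.com. NS ns.other-registrar.net.
"""


def group_by_nameserver(text):
    """Map each nameserver to the sorted list of domains delegated to it."""
    groups = defaultdict(set)
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, and anything that is not a plain NS record.
        if len(fields) != 3 or fields[0].startswith(";") or fields[1].upper() != "NS":
            continue
        owner, _, nameserver = fields
        groups[nameserver.rstrip(".").lower()].add(owner.rstrip(".").lower())
    return {ns: sorted(domains) for ns, domains in groups.items()}


def nameserver_for(domain, groups):
    """Return the first known nameserver for a domain from local data, or None."""
    wanted = domain.rstrip(".").lower()
    for ns, domains in groups.items():
        if wanted in domains:
            return ns
    return None


groups = group_by_nameserver(SAMPLE_NS_RECORDS)
print(nameserver_for("example-domain.com", groups))
```

With a table like this held locally, the nameserver for a listed domain is known without asking the .com servers; only the query to the registrar's nameserver has to leave the machine.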
1. I still do, e.g., I still run a local custom root.zone but I do not put much A record data into it. Instead I use DNS wildcards, as an alternative to using the firewall, to direct HTTP traffic to the proxy.
2. At the time, domaincontrol.com nameservers were #1. But it is alarming to see googledomains.com servers at #2. Google did not even exist as a registrar back then, before new gTLDs and cloud computing hype. That said, it is wise to look at the actual domains rather than just count them. If an ISP or CDN has a domain for every IP address it controls, where the IP address is contained in the domain name, that can skew the numbers considerably.
"Take out Verisign, and the internet's going to have a bad day."
I keep a local copy of com.zone. I can remember when it was a pain just to find the HDD storage for it. With today's computers, it can fit easily on a "phone" and even into RAM. IMHO, this is the most important data on the internet. It is nice to know that network operators could block port 53 and I could still use a very large portion of the www. We already have filtered DNS in certain locations, e.g., where all traffic to port 53 is redirected. I have used internet access with filtered DNS extensively. Using locally stored DNS data is at least a way to obviate DNS filtering. I have also used it when www sites have DNS problems.
> Apparently some folks, like the people running ccTLD registries, believe that allowing the public to know IP addresses for large numbers of domains is a "security" issue.
There's some merit to this.
Doing the reverse lookup, i.e., finding all other domains that resolve to the same IP (or same subnet), can leak a lot of information. For example: what other businesses are run by the same entity, and where the test/staging/admin infrastructure lives.
It can also be another way to attack a specific HTTP server, since different names on the same IP can be routed to different applications or even different internal servers.
Actually relying on this is silly security-by-obscurity, but there's really no upside to publishing a detailed map.
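To make the reverse lookup concrete, here is a rough sketch of how bulk A record data could be turned into an index by IP or subnet. The record list, the names in it, and the function `index_by_subnet` are hypothetical; the addresses come from the ranges reserved for documentation.

```python
import ipaddress
from collections import defaultdict

# Hypothetical A record data; names are made up, IPs are documentation ranges.
A_RECORDS = [
    ("shop.example", "192.0.2.10"),
    ("staging.shop.example", "192.0.2.10"),
    ("other-business.example", "192.0.2.77"),
    ("unrelated.example", "198.51.100.5"),
]


def index_by_subnet(records, prefix_len=24):
    """Group hostnames by the IPv4 network (of the given prefix length) they resolve into."""
    groups = defaultdict(list)
    for name, addr in records:
        # strict=False lets us pass a host address and get its enclosing network.
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        groups[str(net)].append(name)
    return dict(groups)


by_subnet = index_by_subnet(A_RECORDS)       # same /24
by_ip = index_by_subnet(A_RECORDS, 32)       # exactly the same address
print(by_subnet["192.0.2.0/24"])
print(by_ip["192.0.2.10/32"])
```

Passing a prefix length of 32 gives the "same IP" view, and shorter prefixes give the "same subnet" view. Anyone holding a full copy of the zone data can build this in a single pass.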
I should add that one motivation for the project was the possibility of encrypted DNS, made possible at the time by DNSCurve and CurveDNS. The idea is that a large registrar could run CurveDNS forwarders in front of its authoritative servers and the user could then have encrypted DNS packets for many, many domains by connecting directly to the registrar's servers. I doubted at the time, and I still do, that root servers or registries would ever run CurveDNS forwarders.
"If an ISP or CDN has a domain for every IP address it controls, where the IP address is contained in the domain name, that can skew the numbers considerably."
This is a dumb statement, as these are probably subdomains, not domains. I apologise. That said, when a CDN is also a domain registry, it's not unthinkable.