15 October 2019
CHAIR: Welcome back everyone, please take your seats, we're about to start. Our first speaker for this slot is Andrew from ICANN. Please go ahead.
ANDREW McCONACHIE: I work for ICANN, and I am going to be talking about a programme I wrote called Danish which does DANE validation for HTTPS traffic. I'm not going to be talking about tasty pastries and other baked goods, I am sorry to disappoint. This will be about software.
I will give a quick overview of what DANE is. DANE is being used more and more in SMTP. It's really not being used in HTTPS, but it can be used for both. Some presentation assumptions: I'm going to assume a basic knowledge of DNS and DNSSEC, and some basic knowledge of TLS and the parsing of X.509 certs. What DANE is, is a way to tie X.509 certificate trust to DNS, that's a really simple way of putting it. It's starting to be used more in SMTP; it's very little used in HTTPS right now. I know there was some work in browsers a while ago. But there is very little validation going on in HTTPS and there aren't that many TLSA records out there right now.
The next slide is about TLSA resource records, and here is one up there at the top. That's one I found in the wild. It starts off with underscore port, underscore protocol and then a domain name, and then you have got your IN and your TTL, TLSA, and then the stuff I have highlighted in green there, which is kind of the policy language aspect of a TLSA record, and after that you have got the hash. So, quickly going through what that '3 0 1' stuff stands for. The first field is the certificate usage, and you get four options. The first one, 0, is TA, which stands for trust anchor, and it also validates based on PKIX: if you state 0 there, you're going to get DANE validation plus whatever your local store is for X.509 certs.
The second one, 1, is EE, which stands for end entity. So you are going to get PKIX validation as well as DANE validation, so it's going to be DANE and local store. Then 2 and 3 are kind of the inverse of that, which is DANE only, no PKIX validation: 2 is trust anchor and 3 is end entity.
The second field is called the selector. It determines whether you are going to match on the full X.509 cert or whether you are just going to parse out what's called the SPKI, the subject public key information, which is basically the public key that's inside of the cert. And then the matching type: this determines what kind of hash is up here at the end of the record. 0 is no hash used, 1 is SHA-256 and 2 is SHA-512. The RFC states that SHA-256 is a must, and SHA-512 is, I think, a may or a should for validators.
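To make the three parameter fields concrete, here is a small Python sketch (not part of Danish itself) that interprets them and computes the certificate association data, per the RFC 6698 registries. The function names are mine, purely illustrative:

```python
import hashlib

# RFC 6698 parameter registries (mnemonics abbreviated)
CERT_USAGE = {0: "PKIX-TA", 1: "PKIX-EE", 2: "DANE-TA", 3: "DANE-EE"}
SELECTOR = {0: "full certificate", 1: "SubjectPublicKeyInfo (SPKI)"}
MATCHING = {0: "exact", 1: "SHA-256", 2: "SHA-512"}

def describe_tlsa(usage: int, selector: int, mtype: int) -> str:
    """Human-readable summary of the three TLSA parameter fields."""
    return f"{CERT_USAGE[usage]} / match on {SELECTOR[selector]} / {MATCHING[mtype]}"

def association_data(der_bytes: bytes, mtype: int) -> str:
    """Compute the hex 'certificate association data' for matching types 0-2.

    der_bytes is either the full DER certificate (selector 0) or the
    DER-encoded SPKI (selector 1), already extracted by the caller.
    """
    if mtype == 0:
        return der_bytes.hex()  # matching type 0: the data itself, no hash
    digest = {1: hashlib.sha256, 2: hashlib.sha512}[mtype]
    return digest(der_bytes).hexdigest()
```

So the record shown on the slide, with parameters 3 0 1, reads as DANE-EE, matching the full certificate, hashed with SHA-256.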
So, on to a bit more about what my programme does and what I have been doing for the past couple of years. I wrote something called Danish, which does DANE validation for HTTPS, kind of as a middle box application on Linux. I first wrote it in Python as an experiment and it only ran on OpenWrt. And you can still get it; you can install it on OpenWrt. I have no idea how many people are using this. I ran it for a while, did some validation, and that's how I ran into broken TLSA records in the wild: I was validating, got a validation failure, and no, I couldn't get to a website.
I have no idea how many people are actually doing this, but it exists out there; if you want to install it on OpenWrt, you can. That experiment was mainly just that I wanted to see if I could do this, and I did. So then I thought, all right, cool, and I rewrote it in Rust, you could call this version 2, and I had it support both middle box and host operation on Linux, so basically in iptables you could use the forward or output chain. It's BSD 3 licensed. I have been doing this for three years now. I admit it's primarily an exercise in NXDOMAIN responses. I have found some TLSA records in the wild, and I assume when people put them in their zone they want me to validate them.
What exactly is Danish? It's a Linux daemon providing HTTPS DANE validation, and it's not exactly man-in-the-middling things. What it does is sniff the TLS handshake traffic and then scarf out the SNI and the X.509 certs. So, it's sniffing the client hello and the server hello. Then, if no TLSA record is found or if validation passes, it does absolutely nothing. If validation fails, then what it does is install some ACLs using iptables: some quick ACLs to kill the TCP connection, and more to block the SNI using pattern matching. It's just a dumb stub resolver; the best way to run it is having something like Unbound or some other DNSSEC validating resolver on localhost. It could be run on firewalls or end hosts. And the recent thing I added is the ability to just block SNIs based on RPZ: when I receive the TLS client hello, I just do a DNS lookup for the SNI, and if it returns NXDOMAIN, I treat it as a validation failure.
And it supports TLS 1.0, IPv4 and IPv6.
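The RPZ-style SNI blocking just described can be sketched roughly as follows. This is an illustration, not Danish's actual Rust code; the resolver is passed in as a callable so the policy decision stays self-contained, and the `NxDomain` exception is a stand-in for whatever the real stub resolver reports:

```python
class NxDomain(Exception):
    """Raised by the resolver callable when the lookup returns NXDOMAIN."""

def should_block_sni(sni: str, resolve) -> bool:
    """Treat an NXDOMAIN answer for the SNI as a validation failure.

    `resolve` is any callable that looks up a name and raises NxDomain
    when the (RPZ-filtered) resolver answers NXDOMAIN.
    """
    try:
        resolve(sni)
    except NxDomain:
        return True   # RPZ rewrote the answer: block this client hello
    return False      # name resolves normally: let the handshake proceed
```

In practice the resolver on localhost applies the RPZ policy; Danish only has to interpret the NXDOMAIN it gets back.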
So here is a bit more on the operation of it. Like I said, we sniff the TLS client hello and the server hello. We parse them both to get the SNI from the client hello and the X.509 from the server hello. We perform the DNS lookup. If no TLSA record is found, do nothing. If the X.509 certificates and TLSA record match, do nothing. Again, kind of a principle of do no harm. And I have no local certificate store, right, because I have no idea what kind of client machines are behind me as a middle box, or even if I'm running on a host machine, so I don't look at any local certificate store.
But if validation fails, then Danish installs two short-lived ACLs to force a TCP time-out. It does not send a TCP reset, because I don't want to generate packets; that's probably a dangerous path to go down. And it installs one long-lived ACL to prevent further egressing TLS client hellos.
And this is basically everything I just said, but in a handier diagram. You can see the TLS client hello go out. This could be a middle box, you know, it could be a firewall, or it could just be the same host machine, it doesn't matter. You have got the client over here, the web server here, time is on the left. The client hello goes out. When Danish sniffs the TLS client hello, it does a DNS query, and then when that server hello comes back, when it has everything, that is, when the DNS has returned and it has the complete TLS server hello, which typically requires reconstructing TCP streams and stuff like that, then it will do a validation check, and if it's a failure, it will install the ACLs to kill the HTTPS connection.
This is a race, you know: can we get the ACLs installed before the TLS connection sets up? It's always going to be a race. I find that running it on middle boxes, I usually win the race. And if a user is actually using a web browser and they receive a validation failure, the web page isn't going to load anyway; any modern web page takes forever to load, so we're talking milliseconds here.
So, here is an example of the ACLs I have been talking about. This is a validation failure to a test box that I use. You can see down here these two short-lived ACLs, and then here we have the longer-lived one. This is just a comment telling me what it's blocking, I am blocking that domain name, and this gibberish here is just a pattern match. It says string match, but when you put these little pipes on the end of it, it means a hexadecimal pattern.
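The three rules described above can be approximated as below. This is a reconstruction from the description, not Danish's exact output: the chain name, 443-only match, and timeouts are my assumptions, and the sketch only builds the iptables command strings rather than executing them:

```python
def sni_hex_pattern(sni: str) -> str:
    """iptables --hex-string syntax: '|' plus the hex bytes of the SNI plus '|'."""
    return "|" + sni.encode("ascii").hex() + "|"

def blocking_rules(client_ip: str, server_ip: str, sni: str, chain: str = "FORWARD"):
    """Build the three ACLs Danish is described as installing (illustrative).

    Two short-lived rules kill the current TCP connection in both
    directions so it times out; one longer-lived rule pattern-matches
    the SNI in any further egressing TLS client hellos.
    """
    return [
        # short-lived: drop client -> server packets of the failed connection
        f"iptables -A {chain} -p tcp -s {client_ip} -d {server_ip} --dport 443 -j DROP",
        # short-lived: drop server -> client packets of the failed connection
        f"iptables -A {chain} -p tcp -s {server_ip} -d {client_ip} --sport 443 -j DROP",
        # longer-lived: block further client hellos carrying this SNI
        f"iptables -A {chain} -p tcp --dport 443 -m string --algo bm "
        f"--hex-string '{sni_hex_pattern(sni)}' "
        f"-m comment --comment 'danish: blocking {sni}' -j DROP",
    ]
```

For example, `sni_hex_pattern("example.com")` yields `|6578616d706c652e636f6d|`, the kind of "gibberish" visible in the screenshot.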
These are some random HTTPS TLSA RRs I have seen in the wild. This is by no means scientific; this is just stuff I have seen, my own anec-data, as it were. Some interesting comments about these records, and I have seen a lot more than these, this is just a sampling. I have never seen anyone use certificate usage 0 or 2, ever. I find that kind of interesting. Remember, certificate usages 0 and 2 are about the trust anchor, whereas 1 and 3 are about the end entity certificate. So, yeah, nobody wants to have 0 or 2. Another thing I find interesting is that almost everybody is using SHA-256; I think DEFCON might be the only SHA-512 record I have seen. There is a fair amount of distribution between selectors 0 and 1, whether they want the full X.509 cert or just the SPKI to match. But, again, I have never seen a certificate usage of 0 or 2.
So these are some CLI options for Danish. You can configure where you want the iptables ACLs installed, whether you want the output or forward chain; that determines whether you are forwarding traffic or the host is originating the traffic. You can run it on different interfaces. This is kind of an internal iptables chain that we use to install the ACLs, and then this is the new feature I added recently to block HTTPS sessions based on RPZ.
IPv6 support is just enabled if ip6tables is present. So there is no option for that.
Some lessons learned from all of this. This is basically an experiment, this middle box DANE validation thing, and the code is likely still bug-ridden. I have no idea if anybody else is doing this besides me. Sometimes Danish installs the ACLs in time, sometimes not. It seems to do a lot better on middle boxes than it does on hosts. Another thing I have noticed, just in looking at all the SNIs egressing my house and other networks where I have deployed this, is that HTTPS is just kind of used for everything now. Everything is basically using HTTPS; it's basically the new TCP. So we often think of HTTPS as primarily being used by web browsers; that's probably not so true any more. And a happy face for TLSA RRs existing in the wild, and a bit of a sad face for PCAP filtering of IPv6. I know it's really tough to get good eBPF code support for IPv6 because of variable length headers and all that, but given all the work that's going on in eBPF, it would be nice if there was better support for IPv6 there.
Some of my crazy ideas about how I might take this into the future. With Danish, I need to do a lot more testing. I need to get it working on other platforms. I'd like to add syslog support, so that a network admin could notice that there's been a validation failure, that some user on their network has experienced a validation failure, and, you know, do something.
Maybe add support for other protocols. And then these last two are kind of in the category of I wish I had more free time, maybe add HTTPS DANE validation to some popular scripting languages, and survey more TLSA records in the wild.
So, that's it. A relatively short presentation. Go ahead, try it out. You can get it there, compile it, there is a bit more information there. And please send me your bug reports and I look forward to your questions.
CHAIR: Any questions?
AUDIENCE SPEAKER: Marco Davids, SIDN. A simple question: Once you have added an ACL to the filter, will you also remove it, and, if so, when? What's the trigger for that?
ANDREW McCONACHIE: Just time. Time is the trigger for that. The two short ACLs, I'm trying to remember what the time-out values are, I think they are 30 seconds, maybe a minute. The longer one, which blocks the SNI, I think at some point I had set to like 30 minutes. I thought maybe I should set this to the TTL value in the TLSA record, but you shouldn't really use TTL values in DNS records for anything else. So I don't know, maybe I could make that configurable too. It's basically just time.
AUDIENCE SPEAKER: Randy Bush, Arrcus and IIJ. So, you are willing to block a session, but you're not willing to generate an active packet. Where is the decision there?
ANDREW McCONACHIE: It's arbitrary. I mean ‑‑
RANDY BUSH: Thank you.
ANDREW McCONACHIE: You are saying maybe I should be generating TCP resets back to the clients? Maybe I should be?
RANDY BUSH: Just trying to understand your tradeoff space.
ANDREW McCONACHIE: I don't know. This is an experiment. Like, maybe that's a good idea.
AUDIENCE SPEAKER: I am actually a user of certificate usages 0 and 2, both of them, for my TLSA records, so you would have to look up my domain. But my question is: if I get it correctly, the long-lived ACL is the one that is the string match in iptables?
ANDREW McCONACHIE: Correct.
AUDIENCE SPEAKER: Okay. I wonder what the performance of that is, because I think it's very, very bad.
ANDREW McCONACHIE: It is, yeah. It's happening in user space, yeah.
AUDIENCE SPEAKER: Maybe the TCP reset is a better option performance-wise?
ANDREW McCONACHIE: You can reset that one TCP session, but then clients will just immediately start new TCP sessions. Whereas if you block the TLS client hello, you are just killing the clients forever.
AUDIENCE SPEAKER: Okay. Thank you.
AUDIENCE SPEAKER: Hi, I am Simon Leinen from SWITCH. I would also worry about the user experience of this mechanism, because I would guess that the rate of failures will be dominated by configuration errors, not attacks, just from base rate thinking. And in your case, the user gets a time-out; with a reset they will get other funny symptoms that don't explain to the user what is going on. So, I think that's very problematic as a security mechanism in general. But maybe you could do something which is a bit more selective. You mentioned logging the failures; that is certainly a good thing to do, because it's really more of an operator consideration. I think it should also be possible somehow to expose the failures to the user, to the connecting client, in some sort of side channel, maybe by sending web notifications or something when this happens. Just trying to think about ways to give the user better information about what's happening. As I said, I'm sure that 99 percent of these cases will be totally harmless and will not be evil Iranian hackers, but people just forgetting to update their TLSA records when they change certificate providers.
ANDREW McCONACHIE: Definitely. I think most of the errors are just configuration and not actual attacks.
AUDIENCE SPEAKER: My name is Henriette from the RIPE NCC. I have a question from a remote participant, Guy Boynes [phonetic] from ‑ New Media. It's a short question, maybe a long answer: Is DANE relevant today?
ANDREW McCONACHIE: For HTTPS? No, not really, not currently. I don't really know anyone other than me that's validating it. For SMTP, it does seem to be gaining relevance, but that's not what this presentation is about. This presentation is about kind of an experiment I have been running, but for HTTPS I would say it's probably not all that relevant.
AUDIENCE SPEAKER: Robert, RIPE NCC. I wasn't sure if your methodology actually depends on being able to snoop the SNI, but if it does, how do you see the future, assuming TLS 1.3 actually happens and encrypted SNI hides it? Is there a future for such an approach? I like the approach, but I'm not sure where it's going to go.
ANDREW McCONACHIE: I don't know either. I'm not really sure where it's going to go. I don't know what the future of ESNI actually holds. I guess people will still be running TLS 1.2 ten years from now, because things never seem to go away on the Internet, but yeah, I don't know.
AUDIENCE SPEAKER: Geoff Huston. I experimented with a DANE validator from cz.nic a couple of years ago and part of the reason I don't run it any more is nobody does DANE signing. Secondly, it was incredibly slow, because validating that TLSA record requires a huge DNS search all the way up and down for the DS and DNSKEY records. In your experiment, did you notice that this was incredibly slow?
ANDREW McCONACHIE: Yeah. Well, I think I mentioned that the way I run this is with a local resolver, so I put Unbound on localhost, so you get some caching, but it is relatively slow. But HTTPS is also slow, so you only have to be faster than that.
GEOFF HUSTON: Have you thought about the DNSSEC chain extension mechanism, where the authority actually sends you back all of the validation answers in bulk? And once you do that, have you thought about stapling it into the TLS certificate exchange and bypassing this entire side mechanism?
ANDREW McCONACHIE: I know people are working on that.
GEOFF HUSTON: Are they? I'd really like to know if they are working on it or not because it seems the way to take this and actually integrate it back with TLSA, but it seems that it always dies in a ditch somewhere, which is kind of sad.
ANDREW McCONACHIE: I just want to be a dumb stub resolver. I want to hand that off to somebody else that is better at it than me.
AUDIENCE SPEAKER: Benno Overeinder. We are, or my colleague, Willem, is, still working on it with others.
GEOFF HUSTON: A light at the end of the tunnel.
CHAIR: Any other comment, questions? Thank you very much.
Let me welcome Mr. Moritz. The stage is yours.
MORITZ MULLER: So, today I want to talk about the root KSK rollover, which, as you probably remember, happened last year. You might also remember that a lot of people were quite afraid that things might break on the Internet during this rollover, and that the 30 percent of people who use DNSSEC validating resolvers might not have Internet at some point in time. In this presentation, I will show you what actually broke, from our perspective, what worked well, and what things were quite interesting and surprising during the rollover.
This research was done together with people from six different organisations, among them two root server operators.
A few words about DNSSEC, just as a reminder. DNSSEC brings integrity to the DNS. You have the recursive resolver on the bottom and the root servers on the top; the recursive resolver wants to know the name servers of .com, and the root responds with this list of name servers. You also notice here that the answer that the recursive resolver gets from the root servers cannot be authenticated; it might have been tampered with on the way, and it's not always verified. Therefore, we have DNSSEC, and what it does, basically, is this: you have a public/private key pair at the root. The root signs the records that it has in the root zone, attaches a signature to the record set, and then sends the signature together with the records to the recursive resolver.
The recursive resolver can validate the signature by having, and trusting, a copy of this key of the root, and can then check whether the signature is correct or not.
So, during the rollover, this key at the top was replaced for the very first time since the first signing in 2010. For the remaining slides I will use KSK-2010 for the old key and KSK-2017 for the new key. Now, you might wonder why people were worried about this rollover and why rolling the key is actually that hard.
First of all, as I said, roughly 25 to 30 percent of the Internet population are using some kind of validating resolver, so a failure in this rollover might have affected quite a lot of people. And this is because, if a validating resolver cannot validate the signature that I showed you earlier, it will respond with an error code to its client, which means that the client cannot visit its favourite website, like Facebook or any other website on the Internet, actually.
So, before we could replace KSK-2010, the old key, with the new key, we needed to make sure that every validating resolver had a copy of the new key and trusted this new key. There are automatic processes for that, which I will explain later, but still, we saw many validators using hard-coded keys, we have containers that complicate key update mechanisms, and generally people tend to forget about the DNS, because the DNS is something that they set up once and then just forget about.
For our research, we looked at basically this time frame of roughly two years around the rollover. It started with the very first step, on the very left, with the number one: this is the point in time when KSK-2017, the new key, was published in the root zone for the very first time. This is an interesting step for the resolvers that do an automatic update of their trust anchors, because from this point they check whether they see this key for 30 consecutive days, and if that is the case, they trust the key and use it for validation in the future.
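This 30-day mechanism is RFC 5011's add hold-down timer. A much simplified Python sketch of the logic, assuming one observation of the root DNSKEY set per day and ignoring revocation, restarts and the real active-refresh schedule:

```python
HOLD_DOWN_DAYS = 30  # RFC 5011 add hold-down time

def update_pending(pending: dict, seen_keys: set, day: int) -> set:
    """Process one daily observation of the root DNSKEY set.

    `pending` maps key tag -> first day the key was seen. A key is
    reported as trustable once it has been visible for 30 consecutive
    days; a key missing from today's response loses its progress and
    must start over.
    """
    promoted = set()
    for tag in seen_keys:
        first = pending.setdefault(tag, day)
        if day - first + 1 >= HOLD_DOWN_DAYS:
            promoted.add(tag)
    # forget keys that disappeared from the DNSKEY set
    for tag in list(pending):
        if tag not in seen_keys:
            del pending[tag]
    return promoted
```

With this logic, a resolver that sees KSK-2017 (key tag 20326) every day from its publication promotes it exactly on day 30, which matches the jump in the telemetry graph described below.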
Things didn't go as planned, so a few months later ICANN put the rollover on hold, which is step 2. But luckily, roughly a year later, the rollover actually happened, on October 11, 2018. We are now already in the last phase of the rollover, which was considered more or less housekeeping, and which contained the revocation of KSK-2010 and the removal of KSK-2010 from the root zone.
So let's start with the first part on the left, which is everything until the actual rollover.
In this section we mostly used telemetry data, first implemented in validating resolvers in 2016. This telemetry has the goal of letting resolvers signal to the root servers which key they currently trust. They do this once per day, and thereby we measured 100,000 validators daily. This graph already shows data from the telemetry protocol. At the very top we can see the number of resolvers that trust KSK-2010, which is the red line, and this line stays stable basically all the time before the rollover.
Then we see the trust in KSK-2017, which rises slowly. At the beginning it's quite low, because these are probably resolvers that get shipped with KSK-2017 in the trust anchor already, or resolvers where operators configured KSK-2017 manually. Then comes the date when KSK-2017 is added to the root zone, and 30 days later we see this massive jump in trust in KSK-2017, because at this point the resolvers that do this automatic trust anchor mechanism have seen KSK-2017 for 30 consecutive days and add it to their trust anchors.
However, we can also see that 8 percent of resolvers do not trust KSK-2017. This worried people quite a lot, and the community and ICANN said, let's take a break and see what actually caused these resolvers not to pick up the key. Because if you carried out the rollover now, it would mean that these 8 percent of resolvers would not be able to validate the new root zone. This gave us more time to dig into the data, and what we did was look at data that we saw at B root. There we saw quite a lot of resolvers that only sent a signal of the telemetry protocol once, and they also only signalled KSK-2010. Additionally, they sent only very few DNS queries to the root servers. It turned out that a few of these DNS queries were directed to a VPN provider, so what we did was download the VPN software, the end client of the VPN provider, and we saw that this client had only KSK-2010 configured, hard-coded in the trust anchor. We then reached out to the VPN developer, and they confirmed the issue and rolled out a bunch of patches, and the impact of these patches we can see here.
So, we see that from the point in time when the new VPN client was released, the number of resolvers that only have KSK-2010 in their trust anchor is decreasing. This insight then gave ICANN and the community enough confidence to actually move on with the rollover. So, to conclude the takeaways from before the rollover: most validators picked up KSK-2017 correctly, so this automatic trust anchor mechanism did work quite well.
But we also saw that one single application can influence the trust anchor signal quite a lot; it had quite some impact and concerned a lot of people.
So, as I said, on October 11, at 4 p.m. UTC, the rollover actually happened. At this point in time, the key set in the root zone is signed with the new key, and if resolvers did not trust this new key at this point, they would start failing validation. So we were particularly interested in the perspective of the end users. Unfortunately, we cannot see all of the roughly 25 percent of the Internet population that do validation, other people can do that, I guess, so we used RIPE Atlas, and with the help of RIPE we set up a bunch of measurements: first, to measure at what point in time resolvers see that the rollover actually happened, because that is the point in time where things can break; and second, whether these resolvers actually broke. So we wanted to figure out if resolvers were doing validation, and whether they changed from validating resolvers to not validating resolvers, or even to failing resolvers, after the rollover.
And, in total, we observed 35,000 resolver addresses in 3,000 autonomous systems.
In this figure you see the time when more than 50% of resolvers saw that the rollover actually happened, and this is roughly eight hours after the actual rollover. This is a bit earlier than you might expect, because the DNSKEY record set has a TTL of 48 hours, but because resolvers cap TTLs, this shift from using the old key to using the new key happens a bit earlier as well.
We can also see a bunch of jumps in here. We know that RIPE Atlas is biased to some extent, because its probes often use resolvers from Google or Cloudflare, and we can see at these points when these big resolver operators suddenly saw the new key as well.
We also saw that a few resolvers seemed to fetch the old key just before the actual rollover, and we were wondering if this was on purpose. So we reached out to operators again, and one operator confirmed to us that they fetched the old key on purpose just before the rollover, so that they would have more time to sit back and wait and see what happened. One of the reasons was that this rollover was happening at the same time as the DNS-OARC meeting, and they didn't want to be bothered during the social; they didn't want to debug DNSSEC issues while being a bit drunk, which I can understand.
So now we know that resolvers saw that this rollover actually happened; next, we wanted to check whether resolvers actually failed after the rollover. The good news is, the majority of resolvers that we observed did not fail. They were either always validating before and after the rollover, or they weren't validating before the rollover and also didn't start validating in the period of a few days after the rollover.
But we also saw a bunch of resolvers that did have issues. Roughly 900 of them were validating resolvers at first, but at some point after the rollover stopped doing validation and actually returned error codes to their clients. And 700 resolvers were validating before but stopped doing validation; they still answered their clients, but just didn't validate any more.
To confirm this first measurement, we also used another dataset, the Day in the Life of the Internet dataset, which was published by the root server operators for the period of the rollover. What we knew from experience beforehand was that resolvers that fail validation send a high number of DNSKEY queries to the root, and this is something that we observed for 360 of these failing resolvers, here at the top: they sent way more DNSKEY queries to the root servers after the rollover than before. So this was quite a strong sign that these resolvers actually had troubles during or after the rollover.
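The heuristic described here can be sketched as: compare each resolver's DNSKEY query count to the root before and after the rollover, over windows of equal length, and flag large increases. A toy version; the growth factor of 10 is an illustrative choice of mine, not the threshold used in the study:

```python
def flag_failing_resolvers(before: dict, after: dict, factor: float = 10.0) -> set:
    """Flag resolver addresses whose DNSKEY query count grew sharply.

    `before` and `after` map resolver address -> DNSKEY query count
    seen at the root in equal-length windows on either side of the
    rollover. An address is flagged when its count after the rollover
    exceeds `factor` times its count before (with a floor of 1, so
    previously silent resolvers can still be flagged).
    """
    flagged = set()
    for addr, count_after in after.items():
        count_before = before.get(addr, 0)
        if count_after > factor * max(count_before, 1):
            flagged.add(addr)
    return flagged
```

Run against synthetic counts, a resolver going from 5 to 500 DNSKEY queries gets flagged, while one going from 100 to 120 does not.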
We also had a look at how long it took for these resolvers to get fixed, and the majority of them got fixed quite fast, most within an hour, the others usually after a few hours. Three were never fixed after the rollover, at least not in the period of time that we observed.
These three resolvers were probably either abandoned resolvers that no one cared about, or maybe their clients had a second resolver configured which actually worked, so the clients didn't really notice that there was a big issue.
One thing that made the news was an issue at the Irish ISP, Eir. They reported DNS problems around the time of the rollover, and their clients reported that they couldn't access the Internet any more. So we were wondering if this was actually a DNSSEC issue. We again had a look at the number of DNSKEY queries from the resolvers of Eir, and we can see here, in the A and J root data, that the resolvers of Eir sent a massive amount of DNSKEY queries right after the rollover, which is a very strong sign that this DNS problem was actually a DNSSEC problem.
You might also notice that there is a quite mysterious bump of DNSKEY queries after the removal of KSK-2010, but I will talk about that in a minute.
So, to sum up the takeaways from during the rollover: a few resolvers had serious problems, but the ones that had problems recovered relatively fast. I don't know how long the outage at Eir took, but this was also resolved relatively fast, even though, of course, people there suffered quite a bit.
So, let's move into the last phase of the rollover, which was considered more or less housekeeping, but where some surprising things happened as well. The stage that we want to look at first is the stage with number 5, which is the revocation of KSK-2010. At that point in time, the old key was published in the root zone with the revocation bit set, and the resolvers that do automatic trust anchor management would now see that they should not use this key for validation any more. The last step was then the removal of KSK-2010 from the root zone.
But first, let's step back to the number of DNSKEY queries. This is again A and J root data, but we confirmed it with other root servers as well. Here we can see the number of DNSKEY queries right after the rollover, after the revocation, and after the removal of KSK-2010.
We see here the partially expected increase in DNSKEY queries right after the rollover; these are problem resolvers that had troubles, not having KSK-2017 configured. But this number stayed stable and quite high, which was not expected.
What was a bit more unexpected was the increase in DNSKEY queries after the revocation of KSK-2010. This increase even went up to 7% of the total query load at the root servers, which worried at least some of the root server operators quite a bit. Luckily, this number of DNSKEY queries returned to the load seen after the rollover, so people were a bit more relaxed, but of course people were wondering what caused this massive amount of DNSKEY queries.
So, what we did was send out a bunch of diagnostic queries to the resolvers that caused these queries, and with version.bind queries some resolvers would tell us which software they were running. It turned out that the resolvers that were willing to tell us their software were running old versions of BIND. We then reached out to these resolver operators, and they confirmed that the resolvers were indeed running older versions of BIND, and one operator even gave us their configuration. So we took this configuration and set up our own test. This configuration basically had no KSK-2017 configured and a bunch of DNSSEC-related flags set. We used this configuration, spun up resolvers, sent out a bunch of DNS queries to some domains, and after a few minutes observed the number of DNSKEY queries.
And it turned out that occasionally quite a lot of DNSKEY queries were sent from these resolvers to the root. We believe that this is at least one of the reasons for the quite high number of DNSKEY queries that we have seen at the root servers.
So then we returned to the telemetry protocol that I talked about earlier, the one where resolvers signal to the root servers which key they have configured at the moment, and there we saw another quite surprising thing, which is the comeback of KSK‑2010.
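The signalling being referred to is presumably RFC 8145 key-tag telemetry, where a validating resolver encodes the key tags of its configured trust anchors into a query name sent towards the root. A minimal sketch of that encoding (19036 and 20326 are the well-known key tags of KSK-2010 and KSK-2017):

```python
def ta_signal_qname(key_tags):
    """Build an RFC 8145 key-tag signalling QNAME: '_ta-' followed by the
    configured trust anchor key tags in hex, sorted and dash-separated."""
    return "_ta-" + "-".join("%04x" % tag for tag in sorted(key_tags)) + "."

# A resolver still trusting both keys signals both tags...
assert ta_signal_qname({19036, 20326}) == "_ta-4a5c-4f66."
# ...while one that only has KSK-2017 configured signals a single tag.
assert ta_signal_qname({20326}) == "_ta-4f66."
```

Counting how often each variant arrives at the root servers is what lets researchers estimate how many resolvers still trust, or have come back to trusting, KSK-2010.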
So, right after the removal, we see that the number of resolvers that trust KSK‑2010 is actually rising again, and we were wondering what was going on. One reason we found was configuration issues, actually at the resolvers of our co‑authors at VeriSign, but another problem seemed to be connected to Unbound. So, imagine you are using an older version of the resolver Unbound which has both KSK‑2010 and KSK‑2017 as trust anchors, and you install it right after the revocation of KSK‑2010. Then Unbound starts up, queries the root zone for the DNSKEY set, sees that KSK‑2010 has been revoked, and revokes KSK‑2010 locally as well. However, if you run the same version of Unbound and install it after the removal of KSK‑2010, Unbound will ask the root for its DNSKEY set, will not be able to find KSK‑2010 in the key set, and will keep it in the trust anchor store, trusted, for that period of time as well. In this case, this is not very critical, because KSK‑2010 has been removed. But imagine KSK‑2010 had been compromised: then these Unbound resolvers would still keep trusting this key.
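The asymmetry described here comes down to the RFC 5011 REVOKE bit in the DNSKEY flags field. The sketch below is a deliberately naive model, not Unbound's actual code, showing why a key that is seen revoked gets dropped while a key that has silently disappeared stays trusted:

```python
REVOKE = 0x0080   # RFC 5011 REVOKE bit in the DNSKEY flags field
# A normal KSK has flags 257 (ZONE|SEP); with REVOKE set that becomes 385.

def update_trust_anchors(trusted_tags, dnskey_set):
    """Naive RFC 5011-style update. trusted_tags: key tags we trust now;
    dnskey_set: key tag -> flags as currently published at the root."""
    kept = set()
    for tag in trusted_tags:
        flags = dnskey_set.get(tag)
        if flags is not None and flags & REVOKE:
            continue          # key announced its own revocation: drop it
        kept.add(tag)         # still published, or silently gone: keep it
    return kept

# Start-up right after the revocation: KSK-2010 (tag 19036) is seen revoked.
assert update_trust_anchors({19036, 20326},
                            {19036: 385, 20326: 257}) == {20326}
# Start-up after the removal: KSK-2010 is simply absent, so it stays trusted.
assert update_trust_anchors({19036, 20326},
                            {20326: 257}) == {19036, 20326}
```

That second branch is exactly the worry raised above: had KSK-2010 been compromised rather than retired, resolvers installed after its removal would never see the revocation.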
So, some takeaways from after the rollover. No one expected the massive flood of DNSKEY queries: a lot of people were worried about the rollover itself and about other stages of the rollover, but this high amount of DNSKEY queries was not really expected. We saw that trust anchor management comes in different shapes and colours, which causes quite a few problems as well. And we have seen that shipping trust anchors with software has quite a long‑lasting effect.
These observations led us to a bunch of questions. The first question is: do we need to improve telemetry? We actually have two telemetry protocols, one of which I didn't talk about in the presentation, but it is covered in the paper, which will be available starting next week. Both telemetry protocols were quite useful. However, for both, we sometimes had trouble identifying the true source of the signal, so we are not sure whether a signal we saw was actually coming from the resolver that we were observing. Another problem was that we could not estimate whether the resolvers we observed were actually important ones. So, for example, for the 8% of resolvers that did not trust KSK‑2010, we were wondering: do we actually have to care about these resolvers? It would be nice to have some kind of estimate of how many users a signal represents. But we acknowledge that both of these ideas have privacy implications and need to be discussed.
The second question is: do we need to change trust anchor management? What we have seen is that more end user applications seem to do DNS, and some of them even do DNSSEC for some reason. We also confirmed this by doing a GitHub search, where we found a bunch of applications that have hard‑coded KSK‑2010, and KSK‑2017 as well. So, we think it might make sense to have a central point in an operating system to manage trust anchors.
So to conclude:
We think that the rollover was a success, and we think that it should be repeated at some point in time. We even think that we could probably add a standby key to the DNSKEY set, because we think this is the safe way to go. We also showed that independent analysis is quite important, because we were able to gain some insights that other parties, for example ICANN, did not have, or did not have the time to produce.
We also think that telemetry must be kept in mind at a very early stage of protocol development. For DNSSEC, for example, we actually had two telemetry protocols, but both were not very well deployed at that point in time and both had some issues. So we think that if you develop a new protocol, you should take care of telemetry quite early on.
And last, we think that trust anchors should be managed centrally, but of course this still needs to be discussed.
With that, I want to thank you. And are there some questions or comments?
CHAIR: Thank you very much.
AUDIENCE SPEAKER: Warren Kumari, Google. I also help USC run B‑root. It turns out that new versions of BIND seem to have this issue as well. We sent e‑mail to the KSK rollover list in April with some experiments, and it seems that about 15% of the time a fresh install triggers these new DNSKEY queries.
MORITZ MULLER: What we also say in the paper is to keep the revoked KSK in the zone a bit longer, just so more resolvers notice that this key has actually been revoked.
AUDIENCE SPEAKER: Marco d'Itri, Seeweb. I just want to comment that Debian and Ubuntu have been shipping the DNSSEC trust anchor centrally since 2014.
MORITZ MULLER: That's correct, and we also say that this is a good example of how things should work, or how we believe things should work.
AUDIENCE SPEAKER: Jim Reid, DNS guy. This is great work, so congratulations to you and your colleagues; it's good to get this telemetry information about the effects of the KSK rollover, in particular that instance that happened in Ireland. My question is about where we go from here. Now that the dust has settled on this first KSK rollover, we're probably going to do these a bit more frequently in the future, and obviously we have to take the lessons learned and do some more educational outreach. I think one of the big things that needs to be sorted out is telling people: please do not hard‑code trust anchors in your configuration. How do we get that message out? And do you have any plans, or have there been any efforts with ICANN's Office of the Chief Technology Officer, about future work on either gathering information or doing this outreach for the next KSK rollover?
MORITZ MULLER: We haven't talked with ICANN directly about that yet, as far as I know. And we also have not thought about the best ways to do the reaching‑out part. I think ICANN and the community did quite a good job before the rollover in trying to reach a lot of people, but it seems that not everyone was reached. We probably have to go to the software developers, not only the network operators, to think about these things. So this is probably the way to go.
AUDIENCE SPEAKER: Hello. You mentioned you want to improve the two telemetry proposals. I suggest we get rid of them; the data from them is actively harmful right now, and I don't think it could be better. You say we need to be aware of the number of users behind a resolver. There is a privacy problem there, but there is also a trust problem: anybody could say "I have a million users, I am on the old KSK", and they could sabotage the key roll that way. The data is so bad it's better not to have it. Then, on the management of trust anchors, I think it's time to conclude that RFC 5011 was a mistake; nobody got it right. The OS vendors, they got it right. We need to trust them, like Marco says. Thank you.
AUDIENCE SPEAKER: Benno. Thank you for a very interesting presentation. Considering the Unbound behaviour, which is actually complementary to the remarks of Marco and Peter: it was an Ubuntu package which came with the old key, so it was not Unbound per se; it was the packaging that still included the old key. But that has to be coordinated and discussed with the software vendors and the packagers.
CHAIR: Any other questions? If not, let me thank Moritz for a very interesting presentation. Thank you.
Now we will have a presentation on when we can start dropping IPv4 on the DNS root servers, from Erik.
ERIK BAIS: Good morning everyone. I have been mulling over this topic for a while. We all know where we currently are with v4 depletion and where we are with the v6 implementation, and I was wondering: how can we actually make a small shift in this? That turned into this presentation, and we'll have some discussion afterwards. For myself, the goal with this presentation and the discussion here is to actually get a discussion going.
We have all known, and seen in the last presentation about the KSK rollover as well, that it takes a while for people to adjust, especially if we talk about DNS. And I'm not a DNS guy, in all honesty. So doing this is an interesting topic, and I expect some heated debate afterwards; I'm prepared for that. But we'll have to see where this goes.
So if you look at all the v6 implementations since we got started on this: we are currently in the final days, with the RIPE NCC handing out the last /22s these days. It's not unlikely that within the next two or three weeks RIPE is completely out and we go to the waiting list. Some RIRs are already there. And still v6 is lagging, even eight years after World IPv6 Day.
So how can we make v6 the standard? Despite all efforts, most ISPs and corporations are still implementing v4. And in all honesty, I have a broker company; this is one of the things that we make money with: we facilitate this for companies that have a requirement for v4.
But even in the discussions with the customers that we facilitate on our network, and also through the broker company, management has little or no incentive to actually go to v6. And they get the bill for v4 anyway.
So this is not a v4/v6 bashing presentation. We all love v4, we all love v6. This is not a question of one versus the other. It is also not about lacking deployments or possibly lacking vendor support.
And to people saying it can't be done: I'm sorry, there is no excuse for that. We are beyond that point. We need to do something different. Something needs to change.
So, that sets the boundaries. The goal is to actually pick a fictional date somewhere in time, and it should be beyond budget planning and investment periods, somewhere in the next six to ten years; some will probably say let's do it in three. But we need to get started somewhere and set a date: when can we start dropping v4 on the DNS root servers?
So, all root servers, all root server operators, actually support v4 and IPv6, finally. But, I need to say, not all nodes of those root servers actually support both v4 and v6; some are still v4 only. And in my opinion, and I state that explicitly as my opinion, they should be removed or fixed. Here you have the public list at the IANA website, where you see the unicast v4 addresses and v6 addresses; every server supports both. If you look closer, you see the orange ones here, which are v4‑only nodes. This is just an example; it is not unique to Netnod, there are several others with similar issues. And the interesting thing is: why are we not flagging v6‑only nodes? That's weird. They probably didn't think it was interesting enough, but it actually is. There are v6‑only nodes as well, but I can't see them; I can't find them here. Talking to the root server operators, I talked to the RIPE NCC, who run K‑root, and they said: we actually have one; the problem was we couldn't get v4 to work stably, so we disabled it, and we now have a v6‑only node. Which is interesting, and I'm sure there might be others that have been trying this as well. But it's just harder to find.
So if you look at an ISP setup with resolvers: ISPs can fix their own resolvers for dual stack in a couple of hours. If you are running a network, this should be doable, and it will allow your customers to do v4 and v6 queries, if you even support v6 internally in your network. Typically this is among the first things you fix after your BGP announcements: get your mail server up and running, and get your DNS servers resolving on v6 as well.
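As a concrete illustration, enabling this on a resolver is usually a handful of configuration lines. A sketch for Unbound follows; the option names are from Unbound's documentation, but the interfaces are placeholders, and access control and addresses would need adapting to the actual deployment:

```
server:
    # Speak both address families to clients and to authoritative servers.
    do-ip4: yes
    do-ip6: yes
    # Listen on the resolver's IPv6 side as well as IPv4.
    interface: 0.0.0.0
    interface: ::0
    # Where an upstream is reachable both ways, lean towards IPv6.
    prefer-ip6: yes
```

Equivalent switches exist in BIND and other resolvers; the point is that the change is a config edit, not a redesign.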
And as soon as you do that, the resolver might actually start to send queries to the root servers over v6 as well. I say might; I'm not saying that it will. It will query them, but then the question becomes: what is the latency, and which server will actually be selected?
So, if you query a v6‑only root server node, the traffic will take a v6 path, and queries for v4 addresses will travel across that v6 path as well.
So, what is required? You need to have v6 in your BGP. Fix your resolvers to do v4/v6 dual stack. And for most of this, it is free of charge.
So, this is a simplified drawing; my drawing skills are not that bad. Here you have the internal setup in the network of a corporation or customer. They query the resolver, the DNS server of the ISP, which can happen over v4 or v6, and between the resolver and the root servers everything can run over v6. And everything should still work. So this is the idea.
So all DNS root servers are currently dual stack. Most TLDs are too; I say most, because there are some that only do v4, and some of the nodes are not v6‑enabled. As I stated, it is not flagged which nodes are or aren't v6‑enabled. And v6 is free to obtain for RIPE NCC members; it's part of your membership. Non‑members, companies or corporations that have legacy or PI space, can request v6 PI space for just €50 per object. It's an overcomable problem.
So, the goal is to have all ISPs implement v6 in BGP within five years. Now, this is interesting.
We'll come back to the statements later; do come to the mic if you do not agree.
If you look at the stats, at what is allocated versus what is actually implemented and live in BGP, there is a big difference. So, while nearly all ISPs have v6, because most of them have v6 allocated to their LIR, many still don't have it in BGP. There is free software available that you can use to route your v6. You don't need to worry about capacity, because if you haven't run v6 yet, you don't have any v6 traffic. You can do this on a BSD box or with BIRD or whatever; it's free software, and it's not rocket science. And the initial step is to get your DNS servers dual stack. All current DNS software supports v6, v4, and dual stack configs out of the box, and most of the software is free of charge; it's open source.
If all the dual stack DNS resolvers can reach the DNS root servers over v6, and v6 is preferred by default in most OSs, then is there actually a benefit to using v6? Is it preferred, or is it equally balanced, if resolvers race to see which server is quickest to respond? I didn't get a definitive statement about this. I asked a couple of people in the room who provide open source software for this. Some say the race will simply pick whichever is quicker, so even if you are doing v6 and v4, queries can still go to v4. My preference would be that v4 gets a slight penalty in the software, specifically in the resolvers.
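The "slight penalty" idea can be sketched as a tweak to the usual lowest-RTT server selection. This is a hypothetical model, not any real resolver's algorithm; the addresses used are K-root's published v4 and v6 service addresses, and the penalty value is arbitrary:

```python
def pick_server(rtts_ms, v4_penalty_ms=50.0):
    """Pick the address with the lowest effective RTT, biasing the race
    towards IPv6 by adding a fixed penalty to IPv4 candidates."""
    def effective(item):
        addr, rtt = item
        return rtt if ":" in addr else rtt + v4_penalty_ms  # ':' => IPv6
    return min(rtts_ms.items(), key=effective)[0]

# With equal measured RTTs, the penalty steers the race to IPv6...
assert pick_server({"193.0.14.129": 5.0, "2001:7fd::1": 5.0}) == "2001:7fd::1"
# ...but a much faster IPv4 path still wins, so v4-only reachability survives.
assert pick_server({"193.0.14.129": 1.0, "2001:7fd::1": 60.0}) == "193.0.14.129"
```

The design question raised in the talk is exactly how large such a bias should be: zero means a pure latency race, a huge value effectively deprecates v4 transport.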
But if v6 is available, there is no requirement to keep running v4 on the root servers. And I'm not talking about the TLDs yet; we need to start somewhere.
Now, there might be old military equipment or filtering software relying on v4‑only stuff, but the majority of those environments have their own internal DNS server setup anyway. And then again: who cares? Sometimes things need to break at some point anyway.
So, as you see, this is a rather different discussion than the previous presentation, because they were very careful not to break the Internet. My intention is different: I want to have a date, and I want to break things. I want to change what we have been doing. We need to stop supporting legacy v4 at some point, we need to deprecate it, and that message needs to get out.
So, if v6 is the only supported protocol on nodes for the DNS root servers within, for instance, seven years, we can start reducing the number of active v4 nodes. This might have a slight impact on resolvers doing only v4, because they will get slower responses: the nodes that still support v4 might be further away from you than the v6 ones. But it's still anycast, and it will still resolve, just a bit slower. And that's where the marketing ramp‑up might actually catch on: if the message comes out that we're actually going to break the Internet, the core of the Internet, the DNS root servers, and stop supporting v4 on the root servers seven or ten years from now, then maybe management might actually care, and they'll say: we need to do something, because this date is coming. As long as we do not agree on when to do this, it will not happen; management does not care, and it will not be fixed.
So, some interesting data. We have some SmokePing probes in our network that have been looking at latency to the root servers, and I picked a few examples for these slides. Here you can see a.root has a 1.1 millisecond response on v4, and 1.6 on average on v6. b.root is a bit different, because it's on the other side, in California, so quite far away from us in the Netherlands, and there v6 is actually quicker than v4. And so it goes, all around the world.
c.root: a bit of instability on v4, and here again v6 is quicker.
Then we have f.root: 1.1 milliseconds on v4, 4 milliseconds on v6, which is a bit odd. Then again...
Here we have i.root. I thought this was an interesting one: there is quite some latency variation, going up and down, and maybe there are different servers replying. But at least there is variation here, and it's actually quite big. The average on v6 is 6.3 milliseconds, while the average on v4 is 23 milliseconds, spiking up to 40 milliseconds.
k.root is exactly the same on v4 and v6, as you would expect, because we actually peer with the RIPE NCC; similar performance all the way down.
On the RIPE website, you can see the performance from the Atlas anchor sites; they monitor the performance to the root servers over both v4 and v6. The orange part means there is a bit more latency, and the red part, as you can see with b.root, means it's quite far away, so the system flags that as high latency, but it's roughly equal on v4 and v6 for most. m.root is red on v4 and green on v6. Just have a look at DNSMON on the RIPE website; you can see some interesting stuff there and see the differences.
Let's come to the mic. We have a couple of questions.
AUDIENCE SPEAKER: Jordi Palet. You are a bit late. I already submitted an Internet draft on this to Sunset4 two years ago. It was not very well received, but anyway, the Working Group was shut down, and then last year I submitted another draft with the same contents to v6ops, which was not well received either, but I'm happy to re‑try if you want to join. Thank you.
AUDIENCE SPEAKER: Hi. Giovane, SIDN. A very audacious approach, fixing the adoption of one protocol with another one. I like that it's audacious, though maybe this is not the right place, in my point of view. But the funny thing is: if you can get the most popular websites to actually do that, it might just work as well; you don't need to break DNS. That is, if Google drops the A records; there are Google folks in the room.
AUDIENCE SPEAKER: Jim Reid, freelance DNS consultant. It's a very thought‑provoking presentation, Erik. Unfortunately, all my thoughts are unhappy ones. My first comment is: there are other ways and mechanisms to try and encourage IPv6 deployment. Why don't we start putting pressure on the big e‑mail providers, the web browsers, the CPE vendors? All these other people should be doing more to get IPv6 more widely deployed. And to agree with what Giovane just said, the idea of using the DNS as a mechanism to fix a problem in another protocol is probably a very, very bad one.
My next concern is that there are other issues around the tactical and strategic aspects of this. If you are going to take this forward, and I think you probably shouldn't, but if you do, please do not talk about breaking the Internet and root servers. Putting those things in the same sentence will set off alarm bells all over the place, and the guys in black helicopters are probably going to come armed and pay you a visit. So please moderate and soften the language. The idea of using the root server system for experiments or for other purposes like that is potentially dangerous. The second thing is: if the root server operators were to entertain this idea, and I don't think they will, it's the start of a very slippery slope. What else are we going to force the root servers to do as a matter of policy that is not related to serving the root zone? I think that's a very, very dangerous path to go down. The root servers' job is quite simply to serve the root zone; their job is just to faithfully serve a copy of the zone.
The next thing is that, even if you try to get ahead here, the root server operators individually and jealously protect their autonomy and their independence from each other. So trying to have that many act in concert like this is going to be very, very difficult. Look at how hard it was to get the root server operators DNSSEC‑ready so the root could be signed; that took a long, long time. Getting them to act in concert for the noble purpose of IPv6 deployment is going to be incredibly difficult. Maybe the approach might be to have a quiet word with each root server operator individually and suggest a metric: if queries to a root server instance are more v6 than v4, consider switching off the v4 there, and have a gradual winding down. But a big‑bang approach, I think, is a very, very bad idea to entertain, and I hope we certainly don't go down that path. Thank you.
AUDIENCE SPEAKER: Jen Linkova. It's quite funny; I kind of like it. But I think the presentation missed one slide. It would be very interesting to see the actual data: how much traffic, what percentage of queries, comes into the root servers over v4 versus v6? If you start doing this, what percentage of queries will you affect? Obviously, we can get much more with a kind word and a gun than with just a kind word itself.
ERIK BAIS: So, I did have some discussions about k.root, and we had a look at some stats. There are actually quite a lot of v6 queries already. But what you see there is only the ISPs that have actually enabled v6. And if you look at certain nodes, you can only see a certain region, and it might be that in your region you'll see a lot of v4 queries for the Middle East, for instance, if you are in Amsterdam and they are not doing v6 locally in the Middle East. So, in the time that we had, we saw some big chunks of v4 queries on those nodes that were actually out of region, not from the local ISPs that you would expect, or from people connected to the AMS‑IX, for instance. And k.root is well connected, with all the BGP sessions and all the Internet exchanges, but this was definitely one of the things that stood out. So...
AUDIENCE SPEAKER: Antonio Prado, SBTAP. Besides root servers, do you have a rough idea of the situation in the ccTLD authoritative name servers?
ERIK BAIS: In what respect? How many are actually dual stacked or not? There are some public sites on the Internet that I have looked at as well, which I didn't include in the presentation. And that's why I said "most": for the vanity TLDs, like .shop and .xxx, the majority is v4 and v6 enabled, but not every one. So there are some that are v4 only.
AUDIENCE SPEAKER: Hi. Shane Kerr, NS1. To address that last question: I believe I checked this two or three years ago, and I think there were only five or six ccTLDs which didn't have IPv6 yet. I was unable to get in touch with anyone at any of those organisations; they didn't answer the e‑mails I sent, and I don't know... they are not here.
As for the statistics Jen was asking about: the root server operators actually have a really good standardised format for providing statistics, and they all publish it. So this data is easily available and you can correlate it, and a lot of them have nice web pages as well where you can look at charts and so on.
My own comment is based on a project that I was involved with a couple of years ago called the Yeti project, which was an alternative root server system that was IPv6 only. The reason I bring that up is that it has been proven in production environments that you can run IPv6‑only root servers. I think one thing that's missing from this proposal is an evolutionary path, steps along the way. I think ICANN could provide an IPv6‑only signed root zone, which they don't do today. There is probably great reluctance to go down that path, because right now there is only a single signed version of the root. It's possible to sign a whole bunch of different root zone setups, but that opens up a whole can of worms, which is only a small can of worms in the big set of cans of worms that this proposal entails. So... anyway, that's it.
AUDIENCE SPEAKER: I have three points. The first one is: no sympathy for ISPs who can't be bothered to run IPv6, so I don't care about them. But there are also non‑ISPs that talk to the root servers; don't underestimate this, I think you skipped over it. Those could also be in a situation where they don't have IPv6, and it's harder for them if they are in an environment where it's hard to get IPv6 deployed; for an ISP it's different. And also, as we heard in the previous talk: when you ship, for instance, the root hints file, there are probably still people running one that doesn't have the IPv6 addresses in it. So those people will be in trouble, and there will probably be all kinds of breakage because of that.
ERIK BAIS: But let's be honest: nobody should be running Windows 95 any more these days.
AUDIENCE SPEAKER: Only people who do everything they should are connected to the Internet, right?
ERIK BAIS: No, we all know that's not the case. But at some point... if you look at when they made the change to TCP/IP, back with the orange books, they had a specific date when they actually made the change. And that was maybe not perfect, but it worked. The only thing that I'm saying here is: we need to have some date. When are we going to stop supporting this? Because otherwise, 30 years from now, we'll be having the same discussion.
AUDIENCE SPEAKER: Well, as they say, the Stone Age didn't end because we ran out of stones, right? So the IPv4 age isn't going to end because there are no IPv4 DNS servers any more.
ERIK BAIS: Good point.
AUDIENCE SPEAKER: My last point is the most important one: we need all the diversity we can get. So if we have two IP versions with completely separate routing tables, that's great. If we have 26 IP addresses across those two versions that the DDoSers have to attack in order to take the root down, that's much better than only 13 on one IP version.
AUDIENCE SPEAKER: Pieter Lexis. We make a recursive name server. You mentioned that it is free of charge to have a dual stack setup. It's not, because convergence to the fastest authoritative will take longer, simply because there are more addresses to probe. So you will incur a hit for the end users once you enable this in a dual stack setup. Just to pile more worms onto the cans you already have at this stage.
CHAIR: Thank you very much.
AUDIENCE SPEAKER: Warren Kumari, Google; I help USC run B‑root, but I'm speaking in a personal capacity only. I think that v6 is a great protocol. I think everybody should run v6; I think resolvers should all do v6. But I have seen a number of cases where people suggest that we tax v4 to force people to move to v6, and that feels like a fundamentally broken thing. People should do v6 because they want to do v6, because it's better, not because it's harder or more expensive to do v4.
Other point: RFC 7720 has a set of requirements for root servers, and in there it says they "must support v4 and IPv6". So, if you want this to actually happen, maybe you should suggest getting that updated first, or go and investigate that. I personally think this is a major, major mistake. Make v6 better and people will use it; don't make v4 worse.
AUDIENCE SPEAKER: Suzanne Woolf, Public Interest Registry, but I'm not speaking for anybody, including the three root server operators I worked for in the past. There is a myth that there is no governance of the root server system; I think the fastest way to prove that false would be to try something like this. As Warren pointed out, there is an RFC that all the root server operators have publicly committed to following, and it does include support for v4. I don't expect, and wouldn't advise, that any of them turn off support for v4 in the absence of a widespread community consensus. This is the wrong pressure point.
AUDIENCE SPEAKER: Peter Koch, DENIC. Not speaking for root server operators, of course, but as an operator of a ccTLD, I think we would be committed to serve the community with the needs they have, and if v4 queries are still arriving, we would be happy to respond to them. Regarding the idea: I think it's kind of putting the cart before the horse, weaponising core Internet infrastructure, which is not favourable. And, since we had the Stone Age comparison already, it's a bit like killing the canary in the coal mine before you shut down the coal mine. The service should be shut down when there is no need for it; it should not be a carrot or a stick or something. So again: bad idea, politically very dangerous, as others have pointed out. Please don't go there. But, surprisingly, that idea was not the worst I have heard after this talk.
AUDIENCE SPEAKER: Friso Feenstra. Somehow I like the idea of getting the message out that there is a better protocol. So, not so much shutting down root servers for IPv4; maybe better to add additional v6 root servers, so v6 responsiveness will increase over v4. Once that is picked up by people on the Internet, they will start saying: hey, I want better connectivity, I want things to be better. You will see less and less traffic on IPv4, and then at a certain moment people will ask, about the last v4 users on the Internet: do we really have to support them on the root servers? I'd like to steer it more in that direction.
AUDIENCE SPEAKER: Sander Steffann. Actually, Friso mostly said what I wanted to say. You're talking about timelines of five to seven years, and I understand setting a date is important to make people realise they have to move. But I agree with Friso that maybe we should focus on making v6 better. So we could, for example... I mean, k.root is already well connected; we could ask the RIPE NCC to focus any new connectivity for k.root on just v6. Keep the existing v4 ones, but do all new deployments v6 only, for example.
AUDIENCE SPEAKER: Randy Bush. In the latter we deployed v6 globally in '97. But my takeaway from this is not that we're going to break the root servers, or that we're using the root servers as a hammer to deploy IPv6. Let's give Erik credit for exploring the implications and starting us thinking about them: what happens as we move to a v6‑only universe, and how that affects the root servers, is an interesting question. To be a little snarky, I have been reading science fiction since I was young, so I like thinking this way, and I think we should start thinking about it. But I don't think it's something we're going to do in my career ‑‑ though that's personally due to my age.
ERIK BAIS: Probably not in mine either if I hear the comments here.
RANDY BUSH: Oh, one more, which is, we could have a contest to see which takes longer, which is converting to IPv6 only and changing an IETF Internet RFC.
ERIK BAIS: Good point.
AUDIENCE SPEAKER: Benedikt Stockebrand. First off, in the measurements you showed, you made it look like the different response times depend on the name server. Could that be the networks in between as well, so that it might be a network issue rather than an issue with the name servers? And the second point is, we probably shouldn't try to ‑‑ well, weaponise DNS or protocols or whatever. But IPv4 is getting more expensive every day, with DS‑Lite and whatever in the way, which makes the equipment more expensive to get and to operate. So we might actually get a similar effect just by the very nature of IPv4 exceeding its design size.
My opinion is that what we should push for is to have all the DNS nodes support v4 and v6 at similar or equivalent speed, and eventually things will sort themselves out, because people will move towards IPv6 ‑‑ which, in my opinion, happens because there are more and more limitations to IPv4. And there is one thing: if you give that line to management, it doesn't matter. It simply doesn't matter. Most people only go to the dentist when they can't stand the pain any more, so they will sit it out, like some people who, in November '99, realised, oops, I think I have a problem next year. I have had people telling me in September '99, all I care about is one extra quarterly report. It will be exactly the same if you tell management they have five more years to get this done: nothing will happen.
So they really have to feel the pain, and it should be a gradual thing. Basically, with IPv4 roots disappearing eventually and IPv4 latency going up, things will move that way anyway. There is nothing we should actually try to manipulate for this purpose; I think that would be a very, very aggressive way to do things where it's not necessary.
CHAIR: Thank you.
AUDIENCE SPEAKER: I think if you talk in the general sense of protocol retirement ‑‑ if you are going to turn off IPv4, and DNS is critical for its operation ‑‑ then I think the only good upper bound for when you can turn off v4 on the root servers is when we start deploying IPv10, which is the next protocol, once we run out of IPv6 addresses. I really don't want to live in a triple‑stack world.
CHAIR: Thank you very much. Thank you, Erik, for the very good presentation. This session has now ended and we have a lunch break. Please make sure to be back here at two, and I would also like to remind you to rate the talks on the RIPE 79 website; please take five minutes to do that.
LIVE CAPTIONING BY
MARY McKEON, RMR, CRR, CBC