15th October 2019
At 9 a.m.
OSAMA I AL‑DOSARY: Hello, we are about to start, please take your seat. Thank you. Good morning, everyone. Thank you for showing up this early. My name is Osama...
JELTE JANSEN: My name is Jelte.
OSAMA I AL‑DOSARY: We are from the Programme Committee and are chairing this slot this morning.
JELTE JANSEN: You are not here to hear us talk, you are here to hear the great speakers talk. The first one is from Qrator Labs and he will be talking about origin validation. Eugene.
EUGENE BOGOMAZOV: Hello everyone, I am from Qrator Labs. This is indeed my first RIPE meeting, and I want to talk with you about origin validation and the status of some proposals and policies.
Why am I talking about it? I work at a company that has its own Anycast network, and I lead a department that tries to find information about different BGP incidents. We have also worked with the IETF community to bring forward some modern security drafts for BGP, such as BGP Roles, ASPA and so on. You could tell me that prefix origin validation is a highly popular topic, that a lot of effort has been put into this area, and that ROA creation can be done in two clicks by operators. And if you want to host a filtering system in your network, you can just set up a validation cache, use the RPKI-to-Router protocol and everything will be fine. RPKI validation and filtering have never been this easy before. So I don't want to take anything away from those who put such effort into this area. I want to talk about different things.
First things first: why do we need origin validation? Because we want to protect our address space from traffic redirection and so on. You have all heard about the well-known cases with prefix optimisers that took place at the end of June; on the right of the slide you can find the first such case, which appeared three months earlier and was the first case with a prefix optimiser this year. When we have such cases, there are two questions. First of all, who is the real prefix owner? Because there are two parties that claim to be the real owner. And second, whom can you trust? When we are talking about trust, we have two options: IRR registries and RPKI. If we are talking about IRRs, we have 25 or 26 different databases, and they all have different levels of trust. This is one of the reasons why ROA objects were created: there you have just one root, which sits at IANA and the RIRs such as RIPE, LACNIC and so on. So, if we are talking about trust, ROAs seem to be objects that you can trust more than ordinary route objects.
And when we are talking about route objects and ROAs, there are different tools that can show your prefix status using the data from these registries: you can see the Hurricane Electric tools, the RIPE widgets and so on. I want to highlight some key points for our own autonomous system. First of all, we use 32 as the max length for ROAs. Then, we have route objects for black-holing purposes. We have signed our own address space, but our clients have not. And we apply the same validation status to route objects as to ROAs. The key features are listed on this slide.
So, we have our own prefixes as a regular ISP. We also have a website that tries to monitor the status of route objects for different networks, and we have investigated different bad cases. So we have our own view of this area; we know about the best practices, and sometimes we don't agree with them. So we have some questions.
And the questions are: Are these policies, and the combination of these policies, really good? Is minimal max length really ideal in the current state? What should we do with IRRs, because they don't seem to have a max length option? And we don't have any standard policy for IRR objects: is it time to create one? I will try to cover these questions one by one.
Actually, I have even more questions, but I only have a 30-minute slot, so I don't think I have time to cover them.
So let's dig into this area. First things first, we need some basic information. Yesterday there was a really great tutorial early in the morning, and it covered route objects, ROAs, how prefix lists can be created, the problems and other basic information. So if you need some knowledge of the basics in this area, that is a great tutorial; you can go through the presentation and try to remember this stuff. That way I hope I can delve into the more technical details.
When I first came into the routing sphere, around four years ago, I was really confused, because I saw that there are route objects and ROA objects, and both of them are (prefix, origin) pairs. You can say that ROAs also have the max length option, but transit ISPs can likewise choose whether they want to allow more specific prefixes in their filters: they can say yes, they do, or no, they want an exact match. So it seems that the two have the same syntax; why do we have two options? As I dived into this sphere, I understood that they have very different use cases. So what is the difference?
First of all, when we are talking about route objects, we are talking about white-listing good prefixes, and when we are talking about ROAs, we are talking about bad routes, (prefix, origin) pairs that can be filtered out. Also, as I said before, they have different levels of trust: route objects are based in a set of different IRRs, which have different internal policies for creating objects, and some of them you cannot trust. With ROAs there is one approach, so you either trust this approach or you don't. And last but not least, a different party takes the leading role and motivates the whole infrastructure to create these objects: for route objects it is the transit ISP, and for ROAs it is the prefix owner. What do I mean by this?
When we are talking about transit ISPs and IRRs, the transit ISP wants to filter routes. So what does he do? He tries to find all the ASNs in the customer cone using AS-SETs, which have a lot of problems, but you all know about them. Then he chooses the IRR registries that he wants to trust, then he collects prefixes from the route objects in these registries, and then he decides whether he wants to allow more specific prefixes or not.
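The filter-building steps described here (customer cone, trusted registries, route objects, exact match or more specifics) can be sketched roughly as follows. This is only an illustration under stated assumptions: the route objects, registry names and ASNs are hypothetical in-memory data, and a real deployment would query IRR databases and expand AS-SETs instead.

```python
import ipaddress

# Hypothetical route objects: (registry, prefix, origin ASN).
ROUTE_OBJECTS = [
    ("RIPE", "192.0.2.0/24", 64500),
    ("RADB", "198.51.100.0/24", 64501),
    ("RADB", "203.0.113.0/24", 64999),      # origin is outside the customer cone
    ("OTHERDB", "198.51.100.0/25", 64501),  # registry the operator does not trust
]

def build_prefix_filter(customer_cone, trusted_registries, allow_more_specifics):
    """Return whitelist entries as (network, longest accepted prefix length)."""
    entries = []
    for registry, prefix, origin in ROUTE_OBJECTS:
        if registry not in trusted_registries or origin not in customer_cone:
            continue
        net = ipaddress.ip_network(prefix)
        # Exact match keeps the object's own length; otherwise allow down to a host route.
        max_len = net.max_prefixlen if allow_more_specifics else net.prefixlen
        entries.append((net, max_len))
    return entries

def permitted(route, entries):
    """True if the announced route is covered by at least one whitelist entry."""
    net = ipaddress.ip_network(route)
    return any(
        net.version == entry.version and net.subnet_of(entry) and net.prefixlen <= max_len
        for entry, max_len in entries
    )
```

With exact matching, a /25 inside a registered /24 is rejected; with `allow_more_specifics=True` it is accepted, which is the choice each transit ISP makes for itself.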
So for transit ISPs there are many points where they have to make a choice.
One note about exact match and sub-prefixes. We met this problem when we were trying to set up a deal with one of our upstreams, as far as I remember it was Redden. It turned out that we needed route objects for every prefix we were trying to announce, but we also needed to keep the black-holing option for our prefixes. So we created a route object for every IP address that we had, and as you can see, we announce around 17 prefixes and have more than 2,000 route objects just to have this option. This is not a really good situation, but this is the problem we met.
And what do you do as a prefix owner when we are talking about IRRs? Your transit ISP wants to create a filter, and you want your traffic to pass through this transit ISP, so you need to accept the rules of the game: you create your objects to pass these filters, and you are not thinking about stopping other parties from hijacking your address space and your traffic. So you create these objects, your traffic flows, you forget about these route objects, and this is why we have so much outdated information in these databases. You may also guess that some of your transit ISPs will turn filtering on. You don't know which type of filter, so you create different types of route objects just to be prepared, and if your transit ISP changes the rules you are ready for it.
If we take a close look at ROA objects, the situation is quite different, because the main motivation for ROA creation is for the prefix owner to defend his address space. Owners create these objects so that others, hijackers and so on, cannot hijack the address space, and the transit ISP's role is simple: they just check the rules that were created by prefix owners. Maybe this is why ROAs are not so popular yet.
So you have two different options. You can choose both, but when we are talking about IRRs, these filters white-list prefixes, as was said before, and they can be placed only in some directions: you need to build the customer cone, so you can apply them only in the customer and peer directions. Also, prefix owners create objects just to pass these filters, and there is a lot of room for errors in some of the registries. When we are talking about ROAs, it's about blocking bad routes; this filtering can be done in any direction, so you can also filter routes that come from your provider. You create objects to stop others, and to stop them at the filtering points, and this point is important. Why so?

When we are talking about max length, there are different BCPs and even drafts that say that minimal max length is a good option. I like this quote, not only because it has a really good definition but also for the phrase that you cannot forge these prefixes. Of course you cannot, because they don't exist; I just like this definition. So, there was a case at the end of June that had some main features. First of all, it was a sub-prefix attack on address space. Also, the ISP had ROAs with minimal max length, and this ISP had really good upstreams that apply drop-invalid policies, so any bad route will be dropped. I will take these main features and put them in a simpler case to show what is bothering me.
So, you have your Internet, you have your clients in the Internet, and you are trying to get traffic from these clients. You announce your address space, you have a ROA and everything is good. But there is another part of the world that has no filters on its upstreams, and so someone there announces a more specific route, and due to longest prefix match they get all the traffic from the affected area. If you want to return the traffic, you will fail, because you don't have a correct ROA object for these more specific prefixes, and so your good upstreams will filter out your route as a bad one even though you are the correct prefix owner and the real owner of this address space. So there is no way for you to return your traffic immediately. So we see that with partial deployment this policy may not work. But it was said that the policies are okay and we just need to push for wider ROA deployment, that is, to ask prefix owners to create ROA objects and transit ISPs to apply filtering at more filtering points; nobody was saying that in partial deployment maybe we need to change these policies. If you want to see it for yourself, you can look at the NANOG mailing list for June and July.
But there is another point of view, and it says: no, you can just use a loose max length. If we do so, we allow sub-prefix attacks through the filters, but you can always return your traffic; of course, you need monitoring, and you have some opportunity for traffic engineering. With minimal max length, in the ideal world where all transit ISPs filter, sub-prefix attacks will be gone and monitoring is not needed, but in partial deployment you cannot return your traffic. And if you want to do traffic engineering, you need to create a new ROA object.
So, that is all I wanted to say about max length. Let's move on to different policies.
As you know, validation has three main states. First, everything is in accordance with the policies (valid). Second, the policy is broken (invalid). And third, there are no suitable policies for your address space (not found).
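The three states, and the max length trade-off from the earlier scenario, can be illustrated with a small sketch of route origin validation in the spirit of RFC 6811. The prefixes and ASNs are documentation values chosen for illustration, and the function is a simplification (it does not treat AS 0 or AS_SET origins specially).

```python
import ipaddress

def validate(route_prefix, origin_asn, roas):
    """roas: iterable of (prefix, max_length, asn). Returns 'valid', 'invalid' or 'not-found'."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for prefix, max_length, asn in roas:
        roa_net = ipaddress.ip_network(prefix)
        if roa_net.version != route.version or not route.subnet_of(roa_net):
            continue  # this ROA does not cover the route
        covered = True
        if asn == origin_asn and route.prefixlen <= max_length:
            return "valid"
    # Covered by at least one ROA but matched by none: invalid.
    return "invalid" if covered else "not-found"

# Minimal max length: only the exact /24 is authorised.
strict_roas = [("203.0.113.0/24", 24, 64500)]
# Loose max length: more specifics down to /25 are authorised too.
loose_roas = [("203.0.113.0/24", 25, 64500)]

print(validate("203.0.113.0/24", 64500, strict_roas))   # the owner's normal announcement
print(validate("203.0.113.0/25", 64500, strict_roas))   # the owner's more specific, used to win traffic back
print(validate("203.0.113.0/25", 64500, loose_roas))    # same route under the loose ROA
print(validate("198.51.100.0/24", 64500, strict_roas))  # no covering ROA at all
```

Under the strict ROA the owner's own /25 comes out invalid, so drop-invalid upstreams discard it; under the loose ROA it is valid, but so is a forged-origin /25 from anyone claiming AS 64500.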
The current best practice is to use a drop-invalid policy on all your filters. Here, as was shown, in the ideal world sub-prefix attacks will be gone, and you need explicit exceptions to bypass these filters, which is good practice. Also, there is a draft in the Ops Working Group that says maybe we can make a softer version of this policy, but this is just not a good idea, because it takes a lot of effort and it doesn't work.
So, the question is: are there any other cases where a drop-invalid policy is bad? A similar discussion is going on around the draft, and we need to find out more about real use cases where you broke your customers' traffic by applying this policy.
When we are talking about not found: ROA deployment is going slowly, and even though it has been around for ten years it still covers only a small amount of address space. So you cannot apply drop-not-found to your routes. The problem is that when you try to apply validation status in some other places, you will fail. For example, there was a suggestion by these guys to apply this policy to uRPF, or anti-spoofing, but it was easily broken, because you can announce almost any route, a /1, /2 or /8, in that direction and then send packets with source addresses from this huge amount of address space. So, when you bring route validation into play, you need to define what you want to do with not found, because a not found status does not mean that the route is a valid one.
And if we return to IRRs, there are different places where transit ISPs must make their own choices independently, and this is why the next question comes up: do we need to define what a really good best practice is, define it as a RIPE policy and so on? And if we do, maybe we just need to start work on this policy. Of course, we will meet some problems when we try to do so. We need to define which registries we really trust and which we don't. We need to find out what to do with conflicts when it comes to comparing different types of objects, route objects and ROAs. We need to understand how transit ISPs can move from their custom solutions to this best practice. And also, there is a problem that was highlighted around half a year ago and has to be taken into account: approximate prefix location. What do I mean by this problem?
If you are a transit ISP, you can pass part of your block to your customer, and if he has another provider, he can get traffic through both of you, the original owner of this address space and the other guy. But because you now have only part of this traffic, you will have only part of the ‑‑ so you can announce more specifics of this address space, because you really are the prefix owner, and all of this traffic flows through you again. So is this a real problem? I don't know, because it seems that this transit ISP really is the prefix owner, and there are no current options to solve it, because he can create ROA objects for this address space. Do we need to deal with such a problem or not? I really don't know. I need your opinion on this kind of situation.
So, if I try to sum all these things up, I come up with these questions. First of all, are there really any other bad cases for drop-invalid policies? Because when it is combined with minimal max length, we have some problems in partial deployment. Also, which max length do we need to use right now? Maybe we need to use a loose max length for some time, wait until the number of filtering points is enough to move to minimal max length, and during this waiting we just need to monitor our routes. And what do we do with route objects in the IRR case? The third question: if we need to define some standard, maybe we need to start work, and when we work on it there will be a huge number of questions to solve. First of all, maybe we need to return to the prefix owner, and also maybe we need to take the good approach from the ROA validation system and just apply it to route objects. And last but not least, if you are interested in these questions and you see the problems that arise from them, maybe you can name the RIPE Working Group where I can raise them, to continue the discussion and try to solve these problems. So far, these are my questions. These are my contacts; if you are really interested in the topic and want to continue the discussion with me, I am willing, please. So, thank you. Do you have any answers to these questions, or do you have questions of your own?
OSAMA I AL‑DOSARY: Any questions, comments or answers?
AUDIENCE SPEAKER: Good morning, Alex ‑. Eugene, thank you for your presentation, it was very informative and very helpful, I think. But you mentioned that AS-SET is a very big problem, because it's full of mistakes, it's full of errors and so on. What do you think, maybe we should start to do something about that? Maybe we should start to check and validate AS-SETs, if that's possible?
EUGENE BOGOMAZOV: Thank you for your question. There is work under way that is trying to solve this situation with special RPKI objects, and I hope that if we don't come up with a solution for AS-SETs in terms of IRR policy, we will end up using this to solve the problem.
AUDIENCE SPEAKER: Okay. Thank you.
RANDY BUSH: Arrcus and IIJ. You don't have slide numbers so I can't tell you the slide but essentially when a transit provider gives a customer a sub‑prefix they should delegate through a certificate that sub‑prefix to the customer so that customers can then issue the ROA for both providers.
OSAMA I AL‑DOSARY: Thank you, Randy. Any other questions, comments? So right, just a reminder we have the Routing Working Group on Thursday at 11:00 so you may also want to have a discussion there. Thank you very much. Our next speaker...
Martin Levy from Cloudflare.
MARTIN LEVY: Good morning. Right. One audience member awake. I am giving a presentation for one of our network development engineers who was unable to be here, but it's about the right time for Cloudflare to give you an update on the software that we have been running, developing and putting on GitHub, which I know some of you have been using. So this is a nice review, or story, of why we did this, and then some interesting stats and some of the directions that we are going in.
This is Louis; there is a video that was done by RIPE about him, you will find it. The majority of this is his work. So I only take credit for the waffling part, not the software part.
So, it's not like RPKI wasn't on our radar, but an event happened last year, there was a leak, it involved a ‑‑ essentially a hijack of a set of name servers and it actually involved real money, I know it says Bitcoin, but they were converted to real money, and this leak, as interesting as it was, basically was the thing that got us to the point of, right, we need to do something, the time has come, we can't wait any longer, the experimentation, playing around we did it, just was not enough, we needed to bring stuff into production and how were we going to do that.
So, although this leak involved DNS servers and the hijacking of them, ultimately it came down to routes, it came down to a route being stolen. And if you actually sort of look at what we were doing at that point in time and how we would have gone about protecting ourselves, we had about 150 dots around the globe, we are just shy of 200 at the moment, we have an awful lot of BGP sessions, both at exchange points, with providers, transit links in, etc., etc., and then internally as well, which are taken out of this list, but we have a lot of BGP. But what may be slightly different for us and I will touch on later, we have IP space from all five of the RIRs, for our operations around the globe. But when it came to RPKI, what we found was, really, the only thing that made any sense for us to spin up was the RIPE validator, and at this point I should say, thank you, a big thank you, to the work that was done at RIPE. If we didn't have that software at that point in time, we would have just been twiddling our thumbs. In our environment, we needed something of that calibre in order to move us forward.
But, with 150 dots around the globe, you turn around, scratch your head and ask how you want to do any of this distribution. So, what we did was, we wrote, so to speak, the second part of the RPKI software stack first. We had a validator, and so we wrote something called GoRTR; this is the RTR protocol, it speaks to the routers, which are distributed around the edge, and we open-sourced that one fairly quickly. You will see a whole bunch of GitHub URLs through this talk, so grab the talk online and copy them; I will talk about it later. What we did was, we used the RIPE validator, we built this RTR code and over the course of about four months we managed to go around the globe getting RTR pumped into the various routers.
Now, when I say various routers, there is something that is not RPKI as such, but is important to talk about from Cloudflare's point of view. We run diverse router manufacturers around the world. We don't run a single platform. Why? Because we don't want a zero-day problem to take out nearly 200 dots around the globe. So we run a very diverse platform, and in fact we have talked at various conferences about all the automation tools that enable us to run Arista, Juniper, Cisco, etc.
At the same time, back to talking about RPKI, we started signing, and we did that manually to begin with, so that part we can skip.
The diagram is as you would expect: the validator here, which at the time was still the RIPE validator, and, we will talk about this in a bit, because we are a CDN, a content delivery network, it's easy for us, we distribute out of the validator to our RTR instance, which runs inside an edge data centre close to a router that has BGP sessions to the rest of the world. So far, so good.
So, until the beginning of this year, this is what it looked like behind the scenes. We used the validator from RIPE and generated a JSON file containing all the routes, the minimum information needed, and we distributed that. We ran the RIPE validator in one of our data centres; we have got a lot of, well, Mesos now, the Mesos cluster existed. We did a little bit of filtering on the results: we wanted to optimise out anything smaller than a /24, etc., etc., the type of housekeeping you'd expect, and why are people generating those ROAs? Never mind. So we built this, and this JSON file's existence started at this point.
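The housekeeping step mentioned here, dropping entries for anything smaller than a /24 before distributing the file, could look something like the sketch below. The JSON shape (a `roas` list with `prefix`, `maxLength` and `asn` fields) is an assumption made for illustration, and IPv6 entries are simply passed through because the talk does not give a threshold for them.

```python
import ipaddress
import json

def filter_roas(document, max_v4_len=24):
    """Drop IPv4 entries whose prefix is more specific than /max_v4_len."""
    kept = []
    for roa in document["roas"]:
        net = ipaddress.ip_network(roa["prefix"])
        if net.version == 4 and net.prefixlen > max_v4_len:
            continue  # e.g. a /25: never accepted on the edge, so not worth shipping
        kept.append(roa)
    return {"roas": kept}

# Hypothetical sample input using documentation prefixes and ASNs.
sample = json.loads("""
{"roas": [
  {"prefix": "192.0.2.0/24", "maxLength": 24, "asn": "AS64500"},
  {"prefix": "198.51.100.128/25", "maxLength": 25, "asn": "AS64501"},
  {"prefix": "2001:db8::/48", "maxLength": 48, "asn": "AS64500"}
]}
""")

filtered = filter_roas(sample)
print(json.dumps(filtered, indent=2))
```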
We built some stats around this and kept moving. Now, by December, we had drop invalids on all the edge, but even before we did that we tagged all of the routes depending on their RPKI status and started looking at the flow stats. And we saw what traffic would have been affected. But you have to understand something about a content play: we are not an IP backbone, so in fact we run a default route inside every one of these edges, and that default route points out to one of the exit paths that we contract with. Now, this is very different to the way you would run a backbone, and there have been some excellent talks about why you should not run a default when you're running an IP backbone, but that is not applicable to us. So, in fact, if we drop a route, traffic still has a default on the way out, so we weren't affecting traffic, but we could measure it and see what was going on.
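The measurement described here, tagging routes by RPKI status and then attributing flow bytes to each status, might look roughly like this. The routes, statuses and flow records are hypothetical, and traffic that matches no specific route is counted under a made-up `default-route` label to mirror the default route on the edge.

```python
import ipaddress
from collections import Counter

def traffic_by_status(tagged_routes, flows):
    """tagged_routes: {prefix: status}; flows: iterable of (destination IP, bytes)."""
    # Sort longest prefix first so the first hit is the longest-prefix match.
    table = sorted(
        ((ipaddress.ip_network(prefix), status) for prefix, status in tagged_routes.items()),
        key=lambda item: item[0].prefixlen,
        reverse=True,
    )
    totals = Counter()
    for destination, size in flows:
        address = ipaddress.ip_address(destination)
        status = next(
            (s for net, s in table if net.version == address.version and address in net),
            "default-route",
        )
        totals[status] += size
    return totals
```

A linear scan is fine for a sketch; a production flow pipeline would use a radix tree or similar for the lookups.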
Now, we found a few places where there were hiccups; this is why you test and adjust appropriately, it's a nice secondary check. But when we moved over, we could see on the graph pretty easily that, you know, this much of our traffic, your traffic may vary, was in fact going to routes with signed, valid ROAs. The invalids that were dropped, by the way, were meaningless, they just don't show up on this graph, and that in a way should be a whole other conversation. We have a lot of our own tooling for flows, so the good news is we can build this stuff up fairly quickly.
Right, so that's the first part, turning on drop invalids and the like. Let's talk about how we sign the routes. Remember I said that we had space from all five RIRs. Well, this gets complicated. There isn't much consistency between the RIRs, they all have four digits ‑‑ it goes downhill once you use the dashboard and interface, and lo and behold, if there is an API it's vastly different. So we basically built up a score sheet for the different RIRs. We went through certain things: we looked at whether they do two-factor authentication or generate client certificates, etc., how easy they are to use, and then of course whether there is in fact an API, because we are a heavy automation shop. So when we bring an IP address into use for method A or method B we want to sign it, we may want to change prefix lengths, we never want to change max lengths; the previous conversation is a good one about that. So, here are five slides. AFRINIC comes out pretty low, sorry about that, guys. There may be an important question here: did we get everything signed? Yes. But it just takes a little bit of effort. APNIC, yeah, all good; we got some hiccups when we tried to send an enormous number of routes in. We have scale, so we aren't signing one or two routes, we are signing a lot of routes, so these types of bugs are easy ones to report and work on. And because we have automation for things like APIs, we can do things like not signing more than N at a time and just feeding things out slowly, as opposed to, you know, trying to sign hundreds of ROAs at a time.
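The "don't sign more than N at a time" approach can be sketched as a simple batching loop. Everything here is hypothetical: `submit` stands in for whatever call a given RIR's API or portal automation would need, and the batch size and pause are arbitrary illustrative values.

```python
import time

def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def sign_in_batches(roa_requests, submit, batch_size=50, pause_seconds=0.0):
    """Feed ROA requests to `submit` in small batches; return how many were sent."""
    submitted = 0
    for batch in chunks(roa_requests, batch_size):
        submit(batch)              # hypothetical call into an RIR API client
        submitted += len(batch)
        time.sleep(pause_seconds)  # spread the load instead of sending everything at once
    return submitted

# Usage with a stand-in submit function that only records batch sizes.
sent_batches = []
requests = [("192.0.2.0/24", 24, 64500)] * 120
total = sign_in_batches(requests, lambda batch: sent_batches.append(len(batch)), batch_size=50)
print(total, sent_batches)
```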
There is a draft API out of APNIC, so this makes us very, very happy.
ARIN, yeah, got some issues, but we are being critical at this point; you know where I am going with these slides. Unfortunately LACNIC is a little bit behind on this one, but each one of these are players we are talking to and we would love to see an improvement.
And then finally, RIPE. So, we are doing good except for that last point. Certificate encoding seems to be complicated. There is an RTFM requirement, or RTFRFC: sometimes you actually look at stuff and realise it isn't quite matching the RFC. The automation part, which is work in progress, definitely can work with ARIN and can work with RIPE. And by the way, keep in mind that we have to keep our IRR environment just as valid while we are doing all of this, because that's still the real world for a lot of places. Anyway, we are very Salt-based; we have talked about Salt for our automation. It would be a whole separate talk to cover how we engineered Salt and RPKI to work together. It is work in progress and it does show hope. Our goal is that, as the IP usage in the company changes, allocations to different sites, allocations for Anycast and non-Anycast, ROA generation becomes automatic.
If you remember back to my original diagram, I talked about us using the RIPE validator, and we are very grateful that it existed and could be used, but we wanted to write our own validator. Jumping ahead here, software diversity is a good thing. Having multiple platforms and multiple things doing the same thing in the protocol world is an absolutely tried and tested method. So in November 2018 we started running the NLnet Labs software, the very early Routinator 3000. In production we were still using the RIPE validator, but we really wanted something that was ours, and we are ultimately a software development house for all of our services. So we ended up going with Go, a language that we use extensively inside the company; we have a large number of libraries, many of which are on GitHub as well, and we could take advantage of the cryptography libraries and all the infrastructure we had inside the company. We wanted to go build a validator, and guess what? You can come up with really trivial bugs early on in the game. These are things that people at RIPE had seen beforehand when some of the early software was written, the Routinator guys must have seen the same thing, and more and more of this will be seen out there. Silly things like, you know, you go plug this into your test Juniper in the lab and you realise the whole RPKI environment is disabled, but you don't know why. That kind of stops testing pretty quickly. Other things I mentioned, about doing BER encoding versus DER: read the RFC. Some of this cryptography stuff is complicated in theory, but once you have got good libraries around it you can move ahead fairly quickly. We built OctoRPKI and released it in February of this year.
I will actually show this, I will show how easy it is to run. It's written in Go, it uses a lot of the other Cloudflare tool kits and it does the things you expect it to do. Keep in mind, it doesn't do RTR, because that is already a separate process. This has a lot to do with the distribution model that we operate inside Cloudflare: a couple of data centres where we want to do this validation and pull the information from the RIRs, and then an awful lot of places where we want to do the RTR process. So it does rsync and RRDP; maybe a better way of saying it is that it does RRDP, and it does rsync under duress. It can be run locally, it has a lot of features and a great README file on GitHub, so I won't waste time going through everything. But there are a couple of points here that are important:
Built in Prometheus monitoring, we have infrastructure necessary to do that. It made obvious sense that that was the way we would go with monitoring. And the whole thing runs within Docker, whether you want to do that or not.
Because it's written to run with GoRTR, it essentially generates this JSON file, in the format that we were using out of the RIPE validator, and it generates it for distribution. We distribute it as a file over HTTPS, because we are a CDN, and that is literally the linkage between these two elements. You can load this puppy up on a Mac, on a laptop, and you can get fully synced within, depending on your bandwidth, 15 minutes, 20 minutes, it just depends. It's not very complicated. And it will run in one-off mode: assuming that all of the fetches work, you can run it one-off and get a complete RPKI tree right there and then.
Let's look at a couple of stats.
We compared three here, Routinator, the RIPE validator and OctoRPKI, and we built for a fairly low footprint, so the fact that we came in slightly below Routinator is nothing, I mean, this is noise, although, yeah, can somebody come up here and explain this one? The CPU usage was something we had to work on, and we have got the CPU usage, just like Routinator, down to: whenever the cron goes off we grab something and move on. We are pretty happy with how it runs, it's very clean. I mentioned we run Prometheus, so here is some of the monitoring; it pumps into Prometheus, we run Grafana, it's like any other subsystem.
Here it is running on a Mac. You have got a validator in one window, up there doing, in this case, rsyncs, and you have got an API built into it that you can query right here, so you can look at some of the data while it plays. And then in the other one, yeah, you can tell here that you have hit stable state, and again, you can go off and look at this. This was done when there were about 76,000 ROAs in there.
Want to run it yourself? Here you go. This data is on the GitHub page; if you are Dockerised, this should be enough to get you up and running. It's not the subject of this talk but, yes, there is an argument flag there to add an extra TAL if you need to bring in the RIPE TAL; we don't distribute it. Here is the RTR side: point it towards the ROA JSON file created by the previous entity. The two run separately, so you can run many of one and few of the other. You can run this on a command line if you want to as well.
So let's talk about GoRTR quickly in the last five minutes and talk about the actual reality of talking between an RPKI software base and a router. All routers are not equal. In fact they are not even close. The actual RTR RFC has something like five different transport mechanisms, a couple of which may never, ever get implemented, but the key three that are there are plain text, which simply means the binary relationship between the RTR server and the router, an SSH‑based interconnect, and a TLS‑based transport. You can see here, if you look at Juniper, Cisco, all of the different makes, what they support and don't. The good news is plain text is supported by everybody, but I ask you this simple question: If you go to this much effort to put cryptographically signed information into a routing database, why is your final link to the router via a plain text, non‑TLS, non‑SSH connection? I mean, that's asked ironically.
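As an illustration of what "plain text" means on the wire, the sketch below builds the Reset Query PDU that a router sends to ask an RTR cache for its full data set, following the header layout in RFC 6810 (protocol version 0). The function names are illustrative. Over the plain TCP transport these bytes travel with no encryption or authentication at all.

```python
import struct

RTR_PLAIN_TCP_PORT = 323   # IANA-registered port for rpki-rtr over plain TCP
PROTOCOL_VERSION = 0       # RFC 6810; RFC 8210 later defined version 1
PDU_RESET_QUERY = 2        # PDU type number for Reset Query


def reset_query(version=PROTOCOL_VERSION):
    """Build a Reset Query PDU: version, type, 16 zero bits, total length."""
    return struct.pack("!BBHI", version, PDU_RESET_QUERY, 0, 8)


def parse_header(pdu):
    """Split the fixed 8-byte header that every RTR PDU starts with."""
    version, pdu_type, session_or_zero, length = struct.unpack("!BBHI", pdu[:8])
    return {"version": version, "type": pdu_type,
            "field": session_or_zero, "length": length}


print(reset_query().hex())
print(parse_header(reset_query()))
```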
Anyway, there are a couple of public ‑‑ there are a couple of public points that you can use. So you ‑‑ you don't want to even run this software and just do RPKI, this is how you do it. So, this URL exists, that's the port number out of the RFC for plain text. Sorry, this host name exists. This host name exists, with that port number as per the RFC. The RFC explains that the user name should be RPKI, the password should be RPKI, not the topic of discussion of this talk. You know where I could go. This is what you would type into a Cisco router, assuming that you replace this IP address with something up there. And you have the beginnings of your RPKI environment. Of course, you are relying on me running a validator. Don't do that, for obvious, again, security reasons.
Okay. Internally, we have got a dashboard; I am going to run out of time to do this. Internally, we have a graph query language into all the RPKI data, which is what we are using to build additional tools. We will get these out to the rest of the world at some point in time, but it means we can do some pretty deep queries into this. This needs a talk all by itself. Who is familiar with certificate transparency in the web HTTPS world? Right. That's enough people. So, we have done the same thing for RPKI. We are doing the same thing, we are writing things into a Merkle tree, it's a write‑once database, it is the same platform we run certificate transparency on, as do many others, for the web space. The URL is on the bottom there, it's the beginnings of what's there.
I am going to show one other piece of data and then I am going to stop; the rest of the slides are the background on route leaks. How fresh is your certificate? Depending on the RIR you are using, you sort of have a whole different ball game. So this is the valid‑from date, we did this ‑‑ we actually did this earlier in the year ‑‑ this is the valid‑to date. And this nice straight line of blue dots here is the fact that ARIN, when it signs a ROA, gives you ten years and so, therefore, if you sign in this month, you get this month plus ten. If you sign later, you get that month plus ten, a nice straight line. All these other random dots here are different ROAs that we found. I love this part here, when earlier in the year there was a whole regeneration of certificates within the RIPE region, but it's a little all over the map here, and I think this is, again, work in progress, both to detect certificates running out, unless auto‑renew is turned on, and also to understand whether it actually makes sense for certificates to expire in 2050. We have learned an awful lot from the HTTP world but not much has transferred. I am going to stop at this point. The other slides are online and talk about the route leaks and what we found, why RPKI is useful; let's assume RPKI is useful. I have 30 seconds for questions. That was cleverly planned.
JELTE JANSEN: You have a few minutes.
RANDY BUSH: Historically, the reason for plain text as a transport was that's what people had, and it was designed that way, and if you read whatever the bloody RFC number is, it specifically says do not do this over the WAN, do it within a POP. Your situation is validating centrally and transporting globally; if you do that in plain text ‑‑ and you had a slide recommending that.
MARTIN LEVY: Yes.
RANDY BUSH: If you want a nice slide to go with this one, look at the expiration of objects and the expiration of manifests.
MARTIN LEVY: Will do.
RANDY BUSH: There is a mess in there, there is a real disaster in there waiting to happen. Pardon me, it actually happened twice.
MARTIN LEVY: Okay. Yes, the real world.
AUDIENCE SPEAKER: Friso Feenstra, Rabobank. You are running ‑‑ you said you had all these POPs and an internal network. Are you also running RPKI on your internal network?
MARTIN LEVY: Routes that are internal, it's a good question. The routes internal are signed anyway for various reasons, but actually, no, we don't need to. We heavily filter between all our servers and all our routers for other reasons, just paranoia in general, so I think that the only place where we are looking at dropping invalids is towards the outside world so we don't need to validate internally, we only need to validate against ASs that aren't ours. So, although our routes are all signed, the RPKI process does not execute on the iBGP sessions internally.
AUDIENCE SPEAKER: Thank you.
JELTE JANSEN: No more questions. Thank you, Martin.
MARTIN LEVY: Thank you.
JELTE JANSEN: And the final speaker for this session is Massimo Candela, who will be talking about geolocation.
MASSIMO CANDELA: Good morning, I am from NTT. The presentation of today is about IP geolocation, which for what I know, it's not a service offered by NTT but is mostly a private research that I did in collaboration with the University of Pisa, and anyway, it's really nice that NTT supports me and I am here presenting.
So, the reason behind this presentation is, I have been doing geolocation for quite some time, and at some point I just asked myself: Okay, but where are we going? I mean, what is the maximum accuracy that we can achieve with active IP geolocation? And is it worth it with the technology that we have now? And this presentation tries to answer these questions.
So, let's start with some background. What is IP geolocation? You have an IP address and you want to know where the device connected with that IP address is in the world, which is a different question from: you have an IP address and you want to know where the company that owns that IP address is, which is instead sometimes the answer that you get. So, there are various services, and if you query them, many of them are anyway based on datasets, where at some point somebody had to introduce this information, and you can report if something is wrong and it gets corrected. Instead, if you want to have something that tries to calculate the geolocation of IPs in an automatic way, there are various approaches, and among them the most researched, especially in academia, is active geolocation, which is to use latency measurements to find where your target IP is. So, look at the picture here: imagine that you have a platform where there are some devices that you use as sources for latency measurements towards the target that you want to geolocate. The sources are called landmarks, and you know exactly the position where they are. What you can do is collect these latencies and at some point convert them into distances with a factor that is called the speed of Internet. After you do this conversion, you have essentially circles: the landmark says, well, the target is more or less around me within X kilometres. If you have enough of these landmarks you can do like in the picture, you can detect the intersection of the circles and guess more or less where the target is. This is the most accurate among the approaches for active geolocation.
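The circle‑intersection idea can be sketched in a few lines. This is only an illustration under stated assumptions: the grid search, the function names and the sample coordinates are made up for the example and are not the method or data used in the research. Each RTT is turned into a radius (RTT divided by two, times a speed of Internet), and the estimate is the point whose distances to the landmarks best match those radii.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def locate(landmarks, rtts_ms, speed_km_per_ms, rounds=3):
    """Estimate a target position from landmark positions and RTTs."""
    # Each landmark turns its RTT into a radius: one-way delay times speed.
    radii = [rtt / 2 * speed_km_per_ms for rtt in rtts_ms]

    def cost(p):
        # Squared mismatch between candidate-to-landmark distance and radius.
        return sum((haversine_km(p, lm) - r) ** 2
                   for lm, r in zip(landmarks, radii))

    # Coarse 2-degree grid over the globe, then zoom in around the best cell.
    grid = [(lat, lon) for lat in range(-88, 90, 2)
            for lon in range(-180, 180, 2)]
    best = min(grid, key=cost)
    step = 0.2
    for _ in range(rounds):
        grid = [(max(-90.0, min(90.0, best[0] + i * step)), best[1] + j * step)
                for i in range(-10, 11) for j in range(-10, 11)]
        best = min(grid, key=cost)
        step /= 10
    return best


# Hypothetical landmarks (roughly Amsterdam, Rome, Madrid, Warsaw) and a
# target near Zurich; the RTTs are synthetic and noise-free.
landmarks = [(52.37, 4.90), (41.90, 12.50), (40.42, -3.70), (52.23, 21.01)]
target = (47.38, 8.54)
speed = 72.0  # km per ms
rtts = [2 * haversine_km(target, lm) / speed for lm in landmarks]
estimate = locate(landmarks, rtts, speed)
print(estimate, haversine_km(estimate, target))
```

With real measurements the radii are noisy and usually too large, so the circles overlap in a region rather than meeting at a point; that is why the choice of the speed factor matters so much.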
Of course on top of this you can add all the refinement and whatever but we are not interested in this for now.
So, in this presentation, we are not going to describe any geolocation system or any geolocation algorithm, and we are not going to compare any geolocation method or dataset. We will only deal with IP addresses of the Internet infrastructure, which means we will not try to detect the location of laptops or client devices in general but only, for example, servers in data centres or anyway close to the core of the network. The results are based, and possibly biased, on RIPE Atlas, which is a measurement network, for three reasons. The first is that it's the biggest one in terms of nodes and you can use it for free, which is really good. The second is that it has been reported multiple times that the roundtrip time collected is really reliable and stable, which is really good for this purpose. And the third reason, probably the most important, is that this platform has two types of devices from which you can do the measurements. Some are called RIPE Atlas probes, which you can host at home on your home connection, and I suggest you do that, you will help the community. The others are the anchors, one‑unit servers mostly in data centres. The two are essentially in different topological positions, and they represent a different effort in outreach and in deployment, because the anchors have requirements ‑‑ there are parameters for connectivity and blah‑blah‑blah.
So what we want to do is calculate the maximum theoretical accuracy that we can achieve worldwide and for each region, and we want to study the effects of doing measurements with both these devices, the probes and the anchors: do they produce a different speed of Internet, do they reach the target with a different number of hops? Also, we want to quantify the accuracy when using different amounts of landmarks, so using different amounts of sources for the measurements. We also want to verify the accuracy if, instead of geolocating everything on the surface, we focus only on where there is Internet infrastructure, and we will see what that means. And we also want to report back to the community parameters for a possible implementation that somebody may do.
So, big disclaimer here: since yesterday a lot of the presentations were about IPv6, I thought about giving you a break, I will go back to the old classic IPv4. The real reason is that, unfortunately, the coverage of IPv6 in home connections, so especially for the RIPE Atlas probes, is not there yet, so there are not enough RIPE Atlas probes with IPv6, and this is not a weakness of RIPE Atlas, it's just that there is mostly no IPv6 deployment in some regions. When I say IP I mean IPv4 in this presentation.
For the dataset we did 23 million ping measurements, and the targets of these ping measurements were all the RIPE Atlas anchors and the NLNOG RING nodes, which is a good platform. The sources used were all the RIPE Atlas probes and anchors, but we divided them into two datasets: one called infrastructure, when the measurements start from an anchor, so from the core of the network, and one called edge, when the measurement starts from a probe. We did this division because we want to analyse this data separately.
This is the coverage of RIPE Atlas on the left and this is the coverage instead of NLNOG RING nodes; each of these dots is one of the devices deployed. Briefly, when you start with a geolocation approach, the first thing you have to do is calculate the distance to the target, and to calculate the distance you take the roundtrip time from the ping measurement and derive the one‑way delay, which in general is basically the roundtrip time divided by two. That is an approximation which will introduce errors, but you can trust me, this error is not even the biggest one in the entire process. Doing roundtrip time divided by two is basically supposing that you have the same latency both ways, and you are not even considering how much time the target will take to generate the reply, so it is, indeed, an approximation.
But it's the only way. The other approach would be to calculate the proper one‑way delay, but you need devices with GPS on board, and at that point why are you doing IP geolocation at all?
So, after you have the one‑way delay, you convert that to a distance by multiplying it by the speed of Internet. And this is going to introduce the biggest part of the error. This is because, in all the projects that I saw, the speed of Internet is almost always a single value for the entire world, and it's going to be two‑thirds of the speed of light or four‑ninths, which is supposed to be the propagation speed in fibre, which is indeed an upper bound. At the same time, another problem with the speed of Internet is described in this picture: it doesn't take into consideration the reality of the network, it just considers the distance on the surface, as if there were a direct straight wire, which is not the case. But indeed, after you have calculated the distance, what you can do, if you use a dataset like the one we used, where you know exactly where the targets are, even if you are pretending you don't, is compare the calculated distance with the real distance, and after you compare you obtain an error, and this is an important part for the rest of the presentation.
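A small worked example of that conversion and of the resulting error follows. The RTT and real distance are hypothetical values chosen for illustration; the 72 km/ms figure is the infrastructure value quoted in the talk, and the function names are made up for the sketch.

```python
C_KM_PER_MS = 299.792458  # speed of light in vacuum, km per millisecond


def estimated_distance_km(rtt_ms, speed_km_per_ms):
    """One-way delay (RTT / 2) multiplied by the assumed speed of Internet."""
    return rtt_ms / 2 * speed_km_per_ms


def distance_error_km(rtt_ms, speed_km_per_ms, real_distance_km):
    """Estimated minus real distance; positive means an over-estimate."""
    return estimated_distance_km(rtt_ms, speed_km_per_ms) - real_distance_km


# Hypothetical pair: 20 ms RTT between two hosts that are 700 km apart.
for label, speed in [("2/3 c", 2 / 3 * C_KM_PER_MS),
                     ("4/9 c", 4 / 9 * C_KM_PER_MS),
                     ("72 km/ms", 72.0)]:
    print(label, round(distance_error_km(20.0, speed, 700.0), 1))
```

For this made‑up pair, both fractions of the speed of light place the target hundreds of kilometres too far away, which is the over‑estimation the talk describes when a fibre upper bound is used as if it were a typical value.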
So, but what about the accuracy? Okay, well, before going to the accuracy, these are some partial results. We calculated the speed of Internet based on this dataset, where we know where the various sources and destinations are, and we obtained that for the infrastructure, so for the measurements done from devices in the core of the network, we have around 72 kilometres per millisecond, and instead 67 for the edge dataset. These are, first of all, pretty far from the fractions of the speed of light commonly used, and, at the same time, there is a difference between the two that is expected because, as I said, there are differences in the topology. To evaluate the difference in topology a bit more, we analysed traceroutes generated by RIPE Atlas in the same time range, and we detected that in general the RIPE Atlas anchors reached the target with three hops fewer than the probes, so indeed there is a topology difference, and this may be the reason that impacts the calculation of the speed of Internet value.
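The talk does not say exactly how the speed of Internet was derived from the dataset. One plausible estimator, shown here purely as an assumption, is the median of real distance divided by one‑way delay over all source‑target pairs with known positions, computed separately per dataset or region. The sample rows are invented.

```python
from collections import defaultdict
from statistics import median


def speed_of_internet(samples):
    """samples: iterable of (distance_km, rtt_ms); returns km per ms."""
    # Real distance divided by the one-way delay (RTT / 2), then the median.
    return median(d / (rtt / 2) for d, rtt in samples if rtt > 0)


def speed_by_group(rows):
    """rows: iterable of (group, distance_km, rtt_ms), e.g. group = region."""
    grouped = defaultdict(list)
    for group, distance_km, rtt_ms in rows:
        grouped[group].append((distance_km, rtt_ms))
    return {group: speed_of_internet(s) for group, s in grouped.items()}


# Invented measurements for two hypothetical regions.
rows = [("A", 720, 20), ("A", 360, 10), ("A", 1000, 40), ("B", 210, 20)]
print(speed_by_group(rows))
```

The median is used here only because it is robust to the occasional very circuitous path; a mean or a regression slope would be equally defensible choices.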
So what about the accuracy? To calculate the maximum accuracy, we used the Cramér‑Rao lower bound, which is used for localisation systems in general, including GPS. Fortunately, on my side I had a professor who did previous work with the Cramér‑Rao bound. It defines the lower bound of the root mean square error, so you can calculate the minimum root mean square error that is achievable. So ‑‑ I think there is a better way, at least for me, to explain this, which is this 3D image. This is, I would say, the main result, the most useful one of the entire research: this 3D model where you have latitude and longitude on the X and Y axes, and that is the surface of the earth, and on the Z axis higher means more error and less accuracy. On the surface you have all these dots, which are the landmarks used, which is not really important information for now. The Cramér‑Rao lower bound is computed as this continuous surface, a map on top of the entire surface of the earth. So if you would like to analyse this data, the best way would be to pinpoint a specific location on the surface of the earth and ask: what is the minimum error that I can get there? What is the maximum theoretical accuracy I can get there? Why is it theoretical? Because this is continuous over the entire surface, and it doesn't mean we have any target there, so this includes whatever part of the earth; it is a model that describes all of it.
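For readers who want the formula behind the surface, a common textbook form of the bound for distance‑based positioning is given below. It is a sketch under assumptions the talk does not spell out: a flat two‑dimensional approximation, landmarks at known positions l_i, and independent zero‑mean Gaussian distance errors with standard deviation sigma_i. The symbols are introduced here for illustration only.

```latex
% Fisher information at a candidate target position p, given N landmarks:
J(\mathbf{p}) = \sum_{i=1}^{N} \frac{1}{\sigma_i^{2}}\,
                \mathbf{u}_i \mathbf{u}_i^{\top},
\qquad
\mathbf{u}_i = \frac{\mathbf{p} - \mathbf{l}_i}{\lVert \mathbf{p} - \mathbf{l}_i \rVert}

% Lower bound on the root mean square error of any unbiased estimate of p:
\sqrt{\mathbb{E}\,\lVert \hat{\mathbf{p}} - \mathbf{p} \rVert^{2}}
\;\ge\;
\sqrt{\operatorname{tr}\!\left( J(\mathbf{p})^{-1} \right)}
```

Evaluating the right‑hand side at every point of a grid gives a surface of the kind shown in the 3D image: the bound is low where many landmarks with small distance errors surround the point from different directions, and high where landmarks are few, far away or all on one side.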
If we look at that from the top as a sort of heat map, and again use only one speed of Internet, which we will see is not the best solution, for the infrastructure dataset we have various parts of the world that are not even covered, because it's white. There is, of course, Europe and some parts of North America. If you go to the edge dataset, so if you use the probes, we have almost total coverage, and we have indeed parts that you can geolocate with below 10 kilometres of error: of course Europe especially, North America and some parts of South America and South Africa. So this table here is a kind of summary of what the trend is as the amount of landmarks used changes. I have to explain a bit what you see here: the accuracy values are the median of values sampled over the entire surface, which includes whatever, including the woods or whatever was there in the pictures before. So the values here can also be high. If you use 10 landmarks, so we take 10 random landmarks and try to geolocate the entire world with them, we manage to cover only 1.3% of the surface of the earth, in both datasets, and we have an error that is more than 600 kilometres, with the edge dataset, the one with the probes coming from home connections, having 40 kilometres more of error, which is expected. If we reach 100 probes we see the difference of error between the two datasets getting smaller, but the infrastructure dataset still performs better in terms of coverage, it has 8% more coverage.
When we reach the last row, which means we use all of the landmarks in the dataset, we see clearly that the edge dataset now wins instead, and has 78 kilometres of error in the geolocation process and 91.1% coverage of the entire surface of the earth. Each of these rows has been computed 50 times, so that they were reliable numbers. And anyway, the idea is to show the trend, so the conclusion here is that the infrastructure dataset, so using devices close to the core, produces better results, but if you have a much higher number of landmarks you will, in the end, have better results.
So now, what we did here instead is, we just threw away that part and recalculated this based on each specific region, so each measurement starts and ends inside the region. And if you look at the red ‑‑ no, too much, sorry. If you look at the red rows here, they are the speed of Internet, and before we were using 72 for the infrastructure and 67 for the edge. But we see here that there are completely different numbers depending on the region. Some have 100, even, and some have like 21, so there is a huge difference among the regions, and if you use a single speed of Internet for the entire world you will over‑estimate in some parts and under‑estimate in other parts, and this is what is commonly done today.
So, another thing that you notice is the average number of hops. There are some regions where they reach the target with, like, 10 hops, and others where they reach it with 17, and you may think that this is due to the distance between the end points, but it's not true, because if you compare, for example, the average geographical distance of Europe and the other regions, for example the Middle East, you will see that some of these regions have a similar geographical distance and a really different amount of hops. And another thing that you can see is that, again, the infrastructure dataset works better compared to the edge one but has poor coverage, and this is the reason why we have only three regions instead of all seven: we don't have enough measurements for the other ones.
These are the results for Europe. This time, these are done with the specific speed of Internet for Europe, calculated for Europe. As you can see, basically, Europe, we cover all of it and we manage to geolocate almost all of it with below 10 kilometres of error or anyway with really low error, and you can see in this chart at the bottom where the red line is the infrastructure dataset and the blue one is the edge dataset, that the infrastructure one is constantly a bit better but has a poor coverage, it doesn't reach the same extent of the edge one, of the probe one and this is also why in the infrastructure one you have nice blue in the centre but a lot of yellow around.
If you go and do the same thing for North America, we have definitely more yellow, but in the centre of America we can geolocate everything below 100 kilometres, and there are the east coast and the west coast, where also in that case we have a lot of areas with below 10 kilometres of error.
In this case, the infrastructure dataset works much better than the edge one but has even lower coverage than the one for Europe. If you look instead at Asia, the infrastructure dataset is not usable at all, and with the edge one we have some coverage, but in general the biggest portion of the region is way above 200 kilometres of error.
So, now, the question is, as I said: this analysis creates this model, which tries to establish the maximum theoretical accuracy on the surface, wherever it is, even if you don't have a target there or will never have anything to geolocate there. But what if, instead of doing the median of all these results, we just focus on where we have infrastructure? So what if we remove one node from the pool, one infrastructural node, try to geolocate that one, and repeat that for all the nodes? Well, if you do that ‑‑ the first table is using again a single speed of Internet, the one we calculated at the beginning for the entire world, and even in this case, which is not the best case, the results are much improved, especially when we reach at least 50 landmarks. The error is almost half of the one before, and in particular, if we reach the last row, where before it was 250 and 78 kilometres of error, we have 6 and 3 kilometres of error if we focus only on the infrastructure.
And now what we did is the same thing, but focusing on each single region by using the speed of Internet calculated for each region. In this case, this table is what it was before, so the maps that I showed you before are summarised here, which is anyway a great summary to have, and we can see that almost all of them are below 200. The other one that we have instead is the non‑uniform case: when we focus on the landmarks, many of them are below 30 kilometres.
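The remove‑one‑node‑and‑geolocate‑it procedure is a leave‑one‑out evaluation, and its skeleton is short. In the sketch below the estimator is pluggable; the `centroid_locate` function is a deliberately naive placeholder so that the example runs on its own, not a real geolocation method, and the node coordinates are invented.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def leave_one_out_errors(nodes, locate):
    """nodes: dict name -> (lat, lon) of infrastructure with known positions.

    locate(target_name, landmarks) must return a (lat, lon) estimate using
    only the landmarks it is given; the held-out node is never among them.
    """
    errors = {}
    for name, true_position in nodes.items():
        landmarks = {n: p for n, p in nodes.items() if n != name}
        errors[name] = haversine_km(locate(name, landmarks), true_position)
    return errors


def centroid_locate(target_name, landmarks):
    """Placeholder estimator: ignores measurements, returns the centroid."""
    lats = [p[0] for p in landmarks.values()]
    lons = [p[1] for p in landmarks.values()]
    return (sum(lats) / len(lats), sum(lons) / len(lons))


# Invented nodes on the equator, two degrees of longitude apart.
nodes = {"a": (0.0, 0.0), "b": (0.0, 2.0), "c": (0.0, 4.0)}
errors = leave_one_out_errors(nodes, centroid_locate)
print(errors, median(errors.values()))
```

In a real run, `locate` would look up the RTTs measured from each remaining landmark towards the held‑out node and apply the latency‑based estimate; the loop and the error summary stay the same.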
So, quick conclusion:
Geolocation is feasible in terms of active measurement and also in terms of precision, especially if you look at specific landmarks. But using a single value for the speed of Internet is a terrible idea, especially if you use two‑thirds or four‑ninths of the speed of light. The speed of Internet is strictly related ‑‑ we found it is correlated to the average hop length, which is the source‑target distance divided by the number of hops, and it depends especially on the region that you use and the type of landmarks, and we provide some numbers for that. The infrastructure landmarks are slightly more precise than the edge ones, but in the end the number of landmarks is what will make the difference. In a real implementation, a real geolocation context, the amount of landmarks used is going to be really important, so having fewer but more precise landmarks can still be an important factor.
And that's all for my presentation, and if you have questions, this is the moment.
AUDIENCE SPEAKER: Hi, Brian Trammell, Google, speaking in a personal capacity, specifically as the author of one of the previous bits of research in this area. Thank you very much for this talk, this is really cool, there are a few really interesting sort of refinements on the work there. I find the two most interesting ones to be not trying to make an assumption about a speed of Internet that is global, and the other one is quantifying the error. I had a couple of questions. One: you are basically doing this speed of Internet calculation on a regional basis. Have you looked at maybe slicing it up in different ways? So I found, if you go to the speed of Internet for ‑‑ one of the tables shows that on a regional basis there are huge variations here. And I think what you are capturing there is sort of a difference in deployment patterns. It's 100 kilometres per millisecond in Asia because you are all fibre along coast lines and not much in the interior; I have no idea what is going on in the Middle East. Have you looked at sort of slicing these things up in different ways, not regionally but maybe based on technology ‑‑ you can use the AS as a proxy for how the deployment works, if you look at slicing it up differently.
MASSIMO CANDELA: We are working on that. As you remember in the last MAT Working Group I did analysis about the Middle East and we are working on doing it for all the regions and to try to drill down a bit more and try to understand if there are difference even among countries and what it can be, even technology or whatever.
AUDIENCE SPEAKER: One more quick question. So you are quantifying this based on the median RMS error. Can you talk a little bit about the intuition, why you chose that and not the maximum? Maximum or 95th percentile there, as opposed to median: you are trying to put a bound on it, and the median doesn't seem the right way to bound it.
MASSIMO CANDELA: The median that is used here ‑‑ you mean in the table of the error ‑‑ the median that is used there is mostly to aggregate the result, because the Cramér‑Rao bound gives you the entire surface, and we sample that and afterwards we collect the median ‑‑
AUDIENCE SPEAKER: It's in the spatial domain. Cool. Thanks.
AUDIENCE SPEAKER: Michael Richardson. I am curious about the effect of CGNs on your collection. Do you wind up with a scenario where you are really localised ‑‑ sending probes out from the Atlas probes and go through the CGNs but that path from the probe to the CGN could be constant for a large variety of actual locations? And I am wondering does this matter to your research or really, because you are only going to infrastructure where you presume they are not behind something.
MASSIMO CANDELA: The only infrastructure that we use, the only nodes that we use for infrastructure, are the RIPE Atlas anchors and NLNOG RING nodes, and we didn't do anything specific for any provider.
AUDIENCE SPEAKER: So you are not going backwards to the stated location of the Atlas probe?
MASSIMO CANDELA: No, the Atlas probes are only used as a source and we only, and for doing the calculation, we only need one way.
AUDIENCE SPEAKER: But you do note where the Atlas probes start from so you can measure the distance.
MASSIMO CANDELA: In general the Atlas probes are in the edge so we care they are in home connection. We care they are also in the same region of the target when we do the regional results but we don't care anything else.
AUDIENCE SPEAKER: Okay, so the situation, for instance: the first Layer 2 hop in most of Ontario, Canada, is Toronto, your next ‑‑ first Layer 2 hop is Toronto, and so it seems like it ought to be doing something to the data if you had many Atlas probes that were in that area, that I would think would change things, but you are measuring infrastructure where you know where it is, so it seems a bit like it might be a weird problem. I am just wondering, when you get enough v6 probes will you do this again?
MASSIMO CANDELA: That was the idea, to do a comparison between v6 and v4. When we have v6 we will definitely do more, but for now the difference is pretty big, so I don't see this happening in the near future, honestly.
AUDIENCE SPEAKER: Asteroid. I have ‑‑ thank you very much for your very enlightening presentation. I have two questions. Now that you have sort of quite neatly done a lot of math to resolve the margin of error around speed of Internet and the uncertainty of distances, what ‑‑ I mean, you very quickly jumped over sort of the initial measurement, which is the measured roundtrip between nodes, and then divide up by two. How ‑‑ what kind of statistics do you use on that roundtrip that you measure and do you measure those on different times of day to accommodate for network congestion and other badness?
MASSIMO CANDELA: Thank you for the question, that's a great question. I probably skipped that, but we did ten ping measurements, ten attempts of the same measurement over different hours for each source‑target pair, and we collected the minimum one of the set. Why ten? Ten is not a random number, because at the beginning we created a small subset of measurements and did hundreds of them, one every hour, hundreds of times, and at some point we analysed the series, and if you take only the first ten instead of taking all of them you will have a difference of only 2.5% in terms of roundtrip time, so in general, after you have ten samples the others are going to repeat, more or less. So to reduce the load on our side and on the Atlas side, we said 2.5 percent of variation is acceptable and we will use only 10 from now on. That's what we did.
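The minimum‑of‑ten rule and the check that justified it can be expressed compactly. The sketch below uses invented RTT values and treats `None` as a lost ping; the function names are illustrative. It measures how much higher the minimum of the first ten samples is than the minimum of the whole series.

```python
def min_rtt(samples_ms, n=10):
    """Minimum RTT over the first n samples, skipping lost pings (None)."""
    kept = [s for s in samples_ms[:n] if s is not None]
    return min(kept) if kept else None


def relative_gap(samples_ms, n=10):
    """Relative excess of the first-n minimum over the full-series minimum."""
    full_min = min(s for s in samples_ms if s is not None)
    return (min_rtt(samples_ms, n) - full_min) / full_min


# Invented hourly RTT samples (ms) for one source-target pair.
series = [30.2, 29.8, 31.0, None, 29.5, 30.0, 29.9, 30.4, 29.7, 30.1,
          29.6, 29.0, 30.5]
print(min_rtt(series), relative_gap(series))
```

Taking the minimum rather than the mean is the usual choice for this purpose, since queueing and congestion can only add delay on top of the propagation time that the distance estimate depends on.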
AUDIENCE SPEAKER: So you do ten measurements quickly after each other and do that multiple times a day?
MASSIMO CANDELA: No, we did ten measurements, one each hour, every hour, for ten hours.
AUDIENCE SPEAKER: But I mean, if you only do a single measurement, don't you then end up in sort of potential Layer 2 discovery, ARP stuff, that you then have to accommodate for?
MASSIMO CANDELA: But this is for a single pair, but we tried to spread all the other pairs in a way that we could not ‑‑ we did over a week, an entire week, spread as much as possible even among different regions in different pairs to avoid overload.
AUDIENCE SPEAKER: Well, it's not the overload. Quite simply, if you do a ping, the first packet is going to be slower if it's a new host; the second, third and fourth packets will give a lot more reliable minimum. But I mean, in the interests of time, moving to my other question: looking at your speed of Internet for various regions, how can you correlate this to how efficient the interconnection is in those particular regions between networks? Because, I mean, the Middle East of course sticks out; with my background I know a lot of them do not interconnect inside the Middle East, but they all prefer to go to Marseille or somewhere else that's very, very far away, and you will typically not find a path that actually goes direct between two countries. Is that a fair assessment of why the speed of Internet is so low, or is there something else?
MASSIMO CANDELA: It's actually a perfect assessment; it is exactly the MAT Working Group presentation that I did last time. We compared in particular the Middle East and Europe and, yes, the speed of Internet is related to the circuitousness, of which the Middle East has a lot: they don't peer among countries in the Middle East and always go to other countries, in particular Germany, and also other European countries, and also the US and Russia. That is one of the reasons; essentially it's circuitousness, which can be due to the general topology of the network or to a specific social situation that will, let's say, prevent particular peering among countries in the same region. But indeed, it's not directly an assessment of the quality of the network, it is an assessment of the circuitousness and of the distances compared to the number of hops.
AUDIENCE SPEAKER: Are you aware of any resource that you can get sort of an historic record of this speed of Internet for these regions? I think that would be very interesting to see.
MASSIMO CANDELA: No, I don't have for now, for what I know I actually ‑‑ we were the only one for now to calculate this for all the regions separately, or at least I didn't find anything available around but indeed it would be nice to create a process that, I don't know, twice a year would calculate this based on RIPE Atlas data and store that. I will try to propose that to the RIPE Atlas team.
OSAMA I AL‑DOSARY: Thank you, Massimo.
So that is this portion for today. We will be back at 11:00. A quick reminder to please rate the talks. And one other reminder is we have two seats coming up in the Programme Committee; if you are interested or you know someone who is interested, please send your nomination to pc [at] ripe [dot] net with a short bio, before 3:30 today. Thank you very much.
LIVE CAPTIONING BY AOIFE DOWNES, RPR