MAT Working Group
17 October 2019
2 p.m.

BRIAN TRAMMELL: Hello everybody. Welcome back from lunch. I am the co‑chair of the Measurement, Analysis and Tools Working Group, together with Nina, who is operating our timer. First of all, thank you very much to the scribe from the RIPE NCC.

So, we have a relatively full agenda today. Four presentations on various bits of measurement and measurement results, then an NCC tools update, and then, just yesterday, we as the chairs decided that we were going to add our own any other business to the agenda: some discussion about the advisory role of this Working Group with respect to the tooling that is maintained and operated by the NCC.

So, this is like half a proposal, but mainly a question from the chairs. Since Nina and I have been chairing, this Working Group has operated primarily as a venue for the dissemination of information about measurement methodologies, tools and results of interest to the RIPE community. So, before I became a co‑chair, when I was an enthusiastic attendee of MAT, it was very much sort of the RIPE Atlas results Working Group, which was very useful. We have some of that: I think there is at least one talk, maybe two, where Atlas is the source of data. It's also been a good place to land measurement talks from the Academic Cooperation Initiative. But in general, it's just sort of a venue for: here is some interesting measurement, let's have a look at these results and talk about them.

It turns out, especially given that this is a user community for, you know, RIPEstat, for RIPE Atlas and so on, the tools that are built and operated by the NCC, that to some extent the discussion here already exists as advice to the NCC about how those tools are used or how they could be used in the future.

A question that I want to have some discussion on later, so at the end, after our technical content today, is: should we be more explicit about making time for this in the agenda, and should we be more explicit about doing this on the mailing list?

So think about that. We will come back to it after the technical programme. And thank you very much. And with that, I'd like to go ahead and introduce Sarah Wassermann, and we can start with decrypting quality of experience in an encrypted Internet.

SARAH WASSERMANN: Good afternoon. Today I will talk about how we can use ‑‑

So, first of all, a little bit of context about QoE monitoring, and in this talk I will particularly focus on QoE. We can monitor it on different levels.

First of all, on the service level, on the application level itself, where we can extract metrics such as the number of stallings or quality switches. Of course, we can also monitor QoE metrics on the end‑user device: we can, for example, measure whether the user watches the video until the end or not.

And as the video has to transit through a network in order to end up on the user's device, we can also monitor metrics on the network level, and there we can, for example, extract packet loss or throughput and so on.

After this investigation, we can also explicitly ask the user for feedback, whether she liked her experience or not, and all the gathered information can then be fed into a QoE monitoring system.

And why is QoE so important today? Today more users also use the Internet for entertainment purposes, and in particular for video streaming, and the users want a flawless experience. They do not want any buffering or unnecessary quality switches. Therefore, ISPs need to deliver high quality services in order to prevent users from switching to other ISPs, and of course they also want to attract new users.

So this is why ISPs need to integrate QoE into their network management systems.

Bad QoE does not only impact individual users but can also have consequences for large companies. Good examples of that are Google and Amazon. For Amazon, each additional 100 milliseconds can cost 1% of their sales, and if a page takes an additional 400 ms to load, roughly 9 percent of the users browse away without waiting for the page to load to completion.

For Google, if the search result extraction takes an additional 400 ms, that can cost approximately 8 million searches per day. This can have a tremendous impact on services such as Google Ads.

And today, we face an additional challenge in the Internet, which is end‑to‑end encryption. Indeed, HTTPS and QUIC make the traditional inspection of packets unusable, and this is why ISPs lack visibility. So here we can propose two solutions.

One is to monitor QoE directly at the end‑user device; the other is to monitor at the network core.

Each of these solutions comes with its own pros and cons. For monitoring at the end‑user device, we have the huge advantage that we can directly monitor application metrics, so what I was mentioning before, for example buffering, quality switches and so on; these metrics are directly linked to user QoE. However, these metrics are very hard to extract from official video apps, and we therefore need, for example, root access to the user's phone and our own video player.

On the other hand, when we want to monitor at the network layer, we have the advantage that network layer metrics are very easily extractable, especially when using the Android API. However, we need more complex approaches to link these network layer metrics to user QoE.

This is where our work comes into play, because we want to answer the question: can we infer video quality metrics by only relying on network layer metrics? In particular, what we wanted to do was infer session‑level QoE metrics on YouTube mobile. We were using the YoMoApp application, which is freely available on the Google Play Store; while the user is watching a video, it monitors network layer features and also application layer metrics. So while the users were watching a video, we were extracting the network layer features; at the end of the session we fed the computed features into a machine learning model, in our case a random forest, and then we were inferring session quality metrics. Our dataset is composed of more than 3,000 video sessions collected from almost 400 end users all over the world. We were extracting almost 300 network layer features, but what we saw is that, with feature selection, only around 30 features were already enough to have very high accuracy when inferring video quality metrics.

And so, on this slide, you can now see exactly the metrics that we were inferring. So all the different metrics were related to number of stallings, quality switches and rebuffering ratio.

And here on this slide, you can see the results. The results for the three targets were kind of encouraging. As you can see, for the quality switches and the rebuffering ratio, we ended up with a high true positive rate and a very low false positive rate. However, the number of stallings still remains quite challenging.

And now we wanted to take our work to the next level. So instead of predicting on a session level, we wanted to predict quality metrics on the fly, so in a stream‑based fashion. In particular, what we wanted to do is to predict video quality metrics at each one‑second time slot. Here we were now focussing on YouTube desktop. Again we were only relying on network layer features, and we had roughly 200 of them. They were subdivided into three groups. First of all, we had the snapshot features, which characterised the one‑second time slot that we are currently in, so the one for which we now want to infer the quality metrics. Then we had the trend features, which characterised the last three seconds of the video, and then we had the cumulative features, which characterised the whole session from the beginning until the point we are at in the video now.

And in this work, we were benchmarking a whole set of different machine learning models, and we saw that for each of the targets, the tree‑based models worked the best.

Feature selection showed us that, for inferring these video quality metrics, the cumulative features, so the features that were collected over the whole session up until that point in time, worked the best.

So just a couple of things about the dataset. As I was already saying, we were focussing on YouTube desktop this time, and our dataset was composed of more than 15,000 video sessions. They were chosen completely randomly, so the videos were not extracted from a fixed subset of videos. Also, we recorded our sessions in diverse conditions, sometimes on Wi‑Fi, sometimes on cellular networks, etc.

So, here you can again see the different metrics that we were predicting. This time they were related to the resolution, so here we wanted to predict the exact resolution class, 144p, 240p and so on; the average bit rate, which was a regression problem; and then we had a binary classification problem, to see whether the video is stalling at that given moment or not.

Here on this plot you can see the best results we got for each of these prediction targets. As you can see on the upper left plot, the scores for the resolution were quite high for each of the different classes. The same goes for the average bit rate, where we always have a very low error, very often close to 0. And the metrics for the stallings are also quite encouraging, but again, it's the most challenging problem here.

And now, as you maybe already know, there is already a lot of previous work in the literature about QoE inference. I want to say a couple of words about what we're doing differently.

We put a lot of focus on deployment. We wanted our system to scale and also to be able to run on restricted hardware. So our features are computable in constant time, also with constant memory usage, and we also do a very fine‑grained analysis. As I was saying, we do the analysis on a one‑second time slot, while previous work very often does it on a five‑second or ten‑second scale.
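A minimal sketch of what constant‑time, constant‑memory features over one‑second slots could look like. This is not the speakers' code: the class name, the use of per‑slot byte counts and the three example features are assumptions chosen only to illustrate the snapshot, trend and cumulative groups mentioned in the talk.

```python
from collections import deque


class StreamingFeatures:
    """Illustrative per-slot features with O(1) update cost and bounded memory."""

    def __init__(self, trend_slots=3):
        # Only the last `trend_slots` values are kept, so memory stays constant.
        self.recent = deque(maxlen=trend_slots)
        self.total_bytes = 0
        self.slots_seen = 0

    def update(self, slot_bytes):
        """Consume the byte count of one one-second slot and return its features."""
        self.recent.append(slot_bytes)
        self.total_bytes += slot_bytes
        self.slots_seen += 1
        return {
            # Snapshot: the current slot only.
            "snapshot_bytes": slot_bytes,
            # Trend: mean over the last few slots (three by default).
            "trend_mean_bytes": sum(self.recent) / len(self.recent),
            # Cumulative: mean over the whole session so far.
            "cumulative_mean_bytes": self.total_bytes / self.slots_seen,
        }


def features_for_session(per_second_bytes):
    """Run the tracker over a whole session, one slot at a time."""
    tracker = StreamingFeatures()
    return [tracker.update(b) for b in per_second_bytes]
```

Each call to `update` touches a fixed amount of state, so the cost per slot does not grow with the length of the session.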

And also, we are solely relying on packet‑level information, which is a huge advantage as, for example, we do not need chunk detection, which is very error prone and very difficult to get.

And to wrap up this presentation, I just want to say a couple of words about future work. We want to move towards proactive QoE traffic management, because for now, what we were doing is predicting QoE metrics in the current time slot. But if ISPs want to avoid QoE degradation, they need to take action in advance. They need to predict QoE degradation now, and see what happens in, for example, five or ten seconds.

So this is what we want to do. And also, we can already get some useful information about QoE by monitoring user behaviour; for example, a user who is scrubbing the whole time might experience some rebuffering in the video. So an interesting direction to take is to also predict user behaviour, again only through encrypted traffic.

So that was all from my side. Thanks a lot for listening and I'm now ready to take questions.


BRIAN TRAMMELL: Any questions?

AUDIENCE SPEAKER: Hi. Chris Woodfield, I work for Salesforce. I am here representing the ARIN AC, but my question has really nothing to do with that. You mentioned an issue with not being able to do chunk detection in video streams, presumably because the payload is encrypted and you will not be able to see the HTTP GETs. I'm speaking speculatively, but I would wonder if you could possibly detect chunks just via traffic pattern analysis: if you are receiving a video stream, you know, most of the responses are going to be just TCP ACKs, which is obviously encrypted but has a certain traffic pattern, as opposed to a full‑sized HTTP GET request that would precede every chunk in HLS. Is that something that was attempted and failed, or considered?

SARAH WASSERMANN: Yes, so we tried to do this detection as you were saying, by looking at the different packets and the inter‑arrival times as well. But we saw that it was quite error prone, even if sometimes it worked quite well, and another disadvantage of chunk detection is that it takes time. As we want to be as lightweight as possible, besides being a bit error prone, it is also a bit more complex to do, and so this is why we didn't do chunk detection: to avoid the errors and to be more lightweight.

AUDIENCE SPEAKER: So possible but not reliable enough for your purposes.


BRIAN TRAMMELL: All right. Thank you very much, Sarah.

STEPHEN STROWES: Good afternoon. My name is Steve and I am a researcher in the R&D department at the RIPE NCC. This is essentially a bit of a two‑sided talk. I'm going to be talking today about some of the work that we kicked off at CAIDA over the summer, which is twofold. One part is picking up BigQuery and learning how to use BigQuery against the traceroute data that we collect, and the other part is: what do we see in the data itself?

So, a quick note on the three platforms that I'm going to be focussing on in this talk. One is RIPE Atlas. I'm sure all of you are well aware of what RIPE Atlas is. We collect a lot of measurement data: we collect millions of traceroute results every single day, about 20 gigabytes of compressed JSON every single day, and we offer results via the public API; most of the data is publicly available. But we don't necessarily always make it easy for you to look at the whole dataset. We'll give you easy access to particular measurement IDs, but if you want to carve up the data differently, it's not always easy for you to do so.

Beyond that we will give you the raw data. So we will give you JSON but we won't offer a platform to do analysis. That part is definitely left to you.

CAIDA's Ark platform, which many of you are also familiar with, has some similar goals. It's a little bit more research focused. It's more geared towards strictly understanding the topology of the actively measured Internet. They have a slightly smaller set of probes, but they much more aggressively hit the BGP table to try and understand what's actually out there.

Their dataset is mostly public, especially after a little while. I think for the most recent data you have to get privileged access. And then the BigQuery platform, for data ingestion, data warehousing and data analysis, is what we have been pouring this data into and trying to learn how to use. Now, they describe this as a data warehousing platform, and when I think of warehousing I think of long dark aisles and, you know, things stored away forever and never retrieved, right? And it's not quite as old and slow and dusty as that. The language and the interface they put on top of it allow you to get at the data and query the data, in some cases alarmingly quickly. Our goal in exploring this is to form a feedback loop for ourselves, to try and understand: for this kind of data, is this a useful platform that we want to use? Is it cost effective? Or conversely, is it not? We can pour data in here and determine that it's not good.

Google BigQuery is a Google‑scale kind of product. What I mean by that is that the volumes of data that we're talking about are like the fly on the elephant's back. We can throw data in there and they don't flinch. The way that the service is modelled is that you generally never need to think about hardware limitations underneath. You never have to ask for capacity, disk space, more anything; the service just expands into what you need, and the billing on that is reasonably sane. Like, I generally haven't been caught out that badly.

Queries are reasonably quick. They are not real‑time quick: you don't get responses back from BigQuery in the order of milliseconds, but you are going to get responses back in the order of seconds to minutes. Which is, on a human scale, super useful. If I'm doing something exploratory with the data, if I'm trying to understand what's in there, if I'm trying to quickly answer a question that the data might be able to answer for me, then I can work through that and get through it quickly and get out quickly. It also allows for exploration of the data in new ways. Like, I can carve the data up along any vectors I like, and I can carve the data up against the actual values in the data, rather than simply asking for everything that came out of a particular probe or whatever.

So, we have been pouring in a lot of Ark data, we have been pouring in a lot of Atlas data, and I have been putting other things in here to augment those datasets, such as delegated stats files and routing tables and those sorts of things. I am a sucker for public datasets, and now I just want to pour them into this platform. The main goal here, or the starting goal, is to try to understand what we see in the Ark and the Atlas datasets.

So, here, I'm going to give you a couple of examples of how we use the platform with respect to this data, and I'll give you a couple of high level analyses on what we see. The analyses are preliminary, I'm going to throw that out there, I would like to verify some of the numbers I have. I would like to do a more detailed analysis of some of the numbers that I have. But as a starting point this is fun.

So in this talk, two things. The starting point is, for example: in each dataset, i.e. in Atlas or in Ark, if you start from an arbitrary point in time, how long does it take for each platform to reveal N autonomous systems? And if you shake down the traceroutes into forward paths, how long does it take for each platform to reveal N adjacencies between autonomous systems?

So BigQuery doesn't really understand what a subnet is. It doesn't understand what an IP address is, but it does provide helper functions to translate IP addresses from strings to numerals and back again. The content of this particular example isn't really terribly important, but what I'm doing is grabbing a bunch of responding IP addresses from the traceroute data that we have been putting into BigQuery, and then I'm taking those IP addresses and asking it to determine whether each IP address falls between the start and end points of a particular prefix.
And I'm sure you are all aware, this is the documentation prefix, so of course it's not in use, or maybe it is. Because of course, it's in use because it's in the documentation. But you know, you can pull out this sort of stuff and find weird things in the data.
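The range check described above can be sketched in Python rather than in BigQuery SQL. This is only an illustration of the idea (address as a number, prefix as a start and end number); the function names are hypothetical, and 192.0.2.0/24 is assumed here as an example documentation prefix since the slide content is not in the transcript.

```python
import ipaddress


def prefix_bounds(prefix):
    """Return (version, first address, last address) of a prefix, as integers."""
    net = ipaddress.ip_network(prefix)
    return net.version, int(net.network_address), int(net.broadcast_address)


def in_prefix(address, prefix):
    """True if the address lies numerically between the prefix's start and end."""
    ip = ipaddress.ip_address(address)
    version, start, end = prefix_bounds(prefix)
    # Compare only within the same address family, then do a plain range test.
    return ip.version == version and start <= int(ip) <= end


def responders_in_prefix(addresses, prefix):
    """Filter a list of responding addresses down to those inside the prefix."""
    return [a for a in addresses if in_prefix(a, prefix)]
```

For example, `responders_in_prefix(["192.0.2.1", "198.51.100.7"], "192.0.2.0/24")` keeps only the first address.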

And that's really easy to do. Like, simple prefix matching is super easy to achieve. Prefix matching where you have millions of IP addresses that you are trying to match against routing tables of close to a million routes is a little trickier, and in general you wouldn't try and write that from first principles. But the working longest prefix matcher that I have is based on other examples that are out there, and it looks a bit like this. I'm going to provide links to this kind of stuff, how we are using the data inside BigQuery. The longest prefix matcher looks kind of like that.
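The matcher shown on the slide is not reproduced in the transcript. As a rough illustration of what longest prefix matching means, here is a small Python sketch; it is not the BigQuery implementation, the function names are hypothetical, and the routes in the usage note use documentation prefixes and ASNs.

```python
import ipaddress


def build_table(routes):
    """Index (prefix, origin ASN) pairs by (version, prefix length, network integer)."""
    table = {}
    for prefix, asn in routes:
        net = ipaddress.ip_network(prefix)
        table[(net.version, net.prefixlen, int(net.network_address))] = asn
    return table


def longest_prefix_match(table, address):
    """Return the ASN of the most specific covering prefix, or None if there is none."""
    ip = ipaddress.ip_address(address)
    bits = ip.max_prefixlen
    value = int(ip)
    # Try the most specific length first and stop at the first hit.
    for length in range(bits, -1, -1):
        shift = bits - length
        masked = (value >> shift) << shift  # zero out the host bits
        asn = table.get((ip.version, length, masked))
        if asn is not None:
            return asn
    return None
```

With a table built from `[("192.0.2.0/24", 64500), ("192.0.2.128/25", 64501)]`, an address in the upper half of the /24 resolves to the more specific /25. The dictionary lookup per prefix length keeps the sketch short; it is one of several possible designs and says nothing about how the query on the slide was written.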

So, going back to the question: if you are given a large set of traceroute data on the two platforms, how quickly do you accumulate responding ASNs in that data? Forget targets, forget what you are sending data to. This is what comes back. Atlas is the blue line in both of these; we have got IPv6 on the left and IPv4 on the right. It looks in both of these like, starting from August 1, 2019, the Atlas platform surfaces fewer ASNs on the v6 side but more ASNs on the v4 side. I do not know why this is. But they both surface a reasonable number of ASNs. That kind of makes sense: they are doing some kind of topology measurement, so you would expect them to be trying to hit as much of the network as possible. And they get a reasonable proportion. Like, on the v6 side I guess there is, what, 16 or 17,000 ASNs, and on the v4 side there is somewhere in the order of 65K. But then you are like, well, are these completely different sets of ASNs? And not really.
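The curves described here are cumulative counts of distinct ASNs over time. A minimal Python sketch of that computation, assuming the traceroute responders have already been mapped to (day, ASN) pairs; the function name and input format are illustrative, not taken from the talk.

```python
def cumulative_unique(observations):
    """Given (day, asn) pairs, return [(day, distinct ASNs seen up to that day)]."""
    by_day = {}
    for day, asn in observations:
        by_day.setdefault(day, set()).add(asn)

    seen = set()
    curve = []
    for day in sorted(by_day):
        seen |= by_day[day]  # add that day's ASNs to the running set
        curve.append((day, len(seen)))
    return curve
```

Running this once per platform, and once on the concatenation of both inputs, gives the kind of per‑platform and combined lines discussed in the talk.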

If I look for the total, the full set of unique ASNs across both datasets, and the green line is that, so if I combine the datasets, what we see is that effectively on the v6 side, Ark sees most everything and Atlas contributes a bit more, and kind of the inverse on the v4 side.

Regarding the gap, good question, I'd like to dig further into that and see what's actually different in terms of what we're getting out.

Thinking about adjacencies. So this is one place where BigQuery is a little bit tricky, or rather it might be that this is one place where the data format that we have in BigQuery is a little bit tricksy. What I want to be able to build is sets of adjacencies, so if you have a traceroute running through hops 1, 2, 4, 5, then you can assert that hops 1 and 2 are adjacent; hop 3 is missing, so you can't do anything with that; hops 4 and 5 are then adjacent. That kind of logic is tricky for me, maybe, in an SQL kind of language, but the BigQuery system permits arbitrary code. Arbitrarily, the language of choice is JavaScript. It wouldn't be my choice. But you can write arbitrary code and you can then bundle in almost whatever you like. You can run whatever logic you like over this. This is not the code to build out the adjacencies, this is a daft example. But the five lines of SQL at the bottom are calling get IPs, that's the function at the top, with some boilerplate around it to make it fit into whatever type checker BigQuery is running. All this is going to do, if you are familiar with the structs that RIPE Atlas is going to give you, is loop over all of the responding hops and pull out the IP addresses, throw them into a set and give it back to you. This is a nonsense example just to show you that you can put in whatever you like.
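The hop logic described above (1 and 2 adjacent, 3 missing, 4 and 5 adjacent) can be sketched as follows. The talk mentions JavaScript inside BigQuery; Python is used here purely for illustration, and the input format, a mapping from hop number to responding IP, is a simplifying assumption rather than the actual RIPE Atlas result structure.

```python
def adjacent_pairs(hops):
    """Return (ip, next_ip) pairs for consecutive hop numbers that both responded.

    `hops` maps hop number -> responding IP; a missing key or a None value
    means that hop did not respond.
    """
    pairs = []
    for ttl in sorted(hops):
        here = hops.get(ttl)
        nxt = hops.get(ttl + 1)
        # Only assert adjacency when both this hop and the very next one answered.
        if here is not None and nxt is not None:
            pairs.append((here, nxt))
    return pairs
```

For hops `{1: "192.0.2.1", 2: "192.0.2.2", 4: "198.51.100.4", 5: "198.51.100.5"}` this yields two pairs, one for hops 1 and 2 and one for hops 4 and 5, and nothing spanning the gap at hop 3.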

There is a bigger example where I do the actual adjacency mapping, and from that point, once I have lists of adjacencies, I can then do the ASN mapping again, and so I have some semblance of what the forward adjacency AS graph looks like according to the traceroute data in both platforms. It's not perfect. I am not doing anything smart with things like identifying IXPs or whatever, but it's an interesting starting point.

And what we see in both, again, v6 on the left, v4 on the right: in both cases Atlas is doing a pretty good job of surfacing more adjacencies at the AS level. This is probably a side effect of a platform with 10,000 vantage points versus a platform with 150 vantage points, but it's an interesting number nonetheless. And you can ask the question again: do these surface the same links, or are we surfacing completely different links? I am not sure I trust the graph on the left, but it suggests that over the course of two months, if you pull down two months of data, you are going to get the same set of all of the links that appear in both datasets. I do not know how long it would take for the graph on the right to actually converge.

So, these are interesting, and these are fun starting points, and I'm going to keep digging into this to see what the differences are between the platforms, because that can help inform what other measurements we want to run and where we want to put probes to maximise what we actually get out of these.

So, that's an interesting starting point for some data analysis.

We have a kind of a direction of travel with this. Without time lines or whatever we get. But BigQuery is a nice general platform for doing a lot of data analysis. It's a nice inexpensive platform for doing a lot of data analysis. It's probably not going to solve all of your research problems, but what we would like to get to is opening up our data via this, much in the way that other people are doing, such as the M‑Lab folks, so that we put up the data and we manage that part. And then you, the researchers, pull down the data, or rather don't pull down the data: you use the data and run queries on the platform, and then you cover the cost of that. And the costs are relatively cheap, and generally Google will throw money at you and then you can use it for free.

That's kind of where we want to go with it. We have been prototyping this stuff and we're going to kind of hit an iteration 2 and we'll see where we are with that in the coming months.

So this is fun stuff that's ongoing. There is a lot of detail. There is a lot of interesting stuff here in how this works. I'm going to write more about this. I think there will be a bunch of RIPE Labs articles over the next couple of months. Some of it is going to be how we are using BigQuery, and some of it is going to be what we actually see that is different in the datasets. As I say, this is a double‑sided talk. These are the two sides of the talk.

And on that, I'm happy to take any questions that you have. Either right now or in person after or in either of these places.


BRIAN TRAMMELL: We are actually pretty good on time. We have a fair amount of time for questions, and if nobody steps ‑‑

AUDIENCE SPEAKER: Hi, I am Cassen. I want to know if you are doing anything on traceroute loops in mailing because I know from CAIDA's data that they have also mailing loops is it possible or did you look at it at all?

STEPHEN STROWES: Yes. Whether I would need to break out into a user‑defined function or not I'm not sure, but for sure you can do it, whether you have to write some custom code or not. I'm not doing that yet, but traceroute data is messy, so I'm wary. This is why I call everything preliminary. But I'm going to get there, I think.

AUDIENCE SPEAKER: Robert. Give him five minutes, okay, and then he will get back to you with the results. He has to get off stage first. But also, I think one point that is worth pointing out is that this is an enabler also for researchers who want to combine Atlas datasets, or Ark datasets for that matter, with any other datasets that we already have, preferably in BigQuery. So joining between this dataset and other existing datasets becomes a trivial exercise, which inflates the value of the datasets that we have tremendously.

STEPHEN STROWES: It's obviously in there with matching against BGP tables, but a much smaller dataset that we throw in there is the metadata for the probes. Once you have the metadata for the probes, it's easier to say: give me only measurements that have come from probes located in a particular country or located at a particular point on earth. You can throw in basically whatever you like at that point and match.

BRIAN TRAMMELL: I have a couple of my own questions. On the big long, you know, the adjacency, the time‑to‑find‑adjacencies graph: how long did it take you to generate that? What was the compute time on that?

STEPHEN STROWES: Do you want the time to compute or the time to learn how to use SQL?

BRIAN TRAMMELL: I would like both.

STEPHEN STROWES: I say start of the summer. SQL is not really my native language, so it took a bit of time for me to learn to use this stuff, shake off old ideas of how you use resources and get used to more of a database mindset, where you don't mind duplicating data if you are going to throw it away again; that's just how the system works.

So, probably a little, a lot of time trying to figure out how to do prefix matching. But once you get there, it's speedy. So, the prefix matching stage on, for example, one day's worth of IP addresses against a full table will take minutes, a few minutes. And building out the adjacencies, I think I reran some of those graphs yesterday and they take maybe between five and ten minutes. Like, none of the graphs that I showed here took more than probably ten minutes to generate, if I generated them from scratch. Which is nice when you are dealing with two months of traceroute data. This is why I kind of want to share code, because it took me a little while to bootstrap myself into it, so I want to put down examples of: hey, this is the graph I drew and this is how I did it.

BRIAN TRAMMELL: So as you're writing this stuff up for RIPE Labs, would sort of, like, BigQuery SQL for, you know, refugees from Python or whatever, would that be an article that you could write? Because, so, in my job at Google I have to use it and I haven't quite learned it yet. I would personally find that useful.

STEPHEN STROWES: Yeah, I'm going to be in tutorial mode I think.

BRIAN TRAMMELL: Excellent. Thank you very much. Are there questions? All right. Thanks a lot Stephen.


Next up we have Massimo.

MASSIMO CANDELA: Hello. I'm Massimo from NTT. This presentation is going to be about periodic behaviours in network measurements. Unfortunately, I was not supposed to be here; a colleague of mine was supposed to be here and present this. It is a collaboration with the University of Roma Tre.

First of all, we have to understand what we mean by periodic behaviour. A periodic function is a function that repeats its values at regular intervals. And the regular intervals part is what we are really interested in. So what we are talking about is events that are separated from each other by a specific time frame.

And we are talking about traceroutes. So, whatever visualisation tool for traceroutes you use, I just show this one here, they do the same thing. Essentially you have various probes, the sources of the traceroutes, and they do traceroutes to a target and they do this over time, and if you press play, you will see the graph changing accordingly if the route changes.

At some point you will end up in a situation where the path alternates between two or more alternatives, and it does so with a specific time period. So we are not talking about path diversity here; we are talking about when this happens with a specific timing component.

So, to understand it a bit better: this is a time series of traceroutes done from the same probe towards the same anchor, repeated for 24 hours, every 15 minutes. You have the time on the X axis and all the possible alternative paths on the Y axis. So, if you have a new path, you will have a new element on the Y axis. Essentially they are a sort of unique hash of the IPs of that traceroute. And you see that between 7 a.m. and 7 p.m., the part in red is something that you clearly recognise as exactly the same block repeated multiple times: a block of four hours repeated three times.

And what is repeated three times is a specific sequence of followed paths. So the first time the traceroute follows path P0, the second time path P4, the third time P7 and so on. So this is an ordered sequence of 17 paths, repeated three times. This is mostly what we will talk about. This is research that started out of curiosity, and it is mostly from researchers to researchers; contrary to all the TechCrunch talk, we are not going to make the world a better place here.
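To make the idea of a repeated block concrete, here is a naive Python sketch that hashes each traceroute into a path identifier and then searches the resulting sequence for a block that repeats back to back. It is not the algorithm from the paper: it assumes a noise‑free sequence, uses brute force, and all function names and the choice of SHA‑1 are illustrative.

```python
import hashlib


def path_id(hops):
    """Short identifier for a path, derived from its ordered list of hop IPs."""
    return hashlib.sha1(",".join(hops).encode()).hexdigest()[:8]


def find_repeated_block(ids, min_repeats=2):
    """Find the back-to-back repeated block covering the most samples.

    Returns (start index, block length, number of repeats), or None.
    Blocks made of a single constant path are ignored, since a stable
    route is not an alternation.
    """
    n = len(ids)
    best = None
    for start in range(n):
        for period in range(2, (n - start) // min_repeats + 1):
            block = ids[start:start + period]
            if len(set(block)) < 2:
                continue
            repeats = 1
            while ids[start + repeats * period:start + (repeats + 1) * period] == block:
                repeats += 1
            covered = repeats * period
            if repeats >= min_repeats and (best is None or covered > best[1] * best[2]):
                best = (start, period, repeats)
    return best
```

On a day of samples taken every 15 minutes (96 values) the brute‑force search is cheap; a real dataset with noise and tens of millions of traceroutes needs something more robust.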

But we want to understand the dynamics of these large‑scale datasets; in particular we did it with RIPE Atlas, of course, and we want to analyse the periodic behaviours that happen in traceroutes. We want to understand if they are an artifact of the dataset, if the topology is really changing, or if, by measuring, that is by injecting packets into the network, we are changing the same network that we are measuring.

The dataset that we use is the anchoring measurements produced by RIPE Atlas. At that time, 9728 probes issued traceroutes to 258 anchors, and we analysed around 78 million traceroutes.

And the first step that we had to do was to define the algorithm to detect these periodic behaviours in the traceroutes. Of course, we didn't want to implement one ourselves, but there was nothing really fitting for us in the literature: some approaches require that there is no noise at all in the dataset, some require a final manual step where you check the results, and I was not going to do that for 78 million traceroutes. And so, we developed one. I'm not going to spend time on this, but the paper is published and it's open access, so you can find it and see.

So, what we did is that we developed this algorithm and tuned it to detect periodicity in a dataset that we created, that is, to detect the periodicities that we inserted synthetically. Afterwards, when the algorithm was mature enough and was able to detect whatever we inserted in the dataset, we applied it to the real dataset, the 78 million traceroutes.
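To make the idea of "insert a periodicity synthetically, then check that the detector finds it" concrete, here is a deliberately naive Python sketch. It is not the algorithm from the paper: it only finds exact, noise-free repetitions, and the function name, parameters and sample data are assumptions for illustration.

```python
def find_periodicity(seq, min_repeats=3, max_period=None):
    """Find a block of path labels that repeats back to back.

    Returns (start, period, repeats) for the repetition covering the most
    samples, or None. Exact matching only: a single noisy sample breaks
    the match, which a practical detector would have to tolerate.
    """
    n = len(seq)
    max_period = max_period or n // min_repeats
    best = None
    for period in range(2, max_period + 1):
        for start in range(0, n - period * min_repeats + 1):
            block = seq[start:start + period]
            if len(set(block)) < 2:
                continue  # a constant path is not an alternation
            repeats = 1
            pos = start + period
            while seq[pos:pos + period] == block:
                repeats += 1
                pos += period
            covered = repeats * period
            if repeats >= min_repeats and (best is None or covered > best[1] * best[2]):
                best = (start, period, repeats)
    return best

# Synthetic test data: a stable path (label 5) with an inserted periodicity,
# the block [0, 4, 7] repeated three times.
synthetic = [5, 5, 5] + [0, 4, 7] * 3 + [5, 5]
print(find_periodicity(synthetic))  # (3, 3, 3)
```

The tuple reads as: the periodic part starts at index 3, the block has length 3, and it is repeated 3 times.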

What we found is that 36% of the source‑destination pairs exhibited at least one periodicity in the week, the time frame that we analysed. These are mostly some numbers to characterise this big amount of periodicities; more details can be found in the paper. So 79% of them are repeated at most ten times (the red block you saw before was repeated three times). 88.5% have at most ten alternating paths, but a few have even more than 100. 70% of them last less than two hours, but some stay there for the entire time period; maybe they are still there. At that point we thought, maybe it can be inter‑domain routing. So we did IP‑to‑AS mapping of the traceroute paths using MAP‑IT, and we tried to understand whether the periodic portion of the path was inside the same AS or across multiple ASes. Surprisingly, we found that only 0.15% are inter‑AS; this means that inter‑domain path diversity is just not periodic.

So we discovered that in 32% of these cases, one path has a wildcard, which is essentially a hop that didn't answer the probe, and in another path that wildcard is replaced by an IP address. And we also noticed that around 71% ‑‑ 61%, sorry ‑‑ are three hops or less from the source, and in general they are closer to the source or the target. So we thought, okay, it's simply rate limiting. But why is it happening with a timing component? Why is there this specific period?
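The wildcard case can be expressed as a small check: two alternating paths count as "the same path, except that a hop sometimes does not answer" when they differ only at positions where one of them has a wildcard. The following Python sketch is an illustration under that assumption; the function name and the `*` marker are hypothetical, not taken from the paper.

```python
def differ_only_by_wildcards(path_a, path_b, wildcard="*"):
    """Return True if the paths differ, and every difference is a position
    where exactly one of them has a non-responding hop (wildcard)."""
    if len(path_a) != len(path_b):
        return False
    saw_wildcard_difference = False
    for hop_a, hop_b in zip(path_a, path_b):
        if hop_a == hop_b:
            continue
        if hop_a == wildcard or hop_b == wildcard:
            saw_wildcard_difference = True
        else:
            return False  # two different responding IPs: a real path change
    return saw_wildcard_difference

answered = ["10.0.0.1", "192.0.2.1", "198.51.100.7"]
silent = ["10.0.0.1", "*", "198.51.100.7"]
rerouted = ["10.0.0.1", "192.0.2.9", "198.51.100.7"]

print(differ_only_by_wildcards(answered, silent))    # True
print(differ_only_by_wildcards(answered, rerouted))  # False
```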

So another explanation we thought of is that maybe, when RIPE Atlas does these measurements, it generates enough ICMP traffic to trigger that. So what we did is we tried to measure over time the amount of ICMP packets generated by a probe. This is an approximation, because we don't know what other ICMP traffic is going on and we do not consider it, so it is only the traffic generated by the probe. And we can see it directly here. Indeed it goes between 0 and 800, so there are peaks of 800 packets, and various research in the literature reports that the rate‑limiting thresholds you can find in the Internet are generally lower than that. And there is indeed a periodic burst of ICMP. So we believe that the measurements in RIPE Atlas may tend to synchronise a little bit, and we analysed in particular 224 probes that exhibit exactly this pattern. In other probes it's a similar pattern but with a lower amount of packets.

And we also found that 81% of the periodicities, the ones in the slide before, happen exactly during one of these peaks. We did a deeper analysis of this, but I'm going to go ‑‑

So, another thing is, well, load balancers, of course; it's the first thing that you may think of. Atlas, in the anchoring measurements, uses Paris traceroute, which is nice for path exploration, and it changes the ‑‑ well, Paris traceroute is a normal traceroute where the header is crafted in such a way that, if you hit a per‑flow load balancer, it will follow the same path every time.

So the fact that the Paris ID is changed cyclically is good, and at the same time it could also be the cause of some of the periodicity that we see. To determine whether it is a per‑flow load balancer, we tried to map the change of path to the change of ID, and if every change of ID corresponds to a change of path, we can be pretty sure that yes, we are hitting a per‑flow load balancer. 25% of the periodicities fall in this case.
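One way to phrase this check in code is to ask whether the observed path is fully determined by the Paris ID: the same ID must always give the same path, and different IDs must give at least two different paths. This is a simplified reading of the criterion described here, not the exact test used in the research; the function name and data layout are assumptions.

```python
def consistent_with_per_flow_lb(observations):
    """observations: time-ordered list of (paris_id, path_label) pairs.

    Returns True if each Paris ID always maps to one path and at least two
    distinct paths are seen, which hints at (but does not prove) a
    per-flow load balancer.
    """
    path_for_id = {}
    for paris_id, path in observations:
        if path_for_id.setdefault(paris_id, path) != path:
            return False  # same flow identifier, different path
    return len(set(path_for_id.values())) > 1

# The Paris ID cycles 0, 1, 2, 0, 1, 2 and the path follows it exactly.
cyclic = [(0, "A"), (1, "B"), (2, "C"), (0, "A"), (1, "B"), (2, "C")]
# Here the same Paris ID (0) is seen with two different paths.
unrelated = [(0, "A"), (1, "B"), (0, "B"), (1, "A")]

print(consistent_with_per_flow_lb(cyclic))     # True
print(consistent_with_per_flow_lb(unrelated))  # False
```

Because the Paris ID is cycled, a path that depends only on the ID will itself look periodic, which is why this case can explain part of the observed periodicity.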

These are all hints. You can never be 100% sure.

So, another investigation that we did is MPLS tunnels. They are used for a lot of traffic engineering, and it's reported in the literature that they generate a good amount of path diversity. There are various configurations that may leak information about the usage of MPLS tunnels, and not all of them are easy to, let's say, discover based on the dataset available. Some require active measurements at the time of the traceroute, which we didn't have.

What we discovered was a portion of them: around 8% of the periodicities that we found involve a tunnel in the path. In 0.54% of the periodicities, the part of the path that changes is exactly the tunnel. In 7.34%, the tunnel starts inside the alternating part of the path. And in 0.12%, the tunnel starts before and ends in the periodic part of the path.

So, we believe that the first two especially are compatible with traffic engineering. I repeat, these are just hints.

And the last one: when you convert from traceroutes to topology, you have to map IPs back to routers, since multiple IPs can belong to a single router. This process of mapping back is called IP alias resolution. We used the topology by CAIDA and we discovered that indeed 10.27% of the periodicities include an alias somewhere, but only 0.27% disappear after the alias resolution; this means that in 0.27% the different paths became the same path after the aliasing. In 0.5% they had a different set of alternating paths, in 8% the alias was outside the periodic part of the path, so we don't care, and in 1.42% the alias didn't change anything and the periodicity stayed there.

And well, this is the end of the presentation. This is the sad moment: the main worker on this research was a friend of mine, Mattia Iodice. This picture is from RIPE 74 in Budapest. I pushed him into this measurement community because I thought that he had a lot of energy and that he was going to contribute a lot. Unfortunately, he passed away at the age of 25. I followed his master's thesis and I worked with him a lot; it was really promising research and he was a good friend. And that's all.

And now if you have questions.


BRIAN TRAMMELL: Questions? None. I have a comment and a question without my Chair hat on. This was really fun research to look at because, as you said, it doesn't really, you know, change the world, but you looked into a dataset and artifacts in the dataset, and it seems like you took one particular phenomenon and found a whole bunch of different root causes for the same phenomenon. When I looked at this the first time I thought, cool, Paris traceroute, good, we can go home. That was only a quarter, and I was kind of doing the math as I was going through the presentation, and the majority of these are still not easily explainable, right?

MASSIMO CANDELA: I think we have around 40% something that we don't have a hint about it. And well, if you can contribute to that, that would be great, and we have something around 40% that is, we don't have any idea why that happened. And...

BRIAN TRAMMELL: So, are you continuing the work, or is Giuseppe continuing the work, to dig in to figure out other things here? Because that could feed back into understanding of the biases in the platform.

MASSIMO CANDELA: Yeah, we would like to, especially if somebody else can, after this presentation, share with me if they have other ideas, because we kind of ran out of ours. Maybe we will find other ideas later. But, yeah, it's really difficult to really understand why, so some of these are speculation. We have other ‑‑ well, for these we are pretty sure that it's there. But others, where there is too much speculation, we didn't even include.

BRIAN TRAMMELL: Cool. Thank you very much. Oh, we have a question.

AUDIENCE SPEAKER: Blake. Can you go back to the slide please where you correlate the Atlas probe data with ‑‑ yeah, that one ‑‑ with the control plane. Do you ‑‑ I'm not sure what I'm trying to say here ‑‑ do you have a way to maybe go back to the Atlas people or something and see if there is a way that they can spread out the tests that are firing on that list more cleanly or? I mean it looks like maybe Atlas is causing this problem itself, right?

MASSIMO CANDELA: Okay. We are working on that already, and it can be also ‑‑ well, they already have a mechanism in place. I used to work for them, so I know. But we will definitely investigate it. Maybe it's just how the measurements are scheduled at the beginning, I don't know. Maybe it's out of their control.

AUDIENCE SPEAKER: Or the tests are firing at the same time and you are triggering the rate limiter and that sort of thing. Thanks.

BRIAN TRAMMELL: All right. Thank you very much ‑‑ wait, so, last question.

AUDIENCE SPEAKER: Philip Homburg, RIPE NCC. Not a question but a comment on Atlas behaviour. If Atlas works correctly, then every time a run of a periodic measurement is due, it will take a time interval, which should be roughly 7 seconds, and then pick a random spot in that time interval to run the measurement. So it should be roughly periodic, because it's an interval of 15 minutes, but the exact times could vary.
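The scheduling behaviour described in this comment can be sketched as "nominal time plus a random offset inside a short window". The Python below is an illustration only, not RIPE Atlas code; the function name is hypothetical, and the window length is a parameter because the value quoted in the session is approximate.

```python
import random

def jittered_schedule(interval_s, spread_s, runs, rng):
    """Return run times: every `interval_s` seconds, each shifted by a
    random offset drawn uniformly from [0, spread_s]."""
    return [k * interval_s + rng.uniform(0, spread_s) for k in range(runs)]

# A 15-minute measurement with a 7-second window (assumed values).
rng = random.Random(2019)  # fixed seed so the example is repeatable
times = jittered_schedule(interval_s=900, spread_s=7, runs=4, rng=rng)
for k, t in enumerate(times):
    print(f"run {k}: {t:.2f} s (offset {t - k * 900:.2f} s)")
```

With a window that is small compared to the interval, runs stay roughly periodic, and measurements sharing the same nominal time can still overlap within those few seconds.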

BRIAN TRAMMELL: It sounds like a follow‑up.

MASSIMO CANDELA: It is indeed. The problem is that, most probably, different measurements are going to overlap. But that's...

BRIAN TRAMMELL: All right. Thank you very much.


OLAV KVITTEM: Hello. I am from UNINETT. I have been working with network management and measurement for 30 years, trying to assess the quality of the network, and actually that's probably why my hair has turned grey, I don't know.

So, this is the research network of Norway. We of course provide a network service to the researchers, but we also try to provide data to the networking researchers.

So, back in 2008, we had a cooperation with the international research network of China and we decided to set up a quality monitoring experiment. We started with a couple of participants, but after a while we increased to about 17 measurement points. And we had a simple measurement setup. We just wanted to send 100 packets a second, UDP packets with time stamps and sequence numbers, and we chose that speed because we wanted to see if the Internet, you know, could provide the same target quality as the ITU did for switching lines over to backup links. It's like a 15 ms limit to change routing.

So we just wanted to see how many small outages are there in the Internet and how fast does routing actually change.

And we added also traceroutes to ‑‑ which is not that frequent, of course, that would be just like a minute, and not so high. Just for debugging.

And we then get the log files, you know, 8 million samples per connection. It's not a full mesh in this network, but we get 2GB a day per connection and there is a ‑‑ it's like a big data analytics thing, so we need to reduce the data to records that try to identify every outage. So we tried to look at packets before and after the outage to see what could have caused this outage.
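The reduction from raw samples to outage records can be illustrated with a short Python sketch that scans the received sequence numbers for gaps. At 100 packets per second the nominal spacing is 10 ms, so a gap of N missing packets corresponds to roughly N × 10 ms without delivery. The record layout, the function name and the threshold of five lost packets are assumptions for this example, not the actual analysis code.

```python
def find_outages(samples, send_interval_ms=10, min_lost=5):
    """samples: list of (sequence_number, receive_time_ms) for packets that
    arrived, ordered by sequence number.

    Returns one record per gap of at least `min_lost` consecutive packets.
    """
    outages = []
    for (prev_seq, prev_time), (seq, recv_time) in zip(samples, samples[1:]):
        lost = seq - prev_seq - 1
        if lost >= min_lost:
            outages.append({
                "after_seq": prev_seq,
                "lost_packets": lost,
                "estimated_duration_ms": lost * send_interval_ms,
                "last_seen_ms": prev_time,
                "first_seen_again_ms": recv_time,
            })
    return outages

# Packets 0..9 arrive, 10..59 are lost, 60..64 arrive again.
received = [(seq, seq * 10 + 20) for seq in list(range(10)) + list(range(60, 65))]
for outage in find_outages(received):
    print(outage["lost_packets"], "packets lost,",
          outage["estimated_duration_ms"], "ms")  # 50 packets lost, 500 ms
```

Keeping the timestamps of the last packet before and the first packet after the gap makes it possible to look at what happened around each outage.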

And this problem was left to researchers for about five years, and they were looking at the data. But one day I started to look at the data myself and I discovered there are engineering issues here: the number of outages we are seeing is way too high, you know. In September this year, we had 2,000 outages in this very small, let's say, metrics network, and one‑and‑a‑half hours in total accumulated outage time. So to me that sounds a bit high. And you can see to the right that there is also some queueing in the network. There is, on average, about a 200 millisecond queue connected to quite a few of these outages, so some of these might be congestion and not just routing outages.

So, you really can't investigate this number of outages to find the reason. But we went and tried to look into some of the bigger outages and we learnt a bit about why these outages happen. They could be, you know, a site power loss, and then we are measuring the router reboot time, which is obviously 12 minutes for that particular router. A router upgrade: you know, a two‑minute outage because the alternate path did not pop up as it should. There was a fibre flap and we discovered it took 79 seconds each time the fibre flapped before any traffic passed at all. Even if you have alternate paths, routing does not converge fast enough.

We also discovered some oddities here. I have seen 19‑day‑old packets in the Internet, and that's why there was some routing behaviour RFCs. It turns out that routers will buffer packets on the outgoing link, so if the link goes down the packet will be sent when the link comes up.

So, I also had, you know, looking at the September statistics for years back, and as you can see you know, it seems like you know that you will have 4 or 500 outages per month, which are about 500 milliseconds, and the previous numbers you see were about 50 milliseconds, but this is 500 milliseconds, and you can also see that the time, outage time, which is, let's say, normally, in any September of the last four years, you have lost between one and two hours of outage time ‑‑ of time when using this network.

So this is about what I'm expecting from a network, and I'm now starting to understand why I had to reload my web pages, why participants are, you know, freezing. It's not because of, you know, my Wi‑Fi; it could be. But it's actually because rerouting is happening all the time in the Internet, and there are outages that we should try to get rid of.

So, how can we reduce this number of outages by routing or changing your routing protocols?

As a part of this work I am also able to look at the queueing seen in the network, and I see surprisingly long queues, you know. You can see queues of more than a second, because I see that the delays increase over a short period of time. And as you can see here, the 95th percentile of the queue time is about 200 milliseconds. So quite a few of these events, or these links that you have in the network, do have some queueing, and actually surprisingly long queues.

And one of the things I was able to do, because of my measurements, was to see how a queue builds up. You see the dots here, those are the 10 packets per second times, and we see that within 400 milliseconds, we have 400 milliseconds queue within the 150 MS, and just after this there is a packet loss event. So this means that this outage was a congestion event. Because we really would like to see if they are congestion outages or routing outages.
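The distinction between congestion outages and routing outages can be sketched as a simple rule on the delays measured just before the loss: if the delay climbed well above its usual baseline, a queue was building up and the loss is probably congestion; if the delay was flat, the loss is more likely a routing event. The Python below is a minimal sketch of that idea; the function name and the 100 ms threshold are assumptions, not values from the talk.

```python
def classify_outage(delays_before_ms, baseline_ms, queue_threshold_ms=100):
    """Classify an outage from the one-way delays of the last packets
    received before it.

    delays_before_ms: delays (ms) of the packets just before the gap.
    baseline_ms: typical delay on this path when there is no queueing.
    """
    if not delays_before_ms:
        return "unknown"
    queue_ms = max(delays_before_ms) - baseline_ms
    return "congestion" if queue_ms >= queue_threshold_ms else "routing"

# Delay ramps up from 30 ms to 430 ms right before the loss: a queue built up.
ramp = [30, 80, 150, 240, 330, 430]
# Delay stays flat until the packets simply stop arriving.
flat = [30, 31, 30, 29, 30, 31]

print(classify_outage(ramp, baseline_ms=30))  # congestion
print(classify_outage(flat, baseline_ms=30))  # routing
```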

In this process, we are able to try to see the unreliability of the network, and you can see it's on average about 86 ppm, that is, about four nines of reliability. That is not that bad. This is the average in this network, and that's not that bad. But if you look at all the small outages, it becomes very annoying, so I think we should think seriously about this: trying to improve the number of outages that we're seeing when routing changes.
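As a quick arithmetic check, a parts-per-million figure can be converted into an availability percentage and a number of "nines". The sketch below reads the 86 ppm as the fraction of time the network is unavailable, which is an assumption, since the talk does not define the figure precisely; the function name is hypothetical.

```python
import math

def availability_from_ppm(unavailable_ppm):
    """Convert unavailability in parts per million into
    (availability in percent, number of nines)."""
    fraction = unavailable_ppm / 1_000_000
    availability_percent = (1 - fraction) * 100
    nines = -math.log10(fraction)
    return availability_percent, nines

availability, nines = availability_from_ppm(86)
seconds_per_30_days = 86 / 1_000_000 * 30 * 24 * 3600
print(f"{availability:.4f}% available, about {nines:.2f} nines")
print(f"roughly {seconds_per_30_days:.0f} seconds of outage in 30 days")
```

Under this reading, 86 ppm is a little better than 99.99% availability, or just under four minutes of accumulated outage in a 30-day month.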

So, rerouting takes time. I don't think it's about, let's say, the routing algorithms' convergence times. It's about redistributing knowledge about availability. And it seems to be across BGP boundaries, because most outages that I see are between domains. IGP inter domains has a much ‑‑ if you work on a global scale, you have excessive rerouting times.

And one problem we also saw was that actually the routing tables, which are 770,000 entries now, global routing tables, it takes quite a while for a router to write them so after changes come and the update of the routing table could take substantial time.

So, the protocol independent ‑‑ or the prefix independent convergence time, prefix independent convergence for routing tables is probably a prerequisite in routers to get down the routing switching time.

And one thing you, as a network engineer, can do is to deflect routing from a router before you upgrade it. Before a known fibre maintenance happens, try to deflect routing before you do anything. Because then, actually, you get no observable outages from the rerouting.

As I said, IS‑IS seems to be, let's say, under control, so the internal routing is easy to adjust, and it's quite stable. But BGP, it seems to me, the way we operate BGP, should be looked at.

So, I have about 2,000 outages a month and I really can't, you know, investigate all of them manually. So we need to combine this with other information. We have tried to combine this with the RIPE BGP logs, but we didn't get a good match. Some students looked at it, but we had some trouble finding coherence with the RIPE logs.

We had some luck with the IS‑IS state combined with the time stamps we had for these outages. We had quite a good match with SNMP traps, but that's another internal thing: you don't get SNMP traps from other domains, so that's not that useful, because this is a global, let's say, routing problem.
We have also looked at the ICMP backscatter, that is, routers would sometimes send ICMP unreachables when they don't have a route, so they actually confess that they don't have a route, so routing is not working. And we had some success internally in our network, but not that much in the wide area. But there will also be some information in traceroutes. For example, when a traceroute, let's say, stops, the last hop that you see in the traceroute will actually be the one that failed to forward the packet further on. So, that gives a hint of where your routing problems are. So we could try to take that information, let's say from the RIPE routing table, and combine it with the data we have, and perhaps get a bit closer to understanding what is failing, when and why.

So, one of the reasons I'm standing here is I would like to have some feedback. I do understand that what I do now, managing 17 nodes with borrowed accounts with borrowed machines over the world and I am getting fed up running these machines, so if I'm going to go on with this experiment ‑‑ so, I need to find you know, so managing and developing your own measurement platform really doesn't scale, it's not sustainable.

So, I am hoping here to see if I can find an existing platform and see if it could be interesting to run these experiments let's say on a platform like RIPE Atlas could be interesting.

So I did that when I started this experiment. I looked at RIPE Atlas, but with the rules for how often you could poll, this was way beyond what RIPE Atlas was all about. But it might be that you could adjust the experiments to adapt to the RIPE Atlas rules, or adapt the RIPE Atlas rules, I don't know. But I'd like to discuss this with you.

So, that sums up what I'm going to say. And thank you very much for listening.


BRIAN TRAMMELL: We probably have about 15 seconds for comments and questions for each of you. So go.

AUDIENCE SPEAKER: Ivan Beveridge from IG. I was wondering how you found the, for shut down or anything like that, whether you were using that and whether you had any ability to get your upstreams to depreference your traffic rather than just what the local preference and send traffic your way.

OLAV KVITTEM: I haven't done anything to the traffic flows, I am not sure if I get your question.

BRIAN TRAMMELL: I would follow up on that off line.

AUDIENCE SPEAKER: Hi. Is it possible to use this ICMP backscatter to preemptively switch to a less preferred route if you know that there is unreachability?

OLAV KVITTEM: That's actually a good idea. If you have a transport protocol you could think of trying a new address if one fails, yeah, that would be an interesting idea. Let's do it on the hackathon or something.

AUDIENCE SPEAKER: Apologise in advance this is probably a statement more than a question.

BRIAN TRAMMELL: You have got five seconds.

AUDIENCE SPEAKER: Increasing the reliability of the Internet does not come for free. The question that will often come up is: what is the return on the investment when it comes to taking the steps necessary to make the Internet more reliable? The question I often have to answer, or have had to think about, is: what use cases are not viable today that would be if the Internet was much more reliable than it is today? So, when we talk about what needs to happen in order to make the Internet more reliable, thinking about it in that context I think would be helpful, and maybe afterwards I'll ask you if you have any opinions, because we're out of time.

OLAV KVITTEM: Okay. My favourite explanation is actually high quality low delay video which doesn't accept even small outages. And if you want to fly less, you need the high quality video. So... there are some reasons for doing it. Yes. Thank you.

BRIAN TRAMMELL: Thank you very much.


ROBERT KISTELEKI: Hello everyone. I am from the RIPE NCC. And together with my lovely colleagues, I do R&D within the RIPE NCC.

Let's see if I am certified clicker professional.

As some of you might know, the RIPE NCC R&D lives in a symbiosis with RIPE Labs. You should check it out; it has just turned ten years old. It's been there for a while. It has published a lot of articles from the R&D team and the RIPE NCC in general. The original idea was that we seed the content so that the community can pick it up, and the community did. You guys, together with us, produced about 11,000 articles in those ten years, so congratulations, well done.

We also, as I said, contributed a lot of content here, and eventually over time it turned out that it's sometimes hard to find those tools, so we created a page together with the Labs people to categorise and list all of them. So from now on, if you have the question, where was that tool again that the NCC did, I recommend you go to lab/tools and you will probably find it. It doesn't have everything that we have ever created; that's not the intention. It's more like the current tools, if you will, and it may expand into a community toolset. You decide.

Coming back to research and prototyping. On Labs, we have published in recent months a couple of articles around a number of themes, ranging from RPKI to the tools that we developed, and Stat, and so on. I recommend you go and check them out; some of them are very interesting.

We also worked on other tools and collaborations. Some of my thunder has been stolen by others during this week, but that's okay. The big data thing: you can just do SQL and dig into petabytes of Atlas data if you want. The eyeball measurements, and also the RPKI web test. So, if you haven't done that web test, I recommend you do that on your own network and see what comes out. You should see a smiley face.

Going to Atlas. The probe population is around 10,000. It's pretty stable. We have 500 anchors. I just checked: we have almost exactly the same number of anchors as NLNOG has RING nodes. Out of those, 150‑ish are virtual anchors. We approached Amazon back in the day and they generously provided one anchor, a VM anchor, in all of their regions; not availability zones, mind you, regions. Those are not easy to set up but they work. And then our friends at Google said, we can help you too. So some of the Google regions now have virtual anchors in them, and we expect to deploy more in other regions as well, and if you happen to work for Azure, you know what I'm going to ask. Please talk to me.

All right. If you want to have your own anchor, just apply for that and we'll make it happen.

In the office we have a new stock; we are going to work on that so we can distribute it. Major internal works are concluding. We had to migrate from Python 2 to Python 3, all the software, all those things. If you are interested in details, I may do a technical talk about that at some point, or just talk to me during the dinner.

And most importantly, software probes are coming. So, this has been asked for a number of times by the community. Software probes are basically RIPE Atlas probes without the need for us to supply the tiny hardware. You can run this on an existing virtual machine, you can create new virtual machines if you want. You can run it on your home router, you can run it on your existing server. It doesn't matter. It's just a package that you install and it starts behaving as an Atlas probe. That's based on the fact that the probes mostly do outwards connections. So we don't care about what happens on your network, we just measure outwards.

For these, we will have a dedicated infrastructure to drive them. So what that means to you, as the user, is nothing different than what you already have for the physical probes. So from the user's perspective, it's exactly the same. If you will we are doing a virtual controlling infrastructure. We treat them all the probes software probes and hardware probes as the same pool but we separate them out basically for security reasons.

If you are measuring, later on, once this is really going live, you can say I want to measure from software probes or from hardware probes, or whatever selection criteria you want; we are going to support that.

Obviously, because there is no physical hardware that we supply any more, we don't know where these devices are coming from and who they are and so on. So the application procedure is going to be different a bit. We would definitely want to avoid like fast flux probes. So people turning up ten and then shutting them down immediately and straining the system. We will have some kind of controls on that dimension in the system. But for now, that is something we will only do if we see ill effects of people applying probes and trying to do the wrong thing.

This is a software probe, or a screenshot of a status page of a software probe, that was started at the hackathon over the weekend by Randy. Randy and his team worked on making software probes available on other platforms like Debian, so this is Randy's Debian 10 probe somewhere in Seattle, and it has been up and running since Saturday or Sunday. You can start using this as an Atlas probe; you can measure from it if you want.

We are entering what we call a testing phase now, and I imagine we are going to do that until the end of the year or so. That link over there is the one that you should use if you want to help us test this. Not surprisingly, it is the same link that we used at the previous RIPE meeting when we were asking people to sign up and help us test. Also, we were asking who can help us with packaging, which is still applicable, but if you want to enter the testers' pool please go there and fill out the survey. We are going to reach out eventually to everyone who subscribes there, because it's a bit of a vetted procedure. So if you apply there, we will enable you so that you can exercise the procedure and install the software package, and this is basically because we want to see how the whole procedure works, how the system behaves when we have software probes. If all goes well, then we will open this up to the world in late December or early January, and then you no longer need to be vetted for that.

We ourselves publish the full probe code on GitHub. We have published the source code as such before, but now it's in a nicer shape or form, so you can just go to GitHub and grab it. We are also deploying our packaging and making available an RPM package for CentOS, which at the moment is CentOS 7; we will probably do CentOS 8, we haven't looked at it yet, and we are still looking for partners to help us package this up for other platforms. Now, ever since I created these slides there is more news on that.

We are working together with the CZ.NIC people, who are producing the Turris devices, if you know them, and they told us that they are almost ready with this; maybe even next week they will have the software release on Turris Omnia and on Turris MOX. You still have to go through the vetting procedure, but it's one more platform that's available. As I said, during the hackathon people worked on the Debian release of it, mostly because they wanted to make it run on a Pi, and I think they succeeded: it actually is running on a Raspberry Pi, and as evidenced, it's already running on full Debian servers. I think there are some loose ends to be tied up there, but eventually that's going to happen as well.

If someone can help with Dockerising this, we could end up with something like this: the containerised version of the Atlas probes.

Okay, news about RIPE Stat. RIPE Stat is growing like crazy. A couple of years ago, I think three or four, we were happy to see a million queries a day, and that's ‑‑ whatever the numbers say. You may recall that Axel said about 75 million, which was true, on average. In the last couple of weeks we were constantly over 100 million queries a day. So you can imagine that going from one million to 100 million, for a relatively small organisation like us with an even smaller team dedicated to Stat, was not easy, and we had our moments. But we are trying to keep it up and do everything we can to keep it up and have it working.

The survey points out that this is the most used tool of the RIPE NCC and when you are talking about 100 million queries a day, that's true.

And that is caused partially by the fact that we are cooperating with other RIRs, namely APNIC and LACNIC are looking at this and let's say personalising it to their clientele or their members, and also means, in some cases, we are doing translation of the widgets that we provide in stat. So, the widgets are available now in Portuguese and Spanish as well. And part of the work was to make it possible to internationalise or translate all of these widgets so that was some ground work that had to be done by the stat team and they did a really good job on that.

The LACNIC version is available. The NetOX version is coming up. But otherwise, it's getting there. Renewal of RIPE Stat: RIPE Stat, as it is today, was developed about ten years ago, and the concepts behind how the UI works really stem from that era. So that's really not modern any more. It needs some kind of facelift, which is the other stuff that the Stat team is really, really looking at. We have a concept for a new UI. The primary reason is that it's much more mobile friendly, it supports internationalisation, and so on. I recommend that you go and read about it or even try it out and let us know what you think. This is probably going to be the future. Some of you may recall that we used to have a RIPE Stat iOS app; that no longer works because we didn't maintain it, it was just too much effort to do that. But this new UI can solve that problem as well. Mobile‑first approach, PWA; if you know all of those magical terms you know where this is heading. It will just work on your mobile and it will do basically the same things as in the browser.

Also, to keep up with reality, we have to change the terms and conditions every now and then. You can see that. And there were some requests for new and better data calls. One of them came from researchers, I believe, originally, when they said we would like to have an API to get RIS RRC locations and peers, and that appeared. If you happen to be in need of where the RRCs and peers are, that's the place to go.

And then finally, other projects. I couldn't save this as a surprise to you, because it leaked; everyone talked about this all over the week. Massimo and the others are making tools on it. So fine, it's out there. So we have talked about this before: RIS Live, basically supplying BGP data to you in real time. So, with a few seconds of delay tops, if everything goes well. We are about to declare that production ready, meaning we think it's going to stay up even if you hammer it. I'm sure we will have our moments and our hiccups along the way, but the intention is that we want to keep on supporting this tool. And I would strongly recommend that you check out, if you haven't, Massimo's, and NTT's, I should say, BGPalerter thing, because it really just works.

With that, are there any questions?

AUDIENCE SPEAKER: Randy Bush. Just, excuse the insertion, but I am a strong believer in the credit‑where‑due department. The group of us who played with the Atlas software probe was led by Marco Sinaro. The purpose was porting it to a Pi LoRaWAN gateway, and Philip did the heavy lifting, and I was just the geek along for the ride.

AUDIENCE SPEAKER: Leslie Daigle, Thinking Cat Enterprises. I appreciate the need to worry about scale and scope from the standpoint that people might install this on their laptops and then you'll have bouncing probes, there or not there. I am wondering what the architecture thinking is: suppose a network operator decides to instrument their network with software probes, is this infrastructure ready for something like that?

ROBERT KISTELEKI: The infrastructure was designed with scalability in mind from the ground up. Now, we do have bottlenecks; I am not going to say that if you throw a million probes at it, it's just going to work. It's not. But we are committed to finding the bottlenecks and fixing them as we go along. That said, we also have to look at what the value is, for the community who is using RIPE Atlas and who is ultimately paying for RIPE Atlas, of having loads of probes from the same provider. There are limits we have to impose. But I cannot really predict the future about exactly how that's going to work.

AUDIENCE SPEAKER: Massimo Candela. It's mostly a comment. I thank you, by the way, for the advertisement. So, I love the idea of the software probes, but this is mostly a comment for the community: I have checked that the hardware RIPE Atlas probe really rocks in terms of reliability of the latency measurement, also related to geolocation, for example. And also there is the fact that with the hardware probes, you put them in, you forget about them and they are always up, while the software ones run on users' nodes. So, if you can, my suggestion is to keep going with the hardware one, even if you also want to go with the other one. That's at least my suggestion. It's a comment.

ROBERT KISTELEKI: Yes, a comment on the comment. For the foreseeable future, we believe we are going to maintain hardware probes. We are going to keep deploying hardware probes and manufacturing new ones and giving them out to the people who want them. It's really hard to see how this is going to change in the future. People may want to replace their hardware probes with a software probe, maybe when the hardware breaks; then it's much easier to switch over to the software. My belief is that over time, we will have more software than hardware, and if hardware turns out to be minuscule compared to software, then we'll have to think again about where we want to be. For the moment, we have no intention to stop.

MASSIMO CANDELA: It was mostly a comment for the community: if people can, the hardware one is better, I think, especially in terms of being always up.

RANDY BUSH: Just, the line of measurements I am trying is to compare two Atlas software probes which are on the same LAN as a hardware probe that I have got running, so that we can get some idea.

ROBERT KISTELEKI: And we did something similar with VM anchors, like we compared the hardware and the virtual, and it just ‑‑

RANDY BUSH: They were separate.

BRIAN TRAMMELL: If I can make a comment on the comment on the comment, is there ‑‑ actually this is a question. Is there a plan, once hardware and software probes are both, you know, available, to be able to tag a software probe as being adjacent to a given hardware probe? I think that you would then get the ability to, say, at least differentiate and sort of look at the variations in two of those, sort of the IPv6 system as well. I'd be willing to do that, I'll probably end up with ‑‑

ROBERT KISTELEKI: It's a good research question whether you can differentiate from a distance, I don't know. But it's a good question.

BRIAN TRAMMELL: All right. So, thank you very much Robert.


We are out of time. We are about 45 seconds over right now. So any other business, comment, about the MAT Working Group we will take to the list. Thank you very much.