Database Working Group
17 October 2019
2 p.m.:
WILLIAM SYLVESTER: Welcome to the Database Working Group. We have a lot of work on the agenda, a number of numbered work items that we want to go over. I want to thank the RIPE NCC for helping us with scribing and with the on‑line chat, and with that, we will jump right in. Up first, Ed.
EDWARD SHRYANE: Good afternoon, I am a senior technical analyst at the RIPE NCC and I work on the RIPE database. Today I would like to go through the work we have been doing over the past six months: completed work, a number of work items in progress, work planned in the activity plan for 2020, and clean‑ups I would like to propose that we do within the RIPE database. I have 20 minutes, so I will try to keep to time and allow time for questions.
This is the RIPE database team, my colleagues are responsible for all of the good work in the last six months, so thank you to them, and Mahesh will be demoing later and Sander is also here.
Firstly, Working Group work and policies. At the last RIPE meeting we committed to working on the GDPR, the NWI 8 work item and Whois resiliency. I will go through all of that.
We had two Whois releases in the last six months, 1.94.1 and 1.95.1; we skipped 1.95.
So, significant changes since RIPE 78. Firstly, we now have consistent Latin‑1 normalisation for updates and queries. The database stores data in the Latin‑1 character encoding; previously it was inconsistent how we handled that. Now we will try to substitute a Latin‑1 character where possible, and if we can't we will substitute a question‑mark and return a warning to the user to tell them that that has happened.
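The substitution behaviour described here can be sketched roughly as follows. This is a hypothetical illustration, not the actual Whois implementation, and the function name is invented for the example:

```python
# Rough sketch of Latin-1 normalisation as described in the talk:
# characters that cannot be represented in Latin-1 are replaced with
# a question mark, and the caller is told a substitution happened
# so it can warn the user.

def normalise_latin1(text: str) -> tuple[str, bool]:
    """Return (normalised text, whether any substitution happened)."""
    encoded = text.encode("latin-1", errors="replace")  # unmappable -> b'?'
    normalised = encoded.decode("latin-1")
    return normalised, normalised != text

clean, substituted = normalise_latin1("Zürich")   # fits in Latin-1 unchanged
print(clean, substituted)                         # -> Zürich False

clean, substituted = normalise_latin1("Москва")   # Cyrillic, not in Latin-1
print(clean, substituted)                         # -> ?????? True
```

In the real service a warning would be attached to the update or query response whenever the second return value is true.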
So it's a bit more clear how we handle it. And also, we made some GDPR improvements: not to include personal data in historical queries and not to include references to personal data in historical queries.
Whois outages. We did have one very brief outage on 10th July due to a deliberate failover: we failed our update server over to a standby and failed back again, firstly to test it, because we had tested in development environments but never in production, and also to apply system updates and configuration changes.
There was about 7 seconds of downtime. The behaviour was: updates that were already in progress completed successfully; of the updates that came in during those seven seconds, some failed, and that is something we are going to improve. Queries were unaffected. And this is an aspect of improved resiliency: now that we have tested failover, if this happens unexpectedly in the future we have a standby we can fail over to very quickly.
The RDAP implementation: that is the modern standardised query protocol. The intention is to be consistent across RIRs, and we are working on improving that. The other thing we are working on is compliance with the RFCs; there are some grey areas within the RFCs where we are working to define the behaviour. We have implemented many improvements, listed in the release notes for the last two releases, and we are planning further work after this meeting.
Policy 2017‑02, regular abuse‑c validation. My colleague Marco announced this morning that it's now finally implemented, complete. On the development side, we completed the initial validation. We worked on the development earlier this year under three categories, as Marco said: LIR organisation abuse‑c, LIR resources abuse‑c (allocations and assignments), and end user organisations and resources. We also worked with the e‑mail service provider, the validator, on improving handling of grey listing and transient network errors. This is something that happened quite a lot in the beginning, and it caused a lot of unnecessary work for the RIPE NCC and for our users, a lot of tickets, but we identified the cause and we made some improvements there; I think that's greatly reduced now. We clarified text in the e‑mail template: based on feedback from the community we clarified and improved the e‑mails that we were sending out. We made some bug fixes for edge cases that came up in production as we were running the service. In the end, we validated about 77,000 abuse‑c e‑mail addresses, about 5% of those were marked as invalid, and we followed up on those with tickets. And thank you to the community for all of your feedback; it made this implementation a lot better.
Numbered work item NWI 8, user synchronisation. This is phase 1 of the numbered work item. The problem definition we broke into two parts: to easily add or remove users to the default maintainer from the portal, and, as a second phase, to implement authentication groups. We went for implementing the synchronisation first as phase one, so we focused on synchronising user accounts from the portal to the default maintainer, and we are deferring the authentication groups until later, so we can work on defining exactly what we want there. We implemented it during August and deployed it in September, and about 50 organisations have already adopted synchronisation, so that was a good start. Mahesh will be demoing the feature later to explain how it works.
Finally, we have in‑band notifications, NRTM. This is a work item that's still in progress. The problem definition is to allow routing information to be pushed out to users regardless of membership status, and you can break that into two pieces: there is a legal issue around opening up the protocol, and a technical issue around which protocol we should adopt. On the legal side, our legal team reviewed opening it up. NRTM was made a member‑only service through a General Meeting decision in 2011, so it's now part of the charging scheme model, and opening NRTM will need at least an Executive Board resolution; I have asked for that to be added to an upcoming meeting. The second thing: there currently is a separate NRTM agreement on top of the Whois terms and conditions, and I would like to drop that if possible, since the data that we are returning is very similar to the data you would get from a regular Whois query. So there may well be no need for a separate NRTM agreement in the future, but that's for the legal team to decide.
Solution definitions. At the last RIPE meeting I presented one involving HTTPS, WebSockets and JSON, effectively a wrapper around the current implementation. That it is easy for us to implement is the advantage, but there are a couple of downsides: one is that the protocol is not cacheable; also, it's a push mechanism, so you need to stay connected to get updates; and finally, the protocol is very similar to what we already have. There are alternatives, and I didn't want to preclude discussion; there has been some off‑list discussion in the meantime, and there is a separate agenda item coming up to talk about that in more detail.
GDPR. We have made a lot of progress on compliance since RIPE 78. First of all, there was a suggestion at the last RIPE meeting to switch to using role‑maintainer pairs rather than person‑maintainer pairs across our services, because the RIPE NCC was contributing in a way to the volume of personal objects that are in the database. So we have implemented a change in the web application: the default now will be to create a role‑maintainer pair, and during the membership application process role objects will also be created instead of persons.
Following a legal review, and this was presented at RIPE 76, we now don't include personal data in historical queries, and we don't include person and role references either. There is a presentation on that. And finally, we made a bug fix: we realised the show‑version flag in the query protocol allowed you to query for object data even if you were blocked for querying excessively for personal data, so we closed that gap to make the query side consistent now.
One more thing, too: the RIPE Stat database widget, which is an alternative way of querying the Whois data, is now consistent with how we do things. There are a couple of things: it does not return personal history for role objects, and if you are blocked for excessive querying you won't get anything through RIPE Stat either.
And finally, we have made some improvements to our unreferenced object clean‑up job, which cleans up unused personal data in the database. This has existed for a long time, and there is a page that explains how it works, but we have extended it: there were some gaps where there were combinations of references that we weren't handling. In general we should be deleting unreferenced person, role and maintainer objects that are not referenced from outside. Three groups that we have added now are maintainer‑organisation, organisation‑person/role and maintainer‑person/role‑organisation. So now that we have the information, it's a lot easier to deal with unreferenced data.
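A rough sketch of the kind of reference analysis such a clean‑up job performs. The data model and object names here are hypothetical, invented for illustration; the real job works against the database's reference tables:

```python
# Sketch of the unreferenced-object clean-up described above: person,
# role and maintainer objects can be deleted when nothing outside the
# candidate group references them, including small mutually-referencing
# groups (e.g. a person-maintainer pair) with no outside reference.

def find_unreferenced(objects: dict[str, set[str]],
                      cleanable: set[str]) -> set[str]:
    """objects maps an object key to the set of keys it references.
    Returns the cleanable objects that have no referrers outside the
    surviving candidate set (mutual references inside it are ignored)."""
    candidates = set(cleanable)
    changed = True
    while changed:
        changed = False
        for key in list(candidates):
            referrers = {o for o, refs in objects.items()
                         if key in refs and o != key}
            if referrers - candidates:       # referenced from outside
                candidates.discard(key)
                changed = True
    return candidates

objects = {
    "PERSON-A": {"MNT-A"},        # person and maintainer reference each
    "MNT-A": {"PERSON-A"},        # other, but nothing else references them
    "PERSON-B": {"MNT-B"},
    "MNT-B": {"PERSON-B"},
    "INETNUM-X": {"PERSON-B", "MNT-B"},   # a resource keeps the B pair alive
}
cleanable = {"PERSON-A", "MNT-A", "PERSON-B", "MNT-B"}
print(sorted(find_unreferenced(objects, cleanable)))  # -> ['MNT-A', 'PERSON-A']
```

The loop is needed because removing one object from the candidate set can expose another to an outside reference, which matches the "combinations of references" gaps mentioned in the talk.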
Moving on to proposed clean‑ups. I am going to propose four clean‑ups to data in the RIPE database, and I would appreciate any feedback or questions on that. I also plan to notify the Working Group on the mailing list, so if anybody misses this presentation they can follow up on the list.
First: clean up locked person objects. This follows on from the presentation at the last RIPE meeting. We have a huge backlog of locked objects in the database, roughly 620,000 out of a total of 2 million. The background: maintaining an object was optional in the RIPE database until 2011, and that left a lot of unmaintained objects in the database. In 2016 we decided to lock anything that was unmaintained at that point, because we couldn't identify the proper owner of that information, but now we are left with a legacy of hundreds of thousands of locked persons that are effectively maintained by the RIPE NCC. There are a couple of goals in cleaning these up. We would like to align with RIPE policy: registration data must be correct at all times, so these objects have to be maintained properly and not by us, and, in line with the GDPR, we assign responsibility for contact details to the correct maintainer. What we are not doing: we are not planning to delete these persons or replace them with dummy data, because we feel we would be replacing one problem with another problem. Also, we are not going to try to validate the correctness of contact details here, because there is just so much of it.
The idea: 90% of the references to locked persons are from IPv4 assignments. And within those references, 74% of the assignments are the responsibility of ten LIRs. So we have a way to assign responsibility for these locked persons directly to the assignment maintainer. Since the last RIPE meeting we investigated a bit further and found that two‑thirds of these locked persons we can assign directly to the assignment maintainer, because the assignment was generally created at the same time as the person, so there is a connection between the two of them.
Otherwise, we are planning to set the person maintainer to the LIR maintainer so the less specific allocation maintainer.
In order to do this, we had a discussion on the mailing list, we announced it in the Services Working Group and Database Working Group, and we plan to notify end user and LIR organisations before and afterwards. What is very important is that they themselves validate the contact details for these resources, because we are talking about a legacy of person objects that have existed since before 2016, so this data is quite old and there is no way for us to validate it.
So that's the first planned clean up.
Next, within the RIPE database documentation there is an interesting paragraph on the aut‑num object saying that mnt‑lower is not applicable: it's an optional attribute within the aut‑num object, but it has no purpose and should be deprecated. So I propose to remove this attribute from aut‑num objects, and we will again notify maintainers before and after we do any of this removal.
Third clean‑up: remove a status remarks attribute. We found that just over 19,000 inetnum objects still have a legacy status remark that we added to the objects in 2015, to explain the implementation of a policy to set the status to LEGACY. This was directed at the users of those resources, but now, four years on, we feel the remark is less useful and it's obsolete. We already cleaned up a similar remark on aut‑num objects two years ago, so we feel the time has come to remove this status remark from inetnum objects also. The plan here would be to remove the remarks and again to e‑mail the holders of the affected objects before and after the change. Since it's a remarks attribute, hopefully it doesn't have any operational impact.
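As an illustration, such an inetnum object might look roughly like this. This is a hypothetical object: the values use documentation ranges and invented names, and the wording of the remark is made up for the example, not the exact text added in 2015:

```
inetnum:    192.0.2.0 - 192.0.2.255
netname:    EXAMPLE-NET
status:     LEGACY
remarks:    For information on the LEGACY status, see the RIPE NCC
remarks:    documentation               # the remark lines to be removed
mnt-by:     EXAMPLE-MNT
source:     RIPE
```

Only the remarks lines would be deleted; the status and all other attributes stay untouched, which is why no operational impact is expected.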
Finally, set the default maintainer. We have an internal job that reports on inconsistencies with the default maintainer feature. The default maintainer allows an LIR to update their own organisation and allocation resource objects; it was implemented two or three years ago. First of all, we would like to extend it to all of the top level resources for an organisation. It's an inconsistency that an organisation can set a default maintainer in the portal but they don't have access to their ASNs, to the assignments and to their legacy objects with that default maintainer, so we feel it's an inconsistency we could fix by setting that default maintainer on all top level objects for an LIR organisation.
Secondly, we found over time that this nightly job has identified more inconsistencies: manual changes have led to the default maintainer being removed from top level objects, by end users and by staff within the RIPE NCC doing manual changes during transfers. So I propose a nightly job to resynchronise these top level objects, to set the default maintainer and remove other user maintainers, so that the objects in the database reflect the status of the default maintainer in the portal.
So those are the four proposed clean‑ups. If there is any feedback on that, I'd appreciate it. Otherwise, I am going to e‑mail the list later with a summary of those things.
RUDIGER VOLK: Deutsche Telekom. Two points. First, the default maintainers: how do I recognise them? Is that a separate attribute, or is the schema implicitly pointing something out?
EDWARD SHRYANE: It's an additional mnt‑by attribute on the top level resources; it's a user maintainer that is on the top level resources for an LIR organisation. So you would have an allocation, with an allocation status, the org reference to the LIR, and the mnt‑by will be the user maintainer, as it appears in the LIR Portal.
RUDIGER VOLK: And an additional maintained by?
EDWARD SHRYANE: Yes.
RUDIGER VOLK: So if I use the traditional Whois I will potentially see two maintained by and I will not know which one is the default?
EDWARD SHRYANE: By default you will have a RIPE NCC maintainer on the object, and you will see a user maintainer. But it is possible to add additional user maintainers; there may be good reasons to do that, but it creates an inconsistency with the portal, because you only choose one default maintainer.
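To illustrate the situation being discussed, an allocation carrying several maintainers might look roughly like this. This is a hypothetical object: RIPE-NCC-HM-MNT is the RIPE NCC's hierarchical maintainer, while the other names are invented for the example:

```
inetnum:    192.0.2.0 - 192.0.2.255
netname:    EXAMPLE-NET
org:        ORG-EXAMPLE-RIPE
status:     ALLOCATED PA
mnt-by:     RIPE-NCC-HM-MNT      # RIPE NCC maintainer
mnt-by:     EXAMPLE-MNT          # default (user) maintainer from the portal
mnt-by:     EXAMPLE-BACKUP-MNT   # extra user maintainer, indistinguishable
source:     RIPE
```

This is Rüdiger's point: in plain Whois output, nothing marks which of the user mnt‑by attributes is the portal's default maintainer.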
RUDIGER VOLK: My point is, at least in the traditional Whois information, it is not detectable which is which. Or, well, okay, it may be that the first one is, but the order of same‑named attributes in RPSL is kind of not a really well‑known concept.
EDWARD SHRYANE: To answer that, we have an internal database that keeps track, so we can tell what choice the user has made. So we know which default maintainer is the correct one.
RUDIGER VOLK: Yes, but kind of, we have a huge number of people that may want to look at and potentially fiddle around with our LIR's data and, well, okay, it will not be obvious to everybody who is in the business what actually is there and how it's interpreted. You mentioned that 74% of the 640,000 person objects are the responsibility of ten LIRs. I am not completely sure whether that hits our LIR; it might. So I am not naming names.
RUDIGER VOLK: Kind of regardless of that, at least for those LIRs, one could imagine that they actually need to hire temp staff to deal with what is coming there. For smaller ones I don't know what bad impact may hit them. I think it would be a good idea to very quickly take the information that you used for doing this classification and send out a notice: hey guys, we are looking at this, and this is the set of persons and the number that you are expected to work on. Early on.
EDWARD SHRYANE: I absolutely agree with that, and that it's going to create a lot of workload for certain LIRs, but this data has been there for years and I think the time has come to clean it up.
RUDIGER VOLK: I very much appreciate that you are taking this up, but, please, please, please, push the information early on so that the impact can be managed.
EDWARD SHRYANE: Thank you, I understand.
DENIS WALKER: Just a quick response to Rudiger's comment there. These objects are currently locked, so you can't do anything at all with them right now, so why would you need to employ staff to do something if there is nothing to do?
RUDIGER VOLK: Because quite obviously, the thing is when they are unlocked, the maintainer will be set and the maintainer will be obliged to immediately fix them formally.
DENIS WALKER: But they are not fixed ‑‑
RUDIGER VOLK: "Fix" in the sense not of making them stable, but of correcting them.
DENIS WALKER: But you are not doing that at the moment.
RUDIGER VOLK: We cannot at the moment, formally, because you were saying, even verbatim, that since 2011 the maintainer has a responsibility to keep the data up to date; it must be correct at all times. When it's changed now, in five seconds I should be fixing it. When these are unlocked, the guys who are getting to be the maintainers will have an obligation to do something there, even if they cannot do anything right now.
SHANE KERR: But they can do something right now because presumably they have control over the objects that reference these. They could dereference them and they will get cleaned up automatically. They already have that capability today.
EDWARD SHRYANE: Right now you can create a different person or role object and update your resource, and once the person is unreferenced it will be cleaned up by the job I mentioned earlier. But that is a very slow process. When we locked these objects three years ago there were over 900,000, nearly a million; now it's just over 600,000. About 20,000 objects have been cleaned up in the last six months, since the last RIPE meeting. It's a very slow process, and I don't think we can wait for that to happen on its own.
SUZANNE TAYLOR: From the RIPE NCC, and I have a question from remote participant Cynthia: is there any solution to remove mnt‑lower etc. that are not the default maintainer? As currently, when you change the default maintainer, the old one is still left in the mnt‑lower attribute.
EDWARD SHRYANE: Currently that is not a part of the default maintainer process; it's up to the LIR to organise the mnt‑lower themselves. But it's something we could look at: if you want the default maintainer as mnt‑lower, that is doable as well. Currently it doesn't do that.
If there is nothing else, I will move on to the final section. I know I am already over time. Just to briefly mention upcoming work. A draft activity plan has been published for next year, and there is a whole bunch of things we are planning to do for the RIPE database: improve the resiliency; new features; we plan to work on NWI 9; reduce the number of out of region objects in the RIPE‑NONAUTH database we created last year. There's a policy now in last call that, after notifying the user, will delete route and route6 objects that are conflicting with RPKI, and NWI 3, which is for discussion in this session. GDPR, since there are further improvements we can make. And finally, a big one, usability improvements; we haven't made massive usability improvements recently. There is a whole pile of things we could do: improving the search interface, route object creation, queries, person authentication. We would like to make the RIPE database a lot more usable. That is our plan for next year.
Any questions? Thank you.
(Applause)
WILLIAM SYLVESTER: Up next, Shane Kerr.
SHANE KERR: I wasn't planning on doing slides, but we made slides for the session later today. I am not in the agenda, even. I am part of the new RIPE database requirements task force. There was a call out asking for members of a task force to look at the requirements of the RIPE database, the idea being we haven't really done a good re‑thinking of what the RIPE database is for, ever, so basically that's the idea of this task force. Members were selected and we met for our first meeting this week; you can see the list. We will be talking about this a bit more at the RIPE community plenary later today.
You can contact any of us at any time if you have questions or suggestions or something like that. Our mailing list is publicly archived, I was reminded earlier today. There is the link, it shouldn't be hard to find, if you Google it you should find it.
We were handed a draft charter for the task force, with the idea to limit the scope and make sure that we finish the work in a timely manner; the idea is not for this to be a long‑running activity but to be finished in a year, basically. We made a minor adjustment to our time‑line of deliverables: the original plan was for us to have a first draft by the end of 2019. We thought that was a bit aggressive, since we are halfway through October, so what we are going to commit to instead is to have an outline skeleton of all the things that are going into the requirements document by the end of the year. We will get a full first draft out a couple of months before the next RIPE meeting and present that to the community, so there will be time for feedback and discussion on that initial first draft, and then we can meet at the next RIPE meeting and discuss it in person if that makes sense, with the objective of making updates and revisions to it, filling out missing pieces, removing pieces that don't make sense, all that good stuff, to have a final version for RIPE 81. That's the plan. At that point there is a requirements document, our work is done, and hopefully the task force disbands and passes the work back to the RIPE Database Working Group or wherever it ends up; I can't imagine it will be anywhere other than here, but there may be parts that go to other groups too. This group will be the primary focus of the work, but because there is also use of the database by a bunch of other Working Groups, routing, IPv6, all of our other friendly Working Groups will probably have input as well.
That's all I had to say. Are there questions?
NURANI NIMPUNO: Asteroid. Thanks for this. I was going to say something along the lines of make sure that you do good outreach and get the right people involved and a diverse set of people. And I saw the people who were on the task force and I am absolutely convinced you will do so. They are all very good people so good luck and thanks.
SHANE KERR: Great, thank you. All right.
(Applause)
FELIPE VICTOLLA SILVEIRA: Up next we have a demo from Mahesh of the RIPE NCC for NWI 8, the new SSO synchronisation feature.
MAHESH AGGARWAL: I will be explaining this in more detail. Let's start first with why it is actually needed. Three years back, the default maintainer feature was introduced in the LIR Portal; it allows the user to manage their resources and organisation directly in the database. But the downside is, whenever users are added or removed in the RIPE portal, the same needs to be done in the RIPE database as well. So it's a repetitive and manual task, and it opens a window of unauthorised access: suppose users have been removed from the RIPE portal but the same has not been done in the RIPE database, then there is unauthorised access. To remove this, we introduced a feature called synchronisation. First you need to set a default maintainer, and once you have set that you can enable synchronisation. As soon as you enable it, all the non‑billing users will automatically be added to the default maintainer in the RIPE database. A point to note here is that we will be replacing all the SSO attributes currently in the default maintainer with the SSO accounts coming from the RIPE portal, and only SSO attributes will be affected by it.
And any future addition or removal of users in the RIPE portal will automatically be reflected in the default maintainer in the RIPE database, so there is no manual task needed for it.
How do you enable synchronisation? First of all, you need to log into the LIR Portal. There is a section called "My Account", and you have a subsection for maintainers; this is essentially where you set the default maintainer. We have introduced a small check box, "synchronise". Once you click on this check box, there is a nice pop‑up that basically asks for confirmation and explains in short what it means. Once you click on yes and it is enabled, you will see the check box ticked and a green message. It's just two clicks away.
So, does it mean that anyone can enable synchronisation? Actually, to enable it your account must be present in the default maintainer in the RIPE database. If it is not present, it will appear that you are not allowed to enable or disable synchronisation, as your authentication must be present in the RIPE database for that default maintainer. Does it mean multiple organisations can enable it? If we allowed this, we would end up in a scenario where multiple organisations would essentially be overriding each other's attributes, so we are not allowing multiple organisations to synchronise at the same time. So, for example, if TEST‑MNT is synchronised with organisation ORG‑TEST‑RIPE, any other organisation will be disabled, with this error message.
So, how does the RIPE database get affected? Once synchronisation is enabled, we have a restriction that you cannot add any SSO attributes from the RIPE database side. If you try to do so, you will get an error message like this. But please mind that it affects only SSO attributes; the user can still go ahead and add any other attributes.
So, is the maintainer synchronised? Basically, in the user account page where you can add users, we have added a new banner, a warning that whatever you do in this page will have an effect on the RIPE database for that maintainer.
Turning off synchronisation. In the same way you enabled it, you just click on the check box to disable it. Again, a pop‑up will come up asking for a confirmation, and we will turn off the synchronisation. A couple of points to note: while turning off synchronisation, the default maintainer state will not be changed. If you turn it off, it's not going to fall back to the state you were in before you turned it on; we won't change the default maintainer, it will remain as it is. But of course, future addition or removal of any users in the RIPE portal will have no effect on the RIPE database, so the default maintainer is no longer updated, and you can then go ahead and change SSO attributes in the RIPE database.
I want to mention one special case. Let's suppose synchronisation is enabled and you are trying to change the default maintainer to another maintainer. The new default maintainer will be automatically synchronised, and synchronisation for the previous one will be turned off. It's just a nice feature.
Future work:
So we are thinking of making SSO authentication groups, but only if there is sufficient interest from the Database Working Group, and based on our priorities. If there is, we will work on creating SSO authentication groups.
For further reading, we have created RIPE Labs articles that basically explain everything I just explained to you. And for any discussion or feedback, we would love to hear from you; just e‑mail the Database Working Group.
And I am open to questions and ‑‑ any questions? Thank you, guys. Thanks.
(Applause)
FELIPE VICTOLLA SILVEIRA: Up next, Job, talking about NWI 3, RIPE NONAUTH.
JOB SNIJDERS: Hi everyone. For a number of years we have had NWI 3 as a work item for our Working Group, but I have personally come to the interpretation that recent developments have overtaken the necessity to continue work on this numbered work item. In this presentation I would like to explain why I think that's the case, and then perhaps suggest that we as a group can abandon this work item.
A little bit of history. Back in early 2016, this predates the split between RIPE and RIPE NONAUTH, and this predates the current policy proposal in the Routing Working Group to leverage RPKI to clean up some aspects of IRR data. The background was that when, roughly ten years ago, inetnums were transferred from RIPE towards AFRINIC, perhaps as a small omission, a tiny detail we missed, the route objects that belonged together with those inetnums were not transferred to AFRINIC. This leads to operational challenges where information is stored in a database whose AFRINIC‑based maintainers are not necessarily in a direct relation with the RIPE NCC. And in a way this makes them second‑class citizens, because they did not have the same authorisation model as what normal RIPE resources would have.
So, at the time, the proposed idea was to move 34,000 objects in one swift move from the RIPE database to the AFRINIC database and say good luck with the rest of your data. Obviously, this effort has not yet materialised in anything tangible, since this has not happened and this action item is also fairly old.
And I think one of the challenges here is that it would require buy‑in from two organisations, AFRINIC and the RIPE NCC, and buy‑in from two different communities that are physically not close to each other. This means a lot of talking and a lot of risk that the process gets bogged down because of the amount of people we need to involve.
Another downside of this approach is that it would be a rather large immediate change. I personally believe that the operational consequences would be fine and we have nothing to worry about, but this may be an interpretation other people don't agree with.
So, let's move forward a few years to the 2018‑06 proposal, to clean up data that exists in the RIPE non‑authoritative database leveraging RPKI ROAs. The idea is we can use those ROAs to apply the origin validation procedure to IRR objects. Previously we have always looked at that procedure as applying to BGP updates received over iBGP or eBGP, but we can also leverage this procedure in the context of IRR.
A quick example of what the mechanism is. We have presented about this before, but I think it would not hurt to give a quick recap on what the proposal is about. Here we have a /24 route object registered in the RIPE non‑authoritative database, and this was created without the explicit consent of NTT, but 129.250.15.0/24 is a number resource administrated by NTT. Now, NTT has created a ROA: one ROA exists, and the ROA specifies that a /16 can exist, the maximum prefix length is 16, and the only valid origin for that prefix is AS2914. If we take into consideration that RPKI ROA data by definition is created by the resource holders, because that's how we constructed the RPKI services in this community, then we can use that explicit publication of routing intentions to analyse whether that route object has a right to exist or not. And if we go back to this route object, it describes a situation that should not exist, because if a BGP announcement were to align with this route object, that BGP announcement would be marked as an RPKI invalid announcement, because of the ROA. So the proposal is to download the ROA data, apply the origin validation procedure to the RIPE NONAUTH dataset, send out some e‑mails, have a grace period, and then delete route objects when they are in conflict with published RPKI data.
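The procedure described here can be sketched in a few lines. This is a simplified illustration of RFC 6811‑style origin validation applied to a route object, not the actual implementation; the covering ROA prefix 129.250.0.0/16 is inferred from the example, as the talk only states that a /16 with maxLength 16 exists:

```python
import ipaddress

def validate(route_prefix: str, route_origin: int,
             roas: list[tuple[str, int, int]]) -> str:
    """Classify a route object as 'valid', 'invalid' or 'not-found'
    against a list of ROAs given as (prefix, maxLength, origin ASN)."""
    prefix = ipaddress.ip_network(route_prefix)
    covered = False
    for roa_prefix, max_length, roa_asn in roas:
        if prefix.subnet_of(ipaddress.ip_network(roa_prefix)):
            covered = True                    # at least one ROA covers it
            if prefix.prefixlen <= max_length and route_origin == roa_asn:
                return "valid"
    return "invalid" if covered else "not-found"

# One ROA: 129.250.0.0/16, maxLength 16, origin AS2914.
roas = [("129.250.0.0/16", 16, 2914)]

# The /24 route object conflicts with the ROA (24 > maxLength 16), so
# under proposal 2018-06 it would be a candidate for deletion.
print(validate("129.250.15.0/24", 2914, roas))   # -> invalid
print(validate("129.250.0.0/16", 2914, roas))    # -> valid
```

Only objects that come out "invalid" would be deleted after notification and a grace period; "not‑found" objects, where no ROA exists, are left alone, which is why not creating ROAs remains an option.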
And the nice thing about this proposal is that it does not affect RIPE‑managed space in any way, and it does not affect legacy resources in cases where they cannot create RPKI ROAs. It's, I think, a very gentle way to slowly get rid of the incorrect information in the RIPE NONAUTH database.
Some URLs. There is a test tool so you can see whether you are affected or not, and this proposal is currently in last call.
Now, my proposal to this group is that, in the context of 2018‑06, if that is ratified by the community and implemented by the RIPE NCC, I no longer see a need for NWI 3 to be discussed any further. The moment the 2018‑06 proposal is implemented, we can send a consistent message to all stakeholders of the Internet community and tell them: should you observe route objects in RIPE's non‑authoritative dataset, the way to remove them is to publish RPKI ROAs. I think this will allow us to slowly see fewer and fewer route objects covering AFRINIC resources in the RIPE non‑authoritative database, and I think such a gentle, slower approach to cleaning up the data is more beneficial than attempting to remove this data in one go.
And if people don't create ROAs, if they are happy with the object as it exists today, then that's fine, that's also an option.
So, in short, if 2018‑06 is ratified, I think it would be my preference to abandon NWI 3. With that, I would like to open up the floor for some questions, comments.
SUZANNE TAYLOR: RIPE NCC. And I have another comment this time from Cynthia, independent, who says: "Thank you for raising this issue and having a suggestion on how to fix it."
JOB SNIJDERS: Thank you.
RUDIGER VOLK: Deutsche Telekom. I think NWI 3 quite certainly is overtaken and, regardless of what other policy actions happen, we can just put it in the dumpster. For the actual dealing with and disposal of route objects that at a certain time classify under origin validation as invalid, I have my reservations, but I don't want to bore the audience right now with details. My general approach is: don't mess with other people's data. Sending out first messages saying "we have detected this or that, and it looks bad, or invalid" is for me quite clearly by far the preferred way over just deleting stuff, and the lead time for deleting stuff that would be needed to avoid negative operational impact would have to be fairly long and would need that additional communication anyway.
JOB SNIJDERS: Thank you for your feedback. It is good to hear that we are at least in agreement on the existence of NWI 3, perhaps for different reasons, which is fine. Any other input or feedback? Then I think the work item is referred back to the chairs, and the chairs would need to gauge whether it is appropriate to close the action item or not. The ball is now in your court.
FELIPE VICTOLLA SILVEIRA:
WILLIAM SYLVESTER: We will take it back to the mailing list.
(Applause)
Up next we have NWI 9, talking about the in‑band notification mechanism.
STAVROS KONSTANTARAS: I am standing in front of you with two hats, one hat is AMS‑IX and the other one is Euro‑IX and I am going to explain later why I have two hats for that.
Let's go further. NWI 9. So this item was actually raised on 22 March, quite some time ago, by my ex‑colleague, and he clearly stated in an e‑mail that he sent to the mailing list that AMS‑IX is interested in having a mechanism implemented by the RIPE NCC by which a client can register interest in objects that already exist in the RIPE database, mostly routing objects, right? And then via some RESTful or whatever mechanism we can get these objects and download them, and the reason is that we configure BGP filters, route server filters, based on that. Then a discussion was initiated and many people expressed their interest with arguments, support, whatever. It was an ongoing discussion. And it became NWI 9.
I want to talk about the motivation behind this item. The motivation is that we had reports from customers that they receive huge amounts of BGP updates on an hourly basis, and that makes sense because we are very aggressive, but they do receive a lot of chunks of BGP updates, which can last between five and 15 minutes, based on what they have done to their RIPE database objects or other objects.
Another motivation is that we have customers reporting: I configured my policy and it has not been pushed yet to your route server configuration; how long do I have to wait? People expect it to happen very fast. Things do not work that fast, especially when we talk about route servers, and still people claim we are the fastest in the market, so people expect us to be even faster. So there are a couple of motivating reasons behind this work item. We started doing some research inside AMS‑IX and asked: what happens with the changes to the objects in the RIPE database? After doing some research we discovered that the relevant object changes that happen and need to be picked up by the route servers number around 25 to 30 per hour, and these are numbers since 1st January 2019. We plotted these numbers on a graph, and of course this is not the whole truth; there are also sometimes slots where we have between 100 and 200 changes that need to be picked up, filters generated and pushed to the route servers, but the majority of the time, 73% of all times, there are between 0 and 15 changes we need to pick up and put on the route servers. People go to the RIPE database mostly, or any IRR database, they update their route objects, and these changes need to be retrieved and filters need to be configured. So we have to really continue supporting this model. In order to make this model work more efficiently, we had to look into our current model. This is not only ours but a very typical model that most IXPs have deployed in order to support their route servers: as you can see here, people use IRR database sources like the RIPE DB, where they can retrieve the objects, and we also have the RPKI database.
You can also have Team Cymru; you fetch this information and put it into your configuration parser or framework, whatever people deploy. You also have a validator around, which may or may not speak with your route server. You have 1, 2 or 4 instances of route servers. So this is a classic model. And this model, for all the people around, in all IXPs, is most of the time a pull model. I know for RPKI we have the new protocol, but RPKI when it started was based on rsync, which means the validator had to go and fetch the information via rsync and store it locally in the cache. So it was a pull model from the beginning.
So, what are the requirements now that we want to fix this problem? There are a bunch of requirements, but I will stick only to these because they are the most relevant to NWI 9. We want the route servers to always be available: when you push a lot of changes to the route servers at the same time and BGP converges, it makes the route server unavailable; when you want to troubleshoot with some information, the route server doesn't respond, and this is something we want to avoid, especially when you are troubleshooting critical information. Of course, we also want to send fewer BGP updates to customers: I would like to send updates when possible or when they are critical, and not send redundant information to customers.
Fast filter updates, of course, and fast BGP convergence time are of critical importance. In order to support this, I would say we have to abandon the pull model and go to a push model. And in order to go to a push model we need three things:
We need subscription, signalling and transport. By subscription, we mean people or software agents can sign up for changes to objects that they are interested in; signalling means the server side notifies the client or subscriber when there is a change that needs to be picked up; and transport transfers the new information from the server side to the client side. And this is why we have NWI 9 now, for subscription and signalling. So NWI 9 focuses mostly on these two parts and not on the transport part; of course that is a good thing that needs to happen, but we start with this.
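As a rough illustration of the subscription and signalling halves (transport deliberately left out, matching the NWI 9 split described above), a toy server-side notifier might look like this; all names are my own and this is not a proposed API:

```python
from collections import defaultdict

class ChangeNotifier:
    """Toy subscription + signalling layer: clients register interest in
    database object keys, and the server signals them when a key changes.
    Transporting the actual object data is out of scope here."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # object key -> callbacks

    def subscribe(self, object_key, callback):
        """Register interest in changes to one object key."""
        self.subscribers[object_key].append(callback)

    def object_changed(self, object_key, serial):
        """Signalling only: tell subscribers *that* something changed and
        at which serial, so they can fetch the diff themselves."""
        for cb in self.subscribers[object_key]:
            cb(object_key, serial)

notifier = ChangeNotifier()
notifier.subscribe("AS-AMS-IX-RS",
                   lambda key, serial: print(f"{key} changed at serial {serial}"))
notifier.object_changed("AS-AMS-IX-RS", 1042)  # prints: AS-AMS-IX-RS changed at serial 1042
```

The "AS-AMS-IX-RS" key and the serial number are made up; a real mechanism would need authentication and a durable transport, which is exactly the part NWI 9 defers.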
So, now, people say we have the NRTM protocol already deployed, which might be a good solution, but that protocol focuses on transporting data and not at all on signalling or subscription, and it's a workaround on top of the Telnet protocol. It's not completely stateful or stateless; it requires some initial set‑up: I have to get the split files, go through a process to initiate it in order to read the first serial numbers, and then go further and pick up changes and so on. And of course the routing information is still restricted to RIPE membership. And as a RIPE database engineer correctly said, it's not fit for what we want to achieve.
We started looking at solutions around the market, at what exists that could probably fulfil these requirements, and we found that a model based on publish‑subscribe, with some topics that you register to, could be a good candidate. For example, if there is a topic that says "I want to receive all the changes regarding the AS‑set of the AMS‑IX route servers", and I subscribe, then when there is a change I will get a notification and I can fetch the changes, or maybe I can get a diff at the same time, it doesn't matter. That could perhaps be a good solution, I don't know, but it could help. And based on the documentation it's also a solution that is very, very scalable: it has parallel operation and message caching, so it's very good at that ‑‑ based on Wikipedia of course, but why not?
So I would say that to go further now within NWI 9 there are three approaches: I would call them Langzaam, which means slow, Snel, fast, and Te Snel, too fast.
The Langzaam approach is that we have to replace NRTM, that's for sure, maybe with RRDP or something similar that can accommodate all the new requirements that have come to the market. We need to first deal with the legal part. We need to design a new protocol, prototype, test, standardise and so on. That could be a future solution; it requires a lot of work and a lot of time, but it can also help us for the future.
The Snel solution is what Ed actually described before and presented at the previous RIPE meeting, a solution based on HTTPS and WebSockets. I will not go deep into that. There is, however, a discussion here about whether this work might be double work, because if we also need to replace NRTM there might be an overlap between the Snel solution and the Langzaam solution. That is something we need to investigate with the RIPE database engineers. The Te Snel solution: we go to Google, there is a pub/sub model, and we write a connector so that when there is a change, that change can go over there and we can pick it up using some Python client. I don't say we have to go with a Google solution; I just say we consume a current solution that exists outside. Nevertheless, we are talking about pushing information that is already publicly available, so we don't push any private or maintainer information to the pub/sub system. Maybe that could be a good way to go further now: get the lessons learned, work on prototypes and see how it goes. If it doesn't fit we can easily scrap it, because it's a very quick solution, and go with a more agile approach.
And this is not only an issue that we had; we raised this issue in the Euro‑IX community. We had a workshop in Amsterdam with engineers from other IXPs and discussed with the whole IXP community how they deal with similar issues and problems; they had the same messages from their customers. We then decided to fully support this as an IXP community and make it a Euro‑IX project, and that's the reason I am standing here with my Euro‑IX hat: I am representing the willingness of the whole Euro‑IX community, not just AMS‑IX. With that, I would like to thank you for your attention and answer your questions.
RUDIGER VOLK: Dealing with your requirement obviously costs a lot of complexity and ‑‑
STAVROS KONSTANTARAS:
RUDIGER VOLK: Whatever is being done, there will be serious complexity on your side to deal with the dynamic data.
STAVROS KONSTANTARAS: Yes.
RUDIGER VOLK: When I am looking at this, I'm seeing, well, okay, in the NRTM feed or any equivalent encoding of that data stream, you get all the information. You are saying, well, okay, you do not want all of it.
STAVROS KONSTANTARAS: Correct
RUDIGER VOLK: Okay, you get it all. I think the adequate approach is: you do the complex things that you need to do, and the tiny thing that you would need to interface with your requirement, instead of being done on the server side, you do on the client side; you just filter out what is relevant to you. Kind of pushing that data flow over Google infrastructure and tools and so on may be a way to make things easier, but I think it is an entirely stupid idea to throw the complexity requirement onto the RIPE database server. I have had, with the basic use of the database server, a couple of more or less catastrophic events when things failed, and I am extremely cautious about throwing more complexity into that software and its operations.
STAVROS KONSTANTARAS: To answer your comment: complexity, yes, we are going to throw the complexity to the client side, I agree, and okay, that's something that I would like to avoid of course. But at the same time, the RIPE database doesn't need to do anything complex, because it already ‑‑ wait, wait, wait ‑‑ it already has the last‑modified field, which keeps state of when an object changed and what happened to it. The only thing that needs to be done is to have a list of subscribers and notify the people or agents or whatever when there is a...
RUDIGER VOLK: Fine.
JOB SNIJDERS: If I may, and this will be another long‑winded comment, so chairs, please signal when I should stop talking. First of all, it is very nice to see interest in this type of problem space. There are not that many people out there who are willing to investigate how things currently work and what can be done to improve things.
I have more comments I would like to make. NRTM can act as a pull protocol and as a push protocol, and you can get from a change in the RIPE database to the object being pushed to the client in seven seconds. So speed is something we already have. However, NRTM version 3 is not a modern protocol; it lacks authenticity and integrity and a lot of other things, so there is no question in my mind that we need an NRTM version 4 that is structured around modern‑day mechanisms.
But from an architectural point of view, I am concerned with putting some of the decision‑making into the RIPE database infrastructure itself, because if you look at how clients interact with the IRR, there are various algorithms for how they construct a list of prefixes from an input parameter such as an AS‑set ‑‑ which sources you are interested in, the order in which you parse them, how you ‑‑
STAVROS KONSTANTARAS: You are talking about resolving?
JOB SNIJDERS: Yes.
STAVROS KONSTANTARAS: Don't want that.
JOB SNIJDERS: We construct prefix‑based filters for BGP sessions.
STAVROS KONSTANTARAS: I can foresee this being a five‑year project. I mean, all the solutions might be there.
JOB SNIJDERS: No, I am much more optimistic than five years. The trick is ‑‑ I don't think we can get around having a client that the operator, in their own environment, instructs to resolve certain things. If you say "I would like to subscribe to changes to this AS‑set", there may be AS‑sets that are indirectly referenced, which may or may not exist in the RIPE database, and in a public pub/sub mechanism you cannot tell the server your policies, so the server can never accurately inform you: this changed, and it seemingly is not related to your operation, but through indirect references, of which RIPE's database cannot be aware, this is a change that is relevant to you. So ‑‑
STAVROS KONSTANTARAS: You are again talking about resolving ‑‑ sorry for interrupting you. For example, in an AS‑set you see members: member AS‑sets and AS numbers, right. At some point this might change; two members go away or three new members come. References, of course, yes.
JOB SNIJDERS: And they have references ‑‑
STAVROS KONSTANTARAS: I am still going to resolve locally, with a daemon, or I can use Whois ‑‑ I will resolve locally. What I want is someone to tell me: look, there is a change, and this is the diff; that would be ideal. If not, okay, but for now I would like to have: there is a change, two new members came. I will take that information and go and do my resolving part. Ideally, RRDP or whatever could in a perfect scenario do the resolving as well, but you know that this becomes more and more complex the more requirements you put into the sophisticated mechanics.
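The local resolving Stavros describes — expanding an AS-set's members, including nested sets, into a flat list of ASNs — might look roughly like this; the set names and ASNs are made up for illustration, and the loop guard matters because circular references do occur in real IRR data:

```python
def resolve_as_set(name, as_sets, seen=None):
    """Recursively expand an AS-set into the set of member ASNs.

    `as_sets` maps set names to member lists (ints for ASNs, strings for
    other AS-sets); `seen` guards against circular references."""
    if seen is None:
        seen = set()
    if name in seen:
        return set()  # already visited: break the cycle
    seen.add(name)
    asns = set()
    for member in as_sets.get(name, []):
        if isinstance(member, int):
            asns.add(member)
        else:  # another AS-set, possibly only indirectly referenced
            asns |= resolve_as_set(member, as_sets, seen)
    return asns

data = {
    "AS-EXAMPLE-RS": [64496, "AS-CUSTOMER"],
    "AS-CUSTOMER": [64497, 64498, "AS-EXAMPLE-RS"],  # circular reference
}
print(sorted(resolve_as_set("AS-EXAMPLE-RS", data)))  # [64496, 64497, 64498]
```

Job's point maps onto this directly: the server cannot run this recursion for you without knowing your whole policy, which is why any change to `AS-CUSTOMER` is invisible to a naive per-object subscription on `AS-EXAMPLE-RS`.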
JOB SNIJDERS: Let me rephrase ‑‑ and similarities can be drawn to how we deal with RPKI data. Let's see if I can word it differently. In the current NRTM model you receive everything, and it's not even a firehose, because the number of objects that change per day is limited; it's very manageable, small data. The way to do resolving, whether that's local or remote ‑‑ any resolving at all requires the full set of data to exist in your local cache. So I think a pub/sub mechanism where you instruct the server "give me a subset of changes" cannot be transformed into a complete solution.
STAVROS KONSTANTARAS: It's not a complete solution, but NWI 9 focuses on signalling first and then on the transporting of data. I mean, the transporting of data is already solved by NRTM, as you said; we already have it. I don't like it, definitely: if I am going to build a solution next year, I will not build it around Telnet, for the reasons you already mentioned. I will wait for the new solution of course, but that is something that already exists, let's say. Okay, that is another problem; let's first fix the first part, signalling, and then fix transport and how to pull in the data and store it. Storing is already handled reasonably in RRDP because it's kind of modern, while the rest still uses rsync or NRTM. Slowly we can fix the parts.
JOB SNIJDERS: To prevent this from becoming a five‑year project ‑‑ and we discussed this in an off‑line conversation ‑‑ I think what we are trying to tackle here is actually two distinct issues:
One is, we need to give everybody, RIPE members and non‑RIPE members, access to the NRTM data stream, so that everybody can get access to routing information more quickly than once every 24 hours. This is a matter of policy.
STAVROS KONSTANTARAS: Legal part, yes.
JOB SNIJDERS: It doesn't matter if it's version 3 or the next new thing or whatever we call it.
STAVROS KONSTANTARAS: That is the legal part, we know that.
JOB SNIJDERS: And secondly ‑‑ not once, but in parallel to opening up people's access to the realtime components of how routing information is distributed ‑‑ we need to specify what we expect from NRTM 4, what the shortcomings of NRTM version 3 are, and details like: do we send everything to clients, and do clients need to decide based on local policy? At that point we have a technical debate on whether you can coherently generate filters or not, but I think if we split it out into those two paths, we will make more progress.
STAVROS KONSTANTARAS: That's for sure, we need to split it up into two parts; that's non‑negotiable. We can discuss the technical requirements as the next step; I think Bijal is going to work on it.
JOB SNIJDERS: In this problem space there is zero difference between IXPs and ISPs.
STAVROS KONSTANTARAS: Let's not start this discussion now.
FELIPE VICTOLLA SILVEIRA: So just for clarification, I think what we are hearing is that we are interested in splitting NWI 9 into two paths.
STAVROS KONSTANTARAS: Legal part and technical part.
FELIPE VICTOLLA SILVEIRA: I think there were a couple of things, Ed, that you already talked about too.
JOB SNIJDERS: Task one for our community: we should give everybody as close to realtime access as we can to the dissemination of routing information. The implementation that we can currently do with today's infrastructure and today's technologies is to just say any IP address can connect to the NRTM server and receive updates, but the updates, server side, will be limited to just routing information. So that's low‑hanging fruit; it's easy, it removes friction, and it improves the security posture of RIPE members because it facilitates non‑RIPE members constructing better filters. It's essentially a no‑brainer to me.
Secondly, I would say we need to specify what NRTM 4 looks like ‑‑ is it pub/sub, WebSockets over TLS, is it RRDP‑based ‑‑ we can leave that; we don't need to answer those questions right now. I think there is universal agreement that NRTM 3 is reaching the end of its life, and we need to come up with something that can be transformed into an industry standard, also taking the IETF into consideration. So, NRTM 4: let's fight over the details.
RUDIGER VOLK: And kind of, the subscription thing or filtering thing ‑‑ well okay, at least for doing a quick implementation, complexity also would be your enemy. What would be needed is a filter definition language that could be used for subscription requests. Or you could implement the filter locally, client side, and quite obviously you will be much more flexible if you do your things according to your requirements locally, and we avoid putting complexity into the database server. If, after five years of experience, the Euro‑IX members come back and say "yes, we now have the ultimate subscription filter language", well okay, maybe that looks so nice, and the database technology used in Amsterdam offers ways of implementing it. But right now I think: keep the architecture simple and separate.
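Rudiger's client-side alternative — receive the full change stream and filter locally against your own policy — can be sketched in a few lines. This is a rough illustration with made-up object keys and a made-up tuple format, not any real NRTM encoding:

```python
# Keys this client actually cares about; in practice this would be derived
# from the operator's own resolved policy (AS-sets, customer ASNs, ...).
RELEVANT = {"AS-EXAMPLE-RS", "AS2914"}

def relevant_changes(stream):
    """Filter a full change stream down to locally relevant changes.
    Each change is modelled here as (operation, object_class, key)."""
    for operation, object_class, key in stream:
        if key in RELEVANT:
            yield (operation, object_class, key)

stream = [
    ("ADD", "as-set", "AS-EXAMPLE-RS"),
    ("ADD", "route", "203.0.113.0/24AS64511"),  # not ours: dropped
    ("DEL", "aut-num", "AS2914"),
]
print(list(relevant_changes(stream)))
# [('ADD', 'as-set', 'AS-EXAMPLE-RS'), ('DEL', 'aut-num', 'AS2914')]
```

The design point is that `RELEVANT` never has to leave the client, which is exactly why Rudiger prefers this over teaching the database server a subscription filter language.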
STAVROS KONSTANTARAS: That's why we proposed three approaches over here, and one of them was the very fast approach, which means we connect, somehow, these changes to a public pub/sub service like Google's. I know you don't like Google, that's fine, I don't like it either. But it's a solution that exists outside and it handles the scalability problem very well, which means we can easily implement something and then focus on an NRTM 4 that can fulfil all the requirements. You don't like this approach?
RUDIGER VOLK: You can today subscribe to NRTM and do your filtering ‑‑
STAVROS KONSTANTARAS: Telnet
RUDIGER VOLK: Kind of transport ‑‑ whether the packets are attached to pigeons or carried by glass fibre doesn't matter a lot. And the thing is, conceptually you have the data stream, and you can filter on the data stream and you can optimise the transport of the data stream.
STAVROS KONSTANTARAS: And start writing some fancy ‑‑ on top of Telnet I have text I have to parse.
RUDIGER VOLK: You have that already.
STAVROS KONSTANTARAS: Did you check if it works?
WILLIAM SYLVESTER: I think we are going to take this to the list, and thank you.
Now we are going to talk about the NWIs that are open, real quick.
DENIS WALKER: Just a quick comment on some of the outstanding NWIs ‑‑ Denis Walker, by the way, co‑chair. We created one quite recently, NWI 10, on the country attribute. There was a very quick discussion on the mailing list, half a dozen comments, and basically everyone said they had no objection to going ahead with this. Nobody has come up with negative things, so I think we can declare consensus on this one. It has been talked about quite a lot. So, if there are no comments today, then we will just ask the RIPE NCC to go ahead and implement it.
Any last comments on NWI 10 about adding the country attribute to the organisation object? No.
RUDIGER VOLK: Mandatory or optional?
DENIS WALKER: The one in the organisation object will be a mandatory one maintained by the NCC; you don't have to do anything. There is still the one that exists in the resource objects, which you can set to anything you want; it has absolutely zero meaning to anybody else, and that's how it's been since year one.
So NCC, we would like you to go ahead and do it.
One other comment: there are still four of them outstanding, numbers 1, 2, 4 and 6. The whole principle of NWIs was to get work done. Having them sitting there for a number of years is kind of the opposite of what we wanted to achieve. So, for these four outstanding ones that no one has talked about for a long time, over the next few months, between now and the next RIPE meeting, I will bring them up one at a time on the mailing list, have a quick discussion, and decide to take some action or reject them, one way or the other. We don't want to leave them sitting there. I will just bring them up on the mailing list and let you have your comments. Any other comments at all on the NWI system or any of the NWIs? No. Okay. Thank you.
WILLIAM SYLVESTER: With that, we have a couple of minutes left; does anybody have any other business? All right, I see none. I want to say thank you again to the RIPE NCC for supporting the scribing and online chat, and to the stenographer for helping us out today. And with that, we will adjourn until our next meeting. We will speak to you all on the mailing list.
(Applause)
LIVE CAPTIONING BY AOIFE DOWNES, RPR
DUBLIN, IRELAND