Manual TU Delft Benchmark
For some of the providers, TU Delft has drawn up an abuse benchmark. On this page, the aim of this benchmark and the model underlying it are described.
Virtually all online service providers have to deal with abuse, and the hosting market is no exception. In hosting, abuse takes the form of, for example, hacked servers, phishing sites, malicious redirects, command-and-control servers, spam servers and exploit kits.
If you combat abuse as a hosting company, how do you know whether you have been effective? If you have ten abuse incidents per year, for example hacked servers or spammed clients, is that a lot or a little? How many incidents do other providers in the hosting market experience? How can you compare those numbers between companies of different sizes and with different services? Without an answer to these questions, it is hard to know whether you are doing things well.
Over the past few years, a team from TU Delft has worked on an abuse benchmark for the hosting sector. We are now sharing the most recent version of this with the sector. This does not mean, however, that the benchmark is perfect and without limitations. It still contains uncertainty and noise. The benchmark does, nevertheless, contain valuable information that is useful for every hosting company which wants to learn how to better combat abuse. We have tested the benchmark extensively and have discovered that it can predict with a high degree of confidence how many incidents will occur in the hosting network. The benchmark is not intended for naming and shaming, but rather to give individual companies insight into where they stand compared to other providers.
The ins and outs of the benchmark and the tests we have performed have been published in a peer-reviewed scientific article (Noroozian et al., 2017; full reference at the end of this page). In brief, the benchmark works as follows.
- We define a hosting provider as the entity that, according to WHOIS data, is responsible for the IP ranges with hosting services. We therefore do not take Autonomous Systems (AS) as the starting point: in a previous study, we found that on average seven providers are active per AS. In this first version of the benchmark, we only included providers that are members of NBIP, ISPConnect or DHPA and for which we could retrieve the WHOIS information. That amounts to 129 providers.
- We subsequently took several abuse feeds (listed at the end of this page). For each feed, we counted the number of incidents observed at each provider in the period January to August 2018.
- Large providers have more incidents than small providers because they have more clients and infrastructure. Of course, this does not mean that they are less secure. The type of service provided also makes a difference. We therefore collected several characteristics of the providers, such as the number of IP addresses they advertise, how many domains they host and how much shared hosting they have.
- We entered the abuse and provider data in a statistical model. The model examines the number of incidents, taking into account the size of the provider and, to a limited extent, the type of services and the network. We will not explain in detail how the model works, but it basically works in the same way as an online maths test. Such a model estimates how good somebody is at maths if they have a certain distribution of points across the various questions of the maths test. In the model for the benchmark, each feed is a test question that the provider scores a certain number of points for. The model subsequently estimates which underlying skill best explains that score distribution compared to the other “pupils”. The model also provides a margin of uncertainty for that score. Some scores are quite robust, whereas others have a large margin.
- The benchmark is a number that expresses the position the provider occupies in the total group of 129 providers according to the model. That number is expressed as a percentile. A provider with a score of 20 is therefore located in the 20th percentile. This means that, after accounting for the size of the provider and the type of services provided, 20% of all providers have more abuse than this provider and 80% have less. A score of 20 is therefore a bad score in terms of combating abuse: it means you belong to the worst 20% of the market. We have adopted the following simple classification to communicate the results: a score of 1-20 is bad, between 20-80 is average and between 80-100 is good.
- Finally, we have calculated a vulnerability benchmark in addition to the abuse benchmark. This follows the same steps but uses vulnerability data instead of abuse data (listed at the end of this page). This benchmark is also expressed with the descriptions bad (1-20), average (20-80) and good (80-100). A provider could therefore perform well in terms of abuse and badly in terms of vulnerabilities.
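To make the percentile score and the bad/average/good classification concrete, here is a minimal Python sketch. It is not the statistical model from the article: it assumes that the model has already produced one estimated abuse level per provider (higher means more abuse after correcting for size and services). The provider names, the example values and the function names `percentile_scores` and `classify` are hypothetical. Two further assumptions are made because the text leaves them open: a provider counts itself among those with "at least as much abuse", and a score exactly on a boundary (20 or 80) falls into the lower class.

```python
def percentile_scores(abuse_level):
    """Map each provider to a score up to 100; a low score means more abuse than most.

    abuse_level: dict of provider name -> estimated abuse level
    (assumed to be the output of some size-corrected model).
    """
    n = len(abuse_level)
    scores = {}
    for name, level in abuse_level.items():
        # Number of providers (this one included) with at least this much abuse.
        at_least_as_bad = sum(1 for other in abuse_level.values() if other >= level)
        scores[name] = 100.0 * at_least_as_bad / n
    return scores


def classify(score):
    """Apply the bad / average / good labels; boundary handling is an assumption."""
    if score <= 20:
        return "bad"
    if score <= 80:
        return "average"
    return "good"


# Hypothetical estimated abuse levels for five made-up providers.
levels = {"A": 2.5, "B": 0.1, "C": -0.4, "D": -1.2, "E": -2.0}

scores = percentile_scores(levels)
labels = {name: classify(score) for name, score in scores.items()}

for name in levels:
    print(name, scores[name], labels[name])
```

In this made-up group, provider A has the highest estimated abuse level, so it lands in the lowest percentile and is labelled bad, while provider E has the lowest level and is labelled good. The real benchmark additionally reports a margin of uncertainty per score, which this sketch leaves out.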
The abuse and vulnerability data used in the benchmark cover the period January to August 2018. This data is only available to a limited extent in the AbuseIO environment of abuseplatform.nl: this environment does not yet contain all feeds, and it holds only the most recent data. With subsequent iterations of the benchmark, the data should become synchronous with that of the platform. In other words, the benchmark data will be based on the incidents and vulnerabilities that providers can see in their AbuseIO account on abuseplatform.nl.
Arman Noroozian, Michael Ciere, Maciej Korczynski, Samaneh Tajalizadehkhoob & Michel van Eeten (2017), Inferring Security Performance of Providers from Noisy and Heterogenous Abuse Datasets, Workshop on the Economics of Information Security (WEIS2017), La Jolla, CA.
Abuse feeds:
- Spamhaus DBL (split into C&C, phishing, malware, spam)
- Shadowserver Compromised website report
- Shadowserver Command and control report
- Google Safe Browsing
- Shadowserver Drone Report
Vulnerability data:
- Shadowserver CHARGEN report
- Shadowserver Open Memcached server report
- Shadowserver Open Resolvers Report
- Shadowserver SSL Freak Vulnerable Report
- Shadowserver SSL Poodle Report