Benchmarking our Network Vulnerability Scanner and 6 others
In January 2024, we decided to evaluate the most widely used network vulnerability scanners - Nessus Professional, Qualys, Rapid7 Nexpose, Nuclei, OpenVAS, and Nmap vulnerability scripts - alongside our own, in a way that industry peers can validate independently.
Here’s why we did it, what results we got, and how you can verify them (there’s a white paper you can download with access to all the results behind this benchmark).
Why we made this benchmark - a look behind the scenes
We know how time-consuming it is to try to compare tools using information from vendors, conversations with peers, opinions from forums and communities, and so on.
We’ve experienced this first-hand, both as tool users, when some of our team members worked as penetration testers, and as tool makers, building the network tools on Pentest-Tools.com.
So this evaluation is a resource that responds to a real need many security specialists have, and it also doubled as a rich learning experience for our team.
Doing this benchmark allowed us to understand where our Network Vulnerability Scanner stands among its alternatives. We invest a lot in this tool and it’s one of the most powerful in our toolkit of 20+ tools. There’s a dedicated team that constantly works on improving its detection capabilities, including by developing modules for critical CVEs.
Let’s see how it all came together.
Building a benchmark for network vulnerability scanners
Benchmarks for network vulnerability scanners are few and far between, for two main reasons:
cybersecurity threats evolve extremely quickly, so it’s difficult to establish a static benchmark that stays relevant over time
vulnerabilities themselves are so diverse that a relevant benchmark needs to cover a wide range of scenarios, so choosing an evaluation metric is quite tricky.
Figuring our way through complex challenges is why we went into cybersecurity, so creating this benchmark really appealed to us.
First, we chose both open-source (OpenVAS, Nuclei, Nmap vulnerability scripts) and commercial scanners (Nessus Professional, Rapid7 Nexpose, Qualys) to properly reflect the tooling mix most security teams use.
Next, we decided to focus exclusively on remote checks, i.e. assessments from a black-box perspective. We did this because remote detection offers a realistic view of the attack surface that is visible and accessible from the outside.
High-risk vulnerabilities that an external attacker can exploit remotely are particularly attractive targets. They also pose a substantial risk to organizations, so finding and addressing them plays an important role in strengthening the network against unauthorized intrusions.
Finally, to keep the metrics uniform and objective, we settled on two performance indicators (the short sketch after this list shows how they can be computed):
Detection availability - the scanner states, through its vulnerability database, that it has a detection for a specific vulnerability
Detection accuracy - the scanner actually identifies that specific vulnerability when scanning the vulnerable environment.
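To make these two indicators concrete, here’s a minimal Python sketch of how they can be computed from raw results. The data structures and values are purely illustrative assumptions, not our actual benchmark tooling or data.

```python
# Illustrative sketch only: the data below is made up, not our benchmark results.

# CVEs a scanner claims to detect (per its vulnerability database) vs. the CVEs
# it actually flagged when scanning the vulnerable environments.
claimed = {"CVE-2014-6271", "CVE-2018-7600", "CVE-2022-26134"}
detected = {"CVE-2014-6271"}

# All vulnerable environments in the benchmark, keyed by their CVE.
all_cves = {"CVE-2014-6271", "CVE-2018-7600", "CVE-2022-26134", "CVE-2017-5638"}

def detection_availability(claimed: set[str], all_cves: set[str]) -> float:
    """Share of benchmarked CVEs the scanner *claims* to have a detection for."""
    return len(claimed & all_cves) / len(all_cves)

def detection_accuracy(detected: set[str], all_cves: set[str]) -> float:
    """Share of benchmarked CVEs the scanner *actually* identified during scanning."""
    return len(detected & all_cves) / len(all_cves)

print(f"Detection availability: {detection_availability(claimed, all_cves):.2%}")
print(f"Detection accuracy:     {detection_accuracy(detected, all_cves):.2%}")
```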
Shortcomings of a network scanner benchmark
Currently, most infrastructure sits behind firewalls and Intrusion Prevention Systems (IPSs), which detect and block malicious packets. This creates a challenge for scanners that rely on remote checks, as they can miss vulnerabilities because of the blind spots that IPSs create.
However, these defenses don’t discourage attackers, so the target remains at risk of compromise.
In an ideal world, we’d be able to create an independently verifiable benchmark for both local and remote scanners because they complement each other and provide a complete picture of a system's security posture, blending the depth of local insights with the breadth of remote observation. Alas, we don’t live in a perfect world.
While certain tools might be capable of local checks for the CVEs we analyzed in this benchmark, there was no way we could include this in our evaluation.
How we set the test environment
We used virtual machines hosted on the Vultr cloud platform, protected by a firewall that operates on an IP whitelist mechanism. This gave us the necessary safeguard against unauthorized access while keeping the setup process straightforward.
As part of this carefully structured setup, we deployed every vulnerable Docker environment available on Vulhub in December 2023. This deployment included 167 distinct environments, spread across 17 instances, with each instance hosting around 10 vulnerable services.
We ran each scanner against all 167 vulnerable Vulhub environments to keep the evaluation both comprehensive and impartial, and to make sure anyone can independently validate the results.
To manage this complex array of services efficiently, we equipped each instance with its own Docker Compose file, which streamlined configuration and service management.
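If you want to replicate a similar setup, here’s a rough Python sketch of how the per-instance Compose files could be brought up in bulk. The directory layout and file names are assumptions for illustration, not our exact configuration.

```python
# Illustrative sketch: the directory layout is an assumption, not our exact setup.
import subprocess
from pathlib import Path

# Hypothetical layout: one folder per instance, each holding a docker-compose.yml
# that describes ~10 vulnerable services (e.g. copied from the Vulhub repository).
INSTANCES_DIR = Path("instances")

def start_instance(instance_dir: Path) -> None:
    """Bring up all vulnerable containers defined in this instance's Compose file."""
    subprocess.run(["docker", "compose", "up", "-d"], cwd=instance_dir, check=True)

for instance in sorted(INSTANCES_DIR.iterdir()):
    if (instance / "docker-compose.yml").is_file():
        start_instance(instance)
        print(f"Started vulnerable environments in {instance.name}")
```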
For clarity, we categorized the analyzed CVEs into those detectable remotely (128 environments) and those that are not (39 environments).
If you want to independently confirm the findings, know that all scanners were updated with the latest detections as of January 2024.
We started most tools with their default settings, targeting the entire TCP port range (1-65535).
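To give you an idea of what that looks like in practice for the open-source tools, here’s a hedged example of running Nmap’s vulnerability scripts against the full TCP port range. The target IP is a placeholder, and the exact options used for each scanner are listed in the white paper.

```python
# Illustrative example: the target is a placeholder and the exact per-scanner options
# we used are documented in the white paper, not necessarily these.
import subprocess

TARGET = "203.0.113.10"  # placeholder address from the TEST-NET-3 documentation range

# Nmap with service/version detection and the "vuln" NSE script category,
# scanning the entire TCP port range (1-65535).
subprocess.run(
    [
        "nmap",
        "-p", "1-65535",     # full TCP port range
        "-sV",               # service/version detection (needed by many vuln scripts)
        "--script", "vuln",  # NSE scripts in the vulnerability detection category
        "-oN", "nmap-vuln-results.txt",
        TARGET,
    ],
    check=True,
)
```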
We did all our scanning activities for this benchmark during January 2024.
Benchmark results
When looking at the numbers, keep in mind that the number of identified vulnerabilities doesn’t automatically certify a vulnerability scanner’s overall quality.
The vulnerabilities included in this analysis are a very small subset of the coverage each vulnerability scanner is capable of. Factors such as user-friendliness, the ability to integrate with other systems or the quality of support services can be just as relevant for your organization’s context, but there’s no impartial way to compare them.
Without further ado, let’s see the results:
Here’s a bit more context for this data:
The benchmark shows a similar level of detection availability among the major commercial players (except for Nexpose, where we couldn’t differentiate between local and remote checks in its vulnerability database). This matters because commercial vulnerability scanning solutions state they have detections for the majority of the vulnerabilities in our testing environments.
There’s a notable disparity between the detection availability and the actual detection accuracy of certain tools. We saw the biggest gap with Nessus, which reports detections for 55.09% of all the vulnerable environments we tested, but only successfully identified 18.56% of them. Similarly, it claims to have detections for 67.19% of all remotely detectable vulnerabilities, yet it only accurately detects 22.66% of them (e.g. CVE-2022-26134 - Confluence OGNL Injection, CVE-2018-7600 - Drupalgeddon 2, CVE-2014-6271 - Shellshock). By comparison, Qualys and Nuclei show lower variance, with their actual detection rates being about 25% lower than what their vulnerability databases suggest.
There’s also a subtle shift between the overall and the remotely detectable classifications: across all vulnerabilities, Qualys secures the second position, with Nuclei following in third. However, when focusing solely on vulnerabilities that can be detected remotely, Nuclei moves up to second place, pushing Qualys down to third. This indicates that Nuclei has slightly broader detection coverage for remotely detectable vulnerabilities.
All the data behind the results in this benchmark are in this publicly available white paper, which you can download:
A thorough benchmark of network vulnerability scanners 2024
How we plan to use these results
We will use the data in this benchmark to improve our Network Vulnerability Scanner and to add more detections for the CVEs analyzed.
It may also be helpful to know that we always analyze which detections to add based on the following criteria (the sketch after this list shows one way they could be combined):
Their EPSS (Exploit Prediction Scoring System) score
How widespread the use of the vulnerable technology/framework is
The vulnerability’s CVSSv3 score
How intensely debated the vulnerability is in the cybersecurity community (e.g. Twitter posts)
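As a rough illustration of how criteria like these could be combined into a single priority, here’s a simplified Python sketch. The weights, the formula, and the example values are invented for this example - they are not our internal prioritization model.

```python
# Illustrative only: weights, formula, and example values are made up for this sketch
# and do not reflect our internal prioritization model.
from dataclasses import dataclass

@dataclass
class CveCandidate:
    cve_id: str
    epss: float             # EPSS score, 0.0-1.0 (likelihood of exploitation)
    cvss_v3: float          # CVSSv3 base score, 0.0-10.0
    tech_popularity: float  # 0.0-1.0, how widespread the affected technology is
    community_buzz: float   # 0.0-1.0, how intensely the vulnerability is discussed

def priority_score(c: CveCandidate) -> float:
    """Combine the four criteria into one score (higher = add a detection sooner)."""
    return (
        0.4 * c.epss
        + 0.3 * (c.cvss_v3 / 10.0)
        + 0.2 * c.tech_popularity
        + 0.1 * c.community_buzz
    )

candidates = [
    CveCandidate("CVE-2022-26134", epss=0.95, cvss_v3=9.8, tech_popularity=0.8, community_buzz=0.9),
    CveCandidate("CVE-2014-6271", epss=0.97, cvss_v3=9.8, tech_popularity=0.7, community_buzz=0.6),
]

for c in sorted(candidates, key=priority_score, reverse=True):
    print(f"{c.cve_id}: priority {priority_score(c):.2f}")
```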
If you’re curious to learn more about how our Network Vulnerability Scanner works, we did a video walkthrough of its core engines:
We’re very interested to know whether this benchmark is useful to you and how we could improve it, so give us a shoutout if you want to share your feedback!