VulnTotal: Tool for cross-validating vulnerability

Organization - AboutCode

Keshav Priyadarshi
GitHub: keshav-space
LinkedIn: @keshav-space
Project: VulnTotal
Proposal: Link

Overview

VulnTotal cross-validates the vulnerability coverage of publicly available vulnerability check tools and databases. It’s inspired by the VirusTotal multi-scanner virus scanning service. A package is sometimes reported as vulnerable by some tools or databases but not by others, and VulnTotal helps detect such anomalies. Over time, we can work with these tool providers to keep each other apprised of newly discovered vulnerabilities and anomalies, making FOSS more secure.

Sneak Peek

https://user-images.githubusercontent.com/44315208/188985807-b13e2c08-dd5c-40ec-8f8d-6b15b6d6f4db.gif

VulnTotal takes a PURL as an argument and returns vulnerability data from various data sources. By default, vulnerability data is grouped by CVE.

Note

A PURL is a URL string used to identify and locate a software package in a mostly universal and uniform way across programming languages, package managers, packaging conventions, tools, APIs and databases. More on PURL.

VulnTotal Development - Walkthrough

Initial Configuration

The initial PR and commits outlined the core structure and implemented VendorData and DataSource inside validator.py.

VendorData is a dataclass that encapsulates aliases, affected_versions and fixed_versions for a vulnerability.

DataSource outlines the core methods, such as datasource_advisory and supported_ecosystem, that each subclass must implement.
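
As a rough illustration of these two building blocks, here is a minimal sketch. The field types, the default values, the use of a classmethod for supported_ecosystem, and the NotImplementedError placeholders are assumptions made for the example; the actual definitions in validator.py may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class VendorData:
    # The three field names come from the text above; the list-of-strings
    # types and empty defaults are assumptions.
    aliases: List[str] = field(default_factory=list)
    affected_versions: List[str] = field(default_factory=list)
    fixed_versions: List[str] = field(default_factory=list)


class DataSource:
    """Base class: every concrete datasource overrides both methods."""

    def datasource_advisory(self, purl) -> Iterable[VendorData]:
        # Yield one VendorData per vulnerability known for the given PURL.
        raise NotImplementedError

    @classmethod
    def supported_ecosystem(cls) -> Dict[str, str]:
        # Map purl.type values to the ecosystem names the datasource uses.
        raise NotImplementedError
```

Keeping the advisory shape in one small dataclass means every datasource reports the same three fields, which is what makes the results comparable across vendors.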

Below is the tree view of VulnTotal for better understanding:

vulntotal
 ├── validator.py
 ├── vulntotal_cli.py
 ├── vulntotal_utils.py
 ├── datasources
 │     ├── __init__.py
 │     ├── deps.py
 │     ├── github.py
 │     ├── gitlab.py
 │     ├── oss.py
 │     ├── osv.py
 │     ├── snyk.py
 │     └── vulnerablecode.py
 └── tests
       ├── test_deps.py
       ├── test_github.py
       ├── test_oss.py
       ├── test_osv.py
       ├── test_snyk.py
       ├── test_vulnerablecode.py
       └── test_data
             ├── deps/
             ├── github/
             ├── oss_index/
             ├── osv/
             ├── snyk/
             └── vulnerablecode/

PR and commits related to initial configuration

Adding DataSource

The initial configuration made adding a datasource fairly smooth. AnyNewDataSource just needs to inherit from DataSource and implement datasource_advisory and supported_ecosystem.

datasource_advisory is the core method; it takes a PURL as an argument and yields VendorData.

supported_ecosystem should return a dictionary that maps the PURL ecosystem (aka purl.type) to the equivalent ecosystem name used by the DataSource.
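
To make the contract concrete, here is a minimal sketch of a new datasource. Everything in it is hypothetical: ExampleDataSource, its in-memory ADVISORIES table, the placeholder CVE id, and the tiny parse_purl helper, which only stands in for a real PURL parser and handles the simple pkg:type/name@version form. The base classes are repeated in compact form so the block runs on its own.

```python
from dataclasses import dataclass, field
from typing import List, NamedTuple


@dataclass
class VendorData:
    aliases: List[str] = field(default_factory=list)
    affected_versions: List[str] = field(default_factory=list)
    fixed_versions: List[str] = field(default_factory=list)


class DataSource:
    def datasource_advisory(self, purl):
        raise NotImplementedError

    @classmethod
    def supported_ecosystem(cls):
        raise NotImplementedError


class Purl(NamedTuple):
    # Minimal stand-in for a parsed PURL: type, name and version only.
    type: str
    name: str
    version: str


def parse_purl(purl: str) -> Purl:
    """Parse the simple form pkg:type/name@version (no namespace, no qualifiers)."""
    scheme, _, rest = purl.partition(":")
    if scheme != "pkg":
        raise ValueError(f"not a PURL: {purl!r}")
    type_, _, name_version = rest.partition("/")
    name, _, version = name_version.partition("@")
    return Purl(type_, name, version)


class ExampleDataSource(DataSource):
    # Hypothetical in-memory "database" keyed by (ecosystem, name, version).
    ADVISORIES = {
        ("PyPI", "demo", "1.0"): [
            VendorData(["CVE-0000-0001"], ["1.0"], ["1.1"]),
        ],
    }

    @classmethod
    def supported_ecosystem(cls):
        # purl.type on the left, this datasource's own ecosystem name on the right.
        return {"pypi": "PyPI", "npm": "npm"}

    def datasource_advisory(self, purl):
        parsed = parse_purl(purl)
        ecosystem = self.supported_ecosystem().get(parsed.type)
        if ecosystem is None:
            return  # unsupported ecosystem: yield nothing
        key = (ecosystem, parsed.name, parsed.version)
        yield from self.ADVISORIES.get(key, [])
```

A caller would iterate over `ExampleDataSource().datasource_advisory("pkg:pypi/demo@1.0")` and receive VendorData objects; a real datasource would replace the dictionary lookup with a request to the vendor.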

Currently Supported DataSource

1. Open Source Vulnerability <osv.dev>

OSV provides an API endpoint for querying package vulnerabilities. Unfortunately, NuGet package names aren’t case-normalized by OSV, so OSVDataSource uses the NuGet SearchQueryService to discover the valid case-sensitive package name and then queries OSV with it. For more on this issue, see nexB/vulnerablecode/#800
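
For readers unfamiliar with OSV, the sketch below builds the JSON body that the public OSV query endpoint accepts for a single package version. The helper name build_osv_query and the partial ecosystem mapping are assumptions for illustration and are not taken from OSVDataSource; no network request is made here.

```python
import json

# Public OSV query endpoint; the body below is sent with an HTTP POST.
OSV_QUERY_URL = "https://api.osv.dev/v1/query"

# purl.type -> OSV ecosystem name. Partial mapping, for illustration only.
ECOSYSTEMS = {
    "pypi": "PyPI",
    "npm": "npm",
    "nuget": "NuGet",
    "maven": "Maven",
}


def build_osv_query(purl_type, name, version):
    """Return the OSV query body for one package version, or None if unsupported."""
    ecosystem = ECOSYSTEMS.get(purl_type)
    if ecosystem is None:
        return None
    return {
        "version": version,
        "package": {"name": name, "ecosystem": ecosystem},
    }


payload = build_osv_query("pypi", "jinja2", "2.4.1")
body = json.dumps(payload)  # string that would be POSTed to OSV_QUERY_URL
```

Because the package name is sent verbatim, a NuGet name with the wrong letter case would need to be corrected before this body is built, which is the step the paragraph above describes.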

Related PR: nexB/vulnerablecode#788

2. Open Source Insights <deps.dev>

Writing the datasource for deps was quite uneventful. Deps doesn’t provide any documented API except for GCP BigQuery, but it does have an obfuscated API, and DepsDataSource makes use of that.

Related PR: nexB/vulnerablecode#789

3. GitHub Advisory Database

GitHub provides a GraphQL endpoint for querying package vulnerabilities, with the caveat that one can’t query a specific version of a package: it returns the vulnerabilities for all versions of that package. To handle this, vulntotal_utils implements a specialized method, github_constraints_satisfied, that keeps only the vulnerabilities affecting the requested version.
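
The idea behind this kind of filter can be sketched as follows. This is not the github_constraints_satisfied implementation; it is a simplified, hypothetical constraints_satisfied helper that assumes range strings such as ">= 4.3.0, < 4.3.5" or "= 0.2.0" and purely numeric dotted versions.

```python
import operator

# Comparison symbols assumed to appear in a vulnerable-version range.
OPERATORS = {
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
    "=": operator.eq,
}


def version_key(version):
    """Turn '1.2.0' into (1, 2) so versions compare numerically, not as text."""
    parts = [int(part) for part in version.strip().split(".")]
    while parts and parts[-1] == 0:
        parts.pop()  # treat 1.2 and 1.2.0 as the same version
    return tuple(parts)


def constraints_satisfied(range_expr, version):
    """Return True when version meets every comma-separated constraint."""
    target = version_key(version)
    for constraint in range_expr.split(","):
        symbol, _, bound = constraint.strip().partition(" ")
        compare = OPERATORS[symbol]
        if not compare(target, version_key(bound)):
            return False
    return True


def affecting(advisories, version):
    """Keep only the (range, advisory_id) pairs whose range includes version."""
    return [adv for rng, adv in advisories if constraints_satisfied(rng, version)]
```

Pre-release tags and ecosystem-specific version schemes are deliberately out of scope for this sketch; a real filter would need a proper version library for each ecosystem.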

Related PR: nexB/vulnerablecode#804

4. Sonatype OSS Index

OSSIndexDataSource makes use of the OSS Index API. OSS Index only provides the CVEs related to a particular package version and makes no mention of either the affected or the fixed package versions.

Related PR: nexB/vulnerablecode#829

5. VulnerableCode Advisory Database

VulnerableCodeDataSource currently makes use of a local VulnerableCode instance, but will soon be migrated to the global instance.

Related PR: nexB/vulnerablecode#832

6. Snyk Vulnerability Database

Snyk comes with no API whatsoever, so I had to resort to web scraping using BeautifulSoup. A specialized method, snyk_constraints_satisfied, was implemented to filter vulnerabilities for a specific version. Among all the datasources currently available, Snyk is the only one that keeps track of malicious packages.

Related PR: nexB/vulnerablecode#842

7. GitLab Gemnasium Advisory Database

Again, GitLab comes with no API, so GitlabDataSource is designed to fetch package vulnerability data directly from the GitLab gemnasium repository. The GitLab GraphQL endpoint is used to get the exact case-sensitive package name. A similar method, gitlab_constraints_satisfied, is implemented to filter vulnerabilities for a specific version.

Related PR: nexB/vulnerablecode#883

Automatic Datasource Registry

Every new datasource must be added to DATASOURCE_REGISTERY to make it available for use. Fortunately, __init__.py is configured to take care of this: as soon as a new, valid datasource file is added to the datasources directory, it gets registered automatically, and it is unregistered when the file is removed.
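
One way such automatic registration can work is sketched below. It is an assumption-laden illustration, not the code in __init__.py: the collect_datasources helper, the choice of the module name as registry key, and the stand-in modules built with types.ModuleType are all hypothetical.

```python
import inspect
import types


class DataSource:
    """Stand-in for the real base class in validator.py."""


def collect_datasources(modules):
    """Map each module name to the DataSource subclass it defines."""
    registry = {}
    for module in modules:
        for _, obj in inspect.getmembers(module, inspect.isclass):
            # Skip the base class itself, which modules import for subclassing.
            if issubclass(obj, DataSource) and obj is not DataSource:
                registry[module.__name__] = obj
    return registry


# Stand-in modules; in a real package these would be the imported files
# found in the datasources directory.
demo = types.ModuleType("demo")


class DemoDataSource(DataSource):
    pass


demo.DataSource = DataSource          # imported base class, must be ignored
demo.DemoDataSource = DemoDataSource  # the datasource this module defines

helpers = types.ModuleType("helpers")  # a module without any datasource

REGISTRY = collect_datasources([demo, helpers])
```

In a real package, the module list could come from pkgutil.iter_modules over the package path combined with importlib.import_module, so that dropping a file into the directory is enough to change the registry.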

Related PR: nexB/vulnerablecode#901

Command-line Interface

The VulnTotal CLI takes a PURL as an argument and returns vulnerability data from various data sources. By default, vulnerability data is grouped by CVE. It also supports JSON and YAML data dumps. Since most datasources are network I/O intensive, the CLI uses ThreadPoolExecutor by default for better efficiency.
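
The combination of concurrent lookups and grouping by CVE can be sketched as follows. The function names query_all and group_by_cve, the dictionary shape of an advisory, and the "NOCVE" bucket for advisories without a CVE alias are assumptions for this example and are not taken from vulntotal_cli.py.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def query_all(datasources, purl):
    """Query every datasource concurrently.

    datasources maps a name to a callable that takes a PURL and returns an
    iterable of advisories (dicts with an "aliases" list).
    """
    def run(item):
        name, fetch = item
        return name, list(fetch(purl))

    # Threads suit this workload because each lookup mostly waits on the network.
    with ThreadPoolExecutor(max_workers=len(datasources) or 1) as pool:
        return dict(pool.map(run, datasources.items()))


def group_by_cve(results):
    """Regroup {source: [advisory, ...]} as {cve: {source: [advisory, ...]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for source, advisories in results.items():
        for advisory in advisories:
            cves = [a for a in advisory["aliases"] if a.startswith("CVE-")]
            for cve in cves or ["NOCVE"]:  # hypothetical bucket for CVE-less entries
                grouped[cve][source].append(advisory)
    return {cve: dict(by_source) for cve, by_source in grouped.items()}
```

Grouping by CVE puts the answers of all datasources for the same vulnerability side by side, which is where a missing entry from one vendor becomes visible.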

Related PR: nexB/vulnerablecode#801

Tip

The CLI comes with lots of hidden features that are especially useful while debugging a datasource.
Look inside vulntotal_cli.py to discover them all.

Pre GSoC

Post GSoC - Future Plans & Suggestions

Closing Thoughts

I thoroughly enjoyed working on this project. The weekly calls were greatly helpful, and thanks to Philippe, Hritik, Tushar, and Shivam for their thoughtful inputs. I learned a lot about various interesting projects and what it takes to tame some real-world problems. The experience greatly enhanced my ability to conduct myself in the open source world. All in all, it’s been a remarkable journey.