VulnTotal: Tool for cross-validating vulnerability data
Organization - AboutCode
Overview
VulnTotal cross-validates the vulnerability coverage of publicly available
vulnerability check tools and databases. It's inspired by the VirusTotal
multi-scanner virus scanning service. In some scenarios a package is
reported as vulnerable by some tools or databases but not by others;
VulnTotal helps detect such anomalies. We can gradually work with
these tool providers to keep each other apprised of newly discovered
vulnerabilities and anomalies, making FOSS more secure.
Sneak Peek
Note
A PURL (package URL) is a URL string used to identify and locate a software package in a mostly universal and uniform way across programming languages, package managers, packaging conventions, tools, APIs and databases. More on PURL.
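To make the PURL structure concrete, here is a simplified sketch that splits a PURL of the form pkg:type/namespace/name@version into its parts. parse_purl_simple and SimplePurl are hypothetical names: the helper drops qualifiers and subpaths, does no percent-decoding, and assumes the only unencoded @ is the version separator. Real code would use a complete implementation such as the packageurl-python library.

```python
from typing import NamedTuple, Optional


class SimplePurl(NamedTuple):
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]


def parse_purl_simple(purl: str) -> SimplePurl:
    """Split 'pkg:type/namespace/name@version' into its parts.

    Hypothetical, simplified helper: qualifiers (?...) and subpaths (#...)
    are dropped, and no percent-decoding is performed.
    """
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a PURL: {purl!r}")
    remainder = purl[len("pkg:"):]
    # Drop the subpath and qualifiers, which this sketch does not model
    remainder = remainder.split("#", 1)[0].split("?", 1)[0]
    version = None
    if "@" in remainder:
        remainder, version = remainder.rsplit("@", 1)
    purl_type, _, path = remainder.partition("/")
    # Everything before the last '/' is the namespace, the rest is the name
    namespace, _, name = path.rpartition("/")
    return SimplePurl(purl_type.lower(), namespace or None, name, version)
```

For example, pkg:pypi/django@3.2.0 yields type pypi, no namespace, name django and version 3.2.0. The type field is what the rest of this post refers to as purl.type.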
VulnTotal Development - Walkthrough
Initial Configuration
The initial PR and commits outlined the core structure and implemented VendorData and DataSource inside validator.py.
VendorData is a dataclass that encapsulates aliases, affected_versions and fixed_versions for a vulnerability.
DataSource outlines core methods, such as datasource_advisory and supported_ecosystem, to be implemented by subclasses.
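A minimal sketch of what these two classes might look like, using a dataclass and Python's abc module. The list-of-strings field types, the default values and the method signatures are illustrative assumptions rather than a copy of validator.py, and the PURL is passed as a plain string to keep the sketch dependency-free.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class VendorData:
    """Sketch: one vulnerability as reported by a single datasource."""

    aliases: List[str] = field(default_factory=list)  # e.g. CVE or GHSA ids
    affected_versions: List[str] = field(default_factory=list)
    fixed_versions: List[str] = field(default_factory=list)


class DataSource(ABC):
    """Sketch of the base class that each datasource subclasses."""

    @abstractmethod
    def datasource_advisory(self, purl: str) -> Iterator[VendorData]:
        """Yield a VendorData for each vulnerability affecting `purl`."""

    @abstractmethod
    def supported_ecosystem(self) -> Dict[str, str]:
        """Map purl.type values to this datasource's ecosystem names."""
```

Because DataSource derives from ABC, instantiating it (or a subclass that skips either method) raises TypeError, which enforces the contract described above.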
Below is a tree view of the VulnTotal source layout:
vulntotal
├── validator.py
├── vulntotal_cli.py
├── vulntotal_utils.py
├── datasources
│   ├── __init__.py
│   ├── deps.py
│   ├── github.py
│   ├── gitlab.py
│   ├── oss.py
│   ├── osv.py
│   ├── snyk.py
│   └── vulnerablecode.py
└── tests
    ├── test_deps.py
    ├── test_github.py
    ├── test_oss.py
    ├── test_osv.py
    ├── test_snyk.py
    ├── test_vulnerablecode.py
    └── test_data
        ├── deps/
        ├── github/
        ├── oss_index/
        ├── osv/
        ├── snyk/
        └── vulnerablecode/
PRs and commits related to the initial configuration
Adding DataSource
The initial configuration made adding a new datasource fairly smooth. AnyNewDataSource just needs to
inherit DataSource and implement datasource_advisory and supported_ecosystem:
datasource_advisory is the core method: it takes a PURL as an argument and yields VendorData.
supported_ecosystem should return a dictionary that maps the PURL equivalent of an ecosystem (aka purl.type) to the DataSource's equivalent ecosystem.
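For illustration, here is a hypothetical datasource that serves advisories from an in-memory table instead of a network API. The VendorData and DataSource classes are minimal stand-ins so the snippet runs on its own, the ecosystem labels in the mapping are assumed, the advisory record is placeholder data, and PURLs are parsed naively as pkg:type/name@version (no namespaces or qualifiers).

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class VendorData:  # minimal stand-in for validator.VendorData
    aliases: List[str] = field(default_factory=list)
    affected_versions: List[str] = field(default_factory=list)
    fixed_versions: List[str] = field(default_factory=list)


class DataSource(ABC):  # minimal stand-in for validator.DataSource
    @abstractmethod
    def datasource_advisory(self, purl: str) -> Iterator[VendorData]: ...

    @abstractmethod
    def supported_ecosystem(self) -> Dict[str, str]: ...


class InMemoryDataSource(DataSource):
    """Hypothetical datasource backed by a dict instead of a remote API."""

    # (ecosystem, name, version) -> advisory records (placeholder data)
    ADVISORIES = {
        ("PyPI", "example-pkg", "1.0.0"): [
            {"aliases": ["EXAMPLE-0001"], "fixed": ["1.0.1"]},
        ],
    }

    def supported_ecosystem(self) -> Dict[str, str]:
        # purl.type -> this datasource's ecosystem label (assumed values)
        return {"pypi": "PyPI", "npm": "npm"}

    def datasource_advisory(self, purl: str) -> Iterator[VendorData]:
        # Naive parse of 'pkg:type/name@version'; no namespaces or qualifiers
        purl_type, _, rest = purl[len("pkg:"):].partition("/")
        name, _, version = rest.partition("@")
        ecosystem = self.supported_ecosystem().get(purl_type)
        if ecosystem is None:
            return  # unsupported ecosystem: yield nothing
        for record in self.ADVISORIES.get((ecosystem, name, version), []):
            yield VendorData(
                aliases=record["aliases"],
                affected_versions=[version],
                fixed_versions=record["fixed"],
            )
```

Consulting supported_ecosystem first lets a datasource skip PURLs it cannot answer for, so unsupported ecosystems simply produce no results instead of a failed lookup.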
Currently Supported DataSources
1. Open Source Vulnerability <osv.dev>
OSV provides an API endpoint for querying package vulnerabilities. Unfortunately, OSV does not case-normalize NuGet package names, so OSVDataSource uses the NuGet SearchQueryService to discover the valid case-sensitive package name and then uses that name to query OSV. For more on this issue, see nexB/vulnerablecode/#800.
Related PR: nexB/vulnerablecode#788
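A sketch of the two steps described above: building the JSON body for OSV's query endpoint (POST https://api.osv.dev/v1/query, which accepts a package name, an ecosystem and a version), and picking the correctly cased package id out of a NuGet search response. The helper names are hypothetical, the search response is assumed to be already parsed into a dict with a data list whose entries carry an id field, and no network calls are made here.

```python
import json
from typing import Optional


def build_osv_query(name: str, ecosystem: str, version: str) -> str:
    """JSON body for POST https://api.osv.dev/v1/query."""
    return json.dumps(
        {"package": {"name": name, "ecosystem": ecosystem}, "version": version}
    )


def resolve_nuget_name(name: str, search_response: dict) -> Optional[str]:
    """Return the case-sensitive package id matching `name`, if any.

    `search_response` is assumed to be a parsed NuGet search result of the
    form {"data": [{"id": "..."}, ...]}.
    """
    for entry in search_response.get("data", []):
        package_id = entry.get("id", "")
        # NuGet ids are case-insensitive, so compare case-folded values
        if package_id.casefold() == name.casefold():
            return package_id
    return None
```

For example, resolve_nuget_name("newtonsoft.json", {"data": [{"id": "Newtonsoft.Json"}]}) returns "Newtonsoft.Json", which can then be passed to build_osv_query together with the NuGet ecosystem name.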
2. Open Source Insights <deps.dev>
Writing the datasource for deps.dev was quite uneventful. Apart from GCP BigQuery, deps.dev doesn't provide any documented API, but it does have an obfuscated one, which DepsDataSource makes use of.
Related PR: nexB/vulnerablecode#789
3. GitHub Advisory Database
GitHub provides a GraphQL endpoint for querying package vulnerabilities, but with a caveat:
one can't query a specific version of a particular package. Instead, it returns the vulnerabilities
for all versions of that package. To handle this, vulntotal_utils implements a specialized method,
github_constraints_satisfied, to filter out the vulnerabilities that apply to a specific version.
Related PR: nexB/vulnerablecode#804
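A simplified sketch of this kind of filtering against a GitHub-style range string such as ">= 4.3.0, < 4.3.5", where every comma-separated comparator must hold. satisfies_github_range is a hypothetical name, not github_constraints_satisfied itself, and versions are compared as zero-padded tuples of integers, which only works for plain dotted numeric versions; a real implementation needs ecosystem-aware comparison, for example via nexB/univers.

```python
import operator
from typing import Tuple

# Comparator prefixes, longest first so ">=" is matched before ">"
OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    (">", operator.gt),
    ("<", operator.lt),
    ("=", operator.eq),
]


def _padded(a: str, b: str) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Parse two dotted numeric versions and pad them to equal length."""
    x = tuple(int(part) for part in a.split("."))
    y = tuple(int(part) for part in b.split("."))
    width = max(len(x), len(y))
    return x + (0,) * (width - len(x)), y + (0,) * (width - len(y))


def satisfies_github_range(version_range: str, version: str) -> bool:
    """True if `version` satisfies every comma-separated comparator."""
    for constraint in version_range.split(","):
        constraint = constraint.strip()
        for symbol, compare in OPERATORS:
            if constraint.startswith(symbol):
                bound = constraint[len(symbol):].strip()
                if not compare(*_padded(version, bound)):
                    return False
                break
        else:
            raise ValueError(f"unrecognized constraint: {constraint!r}")
    return True
```

Applied to the vulnerableVersionRange of each vulnerability returned by the GraphQL query, a check like this keeps only the entries that affect the queried version.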
4. Sonatype OSS Index
OSSIndexDataSource makes use of the OSS Index API. OSS Index only provides the CVEs related to a particular package version and makes no mention of either the affected or the fixed package versions.
Related PR: nexB/vulnerablecode#829
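Because OSS Index reports only CVEs, a datasource built on it can fill in aliases but has to leave the affected and fixed versions empty. The sketch below builds a request body for the component-report endpoint (POST https://ossindex.sonatype.org/api/v3/component-report, with PURLs as coordinates) and maps an already-parsed response to VendorData-like dicts. The helper names are hypothetical, and the response shape (a list of components, each with a vulnerabilities list whose entries may carry a cve field) is an assumption to check against the OSS Index API documentation.

```python
import json
from typing import Dict, List


def build_component_report_body(purls: List[str]) -> str:
    """JSON body for POST https://ossindex.sonatype.org/api/v3/component-report."""
    return json.dumps({"coordinates": purls})


def oss_index_to_vendor_data(report: List[dict]) -> List[Dict[str, List[str]]]:
    """Turn a parsed component report into VendorData-like dicts.

    Assumed input shape: [{"coordinates": ..., "vulnerabilities": [{"cve": ...}]}]
    """
    results = []
    for component in report:
        for vulnerability in component.get("vulnerabilities", []):
            cve = vulnerability.get("cve")
            results.append(
                {
                    "aliases": [cve] if cve else [],
                    # OSS Index reports no version ranges, so these stay empty
                    "affected_versions": [],
                    "fixed_versions": [],
                }
            )
    return results
```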
5. VulnerableCode Advisory Database
VulnerableCodeDataSource currently makes use of a local VulnerableCode instance, but it will soon be migrated to the global instance.
Related PR: nexB/vulnerablecode#832
6. Snyk Vulnerability Database
Snyk comes with no API whatsoever, so I had to resort to web scraping using BeautifulSoup.
A specialized method, snky_constraints_satisfied, was implemented to filter out
vulnerabilities for a specific version.
Among all the datasources currently available, Snyk is the only one that keeps track
of malicious packages.
Related PR: nexB/vulnerablecode#842
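A sketch of version filtering against a single range written in Maven-style interval notation, for example [1.0.0,1.2.3) for "at least 1.0.0 and below 1.2.3", where an empty side means unbounded. Whether a given Snyk page uses this notation is an assumption here; in_interval is a hypothetical helper, not snky_constraints_satisfied itself, and it handles only one interval and plain dotted numeric versions.

```python
def _compare(a: str, b: str) -> int:
    """Compare dotted numeric versions; return -1, 0 or 1 (missing parts = 0)."""
    x = tuple(int(part) for part in a.split("."))
    y = tuple(int(part) for part in b.split("."))
    width = max(len(x), len(y))
    x += (0,) * (width - len(x))
    y += (0,) * (width - len(y))
    return (x > y) - (x < y)


def in_interval(interval: str, version: str) -> bool:
    """Check `version` against e.g. '[1.0.0,1.2.3)', '[,2.0)' or '[1.5]'."""
    interval = interval.strip()
    if interval[0] not in "[(" or interval[-1] not in "])":
        raise ValueError(f"not an interval: {interval!r}")
    body = interval[1:-1]
    if "," not in body:
        # '[1.5]' pins a single exact version
        return _compare(version, body.strip()) == 0
    lower, upper = (side.strip() for side in body.split(",", 1))
    if lower:  # an empty side means unbounded
        c = _compare(version, lower)
        if c < 0 or (c == 0 and interval[0] == "("):
            return False
    if upper:
        c = _compare(version, upper)
        if c > 0 or (c == 0 and interval[-1] == ")"):
            return False
    return True
```

Square brackets make a bound inclusive and parentheses make it exclusive, so the boundary version is the only case where the bracket type changes the answer.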
7. GitLab Gemnasium Advisory Database
Again, GitLab comes with no API for this, so GitlabDataSource is designed to fetch
package vulnerability data directly from the GitLab Gemnasium repository. To get
the exact case-sensitive package name, the GitLab GraphQL endpoint is used.
A similar method, gitlab_constraints_satisfied, is implemented to filter out
vulnerabilities for a specific version.
Related PR: nexB/vulnerablecode#883
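A sketch of range matching in the spirit of gitlab_constraints_satisfied, assuming a Gemnasium-style affected range in which alternatives are separated by || and each alternative is a list of comparators that must all hold (an illustrative range would be >=2.0,<2.2.9||>=3.0,<3.0.1). The function name and this exact grammar are assumptions, and versions are compared as zero-padded tuples of integers.

```python
import operator
import re
from typing import Tuple

# Longer operators listed first so ">=" is not read as ">" followed by "="
_COMPARATOR = re.compile(r"(>=|<=|==|=|>|<)\s*([0-9][0-9.]*)")
_OPS = {
    ">=": operator.ge, "<=": operator.le, ">": operator.gt,
    "<": operator.lt, "=": operator.eq, "==": operator.eq,
}


def _padded(a: str, b: str) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Parse two dotted numeric versions and pad them to equal length."""
    x = tuple(int(part) for part in a.split("."))
    y = tuple(int(part) for part in b.split("."))
    width = max(len(x), len(y))
    return x + (0,) * (width - len(x)), y + (0,) * (width - len(y))


def in_gemnasium_range(affected_range: str, version: str) -> bool:
    """True if `version` satisfies any '||'-separated alternative."""
    for alternative in affected_range.split("||"):
        # Comparators may be separated by commas or spaces; the regex finds both
        comparators = _COMPARATOR.findall(alternative)
        if not comparators:
            continue
        if all(_OPS[op](*_padded(version, bound)) for op, bound in comparators):
            return True
    return False
```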
Automatic Datasource Registry
Every new datasource must be added to DATASOURCE_REGISTERY
to make it available for use.
Fortunately, __init__.py is configured to take care of this: as soon as a new, valid
datasource file is added to the datasources directory, it automatically gets registered,
and when one is removed, it is automatically dropped from the registry.
Related PR: nexB/vulnerablecode#901
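A minimal sketch of this kind of auto-discovery, using pkgutil.iter_modules to list the modules in a package and inspect.getmembers to find the subclasses of a base class that each module defines. build_registry and the one-datasource-per-module keying are assumptions for illustration; the actual __init__.py may be organized differently.

```python
import importlib
import inspect
import pkgutil
from types import ModuleType
from typing import Dict, Type


def build_registry(package: ModuleType, base: Type) -> Dict[str, Type]:
    """Map each module in `package` to the `base` subclass it defines.

    Assumes one datasource class per module; a module that defines
    no subclass of `base` is simply skipped.
    """
    registry = {}
    for module_info in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(f"{package.__name__}.{module_info.name}")
        for _, obj in inspect.getmembers(module, inspect.isclass):
            # Ignore the base itself and classes imported from elsewhere
            if (
                issubclass(obj, base)
                and obj is not base
                and obj.__module__ == module.__name__
            ):
                registry[module_info.name] = obj
    return registry
```

Since the registry is rebuilt from whatever modules exist on disk, adding or deleting a datasource file is enough to change what gets registered.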
Command-line Interface
The VulnTotal CLI takes a PURL as an argument and returns vulnerability data from the various datasources. By default, the vulnerability data is grouped by CVE. It also supports dumping the data as JSON or YAML. Since most datasources are network I/O-bound, the CLI uses a ThreadPoolExecutor by default to query them concurrently.
Related PR: nexB/vulnerablecode#801
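The fan-out and the default grouping can be sketched with concurrent.futures.ThreadPoolExecutor. In this hypothetical version each datasource is modeled as a plain function from a PURL string to a list of advisory dicts with an aliases key, and group_by_cve records which datasources reported each CVE; the real CLI's data structures and output format may differ.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def query_all(
    sources: Dict[str, Callable[[str], List[dict]]], purl: str
) -> Dict[str, List[dict]]:
    """Query every datasource concurrently; threads suit I/O-bound calls."""
    with ThreadPoolExecutor(max_workers=len(sources) or 1) as executor:
        futures = {name: executor.submit(fetch, purl) for name, fetch in sources.items()}
        # result() blocks until each call finishes and re-raises its errors
        return {name: future.result() for name, future in futures.items()}


def group_by_cve(results: Dict[str, List[dict]]) -> Dict[str, List[str]]:
    """Map each CVE alias to the sorted names of datasources reporting it."""
    grouped = defaultdict(list)
    for source, advisories in results.items():
        for advisory in advisories:
            for alias in advisory.get("aliases", []):
                if alias.startswith("CVE-"):
                    grouped[alias].append(source)
    return {cve: sorted(names) for cve, names in grouped.items()}
```

Grouping by CVE puts disagreements side by side: a CVE listed under only some datasources is exactly the kind of anomaly VulnTotal is meant to surface.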
Tip
vulntotal_cli.py to discover them all.
Pre GSoC
Test sorting of all the OpenSSL versions ever released. nexB/univers#61
Migrate OpenSSL importer to importer-improver model. nexB/vulnerablecode#690
Correct notes for cvssv3.1_qr. nexB/vulnerablecode#599
Add from_versions in VersionRange. nexB/univers#55
Add OpenSSL support in univers. nexB/univers#42
Fix for NpmVersionRange.from_native and README. nexB/univers#34
Add black code-style test for skeleton. nexB/skeleton#56
Post GSoC - Future Plans & Suggestions
Support querying by aliases. nexB/vulnerablecode/#824
Add more DataSources, such as mend.io. nexB/vulnerablecode/#835
Support for an API and a Web UI.
Cluster analysis of advisories fetched from different DataSources. nexB/vulnerablecode#822
Handle forever vulnerable packages in VulnerableCode nexB/vulnerablecode#855
Closing Thoughts
I thoroughly enjoyed working on this project. The weekly calls were very helpful, and thanks to Philippe, Hritik, Tushar and Shivam for their thoughtful input. I learned a lot about various interesting projects and what it takes to tame real-world problems, and the experience greatly improved my ability to work in the open source world. All in all, it's been a remarkable journey.