Enrich SBOM data based on OSSF Security Score Card

Organization: AboutCode

Projects:

Mentors:

Overview

The primary objective of this project was to fetch OpenSSF Scorecard data for all detected packages and integrate it into the Scancode.io platform, adding security and community health metrics to its analysis. The work spanned two repositories: Scorecode, which was developed as a PyPI package, and Scancode.io, where the Scorecard data was integrated into the scanning pipelines.

Scorecode

The scorecode PyPI package provides functions to fetch and store OpenSSF Scorecard data using the OpenSSF public API (https://api.securityscorecards.dev/). It also includes Django mixin models that database-backed platforms, such as Scancode.io and PurlDB, can extend to store Scorecard data, so the same data can be reused across projects.
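The fetch step described above can be sketched roughly as follows. This is an illustration and not the scorecode package's actual code: the `/projects/{platform}/{org}/{repo}` route, the response keys (`date`, `score`, `scorecard.version`, `checks`), and the documentation URL constant are assumptions about the public API, and `ScorecardResult`, `build_api_url`, `parse_response`, and `fetch_scorecard` are hypothetical names. The dataclass fields reuse the field names listed in the Testing section.

```python
import json
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import urlparse
from urllib.request import urlopen

API_BASE = "https://api.securityscorecards.dev"
# Assumption: a general documentation page for Scorecard checks.
DOC_URL = "https://github.com/ossf/scorecard/blob/main/docs/checks.md"


@dataclass
class ScorecardResult:
    # Hypothetical container; field names follow the Testing section.
    scoring_tool: str
    scoring_tool_version: str
    score_date: str
    score: Optional[float]
    scoring_tool_documentation_url: str
    checks: list = field(default_factory=list)


def build_api_url(repo_url):
    """Map a repository URL to a Scorecard API URL (assumed route)."""
    parsed = urlparse(repo_url)
    parts = [p for p in parsed.path.split("/") if p]
    if not parsed.netloc or len(parts) < 2:
        raise ValueError(f"Not a repository URL: {repo_url!r}")
    org, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[:-4]
    return f"{API_BASE}/projects/{parsed.netloc}/{org}/{repo}"


def parse_response(payload):
    """Convert a decoded JSON payload (assumed shape) into a result object."""
    checks = [
        {"name": c.get("name"), "score": c.get("score"), "reason": c.get("reason")}
        for c in payload.get("checks", [])
    ]
    return ScorecardResult(
        scoring_tool="OpenSSF Scorecard",
        scoring_tool_version=payload.get("scorecard", {}).get("version", ""),
        score_date=payload.get("date", ""),
        score=payload.get("score"),
        scoring_tool_documentation_url=DOC_URL,
        checks=checks,
    )


def fetch_scorecard(repo_url, timeout=10):
    """Fetch and parse Scorecard data; performs a network request."""
    with urlopen(build_api_url(repo_url), timeout=timeout) as response:
        return parse_response(json.load(response))
```

Keeping URL construction and payload parsing separate from the network call means both can be exercised without internet access, for example `build_api_url("https://github.com/nexB/scancode-toolkit")`.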

Scancode.io

In the Scancode.io project, I added a pipeline that uses the scorecode package to fetch Scorecard data and store it in the Scancode.io database. The data can then be exported in Software Bill of Materials (SBOM) outputs, currently in CycloneDX format with SPDX planned for the future, providing insights into security and community health.

Implementation

1. Scorecode Repository:

Developed a PyPI package that calls the OpenSSF API, fetches Scorecard data, and stores it in appropriate objects for use by other software packages.

Created Django mixin models to enable easy extension and integration of Scorecard data into platforms with databases like Scancode.io.

For more information, you can visit the scorecode package on PyPI.

2. Scancode.io Integration:

Developed a pipeline within Scancode.io to call Scorecode functions, retrieve Scorecard data, and save it in the Scancode.io database.

Enhanced the existing SBOM export functionality to include Scorecard data, allowing for detailed security posture and community health metrics analysis in CycloneDX format.
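One way Scorecard data could be attached to a CycloneDX component is through the component's `properties` list of name/value pairs. The sketch below is only an illustration under that assumption: this report does not state where in the CycloneDX document the data is placed, and the `scorecard:` property names and both helper functions are hypothetical.

```python
def scorecard_properties(scorecard, prefix="scorecard"):
    """Flatten a Scorecard record into CycloneDX-style name/value properties.

    The property names are hypothetical; CycloneDX only requires that each
    property has a string name and a string value.
    """
    properties = [
        {"name": f"{prefix}:scoring_tool", "value": str(scorecard["scoring_tool"])},
        {"name": f"{prefix}:score", "value": str(scorecard["score"])},
        {"name": f"{prefix}:score_date", "value": str(scorecard["score_date"])},
    ]
    for check in scorecard.get("checks", []):
        properties.append(
            {"name": f"{prefix}:check:{check['name']}", "value": str(check["score"])}
        )
    return properties


def add_scorecard_to_component(component, scorecard):
    """Return a copy of a component dict with Scorecard properties appended."""
    enriched = dict(component)
    enriched["properties"] = list(component.get("properties", [])) + scorecard_properties(scorecard)
    return enriched
```

Returning a copy rather than mutating the input keeps the original component untouched, which makes the export step easier to test in isolation.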

3. Testing:

Tested against repositories hosted on two platforms, GitHub and GitLab, to ensure accurate fetching, storage, and export of Scorecard data:

GitHub:

nexB/scancode-toolkit

tensorflow/tensorflow

apache/spark

GitLab: gitlab-org/gitlab

Verified seamless integration and accurate data retrieval across different package ecosystems supported by Scancode.io, ensuring that the Scorecard data aligns with the expected structure and content.

Implemented and executed automated test cases using pytest, which include:

Validation of key fields such as scoring_tool, scoring_tool_version, score_date, score, scoring_tool_documentation_url, and checks.

Type checks for each field to ensure data integrity.

URL validation to confirm that the documentation links are correctly formatted and point to the expected resources.

Added test cases for edge scenarios such as non-existent repositories, private repositories, and invalid input formats to ensure robustness and reliability.
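The field, type, and URL checks listed above could look roughly like the following pytest-style sketch. It is not the project's actual test suite: the expected types, the `validation_errors` helper, and the sample record are assumptions made for illustration, and only the field names come from this report.

```python
from urllib.parse import urlparse

# Assumed types for each key field named in this report.
EXPECTED_TYPES = {
    "scoring_tool": str,
    "scoring_tool_version": str,
    "score_date": str,
    "score": (int, float),
    "scoring_tool_documentation_url": str,
    "checks": list,
}


def validation_errors(record):
    """Return a list of problems found in a Scorecard record (empty if valid)."""
    errors = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            errors.append(f"wrong type for field: {name}")
    url = record.get("scoring_tool_documentation_url")
    if isinstance(url, str):
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append("malformed documentation URL")
    return errors


# Hypothetical sample record used by the tests below.
SAMPLE = {
    "scoring_tool": "OpenSSF Scorecard",
    "scoring_tool_version": "v5.0.0",
    "score_date": "2024-08-01",
    "score": 7.5,
    "scoring_tool_documentation_url": "https://github.com/ossf/scorecard/blob/main/docs/checks.md",
    "checks": [],
}


def test_valid_record_has_no_errors():
    assert validation_errors(SAMPLE) == []


def test_missing_field_is_reported():
    record = {k: v for k, v in SAMPLE.items() if k != "score"}
    assert "missing field: score" in validation_errors(record)


def test_malformed_url_is_reported():
    record = dict(SAMPLE, scoring_tool_documentation_url="not-a-url")
    assert "malformed documentation URL" in validation_errors(record)
```

Collecting every problem into a list, instead of failing on the first one, gives a single test run a full picture of what is wrong with a record.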

Linked Pull Requests

Sr. no	Name	Link	Status
1	Scorecard Integration	aboutcode-org/scancode.io#1294	Open
2	Models integration	aboutcode-org/scorecode#5	Merged
3	Scorecard API call integration	aboutcode-org/scorecode#1	Merged
4	Mixin models for storing scorecard data	aboutcode-org/scorecode#4	Merged

Related Issues

Sr. no	Name	Link
1	Store OSSF scorecard data in scancode.io models	aboutcode-org/scancode.io#1283
2	Show OSSF scorecard data in the UI as quality data	aboutcode-org/scancode.io#1284
3	Export OSSF scorecard data in SBOMs	aboutcode-org/scancode.io#1285
4	Compute summary and clarity for EACH package in a codebase	aboutcode-org/scorecode#3
5	Provide data values in scan results to correspond with license_clarity_score elements	aboutcode-org/scorecode#2

Project Reference Links

Pre GSoC Work

Before GSoC officially started, I had the opportunity to contribute to the ScanCode.io and PurlDB projects. During this period, I focused on enhancing various functionalities and laying the groundwork for the upcoming integration of the OpenSSF Scorecard. Below is a list of key pull requests I made:

These contributions were essential in building a solid foundation for the integration of the ScoreCode repository during GSoC.

Post GSoC

After GSoC, the goal is to merge the pull requests into their respective repositories, enabling users to leverage the OpenSSF Scorecard integration for enhanced vulnerability analysis in Scancode.io. Future work includes extending this integration to other platforms like PurlDB.

Acknowledgements

This project wouldn’t have been possible without the incredible support and mentorship of an outstanding team:

The weekly status calls were more than just updates; they were a source of inspiration, ideas, and camaraderie. And the 1:1 calls with Ayan Sinha Mahapatra and Philippe Ombredanne were like mini-masterclasses in software development.

To my mentors: Thank you for not just teaching me the ropes but for showing me how to swing from them! This journey was as much about learning as it was about having fun, and I couldn’t have asked for a better crew to sail with.