Compute summary for all detected packages.

Overview

Previously, we computed the summary only at the codebase level, covering fields such as license_clarity_score, declared_holder, and other_license_expressions. This project aims to improve scanning accuracy by computing a summary and a license clarity score for each detected package and its files, rather than once for the entire scan. This requires enhancing the package models and collecting package attributes accurately across all package ecosystems.

Implementation

  • Added a new command-line option called --package-summary:

    • Provides a package-level summary within a single codebase.

    • Computes a license_clarity_score for each package.

    • Populates package attributes such as copyright, holder, other_license_expression, and notice_text.

  • The --package-summary option must be used with:

    • --classify: Helps ScanCode further classify scanned files/directories into categories like legal, readme, top-level, manifest.

    • --package or -p: Detects various package manifests, lockfiles, and package-like data, assembles codebase-level packages and dependencies, and tags files as part of the packages.

  • Benefits of the change:

    • Allows users to obtain a more refined summary for each individual package in a codebase.

    • Improves package assembly for various package ecosystems such as npm, python-whl, rust, and rubygems. The package-level summary depends heavily on package assembly, and in several scenarios the key files of top-level packages were not properly tagged. To address this, a method called get_top_level_resources was implemented. It retrieves the resources that belong to a top-level package so that its key files can be tagged correctly.

  • Testing:

    • All changes are covered by multiple full scan tests.

    • The tests validate both correct behavior and error handling.
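The option requirements and the role of key files described above can be sketched in Python. This is only an illustration and not code from the pull request: the helper names build_scan_command, missing_required_options, and key_files_by_package are hypothetical, the --package-summary option exists only in the linked PR, and the field names files, path, for_packages, and is_key_file are assumed to match the JSON output of a scan run with --package and --classify.

```python
from collections import defaultdict

# Options that, per the list above, must accompany --package-summary.
REQUIRED_WITH_PACKAGE_SUMMARY = ("--package", "--classify")


def build_scan_command(input_path, output_path):
    """Return an argv list for a scan that requests a package-level summary."""
    return [
        "scancode",
        "--package",
        "--classify",
        "--package-summary",  # assumption: only available with the linked PR
        "--json-pp", output_path,
        input_path,
    ]


def missing_required_options(args):
    """List the required companion options that are absent from args."""
    if "--package-summary" not in args:
        return []
    present = set(args)
    if "-p" in present:  # -p is the short form of --package
        present.add("--package")
    return [opt for opt in REQUIRED_WITH_PACKAGE_SUMMARY if opt not in present]


def key_files_by_package(scan):
    """Group the paths of key files by the package UID they are tagged with."""
    grouped = defaultdict(list)
    for resource in scan.get("files", []):
        if not resource.get("is_key_file"):
            continue
        for package_uid in resource.get("for_packages", []):
            grouped[package_uid].append(resource["path"])
    return dict(grouped)
```

The argv list returned by build_scan_command can be passed to subprocess.run, and key_files_by_package can be applied to the loaded JSON result to check which key files were tagged for each package. A key file that is not tagged with any package does not appear in the grouping, which is the situation that get_top_level_resources is meant to reduce.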

Linked Pull Requests

Sr. no | Link | Status
1 | https://github.com/aboutcode-org/scancode-toolkit/pull/3792 | Open

Post GSoC

I would like to get this PR merged into ScanCode Toolkit so that users can leverage this feature to expand their package and codebase scanning capabilities.

Acknowledgements

I would like to thank my mentors:

The weekly status calls were a great help, and the 1:1 calls with Ayan Sinha Mahapatra and Philippe Ombredanne were especially valuable. Thank you for your time and your patience!