Extending license detection to use licenses external to ScanCode Toolkit

Organization: AboutCode
Mentors: Philippe Ombredanne, AyanSinhaMahapatra, Jono Yang

Overview

ScanCode detects licenses using the licenses and rules in the ScanCode LicenseDB. The goal of this project is to extend ScanCode license detection to licenses that are external to LicenseDB, such as proprietary licenses that an organization keeps private. I also extended detection to cover licenses installed from external sources.

Implementation

All the work I did is contained in this single PR. I added a new command line option, --additional-license-directory, that adds the licenses and rules found in other directories to the license index. ScanCode Toolkit uses this license index when doing license detection. The option must be passed together with --reindex-licenses, which explicitly regenerates the license cache. After that, users run license scans with the regular --license option, and the additional licenses and rules are used in detection.
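The two-step workflow described above (reindex once, then scan as usual) can be sketched as follows. This is only an illustration: it builds the command lines without running them, the directory and file paths are hypothetical, and the exact way the options are combined on the `scancode` command line is an assumption based on the description above, so check the PR for the final syntax.

```python
import shlex


def reindex_command(license_dir):
    """Build the command that regenerates the license cache.

    Assumption: both options are passed to the `scancode` executable
    in a single invocation, as the text above describes.
    """
    return [
        "scancode",
        "--reindex-licenses",
        "--additional-license-directory",
        license_dir,
    ]


def scan_command(input_path, output_json):
    """Build a regular license scan command.

    No extra option is needed here: the additional licenses are
    already part of the regenerated license index.
    """
    return ["scancode", "--license", "--json-pp", output_json, input_path]


if __name__ == "__main__":
    # Hypothetical paths, used only for illustration.
    commands = [
        reindex_command("/path/to/private-licenses"),
        scan_command("project/", "scan.json"),
    ]
    for command in commands:
        print(shlex.join(command))
```

Printing the commands instead of running them keeps the sketch independent of any particular ScanCode Toolkit installation.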

This change also allows users to install directories of licenses or rules on their local machine. ScanCode Toolkit then detects them and includes them in the license cache whenever the licenses are reindexed. To package a directory of licenses or rules for installation and use in ScanCode Toolkit, the package must subclass a new Plugin class I added. The subclass lets ScanCode Toolkit locate the installed licenses and rules through a unique entry point and add them to the license index.
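The general mechanism described above, a plugin subclass that is found through an entry point and reports where its licenses live, can be sketched in plain Python. Every name here is hypothetical: `ExampleLicensePlugin`, `get_license_directory`, and the entry point group `example_license_location_provider` are stand-ins for illustration and are not the actual class or entry point added in the PR.

```python
from importlib import metadata
from pathlib import Path

# Hypothetical entry point group name, not the one ScanCode Toolkit uses.
EXAMPLE_ENTRY_POINT_GROUP = "example_license_location_provider"


class ExampleLicensePlugin:
    """Hypothetical stand-in for the plugin base class described above."""

    def get_license_directory(self):
        raise NotImplementedError


class AcmeLicenses(ExampleLicensePlugin):
    """What an installable license package might define."""

    def get_license_directory(self):
        # Hypothetical location of the packaged licenses and rules.
        return Path("/opt/acme/licenses")


def discover_plugin_classes(group=EXAMPLE_ENTRY_POINT_GROUP):
    """Load every plugin class registered under the given entry point group."""
    entry_points = metadata.entry_points()
    if hasattr(entry_points, "select"):
        # Python 3.10 and later.
        selected = entry_points.select(group=group)
    else:
        # Python 3.8 and 3.9 return a dict keyed by group name.
        selected = entry_points.get(group, [])
    return [entry_point.load() for entry_point in selected]


def collect_license_directories(plugin_classes):
    """Return the directory reported by each plugin class."""
    directories = []
    for plugin_class in plugin_classes:
        if not issubclass(plugin_class, ExampleLicensePlugin):
            raise TypeError(f"{plugin_class!r} is not a license plugin")
        directories.append(Path(plugin_class().get_license_directory()))
    return directories


if __name__ == "__main__":
    # Nothing is installed under the hypothetical group, so the example
    # class is passed in directly instead of being discovered.
    print(collect_license_directories([AcmeLicenses]))
```

In a real package, the subclass would be registered under the entry point group in the package metadata, so that installing the package is enough for the reindexing step to find it.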

Finally, all of these changes are covered by unit tests that validate both correct behavior and error handling.

Post GSoC

I would like to merge this PR into ScanCode Toolkit so that users can use this feature to expand their license detection capabilities.

Acknowledgements

Thanks to Jono and Philippe for being my mentors. I enjoyed all the meetings, code reviews, and design discussions. Thank you for your time and your patience!