Which code repositories on GitHub are legal to collect data from for model training?


Hey man.
You can only use code from repositories whose licenses allow it. Public doesn't mean free to train on; the license decides.
For example, the MIT, Apache, and BSD licenses are generally considered to permit model training, as long as you keep the required copyright and attribution notices.
However, GPL-family (copyleft) licenses come with extra conditions, such as requiring derivative works to be distributed under the same license, and repos with no license can't be used at all, because default copyright applies.
Always check the repo's LICENSE file before using it for training if you want to avoid legal trouble.
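If you are screening many repos, the rule of thumb above can be sketched as a small filter. This is a minimal sketch, not legal advice: `classify_license`, `fetch_spdx_id`, and the `PERMISSIVE`/`COPYLEFT` sets are hypothetical names chosen for illustration, and the sets are an assumed, non-exhaustive list of SPDX identifiers. It reads the `license.spdx_id` field that the GitHub REST API returns for `GET /repos/{owner}/{repo}`; that field is `null` when GitHub detects no license and `NOASSERTION` when it finds a license file it can't identify.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical, non-exhaustive SPDX sets mirroring the rule of thumb above.
PERMISSIVE = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause"}
COPYLEFT = {"GPL-2.0", "GPL-3.0", "LGPL-2.1", "LGPL-3.0", "AGPL-3.0"}


def classify_license(spdx_id: Optional[str]) -> str:
    """Map an SPDX id (None means no license detected) to a coarse category."""
    if spdx_id is None:
        return "no-license"  # default copyright applies: exclude
    if spdx_id in PERMISSIVE:
        return "permissive"  # usable if notices/attribution are kept
    if spdx_id in COPYLEFT:
        return "copyleft"  # extra conditions: review before use
    return "review"  # NOASSERTION or anything else: read the LICENSE by hand


def fetch_spdx_id(owner: str, repo: str) -> Optional[str]:
    """Return the license GitHub detected for owner/repo, or None if none."""
    url = f"https://api.github.com/repos/{owner}/{repo}"
    req = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    lic = data.get("license")  # null in the JSON when no license is detected
    return lic["spdx_id"] if lic else None
```

Usage: `classify_license(fetch_spdx_id("octocat", "Hello-World"))` needs network access, and unauthenticated requests are rate-limited (60 per hour). GitHub's detection only looks at the license file it recognizes, so anything landing in `"review"` still needs a human to open the LICENSE file.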

Labels: Code Security, Question