source{d} MLonCode models

bot-detection

Model that distinguishes bots from humans among developer identities.

Example:

from sklearn.preprocessing import LabelEncoder
from sourced.ml.models import BotDetection
from xgboost import XGBClassifier

bot_detection = BotDetection().load("bot-detection")  # placeholder for the path (or UUID/URL) of the downloaded model
xgb_cls = XGBClassifier()
xgb_cls._Booster = bot_detection.booster
xgb_cls._le = LabelEncoder().fit([False, True])
print("Model configuration:", xgb_cls)
print("BPE model vocabulary size:", len(bot_detection.bpe_model.vocab()))
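Once the booster is attached, the assembled classifier can be used through the regular XGBoost scikit-learn API. A minimal sketch continuing the example above; the zero-filled feature matrix is only a placeholder (real input must follow the feature layout the model was trained on), and the _Booster/_le assembly assumes an xgboost version that still exposes those private attributes:

import numpy as np

# Placeholder input: one row per developer identity, zero-filled only to
# illustrate the call; real features must match the training layout.
features = np.zeros((1, xgb_cls._Booster.num_features()))
print("Is bot:", xgb_cls.predict(features))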

1 model available.

bow

Weighted bag-of-words: every bag is a feature extracted from source code and is associated with a weight obtained by applying TF-IDF.

Example:

from sourced.ml.models import BOW
bow = BOW().load("bow")  # placeholder for the path (or UUID/URL) of the downloaded model
print("Number of documents:", len(bow))
print("Number of tokens:", len(bow.tokens))

4 models available.

docfreq

Document frequencies of features extracted from source code, that is, how many documents (repositories, files or functions) contain each tokenized feature.

Example:

from sourced.ml.models import DocumentFrequencies
df = DocumentFrequencies().load("docfreq")  # placeholder for the path (or UUID/URL) of the downloaded model
print("Number of tokens:", len(df))

2 models available.

id2vec

Source code identifier embeddings, that is, every identifier is represented by a dense vector.

Example:

from sourced.ml.models import Id2Vec
id2vec = Id2Vec().load("id2vec")  # placeholder for the path (or UUID/URL) of the downloaded model
print("Number of tokens:", len(id2vec))

2 models available.

id_splitter_bilstm

Model containing the weights of the BiLSTM that splits source code identifiers into sub-tokens.

Example:

from sourced.ml.models.id_splitter import IdentifierSplitterBiLSTM
id_splitter = IdentifierSplitterBiLSTM().load("id_splitter_bilstm")  # placeholder for the path (or UUID/URL) of the downloaded model
identifiers = ["getUserID", "setValue"]  # example identifiers to split into sub-tokens
print(id_splitter.split(identifiers))

1 model available.

topics

Topic modeling of Git repositories. All tokens are identifiers extracted from the repositories and are treated as indicators from which the topic(s) of each repository are inferred.

Example:

from sourced.ml.models import Topics
topics = Topics().load("topics")  # placeholder for the path (or UUID/URL) of the downloaded model
print("Number of topics:", len(topics))
print("Number of tokens:", len(topics.tokens))

1 model available.

typos_correction

Model that suggests fixes to correct typos.

Example:

from lookout.style.typos.corrector import TyposCorrector
corrector = TyposCorrector().load("typos_correction")  # placeholder for the path (or UUID/URL) of the downloaded model
print("Corrector configuration:\n", corrector.dump())

3 models available.
