DeepSegment: A sentence segmenter that actually works!

DeepSegment is available as a free-to-use API (https://fastdeploy.notai.tech/free_apis) and as a self-hostable service via https://github.com/notAI-tech/fastDeploy.

Note: For the original implementation, please use the "master" branch of this repo.

Code documentation available at http://bpraneeth.com/docs

Installation:

# Tested with (keras==2.3.1; tensorflow==2.2.0) and (keras==2.2.4; tensorflow==1.14.0)
pip install --upgrade deepsegment
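
If you want to confirm that your environment matches one of the tested version pairs listed above, a small check along these lines may help. This is only a sketch: the helper names (`installed_version`, `is_tested_combination`) are hypothetical and not part of deepsegment, and it assumes the packages are installed under the distribution names `keras` and `tensorflow`.

```python
from importlib import metadata

# The two (keras, tensorflow) pairs named in the installation note above.
TESTED_COMBINATIONS = {("2.3.1", "2.2.0"), ("2.2.4", "1.14.0")}


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_tested_combination(keras_version, tf_version):
    """Return True if the version pair is one of the combinations listed as tested."""
    return (keras_version, tf_version) in TESTED_COMBINATIONS


if __name__ == "__main__":
    keras_v = installed_version("keras")
    tf_v = installed_version("tensorflow")
    if not is_tested_combination(keras_v, tf_v):
        print(f"Untested combination: keras={keras_v}, tensorflow={tf_v}")
```

Other version pairs may well work; the check only reports whether the installed pair is one of the two that were explicitly tested.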

Supported languages:

en - English (Trained on data from various sources)

fr - French (Only Tatoeba data)

it - Italian (Only Tatoeba data)

Usage:

from deepsegment import DeepSegment
# The default language is 'en'
segmenter = DeepSegment('en')
segmenter.segment('I am Batman i live in gotham')
# ['I am Batman', 'i live in gotham']
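
To segment several texts in one go, a thin wrapper around `segment` is usually enough. The sketch below relies only on the `segment(text)` method shown above returning a list of strings. `segment_all` and `PeriodSplitter` are hypothetical names introduced for illustration; `PeriodSplitter` is a stand-in so the example runs without downloading a model, and in practice you would pass `DeepSegment('en')` instead.

```python
from typing import Iterable, List


def segment_all(segmenter, texts: Iterable[str]) -> List[List[str]]:
    """Segment each text and return one list of sentences per input text.

    `segmenter` is any object with a `segment(text)` method that returns
    a list of strings. Blank inputs yield an empty list without calling it.
    """
    results = []
    for text in texts:
        text = text.strip()
        results.append(segmenter.segment(text) if text else [])
    return results


class PeriodSplitter:
    """Hypothetical stand-in segmenter: splits on periods. Not part of deepsegment."""

    def segment(self, text: str) -> List[str]:
        return [part.strip() for part in text.split(".") if part.strip()]


if __name__ == "__main__":
    docs = ["I am Batman. I live in Gotham.", "   ", "Who are you"]
    print(segment_all(PeriodSplitter(), docs))
```

Keeping one result list per input makes it easy to map sentences back to the text they came from.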

Using with the TF Serving Docker image

docker pull bedapudi6788/deepsegment_en:v2
docker run -d -p 8500:8500 bedapudi6788/deepsegment_en:v2

from deepsegment import DeepSegment
# The default language is 'en'
segmenter = DeepSegment('en', tf_serving=True)
segmenter.segment('I am Batman i live in gotham')
# ['I am Batman', 'i live in gotham']
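
Because the container is started in detached mode, it can be useful to confirm that port 8500 is accepting connections before creating the segmenter. The following is a minimal sketch using only the standard library. `port_is_open` is a hypothetical helper, and the host `localhost` is an assumption based on the `-p 8500:8500` mapping above.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8500, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Port 8500 is reachable; DeepSegment('en', tf_serving=True) can be created.")
    else:
        print("Nothing is listening on port 8500; check that the container is running.")
```

An open port only shows that something is listening; it does not prove the model has finished loading.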

Finetuning DeepSegment

Since one size will never fit all, finetuning DeepSegment's default models with your own data is encouraged.

from deepsegment import finetune, generate_data

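# x, y are the training data; vx, vy are the validation data.
x, y = generate_data(['my name', 'is batman', 'who are', 'you'], n_examples=10000)
vx, vy = generate_data(['my name', 'is batman'])

# NOTE: name, epochs, batch_size and lr are optional arguments.
# number_of_epochs, batch_size and learning_rate are placeholders; replace them with your own values.
finetune('en', x, y, vx, vy, name='finetuned_model_name', epochs=number_of_epochs, batch_size=batch_size, lr=learning_rate)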
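
With real data you will typically start from a file or list of sentences rather than a hand-written list. One possible way to prepare the two sentence lists passed to `generate_data` is sketched below. `split_sentences`, `val_fraction` and `seed` are hypothetical names chosen for this example, and the 10% validation share is an arbitrary assumption rather than a recommendation from this project.

```python
import random
from typing import Iterable, List, Tuple


def split_sentences(
    lines: Iterable[str], val_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Clean, de-duplicate and shuffle sentences, then split them into
    (training, validation) lists."""
    # Strip whitespace and drop empty lines; dict.fromkeys removes
    # duplicates while keeping first-seen order.
    sentences = list(dict.fromkeys(s.strip() for s in lines if s.strip()))
    # A seeded shuffle keeps the split reproducible between runs.
    random.Random(seed).shuffle(sentences)
    # Hold out at least one sentence whenever there are two or more.
    n_val = max(1, int(len(sentences) * val_fraction)) if len(sentences) > 1 else 0
    return sentences[n_val:], sentences[:n_val]


if __name__ == "__main__":
    raw = ["my name", "is batman", " my name ", "", "who are", "you"]
    train_sentences, val_sentences = split_sentences(raw, val_fraction=0.25)
    print(len(train_sentences), len(val_sentences))
    # The two lists can then be passed on as in the example above:
    #   x, y = generate_data(train_sentences, n_examples=10000)
    #   vx, vy = generate_data(val_sentences)
```

Splitting before calling `generate_data` keeps the validation sentences out of the training examples.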

Using with a finetuned checkpoint

from deepsegment import DeepSegment
segmenter = DeepSegment('en', checkpoint_name='finetuned_model_name')

Training deepsegment on custom data: https://colab.research.google.com/drive/1CjYbdbDHX1UmIyvn7nDW2ClQPnnNeA_m

Similar Projects:

https://github.com/bminixhofer/nnsplit (with bindings for Python, Rust and JavaScript)
