Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings
This repository was archived by the owner on Apr 30, 2021. It is now read-only.

notAI-tech/deepsegment

Open more actions menu

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

85 Commits
85 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DeepSegment: A sentence segmenter that actually works!

Downloads DOI

DeepSegment is available as a free to use API (https://fastdeploy.notai.tech/free_apis) and as a self-hostable service via https://github.com/notAI-tech/fastDeploy

Note: For the original implementation please use the "master" branch of this repo.

Code documentation available at http://bpraneeth.com/docs

Installation:

# Tested with (keras==2.3.1; tensorflow==2.2.0) and (keras==2.2.4; tensorflow==1.14.0)
pip install --upgrade deepsegment

Supported languages:

en - english (Trained on data from various sources)

fr - french (Only Tatoeba data)

it - italian (Only Tatoeba data)

Usage:

from deepsegment import DeepSegment
# The default language is 'en'
segmenter = DeepSegment('en')
segmenter.segment('I am Batman i live in gotham')
# ['I am Batman', 'i live in gotham']

Using with tf serving docker image

docker pull bedapudi6788/deepsegment_en:v2
docker run -d -p 8500:8500 bedapudi6788/deepsegment_en:v2
from deepsegment import DeepSegment
# The default language is 'en'
segmenter = DeepSegment('en', tf_serving=True)
segmenter.segment('I am Batman i live in gotham')
# ['I am Batman', 'i live in gotham']

Finetuning DeepSegment

Since one-size will never fit all, finetuning deepsegment's default models with your own data is encouraged.

from deepsegment import finetune, generate_data

x, y = generate_data(['my name', 'is batman', 'who are', 'you'], n_examples=10000)
vx, vy = generate_data(['my name', 'is batman'])

# NOTE: name, epochs, batch_size, lr are optional arguments.
finetune('en', x, y, vx, vy, name='finetuned_model_name', epochs=number_of_epochs, batch_size=batch_size, lr=learning_rate)

Using with a finetuned checkpoint

from deepsegment import DeepSegment
segmenter = DeepSegment('en', checkpoint_name='finetuned_model_name')

Training deepsegment on custom data: https://colab.research.google.com/drive/1CjYbdbDHX1UmIyvn7nDW2ClQPnnNeA_m

Similar Projects:

https://github.com/bminixhofer/nnsplit (with bindings for Python, Rust and Javascript.)

Packages

 
 
 

Contributors

Languages

Morty Proxy This is a proxified and sanitized view of the page, visit original site.