GitHub - postgresml/postgresml at b6476eb41c71ac2f6312804b5b95c1b1cc9ccbaf

PostgresML

Simple machine learning with PostgreSQL

Introduction

PostgresML is a PostgreSQL extension that lets you train models and run inference on text and tabular data using SQL queries. Because the models live inside your PostgreSQL database, you can apply state-of-the-art algorithms to your data where it already resides, without moving it to a separate ML service.

Text Data

Perform natural language processing (NLP) tasks like sentiment analysis, question answering, translation, summarization and text generation
Access thousands of state-of-the-art language models like GPT-2, GPT-J and GPT-Neo from the 🤗 Hugging Face model hub
Fine-tune large language models (LLMs) on your own text data for different tasks

Translation

SELECT pgml.transform(
    'translation_en_to_fr',
    inputs => ARRAY[
        'Welcome to the future!',
        'Where have you been all this time?'
    ]
) AS french;

                         french                                 
------------------------------------------------------------

[
    {"translation_text": "Bienvenue à l'avenir!"},
    {"translation_text": "Où êtes-vous allé tout ce temps?"}
]
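Other NLP tasks follow the same pattern: only the task name passed to pgml.transform changes. A minimal sketch for summarization, assuming the default Hugging Face model for the standard 'summarization' pipeline task is acceptable; the input sentence is illustrative and no output is shown because it depends on the model.

```sql
-- Same call shape as the translation example; only the task name differs.
-- 'summarization' is a standard Hugging Face pipeline task; the input text is made up.
SELECT pgml.transform(
    'summarization',
    inputs => ARRAY[
        'PostgresML is a PostgreSQL extension that lets you train models and run inference with SQL queries, so your data never has to leave the database.'
    ]
) AS summary;
```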

Sentiment Analysis

SELECT pgml.transform(
    '{"model": "roberta-large-mnli"}'::JSONB,
    inputs => ARRAY[
        'I love how amazingly simple ML has become!',
        'I hate doing mundane and thankless tasks. ☹️'
    ]
) AS positivity;

                    positivity
------------------------------------------------------
[
    {"label": "NEUTRAL", "score": 0.8143417835235596}, 
    {"label": "NEUTRAL", "score": 0.7637073993682861}
]
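roberta-large-mnli is a natural language inference model, so its labels are ENTAILMENT, NEUTRAL and CONTRADICTION rather than positive or negative sentiment. For sentiment specifically, a model fine-tuned on a sentiment dataset can be named in the same JSONB argument. A sketch, assuming the JSONB keys are forwarded to the Hugging Face pipeline (as the "model" key above suggests) and using the distilbert-base-uncased-finetuned-sst-2-english checkpoint, which the original example does not use; that model labels inputs POSITIVE or NEGATIVE.

```sql
-- Same inputs as above; the model is swapped for one trained on SST-2 sentiment data.
-- Assumption: "task" and "model" are passed through to the transformers pipeline.
SELECT pgml.transform(
    '{"task": "text-classification",
      "model": "distilbert-base-uncased-finetuned-sst-2-english"}'::JSONB,
    inputs => ARRAY[
        'I love how amazingly simple ML has become!',
        'I hate doing mundane and thankless tasks. ☹️'
    ]
) AS positivity;
```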

Tabular data

Training a classification model

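-- Load the scikit-learn digits toy dataset into pgml.digits
SELECT pgml.load_dataset('digits');

-- Positional arguments (task, source table, label column) must come before named ones
SELECT * FROM pgml.train(
    'Handwritten Digit Image Classifier',
    'classification',
    'pgml.digits',
    'target',
    algorithm => 'xgboost'
);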

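-- Score rows with the model trained above, using each row's image as the feature vector
SELECT target, pgml.predict(
    'Handwritten Digit Image Classifier',
    image
) AS prediction
FROM pgml.digits
LIMIT 10;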

Installation

A PostgresML installation consists of three parts: a PostgreSQL database, the Postgres extension for machine learning, and a dashboard app. The extension provides all of the machine learning functionality and can be used on its own from any SQL IDE. The dashboard app provides an easy-to-use interface for writing SQL notebooks and for running and tracking ML experiments and models.

Docker

Step 1: Clone this repository

git clone git@github.com:postgresml/postgresml.git

Step 2: Start the Dockerized services. PostgresML runs on port 5433 in case you already have Postgres running on the default port 5432. If you need to install Docker, follow the official Docker installation instructions.

cd postgresml
docker-compose up

Step 3: Connect to PostgreSQL with PostgresML enabled, using a SQL IDE or psql:

postgres://postgres@localhost:5433/pgml_development
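Once connected, a quick sanity check confirms the extension is available. A minimal sketch using only standard PostgreSQL statements; it assumes the extension is registered under the name pgml, matching the pgml. schema used throughout this README.

```sql
-- Enable the extension if this database does not already have it.
CREATE EXTENSION IF NOT EXISTS pgml;

-- Confirm it is installed and see which version is loaded.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pgml';
```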

Free trial

If you want to try PostgresML without setting up Docker, sign up for a free hosted account instead. Free accounts include 5 GiB of disk space on a shared tenant.

Getting Started

IDE support

DBeaver
DataGrip
Tableau
Power BI
Jupyter
VSCode

NLP Tasks

Text Classification
Token Classification
Table Question Answering
Question Answering
Zero-Shot Classification
Translation
Summarization
Conversational
Text Generation
Text2Text Generation
Fill-Mask
Sentence Similarity

Regression

Classification

Applications

Text

AI writing partner
Chatbot for customer support
Social media post analysis
Fintech
Healthcare
Insurance

Tabular data

Fraud detection
Recommendation

Benefits

Access to Hugging Face models: a little more about open-source language models
Ease of fine-tuning, and why it matters
Rust-based extension and its benefits
Problems with HTTP serving and how PostgresML enables microsecond latency
PgCat for horizontal scaling

Concepts

Database
Extension
ML on text data
Transform operation
Fine tune operation
ML on tabular data
Train operation
Deploy operation
Predict operation

Deployment

Docker images
- CPU
- GPU
Data persistence on local/EC2/EKS
Deployment on AWS using docker images

What's in the box

See the documentation for a complete list of functionality.

All your favorite algorithms

Whether you need a simple linear regression or extreme gradient boosting, we've included support for all classification and regression algorithms in scikit-learn and XGBoost, with no extra configuration.
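Switching algorithms is a matter of changing the algorithm argument to pgml.train. A sketch of a plain linear regression, assuming pgml.load_dataset also supports the scikit-learn diabetes dataset and loads it into pgml.diabetes with a target column, by analogy with the digits example; the project name 'Diabetes Progression' is hypothetical.

```sql
-- Assumed: loads the scikit-learn diabetes dataset into pgml.diabetes.
SELECT pgml.load_dataset('diabetes');

-- Same pgml.train shape as the classification example, with a
-- regression task and a linear model instead of XGBoost.
SELECT * FROM pgml.train(
    'Diabetes Progression',   -- hypothetical project name
    'regression',
    'pgml.diabetes',
    'target',
    algorithm => 'linear'
);
```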

Managed model deployments

Models can be periodically retrained and automatically promoted to production depending on their key metric. Rollback capability is provided to ensure that you're always able to serve the highest quality predictions, along with historical logs of all deployments for long-term study.
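Promotion and rollback are both expressed through pgml.deploy with a deployment strategy. A sketch reusing the project from the classification example; 'best_score' promotes the model with the best key metric and 'rollback' returns to the previously deployed model.

```sql
-- Promote the model with the best key metric for this project.
SELECT * FROM pgml.deploy(
    'Handwritten Digit Image Classifier',
    strategy => 'best_score'
);

-- Return to the previously deployed model if the new one misbehaves.
SELECT * FROM pgml.deploy(
    'Handwritten Digit Image Classifier',
    strategy => 'rollback'
);
```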

Online and offline support

Predictions are served via a standard Postgres connection to ensure that your core apps can always access both your data and your models in real time. Pure SQL workflows also enable batch predictions to cache results in native Postgres tables for lookup.
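As a sketch of the offline path, predictions can be computed once and cached in an ordinary table that apps then query directly. This assumes the digits dataset stores each sample's features in an image column, as in the inference example; digit_predictions is a hypothetical table name.

```sql
-- Offline: score every row once and store the results in a regular table.
-- digit_predictions is a hypothetical name.
CREATE TABLE digit_predictions AS
SELECT
    target,
    pgml.predict('Handwritten Digit Image Classifier', image) AS prediction
FROM pgml.digits;

-- Apps read cached predictions with a plain query, no model call needed.
SELECT target, prediction
FROM digit_predictions
LIMIT 10;
```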

Instant visualizations

Run standard analyses on your datasets to detect outliers, bimodal distributions, feature correlations and other common patterns, with visualizations for each. Everything is cataloged in the dashboard for easy reference.

Hyperparameter search

Use either grid or random searches with cross-validation on your training set to discover the most important knobs to tweak on your favorite algorithm.
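A sketch of a grid search over two XGBoost hyperparameters, reusing the digits project from the classification example. The grid below has 4 × 3 = 12 combinations, each of which is trained and scored; the specific values are illustrative, not recommendations.

```sql
-- Grid search: every combination of the listed values is tried.
-- The parameter values are illustrative only.
SELECT * FROM pgml.train(
    'Handwritten Digit Image Classifier',
    'classification',
    'pgml.digits',
    'target',
    algorithm => 'xgboost',
    search => 'grid',
    search_params => '{
        "max_depth": [1, 2, 3, 4],
        "n_estimators": [20, 80, 160]
    }'
);
```

Passing search => 'random' instead samples from the same parameter lists rather than trying every combination, which keeps larger grids affordable.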

SQL native vector operations

Vector operations make working with learned embeddings a snap, for things like nearest neighbor searches or other similarity comparisons.
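A minimal nearest-neighbor sketch using pgml.cosine_similarity on REAL[] vectors. The embeddings CTE is a hypothetical stand-in for a real table of learned embeddings; with this query vector, id 1 ranks first and id 2 second, while id 3 (orthogonal to the query) has similarity 0.

```sql
-- Toy embeddings standing in for a real embeddings table.
WITH embeddings (id, embedding) AS (
    VALUES
        (1, ARRAY[1.0, 0.0, 0.0]::REAL[]),
        (2, ARRAY[0.7, 0.7, 0.0]::REAL[]),
        (3, ARRAY[0.0, 0.0, 1.0]::REAL[])
)
-- Rank by cosine similarity to the query vector and keep the two nearest.
SELECT
    id,
    pgml.cosine_similarity(embedding, ARRAY[1.0, 0.1, 0.0]::REAL[]) AS similarity
FROM embeddings
ORDER BY similarity DESC
LIMIT 2;
```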

The performance of Postgres

Since your data never leaves the database, you retain the speed, reliability and security you expect in your foundational stateful services. Leverage your existing infrastructure and expertise to deliver new capabilities.

Open source

We're building on the shoulders of giants. These machine learning libraries and Postgres have received extensive academic and industry use, and we'll continue their tradition of building with the community. Licensed under MIT.
