README.md: 82 additions, 21 deletions
@@ -31,18 +31,19 @@
</p>
-##Table of contents
+# Table of contents
- [Introduction](#introduction)
- [Installation](#installation)
- [Getting started](#getting-started)
- [Natural Language Processing](#nlp-tasks)
+- [Text Classification](#text-classification)
- [Regression](#regression)
- [Classification](#classification)

-##Introduction
+# Introduction
PostgresML is a PostgreSQL extension that enables you to perform ML training and inference on text and tabular data using SQL queries. With PostgresML, you can seamlessly integrate machine learning models into your PostgreSQL database and harness the power of cutting-edge algorithms to process text and tabular data efficiently.
-###Text Data
+## Text Data
- Perform natural language processing (NLP) tasks like sentiment analysis, question answering, translation, summarization and text generation
- Access 1000s of state-of-the-art language models like GPT-2, GPT-J, GPT-Neo from the :hugs: Hugging Face model hub
- Fine-tune large language models (LLMs) on your own text data for different tasks
@@ -96,7 +97,7 @@ SELECT pgml.transform(
]
```
-###Tabular data
+## Tabular data
- [47+ classification and regression algorithms](https://postgresml.org/docs/guides/training/algorithm_selection)
- [8 - 40X faster inference than HTTP based model serving](https://postgresml.org/blog/postgresml-is-8x-faster-than-python-http-microservices)
- [Millions of transactions per second](https://postgresml.org/blog/scaling-postgresml-to-one-million-requests-per-second)
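As a quick sketch of the tabular workflow: training happens with `pgml.train` on a table already in Postgres. The project, table, and column names below are illustrative, and the exact argument list may vary by version, so treat this as a sketch rather than a definitive call; see the training docs linked above.

```sql
-- Train a classifier on data already stored in Postgres.
-- 'churn_training_data' and 'churned' are hypothetical table/column names.
SELECT * FROM pgml.train(
    project_name => 'customer_churn',
    task => 'classification',
    relation_name => 'churn_training_data',
    y_column_name => 'churned',
    algorithm => 'xgboost'
);
```

Predictions are then served in-database with `pgml.predict`, as in the excerpt below.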
@@ -124,10 +125,10 @@ SELECT pgml.predict(
) AS prediction;
```
-##Installation
+# Installation
PostgresML installation consists of three parts: a PostgreSQL database, a Postgres extension for machine learning, and a dashboard app. The extension provides all the machine learning functionality and can be used independently with any SQL IDE. The dashboard app provides an easy-to-use interface for writing SQL notebooks, performing and tracking ML experiments, and managing ML models.
-###Docker
+## Docker
Step 1: Clone this repository
@@ -146,12 +147,12 @@ Step 3: Connect to PostgresDB with PostgresML enabled using a SQL IDE or <a href
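Once connected, a quick sanity check (assuming the `pgml` extension has been created in the database you connected to, which the Docker image does for you) is to ask the extension for its version:

```sql
-- Confirm the extension is installed and responding.
CREATE EXTENSION IF NOT EXISTS pgml;
SELECT pgml.version();
```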
If you want to check out the functionality without the hassle of Docker, go ahead and start PostgresML by signing up for a free account [here](https://postgresml.org/signup). We will provide 5 GiB of disk space on a shared tenant.
-##Getting Started
+# Getting Started
-###Option 1
+## Option 1
- On a local installation, go to the dashboard app at `http://localhost:8000/` to use SQL notebooks.
- On the free tier, click the **Dashboard** button to use SQL notebooks.
@@ -160,7 +161,7 @@ If you want to check out the functionality without the hassle of Docker please g
- Try one of the pre-built SQL notebooks
-###Option 2
+## Option 2
- Use any of these popular tools to connect to PostgresML and write SQL queries
PostgresML integrates 🤗 Hugging Face Transformers to bring state-of-the-art NLP models into the data layer. There are tens of thousands of pre-trained models with pipelines to turn raw text in your database into useful results. Many state-of-the-art deep learning architectures have been published and made available on the Hugging Face <a href="https://huggingface.co/models" target="_blank">model hub</a>.
You can call different NLP tasks and customize them using the following SQL query.
@@ -184,12 +185,15 @@ SELECT pgml.transform(
args => JSONB -- (optional) arguments to the pipeline.
)
```
-###Text Classification
+## Text Classification
Text classification involves assigning a label or category to a given text. Common use cases include sentiment analysis, natural language inference, and the assessment of grammatical correctness.
Sentiment analysis is a type of natural language processing technique that involves analyzing a piece of text to determine the sentiment or emotion expressed within it. It can be used to classify a text as positive, negative, or neutral, and has a wide range of applications in fields such as marketing, customer service, and political analysis.
*Basic usage*
```sql
SELECT pgml.transform(
    task => 'text-classification',
@@ -211,7 +215,7 @@ SELECT pgml.transform(
The default <a href="https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english" target="_blank">model</a> used for text classification is a fine-tuned version of DistilBERT-base-uncased that has been specifically optimized for the Stanford Sentiment Treebank dataset (sst2).
-*Sentiment Analysis using specific model*
+*Using specific model*
To use one of the over 19,000 models available on Hugging Face, include the name of the desired model and its associated task as a JSONB object in the SQL query. For example, if you want to use a RoBERTa <a href="https://huggingface.co/models?pipeline_tag=text-classification" target="_blank">model</a> trained on around 40,000 English tweets that has POS (positive), NEG (negative), and NEU (neutral) labels for its classes, include this information in the JSONB object when making your query.
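For illustration, such a query might look like the sketch below. The model name `cardiffnlp/twitter-roberta-base-sentiment` and the input sentences are assumptions that match the description above; substitute any text-classification model from the hub.

```sql
SELECT pgml.transform(
    inputs => ARRAY[
        'I love how amazingly simple ML has become!',
        'I hate doing mundane and thankless tasks.'
    ],
    task => '{
        "task": "text-classification",
        "model": "cardiffnlp/twitter-roberta-base-sentiment"
    }'::JSONB
) AS tweet_sentiment;
```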
@@ -236,7 +240,7 @@ SELECT pgml.transform(
]
```
-*Sentiment analysis using industry specific model*
+*Using industry specific model*
By selecting a model that has been specifically designed for a particular industry, you can achieve more accurate and relevant text classification. An example of such a model is <a href="https://huggingface.co/ProsusAI/finbert" target="_blank">FinBERT</a>, a pre-trained NLP model that has been optimized for analyzing sentiment in financial text. FinBERT was created by training the BERT language model on a large financial corpus, and fine-tuning it to specifically classify financial sentiment. When using FinBERT, the model will provide softmax outputs for three different labels: positive, negative, or neutral.
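A sketch of such a query, using the FinBERT model named above (the input sentences are illustrative):

```sql
SELECT pgml.transform(
    inputs => ARRAY[
        'Stocks rallied and the British pound gained.',
        'Stocks fell sharply after the disappointing earnings report.'
    ],
    task => '{
        "task": "text-classification",
        "model": "ProsusAI/finbert"
    }'::JSONB
) AS market_sentiment;
```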
@@ -263,14 +267,16 @@ SELECT pgml.transform(
]
```
-*Natural Language Infenrence (NLI)*
+### Natural Language Inference (NLI)
+
+NLI, or Natural Language Inference, is a type of model that determines the relationship between two texts. The model takes a premise and a hypothesis as inputs and returns a class, which can be one of three types:
+- Entailment: This means that the hypothesis is true based on the premise.
+- Contradiction: This means that the hypothesis is false based on the premise.
+- Neutral: This means that there is no relationship between the hypothesis and the premise.

-In NLI the model determines the relationship between two given texts. Concretely, the model takes a premise and a hypothesis and returns a class that can either be:
-- entailment, which means the hypothesis is true.
-- contraction, which means the hypothesis is false.
-- neutral, which means there's no relation between the hypothesis and the premise.
+The GLUE dataset is the benchmark dataset for evaluating NLI models. There are different variants of NLI models, such as Multi-Genre NLI, Question NLI, and Winograd NLI.

-The benchmark dataset for this task is GLUE (General Language Understanding Evaluation). NLI models have different variants, such as Multi-Genre NLI, Question NLI and Winograd NLI.
+If you want to use an NLI model, you can find them on the :hugs: Hugging Face model hub. Look for models with "nli" or "mnli".
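As an illustration, a sketch using `roberta-large-mnli` (one widely used MNLI model on the hub; choosing it here is an assumption, and the premise and hypothesis are passed together as a single input string, following the pattern of the other transform examples in this README):

```sql
SELECT pgml.transform(
    inputs => ARRAY[
        'A soccer game with multiple males playing. Some men are playing a sport.'
    ],
    task => '{
        "task": "text-classification",
        "model": "roberta-large-mnli"
    }'::JSONB
) AS nli;
```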
The QNLI task involves determining whether a given question can be answered by the information in a provided document. If the answer can be found in the document, the label assigned is "entailment". Conversely, if the answer cannot be found in the document, the label assigned is "not entailment".
+If you want to use a QNLI model, you can find them on the :hugs: Hugging Face model hub. Look for models with "qnli".
+```sql
+SELECT pgml.transform(
+    inputs => ARRAY[
+        'Where is the capital of France?, Paris is the capital of France.'
The Quora Question Pairs model is designed to evaluate whether two given questions are paraphrases of each other. This model takes the two questions and assigns a binary value as output. LABEL_0 indicates that the questions are paraphrases of each other and LABEL_1 indicates that the questions are not paraphrases. The benchmark dataset used for this task is the Quora Question Pairs dataset within the GLUE benchmark, which contains a collection of question pairs and their corresponding labels.
+```sql
+SELECT pgml.transform(
+    inputs => ARRAY[
+        'Which city is the capital of France?, Where is the capital of France?'