Commit 8721ce8

First version of text classification

1 parent ad16887

File tree

1 file changed: README.md (+82 additions, −21 deletions)
@@ -31,18 +31,19 @@
</p>


# Table of contents
- [Introduction](#introduction)
- [Installation](#installation)
- [Getting started](#getting-started)
- [Natural Language Processing](#nlp-tasks)
    - [Text Classification](#text-classification)
- [Regression](#regression)
- [Classification](#classification)

# Introduction
PostgresML is a PostgreSQL extension that enables you to perform ML training and inference on text and tabular data using SQL queries. With PostgresML, you can seamlessly integrate machine learning models into your PostgreSQL database and harness the power of cutting-edge algorithms to process text and tabular data efficiently.

## Text Data
- Perform natural language processing (NLP) tasks like sentiment analysis, question answering, translation, summarization and text generation
- Access thousands of state-of-the-art language models like GPT-2, GPT-J and GPT-Neo from the :hugs: Hugging Face model hub
- Fine-tune large language models (LLMs) on your own text data for different tasks
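As a quick sketch of the text workflow, the `pgml.transform` function described later in this README can be applied to a sentence directly in SQL (the input sentence and column alias here are illustrative, and the output shape depends on the model):

```sql
-- Classify the sentiment of a sentence with the default model
SELECT pgml.transform(
    task   => 'text-classification',
    inputs => ARRAY[
        'PostgresML makes machine learning in Postgres simple!'
    ]
) AS sentiment;
```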
@@ -96,7 +97,7 @@ SELECT pgml.transform(
]
```

## Tabular data
- [47+ classification and regression algorithms](https://postgresml.org/docs/guides/training/algorithm_selection)
- [8 - 40X faster inference than HTTP-based model serving](https://postgresml.org/blog/postgresml-is-8x-faster-than-python-http-microservices)
- [Millions of transactions per second](https://postgresml.org/blog/scaling-postgresml-to-one-million-requests-per-second)
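The tabular workflow pairs `pgml.train` with the `pgml.predict` call shown below. A minimal sketch might look like this (the project, table, and column names are hypothetical placeholders; consult the PostgresML docs for the full signatures):

```sql
-- Train a regression model on an existing table.
-- 'my_project', 'my_table' and 'target' are placeholder names.
SELECT * FROM pgml.train(
    'my_project',
    task => 'regression',
    relation_name => 'my_table',
    y_column_name => 'target'
);

-- Run inference against the trained model
SELECT pgml.predict('my_project', ARRAY[1.5, 2.0, 0.5]) AS prediction;
```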
@@ -124,10 +125,10 @@ SELECT pgml.predict(
) AS prediction;
```

# Installation
A PostgresML installation consists of three parts: a PostgreSQL database, a Postgres extension for machine learning, and a dashboard app. The extension provides all the machine learning functionality and can be used independently with any SQL IDE. The dashboard app provides an easy-to-use interface for writing SQL notebooks and for running and tracking ML experiments and models.

## Docker

Step 1: Clone this repository

@@ -146,12 +147,12 @@ Step 3: Connect to PostgresDB with PostgresML enabled using a SQL IDE or <a href
postgres://postgres@localhost:5433/pgml_development
```

## Free trial
If you want to try out the functionality without the hassle of Docker, sign up for a free account [here](https://postgresml.org/signup). We will provide 5GiB of disk space on a shared tenant.

# Getting Started

## Option 1
- On a local installation, go to the dashboard app at `http://localhost:8000/` to use SQL notebooks.

- On the free tier, click the **Dashboard** button to use SQL notebooks.
@@ -160,7 +161,7 @@ If you want to check out the functionality without the hassle of Docker please g
- Try one of the pre-built SQL notebooks
![notebooks](pgml-docs/docs/images/notebooks.png)

## Option 2
- Use any of these popular tools to connect to PostgresML and write SQL queries
    - <a href="https://superset.apache.org/" target="_blank">Apache Superset</a>
    - <a href="https://dbeaver.io/" target="_blank">DBeaver</a>
@@ -172,7 +173,7 @@ If you want to check out the functionality without the hassle of Docker please g
    - <a href="https://jupyter.org/" target="_blank">Jupyter</a>
    - <a href="https://code.visualstudio.com/" target="_blank">VSCode</a>

# NLP Tasks
PostgresML integrates 🤗 Hugging Face Transformers to bring state-of-the-art NLP models into the data layer. There are tens of thousands of pre-trained models with pipelines to turn raw text in your database into useful results. Many state-of-the-art deep learning architectures have been published and made available from the Hugging Face <a href="https://huggingface.co/models" target="_blank">model hub</a>.

You can call different NLP tasks and customize them using the following SQL query.
@@ -184,12 +185,15 @@ SELECT pgml.transform(
    args => JSONB -- (optional) arguments to the pipeline.
)
```

## Text Classification

Text classification involves assigning a label or category to a given text. Common use cases include sentiment analysis, natural language inference, and the assessment of grammatical correctness.
![text classification](pgml-docs/docs/images/text-classification.png)

### Sentiment Analysis
Sentiment analysis is a natural language processing technique that analyzes a piece of text to determine the sentiment or emotion expressed within it. It can be used to classify a text as positive, negative, or neutral, and it has a wide range of applications in fields such as marketing, customer service, and political analysis.

*Basic usage*
```sql
SELECT pgml.transform(
    task => 'text-classification',
    -- (remainder of this example is elided in the diff)
```
@@ -211,7 +215,7 @@ SELECT pgml.transform(
The default <a href="https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english" target="_blank">model</a> used for text classification is a fine-tuned version of DistilBERT-base-uncased that has been specifically optimized for the Stanford Sentiment Treebank dataset (sst2).

*Using a specific model*

To use one of the more than 19,000 models available on Hugging Face, include the name of the desired model and its associated task as a JSONB object in the SQL query. For example, if you want to use a RoBERTa <a href="https://huggingface.co/models?pipeline_tag=text-classification" target="_blank">model</a> trained on around 40,000 English tweets, with POS (positive), NEG (negative), and NEU (neutral) labels for its classes, include this information in the JSONB object when making your query.
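A sketch of such a call, following the JSONB task syntax used elsewhere in this README (the model name `cardiffnlp/twitter-roberta-base-sentiment` and the input sentence are assumptions for illustration; substitute the model you actually want):

```sql
-- The model name below is an assumed example; any text-classification
-- model from the Hugging Face hub can be substituted.
SELECT pgml.transform(
    inputs => ARRAY[
        'I love how amazingly simple ML has become!'
    ],
    task => '{"task": "text-classification",
              "model": "cardiffnlp/twitter-roberta-base-sentiment"
             }'::JSONB
) AS positivity;
```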
@@ -236,7 +240,7 @@ SELECT pgml.transform(
]
```

*Using an industry-specific model*

By selecting a model that has been specifically designed for a particular industry, you can achieve more accurate and relevant text classification. An example of such a model is <a href="https://huggingface.co/ProsusAI/finbert" target="_blank">FinBERT</a>, a pre-trained NLP model that has been optimized for analyzing sentiment in financial text. FinBERT was created by training the BERT language model on a large financial corpus and fine-tuning it to classify financial sentiment. When using FinBERT, the model provides softmax outputs for three labels: positive, negative, or neutral.
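Following the same JSONB pattern, a FinBERT query might look like this (the input headline is illustrative):

```sql
-- Classify the sentiment of a financial headline with FinBERT
SELECT pgml.transform(
    inputs => ARRAY[
        'Stocks rallied and the British pound gained.'
    ],
    task => '{"task": "text-classification",
              "model": "ProsusAI/finbert"
             }'::JSONB
) AS market_sentiment;
```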
@@ -263,14 +267,16 @@ SELECT pgml.transform(
]
```

### Natural Language Inference (NLI)

NLI, or Natural Language Inference, is a task in which a model determines the relationship between two texts. The model takes a premise and a hypothesis as inputs and returns a class, which can be one of three types:
- Entailment: the hypothesis is true based on the premise.
- Contradiction: the hypothesis is false based on the premise.
- Neutral: there is no relationship between the hypothesis and the premise.

The GLUE (General Language Understanding Evaluation) dataset is the benchmark dataset for evaluating NLI models. There are different variants of NLI models, such as Multi-Genre NLI, Question NLI, and Winograd NLI.

If you want to use an NLI model, you can find one on the :hugs: Hugging Face model hub. Look for models with "nli" or "mnli" in their names.

```sql
SELECT pgml.transform(
@@ -282,6 +288,61 @@ SELECT pgml.transform(
         }'::JSONB
) AS nli;
```
*Result*
```sql
                        nli
------------------------------------------------------
[
    {"label": "ENTAILMENT", "score": 0.98837411403656}
]
```

### Question Natural Language Inference (QNLI)
The QNLI task involves determining whether a given question can be answered by the information in a provided document. If the answer can be found in the document, the label assigned is "entailment". Conversely, if the answer cannot be found in the document, the label assigned is "not entailment".

If you want to use a QNLI model, you can find one on the :hugs: Hugging Face model hub. Look for models with "qnli" in their names.

```sql
SELECT pgml.transform(
    inputs => ARRAY[
        'Where is the capital of France?, Paris is the capital of France.'
    ],
    task => '{"task": "text-classification",
              "model": "cross-encoder/qnli-electra-base"
             }'::JSONB
) AS qnli;
```

*Result*
```sql
                        qnli
------------------------------------------------------
[
    {"label": "LABEL_0", "score": 0.9978110194206238}
]
```

### Quora Question Pairs
The Quora Question Pairs model is designed to evaluate whether two given questions are paraphrases of each other. The model takes the two questions as input and assigns a binary value as output: LABEL_0 indicates that the questions are paraphrases, and LABEL_1 indicates that they are not. The benchmark dataset for this task is the Quora Question Pairs dataset within the GLUE benchmark, which contains a collection of question pairs and their corresponding labels.

```sql
SELECT pgml.transform(
    inputs => ARRAY[
        'Which city is the capital of France?, Where is the capital of France?'
    ],
    task => '{"task": "text-classification",
              "model": "textattack/bert-base-uncased-QQP"
             }'::JSONB
) AS qqp;
```

*Result*
```sql
                        qqp
------------------------------------------------------
[
    {"label": "LABEL_0", "score": 0.9988721013069152}
]
```
### Token Classification
### Table Question Answering
### Question Answering
