Commit ad16887

Updates to text-classification
1 parent 47e0cea commit ad16887

1 file changed (+35, -15 lines)

README.md
@@ -189,7 +189,7 @@ SELECT pgml.transform(
 Text classification involves assigning a label or category to a given text. Common use cases include sentiment analysis, natural language inference, and the assessment of grammatical correctness.
 ![text classification](pgml-docs/docs/images/text-classification.png)
 
-*Basic SQL query*
+*Sentiment Analysis*
 ```sql
 SELECT pgml.transform(
     task => 'text-classification',
@@ -208,10 +208,10 @@ SELECT pgml.transform(
     {"label": "NEGATIVE", "score": 0.9903519749641418}
 ]
 ```
+The default <a href="https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english" target="_blank">model</a> used for text classification is a fine-tuned version of DistilBERT-base-uncased that has been specifically optimized for the Stanford Sentiment Treebank dataset (sst2).
 
-A fine-tune checkpoint of DistilBERT-base-uncased that is tuned on Stanford Sentiment Treebank(sst2) is used as a default <a href="https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english" target="_blank">model</a> for text classification.
 
-*SQL query using specific model*
+*Sentiment Analysis using a specific model*
 
 To use one of the over 19,000 models available on Hugging Face, include the name of the desired model and its associated task as a JSONB object in the SQL query. For example, to use a RoBERTa <a href="https://huggingface.co/models?pipeline_tag=text-classification" target="_blank">model</a> trained on around 40,000 English tweets, with POS (positive), NEG (negative), and NEU (neutral) class labels, include this information in the JSONB object when making your query.
 
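The diff above shows only the *output* of the model-specific query. As context, a hedged sketch of what such a query looks like — the exact model name is not visible in this diff, so `cardiffnlp/twitter-roberta-base-sentiment` and the input sentence are assumptions chosen to match the "RoBERTa model trained on ~40,000 tweets" description:

```sql
-- Sketch only: model name and input text are illustrative assumptions,
-- not confirmed by this diff. The task/model pair is passed as a JSONB object.
SELECT pgml.transform(
    inputs => ARRAY[
        'I love how amazingly simple ML has become!'
    ],
    task => '{"task": "text-classification",
              "model": "cardiffnlp/twitter-roberta-base-sentiment"
             }'::JSONB
) AS sentiment;
```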
@@ -236,7 +236,7 @@ SELECT pgml.transform(
 ]
 ```
 
-*SQL query using models from specific industry*
+*Sentiment analysis using an industry-specific model*
 
 By selecting a model that has been specifically designed for a particular industry, you can achieve more accurate and relevant text classification. An example of such a model is <a href="https://huggingface.co/ProsusAI/finbert" target="_blank">FinBERT</a>, a pre-trained NLP model optimized for analyzing sentiment in financial text. FinBERT was created by training the BERT language model on a large financial corpus and fine-tuning it to classify financial sentiment. The model provides softmax outputs for three labels: positive, negative, and neutral.
 
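The query that produces the `neutral` FinBERT output in the next hunk is elided from this diff. A minimal sketch using the `ProsusAI/finbert` model referenced above — the input sentence is illustrative, not taken from the README:

```sql
-- Sketch: input text is an illustrative assumption; the model comes from
-- the FinBERT link above and is specified via the JSONB task object.
SELECT pgml.transform(
    inputs => ARRAY[
        'Stocks rallied and the British pound gained.'
    ],
    task => '{"task": "text-classification",
              "model": "ProsusAI/finbert"
             }'::JSONB
) AS market_sentiment;
```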
@@ -262,17 +262,37 @@ SELECT pgml.transform(
     {"label": "neutral", "score": 0.8062630891799927}
 ]
 ```
-- Token Classification
-- Table Question Answering
-- Question Answering
-- Zero-Shot Classification
-- Translation
-- Summarization
-- Conversational
-- Text Generation
-- Text2Text Generation
-- Fill-Mask
-- Sentence Similarity
+
+*Natural Language Inference (NLI)*
+
+In NLI, the model determines the relationship between two given texts. Concretely, the model takes a premise and a hypothesis and returns one of three classes:
+- entailment: the hypothesis is true given the premise.
+- contradiction: the hypothesis is false given the premise.
+- neutral: there is no relationship between the hypothesis and the premise.
+
+The benchmark dataset for this task is GLUE (General Language Understanding Evaluation). NLI models come in different variants, such as Multi-Genre NLI, Question NLI, and Winograd NLI.
+
+```sql
+SELECT pgml.transform(
+    inputs => ARRAY[
+        'A soccer game with multiple males playing. Some men are playing a sport.'
+    ],
+    task => '{"task": "text-classification",
+              "model": "roberta-large-mnli"
+             }'::JSONB
+) AS nli;
+```
+### Token Classification
+### Table Question Answering
+### Question Answering
+### Zero-Shot Classification
+### Translation
+### Summarization
+### Conversational
+### Text Generation
+### Text2Text Generation
+### Fill-Mask
+### Sentence Similarity
 
 ## Regression
 ## Classification
