Commit 2f33c43

Added protobuf for finbert support and text-classification readme in progress
1 parent cb9b2d4 commit 2f33c43

4 files changed: +109 -24 lines changed

‎README.md

Lines changed: 107 additions & 22 deletions
@@ -49,7 +49,7 @@ PostgresML is a PostgreSQL extension that enables you to perform ML training and
 
 **Translation**
 
-*SQL Query*
+*SQL query*
 
 ```sql
 SELECT pgml.transform(
@@ -62,7 +62,7 @@ SELECT pgml.transform(
 ```
 *Result*
 
-```bash
+```json
 french
 ------------------------------------------------------------
 
@@ -75,27 +75,24 @@ SELECT pgml.transform(
 
 
 **Sentiment Analysis**
-*SQL Query*
+*SQL query*
 
 ```sql
 SELECT pgml.transform(
-
-'{"model": "roberta-large-mnli"}'::JSONB,
-inputs => ARRAY
-[
+task => 'text-classification',
+inputs => ARRAY[
 'I love how amazingly simple ML has become!',
 'I hate doing mundane and thankless tasks. ☹️'
 ]
-
 ) AS positivity;
 ```
 *Result*
-```bash
+```json
 positivity
 ------------------------------------------------------
 [
-{"label": "NEUTRAL", "score": 0.8143417835235596},
-{"label": "NEUTRAL", "score": 0.7637073993682861}
+{"label": "POSITIVE", "score": 0.9995759129524232},
+{"label": "NEGATIVE", "score": 0.9903519749641418}
 ]
 ```
 
@@ -144,7 +141,7 @@ cd postgresml
 docker-compose up
 ```
 
-Step 3: Connect to PostgresDB with PostgresML enabled using a SQL IDE or [`psql`](https://www.postgresql.org/docs/current/app-psql.html)
+Step 3: Connect to PostgresDB with PostgresML enabled using a SQL IDE or <a href="https://www.postgresql.org/docs/current/app-psql.html" target="_blank">psql</a>
 ```bash
 postgres://postgres@localhost:5433/pgml_development
 ```
@@ -165,18 +162,106 @@ If you want to check out the functionality without the hassle of Docker please g
 
 ### Option 2
 - Use any of these popular tools to connect to PostgresML and write SQL queries
-- [Apache Superset](https://superset.apache.org/)
-- [DBeaver](https://dbeaver.io/)
-- [Data Grip](https://www.jetbrains.com/datagrip/)
-- [Postico 2](https://eggerapps.at/postico2/)
-- [Popsql](https://popsql.com/)
-- [Tableau](https://www.tableau.com/)
-- [Power BI](https://powerbi.microsoft.com/en-us/)
-- [Jupyter](https://jupyter.org/)
-- [VSCode](https://code.visualstudio.com/)
+- <a href="https://superset.apache.org/" target="_blank">Apache Superset</a>
+- <a href="https://dbeaver.io/" target="_blank">DBeaver</a>
+- <a href="https://www.jetbrains.com/datagrip/" target="_blank">Data Grip</a>
+- <a href="https://eggerapps.at/postico2/" target="_blank">Postico 2</a>
+- <a href="https://popsql.com/" target="_blank">Popsql</a>
+- <a href="https://www.tableau.com/" target="_blank">Tableau</a>
+- <a href="https://powerbi.microsoft.com/en-us/" target="_blank">PowerBI</a>
+- <a href="https://jupyter.org/" target="_blank">Jupyter</a>
+- <a href="https://code.visualstudio.com/" target="_blank">VSCode</a>
 
 ## NLP Tasks
-- Text Classification
+PostgresML integrates 🤗 Hugging Face Transformers to bring state-of-the-art NLP models into the data layer. There are tens of thousands of pre-trained models with pipelines to turn raw text in your database into useful results. Many state of the art deep learning architectures have been published and made available from Hugging Face <a href= "https://huggingface.co/models" target="_blank">model hub</a>.
+
+You can call different NLP tasks and customize using them using the following SQL query.
+
+```sql
+SELECT pgml.transform(
+task => TEXT OR JSONB, -- Pipeline initializer arguments
+inputs => TEXT[] OR BYTEA[], -- inputs for inference
+args => JSONB -- (optional) arguments to the pipeline.
+)
+```
+### Text Classification
+
+Text classification involves assigning a label or category to a given text. Common use cases include sentiment analysis, natural language inference, and the assessment of grammatical correctness.
+![text classification](pgml-docs/docs/images/text-classification.png)
+
+*Basic SQL query*
+```sql
+SELECT pgml.transform(
+task => 'text-classification',
+inputs => ARRAY[
+'I love how amazingly simple ML has become!',
+'I hate doing mundane and thankless tasks. ☹️'
+]
+) AS positivity;
+```
+*Result*
+```json
+positivity
+------------------------------------------------------
+[
+{"label": "POSITIVE", "score": 0.9995759129524232},
+{"label": "NEGATIVE", "score": 0.9903519749641418}
+]
+```
+
+A fine-tune checkpoint of DistilBERT-base-uncased that is tuned on Stanford Sentiment Treebank(sst2) is used as a default <a href="https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english" target="_blank">model</a> for text classification.
+
+*SQL query using specific model*
+
+To use one of the over 19,000 models available on Hugging Face, include the name of the desired model and its associated task as a JSONB object in the SQL query. For example, if you want to use a RoBERTa <a href="https://huggingface.co/models?pipeline_tag=text-classification" target="_blank">model</a> trained on around 40,000 English tweets and that has POS (positive), NEG (negative), and NEU (neutral) labels for its classes, include this information in the JSONB object when making your query.
+
+```sql
+SELECT pgml.transform(
+inputs => ARRAY[
+'I love how amazingly simple ML has become!',
+'I hate doing mundane and thankless tasks. ☹️'
+],
+task => '{"task": "text-classification",
+"model": "finiteautomata/bertweet-base-sentiment-analysis"
+}'::JSONB
+) AS positivity;
+```
+*Result*
+```json
+positivity
+-----------------------------------------------
+[
+{"label": "POS", "score": 0.992932200431826},
+{"label": "NEG", "score": 0.975599765777588}
+]
+```
+
+*SQL query using models from specific industry*
+
+By selecting a model that has been specifically designed for a particular industry, you can achieve more accurate and relevant text classification. An example of such a model is <a href="https://huggingface.co/ProsusAI/finbert" target="_blank">FinBERT</a>, a pre-trained NLP model that has been optimized for analyzing sentiment in financial text. FinBERT was created by training the BERT language model on a large financial corpus, and fine-tuning it to specifically classify financial sentiment. When using FinBERT, the model will provide softmax outputs for three different labels: positive, negative, or neutral.
+
+```sql
+SELECT pgml.transform(
+inputs => ARRAY[
+'Stocks rallied and the British pound gained.',
+'Stocks making the biggest moves midday: Nvidia, Palantir and more'
+],
+task => '{"task": "text-classification",
+"model": "ProsusAI/finbert"
+}'::JSONB
+) AS market_sentiment;
+```
+
+*Result*
+```json
+
+market_sentiment
+------------------------------------------------------
+[
+{"label": "positive", "score": 0.8983612656593323},
+{"label": "neutral", "score": 0.8062630891799927}
+]
+```
 - Token Classification
 - Table Question Answering
 - Question Answering
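
The `pgml.transform` signature added to the README takes a `task` (a bare task name or a JSONB object with `task` and `model` keys), an `inputs` array, and an optional `args` JSONB. For readers who call it from a client such as a Jupyter notebook, the following is a minimal sketch of how such a call might be assembled as a parameterized query. The helper name `build_transform_call`, the `AS result` alias, and the `%s` placeholder style (as used by psycopg) are assumptions for illustration and are not part of this commit. The helper only builds strings and never connects to a database.

```python
import json


def build_transform_call(inputs, task, model=None, args=None):
    """Build a parameterized pgml.transform query and its parameters.

    Hypothetical helper: it assembles the SQL text and the parameter list,
    and leaves execution to whatever database driver the caller uses.
    """
    # The README passes either a bare task name or a JSONB object with
    # "task" and "model" keys; this sketch always uses the JSONB form.
    task_spec = {"task": task}
    if model is not None:
        task_spec["model"] = model

    clauses = ["task => %s::JSONB", "inputs => %s"]
    params = [json.dumps(task_spec), list(inputs)]

    # args is marked optional in the signature shown in the README.
    if args is not None:
        clauses.append("args => %s::JSONB")
        params.append(json.dumps(args))

    sql = "SELECT pgml.transform(" + ", ".join(clauses) + ") AS result;"
    return sql, params


sql, params = build_transform_call(
    inputs=["Stocks rallied and the British pound gained."],
    task="text-classification",
    model="ProsusAI/finbert",
)
print(sql)
print(params)
```

Keeping the model name and inputs as bound parameters rather than interpolating them into the SQL text avoids quoting problems with inputs that contain apostrophes.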

‎docker-compose.yml

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ services:
       context: ./pgml-extension/
       dockerfile: Dockerfile.local
     ports:
-      - "5433:5432"
+      - "6453:5432"
     command:
       - sleep
       - infinity

‎pgml-extension/Dockerfile.local

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ RUN cat /etc/apt/sources.list
 RUN apt-get update && apt-get install -y postgresql-pgml-14
 
 # Cache this, quicker
-RUN pip3 install xgboost scikit-learn diptest torch lightgbm transformers datasets sentencepiece sentence_transformers sacremoses sacrebleu rouge
+RUN pip3 install xgboost scikit-learn diptest torch lightgbm transformers datasets sentencepiece sentence_transformers sacremoses sacrebleu rouge protobuf
 
 COPY --chown=postgres:postgres . /app
 WORKDIR /app
