Commit 4846135 (parent e897643): updated readme

README.md: 1 file changed, 35 additions and 2 deletions

@@ -57,8 +57,6 @@ The script `./vector-embeddings/02-create-vectors-table.sql` does exactly that.

## Find similar articles by calculating cosine distance

Make sure you have an Azure OpenAI [embeddings model](https://learn.microsoft.com/azure/cognitive-services/openai/concepts/models#embeddings-models) deployed, and that it uses the `text-embedding-ada-002` model.

Once the model is deployed, it can be called from Azure SQL database using [sp_invoke_external_rest_endpoint](https://learn.microsoft.com/sql/relational-databases/system-stored-procedures/sp-invoke-external-rest-endpoint-transact-sql) to get the embedding vector of a text. For example, the following code gets the embedding for the text "the foundation series by isaac asimov" (replace `<your-api-name>` and `<api-key>` with the values of your Azure OpenAI deployment):

@@ -101,6 +99,41 @@ SUM(v1.[vector_value] * v2.[vector_value]) /

Thanks to columnstore, even on a small SKU the query runs fast, well within the sub-second goal.
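
Judging from the `SUM(v1.[vector_value] * v2.[vector_value]) / ...` expression in the query, each vector is stored as one row per element, and cosine similarity is computed by joining two vectors on the element position, summing the products and dividing by (presumably) the product of the two vector norms. The Python sketch below reproduces that arithmetic on in-memory rows to make the calculation explicit. It is not code from this repository: the `(vector_value_id, vector_value)` row shape, the helper names and the sample data are assumptions for illustration.

```python
import math

# Hypothetical "one row per vector element" layout:
# each row is (vector_value_id, vector_value), where vector_value_id is the element position.
Rows = list[tuple[int, float]]


def cosine_similarity(v1: Rows, v2: Rows) -> float:
    """SUM(v1 * v2) / (SQRT(SUM(v1 * v1)) * SQRT(SUM(v2 * v2))) over rows joined on position."""
    v2_by_id = dict(v2)
    # Join on vector_value_id; with full-length embeddings every position matches.
    dot = sum(value * v2_by_id[pos] for pos, value in v1 if pos in v2_by_id)
    norm1 = math.sqrt(sum(value * value for _, value in v1))
    norm2 = math.sqrt(sum(value * value for _, value in v2))
    return dot / (norm1 * norm2)


def most_similar(query: Rows, articles: dict[int, Rows], top_n: int = 10) -> list[tuple[int, float]]:
    """Score every article against the query and keep the top_n, most similar first."""
    scores = [(article_id, cosine_similarity(query, rows)) for article_id, rows in articles.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Tiny made-up 3-dimensional example (real text-embedding-ada-002 vectors have 1536 elements).
query = [(1, 1.0), (2, 0.0), (3, 1.0)]
articles = {
    10: [(1, 1.0), (2, 0.0), (3, 1.0)],  # same direction as the query
    20: [(1, 0.0), (2, 1.0), (3, 0.0)],  # orthogonal to the query
}
print(most_similar(query, articles, top_n=2))
```

Because a higher value means a more similar vector, results are sorted in descending order, which matches the `order by cosine_distance desc` used with `dbo.SimilarContentArticles`.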

## Encapsulating logic to retrieve embeddings

The process described above can be wrapped into stored procedures to make it easy to reuse. The scripts in the `./vector-embeddings/` folder show how to create a stored procedure that retrieves the embeddings from OpenAI:

- `03-store-openai-credentials.sql`: stores the Azure OpenAI credentials in the Azure SQL database
- `04-create-get-embeddings-procedure.sql`: creates a stored procedure that encapsulates the call to OpenAI
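
The stored procedure presumably wraps a single HTTP call to the Azure OpenAI embeddings endpoint, made with `sp_invoke_external_rest_endpoint`. To show the shape of that call outside SQL, here is a Python sketch that builds the request and extracts the vector from the response. It assumes the Azure OpenAI REST conventions (URL `https://<your-api-name>.openai.azure.com/openai/deployments/<deployment>/embeddings`, an `api-key` header, a JSON body with an `input` field, and the vector at `data[0].embedding` in the response) and the `2023-05-15` API version; the function names are hypothetical, and the request is built but not sent.

```python
import json
import urllib.request

# Assumed API version; check the Azure OpenAI docs for the versions your resource supports.
API_VERSION = "2023-05-15"


def build_embedding_request(api_name: str, deployment: str, api_key: str, text: str) -> urllib.request.Request:
    """Build (but do not send) the POST request asking Azure OpenAI for the embedding of `text`."""
    url = (
        f"https://{api_name}.openai.azure.com/openai/deployments/"
        f"{deployment}/embeddings?api-version={API_VERSION}"
    )
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


def extract_embedding(response_body: str) -> list[float]:
    """Return the vector found at data[0].embedding in the JSON response."""
    payload = json.loads(response_body)
    return payload["data"][0]["embedding"]


# Hypothetical resource and deployment names; replace with your own.
req = build_embedding_request("my-resource", "embeddings", "<api-key>", "the foundation series by isaac asimov")
print(req.get_method(), req.full_url)
```

To actually call the service, the request could be sent with `urllib.request.urlopen(req)` and the decoded response body passed to `extract_embedding`; that step is left out so the sketch runs offline.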

## Finding similar articles

The script `05-find-similar-articles.sql` uses the stored procedure created above, together with the process explained earlier, to find articles similar to the provided text.

## Encapsulating logic to do similarity search

To make it even easier to use, the script `06-sample-function.sql` shows a sample function, `dbo.SimilarContentArticles`, that finds articles similar to a given embedding. As demonstrated in script `07-sample-function-usage`, combining it with the `dbo.get_embedding` stored procedure lets you find similar articles by providing just the text:

```sql
declare @e nvarchar(max);
declare @text nvarchar(max) = N'the foundation series by isaac asimov';

-- retrieve the embedding of @text from Azure OpenAI into @e
exec dbo.get_embedding 'embeddings', @text, @e output;

-- descending order puts the highest cosine values, i.e. the most similar articles, first
select * from dbo.SimilarContentArticles(@e) as r order by cosine_distance desc;
```

## Alternative sample with Python and a local embedding model

If you don't want to, or can't, use OpenAI to generate embeddings, you can use a local model such as [multi-qa-MiniLM-L6-cos-v1](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1) to generate them. The Python script `./python/hybrid_search.py` shows how to:

- use Python to generate the embeddings
- do similarity search in Azure SQL database
- use [full-text search in Azure SQL database with BM25 ranking](https://learn.microsoft.com/en-us/sql/relational-databases/search/limit-search-results-with-rank?view=sql-server-ver16#ranking-of-freetexttable)
- re-rank the results with Reciprocal Rank Fusion (RRF), combining the BM25 ranking with the cosine similarity ranking
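
Reciprocal Rank Fusion scores each document by summing `1 / (k + rank)` over every ranking it appears in, so documents ranked high by both the BM25 search and the cosine similarity search rise to the top. The following Python sketch shows the technique in isolation; it is not the code in `hybrid_search.py`, and the constant `k = 60` (a commonly used default) and the sample article ids are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[tuple[int, float]]:
    """Combine several ranked lists of document ids into one ranking.

    Each list is ordered best-first; a document at 1-based position r
    contributes 1 / (k + r) to its fused score.
    """
    scores: dict[int, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical article ids returned by each search, best match first.
bm25_ranking = [3, 1, 7]    # full-text search (FREETEXTTABLE rank)
vector_ranking = [1, 5, 3]  # cosine similarity search
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print([doc_id for doc_id, _ in fused])  # [1, 3, 5, 7]
```

RRF only needs the rank positions, not the raw scores, which is why it can merge BM25 ranks and cosine similarities even though the two are on completely different scales.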

Make sure to set up the database for this sample using the `./python/00-setup-database.sql` script. The database can be either an Azure SQL database or a SQL Server database.

## Conclusions

Azure SQL database, and by extension SQL Server, already has great support for vector operations thanks to columnstore and its use of [SIMD](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) instructions such as [AVX-512](https://www.intel.com/content/www/us/en/architecture-and-technology/avx-512-overview.html).
