Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Commit fd56008

Browse filesBrowse files
committed
updated readme
1 parent c0d8ee5 commit fd56008
Copy full SHA for fd56008

File tree

1 file changed

+32
-36
lines changed
Filter options

1 file changed

+32
-36
lines changed

‎README.md

Copy file name to clipboardExpand all lines: README.md
+32-36Lines changed: 32 additions & 36 deletions
Original file line numberDiff line numberDiff line change
@@ -22,7 +22,7 @@ Azure SQL database can be used to easily and quickly perform vector similarity s
2222
The **native option** is to use the new Vector Functions, recently introduced in Azure SQL database. Vector Functions are a set of functions that can be used to perform vector operations directly in the database.
2323

2424
> [!NOTE]
25-
> Vector Functions are in Early Adopter Preview.
25+
> Vector Functions are in Early Adopter Preview. Get access to the preview via https://aka.ms/azuresql-vector-eap-announcement
2626
2727
![](_assets/azure-sql-cosine-similarity-native.gif)
2828

@@ -48,22 +48,21 @@ Make sure to replace the `<account>` and `<sas-token>` placeholders with the val
4848

4949
Run each section (each section starts with a comment) separately. At the end of the process (will take up to a couple of minutes) you will have all the CSV data imported in the `wikipedia_articles_embeddings` table.
5050

51-
## Create Vectors Table
51+
## Add embeddings columns to table
5252

53-
In the imported data, vectors are stored as JSON arrays. To take advtange of vector processing, the arrays must be saved into a columnstore index. Thanks to `OPENJSON`, turning a vector into a set of values that can be saved into a column is very easy:
53+
In the imported data, vectors are stored as JSON arrays. To take advtange of vector processing, the arrays must be saved into more compact and optimzed binary format index. Thanks to `JSON_ARRAY_TO_VECTOR`, turning a vector into a set of values that can be saved into a column is very easy:
5454

5555
```sql
56-
select
57-
v.id as article_id,
58-
cast(tv.[key] as int) as vector_value_id,
59-
cast(tv.[value] as float) as vector_value
60-
from
61-
[dbo].[wikipedia_articles_embeddings] as v
62-
cross apply
63-
openjson(title_vector) tv
56+
alter table wikipedia_articles_embeddings
57+
add title_vector_native varbinary(8000);
58+
59+
update
60+
wikipedia_articles_embeddings
61+
set
62+
title_vector_native = json_array_to_vector(title_vector),
6463
```
6564

66-
The script `./vector-embeddings/02-create-vectors-table.sql` does exactly that. It creates two tables, one for the title embeddings and one for the content embeddings, and the creates a clustered columnstore on those tables to enable efficient vector processing.
65+
The script `./vector-embeddings/02-use-native-vectors.sql` does exactly that. It takes the existing columns with vectors stored in JSON arrays and turns them into vectors saved in binary format.
6766

6867
## Find similar articles by calculating cosine distance
6968

@@ -84,31 +83,18 @@ exec @retval = sp_invoke_external_rest_endpoint
8483
select @response;
8584
```
8685

87-
The response can be turned into a table using the `OPENJSON` function again:
86+
The vector returned in the response can extrated using `json_query`:
8887

8988
```sql
90-
select
91-
cast([key] as int) as [vector_value_id],
92-
cast([value] as float) as [vector_value]
93-
into
94-
#t
95-
from
96-
openjson(@response, '$.result.data[0].embedding')
89+
set @re = json_query(@response, '$.result.data[0].embedding')
9790
```
9891

9992
Now is just a matter of taking the vector of the sample text and the vectors of all wikipedia articles and calculate the cosine similarity. The math can be easily expressed in T-SQL:
10093

10194
```sql
102-
SUM(v1.[vector_value] * v2.[vector_value]) /
103-
(
104-
SQRT(SUM(v1.[vector_value] * v1.[vector_value]))
105-
*
106-
SQRT(SUM(v2.[vector_value] * v2.[vector_value]))
107-
)
95+
vector_distance('cosine', @embedding, title_vector)
10896
```
10997

110-
thanks to columnstore, even on small SKU, the performance can be pretty fast, well within the sub-second goal. If you need to use other distance metric, you can find here a description of how to implement the most common one in pure T-SQL: [How to calculate common vectors distances in T-SQL](distance-calculations-in-tsql.md)
111-
11298
## Encapsulating logic to retrieve embeddings
11399

114100
The described process can be wrapped into stored procedures to make it easy to re-use it. The scripts in the `./vector-embeddings/` show how to create a stored procedure to retrieve the embeddings from OpenAI:
@@ -125,12 +111,20 @@ The script `05-find-similar-articles.sql` uses the created stored procedure and
125111
To make it even easier to use, the script `06-sample-function.sql` shows a sample function that can be used to find similar articles by just providing the text, as demonstrated in script `07-sample-function-usage` with the following example:
126112

127113
```sql
128-
declare @e nvarchar(max);
114+
declare @embedding varbinary(8000);
129115
declare @text nvarchar(max) = N'the foundation series by isaac asimov';
130116

131-
exec dbo.get_embedding 'embeddings', @text, @e output;
132-
133-
select * from dbo.SimilarContentArticles(@e) as r order by cosine_distance desc
117+
exec dbo.get_embedding 'embeddings', @text, embedding output;
118+
119+
select top(10)
120+
a.id,
121+
a.title,
122+
a.url,
123+
vector_distance('cosine', @embedding, title_vector_ada2) cosine_distance
124+
from
125+
dbo.wikipedia_articles_embeddings a
126+
order by
127+
cosine_distance;
134128
```
135129

136130
## Alternative sample with Python and a local embedding model
@@ -146,10 +140,12 @@ Make sure to setup the database for this sample using the `./python/00-setup-dat
146140

147141
## Conclusions
148142

149-
Azure SQL database, and by extension SQL Server, already has a great support for vector operations thanks to columnstore and its usage of [SIMD](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) [AVX-512 instructions](https://www.intel.com/content/www/us/en/architecture-and-technology/avx-512-overview.html).
143+
Azure SQL database, has now support to perform vector operations directly in the database, making it easy to perform vector similarity search. Using vector search along with fulltext search and BM25 ranking, it is possible to build powerful search engines that can be used in a variety of scenarios.
150144

151-
A vector is nothing more than a list numbers (in this scope) and list of numbers can be perfectly stored in a column and even better in a columnstore index. You can start to take advantange of optimized vectors operations right now, directly in Azure SQL database, taking also advantage of all other features that it offers to developers.
145+
> [!NOTE]
146+
> Vector Functions are in Early Adopter Preview. Get access to the preview via https://aka.ms/azuresql-vector-eap-announcement
152147
153-
## Community Samples
148+
## More resources
154149

155-
[Implement The RAG Pattern](https://blazorhelpwebsite.com/ViewBlogPost/5066)
150+
- [Azure SQL & AI](https://aka.ms/sql-ai)
151+
- [Azure SQL Vector Samples](https://github.com/Azure-Samples/azure-sql-db-vector-search)

0 commit comments

Comments
0 (0)
Morty Proxy This is a proxified and sanitized view of the page, visit original site.