You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: README.md
+32-36Lines changed: 32 additions & 36 deletions
Original file line number
Diff line number
Diff line change
@@ -22,7 +22,7 @@ Azure SQL database can be used to easily and quickly perform vector similarity s
22
22
The **native option** is to use the new Vector Functions, recently introduced in Azure SQL database. Vector Functions are a set of functions that can be used to perform vector operations directly in the database.
23
23
24
24
> [!NOTE]
25
-
> Vector Functions are in Early Adopter Preview.
25
+
> Vector Functions are in Early Adopter Preview. Get access to the preview via https://aka.ms/azuresql-vector-eap-announcement
@@ -48,22 +48,21 @@ Make sure to replace the `<account>` and `<sas-token>` placeholders with the val
48
48
49
49
Run each section (each section starts with a comment) separately. At the end of the process (will take up to a couple of minutes) you will have all the CSV data imported in the `wikipedia_articles_embeddings` table.
50
50
51
-
## Create Vectors Table
51
+
## Add embeddings columns to table
52
52
53
-
In the imported data, vectors are stored as JSON arrays. To take advtange of vector processing, the arrays must be saved into a columnstore index. Thanks to `OPENJSON`, turning a vector into a set of values that can be saved into a column is very easy:
53
+
In the imported data, vectors are stored as JSON arrays. To take advtange of vector processing, the arrays must be saved into more compact and optimzed binary format index. Thanks to `JSON_ARRAY_TO_VECTOR`, turning a vector into a set of values that can be saved into a column is very easy:
The script `./vector-embeddings/02-create-vectors-table.sql` does exactly that. It creates two tables, one for the title embeddings and one for the content embeddings, and the creates a clustered columnstore on those tables to enable efficient vector processing.
65
+
The script `./vector-embeddings/02-use-native-vectors.sql` does exactly that. It takes the existing columns with vectors stored in JSON arrays and turns them into vectors saved in binary format.
67
66
68
67
## Find similar articles by calculating cosine distance
The response can be turned into a table using the `OPENJSON` function again:
86
+
The vector returned in the response can extrated using `json_query`:
88
87
89
88
```sql
90
-
select
91
-
cast([key] asint) as [vector_value_id],
92
-
cast([value] as float) as [vector_value]
93
-
into
94
-
#t
95
-
from
96
-
openjson(@response, '$.result.data[0].embedding')
89
+
set @re = json_query(@response, '$.result.data[0].embedding')
97
90
```
98
91
99
92
Now is just a matter of taking the vector of the sample text and the vectors of all wikipedia articles and calculate the cosine similarity. The math can be easily expressed in T-SQL:
thanks to columnstore, even on small SKU, the performance can be pretty fast, well within the sub-second goal. If you need to use other distance metric, you can find here a description of how to implement the most common one in pure T-SQL: [How to calculate common vectors distances in T-SQL](distance-calculations-in-tsql.md)
111
-
112
98
## Encapsulating logic to retrieve embeddings
113
99
114
100
The described process can be wrapped into stored procedures to make it easy to re-use it. The scripts in the `./vector-embeddings/` show how to create a stored procedure to retrieve the embeddings from OpenAI:
@@ -125,12 +111,20 @@ The script `05-find-similar-articles.sql` uses the created stored procedure and
125
111
To make it even easier to use, the script `06-sample-function.sql` shows a sample function that can be used to find similar articles by just providing the text, as demonstrated in script `07-sample-function-usage` with the following example:
126
112
127
113
```sql
128
-
declare @e nvarchar(max);
114
+
declare @embedding varbinary(8000);
129
115
declare @text nvarchar(max) = N'the foundation series by isaac asimov';
## Alternative sample with Python and a local embedding model
@@ -146,10 +140,12 @@ Make sure to setup the database for this sample using the `./python/00-setup-dat
146
140
147
141
## Conclusions
148
142
149
-
Azure SQL database, and by extension SQL Server, already has a great support for vector operations thanks to columnstore and its usage of [SIMD](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data)[AVX-512 instructions](https://www.intel.com/content/www/us/en/architecture-and-technology/avx-512-overview.html).
143
+
Azure SQL database, has now support to perform vector operations directly in the database, making it easy to perform vector similarity search. Using vector search along with fulltext search and BM25 ranking, it is possible to build powerful search engines that can be used in a variety of scenarios.
150
144
151
-
A vector is nothing more than a list numbers (in this scope) and list of numbers can be perfectly stored in a column and even better in a columnstore index. You can start to take advantange of optimized vectors operations right now, directly in Azure SQL database, taking also advantage of all other features that it offers to developers.
145
+
> [!NOTE]
146
+
> Vector Functions are in Early Adopter Preview. Get access to the preview via https://aka.ms/azuresql-vector-eap-announcement
152
147
153
-
## Community Samples
148
+
## More resources
154
149
155
-
[Implement The RAG Pattern](https://blazorhelpwebsite.com/ViewBlogPost/5066)
0 commit comments