Commit 20b79f2 — Announce 2.0 (#325)
1 parent 7bac309

File tree

13 files changed: +517 −207 lines

‎pgml-docs/docs/blog/architecture.md

+127 lines changed: 127 additions & 0 deletions
<!--
Our performance questions are centered around Machine Learning, which introduces factors beyond pure language runtime performance. Namely, where does the data for the computation come from, and how does it cross IO and process boundaries? Data movement can be orders of magnitude more expensive than the other latency costs in ML applications, so we'd like our benchmarks to include the full picture. Inference is generally more latency sensitive than training, but training optimizations can shrink R&D iteration loops from days to minutes, or enable new data scales, so they can still be game changing.

In this post, we'll explore some potential architectures for Machine Learning, including a feature store, a model store, and the performance characteristics of the inference layer. You can also compare the code snippets for "readability" and "maintainability", which is another important consideration.

![Machine Learning Infrastructure](/blog/benchmarks/Machine-Learning-Infrastructure-2.webp)

Warming up with some data generation
------------------------------------

To get started, we'll generate some test data for benchmarking, and document the process for anyone who wants to reproduce the results on their own hardware. We'll create a test set of 10,000 random embeddings with 128 dimensions each, and print them to `/dev/null`. This test only involves serialization to stdout, not persistence, so it gives us an initial idea of raw language runtime performance.
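A single `time` invocation can be noisy. As a sketch (not part of the original measurements), a small Python harness can average several runs of each command; the placeholder command below is an assumption, and any of the commands from the tabs that follow can be substituted:

```python
import statistics
import subprocess
import time

def bench(cmd: str, runs: int = 3) -> float:
    """Time a shell command several times and return the mean wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, shell=True, stdout=subprocess.DEVNULL, check=True)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)

# Hypothetical usage -- substitute e.g. "python3 embedding.py > /dev/null":
mean_seconds = bench("python3 -c 'pass'")
print(f"{mean_seconds:.3f}s")
```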

<center>
<iframe width="600" height="371" seamless frameborder="0" scrolling="no" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vShmCVrYwmscys5TIo7c_C-1M3gE_GwENc4tTiU7A6_l3YamjJx7v5bZcafLIDcEIbFu-C2Buao4rQ6/pubchart?oid=278281764&amp;format=interactive"></iframe>
</center>

=== "SQL"

    `time psql -f embedding.sql > /dev/null`

    ```sql linenums="1" title="embedding.sql"
    SELECT ARRAY_AGG(random()) AS vector
    FROM generate_series(1, 1280000) i
    GROUP BY i % 10000;
    ```

=== "Python"

    `time python3 embedding.py > /dev/null`

    ```python linenums="1" title="embedding.py"
    import random

    embeddings = [
        [
            random.random() for _ in range(128)
        ] for _ in range(10_000)
    ]
    print(embeddings)
    ```

=== "Numpy"

    `time python3 embedding_numpy.py > /dev/null`

    ```python linenums="1" title="embedding_numpy.py"
    import sys
    import numpy
    numpy.set_printoptions(threshold=sys.maxsize)

    embeddings = numpy.random.rand(10_000, 128)
    print(embeddings)
    ```

=== "Rust"

    `time cargo run --release > /dev/null`

    ```rust linenums="1" title="main.rs"
    fn main() {
        let mut embeddings = [[0_f32; 128]; 10_000];
        for i in 0..10_000 {
            for j in 0..128 {
                embeddings[i][j] = rand::random()
            }
        }
        println!("{:?}", embeddings);
    }
    ```

#### _Well, that's unexpected_

Numpy is relatively slow, even though everyone says it's "fast". It's important to actually measure our workloads to find these order-of-magnitude aberrations that differ from expectations. The reason numpy is so slow is that it spends a lot of time formatting output into neat rows and columns. These additional string manipulations eat up time. We can prove that by skipping the serialization and only looking at the raw generation times.
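To check this hypothesis directly, a sketch (not from the original measurements) can time the array generation and the string formatting separately within the same process:

```python
import sys
import time
import numpy

# Print every element, as in embedding_numpy.py above.
numpy.set_printoptions(threshold=sys.maxsize)

start = time.perf_counter()
embeddings = numpy.random.rand(10_000, 128)
generate_seconds = time.perf_counter() - start

start = time.perf_counter()
text = str(embeddings)  # the same formatting work that print() triggers
format_seconds = time.perf_counter() - start

print(f"generate: {generate_seconds:.3f}s, format: {format_seconds:.3f}s")
```

If the hypothesis holds, `format_seconds` should dominate `generate_seconds` by a wide margin.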

<center>
<iframe width="600" height="371" seamless frameborder="0" scrolling="no" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vShmCVrYwmscys5TIo7c_C-1M3gE_GwENc4tTiU7A6_l3YamjJx7v5bZcafLIDcEIbFu-C2Buao4rQ6/pubchart?oid=916756801&amp;format=interactive"></iframe>
</center>

=== "SQL"

    `time psql -f embedding.sql > /dev/null`

    ```sql linenums="1" title="embedding.sql"
    SELECT NULL FROM (
        SELECT ARRAY_AGG(random()) AS vector
        FROM generate_series(1, 1280000) i
        GROUP BY i % 10000
    ) temp;
    ```

=== "Python"

    `time python3 embedding.py > /dev/null`

    ```python linenums="1" title="embedding.py"
    import random

    embeddings = [
        [
            random.random() for _ in range(128)
        ] for _ in range(10_000)
    ]
    ```

=== "Numpy"

    `time python3 embedding_numpy.py > /dev/null`

    ```python linenums="1" title="embedding_numpy.py"
    import numpy

    embeddings = numpy.random.rand(10_000, 128)
    ```

=== "Rust"

    `time cargo run --release > /dev/null`

    ```rust linenums="1" title="main.rs"
    fn main() {
        let mut embeddings = [[0_f32; 128]; 10_000];
        for i in 0..10_000 {
            for j in 0..128 {
                embeddings[i][j] = rand::random()
            }
        }
    }
    ```

Numpy is the easiest implementation. It's a single function call that does exactly what we want, except when it doesn't; then we have to search the docs, the code, or the internet to figure out what's going on. Rust and Python are pretty close in terms of readability and maintainability for me, although Rust carries extra type annotations. The SQL implementation is concise, but will probably be the most difficult to maintain. The GROUP BY modulo factor is not how I'd first have thought to implement this, and it leaves a coupling between the two dimensions of the array. Most programmers are less used to thinking in a declarative language for this kind of work.
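To make that coupling concrete, here's a hypothetical Python rendering (not from the original post) of what `GROUP BY i % 10000` is doing: 1,280,000 scalars get bucketed by remainder, so the 128 embedding dimension is never stated anywhere and is only implied by the ratio of the two constants.

```python
import random

# Bucket 1,280,000 random values by i % 10_000, mirroring the SQL.
# Each bucket ends up with 1_280_000 / 10_000 = 128 values -- the
# embedding dimension exists only as the ratio of the constants.
buckets = {}
for i in range(1, 1_280_001):
    buckets.setdefault(i % 10_000, []).append(random.random())

embeddings = list(buckets.values())
```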

<center>
<iframe width="600" height="371" seamless frameborder="0" scrolling="no" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vShmCVrYwmscys5TIo7c_C-1M3gE_GwENc4tTiU7A6_l3YamjJx7v5bZcafLIDcEIbFu-C2Buao4rQ6/pubchart?oid=2007994359&amp;format=interactive"></iframe>
</center>

A look at memory usage for our simple benchmark reveals another surprise. PSQL is not actually executing the program and consuming the memory; the Postgres server process is, and psql is just a client. This means it also carried the burden of establishing connections and passing data across process boundaries, which none of our other programs had, and which we haven't accounted for. Rust is the only implementation that actually did what we set out to do in this very trivial exercise. We managed to introduce unnecessary complexity in the other implementations. Benchmarks (like most software engineering) can be tricky business.
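One way to avoid mis-attributing memory to a client process is to measure peak RSS from inside the process doing the work. A Unix-only sketch (an assumption, not part of the original methodology; note `ru_maxrss` is reported in kilobytes on Linux but bytes on macOS):

```python
import random
import resource

# Do the work in this process, then ask the kernel for our own peak
# resident set size -- unlike with psql, no server process hides the cost.
embeddings = [[random.random() for _ in range(128)] for _ in range(10_000)]

peak_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"peak RSS: {peak_rss} (kilobytes on Linux, bytes on macOS)")
```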

Overall, Rust shows enough promise in this microbenchmark, at over twice the speed and memory efficiency of Numpy, that it warrants digging deeper to see how far we can take things. -->

‎pgml-docs/docs/blog/benchmarking.md

−207 lines changed: 0 additions & 207 deletions
This file was deleted.
+3 lines changed: 3 additions & 0 deletions

```python
import random

embeddings = [[random.random() for _ in range(128)] for _ in range(10_000)]
print(embeddings)
```
+16 lines changed: 16 additions & 0 deletions

```sql
-- SELECT ARRAY_AGG(random()) AS vector
-- FROM generate_series(1, 1280000) i
-- GROUP BY i % 10000;

SELECT 1 FROM (
    SELECT ARRAY_AGG(random()) AS vector
    FROM generate_series(1, 1280000) i
    GROUP BY i % 10000
) f LIMIT 0;

-- CREATE TABLE embeddings AS
-- SELECT ARRAY_AGG(random()) AS vector
-- FROM generate_series(1, 1280000) i
-- GROUP BY i % 10000;

-- COPY embeddings TO '/tmp/embeddings.csv' DELIMITER ',' CSV HEADER;
```
+6 lines changed: 6 additions & 0 deletions

```python
import sys
import numpy
numpy.set_printoptions(threshold=sys.maxsize)

embeddings = numpy.random.rand(10_000, 128)
print(embeddings)
```
