Skip to content

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Commit 9b6737c

Browse filesBrowse files
committed
README.
1 parent 6f62e61 commit 9b6737c
Copy full SHA for 9b6737c

File tree

1 file changed

+332
-0
lines changed
Filter options

1 file changed

+332
-0
lines changed

‎README.md

Copy file name to clipboard
+332Lines changed: 332 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,332 @@
1+
JsQuery – json query language with GIN indexing support
2+
=======================================================
3+
4+
Introduction
5+
------------
6+
7+
JsQuery – is a language to query jsonb data type, introduced in PostgreSQL
8+
release 9.4.
9+
10+
It's primary goal is to provide an additional functionality to jsonb
11+
(currently missing in PostgreSQL), such as a simple and effective way
12+
to search in nested objects and arrays, more comparison operators with
13+
indexes support. We hope, that jsquery will be eventually a part of
14+
PostgreSQL.
15+
16+
Jsquery is released as tsquery data type (similar to tsquery) and @@
17+
match operator for jsonb.
18+
19+
Authors
20+
-------
21+
22+
* Teodor Sigaev <teodor@sigaev.ru>, Postgres Professional, Moscow, Russia
23+
* Alexander Korotkov <aekorotkov@gmail.com>, Postgres Professional, Moscow, Russia
24+
* Oleg Bartunov <oleg@sai.msu.su>, Postgres Professional, Moscow, Russia
25+
26+
Availability
27+
------------
28+
29+
JsQuery is realized as an extension and not available in default PostgreSQL
30+
installation. It is available from
31+
[github](https://github.com/akorotkov/jsquery)
32+
under the same license as
33+
[PostgreSQL](http://www.postgresql.org/about/licence/)
34+
and supports PostgreSQL 9.4+.
35+
36+
Regards
37+
-------
38+
39+
Development is sponsored by [Wargaming.net](http://wargaming.net).
40+
41+
Installation
42+
------------
43+
44+
JsQuery is PostgreSQL extension which requires PostgreSQL 9.4 or higher.
45+
Before build and install you should ensure following:
46+
47+
* PostgreSQL version is 9.4 or higher.
48+
* You have development package of PostgreSQL installed or you built
49+
PostgreSQL from source.
50+
* Your PATH variable is configured so that pg\_config command available.
51+
52+
Typical installation procedure may look like this:
53+
54+
$ git clone https://github.com/akorotkov/jsquery.git
55+
$ cd jsquery
56+
$ make USE_PGXS=1
57+
$ sudo make USE_PGXS=1 install
58+
$ make USE_PGXS=1 installcheck
59+
$ psql DB -c "CREATE EXTENSION jsquery;"
60+
61+
JSON query language
62+
-------------------
63+
64+
JsQuery extension contains `jsquery` datatype which represents whole JSON query
65+
as a single value (like `tsquery` does for fulltext search). The query is an
66+
expression on JSON-document values.
67+
68+
Simple expression is specified as `path binary_operator value` or
69+
`path unary_operator`. See following examples.
70+
71+
* `x = "abc"` – value of key "x" is equal to "abc";
72+
* `$ @> [4, 5, "zzz"]` – the JSON document is an array containing values
73+
4, 5 and "zzz";
74+
* `"abc xyz" >= 10` – value of key "abc xyz" is greater than or equal to 10;
75+
* `volume IS NUMERIC` – type of key "volume" is numeric.
76+
* `$ = true` – the whole JSON document is just a true.
77+
* `similar_ids.@# > 5` – similar\_ids is an array or object of length greater
78+
than 5;
79+
* `similar_product_ids.# = "0684824396"` – array "similar\_product\_ids"
80+
contains string "0684824396".
81+
* `*.color = "red"` – there is object somewhere which key "color" has value
82+
"red".
83+
* `foo = *` – key "foo" exists in object.
84+
85+
Path selects set of JSON values to be checked using given operators. In
86+
the simplest case path is just an key name. In general path is key names and
87+
placeholders combined by dot signs. Path can use following placeholders:
88+
89+
* `#` – any index of array;
90+
* `%` – any key of object;
91+
* `*` – any sequence of array indexes and object keys;
92+
* `@#` – length of array or object, could be only used as last component of
93+
path;
94+
* `$` – the whole JSON document as single value, could be only the whole path.
95+
96+
Expression is true when operator is true against at least one value selected
97+
by path.
98+
99+
Key names could be given either with or without double quotes. Key names
100+
without double quotes shouldn't contain spaces, start with number or concur
101+
with jsquery keyword.
102+
103+
The supported binary operators are:
104+
105+
* Equality operator: `=`;
106+
* Numeric comparison operators: `>`, `>=`, `<`, `<=`;
107+
* Search in the list of scalar values using `IN` operator;
108+
* Array comparison operators: `&&` (overlap), `@>` (contains),
109+
`<@` (contained in).
110+
111+
The supported unary operators are:
112+
113+
* Check for existence operator: `= *`;
114+
* Check for type operators: `IS ARRAY`, `IS NUMERIC`, `IS OBJECT`, `IS STRING`
115+
and `IS BOOLEAN`.
116+
117+
Expressions could be complex. Complex expression is a set of expressions
118+
combined by logical operators (`AND`, `OR`, `NOT`) and grouped using braces.
119+
120+
Examples of complex expressions are given below.
121+
122+
* `a = 1 AND (b = 2 OR c = 3) AND NOT d = 1`
123+
* `x.% = true OR x.# = true`
124+
125+
Prefix expressions are expressions given in the form path (subexpression).
126+
In this case path selects JSON values to be checked using given subexpression.
127+
Check results are aggregated in the same way as in simple expressions.
128+
129+
* `#(a = 1 AND b = 2)` – exists element of array which a key is 1 and b key is 1
130+
* `%($ >= 10 AND $ <= 20)` – exists object key which values is between 10 and 20
131+
132+
Path also could contain following special placeholders with "every" semantics:
133+
134+
* `#:` – every indexes of array;
135+
* `%:` – every key of object;
136+
* `*:` – every sequence of array indexes and object keys.
137+
138+
Consider following example.
139+
140+
%.#:($ >= 0 AND $ <= 1)
141+
142+
This example could be read as following: there is at least one key which value
143+
is array of numerics between 0 and 1.
144+
145+
We can rewrite this example in the following form with extra braces.
146+
147+
%(#:($ >= 0 AND $ <= 1))
148+
149+
The first placeholder `%` checks that expression in braces is true for at least
150+
one value in object. The second placeholder `#:` checks value to be array and
151+
all its elements satisfy expressions in braces.
152+
153+
We can rewrite this example without `#:` placeholder as follows.
154+
155+
%(NOT #(NOT ($ >= 0 AND $ <= 1)) AND $ IS ARRAY)
156+
157+
In this example we transform assertion that every element of array satisfy some
158+
condition to assertion that there is no one element which doesn't satisfy the
159+
same condition.
160+
161+
Some examples of using paths are given below.
162+
163+
* `numbers.#: IS NUMERIC` – every element of "numbers" array is numeric.
164+
* `*:($ IS OBJECT OR $ IS BOOLEAN)` – JSON is a structure of nested objects
165+
with booleans as leaf values.
166+
* `#:.%:($ >= 0 AND $ <= 1)` – each element of array is object containing
167+
only numeric values between 0 and 1.
168+
* `documents.#:.% = *` – "documents" is array of objects containing at least
169+
one key.
170+
* `%.#: ($ IS STRING)` – JSON object contains at least one array of strings.
171+
* `#.% = true` – at least one array element is objects which contains at least
172+
one "true" value.
173+
174+
Usage of path operators and braces need some explanation. When same path
175+
operators are used multiple times they may refer different values while you can
176+
refer same value multiple time by using braces and `$` operator. See following
177+
examples.
178+
179+
* `# < 10 AND # > 20` – exists element less than 10 and exists another element
180+
greater than 20.
181+
* `#($ < 10 AND $ > 20)` – exists element which both less than 10 and greater
182+
than 20 (impossible).
183+
* `#($ >= 10 AND $ <= 20)` – exists element between 10 and 20.
184+
* `# >= 10 AND # <= 20` – exists element great or equal to 10 and exists
185+
another element less or equal to 20. Query can be satisfied by array with
186+
no elements between 10 and 20, for instance [0,30].
187+
188+
Same rules apply when you search inside objects and branchy structures.
189+
190+
See our
191+
[pgconf.eu presentation](http://www.sai.msu.su/~megera/postgres/talks/pgconfeu-2014-jsquery.pdf)
192+
for more examples.
193+
194+
GIN indexes
195+
-----------
196+
197+
JsQuery extension contains two operator classes (opclasses) for GIN which
198+
provide different kinds of query optimization.
199+
200+
* jsonb\_value\_path\_ops
201+
* jsonb\_path\_value\_ops
202+
203+
In each of two GIN opclasses jsonb documents are decomposed into entries. Each
204+
entry is associated with particular value and it's path. Difference between
205+
opclasses is in the entry representation, comparison and usage for search
206+
optimization.
207+
208+
For example, jsonb document
209+
`{"a": [{"b": "xyz", "c": true}, 10], "d": {"e": [7, false]}}`
210+
would be decomposed into following entries:
211+
212+
* "a".#."b"."xyz"
213+
* "a".#."c".true
214+
* "a".#.10
215+
* "d"."e".#.7
216+
* "d"."e".#.false
217+
218+
Since JsQuery doesn't support search in particular array index, we consider
219+
all array elements to be equivalent. Thus, each array element is marked with
220+
same `#` sign in the path.
221+
222+
Major problem in the entries representation is its size. In the given example
223+
key "a" is presented three times. In the large branchy documents with long
224+
keys size of naive entries representation becomes unreasonable. Both opclasses
225+
address this issue but in a slightly different way.
226+
227+
### jsonb\_value\_path\_ops
228+
229+
jsonb\_value\_path\_ops represents entry as pair of path hash and value.
230+
Following pseudocode illustrates it.
231+
232+
(hash(path_item_1.path_item_2. ... .path_item_n); value)
233+
234+
In comparison of entries path hash is the higher part of entry and value is
235+
its lower part. This determines the features of this opclass. Since path
236+
is hashed and it is higher part of entry we need to know the full path to
237+
the value in order to use it for search. However, once path is specified
238+
we can use both exact and range searches very efficiently.
239+
240+
### jsonb\_path\_value\_ops
241+
242+
jsonb\_path\_value\_ops represents entry as pair of value and bloom filter
243+
of path.
244+
245+
(value; bloom(path_item_1) | bloom(path_item_2) | ... | bloom(path_item_n))
246+
247+
In comparison of entries value is the higher part of entry and bloom filter of
248+
path is its lower part. This determines the features of this opclass. Since
249+
value is the higher part of entry we can perform only exact value search
250+
efficiently. Range value search is possible as well but we would have to
251+
filter all the the different paths where matching values occur. Bloom filter
252+
over path items allows index usage for conditions containing `%` and `*` in
253+
their paths.
254+
255+
### Query optimization
256+
257+
JsQuery opclasses perform complex query optimization. Thus it's valuable for
258+
developer or administrator to see the result of such optimization.
259+
Unfortunately, opclasses aren't allowed to do any custom output to the
260+
EXPLAIN. That's why JsQuery provides following functions which allows to see
261+
how particular opclass optimizes given query.
262+
263+
* gin\_debug\_query\_value\_path(jsquery) – for jsonb\_value\_path\_ops
264+
* gin\_debug\_query\_path\_value(jsquery) – for jsonb\_path\_value\_ops
265+
266+
Result of these functions is a textual representation of query tree which
267+
leafs are GIN search entries. Following examples show different results of
268+
query optimization by different opclasses.
269+
270+
# SELECT gin_debug_query_path_value('x = 1 AND (*.y = 1 OR y = 2)');
271+
gin_debug_query_path_value
272+
----------------------------
273+
x = 1 , entry 0 +
274+
275+
# SELECT gin_debug_query_value_path('x = 1 AND (*.y = 1 OR y = 2)');
276+
gin_debug_query_value_path
277+
----------------------------
278+
AND +
279+
x = 1 , entry 0 +
280+
OR +
281+
*.y = 1 , entry 1 +
282+
y = 2 , entry 2 +
283+
284+
Unfortunately, jsonb have no statistics yet. That's why JsQuery optimizer has
285+
to do imperative decision while selecting conditions to be evaluated using
286+
index. This decision is made by assumtion that some condition types are less
287+
selective than others. Optimizer divides conditions into following selectivity
288+
class (listed by descending of selectivity).
289+
290+
1. Equality (x = c)
291+
2. Range (c1 < x < c2)
292+
3. Inequality (x > c)
293+
4. Is (x is type)
294+
5. Any (x = \*)
295+
296+
Optimizer evades index evaluation of less selective conditions when possible.
297+
For example, in the `x = 1 AND y > 0` query `x = 1` is assumed to be more
298+
selective than `y > 0`. That's why index isn't used for evaluation of `y > 0`.
299+
300+
# SELECT gin_debug_query_path_value('x = 1 AND y > 0');
301+
gin_debug_query_path_value
302+
----------------------------
303+
x = 1 , entry 0 +
304+
305+
With lack of statistics decisions made by optimizer can be inaccurate. That's
306+
why JsQuery supports hints. Comments `/*-- index */` and `/*-- noindex */`
307+
placed in the conditions forces optimizer to use and not use index
308+
correspondingly.
309+
310+
SELECT gin_debug_query_path_value('x = 1 AND y /*-- index */ > 0');
311+
gin_debug_query_path_value
312+
----------------------------
313+
AND +
314+
x = 1 , entry 0 +
315+
y > 0 , entry 1 +
316+
317+
SELECT gin_debug_query_path_value('x /*-- noindex */ = 1 AND y > 0');
318+
gin_debug_query_path_value
319+
----------------------------
320+
y > 0 , entry 0 +
321+
322+
Contribution
323+
------------
324+
325+
Please, notice, that JsQuery is still under development and while it's
326+
stable and tested, it may contains some bugs. Don't hesitate to raise
327+
[issues at github](https://github.com/akorotkov/jsquery/issues) with your
328+
bug reports.
329+
330+
If you're lacking of some functionality in JsQuery and feeling power to
331+
implement it then you're welcome to make pull requests.
332+

0 commit comments

Comments
0 (0)
Morty Proxy This is a proxified and sanitized view of the page, visit original site.