</entry>
<entry><type>real</type></entry>
<entry>
- Returns a number that indicates how similar the first string
- to the most similar word of the second string. The function searches in
- the second string a most similar word not a most similar substring. The
- range of the result is zero (indicating that the two strings are
- completely dissimilar) to one (indicating that the first string is
- identical to one of the words of the second string).
+ Returns a number that indicates the greatest similarity between
+ the set of trigrams in the first string and any continuous extent
+ of an ordered set of trigrams in the second string. For details, see
+ the explanation below.
</entry>
</row>
<row>
@@ -131,6 +129,34 @@
</tgroup>
</table>

+ <para>
+ Consider the following example:
+
+ <programlisting>
+ # SELECT word_similarity('word', 'two words');
+ word_similarity
+ -----------------
+ 0.8
+ (1 row)
+ </programlisting>
+
+ In the first string, the set of trigrams is
+ <literal>{"  w"," wo","ord","wor","rd "}</literal>.
+ In the second string, the ordered set of trigrams is
+ <literal>{"  t"," tw","two","wo ","  w"," wo","wor","ord","rds","ds "}</literal>.
+ The most similar extent of an ordered set of trigrams in the second string
+ is <literal>{"  w"," wo","wor","ord"}</literal>, and the similarity is
+ <literal>0.8</literal>: 4 shared trigrams out of 5 in the union of the two sets.
+ </para>
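Review note: the trigram sets and the 0.8 in the example above can be reproduced with a short Python sketch of the definition. It is not the pg_trgm C code. It assumes default pg_trgm tokenization (lowercased input, words taken as runs of letters and digits, each word padded with two leading spaces and one trailing space) and scores an extent the same way `similarity` scores two strings: shared trigrams divided by the size of the union. The helper names are hypothetical, and trying every extent is quadratic, so this is only meant for checking small examples.

```python
import re


def word_trigrams(word: str) -> list[str]:
    # Assumed pg_trgm-style padding: two spaces before the word, one after.
    padded = "  " + word + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def ordered_trigrams(text: str) -> list[str]:
    # Assumption: a "word" is a run of letters or digits, compared
    # case-insensitively; trigrams keep their order of appearance.
    result: list[str] = []
    for word in re.findall(r"[^\W_]+", text.lower()):
        result.extend(word_trigrams(word))
    return result


def word_similarity(first: str, second: str) -> float:
    target = set(ordered_trigrams(first))
    trigrams = ordered_trigrams(second)
    best = 0.0
    # Brute force: try every continuous extent [start, end] of the second
    # string's ordered trigrams and keep the best shared/union ratio.
    for start in range(len(trigrams)):
        extent: set[str] = set()
        for end in range(start, len(trigrams)):
            extent.add(trigrams[end])
            best = max(best, len(target & extent) / len(target | extent))
    return best


print(sorted(set(ordered_trigrams("word"))))  # ['  w', ' wo', 'ord', 'rd ', 'wor']
print(ordered_trigrams("two words"))
print(word_similarity("word", "two words"))   # 0.8
```

The second `print` lists the ten ordered trigrams of `two words` in the same order as the literal above; the best extent is the four trigrams starting at `"  w"`.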
+
+ <para>
+ This function returns a value that can be approximately understood as the
+ greatest similarity between the first string and any substring of the second
+ string. However, the extent is not padded, so the padded edge trigrams of the
+ first string match only at word boundaries in the second string. Thus, a
+ whole word match gets a higher score than a match with a part of the word.
+ </para>
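Review note: the boundary effect described in the paragraph above can be checked with the same kind of brute-force sketch (hypothetical helpers, default pg_trgm tokenization assumed). For `word`, the padded trigrams `"  w"`, `" wo"` and `"rd "` occur in the second string only where a word starts or ends, so a whole word keeps all five trigrams, the prefix of `words` loses `"rd "`, and the inside of `swords` keeps only `"wor"` and `"ord"`.

```python
import re


def ordered_trigrams(text: str) -> list[str]:
    # Assumed pg_trgm-style tokenization: lowercase, split into runs of
    # letters/digits, pad each word with two leading and one trailing space.
    result: list[str] = []
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        result.extend(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def word_similarity(first: str, second: str) -> float:
    # Best shared/union ratio between the first string's trigram set and any
    # continuous extent of the second string's ordered trigrams (brute force).
    target = set(ordered_trigrams(first))
    trigrams = ordered_trigrams(second)
    best = 0.0
    for start in range(len(trigrams)):
        extent: set[str] = set()
        for end in range(start, len(trigrams)):
            extent.add(trigrams[end])
            best = max(best, len(target & extent) / len(target | extent))
    return best


# "word" as a whole word, as a prefix, and inside a longer word.
for candidate in ["a word", "two words", "swords"]:
    print(candidate, word_similarity("word", candidate))
# a word 1.0
# two words 0.8
# swords 0.4
```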
+
<table id="pgtrgm-op-table">
<title><filename>pg_trgm</filename> Operators</title>
<tgroup cols="3">
@@ -156,10 +182,11 @@
<entry><type>text</type> <literal><%</literal> <type>text</type></entry>
<entry><type>boolean</type></entry>
<entry>
- Returns <literal>true</literal> if its first argument has the similar word in
- the second argument and they have a similarity that is greater than the
- current word similarity threshold set by
- <varname>pg_trgm.word_similarity_threshold</varname> parameter.
+ Returns <literal>true</literal> if the similarity between the trigram
+ set in the first argument and a continuous extent of an ordered trigram
+ set in the second argument is greater than the current word similarity
+ threshold set by the <varname>pg_trgm.word_similarity_threshold</varname>
+ parameter.
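Review note: a sketch of the boolean check described in this entry, built on a brute-force `word_similarity` in Python with default pg_trgm tokenization assumed. `word_similar_op` is a hypothetical name, and 0.6 is assumed to be the default of `pg_trgm.word_similarity_threshold`. The comparison is strictly greater, following the wording of the entry rather than the C source.

```python
import re

# Assumed value: the default of pg_trgm.word_similarity_threshold.
WORD_SIMILARITY_THRESHOLD = 0.6


def ordered_trigrams(text: str) -> list[str]:
    # Assumed tokenization: lowercase, runs of letters/digits, each word
    # padded with two leading spaces and one trailing space.
    result: list[str] = []
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        result.extend(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def word_similarity(first: str, second: str) -> float:
    # Brute-force best shared/union ratio over all continuous extents.
    target = set(ordered_trigrams(first))
    trigrams = ordered_trigrams(second)
    best = 0.0
    for start in range(len(trigrams)):
        extent: set[str] = set()
        for end in range(start, len(trigrams)):
            extent.add(trigrams[end])
            best = max(best, len(target & extent) / len(target | extent))
    return best


def word_similar_op(first: str, second: str,
                    threshold: float = WORD_SIMILARITY_THRESHOLD) -> bool:
    # Hypothetical stand-in for "first <% second": true when the word
    # similarity is strictly greater than the threshold.
    return word_similarity(first, second) > threshold


print(word_similar_op("word", "two words"))  # True: 0.8 > 0.6
print(word_similar_op("word", "swords"))     # False: 0.4 is below 0.6
```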
</entry>
</row>
<row>
@@ -302,10 +329,11 @@ SELECT t, word_similarity('<replaceable>word</replaceable>', t) AS sml
WHERE '<replaceable>word</replaceable>' <% t
ORDER BY sml DESC, t;
</programlisting>
- This will return all values in the text column that have a word
- which sufficiently similar to <replaceable>word</replaceable>, sorted from best
- match to worst. The index will be used to make this a fast operation
- even over very large data sets.
+ This will return all values in the text column for which there is a
+ continuous extent in the corresponding ordered trigram set that is
+ sufficiently similar to the trigram set of <replaceable>word</replaceable>,
+ sorted from best match to worst. The index will be used to make this
+ a fast operation even over very large data sets.
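Review note: the filter-and-sort behavior of the query above, modeled in Python over a hypothetical in-memory list of values for `t`. It uses a brute-force `word_similarity` sketch (default pg_trgm tokenization assumed) and a threshold of 0.6, assumed to be the default `pg_trgm.word_similarity_threshold`. It scans every row, which is exactly what the index is there to avoid, so it only illustrates which rows come back and in what order.

```python
import re

# Assumed value: the default of pg_trgm.word_similarity_threshold.
THRESHOLD = 0.6


def ordered_trigrams(text: str) -> list[str]:
    # Assumed tokenization: lowercase, runs of letters/digits, each word
    # padded with two leading spaces and one trailing space.
    result: list[str] = []
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        result.extend(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def word_similarity(first: str, second: str) -> float:
    # Brute-force best shared/union ratio over all continuous extents.
    target = set(ordered_trigrams(first))
    trigrams = ordered_trigrams(second)
    best = 0.0
    for start in range(len(trigrams)):
        extent: set[str] = set()
        for end in range(start, len(trigrams)):
            extent.add(trigrams[end])
            best = max(best, len(target & extent) / len(target | extent))
    return best


# Hypothetical contents of the text column t.
rows = ["two words", "swords", "a word", "unrelated"]

# WHERE 'word' <% t: keep rows whose word similarity exceeds the threshold.
scored = [(t, word_similarity("word", t)) for t in rows]
matches = [(t, sml) for t, sml in scored if sml > THRESHOLD]

# ORDER BY sml DESC, t: best match first, ties broken by the text itself.
matches.sort(key=lambda pair: (-pair[1], pair[0]))
print(matches)  # [('a word', 1.0), ('two words', 0.8)]
```

Python compares strings by code point, so the tie-breaking order on `t` can differ from the database collation.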
</para>

<para>