If you want to add simple text search functionality to an application backed by an SQL database one of the first things that may come to your mind is the SQL LIKE
operator. The LIKE operator and its case-insensitive sibling ILIKE
find substrings in text data via wildcards such as %
, which matches any sequence of zero or more characters:
SELECT * FROM book WHERE title ILIKE '%dog%'.
However, this approach satisfies only very basic requirements for text search, because it only matches exact substrings. That’s why application developers often use an external search engine like Elasticsearch based on the Apache Lucene library.
With a PostgreSQL database there is another option: it comes with a built-in full-text search. A full-text search analyzes text according to the language of the text, parses it into tokens and converts them into so-called lexemes. These are strings, just like tokens, but they have been normalized so that different forms of the same word, for example “pony” and “ponies”, are made alike. Additionally, stop words are eliminated, which are words that are so common that they are useless for searching, like “a” or “the”. For this purpose the search engine uses a dictionary of the target language.
In PostgreSQL, there are two main functions to perform full-text search: they are to_tsvector
and to_tsquery
. The ts part in the function names stands for “text search”. The to_tsvector
function breaks up the input string and creates a vector of lexemes out of it, which are then used to perform full-text search using the to_tsquery
function. The two functions can be combined with the @@
(match) operator, which applies a search query to a search vector:
SELECT title
FROM book
WHERE to_tsvector(title) @@ to_tsquery('(cat | dog) & pony')
The query syntax of ts_query
supports boolean operators like |
(or), &
(and), !
(not) and grouping using parentheses, but also other operators like and <->
(“followed by”) and *
(prefix matching).
You can specify the target language as a parameter of to_tsvector
:
# SELECT to_tsvector('english', 'Thousands of ponies were grazing on the prairie.');
'graze':5 'poni':3 'prairi':8 'thousand':1
Here’s another example in German:
# SELECT to_tsvector('german', 'Wer einen Fehler begeht, und ihn nicht korrigiert, begeht einen zweiten (Konfuzius)');
'begeht':4,9 'fehl':3 'konfuzius':12 'korrigiert':8 'wer':1 'zweit':11
PostgreSQL supports dictionaries for about 80+ languages out-of-the-box.
The examples in this article are just a small glimpse of what is possible with regards to full-text search in PostgreSQL. If you want to learn more you should consult the documentation. The key takeaway is that there is another option between simple LIKE clauses and an external search engine.