Asked 9 months ago · Answers: 5 · Viewed 10 times

I have a DB table with several fields:

word_id INTEGER PRIMARY KEY
word TEXT
...

...and ~150k rows.

Since this is a dictionary, I'm searching for a word with the mask 'search_string%' using LIKE. It used to work just fine, taking about 15 ms to find matching rows, and the table has an index on the 'word' field. Recently I modified the table (changing some fields that are outside the scope of this question) and something changed: the same query now takes 400 ms to execute, so I assume it is no longer using the index. A straightforward query with = instead of LIKE returns in 10 ms. Does anyone have an idea what's happening here?

 Answers

1

An index cannot, in general, be used safely for this kind of query. A naive implementation would transform this:

... WHERE word LIKE 'search_string%'

into

... WHERE word >= 'search_string' AND word < 'search_strinh'

by incrementing the last character of the search string. The range comparisons can use an index, whereas LIKE by itself cannot.

Unfortunately, that won't work in the general case. The LIKE operator is case-insensitive by default, which means that 'a' LIKE 'A' is true, while the comparison operators use the column's default (case-sensitive) collation. The above transformation would therefore return wrong results for any search string containing capital letters.
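You can see the mismatch for yourself with two plain SQLite expressions (hypothetical values, default settings assumed):

SELECT 'APPLE' LIKE 'app%';                    -- 1: LIKE is case-insensitive by default
SELECT 'APPLE' >= 'app' AND 'APPLE' < 'apq';   -- 0: BINARY comparison is case-sensitive

The rewritten range test misses a row that the original LIKE would have matched.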

In some cases, however, you know that case sensitivity is irrelevant for a particular column, and the above transformation is safe. In this case, you have two options.

  1. Use the NOCASE collating sequence on the column (or on the index) that covers this particular field.
  2. Change the behavior of the LIKE operator for the current database connection by running PRAGMA case_sensitive_like = ON;

Either of these options enables SQLite to do the above transformation for you transparently: you just keep using LIKE as always, and SQLite rewrites the underlying query to use the index.
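For example, here is a minimal sketch of both options, assuming the table is called words as in the question (the index names are made up); EXPLAIN QUERY PLAN lets you verify that the index is actually used:

-- Option 1: index the column with the NOCASE collation
CREATE INDEX words_word_nocase_idx ON words(word COLLATE NOCASE);

-- Option 2: keep a plain (BINARY) index, but make LIKE case-sensitive
-- for the current connection
PRAGMA case_sensitive_like = ON;
CREATE INDEX words_word_idx ON words(word);

-- Check that the prefix search is answered from the index
EXPLAIN QUERY PLAN
SELECT word_id, word FROM words WHERE word LIKE 'search_string%';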

You can read more about "The LIKE Optimization" on the SQLite Query Optimizer Overview page.

Monday, September 6, 2021
 
Pradip
 
1

FTS does not support LIKE

The previously accepted answer was incorrect. Full Text Search, with its full-text indexes, is not for the LIKE operator at all; it has its own operators and doesn't work for arbitrary strings. It operates on words, based on dictionaries and stemming. It does support prefix matching for words, but not with the LIKE operator:

  • Get partial match from GIN indexed TSVECTOR column
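For illustration only: FTS prefix matching uses its own tsquery syntax rather than LIKE, e.g.:

SELECT to_tsvector('english', 'hi there') @@ to_tsquery('english', 'hi:*');  -- true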

Trigram indexes for LIKE

Install the additional module pg_trgm, which provides operator classes for GIN and GiST trigram indexes. These support all LIKE and ILIKE patterns, not just left-anchored ones.
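The module ships with standard Postgres as a contrib extension, so enabling it is normally a one-time command per database:

CREATE EXTENSION IF NOT EXISTS pg_trgm;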

Example index:

CREATE INDEX tbl_col_gin_trgm_idx  ON tbl USING gin  (col gin_trgm_ops);

Or:

CREATE INDEX tbl_col_gist_trgm_idx ON tbl USING gist (col gist_trgm_ops);
  • Difference between GiST and GIN index

Example query:

SELECT * FROM tbl WHERE col LIKE '%foo%';   -- leading wildcard
SELECT * FROM tbl WHERE col ILIKE '%foo%';  -- works case insensitively as well
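To confirm that the trigram index is actually used (tbl and col are the placeholder names from above), check the query plan; with enough rows and a selective pattern you should see a bitmap index scan instead of a sequential scan:

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM tbl WHERE col LIKE '%foo%';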

Trigrams? What about shorter strings?

Indexed values with fewer than 3 letters still work. The manual:

Each word is considered to have two spaces prefixed and one space suffixed when determining the set of trigrams contained in the string.
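You can inspect this padding directly with pg_trgm's show_trgm() function:

SELECT show_trgm('hi');  -- three trigrams, including the space-padded ones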

And what about search patterns with fewer than 3 letters? The manual:

For both LIKE and regular-expression searches, keep in mind that a pattern with no extractable trigrams will degenerate to a full-index scan.

Meaning that index / bitmap index scans still work (query plans for prepared statements won't break); it just won't buy you better performance. That is typically no big loss, since 1- or 2-letter strings are hardly selective (more than a few percent of the underlying table matches), and index support would not improve performance anyway, because a full table scan is faster.


text_pattern_ops for prefix matching

For left-anchored patterns only (no leading wildcard), you get the optimum with a suitable operator class for a btree index: text_pattern_ops or varchar_pattern_ops. Both are built-in features of standard Postgres; no additional module is needed. Performance is similar, but the index is much smaller.

Example index:

CREATE INDEX tbl_col_text_pattern_ops_idx ON tbl(col text_pattern_ops);

Example query:

SELECT * FROM tbl WHERE col LIKE 'foo%';  -- no leading wildcard

Or, if you happen to be running your database with the 'C' locale (effectively no locale), everything is sorted according to byte order anyway, and a plain btree index with the default operator class does the job.
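If you are unsure which collation your database uses, you can check it with standard commands (a quick sketch):

SHOW lc_collate;
SELECT datname, datcollate FROM pg_database;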

More details, explanation, examples and links in these related answers on dba.SE:

  • Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
  • How is LIKE implemented?
  • Finding similar strings with PostgreSQL quickly
Tuesday, June 1, 2021
 
1

If your requirement is to find a particular z_id and the x_ids and y_ids linked to it (as distinct from quickly selecting a range of z_ids), you could look into a non-indexed, hash-table-based, nested-relational DB that lets you seek straight to a particular z_id in order to get its y_ids and x_ids, without the indexing overhead and the concomitant degraded insert performance as the index grows. To avoid clumping (a.k.a. bucket collisions), choose a key-hashing algorithm that puts the greatest weight on the digits of z_id with the greatest variation (right-weighted).

P.S. A database that uses a b-tree may at first appear faster than one that uses linear hashing, say, but insert performance with linear hashing stays level while b-tree performance begins to degrade as the index grows.

P.P.S. To answer @kawing-chiu's question: the core feature relevant here is that such a database relies on so-called "sparse" tables, in which the physical location of a record is determined by a hashing algorithm that takes the record key as input. This approach permits a seek directly to the record's location in the table without the intermediary of an index. As there is no need to traverse or re-balance indexes, insert times remain constant as the table becomes more densely populated. With a b-tree, by contrast, insert times degrade as the index tree grows. OLTP applications with large numbers of concurrent inserts can benefit from such a sparse-table approach.

The records are scattered throughout the table. The downside of records being scattered across the "tundra" of the sparse table is that gathering large sets of records that have a value in common, such as a postal code, can be slower. The hashed sparse-table approach is optimized to insert and retrieve individual records, and to retrieve networks of related records, not large sets of records that share some field value.

A nested relational database is one that permits tuples within a column of a row.

Wednesday, June 16, 2021
 
SubniC
 
2

In the expression 'hi' LIKE '%hi there%', it is not possible to find any characters to replace the % wildcards so that the strings would match.

You need to do the comparison the other way around, i.e., 'hi there' LIKE '%hi%':

db.rawQuery("SELECT shompet FROM sentence" +
            " WHERE ? LIKE '%' || " + column + " || '%'",
            new String[] { newMessage });
Monday, December 13, 2021
 
3

I would imagine that you have a non-covering index with leading column comparepnfwd that is used by the query with the literal value but not by the query with the variable.

You can use OPTION (RECOMPILE) to get SQL Server to recompile the plan taking into account the actual variable value.
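For example (table and variable names are placeholders; only comparepnfwd is taken from your query):

SELECT *
FROM dbo.YourTable
WHERE comparepnfwd LIKE @prefix + '%'
OPTION (RECOMPILE);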

Monday, December 27, 2021
 
Owen
 