Real-Time Full-Text Site Search with SQLite FTS5 extension

Real-Time Full-Text Site Search with SQLite FTS5 extension

At SQLite Cloud, we recently implemented a comprehensive documentation system based on markdown. To further enhance our documentation system, we wanted to add real-time search capabilities, and one of the obvious options was to use a third-party API service such as Algolia.

However, I came across an interesting article a few months ago that caught my attention. It described how Redis had implemented documentation search using a real-time full-text search module, which eventually became RediSearch:

SQLite comes equipped with FTS5, a remarkably potent full-text search module that we were eager to test on our internal documentation system. Our markdown documentation is publicly available on GitHub, allowing you to browse it at your convenience.

To begin, we created a documentation.sqlite database and defined a single FTS5 documentation table with the following declaration:

CREATE VIRTUAL TABLE documentation USING fts5 (url, content);

Afterward, we proceeded to develop a script that parses each markdown (MD) file and removes any tags present, resulting in a pure text file. Upon completion of the parsing procedure, each text file was added to the documentation table via the following code:

INSERT INTO documentation (url, content) VALUES (?1, ?2);

url is generated by starting at the full file path, while content is the text file itself.

Upon completion of the processing phase, the documentation table is filled with both the URL and content of every markdown file located in the docs subdirectory. As a result, the database size is roughly 438 KB.

Next, we created a new search endpoint within our GO backend and began distributing our solution through the same endpoint utilized by Redis, to ensure a fair comparison. Our container was deployed to instance groups located in the US-West, US-East, Zurich, and Mumbai zones on Google Cloud. A global load balancer was utilized to route traffic to these instances based on the geographical origin of the request and the current load within the instance group.

Since the Redis article did not mention the computational power of their nodes, we decided to play safe and we opted for the least expensive and least powerful nodes available (priced at approximately $6/month).

To gain a better understanding of the numbers involved, we created a search console module on the front end. The final result far surpassed our expectations and left us in awe:

Javascript sends an asynchronous request to our GO backend. The query time, which represents the time spent by the GO backend to send a request to an SQLite Cloud instance and receive a response, consistently remains below 6ms. The total time, which includes both query time and latency time (the duration spent traveling from your location to the nearest data center and back), consistently remains below 35ms.

To provide a practical example, I executed the plugin query from Italy. The Google load balancer selected the nearest data center, which happened to be located in Zurich, Switzerland (the physical distance between the two locations was also displayed). The exact query performed by the backend using the plugin search key was:

SELECT url, replace(snippet(documentation, -1, '<b>', '</b>', '...', 10), x'0A', ' ') as 'snippet' FROM documentation WHERE content MATCH 'plugin*' ORDER BY rank LIMIT 256;

Redis is widely regarded as one of the fastest database management systems available, primarily because it is an in-memory database. These findings demonstrate the remarkable speed and efficiency of both SQLite and SQLite Cloud, even with persistent data stored on a physical disk (it is worth noting that the test was conducted on the most affordable Google Cloud VM).

The final result looks like this:

And it can be tested on our official documentation site.


Further details on the FTS5 query syntax can be found on the official SQLite FTS5 page.