SQLite Can Do That: 4 extensions to supercharge SQLite

SQLite Can Do That: 4 extensions to supercharge SQLite

In addition to its simplicity, reliability, and outstanding performance characteristics, SQLite also features a modular design. As a result, SQLite has a robust extension ecosystem, full of packages that can turn SQLite into just about any kind of data store you'd need.

In this blog post, we will spotlight some of our favorite extensions you can start using to introduce additional functionality to SQLite (and SQLite Cloud).

The SQLite package manager

Before we dig into our picks for the top SQLite extensions, let's look at the easiest way to install and manage extensions for SQLite. sqlpkg is an open source SQLite extension package manager. It offers a streamlined way to enhance the capabilities of your SQLite databases by allowing you to install, manage, and discover nearly 100 different extensions. Designed with simplicity in mind, it operates with commands similar to other popular package managers. Here's every command you need to get started.

How to install and use sqlpkg

# To install on Mac/Linux, run
curl -sS https://webi.sh/sqlpkg | sh

# or on Windows
curl.exe https://webi.ms/sqlpkg | powershell

# to search for extensions
sqlpkg search [extension-name]

# to install extensions
sqlpkg install [package-name]

# other helpful commands
sqlpkg list
sqlpkg update [package-name]
sqlpkg remove [package-name]

Without further ado, let's get on with our top picks.

Web requests with http

sqlite-http allows SQLite to make HTTP requests directly from SQL queries. This effectively transforms SQLite from a static database management system into a dynamic data retrieval and interaction tool.

How http works

The primary function introduced by http is http_get, which takes a URL as a parameter and returns the response body as a string directly in SQL queries. It sends a GET request to the specified URL and handles the response by embedding it in the result set of the SQL query.

How to use sqlite-http

To begin using the http extension in your SQLite environment, you first need to install it using sqlpkg. Here’s how you can install it:

# install the package
sqlite> sqlpkg install asg017/http
# load the package
sqlite> .load ./http0
# Send a get request
sqlite> SELECT http_get('https://api.example.com/data');

http use cases

  1. Automatically fetch and store JSON or XML data from third-party APIs directly into the database.

  2. Enhance existing database records by pulling additional information from the web. Pair with sqlite-html to extract and manipulate the incoming data.

  3. Execute real-time data analysis by fetching and processing data on the fly from various online sources within SQL queries.

  4. Schedule regular updates of database content from external sources without the need for additional scripting or programming outside of SQL.

The simplicity of integrating HTTP requests into SQL commands opens up some interesting possibilities for data interaction and integration, all directly within your database engine.

Bloom filters with bloom

The Bloom Filter extension introduces a probabilistic data structure to SQLite, enhancing its capabilities to handle membership queries efficiently, especially in scenarios involving massive datasets. Like with any bloom filter use case, this extension is best for environments where speed is the priority, and a certain degree of error in the results—specifically, false positives—is acceptable.

What is a Bloom filter?

Bloom filters are a probabilistic data structure used to test whether an element is a member of a set. They are highly efficient in terms of space and time, but they allow for a small probability of false positives (incorrectly reporting that an element is in the set when it is not). A Bloom filter starts with a bit array of m bits, all initially set to 0. This array is the core of the Bloom filter, where information about elements is stored.

The Bloom filter then uses k different hash functions, with each hash function mapping any item to one of the m positions in the bit array with a uniform random distribution.

To add an element to the Bloom filter, the element is passed through all 𝑘 hash functions to get k array positions. The bits at all these positions are set to 1. For example, if the hash functions map an element to positions 2, 5, and 7, then bits 2, 5, and 7 of the array are set to 1.

To check whether an element is in the set, the element is again passed through all 𝑘k hash functions to generate k positions in the array. If all bits at these positions are 1, the filter reports that the element might be in the set. If any of the bits at these positions is 0, the element is definitely not in the set.

For SQLite, integrating a Bloom filter via an extension allows it to perform membership checks much faster than a full table scan, which is useful for querying large datasets.

How to use the Bloom filter extension

# load the extension
sqlite> .load dist/bloom
# create a filter table
sqlite> create virtual table sports_cars using bloom_filter(20);
# insert data
sqlite> insert into sports_cars values ('ferrari'), ('lamborghini'), ('Alfa Romeo');
# query for existence
sqlite> select count(*) from sports_cars('ferrari');
1
sqlite> select count(*) from sports_cars('bmw');
0

Use cases for the Bloom filter extension

  1. In caching, quick checks for whether data is already stored or not are necessary. Bloom filters can significantly reduce the overhead from these checks.

  2. For applications dealing with large-scale network traffic, such as routers or firewalls, Bloom filters can quickly check if an IP address or URL has been seen before or if it matches known threat patterns.

  3. In large databases, quickly checking if a record already exists before insertion can prevent duplicates and maintain data integrity without the overhead of searching the entire dataset.

  4. In big data scenarios, where data volumes can be enormous, using a Bloom filter to pre-check the existence of items can significantly reduce the amount of data that needs to be processed.

Vector search with sqlite-vss

sqlite-vss (SQLite Vector Similarity Search) is a SQLite extension for executing vector searches in your SQLite database. It is based on FAISS, a library developed by Facebook's AI Research team for efficient similarity searches and clustering of dense vectors.

This extension provides a practical middle ground between simple, less scalable methods like storing embeddings in JSON files and more complex, server-based vector databases such as Pinecone or Qdrant.

With sqlite-vss, embeddings are stored directly in SQLite databases, and users can perform queries using SQL syntax, making it an attractive option for local-first applications. The extension supports operations such as K-nearest neighbor searches through simple SQL queries, providing solid foundation for applications requiring semantic search, recommendation systems, or any feature that relies on fast and efficient similarity searches on embeddings.

How sqlite-vss works

The primary role of sqlite-vss is to store vector embeddings within a SQLite database. These embeddings, which are high-dimensional vectors usually derived from complex data types like text or images using machine learning models, represent the essential features of the data. In sqlite-vss, embeddings are stored in a specialized format directly within the database, which allows them to be queried effectively.

Under the hood, sqlite-vss uses FAISS for indexing the vectors. FAISS employs various indexing methods, such as quantization, which helps in compressing the vectors and reducing their dimensionality, speeding up searches. sqlite-vss manages the indices, and allows users to perform nearest neighbor searches on them.

How to use sqlite-vss

# install the extensions
sqlpkg install asg017/vss
sqlpkg install asg017/vector
# load the extensions in the SQLite CLI
.load ./vector0
.load ./vss0
# create a virtual table
# Number of dimensions == 300
create virtual table vss_movies using vss0(
  movie_description(300),
);

# Insert vectors into vss0 tables as JSON or raw bytes.
insert into vss_movies(rowid, description_embeddings)
  select rowid, description_embedding from movies;

# Execute a nearest-neighbor query 
select rowid, distance
from vss_movies
where vss_search(
  description_embedding,
  (select description_embedding from movies where rowid = 1)
)
limit 100; # number of neighbors to retrieve

sqlite-vss use cases

  1. Power semantic search engines that require understanding the meaning behind queries rather than just matching keywords.

  2. Develop recommendation systems where the goal is to suggest items similar to those a user has liked in the past.

  3. Identify duplicate data by comparing its vector representations.

  4. Improve the functionality of chatbots and virtual assistants by enabling them to understand and respond to queries with more relevant information.

Native CSV capabilities with Xsv

The sqlite-xsv extension enables native parsing and storage of data in Comma Separated Values (CSV) format. This can simplify and accelerate data import and export tasks. It's extremely fast - one of the fastest ways to operate on CSV data.

How sqlite-xsv works

sqlite-xsv is written in Rust and based on sqlite-loadable, as well as the csv crate. It features some enhanced ergonomics over other csv extensions, including the ability to query multiple files and gzip'ed or zstd compressed files directly.

How to use sqlite-xsv

# install the extension
sqlpkg install asg017/xsv
# load the extension
.load ./xsv0
# Create a table over your csv file
create virtual table temp.students using csv(
  filename="cities.csv"
);
# Query the csv directly
select * from temp.cities;
/*
┌────┬───────┬─────┬─────────┐
│ id │ name  │ pop │ country │
├────┼───────┼─────┼─────────┤
│ 1  │ Parma │ 194 │ Italy   │
│ 2  │ Pisa  │ 90  │ Italy   │
│ 3  │ Rome  │ 2800│ Italy   │
└────┴───────┴─────┴─────────┘
*/

Use Cases for XSV

  1. Integrate CSV data into your database easily.

  2. Clean and prepare data directly on CSVs.

  3. Stop loading/unloading between CSVs and SQLite.

Wrapping up

While Postgres gets a lot of attention for its extensibility, SQLite has a growing extension ecosystem that is supported by the work of open source contributors. Big shout-out to Alex Garcia (author of the http, vss, and xsv extensions) and Shawn Wagner (author of the bloom extension) in particular. SQLite is better thanks to these contributors!

SQLite's ability to be extended with additional functionalities like HTTP requests, probabilistic filters, vector search, and native CSV support makes it a versatile tool for anyone who wants a simple, lightweight and fully-featured database engine.

SQLite Cloud is fully compatible with the SQLite extension ecosystem, which means you can use all of these extensions on a global scale. If you’d like to try for yourself, sign up for free here.

Until next time, thanks for reading.