Strong Consistency with Raft and SQLite

Strong Consistency with Raft and SQLite

In the recent past, a typical client/server environment consisted of multiple clients connected to a central database server. A central database architecture is a system where all data is stored in a single centralized location.

When client A from Singapore commits a piece of information into the central database, client B from New York can access the same up-to-date information without any issue because it reads from the same central database.

The central database architecture is consistent and up-to-date by default, as there is only one source of truth. So why don't use a central database architecture for everything?

A central database architecture represents a single point of failure: If the central database fails, the entire system may be affected, leading to downtime and data loss and as the amount of data grows, a central database may become overwhelmed, leading to performance issues and slower response times.

Modern architectures involve a distributed database system where data is stored across multiple decentralized locations. This approach leads to numerous advantages:

  1. Scalability: Distributed databases can be scaled horizontally by adding more nodes to the network, making handling larger amounts of data and higher volumes of requests easier.

  2. High availability: Because data is stored across multiple locations, if one node fails, the data is still available on other nodes.

  3. Localized access: With a distributed database, data can be stored closer to the users who need it, which can improve response times and reduce network latency.

  4. Lower cost: Distributed databases can be built using commodity hardware, which is less expensive than specialized hardware required for a central database.

A distributed database system is the right solution for modern requirements, but maintaining data consistency across multiple nodes can be challenging, especially when dealing with updates and modifications.

In the example above, we need to guarantee that if client A from Singapore commits a piece of information in the node located in Asia, then client B connected to the node located in New York can access that up-to-date information transparently and with the minimum amount of latency.

Consistency means a lot of things, and in database terminology, there are several shades of consistency. The eventual consistency model is easy to achieve, where data read from client B can be the most up-to-date information. There is no guarantee about it. This compromise can be good enough for some instances like the number of reads in a blog post or likes in a Facebook article. At one point (we don't know when), the change in the Singapore database will be propagated to all the nodes, and the information will be consistent across all the nodes in the cluster.

The above model would be unacceptable for other cases, especially the ones that need to use transactions. Think about a bank account balance, a flight reservation system, or a CMS or a real-time app. For these cases, a strong consistency model must be used. Strong consistency guarantees that the data must be strongly consistent at all times. All the server nodes across the world should contain the same value as an entity at any point in time.

SQLite Cloud guarantees strong consistency and uses the Raft consensus algorithm under the hood.

The Raft consensus algorithm is a distributed consensus algorithm designed to help manage replicated logs (database changes in our case) in a distributed system. The algorithm was introduced in 2014 by Diego Ongaro and John Ousterhout as an alternative to other, more complex, consensus algorithms. Raft works by ensuring that all nodes in a distributed system agree on the same log of commands or events. In doing so, it helps achieve strong consistency across the system.

How Does Raft Work?

The Raft consensus algorithm works by electing a leader node that manages the replicated log. The leader node receives commands from clients and replicates them across all nodes in the distributed system. Raft ensures that all nodes have the same log of commands by using a series of communication messages called "log entries." Each log entry contains a command or event that is appended to the log of each node.

Raft divides time into terms, where each term begins with a leader election. During each term, a single leader is elected, and that leader is responsible for managing the replicated log. If the leader fails or becomes unreachable, a new leader is elected, and the process starts over.

To ensure that all nodes have the same log of commands, Raft uses a mechanism called "log replication." During log replication, the leader sends log entries to all other nodes in the distributed system. Once a majority of nodes in the system have received and committed the log entry, it is considered committed and can be executed. This process ensures that all nodes in the system have the same log of commands and can maintain strong consistency.

How To Achieve Strong Consistency with Raft?

Raft achieves strong consistency by ensuring that all nodes in the distributed system have the same log of commands. To achieve this, Raft uses several mechanisms, including leader election, log replication, and commit mechanisms.

During leader election, Raft ensures that a single node is responsible for managing the replicated log. This ensures that all nodes in the system are working with the same log of commands.

During log replication, Raft ensures that all nodes in the system receive the same log of commands by using a majority vote. This ensures that any updates made to the log are replicated to a majority of nodes in the system, maintaining strong consistency (SQLite Cloud guarantees to prevent stale reads in any node).

Finally, Raft uses a commit mechanism to ensure that log entries are executed in the same order on all nodes in the system. This helps maintain strong consistency by ensuring that all nodes execute the same commands in the same order.

What to distribute with SQLite?

Thanks to Raft, we have a way to distribute a database change across all the cluster nodes. But what is the real information to distribute?

Some database vendors choose to distribute the raw SQL statement, so, if client A connected to Singapore performs the following statement:

INSERT INTO foo (col1, col2, col3) VALUES (1, 2, 3);

Then this same SQL statement as-is will be replicated across all the cluster nodes. This simple statement will work fine because it contains deterministic values like 1, 2, and 3 that cannot change during the execution on different nodes.

But what about if the executed SQL statement is something like:

INSERT INTO foo (col1, col2, col3) VALUES (RANDOM(), TIME(), DATE());

This SQL statement will produce different outputs on different nodes, breaking the consistency we need to guarantee. This problem is due to non-deterministic SQL functions like TIME, DATE, and RANDOM. If the time used by the information to travel from Singapore to New York is 100ms, then the TIME function will be 100ms higher when written to the New York node, which can have catastrophic consequences for the logic of our solution.

Some database engines warn users not to use non-deterministic SQL functions, introducing severe limitations in database usage. Other database engines implemented complex SQL parsers that replace non-deterministic SQL functions with deterministic values. This solution introduces a runtime delay because each SQL statement must be parsed and analyzed. More importantly, it increases the difficulty of staying up-to-date with the new versions of the underline SQL engine (SQLite in this specific case) because the custom parser must be updated each time a new SQL function/feature is introduced.

How to safely distribute a non-deterministic SQL statement?

With SQLite Cloud, we never distribute an SQL statement as-is, we only distribute the changeset derived from executing the SQL statement in one node.

In other words, instead of distributing this raw SQL statement:

INSERT INTO foo (col1, col2, col3) VALUES (RANDOM(), TIME(), DATE());

We distribute the result of the statement (plus some other metadata):

{
    "table": foo,
    "data": {
        "col1": 8875612685081594789,
        "col2": "10:57:34",
        "col3": "2023-03-21"
    }
}

By distributing a changeset we are immune to the side effects of non-deterministic SQL statements.

SQLite Cloud is now in beta and we understand that users feedback is critical to our success. As such, I would like to personally invite you to join our beta program and share any comments, observations, or feature requests that you may have with us.

As a beta tester, you have a unique opportunity to help shape the future of SQLite Cloud. Your feedback will help us to identify areas for improvement and to develop new features that meet your needs. We are committed to providing you with a high-quality cloud database service, and we believe that your input will be invaluable in achieving this goal.