Knowing your Nodes: How to Scale in SQLite Cloud

SQLite Cloud is powered by a custom implementation of the Raft distributed consensus algorithm. In our implementation, there are three types of nodes: leaders, followers, and learners. Knowing the difference between them is essential for effectively scaling your SQLite Cloud clusters.

Refresher: Raft and Strong Consistency

Raft is a consensus algorithm designed to manage a replicated log across multiple nodes. Raft systems consist of a single “leader” node that can serve both reads and writes, and “follower” nodes that can only serve read requests. When the leader is unavailable, an election process begins to promote a follower node to a leader node, ensuring high availability in the face of outages or system errors.

In SQLite Cloud, the replicated log managed by Raft is a list of all changes made to your database. It’s important to note that SQLite Cloud distributes these changes as diffsets - the resulting row-level changes - and not as SQL statements (similar to how logical replication works in a database like PostgreSQL). This helps ensure data integrity, since replaying non-deterministic SQL expressions across nodes could lead to inconsistent results.
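To see why replaying raw SQL across nodes is unsafe, consider a non-deterministic expression like random(). The sketch below uses Python’s built-in sqlite3 module with two in-memory databases standing in for two cluster nodes (the table and column names are illustrative, not part of SQLite Cloud):

```python
import sqlite3

# Two independent SQLite databases standing in for two cluster nodes.
node_a = sqlite3.connect(":memory:")
node_b = sqlite3.connect(":memory:")
for node in (node_a, node_b):
    node.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, token INTEGER)")

# Statement-based replication: replay the same non-deterministic SQL on each node.
sql = "INSERT INTO events (token) VALUES (random())"
node_a.execute(sql)
node_b.execute(sql)
a = node_a.execute("SELECT token FROM events").fetchone()[0]
b = node_b.execute("SELECT token FROM events").fetchone()[0]
print(a == b)  # almost certainly False: the two nodes have silently diverged

# Change-based replication: evaluate the statement once, then ship the
# resulting row values to every other node.
row = node_a.execute("SELECT id, token FROM events").fetchone()
node_b.execute("DELETE FROM events")
node_b.execute("INSERT INTO events (id, token) VALUES (?, ?)", row)
assert (node_a.execute("SELECT * FROM events").fetchall()
        == node_b.execute("SELECT * FROM events").fetchall())
```

Replaying the statement lets each node roll its own random value; shipping the diffset means every node applies the exact bytes the leader committed.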

SQLite Cloud uses the Raft algorithm to guarantee strong consistency - any read operation is guaranteed to return the most recent write to the database. Strong consistency comes with a latency cost: a write is only acknowledged once a majority of nodes have confirmed it, and a linearizable read must verify that the data it returns is up to date. SQLite Cloud has a highly optimized network layer, but light can only travel so fast.

For read-heavy workloads where strong consistency is not a hard requirement, you can disable linearizable reads on your cluster. This changes the behavior of your cluster - reads are no longer guaranteed to be strongly consistent, only eventually consistent. In exchange, clients can read from the closest node without waiting to confirm it is up to date, which can dramatically reduce the network latency of read requests.

Strong consistency is not always a hard requirement. For example, if your SQLite database stores a search index on documentation, eventual consistency is likely good enough. In that case, disabling linearizable reads on your nodes is a good way to get the absolute best performance for your end user.

Node Types in SQLite Cloud

SQLite Cloud utilizes three primary node types: Leader, Follower, and Learner nodes. Leader nodes implement the Raft leader role: they are responsible for all writes and updates to the database, and replicate those changes to the other nodes in the cluster. There can only be one leader node in a cluster at a time, which means you can’t scale write throughput by simply adding more leader nodes.

Follower nodes maintain consistency and provide fault tolerance. They can only serve read requests; if a client connected to a follower node sends a write request, it is forwarded to the leader automatically. Follower nodes participate in consensus and in the election process, so the more follower nodes you have, the more potential there is for latency during writes (when all voting nodes are synchronized) or when a new leader is elected.

We recommend an odd number of voting nodes so a majority vote can always occur if the leader is lost. With n voting nodes, the cluster can tolerate the failure of up to (n-1)/2 of them, so an even number of nodes adds no fault tolerance over a cluster with one fewer node: a 4-node cluster tolerates one failure, the same as a 3-node cluster. An even number of nodes also makes a split vote possible during leader election, where two candidates receive an equal number of votes, leading to stalemates and delays in achieving consensus. With an odd number of nodes, a split vote is far less likely, as a majority is always available.
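The majority arithmetic behind these recommendations is small enough to sketch directly (a general Raft property, not SQLite Cloud-specific code):

```python
def quorum(n: int) -> int:
    """Votes needed for a majority among n voting nodes."""
    return n // 2 + 1

def tolerable_failures(n: int) -> int:
    """How many voting nodes can fail while a majority remains."""
    return (n - 1) // 2

# An even node count adds no fault tolerance over one fewer node...
assert tolerable_failures(3) == tolerable_failures(4) == 1
assert tolerable_failures(5) == tolerable_failures(6) == 2

# ...but it does raise the quorum size, so writes wait on more nodes.
assert quorum(3) == 2 and quorum(4) == 3
```

This is why clusters are typically sized at 3 or 5 voting nodes: the fourth or sixth node costs write latency without buying extra failure tolerance.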

To minimize the effect of scaling reads on write performance, you can add learner nodes. Learner nodes are a type of follower node - the only difference is that they do not participate in the consensus process. Adding learner nodes is an effective way to scale read operations without increasing write latency.

While you can connect to specific nodes in SQLite Cloud, we recommend connecting to our multi-region load balancer. This ensures traffic is routed to the nearest node to your user to minimize latency.

Future work

There is no single solution for scaling write operations. While adding more clusters can help, synchronizing across clusters is not currently supported. Adding more resources to a leader node is also not a straightforward solution, as the leader role can move to another node at any time. To help address write-heavy workloads, we are building a local-first sync library to effectively manage large volumes of write transactions. As a user makes changes on their device, those changes are applied to the local client SQLite database first, providing a zero-latency experience. They are then seamlessly synced and merged with the cloud and across connected devices. This provides the best of both worlds, delivering both high availability and minimal latency.

For more detailed technical insights, refer to our previous blog posts on Raft and distributed SQLite, or our documentation on scaling in SQLite Cloud. You can sign up for a free account now to start exploring the SQLite Cloud beta.

Stay tuned for updates on our local-first approach and other exciting developments in SQLite Cloud.

Until next time!