Python, Pandas and SQLite Cloud

Python, Pandas and SQLite Cloud

Daniele Briggi's photo
·

3 min read

Check out what we’re building and sign up for free, no credit card required.

Hello, my name is Daniele - I’m a senior software engineer here at SQLite Cloud. Recently, we decided it was time to improve our Python SDK, as a part of a broader initiative to prepare for GA.

As a Python developer, I was excited to tackle our driver implementation. At SQLite Cloud, it is very important to us that developers can seamlessly migrate from using a SQLite database to using SQLite Cloud. This means that I needed to write an interface that would be identical to the original SQLite API (based on PEP249). Then, a user can simply import our package instead of sqlite3, and replace their connection string. Everything else should just work.

Writing the Driver

The first step was to study SQLite’s public API. Every aspect of SQLite has been optimized for simplicity, including the API. This made identifying the core method signatures easy.

Next, I focused on implementing the connection method, adapting it to our client that communicates with SQLite Cloud via our SCSP protocol. Connection customization is achieved through the connection string, enabling features like non-linearizable operations and compression. One by one, I learned the expected behavior of each public method to match sqlite3, including handling failures.

SQLite Cloud and Support for Pandas

The Pandas library is an industry standard tool for data manipulation and analysis. For example, many users have mentioned they use SQLite to import and explore CSVs with Dataframes locally.

Because our implementation features the exact same interface as sqlite3, I hoped there would be minimal work required to get pandas working with our driver.

I tried running the following code with no modifications:

import io

import pandas as pd

import sqlitecloud

dfprices = pd.read_csv(
    io.StringIO(
        """DATE,CURRENCY,PRICE
    20230504,USD,201.23456
    20230503,USD,12.34567
    20230502,USD,23.45678
    20230501,USD,34.56789"""
    )
)

conn = sqlitecloud.connect("sqlitecloud://myhost.sqlite.cloud:8860/mydatabase.sqlite?apikey=myapikey")

conn.executemany("DROP TABLE IF EXISTS ?", [("PRICES",)])

# Write the dataframe to the SQLite Cloud database as a table PRICES
dfprices.to_sql("PRICES", conn, index=False)

# Create the dataframe from the table PRICES on the SQLite Cloud database
df_actual_prices = pd.read_sql("SELECT * FROM PRICES", conn)

# Inspect the dataframe
print(df_actual_prices.head())

# Perform a simple query on the dataframe
query_result = df_actual_prices.query("PRICE > 50.00")

print(query_result)

And to my great excitement, it just worked! And, thankfully, it surfaced some discrepancies between our implementation and sqlite3 - for example, I discovered that cursors should not disconnect the given connection. This approach guarantees that our driver mirrors the sqlite3 experience.

While we need to do further testing, this compatibility is a good indicator that the driver implementation may slot in well with many other SQLite-compatible APIs (ORMs, for example).

Key Takeaways

Our decision to replicate the sqlite3 interface has proven to be a success, allowing compatibility with other tools out-of-the-box. I learned quite a bit about sqlite3 and its APIs, which you can learn more about here.

The Python SDK, like many of our SDKs, is still under development. You can follow my progress on the Python SDK in this GitHub issue. Check out our repo for more information and examples. And, if you do check it out, please open an issue if you find any bugs - the whole SQLite Cloud team thanks you! 🙏

Until next time, Ciao!