Exploring SQL.js HTTP VFS: A Journey in Remote Database Querying
I've been experimenting with SQL.js HTTP VFS for a few months now. Essentially, this is a SQLite engine compiled into WebAssembly that can run against a remote database. The intriguing part is that it implements a virtual file system and supports range requests.
The Concept
Imagine you have a gigabyte-sized SQL database stored in the cloud. As long as your server supports range requests, this setup allows you to fetch data one kilobyte at a time. This is similar to how SQLite would read a local database from the file system if configured to page with file reads of 1024 bytes. The exciting aspect is that if you index and organize your data efficiently, you could theoretically perform the same requests remotely from a browser to a SQL database on a CDN. This means your backend server wouldn't need to run a database instance.
Initial Experiments
In August, I spent a couple of weekends writing a possible analytics clone using SQL.js HTTP VFS. It was an interesting project, but I didn't delve deeply into it. The implementation cleared a lot of data, almost as if downloading the entire database, so it didn't thoroughly test the concept. However, it served as a valuable proof of concept.
Rewriting and Optimization
Fast forward to December, I decided to rewrite the project. I started with an injected log parser, ending up with a 300 MB log file, which I parsed into a roughly 300 MB SQLite database. This provided a substantial database to experiment with. The size required me to optimize log parsing and inserts, reducing the time to parse over 1.3 million rows to about 57 seconds. This included parsing IP addresses, country lookups, URLs, and user agents—all done locally without web requests.
Querying Challenges
Running queries on this indexed database locally resulted in approximately 13-second queries to group by pathname and count requests for the top 25 pages. This wasn't ideal, especially since I only loaded eight hours of data. In a production environment, the dataset would be much larger. Adding a few indexes significantly improved performance, reducing query times to 100-300 milliseconds.
The Realization
As I was about to implement the WebAssembly module to run everything remotely, I had a realization. What does this setup truly offer? While it allows for scaling the number of reads infinitely, it doesn't make much sense unless you're dealing with massive datasets that can't be pre-computed. For my use case, with only eight SQL queries to run, pre-computing on the server was more efficient.
Conclusion
This project was an intriguing exploration of a virtual file system over HTTP. While it was fun and educational, it didn't present a compelling use case for my needs. Unless you're dealing with scenarios where pre-computing data is impractical, the benefits of this approach are limited. Nonetheless, it was a fascinating experiment that expanded my understanding of remote database querying.
Information
- Show
- FrequencyEvery two months
- Published26 December 2021 at 00:00 UTC
- Length8 min
- RatingClean