icql-dba v10.1.0
ICQL-DBA
Table of Contents generated with DocToc
- Introduction
- OIMDB Functionality
- Switching between File- and RAM-Based Modes (Mode Transfer)
- Continuous Persistency
- Eventual Persistency
- Ad Hoc Persistency
- Regular Acquisition
- Ad Hoc Acquisition
- Privileged / Special Schemas: Main and Temp
- Usage
- Notes on Import Formats
- API
- SQL Submodule
- ICQL-DBA Plugins
- Rave Reviews (albeit for the concept, not this software)
- Similar Projects
- To Do
Introduction
- ICQL-DBA is an SQLite Database Adapter with Optional In-Memory DB (OIMDB) functionality.
- Implemented using `better-sqlite3` (B3 below) to provide the interface between NodeJS (JavaScript) and SQLite.
- Because B3 is almost fully synchronous, ICQL-DBA operates almost completely synchronously, too.
- SQLite/B3 already provides In-Memory (IMDB) functionality. However, ICQL-DBA makes it easier to switch between In-Memory (RAM) and On-Disk operational modes, hence the O for Optional in OIMDB.
- Using ICQL-DBA, you could open a DB file, add some data which will readily be written to disk, then switch to RAM mode to perform some tedious data mangling, and then save the new DB state to the same file you opened the DB originally from.
OIMDB Functionality
- To process a DB file in RAM, ICQL-DBA first opens the file using an ad hoc schema name and then copies all DB objects (table, view, and index definitions as well as data) from that ad hoc schema into a RAM-based schema using the name supplied in the `open()` call.
- Note—SQLite 'temporary schemas' are mostly based in RAM but may use disk space in case available memory becomes insufficient. Schemas without disk-based backup also exist; ICQL-DBA users can elect to use either model (with the `disk: true|false` configuration), although IMO there's little reason not to use optional HD support.
- Note—Confusingly, to get a RAM-based DB with the original SQLite/B3 API, you either use the empty string `''` to get disk support (in case of RAM shortage) or the pseudo-path `':memory:'` to get one without disk support. In ICQL-DBA, you use the boolean settings `ram` and `disk` instead, which is much clearer. This frees the `path` argument from doing double duty, so one can use it to specify a default file path to be used implicitly for the `save()` command (see the sketch below).
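A minimal sketch of the three resulting modes (file paths and schema names are placeholders):

```coffee
DBA = require 'icql-dba'
dba = new DBA.Dba()

# Continuous Persistency: file-backed, every change goes straight to disk:
dba.open { path: 'path/to/my.db', schema: 'durable', }

# Eventual Persistency: copy another file DB into RAM, write back later with save():
dba.open { path: 'path/to/other.db', schema: 'fast', ram: true, }

# Pure RAM DB without disk backup (and without a default path for save()):
dba.open { schema: 'scratch', ram: true, disk: false, }
```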
Switching between File- and RAM-Based Modes (Mode Transfer)
- The `transfer()` API method may be used
  - to switch between Continuous and Eventual Persistency, and
  - to associate/dissociate a DB with/from a new file path.
- `transfer { schema, ram: true, [ path: 'path/to/file.db' ], }` switches from file-based Continuous Persistency to memory-based Eventual Persistency. This is a no-op in case the DB is already memory-based.
- Likewise, `transfer { schema, ram: false, [ path: 'path/to/file.db' ], }` switches a memory-based DB to Continuous Persistency. If the DB was originally opened from a file, the `path` setting is optional. If `path` is given, the DB will now be associated with that (possibly new) on-disk location (see the sketch below).
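For example (a sketch; it assumes `transfer()` is called as `dba.transfer { ... }` with the settings objects shown above, and that `myschema` was originally opened from a file):

```coffee
# Detach from the file and continue working in RAM (Eventual Persistency):
dba.transfer { schema: 'myschema', ram: true, }

# ... perform heavy data mangling in RAM ...

# Re-associate the schema with a (possibly new) file (Continuous Persistency):
dba.transfer { schema: 'myschema', ram: false, path: 'path/to/new_location.db', }
```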
Continuous Persistency
- When a DB is `open()`ed (from a file) with setting `ram: false`, every change to its structure or its business data will be immediately reflected to disk; this, of course, is the regular mode of operation for SQLite and almost all RDBMSes and is, hence, known as Continuous Persistency.
- If the referenced file is non-existent, it will be auto-created unless `create: false` has been specified (see the sketch below).
- Continuous Persistency always uses the SQLite binary file format.
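A sketch of both variants (paths are placeholders):

```coffee
# Open (and, if necessary, auto-create) a file-based DB; all changes go straight to disk:
dba.open { path: 'path/to/my.db', schema: 'myschema', }

# Same, but throw an error instead of auto-creating a missing file:
dba.open { path: 'path/to/existing.db', schema: 'other', create: false, }
```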
Eventual Persistency
- While file-based SQLite DBs are permanently persistent (i.e. each change is written to disk as soon and as safely as possible to make the DB resistant against unexpected interruptions), ICQL-DBA's OIMDB mode is 'eventually persistent' for all states of the DB arrived at right after a `save()` command has completed and before any new changes have been executed.
- Eventual Persistency to disk is implemented with synchronous calls to `vacuum $schema into $path` (no schema-copying is involved in this step). The API method to do so is `save()`, a method that does nothing in case a given schema is disk-based (and therefore writes all changes to disk, continuously); therefore, one can make it so that the same code with strategically placed `save()` statements works for both RAM-based and disk-based DBs without any further changes.
- The RAM DB will be much faster than the disk-based one, but of course the disk-based one will be better safeguarded against data loss from unexpected interruptions.
- Eventual Persistency always uses the SQLite binary file format.
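A minimal sketch (schema, table, and path are placeholders); the same code also works unchanged for a disk-based schema, where `save()` degenerates to a no-op:

```coffee
dba.open { path: 'path/to/my.db', schema: 'myschema', ram: true, }
dba.execute "create table if not exists myschema.t ( n integer );"
dba.execute "insert into myschema.t values ( 1 ), ( 2 ), ( 3 );"
# Persist the current state of the RAM DB back to path/to/my.db:
dba.save { schema: 'myschema', }
```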
Ad Hoc Persistency
- There's also a way to do 'Ad Hoc' Persistency using the `export()` API. The `export()` method allows one to write, for example, an SQL dump or the SQLite binary format to a given file (see the sketch below).
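For example (a sketch; the `'sqlite'` format and the `overwrite` setting appear in the Save DB section below, while the `'sql'` format name for export is an assumption):

```coffee
# Write a binary snapshot of the schema to a file:
dba.export { schema: 'myschema', path: 'path/to/snapshot.db', format: 'sqlite', overwrite: true, }

# Presumably an SQL dump works analogously (format name assumed):
dba.export { schema: 'myschema', path: 'path/to/dump.sql', format: 'sql', overwrite: true, }
```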
Regular Acquisition
- 'Regular Acquisition' is a fancy way to describe what the `open()` method does.
Ad Hoc Acquisition
- The counterpart to `export()` (Ad Hoc Persistency) is `import()` (Ad Hoc Acquisition).
- Some file formats (such as `sql`) may be valid when empty and result in an empty DB.
|                      | Acquisition | Persistency |
|----------------------|-------------|-------------|
| Regular and Eventual | `open()`    | `save()`¹   |
| Ad Hoc               | `import()`  | `export()`  |

¹ `save()` calls are optional no-ops for Continuous Persistency.
Privileged / Special Schemas: Main and Temp
- SQLite has two special schemas, `main` and `temp`.
- "The schema-names 'main' and 'temp' refer to the main database and the database used for temporary tables. The main and temp databases cannot be attached or detached."—SQLite Documentation
- When you create a `better-sqlite3` object, data will be put into the `main` schema.
- It is possible to circumvent the `main` schema in SQLite/B3; to do so, call `new Database ':memory:'`, which will create an empty `main` schema; then, you can execute an SQL `attach` statement like `attach $path as $schema;` to open a file- or RAM-based DB under another schema of your choosing.
- This procedure is not very straightforward; compare this to how you would use `open()` in ICQL-DBA, as sketched below.
- When you create a temporary table as in `create temp table t;` or `create temporary table t;`, then `t` will be available as `temp.t` (but not `temporary.t`).
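A sketch of the two approaches side by side (the file path is a placeholder):

```coffee
# Raw better-sqlite3: start with an empty main schema, then attach the file manually:
Database = require 'better-sqlite3'
sqlt     = new Database ':memory:'
sqlt.exec "attach 'path/to/my.db' as myschema;"

# ICQL-DBA: a single call achieves the same:
DBA = require 'icql-dba'
dba = new DBA.Dba()
dba.open { path: 'path/to/my.db', schema: 'myschema', }
```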
Usage
Create DBA Object
```coffee
DBA = require 'icql-dba'
dba = new DBA.Dba()
```

- `dba` is constructed with an empty `path` argument to B3, i.e. `require('better-sqlite3')('')`.
- This means the underlying B3 instance is connected to an empty temporary database under the `main` schema.
- `dba` is not yet connected to any file-based DB; the only meaningful next operation is `open()`.
- In the future, configuration may be added to create `dba` and open an existing DB in a single step.
Create DB with open()
New or Existing File-Based DB with Continuous Persistency
- `path` must be a string that is a valid file system path (up to the parent directory); its final component must either point to an existing SQLite DB file or be non-existent. (Write permissions are required in case modifications are intended.)
- In case the location indicated by `path` does not exist, a new SQLite DB file will be created. To prevent auto-creation, specify `create: false`, which will cause an error to be thrown.

```coffee
dba.open { path: 'path/to/my.db', schema: 'myschema', }
```

RAM DB with Eventual Persistency
- It is possible to open a file DB and transfer it to RAM by `open()`ing an SQLite file with `ram: true`. This will copy the DB's structure and its data to RAM.
- Observe that when a path is given, `ram` defaults to `false`; when no path is given (see below), `ram` defaults to `true`.
- Use `disk: false` to keep SQLite from using temporary files in case of RAM shortage (but why should you).
- The path given will be used when `save()` is called later.
- It is not allowed to call `save()` with `path` when the DB was opened with `path`.

```coffee
dba.open { path: 'path/to/my.db', schema: 'myschema', ram: true, }
```

New RAM DB without Eventual Persistency
- To `open()` a RAM DB that has no inherent link to a file, omit the `path` setting (or set it to `null`).
- To obtain a RAM DB from an existing file DB but without writing changes back to that file, use `import()`.
- Observe that when `path` is missing, `ram` defaults to `true`, so in this case it may be omitted or set to `null`.

```coffee
dba.open { schema: 'myschema', }
```

Import a DB
- Observe that unlike most of the ICQL-DBA API, `dba.import()` is asynchronous. This is mostly due to the relative scarcity of synchronous parsing (and, generally, file-handling) packages for the NodeJS ecosystem.
- Supported formats include `sqlite` for the SQLite binary file format and `sql` for SQL dumps.
- Unlike `open()`, `import()` accepts SQL dumps (and, in the future, possibly other formats).
- Use `format: 'sqlite'` or `format: 'sql'` to explicitly specify the intended file format in case the file extension does not match:
  - `.dump`, `.sql` are recognized as `format: 'sql'` (not yet implemented)
  - `.db`, `.sqlite` are recognized as `format: 'sqlite'`
- In the future, `import()` may accept additional arguments:
  - `save_as` to specify a path for Continuous or Eventual Persistency
  - `ram` will default to `false` when `save_as` is not given and to `true` otherwise
  - `overwrite` to specify whether an existing file at the position indicated by `save_as` should be overwritten.

```coffee
dba = new Dba()
schema = 'myschema'
dba.open { schema, }
await dba.import { path: 'path/to/some.db', schema, }
```

Notes on `import { format: 'sql', }`
- A valid SQL script may contain arbitrary SQL statements, not only those statements that are output into an SQL dump and are strictly required to re-create a given DB instance's state.
- A valid SQLite SQL script may also contain dot-commands that would be interpreted by the SQLite shell and are not part of SQL proper.
- Even if all dot-commands are discarded or declared errors, there's still `attach 'path/to/my.db' as myschema`, so an SQL script may define structures and data in multiple schemas that may also reference each other.
- One way to deal with this is to make not only all dot-commands illegal (they don't work with `dba.execute()` anyhow, so no change is required), but also demand that valid SQL scripts either
  - do not reference any schema except `main`, explicitly or implicitly, or
  - where a schema other than `main` is intended, include an explicit configuration setting like `from_schema`, and only that schema will be imported.
- In any event, `import()`ing SQL scripts/dumps will include:
  - setting up a temporary `dba` (or B3) instance,
  - in case batch mode is used (for big files), crude lexing of the SQL so we can delineate and group statements (already implemented),
  - `vacuum`ing of the temporary DB to an SQLite binary DB file, and
  - `open()`ing that file from the original DBA instance.
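For example, importing an SQL dump into a fresh RAM schema (a sketch; the explicit `format` setting is used since extension-based recognition of `.sql` is listed above as not yet implemented):

```coffee
dba    = new Dba()
schema = 'myschema'
dba.open { schema, }
await dba.import { path: 'path/to/dump.sql', schema, format: 'sql', }
```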
Transfer DB
Transfer File-Based DB to RAM
- `transfer_to_ram()` allows one to convert a file DB to a RAM DB.
- `schema` is a required setting; it is the name of the file-based DB which will become the name of the RAM DB.
- It will throw an error if the schema given is already a RAM DB.
- For the duration of RAM-based operation, the connection to the file is terminated; therefore, Continuous Persistency is not available.
- The user is responsible for either calling `save()` at appropriate points in time or else calling `transfer_to_file()` once RAM-based operation should be terminated and the results saved.

```coffee
dba.transfer_to_ram { schema: 'myschema', }
```

Transfer RAM DB to file
- `transfer_to_file()` allows one to convert a RAM DB to a file DB.
- It will (1) either copy the old DB file to a new location or else delete it, depending on configuration (### TAINT which configuration?), then (2) call `save_as()` with the original path.
- When called without a `path` setting, the schema's associated path will be used.
- When the schema points to a file DB and the path is not given or resolves to the associated path, `transfer_to_file()` is a no-op.
- The path given becomes the associated path of the DB; this works for both file and RAM DBs. Also see `export()`.

```coffee
dba.transfer_to_file { schema: 'myschema', path, }
```

Save DB
Use save() to Save to Linked File
- File-based DBs have Continuous Persistency, so there is no need to call `save()` (but no harm done, either).
- RAM DBs must be `save()`d manually in order to persist changes in structure or data.
- `save()` throws an error if a `path` setting is given.
- `save()` throws an error if the `schema` setting isn't given or `schema` is unknown.
- Use `transfer_to_file { path, }` (and `save()` after subsequent changes) to add or change the file path linked to a RAM DB.
- One can also use `export { path, overwrite: true, format: 'sqlite', }` to repeatedly save a RAM DB as an SQLite binary file DB.

```coffee
dba.save { schema: 'myschema', }
```

- The above call is roughly equivalent to calling `dba.export()` with a few additional parameters:

```coffee
schema = 'myschema'
path   = dba._schemas[ schema ].path
dba.export { schema, path, format: 'sqlite', overwrite: true, }
```

- The choice between `save()` and `export()` is rather intensional (conceptual) than extensional (material):
  - one calls `save { schema, }` to 'persist the state of a schema to its associated DB file', whereas
  - one calls `export { schema, path, }` to 'make a durable copy of this RAM DB'.
Exporting to Binary and Textual Formats
- The path given will not become the associated path of the DB; this is different from `transfer_to_file()`.

```coffee
dba.export { schema: 'myschema', path, format, overwrite, }
```

Notes on Import Formats
CSV
- Configuration:
  - `transform`: optional `function`, default: `null`
  - `_extra`: optional `object`, default: `null`. This value will be passed to `csv-parser`, which does the hard part of parsing CSV, so you can use `await dba.import { ..., format: 'csv', _extra: { ... }, ... }` to directly talk to `csv-parser`. Notice however that some settings may be overridden without notice by `dba.import()`. For a description of options, see `csv-parser`.
  - You can skip incomplete lines when they have empty fields, which are expressed as `null` values, either when all fields are `null` or when any field is `null`. Observe that this is only tested against the columns that were selected with `input_columns` (where set explicitly):
    - `skip_any_null`: optional `boolean`, default: `false`
    - `skip_all_null`: optional `boolean`, default: `false`
  - `trim`: optional `boolean`, default: `true`. Whether to remove leading and trailing whitespace from field values.
  - `default_value`: optional; can be any value, default: `null`. This value will be applied to all fields that are found to be (the) empty (string) (after optional trimming). Observe that quoting a field value will not prevent trimming.
  - `quote`: optional
  - `input_columns`: optional `boolean` or nonempty list of nonempty texts, default: `null`
    - `true`: first non-skipped row of source contains column names; rows are objects
    - `false`: rows are lists
    - list of `n` names: only the first `n` columns will be kept; rows are objects
  - `table_name`: optional nonempty text, defaults to `main`
  - `table_columns`:
    - `null`: columns are created as `text`s depending on the first row encountered; if it is a list, columns will be named `c1`, `c2`, `c3`, ..., `c${n}`
    - `{ name: type, name: type, ..., }`: columns are created with the `name`s and `type`s given
    - `[ name, name, ..., ]`: all columns are created as `text`
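A sketch of a CSV import using some of the settings listed above (file name, table name, and column types are hypothetical):

```coffee
await dba.import {
  path:          'path/to/people.csv',
  schema:        'myschema',
  format:        'csv',
  table_name:    'people',
  input_columns: true,                               # first row contains column names
  table_columns: { name: 'text', age: 'integer', },
  skip_all_null: true, }                             # drop rows where all selected fields are empty
```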
API
TBD
- `dba.do_unsafe: ( f ) ->`—given a synchronous function `f`, set `unsafeMode` to `true`, call `f()`, then set `unsafeMode` back to `false`. Used judiciously, this allows one e.g. to update rows in a table while iterating over a result set. To ensure proper functioning with predictable results and to avoid endless loops (caused by new rows being added to the result set), it is suggested to add a field `lck boolean not null default false` (for 'locked') to tables for which concurrent updates are planned. Set `lck` of all or a subset of rows to `true` and add `where lck` to your `select` statement; any inserted rows will then have the default `lck = false` value and be cleanly separated from the result set (see the sketch below).
- `dba.do_unsafe_async: ( f ) ->`—Same as `dba.do_unsafe()` but for async functions.
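A sketch of the `lck` pattern just described (table and column names are hypothetical; `dba.query()` is assumed to return an iterable of row objects):

```coffee
# Mark the rows to be processed in this pass:
dba.execute "update tasks set lck = true where status = 'pending';"
dba.do_unsafe ->
  for row from dba.query "select id from tasks where lck;"
    # Rows inserted while iterating get the default lck = false,
    # so they stay cleanly separated from the current result set:
    dba.execute "insert into tasks ( status ) values ( 'follow-up' );"
    dba.execute "update tasks set status = 'done', lck = false where id = #{row.id};"
  return null
```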
User-Defined Functions
User-Defined Functions (UDFs) are one feature that sets SQLite apart from other RDBMSes because, unlike other databases, SQLite allows users to define functions in user code, on the connection. Therefore, NodeJS users can define and use UDFs that are written in JavaScript or WASM and that can access the current machine's environment (e.g. the file system). On the one hand, this is probably somewhat slower than e.g. using a compiled PostgreSQL extension written in C; on the other hand, userspace functions are orders of magnitude easier to write than a Postgres C extension. Also, such functions can take advantage of the existing NodeJS ecosystem, which is a huge plus, and any speed penalty incurred by using JavaScript for 'hot' UDFs might be offset by a re-implementation in, say, Rust.
One downside—or, shall we say, "characteristic aspect"—of defining UDFs on the client side (as opposed to writing them embedded in SQL) is that your DB or at least some aspects of it may become unusable without suitable initialization of the connection. It is to be expected, though, that in a complex application some parts are not bound to function properly without other parts being in place—the application code as such won't work when the database of a DB-based app is missing, and the DB may not fully work without the application code. (This, by the way, is exactly true for Fossil SCM-based repositories, which might be regarded as the poster child of an application built around an SQLite database.)
- `dba.create_function: ( cfg ) ->` single-valued functions
- `dba.create_aggregate_function: ( cfg ) ->` aggregate functions
- `dba.create_window_function: ( cfg ) ->` window functions
- `dba.create_table_function: ( cfg ) ->` table-valued functions
- `dba.create_virtual_table: ( cfg ) ->` virtual tables
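The exact shape of `cfg` is not documented here; as a minimal fallback, a plain JavaScript UDF can also be registered directly on the underlying B3 connection (`dba.sqlt`, see Connection Initialization below), e.g.:

```coffee
# Register a single-valued UDF on the underlying better-sqlite3 connection:
dba.sqlt.function 'slug', ( text ) -> text.toLowerCase().replace /\s+/g, '-'
# The function is now usable from SQL on this connection:
console.log dba.sqlt.prepare( "select slug( 'Hello World' ) as x;" ).get().x   # 'hello-world'
```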
Connection Initialization
Right after a connection to an SQLite DB has been instantiated, an initialization method `@initialize_sqlt connection` is called. By overriding this method in a derived class, one can configure the connection, e.g. by calling better-sqlite3's `pragma()` method. When doing so, observe that the call happens before the `dba.sqlt` attribute is set, so avoid accessing the `dba` (`this`/`@`) instance. You're on the safe side if you restrict yourself to accessing the first argument of `initialize_sqlt()`. The default implementation of the method looks like this:
```coffee
initialize_sqlt: ( sqlt ) ->
  sqlt.pragma "foreign_keys = true;"
  return null
```

In your own implementation, do not forget to call `super sqlt` to get the default configuration for the connection.
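A sketch of such an override (the extra pragma shown is just an example):

```coffee
DBA = require 'icql-dba'

class MyDba extends DBA.Dba
  initialize_sqlt: ( sqlt ) ->
    super sqlt                            # keep the default configuration
    sqlt.pragma "journal_mode = wal;"     # example of an additional per-connection setting
    return null

dba = new MyDba()
```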
SQL Submodule
```coffee
dba = new Dba()
{ SQL, I, L, X, } = dba.sql
table = 'mytable'
value = 'something'
sql = SQL"select * from #{I table} where x == #{L value};"
# == select * from "mytable" where x == 'something';
```

- `SQL`: currently a no-op, but can be used (e.g. with coffeeplus) to signal the language used in the string literal for text highlighting
- `I`: format a text as an SQL identifier (using double quotes)
- `L`: format a value as an SQL literal
- `X`: format a flat list as an SQL row value (a.k.a. a vector)
ICQL-DBA Plugins
Rave Reviews (albeit for the concept, not this software)
For the concept of using in-memory SQLite DBs (not specifically ICQL-DBA, which probably nobody uses):
We use SQLite in-memory databases for executing 100% of our business logic these days. Letting the business write all the rules in SQL is the biggest win of my career so far.
Also, if you think SQLite might be too constrained for your business case, you can expose any arbitrary application function to it. E.g.:
https://docs.microsoft.com/en-us/dotnet/standard/data/sqlite/user-defined-functions
The very first thing we did was pipe DateTime into SQLite as a UDF. Imagine instantly having the full power of .NET6 available from inside SQLite.
Note that these functions do NOT necessarily have to avoid side effects either. You can use a procedural DSL via SELECT statements that invokes any arbitrary business method with whatever parameters from the domain data.
The process is so simple I am actually disappointed that we didn't think of it sooner. You just put a template database in memory w/ the schema pre-loaded, then make a copy of this each time you want to map domain state for SQL execution.
You can do conditionals, strings, arrays of strings, arrays of CSVs, etc. Any shape of thing you need to figure out a conditional or dynamic presentation of business facts.
Oh and you can also use views to build arbitrary layers of abstraction so the business can focus on their relevant pieces.—https://news.ycombinator.com/item?id=27568537
Similar Projects
- Datasette
To Do
- CSV import
- implement importing to an existing schema; this will simplify import options (no need to make
ram: trueetc. configurable) - implement (async) streaming with SteamPipes transforms
- implement batching (?)
- implement passing options to CSV parser
- allow to specify column names, types for targeted table
- clarify whether `skip_first` means to skip the (physical) first line of the imported file or the first line that is not skipped because it was blank or empty
- ensure that all empty fields contain
nullinstead of an empty string - implement skipping comments
- implement importing to an existing schema; this will simplify import options (no need to make
- TSV import (differs only in configuration (
delimiter,quotes) from CSV) - Consider to use B3
serialize()forexport { format: 'sqlite', } - Allow to
open()RAM DB without path - Re-implement Hollerith codec for
int32only, making it faster and smaller; add documentation along the lines of"DBA: VNRs"(in hengist dev) how to efficiently sort VNRs - we (temporarily?) accept a
pathargument innew Dba { path, }; this is used to attach themainschema. Alternatively, and to keep the API mosre consistent(?), we could remove that argument and stipulate that the firstdba.open()call implicitly creates thedba.sqltobject; if theschemagiven in that call is notmain, thenmainwill be a RAM DB. discuss project focus and non-goals
  - while ICQL-DBA may gain some support for generated SQL, building kind-of-an-ORM is not one of its goals. Cf. Datasette, which allows constructs à la `for row in db["dogs"].rows_where(select='name, age'): ...`; this already shows one of the general disadvantages of ORMs, namely, that one has to suddenly re-create parts of SQL in a more awkward way. Instead of `select name, age from dogs where age > 1 order by age desc;` you now have to write `db[ 'dogs' ].rows_where( 'age > 1', select = 'name, age', order_by = 'age desc' )`, which is considerably longer, more convoluted, and has an appreciably larger API surface than `dba.query()`. Observe that all of the arguments are really SQL fragments, so in reality you still have to write SQL.
  - Worse, now the equivalent to that one SQL string `"select name, age from dogs where age > 1 order by age desc;"` has been diluted into four micro strings: `'dogs'`, `'age > 1'`, `'name, age'`, and `'age desc'`, plus three Python identifiers: `rows_where`, `select`, and `order_by`. Worse again, you still don't get column and table name parametrization (you can replace `'name, age'` with an interpolated string and variables, but you'll have to provide proper escaping (quotes, spaces, capitalization) and concatenation (commas) yourself).
- re/define APIs for
  - single-valued functions: `dba.create_function()` (`dba.function()`)
  - aggregate functions: `dba.create_aggregate_function()` (`dba.aggregate()`)
  - window functions: `dba.create_window_function()`
  - table-valued functions: `dba.create_table_function()`
  - virtual tables: `dba.create_virtual_table()`
- `dba.interpolate()`: add simple facilities to construct basic SQL clauses and statements such as inserts, `values` clauses &c. Syntax could use dollar, format, colon, name (`$X:name`) for named insertions and question mark, format, colon (`?X:`) for positional insertions. These would have to be processed before `dba.prepare sql` is called. The format parameter is optional and defaults to `I` for 'identifier', as constructing statements with parametrized table and column names is the expected primary use case for interpolation. Other values for format are `L` (for 'literal') and `V` (for 'values', i.e. round brackets around a comma-delimited list of literals). Examples:

```coffee
dba.query "select $:col_a, $:col_b where $:col_b in $V:choices", \
  { col_a: 'foo', col_b: 'bar', choices: [ 1, 2, 3, ], }
dba.query "select ?:, ?: where ?: in ?V:", \
  [ 'foo', 'bar', 'bar', [ 1, 2, 3, ], ]
```

- enable 'concurrent UDFs' (user-defined functions that execute SQL statements)
  - From v7.1.0 on, ICQL-DBA uses a recent amalgamation from https://sqlite.com/download.html with `SQLITE_USE_URI` set to `1`, so concurrent UDFs are possible.
- make it so that RAM DBs may be opened with a `name` parameter in `cfg` that is then used to build a file URL like `file:#{name}?mode=memory&cache=shared`, where `name` becomes the identifier for shared memory across all `better-sqlite3` instances running in the same process.
  - where a RAM DB is opened without a `name` parameter in `cfg`, assign a randomly chosen name like `rnd_4f333589dc3ae799`; this can be recovered from a `dba` instance so that a second connection can be instantiated.
  - to make results more predictable, deprecate use of `path`s like `''` and `':memory:'`.
- implement `dba.initialize_sqlt()`
- remove dependency on `hollerith-codec`, replace with simpler, faster implementation, publish as `icql-dba-vnr`.
- consider to replace `tempy` with a leaner module:
```
npx howfat -r table icql-dba

icql-dba@7.2.0 (63 deps, 14.36mb, 687 files)
╭───────────────────────┬──────────────┬──────────┬───────╮
│ Name                  │ Dependencies │ Size     │ Files │
├───────────────────────┼──────────────┼──────────┼───────┤
│ hollerith-codec@3.0.1 │ 3            │ 967.28kb │ 91    │
├───────────────────────┼──────────────┼──────────┼───────┤
│ tempy@1.0.1           │ 50           │ 919.3kb  │ 396   │
├───────────────────────┼──────────────┼──────────┼───────┤
│ intertype@7.6.7       │ 1            │ 509.95kb │ 41    │
├───────────────────────┼──────────────┼──────────┼───────┤
│ cnd@9.2.2             │ 1            │ 270.78kb │ 38    │
├───────────────────────┼──────────────┼──────────┼───────┤
│ mysql-tokenizer@1.0.7 │ 0            │ 98.6kb   │ 15    │
├───────────────────────┼──────────────┼──────────┼───────┤
│ n-readlines@1.0.3     │ 0            │ 96.01kb  │ 16    │
├───────────────────────┼──────────────┼──────────┼───────┤
│ csv-parser@3.0.0      │ 1            │ 58.66kb  │ 29    │
├───────────────────────┼──────────────┼──────────┼───────┤
│ letsfreezethat@3.1.0  │ 0            │ 40.27kb  │ 8     │
╰───────────────────────┴──────────────┴──────────┴───────╯
```

- add export in ASCII/Unicode tabular format for overview
- add automatic deletion of tables, views, indexes
- add web server plugin