icql-dba v10.1.0
ICQL-DBA
Table of Contents
- Introduction
- OIMDB Functionality
- Switching between File- and RAM-Based Modes (Mode Transfer)
- Continuous Persistency
- Eventual Persistency
- Ad Hoc Persistency
- Regular Acquisition
- Ad Hoc Acquisition
- Privileged / Special Schemas: Main and Temp
- Usage
- Notes on Import Formats
- API
- SQL Submodule
- ICQL-DBA Plugins
- Rave Reviews (albeit for the concept, not this software)
- Similar Projects
- To Do
Introduction
- ICQL-DBA is an SQLite Database Adapter with Optional In-Memory DB (OIMDB) functionality.
- Implemented using `better-sqlite3` (B3 in the below) to provide the interface between NodeJS (JavaScript) and SQLite.
- Because B3 is almost fully synchronous, ICQL-DBA operates almost completely synchronously, too.
- SQLite/B3 already provides In-Memory (IMDB) functionality. However, ICQL-DBA makes it easier to switch between In-Memory (RAM) and On-Disk operational modes, hence the O for Optional in OIMDB.
- Using ICQL-DBA, you can open a DB file, add some data (which is immediately written to disk), switch to RAM mode to perform some tedious data mangling, and then save the new DB state back to the same file the DB was originally opened from.
OIMDB Functionality
- To process a DB file in RAM, ICQL-DBA first opens the file under an ad hoc schema name and then copies all DB objects (table, view and index definitions as well as data) from that ad hoc schema into a RAM-based schema using the name supplied in the `open()` call.
- Note—SQLite 'temporary schemas' are mostly based in RAM but may use disk space in case available memory becomes insufficient. Schemas without disk-based backup also exist; ICQL-DBA users can elect to use either model (with the `disk: true|false` configuration), although IMO there's little reason not to use optional HD support.
- Note—Confusingly, to get a RAM-based DB with the original SQLite/B3 API, you either use the empty string `''` to get disk support (in case of RAM shortage) or the pseudo-path `':memory:'` to get one without disk support. In ICQL-DBA, you use the boolean settings `ram` and `disk` instead, which is much clearer. This frees the `path` argument from doing double duty, so one can use it to specify a default file path to be used implicitly by the `save()` command.
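The copy-into-RAM mechanism described above can be sketched with Python's stdlib `sqlite3` module (used here only because it ships with SQLite everywhere; ICQL-DBA itself uses B3). The ad hoc schema name `adhoc` and the demo table are assumptions for illustration:

```python
import os
import sqlite3
import tempfile

# Stand-in for an existing SQLite DB file.
path = os.path.join(tempfile.mkdtemp(), 'demo.db')
filedb = sqlite3.connect(path)
filedb.execute("create table t ( x integer );")
filedb.executemany("insert into t values ( ? );", [ (1,), (2,), (3,) ])
filedb.commit()
filedb.close()

# OIMDB-style acquisition: open a RAM DB, attach the file under an ad hoc
# schema name, then copy the table definitions and data into the RAM schema.
ram = sqlite3.connect(':memory:')
ram.execute("attach ? as adhoc;", (path,))
tables = ram.execute(
    "select name, sql from adhoc.sqlite_master where type = 'table';").fetchall()
for name, sql in tables:
    ram.execute(sql)  # re-create the table, this time in the RAM-based schema
    ram.execute(f'insert into main."{name}" select * from adhoc."{name}";')
ram.commit()
ram.execute("detach adhoc;")  # from here on, all work happens in RAM

rows = [ x for (x,) in ram.execute("select x from t order by x;") ]
```

The file stays untouched from here on; changes live in RAM until explicitly persisted.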
Switching between File- and RAM-Based Modes (Mode Transfer)
- The `transfer()` API method may be used
  - to switch between Continuous and Eventual Persistency, and
  - to associate/dissociate a DB with/from a new file path.
- `transfer { schema, ram: true, [ path: 'path/to/file.db' ], }` switches from file-based Continuous Persistency to memory-based Eventual Persistency. This is a no-op in case the DB is already memory-based.
- Likewise, `transfer { schema, ram: false, [ path: 'path/to/file.db' ], }` switches a memory-based DB to Continuous Persistency. If the DB was originally opened from a file, the `path` setting is optional. If `path` is given, the DB will now be associated with that (possibly new) on-disk location.
Continuous Persistency
- When a DB is `open()`ed from a file with setting `ram: false`, every change to its structure or its business data will be immediately reflected on disk; this, of course, is the regular mode of operation for SQLite and most RDBMSes and is hence known as Continuous Persistency.
- If the referenced file is non-existent, it will be auto-created unless `create: false` has been specified.
- Continuous Persistency always uses the SQLite binary file format.
Eventual Persistency
- While file-based SQLite DBs are permanently persistent (i.e. each change is written to disk as soon and as safely as possible to make the DB resistant against unexpected interruptions), ICQL-DBA's OIMDB mode is 'eventually persistent' for all states of the DB arrived at right after a `save()` command has completed and before any new changes have been executed.
- Eventual Persistency to disk is implemented with synchronous calls to `vacuum $schema into $path` (no schema-copying is involved in this step). The API method to do so is `save()`, a method that does nothing in case a given schema is disk-based (and therefore writes all changes to disk, continuously); therefore, one can make it so that the same code with strategically placed `save()` statements works for both RAM-based and disk-based DBs without any further changes.
- The RAM DB will be much faster than the disk-based one, but of course the disk-based one will be better safeguarded against data loss from unexpected interruptions.
- Eventual Persistency always uses the SQLite binary file format.
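The `vacuum ... into` mechanism can be observed directly with Python's stdlib `sqlite3` (a sketch for illustration only; ICQL-DBA wraps the same SQLite feature via B3, and `vacuum into` requires SQLite ≥ 3.27):

```python
import os
import sqlite3
import tempfile

# Build a RAM DB, then persist its current state as an SQLite binary file.
ram = sqlite3.connect(':memory:')
ram.execute("create table t ( x );")
ram.execute("insert into t values ( 42 );")
ram.commit()  # vacuum must not run inside an open transaction

path = os.path.join(tempfile.mkdtemp(), 'snapshot.db')
ram.execute("vacuum main into ?;", (path,))  # Eventual Persistency in one call

# The snapshot is a complete, independent copy of the DB state at save time.
saved = sqlite3.connect(path).execute("select x from t;").fetchone()[0]
```

Each `vacuum into` call snapshots the then-current state; changes made afterwards stay in RAM until the next such call.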
Ad Hoc Persistency
- There's also a way to do 'Ad Hoc' Persistency using the `export()` API. The `export()` method allows one to write, for example, an SQL dump or the SQLite binary format to a given file.
Regular Acquisition
- 'Regular Acquisition' is a fancy way to describe what the `open()` method does.
Ad Hoc Acquisition
- The counterpart to `export()` (Ad Hoc Persistency) is `import()` (Ad Hoc Acquisition).
- Some file formats (such as `sql`) may be valid when empty and result in an empty DB.
|   | Acquisition | Persistency |
|---|---|---|
| Regular and Eventual | `open()` | `save()` ¹ |
| Ad Hoc | `import()` | `export()` |

¹ `save()` calls are optional no-ops for Continuous Persistency
Privileged / Special Schemas: Main and Temp
- SQLite has two special schemas, `main` and `temp`.
- "The schema-names 'main' and 'temp' refer to the main database and the database used for temporary tables. The main and temp databases cannot be attached or detached."—SQLite Documentation
- When you create a `better-sqlite3` object, its data will be put into the `main` schema.
- It is possible to circumvent the `main` schema in SQLite/B3; to do so, call `new Database ':memory:'`, which will create an empty `main` schema; then, you can execute an SQL `attach` statement like `attach $path as $schema;` to open a file- or RAM-based DB under another schema of your choosing.
- This procedure is not very straightforward; compare this to how you would use `open()` in ICQL-DBA.
- When you create a temporary table as in `create temp table t;` or `create temporary table t;`, then `t` will be available as `temp.t` (but not as `temporary.t`).
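The `temp` schema behavior described above is plain SQLite and can be verified with Python's stdlib `sqlite3` (illustration only; the same holds under B3):

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute("create temp table t ( x );")
con.execute("insert into temp.t values ( 1 );")

# The table lives in the special 'temp' schema, not in 'main' ...
in_temp = con.execute(
    "select count(*) from temp.sqlite_master where name = 't';").fetchone()[0]
in_main = con.execute(
    "select count(*) from main.sqlite_master where name = 't';").fetchone()[0]

# ... and 'temporary' is not a valid schema name:
try:
    con.execute("select * from temporary.t;")
    temporary_works = True
except sqlite3.OperationalError:
    temporary_works = False
```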
Usage
Create DBA Object
```coffee
DBA = require 'icql-dba'
dba = new DBA.Dba()
```
- `dba` is constructed with an empty `path` argument to B3, i.e. `require('better-sqlite3')('')`.
- This means the underlying B3 instance is connected to an empty temporary database under the `main` schema.
- `dba` is not yet connected to any file-based DB; the only meaningful next operation is `open()`.
- In the future, we may add configuration to create `dba` and open an existing DB in a single step.
Create DB with `open()`
New or Existing File-Based DB with Continuous Persistency
- `path` must be a string that is a valid file system path (up to the parent directory); its final component must either point to an existing SQLite DB file or be non-existent. (Write permissions are required in case modifications are intended.)
- In case the location indicated by `path` does not exist, a new SQLite DB file will be created. To prevent auto-creation, specify `create: false`; an error will then be thrown instead.
```coffee
dba.open { path: 'path/to/my.db', schema: 'myschema', }
```
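For comparison, plain SQLite offers a similar guard against auto-creation via URI filenames with `mode=rw` (open read/write but never create); sketched here with Python's stdlib `sqlite3`:

```python
import os
import sqlite3
import tempfile

# `mode=rw` opens an existing file read/write but refuses to create one,
# which mirrors what `create: false` is described to do in ICQL-DBA.
missing = os.path.join(tempfile.mkdtemp(), 'no-such.db')
try:
    sqlite3.connect(f"file:{missing}?mode=rw", uri=True)
    created = True
except sqlite3.OperationalError:
    created = False

still_missing = not os.path.exists(missing)
```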
RAM DB with Eventual Persistency
- It is possible to open a file DB and transfer it to RAM by `open()`ing an SQLite file with `ram: true`. This will copy the DB's structure and its data to RAM.
- Observe that when a path is given, `ram` defaults to `false`; when no path is given (see below), `ram` defaults to `true`.
- Use `disk: false` to keep SQLite from using temporary files in case of RAM shortage (but why should you).
- The path given will be used when `save()` is called later.
- It is not allowed to call `save()` with a `path` when the DB was opened with a `path`.
```coffee
dba.open { path: 'path/to/my.db', schema: 'myschema', ram: true, }
```
New RAM DB without Eventual Persistency
- To `open()` a RAM DB that has no inherent link to a file, omit the `path` setting (or set it to `null`).
- To obtain a RAM DB from an existing file DB but without writing changes back to that file, use `import()`.
- Observe that when `path` is missing, `ram` defaults to `true`, so in this case it may be omitted or set to `null`.
```coffee
dba.open { schema: 'myschema', }
```
Import a DB
- Observe that unlike most of the ICQL-DBA API, `dba.import()` is asynchronous. This is mostly due to the relative scarcity of synchronous parsing (and, generally, file-handling) packages in the NodeJS ecosystem.
- Supported formats include `sqlite` for the SQLite binary file format and `sql` for SQL dumps.
- Unlike `open()`, `import()` accepts SQL dumps (and, in the future, possibly other formats).
- Use `format: 'sqlite'` or `format: 'sql'` to explicitly specify the intended file format in case the file extension does not match:
  - `.dump`, `.sql` are recognized as `format: 'sql'` (not yet implemented)
  - `.db`, `.sqlite` are recognized as `format: 'sqlite'`
- In the future, `import()` may accept additional arguments:
  - `save_as` to specify a path for Continuous or Eventual Persistency
  - `ram` will default to `false` when `save_as` is not given and to `true` otherwise
  - `overwrite` to specify whether an existing file at the position indicated by `save_as` should be overwritten.
```coffee
dba = new Dba()
schema = 'myschema'
dba.open { schema, }
await dba.import { path: 'path/to/some.db', schema, }
```
Notes on `import { format: 'sql', }`
- A valid SQL script may contain arbitrary SQL statements, not only those statements that are output into an SQL dump and are strictly required to re-create a given DB instance's state.
- A valid SQLite SQL script may also contain dot-commands that would be interpreted by the SQLite shell and are not part of SQL proper.
- Even if all dot-commands are discarded or declared errors, there's still `attach 'path/to/my.db' as myschema`, so an SQL script may define structures and data in multiple schemas that may also reference each other.
- One way to deal with this is to make not only all dot-commands illegal (they don't work with `dba.execute()` anyhow, so no change is required), but also to demand that valid SQL scripts either
  - do not reference any schema except `main`, explicitly or implicitly, or
  - where a schema other than `main` is intended, include an explicit configuration setting like `from_schema`, in which case only that schema will be imported.
- In any event, `import()`ing SQL scripts/dumps will include:
  - setting up a temporary `dba` (or B3) instance,
  - in case batch mode is used (for big files), crude lexing of the SQL so we can delineate and group statements (already implemented),
  - `vacuum`ing the temporary DB into an SQLite binary DB file, and
  - `open()`ing that file from the original DBA instance.
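The steps above can be sketched end to end with Python's stdlib `sqlite3` (an illustration of the pipeline, not ICQL-DBA's actual implementation; the dump text is made up):

```python
import os
import sqlite3
import tempfile

dump = """
create table dogs ( name text, age integer );
insert into dogs values ( 'Rex', 3 ), ( 'Fifi', 7 );
"""

# Steps 1+2: a temporary instance runs the (lexed/grouped) statements.
tmp = sqlite3.connect(':memory:')
tmp.executescript(dump)
tmp.commit()

# Step 3: vacuum the temporary DB into an SQLite binary DB file.
path = os.path.join(tempfile.mkdtemp(), 'imported.db')
tmp.execute("vacuum main into ?;", (path,))
tmp.close()

# Step 4: open that file from the 'real' instance.
db = sqlite3.connect(path)
n = db.execute("select count(*) from dogs;").fetchone()[0]
```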
Transfer DB
Transfer File-Based DB to RAM
- `transfer_to_ram()` allows one to convert a file DB to a RAM DB.
- `schema` is a required setting naming the file-based DB; it will become the name of the RAM DB.
- An error is thrown if the given schema is already a RAM DB.
- For the duration of RAM-based operation, the connection to the file is terminated; therefore, Continuous Persistency is not available.
- The user is responsible for either calling `save()` at appropriate points in time or else calling `transfer_to_file()` once RAM-based operation should be terminated and results saved.
```coffee
dba.transfer_to_ram { schema: 'myschema', }
```
Transfer RAM DB to file
- `transfer_to_file()` allows one to convert a RAM DB to a file DB.
- It will (1) either copy the old DB file to a new location or else delete it, depending on configuration (### TAINT which configuration?), then (2) call `save_as()` with the original path.
- When called without a `path` setting, the schema's associated path will be used.
- When the schema points to a file DB and the path is not given or resolves to the associated path, `transfer_to_file()` is a no-op.
- The path given becomes the associated path of the DB; this works for both file and RAM DBs. Also see `export()`.
```coffee
dba.transfer_to_file { schema: 'myschema', path, }
```
Save DB
Use `save()` to Save to Linked File
- File-based DBs have Continuous Persistency; there is no need to call `save()` (but no harm done, either).
- RAM DBs must be `save()`d manually in order to persist changes in structure or data.
- `save()` throws an error if a `path` setting is given.
- `save()` throws an error if the `schema` setting isn't given or `schema` is unknown.
- Use `transfer_to_file { path, }` (and `save()` after subsequent changes) to add or change the file path linked to a RAM DB.
- One can also use `export { path, overwrite: true, format: 'sqlite', }` to repeatedly save a RAM DB as an SQLite binary file DB.
```coffee
dba.save { schema: 'myschema', }
```
- The above call is roughly equivalent to calling `dba.export()` with a few additional parameters:

```coffee
schema = 'myschema'
path   = dba._schemas[ schema ].path
dba.export { schema, path, format: 'sqlite', overwrite: true, }
```
- The choice between `save()` and `export()` is rather intentional (conceptual) than extensional (material):
  - one calls `save { schema, }` to 'persist the state of a schema to its associated DB file', whereas
  - one calls `export { schema, path, }` to 'make a durable copy of this RAM DB'.
Exporting to Binary and Textual Formats
- The path given will not become the associated path of the DB; this is different from `transfer_to_file()`.
```coffee
dba.export { schema: 'myschema', path, format, overwrite, }
```
Notes on Import Formats
CSV
- Configuration:
  - `transform`: optional `function`, default: `null`
  - `_extra`: optional `object`, default: `null`. This value will be passed to `csv-parser`, which does the hard part of parsing CSV, so you can use `await dba.import { ..., format: 'csv', _extra: { ... }, ... }` to directly talk to `csv-parser`. Notice however that some settings may be overridden without notice by `dba.import()`. For a description of options, see `csv-parser`.
  - You can skip incomplete lines when they have empty fields, which are expressed as `null` values, either when all fields are `null` or when any field is `null`. Observe that this is only tested against the columns that were selected with `input_columns` (where set explicitly):
    - `skip_any_null`: optional `boolean`, default: `false`
    - `skip_all_null`: optional `boolean`, default: `false`
  - `trim`: optional `boolean`, default: `true`. Whether to remove leading and trailing whitespace from field values.
  - `default_value`: optional; can be any value, default: `null`. This value will be applied to all fields that are found to be (the) empty (string) (after optional trimming). Observe that quoting a field value will not prevent trimming.
  - `quote`: optional
  - `input_columns`: optional `boolean` or nonempty `list of nonempty texts`, default: `null`
    - `true`: first non-skipped row of source contains column names; rows are objects
    - `false`: rows are lists
    - list of `n` names: only the first `n` columns will be kept; rows are objects
  - `table_name`: optional `nonempty_text`, defaults to `main`
  - `table_columns`:
    - `null`: columns are created as `text`s depending on the first row encountered; if it is a list, columns will be named `c1`, `c2`, `c3`, ..., `c${n}`
    - `{ name: type, name: type, ..., }`: columns are created with the `name`s and `type`s given
    - `[ name, name, ..., ]`: all columns are created as `text`
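How `trim`, `default_value`, `skip_all_null` and `skip_any_null` interact can be sketched as pure functions (hypothetical helpers for illustration; `clean_row` and `keep_row` are not part of the API):

```python
def clean_row(row, trim=True, default_value=None):
    # apply `trim`, then map empty strings to `default_value`
    out = []
    for field in row:
        if trim and isinstance(field, str):
            field = field.strip()
        out.append(default_value if field == '' else field)
    return out

def keep_row(row, skip_any_null=False, skip_all_null=False):
    # decide whether an already-cleaned row survives the null filters
    if skip_any_null and any(f is None for f in row):
        return False
    if skip_all_null and all(f is None for f in row):
        return False
    return True

rows     = [ clean_row(r) for r in [ [ ' a ', 'b' ], [ '', '' ], [ 'c', '' ] ] ]
kept_all = [ r for r in rows if keep_row(r, skip_all_null=True) ]
kept_any = [ r for r in rows if keep_row(r, skip_any_null=True) ]
```

With these defaults, `skip_all_null` only drops entirely empty lines, while `skip_any_null` drops every line with at least one empty field.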
API
TBD
- `dba.do_unsafe: ( f ) ->`—given a synchronous function `f`, set `unsafeMode` to `true`, call `f()`, then set `unsafeMode` to `false`. Used judiciously, this allows one, e.g., to update rows in a table while iterating over a result set. To ensure proper functioning with predictable results and to avoid endless loops (caused by new rows being added to the result set), it is suggested to add a field `lck boolean not null default false` (for 'locked') to tables for which concurrent updates are planned. Set `lck` of all or a subset of rows to `true` and add `where lck` to your `select` statement; any inserted rows will then have the default `lck = false` value and be cleanly separated from the result set.
- `dba.do_unsafe_async: ( f ) ->`—same as `dba.do_unsafe()` but for async functions.
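The `lck` pattern can be demonstrated with Python's stdlib `sqlite3`, which (like B3 in unsafe mode) permits writes on the same connection while iterating; this is an illustration of the pattern, not ICQL-DBA code:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute("create table t ( x integer, lck boolean not null default false );")
con.executemany("insert into t ( x ) values ( ? );", [ (1,), (2,) ])
con.execute("update t set lck = true;")  # lock the current result set

seen = []
for (x,) in con.execute("select x from t where lck;"):
    seen.append(x)
    # insert a derived row *while iterating*; it gets the default lck = false
    # and therefore never enters the result set, so the loop terminates
    con.execute("insert into t ( x ) values ( ? );", (x * 10,))

total = con.execute("select count(*) from t;").fetchone()[0]
```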
User-Defined Functions
User-Defined Functions (UDFs) are one feature that sets SQLite apart from other RDBMSes: unlike other databases, SQLite allows users to define functions in user code, on the connection. Therefore, NodeJS users can define and use UDFs that are written in JavaScript or WASM and that can access the current machine's environment (e.g. the file system). On the one hand, this is probably somewhat slower than e.g. using a compiled PostgreSQL extension written in C; on the other hand, userspace functions are orders of magnitude easier to write than a Postgres C extension. Such functions can also take advantage of the existing NodeJS ecosystem, which is a huge plus, and any speed penalty incurred by using JavaScript for 'hot' UDFs might be offset by a re-implementation in, say, Rust.
One downside—or, shall we say, "characteristic aspect"—of defining UDFs on the client side (as opposed to writing them embedded in SQL) is that your DB or at least some aspects of it may become unusable without suitable initialization of the connection. It is to be expected, though, that in a complex application some parts are not bound to function properly without other parts being in place—the application code as such won't work when the database of a DB-based app is missing, and the DB may not fully work without the application code. (This, by the way, is exactly true for Fossil SCM-based repositories, which might be regarded as the poster child of an application built around an SQLite database.)
- `dba.create_function: ( cfg ) ->`: single-valued functions
- `dba.create_aggregate_function: ( cfg ) ->`: aggregate functions
- `dba.create_window_function: ( cfg ) ->`: window functions
- `dba.create_table_function: ( cfg ) ->`: table-valued functions
- `dba.create_virtual_table: ( cfg ) ->`: virtual tables
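For illustration, here is the same idea (connection-level UDFs) using Python's stdlib `sqlite3`; `shout` and `product` are made-up example functions, not part of any API:

```python
import sqlite3

con = sqlite3.connect(':memory:')

# single-valued function, defined in user code on the connection
con.create_function('shout', 1, lambda s: s.upper() + '!')

# aggregate function: multiplies all values in a group
class Product:
    def __init__(self):     self.value = 1
    def step(self, x):      self.value *= x
    def finalize(self):     return self.value
con.create_aggregate('product', 1, Product)

shouted = con.execute("select shout( 'hello' );").fetchone()[0]
con.execute("create table t ( x );")
con.executemany("insert into t values ( ? );", [ (2,), (3,), (4,) ])
prod = con.execute("select product( x ) from t;").fetchone()[0]
```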
Connection Initialization
Right after a connection to an SQLite DB has been instantiated, an initialization method `@initialize_sqlt connection` is called. By overriding this method in a derived class, one can configure the connection, e.g. by calling `better-sqlite3`'s `pragma()` method. When doing so, observe that the call happens before the `dba.sqlt` attribute is set, so avoid accessing the `dba` (`this`/`@`) instance. You're on the safe side if you restrict yourself to accessing the first argument to `initialize_sqlt()`. The default implementation of the method looks like this:
```coffee
initialize_sqlt: ( sqlt ) ->
  sqlt.pragma "foreign_keys = true;"
  return null
```
In your own implementation, do not forget to call `super sqlt` to get the default configuration for the connection.
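The reason `foreign_keys` is worth setting here: SQLite leaves foreign-key enforcement off per connection by default, so it must be enabled on every new connection. A Python stdlib `sqlite3` illustration of the same pragma:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute("pragma foreign_keys = true;")  # per-connection opt-in
con.execute("create table parent ( id integer primary key );")
con.execute("create table child  ( pid integer references parent ( id ) );")

try:
    con.execute("insert into child values ( 99 );")  # no such parent row
    violation_caught = False
except sqlite3.IntegrityError:
    violation_caught = True
```

Without the pragma, the offending insert would silently succeed.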
SQL Submodule
```coffee
dba = new Dba()
{ SQL, I, L, X, } = dba.sql
table = 'mytable'
value = 'something'
sql   = SQL"select * from #{I table} where x == #{L value};"
# == select * from "mytable" where x == 'something';
```
- `SQL`: currently a no-op, but can be used (e.g. with coffeeplus) to signal text highlighting for the language used in the string literal.
- `I`: format a text as an SQL identifier (using double quotes)
- `L`: format a value as an SQL literal
- `X`: format a flat list as an SQL row value (a.k.a. a vector)
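A sketch of the three formatters' assumed semantics, inferred from the example above (illustrative Python, not the actual implementation):

```python
def I(name):
    # identifier: double-quote, doubling embedded double quotes
    return '"' + name.replace('"', '""') + '"'

def L(value):
    # literal: nulls and numbers bare, texts single-quoted with '' escaping
    if value is None:
        return 'null'
    if isinstance(value, (int, float)):
        return repr(value)
    return "'" + str(value).replace("'", "''") + "'"

def X(values):
    # row value: a bracketed, comma-separated list of literals
    return '( ' + ', '.join(L(v) for v in values) + ' )'

sql = f"select * from {I('mytable')} where x == {L('something')};"
```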
ICQL-DBA Plugins
Rave Reviews (albeit for the concept, not this software)
For the concept of using in-memory SQLite DBs (not specifically ICQL-DBA, which probably nobody uses):
We use SQLite in-memory databases for executing 100% of our business logic these days. Letting the business write all the rules in SQL is the biggest win of my career so far.
Also, if you think SQLite might be too constrained for your business case, you can expose any arbitrary application function to it. E.g.:
https://docs.microsoft.com/en-us/dotnet/standard/data/sqlite/user-defined-functions
The very first thing we did was pipe DateTime into SQLite as a UDF. Imagine instantly having the full power of .NET6 available from inside SQLite.
Note that these functions do NOT necessarily have to avoid side effects either. You can use a procedural DSL via SELECT statements that invokes any arbitrary business method with whatever parameters from the domain data.
The process is so simple I am actually disappointed that we didn't think of it sooner. You just put a template database in memory w/ the schema pre-loaded, then make a copy of this each time you want to map domain state for SQL execution.
You can do conditionals, strings, arrays of strings, arrays of CSVs, etc. Any shape of thing you need to figure out a conditional or dynamic presentation of business facts.
Oh and you can also use views to build arbitrary layers of abstraction so the business can focus on their relevant pieces.—https://news.ycombinator.com/item?id=27568537
Similar Projects
- Datasette
To Do
- CSV import
  - implement importing to an existing schema; this will simplify import options (no need to make `ram: true` etc. configurable)
  - implement (async) streaming with SteamPipes transforms
  - implement batching (?)
  - implement passing options to CSV parser
  - allow to specify column names, types for targeted table
  - clarify whether `skip_first` means to skip the (physical) first line of the imported file or the first line that is not skipped because it was blank or empty
  - ensure that all unquoted fields are trimmed
  - ensure that all empty fields contain `null` instead of an empty string
  - implement skipping comments
- TSV import (differs only in configuration (`delimiter`, `quotes`) from CSV)
- Consider using B3 `serialize()` for `export { format: 'sqlite', }`
- Allow to `open()` a RAM DB without a path
- Re-implement Hollerith codec for `int32` only, making it faster and smaller; add documentation along the lines of "DBA: VNRs" (in hengist dev) on how to efficiently sort VNRs
- We (temporarily?) accept a `path` argument in `new Dba { path, }`; this is used to attach the `main` schema. Alternatively, and to keep the API more consistent(?), we could remove that argument and stipulate that the first `dba.open()` call implicitly creates the `dba.sqlt` object; if the `schema` given in that call is not `main`, then `main` will be a RAM DB.
- Discuss project focus and non-goals
- While ICQL-DBA may gain some support for generated SQL, building a kind-of-an-ORM is not one of its goals.
  - Cf. Datasette, which allows constructs à la `for row in db["dogs"].rows_where(select='name, age'): ...`; this already shows one of the general disadvantages of ORMs, namely, that one suddenly has to re-create parts of SQL in a more awkward way. Instead of `select name, age from dogs where age > 1 order by age desc;` you now have to write `db[ 'dogs' ].rows_where( 'age > 1', select = 'name, age', order_by = 'age desc' )`, which is considerably longer, more convoluted, and has an appreciably larger API surface than `dba.query()`.
  - Observe that all of the arguments are really SQL fragments, so in reality you still have to write SQL.
  - Worse, the equivalent to the one SQL string `"select name, age from dogs where age > 1 order by age desc;"` has now been diluted into four micro strings: `'dogs'`, `'age > 1'`, `'name, age'`, and `'age desc'`, plus three Python identifiers: `rows_where`, `select`, and `order_by`.
  - Worse again, you still don't get column and table name parametrization (you can replace `'name, age'` with an interpolated string and variables, but you'll have to provide proper escaping (quotes, spaces, capitalization) and concatenation (commas) yourself).
- Re/define APIs for
  - single-valued functions: `dba.create_function()` (`dba.function()`)
  - aggregate functions: `dba.create_aggregate_function()` (`dba.aggregate()`)
  - window functions: `dba.create_window_function()`
  - table-valued functions: `dba.create_table_function()`
  - virtual tables: `dba.create_virtual_table()`
- `dba.interpolate()`: add simple facilities to construct basic SQL clauses and statements such as inserts, `values` clauses &c. Syntax could use dollar, format, colon, name (`$X:name`) for named insertions and question mark, format, colon (`?X:`) for positional insertions. These would have to be processed before `dba.prepare sql` is called. The format parameter is optional and defaults to `I` for 'identifier', as constructing statements with parametrized table and column names is the expected primary use case for interpolation. Other values for the format are `L` (for 'literal') and `V` (for 'values', i.e. round brackets around a comma-delimited list of literals). Examples:

```coffee
dba.query "select $:col_a, $:col_b where $:col_b in $V:choices", \
  { col_a: 'foo', col_b: 'bar', choices: [ 1, 2, 3, ], }
dba.query "select ?:, ?: where ?: in ?V:", \
  [ 'foo', 'bar', 'bar', [ 1, 2, 3, ], ]
```
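The named variant of the proposed syntax could be prototyped as a small preprocessor; this is a hypothetical sketch of the to-do item (no `dba.interpolate()` exists yet), in Python for illustration:

```python
import re

def quote_ident(name):
    return '"' + str(name).replace('"', '""') + '"'

def quote_literal(value):
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"

def interpolate(sql, values):
    # replace $X:name markers; format X defaults to I (identifier)
    def repl(match):
        fmt, name = match.group(1) or 'I', match.group(2)
        value = values[name]
        if fmt == 'I':
            return quote_ident(value)
        if fmt == 'L':
            return quote_literal(value)
        if fmt == 'V':
            return '( ' + ', '.join(quote_literal(v) for v in value) + ' )'
        raise ValueError(f"unknown format {fmt!r}")
    return re.sub(r'\$([ILV])?:(\w+)', repl, sql)

sql = interpolate("select $:col_a where $:col_b in $V:choices",
                  { 'col_a': 'foo', 'col_b': 'bar', 'choices': [ 1, 2, 3 ] })
```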
- Enable 'concurrent UDFs' (user-defined functions that execute SQL statements)
  - From v7.1.0 on, ICQL-DBA uses a recent amalgamation from https://sqlite.com/download.html with `SQLITE_USE_URI` set to `1`, so concurrent UDFs are possible.
- Make it so that RAM DBs may be opened with a `name` parameter in `cfg` that is then used to build a file URL like `file:#{name}?mode=memory&cache=shared`, where `name` becomes the identifier for shared memory across all `better-sqlite3` instances running in the same process.
  - Where a RAM DB is opened without a `name` parameter in `cfg`, assign a randomly chosen name like `rnd_4f333589dc3ae799`; this can be recovered from a `dba` instance so that a second connection can be instantiated.
  - To make results more predictable, deprecate the use of `path`s like `''` and `':memory:'`.
- Implement `dba.initialize_sqlt()`
- Remove the dependency on `hollerith-codec`; replace it with a simpler, faster implementation, published as `icql-dba-vnr`.
- Consider replacing `tempy` with a leaner module:
```
npx howfat -r table icql-dba
icql-dba@7.2.0 (63 deps, 14.36mb, 687 files)
╭───────────────────────┬──────────────┬──────────┬───────╮
│ Name                  │ Dependencies │ Size     │ Files │
├───────────────────────┼──────────────┼──────────┼───────┤
│ hollerith-codec@3.0.1 │            3 │ 967.28kb │    91 │
├───────────────────────┼──────────────┼──────────┼───────┤
│ tempy@1.0.1           │           50 │  919.3kb │   396 │
├───────────────────────┼──────────────┼──────────┼───────┤
│ intertype@7.6.7       │            1 │ 509.95kb │    41 │
├───────────────────────┼──────────────┼──────────┼───────┤
│ cnd@9.2.2             │            1 │ 270.78kb │    38 │
├───────────────────────┼──────────────┼──────────┼───────┤
│ mysql-tokenizer@1.0.7 │            0 │   98.6kb │    15 │
├───────────────────────┼──────────────┼──────────┼───────┤
│ n-readlines@1.0.3     │            0 │  96.01kb │    16 │
├───────────────────────┼──────────────┼──────────┼───────┤
│ csv-parser@3.0.0      │            1 │  58.66kb │    29 │
├───────────────────────┼──────────────┼──────────┼───────┤
│ letsfreezethat@3.1.0  │            0 │  40.27kb │     8 │
╰───────────────────────┴──────────────┴──────────┴───────╯
```
- add export in ASCII/Unicode tabular format for overview
- add automatic deletion of tables, views, indexes
- add web server plugin