1.0.20250311 • Published 3 months ago

gash v1.0.20250311

Weekly downloads
-
License
BSD-1-Clause
Repository
github
Last release
3 months ago

GASH

Gash is a tool for running javascript from the command-line (CLI).

Instead of using a library to write a bespoke CLI script, you annotate your script or package and gash runs it for you. (In essence, your script becomes a plugin for gash.)

To learn how to write these annotations, skip straight to the IDL section.

Example: Custom script

If you're hacking together a custom script, it's often easiest to directly annotate the file by exporting a variable called gash. For example,

  • vectorise.mjs:

    export const gash = `IDL=1 
    vectorise [--force] JSON_FILE NAME 
    :: default( await $1.json(), $2, {force}) as JSON -> $1`; 
    
    export default function main( json, id, {force} )
        /* modify json */
        ...
        /* return to get it saved out */ 
        return json;  
    }

    The above can be invoked as:

$ gash vectorise.mjs --force somefile.json id

or as:

$ gash vectorise.mjs somefile.json id

Example: Package

Or you can annotate scripts in the gash key of your package.

  • package.json:
    {
       "name": "my-script",
       "main": "main.mjs",
       "gash": [
            "IDL=1",
            "my-script                -- Process PNG_FILEs to produce our data.", 
            "    [--iterations=COUNT] -- Number of times to process data.",
            "    PNG_FILE... ",
            ":: crunchData( $1.map( file => file.toBuffer()) ,{iterations=1})"
       ]
    }
  • main.mjs:

    import Processor from "./Processor.mjs"; 
    
    export function crunchData( buffers, {iterations}) {
        const processor = new Procesor;
        for (const b of buffers)    
            processor.add( b, iterations );  
       return processor.stats();
    }

    And invoked as

gash my-script --iterations=3 **/*.png 

Alpha Status & Mercurial Repo

gash has been in existence privately for several years. (And it has the technical debt to prove it.) A recent round of tidying to get it fit for public release has only highlighted the many limitations and frustrations that remain. (Including the documentation.)

However gash's too bloody useful to keep locked up. Hopefully the shortcomings will be rectified over time. But despite the 1.x tag it is really still alpha.

Also the principle repo is mercurial. I intended to switch fully to git, as I become familiar with the gitisms - assuming I can master them and my innate desire to swim against the tide doesn't make me stick to hg.

Requirements


It was my aim that gash shouldn't support any version of node.js before the most recent LTS release. That didn't survive contact with the real world (long, hard stare at glitch) and it probably runs on node 16.14, although there may still be bugs.

Installation

It's expected gash will be installed globally:

npm install -g gash

It's gash's job to manage compatibility issues. (The IDL= tag that all IDL declarations begin with solve the internal compatibility. You can similary invoke scripts with CLI= tag.)

gash installs a single binary: gash.mjs which is a rolled up and minified (dist/gash.mjs) which can be moved about anywhere and used, as is.

Optional Dependencies

gash has an optional dependency on better-sqlite3. The intent is to retire this for node's internal implementation when it becomes stable. (But some of my tools depends on better-sqlite's features.) It's only required if you call the database() method on File and otherwise won't be used.

Project axiom

A0: It is a strong project goal that any javascript module invokable by gash should be invokable without gash. There should be no strong dependencies.

Current exceptions:

  • We have a custom implementation of fetch fetch that handles files.
  • the browser globals alert(), confirm() and prompt() are supplied for the console. (Implement them please node.)
  • console.status() console.statusClear() and console.statusFlush() allow updating of a console "status" line. (Implement them, everyone!)

API

It's a project goal that gash will become embeddable in other projects. But no public API is yet offered. It's strictly command-line. The xwh server is a step towards that.

CLI

gash is invoked as gash SCRIPTLET [OPTIONS]... where OPTIONS depend on the SCRIPTLET.

SCRIPTLET can be a javascript source file, a JSON file, a directory containing package.json; it can be a module somewhere in the package tree; or it can be one of the builtins. gash help list builtins and modules found in the package tree. See Package resolution for more details.

Coventions

Throughout this documents, arguments are written as

gash '"cmd --argument *.html *.css"'

This utilises doubled quoting and single first argument reparsing to run on Windows and Unix without interference from their respective shells.

Obviously, it's up to you what you write in the privacy of your own terminal. But remember on POSIX shells (i.e. bash, zsh etc...) you will have to protect the arguments from evaluation; e.g.

gash dir * 

will call dir with all the files whereas

gash dir '*'

will let dir lookup the files (and start listing them as soon as one is found, instead of waiting for the whole directory to be read).

And similarly, POSIX shells will want to interpret many of the features gash provides; e.g.

gash scriptlet --option=${value}

is very different to

gash scriptlet '"--option=${value}"'

Tokenisation

Doubled quoting

Single and double quotes are trimmed from any argument gash is passed (i.e. any argument in argv). For example, typing gash echo "'some'" or echo '"some"' results in gash outputing some (without any quotes).

The exact rule is that quotes are removed when one of the quote characters ("or ') opens an argument, and the same quote character closes it and occurs nowhere else in the argument. (REDUP-* first-pass)

This allows safe cross-platform quoting; e.g.

{
    "scripts": {
        "start": "gash '\"xwh ${env.PORT}\"'"
    }
}

means npm start will work on Windows and unix. (Windows only understands double-quotes so gash is passed 'xwh ${env.PORT}' as its argument. Whereas, unix shells will respect the outermost single quotes and pass gash "xwh ${env.PORT}" as its argument. In either case, the extra set of quotes is trimmed.)

The same rule applies to the arguments of options (i.e. the bit after an equals sign) For example, echo-value --value='"some thing"' printssome thing(and not"some thing"`) to the console.

This is a feature of thr argv parser.

Single first argument reparsing

If gash has a single positional argument then it reparses it using it's own tokensier; e.g.

gash '"scriptlet --option=${value} *.html"'

is the same as

gash scriptlet --option='${value}' '*.html'

This flexibility, however, can be a gotcha if you are trying to invoke a scriptlet with a space in the filename and no other arguments.
currently no work around.

Tokenisation of argv

When gash is run from another shell (bash, zsh, cmd.exe, etc...), that shell breaks the command-line into a series of arguments. These are passed to gash in an array known as argv. Unhelpfully, all the quotes have been removed which leaves gash in the dark about what was quoted. So gash applies the following logic:

  1. Elements of argv that contain exactly two ocurrences of the single- (') or double- (") quote---one at the beginning and one at the end---have them stripped. Such arguments are considered quoted, and can never be "syntax" - i.e. cannot be options, operators, etc... (REDUP-* first pass)

    NB. To access this feature from another shell means reduplicating quotes, e.g. gash echo '"--output=file"'

  2. Elements of argv that begin with ${ are expected to be expressions. They'll be passed using the parser for bindings. And they are expected to finish with the closing } as the last character of the argument. Anything else should be an error. (Tests?)

  3. Elements of argv that begin with a backtick (`) are passsed as template literals. They are expected to run through to the end of the argument and terminate with another backtick. It's an error if they terminate before the end of the argument, or don't terminate at all. (Tests?)

  4. Elements of argv that exactly match an operator (e.g. >, |, &&, etc...) are treated as an operator. (You need reduplicated quoting to prevent this; e.g. gash echo some '"&&"' echo thing prints some && echo thing to the console; whereas gash echo some "&&" echo thing prints two lines some and thing)

  5. The first element of argv that is a double-dash (--) is deleted. All remaining elements of argv that beging with a dash will be treated as positionals. For example, in gash cmd --x -- --y --x is an option, but --y isn't. (Tests?) And gash echo -- --y prints --y rather than complaining about a missing option.

    (This is not the only way to do it. Things that look like options can be quoted and will be treated as strings. But, because of the shell you have to reduplicate the quotes - e.g. gash echo '"-Cdir"' is a safe way to print -Cdir to the console. But so is gash echo -- -Cdir.)

  6. Elements of argv that are a single dash (-) are the file topic. This wil be stdin, if a FILE, or a literal '-' if the argument is not "file-like". The section on the FILE TOPIC gives the ins and outs.

  7. Elements of argv that start with a single dash (-x) are passed as short options.

  8. Elements of argv that start with a double dash (--some) are treated as long options. If they don't contain an equal sign (=) the option name is the entire string. If they do contains an equals sign (--some=value) they are split at the first equal. The option name is everything up to the equals (--some) and the remainder after the equals is reparsed according rules 1-3 above to create the option's value.

    (The equals sign is entirely optional. An option expecting a value will consume the next positional. But it makes for more obvious parsing.)

Tokenisation of strings

When breaking a string into "arguments", gash uses the follow rules:

  1. Arguments are separated by "spaces" (any codepoint matching /\s/) and by the specials !?$%^=@#~()[]{}<>&|,;`'". (So some>file is two argments separator by the > operator.)
  2. Spaces and specials must be quoted for their literal value to appear in an argument. (ARGTOK-SPECIAL-*) Due to code reuse, the double-colon :: cannot appear unquoted. (A bug to be fixed...)
  3. Simple Quoting:
    1. Quoting happens with single (') or double (") quotes.
    2. Quoting runs through to the next occurence of the character.
    3. Quoting must cover the entire argument. (So there is no quoting spaces in the middle of the string like some" "thing".) The exception is the argument to an option counts so --some="thing" fine, as is --empty=""
    4. There are no escapes with backslashes or anything else. (This is for compatibility with Windows where, for example, "\\server\share\some dir\" makes it about impossible to escape anything with a backslash.) (ARGTOK-BACKSLASH-*)
    5. Quotes must be proceed by or followed by a space or an operator; it is an error for them to happen mid argument. (So cmd x" "y is an error.)
  4. Backticks also quote. (e.g cmd `hello world`)

    These are not command subsitution, as is normal for a shell, but are strings evaluated according to the rules for javascript's template literals. That means:

    • Templates supports backslash escapes (so gash '"echo1 `\x22`"' will output a ") and backslashes will need to be escaped (so gash '"echo1 `c:\\\\dir`"' outputs c:\\dir
    • The Expressions in a template literals evaluated as "bindings", not as straight javascript. For example gash '"echo1 `The answer is ${prompt()}`"' will prompt for a command line string and then echo it.

    • The execution context is undefined.

    • They should always return strings.

    • $( is reserved TEMPLATE-NO-$( and must be escaped. At some point it will support CLI subshells, e.g. gash echo1 `some$(cmd)thing`"

  5. ${} can be used to say the argument should be the result of an expression; e.g. gash '"cat ${prompt()}"' It cannot be quoted and must take up the whole argument, i.e. a space must follow the closing } Strings will probably be instantiated, but non-strings will be left as is.

Version token

When invoking the global gash from a script, start the command line with CLI=1; for example:

#!/bin/sh 

gash '"CLI=1 scriptlet --option=value"'

This tells gash which version you are using and it might even adapt.

Recursion and wildcards

** wildcard available.

The ** wildcard is available to recursie into subdirectories; for example

     gash '"dir **/*.txt"'

lists all .txt files in the current directory tree - subject to exclusions (see below).

(It also lists all directories. You can turn this off with --no-dirs.)

TO DO: Switch to node's globs.

This was built before node gained glob support. It will eventually move over to node's globs and equivalent functions.

Recursive handling of directories.

If a scriptlet wants a glob, and a directory is given, then ** will be appended to it. (For example, gash dir some_dir is the same as
gash '"dir some_dir/**"')

BUT, if a scriptlet specifies a type of file (for example, it specifes JSON files) then the glob will have the relevant extension(s) (so json-thingy dir becomes gash json-thingy dir/**/*.json).

The extension are those listed for the mimetype, and multiple extensions can be implied. (For example, html-thingy srcwww would become gash html-thingy srcwww/**/*.htm srcwww/**/*.htm)

The --no-recurse option prevents this behaviour; and a non-recursive glob is used. (So gash json_thingy --no-recurse dir is gash json_thingy dir/*.json)

--recurse

The option --recurse or -R inserts ** before the final element of all globs. (For example gash '"make_ico -R dir\*.png"' is the same as gash '"make_ico dir\**\*.png'")

Exclusion

All filenames returned by globs are vetted against an exclusion-list that filters out matching file. The default exclusion list is .* - so files beginning with a '.' are not reported.

This can be controlled by the --exclude flag. Setting it overrides all the defaults. (Which means the **/.* is lost. So for example gash dir --exclude=p* . will return .git file. You need to do gash dir --exclude=p* --exclude=.* . to preserve the .* exclusion.)

To view all files, set the exclude list to the empty pattern. (For example gash dir --exclude="" . shows everything.)

Pipelines and redirection

>FILE

The > output redirection is available as normal. It's syntactic sugar for --output=FILE e.g.

gash '"echo boo >goose"'

is the same as

gash '"echo boo --output=goose"'

There is (currently) no rewriting of process.stdout (REDIR-OUT-ARGTOK,REDIR-OUT-ARGTOK)

2>FILE

Stderr redirection is available via the 2> operator. (REDIR-LOG-ARGTOK,REDIR-LOG-ARGTOK) It's syntactic sugar for --log=FILE e.g.

gash '"log-echo message 2>status.log"'

is the same as

gash '"log-echo message --log=status.log"'

given a hypothentical log-echo command. Agian, there is (currently) no rewriting of process.stderr; it just captures the console.xxx messages.

<FILE

The '<' input redirection is available. And syntactic sugar for --input=FILE e.g.

gash '"select "* from table" <database.sq3"'

is the same as

gash '"select "* from table" --input=database.sq3"'

'-' (The File Topic)

As is traditional, - can be used as a file name meaning stdin; e.g.

gash '"echo fish | cat -"'

which asks cat to output stdin, and so will print fish. (PIPE-FILETOPIC-POSITIONAL)

- can also be used as arguments to an option:

gash '"echo `{"option":"value"}` | cmd --config=-"'

which means cmd's config file will be the passed json - assuming cmd's IDL includes --config=JSON_FILE or something similar. (PIPE-FILETOPIC-OPTION)

A - that occur after "--" is treated as an ordinary dash. For example

gash 'cat -- -'

the above tries to output a file called "-". (STRTOK-FILETOPIC ARRAYTOK-FILETOPIC) But it's probably easier just to supply a path

  • e.g:
gash 'cat ./-'

A dash will be treated as plain text when not used to an argument that's "file-like"; for example:

gash 'echo -'

which outputs an ordinary "-". (FILETOPIC-TEXT-*)

"-" cannot be used twice - as it's a stream; for example this is an error:

gash 'echo "boo!" | cmd - -'

And you can't use --output while piping: (PIPE-TEELESS)

gash 'echo "boo!" --output=file.txt | cmd'  

Shell Variables / Substitution

There is hastily bolted on support for shell variables via ${}; for example, in the below, the second argument to cmd are the conents of the variable "thing".

gash '"cmd ${thing}"'

Remember when calling from a conventional posix shell, you need to protect against the shell inserting its own variable.

Variables must be complete arguments; e.g. this will NOT work:

gash '"fetch https://${host}/${file}"'

The variables won't even be spotted (use backticks). NB, however, this does work:

gash '"cmd --arg=${value}"'

because what comes after the '=' is special.

What goes in the brackets is an expression in the same javascript-derived language that is used for the IDL's binding. So you can do thinfs like:

gash '"cmd --arg=${prompt()}"'

Soft coercion

If the value returned is a string, it will be converted into the type as if you'd typed it on the command line; for example access:

gash '"xwh ${env.PORT}"'

this works fine, even though env.PORT is a string. Ditto:

gash '"xwh ${prompt(`Enter_port:`)}"'

Caveats

The following limitations:

  • Variables can't appear in lists. For example, annoyingly, this doesn't (yet) work: gash 'echo ${var}'

List of global switches

Before the module name.

These option can appear anywhere:

  • --stacktrace When emitting errors, print a full stack trace.
  • --inspect Wait to launch the script until a debugger has attached.
  • --cwd=<DIR> and -C<DIR> Set the current directory. When it occurs before the command, it will affect scriptlet resolution; e.g. gash -C/some/dir cmd will look for cmd in /some/dir/. When it occurs after, it will not; so gash cmd -C/some/dir will find cmd in the current directory and then changes to /some/dir/ for argument resolution and running the script.

Options

Scriptlet Resolution

gash can run any javascript function - provided it has information on how to call it from the Command Line Interface (CLI). Functions with such annotations are called "scriptlets", and "scriptlet resolution" is the process of locating the javascript function and its annotation.

Outline Algorithm

The process to resolve a SCRIPTLET_NAME into the (module, annotation)-pair works as follows:

  1. If a SCRIPTLET_NAME begins http:, https: or file: then it's assumed to be a module url. ((No tests))

  2. If a SCRIPTLET_NAME begins /, \, ./, .\ or [A-Z]: then its assumed to be a file url. ((No tests))

  3. If neither of these conditions are met, then a package tree is created for the current directory. Roughly, gash steps up the directory tree to find a package.json and adds likely candidates along with some builtins. gash help will list the scriplets in the package tree.

  4. If the SCRIPTLET_NAME matches a package name in the package tree,
    then that will be used.

  5. If there's no matching package in the package tree, and the SCRIPTLET_NAME doesn't begin with '@', but does contain a full stop (.), a forward slash (/) or backslash (\) - then it's assumed to be a file.

  6. Anything else is an error.

builtin: resolve

The resolve builtin provides access to gash's resolver. Given a scriptlet name it returns the url of the module that will be used. (For example gash resolve my-command will point to the ESM module that will be imported to run my-command.)

Adding --idl option to resolve will show the exact annotation - "the IDL". (For example gash resolve --idl my-commend.) Alternatively, the help builtin will show a user friendly version of the idl. (For example gash help my-command or gash my-command --help.)

Files

When presente dwith a file, gash first looks at the extension. If a it's .mjs or .js then gash will treat it as an ESM module. (For example, gash my-file.mjs) See Modules for how they are handled.

If a file has a .json extension, then a package tree will be built from it and the root package used. (For example, gash ./app.json would parse app.json as if it was a package.json and likely invoke the main module.)

If a file is a directory and it contains a package.json file, then gash will use that file and invoke the root. (For example, gash c:\src\project would use c:\src\project\package.json)

Anything else should be an error.

Modules

When presented with a module, gash will import() it and look at the gash export.

If that's a string, then that is expected to be the Interface Definition Language (IDL) description of how to run the file.

For example, suppose my-cmd.mjs looks like this:

export function main() {
    return "Hello world!"; 
}

export const gash = "IDL=1 my-cmd :: main()"; 

Then gash ./my-cmd.mjs would echo "Hello world!";

The gash key can also be an array of strings - which will be joined with '\n'. Anything else is an error.

The Package Tree

Given a package.json, gash will build a package tree as follows:

  1. First it will add all the builtins to the tree. These can't be overriden.

  2. Then it looks at the "gash" key of the package.json.

    1. If the "gash" key is a string (or an array of strings), then a scritplet is added with the same name as the package. (The value of the gash key is the IDL for the main entry point---arrays are joined with line-breaks---and the IDL declared name must match the package name.)

    2. If it's a non-null object, then it's treated as scriptlet list.

    3. Otherwise, no root package will be available.

  3. If the package.json doesn't provide a gash scriptlet list (2ii), then the "dependencies" and "devDependencies" will be evaluated, and any explicit gash packages provide by them will be added to the tree. (I.e. to block this auto adding, you must supply an object to the gash key; you will then have to manually declare the main package.)

  4. Finally, any entries in the "scripts" key that begin gash will be added. (FIXME: all package scripts of the top level package should be available.)

Scriptlet List

{ 
  "name": "my-package",
  "main": "main.mjs",
  "gash": {
      "-- Keys beginning '--' are ignored":
         [ "(Yes, Crockford, when formal comments aren't available",
           "everybody invents their own incompatible syntax;"
           "better to have one agreed upon standard.)" ],

      "-- The package below will be called `my-package` and uses main.mjs": "",
      "my-package": [ "IDL=1", "-- My package!!", "my-package :: default()" ],

      "-- The package below will be called `package2`, and use the main of":"", 
      "-- the `package2` dependecy.": "",
      "package2":   "IDL=1 package2 :: default()",

      "-- The package names for this entry will be whatever pacakge3 provides": 
         "it may be `package3` or it could be `wibble` and `wobble`.", 
      "package3":   "*",
      
      "-- The package below will be called `script`, come what may.": "",
      "./script.mjs": "*",

      "-- The package below will be called `first-entry`": "",
      "./script2.mjs#first-entry": "IDL=1 first-entry :: first()",

      "-- The package below will be called `second-entry`": "",
      "./script2.mjs#second-entry": "IDL=1 $0 :: second()"

      "-- The package below is called 'max' and needs no module": "",
      "#max": "IDL=1 max INT... :: with() Math.max(...$1)"
  },
  "dependencies": {
     "package1": "1.0",
     "package2": "1.1" 
  },
  "devDependencies": {
    "package3": "2.0"
  } 
} 

In the above, the scriptlet list is value of the "gash" entry - i.e.:

{
    "my-package": [ "IDL=1", "-- My package!!", "my-package :: default()" ],
    "package2":   "IDL=1 package2 :: default()",
    "package3":   "*",
    "./script.mjs": "*",
    "./script1.mjs#first-entry": "IDL=1 first-entry :: first()",   
    "./script2.mjs#second-entry": "IDL=1 $0 :: second()",
    "#max": "IDL=1 max INT... :: with() Math.max(...$1)"   
}

Entries where the key begins "--" are ignored. Other keys can be:

  • the name of a package declared in the outer package.json. This must either match the "name" field of the outer package.json, itself, or be listed in its "dependencies" and "devDependencies" fields.
  • the url of an ESM module. Currently all urls must begin ./ or node: Relative urls should be restricted to the subtree of the package (but this is not yet enforced).

  • a fragment (i.e. something beginning '#'). The url used will be that of an empty module. This is mainly used to declare gash's builtin comamnds.

The values for each entry are either the explicit IDL to use or "*" to indicate the IDL (and possibly commands) will be "imported" from the module or package. ("package3" and "./script.mjs" in the above.)

Scriptlet Names (External Names)

The name of a scriptlet is deduced from the key in the scriptlet list.

When the key is an URL, and there's no "#"-fragment, then the path's basename is used, stripped of any extension. (For example, ./script.mjs will create a scriptlet called script.)

If the URL has a fragment, then the fragment (stripped of the '#') will be used as the scriptlet name. (For example, ./script1.mjs#first-entry will create a scriptlet called first-entry, and ./script1.mjs#second-entry will create a scriptlet called second-entry.)

When a scritplet's name is a dependency name (i.e. another package), then the scriptlet name will be the dependency's name.

When the IDL for a dependency is '*' and the dependency provides multiple scriptlets, then they will be created as sub-commands of the scriptlet:

For example, if "package3" has this as it's package.json:

{   
   "name":"package3",
   "main": "./main.mjs",
   "gash": {
       "./main.mjs#wibble": "IDL=1 wibble :: wibble()", 
       "./main.mjs#wobble": "IDL=1 wobble :: wobble()"
   } 
} 

Then the command wibble and wobble will be accessible as package3 wibble and package3 wobble (PKT-MULTI)

IDL Names (Internal Names)

The IDL also gives a name for the scriptlet. This must match the "external" scriptlet name.

For example, if a script is called "./script.mjs" then the IDL can't be "IDL=1 cmd :: default()" (NAM-ESM) it has to be something like "IDL=1 script :: default()". Likewise this package is misnamed: (NAM-PKG)

{
    "name": "cmd",
    "main": "./main.mjs",
    "gash": "IDL=1 command :: default()"
}

Again, the "gash" key has to be something like "IDL=1 cmd :: default()".

However the IDL can use "$0" as the scriptlet name, and that will always take on the external name; (NAM-$0) for example, the scriplet in this package is called cmd-thing.

{
    "name": "cmd-thing",
    "main": "./main.mjs",
    "gash": "IDL=1 $0 :: default()"
}

Exit codes

gash exits with the following codes:

codemeaning
0The command line was successful.
1The command line was executed without an exception, but return false.
2The scriptlet itself threw an exception - i.e. user code. (EXIT-USER-EX)
5Some other error.

builtins

Gash provides a number of builtin scriptlets. As follows:

template

template (JSON|FILE) TEMPLATE_FILE

The template builtin evaluates the contents of TEMPLATE_FILE as if it were a string quoted with `` - i.e. a javascript template literal. (JSON|FILE) is a dicionary that provides the variables used for substitution.

For example, given a file vars.json:

{
    "ghost": "Boo!"
}

And a template file, template.txt:

I said to the ghost, "${ghost}" 

Then the output of gash '"template vars.json template.txt"' will be I said to the ghost, "Boo!" (BUILTIN-TEMPLATE-FILE)

If the json has come from another scriptlet, then it can be piped to template with the file topic; e.g. json-outputting-cmd | template - template.shtml (BUILTIN-TEMPLATE-FILETOPIC)

IDL

An IDL string looks like this:

IDL=<version> <usage> :: <binding>
  • <version> has to be 1.
  • <usage> describes the command line options and parameters and defaults.
  • And the <binding> describes how to invoke the javascript.

Overview of Usage

A typical usage string might be:

cmd [--show-line-numbers] [--max=COUNT] [--pattern=(regex|glob)] TEXT [SOURCE_FILE...] RESULT_FILE

It's roughly as you'd expect:

  • The scriptlet name must match the "external" name. or be the special "$0". Because of this it's allowed to be mixed case. (NAM-MIX)
  • Long options must be lower case - separated by hyphens ("kebab case"). They must come before any positionals. Options without values (booleans) can have an "--no-" to turn it off.
  • UPPERCASE text indicates values the user must supply, and must match a type - see below. In some cases, "unions" are supported, e.g. (HTML_FILE|TEXT_FILE).
  • Square brackets indicate optional values. Positionals can be nested, as long, as its unambiguous.
  • An elipsis indicates a positional is repeated (e.g. FILE...). (RE-POS RE-OUT RE-IN) Options can also be repeated, but it must be optional and the elipsis must be outside the square brackets (e.g. [--option=TEXT]...) (MULT-OUT MULT-IN) Repeatd booleans (e.g. [--verbose]...) become a count of the number of times applied.
  • Lower case text (that's not an option) is literal text. Brackets and vertical bars can be used to create alternatives (i.e. in the above example the --pattern option, if present, must take the value "regex" or "glob").
  • Short options aren't show above, but they can created as (-c|--long-value).
  • Help text can be placed after the scriptlet name and options with a "-- " comment:
    cmd          -- This does some important stuff. 
      [--option] -- Turn the optional thing on.

You can also place a description of the comamnd after "IDL=1" instead of after the scriptlet name. Positionals currently can't receive help text.

  • It's possible to set bash-style "environment vars" before the scriptlet name, for example OUTPUT_FORMAT=Table calc-report SOURCE_FILE. These supplied default text values. Can be useful for platform options and where configuring an option is hard.

Overview of Binding

A typical binding string might be:

someFunction( $text, $2, { maxCount = 4, showLineNumbers, pattern = "text" }) -> $3

This is a function call - but with defaulting (like a normal function declaration) and some historic magic.

  • There are two forms of binding: a function-form and an expression form. In both cases, the exports of the module are in scope. (With any default export available as default.)

    1. For the function-form, the binding must begin with a call to a function. (Historic quirk: it's possible to omit the name and default was assumed. Don't do that.) But there can't be anything after the function call. And it must be the only export that is used.
    2. The expression-form begins with(), and the with must include all the identifiers from the module that will be used. A normal expression can then follow (e.g. with(default,TYPE_ONE) default( $1 = TYPE_ONE ) or with(Archive) new Archive($1).method($2)) This is really new, and probably full of bugs.

  • The values of any command-line options are in scope as camel-case names. Optional values must have a value supplied via defaulting or ??.

  • The positonals are available as the customary $1, $2, etc... numbered as per the order they occur in the binding.
  • If a type name only occurs once as a positional, then it is available as camel cased name with a leading $. (So, in this case, $1 and $text are the same variable, as are $sourceFile and $2 and $resultFile and $3. (BIND-IPOS BIND-SUB)
  • stdin/stdout is combined in a magical FILE-like object called $-.
  • The = operator is not used for assigment but for "defaulting" (as per function declaration); i.e. it assigns only if the value doesn't already have a value.
  • When the output of a command is to be stored in a file supplied on the command line, use arrow-assigment (-> $3). But if the output is going to stdout, you don't need this.
  • All operators are missing - the only ones supported are ?? and ?: (as well as . and []). Some maths is available via Math.sum(), Math.product(), Math.bits(), Math.reciprocal(), Math.neg(), Math.div(), and Math.quotient(). (But why are you doing complex arithmetic in glue code?) And Math.equal() will say if two numbers are equal.
  • ??is actually a catch operator.
  • The proposed pipeline operator |> is available. Also, topics can be numbered, with '%1' being the output of the original function:
    IDL=1 
    cmd ARCHIVE_FILE FILE... 
    :: with (Archive)
         new Archive()
         |> $2.forEach( file => %.add( file ) )
         |> %1.toUint8Array() -> $archiveFile'
  • It's an async context with await available.
  • The unary '' operator is equivalent to [Symbol.iterator] and iterators have a map on their prototype. (So `files.map( f => f.name,f.size)` returns an iterator that iterates the names and sizes of files.)

Boolean options

Options that don't take a value are called booleans options.

export const gash=`IDL=1 boolie [--flag] :: default(flag);
export default function(flag) { return flag ? "enabled" : "disabled" }

A boolean's lexical variable defaults to false (e.g. flag in the above is false and gash boolie returns "disabled") (BOOL-UNSET) and becomes true, when the option is present in the command-line invocation (e.g., gash boolie --flag returns "enabled".) (BOOL-SET)

However booleans options that begin --no- have the "no"-prefix removed from their lexical name and the usual semantics are reveresed. So for example:

export const gash=`IDL=1 nolly [--no-can-do] :: default(canDo);
export default function(canDo) { return canDo ? "done" : "You're kidding me, right?!" }

in the above, gash nolly retuns "done" (NO-UNSET) and gash --no-can-do returns "You're kidding me, right?!" (NO-SET).

Tristate

It's possible to declare both a boolean option and it's negation in the usage; for example:

export const gash=`IDL=1 tristate [--no-state] [--state] 
:: default((state?1:-1) ?? 0)`;

In these cases, either the option (e.g., gash tristate --state (TRISTATE-TRUE)) or it's negation (e.g. gash tristate --no-state (TRISTATE-FALSE)) can be supplied on the CLI. (But not both. {})

If the option is not supplied, then it will be "missing" and a default must be set. (TRISTATE-ABSENT) (Unfortunately, the lack of a default in the binding can only be picked up at run time.) (FYI Missing values parse through the ternary operator and are caught by ?? - which is why (state?1:-1)??0 works.)

Booleans options can only occur once in the usage. (For example, IDL=1 cmd [--flag] [--flag] throws.) (NODUP-BOOL)

Recurring

"Boolean" options may be repeated:

export const gash = `IDL=1 loggable [(--verbose|-v)]... :: default({verbose})`; 

Their lexical value becomes the number of time they're repeated on the command-line - defaulting to zero if absent. (For example, loggable is called with {verbose:0}, loggable --verbose -v is called with {verbose:2} and loggable -vvv is called with {verbose:3}.) (MULTIBOOL-*)

STRING, STR and TEXT positionals and options.

Positional parameters can be declared as STRING, STR, or TEXT:

export const gash=`IDL=1 cmd STR STRING TEXT :: default($1,$2,$3);
export default function(...strings) { return strings }

Each is passed to the javascript as \ value.

Likewise, options can be given STRING, STR, or TEXT values:

export const gash=`IDL=1 cmd --scheme=STR --host=STRING --path=TEXT :: default(scheme,host,path);`
export default function(scheme,host,path) { return `${scheme}://${host}/${path}` }

When options are genuinely optional (i.e. enclosed in square brackets) using omitted values is an error; so a default must be supplied:

export const gash=`IDL=1 
     cmd [--scheme=STR] [--host=STRING] [--path=TEXT] 
  :: default(scheme = 'https:', host = '127.0.0.1', path = '' )`;

Methods

Inside gash all strings have the following extra methods ("flying monkeys"):

hash([seed])
  • seed \ The seed to use. Defaults to a (non-cryptographic) random number.
  • Returns: \

Calculate a fast, non-cryptographic hash of the string. (It's currently the murmur3 32bit hash of the string's utf8 encoding.)

{
    "gash": [
        "IDL=1",
        "shuffle --seed=NUMBER STRING...", 
        ":: with( )",
        "    $1.toSorted( ( a, b ) => Math.diff( a.hash( seed ), b.hash( seed ) ) )",
    ]   
}

FILE, FILE_NAME and FILENAME positionals and named options

Positional and named options can declared can be declared as FILE, FILE_NAME and FILENAME.

FILE values are passed to the javascript as \ objects (where available) or \ (platforms without buffer) - see File.toBuffer() (FILE-REAL)
FILE_NAME and FILENAME options are passed to the javascript as strings. (FN-REAL)

However, inside gash (i.e. in the binding) they all have the same API, roughly modelled on the File API File and set out below:

Properties

name

A string containing the basename of the file. (For example, a scriptlet declares as IDL=1 cmd FILE :: with() $1.name called as cmd some/path/to/file.txt will return file.txt) (FILE-NAME)

webkitRelativePath

A string containing the name the user supplied on the command line. (For example, a scriptlet declared as IDL=1 cmd --option=FILENAME :: with() option.webkitRelativePath and called as cmd some/path/to/file.txt will return some/path/to/file.txt) (FILE-RELPATH)

fullPath

A string containing the absolute path (For example, a scriptlet declared as IDL=1 cmd FILE_NAME :: with() $1.fullpath and called as cmd some/file.txt might return return c:\\dir\\some\\file.txt) (FILE-FULLPATH) (This property comes from FileSystemEntry.)

ext

A string containing the file's extension including the leading period - exactly as returned by node's extname() command. (For example, a scriptlet declares as IDL=1 cmd FILE :: with() $1.ext called as cmd some/path/to/file.txt will return .txt) (FILE-EXT)

Note, if the name begins with a dot, then that doesn't count as an extension. (For example, the previous command, invoked as cmd some/path/to/.configrc returns an empty string.) (FILE-DOTTED-NAME)

Source: Bespoke

extlessName

A string containing the file's basename, sans extension. (For example, a scriptlet declares as IDL=1 cmd FILE :: with() $1.extless called as cmd some/path/to/file.txt will return file) (FILE-EXTLESSNAME)

Note, if the name begins with a dot, then doesn't count as an extension. (For example, the previous command, invoked as cmd some/path/to/.configrc returns .configrc.) (FILE-DOTTED-NAME)

Source: Bespoke

dir

A string containing the directory of the file's absolute path. (For example, a scriptlet declares as IDL=1 cmd FILE :: with() $1.dir called as cmd /some/path/to/file.txt will return /some/path/to) (FILE-DIR) Source: Bespoke

isFile

A boolean that is true if the file exists and is an ordinary file. (FILE-IS) [Source: FileSystemEntry.])

isDirectory

A boolean that is always false (FILE-NOTDIR) [Source: FileSystemEntry]

size

The size in bytes of the file, if it exists. (FILE-SIZE) Or NaN if it doesn't. (FILE-SIZE-MISSING}]

lastModified

The number of milliseconds (probably excluding leap seconds) since 1st January 1970, if the file exists. (FILE-DATE) Or NaN if it doesn't. (FILE-DATE-MISSING}]

type

A string containg the mimetype of the file. (FILE-TYPE) This is currently deduced from the filename alone, so will be defined even if the file doesn't exists. (FILE-TYPE-MISSING) And if the file name has no extension (FILE-TYPE-NOEXT) or the extension can refer to multiple file types (e.g. .ico), or the extension is not known, then the empty string is returned.

Most of this is consistent with what browsers do; see MDN.

Methods

(NB, the <type>() methods return promises---thanks File---and the to<Type>() methods return synchronous results.)

toArrayBuffer()

Return the content of the file a \ (FILE-TO-ARRAYBUFFER)

On node 22, this invariably involves creating a copy of the buffer. (Because the buffers return by readFileSync() are partial views of the underlying ArrayBuffer and so the whole ArrayBuffer has to be copied.) Using toBuffer() avoids that.

arrayBuffer()

Asynchronous version of toArrayBuffer(); i.e. return the contents of toArrayBuffer() wrapped in a proimise. (FILE-ARRAYBUFFER)

toBuffer()

Return the content of the file as a \ (node.js) or an \ (other platforms.) (For example, a scriptlet declared as IDL=1 cmd FILE :: with() $1.toBuffer() and called as cmd some/file.txt would return the contents of some/file.txt as a Buffer) (FILE-TO-BUFFER)

If you try and pass a File instance from gash to a javascript function, then this method will be automaticall used. (For example, IDL=1 cmd FILE :: with() $1 returns the same as IDL=1 cmd FILE :: with() $1.toBuffer()) This is a process known as type realisation. This is partly historic (File wasn't in node when gash was created) and also because the object used by gash isn't a File. But 9-times-out-of-10, it's what's wanted and a real File is a asynchronous hell hole. (FILE-REAL)

buffer()

Return the contents of file.toBuffer() wrapped in a promise. (FILE-BUFFER) For symmetry.

database([TABLE_NAME])

This returns an sqlite3 database. The following files are supported:

  • .sq3 Opened as an sqlie3 table for read/write.
  • .csv Converted into a :memory: database using TABLE_NAME as the name of the table (default "data"). No write.
  • .json This must be an array of objects. It will be converted into a :memory: database with a table called TABLE_NAME and there will be a column for each field in the object; exactly as if you called console.table() on it. No write.

Currently, this uses better-sqlite3 and relies on that being available. But the plan is to transition to node's sqlite implementation once that becomes stable.

toJSON()

A convenience method that parses the contents of the file with JSON.parse(). (FILE-TO-JSON)

json()

A promise containing the contents of the file parsed via JSON.parse(). (FILE-JSON)

toLines()

Return an iterator that iterates over the lines in the file, returning the text content of each line. The line seperator can be \n or \r\n or EOF. It's not included the text returned. (FILE-TO-LINES)

lines()

Return an async iterator that iterates over the lines in the file. Apaert from async, it's the same as toLines() (FILE-TO-LINES)

toText()

Return the content of the file as utf8 encoded string. (For example, a scriptlet declared as IDL=1 cmd FILE :: with() $1.toText() and called as cmd some/file.txt would return the contents of some/file.txt) (FILE-TO-TEXT)

text()

For compatibility with File, this method return the contents of file.toText() wrapped in a promise. (FILE-TEXT)

toURL()

Return the full path to the file as a file: url (FILE-URL}, even if the file doesn't exist. (FILE-URL-MISSING)

(JSON|FILE)

Positional and named options can declared can be declared as (JSON|FILE) This can be either literal JSON, or a file which is parsed as JSON.

On command line:

When invoked from the command line, gash will parse the argument as JSON, iff it begins { or [. Otherwise it's treated as a filename.

For example, gash '"template `{}` file.txt"' Passes template an empty dictionary (i.e the javascript object {}) and not the contents of a file called {} (and, yes, that is a valid filename). On the other hand, gash '"template true file.txt"' will pass template the contents of a file called true and not the literal value true.

Passing literal JSON is virtually impossible thanks to the quoting minefield. But if you really must:

  • From a POSIX shell: gash 'template `{"key":"value string"}` template-file.txt' is probably easiest. (LITERAL-JSON-DQ)
  • From windows cmd.exe, gash "template `{""key"":""value string""}` template-file.txt" should work. (No test)
  • And if you want it to be cross-platform, it's gash '"template `{\x22key\x22:\x22value string\x22}` template-file.txt"' (LITERAL-JSON-HEX)
  • You can cut down on quotes by using expressions; e.g. gash '"template ${ ( {key:`value string`,count:2} ) } template-file.txt"'. Note the need for an extra set of parenthesis inside ${} and the use template literals rather than quoted strings. That's cross platform. (And the spaces inside ${} are for exposition and can be omitted.) (LITERAL-JSON-EXPR)

From Javascript:

The contents of a (JSON|FILE) argument are passed to javascript as the JSON object.

Small print: Actually, inside IDL, the object is either the literal JSON or a FILE (as described above) with all the file methods. If it's a file (including the file topic) it's altered so it undergoes type realisation via the toJSON() method. Meanwhile, literal JSON probably ought to be wrapped in JSON.parse(JSON.stringify(value)) to confirm it's valid JSON, but that doesn't happen (for now) so just about anything can be passed through...

Other Recognised Types

These exist by remain undocumented:

TypeJSDesc
INT, INTEGERnumber bigintAn integer - can end up as either an Number or BigInt
COUNTnumberAn unsigned integer - can end up as either an Number or BigInt.
NAMEstringA string, but with different semantic connetations.
URLWhatWG/node URL
JSONobject
DATEDate
DIR?Partly FileSystemDirectoryEntry, partly File.
DIRNAME, DIR_NAMEstringBut with likely special rules
SCRIPTLET?This will be resolved to a scriptlet; resolve is IDL=1 resolve [--idl] SCRIPTLET :: with() $1[idl?'idl':'moduleUrl'].toString()

Subtypes

You may prefix a type name by an arbitary (uppercase) string followed by an underscore. E.g. COUNT could be BYTE_COUNT or RECORD_COUNT. (These are called "subtypes".)

Subtypes may have special meanings. For files, it's the extension; e.g. PNG_FILE means a *.png file is expected.

Index Types

Positional arguments may append a number as a suffix to the type e.g. cmd FILE1 FILE2. (But not cmd --file=FILE1.)

Type Unions

Its possible to combine types with brackets and vertical bars, e.g. (TXT_FILE|HTML_FILE) to indicate the value must be one of the listed types.

It's always possible to combine subtypes, as a union e.g. (PNG_FILE|JPEG_FILE). Most others are currently not possible. But (FILE|URL) is specially handled (as is the even more exotic (FILE|DIR).) You will need to figure out ways to handle this in the binding.

Type realisation

Type realisation is a behind-the-scenes cast which gash inserts to turn its internal types into the publically declared one.

For example in this code:

{
  "name": "cmd",
  "gash": "IDL=1 cmd GLOB :: callback($glob)" 
} 

Callback is called with a string. But globs aren't stored internally as strings - they are instances of a class called GlobPattern and un unwritten $glob.toString() call is injected. That is "type realisation".

The most obvious example of this is integers. Integers are stored in a class called PotentiallyUnsafeInteger but in this code:

{
  "name": "repeat",
  "gash": "IDL=1 repeat COUNT :: callback($1)" 
} 

$1 is realised as an ordinary javascript number - provided it's not too big. But if COUNT is too big to be accurately represented as an integer (e.g. repeat 9007200254740992) then an error will be thrown.

However it can be cast via toNumber() and toBigInt(). For example:

{
  "name": "repeat",
  "gash": "IDL=1 repeat COUNT :: callback($1.toNumber())" 
} 

And now repeat 9007200254740992 will be called with the Number(9007200254740992). (And so will repeat 9007200254740993 be called 9007200254740992 - because a double can't represent 9007200254740993.)

Alternatively, if this matters, cast it to BigInt.

{
  "name": "repeat",
  "gash": "IDL=1 repeat COUNT :: callback($1.toBigInt())" 
}   

repeat 9007200254740992 and repeat 9007200254740993 are now called with 9007200254740992n and 9007200254740993n.

Flying Monkeys

If you're a javascript programmer, you'll have heard of monkey patching. (Don't do it, kids. Just don't do it.)

A flying monkey is a monkey patch that doesn't land. Instead of patching the object, there's a flying monkey that watches the object and swoops in and steals the reference when you try to use it. But it exists only in gash's other-side-of-the-rainbow realm; so you may interact with the flying monkey's shadow, but never reach it.

Which is a poetic way of saying it's a API hack which is only available from inside a gash binding.

For example,

{
  "name": "repeat"
  "gash": 
     "IDL=1 cmd URL :: default(fetch($url).buffer())" 
   
} 

fetch() returns a normal Response object. And no Response.prototype.buffer has been added. Instead a flying monkey swoops in and implements that convenience for you (a shortcut for Buffer.from(await fetch($1).arrayBuffer())).

It some cases, flying monkeys can be whole types. For example, the File class is a flying monkey. If that's the case, the class will undergo type realisation before being passed to your code.

Current list of flying monkeys:

FunctionCode
Response.prototype.buffer() => Buffer.from(await this.arrayBuffer())
Iterator.prototype.mapfunction*(callback){for (const value of this) yield callback(value)}

@option decorator (aka 'Inline Options')

The @option "decorator" can be used in the bindings to create options without declaring them in the usage. For example:

export const gash = `IDL=1 
cmd :: 
default( @option(File) file, @option count = 4, { @option flag = false })` 

is equivalent to:

export const gash = `IDL=1 
cmd [--count=INT] [--flag] --file=FILE  
:: default( file, count = 4, { flag })` 

@option decorates the identifier; so for example:

export const gash = `IDL=1
cmd 
:: default( {useThing: @option thing } )`

is fine.

The argument to the @option decorator is the options type. If you default an option, the types must agree. (@OPTION-TYPEMATCH) For example, this is illegal:

export const gash = `IDL=1 cmd :: default(@option(String) foo = 4)`;

Globals Objects

Invocation Globals

These globals are available to scriptlets that are executed by gash.

Console

All scripts executing under gash, have globalThis.console remapped to a subtyped version of node's Console class. All messages are written to stderr (whereas, on node, some console messages write to stdout and some to stderr). See --log=FILE on how to remap this

It adds three new method:

console.status?.( [data],[...args] )

console.status?.() writes a single line "status" message to the output stream.

When writing to the screen, the status message overwrites previous status messages, but isn't overwritten by other log messages (which appear above it).

When console is writing to a file, this message is not recorded - unless console.statusFlush?.() is called.

It's intended to give feed back to the user on what the script is doing (e.g. the number of bytes downloaded or the percentage of the job complete).

If a scriptlet executes successfully, console.statusClear() is called. If an error occurs, console.statusFlush() is called.

You should always call this as console.status?.() so that your script works with non-gash implementations.

console.statusClear?.()

Erases the status message.

console.statusFlush?.()

This "flushes" the current status message to the regular log and clears the status line. (If console is writing to a file, then the message will be written to the log. When writing to the terminal, it will start scrolling up the screen and new status and log messages will appear below it.)

Use this once a task has completed to record the final state (e.g. the bytes downloaded). (Although a more robust implementation would call console.statusClear?.() and then use console.log() to record a message that all consoles can use.)

Bind Globals

The following variables are available during invocation:

EOL

Set to the OS line ending. (?To do: --output-line-separator?)

fetch

gash has a very hacked implementation of fetch that can handle file urls. This is used internally. And, if needed, it can be passed through as an argument.

//! examples/cat.mjs
export const gash = "IDL=1 cat FILENAME :: default($1.toURL(),{fetch})";

export default async function cat( url, {fetch} ) {
    return await (await fetch(url)).text();    
}

gash

There is a gash global that allows access to subshells.

For example, this will cause gash all to run both foo and bar:

{
    "gash": {
        "./foo.mjs": "*",
        "./bar.mjs": "IDL=1 bar :: default()",
        "#all": "IDL=1 all :: gash('foo && bar')"
    }
}

But it's real strength is allowing scripts that leverage gash's argument parsing. For example, this means you can do gash start or gash start --port=99:

{
    "gash": {
        "#start": 
           "IDL=1 start [--port=INT] :: gash(['xwh','--safe-call',`${port??env.PORT??80}`])" 
    },
    "scripts": {
        "start": "gash start"
    }
}

(This also allows defaulting of positionals which currently isn't possible any other way.)

The gash global expects either string (SH-STR) or an Array of strings(SH-ARRAY-*) as an argument. Note, that most parameters will be instanced, so this would be broken as port is a number, not a string:

{
    "gash": {
        "#start": "IDL=1 start --port=INT :: gash(['xwh',port])" 
    }
}

(This will be fixed at some point.)

The gash global only exists for blank modules. It can't be called using the with()(SH-NO-WITH) The return type is magical and shouldn't be messed with. (Tested needed)

sessionId

This is always set to the NIL UUID 00000000-0000-0000-0000-000000000000 (Although that may change in the future.) It's provided for compatibility with the web host, xwh.

Changes

1.0.20250311

  • FIXES some glaring typos in the introductary examples, and rewords them to hopefully make them clearer.

1.0.20250310

  • FIXES help throwing errors for scriptlets with mandatory options (e.g. gash help echo-value no longer throws).

  • FIXES: help calling us js-hell; it nows calls us gash ;)