target-clickhouse v2.6.1
Target Clickhouse
A Singer target for Clickhouse, for use with Singer streams generated by Singer taps, written in Node.js using singer-node.
Usage
Install
As npm package on host
npm install -g target-clickhouse
Docker image
docker pull ghcr.io/biron-bi/target-clickhouse
Run
Create a config file config.json with connection information and ingestion parameters.
{ "host": "localhost", "port": 8123, "database": "destination_database", "username": "user", "password": "averysecurepassword" }
Run target-clickhouse against a Singer tap.
In the following examples:
We append the emitted state to the end of a 'state.jsonl' file
The file current_state.json contains the last line of state.jsonl
The file config.json contains the Clickhouse connection information
Npm package:
<tap-anything> --state current_state.json | target-clickhouse --config config.json >> state.jsonl
Docker:
In this example, the container reads the config file from a /config directory
<tap-anything> --state current_state.json | docker run --rm -i -a STDIN -a STDOUT -a STDERR -v "$(pwd):/config:ro" ghcr.io/biron-bi/target-clickhouse --config /config/config.json >> state.jsonl
Config.json
The fields available to be specified in the config file.
Mandatory fields
- host
- port
- username
- password
- database
Optional fields
- logging_level: Default "INFO"
- subtable_separator: Default "__"
- translate_values: Whether fields should be parsed again to allow conversion of specific values, e.g. "True" accepted as true. Default false
- batch_size: Number of records to read before sending to Clickhouse. Default 100
- finalize_concurrency: Number of concurrent stream ingestion finalisations. Default 3
- extra_active_tables: List of tables that are considered active even if not present in the ACTIVE_STREAMS message. Default []
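To make the mandatory fields and documented defaults concrete, here is a minimal sketch of how a config object could be checked and completed before use. It is not the target's actual loading code: resolveConfig and defaultOptions are hypothetical names, and only the field names and default values come from the lists above.

```javascript
// Sketch: validate mandatory fields and apply the documented defaults.
// resolveConfig and defaultOptions are hypothetical helpers for illustration.
const MANDATORY_FIELDS = ["host", "port", "username", "password", "database"];

// Build a fresh defaults object on each call so arrays are never shared.
function defaultOptions() {
  return {
    logging_level: "INFO",
    subtable_separator: "__",
    translate_values: false,
    batch_size: 100,
    finalize_concurrency: 3,
    extra_active_tables: [],
  };
}

// Throw if a mandatory field is missing, otherwise return the config
// with defaults filled in for every optional field that was not set.
function resolveConfig(rawConfig) {
  const missing = MANDATORY_FIELDS.filter(
    (field) => rawConfig[field] === undefined
  );
  if (missing.length > 0) {
    throw new Error(`Missing mandatory config fields: ${missing.join(", ")}`);
  }
  return { ...defaultOptions(), ...rawConfig };
}

const example = resolveConfig({
  host: "localhost",
  port: 8123,
  database: "destination_database",
  username: "user",
  password: "averysecurepassword",
  batch_size: 500, // overrides the default of 100
});
console.log(example.batch_size, example.subtable_separator);
```

Values given in the file take precedence over the defaults because they are spread last.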
Singer specification extension
Several features are supported that are not part of the standard Singer spec:
- Update schemas: Pass the repeatable CLI option --update-streams <stream> to specify streams for which you want to recreate tables (root and children).
- Clean first: Specify clean_first: true in SCHEMA messages to wipe table content before each ingestion.
- Cleaning column: Specify cleaning_column: "<column_name>" in SCHEMA messages to wipe table content that matches the column value during ingestion. For instance, if the column "date" is specified as the cleaning column and the value "2022-01-01" is encountered in a record, all rows with the value "2022-01-01" are replaced with those contained in the stream.
- All key properties: Specify all_key_properties: {props: [], children: {}} in SCHEMA messages to specify primary keys for all children of a root table. This allows children to create a foreign key to their parent (with the format _parent_<column>).
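As an illustration of the extensions listed above, the following sketch builds a SCHEMA message carrying cleaning_column and all_key_properties and prints it as a single JSON line, as a tap would. The stream name "orders", its columns, and the child "items" are hypothetical. The nested shape under children (the same {props, children} structure keyed by child name) and the placement of the extension keys at the top level of the message are assumptions, since only the outer shape is documented here.

```javascript
// Sketch: a SCHEMA message using the extension keys described above.
// Stream, column, and child names are hypothetical examples.
function buildSchemaMessage() {
  return {
    type: "SCHEMA",
    stream: "orders",
    schema: {
      type: "object",
      properties: {
        id: { type: "integer" },
        date: { type: "string", format: "date" },
        items: {
          type: "array",
          items: {
            type: "object",
            properties: { sku: { type: "string" } },
          },
        },
      },
    },
    key_properties: ["id"],
    // Rows whose "date" matches a value seen in the stream get replaced.
    // To wipe the whole table instead, clean_first: true would be used.
    cleaning_column: "date",
    // Assumed nesting: each child repeats the {props, children} structure.
    all_key_properties: {
      props: ["id"],
      children: {
        items: { props: ["sku"], children: {} },
      },
    },
  };
}

// Singer messages are emitted one JSON document per line on stdout.
const schemaLine = JSON.stringify(buildSchemaMessage());
console.log(schemaLine);
```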
Sponsorship
Target Clickhouse is written and maintained by Biron https://birondata.com/
Acknowledgements
Special thanks to the people who built
License
Distributed under the AGPLv3