1.0.11 • Published 10 months ago

@disane-dev/private-ci-test v1.0.11

License
MIT
Repository
github
Last release
10 months ago

Document scraper for getting invoices automagically as PDF (useful for taxes or a DMS)

🏠 Homepage

Prerequisites

  • npm >=9.1.2
  • node >=18.12.1

Configuration

All settings can be changed via CLI flags or environment variables (the latter also work when using Docker).

| Setting | Description | Default value |
| --- | --- | --- |
| AMAZON_USERNAME | Your Amazon username | null |
| AMAZON_PASSWORD | Your Amazon password | null |
| AMAZON_TLD | Amazon top level domain | de |
| AMAZON_YEAR_FILTER | Only extracts invoices from this year (e.g. 2023) | 2023 |
| AMAZON_PAGE_FILTER | Only extracts invoices from this page (e.g. 2) | null |
| AMAZON_ONLY_NEW | Tracks already scraped documents and starts a new run at the last scraped one | true |
| FILE_DESTINATION_FOLDER | Destination path for all scraped documents | ./documents/ |
| FILE_FALLBACK_EXTENSION | Fallback extension when no extension can be determined | .pdf |
| DEBUG | Debug flag (sets the log level to DEBUG) | false |
| SUBFOLDER_FOR_PAGES | Creates subfolders for every scraped page/plugin | false |
| LOG_PATH | Sets the log path | ./logs/ |
| LOG_LEVEL | Log level (see https://github.com/winstonjs/winston#logging-levels) | info |
| RECURRING | Flag for executing the script periodically. Needs RECURRING_PATTERN to be set. Defaults to true when using the Docker container | false |
| RECURRING_PATTERN | Cron pattern to execute periodically. Needs RECURRING to be set to true | */30 * * * * |
| TZ | Timezone used for Docker environments | Europe/Berlin |
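
As a minimal sketch of the environment-variable route, the snippet below exports a few of the settings listed above and then starts a scrape. All values are placeholders. That the CLI picks these variables up outside of Docker is an assumption based on the statement above, not verified behavior.

```shell
#!/bin/sh
# Placeholder credentials; replace with your own and keep real secrets out of version control.
export AMAZON_USERNAME='you@example.com'
export AMAZON_PASSWORD='change-me'
export AMAZON_TLD='de'
export AMAZON_YEAR_FILTER='2023'
export FILE_DESTINATION_FOLDER='./documents/'
export LOG_LEVEL='info'

# Run the scraper only if the CLI is installed; otherwise just say so.
if command -v docudigger >/dev/null 2>&1; then
  docudigger scrape all
else
  echo "docudigger not found; install it first (see Usage)"
fi
```

Settings that are not exported should fall back to the defaults from the table.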

Install

npm install

Usage

$ npm install -g @disane-dev/private-ci-test
$ docudigger COMMAND
running command...
$ docudigger (--version)
@disane-dev/private-ci-test/1.0.11 linux-x64 node-v18.16.1
$ docudigger --help [COMMAND]
USAGE
  $ docudigger COMMAND
...

docudigger scrape all

Scrapes all websites periodically (default for docker environment)

USAGE
  $ docudigger scrape all [--json] [--logLevel trace|debug|info|warn|error] [-d] [-l <value>] [-c <value> -r]

FLAGS
  -c, --recurringCron=<value>  [default: * * * * *] Cron pattern to execute periodically
  -d, --debug
  -l, --logPath=<value>        [default: ./logs/] Log path
  -r, --recurring
  --logLevel=<option>          [default: info] Specify level for logging.
                               <options: trace|debug|info|warn|error>

GLOBAL FLAGS
  --json  Format output as json.

DESCRIPTION
  Scrapes all websites periodically

EXAMPLES
  $ docudigger scrape all
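
A slightly fuller invocation might look like the sketch below, which combines the recurring flags from the help text above. It assumes the standard five-field cron syntax (minute, hour, day of month, month, day of week), under which `*/30 * * * *` fires every 30 minutes. The variable name `CRON_PATTERN` is only illustrative.

```shell
#!/bin/sh
# Cron fields: minute hour day-of-month month day-of-week.
# "*/30 * * * *" means "every 30 minutes".
CRON_PATTERN='*/30 * * * *'

if command -v docudigger >/dev/null 2>&1; then
  # -r enables recurring mode, -c sets the pattern.
  # The pattern is quoted so the shell does not expand the asterisks.
  docudigger scrape all -r -c "$CRON_PATTERN" --logLevel debug -l ./logs/
else
  echo "docudigger not found; would run: docudigger scrape all -r -c '$CRON_PATTERN' --logLevel debug -l ./logs/"
fi
```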

docudigger scrape amazon

Used to get invoices from amazon

USAGE
  $ docudigger scrape amazon -u <value> -p <value> [--json] [--logLevel trace|debug|info|warn|error] [-d] [-l
    <value>] [-c <value> -r] [--fileDestinationFolder <value>] [--fileFallbackExentension <value>] [-t <value>]
    [--yearFilter <value>] [--pageFilter <value>] [--onlyNew]

FLAGS
  -c, --recurringCron=<value>        [default: * * * * *] Cron pattern to execute periodically
  -d, --debug
  -l, --logPath=<value>              [default: ./logs/] Log path
  -p, --password=<value>             (required) Password
  -r, --recurring
  -t, --tld=<value>                  [default: de] Amazon top level domain
  -u, --username=<value>             (required) Username
  --fileDestinationFolder=<value>    [default: ./data/] Amazon top level domain
  --fileFallbackExentension=<value>  [default: .pdf] Amazon top level domain
  --logLevel=<option>                [default: info] Specify level for logging.
                                     <options: trace|debug|info|warn|error>
  --onlyNew                          Gets only new invoices
  --pageFilter=<value>               Filters a page
  --yearFilter=<value>               Filters a year

GLOBAL FLAGS
  --json  Format output as json.

DESCRIPTION
  Used to get invoices from amazon

  Scrapes amazon invoices

EXAMPLES
  $ docudigger scrape amazon
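
Since `-u` and `-p` are marked as required, a working call probably needs more than the bare example above. The sketch below uses only flags from the help text. The credentials are placeholders, and reading them from environment variables is a choice made here to keep them out of the shell history, not something the CLI requires.

```shell
#!/bin/sh
# Fall back to placeholder values if the variables are not already set.
: "${AMAZON_USERNAME:=you@example.com}"
: "${AMAZON_PASSWORD:=change-me}"

if command -v docudigger >/dev/null 2>&1; then
  # Fetch only new invoices from 2023 on amazon.de and store them in ./documents/.
  docudigger scrape amazon \
    -u "$AMAZON_USERNAME" \
    -p "$AMAZON_PASSWORD" \
    -t de \
    --yearFilter 2023 \
    --onlyNew \
    --fileDestinationFolder ./documents/
else
  echo "docudigger not found; install it first (see Usage)"
fi
```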

Docker

docker run \
  -e AMAZON_USERNAME='[YOUR MAIL]' \
  -e AMAZON_PASSWORD='[YOUR PW]' \
  -e AMAZON_TLD='de' \
  -e AMAZON_YEAR_FILTER='2020' \
  -e AMAZON_PAGE_FILTER='1' \
  -e LOG_LEVEL='info' \
  -v "C:/temp/docudigger/:/home/node/docudigger" \
  ghcr.io/disane87/docudigger
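
As an alternative sketch, the settings can be kept in a file and passed with Docker's `--env-file` option instead of individual `-e` flags. The file name `docudigger.env` and the host directory are hypothetical; the variable names and the container path are taken from this README. Docker env files take everything after `=` literally, so the cron pattern must not be quoted.

```shell
#!/bin/sh
# Write the settings to an env file (hypothetical name); values are placeholders.
cat > docudigger.env <<'EOF'
AMAZON_USERNAME=you@example.com
AMAZON_PASSWORD=change-me
AMAZON_TLD=de
RECURRING=true
RECURRING_PATTERN=*/30 * * * *
TZ=Europe/Berlin
EOF
# Restrict access, since the file contains a password.
chmod 600 docudigger.env

if command -v docker >/dev/null 2>&1; then
  # Mount a host directory so scraped documents survive the container.
  docker run \
    --env-file docudigger.env \
    -v "$(pwd)/docudigger:/home/node/docudigger" \
    ghcr.io/disane87/docudigger
else
  echo "docker not found; env file written to docudigger.env"
fi
```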

Dev-Time 🪲

NPM

npm install
[Adjust the created .env file to your needs]
npm run start

Author

👤 Marco Franke

🤝 Contributing

Contributions, issues and feature requests are welcome! Feel free to check the issues page. You can also take a look at the contributing guide.

Show your support

Give a ⭐️ if this project helped you!


This README was generated with ❤️ by readme-md-generator
