s3-migrate
A command-line tool to migrate objects from one S3-compatible storage bucket to another. Supports resuming interrupted transfers using a SQLite state file.
This tool can be useful in the following scenarios:
- You want to copy an entire bucket from one account to another, and you need two different sets of credentials for the source and destination accounts.
- You want to migrate objects from one S3-compatible service to another, such as from DigitalOcean Spaces to AWS S3 or vice versa.
> [!NOTE]
> This project is (currently) intended for a one-off migration, not to keep 2 buckets in sync.
> [!WARNING]
> The experimental nature of this project is to be taken very seriously. This is still a new project which has been tested only against a limited number of use cases and configurations. It might not work 100% with your specific configuration. Please use it with caution and report any issues you might find. Even better, consider opening a PR if you find a bug or a missing feature!
Features
- Supports different AWS accounts, regions, and even S3-compatible services (it uses the AWS SDK under the hood, but with the right configuration it should theoretically work with any S3-compatible service such as DigitalOcean Spaces, MinIO, Cloudflare R2, Backblaze B2, etc.)
- Uses Node.js streams for efficient transfers (data is transferred directly from the source to the destination without buffering the entire object in memory)
- Allows stopping and resuming with a local SQLite database
- Graceful shutdown on Ctrl+C
- Configurable concurrency level and chunk size for memory / performance tuning
- Progress bar and ETA for the copy process
Installation
Using npx, pnpm dlx, or yarn dlx
You can use the package directly without installing it globally with npm, pnpm, or yarn:
npx s3-migrate <command> [options]
pnpm dlx s3-migrate <command> [options]
yarn dlx s3-migrate <command> [options]
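For example, the catalog step described in STEP 1 below can be run without installing anything globally:

```sh
npx s3-migrate catalog --src-bucket-name my-source-bucket --state-file migration.db
```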
Installing as a global binary
If you prefer, you can also install the package globally with npm, pnpm, or yarn:
npm install -g s3-migrate
pnpm add -g s3-migrate
yarn global add s3-migrate
Now you can run the `s3-migrate` command from anywhere in your terminal.
Usage
STEP 0: Credential Configuration (Environment Variables)
Set the environment variables for source and destination credentials:
- `SRC_AWS_ACCESS_KEY_ID`: Your source AWS access key ID
- `SRC_AWS_SECRET_ACCESS_KEY`: Your source AWS secret access key
- `SRC_AWS_REGION`: Your source AWS region
- `SRC_AWS_SESSION_TOKEN`: (Optional) Your source AWS session token
- `DEST_AWS_ACCESS_KEY_ID`: Your destination AWS access key ID
- `DEST_AWS_SECRET_ACCESS_KEY`: Your destination AWS secret access key
- `DEST_AWS_REGION`: Your destination AWS region
- `DEST_AWS_SESSION_TOKEN`: (Optional) Your destination AWS session token
- `SRC_ENDPOINT`: (Optional) Custom endpoint for the source S3-compatible service
- `DEST_ENDPOINT`: (Optional) Custom endpoint for the destination S3-compatible service
> [!TIP]
> All the variables prefixed with `SRC_` or `DEST_` will fall back to the respective variable without the prefix if not found (which means that you can use `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` instead of `SRC_AWS_ACCESS_KEY_ID` and `SRC_AWS_SECRET_ACCESS_KEY`, or `DEST_AWS_ACCESS_KEY_ID` and `DEST_AWS_SECRET_ACCESS_KEY`).

> [!TIP]
> This project automatically loads `.env` files in the current directory. You can create a `.env` file with your values:

SRC_AWS_ACCESS_KEY_ID=your-source-key
SRC_AWS_SECRET_ACCESS_KEY=your-source-secret
SRC_AWS_REGION=your-source-region
DEST_AWS_ACCESS_KEY_ID=your-dest-key
DEST_AWS_SECRET_ACCESS_KEY=your-dest-secret
DEST_AWS_REGION=your-dest-region
SRC_ENDPOINT=your-source-endpoint (optional)
DEST_ENDPOINT=your-dest-endpoint (optional)
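For illustration, a hypothetical `.env` for migrating from DigitalOcean Spaces (source) to AWS S3 (destination) might look like the sketch below. The endpoint and region values are placeholders chosen for this example, not values the tool prescribes; check your provider's documentation for the correct ones, and omit `DEST_ENDPOINT` when the destination is plain AWS S3:

```sh
# Hypothetical example: DigitalOcean Spaces (source) -> AWS S3 (destination).
# All values are placeholders; replace them with your own credentials,
# region, and endpoint as documented by your providers.
SRC_AWS_ACCESS_KEY_ID=your-spaces-key
SRC_AWS_SECRET_ACCESS_KEY=your-spaces-secret
SRC_AWS_REGION=us-east-1
SRC_ENDPOINT=https://nyc3.digitaloceanspaces.com
DEST_AWS_ACCESS_KEY_ID=your-aws-key
DEST_AWS_SECRET_ACCESS_KEY=your-aws-secret
DEST_AWS_REGION=eu-west-1
# DEST_ENDPOINT is not needed when the destination is AWS S3 itself.
```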
STEP 1: Catalog Objects
The first step is to catalog the objects in the source bucket:
s3-migrate catalog --src-bucket-name my-source-bucket --state-file migration.db
This command fetches the list of objects and stores them in a state file called `migration.db`.
A state file is essentially a SQLite database that keeps track of the objects that need to be copied. This file is used to resume the migration process in case it gets interrupted. It is also used to keep track of how many bytes have been copied so far and to give you an estimate of how much time is left.
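Because the state file is plain SQLite, you can peek at it with the `sqlite3` CLI if you are curious about progress. The table and column names below are illustrative assumptions, not the tool's documented schema, so inspect the actual layout first:

```sh
# Read-only inspection of the state file with the sqlite3 CLI.
# Discover the actual tables and columns used by s3-migrate:
sqlite3 migration.db '.tables'
sqlite3 migration.db '.schema'
# Hypothetical progress query, assuming an "objects" table with a "copied" flag
# and a "size" column (adjust to whatever the real schema turns out to be):
# sqlite3 migration.db 'SELECT copied, COUNT(*), SUM(size) FROM objects GROUP BY copied;'
```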
STEP 2: Copy Objects
Once you have cataloged the objects, you can start copying them to the destination bucket:
s3-migrate copy --src-bucket-name my-source-bucket --dest-bucket-name my-dest-bucket --state-file migration.db
This command transfers the objects that have not been copied yet and displays a progress bar with the number of objects copied and the total number of bytes transferred.
Sorting Options
You can sort the objects to be copied using the `--sort-by` and `--sort-order` options. The `--sort-by` option accepts the following values: `key`, `size`, `etag`, `last_modified`. The `--sort-order` option accepts `asc` or `desc`.
This allows you, for example, to prioritize recently modified files, or to copy smaller (or larger) files first. In the following example we sort by `last_modified` in descending order, so that the most recently modified files are copied first:
s3-migrate copy --src-bucket-name my-source-bucket --dest-bucket-name my-dest-bucket --state-file migration.db --sort-by last_modified --sort-order desc
Checksum Options
You can use the `--checksums-when-required` option to initialize the S3 clients with the checksum options (`requestChecksumCalculation` and `responseChecksumValidation`) set to `'WHEN_REQUIRED'`.

This can be useful if you see `XAmzContentSHA256Mismatch` errors during copy, especially with some specific S3-compatible services (e.g. Aruba Object Storage).
s3-migrate copy --src-bucket-name my-source-bucket --dest-bucket-name my-dest-bucket --state-file migration.db --checksums-when-required
Graceful Shutdown
Press `Ctrl+C` during the copy process to stop it safely. Any file copy in flight will be completed before the process exits. The state file will be updated with the progress made so far.
> [!NOTE]
> Depending on the size of the objects being copied, it might take a few seconds before the process exits. Please be patient.
Running the command again will resume from where it left off.
Performance Tuning
Here are some things you can try to improve transfer performance when you are moving a large number of objects and/or dealing with large objects.
Tweak Concurrency
You can adjust the concurrency level to optimize the performance of the copy process. By default, the tool uses 8 concurrent requests. You can change this by setting the `--concurrency` option:
s3-migrate copy --src-bucket-name my-source-bucket --dest-bucket-name my-dest-bucket --state-file migration.db --concurrency 32
You can experiment with different values to find the optimal concurrency level for your use case.
Tweak Chunk Size
You can also configure the chunk size for each request using the `--chunk-size-bytes` option. The default value is 2 MB:
s3-migrate copy --src-bucket-name my-source-bucket --dest-bucket-name my-dest-bucket --state-file migration.db --chunk-size-bytes 1048576
A smaller chunk size means more requests to the storage service but less memory usage on the client side. A larger chunk size means fewer requests but more memory usage. You can calculate an indicative memory usage by multiplying the chunk size by the concurrency level.
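For example, the command above uses 1 MiB chunks (1048576 bytes); with the default concurrency of 8 that gives an indicative footprint of about 8 MiB of buffered object data, while the default 2 MB chunks with `--concurrency 32` would come to roughly 64 MB.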
Run Multiple Concurrent Processes
You can run multiple concurrent instances of this tool, even on different machines, by using different prefixes and a different state file for every prefix. This allows you to parallelize the migration process and use more network bandwidth and CPU to speed up the migration.
Here's how you might generate multiple state files using different prefixes:
s3-migrate catalog --src-bucket-name my-source-bucket --state-file migration-a.db --prefix "a" # in one shell / machine
s3-migrate catalog --src-bucket-name my-source-bucket --state-file migration-b.db --prefix "b" # in another shell / machine
Then you can run the copy processes in parallel:
s3-migrate copy --src-bucket-name my-source-bucket --dest-bucket-name my-dest-bucket --state-file migration-a.db # in one shell / machine
s3-migrate copy --src-bucket-name my-source-bucket --dest-bucket-name my-dest-bucket --state-file migration-b.db # in another shell / machine
By using different prefixes and state files, you can distribute the workload across multiple instances of the tool, potentially speeding up the migration process. Note that you might still be subject to rate limits imposed by the storage providers you are reading from or copying to.
Also note that finding a good set of prefixes depends on how you organised your source data. The trick is to try to distribute the workload evenly across the prefixes.
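For instance, if your object keys happen to start with hexadecimal characters (an assumption about your data, not something the tool requires), a small shell loop like the following sketch could print one catalog command per prefix, each of which you would then run in a separate shell or machine:

```sh
# Hypothetical prefix split: assumes object keys start with a hex character (0-9a-f).
# Prints one catalog command per prefix; pair each state file with its own copy command.
for p in 0 1 2 3 4 5 6 7 8 9 a b c d e f; do
  echo "s3-migrate catalog --src-bucket-name my-source-bucket --state-file migration-$p.db --prefix \"$p\""
done
```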
Contributing
Everyone is very welcome to contribute to this project. You can contribute just by reporting bugs or suggesting improvements in a GitHub issue. PRs are also very welcome.
License
Licensed under the MIT License. © Luciano Mammino.