
Data Management

The data-client transfer workflow centers on three commands: upload, download-single, and download-multiple. Most users need only these.

Uploading Data

Use upload for normal file, directory, or glob uploads.

./data-client upload --profile=mycommons --upload-path=data/sample.bam

Common patterns:

./data-client upload --profile=mycommons --upload-path=data/
./data-client upload --profile=mycommons --upload-path='data/*.bam'
./data-client upload --profile=mycommons --upload-path=data/ --batch --numparallel=5

Useful flags:

Flag                    Meaning
--upload-path           File path, directory path, or glob to upload
--batch                 Enable parallel uploads
--numparallel           Number of concurrent uploads when --batch is enabled
--include-subdirname    Preserve subdirectory names in uploaded object names
--metadata              Look for [filename]_metadata.json sidecar files and upload metadata too
--bucket                Override the target bucket

--metadata is only useful in environments that expose the Shepherd API.
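With --metadata, each data file is paired with a sidecar named after it, so before an upload run it can help to list the files that lack one. The sketch below is a hypothetical helper (list_missing_sidecars is not part of data-client) and assumes [filename] is the full file name including its extension, so data/sample.bam pairs with data/sample.bam_metadata.json; confirm that convention for your data-client version.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a data-client command): print every regular file
# in a directory that has no "[filename]_metadata.json" sidecar next to it.
# Assumes [filename] is the full file name including its extension,
# e.g. sample.bam -> sample.bam_metadata.json.
list_missing_sidecars() {
  local dir="$1"
  local f
  for f in "$dir"/*; do
    # Skip non-regular files (and the literal pattern if the directory is empty).
    [ -f "$f" ] || continue
    # Skip the sidecar files themselves.
    case "$f" in
      *_metadata.json) continue ;;
    esac
    if [ ! -f "${f}_metadata.json" ]; then
      printf '%s\n' "$f"
    fi
  done
}
```

For example, list_missing_sidecars data prints the path of each file in data/ without a matching sidecar. It only inspects the top level of the directory and never calls data-client.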

Multipart Uploads

Use upload-multipart to upload a single large file explicitly through the multipart upload path.

./data-client upload-multipart --profile=mycommons --file-path=./large.bam
./data-client upload-multipart --profile=mycommons --file-path=./large.bam --guid=existing-guid

Useful flags:

Flag           Meaning
--file-path    Local file to upload
--guid         Reuse an existing GUID instead of creating a new one
--bucket       Override the target bucket
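upload-multipart takes one file per call. To push several large files from one directory, a small wrapper can call it once per file. This is a sketch under stated assumptions: upload_large_files, DATA_CLIENT, and MIN_SIZE are hypothetical names, and the 1 GiB default threshold is arbitrary, not a data-client recommendation. It uses only the --profile and --file-path flags documented above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: call upload-multipart once for each regular file in a
# directory that is larger than MIN_SIZE. DATA_CLIENT and MIN_SIZE are
# illustrative settings for this script, not data-client options.
DATA_CLIENT="${DATA_CLIENT:-./data-client}"
MIN_SIZE="${MIN_SIZE:-+1G}"   # find(1) -size syntax; "+1G" = larger than 1 GiB

upload_large_files() {
  local profile="$1" dir="$2"
  # -maxdepth 1 limits the search to the top level of the directory;
  # -print0 / read -d '' keep file names with spaces intact.
  find "$dir" -maxdepth 1 -type f -size "$MIN_SIZE" -print0 |
    while IFS= read -r -d '' f; do
      "$DATA_CLIENT" upload-multipart --profile="$profile" --file-path="$f" ||
        printf 'upload failed: %s\n' "$f" >&2
    done
}
```

For example, upload_large_files mycommons ./data uploads each file in ./data larger than 1 GiB. A failed call is reported on stderr and the loop moves on to the next file.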

Retrying Failed Uploads

If a previous run left a failed-upload log, retry the uploads it records with:

./data-client retry-upload --profile=mycommons --failed-log-path=/path/to/failed_log.json

Downloading Data

Use download-single for one GUID and download-multiple for a manifest.

Download One File

./data-client download-single \
  --profile=mycommons \
  --guid=206dfaa6-bcf1-4bc9-b2d0-77179f0f48fc \
  --download-path=./downloads

Download From a Manifest

./data-client download-multiple \
  --profile=mycommons \
  --manifest=manifest.json \
  --download-path=./downloads \
  --numparallel=4

The manifest is expected to contain objects with guid fields. download-multiple reads those GUIDs and downloads them in parallel.
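If you start from a plain list of GUIDs, you can generate a manifest with guid fields yourself. The sketch below assumes the manifest is a top-level JSON array of objects with a "guid" key and that the GUIDs need no JSON escaping; guids_to_manifest is a hypothetical helper, and your data-client version may read additional fields that it does not generate.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read GUIDs (one per line) on stdin and write a JSON
# array of {"guid": ...} objects to stdout.
# Assumes a top-level JSON array and GUIDs that need no JSON escaping.
guids_to_manifest() {
  local first=1 guid
  printf '['
  # "|| [ -n "$guid" ]" also handles a last line without a trailing newline.
  while IFS= read -r guid || [ -n "$guid" ]; do
    guid="${guid%$'\r'}"            # drop a trailing CR from CRLF input
    [ -n "$guid" ] || continue      # skip blank lines
    if [ "$first" -eq 1 ]; then first=0; else printf ','; fi
    printf '\n  {"guid": "%s"}' "$guid"
  done
  printf '\n]\n'
}
```

For example, guids_to_manifest < guids.txt > manifest.json produces a file you can pass to download-multiple with --manifest=manifest.json.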

Legacy Commands

The binary still includes the older upload-single and upload-multiple commands, but upload is the recommended entry point for normal uploads.