cp: Copying tables

The cp command copies tabular data from a source location to a destination location. For example, we can copy a CSV file into PostgreSQL, replacing any existing table:

dbcrossbar cp \
    --if-exists=overwrite \
    --schema=postgres-sql:my_table.sql \
    csv:my_table.csv \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table'

Or we can copy data from PostgreSQL and upsert it into a BigQuery table:

dbcrossbar config add temporary gs://$GS_TEMP_BUCKET
dbcrossbar config add temporary bigquery:$GCLOUD_PROJECT:temp_dataset
dbcrossbar cp \
    --if-exists=upsert-on:id \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table' \
    bigquery:$GCLOUD_PROJECT:my_dataset.my_table

Command-line help


Usage: dbcrossbar cp [OPTIONS] <FROM_LOCATOR> <TO_LOCATOR>

Arguments:
  <FROM_LOCATOR>  The input table
  <TO_LOCATOR>    The output table

Options:
      --if-exists <IF_EXISTS>      One of `error`, `overwrite`, `append` or `upsert-on:COL` [default:
                                   error]
      --schema <SCHEMA>            The schema to use (defaults to input table schema)
      --temporary <TEMPORARIES>    Temporary directories, cloud storage buckets, datasets to use during
                                   transfer (can be repeated)
      --stream-size <STREAM_SIZE>  Specify the approximate size of the CSV streams manipulated by
                                   `dbcrossbar`. This can be used to split a large input into multiple
                                   smaller outputs. Actual data streams may be bigger or smaller
                                   depending on a number of factors. Examples: "100000", "1Gb"
      --from-arg <FROM_ARGS>       Pass an extra argument of the form `key=value` to the source driver
      --from-format <FROM_FORMAT>  For directory- and file-like data sources, the format to assume. If
                                   not specified, `dbcrossbar` will use the file extension to guess the
                                   format
      --to-arg <TO_ARGS>           Pass an extra argument of the form `key=value` to the destination
                                   driver
  -F, --to-format <TO_FORMAT>      For directory-like data destinations, the format to use. If not
                                   specified, `dbcrossbar` will use the destination file extension (if
                                   provided) or `csv`
      --where <WHERE_CLAUSE>       SQL where clause specifying rows to use
  -J, --max-streams <MAX_STREAMS>  How many data streams should we attempt to copy in parallel? [default:
                                   4]
      --display-output-locators    Display where we wrote our output data
  -h, --help                       Print help

EXAMPLE LOCATORS:
    postgres://localhost:5432/db#table
    bigquery:project:dataset.table

Flags

Not all command-line options are supported by all drivers. See the chapter on each driver for details.

--where

Specify a WHERE clause to include in the SQL query. This can be used to select a subset of the source rows.
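For example, to copy only recent rows (a sketch assuming my_table has a created_at column):

dbcrossbar cp \
    --where "created_at >= '2023-01-01'" \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table' \
    csv:recent_rows.csv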

--from-arg

This can be used to specify driver-specific options for the source driver. See the chapter for that driver.
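For example (fetch_size is a hypothetical key used purely for illustration; the keys each driver actually accepts are listed in its chapter):

dbcrossbar cp \
    --from-arg=fetch_size=1000 \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table' \
    csv:my_table.csv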

--if-exists=error

If the destination location already contains data, exit with an error.

--if-exists=append

If the destination location already contains data, append the new data.
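For example, to add rows from a CSV file to an existing PostgreSQL table (a sketch reusing the locators from the first example; new_rows.csv is hypothetical):

dbcrossbar cp \
    --if-exists=append \
    --schema=postgres-sql:my_table.sql \
    csv:new_rows.csv \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table'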

--if-exists=overwrite

If the destination location already contains data, replace it with the new data.

--if-exists=upsert-on:COL1,...

For every row in the new data:

  • If a row with a matching col1, col2, etc., exists, use the new data to update that row.
  • If no row matching col1, col2, etc., exists, then insert the new row.

The columns col1, col2, etc., must be marked as NOT NULL.
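For example, to upsert on a two-column key (a sketch assuming first_name and last_name are declared NOT NULL in my_table.sql):

dbcrossbar cp \
    --if-exists=upsert-on:first_name,last_name \
    --schema=postgres-sql:my_table.sql \
    csv:my_table.csv \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table'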

--schema

By default, dbcrossbar will use the schema of the source table. But when the schema can't be inferred automatically (for example, when the source is a CSV file), --schema can be used to specify a table schema:

  • --schema=postgres-sql:my_table.sql
  • --schema=bigquery-schema:my_table.json

It's also possible to use a schema from an existing database table:

  • --schema=postgres://localhost:5432/db#table
  • --schema=bigquery:project:dataset.table

Note that it's possible to create a BigQuery table using a PostgreSQL schema, or vice versa. Internally, all schemas are first converted to dbcrossbar's internal schema format.

--temporary

Specify temporary storage, which is required by certain drivers. Typical values include:

  • --temporary=s3://$S3_TEMP_BUCKET
  • --temporary=gs://$GS_TEMP_BUCKET
  • --temporary=bigquery:$GCLOUD_PROJECT:temp_dataset
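These can also be passed directly to cp instead of being stored with dbcrossbar config, and the flag can be repeated, as in this variant of the earlier BigQuery example:

dbcrossbar cp \
    --if-exists=overwrite \
    --temporary=gs://$GS_TEMP_BUCKET \
    --temporary=bigquery:$GCLOUD_PROJECT:temp_dataset \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table' \
    bigquery:$GCLOUD_PROJECT:my_dataset.my_table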

--to-arg

This can be used to specify driver-specific options for the destination driver. See the chapter for that driver.
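For example (job_labels is shown here on the assumption that the BigQuery driver accepts it; see that driver's chapter to confirm the keys it supports, and note that this sketch assumes temporaries are configured as in the earlier example):

dbcrossbar cp \
    --if-exists=append \
    --to-arg=job_labels[department]=marketing \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table' \
    bigquery:$GCLOUD_PROJECT:my_dataset.my_table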