cp: Copying tables

The cp command copies tabular data from a source location to a destination location. For example, we can copy a CSV file into PostgreSQL, replacing any existing table:

dbcrossbar cp \
    --if-exists=overwrite \
    --schema=postgres-sql:my_table.sql \
    csv:my_table.csv \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table'

Or we can copy data from PostgreSQL and upsert it into a BigQuery table:

dbcrossbar config add temporary gs://$GS_TEMP_BUCKET
dbcrossbar config add temporary bigquery:$GCLOUD_PROJECT:temp_dataset
dbcrossbar cp \
    --if-exists=upsert-on:id \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table' \
    bigquery:$GCLOUD_PROJECT:my_dataset.my_table

Command-line help


Usage: dbcrossbar cp [OPTIONS] <FROM_LOCATOR> <TO_LOCATOR>

Arguments:
  <FROM_LOCATOR>  The input table
  <TO_LOCATOR>    The output table

Options:
      --if-exists <IF_EXISTS>      One of `error`, `overwrite`, `append` or `upsert-on:COL` [default:
                                   error]
      --schema <SCHEMA>            The schema to use (defaults to input table schema)
      --temporary <TEMPORARIES>    Temporary directories, cloud storage buckets, datasets to use during
                                   transfer (can be repeated)
      --stream-size <STREAM_SIZE>  Specify the approximate size of the CSV streams manipulated by
                                   `dbcrossbar`. This can be used to split a large input into multiple
                                   smaller outputs. Actual data streams may be bigger or smaller
                                   depending on a number of factors. Examples: "100000", "1Gb"
      --from-arg <FROM_ARGS>       Pass an extra argument of the form `key=value` to the source driver
      --from-format <FROM_FORMAT>  For directory- and file-like data sources, the format to assume. If
                                   not specified, `dbcrossbar` will use the file extension to guess the
                                   format
      --to-arg <TO_ARGS>           Pass an extra argument of the form `key=value` to the destination
                                   driver
  -F, --to-format <TO_FORMAT>      For directory-like data destinations, the format to use. If not
                                   specified, `dbcrossbar` will use the destination file extension (if
                                   provided) or `csv`
      --where <WHERE_CLAUSE>       SQL where clause specifying rows to use
  -J, --max-streams <MAX_STREAMS>  How many data streams should we attempt to copy in parallel? [default:
                                   4]
      --display-output-locators    Display where we wrote our output data
  -h, --help                       Print help

EXAMPLE LOCATORS:
    postgres://localhost:5432/db#table
    bigquery:project:dataset.table

Flags

Not all command-line options are supported by all drivers. See the chapter on each driver for details.

--where

Specify a WHERE clause to include in the SQL query. This can be used to select a subset of the source rows.
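For example, to copy only recent rows (a sketch assuming my_table has a created_at column):

dbcrossbar cp \
    --where "created_at >= '2023-01-01'" \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table' \
    csv:recent_rows.csv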

--from-arg

This can be used to specify driver-specific options for the source driver. See the chapter for that driver.
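For example (fetch_size is a hypothetical key used purely for illustration; the keys each driver actually accepts are listed in its chapter):

dbcrossbar cp \
    --from-arg=fetch_size=1000 \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table' \
    csv:my_table.csv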

--if-exists=error

If the destination location already contains data, exit with an error.

--if-exists=append

If the destination location already contains data, append the new data.
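For example, to add rows from a CSV file to an existing PostgreSQL table (a sketch reusing the locators from the first example; new_rows.csv is hypothetical):

dbcrossbar cp \
    --if-exists=append \
    --schema=postgres-sql:my_table.sql \
    csv:new_rows.csv \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table'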

--if-exists=overwrite

If the destination location already contains data, replace it with the new data.

--if-exists=upsert-on:COL1,...

For every row in the new data:

  • If a row with a matching col1, col2, etc., exists, use the new data to update that row.
  • If no row matching col1, col2, etc., exists, then insert the new row.

The columns col1, col2, etc., must be marked as NOT NULL.
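For example, to upsert on a two-column key (a sketch assuming first_name and last_name are declared NOT NULL in my_table.sql):

dbcrossbar cp \
    --if-exists=upsert-on:first_name,last_name \
    --schema=postgres-sql:my_table.sql \
    csv:my_table.csv \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table'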

--schema

By default, dbcrossbar will use the schema of the source table. But when the schema can't be inferred automatically (for example, when the source is a CSV file), --schema can be used to specify a table schema:

  • --schema=postgres-sql:my_table.sql
  • --schema=bigquery-schema:my_table.json

It's also possible to use a schema from an existing database table:

  • --schema=postgres://localhost:5432/db#table
  • --schema=bigquery:project:dataset.table

Note that it's possible to create a BigQuery table using a PostgreSQL schema, or vice versa. Internally, all schemas are first converted to dbcrossbar's internal schema format.

--temporary

Specify temporary storage, which is required by certain drivers. Typical values include:

  • --temporary=s3://$S3_TEMP_BUCKET
  • --temporary=gs://$GS_TEMP_BUCKET
  • --temporary=bigquery:$GCLOUD_PROJECT:temp_dataset
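These can also be passed directly to cp instead of being stored with dbcrossbar config, and the flag can be repeated, as in this variant of the earlier BigQuery example:

dbcrossbar cp \
    --if-exists=overwrite \
    --temporary=gs://$GS_TEMP_BUCKET \
    --temporary=bigquery:$GCLOUD_PROJECT:temp_dataset \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table' \
    bigquery:$GCLOUD_PROJECT:my_dataset.my_table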

--to-arg

This can be used to specify driver-specific options for the destination driver. See the chapter for that driver.
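For example (job_labels is shown here on the assumption that the BigQuery driver accepts it; see that driver's chapter to confirm the keys it supports, and note that this sketch assumes temporaries are configured as in the earlier example):

dbcrossbar cp \
    --if-exists=append \
    --to-arg=job_labels[department]=marketing \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table' \
    bigquery:$GCLOUD_PROJECT:my_dataset.my_table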