cp: Copying tables

The cp command copies tabular data from a source location to a destination location. For example, we can copy a CSV file into PostgreSQL, replacing any existing table:

dbcrossbar cp \
    --if-exists=overwrite \
    --schema=postgres-sql:my_table.sql \
    csv:my_table.csv \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table'

Or we copy data from PostgreSQL and upsert it into a BigQuery table:

dbcrossbar config add temporary gs://$GS_TEMP_BUCKET
dbcrossbar config add temporary bigquery:$GCLOUD_PROJECT:temp_dataset
dbcrossbar cp \
    --if-exists=upsert-on:id \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table' \
    bigquery:$GCLOUD_PROJECT:my_dataset.my_table

Command-line help

Copy tables from one location to another

USAGE:
    dbcrossbar cp [FLAGS] [OPTIONS] <from-locator> <to-locator>

FLAGS:
        --display-output-locators
            Display where we wrote our output data

    -h, --help                       Prints help information
    -V, --version                    Prints version information

OPTIONS:
        --from-arg <from-args>...
            Pass an extra argument of the form `key=value` to the
            source driver
        --if-exists <if-exists>
            One of `error`, `overwrite`, `append` or `upsert-on:COL`
            [default: error]
    -J, --max-streams <max-streams>
            How many data streams should we attempt to copy in
            parallel? [default: 4]
        --schema <schema>
            The schema to use (defaults to input table schema)

        --stream-size <stream-size>
            Specify the approximate size of the CSV streams
            manipulated by `dbcrossbar`. This can be used to split a
            large input into multiple smaller outputs. Actual data
            streams may be bigger or smaller depending on a number of
            factors. Examples: "100000", "1Gb"
        --temporary <temporaries>...
            Temporary directories, cloud storage buckets, datasets to
            use during transfer (can be repeated)
        --to-arg <to-args>...
            Pass an extra argument of the form `key=value` to the
            destination driver
        --where <where-clause>
            SQL where clause specifying rows to use


ARGS:
    <from-locator>    The input table
    <to-locator>      The output table

EXAMPLE LOCATORS:
    postgres://localhost:5432/db#table
    bigquery:project:dataset.table

Flags

Not all command-line options are supported by all drivers. See the chapter on each driver for details.

--where

Specify a WHERE clause to include in the SQL query. This can be used to select a subset of the source rows.

--from-arg

This can be used to specify driver-specific options for the source driver. See the chapter for that driver.

--if-exists=error

If the destination location already contains data, exit with an error.

--if-exists=append

If the destination location already contains data, append the new data.

--if-exists=overwrite

If the destination location already contains data, replace it with the new data.

--if-exists=upset-on:COL1,..

For every row in the new data:

  • If a row with a matching col1, col2, etc., exists, use the new data to update that row.
  • If no row matching col1, col2, etc., exists, then insert the new row.

The columns col1, col2, etc., must be marked as NOT NULL.

--schema

By default, dbcrossbar will use the schema of the source table. But when this can't be inferred automatically, --schema can be used to specify a table schema:

It's also possible to use a schema from an existing database table:

  • --schema=postgres://localhost:5432/db#table
  • --schema=bigquery:project:dataset.table

Note that it's possible to create a BigQuery table using a PostgreSQL schema, or vice versa. Internally, all schemes are first converted to the internal schema format.

--temporary

Specify temporary storage, which is required by certain drivers. Typical values include:

  • --temporary=s3://$S3_TEMP_BUCKET
  • --temporary=gs://$GS_TEMP_BUCKET
  • --temporary=bigquery:$GCLOUD_PROJECT:temp_dataset

--to-arg

This can be used to specify driver-specific options for the destination driver. See the chapter for that driver.