# `cp`: Copying tables

The `cp` command copies tabular data from a source location to a destination location. For example, we can copy a CSV file into PostgreSQL, replacing any existing table:

```sh
dbcrossbar cp \
    --if-exists=overwrite \
    --schema=postgres-sql:my_table.sql \
    csv:my_table.csv \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table'
```

Or we can copy data from PostgreSQL and upsert it into a BigQuery table:

```sh
dbcrossbar config add temporary gs://$GS_TEMP_BUCKET
dbcrossbar config add temporary bigquery:$GCLOUD_PROJECT:temp_dataset
dbcrossbar cp \
    --if-exists=upsert-on:id \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table' \
    bigquery:$GCLOUD_PROJECT:my_dataset.my_table
```

## Command-line help

```txt
Usage: dbcrossbar cp [OPTIONS] <FROM_LOCATOR> <TO_LOCATOR>

Arguments:
  <FROM_LOCATOR>  The input table
  <TO_LOCATOR>    The output table

Options:
      --if-exists <IF_EXISTS>        One of `error`, `overwrite`, `append` or
                                     `upsert-on:COL` [default: error]
      --schema <SCHEMA>              The schema to use (defaults to input table schema)
      --temporary <TEMPORARIES>      Temporary directories, cloud storage buckets,
                                     datasets to use during transfer (can be repeated)
      --stream-size <STREAM_SIZE>    Specify the approximate size of the CSV streams
                                     manipulated by `dbcrossbar`. This can be used to
                                     split a large input into multiple smaller outputs.
                                     Actual data streams may be bigger or smaller
                                     depending on a number of factors. Examples:
                                     "100000", "1Gb"
      --from-arg <FROM_ARGS>         Pass an extra argument of the form `key=value`
                                     to the source driver
      --from-format <FROM_FORMAT>    For directory- and file-like data sources, the
                                     format to assume. If not specified, `dbcrossbar`
                                     will use the file extension to guess the format
      --to-arg <TO_ARGS>             Pass an extra argument of the form `key=value`
                                     to the destination driver
  -F, --to-format <TO_FORMAT>        For directory-like data destinations, the format
                                     to use. If not specified, `dbcrossbar` will use
                                     the destination file extension (if provided) or `csv`
      --where <WHERE_CLAUSE>         SQL where clause specifying rows to use
  -J, --max-streams <MAX_STREAMS>    How many data streams should we attempt to copy
                                     in parallel? [default: 4]
      --display-output-locators      Display where we wrote our output data
  -h, --help                         Print help

EXAMPLE LOCATORS:
    postgres://localhost:5432/db#table
    bigquery:project:dataset.table
```

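For example, `--stream-size` and `--max-streams` (which have no dedicated sections below) can be combined to split a large extract into several smaller CSV files written in parallel. A sketch, assuming a hypothetical `gs://my-bucket/my_table/` destination directory:

```sh
# Split the table into roughly 1 GB CSV streams, copying up to 8 at a time.
dbcrossbar cp \
    --stream-size=1Gb \
    --max-streams=8 \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table' \
    gs://my-bucket/my_table/
```
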
## Flags

Not all command-line options are supported by all drivers. See the chapter on each driver for details.
### `--where`

Specify a `WHERE` clause to include in the SQL query. This can be used to select a subset of the source rows.
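
For example, assuming a hypothetical `created_at` column on the source table, this copies only recent rows out to a CSV file:

```sh
# Export only rows created on or after 2023-01-01.
dbcrossbar cp \
    --where="created_at >= '2023-01-01'" \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table' \
    csv:recent_rows.csv
```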
### `--from-arg`

This can be used to specify driver-specific options for the source driver. See the chapter for that driver.
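
The general shape is sketched below; `my_key=my_value` is a placeholder, since the supported keys depend entirely on the source driver:

```sh
# Pass a (hypothetical) key=value option through to the PostgreSQL source driver.
dbcrossbar cp \
    --from-arg=my_key=my_value \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table' \
    csv:my_table.csv
```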
### `--if-exists=error`

If the destination location already contains data, exit with an error.

### `--if-exists=append`

If the destination location already contains data, append the new data.

### `--if-exists=overwrite`

If the destination location already contains data, replace it with the new data.
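
For example, this sketch appends more rows to the PostgreSQL table created in the introduction (the `more_rows.csv` file is hypothetical):

```sh
# Add new rows without touching existing ones.
dbcrossbar cp \
    --if-exists=append \
    --schema=postgres-sql:my_table.sql \
    csv:more_rows.csv \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table'
```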
### `--if-exists=upsert-on:col1,col2,...`

For every row in the new data:

- If a row with matching `col1`, `col2`, etc., exists, use the new data to update that row.
- If no row matching `col1`, `col2`, etc., exists, then insert the new row.

The columns `col1`, `col2`, etc., must be marked as `NOT NULL`.
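
As a sketch, assuming a hypothetical `people` table whose `first_name` and `last_name` columns are both `NOT NULL`:

```sh
# Update rows matching on (first_name, last_name); insert everything else.
dbcrossbar cp \
    --if-exists=upsert-on:first_name,last_name \
    --schema=postgres-sql:people.sql \
    csv:new_people.csv \
    'postgres://postgres@127.0.0.1:5432/postgres#people'
```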
### `--schema`

By default, `dbcrossbar` will use the schema of the source table. But when this can't be inferred automatically, `--schema` can be used to specify a table schema:

- `--schema=postgres-sql:my_table.sql`: A PostgreSQL `CREATE TABLE` statement.
- `--schema=bigquery-schema:my_table.json`: A BigQuery JSON schema.
- `--schema=dbcrossbar-schema:my_table.json`: An internal `dbcrossbar` schema.
It's also possible to use the schema of an existing database table:

- `--schema=postgres://localhost:5432/db#table`
- `--schema=bigquery:project:dataset.table`

Note that it's possible to create a BigQuery table using a PostgreSQL schema, or vice versa. Internally, all schemas are first converted to the internal schema format.
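
For example, this sketch reuses a PostgreSQL `CREATE TABLE` statement to create and load a BigQuery table (the file and table names are hypothetical, and the BigQuery driver also needs temporary storage, as described below):

```sh
# Create a BigQuery table from a PostgreSQL schema and load it from CSV.
dbcrossbar cp \
    --if-exists=overwrite \
    --schema=postgres-sql:my_table.sql \
    --temporary=gs://$GS_TEMP_BUCKET \
    --temporary=bigquery:$GCLOUD_PROJECT:temp_dataset \
    csv:my_table.csv \
    bigquery:$GCLOUD_PROJECT:my_dataset.my_table
```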
### `--temporary`

Specify temporary storage, which is required by certain drivers. Typical values include:

- `--temporary=s3://$S3_TEMP_BUCKET`
- `--temporary=gs://$GS_TEMP_BUCKET`
- `--temporary=bigquery:$GCLOUD_PROJECT:temp_dataset`
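
For example, copying a BigQuery table out to a local CSV file stages data in cloud storage. A sketch, assuming the same environment variables as the introduction (the exact temporaries each driver needs are described in that driver's chapter):

```sh
# Export a BigQuery table to CSV, staging intermediate data in GCS.
dbcrossbar cp \
    --temporary=gs://$GS_TEMP_BUCKET \
    --temporary=bigquery:$GCLOUD_PROJECT:temp_dataset \
    bigquery:$GCLOUD_PROJECT:my_dataset.my_table \
    csv:my_table.csv
```

As shown in the introduction, temporaries can also be stored once using `dbcrossbar config add temporary` instead of being repeated on every `cp` invocation.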
### `--to-arg`

This can be used to specify driver-specific options for the destination driver. See the chapter for that driver.
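
As with `--from-arg`, the supported keys depend on the driver; `my_key=my_value` below is a placeholder:

```sh
# Pass a (hypothetical) key=value option through to the PostgreSQL destination driver.
dbcrossbar cp \
    --to-arg=my_key=my_value \
    --schema=postgres-sql:my_table.sql \
    csv:my_table.csv \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table'
```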