dbcrossbar Guide

dbcrossbar is an open source tool that copies large, tabular datasets between many different databases and storage formats. Data can be copied from any source to any destination.

An example

If we have a CSV file my_table.csv containing data:

id,name,quantity
1,Blue widget,10
2,Red widget,50

And a file my_table.sql containing a table definition:

CREATE TABLE my_table (
    id INT NOT NULL,
    name TEXT NOT NULL,
    quantity INT NOT NULL
);

Then we can use these to create a PostgreSQL table:

dbcrossbar cp \
    --if-exists=overwrite \
    --schema=postgres-sql:my_table.sql \
    csv:my_table.csv \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table'

If we want to use this data to update an existing table in BigQuery, we can upsert into BigQuery using the id column:

dbcrossbar config add temporary gs://$GS_TEMP_BUCKET
dbcrossbar config add temporary bigquery:$GCLOUD_PROJECT:temp_dataset
dbcrossbar cp \
    --if-exists=upsert-on:id \
    'postgres://postgres@127.0.0.1:5432/postgres#my_table' \
    bigquery:$GCLOUD_PROJECT:my_dataset.my_table

This will stream the data out of PostgreSQL, upload it to $GS_TEMP_BUCKET, import it into a temporary BigQuery table, and run an appropriate MERGE command. The config add temporary commands tell dbcrossbar what cloud bucket and BigQuery dataset should be used for temporary files and tables, respectively.

Notice that we don't need to specify --schema, because dbcrossbar will automatically translate the PostgreSQL column types to corresponding BigQuery types.

Credits

dbcrossbar is generously supported by Faraday and by open source contributors. Please see the credits for more information.