dbcrossbar
Guide
dbcrossbar
is an open source tool that copies large, tabular datasets between many different databases and storage formats. Data can be copied from any source to any destination.
An example
If we have a CSV file my_table.csv
containing data:
id,name,quantity
1,Blue widget,10
2,Red widget,50
And a file my_table.sql
containing a table definition:
CREATE TABLE my_table (
id INT NOT NULL,
name TEXT NOT NULL,
quantity INT NOT NULL
);
Then we can use these to create a PostgreSQL table:
dbcrossbar cp \
--if-exists=overwrite \
--schema=postgres-sql:my_table.sql \
csv:my_table.csv \
'postgres://postgres@127.0.0.1:5432/postgres#my_table'
If we want to use this data to update an existing table in BigQuery, we can upsert into BigQuery using the id
column:
dbcrossbar config add temporary gs://$GS_TEMP_BUCKET
dbcrossbar config add temporary bigquery:$GCLOUD_PROJECT:temp_dataset
dbcrossbar cp \
--if-exists=upsert-on:id \
'postgres://postgres@127.0.0.1:5432/postgres#my_table' \
bigquery:$GCLOUD_PROJECT:my_dataset.my_table
This will stream the data out of PostgreSQL, upload it to $GS_TEMP_BUCKET
, import it into a temporary BigQuery table, and run an appropriate MERGE
command. The config add temporary
commands tell dbcrossbar
what cloud bucket and BigQuery dataset should be used for temporary files and tables, respectively.
Notice that we don't need to specify --schema
, because dbcrossbar
will automatically translate the PostgreSQL column types to corresponding BigQuery types.
Credits
dbcrossbar
is generously supported by Faraday and by open source contributors. Please see the credits for more information.