Google's BigQuery is an extremely scalable data warehouse that supports rich SQL queries over petabytes of data. If you need to transform or analyze huge data sets, it's an excellent tool.
When loading data into BigQuery, or extracting it, we always go via Google Cloud Storage. This is considerably faster than the load and extract functionality supplied by tools like bq.

COMPATIBILITY WARNING: This driver currently relies on gsutil and bq for many tasks, but those tools are poorly suited to the kind of automation we need. In particular, gsutil uses too much RAM, and bq sometimes prints status messages on standard output instead of standard error. We plan to replace those tools with native Rust libraries at some point, which will change how the BigQuery driver handles authentication in a future version.
bigquery:$PROJECT:$DATASET.$TABLE: A BigQuery table.
See the Cloud Storage driver for authentication details.
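In practice, authentication usually follows the standard Google Cloud conventions. As one common setup (this is the generic GCP service-account pattern, not a guarantee about this driver's exact mechanism; check the Cloud Storage driver documentation for what it actually reads):

```
# Point Google Cloud tooling at a service-account key file.
# GOOGLE_APPLICATION_CREDENTIALS is the standard GCP variable name;
# verify against the Cloud Storage driver docs before relying on it.
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json
```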
The following command-line options will usually need to be specified for both sources and destinations:
--temporary=gs://$GS_TEMP_BUCKET: A Google Cloud Storage bucket to use for staging data in both directions.
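Putting the locator and the staging bucket together, a copy into BigQuery might look like the sketch below. The `dbcrossbar` command name is an assumption about how this CLI is invoked, and `$SOURCE_LOCATOR` stands in for any supported source:

```
# Copy a table into BigQuery, staging the data in Cloud Storage.
# "dbcrossbar" and the placeholder values are assumptions for illustration.
dbcrossbar cp \
    --if-exists=overwrite \
    --temporary=gs://$GS_TEMP_BUCKET \
    "$SOURCE_LOCATOR" \
    "bigquery:$PROJECT:$DATASET.$TABLE"
```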
You can also specify Google Cloud resource labels to apply to all BigQuery jobs. Labels are often used to track query costs.
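Labels are typically supplied as comma-separated key=value pairs. The flag name below is an assumption to verify against the tool's help output; only the labeling concept comes from the text above:

```
# Attach resource labels to every BigQuery job this copy creates,
# so the jobs show up under these labels in billing reports.
# "--job-labels" is an assumed flag name; check the CLI's --help.
dbcrossbar cp \
    --temporary=gs://$GS_TEMP_BUCKET \
    --job-labels=team=analytics,env=prod \
    "$SOURCE_LOCATOR" \
    "bigquery:$PROJECT:$DATASET.$TABLE"
```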
bigquery features:
- conv FROM
- count --from-arg=$NAME=$VALUE --where=$SQL_EXPR
- cp FROM: --from-arg=$NAME=$VALUE --where=$SQL_EXPR
- cp TO: --to-arg=$NAME=$VALUE --if-exists=error --if-exists=append --if-exists=overwrite --if-exists=upsert-on:col
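Of these, --if-exists=upsert-on:col is the most involved: it merges incoming rows into an existing destination table, using the named column as the match key. A sketch, again assuming the CLI is invoked as `dbcrossbar` and using an `id` key column as a placeholder:

```
# Upsert into an existing BigQuery table: rows whose "id" matches an
# existing row are updated, and all other rows are inserted.
# "dbcrossbar" and the "id" column are assumptions for illustration.
dbcrossbar cp \
    --temporary=gs://$GS_TEMP_BUCKET \
    --if-exists=upsert-on:id \
    "$SOURCE_LOCATOR" \
    "bigquery:$PROJECT:$DATASET.$TABLE"
```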