Google's BigQuery is an extremely scalable data warehouse that supports rich SQL queries over petabytes of data. If you need to transform or analyze huge data sets, it's an excellent tool.
When loading data into BigQuery, or extracting it, we always go via Google Cloud Storage. This is considerably faster than the load and extract functionality supplied by tools like bq.

COMPATIBILITY WARNING: This driver currently relies on gsutil and bq for many tasks, but those tools are poorly suited to the kind of automation we need. In particular, gsutil uses too much RAM, and bq sometimes prints status messages on standard output instead of standard error. We plan to replace those tools with native Rust libraries at some point, which will change how the BigQuery driver handles authentication in a future version.
bigquery:$PROJECT:$DATASET.$TABLE: A BigQuery table.
See the Cloud Storage driver for authentication details.
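In practice, authentication usually follows the standard Google Cloud conventions. As one common setup (this is the generic GCP service-account pattern, not a guarantee about this driver's exact mechanism; check the Cloud Storage driver documentation for what it actually reads):

```
# Point Google Cloud tooling at a service-account key file.
# GOOGLE_APPLICATION_CREDENTIALS is the standard GCP variable name;
# verify against the Cloud Storage driver docs before relying on it.
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json
```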
The following command-line options will usually need to be specified for both sources and destinations:
--temporary=gs://$GS_TEMP_BUCKET: A Google Cloud Storage bucket to use for staging data in both directions.
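Putting the locator and the staging bucket together, a copy into BigQuery might look like the sketch below. The `dbcrossbar` command name is an assumption about how this CLI is invoked, and `$SOURCE_LOCATOR` stands in for any supported source:

```
# Copy a table into BigQuery, staging the data in Cloud Storage.
# "dbcrossbar" and the placeholder values are assumptions for illustration.
dbcrossbar cp \
    --if-exists=overwrite \
    --temporary=gs://$GS_TEMP_BUCKET \
    "$SOURCE_LOCATOR" \
    "bigquery:$PROJECT:$DATASET.$TABLE"
```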
You can also specify Google Cloud resource labels to apply to all BigQuery jobs. Labels are often used to track query costs.
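Labels are typically supplied as comma-separated key=value pairs. The flag name below is an assumption to verify against the tool's help output; only the labeling concept comes from the text above:

```
# Attach resource labels to every BigQuery job this copy creates,
# so the jobs show up under these labels in billing reports.
# "--job-labels" is an assumed flag name; check the CLI's --help.
dbcrossbar cp \
    --temporary=gs://$GS_TEMP_BUCKET \
    --job-labels=team=analytics,env=prod \
    "$SOURCE_LOCATOR" \
    "bigquery:$PROJECT:$DATASET.$TABLE"
```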
bigquery features:
- conv FROM
- count --from-arg=$NAME=$VALUE --where=$SQL_EXPR
- cp FROM: --from-arg=$NAME=$VALUE --where=$SQL_EXPR
- cp TO: --to-arg=$NAME=$VALUE --if-exists=error --if-exists=append --if-exists=overwrite --if-exists=upsert-on:col
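Of these, --if-exists=upsert-on:col is the most involved: it merges incoming rows into an existing destination table, using the named column as the match key. A sketch, again assuming the CLI is invoked as `dbcrossbar` and using an `id` key column as a placeholder:

```
# Upsert into an existing BigQuery table: rows whose "id" matches an
# existing row are updated, and all other rows are inserted.
# "dbcrossbar" and the "id" column are assumptions for illustration.
dbcrossbar cp \
    --temporary=gs://$GS_TEMP_BUCKET \
    --if-exists=upsert-on:id \
    "$SOURCE_LOCATOR" \
    "bigquery:$PROJECT:$DATASET.$TABLE"
```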