BigML
BigML is a hosted machine-learning service, with support for many common algorithms and server-side batch scripts.
Example locators
Source locators:
bigml:dataset/$ID
: Read data from a BigML dataset.
Destination locators:
bigml:source
: Create a single BigML "source" resource from the input data.bigml:sources
: Create multiple BigML "source" resources from the input data.bigml:dataset
: Create a single BigML "dataset" resource from the input data.bigml:datasets
: Create multiple BigML "dataset" resources from the input data.
If you use BigML as a destination, dbcrossbar
will automatically activate --display-output-locators
, and it will print locators for all the created resources on standard output. Column types on created "source" resources will be set something appropriate (but see optype_for_text
below.)
Configuration & authentication
The BigML driver requires more configuration than most.
You'll need to set the following environment variables:
BIGML_USERNAME
: Set this to your BigML username.BIGML_API_KEY
: Set this to your BigML API key.BIGML_DOMAIN
(optional): Set this to the domain name of your BigML instance, if it's not located at the standard address.AWS_ACCESS_KEY_ID
andAWS_SECRET_ACCESS_KEY
: Set these to your AWS credentials when using BigML as a destination. Do not setAWS_SESSION_TOKEN
; it will not work with BigML.
You'll also need to pass the following on the command line when using:
--temporary=s3://$S3_TEMP_BUCKET
: Specify where to stage files for loading into BigML. This is not needed when using BigML as a source.
You can also specify the following --to-arg
values:
name
: The human-readable name of the resource to create.optype_for_text
: The BigML optype to use for text fields. This defaults totext
. You may want to set it tocategorical
if your text fields contain a small set of fixed strings. But you should probably usedbcrossbar
's{ "one_of": string_list }
types instead, which will always map tocategorical
.tag
: This may be specified repeatedly to attach tags to the created resources.
Supported features
bigml features:
- conv FROM
- cp FROM:
- cp TO:
--to-arg=$NAME=$VALUE
--if-exists=error --if-exists=append --if-exists=overwrite --if-exists=upsert-on:col
Note that --if-exists
is simply ignored, because BigML will always create new resources.