BigML is a hosted machine-learning service, with support for many common algorithms and server-side batch scripts.
Source locator:
bigml:dataset/$ID: Read data from a BigML dataset.
Destination locators:
bigml:source: Create a single BigML "source" resource from the input data.
bigml:sources: Create multiple BigML "source" resources from the input data.
bigml:dataset: Create a single BigML "dataset" resource from the input data.
bigml:datasets: Create multiple BigML "dataset" resources from the input data.
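The locator forms above can be summarized in a small classifier. This is a minimal sketch in Python, not part of dbcrossbar: classify_bigml_locator and BigMLLocator are hypothetical names, and the sketch only recognizes the five patterns listed here. dbcrossbar's own parser may accept or reject other strings.

```python
from typing import NamedTuple, Optional


class BigMLLocator(NamedTuple):
    """Hypothetical description of a BigML locator (illustration only)."""
    role: str                         # "source" (read from) or "destination" (write to)
    resource: str                     # "source" or "dataset"
    multiple: bool                    # True for the plural forms, which create several resources
    dataset_id: Optional[str] = None  # only set for bigml:dataset/$ID


# The four destination locators, mapped to (resource kind, creates multiple?).
_DESTINATIONS = {
    "bigml:source": ("source", False),
    "bigml:sources": ("source", True),
    "bigml:dataset": ("dataset", False),
    "bigml:datasets": ("dataset", True),
}


def classify_bigml_locator(locator: str) -> BigMLLocator:
    """Classify a locator according to the list above (hypothetical helper)."""
    if locator in _DESTINATIONS:
        resource, multiple = _DESTINATIONS[locator]
        return BigMLLocator("destination", resource, multiple)
    prefix = "bigml:dataset/"
    # Anything after the prefix is treated as the dataset ID, whatever its format.
    if locator.startswith(prefix) and len(locator) > len(prefix):
        return BigMLLocator("source", "dataset", False, locator[len(prefix):])
    raise ValueError(f"not a BigML locator: {locator!r}")
```

Note that the exact string bigml:dataset is a destination, while bigml:dataset/ followed by an ID is a source; the sketch checks the exact destination forms first so the two never collide.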
If you use BigML as a destination, dbcrossbar will automatically activate --display-output-locators, and it will print locators for all the created resources on standard output. Column types on created "source" resources will be set to something appropriate (but see
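Because the created resources are reported on standard output, a script can capture that output to find them. The following is a minimal sketch under a stated assumption: it assumes each locator is printed on its own line and begins with "bigml:", and it skips any other lines. parse_output_locators and run_and_collect are hypothetical helpers, and the caller supplies the full dbcrossbar argument list.

```python
import subprocess
from typing import List


def parse_output_locators(stdout: str) -> List[str]:
    """Collect locators for created BigML resources from captured stdout.

    Assumption: one locator per line, each starting with "bigml:";
    blank lines and any other output are ignored.
    """
    locators = []
    for line in stdout.splitlines():
        line = line.strip()
        if line.startswith("bigml:"):
            locators.append(line)
    return locators


def run_and_collect(argv: List[str]) -> List[str]:
    """Run a dbcrossbar command given as an argv list and parse its stdout."""
    # check=True raises CalledProcessError if dbcrossbar exits with an error.
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return parse_output_locators(result.stdout)
```

Filtering on the "bigml:" prefix keeps the sketch tolerant of extra output lines; if dbcrossbar's actual output format differs, only parse_output_locators needs to change.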
The BigML driver requires more configuration than most.
You'll need to set the following environment variables:
BIGML_USERNAME: Set this to your BigML username.
BIGML_API_KEY: Set this to your BigML API key.
BIGML_DOMAIN (optional): Set this to the domain name of your BigML instance, if it's not located at the standard address.
AWS_SECRET_ACCESS_KEY: Set these to your AWS credentials when using BigML as a destination. Do not set AWS_SESSION_TOKEN; it will not work with BigML.
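A quick pre-flight check of these variables can save a failed run. This is a minimal Python sketch, not something dbcrossbar provides: check_bigml_env is a hypothetical helper that only looks at the variables named in the list above, treats BIGML_DOMAIN as optional, and flags AWS_SESSION_TOKEN when BigML is the destination. dbcrossbar performs its own validation.

```python
from typing import List, Mapping


def check_bigml_env(env: Mapping[str, str], as_destination: bool) -> List[str]:
    """Return a list of configuration problems (empty if the environment looks OK).

    Hypothetical pre-flight check based on the environment variables described above.
    """
    problems = []
    for name in ("BIGML_USERNAME", "BIGML_API_KEY"):
        if not env.get(name):
            problems.append(f"{name} is not set")
    # BIGML_DOMAIN is optional, so it is never reported as missing.
    if as_destination:
        if not env.get("AWS_SECRET_ACCESS_KEY"):
            problems.append("AWS_SECRET_ACCESS_KEY is not set")
        if "AWS_SESSION_TOKEN" in env:
            problems.append("AWS_SESSION_TOKEN is set, but it will not work with BigML")
    return problems
```

In a real script you would pass os.environ as env; it behaves like a mapping of strings, so no conversion is needed.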
You'll also need to pass the following on the command line when using BigML as a destination:
--temporary=s3://$S3_TEMP_BUCKET: Specify where to stage files for loading into BigML. This is not needed when using BigML as a source.
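The rule that staging is needed only for writes can be captured in a few lines. This is a minimal Python sketch: temporary_args is a hypothetical helper, and its s3_temp_bucket parameter stands for the same value as $S3_TEMP_BUCKET above (a bucket name, optionally followed by a path).

```python
from typing import List


def temporary_args(s3_temp_bucket: str, as_destination: bool) -> List[str]:
    """Return the --temporary flag for a BigML copy (hypothetical helper).

    Files are staged in S3 only when BigML is the destination; reading from
    bigml:dataset/$ID needs no staging area, so no flag is returned.
    """
    if not as_destination:
        return []
    if not s3_temp_bucket:
        raise ValueError("an S3 bucket is required when BigML is the destination")
    return [f"--temporary=s3://{s3_temp_bucket}"]
```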
You can also specify the following options when using BigML as a destination:
name: The human-readable name of the resource to create.
optype_for_text: The BigML optype to use for text fields. This defaults to text. You may want to set it to categorical if your text fields contain a small set of fixed strings.
tag: This may be specified repeatedly to attach tags to the created resources.
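These options are passed using the --to-arg=$NAME=$VALUE form listed among the supported features. The following minimal Python sketch formats them; bigml_to_args is a hypothetical helper that only builds strings, and dbcrossbar itself decides how each value is interpreted. Because tag may be repeated, the sketch emits one flag per tag.

```python
from typing import Iterable, List, Optional


def bigml_to_args(
    name: Optional[str] = None,
    optype_for_text: Optional[str] = None,
    tags: Iterable[str] = (),
) -> List[str]:
    """Turn the BigML destination options into --to-arg=$NAME=$VALUE flags.

    Hypothetical helper: it only formats arguments and does not validate values.
    """
    args = []
    if name is not None:
        args.append(f"--to-arg=name={name}")
    if optype_for_text is not None:
        # For example "categorical" instead of the default "text".
        args.append(f"--to-arg=optype_for_text={optype_for_text}")
    # "tag" may be specified repeatedly, so emit one flag per tag.
    for tag in tags:
        args.append(f"--to-arg=tag={tag}")
    return args
```

Returning a list rather than a single string is deliberate: when the flags are passed to subprocess.run as separate argv entries, a name containing spaces needs no shell quoting.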
bigml features:
- conv FROM
- cp FROM:
- cp TO:
  --to-arg=$NAME=$VALUE
  --if-exists=error
  --if-exists=append
  --if-exists=overwrite
  --if-exists=upsert-on:col
Note that --if-exists is simply ignored, because BigML always creates new resources.