CSV interchange format
Internally, dbcrossbar converts all data into CSV streams. For many standard types, all input drivers are required to provide byte-for-byte identical CSV data:
id,test_bool,test_date,test_int16,test_int32,test_int64,test_text,test_timestamp_without_time_zone,test_timestamp_with_time_zone,test_uuid,select,testCapitalized,test_enum
1,t,1969-07-20,-32768,-2147483648,-9223372036854775808,hello,1969-07-20T20:17:39,1969-07-20T20:17:39Z,084ec3bb-3193-4ffb-8b74-99a288e8432c,,,red
2,f,2001-01-01,32767,2147483647,9223372036854775807,,,,,,,green
3,,,,,,,,,,,,
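As a rough illustration of how a consumer might read the scalar encodings shown above, here is a minimal Python sketch. It uses only a subset of the sample columns, and it assumes that an empty field stands for NULL and that booleans are always written as `t` or `f`, which is what the sample suggests but is not stated explicitly. The helper names (`parse_bool`, `parse_timestamp_tz`, `decode_row`) and the `PARSERS` table are hypothetical and not part of dbcrossbar.

```python
import csv
import io
from datetime import date, datetime

# A subset of the sample columns and rows shown above.
SAMPLE = """id,test_bool,test_date,test_int64,test_text,test_timestamp_with_time_zone
1,t,1969-07-20,-9223372036854775808,hello,1969-07-20T20:17:39Z
3,,,,,
"""


def parse_bool(text):
    # The sample encodes booleans as "t" and "f".
    return {"t": True, "f": False}[text]


def parse_timestamp_tz(text):
    # Older Python versions reject a trailing "Z" in fromisoformat,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


# Hypothetical per-column parsers, keyed by column name.
PARSERS = {
    "id": int,
    "test_bool": parse_bool,
    "test_date": date.fromisoformat,
    "test_int64": int,
    "test_text": str,
    "test_timestamp_with_time_zone": parse_timestamp_tz,
}


def decode_row(row):
    # Assumption: an empty field means NULL for every column type.
    return {
        name: None if value == "" else PARSERS[name](value)
        for name, value in row.items()
    }


rows = [decode_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
```

Python integers are arbitrary precision, so the 64-bit extremes survive unchanged; in a language with fixed-width integers you would want to pick a type wide enough for the `test_int64` column.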
For more complex types such as arrays, structs, JSON, and GeoJSON data, we embed JSON into the CSV file:
test_null,test_not_null,test_bool,test_bool_array,test_date,test_date_array,test_float32,test_float32_array,test_float64,test_float64_array,test_geojson,test_geojson_3857,test_int16,test_int16_array,test_int32,test_int32_array,test_int64,test_int64_array,test_json,test_text,test_text_array,test_timestamp_without_time_zone,test_timestamp_without_time_zone_array,test_timestamp_with_time_zone,test_timestamp_with_time_zone_array,test_uuid,test_uuid_array,test_enum
,hi,t,"[true,false]",1969-07-20,"[""1969-07-20""]",1e+37,"[1e-37,0,100.125,1e+37]",1e+37,"[1e-37,0,1000.125,1e+37]","{""type"":""Point"",""coordinates"":[-71,42]}","{""type"":""Point"",""coordinates"":[-71,42]}",16,"[-32768,0,32767]",32,"[-2147483648,0,2147483647]",64,"[""-9223372036854775808"",""0"",""9223372036854775807""]","{""x"": 1, ""y"": 2}",hello,"[""hello"",""""]",1969-07-20T20:17:39.5,"[""1969-07-20T20:17:39.5""]",1969-07-20T20:17:39.5Z,"[""1969-07-20T20:17:39.5Z""]",084ec3bb-3193-4ffb-8b74-99a288e8432c,"[""084ec3bb-3193-4ffb-8b74-99a288e8432c""]",red
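The embedded JSON can be unpacked with an ordinary CSV parser followed by a JSON parser. The following Python sketch does this for a few of the columns above. Note that in the sample, the elements of `test_int64_array` are written as JSON strings rather than JSON numbers, presumably so that parsers backed by 64-bit floats do not lose precision; the sketch converts them back to integers. The `decode_json_cell` helper is hypothetical and not part of dbcrossbar.

```python
import csv
import io
import json

# A few of the JSON-bearing columns from the sample row above.
# Inside a quoted CSV field, each literal double quote is doubled.
SAMPLE = (
    'test_bool_array,test_int64_array,test_geojson,test_text_array\n'
    '"[true,false]",'
    '"[""-9223372036854775808"",""0"",""9223372036854775807""]",'
    '"{""type"":""Point"",""coordinates"":[-71,42]}",'
    '"[""hello"",""""]"\n'
)


def decode_json_cell(column, text):
    # Every cell in these columns is a complete JSON document.
    value = json.loads(text)
    if column == "test_int64_array":
        # The sample stores 64-bit integers as strings; restore them.
        value = [int(item) for item in value]
    return value


reader = csv.DictReader(io.StringIO(SAMPLE))
row = {name: decode_json_cell(name, text) for name, text in next(reader).items()}
```

After this runs, `row["test_geojson"]` is a plain dictionary that can be handed to any GeoJSON-aware library, and `row["test_text_array"]` keeps the empty string as a distinct element.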
Tricks for preparing CSV data
If your input CSV files use an incompatible format, there are several things that might help. If your CSV files are invalid, non-standard, or full of junk, then you may be able to use scrubcsv or xsv to fix the worst problems.
If you need to clean up your data manually, then you may want to consider using dbcrossbar to load your data into BigQuery, and set your columns to type STRING. Once this is done, you can parse and normalize your data quickly using SQL queries.