CSV interchange format

Internally, dbcrossbar converts all data into CSV streams. For many standard types, all input drivers are required to provide byte-for-byte identical CSV data:

id,test_bool,test_date,test_int16,test_int32,test_int64,test_text,test_timestamp_without_time_zone,test_timestamp_with_time_zone,test_uuid,select,testCapitalized
1,t,1969-07-20,-32768,-2147483648,-9223372036854775808,hello,1969-07-20T20:17:39,1969-07-20T20:17:39Z,084ec3bb-3193-4ffb-8b74-99a288e8432c,,
2,f,2001-01-01,32767,2147483647,9223372036854775807,,,,,,
3,,,,,,,,,,,
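Any standards-compliant CSV writer can produce rows in this shape, as long as values are serialized the way the example output shows: booleans as t/f, NULLs as empty fields, and timestamps in ISO 8601 format. A minimal Python sketch (the column subset and the to_cell helper are illustrative, not part of dbcrossbar):

```python
import csv
import io

def to_cell(value):
    """Serialize one simple-typed value as in the example output above."""
    if value is None:
        return ""                      # NULL columns become empty fields
    if isinstance(value, bool):
        return "t" if value else "f"   # booleans as t/f
    return str(value)

buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["id", "test_bool", "test_int16", "test_text"])
writer.writerow([to_cell(v) for v in [1, True, -32768, "hello"]])
writer.writerow([to_cell(v) for v in [2, False, 32767, None]])

print(buf.getvalue())
```

This prints three lines: the header, then "1,t,-32768,hello", then "2,f,32767," (the trailing empty field is the NULL text column).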

For more complex types such as arrays, structs, JSON, and GeoJSON data, we embed JSON into the CSV file:

test_null,test_not_null,test_bool,test_bool_array,test_date,test_date_array,test_float32,test_float32_array,test_float64,test_float64_array,test_geojson,test_geojson_3857,test_int16,test_int16_array,test_int32,test_int32_array,test_int64,test_int64_array,test_json,test_text,test_text_array,test_timestamp_without_time_zone,test_timestamp_without_time_zone_array,test_timestamp_with_time_zone,test_timestamp_with_time_zone_array,test_uuid,test_uuid_array
,hi,t,"[true,false]",1969-07-20,"[""1969-07-20""]",1e+37,"[1e-37,0,100.125,1e+37]",1e+37,"[1e-37,0,1000.125,1e+37]","{""type"":""Point"",""coordinates"":[-71,42]}","{""type"":""Point"",""coordinates"":[-71,42]}",16,"[-32768,0,32767]",32,"[-2147483648,0,2147483647]",64,"[""-9223372036854775808"",""0"",""9223372036854775807""]","{""x"": 1, ""y"": 2}",hello,"[""hello"",""""]",1969-07-20T20:17:39.5,"[""1969-07-20T20:17:39.5""]",1969-07-20T20:17:39.5Z,"[""1969-07-20T20:17:39.5Z""]",084ec3bb-3193-4ffb-8b74-99a288e8432c,"[""084ec3bb-3193-4ffb-8b74-99a288e8432c""]"
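A cell holding an array, struct, or GeoJSON value is simply compact JSON written into a CSV field; a standards-compliant CSV writer then wraps the field in quotes and doubles any embedded quotes, producing the doubled-quote patterns visible above. A minimal Python sketch (note that 64-bit integers travel as JSON strings inside arrays, as in the example, to avoid precision loss in JSON parsers):

```python
import csv
import io
import json

buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["test_bool_array", "test_geojson", "test_int64_array"])
writer.writerow([
    # Compact separators (no spaces) match the example output above.
    json.dumps([True, False], separators=(",", ":")),
    json.dumps({"type": "Point", "coordinates": [-71, 42]},
               separators=(",", ":")),
    # 64-bit values are carried as JSON strings, not JSON numbers.
    json.dumps(["-9223372036854775808", "0", "9223372036854775807"],
               separators=(",", ":")),
])

print(buf.getvalue())
```

The csv module's default quoting handles the quote-doubling automatically, so the second printed line matches the escaping style shown in the example row above.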

Tricks for preparing CSV data

If your input CSV files use an incompatible format, several things might help. If the files are invalid, non-standard, or full of junk, you may be able to use scrubcsv or xsv to fix the worst problems.

If you need to clean up your data manually, consider using dbcrossbar to load it into BigQuery with all columns declared as type STRING. Once the raw data is loaded, you can parse and normalize it quickly using SQL queries.
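As one hedged illustration of this approach, a BigQuery standard SQL query can cast the raw STRING columns into proper types; the table and column names below are hypothetical, and SAFE_CAST returns NULL instead of failing when a value cannot be parsed:

```sql
-- Normalize raw STRING columns loaded via dbcrossbar (names are illustrative).
SELECT
  SAFE_CAST(id AS INT64) AS id,
  SAFE_CAST(test_timestamp AS TIMESTAMP) AS test_timestamp,
  NULLIF(TRIM(test_text), '') AS test_text   -- treat empty strings as NULL
FROM raw_import.my_table
```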