json - How to format the TSV file in Druid


I am trying to load a TSV file into Druid using an ingestion spec.

My most up-to-date spec is below:

{
    "type" : "index",
    "spec" : {
        "ioConfig" : {
            "type" : "index",
            "inputSpec" : {
                "type" : "local",
                "baseDir" : "quickstart",
                "filter" : "test_data.json"
            }
        },
        "dataSchema" : {
            "dataSource" : "local",
            "granularitySpec" : {
                "type" : "uniform",
                "segmentGranularity" : "hour",
                "queryGranularity" : "none",
                "intervals" : ["2016-07-18/2016-07-22"]
            },
            "parser" : {
                "type" : "string",
                "parseSpec" : {
                    "format" : "json",
                    "dimensionsSpec" : {
                        "dimensions" : ["name", "email", "age"]
                    },
                    "timestampSpec" : {
                        "format" : "yyyy-MM-dd HH:mm:ss",
                        "column" : "date"
                    }
                }
            },
            "metricsSpec" : [
                {
                    "name" : "count",
                    "type" : "count"
                },
                {
                    "type" : "doubleSum",
                    "name" : "age",
                    "fieldName" : "age"
                }
            ]
        }
    }
}

If the schema looks like this:

schema: name    email    age

and the actual dataset looks like this:

name    email    age
bob    jones    23
billy    jones    45

is this how the columns should be formatted in the above dataset for TSV? That is, should name, email and age (the columns) come first, followed by the actual data? I'm confused about how Druid will know how to map the columns to the actual dataset in TSV format.

TSV stands for tab-separated values. It looks the same as CSV, but it uses tabs instead of commas, e.g.

name<tab>age<tab>address
paul<tab>23<tab>1115 w franklin
bessy cow<tab>5<tab>big farm way
zeke<tab>45<tab>w main st

You use the first line as a header to define the column names, so you can then use "name", "age" or "email" as dimensions in the spec file.
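To make the header-to-value mapping concrete, here is a minimal sketch in Python. It is not Druid's parser; it only uses the standard `csv` module to illustrate the idea that each value is paired with the header name at the same position. The `rows_by_column` helper and the `TSV_TEXT` constant are names made up for this illustration. On the Druid side, the TSV parseSpec also accepts a `columns` list that names the fields in file order; whether a header row is read automatically may depend on the Druid version, so treat that part as something to verify against your version's docs.

```python
import csv
import io

# Sample data from the example above, written with real tab characters.
TSV_TEXT = (
    "name\tage\taddress\n"
    "paul\t23\t1115 w franklin\n"
    "bessy cow\t5\tbig farm way\n"
    "zeke\t45\tw main st\n"
)


def rows_by_column(tsv_text):
    """Pair every value with the header name found at the same position."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [dict(row) for row in reader]


rows = rows_by_column(TSV_TEXT)
for row in rows:
    print(row)
```

The first data line becomes a mapping with `name` set to `paul`, `age` set to `23` and `address` set to `1115 w franklin`, which is the same positional pairing a dimension name in the spec relies on.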

As for GMT and UTC, they are basically the same:

There is no time difference between Greenwich Mean Time and Coordinated Universal Time.

The first one is a time zone, the other one is a time standard.

BTW, don't forget to include a column with a time value in your TSV file!
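If the time column carries an explicit offset, it can help to check which UTC instant each value refers to before ingesting. The sketch below is only an illustration in Python, not part of Druid; `to_utc` is a hypothetical helper, and the input is an ISO 8601 value with a `+01:00` offset like the ones used in the sample file in this answer.

```python
from datetime import datetime, timezone


def to_utc(iso_timestamp):
    """Parse an ISO 8601 timestamp with an offset and convert it to UTC."""
    parsed = datetime.fromisoformat(iso_timestamp)
    if parsed.tzinfo is None:
        # Without an offset the instant is ambiguous, so refuse to guess.
        raise ValueError("timestamp has no UTC offset: " + iso_timestamp)
    return parsed.astimezone(timezone.utc)


utc_value = to_utc("2016-07-16T19:20:30+01:00")
print(utc_value.isoformat())  # 2016-07-16T18:20:30+00:00
```

The converted value has a zero offset, which is also what a GMT timestamp would show, since the two do not differ in time.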

So, e.g., if you have a TSV file that looks like this:

"name"    "position"    "office"    "age"    "start_date"    "salary"
"airi satou"    "accountant"    "tokyo"    "33"    "2016-07-16T19:20:30+01:00"    "162700"
"angelica ramos"    "chief executive officer (ceo)"    "london"    "47"    "2016-07-16T19:20:30+01:00"    "1200000"

your spec file should look like this:

{
    "spec" : {
        "ioConfig" : {
            "inputSpec" : {
                "type" : "local",
                "baseDir" : "path_to_folder",
                "filter" : "name_of_the_file(s)"
            }
        },
        "dataSchema" : {
            "dataSource" : "local",
            "granularitySpec" : {
                "type" : "uniform",
                "segmentGranularity" : "hour",
                "queryGranularity" : "none",
                "intervals" : ["2016-07-01/2016-07-28"]
            },
            "parser" : {
                "type" : "string",
                "parseSpec" : {
                    "format" : "tsv",
                    "dimensionsSpec" : {
                        "dimensions" : [
                            "position",
                            "age",
                            "office"
                        ]
                    },
                    "timestampSpec" : {
                        "format" : "auto",
                        "column" : "start_date"
                    }
                }
            },
            "metricsSpec" : [
                {
                    "name" : "count",
                    "type" : "count"
                },
                {
                    "name" : "sum_sallary",
                    "type" : "longSum",
                    "fieldName" : "salary"
                }
            ]
        }
    }
}
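Before submitting a spec like this, it may be worth checking that every column name the spec refers to actually appears in the file's header. The following Python sketch does that for the example above. `check_spec_columns` is a hypothetical helper, and the header is assumed to be written without the surrounding quotes. The `columns` entry in the printed fragment reflects my understanding that Druid's TSV parseSpec takes the list of column names in file order; confirm against the documentation for your Druid version.

```python
import json


def check_spec_columns(header, dimensions, timestamp_column, metric_fields):
    """Return the names used by the spec that are missing from the TSV header."""
    needed = list(dimensions) + [timestamp_column] + list(metric_fields)
    return [name for name in needed if name not in header]


# Header of the sample file, assumed here without quotes.
header_line = "name\tposition\toffice\tage\tstart_date\tsalary"
header = header_line.split("\t")

missing = check_spec_columns(
    header,
    dimensions=["position", "age", "office"],
    timestamp_column="start_date",
    metric_fields=["salary"],
)

# Fragment that could be merged into the parseSpec; columns are in file order.
parse_spec_fragment = {"format": "tsv", "columns": header}

print(missing)
print(json.dumps(parse_spec_fragment, indent=4))
```

An empty `missing` list means the dimensions, the timestamp column and the metric field all line up with the header. A name such as "email", which is not in this file, would be reported instead.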
