python - Create a DataFrame from a list in pyspark.sql


I'm totally lost in a weird situation. I have a list li:

li = example_data.map(lambda x: get_labeled_prediction(w, x)).collect()
print li, type(li)

The output looks like:

[(0.0, 59.0), (0.0, 51.0), (0.0, 81.0), (0.0, 8.0), (0.0, 86.0), (0.0, 86.0), (0.0, 60.0), (0.0, 54.0), (0.0, 54.0), (0.0, 84.0)] <type 'list'> 

When I try to create a DataFrame from this list:

m = sqlContext.createDataFrame(l, ["prediction", "label"])

it threw an error message:

TypeError                                 Traceback (most recent call last)
<ipython-input-90-4a49f7f67700> in <module>()
     56 l = example_data.map(lambda x: get_labeled_prediction(w,x)).collect()
     57 print l, type(l)
---> 58 m = sqlContext.createDataFrame(l, ["prediction", "label"])
     59 '''
     60 g = example_data.map(lambda x:gradient_summand(w, x)).sum()

/databricks/spark/python/pyspark/sql/context.py in createDataFrame(self, data, schema, samplingRatio)
    423             rdd, schema = self._createFromRDD(data, schema, samplingRatio)
    424         else:
--> 425             rdd, schema = self._createFromLocal(data, schema)
    426         jrdd = self._jvm.SerDeUtil.toJavaArray(rdd._to_java_object_rdd())
    427         jdf = self._ssql_ctx.applySchemaToPythonRDD(jrdd.rdd(), schema.json())

/databricks/spark/python/pyspark/sql/context.py in _createFromLocal(self, data, schema)
    339 
    340         if schema is None or isinstance(schema, (list, tuple)):
--> 341             struct = self._inferSchemaFromList(data)
    342             if isinstance(schema, (list, tuple)):
    343                 for i, name in enumerate(schema):

/databricks/spark/python/pyspark/sql/context.py in _inferSchemaFromList(self, data)
    239             warnings.warn("inferring schema from dict is deprecated,"
    240                           "please use pyspark.sql.Row instead")
--> 241         schema = reduce(_merge_type, map(_infer_schema, data))
    242         if _has_nulltype(schema):
    243             raise ValueError("Some of types cannot be determined after inferring")

/databricks/spark/python/pyspark/sql/types.py in _infer_schema(row)
    831         raise TypeError("Can not infer schema for type: %s" % type(row))
    832 
--> 833     fields = [StructField(k, _infer_type(v), True) for k, v in items]
    834     return StructType(fields)
    835 

/databricks/spark/python/pyspark/sql/types.py in _infer_type(obj)
    808             return _infer_schema(obj)
    809         except TypeError:
--> 810             raise TypeError("not supported type: %s" % type(obj))
    811 
    812 

TypeError: not supported type: <type 'numpy.float64'>

But when I hard-code the list inline:

tt = sqlContext.createDataFrame([(0.0, 59.0), (0.0, 51.0), (0.0, 81.0), (0.0, 8.0), (0.0, 86.0), (0.0, 86.0), (0.0, 60.0), (0.0, 54.0), (0.0, 54.0), (0.0, 84.0)], ["prediction", "label"])
tt.collect()

it works well:

[Row(prediction=0.0, label=59.0),
 Row(prediction=0.0, label=51.0),
 Row(prediction=0.0, label=81.0),
 Row(prediction=0.0, label=8.0),
 Row(prediction=0.0, label=86.0),
 Row(prediction=0.0, label=86.0),
 Row(prediction=0.0, label=60.0),
 Row(prediction=0.0, label=54.0),
 Row(prediction=0.0, label=54.0),
 Row(prediction=0.0, label=84.0)]

What caused this problem, and how do I fix it? Any hint is appreciated.

You have a list of numpy.float64 values, and I think Spark's schema inference doesn't support that type. When you hard-code the list, on the other hand, it's a list of plain Python floats, which schema inference handles fine.
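
You can see the difference with a quick check (a sketch, assuming li is the collected list from your question; the comments show the Python 2 output you would expect, matching the <type '...'> formatting above):

print type(li[0][0])   # <type 'numpy.float64'> -- the type _infer_type rejects
print type(0.0)        # <type 'float'>         -- the type of the hard-coded literals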
There is a question here whose answer goes over how to convert numpy's datatypes to Python's native ones; applied to your case, that means casting each value with float() before building the DataFrame.
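
A minimal sketch of the fix, assuming get_labeled_prediction(w, x) returns a (prediction, label) pair of numpy.float64 values as in your code; float() on a numpy scalar returns a native Python float:

# Cast each numpy.float64 to a native Python float inside the map, so the
# collected list contains plain float tuples that schema inference accepts.
li = example_data.map(
    lambda x: tuple(float(v) for v in get_labeled_prediction(w, x))
).collect()

m = sqlContext.createDataFrame(li, ["prediction", "label"])
m.show()

If you'd rather not touch the map step, you can convert after collecting instead: li = [(float(p), float(lbl)) for p, lbl in li].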

