Why does Spark tell me “name 'sqlContext' is not defined”, and how can I use sqlContext?

I am trying to run a spark-ml example, but

from pyspark import SparkContext
import pyspark.sql
sc = SparkContext(appName="PythonStreamingQueueStream")
training = sqlContext.createDataFrame([
(1.0, Vectors.dense([0.0, 1.1, 0.1])),
(0.0, Vectors.dense([2.0, 1.0, -1.0])),
(0.0, Vectors.dense([2.0, 1.3, 1.0])),
(1.0, Vectors.dense([0.0, 1.2, -0.5]))], ["label", "features"])

does not run, because the terminal tells me that

NameError: name 'sqlContext' is not defined

Why does this happen, and how can I solve it?


2 Answers

If you are using the Apache Spark 1.x line (i.e. prior to Apache Spark 2.0), to access the sqlContext you need to import SQLContext and create an instance from your SparkContext, i.e.

from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

If you're using Apache Spark 2.0, you can just use the SparkSession directly instead. Your code then becomes

training = spark.createDataFrame(...)

For more information, please refer to the Spark SQL Programming Guide.
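
As a rough end-to-end sketch of the Spark 2.x route, assuming pyspark 2.0 or later is installed: in a standalone script the `spark` object is not predefined, so it is created through `SparkSession.builder`, and `Vectors` is imported from `pyspark.ml.linalg`. The names `ROWS`, `build_training_df`, `main`, and the app name are illustrative only and not part of the question's code.

```python
# Plain-Python training rows taken from the question: (label, feature values).
ROWS = [
    (1.0, [0.0, 1.1, 0.1]),
    (0.0, [2.0, 1.0, -1.0]),
    (0.0, [2.0, 1.3, 1.0]),
    (1.0, [0.0, 1.2, -0.5]),
]


def build_training_df(spark):
    """Turn ROWS into a DataFrame with 'label' and 'features' columns."""
    # Imported lazily so this module can be loaded even without pyspark.
    from pyspark.ml.linalg import Vectors  # Spark 2.x location of Vectors

    data = [(label, Vectors.dense(values)) for label, values in ROWS]
    return spark.createDataFrame(data, ["label", "features"])


def main():
    """Create a SparkSession, build the DataFrame, and print it."""
    from pyspark.sql import SparkSession

    # The app name is an arbitrary placeholder.
    spark = SparkSession.builder.appName("TrainingExample").getOrCreate()
    try:
        build_training_df(spark).show()
    finally:
        spark.stop()
```

Calling `main()` from a script should replace the failing `sqlContext.createDataFrame(...)` call in the question. On Spark 1.x the same rows would instead go through an explicitly created `SQLContext`.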

from pyspark.sql import SparkSession, SQLContext
spark = SparkSession.builder.appName("Basics").getOrCreate()
sc = spark.sparkContext
sqlContext = SQLContext(sc)
df = sqlContext.range(0, 10)

The code above should solve your issue.
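
For code that has to run on both Spark 1.x and 2.x, one option is to pick the entry point at import time. The following is only a sketch under that assumption; `get_sql_entry_point` is a hypothetical helper name, and it relies on both `SparkSession` and `SQLContext` exposing `createDataFrame` and `range`.

```python
def get_sql_entry_point(app_name="Basics"):
    """Return a SparkSession on Spark 2.x, or a SQLContext on Spark 1.x."""
    try:
        # SparkSession only exists from Spark 2.0 onwards.
        from pyspark.sql import SparkSession
    except ImportError:
        # Spark 1.x fallback: build the SQLContext from a SparkContext.
        from pyspark import SparkContext
        from pyspark.sql import SQLContext

        sc = SparkContext(appName=app_name)
        return SQLContext(sc)
    return SparkSession.builder.appName(app_name).getOrCreate()
```

With this helper, `sqlContext = get_sql_entry_point()` followed by `sqlContext.range(0, 10)` or `sqlContext.createDataFrame(...)` should work on either version line.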
