ModuleNotFoundError: No module named 'py4j'

I installed Spark and I am having trouble importing the pyspark module in IPython. I get the following error:

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-2-49d7c4e178f8> in <module>
----> 1 import pyspark

/opt/spark/python/pyspark/__init__.py in <module>
     44
     45 from pyspark.conf import SparkConf
---> 46 from pyspark.context import SparkContext
     47 from pyspark.rdd import RDD
     48 from pyspark.files import SparkFiles

/opt/spark/python/pyspark/context.py in <module>
     27 from tempfile import NamedTemporaryFile
     28
---> 29 from py4j.protocol import Py4JError
     30
     31 from pyspark import accumulators

ModuleNotFoundError: No module named 'py4j'
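In the traceback, pyspark itself is loaded from /opt/spark/python, but its dependency py4j cannot be found. A quick way to confirm which of the two the current interpreter can see is a check like the one below. It is only a sketch: the helper name check_modules is hypothetical, and it relies on the standard library's importlib.util.find_spec, which returns None for a top-level module that is not on sys.path.

```python
import importlib.util


def check_modules(names=("pyspark", "py4j")):
    """Report whether each top-level module can be located on sys.path.

    Uses find_spec, which locates a module without executing it, so a
    missing dependency of pyspark does not raise here.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Hypothetical usage in the failing session:
# print(check_modules())  # e.g. {'pyspark': True, 'py4j': False}
```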

1 Answer

If you can run Spark directly, you probably need to fix the PYTHONPATH environment variable. Python needs both $SPARK_HOME/python (which contains pyspark) and the bundled py4j source zip on its path. Check the file name in $SPARK_HOME/python/lib/; for Spark 2.4.3, the file is py4j-0.10.7-src.zip:

export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH
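The same two path entries can also be added from Python, for example at the top of a notebook before importing pyspark. This is an illustrative sketch, not part of the original answer: the function name add_pyspark_to_path is hypothetical, and it assumes the standard Spark layout with exactly one py4j-*-src.zip under $SPARK_HOME/python/lib. It globs for the zip so the py4j version does not have to be hard-coded.

```python
import glob
import os
import sys


def add_pyspark_to_path(spark_home):
    """Prepend $SPARK_HOME/python and the bundled py4j zip to sys.path.

    Mirrors the PYTHONPATH export above. Assumes a Spark distribution
    that ships a single python/lib/py4j-<version>-src.zip.
    """
    python_dir = os.path.join(spark_home, "python")
    pattern = os.path.join(python_dir, "lib", "py4j-*-src.zip")
    zips = sorted(glob.glob(pattern))
    if not zips:
        raise FileNotFoundError(f"no file matching {pattern}")
    entries = [python_dir, zips[0]]
    # Insert in reverse so the final order matches the PYTHONPATH export:
    # python_dir first, then the py4j zip.
    for entry in reversed(entries):
        if entry not in sys.path:
            sys.path.insert(0, entry)
    return entries
```

For example, call add_pyspark_to_path("/opt/spark") (the location shown in the traceback) and then import pyspark. Changing sys.path only affects the current interpreter; adding the export line to your shell profile makes the fix apply to every new session.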
