In this post, I’ll cover, step by step, how to install Pyspark on a MacBook Pro M3. These 6 steps conclude with verifying that Pyspark can be used from a jupyter notebook, inside a conda virtual environment.
How to Install Pyspark on MacBook Pro M3 – composite image by author
Step 1: Install Homebrew
To start off, make sure that you have Homebrew installed on your computer. Homebrew bills itself as “the missing package manager” for macOS and Linux. I installed it by running the following command from the terminal:
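At the time of writing, the official install command published at brew.sh is:

```shell
# Download and run the official Homebrew installer script
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```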
Follow the instructions as they appear, and respond according to your preferences.
Step 2: Install & Setup Python
Next we can install Python. I did this through Miniconda, a lightweight version of the Anaconda environment and package management system. Run this command from terminal:
brew install --cask miniconda
Like before, follow the instructions on the command line as they appear.
After refreshing the terminal to ensure conda is set up in the current shell, I created a virtual environment for the rest of the steps to come. To do this, run:
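If the conda command isn’t found after installation, the shell usually still needs conda’s init hooks loaded. One common way to do this (assuming the default macOS zsh shell) is:

```shell
# Register conda's startup hooks in ~/.zshrc, then reload the file
conda init zsh
source ~/.zshrc
```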
conda create -n pyspark_env python=3.9
where pyspark_env is the name of the new virtual environment. Activate the new environment:
conda activate pyspark_env
Now we can install two packages in our new virtual environment that we’ll make use of later. Run the following commands:
conda install jupyter
conda install findspark
Step 3: Install Java
Pyspark is a Python wrapper around Spark, which relies on Java. To install Java, I executed the following command:
brew install openjdk@11
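Homebrew installs openjdk@11 “keg-only”, so macOS’s own Java tooling may not find it. The symlink below follows the caveat brew itself prints after installation (the path shown assumes Apple-silicon Homebrew under /opt/homebrew):

```shell
# Make the Homebrew JDK visible to macOS's /usr/libexec/java_home machinery
sudo ln -sfn /opt/homebrew/opt/openjdk@11/libexec/openjdk.jdk \
  /Library/Java/JavaVirtualMachines/openjdk-11.jdk
```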
Step 4: Install Apache Spark
This step installs not only Spark but also the Pyspark library. Run the following command to complete this step:
brew install apache-spark
Note that in my particular case, I ended up with apache-spark version 3.5.0 after running this installation.
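Your version may differ from mine. One way to check which version brew actually installed is:

```shell
# Show the installed version(s) of the apache-spark formula
brew list --versions apache-spark
```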
Step 5: Update .zshrc File
I had to add the following lines to my .zshrc file in order to have Pyspark accessible within Python. Open this file in your favorite text editor, and then append the following:
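A typical set of additions, assuming the Homebrew paths from the steps above (adjust the Spark version number to match your install), looks like this:

```shell
# Point JAVA_HOME at the Homebrew-installed JDK 11
export JAVA_HOME="$(/usr/libexec/java_home -v 11)"
# Point SPARK_HOME at the Homebrew Spark install (version may differ)
export SPARK_HOME="/opt/homebrew/Cellar/apache-spark/3.5.0/libexec"
# Make the spark-submit and pyspark launchers available on the PATH
export PATH="$SPARK_HOME/bin:$PATH"
```

After saving the file, run source ~/.zshrc (or open a new terminal) so the changes take effect.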
Step 6: Verify the Installation
Let’s now test the installation and setup. Create a new jupyter notebook by running the following command:
jupyter notebook
A new window should now appear in your browser. Select New, then Notebook, to create a new notebook. Within this notebook, enter the following code into one or more cells:
import findspark
# Point findspark at the Homebrew Spark installation
findspark.init("/opt/homebrew/Cellar/apache-spark/3.5.0/libexec")
from pyspark.sql import SparkSession
# Start a local Spark session
spark = SparkSession.builder.appName("test").getOrCreate()
# Build a small test DataFrame and display it
data = [("xyz","400"),("abc","450"),("qwe","678")]
df = spark.createDataFrame(data)
df.show()
If everything works correctly, you should see the result from the last line showing as a table similar to:
+---+---+
| _1| _2|
+---+---+
|xyz|400|
|abc|450|
|qwe|678|
+---+---+
Final Remarks on Installing Pyspark on MacBook Pro M3
In this post, I covered how to install and set up a working Pyspark environment. These are the steps I worked through on my own MacBook Pro M3. I hope this information helps you in your own projects. If you have any questions or comments, please leave them below.
Hi, I’m Michael Attard, a Data Scientist with a background in Astrophysics. I enjoy helping others on their journey to learn more about machine learning and how it can be applied in industry.
Thanks, bro, for this post. I had a problem because I didn’t have information on how to configure .zshrc, and my Spark wouldn’t run.
In the end, I only installed python 3.9, openjdk 17, scala 2.13, and apache-spark 3.5.3 from brew. I didn’t need Anaconda etc. on my MacBook Pro M2 Pro.
I’m glad to hear this post helped you out! And yes, if you prefer not to use Anaconda, then your approach is perfectly fine.