In this post, I’ll walk step by step through installing PySpark on a MacBook Pro M3. The six steps conclude by verifying that PySpark can be used from a Jupyter notebook inside a conda virtual environment.

How to Install PySpark on MacBook Pro M3 – composite image by author

Step 1: Install Homebrew

To start off, make sure that you have Homebrew installed on your computer. Homebrew describes itself as "the missing package manager" for macOS and Linux. I installed it by running the following command from the terminal:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Follow the instructions as they appear, and respond according to your preferences.
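
If you would like to confirm the result from Python rather than by eye, here is a minimal sketch that checks whether the brew command is reachable on your PATH. The helper name find_tool is my own and not part of Homebrew; it only wraps the standard library's shutil.which.

```python
import shutil
from typing import Optional


def find_tool(name: str) -> Optional[str]:
    """Return the full path of a command-line tool, or None if it is not on PATH."""
    return shutil.which(name)


brew_path = find_tool("brew")
if brew_path is None:
    # Hypothetical hint: the installer prints the exact lines to add to your shell profile
    print("brew was not found on PATH; re-open the terminal or re-read the installer output")
else:
    print(f"Homebrew found at {brew_path}")
```

The same helper can be reused later for other commands such as conda or java.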

Step 2: Install & Set Up Python

Next we can install Python. I did this through Miniconda, a lightweight version of the Anaconda environment and package management system. Run this command from the terminal:

brew install --cask miniconda

As before, follow the instructions on the command line as they appear.

After restarting the terminal so that conda is set up in the current session, I created a virtual environment for the remaining steps. To do this, run:

conda create -n pyspark_env python=3.9

where pyspark_env is the name of the new virtual environment. Activate the new environment:

conda activate pyspark_env
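
To double-check that the interpreter you are about to use really belongs to the new environment, a small sketch like the following can help. It relies on the CONDA_DEFAULT_ENV variable that conda sets on activation; the function name describe_interpreter is hypothetical and only there for illustration.

```python
import os
import sys


def describe_interpreter(env=None):
    """Summarise which Python is running and which conda environment is active."""
    env = os.environ if env is None else env
    return {
        # conda sets CONDA_DEFAULT_ENV when an environment is activated
        "conda_env": env.get("CONDA_DEFAULT_ENV"),
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        "executable": sys.executable,
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

If everything went as planned, conda_env should read pyspark_env and python_version should read 3.9 when this is run from the activated environment.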

Now we can install two packages in the new virtual environment that we’ll use later. Run the following commands:

conda install jupyter
conda install findspark
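
A quick way to confirm both installs landed in the active environment is to ask Python whether the modules can be found. This is only a sketch: missing_packages is a name I made up, and I am assuming that the jupyter package exposes the notebook module, which may differ between Jupyter versions.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "notebook" is an assumed module name for the Jupyter Notebook server
missing = missing_packages(["findspark", "notebook"])
if missing:
    print("Not importable from this environment:", ", ".join(missing))
else:
    print("findspark and notebook are both importable")
```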

Step 3: Install Java

PySpark is a Python wrapper around Spark, which runs on the Java Virtual Machine and therefore needs Java. To install Java, I executed the following command:

brew install openjdk@11
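
To see which Java your shell actually picks up, the sketch below runs java -version and extracts the major version number. The function names are my own, and the java found on PATH is not necessarily the one Homebrew just installed, so treat the result as a hint rather than proof.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major version from `java -version` output, or None if absent."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Legacy scheme: Java 8 reports itself as "1.8.0_..."
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major() -> Optional[int]:
    """Run `java -version` (if java is on PATH) and return its major version."""
    java = shutil.which("java")
    if java is None:
        return None
    try:
        result = subprocess.run(
            [java, "-version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return None
    # java prints its version banner to stderr
    return parse_java_major(result.stderr + result.stdout)


print("Java major version on PATH:", installed_java_major())
```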

Step 4: Install Apache Spark

This step installs Spark together with the PySpark library that ships with it. Run the following command to complete this step:

brew install apache-spark

Note that in my particular case, I ended up with apache-spark version 3.5.0 after running this installation.
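
Because the version number ends up inside the installation path, it can be handy to list what Homebrew actually installed. The following sketch assumes the Apple Silicon Cellar location used in this post (/opt/homebrew/Cellar/apache-spark); find_spark_homes is a hypothetical helper name.

```python
from pathlib import Path
from typing import List


def find_spark_homes(cellar: str = "/opt/homebrew/Cellar/apache-spark") -> List[str]:
    """List the libexec directories of every apache-spark version under the Cellar."""
    root = Path(cellar)
    if not root.is_dir():
        return []
    # Each installed version lives in <cellar>/<version>/libexec
    return sorted(str(path) for path in root.glob("*/libexec") if path.is_dir())


for spark_home in find_spark_homes():
    print(spark_home)
```

On my machine this would be expected to list a single path ending in 3.5.0/libexec; an empty result suggests Spark was installed somewhere else.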

Step 5: Update .zshrc File

I had to add the following lines to my .zshrc file to make PySpark accessible from Python. Open this file in your favorite text editor and append the following, adjusting the version number in SPARK_HOME if your installed apache-spark version is not 3.5.0:

export JAVA_HOME=$(/usr/libexec/java_home)
export SPARK_HOME="/opt/homebrew/Cellar/apache-spark/3.5.0/libexec"
export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=python3
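
Once a new terminal has picked up the updated .zshrc, you can check from Python that the four variables are visible. This is a rough sketch; unset_variables and the REQUIRED tuple are names I introduced for illustration.

```python
import os

# The four variables exported in .zshrc above
REQUIRED = ("JAVA_HOME", "SPARK_HOME", "PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON")


def unset_variables(env=None, names=REQUIRED):
    """Return the variable names that are missing or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = unset_variables()
if missing:
    print("Not set in this process:", ", ".join(missing))
else:
    print("All Spark-related variables are set")
```

Passing a plain dictionary as env makes the check easy to try out without touching your real environment.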

Step 6: Test in Jupyter

Let’s now test the installation and setup. Open a new terminal so that the updated .zshrc is loaded, activate pyspark_env again, and start Jupyter by running the following command:

jupyter notebook

A new window should now appear in your browser. Select New, then Notebook, to create a new notebook. Within this notebook, enter the following code into one or more cells:

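import findspark
# Point findspark at the Spark installation so that pyspark can be imported
findspark.init("/opt/homebrew/Cellar/apache-spark/3.5.0/libexec")

from pyspark.sql import SparkSession
# Start a local Spark session (or reuse one that is already running)
spark = SparkSession.builder.appName("test").getOrCreate()

# Build a small DataFrame from a list of tuples and print it
data = [("xyz","400"),("abc","450"),("qwe","678")]
df = spark.createDataFrame(data)
df.show()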

If everything works correctly, you should see the result from the last line showing as:

[Image: notebook output of df.show(), listing the three rows of the DataFrame]

Final Remarks on Installing PySpark on MacBook Pro M3

In this post, I covered how to install and set up a working PySpark environment. These are the steps I worked through on my own MacBook Pro M3. I hope this information helps you in your own projects. If you have any questions or comments, please leave them below.

About Author

Hi, I'm Michael Attard, a Data Scientist with a background in Astrophysics. I enjoy helping others on their journey to learn more about machine learning and how it can be applied in industry.
