In this post, I’ll cover, step by step, how to install Pyspark on a MacBook Pro M3. These 6 steps conclude with verifying that Pyspark can be used from a jupyter notebook, inside a conda virtual environment.
How to Install Pyspark on MacBook Pro M3 – composite image by author
Step 1: Install Homebrew
To start off, make sure that you have Homebrew installed on your computer. Homebrew bills itself as “the missing package manager” for macOS and Linux. I installed it by running the following command from the terminal:
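At the time of writing, the official install command published at brew.sh is:

```shell
# Download and run the official Homebrew installer script
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```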
Follow the instructions as they appear, and respond according to your preferences.
Step 2: Install & Setup Python
Next we can install Python. I did this through Miniconda, a lightweight version of the Anaconda environment and package management system. Run this command from terminal:
brew install --cask miniconda
Like before, follow the instructions on the command line as they appear.
After refreshing the terminal to ensure conda is set up in the current shell, I created a virtual environment for the rest of the steps to come. To do this, run:
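If the conda command isn’t found after installation, the shell usually still needs conda’s init hooks loaded. One common way to do this (assuming the default macOS zsh shell) is:

```shell
# Register conda's startup hooks in ~/.zshrc, then reload the file
conda init zsh
source ~/.zshrc
```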
conda create -n pyspark_env python=3.9
where pyspark_env is the name of the new virtual environment. Activate the new environment:
conda activate pyspark_env
Now we can install two packages in our new virtual environment that we’ll make use of later. Run the following commands:
conda install jupyter
conda install findspark
Step 3: Install Java
Pyspark is a Python wrapper around Spark, which relies on Java. To install Java, I executed the following command:
brew install openjdk@11
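Homebrew installs openjdk@11 “keg-only”, so macOS’s own Java tooling may not find it. The symlink below follows the caveat brew itself prints after installation (the path shown assumes Apple-silicon Homebrew under /opt/homebrew):

```shell
# Make the Homebrew JDK visible to macOS's /usr/libexec/java_home machinery
sudo ln -sfn /opt/homebrew/opt/openjdk@11/libexec/openjdk.jdk \
  /Library/Java/JavaVirtualMachines/openjdk-11.jdk
```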
Step 4: Install Apache Spark
This step installs not only Spark but also the Pyspark library. Run the following command to complete this step:
brew install apache-spark
Note that in my particular case, I ended up with apache-spark version 3.5.0 after running this installation.
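Your version may differ from mine. One way to check which version brew actually installed is:

```shell
# Show the installed version(s) of the apache-spark formula
brew list --versions apache-spark
```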
Step 5: Update .zshrc File
I had to add the following lines to my .zshrc file in order to have Pyspark accessible within Python. Open this file in your favorite text editor, and then append the following:
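A typical set of additions, assuming the Homebrew paths from the steps above (adjust the Spark version number to match your install), looks like this:

```shell
# Point JAVA_HOME at the Homebrew-installed JDK 11
export JAVA_HOME="$(/usr/libexec/java_home -v 11)"
# Point SPARK_HOME at the Homebrew Spark install (version may differ)
export SPARK_HOME="/opt/homebrew/Cellar/apache-spark/3.5.0/libexec"
# Make the spark-submit and pyspark launchers available on the PATH
export PATH="$SPARK_HOME/bin:$PATH"
```

After saving the file, run source ~/.zshrc (or open a new terminal) so the changes take effect.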
Step 6: Verify the Installation
Let’s now test the installation and setup. Create a new jupyter notebook by running the following command:
jupyter notebook
A new window should now appear in your browser. Select New, then Notebook, to create a new notebook. Within this notebook, enter the following code into one or more cells:
import findspark
# Point findspark at the Homebrew Spark installation
findspark.init("/opt/homebrew/Cellar/apache-spark/3.5.0/libexec")
from pyspark.sql import SparkSession
# Start a local Spark session
spark = SparkSession.builder.appName("test").getOrCreate()
# Build a small test DataFrame and display it
data = [("xyz","400"),("abc","450"),("qwe","678")]
df = spark.createDataFrame(data)
df.show()
If everything works correctly, you should see the result from the last line showing as a table similar to:
+---+---+
| _1| _2|
+---+---+
|xyz|400|
|abc|450|
|qwe|678|
+---+---+
Final Remarks on Installing Pyspark on MacBook Pro M3
In this post, I covered how to install and set up a working Pyspark environment. These are the steps I worked through on my own MacBook Pro M3. I hope this information helps you in your own projects. If you have any questions or comments, please leave them below.
Hi, I’m Michael Attard, a Data Scientist with a background in Astrophysics. I enjoy helping others on their journey to learn more about machine learning and how it can be applied in industry.
Thanks, bro, for this post. I had a problem because I didn’t have information on how to configure .zshrc, and my Spark wouldn’t run.
In the end, I only installed python 3.9, openjdk 17, scala 2.13, and apache-spark 3.5.3 from brew. I didn’t need Anaconda etc. on my MacBook Pro M2 Pro.
I’m glad to hear this post helped you out! And yes, if you prefer not to use Anaconda, then your approach is perfectly fine.