Getting started with PySpark development using Jupyter on a Big Data cluster
It is no secret that data science tools like Jupyter, Apache Zeppelin, and the more recently launched Cloud Datalab and JupyterLab are must-knows for day-to-day work. So how can the ease of developing models be combined with the computing power of a Big Data cluster? In this article I will share a few simple steps to start using Jupyter notebooks for PySpark on a Dataproc cluster in GCP.
Final goal
Prerequisites
1. Have a Google Cloud account (just sign in with your Gmail account and you automatically get $300 in credit for one year) [1]
2. Create a new project with any name you like (see the sketch after this list)
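If you prefer the command line over the console, a minimal sketch for this step using the gcloud CLI is shown below; the project ID used here is a placeholder, so replace it with your own.

```
# Create a new GCP project (the ID "my-pyspark-demo" is a placeholder)
gcloud projects create my-pyspark-demo --name="my-pyspark-demo"

# Set it as the active project for the commands that follow
gcloud config set project my-pyspark-demo
```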
Steps
- To make the deployment easier, I’m going to use a beta feature that can only be applied when creating a Data