Cloud-hosted Jupyter Notebooks let you perform data operations in the cloud without installing additional software or tools. Learn how you can make use of them.
Motivation
Getting the right infrastructure for data science or machine learning can be a challenge, especially when you are starting out on a very low budget. You might find it difficult to install and run software like Anaconda, and you might suffer delays in the execution of certain tasks even if you use lightweight software (e.g., Miniconda) or packages. Some computations are best done with GPUs or TPUs rather than the CPUs commonly found in our devices.
The cloud-based Jupyter Notebooks I’ll discuss in this article solve this computation problem. Read to the end to get the most out of this article.
This article covers:
What are cloud-based Jupyter notebooks?
Examples of cloud-based Jupyter Notebooks
How to make use of Kaggle Jupyter Notebook
Starting a Jupyter Notebook
Conclusion
References
Cloud-based Jupyter notebooks
Jupyter Notebook is an interactive notebook used by data scientists. You can readily access and run a notebook using Anaconda or VS Code. The challenge is that this software works best on computers with good computing power, which many data science enthusiasts cannot afford. Cloud computing, on the other hand, delivers computing services such as servers, storage, databases, and networking over the Internet. With cloud-based Jupyter Notebooks, you don’t need to download software to run data science experiments.
Examples of cloud-based Jupyter Notebooks
You can look up the following:
Kaggle Jupyter notebook
Google Colab
Microsoft Azure Notebook
Amazon SageMaker
IBM Watson Studio
Databricks
Saturn Cloud
The benefits
Some of the benefits of using such notebooks are:
Run experiments on a PC with low computing power
Collaborate easily with others during experiments
Get access to higher computing power, such as GPUs and TPUs
Other benefits include parallel computing features, ease of connection to cloud storage, and many other high-end services that are specific to the company providing the service.
How to make use of Kaggle Notebook
Kaggle is a data science community with many amazing resources for data scientists and machine learning engineers. The resources include datasets, a community, learning resources, and a cloud-based Jupyter Notebook. Its Jupyter Notebook comes with NumPy, pandas, and many other packages needed for data operations pre-installed.
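To see which of these packages a session actually provides, a first cell along the following lines can help. This is only a minimal sketch: the helper name installed_versions and the package list are illustrative assumptions, not part of Kaggle's tooling, and the code relies only on Python's standard importlib.metadata module.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Illustrative package list; adjust it to what your project needs.
for name, ver in installed_versions(["numpy", "pandas"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

The same cell works in any Jupyter environment, so it is also a quick way to compare a cloud notebook with your local setup.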
Follow these steps to access the notebook:
1. Create a Kaggle account
Enter Kaggle in your favorite search engine and open the Kaggle website. Click on Register and follow the instructions.
2. Create a notebook
Click on the Create button on the left side of the window. Select New Notebook from the drop-down menu. A new Jupyter notebook is then displayed.
3. Start a session
Start a session by clicking on the On button, as shown in the image below.
4. Add a new code or markdown cell
Click on Code or Markdown to add a new code or markdown cell in your notebook.
5. Run a cell
Running a cell in Jupyter Notebook is not difficult.
You can make use of any of the three methods presented below:
Click on the play button to run your current cell. The current cell is the cell you edited last, similar to the “Active Cell” in MS Excel.
Click on the play button beside the cell you want to run.
Press Shift + Enter on your keyboard to run your current cell.
6. Load data into your Jupyter notebook
There are two major scenarios I have identified. The first is when you want to load a dataset from Kaggle into your Jupyter Notebook, and the second is when you want to load a dataset from your computer into Jupyter Notebook.
Load a Kaggle dataset
You have to focus on the right pane of your Jupyter Notebook.
Step One:
Click on the down arrow circled in the image below. A drop-down menu will be displayed.
Step Two:
Select Add Input from the drop-down menu.
Step Three:
Search for a dataset by entering the name of the dataset or keywords associated with it. Your search results will contain datasets, Jupyter Notebooks, and articles related to the keywords you entered. You can be more specific by directly clicking on Datasets as shown in the image below. Datasets related to the search keywords will be displayed.
Step Four:
Add your chosen dataset to your Kaggle storage by clicking the encircled “+” sign, as shown in 1. If the dataset was added successfully, the circle turns black, as shown in 2.
Step Five:
Hover your cursor over the dataset you want to use. Copy the file path and move to a Jupyter Notebook cell.
Step Six:
Enter the file path you copied inside the parentheses of the pandas read_csv function to load the dataset.
Note that your file path need not be the same as the one shown in the image below.
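The loading step can be sketched in code as follows. The sketch assumes that Kaggle mounts added datasets under the /kaggle/input folder, which is where copied file paths usually point. The helper names find_csv_files and load_csv are hypothetical and chosen only for illustration. If you already copied a file path, you can pass it to load_csv directly.

```python
import os


def find_csv_files(root="/kaggle/input"):
    """Return the sorted paths of all CSV files found under root."""
    paths = []
    for dirname, _, filenames in os.walk(root):
        for filename in filenames:
            if filename.lower().endswith(".csv"):
                paths.append(os.path.join(dirname, filename))
    return sorted(paths)


def load_csv(path):
    """Load one CSV file into a pandas DataFrame."""
    import pandas as pd  # imported here so the listing helper works without pandas

    return pd.read_csv(path)


csv_paths = find_csv_files()  # empty list if the folder does not exist
print(csv_paths)
if csv_paths:
    df = load_csv(csv_paths[0])  # or pass the file path you copied instead
    print(df.head())
```

Listing the files first is a convenient way to confirm that the dataset was attached before you try to read it.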
Add new code or markdown cell
Add a new code block or markdown by clicking the “+code” or “+markdown” icons respectively.
Versioning your notebooks
One of the advantages of working with cloud notebooks is that you can version your notebooks. For example, imagine you submitted a PowerPoint presentation tagged version one to your team lead for review. After the first round of corrections, you are asked to make another submission, and this time you send the edited presentation as version two. You now have two similar, yet different, copies of the presentation.
In the same way, you can version your notebooks with Kaggle notebooks.
Click on “Save Version” to save a version of your notebook.
This will bring up the screen shown in the image below.
Click on the Save button to save this version for future use.
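If your notebook produces files that you want to keep together with a version, writing them to the notebook's working directory is a common approach. The sketch below rests on an assumption: that the session's current working directory (usually /kaggle/working) is the folder Kaggle stores as the output of a saved version. The helper name save_rows and the file name results.csv are hypothetical.

```python
import csv
import os


def save_rows(rows, header, out_dir=None, filename="results.csv"):
    """Write rows to a CSV file in out_dir and return the file path.

    When out_dir is None, the current working directory is used.
    """
    out_dir = out_dir or os.getcwd()
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, filename)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)  # column names first
        writer.writerows(rows)   # then one line per row
    return path
```

For example, calling save_rows([["a", 1], ["b", 2]], header=["name", "score"]) in a cell writes results.csv next to the notebook and returns its path.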
Next, I’ll guide you on how to switch to the cloud GPU and TPU.
How to select GPU and TPU
Graphics processing units (GPUs) and tensor processing units (TPUs) are specialized processors that make it possible to run tasks that require high computing power. You can easily switch to a GPU or TPU during your session. Click on the three vertical dots (⋮) labeled as 1 in the image. This will bring up the drop-down menu shown in the image.
Next, select the last option, “Accelerator”. The next drop-down menu presents the processor options that you can choose from.
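After switching the accelerator, you may want to confirm from a cell that a GPU is visible to the session. The following is a minimal sketch that assumes an NVIDIA GPU and the nvidia-smi command-line tool that ships with its driver. The helper name list_nvidia_gpus is hypothetical, and the check does not detect TPUs.

```python
import shutil
import subprocess


def list_nvidia_gpus():
    """Return one description line per NVIDIA GPU visible to the session."""
    if shutil.which("nvidia-smi") is None:
        return []  # tool not found on PATH, so treat the session as CPU-only
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],  # -L lists the detected GPUs, one per line
            capture_output=True,
            text=True,
            check=False,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_nvidia_gpus()
print(gpus if gpus else "No NVIDIA GPU detected; running on CPU.")
```

Deep learning libraries offer their own checks as well, but this version needs nothing beyond the Python standard library.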
Conclusion
There are many other free-to-use cloud Jupyter Notebooks that you can make use of in your data science and machine learning journey. All you need to do is start a project in the cloud-based notebook of your choice.
References
https://support.microsoft.com/en-us/windows/all-about-graphics-processing-units-gpus-e159bedb-80b7-4738-a0c1-76d2a05beab4