For beginners looking to gain experience with machine learning and AI, it can be difficult to obtain access to huge data sets or vast computational power to handle workloads. One option to overcome this challenge is Google Colab, a free tool from Google that provides resources, such as GPUs, TPUs and Python libraries, to help you gain experience or further refine your skills.
Follow this tutorial to learn what Google Colab is and how to start using the tool.
What is Google Colab?
Jupyter Notebook is a free, open source creation from Project Jupyter. A Jupyter notebook is like an interactive laboratory notebook that includes not just notes and data, but also code that can manipulate the data. The code can be executed within the notebook, which, in turn, can capture the code output. Applications such as Matlab and Mathematica pioneered this model, but unlike those applications, Jupyter is a browser-based web application.
Google Colab is built around Project Jupyter code and hosts Jupyter notebooks without requiring any local software installation. But while Jupyter notebooks support multiple languages, including Python, Julia and R, Colab currently only supports Python.
Colab notebooks are stored in a Google Drive account and can be shared with other users, similar to other Google Drive files. The notebooks also include an autosave feature, but they do not support simultaneous editing, so collaboration must be serial rather than parallel.
Colab is free, but it has limitations. Some uses are forbidden, such as media serving and crypto mining. Available resources are also limited and vary depending on demand, though Google offers a paid Pro version of Colab with more reliable resourcing. There are other cloud services based on Jupyter Notebook, including Azure Notebooks from Microsoft and SageMaker Notebooks from Amazon.
The benefits of Google Colab
Enterprise data analysts and analytics developers can use Colab to work through data analytics and manipulation problems in collaboration. They can write, execute and revise core code in a tight loop, developing the documentation in Markdown format, LaTeX or HTML as they go.
Notebooks can include embedded images as part of the documentation or as generated output. In addition, once the analytics code is sufficiently tested and debugged, you can copy it, with its documentation, into other platforms for production use.
Google Colab eliminates the need for complex configuration setup and installation, as it runs right in the browser. It also includes pre-installed Python libraries that require no setup to use.
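To see which libraries are ready to import in a given runtime, a short check like the following can help. This is a minimal sketch using only the Python standard library; the list of library names is a hypothetical example, not an official list of what Colab ships.

```python
import importlib.util

# Hypothetical list of libraries to check; adjust it to what your notebook needs.
CANDIDATE_LIBRARIES = ["numpy", "pandas", "matplotlib"]


def available_libraries(names):
    """Map each top-level module name to True if it can be imported here."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = available_libraries(CANDIDATE_LIBRARIES)
for name, found in status.items():
    print(f"{name}: {'available' if found else 'not installed'}")
```

The check only looks a module up without importing it, so it is cheap to run at the top of a notebook.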
How to use Colaboratory
To use Colaboratory, you must have a Google account.
On your first visit, you will see a Welcome To Colaboratory notebook with links to video introductions and basic information on how to use Colab.
Create a notebook
From the File menu, click New notebook to create a notebook.
If you are not yet logged in to a Google account, the system will prompt you to log in.
The notebook will by default have a generic name; click on the filename field to rename it.
The file type, IPYNB, is short for "IPython notebook" because IPython was the forerunner of Jupyter Notebook.
The interface allows you to insert various kinds of cells, mainly text and code. You can add them from the Insert menu or with the shortcut buttons under the menu bar.
Because notebooks are meant for sharing, there are accommodations throughout for structured documentation.
Code, debug, repeat
You can insert Python code to execute in a code cell. The code can be entirely standalone or imported from various Python libraries.
A notebook can be treated as a rolling log of work, with earlier code snippets being no longer executed in favor of later ones, or treated as an evolving set of code blocks intended for ongoing execution. The Runtime menu offers execution options, such as Run all, Run before or Run the focused cell, to match either approach.
Each code cell has a run icon on the left edge. You can type code into a cell and hit the run icon to execute it immediately.
If the code generates an error, the error output will appear beneath the cell. Correcting the problem and hitting run again replaces the error information with program output. In a typical first example, the first line of code, in its own cell, imports the NumPy library, which is the source of the arange function. Colab has many common libraries preinstalled for easy import into programs.
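The NumPy example described above might look like the following sketch. The comments mark where each notebook cell would begin; the variable name values is only illustrative.

```python
# Cell 1: import NumPy under its conventional alias.
import numpy as np

# Cell 2: arange(5) builds an array of the integers 0 through 4.
values = np.arange(5)
print(values)  # [0 1 2 3 4]
```

Running the second cell before the first would raise a NameError because np is not yet defined, which is a simple way to see error output appear beneath a cell.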
A text cell provides basic rich text using Markdown formatting by default and allows for the insertion of images, HTML code and LaTeX formatting.
As you add text on the left side of the text cell, the formatted output appears on the right.
Once you stop editing a block, only the final formatted version shows.
Incorporating data into the notebook
After getting comfortable with the interface and using it for initial test coding, you must eventually provide the code with data to analyze or otherwise manipulate.
Colab can mount a user's Google Drive to the VM hosting their notebook using a code cell.
Once you hit run, Google will ask for permission to mount the drive.
If you allow it to connect, you will then have access to the files in your Google Drive under the path where you mounted it, such as /my_drive.
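A code cell for mounting Drive could look like the sketch below. It relies on the google.colab package, which is only available inside a Colab runtime, so the import is guarded here. The helper name mount_drive is hypothetical, and the /my_drive mount point simply follows the path mentioned above.

```python
def mount_drive(mount_point="/my_drive"):
    """Mount Google Drive at mount_point when running inside Colab.

    Returns the mount point on success, or None outside Colab.
    """
    try:
        # This package exists only in the Colab runtime.
        from google.colab import drive
    except ImportError:
        print("Not running in Colab; skipping the Drive mount.")
        return None
    # In Colab, this call triggers the permission prompt described above.
    drive.mount(mount_point)
    return mount_point


mounted_at = mount_drive()
```

In a real notebook, the two lines that import drive and call drive.mount are usually all that is needed; the wrapper only keeps the sketch from failing elsewhere.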
If you prefer not to grant access to your Drive space, you can instead upload files from your local machine, including files on any network file space mounted as a drive there.
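Uploading from the local machine can also be done from a code cell. The following is a minimal sketch that assumes the files module of the google.colab package, which is only present inside Colab; the helper name upload_from_local_machine is hypothetical.

```python
def upload_from_local_machine():
    """Open Colab's file picker and return {filename: bytes} for chosen files."""
    try:
        # This package exists only in the Colab runtime.
        from google.colab import files
    except ImportError:
        print("Not running in Colab; nothing uploaded.")
        return {}
    return files.upload()


uploaded = upload_from_local_machine()
for filename, content in uploaded.items():
    print(f"{filename}: {len(content)} bytes")
```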
With file access, many functions are available to read data in various ways. For example, importing the Pandas library gives access to functions such as read_csv and read_json.
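As a rough illustration of read_csv and read_json, the sketch below parses small made-up samples held in memory, so it runs without any files. In a notebook, you would normally pass a path to a file on the mounted drive or to an uploaded file instead.

```python
import io

import pandas as pd

# Made-up sample data, held in memory so the sketch runs anywhere.
csv_text = "city,temp_c\nOslo,4\nCairo,29\n"
json_text = '[{"city": "Oslo", "temp_c": 4}, {"city": "Cairo", "temp_c": 29}]'

# Both readers accept a file path or a file-like object.
csv_frame = pd.read_csv(io.StringIO(csv_text))
json_frame = pd.read_json(io.StringIO(json_text))

print(csv_frame)
print(json_frame)
```

Both calls return a DataFrame, so the rest of the analysis code does not need to care which format the data arrived in.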
Save and share
By default, Colab puts notebooks in a Colab Notebooks folder under My Drive in Google Drive.
The File menu enables notebooks to be saved as named revisions in the version history, relocated using Move, or saved as a copy in Drive or GitHub. It also allows you to download and upload notebooks. Tools based on Jupyter provide broad compatibility, so you can create notebooks in one place and then upload and use them in another.
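Part of the reason notebooks move easily between tools is that an IPYNB file is plain JSON. The sketch below builds a minimal notebook structure by hand, following the Jupyter notebook format (version 4) as an assumption, and round-trips it through the standard json module. The cell contents are made up for illustration.

```python
import json

# A minimal notebook: one Markdown cell and one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Hypothetical demo notebook"],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('hello')"],
        },
    ],
}

# Serialize to the text that would be stored in an .ipynb file, then read it back.
ipynb_text = json.dumps(notebook, indent=1)
restored = json.loads(ipynb_text)
print([cell["cell_type"] for cell in restored["cells"]])  # ['markdown', 'code']
```

Saving ipynb_text to a file with the .ipynb extension should give a notebook that Jupyter-based tools can open, though that step is left out here to keep the sketch free of side effects.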
You can use the Share button in the upper right to grant other Google users access to the notebook and to copy links.
Google also provides example notebooks illustrating available resources, such as pre-trained image classifiers and language transformers, as well as addressing common business problems, such as working with BigQuery or performing time series analytics. It also provides links to introductory Python coding notebooks.