Some models are now released with implementations that run only on TPUs (e.g. the text-to-text transformer).

This post logs my experience setting up a Google Cloud TPU node for running machine learning models.

The overall sequence of steps is:

  1. Create a Google Cloud account if you don’t already have one.
  2. Create a project.
  3. Create a VM instance, specifying CPU or GPU, OS, disk space, memory, etc.
  4. Install the ctpu tool on the provisioned VM instance to create, manage, and delete TPU instances. This step requires authorization, which can be done using gcloud (installed by default on VM instances).
  5. Set up a Google Cloud Storage bucket.
  6. Use TPUs and store data in the bucket. When finished, remember to release the TPU instance, the bucket storage (after saving output if needed), and the VM instance.
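The six steps involving TPU and bucket commands can be sketched as a shell session. This is a minimal sketch: names like my-bucket are illustrative, and DRY_RUN=1 makes the helper only print each command rather than call the cloud APIs.

```shell
# Sketch of the end-to-end flow. DRY_RUN=1 echoes each command instead of
# running it, since the real calls require an authorized Google Cloud account.
DRY_RUN=1
run() { if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi; }

run gcloud auth application-default login   # authorize ctpu/gcloud (step 4)
run ctpu up --tpu-only                      # provision a TPU (step 4)
run gsutil mb gs://my-bucket                # create a storage bucket (step 5)
# ... run the model, reading/writing gs://my-bucket ...
run ctpu delete                             # release the TPU (step 6)
run gsutil rm -r gs://my-bucket             # delete the bucket (step 6)
```

Set DRY_RUN=0 (with real names substituted) to actually execute the commands.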

Instructions for steps 1–3 are covered in the article on setting up VM instances.

4. Installing and using ctpu

Fetch ctpu, make it executable, and add its location to PATH.

wget <ctpu download URL>
chmod a+x ctpu
export PATH="$PWD:$PATH"

The ctpu command lets us provision, manage, and delete TPUs. However, it must first be authorized to perform these operations, which can be done using gcloud as shown below. Run the command and follow the instructions to copy and paste the key from the browser.

gcloud auth application-default login

The simplest usage of ctpu is to create a TPU instance, check its status, and delete it, as shown below:

ctpu up --tpu-only
ctpu status
ctpu delete

In most cases, however, we may need to specify additional options. For instance, some models require bringing up the TPU with a specific version of TensorFlow (e.g. for the text-to-text transformer, the up command takes additional options):

ctpu up --name=$TPU_NAME --project=$PROJECT --zone=$ZONE --tpu-size=v3-8  --tpu-only   --tf-version=1.15.dev20190821 --noconf
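For reference, the environment variables in the command above might be set as shown below. The values are illustrative placeholders, not defaults; substitute your own project, zone, and TPU name.

```shell
# Hypothetical values; substitute your own project, zone, and TPU name.
TPU_NAME="my-tpu"
PROJECT="my-ml-project"
ZONE="us-central1-b"

# Print the fully expanded command (drop the echo to actually run it).
echo ctpu up --name=$TPU_NAME --project=$PROJECT --zone=$ZONE \
  --tpu-size=v3-8 --tpu-only --tf-version=1.15.dev20190821 --noconf
```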

5. Setting up and managing bucket storage

To run a model, we may need to copy the model to our VM instance or store it in a bucket. We can use gsutil (also installed by default on VM instances):

gsutil mb gs://<your bucket name>
gsutil cp <source> gs://<your bucket name>
gsutil rm -r gs://<your bucket name>

Operations on a bucket require authorization, which can be done using the command below and following the instructions to copy and paste the auth code from the browser URL provided:

gsutil config -b
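Once authorized, gs:// paths within the bucket can be composed in shell variables and passed to a training job. A sketch, with an illustrative bucket name and layout:

```shell
# Hypothetical bucket layout; gs:// paths like these are what TPU model
# scripts typically accept for data and checkpoint locations.
BUCKET="gs://my-tpu-bucket"
DATA_DIR="$BUCKET/data"
MODEL_DIR="$BUCKET/model"

echo "$DATA_DIR"
echo "$MODEL_DIR"
```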

6. Deleting instances

Finally, when our work is done, we can delete the TPU, the bucket storage, and the VM instance:

ctpu delete
gsutil rm -r gs://<your bucket name>

One anomalous behavior of ctpu is that checking status may not show the active TPU node we provisioned. This happens at times whether we procure TPUs from a VM instance or from Google Cloud Shell.

We can, however, delete the TPU instance using the delete option in the Google Cloud Platform web interface. Any active TPU node always shows up in the web interface (as does a VM instance, even one provisioned with the gcloud utility). The VM instance and the data bucket can also be deleted from the web interface.

One could potentially avoid creating a VM instance altogether by using Google Cloud Shell (as opposed to the gcloud utility, which gives fine-grained control over required memory, etc.), but this may not be possible in all cases due to memory/resource constraints in Cloud Shell.

Lastly, the VM instance we provision can be CPU- or GPU-based, depending on the model's requirements.

This answer is imported manually from Quora.




Ajit Rajasekharan
Machine learning practitioner