Write Training Scripts to Run on GPUs, CPUs, or Intel® Gaudi® AI Accelerators

This tutorial demonstrates how to write code that automatically detects which type of hardware is available on a machine (GPU, CPU, or Intel® Gaudi® AI accelerator) and makes the changes needed to run on it. Developers often want to run the same model code on different hardware. For example, a developer may write PyTorch* code on a development laptop that has a GPU but intend to run it on a training server that uses an Intel Gaudi AI accelerator. A few small code changes let a single code base support multiple hardware platforms.

This tutorial uses the Getting Started with Training on Intel Gaudi AI Accelerator torch_compile.py example to show how to write cross-platform code with just a few adjustments. The torch_compile.py example has the following code:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import StepLR
import os
import sys
import habana_frameworks.torch.core as htcore
import habana_frameworks.torch.hpu as hthpu

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(7744, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 3, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output

def train(model, device, train_loader, optimizer, epoch):
    model.train()
    model = torch.compile(model, backend="hpu_backend")

    def train_function(data, target):
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        return loss

    training_step = 0
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        loss = train_function(data, target)
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

def main():
    device = torch.device("hpu")
    model = Net().to(device)
    optimizer = optim.Adadelta(model.parameters(), lr=1.0)

    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])
    dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
    train_loader = torch.utils.data.DataLoader(dataset, batch_size=500)
    scheduler = StepLR(optimizer, step_size=1, gamma=0.7)

    for epoch in range(0, 1):
        train(model, device, train_loader, optimizer, epoch)
        scheduler.step()
    print("torch.compile training completed.")

if __name__ == '__main__':
    main()

After you copy this code to a torch_compile.py file on the target platform, run it with the following command:

python torch_compile.py

Running the code in an environment where PyTorch is installed but the Intel® Gaudi® software is not generates the following error:

ModuleNotFoundError: No module named 'habana_frameworks'
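One way to confirm in advance whether the package named in this error is present is to ask Python's import system without actually importing it. The following is a minimal sketch using the standard library's importlib.util.find_spec; the helper name is_installed is hypothetical and is not part of the original example.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be located without importing it.

    is_installed is a hypothetical helper name used only for illustration.
    """
    return importlib.util.find_spec(package_name) is not None


# "habana_frameworks" is the package named in the error message above.
print("habana_frameworks installed:", is_installed("habana_frameworks"))
```

This check only reports whether the package can be found. It does not make the script portable by itself, which is what the next step addresses.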

To overcome this obstacle, wrap the habana_frameworks import statements with a try/except block as follows:

try:
    import habana_frameworks.torch.core as htcore
    import habana_frameworks.torch.hpu as hthpu
except ImportError:
    # Intel Gaudi software is not installed; mark both modules as unavailable.
    htcore = None
    hthpu = None

If an exception is thrown because of the import failure, this code assigns None to both htcore and hthpu, indicating that the Intel Gaudi software stack is not installed or not available on the platform. Rerun the code. The following error is generated:

RuntimeError: PyTorch is not linked with support for hpu devices

The code no longer fails on the import statements, but it still tries to move the model to the Intel Gaudi accelerator back end, which is not supported without the Intel Gaudi software and drivers. Locate the original line that selects the accelerator (HPU) as the device:

device = torch.device("hpu")

Replace that line with code that dynamically uses the best available hardware:

if hthpu and hthpu.is_available():
    target = "hpu"
    print("Using HPU")
elif torch.cuda.is_available():
    target = "cuda"
    print("Using GPU")
else:
    target = "cpu"
    print("Using CPU")
device = torch.device(target)

Remember that hthpu was set to None when the habana_frameworks module could not be imported, so testing hthpu shows whether the Intel Gaudi software is installed. If it is installed, the code calls the is_available() API to check whether the server actually has an Intel Gaudi accelerator. If no Intel Gaudi accelerator is available, the code uses the similar CUDA* API to check whether the server has a GPU. If neither is available, the code falls back to the CPU.
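The priority order described above (Intel Gaudi accelerator first, then GPU, then CPU) can be isolated in a small pure function, which makes it easy to test on a machine that has none of the accelerators. This is only an illustrative sketch: the function name pick_device_name and its two Boolean parameters are assumptions, not part of the original example.

```python
def pick_device_name(hpu_available: bool, cuda_available: bool) -> str:
    """Return the device string to use, preferring HPU, then CUDA, then CPU.

    pick_device_name is a hypothetical helper; the caller supplies the
    availability flags, so the function itself needs no accelerator libraries.
    """
    if hpu_available:
        return "hpu"
    if cuda_available:
        return "cuda"
    return "cpu"


# Example: no Intel Gaudi accelerator, but a GPU is present.
print(pick_device_name(hpu_available=False, cuda_available=True))  # prints "cuda"
```

In the tutorial's script, the flags would come from bool(hthpu and hthpu.is_available()) and torch.cuda.is_available(), and the returned string would be passed to torch.device().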

Next, in the train() function, replace the torch.compile call so that the hpu_backend option is used only when an Intel Gaudi accelerator is available:

if hthpu and hthpu.is_available():
    model = torch.compile(model, backend="hpu_backend")
else:
    model = torch.compile(model)
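If the device has already been chosen, the same decision can be keyed off the device string instead of repeating the availability check. The sketch below is one possible arrangement, assuming a hypothetical helper named compile_kwargs that returns the keyword arguments for torch.compile; only the "hpu_backend" value comes from the original example.

```python
def compile_kwargs(device_name: str) -> dict:
    """Return keyword arguments for torch.compile based on the device string.

    compile_kwargs is a hypothetical helper. On "hpu" it selects the
    "hpu_backend" option from the tutorial; otherwise it returns no arguments,
    so torch.compile uses its default back end.
    """
    if device_name == "hpu":
        return {"backend": "hpu_backend"}
    return {}


print(compile_kwargs("hpu"))  # prints {'backend': 'hpu_backend'}
print(compile_kwargs("cpu"))  # prints {}
```

With this helper, the call in train() could be written as model = torch.compile(model, **compile_kwargs(device.type)), where device is the torch.device object already passed to the function.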

The code now works on GPUs, CPUs, and Intel Gaudi AI accelerators.

Related Article

Intel Gaudi Accelerators, PyTorch*, and Python* API