This tutorial demonstrates fine-tuning a GPT-2* model on Intel® Gaudi® AI processors using the Hugging Face* Optimum for Intel library with Microsoft DeepSpeed*.
Fine-Tuning Defined
Training models from scratch can be expensive, especially with today’s large-scale models. Depending on the model size and scale, the estimated hardware cost of training such models can range from thousands to millions of dollars. Fine-tuning takes a neural network that has already been trained (usually called a pretrained model) and updates it to create a model that performs a specific task. Assuming that the original task is similar to the new task, using a pretrained model lets us take full advantage of the feature extraction that occurs in the early layers of the network without having to develop and train a model from scratch.
This blog focuses on transformers. Pretrained transformers can be quickly fine-tuned for numerous downstream tasks and perform well. Let’s consider a pretrained transformer model that already understands language. Fine-tuning then focuses on training the model to perform question-answering, language generation, named-entity recognition, sentiment analysis, and other such tasks.
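As a rough illustration of how one pretrained checkpoint can serve several downstream tasks, the sketch below maps the tasks named above to the Auto classes in Hugging Face Transformers that attach a task-specific head. The `TASK_TO_AUTO_CLASS` table and the `load_for_task` helper are hypothetical names for this example, not part of the library, and the sketch assumes the `transformers` package is installed when `load_for_task` is called.

```python
import importlib

# Hypothetical lookup table for this example: task name -> Transformers Auto class
# that loads a pretrained backbone with a head suited to that task.
TASK_TO_AUTO_CLASS = {
    "question-answering": "AutoModelForQuestionAnswering",
    "language-generation": "AutoModelForCausalLM",
    "named-entity-recognition": "AutoModelForTokenClassification",
    "sentiment-analysis": "AutoModelForSequenceClassification",
}


def load_for_task(task, checkpoint="gpt2"):
    """Load `checkpoint` with the head that matches `task` (hypothetical helper)."""
    if task not in TASK_TO_AUTO_CLASS:
        raise ValueError(f"Unknown task: {task!r}")
    # Imported lazily so the table can be inspected without transformers installed.
    transformers = importlib.import_module("transformers")
    auto_class = getattr(transformers, TASK_TO_AUTO_CLASS[task])
    # from_pretrained downloads the weights from the Hugging Face Hub on first use.
    return auto_class.from_pretrained(checkpoint)
```

For example, `load_for_task("language-generation")` would return GPT-2 with its language-modeling head. For the other tasks the head starts untrained, which is why fine-tuning on task data is needed.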
Given the cost and complexity of training large models, using pretrained models is an appealing approach, and many pretrained models are publicly available. This blog focuses on Hugging Face Transformers, the most popular open source transformer library. The Hugging Face Hub contains a wide variety of pretrained transformer models, and the Transformers library makes it easy to use these pretrained models for fine-tuning.
Use Pretrained GPU Models to Fine-Tune on Intel Gaudi AI Processors and Vice Versa
Although pretraining is done on a specific architecture, the saved pretrained model can be used on different architectures. For example, you can pretrain a model using an Intel Gaudi AI processor, save it, and later fine-tune the model using a CPU. Or you can load a publicly available pretrained model, originally pretrained on a GPU, and continue training or fine-tuning it on an Intel Gaudi AI processor.
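To make the portability point concrete, here is a minimal sketch of choosing whichever device is present before loading a saved checkpoint. The helper name `pick_device` is hypothetical. The sketch assumes the Intel Gaudi PyTorch bridge is importable as `habana_frameworks.torch.core` inside the Gaudi container, and it falls back to a GPU or the CPU otherwise.

```python
def pick_device():
    """Return "hpu", "cuda", or "cpu" depending on what this machine offers."""
    try:
        # Assumption: importing the Gaudi bridge registers the "hpu" device with PyTorch.
        import habana_frameworks.torch.core  # noqa: F401
        return "hpu"
    except ImportError:
        pass
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    # Nothing else was found, so fall back to the CPU.
    return "cpu"


device = pick_device()
print(f"Selected device: {device}")
```

With PyTorch installed, a model loaded from a saved checkpoint could then be moved with `model.to(device)`, regardless of the hardware it was originally trained on.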
Start with Intel Gaudi Software and Hugging Face
Set up an Amazon EC2* DL1 instance with the latest Intel Gaudi software. For full instructions, see the AWS DL1 Quick Start Guide.
Start the Docker* Software
Make sure to use the latest PyTorch* container from the PyTorch Docker Images for the Intel Gaudi Accelerator.
Create the Model Folder
Clone Optimum for Intel from Hugging Face and Set Up the Requirements
Install Microsoft DeepSpeed*
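Once DeepSpeed is installed, the Hugging Face training scripts expect a DeepSpeed configuration file, passed through the `--deepspeed` argument. The sketch below writes one possible ZeRO stage 2 configuration. The file name `ds_config.json`, the helper `write_deepspeed_config`, and the chosen values are assumptions for illustration, not the settings used in this tutorial.

```python
import json
from pathlib import Path


def write_deepspeed_config(path="ds_config.json", zero_stage=2):
    """Write a small DeepSpeed config and return its path (hypothetical helper)."""
    config = {
        # "auto" lets the Hugging Face Trainer fill these in from its own arguments.
        "train_batch_size": "auto",
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
        # Illustrative logging interval, in optimizer steps.
        "steps_per_print": 64,
        # bfloat16 mixed precision (assumed here as a reasonable default).
        "bf16": {"enabled": True},
        # ZeRO stage 2 partitions optimizer state and gradients across devices.
        "zero_optimization": {"stage": zero_stage},
    }
    path = Path(path)
    path.write_text(json.dumps(config, indent=2))
    return path
```

Calling `write_deepspeed_config()` in the working folder would produce a file that can be adjusted by hand before training.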
Fine-Tune the Model
Return to the GPT-2 folder.
To create a new file called main.py, enter the following command:
The code fine-tunes the pretrained GPT-2 model on the WikiText dataset. It runs in distributed mode if multiple Intel Gaudi AI processors are available. Note that for fine-tuning, the model_name_or_path argument specifies the model checkpoint that is loaded to initialize the weights.
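A launcher along these lines can be sketched as follows. This is an illustration under stated assumptions rather than the tutorial's own main.py: it assumes the `run_clm.py` language-modeling example from the cloned repository sits at the hypothetical path in `SCRIPT`, uses the `wikitext-2-raw-v1` configuration of WikiText, writes to a hypothetical `/tmp/test-clm` directory, and starts a single process through `subprocess`, leaving distributed and DeepSpeed launching out of scope. `build_command` and `launch` are hypothetical helper names.

```python
import subprocess
import sys

# Hypothetical location of the language-modeling example inside the cloned repository.
SCRIPT = "optimum-habana/examples/language-modeling/run_clm.py"


def build_command(model_name_or_path="gpt2", output_dir="/tmp/test-clm"):
    """Assemble the argument list for a causal-language-modeling fine-tuning run."""
    return [
        sys.executable, SCRIPT,
        # Checkpoint whose weights initialize the model before fine-tuning.
        "--model_name_or_path", model_name_or_path,
        "--dataset_name", "wikitext",
        "--dataset_config_name", "wikitext-2-raw-v1",
        "--do_train",
        "--do_eval",
        "--per_device_train_batch_size", "4",
        "--per_device_eval_batch_size", "4",
        "--output_dir", output_dir,
        # Gaudi-specific switches, assumed from the Optimum Habana example scripts.
        "--use_habana",
        "--use_lazy_mode",
        "--gaudi_config_name", "Habana/gpt2",
    ]


def launch(cmd):
    """Run the training command and raise CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)
```

On a machine where the script path exists, `launch(build_command())` would start a single-device run. Passing a local directory as `model_name_or_path` would resume from a saved checkpoint instead of the Hub model.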
Run the code using the following command:
This command produces the following results:
Use the New Fine-Tuned Model for Text Prediction
To create a new file called test.py, enter the following command:
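A minimal sketch of what such a script might contain, assuming the fine-tuned model and tokenizer were saved to a hypothetical `/tmp/test-clm` directory and that the `transformers` package is installed. `resolve_model_dir` and `generate` are hypothetical helper names, and the sketch runs generation on the default device rather than on an Intel Gaudi AI processor.

```python
from pathlib import Path


def resolve_model_dir(path="/tmp/test-clm"):
    """Return the directory if it looks like a saved Transformers model."""
    model_dir = Path(path)
    # save_pretrained writes config.json next to the weights.
    if not (model_dir / "config.json").is_file():
        raise FileNotFoundError(f"No config.json found in {model_dir}")
    return model_dir


def generate(prompt, model_dir="/tmp/test-clm", max_new_tokens=30):
    """Continue `prompt` with the fine-tuned model and return the full text."""
    model_path = str(resolve_model_dir(model_dir))
    # Imported lazily so the path check above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path)
    inputs = tokenizer(prompt, return_tensors="pt")
    # Greedy decoding; GPT-2 has no pad token, so reuse the end-of-sequence token.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call such as `print(generate("The weather today is"))` would then print the prompt followed by the model's continuation.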
Run the code using the following command:
This command produces the following results:
What’s Next?
You can try different prompts and different configurations for running the model. You can find more information on the Hugging Face Optimum Habana GitHub page and the Habana Developer site.