Document Summarization

A Step-by-Step Guide with OPEA™ 1.2 and Intel® Gaudi® 2

Introduction

Are you drowning in an ocean of information? In today's fast-paced world, staying updated and informed is essential for both personal and professional growth. However, the sheer volume of content available can be overwhelming. Document summarization applications offer a powerful solution by condensing information into concise summaries, capturing the key points from lengthy training videos, educational lectures, promotional content, research presentations, and podcasts. This greatly enhances your ability to keep up.

In this article, we will explore the various use cases of a Document Summarization (DocSum) application implemented using the Open Platform for Enterprise AI (OPEA™). Discover how these innovative tools can enhance productivity and efficiency across different domains. We will walk you through the steps to deploy and test drive OPEA’s DocSum application on the Intel® Gaudi® 2 AI accelerator using Intel® Tiber™ AI Cloud. From setup to execution, we’ll cover everything you need to know to unlock the potential of document summarization and transform the way you interact with information using this cutting-edge GenAI application.

What is OPEA?

OPEA is an open platform consisting of composable building blocks for state-of-the-art generative AI systems. It is ideal for showcasing DocSum because it is flexible and cost-effective. OPEA makes it easy to integrate advanced AI solutions into business systems, speeding up development and adding value. It uses a modular approach with microservices for flexibility and megaservices for comprehensive solutions, simplifying the development and scaling of complex AI applications. OPEA also supports powerful hardware like Intel Gaudi 2 and Intel® Xeon® Scalable Processors, which are adept at handling the heavy demands of AI models. Plus, OPEA’s GenAIExamples repository demonstrates many different scenarios and makes accessing the services easy and user-friendly.

Overview of the DocSum Application

The DocSum application leverages advanced open-source Large Language Models (LLMs) to revolutionize the way we interact with text. These models can be used to create summaries of various types of documents.

With DocSum, you can efficiently generate concise and accurate summaries, enhancing your ability to process and understand large volumes of information. The application supports summarization from various sources, including:

  • Plain text
  • Documents (.txt, .doc, .docx, .pdf)
  • Audio (.wav)
  • Video (.mp4)

At its core, DocSum consists of three key components: 

  1. User Interface Service: Provides an intuitive and user-friendly interface for interacting with the system. We provide two interface options — a Graphical User Interface (GUI) and a REST API.

  2. Domain Transform Service: Employs a variety of domain transformation tools to extract plain text from the supported input formats, which is then prepared for processing by the LLM. For audio and video inputs, this includes speech-to-text transcription via the Whisper service.

  3. LLM Service: Creates a summary of the document. This microservice leverages LangChain to implement summarization strategies and facilitate LLM inference using Text Generation Inference (TGI). Based on the length of the context, a summary type can be selected: auto, stuff, truncate, map_reduce, or refine. Details of the LLM service can be found in the OPEA GenAIComps repository.

This architecture ensures that DocSum can handle a wide range of input domains, whether accessed from the GUI or from the command line.

Figure 1. DocSum Architecture

Prerequisites

Before starting the setup of the DocSum application, make sure you have the following prerequisites:

  1. Hardware: Access to a machine equipped with two or more Intel Gaudi 2 processor cards is required. For this tutorial, we will utilize Intel Tiber AI Cloud, specifically an instance featuring 8 Gaudi 2 HL-225H mezzanine cards with 3rd Generation Xeon processors, 1 TB of RAM, and 20 TB of disk space. Because the UI port (5173) is only exposed locally on the machine, you will need to forward it over SSH to access the user interface from your browser.

  2. Docker Compose: Docker Compose will be employed to run the services. Ensure that Docker Compose is installed on your machine; you can verify the installation as shown below.
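A quick way to verify is to print the Docker Compose version:

docker compose version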

Once these prerequisites are met, you can proceed with the step-by-step tutorial to set up and deploy the DocSum application on Intel Gaudi 2 using Intel Tiber AI Cloud.

Step-by-Step Tutorial

Follow these steps to get the DocSum application and its megaservice, which organizes the corresponding services, up and running on Intel Gaudi 2 using Tiber Cloud, and start summarizing your own data. 

Step 1: Connect to Your Gaudi Machine 

To begin, if you are using Tiber Cloud, initiate a Gaudi 2 instance and wait until it reaches the “Ready” state. Once the instance is ready, connect to the virtual machine (VM) via SSH with port forwarding to access the user interface on port 5173. Use the following command: 

ssh -L 5173:127.0.0.1:5173 <ip address>
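If your instance requires a private key and a specific login user (as is typical for cloud instances; the key path and user below are placeholders), include them in the command:

ssh -i ~/.ssh/<your key> -L 5173:127.0.0.1:5173 <user>@<ip address>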

Step 2: Clone the GenAIExamples Repository

Next, download the GenAIExamples repository, which contains the necessary files for the DocSum application. Clone the repository and navigate to the DocSum docker_compose directory; it includes options for running on different hardware configurations.

git clone https://github.com/opea-project/GenAIExamples.git --depth 1 --single-branch --branch v1.2
cd GenAIExamples/DocSum/docker_compose

Step 3: Configure the Environment

Proceed to configure the environment by setting the necessary environment variables for the host IP (the machine's external public IP) and your Hugging Face token, then sourcing the set_env.sh script for Gaudi. Many parameters can be customized by editing the set_env.sh script before sourcing it; for instance, on Gaudi, the LLM model defaults to Intel/neural-chat-7b-v3-3. If your enterprise operates behind a proxy, the OPEA components will also require the standard proxy environment variables, as shown below.
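For example, a typical proxy configuration looks like the following; the proxy host and port are placeholders for your environment's values:

export http_proxy=http://<proxy host>:<proxy port>
export https_proxy=http://<proxy host>:<proxy port>
export no_proxy=localhost,127.0.0.1,<host ip>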

To set up the environment variables, first set the host IP:

export host_ip=$(hostname -I | awk '{print $1}')

Some Hugging Face models are gated and require a token, along with accepting Hugging Face's terms of use, as shown below:

export HUGGINGFACEHUB_API_TOKEN=<your Hugging Face API token>
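If you have the huggingface_hub CLI installed, one optional way to sanity-check the token is to log in with it and query your identity:

huggingface-cli login --token ${HUGGINGFACEHUB_API_TOKEN}
huggingface-cli whoami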

The set_env.sh script also specifies the port numbers for the services, which can be modified if necessary. After reviewing and editing the entries in the set_env.sh script, source it in your environment:

source ./set_env.sh
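After sourcing, you can spot-check that the variables are set. The LLM_MODEL_ID variable name below assumes the v1.2 script; adjust if your copy differs:

echo $host_ip
echo $LLM_MODEL_ID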

Step 4: Start the Services with Docker Compose

To launch the services, use Docker Compose. The Docker Compose YAML configuration file defines and manages the multi-container DocSum application, handling networks, variables, ports, and other dependencies. Ensure you are in the GenAIExamples/DocSum/docker_compose directory, then execute the following command to start the services:

cd intel/hpu/gaudi
docker compose up -d

Step 5: Monitor the Service Initialization

Docker Compose will pull images from DockerHub and start the containers according to the configuration file. Some services, such as llm-docsum-server, may take several minutes to initialize, depending on your environment. You can monitor the readiness of the services before testing by following the container logs. Here’s how: 

  1. Get the list of container names.

    docker ps

    Expected container names for DocSum:
        docsum-gaudi-ui-server
        docsum-gaudi-backend-server
        llm-docsum-server
        whisper-server
        tgi-server

  2. Check the status of the desired container by following its logs.

    docker logs <CONTAINER_NAME> -f
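    For example, the TGI service typically logs a "Connected" message once the model is loaded and ready to serve requests, so you can watch for it (the container name assumes the list above):

    docker logs tgi-server 2>&1 | grep Connected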

Step 6: Access the User Interface (UI)

The UI for DocSum makes it easy to interact with the system. We provide two interface options:

  1. GUI: Included with the DocSum application example and deployed in a Docker container. It offers a user-friendly way to interact with the system by allowing users to input text, upload documents, and view summary responses. From your browser, navigate to http://127.0.0.1:5173 and access the DocSum application UI.

    Figure 2. Creating a Summary of Text in the DocSum UI


    Users can paste the text to be summarized into the text box; clicking the "Generate Summary" button starts the summarization. Alternatively, users can upload files from their local device, and summarization starts automatically once a file is uploaded. A condensed summary of the content is produced and displayed in the "Summary" box on the right.

  2. REST API: All functionalities provided by the GUI can also be accessed via curl commands. This flexibility allows for a programmatic interface, enabling the integration of the DocSum application with other applications. Here are some examples:

    English mode (default)

    curl http://${host_ip}:8888/v1/docsum \
      -H "Content-Type: multipart/form-data" \
      -F "type=text" \
      -F "messages=Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5." \
      -F "max_tokens=32" \
      -F "language=en" \
      -F "stream=false"

    Chinese mode

    curl http://${host_ip}:8888/v1/docsum \
      -H "Content-Type: multipart/form-data" \
      -F "type=text" \
      -F "messages=2024年9月26日,北京——今日,英特尔正式发布英特尔® 至强® 6性能核处理器(代号Granite Rapids),为AI、数据分析、科学计算等计算密集型业务提供卓越性能。" \
      -F "max_tokens=32" \
      -F "language=zh" \
      -F "stream=false"

    Upload file

    curl http://${host_ip}:8888/v1/docsum \
      -H "Content-Type: multipart/form-data" \
      -F "type=text" \
      -F "messages=" \
      -F "files=@/path to your file (.txt, .docx, .pdf)" \
      -F "max_tokens=32" \
      -F "language=en" \
      -F "stream=false" \
      -F "summary_type=auto" # "stuff", "truncate", "map_reduce", "refine", default is "auto"

    Audio and video file uploads are not supported in DocSum with a curl request in OPEA 1.2. Please use the GUI for these types of uploads. However, you can still pass a base64 string of the audio or video file as follows:

    curl http://${host_ip}:8888/v1/docsum \
      -H "Content-Type: multipart/form-data" \
      -F "messages=UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA" \
      -F "max_tokens=32" \
      -F "language=en" \
      -F "stream=true" \
      -F "type=audio" # "video"
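    To produce such a base64 payload from a local file on Linux, you can use the base64 utility with wrapping disabled; sample.wav below is a placeholder for your own file:

    base64 -w 0 sample.wav

    The output can be substituted directly into the request, for example with -F "messages=$(base64 -w 0 sample.wav)".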

Step 7: Shutdown

Once you have completed using the DocSum application service, it's important to stop all running containers to release system resources. Navigate to the directory where your Docker Compose YAML file is located and execute the following command:

docker compose down
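To confirm that everything has stopped, list the running containers again; no DocSum containers should remain:

docker ps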

Conclusion

In an era overwhelmed by information, the ability to quickly and effectively summarize content is invaluable. The DocSum application, implemented using OPEA, offers a powerful solution to this challenge. By leveraging state-of-the-art generative AI systems and robust hardware like the Intel® Gaudi® 2 AI accelerator, DocSum transforms the way we interact with information, making it more accessible and manageable.

Throughout this article, we have explored the potential of document summarization across various domains. The flexibility and cost-effectiveness of OPEA make it an ideal platform for deploying advanced AI solutions, enhancing productivity and efficiency.

We encourage you to experiment with the DocSum application, integrate it into your workflows, and experience firsthand the benefits it brings. Your feedback is invaluable, so please share your experiences and suggestions through our GitHub repository.

Acknowledgements

Thank you to our colleagues who made contributions and helped to review this blog: Melanie Hart Buehler, Dina Suehiro Jones, Harsha Ramayanam, and Abolfazl Shahbazi.


Related Content

Multimodal Question and Answer: A Step-by-Step Guide