Deploy Smarter, Faster, More Responsive LLMs at the Edge
Overview
This session explores new techniques for running LLMs efficiently on client PCs and small-form-factor machines at the edge using the OpenVINO™ toolkit in combination with popular tools, libraries, and frameworks for model optimization and quantization.
Additionally, you’ll gain hands-on experience implementing a conversational AI voice agent using the OpenVINO toolkit and Gradio, an open source Python* package for quickly building machine learning demos and web applications.
This session provides:
- Techniques for deploying advanced LLMs on edge devices across industries such as healthcare and manufacturing
- Methods for optimizing and quantizing LLMs to reduce model size and computational demands while maintaining strong performance and low power consumption
- Guidance on using AI Tools to build powerful, energy-efficient AI applications for edge computing
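To see why quantization shrinks a model, note that weight storage scales linearly with bits per weight. The back-of-envelope estimate below uses a hypothetical 7B-parameter model; the figures are illustrative, not session benchmarks:

```python
# Rough model-size estimate for weight quantization: moving from FP16
# to INT4 weights cuts storage by 4x, which is what makes large models
# feasible on client PCs and small-form-factor edge machines.
def model_size_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (decimal GB)."""
    return n_params * bits_per_weight / 8 / 1e9


params = 7e9  # a 7B-parameter model, as a hypothetical example
fp16 = model_size_gb(params, 16)  # full-precision baseline
int4 = model_size_gb(params, 4)   # 4-bit quantized weights
print(f"FP16: {fp16:.1f} GB, INT4: {int4:.1f} GB ({fp16 / int4:.0f}x smaller)")
```

Activations, KV cache, and quantization metadata add overhead on top of this, so real footprints are somewhat larger, but the linear scaling is the dominant effect.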
Skill level: Expert