Introduction
In this first chapter, we will cover the theoretical fundamentals of profiling and provide practical advice for profiling and optimizing your application with Unreal Engine*. We will begin with an overview of profiling fundamentals, discussing best practices, profiling metrics, and briefly touching on frame pacing and frame budget in your application. Next, we will explore how to determine whether your application is CPU-Bound or GPU-Bound, the tools available to detect this, and the subsequent steps to take. Following this, we will discuss how to prepare for profiling, including recommended build configurations, and the console commands to use to minimize data and CPU overhead during profiling.
Profiling Fundamentals
To begin profiling, you must select a representative system with the CPU and GPU specifications for which you intend to optimize your application. Identify problematic scenes, establish your performance goals, and align your specifications with the target audience's systems. Consider the following questions to guide your process:
- What are your performance targets?
- Minimum: 1080p Medium 30 FPS?
- Recommended: 1440p High 60 FPS?
- What is your frame budget?
- What technologies do you plan to utilize?
- Are they supported on all platforms?
- Do you plan to support integrated GPU or handhelds?
- What is your minimum and recommended specification?
- CPU
- GPU
- RAM
- VRAM
- How frequently do you profile your game?
Unreal Engine* 5 Hardware and Software Specifications can help you establish the base requirements, but in the end what matters is that your game meets its performance targets on the specifications you have chosen.
Once you have addressed these questions, you can proceed to configure your representative hardware setup for profiling. For profiling and optimization purposes, it is crucial to maintain a stable and reproducible environment. To ensure accurate comparisons of your changes, it is essential to profile on the same CPU and GPU, BIOS settings, and OS versions. However, considering our ongoing efforts to improve performance in our driver, we highly recommend updating GPU drivers and profiling your game often. Additionally, avoid running demanding applications in the background. Ideally, only your game and profiling tools should be active.
During profiling, you will likely identify specific areas or camera angles in your game that cause performance issues in certain situations, such as:
- A specific model causing an FPS drop when viewed
- An FPS drop when activating certain light sources
- Reflective materials causing an FPS drop
- Stutters or hitches when moving closer to certain objects
- The game slows down as assets are loaded in the background
- Stutters during the first run of the game but not during subsequent runs
It is crucial to emphasize that determining whether your application is bottlenecked by CPU or GPU should be based on multiple samples, rather than a single capture taken over a short period with a low sampling rate. Different spots in the game can have completely different performance results and they can be caused by completely different issues.
Another factor to consider is that different platform configurations can produce significantly different results, which is where scalability settings become relevant. We will discuss these in detail later.
There are numerous tools available to help you diagnose these issues, including:
- Unreal Insights
- Intel® PresentMon
- Intel® VTune™
- Intel® Graphics Performance Analyzers
- GPUView
- PIX
- RenderDoc
Performance Metrics
Frame Time vs Frame Rate
Frame Time (MPF - milliseconds per frame): This metric represents the time it takes to render a single frame. It is used to discuss frame budget and analyze rendering performance. Frame time is considered a more reliable metric when evaluating or discussing performance.
Frame Rate (FPS - frames per second): This metric indicates the number of frames rendered in one second. It is primarily used to establish performance goals and can be useful for approximating overall game performance.
Conversion
Conversion between frame time and frame rate can be achieved using a simple equation. However, it is important to note that precision may be lost, especially when calculating average times based on multiple frames over a defined period.
To simplify, the conversions can be expressed as follows:
Frame Rate = 1000 / Frame Time (e.g., 60 FPS = 1000 / 16.67ms)
Frame Time = 1000 / Frame Rate (e.g., 16.67ms = 1000 / 60 FPS)
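The precision note above can be illustrated with a minimal Python sketch. The function names and the sample capture are hypothetical; the point is only that averaging per-frame FPS values gives a different (and misleading) number than dividing the frame count by the elapsed time.

```python
def frame_time_ms(fps: float) -> float:
    """Frame time in milliseconds for a given frame rate."""
    return 1000.0 / fps


def frame_rate_fps(frame_time: float) -> float:
    """Frame rate in frames per second for a given frame time in ms."""
    return 1000.0 / frame_time


def average_fps(frame_times_ms: list[float]) -> float:
    """Average FPS over a capture: frames rendered divided by elapsed seconds."""
    total_seconds = sum(frame_times_ms) / 1000.0
    return len(frame_times_ms) / total_seconds


# Hypothetical capture: nine smooth 10 ms frames and one 100 ms hitch.
sample = [10.0] * 9 + [100.0]

print(round(frame_time_ms(60), 2))    # 16.67
print(round(average_fps(sample), 2))  # 52.63 (10 frames in 0.19 s)

# Averaging the per-frame FPS values hides the hitch.
naive = sum(frame_rate_fps(t) for t in sample) / len(sample)
print(round(naive, 2))                # 91.0
```

This is one reason frame time is the safer metric to aggregate: sums and averages of frame times map directly to elapsed time.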
Percentiles, Median
k-th Percentile
The k-th percentile is the value below which k% of all values in the data set fall.
Median
The median is simply the 50th percentile. A good indicator of consistent frame pacing in your game is a median that is almost equal to the average of the values.
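As a rough illustration of the median-versus-average check, here is a small Python sketch. The helper name `pacing_gap` and both sample captures are hypothetical, and what counts as "almost equal" is left for you to decide.

```python
import statistics


def pacing_gap(frame_times_ms: list[float]) -> float:
    """Relative gap between mean and median frame time (0 means identical)."""
    mean = statistics.mean(frame_times_ms)
    median = statistics.median(frame_times_ms)
    return (mean - median) / median


# Hypothetical captures, in milliseconds per frame.
steady = [16.6, 16.7, 16.7, 16.8, 16.7]
hitchy = [16.7] * 9 + [100.0]

print(round(pacing_gap(steady), 2))  # close to zero: consistent pacing
print(round(pacing_gap(hitchy), 2))  # 0.5: the mean is pulled up by one hitch
```

A single long frame barely moves the median but shifts the average noticeably, which is why the gap between the two is a useful first hint.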
What to use them for?
Percentiles are used in benchmarks and performance analysis to screen out extreme values, such as singular hitches or stutters in a benchmark. Ideally, you would eliminate those stutters and hitches before benchmarking.
Let's say that we have a benchmark application from which we have 10,000 frame time values. When we sort them from smallest (1) to largest (10,000), the 99th percentile is the value in 9,900th place. It means that 99% of values are smaller than this one and 1% of values are higher. For MPF lower is better; for FPS higher is better.
Percentiles are especially useful when it comes to hitch and stutter analysis, but they require good sampling and a large enough data set to get to a conclusion based on available values. There are more options available but percentiles can be helpful in understanding how often the game stutters even if you cannot see it with your own eye.
In stutter analysis, instead of thinking about the values below a percentile, we reverse the view and think about the 1% above it: 1% of frame times are above X ms. It is also worth reducing the percentile step around the 98th-100th percentiles from 1 to 0.1 when the data set is large enough. The steepness of the diagram depends on the sample count but also on overall game performance. Big differences in frame times on the right side of the diagram can indicate how your game behaves on average and how many frames lag, causing stutters.
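The percentile reasoning above can be sketched in Python. This uses the nearest-rank definition, which matches the "9,900th place out of 10,000" example; other percentile definitions interpolate between values and give slightly different results. The function names and the sample capture are hypothetical.

```python
import math


def percentile(values: list[float], k: float) -> float:
    """Nearest-rank k-th percentile: the value at position ceil(k * n / 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(k * len(ordered) / 100.0))
    return ordered[rank - 1]


def share_above(values: list[float], threshold_ms: float) -> float:
    """Fraction of frames slower than threshold_ms (the stutter-side view)."""
    return sum(1 for v in values if v > threshold_ms) / len(values)


# Hypothetical capture: 9,900 frames at 16 ms and 100 frames at 50 ms.
frame_times = [16.0] * 9900 + [50.0] * 100

print(percentile(frame_times, 99))     # 16.0, the value in 9,900th place
print(percentile(frame_times, 99.5))   # 50.0, a finer step exposes the hitches
print(share_above(frame_times, 33.33)) # 0.01, i.e. 1% of frames exceed 33.33 ms
```

Note how the 99th percentile alone looks healthy here, while the finer step near the top of the range reveals the slow frames.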
Frame Budget
The overall idea of Frame Budget is simple - you have to set a time limit for single-frame execution and stick to it in every frame for both CPU and GPU. Frame Budgets are: 33.33ms for 30 FPS and 16.67ms for 60 FPS.
Think about the frame budget as a finite resource - you can only spend some of it on features. Some of the features may look great but they can also be really expensive. It is up to you to decide whether they are worth it and whether they are crucial for your game. Maybe you can simplify some solutions or optimize them by using good practices and patterns without giving up features completely.
Unlike platforms with consistent hardware specifications, such as handhelds or consoles, PCs require you to scale your game across different hardware configurations. Because of this, you have to think about how it can be scaled for both integrated and discrete graphics.
Some hardware features may not be available on some platforms, and your game will need alternative solutions to handle that.
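To make the "finite resource" idea concrete, here is a minimal Python sketch. The pass names and their costs are made-up numbers for illustration only, not measurements from any real project.

```python
def frame_budget_ms(target_fps: float) -> float:
    """Time available for one frame at the target frame rate."""
    return 1000.0 / target_fps


def remaining_budget_ms(costs_ms: dict[str, float], target_fps: float) -> float:
    """Budget left after all listed costs; negative means over budget."""
    return frame_budget_ms(target_fps) - sum(costs_ms.values())


# Hypothetical GPU costs per frame, in milliseconds.
gpu_costs = {
    "base pass": 4.0,
    "shadows": 3.5,
    "global illumination": 5.0,
    "post-processing": 2.5,
}

print(round(frame_budget_ms(30), 2))                 # 33.33
print(round(remaining_budget_ms(gpu_costs, 60), 2))  # 1.67 ms left at 60 FPS
```

The same bookkeeping applies to the CPU side: both have to fit within the budget in every frame.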
Unreal Engine* offers many ways to do it. For more information please refer to: Significance Manager Reference and Scalability Reference.
Frame Pacing
Frame Pacing is a term related to the synchronization of the game's logic and rendering loop. Proper Frame Pacing ensures that frames are consistently delivered over time and that frames are rendered in regular intervals. Frame Pacing is extremely important to provide a smooth experience to the player.
With irregular frame time, players can experience freezes, stuttering, hitches, and jerky animations and it can be perceived as lag during gameplay. Even with a high average frame rate, it can be perceived as much lower because a couple of frames take longer to execute.
As simple as the definition may appear, it is extremely challenging and complex to get right in 100% of cases, but there are many solutions and tools to make it possible. Many elements factor into frame consistency:
- Frame persistence
- Monitor refresh rates
- Shader Compilation Stutters - refer to PSO Cache and Game Engines & Shader Stuttering: Unreal Engine*’s Solution
- Leveraging Task Systems
- Leveraging multithreaded architecture
- Leveraging Data-Oriented Programming - refer to CppCon 2014: Mike Acton "Data-Oriented Design and C++" and Jason Booth Practical Optimizations
- Input latency
- Using async calculations when possible
- Reusing GPU resources instead of recreating them
- Utilizing Garbage Collector inefficiently - refer to Garbage Collection, Incremental Garbage Collection, and in-code documentation of gc.* flags.
- Spawning too many objects at once
- Using Significance Manager
- Using inadequate Scalability Settings, causing memory throughput issues
Unreal Engine* offers stat commands that make these issues easier to detect. However, they are not always 100% reproducible. When they are, they are definitely easier to identify, but sometimes it will take multiple runs or longer gameplay sessions to reach a conclusion.
Helpful stat commands in Unreal Engine*:
stat DumpHitches // write to Log anytime a "hitch" is detected based on `t.HitchFrameTimeThreshold`
stat Hitches // set `t.HitchFrameTimeThreshold` time in seconds that is considered a hitch.
profilegpuhitches // opens profiler when hitch happens and record what happened
For more information please refer to Frame Pacing for Mobile Device and Low Latency Frame Syncing
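If you export frame times from a capture, a threshold-based hitch check similar in spirit to the commands above can be sketched in Python. This is not the engine's implementation; the function name, the threshold value, and the sample data are assumptions for illustration.

```python
def find_hitches(frame_times_ms: list[float], threshold_ms: float) -> list[tuple[int, float]]:
    """Return (frame_index, frame_time) pairs for frames slower than threshold_ms."""
    return [(i, t) for i, t in enumerate(frame_times_ms) if t > threshold_ms]


# Hypothetical capture with two long frames, in milliseconds.
capture = [16.6, 16.8, 16.5, 75.0, 16.7, 16.6, 120.0, 16.6]

for index, time_ms in find_hitches(capture, threshold_ms=60.0):
    print(f"frame {index}: {time_ms} ms")
# frame 3: 75.0 ms
# frame 6: 120.0 ms
```

Knowing which frames hitched lets you line them up with bookmarks, level streaming events, or garbage collection in a more detailed trace.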
CPU-Bound or GPU-Bound?
What does it mean to be CPU-Bound or GPU-Bound?
In simple terms, it means determining whether your application is limited by the CPU or the GPU, that is, which of the two your game has to wait for to produce the next frame. Both scenarios can lead to poor user experience, but identifying the actual bottleneck is important to decide what to profile and how to profile it. Typically, you will begin by assessing whether your application is CPU-Bound or GPU-Bound based on utilization and idle metrics, and then proceed with the appropriate profiling tools. This process may ultimately reveal that the issue lies elsewhere, such as in memory bandwidth, execution unit stalls, improper synchronization, inefficient algorithms, etc.
CPU-Bound
In that case, the main limitation is the CPU's ability to execute tasks in time during each frame. The GPU has to wait for the CPU to submit new frames to be rendered. Typical examples are heavy tasks such as game logic, physics, collisions, and preparing data for the next frame, which are executed on the GameThread. This shows up as a combination of high CPU utilization and low GPU utilization.
GPU-Bound
In that case, the main limitation is the GPU's ability to execute tasks in time during each frame. The CPU has to wait for the GPU to finish rendering the previous frame before submitting new frames to be rendered. Typical examples are tasks like geometry transformation, shading, post-processing, and upscaling. This shows up as a combination of high GPU utilization (above 95%) and relatively low CPU utilization.
Determine if the Game is CPU-Bound or GPU-Bound
Establishing whether you are CPU-Bound or GPU-Bound can save you a lot of guesswork and lets you focus on optimizing what is important until you reach your performance goals and fit in your frame budget. To determine whether your game is CPU-Bound or GPU-Bound you can use multiple tools. We will show only a few examples, but there are many more tools than those mentioned here.
As a starting point, we can assume that GPU Utilization of 95% and above means we are GPU-Bound.
Unreal Engine*
Your first go-to approach will likely be the stat commands built into the engine. These commands are available both in the Editor and in a Packaged Build (depending on configuration). For more information please refer to Stat Commands.
Using those commands displays performance information with metrics for specific threads.
stat detailed // stat FPS, stat UNIT, stat UnitGraph
stat t.TargetFrameTimeThreshold <milliseconds> // adjust Y-axis in UnitGraph
stat t.UnacceptableFrameTimeThreshold // adjust budget threshold in stats to make them red above defined value
r.vsync 0 // vsync disabled
t.maxfps 0 // cap max fps, 0 for unlimited
Based on this information we can get a global overview of what is going on in the frame and reach some basic conclusions. If the Draw Time is ~95% of Frame Time, it is an indicator that the game is GPU-Bound. If Game Time is close to Frame Time and Draw Time is lower than 90% of Frame Time, it is an indicator that the game can be CPU-Bound.
If the result is not clear, the following commands can help you establish whether you are CPU-Bound or GPU-Bound.
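The rule of thumb above can be written down as a small Python sketch. The function name is hypothetical, and treating "close to Frame Time" as at least 95% of Frame Time is an assumption made here; the result is only a first hint, not a diagnosis.

```python
def classify_bottleneck(frame_ms: float, game_ms: float, draw_ms: float) -> str:
    """First-pass hint from Frame, Game, and Draw times, following the heuristic in the text."""
    draw_share = draw_ms / frame_ms
    game_share = game_ms / frame_ms
    if draw_share >= 0.95:
        return "likely GPU-Bound"
    # Assumption: "close to Frame Time" is read as >= 95% of Frame Time.
    if game_share >= 0.95 and draw_share < 0.90:
        return "possibly CPU-Bound"
    return "inconclusive"


# Hypothetical readings in milliseconds.
print(classify_bottleneck(frame_ms=16.7, game_ms=8.0, draw_ms=16.2))   # likely GPU-Bound
print(classify_bottleneck(frame_ms=20.0, game_ms=19.5, draw_ms=12.0))  # possibly CPU-Bound
print(classify_bottleneck(frame_ms=16.7, game_ms=10.0, draw_ms=15.5))  # inconclusive
```

Run it over many samples from different spots in the game rather than a single reading, since different scenes can be limited by different things.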
r.SetRes 2560x1440f // f for fullscreen, wf for windowed fullscreen
r.ScreenPercentage 100 // if increasing it by 25%/50% does not impact performance much it is an indicator that app can be CPU-Bound
pause // pause GameThread
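The screen percentage check can be expressed as a short Python sketch. The function names and the 5% tolerance for "does not impact performance much" are assumptions made for this illustration; pick a tolerance that matches the noise level of your own measurements.

```python
def resolution_sensitivity(base_frame_ms: float, scaled_frame_ms: float) -> float:
    """Relative frame time change after raising r.ScreenPercentage."""
    return (scaled_frame_ms - base_frame_ms) / base_frame_ms


def resolution_hint(base_frame_ms: float, scaled_frame_ms: float, tolerance: float = 0.05) -> str:
    """Interpret the change; the tolerance is a hypothetical cut-off."""
    change = resolution_sensitivity(base_frame_ms, scaled_frame_ms)
    if change < tolerance:
        return "frame time barely moved: can be CPU-Bound"
    return "frame time scales with resolution: look at the GPU"


# Hypothetical frame times at 100% and at a higher screen percentage.
print(resolution_hint(16.0, 16.2))
print(resolution_hint(16.0, 22.0))
```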
PresentMon
PresentMon can help establish GPU and memory utilization by providing charts at runtime. As shown below, we have a workload that is GPU-Bound with 98.6% GPU Utilization. This is our first hint about where to start looking for performance improvements.
For more information please refer to GitHub and Intel Gaming.
GPUView
GPUView is a longstanding but still great tool that can help you determine whether your application is CPU-Bound, GPU-Bound, or both. It is part of the Windows Performance Toolkit (WPT), and you can use it to capture ETL traces for advanced profiling analysis. For more information please refer to GPUView.
When capturing an ETL trace with GPUView, we recommend the following setup:
- Use single monitor setup
- Disable all background apps
- V-sync off
- Framerate unlimited
- Disable upscalers / frame generation / dynamic resolution
- Render scale at 100
It is also important to mention Hardware-accelerated GPU scheduling, which can significantly affect ETL logs and sometimes even point you in the wrong direction as to whether the game is GPU-Bound or CPU-Bound. To make sure that you get correct information about your bottleneck, DISABLE Hardware-accelerated GPU scheduling. Otherwise, you can observe high CPU spikes that point to CPU-Bound when in fact your game may be GPU-Bound.
After collecting the ETL trace, you can observe GPU utilization in the upper-right corner under Adapter -> Hardware Queue -> 3D / Copy / Compute for the selected scope (if work is distributed across different queues, sum their times to get total GPU utilization). You can also observe many other things, such as context switches, pipeline stalls, and synchronization issues.
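The "sum the queue times" step can be sketched in Python as follows. The queue names mirror the ones in the text, while the function name and the busy times are hypothetical values you would read off the tool manually.

```python
def gpu_utilization_percent(queue_busy_ms: dict[str, float], scope_ms: float) -> float:
    """Total GPU utilization for a scope: summed queue busy time over scope length."""
    return 100.0 * sum(queue_busy_ms.values()) / scope_ms


# Hypothetical busy times per hardware queue within a 20 ms selected scope.
queues = {"3D": 14.0, "Copy": 0.5, "Compute": 1.5}

print(gpu_utilization_percent(queues, scope_ms=20.0))  # 80.0
```

If work on different queues overlaps in time, the plain sum can exceed 100%, so treat the result as a rough indicator rather than an exact figure.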
What to do when you establish CPU / GPU-Boundness?
Some profiling tools can only attach to your game if the build configuration permits it. For instance, the Shipping build configuration typically prohibits external tools from attaching or limits the data, while the Test build configuration allows it but is only available in engine builds from source. The most common choice is to use the Development build configuration, but as we will show later it is not the best choice for profiling.
If the Game is CPU-Bound
To enhance CPU performance, consider the following profiling strategies:
- Utilize the stats tools available in Unreal Engine*.
- Use Unreal Insights for in-depth performance analysis.
- Use Intel® VTune™ Profiler to identify hotspots and optimize CPU utilization.
- Refer to the User Guide and Cookbook for detailed instructions on how this tool can assist you.
When addressing CPU-Bound performance issues, it is essential to consider the three major categories: Memory Bound, Compute Bound, and I/O Bound.
Examples of potential causes of suboptimal CPU performance include:
- Operating on many dynamically allocated objects without leveraging managers and vectorization.
- Inefficient workload distribution within a frame. Evaluate if work can be distributed asynchronously across multiple frames.
- Improper synchronization, causing CPU threads to remain idle for extended periods.
- Failure to use object pooling, resulting in loading time hitches.
- Brute force algorithms with poor complexity.
If the Game is GPU-Bound
To enhance GPU performance, consider the following profiling strategies:
- Utilize the stat tools available in Unreal Engine*.
- Use ProfileGPU
- Use Unreal Insights for in-depth performance analysis.
- Use PIX / RenderDoc / Intel® Graphics Performance Analyzers / GPUView
When dealing with GPU-Bound performance issues, there are several categories to consider, including Front-end Bound, Vertex Fetch Bound, Geometry Bound, Pixel Bound, Sampler Bound, and Thread Bound.
Sometimes, a game may perform poorly during the first run but improve on subsequent runs because shaders are compiled in the background and stored in the local cache. It can also improve over time because of shader recompilation and compiler optimizations that change the SIMD length used, the General Register File (GRF) mode, or reduce shader spilling.
If your game is GPU-Bound, the first step is to capture a frame trace to identify the root cause. A frame capture records GPU activity during the rendering of a single frame. It can be captured with either built-in or external tools.
When to profile?
Ideally, you would make profiling and optimization part of your process and do it often. In reality, the most practical answer is probably to profile only when you are not meeting expected frame times on your recommended setups. It sounds that easy, right? Lack of access to hardware, different platform configurations, performance issues not visible in single-frame analysis, stutters, garbage collection, and many more factors affect the ability to optimize a game across a wide range of hardware.
What to profile?
You can profile a standalone game from the editor, but we only recommend using it to eyeball what is going on in the project - the best approach is to use a packaged build, preferably Test Configuration.
Probably the best way to start profiling and optimizing your game is to maintain a benchmark with all technologies you want to work with. It can be automated with multiple camera angles or camera fly through the location to collect data. It does not have to be a separate level - you can use already existing ones. This is not perfect but it will help your performance analysis significantly. Especially since you can use it for different scalability settings on different platforms and collect results to compare.
How to prepare build for Profiling?
Build Configuration
Unreal Engine*'s custom build method offers various configurations, each containing different optimizations and debug features. Detailed information on how these build configurations differ can be found here: Unreal Engine* Build Configuration
We recommend using a slightly modified Test Configuration for any kind of performance profiling. Test Configuration has fewer features enabled than Development Configuration but it is close to Shipping Configuration when it comes to performance and optimizations. You can think of Test Configuration as Shipping Configuration when it comes to performance but you have access to limited console commands and it can be profiled by Unreal Insights.
The significant reason for this recommendation is that the Debug and Development Configurations, by default, use r.D3D12.DRED, while Test and Shipping Configurations utilize r.D3D12.LightweightDRED, which is less demanding on GPUs, particularly on Intel® Xe Architecture. Based on our observations and received feedback this issue has been significantly improved in Intel® Xe2 architecture.
It is worth mentioning that since Unreal Engine* 5.5 r.GPUCrashDebugging.Breadcrumbs are enabled by default for all configurations, and r.D3D12.LightweightDRED and r.D3D12.DRED are disabled. They must be enabled manually when needed (-lightdred, -dred, -gpubreadcrumbs).
With this in mind, using the Development Configuration for profiling can result in substantial performance differences compared to the Shipping or Test Configurations. If you must use the Development Configuration (for instance, when using the retail launcher version of the engine), you can run your game binary with the following command to disable DRED and other instrumentation:
start Game-Win64-Test.exe -dpcvars=r.D3D12.DRED=0,r.D3D12.LightweightDRED=0,r.GPUCrashDebugging.Breadcrumbs=0
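If you launch benchmark runs from a script, the `-dpcvars` argument shown above can be assembled from a dictionary. This Python sketch only builds the argument string; the helper name is hypothetical, and the console variables are the three from the command above.

```python
def dpcvars_argument(cvars: dict[str, int]) -> str:
    """Build a -dpcvars=name=value,name=value command-line argument."""
    pairs = ",".join(f"{name}={value}" for name, value in cvars.items())
    return f"-dpcvars={pairs}"


# Console variables taken from the command line in the text.
disable_instrumentation = {
    "r.D3D12.DRED": 0,
    "r.D3D12.LightweightDRED": 0,
    "r.GPUCrashDebugging.Breadcrumbs": 0,
}

print(dpcvars_argument(disable_instrumentation))
# -dpcvars=r.D3D12.DRED=0,r.D3D12.LightweightDRED=0,r.GPUCrashDebugging.Breadcrumbs=0
```

Keeping the overrides in one place makes it easier to guarantee that every profiling run uses the same settings.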
However, keep in mind that DRED is not the only difference between Test and Development config and results from profiling can still be different.
Configuration | Shipping | Test | Development |
---|---|---|---|
Unreal Insights | UE_TRACE_ENABLED=1 | Yes | Yes |
PIX Markers | No | bAllowProfileGPUInTest | Yes |
Log to file | bUseLoggingInShipping | bUseLoggingInShipping | Yes |
stat | No | Limited by default, FORCE_USE_STATS=1 | Full |
statnamedevents | No | -statnamedevents (STAT has to be enabled) | -statnamedevents |
debugdraws | No | UE_ENABLE_DEBUG_DRAWING=1 | r.ForceDebugViewModes=1 |
LLMTracker | No | ALLOW_LOW_LEVEL_MEM_TRACKER_IN_TEST=1 -LLM | -LLM |
DRED | No | No | Yes |
Lightweight DRED | Yes | Yes | Yes |
Breadcrumbs | No | No | No |
Since the Test and Shipping Configurations are limited in terms of profiling details and may lack information such as statnamedevents, ProfileGPU, debug draw, llm, logging you have to enable some of these features manually in Unreal Engine* or in your project.
Here is our proposed configuration for your project Target.build.cs
bAllowProfileGPUInTest sets the ALLOW_PROFILEGPU_IN_TEST property for UBT to include additional info in the Test build:
For more properties please refer to Targets.
By default, the Development Configuration contains profiling details and GPU markers, and you can enable statnamedevents by passing it as a command-line argument. Please refer to Command-Line Arguments Reference.
Hardware Profile Guided Optimization
Additionally, you can use Hardware Profile Guided Optimization (HPGO) to optimize machine code execution for Unreal Engine* 5.4 and above. More information can be found here: oneAPI Optimization for Unreal Engine* 5.
- bAllowLTCG - Whether to allow the use of link time code generation (LTCG).
- bPGOProfile - Whether to enable Profile Guided Optimization (PGO) instrumentation in this build.
- bPGOOptimize - Whether to optimize this build with Profile Guided Optimization (PGO).
How to Profile?
Unreal Engine* offers many different ways to profile your game. There are a couple of typical approaches to use. We are going to start with simple ones, and incrementally move to more in-depth profiling methods and tools.
Prerequisites
Resizable BAR
Before we start looking at tools, make sure that Resizable BAR is enabled. You can see that information in ARC Control Panel (Intel® Graphics Software). For more information please refer to What Is Resizable BAR and How Do I Enable It?
Project Configuration
It is important to choose the correct presets during profiling according to your current setup - by default, scalability settings are defined in BaseScalability and they are applied based on synthbenchmark CPU and GPU scores. With your project growing and utilizing other features in the engine it is important to keep your scalability settings up to date in your DefaultScalability.ini. You can set scalability settings by using scalability auto and verify if thresholds are set properly, or set them manually by using scalability <0-3>. You can also set single Scalability Group levels from the console, they start with sg.*. For more information please refer to Scalability Reference.
We will cover Scalability Settings and synthbenchmark in the next Chapter. For now, what you should know is that Unreal Engine* offers the synthbenchmark tool together with the BaseScalability.ini file shown below. Synthbenchmark can be helpful for establishing overall CPU and GPU scores but, as the name says, it is a synthetic benchmark and cannot predict exactly how your game will behave. The best way to adjust preset levels is to test your game and adjust scores based on its performance; however, using synthbenchmark at the beginning can help you set some initial thresholds.
;Default values for Scalability Settings
;The first number is the perfindex threshold that changes quality from 0 to 1.
;The second is the threshold from 1 to 2,
;The third is the threshold from 2 to 3.
[ScalabilitySettings]
PerfIndexThresholds_ResolutionQuality="GPU 18 42 115"
PerfIndexThresholds_ViewDistanceQuality="Min 18 42 105"
PerfIndexThresholds_AntiAliasingQuality="GPU 18 42 115"
PerfIndexThresholds_ShadowQuality="Min 18 42 105"
PerfIndexThresholds_GlobalIlluminationQuality="GPU 18 42 115"
PerfIndexThresholds_ReflectionQuality="GPU 18 42 115"
PerfIndexThresholds_PostProcessQuality="GPU 18 42 115"
PerfIndexThresholds_TextureQuality="GPU 18 42 115"
PerfIndexThresholds_EffectsQuality="Min 18 42 105"
PerfIndexThresholds_FoliageQuality="GPU 18 42 115"
PerfIndexThresholds_ShadingQuality="GPU 18 42 115"
PerfIndexValues_ResolutionQuality="50 71 87 100 100"
Considering the current requirements of Unreal Engine* 5 and the increasingly powerful GPUs available, those scores are quite low, which is why we recommend increasing them, even up to "GPU 150 260 550".
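To see how a threshold string such as "GPU 18 42 115" translates a benchmark score into a quality level, here is a Python sketch. It is not the engine's code: it assumes that "GPU" selects the GPU score, that "Min" selects the lower of the CPU and GPU scores, and that a score equal to a threshold already reaches the next level. The scores used are made up.

```python
def quality_level(threshold_setting: str, cpu_index: float, gpu_index: float) -> int:
    """Map perf index scores to a quality level (0-3) for one threshold string."""
    parts = threshold_setting.split()
    source, thresholds = parts[0], [float(p) for p in parts[1:]]
    if source == "GPU":
        index = gpu_index
    elif source == "Min":
        # Assumption: "Min" means the lower of the CPU and GPU scores.
        index = min(cpu_index, gpu_index)
    else:
        raise ValueError(f"unsupported perf index source: {source}")
    # Assumption: reaching a threshold exactly counts as passing it.
    return sum(1 for threshold in thresholds if index >= threshold)


# Hypothetical scores: CPU 200, GPU 60.
print(quality_level("GPU 18 42 115", cpu_index=200, gpu_index=60))    # 2
print(quality_level("GPU 150 260 550", cpu_index=200, gpu_index=60))  # 0
print(quality_level("Min 18 42 105", cpu_index=30, gpu_index=200))    # 1
```

The second call shows the effect of raised thresholds: the same GPU score now lands on a lower preset.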
Another important thing is that synthbenchmark results can be extremely different when tested on Development Configuration and when tested on Test Configuration, even up to 30% higher on Test.
Viewport Modes / Buffer Visualizations
Let's start with the first tool that can help you with your game profiling. There are many visualization modes available, and they can be used to help you debug and optimize issues in your scene. They are available in the Editor in the upper-left corner and can be enabled in a Development build by setting r.ForceDebugViewModes=1 from the console. They can help you visualize potential issues in your scene such as Nanite shader overdraw, Nanite programmable vs fixed rasterization, potential improvements in the meshes of your objects when using Nanite, Lumen Reflections, page invalidation in VSM, TSR buffers, unnecessary collisions, used LODs, shader complexity, high-resolution textures, and many more.
Viewport Modes are compiled out of Shipping and Test Configuration builds and have to be enabled in Target.build.cs or through source code modification. However, depending on the project, that may not be a straightforward change and it may require some additional work. Most of the time, viewport modes and visualizations are used in the viewport as a hint of what may be wrong and what can be improved.
Some visualizations require additional flags to be set, such as r.nanite.visualize.advanced 1, which enables Nanite advanced visualizations. You can search for .Visualize in the console to see available modes.
Some viewport modes are available via hotkeys; for more information please refer to Viewport Modes. They can also be enabled from the console:
Viewmode <viewmode>
You can search for all available view modes in Engine/Source/Runtime/Engine/Private/ShowFlags.cpp. Here is the list of available view modes:
- BrushWireframe
- Wireframe
- Unlit
- Lit
- Lit_DetailLighting
- Lit_Wireframe
- LightingOnly
- LightComplexity
- ShaderComplexity
- QuadOverdraw
- ShaderComplexityWithQuadOverdraw
- PrimitiveDistanceAccuracy
- MeshUVDensityAccuracy
- MaterialTextureScaleAccuracy
- RequiredTextureResolution
- VirtualTexturePendingMips
- StationaryLightOverlap
- LightmapDensity
- LitLightmapDensity
- ReflectionOverride
- VisualizeBuffer
- VisualizeNanite
- VisualizeLumen
- VisualizeSubstrate
- VisualizeGroom
- VisualizeVirtualShadowMap
- RayTracingDebug
- PathTracing
- CollisionPawn
- CollisionVis
- LODColoration
- HLODColoration
- VisualizeGPUSkinCache
- LWCComplexity
ABTest
ABTest is a console command to compare performance between two different values of a variable. It is an old feature and no longer maintained, but it is great for a quick comparison between two values.
It is designed to be used with a single variable, but it is possible to create a custom utility flag that modifies more than one value at once. We do not recommend using it that way with a complex set of variables, such as the whole Scalability Settings, but rather with variables that depend on each other.
By default UE_LOG is not used in UE_BUILD_TEST - you can observe messages in the output window when a debugger is attached, but if you want to have them in the Log file you have to modify the engine source in ABTesting.cpp.
Otherwise, you will only see the cvar changes in the console without the results output, and the full log will only be visible in the IDE debugger output.
You can modify ABTest behavior by adjusting these settings from the console:
abtest.HistoryNum 1000 // Number of history frames to use for stats
abtest.ReportNum 100 // Number of frames between reports
abtest.CoolDown 5 // Number of frames to discard data after each command to cover threading
abtest.MinFramesPerTrial 10 // The number of frames to run a given command before switching; this is randomized
abtest.NumResamples 256 // The number of resamples to use to determine confidence
Example of Usage
abtest r.LumenScene.Radiosity.UpdateFactor 64 128
Example Output
LogConsoleResponse: Display: abtest started with A = 'r.LumenScene.Radiosity.UpdateFactor 64' and B = 'r.LumenScene.Radiosity.UpdateFactor 128'
r.LumenScene.Radiosity.UpdateFactor = "128"
r.LumenScene.Radiosity.UpdateFactor = "64"
r.LumenScene.Radiosity.UpdateFactor = "128"
r.LumenScene.Radiosity.UpdateFactor = "64"
r.LumenScene.Radiosity.UpdateFactor = "128"
LogConsoleResponse: Display: 24.3991ms ( 53 samples) A = 'r.LumenScene.Radiosity.UpdateFactor 64'
LogConsoleResponse: Display: 24.9197ms ( 47 samples) B = 'r.LumenScene.Radiosity.UpdateFactor 128'
LogConsoleResponse: Display: A is 0.5207ms faster than B; 0% chance this is noise.
LogConsoleResponse: Display: ----------------
...
LogConsoleResponse: Display: ----------------
r.LumenScene.Radiosity.UpdateFactor = "128"
r.LumenScene.Radiosity.UpdateFactor = "64"
r.LumenScene.Radiosity.UpdateFactor = "128"
LogConsoleResponse: Display: 24.9097ms ( 196 samples) A = 'r.LumenScene.Radiosity.UpdateFactor 64'
LogConsoleResponse: Display: 25.0011ms ( 204 samples) B = 'r.LumenScene.Radiosity.UpdateFactor 128'
LogConsoleResponse: Display: A is 0.0914ms faster than B; 1% chance this is noise.
LogConsoleResponse: Display: ----------------
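When collecting many ABTest reports from a log, the summary lines can be extracted with a small Python sketch. The regular expression is written against the exact wording of the example output above and is an assumption about the format in general; other engine versions may phrase the line differently.

```python
import re

# Matches lines such as: "A is 0.5207ms faster than B; 0% chance this is noise."
SUMMARY = re.compile(
    r"(?P<faster>[AB]) is (?P<delta_ms>\d+(?:\.\d+)?)ms faster than (?P<slower>[AB]); "
    r"(?P<noise_pct>\d+)% chance this is noise"
)


def parse_abtest_summaries(log_text: str) -> list[dict]:
    """Collect the faster variant, the time delta, and the noise chance per report."""
    return [
        {
            "faster": match["faster"],
            "delta_ms": float(match["delta_ms"]),
            "noise_pct": int(match["noise_pct"]),
        }
        for match in SUMMARY.finditer(log_text)
    ]


log = """\
LogConsoleResponse: Display: A is 0.5207ms faster than B; 0% chance this is noise.
LogConsoleResponse: Display: A is 0.0914ms faster than B; 1% chance this is noise.
"""

for summary in parse_abtest_summaries(log):
    print(summary)
```

Comparing successive reports shows how the measured difference settles as more samples are collected.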
CSVProfiler
CSVProfiler is a built-in lightweight profiler that gathers per-frame metrics and then visualizes them using PerfReportTool in the form of charts. For more information please refer to CSV Profiler and CSVToSVG.
CSV Profiler can help determine the overall state of the game. The charts show periods where frames take longer than expected, and based on this visualization you can find out exactly what took that much time during those frames.
To use CSV Profiler we can either run it with a prepared batch script to maintain the same parameters between executions or enable it from the console. CSV files generated by the CSV Profiler are written out to the [ProjectDirectory]/Saved/Profiling/CSV directory.
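Besides PerfReportTool, the generated CSV files can be inspected with a few lines of Python. The sketch below reads one numeric column from CSV text; the column names in the sample (such as `FrameTime`) and the sample rows are assumptions for illustration, so check the header of your own file. Rows that do not contain a number in that column are skipped.

```python
import csv
import io


def column_values(csv_text: str, column: str) -> list[float]:
    """Read one numeric column from CSV text, skipping rows that do not parse."""
    values = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            values.append(float(row[column]))
        except (KeyError, TypeError, ValueError):
            continue  # metadata or malformed rows are ignored
    return values


# Hypothetical excerpt; real files contain many more columns and rows.
sample_csv = """EVENTS,FrameTime,GameThreadTime,RenderThreadTime,GPUTime
,16.5,8.0,10.0,16.0
,17.0,8.5,10.5,16.5
,33.5,9.0,11.0,33.0
"""

frame_times = column_values(sample_csv, "FrameTime")
print(frame_times)                                     # [16.5, 17.0, 33.5]
print(round(sum(frame_times) / len(frame_times), 2))   # 22.33
```

To use it on a real capture, read the file from the Saved/Profiling/CSV directory and pass its contents as `csv_text`.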
It is important to highlight that shader compilation during the first run can affect your profiling metrics. When profiling make sure that your application already finished the shader compilation stage and collected all necessary PSOs. If you did not implement PSO Cache, at least run your benchmark twice and leverage hardware vendor shader cache.
From Batch Script
-csvCaptureFrames=N "Capture N frames of CSV data then stop profiling"
-csvGpuStats "Enables CSVProfiler GPU stats from the beginning"
-ExitAfterCsvProfiling "Quits workload after the number of set captured frames is reached"
-csvABTest=UNREAL_CVAR
-csvABTestStatFrameOffset=N
-csvABTestSwitchDuration=N
-csvABTestFastCVarSet
-novsync "turn off vsync as an option when starting the game from the command line"
-vsync "turn on vsync as an option when starting the game from the command line"
-game "run the game as if it were a packaged build; don’t open the editor"
-deterministic "Shortcut for -UseFixedTimeStep -FixedSeed"
-benchmark "Needed for deterministic. Compiled out in shipping"
-benchmarkseconds=N "Limit execution of benchmark to N seconds"
-fps=60 "Override fixed tick rate frames per second. Combined with seconds you can limit the number of frames you ultimately display before quitting"
-seconds=N "Run workload N seconds and then quit"
-windowed "Use windowed mode"
-fullscreen "Use fullscreen mode"
-resx=2560 "Specify window width resolution in pixels"
-resy=1440 "Specify window height resolution in pixels"
-noscreenmessages "useful to only see the graphics being rendered. I think GPA punches through this"
-dumpmovie "write out an image file for each rendered frame"
-corelimit=8
Example of capturing and generating report from CSV
Game-Win64-Test.exe -benchmark -deterministic -FPS=60 -unattended -noscreenmessages -novsync -csvcaptureframes=12000 -csvGpuStats -ExitAfterCsvProfiling -resx=2560 -resy=1440 -forceres -windowed
"E:\UnrealEngine\UE_5.5\Engine\Binaries\DotNET\CsvTools\PerfreportTool.exe" -csvdir "E:\UnrealPackagedProjects\LyraStarterGame\Windows\LyraStarterGame\Saved\Profiling\CSV" -summarytable all -reporttype Default60fps
* The report is created in the folder from which you ran the command.
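If you want a quick look at a capture before generating a full report, a short script can flag frames that exceed your frame budget. The sketch below makes several assumptions: the column name FrameTime, its unit (milliseconds), and the shape of the sample rows are illustrative and should be checked against the header of your own CSV file. The helper names load_column and summarize are hypothetical.

```python
import csv
import io
import statistics


def load_column(csv_text, column):
    """Return the numeric values of one column, skipping non-numeric rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    idx = header.index(column)
    values = []
    for row in reader:
        if idx >= len(row):
            continue
        try:
            values.append(float(row[idx]))
        except ValueError:
            continue  # e.g. metadata or repeated header rows
    return values


def summarize(frame_ms, budget_ms):
    """Return basic statistics and the indices of frames over budget."""
    ordered = sorted(frame_ms)
    p99_index = min(len(ordered) - 1, int(0.99 * len(ordered)))
    return {
        "frames": len(frame_ms),
        "mean_ms": statistics.fmean(frame_ms),
        "p99_ms": ordered[p99_index],
        "over_budget": [i for i, ms in enumerate(frame_ms) if ms > budget_ms],
    }


if __name__ == "__main__":
    # Assumed sample data; a real capture has many more columns and rows.
    sample = "EVENTS,FrameTime,GPUTime\n,16.0,10.0\n,40.0,12.0\n,16.0,9.0\n"
    frame_times = load_column(sample, "FrameTime")
    print(summarize(frame_times, budget_ms=1000.0 / 60.0))
```

For a real capture, read the file from the Saved/Profiling/CSV directory and pass its text to load_column.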
From Console
It is also possible to run some of these commands from the console:
csvprofile start // Starts recording
csvprofile frames=N // Capture N frames of CSV data then stop profiling.
csvprofile stop // Ends recording of a stat csv profile
r.gpuCsvStatsEnabled 1 // Adds STAT GPU output to CSV file this adds some overhead to stat collection and is GPU time query based.
r.Vsync // turn on/off vsync
t.maxfps // set limit for max fps, 0 set unlimited
Unreal Insights
Unreal Insights is an extremely powerful profiling tool that can help you optimize your game. It is continuously developed and improved, and in most cases it will be your first go-to tool. It uses the Trace framework to collect timing metrics from your game. It can be found in the Engine installation folder in Engine/Binaries/Win64/UnrealInsights.exe. By default, it automatically attaches to the running game; you can observe this in the Unreal Insights Frontend as a red "Live" message in the Status column. To avoid that, run your game with the -traceautostart=0 parameter or uncheck the Auto-connect checkbox in the Unreal Insights Frontend. After that, you will be able to trigger data collection manually from the console by using Trace.Start, Trace.Pause, Trace.Resume, Trace.Stop, or Trace.Send localhost <channel>. These console commands are helpful if you want to reduce noise in your capture.
Other useful options are Trace.Screenshot <name> and Trace.Bookmark <name>. They can be called from the console, or added in source code to insert a bookmark and a screenshot into the trace automatically. You can then open the trace and filter the log by the bookmark's name to see exactly where in the trace it was emitted, and use the screenshot to see which frame it was. This is particularly helpful for marking the point in the trace where you switched between settings, so that you can compare frames before and after the change.
Unreal Insights also offers a flexible way of selecting the trace channels that are relevant to your profiling purpose. You can include channels that matter more for CPU profiling or ones that matter more for GPU profiling, and leave out the rest to remove noise and focus on what really matters in the current profiling session. To make this more convenient, we have prepared a common set of commands used for profiling CPU and GPU.
Shortcuts / Batch Scripts to run App
Default
-trace=default is equal to -trace=cpu,gpu,frame,log,bookmark
CPU and Loading Profiling
start Game-Win64-Test.exe /YourMap/Maps/L_Map -statnamedevents -trace=default,screenshot,file,loadtime
-statnamedevents adds more CPU timing events at the cost of additional CPU overhead
The difference between having statnamedevents disabled and enabled is presented here
For more advanced analysis, such as seeing which thread is used for a specific workload and tracing Task Graph events, you can add additional trace channels
start Game-Win64-Test.exe /YourMap/Maps/L_Map -statnamedevents -trace=default,file,loadtime,ContextSwitch,Task
*ContextSwitch requires the game to be run as administrator
GPU
For GPU profiling, it is important to reduce CPU noise. You can do that by passing command-line arguments. It is not that simple on integrated platforms, but we will cover that later. For more information please refer to Command-Line Arguments
start Game-Win64-Test.exe /YourMap/Maps/L_Map -trace=default,task -nosound -noverifygc -noailogging -novsync -dpcvars=r.D3D12.DRED=0,r.D3D12.LightweightDRED=0,r.GPUCrashDebugging.Breadcrumbs=0
You can also consider Trace channels: RHICommands, RenderCommands or RDG.
However, when profiling GPUs the best way to understand what is happening in the game frame is to use Unreal Insights together with the built-in tool ProfileGPU or external tools like PIX, RenderDoc and GPA.
For more information please refer to Unreal Insights and Trace Channel Reference
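The channel sets used in the command lines above can be kept as named presets so that each profiling session starts from the same selection. The sketch below is illustrative only: the preset names and the trace_argument helper are hypothetical, while the channel names are the ones used in this section.

```python
# Channel sets taken from the CPU, loading, and GPU command lines above.
# The preset names themselves are made up for this example.
TRACE_PRESETS = {
    "cpu": ["default", "screenshot", "file", "loadtime"],
    "cpu_advanced": ["default", "file", "loadtime", "ContextSwitch", "Task"],
    "gpu": ["default", "task"],
}


def trace_argument(preset, extra=()):
    """Build a -trace=... argument from a preset plus optional channels."""
    channels = list(TRACE_PRESETS[preset])
    for channel in extra:
        if channel not in channels:  # avoid listing a channel twice
            channels.append(channel)
    return "-trace=" + ",".join(channels)


if __name__ == "__main__":
    # GPU preset with the optional render-side channels mentioned above.
    print(trace_argument("gpu", extra=["RHICommands", "RenderCommands", "RDG"]))
```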
ProfileGPU
By default, ProfileGPU is compiled out of test and shipping configurations but as we showed above it can be enabled manually.
ProfileGPU can be run from the Editor with CTRL+SHIFT+, and from the console with the ProfileGPU command. When run from the Editor, it automatically opens the GPU Visualizer; when run from the console, it writes the results to the log.
It is helpful for quickly seeing what happens during the frame and which passes take too much time. Unfortunately, the GPU Visualizer cannot display the data written to the log, but you can use any text comparison tool to compare ProfileGPU output captured with two different settings.
Profiling the next GPU frame
LogD3D12RHI:
LogD3D12RHI:
LogRHI: Perf marker hierarchy, total GPU time 577.02ms
LogRHI:
LogRHI: 3.9%22.76ms Frame 390 278 draws 88265 prims 137173 verts 370 dispatches
LogRHI: 0.0% 0.00ms WorldTick
LogRHI: 0.0% 0.00ms SendAllEndOfFrameUpdates
LogRHI: 0.0% 0.00ms RayTracingGeometry
LogRHI: 0.0% 0.00ms UpdateLumenSceneBuffers
LogRHI: 3.8%21.97ms FRDGBuilder::Execute 244 draws 87137 prims 135413 verts 370 dispatches
LogRHI: 0.0% 0.10ms ClearGPUMessageBuffer 1 dispatch 1 groups
LogRHI: 0.0% 0.00ms ShaderPrint::UploadParameters 1 dispatch 1 groups
LogRHI: 0.0% 0.10ms UpdateDistanceFieldAtlas 1 draw 1 prims 0 verts 2 dispatches
LogRHI: 0.0% 0.00ms ClearBuffer(%s Size=%ubytes) 1 draw 1 prims 0 verts
LogRHI: 0.0% 0.01ms ComputeWantedMips 1 dispatch 34 groups
LogRHI: 0.0% 0.00ms GenerateStreamingRequests 1 dispatch 3 groups
LogRHI: 0.0% 0.00ms DistanceFieldAssetReadback
LogRHI: 3.7%21.23ms Scene 242 draws 87135 prims 135413 verts 366 dispatches
LogRHI: 0.0% 0.00ms AccessModePass[Graphics] (Textures: %d, Buffers: %d)
LogRHI: 0.0% 0.00ms FXSystemPreRender
LogRHI: 0.0% 0.00ms GPUParticles_PreRender
LogRHI: 0.0% 0.05ms GPUSceneUpdate 1 draw 1 prims 0 verts 2 dispatches
LogRHI: 0.0% 0.05ms BuildRenderingCommandsDeferred(Culling=%s) 1 draw 1 prims 0 verts 2 dispatches
LogRHI: 0.0% 0.02ms ClearBuffer(%s Size=%ubytes) 1 draw 1 prims 0 verts
LogRHI: 0.0% 0.01ms ClearIndirectArgInstanceCount 1 dispatch 1 groups
LogRHI: 0.0% 0.01ms CullInstances(%s). Bin %d 1 dispatch 2 groups
LogRHI: 0.0% 0.00ms CullInstances(%s). Bin %d
LogRHI: 0.0% 0.00ms Instance Compaction Phase 1
LogRHI: 0.0% 0.00ms Instance Compaction Phase 2
LogRHI: ...
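Instead of a generic text comparison tool, two ProfileGPU dumps can also be compared with a short script. The following is a minimal sketch that assumes the log lines look like the excerpt above (a "LogRHI:" prefix, a percentage, a time in ms, a marker name, and optional draw or dispatch counters). The regular expressions are inferred from that excerpt rather than from a documented format, and the helper names parse_profilegpu and diff_timings are hypothetical.

```python
import re

# Matches e.g. "LogRHI: 3.7%21.23ms Scene 242 draws ..." (format assumed
# from the log excerpt above).
LINE_RE = re.compile(r"LogRHI:\s*(\d+(?:\.\d+)?)%\s*(\d+(?:\.\d+)?)ms\s+(.*)")
# Trailing counters such as "278 draws 88265 prims" or "1 dispatch 1 groups".
STATS_RE = re.compile(r"\s+\d+ (?:draws?|dispatch(?:es)?)\b.*$")


def parse_profilegpu(log_text):
    """Map each perf-marker name to its summed GPU time in milliseconds."""
    timings = {}
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # header lines and empty LogRHI lines are skipped
        name = STATS_RE.sub("", match.group(3)).strip()
        # Markers that appear more than once (e.g. ClearBuffer) are summed.
        timings[name] = timings.get(name, 0.0) + float(match.group(2))
    return timings


def diff_timings(before, after):
    """Return (name, delta_ms) pairs sorted by largest absolute change."""
    names = set(before) | set(after)
    deltas = [(n, after.get(n, 0.0) - before.get(n, 0.0)) for n in names]
    return sorted(deltas, key=lambda item: abs(item[1]), reverse=True)
```

Run parse_profilegpu on the log text from each setting and pass both results to diff_timings; the passes whose cost changed the most appear first. Note that the hierarchy is flattened, so parent markers such as Scene include the time of their children.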
Intel® Graphics Performance Analyzers (GPA)
Intel® Graphics Performance Analyzers (Intel® GPA) is an application suite that contains tools for analysis and optimization of graphics-intensive applications, including but not limited to Microsoft DirectX*, Vulkan*, and OpenGL* games. With Intel® GPA, you can test your applications on various platforms to detect bottlenecks and find the best optimization for each platform. You can also enable/disable certain features (such as detailed terrain features or additional interactive game elements) until you achieve the optimal game-playing experience.
For more information please refer to User Guide and Intel Graphics Performance Analyzers
Graphics Performance Analyzer Plugin
With the Unreal Engine* (UE) plugin you can capture a multi-frame stream directly while you are in Unreal Editor and analyze it in Graphics Frame Analyzer.
Intel® VTune™
VTune can help you with really advanced profiling of CPU and GPU. It can help you with tracking down hotspots, and issues related to: algorithm complexity, threading, microarchitecture, memory access, GPU offload, and more.
For more information please refer to User Guide and Intel V-Tune Profiler Training
When starting out, we recommend capturing a Performance Snapshot first and then proceeding with more focused analysis types.
Before capturing a VTune trace, we recommend the following:
- Turn off Real-Time Protection
- Disable Core Isolation
To add more detailed information to your trace, use the -VTune command-line argument.
start Game-Win64-Test.exe -VTune
PIX
PIX is a profiling and debugging tool maintained by Microsoft. You can use it for single-frame analysis as well as for multiple frames (up to 10). You can inspect many different metrics such as GPU events and API calls, view shader source code and modify it at runtime, and analyze low-level metrics provided by the PIX plugins of each GPU vendor. PIX is helpful not only for GPU analysis but also for CPU analysis. If you want to know more about PIX, please refer to PIX on Windows Overview
In the past, it was required to set r.RHISetGPUCaptureOptions=1, which enabled EnableIdealGPUCaptureOptions. Now, after installation, the only thing you have to do is add -AttachPIX to the command line when launching your project/game and then attach with PIX, or launch the game from the PIX application directly by passing the path to your game binary. You can also attach PIX directly to your editor and capture the frame from the Editor Viewport.
start Game-Win64-Test.exe -AttachPIX
What EnableIdealGPUCaptureOptions does is:
- Toggling Draw Events and Material Draw Events to include PIX markers in PIX frame capture
- Enabling full pass names in RenderDependencyGraph
For additional debugging, consider using:
- r.RHICmdBypass and r.RHIThread.Enable (-onethread -forcerhibypass) - helps narrow down whether an issue is threading or timing related
- r.RDG.Debug=1 - helps check whether some render passes have not been set up properly
- r.RDG.ImmediateMode=1 - forces RDG to execute passes right after creation
Intel® Arc™ GPUs & PIX
In our ongoing efforts to enhance driver functionality and performance, we have integrated AppDiscovery using Intel Extensions into Unreal Engine* 5 (available since UE5.2). This integration allows us to pass additional information to our GPUs to detect your game name, engine version, activate necessary features, and improve performance where possible.
As part of this process, it is necessary to create a special device context at game startup. However, PIX captures currently do not support it, resulting in captures that differ from runtime observations. We are collaborating closely with Microsoft to address this issue and provide support in the future. Please stay tuned for updates.
As our compiler and driver teams are continually working to optimize performance, you may sometimes need to analyze the same PIX capture multiple times to observe all optimizations being applied in PIX. For instance, our compiler might determine in subsequent runs that a shader can be optimized by adjusting the GRF size to reduce spilling and enhance performance. To make this more visible and to improve the PIX experience on our platforms, we have added more detailed information about shader execution in PIX. This feature is invaluable for debugging shaders, as it provides insights into the used SIMD Width (Wavelength), whether the shader caused any spills, and the size of the GRF used.
Memory Profiling
Memory profiling is another area that needs attention during game development. For memory profiling, we recommend using the Development configuration since LowLevelMemoryTrackers are already enabled there. If you want, you can also enable them in the Test configuration; however, they may heavily impact synthbenchmark results.
To make it work in the Test configuration, you have to add a GlobalDefinition in the Target.build.cs file and run your application with the -llm argument.
For more information please refer to LowLevelMemoryTracker
Memory Insights
Memory Insights is another feature of Unreal Insights. It allows you to track short-term and long-term memory allocations and deallocations, and even to find memory leaks in your game.
start Game-Win64-Test.exe /YourMap/Maps/L_Map -trace=default,memory,metadata,assetmetadata -llm
It shows a timeline of in-game allocations, so when you observe a change in memory usage you can see the state of memory at that point and what was allocated or deallocated.
Using the Investigation tab you can run a Query to collect data based on the chosen Rule. The collected data will be shown in another window or tab.
However, this view is hard to understand as it is not grouping any information by default. For better visualization, you can use Hierarchy groups and advanced filters to make it easier to read.
For more information please refer to Memory Insights
MemReport
MemReport is like a wrapper for other console flags. It saves memreport to /Saved/Profiling/MemReports/*.memreport. What is being called by MemReport is defined in Engine/Config/BaseEngine.ini in [MemReportCommands] and [MemReportFullCommands] - you can observe there the complete list of called commands. You can also use other commands separately when it is needed to limit your log size. MemReport can show you where and how your memory is used and help you with analyzing its usage.
Example of Usage
MemReport // -full for detailed mem report, -log to save to log
As [MemReportFullCommands] uses many commands, it can be overwhelming at first sight, so let's explain some of the commands it uses:
obj list -resourcesizesort - list all UObjects classes in game sorted by Resource Exclusive Size (KB). In log columns you can observe:
- Count - instances of UObject
- NumKB - the amount of memory used by the UObject
- ResExcKB - the amount of memory used by resources owned by this UObject
- ResExcDedSysKB - the amount of memory in system memory
- ResExcDedVidKB - the amount of memory in dedicated video memory
obj list class=SkeletalMesh -resourcesizesort - allows you to list all UObjects by class name
rhi.DumpResourceMemory - list all RHIResources and provide information about its Type, Size and Flags.
rhi.dumpresourcememory summary name=Lumen - allows you to filter and summarize rhi.DumpResourceMemory so that it contains only results with Lumen in the name.
MemReport: Begin command "rhi.dumpresourcememory summary name=Lumen"
Shown 98 entries with name Lumen. Size: 726.94/4342.22 MB (16.74% of total non-transient)
MemReport: End command "rhi.dumpresourcememory summary name=Lumen"
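When you collect memreports regularly, the summary line can be extracted with a script and tracked over time. The sketch below assumes the line has exactly the shape shown in the excerpt above; the regular expression is inferred from that single example, and parse_rhi_summary is a hypothetical helper name.

```python
import re

# Pattern inferred from the example summary line above (an assumption,
# not a documented format).
SUMMARY_RE = re.compile(
    r"Shown (\d+) entries with name (\S+)\. "
    r"Size: ([\d.]+)/([\d.]+) MB \(([\d.]+)% of total non-transient\)"
)


def parse_rhi_summary(text):
    """Extract the entry count, filter name, and sizes from a summary line."""
    match = SUMMARY_RE.search(text)
    if match is None:
        return None
    count, name, size_mb, total_mb, percent = match.groups()
    return {
        "entries": int(count),
        "name": name,
        "size_mb": float(size_mb),
        "total_mb": float(total_mb),
        "reported_percent": float(percent),
    }


if __name__ == "__main__":
    line = ("Shown 98 entries with name Lumen. "
            "Size: 726.94/4342.22 MB (16.74% of total non-transient)")
    summary = parse_rhi_summary(line)
    # Recompute the share from the two sizes as a sanity check.
    share = 100.0 * summary["size_mb"] / summary["total_mb"]
    print(summary["name"], summary["entries"], round(share, 2))
```

Recomputing the share from the two sizes is a cheap check that the parsed numbers were read from the right fields.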
Those were only a few commands used by MemReport but there are also other useful ones that can help you with tracking down big textures, listing spawned actors, detecting really big buffers in RHI Resources, etc.
Render Resource Viewer
The Render Resource Viewer Tool allows you to verify your current VRAM usage by different render resources. In the upper left corner of the Render Resource View tab, you can observe the Total Resource Count and its total Size. Above it, you can choose Filter flags to exclude or include resources, and you can also filter by keyword to search for a Resource Name.
The Refresh Button allows you to collect snapshots in time from your scene - it can be helpful when you modify your scene for example when adding new assets, lights, etc., or when modifying presets / config flags.
What Render Resource Viewer can do for you is tell you what is used, where it is, and what size it is. It can also tell you which buffers are worth investigating and who owns them. You can also easily Browse to Asset when you find something interesting in the log.
For more information please refer to Render Resource Viewer
Summary
That concludes the first chapter of the Unreal Engine* Optimization Guide. At this point, you should have basic knowledge of Profiling Metrics, Frame Budget, and Frame Pacing, and be able to determine whether your app is CPU-Bound or GPU-Bound and which tool to use for further analysis. In the next chapters, we will cover more in-depth examples related to Scalability Settings and Optimization in Unreal Engine* 5.
References
Hardware Profile Guided Optimization in UE5
Intel® Graphics Performance Analyzers: User Guide
Intel® Graphics Performance Analyzers: Unreal Engine* Plugin
Intel® VTune™: User Guide
Resizable BAR
Unreal Engine* 4 Optimization Tutorial
Unreal Engine* 5 Documentation
Ari Arnbjörnsson's Resources
Matt Oztalay's Resources
Unreal Engine* Game Optimization on a Budget by Tom Looman