Intel VTune Profiler Performance Analysis Cookbook

ID 766316
Date 12/16/2022
Public

Profiling a .NET* Core Application

This recipe uses Intel® VTune™ Profiler to profile the dynamically generated code of a .NET Core application, locate performance hotspots in the managed code, and reduce the application's turnaround time.

Ingredients

This section lists the hardware and software tools used for the performance analysis scenario.

  • Application: a sample C# application that adds all the elements of an integer List. The application is used as a demo and is not available for download.

  • Tools:

    • Intel® VTune™ Profiler 2018

      NOTE:
      • Starting with the 2020 release, Intel® VTune™ Amplifier has been renamed to Intel® VTune™ Profiler.

      • Most recipes in the Intel® VTune™ Profiler Performance Analysis Cookbook are flexible. You can apply them to different versions of Intel® VTune™ Profiler. In some cases, minor adjustments may be required.

      • You can get the latest version of Intel® VTune™ Profiler from the Intel website.

    • .NET Core 2.0 SDK

  • Operating system: Microsoft* Windows* 10

  • CPU: Intel microarchitecture code name Skylake

Prepare Your Application for Analysis

  1. Open a new command window so that the .NET environment variables take effect, and verify that .NET Core 2.0 is installed:

    dotnet --version
  2. Create a new listadd directory for the application:

    mkdir C:\listadd
    cd C:\listadd
  3. Enter dotnet new console to create a new skeleton project. The project includes a Program.cs source file and a listadd.csproj project file.

  4. Replace the contents of Program.cs in the listadd folder with C# code that adds the elements of an integer List:

    using System;
    using System.Linq;
    using System.Collections.Generic;

    namespace listadd
    {
        class Program
        {
            static void Main(string[] args)
            {
                Console.WriteLine("Starting calculation...");
                List<int> numbers = Enumerable.Range(1, 10000).ToList();
                for (int i = 0; i < 100000; i++)
                {
                    ListAdd(numbers);
                }
                Console.WriteLine("Calculation complete");
            }

            static int ListAdd(List<int> candidateList)
            {
                int result = 0;
                foreach (int item in candidateList)
                {
                    result += item;
                }
                return result;
            }
        }
    }
  5. Build the application in the Release configuration. This creates listadd.dll in the C:\listadd\bin\Release\netcoreapp2.0 folder:

    dotnet build -c Release
  6. Run the sample application:

    dotnet C:\listadd\bin\Release\netcoreapp2.0\listadd.dll

Run Advanced Hotspots Analysis

  1. Launch VTune Profiler with administrator privileges.

  2. Click the New Project button on the toolbar and specify a name for the new project, for example: dotnet.

  3. In the Analysis Target window, select local host and Launch Application target type from the left pane.

  4. On the Launch Application pane, specify the application to analyze:

    • Application: C:\Program Files\dotnet\dotnet.exe

    • Application parameters: C:\listadd\bin\Release\netcoreapp2.0\listadd.dll

    NOTE:

    The location of dotnet.exe depends on your environment and can be identified with the command: where dotnet.

  5. Click the Choose Analysis button on the right and select the Advanced Hotspots analysis from the left pane.

    NOTE:

    Advanced Hotspots analysis was integrated into the generic Hotspots analysis starting with Intel VTune Amplifier 2019, and is available via the Hardware Event-Based Sampling collection mode.

  6. Click Start to run the analysis.

Identify Hotspots in the Managed Code

When the collected analysis result opens, switch to the Bottom-up tab and set the data grouping level to Process/Module/Function/Thread/Call Stack.

Expand dotnet.exe > listadd.dll to reveal the managed listadd::Program::ListAdd function, which took the most CPU Time.

Double-click this hotspot function to open the source view. To view the source and disassembly code side by side, click the Assembly toggle button on the toolbar.

Use the statistics per source line and per assembly instruction to identify the most time-consuming code (line 24 in the source view of this example) and work on optimizations.

Optimize the Code by Replacing foreach with a for Loop

VTune Profiler highlights the following code line as performance-critical:

foreach (int item in candidateList)
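
To see why this single line can carry so much CPU time, it helps to recall that a foreach over a List<int> is compiled into calls on the list's enumerator. The sketch below is a hand-written approximation of that expansion, not the exact code the compiler or JIT generates; the class name ForeachExpansion and the method name SumWithEnumerator are hypothetical and used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ForeachExpansion
{
    // Approximate equivalent of:
    //     foreach (int item in candidateList) { result += item; }
    public static int SumWithEnumerator(List<int> candidateList)
    {
        int result = 0;

        // List<int>.GetEnumerator() returns a struct enumerator.
        List<int>.Enumerator enumerator = candidateList.GetEnumerator();
        try
        {
            // MoveNext() advances to the next element and also verifies
            // that the list was not modified during enumeration.
            while (enumerator.MoveNext())
            {
                int item = enumerator.Current;
                result += item;
            }
        }
        finally
        {
            enumerator.Dispose();
        }

        return result;
    }

    static void Main()
    {
        List<int> numbers = Enumerable.Range(1, 10000).ToList();
        Console.WriteLine(SumWithEnumerator(numbers)); // prints 50005000
    }
}
```

Every iteration goes through MoveNext() and Current, so the cost attributed to the foreach line includes this enumerator work in addition to the addition itself.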

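For optimization, consider replacing the foreach statement with an indexed for loop, which accesses the list elements directly instead of going through the List<int> enumerator. Replace the contents of Program.cs with this C# code: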

using System;
using System.Linq;
using System.Collections.Generic;

namespace listadd
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Starting calculation...");
            List<int> numbers = Enumerable.Range(1, 10000).ToList();
            for (int i = 0; i < 100000; i++)
            {
                ListAdd(numbers);
            }
            Console.WriteLine("Calculation complete");
        }

        static int ListAdd(List<int> candidateList)
        {
            int result = 0;
            for (int i = 0; i < candidateList.Count; i++)
            {
                result += candidateList[i];
            }
            return result;
        }
    }
}
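
Before re-profiling, a quick sanity check can confirm that both loop forms return the same sum and give a rough idea of the difference. The following is a minimal sketch that uses System.Diagnostics.Stopwatch; the class name LoopCheck and the helper names SumForeach, SumFor, and Time are hypothetical. Wall-clock timings from such a check are only indicative and depend on the runtime version and hardware, so the VTune Profiler result remains the reference measurement.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

static class LoopCheck
{
    // Original version: iterates through the List<int> enumerator.
    public static int SumForeach(List<int> list)
    {
        int result = 0;
        foreach (int item in list) { result += item; }
        return result;
    }

    // Updated version: indexes the list directly.
    public static int SumFor(List<int> list)
    {
        int result = 0;
        for (int i = 0; i < list.Count; i++) { result += list[i]; }
        return result;
    }

    // Calls sum(list) the given number of times and returns elapsed milliseconds.
    static long Time(Func<List<int>, int> sum, List<int> list, int iterations)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) { sum(list); }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    static void Main()
    {
        List<int> numbers = Enumerable.Range(1, 10000).ToList();

        // Both variants must produce the same result.
        if (SumForeach(numbers) != SumFor(numbers))
        {
            throw new InvalidOperationException("Sums differ");
        }

        // Warm up so that JIT compilation is not included in the timings.
        Time(SumForeach, numbers, 1000);
        Time(SumFor, numbers, 1000);

        Console.WriteLine("foreach: {0} ms", Time(SumForeach, numbers, 100000));
        Console.WriteLine("for:     {0} ms", Time(SumFor, numbers, 100000));
    }
}
```

The iteration count of 100000 and the list size of 10000 match the sample application, so the relative difference should be comparable to what the profiler reports, although the absolute numbers will differ.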

Verify the Optimization

To verify the optimization for the updated code, re-run the Advanced Hotspots analysis.

Before the optimization, the sample application took 2.636 seconds of CPU time.

After the optimization, the application ran for 0.945 seconds, a 64% reduction compared to the original.
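
The 64% figure follows directly from the two measurements reported above. As a worked check, with the equivalent speedup factor added for reference:

```latex
\[
\text{reduction} = \frac{2.636\,\text{s} - 0.945\,\text{s}}{2.636\,\text{s}}
                 = \frac{1.691}{2.636} \approx 0.64 = 64\%,
\qquad
\text{speedup} = \frac{2.636\,\text{s}}{0.945\,\text{s}} \approx 2.8\times
\]
```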

NOTE:

To discuss this recipe, visit the developer forum