
08. Metal System Trace

Though GPU frame capture in Xcode provides some performance details, its timings are usually not very precise: values can change from frame to frame due to thermal conditions, other GPU load, and so on. So I recommend using Instruments for more reliable measurements of your Metal app’s performance.

Instruments

In Instruments, select the Metal System Trace toolset/template, which contains everything you need for profiling a Metal app. After this, you get capture settings where you can configure the tools and their parameters.

Recording

  1. Capture button. Click it when you’re ready to start.
  2. List of capture runs. Sometimes it’s helpful to have several runs because of different conditions, code changes, or captured functionality.
  3. List of used instruments. Often you don’t need everything from a selected toolset, so you can remove or add some of them. Keep in mind that some tools aren’t compatible with each other.
  4. Target settings:
    • Device. You can profile on your Mac, an iPhone, a simulator, etc. Never profile on a simulator if you want real numbers.
    • Target
      • Launch. Launches a selected app with options.
      • Attach to. Attaches to an existing process.
      • All processes. Captures system-wide activity, useful for correlating your app with other processes.
  5. Recorder settings:
    • Immediate. Shows recorded data during recording, so you can see how it looks in real time. It needs more resources.
    • Deferred. Records first, then runs analysis and shows results.
    • Capture last X seconds. Keeps only the last X seconds of your run. Very convenient, because you don’t have to keep everything that happens on the way to the Metal work you’re interested in.
  6. Metal Application:
    • Metal Device. Selects which GPU device to profile.
    • Counter Set. Selects which counters to capture (for example, performance limiters/utilization).
    • Performance State. Allows forcing a specific performance state for more repeatable measurements.
    • Enable Shader Timeline. Shows not only encoder time, but also shader time. Helpful to understand if your shader is slow or if data management/synchronization is the bottleneck.
  7. Time Profiler. Settings for CPU profiling; it’s out of scope for this episode, but sometimes it’s helpful because CPU behavior can affect GPU workload.
  8. Hangs. In most cases you don’t need it specifically for Metal profiling, so you can remove it.

After you start recording (depending on recorder settings), do whatever you want to profile and stop recording. After some time (sometimes significant), Instruments shows you the results.
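To find the interesting part of a recording faster, it can help to mark it in code with signposts. Below is a minimal sketch, assuming you add the os_signpost (or Points of Interest) instrument to the run, since it may not be part of the selected toolset by default. The subsystem string and the `profiledSection` helper are hypothetical names made up for this example.

```swift
import os

// Hypothetical subsystem name; replace it with your own bundle identifier.
let pointsOfInterest = OSLog(subsystem: "com.example.MetalApp",
                             category: .pointsOfInterest)

/// Wraps a piece of work in a begin/end signpost interval,
/// so the interval can be located on the Instruments timeline.
func profiledSection<T>(_ name: StaticString, _ body: () throws -> T) rethrows -> T {
    let id = OSSignpostID(log: pointsOfInterest)
    os_signpost(.begin, log: pointsOfInterest, name: name, signpostID: id)
    // `defer` closes the interval even if `body` throws.
    defer { os_signpost(.end, log: pointsOfInterest, name: name, signpostID: id) }
    return try body()
}
```

You could then wrap the code that encodes and commits your Metal work, for example `profiledSection("Blur pass") { encodeBlur() }`, where `encodeBlur` stands for your own function.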

Tools Window Overview

  1. Filtering/grouping. You can select only what you need and not get distracted by other tools.
  2. Timeline per tool. The main panel for visual representation of what’s happening in your app.
  3. Details. Here you can see real numbers. It’s different for each tool, and can have different modes. Helpful to analyze summaries of performance limiters, etc.
  4. Additional details. Usually not used for most GPU tools.

To the left of a tool’s name you can find a pin button that moves the lane to a separate section, so you can collect only the lanes that are really important.

GPU Overview

In many Metal profiling cases, the most useful lanes are in the Metal Device/Metal Application area. There you can find decomposition of every frame (shown in different colors):

  • Command buffer scheduling and execution.
  • Completion handlers and drawable presents (if you use them).
  • Encoders and GPU stage activity per type.
  • Performance limiters.

These tools don’t show performance distribution per shader line, but they do show the entire Metal pipeline. So you can see how well you load your GPU, where gaps are, and whether you can fill them with useful processing. For example, if you use too many completion waits, you can block the next command buffer from starting, so GPU load will drop and overall performance will degrade.
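The completion-wait problem can be sketched in code. This is a minimal example, not the only way to do it: instead of calling `waitUntilCompleted()` after every commit, it limits the number of command buffers in flight with a semaphore, so the CPU can encode the next buffer while the GPU is still busy. The `FrameScheduler` class and the default of 3 buffers in flight are assumptions made for illustration.

```swift
import Dispatch
import Metal

/// Hypothetical helper: submits command buffers without blocking on each one.
final class FrameScheduler {
    private let queue: MTLCommandQueue
    private let inFlight: DispatchSemaphore

    init?(device: MTLDevice, maxFramesInFlight: Int = 3) {
        guard let queue = device.makeCommandQueue() else { return nil }
        self.queue = queue
        self.inFlight = DispatchSemaphore(value: maxFramesInFlight)
    }

    func submit(encode: (MTLCommandBuffer) -> Void) {
        // Blocks only when `maxFramesInFlight` buffers are still pending.
        inFlight.wait()
        guard let commandBuffer = queue.makeCommandBuffer() else {
            inFlight.signal()
            return
        }
        encode(commandBuffer)

        // Release the slot when the GPU finishes this buffer.
        let semaphore = inFlight
        commandBuffer.addCompletedHandler { _ in semaphore.signal() }
        commandBuffer.commit()
        // No waitUntilCompleted() here: the next buffer can be encoded right away.
    }
}
```

In a trace, the version with a wait after every commit typically shows gaps between command buffers, while this one should keep the GPU lanes more densely filled.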

If you record with performance limiter counters and the shader timeline enabled, you get much more interesting details. Don’t worry if shader time is much less than the time of the encoder that calls the shader: an encoder is not only a shader call. It’s a more complex process that also includes data movement, bindings, synchronization, etc., and shader execution is just one part of it. If the gap is large, that’s a good point to start checking performance limiters to understand why you spend more time moving data than processing it.
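To make it easier to tell which command buffer and encoder you are looking at, you can give them labels in code. The sketch below also shows that a single encoder contains bindings as well as the dispatch itself. It is only an illustration: the `scale` kernel, the `runLabeledPass` function, and all label strings are hypothetical, and I assume the labels are picked up by the profiling tools.

```swift
import Metal

// Hypothetical kernel, compiled from source only to keep the sketch self-contained.
let scaleKernelSource = """
#include <metal_stdlib>
using namespace metal;

kernel void scale(device float *data   [[buffer(0)]],
                  constant uint &count [[buffer(1)]],
                  uint id              [[thread_position_in_grid]]) {
    if (id < count) { data[id] *= 2.0f; }
}
"""

struct MetalSetupError: Error {}

func runLabeledPass(device: MTLDevice, values: [Float]) throws -> [Float] {
    guard !values.isEmpty else { return [] }

    let library = try device.makeLibrary(source: scaleKernelSource, options: nil)
    guard let function = library.makeFunction(name: "scale") else { throw MetalSetupError() }
    let pipeline = try device.makeComputePipelineState(function: function)

    let byteCount = values.count * MemoryLayout<Float>.stride
    guard let queue = device.makeCommandQueue(),
          let buffer = device.makeBuffer(bytes: values, length: byteCount,
                                         options: .storageModeShared),
          let commandBuffer = queue.makeCommandBuffer(),
          let encoder = commandBuffer.makeComputeCommandEncoder()
    else { throw MetalSetupError() }

    // Labels identify this work in debugging and profiling tools.
    commandBuffer.label = "Scale command buffer"
    encoder.label = "Scale encoder"

    // Part 1 of the encoder: pipeline state and resource bindings.
    encoder.pushDebugGroup("Bindings")
    encoder.setComputePipelineState(pipeline)
    encoder.setBuffer(buffer, offset: 0, index: 0)
    var count = UInt32(values.count)
    encoder.setBytes(&count, length: MemoryLayout<UInt32>.size, index: 1)
    encoder.popDebugGroup()

    // Part 2 of the encoder: the shader dispatch itself.
    encoder.pushDebugGroup("Dispatch")
    let width = min(pipeline.maxTotalThreadsPerThreadgroup, 256)
    let groups = (values.count + width - 1) / width
    encoder.dispatchThreadgroups(MTLSize(width: groups, height: 1, depth: 1),
                                 threadsPerThreadgroup: MTLSize(width: width, height: 1, depth: 1))
    encoder.popDebugGroup()
    encoder.endEncoding()

    commandBuffer.commit()
    // Blocking is acceptable only because this small sketch reads the result back immediately.
    commandBuffer.waitUntilCompleted()

    let pointer = buffer.contents().bindMemory(to: Float.self, capacity: values.count)
    return Array(UnsafeBufferPointer(start: pointer, count: values.count))
}
```

For example, `try runLabeledPass(device: device, values: [1, 2, 3])` doubles every element of the input array on the GPU.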

Conclusion

  • If you need to profile the entire Metal app, not only a particular shader, use the Metal System Trace toolset/template in Xcode Instruments.
  • Use “Capture last X seconds” for convenience to avoid capturing useless data.
  • In most cases, focus first on Metal Device/Metal Application lanes.
  • Don’t hesitate to capture performance limiters and shader timeline; they are very informative.
