Lecture 10: Performance Optimization

Date: 2023-09-26

1. Assessing Performance

Understanding your code's performance landscape is the first step in optimization. Various aspects need to be assessed to get a comprehensive view.

Four Levels of Consideration

Timing Constraints: Determine if the operation has any deadlines or time limits it needs to meet.
Time Observability: Measure how long the operation actually takes. This sets the baseline for any optimization.
Behavior Observability: Dig into what's causing the operation to take the time it does. This can involve profiling, looking into algorithmic complexity, or checking for bottlenecks.
Actionable Insights: Finally, outline what can be done to improve the time performance. This could range from changing algorithms to parallelizing tasks.

Real-Time Requirements

Hard vs. Soft Real-Time

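Hard Real-Time: A missed deadline counts as a system failure. These requirements are typical of safety-critical systems such as avionics.
Soft Real-Time: Deadlines should be met, but an occasional miss degrades quality rather than causing a failure. For example, video buffering in a streaming service, where a late frame causes a stutter but playback continues.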
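One way to make the soft real-time idea concrete is to count deadline misses instead of treating any single miss as fatal. The following is a minimal sketch; the `DeadlineMonitor` class and its member names are hypothetical and only illustrate the bookkeeping.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: tracks how often an operation exceeds its time budget.
class DeadlineMonitor {
public:
    explicit DeadlineMonitor(std::chrono::microseconds budget) : budget_(budget) {}

    // Record one observed duration; returns true if the deadline was met.
    bool record(std::chrono::microseconds elapsed) {
        ++total_;
        const bool met = elapsed <= budget_;
        if (!met) ++missed_;
        return met;
    }

    std::size_t total() const { return total_; }
    std::size_t missed() const { return missed_; }

    // Fraction of recorded operations that missed the deadline (0.0 if none recorded).
    double miss_rate() const {
        return total_ == 0
                   ? 0.0
                   : static_cast<double>(missed_) / static_cast<double>(total_);
    }

private:
    std::chrono::microseconds budget_;
    std::size_t total_ = 0;
    std::size_t missed_ = 0;
};
```

A soft real-time system might tolerate a small miss rate, while a hard real-time system would treat the first `false` returned by `record` as a failure.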

2. Observing Performance

Once you've assessed your performance needs, the next step is to actually observe how your code behaves. This is essential for pinpointing bottlenecks and areas for improvement.

Using `std::chrono`

The C++ Standard Library provides std::chrono for timing. For measuring elapsed time, prefer std::chrono::steady_clock: it is monotonic, whereas std::chrono::high_resolution_clock may be an alias for the adjustable system clock on some implementations.

#include <chrono>
#include <iostream>

int main() {
    // steady_clock never jumps backwards, so stop - start is a valid interval.
    auto start = std::chrono::steady_clock::now();

    // Your code here

    auto stop = std::chrono::steady_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);

    std::cout << "Time taken: " << duration.count() << " microseconds\n";
}
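A single measurement is often noisy (cold caches, scheduler interference), so it can help to repeat the operation and keep the fastest run. The sketch below assumes a hypothetical helper named `best_of`; it is one possible way to wrap the timing pattern, not a standard facility.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs `fn` `repeats` times and returns the fastest run.
// Taking the minimum filters out runs that were disturbed by other activity.
template <typename Fn>
std::chrono::nanoseconds best_of(int repeats, Fn&& fn) {
    assert(repeats > 0);
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    auto best = std::chrono::nanoseconds::max();
    for (int i = 0; i < repeats; ++i) {
        const auto start = clock::now();
        fn();
        const auto elapsed =
            std::chrono::duration_cast<std::chrono::nanoseconds>(clock::now() - start);
        if (elapsed < best) best = elapsed;
    }
    return best;
}
```

For example, `best_of(10, [&] { do_work(); })` would time a hypothetical `do_work()` ten times and report the shortest duration.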

Event Tracing

Sometimes, you'll want to understand how multiple parts of your application interact over time. Event tracing tools can help you visualize this.
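To give a feel for what a trace contains, here is a minimal sketch of recording named, timestamped spans in process. The `TraceLog`, `TraceEvent`, and `Span` names are hypothetical; real tracing tools add thread IDs, low-overhead buffers, and a viewer on top of this basic idea.

```cpp
#include <chrono>
#include <string>
#include <utility>
#include <vector>

// One completed span: what ran, when it started, when it finished.
struct TraceEvent {
    std::string name;
    std::chrono::steady_clock::time_point begin;
    std::chrono::steady_clock::time_point end;
};

// Hypothetical in-process trace log. Not thread-safe: a sketch for one thread.
class TraceLog {
public:
    // RAII span: timestamps on construction, records the event on destruction.
    class Span {
    public:
        Span(TraceLog& log, std::string name)
            : log_(log), name_(std::move(name)),
              begin_(std::chrono::steady_clock::now()) {}
        ~Span() {
            log_.events_.push_back(
                {std::move(name_), begin_, std::chrono::steady_clock::now()});
        }
        Span(const Span&) = delete;
        Span& operator=(const Span&) = delete;

    private:
        TraceLog& log_;
        std::string name_;
        std::chrono::steady_clock::time_point begin_;
    };

    // Events appear in the order their spans finished.
    const std::vector<TraceEvent>& events() const { return events_; }

private:
    std::vector<TraceEvent> events_;
};
```

Declaring `TraceLog::Span s(log, "parse");` at the top of a scope records how long that scope took. Nested spans finish before their parents, so the log shows how the parts overlap in time.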

Performance Counters

Modern CPUs offer hardware-level performance counters that count events such as cycles, retired instructions, cache misses, and branch mispredictions. These can be accessed through OS-level APIs or through tools like perf on Linux.

Network Monitoring

If your application is network-bound, tools like Wireshark can help monitor data packets and understand latency issues.

3. Profiling Tools

Profiling is an essential step in performance optimization. It helps you identify bottlenecks and inefficiencies at various granularities, from the entire system down to individual lines of code.

VTune

Intel's VTune profiler is a powerful tool for both CPU and GPU performance analysis. It provides deep insights into how your application utilizes hardware resources. Use it to optimize hotspots, threading, and offloading to graphics processors.

gprof

The GNU profiler, gprof, reports a function-level breakdown of where your application spends its time, so you can see which functions are the most time-consuming. Compile and link with -pg, run the program once to produce gmon.out, then pass the executable and gmon.out to gprof.

g++ -pg your_program.cpp -o your_program
./your_program
gprof your_program gmon.out > analysis.txt

perf

Linux's perf tool is versatile, providing a wide array of system-level metrics. It covers CPU usage, cache misses, and context switches. Good for getting a broad performance picture.

perf record ./your_program
perf report

Valgrind

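Though best known for detecting memory errors and leaks, Valgrind also has a tool called Callgrind that profiles your application by recording its call graph and instruction counts. Because the program runs under Valgrind's instrumentation, it is much slower than with sampling profilers, but the results are very thorough. The output file is named callgrind.out followed by the process ID.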

valgrind --tool=callgrind ./your_program
kcachegrind callgrind.out.xxxx

Very Sleepy

A Windows-specific sampling profiler, Very Sleepy is useful for identifying bottlenecks without much setup. It's not as detailed as other tools, but good for a quick check.

Others

Visual Studio Profiler: For Windows users, the profiler built into Visual Studio is quite effective.
Instruments: macOS users can rely on Instruments, which ships with Xcode, for profiling CPU usage, memory, and more.

4. Improving Performance

Once you've assessed and observed performance using various tools, the next step is making improvements. Below are some proven methods to enhance your C++ application's performance.

Algorithm Optimization

Before getting into micro-optimizations, make sure your algorithms are as efficient as they can be. The time and space complexity should be optimized for the problem at hand.
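As an illustration of why the algorithm comes first, consider checking a list for duplicates. The sketch below contrasts a pairwise comparison with a hash-set approach; the function names are made up for this example.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// O(n^2) time, O(1) extra space: compares every pair of elements.
bool has_duplicate_quadratic(const std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        for (std::size_t j = i + 1; j < v.size(); ++j) {
            if (v[i] == v[j]) return true;
        }
    }
    return false;
}

// O(n) time on average, O(n) extra space: a hash set remembers values seen so far.
bool has_duplicate_hashed(const std::vector<int>& v) {
    std::unordered_set<int> seen;
    seen.reserve(v.size());
    for (int x : v) {
        // insert() reports via .second whether the value was newly inserted.
        if (!seen.insert(x).second) return true;
    }
    return false;
}
```

Both functions return the same answer. The second trades memory for time, which is the kind of trade-off to weigh before reaching for micro-optimizations.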

Loop Unrolling

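Manually expand loop bodies to reduce loop-control operations (the counter increment, the comparison, and the branch). Use this cautiously: it increases code size, and compilers often unroll loops on their own at higher optimization levels, so measure before and after.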

// Before
for (int i = 0; i < 4; ++i) {
    do_something(i);
}

// After
do_something(0);
do_something(1);
do_something(2);
do_something(3);
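When the trip count is not a small constant, a loop is usually unrolled only partially, with a remainder loop for the leftover iterations. The following sketch unrolls a summation by a factor of four; the function name and the choice of factor are assumptions for illustration.

```cpp
#include <cstddef>
#include <vector>

// Sums `data` with the loop body unrolled by a factor of four.
long long sum_unrolled(const std::vector<int>& data) {
    const std::size_t n = data.size();
    // Four independent accumulators avoid one long dependency chain.
    long long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;

    // Main loop: four elements per iteration, one loop-control check.
    for (; i + 4 <= n; i += 4) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }

    long long sum = s0 + s1 + s2 + s3;

    // Remainder loop: handles the last n % 4 elements.
    for (; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}
```

Whether this beats the plain loop depends on the compiler and target, so it is worth confirming with a profiler.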

Function Inlining

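Inlining small functions avoids the overhead of function calls. Note that the inline keyword is only a hint: optimizing compilers decide for themselves which calls to inline, and in C++ the keyword's main effect is to allow the function's definition to appear in multiple translation units (for example, in a header).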

inline int square(int x) {
    return x * x;
}

Memory Access

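Try to make memory access patterns sequential and thereby cache-friendly. The idea is to enhance spatial locality: C++ stores multi-dimensional arrays in row-major order, so the last index should vary fastest in the innermost loop.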

// Cache-inefficient
for (int i = 0; i < cols; ++i) {
    for (int j = 0; j < rows; ++j) {
        array[j][i] = 0;
    }
}

// Cache-efficient
for (int i = 0; i < rows; ++i) {
    for (int j = 0; j < cols; ++j) {
        array[i][j] = 0;
    }
}
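The same idea applies to a matrix stored in one contiguous buffer. The sketch below uses a hypothetical `Matrix` struct and two traversal functions that compute the same sum but touch memory in different orders.

```cpp
#include <cstddef>
#include <vector>

// A rows x cols matrix stored row-major in one contiguous buffer:
// element (r, c) lives at index r * cols + c.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> data;

    Matrix(std::size_t r, std::size_t c, double fill)
        : rows(r), cols(c), data(r * c, fill) {}

    double at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Walks memory sequentially: consecutive iterations touch adjacent addresses.
double sum_row_order(const Matrix& m) {
    double sum = 0.0;
    for (std::size_t r = 0; r < m.rows; ++r)
        for (std::size_t c = 0; c < m.cols; ++c)
            sum += m.at(r, c);
    return sum;
}

// Jumps `cols` elements between consecutive accesses: poor spatial locality.
double sum_column_order(const Matrix& m) {
    double sum = 0.0;
    for (std::size_t c = 0; c < m.cols; ++c)
        for (std::size_t r = 0; r < m.rows; ++r)
            sum += m.at(r, c);
    return sum;
}
```

Timing both functions on a matrix much larger than the CPU caches is a simple way to observe the effect on your own machine.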

Use `constexpr`

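Use constexpr for computations that can be done at compile time, reducing runtime overhead. A constexpr function is only guaranteed to be evaluated at compile time when its result is required in a constant expression, such as the initializer of a constexpr variable; with runtime arguments it behaves like an ordinary function.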

constexpr int factorial(int n) {
    return (n <= 1) ? 1 : (n * factorial(n - 1));
}
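To check that a call really is evaluated at compile time, the result can be used where a constant expression is required. The sketch below repeats the `factorial` function so that it is self-contained; the other names are made up for this example.

```cpp
#include <array>

constexpr int factorial(int n) {
    return (n <= 1) ? 1 : (n * factorial(n - 1));
}

// Checked by the compiler: the program would not compile if this were false.
static_assert(factorial(5) == 120, "factorial(5) should be 120");

// An array size must be a constant expression, so factorial(4) is
// computed during compilation here.
std::array<int, factorial(4)> lookup_table{};

// With an argument known only at run time, the same function
// executes like any ordinary function.
int factorial_at_runtime(int n) {
    return factorial(n);
}
```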

Multithreading

Take advantage of multi-core processors by using the standard parallel algorithms (execution policies, since C++17) or threading facilities such as std::thread.
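As a minimal sketch of splitting work across threads, the function below sums a vector by giving each thread its own contiguous chunk and its own output slot, so no locking is needed. The name `parallel_sum` and the chunking scheme are assumptions for illustration; on some toolchains you may need to link with a threading flag such as -pthread.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums `data` using up to `num_threads` worker threads.
long long parallel_sum(const std::vector<int>& data, unsigned num_threads) {
    num_threads = std::max(1u, num_threads);
    std::vector<long long> partial(num_threads, 0);  // one slot per thread
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    // Chunk size rounded up so every element is covered.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &partial, t, begin, end] {
            // Each thread reads its own range and writes only partial[t].
            partial[t] = std::accumulate(
                data.begin() + static_cast<std::ptrdiff_t>(begin),
                data.begin() + static_cast<std::ptrdiff_t>(end), 0LL);
        });
    }

    for (auto& w : workers) w.join();  // wait for all workers to finish

    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

For small inputs the cost of starting threads outweighs the gain, so this only pays off when each chunk carries enough work.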

Profile-Guided Optimization

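Some compilers can optimize code based on profile data collected during representative test runs. With GCC, for instance, you build an instrumented binary with -fprofile-generate, run it on typical inputs, and then rebuild with -fprofile-use so the optimizer can favor the paths that were actually hot.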

Hardware Intrinsics

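If you know the specific hardware you're targeting, hardware intrinsics can yield a significant performance gain. Intrinsics are compiler-provided functions that map to particular instructions, such as the SSE and AVX SIMD instructions on x86 or NEON on Arm. The cost is portability, so keep a plain fallback path.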
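The sketch below adds two float arrays four elements at a time using SSE intrinsics when the compiler reports SSE support, and falls back to a scalar loop otherwise. The function name `add_arrays` is hypothetical, and the `__SSE__` check is an assumption that holds for GCC and Clang on x86; other compilers simply take the scalar path.

```cpp
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, ...
#endif

// Element-wise addition: out[i] = a[i] + b[i] for i in [0, n).
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE__)
    // Process four floats per iteration in one 128-bit register.
    for (; i + 4 <= n; i += 4) {
        const __m128 va = _mm_loadu_ps(a + i);  // unaligned load of 4 floats
        const __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar loop: handles the tail, or everything when SSE is unavailable.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Compilers can often auto-vectorize a loop this simple, so intrinsics tend to pay off mainly where the compiler cannot find the vector form itself.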

Minimize I/O Operations

I/O is typically far slower than computation, so reducing the number of I/O operations can significantly speed up your application. Use buffering or batch operations whenever possible.
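As a small illustration of batching, the sketch below contrasts writing one line at a time with std::endl, which flushes the stream after every line, against building the output in memory and writing it once. The function names are made up for this example.

```cpp
#include <ostream>
#include <string>
#include <vector>

// Flushing style: std::endl forces a flush after every single line.
void write_lines_flushing(std::ostream& out, const std::vector<int>& values) {
    for (int v : values) {
        out << v << std::endl;
    }
}

// Builds the whole output in memory, one value per line.
std::string format_lines(const std::vector<int>& values) {
    std::string buffer;
    buffer.reserve(values.size() * 12);  // rough guess to limit reallocations
    for (int v : values) {
        buffer += std::to_string(v);
        buffer += '\n';
    }
    return buffer;
}

// Batched style: one write call for the entire buffer, no per-line flush.
void write_lines_batched(std::ostream& out, const std::vector<int>& values) {
    const std::string buffer = format_lines(values);
    out.write(buffer.data(), static_cast<std::streamsize>(buffer.size()));
}
```

Both styles produce the same bytes. The difference shows up when the stream is backed by a file or terminal, where every flush can mean a system call.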