Lecture 10: Performance Optimization
Date: 2023-09-26
1. Assessing Performance
Understanding your code's performance landscape is the first step in optimization. Various aspects need to be assessed to get a comprehensive view.
Four Levels of Consideration
- Timing Constraints: Determine if the operation has any deadlines or time limits it needs to meet.
- Time Observability: Measure how long the operation actually takes. This sets the baseline for any optimization.
- Behavior Observability: Dig into what's causing the operation to take the time it does. This can involve profiling, looking into algorithmic complexity, or checking for bottlenecks.
- Actionable Insights: Finally, outline what can be done to improve the time performance. This could range from changing algorithms to parallelizing tasks.
Real-Time Requirements
Hard vs. Soft Real-Time
- Hard Real-Time: Failing to meet deadlines is a no-go. These are often found in systems like avionics.
- Soft Real-Time: Deadlines are preferable but not a must. For example, video buffering in a streaming service.
2. Observing Performance
Once you've assessed your performance needs, the next step is to actually observe how your code behaves. This is essential for pinpointing bottlenecks and areas for improvement.
Using std::chrono
The C++ Standard Library offers std::chrono for high-precision timing. It's a versatile way to measure code execution time.
#include <iostream>
#include <chrono>

int main() {
    auto start = std::chrono::high_resolution_clock::now();
    // Your code here
    auto stop = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    std::cout << "Time taken: " << duration.count() << " microseconds" << std::endl;
}
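For repeated measurements it can be convenient to wrap this pattern in a small RAII helper that reports the elapsed time when it goes out of scope. The ScopedTimer below is a minimal sketch (the class name is ours, not part of the standard library); it uses std::chrono::steady_clock, which is generally the better choice for measuring intervals because it is guaranteed never to jump backwards.
#include <chrono>
#include <iostream>
#include <string>

// Minimal RAII timer: measures from construction to destruction.
class ScopedTimer {
public:
    explicit ScopedTimer(std::string label)
        : label_(std::move(label)), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        auto stop = std::chrono::steady_clock::now();
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(stop - start_);
        std::cout << label_ << ": " << us.count() << " microseconds\n";
    }

private:
    std::string label_;
    std::chrono::steady_clock::time_point start_;
};

int main() {
    ScopedTimer timer("work loop");                        // prints when main exits
    long long sum = 0;
    for (long long i = 0; i < 1000000; ++i) sum += i;      // stand-in for real work
    std::cout << "sum = " << sum << '\n';                  // keeps the loop from being optimized away
}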
Event Tracing
Sometimes, you'll want to understand how multiple parts of your application interact over time. Event tracing tools can help you visualize this.
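Tools such as Tracy, Perfetto, or Chrome's about:tracing view provide ready-made timelines, but the underlying idea is simple: record named, timestamped begin/end events at interesting points and inspect the log afterwards. The sketch below is purely illustrative (the TraceEvent and trace names are ours, not from any library).
#include <chrono>
#include <iostream>
#include <vector>

// One timestamped event; phase is 'B' (begin) or 'E' (end) of a region.
struct TraceEvent {
    const char* name;
    char phase;
    long long micros;
};

std::vector<TraceEvent> trace_log;

long long now_micros() {
    using namespace std::chrono;
    return duration_cast<microseconds>(steady_clock::now().time_since_epoch()).count();
}

void trace(const char* name, char phase) {
    trace_log.push_back({name, phase, now_micros()});
}

void load_data() {
    trace("load_data", 'B');
    // ... work ...
    trace("load_data", 'E');
}

void process_data() {
    trace("process_data", 'B');
    // ... work ...
    trace("process_data", 'E');
}

int main() {
    load_data();
    process_data();
    for (const auto& e : trace_log)
        std::cout << e.micros << ' ' << e.phase << ' ' << e.name << '\n';
}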
Performance Counters
Modern CPUs offer hardware-level performance counters. These can be accessed through specific OS-level APIs or tools like perf on Linux.
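For a quick summary of common counters such as cycles, instructions, cache misses, and branch misses, perf stat is often enough (the program name here is a placeholder):
perf stat ./your_program
perf stat -e cache-misses,cache-references,branch-misses ./your_program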
Network Monitoring
If your application is network-bound, tools like Wireshark can help monitor data packets and understand latency issues.
3. Profiling Tools
Profiling is an essential step in performance optimization. It helps you identify bottlenecks and inefficiencies at various granularities, from the entire system down to individual lines of code.
VTune
Intel's VTune profiler is a powerful tool for both CPU and GPU performance analysis. It provides deep insights into how your application utilizes hardware resources. Use it to optimize hotspots, threading, and offloading to graphics processors.
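A typical command-line hotspots collection, assuming the oneAPI vtune driver is installed and on your PATH, looks like the following; the resulting directory can then be opened in the VTune GUI:
vtune -collect hotspots -- ./your_program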
gprof
The GNU profiler, gprof, is useful for collecting performance data from your application. It gives you a function-level breakdown of time spent, so you can see which parts of your code are the most time-consuming.
g++ -pg your_program.cpp -o your_program
./your_program
gprof your_program gmon.out > analysis.txt
perf
Linux's perf tool is versatile, providing a wide array of system-level metrics. It covers CPU usage, cache misses, and context switches. Good for getting a broad performance picture.
perf record ./your_program
perf report
Valgrind
Though primarily known for memory leak detection, Valgrind also has a tool called Callgrind that can profile your application. It's heavier than other tools, but very thorough.
valgrind --tool=callgrind ./your_program
kcachegrind callgrind.out.xxxx
VerySleepy
A Windows-specific sampling profiler, VerySleepy is useful for identifying bottlenecks without much setup. It's not as detailed as other tools, but good for a quick check.
Others
- Visual Studio Profiler: For Windows users, the built-in Visual Studio profiler is quite effective.
- Instruments: Mac users can rely on Instruments for profiling various aspects like CPU, memory, and more.
4. Improving Performance
Once you've assessed and observed performance using various tools, the next step is making improvements. Below are some proven methods to enhance your C++ application's performance.
Algorithm Optimization
Before getting into micro-optimizations, make sure your algorithms are as efficient as they can be. The time and space complexity should be optimized for the problem at hand.
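As a concrete (if contrived) sketch, replacing a nested linear search with a hash-based lookup turns an O(n*m) membership check into roughly O(n + m); the function names below are illustrative.
#include <string>
#include <unordered_set>
#include <vector>

// O(n*m): scans the whole banned list for every word.
bool any_banned_slow(const std::vector<std::string>& words,
                     const std::vector<std::string>& banned) {
    for (const auto& w : words)
        for (const auto& b : banned)
            if (w == b) return true;
    return false;
}

// Roughly O(n + m): build a hash set once, then do O(1) average lookups.
bool any_banned_fast(const std::vector<std::string>& words,
                     const std::vector<std::string>& banned) {
    std::unordered_set<std::string> banned_set(banned.begin(), banned.end());
    for (const auto& w : words)
        if (banned_set.count(w)) return true;
    return false;
}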
Loop Unrolling
Manually expand loop bodies to decrease loop-control operations and improve speed. Use this cautiously as it can increase code size.
// Before
for (int i = 0; i < 4; ++i) {
    do_something(i);
}
// After
do_something(0);
do_something(1);
do_something(2);
do_something(3);
Function Inlining
Inline small functions to avoid the overhead of function calls. The inline keyword or compiler optimizations can help here; note that inline is only a hint, and at -O2 and above the compiler largely makes its own inlining decisions.
inline int square(int x) {
    return x * x;
}
Memory Access
Try to optimize memory access patterns to be sequential and thereby cache-friendly. The idea is to enhance spatial locality: C++ stores 2D arrays in row-major order, so the innermost loop should vary the rightmost index to walk memory contiguously.
// Cache-inefficient
for (int i = 0; i < cols; ++i) {
    for (int j = 0; j < rows; ++j) {
        array[j][i] = 0;
    }
}

// Cache-efficient
for (int i = 0; i < rows; ++i) {
    for (int j = 0; j < cols; ++j) {
        array[i][j] = 0;
    }
}
Use constexpr
Use constexpr for computations that can be done at compile-time, reducing runtime overhead.
constexpr int factorial(int n) {
    return (n <= 1) ? 1 : (n * factorial(n - 1));
}
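When the call appears in a constant expression, the compiler evaluates it entirely at compile time; a static_assert is a simple way to confirm this:
constexpr int kPermutationsOfFive = factorial(5);   // evaluated by the compiler
static_assert(kPermutationsOfFive == 120, "factorial(5) should be 120");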
Multithreading
Leverage multi-core processors by using parallel algorithms or threading libraries like std::thread in C++.
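As a minimal sketch, the snippet below splits a summation across two std::thread workers; each thread writes to its own variable so no locking is required (compile with -pthread on GCC/Clang). Real code would usually size the split from std::thread::hardware_concurrency().
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    std::vector<int> data(1000000, 1);
    long long sum_front = 0, sum_back = 0;
    auto mid = data.begin() + data.size() / 2;

    // Each thread accumulates into its own variable, so no synchronization is needed.
    std::thread t1([&] { sum_front = std::accumulate(data.begin(), mid, 0LL); });
    std::thread t2([&] { sum_back  = std::accumulate(mid, data.end(), 0LL); });
    t1.join();
    t2.join();

    std::cout << "sum = " << (sum_front + sum_back) << '\n';
}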
Profile-Guided Optimization
Some compilers offer the ability to optimize code based on profile data collected during test runs.
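With GCC, for example (Clang is similar), the flow is: build with instrumentation, run a representative workload so profile data gets written, then rebuild using that data:
g++ -O2 -fprofile-generate your_program.cpp -o your_program
./your_program
g++ -O2 -fprofile-use your_program.cpp -o your_program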
Hardware Intrinsics
If you know the specific hardware you're targeting, using hardware intrinsics can yield a significant performance gain.
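For example, on x86-64 the SSE intrinsics from <immintrin.h> add four floats in a single instruction. This sketch is non-portable by design and only compiles for x86 targets:
#include <immintrin.h>  // x86 SSE/AVX intrinsics

// Adds four floats at a time using 128-bit SSE registers.
void add4(const float* a, const float* b, float* out) {
    __m128 va = _mm_loadu_ps(a);              // unaligned load of 4 floats
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));   // vertical add, store 4 results
}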
Minimize I/O Operations
Reducing I/O-bound operations can also significantly speed up your application. Use buffering or batch operations whenever possible.
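As a small illustration, accumulating output in memory and writing it out once avoids many small writes, and using '\n' instead of std::endl avoids forcing a flush on every line:
#include <iostream>
#include <sstream>

int main() {
    std::ostringstream buffer;
    for (int i = 0; i < 100000; ++i) {
        buffer << i << '\n';   // '\n' does not flush; everything stays in memory
    }
    std::cout << buffer.str(); // one large write instead of 100000 small ones
}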