Lair Of The Multimedia Guru

2005-12-20

Profiling / Benchmarking code

You have a piece of code, its too slow so you want to optimize it, but to do that you first need to be able to meassure its speed, otherwise you wont know if a change improved speed or not

Benchmarking a piece of code sounds easy but it isnt, you can try to do it with various standard “get the current time” functions but these tend to be very inaccurate, canned tools like gprof are an option too, but in my experience they are also inaccurate and they can produce highly misleading results due to function inlining. A more accurate method on the x86 architecture is to use rdtsc

So now we have a accurate method to get the current “time” but what now? simply running the code once and getting the time before and afterwards? This would be a very bad choice as the first time a piece of code is executed its not in the code cache, the data isnt in the data cache, branch prediction has not seen the code yet, … so we could ignore the first few iterations, but still a single measurement isnt very trustworthy, next improvement would be to average many interations but still its very sensitive, just think about what happens when the kernel decides to stop your program and give the cpu to your mp3 player, its like having a interation which needs 50 times longer then the average assuming the code between the rdtsc instructions doesnt take too long. Solving this is easy though, just discard such outliers which need many times the average (see the START/STOP_TIMER macros in ffmpeg/libavutil/common.h)

Filed under: Optimization — Michael @ 13:27

Powered by WordPress