GPU Perf Studio 2 and CodeXL are used to debug and micro benchmark the new renderer.
While these tools are made by AMD they can be used on any hardware for debug purpose. As profiling requires specific hardware counter you obviously need a radeon cards to get the most out of it.
GPU Perf studio 2 is more suited to GPU but only ships on Windows at the time of writing (dependency on .Net). CodeXL is more generic as it targets both CPU and GPU, and is available on Linux.
To set up GPU Perf Studio:
- Click on "Connect" in the main window.
- A"Server connection" pop up will appear. Set the path to stk executable built in Release mode (debug mode executable are sometimes instable when used through GPS2). Then click on the "connect in the pop up.
- Stk will start (it is highly recommended to run it in windowed mode). Switch to GPS2 windows which will ask you confirmation about the API and the monitored application.
- Disable STK frame limiter
- Go back to STK and run it until there is a frame you want to debug and/or profile.
- Switch to GPS2 again, and click on the pause button. GPS2 will then register all events to replay the frame.
- You can now access Frame Debugger (list all glDraw*/glClear with complete context ie textures, vbo...) or API Trace (list all GL command with cpu time), and optionnaly gpu profiler (get gpu execution time of each command).
The "USE_ASAN" option in cmake on Linux allows gcc and clang to add ASAN support to the binary in debug mode.
ASAN (for Adress SANitizer) is a Library that reports memory related error like use after free, double free, malloc/dealloc mismatch... It's basically a Valgrind-like tool but orders of magnitude faster.
To enable it you must set some environnment var so that symbol are properly translated (and of course, have libasan installed). On Fedora 20 the following command should work :
ASAN_SYMBOLIZER_PATH=/usr/bin/llvm-symbolizer ASAN_OPTIONS=symbolize=1 bin/supertuxkart
Even if stk was built with gcc.
Proper profiling require STK to be build in debug mode ; however it's much easier to track performance bottleneck with literal function name instead of their machine address. On linux it probably requires an extra -fno-omit-frame-pointer flag passed to gcc and ld. On Windows you need to build STK in Release mode with debug info. The step by step description can be found here : http://msdn.microsoft.com/en-us/library/fsk896zz.aspx Then it's possible to use CodeXL to collect samples and find in which function cpu spend most of its time.
There is also a bar profiler available in artist debug mode. It should be quite easy to understand. Please note that there are 2 sync points between cpu and gpu :
- One at the "GPU CPU stall" marker. The cpu might have to wait for the GPU to finish to render the second solid pass before uploading updated position/instances in VRAM. Thus the "GPU CPU stall" bar in the profiler represents cycles wasted.
- One right at the end of "Upload Draw command" bar. Here the gpu might have to wait for the cpu to finish to upload all commands in VRAM. Unfortunatly there is currently no accurate way to know the status of the gpu at this point and it is safer to keep this bar as tiny as possible.