Tracing the whole hardware infrastructure, from network processors to application-specific many-core processors

Tracers that use dynamic tracepoints, including DTrace, SystemTap [14] and GDB tracepoints, have been optimized for ad hoc tracing, with no overhead when not tracing. Their tracepoints are inserted dynamically using trap instructions and, in most cases, require a costly context switch to handle the trap. Furthermore, using traps may impose additional constraints on the contexts in which tracepoints can be inserted. LTTng, by contrast, is a tracer based on static tracepoints. The other Linux kernel tracers, Perf and Ftrace, derive to some extent from LTTng for their static tracepoint mechanisms. These tracers have been optimized for low overhead and low disturbance, ensuring that they can be used to trace the behavior of heavily loaded multi-core online systems. LTTng in particular uses per-CPU buffers, lockless per-CPU atomic operations, zero-copy buffering and transport, and Read-Copy-Update (RCU) synchronization.
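
The combination of per-CPU buffers and lockless atomic operations can be illustrated with a minimal sketch in C11. This is not LTTng code: the names `cpu_buffer`, `reserve` and `trace_event`, the CPU count and the buffer size are all hypothetical, and the sketch omits wrap-around, sub-buffer switching, timestamps and consumer-side reading. It only shows how a writer can reserve space with a compare-and-swap loop instead of taking a lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NR_CPUS  4      /* hypothetical CPU count */
#define BUF_SIZE 4096   /* hypothetical size of each per-CPU buffer, in bytes */

/* One buffer per CPU, so writers on different CPUs never contend. */
struct cpu_buffer {
    _Atomic size_t reserved;        /* offset of the next free byte */
    unsigned char data[BUF_SIZE];
};

static struct cpu_buffer buffers[NR_CPUS];

/*
 * Reserve len bytes without taking a lock. Returns the start offset of
 * the reserved slot, or -1 if the buffer is full (no wrap-around here).
 */
static long reserve(struct cpu_buffer *buf, size_t len)
{
    size_t old = atomic_load_explicit(&buf->reserved, memory_order_relaxed);

    for (;;) {
        if (len > BUF_SIZE - old)
            return -1;
        /* On failure, old is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak_explicit(&buf->reserved, &old,
                                                  old + len,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed))
            return (long)old;
    }
}

/* Record one event payload in the buffer of the given CPU. */
static int trace_event(unsigned cpu, const void *payload, size_t len)
{
    assert(cpu < NR_CPUS);

    long off = reserve(&buffers[cpu], len);
    if (off < 0)
        return -1;                  /* buffer full: the event is dropped */
    memcpy(buffers[cpu].data + off, payload, len);
    return 0;
}
```

A caller would invoke `trace_event(cpu, &payload, sizeof payload)` from the instrumented code path. Because the reservation is a single atomic update, a writer interrupted between reservation and copy cannot corrupt another writer's slot, which is the property a lock would otherwise provide.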

These advanced tracing algorithms need to be extended and complemented in order to efficiently harvest all the available information. LTTng already has some initial support for dynamic tracepoints. This track will develop new algorithms to efficiently retrieve, store and display the content of running programs, as debuggers usually do. GDB tracepoints already provide some mechanisms for non-stop tracing of complex expressions, but further optimizations are possible. The difficulty comes from the multi-threaded aspect of dynamic code modification, which on Intel x86 platforms may even require resetting the instruction execution pipeline. New lockless algorithms, based on atomic modifications and RCU techniques, will be required to allow efficient dynamic instrumentation of a large number of sites on many-core systems while minimizing the disruption time.
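
The idea of enabling an instrumentation site through an atomic modification, rather than by stopping all threads, can be sketched as follows in C11. This is only an analogy under stated assumptions: it swaps a probe function pointer instead of patching machine code, and it leaves out the RCU grace period a real tracer would need before freeing a detached probe's resources. The names `trace_site`, `site_hit`, `site_attach` and `site_detach` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef void (*probe_fn)(long value);

/* One instrumentation site; probe is a null pointer while disabled. */
struct trace_site {
    _Atomic(probe_fn) probe;
};

/* Fast path run by application threads: one atomic load and a branch. */
static inline void site_hit(struct trace_site *site, long value)
{
    probe_fn fn = atomic_load_explicit(&site->probe, memory_order_acquire);

    if (fn)
        fn(value);
}

/* Enable the site: publish the probe with a single atomic store. */
static void site_attach(struct trace_site *site, probe_fn fn)
{
    atomic_store_explicit(&site->probe, fn, memory_order_release);
}

/*
 * Disable the site and return the previous probe. Threads that loaded
 * the old pointer may still be running it; waiting for them (a grace
 * period, in RCU terms) is deliberately not shown here.
 */
static probe_fn site_detach(struct trace_site *site)
{
    return atomic_exchange_explicit(&site->probe, (probe_fn)0,
                                    memory_order_acq_rel);
}

/* Hypothetical probe used for illustration: counts hits. */
static long hits;
static long last_value;

static void count_probe(long value)
{
    hits++;
    last_value = value;
}
```

Threads calling `site_hit` never block while another thread attaches or detaches `count_probe`; they simply observe either the old or the new pointer. Patching instructions in place raises the additional cross-core consistency issues mentioned above, which this sketch does not address.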

Another challenge that will be addressed is to efficiently extract hardware trace information and make it available to complement the software traces in the tracing infrastructure. This includes adapting the hardware-generated trace information to a common format, synchronizing it to a common time base, and ensuring an efficient interface to the software-controlled part of the chain.
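
As an illustration of synchronizing to a common time base, the sketch below converts hardware tick counts to software-trace nanoseconds by linear interpolation between two synchronization points. It assumes that the hardware counter runs at a constant rate between the two points and that both clocks were sampled together at each point; the names `sync_point`, `time_map`, `time_map_from` and `hw_to_ns` are hypothetical and do not come from any existing tracer.

```c
#include <assert.h>
#include <stdint.h>

/* A moment at which both clocks were sampled together (assumed available). */
struct sync_point {
    uint64_t hw_ticks;  /* hardware trace counter value */
    uint64_t sw_ns;     /* software trace clock, in nanoseconds */
};

/* Linear mapping: d_ns nanoseconds elapse for every d_ticks ticks. */
struct time_map {
    struct sync_point base;
    uint64_t d_ticks;
    uint64_t d_ns;
};

/* Build the mapping from two synchronization points, a taken before b. */
static struct time_map time_map_from(struct sync_point a, struct sync_point b)
{
    assert(b.hw_ticks > a.hw_ticks && b.sw_ns >= a.sw_ns);

    struct time_map m = {
        .base = a,
        .d_ticks = b.hw_ticks - a.hw_ticks,
        .d_ns = b.sw_ns - a.sw_ns,
    };
    return m;
}

/*
 * Convert a hardware timestamp to the common (software) time base.
 * Whole sync intervals and the remainder are scaled separately to limit
 * overflow; the sketch still assumes (d_ticks - 1) * d_ns fits in 64 bits.
 */
static uint64_t hw_to_ns(const struct time_map *m, uint64_t hw_ticks)
{
    assert(hw_ticks >= m->base.hw_ticks);

    uint64_t elapsed = hw_ticks - m->base.hw_ticks;
    uint64_t whole = elapsed / m->d_ticks;
    uint64_t rest = elapsed % m->d_ticks;

    return m->base.sw_ns + whole * m->d_ns + rest * m->d_ns / m->d_ticks;
}
```

Once hardware events carry timestamps in the same time base as software events, the two streams can be merged by timestamp order. A production implementation would also have to handle clock drift between synchronization points and counter wrap-around, which are left out here.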



Team members

David Couturier, École Polytechnique de Montréal, Master's student
Mathieu Desnoyers, EfficiOS, Partner


Documents and presentations

LTTng CLUST: A system-wide unified CPU and GPU tracing tool for OpenCL applications

(Not yet published)

Towards faster trace filters using eBPF and JIT

Tracing GPUs

Hardware assisted software tracing

Debugging and Tracing of Many-core Processors

Teaching Operating Systems Concepts with Execution Visualization

(Not yet published)

Turbocharged Tracing with LTTng

LTTng updates

Demo of TMF

LTTng Xeon Phi / GDB RPC debugging