Troubleshooting real-world software
Many aspects of software development today give developers a hard time. Multicore programs introduce notoriously hard-to-find bugs: race conditions, deadlocks, and non-deterministic behavior. Code may need to interact with many layers (middleware, VM, OS, hypervisor), sometimes across nodes or clusters. The Internet of Things and Machine to Machine technologies create complex systems with a huge number of links and interactions between different types of devices. In the embedded domain it is not uncommon to have heterogeneous systems that combine Linux on a few cores with bare metal on specialized processors. Those are just a few examples where tricky problems can take an enormous amount of investigation before developers identify the cause. Debuggers are invaluable for algorithm issues, but they fall short for this category of problems, not to mention that some problems cannot be reproduced in the developer’s environment!
Tracing and logging are usually the only way to troubleshoot complex problems. Fortunately, many organizations are improving open source tools to enable very low overhead tracing and advanced trace analysis. For example, IBM allowed its Read-Copy-Update patent (yes, it was part of the SCO lawsuit) to be used first in the Linux kernel and later in LGPL code, so that LTTng can trace applications; Freescale added HW traces; Polytechnique Montréal, which created LTTng, added the generic state system; EfficiOS implemented the Common Trace Format (CTF) in collaboration with the Linux Foundation Embedded Work Group and the Multicore Association Tool Infrastructure Work Group; Ericsson provided the Eclipse trace analysis; and all the main Linux distributions have adopted the tools. In other words, open source at its best!
The Eclipse part of this effort is hosted in the Linux Tools project as the Tracing and Monitoring Framework (TMF). This talk will cover important trace concepts and how to analyze the data in Eclipse:
- System-wide and multi-system trace correlation
- How to provide new trace types by plug-in extension to enable fully integrated correlation of different log/trace formats
- Text log parser wizard to create new trace types without any coding required
- Analysis views: control flow, histogram, sequence diagram, statistics, resources, events with searching, filtering, bookmarking
- Integration of CDT GDB tracepoints with TMF
- Common Trace Format (CTF)
- Generic State System for storing application states and allowing very efficient queries
- Trace design and use cases
- How to reduce trace overhead by a factor of 200!
- Controlling LTTng (Linux Trace Toolkit next generation) for low-intrusion tracing and precise timestamps (100 ns)
- EMF info in CTF and in LTTng tracepoints to enable model-based tracing in Papyrus
- Handling hundreds of GB of trace data
- BSD/MIT-licensed libraries for reading and writing CTF traces
- Correlation with HW traces
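
To give a feel for the state system idea in the list above, here is a minimal sketch of storing state changes so that "what was the state at time t?" can be answered quickly. It is an illustration only, not TMF's implementation: the `StateHistory` class, its `modify` and `query` methods, and the attribute names are all hypothetical, and it assumes events arrive in time order and fit in memory.

```python
import bisect
from collections import defaultdict


class StateHistory:
    """Toy state history (hypothetical, not the TMF API).

    For each attribute it keeps a time-sorted list of state changes,
    so a query is a binary search instead of a replay of the trace.
    """

    def __init__(self):
        self._times = defaultdict(list)   # attribute -> sorted timestamps
        self._values = defaultdict(list)  # attribute -> state starting at that timestamp

    def modify(self, timestamp, attribute, value):
        """Record that `attribute` takes `value` from `timestamp` onward."""
        times = self._times[attribute]
        if times and timestamp < times[-1]:
            # Assumption of this sketch: events are fed in time order.
            raise ValueError("events must be in time order")
        times.append(timestamp)
        self._values[attribute].append(value)

    def query(self, timestamp, attribute):
        """Return the state of `attribute` at `timestamp`, or None if unknown."""
        times = self._times.get(attribute, [])
        # Index of the last change at or before `timestamp`.
        index = bisect.bisect_right(times, timestamp) - 1
        return self._values[attribute][index] if index >= 0 else None


# Example: replay three made-up scheduler events, then ask about t=300.
history = StateHistory()
history.modify(100, "cpu0", "IDLE")
history.modify(250, "cpu0", "RUNNING")
history.modify(400, "cpu0", "IDLE")
print(history.query(300, "cpu0"))  # prints RUNNING
```

Each query costs a binary search over the changes of one attribute, which is why a view can jump to any point in a trace without re-reading it from the start. A real state system also has to handle histories that do not fit in memory, which this sketch ignores.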
Come and learn how troubleshooting can become easier.