
Appendices

Credits
The Attic...

Credits

Author

StackTrack was written by James Fowler, founder of Open Sea Consulting. As of the initial release, he is solely responsible for any bugs, typos, or other deficiencies in StackTrack...

Acknowledgements

The Boost Community, for the wonderful things they have accomplished so far...

Joel de Guzman and Eric Niebler, for creating QuickBook, Thomas Guest for adding dynamic source code highlighting, and the many others working to improve the process for creating Boost documentation.

Brian Baatz, for enlightening conversation on profiling interfaces.

Ion Gaztañaga, for providing the shmem library, which works very well with Boost and is used by StackTrack.

ACE - The initial prototype for shared memory support in StackTrack used ACE, which worked well. Unfortunately, requiring ACE just for basic shared memory support would make StackTrack more difficult to use for projects which don't already depend on ACE. There are, however, some "bonus features" planned for ACE users in future releases...

The Attic...

WARNING : PRE-RELEASE VERSION, NOT OFFICIAL BOOST LIBRARY!!!!
StackTrack is not an official Boost library. It has not been submitted for review as a potential Boost library, although the current intention is to seek review after a stable release with significant positive experience in the Boost community.

This section is a holding area for various questions, ideas, and so on which are still in process.

Pending

WARNING: You're in The Attic....
It can get a bit dusty in here - don't expect everything to be polished, current, or even accurate...

Design Decisions which might be made adjustable (current behavior, if any, in italics) :

  • Shared Memory
    • Overflow : what happens when writing a log entry would require the thread to block?
      • block indefinitely?
      • timed block?
      • drop the data and move on?
    • Identification
      • named?
      • anonymous (from user's perspective)
    • Process granularity
      • unique buffer per process
      • shared buffer for all processes on system
  • Timing
    • Granularity
      • every call gets a unique timer value?
      • common shared timer value updated on background thread?
    • Format
      • store as raw (highest efficiency), convert in post-processing?
      • convert to common (double?) format?
    • Calibration
      • expected accuracy for calibrating relative time to "world" time
        • +/- the resolution of "world" time (1 second)
        • average before/after hires time at "world" time transition - introduces startup delay during calibration
      • periodic recalibration?
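
The three overflow choices listed above (block indefinitely, timed block, drop) could be expressed as a policy on the log buffer. The following is only a sketch: bounded_log and overflow_policy are hypothetical names, not part of StackTrack, and an in-process std::deque stands in for the shared memory buffer.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical names throughout; an in-process queue stands in for shared memory.
enum class overflow_policy { block, timed_block, drop };

template <class Entry>
class bounded_log {
public:
    bounded_log(std::size_t capacity, overflow_policy policy,
                std::chrono::milliseconds timeout = std::chrono::milliseconds(10))
        : capacity_(capacity), policy_(policy), timeout_(timeout), dropped_(0) {}

    // Returns true if the entry was stored, false if it was dropped.
    bool write(const Entry& e) {
        std::unique_lock<std::mutex> lock(mutex_);
        auto has_room = [this] { return entries_.size() < capacity_; };
        switch (policy_) {
        case overflow_policy::block:        // wait until a reader makes room
            not_full_.wait(lock, has_room);
            break;
        case overflow_policy::timed_block:  // wait a bounded time, then give up
            if (!not_full_.wait_for(lock, timeout_, has_room)) {
                ++dropped_;
                return false;
            }
            break;
        case overflow_policy::drop:         // never wait for room
            if (!has_room()) {
                ++dropped_;
                return false;
            }
            break;
        }
        entries_.push_back(e);
        return true;
    }

    // Reader side: removes the oldest entry, if there is one.
    bool read(Entry& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (entries_.empty()) return false;
        out = entries_.front();
        entries_.pop_front();
        not_full_.notify_one();
        return true;
    }

    std::size_t dropped() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return dropped_;
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable not_full_;
    std::deque<Entry> entries_;
    std::size_t capacity_;
    overflow_policy policy_;
    std::chrono::milliseconds timeout_;
    std::size_t dropped_;
};
```

Counting dropped entries is one way to let post-processing know the trace has gaps; whether that is worth the extra bookkeeping is exactly the kind of decision still pending here.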

Documentation Config

Be careful in this section... this may not be the "right way" to set up documentation for BoostBook, just because the author happened to figure out how to beat it into submission...

Prerequisites

The following need to be configured and available:

BoostBook

Make sure you already have BoostBook successfully installed and working before you try generating the docs for StackTrack!

Using Doxygen 1.4.1

BoostBook as contained in Boost 1.32 works (for the author) with Doxygen 1.3.9.1. Since StackTrack makes use of some nested classes (which Doxygen 1.3.9.1 didn't seem to handle well), an attempt was made to use Doxygen 1.4.1 which failed miserably. Some changes have been made to $BOOST/tools/boostbook/xsl/doxygen/doxygen2boostbook.xsl which allow it to work better, but it still needs some work. For now, be happy with pre-1.4.x versions unless you're interested in wrestling with BoostBook's Doxygen support.

QuickBook

Get QuickBook from CVS, build it, and have the executable in your path.

Create a "quickbook.jam" in $BOOST/tools/v2/build/tools (if it doesn't exist) containing:

import type ; 
import boostbook ; 
type.register QUICKBOOK : qbk ; 
 
import generators ; 
generators.register-standard quickbook.inline-file : QUICKBOOK : XML ; 
 
actions inline-file 
{ 
    quickbook $(>) $(<) 
} 

Build the docs

run

bjam --v2 

from the directory $BOOST/libs/stacktrack/docs

  • optional : BoostBook update for Doxygen 1.4.x (the 1.32 release works with Doxygen 1.3.9.1, but some doc details will be omitted)
    • this update is still in progress

High Precision Timers

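struct hires_moment { 
    typedef some_type raw_type;      // placeholder for the platform's raw time representation 
    void operator()( raw_type & );   // presumably stores the current time in its argument 
}; 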

Interface for capturing high resolution time

  • can be very challenging to balance:
    • resolution (as high as possible???!!!)
    • reliability and accuracy (otherwise resolution doesn't mean much...)
    • efficiency (all this with zero runtime overhead please, and fetch me a fresh cup of coffee while you're at it)
  • variations in interface
    • hard to find one ideal portable API to build on, lots of choices
      • ACE has some nice timer wrappers, but that doesn't help unless you're using ACE
      • QueryPerformanceCounter(...) & QueryPerformanceFrequency(...) on Windows
      • time() is ubiquitous, but with pitiful resolution...
      • gettimeofday() available on various POSIX systems
      • gethrtime() on Solaris and HP-UX systems (maybe RTLinux too)
      • reading tsc counters on Pentium-based systems
      • and there are assuredly a few more out there
    • representation - some use an integer type, some use structs, ...
    • resolution - from very coarse (1 second from time()) to very fine (nanoseconds for gethrtime())
    • context - "wall clock" time (like gettimeofday()), process time (like clock()), and more...
    • scaling - constant (sec/usec in gettimeofday()) vs. runtime dynamic (QueryPerformanceCounter(...) / QueryPerformanceFrequency(...))
    • offset - fixed ( gettimeofday()) vs. floating (gethrtime())
    • overflow for floating offset representations
    • calibration for floating offset representations (baseline for conversion to known offset)
  • variations in behavior - due to implementation or dynamic environmental influences
    • known issues, like QueryPerformanceCounter() sometimes skipping
    • degree of separation from actual "real time clock" hardware
      • query RTC hardware directly (should be most stable - but not necessarily fastest...)
      • query counters slaved to CPU clock (like the "rdtsc" instruction for Pentiums)
      • query global internal "current time" value incremented by periodic process (interrupts...)
    • local uncertainty, i.e. the first (in hard real time) of two nearly simultaneous calls may return a slightly "later" value than the second
      • called in one thread?
      • called in multiple threads on one CPU?
      • called in multiple processes on one CPU?
      • called in multiple threads/processes on multiple CPUs?
    • performance impact - overhead varies
      • based on API calls used, can differ by platform / kernel version / etc...
      • based on frequency called (impact on caches, SMP systems)
    • precision - useful precision may be less than representation allows
    • response to system "idling" - load-based CPU speed throttling, sleep & hibernation periods, etc.
    • drift - representations with "wall clock" context but floating offsets may (or may not) diverge over time from calibration points
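
For floating offset representations, calibration means finding a baseline that maps a hires value onto "world" time. One approach is to wait for the coarse clock to tick over and average the hires readings taken just before and just after the transition. A minimal sketch follows; the calibration struct and the calibrate function are hypothetical names, and both clocks are passed in as callables so that any of the APIs listed above could be plugged in.

```cpp
#include <ctime>

// Hypothetical result type: pairs a hires reading (in seconds, floating
// offset) with the "world" time value that became current at that moment.
struct calibration {
    double hires_at_transition;  // estimated hires time of the transition
    std::time_t world_time;      // the new "world" time value

    // Convert a later hires reading to "world" seconds using the baseline.
    double to_world(double hires_seconds) const {
        return static_cast<double>(world_time) +
               (hires_seconds - hires_at_transition);
    }
};

// Spin until world_now() changes. The transition is bracketed by a hires
// reading taken before the last call that returned the old value and one
// taken after the first call that returned the new value; the midpoint of
// that bracket is used as the estimate.
template <class HiresNow, class WorldNow>
calibration calibrate(HiresNow hires_now, WorldNow world_now) {
    double before = hires_now();
    const std::time_t start = world_now();
    for (;;) {
        const double mid = hires_now();
        const std::time_t w = world_now();
        if (w != start) {
            const double after = hires_now();
            calibration c = { (before + after) / 2.0, w };
            return c;
        }
        before = mid;  // the old value was still current after this reading
    }
}
```

With a real one-second clock such as time(), this busy-waits for up to a second, which is the startup delay mentioned under Pending. Repeating the calibration later and comparing the baselines would give a rough measure of drift.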

thoughts on requirements for an ideal portable C++ high resolution time capture mechanism

  • generic interface
    • specifies opaque "raw_hrtime" form in which time is stored
      • not necessarily an object, may use POD type as representation
      • don't want to require constructor/destructor
    • common signatures for key operations
      • store "now" in an instance of raw_hrtime
      • move/copy instance of raw_hrtime
      • convert from raw_hrtime into normalized form(s)
        • separate conversions for floating and fixed offsets
        • conversion to integral POD type(s)
          • also provide converted equivalents for scale and offset
        • conversion to double (in seconds)
      • overflow check
      • get scaling factor
      • get offset
    • traits to make variations in interface of underlying API available for compile-time or run-time use
      • size & alignment requirements for raw_hrtime
      • resolution, context, mode (fixed|variable) for scaling and offset
    • traits / operations addressing potential variations in behavior
      • optional, may only represent "best guess"
    • common signatures for potentially optional operations (default implementation may be feasible)
      • calibration
      • drift detection
    • ideally zero runtime overhead added by generic wrapper for each capture of "now" in raw_hrtime
      • highest priority is capture
      • secondary priority is direct (unprocessed) output
        • output raw_hrtime without forcing conversion
          • do NOT apply - or even query - scaling and offset
        • optionally store additional information as necessary to decode raw form
          • traits on interface variation
          • scaling/offset values
        • can treat as raw (possibly aligned) opaque chunk of memory
      • local manipulation only by explicit request
        • deferred (lazy-evaluation) for any conversions
          • use scaling and offset to provide normalized representation
        • calibration on demand

"Friendly" class wrapper for hires time type

  • A common class interface implemented around the generic raw_hrtime interface should be able to provide
    • Convenient features like automatically fetching "now" in the constructor
    • automatic conversion to normalized form (like double)
    • comparison and math operations
    • iostream output
    • and do all this in clean, portable code
  • A templatized version can be created to allow compile-time variation across multiple raw_hrtime implementations
  • A concrete version could provide "best effort" default support for hi-res time
    • may need more than one...
  • A hybrid version could potentially allow for selection of the "closest" available implementation based on filtering raw_hrtime variants
    • example: VARIABLE_OFFSET_OK + MAX_SMP_CONSISTENCY
    • may or may not be worth it if this requires runtime performance hit...

Features of StackTrack


This section is out of date and somewhat redundant...

Performance

  • StackTrack is designed for minimum impact at runtime
  • Data is gathered in a very minimalist, binary format
  • Post-processing is required to produce useful analysis
    • this significantly reduces the time and space costs of acquiring trace data
    • post-processing is currently done via a Perl script
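
As an illustration of what a minimalist binary format can mean in practice, the sketch below appends fixed-size records to a byte buffer and decodes them afterwards. The trace_record layout and both function names are hypothetical; StackTrack's actual format is not described in this section.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical record layout: nothing is converted or formatted at capture time.
struct trace_record {
    std::uint64_t raw_time;  // unconverted timer value
    std::uint32_t site_id;   // numeric id of the instrumented location
    std::uint32_t event;     // 0 = enter, 1 = leave
};

// Capture side: copy the record's bytes to the end of the buffer.
inline void append(std::vector<unsigned char>& buf, const trace_record& r) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&r);
    buf.insert(buf.end(), p, p + sizeof r);
}

// Post-processing side: recover the record stored at a given index.
inline trace_record decode(const std::vector<unsigned char>& buf, std::size_t index) {
    trace_record r;
    std::memcpy(&r, &buf[index * sizeof r], sizeof r);
    return r;
}
```

Names, timestamps in human-readable units and call hierarchies can all be reconstructed later from records like these, which is why the expensive work can be left to post-processing.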

Connectivity

  • For simple use cases, data can be written directly to local files
    • this introduces a problem: either the files must be flushed after each write (which can cause steep overhead) or data may be lost if the process dies with buffered, unwritten data.
  • StackTrack supports a shared-memory protocol which allows minimum overhead while reducing the risk of data loss due to buffered file I/O.
Copyright © 2005 James Fowler
