1. 19 Feb, 2013 13 commits
    • trackrdrd: - added amq_connection.* and connection pooling · 78705f7d
      Geoff Simmons authored
      	- reworked config params:
      		- maxopen.scale can be a power of 2
      		- maxdone, maxdata and qlen.goal not powers of 2
    • 301c1ba6
      Geoff Simmons authored
    • eebb874d
      Geoff Simmons authored
    • f5fa81b2
      Geoff Simmons authored
    • trackrdrd: - corrected the termination condition · 594bc60a
      Geoff Simmons authored
      	- now logging the varnish instance name
      	- some code cleanup
    • fe21d4ff
      Geoff Simmons authored
    • trackrdrd code reorg: · cd5bf4d7
      Geoff Simmons authored
      	- child process in child.c (including hashing code)
      	- common signal handlers in handler.c
      	- other code common to parent and child in trackrdrd.h & config.c
    • trackrdrd: - all data & functions exclusive to VSL reader are now · 644b0b5c
      Geoff Simmons authored
      	static in trackrdrd.c (part of data.c and all of hash.c)
      	- replaced the global nworkers with WRK_Running(), since
      	nworkers caused too many dependencies (esp. for unit tests)
    • trackrdrd: - make check now passes · 9814a0b4
      Geoff Simmons authored
      	- Stop/NeedWorker now encapsulated by the SPMCQ interface
      	- spmcq_len not exposed by the SPMCQ interface
    • Various performance and stability improvements, hash/data table separation · eb179832
      Nils Goroll authored
      major changes
      =============
      
      hash/data table
      ---------------
      
      The hash table is now only used for _OPEN records, and the actual data
      is stored in a data table. Upon submit, hash entries are cleared and
      data continues to live in the data table until it gets freed by a
      worker (or upon submit if it is a NODATA record).
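
      A minimal sketch of that split, with illustrative names and layouts
      (not the actual trackrdrd structures): the hash slot only tracks an
      _OPEN record and points into the data table, and submitting releases
      the hash slot while the data entry lives on until a worker frees it.

        #define HASH_EMPTY 0
        #define HASH_OPEN  1
        #define DATA_EMPTY 0
        #define DATA_OPEN  1
        #define DATA_DONE  2

        typedef struct dataentry {
            int state;              /* DATA_OPEN, DATA_DONE or DATA_EMPTY */
            unsigned xid;
            char data[1024];        /* the actual log payload */
        } dataentry;

        typedef struct hashentry {
            int state;              /* HASH_EMPTY or HASH_OPEN only */
            unsigned xid;
            dataentry *de;          /* payload lives in the data table */
        } hashentry;

        /* On submit (ReqEnd): the hash slot is reusable immediately; the
           data entry stays allocated until a worker frees it (or is freed
           right here for a NODATA record). */
        static void submit(hashentry *he)
        {
            he->de->state = DATA_DONE;  /* ... enqueue he->de on the spmcq */
            he->state = HASH_EMPTY;
            he->de = NULL;
        }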
      
      This drastically reduces the hash table load and significantly
      improves worst-case performance. In particular, the hash table load
      is now independent of ActiveMQ backend performance (read: stalls).
      
      Preliminary recommendations for table sizing:
      
      * hash table: double max_sessions from varnish
      
        e.g.
      
        maxopen.scale = 16
      
        for 64K hash table entries to support >32K sessions
        (safely and efficiently)
      
      * data table: max(req/s) * max(ActiveMQ stall time)
      
        e.g. to survive 8000 req/s with a 60 second ActiveMQ stall time,
        the data table should hold >480K entries (8000 * 60), so
      
        maxdone.scale = 19
      
        (= 512K entries) should be on the safe side and also provide
        sufficient buffer for temporary load peaks
      
      hash table performance
      ----------------------
      
      Previously, the maximum number of probes to the hash table was set to
      the hash table size - which resulted in bad insert performance and
      even worse lookup performance.
      
      Now that the hash table holds only _OPEN records, we can remove this
      burden and limit the maximum number of probes to a sensible value (10
      to start with, configurable as hash_max_probes).
      
      As another consequence, since we don't require 100% capacity on the
      hash table, we don't need to run an exhaustive search upon insert.
      Thus, probing has been changed from linear to hashed (by h2()).
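
      A sketch of the bounded, hashed probe sequence described above; the
      names (h1, h2, htbl, HTBL_SIZE) and the hash functions themselves are
      illustrative assumptions, not the real trackrdrd identifiers:

        #include <stdint.h>
        #include <stddef.h>

        #define HASH_EMPTY 0
        #define HASH_OPEN  1
        #define HTBL_SIZE  (1 << 16)            /* maxopen.scale = 16 */

        typedef struct { int state; uint32_t xid; } hashentry;

        static hashentry htbl[HTBL_SIZE];
        static unsigned hash_max_probes = 10;   /* new config parameter */

        static unsigned h1(uint32_t xid) { return xid * 2654435761u; }
        static unsigned h2(uint32_t xid) { return (xid >> 16) | 1u; }  /* odd step */

        static hashentry *hash_insert(uint32_t xid)
        {
            for (unsigned p = 0; p < hash_max_probes; p++) {
                /* h2() picks the step, replacing linear probing */
                unsigned idx = (h1(xid) + p * h2(xid)) & (HTBL_SIZE - 1);
                if (htbl[idx].state == HASH_EMPTY) {
                    htbl[idx].state = HASH_OPEN;
                    htbl[idx].xid = xid;
                    return &htbl[idx];
                }
            }
            return NULL;    /* give up instead of searching exhaustively */
        }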
      
      only ever insert on ReqStart - and drop if we can't
      ---------------------------------------------------
      
      Keeping up with the VSL is essential. Once we fall behind, we are in
      real trouble:
      
      - If we miss ReqEnd, we will clobber our hash, with drastic effects:
        - hash lookups become inefficient
        - inserts become more likely to fail
        - before we had HASH_Exp (see below), the hash would become useless
      
      - When the VSL writer overtakes our reader, we will see corrupt data
        and miss _many_ VCL Logs and ReqEnds (as many as can be found in the
        whole VSL), so, again, our hash and data arrays will get clobbered
        with incomplete data (which needs to be cleaned up by HASH_Exp).
      
      The latter point is the most relevant: corrupt records are likely to
      trigger assertions.
      
      Thus, keeping up with the VSL needs to be our primary objective. When
      the VSL overtakes, we will lose a massive amount of records anyway
      (and we won't even know how many). As long as we don't stop Varnish
      when we fall behind, we can't avoid losing records under certain
      circumstances anyway (for instance, when the backend stalls and the
      data table runs full), so we should rather drop early, in a controlled
      manner - and without a drastic performance penalty.
      
      Under this doctrine, it does not make sense to insert records for
      VSL_Log or ReqEnd, so if an xid can't be found for these tags, the
      respective events will get dropped (and logged).
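
      A sketch of that dispatch policy, building on the hash_insert() sketch
      above (hashentry, htbl, h1(), h2() reused); the tag names mirror the
      Varnish VSL tags, but the enum, helpers and stats counters here are
      illustrative only:

        enum tag { TAG_ReqStart, TAG_VCL_Log, TAG_ReqEnd, TAG_Other };

        static struct { unsigned insert_drops, lookup_drops; } stats;

        static hashentry *hash_find(uint32_t xid)
        {
            for (unsigned p = 0; p < hash_max_probes; p++) {
                unsigned idx = (h1(xid) + p * h2(xid)) & (HTBL_SIZE - 1);
                if (htbl[idx].state == HASH_OPEN && htbl[idx].xid == xid)
                    return &htbl[idx];
            }
            return NULL;
        }

        static void dispatch(enum tag t, uint32_t xid)
        {
            hashentry *he;

            switch (t) {
            case TAG_ReqStart:
                /* the only tag that may create a hash entry */
                if (hash_insert(xid) == NULL)
                    stats.insert_drops++;       /* drop early, log it */
                break;
            case TAG_VCL_Log:
            case TAG_ReqEnd:
                he = hash_find(xid);
                if (he == NULL) {
                    /* never insert for these tags: drop and log */
                    stats.lookup_drops++;
                    break;
                }
                /* ... append log data, submit on TAG_ReqEnd ... */
                break;
            default:
                break;
            }
        }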
      
      performance optimizations
      =========================
      
      spmcq reader/writer synchronization
      -----------------------------------
      
      Various measures have been implemented to reduce syscall and general
      function call overhead for reader/writer synchronization on the
      spmcq. Previously, the writer would issue a pthread_cond_signal to
      potentially wake up a reader, irrespective of whether or not a reader
      was actually blocking on the CV.
      
      - now, the number of waiting readers (workers) is modified inside a
        lock, but queried first from outside the lock, so if there are no
        readers waiting, the CV is not signalled (see the sketch after this
        list).
      
      - The number of running readers is (attempted to be) kept proportional
        to the queue length for queue lengths between 0 and
        2^qlen_goal.scale to further reduce the number of worker thread
        block/wakeup transitions under low to average load.
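
      A minimal sketch of the writer-side wakeup from the first bullet; the
      variable and function names are illustrative, not the trackrdrd code:

        #include <pthread.h>

        static pthread_mutex_t waiter_lock = PTHREAD_MUTEX_INITIALIZER;
        static pthread_cond_t  waiter_cond = PTHREAD_COND_INITIALIZER;
        static volatile unsigned waiters;   /* readers blocked on the CV */

        /* writer (VSL reader thread), after enqueueing an entry */
        static void spmcq_signal(void)
        {
            if (waiters == 0)       /* unlocked peek: skip the syscall path */
                return;
            pthread_mutex_lock(&waiter_lock);
            if (waiters > 0)
                pthread_cond_signal(&waiter_cond);
            pthread_mutex_unlock(&waiter_lock);
        }

        /* reader (worker thread), when the queue is empty */
        static void spmcq_wait(void)
        {
            pthread_mutex_lock(&waiter_lock);
            waiters++;              /* modified only inside the lock */
            pthread_cond_wait(&waiter_cond, &waiter_lock);
            waiters--;
            pthread_mutex_unlock(&waiter_lock);
        }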
      
      pthread_mutex / pthread_condvar attributes
      ------------------------------------------
      
      Attributes are now being used to allow the O/S implementation to
      choose more efficient low-level synchronization primitives because we
      know that we are using these only within one multi-threaded process.
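
      A minimal sketch, assuming the process-private attribute
      (PTHREAD_PROCESS_PRIVATE) is what is meant here; error checking
      omitted and names illustrative:

        #include <pthread.h>

        static pthread_mutex_t q_lock;
        static pthread_cond_t  q_cond;

        static void sync_init(void)
        {
            pthread_mutexattr_t ma;
            pthread_condattr_t  ca;

            pthread_mutexattr_init(&ma);
            pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_PRIVATE);
            pthread_mutex_init(&q_lock, &ma);
            pthread_mutexattr_destroy(&ma);

            pthread_condattr_init(&ca);
            pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_PRIVATE);
            pthread_cond_init(&q_cond, &ca);
            pthread_condattr_destroy(&ca);
        }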
      
      data table freelist
      -------------------
      
      To allow for efficient allocation of new data table entries, a free
      list with local caches is maintained (sketched after this list):
      
      - The data writer (VSL reader thread) maintains its own freelist and
        serves requests from it without any synchronization overhead.
      
      - Only when the data writer's own freelist is exhausted will it
        access the global freelist (under a lock). It will take the whole
        list at once and resume serving new records from its own cache.
      
      - Workers also maintain their own freelist of entries to be returned
        to the global freelist as long as
      
        - they are running
        - there are entries on the global list.
      
        Before a worker thread goes to block on the spmcq condvar, it
        returns all its freelist entries to the global freelist. Also, it
        will always check if the global list is empty and return any entries
        immediately if it is.
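
      A sketch of the global/local freelist handoff described above, with
      illustrative types and names:

        #include <pthread.h>
        #include <stddef.h>

        typedef struct dataentry {
            struct dataentry *next;
            /* ... record payload ... */
        } dataentry;

        static pthread_mutex_t freelist_lock = PTHREAD_MUTEX_INITIALIZER;
        static dataentry *global_freelist;

        /* VSL reader thread: private cache, refilled in one locked grab */
        static dataentry *local_freelist;

        static dataentry *data_take(void)
        {
            if (local_freelist == NULL) {
                pthread_mutex_lock(&freelist_lock);
                local_freelist = global_freelist;   /* take the whole list */
                global_freelist = NULL;
                pthread_mutex_unlock(&freelist_lock);
            }
            dataentry *de = local_freelist;
            if (de != NULL)
                local_freelist = de->next;
            return de;      /* NULL: table full, the caller must drop */
        }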
      
      stability improvements
      ======================
      
      record timeouts
      ---------------
      
      Every hash entry gets added to the insert_list, ordered by insertion
      time. No more often than every x seconds (currently hard-coded to x=10,
      and checked only when ReqStart is seen), the list is checked for
      records which have reached their ttl (configured by hash_ttl, default
      120 seconds). These get submitted despite the fact that no ReqEnd has
      been seen - under the assumption that no ReqEnd is ever to be expected
      after a certain time has passed.
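
      A sketch of that expiry scan, with illustrative names and layout:

        #include <time.h>
        #include <stddef.h>

        typedef struct hashentry {
            struct hashentry *next;     /* insert_list, oldest entry first */
            time_t insert_time;
            /* ... record data ... */
        } hashentry;

        static hashentry *insert_list;          /* head = oldest _OPEN entry */
        static const time_t hash_ttl = 120;     /* seconds, the default */

        static void submit(hashentry *he) { (void)he; /* hand off to a worker */ }

        static void hash_exp(time_t now)
        {
            /* submit every _OPEN record older than hash_ttl, assuming its
               ReqEnd will never arrive */
            while (insert_list != NULL
                   && now - insert_list->insert_time > hash_ttl) {
                hashentry *he = insert_list;
                insert_list = he->next;
                submit(he);
            }
        }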
      
      hash evacuation
      ---------------
      
      If no free entry is found when probing all possible locations for an
      insert, the oldest record is evacuated from the hash and submitted to
      the backend if its lifetime has exceeded hash_mlt, under the
      assumption that it is better to submit records early (which are likely
      to carry useful log information already) than to throw records away.
      
      If this behavior is not desired, hash_mlt can be set to hash_ttl.
      
      various code changes
      ====================
      
      * statistics have been reorganized to separate out
        - hash
        - data writer/VSL reader
        - data reader/worker (partially shared with writer)
        statistics
      
      * print the native thread ID for workers (to allow correlation with
        prstat/top output)
      
      * workers have a new state when blocking on the spmcq CV: WRK_WAITING
        / "waiting" in monitor output
      
      * because falling behind with VSL reading (the VSL writer overtaking
        our reader) is so bad, notices are logged whenever the new VSL data
        pointer is less than the previous one, in other words whenever the
        VSL ring buffer wraps.
      
        this is not the same as detecting the VSL writer overtaking our
        reader (which would require varnishapi changes), but noting this
        information and some statistics about VSL wraps can (and did) help
        track down strange issues to VSL overtaking.
      
      config file changes
      ===================
      
      * The _scale options
      
        maxopen.scale
        maxdone.scale (new, see below)
        maxdata.scale
      
        are now being used directly, rather than in addition to a base value
        of 10 as before.
      
        10 is now the minimum value, and an EINVAL error is raised when
        lower values are used in the config file (see the example below).
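
      For example, the sizing values recommended above are now written
      literally in the config file (under the old scheme the same sizes
      would have been given as 6 and 9):

        maxopen.scale = 16
        maxdone.scale = 19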
      
      new config options
      ==================
      
      see trackrdrd.h for documentation in comments:
      
      * maxdone.scale
      
        Scale for records in _DONE states, determines size of
        - the data table (which is maxopen + maxdone)
        - the spmcq
      
      * qlen_goal.scale
      
        Scale for the spmcq queue length goal. All worker threads will be
        used when the queue length corresponding to the scale is reached.
      
        For shorter queue lengths, the number of worker threads will be
        scaled proportionally.
      
      * hash_max_probes
      
        Maximum number of probes to the hash.
      
        Smaller values increase efficiency, but reduce the capacity of the
        hash (more ReqStart records may get lost) - and vice versa for
        higher values.
      
      * hash_ttl
      
        Maximum time to live for records in the _OPEN state
      
        Entries which are older than this ttl _may_ get expired from the
        trackrdrd state.
      
        This should get set to a value significantly longer than your
        maximum session lifetime in Varnish.
      
      * hash_mlt
      
        Minimum lifetime for entries in HASH_OPEN before they could get
        evacuated.
      
        Entries are guaranteed to remain in trackrdrd for this duration.
        Once the mlt is reached, they _may_ get expired when trackrdrd needs
        space in the hash.
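
      Putting the new options together, a config file excerpt might look
      like this (hash_ttl and hash_max_probes show the documented defaults;
      the other values are illustrative only):

        maxdone.scale = 19
        qlen_goal.scale = 12
        hash_max_probes = 10
        hash_ttl = 120
        hash_mlt = 5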
    • trackrdrd: added signal handlers for HUP · d45589c6
      Geoff Simmons authored
  2. 17 Dec, 2012 1 commit
  3. 07 Dec, 2012 1 commit
  4. 06 Dec, 2012 4 commits
  5. 05 Dec, 2012 1 commit
  6. 04 Dec, 2012 3 commits
  7. 03 Dec, 2012 1 commit
  8. 30 Nov, 2012 5 commits
  9. 29 Nov, 2012 3 commits
  10. 28 Nov, 2012 5 commits
  11. 27 Nov, 2012 3 commits