rtcompare

package module
v0.5.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jan 4, 2026 License: MIT Imports: 9 Imported by: 1

README

rtcompare

Go Report Card Go Reference Linter Tests coverage

Statistically significant runtime comparison for codepaths in golang

rtcompare is a small, focused Go library for robust runtime or memory measurement comparisons and lightweight benchmarking. It provides utilities to collect timing samples, compare sample distributions using bootstrap techniques, and helper primitives (deterministic PRNG, sample timing helpers, small statistics utilities). The project is intended as a practical alternative to the standard testing benchmarking harness when you want reproducible, distribution-aware comparisons and confidence estimates for relative speedups.

Keywords: benchmarking, performance, bootstrap, runtime comparison, statistics, deterministic prng, go

Features

  • Collect per-run timing or memory consumption samples for two implementations and compare their distributions.
  • Compute confidence that implementation A is faster or less memory consuming than B by at least a given relative gain using bootstrap resampling.
  • Deterministic DPRNG for reproducible input generation.
  • Timing helpers (SampleTime, DiffTimeStamps) and small statistics utilities (mean, median, stddev).
  • Small, dependency-light API suitable for integration into CI and micro-benchmarks.
  • CPRNG — a new cryptographically secure PRNG backed by crypto/rand for scenarios that require cryptographic strength or unpredictable inputs (see API highlights).

Install

Use as a normal Go module dependency:

go get github.com/TomTonic/rtcompare

Import:

import "github.com/TomTonic/rtcompare"

Quickstart example

This example demonstrates how to collect timing samples for two implementations candidate A and candidate B and compare them (see cmd/rtcompare-example/main.go for a full runnable example).

import (
    "fmt"
    "math/rand"

    "github.com/TomTonic/rtcompare"
)

func example() {
    // generate some timing samples for two functions
    var timesA, timesB []float64
    for i := 0; i < 50; i++ {
        // set up inputs deterministically using DPRNG
        dprng := rtcompare.NewDPRNG()
        // measure repeatedly to reduce quantization noise
        t1 := rtcompare.SampleTime()
        for j := 0; j < 2000; j++ {
            // call candidate A
            // use dprng with constant runtime if necessary
        }
        t2 := rtcompare.SampleTime()
        timesA = append(timesA, float64(rtcompare.DiffTimeStamps(t1, t2))/2000.0)

        // ... same for candidate B ...
    }

    // Compare distributions using bootstrap (precision controls bootstrap repetitions)
    speedups := []float64{0.1, 0.2, 0.5, 1.0} // relative speedups in % to test
    // use the package default resamples or provide a numeric value
    results, err := rtcompare.CompareSamplesDefault(timesA, timesB, speedups)
    if err != nil {
        panic(err)
    }
    for _, r := range results {
        fmt.Printf("Speedup ≥ %.0f%% → Confidence %.2f%%\n", r.RelativeSpeedupSampleAvsSampleB*100, r.Confidence*100)
    }
}

Technical background

  • Bootstrap-based inference: Instead of reporting a single sample mean or relying on the testing harness, rtcompare collects timing samples across independent runs and uses bootstrap resampling to estimate the confidence that one implementation is faster than another by at least a given relative margin. This yields more informative, distribution-aware results (confidence intervals and probability estimates).

  • Deterministic input generation: DPRNG is provided to seed and generate reproducible inputs across runs, helping reduce input variance when comparing implementations. For cases that require cryptographic strength or unpredictable inputs (for example, testing code that must handle cryptographic-quality randomness), rtcompare now provides CPRNG, a crypto/rand-backed PRNG. Use DPRNG when you need deterministic, repeatable, extremely fast inputs; use CPRNG when you need cryptographic unpredictability or higher entropy.

  • Noise reduction: The example shows how to warm up, use multiple inner iterations per timing sample to reduce quantization noise, and manually trigger GC cycles to reduce interference from allocations.

When to use rtcompare instead of testing.B

Use rtcompare when you want:

  • Distribution-aware comparisons rather than single-number reports.
  • Statistical confidence estimates for relative speedups (e.g., "A is at least 20% faster than B with 95% confidence").
  • A library you can easily call from small programs, CI jobs, or dedicated comparison tools without the testing harness.

The standard testing package is excellent for microbenchmarks and tight per-op measurements. rtcompare complements it by focusing on sampling strategy, resampling inference, and reproducible comparisons across implementations.

API highlights

  • DPRNG — deterministic PRNG with Uint64 and Float64 helpers.
  • CPRNG — cryptographically secure PRNG backed by crypto/rand. Provides the same convenience helpers (Uint64, Float64) as DPRNG but yields cryptographic-strength randomness; not deterministic across runs.
  • SampleTime() / DiffTimeStamps() — helpers for high-resolution timing.
  • CompareSamples(timesA, timesB, speedups, resamples) — returns confidence estimates per requested relative speedup. Use rtcompare.DefaultResamples or the convenience wrapper rtcompare.CompareSamplesDefault for a sensible default.
  • QuickMedian — returns the median of a Float64 slice in expected O(n) time.

Note on negative relativeGains: Negative thresholds are allowed and are interpreted as tolerated relative slowdowns rather than speedups. A threshold of -0.05 means "A is within 5% of B" (i.e., A is not more than 5% slower than B). Use negative values when you want to ask whether one implementation is approximately as fast as another within a relative tolerance instead of requiring a strict speedup.

Choosing resamples

The number of bootstrap resamples controls the Monte‑Carlo error of the confidence estimates. Common recommendations from the bootstrap literature (Efron & Tibshirani; Davison & Hinkley) are:

  • Use at least 1,000 resamples for reasonable standard-error estimation.
  • Use 5,000–10,000 resamples when estimating percentile confidence intervals or when you need stability in tails.

The Monte‑Carlo standard error of a proportion estimated from resamples decreases approximately as 1/sqrt(R) where R is the number of resamples. Increase resamples when you require low Monte‑Carlo noise (for example, precise reporting of extreme thresholds). See Efron & Tibshirani (1993) and Davison & Hinkley (1997) for more details.

See the package docs and the example in cmd/rtcompare-example for detailed usage.

Contributing

Contributions welcome. Please open issues or PRs for bug reports, performance tweaks, or additional comparison strategies. Add tests for any behavioral changes and keep CPU/memory overheads small.

License

MIT — see LICENSE file.

Documentation

Index

Constants

View Source
const DefaultResamples uint64 = 5_000

DefaultResamples is a sensible package-level default for bootstrap resamples. Use this when you want a balanced trade-off between Monte-Carlo precision and runtime cost. This default (5k) follows common recommendations in the bootstrap literature; increase it for extreme-tail accuracy or highly precise confidence estimates.

View Source
const MinimumDataPoints uint64 = 11

Variables

This section is empty.

Functions

func BootstrapConfidence added in v0.2.0

func BootstrapConfidence(A, B []float64, relativeGains []float64, resamples uint64, prngSeed uint64) (confidenceForThreshold map[float64]float64)

BootstrapConfidence estimates the probability (confidence) that the relative speedup of A over B meets or exceeds each requested threshold using bootstrap resampling.

The function performs `resamples` bootstrap replicates. In each replicate it draws a bootstrap sample from A and from B (via bootstrapSample), computes their medians and evaluates the relative speedup as:

delta = 1 - median(A_sample)/median(B_sample)

A positive delta indicates A is faster than B by that relative amount. For every threshold t in `relativeGains` the function increments a counter when delta >= t. After all replicates it returns a map that maps each threshold to the estimated confidence (fraction of replicates meeting delta >= t).

Numerical and edge-case behavior (important):

  • If `resamples` is zero the function returns a map with each threshold mapped to math.NaN().
  • If either sample median is NaN (for example QuickMedian returned NaN for an empty sample), the replicate produces delta = NaN and that replicate does not count as meeting any threshold.
  • To avoid divide-by-zero and extreme ratios when median(B_sample) == 0 (or is numerically extremely small), the implementation uses a small, scale-aware epsilon fallback. Concretely it chooses an epsilon = max(|median(B)| * rel, SmallestNonzeroFloat64) with a small relative factor (e.g. rel = 1e-12). If |median(B)| < epsilon the code uses epsilon as the denominator. This guarantees a finite, bounded delta while preserving the correct ratio for typical non-zero medians.
  • If both medians are zero (or both are equal/infinite in the same direction), the replicate sets delta = 0.0 (no relative difference).

Parameters:

  • A, B: observed samples (e.g. runtimes or throughputs) used as the population for bootstrap sampling.
  • relativeGains: slice of relative-speedup thresholds to evaluate (e.g. 0.05 for 5% faster).
  • resamples: number of bootstrap resamples to run (the greater the resamples, the lower the Monte Carlo sampling error).
  • prngSeed: DPRNG seed used for reproducible sampling. Provide a specific non-zero seed to reproduce results across runs. If prngSeed is zero, the function uses a CPRNG with cryptographic strength randomness.

Note on choosing `resamples` (literature guidance): There is no one-size-fits-all value; common recommendations in the bootstrap literature (Efron & Tibshirani; Davison & Hinkley) are to use at least 1,000 resamples for standard-error estimation and often 5,000–10,000 (or more) when estimating percentile confidence intervals, especially for tail probabilities. The Monte Carlo error of a proportion estimated from resamples decreases approximately as 1/sqrt(R) where R is the number of resamples, so doubling `resamples` reduces that error by about 1/sqrt(2). For many practical uses `resamples` in the range 1,000–10,000 is a reasonable default; increase it when you need precise confidence estimates near extreme thresholds or when you require reproducible low-variance results.

Returns:

A map[float64]float64 where each key is a threshold from `thresholds` and the corresponding value is
the estimated confidence in [0,1] that the relative speedup of A over B is at least that threshold.

func DiffTimeStamps

func DiffTimeStamps(t_earlier, t_later TimeStamp) int64

Retruns the difference between two timestams in nanoseconds with the highest possible precision (which might be more than just one nanosecond). The function assumes that t_later is later than t_earlier and will return a negative value if this is not the case. Please note that the call to this function does NOT have constant runtime on other systems but Windows.

func F2T added in v0.4.0

func F2T(timesFaster float64) float64

F2T (FactorToThreshold) converts a multiplicative speedup timesFaster (e.g. 3.0 => A is 3× faster) to the internal relative‑reduction threshold used by CompareSamples and BootstrapConfidence.

func FloatsEqualWithTolerance

func FloatsEqualWithTolerance(f1, f2, tolerancePercentage float64) bool

FloatsEqualWithTolerance reports whether f1 and f2 are approximately equal, using a percentage-based absolute tolerance computed from each operand.

The tolerancePercentage parameter is interpreted as a percentage (for example, 5 means 5%). For each value v the function computes

absTol = |v * tolerancePercentage / 100|

and checks whether the other value lies within [v - absTol, v + absTol]. The function returns true if either check succeeds (i.e. if f2 is within the tolerance computed from f1 OR f1 is within the tolerance computed from f2).

Important notes:

  • The comparison uses absolute tolerances derived from the operands, which makes the check effectively two-sided: a small value may be considered equal to a much larger one if the larger value's tolerance range contains the smaller value.
  • A tolerancePercentage of 0 requires exact equality.
  • Negative tolerancePercentage values are treated equivalently to their absolute value because the computed tolerance is wrapped with math.Abs.
  • Comparisons involving NaN follow IEEE754 semantics and will not be true; interactions with ±Inf follow IEEE754 and may produce true results when a computed tolerance range is infinite.
  • This function performs simple arithmetic checks and returns a boolean.

func GenerateScrambler added in v0.5.0

func GenerateScrambler() uint64

GenerateScrambler generates reasonable scrambler constants for the DPRNG. The generated scrambler constant is always an odd number with a good bit density. This ensures maximal period and good mixing properties. You can use the returned scrambler constant when creating a new DPRNG instance to get a different permutation (i.e., sequence) of generated numbers.

func GetSampleTimePrecision

func GetSampleTimePrecision() int64

Returns the precision of time measurements obtained via SampleTime() on the runtime system in nanoseconds. Should return 100ns on Windows systems, and typically between 20ns and 100ns on Linux and MacOS systems.

func Median

func Median(data []float64) float64

Median computes the median of the provided slice of float64. If data is empty, Median returns 0.0. The function makes a copy of the input and sorts the copy, so the original slice is not modified. For an odd-length slice it returns the middle element; for an even-length slice it returns the element at index len(data)/2 (the upper middle). Time complexity: O(n log n). Space complexity: O(n) due to the copy required for sorting.

func QuickMedian

func QuickMedian(xs []float64) float64

QuickMedian returns the median in expected O(n) time. In case of an odd number of elements, it returns the middle one. In case of an even number of elements, it returns the higher of the two middle ones. Returns math.NaN() for an empty input slice. Note: This function modifies the input array. To avoid this, pass a copy.

func Statistics

func Statistics(data []float64) (mean, variance, stddev float64)

Statistics computes the arithmetic mean, population variance, and standard deviation of the provided slice of float64 values.

It returns three float64s in order: mean, variance, and stddev.

The variance returned is the population variance (sum of squared deviations divided by n). The standard deviation is the square root of that variance.

If the input slice is empty, the function returns mean = 0 and variance = stddev = -1 to indicate that the values are undefined for an empty dataset.

Types

type CPRNG added in v0.4.0

type CPRNG struct {
	// contains filtered or unexported fields
}

CPRNG is a cryptographically secure random number generator ("CryptographicPrecisionRNG") that reads random bytes in batches to reduce the number of calls to the underlying crypto/rand.Reader (OS call). This improves performance while maintaining security and provides high-precision output suitable for statistical and numerical work. This RNG is thread-safe as long as each goroutine uses its own instance. The memory footprint can be adjusted by changing the capBytes parameter in NewCPRNG.

func NewCPRNG added in v0.4.0

func NewCPRNG(capBytes uint32) *CPRNG

NewCPRNG creates a new CPRNG with a buffer capacity of capBytes. The buffer is filled with random bytes upon creation and refilled as needed. A larger buffer reduces the number of operating system calls to crypto/rand.Reader, improving performance. A smaller buffer reduces memory usage. This random number generator is not deterministic in the sequence of numbers it generates. This random number generator is not deterministic in its runtime (i.e., it does not have a constant runtime as it needs to periodically refill its buffer via crypto/rand.Reader, an OS call). This random number generator is cryptographically secure (relying on crypto/rand, see https://pkg.go.dev/crypto/rand). This random number generator is thread-safe as long as each goroutine uses its own instance. This random number generator has a varying memory footprint (usually a few kilobytes).

func (*CPRNG) Float32 added in v0.4.0

func (c *CPRNG) Float32() float32

Float32 returns a uniformly distributed float32 in [0.0, 1.0). This function will never return -0.0. This function will never return 1.0. This function will never return NaN or Inf. If you need random values in a different range, scale and shift the result accordingly. This function uses 23 random bits for the mantissa. This is the maximum randomness that can be represented in a float32 without breaking uniformity. If you need more randomness, use Float64 instead. See: https://en.wikipedia.org/wiki/Single-precision_floating-point_format

func (*CPRNG) Float64 added in v0.4.0

func (c *CPRNG) Float64() float64

Float64 returns a uniformly distributed float64 in [0.0, 1.0). This function will never return -0.0. This function will never return 1.0. This function will never return NaN or Inf. If you need random values in a different range, scale and shift the result accordingly. This function uses 52 random bits for the mantissa. This is the maximum randomness that can be represented in a float64 without breaking uniformity. See: https://en.wikipedia.org/wiki/Double-precision_floating-point_format

func (*CPRNG) Int16 added in v0.4.0

func (c *CPRNG) Int16() int16

Int16 returns a uniformly distributed int16.

func (*CPRNG) Int32 added in v0.4.0

func (c *CPRNG) Int32() int32

Int32 returns a uniformly distributed int32.

func (*CPRNG) Int64 added in v0.4.0

func (c *CPRNG) Int64() int64

Int64 returns a uniformly distributed int64.

func (*CPRNG) Int8 added in v0.4.0

func (c *CPRNG) Int8() int8

Int8 returns a uniformly distributed int8.

func (*CPRNG) Uint16 added in v0.4.0

func (c *CPRNG) Uint16() uint16

Uint16 returns a uniformly distributed uint16.

func (*CPRNG) Uint32 added in v0.4.0

func (c *CPRNG) Uint32() uint32

Uint32 returns a uniformly distributed uint32.

func (*CPRNG) Uint32N added in v0.4.0

func (c *CPRNG) Uint32N(n uint32) uint32

Uint32N returns a non-negative pseudo-random number in the half-open interval [0,n). Use this function for generating random indices or sizes for slices or arrays, for example. Even though this function will probably not be inlined by the compiler, it has a very efficient implementation avoiding division or modulo operations. This function compensates for bias. For n=0 and n=1, Uint32N returns 0.

For implementation details, see:

https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction
https://lemire.me/blog/2016/06/30/fast-random-shuffling

func (*CPRNG) Uint64 added in v0.4.0

func (c *CPRNG) Uint64() uint64

Uint64 returns a uniformly distributed uint64.

func (*CPRNG) Uint8 added in v0.4.0

func (c *CPRNG) Uint8() uint8

Uint8 returns a uniformly distributed uint8.

type DPRNG

type DPRNG struct {
	State     uint64
	Scrambler uint64
	Round     uint64 // for debugging purposes
}

DPRNG is a Deterministic Pseudo-Random Number Generator based on the xorshift* algorithm (see https://en.wikipedia.org/wiki/Xorshift#xorshift*). This random number generator is by design deterministic in the sequence of numbers it generates. It has a period of 2^64-1, i.e. every single number occurs every 2^64-1 calls and has the same successor and the same predecessor. This random number generator is deterministic in its runtime (i.e., it has a constant runtime). This random number generator is not cryptographically secure. This random number generator is thread-safe as long as each goroutine uses its own instance. This random number generator has a very small memory footprint (24 bytes). The initial state must not be zero.

func NewDPRNG added in v0.1.1

func NewDPRNG(seed ...uint64) DPRNG

NewDPRNG creates a new Deterministic Pseudo-Random Number Generator instance. If no seed is provided, it initializes the state with a random non-zero value. If the provided seed is zero, it initializes the state with a random non-zero value. Otherwise, it uses the provided seed value. If a second uint64 value is provided, it is used as the scrambler constant. The scrambler constant creates a permutation (different sequence) of the generated numbers. If no scrambler constant is provided, Vigna's default scrambler constant is used. The only requirement for the scrambler constant is that it must be an odd number to ensure maximal period, but this is automatically enforced by the code. You can use GenerateScrambler to generate good quality scrambler constants.

func (*DPRNG) Float64

func (thisState *DPRNG) Float64() float64

Float64 returns a pseudo-random float64 in the range [0.0, 1.0) like Go’s math/rand.Float64(). It has a deterministic (i.e. constant) runtime and a high probability to be inlined by the compiler. The generated float64 values are uniformly distributed in the range [0.0, 1.0) with the effective precision of 53 bits (IEEE 754 compliant).

func (*DPRNG) UInt32N added in v0.2.0

func (thisState *DPRNG) UInt32N(n uint32) uint32

UInt32N returns a pseudo-random uint32 in the range [0, n) like Go’s math/rand.Intn(). Use this function for generating random indices or sizes for slices or arrays, for example. This code avoids modulo arithmetics by implementing Lemire's fast alternative to the modulo reduction method (see https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/). It has a deterministic (i.e. constant) runtime and a high probability to be inlined by the compiler. Note: This implementation may introduce a slight bias if n is not a power of two.

func (*DPRNG) Uint64

func (thisState *DPRNG) Uint64() uint64

This function returns the next pseudo-random number in the sequence. It has a deterministic (i.e. constant) runtime and a high probability to be inlined by the compiler.

type RTcomparisonResult

type RTcomparisonResult struct {
	// RelativeSpeedupSampleAvsSampleB is the relative speedup threshold that was evaluated.
	RelativeSpeedupSampleAvsSampleB float64
	// Confidence is the estimated confidence (in [0,1]) that the relative speedup of sample A over sample B
	// meets or exceeds RelativeSpeedupSampleAvsSampleB.
	Confidence float64
}

RTcomparisonResult holds the result of comparing two sets of runtime measurements. For each requested relative speedup threshold it contains the estimated confidence that the speedup of sample A over sample B meets or exceeds that threshold.

func CompareRuntimes deprecated

func CompareRuntimes(measurementsA, measurementsB []float64, relativeGains []float64, resamples uint64) (result []RTcomparisonResult, err error)

Deprecated: Use CompareSamples instead. This function is retained for backward compatibility.

func CompareSamples added in v0.3.0

func CompareSamples(measurementsA, measurementsB []float64, relativeGains []float64, resamples uint64) (result []RTcomparisonResult, err error)

CompareSamples compares two sets of scalar measurements (for example: runtimes, memory footprints, or other numeric metrics) and estimates the confidence that values from `measurementsA` are smaller than those from `measurementsB` by at least the requested relative thresholds.

The function is intentionally metric-agnostic: it treats each input slice as a sample of independent measurements where *smaller* values indicate a better outcome (this matches runtimes or memory consumption). If you have a "larger-is-better" metric (e.g., throughput), transform the inputs before calling this function (for example by taking the reciprocal or negating the values) so that smaller means better.

For each bootstrap replicate the implementation draws a resampled population from `measurementsA` and `measurementsB`, computes their medians and evaluates the relative improvement as:

delta = 1 - median(A_sample)/median(B_sample)

A positive `delta` indicates that `measurementsA` are smaller than `measurementsB` by that relative fraction (e.g. delta=0.2 → A is 20% smaller). For each requested relative gain the function reports the fraction of replicates where `delta >= threshold` as the confidence.

Parameters:

  • measurementsA, measurementsB: samples of scalar measurements (float64). Prefer measurements that share the same units and scale (e.g., both in milliseconds).

  • relativeGains: relative improvement thresholds to evaluate (e.g. 0.05 means "A is at least 5% smaller than B"). If nil or empty, the function evaluates a single relative gain at 0.0 (is A smaller than B at all?).

    Negative values in `relativeGains` are allowed and are interpreted as tolerated relative *slowdowns* of A vs. B. Concretely, a threshold `t < 0` is evaluated as `delta >= t` where `delta = 1 - median(A)/median(B)`. For example, `t = -0.05` corresponds to the statement "A is not more than 5% slower than B" (i.e., A is within 5% of B). A replicate with `delta = -0.03` would count as meeting `t = -0.05` because `-0.03 >= -0.05`.

    Use negative thresholds when you want to ask whether A is *within* a relative tolerance of B rather than strictly faster. Zero remains the boundary "is A smaller than B?" and positive thresholds require A to be faster by at least that relative fraction.

  • resamples: number of bootstrap resamples to run (larger → more precise estimates, longer runtime). See the note in `BootstrapConfidence` for guidance and literature references about choosing the number of resamples.

Returns a slice of RTcomparisonResult where each entry contains the requested relative threshold and the corresponding confidence in [0,1]. If either input contains fewer than `MinimumDataPoints` values an error is returned.

func CompareSamplesDefault added in v0.3.0

func CompareSamplesDefault(measurementsA, measurementsB []float64, relativeGains []float64) (result []RTcomparisonResult, err error)

CompareRuntimesDefault calls CompareRuntimes using `DefaultResamples`. This convenience wrapper avoids repeating the numeric literal in callers and documents the recommended default in the public API.

type TimeStamp

type TimeStamp = time.Time

A relative TimeStamp with the highest possible precision on the current runtime system. The values aren't comparable between computer restarts or between computers. They are only comparable on the same computer between two calls to SampleTime() within the same runtime of a program.

func SampleTime

func SampleTime() TimeStamp

SampleTime returns a timestamp with the highest possible precision on the current runtime system.

Directories

Path Synopsis
cmd

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL