The four sampling strategies
Every replay deployment makes one of these four choices, explicitly or by default:
- <strong>Full capture (100%):</strong> Record every session. Maximum debug coverage, maximum cost.
- <strong>Random sampling (X%):</strong> Record a uniform random share. Lowers cost; preserves statistical validity for aggregate metrics; loses individual-session debug coverage.
- <strong>Conditional capture:</strong> Record when a condition triggers: an error fires, the user belongs to a specific segment, or the page is flagged. Low cost on routine traffic, full coverage on debug-worthy traffic.
- <strong>Error-only:</strong> Record only sessions where a JS error or specific event fires. Cheapest; only useful for error-driven debugging; loses silent-failure detection.
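To make the four choices concrete, here is a minimal sketch of the decision each one boils down to. The `Strategy` and `SessionInfo` types and the `shouldRecord` function are hypothetical names invented for illustration, not any vendor's API; the `roll` parameter stands in for a uniform random number so the logic can be tested deterministically.

```typescript
// Hypothetical types for illustration only.
interface SessionInfo {
  plan: "free" | "pro";
  page: string;
  hadError: boolean;
}

type Strategy =
  | { kind: "full" }
  | { kind: "random"; rate: number } // rate in [0, 1]
  | { kind: "conditional"; rateFor: (s: SessionInfo) => number }
  | { kind: "error-only" };

// Returns true when the session should be recorded.
// `roll` is a uniform random number in [0, 1).
function shouldRecord(
  strategy: Strategy,
  session: SessionInfo,
  roll: number = Math.random()
): boolean {
  switch (strategy.kind) {
    case "full":
      return true; // every session
    case "random":
      return roll < strategy.rate; // uniform share of sessions
    case "conditional":
      return roll < strategy.rateFor(session); // rate depends on the session
    case "error-only":
      return session.hadError; // only sessions where an error fired
  }
}
```

Note that conditional capture is the general case here: full capture is a rule that always returns 1, and random sampling is a rule that always returns the same rate.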
When 100% capture is right
Default to 100% for these cases:
- <strong>You're below ~50K sessions/month.</strong> The cost difference between 100% and 10% is negligible; the debug-coverage difference is huge.
- <strong>You debug silent failures.</strong> A user hits a confusing UX path and leaves without an error. Random sampling means you might miss the example; 100% guarantees you have it.
- <strong>You care about cohort analysis.</strong> "Show me every Pro-plan user who hit our checkout flow this week." Random sampling forces statistical adjustments; 100% gives clean cohorts.
- <strong>You generate tests from sessions.</strong> Auto-generated Playwright specs need the actual user flow; missing the right session means no test.
- <strong>Your traffic spikes are unpredictable.</strong> A fixed sampling rate discards most of the sessions in a spike, and spike traffic is the most likely to expose bugs. 100% captures every spike.
When random sampling (10-25%) makes sense
Switch to random sampling when:
- <strong>You're at very high scale (5M+ sessions/month)</strong> and replay storage dominates infra cost. At this scale, 10% sampling preserves statistical validity for aggregate metrics while cutting replay storage roughly tenfold.
- <strong>Your debug workflow is aggregate-friction-driven, not individual-session-driven.</strong> Heatmaps and friction scores work fine on 10-25% samples (with the appropriate confidence intervals).
- <strong>You haven't bought into AI workflows yet.</strong> AI ticket drafting and test generation benefit from the specific session that triggered the issue; if you're not using those, the cost-coverage trade-off shifts toward sampling.
When conditional capture wins
Conditional capture is usually the right choice for mid-market teams (100K-1M sessions/month):
- <strong>Record-on-error:</strong> Default to 100% capture but only persist sessions where a JS error, network 5xx, or rage-click fires. Storage cost drops 80-95%; debug coverage stays high because every error-bearing session is preserved.
- <strong>Record-on-segment:</strong> Always capture Pro-tier users, paying customers, or specific feature-flag cohorts. Sample free-tier traffic at 10%.
- <strong>Record-on-page:</strong> Capture every checkout session at 100%; sample marketing-page traffic at 5%.
- <strong>Record-on-velocity:</strong> Capture sessions with high friction signals (multiple rage clicks, repeated form errors) at 100%; sample others.
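The segment, page, and friction rules above can be combined into a single rate function. The sketch below is one way to do that, under stated assumptions: the `CaptureContext` shape, the `/checkout` and `/blog` path prefixes, and the threshold of three rage clicks or form errors are all hypothetical values chosen for illustration. Only the rates (100%, 10%, 5%) come from the rules above.

```typescript
// Hypothetical context shape; a real SDK may expose different fields.
interface CaptureContext {
  plan: "free" | "pro";
  path: string;
  rageClicks: number;
  formErrors: number;
}

// Assumed threshold for "high friction"; tune to your own data.
const FRICTION_THRESHOLD = 3;

// Returns the capture probability for a session, in [0, 1].
function sampleRateFor(ctx: CaptureContext): number {
  // Record-on-segment: always capture Pro-tier users.
  if (ctx.plan === "pro") return 1;
  // Record-on-page: capture every checkout session.
  if (ctx.path.startsWith("/checkout")) return 1;
  // Record-on-velocity: capture sessions with high friction signals.
  if (ctx.rageClicks >= FRICTION_THRESHOLD || ctx.formErrors >= FRICTION_THRESHOLD) {
    return 1;
  }
  // Marketing pages (assumed to live under /blog here) are sampled at 5%.
  if (ctx.path.startsWith("/blog")) return 0.05;
  // Everything else on the free tier is sampled at 10%.
  return 0.1;
}
```

The rules are ordered so that any reason to capture at 100% wins over a lower page-level rate. Record-on-error is not shown because it is a persist decision made after capture rather than a rate chosen up front.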
When error-only sampling is too restrictive
Error-only is the cheapest strategy but has a hidden cost: it eliminates the silent-failure use case. Examples it would miss:
- A user who clicks the wrong button, gets confused, and leaves; no error fires.
- A funnel drop-off where the page rendered correctly but the user abandoned.
- A pricing-page session where the user spent 2 minutes comparing tiers and left without converting.
- A support-search session where the user couldn't find the answer.
The statistical validity question
For aggregate metrics (heatmaps, friction scores, conversion rates), random sampling at 10% is fine: the sampling error is roughly ±1-2 percentage points on most metrics for typical traffic volumes. For debugging a specific session ("what did this user do?"), sampling fails by definition; you either have the session or you don't. For cohort analysis (Pro-tier checkout flow), you need 100% on the cohort but can sample everything else. The right strategy is rarely uniform across all traffic.
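As a rough check on the sampling-error figure, the standard margin of error for a proportion can be computed directly. This is a sketch that assumes simple random sampling and the normal approximation at 95% confidence (z = 1.96); the 100,000 sessions/month volume is an illustrative number, and `marginOfError` is a name made up for this example.

```typescript
// 95% margin of error for a proportion `p` estimated from `n` sampled sessions.
// Assumes simple random sampling and the normal approximation.
function marginOfError(p: number, n: number, z: number = 1.96): number {
  return z * Math.sqrt((p * (1 - p)) / n);
}

// Illustrative volume: 100,000 sessions/month sampled at 10%.
const sampledSessions = 100000 * 0.1;

// p = 0.5 is the worst case for a proportion.
const worstCaseMargin = marginOfError(0.5, sampledSessions);
console.log(worstCaseMargin.toFixed(4)); // prints "0.0098", about ±1 percentage point
```

The margin shrinks with the square root of the sample size, so metrics computed on a small slice of the sample (one page, one segment) carry a wider interval than the site-wide figure.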
Sampling and privacy interact
A subtle interaction: sampling reduces the number of users whose data is processed, but it doesn't change the lawful basis you need for the users it does record. If you're relying on legitimate interest under GDPR Art. 6(1)(f), the lawful basis applies to all users; sampling doesn't change it. If you're relying on consent and only a subset of users consented, you must check the consent flag before sampling decides whether to capture, never the other way around.
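The ordering matters most in the consent case, and it is easy to express in code. Here is a minimal sketch, assuming a three-state consent flag; the `Consent` type and `shouldCapture` function are hypothetical names, not part of any SDK.

```typescript
// Hypothetical consent states for illustration.
type Consent = "granted" | "denied" | "unknown";

// Consent is checked first; the sampling roll only runs for consenting users.
// `roll` is a uniform random number in [0, 1).
function shouldCapture(
  consent: Consent,
  sampleRate: number,
  roll: number = Math.random()
): boolean {
  if (consent !== "granted") {
    return false; // no consent means no capture, regardless of the sample rate
  }
  return roll < sampleRate;
}
```

With this ordering, a 100% sample rate still records nothing for a user who has not consented.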
How Relyv handles sampling
Relyv supports all four strategies via the SDK init options:
init({
  apiKey: '...',
  // Always capture: omit `sample`
  // Random: sample: 0.1 (10%)
  // Conditional: sample: (ctx) => ctx.user?.plan === 'pro' ? 1 : 0.1
  // Error-only: capture: 'on-error'
});
Captured-but-not-persisted sessions are buffered client-side; the persist decision can be made retroactively when an error fires (capturing the lead-up to the error from the buffer). All four strategies respect the same consent + masking pipeline.
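The retroactive persist idea can be illustrated with a small time-windowed buffer. This is a generic sketch of the technique, not Relyv's implementation: the `LeadUpBuffer` class, the `ReplayEvent` shape, and the `persist` callback are all hypothetical, and the window length is an assumption.

```typescript
// Hypothetical event shape: a timestamp in milliseconds plus an event type.
interface ReplayEvent {
  t: number;
  type: string;
}

// Keeps only the most recent `windowMs` of events in memory and hands them
// to `persist` when an error fires.
class LeadUpBuffer {
  private events: ReplayEvent[] = [];
  private readonly windowMs: number;
  private readonly persist: (batch: ReplayEvent[]) => void;

  constructor(windowMs: number, persist: (batch: ReplayEvent[]) => void) {
    this.windowMs = windowMs;
    this.persist = persist;
  }

  // Record an event and drop anything older than the window.
  record(event: ReplayEvent): void {
    this.events.push(event);
    const cutoff = event.t - this.windowMs;
    this.events = this.events.filter((e) => e.t >= cutoff);
  }

  // Called when an error fires: persist the lead-up, then start fresh.
  onError(): void {
    this.persist(this.events);
    this.events = [];
  }
}
```

A session that never errors never calls `persist`, so its events are dropped as they age out of the window. That is where the storage saving comes from.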