Strategy guide

How to pick a session-replay sampling strategy.

Capturing 100% of sessions is the default, but storage, privacy, and budget all push the other way. Here's how to choose between full capture, random sampling, conditional capture, and error-only recording, with the trade-offs of each.

TL;DR
Default to 100% capture if your traffic and budget allow, because it eliminates "we don't have the session" debugging dead ends. Switch to random sampling (10-25%) when storage cost dominates and you only need aggregate friction trends. Switch to conditional capture (record on error or for specific user segments) when you want full sessions for debug-worthy traffic but can skip routine browsing. Error-only recording is the cheapest option but loses the silent-failure use case.

The four sampling strategies

Every replay deployment makes one of these four choices, explicitly or by default:

  • <strong>Full capture (100%):</strong> Record every session. Maximum debug coverage, maximum cost.
  • <strong>Random sampling (X%):</strong> Record a uniform random share. Lowers cost; preserves statistical validity for aggregate metrics; loses individual-session debug coverage.
  • <strong>Conditional capture:</strong> Record when a condition is met: an error fires, the user belongs to a specific segment, or the page is flagged. Best of both: low cost on routine traffic, full coverage on debug-worthy traffic.
  • <strong>Error-only:</strong> Record only sessions where a JS error or specific event fires. Cheapest; only useful for error-driven debugging; loses silent-failure detection.
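
To make the four options concrete, here is a minimal sketch of the keep-or-drop decision each one implies. It is illustrative only and is not how Relyv implements sampling: shouldRecord, the shape of the strategy object, and the session fields are hypothetical names.

```javascript
// Hypothetical helper: decides whether a session is kept under each strategy.
// `rand` is injectable so the decision can be tested deterministically.
function shouldRecord(strategy, session, rand = Math.random) {
  switch (strategy.kind) {
    case 'full':
      // Record every session.
      return true;
    case 'random':
      // Keep a uniform random share, e.g. rate = 0.1 for 10%.
      return rand() < strategy.rate;
    case 'conditional':
      // Always keep sessions that match the condition; sample the rest.
      return strategy.condition(session) || rand() < strategy.fallbackRate;
    case 'error-only':
      // Keep only sessions where an error fired.
      return session.hasError === true;
    default:
      throw new Error(`Unknown strategy: ${strategy.kind}`);
  }
}
```

For example, shouldRecord({ kind: 'conditional', condition: (s) => s.plan === 'pro', fallbackRate: 0.1 }, { plan: 'pro' }) always returns true, while a free-plan session is kept about 10% of the time.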

When 100% capture is right

Default to 100% for these cases:

  • <strong>You're below ~50K sessions/month.</strong> The cost difference between 100% and 10% is negligible; the debug-coverage difference is huge.
  • <strong>You debug silent failures.</strong> A user hits a confusing UX path and leaves without an error. Random sampling means you might miss the example; 100% guarantees you have it.
  • <strong>You care about cohort analysis.</strong> "Show me every Pro-plan user who hit our checkout flow this week." Random sampling forces statistical adjustments; 100% gives clean cohorts.
  • <strong>You generate tests from sessions.</strong> Auto-generated Playwright specs need the actual user flow; missing the right session means no test.
  • <strong>Your traffic spikes are unpredictable.</strong> Spike traffic is the most likely to expose bugs, and random sampling keeps only a fraction of it. 100% captures every session in the spike.

When random sampling (10-25%) makes sense

Switch to random sampling when:

  • <strong>You're at very high scale (5M+ sessions/month)</strong> and replay storage dominates infra cost. At this scale, 10% sampling preserves statistical validity for aggregate metrics while cutting replay storage roughly 10×.
  • <strong>Your debug workflow is aggregate-friction-driven, not individual-session-driven.</strong> Heatmaps and friction scores work fine on 10-25% samples (with the appropriate confidence intervals).
  • <strong>You haven't bought into AI workflows yet.</strong> AI ticket drafting and test generation benefit from the specific session that triggered the issue; if you're not using those, the cost-coverage trade-off shifts toward sampling.
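
One practical detail with random sampling is keeping the decision stable, so that a session is either in the sample or out of it for its whole lifetime. A common way to do that is to hash the session ID into a number between 0 and 1 and compare it with the rate. The sketch below uses the FNV-1a hash for this; it is a general technique, not a description of how Relyv samples, and inSample and the session ID format are hypothetical.

```javascript
// FNV-1a, 32-bit, applied to the string's UTF-16 code units.
// Returns an unsigned 32-bit integer.
function fnv1a(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  return hash >>> 0;
}

// Hypothetical helper: the same sessionId always gets the same answer for a given rate.
function inSample(sessionId, rate) {
  const bucket = fnv1a(sessionId) / 0x100000000; // map the hash to [0, 1)
  return bucket < rate;
}
```

Because the bucket depends only on the ID, raising the rate later keeps every session that was already in the sample and adds new ones on top.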

When conditional capture wins

Conditional capture is usually the right choice for mid-market teams (100K-1M sessions/month):

  • <strong>Record-on-error:</strong> Capture every session client-side, but persist only those where a JS error, network 5xx, or rage-click fires. Storage cost can drop 80-95%; debug coverage stays high because every error-bearing session is preserved.
  • <strong>Record-on-segment:</strong> Always capture Pro-tier users, paying customers, or specific feature-flag cohorts. Sample free-tier traffic at 10%.
  • <strong>Record-on-page:</strong> Capture every checkout session at 100%; sample marketing-page traffic at 5%.
  • <strong>Record-on-velocity:</strong> Capture sessions with high friction signals (multiple rage clicks, repeated form errors) at 100%; sample others.
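
The segment, page, and velocity rules above can be combined into one function that returns a sample rate per session. This is a sketch under assumptions: ctx.user?.plan appears in Relyv's own example further down, but ctx.path, ctx.rageClicks, ctx.formErrors, the URL prefixes, and the friction thresholds are hypothetical and would need to match whatever your SDK actually exposes.

```javascript
// Hypothetical rule set: returns the share of matching sessions to record (0 to 1).
function sampleRate(ctx) {
  // Record-on-segment: always capture Pro-tier users.
  if (ctx.user?.plan === 'pro') return 1;

  // Record-on-page: always capture checkout sessions.
  if (ctx.path?.startsWith('/checkout')) return 1;

  // Record-on-velocity: always capture sessions with strong friction signals.
  // The thresholds here are illustrative, not recommendations.
  if ((ctx.rageClicks ?? 0) >= 2 || (ctx.formErrors ?? 0) >= 3) return 1;

  // Marketing pages are sampled at 5%.
  if (ctx.path?.startsWith('/blog')) return 0.05;

  // Everything else, including free-tier traffic, is sampled at 10%.
  return 0.1;
}
```

Rules are checked from most to least valuable, so a Pro user on a marketing page is still recorded at 100%.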

When error-only sampling is too restrictive

Error-only is the cheapest strategy but has a hidden cost: it eliminates the silent-failure use case. Examples it would miss:

  • A user who clicks the wrong button, gets confused, leaves — no error fires.
  • A funnel drop-off where the page rendered correctly but the user abandoned.
  • A pricing-page session where the user spent 2 minutes comparing tiers and left without converting.
  • A support-search session where the user couldn't find the answer.

The statistical validity question

For aggregate metrics (heatmaps, friction scores, conversion rates), random sampling at 10% is usually fine: at typical traffic volumes, the sampling error on most metrics is about ±1-2%. For debugging a specific session ("what did this user do?"), sampling fails by definition; you either have the session or you don't. For cohort analysis (for example, the Pro-tier checkout flow), you need 100% of the cohort but can sample everything else. The right strategy is rarely uniform across all traffic.

Sampling and privacy interact

A subtle interaction: sampling reduces how many users have their data processed, but it does not change the lawful basis you need for the users you do record. If you rely on legitimate interest under GDPR Art. 6(1)(f), that basis applies to all users; sampling doesn't change it. If you rely on consent and only a subset of users consented, check the consent flag before the sampling decision is made, never the other way around.
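
The ordering matters enough to spell out. In the sketch below, the consent check runs first and the sampling roll only happens for users who may be recorded at all. decideCapture and its parameter names are hypothetical; this illustrates the ordering and is not legal guidance or a description of any SDK's internals.

```javascript
// Hypothetical gate: consent is checked before the sampling decision is made.
function decideCapture({ requiresConsent, hasConsent, rate }, rand = Math.random) {
  // Without the consent you rely on, never record, whatever the sample rate is.
  if (requiresConsent && !hasConsent) return false;

  // Only now roll the dice for sampling.
  return rand() < rate;
}
```

With this ordering, a 100% sample rate still records nothing for a user who has not consented.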

How Relyv handles sampling

Relyv supports all four strategies via the SDK init options:

init({
  apiKey: '...',
  // Always capture: omit `sample`
  // Random: sample: 0.1 (10%)
  // Conditional: sample: (ctx) => ctx.user?.plan === 'pro' ? 1 : 0.1
  // Error-only: capture: 'on-error'
});

Captured-but-not-persisted sessions are buffered client-side; the persist decision can be made retroactively when an error fires (capturing the lead-up to the error from the buffer). All four strategies respect the same consent + masking pipeline.
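
The buffer-then-persist idea can be sketched in a few lines. This is a simplified illustration, not Relyv's actual buffer: ReplayBuffer, the event shape, the maxEvents limit, and the upload callback are all hypothetical.

```javascript
// Hypothetical client-side buffer: holds the most recent events in memory and
// uploads them only once an error event arrives.
class ReplayBuffer {
  constructor(maxEvents, upload) {
    this.maxEvents = maxEvents; // how much lead-up to keep
    this.upload = upload;       // callback that sends a batch of events
    this.events = [];
    this.persisting = false;
  }

  record(event) {
    // After an error, the session is being persisted: send events straight through.
    if (this.persisting) {
      this.upload([event]);
      return;
    }
    // Otherwise keep only the most recent maxEvents events.
    this.events.push(event);
    if (this.events.length > this.maxEvents) this.events.shift();
    // An error flips the session to "persist" and flushes the lead-up.
    if (event.type === 'error') this.flush();
  }

  flush() {
    this.persisting = true;
    this.upload(this.events);
    this.events = [];
  }
}
```

A session with no error never calls upload, which is where the storage and bandwidth savings come from. The trade-off is that the lead-up is limited to whatever fits in the buffer.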

Frequently asked questions

What's the default sampling strategy in most replay tools?

100% capture is the default in Relyv, Hotjar, FullStory, LogRocket, and Microsoft Clarity. Sentry Replay defaults to 10% sampling with 100% capture on error. The 100% default reflects the trade-off that gives mid-market teams the most debug coverage for the money.

How does sampling affect AI features?

AI workflows that operate on individual sessions (bug ticket drafting from a specific captured session, test generation from a flow) need the specific session. Sampling reduces the chance that the AI has the right session. AI workflows that operate on aggregates (cross-session pattern detection, friction-cluster scoring) work fine with sampled data.

Can I switch sampling strategies later?

Yes. Sampling is a per-init config, not a deploy-time decision. Most teams start at 100% and reduce when they hit cost pressure. Sessions already captured stay as they were; only future sessions are affected.

Does sampling save bandwidth on the user's device?

Partially. The SDK still loads and instruments the page (those few KB are fixed). What sampling saves is the upload payload: sessions that aren't persisted aren't uploaded. For users on metered connections, that saving can be meaningful.

How do I sample only logged-in users?

Pass a function as the sample option: sample: (ctx) => ctx.user?.id ? 1 : 0. Or set the rate dynamically once you know the user's tier: relyv.setSample(0.1) for sampled capture, relyv.setSample(1) for full capture.

What sample rate gives me 95% confidence on conversion metrics?

It depends on your traffic volume and the effect size you're trying to detect. As a rough guide: for a 50K-session/month site detecting a ±2% conversion change, 10% sampling gives you ~95% confidence within ~1 week. For smaller effects, you need a higher sample rate or a longer observation window.
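
You can sanity-check that rule of thumb with the standard margin-of-error formula for a proportion. The sketch below assumes a 30-day month, a 10% baseline conversion rate, and the normal approximation, none of which come from your data, so treat the output as a ballpark. It gives the uncertainty on one measured conversion rate; comparing two periods against each other needs a wider margin.

```javascript
// Number of sampled sessions collected over `days`, assuming a 30-day month.
function sampledSessions(monthlySessions, rate, days) {
  return Math.round((monthlySessions / 30) * rate * days);
}

// Margin of error for a proportion p measured on n sessions.
// z = 1.96 corresponds to 95% confidence under the normal approximation.
function marginOfError(p, n, z = 1.96) {
  return z * Math.sqrt((p * (1 - p)) / n);
}

// 50K sessions/month, 10% sampling, one week of data.
const n = sampledSessions(50000, 0.1, 7);
// Assumed 10% baseline conversion rate.
const moe = marginOfError(0.1, n);
```

With these inputs, n is 1,167 sessions and the margin of error comes out at roughly ±1.7 percentage points, which is in line with the ±2% figure above. A higher baseline conversion rate widens the margin; more days or a higher sample rate narrows it.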
