The one-sentence answer
On-device PII masking is the practice of redacting personal data (names, emails, phone numbers, payment data, PII embedded in free text) inside the user's browser before the session-replay SDK transmits anything to the vendor, using regex, validation rules, CSS/attribute selectors, and optionally on-device machine learning. Server-side masking, which redacts after the data arrives at the vendor, is not equivalent and does not satisfy the data-minimisation principle of GDPR Article 5(1)(c).
Why server-side masking is not compliant
Server-side masking means the unredacted PII was transmitted to the vendor, processed by their infrastructure (even briefly), and only then redacted. Under GDPR Article 5(1)(c), personal data must be "adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed." The moment unredacted PII reaches the vendor, it has been processed on the controller's behalf, even if it's deleted within seconds. The ICO and CNIL both treat this as a data-minimisation violation. Vendors that claim "we mask everything server-side" are putting their customers at legal risk; the customer (data controller) is the party that gets fined.
The four-layer on-device masking stack
A compliant replay SDK stacks four mechanisms in order of specificity:
1. Regex on standard patterns
The first layer catches the obvious: email addresses (RFC 5322 with practical extensions), phone numbers (E.164 plus common national formats), national ID numbers (US Social Security numbers, UK National Insurance numbers, EU national IDs), and strings in free text that look like credentials (Bearer tokens, AWS access keys, GitHub tokens). The SDK runs these against every text node and attribute value before serialization. False-positive rate is the main tradeoff: an overly broad regex catches things that aren't PII (a 16-digit order number that happens to look like a credit card).
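As a rough illustration of this layer, the sketch below runs a handful of patterns over a string and swaps each match for a label. The pattern list, the `maskText` name, and the `[label]` placeholder format are assumptions made for the example, not the SDK's actual rules; the email pattern in particular is far looser than RFC 5322.

```javascript
// Illustrative patterns only; a production SDK would need broader coverage.
const PATTERNS = [
  // Simplified email shape (much looser than full RFC 5322).
  { label: 'email', regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  // E.164: "+" followed by 7 to 15 digits, first digit non-zero.
  { label: 'phone', regex: /\+[1-9]\d{6,14}\b/g },
  // US SSN written with dashes.
  { label: 'ssn', regex: /\b\d{3}-\d{2}-\d{4}\b/g },
  // HTTP bearer credentials.
  { label: 'bearer', regex: /\bBearer\s+[A-Za-z0-9\-._~+\/]+=*/g },
  // AWS access key IDs: "AKIA" plus 16 uppercase alphanumerics.
  { label: 'aws-key', regex: /\bAKIA[0-9A-Z]{16}\b/g },
];

// Apply every pattern in order and replace each match with its label.
function maskText(text) {
  return PATTERNS.reduce(
    (out, { label, regex }) => out.replace(regex, `[${label}]`),
    text
  );
}
```

A capture pipeline would call something like `maskText` on each text node and attribute value before serializing it. Strings that match no pattern, such as a short order number, pass through unchanged.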
2. Luhn validation for payment data
Credit-card numbers have a built-in checksum (Luhn algorithm). The SDK runs candidate 13-19 digit strings through Luhn before masking — only strings that pass become "card numbers" and get redacted. This reduces false positives dramatically vs naive regex (which would mask any 16-digit number). Same pattern works for IBAN (mod-97 check), CPF (Brazilian tax ID), and a handful of other validated formats.
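A minimal sketch of the validate-then-mask step might look like this. The candidate regex, the function names, and the `[card]` placeholder are assumptions for illustration; only the Luhn checksum itself is standard.

```javascript
// Standard Luhn check: double every second digit from the right,
// subtract 9 from any result above 9, and require sum % 10 === 0.
function passesLuhn(digits) {
  let sum = 0;
  let doubleNext = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

// Candidate: 13-19 digits, optionally separated by single spaces or dashes.
const CARD_CANDIDATE = /\b\d(?:[ -]?\d){12,18}\b/g;

// Mask only the candidates that pass the checksum; leave the rest alone.
function maskCardNumbers(text) {
  return text.replace(CARD_CANDIDATE, (match) => {
    const digits = match.replace(/[ -]/g, '');
    return passesLuhn(digits) ? '[card]' : match;
  });
}
```

Because Luhn uses a single check digit, roughly one in ten arbitrary digit strings still passes, so this layer cuts false positives sharply without eliminating them.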
3. Attribute and selector-based masking
For fields the regex can't identify by content alone (user-provided names, custom IDs, internal-only fields), the SDK supports explicit declarations:
<input data-relyv-mask placeholder="Internal customer ID">
<div data-relyv-mask>{{ customer.fullName }}</div>

Or a selector list passed to init():

init({
  apiKey: '...',
  mask: ['.pii', '[data-private]', 'input[name="ssn"]'],
});

Engineers add the markers at the source code level: once, in the component template, never duplicated per page.
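One way an SDK could combine the attribute marker with the `mask` selector list is sketched below. `buildMaskSelector` and `isMasked` are hypothetical helpers, not part of any documented API; the only real browser call used is `Element.closest()`.

```javascript
// The attribute marker is always honoured, whatever the init() config says.
const ALWAYS_MASK = '[data-relyv-mask]';

// Merge the built-in marker with the user's selector list into one selector.
function buildMaskSelector(userSelectors = []) {
  return [ALWAYS_MASK, ...userSelectors].join(', ');
}

// closest() tests the element itself and then each ancestor, so marking a
// container masks everything rendered inside it.
function isMasked(element, maskSelector) {
  return element.closest(maskSelector) !== null;
}
```

Checking ancestors rather than only the element itself is what lets a single marker on a component root cover every child node.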
4. Optional on-device ML model for free text
Free-text inputs (comments, support messages, search queries) can contain PII that regex won't catch, such as a user typing "my name is Sarah and I live at 123 Main St". A small on-device language model (like GLiNER, ~80 MB, built for NER tasks) can run in the browser via ONNX Runtime + WebGPU and identify named entities for masking. Trade-off: an 80 MB model download is too heavy for most sites, so this layer is usually enabled only via a browser extension or on specific high-risk pages (support forms, checkout review).
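Running the model is library-specific and is not shown here. Once it has produced entity spans, applying them is plain string work. The sketch below assumes a hypothetical result shape of `{ start, end, label }` with an exclusive `end`; that shape is an assumption for the example, not GLiNER's or ONNX Runtime's actual output format.

```javascript
// `entities` is assumed to be [{ start, end, label }] with `end` exclusive,
// as a hypothetical NER wrapper might return it.
function maskEntities(text, entities) {
  const sorted = [...entities].sort((a, b) => a.start - b.start);
  let out = '';
  let cursor = 0;
  for (const { start, end, label } of sorted) {
    if (start < cursor) {
      // Overlaps the previous span: fold it into that mask so no tail leaks.
      cursor = Math.max(cursor, end);
      continue;
    }
    out += text.slice(cursor, start) + `[${label}]`;
    cursor = end;
  }
  return out + text.slice(cursor);
}
```

With spans covering "Sarah" and "123 Main St", the sentence from the paragraph above would be serialized as "my name is [person] and I live at [address]".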
What good masking looks like in practice
A few patterns that distinguish well-built masking from naïve masking:
- <strong>Mask before serialization, not after.</strong> The DOM snapshot the SDK takes should already have redacted values — not raw values that get masked in the upload payload.
- <strong>Mask at multiple levels.</strong> Text nodes, attribute values, console.log arguments, network request bodies, network response bodies, query parameters. PII leaks through every channel.
- <strong>Preserve structure, not content.</strong> A masked email shows as <code>***@***</code>, not <code>***</code>; a masked credit card shows as <code>**** **** **** 1234</code> rather than being redacted entirely. Debuggability is preserved without exposing the full value.
- <strong>Mask before Web Worker handoff.</strong> If capture runs in a Web Worker, the data sent to the worker must already be masked — not masked inside the worker after transfer.
- <strong>Provide an "inspect-the-payload" tool.</strong> Engineers should be able to record a session locally, capture the raw upload payload, and verify nothing leaked. Vendors that can't show you the raw payload are hiding something.
- <strong>Default to mask-all on sensitive elements.</strong> Password fields, hidden inputs, autocomplete tokens — never captured at all, regardless of selectors.
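Two of the patterns above, structure-preserving output and masking beyond the DOM, can be sketched in a few lines. The function names and the list of sensitive query keys are assumptions for illustration; `URL` and `URLSearchParams` are the standard web APIs.

```javascript
// Keep only the "@" so the value still reads as an email address.
function maskEmail(value) {
  return value.includes('@') ? '***@***' : '***';
}

// Keep the last four digits, matching the card format in the list above.
function maskCard(value) {
  const digits = value.replace(/\D/g, '');
  return `**** **** **** ${digits.slice(-4)}`;
}

// Redact selected query parameters in a captured network URL while leaving
// the path and the other parameters readable.
function maskQueryParams(rawUrl, sensitiveKeys) {
  const url = new URL(rawUrl);
  for (const key of sensitiveKeys) {
    if (url.searchParams.has(key)) url.searchParams.set(key, '***');
  }
  return url.toString();
}
```

The same idea extends to request bodies and console arguments: the shape of the data survives for debugging while the sensitive value does not.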
What gets missed (and how to mitigate)
Even with all four layers, three failure modes are common:
- <strong>Images of PII.</strong> A user uploads an ID document or a screenshot containing personal data. The SDK can't parse pixels. Mitigation: don't capture <code><img src="..."></code> for user-uploaded content; substitute a placeholder.
- <strong>iframe content from third parties.</strong> Cross-origin iframes are opaque to the SDK; same-origin iframes need recursive masking. Mitigation: configure the SDK to treat unknown iframe origins as fully-masked.
- <strong>JavaScript that mutates the DOM after the snapshot.</strong> If a payment widget renders sensitive data into the DOM 200ms after page load, the MutationObserver picks it up and the masking layer needs to run on every mutation — not just the initial snapshot. Mitigation: verify the mutation-handler runs masking per change.
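For the third failure mode, the check to make is that every mutation record passes through the masker before it becomes a replay event. The sketch below assumes a hypothetical event shape and a `maskText` function supplied by the caller; the record fields (`type`, `target`, `attributeName`, `addedNodes`) are the standard `MutationRecord` properties.

```javascript
// Turn MutationObserver records into replay events, masking every value.
// `maskText` is whatever content masker the SDK uses (hypothetical here).
function serializeMutations(records, maskText) {
  const events = [];
  for (const record of records) {
    if (record.type === 'characterData') {
      // Text node edited in place.
      events.push({ kind: 'text', value: maskText(record.target.data) });
    } else if (record.type === 'attributes') {
      const raw = record.target.getAttribute(record.attributeName);
      events.push({
        kind: 'attribute',
        name: record.attributeName,
        value: raw === null ? null : maskText(raw),
      });
    } else if (record.type === 'childList') {
      // Nodes inserted after the initial snapshot, e.g. by a payment widget.
      for (const node of record.addedNodes) {
        events.push({ kind: 'added', value: maskText(node.textContent || '') });
      }
    }
  }
  return events;
}

// Browser wiring, shown as a comment because it needs a DOM; `send` is a
// hypothetical upload function:
// new MutationObserver((records) => send(serializeMutations(records, maskText)))
//   .observe(document.body, {
//     subtree: true, childList: true, characterData: true, attributes: true,
//   });
```

A real serializer would also apply the selector-based rules to added nodes; this sketch only shows the content masker running on each change.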
How Relyv implements masking
Relyv's SDK implements all four layers. Regex, Luhn validation, and selector-based masking are on by default; on-device ML is opt-in:
- <strong>Regex + Luhn:</strong> Email, phone (E.164 + common national formats), national IDs (US SSN, UK National Insurance number, EU formats), credit card (with Luhn validation), Bearer/JWT/AWS/GitHub tokens, all masked at every capture point (text, attributes, console, network).
- <strong>Attribute selectors:</strong> <code>data-relyv-mask</code> attribute marks custom fields. <code>mask: [...]</code> at init takes a CSS selector list. Password/hidden fields excluded outright.
- <strong>On-device ML:</strong> GLiNER (~80 MB) optionally loaded via the browser extension for free-text PII detection. Off by default; opt-in per workspace.
- <strong>Inspect-the-payload tool:</strong> Dashboard → Settings → "Inspect raw payload" shows the next captured session's upload before transmission, so you can verify masking worked on a real flow.
- <strong>Web Worker handoff:</strong> Masking runs on the main thread <em>before</em> events transfer to the capture worker. The worker never sees raw PII.