08. Privacy in feedback

Bias awareness improves how feedback is used; privacy ensures the feedback itself is handled responsibly. Free-text comments, implicit signals tied to user identity, and retention windows all fall under the disciplines from 03_ai_security_safety/03_data_access_governance, with specific considerations for the feedback context.

A platform engineer at a Bengaluru SaaS company audits the feedback store after a customer raises a concern. The store holds 18 months of feedback events. Explicit comments contain user emails (users type "my email is..." into support comments), implicit signals are joined to user identifiers in clear text, and retention is "indefinite for analytics purposes." The audit lead's verdict is that this is a privacy problem: customer free-text comments are PII-rich, user identifiers should be hashed, and retention should be bounded. Three weeks later, the store is rebuilt with PII redaction at intake, hashed user identifiers, bounded retention (1-2 years), and participation in right-to-be-forgotten (RTBF) erasure.

This chapter is that discipline. Feedback is user data; the privacy disciplines apply; the work is to identify the specific surfaces.

What goes wrong without the discipline

Five concrete failures.

Free-text comments contain PII. Users naturally include personal details — names, emails, account numbers, sensitive identifiers — in their comments. The feedback store accumulates PII the team did not intend to capture.

User identifiers in clear text. Feedback events join to user identity through the raw user_id. A breach of the feedback store exposes which user said what.

Implicit signals expose behaviour patterns. A user's session-level behaviour (when they use the platform, how often, what they ask) is itself private. The feedback store may capture more behavioural detail than the user consented to.

Cross-team access without controls. Engineering, product, and customer-success teams all want to read the feedback. Without access controls, the data is widely accessible.

Indefinite retention. Feedback ages; the team rarely re-reads old feedback; it accumulates in storage indefinitely. Regulatory exposure compounds.

The discipline addresses each.

PII redaction at intake

Free-text comments are scanned for PII patterns at write time (chapter 05 of 03_ai_security_safety/03_data_access_governance):

Email addresses → [REDACTED:email]
Phone numbers → [REDACTED:phone]
Account numbers, IDs → [REDACTED:identifier]
Other sensitive shapes per pattern library

The redaction is at write; the raw value never enters the feedback store. The shape of the comment is preserved ("I cannot access my account [REDACTED:identifier]"); the PII is gone.
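A minimal sketch of redaction at write time, assuming a regex-based pattern library. The patterns, the `redact` and `write_feedback` names, and the plain list standing in for the feedback store are all hypothetical; a production pattern library would be broader and tuned per locale and per identifier format.

```python
import re

# Hypothetical pattern library: a small illustrative subset, not a complete
# PII detector. Patterns run in order; "identifier" runs before "phone" so a
# long account number is not mislabelled as a phone number.
PII_PATTERNS = [
    ("email", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("identifier", re.compile(r"\b[A-Z]{2,4}-?\d{6,}\b")),
    ("phone", re.compile(r"\+?\d[\d\s-]{8,}\d")),
]


def redact(comment: str) -> str:
    """Replace PII-shaped substrings with typed placeholders."""
    for label, pattern in PII_PATTERNS:
        comment = pattern.sub(f"[REDACTED:{label}]", comment)
    return comment


def write_feedback(store: list, comment: str) -> None:
    """Intake path: redaction happens before the write, so raw values never land."""
    store.append({"comment": redact(comment)})
```

Ordering is one of the judgment calls a real pattern library has to make: an earlier, broader pattern can claim text that a later, more specific one would have labelled better.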

For comments that lose their meaning when redacted (e.g., comments specifically about the user's own account), the team accepts the redaction as the trade — the comment is less specific but the privacy is protected.

Hashed user identifiers

The feedback store uses user_id_hash, not raw user_id. The hash:

Is one-way (salted; reversibility not retained).
Is stable per user (multiple feedback events from the same user produce the same hash).
Supports correlation without identity ("how many distinct users produced feedback on this feature this week").

Reverse-lookup (going from a hash to a user identity) requires the salt and a reverse-lookup table, held only by the platform team for the specific operations (RTBF) that require it.

The hash gives analytical capability without holding identity in clear; a breach of the feedback store exposes patterns, not identities.
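A sketch of the hashing, assuming the "salt" is a secret key held by the platform team and the hash is computed as HMAC-SHA256. That algorithm is a swapped-in choice (the text does not name one), and `SECRET_SALT`, `user_id_hash`, and `distinct_users` are hypothetical names.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a key store that only
# the platform team can read, never from source code.
SECRET_SALT = b"replace-with-a-secret-from-your-key-store"


def user_id_hash(user_id: str, salt: bytes = SECRET_SALT) -> str:
    """Keyed one-way hash: stable per user, not recomputable without the salt."""
    return hmac.new(salt, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def distinct_users(events: list[dict]) -> int:
    """Correlation without identity: count distinct hashes, never raw IDs."""
    return len({e["user_id_hash"] for e in events})
```

A secret key matters because user IDs are often low-entropy: with a public or absent salt, anyone holding the store could hash every known ID and match the results. With a keyed hash, that attack also needs the key.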

Implicit-signal privacy

Implicit signals (chapter 03) capture behaviour. The discipline:

Tie to session, not to identity. A signal "user abandoned at turn 3" is per-session; the session may be tied to identity via hash, but the signal itself does not need raw identity.
Aggregate promptly. Per-event signals are needed for the immediate pipeline; long-term retention should be aggregated (per-week engagement statistics, not per-event records).
Granularity awareness. Some signals (location, exact timestamps, device fingerprints) are themselves quasi-identifiers; treat with care.
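A sketch of prompt aggregation, assuming per-event signals carry a session ID, a signal name, and a timestamp (the field names are hypothetical). The weekly roll-up keeps counts per ISO week and signal, and does not carry exact timestamps or session IDs into the long-term record, since the list above flags both as quasi-identifiers.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_weekly(events: list[dict]) -> dict:
    """Roll per-event implicit signals into per-week statistics.

    Assumed event shape: {"session_id": str, "signal": str, "ts": datetime}.
    Output keys are (ISO year, ISO week, signal) only.
    """
    counts = defaultdict(int)
    sessions = defaultdict(set)
    for e in events:
        iso_year, iso_week, _ = e["ts"].isocalendar()
        key = (iso_year, iso_week, e["signal"])
        counts[key] += 1
        sessions[key].add(e["session_id"])  # used only to count, then discarded
    return {k: {"events": counts[k], "distinct_sessions": len(sessions[k])} for k in counts}


# Example: three abandonment signals from two sessions in the same ISO week.
SAMPLE = [
    {"session_id": "s1", "signal": "abandon_turn_3", "ts": datetime(2024, 3, 4, 9, 15)},
    {"session_id": "s1", "signal": "abandon_turn_3", "ts": datetime(2024, 3, 5, 22, 1)},
    {"session_id": "s2", "signal": "abandon_turn_3", "ts": datetime(2024, 3, 6, 8, 0)},
]
```

Run against SAMPLE, this yields a single row for ISO week 10 of 2024: three events from two distinct sessions. The per-event records can then be deleted on their shorter window.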

Access controls

Read access to the feedback store is restricted:

Engineering — for the pipeline (chapter 05) and incident investigation.
Product — for weekly reviews and decisions.
Customer-success — for case-specific investigations triggered by support.
Data-protection officer — for compliance audits.

Every access is logged (audit on the audit: each query against the feedback store produces its own audit record). Reads by anyone outside these roles are refused.

For high-volume analytical queries, an aggregated view (no per-row PII or identifiers) is provided to a broader audience; per-row access is restricted to the roles above.
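A sketch of role-gated per-row reads with an audit record on every attempt, plus an aggregated view for the broader audience. The role strings, `read_feedback`, `aggregated_view`, and the in-memory `AUDIT_LOG` are assumptions; a real deployment would enforce this in the database or query layer and write audits to an append-only sink.

```python
from datetime import datetime, timezone

# The roles named in the text; the exact strings are hypothetical.
PER_ROW_ROLES = {"engineering", "product", "customer_success", "dpo"}

AUDIT_LOG: list[dict] = []  # stand-in for an append-only audit sink


class AccessDenied(Exception):
    pass


def read_feedback(actor: str, role: str, query: str, rows: list[dict]) -> list[dict]:
    """Per-row read: allowed only for listed roles; every attempt is audited."""
    allowed = role in PER_ROW_ROLES
    AUDIT_LOG.append({
        "actor": actor,
        "role": role,
        "query": query,
        "allowed": allowed,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    if not allowed:
        raise AccessDenied(f"role {role!r} may not read per-row feedback")
    return rows


def aggregated_view(rows: list[dict]) -> dict[str, int]:
    """Broader-audience view: counts per feature, with no text and no identifiers."""
    counts: dict[str, int] = {}
    for r in rows:
        counts[r["feature"]] = counts.get(r["feature"], 0) + 1
    return counts
```

Denied attempts are logged as well as granted ones, so the audit trail shows who tried to read the store, not only who succeeded.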

Retention

Per chapter 06 of 03_ai_security_safety/03_data_access_governance. A reasonable matrix for feedback:

Data	Window	Reason
Raw events with free-text comments	1-2 years	Operational pipeline + audit; bounded for compliance
Structured events without text	2-3 years	Aggregate analysis
Aggregated metrics (no per-user)	Indefinite	Statistical history
Hash-to-identity lookup table	Aligned with primary user data	RTBF dependency

Deletion runs automatically at each window boundary and is verified afterwards, as that chapter describes.
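A sketch of an automatic retention sweep with a verification pass, assuming the windows are set at the upper end of the table's ranges. The data-class names, `RETENTION`, `sweep`, and `verify` are hypothetical. The hash-to-identity lookup table is left out because its lifetime follows the primary user data rather than a fixed window.

```python
from datetime import datetime, timedelta, timezone

# Upper ends of the table's ranges (an assumption); None means no time-based
# deletion, as for aggregated metrics with no per-user data.
RETENTION = {
    "raw_with_text": timedelta(days=2 * 365),
    "structured_no_text": timedelta(days=3 * 365),
    "aggregated": None,
}


def within_window(record: dict, now: datetime) -> bool:
    window = RETENTION[record["data_class"]]
    return window is None or now - record["created_at"] <= window


def sweep(records: list[dict], now: datetime) -> list[dict]:
    """Return the surviving records; anything past its window is dropped."""
    return [r for r in records if within_window(r, now)]


def verify(records: list[dict], now: datetime) -> bool:
    """Post-deletion check: nothing older than its window remains."""
    return all(within_window(r, now) for r in records)
```

Keeping the check in one helper means the sweep and the verification cannot drift apart on what "past the window" means.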

Right-to-be-forgotten

The feedback store participates in RTBF (chapter 09 of 03_ai_security_safety/03_data_access_governance).

For a data subject's erasure request:

Look up the hash of the subject's identifier.
Find all feedback events with that hash.
Delete or further-pseudonymise (depending on jurisdiction and audit-retention obligations).
Verify; document; notify per regulatory window.

For a clean (hashed-identifier) feedback store, the RTBF is fast. For a clear-text-identifier store, it is more painful — every comment may need scanning for the subject's identifiers.
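A sketch of the four RTBF steps against a hashed-identifier store. It assumes the same kind of keyed hash as the store uses, deletes rather than re-pseudonymising, and also drops the subject's entry from the reverse-lookup table (an assumption, on the basis that the lookup table's lifetime follows the primary user data). `erase_subject` and its return record, which stands in for what would be documented, are hypothetical; notification is out of scope.

```python
import hashlib
import hmac

SECRET_SALT = b"platform-held-secret"  # hypothetical; held by the platform team


def user_id_hash(user_id: str) -> str:
    return hmac.new(SECRET_SALT, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def erase_subject(store: list[dict], lookup: dict[str, str], user_id: str) -> dict:
    """Hypothetical RTBF handler for a hashed-identifier feedback store."""
    h = user_id_hash(user_id)                                # 1. hash the subject's identifier
    matched = sum(1 for e in store if e["user_id_hash"] == h)  # 2. find their events
    store[:] = [e for e in store if e["user_id_hash"] != h]  # 3. delete them in place
    lookup.pop(h, None)                                      #    and the reverse-lookup entry
    remaining = sum(1 for e in store if e["user_id_hash"] == h)  # 4. verify by re-query
    return {"hash": h, "deleted": matched, "verified": remaining == 0}
```

The whole operation is a filter on one column; a clear-text store would instead need every comment scanned for the subject's identifiers.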

Cross-tenant isolation

Feedback events are tenant-tagged; queries enforce tenant scope (chapter 10 of 03_ai_security_safety/03_data_access_governance). A cross-tenant feedback query (e.g., "all customers' worst feedback") is restricted to platform-team operations with explicit scope.
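A sketch of tenant-scoped queries, assuming every event carries a `tenant_id` and that a cross-tenant query must come from the platform team with an explicit operation named. `Scope`, `query_feedback`, and the team and operation strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scope:
    team: str
    tenant_id: Optional[str] = None  # None means a cross-tenant request
    operation: Optional[str] = None  # explicit justification for cross-tenant work


def query_feedback(events: list[dict], scope: Scope) -> list[dict]:
    """Tenant-scoped by default; cross-tenant only for a named platform operation."""
    if scope.tenant_id is not None:
        return [e for e in events if e["tenant_id"] == scope.tenant_id]
    if scope.team != "platform" or not scope.operation:
        raise PermissionError("cross-tenant queries require the platform team and an explicit operation")
    return list(events)
```

Making the tenant filter the default path, and the cross-tenant path the one that needs extra arguments, means forgetting a parameter fails closed rather than open.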

What this discipline does not solve

Users who intend to share PII. A user typing "please contact me at ravi@example.com" is sharing it deliberately; the redaction protects them anyway, even though their intent was to share.
Bias from selective response. Privacy does not address bias; chapter 07 does.
Provider-side retention. If the feedback flows through a third-party tool (e.g., a survey platform), the provider's retention is also a concern.

Common mistakes

Free-text comments stored raw. PII accumulates silently.

User identifiers in clear. Breach exposes identity.

No access controls. Broad organisational access to a sensitive store.

No retention. Indefinite accumulation; compliance risk.

No RTBF participation. Erasure requests never reach the feedback store.

Interview Q&A

Q1. Why redact free-text feedback comments at intake? Because users include PII in their comments unprompted. Storing raw comments accumulates personal data the platform did not intend to collect. Redaction at intake preserves the shape (the comment's structure and substance) while removing the values; a breach of the store exposes patterns, not identities. Redaction at read time leaves raw values in storage, where any access exposes them. The discipline is "redact before store", the same as the audit-log redaction in 03_ai_security_safety/03_data_access_governance chapter 05. Wrong-answer notes: "we'll be careful with the data" is the documentation-only failure mode.

Q2. How does hashed identifier support both privacy and analytics? The hash is one-way (cannot be reversed to a user identity without the salt and table held only by the platform team). It is stable per user (the same user produces the same hash across events), so analytics can correlate ("how many distinct users provided feedback last week") without holding identity in clear. A breach of the feedback store exposes patterns and behaviours but not identities. Reverse-lookup is held separately and used only for operations that require it (RTBF, security investigation). Wrong-answer notes: "we encrypt the identifier" is two-way and exposable; the one-way hash is structurally different.

Q3. Walk through how the feedback store participates in RTBF. For a data subject's erasure request: look up the hash of their identifier (using the held reverse-lookup table). Find all feedback events with that hash. Delete (or further-pseudonymise the hash to break the connection to the lookup table; the choice depends on jurisdiction). Verify with a re-query. Document the action. The clean (hashed-identifier with intake redaction) store makes this fast; the raw store requires scanning every comment for the subject's identifiers, which is slow and unreliable. Wrong-answer notes: "we don't store identity in the feedback" is wrong: the hash is still identity (pseudonymised, not anonymised), and the discipline is about handling it correctly.

Q4. The team wants broad organisational access to the feedback store "for everyone to learn from users." What is your push-back? Two measures, used together. Restrict per-row access to the few roles that need it (engineering, product, customer-success, DPO). Provide an aggregated view (no per-row PII, no identifiers) to the broader audience. The broader audience learns from patterns and themes, not from per-user data. The restricted store is auditable; broad access is not. The push-back is "access proportional to need, with audit for the access that is granted." Wrong-answer notes: "everyone benefits" without controls produces a breach surface across the company.

What to do differently after reading this

Redact PII at intake; the store never holds raw personal data.
Hashed user identifiers; reverse-lookup held separately and audited.
Tenant-tag every event; queries enforce.
Access controls per role; reads are audited.
Bounded retention; automatic deletion; RTBF participation.

Bridge. Privacy is the responsibility around the data. Cadence is the responsibility for using it. The next chapter is the rhythm of looking at feedback and acting on it. → 09-feedback-cadence.md