Matching Without Meeting
April 2026
A retailer knows what you bought. A brand knows who saw their ad. Neither will hand the other their data. And yet both need the same number: did the people who saw the ad actually buy more?
When the retailer and the ad platform are the same company — Amazon, Walmart, Target — the loop closes naturally, and the measurement is direct. That is the closed-loop advantage that makes retail media so valuable. But most advertising does not work that way. A brand buying time on a streaming service to drive grocery sales is advertising with one company and selling through another. The data never meets.
Data clean rooms solve this problem. They are environments — typically cloud-hosted — where two parties can run joint analysis on data that neither party is willing to share in the raw. The results come out. The rows stay in.
The Matching Problem
Suppose a consumer goods brand ran a campaign across streaming platforms. They want to know whether households that saw the ad purchased more at a major grocery chain than households that did not. Simple question. Genuinely hard to answer.
The grocery chain has transaction records keyed to loyalty accounts. The streaming platform has ad exposure logs keyed to device or household IDs. These two datasets need to be joined — matched at the individual level — to produce any measurement at all. But the grocery chain will not hand its loyalty data to the streaming platform, and the streaming platform will not hand its user graph to the grocery chain. Competitive sensitivity, privacy regulations, and plain business logic all say no.
This is not a new problem. In the television era, Nielsen solved it by running a panel — a few thousand households that agreed to have their viewing and purchasing tracked simultaneously. The panel was the only place where exposure and outcome existed in the same dataset. Clean rooms are the modern version of this idea, extended to the full customer base rather than a sample.
How It Works
A data clean room is fundamentally a query environment. Each party uploads a hashed or encrypted version of their data — purchase records on one side, ad exposure logs on the other. The environment matches records by a common identifier, typically a hashed email address or a unified identifier like Unified ID 2.0. Neither party can download the matched dataset. They can only submit queries.
The queries return aggregates. You can ask: what was the conversion rate among users who saw my ad? You cannot ask: show me the names and addresses of users who saw my ad and then bought something. The environment enforces this through query restrictions, minimum cohort sizes (usually 25 or more records to prevent inference), and differential privacy mechanisms that add noise to results that would otherwise reveal individual behavior.
The technical architecture varies by vendor. Google's Ads Data Hub runs queries inside Google's infrastructure using BigQuery. Amazon Marketing Cloud works the same way inside AWS. InfoSum, Habu, and LiveRamp operate as neutral third parties — neither the retailer nor the brand — which matters when the parties are roughly equal in commercial power and neither wants to operate inside the other's cloud.
What You Can Actually Measure
Three things, all of which were genuinely hard to measure before clean rooms existed.
Reach is how many distinct people saw your ad. Not impressions — people. If the same household saw your ad nine times across three platforms, that is one household reached, not nine impressions of reach. Before clean rooms, a brand buying ads across multiple platforms had no way to deduplicate. Each platform reported its own reach number, and the brand added them up — which counted overlapping audiences multiple times. The clean room matches exposure records across platforms and collapses duplicates, giving you a true unique reach figure.
Frequency is the average number of times each reached person saw the ad. This matters because frequency has diminishing returns. The first exposure might register the brand. The third might prompt a purchase. The fifteenth is almost certainly wasted spend — and possibly damaging. Without cross-platform deduplication, a brand might think a household saw the ad four times when it actually saw it fourteen times across different services. Clean rooms make the real number visible.
Incrementality is the harder and more valuable number. Reach and frequency tell you what happened. Incrementality tells you whether it mattered.
The question incrementality answers is not "how many people saw the ad and later bought?" It is "how many people bought because they saw the ad — beyond what would have happened anyway?"
The methodology requires a holdout group: a set of similar users who were deliberately not shown the ad. You compare conversion rates between the exposed group and the holdout. The gap is what the campaign actually caused.
Say your campaign reached 500,000 people. Of those, 25,000 purchased — a 5% conversion rate. Your holdout of 100,000 unexposed users converted at 4%. The incremental lift is 1 percentage point. Without the holdout, you would have attributed all 25,000 purchases to the campaign. With it, you know that 20,000 of them were going to buy regardless. Only 5,000 were actually moved by the ad.
Without the clean room, you cannot construct this holdout because you cannot join the ad exposure data to the purchase data at all. With it, you can run the comparison across millions of actual customers rather than a panel of thousands.
The Platform Model
The original clean room use case was bilateral — one brand, one retailer, one measurement question. The market has since moved toward multi-party platforms where a single clean room operator maintains relationships with dozens of publishers and brands simultaneously.
This hub-and-spoke model emerged because the bilateral setup has a scaling problem. A brand that advertises across eight streaming platforms, two retail media networks, and a social platform would need eleven separate clean room setups, eleven separate data agreements, eleven separate data upload processes. A platform aggregates these relationships, standardizing the data contracts and query interfaces across all of them.
TikTok's clean room, for example, lets brands connect TikTok exposure data to retailer purchase data from Walmart, Kroger, and others — all within one environment, without TikTok seeing the retailer data or the retailer seeing TikTok's user data. The brand gets a unified view across the full media mix. The publishers and retailers each get measurement they can offer as a premium feature.
Why This Is Happening Now
Two forces are converging. The first is the deprecation of third-party tracking. As cookies disappear from browsers and app tracking identifiers become opt-in rather than opt-out, the cross-site behavioral profiles that underpinned a decade of programmatic targeting are dissolving. Advertisers who relied on that infrastructure need a replacement. Clean rooms, because they work with first-party data held by each party, are structurally immune to cookie deprecation.
The second force is regulatory pressure. GDPR in Europe, CCPA in California, and a growing stack of state-level privacy laws in the United States are raising the cost of sharing raw personal data across company boundaries. Clean rooms are designed to satisfy these requirements — the raw data never moves, only query results do, and the query restrictions are designed to prevent individual-level inference.
The combination creates genuine business pressure to adopt clean rooms. This is not optional infrastructure for the advertising industry — it is the mechanism through which measurement continues to work as the old mechanisms shut down.
The Limitations
Clean rooms are not a complete solution to privacy-safe measurement. Several constraints matter in practice.
Identity resolution is imperfect. The match between retailer loyalty IDs and platform user IDs depends on a common identifier, almost always hashed email. Users who have not provided email to both parties are invisible to the analysis. Match rates of 30 to 60 percent are common, which means results are drawn from a selected subset of the customer base.
Minimum cohort sizes prevent granular queries. The clean room will not return a result unless enough people match your query — Google Ads Data Hub requires 50 rows; other platforms set their own thresholds, typically in the 25–50 range. That rule exists to protect privacy — if a query returns a result for 4 people, you could start inferring things about those specific individuals. With enough narrow queries you could essentially reconstruct individual records, which is exactly what clean rooms are supposed to prevent.
The problem is that the more specific your question, the fewer people match it. You can ask broad questions just fine: what was the conversion rate among everyone who saw my campaign? That hits 500,000 people — no issue. But the moment you slice by ZIP code, creative variant, purchase history, and device type simultaneously, you end up with a thin slice that does not clear the threshold. You get silence — not a small number, not even a "fewer than 25." Just no result.
The practical frustration: you might want to know which creative performed better in a specific market segment. Clean rooms can answer that for large segments. For niche ones, the same privacy protection that stops bad actors from re-identifying individuals also stops you from doing the granular analysis you actually wanted.
Query costs accumulate. Every query runs compute against both parties' data. Complex incrementality analyses — particularly those requiring multiple holdout splits or segment-level breakdowns — can become expensive at scale.
Standardization is still early. Different clean room platforms have different query languages, different identifier systems, different privacy thresholds, and different output formats. A brand running analyses across multiple clean rooms is running multiple incomparable measurement methodologies, which makes media mix modeling complicated.
The underlying idea is straightforward: two parties each hold half of a useful dataset and neither will share it. A neutral environment runs the join, enforces the privacy rules, and returns only what the parties need to know. The implementation is genuinely complex — identity resolution, differential privacy, query auditing, contractual governance — but the concept resolves a problem that was otherwise unresolvable.
This is why clean rooms have moved from a research concept to core advertising infrastructure in roughly four years. The problem they solve is not going away.
Related: The Retail Media Stack — how closed-loop measurement works when the retailer and the ad platform are the same company.
References
All concepts in this post are drawn from public documentation and industry trade coverage.
- Google Ads Data Hub — Developer Documentation — architecture, query restrictions, and the 50-row privacy threshold
- Amazon Marketing Cloud — Product Overview — AMC's SQL query environment and aggregation requirements
- InfoSum Platform — neutral third-party clean room model
- LiveRamp Clean Rooms — identity resolution and data collaboration infrastructure
- Unified ID 2.0 — The Trade Desk — the open-source cross-platform identifier used for matching
- AdExplainer: Data Clean Rooms — AdExchanger — trade press overview of how clean rooms work in practice