Deterministic v probabilistic matching in a customer data platform

9 months ago
8

“Deterministic” is a bit of an exaggeration

If you’re learning about or investigating CDPs you’ll often run into the question of deterministic vs. probabilistic matching. That has to do with the rules you use to merge profiles. So first let’s address that issue. Why are you merging profiles at all? What’s the point?

One of the purposes of a CDP is to take data from multiple sources and merge those records into a single customer record. That is, one place where you have all the information on your customer.

Deterministic matching is an approach that relies on exact matches between certain attributes to match different customer records. It usually matches on an email address or a phone number, but it can also use a postal address, a customer ID, or other values.

As a practical example, a CDP might have a profile for your desktop computer and for your smart phone. It doesn’t know you own both of those devices, but if you enter the same email address on your desktop and phone, the CDP can merge those two records.

Probabilistic matching uses statistical algorithms and machine learning models to identify matches based on the likelihood that two records belong to the same person or entity.

You might use probabilistic matching to merge two customer records if they have variations on the same name – like Robert Smith and Bob Smith – if they show similar content interests, and also come from the same IP range.

There are two important points here.

1. Deterministic matching is never deterministic.
2. You need to employ a sliding scale of confidence based on your use cases.

On that first point, consider this story.

When my mother was getting older, she didn’t have the energy to do all her own Christmas shopping, so she asked my sister to log in to my mother’s Amazon account – on my sister’s computer, at my sister’s house – and buy presents for the grandkids. Based on so-called “deterministic matching,” this would identify my sister’s computer as my mother’s computer.

You might say that’s an edge case – and I’ll get to that in a minute – but if you think about it, there are a lot of situations where this sort of thing happens. Identity can be a bit murky on the internet.

Marketers tend to have a preference for things we can measure, sometimes to the exclusion of other real-world factors. I call this measurement bias – it’s similar to numeracy bias, where you prefer something that’s expressed as a number – and it’s something you need to keep in mind.

I recommend employing a sliding scale of confidence. When two profiles have the same account information, you can be pretty sure that you’re dealing with the same person on those two profiles. But if two profiles show exactly the same content interests, you really can’t be certain those are both the same people.

The key thing is does it matter to your use case?

If you’re creating profiles solely to drive ads on your site, it’s no big deal if you mistakenly merge records that aren’t actually the same person. But if you’re dealing with financial transactions, healthcare records, credit reporting and things like that, you need to be very careful how you merge profiles.

Loading comments...