Probabilistic Attribution Does Not Equal Fingerprinting — Here’s How

By Vungle | March 30, 2021

Winter is coming.

Wait, scratch that. Spring is coming.

In early spring, Apple’s identifier for advertisers (IDFA) for iOS will hit the endangered species list with the public release of iOS 14.5. Mobile marketers will be left without a way to deterministically attribute, through mobile measurement partners (MMPs), users who don’t give permission to access their IDFA.

Historically, MMPs have used an alternative method called fingerprinting to track these users when the MMP SDK is unable to access the device IDFA. This happens either because no IDFA is available (e.g. the ad is served on mobile web) or because the user has restricted access to the IDFA through Limit Ad Tracking (LAT).

In the context of mobile marketing, fingerprinting is the practice of attempting to one-to-one match an individual mobile user to an acquisition source through IP address (predominantly) along with other device attributes such as device name, device type, OS version, and mobile carrier.

In the months since Apple announced the deprecation of IDFA in summer 2020, mobile marketers have wondered whether fingerprinting would be allowed after the release of iOS 14.5.

Apple answered in their privacy FAQs in January 2021 with an emphatic no.

What is probabilistic attribution?

Probabilistic attribution is a statistical method that produces a probability distribution over all the campaigns that are likely to have generated an install. Although fingerprinting is a type of probabilistic attribution, it’s not compliant with the privacy rules set forth by Apple. This is because fingerprinting creates an identifier (ID) to try to track a user without that user’s permission, and Apple’s privacy FAQs state: “you may not derive data from a device for the purpose of uniquely identifying it.”

But what’s currently stuck in the collective ad tech industry’s subconscious is the notion that “probabilistic attribution” is another word for “fingerprinting.”

In this blog post, we’ll discuss why the industry needs to move past discussions around fingerprinting as a solution for measurement and discuss types of probabilistic attribution that don’t track individual users. We’ll explain how the AlgoLift by Vungle Apple-compliant probabilistic attribution model has several fundamental differences and flat out doesn’t equal fingerprinting. Let’s dig in.

First, our probabilistic attribution model doesn’t attempt to make a one-to-one attribution match between user and attribution source the way fingerprinting does. This is critical in understanding how the AlgoLift by Vungle probabilistic attribution differs completely from fingerprinting.

Instead, our method creates a probability distribution of where a user is likely to have come from (e.g. 20% probability from organic, 30% from a Facebook ad campaign, etc.), using only the data provided by SKAdNetwork, ConversionValue, and anonymous user-level app data.

Second, our probabilistic attribution method doesn’t collect IP addresses for user-to-attribution-source matching as fingerprinting does.

The AlgoLift by Vungle probabilistic attribution model ingests four sets of data as described and visualized below.

  1. Anonymous user-level data: In-app event and revenue data
  2. SKAdNetwork data: Campaign ID, source app ID, in-app event, and revenue data
  3. Ad network campaign data: Campaign ID, impressions, and clicks
  4. Deterministic MMP attribution data: ATT opt-in iOS users and users identified by other means within Apple’s guidelines

Caption: AlgoLift by Vungle framework for probabilistic campaign predicted ROAS (pROAS) calculations based on anonymous user-level app data, SKAdNetwork, and ad network campaign data.

How does the AlgoLift by Vungle probabilistic attribution method work?

The probabilistic attribution model finds users who have a defined ConversionValue within the in-app data set and creates a probability that the user belongs to a campaign that contains the same ConversionValue.

Let’s take an example in which ConversionValue equals 5, which is mapped to “tutorial finish” in a game. In this example, there have been 100 tutorial finishes on a certain day. We can see which new users within an app have completed the tutorial using in-app user-level data. We can then look at how many tutorial finishes (ConversionValue equals 5) each SKAdNetwork campaign reported.

SKAdNetwork reports the below:

  • Campaign 1 tracked 10 tutorial finishes
  • Campaign 2 tracked 65 tutorial finishes
  • Campaign M tracked 15 tutorial finishes

SKAdNetwork didn’t track the remaining 10 of the 100 tutorial finishes, so we can assume they were organic.

Now, below is a diagram visualizing App User 1 and its associated campaign membership probabilities. We know from in-app user-level data that App User 1 completed a tutorial finish, and therefore we can assign a probability that the user originated from each of the three ad campaigns or from organic. Campaign 1 is known to have had 10 tutorial finishes out of a total of 100, so the probability App User 1 originated from Campaign 1 is 10%.

Caption: Campaign membership probabilities for App User 1 based on the given example of 100 tutorial finishes in a game.

Although Campaign 2 showed the highest campaign membership probability (65%) for App User 1, we can’t say with absolute certainty that Campaign 2 should be credited entirely with driving an install from App User 1. We can only assign a probability to each discrete possibility.
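To make the arithmetic in the tutorial-finish example concrete, here is a minimal Python sketch. It assumes the simplest possible rule, which is the one the example itself uses: a user's membership probability for a campaign is that campaign's share of all tutorial finishes, and anything SKAdNetwork didn't report is treated as organic. The function name and data layout are hypothetical and are not taken from the AlgoLift by Vungle model.

```python
def membership_probabilities(total_events, skan_counts):
    """Return a source -> probability map for one user who fired the event.

    total_events: number of events seen in the in-app, user-level data.
    skan_counts:  campaign -> number of the same events reported by SKAdNetwork.
    Assumption: events that SKAdNetwork did not report are treated as organic.
    """
    tracked = sum(skan_counts.values())
    if tracked > total_events:
        raise ValueError("SKAdNetwork reported more events than the app saw")

    probs = {campaign: count / total_events
             for campaign, count in skan_counts.items()}
    probs["Organic"] = (total_events - tracked) / total_events
    return probs


# Counts from the example: 100 tutorial finishes, 90 reported by SKAdNetwork.
skan_counts = {"Campaign 1": 10, "Campaign 2": 65, "Campaign M": 15}
probs = membership_probabilities(100, skan_counts)

for source, p in probs.items():
    print(f"{source}: {p:.0%}")
```

The probabilities always sum to 1 because organic absorbs whatever share the campaigns don't account for.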

In reality, many user-to-campaign combinations can be pre-filtered to a 0% probability due to geo or ConversionValue mismatch. Users who shared their IDFA can also be attributed deterministically with 100% probability, making the task of assigning probabilities from non-tracked users easier.
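The sketch below illustrates one way such pre-filtering could work. It is an assumption for illustration, not a description of the production model: ineligible sources are set to 0%, the remaining probabilities are rescaled so they sum to 100%, and a user with a deterministic match gets 100% for that source. The function name, the renormalization step, and the geo-mismatch scenario are all hypothetical.

```python
def refine_probabilities(priors, eligible, deterministic_source=None):
    """Zero out ineligible sources and rescale, or pin a deterministic match.

    priors:               source -> probability before filtering.
    eligible:             set of sources that pass geo / ConversionValue checks.
    deterministic_source: set when the user was attributed one-to-one
                          (for example, a user who shared their IDFA).
    """
    if deterministic_source is not None:
        result = {source: 0.0 for source in priors}
        result[deterministic_source] = 1.0
        return result

    filtered = {source: (p if source in eligible else 0.0)
                for source, p in priors.items()}
    total = sum(filtered.values())
    if total == 0:
        raise ValueError("no eligible source left for this user")
    # Assumption: the remaining mass is rescaled proportionally.
    return {source: p / total for source, p in filtered.items()}


priors = {"Campaign 1": 0.10, "Campaign 2": 0.65,
          "Campaign M": 0.15, "Organic": 0.10}

# Hypothetical case: Campaign M only ran in a country this user is not in.
filtered = refine_probabilities(
    priors, eligible={"Campaign 1", "Campaign 2", "Organic"})

# Hypothetical case: the user opted in and was matched to Campaign 2.
opted_in = refine_probabilities(
    priors, eligible=set(priors), deterministic_source="Campaign 2")

print(filtered)
print(opted_in)
```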

Probabilistic versus “winner-takes-all” approach

Let’s drive the point home further about how our probabilistic attribution approach works with a simple numerical example.

In the table below, you’ll see five hypothetical app users. Each user can belong to one of three ad campaigns (C1, C2, or C3) or Organic. In the second column, you’ll find the projected revenue (pRevenue or LTV) for each user. The remaining columns list example campaign membership probabilities for each user across the three campaigns and Organic.

We highlighted the “winner-takes-all” membership probability in red. This is the ad campaign that exhibited the highest membership probability.

Caption: A numerical example of pRevenue/LTV and membership probabilities for five users across three campaigns and Organic. User 3 is an IDFA opt-in iOS user that could be one-to-one attributed to Campaign C2.

In the table below, you’ll see pROAS (pRevenue/Spend) per campaign calculated by:

  • Probabilistic attribution: Revenue for each user is distributed to each campaign based on their membership probability—e.g. give 75% of User 1 pRevenue to Campaign 1
  • Winner-takes-all attribution: All revenue is given to the campaign with the highest membership probability
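The two allocation rules above can be sketched in a few lines of Python. Every number here is made up for illustration and is not the data from the tables in this post. Only two details come from the text: User 1 has a 75% membership probability for C1, and User 3 is attributed one-to-one to C2. The spend figures and function names are likewise hypothetical.

```python
SOURCES = ["C1", "C2", "C3", "Organic"]

# (user, pRevenue, membership probabilities) -- illustrative values only.
users = [
    ("User 1", 10.0, {"C1": 0.75, "C2": 0.10, "C3": 0.10, "Organic": 0.05}),
    ("User 2", 20.0, {"C1": 0.40, "C2": 0.30, "C3": 0.20, "Organic": 0.10}),
    ("User 3", 5.0, {"C1": 0.00, "C2": 1.00, "C3": 0.00, "Organic": 0.00}),
    ("User 4", 8.0, {"C1": 0.05, "C2": 0.85, "C3": 0.05, "Organic": 0.05}),
    ("User 5", 12.0, {"C1": 0.35, "C2": 0.25, "C3": 0.25, "Organic": 0.15}),
]

# Hypothetical spend per paid campaign (Organic has no spend).
spend = {"C1": 10.0, "C2": 10.0, "C3": 5.0}


def probabilistic_revenue(users):
    """Split each user's pRevenue across sources by membership probability."""
    totals = {source: 0.0 for source in SOURCES}
    for _name, revenue, probs in users:
        for source, p in probs.items():
            totals[source] += revenue * p
    return totals


def winner_takes_all_revenue(users):
    """Give each user's full pRevenue to the most probable source."""
    totals = {source: 0.0 for source in SOURCES}
    for _name, revenue, probs in users:
        winner = max(probs, key=probs.get)
        totals[winner] += revenue
    return totals


def proas(revenue_by_source, spend):
    """pROAS = pRevenue / Spend for each paid campaign."""
    return {campaign: revenue_by_source[campaign] / cost
            for campaign, cost in spend.items()}


prob_rev = probabilistic_revenue(users)
wta_rev = winner_takes_all_revenue(users)

print("Probabilistic pROAS:    ", proas(prob_rev, spend))
print("Winner-takes-all pROAS: ", proas(wta_rev, spend))
```

Both rules distribute the same total pRevenue. They differ only in where it lands: with these illustrative inputs, winner-takes-all credits nothing to C3 or Organic, while the probabilistic split assigns each of them a share.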

Caption: A numerical example of campaign revenue and ROAS for the proposed probabilistic versus winner-takes-all attribution approaches.

As you can see in the above example table, the “winner-takes-all” attribution approach heavily biases toward Campaigns C1 and C2 and assumes no revenue to C3 and Organic.

Shockingly, the winner-takes-all attribution approach is the fingerprinting approach that is so often lauded as a workaround for IDFA deprecation. Not only is fingerprinting noncompliant with Apple’s policies, but a fingerprinting attribution model also takes data inputs that are not unique to any single device (IP address, device name, device type, OS version, mobile carrier) and outputs a probability of 100% that an install belongs to a campaign, as if the match were deterministic.

It’s clear from the above example that this approach is misleading at best. At worst, ignoring the “long-tail” probability of users possibly coming from more than one campaign over hundreds of thousands of users will lead to potentially misallocating millions of dollars of revenue. A “winner-takes-all” attribution approach works for Users 1, 3, and 4, but is absolutely the wrong approach for any other user.

iOS 14.5 is coming. Hopefully, we’ve explained why it’s so crucial to move past discussions of fingerprinting and move to an Apple-compliant probabilistic attribution solution that solves measurement for the IDFA-less future.

Learn more about AlgoLift by Vungle’s measurement solutions that solve for Apple’s privacy changes by contacting us below.