Probabilistic attribution for iOS 14: a deeper dive

By Liftoff | July 16, 2020

We previously discussed a promising solution to the reduced efficacy of deterministic attribution under iOS 14. In that piece, we concluded that the changes Apple is making to protect user privacy will significantly impair deterministic attribution.

In this follow-up, we would like to elaborate on our approach and the use of attribution data.

Attribution for Campaign Valuation

AlgoLift’s primary use case for user-level campaign attribution is campaign pROAS (predicted ROAS) estimation. We forecast LTV at the user level, aggregate those forecasts into campaign-level predicted revenue, and compare that predicted revenue to the observed spend to obtain pROAS. This pROAS is used for campaign optimization and performance reporting.
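As a rough illustration of that aggregation, the sketch below sums user-level predicted LTV per campaign and divides by spend. The function name `campaign_proas`, the input layout, and all numbers are hypothetical and chosen only for illustration; this is not AlgoLift's production code.

```python
from collections import defaultdict


def campaign_proas(user_ltv, user_campaign, campaign_spend):
    """Aggregate user-level predicted LTV to campaigns and divide by spend."""
    predicted_revenue = defaultdict(float)
    for user_id, ltv in user_ltv.items():
        predicted_revenue[user_campaign[user_id]] += ltv
    # pROAS = predicted revenue / observed spend, skipping campaigns with no spend.
    return {
        campaign: predicted_revenue[campaign] / spend
        for campaign, spend in campaign_spend.items()
        if spend > 0
    }


# Hypothetical inputs, for illustration only.
user_ltv = {"u1": 4.0, "u2": 6.0, "u3": 5.0}
user_campaign = {"u1": "C1", "u2": "C1", "u3": "C2"}
campaign_spend = {"C1": 5.0, "C2": 10.0}

proas = campaign_proas(user_ltv, user_campaign, campaign_spend)
print(proas)
```

Note that this version assumes every user maps to exactly one campaign, which is the deterministic case that iOS 14 disrupts.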

MMPs provide user-level campaign attribution to AlgoLift via our clients. The introduction of iOS 14 will disrupt that data pipeline and, without a replacement, prevent us from predicting future ROAS for campaigns. We are therefore developing probabilistic attribution to give our clients a seamless transition to the IDFA-less world. We will still be able to use attribution provided by the client’s MMP, but our internal statistical models will fill the gaps for users with partial or no deterministic attribution.

Budget Allocation without Full Deterministic Attribution

We are developing probabilistic attribution using our deterministically attributed, user-level historical data with a focus on several questions:

  • What combination of data must we leverage from the ad networks to make this approach viable?
  • What data should advertisers record within the SKAdNetwork conversion event mechanism?
  • What impact will this type of attribution have on the performance of user acquisition automation?
  • How do we meaningfully incorporate deterministic attribution from MMPs into probabilistic attribution?

Figure 1 illustrates the data inputs into the probabilistic attribution model:

  1. Anonymous user-level data: geo, device make & model, app version, in-app event and revenue data
  2. SKAdNetwork data: campaign ID, source-app-id, in-app event and revenue data
  3. Ad network campaign data: campaign ID, geo, device make & model, app version
  4. Deterministic attribution from MMPs: opt-in iOS users and users identified by other means within Apple’s terms of service

Figure 1 — Proposed scheme for probabilistic pROAS calculations based on anonymous user-level, SKAdNetwork, and Ad Network Data. We have also provisioned for users to be attributed by other means if they are Android or opt-in iOS users. Budget Allocation via API integration with multiple ad networks is shown as an example use-case.

We are not:

  • Fingerprinting based on IP address or user agent
  • Rejecting MMP-based attribution

With the iOS 14 changes, advertisers will be left with multiple sets of data that can’t be connected in an obvious way. We believe the probabilistic attribution framework we are developing is the most robust way to use these datasets together toward the end goal of maximizing ROAS across all channels.

We will take all information about a user’s origin from all data providers to probabilistically calculate pROAS. Internally, each user will be assigned a campaign membership probability as depicted in Figure 2. In this example, App User 1 may belong to one of three separate campaigns based on the available data about that user. There is also a 10% chance the user installed the app without clicking on an ad (organic). Although User 1 is most likely to have installed the app after clicking on an ad from Campaign 2, we can’t say for certain that this is how they installed the app; we can only assign a probability to each discrete possibility.

Figure 2 — An example distribution of an app user over multiple user acquisition campaigns based on probabilistic attribution

Note that this approach can accommodate Android and opt-in iOS users easily by setting their membership probability to 100% for a deterministically attributed campaign, and using that piece of data to properly update other users’ projected membership probabilities as well.
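One way to picture a membership distribution is as a mapping from campaign to probability that sums to 1, with deterministically attributed users placing all of their mass on a single campaign. The sketch below is a minimal illustration under that assumption. The function name `membership_distribution` and every score except the 10% organic share are hypothetical. It does not show how a known attribution updates other users' probabilities, since that step depends on the underlying statistical model.

```python
def membership_distribution(raw_scores, deterministic_campaign=None):
    """Return campaign-membership probabilities for one user that sum to 1."""
    if deterministic_campaign is not None:
        # Deterministically attributed user: all mass on the known campaign.
        distribution = {campaign: 0.0 for campaign in raw_scores}
        distribution[deterministic_campaign] = 1.0
        return distribution
    total = sum(raw_scores.values())
    if total <= 0:
        raise ValueError("at least one campaign must have a positive score")
    # Normalize the non-negative scores into probabilities.
    return {campaign: score / total for campaign, score in raw_scores.items()}


# Hypothetical scores; only the 10% organic share comes from the example above.
scores = {"C1": 0.25, "C2": 0.5, "C3": 0.15, "Organic": 0.10}

user_1 = membership_distribution(scores)
opt_in_user = membership_distribution(scores, deterministic_campaign="C2")
most_likely = max(user_1, key=user_1.get)
print(user_1, opt_in_user, most_likely)
```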

Probabilistic versus “Winner Takes All” Approach

The reason for this approach can be demonstrated with a simple numerical example. Table 1 shows 5 hypothetical users that can belong to one of three campaigns or organic. Their projected LTV and estimated membership probabilities are given. The “winner takes all” membership (the campaign with the highest membership probability) is highlighted in red.

Table 1: Hypothetical revenue and membership probabilities for 5 users with 3 campaigns. User 3 was an opt-in iOS user with fully known attribution to campaign C2.

Table 2 summarizes the pROAS (pRevenue/Spend) per campaign for this example calculated by:

  • Probabilistic Attribution: each user’s revenue is distributed across campaigns, weighted by membership probability
  • Winner Takes All: all of each user’s revenue is assigned to the campaign with the highest membership probability
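The two calculations can be sketched side by side as follows. The users, LTV values, membership probabilities, and spend figures are hypothetical and are not the values from Table 1. The function names are likewise illustrative.

```python
from collections import defaultdict


def probabilistic_revenue(users):
    """Split each user's predicted LTV across campaigns by membership probability."""
    revenue = defaultdict(float)
    for user in users:
        for campaign, probability in user["membership"].items():
            revenue[campaign] += probability * user["ltv"]
    return dict(revenue)


def winner_takes_all_revenue(users):
    """Assign each user's full predicted LTV to their most probable campaign."""
    revenue = defaultdict(float)
    for user in users:
        winner = max(user["membership"], key=user["membership"].get)
        revenue[winner] += user["ltv"]
    return dict(revenue)


def proas(revenue, spend):
    """pROAS per paid campaign: predicted revenue divided by spend."""
    return {c: revenue.get(c, 0.0) / s for c, s in spend.items() if s > 0}


# Hypothetical data, for illustration only.
users = [
    {"ltv": 8.0, "membership": {"C1": 0.5, "C2": 0.25, "C3": 0.125, "Organic": 0.125}},
    # A deterministically attributed user: probability 1 for C2.
    {"ltv": 4.0, "membership": {"C1": 0.0, "C2": 1.0, "C3": 0.0, "Organic": 0.0}},
]
spend = {"C1": 4.0, "C2": 4.0, "C3": 2.0}

prob_rev = probabilistic_revenue(users)
wta_rev = winner_takes_all_revenue(users)
print(proas(prob_rev, spend))
print(proas(wta_rev, spend))
```

Both methods distribute the same total predicted revenue. With this toy data, winner takes all credits nothing to C3 or Organic, while the probabilistic split gives each a share.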

Table 2: Campaign revenue and ROAS for the proposed probabilistic versus winner takes all approaches.

Table 2 illustrates that the “Winner Takes All” strategy is heavily biased toward campaigns C1 and C2: it assigns no revenue to C3 or Organic. Across hundreds of thousands of users, ignoring the “long tail” probability that a user might have come from any of several campaigns can misallocate potentially millions of dollars of revenue. This strategy is mostly right for users 1, 3 and 4, and very wrong for everyone else. In short, AlgoLift must account for the long tail.
