Why LTV is essential for UA, and D7 ROAS should be retired

By Liftoff | October 28, 2020

In mobile UA, a common practice is to buy against ROAS (return on ad spend) targets based on revenue measured in the first 7 days after a user installs the app, i.e. D7 ROAS. D7 ROAS provides an early indicator of campaign performance that’s easily measured with the help of MMPs and campaign-level attribution. It is used primarily because it offers better insight into post-install performance than CPI (cost per install) or D1 ROAS, and it generally captures a large percentage of the users who convert into payers or pass key event benchmarks early on.

D7 ROAS targets are often set based on the historical relationship between a cohort’s monetization at D7 and at the advertiser’s true payback window. For example, if historical cohorts have a D365 / D7 LTV ratio of 20, then the advertiser would buy against a 5% D7 ROAS target to hit a 100% payback at D365.
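As a minimal sketch, the target calculation is just the payback goal divided by the historical multiplier; the numbers below mirror the illustrative example above and are not from real data:

```python
# Illustrative only: deriving a D7 ROAS target from a historical D365/D7 LTV ratio.
historical_d365_d7_ratio = 20   # cohorts have historically earned 20x their D7 revenue by D365
payback_goal_at_d365 = 1.00     # 100% return on ad spend by day 365

# The D7 ROAS target is the payback goal divided by the multiplier.
d7_roas_target = payback_goal_at_d365 / historical_d365_d7_ratio
print(f"D7 ROAS target: {d7_roas_target:.1%}")   # -> 5.0%
```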

At best, these targets are continually recalculated by UA teams as apps go through product updates and long-term retention changes. At worst, the target is calculated once and slowly evolves into a magic number of unknown origin.

Why the current paradigm is inadequate

In reality, Day 365 and Day 7 LTV are not proportional, nor should one expect them to be. Let’s assess two cohort “LTV curves” from a top-grossing mobile game:

In this simple example, the red cohort shows a higher D7 LTV but worse retention, and by D60 the curves cross. By D7, the red cohort had already spent 10% of its eventual D365 LTV, while the blue cohort had spent only 4% and so had more of its future value still to come. The result is that the red cohort’s higher D7 LTV translates into much lower D180 and D365 LTV, a good example of D7 failing to represent long-term value.
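To make the crossover concrete, here is a small, made-up sketch of two cohort LTV curves with the properties described above; the decay rates and amounts are chosen purely for illustration and are not taken from the game’s actual data:

```python
import numpy as np

days = np.arange(1, 366)

# Hypothetical per-install daily revenue curves (dollars/day), chosen only to
# reproduce the qualitative shapes described above.
red_daily = 0.066 * np.exp(-days / 66)    # monetizes early, retains poorly
blue_daily = 0.050 * np.exp(-days / 200)  # starts slower, retains better

red_ltv = np.cumsum(red_daily)
blue_ltv = np.cumsum(blue_daily)

for d in (7, 60, 365):
    print(f"D{d}: red={red_ltv[d-1]:.2f}, blue={blue_ltv[d-1]:.2f}")
# Red leads at D7, the curves cross near D60, and blue is well ahead by D365.

print(f"red  D7/D365: {red_ltv[6] / red_ltv[364]:.0%}")   # ~10%
print(f"blue D7/D365: {blue_ltv[6] / blue_ltv[364]:.0%}")  # ~4%
```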

Only if cohorts exhibited identical retention and purchase rates over time could we expect the D7 payback percentage to be constant. In reality, different cohorts have different retention rates, and their LTV curves differ in shape. This is especially true when comparing cohorts across media sources or campaign types, which may fundamentally differ in behavior.

Adding more multipliers segmented by cohort performance partially corrects the misprediction, but it quickly becomes complex: the team now has many different ROAS targets across networks, and there is still no answer for newer channels or those with limited history. Furthermore, the multipliers are not static even at the channel level, since a change in campaign structure can quickly change the achievable retention curves. Conditional multipliers are a clumsy half-step toward solving the problem.

Examining the relationship between D7 & D365

To further examine why it’s insufficient to assume a proportional relationship between D7 and D365, the graph below shows a real-life example of the relationship between Day 7 ARPI (average revenue per install) and the Day 365 / Day 7 multiplier. If D365 LTV were proportional to D7, this plot would show no clear relationship; any variation in the multiplier would just be noise. Instead, there is a clear negative correlation between the two, which shows that the D7 and D365 LTVs are not simply proportional. If this app developer acquired installs against a constant multiplier, they would become biased toward ad campaigns that exhibit early conversion and revenue, as opposed to those that might actually have more long-term value thanks to better late conversion and retention. Favoring early over long-term performance defeats the original goal of the multiplier. The only way to avoid such biases is to operate true LTV forecasts that capture each campaign’s projected D365 revenue.
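A simple way to run this diagnostic on your own data is sketched below. The column names and figures are hypothetical, and this is only a check of the proportionality assumption, not the modeling approach described later:

```python
import pandas as pd

# Illustrative cohort-level data: installs and cumulative revenue at D7 and D365.
cohorts = pd.DataFrame({
    "installs":     [1000, 1200,  800,  950],
    "revenue_d7":   [ 500,  900,  300,  700],
    "revenue_d365": [9000, 12000, 7500, 10500],
})

cohorts["d7_arpi"] = cohorts["revenue_d7"] / cohorts["installs"]
cohorts["d365_d7_multiplier"] = cohorts["revenue_d365"] / cohorts["revenue_d7"]

# If D365 LTV were proportional to D7 LTV, the multiplier would be roughly
# uncorrelated with D7 ARPI; a strong negative correlation says otherwise.
print(cohorts[["d7_arpi", "d365_d7_multiplier"]].corr())
```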

Comparing LTV projections to D7 ROAS

How much better are LTV projections than D7 ROAS? To answer this, we can compare LTV projections made at day 7 of a user’s lifetime against a naive, multiplier-based approach. The error metric is the absolute error, in dollars, between the predicted and actual LTV. Instead of looking at individual users, which display high variance and won’t tell us much about model performance, we look at bootstrapped samples of 1,000 users each, drawn randomly from a larger pool of users.
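For readers who want to reproduce this kind of comparison on their own data, here is a minimal sketch of the bootstrap evaluation; the function and variable names are hypothetical and this is not AlgoLift’s actual evaluation code:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_abs_errors(predicted_ltv, actual_ltv, n_samples=2000, cohort_size=1000):
    """Absolute error (in dollars per user) of mean LTV over bootstrapped cohorts."""
    predicted_ltv = np.asarray(predicted_ltv, dtype=float)
    actual_ltv = np.asarray(actual_ltv, dtype=float)
    errors = np.empty(n_samples)
    for i in range(n_samples):
        # Sample a cohort of users with replacement and compare mean predicted vs. actual LTV.
        idx = rng.integers(0, len(actual_ltv), size=cohort_size)
        errors[i] = abs(predicted_ltv[idx].mean() - actual_ltv[idx].mean())
    return errors

# Example comparison of a model against a naive D7-revenue-times-multiplier prediction:
# model_errors = bootstrap_abs_errors(model_predictions, actual_d365_ltv)
# naive_errors = bootstrap_abs_errors(d7_revenue * 20, actual_d365_ltv)
# print(model_errors.std(), naive_errors.std())
```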

Three estimations are shown below:

  1. Projections from one of AlgoLift’s proprietary LTV models
  2. Projections from a naive (multiplier-based) model, assuming the best case of a stable multiplier and long history
  3. Projections from a naive model with limited history

Even in the ideal case, the naive model shows a much larger standard deviation of errors (~2x) as well as a long tail of over-predicted cohorts. In the non-ideal case, the distribution is even wider and shows considerable bias. In contrast, the AlgoLift LTV projections have a symmetrical, narrower distribution, with a much smaller standard deviation and no long tail.

Projecting LTV is not straightforward, but a well-maintained LTV projection pipeline has the following advantages:

  • 1-year projections that are typically within a margin of ~20% for each install-date cohort, as opposed to multipliers that can err by 50% or more even in larger aggregations.
  • An approach that actually models customer behavior across time, rather than assuming proportionality of LTV across time windows.
  • Models that can project 1-year LTV without the year or more of representative history needed to calculate a multiplier. Most apps cannot satisfy that requirement, since it assumes the in-app user experience hasn’t changed significantly over the last year, which is rare.
  • Automated retraining, so that UA teams do not have to manually recalculate multipliers or early ROAS targets repeatedly.