Customer Retention analysis with SQL

Nnamdi
5 min readSep 18, 2020

--

Retention in simple terms means ability to keep new purchasers into repeat buyers.

Plotting a retention cohort helps a Growth marketer understand how well a product is performing. For a healthy product you will notice a decline in your retention graph before noticing a flatline or a smiling graph. For products experiencing a high churn rate the graph continues to decline until it eventually reaches 0.

Churn is the opposite of retention and this is when you lose the users you had initially acquired.

A common mistake made by many Growth Marketers is to focus solely on acquiring new users and very little time is spent figuring out how to keep current users. Retention is a very important part of the growth funnel. I wrote about creating a growth funnel and you can read about why Retention is so important.

In this article, I will be showing you how to use SQL to build a retention cohort. I decided to write this article because I had experienced difficulties trying to do a retention analysis with SQL and most of the articles I found weren’t really helpful to me.

Warning: We will be writing multiple subqueries, I will try my best to explain them as best as I can.

First thing we will do is check our Orders table and see the first time a user performed a transaction. This is important because we want to do a retention cohort based on activation, that is, we want to know the users who placed their first ever order in a particular week and see how that cohort performs week on week. Note: It is also possible for you to check retention cohort based on signups, age, location, channel etc you can go as granularly as you want as long as you have the data!

select min(extract(week from t.order_date)) as first_order, u.id as id
FROM
`order_table` as t
INNER JOIN

`user_table` as u
ON
u.id = t.userid
WHERE

t.order_status = 'SUCCESS'

GROUP BY 2

This will return the user id of every user that has performed a transaction as well as the week they performed this transaction.

Next, we will nest this query in another query that returns 3 columns, the week the user placed their first order, the week when/if they placed another order as well as the user id.

SELECT  extract(week from t.order_date) as subs_order, 
first_order,
first_time_order.id as users
FROM(
SELECT
min(extract(week from t.order_date)) as first_order,
u.id as id
FROM
`order_table` as t
INNER JOIN

`user_table` as u
ON
u.id = t.userid
WHERE

t.order_status = 'SUCCESS'

GROUP BY 2) as first_time_order
INNER JOIN
`order_table` as t
ON
first_time_order.id = t.userid
GROUP BY 1,2,3

This should give you a result that looks something like the image below

example of how your result should look

Now that we have gotten the user id’s , the week they placed their first order and the week they made subsequent orders, we can then build an 8 week cohort analysis to see how many distinct users that activated in a particular week and their retention over time.

To do this we will nest all the queries we have written so far in an SQL query that compares the difference between the subs_order and first_order

SELECT first_order,
round(count(DISTINCT IF( (subs_order - first_order) = 0, users, null))/count(DISTINCT IF( (subs_order - first_order) = 0, users, null)),2)as week_0,
round(count(DISTINCT IF( (subs_order- first_order =1), users,null))/count(DISTINCT IF( (subs_order - first_order) = 0, users, null)),2)as week_1,
round(count(DISTINCT IF( (subs_order - first_order =2), users,null))/count(DISTINCT IF( (subs_order - first_order) = 0, users, null)),2) as week_2,
round(count(DISTINCT IF((subs_order-first_order =3), users,null))/count(DISTINCT IF( (subs_order - first_order) = 0, users, null)),2) as week_3,
round(count(DISTINCT IF(subs_order-first_order =4, users,null))/count(DISTINCT IF( (subs_order - first_order) = 0, users, null)),2) as week_4,
round(count(DISTINCT IF((subs_order-first_order =5), users,null))/count(DISTINCT IF( (subs_order - first_order) = 0, users, null)),2) as week_5,
round(count(DISTINCT IF((subs_order-first_order =6), users,null))/count(DISTINCT IF( (subs_order - first_order) = 0, users, null)),2) as week_6,
round(count(DISTINCT IF((subs_order-first_order =7), users,null))/count(DISTINCT IF( (subs_order - first_order) = 0, users, null)),2) as week_7,
round(count(DISTINCT IF((subs_order-first_order =8), users,null))/count(DISTINCT IF( (subs_order - first_order) = 0, users, null)),2) as week_8
FROM(SELECT extract(week from t.order_date) as subs_order,
first_order,
first_time_order.id
FROM(
SELECT
min(extract(week from t.order_date)) as first_order,
u.id as id
FROM
`order_table` as t
INNER JOIN

`user_table` as u
ON
u.id = t.userid
WHERE

t.order_status = 'SUCCESS'

GROUP BY 2) as first_time_order
INNER JOIN
`order_table` as t
ON
first_time_order.id = t.userid
GROUP BY 1,2,3)
GROUP BY 1
order by 1 asc

In this final query what we did was to select the first_order, then counted distinct users if subsequent order week subtracted from week they placed their first order was equal to 0, this shows us users who placed their first order and since retention is shown as percentages we divided it by itself. Week_0 will always be 100% because that is the first time they performed transaction.

To find week_1, week_2 etc we just subtract first_order from subs_order and if it was equal to 1 then that was week_1, if it was equal to 2 then that was week_2 etc. The round function is so the output is rounded to two decimal places.

If we run this we will get very neat table that shows us to the weekly user retention of the different cohorts.

You can then export this data into data studio or your favourite data viz tool to visualize. If you are using BigQuery you can use the data connector to sync your data with Google sheets.

Regardless of what data visualization tool you use you will want to ensure you have a graph like below

Note: If your cohorts have’t stayed up to 8 weeks you might want to reduce the weeks to 4, alternatively you may just decide to just output the raw numbers ie do not divide by week_0, this will be good so as your cohort keeps spending more time you can track their retention.

Do not limit yourself to a weekly retention, depending on your type of product you may be more interested in a daily or monthly cohort. The logic is still the same.

I hope you have found this useful. If you have any questions you can DM me on twitter or on LinkedIn and I will be happy to answer them.

--

--

Nnamdi
Nnamdi

No responses yet