Predicting Customer Value: Applying BG-NBD and Gamma-Gamma Models in Retail Analytics

Mustafa Germec, PhD
Python in Plain English
15 min read · Apr 17, 2024


Abstract

This project applies the BG-NBD (Beta Geometric/Negative Binomial Distribution) and Gamma-Gamma models to estimate Customer Lifetime Value (CLTV) in a retail context. The dataset contains customer purchase history from an omnichannel retail platform, with variables such as purchase frequency, recency, and monetary value. The BG-NBD model predicts the number of future transactions and the probability of customer churn, while the Gamma-Gamma model estimates the monetary value of those transactions. Together, these models help businesses make informed decisions on sales, marketing, and resource allocation by quantifying the future value of their customer base. The article also details the process of data preparation, model fitting, and CLTV calculation, providing insights into customer behavior and value segmentation.

Business Problem

FLO wants to determine a roadmap for its sales and marketing activities. For the company to make medium- and long-term plans, it needs to estimate the potential value that existing customers will provide in the future.

The dataset consists of information on the past shopping behavior of customers who made their last purchase from FLO via OmniChannel (both online and offline shopping) in 2020–2021.

Number of variables: 12
Number of observations: 19,945
Size of dataset: 2.7 MB

Variables

master_id: Unique customer number
order_channel: Which channel of the shopping platform is used (Android, iOS, Desktop, Mobile)
last_order_channel: Channel where the last purchase was made
first_order_date: The date of the customer’s first purchase
last_order_date: The last shopping date of the customer
last_order_date_online: The last shopping date of the customer on the online platform
last_order_date_offline: The last shopping date of the customer on the offline platform
order_num_total_ever_online: Total number of purchases made by the customer on the online platform
order_num_total_ever_offline: Total number of purchases made by the customer offline
customer_value_total_ever_offline: The total price paid by the customer for offline purchases
customer_value_total_ever_online: The total price paid by the customer for online purchases
interested_in_categories_12: List of categories the customer has shopped in the last 12 months

The BG-NBD (Beta Geometric/Negative Binomial Distribution) model and the Gamma-Gamma model are statistical models used in customer analytics, particularly for predicting customer behavior and calculating Customer Lifetime Value (CLTV). Let’s delve into each model separately:

BG-NBD Model

The BG-NBD model is a probabilistic model that predicts the number of future transactions a customer will make over a certain period. It is based on two types of customer behavior:

Transaction Behavior

The model assumes that the number of transactions a customer makes follows a Negative Binomial Distribution (NBD). This part of the model accounts for the transactional heterogeneity across customers, meaning that different customers have different purchasing patterns.

Dropout Behavior

The model also incorporates a customer’s probability of becoming inactive or “dropping out.” This is modeled using a Beta Geometric (BG) distribution. The BG distribution captures the heterogeneity in the dropout process, recognizing that customers have different propensities to churn.

The combination of these two distributions allows the BG-NBD model to estimate the expected number of transactions for each customer and the probability that a customer is still active. It is particularly useful for businesses with repeat purchase patterns, such as subscription services or regular consumable goods.
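Here is a minimal sketch of fitting the model with the lifetimes library, using a hypothetical toy summary (the frequency, recency, and T values below are made up purely for illustration):

import pandas as pd
from lifetimes import BetaGeoFitter

# Hypothetical RFM summary: frequency = repeat purchases,
# recency = age at last purchase, T = customer age (same time units)
toy = pd.DataFrame({
    'frequency': [5, 1, 12, 3, 8, 2],
    'recency':   [30.0, 2.0, 45.0, 12.0, 38.0, 5.0],
    'T':         [38.0, 20.0, 50.0, 25.0, 42.0, 30.0],
})

bgf_demo = BetaGeoFitter(penalizer_coef=0.001)
bgf_demo.fit(toy['frequency'], toy['recency'], toy['T'])

# Expected transactions per customer over the next 12 time units
expected = bgf_demo.conditional_expected_number_of_purchases_up_to_time(
    12, toy['frequency'], toy['recency'], toy['T']
)

# Probability that each customer is still active (has not dropped out)
p_alive = bgf_demo.conditional_probability_alive(
    toy['frequency'], toy['recency'], toy['T']
)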

Gamma-Gamma Model

The Gamma-Gamma model is often used in conjunction with the BG-NBD model to predict the monetary value of future transactions. While the BG-NBD model focuses on the purchase frequency, the Gamma-Gamma model is concerned with the monetary aspect.

The Gamma-Gamma model assumes that:

1. The monetary value of a customer’s transactions is randomly distributed around their average transaction value.
2. The average transaction value varies across customers but does not vary over time for any given customer.
3. The distribution of average transaction values across customers follows a Gamma distribution.

By fitting the Gamma-Gamma model to historical transaction data, businesses can estimate the expected average profit from a customer’s future transactions. This model is only applicable to customers with repeat purchases, as it requires a history of transaction values to make predictions.
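A corresponding sketch with lifetimes, again on hypothetical repeat buyers (monetary_value here is each customer's average transaction value, which is what the model expects as input):

import pandas as pd
from lifetimes import GammaGammaFitter

# Hypothetical repeat buyers: frequency > 0 and positive average spend
toy = pd.DataFrame({
    'frequency':      [5, 3, 12, 2, 7, 4],
    'monetary_value': [54.0, 120.5, 33.2, 80.0, 61.7, 45.3],
})

ggf_demo = GammaGammaFitter(penalizer_coef=0.01)
ggf_demo.fit(toy['frequency'], toy['monetary_value'])

# Expected average transaction value for each customer
avg_value = ggf_demo.conditional_expected_average_profit(
    toy['frequency'], toy['monetary_value']
)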

Combining BG-NBD and Gamma-Gamma for CLTV

To calculate Customer Lifetime Value (CLTV), businesses often use the BG-NBD model to predict the number of future transactions and the Gamma-Gamma model to predict the average profit per transaction. By combining these two predictions, businesses can estimate the total future profit from a customer over a given time horizon.

The CLTV is then calculated as:

CLTV = Expected Number of Transactions (from BG-NBD) × Expected Average Profit per Transaction (from Gamma-Gamma)
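For example (with purely illustrative numbers), if the BG-NBD model predicts that a customer will make 4 transactions over the next six months and the Gamma-Gamma model estimates an average profit of $50 per transaction, that customer's six-month CLTV estimate is 4 × $50 = $200, before any discounting is applied.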

These models are valuable for businesses because they help in making informed decisions about marketing, customer retention, and resource allocation by understanding the future value of their customer base.

In the lifetimes convention used below, the frequency column is the number of repeat purchases (total purchases minus 1), recency is the customer's age at their most recent purchase (i.e., the time between first and last purchase), T is the customer's age in the chosen time units (the time between first purchase and the end of the observation period), and monetary_value is the average profit per transaction for each customer.
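When starting from a raw transaction log rather than pre-aggregated totals, lifetimes can build this summary table directly. A sketch, where the file name and column names are hypothetical:

import pandas as pd
from lifetimes.utils import summary_data_from_transaction_data

# Hypothetical transaction log: one row per purchase
transactions = pd.read_csv('transactions.csv', parse_dates=['order_date'])

summary = summary_data_from_transaction_data(
    transactions,
    customer_id_col='customer_id',
    datetime_col='order_date',
    monetary_value_col='order_value',  # adds the monetary_value column
    observation_period_end='2021-06-01',
    freq='W',  # weekly time units, matching the analysis below
)
# summary now holds frequency, recency, T, and monetary_value per customer

In this project the dataset already ships with per-customer totals, so the summary columns are computed by hand instead.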

Task 1: Understanding and Preparing Data

Importing the libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt
from lifetimes import BetaGeoFitter, GammaGammaFitter
from lifetimes.plotting import plot_period_transactions, plot_transaction_rate_heterogeneity, plot_frequency_recency_matrix
import warnings
warnings.filterwarnings('ignore')
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 500)
pd.set_option('display.float_format', lambda x: '%.2f' % x)

Read the flo.csv data and create a copy of the DataFrame.

df_ = pd.read_csv('Datasets/flo.csv')
flo = df_.copy()
flo.head()
                            master_id order_channel last_order_channel first_order_date last_order_date last_order_date_online last_order_date_offline  order_num_total_ever_online  order_num_total_ever_offline  customer_value_total_ever_offline  customer_value_total_ever_online       interested_in_categories_12
0 cc294636-19f0-11eb-8d74-000d3a38a36f Android App Offline 2020-10-30 2021-02-26 2021-02-21 2021-02-26 4.00 1.00 139.99 799.38 [KADIN]
1 f431bd5a-ab7b-11e9-a2fc-000d3a38a36f Android App Mobile 2017-02-08 2021-02-16 2021-02-16 2020-01-10 19.00 2.00 159.97 1853.58 [ERKEK, COCUK, KADIN, AKTIFSPOR]
2 69b69676-1a40-11ea-941b-000d3a38a36f Android App Android App 2019-11-27 2020-11-27 2020-11-27 2019-12-01 3.00 2.00 189.97 395.35 [ERKEK, KADIN]
3 1854e56c-491f-11eb-806e-000d3a38a36f Android App Android App 2021-01-06 2021-01-17 2021-01-17 2021-01-06 1.00 1.00 39.99 81.98 [AKTIFCOCUK, COCUK]
4 d6ea1074-f1f5-11e9-9346-000d3a38a36f Desktop Desktop 2019-08-03 2021-03-07 2021-03-07 2019-08-03 1.00 1.00 49.99 159.99 [AKTIFSPOR]

Define the outlier_thresholds and replace_with_thresholds functions required to cap (suppress) outliers

def outlier_thresholds(df, feature, q1=0.05, q3=0.95):
    # Calculate the lower and upper percentiles (here the 5th and 95th)
    Q1 = df[feature].quantile(q1)
    Q3 = df[feature].quantile(q3)

    # Calculate the interquartile range between those percentiles
    IQR = Q3 - Q1

    # Determine the outlier cutoffs (rounded to integers)
    lower_bound = int(round(Q1 - 1.5 * IQR, 0))
    upper_bound = int(round(Q3 + 1.5 * IQR, 0))

    return lower_bound, upper_bound

def replace_with_thresholds(dataframe, variable):
    lower_bound, upper_bound = outlier_thresholds(dataframe, variable)
    dataframe.loc[dataframe[variable] < lower_bound, variable] = lower_bound
    dataframe.loc[dataframe[variable] > upper_bound, variable] = upper_bound

Cap the order_num_total_ever_online, order_num_total_ever_offline, customer_value_total_ever_offline, and customer_value_total_ever_online variables if they contain outliers

variables = [col for col in flo.columns if 'ever' in col]
for variable in variables:
    replace_with_thresholds(flo, variable)
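As an optional sanity check, inspecting the capped columns should show maximum values at or below the computed upper bounds:

# Maximums should now sit at or below the computed upper bounds
flo[variables].describe([0.05, 0.95]).T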

Omnichannel means that customers shop on both online and offline platforms. Create new variables for each customer's total number of purchases and total spend.

flo['total_order_num'] = flo['order_num_total_ever_online'] + flo['order_num_total_ever_offline']
flo['total_order_value'] = flo['customer_value_total_ever_offline'] + flo['customer_value_total_ever_online']

Examine the variable types and convert the columns expressing dates to datetime.

flo.dtypes
date_variables = [col for col in flo.columns if 'date' in col]
flo[date_variables] = flo[date_variables].apply(pd.to_datetime)
flo.dtypes
# Before changing the data type
master_id object
order_channel object
last_order_channel object
first_order_date object
last_order_date object
last_order_date_online object
last_order_date_offline object
order_num_total_ever_online float64
order_num_total_ever_offline float64
customer_value_total_ever_offline float64
customer_value_total_ever_online float64
interested_in_categories_12 object
total_order_num float64
total_order_value float64
dtype: object

# After changing the data type
master_id object
order_channel object
last_order_channel object
first_order_date datetime64[ns]
last_order_date datetime64[ns]
last_order_date_online datetime64[ns]
last_order_date_offline datetime64[ns]
order_num_total_ever_online float64
order_num_total_ever_offline float64
customer_value_total_ever_offline float64
customer_value_total_ever_online float64
interested_in_categories_12 object
total_order_num float64
total_order_value float64
dtype: object

Task 2: Creating the CLTV Data Structure

Take the date two days after the last purchase in the dataset as the analysis date.

flo['last_order_date'].max()    # Timestamp('2021-05-30 00:00:00')
analysis_date = dt.datetime(2021, 6, 1)
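Hard-coding the date works here; a more reusable alternative (a small sketch) derives it from the data instead:

# Equivalent, but robust to refreshed data: two days after the last order
analysis_date = flo['last_order_date'].max() + pd.Timedelta(days=2)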

Create a new cltv dataframe containing the customer_id, recency_cltv_weekly, T_weekly, frequency, and monetary_cltv_avg values. Monetary value should be expressed as the average value per purchase, and recency and tenure should be expressed on a weekly basis.

cltv = pd.DataFrame()
cltv['customer_id'] = flo['master_id']
# Weeks between first and last purchase, and weeks of tenure
cltv['recency_cltv_weekly'] = (flo['last_order_date'] - flo['first_order_date']).dt.days / 7
cltv['T_weekly'] = (analysis_date - flo['first_order_date']).dt.days / 7
cltv['frequency'] = flo['total_order_num']
cltv['monetary_cltv_avg'] = flo['total_order_value'] / flo['total_order_num']
# Keep repeat customers only, as the Gamma-Gamma model requires
cltv = cltv[cltv['frequency'] > 1]
                                customer_id  recency_cltv_weekly  T_weekly  frequency  monetary_cltv_avg
0 cc294636-19f0-11eb-8d74-000d3a38a36f 17.00 30.57 5.00 187.87
1 f431bd5a-ab7b-11e9-a2fc-000d3a38a36f 209.86 224.86 21.00 95.88
2 69b69676-1a40-11ea-941b-000d3a38a36f 52.29 78.86 5.00 117.06
3 1854e56c-491f-11eb-806e-000d3a38a36f 1.57 20.86 2.00 60.98
4 d6ea1074-f1f5-11e9-9346-000d3a38a36f 83.14 95.43 2.00 104.99
... ... ... ... ...
19940 727e2b6e-ddd4-11e9-a848-000d3a38a36f 41.14 88.43 3.00 133.99
19941 25cd53d4-61bf-11ea-8dd8-000d3a38a36f 42.29 65.29 2.00 195.24
19942 8aea4c2a-d6fc-11e9-93bc-000d3a38a36f 88.71 89.86 3.00 210.98
19943 e50bb46c-ff30-11e9-a5e8-000d3a38a36f 98.43 113.86 6.00 168.29
19944 740998d2-b1f7-11e9-89fa-000d3a38a36f 39.57 91.00 2.00 130.98
[19945 rows x 5 columns]
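Before fitting, it is worth verifying the structural constraints the models expect (an optional check):

# recency can never exceed T, and every customer must be a repeat
# buyer with a positive average basket value
assert (cltv['recency_cltv_weekly'] <= cltv['T_weekly']).all()
assert (cltv['frequency'] > 1).all()
assert (cltv['monetary_cltv_avg'] > 0).all()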

Task 3: Establishing BG/NBD, Gamma-Gamma Models and Calculating CLTV

Fit the BG/NBD model

bgf = BetaGeoFitter(penalizer_coef=0.001)
bgf.fit(
    cltv['frequency'],
    cltv['recency_cltv_weekly'],
    cltv['T_weekly']
)
<lifetimes.BetaGeoFitter: fitted with 19945 subjects, a: 0.00, alpha: 80.49, b: 0.00, r: 3.83>
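The fitted dropout parameters (a ≈ 0, b ≈ 0) suggest the model detects essentially no dropout in this two-year window, so churn probabilities will be close to zero for everyone. If needed, each customer's probability of still being active can be computed like this:

# Probability that each customer is still active ("alive");
# with a ≈ 0 this will be ~1.0 across the board
prob_alive = bgf.conditional_probability_alive(
    cltv['frequency'],
    cltv['recency_cltv_weekly'],
    cltv['T_weekly']
)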

Estimate the expected purchases from customers within 3 months and add it to the cltv dataframe as expected_3_month.

cltv['expected_3_month'] = bgf.predict(
    4 * 3,  # 12 weeks
    cltv['frequency'],
    cltv['recency_cltv_weekly'],
    cltv['T_weekly']
)
                                customer_id  recency_cltv_weekly  T_weekly  frequency  monetary_cltv_avg  expected_3_month
0 cc294636-19f0-11eb-8d74-000d3a38a36f 17.00 30.57 5.00 187.87 0.95
1 f431bd5a-ab7b-11e9-a2fc-000d3a38a36f 209.86 224.86 21.00 95.88 0.98
2 69b69676-1a40-11ea-941b-000d3a38a36f 52.29 78.86 5.00 117.06 0.66
3 1854e56c-491f-11eb-806e-000d3a38a36f 1.57 20.86 2.00 60.98 0.69
4 d6ea1074-f1f5-11e9-9346-000d3a38a36f 83.14 95.43 2.00 104.99 0.40
... ... ... ... ... ...
19940 727e2b6e-ddd4-11e9-a848-000d3a38a36f 41.14 88.43 3.00 133.99 0.48
19941 25cd53d4-61bf-11ea-8dd8-000d3a38a36f 42.29 65.29 2.00 195.24 0.48
19942 8aea4c2a-d6fc-11e9-93bc-000d3a38a36f 88.71 89.86 3.00 210.98 0.48
19943 e50bb46c-ff30-11e9-a5e8-000d3a38a36f 98.43 113.86 6.00 168.29 0.61
19944 740998d2-b1f7-11e9-89fa-000d3a38a36f 39.57 91.00 2.00 130.98 0.41
[19945 rows x 6 columns]

Estimate the expected purchases from customers within 6 months and add it to the cltv dataframe as exp_sales_6_month.

cltv['exp_sales_6_month'] = bgf.predict(
    4 * 6,  # 24 weeks
    cltv['frequency'],
    cltv['recency_cltv_weekly'],
    cltv['T_weekly']
)
                                customer_id  recency_cltv_weekly  T_weekly  frequency  monetary_cltv_avg  expected_3_month  exp_sales_6_month
0 cc294636-19f0-11eb-8d74-000d3a38a36f 17.00 30.57 5.00 187.87 0.95 1.91
1 f431bd5a-ab7b-11e9-a2fc-000d3a38a36f 209.86 224.86 21.00 95.88 0.98 1.95
2 69b69676-1a40-11ea-941b-000d3a38a36f 52.29 78.86 5.00 117.06 0.66 1.33
3 1854e56c-491f-11eb-806e-000d3a38a36f 1.57 20.86 2.00 60.98 0.69 1.38
4 d6ea1074-f1f5-11e9-9346-000d3a38a36f 83.14 95.43 2.00 104.99 0.40 0.79
... ... ... ... ... ... ...
19940 727e2b6e-ddd4-11e9-a848-000d3a38a36f 41.14 88.43 3.00 133.99 0.48 0.97
19941 25cd53d4-61bf-11ea-8dd8-000d3a38a36f 42.29 65.29 2.00 195.24 0.48 0.96
19942 8aea4c2a-d6fc-11e9-93bc-000d3a38a36f 88.71 89.86 3.00 210.98 0.48 0.96
19943 e50bb46c-ff30-11e9-a5e8-000d3a38a36f 98.43 113.86 6.00 168.29 0.61 1.21
19944 740998d2-b1f7-11e9-89fa-000d3a38a36f 39.57 91.00 2.00 130.98 0.41 0.82
[19945 rows x 7 columns]
plot_period_transactions(bgf)
plt.show()
plot_frequency_recency_matrix(bgf)
plt.show()
plot_transaction_rate_heterogeneity(bgf)
plt.show()
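For reference: plot_period_transactions compares the actual and model-predicted counts of customers at each repeat-transaction level (a quick goodness-of-fit check), plot_frequency_recency_matrix maps expected future purchases as a function of recency and frequency, and plot_transaction_rate_heterogeneity shows the fitted Gamma distribution of individual transaction rates.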

Examine the 10 customers expected to make the most purchases within 3 and 6 months, and check whether the two lists differ.

cltv.sort_values('expected_3_month', ascending=False).head(10)
                               customer_id  recency_cltv_weekly  T_weekly  frequency  monetary_cltv_avg  expected_3_month  exp_sales_6_month
8328 1902bf80-0035-11eb-8341-000d3a38a36f 28.86 33.29 25.00 97.44 3.04 6.08
15611 4a7e875e-e6ce-11ea-8f44-000d3a38a36f 39.71 40.00 26.00 156.79 2.97 5.94
19538 55d54d9e-8ac7-11ea-8ec0-000d3a38a36f 52.57 58.71 28.00 156.49 2.74 5.49
14373 f00ad516-c4f4-11ea-98f7-000d3a38a36f 38.00 46.43 25.00 152.66 2.73 5.45
6666 53fe00d4-7b7a-11eb-960b-000d3a38a36f 9.71 13.00 17.00 206.93 2.67 5.35
7330 a4d534a2-5b1b-11eb-8dbd-000d3a38a36f 62.71 67.29 28.00 165.70 2.58 5.17
6756 27310582-6362-11ea-a6dc-000d3a38a36f 62.71 64.14 25.00 156.24 2.39 4.78
14054 645b95bc-544e-11ea-b1db-000d3a38a36f 71.43 72.00 26.00 160.99 2.35 4.69
1364 a2c95e4e-5b09-11ea-acac-000d3a38a36f 55.29 67.43 23.00 156.60 2.18 4.35
5759 dd8f7930-615f-11ea-8dd8-000d3a38a36f 32.43 47.00 19.00 65.15 2.15 4.30
cltv.sort_values('exp_sales_6_month', ascending=False).head(10)
                               customer_id  recency_cltv_weekly  T_weekly  frequency  monetary_cltv_avg  expected_3_month  exp_sales_6_month
8328 1902bf80-0035-11eb-8341-000d3a38a36f 28.86 33.29 25.00 97.44 3.04 6.08
15611 4a7e875e-e6ce-11ea-8f44-000d3a38a36f 39.71 40.00 26.00 156.79 2.97 5.94
19538 55d54d9e-8ac7-11ea-8ec0-000d3a38a36f 52.57 58.71 28.00 156.49 2.74 5.49
14373 f00ad516-c4f4-11ea-98f7-000d3a38a36f 38.00 46.43 25.00 152.66 2.73 5.45
6666 53fe00d4-7b7a-11eb-960b-000d3a38a36f 9.71 13.00 17.00 206.93 2.67 5.35
7330 a4d534a2-5b1b-11eb-8dbd-000d3a38a36f 62.71 67.29 28.00 165.70 2.58 5.17
6756 27310582-6362-11ea-a6dc-000d3a38a36f 62.71 64.14 25.00 156.24 2.39 4.78
14054 645b95bc-544e-11ea-b1db-000d3a38a36f 71.43 72.00 26.00 160.99 2.35 4.69
1364 a2c95e4e-5b09-11ea-acac-000d3a38a36f 55.29 67.43 23.00 156.60 2.18 4.35
5759 dd8f7930-615f-11ea-8dd8-000d3a38a36f 32.43 47.00 19.00 65.15 2.15 4.30

Based on the analysis, the top 10 customers by expected purchases are identical for the 3-month and 6-month horizons; this is expected, since with negligible dropout the 6-month forecasts are essentially the 3-month forecasts doubled, leaving the ranking unchanged. These customers have recency values close to their tenure (T_weekly), meaning they have purchased recently, and very high frequencies (17–28 purchases), meaning they buy often. Most also have above-average basket values. They are expected to bring significant revenue to the company in the coming months.

Fit the Gamma-Gamma model. Estimate the expected average transaction value per customer and add it to the cltv dataframe as exp_average_value.

gg = GammaGammaFitter(penalizer_coef=0.01)
gg.fit(
    cltv['frequency'],
    cltv['monetary_cltv_avg']
)
<lifetimes.GammaGammaFitter: fitted with 19945 subjects, p: 4.15, q: 0.47, v: 4.08>

cltv['exp_average_value'] = gg.conditional_expected_average_profit(
    cltv['frequency'],
    cltv['monetary_cltv_avg']
)
                                customer_id  recency_cltv_weekly  T_weekly  frequency  monetary_cltv_avg  expected_3_month  exp_sales_6_month  exp_average_value
0 cc294636-19f0-11eb-8d74-000d3a38a36f 17.00 30.57 5.00 187.87 0.95 1.91 193.63
1 f431bd5a-ab7b-11e9-a2fc-000d3a38a36f 209.86 224.86 21.00 95.88 0.98 1.95 96.67
2 69b69676-1a40-11ea-941b-000d3a38a36f 52.29 78.86 5.00 117.06 0.66 1.33 120.97
3 1854e56c-491f-11eb-806e-000d3a38a36f 1.57 20.86 2.00 60.98 0.69 1.38 67.32
4 d6ea1074-f1f5-11e9-9346-000d3a38a36f 83.14 95.43 2.00 104.99 0.40 0.79 114.33
... ... ... ... ... ... ... ...
19940 727e2b6e-ddd4-11e9-a848-000d3a38a36f 41.14 88.43 3.00 133.99 0.48 0.97 141.36
19941 25cd53d4-61bf-11ea-8dd8-000d3a38a36f 42.29 65.29 2.00 195.24 0.48 0.96 210.72
19942 8aea4c2a-d6fc-11e9-93bc-000d3a38a36f 88.71 89.86 3.00 210.98 0.48 0.96 221.78
19943 e50bb46c-ff30-11e9-a5e8-000d3a38a36f 98.43 113.86 6.00 168.29 0.61 1.21 172.65
19944 740998d2-b1f7-11e9-89fa-000d3a38a36f 39.57 91.00 2.00 130.98 0.41 0.82 142.09
[19945 rows x 8 columns]
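One caveat worth checking: the Gamma-Gamma model assumes that purchase frequency and average transaction value are roughly independent. A quick, optional correlation check supports or undermines that assumption:

# A correlation near zero supports the model's independence assumption
cltv[['frequency', 'monetary_cltv_avg']].corr()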

Calculate 6-month CLTV, add it to the dataframe with the name cltv_df, and observe the 20 customers with the highest CLTV values.

cltv_df = gg.customer_lifetime_value(
    bgf,
    cltv['frequency'],
    cltv['recency_cltv_weekly'],
    cltv['T_weekly'],
    cltv['monetary_cltv_avg'],
    time=6,            # horizon in months
    freq='W',          # frequency/recency/T are in weeks
    discount_rate=0.01
)

cltv['cltv_df'] = cltv_df
cltv.sort_values('cltv_df', ascending=False).head(20)
                                customer_id  recency_cltv_weekly  T_weekly  frequency  monetary_cltv_avg  expected_3_month  exp_sales_6_month  exp_average_value  cltv_df
9055 47a642fe-975b-11eb-8c2a-000d3a38a36f 2.86 7.86 4.00 1065.80 1.06 2.13 1101.98 2458.35
13880 7137a5c0-7aad-11ea-8f20-000d3a38a36f 6.14 13.14 11.00 394.09 1.90 3.80 399.09 1591.37
8868 9ce6e520-89b0-11ea-a6e7-000d3a38a36f 3.43 34.43 8.00 601.23 1.23 2.47 611.49 1584.70
6402 851de3b4-8f0c-11eb-8cb8-000d3a38a36f 8.29 9.43 2.00 862.69 0.78 1.56 923.68 1507.20
14858 031b2954-6d28-11eb-99c4-000d3a38a36f 14.86 15.57 3.00 743.59 0.85 1.71 778.05 1392.35
6717 40b4f318-9dfb-11eb-9c47-000d3a38a36f 27.14 33.86 7.00 544.70 1.14 2.27 555.41 1324.24
11694 90f1b7f2-bbad-11ea-a0c9-000d3a38a36f 47.29 48.00 6.00 647.34 0.92 1.84 662.11 1275.11
1853 f02473b0-43c3-11eb-806e-000d3a38a36f 17.29 23.14 2.00 835.88 0.67 1.35 895.04 1267.18
11179 d2e74a36-3228-11eb-860c-000d3a38a36f 1.14 26.29 3.00 750.57 0.77 1.53 785.34 1264.37
7936 ae4ce104-dbd4-11ea-8757-000d3a38a36f 3.71 42.00 3.00 844.35 0.67 1.34 883.29 1239.61
9738 3a27b334-dff4-11ea-acaa-000d3a38a36f 40.00 41.14 3.00 837.06 0.67 1.35 875.67 1237.59
7312 90befc98-925a-11eb-b584-000d3a38a36f 4.14 8.86 6.00 431.33 1.32 2.64 441.40 1222.48
7171 77e66e92-31fa-11eb-860c-000d3a38a36f 16.86 26.29 5.00 566.76 0.99 1.98 582.44 1212.44
15516 9083981a-f59e-11e9-841e-000d3a38a36f 63.57 83.86 4.00 971.50 0.57 1.14 1004.57 1204.68
10876 ae149d98-9b6a-11eb-9c47-000d3a38a36f 6.14 7.14 9.00 317.48 1.76 3.51 322.51 1188.72
6666 53fe00d4-7b7a-11eb-960b-000d3a38a36f 9.71 13.00 17.00 206.93 2.67 5.35 208.74 1170.99
18997 41231c72-566a-11eb-9e65-000d3a38a36f 2.57 4.57 7.00 344.01 1.53 3.05 351.00 1125.00
16087 4031bc1e-c52a-11ea-9dde-000d3a38a36f 38.71 46.43 7.00 512.40 1.02 2.05 522.51 1122.39
1775 020e2b84-5bbb-11eb-8dbd-000d3a38a36f 1.00 18.71 5.00 443.06 1.07 2.14 455.51 1020.57
6857 0515a7ec-d49f-11ea-9838-000d3a38a36f 41.14 43.29 16.00 247.93 1.92 3.84 250.18 1009.14

Task 4: Creating Segments According to cltv_df Value

Divide all customers into 4 groups (segments) according to the 6-month cltv_df value and add the segment names to the dataset.

cltv['cltv_segment'] = pd.qcut(cltv['cltv_df'], 4, labels=['D', 'C', 'B', 'A'])
                                customer_id  recency_cltv_weekly  T_weekly  frequency  monetary_cltv_avg  expected_3_month  exp_sales_6_month  exp_average_value  cltv_df cltv_segment
0 cc294636-19f0-11eb-8d74-000d3a38a36f 17.00 30.57 5.00 187.87 0.95 1.91 193.63 387.52 A
1 f431bd5a-ab7b-11e9-a2fc-000d3a38a36f 209.86 224.86 21.00 95.88 0.98 1.95 96.67 197.91 B
2 69b69676-1a40-11ea-941b-000d3a38a36f 52.29 78.86 5.00 117.06 0.66 1.33 120.97 168.73 B
3 1854e56c-491f-11eb-806e-000d3a38a36f 1.57 20.86 2.00 60.98 0.69 1.38 67.32 97.46 D
4 d6ea1074-f1f5-11e9-9346-000d3a38a36f 83.14 95.43 2.00 104.99 0.40 0.79 114.33 95.35 D
... ... ... ... ... ... ... ... ... ...
19940 727e2b6e-ddd4-11e9-a848-000d3a38a36f 41.14 88.43 3.00 133.99 0.48 0.97 141.36 143.86 C
19941 25cd53d4-61bf-11ea-8dd8-000d3a38a36f 42.29 65.29 2.00 195.24 0.48 0.96 210.72 212.09 B
19942 8aea4c2a-d6fc-11e9-93bc-000d3a38a36f 88.71 89.86 3.00 210.98 0.48 0.96 221.78 223.80 B
19943 e50bb46c-ff30-11e9-a5e8-000d3a38a36f 98.43 113.86 6.00 168.29 0.61 1.21 172.65 219.82 B
19944 740998d2-b1f7-11e9-89fa-000d3a38a36f 39.57 91.00 2.00 130.98 0.41 0.82 142.09 121.57 C
[19945 rows x 10 columns]

Make 6-month action recommendations to the management for the segments.

cltv.groupby('cltv_segment').agg({
    'recency_cltv_weekly': ['mean', 'sum'],
    'T_weekly': ['mean', 'sum'],
    'frequency': ['mean', 'sum'],
    'monetary_cltv_avg': ['mean', 'sum'],
    'exp_sales_6_month': ['mean', 'sum'],
    'exp_average_value': ['mean', 'sum'],
    'cltv_df': ['mean', 'sum']
})
             recency_cltv_weekly        T_weekly             frequency      monetary_cltv_avg   exp_sales_6_month   exp_average_value           cltv_df
                mean        sum      mean        sum      mean       sum      mean         sum     mean       sum     mean         sum     mean         sum
cltv_segment
D             138.51  690772.57    161.73  806528.29      3.77  18821.00     92.81   462857.90     0.82   4094.58    98.33   490368.34    80.48   401331.61
C              92.71  462256.00    112.81  562491.86      4.38  21830.00    125.86   627533.90     1.05   5212.31   132.33   659788.87   137.94   687755.47
B              83.01  413870.29    101.33  505242.86      5.13  25592.00    160.53   800382.01     1.19   5953.44   167.85   836887.49   198.43   989390.83
A              66.81  333131.57     82.01  408877.71      6.36  31717.00    228.63  1139936.58     1.50   7496.35   237.88  1186088.00   352.96  1759872.40

The table above shows the mean and total values of each metric for the four segments (D, C, B, A). Let’s analyze the data more closely:

Recency (recency_cltv_weekly): Average recency decreases from segment D (138.5 weeks) to segment A (66.8 weeks). Since recency here measures the weeks between a customer’s first and last purchase, segment A’s purchase histories are more compressed and more recent.

Tenure (T_weekly): Average tenure also decreases from segment D (161.7 weeks) to segment A (82.0 weeks). Segment A customers are therefore newer customers, not longer-standing ones.

Frequency: Average frequency increases from segment D (3.8) to segment A (6.4). Despite their shorter tenure, segment A customers buy markedly more often.

Monetary Value (monetary_cltv_avg): Average purchase value rises from segment D (92.8) to segment A (228.6), so segment A customers also spend more per transaction.

Expected Sales in 6 Months (exp_sales_6_month): The expected number of purchases over the next 6 months rises from 0.82 in segment D to 1.50 in segment A.

Expected Average Value (exp_average_value): The model’s estimate of future average transaction value rises from 98.3 in segment D to 237.9 in segment A.

CLTV (cltv_df): Average 6-month CLTV climbs from 80.5 in segment D to 353.0 in segment A, more than a fourfold difference.

Overall, segment A consists of relatively new but highly engaged customers who purchase frequently and spend heavily; loyalty programs, early-access offers, and retention-focused communication are appropriate to lock in their momentum. Segment D consists of long-tenured but infrequent, low-spending customers, for whom reactivation campaigns and win-back offers make more sense. Segments B and C sit in between and can be nudged upward with cross-sell and frequency-building promotions.

Conclusions

In conclusion, the integration of the BG-NBD and Gamma-Gamma models presents a robust approach to predicting customer behavior and calculating Customer Lifetime Value (CLTV) in retail. By analyzing purchase history data, these models offer valuable insights into customer retention and profitability, guiding strategic business decisions. The study underscores the importance of CLTV as a metric for evaluating marketing efforts and optimizing resource allocation. Future research could explore the models’ applicability across different industries and customer segments, potentially enhancing their predictive accuracy and broadening their utility in diverse market contexts.
