Customer Value & Retention Intelligence

Case Objective:
To segment customers based on behavior and revenue contribution, track how their engagement changes over time, and visualize these dynamics using SQL, Power BI, and Python.
  • Technologies

    • PostgreSQL (CTE, Window Functions)
    • Cloud DB: Neon
    • Power BI (Power Query, DAX metrics)
    • Python (Pandas, NumPy, Matplotlib)
  • Business Value

    Identify, retain and grow your most valuable customer segments. Map cohort-based retention strategies and revenue dynamics across lifecycle.

  • Dashboard Highlights

    • RFM 3D Segment Explorer
    • Chord Diagram of Segment Transitions
    • Cohort Growth Ribbon
    • Bubbles: Size × Growth × Revenue
Pipeline
Neon.tech
Neon.tech
Cloud Infrastructure
  • Deployed an Azure-backed serverless PostgreSQL instance on Neon.
  • Imported a synthetic orders dataset and predefined logic for 10 RFM segments with DBeaver.
PostgreSQL
PostgreSQL
SQL Views
  • Defined customer cohorts by assigning each user to their first purchase quarter.
  • Calculated current RFM segments based on the last 6 months of customer activity.
  • Tracked historical RFM segments at the end of each past quarter.
Power BI
Power BI
Interactive Dashboard
  • Connected to PostgreSQL and loaded pre-aggregated SQL views via Power Query.
  • Built a dynamic date dimension table for time intelligence.
  • Created DAX measures (L3M growth, distinct counts, and cohort-based KPIs).
  • Designed visual dashboards using Ribbon, Bubble, Treemap, and Chord charts.
  • Published the report to PBI Service with public web access and a live DB connection.
Python
Python
Custom Visualization
  • Developed 3D RFM segment matrix visualizations using matplotlib.voxels.
A structured view of the full data pipeline — from CSV data ingestion to PostgreSQL view modeling and integration with PBI.
Data Flow Architecture
Quarterly Cohort in PostgreSQL
This SQL view assigns each customer to a cohort based on the quarter of their first purchase. It uses MIN(order_date) combined with DATE_TRUNC('quarter') and TO_CHAR() to label cohorts in a readable format like 2024q1_Cohort.
-- =============================================
-- View: public.customer_cohort_quarterly
-- Purpose: Assigns each customer to the quarter of their first purchase.
--          Provides both a timestamp and a formatted cohort label.
-- =============================================

CREATE OR REPLACE VIEW public.customer_cohort_quarterly AS
SELECT
    customer_id,

    -- The start of the quarter when the customer made their first purchase
    DATE_TRUNC('quarter', MIN(order_date)) AS cohort_start_date,

    -- A formatted label like "2024q1_Cohort" for use in reporting and visualizations
    TO_CHAR(DATE_TRUNC('quarter', MIN(order_date)), 'YYYY"q"Q') || '_Cohort' AS cohort_label

FROM orders
GROUP BY customer_id;
RFM (historical) in PostgreSQL
This SQL view calculates Recency, Frequency, and Monetary scores for each customer at the end of past quarters. Using a rolling 6-month window, it captures historical segment dynamics and assigns each customer to a segment based on RFM scoring logic. The result enables longitudinal analysis of customer value evolution over time.
-- =============================================
-- View: public.rfm_historical_segments
-- Purpose: Historical RFM segmentation per customer at end of each quarter
-- =============================================

CREATE OR REPLACE VIEW public.rfm_historical_segments AS

WITH
-- 🔹 Define fixed snapshot dates (end of quarters)
quarter_snapshots AS (
    SELECT DATE '2023-12-31' AS snapshot_date
    UNION ALL
    SELECT DATE '2024-03-31'
),

-- 🔹 Raw RFM metrics per customer within 6 months before each snapshot
rfm_historical_raw AS (
    SELECT
        o.customer_id,
        q.snapshot_date,
        COUNT(o.order_id) AS frequency,
        SUM(o.amount) AS monetary,
        DATE_PART(
            'day',
            AGE(q.snapshot_date::timestamp, MAX(o.order_date)::timestamp)
        ) AS recency
    FROM orders o
    JOIN quarter_snapshots q
      ON o.order_date BETWEEN (q.snapshot_date - INTERVAL '6 months') AND q.snapshot_date
    GROUP BY o.customer_id, q.snapshot_date
),

-- 🔹 Apply R, F, M scoring using NTILE partitioned by snapshot
rfm_scored AS (
    SELECT
        *,
        NTILE(3) OVER (PARTITION BY snapshot_date ORDER BY recency)        AS r_score,
        NTILE(3) OVER (PARTITION BY snapshot_date ORDER BY frequency DESC) AS f_score,
        NTILE(3) OVER (PARTITION BY snapshot_date ORDER BY monetary DESC)  AS m_score
    FROM rfm_historical_raw
),

-- 🔹 Combine scores and assign RFM code
rfm_final AS (
    SELECT
        customer_id,
        snapshot_date,
        'historical'::text AS segment_type,
        recency,
        frequency,
        monetary,
        r_score,
        f_score,
        m_score,
        CONCAT(r_score, f_score, m_score) AS rfm_code
    FROM rfm_scored
)

-- 🔹 Final output: RFM metrics and mapped segment name
SELECT
    rf.customer_id,
    rf.snapshot_date,
    rf.segment_type,
    rf.rfm_code,
    rf.recency,
    rf.frequency,
    rf.monetary,
    rf.r_score,
    rf.f_score,
    rf.m_score,
    sd.segment
FROM rfm_final rf
LEFT JOIN segment_definitions sd ON rf.rfm_code = sd.rfm_code;
Samples of Visualization in Power BI
Explore interactive dashboards that reflect real-time segment dynamics, cohort behavior, and revenue trends.
RFM Segment Matrix Rendered with Python
This matrix-style visualization of RFM segments was created using Python and Matplotlib. It provides a clear spatial view of how customer segments are distributed across Recency, Frequency, and Monetary scores.
Python Script for Custom Visuals in PBI
# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script: 

# dataset = pandas.DataFrame(F, M, R, segment)
# dataset = dataset.drop_duplicates()

# Paste or type your script code here:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.patches import Patch

#Segments to display
df = dataset[
    (dataset['segment'] == 'Champions') |
    (dataset['segment'] == 'Loyal Customers') |
    (dataset['segment'] == 'Potential Loyalists') |
    (dataset['segment'] == 'New Customers')
]

# Define the 3D grid dimensions
grid_size = (3, 3, 3)
voxels = np.zeros(grid_size, dtype=bool)
colors = np.empty(grid_size, dtype=object)

# Define segment colors
segment_color_map = {
    'Champions': '#FFEE00',
    'Loyal Customers': '#B0D0E0',
    'Potential Loyalists': '#CD7F32',
    'New Customers': '#50C878'
}

# Fill voxel grid and assign colors
for _, row in df.iterrows():
    r, f, m, segm = row['R'], row['F'], row['M'], row['segment']
    voxels[r-1, f-1, m-1] = True  # Adjust for zero-based indexing
    colors[r-1, f-1, m-1] = segment_color_map[segm]

fig = plt.figure(figsize=(9, 9))
ax = fig.add_subplot(111, projection='3d')

ax.voxels(voxels, facecolors=colors, edgecolors=None, alpha=0.63, linewidth=0.1)

ax.view_init(elev=20, azim=160)

ax.set_xlabel('Recency (R)')
ax.set_ylabel('Frequency (F)')
ax.set_zlabel('Monetary (M)')
ax.set_xlim(0,3)
ax.set_ylim(0,3)
ax.set_xticks([0, 1, 2, 3])
ax.set_yticks([0, 1, 2, 3])
ax.set_zticks([0, 1, 2, 3])

legend_patches = [Patch(facecolor=color, label=segm) for segm, color in segment_color_map.items()]
ax.legend(handles=legend_patches, bbox_to_anchor=(1.05, 1))

plt.tight_layout()
plt.show()
Explore the Code on GitHub!
I’m open to new job opportunities, projects, and collaborations — feel free to reach out.
Aleksei Ogarkov
Feel free to contact me.
Aleksei Ogarkov
Made on
Tilda