Business Problem is KEY to Model Evaluation of Telco Customer Churn
I’ve noticed that there are more than enough cool visualizations and mostly cookie-cutter solutions to ML problems, but not enough focus on the business side of understanding the problem and evaluating the ML model’s results.
An ML problem exists in the first place to provide actionable insight and value to the business.
So here goes…
According to this article, the cost of acquiring a new customer is almost 50x the cost of retaining an existing customer for telecom companies.
However, the solutions I’ve found for this Kaggle data set neither explain the impact of the model nor dig deep enough into what its accuracy means.
Let’s take a naive model for example using random forest classification.
It would be ‘naive’ to compare the naive model’s accuracy of 0.80 with the improved model’s accuracy of 0.79 below and conclude that the naive model is better, or even that the two are comparable.
Notice that the f1-score for Churn = ‘Yes’ is 0.83 versus the previous 0.57, a relative increase of about 46 percent.
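To see how two models can have nearly identical accuracy and very different churn-class scores, here is a minimal sketch that computes both from confusion-matrix counts. The counts for `model_a` and `model_b` are invented purely for illustration; they are not the classification reports from this post, and the `Confusion` helper is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts, with churn ('Yes') as the positive class."""
    tp: int  # churners correctly flagged
    fn: int  # churners missed
    fp: int  # loyal customers wrongly flagged
    tn: int  # loyal customers correctly left alone

    def accuracy(self) -> float:
        total = self.tp + self.fn + self.fp + self.tn
        return (self.tp + self.tn) / total

    def recall(self) -> float:
        # Share of actual churners the model catches.
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:
        # Harmonic mean of precision and recall for the churn class.
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


# Illustrative counts only (1,000 customers, 265 of them churners).
model_a = Confusion(tp=130, fn=135, fp=65, tn=670)
model_b = Confusion(tp=210, fn=55, fp=155, tn=580)

for name, m in (("model_a", model_a), ("model_b", model_b)):
    print(
        f"{name}: accuracy={m.accuracy():.2f} "
        f"churn_recall={m.recall():.2f} churn_f1={m.f1():.2f}"
    )
```

In this made-up case accuracy moves by a single point, while the share of churners caught changes by roughly thirty points, which is the number the business actually cares about.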
Given 1,000,000 customers, let’s assume that 10% of them (100,000) churn each month. If we correctly identified 83% of the customers who were about to churn instead of 57%, and prevented them from churning, that would be 26,000 more customers retained. Replacing those customers would cost $1,354,600, versus $28,600 to retain them, PER MONTH.
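The dollar figures above can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: the gap between the two models is taken as 26 percentage points (0.83 versus the 0.57 reported earlier), and the per-customer costs of $52.10 to acquire and $1.10 to retain are back-calculated from the totals, not quoted from any source.

```python
CUSTOMERS = 1_000_000
MONTHLY_CHURN_RATE = 0.10

CAUGHT_IMPROVED = 0.83  # share of churners identified by the improved model
CAUGHT_NAIVE = 0.57     # share identified by the naive model

# Assumed unit costs, inferred from the totals in the text (hypothetical).
ACQUISITION_COST = 52.10  # dollars to replace one churned customer
RETENTION_COST = 1.10     # dollars to retain one at-risk customer

churners = CUSTOMERS * MONTHLY_CHURN_RATE
extra_caught = round(churners * (CAUGHT_IMPROVED - CAUGHT_NAIVE))

replacement_cost = extra_caught * ACQUISITION_COST
prevention_cost = extra_caught * RETENTION_COST

print(f"Additional churners caught per month: {extra_caught:,}")
print(f"Cost to replace them: ${replacement_cost:,.0f}")
print(f"Cost to retain them:  ${prevention_cost:,.0f}")
print(f"Ratio: {replacement_cost / prevention_cost:.1f}x")
```

Under these assumptions the ratio comes out a little above 47x, in line with the "almost 50x" figure cited at the top.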
Some estimates put the cost of churn as high as $65 million per month for top telecom companies.
Understanding the business context makes choosing the preferred model easy: given the disparity between acquisition and retention costs, the improved model wins even after accounting for the retention spend wasted on false positives.
Therefore, the positive class (‘1’, Churn = ‘Yes’) should carry considerably more weight than the negative class when judging the model’s overall accuracy.
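One way to make that weighting concrete is a class-weighted accuracy, where each churner counts for more than each non-churner. The sketch below is only one possible formulation: the function name `weighted_accuracy`, the weight of 47 (roughly the acquisition-to-retention cost ratio assumed earlier), and the confusion-matrix counts are all illustrative assumptions, not results from this post.

```python
def weighted_accuracy(tp: int, fn: int, fp: int, tn: int,
                      churn_weight: float = 1.0) -> float:
    """Accuracy where every actual churner counts `churn_weight` times.

    With churn_weight=1.0 this is ordinary accuracy.
    """
    correct = churn_weight * tp + tn
    total = churn_weight * (tp + fn) + (tn + fp)
    return correct / total


# Illustrative counts only: two models with near-identical plain accuracy.
naive = dict(tp=130, fn=135, fp=65, tn=670)
improved = dict(tp=210, fn=55, fp=155, tn=580)

CHURN_WEIGHT = 47  # assumed: about the acquisition/retention cost ratio

for name, counts in (("naive", naive), ("improved", improved)):
    plain = weighted_accuracy(**counts)
    weighted = weighted_accuracy(**counts, churn_weight=CHURN_WEIGHT)
    print(f"{name}: plain={plain:.2f} weighted={weighted:.2f}")
```

With equal weights the two hypothetical models look interchangeable; once churners are weighted by what they cost the business, the model that catches more of them pulls clearly ahead.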
How can we further improve the model?