Telco Customer Churn Deep Dive

Eric Lee
4 min read · Apr 7, 2021

In my previous post, I went over why it pays to drill down into a model’s classification report when evaluating its performance. The overall accuracy score is not fully representative and can be misleading when classes are imbalanced, particularly when there is a massive bottom-line difference between Type I and Type II errors.
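To make the point about accuracy concrete, here is a minimal sketch with made-up labels (80 non-churners, 20 churners; the counts and the helper `per_class_metrics` are hypothetical and not taken from the Telco data). It computes accuracy alongside precision and recall for the churn class.

```python
def per_class_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) with `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical ground truth: 80 non-churners (0) followed by 20 churners (1)
y_true = [0] * 80 + [1] * 20
# Hypothetical predictions: 3 false alarms, and only 5 of the 20 churners caught
y_pred = [0] * 77 + [1] * 3 + [1] * 5 + [0] * 15

accuracy, precision, recall = per_class_metrics(y_true, y_pred)
print("accuracy=%.3f precision=%.3f recall=%.3f" % (accuracy, precision, recall))
```

In this toy case the model is right 82% of the time overall, yet it finds only a quarter of the churners, which is the kind of gap the classification report exposes.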

We’ll start off with a couple models and go over the results to see what recommendations we can make for business functions in charge of managing customer turnover.

XGBoost

The XGBoost model with some simple preprocessing and hyperparameter optimization reaches a not-too-shabby accuracy of 81.54720%. However, notice the low precision and recall scores for churn customers. Better performance is needed on this class, since the cost of losing a customer is 50x the cost of retaining an existing one.

import pandas as pd
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

df = pd.read_csv("data/WA_Fn-UseC_-Telco-Customer-Churn.xls")

def encode_object_cols(df):
    # Replace every string column with integer codes
    for col in df.columns:
        if df[col].dtype == 'object':
            df[col] = pd.factorize(df[col])[0]
    return df

def preprocessing(df):
    # customerID is a unique identifier, so it is dropped before encoding
    df.drop('customerID', axis=1, inplace=True)
    df = encode_object_cols(df)
    return df

X = preprocessing(df)
y = X.pop('Churn')
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    train_size=0.8,
                                                    test_size=0.2,
                                                    random_state=2)
params = {
    'learning_rate': 0.01,
    'max_depth': 5,
    'n_estimators': 50,
    'tree_method': 'gpu_hist',  # comment out if no gpu
    'random_state': 2,
}
model = XGBClassifier(**params, use_label_encoder=False)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print("Accuracy: {:.5%}".format(acc))
# Accuracy: 81.54720%
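One way to fold the 50x cost asymmetry into model comparison is to price the two error types separately. The sketch below is only an illustration: the 50x ratio comes from the text above, while the confusion-matrix counts, the unit costs, and the helper names `accuracy` and `error_cost` are hypothetical.

```python
def accuracy(cm):
    """Share of correct predictions in a confusion-matrix dict."""
    total = cm["tn"] + cm["fp"] + cm["fn"] + cm["tp"]
    return (cm["tn"] + cm["tp"]) / total


def error_cost(cm, fn_cost=50.0, fp_cost=1.0):
    """Cost of errors: a missed churner (FN) is assumed to cost 50x a wasted retention offer (FP)."""
    return cm["fn"] * fn_cost + cm["fp"] * fp_cost


# Hypothetical results for two models on the same 1,400 customers (400 churners)
model_a = {"tn": 960, "fp": 40, "fn": 220, "tp": 180}   # more accurate, misses more churners
model_b = {"tn": 820, "fp": 180, "fn": 100, "tp": 300}  # less accurate, catches more churners

for name, cm in (("A", model_a), ("B", model_b)):
    print("model %s: accuracy=%.3f error cost=%.0f" % (name, accuracy(cm), error_cost(cm)))
```

Under these assumed numbers the slightly less accurate model is far cheaper, because it trades many expensive false negatives for a larger number of cheap false positives.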

Random Forest (w/ undersampling)

Another model to try is random forest classification. Here I used the same preprocessing method and then undersampled the majority class, which proved more effective than oversampling with SMOTE. The accuracy on the test set improved to almost 88.5%.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# undersample majority class
np.random.seed(42)
X['Churn'] = y  # re-attach the target so rows can be selected by class
index = X.loc[X.Churn == 0].index
index = np.random.permutation(index)  # shuffled copy of the non-churn row labels
index = index[:round(len(index) * .7)]  # 70% of the non-churn rows will be dropped
X = X.drop(index)
y = X.pop('Churn')
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.3,
                                                    random_state=42)
rf = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
rf.fit(X_train, y_train)
score = rf.score(X_train, y_train)
score2 = rf.score(X_test, y_test)
print("Training set accuracy: ", '%.5f.' % (score))
print("Test set accuracy: ", '%.5f.' % (score2))
y_pred = rf.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

Notice that precision and recall both jumped to an average of 0.88.

Random Forest model accuracy and evaluation

The model’s accuracy improves further if the majority class is reduced even more. However, this also increases the frequency of Type I and II errors for the majority class.
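To see how aggressively the undersampling shifts the class balance, here is a small arithmetic sketch. It assumes the class counts commonly reported for this dataset (5,174 non-churn and 1,869 churn rows); those counts are not stated in this post, and the helper `balance_after_undersampling` is hypothetical.

```python
def balance_after_undersampling(n_majority, n_minority, drop_fraction):
    """Return (majority rows kept, minority share) after dropping a fraction of the majority class."""
    dropped = round(n_majority * drop_fraction)
    remaining = n_majority - dropped
    minority_share = n_minority / (remaining + n_minority)
    return remaining, minority_share


# Assumed class counts for the Telco churn data (not stated in the post)
N_NON_CHURN = 5174
N_CHURN = 1869

for drop_fraction in (0.0, 0.5, 0.7, 0.9):
    remaining, share = balance_after_undersampling(N_NON_CHURN, N_CHURN, drop_fraction)
    print("drop %.0f%%: %d non-churn rows kept, churn share %.1f%%"
          % (drop_fraction * 100, remaining, share * 100))
```

Under these assumed counts, dropping 70% of the non-churn rows already makes churners the larger class. Because the train/test split happens after the rows are dropped, the test scores describe this rebalanced population rather than the original customer base.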

Feature Importance

So we’ve improved the model’s accuracy and can predict which customers will churn. So what?

Let’s use explainable AI to unveil what’s happening under the hood and translate it into communicable terms.

import numpy as np
import matplotlib.pyplot as plt

importances = rf.feature_importances_
# Spread of each feature's importance across the individual trees
std = np.std([tree.feature_importances_ for tree in rf.estimators_],
             axis=0)
indices = np.argsort(importances)[::-1]
feature_names = X.columns[indices]
# Print the top 10 features ranked
print("Feature ranking:")
for f in range(10):
    print("%d. feature %d (%f)" % (f + 1, indices[f], importances[indices[f]]))
# Plot the impurity-based feature importances of the forest
plt.figure()
plt.title("Feature importances")
plt.bar(range(10), importances[indices][:10],
        color="r", yerr=std[indices[:10]], align="center")
plt.xticks(range(10), feature_names[:10], rotation=90)
plt.xlim([-1, 10])
plt.show()

‘Total Charges’ is the most important feature, followed by ‘Contract’, ‘tenure’, etc. The standard deviation of the feature importances is quite high (as shown by the black vertical lines in each red bar), meaning that the importance of a given feature varies a lot between trees, i.e. ‘Total Charges’ could play a much more significant role in some cases than in others.
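The error bars can be reproduced by hand: the forest’s importance for a feature is the average of the per-tree importances, and the bar is their standard deviation. The sketch below uses invented importances for five trees (the numbers are hypothetical) and the population standard deviation, which matches the default of np.std.

```python
from statistics import mean, pstdev

# Hypothetical importances of three features in five trees (each tree sums to 1)
per_tree = {
    "TotalCharges": [0.60, 0.20, 0.55, 0.15, 0.50],
    "Contract":     [0.25, 0.45, 0.25, 0.45, 0.30],
    "tenure":       [0.15, 0.35, 0.20, 0.40, 0.20],
}

# Mean importance and its spread across trees, per feature
summary = {name: (mean(values), pstdev(values)) for name, values in per_tree.items()}

for name, (avg, spread) in sorted(summary.items(), key=lambda kv: kv[1][0], reverse=True):
    print("%-12s mean=%.3f std=%.3f" % (name, avg, spread))
```

A feature can top the ranking on average while individual trees disagree strongly about it, which is what a tall error bar on the leading feature indicates.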

SHAP

Using SHAP we can identify how each feature affected the model’s prediction in specific cases. Below is a SHAP plot for an accurate churn prediction. In this example, ‘Total Charges’ were quite low and pushed the prediction towards 1, while high tenure pulled the prediction slightly downwards.

Accurate Churn Prediction

The next plot shows an accurate non-churn prediction. The total charges again played a significant role though on the opposite side. Typically, customers with higher total charges churn less often.

Accurate non-Churn Prediction
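The idea behind these plots can be illustrated without the SHAP library. The sketch below computes exact Shapley values for a toy scoring function by averaging each feature’s marginal contribution over every feature ordering. Everything in it is an assumption made for illustration: `churn_score` stands in for a trained model, and the customer and baseline values are invented. It is not how the SHAP package computes values for a random forest.

```python
from itertools import permutations


def churn_score(total_charges, tenure, month_to_month):
    """Hypothetical scoring function standing in for a trained model."""
    score = 0.30
    score += 0.25 if total_charges < 500 else -0.10
    score -= 0.002 * tenure
    if month_to_month and tenure < 12:
        score += 0.15
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)          # start from the baseline row
        previous = model(**current)
        for name in order:
            current[name] = instance[name]  # switch one feature to the customer's value
            updated = model(**current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


# Hypothetical customer and baseline rows
customer = {"total_charges": 150.0, "tenure": 40, "month_to_month": True}
baseline = {"total_charges": 2300.0, "tenure": 10, "month_to_month": False}

phi = shapley_values(churn_score, customer, baseline)
base_value = churn_score(**baseline)
prediction = churn_score(**customer)

for name, value in phi.items():
    print("%-15s %+.3f" % (name, value))
print("base %.3f + contributions %.3f = prediction %.3f"
      % (base_value, sum(phi.values()), prediction))
```

The contributions always add up to the gap between the baseline score and the customer’s score, which is why a SHAP plot can show features pushing a single prediction up or down from a common starting point.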

How can we prevent churn?
