Cost-Sensitive Learning Using Logistic Regression

Abhinaya Rajaram
Python in Plain English
Jun 23, 2021


Class imbalance is one of the more challenging problems for machine learning algorithms. When learning from highly imbalanced data, most classifiers are overwhelmed by the majority-class examples, so the false-negative rate tends to be high. Researchers have introduced many methods to deal with this problem, including the resampling techniques discussed in my previous article. Today, let me show you another technique called cost-sensitive learning (CSL).

Definition: Cost-sensitive learning is a type of learning that takes misclassification costs (and possibly other types of cost) into consideration. The goal of this type of learning is to minimize the total cost. The key difference between cost-sensitive and cost-insensitive learning is that cost-sensitive learning treats different misclassifications differently. That is, the cost of labelling a positive example as negative can be different from the cost of labelling a negative example as positive.

Fraud Detection Problem: Consider the problem of an insurance company wanting to determine whether a claim is fraudulent. Identifying good claims as fraudulent and following up with the customer is better than honouring fraudulent insurance claims.
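
To put rough numbers on this, here is a minimal sketch with a hypothetical cost matrix for the fraud example (the cost values are made up purely for illustration): honouring a fraudulent claim is assumed to cost far more than following up on a good one.

# total misclassification cost from a confusion matrix and a cost matrix
# (the cost values below are hypothetical, for illustration only)
import numpy as np
# costs[i][j] = cost of predicting class j when the true class is i
# classes: 0 = good claim, 1 = fraudulent claim
costs = np.array([[0, 10],    # true good:  correct = 0, flagged as fraud = 10
                  [500, 0]])  # true fraud: honoured = 500, correct = 0
# confusion[i][j] = number of claims of true class i predicted as class j
confusion = np.array([[9700, 200],
                      [30, 70]])
total_cost = np.sum(confusion * costs)
print(total_cost)  # 200*10 + 30*500 = 17000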

These examples show that, while misclassification errors are undesirable in general, one type of misclassification is much worse than the other. Specifically, predicting a positive case as negative is more harmful, more expensive, or worse in whatever way we choose to measure the context of the target domain.

In cost-sensitive learning, a penalty is associated with an incorrect prediction and is referred to as a “cost.” The goal of cost-sensitive learning is to minimize the cost of a model on the training dataset, where it is assumed that different types of prediction errors have different and known associated costs.

We can modify existing algorithms to use the costs as a penalty for misclassification when the algorithms are trained. Given that most machine learning algorithms are trained to minimize error, the cost for misclassification is added to the error or used to weigh the error during the training process.

This approach can be used for iteratively trained algorithms such as logistic regression, which is what we will discuss here.

Logistic regression does not support imbalanced classification directly. Instead, the training algorithm used to fit the model must be modified to take the skewed distribution into account. This can be achieved by specifying a class weighting configuration that influences how much the logistic regression coefficients are updated during training. The scikit-learn library provides this cost-sensitive extension via the class_weight argument on LogisticRegression. Let us try to understand using an example:

I am going to start by generating an imbalanced classification dataset. We can use the make_classification() function to create a synthetic two-class classification dataset. We will generate 10,000 examples with an approximate 1:100 minority-to-majority class ratio.

from collections import Counter
from matplotlib import pyplot
from sklearn.datasets import make_classification
import numpy as np
# define dataset
X, y = make_classification(n_samples=10000, n_features=2, n_redundant=0,
    n_clusters_per_class=1, weights=[0.99], flip_y=0, random_state=2)

Once generated, let us check the class distribution to make sure that the dataset was created as we expected.

# summarize class distribution
counter = Counter(y)
print(counter)  # expect roughly Counter({0: 9900, 1: 100})

Finally, let us create a scatter plot of the examples and colour them by class label to see what challenges lie ahead when we try to classify examples from this dataset.

# scatter plot of examples by class label
for label, _ in counter.items():
    row_ix = np.where(y == label)[0]
    pyplot.scatter(X[row_ix, 0], X[row_ix, 1], label=str(label))
pyplot.legend()
pyplot.show()

In the resulting scatter plot, we see the large mass of examples for the majority class and a small number of examples for the minority class, with some class overlap.

Let us evaluate a standard logistic regression model on the imbalanced classification problem using the code below, and take a look at the mean ROC AUC:

# fit a logistic regression model on an imbalanced classification dataset
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import LogisticRegression
# generate dataset
X, y = make_classification(n_samples=10000, n_features=2, n_redundant=0,
n_clusters_per_class=1, weights=[0.99], flip_y=0, random_state=2)
# define model
model = LogisticRegression(solver='lbfgs')
# define evaluation procedure
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate model
scores = cross_val_score(model, X, y, scoring='roc_auc', cv=cv, n_jobs=-1)
# summarize performance
print('Mean ROC AUC: %.3f' % mean(scores))

A mean score of 0.985, huh? Not bad!

We now have a baseline, and the model does have skill, as it achieved a ROC AUC above 0.5. Moving on to weighted logistic regression.

The coefficients of the logistic regression algorithm are fit using an optimization algorithm that minimizes the negative log-likelihood (loss) for the model on the training dataset. This involves the repeated use of the model to make predictions followed by an adaptation of the coefficients in a direction that reduces the loss of the model. The calculation of the loss for a given set of coefficients can be modified to take the class balance into account. By default, the errors for each class may be considered to have the same weighting, say 1.0. These weightings can be adjusted based on the importance of each class.

“The weighting is applied to the loss so that smaller weight values result in a smaller error value, and in turn, less update to the model coefficients. A larger weight value results in a larger error calculation, and in turn, more update to the model coefficients.”
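
To make the quoted idea concrete, here is a minimal numpy sketch of a class-weighted log loss. This is illustrative only, not scikit-learn's internal implementation:

# class-weighted log loss: a minimal illustrative sketch
import numpy as np

def weighted_log_loss(y_true, y_prob, class_weight):
    # per-example negative log-likelihood
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    # scale each example's loss by the weight of its true class
    weights = np.where(y_true == 1, class_weight[1], class_weight[0])
    return np.sum(weights * losses)

y_true = np.array([1, 0, 0, 0])          # one minority example
y_prob = np.array([0.3, 0.2, 0.1, 0.4])  # predicted P(class = 1)
print(weighted_log_loss(y_true, y_prob, {0: 1.0, 1: 1.0}))    # equal weights
print(weighted_log_loss(y_true, y_prob, {0: 1.0, 1: 100.0}))  # minority errors cost more

With the 1:100 weighting, the single misclassified minority example dominates the total loss, which is exactly the pressure we want the optimizer to feel.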

Pretty straightforward, isn’t it? The challenging bit is really the choice of weight for each class.

A best practice for class weighting is to use the inverse of the class distribution present in the training dataset. For example, the class distribution of the training dataset is a 1:100 ratio of the minority class to the majority class. Inverting this ratio gives 1 for the majority class and 100 for the minority class; the code below expresses the same 1:100 ratio as 0.01 for the majority class and 1.0 for the minority class:

# weighted logistic regression model on an imbalanced classification dataset
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import LogisticRegression
# generate dataset
X, y = make_classification(n_samples=10000, n_features=2, n_redundant=0,
n_clusters_per_class=1, weights=[0.99], flip_y=0, random_state=2)
# define model
weights = {0:0.01, 1:1.0}
model = LogisticRegression(solver='lbfgs', class_weight=weights)
# define evaluation procedure
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate model
scores = cross_val_score(model, X, y, scoring='roc_auc', cv=cv, n_jobs=-1)
# summarize performance
print('Mean ROC AUC: %.3f' % mean(scores))

We now get a better score than the unweighted version of logistic regression, 0.989 as compared to 0.985.

The scikit-learn library provides an implementation of this best-practice heuristic via the compute_class_weight() function, which calculates the weight of each class as: n_samples / (n_classes * n_samples_with_class). We can apply the heuristic directly by setting class_weight='balanced':

# generate dataset
X, y = make_classification(n_samples=10000, n_features=2, n_redundant=0,
n_clusters_per_class=1, weights=[0.99], flip_y=0, random_state=2)
# define model
model = LogisticRegression(solver='lbfgs', class_weight='balanced')
# define evaluation procedure
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate model
scores = cross_val_score(model, X, y, scoring='roc_auc', cv=cv, n_jobs=-1)
# summarize performance
print('Mean ROC AUC: %.3f' % mean(scores))

# calculate heuristic class weighting
from sklearn.utils.class_weight import compute_class_weight
from sklearn.datasets import make_classification
# generate 2 class dataset
X, y = make_classification(n_samples=10000, n_features=2, n_redundant=0,
    n_clusters_per_class=1, weights=[0.99], flip_y=0, random_state=2)
# calculate the 'balanced' class weighting
weighting = compute_class_weight('balanced', classes=[0, 1], y=y)
print(weighting)

We obtain a weighting of about 0.5 for class 0 and a weighting of 50 for class 1. The values also match our heuristic calculation above for inverting the ratio of the class distribution in the training dataset; for example:

0.5 : 50 = 1 : 100
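
As a quick sanity check, we can plug the class counts printed earlier (9,900 and 100) into the heuristic by hand:

# verify the heuristic by hand: n_samples / (n_classes * n_samples_with_class)
print(10000 / (2 * 9900))  # class 0 -> ~0.505
print(10000 / (2 * 100))   # class 1 -> 50.0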

Running the example gives the same mean ROC AUC as we achieved by specifying the inverse class ratio manually.

Using a class weighting that is the inverse ratio of the training data is just a heuristic. We may be able to get better performance with a different class weighting, so let us grid search a range of weightings, using repeated cross-validation to pick the one that gives the best ROC AUC score. We will try the following weightings for classes 0 and 1:

- Class 0: 100, Class 1: 1
- Class 0: 10, Class 1: 1
- Class 0: 1, Class 1: 1
- Class 0: 1, Class 1: 10
- Class 0: 1, Class 1: 100

# grid search class weights with logistic regression for imbalance classification
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import LogisticRegression
# generate dataset
X, y = make_classification(n_samples=10000, n_features=2, n_redundant=0,
n_clusters_per_class=1, weights=[0.99], flip_y=0, random_state=2)
# define model
model = LogisticRegression(solver='lbfgs')
# define grid
balance = [{0:100,1:1}, {0:10,1:1}, {0:1,1:1}, {0:1,1:10}, {0:1,1:100}]
param_grid = dict(class_weight=balance)
# define evaluation procedure
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# define grid search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=cv,
scoring='roc_auc')
# execute the grid search
grid_result = grid.fit(X, y)
# report the best configuration
print('Best: %f using %s' % (grid_result.best_score_, grid_result.best_params_))
# report all configurations
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print('%f (%f) with: %r' % (mean, stdev, param))

In this case, we can see that the 1:100 majority-to-minority class weighting achieved the best mean ROC AUC score, matching the configuration of the general heuristic. It might be interesting to explore even more severe class weightings to see their effect on the score.
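
Since GridSearchCV refits the best configuration on the whole dataset by default (refit=True), the tuned model can be used directly for prediction. A minimal sketch, reusing grid_result from the code above:

# use the best weighted model found by the grid search
best_model = grid_result.best_estimator_
# predicted labels and the probability of the minority class for a few examples
print(best_model.predict(X[:5]))
print(best_model.predict_proba(X[:5])[:, 1])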

Summary

We have seen how logistic regression can be modified to weight the model error by class when fitting the coefficients. We have also learnt how to configure the class weighting for logistic regression and how to grid search different class weighting configurations.

Thank you for reading.

More content at plainenglish.io
