Bayesian Error Rate

src.meliora.core.bayesian_error_rate(default_flag, prob_default)

The Bayesian error rate (BER) is the proportion of the whole sample that is misclassified when the rating system is in optimal use. For a perfect rating model, the BER is zero. The lower the BER, and hence the classification error, the better the model.

The BER specifies the minimum probability of error if the rating system or score function under consideration is used for a yes/no decision on whether a borrower will default. It can be estimated parametrically, e.g. assuming normal score distributions, or non-parametrically, for instance with kernel density estimation methods. If parametric estimation is applied, the distributional assumptions have to be checked carefully. Non-parametric estimation can be unreliable if sample sizes are small.

In its general form, the error rate depends on the total portfolio probability of default. As a consequence, in many cases its magnitude is influenced much more by the probability of erroneously identifying a non-defaulter as a defaulter than by the probability of not detecting a defaulter. In practice, therefore, the error rate is often applied with a fictitious 50% probability of default. In this case, the error rate is equivalent to the Kolmogorov-Smirnov statistic and to the Pietra index.

Parameters:
default_flag (pandas series) – Boolean flag indicating whether the borrower has actually defaulted.
prob_default (pandas series) – Predicted default probability, as returned by a classifier.

Returns: score – Bayesian Error Rate.
Return type: float

Examples

>>> from src.meliora.core import bayesian_error_rate
>>> default_flag = [1, 0, 0, 1, 1]
>>> prob_default = [0.01, 0.04, 0.07, 0.11, 0]
>>> bayesian_error_rate(default_flag, prob_default)
-0.47140452079103173
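
The definition above can be illustrated with a small non-parametric sketch. It is not meliora's implementation: the helper name `empirical_error_rate`, its `pd_weight` argument, and the toy data are assumptions made for illustration only. For each cutoff, the error is taken as p * (share of defaulters scored below the cutoff) + (1 - p) * (share of non-defaulters scored at or above it), where p is the assumed probability of default, and the minimum over all cutoffs is returned. Both classes are assumed to be present in the sample.

```python
import numpy as np


def empirical_error_rate(default_flag, prob_default, pd_weight=None):
    """Hypothetical helper: minimum weighted misclassification rate over cutoffs.

    pd_weight is the assumed portfolio probability of default; if omitted,
    the observed default rate of the sample is used.
    """
    y = np.asarray(default_flag, dtype=bool)
    s = np.asarray(prob_default, dtype=float)
    if pd_weight is None:
        pd_weight = y.mean()

    # A borrower is classified as a defaulter when the score is >= cutoff.
    # The observed scores plus +inf cover every distinct classification.
    cutoffs = np.append(np.unique(s), np.inf)

    best = np.inf
    for c in cutoffs:
        predicted = s >= c
        miss_rate = np.mean(~predicted[y])          # defaulters not detected
        false_alarm_rate = np.mean(predicted[~y])   # non-defaulters flagged
        error = pd_weight * miss_rate + (1 - pd_weight) * false_alarm_rate
        best = min(best, error)
    return float(best)


# Toy data (made up for this sketch).
default_flag = [1, 0, 0, 1, 1]
prob_default = [0.9, 0.2, 0.4, 0.7, 0.1]

# Error rate under a fictitious 50% probability of default.
ber_50 = empirical_error_rate(default_flag, prob_default, pd_weight=0.5)

# Largest gap between hit rate and false alarm rate over the same cutoffs
# (the Kolmogorov-Smirnov distance for a model that ranks defaulters higher).
y = np.asarray(default_flag, dtype=bool)
s = np.asarray(prob_default, dtype=float)
ks = max(np.mean(s[y] >= c) - np.mean(s[~y] >= c) for c in np.unique(s))

print(ber_50, 0.5 - 0.5 * ks)  # the two values coincide
```

With a 50% weight, the value returned by this sketch equals 0.5 - 0.5 * KS, which is the sense in which the error rate and the Kolmogorov-Smirnov statistic carry the same information.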