Bayesian estimation of the Kullback–Leibler divergence between normal distributions

Document Type: Original Article

Authors

Department of Statistics, Payame Noor University, Tehran, Iran.

Abstract
Purpose: In statistical data analysis and modeling, assessing the similarity or divergence between two probability distributions is of central importance. One of the most widely used measures for this purpose is the Kullback–Leibler (KL) divergence, which quantifies the informational distance between distributions. This study analyzes the KL divergence between two normal distributions with equal variance and compares the performance of different estimation methods for this measure.
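For reference, in the equal-variance setting studied here the KL divergence admits a simple closed form (a standard identity, stated in LaTeX):

\[
D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu_1,\sigma^2)\,\big\|\,\mathcal{N}(\mu_2,\sigma^2)\right)=\frac{(\mu_1-\mu_2)^2}{2\sigma^2}.
\]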
Methodology: The exact value of the Kullback–Leibler divergence between two normal distributions with equal variance is first derived analytically; three estimators of this quantity (maximum likelihood, Bayesian, and shrinkage) are then proposed. The performance of each estimator is evaluated via Monte Carlo simulation using the mean squared error (MSE) criterion.
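A minimal Monte Carlo sketch of such a comparison follows (Python). The abstract does not specify the paper's exact Bayesian and shrinkage estimators, so the versions below are illustrative assumptions: a plug-in MLE of the closed form above, the posterior mean of the divergence under independent N(0, tau2) priors on the means with the common variance treated as known, and a simple multiplicative shrinkage of the plug-in estimate toward zero.

import numpy as np

rng = np.random.default_rng(seed=0)

def kl_true(mu1, mu2, sigma):
    # Closed-form KL divergence between N(mu1, sigma^2) and N(mu2, sigma^2)
    return (mu1 - mu2) ** 2 / (2.0 * sigma**2)

def kl_mle(x, y):
    # Plug-in MLE: sample means and the pooled ML variance estimate
    pooled_var = (np.sum((x - x.mean()) ** 2) +
                  np.sum((y - y.mean()) ** 2)) / (x.size + y.size)
    return (x.mean() - y.mean()) ** 2 / (2.0 * pooled_var)

def kl_bayes(x, y, sigma, tau2=1.0):
    # Illustrative Bayes estimator (assumption): posterior mean of the
    # divergence under independent N(0, tau2) priors on mu1 and mu2,
    # with the common variance sigma^2 treated as known.
    def posterior(z):
        var = 1.0 / (z.size / sigma**2 + 1.0 / tau2)
        return var * z.sum() / sigma**2, var
    m1, v1 = posterior(x)
    m2, v2 = posterior(y)
    # E[(mu1 - mu2)^2 | data] = (m1 - m2)^2 + v1 + v2
    return ((m1 - m2) ** 2 + v1 + v2) / (2.0 * sigma**2)

def kl_shrink(x, y, lam=0.8):
    # Illustrative shrinkage estimator (assumption): shrink the plug-in
    # estimate toward zero by a fixed factor lam in (0, 1).
    return lam * kl_mle(x, y)

mu1, mu2, sigma, n, reps = 0.0, 0.5, 1.0, 30, 5000
target = kl_true(mu1, mu2, sigma)
errors = {"MLE": [], "Bayes": [], "Shrinkage": []}
for _ in range(reps):
    x = rng.normal(mu1, sigma, n)
    y = rng.normal(mu2, sigma, n)
    errors["MLE"].append(kl_mle(x, y) - target)
    errors["Bayes"].append(kl_bayes(x, y, sigma) - target)
    errors["Shrinkage"].append(kl_shrink(x, y) - target)
for name, err in errors.items():
    print(f"{name:9s} MSE = {np.mean(np.square(err)):.6f}")

Whether the shrinkage estimator wins under this sketch depends on the shrinkage factor and the true mean gap; the paper's own estimators and simulation design determine the reported ranking.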
Findings: The simulation results indicate that the Bayesian estimator outperforms the MLE in terms of estimation accuracy. Furthermore, the shrinkage estimator performs best, achieving the lowest MSE among the three methods. These results suggest that incorporating prior information or penalization techniques can significantly improve estimation quality.
Originality/Value: This study contributes to the literature by providing a detailed comparison of classical and modern estimation techniques for KL divergence in the context of normal distributions with equal variance. The novelty lies in integrating shrinkage methodology and demonstrating its superior performance, which is quantitatively validated through simulations. The findings have practical implications across fields such as machine learning, signal processing, and information theory.

Keywords

