A Review of Trustworthy Machine Learning

doi:10.3969/j.issn.1007-7375.230241

Abstract

Machine learning technology continues to evolve and is applied extensively across many domains, where it has demonstrated capabilities beyond those of humans. However, improper use of machine learning methods or biased decision-making can harm human interests, especially in sensitive areas with high security demands such as finance and healthcare, and this has drawn increasing attention to the trustworthiness of machine learning. Current machine learning technology commonly exhibits several drawbacks: bias against underrepresented groups, inadequate protection of user privacy, lack of model interpretability, and vulnerability to threats and attacks. These shortcomings undermine human trust in machine learning methods. Although researchers have studied each of these issues individually, a comprehensive framework and methodology for systematically analyzing the trustworthiness of machine learning is still lacking. This paper therefore reviews the current mainstream definitions, indicators, methods, and evaluations of fairness, interpretability, robustness, and privacy in machine learning. It then discusses the relationships among these four elements and establishes a trustworthy machine learning framework that spans the entire machine learning lifecycle. Finally, it presents some of the open issues and challenges in the field of trustworthy machine learning.

Key words: trustworthy machine learning, fairness, interpretability, robustness, privacy

CHEN Caihua, SHE Chengxi, WANG Qingyang. A Review of Trustworthy Machine Learning[J]. Industrial Engineering Journal, 2024, 27(2): 14-26.
