For citation:
Shevtsova S. A., Saveleva M. S., Mayorova O. A., Prikhozhdenko E. S. Effect of low concentrations of hyaluronic acid on the structure of whey protein isolate during conjugation: Development and optimization of machine learning models based on adaptive boosting for spectroscopic data analysis. Izvestiya of Saratov University. Physics, 2025, vol. 25, iss. 3, pp. 305–315. DOI: 10.18500/1817-3020-2025-25-3-305-315, EDN: KWEXHY
Effect of low concentrations of hyaluronic acid on the structure of whey protein isolate during conjugation: Development and optimization of machine learning models based on adaptive boosting for spectroscopic data analysis
Background and Objectives: Multicomponent mixtures with bioactive compounds, such as hyaluronic acid (HA) in protein matrices, are critical in pharmaceuticals, nutraceuticals, and cosmetics. However, detecting low-concentration additives (e.g., 0.1–0.5 wt.% HA in whey protein isolate, WPI) remains challenging because of signal interference and matrix complexity. Raman spectroscopy (RS) is a powerful tool for such analyses, but interpreting the spectral data requires advanced computational methods. This study uses adaptive boosting (AdaBoost), an ensemble machine learning (ML) algorithm, to (1) classify WPI-HA mixtures by HA concentration, (2) quantify HA content via regression, and (3) determine the minimal training dataset size needed for robust predictions.

Materials and Methods: WPI (5 wt.%) was mixed with HA (0.1, 0.25, 0.5 wt.%) in saline, dialyzed, and dried into thin films. A Renishaw inVia spectrometer equipped with a 532 nm laser was used to collect 600 spectra per sample (20 × 30-point maps). Preprocessing included cosmic-ray removal, baseline correction, and L2 normalization. AdaBoost models (scikit-learn) were optimized via GridSearchCV (hyperparameters: DecisionTree max_depth, 1–3; n_estimators, 50–350). Performance was tested across training set sizes of 50–500 spectra per sample. Metrics were accuracy for classification and R² and RMSE for regression.

Results: Optimization: the best AdaBoost hyperparameters were 325 decision trees with max_depth = 3. Classification: training on 50 spectra per sample gave 94.5% accuracy; 200 and 300 spectra per sample raised this to 97.9% and 98.3%, respectively. The models reliably distinguished WPI + 0.1% HA from WPI (>96% accuracy). Regression: 300 spectra per sample yielded the best results (R² = 0.910, RMSE = 0.061%). Larger training sets (400–500 spectra) reduced performance (R² = 0.894), suggesting overfitting. Key bands for analysis: 763 cm⁻¹ (tryptophan), 1003 cm⁻¹ (phenylalanine), and 1240 cm⁻¹ (amide III). Bands at 1450–1667 cm⁻¹ (C–H/amide I/II) showed negligible importance, indicating minimal HA-induced changes.

Conclusion: AdaBoost models efficiently analyze trace HA in WPI with small training datasets (200 spectra for classification, 300 for regression). The method's precision and speed make it well suited to industrial applications, while the identified spectral markers deepen the understanding of HA–protein interactions. Future work could extend this framework to other multicomponent systems with low analyte concentrations.
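
The workflow named in Materials and Methods (L2 normalization, then an AdaBoost classifier tuned with GridSearchCV over the decision-tree depth and the number of estimators) might look roughly like the sketch below. It is not the authors' code. The spectra are synthetic and generated by the hypothetical helper `make_synthetic_spectra`; the band shapes, noise level, train/test split, 3-fold cross-validation, and the reduced `n_estimators` grid (50 and 100, rather than the 50–350 range in the abstract) are all assumptions chosen to keep the example small and fast.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import normalize
from sklearn.tree import DecisionTreeClassifier


def make_synthetic_spectra(levels, n_per_level, n_points=60, seed=0):
    """Hypothetical stand-in for measured spectra: one fixed band plus one
    band whose height grows with the additive level, with Gaussian noise."""
    rng = np.random.default_rng(seed)
    axis = np.linspace(0.0, 1.0, n_points)
    fixed_band = np.exp(-((axis - 0.3) ** 2) / 0.002)
    varying_band = np.exp(-((axis - 0.7) ** 2) / 0.002)
    X, y = [], []
    for label, level in enumerate(levels):
        clean = fixed_band + (1.0 + 2.0 * level) * varying_band
        X.append(clean + rng.normal(0.0, 0.02, size=(n_per_level, n_points)))
        y.append(np.full(n_per_level, label))
    return np.vstack(X), np.concatenate(y)


# HA levels in wt.% as listed in the abstract (0.0 stands for pure WPI).
levels = [0.0, 0.1, 0.25, 0.5]
X_raw, y = make_synthetic_spectra(levels, n_per_level=40)

# L2 normalization: scale every spectrum (row) to unit Euclidean length.
X = normalize(X_raw, norm="l2")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The base-learner argument is called `estimator` in recent scikit-learn
# releases and `base_estimator` in older ones, so pick whichever exists.
params = AdaBoostClassifier().get_params()
base_key = "estimator" if "estimator" in params else "base_estimator"

# Depth 1-3 follows the abstract; the n_estimators values are a reduced,
# assumed subset of the stated 50-350 range.
param_grid = {
    base_key: [DecisionTreeClassifier(max_depth=d) for d in (1, 2, 3)],
    "n_estimators": [50, 100],
}

search = GridSearchCV(
    AdaBoostClassifier(random_state=0), param_grid, cv=3, scoring="accuracy"
)
search.fit(X_train, y_train)

best_depth = search.best_params_[base_key].max_depth
best_n = search.best_params_["n_estimators"]
test_accuracy = search.score(X_test, y_test)
print(f"best max_depth={best_depth}, n_estimators={best_n}, "
      f"held-out accuracy={test_accuracy:.3f}")
```

Repeating the fit while varying `n_per_level` would give a rough analogue of the training-set-size study, and swapping in `AdaBoostRegressor` with `DecisionTreeRegressor` and an R² or RMSE score would do the same for the regression task. Numbers obtained from synthetic data say nothing about the results reported in the abstract.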