Science, Technologies, Innovations №3(31) 2024, 114–126 p.

http://doi.org/10.35668/2520-6524-2024-3-13

Isaienkov Ya. O. — Postgraduate Student, Faculty of Intelligent Information Technologies and Automation, Vinnytsia National Technical University, 95, Khmelnytsky Highway, Vinnytsia, Ukraine, 21021; +38 (095) 145-96-85; yisaienkov@gmail.com; ORCID: 0009-0005-5629-0021

Mokin O. B. — D. Sc. in Engineering, Professor of the Department of System Analysis and Information Technologies, Vinnytsia National Technical University, 95, Khmelnytsky Highway, Vinnytsia, Ukraine, 21021; +38 (067) 785-98-44; abmokin@gmail.com; ORCID: 0000-0002-9277-3312

METHOD FOR EVALUATING PARTIALLY GENERATED DATA

Abstract. Generative models, such as autoencoders, generative adversarial networks, and diffusion models, have become an integral part of innovation in many fields in recent years, including art, design, and medicine. Because they can create new data samples, they open up broad opportunities for automation and process improvement. However, assessing the quality of generated data remains a challenging task, as traditional methods do not always adequately reflect the diversity and realism of the generated samples. This is particularly true for partial data generation, where changes are applied only to specific parts of an image, which significantly complicates quality assessment.
This work examines various approaches to evaluating generative models, including automatic metrics such as Inception Score and Fréchet Inception Distance, precision, recall, density, and coverage, as well as human-in-the-loop methods such as HYPE. While these metrics have proven effective for evaluating fully generated samples, their limitations make them poorly suited to partially generated data.
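For context, a minimal sketch of how the Fréchet Inception Distance mentioned above can be computed is given below; it assumes that feature vectors for real and generated images have already been extracted with an Inception-v3 network, and the helper name frechet_inception_distance is introduced here only for illustration.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_inception_distance(real_feats: np.ndarray, gen_feats: np.ndarray) -> float:
    """FID between two sets of Inception feature vectors of shape [n_samples, dim]."""
    # Fit a Gaussian (mean, covariance) to each set of features.
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_g = np.cov(gen_feats, rowvar=False)
    # Matrix square root of the covariance product; discard small imaginary parts.
    cov_mean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(cov_mean):
        cov_mean = cov_mean.real
    # Fréchet distance between the two Gaussians.
    return float(np.sum((mu_r - mu_g) ** 2) + np.trace(cov_r + cov_g - 2.0 * cov_mean))
```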
To address this issue, the paper proposes a new human-in-the-loop method for evaluating partially generated data. Users analyse transformed images and mark the regions they believe have been altered; the marked regions are then matched against the actually altered regions using Intersection over Union (IoU), and the quality of the match is summarised with precision, recall, and F1-score. The proposed approach provides a more objective assessment of the realism and quality of generated image fragments during transformations.
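As an illustration only (not the authors' exact implementation), the sketch below shows one way such IoU-based matching and scoring could look; the bounding-box representation of regions, the 0.5 IoU threshold, and the names iou and evaluate_marks are assumptions introduced here.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), in pixels

def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def evaluate_marks(altered: List[Box], marked: List[Box],
                   iou_threshold: float = 0.5) -> Tuple[float, float, float]:
    """Precision, recall and F1 of the regions a user marked as altered,
    matched greedily against the regions that were actually altered."""
    matched = set()  # indices of altered regions already matched
    tp = 0
    for m in marked:
        best_i, best_iou = None, 0.0
        for i, t in enumerate(altered):
            if i in matched:
                continue
            score = iou(m, t)
            if score > best_iou:
                best_i, best_iou = i, score
        if best_i is not None and best_iou >= iou_threshold:
            matched.add(best_i)
            tp += 1
    fp = len(marked) - tp             # marked regions that match no altered region
    fn = len(altered) - len(matched)  # altered regions the user missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: one truly altered region; the user marks two regions, one overlapping it well.
print(evaluate_marks(altered=[(10, 10, 50, 50)],
                     marked=[(12, 11, 48, 52), (80, 80, 100, 100)]))
```

Per-image scores obtained this way can then be averaged over all evaluators to compare models, as in the experiment described next.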
A practical example of applying the developed method is presented on a dataset of panoramic dental images, where the quality of three models was evaluated: 1) a GAN based on a U-Net generator; 2) the same model with post-processing of the output image and segmentation mask; and 3) a self-validated GAN. The evaluation was performed by 30 participants. The average F1-scores for these models were 0.78, 0.27, and 0.20, respectively. Since a lower F1-score indicates a better result in this setting (the more accurately users identified the transformations, the worse the model concealed them), the best model by this metric is the self-validated GAN, which is also supported by the subjective assessments reported in the authors' earlier work.

Keywords: augmentation, data generation, generative adversarial network, GAN, computer vision, deep learning, self-validated GAN, evaluation, neural networks.

REFERENCES

  1. Kramer, M. A. (1991). Nonlinear principal component analysis using autoassociative neural networks. AIChE Journal. 37(2), 233–243. DOI: https://doi.org/10.1002/aic.690370209.
  2. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2020). Generative adversarial networks. Communications of the ACM. 63(11), 139–144. DOI: https://doi.org/10.1145/3422622.
  3. Chen, M., Mei, S., Fan, J., & Wang, M. (2024). An overview of diffusion models: Applications, guided generation, statistical rates and optimization. arXiv e-prints. DOI: https://doi.org/10.48550/arXiv.2404.07771.
  4. Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. (2016). Improved techniques for training GANs. NIPS’16: Proceedings of the 30th International Conference on Neural Information Processing Systems. P. 2234–2242. DOI: https://doi.org/10.48550/arXiv.1606.03498.
  5. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al. (2015). Going deeper with convolutions. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). DOI: https://doi.org/10.1109/cvpr.2015.7298594.
  6. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision. 115, 211–252. DOI: https://doi.org/10.1007/s11263-015-0816-y.
  7. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., & Hochreiter, S. (2018). GANs Trained by a Two Time-Scale Update Rule Converge to a Nash Equilibrium. NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems. P. 6629–6640. DOI: https://doi.org/10.48550/arXiv.1706.08500.
  8. Sajjadi, M. S. M., Bachem, O., Lucic, M., Bousquet, O., & Gelly, S. (2018). Assessing generative models via precision and recall. Advances in Neural Information Processing Systems. P. 5228–5237.
  9. Isaienkov, Ya. O., & Mokin, O. B. (2022). Analiz heneratyvnykh modelei hlybokoho navchannia ta osoblyvostei yikh realizatsii na prykladi WGAN [Analysis of generative deep learning models and features of their implementation on the example of WGAN]. Visnyk Vinnytskoho politekhnichnoho instytutu [Bulletin of the Vinnytsia Polytechnic Institute]. 1, 82–94. DOI: https://doi.org/10.31649/1997-9266-2022-160-1-82-94 [in Ukr.].
  10. Naeem, M. F., Oh, S. J., Uh, Y., Choi, Y., & Yoo, J. (2020). Reliable fidelity and diversity metrics for generative models. International Conference on Machine Learning. P. 7176–7185.
  11. Shmelkov, K., Schmid, C., & Alahari, K. (2018). How good is my GAN? Lecture Notes in Computer Science. Vol. 11206. P. 218–234. DOI: https://doi.org/10.1007/978-3-030-01216-8_14.
  12. Meehan, C., Chaudhuri, K., & Dasgupta, S. (2020). A non-parametric test to detect data-copying in generative models. International Conference on Artificial Intelligence and Statistics.
  13. Karras, T., Laine, S., & Aila, T. (2019). A style-based generator architecture for generative adversarial networks. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). P. 4396–4405. DOI: https://doi.org/10.1109/cvpr.2019.00453.
  14. Zhou, S., Gordon, M. L., Krishna, R., Narcomey, A., Fei-Fei, L., & Bernstein, M. S. (2019). HYPE: A benchmark for human eye perceptual evaluation of generative models. Advances in Neural Information Processing Systems. 32.
  15. Isaienkov, Ya. O., & Mokin, O. B. (2024). Transformatsiia tsilovoho klasu dlia zadachi sehmentatsii z vykorystanniam U-GAN [Target class transformation for segmentation task using U-GAN]. Visnyk Vinnytskoho politekhnichnoho instytutu [Bulletin of the Vinnytsia Polytechnic Institute]. 172(1), 81–87. DOI: https://doi.org/10.31649/1997-9266-2024-172-1-81-87 [in Ukr.].
  16. Isaienkov, Ya. O., & Mokin, O. B. (2024). Samovalidovanyi U-GAN dlia transformatsii tsilovoho klasu v zadachakh sehmentatsii [Self-validated U-gan for target class transformation in segmentation tasks]. Visnyk Vinnytskoho politekhnichnoho instytutu [Bulletin of the Vinnytsia Polytechnic Institute]. 3, 102–111. DOI: https://doi.org/10.31649/1997-9266-2024-174-3-102-111 [in Ukr.].