An Approach to Assessing Attention States and Designing Neural-Network-Based Recognition Models

Yana Nikolaevna Artamonova; Igor Mikhailovich Artamonov

doi:10.25559/SITITO.16.202002.500-509

Yana Nikolaevna Artamonova, Neurocorpus LLC, http://orcid.org/0000-0002-4947-1562
Igor Mikhailovich Artamonov, Moscow Aviation Institute (National Research University), http://orcid.org/0000-0001-8343-4821

Abstract

The article considers an approach to digitizing the phenomenon of attention. The paper cites sources indicating that attention improves any activity. Psychological and pedagogical research shows that attention has a particularly positive effect on learning. The choice of research direction and the development of attention-diagnostics technologies are motivated by applied tasks and by several expectations: that training programs will be mastered more efficiently and quickly, that ineffective methods will be abandoned, that difficulties in mastering the curriculum will be addressed promptly, and that learning materials will become easier to absorb.
Drawing on expert analysis of video data and on the requirements formulated for the method, the authors consider the feasibility of using computer vision methods and neural-network-based image recognition algorithms to analyze attention from observable patterns of expressive movements. The paper presents a video data processing model and the structure of a neural network model for detecting general attention patterns.
The approaches and models proposed in the article make it possible to examine what modern information technologies can offer in the field of education. In particular, they address the application of neural-network-based algorithms in educational activities and the improvement of training programs, especially online and distance programs, by giving instructors prompt feedback and the ability to adjust the learning process and teaching materials in real time.
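
The abstract does not specify the video processing model, so the following is only a minimal illustrative sketch of one way such a pipeline stage could look, not the authors' method. It assumes that a pose estimator has already produced named 2D keypoints per frame, and it summarizes "expressive movement" as the mean keypoint displacement over fixed windows of frames. The names `Keypoints`, `frame_motion`, and `window_features` are hypothetical, as is the choice of displacement as a feature; in a real system the resulting windows would be passed to a trained neural network rather than interpreted directly.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-frame representation: keypoint name -> (x, y) in pixels,
# e.g. as produced by some external pose estimator (assumed, not shown here).
Point = Tuple[float, float]
Keypoints = Dict[str, Point]


def frame_motion(prev: Keypoints, curr: Keypoints) -> float:
    """Mean Euclidean displacement of keypoints present in both frames."""
    shared = prev.keys() & curr.keys()
    if not shared:
        return 0.0  # nothing to compare, treat as no observed motion
    return mean(
        ((curr[k][0] - prev[k][0]) ** 2 + (curr[k][1] - prev[k][1]) ** 2) ** 0.5
        for k in shared
    )


def window_features(frames: Sequence[Keypoints], window: int) -> List[float]:
    """Average inter-frame motion over consecutive, non-overlapping groups
    of `window` frame-to-frame transitions. Incomplete trailing groups
    are dropped."""
    if window < 1:
        raise ValueError("window must be at least 1")
    motions = [frame_motion(a, b) for a, b in zip(frames, frames[1:])]
    return [
        mean(motions[i:i + window])
        for i in range(0, len(motions) - window + 1, window)
    ]


if __name__ == "__main__":
    # Synthetic data: a single keypoint drifting by (3, 4) pixels per frame.
    demo = [{"nose": (3.0 * i, 4.0 * i)} for i in range(5)]
    print(window_features(demo, window=2))
```

Windowed aggregation is used here because feedback to an instructor would plausibly be based on short time intervals rather than single frames; the window length is a free parameter in this sketch.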

About the Authors

Yana Nikolaevna Artamonova, Neurocorpus LLC

General Director

Igor Mikhailovich Artamonov, Moscow Aviation Institute (National Research University)

Head of the Information Networks Department
