Please use this identifier to cite or link to this item:
http://localhost:8080/xmlui/handle/123456789/8502
Title: | Human Emotion Identification based on Deep Learning |
Authors: | Nawaf, Asmaa Jasim, Wesam |
Keywords: | Convolutional Neural Networks Deep Learning Facial Expression Recognition FER2013 FER+. |
Issue Date: | 1-Jan-2022 |
Publisher: | University of Anbar |
Abstract: | Facial Expression Recognition (FER) is one of the most important methods influencing human-machine interaction (HMI). The goal of developing HMI systems is to create a channel of communication between humans and machines so that they can cooperate to carry out tasks. Facial expressions are the fastest way to understand human emotions, so systems capable of recognizing facial expressions lead to smart systems that are more responsive to the user’s requirements. Determining human emotions from images is difficult because faces vary greatly in age, gender, and other characteristics. Diversity in the training data is therefore important, because it enriches the information available to the model during learning. The proposed system consists of several stages. It starts with data preprocessing, applying a number of preprocessing steps suited to the model in use, whether the pre-trained VGG16 model or the proposed ERCNN (Emotion Recognition Convolutional Neural Networks) model. The volume of data used to train the model is then increased, first by merging it with new data and second by applying a number of data augmentation techniques to avoid overfitting. Furthermore, zero padding is added to avoid loss in image size when filters are applied, and parameters are chosen to fit the training data. In this thesis, two models were compared: a modified model trained on the FER dataset only, and the VGG16 model, which was previously trained on a wide range of datasets. The pre-trained model was reset and retrained using the FER dataset. The results showed that the proposed ERCNN model, dedicated to identifying human emotions, significantly outperformed the pre-trained model in terms of accuracy, speed and performance.
The reason for the weak performance of the VGG16 model in identifying human emotions is that the model needs more layers to train on the FER dataset. The superiority of the proposed ERCNN model is due to its structure, which relies on increasing the number of convolution layers in order to increase the number and diversity of features extracted from the image. Each layer extracts different features from the input images, depending on the convolution operation and on the filters in that convolution layer. Adding Batch Normalization after each convolutional layer (i.e., normalizing and standardizing the outputs of the preceding convolutional layer) makes the deep network faster and more stable. A max-pooling layer at the end of each block (five blocks in the suggested model) then keeps the most important of the features extracted by the convolution layers, which reduces arithmetic operations and helps prevent overfitting. A dropout layer after the max-pooling layer at the end of each block drops nodes at random, which also helps prevent the model from overfitting. At the end of the model, a flatten layer transforms the output of the previous layers into a one-dimensional array, which is fed into the fully connected layer and classified using SoftMax. The proposed model was trained and tested using two versions of the FER2013 dataset, both taken from the open-source Kaggle platform: the first version before the wrong labels were corrected and the second version after they were corrected. With the new data added, the expanded dataset contains 49568 images. These images are divided into three subsets: training, validation, and testing. The highest results and the best performance were obtained when training and testing the proposed model using the expanded data of FER+ (corrected labels).
The accuracy on the public test was 87.133% and on the private test was 82.648%. The model also performed effectively when evaluated on images from the Internet showing different emotions: all of the predictions on these images were correct, with high accuracy. The proposed system was implemented on the Kaggle platform, which provides high-speed GPUs. |
URI: | http://localhost:8080/xmlui/handle/123456789/8502 |
Appears in Collections: | Department of Computer Science |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
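
The abstract's remark that zero padding avoids loss in image size can be checked with a little arithmetic. The sketch below is illustrative only and is not the thesis code. It assumes 48x48 inputs (the FER2013 image size), one 3x3 stride-1 convolution per block, and a 2x2 max-pooling layer closing each of the five blocks. The abstract does not state kernel sizes or the number of convolutions per block, and the function names here are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, pad=0):
    """Output length along one axis after a convolution."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, window=2):
    """Output length along one axis after non-overlapping max-pooling."""
    return size // window


def trace_blocks(size, blocks=5, kernel=3, pad=0):
    """Feature-map side length after each completed conv + pool block.

    The first entry is the input size. Tracing stops early if the
    feature map becomes too small for another block.
    """
    sizes = [size]
    for _ in range(blocks):
        size = conv_out(size, kernel=kernel, pad=pad)
        if size < 1:
            break  # kernel no longer fits in the feature map
        size = pool_out(size)
        if size < 1:
            break  # nothing left to pool
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    # Assumed setup: 48x48 input, 3x3 kernels, five blocks.
    print("zero padding (pad=1):", trace_blocks(48, pad=1))
    print("no padding   (pad=0):", trace_blocks(48, pad=0))
```

Under these assumptions, padding of one pixel keeps each convolution's output the same size as its input, so only pooling shrinks the feature map and all five blocks can run. Without padding, the feature map is down to a single pixel after four blocks and a fifth 3x3 convolution no longer fits.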