Improve the Accuracy of Emotion Recognition

Data Acquisition

Acquiring a large number of bio-signal datasets is key to identifying a user’s emotions with machine learning algorithms. We have therefore been running an experiment in which a research participant watches a series of VR content and answers surveys about how they feel, while wearing our VR headset with embedded eye tracking cameras and EEG sensors. Repeating this experiment many times lets us collect and accumulate a substantial amount of surveyed emotion data together with the brain and eye data of different individuals. At the initial stage of the experiment, these data are highly individual: they differ from person to person even when participants watch the same VR content (Figure 1). Our aim, however, is to generalize the emotion recognition model by designing a more effective experiment protocol and conducting enough experiments. At this stage, we are therefore concentrating on improving the experiment protocol so that enough emotion data are collected to build a generalized model for the emotion recognition system.

Figure 1. Each emotional index map

Bio-signal Processing


Pre-processing eliminates noise from the original signals and calibrates the user’s initial emotional state. Electrical signals from the brain are generally very weak, so they are easily contaminated by noise from eye blinks, muscle movement, interference from electrical devices, and other sources. Eliminating this unwanted noise is therefore very important for the subsequent analysis. Looxid Labs has developed an ICA (Independent Component Analysis) based algorithm that identifies important features in EEG signals and automatically extracts and reduces the noise contained in the original signal to improve its quality. In addition, the emotions aroused by a specific stimulus differ among users, and even for the same user they vary with time and environment. The system therefore calibrates to the user’s emotions before the user watches VR content, so that emotions during the actual VR experience can be classified distinctly. Our pre-processing includes four steps:

1) Detrending

2) Filtering

3) Spike Artifact Removal

4) Blink Artifact Removal
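
The first three steps can be illustrated with a minimal, dependency-free Python sketch for a single channel. It is not Looxid Labs' implementation: the function names, the moving-average filter (a stand-in for whatever filter is actually used; EEG pipelines commonly use band-pass filters), and the median-based spike threshold are all assumptions made for illustration. Blink artifact removal is not shown, because the ICA-based approach described above needs multi-channel data.

```python
from statistics import median


def detrend(signal):
    """Subtract the least-squares straight line from the signal (needs >= 2 samples)."""
    n = len(signal)
    t_mean = (n - 1) / 2
    s_mean = sum(signal) / n
    cov = sum((t - t_mean) * (s - s_mean) for t, s in enumerate(signal))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    return [s - (s_mean + slope * (t - t_mean)) for t, s in enumerate(signal)]


def moving_average(signal, width=5):
    """Simple low-pass filter (assumed stand-in): average each sample with its neighbours."""
    half = width // 2
    smoothed = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def remove_spikes(signal, threshold=5.0):
    """Replace samples further than `threshold` MADs from the median with the median."""
    med = median(signal)
    # Median absolute deviation; the tiny fallback avoids division by zero.
    mad = median(abs(s - med) for s in signal) or 1e-12
    return [med if abs(s - med) / mad > threshold else s for s in signal]
```

Each function takes and returns a plain list of samples, so the steps can be chained in whatever order the pipeline requires.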


Looxid Labs’ eye tracking system processes eye images taken with infrared cameras. Infrared light emitted by an infrared LED located under the eye is reflected by the eyeball and detected by the infrared cameras. The pupil generally absorbs infrared light, so it appears darker than the iris and other areas of the eye. Using this characteristic, our system locates the center of the pupil in the captured eye images and, with that center as a reference, tracks the size and position of the pupil and hence the line of sight. The eyeball is a three-dimensional sphere, whereas the image captured by the camera is two-dimensional, so the image is distorted depending on the angle and position of the camera. Because this distortion also distorts the user’s gaze information, an extra step (calibration) is required to correct it. In the calibration phase, predefined points appear on the screen and the user follows them as they change position. A mapping is then computed from the relationship between the eye images obtained during calibration and the predefined points. In addition, the eye position in 3D space is estimated from the geometric relationship so that the estimated gaze closely matches the user’s actual gaze on the screen. In the same manner, additional information such as eye fixations and saccades can be acquired.
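
The calibration mapping can be illustrated with a small Python sketch. It assumes the simplest possible model, an affine map from 2D pupil-center coordinates to 2D screen coordinates fitted by least squares; the actual system also estimates the eye position in 3D, which is not covered here. All function names are hypothetical.

```python
def solve3(a, b):
    """Solve the 3x3 system a.x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_calibration(pupil_points, screen_points):
    """Least-squares affine map from pupil centers to screen coordinates."""
    rows = [(px, py, 1.0) for px, py in pupil_points]
    # Normal equations (R^T R) w = R^T target, solved once per screen axis.
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    weights = []
    for axis in (0, 1):
        rhs = [sum(r[i] * s[axis] for r, s in zip(rows, screen_points))
               for i in range(3)]
        weights.append(solve3(gram, rhs))
    return weights


def map_gaze(weights, pupil_point):
    """Apply the fitted map to one pupil center, returning (screen_x, screen_y)."""
    px, py = pupil_point
    return tuple(w[0] * px + w[1] * py + w[2] for w in weights)
```

The fit needs at least three calibration points that do not lie on a single line; with more points, the least-squares solution averages out measurement noise.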

Pupil Size

Changes in pupil size are closely related to the user’s cognitive state, memory, behavioral decisions, and emotional changes. Pupil size is therefore one of the most important measurements for inferring the user’s biological responses. To measure pupil size accurately from an eye image, two problems must be solved. First, because pupil size varies with the amount of light entering the eye, the system must distinguish changes caused by cognitive and emotional states from those caused by light, and must therefore control the amount of light. Second, as with the eye tracking techniques above, image distortion makes it difficult to track the size and shape of the pupil accurately. To solve these two problems, we infer the pupil size that reflects the user’s physiological response by applying information tracking, filtering, and 3D reconstruction methods.
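
One simple way to picture the first problem is sketched below in Python. This is an illustrative assumption, not the method described above: it supposes that the display luminance is known for every sample, fits a straight line to pupil diameter versus luminance on neutral calibration data, and treats whatever the line does not explain as the non-light component. The real pupillary light response is not linear, and the function names are hypothetical.

```python
def fit_light_model(calib_luminance, calib_pupil_mm):
    """Fit pupil_mm ~ slope * luminance + intercept on neutral calibration data."""
    n = len(calib_luminance)
    l_mean = sum(calib_luminance) / n
    p_mean = sum(calib_pupil_mm) / n
    cov = sum((l - l_mean) * (p - p_mean)
              for l, p in zip(calib_luminance, calib_pupil_mm))
    var = sum((l - l_mean) ** 2 for l in calib_luminance)
    slope = cov / var
    return slope, p_mean - slope * l_mean


def remove_light_response(model, luminance, pupil_mm):
    """Subtract the light-predicted diameter, leaving the residual change."""
    slope, intercept = model
    return [p - (slope * l + intercept) for l, p in zip(luminance, pupil_mm)]
```

A positive residual then indicates a pupil that is larger than the light level alone would predict under this simplified model.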

Emotion Recognition Algorithm

Among emotion recognition technologies based on human physiological responses, electroencephalography (EEG), which records brain activity directly at the surface of the head, provides better accuracy in understanding emotions than indirect information such as facial expressions or voice. Existing EEG-based emotion recognition techniques mainly employ basic machine learning algorithms that extract features from brainwaves and match them to emotion indexes defined by prior research. Despite its relatively high accuracy, however, traditional EEG-based emotion analysis has limitations in emotion classification. The low quality of EEG signals and the insufficient quantity of data make it difficult to exceed a certain level of accuracy. Furthermore, the emotion indexes themselves are not easy to define. As a result, supervised learning that matches predefined emotion indexes to well-known EEG features has remained limited even as machine learning technology has advanced. To overcome these limitations, Looxid Labs aims to extract a variety of emotion indexes from many people using a deep learning algorithm based on representation learning. In other words, we are developing technology that improves the accuracy of emotion classification by finding hidden patterns in the EEG signals themselves. Changes in pupil size and reactivity are combined with the EEG signals to improve accuracy further. With Looxid Labs’ technology, an individual’s emotion data, which has been difficult to generalize and objectify, can be used to determine and measure business indexes customized to the needs of various industries. A VR user’s bio-signals are classified into a specific emotion by a machine learning algorithm comprising three processes: i) feature extraction, ii) feature selection, and iii) classifier learning.
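
To make the third process, classifier learning, concrete, here is a minimal Python sketch using a nearest-centroid classifier. This is a deliberately simple stand-in chosen for brevity, not the deep learning approach described above; the function names and the emotion labels in the usage note are hypothetical.

```python
from math import dist


def train_centroids(features, labels):
    """Learn one mean feature vector (centroid) per emotion label."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(centroids, vec):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], vec))
```

For example, after training on feature vectors labeled "calm" and "excited", `classify(centroids, new_vector)` returns whichever of the two labels has the nearer centroid.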

Feature Extraction

The feature extraction process transforms the user’s biomedical signals into a form that expresses a given emotional state more clearly. Features are derived through time-domain, frequency-domain, and time-frequency analysis. For example, when the user views content A, the brain waves oscillate repeatedly over time in a way that may not look very different from when the user views content B. However, when the two signals are decomposed into specific frequency bands, the power in a given band can differ. In this way, Looxid Labs uses methods from mathematics, statistics, and computer science to extract, from the preprocessed bio-signals, the features that help distinguish emotions.

 Figure 4. Extracted Features
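
The content A versus content B example can be sketched in Python with a plain discrete Fourier transform. The two test signals, the 128 Hz sample rate, and the band edges (8 to 13 Hz and 13 to 30 Hz, the conventional alpha and beta ranges) are assumptions chosen for illustration; the source does not say which bands or transforms are actually used.

```python
import cmath
import math


def band_power(signal, sample_rate, low_hz, high_hz):
    """Sum of normalized DFT power over bins with low_hz <= frequency < high_hz."""
    n = len(signal)
    total = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate / n
        if low_hz <= freq < high_hz:
            coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                        for t, s in enumerate(signal))
            total += abs(coeff) ** 2 / n ** 2
    return total


# Two synthetic one-second signals that both simply oscillate in the time domain.
RATE = 128
content_a = [math.sin(2 * math.pi * 10 * t / RATE) for t in range(RATE)]  # 10 Hz
content_b = [math.sin(2 * math.pi * 20 * t / RATE) for t in range(RATE)]  # 20 Hz

alpha_a = band_power(content_a, RATE, 8, 13)
alpha_b = band_power(content_b, RATE, 8, 13)
beta_b = band_power(content_b, RATE, 13, 30)
```

Here `alpha_a` is large while `alpha_b` is close to zero, and the situation reverses in the 13 to 30 Hz band, so the band powers separate two signals that look alike as raw waveforms. A production pipeline would use an FFT rather than this direct O(n²) sum.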

Feature Selection

In this process, we select the important features among the various bio-signal features produced by the extraction process. Brain wave and eye data carry a large amount of information per unit time (about 3000 per second). Feature extraction reduces this amount, but the extracted data still contain a vast amount of information. Looxid Labs uses algorithms based on mutual information, a statistical measure from information theory, to single out only the information with the greatest dependency and relevance between two sets of data. As a result, only the minimum set of features needed to represent the biological signals is used for emotion classification, which improves both the accuracy and the speed of emotion analysis.
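
A minimal Python sketch of mutual-information-based selection follows. It assumes that continuous features are first discretized into equal-width bins and that features are ranked only by their mutual information with the emotion label; it ignores redundancy between features, which a fuller method would also consider. The function and feature names are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # I(X;Y) = sum over (x, y) of p(x,y) * log2(p(x,y) / (p(x) * p(y)))
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def discretize(values, bins=4):
    """Map continuous values to integer bin indices of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant features fall into one bin
    return [min(int((v - lo) / width), bins - 1) for v in values]


def select_features(feature_columns, labels, k):
    """Return the names of the k features sharing the most information with labels."""
    scores = {name: mutual_information(discretize(column), labels)
              for name, column in feature_columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Given a dictionary that maps feature names to columns of values, `select_features(columns, labels, k)` keeps the k most informative names, and only those columns are passed on to the classifier.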