Conventional image-based deep learning models employ multiple GPUs to learn simultaneously from massive amounts of data. However, in many real-world applications, the data may contain confidential information, and computing resources may be limited. Such constraints pose a major challenge for storing and processing the whole dataset in parallel. Training deep neural networks (DNNs) sequentially can mitigate this issue; however, DNNs are known to suffer from catastrophic forgetting (i.e., loss of previously acquired knowledge) when they are continuously exposed to new data without revisiting any previously seen data. A research field known as continual learning (also called incremental learning or lifelong learning) has emerged to address catastrophic forgetting in sequential learning. Its goal is to continually acquire new knowledge while preserving existing knowledge (i.e., minimizing catastrophic forgetting).
LG AI Research presented a paper titled "Online Class-incremental Continual Learning with Adversarial Shapley Value"[1] at AAAI-21[2], one of the top AI conferences. The paper introduced an algorithm that excels at retaining previously acquired knowledge while promoting learning from new data. It was the first attempt to utilize the K-Nearest Neighbor Shapley value (KNN-SV)[3], a data valuation scheme, in continual learning, and it established a theoretical basis for exploiting KNN-SV for data selection. By replaying a small number of sampled data along with new data, it achieves higher accuracy and lower forgetting than existing continual learning methods. The algorithm, called Adversarial Shapley value Experience Replay (ASER), was jointly proposed with Professor Scott Sanner's team at the University of Toronto.
In continual learning, two settings are often considered for classification problems. In both settings, a DNN model learns from a series of different learning tasks over time. Each task comprises images from different classes, and the model works on one specific task at a given time. In the task-incremental setting, the model has access to all data instances belonging to the current task, and it only has to classify among the classes of that task (i.e., the multi-head setting). This setting is relatively easy because the model has abundant data and always knows which task it is looking at. During inference, the model is given test images along with their corresponding task identifiers, so it only needs to make predictions relevant to that task. In real-world applications, however, task information is unavailable and data may arrive as a stream; the task-incremental setting is therefore not realistic.
The online class-incremental setting provides a more practical and challenging learning environment. A DNN model sees one or a small number of new images at a time from a data stream belonging to the current task, and it receives no task information. As a result, it has to classify among all classes from all tasks. This is a difficult single-head setting that is prone to catastrophic forgetting. Since task identifiers are artificial information, this setting allows different continual learning algorithms to be compared realistically under the influence of severe catastrophic forgetting. Several existing continual learning algorithms have also adopted the online class-incremental setting. ASER was developed to solve the problem of catastrophic forgetting in this setting.
Comparison between the task-incremental setting and the online class-incremental setting
Continual learning can be approached in various ways, but experience replay is the most realistic and promising method. Experience replay selects a small amount of data from the data stream and stores it in memory; the saved data can then be replayed along with new data during learning. This method is widely used because it performs well even when task-related information is unavailable, and it requires nothing beyond data storage and the computations involved in the selection process. Experience replay has two main processes: memory update and memory retrieval. In the memory update process, a small set of data is selected to be stored in, or exchanged with, the memory. In the memory retrieval process, data to be learned along with the new batch are retrieved from the memory to retain previously learned knowledge. ASER performs data valuation based on KNN-SV in both processes to make data selection strategic and intuitive.
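As a concrete baseline, the two processes can be sketched as follows. This is a minimal illustration with hypothetical class and method names, using reservoir sampling for the memory update and uniform random sampling for the retrieval; ASER replaces both random choices with KNN-SV-based scoring.

```python
import random

class ReplayBuffer:
    """Minimal episodic memory for experience replay: reservoir-sampling
    update and uniform-random retrieval (the baseline ASER improves upon)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []      # stored (x, y) pairs
        self.n_seen = 0     # total number of stream examples observed

    def update(self, x, y):
        # Reservoir sampling: every stream example ends up stored
        # with equal probability capacity / n_seen.
        if len(self.data) < self.capacity:
            self.data.append((x, y))
        else:
            j = random.randrange(self.n_seen + 1)
            if j < self.capacity:
                self.data[j] = (x, y)
        self.n_seen += 1

    def retrieve(self, batch_size):
        # Sample stored examples to replay alongside the incoming batch.
        k = min(batch_size, len(self.data))
        return random.sample(self.data, k)
```

At each step, the incoming batch is concatenated with `retrieve(batch_size)` for a gradient update and then passed to `update`, so the memory keeps tracking the stream.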
Using KNN-SV, we can characterize each data point in the latent feature space of the deep learning model. The KNN-SV of a data point indicates how much it contributes to the correct KNN classification of the other data points, so these values capture the relationships among points in the latent feature space. For example, if a point's average KNN-SV is a large positive number, the point is surrounded by points of the same class in the feature space. If the average is a large negative number, the point lies among points of other classes. If the absolute value of the average is small, the point lies close to the decision boundary. KNN-SV thus differs from methods that use only distances or class labels: it lets us infer where a point is represented relative to the entire distribution, or to a portion of it.
KNN-SV distribution plots. The plots show the deep learning model's representation of the data points as triangular markers in a 2-dimensional latent feature space, with marker color indicating the class label. Plots (b) and (c) highlight the data points with the 20 and 50 highest positive KNN-SVs, respectively; the larger the positive value, the closer the point is to data of the same class. Plot (d) highlights the data points with the 50 lowest KNN-SVs. Points with negative values appear where the density of other classes is high (blue square markers among green triangular markers in the top-right area of the plot); the remaining low-value points are distributed around the decision boundary.
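For reference, KNN-SV admits a closed-form recursion (Jia et al. [3]) that computes the exact value of every training point for one evaluation point in O(N log N). Below is a minimal sketch; the function name and the choice of Euclidean distance are our own, and real usage would run this on the model's latent features.

```python
import numpy as np

def knn_shapley(X_train, y_train, x_test, y_test, K=5):
    """Exact KNN Shapley value of each training point for a single
    evaluation point, via the recursion of Jia et al. (2019)."""
    N = len(X_train)
    # Sort training points by distance to the evaluation point (nearest first).
    dists = np.linalg.norm(X_train - x_test, axis=1)
    order = np.argsort(dists)
    match = (y_train[order] == y_test).astype(float)

    s = np.zeros(N)
    # Farthest point: s_N = 1[y_N == y_test] / N.
    s[N - 1] = match[N - 1] / N
    # Walk inward: s_i = s_{i+1}
    #   + (1[y_i == y] - 1[y_{i+1} == y]) / K * min(K, rank_i) / rank_i,
    # where rank_i = i + 1 is the 1-based distance rank.
    for i in range(N - 2, -1, -1):
        s[i] = s[i + 1] + (match[i] - match[i + 1]) / K * min(K, i + 1) / (i + 1)

    # Undo the distance sort so values align with the original data order.
    sv = np.zeros(N)
    sv[order] = s
    return sv
```

Averaging these per-evaluation-point values over an evaluation set gives the average KNN-SV discussed above.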
The rules for ASER's memory update and memory retrieval can be built on these properties of KNN-SV. To prevent significant deformation of the previously learned decision boundary near a set of new data, we can pick stored samples that are "adversarial" to the new data. Stored data instances that are represented near the new data but carry different class labels are called adversarial data points: they promote interference at the decision boundary and prevent overfitting to the new data.
The representative information for each class can be supplemented by using "cooperative" data points, selected from the stored samples that are surrounded mainly by samples of the same class. These two properties are captured by the proposed ASV formula below. Data instances with high ASV values are retrieved from memory and learned along with the new data. When updating the memory, data with a high positive average KNN-SV are given preference, so that data with high confidence for the target class are preserved in memory.
ASV formula and its meaning
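The retrieval rule can be sketched as below. This is an illustrative reading of ASV rather than the paper's exact definition: we assume the cooperative term is a candidate's maximum KNN-SV over memory evaluation samples, the adversarial term is its minimum KNN-SV over the new batch, and ASV is their difference; all function names are our own.

```python
import numpy as np

def knn_sv(feats, labels, x_eval, y_eval, K=5):
    """Per-evaluation-point KNN Shapley values (recursion of Jia et al., 2019)."""
    n = len(feats)
    order = np.argsort(np.linalg.norm(feats - x_eval, axis=1))
    match = (labels[order] == y_eval).astype(float)
    s = np.zeros(n)
    s[-1] = match[-1] / n
    for i in range(n - 2, -1, -1):
        s[i] = s[i + 1] + (match[i] - match[i + 1]) / K * min(K, i + 1) / (i + 1)
    out = np.zeros(n)
    out[order] = s
    return out

def asv_retrieve(mem_feats, mem_labels, eval_feats, eval_labels,
                 new_feats, new_labels, n_retrieve, K=5):
    """Score memory candidates by a hypothetical ASV: the cooperative term
    (max KNN-SV over memory evaluation samples) minus the adversarial term
    (min KNN-SV over the incoming batch); return the top candidates."""
    coop = np.max([knn_sv(mem_feats, mem_labels, x, y, K)
                   for x, y in zip(eval_feats, eval_labels)], axis=0)
    adv = np.min([knn_sv(mem_feats, mem_labels, x, y, K)
                  for x, y in zip(new_feats, new_labels)], axis=0)
    asv = coop - adv
    return np.argsort(-asv)[:n_retrieve]  # indices of highest-ASV candidates
```

A candidate that is adversarial to the new batch has a strongly negative `adv`, which raises its ASV; a candidate representative of its own class has a high `coop`, with the same effect.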
The figure below shows, in a 2-dimensional feature space, the representations of the data retrieved for learning by ASER and by two other experience replay methods, random replay and MIR. Random replay selects uniformly at random in both the retrieval and update processes, so the retrieved data are distributed according to the spatial density. MIR's update process works the same way as random replay's; for retrieval, MIR selects the data that suffer the largest interference from the new data. However, those retrieved points are mostly found in the red area of the plot, indicating that MIR may retrieve redundant samples. ASER strategically selects samples with high ASV values, so the retrieved instances either share class boundaries with the new data or are representative of their own class.
Comparison of the t-SNE plots of the existing methods and the proposed method
For a quantitative comparison, the accuracies of the continual learning algorithms were compared on two datasets, and ASER achieved the highest performance.
Comparison of the experimental results on Mini-ImageNet and CIFAR-100
In this study, LG AI Research set online continual learning as the research direction through a use-case analysis reflecting the needs of the field. We led the literature review and contributed to the experimental design and the verification of the results. In particular, LG AI Research took the initiative in applying and analyzing the algorithm on image data using its computer vision expertise.
Continual learning enables deep learning models to perform fast "updates" based on data, which has made the technology highly demanded by industry. However, significant work remains to meet these demands. Compared to previous research, this study achieved groundbreaking results with respect to "effective updates and use of memory," but many areas still need improvement before reaching the continual learning capability of humans. Future research should seek hints through a more in-depth exploration of human cognitive abilities and develop effective algorithms accordingly.
In the future, LG AI Research will apply the published algorithm to real industry projects to contribute to creating business value. At the same time, we will conduct fundamental research to innovatively improve existing deep learning methods by addressing the various fundamental problems, such as catastrophic forgetting, that continual learning must solve.