Abstract
Accurate service identification of network data streams is a prerequisite for providing differentiated services. Supervised learning, the approach most commonly used, is difficult to apply in practice because constructing a training data set requires a large amount of manual annotation. Semi-supervised learning, which needs only a small amount of annotated data, has therefore become a research hotspot. The semi-supervised Self-paced Co-training framework handles unlabeled data collaboratively across multiple views, processing the easier samples first. However, this method uses confidence as the sole criterion for selecting the samples to pseudo-label, so the difference between views tends to shrink gradually during training, which reduces the gain from collaboration and limits model performance. To address this problem for the recognition of WeChat data streams, a fuzziness-based self-paced co-training model (FBSpaCo) is proposed, which introduces a fuzziness evaluation mechanism into the pseudo-labeling step. Experiments show that the model effectively prevents the difference between the two views from declining during training and achieves substantially higher recognition accuracy than existing methods.