ISSN 0253-2778

CN 34-1054/N

Open AccessOpen Access JUSTC Original Paper

Adaptive ensemble classification algorithm for data streams based on information entropy

Cite this:
https://doi.org/10.3969/j.issn.0253-2778.2017.07.005
  • Received Date: 28 August 2016
  • Rev Recd Date: 08 December 2016
  • Publish Date: 31 July 2017
  • The processing of streaming data implies new requirements concerning limited amount of memory, small processing time, and one scan of incoming instances. Most of the approaches in the literature to deal with concept drift only focus on gradual or abrupt concept drift and have not addressed the problem of recurring concepts. Motivated by this challenge, an ensemble with internal change detection was proposed to enhance performance by exploring the recurring concepts. It is done by maintaining a pool of classifiers, which dynamically adds and removes classifiers in response to the change detector. The algorithm adopts a two window change detection model, which adopts the Jensen-Shannon divergence to measure the distance of the distributions between two consecutive windows. When a change is detected, the repository of stored historical concepts is checked for reuse. The proposed algorithm has been experimentally compared with the state-of-the-art algorithms on synthetic and real datasets. The results show the suitability of the proposed algorithm for different types of drift as well as static environments.
    The processing of streaming data implies new requirements concerning limited amount of memory, small processing time, and one scan of incoming instances. Most of the approaches in the literature to deal with concept drift only focus on gradual or abrupt concept drift and have not addressed the problem of recurring concepts. Motivated by this challenge, an ensemble with internal change detection was proposed to enhance performance by exploring the recurring concepts. It is done by maintaining a pool of classifiers, which dynamically adds and removes classifiers in response to the change detector. The algorithm adopts a two window change detection model, which adopts the Jensen-Shannon divergence to measure the distance of the distributions between two consecutive windows. When a change is detected, the repository of stored historical concepts is checked for reuse. The proposed algorithm has been experimentally compared with the state-of-the-art algorithms on synthetic and real datasets. The results show the suitability of the proposed algorithm for different types of drift as well as static environments.
  • loading
  • [1]
    COHEN L, AVRAHAMI-BAKISH G, LAST M, et al. Real-time data mining of non-stationary data streams from sensor networks[J]. Information Fusion, 2008, 9(3): 344-353.
    [2]
    WANG H X, FAN W, YU P S, et al. Mining concept-drifting data streams using ensembles classifiers[C]// Proceedings of 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM Press, 2003: 226-235.
    [3]
    ELWELL R, POLIKAR R. Incremental learning of concept drift in nonstationary environments[J]. IEEE Transactions on Neural Networks, 2011, 22(10): 1517-1531.
    [4]
    GAMA J. Knowledge Discovery from Data Streams[M]. New York: CRC Press, 2010.
    [5]
    WIDMER G, KUBAT M. Learning in the presence of concept drift and hidden contexts[J]. Machine Learning, 1996, 23(1): 69-101.
    [6]
    TSYMBAL A. The problem of concept drift: Definitions and related work[R]. Department of Computer Science, Trinity College, Dublin, Ireland, 2004.
    [7]
    GAMA J, LIOBAIT I, BIFET A, et al. A survey on concept drift adaptation[J]. ACM Computing Surveys, 2014, 46(4): 231-238.
    [8]
    亓开元,赵卓峰,房俊,等. 针对高速数据流的大规模数据实时处理方法[J]. 计算机学报. 2012, 35(3): 477-490.
    QI Kaiyuan, ZHAO Zhuofeng, FANG Jun, et al. Real-time processing for high speed data stream over lame scale data[J]. Chinese Journal of Computers, 2012, 35(3): 477-490.
    [9]
    GAMA J, MEDAS P, CASTILLO G, et al. Learning with drift detection[C]// Proceedings of the 17th Brazilian Symposium on Artificial Intelligence. Berlin: Springer-Verlag, 2004: 286-295.
    [10]
    BAENA-GARCA M, CAMPO-VILA J D, FIDALGO R, et al. Early drift detection method[C]// Proceedings of the 4th International Workshop on Knowledge Discovery from Data Streams. New York: ACM Press, 2006: 77-86.
    [11]
    NISHIDA K, YAMAUCHI K. Detecting concept drift using statistical testing[C]// Proceedings of the 10th International Conference on Discovery Science. Sendai, Japan: Springer-Verlag, 2007: 264-269.
    [12]
    BIFET A, GAVALD R. Learning from time-changing data with adaptive windowing[C]// Proceedings of the 7th SIAM International Conference on Data Mining. Philadelphia, PA: SIAM Press, 2007: 443-448.
    [13]
    ROSS G J, ADAMS N M, TASOULIS D K, et al. Exponentially weighted moving average charts for detecting concept drift[J]. Pattern Recognition Letters, 2012, 33(2): 191-198.
    [14]
    RAMAMURTHY S, BHATNAGAR R. Tracking recurrent concept drift in streaming data using ensemble classifiers[C]// Proceedings of the 6th International Conference on Machine Learning and Applications. Cincinnati, USA: IEEE Press, 2007: 404-409.
    [15]
    KATAKIS I, TSOUMAKAS G, VLAHAVAS I. Tracking recurring contexts using ensemble classifiers: An application to email filtering[J]. Knowledge and Information Systems, 2010, 22(3): 371-391.
    [16]
    YANG Y, WU X D, ZHU X Q. Mining in anticipation for concept change: Proactive-reactive prediction in data streams[J]. Data Mining and Knowledge Discovery, 2006, 13(3): 261-289.
    [17]
    GAMA J, KOSINA P. Recurrent concepts in data streams classification[J]. Knowledge and Information Systems, 2014, 40(3): 489-507.
    [18]
    GONALVES P M, DE BARROS R S M. RCD: A recurring concept drift framework[J]. Pattern Recognition Letters, 2013, 34(9): 1018-1025.
    [19]
    DATAR M, GIONIS A, INDYK P, et al. Maintaining stream statistics over sliding windows[J]. SIAM Journal on Computing, 2002, 31(6): 1794-1813.
    [20]
    DASU, T, KRISHNAN S, VENKATASUBRAMANIAN S, et al. An information-theoretic approach to detecting changes in multi-dimensional data streams[C]// Proceedings of the 38th Symposium on the Interface of Statistics, Computing Science, and Applications. Pasadena, USA: IEEE Press, 2006: 1-24.
    [21]
    BIFET A, HOLMES G, KIRKBY R, et al. MOA: Massive online analysis[J]. Journal of Machine Learning Research, 2010, 11(2): 1601-1604.
    [22]
    DOMINGOS P, HULTEN G. Mining high-speed data streams[C]// Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Boston, USA: ACM Press, 2000: 71-80.
  • 加载中

Catalog

    [1]
    COHEN L, AVRAHAMI-BAKISH G, LAST M, et al. Real-time data mining of non-stationary data streams from sensor networks[J]. Information Fusion, 2008, 9(3): 344-353.
    [2]
    WANG H X, FAN W, YU P S, et al. Mining concept-drifting data streams using ensembles classifiers[C]// Proceedings of 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM Press, 2003: 226-235.
    [3]
    ELWELL R, POLIKAR R. Incremental learning of concept drift in nonstationary environments[J]. IEEE Transactions on Neural Networks, 2011, 22(10): 1517-1531.
    [4]
    GAMA J. Knowledge Discovery from Data Streams[M]. New York: CRC Press, 2010.
    [5]
    WIDMER G, KUBAT M. Learning in the presence of concept drift and hidden contexts[J]. Machine Learning, 1996, 23(1): 69-101.
    [6]
    TSYMBAL A. The problem of concept drift: Definitions and related work[R]. Department of Computer Science, Trinity College, Dublin, Ireland, 2004.
    [7]
    GAMA J, LIOBAIT I, BIFET A, et al. A survey on concept drift adaptation[J]. ACM Computing Surveys, 2014, 46(4): 231-238.
    [8]
    亓开元,赵卓峰,房俊,等. 针对高速数据流的大规模数据实时处理方法[J]. 计算机学报. 2012, 35(3): 477-490.
    QI Kaiyuan, ZHAO Zhuofeng, FANG Jun, et al. Real-time processing for high speed data stream over lame scale data[J]. Chinese Journal of Computers, 2012, 35(3): 477-490.
    [9]
    GAMA J, MEDAS P, CASTILLO G, et al. Learning with drift detection[C]// Proceedings of the 17th Brazilian Symposium on Artificial Intelligence. Berlin: Springer-Verlag, 2004: 286-295.
    [10]
    BAENA-GARCA M, CAMPO-VILA J D, FIDALGO R, et al. Early drift detection method[C]// Proceedings of the 4th International Workshop on Knowledge Discovery from Data Streams. New York: ACM Press, 2006: 77-86.
    [11]
    NISHIDA K, YAMAUCHI K. Detecting concept drift using statistical testing[C]// Proceedings of the 10th International Conference on Discovery Science. Sendai, Japan: Springer-Verlag, 2007: 264-269.
    [12]
    BIFET A, GAVALD R. Learning from time-changing data with adaptive windowing[C]// Proceedings of the 7th SIAM International Conference on Data Mining. Philadelphia, PA: SIAM Press, 2007: 443-448.
    [13]
    ROSS G J, ADAMS N M, TASOULIS D K, et al. Exponentially weighted moving average charts for detecting concept drift[J]. Pattern Recognition Letters, 2012, 33(2): 191-198.
    [14]
    RAMAMURTHY S, BHATNAGAR R. Tracking recurrent concept drift in streaming data using ensemble classifiers[C]// Proceedings of the 6th International Conference on Machine Learning and Applications. Cincinnati, USA: IEEE Press, 2007: 404-409.
    [15]
    KATAKIS I, TSOUMAKAS G, VLAHAVAS I. Tracking recurring contexts using ensemble classifiers: An application to email filtering[J]. Knowledge and Information Systems, 2010, 22(3): 371-391.
    [16]
    YANG Y, WU X D, ZHU X Q. Mining in anticipation for concept change: Proactive-reactive prediction in data streams[J]. Data Mining and Knowledge Discovery, 2006, 13(3): 261-289.
    [17]
    GAMA J, KOSINA P. Recurrent concepts in data streams classification[J]. Knowledge and Information Systems, 2014, 40(3): 489-507.
    [18]
    GONALVES P M, DE BARROS R S M. RCD: A recurring concept drift framework[J]. Pattern Recognition Letters, 2013, 34(9): 1018-1025.
    [19]
    DATAR M, GIONIS A, INDYK P, et al. Maintaining stream statistics over sliding windows[J]. SIAM Journal on Computing, 2002, 31(6): 1794-1813.
    [20]
    DASU, T, KRISHNAN S, VENKATASUBRAMANIAN S, et al. An information-theoretic approach to detecting changes in multi-dimensional data streams[C]// Proceedings of the 38th Symposium on the Interface of Statistics, Computing Science, and Applications. Pasadena, USA: IEEE Press, 2006: 1-24.
    [21]
    BIFET A, HOLMES G, KIRKBY R, et al. MOA: Massive online analysis[J]. Journal of Machine Learning Research, 2010, 11(2): 1601-1604.
    [22]
    DOMINGOS P, HULTEN G. Mining high-speed data streams[C]// Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Boston, USA: ACM Press, 2000: 71-80.

    Article Metrics

    Article views (425) PDF downloads(174)
    Proportional views

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return