ISSN 0253-2778

CN 34-1054/N

Open Access · JUSTC Original Paper

An online outlier detection and confidence estimation algorithm based on Bayesian posterior ratio

DOI: https://doi.org/10.3969/j.issn.0253-2778.2017.08.003
  • Received Date: 26 May 2017
  • Rev Recd Date: 14 July 2017
  • Publish Date: 31 August 2017

Abstract: To meet the outlier detection requirements of a class of high-speed, small-variance, unlabeled industrial time series, an online outlier detection and confidence estimation algorithm based on the Bayesian posterior ratio was proposed. The algorithm combined prediction with hypothesis testing. It first established an autoregressive model and then used the logarithm of the Bayesian posterior ratio of the prediction residuals to identify outliers. To reduce misjudgments, state transition probabilities were calculated with a self-organizing map (SOM) neural network, and the reliability of each detected outlier was then evaluated. The models were updated periodically so that the algorithm adapts dynamically to changes in the data, which improves accuracy. Experimental results demonstrate that the online algorithm can effectively detect outliers in time series and provide reliable confidence evaluations, giving it greater adaptability and practicability.
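The detection step summarized above (AR prediction, a posterior-ratio test on the residual, and periodic refitting) can be illustrated with a small sketch. It is a minimal illustration under stated assumptions and not the authors' implementation. The AR(p) model is fitted by ordinary least squares. A normal residual is modelled as N(0, sigma^2) and an outlier residual as N(0, (k*sigma)^2), which is a variance-inflation model. The prior outlier probability `prior`, the inflation factor `k`, the window length and the refit interval are all hypothetical parameters. The SOM-based state transition step is omitted, and the plain posterior outlier probability stands in for the confidence score. Under these assumptions the log posterior ratio for a residual e is log(prior/(1 - prior)) - log k + (e^2 / (2 sigma^2)) (1 - 1/k^2), and a sample is flagged when this value is positive.

```python
import math
import random
from collections import deque


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _sigmoid(z):
    # Numerically stable logistic function: maps a log-odds value to a probability.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def _solve(A, b):
    # Gaussian elimination with partial pivoting for the small (p+1)x(p+1) system.
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_ar(series, p):
    """Least-squares AR(p) fit of x_t = c + a_1 x_{t-1} + ... + a_p x_{t-p} + e_t.
    Returns (beta, sigma) with beta = [c, a_1, ..., a_p] and sigma the residual std."""
    xs = list(series)
    rows = [[1.0] + [xs[t - j] for j in range(1, p + 1)] for t in range(p, len(xs))]
    y = xs[p:]
    n = p + 1
    # Normal equations X^T X beta = X^T y; the tiny ridge guards against exact singularity.
    A = [[sum(r[i] * r[j] for r in rows) + (1e-9 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    b = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(n)]
    beta = _solve(A, b)
    resid = [yt - _dot(beta, r) for r, yt in zip(rows, y)]
    sigma = math.sqrt(sum(e * e for e in resid) / max(len(resid) - n, 1))
    return beta, max(sigma, 1e-12)


def log_posterior_ratio(resid, sigma, prior, k):
    """log P(outlier | e) - log P(normal | e) under the assumed models
    normal: e ~ N(0, sigma^2), outlier: e ~ N(0, (k*sigma)^2), P(outlier) = prior."""
    z2 = (resid / sigma) ** 2
    return math.log(prior / (1.0 - prior)) - math.log(k) + 0.5 * z2 * (1.0 - 1.0 / k ** 2)


class OnlineARDetector:
    """Hypothetical detector: AR(p) prediction + posterior-ratio test, sliding-window refits."""

    def __init__(self, p=2, window=200, refit_every=50, prior=0.01, k=5.0):
        self.p, self.refit_every, self.prior, self.k = p, refit_every, prior, k
        self.history = deque(maxlen=window)
        self.beta, self.sigma, self.since_fit = None, None, 0

    def update(self, x):
        """Feed one sample; return (is_outlier, confidence), or None during warm-up."""
        result, stored = None, x
        if self.beta is not None:
            lags = [1.0] + [self.history[-j] for j in range(1, self.p + 1)]
            pred = _dot(self.beta, lags)
            llr = log_posterior_ratio(x - pred, self.sigma, self.prior, self.k)
            result = (llr > 0.0, _sigmoid(llr))  # confidence = posterior outlier probability
            if llr > 0.0:
                stored = pred  # assumption: keep flagged values out of lags and refits
        self.history.append(stored)
        self.since_fit += 1
        full = len(self.history) == self.history.maxlen
        if full and (self.beta is None or self.since_fit >= self.refit_every):
            self.beta, self.sigma = fit_ar(self.history, self.p)
            self.since_fit = 0
        return result


if __name__ == "__main__":
    rng = random.Random(0)
    data = [math.sin(0.1 * t) + rng.gauss(0.0, 0.05) for t in range(600)]
    data[400] += 2.0  # injected spike
    det = OnlineARDetector()
    for t, x in enumerate(data):
        out = det.update(x)
        if out and out[0]:
            print(f"t={t}: outlier, confidence={out[1]:.3f}")
```

With the default parameters, a sample is flagged when its standardized residual |e|/sigma exceeds about 3.6, and a larger `prior` lowers that threshold. The sketch replaces each flagged value with its prediction in the history. This is an assumption made here so that a single spike does not distort the next p predictions or the next refit.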

