In data analysis, anomaly detection (also referred to as outlier detection and sometimes as novelty detection) is generally understood to be the identification of rare items, events or observations which deviate significantly from the majority of the data and do not conform to a well-defined notion of normal behavior.[1] Such examples may arouse suspicions of being generated by a different mechanism,[2] or appear inconsistent with the remainder of that set of data.[3]
Anomaly detection finds application in many domains, including cybersecurity, medicine, machine vision, statistics, neuroscience, law enforcement and financial fraud, to name only a few. Anomalies were initially sought out so that they could be clearly rejected or omitted from the data to aid statistical analysis, for example when computing the mean or standard deviation. They were also removed to improve predictions from models such as linear regression, and more recently their removal has aided the performance of machine learning algorithms. However, in many applications the anomalies themselves are of interest and are the most sought-after observations in the entire data set; these need to be identified and separated from noise or irrelevant outliers.
Three broad categories of anomaly detection techniques exist.[1] Supervised anomaly detection techniques require a data set that has been labelled as "normal" and "abnormal" and involve training a classifier. However, this approach is rarely used in anomaly detection because labelled data is generally unavailable and the classes are inherently unbalanced. Semi-supervised anomaly detection techniques assume that some portion of the data is labelled. This may be any combination of the normal or anomalous data, but more often than not the techniques construct a model representing normal behavior from a given normal training data set, and then test the likelihood that a test instance was generated by the model. Unsupervised anomaly detection techniques assume the data is unlabelled and are by far the most commonly used because of their wider applicability.
Definition
Many attempts have been made in the statistical and computer science communities to define an anomaly. The most prevalent ones include the following, and can be categorised into three groups: those that are ambiguous, those that are specific to a method with pre-defined thresholds usually chosen empirically, and those that are formally defined:
Ill defined
An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism.[2]
Anomalies are instances or collections of data that occur very rarely in the data set and whose features differ significantly from most of the data.
An outlier is an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data.[3]
An anomaly is a point or collection of points that is relatively distant from other points in multi-dimensional space of features.
Anomalies are patterns in data that do not conform to a well-defined notion of normal behaviour.[1]
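The distance-based notion in the list above can be illustrated with a k-nearest-neighbour distance score: a point is considered more anomalous the farther it lies from its k-th nearest neighbour. This is only a minimal sketch; the function name `knn_distance_scores`, the choice of Euclidean distance, the value `k=2`, and the sample points are all illustrative assumptions.

```python
import math

def knn_distance_scores(points, k=2):
    """Score each point by the Euclidean distance to its k-th nearest neighbour."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical data: five points near the origin and one far away.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (-0.1, 0.1), (0.0, -0.2), (5.0, 5.0)]
scores = knn_distance_scores(points, k=2)

# Index of the point with the largest score, i.e. the most isolated point.
most_anomalous = max(range(len(points)), key=scores.__getitem__)
```

The brute-force loop compares every pair of points, so it is only suitable for small data sets; turning the scores into a decision still requires a threshold or a ranking cut-off chosen by the user.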
Specific
Let T be observations from a univariate Gaussian distribution and O a point from T. Then the z-score for O is greater than a pre-selected threshold if and only if O is an outlier.
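As a minimal sketch of the z-score definition above, the following Python snippet flags values whose absolute z-score exceeds a threshold. The function names, the sample readings, the use of the absolute value (a two-sided test), the population standard deviation, and the threshold of 3 are illustrative assumptions, not part of the definition itself.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return the z-score of every value relative to the sample mean and standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        # All values are identical, so nothing deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def z_score_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the pre-selected threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]

# Hypothetical sensor readings: eleven values near 10 and one extreme value.
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 10.1, 9.7, 10.3, 9.9, 10.0, 25.0]
outliers = z_score_outliers(readings, threshold=3.0)
```

Because the extreme value itself inflates the estimated standard deviation, the z-score a single point can reach in a small sample is limited, so the threshold generally has to be chosen with the sample size in mind.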
History
Intrusion detection
The concept of intrusion detection, a critical component of anomaly detection, has evolved significantly over time. Initially, it was a manual process where system administrators would monitor for unusual activities, such as a vacationing user's account being accessed or unexpected printer activity. This approach was not scalable and was soon superseded by the analysis of audit logs and system logs for signs of malicious behavior.[4]
By the late 1970s and early 1980s, the analysis of these logs was primarily used retrospectively to investigate incidents, as the volume of data made it impractical for real-time monitoring. The affordability of digital storage eventually led to audit logs being analyzed online, with specialized programs being developed to sift through the data. These programs, however, were typically run during off-peak hours due to their computational intensity.[4]
The 1990s brought the advent of real-time intrusion detection systems capable of analyzing audit data as it was generated, allowing for immediate detection of and response to attacks. This marked a significant shift towards proactive intrusion detection.[4]
As the field has continued to develop, the focus has shifted to creating solutions that can be efficiently implemented across large and complex network environments, adapting to the ever-growing variety of security threats and the dynamic nature of modern computing infrastructures.[4]
Applications
Anomaly detection is applicable in a very large number and variety of domains, and is an important subarea of unsupervised machine learning. As such it has applications in cyber-security, intrusion detection, fraud detection, fault detection, system health monitoring, event detection in sensor networks, detecting ecosystem disturbances, defect detection in images using machine vision, medical diagnosis and law enforcement.[5]
Intrusion detection
Anomaly detection was proposed for intrusion detection systems (IDS) by Dorothy Denning in 1986.[6] Anomaly detection for IDS is normally accomplished with thresholds and statistics, but can also be done with soft computing and inductive learning.[7] Types of features proposed by 1999 included profiles of users, workstations, networks, remote hosts, groups of users, and programs based on frequencies, means, variances, covariances, and standard deviations.[8] The counterpart of anomaly detection in intrusion detection is misuse detection.
Preprocessing data to remove anomalies can be an important step in data analysis, and is done for a number of reasons. Statistics such as the mean and standard deviation are more accurate after the removal of anomalies, and the visualisation of data can also be improved. In supervised learning, removing the anomalous data from the dataset often results in a statistically significant increase in accuracy.[11][12]
Video surveillance
Anomaly detection has become increasingly vital in video surveillance to enhance security and safety.[13][14] With the advent of deep learning technologies, methods using Convolutional Neural Networks (CNNs) and Simple Recurrent Units (SRUs) have shown significant promise in identifying unusual activities or behaviors in video data.[13] These models can process and analyze extensive video feeds in real-time, recognizing patterns that deviate from the norm, which may indicate potential security threats or safety violations.[13]
IT infrastructure
In IT infrastructure management, anomaly detection is crucial for ensuring the smooth operation and reliability of services.[15] Techniques like the IT Infrastructure Library (ITIL) and monitoring frameworks are employed to track and manage system performance and user experience.[15] Detecting anomalies can help identify and pre-empt potential performance degradations or system failures, thus maintaining productivity and business process effectiveness.[15]
IoT systems
Anomaly detection is critical for the security and efficiency of Internet of Things (IoT) systems.[16] It helps in identifying system failures and security breaches in complex networks of IoT devices.[16] The methods must manage real-time data, diverse device types, and scale effectively. Garbe et al.[17] have introduced a multi-stage anomaly detection framework that improves upon traditional methods by incorporating spatial clustering, density-based clustering, and locality-sensitive hashing. This tailored approach is designed to better handle the vast and varied nature of IoT data, thereby enhancing security and operational reliability in smart infrastructure and industrial IoT systems.[17]
Petroleum industry
Anomaly detection is crucial in the petroleum industry for monitoring critical machinery.[18] Martí et al. used a novel segmentation algorithm to analyze sensor data for real-time anomaly detection.[18] This approach helps promptly identify and address any irregularities in sensor readings, ensuring the reliability and safety of petroleum operations.[18]
Oil and gas pipeline monitoring
In the oil and gas sector, anomaly detection is not just crucial for maintenance and safety, but also for environmental protection.[19] Aljameel et al. propose an advanced machine learning-based model for detecting minor leaks in oil and gas pipelines, a task traditional methods may miss.[19]
Methods
Many anomaly detection techniques have been proposed in the literature.[1][20] The performance of a method usually depends on the data set. For example, some methods may be suited to detecting local outliers and others to detecting global outliers, and no method has much systematic advantage over another when compared across many data sets.[21][22] Almost all algorithms also require the setting of non-intuitive parameters that are critical for performance and usually unknown before application. Some of the popular techniques are mentioned below, broken down into categories:
Statistical
Parameter-free
Convolutional Neural Networks (CNNs): CNNs have shown exceptional performance in the unsupervised learning domain for anomaly detection, especially in image and video data analysis.[13] Their ability to automatically learn spatial hierarchies of features, from low-level to high-level patterns, makes them particularly suited for detecting visual anomalies. For instance, CNNs can be trained on image datasets to identify atypical patterns indicative of defects or out-of-norm conditions in industrial quality control scenarios.[40]
Simple Recurrent Units (SRUs): In time-series data, SRUs, a type of recurrent neural network, have been effectively used for anomaly detection by capturing temporal dependencies and sequence anomalies.[13] Unlike traditional RNNs, SRUs are designed to be faster and more parallelizable, offering a better fit for real-time anomaly detection in complex systems such as dynamic financial markets or predictive maintenance in machinery, where identifying temporal irregularities promptly is crucial.[41]
Histogram-based Outlier Score (HBOS) uses value histograms and assumes feature independence for fast predictions.[50]
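The idea behind HBOS can be sketched in a few lines: build one histogram per feature, normalise the bin heights, and sum the logarithms of the inverse heights across features, which is where the feature-independence assumption enters. The version below is a simplified illustration, not the reference implementation: it uses only fixed-width bins, and the function name `hbos_scores`, the bin count, and the sample rows are hypothetical.

```python
import math

def hbos_scores(rows, n_bins=5):
    """Simplified HBOS: sum over features of log(1 / normalised bin height)."""
    n_features = len(rows[0])
    scores = [0.0] * len(rows)
    for f in range(n_features):
        column = [row[f] for row in rows]
        lo, hi = min(column), max(column)
        width = (hi - lo) / n_bins or 1.0  # constant feature: avoid a zero bin width
        # Assign every value to a bin; the maximum is clamped into the last bin.
        bins = [min(int((v - lo) / width), n_bins - 1) for v in column]
        counts = [0] * n_bins
        for b in bins:
            counts[b] += 1
        tallest = max(counts)
        for i, b in enumerate(bins):
            # Every value lies in a non-empty bin, so counts[b] is always > 0.
            # tallest / counts[b] is the inverse of the height normalised to 1.
            scores[i] += math.log(tallest / counts[b])
    return scores

# Hypothetical two-feature data: six similar rows and one row that deviates in both features.
rows = [(1.0, 10.0), (1.1, 10.2), (0.9, 9.9), (1.0, 10.1),
        (1.2, 9.8), (0.8, 10.0), (5.0, 20.0)]
scores = hbos_scores(rows)
```

Rows that fall into the most populated bin of every feature receive a score of zero, while rows in sparsely populated bins accumulate a higher score. Each feature is processed in a single pass, which reflects the speed advantage mentioned above, at the cost of ignoring dependencies between features.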
Anomaly detection in dynamic networks
Dynamic networks, such as those representing financial systems, social media interactions, and transportation infrastructure, are subject to constant change, making anomaly detection within them a complex task. Unlike static graphs, dynamic networks reflect evolving relationships and states, requiring adaptive techniques for anomaly detection.
Types of anomalies in dynamic networks
Community anomalies
Compression anomalies
Decomposition anomalies
Distance anomalies
Probabilistic model anomalies
Explainable anomaly detection
Many of the methods discussed above yield only an anomaly score, which can often be explained to users as the point lying in a region of low data density (or of relatively low density compared with its neighbors' densities). In explainable artificial intelligence, users demand methods with higher explainability. Some methods allow for more detailed explanations:
The Subspace Outlier Degree (SOD)[30] identifies attributes where a sample is normal, and attributes in which the sample deviates from the expected.
Correlation Outlier Probabilities (COP)[31] compute an error vector of how a sample point deviates from an expected location, which can be interpreted as a counterfactual explanation: the sample would be normal if it were moved to that location.
Software
ELKI is an open-source Java data mining toolkit that contains several anomaly detection algorithms, as well as index acceleration for them.
PyOD is an open-source Python library developed specifically for anomaly detection.[51]
scikit-learn is an open-source Python library that contains some algorithms for unsupervised anomaly detection.
Wolfram Mathematica provides functionality for unsupervised anomaly detection across multiple data types.[52]
Campos, Guilherme O.; Zimek, Arthur; Sander, Jörg; Campello, Ricardo J. G. B.; Micenková, Barbora; Schubert, Erich; Assent, Ira; Houle, Michael E. (2016). "On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study". Data Mining and Knowledge Discovery. 30 (4): 891. doi:10.1007/s10618-015-0444-8. ISSN 1384-5810. S2CID 1952214.
Knorr, E. M.; Ng, R. T.; Tucakov, V. (2000). "Distance-based outliers: Algorithms and applications". The VLDB Journal: The International Journal on Very Large Data Bases. 8 (3–4): 237–253. CiteSeerX 10.1.1.43.1842. doi:10.1007/s007780050006. S2CID 11707259.
Ramaswamy, S.; Rastogi, R.; Shim, K. (2000). Efficient algorithms for mining outliers from large data sets. Proceedings of the 2000 ACM SIGMOD international conference on Management of data – SIGMOD '00. p. 427. doi:10.1145/342009.335437. ISBN 1-58113-217-4.
Angiulli, F.; Pizzuti, C. (2002). Fast Outlier Detection in High Dimensional Spaces. Principles of Data Mining and Knowledge Discovery. Lecture Notes in Computer Science. Vol. 2431. p. 15. doi:10.1007/3-540-45681-3_2. ISBN 978-3-540-44037-6.
Schubert, E.; Zimek, A.; Kriegel, H.-P. (2012). "Local outlier detection reconsidered: A generalized view on locality with applications to spatial, video, and network outlier detection". Data Mining and Knowledge Discovery. 28: 190–237. doi:10.1007/s10618-012-0300-z. S2CID 19036098.
Campello, R. J. G. B.; Moulavi, D.; Zimek, A.; Sander, J. (2015). "Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection". ACM Transactions on Knowledge Discovery from Data. 10 (1): 5:1–51. doi:10.1145/2733381. S2CID 2887636.
Nguyen, H. V.; Ang, H. H.; Gopalkrishnan, V. (2010). Mining Outliers with Ensemble of Heterogeneous Detectors on Random Subspaces. Database Systems for Advanced Applications. Lecture Notes in Computer Science. Vol. 5981. p. 368. doi:10.1007/978-3-642-12026-8_29. ISBN 978-3-642-12025-1.
Zimek, A.; Campello, R. J. G. B.; Sander, J. R. (2014). "Ensembles for unsupervised outlier detection". ACM SIGKDD Explorations Newsletter. 15: 11–22. doi:10.1145/2594473.2594476. S2CID 8065347.
Zimek, A.; Campello, R. J. G. B.; Sander, J. R. (2014). Data perturbation for outlier detection ensembles. Proceedings of the 26th International Conference on Scientific and Statistical Database Management – SSDBM '14. p. 1. doi:10.1145/2618243.2618257. ISBN 978-1-4503-2722-0.