Sequential pattern mining

Sequential pattern mining is a topic of data mining concerned with finding statistically relevant patterns between data examples where the values are delivered in a sequence.[1][2] The values are usually presumed to be discrete; time series mining is therefore closely related but is usually considered a different activity. Sequential pattern mining is a special case of structured data mining.

There are several key traditional computational problems addressed within this field. These include building efficient databases and indexes for sequence information, extracting the frequently occurring patterns, comparing sequences for similarity, and recovering missing sequence members. In general, sequence mining problems can be classified as string mining, which is typically based on string processing algorithms, and itemset mining, which is typically based on association rule learning. Local process models[3] extend sequential pattern mining to more complex patterns that can include (exclusive) choices, loops, and concurrency constructs in addition to the sequential ordering construct.
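The task of extracting frequently occurring patterns can be illustrated with a minimal sketch, assuming each sequence is a list of single discrete items and that a pattern "occurs" in a sequence when its items appear in the same order, with gaps allowed. The function names `is_subsequence` and `support` and the toy database are illustrative assumptions, not part of any particular library.

```python
def is_subsequence(pattern, sequence):
    """Return True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator until it finds `item`,
    # so later items are only searched for after earlier matches.
    return all(item in it for item in pattern)


def support(pattern, database):
    """Fraction of sequences in `database` that contain `pattern`."""
    if not database:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in database) / len(database)


# Hypothetical toy database of four sequences.
db = [
    ["a", "b", "c", "d"],
    ["a", "c", "d"],
    ["b", "a", "d"],
    ["c", "b"],
]

print(support(["a", "d"], db))  # 0.75
print(support(["b", "c"], db))  # 0.25
```

A pattern is usually called frequent when its support meets a user-chosen threshold; the algorithms listed later in this article differ mainly in how they avoid testing every candidate pattern this way.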

String mining

String mining typically deals with a limited alphabet for items that appear in a sequence, but the sequence itself may be very long. Examples of alphabets include the ASCII character set used in natural language text, the nucleotide bases 'A', 'G', 'C' and 'T' in DNA sequences, and amino acids in protein sequences. In biological applications, analysis of the arrangement of the alphabet in strings can be used to examine gene and protein sequences to determine their properties. Knowing the sequence of letters of a DNA molecule or a protein is not an ultimate goal in itself. Rather, the major task is to understand the sequence in terms of its structure and biological function. This is typically achieved by first identifying individual regions or structural units within each sequence and then assigning a function to each structural unit. In many cases this requires comparing a given sequence with previously studied ones. The comparison between strings becomes complicated when insertions, deletions and mutations occur in a string.
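One common way to compare strings in the presence of insertions, deletions and substitutions is the Levenshtein edit distance. The sketch below is a standard dynamic-programming formulation with unit costs; it is offered as an illustration, not as the method used by any specific bioinformatics tool, and the function name `edit_distance` is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (unit costs assumed)."""
    # prev[j] holds the distance between the processed prefix of `a` and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


print(edit_distance("GATTACA", "GATACA"))   # 1 (one deletion)
print(edit_distance("GATTACA", "GACTACA"))  # 1 (one substitution)
```

Practical sequence comparison tools typically use scoring schemes that weight substitutions and gaps differently; unit costs are used here only to keep the sketch short.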

A survey and taxonomy of the key algorithms for sequence comparison in bioinformatics is presented by Abouelhoda & Ghanem (2010), which includes:[4]

  • Repeat-related problems: these deal with operations on single sequences and can be based on exact or approximate string matching methods for finding dispersed fixed-length and maximal-length repeats, finding tandem repeats, and finding unique subsequences and missing (un-spelled) subsequences.
  • Alignment problems: these deal with comparison between strings by first aligning one or more sequences; examples of popular methods include BLAST for comparing a single sequence with multiple sequences in a database, and ClustalW for multiple alignments. Alignment algorithms can be based on either exact or approximate methods, and can also be classified as global, semi-global and local alignments. See sequence alignment.
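As a small illustration of the repeat-related problems above, the following sketch finds exact fixed-length repeats by indexing every substring of length `k` (often called a k-mer) by its start positions. It is a naive hash-based approach assumed here for clarity; surveys such as the one cited describe more efficient index structures. The name `find_repeats` is hypothetical.

```python
from collections import defaultdict


def find_repeats(sequence: str, k: int) -> dict:
    """Map each length-k substring occurring more than once to its start positions."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    positions = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        positions[sequence[i:i + k]].append(i)
    # Keep only substrings seen at two or more positions.
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


print(find_repeats("ACGTACGTTT", 4))  # {'ACGT': [0, 4]}
```

Approximate repeats, tandem repeats and maximal repeats need additional logic beyond this exact-match index.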

Itemset mining

Some problems in sequence mining lend themselves to discovering frequent itemsets and the order in which they appear. For example, one may seek rules of the form "if a {customer buys a car}, he or she is likely to {buy insurance} within 1 week", or, in the context of stock prices, "if {Nokia up and Ericsson up}, it is likely that {Motorola up and Samsung up} within 2 days". Traditionally, itemset mining is used in marketing applications for discovering regularities between frequently co-occurring items in large transactions. For example, by analysing transactions of customer shopping baskets in a supermarket, one can produce a rule which reads "if a customer buys onions and potatoes together, he or she is likely to also buy hamburger meat in the same transaction".
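How strongly a rule such as the shopping-basket example holds is commonly quantified with support and confidence. The sketch below computes both for one rule over a hypothetical list of baskets; the data and the function name `rule_metrics` are invented for illustration.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    support    = share of transactions containing antecedent and consequent
    confidence = share of antecedent-containing transactions that also
                 contain the consequent
    """
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n_ante = sum(antecedent <= t for t in transactions)
    n_both = sum((antecedent | consequent) <= t for t in transactions)
    support = n_both / len(transactions) if transactions else 0.0
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


# Hypothetical shopping baskets.
baskets = [
    {"onions", "potatoes", "hamburger meat"},
    {"onions", "potatoes", "hamburger meat", "beer"},
    {"onions", "potatoes"},
    {"milk", "bread"},
    {"potatoes", "hamburger meat"},
]

sup, conf = rule_metrics(baskets, {"onions", "potatoes"}, {"hamburger meat"})
print(sup)   # 0.4: two of five baskets contain all three items
print(conf)  # two of the three onion-and-potato baskets also contain hamburger meat
```

Time-constrained rules such as "within 1 week" would additionally require timestamps on each transaction, which this sketch does not model.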

A survey and taxonomy of the key algorithms for itemset mining is presented by Han et al. (2007).[5]

The two common techniques applied to sequence databases for frequent itemset mining are the influential Apriori algorithm and the more recent FP-growth technique.
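The level-wise idea behind Apriori can be sketched as follows: frequent itemsets of size k are built only from frequent itemsets of size k - 1, because no superset of an infrequent itemset can be frequent. This is a simplified illustration with an absolute support threshold, not an optimized implementation; the function name `apriori` and the sample baskets are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least `min_support` transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Count step: one pass over the transactions per level.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical shopping baskets.
sample = [
    {"onions", "potatoes", "burger"},
    {"onions", "potatoes", "burger", "beer"},
    {"onions", "potatoes"},
    {"milk", "bread"},
    {"potatoes", "burger"},
]
result = apriori(sample, min_support=2)
print(result[frozenset({"onions", "potatoes", "burger"})])  # 2
```

FP-growth reaches the same frequent itemsets without generating candidates explicitly, by compressing the transactions into a prefix tree; that structure is not shown here.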

Applications

With a great variety of products and user buying behaviors, the shelf on which products are displayed is one of the most important resources in a retail environment. Retailers can not only increase their profit but also decrease costs through proper management of shelf space allocation and product display. To address this problem, George and Binu (2013) proposed an approach that mines user buying patterns using the PrefixSpan algorithm and places products on shelves based on the order of the mined purchasing patterns.[6]

Algorithms

Commonly used algorithms include:

  • GSP algorithm
  • Sequential Pattern Discovery using Equivalence classes (SPADE)
  • FreeSpan
  • PrefixSpan
  • MAPRes[7]
  • Seq2Pat (for constraint-based sequential pattern mining)[8][9]
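To give a flavour of how a pattern-growth algorithm from the list above operates, here is a minimal PrefixSpan-style sketch. It makes simplifying assumptions that the published algorithm does not: each sequence element is a single item rather than an itemset, items are hashable and sortable, and the support threshold is an absolute count. The function name `prefixspan` and the toy database are illustrative only.

```python
def prefixspan(database, min_support):
    """Return a list of (pattern, count) for sequential patterns that occur
    in at least `min_support` sequences. Patterns are tuples of items."""
    results = []

    def mine(prefix, projected):
        # `projected` is a list of (sequence index, start offset) pairs:
        # the suffixes remaining after the first occurrence of `prefix`.
        counts = {}
        for sid, start in projected:
            for item in set(database[sid][start:]):  # count once per sequence
                counts[item] = counts.get(item, 0) + 1

        for item, count in sorted(counts.items()):
            if count < min_support:
                continue
            pattern = prefix + (item,)
            results.append((pattern, count))
            # Project each suffix past the first occurrence of `item`.
            new_projected = []
            for sid, start in projected:
                seq = database[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((sid, pos + 1))
                        break
            mine(pattern, new_projected)

    mine((), [(sid, 0) for sid in range(len(database))])
    return results


# Hypothetical toy database.
sequences = [
    ["a", "b", "c"],
    ["a", "c"],
    ["a", "b", "c"],
    ["b", "c"],
]
for pattern, count in prefixspan(sequences, min_support=3):
    print(pattern, count)
```

With this database and threshold the sketch reports the patterns a, a→c, b, b→c and c. Growing patterns only from prefixes that are already frequent is what lets pattern-growth methods avoid the candidate generation used by GSP.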

References

  1. ^ Mabroukeh, N. R.; Ezeife, C. I. (2010). "A taxonomy of sequential pattern mining algorithms". ACM Computing Surveys. 43: 1–41. CiteSeerX 10.1.1.332.4745. doi:10.1145/1824795.1824798. S2CID 207180619.
  2. ^ Bechini, A.; Bondielli, A.; Dell'Oglio, P.; Marcelloni, F. (2023). "From basic approaches to novel challenges and applications in Sequential Pattern Mining". Applied Computing and Intelligence. 3 (1): 44–78. doi:10.3934/aci.2023004.
  3. ^ Tax, N.; Sidorova, N.; Haakma, R.; van der Aalst, Wil M. P. (2016). "Mining Local Process Models". Journal of Innovation in Digital Ecosystems. 3 (2): 183–196. arXiv:1606.06066. doi:10.1016/j.jides.2016.11.001. S2CID 10872379.
  4. ^ Abouelhoda, M.; Ghanem, M. (2010). "String Mining in Bioinformatics". In Gaber, M. M. (ed.). Scientific Data Mining and Knowledge Discovery. Springer. doi:10.1007/978-3-642-02788-8_9. ISBN 978-3-642-02787-1.
  5. ^ Han, J.; Cheng, H.; Xin, D.; Yan, X. (2007). "Frequent pattern mining: current status and future directions". Data Mining and Knowledge Discovery. 15 (1): 55–86. doi:10.1007/s10618-006-0059-1.
  6. ^ George, A.; Binu, D. (2013). "An Approach to Products Placement in Supermarkets Using PrefixSpan Algorithm". Journal of King Saud University-Computer and Information Sciences. 25 (1): 77–87. doi:10.1016/j.jksuci.2012.07.001.
  7. ^ Ahmad, Ishtiaq; Qazi, Wajahat M.; Khurshid, Ahmed; Ahmad, Munir; Hoessli, Daniel C.; Khawaja, Iffat; Choudhary, M. Iqbal; Shakoori, Abdul R.; Nasir-ud-Din (1 May 2008). "MAPRes: Mining association patterns among preferred amino acid residues in the vicinity of amino acids targeted for post-translational modifications". Proteomics. 8 (10): 1954–1958. doi:10.1002/pmic.200700657. PMID 18491291. S2CID 22362167.
  8. ^ Hosseininasab A, van Hoeve WJ, Cire AA (2019). "Constraint-Based Sequential Pattern Mining with Decision Diagrams". Proceedings of the AAAI Conference on Artificial Intelligence. 33: 1495–1502. arXiv:1811.06086. doi:10.1609/aaai.v33i01.33011495. S2CID 53427299.
  9. ^ "Seq2Pat: Sequence-to-Pattern Generation Library". GitHub. 9 April 2022.

External links

  • SPMF includes open-source implementations of GSP, PrefixSpan, SPADE, SPAM and many others.
