Code stylometry

Code stylometry (also known as program authorship attribution or source code authorship analysis) is the application of stylometry to computer code to attribute authorship to anonymous binary or source code. It often involves breaking down and examining the distinctive patterns and characteristics of the programming code and then comparing them to computer code whose authorship is known.[1] Unlike software forensics, code stylometry attributes authorship for purposes other than intellectual property infringement, including plagiarism detection, copyright investigation, and authorship verification.[2]

History

In 1989, researchers Paul Oman and Curtis Cook identified the authorship of 18 different Pascal programs written by six authors by using “markers” based on typographic characteristics.[3]

In 1998, researchers Stephen MacDonell, Andrew Gray, and Philip Sallis developed a dictionary-based author attribution system called IDENTIFIED (Integrated Dictionary-based Extraction of Non-language-dependent Token Information for Forensic Identification, Examination, and Discrimination) that determined the authorship of source code in computer programs written in C++. The researchers noted that authorship can be identified using degrees of flexibility in the writing style of the source code, such as:[4]

  • The way the algorithm in the source code solves the given problem
  • The way the source code is laid out (spacing, indentation, bordering characteristics, standard headings, etc.)
  • The way the algorithm is implemented in the source code

The IDENTIFIED system attributed authorship by first merging all the relevant files to produce a single source code file and then subjecting it to a metrics analysis by counting the number of occurrences for each metric. In addition, the system was language-independent due to its ability to create new dictionary files and meta-dictionaries.[4]
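The merge-then-count procedure described above can be illustrated with a minimal sketch. It is not the IDENTIFIED implementation: the dictionary entries, the use of regular expressions, and the function names `merge_sources` and `count_metrics` are all assumptions made for illustration.

```python
import re

# Hypothetical dictionary: metric name -> pattern to count.
# These entries are illustrative only; they are not IDENTIFIED's dictionary files.
CPP_DICTIONARY = {
    "for_loops": r"\bfor\b",
    "while_loops": r"\bwhile\b",
    "if_statements": r"\bif\b",
    "line_comments": r"//",
    "tab_indented_lines": r"^\t+",
}


def merge_sources(sources):
    """Concatenate all source files into a single text."""
    return "\n".join(sources)


def count_metrics(text, dictionary):
    """Count the occurrences of each dictionary entry in the merged text."""
    return {
        name: len(re.findall(pattern, text, flags=re.MULTILINE))
        for name, pattern in dictionary.items()
    }


files = [
    "int main() {\n\tfor (int i = 0; i < 3; ++i) {\n\t\tif (i) { /* skip */ }\n\t}\n}",
    "// helper\nvoid f() { while (true) { break; } }",
]
profile = count_metrics(merge_sources(files), CPP_DICTIONARY)
```

Because the metrics live in a dictionary rather than in the counting code, supporting another language in this sketch would only require supplying a different dictionary, which mirrors the language independence described above.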

In 1999, a team of researchers led by Stephen MacDonell tested the performance of three different program authorship discrimination techniques on 351 programs written in C++ by 7 different authors. The researchers compared the effectiveness of a feed-forward neural network (FFNN) trained with a back-propagation algorithm, multiple discriminant analysis (MDA), and case-based reasoning (CBR). At the end of the experiment, both the neural network and the MDA had an accuracy of 81.1%, while the CBR reached an accuracy of 88.0%.[5]

In 2005, researchers from the Laboratory of Information and Communication Systems Security at the University of the Aegean introduced a language-independent method of program authorship attribution that used byte-level n-grams to assign a program to an author. The technique scanned the files and built a table of the distinct n-grams found in the source code together with the number of times each appeared. The system could operate with a limited number of training examples per author, although attribution became more reliable as more source code was available for each author. In an experiment testing their approach, the researchers found that classification using n-grams reached an accuracy of up to 100%, although accuracy declined drastically if the profile size exceeded 500 and the n-gram size was 3 or less.[3]
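A byte-level n-gram profile of this kind can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the researchers' implementation: the similarity measure (size of the intersection between two profiles), the default values, and the names `ngram_profile` and `attribute` are all chosen here for the example.

```python
from collections import Counter


def ngram_profile(data: bytes, n: int = 4, profile_size: int = 200) -> set:
    """Return the set of the `profile_size` most frequent byte n-grams in `data`."""
    counts = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    return {gram for gram, _ in counts.most_common(profile_size)}


def attribute(unknown: bytes, known: dict, n: int = 4, profile_size: int = 200) -> str:
    """Pick the author whose profile shares the most n-grams with the unknown sample.

    Assumption: similarity is the number of n-grams common to both profiles.
    """
    target = ngram_profile(unknown, n, profile_size)
    return max(
        known,
        key=lambda author: len(target & ngram_profile(known[author], n, profile_size)),
    )


# Toy training data: one short sample per (hypothetical) author.
known = {
    "alice": b"for (int i=0;i<n;i++){\n    total+=i;\n}\n",
    "bob": b"for( int i = 0 ; i < n ; i++ )\n{\n\ttotal += i ;\n}\n",
}
unknown = b"while (k<n){\n    sum+=k;\n}\n"
guess = attribute(unknown, known)
```

Working on raw bytes means the sketch never parses the code, which is what makes the approach independent of the programming language; layout habits such as brace placement and indentation end up in the n-grams alongside keywords and identifiers.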

In 2011, researchers from the University of Wisconsin created a program authorship attribution system that identified a programmer from the binary code of a program instead of the source code. The researchers used machine learning and training code to determine which characteristics of the code would be helpful in describing programming style. In an experiment testing the approach on a set of programs written by 10 different authors, the system achieved an accuracy of 81%. When tested on a set of programs written by almost 200 different authors, the system achieved an accuracy of 51%.[6]

In 2015, a team of postdoctoral researchers from Princeton University, Drexel University, the University of Maryland, and the University of Goettingen, together with researchers from the U.S. Army Research Laboratory, developed a program authorship attribution system that could identify the author of a program from a sample pool of programs written by 1,600 coders with 94 percent accuracy. The methodology consisted of four steps:[7]

  1. Disassembly - The program is disassembled to obtain information on its characteristics.
  2. Decompilation - The program is converted into a variant of C-like pseudocode through decompilation to obtain abstract syntax trees.
  3. Dimensionality reduction - The most relevant and useful features for author identification are selected.
  4. Classification - A random-forest classifier attributes the authorship of the program.

This approach analyzed various characteristics of the code, such as whitespace, the use of tabs and spaces, and the names of variables, and then applied syntax tree analysis, an evaluation method that translated the sample code into tree-like diagrams displaying the structural decisions involved in writing the code. The design of these diagrams prioritized the order of the commands and the depth of the functions nested in the code.[8]
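The idea of reading stylistic features off a syntax tree can be illustrated with Python's standard `ast` module. This is only a stand-in: the system described above worked on decompiled C-like pseudocode, whereas the sketch parses Python source, and the three features it extracts (node-type counts, tree depth, and function nesting depth) along with the name `syntax_features` are assumptions chosen for the example.

```python
import ast
from collections import Counter


def syntax_features(source: str) -> dict:
    """Extract a few structural features from the syntax tree of Python source."""
    tree = ast.parse(source)

    # How often each kind of syntax node (If, For, Call, ...) appears.
    node_counts = Counter(type(node).__name__ for node in ast.walk(tree))

    def depth(node) -> int:
        """Height of the tree rooted at `node`."""
        return 1 + max((depth(c) for c in ast.iter_child_nodes(node)), default=0)

    def nesting(node, level: int = 0) -> int:
        """Deepest level of function definitions nested inside one another."""
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            level += 1
        return max([level] + [nesting(c, level) for c in ast.iter_child_nodes(node)])

    return {
        "node_counts": dict(node_counts),
        "max_depth": depth(tree),
        "max_function_nesting": nesting(tree),
    }


sample = (
    "def outer(xs):\n"
    "    def inner(x):\n"
    "        return x * 2\n"
    "    return [inner(x) for x in xs]\n"
)
features = syntax_features(sample)
```

Feature dictionaries like this one could then be turned into numeric vectors and passed to a classifier, such as the random-forest classifier named in the four steps above.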

The 2014 Sony Pictures hacking attack

U.S. intelligence officials determined that the 2014 cyber attack on Sony Pictures was sponsored by North Korea after evaluating the software, techniques, and network sources involved. The attribution was made after cybersecurity experts noticed similarities between the code used in the attack and malicious software known as Shamoon, which was used in the 2013 attacks against South Korean banks and broadcasting companies by North Korea.[9]

References

  1. ^ Claburn, Thomas (March 16, 2018). "FYI: AI tools can unmask anonymous coders from their binary executables". The Register. Retrieved August 2, 2018.
  2. ^ De-anonymizing Programmers via Code Stylometry. August 12, 2015. ISBN 9781939133113. Retrieved August 2, 2018.
  3. ^ a b Frantzeskou, Georgia; Stamatatos, Efstathios; Gritzalis, Stefanos (October 2005). "Supporting the Cybercrime Investigation Process: Effective Discrimination of Source Code Authors Based on Byte-Level Information". E-business and Telecommunication Networks. Communications in Computer and Information Science. Vol. 3. pp. 283–290. doi:10.1007/978-3-540-75993-5_14. ISBN 978-3-540-75992-8 – via ResearchGate.
  4. ^ a b Gray, Andrew; MacDonell, Stephen; Sallis, Philip (January 1998). "IDENTIFIED (Integrated Dictionary-based Extraction of Non-language-dependent Token Information for Forensic Identification, Examination, and Discrimination): A dictionary-based system for extracting source code metrics for software forensics". Proceedings. 1998 International Conference Software Engineering: Education and Practice (Cat. No.98EX220). pp. 252–259. doi:10.1109/SEEP.1998.707658. hdl:10292/3472. ISBN 978-0-8186-8828-7. S2CID 53463447 – via ResearchGate.
  5. ^ MacDonell, Stephen; Gray, Andrew; MacLennan, Grant; Sallis, Philip (February 1999). "Software forensics for discriminating between program authors using case-based reasoning, feedforward neural networks and multiple discriminant analysis". Neural Information Processing. 1. ISSN 1177-455X – via ResearchGate.
  6. ^ Rosenblum, Nathan; Zhu, Xiaojin; Miller, Barton (September 2011). "Who wrote this code? Identifying the authors of program binaries". Proceedings of the 16th European Conference on Research in Computer Security. Esorics'11: 172–189. ISBN 978-3-642-23821-5 – via ACM Digital Library.
  7. ^ Brayboy, Joyce (January 15, 2016). "Malicious coders will lose anonymity as identity-finding research matures". U.S. Army. Retrieved August 2, 2018.
  8. ^ Greenstadt, Rachel (February 27, 2015). "Dusting for Cyber Fingerprints: Coding Style Identifies Anonymous Programmers". Forensic Magazine. Retrieved August 2, 2018.
  9. ^ Brunnstrom, David; Finkle, Jim (December 18, 2014). "U.S. considers 'proportional' response to Sony hacking attack". Reuters. Retrieved August 2, 2018.
