K-anonymity

k-anonymity is a property possessed by certain anonymized data. The term k-anonymity was first introduced by Pierangela Samarati and Latanya Sweeney in a paper published in 1998,[1] although the concept dates to a 1986 paper by Tore Dalenius.[2]

k-anonymity is an attempt to solve the problem "Given person-specific field-structured data, produce a release of the data with scientific guarantees that the individuals who are the subjects of the data cannot be re-identified while the data remain practically useful."[3][4][5] A release of data is said to have the k-anonymity property if the information for each person contained in the release cannot be distinguished from at least individuals whose information also appear in the release. The guarantees provided by k-anonymity are aspirational, not mathematical.

Methods for k-anonymization

To use k-anonymity to process a dataset so that it can be released with privacy protection, a data scientist must first examine the dataset and decide whether each attribute (column) is an identifier (identifying), a non-identifier (not-identifying), or a quasi-identifier (somewhat identifying). Identifiers such as names are suppressed, non-identifying values are allowed to remain, and the quasi-identifiers need to be processed so that every distinct combination of quasi-identifiers designates at least k records.

The example table below presents a fictional, non-anonymized database consisting of the patient records for a fictitious hospital. The Name column is an identifier, Age, Gender, State of domicile, and Religion are quasi-identifiers, and Disease is a non-identifying sensitive value. But what about Height and Weight? Are they also non-identifying sensitive values, or are they quasi-identifiers?

Patients treated in the study on April 30
Name Age Gender Height Weight State of domicile Religion Disease
Ramsha 30 Female 165 cm 72 kg Tamil Nadu Hindu Cancer
Yadu 24 Female 162 cm 70 kg Kerala Hindu Viral infection
Salima 28 Female 170 cm 68 kg Tamil Nadu Muslim Tuberculosis
Sunny 27 Male 170 cm 75 kg Karnataka Parsi No illness
Joan 24 Female 165 cm 71 kg Kerala Christian Heart-related
Bahuksana 23 Male 160 cm 69 kg Karnataka Buddhist Tuberculosis
Rambha 19 Male 167 cm 85 kg Kerala Hindu Cancer
Kishor 29 Male 180 cm 81 kg Karnataka Hindu Heart-related
Johnson 17 Male 175 cm 79 kg Kerala Christian Heart-related
John 19 Male 169 cm 82 kg Kerala Christian Viral infection

There are 6 attributes and 10 records in this data. There are two common methods for achieving k-anonymity for some value of k:

  1. Suppression. In this method, certain values of the attributes are replaced by an asterisk "*". All or some values of a column may be replaced by "*". In the anonymized table below, we have replaced all the values in the Name attribute and all the values in the Religion attribute with a "*".
  2. Generalization. In this method, individual values of attributes are replaced with a broader category. For example, the value "19" of the attribute Age may be replaced by "≤ 20", the value "23" by "20 < Age ≤ 30", etc.

The next table shows the anonymized database.

Patients treated in the study on April 30
Name Age Gender Height Weight State of domicile Religion Disease
* 20 < Age ≤ 30 Female 165 cm 72 kg Tamil Nadu * Cancer
* 20 < Age ≤ 30 Female 162 cm 70 kg Kerala * Viral infection
* 20 < Age ≤ 30 Female 170 cm 68 kg Tamil Nadu * Tuberculosis
* 20 < Age ≤ 30 Male 170 cm 75 kg Karnataka * No illness
* 20 < Age ≤ 30 Female 165 cm 71 kg Kerala * Heart-related
* 20 < Age ≤ 30 Male 160 cm 69 kg Karnataka * Tuberculosis
* Age ≤ 20 Male 167 cm 85 kg Kerala * Cancer
* 20 < Age ≤ 30 Male 180 cm 81 kg Karnataka * Heart-related
* Age ≤ 20 Male 175 cm 79 kg Kerala * Heart-related
* Age ≤ 20 Male 169 cm 82 kg Kerala * Viral infection

This data has 2-anonymity with respect to the attributes Age, Gender and State of domicile, since for any combination of these attributes found in any row of the table there are always at least 2 rows with those exact attributes. The attributes available to an adversary are called quasi-identifiers. Each quasi-identifier tuple occurs in at least k records for a dataset with k-anonymity.[6]

Critiques of k-anonymity

The following example demonstrates a failing with k-anonymity: there may exist other data records that can be linked on the variables that are allegedly non-identifying. For instance, suppose an attacker is able to obtain the log from the person who was taking vital signs as part of the study and learns that Kishor was at the hospital on April 30 and is 180 cm tall. This information can be used to link with the "anonymized" database (which may have been published on the Internet) and learn that Kishor has a heart-related disease. An attacker who knows that Kishor visited the hospital on April 30 may be able to infer this simply knowing that Kishor is 180 cm height, roughly 80–82 kg, and comes from Karnataka.

The root of this problem is the core problem with k-anonymity: there is no way to mathematically, unambiguously determine whether an attribute is an identifier, a quasi-identifier, or a non-identifying sensitive value. In fact, all values are potentially identifying, depending on their prevalence in the population and on auxiliary data that the attacker may have. Other privacy mechanisms such as differential privacy do not share this problem.

Although k-anonymity safeguards against identity revelations, it does not shield against the disclosure of specific attributes. This becomes problematic when attackers possess background knowledge. Additionally, the absence of diversity in sensitive domains may result in the exposure of personal information. In such scenarios, opting for ℓ-Diversity might offer a more robust privacy safeguard.[1]

Meyerson and Williams (2004) demonstrated that optimal k-anonymity is an NP-hard problem, however heuristic methods such as k-Optimize as given by Bayardo and Agrawal (2005) often yield effective results.[7][8] A practical approximation algorithm that enables solving the k-anonymization problem with an approximation guarantee of was presented by Kenig and Tassa.[9]

Attacks

While k-anonymity is a relatively simple-to-implement approach for de-identifying a dataset prior to public release, it is susceptible to many attacks. When background knowledge is available to an attacker, such attacks become even more effective. Such attacks include:

  • Homogeneity Attack: This attack leverages the case where all the values for a sensitive value within a set of k records are identical. In such cases, even though the data has been k-anonymized, the sensitive value for the set of k records may be exactly predicted.
  • Background Knowledge Attack: This attack leverages an association between one or more quasi-identifier attributes with the sensitive attribute to reduce the set of possible values for the sensitive attribute. For example, Machanavajjhala, Kifer, Gehrke, and Venkitasubramaniam (2007) showed that knowing that heart attacks occur at a reduced rate in Japanese patients could be used to narrow the range of values for a sensitive attribute of a patient's disease.
  • Downcoding Attack: This attack, introduced in 2022 by Aloni Cohen, takes advantage of the way that anonymity algorithms aggregate attributes in separate records. Because the aggregation is deterministic, it is possible to reverse-engineer the original data image, and in many cases reveal the original data that was to be protected. This attack does not require background knowledge, but is strengthened by it.[10]

Because k-anonymization does not include any randomization, attackers can make reliable, unambiguous inferences about data sets that may harm individuals. For example, if the 19-year-old John from Kerala is known to be in the database above, then it can be reliably said that he has either cancer, a heart-related disease, or a viral infection.

K-anonymization is not a good method to anonymize high-dimensional datasets.[11]

It has also been shown that k-anonymity can skew the results of a data set if it disproportionately suppresses and generalizes data points with unrepresentative characteristics.[12] The suppression and generalization algorithms used to k-anonymize datasets can be altered, however, so that they do not have such a skewing effect.[13]

See also

References

  1. ^ Samarati, Pierangela; Sweeney, Latanya (1998). "Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression" (PDF). Harvard Data Privacy Lab. Retrieved April 12, 2017.
  2. ^ Tore Dalenius, "Finding a Needle In a Haystack", Journal of Official Statistics, Vol. 2, No. 3, 1986, pp. 326–336.
  3. ^ Samarati, Pierangela (November 2001). "Protecting Respondents' Identities in Microdata Release" (PDF). IEEE Transactions on Knowledge and Data Engineering. 13 (6): 1010–1027. doi:10.1109/69.971193. S2CID 561716.
  4. ^ Sweeney, Latanya. "Database Security: k-anonymity". Retrieved 19 January 2014.
  5. ^ Sweeney, Latanya (2002). "k-anonymity: a model for protecting privacy" (PDF). International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems. 10 (5): 557–570. doi:10.1142/S0218488502001648. S2CID 361794.
  6. ^ Narayanan, Arvind; Shmatikov, Vitaly. "Robust De-anonymization of Large Sparse Datasets" (PDF).
  7. ^ Roberto J. Bayardo; Rakesh Agrawal (2005). "Data Privacy through Optimal k-Anonymization". 21st International Conference on Data Engineering (ICDE'05) (PDF). pp. 217–228. doi:10.1109/ICDE.2005.42. ISBN 978-0-7695-2285-2. ISSN 1084-4627. S2CID 17044848. Data de-identification reconciles the demand for release of data for research purposes and the demand for privacy from individuals. This paper proposes and evaluates an optimization algorithm for the powerful de-identification procedure known as k-anonymization. A k-anonymized dataset has the property that each record is indistinguishable from at least k − 1 others. Even simple restrictions of optimized k-anonymity are NP-hard, leading to significant computational challenges. We present a new approach to exploring the space of possible anonymizations that tames the combinatorics of the problem, and develop data-management strategies to reduce reliance on expensive operations such as sorting. Through experiments on real census data, we show the resulting algorithm can find optimal k-anonymizations under two representative cost measures and a wide range of k. We also show that the algorithm can produce good anonymizations in circumstances where the input data or input parameters preclude finding an optimal solution in reasonable time. Finally, we use the algorithm to explore the effects of different coding approaches and problem variations on anonymization quality and performance. To our knowledge, this is the first result demonstrating optimal k-anonymization of a nontrivial dataset under a general model of the problem.
  8. ^ Adam Meyerson; Ryan Williams (2004). "On the complexity of optimal K-anonymity". Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PDF). New York, NY: ACM. pp. 223–228. doi:10.1145/1055558.1055591. ISBN 978-1581138580. S2CID 6798963. The technique of k-anonymization has been proposed in the literature as an alternative way to release public information, while ensuring both data privacy and data integrity. We prove that two general versions of optimal k-anonymization of relations are NP-hard, including the suppression version which amounts to choosing a minimum number of entries to delete from the relation. We also present a polynomial time algorithm for optimal k-anonymity that achieves an approximation ratio independent of the size of the database, when k is constant. In particular, it is a O(k log k)-approximation where the constant in the big-O is no more than 4. However, the runtime of the algorithm is exponential in k. A slightly more clever algorithm removes this condition, but is a O(k log m)-approximation, where m is the degree of the relation. We believe this algorithm could potentially be quite fast in practice.
  9. ^ Kenig, Batya; Tassa, Tamir (2012). "A practical approximation algorithm for optimal k-anonymity". Data Mining and Knowledge Discovery. 25: 134–168. doi:10.1007/s10618-011-0235-9. S2CID 14158546.
  10. ^ Attacks on Deidentification's Defenses, Aloni Cohen, USENIX Security 2022, Distinguished Paper Award Winner. https://www.usenix.org/conference/usenixsecurity22/presentation/cohen
  11. ^ Aggarwal, Charu C. (2005). "On k-Anonymity and the Curse of Dimensionality". VLDB '05 – Proceedings of the 31st International Conference on Very large Data Bases. Trondheim, Norway. CiteSeerX 10.1.1.60.3155. ISBN 1-59593-154-6.
  12. ^ Angiuli, Olivia; Joe Blitzstein; Jim Waldo. "How to De-Identify Your Data". ACM Queue. ACM.
  13. ^ Angiuli, Olivia; Jim Waldo (June 2016). "Statistical Tradeoffs between Generalization and Suppression in the De-identification of Large-Scale Data Sets". 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC). pp. 589–593. doi:10.1109/COMPSAC.2016.198. ISBN 978-1-4673-8845-0. S2CID 17716908.

Read other articles:

Gabriel Høyland Nazionalità  Norvegia Calcio Ruolo Attaccante Carriera Squadre di club1 1970-1986 Bryne? (?) Nazionale 1971 Norvegia U-161 (0)1972-1973 Norvegia U-1911 (3)1974-1977 Norvegia U-214 (0)1974-1982 Norvegia23 (3) Carriera da allenatore 2001 Bryne2009 Bryne 1 I due numeri indicano le presenze e le reti segnate, per le sole partite di campionato.Il simbolo → indica un trasferimento in prestito.   Modifica dati su Wikidata · Manuale Gabriel Høyland ...

Sporting event delegationKenya at theCommonwealth GamesCGF codeKENCGANational Olympic Committee of KenyaMedalsRanked 8th Gold 85 Silver 75 Bronze 77 Total 237 Commonwealth Games appearances (overview)195419581962196619701974197819821986199019941998200220062010201420182022 Kenya has attended sixteen Commonwealth Games, beginning in 1954 and missing only the 1986 Games. Kenya has won 237 Commonwealth Games medals, mostly in athletics and boxing, with thirty-four coming from a single event, the ...

Adi Parashakti Artikel ini adalah bagian dari seriSakta Ketuhanan tertinggi Adi Parasakti (Mahadewi) Perwujudan Mahadewi Lalita Tripura Sundari Tridewi Saraswati Laksmi Durga Navadurga Mahawidya Kali Sati Parwati Bhairawi Kamakhya (Kubjika) Yogini Tara Uma Camunda Candika Radha Sita Matrika Pustaka suci Weda Upanisad Dewi Sita Tripura Tantra Yoni Kali Kamakalawilasa Kularnawa Mahanirwana Tara Tripura Dewibhagawata Purana Dewi Gita Dewi Mahatmya Lalita Sahasranama Kalika Purana Sutra Parananda...

Dale Van Sickel Dale Harris Van Sickel (Eatonton, Georgia, 29 de noviembre de 1907 - Newport Beach, California, 25 de enero de 1977) fue un jugador estadounidense de fútbol universitario, baloncesto universitario y béisbol universitario durante la años 20, quien después se convirtió en un actor de Hollywood y especialista de cine por más de cuarenta años. Van Sickel jugó al fútbol universitario para la Universidad de la Florida, y su equipo fue reconocido como el primer equipo All-Am...

  لمعانٍ أخرى، طالع ميخائيل كيلي (توضيح). هذه المقالة يتيمة إذ تصل إليها مقالات أخرى قليلة جدًا. فضلًا، ساعد بإضافة وصلة إليها في مقالات متعلقة بها. (يوليو 2019) ميخائيل كيلي معلومات شخصية الميلاد 29 مارس 1872  غالواي  الوفاة 3 مايو 1923 (51 سنة)   كوبلنز  مواطنة الولايات...

IseVaro della IseDescrizione generale TipoNave da battaglia ClasseIse CantiereKawasaki, Kōbe Impostazione10 maggio 1915 Varo12 novembre 1916 Completamento15 dicembre 1917 Destino finaleAffondata, 28 luglio 1945 Caratteristiche generaliDislocamentoNormale: 31762 t Pieno carico: 37086 t LunghezzaGlobale: 205,8 m linea di galleggiamento: 195,1 m Larghezza28,7 m Pescaggio8,8 m PropulsioneTurbine Curtiss, 4 assi elica; 24 caldaie Kampon; 45 CV; Carbone 4706 ...

British politician The Right HonourableThe Lord AilwynKCVO KBE PC DLVice-Chamberlain of the HouseholdIn office10 July 1895 – 3 December 1900MonarchQueen VictoriaPrime MinisterThe Marquess of SalisburyPreceded byCharles SpencerSucceeded bySir Alexander Fuller-Acland-Hood, BtPresident of the Board of AgricultureIn office14 March 1905 – 4 December 1905MonarchEdward VIIPrime MinisterArthur BalfourPreceded byThe Earl of OnslowSucceeded byThe Earl Carrington Person...

Peacekeeping unit of the Russian army 15th Separate Guards Motorized Rifle BrigadeRussian: 15-я отдельная гвардейская мотострелковая бригадаShoulder sleeve insigniaActive1 February 2005–PresentCountry RussiaBranch Russian Ground ForcesTypeMechanized infantry (peacekeeping)SizeBrigadePart of2nd Guards Combined Arms Army, Central Military DistrictGarrison/HQRoshchinsky, Volzhsky District, Samara OblastEquipmentTorn-MDM radio intelligenc...

Heppach Oberlaufname: Hörnlesbach Mündung des Heppachs in die Rems Mündung des Heppachs in die Rems Daten Gewässerkennzahl DE: 2383678 Lage Neckarbecken Waiblinger Bucht Remstaltraufbucht Baden-Württemberg Rems-Murr-Kreis Gde. Korb Stadt Weinstadt Flusssystem Rhein Abfluss über Rems → Neckar → Rhein → Nordsee Quelle des Hörnlesbachs:nordöstlich von Korb unterm Hanweiler Sattel48° 50′ 54″ N, 9° 22′...

Revival architectural style This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed.Find sources: Moorish Revival architecture – news · newspapers · books · scholar · JSTOR (December 2007) (Learn how and when to remove this template message) Famed Viječnica in Sarajevo, 1894, building of the National Library of Bosnia and Herzeg...

2014 video gameGuilty Gear XrdTextless cover of the home console versionsDeveloper(s)Arc System WorksPublisher(s)ArcadeJP: SegaPS3, PS4JP: Arc System WorksNA: Aksys GamesEU: Sony Computer Entertainment (Sign)EU: PQube (Revelator, Rev 2)WindowsWW: Arc System WorksDirector(s) Daisuke Ishiwatari Takeshi Yamanaka Designer(s)Daisuke IshiwatariArtist(s)Hidehiko SakamuraWriter(s) Daisuke Ishiwatari Takeshi Yamanaka Yoshito Shoudai Composer(s) Daisuke Ishiwatari Norichika Sato SeriesGuilty GearEngine...

This article has multiple issues. Please help improve it or discuss these issues on the talk page. (Learn how and when to remove these template messages) A major contributor to this article appears to have a close connection with its subject. It may require cleanup to comply with Wikipedia's content policies, particularly neutral point of view. Please discuss further on the talk page. (June 2018) (Learn how and when to remove this template message) This article's lead section may be too short...

El predicador estadounidense Billy Graham en 1966. Un predicador cuáquero y su congregación. Predicador es un término que se aplica a quien pronuncia sermones y homilías, generalmente de contenido teológico. La predicación también se entiende que no se limita a un punto de vista religioso, sino que, en sentido amplio, se extiende a las visiones del mundo moral y social. Los predicadores son comunes en la mayoría de las culturas. Pueden tomar la forma tanto de un ministro cristiano en ...

Swiss female curler and coach Betty BourquinCurler ♀Other namesBetty Bourquin-SteffenTeamCurling clubBasel-Albeina CC, BaselCurling career Member Association  SwitzerlandWorld Championshipappearances1 (1979)European Championshipappearances1 (1979) Medal record Curling World Championships 1979 Perth European Championships 1979 Varese Swiss Women's Championship[1] 1979 Betty Bourquin (in marriage also known as Betty Bourquin-Steffen[2]) is a former Swiss fem...

Morena Información personalNombre de nacimiento Margerita Camilleri FenechOtros nombres «El Volcán Mediterráneo»Nacimiento 22 de mayo de 1984Sannat, Malta Nacionalidad MaltesaLengua materna Maltés Información profesionalOcupación Cantante y modelo.Años activa desde 2006Seudónimo «El Volcán Mediterráneo»Géneros Pop, danceInstrumento VozSitio web Página oficial[editar datos en Wikidata] Margaret Camilleri conocida como Morena,[1]​ nacida en el año de 1984,[...

Circuit simulation software LTspiceOriginal author(s)Mike Engelhardt[1]Developer(s)Linear Technology,[1] Analog Devices[2]Initial releaseOctober 1999;24 years ago (1999-10)[1]Stable release17.1.15[3] / October 2, 2023 (2023-10-02)[3] Operating systemWindows 7, 8, 8.1, 10, macOS 10.10+PlatformIA-32, x86-64SizeWindows (60 MB), MacOS (30 MB)Available inEnglishTypeElectronic design automationLicenseFreeware[4 ...

Cet article est une ébauche concernant le Burundi. Vous pouvez partager vos connaissances en l’améliorant (comment ?) selon les recommandations des projets correspondants. 2007 au Burundi - 2008 au Burundi - 2009 au Burundi - 2010 au Burundi - 2011 au Burundi 2007 par pays en Afrique - 2008 par pays en Afrique - 2009 par pays en Afrique - 2010 par pays en Afrique - 2011 par pays en Afrique] Chronologies Données clés 2006 2007 2008  2009  2010 2011 2012Décennies :197...

[1]هذه مقالة غير مراجعة. ينبغي أن يزال هذا القالب بعد أن يراجعها محرر؛ إذا لزم الأمر فيجب أن توسم المقالة بقوالب الصيانة المناسبة. يمكن أيضاً تقديم طلب لمراجعة المقالة في الصفحة المخصصة لذلك. (أبريل 2021) هذه المقالة يتيمة إذ تصل إليها مقالات أخرى قليلة جدًا. فضلًا، ساعد بإ�...

Sylvan Ebanks-Blake Ebanks-Blake bermain untuk Wolverhampton Wanderers pada 2012Informasi pribadiNama lengkap Sylvan Agustus Ebanks-Blake[1]Tanggal lahir 29 Maret 1986 (umur 37)Tempat lahir Cambridge, InggrisTinggi 1,78 m (5 ft 10 in)[2]Posisi bermain PenyerangKarier junior1999–2002 Cambridge United2002–2004 Manchester UnitedKarier senior*Tahun Tim Tampil (Gol)2004–2006 Manchester United 0 (0)2006 → Royal Antwerp (pinjaman) 9 (4)2006–2008 Plymout...

Paghimo ni bot Lsjbot. Anthurium holquinianum Siyentipikinhong Pagklasipikar Kaginharian: Plantae Kabahig: Tracheophyta Kahutong: Liliopsida Kahanay: Alismatales Kabanay: Araceae Kahenera: 'Anthurium' Espesye: ''Anthurium holquinianum'' Siyentipikinhong Ngalan Anthurium holquinianumCroat & D.C.Bay Kaliwatan sa tanom nga bulak ang Anthurium holquinianum.[1] Una ning gihulagway ni Thomas Bernard Croat ug D.C.Bay.[2] Ang Anthurium holquinianum sakop sa kahenera nga Anthurium,...