Data editing

Data editing is the process of reviewing and adjusting collected survey data.[1] It helps define guidelines that reduce potential bias and ensure consistent estimates, leading to a clear analysis of the data set, by correcting inconsistent data using the methods described later in this article.[2] Its purpose is to control the quality of the collected data.[3] Data editing can be performed manually, with the assistance of a computer, or with a combination of both.[4]

Editing methods

Editing methods are the procedures and processes used to detect and handle errors in data, with the goal of improving the quality of the statistical data produced. By detecting and correcting errors, these modifications can greatly improve the quality of the resulting analytics. Techniques include micro-editing, macro-editing, and selective editing; tools used to carry out data editing include graphical editing and interactive editing.

Interactive editing

The term interactive editing is commonly used for modern computer-assisted manual editing. Most interactive data editing tools applied at National Statistical Institutes (NSIs) allow one to check the specified edits during or after data entry, and if necessary to correct erroneous data immediately. Several approaches can be followed to correct erroneous data:

  • Re-contact the respondent
  • Compare the respondent's data to their data from the previous year
  • Compare the respondent's data to data from similar respondents
  • Use the subject matter knowledge of the human editor

Interactive editing is a standard way to edit data. It can be used to edit both categorical and continuous data.[5] Interactive editing reduces the time frame needed to complete the cyclical process of review and adjustment.[6] Interactive editing also requires an understanding of the data set and the possible results that would come from an analysis of the data.

Selective editing

Selective editing is an umbrella term for several methods of identifying influential errors[note 1] and outliers.[note 2] Selective editing techniques aim to apply interactive editing to a well-chosen subset of the records, so that the limited time and resources available for interactive editing are allocated to those records where editing has the most effect on the quality of the final estimates of published figures. In selective editing, data is split into two streams:

  • The critical stream
  • The non-critical stream

The critical stream consists of records that are more likely to contain influential errors. These critical records are edited in a traditional interactive manner. The records in the non-critical stream, which are unlikely to contain influential errors, are not edited in a computer-assisted manner.[7]
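
The split into two streams can be sketched in Python as follows. This is a minimal illustration, not the method of the cited handbook: the score (a weighted absolute deviation from an anticipated value), the threshold, the field names, and the function names `local_score` and `split_streams` are all assumptions made for the example.

```python
def local_score(reported, anticipated, weight=1.0):
    """Hypothetical score: weighted absolute deviation from an anticipated value."""
    return weight * abs(reported - anticipated)


def split_streams(records, threshold):
    """Split records into a critical and a non-critical stream by score."""
    critical, non_critical = [], []
    for rec in records:
        score = local_score(rec["reported"], rec["anticipated"], rec.get("weight", 1.0))
        if score > threshold:
            critical.append(rec)      # goes to interactive editing
        else:
            non_critical.append(rec)  # not edited interactively
    return critical, non_critical


# Made-up records: the anticipated value could be last year's figure.
records = [
    {"id": "A", "reported": 105, "anticipated": 100, "weight": 1.0},
    {"id": "B", "reported": 9000, "anticipated": 90, "weight": 2.0},
    {"id": "C", "reported": 48, "anticipated": 50, "weight": 10.0},
]
critical, non_critical = split_streams(records, threshold=100)
print([r["id"] for r in critical])      # ['B']
print([r["id"] for r in non_critical])  # ['A', 'C']
```

Weighting the deviation means that a small error in a heavily weighted record can still outrank a larger error in a lightly weighted one; where to set the threshold depends on the editing resources available.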

Data editing techniques

Data editing can be accomplished in many ways and primarily depends on the data set that is being explored.[8]

Validity and completeness of data

The validity of a data set depends on the completeness of the responses provided by the respondents. One method of data editing is to ensure that all responses are complete in fields that require a numerical or non-numerical answer. See the example below.

The table above shows an example of incomplete and invalid data. In Column 1, Row 2, the answer is alphanumeric although the rest of the table is numeric. In Column 3, Row 3, the answer is missing.
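
A check of this kind can be sketched in Python. The field names, the expected types, and the sample rows below are hypothetical and are not taken from the table; the sketch only shows how missing and wrongly typed answers might be flagged.

```python
def check_record(record, schema):
    """Return a list of (field, problem) pairs for one record.

    `schema` maps each required field name to the Python type expected.
    """
    problems = []
    for field, expected_type in schema.items():
        value = record.get(field)
        if value is None or value == "":
            problems.append((field, "missing"))
        elif not isinstance(value, expected_type):
            problems.append((field, "invalid type"))
    return problems


# Hypothetical schema and rows, for illustration only.
schema = {"age": int, "income": int, "region": str}
rows = [
    {"age": 34, "income": 52000, "region": "North"},
    {"age": "3A", "income": 48000, "region": "South"},  # alphanumeric answer
    {"age": 41, "income": 61000, "region": ""},         # missing answer
]
report = {i: check_record(row, schema) for i, row in enumerate(rows, start=1)}
print(report)
# {1: [], 2: [('age', 'invalid type')], 3: [('region', 'missing')]}
```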

Duplicate data entry

Verifying that each record is unique is an important part of data editing, because it ensures that all data provided was entered only once. This reduces the chance that repeated data will skew analytics and reporting. See the example below.

The table above shows an example of data with duplicate entries. In Sr. No. 1 and 4, the same data is repeated in two entries that have different indexes (Index No.).
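
Duplicate detection of this kind can be sketched in Python by grouping records on their content while ignoring the index. The field names (`index_no`, `name`, `dob`) and the rows are hypothetical stand-ins, not the contents of the table.

```python
def find_duplicates(rows, key_fields, id_field="index_no"):
    """Group record identifiers by content; return the groups with more than one member."""
    seen = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        seen.setdefault(key, []).append(row[id_field])
    return [ids for ids in seen.values() if len(ids) > 1]


# Hypothetical rows: the first and fourth entries carry the same data.
rows = [
    {"index_no": 101, "name": "Ana", "dob": "1990-04-01"},
    {"index_no": 102, "name": "Ben", "dob": "1985-11-23"},
    {"index_no": 103, "name": "Chloe", "dob": "1978-06-30"},
    {"index_no": 104, "name": "Ana", "dob": "1990-04-01"},
]
duplicates = find_duplicates(rows, key_fields=("name", "dob"))
print(duplicates)  # [[101, 104]]
```

Which fields make up the comparison key is a judgment call: too few fields will flag distinct respondents as duplicates, too many will miss duplicates that differ only by a typing error.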

Outliers

It is common to find outliers in data sets. As noted above, outliers are values that do not fit a model of the data well. These extreme values can be identified from the distribution of data points in previous data series or in parallel data series for the same data set. Such values may be erroneous and require further analysis to check and determine the validity of the response. See the example below.

The table above shows an example of extreme values in a data set, also known as outliers. The data for Employees 2 and 6 diverges from the rest of the table.
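
One common way to flag such values is the interquartile range (IQR) rule, sketched below in Python. The choice of this rule, the multiplier `k = 1.5`, and the salary figures are assumptions made for illustration; the source does not prescribe a particular method.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the 1-based positions of values outside the IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values, start=1) if v < low or v > high]


# Hypothetical monthly salaries for employees 1 to 8.
salaries = [3100, 250000, 2900, 3300, 3000, 12, 3200, 3050]
outliers = iqr_outliers(salaries)
print(outliers)  # [2, 6]
```

A flagged value is only a candidate for review: it may be a genuine extreme response rather than an error.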

Logical inconsistencies

Logical consistency is the presence of logical relationships and interdependence between the variables. Editing for it requires an understanding of the data set and the ability to identify errors based on previous reports or other information. This type of data editing is used to account for differences between data fields or variables. See the example below.

The table above shows an example of a logical inconsistency in the data set. In Row 2, Salim's age is recorded as 55cm, which is not logical because age is not measured in centimetres; the entry is therefore an error.
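
A logical consistency check can be sketched in Python as a small set of tests on each record. Only the "55cm" age comes from the example above; the `years_employed` field, the other rows, and the function name `logical_issues` are hypothetical.

```python
def logical_issues(record):
    """Return descriptions of the logical inconsistencies found in one record."""
    issues = []
    age = str(record["age"]).strip()
    if not age.isdigit():
        # The value cannot be read as an age, e.g. it carries a length unit.
        issues.append(f"age {age!r} is not a whole number of years")
    elif int(age) < record["years_employed"]:
        # Cross-field check: nobody can be employed longer than they have lived.
        issues.append("years employed exceeds age")
    return issues


rows = [
    {"name": "Respondent 1", "age": "42", "years_employed": 20},
    {"name": "Salim", "age": "55cm", "years_employed": 30},
    {"name": "Respondent 3", "age": "25", "years_employed": 31},
]
flagged = {r["name"]: logical_issues(r) for r in rows if logical_issues(r)}
for name, issues in flagged.items():
    print(name, issues)
```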

Macro editing

There are two methods of macro editing:[7]

Aggregation method

This method is followed in almost every statistical agency before publication: verifying whether the figures to be published seem plausible. This is accomplished by comparing quantities in publication tables with the same quantities in previous publications. If an unusual value is observed, a micro-editing procedure is applied to the individual records and fields contributing to the suspicious quantity.[6]
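
The comparison with previous publications can be sketched in Python as follows. The 25% limit on relative change, the cell names, and the figures are illustrative assumptions; in practice the plausibility limits are set by subject-matter specialists.

```python
def suspicious_aggregates(current, previous, max_relative_change=0.25):
    """Return the table cells whose relative change from the previous
    publication exceeds the limit, with the size of that change."""
    flagged = {}
    for cell, value in current.items():
        old = previous.get(cell)
        if not old:  # no earlier figure (or zero): no ratio can be computed
            continue
        change = (value - old) / old
        if abs(change) > max_relative_change:
            flagged[cell] = round(change, 3)
    return flagged


# Hypothetical publication totals for two consecutive periods.
previous = {"retail": 1200.0, "construction": 800.0, "transport": 500.0}
current = {"retail": 1260.0, "construction": 1400.0, "transport": 490.0}
flagged = suspicious_aggregates(current, previous)
print(flagged)  # {'construction': 0.75}
```

Here only the construction total would be traced back to its contributing records for micro-editing.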

Distribution method

The available data is used to characterize the distribution of the variables. All individual values are then compared with that distribution. Records containing values that could be considered uncommon (given the distribution) are candidates for further inspection and possibly for editing.[9]
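
As a rough Python sketch of the distribution method, the distribution can be summarized by its median and median absolute deviation (MAD), and each value compared with that summary. The use of the MAD, the scaling constant 0.6745, and the cut-off of 3.5 follow a commonly used convention for robust scores and are assumptions here, as is the sample data.

```python
import statistics


def unusual_values(values, threshold=3.5):
    """Return the 0-based indexes of values whose robust score exceeds the threshold."""
    center = statistics.median(values)
    mad = statistics.median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread to compare against
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - center) / mad > threshold
    ]


# Hypothetical reported values for one variable.
values = [102, 98, 105, 101, 99, 250, 100]
candidates = unusual_values(values)
print(candidates)  # [5]
```

The median and MAD are used instead of the mean and standard deviation because they are less affected by the extreme values being searched for.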

Automatic editing

In automatic editing, records are edited by a computer without human intervention.[10] Prior knowledge about the values of a single variable or a combination of variables can be formulated as a set of edit rules that specify or constrain the admissible values.
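
Edit rules of this kind can be sketched in Python as named conditions that every record must satisfy. The three rules, the field names, and the records are hypothetical. The sketch only detects violated rules; locating and correcting the erroneous fields, which automatic editing also involves, is outside its scope.

```python
# Each edit rule maps a description to a condition that a valid record satisfies.
EDIT_RULES = {
    "turnover is non-negative": lambda r: r["turnover"] >= 0,
    "costs are non-negative": lambda r: r["costs"] >= 0,
    "profit = turnover - costs": lambda r: r["profit"] == r["turnover"] - r["costs"],
}


def failed_edits(record, rules=EDIT_RULES):
    """Return the names of all edit rules that the record violates."""
    return [name for name, holds in rules.items() if not holds(record)]


records = [
    {"turnover": 500, "costs": 300, "profit": 200},
    {"turnover": 500, "costs": 300, "profit": -200},  # balance rule fails
    {"turnover": -50, "costs": 10, "profit": -60},    # negative turnover
]
results = [failed_edits(r) for r in records]
for record, failed in zip(records, results):
    print(record, failed)
```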

Determinants of data editing

Data editing is limited by the capacity and resources of any given study. These determinants can have a positive or negative impact on the later analysis of the data set. Below are several determinants of data editing.[8]

Available resources:[8]

  • Time allocated to the project
  • Money and budget constraints

Available software:[8]

  • Tools used to analyze the data
  • Tools available to identify errors in the data set
  • Immediate availability of software depending on the objectives and goals of the data

Data source:[8]

  • Limitations of respondents to answer according to expectations
  • Missing information that respondents do not have readily available
  • Follow-ups are difficult to maintain in large data pools

Coordination of data editing procedure:[8]

  • Subjective views on the data set
  • Disagreements about the overall objectives of the data
  • Methods used to handle data editing

Notes

  1. ^ the errors that have a substantial impact on the publication figures
  2. ^ values that do not fit a model of data well

References

  1. ^ Ferguson, Dania P. "AN INTRODUCTION TO THE DATA EDITING PROCESS" (PDF). unece.org/.
  2. ^ "National Center for Education Statistics (NCES) Home Page, part of the U.S. Department of Education". nces.ed.gov. Retrieved 2020-12-06.
  3. ^ "UNECE".
  4. ^ "Statistics: Power from Data! Data editing". www150.statcan.gc.ca.
  5. ^ Waal, Ton de et al. "Handbook of Statistical Data Editing and Imputation". Wiley publication, 2011, p.15.
  6. ^ a b "UNECE Homepage". www.unece.org.
  7. ^ a b Waal, Ton de et al. "Handbook of Statistical Data Editing and Imputation". Wiley publication, 2011, p.16.
  8. ^ a b c d e f SCAD. "SCAD". SCAD. Retrieved 2020-12-07.
  9. ^ Bethlehem, J. "Applied Survey Methods: A Statistical Perspective". Wiley publication, 2009, p.205.
  10. ^ Waal, Ton de et al. "Handbook of Statistical Data Editing and Imputation". Wiley publication.
