A region of interest (often abbreviated ROI) is a subset of samples within a data set identified for a particular purpose.[1] The concept of a ROI is used in many application areas. A ROI may itself be a vicinity within the data, or lie within one.
For example, in medical imaging, the boundaries of a tumor may be defined on an image or in a volume for the purpose of measuring its size. The endocardial border may be defined on an image, perhaps at different phases of the cardiac cycle (for example, end-systole and end-diastole), for the purpose of assessing cardiac function. In geographical information systems (GIS), a ROI can be taken literally as a polygonal selection from a 2D map. In computer vision and optical character recognition, the ROI defines the borders of an object under consideration. In many applications, symbolic (textual) labels are added to a ROI to describe its content in a compact manner. Within a ROI may lie individual points of interest (POIs).
Examples of regions of interest
1D dataset: a time or frequency interval on a waveform
2D dataset: the boundaries of an object on an image
3D dataset: the contours or surfaces outlining an object (sometimes known as the Volume of Interest (VOI)) in a volume
4D dataset: the outline of an object at or during a particular time interval in a time-volume
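The 1D and 2D cases above can be illustrated with a minimal Python sketch. The sample values, the sampling rate and the helper names `interval_roi` and `box_roi` are assumptions made for illustration only; a rectangular box is used as the simplest possible 2D boundary.

```python
# 1D: a time interval on a waveform sampled at a fixed rate (values are made up).
sample_rate_hz = 4
waveform = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]

def interval_roi(samples, rate_hz, start_s, end_s):
    """Return the samples whose times fall in the half-open interval [start_s, end_s)."""
    first = int(start_s * rate_hz)
    last = int(end_s * rate_hz)
    return samples[first:last]

# 2D: a rectangular bounding box on an image stored as a list of rows.
image = [
    [0, 0, 0, 0],
    [0, 7, 9, 0],
    [0, 8, 6, 0],
    [0, 0, 0, 0],
]

def box_roi(rows, top, left, bottom, right):
    """Return the sub-image covering rows top..bottom-1 and columns left..right-1."""
    return [row[left:right] for row in rows[top:bottom]]

wave_part = interval_roi(waveform, sample_rate_hz, 0.5, 1.5)
image_part = box_roi(image, 1, 1, 3, 3)
```

The 3D and 4D cases follow the same pattern with one or two additional index ranges (slice and time).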
A ROI is a form of annotation, often associated with categorical or quantitative information (e.g., measurements such as volume or mean intensity), expressed as text or in a structured form.
There are three fundamentally different means of encoding a ROI:
As an integral part of the sample data set, using a unique or masking value, which may or may not lie outside the range of normally occurring values, to tag individual data cells
As separate, purely graphic information, such as with vector or bitmap (rasterized) drawing elements, perhaps with some accompanying plain (unstructured) text in the format of the data itself
As separate structured semantic information (such as coded value types) with a set of spatial and/or temporal coordinates
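The difference between the first and third means of encoding can be sketched as follows. This is an illustration only: the sentinel value, the `"lesion"` label and the function names are hypothetical, and no particular file format is implied.

```python
SENTINEL = -1  # hypothetical tag value, here outside the normal range 0..255

data = [
    [10, 12, 11],
    [13, 200, 210],
    [12, 205, 11],
]

def tag_in_band(rows, cells):
    """First encoding: overwrite the ROI cells with a sentinel inside the data itself."""
    tagged = [list(row) for row in rows]  # copy so the input is left unchanged
    for r, c in cells:
        tagged[r][c] = SENTINEL
    return tagged

def structured_roi(cells, label):
    """Third encoding: leave the data untouched; store coordinates plus a label."""
    return {"label": label, "coordinates": sorted(cells)}

roi_cells = [(1, 1), (1, 2), (2, 1)]
in_band = tag_in_band(data, roi_cells)
separate = structured_roi(roi_cells, "lesion")
```

In this sketch the in-band encoding loses the original values of the tagged cells, whereas the separate structured encoding preserves them and can carry additional meaning alongside the coordinates.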
Medical imaging
Medical imaging standards such as DICOM provide general and application-specific mechanisms to support various use-cases.
For DICOM images (two or more dimensions):
Burned-in graphics and text may occur within the normal pixel value range (e.g., as the maximum white value) (deprecated)
Bitmap (rasterized) overlay graphics and text may be present in unused high bits of the pixel data or in a separate attribute (deprecated)
Vector graphics may be encoded in separate image attributes as curves (deprecated)
Unstructured vector graphics and text as well as bitmap (rasterized) overlay graphics may be encoded in a separate object as a presentation state that references the image object to which it is to be applied
Structured data may be encoded in a separate object as a structured report, in the form of a tree of name-value pairs of coded or text concepts, possibly associated with derived quantitative information; these can reference spatial and/or temporal coordinates that in turn reference the image objects to which they apply
Reference locations may be encoded as fiducials in the form of spatial coordinates with an associated coded purpose, either as pixel coordinates by reference to specific images or as coordinates in a named patient-relative 3D Cartesian space
Pixels (possibly non-contiguous) may be classified into segments encoded in a segmentation object as either binary or probabilistic values in a raster (which is not required to have the same spatial sampling or extent as the images from which the segmentation was derived); these are usually referenced by other objects containing structured content (structured reports)
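As a rough illustration of the raster-based segmentation idea in the last item, the sketch below turns a probabilistic raster into a binary one and derives an area from it. It does not read or write DICOM objects; the 0.5 threshold, the pixel spacing and the function names are assumptions chosen for the example.

```python
def binarize(prob_raster, threshold=0.5):
    """Convert per-pixel probabilities into a binary segment (1 = inside)."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_raster]

def segment_area_mm2(binary_raster, row_spacing_mm, col_spacing_mm):
    """Area of the segment: number of member pixels times the area of one pixel."""
    count = sum(sum(row) for row in binary_raster)
    return count * row_spacing_mm * col_spacing_mm

# Made-up probabilities; the member pixels need not be contiguous.
probabilities = [
    [0.0, 0.2, 0.1],
    [0.3, 0.9, 0.7],
    [0.1, 0.6, 0.4],
]

mask = binarize(probabilities)
area = segment_area_mm2(mask, 0.5, 0.5)
```

Because the segmentation raster may have a different sampling from the source images, the spacing used for the area must be that of the segmentation raster itself.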
For DICOM radiotherapy:
Contours of objects may be defined as structure sets, either as pixel coordinates by reference to specific images or as coordinates in a named patient-relative 3D Cartesian space (these are also used for non-RT applications)
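A contour given as an ordered list of coordinates supports simple size measurements. The sketch below computes the enclosed area of a closed planar contour with the shoelace formula; the function name and the sample coordinates are illustrative, and the contour is assumed to be a simple (non-self-intersecting) polygon.

```python
def polygon_area(points):
    """Area enclosed by a closed planar contour, using the shoelace formula.

    `points` is an ordered list of (x, y) vertices; the last vertex is
    implicitly joined back to the first.
    """
    total = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

# A 4 x 3 rectangle and a right triangle, in arbitrary units.
rectangle = [(0, 0), (4, 0), (4, 3), (0, 3)]
triangle = [(0, 0), (4, 0), (0, 3)]

rect_area = polygon_area(rectangle)
tri_area = polygon_area(triangle)
```

If the vertices are pixel coordinates, the result is in square pixels and must be scaled by the pixel spacing; if they are patient-relative coordinates, it is already in physical units.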
For DICOM time-based waveforms:
Burned-in values may occur within the waveform (deprecated)
Annotations may be encoded in a separate attribute and can select multiple time points or a range of time points, either by sample number or by specified time
Structured data may be encoded in a separate object as a structured report, in the form of a tree of name-value pairs of coded or text concepts, possibly associated with derived quantitative information; these can reference temporal coordinates that in turn reference the waveform objects to which they apply
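Selecting a range of a waveform by sample number or by time amounts to the same thing once the sampling frequency is known. The following sketch shows the conversion in both directions; the function names and the 500 Hz rate are assumptions for illustration and are not taken from the DICOM standard.

```python
def time_range_to_samples(start_s, end_s, sampling_hz):
    """Map a time range in seconds to the nearest (first, last) sample numbers."""
    first = round(start_s * sampling_hz)
    last = round(end_s * sampling_hz)
    return first, last

def sample_to_time(sample_number, sampling_hz):
    """Map a sample number back to its time offset in seconds."""
    return sample_number / sampling_hz

# An annotation covering 0.2 s to 0.35 s of a waveform sampled at 500 Hz.
sample_range = time_range_to_samples(0.2, 0.35, 500)
start_time = sample_to_time(sample_range[0], 500)
```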
HL7 Clinical Document Architecture also has a subset of mechanisms similar to (and intended to be compatible with) DICOM for referencing image-related spatial coordinates as observations. It allows a circle, ellipse, polyline or point to be defined as integer pixel-relative coordinates referencing an external multimedia image object, which may be in a consumer rather than a medical image format (e.g., GIF, PNG or JPEG).
Document analysis systems
In optical character recognition (OCR) and document layout analysis, regions of interest (ROIs) hierarchically encompass pages and text or graphical blocks, down to individual line-strip images and word and character image boxes. The de facto standard in archives and libraries is the pair {image_file, xml_file}, usually in the form of a *.tif file and its accompanying *.xml file.
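The hierarchical nesting of such regions can be sketched with Python's standard XML parser. The element names, attributes and file name below form a hypothetical layout schema invented for this example; real archives use their own schemas, which differ from this one.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout description accompanying an image file.
LAYOUT_XML = """
<page image="page_0001.tif">
  <block type="text" x="10" y="20" w="300" h="120">
    <line x="10" y="20" w="300" h="30">
      <word x="10" y="20" w="80" h="30" text="Region"/>
      <word x="100" y="20" w="40" h="30" text="of"/>
    </line>
  </block>
</page>
"""

def collect_boxes(element, depth=0):
    """Walk the tree and return (depth, tag, (x, y, w, h)) for every boxed element."""
    boxes = []
    if "x" in element.attrib:
        box = tuple(int(element.attrib[key]) for key in ("x", "y", "w", "h"))
        boxes.append((depth, element.tag, box))
    for child in element:
        boxes.extend(collect_boxes(child, depth + 1))
    return boxes

root = ET.fromstring(LAYOUT_XML)
boxes = collect_boxes(root)
```

Each box can then be used to crop the corresponding region from the image file named on the page element.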
Other 2D applications
Among non-medical standards, purely graphic markup languages (such as PostScript or PDF), vector graphics formats (such as SVG) and 3D drawing file formats (such as VRML) are widely available but carry no specific ROI semantics. Some standards, such as JPEG 2000, specifically provide mechanisms to label what they refer to as regions of interest and/or to compress them to a different degree of fidelity.