Digitization is the process of converting information into a digital format. In this format, information is organized into discrete units of data (called bits) that can be separately addressed (usually in multiple-bit groups called bytes). This is the binary data that computers and many devices with computing capacity (such as digital cameras and digital hearing aids) can process.
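As a minimal illustration of bits grouped into bytes, the following Python sketch encodes a short string and prints each byte as eight bits. The function name `to_bits` is hypothetical, and ASCII encoding is assumed for simplicity.

```python
def to_bits(text: str) -> list[str]:
    """Return each byte of the ASCII-encoded text as an 8-character bit string."""
    data = text.encode("ascii")  # one byte per character for ASCII text
    return [format(byte, "08b") for byte in data]


if __name__ == "__main__":
    # "H" is byte 72 and "i" is byte 105
    print(to_bits("Hi"))  # ['01001000', '01101001']
```

Each string in the result is one byte, and each character within it is a single bit.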
Text and images can be digitized similarly: a scanner captures an image (which may be an image of text) and converts it to an image file, such as a bitmap. An optical character recognition (OCR) program analyzes a text image for light and dark areas in order to identify each alphabetic letter or numeric digit, and converts each character into an ASCII code.
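The two ends of that pipeline can be sketched in Python. This is not an OCR implementation: the character-recognition step in the middle is omitted entirely. The function names `binarize` and `ascii_codes`, the 0-255 grayscale range, and the threshold of 128 are all assumptions made for illustration.

```python
def binarize(gray, threshold=128):
    """Classify each grayscale pixel (0-255) as dark (1) or light (0).

    The threshold value is an illustrative assumption, not a standard.
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


def ascii_codes(text):
    """Return the ASCII code of each recognized character."""
    return [ord(ch) for ch in text]


if __name__ == "__main__":
    scanned = [
        [250, 30, 240],
        [245, 25, 235],
        [255, 20, 250],
    ]  # a tiny made-up scan containing a dark vertical stroke
    print(binarize(scanned))
    print(ascii_codes("A1"))
```

The first function produces the light/dark map that a recognizer would analyze; the second shows the final form in which recognized characters are stored.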
Audio and video digitization uses one of many analog-to-digital conversion processes in which a continuously variable (analog) signal is changed, without altering its essential content, into a multi-level (digital) signal. The process of sampling measures the amplitude (signal strength) of an analog waveform at evenly spaced time markers and represents the samples as numerical values for input as digital data.
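Sampling can be sketched in a few lines of Python. The example below is a simplified illustration, not a description of any particular converter: it assumes the analog signal is a function of time with amplitude in the range -1 to 1, and the name `sample_and_quantize` and its parameters are hypothetical.

```python
import math


def sample_and_quantize(signal, duration_s, sample_rate_hz, bit_depth):
    """Measure `signal` at evenly spaced times and map each amplitude
    (assumed to lie in [-1, 1]) to an integer level in [0, 2**bit_depth - 1].
    """
    max_level = 2 ** bit_depth - 1
    count = int(duration_s * sample_rate_hz)
    samples = []
    for i in range(count):
        t = i / sample_rate_hz                     # evenly spaced time marker
        amplitude = max(-1.0, min(1.0, signal(t)))  # clamp to the assumed range
        samples.append(round((amplitude + 1) / 2 * max_level))
    return samples


def tone(t):
    """A 1 Hz sine wave standing in for an analog waveform."""
    return math.sin(2 * math.pi * t)


if __name__ == "__main__":
    # 8 samples per second at 3 bits per sample gives levels 0 through 7
    print(sample_and_quantize(tone, 1.0, 8, 3))
```

Raising the sample rate captures the waveform at more time markers, and raising the bit depth gives each sample more possible levels, so both make the digital values follow the original signal more closely.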
Digitizing information makes it easier to preserve, access, and share. For example, an original historical document may only be accessible to people who visit its physical location, but if the document content is digitized, it can be made available to people worldwide. There is a growing trend towards digitization of historically and culturally significant data.
According to an article in The Guardian in March 2007, if all spoken language since the dawn of time were digitized, it would consume five exabytes of storage space. Total digital information in 2006 was estimated at 161 exabytes (161 billion gigabytes). Email alone made up six exabytes of that figure.