Dark data is any data that is collected and stored but then disregarded,
without being indexed anywhere. It tends to get lost because it drops out of
researchers' view first. Organizations often gather it unintentionally, so it
is typically unstructured; it is not accessible to the public and is not used
for any decision making.
The primary reason dark data is generated is that data accumulates in bulk
while only a small part of it is selected for analysis. Data is generated very
rapidly: every time a user clicks a link, data is produced that corporations
analyze to improve their businesses. However, they need only a limited amount
of data, which is structured and kept as records in databases, whereas the
remaining unstructured data is lost amid other unindexed data.
Of the 7.5 sextillion gigabytes of data generated worldwide each day,
6.75 sextillion gigabytes (6.75 septillion megabytes, or about 90%) is left
unprocessed and becomes dark data, which then remains stockpiled in data
repositories. The lack of suitable analysis tools is another reason dark data
is generated.
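The two volumes quoted above are given in different units (gigabytes and
megabytes), which makes the proportion hard to see at a glance. The short
Python sketch below converts both to megabytes and computes the share that
goes dark. It takes the article's figures at face value and assumes
short-scale number names (sextillion = 10^21, septillion = 10^24) and decimal
units (1 GB = 1000 MB); the variable names are illustrative only.

```python
# Assumption: short-scale number names and decimal storage units.
SEXTILLION = 10**21
SEPTILLION = 10**24
MB_PER_GB = 1000

# 7.5 sextillion GB generated per day, expressed in MB (integer arithmetic).
total_mb = 75 * SEXTILLION // 10 * MB_PER_GB
# 6.75 septillion MB left unprocessed per day.
dark_mb = 675 * SEPTILLION // 100

# Fraction of the daily volume that ends up as dark data.
dark_share = dark_mb / total_mb
print(f"Share of data that goes dark: {dark_share:.0%}")
```

Under these assumptions the unprocessed share works out to roughly nine
tenths of the daily total, which matches the 90% figure in the quote that
follows.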
As Bob Picciano, Senior VP of Analytics at IBM, has said, “Data
that is difficult to work with creates a high barrier to entry. People
typically forego trying to get any information out of it. About 90% of data
generated by most sensors and other sources on the market never get utilized,
and 60% of that data loses its true value within milliseconds.”
An organization can use dark data to gain insights that may be even more
valuable than those it obtains at present. Dark data is, in a sense, a subset
of big data and can serve multiple purposes, such as analyzing network
security in an environment.