Bridging the Big Data Digital Divide with Data Prospecting


Big data encompasses not only large volume but also complexity, and it has spawned a new paradigm known as data-intensive science. Data-intensive science is a process of scientific discovery driven by knowledge extracted from large volumes of data, rather than by the traditional hypothesis-driven process. A lack of resources (tools, storage, and compute) to use and exploit big data is creating new digital divides.
One of the key challenges in data-intensive science is developing enabling technologies that allow all researchers to use these large volumes of data effectively. This poster introduces the concept of “data prospecting” to address these challenges. With data prospecting, we extend the familiar metaphor of data mining to describe an initial phase of data exploration that identifies promising areas for deeper analysis. Data prospecting improves data selection through interactive discovery engines. Interactive exploration lets a researcher filter the data based on “first look” analytics, discover interesting and previously unknown patterns that can start new scientific investigations, verify data quality, and check whether patterns in the data agree with existing scientific theories or mental models.
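To make the “first look” step more concrete, here is a minimal Python sketch under assumed inputs: the data arrive as named partitions (for example, regions or time windows) of numeric values, with None marking a missing measurement. The function name first_look, the max_missing threshold, and the use of spread (population standard deviation) as a crude “interest” signal are hypothetical choices for illustration only; they are not the discovery engine described in this poster.

```python
from statistics import mean, pstdev


def first_look(partitions, max_missing=0.2):
    """Cheaply summarize each partition and keep those worth a deeper look.

    Hypothetical layout: `partitions` maps a partition name to a list of
    numbers, where None marks a missing measurement.
    """
    summaries = []
    for name, values in partitions.items():
        present = [v for v in values if v is not None]
        missing_frac = 1 - len(present) / len(values) if values else 1.0
        # Data-quality check: skip partitions with too many gaps
        # or too few values to summarize.
        if missing_frac > max_missing or len(present) < 2:
            continue
        summaries.append({
            "partition": name,
            "n": len(present),
            "missing": missing_frac,
            "mean": mean(present),
            "spread": pstdev(present),
        })
    # Rank surviving partitions by variability, an assumed stand-in
    # for "promising areas for deeper analysis".
    return sorted(summaries, key=lambda s: s["spread"], reverse=True)


partitions = {
    "region_a": [1.0, 1.1, 0.9, 1.0],
    "region_b": [0.0, 5.0, 10.0, 5.0],
    "region_c": [None, None, 3.0, 4.0],  # half missing: fails the quality check
}
ranked = first_look(partitions)
for s in ranked:
    print(s["partition"], round(s["spread"], 2))
```

In this example, region_c is dropped as a data-quality problem and the remaining partitions are listed from most to least variable. An interactive prospecting tool would presumably let the researcher adjust such filters and scores on the fly rather than fixing them in code.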

Creative Commons License:
Creative Commons Attribution 3.0 License