Data mining Twitter for augmenting NASA precipitation research and applications


The Twitter social microblogging database, which recently passed its tenth anniversary, is potentially a rich source of real-time and historical global information for science applications, beyond the now-familiar use of Twitter for natural hazards monitoring. Over the past several years, we have been exploring the feasibility of extracting useful information from the Twitter data stream for NASA precipitation research, with both “passive” and “active” participation by Twitter users. In the passive case, we have experimented with listening to the Twitter stream in real time for “precipitation” and related tweets (in different languages), applying basic filters for exact phrases, extracting location information, and mapping the resulting tweet distributions. In the active case, we have conducted preliminary experiments to evaluate different methods of engaging with potential participants. The time-varying set of “precipitation” tweets can be thought of as an organic network of rain gauges, potentially providing a widespread view of precipitation occurrence. Validating satellite precipitation estimates is challenging because many regions lack data or access to data, especially outside the U.S. and in remote and developing areas. Mining the Twitter stream could augment such validation efforts and, potentially, help tune existing algorithms. Our exploratory efforts thus far, soon to be expanded by a new project under the NASA Citizen Science for Earth Systems Program, could significantly extend the use of Twitter as a platform for citizen science, beyond natural hazards monitoring to broader science applications.
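The passive workflow described above (exact-phrase filtering, location extraction, and mapping of tweet distributions) might be sketched roughly as follows. This is only an illustration, not the project's actual code: the phrase list, the simplified tweet records with `text`, `lat`, and `lon` keys, the function names, and the 1-degree grid are all assumptions introduced here.

```python
import math
import re
from collections import Counter

# Hypothetical phrase list; the abstract does not give the actual search terms.
PHRASES = ["it is raining", "heavy rain", "está lloviendo", "il pleut"]

# One case-insensitive pattern that matches any phrase exactly,
# without matching inside a longer word (e.g. "heavy raincoat").
PATTERN = re.compile(
    r"(?<!\w)(?:" + "|".join(re.escape(p) for p in PHRASES) + r")(?!\w)",
    re.IGNORECASE,
)


def is_precip_tweet(text):
    """Return True if the text contains one of the exact phrases."""
    return PATTERN.search(text) is not None


def grid_cell(lat, lon, cell_deg=1.0):
    """Return the south-west corner of the grid cell containing (lat, lon)."""
    return (
        math.floor(lat / cell_deg) * cell_deg,
        math.floor(lon / cell_deg) * cell_deg,
    )


def count_by_cell(tweets, cell_deg=1.0):
    """Count matching, geolocated tweets per grid cell.

    Each tweet is assumed to be a dict with 'text', 'lat', and 'lon' keys
    (a simplified, hypothetical record, not the real Twitter payload).
    """
    counts = Counter()
    for tweet in tweets:
        lat, lon = tweet.get("lat"), tweet.get("lon")
        if lat is None or lon is None:
            continue  # skip tweets without usable location information
        if is_precip_tweet(tweet.get("text", "")):
            counts[grid_cell(lat, lon, cell_deg)] += 1
    return counts
```

The per-cell counts returned by `count_by_cell` could then be plotted on a map or compared with satellite precipitation estimates on a matching grid; choosing a cell size close to the satellite product's resolution would make that comparison more direct.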