Normalization Strategies for Enhancing Spatio-Temporal Analysis of Social Media Responses during Extreme Events: A Case Study based on Analysis of Four Extreme Events using Socio-Environmental Data Explorer (SEDE)
With social media becoming increasingly location-based, there has been a greater push from researchers across various domains including social science, public health, and disaster management, to tap in the spatial, temporal, and textual data available from these sources to analyze public response during extreme events such as an epidemic outbreak or a natural disaster. Studies based on demographics and other socio-economic factors suggests that social media data could be highly skewed based on the variations of population density with respect to place. To capture the spatio-temporal variations in public response during extreme events we have developed the Socio-Environmental Data Explorer (SEDE). SEDE collects and integrates social media, news and environmental data to support exploration and assessment of public response to extreme events. For this study, using SEDE, we conduct spatio-temporal social media response analysis on four major extreme events in the United States including the “North American storm complex” in December 2015, the “snowstorm Jonas” in January 2016, the “West Virginia floods” in June 2016, and the “Hurricane Matthew” in October 2016. Analysis is conducted on geo-tagged social media data from Twitter and warnings from the storm events database provided by National Centers For Environmental Information (NCEI) for analysis. Results demonstrate that, to support complex social media analyses, spatial and population-based normalization and filtering is necessary. The implications of these results suggests that, while developing software solutions to support analysis of non-conventional data sources such as social media, it is quintessential to identify the inherent biases associated with the data sources, and adapt techniques and enhance capabilities to mitigate the bias. The normalization strategies that we have developed and incorporated to SEDE will be helpful in reducing the population bias associated with social media data and will be useful for researchers and decision makers to enhance their analysis on spatio-temporal social media responses during extreme events.