Leveraging NLP and Twitter Mining for Disaster Management: A Case Study of Hurricane Harvey

Overview

This research presents an approach to extracting disaster-relevant information from social media (Twitter) data during natural disasters, using Hurricane Harvey (2017) as a case study. The study, titled “Hazard risk awareness and disaster management: Extracting the information content of twitter data”, was published in the journal Sustainable Cities and Society in 2022.

Key Contributions

Developed a novel Hazard Risk Awareness (HRA) Index
Conducted comprehensive spatiotemporal analysis of Twitter data during Hurricane Harvey
Employed Natural Language Processing (NLP) techniques for content analysis
Proposed a classification system for disaster-related tweets

NLP and Twitter Mining Approach

Data Collection and Preprocessing

Utilized a dataset of 18 million tweets related to Hurricane Harvey
Employed the Natural Language Toolkit (NLTK) for text preprocessing, including:
- Removal of stop words, hyperlinks, mentions, and special characters
- Tokenization and normalization of text data
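
Below is a minimal sketch of these preprocessing steps. It uses only Python's standard library, so it runs without downloading NLTK data. The abbreviated `STOP_WORDS` set and the regular expressions are illustrative assumptions, not the paper's exact rules. In an NLTK pipeline, `nltk.corpus.stopwords.words("english")` and `nltk.tokenize.TweetTokenizer` would replace the hand-written list and the whitespace split.

```python
import re
import unicodedata

# Hypothetical, abbreviated stop-word list (assumption); NLTK's English
# stopwords corpus would normally be used instead. "rt" marks retweets.
STOP_WORDS = {
    "a", "an", "the", "and", "or", "is", "are", "was", "to", "of", "in",
    "on", "for", "at", "we", "i", "you", "it", "this", "that", "be", "rt",
}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # hyperlinks
MENTION_RE = re.compile(r"@\w+")                # @mentions
SPECIAL_RE = re.compile(r"[^a-z0-9\s]")         # anything but ASCII letters, digits, whitespace


def preprocess_tweet(text: str) -> list[str]:
    """Clean a raw tweet and return a list of normalized tokens."""
    # Normalization: Unicode NFKC form, then lowercase.
    text = unicodedata.normalize("NFKC", text).lower()
    # Remove hyperlinks and mentions before stripping punctuation.
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Replace special characters (including the '#' of hashtags) with spaces.
    text = SPECIAL_RE.sub(" ", text)
    # Whitespace tokenization, then stop-word removal.
    return [tok for tok in text.split() if tok not in STOP_WORDS]
```

For example, the tweet `"RT @KHOU: Water rising fast on Buffalo Bayou #Houston #Harvey https://t.co/abc123"` becomes `['water', 'rising', 'fast', 'buffalo', 'bayou', 'houston', 'harvey']`. Hashtag text is kept as an ordinary word because it often carries the place or event name.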

Topic Modeling

Applied Latent Dirichlet Allocation (LDA) for unsupervised topic discovery
Extracted main themes discussed on Twitter during the disaster
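
The paper does not state which LDA implementation it used. To show what LDA actually estimates, the sketch below is a small collapsed Gibbs sampler written with the standard library. The priors `alpha` and `beta`, the iteration count, and the toy documents are illustrative assumptions. Production pipelines more commonly use gensim's `LdaModel` or scikit-learn's `LatentDirichletAllocation`, both of which use variational inference rather than Gibbs sampling.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of token lists. Returns (vocab, phi), where phi[k][v] is the
    estimated probability of vocabulary word v under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the token's current assignment out of the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the resampled topic and put the token back.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Posterior mean estimate of each topic's word distribution.
    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)] for k in range(K)]
    return vocab, phi


def top_words(vocab, phi, n=3):
    """Return the n most probable words for each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]


# Toy tokenized tweets (invented for illustration).
docs = [
    ["flood", "water", "rescue", "boat"],
    ["rescue", "boat", "flood", "help"],
    ["donate", "relief", "help", "fund"],
    ["relief", "donate", "fund", "shelter"],
]
vocab, phi = lda_gibbs(docs, num_topics=2)
print(top_words(vocab, phi))
```

Each resampling step draws a token's topic in proportion to how common that topic already is in the tweet and how common the word is within the topic, so words that co-occur, such as "rescue" and "boat", tend to end up in the same topic over many iterations.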

Tweet Classification

Implemented a Random Forest classifier to categorize tweets into humanitarian classes:
1. Hurricane information and warnings
2. Damages and injuries
3. Help, donation, and recovery
4. Sympathy, appreciation, and support
5. Other useful information
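
The paper's features and hyperparameters are not described here, so the following is only a sketch of how a Random Forest assigns tweets to these five classes. It assumes binary bag-of-words features (does a tweet contain a given token?) and grows small decision trees by hand, so it runs with the standard library. The short class labels, the toy training tweets, and the settings `n_trees` and `max_depth` are hypothetical. In practice, scikit-learn's `RandomForestClassifier` combined with a `TfidfVectorizer` would be the usual choice.

```python
import math
import random
from collections import Counter

# Hypothetical short labels for the five humanitarian classes.
CLASSES = [
    "hurricane_info_warnings",
    "damages_injuries",
    "help_donation_recovery",
    "sympathy_support",
    "other_useful_info",
]


def gini(labels):
    """Gini impurity of a list of class labels (0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, vocab, depth, max_depth, n_candidates, rng):
    """Grow one decision tree whose splits ask 'does the tweet contain token f?'."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority  # leaf: predict the most common label
    best = None
    # Random-forest step: only a random subset of tokens is tried at each split.
    for f in rng.sample(vocab, min(n_candidates, len(vocab))):
        yes = [i for i, r in enumerate(rows) if f in r]
        no = [i for i, r in enumerate(rows) if f not in r]
        if not yes or not no:
            continue  # token does not separate this node's tweets
        score = (len(yes) * gini([labels[i] for i in yes])
                 + len(no) * gini([labels[i] for i in no])) / len(rows)
        if best is None or score < best[0]:
            best = (score, f, yes, no)
    if best is None:
        return majority

    _, f, yes, no = best

    def grow(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          vocab, depth + 1, max_depth, n_candidates, rng)

    return (f, grow(yes), grow(no))


def predict_tree(node, tokens):
    """Follow the token tests from the root down to a leaf label."""
    while isinstance(node, tuple):
        f, yes, no = node
        node = yes if f in tokens else no
    return node


def fit_forest(token_sets, labels, n_trees=25, max_depth=6, seed=0):
    """Bagging: each tree is grown on a bootstrap sample of the training tweets."""
    rng = random.Random(seed)
    vocab = sorted(set().union(*token_sets))
    n_candidates = max(1, int(math.sqrt(len(vocab))))  # common sqrt heuristic
    n = len(token_sets)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        forest.append(build_tree([token_sets[i] for i in idx], [labels[i] for i in idx],
                                 vocab, 0, max_depth, n_candidates, rng))
    return forest


def predict_forest(forest, tokens):
    """Classify a tokenized tweet by majority vote over all trees."""
    tokens = set(tokens)
    return Counter(predict_tree(tree, tokens) for tree in forest).most_common(1)[0][0]


# Toy labelled tweets (invented for illustration), already tokenized.
train = [
    ({"storm", "warning", "landfall"}, CLASSES[0]), ({"evacuation", "warning", "category"}, CLASSES[0]),
    ({"roof", "destroyed", "injured"}, CLASSES[1]), ({"flooded", "homes", "destroyed"}, CLASSES[1]),
    ({"donate", "shelter", "volunteers"}, CLASSES[2]), ({"donate", "supplies", "relief"}, CLASSES[2]),
    ({"prayers", "texas", "strong"}, CLASSES[3]), ({"thank", "first", "responders"}, CLASSES[3]),
    ({"road", "closures", "update"}, CLASSES[4]), ({"power", "outage", "map"}, CLASSES[4]),
]
forest = fit_forest([t for t, _ in train], [y for _, y in train])
print(predict_forest(forest, ["flooded", "roof", "destroyed"]))
```

Bootstrap sampling and per-split random token subsets are what make the trees differ from one another. The majority vote then smooths out individual trees' mistakes.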

Location Extraction

Developed a method to extract geographical locations from non-geotagged tweets
Mapped tweets to counties using a custom location dictionary
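
The paper's location dictionary is not reproduced here. The sketch below shows dictionary-based matching under one assumption: a hypothetical `LOCATION_DICT` that maps lowercase place names, including multi-word names, to Texas counties. The function scans every n-gram of the tweet up to the length of the longest dictionary entry. Cities that span several counties are simplified to a single county in this toy dictionary.

```python
import re

# Hypothetical, illustrative location dictionary: place name -> county.
LOCATION_DICT = {
    "houston": "Harris",
    "galveston": "Galveston",
    "rockport": "Aransas",
    "corpus christi": "Nueces",
    "beaumont": "Jefferson",
    "port arthur": "Jefferson",
}
MAX_NGRAM = max(len(name.split()) for name in LOCATION_DICT)


def extract_counties(text):
    """Return the set of counties whose place names appear in a tweet's text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counties = set()
    # Check every n-gram up to the longest dictionary entry (e.g. "corpus christi").
    for n in range(1, MAX_NGRAM + 1):
        for i in range(len(tokens) - n + 1):
            county = LOCATION_DICT.get(" ".join(tokens[i:i + n]))
            if county:
                counties.add(county)
    return counties


print(extract_counties("Shelters open in Corpus Christi and Port Arthur; Houston roads flooded"))
```

For the sample text, the function finds "houston" as a unigram and "corpus christi" and "port arthur" as bigrams, giving the counties Harris, Nueces, and Jefferson. A hashtag such as `#HoustonStrong` becomes the single token "houstonstrong" and is not matched, so hashtags would need extra handling.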

Deep Learning Extensions

While not explicitly mentioned in the paper, the following deep learning approaches could enhance the research:

CNN-LSTM for Spatiotemporal Analysis: Implement a CNN-LSTM architecture to capture both spatial and temporal patterns in tweet data.
Neural Topic Modeling: Apply neural network-based topic modeling techniques like Neural LDA for improved topic extraction.
Transformer-based Named Entity Recognition: Employ transformer models for better identification of locations and other named entities in tweets.
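
To make the CNN-LSTM idea concrete, the sketch below is an untrained forward pass written in NumPy. It assumes, hypothetically, that tweets have been binned into an `H x W` spatial grid at each time step (for example, hourly tweet counts per grid cell). A small convolution extracts spatial features from each grid, and an LSTM cell carries those features across time. The weights are random and there is no training loop. A real model would be built and trained in a framework such as PyTorch or Keras.

```python
import numpy as np


def conv2d_valid(x, kernels):
    """Valid 2-D cross-correlation (the 'convolution' of deep learning libraries).

    x: (H, W) grid of counts; kernels: (C, k, k). Returns (C, H-k+1, W-k+1).
    """
    n_filters, k, _ = kernels.shape
    H, W = x.shape
    out = np.zeros((n_filters, H - k + 1, W - k + 1))
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            out[:, i, j] = np.tensordot(kernels, x[i:i + k, j:j + k], axes=([1, 2], [0, 1]))
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class CNNLSTM:
    """Untrained CNN-LSTM: a CNN encodes each time step's grid, an LSTM links the steps."""

    def __init__(self, grid_shape, n_filters=4, k=3, hidden=8, n_out=1, seed=0):
        rng = np.random.default_rng(seed)
        H, W = grid_shape
        n_feat = n_filters * (H - k + 1) * (W - k + 1)
        self.kernels = rng.normal(0.0, 0.1, (n_filters, k, k))
        # Stacked weights for the input, forget, candidate, and output gates.
        self.W = rng.normal(0.0, 0.1, (4 * hidden, n_feat + hidden))
        self.b = np.zeros(4 * hidden)
        self.W_out = rng.normal(0.0, 0.1, (n_out, hidden))
        self.hidden = hidden

    def forward(self, frames):
        """frames: (T, H, W) sequence of gridded tweet counts; returns shape (n_out,)."""
        h = np.zeros(self.hidden)
        c = np.zeros(self.hidden)
        for x in frames:
            # Spatial encoding: convolution, ReLU, flatten.
            feat = np.maximum(conv2d_valid(x, self.kernels), 0.0).ravel()
            # Temporal update: one LSTM step on [features, previous hidden state].
            z = self.W @ np.concatenate([feat, h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
        # Linear read-out from the final hidden state (e.g. a next-step forecast).
        return self.W_out @ h


# Usage with random toy data: 5 time steps of a 6 x 6 grid.
frames = np.random.default_rng(1).poisson(3.0, size=(5, 6, 6)).astype(float)
model = CNNLSTM(grid_shape=(6, 6))
print(model.forward(frames))
```

Keeping the CNN inside the time loop means the same spatial filters are applied at every step, while the LSTM state is the only place where information from earlier steps is carried forward.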

Results and Implications

The HRA index provided insights into public awareness and risk perception during the disaster
Temporal analysis revealed evolving patterns of tweet content throughout the event
The classification system offered valuable information for crisis managers and first responders
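
This summary does not give the HRA index formula, so the sketch below does not reproduce it. It only illustrates the kind of aggregation that temporal and spatiotemporal analysis rests on. It assumes each classified tweet has been reduced to a hypothetical `(day, county, label)` record; the records themselves are invented for illustration.

```python
from collections import Counter, defaultdict
from datetime import date


def daily_class_shares(records):
    """records: iterable of (day, county, label). Returns {day: {label: share}}."""
    per_day = defaultdict(Counter)
    for day, _county, label in records:
        per_day[day][label] += 1
    shares = {}
    for day in sorted(per_day):
        counts = per_day[day]
        total = sum(counts.values())
        shares[day] = {label: n / total for label, n in counts.items()}
    return shares


def county_day_counts(records):
    """Spatiotemporal view: number of classified tweets per (county, day)."""
    return Counter((county, day) for day, county, _label in records)


# Invented example records.
records = [
    (date(2017, 8, 25), "Aransas", "hurricane_info_warnings"),
    (date(2017, 8, 25), "Harris", "hurricane_info_warnings"),
    (date(2017, 8, 27), "Harris", "damages_injuries"),
    (date(2017, 8, 27), "Harris", "help_donation_recovery"),
    (date(2017, 8, 27), "Galveston", "damages_injuries"),
    (date(2017, 8, 30), "Harris", "help_donation_recovery"),
]
print(daily_class_shares(records))
print(county_day_counts(records))
```

Tracking shares rather than raw counts makes it easier to see how the mix of content shifts over time (for example, from warnings toward help and recovery) even when overall tweet volume rises or falls.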

Future Directions

Integrate deep learning models for improved classification and topic modeling
Expand the approach to other types of natural disasters and geographical regions
Develop real-time analysis capabilities for immediate disaster response
Explore multi-modal analysis by incorporating image and video data from social media

This research demonstrates the potential of NLP and social media mining in disaster management, offering valuable tools for improving situational awareness and response strategies during natural hazards.

Sepehr Karimi, Ph.D.