socialmedia


Monitoring Events Using twarc Filter and Search

This narrative guide shows how to start a twarc search and a twarc filter for an event, then combine the results once the event is over. We're going to run this on a recent news event about the Governor of Florida, but any topic will work.

Table of Contents

Before You Start
Filter and Search
Dehydrate
Combine
Rehydrate
Deduplicate
Analysis

Before You Start

Before starting this guide, make sure you have twarc installed and set up.

Filter and Search

Next, run twarc filter, which collects tweets from the Twitter stream that match the filter criteria, and twarc search, which collects tweets from the past seven days that match the search criteria. There are a couple of ways to do this, but the preferred approach is to run the two commands in separate command line windows.

twarc filter desantis > desantis_filter.jsonl

twarc search desantis > desantis_search.jsonl

The search command will finish on its own, while the filter keeps running until it is stopped manually. Once the search has finished and the filter has been stopped, we can work on combining the two JSONL files.
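If two terminal windows are not convenient, the same two commands can be started from a single script. The following is a minimal sketch, assuming twarc is on the PATH; the function names `collector_commands` and `start_collectors` are hypothetical helpers and are not part of twarc.

```python
import subprocess


def collector_commands(term):
    """Map each output file to the twarc command that fills it."""
    return {
        f"{term}_filter.jsonl": ["twarc", "filter", term],
        f"{term}_search.jsonl": ["twarc", "search", term],
    }


def start_collectors(term):
    """Start both collectors in the background, redirecting stdout to files.

    Returns a list of (process, open file) pairs so the caller can stop
    the filter and close the files later.
    """
    procs = []
    for out_path, argv in collector_commands(term).items():
        out_file = open(out_path, "w")
        procs.append((subprocess.Popen(argv, stdout=out_file), out_file))
    return procs


# Example usage (commented out because it starts real collections):
# procs = start_collectors("desantis")
# ... later, once the event is over:
# for proc, out_file in procs:
#     proc.terminate()
#     proc.wait()
#     out_file.close()
```

The search process exits on its own; the filter process has to be terminated explicitly, just as it would be stopped by hand in its own window.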

Dehydrate

We will start by dehydrating the two collected datasets.

twarc dehydrate desantis_filter.jsonl > desantis_filter.txt

twarc dehydrate desantis_search.jsonl > desantis_search.txt
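Dehydrating reduces each tweet to its ID, one per line. To illustrate the idea, here is a minimal sketch that does the same thing by hand. It assumes each line of the JSONL file is a tweet object with an `id_str` field; the function name `dehydrate` is our own and this is not twarc's implementation.

```python
import json


def dehydrate(jsonl_path, ids_path):
    """Write one tweet ID per line for every tweet in a JSONL file.

    Assumes each non-empty line is a JSON tweet object with an
    "id_str" field. Returns the number of IDs written.
    """
    count = 0
    with open(jsonl_path, encoding="utf-8") as src, \
            open(ids_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            tweet = json.loads(line)
            dst.write(tweet["id_str"] + "\n")
            count += 1
    return count
```

For example, `dehydrate("desantis_filter.jsonl", "desantis_filter.txt")` would produce a text file of IDs comparable to the output of the first command above.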

Combine

Now that the datasets have been dehydrated, we can use the Python program combine.py to combine them.

python utils/combine.py 

When prompted, enter the file names as follows:

Enter the name of your filter txt: desantis_filter.txt
Enter the name of your search txt: desantis_search.txt
Enter the name of your output txt: desantis_fs.txt    
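The internals of combine.py are not shown in this guide. As a rough sketch only, assuming the combine step simply concatenates the two ID files (duplicates are removed later in the Deduplicate step), the equivalent logic could look like the following. The function name `combine_id_files` is hypothetical.

```python
def combine_id_files(filter_path, search_path, output_path):
    """Concatenate two tweet-ID text files into one output file.

    Assumption: IDs are stored one per line. Blank lines are dropped.
    Duplicates are kept. Returns the number of IDs written.
    """
    total = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for path in (filter_path, search_path):
            with open(path, encoding="utf-8") as src:
                for line in src:
                    tweet_id = line.strip()
                    if tweet_id:
                        out.write(tweet_id + "\n")
                        total += 1
    return total
```

Calling `combine_id_files("desantis_filter.txt", "desantis_search.txt", "desantis_fs.txt")` mirrors the three prompts above.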

Rehydrate

Now that we have our merged dataset, we can rehydrate it.

twarc hydrate desantis_fs.txt > desantis_fs.jsonl

Deduplicate

Then we can run deduplicate.py to remove any tweets that appear in both of the merged datasets.

python utils/deduplicate.py desantis_fs.jsonl > desantis.jsonl

All of the usage is displayed in the command line here:

[Screenshots of the command line session: DESANTIS1, DESANTIS2]
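To make the deduplication step concrete, here is a minimal sketch of removing repeated tweets from JSONL lines by ID. It assumes each line is a tweet object with an `id_str` field and keeps the first occurrence of each ID. It illustrates the idea only and is not the code of deduplicate.py.

```python
import json


def deduplicate(lines):
    """Yield JSONL lines, skipping any tweet whose ID was already seen.

    Assumes each non-empty line is a JSON tweet object with "id_str".
    The first occurrence of each ID is kept, in input order.
    """
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet_id = json.loads(line)["id_str"]
        if tweet_id in seen:
            continue  # duplicate tweet, drop it
        seen.add(tweet_id)
        yield line
```

To apply it to a file, open desantis_fs.jsonl, pass the file object to `deduplicate`, and write each yielded line to the output file followed by a newline.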

Analysis

Now that we have our merged dataset without duplicate IDs, we can perform analysis using the Python utilities provided with twarc. See the twarc page for more information and links to the repository.

You can download the DeSantis files from the twitter repo.
