Tracking a Pandemic – through Words

Text analysis software developed at PNNL helps the nation track global biothreats, such as COVID-19

The key project team includes Lauren Charles, who is in charge of the project’s analytics capabilities; Scott Dowson, who leads the software engineering; and Michelle Hart, the project manager.

Mega Doctor NEWS

By Pacific Northwest National Laboratory

Newswise — In late December 2019, U.S. analysts monitoring global biothreats used technology developed at the U.S. Department of Energy’s Pacific Northwest National Laboratory to begin tracking an unidentified viral pneumonia spreading in China. About a month later, the rest of the world would know that disease as COVID-19.

Some of the earliest insights came courtesy of a team of U.S. analysts who constantly monitor open-source text for information about active and potential biological, chemical, and radiation threats to humans, animals, and the environment. This information helps them track all aspects of an ongoing event like COVID-19 from inception to its impact on the world.

BioFeeds, data-mining software developed at PNNL, plays a key role by automating the process of combing through tens of thousands of articles each day. Dozens of government agencies and international partners rely on reports from BioFeeds, which was developed with support from the U.S. Department of Homeland Security National Biosurveillance Integration Center, to quickly get relevant information about active, future, and emerging biothreats, including COVID-19.

“COVID-19 is an example of why BioFeeds exists,” said Lauren Charles, a senior data scientist at PNNL, who is leading the development of advanced analytic algorithms for this software. “Now it’s also important because this software can look below the continuous talk about COVID-19 and monitor other potential biothreats happening in the world right now.”

Currently, those threats include an outbreak of bubonic plague in the Democratic Republic of the Congo, as well as the largest outbreak of dengue fever ever recorded in Argentina.

A daily harvest of information worldwide

So far, BioFeeds has harvested information from more than 800,000 reports, news articles, blogs, scientific research, web search alerts, and other publicly available information in 90 different languages.

The software “reads” the articles using natural language processing algorithms to extract information about an event and its impacts. It then automatically labels the relevant information with tags drawn from a taxonomy of about 1,500, covering the type of threat (disease or chemical agent, for example), specific event details, impacts on humans and critical infrastructure, and control measures being applied for mitigation. The software also flags special cases, such as new events, novel or unusual pathogens, and abnormal characteristics of ongoing events.
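To make the tagging step described above concrete, here is a minimal sketch in Python. It is not BioFeeds code: the four-tag taxonomy, the trigger phrases, and the names `TAXONOMY` and `tag_article` are all hypothetical, and simple phrase matching stands in for the natural language processing the article mentions.

```python
import re
from dataclasses import dataclass, field

# Hypothetical mini-taxonomy (tag -> trigger phrases). The real taxonomy is
# described as having about 1,500 tags; these four are invented for illustration.
TAXONOMY = {
    "threat/disease": ["pneumonia", "dengue", "plague"],
    "impact/human": ["hospitalized", "deaths", "cases"],
    "control/measure": ["quarantine", "vaccination", "travel ban"],
    "flag/novel": ["unidentified", "unknown cause", "novel"],
}


@dataclass
class TaggedArticle:
    text: str
    tags: set = field(default_factory=set)


def tag_article(text):
    """Attach every taxonomy tag whose trigger phrase appears in the text.

    Whole-phrase matching is an assumed stand-in for real NLP extraction.
    """
    lowered = text.lower()
    tags = set()
    for tag, phrases in TAXONOMY.items():
        for phrase in phrases:
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
                tags.add(tag)
                break  # one matching phrase is enough for this tag
    return TaggedArticle(text=text, tags=tags)


article = tag_article(
    "Officials report 27 cases of unidentified viral pneumonia; quarantine ordered."
)
print(sorted(article.tags))
# ['control/measure', 'flag/novel', 'impact/human', 'threat/disease']
```

In this sketch the "flag/novel" tag plays the role of the special-case flag for new or unusual events; a production system would rely on far richer signals than a handful of keywords.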

Typically, for an ongoing event like COVID-19, the software takes in tens of thousands of articles a day, then applies filters to reduce the data to the items most important for analysts to review. This makes it possible for people confronting a rapidly evolving situation to navigate a flood of information and react appropriately.

Analysts can query the tagged data, find similar articles, add tags, and generate reports for immediate, daily, or weekly notifications. Any user can also subscribe to customized web feeds based on user-defined queries or specific alerts.
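As a rough illustration of querying tagged data and finding similar articles, the sketch below filters records by tag and ranks them by tag overlap. Everything in it is an assumption made for illustration: the `Record` type, the `query` and `similar` functions, the sample records, and the choice of Jaccard similarity over tag sets. The article does not describe how BioFeeds implements these features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    title: str
    language: str
    tags: frozenset


def query(records, all_of=(), any_of=()):
    """Return records that carry every tag in all_of and, if any_of is
    given, at least one tag from it."""
    required, optional = set(all_of), set(any_of)
    return [
        r for r in records
        if required <= r.tags and (not optional or optional & r.tags)
    ]


def jaccard(a, b):
    """Overlap between two tag sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar(records, target, top_n=3):
    """Rank the other records by how much their tags overlap the target's."""
    others = [r for r in records if r is not target]
    ranked = sorted(others, key=lambda r: jaccard(r.tags, target.tags), reverse=True)
    return ranked[:top_n]


# Invented sample records, not real BioFeeds data.
r1 = Record("Pneumonia cluster reported", "zh",
            frozenset({"threat/disease", "flag/novel"}))
r2 = Record("Dengue cases rise", "es",
            frozenset({"threat/disease", "impact/human"}))
r3 = Record("Chemical spill closes port", "en",
            frozenset({"threat/chemical", "impact/infrastructure"}))
records = [r1, r2, r3]

disease = query(records, all_of=["threat/disease"])
novel_disease = query(records, all_of=["threat/disease"], any_of=["flag/novel"])
closest = similar(records, r1, top_n=1)
print([r.title for r in disease])
print([r.title for r in novel_disease])
print([r.title for r in closest])
```

A saved query like `novel_disease` is the kind of user-defined filter that could, in principle, drive a subscription feed or a scheduled report.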