Artificial intelligence (or “A.I.”) and Big Data are widely predicted to create a Fourth Industrial Revolution that many say is now underway, with the power to change how we live and work.
The expanding frontier of data science is also expected to bring opportunities for students and scholars to explore new areas within their disciplines—as well as demand for workers with an understanding of advanced data science tools.
With these developments in mind, Bryant University has joined a research group for advanced research computing, supported by virtual access to the UMass Amherst UNITY High Performance Computing Cluster.
UNITY is housed at the Massachusetts Green High Performance Computing Center, an intercollegiate high-performance computing facility in Holyoke, Massachusetts, and a joint venture of Boston University, Harvard, MIT, Northeastern, and the University of Massachusetts system.
The research group is facilitated by the CAREERS Cyberteam Program, a consortium of seven research institutions convened by a three-year National Science Foundation (NSF) grant to build a regional pool of research computing facilitators to support researchers at small and mid-sized institutions. When researchers sifting through massive quantities of data find that their computing needs exceed the capacity of a desktop machine, the CAREERS Cyberteam Program helps connect them to high-performance computing resources that meet those needs.
The University’s first research project utilizing the UNITY Cluster is being piloted by Suhong Li, Ph.D., Professor and Department Chair of Information Systems and Analytics, and Li’s collaborator Brenna Rojek ’22, who recently graduated with a degree in Data Science. The project is supported by NSF funds.
The project, “Understanding the Covid-19 Pandemic through Social Media Discussion,” will look at how people discuss the Covid-19 pandemic on Twitter, studying broadly the impact of Covid-19, the polarization of public health topics, Americans’ use of social media to obtain news, and how information is spread on Twitter.
Li has been collecting Covid-19 tweets since March 2020 and currently has about 1.5 billion tweets, a volume that requires high-performance GPUs to analyze. Rojek was given access to the UNITY Cluster GPU servers to conduct the data analysis for the project, which she helped design by bringing in her interest in Covid-19 vaccine discourse.
The project will explore a dataset of more than 13 million tweets containing keywords related to Covid-19 and ‘vaccine’ or ‘vax’, spanning March 2020 to February 2022. Rojek will use machine learning and natural language processing in Python to implement topic modeling and sentiment analysis on the tweets in the dataset, and she will learn and use the Hugging Face natural language processing library to predict the emotion of the tweets.
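As a rough illustration of the kind of keyword and date filtering described above, here is a minimal Python sketch. It is not the project's actual code: the keyword patterns, the reading of the filter as Covid-19 terms combined with ‘vaccine’ or ‘vax’, the exact start and end dates, and the sample tweets are all assumptions made for the example.

```python
import re
from datetime import date

# Hypothetical keyword patterns. The article names only 'vaccine' and 'vax';
# the Covid-19 terms below are assumed for illustration.
COVID_PATTERN = re.compile(r"covid|coronavirus", re.IGNORECASE)
VACCINE_PATTERN = re.compile(r"vaccine|vax", re.IGNORECASE)

# Study window from the article (March 2020 to February 2022);
# the exact boundary days are assumed.
START = date(2020, 3, 1)
END = date(2022, 2, 28)


def is_vaccine_tweet(text, posted):
    """Return True if the tweet falls in the window and mentions both topics."""
    in_window = START <= posted <= END
    mentions_covid = COVID_PATTERN.search(text) is not None
    mentions_vaccine = VACCINE_PATTERN.search(text) is not None
    return in_window and mentions_covid and mentions_vaccine


# Made-up sample tweets, not data from the project.
sample_tweets = [
    ("Got my COVID vaccine today!", date(2021, 4, 2)),
    ("Covid cases are rising again", date(2020, 11, 15)),
    ("Anti-vax posts about coronavirus everywhere", date(2021, 8, 9)),
    ("New covid vaccine booster announced", date(2022, 9, 1)),
]

selected = [text for text, posted in sample_tweets if is_vaccine_tweet(text, posted)]
print(selected)
```

In this sketch a tweet is kept only if it mentions both topics and falls inside the window. A subset selected this way could then be passed to topic modeling or an emotion classifier.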
“In this way, we plan to investigate how sentiments and views changed over time on the platform and the impact of Covid-19 on people’s emotions,” said Li, a data analytics expert, award-winning educator and Faculty Fellow at Bryant’s Center for Health and Behavioral Sciences (CHBS), which is a key facet of Bryant’s recently announced School of Health and Behavioral Sciences.
“Exploring dominant topics and users within Twitter networks provides insight into the behaviors and actions taken by individuals in their respective communities and furthermore can help to understand how or why Covid-19 is spread in different geographical areas as well as the polarization and further politicization of public health topics,” said Rojek, explaining the potential impact of focusing on Covid-19 vaccines in the project.
The grant gives them the resources they need for a project that has the potential to yield promising insights.
“Through the grant, we are able to access UNITY, a very powerful cluster, and run our data analysis. I’m grateful for the access and the opportunity to do more advanced analysis,” said Li. “Brenna was a very good data science student. Once I applied to CAREERS, I offered her the opportunity because I knew she would do a good job. She has the foundational knowledge and skills to do this. And not only will she be paid with grant funds for the work, but she’ll also get to learn more advanced tools.”
“Throughout my years in the Data Science Program at Bryant, Natural Language Processing (NLP) has always been incredibly intriguing to me […] there is still so much potential for machines to be able to process more complex data like language,” said Rojek. “When processing such a large dataset of predominately text data, it is so challenging and exciting. There is so much that you can learn from the data.”
“I have worked with Professor Li in interpreting COVID-19 data in the past and was extremely honored when she presented me the opportunity to work with her again. I cannot emphasize enough how helpful she has been and continues to be in my Data Science journey,” added Rojek.
CAREERS (Cyberteam to Advance Research and Education in Eastern Regional Schools) Cyberteam members include Yale, Penn State, Rutgers, Rensselaer Polytechnic Institute, University of Rhode Island, University of Delaware, and the Massachusetts Green High-Performance Computing Center. Projects supported by the grant include those of researchers at small to mid-size institutions in Connecticut, Delaware, New Jersey, New York, Pennsylvania and Rhode Island.