Submission Number: 148
Submission ID: 408
Submission UUID: 135ffcba-0c6f-4434-a690-3ded031c7f5e
Submission URI: /form/project

Created: Tue, 08/09/2022 - 15:59
Completed: Tue, 08/09/2022 - 16:07
Changed: Fri, 04/14/2023 - 06:37

Remote IP address: 130.215.45.247
Submitted by: Gaurav Khanna
Language: English

Is draft: No
Webform: Project
Project Title Detecting Covid-19 Misinformation on Social Media
Program CAREERS
Project Image Unknown.jpeg
Tags ai, bash, batch-jobs, big-data, biology, cuda
Status Complete
Project Leader Suhong Li
Email sli@bryant.edu
Mobile Phone
Work Phone
Mentor(s) Suhong Li
Student-facilitator(s) Jason Michaud
Mentee(s)
Project Description The ongoing pandemic has heightened the need for tools that flag COVID-19-related misinformation on the internet, specifically on social media such as Twitter. This project is based on 1.6 billion COVID-19 tweets collected between March 2020 and May 2022. The project focuses on developing a machine learning model to detect COVID-19-related misinformation. In addition, the validated model will be applied to all COVID-19 tweets to further understand misinformation. For example: Who is distributing COVID-19 misinformation? How does the misinformation travel over social media? What are the main topics of the misinformation? How does the misinformation differ by time and by location?

The student will work on this project from start to finish using various data analytics methods, including data exploration, topic modelling, natural language processing, and machine learning. More specifically, in the context of an RCF skillset, the student will gain experience with accessing a remote computational system, setting up jobs in an HPC environment, working with queuing systems, performing file I/O with remote systems, etc.

Note: This is a follow-on project from a previous project led by the same PI and RCF Brenna Rojek that ran in Spring 2022. The current RCF will leverage the tools and workflow that were developed by Brenna and develop them further.
Project Deliverables
Student Research Computing Facilitator Profile
Mentee Research Computing Profile
Student Facilitator Programming Skill Level Some hands-on experience
Mentee Programming Skill Level
Project Institution Bryant University
Project Address Rhode Island
Anchor Institution CR-University of Rhode Island
Preferred Start Date 10/01/2022
Start as soon as possible. No
Project Urgency 5 (scale endpoints: "Already behind", "Start date is flexible")
Expected Project Duration (in months) 6
Launch Presentation
Launch Presentation Date
Wrap Presentation
Wrap Presentation Date 04/12/2023
Project Milestones
  • Milestone Title: Milestone #1
    Milestone Description: Student reviews relevant literature and learns about ML, NLP, and other needed libraries/packages; launch presentation.
    Completion Date Goal: 2022-10-01
    Actual Completion Date: 2022-11-01
  • Milestone Title: Milestone #2
    Milestone Description: Student reviews the Twitter data set and formats the data for use by the ML, NLP, etc. software.
    Completion Date Goal: 2022-11-01
    Actual Completion Date: 2022-12-01
  • Milestone Title: Milestone #3
    Milestone Description: Student performs extensive analysis of the formatted data using ML and NLP techniques. Specific tasks:
    • Build a machine learning model to detect COVID-19 misinformation.
    • Use the validated model to make predictions on all tweets and evaluate the following questions:
      • Who is distributing COVID-19 misinformation?
      • How does the misinformation travel over social media?
      • What are the main topics of the misinformation?
      • How does the misinformation differ by time and by location?

    Completion Date Goal: 2022-12-01
    Actual Completion Date: 2023-02-01
  • Milestone Title: Milestone #4
    Milestone Description: Student works with faculty to interpret the results and writes a report.
    Completion Date Goal: 2023-02-01
    Actual Completion Date: 2023-03-01
  • Milestone Title: Milestone #5
    Milestone Description: The student presents the results in a poster or a Zoom presentation. Student submits the project to a conference; wrap presentation.

    Completion Date Goal: 2023-03-01
    Actual Completion Date: 2023-03-31
Github Contributions
Planned Portal Contributions (if any)
Planned Publications (if any)
What will the student learn?
What will the mentee learn?
What will the Cyberteam program learn from this project?
HPC resources needed to complete this project?
Notes
What is the impact on the development of the principal discipline(s) of the project? This project built a machine learning model to predict fake news related to Covid-19. It applied the model to covid-19 tweets in three countries (United States, UK and India) to detect fake news in each country. The project also applied topic modelling to find dominant topics in fake news in each country.
What is the impact on other disciplines? This project contributes to our knowledge in the fields of communication and health care. It built a machine learning model that predicts COVID-19 misinformation and can be used to detect fake news on Twitter. In addition, the study deepens our understanding of the dominant topics of COVID-19 misinformation on social media and how they differ by country. The results can help detect and prevent the spread of misinformation on social media.
Is there an impact on physical resources that form infrastructure? None
Is there an impact on the development of human resources for research computing? The RCF developed strong awareness of opportunities and experiences involved in research computing -- something the student was completely unaware of previously.

The student involved learned to use a high-performance computing cluster and to request the resources needed. In addition, the student learned to run batch jobs when dealing with high volumes of data. He plans to organize his code and share it publicly so that more people can benefit from this experience.
Is there an impact on institutional resources that form infrastructure? None.
Is there an impact on information resources that form infrastructure? None.
Is there an impact on technology transfer? None.
Is there an impact on society beyond science and technology? As mentioned previously, this project is helpful in detecting and preventing the spread of misinformation on social media and can help reduce the potential negative impact of social media on society.
Lessons Learned The student working on this project learned state-of-the-art natural language processing algorithms, learned to use a GPU cluster, and learned to run batch jobs. However, some jobs still took more than 24 hours to run. A better approach needs to be developed to scale the data processing in the future.
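One possible way to approach the long run times noted above is to split the tweets into fixed-size chunks so that each chunk can be submitted as its own, shorter batch job. The following is only a minimal Python sketch under that assumption, not the workflow the project used; the helper name `chunked`, the chunk size, and the sample tweets are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        # Pull the next `size` items without loading the whole input.
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Hypothetical stand-in for a large tweet file read line by line.
sample_tweets = [f"tweet {i}" for i in range(7)]
chunks = list(chunked(sample_tweets, 3))
print([len(c) for c in chunks])
```

Because `chunked` consumes its input lazily, it could in principle be fed a file handle instead of a list, so the full data set never has to sit in memory at once.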
Overall results The project trained a model to predict fake news and applied the model to COVID-19 tweets collected between March 2020 and May 2022 in three countries (USA, UK and India). The results of topic modelling show that the dominant topics in fake news in the US are related to Covid Symptom, Politics, Covid Treatment and Cases/Lock-down, while the dominant topics in real news are Mask Mandate/Social Distancing, Covid Statistic, and Politics. The model has trouble distinguishing between fake news and real news for the India dataset due to the limited training data available for that country.
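A per-country comparison like the one described above could be tallied from the model's predicted labels. The following is a minimal Python sketch, assuming each prediction is available as a (country, label) pair; the function name `fake_share_by_country` and the toy records are hypothetical and are not project results.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def fake_share_by_country(
    predictions: Iterable[Tuple[str, str]],
) -> Dict[str, float]:
    """Return the fraction of tweets labeled 'fake' for each country."""
    totals: Counter = Counter()
    fakes: Counter = Counter()
    for country, label in predictions:
        totals[country] += 1
        if label == "fake":
            fakes[country] += 1
    return {country: fakes[country] / totals[country] for country in totals}


# Toy records for illustration only; these are not project data.
toy_predictions = [
    ("USA", "fake"), ("USA", "real"),
    ("UK", "real"), ("UK", "real"),
    ("India", "fake"),
]
shares = fake_share_by_country(toy_predictions)
print(shares)
```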