Submission Number: 192
Submission ID: 4380
Submission UUID: 9ad2c3ca-e6c5-492f-aae8-c37102a8d280
Submission URI: /form/project

Created: Thu, 02/22/2024 - 12:47
Completed: Thu, 02/22/2024 - 12:54
Changed: Tue, 04/16/2024 - 10:05

Remote IP address: 100.1.94.195
Submitted by: Udi Zelzion
Language: English

Is draft: No
Webform: Project
Project Title Training Program to Study Text Data Analytics on HPC Systems
Program CAREERS
Project Image
Tags ai (271), natural-language-processing (274), python (69)
Status In Progress
Project Leader Jim Samuel
Email jim.samuel@rutgers.edu
Mobile Phone
Work Phone
Mentor(s)
Student-facilitator(s) Tanya Khanna
Mentee(s)
Project Description The volume of text data is increasing tremendously across multiple disciplines.
It has become necessary to develop easy-to-use research frameworks that use high performance computing
(HPC) capabilities for research with text data, because it is nearly impossible to analyze even
medium-sized text datasets on typical personal computers. For example, an attempt to run sentiment analysis algorithms on a social
media text data file with just 100,000 records would fail on a computer with 16 GB or less of RAM.
Such frameworks need to be beginner-friendly and user-friendly, and need to be customized to Rutgers’
computing environments to benefit researchers, faculty, students, and other users and stakeholders. This
will allow users to focus on the core aspects of their research rather than struggle with
HPC-related technological challenges.
To put this concept into effect at Rutgers University, we propose developing standardized
processes for basic multidisciplinary natural language processing (NLP) analyses to support beginners
and current users of the Amarel system.
Our work will focus on preparing Jupyter Notebooks in Python for textual data analyses, NLP, and textual
data visualization. We anticipate producing materials that will help researchers at Rutgers.
Project Deliverables Jupyter Notebooks in Python for textual data analyses, NLP and textual data visualization.
Project Deliverables
Student Research Computing Facilitator Profile
Mentee Research Computing Profile
Student Facilitator Programming Skill Level Some hands-on experience
Mentee Programming Skill Level
Project Institution
Project Address
Anchor Institution CR-Rutgers
Preferred Start Date
Start as soon as possible. Yes
Project Urgency 3 (on a scale between "Already behind" and "Start date is flexible")
Expected Project Duration (in months) 6
Launch Presentation
Launch Presentation Date
Wrap Presentation
Wrap Presentation Date
Project Milestones
  • Milestone Title: Launch Presentation
    Milestone Description: Give a launch presentation during the March CAREERS Monthly meeting.
    Completion Date Goal: 2024-03-13
  • Milestone Title: Gathering Datasets
    Milestone Description: Prepare text datasets and establish preliminary training.
    Completion Date Goal: 2024-03-10
  • Milestone Title: Putting the code together
    Milestone Description: Experiment with code, process and datasets.
    Completion Date Goal: 2024-04-12
  • Milestone Title: Creating the Notebooks
    Milestone Description: Create scripts and standardized notebooks using NLP methods.
    Completion Date Goal: 2024-05-15
  • Milestone Title: Finalize materials
    Milestone Description: Finalize slides, notebooks and documentation.
    Completion Date Goal: 2024-06-12
  • Milestone Title: Wrap Presentation
    Milestone Description: Give a wrap presentation at the June CAREERS monthly meeting.
    Completion Date Goal: 2024-07-10
Github Contributions
Planned Portal Contributions (if any)
Planned Publications (if any)
What will the student learn? The student will gain familiarity with Rutgers' HPC system, Amarel, and understand how to run NLP analysis using Amarel.
What will the mentee learn?
What will the Cyberteam program learn from this project? Access to the Amarel cluster, Rutgers' HPC system.
HPC resources needed to complete this project?
Notes
What is the impact on the development of the principal discipline(s) of the project?
What is the impact on other disciplines?
Is there an impact on physical resources that form infrastructure?
Is there an impact on the development of human resources for research computing?
Is there an impact on institutional resources that form infrastructure?
Is there an impact on information resources that form infrastructure?
Is there an impact on technology transfer?
Is there an impact on society beyond science and technology?
Lessons Learned
Overall results