Submission information
Submission Number: 192
Submission ID: 4380
Submission UUID: 9ad2c3ca-e6c5-492f-aae8-c37102a8d280
Submission URI: /form/project
Created: Thu, 02/22/2024 - 12:47
Completed: Thu, 02/22/2024 - 12:54
Changed: Tue, 04/16/2024 - 10:05
Remote IP address: 100.1.94.195
Submitted by: Udi Zelzion
Language: English
Is draft: No
Webform: Project
Project Title | Training Program to Study Text Data Analytics on HPC Systems |
---|---|
Program | CAREERS |
Project Image | |
Tags | ai (271), natural-language-processing (274), python (69) |
Status | In Progress |
Project Leader | Jim Samuel |
Email | jim.samuel@rutgers.edu |
Mobile Phone | |
Work Phone | |
Mentor(s) | |
Student-facilitator(s) | Tanya Khanna |
Mentee(s) | |
Project Description | There has been a tremendous increase in the volume of text data across multiple disciplines. It has become necessary to develop easy-to-use research frameworks that draw on high performance computing (HPC) capabilities for research with text data, because without HPC resources it is nearly impossible to analyze even medium-sized text datasets. For example, an attempt to run sentiment analysis algorithms on a social media text data file with just 100,000 records would fail on a computer with 16 GB of RAM or less. Such frameworks need to be beginner-friendly and user-friendly, and need to be customized to Rutgers’ computing environments to benefit researchers, faculty, students, and other users and stakeholders. This will let all relevant users focus on the core aspects of their research rather than struggle with HPC-related technological challenges. To put this concept into practice at Rutgers University, we propose the development of standardized processes for basic multidisciplinary natural language processing (NLP) analyses to support beginners and current users of the Amarel system. Our work will focus on preparing Jupyter Notebooks in Python for textual data analysis, NLP, and textual data visualization. We anticipate that these materials will help researchers at Rutgers analyze text data on Amarel. |
Project Deliverables | Jupyter Notebooks in Python for textual data analyses, NLP and textual data visualization. |
Student Research Computing Facilitator Profile | |
Mentee Research Computing Profile | |
Student Facilitator Programming Skill Level | Some hands-on experience |
Mentee Programming Skill Level | |
Project Institution | |
Project Address | |
Anchor Institution | CR-Rutgers |
Preferred Start Date | |
Start as soon as possible. | Yes |
Project Urgency | 3 (on a scale from "Already behind" to "Start date is flexible") |
Expected Project Duration (in months) | 6 |
Launch Presentation | |
Launch Presentation Date | |
Wrap Presentation | |
Wrap Presentation Date | |
Project Milestones | |
Github Contributions | |
Planned Portal Contributions (if any) | |
Planned Publications (if any) | |
What will the student learn? | The student will gain familiarity with Amarel, Rutgers' HPC system, and learn how to run NLP analyses on it. |
What will the mentee learn? | |
What will the Cyberteam program learn from this project? | Access to the Amarel cluster, Rutgers' HPC system. |
HPC resources needed to complete this project? | |
Notes | |
What is the impact on the development of the principal discipline(s) of the project? | |
What is the impact on other disciplines? | |
Is there an impact on physical resources that form infrastructure? | |
Is there an impact on the development of human resources for research computing? | |
Is there an impact on institutional resources that form infrastructure? | |
Is there an impact on information resources that form infrastructure? | |
Is there an impact on technology transfer? | |
Is there an impact on society beyond science and technology? | |
Lessons Learned | |
Overall results | |
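To illustrate the kind of basic text analysis the proposed Jupyter Notebooks might standardize, here is a minimal Python sketch. It is not the project's code. It assumes records arrive one per line (for example, one social media post per line) and uses a tiny hand-made lexicon (`POSITIVE`, `NEGATIVE`, both hypothetical) in place of a real sentiment model. Records are processed in fixed-size batches, so only one batch of raw text is held in memory at a time, and each batch is a natural unit of work to spread across cores or jobs on a cluster such as Amarel. The word-count table still grows with vocabulary size.

```python
import re
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Hypothetical toy lexicon for illustration only; a real notebook would
# use an established sentiment lexicon or model instead.
POSITIVE = {"good", "great", "happy", "love", "excellent"}
NEGATIVE = {"bad", "sad", "hate", "terrible", "awful"}

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


def sentiment_score(tokens: List[str]) -> int:
    """Crude polarity: positive lexicon hits minus negative lexicon hits."""
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def chunks(lines: Iterable, size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` records from any iterable."""
    it = iter(lines)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def analyze(lines: Iterable[str], chunk_size: int = 10_000) -> Tuple[Counter, Counter]:
    """Stream records batch by batch, accumulating word counts and sentiment labels."""
    word_counts: Counter = Counter()
    label_counts: Counter = Counter()
    for batch in chunks(lines, chunk_size):
        for record in batch:
            tokens = tokenize(record)
            word_counts.update(tokens)
            score = sentiment_score(tokens)
            if score > 0:
                label_counts["positive"] += 1
            elif score < 0:
                label_counts["negative"] += 1
            else:
                label_counts["neutral"] += 1
    return word_counts, label_counts


if __name__ == "__main__":
    sample = [
        "I love this, great day",
        "terrible service, so bad",
        "just a regular post",
    ]
    words, labels = analyze(sample, chunk_size=2)
    print(labels)
    print(words.most_common(3))
```

Because `analyze` accepts any iterable of strings, an open file object (e.g. `open("posts.txt", encoding="utf-8")`, a hypothetical path) can be passed directly, and Python will read it line by line rather than loading the whole file at once.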