Submission Number: 191
Submission ID: 4361
Submission UUID: 075cb4fc-4a23-4388-a6d4-668a2d2f65e5
Submission URI: /form/project

Created: Mon, 02/12/2024 - 13:20
Completed: Mon, 02/12/2024 - 13:34
Changed: Wed, 09/04/2024 - 15:34

Remote IP address: 128.6.36.2
Submitted by: Udi Zelzion
Language: English

Is draft: No
Webform: Project
Project Title Framework for Reduction of Ambiguity in Text Data from Generative AI
Program CAREERS
Project Image
Tags ai (271), natural-language-processing (274), python (69)
Status In Progress
Project Leader Jim Samuel
Email jim.samuel@rutgers.edu
Mobile Phone
Work Phone
Mentor(s)
Student-facilitator(s) Kushal Gunavantkumar Patel
Mentee(s)
Project Description Generative AI has invaded our places of work and learning with the promise of increasing productivity.
However, many generative AI systems are built on Large Language Models (LLMs), which act as next-word predictors
based on probabilistic modeling. This leads to numerous challenges, especially ambiguity.
This proposal addresses the research question: How can we reduce ambiguity in AI-generated text?
The current proposal seeks to 1) find ways to algorithmically identify and flag ambiguity, 2)
explore identifying levels of ambiguity, and 3) explore ways in which ambiguity could be reduced or
managed.
Once ambiguity is identified, we intend to use an LLM application to generate improved alternatives. This
project will help improve the quality of human interactions with AI applications such as chatbots.
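Illustrative sketch (not the project's method): one simple way to flag ambiguity in a probabilistic next-word predictor is to measure the Shannon entropy of each next-word distribution and bin it into levels. The function names, the thresholds (0.5 and 1.5 bits), and the idea of using entropy as an ambiguity proxy are all assumptions made for illustration; the proposal does not specify an approach.

```python
import math


def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def ambiguity_level(probs, low=0.5, high=1.5):
    """Bin a next-word distribution into a coarse ambiguity level.

    The thresholds are hypothetical and would need tuning on real data.
    """
    h = entropy_bits(probs)
    if h < low:
        return "low"
    if h < high:
        return "medium"
    return "high"


def flag_ambiguous(steps, threshold=1.5):
    """Return the indices of generation steps whose entropy meets the threshold.

    `steps` is a list of next-word probability distributions, one per
    generated word (assumed to come from whatever LLM is being studied).
    """
    return [i for i, probs in enumerate(steps) if entropy_bits(probs) >= threshold]


if __name__ == "__main__":
    # Toy distributions: a certain step, a two-way split, a four-way split.
    steps = [[1.0], [0.5, 0.5], [0.25, 0.25, 0.25, 0.25]]
    for i, probs in enumerate(steps):
        print(i, ambiguity_level(probs))
    print("flagged steps:", flag_ambiguous(steps))
```

Flagged positions could then be passed to an LLM application to generate alternatives, as the description outlines. Entropy captures only the model's uncertainty about the next word, so it is at best a rough stand-in for ambiguity as a human reader experiences it.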
Project Deliverables
Student Research Computing Facilitator Profile
Mentee Research Computing Profile
Student Facilitator Programming Skill Level Practical applications
Mentee Programming Skill Level
Project Institution
Project Address
Anchor Institution CR-Rutgers
Preferred Start Date
Start as soon as possible. Yes
Project Urgency Already behind 3 Start date is flexible
Expected Project Duration (in months) 6
Launch Presentation
Launch Presentation Date
Wrap Presentation
Wrap Presentation Date
Project Milestones
  • Milestone Title: Launch
    Milestone Description: Give a launch presentation during the monthly meeting, get HPC access, and explore and validate multiple LLMs on ready-to-use datasets.
    Completion Date Goal: 2024-04-19
    Actual Completion Date: 2024-05-31
  • Milestone Title: Identify ambiguity
    Milestone Description: Narrow down the most promising approaches to identifying ambiguity and run tests.
    Completion Date Goal: 2024-06-01
    Actual Completion Date: 2024-07-15
  • Milestone Title: Finetuning
    Milestone Description: Apply fine-tuning and other customizations to the LLMs to generate suitable text.
    Completion Date Goal: 2024-07-16
    Actual Completion Date: 2024-08-31
  • Milestone Title: Finalize Documentation
    Milestone Description: Produce a workflow, write a white paper, prepare a presentation, and package any other deliverables.
    Completion Date Goal: 2024-09-01
    Actual Completion Date: 2024-10-01
  • Milestone Title: Wrap presentation
    Milestone Description: Give a wrap presentation at the monthly meeting and have an exit interview.
Github Contributions
Planned Portal Contributions (if any)
Planned Publications (if any)
What will the student learn? The student will gain familiarity with Rutgers' HPC system, Amarel, and understand how to run NLP analysis using Amarel.
What will the mentee learn?
What will the Cyberteam program learn from this project? Jupyter notebooks with examples on how to run NLP analysis.
HPC resources needed to complete this project? Access to the Amarel cluster, Rutgers' HPC system.
Notes
What is the impact on the development of the principal discipline(s) of the project?
What is the impact on other disciplines?
Is there an impact on physical resources that form infrastructure?
Is there an impact on the development of human resources for research computing?
Is there an impact on institutional resources that form infrastructure?
Is there an impact on information resources that form infrastructure?
Is there an impact on technology transfer?
Is there an impact on society beyond science and technology?
Lessons Learned
Overall results