Submission information
Submission Number: 142
Submission ID: 258
Submission UUID: 92db66c8-a836-4c3c-90f6-87e912e5b2d1
Submission URI: /form/project
Created: Thu, 03/03/2022 - 13:55
Completed: Thu, 03/03/2022 - 13:55
Changed: Wed, 05/17/2023 - 15:34
Remote IP address: 74.103.220.121
Submitted by: Gaurav Khanna
Language: English
Is draft: No
Webform: Project
Project Title | Vector Representation with a Finance Corpus |
---|---|
Program | CAREERS |
Tags | bash, batch-jobs, deep-learning, distributed-computing, machine-learning, programming, python, research-facilitation, ssh |
Status | Complete |
Project Leader | Murat Aydogdu |
Email | maydogdu@ric.edu |
Mobile Phone | |
Work Phone | |
Mentor(s) | |
Student-facilitator(s) | Ritesh Bachhar |
Mentee(s) | |
Project Description | This project generates word vector representations from a general-purpose corpus and a finance corpus using the GloVe implementation. GloVe is an unsupervised learning algorithm for obtaining vector representations for words. The steps involve extracting text from two sets of documents to build the two corpora, then training GloVe on each corpus to generate vector representations. These vector representations will then be used to analyze the impact of a domain-specific corpus on vector representation. The project requires storage space for the large corpora and computational power to train GloVe on them. A computing platform such as URI’s HPC or MGHPCC will be used to perform these tasks. The student facilitator will help the project PI set up the computational workflow in an HPC environment, i.e., develop and test the job submission scripts and set up the required software and data on the chosen computational resource. |
Project Deliverables | A tested computational workflow for GloVe-based vector representation in an HPC environment. |
Student Research Computing Facilitator Profile | Experience writing and running Python programs on a large number of datasets in a distributed computing environment such as an HPC cluster. |
Mentee Research Computing Profile | |
Student Facilitator Programming Skill Level | Some hands-on experience |
Mentee Programming Skill Level | |
Project Institution | Rhode Island College |
Project Address | Rhode Island |
Anchor Institution | CR-University of Rhode Island |
Preferred Start Date | |
Start as soon as possible. | No |
Project Urgency | 5 (scale endpoints: Already behind, Start date is flexible) |
Expected Project Duration (in months) | 3 |
Launch Presentation | |
Launch Presentation Date | 06/08/2022 |
Wrap Presentation | |
Wrap Presentation Date | 08/10/2022 |
Project Milestones |
|
Github Contributions | |
Planned Portal Contributions (if any) | |
Planned Publications (if any) | |
What will the student learn? | |
What will the mentee learn? | |
What will the Cyberteam program learn from this project? | |
HPC resources needed to complete this project? | |
Notes | |
What is the impact on the development of the principal discipline(s) of the project? | The project produced a resource that allows GloVe to be trained on large data volumes using HPC. There was no other significant impact on the discipline of the project. |
What is the impact on other disciplines? | The student facilitator gained a lot of experience working with an HPC resource and will be using that experience in other areas of science including his own area of interest in computational physics. No other significant impact on another discipline. |
Is there an impact on physical resources that form infrastructure? | None. |
Is there an impact on the development of human resources for research computing? | Yes; the student facilitator enjoyed his engagement with CyberTeams and is open to the possibility of computational work/facilitation as a career option. |
Is there an impact on institutional resources that form infrastructure? | Yes; there is now a complete and tested HPC workflow for GloVe computations. |
Is there an impact on information resources that form infrastructure? | None. |
Is there an impact on technology transfer? | None. |
Is there an impact on society beyond science and technology? | None. |
Lessons Learned | This project was somewhat too complex for the time allotted and for the student's background, so it progressed more slowly than expected. However, with significant input from the project researcher and URI's HPC team, the project was completed successfully: the researcher is satisfied with the outcome and is able to run complex GloVe workflows in an HPC environment. |
Overall results | The project developed and tested a complete HPC workflow for GloVe-related computations. The student facilitator enjoyed his engagement with CyberTeams and is open to the possibility of computational work/facilitation as a career option. |
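
The project description mentions comparing vector representations trained on a general-purpose corpus and on a finance corpus. The record does not say how that comparison was done, so the following is only a minimal sketch of one possible approach: load two vector files in GloVe's plain-text output format (one word per line followed by its values) and measure how much a word's nearest neighbors overlap between the two sets. The toy vectors, the function names, and the overlap measure are all illustrative assumptions, not the project's actual code or data.

```python
import io
import math


def load_glove_vectors(handle):
    """Parse GloVe-style text vectors: a word followed by space-separated floats."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(vectors, word, k=3):
    """Return the k words most similar to `word` (excluding the word itself)."""
    target = vectors[word]
    scored = [(cosine(target, vec), other)
              for other, vec in vectors.items() if other != word]
    return [other for _, other in sorted(scored, reverse=True)[:k]]


def neighbor_overlap(general, finance, word, k=3):
    """Jaccard overlap of a word's top-k neighbors in the two vector sets."""
    a = set(nearest(general, word, k))
    b = set(nearest(finance, word, k))
    return len(a & b) / len(a | b) if a | b else 1.0


# Hypothetical toy vectors standing in for the two trained models.
GENERAL_TXT = "bond 1.0 0.0\nglue 0.9 0.1\nyield 0.0 1.0\ndebt 0.1 0.9\n"
FINANCE_TXT = "bond 0.0 1.0\nglue 1.0 0.0\nyield 0.1 0.9\ndebt 0.2 0.8\n"

general = load_glove_vectors(io.StringIO(GENERAL_TXT))
finance = load_glove_vectors(io.StringIO(FINANCE_TXT))

print(nearest(general, "bond", k=1))
print(nearest(finance, "bond", k=1))
print(neighbor_overlap(general, finance, "bond", k=1))
```

With real output files, `io.StringIO(...)` would be replaced by `open(path, encoding="utf-8")`. A low overlap for a word suggests that the domain-specific corpus shifted its meaning relative to the general-purpose one.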