Submission Number: 142
Submission ID: 258
Submission UUID: 92db66c8-a836-4c3c-90f6-87e912e5b2d1
Submission URI: /form/project

Created: Thu, 03/03/2022 - 13:55
Completed: Thu, 03/03/2022 - 13:55
Changed: Wed, 05/17/2023 - 15:34

Remote IP address: 74.103.220.121
Submitted by: Gaurav Khanna
Language: English

Is draft: No
Webform: Project
Project Title: Vector Representation with a Finance Corpus
Program:
CAREERS (323)

Project Image: https://support.access-ci.org/system/files/webform/project/258/1%2AsGxKvELB8t0Q2AUUh15yyQ.png
Tags:
bash (242), batch-jobs (76), deep-learning (303), distributed-computing (92), machine-learning (272), programming (5), python (69), research-facilitation (442), ssh (78)

Status: Complete
Project Leader
--------------
Project Leader:
Murat Aydogdu

Email: maydogdu@ric.edu
Mobile Phone: {Empty}
Work Phone: {Empty}

Project Personnel
-----------------
Mentor(s):
{Empty}

Student-facilitator(s):
Ritesh Bachhar (1790)

Mentee(s):
{Empty}


Project Information
-------------------
Project Description:
This project entails generating two sets of word vector representations with the GloVe implementation: one trained on a general-purpose corpus and one trained on a finance corpus. GloVe is an unsupervised learning algorithm for obtaining vector representations for words. The steps will involve extracting text from two sets of documents to build the two corpora, then training GloVe on each corpus to generate the vector representations. These vector representations will then be used to analyze the impact of a domain-specific corpus on vector representation.
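The comparison step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the vectors are stored in the plain-text layout that GloVe writes (one word per line followed by its components), and the function names and the tiny two-dimensional vectors are made up for the example.

```python
import io
import math


def load_glove_vectors(handle):
    """Parse GloVe-style text: a word followed by space-separated floats per line."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(vectors, word, k=1):
    """Return the k words most similar to `word`, excluding the word itself."""
    target = vectors[word]
    scored = [(cosine(target, vec), other)
              for other, vec in vectors.items() if other != word]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


# Hypothetical toy vectors standing in for the two trained models.
GENERAL = "bank 1.0 0.0\nriver 0.9 0.1\nloan 0.1 0.9\nwater 0.8 0.2\n"
FINANCE = "bank 0.0 1.0\nriver 0.9 0.1\nloan 0.1 0.9\nwater 0.8 0.2\n"

general = load_glove_vectors(io.StringIO(GENERAL))
finance = load_glove_vectors(io.StringIO(FINANCE))

general_nn = nearest_neighbors(general, "bank")
finance_nn = nearest_neighbors(finance, "bank")
print("general corpus:", general_nn)
print("finance corpus:", finance_nn)
```

With real output files, the same functions could be called on `open(path, encoding="utf-8")` handles. Comparing the neighbor lists for the same word across the two models is one simple way to see how the training corpus shifts a word's representation.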

This project will require storage space to save large corpora and computational power to train GloVe on these corpora. A computing platform like URI’s HPC or MGHPCC will be used to perform these tasks. The student facilitator will help the project PI set up the computational workflow in an HPC environment, i.e., develop and test the job submission scripts and set up the required software and data on the chosen computational resource.
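One part of such a workflow is combining the text extracted by separate jobs into a single corpus file for training. The sketch below is an assumption about how that step might look, not the project's actual script: the directory layout, file names, and the `build_corpus` and `normalize` helpers are hypothetical. It writes one document per line with space-separated tokens, which is the kind of input the reference GloVe tools are generally run on.

```python
import re
import tempfile
from pathlib import Path


def normalize(text):
    """Lowercase and collapse all whitespace runs into single spaces."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def build_corpus(shard_dir, corpus_path, pattern="*.txt"):
    """Concatenate extracted-text shards into one file, one document per line.

    Returns the number of non-empty documents written. The corpus file should
    live outside `shard_dir` so it is not picked up as a shard itself.
    """
    count = 0
    with open(corpus_path, "w", encoding="utf-8") as out:
        for shard in sorted(Path(shard_dir).glob(pattern)):
            cleaned = normalize(shard.read_text(encoding="utf-8"))
            if cleaned:  # skip shards that contain only whitespace
                out.write(cleaned + "\n")
                count += 1
    return count


# Small demonstration with made-up shard files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    shards = Path(tmp) / "shards"
    shards.mkdir()
    (shards / "doc_001.txt").write_text("Net  income ROSE\nin 2021.", encoding="utf-8")
    (shards / "doc_002.txt").write_text("   ", encoding="utf-8")
    (shards / "doc_003.txt").write_text("Interest rates fell.", encoding="utf-8")

    corpus = Path(tmp) / "corpus.txt"
    written = build_corpus(shards, corpus)
    corpus_text = corpus.read_text(encoding="utf-8")

print(written)
print(corpus_text, end="")
```

Sorting the shard paths keeps the output deterministic across runs, which makes it easier to check small-scale results before running at scale.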


Project Information Subsection
------------------------------
Project Deliverables:
A tested computational workflow for GloVe-based vector representation in an HPC environment.

Project Deliverables:
{Empty}

Student Research Computing Facilitator Profile:
Experience writing and running Python programs on a large number of datasets in a distributed computing environment such as an HPC cluster.


Mentee Research Computing Profile:
{Empty}

Student Facilitator Programming Skill Level: Some hands-on experience
Mentee Programming Skill Level: {Empty}
Project Institution: Rhode Island College
Project Address:
Rhode Island

Anchor Institution: CR-University of Rhode Island
Preferred Start Date: {Empty}
Start as soon as possible.: No
Project Urgency: Already behind; Start date is flexible
Expected Project Duration (in months): 3
Launch Presentation: https://support.access-ci.org/system/files/webform/project/258/Careers_project_launch.pdf
Launch Presentation Date: 06/08/2022
Wrap Presentation: https://support.access-ci.org/system/files/webform/project/258/Wrap_Persentation_Final%20%281%29.pdf
Wrap Presentation Date: 08/10/2022
Project Milestones:
- Milestone Title: Milestone #1
  Milestone Description: Background study (vector representation, GloVe); HPC access; overview of project and edits of initial code; GitHub repo setup
  Completion Date Goal: 2022-06-08
  Actual Completion Date: 2022-06-08
- Milestone Title: Milestone #2
  Milestone Description: Testing code to extract text from corpora on a small scale; setting up job submission scripts on HPC cluster; testing combining output extensively; running GloVe program to produce small scale results
  Completion Date Goal: 2022-07-08
  Actual Completion Date: 2022-07-08
- Milestone Title: Milestone #3
  Milestone Description: Executing the project at scale and generating results; presenting the results in a Zoom "wrap" presentation; contributing developed code, scripts, and documentation to the GitHub repo.
  
  Completion Date Goal: 2022-08-10
  Actual Completion Date: 2022-08-10

Github Contributions: {Empty}
Planned Portal Contributions (if any):
{Empty}

Planned Publications (if any):
{Empty}

What will the student learn?:
{Empty}

What will the mentee learn?:
{Empty}

What will the Cyberteam program learn from this project?:
{Empty}

HPC resources needed to complete this project?:
{Empty}

Notes:
{Empty}



Final Report
------------
What is the impact on the development of the principal discipline(s) of the project?:
The project developed a resource that makes HPC computing power available for training GloVe on large data volumes. Beyond that, there was no significant impact on the discipline of the project.

What is the impact on other disciplines?:
The student facilitator gained substantial experience working with an HPC resource and will use that experience in other areas of science, including his own area of interest, computational physics. No other significant impact on another discipline.

Is there an impact on physical resources that form infrastructure?:
None. 

Is there an impact on the development of human resources for research computing?:
Yes; the student facilitator enjoyed his engagement with CyberTeams and is open to the possibility of computational work/facilitation as a career option. 

Is there an impact on institutional resources that form infrastructure?:
Yes; there is now a complete and tested HPC workflow for GloVe computations. 

Is there an impact on information resources that form infrastructure?:
None.

Is there an impact on technology transfer?:
None.

Is there an impact on society beyond science and technology?:
None.

Lessons Learned:
This project was somewhat too complex for the time allotted and for the student's background, and it progressed more slowly than expected. However, with significant input from the project researcher and URI's HPC team, the project was completed successfully. The researcher is satisfied with the outcome and is able to run complex GloVe workflows in an HPC environment.

Overall results:
The project developed and tested a complete HPC workflow for GloVe-related computations. The student facilitator enjoyed his engagement with CyberTeams and is open to the possibility of computational work/facilitation as a career option.