Submission Number: 167
Submission ID: 3796
Submission UUID: f7842b35-5003-45ea-9f50-527cf4532103
Submission URI: /form/project

Created: Wed, 06/28/2023 - 10:07
Completed: Wed, 06/28/2023 - 10:07
Changed: Mon, 07/03/2023 - 16:35

Remote IP address: 128.6.36.20
Submitted by: Udi Zelzion
Language: English

Is draft: No
Webform: Project
Project Title: AI-based Analysis of Historical Handwritten Text Transcription and Historical Image Analysis 
Program:
CAREERS (323)

Project Image: {Empty}
Tags:
ai (271), data-analysis (422), deep-learning (303), distributed-computing (92), python (69), tensorflow (51)

Status: {Empty}
Project Leader
--------------
Project Leader:
Sonia Yaco

Email: sonia.yaco@rutgers.edu
Mobile Phone: {Empty}
Work Phone: {Empty}

Project Personnel
-----------------
Mentor(s):
{Empty}

Student-facilitator(s):
{Empty}

Mentee(s):
{Empty}


Project Information
-------------------
Project Description:
Margaret Clark Griffis was one of the first women missionaries to Japan during its modernization in the 1870s and is credited with helping to modernize Japanese women’s education. From 1872 to 1874, she served as a teacher and assistant principal at the Jo-Gakko girls' school, the first Japanese government school for girls. Margaret kept very detailed diaries during her time in Japan. These texts are important because they can provide insights into Japan's history, culture, and social practices, particularly with respect to Christian missionary activities. The diaries, together with historical images, are part of the William Elliot Griffis Collection at Rutgers’ Special Collections and University Archives.
This project proposes an AI-based system for transcribing historical texts written by Christian missionaries in Japan during the 19th century and for analyzing historical photographs of Japan from the late 19th century. AI can speed up transcription and improve its accuracy. The resulting digital archives will make the texts accessible and searchable for researchers and scholars, supporting a deeper understanding of Japan's past.


Project Information Subsection
------------------------------
Project Deliverables:
{Empty}

Student Research Computing Facilitator Profile:
We are looking for a graduate student to conduct research on applying AI methods to understand the text and image collections at the libraries. The student will test and deploy machine learning and deep learning models to transcribe the handwritten text and identify objects in the historical photographs. The project requires data exploration, modeling, and analysis in Python, using packages such as pandas, scikit-learn, and TensorFlow.
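Testing a transcription model requires some measure of how close its output is to a human transcription. The form does not name an evaluation metric, so the following is a minimal sketch assuming character error rate (CER), a common choice for handwritten text recognition: the edit distance between a reference transcription and the model output, divided by the reference length. The function names `levenshtein` and `character_error_rate` are hypothetical and are not taken from any existing project code.

```python
# Minimal sketch: character error rate (CER) for judging transcription output.
# Assumption: ground-truth transcriptions exist for some diary pages.
# Function names are hypothetical, not part of any existing project code.


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions
    separating `ref` and `hyp` (dynamic programming, keeping two rows)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # ref char missing from hyp (deletion)
                curr[j - 1] + 1,         # extra char in hyp (insertion)
                prev[j - 1] + (r != h),  # substitution, free when chars match
            )
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: one dropped letter in a 24-character line.
    ref = "Went to the school early"
    hyp = "Went to the schol early"
    print(f"CER = {character_error_rate(ref, hyp):.3f}")
```

A lower CER is better; averaging it over a held-out set of pages gives one way to compare model versions during refinement.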

Mentee Research Computing Profile:
{Empty}

Student Facilitator Programming Skill Level: Practical applications
Mentee Programming Skill Level: {Empty}
Project Institution: Rutgers University
Project Address:
CoRE Building, 96 Frelinghuysen Road
Piscataway, New Jersey 08854

Anchor Institution: CR-Rutgers
Preferred Start Date: 07/01/2023
Start as soon as possible.: Yes
Project Urgency: Already behind; start date is flexible
Expected Project Duration (in months): 6
Launch Presentation: {Empty}
Launch Presentation Date: 07/12/2023
Wrap Presentation: {Empty}
Wrap Presentation Date: 01/10/2024
Project Milestones:
- Milestone Title: Launch presentation
  Milestone Description: Present an overview of the project at the July monthly CAREERS meeting.
  Completion Date Goal: 2023-07-12
- Milestone Title: Refine Model
  Milestone Description: Refine the "Griffis" handwriting transcription model.
  Completion Date Goal: 2023-08-16
- Milestone Title: Run model on larger dataset and analyze results
  Milestone Description: Apply the handwriting transcription model to additional texts and analyze the results.
  Completion Date Goal: 2023-09-13
- Milestone Title: Select images for test set
  Milestone Description: Select the test-set images for model training.
  Completion Date Goal: 2023-10-11
- Milestone Title: Apply model
  Milestone Description: Apply the model to a large collection of images and combine the resulting data with the transcription data.
  Completion Date Goal: 2023-12-21
- Milestone Title: Wrap presentation
  Milestone Description: Present the project outcomes at the January CAREERS monthly meeting.
  Completion Date Goal: 2024-01-10
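The milestones involve selecting a test set of images and later combining image-analysis output with the transcription data. The form does not say how either step is done, so the following is a minimal sketch under stated assumptions: a fixed random seed makes the test-set selection reproducible, and records are joined on a shared key, here a hypothetical `date` field present in both the image detections and the diary transcriptions. All field and function names are hypothetical. The standard library is used so the sketch runs anywhere; pandas would serve equally well for the join.

```python
# Minimal sketch: reproducible test-set selection and joining image results
# with transcriptions. Field names ("image", "date", "labels", "text") and
# function names are hypothetical assumptions, not the project's schema.
import random
from collections import defaultdict


def select_test_set(image_ids, fraction=0.2, seed=42):
    """Hold out `fraction` of the image IDs as a test set, reproducibly."""
    ids = sorted(image_ids)  # sort so the split does not depend on input order
    if not ids:
        return [], []
    rng = random.Random(seed)  # private RNG so a seed always gives the same split
    k = max(1, round(len(ids) * fraction))
    test = set(rng.sample(ids, k))
    train = [i for i in ids if i not in test]
    return train, sorted(test)


def combine(detections, transcriptions, key="date"):
    """Attach every transcription sharing `key` to each image's detections.

    Images with no matching transcription get an empty list.
    """
    by_key = defaultdict(list)
    for entry in transcriptions:
        by_key[entry[key]].append(entry["text"])
    return [{**det, "transcriptions": by_key.get(det[key], [])} for det in detections]


if __name__ == "__main__":
    train, test = select_test_set([f"img{i:03d}" for i in range(10)], seed=0)
    print("test set:", test)
    photos = [{"image": "p1", "date": "1873-05-01", "labels": ["person", "building"]}]
    diary = [{"date": "1873-05-01", "text": "(hypothetical diary entry)"}]
    print(combine(photos, diary))
```

In practice photographs are often dated only approximately, so a real join key would need to be chosen with the archivists; an exact-date match is only the simplest possible case.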

Github Contributions: {Empty}
Planned Portal Contributions (if any):
{Empty}

Planned Publications (if any):
{Empty}

What will the student learn?:
The student will learn to test machine learning and deep learning models that transcribe handwritten text and identify objects in historical photographs, and to deploy those models on distributed computing resources.
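One common way to deploy a model over a large collection on a cluster is to split the pages or images across many independent jobs. Amarel is scheduled with Slurm, but the form does not say how the work would be distributed, so the following is a minimal sketch assuming a contiguous Slurm job array (for example `--array=0-9`, with no step value). It reads the standard `SLURM_ARRAY_TASK_ID`, `SLURM_ARRAY_TASK_MIN`, and `SLURM_ARRAY_TASK_COUNT` environment variables and gives each task a round-robin share of the files. The file names and helper functions are hypothetical.

```python
# Minimal sketch: split a list of pages/images across Slurm job-array tasks.
# Assumptions: a contiguous job array (no step), and hypothetical file names.
import os


def shard_for_task(items, task_index, num_tasks):
    """Return this task's round-robin share of `items`."""
    if not 0 <= task_index < num_tasks:
        raise ValueError(f"task index {task_index} outside 0..{num_tasks - 1}")
    return items[task_index::num_tasks]


def current_task():
    """Return (zero-based index, task count) for this job-array task.

    Outside Slurm (e.g. on a laptop) this falls back to a single task.
    """
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    first = int(os.environ.get("SLURM_ARRAY_TASK_MIN", "0"))
    count = int(os.environ.get("SLURM_ARRAY_TASK_COUNT", "1"))
    return task_id - first, count  # arrays may start at 1, so shift to zero


if __name__ == "__main__":
    pages = [f"diary_page_{i:03d}.jpg" for i in range(6)]  # hypothetical files
    index, count = current_task()
    for page in shard_for_task(pages, index, count):
        print(f"task {index + 1}/{count}: would transcribe {page}")
```

Round-robin slicing keeps shard sizes within one item of each other, and because each task only reads its own slice, no coordination between tasks is needed.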



What will the mentee learn?:
{Empty}

What will the Cyberteam program learn from this project?:
{Empty}

HPC resources needed to complete this project?:
Access to Amarel, Rutgers' HPC cluster

Notes:
{Empty}



Final Report
------------
What is the impact on the development of the principal discipline(s) of the project?:
{Empty}

What is the impact on other disciplines?:
{Empty}

Is there an impact on physical resources that form infrastructure?:
{Empty}

Is there an impact on the development of human resources for research computing?:
{Empty}

Is there an impact on institutional resources that form infrastructure?:
{Empty}

Is there an impact on information resources that form infrastructure?:
{Empty}

Is there an impact on technology transfer?:
{Empty}

Is there an impact on society beyond science and technology?:
{Empty}

Lessons Learned:
{Empty}

Overall results:
{Empty}