Submission Number: 167
Submission ID: 3796
Submission UUID: f7842b35-5003-45ea-9f50-527cf4532103
Submission URI: /form/project

Created: Wed, 06/28/2023 - 10:07
Completed: Wed, 06/28/2023 - 10:07
Changed: Mon, 07/03/2023 - 16:35

Remote IP address: 128.6.36.20
Submitted by: Udi Zelzion
Language: English

Is draft: No
Webform: Project
Project Title AI-based Analysis of Historical Handwritten Text Transcription and Historical Image Analysis
Program CAREERS
Project Image
Tags ai (271), data-analysis (422), deep-learning (303), distributed-computing (92), python (69), tensorflow (51)
Status
Project Leader Sonia Yaco
Email sonia.yaco@rutgers.edu
Mobile Phone
Work Phone
Mentor(s)
Student-facilitator(s)
Mentee(s)
Project Description Margaret Clark Griffis was one of the first women missionaries to a modernizing Japan in the 1870s and is credited with helping to modernize Japanese women’s education. From 1872 to 1874, she served as a teacher and assistant principal at the Jo-Gakko girls' school, the first Japanese government school for girls. Margaret kept very detailed diaries during her time in Japan. These texts are important because they can provide insights into Japan's history, culture, and social practices, particularly with respect to Christian missionary activities. The diaries, together with historical images, are part of the William Elliot Griffis Collection at Rutgers’ Special Collections and University Archives.
This project proposes an AI-based system for transcribing historical texts written by Christian missionaries in Japan during the 19th century and for analyzing historical photographs of Japan from the late 19th century. Using AI can speed up transcription and improve its accuracy while creating digital archives that make the materials accessible. Because digital archives make the texts accessible and searchable for researchers and scholars, the project aims to facilitate a deeper understanding of Japan's past.
Project Deliverables
Student Research Computing Facilitator Profile We are looking for a graduate student to conduct research on applying AI methods to understand the text and image collections at the Rutgers libraries. The student will test and deploy machine learning and deep learning models to transcribe the handwritten text and identify objects in the historical photographs. The project requires data exploration, modeling, and analysis in Python, using packages such as pandas, scikit-learn, and TensorFlow.
Mentee Research Computing Profile
Student Facilitator Programming Skill Level Practical applications
Mentee Programming Skill Level
Project Institution Rutgers University
Project Address CoRE Building, 96 Frelinghuysen Road
Piscataway, New Jersey 08854
Anchor Institution CR-Rutgers
Preferred Start Date 07/01/2023
Start as soon as possible. Yes
Project Urgency 3 (on a scale from "Already behind" to "Start date is flexible")
Expected Project Duration (in months) 6
Launch Presentation
Launch Presentation Date 07/12/2023
Wrap Presentation
Wrap Presentation Date 01/10/2024
Project Milestones
  • Milestone Title: Launch presentation
    Milestone Description: Present an overview of the project at the July monthly CAREERS meeting.
    Completion Date Goal: 2023-07-12
  • Milestone Title: Refine Model
    Milestone Description: Refining the "Griffis" handwriting transcription model.
    Completion Date Goal: 2023-08-16
  • Milestone Title: Run on a larger dataset and analyze the data
    Milestone Description: Applying the handwriting transcription model to additional texts and analyzing the results.
    Completion Date Goal: 2023-09-13
  • Milestone Title: Selecting images for the test set
    Milestone Description: Selecting the images for the test set used in model training and evaluation.
    Completion Date Goal: 2023-10-11
  • Milestone Title: Applying the model
    Milestone Description: Applying the model to a large collection of images and combining the resulting data with the transcription data.
    Completion Date Goal: 2023-12-21
  • Milestone Title: Wrap presentation
    Milestone Description: Presenting the project outcomes at the January CAREERS monthly meeting.
    Completion Date Goal: 2024-01-10
Github Contributions
Planned Portal Contributions (if any)
Planned Publications (if any)
What will the student learn? The student will learn to test and deploy machine learning and deep learning models that transcribe handwritten text and identify objects in historical photographs, including how to deploy these models on distributed computing resources.

What will the mentee learn?
What will the Cyberteam program learn from this project?
HPC resources needed to complete this project? Access to Amarel, the Rutgers HPC cluster.
Notes
What is the impact on the development of the principal discipline(s) of the project?
What is the impact on other disciplines?
Is there an impact on physical resources that form infrastructure?
Is there an impact on the development of human resources for research computing?
Is there an impact on institutional resources that form infrastructure?
Is there an impact on information resources that form infrastructure?
Is there an impact on technology transfer?
Is there an impact on society beyond science and technology?
Lessons Learned
Overall results