Submission information
Submission Number: 167
Submission ID: 3796
Submission UUID: f7842b35-5003-45ea-9f50-527cf4532103
Submission URI: /form/project
Created: Wed, 06/28/2023 - 10:07
Completed: Wed, 06/28/2023 - 10:07
Changed: Mon, 07/03/2023 - 16:35
Remote IP address: 128.6.36.20
Submitted by: Udi Zelzion
Language: English
Is draft: No
Webform: Project
Project Title: AI-based Analysis of Historical Handwritten Text Transcription and Historical Image Analysis
Program: CAREERS (323)
Project Image: {Empty}
Tags: ai (271), data-analysis (422), deep-learning (303), distributed-computing (92), python (69), tensorflow (51)
Status: {Empty}

Project Leader
--------------
Project Leader: Sonia Yaco
Email: sonia.yaco@rutgers.edu
Mobile Phone: {Empty}
Work Phone: {Empty}

Project Personnel
-----------------
Mentor(s): {Empty}
Student-facilitator(s): {Empty}
Mentee(s): {Empty}

Project Information
-------------------
Project Description: Margaret Clark Griffis was one of the first women missionaries to modernizing Japan (1870s) and is credited with helping modernize Japanese women's education. From 1872 to 1874, she served as a teacher and assistant principal in the Jo-Gakko girls' school, the first Japanese government school for girls. Margaret kept very detailed diaries during her time in Japan. The importance of these texts lies in their potential to provide insights into Japan's history, culture, and social practices, particularly with respect to Christian missionary activities. The diaries, together with historical images, are part of the William Elliot Griffis Collection at Rutgers' Special Collections and University Archives. This project proposes an AI-based transcription system for historical texts written by Christian missionaries in Japan during the 19th century, along with image analysis of historic photos from Japan from the late 19th century. The use of AI can expedite transcription and improve accuracy while creating digital archives for accessibility. The project aims to facilitate a deeper understanding of Japan's past, as digital archives make the texts more accessible and searchable for researchers and scholars.

Project Information Subsection
------------------------------
Project Deliverables: {Empty}
Student Research Computing Facilitator Profile: We are looking for a graduate student to conduct research on applying AI methods to understand the text and image collections at the libraries. The student will test and deploy machine learning and deep learning models to transcribe the handwritten text and identify the objects in the historical photographs. The project requires data exploration, modeling, and analysis with Python, using packages such as Pandas, scikit-learn, and TensorFlow.
Mentee Research Computing Profile: {Empty}
Student Facilitator Programming Skill Level: Practical applications
Mentee Programming Skill Level: {Empty}
Project Institution: Rutgers University
Project Address: CoRE Building, 96 Frelinghuysen Road, Piscataway, New Jersey 08854
Anchor Institution: CR-Rutgers
Preferred Start Date: 07/01/2023
Start as soon as possible.: Yes
Project Urgency: Already behind3Start date is flexible
Expected Project Duration (in months): 6
Launch Presentation: {Empty}
Launch Presentation Date: 07/12/2023
Wrap Presentation: {Empty}
Wrap Presentation Date: 01/10/2024
Project Milestones:
- Milestone Title: Launch presentation
  Milestone Description: Present an overview of the project at the July monthly CAREERS meeting.
  Completion Date Goal: 2023-07-12
- Milestone Title: Refine Model
  Milestone Description: Refining the "Griffis" handwriting transcription model.
  Completion Date Goal: 2023-08-16
- Milestone Title: Run on larger dataset and data analysis
  Milestone Description: Applying the handwriting transcription model to additional texts and analyzing the results.
  Completion Date Goal: 2023-09-13
- Milestone Title: Selecting images for test set
  Milestone Description: Selecting the images for the test set for model training.
  Completion Date Goal: 2023-10-11
- Milestone Title: Applying Model
  Milestone Description: Applying the model to a large collection of images and combining the data with the transcription data.
  Completion Date Goal: 2023-12-21
- Milestone Title: Wrap presentation
  Milestone Description: Presenting the project outcomes at the January CAREERS monthly meeting.
  Completion Date Goal: 2024-01-10
Github Contributions: {Empty}
Planned Portal Contributions (if any): {Empty}
Planned Publications (if any): {Empty}
What will the student learn?: The student will learn to test and deploy machine learning and deep learning models to transcribe handwritten text and identify objects in historical photographs, and will learn how to deploy the models on distributed computing resources.
What will the mentee learn?: {Empty}
What will the Cyberteam program learn from this project?: {Empty}
HPC resources needed to complete this project?: Access to Rutgers' HPC cluster, Amarel
Notes: {Empty}

Final Report
------------
What is the impact on the development of the principal discipline(s) of the project?: {Empty}
What is the impact on other disciplines?: {Empty}
Is there an impact on physical resources that form infrastructure?: {Empty}
Is there an impact on the development of human resources for research computing?: {Empty}
Is there an impact on institutional resources that form infrastructure?: {Empty}
Is there an impact on information resources that form infrastructure?: {Empty}
Is there an impact on technology transfer?: {Empty}
Is there an impact on society beyond science and technology?: {Empty}
Lessons Learned: {Empty}
Overall results: {Empty}