Submission information
Submission Number: 165
Submission ID: 3769
Submission UUID: af0b1e88-3357-4b39-943a-c850fd242402
Submission URI: /form/project
Created: Tue, 06/13/2023 - 08:10
Completed: Tue, 06/13/2023 - 08:10
Changed: Fri, 04/19/2024 - 14:12
Remote IP address: 173.59.116.37
Submitted by: Vinayak Mathur
Language: English
Is draft: No
Webform: Project
Project Title | High throughput Python pipeline to identify Horizontal Gene Transfer |
---|---|
Program | CAREERS |
Project Image | |
Tags | bioinformatics, biology, data-wrangling, genomics, github, python, workflow |
Status | Halted |
Project Leader | Vinayak Mathur |
Email | vm7027@cabrini.edu |
Mobile Phone | 7324214925 |
Work Phone | |
Mentor(s) | Simon Delattre |
Student-facilitator(s) | Kendrick Key |
Mentee(s) | |
Project Description | This project investigates horizontal gene transfer (HGT), specifically in interactions between bacteriophages and their host bacteria. Biologically, this type of HGT occurs when a bacteriophage attaches to a bacterial cell and injects its genetic material, which can integrate into the host genome and take control of the bacterium to make copies of the phage. The main aim of the project is to develop an analysis pipeline written in Python that automatically generates a large output list of bacterial accession numbers from an input list of phage accession numbers. The current program uses BLAST to create this list. The pipeline iterates through the input list and submits each phage accession number as a BLAST query to be aligned against the NCBI database of bacterial genes. The top bacterial result for each phage query ID is stored and then aligned against the database of bacteriophage genes in turn. If the phage result of this second search, in which the bacterial accession number is the query ID, matches the original phage query ID, the match indicates the presence of HGT. Conducting this analysis in an HPC environment accessed over SSH could significantly speed up data collection compared with the current pipeline or with manual searches on the NCBI website, where BLAST is available. The current version of the pipeline is available here: https://github.com/genomesolver/CSPpipeline
Research goals: This research project has three major goals:
1) Identify instances of HGT in a large dataset of bacteriophage proteins: The list produced by the program enables more in-depth analysis of bacteriophage-mediated HGT.
2) Predict the likelihood of HGT: By developing a probabilistic classifier, we can attempt to predict the likelihood that a certain clade of bacteria is affected by HGT given the HGT status of the other members of the clade. This model could help establish the statistical significance of occurrences of HGT in bacterial relatives and help identify cellular features specific to those groups of bacteria that could explain their vulnerability to infection by phages.
3) Functional analysis: A Gene Ontology (GO) enrichment analysis is a further research aim, intended to extract meaningful conclusions from this data. Because the current version of the pipeline generates a list of bacterial accession numbers that correspond to phage query IDs, that list can be processed to find GO terms in groups of genes regulated by the integration of the bacteriophage's nucleic acids. This analysis would help visualize and improve understanding of how phage infections disrupt the genetic network of the bacteria. |
Project Deliverables | The goals of the project are: 1) Fine-tune the existing Python pipeline so that it can analyze larger datasets 2) Use an offline version of the NCBI database to run the analysis 3) Develop a model to predict the likelihood of HGT |
Student Research Computing Facilitator Profile | |
Mentee Research Computing Profile | |
Student Facilitator Programming Skill Level | Some hands-on experience |
Mentee Programming Skill Level | |
Project Institution | Cabrini University |
Project Address | 610 King of Prussia Road, IAD 224, Radnor, Pennsylvania 19087 |
Anchor Institution | CR-Penn State |
Preferred Start Date | |
Start as soon as possible. | No |
Project Urgency | Already behind 3 Start date is flexible |
Expected Project Duration (in months) | 4 |
Launch Presentation | |
Launch Presentation Date | |
Wrap Presentation | |
Wrap Presentation Date | |
Project Milestones | |
Github Contributions | https://github.com/genomesolver/CSPpipeline |
Planned Portal Contributions (if any) | |
Planned Publications (if any) | Plan to publish the manuscript in the journal: https://iubmb.onlinelibrary.wiley.com/journal/15393429 |
What will the student learn? | |
What will the mentee learn? | |
What will the Cyberteam program learn from this project? | |
HPC resources needed to complete this project? | |
Notes | |
What is the impact on the development of the principal discipline(s) of the project? | |
What is the impact on other disciplines? | |
Is there an impact on physical resources that form infrastructure? | |
Is there an impact on the development of human resources for research computing? | |
Is there an impact on institutional resources that form infrastructure? | |
Is there an impact on information resources that form infrastructure? | |
Is there an impact on technology transfer? | |
Is there an impact on society beyond science and technology? | |
Lessons Learned | |
Overall results |
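Illustrative note on the Project Description (a sketch, not a project result): the two-way BLAST check described there, in which a phage query's top bacterial hit is searched back against phage sequences and a return to the original phage is taken as a sign of HGT, might look roughly like the following. This is not the code in the CSPpipeline repository. It assumes both searches were already run and saved in BLAST tabular format (`-outfmt 6` with the default twelve columns), and that "top result" means the highest bit score. The function names `best_hits` and `reciprocal_matches` and all accession IDs are hypothetical.

```python
import csv
import io

# Column positions in BLAST tabular output (-outfmt 6, default fields):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
QSEQID, SSEQID, BITSCORE = 0, 1, 11


def best_hits(tabular_text):
    """Map each query ID to the subject ID with the highest bit score."""
    best = {}  # query ID -> (bit score, subject ID)
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, score = row[QSEQID], row[SSEQID], float(row[BITSCORE])
        if query not in best or score > best[query][0]:
            best[query] = (score, subject)
    return {query: subject for query, (score, subject) in best.items()}


def reciprocal_matches(phage_to_bacteria, bacteria_to_phage):
    """Return sorted (phage, bacterium) pairs whose top hits point at each other."""
    pairs = []
    for phage, bacterium in phage_to_bacteria.items():
        if bacteria_to_phage.get(bacterium) == phage:
            pairs.append((phage, bacterium))
    return sorted(pairs)


# Hypothetical accession IDs and scores, for illustration only.
forward = (  # phage queries searched against bacterial sequences
    "PHAGE_1\tBACT_A\t98.0\t200\t4\t0\t1\t200\t1\t200\t1e-100\t380\n"
    "PHAGE_1\tBACT_B\t80.0\t200\t40\t0\t1\t200\t1\t200\t1e-50\t190\n"
    "PHAGE_2\tBACT_C\t90.0\t150\t15\t0\t1\t150\t1\t150\t1e-70\t250\n"
)
reverse = (  # top bacterial hits searched back against phage sequences
    "BACT_A\tPHAGE_1\t98.0\t200\t4\t0\t1\t200\t1\t200\t1e-100\t380\n"
    "BACT_C\tPHAGE_9\t91.0\t150\t13\t0\t1\t150\t1\t150\t1e-72\t260\n"
)

candidates = reciprocal_matches(best_hits(forward), best_hits(reverse))
print(candidates)  # [('PHAGE_1', 'BACT_A')]
```

Here `PHAGE_2` is not reported because its top bacterial hit, `BACT_C`, points back to a different phage. Working from saved tabular files rather than live web queries is one way the same check could be run against an offline copy of the database, as the deliverables propose.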