Submission Number: 185
Submission ID: 4159
Submission UUID: f0fe8f07-765e-4aef-a2be-593874f1840e
Submission URI: /form/project

Created: Mon, 10/09/2023 - 14:28
Completed: Mon, 10/09/2023 - 14:28
Changed: Mon, 05/20/2024 - 10:35

Remote IP address: 199.223.241.7
Submitted by: John Moustakas
Language: English

Is draft: No
Webform: Project
Project Title Ultrafast Spectral Energy Distribution Modeling of Galaxies using GPUs
Program Campus Champions, CAREERS
Project Image
Tags optimization (509), parallelization (223), astrophysics (297), gpu (80), python (69)
Status Complete
Project Leader John Moustakas
Email jmoustakas@siena.edu
Mobile Phone 518-390-4012
Work Phone 518-783-4274
Mentor(s)
Student-facilitator(s) Samyak Tuladhar
Mentee(s)
Project Description One of the most important outstanding problems in observational and theoretical astrophysics is to understand the physical origin and evolution of galaxies. Galaxies are gravitationally bound systems consisting of tens to hundreds of billions of stars, together with gas, dust, and large amounts of dark matter, and we observe them across the entire 14-billion-year history of the universe. Fortunately, sophisticated models exist which allow us to interpret the observed spectral energy distributions of galaxies (in essence, how bright they appear in different parts of the electromagnetic spectrum, particularly in the ultraviolet, optical, and infrared) in terms of their physical properties such as stellar mass and star-formation rate. For example, the stellar mass of a galaxy reveals how efficiently gas has been converted into stars over the evolutionary history of the galaxy, while the star-formation rate indicates the current rate at which new stars are being born, or whether star formation has ceased entirely.

Not surprisingly, the parameter likelihood space which must be explored in order to effectively model observations of galaxies can be very large. In addition, the latest generation of massively multiplexed astrophysical surveys such as the Dark Energy Spectroscopic Instrument (DESI) survey are observing samples of tens of millions of galaxies. Consequently, there is an acute need for massively parallelized, computationally efficient code which can extract astrophysically meaningful constraints from large observational datasets of galaxies.

The open-source Python software package at the center of this project is FastSpecFit (https://fastspecfit.readthedocs.org/en/latest). The code is reasonably well-documented and it has already been run on a high-performance computing system on samples of millions of galaxies observed by DESI. Two computational bottlenecks, however, currently prevent FastSpecFit from being deployed at the next scale, both in terms of input sample size and complexity of the underlying astrophysical models. These bottlenecks involve non-negative least-squares (NNLS) and non-linear least-squares fitting, both of which are currently being done using the CPU-optimized SciPy library.
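To make the two bottlenecks concrete, the following is a minimal CPU sketch of both problem types using SciPy. It is not taken from FastSpecFit: the template matrix, the single Gaussian emission line, the wavelength grid, and all parameter values and bounds are hypothetical stand-ins chosen only for illustration.

```python
import numpy as np
from scipy.optimize import least_squares, nnls

rng = np.random.default_rng(seed=0)

# NNLS: model a spectrum as a non-negative combination of templates.
# The template matrix here is random (hypothetical), 200 pixels x 5 templates.
templates = rng.uniform(0.0, 1.0, size=(200, 5))
true_coeff = np.array([0.0, 2.0, 0.0, 0.5, 1.0])
template_flux = templates @ true_coeff  # noiseless, so the answer is known
coeff, residual_norm = nnls(templates, template_flux)


def gaussian_line(params, wave):
    """Single Gaussian emission line: params = (amplitude, center, width)."""
    amplitude, center, width = params
    return amplitude * np.exp(-0.5 * ((wave - center) / width) ** 2)


def line_residuals(params, wave, flux, noise):
    """Noise-weighted residuals minimized by least_squares."""
    return (gaussian_line(params, wave) - flux) / noise


# Bounded non-linear least squares: fit one noisy emission line.
wave = np.linspace(6540.0, 6590.0, 200)  # illustrative wavelength grid
true_line = np.array([5.0, 6564.6, 2.0])
noise = 0.1
line_flux = gaussian_line(true_line, wave) + rng.normal(0.0, noise, wave.size)

fit = least_squares(
    line_residuals,
    x0=[3.0, 6563.0, 3.0],
    bounds=([0.0, 6550.0, 0.5], [100.0, 6580.0, 10.0]),
    args=(wave, line_flux, noise),
    x_scale="jac",  # rescale parameters of very different magnitude
)

print("NNLS coefficients:", coeff)
print("Line parameters:  ", fit.x)
```

Both calls run one problem at a time on the CPU, which is the pattern that becomes expensive when repeated over millions of spectra.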

With these issues in mind, the goal of this project is to port the computational "heart" of FastSpecFit to GPUs. We propose using JAX (https://jax.readthedocs.io/en/latest), which uses automatic (or computational) differentiation for optimization. Specifically, the open-source project JAXopt (https://jaxopt.github.io/stable) includes well-tested algorithms for solving a wide range of both linear and non-linear constrained optimization problems using GPU-accelerated, automatic differentiation. After testing these algorithms using simple (simulated) datasets, we will then implement an optional GPU version of FastSpecFit, and ultimately test it on actual DESI data.
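As a rough illustration of how automatic differentiation supports this kind of optimization, the sketch below solves a small non-negative least-squares problem in JAX. It uses a hand-written projected gradient loop, not a JAXopt solver and not the method used in FastSpecFit; the random template matrix, the step-size rule, and the iteration count `N_STEPS` are assumptions made for the example. JAX selects a GPU automatically when one is available and otherwise runs the same code on the CPU.

```python
import jax
import jax.numpy as jnp
import numpy as np

N_STEPS = 2000  # fixed iteration count (assumption; no convergence test)

rng = np.random.default_rng(seed=0)
# Hypothetical template matrix: 200 pixels x 5 templates.
templates = jnp.asarray(rng.uniform(0.0, 1.0, size=(200, 5)))
true_coeff = jnp.array([0.0, 2.0, 0.0, 0.5, 1.0])
flux = templates @ true_coeff  # noiseless, so the answer is known


def objective(coeff, templates, flux):
    """Half the sum of squared residuals."""
    resid = templates @ coeff - flux
    return 0.5 * jnp.sum(resid**2)


# JAX derives the gradient with respect to `coeff` automatically.
grad_objective = jax.grad(objective)


@jax.jit
def projected_gradient_nnls(templates, flux):
    """Gradient steps followed by projection onto coeff >= 0."""
    # Step size 1/L, where L is the largest eigenvalue of templates.T @ templates.
    step = 1.0 / jnp.linalg.norm(templates, ord=2) ** 2

    def body(_, coeff):
        coeff = coeff - step * grad_objective(coeff, templates, flux)
        return jnp.maximum(coeff, 0.0)  # enforce non-negativity

    init = jnp.zeros(templates.shape[1], dtype=templates.dtype)
    return jax.lax.fori_loop(0, N_STEPS, body, init)


coeff = projected_gradient_nnls(templates, flux)
print("Recovered coefficients:", coeff)
```

A library solver would normally replace the hand-written loop; the point here is only that the objective is written once and its gradient comes from `jax.grad`.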
Project Deliverables This project includes three major deliverables:

1. Documentation which clearly describes how all software products and their dependencies (particularly JAX and JAXopt) should be installed and run, both with and without GPUs.

2. Executable, well-documented code which solves both simulated and real-data bounded non-linear least-squares problems.

3. Comparisons (via benchmarking runs) of existing CPU (e.g., scipy.optimize) and GPU/JAX implementations of the identical problems.
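One possible shape for the benchmarking comparison is sketched below. The helper name `benchmark`, the problem size, and the best-of-N timing rule are assumptions for illustration. The warm-up call matters for JAX, where the first call to a jitted function includes compilation, and the `block_until_ready` check accounts for JAX returning results asynchronously.

```python
import time

import numpy as np
from scipy.optimize import nnls


def benchmark(func, *args, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    func(*args)  # warm-up call; for jitted JAX code this absorbs compilation
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        # JAX arrays are computed asynchronously; wait before stopping the clock.
        if hasattr(result, "block_until_ready"):
            result.block_until_ready()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Hypothetical problem: 2000 pixels x 20 non-negative templates.
rng = np.random.default_rng(seed=0)
templates = rng.uniform(0.0, 1.0, size=(2000, 20))
flux = templates @ rng.uniform(0.0, 1.0, size=20)

cpu_seconds = benchmark(nnls, templates, flux)
print(f"SciPy NNLS, best of 5: {cpu_seconds:.2e} s")
```

The same helper can time a GPU/JAX implementation of the identical problem, so that both numbers are measured the same way.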
Student Research Computing Facilitator Profile Samyak (Sam) Tuladhar (sd10tula@siena.edu) is a sophomore undergraduate physics major at Siena College and he has both the interest and technical background needed to undertake this project.
Mentee Research Computing Profile
Student Facilitator Programming Skill Level Some hands-on experience
Mentee Programming Skill Level
Project Institution Siena College
Project Address Department of Physics and Astronomy
515 Loudon Rd
Loudonville, New York 12211
Anchor Institution CR-Rensselaer Polytechnic Institute
Preferred Start Date 12/01/2023
Start as soon as possible. No
Project Urgency Already behind / 3 / Start date is flexible
Expected Project Duration (in months) 6
Launch Presentation
Launch Presentation Date 01/05/2024
Wrap Presentation
Wrap Presentation Date 05/17/2024
Project Milestones
  • Milestone Title: Complete JAX and JAXopt Tutorials
    Milestone Description: Gain familiarity with JAX and JAXopt by completing several of the tutorials at https://jax.readthedocs.io/en/latest/advanced_guide.html and https://jaxopt.github.io/stable/notebooks/index.html.
    Completion Date Goal: 2024-01-15
  • Milestone Title: Generate and model synthetic data
    Milestone Description: Generate synthetic data (a simple emission-line spectrum with noise), code up the objective function, and optimize its performance using JAX/JAXopt.
    Completion Date Goal: 2024-03-01
  • Milestone Title: Modify FastSpecFit to model real data
    Milestone Description: Add a "GPU/JAX mode" to FastSpecFit and use it to model at least one real galaxy spectrum.
    Completion Date Goal: 2024-04-15
  • Milestone Title: Benchmarking and documentation
    Milestone Description: Carry out benchmarking runs on a larger set of spectra and finalize all documentation.
    Completion Date Goal: 2024-06-01
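The synthetic-data milestone could start from something like the sketch below, which generates a small batch of noisy single-line spectra and evaluates a chi-squared objective and its gradient for every spectrum in one vectorized, compiled call. The Gaussian line model, wavelength grid, noise level, and batch size are illustrative assumptions and are not taken from FastSpecFit.

```python
import jax
import jax.numpy as jnp

N_SPECTRA, N_PIX, NOISE = 8, 200, 0.1  # illustrative sizes and noise level
wave = jnp.linspace(6540.0, 6590.0, N_PIX)  # illustrative wavelength grid


def gaussian_line(params, wave):
    """Single Gaussian emission line: params = (amplitude, center, width)."""
    amplitude, center, width = params
    return amplitude * jnp.exp(-0.5 * ((wave - center) / width) ** 2)


def chi2(params, wave, flux, noise):
    """Objective: sum of squared, noise-weighted residuals."""
    return jnp.sum(((gaussian_line(params, wave) - flux) / noise) ** 2)


# One row of true parameters per synthetic spectrum.
key_amp, key_noise = jax.random.split(jax.random.PRNGKey(0))
amplitudes = jax.random.uniform(key_amp, (N_SPECTRA,), minval=2.0, maxval=8.0)
true_params = jnp.stack(
    [amplitudes, jnp.full(N_SPECTRA, 6564.6), jnp.full(N_SPECTRA, 2.0)], axis=1
)

# Noise-free spectra, then Gaussian noise added to every pixel.
clean = jax.vmap(gaussian_line, in_axes=(0, None))(true_params, wave)
flux = clean + NOISE * jax.random.normal(key_noise, clean.shape)

# Objective and gradient for all spectra at once: vmap batches over the
# parameters and fluxes, jit compiles the batched function.
batched_value_and_grad = jax.jit(
    jax.vmap(jax.value_and_grad(chi2), in_axes=(0, None, 0, None))
)
values, grads = batched_value_and_grad(true_params, wave, flux, NOISE)

print("chi2 per spectrum:", values)
print("gradient shape:   ", grads.shape)
```

Evaluated at the true parameters, each chi-squared value should be close to the number of pixels, which gives a quick sanity check before handing the objective to an optimizer.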
Github Contributions https://github.com/Samyak-DT/FasterSpecFit
Planned Portal Contributions (if any)
Planned Publications (if any) If successful, I anticipate describing the proposed work and its outcomes in a larger publication which will most likely be submitted to The Astrophysical Journal, one of the top astrophysical journals in the world. Alternatively, depending on the interests of the student, we could prepare a shorter, more technical paper and submit it to a GPU/HPC computing journal (TBD).
What will the student learn? The student will learn how deploying GPUs on HPC systems can lead to significant improvements in computing speed, and how those speed-ups directly improve our ability to do science with large astronomical datasets. The student will also improve their Python programming skills and learn how to clearly document and communicate their results to collaborators with a wide range of technical backgrounds.

What will the mentee learn?
What will the Cyberteam program learn from this project? JAX and JAXopt are powerful tools for a range of applications in scientific computing, machine learning, artificial intelligence, and much more. The Cyberteam will gain documentation and example code which demonstrate how these packages can be deployed on GPUs on HPC systems, as well as benchmarked, well-documented code which illustrates how they can be applied to solve one specific class of astrophysics problems.
HPC resources needed to complete this project? We will need access to a multi-node GPU system and a modern software architecture with an isolated software environment where all the code dependencies can be installed (Python, JAX, etc.).
Notes
What is the impact on the development of the principal discipline(s) of the project?
What is the impact on other disciplines?
Is there an impact on physical resources that form infrastructure?
Is there an impact on the development of human resources for research computing?
Is there an impact on institutional resources that form infrastructure?
Is there an impact on information resources that form infrastructure?
Is there an impact on technology transfer?
Is there an impact on society beyond science and technology?
Lessons Learned
Overall results