Language models and using HPC resources
Documentation and research based on the latest NLP text generation detection methods for 2023.
Big Data Research at the University of Colorado Boulder
Background: Big data, defined as having high volume, complexity or velocity, have the potential to greatly accelerate research discovery. Such data can be challenging to work with and require research support and training to address technical and ethical challenges surrounding big data collection, analysis, and publication.
Methods: The present study was conducted via a series of semi-structured interviews to assess big data methodologies employed by CU Boulder researchers across a broad sample of disciplines, with the goal of illuminating how they conduct their research; identifying challenges and needs; and providing recommendations for addressing them.
Findings: Key results and conclusions from the study indicate: gaps in awareness of existing big data services provided by CU Boulder; open questions surrounding big data ethics, security and privacy issues; a need for clarity on how to attribute credit for big data research; and a preference for a variety of training options to support big data research.
The Official Documentation of Pandas
Pandas is one of the most essential Python libraries for data analysis and manipulation. It provides high-performance, easy-to-use data structures, and data analysis tools for the Python programming language. The official documentation serves as an in-depth guide to using this powerful tool including explanations and examples.
NumPy - a Python Library
NumPy is a Python package that uses typed arrays and compiled C code to make many math operations in Python efficient. It is especially useful for matrix manipulation and operations.
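As a rough illustration of the typed, compiled-code approach described above, the sketch below builds two small arrays with an explicit dtype and applies a matrix product and an element-wise operation without writing any Python loops. The array values and variable names are made up for the example, and it assumes NumPy is installed.

```python
import numpy as np

# Every element of an array shares one dtype and is stored contiguously,
# which is what lets NumPy hand the work to compiled code.
a = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float64)
b = np.array([[0.0, 1.0], [1.0, 0.0]], dtype=np.float64)

# Matrix product: computed in compiled code, not in a Python-level loop.
product = a @ b

# Element-wise operations are vectorized over the whole array as well.
scaled = a * 2.0

# Common matrix manipulation: transpose.
transposed = a.T

print(product)
print(scaled)
print(transposed)
```

Multiplying by `b` here simply swaps the columns of `a`, which makes the result easy to check by hand.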
Slurm Tutorials
Introduction to the Slurm Workload Manager for users and system administrators, plus some material for Slurm programmers.
ACCESS Events and Training
Listing of upcoming ACCESS-related events and training activities.
MATLAB bioinformatics toolbox
Bioinformatics Toolbox provides algorithms and apps for Next Generation Sequencing (NGS), microarray analysis, mass spectrometry, and gene ontology. Using toolbox functions, you can read genomic and proteomic data from standard file formats such as SAM, FASTA, CEL, and CDF, as well as from online databases such as the NCBI Gene Expression Omnibus and GenBank.
Texas A&M HPRC Training Site
Training Resources and Courses offered by Texas A&M's Research Computing Group
Spack Documentation
Spack is a package manager for supercomputers that can help administrators install scientific software and libraries for multiple complex software stacks.
Neurodesk
Neurodesk provides a containerised data analysis environment to facilitate reproducible analysis of neuroimaging data. Analysis pipelines for neuroimaging data typically rely on specific versions of packages and software, and are dependent on their native operating system. These dependencies mean that a working analysis pipeline may fail or produce different results on a new computer, or even on the same computer after a software update. Neurodesk provides a platform in which anyone, anywhere, using any computer can reproduce your original research findings given the original data and analysis code.
Master's in Data Science Program Guide - TechGuide
A master’s degree in data science helps prepare professionals to take the next career step. This article will focus primarily on data science, a graduate degree in this field, and a data scientist or data analyst career. With many employers preferring a master’s degree in data science for those seeking to fill roles as data scientists or analysts, we will discuss the data science master’s degree in detail.
marimo | a next generation python notebook
Introductory seminar on marimo, a new reactive Python notebook, presented by a marimo ambassador.
Fundamentals of Cloud Computing
An introduction to cloud computing.
Oak Ridge Leadership Computing Facility (OLCF) Training Events and Archive
Upcoming training events and archives of training materials detailing general HPC best practices as well as how to use OLCF resources and services.
Factor Graphs and the Sum-Product Algorithm
A tutorial paper that presents a generic message-passing algorithm, the sum-product algorithm, that operates in a factor graph. Following a single, simple computational rule, the sum-product algorithm computes, either exactly or approximately, various marginal functions derived from the global function. A wide variety of algorithms developed in artificial intelligence, signal processing, and digital communications can be derived as specific instances of the sum-product algorithm, including the forward/backward algorithm, the Viterbi algorithm, the iterative "turbo" decoding algorithm, Pearl's (1988) belief propagation algorithm for Bayesian networks, the Kalman filter, and certain fast Fourier transform (FFT) algorithms.
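To make the message-passing idea concrete, here is a minimal sketch of the sum-product rule on a tiny chain-shaped factor graph. The model is hypothetical and not taken from the paper: three binary variables x1, x2, x3 and a global function g(x1, x2, x3) = fa(x1) * fb(x1, x2) * fc(x2, x3), with made-up factor values. It computes the unnormalized marginal of x2 by passing messages and checks the result against brute-force summation.

```python
from itertools import product

STATES = (0, 1)

# Hypothetical factor tables, chosen only for illustration.
fa = {0: 0.6, 1: 0.4}
fb = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}
fc = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.5, (1, 1): 0.5}


def marginal_x2_sum_product():
    """Unnormalized marginal of x2 computed by message passing."""
    # A leaf factor sends its own table as the message to its variable.
    msg_fa_to_x1 = {x1: fa[x1] for x1 in STATES}
    # x1 has no other neighbours besides fb, so it forwards that message.
    msg_x1_to_fb = msg_fa_to_x1
    # A factor multiplies its table by incoming messages and sums out
    # every variable except the recipient.
    msg_fb_to_x2 = {
        x2: sum(fb[(x1, x2)] * msg_x1_to_fb[x1] for x1 in STATES)
        for x2 in STATES
    }
    # A leaf variable sends the unit message.
    msg_x3_to_fc = {x3: 1.0 for x3 in STATES}
    msg_fc_to_x2 = {
        x2: sum(fc[(x2, x3)] * msg_x3_to_fc[x3] for x3 in STATES)
        for x2 in STATES
    }
    # The marginal at a variable is the product of its incoming messages.
    return {x2: msg_fb_to_x2[x2] * msg_fc_to_x2[x2] for x2 in STATES}


def marginal_x2_brute_force():
    """Same marginal, obtained by summing the global function directly."""
    totals = {x2: 0.0 for x2 in STATES}
    for x1, x2, x3 in product(STATES, repeat=3):
        totals[x2] += fa[x1] * fb[(x1, x2)] * fc[(x2, x3)]
    return totals


by_messages = marginal_x2_sum_product()
by_enumeration = marginal_x2_brute_force()
print(by_messages)
print(by_enumeration)
```

Because this graph is a tree, the two results agree up to floating-point rounding. On graphs with cycles the same local rule only gives approximate marginals, which is the "exactly or approximately" distinction in the description.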
National Public Radio (NPR)
Covers the benefits and challenges of selecting a mentor. Offers tips for finding and asking a mentor, and for being a good mentee. Mentions the SMART framework and discrimination, underlines the difference between a mentor and a sponsor, and encourages having more than one mentor. Good tips overall.
Set Up VSCode for Python and GitHub
VSCode is a popular IDE that runs on Windows, macOS, and Linux. This tutorial explains how to set up VSCode for coding in Python. It also shows how to set up GitHub integration within VSCode.
Ask.CI Q&A Platform for Research Computing
Fine-tuning LLMs with PEFT and LoRA
As LLMs get larger, full fine-tuning becomes difficult on consumer hardware. Storing and deploying the tuned models can also be expensive. PEFT (parameter-efficient fine-tuning) addresses this by fine-tuning only a small number of parameters while freezing most of the parameters of the pretrained LLM. The aim is performance comparable to full fine-tuning with only a small number of trainable parameters. This source explains these ideas, goes over LoRA diagrams, and includes a code walkthrough.
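The low-rank idea behind LoRA can be sketched in a few lines of plain Python. This is not the PEFT library API and not the code from the linked walkthrough; the helper names, the `scale` parameter, and the layer sizes are hypothetical. It assumes the standard LoRA formulation: a frozen weight matrix W of shape d x k is adapted by adding the product of two small trainable matrices B (d x r) and A (r x k).

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
            for row in x]


def lora_effective_weight(w, b, a, scale=1.0):
    """Return W + scale * (B @ A). W stays frozen; only B and A would train."""
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def trainable_parameter_counts(d, k, r):
    """Trainable parameters for full fine-tuning versus a rank-r update."""
    full = d * k          # every entry of W is trained
    lora = r * (d + k)    # only B (d x r) and A (r x k) are trained
    return full, lora


# Tiny worked example with rank r = 1 (values are made up).
w = [[1.0, 0.0], [0.0, 1.0]]   # frozen pretrained weights
b = [[1.0], [2.0]]             # trainable, shape 2 x 1
a = [[0.5, 0.0]]               # trainable, shape 1 x 2
effective = lora_effective_weight(w, b, a)

# Hypothetical layer size to show the parameter savings.
full, lora = trainable_parameter_counts(d=4096, k=4096, r=8)
print(effective)
print(full, lora, lora / full)
```

For the hypothetical 4096 x 4096 layer with rank 8, the low-rank update trains 65,536 values instead of 16,777,216, which is why the adapted weights are much cheaper to train and store.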
ACES: Charliecloud Containers for Scientific Workflows (Tutorial)
This tutorial introduces the use of containers with the Charliecloud software suite. It gives participants the background and hands-on experience needed to use basic Charliecloud containers for HPC applications. We discuss what containers are, why they matter for HPC, and how they work. We'll give an overview of Charliecloud, the unprivileged container solution from Los Alamos National Laboratory's HPC Division. Students will learn how to build toy containers and containerize real HPC applications, and then run them on a cluster. Exercises are demonstrated using the ACES cluster, a composable accelerator testbed at Texas A&M University. Students with an allocation on the ACES cluster can follow along with the ACES-specific exercises.
AHPCC documentation
This link points to the documentation website for using AHPCC.
OpenHPC: Beyond the Install Guide
Materials for the "OpenHPC: Beyond the Install Guide" half-day tutorial, first offered at PEARC24. The goal of this repository is to let instructors or self-learners construct one or more OpenHPC 3.x virtual environments, to keep those environments as close as possible to the defaults from the OpenHPC installation guide, and to then use those environments to demonstrate several topics beyond the basic installation guide.
Topics include:
1. Building a login node that's practically identical to a compute node (except for where it needs to be different)
2. Adding more security to the SMS and login node
3. Using node-local storage for the OS and/or scratch
4. De-coupling the SMS and the compute nodes (e.g., independent kernel versions)
5. GPU driver installation (simulated/recorded, not live)
6. Easier management of node differences (GPU or not, diskless/single-disk/multi-disk, InfiniBand or not, etc.)
7. Slurm configuration to match some common policy goals (fair share, resource limits, etc.)
InsideHPC
InsideHPC is an informational site that offers videos, research papers, articles, and other resources focused on machine learning and quantum computing, among other topics within high performance computing.
Active inference textbook
This textbook is the first comprehensive treatment of active inference, an integrative perspective on brain, cognition, and behavior used across multiple disciplines including computational neurosciences, machine learning, artificial intelligence, and robotics. It was published in 2022 and it's open access at this time. The contents in this textbook should be educational to those who want to understand how the free energy principle is applied to the normative behavior of living organisms and who want to widen their knowledge of sequential decision making under uncertainty.
DELTA Introductory Video
Introductory video about DELTA. Speaker: Tim Boerner, Senior Assistant Director, NCSA.