Scalable Computing Workflows

DSE512 - Spring 2026

Instructor

Scott Emrich
Office: 608 Min Kao
Phone: (865) 974-3891; E-mail: semrich at utk.edu
Office hours: TBD, and by appointment

Overview

This course covers foundational concepts for building scalable and reproducible computational workflows. Students will learn how to decompose work into jobs, manage dependencies, handle failure, and execute workloads using batch schedulers and containers. This course is neither a Spark certification course nor an MPI programming class. The course will emphasize principles that generalize across scientific HPC environments and industry data platforms, preparing students to reason about scale, cost, and the responsible use of shared computing systems.

Text and syllabus

The syllabus can be found here

Schedule

Date Topic Homework Notes
1/20/2026 Intro to scalable computing    
1/22/2026 Spark vs. SLURM    
1/27/2026 Responsible HPC usage    
1/29/2026 Intro to containers and DAGs    
2/03/2026 Job Arrays and Parameter Sweeps    
2/05/2026 Failure on purpose    
2/10/2026 Workflow design at a high level    
2/12/2026 From hand-drawn DAGs to executable workflows    
2/17/2026 From research idea to workflow: Capstone pitches    
2/19/2026 Execution frameworks as workflow realizations    
2/24/2026 SLURM jam: From design to debugging    
2/26/2026 Project DAG Design Studio (no class)    
3/03/2026 Scaling is a tradeoff, not a goal    
3/05/2026 Designing for humans: logging, monitoring, and debuggability    
3/10/2026 Spring break (no class)    
3/12/2026 Spring break (no class)    
3/17/2026 Interactive compute as a tool    
3/19/2026 Interactive job jam!    
3/24/2026 Data movement & I/O in HPC workflows    
3/26/2026 Project architecture presentation prep (no class)    
3/31/2026 Scheduler guest lecture (student choice)    
4/02/2026 Spring recess (no class)    
4/07/2026 Initial project presentations for feedback    
4/09/2026 Cost and Performance Thinking    
4/14/2026 Evaluating, not just executing, workflows    
4/16/2026 Workflow jam: designing towards evaluation    
4/21/2026 Guest lecture: "War stories" from UTK NICS (Crosby)    
4/23/2026 Guest lecture: "War stories" from ORNL OLCF (Holman)    
4/28/2026 Draft project presentations (in class)    
4/30/2026 Study hall to finalize projects    
5/05/2026 Projects due (no formal class; videos will be shared)    

Academic dishonesty

All students are required to abide by the DSE and University Honor Code.

Discussion of concepts and general approaches with classmates is encouraged; however, unless explicitly stated otherwise, all submitted code and written answers must be developed and written individually.

You may use external resources—including documentation, textbooks, and generative AI tools—as learning aids (e.g., to clarify syntax, understand error messages, or review general concepts). However, relying on such tools to generate substantial portions of assignment solutions, complete implementations, or logic specific to a graded task is not permitted.

If you are unsure whether a particular use is allowed, please ask. As a guiding principle: if you could not explain or re-derive the solution without the tool, then its use was inappropriate. Submitted work must reflect your own understanding and effort.