Jupyter Notebooks are well suited to fast experimentation. However, poor programming practices can lead you into the unreproducible-code trap, which can seriously undermine your research goals. In this talk, we will guide you through a practical, quick process for avoiding this trap by migrating your notebook code into a clean, reproducible Python package. Doing so improves testing, automation, and reproducibility, and strengthens your long-term scientific workflows. We will cover important software engineering fundamentals: separating concerns, organizing modules, adding environment controls, and enabling CLI execution for HPC environments. You will see the benefits that Python packages provide, of which reproducibility is the most valuable for long-term research work. We will also look at industry fields where these techniques are used extensively to improve ML pipelines (a practice known as MLOps), enabling companies to deliver value to their customers quickly. Mastering this set of skills will also make you an attractive candidate for the many companies and startups hiring in today's AI market.
Architecting Reproducible Science: A Practical Path Beyond the Notebook
Remote event
Instructor
Fernando Garzon
Data Scientist and Software Engineer, SDSC
Fernando has worked as a Data Scientist and Software Engineer at the San Diego Supercomputer Center (SDSC) since August 2022. Although his background is in Physics, he transitioned into Scientific Computing and Software Engineering, focusing on building reproducible, scalable workflows for research and HPC environments. His work spans data engineering, backend development, distributed systems, and modern AI/ML pipelines. Fernando is also part of the Open Science Chain project, where he helps design and implement secure, high-performance architectures using Python, TypeScript, and blockchain technologies. He is passionate about bridging the gap between exploratory research code and production-ready scientific software.