Go From Notebook To Pipeline For Your Data Science Projects With Orchest

The Python Podcast.__init__ - A podcast by Tobias Macey

Summary

Jupyter notebooks are a dominant tool for data scientists, but they lack a number of conveniences for building reusable and maintainable systems. Machine learning projects in particular need a way to pivot from exploring a dataset or problem to integrating the resulting solution into a larger workflow. Rick Lamers and Yannick Perrenet were tired of struggling with one-off solutions when they created the Orchest platform. In this episode they explain how Orchest allows you to turn your notebooks into executable components that are integrated into a graph of execution for running end-to-end machine learning workflows.

Announcements

Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle-tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
Your host as usual is Tobias Macey and today I’m interviewing Rick Lamers and Yannick Perrenet about Orchest, a development environment designed for building data science pipelines from notebooks and scripts.

Interview

Introductions
How did you get introduced to Python?
Can you start by giving an overview of what Orchest is and the story behind it?
Who are the users that you are building Orchest for and what are their biggest challenges?
What are some examples of the types of tools or workflows that they are using now?
What are some of the other tools or strategies in the data science ecosystem that Orchest might replace? (e.g. MLFlow, Metaflow, etc.)
What problems does Orchest solve?
Can you describe how Orchest is implemented?
How have the design and goals of the project changed since you first started working on it?
What is the workflow for someone who is using Orchest?
What are some of the sharp edges that they might run into?
What is the deployable unit once a pipeline has been created?
How do you handle verification and promotion of pipelines across staging and production environments?
What are the interfaces available for integrating with or extending Orchest?
How might an organization incorporate a pipeline defined in Orchest with the rest of their data orchestration workflows?
How are you approaching governance and sustainability of the Orchest project?
What are the most interesting, innovative, or unexpected ways that you have seen Orchest used?
What are the most interesting, unexpected, or challenging lessons that you have learned while building Orchest?
When is Orchest the wrong choice?
What do you have planned for the future of the project and company?

Keep In Touch

Rick
  ricklamers on GitHub
  LinkedIn
  @RickLamers on Twitter
Yannick
  yannickperrenet on GitHub
  LinkedIn

Picks

Tobias
  Fresh Bagels
Rick
  Vaex
Yannick
  Cookiecutter
  Pyenv

Links

Orchest
Geoffrey Hinton
Yann LeCun
CoffeeScript
Vim
GAN == Generative Adversarial Network
Git
SQL
BigQuery
Software Carpentry (Podcast Episode)
Google Colab
Airflow (Podcast Episode)
Kedro (Data Engineering Podcast Episode)
nbdev (Podcast Episode)
Papermill (Data Engineering Podcast Episode)
MLFlow
Metaflow (Podcast Episode)
DVC (Podcast Episode)
Andrew Ng
Kubeflow
Lua
Caddy
Traefik
DAG == Directed Acyclic Graph
Jupyter Enterprise Gateway
Streamlit
Kubernetes
Dagster (Podcast.__init__ Episode, Data Engineering Podcast Episode)
DBT (Data Engineering Podcast Episode)
GitLab
Spark
ETL

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA