From notebook to pipeline in no time with LineaPy
Abstract
The nightmare before data science production: You found a working prototype for your problem using a Jupyter notebook, and now it’s time to build a production-grade solution from that notebook. Unfortunately, your notebook looks anything but production-grade. You embark on a time-consuming journey of refactoring the notebook. Relevant and irrelevant code snippets are scattered across different cells, but you persevere. Midway through your journey, you realize that your refactoring is not immune to the reproducibility issues caused by deleted cells and out-of-order cell executions. We haven’t even talked about the creation of a pipeline from that notebook yet! A desperate situation indeed. The good news is, there’s finally a cure! The open-source Python package LineaPy aims to automate data science workflow generation and expedite the process of going from data science development to production. It transforms messy notebooks into data pipelines for orchestration frameworks such as Apache Airflow, DVC, Argo, Kubeflow, and many more. And if you can’t find your favorite orchestration framework, you are welcome to work with the creators of LineaPy to contribute a plugin for it! In this talk, you will learn the basic concepts of LineaPy and how it supports your everyday tasks as a data practitioner. To that end, we will transform a notebook together, step by step, into a DVC pipeline. Finally, we will discuss what place LineaPy will take in the MLOps universe. Will you only have to check in your notebook in the future?
Date
Apr 17, 2023 10:50 AM – 11:35 AM
Event
Location
bcc Berlin Congress Center