
Jupyter Notebook Deployment - No more ipynb to Python

Deploy any notebook as a serverless function in a few minutes

Jupyter Notebooks are arguably among the most popular tools used by data engineers and data scientists worldwide. Data ETL, machine learning training, experimentation, model testing, model inference – all can be done from the Jupyter Notebook interface itself. These notebooks are also excellent at generating visual reports and dashboards and at training ML models. But while Jupyter Notebook is an excellent IDE for these tasks, it is not easy to put notebooks into an automated pipeline that performs them on a recurring basis. Yet reports, dashboards and ML models all need regular refreshes based on new incoming data.

Often people resort to converting their IPython notebook (.ipynb) files to Python script (.py) files, which can then be deployed in a pipeline and invoked programmatically on a recurring schedule. Apart from the developer effort this conversion requires, one huge drawback is the need to maintain and manage duplicate code bases.

Papermill solved this problem by allowing one to run a Jupyter Notebook (.ipynb) file as if it were a Python script (.py) file. Netflix is a contributor to this project and a big promoter of the idea of using Jupyter Notebooks in ETL and data pipelines. Papermill supports notebook parameterization, which lets us override the value of any variable used inside the notebook at invocation time. This opens up a whole new way of running automated ETL jobs and ML training, where the output notebook becomes a one-stop immutable record of the cron job, with report, dashboard, logs and error messages all in one place.
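
For instance, a parameterized notebook can be run headlessly from a plain Python script or cron job (a minimal sketch; papermill must be installed, and the notebook and variable names here are illustrative):

    import papermill as pm

    # Execute the notebook, overriding the tagged parameters.
    # The executed copy, with all cell outputs, logs and errors,
    # becomes the immutable record of this run.
    pm.execute_notebook(
        "etl.ipynb",             # input notebook
        "etl-output.ipynb",      # executed output notebook
        parameters={"input_dataset_url": "s3://mybucket/new_data.zip"},
    )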

Clouderizer supports deploying Jupyter Notebooks as serverless functions using Papermill. There is no need to convert your ipynb file to Python. Any Jupyter Notebook can be deployed to scalable serverless infrastructure with just one CLI command, in under 2 minutes!

*Note: Only Python notebook support is in production right now. R notebook support is in beta; please contact us if you want early access. If you have requirements for other kernels, please send your request to info@clouderizer.com.

Examples

Prerequisites:

  • Active Clouderizer account. Sign up for a free account here if you don’t have one.
  • Install the Clouderizer CLI and log in. Read here for detailed instructions.



Example 1:

Deploy a Python ETL notebook (etl.ipynb) to Clouderizer as a serverless function. The notebook takes one S3 URL as input to load the data. This input is held in a notebook variable named input_dataset_url.

  1. Parameterize your notebook for the input S3 URL as per the Papermill guidelines (essentially, make sure the S3 URL variable is in a single cell and tag that cell as parameters).
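
    For example, the tagged parameters cell might contain just a default value that Papermill overrides at invocation time (a minimal sketch; the default URL is illustrative):

    # Cell tagged "parameters"
    input_dataset_url = "https://mys3bucket.s3.amazonaws.com/default.zip"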

  2. Type the following command in a terminal window

    cldz deploy -n python etl.ipynb

  3. The Clouderizer CLI will try to auto-detect dependencies from your notebook and show you a preview. If the dependencies look OK, press y to continue. If something is missing, compile the list of dependencies in a requirements.txt file and deploy again with

    cldz deploy -n python etl.ipynb requirements.txt
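
    A hypothetical requirements.txt for an ETL notebook might look like this (the packages listed are purely illustrative):

    pandas
    boto3
    requests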

  4. That’s it. This will deploy your notebook as a serverless function. Packaging and deployment might take a few minutes depending on the dependencies. Once deployment is complete, you will get two URLs for your serverless function:
    1. Sync URL – use this to invoke the function when you know the notebook takes less than a minute to execute.
    2. Async URL – use this to invoke the function when the notebook takes longer to execute.

  5. Example invocation using curl

    curl -i -X POST -F input_dataset_url=https://mys3bucket.s3.amazonaws.com/dataset.zip?1234 https://showcase.clouderizer.com/api/async-function/clouderizerdemo-etl/notebook

    The above is an async invocation. It returns immediately with a 202 Accepted HTTP response code. You can log in to the Clouderizer console to see the progress of this request. Once the function execution is complete, the output notebook is available for viewing and download.
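
    The same invocation can also be made from Python, e.g. from a scheduler or another pipeline step (a minimal sketch using the requests library; the URLs are the ones from the example above):

    import requests

    # Mirror of the curl -F call: send the parameter as a multipart form field.
    resp = requests.post(
        "https://showcase.clouderizer.com/api/async-function/clouderizerdemo-etl/notebook",
        files={"input_dataset_url": (None, "https://mys3bucket.s3.amazonaws.com/dataset.zip?1234")},
    )
    print(resp.status_code)  # expect 202 for an accepted async invocation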



Example 2:

Deploy a TensorFlow deep learning notebook (tf_deeplearning.ipynb) as a serverless function with GPU support. The deployment takes an S3 URL for the input dataset and an integer batch size as inputs. Both inputs are defined as notebook variables named input_dataset_url and batch_size. The notebook also generates a model file, exported to a local folder path held in the variable outputDir.

  1. Parameterize your notebook for the inputs as per the Papermill guidelines (essentially, make sure all input variables are in a single cell and tag that cell as parameters).
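
    As in Example 1, the tagged parameters cell might hold defaults for all three variables (a minimal sketch; the values are illustrative):

    # Cell tagged "parameters" – overridden at invocation time
    input_dataset_url = "https://mys3bucket.s3.amazonaws.com/default.zip"
    batch_size = 64
    outputDir = "./model_output"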

  2. Type the following command in a terminal window

    cldz deploy -n python tf_deeplearning.ipynb --infra gpu

    *Note the --infra option for GPU deployment.

  3. The Clouderizer CLI will try to auto-detect dependencies from your notebook and show you a preview. If the dependencies look OK, press y to continue. If something is missing, compile the list of dependencies in a requirements.txt file and deploy again with

    cldz deploy -n python tf_deeplearning.ipynb requirements.txt --infra gpu

  4. That’s it. This will deploy your notebook as a serverless function. Packaging and deployment might take a few minutes depending on the dependencies. Once deployment is complete, you will get two URLs for your serverless function:
    1. Sync URL – use this to invoke the function when you know the notebook takes less than a minute to execute.
    2. Async URL – use this to invoke the function when the notebook takes longer to execute.

  5. Example invocation using curl

    curl -i -X POST -F input_dataset_url=https://mys3bucket.s3.amazonaws.com/dataset.zip?1234 -F batch_size=128 https://showcase.clouderizer.com/api/async-function/clouderizerdemo-tf-deeplearning/notebook -H "X-Callback-Url: https://mywebhook.com"

*Note the callback URL provided in the X-Callback-Url HTTP header.

The above is an async invocation. It returns immediately with a 202 Accepted HTTP response code. Once execution is complete, the callback URL specified in the request is called with the HTTP result, which contains the S3 URL of the model file generated during the notebook invocation.
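
On the receiving side, the callback URL (https://mywebhook.com above) just needs to accept an HTTP POST. Below is a minimal sketch of such a webhook using Flask, assuming the result arrives as JSON with the model's S3 URL in a field; the exact payload shape and field name here are assumptions, not documented behavior:

    from flask import Flask, request

    app = Flask(__name__)

    # Endpoint passed as X-Callback-Url when invoking the function.
    @app.route("/", methods=["POST"])
    def notebook_callback():
        result = request.get_json(silent=True) or {}
        model_url = result.get("model_url")  # assumed field name
        print("Notebook run finished, model at:", model_url)
        return "", 200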

Get help

Product Documentation to get you started and walk you through advanced features.

How To videos covering core features.