Kubeflow – Jupyter Notebook on Kubernetes

Machine learning and Kubernetes: Kubeflow combines these two subjects. This post describes how to run a sample Jupyter Notebook with Kubeflow version 0.1 (recently announced) on Minikube.

Ksonnet, with its ks command line tool, is needed to get started.

The next step is to perform the steps below:

Most of these steps are taken from the Kubeflow v0.1 announcement, with a few adaptations:

  • stop the cluster before running ks init my-kubeflow to avoid the error below:
    $ ks init my-kubeflow
    INFO Using context "minikube" from kubeconfig file "/home/[user]/.kube/config" 
    INFO Configuring TLS (from file) for retrieving cluster swagger.json 
    INFO Creating environment "default" with namespace "default", pointing to cluster at address "https://192.168.99.100:8443" 
    ERROR Invalid API specification 'version:'
    To undo this simply delete directory 'my-kubeflow' and re-run `ks init`.
    If the error persists, try using flag '--context' to set a different context or run `ks init --help` for more options
    
  • enable port forwarding to the Jupyter server, then open http://127.0.0.1:8100 in the browser:
    kubectl port-forward tf-hub-0 8100:8000 --namespace=${NAMESPACE}
  • access the minikube cluster dashboard in the kubeflow namespace at http://192.168.99.100:30000/#!/overview?namespace=kubeflow:
    minikube dashboard
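
The adaptations above can be sketched as one small script. This is a minimal sketch and not the original list of steps: the run helper and the DRY_RUN variable are hypothetical additions for illustration, the NAMESPACE default of kubeflow is assumed from the dashboard URL, restarting the cluster after ks init is my assumption (later steps need a running cluster), and the remaining steps from the Kubeflow v0.1 announcement are only indicated by a comment.

```shell
#!/usr/bin/env bash
# Sketch of the adapted steps. `run` and DRY_RUN are illustrative helpers,
# they are not part of minikube, ks, or kubectl.
NAMESPACE="${NAMESPACE:-kubeflow}"   # assumed from the dashboard URL
DRY_RUN="${DRY_RUN:-1}"              # 1 = only print the commands

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"                      # show the command instead of running it
  else
    "$@"
  fi
}

adapted_steps() {
  run minikube stop                  # stop the cluster before ks init
  run ks init my-kubeflow
  run minikube start                 # assumed: later steps need the cluster
  # ... remaining steps from the Kubeflow v0.1 announcement go here ...
  run kubectl port-forward tf-hub-0 8100:8000 --namespace="${NAMESPACE}"
}

adapted_steps
```

With the default DRY_RUN=1 the script only prints the commands in order. Setting DRY_RUN=0 would execute them; note that kubectl port-forward blocks until it is interrupted.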

The applied my-tf-job started a pod in the cluster:

that is running soon after:

In order to run a jupyter notebook, open http://127.0.0.1:8100 in your browser.
You can log in to the jupyter server framework with any credentials:

and start the server afterwards:

Now you can run a notebook image. Click in the image field and select one of the available images:

The notebook image is starting

and soon after ready to go:

Also the minikube dashboard shows the started pod:

 

This tutorial uses the tensorflow2DimsToALine.ipynb sample notebook from the tensorflow2DimsToALine repository.

With this setup on your local cluster, you can use the Jupyter notebook's interactive controls just as you would when running Jupyter Notebook on your local machine.

To conclude: use Kubeflow and Minikube to do local, composable, and portable machine learning on top of Kubernetes.


Appendix 1

The ksonnet command

ks pkg install kubeflow/core

may cause this error:

ERROR resolve registry library: GET https://api.github.com/repos/google/kubeflow/contents/kubeflow/core/tests/spartakus_test.jsonnet?ref=ccebb3b0c02a2b597b583df73c17ad003796444d: 403 API rate limit exceeded for 2.206.171.107. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.) [rate reset in 55m54s]

This behavior is similar to https://github.com/ksonnet/ksonnet/issues/233. The only fix that worked for me was to build ksonnet from source and use ks with a GitHub personal access token.
Part of the local build process is the make install command, which installs ks in the $GOPATH/bin directory. You probably want to add this folder to your PATH so that you can run ks and other commands built from Go sources on your machine.

Adding the ks binary to your path can be achieved with:

export GOBIN=$GOPATH/bin
PATH=$PATH:$GOBIN
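
If these lines live in a shell profile that is sourced more than once, PATH keeps growing. A minimal sketch of a guard against that follows; add_to_path is a hypothetical helper name, and the fallback to $HOME/go assumes the default GOPATH that Go uses since version 1.8 when GOPATH is unset.

```shell
# add_to_path is a hypothetical helper: append a directory to PATH only once.
add_to_path() {
  case ":${PATH}:" in
    *":$1:"*) ;;                     # already present, nothing to do
    *) PATH="${PATH}:$1" ;;
  esac
}

# Fall back to $HOME/go (assumed default GOPATH) when GOPATH is unset.
export GOBIN="${GOPATH:-$HOME/go}/bin"
add_to_path "$GOBIN"
export PATH
```

Calling add_to_path repeatedly with the same directory leaves PATH unchanged after the first call.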

With this setup in place, a successful ks package installation looks like this:

$ GITHUB_TOKEN=[personalAccessToken] ks pkg install kubeflow/core
INFO Retrieved 29 files                 
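
To avoid running into the rate limit by accident, the call can be wrapped so that it refuses to run without a token. This is only a sketch: ks_pkg_install_authenticated is a hypothetical wrapper name and not a ks subcommand. It relies on ks reading the GITHUB_TOKEN environment variable, as in the command above.

```shell
# ks_pkg_install_authenticated is a hypothetical wrapper, not part of ks.
# It refuses to run when no GitHub personal access token is available.
ks_pkg_install_authenticated() {
  if [ -z "${GITHUB_TOKEN:-}" ]; then
    echo "GITHUB_TOKEN is not set; unauthenticated GitHub API requests may be rate limited" >&2
    return 1
  fi
  # Pass the token explicitly in case the variable is set but not exported.
  GITHUB_TOKEN="$GITHUB_TOKEN" ks pkg install "$1"
}
```

Usage would be ks_pkg_install_authenticated kubeflow/core after setting GITHUB_TOKEN in the current shell.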

Appendix 2

When starting the cluster with Minikube you may experience this issue:

$ minikube start
Starting local Kubernetes v1.10.0 cluster...
Starting VM...
Getting VM IP address...
Moving files into cluster...
Setting up certs...
Connecting to cluster...
Setting up kubeconfig...
Starting cluster components...
E0505 12:33:27.124048    2492 start.go:281] Error restarting cluster:  restarting kube-proxy: waiting for kube-proxy to be up for configmap update: timed out waiting for the condition

That is very similar to the "Error restarting cluster" minikube issue. The only workaround that helped me was a downgrade to:

  • minikube version: v0.25.2
  • kubernetes version: v1.9.4

There is no minikube update/downgrade command (yet), so downgrading means installing the older minikube release (on Linux):

curl -Lo minikube https://storage.googleapis.com/minikube/releases/v0.25.2/minikube-linux-amd64 && chmod +x minikube && sudo mv minikube /usr/local/bin/
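
The same download can be parameterized by version, which makes it easier to switch releases again later. The sketch below assumes the release URL pattern shown above holds for other versions and platforms; minikube_url and install_minikube are hypothetical helper names.

```shell
# minikube_url is a hypothetical helper that builds the release URL.
# Arguments: version, optional OS (default linux), optional arch (default amd64).
minikube_url() {
  echo "https://storage.googleapis.com/minikube/releases/${1}/minikube-${2:-linux}-${3:-amd64}"
}

# install_minikube downloads one pinned release and replaces the current binary.
# curl -f makes the download fail on HTTP errors instead of saving an error page.
install_minikube() {
  curl -fLo minikube "$(minikube_url "$1")" \
    && chmod +x minikube \
    && sudo mv minikube /usr/local/bin/
}

# Example (not run here):
#   install_minikube v0.25.2
#   minikube start --kubernetes-version v1.9.4
```

The --kubernetes-version flag of minikube start pins the Kubernetes version of the cluster to the one listed above.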

Lothar Schulz

4 Comments

  1. Thank you for the comprehensive tutorial. I am facing a problem when choosing the notebook image. It gives me this error:
    Failed to pull image “gcr.io/kubeflow-images-public/tensorflow-1.4.1-notebook-cpu:v20180419-0ad94c4e”: rpc error: code = Unknown desc = Error response from daemon: Get https://gcr.io/v2/: dial tcp: lookup gcr.io on 10.0.2.3:53: read udp 10.0.2.15:46302->10.0.2.3:53: i/o timeout

    Do you know why that is? I run it on my minikube. Thank you and hope to hear from you soon.

    • The error you reported includes a timeout. Unfortunately I can’t reproduce this behavior.
      I assume the timeout issue could be caused by
      1) referencing a docker image that does not exist
      or
      2) network issues.

      For 1) you could refer only to existing images from the Spawner Options as in https://www.lotharschulz.info/wp-content/uploads/spawn_notebook_image.png
      For 2) you could make sure the host machine as well as the pod in the cluster is able to connect to https://gcr.io/ e.g.

      $ curl https://gcr.io/
      <HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
      <TITLE>302 Moved</TITLE></HEAD><BODY>
      <H1>302 Moved</H1>
      The document has moved
      <a href="https://cloud.google.com/container-registry/" >here</A>.
      </BODY></HTML>

      Just a wild guess: did you stop the cluster as proposed in the first two lines of https://gist.github.com/lotharschulz/813a993fdc9821c7ddac2d87f540fc76#file-kubeflow-deployment-L15-L33?

      • Hi,
        Yes your guess is right. I did not stop the cluster as proposed. And after doing so, it really worked! However, may I know why do we need to stop minikube before installing the kubeflow components? Thank you.

        • Hi Grace,

          I’m glad to read you got a working solution.
          As far as I remember I had the idea to stop the cluster because the error message included:

          ERROR Invalid API specification 'version:'
          

          At the time, the only running service that potentially offered an API (on top of my default setup) was Minikube.
          So I just gave it a try to stop the service and voilà … it worked.

          You may check out: Getting Started with Kubeflow that includes Minikube for Kubeflow.
          I see a lot of improvement since version 0.1, which is the basis for my post.
