Commit 63206ec9 authored by Daniele Venzano

Update ZApp shop

parent 2f2b3af6
# Spark ZApp
URL: [https://gitlab.eurecom.fr/zoe-apps/zapp-spark](https://gitlab.eurecom.fr/zoe-apps/zapp-spark)
Combine the full power of a distributed [Apache Spark](http://spark.apache.org) cluster with Python Jupyter Notebooks.
The Spark shell can be used from the built-in terminal in the notebook ZApp.
Spark is configured in standalone, distributed mode. This ZApp contains Spark version 2.2.0.
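As an illustration, a notebook kernel can attach to the standalone master with PySpark. This is a minimal sketch, assuming the master address and executor memory are exposed through the `SPARK_MASTER` and `SPARK_EXECUTOR_RAM` environment variables that this ZApp sets (see the ZApp description further down); it is not the ZApp's built-in kernel setup:

```python
import os

from pyspark.sql import SparkSession

# Minimal sketch: connect to the standalone master started by this ZApp.
# SPARK_MASTER and SPARK_EXECUTOR_RAM are set in the ZApp description;
# the fallback values here are illustrative assumptions.
spark = (SparkSession.builder
         .master(os.environ.get("SPARK_MASTER", "spark://spark-master0:7077"))
         .config("spark.executor.memory",
                 os.environ.get("SPARK_EXECUTOR_RAM", "4g"))
         .appName("spark-zapp-check")
         .getOrCreate())

# Quick end-to-end check that the executors are reachable
print(spark.sparkContext.parallelize(range(100)).sum())
```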
# Jupyter Notebook image
This image contains the Jupyter Notebook configured with Python and a Spark client. It is used by Zoe, the Container Analytics as a Service system, to create on-demand notebooks connected to containerized Spark clusters.
Zoe can be found at: https://github.com/DistributedSystemsGroup/zoe
## Setup
The Dockerfile runs a start script that configures the Notebook using these environment variables (a sketch of how they might be consumed follows the list):

* `SPARK_MASTER_IP`: IP address of the Spark master this notebook should use for its kernel
* `PROXY_ID`: string to use as a prefix for URL paths, for reverse proxying
* `SPARK_EXECUTOR_RAM`: how much RAM to use for each executor spawned by the notebook
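For illustration, the start script could consume these variables roughly as in the sketch below. The real script ships inside the image and is not reproduced here; `--NotebookApp.base_url` is a standard Jupyter option, everything else is an assumption:

```python
import os
import subprocess

# Hypothetical reconstruction of the start script's logic, not the image's
# actual code. PROXY_ID becomes the base URL path for reverse proxying.
master_ip = os.environ["SPARK_MASTER_IP"]
proxy_id = os.environ.get("PROXY_ID", "")
executor_ram = os.environ.get("SPARK_EXECUTOR_RAM", "4g")

# Expose the master URL and executor memory to the Spark kernel
os.environ["SPARK_MASTER"] = "spark://{}:7077".format(master_ip)
os.environ["SPARK_EXECUTOR_RAM"] = executor_ram

subprocess.check_call([
    "jupyter", "notebook",
    "--no-browser", "--ip=0.0.0.0",
    "--NotebookApp.base_url=/{}".format(proxy_id),
])
```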
# Spark Scala master image
This image contains the Spark master process. It is used by Zoe, the Container Analytics as a Service system, to create on-demand Spark clusters in Spark standalone mode.
Zoe can be found at: https://github.com/DistributedSystemsGroup/zoe
## Setup
The Dockerfile automatically starts the Spark master process when the container is run.
# Spark worker image
This image contains the Spark worker process. It is used by Zoe, the Container Analytics as a Service system, to create on-demand Spark clusters in standalone mode.
Zoe can be found at: https://github.com/DistributedSystemsGroup/zoe
## Setup
The Dockerfile starts the worker process when the container is run. The following options can be passed via environment variables (a sketch of the resulting launch logic follows the list):

* `SPARK_MASTER_IP`: IP address of the Spark master this worker should connect to
* `SPARK_WORKER_RAM`: how much RAM the worker can use (default is 4g)
* `SPARK_WORKER_CORES`: how many cores the worker process can use (default is 4)
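For illustration, the worker launch logic could look like the sketch below; `spark-class` and the `org.apache.spark.deploy.worker.Worker` class are Spark's standard standalone launcher, while the surrounding script is an assumption, not the image's actual code:

```python
import os
import subprocess

# Hypothetical sketch of the worker start logic; defaults match the README.
master_ip = os.environ["SPARK_MASTER_IP"]
worker_ram = os.environ.get("SPARK_WORKER_RAM", "4g")
worker_cores = os.environ.get("SPARK_WORKER_CORES", "4")

# Start the standalone worker and register it with the master
subprocess.check_call([
    "spark-class", "org.apache.spark.deploy.worker.Worker",
    "--memory", worker_ram,
    "--cores", worker_cores,
    "spark://{}:7077".format(master_ip),
])
```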
{
    "name": "clouds-lab-spark",
    "services": [
        {
            "command": null,
            "environment": [
                [
                    "SPARK_MASTER_IP",
                    "{dns_name#self}"
                ],
                [
                    "HADOOP_USER_NAME",
                    "{user_name}"
                ],
                [
                    "PYTHONHASHSEED",
                    "42"
                ]
            ],
            "essential_count": 1,
            "image": "zapps/spark2-master-clouds:4769",
            "monitor": false,
            "name": "spark-master",
            "ports": [
                {
                    "name": "Spark master web interface",
                    "port_number": 8080,
                    "protocol": "tcp",
                    "url_template": "http://{ip_port}/"
                }
            ],
            "replicas": 1,
            "resources": {
                "cores": {
                    "max": 1,
                    "min": 0.1
                },
                "memory": {
                    "max": 2684354560,
                    "min": 2147483648
                }
            },
            "startup_order": 0,
            "total_count": 1,
            "volumes": []
        },
        {
            "command": null,
            "environment": [
                [
                    "SPARK_WORKER_CORES",
                    "1"
                ],
                [
                    "SPARK_WORKER_RAM",
                    "9126805504"
                ],
                [
                    "SPARK_MASTER_IP",
                    "{dns_name#spark-master0}"
                ],
                [
                    "SPARK_LOCAL_IP",
                    "{dns_name#self}"
                ],
                [
                    "PYTHONHASHSEED",
                    "42"
                ],
                [
                    "HADOOP_USER_NAME",
                    "{user_name}"
                ]
            ],
            "essential_count": 1,
            "image": "zapps/spark2-worker-clouds:4769",
            "monitor": false,
            "name": "spark-worker",
            "ports": [],
            "replicas": 1,
            "resources": {
                "cores": {
                    "max": 1,
                    "min": 1
                },
                "memory": {
                    "max": 10737418240,
                    "min": 6442450944
                }
            },
            "startup_order": 1,
            "total_count": 4,
            "volumes": []
        },
        {
            "command": null,
            "environment": [
                [
                    "SPARK_MASTER",
                    "spark://{dns_name#spark-master0}:7077"
                ],
                [
                    "SPARK_EXECUTOR_RAM",
                    "9125756416"
                ],
                [
                    "SPARK_DRIVER_RAM",
                    "2147483648"
                ],
                [
                    "HADOOP_USER_NAME",
                    "{user_name}"
                ],
                [
                    "NB_USER",
                    "{user_name}"
                ],
                [
                    "PYTHONHASHSEED",
                    "42"
                ],
                [
                    "NAMENODE_HOST",
                    "hdfs-namenode.zoe"
                ]
            ],
            "essential_count": 1,
            "image": "zapps/spark2-jupyter-notebook-clouds:4769",
            "monitor": true,
            "name": "spark-jupyter",
            "ports": [
                {
                    "name": "Jupyter Notebook interface",
                    "port_number": 8888,
                    "protocol": "tcp",
                    "url_template": "http://{ip_port}/"
                }
            ],
            "replicas": 1,
            "resources": {
                "cores": {
                    "max": 2,
                    "min": 0.5
                },
                "memory": {
                    "max": 8589934592,
                    "min": 6442450944
                }
            },
            "startup_order": 0,
            "total_count": 1,
            "volumes": []
        }
    ],
    "size": 648,
    "version": 3,
    "will_end": false
}
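The `{user_name}` and `{dns_name#...}` placeholders in the description above are substituted by Zoe when the execution starts. A toy illustration of this kind of substitution (not Zoe's actual implementation) could be:

```python
import re

# Toy expansion of Zoe-style placeholders; Zoe's real code differs.
# {dns_name#self} resolves to the service's own DNS name,
# {dns_name#<service>} to the DNS name of the named service.
def expand(value, user_name, dns_names):
    value = value.replace("{user_name}", user_name)
    return re.sub(r"\{dns_name#([^}]+)\}",
                  lambda m: dns_names[m.group(1)], value)

print(expand("spark://{dns_name#spark-master0}:7077", "alice",
             {"spark-master0": "spark-master0-alice.zoe"}))
# -> spark://spark-master0-alice.zoe:7077
```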
{
    "version": 1,
    "zapps": [
        {
            "category": "Teaching and labs",
            "name": "Clouds Lab",
            "description": "clouds-lab-zapp.json",
            "readable_descr": "README-clouds.md",
            "parameters": [],
            "guest_access": true
        }
    ]
}
@@ -2,10 +2,8 @@
 URL: [https://hub.docker.com/r/jupyter/r-notebook/](https://hub.docker.com/r/jupyter/r-notebook/)
-* Jupyter Notebook 5.0.x
+* Jupyter Notebook and JupyterLab
-* Conda R v3.3.x and channel
+* Conda R
 * plyr, devtools, shiny, rmarkdown, forecast, rsqlite, reshape2, nycflights13, caret, rcurl, and randomforest pre-installed
 * The tidyverse R packages are also installed, including ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, lubridate, and broom
-Please note that you need to retrieve the secret key from the service logs to be able to access the notebooks.
@@ -5,7 +5,7 @@
             "category": "Data science",
             "readable_descr": "README-r.md",
             "name": "R notebook",
-            "description": "r-notebook.json",
+            "description": "rdatasci.json",
             "parameters": []
         }
     ]
...
{
    "name": "r-notebook",
    "services": [
        {
            "command": null,
            "environment": [
                [
                    "NB_UID",
                    "1000"
                ],
                [
                    "HOME",
                    "/mnt/workspace"
                ]
            ],
            "essential_count": 1,
            "image": "jupyter/r-notebook:latest",
            "monitor": true,
            "name": "jupyter",
            "ports": [
                {
                    "name": "Jupyter Notebook interface",
                    "port_number": 8888,
                    "protocol": "tcp",
                    "url_template": "http://{ip_port}/"
                }
            ],
            "replicas": 1,
            "resources": {
                "cores": {
                    "max": 4,
                    "min": 4
                },
                "memory": {
                    "max": 4294967296,
                    "min": 4294967296
                }
            },
            "startup_order": 0,
            "total_count": 1,
            "volumes": [],
            "work_dir": "/mnt/workspace"
        }
    ],
    "size": 512,
    "version": 3,
    "will_end": false
}
 {
-    "name": "microsoft-mls",
+    "name": "rdatasci",
     "services": [
         {
-            "command": null,
+            "command": "/opt/conda/bin/jupyter lab --no-browser --NotebookApp.token='' --allow-root --ip=0.0.0.0",
-            "environment": [
-                [
-                    "ACCEPT_EULA",
-                    "yes"
-                ]
-            ],
+            "environment": [],
             "essential_count": 1,
-            "image": "microsoft/mmlspark:0.10",
+            "image": "zapps/rdatasci:10396",
             "monitor": true,
-            "name": "mls-notebook",
+            "name": "r-notebook",
             "ports": [
                 {
-                    "name": "Notebook web interface",
+                    "name": "Jupyter Notebook interface",
                     "port_number": 8888,
                     "protocol": "tcp",
                     "url_template": "http://{ip_port}/"
@@ -24,8 +19,8 @@
             "replicas": 1,
             "resources": {
                 "cores": {
-                    "max": 4,
-                    "min": 4
+                    "max": 2,
+                    "min": 2
                 },
                 "memory": {
                     "max": 6442450944,
...
# Microsoft Machine Learning for Apache Spark ZApp
The unmodified [Microsoft MLS](https://github.com/Azure/mmlspark) image, as generated by Microsoft.
The image contains a Jupyter Notebook.
Please note that you need to retrieve the Notebook key from the service logs to be able to access the Notebook.
{
    "version": 1,
    "zapps": [
        {
            "category": "Machine learning",
            "name": "Microsoft Machine Learning for Spark",
            "description": "microsoft-mls.json",
            "readable_descr": "README.md",
            "parameters": []
        }
    ]
}
# Notebook for Data Science
This ZApp contains a Jupyter Notebook with a Python 3.5 kernel and the following libraries:
* TensorFlow 1.10.1, TensorBoard 1.10.0
* PyTorch and TorchVision 0.4.1
* pandas, matplotlib, scipy, seaborn, scikit-learn, scikit-image, sympy, cython, patsy, statsmodels, cloudpickle, dill, numba, bokeh

The GPU version also contains CUDA 9.0 and TensorFlow with GPU support.
## Customizations
### Adding Python libraries
To install additional libraries, add the following code at the top of your notebook:

```python
import subprocess
import sys

def install(package):
    # Run "pip install --user <package>" with the same interpreter as the kernel
    subprocess.call([sys.executable, "-m", "pip", "install", "--user", package])
```

and call the `install("<package name>")` function for each package you need. Finally, restart the kernel to load the modules you just installed.
### Running your own script
By modifying the `command` parameter in the JSON file you can tell Zoe to run your own script instead of the notebook.
In this ZApp the default command is:

```
"command": "jupyter lab --no-browser --NotebookApp.token='' --allow-root --ip=0.0.0.0"
```

If you change the JSON to:

```
"command": "/mnt/workspace/myscript.sh"
```

Zoe will run `myscript.sh` instead of the Jupyter Notebook. In this way you can:

* transform an interactive notebook ZApp into a batch one, with exactly the same libraries and environment
* perform additional setup before starting the notebook; in this case you will have to add the `jupyter lab` command defined above at the end of your script (see the sketch below)
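For the second case, a sketch of what such a script might do, written in Python for illustration (the extra package is a made-up example, and the final command matches the default one above):

```python
#!/usr/bin/env python3
# Hypothetical /mnt/workspace/myscript.py; not part of the ZApp.
import os
import subprocess
import sys

# Extra setup before the notebook starts (example package)
subprocess.check_call([sys.executable, "-m", "pip", "install", "--user", "plotly"])

# Hand over to JupyterLab, exactly as the default ZApp command does
os.execvp("jupyter", ["jupyter", "lab", "--no-browser",
                      "--NotebookApp.token=", "--allow-root", "--ip=0.0.0.0"])
```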
zapps/pydatasci:10396
zapps/pydatasci-gpu:10396
{
    "version": 1,
    "zapps": [
        {
            "category": "Data science",
            "readable_descr": "README-datascience.md",
            "name": "Data science notebook",
            "description": "pydatasci.json",
            "parameters": [],
            "guest_access": true
        },
        {
            "category": "Data science",
            "readable_descr": "README-datascience.md",
            "name": "Data science notebook GPU",
            "description": "pydatasci-gpu.json",
            "parameters": [
                {
                    "kind": "environment",
                    "name": "NVIDIA_VISIBLE_DEVICES",
                    "readable_name": "GPU",
                    "description": "Which GPU to enable for this execution (e.g. all: all GPUs, 0: just GPU #0, 0,2: GPU #0 and #2)",
                    "type": "string",
                    "default": "all"
                }
            ],
            "guest_access": false
        }
    ]
}
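To check from inside the notebook which GPUs the `NVIDIA_VISIBLE_DEVICES` parameter above actually exposed, a quick sketch like the following can help; `device_lib.list_local_devices()` is TensorFlow's device listing API, the rest is illustrative:

```python
import os

from tensorflow.python.client import device_lib

# NVIDIA_VISIBLE_DEVICES is set through the ZApp parameter above
print("Requested GPUs:", os.environ.get("NVIDIA_VISIBLE_DEVICES", "unset"))

# List the devices TensorFlow can actually see
gpus = [d.name for d in device_lib.list_local_devices()
        if d.device_type == "GPU"]
print("Visible GPUs:", gpus or "none")
```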
 {
-    "name": "tf-google-gpu",
+    "name": "pydatasci-gpu",
     "services": [
         {
-            "command": null,
+            "command": "jupyter lab --no-browser --NotebookApp.token='' --allow-root --ip=0.0.0.0",
             "environment": [
                 [
                     "NVIDIA_VISIBLE_DEVICES",
@@ -10,22 +10,22 @@
                 ]
             ],
             "essential_count": 1,
-            "image": "gcr.io/tensorflow/tensorflow:1.3.0-gpu-py3",
+            "image": "zapps/pydatasci-gpu:10396",
             "labels": [
                 "gpu"
             ],
             "monitor": true,
-            "name": "tf-jupyter",
+            "name": "py-notebook",
             "ports": [
                 {
-                    "name": "Tensorboard web interface",
-                    "port_number": 6006,
+                    "name": "Jupyter Notebook interface",
+                    "port_number": 8888,
                     "protocol": "tcp",
                     "url_template": "http://{ip_port}/"
                 },
                 {
-                    "name": "Notebook web interface",
-                    "port_number": 8888,
+                    "name": "Tensorboard",
+                    "port_number": 6006,
                     "protocol": "tcp",
                     "url_template": "http://{ip_port}/"
                 }
@@ -33,12 +33,12 @@
             "replicas": 1,
             "resources": {
                 "cores": {
-                    "max": 4,
-                    "min": 4
+                    "max": 2,
+                    "min": 2
                 },
                 "memory": {
-                    "max": 34359738368,
-                    "min": 34359738368
+                    "max": 6442450944,
+                    "min": 6442450944
                 }
             },
             "startup_order": 0,
...
 {
-    "name": "mag-google",
+    "name": "pydatasci",
     "services": [
         {
-            "command": null,
+            "command": "jupyter lab --no-browser --NotebookApp.token='' --allow-root --ip=0.0.0.0",
             "environment": [],
             "essential_count": 1,
-            "image": "tensorflow/magenta",
+            "image": "zapps/pydatasci:10396",
             "monitor": true,
-            "name": "tf-jupyter",
+            "name": "py-notebook",
             "ports": [
                 {
-                    "name": "Tensorboard web interface",
-                    "port_number": 6006,
+                    "name": "Jupyter Notebook interface",
+                    "port_number": 8888,
                     "protocol": "tcp",
                     "url_template": "http://{ip_port}/"
                 },
                 {
-                    "name": "Notebook web interface",
-                    "port_number": 8888,
+                    "name": "Tensorboard",
+                    "port_number": 6006,
                     "protocol": "tcp",
                     "url_template": "http://{ip_port}/"
                 }
@@ -25,12 +25,12 @@
             "replicas": 1,
             "resources": {
                 "cores": {
-                    "max": 4,
-                    "min": 4
+                    "max": 2,
+                    "min": 2
                 },
                 "memory": {
-                    "max": 34359738368,
-                    "min": 34359738368
+                    "max": 6442450944,
+                    "min": 6442450944
                 }
             },
             "startup_order": 0,
...
# Jupyter Notebook with PyTorch
URL: [https://hub.docker.com/r/jupyter/scipy-notebook/](https://hub.docker.com/r/jupyter/scipy-notebook/) and [http://pytorch.org/](http://pytorch.org/)
* Jupyter Notebook 5.0.x
* Conda Python 3.x environment
* pandas, matplotlib, scipy, seaborn, scikit-learn, scikit-image, sympy, cython, patsy, statsmodels, cloudpickle, dill, numba, bokeh, vincent, beautifulsoup, xlrd pre-installed
* PyTorch
Please note that you need to retrieve the secret key from the service logs to be able to access the notebooks.
{
"version": 1,
"zapps": [
{
"category": "Machine learning",
"readable_descr": "README-pytorch.md",
"name": "PyTorch notebook",
"description": "pytorch-notebook.json",
"parameters": [],
"logo": "pytorch.png"
}
]
}
{
    "name": "pytorch-notebook",
    "services": [
        {
            "command": null,
            "environment": [
                [
                    "NB_UID",
                    "1000"
                ],
                [
                    "HOME",
                    "/mnt/workspace"
                ]
            ],
            "essential_count": 1,
            "image": "zapps/pytorch:latest",
            "monitor": true,
            "name": "jupyter",
            "ports": [
                {
                    "name": "Jupyter Notebook interface",
                    "port_number": 8888,
                    "protocol": "tcp",
                    "url_template": "http://{ip_port}/"
                }
            ],
            "replicas": 1,
            "resources": {
                "cores": {
                    "max": 4,
                    "min": 4
                },
                "memory": {
                    "max": 4294967296,
                    "min": 4294967296
                }
            },
            "startup_order": 0,
            "total_count": 1,
            "volumes": [],
            "work_dir": "/mnt/workspace"
        }