.. _install:

Installing Zoe
==============

If you are looking for the five-minute install procedure, just for testing, check the :ref:`test install <test-install-label>` section, below.

When installing Zoe for production you should, first of all, look at the following requirements and make a decision about each of them:

* Container back-end: Kubernetes or DockerEngine
* Shared filesystem: we have deployments on NFS and CephFS, but anything similar should work
* Network: how your users will connect to the containers
* Authentication back-end: how your users will authenticate to Zoe (LDAP or text file)
* How to manage Zoe Applications (ZApps)
* ZApp output logs: see :ref:`logging`

Afterwards, you can start the installation, as outlined in the :ref:`manual install <manual-install-label>` section.

Choosing the container back-end
-------------------------------

At this time Zoe supports two back-ends:

* DockerEngine: uses one or more Docker Engines. It is simple to install and to scale.
* Kubernetes: more complex to set up. We suggest using it only if you already have (or need) a Kubernetes setup for running other software.

DockerEngine
^^^^^^^^^^^^

The DockerEngine back-end uses one or more nodes with Docker Engine installed and configured to listen to network requests.

The Docker Engines must be configured to enable `multi host networking <https://docs.docker.com/engine/userguide/networking/overlay-standalone-swarm/>`_.

This sample config file, usually found in ``/etc/docker/daemon.json``, may help to get you started::

    {
        "dns": ["192.168.46.1"],
        "dns-search": ["bigfoot.eurecom.fr"],
        "tlsverify": true,
        "tlscacert": "/mnt/certs/cacert.pem",
        "tlscert": "/mnt/certs/cert.pem",
        "tlskey": "/mnt/certs/key.pem",
        "hosts": ["tcp://bf11.bigfoot.eurecom.fr:2375", "unix:///var/run/docker.sock"]
    }

Once your Docker hosts are up and running, create a file with the following format to tell the back-end which nodes are available and how to connect to them::

    [DEFAULT]
    use_tls: no
    tls_cert: /mnt/cephfs/admin/cert-authority/container-router/cert.pem
    tls_key: /mnt/cephfs/admin/cert-authority/container-router/key.pem
    tls_ca: /mnt/cephfs/admin/cert-authority/ca.pem

    [foo]
    address: localhost:2375
    external_address: 192.168.45.42

    [bar]
    docker_address: 192.168.47.5:2375
    external_address: 192.168.47.5
    use_tls: yes

    [baz]
    docker_address: 192.168.47.50:2375
    external_address: 192.168.47.50
    use_tls: yes
    labels: gpu,ssd

This sample configuration describes three hosts. The ``DEFAULT`` section contains items that are common to all hosts; any of these entries can be overridden in a host definition.

Host ``foo`` does not use TLS (it inherits the default setting). Zoe connects to ``localhost`` on port 2375 to talk to it, while users connecting to containers running on this host need to use the ``192.168.45.42`` address. Zoe uses this ``external_address`` to generate links in the web interface.

Hosts ``bar`` and ``baz`` use TLS. Host ``baz`` also has two labels, which can be matched when starting services that request the corresponding label. Labels are comma-separated.

You tell Zoe the location of this file using the ``backend-docker-config-file`` option in zoe.conf.
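
Because the host file uses an INI-like syntax, its semantics can be explored with Python's standard ``configparser`` module. The following is only a minimal sketch, not Zoe's actual loader: the ``load_hosts`` helper is hypothetical, and the sample is a trimmed copy of the file above. It shows how values from ``DEFAULT`` are inherited by each host and how the ``labels`` entry can be split.

.. code-block:: python

    import configparser

    # Trimmed copy of the sample host file shown above.
    SAMPLE = """
    [DEFAULT]
    use_tls: no

    [foo]
    external_address: 192.168.45.42

    [baz]
    external_address: 192.168.47.50
    use_tls: yes
    labels: gpu,ssd
    """


    def load_hosts(text):
        """Map each host name to its effective (default-merged) settings."""
        parser = configparser.ConfigParser()
        parser.read_string(text)
        hosts = {}
        # sections() skips DEFAULT, but lookups still fall back to it.
        for name in parser.sections():
            section = parser[name]
            labels = section.get("labels", fallback="")
            hosts[name] = {
                "external_address": section.get("external_address"),
                "use_tls": section.getboolean("use_tls"),
                "labels": [item.strip() for item in labels.split(",") if item.strip()],
            }
        return hosts


    hosts = load_hosts(SAMPLE)
    print(hosts["foo"]["use_tls"])  # value inherited from DEFAULT
    print(hosts["baz"]["labels"])   # labels split on commas

Running a check like this against your own file is a quick way to confirm that every host resolves to the TLS setting you expect.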

Kubernetes
^^^^^^^^^^

See :ref:`kube-backend` for configuration details.

Shared filesystem
-----------------

Users need to put data and binaries in a place accessible by Zoe, and they need to be able to access the results and the logs generated by running ZApps.

Zoe uses the concept of workspaces: each user has a private directory that is attached, in a well-known location, to all the containers of each ZApp belonging to that user. This filesystem can be accessed through a special gateway container spawned by the administrator (see `gateway containers <https://github.com/DistributedSystemsGroup/gateway-containers>`_) or by other methods (direct mount on user machines, WebDAV, web file managers).

Zoe implements a "directory" back-end for workspaces. Container back-ends may implement more volume technologies: Zoe itself is not involved and only needs to know how to attach the user volume to the container, so the effort required to support new volume types should be minimal.

At Eurecom we use CephFS, but we know of successful Zoe deployments based on NFS.

Networking
----------

Most ZApps expose a number of interfaces (web, REST and others) to the user. Zoe configures the active back-end to expose these ports, but does not perform any additional action to configure routing or DNS to make the ports accessible. Keeping in mind that the back-end network configuration is outside Zoe's scope, here is a non-exhaustive list of possible configurations:

* expose the hosts running the containers by using public IP addresses
* use a proxy, like the one developed for Zoe: :ref:`proxy`
* use back-end network plugins to build custom topologies

Authentication back-ends
------------------------

Zoe supports multiple user authentication back-ends. Multiple back-ends can coexist at the same time.

Check the :ref:`users` page for more details on the user model.

Remember to disable or change the password of the default admin user.

LDAP
^^^^
Plain LDAP or LDAP+SASL GSSAPI are available.

In the Zoe configuration you need to specify the following options:

* ``ldap-server-uri``
* ``ldap-bind-user``
* ``ldap-bind-password``
* ``ldap-base-dn``
* ``ldap-admin-gid``
* ``ldap-user-gid``
* ``ldap-guest-gid``
* ``ldap-group-name``

Text file
^^^^^^^^^
For testing and for simple deployments with a few users, a CSV text file can be used.

Its format is::

    <username>,<password>,<role>

The file location can be specified in the ``zoe.conf`` file, and the file needs to be readable only by the Zoe processes.
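
To sanity-check a users file before pointing Zoe at it, the three-field format above can be parsed with Python's standard ``csv`` module. This is an illustrative sketch rather than Zoe's own parser: the ``load_users`` helper and the sample user names, passwords and roles are made up for the example.

.. code-block:: python

    import csv
    import io


    def load_users(stream):
        """Parse ``<username>,<password>,<role>`` rows into a dict keyed by username."""
        users = {}
        for row in csv.reader(stream):
            if not row:
                continue  # skip empty lines
            if len(row) != 3:
                raise ValueError("expected 3 fields, got %d: %r" % (len(row), row))
            username, password, role = row
            users[username] = {"password": password, "role": role}
        return users


    # Hypothetical file contents, one user per line.
    sample = io.StringIO("alice,s3cret,admin\nbob,hunter2,user\n")
    users = load_users(sample)
    print(sorted(users))

Since the file holds passwords, one common way to meet the readability requirement is to make it owned by the account that runs Zoe and remove group and world permissions.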

Managing Zoe applications
-------------------------

At their core, ZApps are composed of a container image and a JSON description. The container image can be stored on the Docker nodes, in a local private registry, or in a public one accessible via the Internet.

Zoe does not provide a way to automatically build images, push them to a local registry, or pull them to the hosts when needed. At Eurecom we provide an automated environment based on GitLab's CI features: users are able to customize their applications (JSON and Dockerfiles) by working on git repositories. Images are rebuilt and pushed on commit, and JSON files are generated and copied to the ZApp shop directory. You can check out how we do it here:
https://gitlab.eurecom.fr/zoe-apps

The ZApp Shop
^^^^^^^^^^^^^

The Zoe web interface provides a ZApp shop that showcases the available ZApps and gives users a friendly and easy way to list and access them.

The shop is managed locally. It looks for ZApps in a configured directory (option ``zapp-shop-path``). Each ZApp must live in its own directory, which must contain:

* ``manifest.json``: a JSON file that describes the contents of the ZApp
* a logo that is displayed on the web interface
* one or more text files in Markdown format with ZApp information and documentation
* one or more JSON Zoe application descriptions

The ``manifest.json`` file gathers all this information together for the ZApp Shop interface. Its format is as follows::

    {
        "version": 1,
        "zapps": [
            {
                "category": "TensorFlow",
                "name": "Google TensorFlow notebook",
                "description": "tf-google.json",
                "readable_descr": "README-goog.md",
                "parameters": []
            },
            {
                "category": "TensorFlow",
                "name": "Google TensorFlow batch",
                "description": "tf-google.json",
                "readable_descr": "README-batch.md",
                "parameters": [
                    {
                        "kind": "command",
                        "name": "tf-jupyter",
                        "readable_name": "Command",
                        "description": "The Python script to run, relative to the workspace directory",
                        "type": "string",
                        "default": "./my-tf-app/main.py"
                    }
                ],
                "disabled_for": ["role_A"]
            }
        ]
    }

* version : an internal version, used by Zoe to recognize the manifest format. For now only 1 is supported.
* zapps : a list of ZApps that have to be shown in the shop

For each ZApp:

* category : the category this ZApp belongs to, used to group ZApps in the web interface. There are no pre-defined categories, so you are free to put anything you want here
* name : the human-readable name
* description : the name of the JSON file with the Zoe description
* readable_descr : the name of the Markdown file containing user documentation for the ZApp
* parameters : a list of parameters the user can set to tune the ZApp before starting it
* disabled_for (optional) : a list of roles that will not see this ZApp in the app shop

Parameters:

Parameters are values of the JSON description that are modified at run time.

* kind : the kind of parameter, it can be ``service_count``, ``command`` or ``environment``
* name : the machine-friendly name of the parameter
* readable_name : the human-friendly name of the parameter
* description : a helpful description
* type : string or integer, used for basic validation
* default : the default value
* max : if ``type`` is integer, this is required and is the maximum value the user can set
* min : if ``type`` is integer, this is required and is the minimum value the user can set
* step : if ``type`` is integer, this is required and is the step for moving between values
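
The field rules listed above lend themselves to a quick pre-flight check before a manifest is copied into the shop directory. The sketch below is an assumption-laden illustration, not code taken from Zoe: ``check_parameter`` is a hypothetical helper that treats the first six fields as always required and ``min``, ``max`` and ``step`` as required only for integer parameters.

.. code-block:: python

    REQUIRED_PARAM_KEYS = {"kind", "name", "readable_name", "description", "type", "default"}
    INTEGER_KEYS = {"min", "max", "step"}
    KINDS = {"service_count", "command", "environment"}


    def check_parameter(param):
        """Return a list of problems found in one manifest parameter dict."""
        problems = []
        required = set(REQUIRED_PARAM_KEYS)
        if param.get("type") == "integer":
            required |= INTEGER_KEYS  # min, max and step are mandatory for integers
        for key in sorted(required - set(param)):
            problems.append("missing key: " + key)
        if param.get("kind") not in KINDS:
            problems.append("unknown kind: %r" % param.get("kind"))
        if param.get("type") not in ("string", "integer"):
            problems.append("unknown type: %r" % param.get("type"))
        return problems


    # The "command" parameter from the sample manifest above.
    command_param = {
        "kind": "command",
        "name": "tf-jupyter",
        "readable_name": "Command",
        "description": "The Python script to run, relative to the workspace directory",
        "type": "string",
        "default": "./my-tf-app/main.py",
    }
    print(check_parameter(command_param))

An empty list means the parameter satisfies the rules checked here; it does not guarantee that Zoe will accept the manifest.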

Parameters can be of the following kinds:

* environment : the parameter is passed as an environment variable. The name of the environment variable is stored in the ``name`` field. The JSON description is modified by setting the user-defined value in the environment variable with the corresponding name. All services that have the variable defined are modified.
* command : the service named ``name`` has its start-up command changed to the user-defined value
* service_count : the service named ``name`` has its total_count and essential_count changed to the user-defined value
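
The effect of each parameter kind can be pictured as a small transformation of the JSON description. The following sketch is illustrative only and is not Zoe's implementation: ``apply_parameter`` is a hypothetical helper, and the description layout (a ``services`` list whose items carry ``name``, ``command``, ``total_count``, ``essential_count`` and an ``environment`` list of ``[name, value]`` pairs) is an assumption made for the example.

.. code-block:: python

    import copy


    def apply_parameter(description, kind, name, value):
        """Return a copy of ``description`` with one shop parameter applied."""
        if kind not in ("environment", "command", "service_count"):
            raise ValueError("unknown parameter kind: %r" % kind)
        result = copy.deepcopy(description)  # leave the caller's dict untouched
        for service in result["services"]:
            if kind == "environment":
                # Update the variable in every service that already defines it.
                for pair in service.get("environment", []):
                    if pair[0] == name:
                        pair[1] = value
            elif service["name"] == name:
                if kind == "command":
                    service["command"] = value
                else:  # service_count
                    service["total_count"] = value
                    service["essential_count"] = value
        return result


    # Hypothetical two-service description used only for this example.
    description = {
        "services": [
            {"name": "tf-jupyter", "command": None, "environment": [["NB_USER", "zoe"]],
             "total_count": 1, "essential_count": 1},
            {"name": "worker", "command": "run.sh", "environment": [],
             "total_count": 2, "essential_count": 2},
        ]
    }
    updated = apply_parameter(description, "command", "tf-jupyter", "./my-tf-app/main.py")
    print(updated["services"][0]["command"])

Working on a deep copy keeps the stored description unchanged, so the same ZApp can be started again with different user-supplied values.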

By default, users with the ``user`` and ``admin`` roles also have access to resource parameters via the web interface: they can set the amount of memory and the number of cores to reserve before starting their execution. The configuration option ``no-user-edit-limits-web`` can be used to disable access to this feature.

To get started, the ``contrib/zapp-shop-sample/`` directory contains a sample of the structure needed for a working ZApp shop, including some data-science-related ZApps. Copy it as-is into your ZApp shop directory to have some ZApps to play with.

Example of distributed environment
----------------------------------

For running heavier workloads and distributed applications, you need a real container cluster. In this example we will use the DockerEngine back-end, as it is simpler to set up than Kubernetes.

Software:

* One or more Docker Engines
* Zoe
* NFS (or another distributed filesystem like CephFS)
* A PostgreSQL server

Topology:

* One node running Zoe. Depending on how your users will access the workspaces, you may want to add `gateway containers <https://github.com/DistributedSystemsGroup/gateway-containers>`_ for SSH and/or SOCKS proxies on this node.
* At least one worker node with a Docker Engine
* A file server running NFS: depending on the workload, it can be co-located with Zoe
* A PostgreSQL server: again, it can be co-located depending on your expected load

To configure container networking, we suggest the standard Docker multi-host networking.

In this configuration Zoe expects the network filesystem to be mounted in the same location on all worker nodes. This location is specified in the ``workspace-base-path`` Zoe configuration item. Under it, Zoe will create a directory named after the ``deployment-name`` option by default, or after ``workspace-deployment-path`` if specified. Inside that directory, a new directory will be created for each user accessing Zoe.
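
The resulting directory layout can be sketched in a few lines of Python. This is only an illustration of the rule described above, not Zoe's code: the ``workspace_path`` helper and the sample values are hypothetical, and it assumes that the per-user directory is named after the user name.

.. code-block:: python

    from pathlib import PurePosixPath


    def workspace_path(base_path, deployment_name, username, deployment_path=None):
        """Build the per-user workspace directory following the rule above."""
        # workspace-deployment-path, when set, replaces deployment-name.
        subdir = deployment_path if deployment_path else deployment_name
        return PurePosixPath(base_path) / subdir / username


    print(workspace_path("/mnt/zoe-workspaces", "prod", "alice"))
    print(workspace_path("/mnt/zoe-workspaces", "prod", "alice", deployment_path="shared"))

Checking that this path exists and is writable on every worker node is a useful test before starting the first ZApp.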

.. _test-install-label:

Stand-alone environment for development and testing
---------------------------------------------------

A simple deployment for development and testing is possible with just:

* A Docker Engine
* Zoe

In the root of the repository you can find a ``docker-compose.yml`` file that should help get you started.

You will need to create a ``/etc/zoe`` directory containing the ``docker.conf`` file that lists the Docker engine nodes available to Zoe.

.. _manual-install-label:

Manual install (recommended for production)
-------------------------------------------

This section shows how to install the components of the distributed environment outlined above. Many other options and possibilities exist for deploying Zoe.

Requirements
^^^^^^^^^^^^

* Python 3.4 or later
* One or more Docker Engines
* A shared filesystem, mounted on all Docker hosts

Optional:

* A logging pipeline able to receive GELF-formatted logs, or a Kafka broker

Docker Engine
^^^^^^^^^^^^^

Install Docker:

* https://docs.docker.com/installation/ubuntulinux/

Network configuration
^^^^^^^^^^^^^^^^^^^^^

Docker 1.9/Swarm 1.0 multi-host networking can be used in Zoe:

* https://docs.docker.com/engine/userguide/networking/get-started-overlay/

This means that you will also need a key-value store supported by Docker. We use ZooKeeper: it is available in Debian and Ubuntu without the need for external package repositories and is very easy to set up.

Images: Docker Hub vs local Docker registry
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A few sample ZApps have their images available on the Docker Hub. Images can be pulled manually (or via a CI pipeline) on all the worker nodes.

A Docker Registry becomes worthwhile if you have a lot of image build activity and you need to keep track of who builds what, establish ACLs, etc.

Zoe
^^^

Zoe is written in Python and uses the ``requirements.txt`` file to list the package dependencies needed for all components of Zoe. Not all of them are needed in all cases; for example, you need the ``pykube`` library only if you use the Kubernetes back-end.

Currently this is the recommended procedure, once the initial Docker Engine setup has been done:

1. Clone the Zoe repository
2. Install Python package dependencies: ``pip3 install -r requirements.txt``
3. Create new configuration files for the master and the api processes (:ref:`config_file`); you will also need access to a PostgreSQL database
4. Set up supervisor to manage Zoe processes: in the ``contrib/supervisor/`` directory you can find the configuration file for supervisor. You need to modify the paths to point to where you cloned Zoe and set the user (Zoe does not need special privileges).
5. Start running ZApps!

If you run into trouble, check the logs for errors. Zoe's basic functionality can be tested via the ``zoe.py stats`` command. It queries the ``zoe-api`` process, which in turn queries the ``zoe-master`` process.

.. _api-manager-label:

API Managers
------------

To provide TLS termination, authentication, load balancing, metrics, and other services to the Zoe API, you can use an API manager in front of the Zoe API. For example:

* Tyk: https://tyk.io/tyk-documentation/get-started/with-tyk-on-premise/
* Kong: https://getkong.org/docs/0.10.x/proxy/