Commit 947fb0af authored by Daniele Venzano's avatar Daniele Venzano

Documentation updates

parent a939e9b3
......@@ -11,18 +11,18 @@ The main Zoe Components are:
The Zoe Master is the core component of Zoe and communicates with the clients through an internal ZeroMQ-based protocol. This protocol is designed to be robust, following best practices from the ZeroMQ documentation. A crash of the API or of the Master process will not leave the other component inoperable, and when the faulted process restarts, work will resume where it left off.
In this architecture all application state is kept in a Postgres database. Platform state is kept in-memory and rebuilt at start time. A lot of care and tuning has been spent in keeping the vision Zoe has of the system and the real back-end state synchronised. In a few cases containers may be left orphaned: when Zoe deems it safe, they will be automatically cleaned up; otherwise a warning will be generated in the logs and the administrator has to examine the situation as, usually, it points to a bug hidden somewhere in the back-end code.
In this architecture all application state is kept in a Postgres database. Platform state is kept in memory: it is built at start time and refreshed periodically. A lot of care and tuning has gone into keeping the view Zoe has of the system synchronized with the real back-end state. In a few cases containers may be left orphaned: when Zoe deems it safe, they will be automatically cleaned up; otherwise a warning will be generated in the logs and the administrator has to examine the situation as, usually, it points to a bug hidden somewhere in the back-end code.
Users submit *execution requests*, composed of a name and an *application description*. The frontend process (zoe-api) informs the Zoe Master that a new execution request is available for execution.
Inside the Master, a scheduler keeps track of available resources and execution requests, and applies a
scheduling policy to decide which requests should be satisfied as soon as possible and which ones can be deferred for later.
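The scheduling loop just described can be sketched in a few lines of Python; the FIFO policy and all names below are illustrative assumptions, not Zoe's actual scheduler implementation:

```python
import heapq
import itertools

# Sketch of a scheduler that tracks free resources and queued execution
# requests, starting the ones that fit and deferring the rest.
class Scheduler:
    def __init__(self, total_memory_gb):
        self.free_memory_gb = total_memory_gb
        self._queue = []               # (priority, seq, request)
        self._seq = itertools.count()  # tie-breaker keeps FIFO order

    def submit(self, request, priority=0):
        heapq.heappush(self._queue, (priority, next(self._seq), request))

    def run_ready(self):
        """Start every queued request that fits in the free resources."""
        started, deferred = [], []
        while self._queue:
            prio, seq, req = heapq.heappop(self._queue)
            if req["memory_gb"] <= self.free_memory_gb:
                self.free_memory_gb -= req["memory_gb"]
                started.append(req["name"])
            else:
                deferred.append((prio, seq, req))  # not enough room: defer
        for item in deferred:
            heapq.heappush(self._queue, item)
        return started

sched = Scheduler(total_memory_gb=64)
sched.submit({"name": "spark-job", "memory_gb": 48})
sched.submit({"name": "notebook", "memory_gb": 32})
print(sched.run_ready())  # ['spark-job'] -- notebook is deferred for now
```

The real scheduler considers more dimensions (cores, per-user limits) and re-evaluates the queue whenever resources are freed.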
The master also talks to a container orchestrator (Docker Swarm for example) to create and destroy containers and to read monitoring information used to schedule applications.
The master also talks to a container orchestrator (Docker for example) to create and destroy containers and to read monitoring information used to schedule applications.
Application descriptions
------------------------
Application descriptions are at the core of Zoe. They are likely to evolve in time, to satisfy the needs of new distributed analytic engines. The current version is built around several use cases involving MPI, Spark and Jupyter notebooks.
Application descriptions are composed of a set of generic attributes that apply to the whole Zoe Application (abbreviated as ZApp) and a list of Zoe Frameworks. Each Framework is composed of Zoe Services, which describe actual Docker containers. The composition of Frameworks and Services is described by a dependency tree.
Application descriptions are composed of a set of generic attributes that apply to the whole Zoe Application (abbreviated as ZApp) and a list of services. Zoe Services describe actual Docker containers.
The Zoe Service descriptions are strictly linked to the Docker images they use, as they specify environment variables and commands to be executed. We successfully used third party images, demonstrating the generality of Zoe's approach.
The Zoe Service descriptions are strictly linked to the Docker images they use, as they specify environment variables and commands to be executed. We successfully used unmodified third party images, demonstrating the generality of Zoe's approach.
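To make the structure concrete, a minimal ZApp description could look like the following sketch; the field names here are hypothetical illustrations, not Zoe's actual JSON schema::

    {
        "name": "notebook",
        "services": [
            {
                "name": "jupyter",
                "image": "registry.example.com/zoe/jupyter:latest",
                "environment": [["NB_USER", "zoe"]],
                "ports": [8888]
            }
        ]
    }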
......@@ -17,7 +17,7 @@ zoe.conf
--------
The Zoe config file has a simple format of ``<option name> = <value>``. Dash characters can be used for comments.
All Zoe processes use one single configuration file, called zoe.conf. It is searched in the current working directory and in ``/etc/zoe/``.
All Zoe processes use a single configuration file, called zoe.conf. It is looked up in the current working directory and in ``/etc/zoe/``.
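As an illustration of the format, a zoe.conf fragment using options documented in this page might look like this (the values are examples only)::

    kairosdb-enable = false
    kairosdb-url = http://localhost:8090
    api-listen-uri = tcp://*:4850
    max-core-limit = 16
    max-memory-limit = 64
    backend-docker-config-file = docker.conf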
Common options:
......@@ -34,6 +34,8 @@ Metrics:
* ``kairosdb-enable = false`` : Enable gathering of usage metrics recorded in KairosDB
* ``kairosdb-url = http://localhost:8090`` : URL of KairosDB REST API
* ``influxdb-enable = false`` : Enable gathering of usage metrics recorded in InfluxDB
* ``influxdb-url = http://localhost:8086`` : URL of InfluxDB REST API
Service logs (see: :ref:`logging`):
......@@ -60,7 +62,6 @@ API options:
Master options:
* ``api-listen-uri = tcp://*:4850`` : ZeroMQ server connection string, used for the master listening endpoint
* ``overlay-network-name = zoe`` : name of the pre-configured Docker overlay network Zoe should use (Swarm backend)
* ``max-core-limit = 16`` : maximum amount of cores a user is able to reserve
* ``max-memory-limit = 64`` : maximum amount of memory a user is able to reserve
* ``additional-volumes = <none>`` : list of additional volumes to mount in every service, for every ZApp (ex. /mnt/data:data,/mnt/data_n:data_n)
......@@ -94,6 +95,7 @@ Kubernetes back-end:
DockerEngine back-end:
* ``backend-docker-config-file = docker.conf`` : name of the DockerEngine back-end configuration file
* ``overlay-network-name = zoe`` : Name of the Docker network Zoe should use (can be overridden in the ZApp definition)
Proxy options:
......
......@@ -29,27 +29,25 @@ DockerEngine
The DockerEngine back-end uses one or more nodes with Docker Engine installed and configured to listen to network requests.
The Docker Engines must be configured to enable `multi host networking <https://docs.docker.com/engine/userguide/networking/overlay-standalone-swarm/>`_.
This sample config file, usually found in ``/etc/docker/daemon.json``, may help to get you started::
{
"dns": ["192.168.46.1"],
"dns-search": ["bigfoot.eurecom.fr"],
"dns-search": ["zoe.example.com"],
"tlsverify": true,
"tlscacert": "/mnt/certs/cacert.pem"
"tlscacert": "/mnt/certs/cert-authority/ca.pem"
"tlscert": "/mnt/certs/cert.pem",
"tlskey": "/mnt/certs/key.pem",
"hosts": ["tcp://bf11.bigfoot.eurecom.fr:2375", "unix:///var/run/docker.sock"]
"hosts": ["tcp://worker1.zoe.example.com:2375", "unix:///var/run/docker.sock"]
}
Once you have your Docker hosts up and running, you need to create a file with this format to tell the back-end which nodes are available and how to connect to them::
[DEFAULT]
use_tls: no
tls_cert: /mnt/cephfs/admin/cert-authority/container-router/cert.pem
tls_key: /mnt/cephfs/admin/cert-authority/container-router/key.pem
tls_ca: /mnt/cephfs/admin/cert-authority/ca.pem
tls_cert: /mnt/certs/client/cert.pem
tls_key: /mnt/certs/client/key.pem
tls_ca: /mnt/certs/cert-authority/ca.pem
[foo]
address: localhost:2375
......@@ -66,7 +64,7 @@ Once you have your docker hosts up and running, to tell the back-end which nodes
use_tls: yes
labels: gpu,ssd
This sample configuration describes three hosts. The DEFAULT section contains items that are common to all hosts, in any case these entries can be overwritten in the host definition.
This sample configuration describes three hosts. The DEFAULT section contains items that are common to all hosts; each entry can be overridden in a host definition.
Host ``foo`` does not use TLS (from the default config item); Zoe connects to it on localhost, port 2375, and users reaching containers running on this host need to use the ``192.168.45.42`` address. This ``external_address`` will be used by Zoe to generate links in the web interface.
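The DEFAULT-section inheritance described above can be observed with Python's standard ``configparser`` module (a sketch with made-up host entries, not Zoe's actual configuration loader):

```python
import configparser

# Hypothetical back-end hosts file: DEFAULT values are inherited by every
# host section unless the section overrides them.
config_text = """
[DEFAULT]
use_tls: no
tls_ca: /mnt/certs/cert-authority/ca.pem

[foo]
address: localhost:2375
external_address: 192.168.45.42

[bar]
address: worker1.zoe.example.com:2375
use_tls: yes
"""

parser = configparser.ConfigParser()
parser.read_string(config_text)

# 'foo' falls back to the DEFAULT value, 'bar' overrides it
print(parser["foo"]["use_tls"])  # no
print(parser["bar"]["use_tls"])  # yes
print(parser["foo"]["tls_ca"])   # inherited from DEFAULT
```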
......@@ -93,6 +91,10 @@ At Eurecom we use CephFS, but we know of successful Zoe deployments based on NFS
Networking
----------
Containers spawned by Zoe need to be able to talk to each other on the network as if they were on the same broadcast domain, even when they run on different hosts. Network configuration is back-end dependent: both Kubernetes and Docker provide their own systems to manage the virtual network between containers.
Docker provides a feature called ``multi host networking``. An alternative that we found more efficient and simpler to set up and maintain is `Flannel <https://github.com/coreos/flannel>`_.
Most of the ZApps expose a number of interfaces (web, REST and others) to the user. Zoe configures the active back-end to expose these ports, but does not perform any additional action to configure routing or DNS to make the ports accessible. Keeping in mind that the back-end network configuration is outside Zoe's scope, here is a non-exhaustive list of possible configurations:
* expose the hosts running the containers by using public IP addresses
......@@ -136,10 +138,9 @@ The file location can be specified in the ``zoe.conf`` file and it needs to be r
Managing Zoe applications
-------------------------
At the very base, ZApps are composed of a container image and a JSON description. The container image can be stored on the Docker nodes, in a local private registry, or in a public one, accessible via the Internet.
ZApps are composed of a container image and a JSON description. The container image can be stored on the Docker nodes, in a local private registry or in the public Docker Hub (or any other public registry).
Zoe does not provide a way to automatically build images, push them to a local registry, or pull them to the hosts when needed. At Eurecom we provide an automated environment based on GItLab's CI features: users are able to customize their applications (JSON and Dockerfiles) by working on git repositories. Images are rebuilt and pushed on commit and JSON files are generated and copied to the ZApp shop directory. You can check out how we do it here:
https://gitlab.eurecom.fr/zoe-apps
Zoe does not provide a way to automatically build images, push them to a local registry, or pull them to the hosts when needed. At Eurecom we provide an automated environment based on GitLab's CI features: users are able to customize their applications (JSON and Dockerfiles) by working on git repositories. Images are rebuilt and pushed on commit and JSON files are generated and copied to the ZApp shop directory. You can check out a few examples here: https://gitlab.eurecom.fr/zoe-apps
The ZApp Shop
^^^^^^^^^^^^^
......@@ -292,34 +293,30 @@ Docker 1.9/Swarm 1.0 multi-host networking can be used in Zoe:
This means that you will also need a key-value store supported by Docker. We use ZooKeeper: it is available in Debian and Ubuntu without external package repositories and is very easy to set up.
An alternative is Flannel. It can be configured to use IP routing without tunneling, which improves performance under heavy workloads. Flannel requires etcd.
* https://github.com/coreos/flannel
Images: Docker Hub Vs local Docker registry
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
A few sample ZApps have their images available on the Docker Hub. Images can be manually (or via a CI pipeline) pulled on all the worker nodes.
The ZApps we use at Eurecom have their images available on the Docker Hub. Images can be manually (or via a CI pipeline) pulled on all the worker nodes.
A Docker Registry becomes interesting to have if you have a lot of image build activity and you need to keep track of who builds what, establish ACLs, etc.
An internal Docker Registry becomes interesting if you have a lot of image build activity and need to keep track of who builds what, establish ACLs, etc.
The simplest way to manage images is to push them to the Docker Hub and pull them onto all the hosts via an automation tool like Ansible.
Zoe
^^^
Zoe is written in Python and uses the ``requirements.txt`` file to list the package dependencies needed for all components of Zoe. Not all of them are needed in all cases; for example, you need the ``pykube`` library only if you use the Kubernetes back-end.
Currently this is the recommended procedure, once the initial Swarm setup has been done:
Currently this is the recommended procedure, once the initial back-end setup has been done:
1. Clone the zoe repository
2. Install Python package dependencies: ``pip3 install -r requirements.txt``
3. Create new configuration files for the master and the api processes (:ref:`config_file`), you will need also access to a postgres database
3. Create new configuration files for the master and the api processes (:ref:`config_file`); you will also need Postgres credentials
4. Set up supervisor to manage the Zoe processes: in the ``contrib/supervisor/`` directory you can find the configuration file for supervisor. You need to modify the paths to point to where you cloned Zoe and set the user the processes will run as (Zoe does not need special privileges).
5. Start running ZApps!
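For step 4, a supervisor program entry might look like the following sketch (the paths and the ``zoe`` user are assumptions for this example, adapt them to your deployment)::

    ; paths below are examples, point them to your Zoe clone
    [program:zoe-api]
    command=/srv/zoe/zoe-api.py
    directory=/srv/zoe
    user=zoe
    autorestart=true

    [program:zoe-master]
    command=/srv/zoe/zoe-master.py
    directory=/srv/zoe
    user=zoe
    autorestart=true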
In case of trouble, check the logs for errors. Zoe's basic functionality can be tested via the ``zoe.py stats`` command. It will query the ``zoe-api`` process, which in turn will query the ``zoe-master`` process.
.. _api-manager-label:
API Managers
------------
To provide TLS termination, authentication, load balancing, metrics, and other services to the Zoe API, you can use an API manager in front of the Zoe API. For example:
* Tyk: https://tyk.io/tyk-documentation/get-started/with-tyk-on-premise/
* Kong: https://getkong.org/docs/0.10.x/proxy/
......@@ -22,10 +22,10 @@ Because of this in Zoe we decided to leave the maximum freedom to administrators
In this case the logs command line, API and web interface will not be operational.
Swarm-only integrated log management
------------------------------------
Docker engine integrated log management
---------------------------------------
When using the Swarm back-end, however, Zoe can configure the containers to produce their output in UDP GELF format and send it to a configured destination, via the ``gelf-address`` option. Each message is enriched with labels to help match each log line to the ZApp and service that produced it.
When using the Docker Engine back-end, Zoe can configure the containers to produce their output in UDP GELF format and send it to a configured destination, via the ``gelf-address`` option. Each message is enriched with labels to help match each log line to the ZApp and service that produced it.
GELF is understood by many tools, like Graylog or the `ELK <https://www.elastic.co/products>`_ stack, and it is possible to store the service output in Elasticsearch and make it searchable via Kibana, for example.
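A GELF message is a JSON document, typically gzip-compressed and sent as a single UDP datagram. The sketch below shows what such a message could look like; the ``_zoe_*`` label names are illustrative assumptions, not the labels Zoe actually emits:

```python
import gzip
import json
import socket

# Hypothetical GELF 1.1 record; fields prefixed with "_" are the additional
# labels that let a log collector match the line to its ZApp and service.
record = {
    "version": "1.1",
    "host": "worker1.zoe.example.com",
    "short_message": "service started",
    "_zoe_execution": "42",       # illustrative label name
    "_zoe_service": "spark-master",
}

# GELF over UDP: the JSON payload is gzip-compressed into one datagram
payload = gzip.compress(json.dumps(record).encode())
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# sock.sendto(payload, ("graylog.example.com", 12201))  # default GELF UDP port
```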
......