Commit 9f1a8716 authored by Daniele Venzano's avatar Daniele Venzano

Update the documentation, second part

parent 06fbad59
version: '2'
services:
  postgres:
    image: postgres:9.3
  gateway-socks:
    image: zoerepo/gateway-socks
    networks:
      - zoe
  zoe-api:
    image: zoerepo/zoe-test
    command: python3 zoe-api.py --debug --backend DockerEngine --backend-docker-config-file /etc/zoe/docker.conf --deployment-name compose --master-url tcp://zoe-master:4850 --dbuser postgres --dbhost postgres --dbname postgres
    ports:
      - "8080:5001"
    depends_on:
      - postgres
  zoe-master:
    image: zoerepo/zoe-test
    ports:
      - "4850:4850"
    volumes:
      - /etc/zoe:/etc/zoe
      - /opt/zoe-workspaces:/mnt/zoe-workspaces
    command: python3 zoe-master.py --debug --backend DockerEngine --backend-docker-config-file /etc/zoe/docker.conf --deployment-name compose --dbuser postgres --dbhost postgres --dbname postgres
    depends_on:
      - zoe-api
    networks:
@@ -5,13 +5,13 @@ Architecture
The main Zoe components are:

* zoe master: the core component that performs application scheduling and talks to the container back-end
* zoe api: the Zoe frontend, offering a web interface and a REST API
* command-line clients (zoe.py and zoe-admin.py)
The Zoe master is the core component of Zoe and communicates with the clients using an internal ZeroMQ-based protocol. This protocol is designed to be robust, following the best practices from the ZeroMQ documentation. A crash of the API or of the Master process will not leave the other component inoperable, and when the faulted process restarts, work will resume where it left off.
In this architecture all application state is kept in a Postgres database. Platform state is kept in memory and rebuilt at start time. A lot of care and tuning has been spent in keeping Zoe's vision of the system and the real back-end state synchronised. In a few cases containers may be left orphaned: when Zoe deems it safe, they will be automatically cleaned up; otherwise a warning will be generated in the logs and the administrator has to examine the situation, as it usually points to a bug hidden somewhere in the back-end code.
Users submit *execution requests*, composed of a name and an *application description*. The frontend process (Zoe API) informs the Zoe Master that a new execution request is available for execution.
Inside the Master, a scheduler keeps track of available resources and execution requests, and applies a
@@ -13,7 +13,7 @@ Development repository
----------------------
Development happens at `Eurecom's GitLab repository <https://gitlab.eurecom.fr/zoe/main>`_. The GitHub repository is a read-only mirror.
The choice of GitLab over GitHub is due to the CI pipeline that we set up to test Zoe. Please note that issue tracking happens on GitHub.
Bug reports and feature requests
--------------------------------
@@ -22,10 +22,10 @@ Bug reports and feature requests are handled through the GitHub issue system at:
The issue system should be used only for bug reports or feature requests. If you have more general questions, need help setting up Zoe or running some application, please use the mailing list.
Mailing list
------------
The mailing list: `http://www.freelists.org/list/zoe <http://www.freelists.org/list/zoe>`_
Use the mailing list to stay up to date with what other developers are working on and to discuss and propose your ideas. We prefer small and incremental contributions, so it is important to keep in touch with the rest of the community to receive feedback. This way your contribution will be accepted much more easily.
.. _main_index:
Zoe Analytics - Container-based Analytics as a Service
======================================================
Zoe Analytics provides a simple way to provision data analytics applications. It hides the complexities of managing resources, configuring and deploying complex distributed applications on private clouds. Zoe is focused on data analysis applications, such as `Spark <http://spark.apache.org/>`_ or `TensorFlow <https://www.tensorflow.org/>`_. A generic, very flexible application description format lets you easily describe any kind of data analysis application.
Downloading
-----------
@@ -27,9 +27,9 @@ Now you can check if you are up and running with this command::
It will return some version information, by querying the zoe-api and zoe-master processes.
Zoe applications are passed as JSON files. A few sample ZApps are available in the ``contrib/zapp-shop-sample/`` directory. To start a ZApp use the following command::
   ./zoe.py start joe-spark-notebook contrib/zapp-shop-sample/jupyter-r/r-notebook.json
ZApp execution status can be checked this way::
@@ -38,6 +38,7 @@ ZApp execution status can be checked this way::
Where ``execution id`` is the ID of the ZApp execution to inspect, taken from the ``exec-ls`` command.
Or you can just connect to the web interface (port 5001 by default).
Where to go from here
---------------------
@@ -7,8 +7,8 @@ If you are looking for the five-minutes install procedure, just for testing, che
When installing Zoe for production you should, first of all, look at the following requirements and take a decision about each of them:
* Container back-end: Kubernetes or DockerEngine
* Shared filesystem: we have deployments on NFS and CephFS, but anything similar should work
* Network: how your users will connect to the containers
* Authentication back-end: how your users will authenticate to Zoe (LDAP or text file)
* How to manage Zoe Applications (ZApps)
@@ -21,9 +21,64 @@ Choosing the container back-end
At this time Zoe supports three back-ends:
* DockerEngine: uses one or more Docker Engines. It is simple to install and to scale.
* Kubernetes: the most complex to set up; we suggest using it only if you already have (or need) a Kubernetes setup for running other software.
* Legacy Docker Swarm (deprecated): simple to install; additional features like SSL, high availability and dynamic host discovery can be added as needed. Please note that Zoe does not support the new Swarm mode of Docker Engine, as its API is too limited.
DockerEngine
^^^^^^^^^^^^
The DockerEngine back-end uses one or more nodes with Docker Engine installed and configured to listen for network requests.
The Docker Engines must be configured to enable `multi-host networking <https://docs.docker.com/engine/userguide/networking/overlay-standalone-swarm/>`_.
This sample config file, usually found in ``/etc/docker/daemon.json``, may help to get you started::
   {
      "dns": ["192.168.46.1"],
      "dns-search": ["bigfoot.eurecom.fr"],
      "tlsverify": true,
      "tlscacert": "/mnt/certs/cacert.pem",
      "tlscert": "/mnt/certs/cert.pem",
      "tlskey": "/mnt/certs/key.pem",
      "hosts": ["tcp://bf11.bigfoot.eurecom.fr:2375", "unix:///var/run/docker.sock"]
   }
Once you have your docker hosts up and running, to tell the back-end which nodes are available and how to connect to them, you need to create a file with this format::
   [DEFAULT]
   use_tls: no
   tls_cert: /mnt/cephfs/admin/cert-authority/container-router/cert.pem
   tls_key: /mnt/cephfs/admin/cert-authority/container-router/key.pem
   tls_ca: /mnt/cephfs/admin/cert-authority/ca.pem

   [foo]
   address: localhost:2375
   external_address: 192.168.45.42

   [bar]
   docker_address: 192.168.47.5:2375
   external_address: 192.168.47.5
   use_tls: yes

   [baz]
   docker_address: 192.168.47.50:2375
   external_address: 192.168.47.50
   use_tls: yes
   labels: gpu,ssd
This sample configuration describes three hosts. The DEFAULT section contains items that are common to all hosts; any of these entries can be overridden in a host definition.
Host ``foo`` does not use TLS (from the default config item), Zoe needs to connect to localhost on port 2375 to talk to it and users connecting to containers running on this host need to use the ``192.168.45.42`` address to connect. This ``external_address`` will be used by Zoe to generate links in the web interface.
Host ``bar`` uses TLS, and host ``baz`` also has two labels that can be matched when starting services with the corresponding labels. Labels are comma-separated.
You tell Zoe the location of this file using the ``backend-docker-config-file`` option in zoe.conf.
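For example, with the hosts file above saved as ``/etc/zoe/docker.conf`` (the path used throughout this page), the relevant ``zoe.conf`` entries mirror the command-line options shown in the compose file earlier; the exact syntax may vary with your deployment, so treat this as a sketch:

```ini
# Select the DockerEngine back-end and point it at the hosts file.
backend = DockerEngine
backend-docker-config-file = /etc/zoe/docker.conf
```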
Kubernetes
^^^^^^^^^^
See :ref:`kube-backend` for configuration details.
Shared filesystem
-----------------
@@ -39,7 +94,7 @@ At Eurecom we use CephFS, but we know of successful Zoe deployments based on NFS
Networking
----------
Most of the ZApps expose a number of interfaces (web, REST and others) to the user. Zoe configures the active back-end to expose these ports, but does not perform any additional action to configure routing or DNS to make the ports accessible. Keeping in mind that the back-end network configuration is outside Zoe's competence area, here is a non-exhaustive list of the possible configurations:
* expose the hosts running the containers by using public IP addresses
* use a proxy, like the one developed for Zoe: :ref:`proxy`
@@ -56,8 +111,8 @@ Zoe has a simple user model: users are authenticated against an external source
Zoe supports two authentication back-ends:
* LDAP and LDAP+SASL (``auth-type=ldap`` or ``auth-type=ldapsasl``)
* Text file (``auth-type=text``)
Like most of Zoe, the authentication back-end is pluggable and others can easily be implemented.
@@ -65,9 +120,16 @@ LDAP
^^^^
Plain LDAP or LDAP+SASL GSSAPI are available.
The implementation will use the ``uid`` as the user_id and ``gidNumber`` to decide the role.
In the Zoe configuration you need to specify the mapping between group IDs and roles, using the following options:

* ``ldap-server-uri``
* ``ldap-bind-user``
* ``ldap-bind-password``
* ``ldap-base-dn``
* ``ldap-admin-gid``
* ``ldap-user-gid``
* ``ldap-guest-gid``
* ``ldap-group-name``
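As an illustration, these options could look like this in ``zoe.conf``. All values below are placeholders for an imaginary ``example.com`` directory, not defaults, and the gid values are whatever group numbers map to each role in your LDAP tree:

```ini
ldap-server-uri = ldap://ldap.example.com
ldap-bind-user = cn=zoe,ou=services,dc=example,dc=com
ldap-bind-password = changeme
ldap-base-dn = ou=people,dc=example,dc=com
# Mapping between gidNumber values and Zoe roles:
ldap-admin-gid = 5000
ldap-user-gid = 5001
ldap-guest-gid = 5002
```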
Text file
^^^^^^^^^
@@ -82,7 +144,7 @@ The file location can be specified in the ``zoe.conf`` file and it needs to be r
Managing Zoe applications
-------------------------
At the very base, ZApps are composed of a container image and a JSON description. The container image can be stored on the Docker nodes, in a local private registry, or in a public one, accessible via the Internet.
Zoe does not provide a way to automatically build images, push them to a local registry, or pull them to the hosts when needed. At Eurecom we provide an automated environment based on GitLab's CI features: users can customize their applications (JSON and Dockerfiles) by working on git repositories. Images are rebuilt and pushed on commit, and JSON files are generated and copied to the ZApp shop directory. You can check out how we do it here:
https://gitlab.eurecom.fr/zoe-apps
@@ -99,7 +161,7 @@ The shop is managed locally. It looks for ZApps in a configured directory (optio
* one or more text files in markdown format with ZApp information and documentation
* one or more JSON Zoe application descriptions
The ``manifest.json`` file gathers all this information together for the ZApp Shop interface. Its format is as follows::
{
"version": 1,
@@ -127,30 +189,16 @@ The ``manifest.json`` file drives the ZApp Shop. Its format is as follows::
}
],
"guest_access": true
}
]
}
* version : an internal version, used by Zoe to recognize the manifest format. For now only 1 is supported.
* zapps : a list of ZApps that have to be shown in the shop
For each ZApp:
* category : the category this ZApp belongs to; it is used to group ZApps in the web interface. There are no pre-defined categories and you are free to put anything you want here
* name : the human-readable name
* description : the name of the json file with the Zoe description
* readable_descr : the name of the markdown file containing user documentation for the ZApp
@@ -167,46 +215,41 @@ Parameters are values of the JSON description that are modified at run time.
* description : a helpful description
* type : string or integer, used for basic validation
* default : the default value
* max : if ``type`` is integer, this is required and is the maximum value the user can set
* min : if ``type`` is integer, this is required and is the minimum value the user can set
* step : if ``type`` is integer, this is required and is the step for moving between values
Parameters can be of two kinds:
* environment : the parameter is passed as an environment variable. The name of the environment variable is stored in the ``name`` field. The JSON description is modified by setting the user-defined value in the environment variable with the corresponding name. All services that have the variable defined are modified.
* command : the service named ``name`` has its start-up command changed to the user-defined value
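The effect of an ``environment`` parameter can be sketched with a small helper. This is a hypothetical illustration, not Zoe's actual code, and it assumes the description stores environment variables as ``[name, value]`` pairs per service:

```python
def apply_env_parameter(description, name, value):
    """Set a user-provided value for an environment parameter.

    Every service that already defines an environment variable called
    ``name`` gets its value replaced; services without it are untouched.
    (Sketch only: the real Zoe implementation differs.)
    """
    for service in description['services']:
        for entry in service.get('environment', []):
            if entry[0] == name:
                entry[1] = value
    return description
```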
By default, users with the ``user`` and ``admin`` roles also have access to resource parameters via the web interface. They can set the amount of memory and cores to reserve before starting their execution. The configuration option ``no-user-edit-limits-web`` can be used to disable access to this feature.
To get started, the ``contrib/zapp-shop-sample/`` directory contains a sample of the structure needed for a working ZApp shop, including some data-science-related ZApps. Copy it as-is into your ZApp shop directory to have some ZApps to play with.
Example of distributed environment
----------------------------------
For running heavier workloads and distributed applications, you need a real container cluster. In this example we will use the DockerEngine back-end, as it is simpler to set up than Kubernetes.
Software:

* One or more Docker Engines
* Zoe
* NFS (or another distributed filesystem like CephFS)
* A Postgresql server
Topology:

* One node running Zoe. Depending on how your users will access the workspaces, you may want to add `gateway containers <https://github.com/DistributedSystemsGroup/gateway-containers>`_ for SSH and/or SOCKS proxies on this node.
* At least one worker node with a Docker Engine
* A file server running NFS: depending on the workload, it can be co-located with Zoe
* A Postgresql server: again, it can be co-located depending on your expected load
* A Docker registry (optional, but strongly suggested)

To configure container networking, we suggest the standard Docker multi-host networking.
In this configuration Zoe expects the network filesystem to be mounted in the same location on all worker nodes. This location is specified in the ``workspace-base-path`` Zoe configuration item. Zoe will create a directory under it named as ``deployment-name`` by default or ``workspace-deployment-path`` if specified. Under it a new directory will be created for each user accessing Zoe.
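The resulting directory layout can be sketched with a small helper (illustrative only, not Zoe's actual code):

```python
import os.path

def workspace_path(workspace_base_path, deployment_name, username,
                   workspace_deployment_path=None):
    """Compute a user's workspace directory following the rule above:
    a directory named after deployment-name (or workspace-deployment-path
    if set) under workspace-base-path, with one sub-directory per user."""
    deployment_dir = workspace_deployment_path or deployment_name
    return os.path.join(workspace_base_path, deployment_dir, username)
```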
.. _test-install-label:
@@ -218,18 +261,9 @@ A simple deployment for development and testing is possible with just:
* A Docker Engine
* Zoe
Docker compose
^^^^^^^^^^^^^^
In the root of the repository you can find a ``docker-compose.yml`` file that should help get you started.
You will need to create a ``/etc/zoe`` directory containing the ``docker.conf`` file that lists the Docker engine nodes available to Zoe.
.. _manual-install-label:
@@ -241,22 +275,20 @@ This section shows how to install the components outlined in the distributed env
Requirements
^^^^^^^^^^^^
* Python 3.4 or later
* One or more Docker Engines
* A shared filesystem, mounted on all Docker hosts.
Optional:
* A Docker registry containing Zoe images for faster container startup times
* A logging pipeline able to receive GELF-formatted logs, or a Kafka broker
Docker Engine
^^^^^^^^^^^^^
Install Docker:
* https://docs.docker.com/installation/ubuntulinux/
Network configuration
^^^^^^^^^^^^^^^^^^^^^
@@ -270,7 +302,9 @@ This means that you will also need a key-value store supported by Docker. We use
Images: Docker Hub vs local Docker registry
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
A few sample ZApps have their images available on the Docker Hub. Images can be pulled manually (or via a CI pipeline) on all the worker nodes.
A Docker Registry becomes interesting to have if you have a lot of image build activity and you need to keep track of who builds what, establish ACLs, etc.
Zoe
^^^
@@ -9,50 +9,8 @@ All contributions to the codebase are centralised into an internal repository at
GitHub has been configured to protect the master branch on the `Zoe repository <https://github.com/DistributedSystemsGroup/zoe>`_. It will accept only pushes that are marked with a status check. This, together with Jenkins pushing only successful builds, guarantees that the quality of the published code respects our standards.
The CI pipeline in detail
-------------------------
Different contributors use different software for managing the CI pipeline.
* :ref:`ci-jenkins`
* :ref:`ci-gitlab`
Refer to the :ref:`integration-test` documentation for details on integration testing.
.. _vision:
The vision for Zoe Analytics
============================
Zoe's focus is data analytics. This focus helps define a clear set of objectives and priorities for the project and avoids the risk of competing directly with generic infrastructure managers like Kubernetes or Swarm. Zoe instead sits on top of these "cloud managers" to provide a simpler interface to end users who have no interest in the intricacies of container infrastructures.
Data analytics applications do not work in isolation. They need data, which may be stored or streamed, and they generate logs, which may have to be analyzed for debugging or stored for auditing. Data layers, in turn, need health monitoring. All these tools, frameworks, distributed filesystems and object stores form a galaxy that revolves around analytic applications. For simplicity we will call these "support applications".
To offer an integrated, modern solution to data analysis Zoe must support both analytic and support applications, even if they have very different users and requirements.
The users of Zoe
----------------
- users: the real end-users, who submit non-interactive analytic jobs or use interactive applications
- admins: systems administrators who maintain Zoe, the data layers, monitoring and logging for the infrastructure and the apps being run
Deviation from the current ZApp terminology
-------------------------------------------
In the current Zoe implementation ZApps are self-contained descriptions of a set of cooperating processes. They get submitted once, to request the start-up of an execution. This fits well the model of a single spark job or of a throw-away jupyter notebook.
ZApps are at the top level, they are the user-visible entity. The ZApp building blocks, analytic engines or support tools, are called frameworks. A framework, by itself, cannot be run. It lacks configuration or a user-provided binary to execute, for example. Each framework is composed of one or more processes, that come with some metadata of the needed configuration and a container image.
A few examples:
- A Jupyter notebook, by itself, is a framework in Zoe terminology. It lacks configuration that tells it which kernels to enable or on which port to listen. It is a framework composed of just one process.
- A Spark cluster is another framework. By itself it does nothing. It can be connected to a notebook, or it can be given a jar file and some data to process.
- A "Jupyter with R listening on port 8080" ZApp is a Jupyter framework that has been configured with certain options and made runnable
To create a ZApp you need to put together one or more frameworks and add some configuration (framework-dependent) that tells them how to behave.
A ZApp shop could contain both frameworks (that the user must combine together) and full-featured ZApps.
Nothing prevents certain ZApp attributes from being "templated", like resource requests or elastic process counts, for example.
Zoe does not focus on support applications. Managing a stable and fault tolerant HDFS cluster, for example, is a task for tools like Puppet, Chef or Ansible and is done by system administrators. Zoe, instead, targets data scientists, that need to use a cluster infrastructure, but do not usually have sysadmin skills.
Kinds of applications
---------------------
See :ref:`zapp_classification`.
Architecture
------------
.. image:: /figures/extended_arch.png
This architecture extends the current one by adding a number of pluggable modules that implement additional (and optional) features. The additional modules, in principle, are created as more Zoe frameworks, that will be instantiated in ZApps and run through the scheduler together with all the other ZApps.
The core of Zoe remains very simple to understand, while opening the door to more capabilities. The additional modules will have the ability of communicating with the backend, submit, modify and terminate executions via the scheduler and report information back to the user. Their actions will be driven by additional fields written in the ZApps descriptions.
In the figure there are three examples of such modules:
- Zoe-HA: monitors and performs tasks related to high availability, both for Zoe components and for running user executions. This module could take care of maintaining a certain replication factor, making sure a certain service is restarted in case of failures, or updating a load balancer or a DNS entry
- Zoe rolling upgrades: performs rolling upgrades of long-running ZApps
- Workflows (see the section below)
Other examples could be:
- Zoe-storage: for managing volumes and associated constraints
The modules should try to re-use as much as possible the functionality already available in the backends. A simple Zoe installation could run on a single Docker engine available locally and provide a reduced set of features, while a full-fledged install could run on top of Kubernetes and provide all the large-scale deployment features that such platform already implements.
Workflows
^^^^^^^^^
A few examples of workflows:
- run a job every hour (triggered by time)
- run a set of jobs in series or in parallel (triggered by the state of other jobs)
- run a job whenever the size of a file/directory/bucket on a certain data layer reaches 5GB (more complex triggers)
- combinations of all the above
A complete workflow system for data analytics is very complex. There is a lot of theory on this topic and zoe-workflow is a project of the same size as Zoe itself. An example of a full-featured workflow system for Hadoop is http://oozie.apache.org/
@@ -136,6 +136,12 @@ Resources that need to be reserved for this service. Each resource is specified
Support for this feature depends on the scheduler and back-end in use.
The Elastic scheduler together with the DockerEngine back-end will behave in the following way:
* the "max" limits are ignored in the JSON description; the ones set in the Zoe configuration file are used instead
* for memory, a soft limit will be set at the "min" resource level and a hard limit at the amount set in the ``max-memory-limit`` option. See the Docker documentation for the exact definitions of soft and hard limits
* for cores, they will be allocated dynamically and automatically: a service that has ``cores.min`` set to 4 will have at least 4 cores. If more are available on the node it is running on, more will be given
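For example, a service description fragment requesting such resources might look like this. The values and the surrounding ``name`` field are illustrative only; the min/max layout is assumed from the description above:

```json
{
    "name": "spark-worker",
    "resources": {
        "memory": {"min": 4294967296, "max": null},
        "cores": {"min": 4, "max": null}
    }
}
```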
startup_order
^^^^^^^^^^^^^
@@ -157,6 +163,13 @@ string
This entry is optional. If set, its value will be used as the work directory where the command is executed.
labels
^^^^^^
array of strings
This entry is optional. Labels will be used by the scheduler to take placement decisions for services. Services that have labels "ssd" and "gpu" will be placed only on hosts declared with both labels. If no hosts with "ssd" and "gpu" are available, the ZApp is left in the queue.
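The placement rule described above amounts to a subset check, roughly as follows (an illustrative sketch, not the actual scheduler code):

```python
def eligible_hosts(service_labels, hosts):
    """Return the hosts a service may be placed on: every label the
    service requests must be declared on the host. ``hosts`` maps a
    host name to the set of labels declared for it."""
    required = set(service_labels)
    return [name for name, labels in hosts.items() if required <= set(labels)]
```

If the returned list is empty, no host satisfies the labels and the ZApp stays in the queue.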
ports
^^^^^