Commit 9f1a8716 authored by Daniele Venzano's avatar Daniele Venzano

Update the documentation, second part

parent 06fbad59
 version: '2'
 services:
   postgres:
-    image: postgres
+    image: postgres:9.3
   gateway-socks:
     image: zoerepo/gateway-socks
     networks:
       - zoe
   zoe-api:
-    image: zoerepo/zoe
-    command: python3 zoe-api.py --debug --swarm ${SWARM_URL} --deployment-name compose --master-url tcp://zoe-master:4850 --dbuser postgres --dbhost postgres --dbname postgres
+    image: zoerepo/zoe-test
+    command: python3 zoe-api.py --debug --backend DockerEngine --backend-docker-config-file /etc/zoe/docker.conf --deployment-name compose --master-url tcp://zoe-master:4850 --dbuser postgres --dbhost postgres --dbname postgres
     ports:
       - "8080:5001"
     depends_on:
       - postgres
   zoe-master:
-    image: zoerepo/zoe
+    image: zoerepo/zoe-test
     ports:
       - "4850:4850"
     volumes:
       - /etc/zoe:/etc/zoe
       - /opt/zoe-workspaces:/mnt/zoe-workspaces
-    command: python3 zoe-master.py --debug --swarm ${SWARM_URL} --deployment-name compose --dbuser postgres --dbhost postgres --dbname postgres
+    command: python3 zoe-master.py --debug --backend DockerEngine --backend-docker-config-file /etc/zoe/docker.conf --deployment-name compose --dbuser postgres --dbhost postgres --dbname postgres
     depends_on:
       - zoe-api
 networks:
@@ -5,13 +5,13 @@ Architecture
The main Zoe Components are:
-* zoe master: the core component that performs application scheduling and talks to Swarm
+* zoe master: the core component that performs application scheduling and talks to the container back-end
* zoe api: the Zoe frontend, offering a web interface and a REST API
* command-line clients (zoe.py and zoe-admin.py)
The Zoe Master is the core component of Zoe and communicates with the clients using an internal ZeroMQ-based protocol. This protocol is designed to be robust, following the best practices from the ZeroMQ documentation. A crash of the API or of the Master process will not leave the other component inoperable, and when the faulted process restarts, work resumes where it left off.
-In this architecture all state is kept in a Postgres database. With Zoe we try very hard not to reinvent the wheel and the internal state system we had in the previous architecture iteration was starting to show its limits.
+In this architecture all application state is kept in a Postgres database. Platform state is kept in memory and rebuilt at start time. A lot of care and tuning has gone into keeping Zoe's view of the system synchronised with the real back-end state. In a few cases containers may be left orphaned: when Zoe deems it safe, they will be cleaned up automatically; otherwise a warning will be generated in the logs and the administrator has to examine the situation, as it usually points to a bug hidden somewhere in the back-end code.
Users submit *execution requests*, composed of a name and an *application description*. The frontend process (zoe-api) informs the Zoe Master that a new execution request is available for execution.
Inside the Master, a scheduler keeps track of available resources and execution requests, and applies a
@@ -13,7 +13,7 @@ Development repository
----------------------
Development happens at `Eurecom's GitLab repository <https://gitlab.eurecom.fr/zoe/main>`_. The GitHub repository is a read-only mirror.
-The choice of GitLab over GitHub is due to the CI pipeline that we set-up to test Zoe.
+The choice of GitLab over GitHub is due to the CI pipeline that we set up to test Zoe. Please note that issue tracking happens on GitHub.
Bug reports and feature requests
--------------------------------
@@ -22,10 +22,10 @@ Bug reports and feature requests are handled through the GitHub issue system at:
The issue system should be used only for bug reports or feature requests. If you have more general questions, or need help setting up Zoe or running some application, please use the mailing list.
-The mailing list
-----------------
+Mailing list
+------------
-The first step is to subscribe to the mailing list: `http://www.freelists.org/list/zoe <http://www.freelists.org/list/zoe>`_
+The mailing list: `http://www.freelists.org/list/zoe <http://www.freelists.org/list/zoe>`_
Use the mailing list to stay up-to-date with what other developers are working on, to discuss and propose your ideas. We prefer small and incremental contributions, so it is important to keep in touch with the rest of the community to receive feedback. This way your contribution will be much more easily accepted.
.. _main_index:
-Zoe - Container-based Analytics as a Service
-============================================
+Zoe Analytics - Container-based Analytics as a Service
+======================================================
-Zoe provides a simple way to provision data analytics applications. It hides the complexities of managing resources, configuring and deploying complex distributed applications on private clouds. Zoe is focused on data analysis applications, such as `Spark <http://spark.apache.org/>`_ or `Tensorflow <https://www.tensorflow.org/>`_. A generic, very flexible application description format lets you easily describe any kind of data analysis application.
+Zoe Analytics provides a simple way to provision data analytics applications. It hides the complexities of managing resources, configuring and deploying complex distributed applications on private clouds. Zoe is focused on data analysis applications, such as `Spark <http://spark.apache.org/>`_ or `TensorFlow <https://www.tensorflow.org/>`_. A generic, very flexible application description format lets you easily describe any kind of data analysis application.
Downloading
-----------
@@ -27,9 +27,9 @@ Now you can check if you are up and running with this command::
It will return some version information by querying the zoe-api and zoe-master processes.
-Zoe applications are passed as JSON files. A few sample ZApps are available in the ``contrib/zoeapps/`` directory. To start a ZApp use the following command::
+Zoe applications are passed as JSON files. A few sample ZApps are available in the ``contrib/zapp-shop-sample/`` directory. To start a ZApp use the following command::
-   ./zoe.py start joe-spark-notebook contrib/zoeapps/eurecom_aml_lab.json
+   ./zoe.py start joe-spark-notebook contrib/zapp-shop-sample/jupyter-r/r-notebook.json
ZApp execution status can be checked this way::
@@ -38,6 +38,7 @@ ZApp execution status can be checked this way::
Where ``execution id`` is the ID of the ZApp execution to inspect, taken from the ``exec-ls`` command.
Or you can just connect to the web interface (port 5001 by default).
Where to go from here
---------------------
@@ -9,50 +9,8 @@ All contributions to the codebase are centralised into an internal repository at
GitHub has been configured to protect the master branch on the `Zoe repository <https://github.com/DistributedSystemsGroup/zoe>`_. It will accept only pushes that are marked with a status check. This, together with Jenkins pushing only successful builds, guarantees that the quality of the published code respects our standards.
The CI pipeline in detail
-------------------------
Different contributors use different software for managing the CI pipeline.
* :ref:`ci-jenkins`
* :ref:`ci-gitlab`
SonarQube
^^^^^^^^^
`SonarQube <https://www.sonarqube.org/>`_ is a code quality tool that performs a large number of static tests on the codebase. It applies rules from well-known coding standards like Misra, Cert and CWE and generates a quality report.
We configured our test pipelines to fail if the code quality of new commits violates any of the following rules:
* Coverage less than 80%
* Maintainability worse than A
* Reliability worse than A
* Security rating worse than A
Documentation
^^^^^^^^^^^^^
A description of the CI pipeline is available in the :ref:`ci-gitlab` page.
Sphinx documentation is tested with the ``doc8`` tool with default options.
Integration tests
^^^^^^^^^^^^^^^^^
Refer to the :ref:`integration-test` documentation for details.
ZApp image vulnerability scan
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We use `clair <https://github.com/coreos/clair>`_, a static vulnerability analyzer for containers from CoreOS, to analyze the Zoe Docker image before using it.
To scan the Zoe image (or any other Docker image) with clair, you can use the commands below::

   export imageID=$(docker image inspect --format '{{.Id}}' <your-registry-address>/zoe:$BUILD_ID)
   docker exec clair_clair analyzer $imageID
To start Clair and, optionally, add it to your Jenkins pipeline, check the files under the ``ci/clair`` folder.
RAML API description
--------------------
Under the directory ``contrib/raml`` there is the `RAML <http://raml.org/>`_ description of the Zoe API.
Refer to the :ref:`integration-test` documentation for details on integration testing.
.. _vision:
-Vision for Zoe
-==============
+The vision for Zoe Analytics
+============================
Zoe's focus is data analytics. This focus helps define a clear set of objectives and priorities for the project and avoids the risk of competing directly with generic infrastructure managers like Kubernetes or Swarm. Zoe instead sits on top of these "cloud managers" to provide a simpler interface to end users who have no interest in the intricacies of container infrastructures.
Data analytic applications do not work in isolation. They need data, which may be stored or streamed, and they generate logs, which may have to be analyzed for debugging or stored for auditing. Data layers, in turn, need health monitoring. All these tools, frameworks, distributed filesystems and object stores form a galaxy that revolves around analytic applications. For simplicity we will call these "support applications".
To offer an integrated, modern solution to data analysis, Zoe must support both analytic and support applications, even if they have very different users and requirements.
The users of Zoe
----------------
- users: the real end-users, who submit non-interactive analytic jobs or use interactive applications
- admins: systems administrators who maintain Zoe, the data layers, monitoring and logging for the infrastructure and the apps being run
Deviation from the current ZApp terminology
-------------------------------------------
In the current Zoe implementation ZApps are self-contained descriptions of a set of cooperating processes. They are submitted once, to request the start-up of an execution. This fits the model of a single Spark job or of a throw-away Jupyter notebook well.
ZApps are at the top level: they are the user-visible entity. The ZApp building blocks, analytic engines or support tools, are called frameworks. A framework, by itself, cannot be run: it lacks configuration or a user-provided binary to execute, for example. Each framework is composed of one or more processes that come with some metadata about the needed configuration and a container image.
A few examples:
- A Jupyter notebook, by itself, is a framework in Zoe terminology. It lacks configuration that tells it which kernels to enable or which port to listen on. It is a framework composed of just one process.
- A Spark cluster is another framework. By itself it does nothing. It can be connected to a notebook, or it can be given a jar file and some data to process.
- A "Jupyter with R listening on port 8080" ZApp is a Jupyter framework that has been configured with certain options and made runnable.
To create a ZApp you need to put together one or more frameworks and add some configuration (framework-dependent) that tells them how to behave.
A ZApp shop could contain both frameworks (that the user must combine) and full-featured ZApps.
Nothing prevents certain ZApp attributes from being "templated": resource requests or elastic process counts, for example.
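The framework-plus-configuration composition described above can be sketched in a few lines of Python. This is purely illustrative: the field names and the ``make_zapp`` helper are hypothetical, not the real ZApp JSON schema.

```python
# Hypothetical sketch: a framework plus configuration becomes a runnable ZApp.
# All names and fields here are illustrative, not the real Zoe schema.

def make_zapp(name, frameworks, config):
    """Combine frameworks with user configuration into a runnable ZApp."""
    services = []
    for framework in frameworks:
        for process in framework["processes"]:
            service = dict(process)                          # framework metadata
            service.update(config.get(process["name"], {}))  # apply user config
            services.append(service)
    return {"name": name, "services": services}

# A Jupyter framework: one process, not runnable until configured
jupyter = {"name": "jupyter",
           "processes": [{"name": "notebook", "image": "jupyter-image"}]}

# "Jupyter with R listening on port 8080" as a runnable ZApp
zapp = make_zapp("jupyter-r", [jupyter],
                 {"notebook": {"kernel": "R", "port": 8080}})
```

The point of the sketch is that the framework carries only metadata and an image, while everything that makes it runnable arrives with the user-supplied configuration.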
Zoe does not focus on support applications. Managing a stable and fault tolerant HDFS cluster, for example, is a task for tools like Puppet, Chef or Ansible and is done by system administrators. Zoe, instead, targets data scientists, who need to use a cluster infrastructure but do not usually have sysadmin skills.
Kinds of applications
---------------------
See :ref:`zapp_classification`.
Architecture
------------
.. image:: /figures/extended_arch.png
This architecture extends the current one by adding a number of pluggable modules that implement additional (and optional) features. The additional modules are, in principle, created as more Zoe frameworks that will be instantiated in ZApps and run through the scheduler together with all the other ZApps.
The core of Zoe remains very simple to understand, while opening the door to more capabilities. The additional modules will have the ability to communicate with the back-end, to submit, modify and terminate executions via the scheduler, and to report information back to the user. Their actions will be driven by additional fields in the ZApp descriptions.
In the figure there are three examples of such modules:
- Zoe-HA: monitors and performs tasks related to high availability, both for Zoe components and for running user executions. This module could take care of maintaining a certain replication factor, making sure a certain service is restarted in case of failure, or updating a load balancer or a DNS entry
- Zoe rolling upgrades: performs rolling upgrades of long-running ZApps
- Workflows (see the section below)
Other examples could be:
- Zoe-storage: for managing volumes and associated constraints
The modules should re-use as much as possible the functionality already available in the back-ends. A simple Zoe installation could run on a single Docker engine available locally and provide a reduced set of features, while a full-fledged install could run on top of Kubernetes and provide all the large-scale deployment features that such a platform already implements.
Workflows
^^^^^^^^^
A few examples of workflows:
- run a job every hour (triggered by time)
- run a set of jobs in series or in parallel (triggered by the state of other jobs)
- run a job whenever the size of a file/directory/bucket on a certain data layer reaches 5GB (more complex triggers)
- combinations of all the above
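The trigger kinds listed above can be sketched as simple boolean predicates. This is a minimal sketch only; the function names are made up and are not part of any existing zoe-workflow code.

```python
# Minimal sketch of the workflow trigger kinds listed above.
# All names are illustrative, not an existing zoe-workflow API.

def time_trigger(period_s, last_run, now):
    """Fire when at least period_s seconds have passed since the last run."""
    return now - last_run >= period_s

def state_trigger(job_states, dependencies):
    """Fire when every job this one depends on has finished."""
    return all(job_states.get(dep) == "finished" for dep in dependencies)

def size_trigger(current_size, threshold):
    """Fire when a file/directory/bucket reaches a size threshold (bytes)."""
    return current_size >= threshold

# Combinations of the above are plain boolean composition:
hourly_after_etl = (time_trigger(3600, last_run=0, now=7200)
                    and state_trigger({"etl": "finished"}, ["etl"]))
```

Real workflow engines add persistence, retries and failure semantics on top of such predicates, which is what makes them projects of the size mentioned below.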
A complete workflow system for data analytics is very complex. There is a lot of theory on this topic, and zoe-workflow is a project of the same size as Zoe itself. An example of a full-featured workflow system for Hadoop is http://oozie.apache.org/
@@ -136,6 +136,12 @@ Resources that need to be reserved for this service. Each resource is specified
Support for this feature depends on the scheduler and back-end in use.
The Elastic scheduler together with the DockerEngine back-end will behave in the following way:
* the "max" limits in the JSON description are ignored; the ones set in the Zoe configuration file are used instead
* for memory, a soft limit will be set at the "min" resource level and a hard limit at the amount set in the ``max-memory-limit`` option. See the Docker documentation for the exact definitions of soft and hard limits
* cores are allocated dynamically and automatically: a service that has cores.min set to 4 will have at least 4 cores. If more are available on the node it is running on, more will be given
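The memory rules above can be summarised in a short sketch. This is illustrative only, not Zoe's actual code; ``max_memory_limit`` stands in for the value of the ``max-memory-limit`` configuration option.

```python
# Illustrative sketch of the memory rules described above; not Zoe's code.
# max_memory_limit stands in for the ``max-memory-limit`` config option.

GiB = 1024 ** 3

def memory_limits(service_min, max_memory_limit):
    """Return the (soft, hard) memory limits applied to a service's container.

    The "max" from the JSON description is ignored entirely: the soft limit
    is the service's "min", and the hard limit comes from the Zoe
    configuration file.
    """
    return service_min, max_memory_limit

soft, hard = memory_limits(service_min=2 * GiB, max_memory_limit=8 * GiB)
```

A service asking for 2 GiB minimum thus gets a 2 GiB soft limit, and its hard ceiling is whatever the administrator configured, regardless of the "max" in its description.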
startup_order
^^^^^^^^^^^^^
@@ -157,6 +163,13 @@ string
This entry is optional. If set, its value will be used as the work directory where the command is executed.
labels
^^^^^^
array of strings
This entry is optional. Labels are used by the scheduler to make placement decisions for services. Services that have the labels "ssd" and "gpu" will be placed only on hosts declared with both labels. If no host with "ssd" and "gpu" is available, the ZApp is left in the queue.
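The placement rule just described amounts to a subset test on label sets. A minimal sketch, with made-up host data and a hypothetical ``eligible_hosts`` helper:

```python
# Sketch of the label placement rule: a service may only run on hosts that
# declare every label the service requires. Host data here is made up.

def eligible_hosts(service_labels, hosts):
    """Return the hosts declaring all labels required by the service."""
    required = set(service_labels)
    return [name for name, labels in hosts.items() if required <= set(labels)]

hosts = {"node1": ["ssd"], "node2": ["ssd", "gpu"], "node3": []}
print(eligible_hosts(["ssd", "gpu"], hosts))  # ['node2']
```

When the returned list is empty, the behaviour described above applies: the ZApp stays in the queue until a matching host appears.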
ports
^^^^^