Commit bc709616 authored by Daniele Venzano

Documentation update

parent 066c19f5
......@@ -34,6 +34,11 @@ To contribute code and/or documentation you should follow this workflow:
2. fork the Zoe repository via GitHub (if you don't already have a fork)
3. create a branch that will hold your changes
4. ... develop and debug ...
5. when you are ready, propose your changes on the mailing list

Zoe maintainers will review your code, give constructive feedback and eventually pull and merge your changes.
Code quality and tests
^^^^^^^^^^^^^^^^^^^^^^
Contributors can set up their own CI pipeline following the quality guidelines (:ref:`quality`). At a bare minimum, all code should be tested with the ``run_tests.sh`` script in the root of the repository. Accepted contributions are run through the CI pipeline at Eurecom before being published on the public repository.
.. _devel_backend:
Backend abstraction
===================
The container backend Zoe uses is configurable at runtime. Internally there is an API that Zoe, in particular the scheduler, uses to communicate with the container backend. This document explains the API, so that new backends can be created and maintained.
Zoe assumes that backends are composed of multiple nodes. If the backend is not clustered or does not expose per-node information, it can be exposed to Zoe as a single node.
Package structure
-----------------
Backends are written in Python and live in the ``zoe_master/backends/`` directory. Inside there is one Python package for each backend implementation.
To let Zoe use a new backend, its class must be imported in ``zoe_master/backends/interface.py`` and the ``_get_backend()`` function should be modified accordingly. Then the choices in ``zoe_lib/config.py`` for the configuration file should be expanded to include the new backend name.
More options to the configuration file can be added to support the new backend. Use the ``--<backend name>-<option name>`` convention for them.
API
---
Whenever Zoe needs to access the container backend, it creates a new instance of the backend class. The class must be a subclass of ``zoe_master.backends.base.BaseBackend``.
.. autoclass:: zoe_master.backends.base.BaseBackend
   :members:
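To illustrate the subclassing rule and the single-node case mentioned earlier, here is a sketch of a backend for a non-clustered container engine. The abstract base class below is a hypothetical stand-in, and its ``node_list()`` and ``spawn_service()`` methods are placeholders chosen for illustration; the methods a real backend must implement are the ones documented for ``BaseBackend`` itself.

```python
import abc
from typing import Dict, List


class BaseBackend(abc.ABC):
    """Hypothetical stand-in for zoe_master.backends.base.BaseBackend."""

    @abc.abstractmethod
    def node_list(self) -> List[str]:
        """Return the names of the nodes the backend is composed of."""

    @abc.abstractmethod
    def spawn_service(self, service_name: str) -> str:
        """Start a container for a service and return its backend ID."""


class SingleHostBackend(BaseBackend):
    """Backend for a non-clustered engine, exposed to Zoe as one node."""

    def __init__(self, host_name: str = "localhost") -> None:
        self.host_name = host_name
        self.containers: Dict[str, str] = {}  # backend ID -> service name

    def node_list(self) -> List[str]:
        # No per-node information is available, so report a single node.
        return [self.host_name]

    def spawn_service(self, service_name: str) -> str:
        # Fake ID generation; a real backend would ask the container engine.
        backend_id = "{}-{}".format(service_name, len(self.containers))
        self.containers[backend_id] = service_name
        return backend_id
```

Declaring the methods abstract makes Python refuse to instantiate an incomplete backend, which surfaces missing methods at construction time rather than in the middle of scheduling.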
.. _developer_documentation:
Developer documentation
=======================
:ref:`modindex`
.. toctree::
   :maxdepth: 2

   introduction
   rest-api
   auth
   api-endpoint
   master-api
   scheduler
   backend
   stats

.. _rest-api:

REST API classes
================

.. automodule:: zoe_api.rest_api.discovery
   :members:

.. automodule:: zoe_api.rest_api.execution
   :members:

.. automodule:: zoe_api.rest_api.info
   :members:

.. automodule:: zoe_api.rest_api.service
   :members:

.. automodule:: zoe_api.rest_api.statistics
   :members:

.. automodule:: zoe_api.rest_api.utils
   :members:

Zoe REST API
============

Zoe can be used from the command line or the web interface. For more complex tasks an API is also provided, so that Zoe functionality can be accessed programmatically.

The API is provided by the zoe-api processes, on the same port as the web interface (5001 by default). Every API URL contains, after the hostname and port, the path ``/api/<api version>/``. This document describes API version 0.6. The examples below query a deployment reachable at ``http://bf5:8080``.

If a request causes an error, an appropriate HTTP status code is returned. The reply also contains a JSON document in this format::

    {
        "message": "missing or wrong authentication information"
    }

The ``message`` field describes the kind of error that happened.

Some endpoints require credentials for authentication. For now the API uses plain HTTP Basic authentication. If credentials are missing or wrong, a 401 status code is returned.
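The conventions above (base path, HTTP Basic authentication, JSON error documents) can be wrapped in a small client. The following is a minimal sketch using only the Python standard library; the helper names ``api_url``, ``basic_auth_header``, ``error_message`` and ``call`` are hypothetical and not part of Zoe.

```python
import base64
import json
import urllib.error
import urllib.request
from typing import Optional, Tuple

API_VERSION = "0.6"


def api_url(base_url: str, path: str) -> str:
    """Build an API URL such as http://bf5:8080/api/0.6/info."""
    return "{}/api/{}/{}".format(base_url.rstrip("/"), API_VERSION, path.lstrip("/"))


def basic_auth_header(username: str, password: str) -> dict:
    """Return an HTTP Basic Authorization header for the given credentials."""
    token = base64.b64encode("{}:{}".format(username, password).encode("utf-8"))
    return {"Authorization": "Basic " + token.decode("ascii")}


def error_message(body: bytes) -> str:
    """Extract the "message" field from an API error reply."""
    return json.loads(body.decode("utf-8"))["message"]


def call(base_url: str, path: str, method: str = "GET",
         auth: Optional[Tuple[str, str]] = None, body: Optional[bytes] = None):
    """Send one request and return the decoded JSON reply, or None for an empty body."""
    headers = basic_auth_header(*auth) if auth else {}
    req = urllib.request.Request(api_url(base_url, path), data=body,
                                 headers=headers, method=method)
    try:
        with urllib.request.urlopen(req) as resp:
            raw = resp.read()
    except urllib.error.HTTPError as exc:
        # Error replies (e.g. 401) carry a JSON document with a "message" field.
        raise RuntimeError("HTTP {}: {}".format(exc.code, error_message(exc.read()))) from exc
    return json.loads(raw.decode("utf-8")) if raw else None
```

For example, ``call("http://bf5:8080", "info")`` would return the info document, and ``call("http://bf5:8080", "execution", auth=("username", "password"))`` the list of executions.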
Info endpoint
-------------
This endpoint does not need authentication. It returns general, static information about the Zoe software and is meant for checking that the client can talk correctly to the API server::

    curl http://bf5:8080/api/0.6/info

It returns a JSON document like this::

    {
        "version" : "0.10.1-beta",
        "deployment_name" : "prod",
        "application_format_version" : 2,
        "api_version" : "0.6"
    }

Where:

* ``version`` is the Zoe version
* ``deployment_name`` is the name configured for this deployment (multiple Zoe deployments can share the same cluster)
* ``application_format_version`` is the version of the ZApp format this Zoe instance understands
* ``api_version`` is the API version supported by this Zoe instance; it should match the one used in the request URL
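Because ``api_version`` should match the version in the request URL, a client can use this endpoint as a compatibility check before doing anything else. A small sketch, assuming the reply has already been fetched and decoded; the function name ``check_compatibility`` is hypothetical:

```python
import json

CLIENT_API_VERSION = "0.6"


def check_compatibility(info: dict, expected_api: str = CLIENT_API_VERSION) -> None:
    """Raise if the server speaks a different API version than the client."""
    if info["api_version"] != expected_api:
        raise RuntimeError(
            "server {} (deployment {!r}) speaks API {}, client expects {}".format(
                info["version"], info["deployment_name"], info["api_version"], expected_api))


# The example reply shown above.
reply = json.loads("""{
    "version" : "0.10.1-beta",
    "deployment_name" : "prod",
    "application_format_version" : 2,
    "api_version" : "0.6"
}""")
check_compatibility(reply)  # passes silently for a matching version
```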
Execution endpoint
------------------
All the endpoints listed in this section require authentication.
Execution details
^^^^^^^^^^^^^^^^^
Request (GET)::

    curl -u 'username:password' http://bf5:8080/api/0.6/execution/<execution_id>

Where:

* ``execution_id`` is the ID of the execution we want to inspect

It returns a JSON document like this::

    {
        "status" : "running",
        "description" : {
            "version" : 2,
            "will_end" : false,
            [...]
        },
        "error_message" : null,
        "time_start" : 1473337160.16264,
        "id" : 25158,
        "user_id" : "venzano",
        "time_end" : null,
        "name" : "boinc-loader",
        "services" : [
            26774
        ],
        "time_submit" : 1473337122.99315
    }

Where:

* ``status`` is the execution status. It can be one of "submitted", "scheduled", "starting", "error", "running", "cleaning up" or "terminated"
* ``description`` is the full ZApp description as submitted by the user
* ``error_message`` contains the error message when ``status`` is "error"
* ``time_submit`` is the time the execution was submitted to Zoe
* ``time_start`` is the time the execution started, after waiting in the scheduler queue
* ``time_end`` is the time the execution finished or was terminated by the user
* ``id`` is the ID of the execution
* ``user_id`` is the identifier of the user who submitted the ZApp for execution
* ``name`` is the name of the execution
* ``services`` is a list of service IDs that can be used to inspect individual services
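The three timestamps make it easy to compute how long an execution waited in the scheduler queue and how long it has been running. A sketch, assuming the values are Unix epoch seconds, that ``time_end`` stays ``null`` while the execution runs (as in the example above), and, as an additional assumption, that ``time_start`` is ``null`` until the execution starts. The function names are hypothetical.

```python
import time
from typing import Optional


def queue_wait(execution: dict) -> Optional[float]:
    """Seconds between submission and start, or None if not started yet."""
    if execution["time_start"] is None:
        return None
    return execution["time_start"] - execution["time_submit"]


def run_time(execution: dict, now: Optional[float] = None) -> Optional[float]:
    """Seconds the execution has run; uses the current time while it is still running."""
    if execution["time_start"] is None:
        return None
    end = execution["time_end"]
    if end is None:
        end = time.time() if now is None else now
    return end - execution["time_start"]
```

With the example document above, ``queue_wait()`` gives roughly 37.17 seconds.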
Terminate execution
^^^^^^^^^^^^^^^^^^^
This endpoint terminates a running execution.
Request (DELETE)::

    curl -X DELETE -u 'username:password' http://bf5:8080/api/0.6/execution/<execution_id>

If the request succeeds, an empty response with status code 200 is returned.
Delete execution
^^^^^^^^^^^^^^^^
This endpoint deletes an execution from the database, terminating it if it is running.
Request (DELETE)::

    curl -X DELETE -u 'username:password' http://bf5:8080/api/0.6/execution/delete/<execution_id>

If the request succeeds, an empty response with status code 200 is returned.
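Terminating and deleting an execution are both DELETE requests that differ only in the path. The following sketch builds (but does not send) the two requests with the standard library; the helper name ``delete_request`` and its ``purge`` flag are hypothetical.

```python
import base64
import urllib.request

BASE = "http://bf5:8080/api/0.6"


def delete_request(execution_id: int, username: str, password: str,
                   purge: bool = False) -> urllib.request.Request:
    """Build a DELETE request that terminates (purge=False) or deletes (purge=True) an execution."""
    path = "execution/delete/{}" if purge else "execution/{}"
    token = base64.b64encode("{}:{}".format(username, password).encode()).decode("ascii")
    return urllib.request.Request(
        "{}/{}".format(BASE, path.format(execution_id)),
        headers={"Authorization": "Basic " + token},
        method="DELETE",
    )

# urllib.request.urlopen(req) would send it; status 200 with an empty body means success.
```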
List all executions
^^^^^^^^^^^^^^^^^^^
This endpoint will list all executions belonging to the calling user. If the user has an administrator role, executions for all users will be returned.
Request (GET)::

    curl -u 'username:password' http://bf5:8080/api/0.6/execution

It returns a JSON document like this::

    {
        "25152" : {
            "time_submit" : 1473337122.87461,
            "id" : 25152,
            [...]
            "status" : "running",
            "time_start" : 1473337156.8096,
            "services" : [
                26768
            ],
            "time_end" : null,
            "name" : "boinc-loader",
            "error_message" : null
        },
        "25086" : {
            "time_start" : 1473337123.30892,
            "status" : "running",
            "user_id" : "venzano",
            [...]

The result is a map with the execution IDs as keys and the full execution details as values.
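Since the reply is keyed by execution ID, filtering it is a plain dictionary operation. A sketch that returns the IDs of the running executions; the function name is hypothetical.

```python
from typing import Dict, List


def running_ids(executions: Dict[str, dict]) -> List[int]:
    """Return the numeric IDs of executions whose status is "running", in ascending order."""
    return sorted(int(exec_id) for exec_id, details in executions.items()
                  if details["status"] == "running")
```

The keys are JSON strings, so they are converted to integers before sorting; sorting the strings directly would put "9" after "10".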
Start execution
^^^^^^^^^^^^^^^
Request (POST)::

    curl -X POST -u 'username:password' -H 'Content-Type: application/json' --data-binary @filename http://bf5:8080/api/0.6/execution

The request body must be a JSON document::

    {
        "application": <zapp json>,
        "name": "experiment #33"
    }

Where:

* ``application`` is the full ZApp JSON document, the application description
* ``name`` is the name of the execution provided by the user

It returns a JSON document like this::

    {
        "execution_id": 23441
    }

Where:

* ``execution_id`` is the ID of the newly created execution.
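The same request can be built in Python. This sketch assumes the API expects the raw JSON document as the request body and labels it with a ``Content-Type: application/json`` header; the function names are hypothetical and the ZApp used in the usage note is a placeholder, not a valid application description.

```python
import base64
import json
import urllib.request


def start_request(zapp: dict, name: str, username: str, password: str) -> urllib.request.Request:
    """Build the POST request that starts a new execution of a ZApp."""
    body = json.dumps({"application": zapp, "name": name}).encode("utf-8")
    token = base64.b64encode("{}:{}".format(username, password).encode()).decode("ascii")
    return urllib.request.Request(
        "http://bf5:8080/api/0.6/execution",
        data=body,
        headers={"Authorization": "Basic " + token, "Content-Type": "application/json"},
        method="POST",
    )


def new_execution_id(reply_body: bytes) -> int:
    """Read the ID of the newly created execution from the reply."""
    return json.loads(reply_body.decode("utf-8"))["execution_id"]
```

Serializing with ``json.dumps`` also avoids the single-quote mistakes that are easy to make when writing the body by hand.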
Service endpoint
----------------
All the endpoints listed in this section require authentication.
Service details
^^^^^^^^^^^^^^^
Request (GET)::

    curl -u 'username:password' http://bf5:8080/api/0.6/service/<service_id>

It returns a JSON document like this::

    {
        "status" : "active",
        "service_group" : "boinc-client",
        "backend_status" : "started",
        "ip_address" : "10.0.0.94",
        "execution_id" : 25158,
        "name" : "boinc-client0",
        "backend_id" : "d0042c69b54e90327d9287e099304b6c25921d81f639803494ea744445d58430",
        "error_message" : null,
        "id" : 26774,
        "description" : {
            "required_resources" : {
                "memory" : 536870912
            },
            [...]
            "name" : "boinc-client",
            "volumes" : []
        }
    }

Where:

* ``status`` is the service status from Zoe's point of view. It can be one of "terminating", "inactive", "active" or "starting"
* ``service_group`` is the name of the service as given in the ZApp description. When the ZApp is unpacked to create the actual containers, a single service definition spawns one or more services that share this name
* ``backend_status`` is the container status from the point of view of the container backend. Zoe tries her best to keep this value in sync, but it can lag behind the backend by several minutes. It can be one of 'undefined', 'created', 'started', 'dead' or 'destroyed'
* ``ip_address`` is the IP address of the container
* ``execution_id`` is the ID of the execution this service belongs to
* ``name`` is the name of this service instance, generated from ``service_group``
* ``backend_id`` is the ID used by the backend to identify this container
* ``error_message`` is currently unused
* ``id`` is the ID of this service and should match the one given in the URL
* ``description`` is the service description extracted from the ZApp
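Because ``status`` reflects Zoe's view and ``backend_status`` the backend's (possibly stale) view, a client that needs to reach containers may want to check both. A sketch that collects the IP addresses of the instances in one service group that both sides report as running; requiring ``backend_status == "started"`` is an assumption about what a client would consider safe, not a rule of the API, and the function name is hypothetical.

```python
from typing import Dict, Iterable, List


def reachable_ips(services: Iterable[Dict], group: str) -> List[str]:
    """IP addresses of active, started instances of one service group."""
    return [s["ip_address"] for s in services
            if s["service_group"] == group
            and s["status"] == "active"
            and s["backend_status"] == "started"]
```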
Service standard output and error
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Request (GET)::

    curl -u 'username:password' http://bf5:8080/api/0.6/service/logs/<service_id>

It streams the output of the service instance, starting from the time the service started, and closes the connection when the service exits.
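Because the reply stays open until the service exits, a client should consume it incrementally instead of waiting for the whole body. A sketch using only the standard library: it accepts any binary file-like object, so it can be fed the response returned by ``urllib.request.urlopen()`` or, for testing, an in-memory buffer. Decoding the output as UTF-8 is an assumption, and ``follow_log`` is a hypothetical name.

```python
import io
from typing import BinaryIO, Iterator


def follow_log(stream: BinaryIO) -> Iterator[str]:
    """Yield output lines as they arrive until the server closes the connection."""
    for raw_line in iter(stream.readline, b""):
        yield raw_line.decode("utf-8", errors="replace").rstrip("\r\n")


if __name__ == "__main__":
    # Stand-in for a live response from the logs endpoint.
    sample = io.BytesIO(b"boinc starting\nconnected\n")
    for line in follow_log(sample):
        print(line)
```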
Discovery endpoint
------------------
This endpoint does not need authentication. It returns a list of services that meet the criteria passed in the URL. It can be used as a service discovery mechanism by ZApps that need to know the list of available services in advance.

Request (GET)::

    curl http://bf5:8080/api/0.6/discovery/by_group/<execution_id>/<service_type>

Where:

* ``execution_id`` is the numeric ID of the execution to query
* ``service_type`` is the service name (as defined in the ZApp), used to return only services of that type

It returns a JSON document like this::

    {
        "service_type" : "boinc-client",
        "execution_id" : "23015",
        "dns_names" : [
            "boinc-client0-23015-prod"
        ]
    }

Where:

* ``service_type`` is the name of the service as passed in the URL
* ``execution_id`` is the execution ID as passed in the URL (returned as a string)
* ``dns_names`` is the list of DNS names of the service instances currently active (only one in the example above)
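A ZApp component could use this endpoint to find its peers at startup. A sketch that builds the discovery URL and extracts the DNS names from the reply; how a container learns its own execution ID and the API address is outside this document, so both are plain parameters here, and the function names are hypothetical.

```python
import json
import urllib.parse
from typing import List


def discovery_url(api_base: str, execution_id: int, service_type: str) -> str:
    """URL of the discovery endpoint for one service type of an execution."""
    return "{}/api/0.6/discovery/by_group/{}/{}".format(
        api_base.rstrip("/"), execution_id, urllib.parse.quote(service_type))


def peer_names(reply_body: bytes) -> List[str]:
    """DNS names of the currently active instances listed in a discovery reply."""
    return json.loads(reply_body.decode("utf-8"))["dns_names"]
```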
Statistics endpoint
-------------------
This endpoint does not need authentication. It returns current statistics about the internal Zoe status.
Scheduler
^^^^^^^^^
Request (GET)::

    curl http://bf5:8080/api/0.6/statistics/scheduler

It returns a JSON document like this::

    {
        "termination_threads_count" : 0,
        "queue_length" : 0
    }

Where:

* ``termination_threads_count`` is the number of executions pending termination and cleanup
* ``queue_length`` is the number of executions waiting in the queue to be started
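One possible use of these counters is a monitoring check. The sketch below treats the scheduler as idle when nothing is queued and no terminations are pending; that definition of "idle" and both function names are assumptions made for illustration.

```python
def scheduler_idle(stats: dict) -> bool:
    """True when no execution is queued and none is being terminated."""
    return stats["queue_length"] == 0 and stats["termination_threads_count"] == 0


def describe(stats: dict) -> str:
    """One-line summary suitable for a monitoring log."""
    return "queued={queue_length} terminating={termination_threads_count}".format(**stats)
```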
.. _stats:
Platform and scheduler statistics
=================================
.. automodule:: zoe_master.stats
   :members:
......@@ -20,7 +20,7 @@ To better understand what we mean by "analytic service", here are a few examples
A number of predefined applications for testing and customization can be found at the `zoe-applications <https://github.com/DistributedSystemsGroup/zoe-applications>`_ repository.
Have a look at the :ref:`vision` and at the `roadmap <https://github.com/DistributedSystemsGroup/zoe/wiki>`_ to see what we are currently planning, and feel free to `contact us <daniele.venzano@eurecom.fr>`_ via email or through the `GitHub issue tracker <https://github.com/DistributedSystemsGroup/zoe/issues>`_ to ask questions or suggest ideas and new features.
A note on terminology (needs to be updated)
-------------------------------------------
......@@ -43,10 +43,9 @@ Contents
   logging
   monitoring
   architecture
   rest-api
   quality
   vision
   motivations
   roadmap
   contributing

Zoe applications
......@@ -68,14 +67,10 @@ Developer documentation
:ref:`modindex`
.. toctree::
   :maxdepth: 1

   developer/index
   developer/introduction
   developer/rest-api
   developer/auth
   developer/api-endpoint
   developer/master-api
   developer/scheduler

Contacts
========
......
......@@ -5,7 +5,7 @@ Container logs
By default Zoe does not involve itself with the output from container processes. The logs can be retrieved with the usual Docker command ``docker logs`` while a container is alive; they are lost forever when the container is deleted. However, this solution does not scale well: to examine logs, users need access to the Docker command-line tools and to the Swarm their containers are running in.
To set up a more convenient logging solution, Zoe provides two alternatives:
1. Using the ``gelf-address`` option, Zoe can configure Docker to send the container outputs to an external destination in GELF format. GELF is the richest format supported by Docker and can be ingested by a number of tools such as Graylog and Logstash. When that option is set, all containers created by Zoe send their output (standard output and standard error) to the specified destination. Docker is instructed to add all Zoe-defined tags to the GELF messages, so that they can be aggregated by Zoe execution, Zoe user, etc. A popular logging stack that supports GELF is `ELK <https://www.elastic.co/products>`_.
2. Using the ``service-log-path`` option, logs are stored in the specified directory when the execution terminates. The directory can be exposed via HTTP or NFS to give users access. On the other hand, if the logs are very large, Zoe spends a long time saving the data, and resources are not freed until the copy has finished.
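With the first option, messages reach the collector as GELF JSON documents, in which ``short_message`` carries the text and additional fields are prefixed with an underscore. The sketch below groups already-decoded GELF messages by one such field. The field name ``_zoe_execution_name`` is hypothetical: check which tag names your Zoe version actually attaches. Decompression and chunk reassembly, which GELF over UDP may require, are left out.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_field(messages: Iterable[str],
                   field: str = "_zoe_execution_name") -> Dict[str, List[str]]:
    """Group the short_message of each GELF JSON document by an additional field."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for raw in messages:
        msg = json.loads(raw)
        # Messages without the tag are collected under a placeholder key.
        groups[msg.get(field, "(untagged)")].append(msg["short_message"])
    return dict(groups)
```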
......@@ -13,8 +13,8 @@ To setup a more convenient loggin solution, Zoe provides two alternatives:
In our experience, web interfaces like Kibana or Graylog are not useful to Zoe users: they want to quickly dig through the logs of their executions to find an error, or an interesting number to correlate with some other number in some other log. The web interfaces (option 1) are slow and cluttered compared to running grep on a text file (option 2).
Which alternative is good for you depends on the usage pattern of your users, your log auditing requirements, etc.
What if you want your logs to go through Kafka
----------------------------------------------
Zoe also provides a Zoe Logger process, in case you prefer to use Kafka in your log pipeline. The output of each container is sent to its own topic, which Kafka retains for seven days by default. With Kafka you can also monitor the container output in real time, for example to debug your container images running in Zoe. In this case GELF is converted to a syslog-like format for easier handling.
......
.. _quality:
Testing and quality assurance
=============================
Every commit that hits the master branch on the public repository of Zoe has to pass a testing gate.
All contributions to the codebase are centralised into an internal repository at Eurecom. There, every commit (on any branch) triggers a continuous integration pipeline that verifies code quality and runs tests. Only commits and merges on the master branch for which the Jenkins build succeeds are pushed to the public repository.
GitHub has been configured to protect the master branch on the `Zoe repository <https://github.com/DistributedSystemsGroup/zoe>`_: it accepts only pushes that are marked with a status check. This, together with Jenkins pushing only successful builds, guarantees that the published code meets our quality standards.
The CI pipeline in detail
-------------------------
Jenkins is triggered via a hook script on the internal Eurecom repository.
SonarQube
^^^^^^^^^
`SonarQube <https://www.sonarqube.org/>`_ is a code quality tool that performs a large number of static tests on the codebase. It applies rules from well-known coding standards like MISRA, CERT and CWE.
SonarQube provides a feature that aggregates static test results into simple measures of overall code quality.
We configured the Jenkins build to fail if the new commits meet any of the following conditions:
* Coverage less than 80%
* Maintainability worse than B
* Reliability worse than B
* Security rating worse than A
We plan to require an A rating for all measures after some clean-up and refactoring of the code.
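The thresholds listed above can be expressed as a small predicate. This is only an illustration of those rules, not the actual SonarQube quality gate configuration; it relies on SonarQube's letter ratings running from A (best) to E (worst), and the function names are hypothetical.

```python
RATINGS = "ABCDE"  # SonarQube letter ratings, best first


def worse_than(rating: str, limit: str) -> bool:
    """True if `rating` is strictly worse than `limit`."""
    return RATINGS.index(rating) > RATINGS.index(limit)


def gate_passes(coverage: float, maintainability: str, reliability: str, security: str) -> bool:
    """Apply the thresholds the Jenkins build uses for new code."""
    return not (coverage < 80.0
                or worse_than(maintainability, "B")
                or worse_than(reliability, "B")
                or worse_than(security, "A"))
```

Moving to an A rating for all measures would amount to changing the three ``"B"`` limits to ``"A"``.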
Documentation
^^^^^^^^^^^^^
Sphinx documentation is tested with the ``doc8`` tool with default options.
Integration tests
^^^^^^^^^^^^^^^^^
Zoe is composed of two main processes and depends on a number of external services. In this setting, creating and maintaining credible mock-ups for unit testing would slow down development too much.
Instead, we are working on a suite of integration tests that will run Zoe components against real, live instances of the services Zoe depends on.
These tests will also be run before commits are pushed to the public repository.
.. _roadmap:
Roadmap
=======
We, the main developers of Zoe, are an academic research team. As such, we have limited resources, and through collaborations with other universities and private companies our aim is to do research and advance the state of the art. Our roadmap reflects this and focuses more on large-scale topics than on specific features.
The first priority for Zoe is to mature a stable and modular architecture on which advanced features can be built. Most of the work that is going into version 0.10.x is related to this point.
Scheduler architectures and resource allocation
-----------------------------------------------
Alongside classic, stable and well-known schedulers (FIFO), we plan to design and implement novel approaches to application scheduling and resource allocation within Zoe. This includes: