Commit 06fbad59 authored by Daniele Venzano

Update the documentation

parent b70c1764
.. _devel_backend:
Back-end abstraction
====================
The container back-end Zoe uses is configurable at runtime. Internally there is an API that Zoe, in particular the scheduler, uses to communicate with the container back-end. This document explains the API, so that new back-ends can be created and maintained.
Zoe assumes back-ends are composed of multiple nodes. In case the back-end is not clustered or does not expose per-node information, it can be implemented in Zoe as exposing a single node.
Package structure
-----------------
Back-ends are written in Python and live in the ``zoe_master/backends/`` directory. Inside there is one Python package for each back-end implementation.
To let Zoe use a new back-end, its class must be imported in ``zoe_master/backends/interface.py`` and the ``_get_backend()`` function should be modified accordingly. Then the choices in ``zoe_lib/config.py`` for the configuration file should be expanded to include the new back-end class name.
More options can be added to the configuration file to support the new back-end. Use the ``--<backend name>-<option name>`` convention for them. If the new options do not fit the zoe.conf format, a separate configuration file can be used, as in the DockerEngine and Kubernetes cases.
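
As an illustration, the wiring for a hypothetical ``Foo`` back-end could look like the sketch below; ``FooBackend``, its module path, the ``get_conf()`` helper and the configuration attribute are assumptions, only ``_get_backend()`` and the role of ``zoe_lib/config.py`` come from the text above::

    # zoe_master/backends/interface.py -- illustrative sketch, not the actual file contents
    from zoe_lib.config import get_conf  # assumed configuration helper

    from zoe_master.backends.foo.backend import FooBackend  # hypothetical new back-end package


    def _get_backend():
        """Return a new instance of the configured container back-end (sketch)."""
        name = get_conf().backend  # the option name is an assumption
        if name == 'Foo':  # the same string must be added to the choices in zoe_lib/config.py
            return FooBackend(get_conf())
        raise ValueError('Unknown back-end: {}'.format(name))
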
API
---
Whenever Zoe needs to access the container back-end it will create a new instance of the back-end class. The class must be a child of ``zoe_master.backends.base.BaseBackend``. The class is not used as a singleton and may be instantiated concurrently, multiple times and in different threads.
.. autoclass:: zoe_master.backends.base.BaseBackend
:members:
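
For orientation, a new back-end class might start like the hypothetical skeleton below; the constructor signature and the comments are illustrative, and the authoritative list of members to implement is the one documented above::

    # zoe_master/backends/foo/backend.py -- hypothetical skeleton
    from zoe_master.backends.base import BaseBackend


    class FooBackend(BaseBackend):
        """A stub back-end; override the BaseBackend members documented above."""

        def __init__(self, opts):
            super().__init__(opts)
            # Open the connection to the external cluster manager here, keeping in
            # mind that Zoe may instantiate this class concurrently from different threads.
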
@@ -39,17 +39,7 @@ Variables
To run the tests a number of variables need to be set from the GitLab interface:
* REGISTRY_PASSWORD: the password used for authenticating with the registry via docker login
* SONARQUBE_SERVER_URL: the URL of the SonarQube server
* SONARQUBE_USER: the SonarQube user
* SSH_PRIVATE_KEY: private key to be used to deploy via rsync the staging build
* STAGING_IP: IP/hostname of the staging server
* WEB_STAGING_PATH: path for the web interface on the staging server
* ZOE_STAGING_PATH: path for Zoe on the staging server
* SWARM_URL: URL of a docker engine/swarm to run integration tests
SonarQube
---------
To run SonarQube against Zoe we use a special Docker image, `available on the Docker Hub <https://hub.docker.com/r/zoerepo/sonar-scanner/>`_.
You can also build it from the Dockerfile available at ``ci/gitlab-sonar-scanner/``, relative to the repository root.
@@ -67,7 +67,6 @@ Internal module/class/method documentation
scheduler
backend
stats
jenkins-ci
gitlab-ci
integration_test
@@ -3,65 +3,36 @@
Zoe Integration Tests
=====================
Overview
--------
The objective of integration testing is to run Zoe through a simple workflow to test basic functionality in an automated manner.
How it works
------------
The integration tests are run by GitLab CI, but can also be run by hand. Docker is used to guarantee reproducibility and a clean environment for each test run.
Two containers are used:
* Standard Postgres 9.3
* Python 3.4 container with the Zoe code under test
Pytest will start a zoe-api and a zoe-master, then proceed querying the REST API via HTTP calls.
* The DockerEngine back-end is used
* The authentication type is ``text`` for simplicity.
The code is under the ``integration_tests`` directory.
What is being tested
--------------------
The following endpoints are tested, with good and bad authentication information. Return status codes are checked for correctness.
* info
* userinfo
* execution start, list, terminate
* service list
A simple ZApp with an nginx web server is used for testing the execution start API.
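
For reference, a test in this style boils down to a few HTTP calls and assertions. The sketch below is not the actual test code; the URL, credentials and expected status codes are assumptions::

    # Sketch of an integration-style test, assuming a zoe-api reachable on localhost.
    import os

    import requests

    ZOE_API_URL = os.environ.get('ZOE_API_URL', 'http://localhost:5001/api/0.7')
    AUTH = ('admin', 'admin')  # hypothetical credentials valid for the test deployment


    def test_info_is_public():
        """The info endpoint needs no authentication."""
        assert requests.get(ZOE_API_URL + '/info').status_code == 200


    def test_userinfo_requires_auth():
        """userinfo must reject unauthenticated requests and accept valid ones."""
        assert requests.get(ZOE_API_URL + '/userinfo').status_code == 401
        assert requests.get(ZOE_API_URL + '/userinfo', auth=AUTH).status_code == 200
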
.. _ci-jenkins:
Zoe Continuous Integration with Jenkins
=======================================
Overview
--------
- Integrate the Zoe repository with Jenkins and SonarQube
- Each commit to the Zoe repository triggers a build on Jenkins:
- Run the SonarQube Scanner to analyze the codebase
- Create two containers, for zoe-master and zoe-api
- Run the integration tests (testing the REST API)
- Build new images if no errors happen
- Deploy Zoe with the latest images
Software Stack
--------------
- Jenkins - version 2.7.4
- SonarQube - version 6.1
Configuration
-------------
- Jenkins: all the configuration in this section is done on the Jenkins server
- Required:
- Plugins: Github plugin, SonarQube Plugin, Quality Gates, Email Plugin (optional), Cobertura Coverage Report (optional)
- Software: Java, Python, Docker
- Go to **Manage Jenkins**, then **Global Tool Configuration** to set up the Java SDK and the SonarQube Scanner
- SonarQube server configuration: this connects Jenkins and SonarQube together
- Go to **Manage Jenkins** then **Configure System**
- SonarQube servers: input name, server URL, server version, **server authentication token** (created on the SonarQube server)
- Quality Gates configuration:
- Go to **Manage Jenkins** then **Configure System**
- Quality Gates: input name, server URL, username and password to log into the SonarQube server
- Github Servers configuration:
- Go to **Manage Jenkins** then **Configure System**
- Github: **Add Github Server**, the API URL should be ``https://api.github.com``. The credentials creation is well described in the Github plugin documentation:
- You can create your own `personal access token <https://github.com/settings/tokens/new>`_ in your GitHub account settings.
- Token should be registered with scopes:
- admin:repo_hook - for managing hooks (read, write and delete old ones)
- repo - to see private repos
- repo:status - to manipulate commit statuses
- In Jenkins create credentials as «Secret Text», provided by the Plain Credentials Plugin
- Create credentials for the Github account: this is similar to `connecting to Github over SSH <https://help.github.com/articles/connecting-to-github-with-ssh/>`_; here, besides adding your public key to Github, you also need to add your private key to Jenkins.
- Create an SSH key pair on the machine running Jenkins:
- Add public key to Github
- Add private key to Jenkins credentials
- Create a new item as a **freestyle project**: this creates a Jenkins job linked to the Github repository
- General
- Select Github project
- Insert project URL
- Source Code Management
- Select **Git**
- Repositories
- Repository URL: use **SSH URL** of Github repository
- Credentials: select the one created above
- Build Triggers
- For Github plugin with version before 1.25.1: Select **Build when a change is pushed to Github**
- For Github plugin with version from 1.25.1: Select **GitHub hook trigger for GITScm polling**
- Build
- Add **Execute SonarQube Scanner** to do SonarQube Analysis
- Add **Quality Gates** to break the build when the SonarQube quality gate is not passed
- Add **Execute Shell** to run the script for testing and deploying. Please refer to the Appendix section for the script.
- Post-build Actions [Optional]
- Add **Publish Cobertura Coverage Report** for getting the report from coverage. With the shell script in the Appendix, the XML file generated by coverage is located in the ``tests`` folder, so ``**/tests/coverage.xml`` should be put in the **Cobertura xml report pattern** field.
- Add **E-mail Notification** for notifying when jobs finish
- Github
- Add new SSH key (the one created on Jenkins server)
- Go to the project (which is integrated to Jenkins) settings
- Integration & Services
- Add Service, choose **Jenkins (Github plugin)**
- Add Jenkins hook url
- For the Github plugin, this URL has the format: http://your-jenkins.com/github-webhook
- In case your Jenkins is not exposed to the Internet, try https://ngrok.com/
- SonarQube: all the configuration in this section is done on the SonarQube server
- On **Administration**, go to **My Account**, then **Security**
- Generate a token, copy it and paste it into the **server authentication token** field in the Jenkins configuration
- The project needs to provide **sonar-project.properties** files in the repo (see http://docs.sonarqube.org/display/SCAN/Analyzing+with+SonarQube+Scanner)
- Then, on **System**, then **Update Center**, install the two plugins for Python and TypeScript.
Appendix
--------
- Sonar properties files
- Take a look at the sonar-project.properties files in the root, ``zoe_api``, ``zoe_master``, ``zoe_lib`` and ``zoe_fe`` folders.
- Execute Shell Script
- Paste this script into the **Execute Shell** build step of the Jenkins job created above; the Zoe REST API address can be changed in the ``test_config.py`` file.
::
# Run Style checker for Sphinx RST documentation
doc8 docs/
# Build new container images
python3 ci/zoeci.py 1 tcp://192.168.12.2:2375 192.168.12.2:5000/zoe:$BUILD_ID
# Deploy new zoe with the above images for testing
python3 ci/zoeci.py 0 tcp://192.168.12.2:2375 ci/docker-compose-test.yml 192.168.12.2:5000/zoe:$BUILD_ID
# Run integration test
cd tests
coverage run -p basic_auth_success_test.py
coverage run -p cookie_auth_success_test.py
coverage combine
coverage xml
cd ..
# Push the built images above to local registry
python3 ci/zoeci.py 2 tcp://192.168.12.2:2375 192.168.12.2:5000/zoe:$BUILD_ID
# Redeploy zoe with new images
python3 ci/zoeci.py 0 tcp://192.168.12.2:2375 ci/docker-compose-prod.yml 192.168.12.2:5000/zoe:$BUILD_ID
- Screenshots
- Jenkins Server configuration
- Plugin configuration
- Java SDK Configuration
.. image:: imgs/1.java.config.png
- SonarQube Scanner Configuration
.. image:: imgs/1.2.sonar.config.PNG
- SonarQube Server Configuration
.. image:: imgs/2.sonar.config.png
- Quality Gates Configuration
.. image:: imgs/2.1.sonar.quality.gates.png
- Github Server Configuration
.. image:: imgs/4.1.github.server.config.png
- Github Server Credential Creation
.. image:: imgs/4.1.github.server.credential.png
- Email Notification Configuration
.. image:: imgs/3.email.config.png
- Create Github credentials
.. image:: imgs/4.github.credential.png
- Create Freestyle project
.. image:: imgs/5.1.freestyle.project.png
.. image:: imgs/5.2.freestyle.project.png
.. image:: imgs/5.3.freestyle.project.png
.. image:: imgs/5.4.1.freestyle.project.png
.. image:: imgs/5.4.2.freestyle.project.png
.. image:: imgs/5.4.3.freestyle.project.png
.. image:: imgs/5.5.freestyle.project.png
- SonarQube Configuration
.. image:: imgs/6.sonar.token.png
- Github Repository Configuration
- Create webhook service
.. image:: imgs/7.github.repo.png
- Create access token
.. image:: imgs/7.1.github.access.token.png
@@ -28,7 +28,7 @@ This endpoint does not need authentication. It returns general, static, informat
Will return a JSON document, like this::
{
"version" : "2017.06",
"version" : "2017.12",
"deployment_name" : "prod",
"application_format_version" : 3,
"api_version" : "0.7"
@@ -215,6 +215,26 @@ Where:
* ``execution_id`` is the ID of the new execution just created.
Execution endpoints
^^^^^^^^^^^^^^^^^^^
Request (GET)::
curl -X GET -u 'username:password' http://bf5:8080/api/<api_version>/execution/endpoints/<execution_id>
Will return a JSON list like this::
[
['Jupyter Notebook interface', 'http://192.168.47.19:32920/'],
[...]
]
Where each item of the list is a tuple containing:
* The endpoint name
* The endpoint URL
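
The same call can be made programmatically, for example with the Python ``requests`` library; in this sketch the base URL, credentials and execution ID are placeholders::

    import requests

    ZOE_API = 'http://bf5:8080/api/0.7'  # placeholder deployment, as in the curl examples
    execution_id = 1234                  # hypothetical execution ID

    r = requests.get('{}/execution/endpoints/{}'.format(ZOE_API, execution_id),
                     auth=('username', 'password'))
    r.raise_for_status()
    for name, url in r.json():  # each item is a (name, URL) pair
        print(name, url)
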
Service endpoint
----------------
@@ -311,7 +331,8 @@ Will return a JSON document, like this::
{
"termination_threads_count" : 0,
"queue_length" : 0
"queue_length" : 0,
[...]
}
Where:
@@ -319,77 +340,7 @@ Where:
* ``termination_threads_count`` is the number of executions that are pending for termination and cleanup
* ``queue_length`` is the number of executions in the queue waiting to be started
OAuth2 endpoint
---------------
This endpoint lets users authenticate/authorize via an access token instead of the raw username/password. Authentication is needed only when requesting a new access token; an expired access token can be refreshed using the refresh token.
Request new access token
^^^^^^^^^^^^^^^^^^^^^^^^
Request::
curl -u 'username:password' http://bf5:8080/api/<api_version>/oauth/token -X POST -H 'Content-Type: application/json' -d '{"grant_type": "password"}'
Will return a JSON document, like this::
{
"token_type": "Bearer",
"access_token": "3ddbe9ba-6a21-4e4d-993b-70556390c5d3",
"refresh_token": "9bab190f-e211-42aa-917e-20ce987e355e",
"expires_in": 36000
}
Where:
* ``token_type`` is the type of the token; **Bearer** is used by default
* ``access_token`` is the token used for further authentication/authorization with the other API endpoints
* ``refresh_token`` is the token used to get a new access token when the current one has expired
* ``expires_in`` is the lifetime of the access token, in seconds
Refresh an access token
^^^^^^^^^^^^^^^^^^^^^^^
Request::
curl -H 'Authorization: Bearer 9bab190f-e211-42aa-917e-20ce987e355e' http://bf5:8080/api/<api_version>/oauth/token -X POST -H 'Content-Type: application/json' -d '{"grant_type": "refresh_token"}'
Will return a JSON document, like this::
{
"token_type": "Bearer",
"access_token": "378f8d5f-2eb5-4181-b632-ad23c4534d32",
"expires_in": 36000
}
Where:
* ``access_token`` is the new access token obtained after the refresh
Revoke an access/refresh token
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Request::
curl -u 'username:password' -X DELETE http://bf5:8080/api/<api_version>/oauth/revoke/<token>
Where:
* ``token`` is the access token or refresh token that needs to be revoked
Will return a JSON document, like this::
{
"ret": "Revoked token."
}
Authenticate other api endpoint
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Instead of sending the raw username and password to the API endpoints that require authentication, use an access token with the header ``Authorization: Bearer <token>``.
Example::
curl -H 'Authorization: Bearer 378f8d5f-2eb5-4181-b632-ad23c4534d32' http://bf5:8080/api/<api_version>/execution
The actual content of the response may vary between different Zoe releases.
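
The whole flow can be scripted, for example with the Python ``requests`` library. This is only a sketch: the base URL and credentials are placeholders and error handling is omitted::

    import requests

    ZOE_API = 'http://bf5:8080/api/0.7'  # placeholder, as in the curl examples above

    # 1. Obtain a token pair with username and password (the "password" grant).
    tokens = requests.post(ZOE_API + '/oauth/token',
                           auth=('username', 'password'),
                           json={'grant_type': 'password'}).json()

    # 2. Use the access token instead of the raw credentials.
    headers = {'Authorization': 'Bearer ' + tokens['access_token']}
    executions = requests.get(ZOE_API + '/execution', headers=headers).json()

    # 3. When the access token expires, refresh it using the refresh token.
    tokens = requests.post(ZOE_API + '/oauth/token',
                           headers={'Authorization': 'Bearer ' + tokens['refresh_token']},
                           json={'grant_type': 'refresh_token'}).json()
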
Login endpoint
--------------
@@ -406,7 +357,7 @@ Will return a JSON document, like this::
"uid": "admin"
}
And a file named ``zoe_cookie.txt`` contains the cookie information.
Pass this cookie on each api request which requires authentication.
@@ -417,5 +368,5 @@ Example::
Note:
- For the Zoe web interface, a cookie-based mechanism is required for authentication/authorization.
- Every unauthorized request will be redirected to **http://<hostname>:8080/login**
- After a successful login, a cookie will be saved in the browser for further authentication/authorization purpose.
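
When scripting against the cookie-based mechanism, a ``requests.Session`` keeps the cookie automatically. The sketch below assumes the login endpoint accepts HTTP basic authentication, as in the curl example; the hostname is a placeholder::

    import requests

    ZOE_API = 'http://<hostname>:8080/api/0.7'  # placeholder

    session = requests.Session()
    # Logging in stores the authentication cookie on the session object...
    session.get(ZOE_API + '/login', auth=('username', 'password'))
    # ...and the cookie is then sent automatically with every following request.
    print(session.get(ZOE_API + '/userinfo').json())
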
@@ -3,9 +3,9 @@
Classification
==============
Zoe runs processes inside containers and the Zoe application description is very generic, allowing any kind of application to be described in Zoe and submitted for execution. While the main focus of Zoe is so-called "analytic applications", there are many other tools that can be run on the same cluster, for monitoring, storage, log management, history servers, etc. These applications can be described in Zoe and executed, but they have quite different scheduling constraints. Zoe is not a generic application deployment solution and lacks, by design, features like automatic migration, rolling upgrades, etc.
Please note that in this context an "elastic" service is a service that "can be automatically killed or started".
- Long running: potentially will never terminate
@@ -18,23 +18,15 @@ A ZApp repository contains a number of well-known files, that are used to automa
* ``root``
* ``docker/`` : directory containing Docker image sources (Dockerfiles and associated files)
* ``README-devel.md`` : documentation for the ZApp developer (optional)
* ``README.md`` : documentation for the ZApp user
* ``build_all.sh`` : builds the Docker images and pushes them to a registry
* ``gen_*.py`` : Python scripts that generate the ZApp description JSON files
* ``zapp.json`` : JSON ZApp description
* ``manifest.json`` : manifest with high-level information needed for the ZApp shop
* ``logo.png`` : logo for the ZApp, it will be used in the future ZAppShop
* ``validate_all.sh`` : runs the generated JSON files through the Zoe API validation endpoint
The scripts expect a number of environment variables to be defined:
* DOCKER_REGISTRY : hostname (with port if needed) of the Docker registry to use
* REPOSITORY : name of the image repository inside the registry to use
* VERSION : image version (normally this is set by the CI environment to be a build number or the commit hash)
* VALIDATION_URL : Zoe API URL for the validation endpoint (the default expects the zoe-api process to be running on localhost on the 5001 port)
A ZApp is composed of two main elements:
* a container image: the format depends on the container back-end in use, currently Docker is used for Zoe
* a JSON description: the magic ingredient that makes Zoe work
The JSON description contains all the information needed to start the containers that make up the application. Apart from some metadata, it contains a list of ``services``. Each service describes one or more (almost) identical containers. Please note that Zoe does not replicate services for fault tolerance, but to increase parallelism and performance (think in terms of additional Spark workers, for example).
@@ -46,68 +38,15 @@ The Tensorflow ZApp
Clone the `Tensorflow ZApp <https://gitlab.eurecom.fr/zoe/zapp-tensorflow>`_ repository.
It contains two variants of the Tensorflow ZApp that we will examine in detail:
1. A simple ZApp that uses the unmodified Google Docker image with a notebook for interactive use
2. A batch ZApp that uses a custom image containing the HEAD version of Tensorflow
The interactive Tensorflow ZApp with stable release from Google
---------------------------------------------------------------
Open the ``gen_json_google.py`` script.
At the beginning of the file we define two constants:
* APP_NAME : name of Zoe Application. It is used in various places visible to the user.
* ZOE_APPLICATION_DESCRIPTION_VERSION : the format version this description conforms to
Then there is a dictionary called ``options`` that lists parameters that can be changed to obtain different behaviors. In this case the ZApp is quite simple and we can tune only the amount of cores and memory that our ZApp is going to consume. This information is going to be used for scheduling and for placing the container in the cluster.
To keep the script standardized other constants are defined here, but are not used in this specific ZApp. They load values from the environment, as defined above.
``GOOG_IMAGE`` contains the name of the container image that Zoe is going to use. Here we point directly to the Tensorflow image on Google's registry.
Services
^^^^^^^^
The first function that is defined in the script is ``goog_tensorflow_service``. It defines the Tensorflow service in Zoe.
The format is detailed in the :ref:`zapp_format` document. Of note, here, are the two network ports that are going to be exposed, one for the Tensorboard interface and one for the Notebook web interface.
The ZApp
^^^^^^^^
At the end an ``app`` dictionary is built, containing the ZApp metadata and the service we defined above. The dictionary is then dumped in JSON format, which Zoe can understand.
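
A stripped-down sketch of such a script is shown below. The constants and the function name come from the description above, while the service dictionary keys, port numbers and resource values are illustrative only; the authoritative schema is described in :ref:`zapp_format`::

    #!/usr/bin/env python3
    """Illustrative sketch of a gen_json_*.py script, not the real one."""
    import json

    APP_NAME = 'google-tensorflow'
    ZOE_APPLICATION_DESCRIPTION_VERSION = 3  # must match the format version Zoe expects

    options = {  # tunable parameters, as described above
        'memory_limit': 4 * 1024 * 1024 * 1024,  # bytes, illustrative default
        'core_limit': 2,
    }

    GOOG_IMAGE = 'gcr.io/tensorflow/tensorflow'  # assumed name of Google's Tensorflow image


    def goog_tensorflow_service(memory_limit, core_limit):
        """Build the Tensorflow service (dictionary keys are illustrative)."""
        return {
            'name': 'tensorflow',
            'image': GOOG_IMAGE,
            'monitor': True,
            'resources': {'memory': {'max': memory_limit}, 'cores': {'max': core_limit}},
            'ports': [  # the Notebook and Tensorboard web interfaces mentioned above
                {'name': 'Jupyter Notebook interface', 'port_number': 8888},
                {'name': 'Tensorboard interface', 'port_number': 6006},
            ],
        }


    if __name__ == '__main__':
        app = {
            'name': APP_NAME,
            'version': ZOE_APPLICATION_DESCRIPTION_VERSION,
            'services': [goog_tensorflow_service(**options)],
        }
        with open('goog_tensorflow.json', 'w') as fp:
            json.dump(app, fp, indent=4)
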
After running the script, the ZApp can be started with ``zoe.py start google-tensorflow goog_tensorflow.json``
The batch Tensorflow with custom image
--------------------------------------
First of all have a look at the Dockerfile contained in ``docker/tensorflow/Dockerfile``.
It is based on an Ubuntu image and installs everything that is needed to build Tensorflow. Since the build system uses bazel, which in turn needs Java, the resulting image is quite large, but it can be used for developing Tensorflow.
It also makes those pesky warnings about Tensorflow not being optimized for your CPU disappear.
The Dockerfile clones the Tensorflow repository and builds the master branch at the HEAD commit.
Please note that there is nothing Zoe-specific in this Dockerfile. Zoe can run pre-built images from public registries as well as custom images from private registries.
Run the ``build_all.sh`` script to build the Docker image. It will take several minutes, so you can have a coffee break in the meantime.
Now open the ``gen_json_standalone.py`` script. It is almost identical to the one we saw above; the only notable change is the image name, which is now generated from the environment variables used to build the image.
Depending on the back-end in use in your Zoe deployment, you may have to pull the image from the registry before being able to start the ZApp.
After running the Python script, the ZApp can be started with ``zoe.py start my-long-running-training custom_tensorflow.json``.

Concluding remarks
^^^^^^^^^^^^^^^^^^

In this tutorial we examined in detail a sample Tensorflow ZApp. We saw where the memory and cores parameters are defined and how to customize them. The interactive variant has a single service and uses the standard Tensorflow image released by Google, containing Python, the Tensorflow library and a Jupyter Notebook: it is a good example of how to use a pre-existing image from a public registry in Zoe.

The ``image`` field points to an image name that the Zoe back-end is able to understand. Managing Docker images is outside the scope of Zoe: ideally you have in place, in your cluster, a system that distributes the images on all the nodes for fast ZApp start-up times and that keeps them updated, to make sure new versions with bug fixes are made automatically available. The ``.gitlab.yml`` file contains the GitLab CI description that we use at Eurecom to automatically deploy new versions of the ZApp in our cluster.

The user is also able to override the ``command``: this way the Notebook is not started and the user command is executed instead, effectively transforming the ZApp into a batch one able to run non-interactive scripts.

The ``manifest.json`` file describes the ZApp in terms of the ZApp Shop in the Zoe web interface. It contains the logo and usage-instruction file names and the options that are presented to the user when she wants to start the ZApp. The manifest and the ZApp Shop are documented in the :ref:`install` document.

The tutorial has also explained how to use third-party Docker images and how to build new ones in-house for running development versions of standard software.

We have a lot of great ideas on how to evolve the ZApp concept, but we are sure you have many more! Any feedback or comment is always welcome, `contact us directly <daniele.venzano@eurecom.fr>`_ or through the `GitHub issue tracker <https://github.com/DistributedSystemsGroup/zoe/issues>`_.
@@ -15,7 +15,7 @@
"""Versions."""
ZOE_VERSION = '2017.12'
ZOE_API_VERSION = '0.7'
ZOE_APPLICATION_FORMAT_VERSION = 3
SQL_SCHEMA_VERSION = 6 # ---> Increment this value every time the SQL schema changes !!! <---