Commit 3d28bab6 authored by Daniele Venzano

Merge master branch

parents cd4ae27e 847058e5
Zoe - Container-based Analytics as a Service
============================================
Zoe provides a simple way to provision data analytics applications using Docker Swarm.
Zoe provides a simple way to provision any kind of data analytics applications.
Resources:
- Main website: http://zoe-analytics.eu
- Documentation: http://docs.zoe-analytics.eu
- How to install: http://zoe-analytics.readthedocs.org/en/latest/install.html
- Website: http://zoe-analytics.eu
- Documentation: http://docs.zoe-analytics.eu
- Mailing list: http://www.freelists.org/list/zoe
- Issue tracker: https://github.com/DistributedSystemsGroup/zoe/issues
Zoe applications can be easily created by users. We provide several examples in the `zoe-applications <https://github.com/DistributedSystemsGroup/zoe-applications>`_ repository to get you started.
Zoe applications (ZApps):
Other Zoe resources:
- Zoe applications: https://github.com/DistributedSystemsGroup/zoe-applications
- Zoe logger: https://github.com/DistributedSystemsGroup/zoe-logger
A note on the master branch
---------------------------
We are currently redesigning Zoe with a new architecture, so the master branch is unstable and changes very rapidly.
The latest stable version is maintained under the 0.9.7-stable branch. All the documentation currently refers to this stable version, unless otherwise noted.
Repository contents
-------------------
- `contrib`: supervisord config files
- `docs`: Sphinx documentation
- `scripts`: Scripts used to test Zoe images outside of Zoe
- `zoe_cmd`: Command-line client
- `zoe_lib`: Client-side library, also contains some modules needed by the observer and the master processes
- `zoe_master`: The core of Zoe, the server process that listens for client requests and creates the containers on Swarm
- `zoe_api`: The web client interface
|Travis build| |Documentation Status|
- Repository: https://github.com/DistributedSystemsGroup/zoe-applications
Zoe is licensed under the terms of the Apache 2.0 license.
.. |Documentation Status| image:: https://readthedocs.org/projects/zoe-analytics/badge/?version=latest
:target: https://readthedocs.org/projects/zoe-analytics/?badge=latest
.. |Travis build| image:: https://travis-ci.org/DistributedSystemsGroup/zoe.svg
:target: https://travis-ci.org/DistributedSystemsGroup/zoe
......@@ -3,17 +3,69 @@
Developer documentation
=======================
:ref:`modindex`
As a developer you can:
- call Zoe from your own software: :ref:`Zoe REST API documentation <rest-api>`
- create or modify ZApps: :ref:`howto_zapp`
- contribute to Zoe: keep reading
Contributing to Zoe
-------------------
Zoe is open source and all kinds of contributions are welcome.
Zoe is licensed under the terms of the Apache 2.0 license.
Bugs, issues and feature requests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
`Zoe issue tracker <https://github.com/DistributedSystemsGroup/zoe/issues>`_
Testing beta code
^^^^^^^^^^^^^^^^^
The ``HEAD`` of the master branch represents the latest version of Zoe. Automatic tests are performed before code is merged into master, but human feedback is invaluable. Clone the repository, try it out, and report your findings on the `mailing list <http://www.freelists.org/list/zoe>`_ or on the `issue tracker <https://github.com/DistributedSystemsGroup/zoe/issues>`_.
Code changes and pull requests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**When you contribute code, you affirm that the contribution is your original work and that you license the work to the project under the project’s open source license. Whether or not you state this explicitly, by submitting any copyrighted material via pull request, email, or other means you agree to license the material under the project’s open source license and warrant that you have the legal authority to do so.**
To contribute code and/or documentation you should follow this workflow:
1. announce your idea on the mailing list, to prevent duplicated work
2. fork the Zoe repository via GitHub (if you don't already have a fork)
3. ... develop and debug ...
4. when you are ready, propose your changes with a pull request
Zoe maintainers will review your code, give constructive feedback and eventually accept the code and merge.
Contributors can set up their own CI pipeline following the quality guidelines (:ref:`quality`). At a bare minimum, all code should be tested via the ``run_tests.sh`` script available in the root of the repository. Accepted contributions will be run through the full Zoe CI pipeline before being merged into the public repository.
Repository contents
^^^^^^^^^^^^^^^^^^^
- `docs`: Sphinx documentation used to build these pages
- `scripts`: scripts for deployment and testing
- `zoe_api`: the front-end Zoe process that provides the REST API
- `zoe_cmd`: Command-line client
- `zoe_lib`: library, contains common modules needed by the api and the master processes
- `zoe_master`: the back-end Zoe process that schedules executions and talks to the containerization system
- `contrib`: supervisord config files and sample ZApps
Internal module/class/method documentation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. toctree::
:maxdepth: 2
:maxdepth: 1
introduction
rest-api
auth
rest-api
api-endpoint
master-api
scheduler
backend
stats
kube_backend
:ref:`modindex`
General design decisions
========================
In this architecture we overturned our previous decision of keeping state internal, with periodic checkpointing.
State is kept in Postgres and shared among the different Zoe components. For a distributed system, an external database greatly simplifies many common situations, thanks to transactions and strong consistency guarantees.
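The benefit of transactional state updates can be sketched as follows. This is only an illustration: it uses the ``sqlite3`` module from the Python standard library as a stand-in for Postgres so that it runs without a server, and the ``execution`` table, its columns and the ``set_execution_status`` helper are hypothetical, not Zoe's actual schema or code.

.. code-block:: python

    import sqlite3


    def set_execution_status(conn, execution_id, new_status):
        """Update the status of one execution inside a transaction."""
        # Using the connection as a context manager commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            cur = conn.execute(
                "UPDATE execution SET status = ? WHERE id = ?",
                (new_status, execution_id),
            )
            if cur.rowcount != 1:
                raise LookupError("unknown execution {}".format(execution_id))


    # In-memory database with a hypothetical table, for demonstration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE execution (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute("INSERT INTO execution (id, status) VALUES (1, 'submitted')")
    conn.commit()

    set_execution_status(conn, 1, "running")

Because every component reads and writes the same database, a status change made by one process becomes visible to the others as soon as the transaction commits.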
User management is left out of Zoe as much as possible. User authentication backends provide just a minimum of information for Zoe: a user ID and a role. Zoe does not manage creation, deletion, passwords, etc.
Zoe is distributed and uses threads to keep the APIs responsive at all times.
Object naming
-------------
Database IDs are used to identify executions and services. Container names within Docker Swarm must be unique. Instead of using opaque UUIDs, we decided to produce names that give some information to the administrator who looks at the output of ``docker ps``. In addition, these same names are exposed by standard monitoring tools.
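A minimal sketch of this naming idea is shown below. The name layout, the ``zoe`` prefix and the ``container_name`` helper are assumptions made for the example and are not necessarily the format Zoe produces.

.. code-block:: python

    import re


    def container_name(service_name, execution_id, service_id, prefix="zoe"):
        """Build a readable container name from database IDs (hypothetical layout)."""
        # Replace every character outside the set Docker allows in container names.
        safe = re.sub(r"[^a-zA-Z0-9_.-]", "-", service_name)
        # The database IDs make the name unique, the service name makes it readable.
        return "{}-{}-{}-{}".format(prefix, safe, execution_id, service_id)


    print(container_name("spark master", 42, 7))  # prints zoe-spark-master-42-7

Embedding the execution and service IDs keeps names unique, while the service name tells the administrator what the container is doing.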
.. _main_index:
Zoe - Container-based Analytics as a Service
============================================
Zoe is user-facing software that hides the complexities of managing resources, configuring and deploying complex distributed applications on private clouds. The main focus is data analysis applications, such as `Spark <http://spark.apache.org/>`_, but Zoe has a very flexible application description format that lets you easily describe any kind of application.
Zoe provides a simple way to provision data analytics applications. It hides the complexities of managing resources, configuring and deploying complex distributed applications on private clouds. Zoe is focused on data analysis applications, such as `Spark <http://spark.apache.org/>`_ or `TensorFlow <https://www.tensorflow.org/>`_. A generic, very flexible application description format lets you easily describe any kind of data analysis application.
Downloading
-----------
Get Zoe from the `GitHub repository <https://github.com/DistributedSystemsGroup/zoe>`_. Stable releases are tagged on the master branch and can be downloaded from the `releases page <https://github.com/DistributedSystemsGroup/zoe/releases>`_.
Zoe is written in Python 3.4+ and requires a number of third-party packages to function. Deployment scripts for the supported back-ends, install and setup instructions are available in the :ref:`installation guide <install>`.
Quick tutorial
--------------
To use the Zoe command-line interface, first of all you have to define three environment variables::
export ZOE_URL=http://localhost:5000 # address of the zoe-api instance
export ZOE_USER=joe # User name
export ZOE_PASS=joesecret # Password
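If you call Zoe from your own scripts, it can help to check these variables before doing anything else. The following sketch shows one way to do that. The ``load_zoe_env`` helper is hypothetical and is not part of the Zoe client library.

.. code-block:: python

    import os

    # The three variables expected by the Zoe command-line interface.
    REQUIRED = ("ZOE_URL", "ZOE_USER", "ZOE_PASS")


    def load_zoe_env(environ=None):
        """Return the Zoe client settings, or raise if one is missing or empty."""
        if environ is None:
            environ = os.environ
        missing = [name for name in REQUIRED if not environ.get(name)]
        if missing:
            raise RuntimeError("missing environment variables: " + ", ".join(missing))
        return {name: environ[name] for name in REQUIRED}


    settings = load_zoe_env({
        "ZOE_URL": "http://localhost:5000",
        "ZOE_USER": "joe",
        "ZOE_PASS": "joesecret",
    })
    print(settings["ZOE_URL"])  # prints http://localhost:5000
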
Zoe uses containerization technology to provide fast startup times and process isolation. A smart scheduler is able to prioritize executions according to several policies, maximising the use of the available capacity and maintaining a queue of executions that are ready to run.
Now you can check that you are up and running with this command::
Zoe currently supports Docker Swarm as the container backend. It can be located anywhere, on Amazon or in your own private cloud, and Zoe does not need exclusive access to it, meaning your Swarm could also be running other services: Zoe will not interfere with them. Zoe is meant as a private service, adding data-analytics capabilities to new or existing clusters.
./zoe.py info
The core components of Zoe are application-independent and users are free to create and execute application descriptions for any kind of service combination. Zoe targets analytics services in particular: we offer a number of tested sample ZApps and Frameworks that can be used as starting examples.
It will return some version information, by querying the zoe-api and zoe-master processes.
To better understand what we mean by "analytic service", here are a few examples:
Zoe applications are passed as JSON files. A few sample ZApps are available in the ``contrib/zoeapps/`` directory. To start a ZApp use the following command::
* Spark
* Zookeeper
* Hadoop (HDFS in particular)
* Cassandra
* Impala
* ... suggestions welcome!
./zoe.py start joe-spark-notebook contrib/zoeapps/eurecom_aml_lab.json
A number of predefined applications for testing and customization can be found at the `zoe-applications <https://github.com/DistributedSystemsGroup/zoe-applications>`_ repository.
ZApp execution status can be checked this way::
Have a look at the :ref:`vision` and at the :ref:`roadmap` to see what we are currently planning, and feel free to `contact us <mailto:daniele.venzano@eurecom.fr>`_ via email or through the `GitHub issue tracker <https://github.com/DistributedSystemsGroup/zoe/issues>`_ to ask questions or suggest ideas and new features.
./zoe.py exec-ls # Lists all executions, past and present
./zoe.py exec-get <execution id> # Inspects an execution
A note on terminology (needs to be updated)
-------------------------------------------
Where ``execution id`` is the ID of the ZApp execution to inspect, taken from the ``exec-ls`` command.
We are spending a lot of effort to use consistent naming throughout the documentation, the software, the website and all the other resources associated with Zoe. Check the :ref:`architecture` document for the details, but here is a quick reference:
* Zoe Components: the Zoe processes, the Master, the API and the service monitor
* Zoe Applications: a composition of Zoe Frameworks. This is the highest-level entry in the application descriptions that the user submits to Zoe. It can be abbreviated as ZApp(s).
* Zoe Frameworks: a composition of Zoe Services, used to describe reusable pieces of Zoe Applications, such as a Spark cluster
* Zoe Services: one-to-one with a Docker container. Each describes a single service or process tree running in an isolated container
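The hierarchy in the list above can be pictured as nested data structures. The sketch below is only an illustration: the class names, fields and image names are hypothetical and do not reflect the JSON application description format.

.. code-block:: python

    from dataclasses import dataclass, field
    from typing import List


    @dataclass
    class Service:
        """One service, mapped one-to-one to a container."""
        name: str
        image: str


    @dataclass
    class Framework:
        """A reusable group of services, for example a Spark cluster."""
        name: str
        services: List[Service] = field(default_factory=list)


    @dataclass
    class ZApp:
        """The top-level description submitted by the user."""
        name: str
        frameworks: List[Framework] = field(default_factory=list)

        def container_count(self) -> int:
            # One container per service, summed over all frameworks.
            return sum(len(fw.services) for fw in self.frameworks)


    spark = Framework("spark", [
        Service("master", "example/spark-master"),
        Service("worker-0", "example/spark-worker"),
        Service("worker-1", "example/spark-worker"),
    ])
    zapp = ZApp("spark-notebook", [spark])
    print(zapp.container_count())  # prints 3
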
Where to go from here
---------------------
Contents
--------
Main documentation
^^^^^^^^^^^^^^^^^^
.. toctree::
:maxdepth: 2
:maxdepth: 1
install
kube_backend
config_file
logging
proxy
monitoring
architecture
quality
vision
motivations
roadmap
contributing
proxy
Zoe applications
----------------
:ref:`modindex`
^^^^^^^^^^^^^^^^
.. toctree::
:maxdepth: 2
:maxdepth: 1
zapps/classification
zapps/howto_zapp
zapps/zapp_format
zapps/contributing
Developer documentation
-----------------------
:ref:`modindex`
Development and contributing to the project
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. toctree::
:maxdepth: 1
developer/index
architecture
quality
motivations
vision
roadmap
contributing
External resources
^^^^^^^^^^^^^^^^^^
Contacts
========
`Zoe website <http://zoe-analytics.eu>`_
`Zoe mailing list <http://www.freelists.org/list/zoe>`_
About us
========
Zoe is developed as part of the research activities of the `Distributed Systems Group <http://distsysgroup.wordpress.com>`_ at `Eurecom <http://www.eurecom.fr>`_, in Sophia Antipolis, France.
- `Zoe Homepage <http://zoe-analytics.eu>`_
- `Issue tracker <https://github.com/DistributedSystemsGroup/zoe/issues>`_
- `ZApp repository <https://github.com/DistributedSystemsGroup/zoe-applications>`_
The main discussion area for issues, questions and feature requests is the `GitHub issue tracker <https://github.com/DistributedSystemsGroup/zoe/issues>`_.
Zoe is licensed under the terms of the Apache 2.0 license.
......@@ -18,7 +18,7 @@ How it works
1. Zoe configuration file:
* ``--backend``: set to Kubernetes instead of Docker Swarm
* ``--kube-config-file``: the configuration file of kubernetes cluster
* ``--kube-config-file``: the configuration file of Kubernetes cluster
2. Zoe:
......