Commit 7964de6e authored by Daniele Venzano

Start adding some structure to the documentation

parent 2b1a4f2d
......@@ -2,7 +2,7 @@
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXOPTS = -b coverage
SPHINXBUILD = sphinx-build
PAPER =
BUILDDIR = _build
......
Architecture
============
It is composed of:
* zoe: command-line client
* zoe-scheduler: the main daemon that performs application scheduling and talks to Swarm
* zoe-web: the web service
\ No newline at end of file
......@@ -20,7 +20,7 @@ import shlex
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('..'))
# -- General configuration ------------------------------------------------
......
How to contribute
=================
API documentation
-----------------
:ref:`modindex`
.. toctree::
:maxdepth: 2
developer/client
.. module:: zoe_client
The zoe_client module
=====================
.. class:: zoe_client.ZoeClient
The ZoeClient class
-------------------
......@@ -4,25 +4,30 @@
Zoe - Container-based Analytics as a Service
============================================
Zoe uses Docker Swarm to run Analytics as a Service applications. Currently only Spark is supported, but we are planning inclusion of other frameworks.
Zoe uses `Docker Swarm <https://docs.docker.com/swarm/>`_ to run Analytics as a Service applications.
It is composed of:
Zoe can use a Docker Swarm located anywhere, on Amazon or in your own private cloud, and does not need exclusive access to it, meaning
your Swarm could also be running other services: Zoe will not interfere with them. Zoe is meant as a private service, adding data-analytics
capabilities to existing, or new, Docker clusters, maximising the use of already provisioned capacity.
* zoe: command-line client
* zoe-scheduler: the main daemon that performs application scheduling and talks to Swarm
* zoe-web: the web service
Currently only the `Spark framework <http://spark.apache.org/>`_ is supported, but we are planning inclusion of other frameworks. Have a look at the :ref:`vision` and at the
`roadmap <https://github.com/DistributedSystemsGroup/zoe/blob/master/README.md>`_ to see what we are currently planning and feel free to contact us through the
GitHub issue tracker to pose questions or suggest ideas and new features.
Contents:
.. toctree::
:maxdepth: 2
:maxdepth: 2
install
architecture
vision
contributing
Contacts
========
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
Zoe is developed as part of the research activities of the `Distributed Systems Group <http://distsysgroup.wordpress.com>`_ at `Eurecom <http://www.eurecom.fr>`_, in
Sophia Antipolis, France.
The main discussion area for issues, questions and feature requests is the `GitHub issue tracker <https://github.com/DistributedSystemsGroup/zoe/issues>`_.
Installing Zoe
==============
Requirements
------------
* MySQL to keep all the state
* Docker Swarm
* A Docker registry containing Spark images
* Apache Web Server to act as a reverse proxy
How to install
--------------
1. Clone this repository
2. Generate a sample configuration file with ``zoe.py write-config zoe.conf``
3. Edit ``zoe.conf`` and check/modify the following sections (the other sections are covered below):
* flask (to generate a new key, run ``import os; os.urandom(24)`` in a Python interpreter)
* filesystem
* smtp
4. Set up supervisor to manage Zoe processes: in the ``scripts/supervisor/`` directory you can find the configuration file for
supervisor. You need to modify the paths to point to where you cloned Zoe.
5. Start running applications!
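The key generation mentioned in step 3 can be wrapped in a small helper. This is only a sketch: the function name ``generate_secret_key`` is made up for illustration, and how the resulting value must be written into ``zoe.conf`` is not specified here.

```python
import os


def generate_secret_key(num_bytes: int = 24) -> bytes:
    """Return num_bytes of random data from the operating system's CSPRNG."""
    return os.urandom(num_bytes)


if __name__ == "__main__":
    # Print the repr so the value can be copied by hand.
    print(repr(generate_secret_key()))
```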
Zoe configuration is read from an 'ini' file. The following locations are searched for a file named ``zoe.conf``:
* working path (.)
* /etc/zoe
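The lookup described above could be sketched as follows. This is not Zoe's actual code: the helper names are hypothetical, and the sketch assumes the locations are tried in the order listed and that the first match wins.

```python
import configparser
import os

# Locations from the list above, assumed to be tried in this order.
SEARCH_DIRS = (".", "/etc/zoe")


def find_config(filename="zoe.conf", search_dirs=SEARCH_DIRS):
    """Return the path of the first existing config file, or None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


def load_config(path):
    """Parse an ini-style file and return the ConfigParser object."""
    parser = configparser.ConfigParser()
    with open(path) as handle:
        parser.read_file(handle)
    return parser
```

Returning ``None`` instead of raising lets the caller decide whether a missing file is an error or a cue to generate a sample configuration.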
Database
--------
1. Install MySQL/MariaDB, or any other DB supported by SQLAlchemy.
2. Create a database, a user and a password and use these to build a connection string like ``mysql://<user>:<password>@host/db``
3. Put this string in section ``[db]`` of zoe.conf
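As a rough illustration of step 2, the connection string can be assembled programmatically. The helper name ``build_db_url`` is an assumption made for this sketch. The password is percent-encoded because characters such as ``@`` or ``/`` would otherwise be misread as URL delimiters.

```python
from urllib.parse import quote_plus


def build_db_url(user, password, host, database, scheme="mysql"):
    """Build a connection string of the form scheme://user:password@host/db."""
    # Percent-encode the password so special characters do not break the URL.
    safe_password = quote_plus(password)
    return f"{scheme}://{user}:{safe_password}@{host}/{database}"


if __name__ == "__main__":
    print(build_db_url("zoe", "p@ss/word", "localhost", "zoe"))
```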
Swarm/Docker
------------
Install Docker and the Swarm container:
* https://docs.docker.com/installation/ubuntulinux/
* https://docs.docker.com/swarm/install-manual/
For testing you can use a Swarm with a single Docker instance located on the same host/VM.
Network configuration
^^^^^^^^^^^^^^^^^^^^^
Zoe assumes that containers placed on different hosts are able to talk to each other freely. Since we use Docker on bare metal, we
use an undocumented network configuration, with the docker bridges connected to a physical interface, so that
containers on different hosts can talk to each other on the same layer 2 domain.
To do that, you also need to reset the MAC address of the bridge; otherwise bridges on different hosts will have the same MAC address.
Other configurations are possible, but configuring Docker networking is outside the scope of this document.
Images: Docker Hub vs local Docker registry
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The images used by Zoe are available on the Docker Hub:
* https://hub.docker.com/r/zoerepo/spark-scala-notebook/
* https://hub.docker.com/r/zoerepo/spark-master/
* https://hub.docker.com/r/zoerepo/spark-worker/
* https://hub.docker.com/r/zoerepo/spark-submit/
Since the Docker Hub can be quite slow, we strongly suggest setting up a private registry. The ``build_images.sh`` script in the
`zoe-docker-images <https://github.com/DistributedSystemsGroup/zoe-docker-images>`_ repository can help you populate the registry
bypassing the Hub.
The images are quite standard and can also be used without Zoe. Examples of how to do that are available in the ``scripts/start_cluster.sh`` script.
Set the registry address:port in section ``[docker]`` in ``zoe.conf``. If you use Docker Hub, set the option to an empty string.
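To show why an empty string selects the Docker Hub, here is a minimal sketch of how a registry prefix is typically combined with an image name. The function ``image_reference`` is hypothetical, and whether Zoe keeps the ``zoerepo/`` namespace when pulling from a private registry is an assumption.

```python
def image_reference(image, registry=""):
    """Prefix an image name with a registry address, if one is configured."""
    registry = registry.strip().rstrip("/")
    if not registry:
        # No registry configured: Docker resolves the name on the Docker Hub.
        return image
    return f"{registry}/{image}"


if __name__ == "__main__":
    print(image_reference("zoerepo/spark-master"))
    print(image_reference("zoerepo/spark-master", "10.0.0.1:5000"))
```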
Apache Web Server configuration
-------------------------------
Install the Apache web server.
A sample virtual host file containing the directives required by Zoe is available in ``scripts/apache-sample.conf``.
This configuration will also proxy zoe-web, which starts on port 5000 by default.
Please note that putting the generated config file in /tmp can be a serious security problem, depending on your setup.
Zoe dynamically generates proxy entries to let users access the various web interfaces contained in the Spark containers.
To do this, it needs to be able to reload Apache and to write to a configuration file included in the VirtualHost directive.
Zoe executes ``sudo service apache2 reload`` whenever needed, so make sure the user that runs Zoe is able to run that command
successfully.
Change the options ``web_server_name``, ``access_log`` and ``proxy_config_file`` in the section ``[apache]`` of ``zoe.conf`` as needed.
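The proxy mechanism described in this section can be sketched roughly as below. This is not Zoe's implementation: the function names, the URL paths and the backend addresses are invented for illustration. Only the ``ProxyPass`` and ``ProxyPassReverse`` directives (from Apache's ``mod_proxy``) and the reload command quoted above are taken as given.

```python
import subprocess


def render_proxy_entries(entries):
    """Render (url_path, backend_url) pairs as Apache mod_proxy directives."""
    lines = []
    for url_path, backend_url in entries:
        lines.append(f"ProxyPass {url_path} {backend_url}")
        lines.append(f"ProxyPassReverse {url_path} {backend_url}")
    return "".join(line + "\n" for line in lines)


def write_and_reload(entries, proxy_config_file):
    """Write the entries to the included config file, then reload Apache."""
    with open(proxy_config_file, "w") as handle:
        handle.write(render_proxy_entries(entries))
    # The command quoted in the text; the calling user must be allowed to run it.
    subprocess.check_call(["sudo", "service", "apache2", "reload"])
```

Keeping the rendering separate from the file write and reload makes the generated directives easy to inspect without touching a running web server.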
.. _vision:
Vision for the future of Zoe
============================