Commit ccfed885 authored by Daniele Venzano

Update documentation

parent 6f35dfce
...@@ -5,10 +5,7 @@ Zoe provides a simple way to provision data analytics applications using Docker ...@@ -5,10 +5,7 @@ Zoe provides a simple way to provision data analytics applications using Docker
This is the main repository: it contains the documentation and a number of scripts useful to install and develop Zoe. This is the main repository: it contains the documentation and a number of scripts useful to install and develop Zoe.
We are in the process of doing a major refactoring of the entire codebase and the HEAD version is not usable at this time, We are in the process of doing a major refactoring of the entire codebase and the HEAD version is not fully tested.
but feel free to have a look and make suggestions. See below for links to the repositories of
For now you can refer to the version tagged 0.8.92 in this repository, when all components were still together in one repository.
Resources: Resources:
...@@ -21,7 +18,7 @@ Zoe is a distributed application and each component is developed in a separate G ...@@ -21,7 +18,7 @@ Zoe is a distributed application and each component is developed in a separate G
- Zoe clients: https://github.com/DistributedSystemsGroup/zoe-client - Zoe clients: https://github.com/DistributedSystemsGroup/zoe-client
- Zoe scheduler: https://github.com/DistributedSystemsGroup/zoe-scheduler - Zoe scheduler: https://github.com/DistributedSystemsGroup/zoe-scheduler
Zoe can use any Docker image, but we provide some for the pre-configured applications available in the web interface: Zoe can use any Docker image, but we provide some for the pre-configured applications available in the client (Spark and HDFS):
- Docker images: https://github.com/DistributedSystemsGroup/zoe-docker-images - Docker images: https://github.com/DistributedSystemsGroup/zoe-docker-images
......
.. Zoe documentation master file, created by
sphinx-quickstart on Fri Sep 11 15:11:20 2015.
Zoe - Container-based Analytics as a Service Zoe - Container-based Analytics as a Service
============================================ ============================================
...@@ -18,17 +15,20 @@ Zoe can use a Docker Swarm located anywhere, on Amazon or in your own private cl ...@@ -18,17 +15,20 @@ Zoe can use a Docker Swarm located anywhere, on Amazon or in your own private cl
your Swarm could also be running other services: Zoe will not interfere with them. Zoe is meant as a private service, adding data-analytics your Swarm could also be running other services: Zoe will not interfere with them. Zoe is meant as a private service, adding data-analytics
capabilities to existing, or new, Docker clusters. capabilities to existing, or new, Docker clusters.
While the core components of Zoe are application-independent, the web interface currently supports the `Spark framework <http://spark.apache.org/>`_ with Scala or iPython notebooks. The core components of Zoe are application-independent and a user can submit application descriptions for any kind of service combination. Since Zoe targets
We are working on providing easy access to the following software suites: analytics services in particular, the client tools offer some pre-configured Zoe applications that can be used as starting examples.
To better understand what we mean by "analytic service", here are a few examples:
* Spark
* Zookeeper * Zookeeper
* Hadoop (HDFS in particular) * Hadoop (HDFS in particular)
* Cassandra * Cassandra
* Impala * Impala
* More to come, suggestions welcome! * ... suggestions welcome!
Have a look at the :ref:`vision` and at the `roadmap <https://github.com/DistributedSystemsGroup/zoe/blob/master/ROADMAP.rst>`_ to see what we are currently planning Have a look at the :ref:`vision` and at the `roadmap <https://github.com/DistributedSystemsGroup/zoe/blob/master/ROADMAP.rst>`_ to see what we are currently
and feel free to `contact us <venza@brownhat.org>`_ via email or through the GitHub issue tracker to pose questions or suggest ideas and new features. planning and feel free to `contact us <venza@brownhat.org>`_ via email or through the GitHub issue tracker to pose questions or suggest ideas and new features.
Contents: Contents:
......
Installing Zoe Installing Zoe
============== ==============
The Zoe install procedure is migrating from pip to Docker containers. The codebase is being split into several components; each will live in its own repository with a Dockerfile Zoe components:
to build the container image.
* scheduler
* observer
* web client
* command-line client
Zoe is written in Python and uses the ``resource.txt`` file to list library dependencies.
Requirements Requirements
------------ ------------
* An SQL database to keep all the state (sqlite is used by default)
* Docker Swarm * Docker Swarm
* A DNS server for service discovery, with DDNS support
Optional: Optional:
* A Docker registry containing Zoe images for Docker image caching * A Docker registry containing Zoe images for faster container startup times
How to install
--------------
First of all make sure you have installed the three requirements listed above.
Database
--------
1. Install MySQL/MariaDB, or any other DB supported by SQLAlchemy.
2. Create a database, a user and a password and use these to build a connection string like ``mysql://<user>:<password>@host/db``
Two different Zoe processes use the database and the config file provides separate options for each. If you feel the need, you can set up different databases too.
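As a minimal sketch of step 2 above, the connection string can be assembled in Python. The helper name ``build_db_url`` and the credentials are hypothetical, used only for illustration. The user and password are percent-encoded because characters such as ``@`` or ``/`` would otherwise be read as URL delimiters.

```python
from urllib.parse import quote_plus


def build_db_url(user, password, host, db, scheme="mysql"):
    """Return a connection string of the form scheme://user:password@host/db.

    The user and password are percent-encoded so that characters such as
    '@' or '/' cannot be mistaken for URL delimiters.
    """
    return f"{scheme}://{quote_plus(user)}:{quote_plus(password)}@{host}/{db}"


# Hypothetical credentials, for illustration only.
url = build_db_url("zoe", "p@ss/word", "localhost", "zoe")
print(url)
```

The resulting string is what would be pasted into the database options of the configuration file.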
Swarm/Docker Swarm/Docker
------------ ------------
Install Docker and the Swarm container: Install Docker and the Swarm container:
* https://docs.docker.com/installation/ubuntulinux/ * https://docs.docker.com/installation/ubuntulinux/
* https://docs.docker.com/swarm/install-manual/ * https://docs.docker.com/swarm/install-manual/
...@@ -40,12 +32,9 @@ For testing you can use a Swarm with a single Docker instance located on the sam ...@@ -40,12 +32,9 @@ For testing you can use a Swarm with a single Docker instance located on the sam
Network configuration Network configuration
^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^
Zoe assumes that containers placed on different hosts are able to talk to each other freely. Since we use Docker on bare metal, we Docker 1.9/Swarm 1.0 multi-host networking is used in Zoe:
use an undocumented network configuration, with the docker bridges connected to a physical interface, so that
containers on different hosts can talk to each other on the same layer 2 domain.
To do that you need also to reset the MAC address of the bridge, otherwise bridges on different hosts will have the same MAC address.
Other configurations are possible, but configuring Docker networking is outside the scope of this document. * https://docs.docker.com/engine/userguide/networking/get-started-overlay/
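A hedged sketch of creating a multi-host overlay network from Python by calling the ``docker`` command-line tool. It assumes that the Docker engines are already configured for overlay networking as described in the linked guide, and that the environment points the ``docker`` client at the Swarm manager. The network name ``zoe-net`` and both function names are hypothetical. The source does not say which network name Zoe expects.

```python
import subprocess


def overlay_network_command(name):
    """Build the argv for creating a Docker overlay network called `name`."""
    return ["docker", "network", "create", "--driver", "overlay", name]


def create_overlay_network(name):
    """Run the command against the Docker endpoint configured in the environment."""
    subprocess.run(overlay_network_command(name), check=True)


# "zoe-net" is a hypothetical name used only for illustration.
cmd = overlay_network_command("zoe-net")
print(" ".join(cmd))
```

Calling ``create_overlay_network("zoe-net")`` would actually create the network. The sketch only prints the command so that it can be inspected first.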
Images: Docker Hub Vs local Docker registry Images: Docker Hub Vs local Docker registry
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...@@ -58,38 +47,17 @@ Since the Docker Hub can be quite slow, we strongly suggest setting up a private ...@@ -58,38 +47,17 @@ Since the Docker Hub can be quite slow, we strongly suggest setting up a private
`zoe-docker-images <https://github.com/DistributedSystemsGroup/zoe-docker-images>`_ repository can help you populate the registry `zoe-docker-images <https://github.com/DistributedSystemsGroup/zoe-docker-images>`_ repository can help you populate the registry
bypassing the Hub. bypassing the Hub.
The images are quite standard and can also be used without Zoe. Examples of how to do that are available in the ``scripts/start_cluster.sh`` script. The images are quite standard and can also be used without Zoe. Examples of how to do that are available in the ``scripts/`` directory.
DNS Server
----------
Setting up a DNS server is not a simple task, but it is a necessary evil. DNS is standard: any service discovery implemented via DNS will work out of the box.
We currently use Bind internally as the DNS server for all naming needs. Bind is well documented, old, stable and proven: set it up right once and you are done.
If the need arises, adding support for other Dynamic DNS update protocols is easy; contact us if you need help.
Zoe Zoe
--- ---
Releases are also available through pip: ``pip install zoe-analytics`` Currently this is the recommended procedure:
For developers, we recommend the following procedure: 1. Clone the zoe-scheduler repository
2. Create new configuration files for the scheduler and the observer (:ref:`config_file`)
1. Clone this repository 3. Set up supervisor to manage Zoe processes: in the ``scripts/supervisor/`` directory you can find the configuration file for
2. Generate a sample configuration file with ``zoe.py write-config zoe.conf``
3. Edit ``zoe.conf`` using :ref:`config_file`
4. Create the tables in the database with ``zoe.py --setup-db`` and ``zoe-scheduler.py --setup-db``
5. Set up supervisor to manage Zoe processes: in the ``scripts/supervisor/`` directory you can find the configuration file for
supervisor. You need to modify the paths to point to where you cloned Zoe and the user (Zoe does not need special privileges). supervisor. You need to modify the paths to point to where you cloned Zoe and the user (Zoe does not need special privileges).
6. Start running applications! By default Zoe web listens on the 5000 port 4. Clone the zoe-client repository
5. Start running applications using the command-line client! (the web interface will be coming soon)
Zoe Object Storage
^^^^^^^^^^^^^^^^^^
Application binaries and execution logs are saved in a simple Object Storage server.
* Clone it from git: https://github.com/DistributedSystemsGroup/zoe-object-storage
* Use the Dockerfile to build a Docker image
* Run it
* Put the IP address of the container in Zoe's main configuration file (once the transition to Dockerfiles is finished, it will be possible to use linking instead)
\ No newline at end of file