Commit 7baf4833 authored by Daniele Venzano

Update README and requirements, create a new script to create the table in an empty db

parent fbcf666b
# Zoe - Container Analytics as a Service
This application uses Docker Swarm to run Analytics as a Service applications. Currently only Spark is supported, but support for other frameworks is planned.
It is composed of:
* zoe: command-line client
* zoe-scheduler: the main daemon that performs application scheduling and talks to Swarm
* zoe-web: the web service
* A Docker registry containing Spark images
* Apache to act as a reverse proxy
## How to install
1. Clone this repository
2. Generate a sample configuration file with `zoe.py write-config zoe.conf`
3. Edit `zoe.conf` and check/modify the following sections (the other sections are covered below):
   * flask (generate a new secret key, e.g. by running `import os; os.urandom(24)` in a Python interpreter)
* filesystem
* smtp
4. Setup supervisor to manage Zoe processes: in the `scripts/supervisor/` directory you can find the configuration file for
supervisor. You need to modify the paths to point to where you cloned Zoe.
5. Start running applications!
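The secret key mentioned in step 3 can be produced with a couple of lines of Python:

```python
import os

# 24 random bytes to use as the flask secret key in zoe.conf
key = os.urandom(24)
print(key)  # paste this value into the flask section
```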
Zoe configuration is read from an 'ini' file; the following locations are searched for a file named `zoe.conf`:

* working path (.)
* /etc/zoe

A sample configuration file, containing default values for all options, can be generated by running `zoe.py write-config zoe.conf`.
### DB
1. Install MySQL/MariaDB, or any other DB supported by SQLAlchemy.
2. Create a database, a user and a password and use these to build a connection string like `mysql://<user>:<password>@host/db`
3. Put this string in section `[db]` of zoe.conf
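As an illustration, with hypothetical credentials the resulting section could look like this (the option name is an assumption; check the generated sample configuration file for the real key):

```
[db]
; hypothetical option name and credentials -- check the generated sample file
url = mysql://zoe:secret@localhost/zoe
```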
### Swarm/Docker
Install Docker and the Swarm container:

* https://docs.docker.com/installation/ubuntulinux/
* https://docs.docker.com/swarm/install-manual/

For testing you can use a Swarm with a single Docker instance located on the same host/VM.
#### Network configuration
Zoe assumes that containers placed on different hosts are able to talk to each other freely. Since we use Docker on bare metal, we
use an undocumented network configuration, with the Docker bridges connected to a physical interface, so that
containers on different hosts can talk to each other on the same layer 2 domain.
You also need to reset the MAC address of each bridge, otherwise bridges on different hosts will have the same MAC address.
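As a sketch of the MAC reset (the interface names are hypothetical, and the `ip` commands need root, so they are left commented out):

```shell
# Build a random locally administered MAC address (02:xx:xx:xx:xx:xx)
NEW_MAC="02$(od -An -N5 -tx1 /dev/urandom | tr -d ' \n' | sed 's/../:&/g')"
echo "$NEW_MAC"
# Then, as root, assign it to the bridge and attach the physical interface
# (docker0 and eth1 are hypothetical names):
#   ip link set dev docker0 address "$NEW_MAC"
#   ip link set dev eth1 master docker0
```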
Other configurations are possible, but configuring Docker networking is outside the scope of this document.
#### Images: Docker Hub vs local Docker registry
The images used by Zoe are available on the Docker Hub:
The images are quite standard and can also be used without Zoe; for examples
on how to do that, see the `scripts/start_cluster.sh` script.

Set the registry `address:port` in the `[docker]` section of `zoe.conf`. If you use the Docker Hub, set the option to an empty string.
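For example (the option name and the registry address are assumptions for this sketch; the generated sample configuration file shows the real key):

```
[docker]
; hypothetical option name and registry address
private_registry = 10.0.0.2:5000
```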
### Apache configuration
Install the Apache web server.
A sample virtual host file containing the directives required by Zoe is available in `scripts/apache-sample.conf`.
This configuration will also proxy zoe-web, which starts on port 5000 by default.
Please note that putting the generated configuration file in `/tmp` can be a serious security problem, depending on your setup.
Zoe dynamically generates proxy entries to let users access the various web interfaces contained in the Spark containers.
To do this, it needs to be able to reload Apache and to write to a configuration file included by the VirtualHost directive.
Zoe executes `sudo service apache2 reload` whenever needed, so make sure the user that runs Zoe is able to run that command
successfully.
Change the options `web_server_name`, `access_log` and `proxy_config_file` in the `[apache]` section of `zoe.conf` as required.
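To allow the reload without a password prompt, a sudoers entry along these lines could work (the `zoe` user name and the `service` path are assumptions for this sketch):

```
zoe ALL=(root) NOPASSWD: /usr/sbin/service apache2 reload
```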
The new `scripts/apache-sample.conf` file added by this commit:

```
<VirtualHost *:80>
    ServerAdmin webmaster@localhost
    DocumentRoot /var/www/html

    ErrorLog ${APACHE_LOG_DIR}/error.log
    CustomLog ${APACHE_LOG_DIR}/access.log combined

    ProxyHTMLLinks a href
    ProxyHTMLLinks area href
    ProxyHTMLLinks link href
    ProxyHTMLLinks img src longdesc usemap
    ProxyHTMLLinks object classid codebase data usemap
    ProxyHTMLLinks q cite
    ProxyHTMLLinks blockquote cite
    ProxyHTMLLinks ins cite
    ProxyHTMLLinks del cite
    ProxyHTMLLinks form action
    ProxyHTMLLinks input src usemap
    ProxyHTMLLinks head profile
    ProxyHTMLLinks base href
    ProxyHTMLLinks script src for

    ProxyHTMLEvents onclick ondblclick onmousedown onmouseup \
                    onmouseover onmousemove onmouseout onkeypress \
                    onkeydown onkeyup onfocus onblur onload \
                    onunload onsubmit onreset onselect onchange

    ProxyRequests Off

    <Location />
        ProxyHtmlEnable On
        ProxyHTMLExtended On
        ProxyPass http://127.0.0.1:5000/ retry=0
        ProxyPassReverse http://127.0.0.1:5000/
    </Location>

    IncludeOptional /tmp/zoe-proxy.conf
</VirtualHost>
```
The supervisor configuration for the rpyc registry, from the `scripts/supervisor/` directory:

```
[program:rpyc-registry]
command=/usr/local/bin/rpyc_registry.py
directory=/tmp
autostart=true
autorestart=true
startretries=3
stderr_logfile=/var/log/rpyc/registry.err.log
stdout_logfile=/var/log/rpyc/registry.out.log
user=ubuntu
```
The new script that creates the tables in an empty database:

```
#!/usr/bin/env python3

from argparse import ArgumentParser, Namespace
import logging

from common.state import create_tables

argparser = None


def setup_db_cmd(_):
    create_tables()


def process_arguments() -> Namespace:
    global argparser
    argparser = ArgumentParser(description="Zoe - Container Analytics as a Service ops client")
    argparser.add_argument('-d', '--debug', action='store_true', default=False, help='Enable debug output')
    subparser = argparser.add_subparsers(title='subcommands', description='valid subcommands')

    argparser_setup_db = subparser.add_parser('setup-db', help="Create the tables in the database")
    argparser_setup_db.set_defaults(func=setup_db_cmd)

    return argparser.parse_args()


def main():
    args = process_arguments()
    if args.debug:
        logging.basicConfig(level=logging.DEBUG)
    else:
        logging.basicConfig(level=logging.INFO)

    try:
        args.func(args)
    except AttributeError:
        argparser.print_help()
        return


if __name__ == "__main__":
    main()
```
The `setup-db` subcommand moves from the command-line client to the new script; the client also gets two fixes in `app_rm_cmd` (executions are looked up by id, and the `force` flag is passed through):

```
@@ -6,7 +6,6 @@ from zipfile import is_zipfile
 from pprint import pprint
 
 from zoe_client import ZoeClient
-from common.state import create_tables
 from common.configuration import zoeconf
 
 argparser = None
@@ -22,10 +21,6 @@ def stats_cmd(args):
     pprint(stats)
 
 
-def setup_db_cmd(_):
-    create_tables()
-
-
 def user_new_cmd(args):
     client = get_zoe_client(args)
     user = client.user_new(args.email)
@@ -83,12 +78,12 @@ def app_rm_cmd(args):
     if args.force:
         a = client.application_get(application.id)
         for eid in a.executions:
-            e = client.execution_get(eid)
+            e = client.execution_get(eid.id)
             if e.status == "running":
                 print("Terminating execution {}".format(e.name))
                 client.execution_terminate(e.id)
-    client.application_remove(application.id)
+    client.application_remove(application.id, args.force)
 
 
 def app_inspect_cmd(args):
@@ -155,9 +150,6 @@ def process_arguments() -> Namespace:
     argparser_user_get.add_argument('email', help="User email address")
     argparser_user_get.set_defaults(func=user_get_cmd)
 
-    argparser_setup_db = subparser.add_parser('setup-db', help="Create the tables in the database")
-    argparser_setup_db.set_defaults(func=setup_db_cmd)
-
     argparser_spark_cluster_create = subparser.add_parser('app-spark-cluster-new', help="Setup a new empty Spark cluster")
     argparser_spark_cluster_create.add_argument('--user-id', type=int, required=True, help='Application owner')
     argparser_spark_cluster_create.add_argument('--name', required=True, help='Application name')
```