Commit 92783187 authored by Daniele Venzano

Move out all source in other repositories, update README

parent d1fbc62d
@@ -58,5 +58,4 @@ docs/_build/
# PyBuilder
target/
.idea/
zoe.conf
rndc.key
language: python
python:
- "3.4"
install:
- pip install -r requirements.txt
- pip install pytest
before_script:
- bash tests/resources/create_db.sh
script:
- PYTHONPATH=. py.test --test-environment travis --cov=zoe_scheduler --cov=zoe_client --cov=zoe_web
- PYTHONPATH=. sphinx-build -nW -b html -d docs/_build/doctrees docs/ docs/_build/html
Zoe - Container-based Analytics as a Service
============================================
Zoe provides a simple way to provision data analytics clusters and
workflows using container-based (Docker) virtualization. The guiding
principles are:
- ease of use: data scientists know about data and applications,
systems and resource constraints should be kept out of the way
- ease of administration: we have a strong background in systems and
network administration, so we put all effort possible to make Zoe
easy to install and maintain
- use well-known technologies: we try hard not to reinvent the wheel,
we use Python, ZeroMQ, Docker and DNS
- a clear roadmap: our short and long-term objectives should always be
clear and well defined
- openness: the source code is open: clone, modify, discuss, test and
contribute, you are welcome!
Zoe provides a simple way to provision data analytics applications using Docker Swarm.
This is the main repository; it contains the documentation and a number of scripts useful for installing and developing Zoe.
We are in the process of updating the documentation and scripts. For now, you can refer to the version tagged 0.8.92 in this repository,
when all components were still together.
Resources:
- Documentation: http://zoe-analytics.readthedocs.org/
- Docker images:
https://github.com/DistributedSystemsGroup/zoe-docker-images
- Main website: http://zoe-analytics.eu
- Documentation: http://docs.zoe-analytics.eu
- How to install: http://zoe-analytics.readthedocs.org/en/latest/install.html
Zoe is a distributed application and each component is developed in a separate Git repository.
- Zoe clients: https://github.com/DistributedSystemsGroup/zoe-client
- Zoe scheduler: https://github.com/DistributedSystemsGroup/zoe-scheduler
- Zoe object storage: https://github.com/DistributedSystemsGroup/zoe-object-storage
Zoe can use any Docker image, but we provide some for the preconfigured applications available in the web interface:
- Docker images: https://github.com/DistributedSystemsGroup/zoe-docker-images
|Pypi version| |Python version| |Documentation Status| |Requirements Status|
|Documentation Status|
Zoe is licensed under the terms of the Apache 2.0 license.
.. |Pypi version| image:: https://img.shields.io/pypi/v/zoe-analytics.svg
:target: https://pypi.python.org/pypi/zoe-analytics
.. |Python version| image:: https://img.shields.io/pypi/pyversions/Zoe.svg
:target: https://pypi.python.org/pypi/zoe-analytics
.. |Documentation Status| image:: https://readthedocs.org/projects/zoe-analytics/badge/?version=latest
:target: https://readthedocs.org/projects/zoe-analytics/?badge=latest
.. |Requirements Status| image:: https://requires.io/github/DistributedSystemsGroup/zoe/requirements.svg?branch=master
:target: https://requires.io/github/DistributedSystemsGroup/zoe/requirements/?branch=master
import logging

from common.exceptions import InvalidApplicationDescription

log = logging.getLogger(__name__)


class ZoeApplication:
    def __init__(self):
        self.name = ''
        self.version = 0
        self.will_end = True
        self.priority = 512
        self.requires_binary = False
        self.processes = []

    @classmethod
    def from_dict(cls, data):
        ret = cls()
        try:
            ret.version = int(data["version"])
        except ValueError:
            raise InvalidApplicationDescription(msg="version field should be an int")
        except KeyError:
            raise InvalidApplicationDescription(msg="Missing required key: version")

        required_keys = ['name', 'will_end', 'priority', 'requires_binary']
        for k in required_keys:
            try:
                setattr(ret, k, data[k])
            except KeyError:
                raise InvalidApplicationDescription(msg="Missing required key: %s" % k)

        try:
            ret.will_end = bool(ret.will_end)
        except ValueError:
            raise InvalidApplicationDescription(msg="will_end field must be a boolean")

        try:
            ret.requires_binary = bool(ret.requires_binary)
        except ValueError:
            raise InvalidApplicationDescription(msg="requires_binary field must be a boolean")

        try:
            ret.priority = int(ret.priority)
        except ValueError:
            raise InvalidApplicationDescription(msg="priority field must be an int")
        if ret.priority < 0 or ret.priority > 1024:
            raise InvalidApplicationDescription(msg="priority must be between 0 and 1024")

        for p in data['processes']:
            ret.processes.append(ZoeApplicationProcess.from_dict(p))

        found_monitor = False
        for p in ret.processes:
            if p.monitor:
                found_monitor = True
                break
        if not found_monitor:
            raise InvalidApplicationDescription(msg="at least one process should have monitor set to True")

        return ret

    def to_dict(self) -> dict:
        ret = {
            'name': self.name,
            'version': self.version,
            'will_end': self.will_end,
            'priority': self.priority,
            'requires_binary': self.requires_binary,
            'processes': []
        }
        for p in self.processes:
            ret['processes'].append(p.to_dict())
        return ret

    def total_memory(self) -> int:
        memory = 0
        for p in self.processes:
            memory += p.required_resources['memory']
        return memory

    def container_count(self) -> int:
        return len(self.processes)


class ZoeProcessEndpoint:
    def __init__(self):
        self.name = ''
        self.protocol = ''
        self.port_number = 0
        self.path = ''
        self.is_main_endpoint = False

    def to_dict(self) -> dict:
        return {
            'name': self.name,
            'protocol': self.protocol,
            'port_number': self.port_number,
            'path': self.path,
            'is_main_endpoint': self.is_main_endpoint
        }

    @classmethod
    def from_dict(cls, data):
        ret = cls()
        required_keys = ['name', 'protocol', 'port_number', 'is_main_endpoint']
        for k in required_keys:
            try:
                setattr(ret, k, data[k])
            except KeyError:
                raise InvalidApplicationDescription(msg="Missing required key: %s" % k)

        try:
            ret.port_number = int(ret.port_number)
        except ValueError:
            raise InvalidApplicationDescription(msg="port_number field should be an integer")

        try:
            ret.is_main_endpoint = bool(ret.is_main_endpoint)
        except ValueError:
            raise InvalidApplicationDescription(msg="is_main_endpoint field should be a boolean")

        if 'path' in data:
            ret.path = data['path']

        return ret

    def get_url(self, address):
        return self.protocol + "://" + address + ":{}".format(self.port_number) + self.path


class ZoeApplicationProcess:
    def __init__(self):
        self.name = ''
        self.version = 0
        self.docker_image = ''
        self.monitor = False  # if this process dies, the whole application is considered as complete and the execution is terminated
        self.ports = []  # A list of ZoeProcessEndpoint
        self.required_resources = {}
        self.environment = []  # Environment variables to pass to Docker
        self.command = None  # Commandline to pass to the Docker container

    def to_dict(self) -> dict:
        ret = {
            'name': self.name,
            'version': self.version,
            'docker_image': self.docker_image,
            'monitor': self.monitor,
            'ports': [p.to_dict() for p in self.ports],
            'required_resources': self.required_resources.copy(),
            'environment': self.environment.copy(),
            'command': self.command
        }
        return ret

    @classmethod
    def from_dict(cls, data):
        ret = cls()
        try:
            ret.version = int(data["version"])
        except ValueError:
            raise InvalidApplicationDescription(msg="version field should be an int")
        except KeyError:
            raise InvalidApplicationDescription(msg="Missing required key: version")

        required_keys = ['name', 'docker_image', 'monitor']
        for k in required_keys:
            try:
                setattr(ret, k, data[k])
            except KeyError:
                raise InvalidApplicationDescription(msg="Missing required key: %s" % k)

        try:
            ret.monitor = bool(ret.monitor)
        except ValueError:
            raise InvalidApplicationDescription(msg="monitor field should be a boolean")

        if 'ports' not in data:
            raise InvalidApplicationDescription(msg="Missing required key: ports")
        if not hasattr(data['ports'], '__iter__'):
            raise InvalidApplicationDescription(msg='ports should be an iterable')
        for pp in data['ports']:
            ret.ports.append(ZoeProcessEndpoint.from_dict(pp))

        if 'required_resources' not in data:
            raise InvalidApplicationDescription(msg="Missing required key: required_resources")
        if not isinstance(data['required_resources'], dict):
            raise InvalidApplicationDescription(msg="required_resources should be a dictionary")
        if 'memory' not in data['required_resources']:
            raise InvalidApplicationDescription(msg="Missing required key: required_resources -> memory")
        ret.required_resources = data['required_resources'].copy()
        try:
            ret.required_resources['memory'] = int(ret.required_resources['memory'])
        except ValueError:
            raise InvalidApplicationDescription(msg="required_resources -> memory field should be an int")

        if 'environment' in data:
            if not hasattr(data['environment'], '__iter__'):
                raise InvalidApplicationDescription(msg='environment should be an iterable')
            ret.environment = data['environment'].copy()
            for e in ret.environment:
                if len(e) != 2:
                    raise InvalidApplicationDescription(msg='environment variable should have a name and a value')
                if not isinstance(e[0], str):
                    raise InvalidApplicationDescription(msg='environment variable names must be strings: {}'.format(e[0]))
                if not isinstance(e[1], str):
                    raise InvalidApplicationDescription(msg='environment variable values must be strings: {}'.format(e[1]))

        if 'command' in data:
            ret.command = data['command']

        return ret

    def exposed_endpoint(self) -> ZoeProcessEndpoint:
        for p in self.ports:
            assert isinstance(p, ZoeProcessEndpoint)
            if p.is_main_endpoint:
                return p
        return None
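For orientation, the description that `ZoeApplication.from_dict` parses can be sketched as a plain dictionary. The keys below are taken from the `from_dict` methods above; every value (application name, image name, memory figure, environment variable) is invented for illustration, and the memory unit is assumed to be bytes since the source does not state it. The `strict_bool` helper is hypothetical and not part of Zoe: it only illustrates that `bool()` does not raise `ValueError`, so a string such as `"false"` would pass the boolean checks above as `True`.

```python
# Hypothetical application description: keys follow the from_dict() methods,
# all values are made up for this example.
app_description = {
    "name": "spark-notebook",
    "version": 1,
    "will_end": False,
    "priority": 512,  # must be between 0 and 1024
    "requires_binary": False,
    "processes": [
        {
            "name": "notebook",
            "version": 1,
            "docker_image": "example/notebook:latest",
            "monitor": True,  # at least one process must set this
            "ports": [
                {
                    "name": "web",
                    "protocol": "http",
                    "port_number": 8888,
                    "is_main_endpoint": True,
                    "path": "/",  # optional key
                }
            ],
            "required_resources": {"memory": 4 * 1024 ** 3},  # assumed bytes
            "environment": [["NOTEBOOK_MODE", "lab"]],  # [name, value] pairs
            "command": None,
        }
    ],
}


def strict_bool(value, field):
    """Hypothetical helper: accept real booleans or 'true'/'false' strings only."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str) and value.lower() in ("true", "false"):
        return value.lower() == "true"
    raise ValueError("{} field must be a boolean".format(field))


# bool() treats any non-empty string as truthy, so it cannot validate input.
assert bool("false") is True
assert strict_bool("false", "will_end") is False
assert strict_bool(app_description["will_end"], "will_end") is False
```

A stricter check along these lines would make the `except ValueError` branches around the `bool()` calls reachable; whether that is desirable for Zoe is a design decision left to the maintainers.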
from configparser import ConfigParser
config_paths = [
'zoe.conf',
'/etc/zoe/zoe.conf',
]
defaults = {
'common': {
'object_storage_url': 'http://localhost:4390'
},
'zoe_client': {
'db_connect': 'mysql+mysqlconnector://zoe:pass@dbhost/zoe',
'scheduler_ipc_address': 'localhost',
'scheduler_ipc_port': 8723,
},
'zoe_web': {
'smtp_server': 'smtp.example.com',
'smtp_user': 'zoe@example.com',
'smtp_password': 'changeme',
'cookie_secret': b"\xc3\xb0\xa7\xff\x8fH'\xf7m\x1c\xa2\x92F\x1d\xdcz\x05\xe6CJN5\x83!",
'web_server_name': 'localhost'
},
'zoe_scheduler': {
'swarm_manager_url': 'tcp://swarm.example.com:2380',
'docker_private_registry': '10.1.0.1:5000',
'status_refresh_interval': 10,
'check_terminated_interval': 30,
'db_connect': 'mysql+mysqlconnector://zoe:pass@dbhost/zoe',
'ipc_listen_address': '127.0.0.1',
'ipc_listen_port': 8723,
'ddns_keyfile': '/path/to/rndc.key',
'ddns_server': '127.0.0.1',
'ddns_domain': 'swarm.example.com'
}
}
_zoeconf = None


class ZoeConfig(ConfigParser):
    def __init__(self):
        super().__init__(interpolation=None)
        self.read_dict(defaults)

    @classmethod
    def write_defaults(cls, fp):
        # classmethod (not staticmethod) so that cls is bound automatically
        tmp = cls()
        tmp.write(fp)

    @property
    def db_url(self) -> str:
        return self.get('zoe_client', 'db_connect')

    @property
    def ipc_server(self) -> str:
        return self.get('zoe_client', 'scheduler_ipc_address')

    @property
    def ipc_port(self) -> int:
        return self.getint('zoe_client', 'scheduler_ipc_port')

    @property
    def object_storage_url(self) -> str:
        return self.get('common', 'object_storage_url')

    @property
    def web_server_name(self) -> str:
        return self.get('zoe_web', 'web_server_name')

    @property
    def smtp_server(self) -> str:
        return self.get('zoe_web', 'smtp_server')

    @property
    def smtp_user(self) -> str:
        return self.get('zoe_web', 'smtp_user')

    @property
    def smtp_password(self) -> str:
        return self.get('zoe_web', 'smtp_password')

    @property
    def cookies_secret_key(self):
        return self.get('zoe_web', 'cookie_secret')

    @property
    def check_terminated_interval(self) -> int:
        return self.getint('zoe_scheduler', 'check_terminated_interval')

    # NOTE: this redefines the db_url property declared earlier in the class;
    # the later definition (zoe_scheduler section) is the one that takes effect.
    @property
    def db_url(self) -> str:
        return self.get('zoe_scheduler', 'db_connect')

    @property
    def status_refresh_interval(self) -> int:
        return self.getint('zoe_scheduler', 'status_refresh_interval')

    @property
    def docker_swarm_manager(self) -> str:
        return self.get('zoe_scheduler', 'swarm_manager_url')

    @property
    def docker_private_registry(self) -> str:
        return self.get('zoe_scheduler', 'docker_private_registry')

    @property
    def ipc_listen_port(self) -> int:
        return self.getint('zoe_scheduler', 'ipc_listen_port')

    @property
    def ipc_listen_address(self) -> str:
        return self.get('zoe_scheduler', 'ipc_listen_address')

    @property
    def ddns_keyfile(self) -> str:
        return self.get('zoe_scheduler', 'ddns_keyfile')

    @property
    def ddns_server(self) -> str:
        return self.get('zoe_scheduler', 'ddns_server')

    @property
    def ddns_domain(self) -> str:
        return self.get('zoe_scheduler', 'ddns_domain')


def conf_init(config_file=None) -> ZoeConfig:
    global _zoeconf
    _zoeconf = ZoeConfig()
    if config_file is None:
        _zoeconf.read(config_paths)
    else:
        # Close the file handle once the configuration has been parsed
        with open(config_file) as fp:
            _zoeconf.read_file(fp)
    return _zoeconf


def zoe_conf() -> ZoeConfig:
    return _zoeconf
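The layering used by `ZoeConfig` (built-in defaults first, configuration file on top) can be sketched with the standard library alone. This is a minimal illustration, not the module itself: the `defaults` dictionary is trimmed to two keys and the `override` string is a hypothetical stand-in for the contents of a `zoe.conf` file.

```python
from configparser import ConfigParser

# Trimmed-down defaults; the module above defines many more keys.
defaults = {
    "zoe_scheduler": {
        "ipc_listen_address": "127.0.0.1",
        "ipc_listen_port": 8723,
    }
}

# Hypothetical override, standing in for the contents of a zoe.conf file.
override = """
[zoe_scheduler]
ipc_listen_port = 9000
"""

conf = ConfigParser(interpolation=None)
conf.read_dict(defaults)    # load the built-in defaults first
conf.read_string(override)  # sources read later override earlier ones

listen_port = conf.getint("zoe_scheduler", "ipc_listen_port")
listen_address = conf.get("zoe_scheduler", "ipc_listen_address")
print(listen_address, listen_port)
```

Because `read_dict` stores every value as a string, typed accessors such as `getint` are needed to get numbers back. By the same rule, a `bytes` default like `cookie_secret` would presumably come back from `get` as the text of its repr rather than as bytes, which may be worth checking in the code that consumes it.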
class ZoeException(Exception):
    def __init__(self):
        self.value = 'Something happened'

    def __str__(self):
        return repr(self.value)


class CannotCreateCluster(ZoeException):
    def __init__(self, application):
        self.value = "Cannot create a cluster for application {}".format(application.id)


class InvalidApplicationDescription(ZoeException):
    def __init__(self, msg):
        self.value = msg


class DDNSUpdateFailed(ZoeException):
    def __init__(self, msg):
        self.value = msg
__version__ = '0.8.91'
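The `setup.py` removed by this commit reads this string with `exec` instead of importing the package. A minimal sketch of that single-sourcing pattern follows; the temporary directory and file written here are illustrative only and stand in for `common/version.py`.

```python
import os
import tempfile

# Write a throwaway version module; path and contents are illustrative.
with tempfile.TemporaryDirectory() as tmpdir:
    version_path = os.path.join(tmpdir, "version.py")
    with open(version_path, "w") as fp:
        fp.write("__version__ = '0.8.91'\n")

    # Execute the file in an empty namespace instead of importing the
    # package, so none of the package's dependencies have to be importable.
    namespace = {}
    with open(version_path) as fp:
        exec(fp.read(), namespace)

version = namespace["__version__"]
print(version)
```

The same string can then be passed to `setup(version=...)` and imported by the documentation build, so it only has to be changed in one place.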
from io import BytesIO
import logging
import zipfile

import requests
import requests.exceptions

from common.configuration import zoe_conf

log = logging.getLogger(__name__)


def generate_storage_url(obj_id: int, kind: str) -> str:
    return zoe_conf().object_storage_url + '/{}/{}'.format(kind, obj_id)


def put(obj_id, kind, data: bytes):
    url = zoe_conf().object_storage_url + '/{}/{}'.format(kind, obj_id)
    files = {'file': data}
    try:
        requests.post(url, files=files)
    except requests.exceptions.ConnectionError:
        log.error("Cannot connect to {} to POST the binary file".format(url))


def get(obj_id, kind) -> bytes:
    url = zoe_conf().object_storage_url + '/{}/{}'.format(kind, obj_id)
    try:
        r = requests.get(url)
    except requests.exceptions.ConnectionError:
        log.error("Cannot connect to {} to GET the binary file".format(url))
        return None
    else:
        return r.content


def check(obj_id, kind) -> bool:
    url = zoe_conf().object_storage_url + '/{}/{}'.format(kind, obj_id)
    try:
        r = requests.head(url)
    except requests.exceptions.ConnectionError:
        return False
    else:
        return r.status_code == 200


def delete(obj_id, kind):
    url = zoe_conf().object_storage_url + '/{}/{}'.format(kind, obj_id)
    try:
        requests.delete(url)
    except requests.exceptions.ConnectionError:
        log.error("Cannot connect to {} to DELETE the binary file".format(url))


def logs_archive_create(execution_id: int, logs: list):
    zipdata = BytesIO()
    with zipfile.ZipFile(zipdata, "w", compression=zipfile.ZIP_DEFLATED) as logzip:
        for c in logs:
            fname = c[0] + "-" + c[1] + ".txt"
            logzip.writestr(fname, c[2])
    put(execution_id, "logs", zipdata.getvalue())
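The in-memory archive built by `logs_archive_create` can be tried without an object storage server. The sketch below assumes each log entry is a tuple of two strings and the log text, which is what the indexing `c[0]`, `c[1]`, `c[2]` suggests; the entry contents are invented, and the upload step is replaced by reading the archive back.

```python
from io import BytesIO
import zipfile

# Hypothetical log entries: (process name, container id, log text).
logs = [
    ("spark-master", "0", "master started\n"),
    ("spark-worker", "1", "worker registered\n"),
]

zipdata = BytesIO()
with zipfile.ZipFile(zipdata, "w", compression=zipfile.ZIP_DEFLATED) as logzip:
    for name, ident, text in logs:
        logzip.writestr("{}-{}.txt".format(name, ident), text)

# Read the bytes only after the ZipFile is closed, so that the archive's
# central directory has been written.
payload = zipdata.getvalue()

with zipfile.ZipFile(BytesIO(payload)) as archive:
    names = archive.namelist()
    first = archive.read(names[0]).decode("utf-8")

print(names, repr(first))
```

Taking `getvalue()` after the `with` block matters: the archive is only complete once the `ZipFile` object has been closed.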
@@ -15,13 +15,11 @@
import sys
import os
import shlex
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
sys.path.insert(0, os.path.abspath(os.path.join(os.path.basename(__file__), "..", "..")))
from common.version import __version__
# -- General configuration ------------------------------------------------
@@ -60,9 +58,9 @@ author = 'Daniele Venzano'
# built documents.
#
# The short X.Y version.
version = __version__
version = '0.8.92'
# The full version, including alpha/beta/rc tags.
release = __version__
release = version
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#!/usr/bin/env bash
set -e
rm -Rf dist/ build/ zoe-analytics.egg-info
python3 setup.py sdist
python3 setup.py bdist_wheel
twine upload -r pypi dist/*
from setuptools import setup, find_packages
version = {}
with open("common/version.py") as fp:
    exec(fp.read(), version)
version = version['__version__']

# Read the long description and close the file handle afterwards
with open('README.rst') as fp:
    long_description = fp.read()
setup(
name='zoe-analytics',
# Versions should comply with PEP440. For a discussion on single-sourcing
# the version across setup.py and the project code, see
# https://packaging.python.org/en/latest/single_source_version.html
version=version,
description='Zoe - Analytics on demand',
long_description=long_description,
# The project's main homepage.
url='https://github.com/DistributedSystemsGroup/zoe',
# Author details
author='Daniele Venzano',
author_email='venza@brownhat.org',
# Choose your license
license='Apache 2.0',
# See https://pypi.python.org/pypi?%3Aaction=list_classifiers
classifiers=[
# How mature is this project? Common values are
# 3 - Alpha
# 4 - Beta
# 5 - Production/Stable
'Development Status :: 3 - Alpha',
'Environment :: Web Environment',
'Framework :: IPython',
# Indicate who your project is intended for
'Intended Audience :: Developers',
'Intended Audience :: Science/Research',
'Topic :: Education',
'Operating System :: POSIX :: Linux',
'Topic :: Software Development',
'Topic :: System :: Distributed Computing',
# Pick your license as you wish (should match "license" above)
'License :: OSI Approved :: Apache Software License',
# Specify the Python versions you support here. In particular, ensure
# that you indicate whether you support Python 2, Python 3 or both.
'Programming Language :: Python :: 3 :: Only',
'Programming Language :: Python :: 3.4',
],
# What does your project relate to?
keywords='spark analytics docker swarm containers notebook',
# You can just specify the packages manually here if your project is
# simple. Or you can use find_packages().
packages=find_packages(exclude=['scripts', 'tests']),
# List run-time dependencies here. These will be installed by pip when
# your project is installed. For an analysis of "install_requires" vs pip's
# requirements files see:
# https://packaging.python.org/en/latest/requirements.html
install_requires=['docker-py>=1.5.0',
'Flask>=0.10.1',
'python-dateutil>=2.4.2',
'SQLAlchemy>=1.0.8',
'tornado>=4.2.1',
'pyzmq>=14.0.1',
'requests',
'dnspython3'
],
# List additional groups of dependencies here (e.g. development
# dependencies). You can install these using the following syntax,
# for example:
# $ pip install -e .[dev,test]
extras_require={
'dev': ['Sphinx', 'wheel', 'twine'],
'test': ['pytest-cov', 'pytest'],
},
# If there are data files included in your packages that need to be
# installed, specify them here. If using Python 2.6 or less, then these
# have to be included in MANIFEST.in as well.
package_data={
'': ['*.sh', '*.conf', '*.rst', '*.css', '*.js', '*.html'],
},
# Although 'package_data' is the preferred approach, in some case you may
# need to place data files outside of your packages. See:
# http://docs.python.org/3.4/distutils/setupscript.html#installing-additional-files # noqa
# In this case, 'data_file' will be installed into '<sys.prefix>/my_data'
# data_files=[('my_data', ['data/data_file'])],
# To provide executable scripts, use entry points in preference to the
# "scripts" keyword. Entry points provide cross-platform support and allow