Commit 2b1a4f2d authored by Daniele Venzano

Add a ROADMAP file

Closes #6
parent 195b51a9
Planned features for Zoe
Extract Spark from the code
Zoe should be independent of Spark and support many data analytics frameworks. Currently Spark is largely hardcoded, but we should move all application-specific
details into an "application description". This description is fed to the Zoe Scheduler, which then becomes a generic application scheduler.
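As a rough illustration of what such an application description could look like, here is a minimal sketch. Every class, field, and image name below is hypothetical and chosen only for the example; the roadmap does not define a schema. The point is that Spark is expressed purely as data, so a scheduler that reads this structure needs no Spark-specific code.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class ProcessDescription:
    """One kind of container the application needs (hypothetical schema)."""
    name: str
    docker_image: str
    count: int = 1
    memory_bytes: int = 1024 ** 3
    environment: Dict[str, str] = field(default_factory=dict)


@dataclass
class ApplicationDescription:
    """Framework-neutral description that a generic scheduler could consume."""
    name: str
    processes: List[ProcessDescription]

    def total_memory(self) -> int:
        # Memory the scheduler would have to reserve for the whole application.
        return sum(p.count * p.memory_bytes for p in self.processes)


# Spark becomes just one description among many (image names are made up).
spark_app = ApplicationDescription(
    name="spark-example",
    processes=[
        ProcessDescription("master", "example/spark-master", count=1),
        ProcessDescription("worker", "example/spark-worker", count=3,
                           memory_bytes=4 * 1024 ** 3),
    ],
)

# A plain dict is easy to serialize and hand over to the scheduler.
description_dict = asdict(spark_app)
```

Another framework would be supported by writing a different description, not by changing the scheduler.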
Integrate a monitoring solution: Zoe has access to a lot of valuable data that should be recorded and used for feedback and study. The data that can be gathered is of two kinds:
1. Events: a user starts an execution, a cluster finishes, etc.
2. Statistics: time series data gathered from `docker stats` and from the Docker hosts (collectd? influxdb?)
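The two kinds of data listed above could be modelled as two small record types. The sketch below is only an assumption about how they might be shaped: the field names, the `record_type` tag, and the JSON-lines encoding are illustrative and are not taken from Zoe or from any particular monitoring backend.

```python
import json
import time
from dataclasses import dataclass, asdict


@dataclass
class Event:
    """A discrete occurrence, e.g. a user starting an execution."""
    timestamp: float
    kind: str
    subject: str


@dataclass
class StatSample:
    """One point of a time series, e.g. CPU usage of a container."""
    timestamp: float
    container: str
    metric: str
    value: float


def to_json_line(record) -> str:
    """Serialize either record type as one JSON line, tagged with its type."""
    payload = asdict(record)
    payload["record_type"] = type(record).__name__
    return json.dumps(payload, sort_keys=True)


# Hypothetical identifiers, used only to show the shape of the output.
event = Event(time.time(), "execution_started", "user-42/exec-1")
sample = StatSample(time.time(), "spark-worker-0", "cpu_percent", 12.5)
lines = [to_json_line(event), to_json_line(sample)]
```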
Data should be visible to the users. The difficulty of using Grafana for visualization is that it does not cope well with showing graphs from different
time intervals side by side, for example to compare the executions of two Spark jobs.
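One possible way around the time-interval problem is to re-base each execution's samples so that time is measured from the start of that execution. The sketch below assumes samples are `(timestamp, value)` pairs; the function name and the sample data are invented for illustration and do not describe an existing Zoe or Grafana feature.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (absolute timestamp in seconds, metric value)


def align_to_start(samples: List[Sample]) -> List[Sample]:
    """Return the samples with timestamps expressed as seconds since the first one."""
    if not samples:
        return []
    ordered = sorted(samples)
    start = ordered[0][0]
    return [(t - start, v) for t, v in ordered]


# Two executions that ran at different times (made-up numbers).
run_a = [(1000.0, 10.0), (1010.0, 55.0), (1020.0, 40.0)]
run_b = [(5000.0, 12.0), (5010.0, 70.0), (5020.0, 35.0)]

# After alignment both series share the same time axis and can be overlaid.
aligned_a = align_to_start(run_a)
aligned_b = align_to_start(run_b)
```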
Zoe should support creating, listing and selecting inputs and outputs for applications. In particular, users should be able to create new HDFS clusters or re-use existing
ones, created by them or by other users. They should be able to list the contents of these storage clusters and select inputs and outputs.
The Zoe Scheduler should try to place containers so that data-locality constraints are satisfied, keeping the data containers and the compute containers "near" each other.
For now we are thinking about HDFS, but Cassandra is also a possibility.
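A very simple reading of "near" is "on the same host". Under that assumption, a greedy placement could look like the sketch below. The function, its parameters, and the slot-based capacity model are hypothetical; a real scheduler would likely weigh more factors, such as rack topology or memory.

```python
from typing import Dict, List, Set


def place_compute(n_containers: int,
                  data_hosts: Set[str],
                  free_slots: Dict[str, int]) -> List[str]:
    """Pick a host for each compute container, preferring hosts that hold data."""
    remaining = dict(free_slots)
    # Hosts with data containers come first, then hosts with more free slots;
    # the host name breaks ties so the result is deterministic.
    order = sorted(remaining,
                   key=lambda h: (h not in data_hosts, -remaining[h], h))
    placement: List[str] = []
    for host in order:
        while remaining[host] > 0 and len(placement) < n_containers:
            placement.append(host)
            remaining[host] -= 1
    if len(placement) < n_containers:
        raise RuntimeError("not enough free slots in the cluster")
    return placement


# Hypothetical cluster: HDFS data containers run on host-a and host-b.
free = {"host-a": 1, "host-b": 2, "host-c": 4}
result = place_compute(4, {"host-a", "host-b"}, free)
```

Here three of the four compute containers land on hosts that already hold data, and only the last one spills over to `host-c`.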