The ZApp format (the JSON file) needs to be cleaned up and updated with new fields.
This clean-up takes little time and makes it much easier to understand what each field does. The more detailed resource reservation fields are needed for our research into dynamic resource allocation to progress.
The following fields can be removed:
The following fields need to be renamed:
Priority → size
The port description needs to be changed to support the following fields:
name: same as now
url_template: a template for a URL exposed by the service; the hostname/port part will be filled in by Zoe at run time
port_number: as now
protocol: tcp or udp
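As an illustration, a port description with these four fields could look like the sketch below. The placeholder syntax `{ip_port}`, the sample values and the helper `render_url` are assumptions made for this example; the actual template syntax is still to be defined.

```python
# Hypothetical port description using the four proposed fields.
port = {
    "name": "Spark master web interface",
    "url_template": "http://{ip_port}/",  # placeholder name is an assumption
    "port_number": 8080,
    "protocol": "tcp",
}


def render_url(port_description, host, public_port):
    """Fill the hostname/port part of url_template, as Zoe would at run time."""
    return port_description["url_template"].replace(
        "{ip_port}", f"{host}:{public_port}"
    )


# The host and public port are only known once the container is running.
url = render_url(port, "192.0.2.10", 32768)
print(url)
```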
Resource limits should be enriched:
The ZApp should start if the minimum amount of resources is available, and it should not be allowed to exceed the maximum.
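The minimum/maximum rule can be sketched as follows. The resource names (`memory`, `cores`), the `min`/`max` keys and both helper functions are hypothetical; they only illustrate the intended semantics, not the final field layout.

```python
GIB = 2**30

# Hypothetical enriched resource limits for one service of a ZApp.
resources = {
    "memory": {"min": 4 * GIB, "max": 8 * GIB},
    "cores": {"min": 1, "max": 4},
}


def can_start(limits, available):
    """The ZApp may start only if every minimum is currently available."""
    return all(available.get(name, 0) >= lim["min"] for name, lim in limits.items())


def cap_usage(limits, requested):
    """Never grant more than the declared maximum of each resource."""
    return {name: min(requested.get(name, 0), lim["max"]) for name, lim in limits.items()}


startable = can_start(resources, {"memory": 6 * GIB, "cores": 2})
granted = cap_usage(resources, {"memory": 16 * GIB, "cores": 2})
```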
All input/output performed by Zoe using JSON should be validated by an appropriate schema.
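A minimal sketch of such validation, applied to the port description fields listed above, is shown here. It uses only the Python standard library to stay self-contained; a real implementation would more likely rely on a JSON Schema library. `PORT_SCHEMA` and `validate_port` are hypothetical names.

```python
import json

# Hypothetical schema: field name -> required Python type after JSON decoding.
PORT_SCHEMA = {
    "name": str,
    "url_template": str,
    "port_number": int,
    "protocol": str,
}


def validate_port(document):
    """Return a list of error messages; an empty list means the input is valid."""
    errors = []
    for field, expected_type in PORT_SCHEMA.items():
        if field not in document:
            errors.append(f"missing field: {field}")
        elif type(document[field]) is not expected_type:
            errors.append(f"wrong type for {field}")
    if "protocol" in document and document["protocol"] not in ("tcp", "udp"):
        errors.append("protocol must be tcp or udp")
    return errors


good = json.loads(
    '{"name": "web", "url_template": "http://{ip_port}/",'
    ' "port_number": 80, "protocol": "tcp"}'
)
bad = {"name": "web", "port_number": "80", "protocol": "icmp"}

good_errors = validate_port(good)
bad_errors = validate_port(bad)
```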
Expand the API for faster UI
Add filtering capability to the execution list API. Currently this API endpoint can only return the full list of executions for the authenticated user (or all executions if the user is an admin).
After a while the list of executions grows to thousands of entries and the API becomes slow.
With filtering, the UI can issue queries that retrieve only the data it actually needs.
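The kind of server-side filtering meant here can be sketched as below. The execution fields (`id`, `user_id`, `status`) and the `limit` parameter are assumptions for illustration; which filters the endpoint will actually accept is still to be decided.

```python
def filter_executions(executions, limit=None, **criteria):
    """Keep executions whose fields equal every given criterion."""
    matches = [
        e for e in executions
        if all(e.get(field) == value for field, value in criteria.items())
    ]
    return matches if limit is None else matches[:limit]


# Hypothetical records, standing in for rows of the execution list.
executions = [
    {"id": 1, "user_id": "alice", "status": "terminated"},
    {"id": 2, "user_id": "alice", "status": "running"},
    {"id": 3, "user_id": "bob", "status": "running"},
    {"id": 4, "user_id": "alice", "status": "running"},
]

running_for_alice = filter_executions(executions, user_id="alice", status="running")
first_running = filter_executions(executions, limit=1, status="running")
```

In a real deployment the same criteria would be pushed down into the database query, so that the thousands of unneeded entries are never loaded at all.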
ZApp CI pipeline (aka ZApp packaging)
Formally describe a ZApp package format (JSON + Dockerfile(s) and associated files) and put each ZApp in its own repository to ease testing, rebuilding and maintenance.
API monitoring subsystem
Expose metrics on API usage and other statistics.
Workspace revision (proposal)
Revise workspaces and how users interact with Zoe to provide executables and data to the ZApps.
Three kinds of storage should be considered:
User code and binaries (the application JAR for Spark, the Python script for TensorFlow, etc.)
Input data (may be available in a data lake, may be provided as part of the execution, may be on the internet somewhere)
Output data (logs, other outputs generated by the application)
User priorities and quotas (proposal)
Define a system of labels that can be associated with users (and groups) to modify the scheduler behavior, for priorities or quotas.
Multi-queue scheduler with priorities (proposal)
Create a new scheduler that can compete feature-wise with YARN, Mesos and the classic schedulers from the HPC world. It should support multiple queues with different priorities and different policies.
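To make the idea concrete, here is a minimal sketch of a multi-queue scheduler with strict priorities between queues and a FIFO policy inside each queue. The class name, the queue names and the choice of FIFO are all assumptions; per-queue policies and resource awareness are left out.

```python
from collections import deque


class MultiQueueScheduler:
    """Strict priority across queues, FIFO inside each queue (assumed policy)."""

    def __init__(self, priorities):
        # priorities maps queue name -> priority; lower numbers are served first.
        self._priorities = dict(priorities)
        self._queues = {name: deque() for name in priorities}

    def submit(self, queue_name, execution_id):
        self._queues[queue_name].append(execution_id)

    def next_execution(self):
        """Pop the oldest execution from the highest-priority non-empty queue."""
        for name in sorted(self._queues, key=self._priorities.get):
            if self._queues[name]:
                return self._queues[name].popleft()
        return None


scheduler = MultiQueueScheduler({"batch": 2, "interactive": 1})
scheduler.submit("batch", "exec-a")
scheduler.submit("interactive", "exec-b")
scheduler.submit("batch", "exec-c")

order = [scheduler.next_execution() for _ in range(4)]
```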
Zoe HA architecture (proposal)
Replication vs. failover: choose the one better suited to Zoe and modify the architecture to accommodate it.