Unfortunately, existing systems for large-scale, distributed machine learning are inflexible and difficult to use. This was not necessarily a problem when models had only tens of millions of parameters, but it is increasingly problematic now that models with tens of billions of parameters are common.
...
Our goal is to design new systems for machine learning, from the ground up, that allow for maximum flexibility. All that a programmer needs to do is specify the model; breaking up the resulting computation (learning or inference) to run on different devices or different machines in a distributed cluster is then automatic. Many of our ideas for machine learning system design are rooted in techniques that distributed and parallel database designers have used for decades. An ML system should automatically figure out the best way to execute a distributed computation, just as a database system automatically figures out the best way to run a distributed SQL query. The SQL does not change, no matter what the underlying hardware looks like.
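To make the analogy concrete, here is a minimal sketch (a toy illustration, not our actual system) of a matrix multiply written as ordinary SQL over relations of (row, column, value) entries. We use SQLite only so the example is self-contained; the point is that the same query text could be handed to a single-node or a distributed engine without change. The schema and table names here are hypothetical.

```python
# Toy example: C = A x B expressed as ordinary SQL over sparse
# (row, col, value) relations. SQLite is used only to keep the
# example self-contained; the query itself is engine-agnostic.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (i INTEGER, j INTEGER, v REAL);
    CREATE TABLE B (j INTEGER, k INTEGER, v REAL);
""")
conn.executemany("INSERT INTO A VALUES (?, ?, ?)",
                 [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)])
conn.executemany("INSERT INTO B VALUES (?, ?, ?)",
                 [(0, 0, 5.0), (0, 1, 6.0), (1, 0, 7.0), (1, 1, 8.0)])

# Matrix multiply = join on the shared inner index j, then a grouped sum.
rows = conn.execute("""
    SELECT A.i, B.k, SUM(A.v * B.v) AS v
    FROM A JOIN B ON A.j = B.j
    GROUP BY A.i, B.k
    ORDER BY A.i, B.k
""").fetchall()
print(rows)  # [(0, 0, 19.0), (0, 1, 22.0), (1, 0, 43.0), (1, 1, 50.0)]
```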
Much of our work is based on the idea of the tensor relational algebra, which says that machine learning systems should decompose large tensors into sets of sub-tensors, and then operate on them using standard relational operations such as joins, projections, and aggregations.
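As a minimal sketch of this idea (the helper names to_relation and relational_matmul are hypothetical, and NumPy merely stands in for the per-block compute kernel), the code below stores each matrix as a relation of (blockRow, blockCol, subTensor) tuples and computes a matrix multiply as a join on the shared inner block index followed by a sum aggregation:

```python
# Minimal sketch: matrices as relations of (blockRow, blockCol, subTensor)
# tuples, with matrix multiply done as a join plus a grouped aggregation.
# NumPy stands in for the per-block compute kernel; the helper names below
# are hypothetical, not part of any released system.
from collections import defaultdict
import numpy as np

def to_relation(M, bs):
    """Decompose matrix M into a relation keyed by block coordinates."""
    n, m = M.shape
    return {(i // bs, j // bs): M[i:i + bs, j:j + bs]
            for i in range(0, n, bs) for j in range(0, m, bs)}

def relational_matmul(A_rel, B_rel):
    """C = A x B: join on the shared inner block index, aggregate by sum."""
    C_rel = defaultdict(lambda: 0)
    for (ai, aj), a_blk in A_rel.items():
        for (bi, bk), b_blk in B_rel.items():
            if aj == bi:                                   # the join condition
                C_rel[(ai, bk)] = C_rel[(ai, bk)] + a_blk @ b_blk  # aggregation
    return dict(C_rel)

A, B = np.random.rand(4, 4), np.random.rand(4, 4)
C_rel = relational_matmul(to_relation(A, 2), to_relation(B, 2))
C = np.block([[C_rel[(0, 0)], C_rel[(0, 1)]],
              [C_rel[(1, 0)], C_rel[(1, 1)]]])
assert np.allclose(C, A @ B)  # matches a direct matrix multiply
```

Because the computation is expressed over keyed tuples, a system is free to partition the block relations across machines and to choose join and aggregation strategies automatically, exactly as a relational optimizer would.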
For a bit more depth, please take a look at this slide deck detailing our ideas on distributed machine learning system design:
...