Unfortunately, existing systems for large-scale, distributed machine learning are inflexible and difficult to use. This was not necessarily a problem when models had only tens of millions of parameters, but it is increasingly problematic now that models with tens of billions of parameters are common.

For example, the largest iteration of the LLaMA large-language model released by MetaAI has 65 billion parameters, and yet it is hard-coded to work on a server having eight GPUs. Not everyone has such a server! What if a potential user has a server with four GPUs? Or what if one has six servers available, with four GPUs each? What if no GPUs are available at all, and one instead has access to a compute cluster with eight CPU machines? In any of these cases, a potential user would have to write a lot of code to get LLaMA to work, and this development task is likely beyond all but the most sophisticated users. The problems are myriad; for example, the 65B-parameter LLaMA model itself comes "pre-decomposed" to run on eight GPUs. That is, the various tensors composing the model have all been broken up eight ways, and a programmer wanting to run the model on different hardware must "re-compose" the tensors and then break them up again for the new hardware.
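
To make the "re-compose, then re-shard" chore concrete, here is a minimal sketch of what such a conversion script has to do, written with PyTorch. The checkpoint file names, the split_dim_for helper, and the per-tensor split dimensions are illustrative assumptions, not the exact layout of the released LLaMA checkpoints.

    # A minimal sketch of the "re-compose, then re-shard" chore described above.
    # File names and per-tensor split dimensions are illustrative assumptions.
    import torch

    NUM_ORIGINAL_SHARDS = 8   # how the released model was decomposed
    NUM_TARGET_SHARDS = 4     # the hardware we actually have

    def recompose_and_reshard(shard_paths, split_dim_for):
        # Load every per-GPU checkpoint (one dict of tensors per original GPU).
        shards = [torch.load(p, map_location="cpu") for p in shard_paths]

        new_shards = [dict() for _ in range(NUM_TARGET_SHARDS)]
        for name in shards[0]:
            dim = split_dim_for(name)        # which axis this tensor was cut along
            if dim is None:                  # replicated tensor: copy to every target
                for s in new_shards:
                    s[name] = shards[0][name]
                continue
            # "Re-compose": glue the original pieces back into one full tensor.
            full = torch.cat([s[name] for s in shards], dim=dim)
            # "Re-shard": cut the full tensor into pieces for the new cluster.
            for i, piece in enumerate(torch.chunk(full, NUM_TARGET_SHARDS, dim=dim)):
                new_shards[i][name] = piece

        for i, s in enumerate(new_shards):
            torch.save(s, f"resharded.{i:02d}.pth")

Even this simplified version requires the programmer to know, tensor by tensor, how the model was originally cut apart, which is exactly the burden we want to remove.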

...

Our goal is to design new systems for machine learning, from the ground up, that allow for maximum flexibility. All that a programmer needs to do is specify the model; breaking up the resulting computation, whether learning or inference, to run on different devices or different machines in a distributed cluster is then automatic. A lot of our ideas for machine learning system design are rooted in techniques that distributed and parallel database designers have used for decades. An ML system should automatically figure out the best way to execute a distributed computation, just as a database system automatically figures out the best way to run a distributed SQL computation. The SQL does not change, no matter what the underlying hardware is.
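
As a toy illustration of the kind of automatic, optimizer-style decision we have in mind, the sketch below chooses among a few candidate ways to distribute a single matrix multiply by estimating how many bytes each plan ships over the network. The candidate plans and the cost model are deliberately simplified assumptions, not the planner used in any of our actual systems.

    # Toy illustration of database-style planning for one distributed operation:
    # a matrix multiply C = A @ B on `num_workers` machines.  The strategies and
    # cost model are simplified assumptions used only to illustrate the idea.
    def plan_matmul(m, k, n, num_workers, bytes_per_elem=4):
        candidates = {
            # Split A by rows; B is replicated to every worker.
            "row-partition A": num_workers * k * n * bytes_per_elem,
            # Split B by columns; A is replicated to every worker.
            "col-partition B": num_workers * m * k * bytes_per_elem,
            # Split the shared dimension k; partial results must be summed.
            "split inner dim": num_workers * m * n * bytes_per_elem,
        }
        # Pick the plan that moves the fewest bytes over the network, the same
        # way a query optimizer picks the cheapest join order.
        return min(candidates.items(), key=lambda kv: kv[1])

    print(plan_matmul(m=8192, k=8192, n=1024, num_workers=4))

The point is that the model specification, like the SQL query, stays the same; only the chosen plan changes as the hardware changes.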

Much of our work is based on the idea of the tensor relational algebra, which says that machine learning systems should decompose large tensors into sets of sub-tensors and then operate on them using standard relational operations such as joins, projections, and aggregations.
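
As a small, self-contained sketch of this idea, the code below stores each matrix as a relation of (row_block, col_block, sub_tensor) tuples and expresses matrix multiply as a join on the shared block index followed by a grouped aggregation. The blocking scheme and the in-memory dictionary representation are illustrative assumptions; a real system would push the same relational plan down into a distributed engine.

    # Sketch of the tensor relational algebra: each matrix is a relation of
    # (row_block, col_block, sub_tensor) tuples, and matrix multiply becomes a
    # join on the shared block index followed by an aggregation.
    import numpy as np
    from collections import defaultdict

    def to_relation(mat, block):
        rows, cols = mat.shape
        return {(i // block, j // block): mat[i:i+block, j:j+block]
                for i in range(0, rows, block)
                for j in range(0, cols, block)}

    def relational_matmul(A_rel, B_rel):
        # Join: match sub-tensors of A and B that share the inner block index,
        # multiplying each joined pair of blocks.
        joined = [((i, j), a @ b)
                  for (i, k), a in A_rel.items()
                  for (k2, j), b in B_rel.items() if k == k2]
        # Aggregation: sum the partial products grouped by output block (i, j).
        C_rel = defaultdict(lambda: 0)
        for key, partial in joined:
            C_rel[key] = C_rel[key] + partial
        return dict(C_rel)

    # Usage: the relational result agrees with an ordinary dense matrix multiply.
    A, B = np.random.rand(6, 4), np.random.rand(4, 8)
    C_rel = relational_matmul(to_relation(A, 2), to_relation(B, 2))
    C = np.block([[C_rel[(i, j)] for j in range(4)] for i in range(3)])
    assert np.allclose(C, A @ B)

Because the computation is now expressed over relations of sub-tensors, the system is free to decide how those relations are partitioned across machines, just as a relational database decides how to partition tables.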

For a little more depth, please take a look at this slide deck detailing our ideas in distributed machine learning system design:

...