Section | |
---|---|
Overview

The Habanero-C (HC) language, under development in the Habanero project at Rice University, builds on past work on Habanero-Java (HJ), which in turn was derived from X10 v1.5. HC serves as a research testbed for new compiler and runtime software technologies for extreme-scale systems with homogeneous and heterogeneous processors. Like HJ, HC implements an execution model for multicore processors based on four orthogonal dimensions for portable parallelism.
Unlike HJ, which needs a JVM to run, Habanero-C is designed to be mapped onto hardware platforms with lightweight system software stacks, such as the Customizable Heterogeneous Platform (CHP) being developed in the NSF Expeditions Center for Domain-Specific Computing (CDSC), which includes CPUs, GPUs, and FPGAs. The C foundation also makes it easier to integrate HC with communication middleware for cluster systems, such as MPI and GASNet.

The Habanero-C compiler is written in C++ and is built on top of the ROSE compiler infrastructure, which is also used in the DARPA-funded PACE project under way at Rice University. The bulk of the Habanero-C runtime has been written from scratch in portable ANSI C. However, a few library routines for low-level synchronization and atomic operations are written in assembly language for the target platform. To date, the Habanero-C runtime has been ported to and tested on Intel x86, Cyclops 64, and Intel SCC multicore platforms.

A short summary of the HC language is included below. Details on the underlying implementation technologies can be found on the Habanero publications web page. The HC implementation is still at an early stage and evolving. If you would like to try out HC, please contact one of the following people: Zoran Budimlić, Vincent Cave, or Vivek Sarkar.
HC Language Summary

Task Creation

async [(place)] [IN (var1, var2, ...)] [OUT (var1, var2, ...)] [INOUT (var1, var2, ...)] [AWAIT (ddf1, ddf2, ...)] [phased] Stmt - asynchronously start a new task to execute Stmt in parallel with the parent. A destination place can optionally be specified for where the task should execute. The place can be obtained from the runtime using HC runtime functions (see HPT).
 - Any local variable declared in an outer scope that is used in the async has to be specified in an IN clause (for variables read by the async), an OUT clause (for variables written by the async), or an INOUT clause (for variables both read and written by the async).
 - An AWAIT clause can optionally be specified, listing all the data-driven futures (DDFs) that the task should wait on before starting its execution.
 - A phased clause can optionally be specified, registering the async on all the phasers specified in the list (ph1, ph2, ...), or on all the phasers of the parent (if the list is not specified).

finish Stmt - execute Stmt, but wait until all (transitively) spawned asyncs in Stmt's scope have terminated before advancing to the next statement.

Task Synchronization

DDF_CREATE() --- a library function that creates a Data-Driven Future (DDF) and returns a pointer to a DDF_t type. A DDF is a single-assignment container that is initially empty and becomes full after a DDF_PUT operation is performed on it.
DDF_GET(DDF_t * ddf) --- if ddf is full, return the value stored in ddf's container. If ddf is empty, the runtime will exit with an assertion failure.
DDF_PUT(DDF_t * ddf, void * value) --- if ddf is empty, perform a put of value into ddf. If ddf already has a value, the runtime will exit with an assertion failure.
async AWAIT (ddf1, ddf2, ...) Stmt --- wait until all the DDFs in the list (ddf1, ddf2, ...) have their values filled in before asynchronously starting the execution of Stmt. Stmt can safely perform a DDF_GET on the DDFs specified in the list.
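The single-assignment behaviour of a DDF described above can be modelled in plain C. The sketch below is not the HC runtime: the type model_ddf_t and every model_ddf_* function are hypothetical names introduced only for illustration, and the sketch is sequential, so it shows the empty/full rules and the AWAIT readiness condition rather than any actual task scheduling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a data-driven future; NOT the HC runtime's DDF_t. */
typedef struct {
    bool  full;   /* false until the single put has happened */
    void *value;  /* payload stored by the put */
} model_ddf_t;

/* Create an empty container (plays the role described for DDF_CREATE). */
model_ddf_t *model_ddf_create(void) {
    model_ddf_t *ddf = malloc(sizeof *ddf);
    assert(ddf != NULL);
    ddf->full = false;
    ddf->value = NULL;
    return ddf;
}

/* Single assignment: a second put on the same container trips the assertion. */
void model_ddf_put(model_ddf_t *ddf, void *value) {
    assert(!ddf->full && "put on a DDF that already has a value");
    ddf->value = value;
    ddf->full = true;
}

/* Reading an empty container trips the assertion. */
void *model_ddf_get(model_ddf_t *ddf) {
    assert(ddf->full && "get on an empty DDF");
    return ddf->value;
}

/* True once every DDF in the list is full, which is the condition
 * under which a task that AWAITs the list would be allowed to start. */
bool model_ddf_all_full(model_ddf_t **ddfs, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (!ddfs[i]->full) {
            return false;
        }
    }
    return true;
}
```

In this model, a get is safe exactly when model_ddf_all_full returns true for the awaited list, which is why Stmt in an async AWAIT can read its DDFs without hitting the assertion.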
phaser *ph = PHASER_CREATE(mode) --- create a phaser and register the calling task on the phaser with the specified mode.
async phased Stmt --- register the async with all phasers created by the parent in the immediately enclosing finish scope and asynchronously execute Stmt.
async phased SIGNAL_ONLY( ph1, ph2, ... ) WAIT_ONLY( ph3, ph4, ... ) SIGNAL_WAIT( ph5, ph6, ... ) Stmt --- register an async on specific phasers with specific modes. The parent should be registered on all the phasers in modes that are greater than or equal to the modes of the child, as shown below.
SIGNAL_WAIT > SIGNAL_ONLY
NEXT --- synchronize on all the phasers that the task is registered on.

Hierarchical Place Trees (HPTs)

The hierarchical place tree of the machine on which you want to execute your HC program is specified in an .xml file. For example, the .xml file for a single Sugar computational node looks like this:
<?xml version="1.0"?>
You specify the HPT of the machine when invoking your HC executable, for example:
./a.out -nproc 8 -hpt sugar.xml

Some HC functions that allow you to navigate the HPT:
short is_cpu_place(place_t * pl) --- is pl a CPU place?
short is_device_place(place_t * pl) --- is pl a device (GPU or FPGA) place?
short is_nvgpu_place(place_t * pl) --- is pl an NVIDIA GPU place?
place_t * hc_get_current_place() --- get the place where the task is currently executing
int hc_get_num_places(short type) --- get the number of places of the specified type (NVGPU_PLACE, MEM_PLACE or FPGA_PLACE)
void hc_get_places(place_t ** pls, short type) --- get an array of all the places of the specified type
place_t * hc_get_place(short type) --- get any place of the specified type
place_t * get_ancestor_place(hc_workerState * ws)
place_t * hc_get_child_place() --- get the child place on the path from the current place to the leaf place of the current worker
place_t * hc_get_parent_place() --- get the parent place of the current place in HPT
place_t ** hc_get_children_places(int * numChildren) --- get an array of all the child places of the current place
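To make the tree structure behind these navigation functions concrete, here is a minimal plain-C model of a place tree. It does not use the HC runtime: model_place_t, the MODEL_* constants, and the model_* functions are hypothetical names chosen for this sketch, and the actual layout of place_t in the runtime may well differ. The model only shows the parent/child links and a count-by-type query of the kind hc_get_num_places is described as answering.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical place kinds; the real constants belong to the HC runtime. */
enum model_place_type { MODEL_MEM_PLACE, MODEL_NVGPU_PLACE, MODEL_FPGA_PLACE };

#define MODEL_MAX_CHILDREN 4

/* Hypothetical tree node; NOT the HC runtime's place_t. */
typedef struct model_place {
    enum model_place_type type;
    struct model_place   *parent;
    struct model_place   *children[MODEL_MAX_CHILDREN];
    int                   num_children;
} model_place_t;

/* Attach child under parent and record the back link. */
void model_add_child(model_place_t *parent, model_place_t *child) {
    assert(parent->num_children < MODEL_MAX_CHILDREN);
    parent->children[parent->num_children++] = child;
    child->parent = parent;
}

/* Count the places of a given type in the subtree rooted at root. */
int model_count_places(model_place_t *root, enum model_place_type type) {
    int count = (root->type == type) ? 1 : 0;
    for (int i = 0; i < root->num_children; i++) {
        count += model_count_places(root->children[i], type);
    }
    return count;
}

/* Number of parent links between pl and the root of the tree. */
int model_depth(model_place_t *pl) {
    int depth = 0;
    while (pl->parent != NULL) {
        pl = pl->parent;
        depth++;
    }
    return depth;
}
```

Walking the parent pointer, as model_depth does, corresponds to repeatedly asking for the parent place, while the recursive count visits every child array in turn.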
Download

Not yet available

Current HC limitations

There are some limitations due to the current implementation of the HC programming model. These limitations are not inherent to the programming model, but rather are a result of some incompleteness in our compiler or runtime implementation.
 - Pointers to stack variables cannot be passed to async tasks. This can result in unpredictable behavior. Workaround: allocate on the heap all data meant to be shared among tasks.
 - Pointers to HC functions (functions that contain HC constructs or call other HC functions) are not allowed in HC.
 - 'const' modifiers are not allowed for function parameters or local variables in HC programs. Workaround: remove the 'const' modifiers. The program semantics will remain unchanged, as the only purpose of the 'const' modifiers is to enforce some additional compiler checking.
 - The number of tasks registered on a phaser cannot exceed the number of worker threads specified with the -nproc option when the HC-compiled executable is invoked. Otherwise, a deadlock can occur.
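The heap-allocation workaround for the first limitation can be sketched in plain C. The helper names make_shared_counter and task_body are hypothetical and introduced only for this example. The HC fragment in the comment is assembled from the language summary above and has not been compiled, so treat it as an assumption about how the pieces would fit together.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: place the shared value on the heap so that its
 * address stays valid after the creating function's stack frame is gone. */
int *make_shared_counter(int initial) {
    int *counter = malloc(sizeof *counter);
    assert(counter != NULL);
    *counter = initial;
    return counter;
}

/* Stand-in for the work a task would do on the shared value.
 *
 * In an HC program the call would presumably sit inside an async,
 * along the lines of (untested, based on the summary above):
 *
 *     finish { async IN(counter) task_body(counter); }
 *
 * Passing the address of a local int instead would hit the
 * stack-pointer limitation. */
void task_body(int *counter) {
    *counter += 1;
}
```

The caller owns the allocation and should free it only after the enclosing finish has completed, since a spawned task may still be using the pointer until then.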
|