The Heterogeneous Habanero-C (H2C) language, under development in the Habanero project at Rice University, provides an implementation of the Habanero execution model for modern heterogeneous (CPU + GPU) architectures.
Overview

The Heterogeneous Habanero-C (H2C) language, compiler and runtime framework is specifically designed to achieve portability, productivity and performance on modern heterogeneous (CPU + GPU) architectures. The main goal is to take a machine-independent program written in H2C and generate a machine-specific executable. Some highlights of H2C include:
H2C requires the underlying platform to support OpenCL. The H2C compiler uses a Machine Description (MDes) file, provided by a user or automatically generated by an auto-tuner, to generate a target-specific executable. A short summary of the H2C framework is included below. Details on the underlying implementation technologies can be found on the Habanero publications web page. The H2C implementation is still at an early stage. If you would like to try out H2C, please contact one of the following people: Deepak Majeti or Vivek Sarkar.

H2C Language Summary

The language constructs are classified into communication and computation constructs.

Constructs for Communication

Heterogeneous Habanero-C extends the Habanero async, forasync and finish constructs to target modern heterogeneous architectures.

The async construct is used to asynchronously transfer data among multiple devices. Execution of the async statement returns immediately, i.e., the parent task can proceed to its following statement without waiting for the child task to complete. One can easily overlap computation with the asynchronous data transfers.

The finish statement, finish <stmt>, performs a join operation that causes the parent task to execute <stmt> and then wait until all the tasks created within <stmt> have terminated (including transitively spawned tasks).

The Habanero-C runtime uses a work-stealing scheduler that supports work-first and help-first policies, along with places for locality. For locality, Habanero-C uses Hierarchical Place Trees (HPTs). HPTs abstract the underlying hardware using hierarchical trees, allowing the program to spawn tasks at places, which for example could be cores, groups of cores sharing cache, nodes, groups of nodes, or other devices such as GPUs or FPGAs.
The work-stealing runtime takes advantage of the hardware hierarchy to preserve locality when executing tasks.
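The idea of using a place hierarchy to keep work local can be illustrated with a minimal sketch in plain C. This is not the H2C runtime: the `place_t` type and the `find_work` and `place_depth` functions are hypothetical names, and the search order (a worker checks its own leaf place first, then each ancestor up to the root) is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical place node; the real HPT data structures are not described here. */
typedef struct place {
    int id;                 /* illustrative identifier */
    struct place *parent;   /* NULL for the root place */
    int pending_tasks;      /* number of tasks queued at this place */
} place_t;

/* Assumed policy: a worker attached to a leaf place looks for work at its own
   place first and then walks toward the root, so it prefers nearby tasks. */
place_t *find_work(place_t *leaf) {
    for (place_t *p = leaf; p != NULL; p = p->parent) {
        if (p->pending_tasks > 0) {
            return p;
        }
    }
    return NULL; /* no work anywhere on the path to the root */
}

/* Number of edges between a place and the root of the tree. */
int place_depth(place_t *p) {
    assert(p != NULL);
    int depth = 0;
    while (p->parent != NULL) {
        p = p->parent;
        depth++;
    }
    return depth;
}
```

With a root place, a shared-cache place below it, and two core places below that, a core with no local tasks would fall back to its cache place and then to the root.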
The finish statement also ensures that all the data transfers within <stmt> have completed.

async [copyin (var1, var2, ...)] [copyout (var1, var2, ...)] [at (dev1, dev2, ...)] [
- Asynchronously start a new task to execute Stmt in parallel with the parent. A destination place can optionally be specified for where the task should execute. The place can be obtained from the runtime using HC runtime functions (see HPT).
- Any local variable declared in an outer scope that is used in the async has to be specified in an IN (for variables read by the async), OUT (for variables written by the async), or INOUT (for variables both read and written by the async) clause. The IN/OUT/INOUT clauses have copy-in/copy-out semantics for local variables; selected variables are copied in from the parent scope at the start of the async, and out into the parent scope at the end of the async task.
- An AWAIT clause can optionally be specified, listing all the data-driven futures (DDFs) that the task should wait on before starting its execution.
- A phased clause can optionally be specified, registering the async on all the phasers specified in the list (ph1, ph2, ...), or on all the phasers of the parent (if the list is not specified).

finish Stmt
- Execute Stmt, but wait until all (transitively) spawned asyncs in Stmt's scope have terminated before advancing to the next statement.
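The copy-in/copy-out and join semantics described above can be emulated sequentially in plain C. The sketch below is only an illustration under stated assumptions: `spawn_async`, `finish_all`, and the fixed-size queue are hypothetical names and are not part of the H2C language or runtime, and tasks run one after another rather than in parallel (which is one legal schedule).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKS 64

/* Hypothetical task record: one input copied in, one result copied out. */
typedef struct {
    int (*fn)(int);  /* task body */
    int in_value;    /* snapshot taken when the task is spawned (copy-in) */
    int *out_slot;   /* parent variable written when the task ends (copy-out) */
} task_t;

static task_t queue[MAX_TASKS];
static size_t queue_len = 0;

/* Emulated "async": record the task and return immediately. */
void spawn_async(int (*fn)(int), int in_value, int *out_slot) {
    assert(queue_len < MAX_TASKS);
    queue[queue_len].fn = fn;
    queue[queue_len].in_value = in_value;
    queue[queue_len].out_slot = out_slot;
    queue_len++;
}

/* Emulated "finish": run every pending task, including tasks that are
   spawned by other tasks while the queue is being drained. */
void finish_all(void) {
    size_t next = 0;
    while (next < queue_len) {
        task_t t = queue[next++];
        *t.out_slot = t.fn(t.in_value); /* copy-out at the end of the task */
    }
    queue_len = 0;
}

int square(int x) { return x * x; }

static int child_result;

/* A task that spawns another task before returning its own result. */
int spawn_child_then_double(int x) {
    spawn_async(square, x + 1, &child_result);
    return 2 * x;
}

int demo_copy_in(void) {
    int x = 3, y = 0;
    spawn_async(square, x, &y); /* x is copied in here */
    x = 10;                     /* the parent continues; the task still sees 3 */
    finish_all();
    return y;
}

int demo_transitive(void) {
    int y = 0;
    child_result = 0;
    spawn_async(spawn_child_then_double, 4, &y);
    finish_all();               /* also waits for the nested square task */
    return y + child_result;
}
```

In `demo_copy_in`, changing `x` after the spawn does not affect the task, because the value was captured at spawn time. In `demo_transitive`, the emulated finish does not return until the nested task has also run.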
Constructs for Computation

forasync [in (var1, var2, ...)] [point (ind1, ind2, ...)] [size (siz1, siz2, ...)] [seq (seq1, seq2, ...)] Body
- The semantics of the in clause are the same as in the async case.
- Loop indices in each dimension are specified by the point clause.
- The number of iterations in each dimension is specified by the size clause.
- The tile size is specified by the seq clause.

forasync is lowered and implemented for CPUs in two different ways as follows.
forasync targets heterogeneous platforms by automatically generating host code and OpenCL device code. Note: The semantics of forasync does not include a barrier. An explicit finish must enclose the forasync to synchronize all the iterations.
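To make the roles of the point, size and seq clauses concrete, here is a minimal plain C sketch of how a two-dimensional iteration space might be split into tiles. It assumes that each tile becomes one task whose points run sequentially, and that tiles at the boundary are simply clipped; the names `forasync_2d`, `run_tile`, `ROWS` and `COLS` are hypothetical, and the tiles are executed one after another rather than as real asynchronous tasks.

```c
#include <assert.h>

#define ROWS 6   /* size in the first dimension (illustrative) */
#define COLS 8   /* size in the second dimension (illustrative) */

static int visits[ROWS][COLS];

void reset_visits(void) {
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            visits[i][j] = 0;
}

/* One tile stands for one task: its points (i, j) are executed sequentially. */
void run_tile(int r0, int c0, int tile_r, int tile_c) {
    for (int i = r0; i < r0 + tile_r && i < ROWS; i++)
        for (int j = c0; j < c0 + tile_c && j < COLS; j++)
            visits[i][j] += 1; /* stand-in for the loop body */
}

/* Emulates: forasync point (i, j) size (ROWS, COLS) seq (tile_r, tile_c) Body.
   Returns the number of tiles, i.e. the number of tasks that would be created. */
int forasync_2d(int tile_r, int tile_c) {
    assert(tile_r > 0 && tile_c > 0);
    int tasks = 0;
    for (int r0 = 0; r0 < ROWS; r0 += tile_r) {
        for (int c0 = 0; c0 < COLS; c0 += tile_c) {
            run_tile(r0, c0, tile_r, tile_c); /* a real runtime would spawn this */
            tasks++;
        }
    }
    return tasks;
}

/* Returns 1 if every point of the iteration space was visited exactly once. */
int all_visited_once(void) {
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            if (visits[i][j] != 1)
                return 0;
    return 1;
}
```

With a 6 x 8 space and 4 x 3 tiles, this produces two tiles in the first dimension and three in the second, so six tasks in total. In real H2C code the whole forasync would still need an enclosing finish to wait for all iterations.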
H2C Compiler and Runtime Framework

H2C uses a two-step compilation.
Current H2C limitations

There are some limitations and pitfalls in the current implementation of the HC programming model. These limitations are not inherent to the programming model, but rather are a result of incompleteness in the current compiler or runtime implementation.

1) Pointers to stack variables (including stack-allocated arrays) cannot be reused across "suspendable" points. A suspendable function is a function that can directly or indirectly call a function containing an async statement or a finish statement. A suspendable point is an async statement, the end of a finish statement, or a call to a suspendable function. Work-around: copy these stack variables to the heap. There is no limitation on the reuse of heap pointers across suspendable points.
2) Pointers to HC functions (functions that contain HC constructs or call other HC functions) are not supported in HC. Work-around: only use pointers to C functions.
3) const modifiers are not supported for function parameters or local variables in HC programs. Work-around: remove these 'const' modifiers. The semantics of a correct program will remain unchanged, since the only purpose of the 'const' modifiers is to enforce additional compiler checking.
4) The number of tasks registered on a phaser cannot be larger than the number of worker threads specified with the -nproc option when an HC program is invoked. Otherwise, a deadlock may occur.
5) HC function calls must be in canonical form: either a statement on their own or the right-hand side of an assignment.
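The work-arounds for limitations 1, 3 and 5 can be sketched in plain C. The example below contains no HC constructs, so the place where a suspendable point would occur is only marked with a comment; `sum`, `copy_to_heap` and `is_large_sum` are hypothetical helper names chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for an HC function. Parameters carry no const modifier (limitation 3). */
int sum(int *values, int n) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += values[i];
    }
    return total;
}

/* Work-around for limitation 1: copy a stack array to the heap so that the
   pointer can be reused after a suspendable point. The caller frees the copy. */
int *copy_to_heap(int *stack_values, int n) {
    assert(n > 0);
    int *heap_values = malloc((size_t)n * sizeof *heap_values);
    assert(heap_values != NULL);
    memcpy(heap_values, stack_values, (size_t)n * sizeof *heap_values);
    return heap_values;
}

int is_large_sum(void) {
    int local[4] = {1, 2, 3, 4};
    int *shared = copy_to_heap(local, 4);

    /* In an HC program an async or finish could appear here; after it, only
       the heap pointer `shared` is used, never a pointer to `local`. */

    int total;
    total = sum(shared, 4); /* canonical form: the call is the right-hand side
                               of an assignment (limitation 5) */
    free(shared);

    if (total > 5) {        /* instead of: if (sum(shared, 4) > 5) */
        return 1;
    }
    return 0;
}
```

The call result is first stored in `total` and then tested, rather than calling the function inside the condition, which would not be in canonical form.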
Acknowledgement

Partial support for Habanero-C was provided through the CDSC program of the National Science Foundation with an award in the 2009 Expedition in Computing Program.