LLVMOPENMP(1)			  LLVM/OpenMP			 LLVMOPENMP(1)

NAME
       llvmopenmp - LLVM/OpenMP

       NOTE:
	  This document	is a work in progress and most of the expected content
	  is  not  yet available. While	you can	expect changes,	we always wel-
	  come	feedback  and  additions.  Please   contact,   e.g.,   through
	  openmp-dev@lists.llvm.org.

       OpenMP impacts various parts of the LLVM project, from the frontends
       (Clang and Flang), through middle-end optimizations, up to the
       multitude of available OpenMP runtimes.

       A high-level overview of	OpenMP in LLVM can be found here.

OPENMP IN LLVM --- DESIGN OVERVIEW
   Resources
       • OpenMP Booth @ SC19: "OpenMP clang and flang Development"
	 https://youtu.be/6yOa-hRi63M

   LLVM/OpenMP Runtimes
       There are four distinct types of LLVM/OpenMP runtimes:

   LLVM/OpenMP Host Runtime (libomp)
       An early (2015) design document for the LLVM/OpenMP host runtime,
       a.k.a. libomp.so, is available as a PDF.

   LLVM/OpenMP Target Host Runtime (libomptarget)
   Environment Variables
       libomptarget  uses  environment variables to control different features
       of the library at runtime. This allows the user to obtain  useful  run-
       time  information as well as enable or disable certain features.	A full
       list of supported environment variables is defined below.

	  • LIBOMPTARGET_DEBUG=<Num>

	  • LIBOMPTARGET_PROFILE=<Filename>

	  • LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD=<Num>

	  • LIBOMPTARGET_INFO=<Num>

   LIBOMPTARGET_DEBUG
       LIBOMPTARGET_DEBUG controls whether or not debugging information is
       displayed. This feature is only available if libomptarget was built
       with -DOMPTARGET_DEBUG. The debugging output is intended for use by
       libomptarget developers; more user-friendly output is presented when
       using LIBOMPTARGET_INFO.
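
       For example, assuming libomptarget was built with -DOMPTARGET_DEBUG,
       setting the variable to a non-zero value (your-application is a
       placeholder) prints the developer-oriented debug output:

	  $ env LIBOMPTARGET_DEBUG=1 ./your-application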

   LIBOMPTARGET_PROFILE
       LIBOMPTARGET_PROFILE allows libomptarget to generate time profile
       output similar to Clang's -ftime-trace option. This generates a JSON
       file based on Chrome Tracing that can be viewed with chrome://tracing
       or the Speedscope App. This feature depends on the LLVM Support
       Library for time trace output and is enabled by default when building
       with the CMake option OPENMP_ENABLE_LIBOMPTARGET_PROFILING. The
       output will be saved to the filename specified by the environment
       variable. For multi-threaded applications, profiling in libomp is
       also needed; set the CMake option OPENMP_ENABLE_LIBOMP_PROFILING=ON
       to enable that feature. Note that this will turn libomp into a C++
       library.
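
       For example, the following writes a time profile to profile.json (an
       arbitrary filename) that can then be loaded into chrome://tracing or
       the Speedscope App:

	  $ env LIBOMPTARGET_PROFILE=profile.json ./your-application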

   LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD
       LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD sets the threshold size up to
       which the libomptarget memory manager will handle allocations. Any
       allocation larger than this threshold will not use the memory manager
       and will be freed after the device kernel exits. The default
       threshold value is 8KB. If LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD is
       set to 0 the memory manager will be completely disabled.
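
       For example, assuming the value is interpreted in bytes (as the 8KB
       default suggests), the following routes all allocations of up to 1MB
       through the memory manager, or disables the manager entirely:

	  $ env LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD=1048576 ./your-application
	  $ env LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD=0 ./your-application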

   LIBOMPTARGET_INFO
       LIBOMPTARGET_INFO allows the user to request different types of
       runtime information from libomptarget. LIBOMPTARGET_INFO uses a
       32-bit field to enable or disable different types of information,
       including information about data mappings and kernel execution. It is
       recommended to build your application with debugging information
       enabled; this makes filenames and variable declarations available in
       the information messages. OpenMP debugging information is enabled at
       any level of debugging, so a full debug runtime is not required. For
       minimal debugging information compile with -gline-tables-only, or
       compile with -g for full debug information. A full list of flags
       supported by LIBOMPTARGET_INFO is given below.

	  • Print all data arguments upon entering an OpenMP device kernel:
	    0x01

	  • Indicate when a mapped address already exists in the device
	    mapping table: 0x02

	  • Dump the contents of the device pointer map at kernel exit: 0x04

	  • Print OpenMP kernel information from device plugins: 0x10

       Any combination of these	flags can be used by setting  the  appropriate
       bits. For example, to enable printing all data active in	an OpenMP tar-
       get region along	with CUDA information, run the following bash command.

	  $ env LIBOMPTARGET_INFO=$((0x1 | 0x10)) ./your-application

       Or, to enable every flag, run with every bit set.

	  $ env	LIBOMPTARGET_INFO=-1 ./your-application

       For example, given a small application implementing the ZAXPY BLAS
       routine, libomptarget can provide useful information about data
       mappings and thread usage.

	  #include <complex>
	  #include <cstddef> // for std::size_t

	  using	complex	= std::complex<double>;

	  void zaxpy(complex *X, complex *Y, complex D,	std::size_t N) {
	  #pragma omp target teams distribute parallel for
	    for	(std::size_t i = 0; i <	N; ++i)
	      Y[i] = D * X[i] +	Y[i];
	  }

	  int main() {
	    const std::size_t N	= 1024;
	    complex X[N], Y[N],	D;
	  #pragma omp target data map(to:X[0 : N]) map(tofrom:Y[0 : N])
	    zaxpy(X, Y,	D, N);
	  }

       Compiling this code targeting nvptx64 with all information enabled will
       provide the following output from the runtime library.

	  $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O3 -gline-tables-only zaxpy.cpp -o zaxpy
	  $ env	LIBOMPTARGET_INFO=-1 ./zaxpy

	  Info:	Device supports	up to 65536 CUDA blocks	and 1024 threads with a	warp size of 32
	  Info:	Entering OpenMP	data region at zaxpy.cpp:14:1 with 2 arguments:
	  Info:	to(X[0:N])[16384]
	  Info:	tofrom(Y[0:N])[16384]
	  Info:	OpenMP Host-Device pointer mappings after block	at zaxpy.cpp:14:1:
	  Info:	Host Ptr	   Target Ptr	      Size (B) RefCount	Declaration
	  Info:	0x00007fff963f4000 0x00007fd225004000 16384    1	Y[0:N] at zaxpy.cpp:13:17
	  Info:	0x00007fff963f8000 0x00007fd225000000 16384    1	X[0:N] at zaxpy.cpp:13:11
	  Info:	Entering OpenMP	kernel at zaxpy.cpp:6:1	with 4 arguments:
	  Info:	firstprivate(N)[8] (implicit)
	  Info:	use_address(Y)[0] (implicit)
	  Info:	tofrom(D)[16] (implicit)
	  Info:	use_address(X)[0] (implicit)
	  Info:	Mapping	exists (implicit) with HstPtrBegin=0x00007ffe37d8be80,
		TgtPtrBegin=0x00007f90ff004000,	Size=0,	updated	RefCount=2, Name=Y
	  Info:	Mapping	exists (implicit) with HstPtrBegin=0x00007ffe37d8fe80,
		TgtPtrBegin=0x00007f90ff000000,	Size=0,	updated	RefCount=2, Name=X
	  Info:	Launching kernel __omp_offloading_fd02_c2c4ac1a__Z5daxpyPNSt3__17complexIdEES2_S1_m_l6
		with 8 blocks and 128 threads in SPMD mode
	  Info:	OpenMP Host-Device pointer mappings after block	at zaxpy.cpp:6:1:
	  Info:	Host Ptr	   Target Ptr	      Size (B) RefCount	Declaration
	  Info:	0x00007fff963f4000 0x00007fd225004000 16384    1	Y[0:N] at zaxpy.cpp:13:17
	  Info:	0x00007fff963f8000 0x00007fd225000000 16384    1	X[0:N] at zaxpy.cpp:13:11
	  Info:	Exiting	OpenMP data region at zaxpy.cpp:14:1 with 2 arguments:
	  Info:	to(X[0:N])[16384]
	  Info:	tofrom(Y[0:N])[16384]

       From this information, we can see the OpenMP kernel being  launched  on
       the  CUDA device	with enough threads and	blocks for all 1024 iterations
       of the loop in simplified SPMD Mode. The	information  from  the	OpenMP
       data  region shows the two arrays X and Y being copied from the host to
       the device. This	creates	an entry in the	host-device mapping table  as-
       sociating  the host pointers to the newly created device	data. The data
       mappings	in the OpenMP device kernel show the  default  mappings	 being
       used for	all the	variables used implicitly on the device. Because X and
       Y are already mapped in the device's table, no new entries are created.
       Additionally, the default mapping shows that D will be copied back from
       the  device  once  the  OpenMP device kernel region ends	even though it
       isn't written to. Finally, at the end of	the OpenMP data	region the en-
       tries for X and Y are removed from the table.

   Errors:
       libomptarget provides error messages when the program fails inside  the
       OpenMP  target  region.	Common	causes	of failure could be an invalid
       pointer access, running out of device memory, or	trying to offload when
       the device is busy. If the application was built	with debugging symbols
       the error messages will additionally provide the	source location	of the
       OpenMP target region.

       For example, consider the following code	that implements	a simple  par-
       allel  reduction	on the GPU. This code has a bug	that causes it to fail
       in the offloading region.

	  #include <cstdio>

	  double sum(double *A,	std::size_t N) {
	    double sum = 0.0;
	  #pragma omp target teams distribute parallel for reduction(+:sum)
	    for	(int i = 0; i <	N; ++i)
	      sum += A[i];

	    return sum;
	  }

	  int main() {
	    const int N	= 1024;
	    double A[N];
	    sum(A, N);
	  }

       If this code is compiled	and run, there will be an error	message	 indi-
       cating what is going wrong.

	  $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O3 -gline-tables-only sum.cpp -o	sum
	  $ ./sum

	  CUDA error: Error when copying data from device to host.
	  CUDA error: an illegal memory	access was encountered
	  Libomptarget error: Copying data from	device failed.
	  Libomptarget error: Call to targetDataEnd failed, abort target.
	  Libomptarget error: Failed to	process	data after launching the kernel.
	  Libomptarget error: Run with LIBOMPTARGET_INFO=4 to dump host-target pointer mappings.
	  sum.cpp:5:1: Libomptarget error 1: failure of	target construct while offloading is mandatory

       This shows that there is an illegal memory access occurring inside
       the OpenMP target region once execution has moved to the CUDA device,
       suggesting a segmentation fault. This then causes a chain reaction of
       failures in libomptarget. Another message suggests using the
       LIBOMPTARGET_INFO environment variable as described in Environment
       Variables. Doing so prints the state of the host-target pointer
       mappings at the time of failure.

	  $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O3 -gline-tables-only sum.cpp -o	sum
	  $ env	LIBOMPTARGET_INFO=4 ./sum

	  info:	OpenMP Host-Device pointer mappings after block	at sum.cpp:5:1:
	  info:	Host Ptr	   Target Ptr	      Size (B) RefCount	Declaration
	  info:	0x00007ffc058280f8 0x00007f4186600000 8	       1	sum at sum.cpp:4:10

       This tells us that the only data	mapped between the host	and the	device
       is  the	sum variable that will be copied back from the device once the
       reduction has ended. There is no	entry mapping the host array A to  the
       device.	In  this  situation, the compiler cannot determine the size of
       the array at compile time so it will simply assume that the pointer  is
       mapped  on the device already by	default. The solution is to add	an ex-
       plicit map clause in the	target region.

	  double sum(double *A,	std::size_t N) {
	    double sum = 0.0;
	  #pragma omp target teams distribute parallel for reduction(+:sum) map(to:A[0 : N])
	    for	(int i = 0; i <	N; ++i)
	      sum += A[i];

	    return sum;
	  }

   OpenMP in LLVM --- Offloading Design
   OpenMP Target Offloading ---	SPMD Mode
   OpenMP Target Offloading ---	Generic	Mode
   LLVM/OpenMP Target Host Runtime Plugins (libomptarget.rtl.XXXX)
   Remote Offloading Plugin:
       The remote offloading plugin permits the	execution of OpenMP target re-
       gions on	devices	in remote hosts	in addition to the  devices  connected
       to  the	local  host. All target	devices	on the remote host will	be ex-
       posed to	the application	as if they were	local devices,	that  is,  the
       remote  host  CPU  or its GPUs can be offloaded to with the appropriate
       device number. If the server is running on the same host, each device
       may be identified twice: once through the local device plugins and
       once through the device plugins that the server application has
       access to.

       This plugin consists  of	 libomptarget.rtl.rpc.so  and  openmp-offload-
       ing-server which	should be running on the (remote) host.	The server ap-
       plication does not have to be running on	a remote host, and can instead
       be  used	 on  the same host in order to debug memory mapping during of-
       floading.  These	are implemented	via gRPC/protobuf so  these  libraries
       are  required  to  build	and use	this plugin. The server	must also have
       access to the necessary target-specific plugins in order	to perform the
       offloading.

       Due to the experimental nature of this plugin, the CMake variable
       LIBOMPTARGET_ENABLE_EXPERIMENTAL_REMOTE_PLUGIN must be set in order
       to build it. For example, the RPC plugin is not designed to be
       thread-safe: the server cannot handle offloading from multiple
       applications concurrently (it is synchronous) and will terminate
       after a single execution. Note that openmp-offloading-server is
       unable to remote offload onto a remote host itself and will error out
       if this is attempted.

       Remote offloading is configured via environment variables at runtime of
       the OpenMP application:

	  • LIBOMPTARGET_RPC_ADDRESS=<Address>:<Port>

	  • LIBOMPTARGET_RPC_ALLOCATOR_MAX=<NumBytes>

	  • LIBOMPTARGET_BLOCK_SIZE=<NumBytes>

	  • LIBOMPTARGET_RPC_LATENCY=<Seconds>

   LIBOMPTARGET_RPC_ADDRESS
       The address and port at which the server is running. This needs to be
       set for both the server and the application; the default is
       0.0.0.0:50051. A single OpenMP executable can offload onto multiple
       remote hosts by setting this to a comma-separated list of addresses.

   LIBOMPTARGET_RPC_ALLOCATOR_MAX
       Once this number of bytes has been allocated, the protobuf allocator
       will be cleared. This can be set for both endpoints.

   LIBOMPTARGET_BLOCK_SIZE
       This  is	 the  maximum  size  of	 a single message while	streaming data
       transfers between the two endpoints and can be set for both endpoints.

   LIBOMPTARGET_RPC_LATENCY
       This is the maximum amount of time the client will wait for a  response
       from the	server.
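
       As a usage sketch (the exact server invocation may differ), one can
       start openmp-offloading-server on the remote host and point the
       application at it; 192.168.1.10:50051 and your-application are
       placeholders:

	  remote$ openmp-offloading-server
	  local$ env LIBOMPTARGET_RPC_ADDRESS=192.168.1.10:50051 ./your-application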

   LLVM/OpenMP Target Device Runtime (libomptarget-ARCH-SUBARCH.bc)

OPENMP OPTIMIZATIONS IN	LLVM
       LLVM,  since version 11 (12 Oct 2020), has an OpenMP-Aware optimization
       pass as well as the ability to perform  "scalar	optimizations"	across
       OpenMP region boundaries.

   OpenMP-Aware	Optimizations
   OpenMPOpt
   Resources
       • 2020 LLVM Developers Meeting: "(OpenMP) Parallelism-Aware
	 Optimizations" https://youtu.be/gtxWkeLCxmU

       • 2019 EuroLLVM Developers Meeting: "Compiler Optimizations for
	 (OpenMP) Target Offloading to GPUs" https://youtu.be/3AbS82C3X30

   OpenMP-Unaware Optimizations
   Resources
       • 2018 LLVM Developers Meeting: "Optimizing Indirections, using
	 abstractions without remorse" https://youtu.be/zfiHaPaoQPc

       • 2019 LLVM Developers Meeting: "The Attributor: A Versatile
	 Inter-procedural Fixpoint Iteration Framework"
	 https://youtu.be/CzWkc_JcfS0

       LLVM has an elaborate ecosystem around analysis and optimization
       remarks issued during compilation. The remarks can be enabled from
       the clang frontend [1] [2] in various formats [3] [4] to be used by
       tools such as opt-viewer or llvm-opt-report (dated).

       The  OpenMP  optimizations in LLVM have been developed with remark sup-
       port as a priority. For a list of OpenMP	specific remarks and more  in-
       formation on them, please refer to OpenMP Optimization Remarks.

        [1]
	 https://clang.llvm.org/docs/UsersManual.html#options-to-emit-optimization-reports

        [2]
	 https://clang.llvm.org/docs/ClangCommandLineReference.html#diagnostic-flags

        [3]
	 https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-foptimization-record-file

        [4]
	 https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang1-fsave-optimization-record

OPENMP OPTIMIZATION REMARKS
       The OpenMP-Aware optimization pass is able to generate compiler
       remarks for performed and missed optimizations. To emit them, pass
       -Rpass=openmp-opt, -Rpass-analysis=openmp-opt, and
       -Rpass-missed=openmp-opt to the Clang invocation. For more
       information on the remark system and its features, consult the Clang
       documentation:

       • Clang options to emit optimization reports

       • Clang diagnostic and remark flags

       • The -foptimization-record-file flag and the
	 -fsave-optimization-record flag

       Some OpenMP remarks start with a "tag", like [OMP100], which
       indicates that there is further information about them on this page.
       To jump directly to the respective entry, navigate to
       https://openmp.llvm.org/docs/remarks/OptimizationRemarks.html#ompXXX
       where XXX is the three-digit code shown in the tag.

					----

   [OMP100] Potentially	unknown	OpenMP target region caller
       A function remark that indicates	the function, when compiled for	a GPU,
       is potentially called from outside the translation unit.	 Note  that  a
       remark  is  only	 issued	 if  we	tried to perform an optimization which
       would require us	to know	all callers on the GPU.

       To facilitate OpenMP semantics on GPUs we provide a  runtime  mechanism
       through	which  the code	that makes up the body of a parallel region is
       shared with the threads in the team. Generally we use  the  address  of
       the outlined parallel region to identify	the code that needs to be exe-
       cuted.  If we know all target regions that reach	the parallel region we
       can avoid this function pointer passing scheme and  often  improve  the
       register usage on the GPU. However, if a parallel region on the GPU
       is in a function with external linkage we may not know all callers
       statically. If there are outside callers within target regions, this
       remark is to be ignored. If there are no such callers, users can
       modify the linkage and thereby help optimization with a static or
       __attribute__((internal)) function annotation. If changing the
       linkage is impossible, e.g., because there are outside callers on the
       host, one can split the function into an externally visible interface
       which is not compiled for the target and an internal implementation
       which is compiled for the target and should be called from within the
       target region.
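
       As an illustrative sketch (zero and zero_impl are hypothetical
       names), a parallel region in a function with external linkage can be
       moved into a static helper so that all GPU callers are known:

	  // Internal implementation compiled for the target; with internal
	  // linkage, all callers on the GPU are known statically.
	  static void zero_impl(int *A, int N) {
	  #pragma omp parallel for
	    for (int i = 0; i < N; ++i)
	      A[i] = 0;
	  }

	  // Externally visible interface, not compiled for the target;
	  // it simply forwards to the internal implementation.
	  void zero(int *A, int N) { zero_impl(A, N); }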

       Dealing	with  OpenMP can be complicated. For help with the setup of an
       OpenMP (offload)	capable	compiler  toolchain,  its  usage,  and	common
       problems, consult the Support and FAQ page.

       We  also	 encourage  everyone  interested  in OpenMP in LLVM to get in-
       volved.

SUPPORT, GETTING INVOLVED, AND FAQ
       Please do not hesitate to reach out to us via openmp-dev@lists.llvm.org
       or join one of our regular calls. Some common questions are answered in
       the FAQ.

   Calls
   OpenMP in LLVM Technical Call
       • Development updates on OpenMP (and OpenACC) in the LLVM Project,
	 including Clang, optimization, and runtime work.

       • Join OpenMP in LLVM Technical Call.

       • Time: Weekly call, every Wednesday at 7:00 AM Pacific time.

       • Meeting minutes are here.

       • Status tracking page.

   OpenMP in Flang Technical Call
       • Development updates on OpenMP and OpenACC in the Flang Project.

       • Join OpenMP in Flang Technical Call.

       • Time: Weekly call, every Thursday at 8:00 AM Pacific time.

       • Meeting minutes are here.

       • Status tracking page.

   FAQ
       NOTE:
	  The  FAQ  is	a work in progress and most of the expected content is
	  not yet available. While you can expect changes, we  always  welcome
	  feedback    and    additions.	   Please   contact,   e.g.,   through
	  openmp-dev@lists.llvm.org.

   Q: How to contribute	a patch	to the webpage or any other part?
       All patches go through the regular LLVM review process.

   Q: How to build an OpenMP offload capable compiler?
       To build an effective OpenMP offload capable compiler, only one extra
       CMake option, LLVM_ENABLE_RUNTIMES="openmp", is needed when building
       LLVM (generic information about building LLVM is available here).
       Make sure all backends that are targeted by OpenMP are enabled. By
       default, Clang will be built with all backends enabled.

       If your build machine is	not the	target machine or automatic  detection
       of the available	GPUs failed, you should	also set:

       • CLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_XX where XX is the architecture
	 of your GPU, e.g., 80.

       • LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=YY where YY is the numeric
	 compute capability of your GPU, e.g., 75.
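
       Putting these together, a configuration along the following lines (a
       sketch; paths and the architecture values are placeholders) builds an
       offload capable compiler:

	  $ cmake -G Ninja ../llvm \
	      -DLLVM_ENABLE_PROJECTS="clang" \
	      -DLLVM_ENABLE_RUNTIMES="openmp" \
	      -DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_80 \
	      -DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=80
	  $ ninja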

       NOTE:
	  The compiler that generates the offload  code	 should	 be  the  same
	  (version)  as	 the  compiler that builds the OpenMP device runtimes.
	  The OpenMP host runtime can be built by a different compiler.

   Q: Does OpenMP offloading support work in pre-packaged LLVM releases?
       For now,	the answer is most likely no. Please see Q: How	 to  build  an
       OpenMP offload capable compiler?.

   Q:  Does  OpenMP offloading support work in packages	distributed as part of
       my OS?
       For now,	the answer is most likely no. Please see Q: How	 to  build  an
       OpenMP offload capable compiler?.

   Q:  Does Clang support <math.h> and <complex.h> operations in OpenMP	target
       on GPUs?
       Yes, LLVM/Clang allows math functions and complex arithmetic inside  of
       OpenMP target regions that are compiled for GPUs.

       Clang provides a set of wrapper headers that are found first when
       math.h and complex.h (for C), cmath and complex (for C++), or
       similar headers are included by the application. These wrappers will
       eventually include the system version of the corresponding header
       file after setting up a target device specific environment. The fact
       that the system header is included is important because system
       headers differ based on the architecture and operating system and may
       contain preprocessor, variable, and function definitions that need to
       be available in the target region regardless of the targeted device
       architecture. However, various functions may require specialized
       device versions, e.g., sin, and others are only available on certain
       devices, e.g., __umul64hi. To provide "native" support for math and
       complex on the respective architecture, Clang will wrap the "native"
       math functions, e.g., as provided by the device vendor, in an OpenMP
       begin/end declare variant. These functions will then be picked up
       instead of the host versions while host-only variables and function
       definitions are still available. Complex arithmetic and functions
       are supported through a similar mechanism. It is worth noting that
       this support requires extensions to the OpenMP begin/end declare
       variant context selector that are exposed through LLVM/Clang to the
       user as well.
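
       As a small sketch of what this enables (not taken from the
       documentation), standard math and complex operations can be used
       directly inside a target region:

	  #include <cmath>
	  #include <complex>

	  int main() {
	    std::complex<double> z{3.0, 4.0};
	    double r = 0.0;
	  // Map z to the device and r back; std::abs and std::sin resolve
	  // to device variants via the wrapper headers described above.
	  #pragma omp target map(to : z) map(tofrom : r)
	    r = std::abs(z) + std::sin(z.real());
	    return r > 5.0 ? 0 : 1;
	  }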

   Q: What is a	way to debug errors from mapping memory	to a target device?
       An experimental way to debug these errors is to use remote process  of-
       floading.    By	 using	 libomptarget.rtl.rpc.so  and  openmp-offload-
       ing-server, it is possible to explicitly	perform	memory	transfers  be-
       tween  processes	 on  the host CPU and run sanitizers while doing so in
       order to	catch these errors.

       The current (in-progress) release notes can be found here, while
       release notes for releases, starting with LLVM 12, are available on
       the Download Page.

OPENMP 12.0.0 RELEASE NOTES
       WARNING:
	  These	 are  in-progress  notes for the upcoming LLVM 12.0.0 release.
	  Release notes	for previous releases can be  found  on	 the  Download
	  Page.

   Introduction
       This  document  contains	 the release notes for the OpenMP runtime, re-
       lease 12.0.0.  Here we describe the status of OpenMP,  including	 major
       improvements  from  the	previous  release.  All	OpenMP releases	may be
       downloaded from the LLVM	releases web site.

   Non-comprehensive list of changes in	this release
       • Extended the libomptarget API functions to include source location
	 information  and OpenMP target	mapper support.	This allows libomptar-
	 get to	know the source	location of the	OpenMP region it is executing,
	 as well as the	name and declarations of all the variables used	inside
	 the region. Each function generated now uses its mapper variant.  The
	 old  API calls	now call into the new API functions with nullptr argu-
	 ments for backwards compatibility with	old binaries. Source  location
	 information  for  libomptarget	is now generated by Clang at any level
	 of debugging information.

       • Added improved error messages for libomptarget and CUDA plugins.
	 Error messages are now presented without requiring a debug build of
	 libomptarget. The newly added source location information can also
	 be used to identify which OpenMP target region the failure occurred
	 in. More information can be found here.

       • Added additional environment variables to control output from the
	 libomptarget runtime library: LIBOMPTARGET_PROFILE to generate time
	 profile output similar to Clang's -ftime-trace option;
	 LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD to set the threshold size up
	 to which the libomptarget memory manager will handle allocations;
	 and LIBOMPTARGET_INFO to allow the user to request certain
	 information from the libomptarget runtime using a 32-bit field. A
	 full description of each environment variable is given here.

       • target nowait is now supported via hidden helper tasks, which are
	 tasks not bound to any parallel region. A hidden helper team with a
	 number of threads is created when the first hidden helper task is
	 encountered. The number of threads can be configured via the
	 environment variable LIBOMP_NUM_HIDDEN_HELPER_THREADS; by default
	 it is 8. If LIBOMP_NUM_HIDDEN_HELPER_THREADS=0, hidden helper tasks
	 are disabled and fall back to regular OpenMP tasks. They can also
	 be disabled by setting the environment variable
	 LIBOMP_USE_HIDDEN_HELPER_TASK=OFF (see the usage sketch after this
	 list).

       • deviceRTLs for the NVPTX platform are now CUDA-free and consist
	 mostly of plain OpenMP code; target-dependent parts are implemented
	 with Clang/LLVM/NVVM intrinsics. The CUDA SDK was also dropped as a
	 build dependence, which means the device runtime can also be built
	 on a CUDA-free system. However, this is disabled by default; set
	 the CMake variable LIBOMPTARGET_BUILD_NVPTX_BCLIB=ON to enable the
	 build of the NVPTX device runtime on a CUDA-free system.
	 gcc-multilib and g++-multilib are required. If CUDA is found, the
	 device runtime will be built by default.

	  - The static NVPTX device runtime library (libomptarget-nvptx.a)
	    was dropped.

	 A bitcode library is required to build an OpenMP program. If the
	 library is not found in the default path or any of the paths
	 defined by LIBRARY_PATH, an error will be raised. Users can also
	 specify the path to the bitcode device library via
	 --libomptarget-nvptx-bc-path=.
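
	 As referenced above, a usage sketch for configuring hidden helper
	 threads (your-application is a placeholder):

	    $ env LIBOMP_NUM_HIDDEN_HELPER_THREADS=16 ./your-application
	    $ env LIBOMP_USE_HIDDEN_HELPER_TASK=OFF ./your-application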

AUTHOR
       unknown

COPYRIGHT
       2013-2025, LLVM/OpenMP

				 Apr 17, 2025			 LLVMOPENMP(1)
