EMILUA(7)		       Emilua reference			     EMILUA(7)

NAME
       Emilua -	Lua execution engine

DESCRIPTION
	   Warning

	   The target audience for this document is C++ programmers who want
	   to delve into the project's code, not lua users. Native plug-in
	   authors should also read this page.

       The intent of this page is not to detail	every internal of the project,
       but just	to give	an overview of the architecture. Details change
       quickly and documentation would lag behind, so they're avoided.

       Once you have read it, you should be familiar with the assumptions made
       throughout the project, and with how to interact with the native code.

       We assume that you already have some familiarity	with the lua C API and
       Boost.Asio.

MULTIPLE LUA VMS
       The project allows multiple OS threads to call asio::io_context::run(),
       so lua VMs can jump from	one thread to another freely, but they will
       always refer to the same	asio::io_context and each will be protected by
       its own ASIO strand.

	   -- Instantiates a new lua VM	that shares
	   -- the caller's `asio::io_context`
	   spawn_vm(module)

	   -- Instantiates a new lua VM	in a new
	   -- thread with its own `asio::io_context`
	   spawn_vm{ module=module, inherit_context=false }

       You must	specify	a lua module name to run in the	new VM,	not a
       function. The module will be loaded and run in the new VM.

       The only way for two different lua VMs to communicate is message
       passing. The channels are given when you instantiate the extra VMs. The
       channels accept a range of different values and will deep-copy them.
       You can also send references to IO objects, but the original references
       will be rendered unusable (their metatables are unset). Take care not
       to let objects with pending operations be sent over (an EBUSY-like
       situation, but a dedicated error code should be created just for that).

       Neither synchronization primitives (such as mutexes) nor fiber handles
       can be sent over the channels, so by implication they can't be used to
       synchronize (or send cancellation requests to) fibers running in
       different lua VMs.

       You can also send a channel over a channel. This will only send the
       channel address over and will allow complex routing among the lua VMs.
       If you send a channel's rx-end, the other side will receive a
       tx-channel anyway. On the C++ side, we need to implement an MPSC
       strand-based channel.

       These characteristics should be enough to implement actor patterns. It
       is not the job of emilua to enforce good patterns on applications; the
       patterns can be composed purely on the lua side.

	   -- Spawn extra threads to the
	   -- caller's `asio::io_context`
	   spawn_context_threads(count)

       Leaving the actor model aside for a moment, it's	now easy to have
       threads with work-stealing (e.g.	8 lua VMs sharing the same
       asio::io_context	running	on 4 threads) so you don't have	to worry about
       load-balancing.
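
       Conceptually, spawn_context_threads() maps to extra OS threads calling
       run() on the shared io_context. The sketch below (plain Boost.Asio, not
       Emilua's actual code) shows the shape of it; the 4-thread count and the
       setup are illustrative.

	   #include <boost/asio.hpp>
	   #include <thread>
	   #include <vector>

	   int main()
	   {
	       boost::asio::io_context ioctx;

	       // Keep run() from returning while the program sets things up.
	       auto work = boost::asio::make_work_guard(ioctx);

	       // 4 OS threads servicing the same io_context. Each lua VM would
	       // be protected by its own strand (boost::asio::make_strand), so
	       // a VM never runs concurrently with itself, yet any idle thread
	       // can pick up any VM's ready work.
	       std::vector<std::thread> threads;
	       for (int i = 0; i != 4; ++i)
		   threads.emplace_back([&ioctx] { ioctx.run(); });

	       // ... spawn lua VMs / post work through their strands here ...

	       work.reset();            // allow run() to return once work dries up
	       for (auto& t : threads) t.join();
	   }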

INSIDE A SINGLE	LUA VM
       When you	issue some IO operation	(including chan:receive()), the
       calling fiber will suspend, but other fibers from the same lua VM are
       allowed to kick in (cooperative multitasking). Fibers can share state
       with each other safely (and free	from contention	problems) as-if	the
       program was single-threaded.

	   -- Spawn a new fiber	on this	lua VM
	   spawn(fn)

       You can use the fiber handle just like you'd use a thread handle. There
       are join(), detach() and cancel().

       All sync	primitives obey	some characteristics thanks to the
       restrictions we've laid out:

          They	always live in the same	strand.	They never migrate strands.

          They	don't synchronize with fibers from other strands (except for
	   channels, but that's	another	story).

       Given these conditions, it's now	easier to implement and	reason about
       the C++ code.

       Only the	C++ code that suspended	the fiber can resume it	back. If the
       operation should	be cancellable,	the async op should set	an interrupter
       before suspending the fiber. No other code from the runtime will	wake
       this fiber up. Once the interrupter is called, it'll be cleared
       automatically to	prevent	further	complications on the async op
       implementation. The completion handler should also clear	the
       interrupter to make sure	it won't be (wrongly) reused for other
       operations.

       A good level of serialization can be achieved by exploiting these
       properties, and it simplifies the implementation a lot. For one, you
       know no other code will wake the fiber up, so you can just call
       io_obj.cancel() in the interrupter and map asio::error::operation_aborted
       to errc::fiber_canceled in the completion handler. A single handler (and
       no other) will take care of waking the fiber. There is no race to deal
       with here or anything alike.

       A lot of the boilerplate is already handled by the prologue/epilogue
       functions from vm_context.
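
       A hedged sketch of that pattern follows. The Boost.Asio calls are real;
       vm_context, set_interrupter(), clear_interrupter() and the resume
       helpers are placeholders for whatever the prologue/epilogue code
       actually provides, so treat this as shape, not as the real API.

	   // Sketch only: vm_context and the resume helpers below are
	   // hypothetical stand-ins, not Emilua's real declarations.
	   extern "C" {
	   #include <lua.h>
	   }
	   #include <boost/asio.hpp>
	   #include <chrono>
	   #include <memory>

	   namespace asio = boost::asio;

	   int sleep_for_sketch(lua_State* fiber, vm_context& vm_ctx,
				std::shared_ptr<asio::steady_timer> timer,
				std::chrono::milliseconds dur)
	   {
	       timer->expires_after(dur);

	       // Cancelling the fiber now means cancelling this one pending
	       // operation. The runtime clears the interrupter automatically
	       // once it fires.
	       vm_ctx.set_interrupter([timer]() { timer->cancel(); });

	       timer->async_wait(asio::bind_executor(
		   vm_ctx.strand(),
		   [&vm_ctx, fiber](const boost::system::error_code& ec) {
		       // Don't let a stale interrupter leak into later ops.
		       vm_ctx.clear_interrupter();
		       if (ec == asio::error::operation_aborted) {
			   // Only our interrupter cancels this timer, so this
			   // maps one-to-one to "the fiber was cancelled".
			   resume_with_error(vm_ctx, fiber, errc::fiber_canceled);
			   return;
		       }
		       resume(vm_ctx, fiber);
		   }));

	       // Suspend the calling fiber; only the handler above wakes it.
	       return lua_yield(fiber, 0);
	   }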

USERDATA PRACTICES
       Besides the common practices to create custom objects through userdata,
       Emilua (IO) objects will	also:

          Hide the metatable. By doing that, user code is prevented from
	   changing the metatable (which is just an ordinary table after all)
	   that native code relies on (see the sketch after this list).

          Assume lua_setmetatable() is an indivisible operation for userdata
	   (i.e. if it fails, it doesn't set a metatable nor any __gc
	   metamethod). This assumption is important to simplify object
	   management by getting rid of all the pre-initialization tricks
	   taught in Roberto's manuals and their associated complexities.

          Assume lua_setmetatable() reports errors through exceptions (i.e.
	   it always returns 1). This is a superset of the previous point and
	   it is guaranteed by the VM[1]. We don't really care as much about
	   this	point, but as it is guaranteed,	the assumption described in
	   the previous	point (which we	do care	about) is covered as well.
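
       The following sketch shows these practices with the plain Lua C API (no
       Emilua-specific helpers). Setting a __metatable field is one
       conventional way to hide the metatable; the object and its fields are
       illustrative.

	   #include <new>

	   extern "C" {
	   #include <lua.h>
	   }

	   struct my_io_object { int fd = -1; };

	   static int my_io_object_gc(lua_State* L)
	   {
	       auto* obj = static_cast<my_io_object*>(lua_touserdata(L, 1));
	       obj->~my_io_object();
	       return 0;
	   }

	   int push_new_my_io_object(lua_State* L)
	   {
	       auto* obj = static_cast<my_io_object*>(
		   lua_newuserdata(L, sizeof(my_io_object)));
	       new (obj) my_io_object{};

	       lua_newtable(L);                    // the metatable
	       lua_pushliteral(L, "__metatable");
	       lua_pushboolean(L, 0);              // getmetatable() sees false,
	       lua_rawset(L, -3);                  // setmetatable() errors out
	       lua_pushliteral(L, "__gc");
	       lua_pushcfunction(L, my_io_object_gc);
	       lua_rawset(L, -3);

	       // Last and "indivisible" step: if this throws, no metatable
	       // (and thus no __gc) was attached to a half-built object.
	       lua_setmetatable(L, -2);
	       return 1;
	   }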

C++ ASYNC OPERATIONS
       Let's begin with	require().

       require()'ing a module is also an async operation which will suspend
       the caller fiber. Every module has its own isolated environment (i.e. a
       new lua thread is created for every module and that thread's
       environment is configured to use	a separate lua table) sharing the same
       lua VM. The module's entry point is a user-provided chunk of source
       code evaluated to prepare the environment with the names that should be
       exported to the caller fiber. But this preparatory step may not be
       immediately ready and may need to call other async operations. The rule
       we define to mark a module as loaded and	ready is when its main fiber
       finishes	(synchronization code similar to fiber:join()).

       To further enforce a more manageable project layout, it is only allowed
       to import new modules from the main fiber. This may introduce a slow
       startup in some project layouts,	but:

          It is simpler to reason about the relationship of exported/imported
	   names if we restrict them to the same main fiber. One use we make of
	   this feature is detecting whether the inbox module was loaded and
	   closing it if not.

          We are explicitly not aiming	for remote modules (e.g. JS running on
	   a web browser), so we don't need to care about slow startup
	   happening in	this event.

          In the cases	where some module startup is indeed slow, the module
	   programmer himself can adopt	lazy loading techniques	within his
	   module's functions to have a	quick startup with respect to the rest
	   of the application.

       Modules evaluate	only once and are cached. We never unload them.	We
       keep a reference	to their lua thread for	as long	as the lua VM is
       active.

       Loading a module	forms a	loader-loaded relationship. This relationship
       builds a	chain that must	be checked when	a new module is	require()d (so
       we can for instance prevent cyclic imports). But	each module will have
       its own environment. This means the C++ function	that implements
       require() needs to check	lua-hidden state associated with the caller
       lua function (not a global one).	That's the module system state
       per-module.

   Note: Rule
       The per-module state is stored by using the module's main thread	as a
       key in the fibers table.	The fibers table is strong, but	this isn't a
       problem because the module shall	never be unloaded anyway. Code that
       unrefs fiber coroutines shall check whether the lua thread represents a
       module and skip removing	it from	the fibers table if so.

       We can't	store the module system	data directly at the thread
       environment because lua code can	change the thread environment by
       calling setfenv(0, table).

       We've already gone through the trickiest	parts and added	the most
       important restrictions to the table (no lua-related pun intended), so
       the remaining rules should be quick'n'easy to catch.

       When you initiate an async operation, the C++ side will copy the
       lua_State* to handle the completion (or cancellation) later. However,
       any LUA_ERRMEM will trigger an emilua call to lua_close() and L may
       then be invalid when we later try to resume it. So the completion
       handler needs to check whether the VM is still valid before accessing
       it, and this is the purpose of the vm_context structure (also protected
       by the same strand as the VM).
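
       In handler form, the rule looks roughly like the sketch below.
       vm_context and its valid() accessor are stand-ins here; only the shape
       matters: hold a handle to vm_context rather than to the bare lua_State*,
       and re-check it under the VM's strand before touching the fiber.

	   // Sketch only: vm_context/valid() are hypothetical names.
	   void on_op_finished(std::shared_ptr<vm_context> vm_ctx,
			       lua_State* fiber,
			       const boost::system::error_code& ec)
	   {
	       // Runs through vm_ctx->strand(), so this check can't race with
	       // the code that closes the VM.
	       if (!vm_ctx->valid()) {
		   // lua_close() already happened (e.g. after LUA_ERRMEM);
		   // `fiber` is dangling and must not be touched.
		   return;
	       }
	       // ... push results onto `fiber` and resume it ...
	   }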

this_fiber
       As long as lua code is executing, there is a current fiber, and this
       property stays unchanged for as long as control doesn't return to the
       host.

       transparent, adj.
	   Being or pertaining to an existing, nontangible object.

	      It's there, but you can't	see it
		-- IBM System/360 announcement,	1964

       virtual,	adj.
	   Being or pertaining to a tangible, nonexistent object.

	      I	can see	it, but	it's not there.
		-- Lady	Macbeth

       This property is mostly transparent to lua code. Which is to say that
       the programmer is aware of this property, but there isn't a tangible
       object that can be tracked back to this_fiber. This is mostly true, but
       there is a quite tangible this_fiber lua global object that the user
       can inspect -- exposed at the beginning of the first thread execution.

       However,	this_fiber being a global is shared among all the fibers, so
       it can't	point to a single fiber. Instead, it will query	which fiber is
       current and do operations on it.

       C++ async ops will always store which fiber is current to know how to
       resume it back. And before a fiber is resumed, this info is stored at a
       known index in the lua registry so future async ops will get to know
       about it too. The reason we can't rely on the L argument passed to C
       functions registered in the VM, and the current fiber needs to be
       remembered instead, is that L will point to the wrong lua thread as
       soon as the user wraps some function in a coroutine.
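
       A minimal sketch of the "known registry index" idea, using only the
       plain C API. Using a static variable's address as a light-userdata key
       is just one conventional way to get a private registry slot; the real
       runtime may store this differently.

	   extern "C" {
	   #include <lua.h>
	   }

	   static char current_fiber_key;  // address doubles as registry key

	   // Remember `fiber` as the current fiber. The registry is shared by
	   // every lua_State of the same VM, so any of them can read it back.
	   void set_current_fiber(lua_State* fiber)
	   {
	       lua_pushlightuserdata(fiber, &current_fiber_key);
	       lua_pushthread(fiber);      // pushes `fiber` onto its own stack
	       lua_rawset(fiber, LUA_REGISTRYINDEX);
	   }

	   lua_State* get_current_fiber(lua_State* any_state_of_the_vm)
	   {
	       lua_State* L = any_state_of_the_vm;
	       lua_pushlightuserdata(L, &current_fiber_key);
	       lua_rawget(L, LUA_REGISTRYINDEX);
	       lua_State* fiber = lua_tothread(L, -1);
	       lua_pop(L, 1);
	       return fiber;
	   }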

       This design works well because we don't mix responsibilities of the
       scheduler with user code (as is the case for Fiber#resume in Ruby, which
       would be better served by a Fiber#spawn() that accepts post/dispatch
       execution policies and would avoid the unsound (un-)parking ideas
       altogether).

ASYNCHRONOUS EVENT NOTIFICATION
       Some events are intrusive and will be generated even when no
       thread/fiber asked for them. The classical example is UNIX signals. A
       sighandler must be registered to handle them, but that raises the
       question: from which thread are these functions called? In the C world
       there are multiple answers:

       SIGEV_SIGNAL
	   The handler will be called asynchronously from any thread. That
	   means a lot of restrictions to what a sighandler can	do.

       SIGEV_THREAD
	   The handler will be called from an unspecified thread. Now we have
	   way less restrictions, but some still exist (e.g. unsafe
	   thread-local	variables and thread cancelability state).

       SIGEV_KEVENT
	   The golden standard for event multiplexing in the C world.

       Generally the need for asynchronous events stems from bad design and
       should be avoided. However, when integrating lua code with existing
       libraries we must deal with asynchronous events now and then. Emilua
       reserves a lua coroutine/thread for which no suspension is ever allowed
       and that will give the lua user a mix of the SIGEV_SIGNAL and
       SIGEV_THREAD restrictions. From the handler the user can notify a
       condition variable to achieve friction-less handling from a different
       fiber, similar to what SIGEV_KEVENT enables.

       From the	C++ side, one just needs to get	the asynchronous event (lua)
       thread and rely on lua_pcall() (no need for complex lua_resume()
       handling, nor fiber APIs).

LUA_ERRMEM
       Lua code	cannot recover from allocation failures. As an example (and
       single-VM only):

	   my_mutex:lock()
	   scope_cleanup_push(function() my_mutex:unlock() end)

       If the VM fails to allocate the closure passed to scope_cleanup_push(),
       my_mutex	will be	kept locked and	the lua	code inside that VM will be in
       an unrecoverable	state. There's no pattern or ordering to make resource
       management work here as allocation failures can happen almost anywhere
       and we then inherit some	constraints and	reasoning from preemptive
       scheduling. The only option (and	this applies to	any allocation failure
       reported	by the lua VM when running arbitrary user code)	is to
       terminate the VM	from the C++-side.

       When lua_close()	is called, there is no guarantee pending operations
       will be canceled	as they	might hold strong references to	the underlying
       IO object preventing its	destructor from	getting	called.	Therefore, the
       vm_context structure also holds an intrusive container of polymorphic
       elements	which are destroyed after lua_close() is called	and can	be
       used to register cleanup code to avoid such leaks. If the operation
       finishes, the IO object is free to reclaim its own objects from this
       container and use them for other purposes.

       lua_CFunction objects should never call lua_close(). If they detect
       LUA_ERRMEM all they have	to do is to mark the flags field from
       vm_context and suspend the fiber. The host will take care of closing
       lua_State* and extra cleanup when it recovers control of	the thread.

       The other side of the coin is to detect LUA_ERRMEM. All interactions
       with the VM from the C API happen through the virtual stack, so
       naturally that's the first concern. You must not push anything on the
       stack if there's no free stack slot available. To check for such slot
       space, there's lua_checkstack().
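
       In its simplest form the rule looks like this (a small sketch; the
       5.1-specific caveat about lua_checkstack() itself is discussed right
       below):

	   bool push_three_strings(lua_State* L, const char* a, const char* b,
				   const char* c)
	   {
	       if (!lua_checkstack(L, 3))
		   return false;      // couldn't grow the stack; nothing pushed
	       lua_pushstring(L, a);  // [-0, +1, m]: each push may still throw
	       lua_pushstring(L, b);  // on ENOMEM while allocating the string
	       lua_pushstring(L, c);
	       return true;
	   }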

       The usual C function signature is not enough to convey all the
       semantics required by the Lua C API. In the Functions and Types section
       of the manual <http://www.lua.org/manual/5.1/manual.html#3.7>, we find
       the following information:

	  Here we list all functions and types from the	C API in
	  alphabetical order. Each function has	an indicator like this:
	  [-o, +p, x]

	  [...]	The third field, x, tells whether the function may throw
	  errors: '-' means the	function never throws any error; 'm'
	  means	the function may throw an error	only due to not	enough
	  memory; 'e' means the	function may throw other kinds of
	  errors; 'v' means the	function may throw an error on purpose.

       The 5.1's signature for lua_checkstack()	is:

	   int lua_checkstack(lua_State	*L, int	extra);	// [-0,	+0, m]

       That's obviously	bogus. If lua_checkstack() can throw on	ENOMEM that
       means there is no possible safe interaction with	the VM.	That's --
       plain and simple	-- a bug. This bug was fixed in	Lua 5.2	when the
       signature changed to:

	   int lua_checkstack(lua_State	*L, int	extra);	// [-0,	+0, ]

	   Note

	   Lua 5.2 received a few other	improvements concerning	ENOMEM such as
	   obsoleting lua_cpcall() by introducing light	C functions. API-wise,
	   Lua 5.2 was a great release as it fixed many	shortcomings.

       You don't always	need to	call lua_checkstack() before doing anything
       thanks to at least LUA_MINSTACK free stack slots	being guaranteed for
       you when	the VM calls into your lua_CFunction objects. And here's where
       things start to get tricky. Consider the	following Lua code:

	   coroutine.wrap(function()
	       spawn(function()
		   print('Hello	World')
	       end)
	   end)()

       The underlying C	function implementing spawn() is exposed to 3
       different lua_State* handles:

       Current fiber
	   get_vm_context(L).current_fiber(). The one that calls
	   coroutine.wrap().

       Inner coroutine
	   The L parameter from	lua_CFunction. The one that calls spawn().

       New fiber
	   lua_newthread(L) return value. The one to print Hello World.

       If lua_error() is called	on L, the stack	for L will be in a completely
       deterministic state. Anything this lua_CFunction	object pushed on the
       stack will be popped and	the whole pcall()-chain	on the state L will be
       respected too. However lua_error() might	be called indirectly through
       other API functions. That's the signature for lua_newtable():

	   void	lua_newtable(lua_State *L); // [-0, +1,	m]

       As we've	seen previously:

	  'm' means the	function may throw an error only due to	not
	  enough memory

       Throw here means, roughly, a call to lua_error() (LUAI_THROW to be more
       accurate). That's the pcall()-chain and each lua_State has its own
       (this property won't change even if you compile the Lua VM as C++
       code). This independent pcall()-chain for each lua_State is not a
       limitation of the C API, but an accurate model of the underlying
       machinery happening in Lua code itself. Consider the following snippet:

	   c1 =	coroutine.create(function()
	       pcall(function()
		   -- ...
	       end)
	   end)

       If c1 is	suspended in the middle	of pcall(), it retains this private
       pcall()-chain that doesn't get mixed with pcall()-chains	from other
       coroutines (i.e.	the other lua_State* handles). Therefore the C API
       accurately maps the language behaviour on retaining a private
       pcall()-chain for each lua_State	and we can't expect any	different
       behaviour here really. Lua documentation	on the issue has been ironed
       out little-by-little throughout its releases. Lua 5.3 was the one to
       finally explicitly state	the behaviour we just described:

	  The panic function, as its name implies, is a	mechanism of
	  last resort. Programs	should avoid it. As a general rule, when
	  a C function is called by Lua	with a Lua state, it can do
	  whatever it wants on that Lua	state, as it should be already
	  protected. However, when C code operates on other Lua	states
	  (e.g., a Lua argument	to the function, a Lua state stored in
	  the registry,	or the result of lua_newthread), it should use
	  them only in API calls that cannot raise errors.
	    -- Lua 5.3 Reference
	    <http://www.lua.org/manual/5.3/manual.html#4.6>

       In short, that means our spawn() implementation that is exposed to the
       {L, current fiber, new fiber} triple would throw to the wrong
       pcall()-chain if it calls lua_newtable(new_fiber). The solution is to
       use lua_xmove() when necessary and maintain rigorous discipline as to
       which C API functions are called on foreign lua_State* handles, paying
       very special attention to their respective throw specifications (a
       sketch follows the quoted summary below). As for the discipline
       required, Rici Lake wrote a good summary on the lua-users wiki
       <http://lua-users.org/wiki/ErrorHandlingBetweenLuaAndCplusplus>:

	  There	are quite a number of API functions which will never
	  throw	a Lua error. API functions that	throw errors are
	  identified in	the reference manual as	of 5.1.3. First, none of
	  the stack adjustment functions throw errors; this includes
	  lua_pop, lua_gettop, lua_settop, lua_pushvalue, lua_insert,
	  lua_replace and lua_remove. If you provide incorrect indexes
	  to these functions, or you haven't called lua_checkstack, then
	  you're either	going to get garbage or	a segfault, but	not a
	  Lua error.

	  None of the functions	which push atomic data --
	  lua_pushnumber, lua_pushnil, lua_pushboolean and
	  lua_pushlightuserdata	ever throw an error. API functions which
	  push complex objects (strings, tables, closures, threads, full
	  userdata) may	throw a	memory error. None of the type enquiry
	  functions -- lua_is*,	lua_type and lua_typename -- will ever
	  throw	an error, and neither will the functions which set/get
	  metatables and environments. lua_rawget, lua_rawgeti and
	  lua_rawequal will also never throw an	error. Aside from
	  lua_tostring,	none of	the lua_to* functions will throw an
	  error, and you can avoid the possibility of lua_tostring
	  throwing an out of memory error by first checking that the
	  object is a string, using lua_type. lua_rawset and lua_rawseti
	  may throw an out of memory error. The	functions which	may
	  throw	arbitrary errors are the ones which may	call
	  metamethods; these include all of the	non-raw	get and	set
	  functions, as	well as	lua_equal and lua_lt.
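
       The promised sketch: values destined for the new fiber are allocated on
       L (so any ENOMEM throws into L's own pcall()-chain) and only moved
       across with lua_xmove(), which never throws. The function and argument
       layout are illustrative, not Emilua's actual spawn().

	   extern "C" {
	   #include <lua.h>
	   }

	   // L: the lua_CFunction's own state; arg #1 is the function that the
	   // new fiber should run.
	   void push_start_fn_to_new_fiber(lua_State* L)
	   {
	       // May throw on ENOMEM, but it throws on L's pcall()-chain,
	       // which is the correct one.
	       lua_State* new_fiber = lua_newthread(L);

	       lua_pushvalue(L, 1);            // copy the start function on L
	       lua_xmove(L, new_fiber, 1);     // move it; lua_xmove never throws

	       // From this point on, only "never throws" API calls may touch
	       // new_fiber directly; everything else is built on L and moved.
	   }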

       On a side note, Lua 5.2 added the following:

	  If an	error happens outside any protected environment, Lua
	  calls	a panic	function (see lua_atpanic) and then calls abort,
	  thus exiting the host	application. Your panic	function can
	  avoid	this exit by never returning (e.g., doing a long jump to
	  your own recovery point outside Lua).

	  The panic function runs as if	it were	a message handler (see
	  2.3);	in particular, the error message is at the top of the
	  stack. However, there	is no guarantees about stack space. To
	  push anything	on the stack, the panic	function should	first
	  check	the available space (see 4.2).
	    -- Lua 5.2 Reference
	    <http://www.lua.org/manual/5.2/manual.html#4.6>

       That's actually behaviour that already existed on version 5.1. An
       alternative panic function could just throw a C++ exception to
       implement this __attribute__((noreturn)) behaviour. However this
       hypothetical panic function is not an alternative solution to our
       problems due to the combination of the following facts:

          As described	elsewhere in this document, we require lua_error() to
	   act as-if it	throws a C++ exception so our destructors are properly
	   called. That	requires the underlying	Lua VM (LuaJIT in our case) to
	   throw and catch C++ exceptions.

          A C++-throw is triggered from lua_newtable(L). The type thrown here
	   is internal to the Lua VM and we cannot throw it ourselves.
	   LUA_ERRMEM information is correctly preserved.

          A panic is triggered	from lua_newtable(new_fiber). Our panic
	   function would in turn discard LUA_ERRMEM and throw a generic C++
	   exception.

          On lua_newtable(new_fiber) hitting LUA_ERRMEM, the L's C++-catch
	   handler wouldn't receive the	original error (LUA_ERRMEM). That
	   means information loss. That	means our host code (the code that
	   first calls into the	Lua VM)	won't call lua_close() (when it
	   should) as its lua_pcall()/lua_resume() call	might not report the
	   correct error reason	(LUA_ERRMEM). That also	means the possibility
	   to unwind the wrong number of cascaded pcall() blocks (a pcall()
	   from Lua code is not supposed to handle LUA_ERRMEM -- if correctly
	   detected -- so the number of blocks unwound differs whenever
	   LUA_ERRMEM is involved).

          Although LuaJIT can catch generic C++ exceptions, it	lacks context
	   and cannot possibly restore the stack state on each lateral
	   lua_State* handle at	play (the triple {L, current fiber, new	fiber}
	   in our case). If the spawn() lua_CFunction had a value pushed on
	   the current_fiber stack when a new_fiber panic-triggered exception
	   is raised, the value on the current_fiber stack wouldn't be properly
	   popped by the time L	handles	the C++	exception (and do remember
	   that	L is executing nested on top of	current_fiber so you can
	   already imagine the chaos here). In short, the Lua VM needs our
	   cooperation to maintain some	invariants.

          By wrapping these calls into	our own	C++ catch blocks we could work
	   around some of these	issues,	but the	thought	that thread control
	   would still return to the Lua VM one	last time after	the panic
	   handler got called is just too scary	and previous mailing list
	   threads on this topic weren't very reassuring. For one, if the
	   exception is	panic-triggered	by current_fiber, we won't know	what
	   remains on this stack (except for the stack top), but that's
	   exactly the lua_State that the host is operating on when our
	   lua_CFunction got called on L. Even if control does return safely
	   to our host it would	still have problems to deal with there.

       That covers our policy when implementing	lua_CFunction objects. In
       short, we cannot	resort to Lua panics here and the only real solution
       is the rigorous discipline on C API usage mentioned earlier.

       Now let's talk about our	policy for host	code. The Lua suspending IO
       functions are implemented by querying which fiber is current and
       scheduling a lua_resume() on it as the callback for some	Boost.Asio
       supported C++ async_*() function (plus a ton of other details properly
       documented elsewhere in this document, such as strand handling and so
       on). The initiating function is called from the Lua VM, but the
       callback is not. The callback will act as the host.

       Back to lua_resume(), this function itself doesn't throw:

	   int lua_resume(lua_State *L,	int narg); // [-?, +?, ]

       However the code	that runs before lua_resume() might throw. This	is the
       code that pushes	the arguments to the coroutine.	For instance, if a
       string is one of	the coroutine parameters, you will have	to use C API
       that might throw	on ENOMEM:

	   void	lua_pushlstring(lua_State *L, const char *s, size_t len); // [-0, +1, m]

       It's no use trying to call lua_pcall() to wrap lua_pushlstring() here.
       lua_status() now returns LUA_YIELD and that means you can't use
       lua_pcall() on this lua_State* handle. You can't create a new handle
       and use the lua_xmove() trick either, as lua_newthread() itself can
       throw on ENOMEM:

	   lua_State *lua_newthread(lua_State *L); // [-0, +1, m]

       Fear not, for here is the place where we can finally use a panic
       function to throw a custom C++ exception. There are only two caveats.
       The first one is related to LuaJIT having such tight integration with
       native exceptions that it makes (almost) no distinction between
       lua_pcall() and C++ catch frames
       <https://www.freelists.org/post/luajit/LuaJIT-ObjectiveC-throw-in-lua-atpanic-clang-infinite-recursion,5>[2].
       The net result is that you can use C++'s catch-all blocks and then no
       panic function will ever be involved (by now you must be feeling that
       we just travelled to the farthest candy shop in the kingdom just to
       make a full turn one block away from the destination when we changed
       our minds and decided to go to the neighbour's candy shop instead).
       Despite the lack of a real panic function throwing our own exceptions,
       I'll still use the same previous terminology (i.e. panic-triggered
       exceptions).

       The second caveat is a little charming race to avoid. The completion
       handler doing the host job is executed through the strand that protects
       the VM. If we let the exception escape the completion handler, another
       thread might try	to use the VM before we	have the chance	to close it.
       In other	words, the following approach has a race and thus is not used:

	   for (;;) {
	       try {
		   // Completion handler allows	the panic
		   // exception	to escape here.
		   ioctx.run();
		   break;
	       } catch (...) {
		   // This is a	bug. This code isn't executed
		   // through the VM strand. A pending operation
		   // that just	finished could try to access
		   // `current`	from another thread while we're
		   // here.
		   vm_context* current = ...;
		   current->close();
		   continue;
	       }
	   }

       Therefore, it is the responsibility of the completion handler to handle
       the panic-triggered exception (sorry about the boilerplate on your
       side, but that's the way it is).

	   try {
	       // lua_push*() calls
	   } catch (...) {
	       vm_ctx->close();
	       return;
	   }
	   int res = lua_resume(fiber, narg);

       That is enough to cover the policy for host code	and finally finish the
       LUA_ERRMEM discussion too.

CHANNELS AND RESOURCES
       The biggest challenge to cross-VM resource management is the
       multi-strand sync primitives (i.e. the channels). They have to execute
       code that jumps from one strand to another to finish their jobs. If the
       associated execution context has already finished, they would be stuck
       forever. The solution is for them to keep the execution context busy
       through a work guard.

       However some rules are needed to	make this work:

          Rx-channels (i.e. inbox) don't keep work guards.

          Tx-channels keep a work guard to the other end while they are
	   alive. But they only keep a work guard to their own strands when
	   they have an active operation (see the sketch after this list).
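
       In plain Boost.Asio vocabulary the two rules look roughly like the
       sketch below (the real channel types carry much more state; only the
       work-guard handling is the point here):

	   #include <boost/asio.hpp>
	   #include <optional>

	   namespace asio = boost::asio;
	   using work_guard =
	       asio::executor_work_guard<asio::io_context::executor_type>;

	   struct tx_channel_sketch
	   {
	       explicit tx_channel_sketch(asio::io_context& peer_ctx)
		   : peer_work{asio::make_work_guard(peer_ctx)} // peer stays alive
	       {}

	       void start_send(asio::io_context& own_ctx)
	       {
		   // Own side must stay alive until the cross-strand hop is done.
		   own_work.emplace(asio::make_work_guard(own_ctx));
		   // ... post the value to the peer's strand ...
	       }

	       void on_send_done() { own_work.reset(); }

	       void close() { peer_work.reset(); } // lets the peer's run() return

	       work_guard peer_work;
	       std::optional<work_guard> own_work;
	   };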

       If the tx-channels are not closed, they will prevent execution contexts
       that are no longer necessary from being destroyed. But that's the best
       we can do. We could periodically call the GC to free unused channels,
       but so will lua code anyway and there's nothing left for us to do on
       the C++ side. A good practice for lua code would be to add the
       following chunk at the beginning of the fiber that will process the
       actor messages:

	   scope_cleanup_push(function() inbox:close() end)

       Extra rules for channels	management:

          As an extra safety measure, if the main fiber finishes and inbox
	   wasn't imported, the	runtime	closes it.

          Channels (tx	and rx)	also get closed	when the VM is terminated.

          Channels must only upgrade their weak references to vm_context once
	   they have migrated to the target strand. Otherwise, they would
	   prevent the VM from auto-closing (and hairy problems would follow).

THE EXCEPTION MECHANISM
       C++ exceptions must not be used to propagate errors across lua/C++
       frames. However,	lua errors may simply trigger stack unwinding (the
       code makes heavy	use of setjmp()) and we	do depend on RAII to keep the
       code correct.

       It is assumed that any call to lua_error() will behave as-if it throws
       a C++ exception (thus triggering our destructors). We require some
       support from the luaJIT VM for this. Specifically, we can't rely on the
       no interoperability category from their exceptions section on the
       extensions page <http://luajit.org/extensions.html#exceptions> because
       of the following restriction:

	  Throwing Lua errors across C++ frames	will not call C++
	  destructors.

       To make matters worse, the feature we do depend on only appears in the
       full interoperability category:

	  Throwing Lua errors across C++ frames	is safe. C++ destructors
	  will be called.

       A different approach would be to	implement an exception mechanism in
       terms of	coroutines (although it'd add to code complexity):

	      Exceptions < Coroutines <	Continuations

	  Exceptions can be thought of as a subclass of	coroutines. You
	  can implement	an exception mechanism with coroutines.
	    -- leafo.net
	    <http://leafo.net/posts/itchio-and-coroutines.html#overview-of-coroutines>

       But this	path would be a	dead-end as native lua errors would still be
       reported	through	lua_error(). For luaJIT, lua_error() plays well	with
       our code	because:

	  The LuaJIT VM is fully resumable. This means you can yield
	  from a coroutine even across contexts, where this would not be
	  possible with the standard Lua 5.1 VM: e.g. you can yield
	  across pcall() and xpcall(), across iterators and across
	  metamethods.
	    -- http://luajit.org/extensions.html#resumable

       Were it not for this guarantee, the project would be monstrous. To
       understand why this guarantee is important, let's unravel the
       fundamental pattern for fibers support. We always implicitly wrap every
       user code inside a lua coroutine:

	   local fib = coroutine.create(user_fn)

       So async operations can suspend the calling fiber and resume it later.

       But user_fn might very well contain a pcall() and execute our
       suspending async	function inside	it:

	   function user_fn()
	       pcall(function()
		   io_obj:emilua_async_op()
	       end)
	   end

       The exception mechanism should not block	our ability to suspend fibers.
       When our	own native code	calls lua_yield() to suspend a fiber, the
       suspension mechanism should be able to cross the	pcall()	barrier.

       To wrap it all up so far, the standard lua exception mechanism is used
       to report errors. The only difference is that emilua will lua_error() a
       structured error object inspired by std::error_code for our own errors.

       Things would get	a little tricky	on the following point that we raised
       previously though:

	  [...]	and we do depend on RAII to keep the code correct.

       Imagine we have some code like the following:

	   class reference
	   {
	   public:
	       reference() : L(nullptr)	{}

	       reference(lua_State* L)
		   : L(L)
		   , idx(luaL_ref(L, LUA_REGISTRYINDEX))
	       {}

	       ~reference()
	       {
		   if (!L)
		       return;

		   luaL_unref(L, LUA_REGISTRYINDEX, idx);
	       }

	       reference(reference&& o)
		   : L(o.L)
		   , idx(o.idx)
	       {
		   o.L = nullptr;
	       }

	       lua_State* state() const
	       {
		   return L;
	       }

	       void push() const
	       {
		   assert(L);
		   lua_pushinteger(L, idx);
		   lua_gettable(L, LUA_REGISTRYINDEX);
	       }

	   private:
	       lua_State* L;
	       int idx;
	   };

       If an object of this type has its destructor called on
       lua_error()-triggered stack unwinding, it means we're manipulating the
       lua_State* (luaL_unref(L) in this example) on stack unwinding (i.e.
       outside of a lua-catch block which would	be just	after a	pcall()
       return).	If the VM is not in a safe state for manipulations at this
       moment (this scenario just doesn't happen if you stick with plain C,
       which is the target lua was developed for) then we're screwed. Luckily,
       the VM can handle such situations just fine, as hinted in the luaJIT
       documentation:

	      static int wrap_exceptions(lua_State *L, lua_CFunction f)
	      {
		try {
		  return f(L);	// Call	wrapped	function and return result.
		} catch	(const char *s)	{  // Catch and	convert	exceptions.
		  lua_pushstring(L, s);
		} catch	(std::exception& e) {
		  lua_pushstring(L, e.what());
		} catch	(...) {
		  lua_pushliteral(L, "caught (...)");
		}
		return lua_error(L);  // Rethrow as a Lua error.
	      }
	    -- Recommended usage pattern for LUAJIT_MODE_WRAPCFUNC,
	    http://luajit.org/ext_c_api.html#mode_wrapcfunc

       This guarantee is promised again	(although this version of the promise
       is read-only) in	their extensions page (and again only at the full
       interoperability	category):

	  Lua errors can be caught on the C++ side with	catch(...). The
	  corresponding	Lua error message can be retrieved from	the Lua
	  stack.
	    -- http://luajit.org/extensions.html#exceptions

       The final piece of our puzzle is related to async ops converting
       std::error_code into lua exceptions (i.e. lua_error()). The completion
       handler for async ops is not called in a lua context, so it cannot just
       call lua_error() and hope the correct context will catch the exception
       (there's no API similar to resume_with()
       <https://www.boost.org/doc/libs/1_67_0/libs/context/doc/html/context/ff.html#context.ff.executing_function_on_top_of_a_fiber>
       from Boost.Context). It needs to return control to the native code that
       suspended the fiber so it can throw a lua exception before control
       returns to lua code.

       This guarantee used to exist in luaJIT 1.x (which included Coco):

	  Now, if the current coroutine	has an associated C stack,
	  lua_yield() returns the number of arguments passed back from
	  the resume.
	    -- http://coco.luajit.org/api.html#lua_yield

       The lack	of allocated C stacks brings more complications	to the
       implementation that will	be discussed later. lua_yieldk()
       <https://www.lua.org/manual/5.2/manual.html#lua_yieldk> from Lua	5.2
       would be	enough for us (and cheaper!), but we don't have	that either
       <https://github.com/LuaJIT/LuaJIT/issues/48>.

       Yet another option would be to set a one-time hook to be called
       immediately just before resuming the lua coroutine, but it'd present
       challenges in the future if we ever add debugging support, so it is
       avoided.

       And the solution Emilua gets away with is wrapping the C function inside
       a lua function. The C function returns a 2-tuple. If the first value is
       not nil, the lua function itself will take care of using it to raise an
       error.

	   local error,	native = ...
	   return function(...)
	       local e,	v = native(...)
	       if e then
		   error(e)
	       else
		   return v
	       end
	   end

USER-COROUTINES
       Let's jump straight to a	topic that gives some sense of continuity to
       the previous section. The pcall() barrier is not	the only barrier that
       the user	can insert to prevent lua_yield() from suspending the fiber.
       The user	might very well	just wrap calls	using coroutine.create():

	   function user_fn()
	       coroutine.create(function()
		   io_obj:emilua_async_op()
	       end)
	   end

   Note: Rule
       Lua's coroutine module must never be directly exposed to	lua code.

       The problem is solved by	exposing a different coroutine module -- a
       small shim over the original one. This version inspects this_fiber's
       suspension reason (native code or lua code).

       Conceptually, the implementation	looks like this:

	   function coroutine.resume(co, ...)
	       if _G.busy_coroutines[co] then
		   -- CORUN
		   error("cannot resume	running	coroutine", 2)
	       end

	       local args = {...}
	       while true do
		   local ret = {raw_coroutine.resume(co, unpack(args))}
		   if ret[1] ==	false then
		       return unpack(ret)
		   end
		   if _G.this_fiber.native_yield then
		       _G.busy_coroutines[co] =	true
		       args = {raw_coroutine.yield(unpack(ret, 2))}
		       _G.busy_coroutines[co] =	nil
		   else
		       return unpack(ret)
		   end
	       end
	   end

	   function coroutine.yield(...)
	       if _G.fibers[raw_coroutine.running()] ~=	nil then
		   error("bad coroutine", 2)
	       end
	       return raw_coroutine.yield(...)
	   end

	   function coroutine.status(co)
	       if _G.busy_coroutines[co] then
		   return "normal"
	       end

	       return raw_coroutine.status(co)
	   end

	   function coroutine.running()
	       local co	= raw_coroutine.running()
	       if _G.fibers[co]	~= nil then
		   -- Fiber's coroutines work just like	the main coroutine
		   return nil
	       end

	       return co
	   end

	   coroutine.create = ...
	   coroutine.wrap = ...

DEAD FIBERS
       When an exception escapes the fiber stack, the hook registered with
       sys.set_uncaught_hook() is called. The default hook prints the stack
       trace to	stderr and additionally	terminates the VM if the exception
       escaped from the	main fiber. If the custom hook itself fails, the
       default hook is then called anyway.

       Scope handlers are properly popped and called after the hook returns
       control of the thread to	the runtime.

       The hook	is only	called for detached fibers. Therefore, a different
       behaviour can be	chosen for each	join()ed fiber.	Also, if the fiber
       isn't explicitly	detach()ed, the	hook action will be deferred until
       some GC round.

       There isn't a pcall block around the whole program. lua_resume is
       enough and it has the nice property of not unwinding the stack, so it
       can be examined from the error handler. A new lua thread is created to
       execute the uncaught-hook while it has the chance to examine the
       unchanged errored call stack.

	   Note

	   The hook mechanism isn't implemented	yet.

FUNCTIONS THAT RECEIVE A LUA CALLBACK
       There are plenty	of functions that have a lua closure as	a parameter
       (e.g. pcall(), scope(), ...). If	we blindly implement them in plain C,
       they will configure a non-leaf C	stack frame which we cannot suspend.

       To avoid	the C stack frame in the middle	of the call-stack altogether,
       we implement (parts of) these functions in lua, not C. The problem is
       then how	to expose sensitive raw	resources that the C functions would
       use. One	of the goals is	to not let these resources escape elsewhere.

       A quick way to achieve it is by having a	lua bootstrap function/chunk
       to create closures and later change their upvalues through C:

	   local private_resource = ...
	   return function()
	       -- use `private_resource`
	   end

       This approach is	naive as luaJIT	2.x does not implement some lua
       functions (i.e. the sensitive raw resources that	we want	to keep
       private)	as C functions and we cannot feed them as upvalues for the
       imported	bytecode. For instance,	we have	this behaviour for pcall():

	   lua_pushcfunction(L,	luaopen_base);
	   lua_call(L, 0, 0);
	   lua_getglobal(L, "pcall");
	   lua_CFunction pcall_addr = lua_tocfunction(L, -1);
	   assert(pcall_addr ==	nullptr); // :-(

       Therefore the lua bytecode won't	be a closure with uninitialized
       upvalues	per se,	but a function that receives the private resources and
       returns the needed closure. It is an extra step on startup, but at
       least we	save some cycles by compiling the bytecode with	stripped debug
       info in the project build stage.
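
       On the C++ side that extra step amounts to something like the sketch
       below. The bytecode buffer and the choice of passing a single private
       resource are illustrative; luaL_loadbuffer() and lua_call() are the
       plain C API.

	   #include <cstddef>

	   extern "C" {
	   #include <lua.h>
	   #include <lauxlib.h>
	   }

	   // bootstrap.lua, compiled with stripped debug info at build time:
	   //     local private_resource = ...
	   //     return function() --[[ use private_resource ]] end
	   extern const char bootstrap_bytecode[];
	   extern const std::size_t bootstrap_bytecode_len;

	   // Leaves the public closure on top of the stack. The private
	   // resource at `private_resource_idx` never becomes reachable from
	   // user code except through that closure's upvalue.
	   void push_public_closure(lua_State* L, int private_resource_idx)
	   {
	       if (luaL_loadbuffer(L, bootstrap_bytecode, bootstrap_bytecode_len,
				   "bootstrap") != 0)
		   lua_error(L);                 // propagate the load error

	       lua_pushvalue(L, private_resource_idx);
	       lua_call(L, /*nargs=*/1, /*nresults=*/1);
	   }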

PROCESS	ENVIRONMENT
       A part of the process environment (e.g. UNIX signals) should be under
       complete	control	of the program and no external library should meddle
       with it.	However, no protections	will be	provided to enforce this good
       practice.

VM SETTINGS INHERITANCE
       New actors should inherit generic customization points for the GC (e.g.
       step count and period) and the JIT. They	should also inherit allocator
       settings, but they must not be prevented	from creating new actors with
       higher allocation quotas	(unless	of course the global pool is already
       at its limit).

LUA 5.2/LUAJIT EXTENSIONS
       We use some C functions found only on Lua 5.2+ and/or LuaJIT:

          luaL_traceback()

          luaopen_bit()

          luaopen_jit()

          luaopen_ffi()

       There are projects such as Kepler's lua-compat-5.2
       <https://github.com/keplerproject/lua-compat-5.2> that offer a port of
       these functions to Lua 5.1.

2GB ADDRESSING LIMIT
       luaJIT
       <http://hacksoflife.blogspot.com/2012/12/integrating-luajit-with-x-plane-64-bit.html>
       has a serious 2GB limit"	 that has been fixed
       <https://www.freelists.org/post/luajit/Fixed-a-segfault-when-unsinking-64bit-pointers>
       on forks" . By default, the broken 64-bit addressing mode is hidden
       behind LUAJIT_ENABLE_GC64. Emilua might consider	moving to moonjit
       <https://www.freelists.org/post/luajit/LuaJIT-staging-fork-to-move-the-project-forward>
       if its author don't try to part away from the lua 5.1 core and keep
       himself distant from 5.3+ syntactic explosion madness. I	don't like
       this C++-like culture expanding to lua or other languages (kudos	to Go
       here for	avoiding it).

JIT PARAMETERS
       The JIT parameters are also changed from	the old	defaults
       <http://luajit.org/running.html#opt_O>:

	   maxtrace=1000
	   maxrecord=4000
	   maxmcode=512	 -- in KB

       To defaults based on OpenResty findings
       <https://github.com/openresty/luajit2#updated-jit-default-parameters>:

	   maxtrace=8000
	   maxrecord=16000
	   maxmcode=40960  -- in KB

LOCALES
       A recent	POSIX standard specified anemic	per-thread and per-function
       locale support, but, aside from this anemic support, C uses the same
       locale globally for the whole process.

       Meanwhile, C++ has somewhat usable support for multiple locales per
       process (and an extra global one	that also affects the global C
       locale).

       Functions such as perror() and strerror() will query LC_MESSAGES	from
       the global C locale. However the	sole function to query this attribute
       -- setlocale() -- is not	thread-safe so we shouldn't change the locale
       after the program starts	and minimal initialization to the process
       state is	done. Changing the global locale is highly unsafe and such API
       will not	be exposed to Lua code.

       The thread-safe C++ locales export functionality	for LC_MESSAGES
       through the facet std::messages.	This facet allows one to open
       system-defined message catalogs,	and get	translation messages for them.
       This facet exposes no equivalent	for the	query setlocale(LC_MESSAGES,
       NULL). Even if we query it at the beginning of the program and try to
       attach a	new custom facet to the	global locale object, this will	create
       a nameless locale. Unnamed global C++ locales will break LC_MESSAGES
       for the C ecosystem (e.g. perror() will no longer print localized
       messages). Therefore custom facets are out of the question.

       A direct	call to	setlocale(LC_MESSAGES, NULL) is	avoided	too because
       ISO C++ doesn't define the macro	LC_MESSAGES. To	query the current
       LC_MESSAGES we just look	for LC_MESSAGES	in the current C++ locale's
       name. This approach doesn't interfere with the C	ecosystem, and also
       paves the way for multiple per-process locales.
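
       A sketch of that lookup follows. The composite-name format
       ("LC_CTYPE=...;LC_MESSAGES=...;...") is what glibc produces when the
       categories differ; a simple name such as "pt_BR.UTF-8" applies to every
       category, so it is returned as-is.

	   #include <locale>
	   #include <string>

	   std::string lc_messages_from_cxx_locale()
	   {
	       // The user's preferred locale, as named by the environment.
	       std::string name = std::locale{""}.name();

	       const std::string key = "LC_MESSAGES=";
	       auto pos = name.find(key);
	       if (pos == std::string::npos)
		   return name;             // one name for all categories

	       pos += key.size();
	       return name.substr(pos, name.find(';', pos) - pos);
	   }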

       One can find the	list of	POSIX environment variables that affect	the
       process'	locale at
       https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02.
       The format for these variables is defined as:

	   [language[_territory][.codeset][@modifier]]

       This format is compatible with RDF's Turtle where LANGTAG is defined
       as:

	   LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*

       And it matches the semantics for	BCP47 definition:

	   obs-language-tag = primary-subtag *(	"-" subtag )
	   primary-subtag   = 1*8ALPHA
	   subtag	    = 1*8(ALPHA	/ DIGIT)

       The registry of subtags is maintained by	IANA at
       https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry.

       So LC_MESSAGES=pt_BR becomes Turtle's "literal"@pt-BR (and at least the
       subtag is case sensitive).

	   Caution

	   A Turtle language-tagged string ceases to be	of the datatype
	   http://www.w3.org/2001/XMLSchema#string. Its	datatype will be
	   http://www.w3.org/1999/02/22-rdf-syntax-ns#langString. If this is a
	   problem for your application, do not	use Turtle language-tagged
	   strings.

       For more	information about C++ locales, the following links are
       relevant:

          https://stdcxx.apache.org/doc/stdlibug/24-3.html

          https://gcc.gnu.org/onlinedocs/libstdc++/manual/facets.html#std.localization.facet.messages%23facet.messages.design

          https://www.gnu.org/software/libc/manual/html_node/Locale-Names.html

OPEN QUESTIONS
          Describe the	behaviour for sys.exit() (for main and secondary VMs).
	   Should it call the cancellator for every active operation? Should
	   it exit the application?

EXTRA CAUTION TO TAKE WHEN WRITING PLUG-INS
       Always keep in mind:

          If you enable your IO object	to be sent over	channels, it'll	also
	   be able to migrate to a different asio::io_context and you must
	   take	care to	keep a work guard to the original asio::io_context.

          Pending operations must hold	a strong reference to vm_context and a
	   work	guard -- directly or indirectly	-- to vm_context.strand().

          IO objects (channels	included) by themselves	must not hold any
	   strong references to	their own vm_context (this cycle would prevent
	   auto-closing	the VM and associated channels). Operation initiation
	   is the perfect time to upgrade weak references (if any) to strong
	   ones.

          Pending operations must not trust L from the	initiating operation
	   to decide which fiber to wake-up later on. They must	resort -- at
	   initiation time -- to the vm_context	API. Check the simple
	   sleep_for() implementation for a code template.

FINAL NOTE
       Emilua software is complex. There should	be no pursuit in indefinitely
       extending this base. Rather, we should search for stabilization and
       maturity	(and also tooling around a solid base).

       If you think there should be a nice lua library to handle IRC and
       what-not, by all	means do write it, but write it	as a separate lua
       library (or native plug-in), and	compete	against	the free market	of
       libraries. Do not submit	a proposal to integrate	it in the core.	There
       are no batteries	included. And there shall be no	committee-driven
       development.

       Likewise, we shall stay with the current lua syntax (5.1 plus some
       extensions found in the beta branch of luaJIT 2.1[3]) forever. If you
       want more syntax, use a transpiler.

NOTES
       [1]    http://lua-users.org/lists/lua-l/2007-10/msg00600.html

       [2]    Do  notice that contrary to the feeling nourished	in the mailing
	      list thread, panic functions also	would work in our  case.  I've
	      tested/verified and I also followed the relevant source code for
	      multiple LuaJIT versions.	Really,	it's okay.

       [3]    http://luajit.org/extensions.html#lua52
	      (-DLUAJIT_ENABLE_LUA52COMPAT).

Emilua 0.11.2			  2025-03-12			     EMILUA(7)
