EMILUA(7)		       Emilua reference			     EMILUA(7)

NAME
       Emilua -	Lua execution engine

DESCRIPTION
       Emilua also provides a fiber cancellation API that you can use to
       cancel fibers (you might use it to free resources from fibers stuck in
       IO requests that might never complete).

       The main question that a fiber cancellation API needs to answer is how
       to keep the application in a consistent state. What counts as a
       consistent state is knowledge that lives in the application and in the
       programmer's assumptions, not knowledge encoded in Emilua's own source
       code. So it is okay to offload some of the responsibility to the
       application itself.

       One dumb'n'quick	example	that illustrates the problem of	a consistent
       state follows:

	   local m = mutex.new()

	   local f = spawn(function()
	       m:lock()
	       sleep(2)
	       m:unlock()
	   end)

	   sleep(1)
	   f:cancel()
	   m:lock()

       Before a	fiber can be discarded at cancellation,	it needs to restore
       state invariants	and free resources. The	GC would be hopeless in	the
       previous	example	(and many more)	because	the mutex is shared and	still
       reachable even if we collect the	canceled fiber's stack.	There are
       other reasons why we can't rely on the GC for the job.

       The Windows approach to thread cancellation is a contract. This
       contract requires the programmer to never call a blocking function
       directly and to always go through WaitForMultipleObjects() instead.
       A second rule: pass a cancellation handle along the call chain to
       other functions that need to perform blocking calls. Conceptually,
       this solution is just the same as Go's:

	   select {
	   case job := <-queue:
	       // ... do job ...
	   case <-ctx.Done():
	       // goroutine cancelled
	   }

       The difference is that Go's Context is part of the standard library
       and a contract everybody adopts. The lesson here is that cancellation
       has to be part of the runtime, or else it just doesn't work. In Emilua,
       the runtime is extended to provide a cancellation API inspired by
       POSIX's thread cancellation.

       The rest	of this	document will gloss over many details, but as long as
       you stay	on the common case, you	won't need to keep most	of these
       details in mind (sensible defaults) and for the details that you	do
       need to remember, there is a smaller recap section at the end.

	   Warning

	   Do not copy-paste code snippets surrounded by WARNING blocks.
	   They're most	likely to break	your program. Do read the manual to
	   the end. These code snippets	are there as intermediate steps	for
	   the general picture.

THE LUA	EXCEPTION MODEL
       It is easy to find a try-catch construct	in mainstream languages	like
       so:

	   try {
	       // code that might err
	   } catch (Exception e) {
	       // error	handler
	   }

	   // other code

       And here's the Lua translation of this pattern:

	   local ok = pcall(function()
	       -- code that might err
	   end)
	   if not ok then
	       -- error	handler
	   end
	   -- other code

       The main difference here is that Lua's exception mechanism doesn't
       integrate tightly with the type system (and that's okay). So the
       catch-block is really always a catch-all. Also, the structure initially
       suggests we don't need special syntax for a finally block:

	   try {
	       // code that might err
	   } catch (Exception e) {
	       // error	handler
	   } finally {
	       // cleanup handler
	   }

	   // other code

	   local ok = pcall(function()
	       -- code that might err
	   end)
	   if not ok then
	       -- error	handler
	   end
	   -- cleanup handler
	   -- other code
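
       One detail the translation above leaves implicit: when the error is
       not fully handled, it has to be re-raised after the cleanup code has
       run, or the failure is silently swallowed. A minimal plain-Lua sketch
       follows; with_cleanup is a hypothetical helper written for this
       illustration, not part of Lua or Emilua.

```lua
-- Hypothetical helper: run `body`, always run `cleanup`, then re-raise
-- any error that `body` produced.
local function with_cleanup(body, cleanup)
    local ok, err = pcall(body)
    cleanup()              -- plays the role of the finally block
    if not ok then
        error(err, 0)      -- level 0: re-raise without adding position info
    end
end

local log = {}

local ok, err = pcall(function()
    with_cleanup(function()
        log[#log + 1] = "body"
        error("boom", 0)
    end, function()
        log[#log + 1] = "cleanup"
    end)
end)

-- The cleanup ran, and the caller still sees the original error.
print(ok, err)
print(table.concat(log, ","))
```

       The cleanup function runs on both the normal and the error path, and
       the outer pcall() still observes the original error value.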

       In sloppy terms,	the cancellation API just re-schedules the fiber to be
       resumed but with	the fiber stack	slightly modified to throw an
       exception when execution	proceeds. This property	will trigger stack
       unwinding to call all the error & cleanup handlers in the reverse order
       that they were registered.

THE CANCELLATION PROTOCOL
       The fiber handle returned by the spawn() function is the central piece
       for communicating the intent to cancel a fiber. To better accommodate
       support for structured concurrency and not introduce avoidable
       co-dependency between them, we follow the POSIX thread cancellation
       model (Java's confusing state machine is ignored). Long story short,
       once a fiber has been canceled, it cannot be un-canceled.

       To cancel a fiber, just call the	cancel() function from a fiber handle:

	   fib:cancel()

	   Caution

	   You can only	cancel joinable	fibers (but the	function is safe to
	   call	with any handle	at any time).

       Afterwards, you can safely join() or detach() the target	fiber:

	   fib:join()

	   -- ...or
	   fib:detach()

       If you don't detach a fiber, the	GC will	do it for you.

       It's that easy. Your fiber doesn't need to know the target fiber's
       internal state and the target fiber doesn't need to know your fiber's
       internal state. On the other end, handling a cancellation request is
       a little trickier.

HANDLING CANCELLATION REQUESTS
       The key concept required to understand the cancellation flow is the
       cancellation point. Understand this, and you'll have learnt how to
       handle cancellation requests.

	   Note

   Definition
       A cancellation point is a point in your application where the Emilua
       runtime is allowed to stop normal execution flow and raise an
       exception to trigger stack unwinding if a cancellation request from
       another fiber has been received.

       When the possibility of cancellation is added to the table, your mental
       model has to take into account that calls to certain functions now
       might throw an error for no other reason than to unwind the stack
       before freeing the fiber.

       The only	places that are	allowed	to serve as cancellation points	are
       calls to	suspending functions (plus the pcall() family and
       coroutine.resume() for reasons soon to be explained).

	   -- this snippet has no cancellation points
	   -- exceptions are never raised here
	   local i = 0
	   while true do
	       i = i + 1
	   end

       The following function doesn't need to worry about leaving the object
       self in an inconsistent state if	the fiber gets canceled. And the
       reason for this is quite	simple:	this function doesn't have
       cancellation points (which is usually the case for functions that are
       purely compute-bound). It won't ever be canceled	in the middle of its
       work.

	   function mt:new_sample(sample)
	       self.mean_ = self.a * sample + (1 - self.a) * self.mean_
	       self.f =	self.a + (1 - self.a) *	self.f
	   end

       Functions that suspend the fiber	(e.g. IO and functions from the
       condition_variable module) configure cancellation points. The function
       echo defined below has cancellation points.

	   function echo(sock, buf)
	       local nread = sock:read(buf) -- (1)
	       sock:write(buf, nread)       -- (2)
	   end

       Now take	the following code to orchestrate the interaction between two
       fibers.

	   local child_fib = spawn(function()
	       local buf = buffer.new(1024)
	       echo(global_sock, buf)
	   end)

	   child_fib:cancel()

       The mother-fiber doesn't have cancellation points, so it executes until
       the end. The child_fib fiber calls echo() and echo() will in turn act
       as a cancellation point (i.e. the property of being a cancellation
       point propagates up to the caller functions).

	   Note

	   this_fiber.yield() can be used to introduce cancellation points for
	   fibers that otherwise would have none.

       The mother-fiber	doesn't	call any suspending function, so it'll run
       until the end and only yields execution back to other fibers when it
       does end. At the	last line, a cancellation request is sent to the child
       fiber. The runtime's scheduler doesn't guarantee	when the cancellation
       request will be delivered and can schedule execution of the remaining
       fibers with plenty of freedom given we're not using any synchronization
       primitives.

       In this simple scenario,	it's quite likely that the cancellation
       request will be delivered pretty	quickly	and the	call to	sock:read()
       inside echo() will suspend child_fib just to awake it again but with an
       exception being raised instead of the result being returned. The
       exception will unwind the whole stack and the fiber finishes.

       Any of the cancellation points can serve	for the	fiber to act on	the
       cancellation request. Another possible point where these	mechanisms
       would be	triggered is the sock:write() suspending function.
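
       To make the idea of a cancellation point concrete, here is a toy model
       in plain Lua that uses a coroutine as a stand-in for a fiber. It only
       illustrates the behaviour described above and is not how Emilua is
       implemented; new_fiber, suspend and the "fiber_canceled" string are
       hypothetical names chosen for the sketch.

```lua
-- Toy model: a "fiber" is a coroutine plus a cancellation flag.
local function new_fiber(fn)
    local fiber = { canceled = false }
    -- A suspending function: it yields, and when the fiber is resumed
    -- with a pending cancellation request it raises instead of returning.
    local function suspend()
        coroutine.yield()
        if fiber.canceled then
            error("fiber_canceled", 0)
        end
    end
    fiber.co = coroutine.create(function() fn(suspend) end)
    return fiber
end

local steps = {}

local fib = new_fiber(function(suspend)
    for i = 1, 1000 do end            -- compute-bound: never interrupted
    steps[#steps + 1] = "before read"
    suspend()                         -- cancellation point (think sock:read())
    steps[#steps + 1] = "after read"  -- not reached once canceled
end)

coroutine.resume(fib.co)              -- runs up to the first suspension
fib.canceled = true                   -- like fib:cancel(): only sets a flag
local ok, err = coroutine.resume(fib.co)

-- The fiber woke up with an error instead of a result and unwound.
print(ok, err)
print(table.concat(steps, ","))
```

       Setting the flag does nothing by itself. The request only takes effect
       when the model fiber passes through its suspending call.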

	   Note

	   The uncaught-hook isn't called when the exception is	fiber_canceled
	   so you don't	really have to care about trapping cancellation
	   exceptions. You're free to just let the stack fully unwind.

	   Warning

	       local child_fib = spawn(function()
		   local buf = buffer.new(1024)
		   global_sock_mutex:lock()
		   local ok, ex	= pcall(function()
		       echo(global_sock, buf)
		   end)
		   global_sock_mutex:unlock()
		   if not ok then
		       error(ex)
		   end
	       end)

       To register a cleanup handler in	case the fiber gets canceled, all you
       need to do is handle the	raised exceptions.

       A fiber is always either canceled or not canceled. A fiber doesn't go
       back to the un-canceled state. Once the fiber has been canceled, it'll
       stay in this state. The task at hand is to unwind the stack, calling
       the cleanup handlers to keep the application state consistent after the
       GC collects the fiber (all done by the Emilua runtime).

       So you can't call more suspending functions after the fiber gets
       canceled:

	   local ok, ex	= pcall(function()
	       -- lots of IO ops		(1)
	   end)
	   if not ok then
	       watchdog_sock:write(errored_msg) -- (2)
	       error(ex)
	   end
       (1) Lots	of cancellation	points.	All swallowed by pcall().
       (2) If fiber gets canceled at #1, it won't init any IO
	   operation here but instead throw another fiber_canceled
	   exception.

       The previous snippet has an error. To properly achieve the desired
       behaviour, you have to temporarily disable cancellations in the cleanup
       handler like so:

	   local ok, ex	= pcall(function()
	       -- lots of IO ops
	   end)
	   if not ok then
	       this_fiber.disable_cancellation()
	       pcall(function()
		   watchdog_sock:write(errored_msg)
	       end)
	       this_fiber.restore_cancellation()
	       error(ex)
	   end

	   Note

	   this_fiber.restore_cancellation() has to be called as many times as
	   this_fiber.disable_cancellation() has been called to	restore
	   cancelability.
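
       The counting behaviour in the note above can be pictured as a nesting
       depth. The following plain-Lua sketch models it with a counter; state,
       disable, restore and cancelable are hypothetical names for this
       illustration and say nothing about how Emilua stores this internally.

```lua
-- Toy model of nested disable/restore calls as a depth counter.
local state = { depth = 0 }

local function disable() state.depth = state.depth + 1 end

local function restore()
    assert(state.depth > 0, "restore() without matching disable()")
    state.depth = state.depth - 1
end

-- Requests are acted upon only when the depth is back to zero.
local function cancelable() return state.depth == 0 end

-- A helper that disables cancellation may itself be called from code that
-- already disabled it; the counter keeps the outer region protected.
local function helper()
    disable()
    local inside = cancelable()
    restore()
    return inside
end

disable()
local inside_helper = helper()
local after_helper = cancelable()   -- still false: outer disable() is active
restore()
local at_end = cancelable()         -- true: every disable() was matched

print(inside_helper, after_helper, at_end)
```

       A counter, rather than a boolean, is what lets helper functions
       disable and restore cancellation without knowing what their callers
       did.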

       It looks messy, but this behaviour actually helps the common case to
       stay clean. Were it not for these choices, a common fiber that doesn't
       have to handle cancellation, like the following, would accidentally
       swallow a cancellation request and never get collected:

	   local ok = false
	   while not ok	do
	       ok = pcall(function()
		   my_udp_sock:send(notify_msg)
	       end)
	   end

       And the pcall() family in itself	also configures	a cancellation point
       exactly to make sure that loops like this won't prevent the fiber from
       being properly canceled.	pcall()	family and coroutine.resume() are the
       only functions which aren't suspending functions	but introduce
       cancellation points nevertheless.
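
       A rough way to picture this rule in plain Lua is to wrap pcall() so
       that it consults a cancellation flag before running its argument. The
       names checked_pcall and cancel_requested, as well as the
       "fiber_canceled" string, are assumptions made for this sketch; the real
       check lives inside the Emilua runtime.

```lua
local cancel_requested = false

-- Model of a pcall() that is also a cancellation point: the check happens
-- before the argument runs, never after.
local function checked_pcall(fn, ...)
    if cancel_requested then
        error("fiber_canceled", 0)
    end
    return pcall(fn, ...)
end

local attempts = 0

local function retry_until_sent()
    local ok = false
    while not ok do
        ok = checked_pcall(function()
            attempts = attempts + 1
            if attempts == 3 then
                cancel_requested = true   -- a request arrives mid-loop
            end
            error("send failed", 0)       -- the send keeps failing
        end)
    end
end

-- Plain pcall() here plays the role of the fiber's top-level frame.
local ok, err = pcall(retry_until_sent)
print(ok, err, attempts)
```

       With an unmodified pcall() this retry loop would spin forever. With the
       check in place, the fourth iteration raises before the closure runs and
       the loop is left.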

	   Note

	   It is guaranteed that fib:cancel() will never be a cancellation
	   point (nor a suspension point).

	   This	guarantee is useful to build certain concurrency patterns.

THE scope() FACILITY
       The control flow for the common case is good, but handling
       cancellations right now is tricky to say the least. To make matters
       less error-prone, the scope() family of functions exists:

       •   scope()

       •   scope_cleanup_push()

       •   scope_cleanup_pop()

       The scope() function receives a closure and executes it,	but it
       maintains a list	of cleanup handlers to be called on the	exit path (be
       it reached by the common	exit flow or by	a raised exception). When you
       call it,	the list of cleanup handlers is	empty, and you can use
       scope_cleanup_push() to register	cleanup	handlers. They are executed in
       the reverse order in which they were registered.	The handlers are
       called with the cancellations disabled, so you don't need to disable
       them yourself.

	   Note

	   It is safe to have nested scope()s.

       One of the previous examples can	now be rewritten as follows:

	   local child_fib = spawn(function()
	       local buf = buffer.new(1024)
	       global_sock_mutex:lock()
	       scope_cleanup_push(function() global_sock_mutex:unlock()	end)
	       echo(global_sock, buf)
	   end)

	   Note

	   A hairy situation happens when a cleanup handler itself throws an
	   error. The reason why the default uncaught-hook doesn't terminate
	   the VM when secondary fibers	fail is	that cleanup handlers are
	   trusted to keep the program invariants. Once	a cleanup handler
	   fails we can	no longer hold this assumption.

	   Once	a cleanup handler itself throws	an error, the VM is
	   terminated[1] (there's no way to recover from this error without
	   context, and	conceptually by	the time uncaught hooks	are executed,
	   the context was already lost). If you need some sort	of protection
	   against one complex module that will	fail now and then, run it in a
	   separate actor.

	   In C++ this scenario	is analogous to	a destructor throwing an
	   exception when the destructor itself	was triggered by an
	   exception-provoked stack unwinding. And the result is the same,
	   terminate() <https://en.cppreference.com/w/cpp/error/terminate>.

       If you want to call the last registered cleanup handler and pop it from
       the list, just call scope_cleanup_pop(). scope_cleanup_pop() receives
       an optional argument informing whether the cleanup handler must be
       executed after being removed from the list (defaulting to true).

	   scope(function()
	       scope_cleanup_push(function()
		   watchdog_sock:write(errored_msg)
	       end)

	       -- lots of IO ops

	       scope_cleanup_pop(false)
	   end)
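
       As a mental model only, the behaviour of the three functions can be
       imitated in plain Lua with a stack of handler lists. The functions
       below are deliberately named model_scope, model_push and model_pop to
       stress that they are hypothetical stand-ins, not Emilua's
       implementation (which also disables cancellation while handlers run).

```lua
local scopes = {}   -- stack of handler lists, innermost scope last

local function model_push(handler)
    local handlers = scopes[#scopes]
    handlers[#handlers + 1] = handler
end

local function model_pop(execute)
    local handlers = scopes[#scopes]
    local handler = table.remove(handlers)
    if execute ~= false then handler() end   -- executing is the default
end

local function model_scope(body)
    scopes[#scopes + 1] = {}
    local ok, err = pcall(body)
    -- Exit path, normal or not: run what is left in reverse order.
    local handlers = table.remove(scopes)
    for i = #handlers, 1, -1 do handlers[i]() end
    if not ok then error(err, 0) end
end

local trace = {}
local ok = pcall(model_scope, function()
    model_push(function() trace[#trace + 1] = "first" end)
    model_push(function() trace[#trace + 1] = "second" end)
    model_push(function() trace[#trace + 1] = "dropped" end)
    model_pop(false)             -- remove the last handler without running it
    error("something failed", 0)
end)

-- Remaining handlers ran last-in, first-out; the popped one never ran.
print(ok, table.concat(trace, ","))
```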

       Every fiber has an implicit root scope so you don't always need to
       create one yourself. Lua's standard pcall() is also modified to act
       as a scope, which is a lot of convenience for you.

	   Important

	   Given pcall() is also a cancellation point, examples written
	   enclosed in WARNING blocks from the previous section had bugs
	   related to maintaining invariants. The scope() family is the
	   safest way to register cleanup handlers.

IO OBJECTS
       It's not unrealistic to share a single IO object among multiple fibers.
       The following snippets are based on real-world code (the original code
       was not written in Lua):

       Fiber ping-sender

	   while true do
	       sleep(20)
	       write_mutex:lock()
	       scope_cleanup_push(function() write_mutex:unlock() end)
	       local ok	= pcall(function() ws:ping() end)
	       if not ok then
		   return
	       end
	       scope_cleanup_pop()
	   end

       Fiber consume-subscriptions

	   while true do
	       local ok	= pcall(function()
		   -- `app` may	call `write_mutex:lock()`
		   app:consume_subscriptions()
	       end)
	       if not ok then
		   return
	       end
	       -- uses `condition_variable`
	       app:wait_on_subscriptions()
	   end

       Fiber main

	   local buffer	= buffer.new(1024)
	   while true do
	       local ok	= pcall(function()
		   local nread = ws:read(buffer)
		   -- `app` may	call `write_mutex:lock()`
		   app:on_ws_read(buffer, nread)
	       end)
	       if not ok then
		   break
	       end
	   end

	   f1:cancel()
	   f2:cancel()
	   this_fiber.disable_cancellation()
	   f1:join()
	   f2:join()

       A fiber will never be canceled in the middle (tricky concept to define)
       of some IO operation. If	a fiber	suspended on some IO operation and it
       was successfully	canceled, it means the operation is not	delivered at
       all and can be tried again later	as if it never happened	in the first
       place. The following artificial example illustrates this	guarantee
       (restricting the	IO object to a single fiber to keep the	code sample
       small and easy to follow):

	   scope_cleanup_push(function()
	       my_sctp_sock:write(checksum.shutdown_msg)
	   end)
	   while true do
	       sleep(20)
	       my_sctp_sock:write(broadcast_msg)
	       checksum:update(broadcast_msg)
	   end

       If the cancellation request arrives when	the fiber is suspended at
       my_sctp_sock:write(), the runtime will schedule cancellation of the
       underlying IO operation and only	resume the fiber when the reply	for
       the cancellation	request	arrives. At this point,	if the original	IO
       operation already succeeded, fiber_canceled exception won't be raised
       so you have a chance to examine the result and the cancellation
       handling	will be	postponed to the next cancellation point.
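
       The rule stated above can be summarised with a small plain-Lua model
       of what the fiber observes when it wakes up. Everything here is
       hypothetical (finish_suspension, the op table and its fields) and only
       mirrors the paragraph above, not the runtime's actual code.

```lua
-- Hypothetical model of what a fiber observes when it wakes up from a
-- suspended IO operation.
local function finish_suspension(cancel_requested, op)
    if cancel_requested and not op.completed then
        -- The operation was aborted before taking effect.
        error("fiber_canceled", 0)
    end
    -- Either nobody asked for cancellation, or the operation won the race:
    -- the result is returned and the request waits for the next
    -- cancellation point.
    return op.result
end

local ok1, r1 = pcall(finish_suspension, true, { completed = true, result = 42 })
local ok2, r2 = pcall(finish_suspension, true, { completed = false })

print(ok1, r1)   -- the completed operation's result is observable
print(ok2, r2)   -- the aborted operation surfaces as fiber_canceled
```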

	   Important

	   The pcall() family actually provides	the same fundamental
	   guarantee. Once it starts executing the argument passed, it won't
	   throw any fiber_canceled exception so you have a chance to examine
	   the result of the executed code. The	pcall()	family only checks for
	   cancellation	requests before	executing the argument.

	   Note

	   Some	IO objects might use relaxed semantics here to avoid expensive
	   implementations. For	instance, HTTP sockets might close the
	   underlying TCP socket if you	cancel an IO operation to avoid
	   bookkeeping state.

	   Refer to their documentation to check when the behaviour uses
	   relaxed semantics. All in all, they should never block
	   indefinitely. That's a guarantee you can rely on. Preferably, they
	   won't use a timeout to react to cancellations either (that would be
	   just bad).

USER-LEVEL COROUTINES
	   Important

	   Cancelability is not a property of the coroutine. The coroutine
	   can be created in one fiber, started in a second fiber and resumed
	   in a third one. Cancelability is a property of the fiber.

	   fibonacci = coroutine.create(function()
	       local a,	b = 0, 1
	       while true do
		   a, b	= b, a + b
		   coroutine.yield(a)
	       end
	   end)

       coroutine.resume() swallows exceptions raised within the	coroutine,
       just like pcall(). Therefore, the runtime guarantees coroutine.resume()
       enjoys the same properties found	in pcall():

       •   coroutine.resume() is a cancellation point.

       •   coroutine.resume() only checks for cancellation requests before
	   resuming the coroutine (i.e. the cancellation notification is not
	   fully asynchronous).

       •   Like pcall(), coroutine.create() will also create a new scope() for
	   the closure. However, this scope (and any nested one) is
	   independent from the parent fiber and tied not to the enclosing
	   parent fiber's lexical scopes but to the coroutine lifetime.
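
       The claim that coroutine.resume() swallows errors the way pcall() does
       is standard Lua behaviour and can be checked without Emilua. The short
       plain-Lua example below shows the three outcomes a caller may see.

```lua
local co = coroutine.create(function()
    coroutine.yield("first value")
    error("failure inside the coroutine", 0)
end)

-- 1st resume: the coroutine yields, so we get true plus the yielded value.
local ok1, v1 = coroutine.resume(co)
-- 2nd resume: the error does not propagate; we get false plus the message.
local ok2, v2 = coroutine.resume(co)
-- 3rd resume: the coroutine is dead and cannot be resumed again.
local ok3, v3 = coroutine.resume(co)

print(ok1, v1)
print(ok2, v2)
print(ok3, v3)
print(coroutine.status(co))
```

       Because the error is returned rather than raised, a caller that
       ignores the first return value loses the failure, which is the same
       hazard as an unchecked pcall().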

       We can't guarantee deterministic resumption of zombie coroutines to
       (re-)deliver cancellation requests (nor should we). Therefore, if the
       GC collects any of your unreachable coroutines that still have cleanup
       handlers pending (i.e. scope_cleanup_pop() calls left to be done), it
       does nothing besides collecting the coroutine stack. You have to
       prepare your code to cope with this non-guarantee, otherwise you will
       most likely end up with buggy code.

	   local co = coroutine.create(function()
	       m:lock()
	       -- this handler will never be called
	       scope_cleanup_push(function() m:unlock()	end)
	       coroutine.yield()
	   end)

	   coroutine.resume(co)

       The safe	bet is to just structure the code in a way that	there is no
       need to call scope_cleanup_push() within	user-created coroutines.
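
       One way to follow this advice is to keep resource acquisition and
       release in the code that drives the coroutine, so that nothing depends
       on the coroutine ever being resumed again. The sketch below is plain
       Lua with a fake mutex (a table with a locked flag) standing in for a
       real one; next_value is a hypothetical helper. In Emilua, the release
       would more naturally be registered with scope_cleanup_push() in the
       fiber itself.

```lua
-- Stand-in for a mutex, just enough to observe its state.
local m = { locked = false }
function m:lock() assert(not self.locked); self.locked = true end
function m:unlock() self.locked = false end

-- The coroutine only produces values; it owns no resources.
local producer = coroutine.create(function()
    for i = 1, 3 do
        coroutine.yield(i)
    end
end)

-- The caller holds the lock only around each resume.
local function next_value()
    m:lock()
    local ok, value = coroutine.resume(producer)
    m:unlock()               -- runs whether or not the coroutine failed
    if not ok then error(value, 0) end
    return value
end

local first = next_value()

-- Abandon the coroutine half-way through; nothing is left locked.
producer = nil
collectgarbage()
print(first, m.locked)
```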

RECAP
       The fiber handle	returned by spawn() has	a cancel() member-function
       that can	be used	to cancel joinable fibers. The fiber only gets
       canceled	at cancellation	points.	To preserve invariants your app	relies
       on, register cleanup handlers with scope_cleanup_push().

       The relationship	between	user-created coroutines	and cancellations is
       tricky. Therefore, you should avoid creating (either manually or
       through some abstraction) cleanup handlers within them.

	   this_fiber.disable_cancellation()
	   local numbers = {8, 42, 38, 111, 2, 39, 1}

	   local sleeper = spawn(function()
	       local children =	{}
	       scope_cleanup_push(function()
		   for _, f in pairs(children) do
		       f:cancel()
		   end
	       end)
	       for _, n	in pairs(numbers) do
		   children[#children +	1] = spawn(function()
		       sleep(n)
		       print(n)
		   end)
	       end
	       for _, f	in pairs(children) do
		   f:join()
	       end
	   end)

	   local sigwaiter = spawn(function()
	       local sigusr1 = signals.new(signals.SIGUSR1)
	       sigusr1:wait()
	       sleeper:cancel()
	   end)

	   sleeper:join()
	   sigwaiter:cancel()

NOTES
       [1]    I initially drafted a design to recover in limited scenarios
	      (check git history if you're curious), but then realized it was
	      not only brittle but also unable to handle leaked fiber handles.
	      Worse, it was very sensitive to leaked fiber handles. Therefore I
	      dismissed the idea altogether.

Emilua 0.11.2			  2025-03-12			     EMILUA(7)
