PCBGROUP(9)	       FreeBSD Kernel Developer's Manual	   PCBGROUP(9)

     PCBGROUP -- Distributed Protocol Control Block Groups

     options PCBGROUP

     #include <sys/param.h>
     #include <netinet/in.h>
     #include <netinet/in_pcb.h>

     in_pcbgroup_init(struct inpcbinfo *pcbinfo, u_int hashfields,
           int hash_nelements);

     in_pcbgroup_destroy(struct inpcbinfo *pcbinfo);

     struct inpcbgroup *
     in_pcbgroup_byhash(struct inpcbinfo *pcbinfo, u_int hashtype,
           uint32_t hash);

     struct inpcbgroup *
     in_pcbgroup_byinpcb(struct inpcb *inp);

     in_pcbgroup_update(struct inpcb *inp);

     in_pcbgroup_update_mbuf(struct inpcb *inp, struct mbuf *m);

     in_pcbgroup_remove(struct inpcb *inp);

     in_pcbgroup_enabled(struct inpcbinfo *pcbinfo);

     #include <netinet6/in6_pcb.h>

     struct inpcbgroup *
     in6_pcbgroup_byhash(struct inpcbinfo *pcbinfo, u_int hashtype,
           uint32_t hash);

     This implementation introduces notions of affinity for connections and
     distributes work so as to reduce lock contention, in combination with
     hardware work distribution strategies such as RSS.  In this construction,
     connection groups supplement, rather than replace, existing reservation
     tables for protocol 4-tuples, offering CPU-affine lookup tables with
     minimal cache line migration and lock contention during steady state
     operation.

     Internet protocols	like UDP and TCP register to use connection groups by
     providing an ipi_hashfields value other than IPI_HASHFIELDS_NONE.	This
     indicates to the connection group code whether a 2-tuple or 4-tuple is
     used as an	argument to hashes that	assign a connection to a particular
     group.  This must be aligned with any hardware-offloaded distribution
     model, such as RSS	or similar approaches taken in embedded	network
     boards.  Wildcard sockets require special handling, as in Willmann 2006,
     and are shared between connection groups while being protected by
     group-local locks.  Connection establishment and teardown can be
     significantly more expensive than without connection groups, but
     steady-state processing can be significantly faster.
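
     The difference between a 2-tuple and a 4-tuple hash can be illustrated
     with a small userspace sketch.  It is not the kernel implementation: the
     enum, the struct, and the function name are hypothetical, and the XOR
     fold is assumed only for brevity.  A real configuration has to reproduce
     whatever hash the hardware computes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's ipi_hashfields values. */
enum sketch_hashfields {
	SKETCH_HASHFIELDS_NONE,		/* protocol does not use groups */
	SKETCH_HASHFIELDS_2TUPLE,	/* hash the two addresses only */
	SKETCH_HASHFIELDS_4TUPLE	/* hash addresses and ports */
};

/* Hypothetical IPv4 connection tuple, host byte order. */
struct sketch_tuple {
	uint32_t laddr, faddr;
	uint16_t lport, fport;
};

/*
 * Fold the selected fields into a 32-bit value.  XOR is an assumption
 * made for this sketch, not the hash a NIC or the kernel would use.
 */
uint32_t
sketch_tuple_hash(enum sketch_hashfields fields, const struct sketch_tuple *t)
{
	uint32_t h;

	assert(fields != SKETCH_HASHFIELDS_NONE);
	h = t->laddr ^ t->faddr;
	if (fields == SKETCH_HASHFIELDS_4TUPLE)
		h ^= ((uint32_t)t->lport << 16) | t->fport;
	return (h);
}
```

     With the 2-tuple setting, two connections that differ only in their
     ports hash to the same value and so share a group; with the 4-tuple
     setting they can be spread across groups.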

     Enabling PCBGROUP in the kernel only provides the infrastructure required
     to	create and manage multiple PCB groups.	An implementation needs	to
     fill in a few functions to	provide	PCB group hash information in order
     for PCBs to be placed in a	PCB group.

     By default, each PCB info block (struct inpcbinfo) has a single hash for
     all PCB entries for the given protocol with a single lock protecting it.
     This can be a significant source of lock contention on SMP hardware.
     When a PCBGROUP is created, an array of separate hash tables is created,
     each with its own lock.  A separate table for wildcard PCBs is provided.
     By default, a PCBGROUP table is created for each available CPU.  The
     PCBGROUP code attempts to calculate a hash value from the given PCB or
     mbuf when looking up a PCBGROUP.  While processing a received frame,
     in_pcbgroup_byhash() can be used in conjunction with either a
     hardware-provided hash value (e.g., the RSS(9) calculated hash value
     provided by some NICs) or a software-provided hash value in order to
     choose a PCBGROUP table to query.  A single table lock is held while
     performing a wildcard match.  However, all of the table locks are
     acquired before modifying the wildcard table.  The PCBGROUP tables
     operate in conjunction with the normal single PCB list in a PCB info
     block.  Thus, inserting and removing a PCB will still incur the same
     costs as without PCBGROUP.  A protocol which uses PCBGROUP should fall
     back to the normal PCB list lookup if a call to the PCBGROUP layer does
     not yield a lookup hit.
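
     The fallback rule in the last sentence can be sketched in userspace as
     follows.  Every name here is hypothetical, locking is omitted, and a
     connection is identified by its hash alone, which a real lookup would
     not do (it compares the full tuple).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FALLBACK_NGROUPS	2	/* assumed group count */
#define FALLBACK_SLOTS		4	/* assumed table capacity */

struct fallback_conn {
	uint32_t hash;		/* stands in for the connection tuple */
	int id;
};

struct fallback_table {
	struct fallback_conn slot[FALLBACK_SLOTS];
	size_t used;
};

struct fallback_info {
	struct fallback_table global;			/* normal PCB list */
	struct fallback_table group[FALLBACK_NGROUPS];	/* per-group tables */
};

/* Linear search of one table; returns NULL on a miss. */
const struct fallback_conn *
fallback_table_find(const struct fallback_table *t, uint32_t hash)
{
	for (size_t i = 0; i < t->used; i++)
		if (t->slot[i].hash == hash)
			return (&t->slot[i]);
	return (NULL);
}

/*
 * Try the group table selected by the hash first; on a miss, fall back
 * to the global table.  *via_group reports which path produced the hit.
 */
const struct fallback_conn *
fallback_lookup(const struct fallback_info *info, uint32_t hash,
    int *via_group)
{
	const struct fallback_conn *c;

	c = fallback_table_find(&info->group[hash % FALLBACK_NGROUPS], hash);
	*via_group = (c != NULL);
	if (c == NULL)
		c = fallback_table_find(&info->global, hash);
	return (c);
}
```

     A connection present only in the global table is still found, but
     without the contention benefit of the group path.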

     Initialize a PCBGROUP in a PCB info block (struct inpcbinfo) by calling

     Add a connection to a PCBGROUP with in_pcbgroup_update().  Connections
     are removed with in_pcbgroup_remove().  These in turn will determine
     which PCBGROUP bucket the given PCB is placed into and calculate the hash
     value appropriately.
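
     As a rough userspace model of this bookkeeping, the sketch below maps a
     hash to a bucket with the hash % number_of_pcbgroups rule that this page
     gives as the default, and moves a PCB between buckets when its hash
     changes.  All names are hypothetical, and per-group membership is
     reduced to a counter.

```c
#include <assert.h>
#include <stdint.h>

#define BUCKET_NGROUPS	4	/* assumed group count */

struct bucket_pcb {
	uint32_t hash;		/* current connection hash */
	int group;		/* current group index, -1 if in none */
};

struct bucket_groups {
	unsigned int count[BUCKET_NGROUPS];	/* PCBs per group */
};

/* Default mapping named in this page: hash modulo the group count. */
unsigned int
bucket_get(uint32_t hash, unsigned int ngroups)
{
	assert(ngroups > 0);
	return (hash % ngroups);
}

/* Place the PCB in the group its hash selects, leaving any old group. */
void
bucket_update(struct bucket_groups *g, struct bucket_pcb *p)
{
	unsigned int newgroup = bucket_get(p->hash, BUCKET_NGROUPS);

	if (p->group == (int)newgroup)
		return;
	if (p->group >= 0)
		g->count[p->group]--;
	g->count[newgroup]++;
	p->group = (int)newgroup;
}

/* Detach the PCB from whatever group currently holds it. */
void
bucket_remove(struct bucket_groups *g, struct bucket_pcb *p)
{
	if (p->group < 0)
		return;
	g->count[p->group]--;
	p->group = -1;
}
```

     Calling the update routine again after the hash changes, for example
     once a connection's addresses and ports become fully specified, moves
     the PCB to the newly selected group.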

     Wildcard PCBs are hashed differently and placed in	a single wildcard PCB
     list.  If RSS(9) is enabled and in	use, RSS-aware wildcard	PCBs are
     placed in a single	PCBGROUP based on RSS information.  Protocols may look
     up	the PCB	entry in a PCBGROUP by using the lookup	functions
     in_pcbgroup_byhash() and in_pcbgroup_byinpcb().
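
     The locking rule described earlier for the wildcard table (one table
     lock for a match, every table lock for a modification) can be modelled
     in userspace with POSIX mutexes.  This is only a sketch: the names are
     hypothetical, the wildcard table is reduced to a counter, and taking
     the locks in index order is an assumption made to keep the lock order
     consistent.

```c
#include <assert.h>
#include <pthread.h>

#define WILD_NGROUPS	4	/* assumed group count */

struct wild_groups {
	pthread_mutex_t lock[WILD_NGROUPS];	/* one lock per group */
	int wildcard_count;			/* stands in for the table */
};

void
wild_groups_init(struct wild_groups *gs)
{
	for (int i = 0; i < WILD_NGROUPS; i++)
		pthread_mutex_init(&gs->lock[i], NULL);
	gs->wildcard_count = 0;
}

/* Reader: a wildcard match holds only the caller's own group lock. */
int
wild_match(struct wild_groups *gs, unsigned int idx)
{
	int n;

	assert(idx < WILD_NGROUPS);
	pthread_mutex_lock(&gs->lock[idx]);
	n = gs->wildcard_count;
	pthread_mutex_unlock(&gs->lock[idx]);
	return (n);
}

/* Writer: take every group lock, in index order, before modifying. */
void
wild_insert(struct wild_groups *gs)
{
	for (int i = 0; i < WILD_NGROUPS; i++)
		pthread_mutex_lock(&gs->lock[i]);
	gs->wildcard_count++;
	for (int i = WILD_NGROUPS - 1; i >= 0; i--)
		pthread_mutex_unlock(&gs->lock[i]);
}
```

     Because a writer holds all of the locks, no reader can observe a
     partial update, and readers in different groups never contend with one
     another.  The cost is that inserting or removing a wildcard entry
     scales with the number of groups.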

     The PCB code in sys/netinet and sys/netinet6 is aware of PCBGROUP and
     will call into the	PCBGROUP code to do PCBGROUP assignment	and lookup,
     preferring	a PCBGROUP lookup to the default global	PCB info table.

     An implementor wishing to experiment with or modify the PCBGROUP
     assignment should modify this set of functions:

           in_pcbgroup_getbucket() and in6_pcbgroup_getbucket()
                     Map a given 32-bit hash value to a PCBGROUP.  By default
                     this is hash % number_of_pcbgroups.  However, this
                     distribution may not align with NIC receive queues or the
                     netisr(9) configuration.

           in_pcbgroup_byhash() and in6_pcbgroup_byhash()
                     Map a 32-bit hash value and a hash type identifier to a
                     PCBGROUP.  By default, this simply returns NULL.  This
                     function is used by the mbuf(9) receive path in
                     sys/netinet/in_pcb.c to map an mbuf to a PCBGROUP.

           in_pcbgroup_bytuple() and in6_pcbgroup_bytuple()
                     Map the source and destination address and port details
                     to a PCBGROUP.  By default, this does a very simple XOR
                     hash.  This function is used both by the PCB lookup code
                     and as a fallback in the mbuf(9) receive path in
     mbuf(9), netisr(9), RSS(9)

     Paul Willmann, Scott Rixner, and Alan L. Cox, "An Evaluation of Network
     Stack Parallelization Strategies in Modern Operating Systems", 2006
     USENIX Annual Technical Conference, 2006.

     PCBGROUP first appeared in	FreeBSD	9.0.

     The PCBGROUP implementation was written by Robert N. M. Watson under
     contract to Juniper Networks, Inc.

     This manual page was written by Adrian Chadd.

     The RSS(9) implementation currently uses #ifdef blocks to tie into
     PCBGROUP.  This is a sign that a more abstract programming API is needed.

     There is currently	no support for re-balancing the	PCBGROUP assignment,
     nor is there any support for overriding which PCBGROUP a socket/PCB
     should be in.

     No statistics are kept to indicate how often PCBGROUP lookups succeed or
     fail.

FreeBSD	13.0			 July 23, 2014			  FreeBSD 13.0

