TUNING(7)		Miscellaneous Information Manual	     TUNING(7)

NAME
       tuning -- performance tuning under FreeBSD

SYSTEM SETUP - DISKLABEL, NEWFS, TUNEFS, SWAP
       When using bsdlabel(8) or sysinstall(8) to lay out your file systems on
       a  hard	disk it	is important to	remember that hard drives can transfer
       data much more quickly from outer  tracks  than	they  can  from	 inner
       tracks.	 To take advantage of this you should try to pack your smaller
       file systems and	swap closer to	the  outer  tracks,  follow  with  the
       larger file systems, and	end with the largest file systems.  It is also
       important  to  size system standard file	systems	such that you will not
       be forced to resize them	later as you scale the machine up.  I  usually
       create,	in  order,  a 128M root, 1G swap, 128M /var, 128M /var/tmp, 3G
       /usr, and use any remaining space for /home.
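
       For example, such a layout might appear as follows in a bsdlabel(8)
       editing session.  Sizes are in 512-byte sectors; the offsets, the
       /home size, and the fsize/bsize/cpg columns are illustrative only,
       and the conventional "c" partition covering the whole disk is
       omitted:

             #        size   offset    fstype   [fsize bsize bps/cpg]
               a:   262144        0    4.2BSD     2048 16384 28552   # 128M /
               b:  2097152   262144      swap                        # 1G swap
               d:   262144  2359296    4.2BSD     2048 16384 28552   # 128M /var
               e:   262144  2621440    4.2BSD     2048 16384 28552   # 128M /var/tmp
               f:  6291456  2883584    4.2BSD     2048 16384 28552   # 3G /usr
               g: 68965120  9175040    4.2BSD     2048 16384 28552   # remainder: /home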

       You should typically size your swap space to approximately 2x main mem-
       ory.  If	you do not have	a lot of RAM, though, you will generally  want
       a  lot  more  swap.   It	is not recommended that	you configure any less
       than 256M of swap on a system and you should keep in mind future	memory
       expansion when sizing the swap partition.  The kernel's VM paging algo-
       rithms are tuned	to perform best	when there is at least 2x swap	versus
       main memory.  Configuring too little swap can lead to inefficiencies in
       the  VM page scanning code as well as create issues later on if you add
       more memory to your machine.  Finally, on larger	systems	with  multiple
       SCSI  disks (or multiple	IDE disks operating on different controllers),
       we strongly recommend that you configure	swap on	each drive.  The  swap
       partitions  on  the  drives should be approximately the same size.  The
       kernel can handle arbitrary sizes but internal data structures scale to
       4 times the largest swap	partition.  Keeping the	swap  partitions  near
       the  same  size	will  allow  the kernel	to optimally stripe swap space
       across the N disks.  Do not worry about overdoing it a little; swap
       space is the saving grace of Unix, and even if you do not normally use
       much swap, it can give you more time to recover from a runaway program
       before being forced to reboot.

       How  you	size your /var partition depends heavily on what you intend to
       use the machine for.  This partition is primarily used  to  hold	 mail-
       boxes,  the print spool,	and log	files.	Some people even make /var/log
       its own partition (but except for extreme cases it  is  not  worth  the
       waste of	a partition ID).  If your machine is intended to act as	a mail
       or  print  server, or you are running a heavily visited web server, you
       should consider creating	a much larger partition	-  perhaps  a  gig  or
       more.  It is very easy to underestimate log file	storage	requirements.

       Sizing  /var/tmp	 depends on the	kind of	temporary file usage you think
       you will	need.  128M is the  minimum  we	 recommend.   Also  note  that
       sysinstall  will	 create	 a /tmp	directory.  Dedicating a partition for
       temporary file storage is important for two reasons: first, it  reduces
       the possibility of file system corruption in a crash, and second	it re-
       duces  the  chance  of  a runaway process that fills up [/var]/tmp from
       blowing up more critical	subsystems (mail, logging, etc).   Filling  up
       [/var]/tmp is a very common problem to have.

       In  the	old days there were differences	between	/tmp and /var/tmp, but
       the introduction	of /var	(and /var/tmp) led  to	massive	 confusion  by
       program	writers	so today programs haphazardly use one or the other and
       thus no real distinction	can be made between  the  two.	 So  it	 makes
       sense  to have just one temporary directory and softlink	to it from the
       other tmp directory locations.  However you handle /tmp,	the one	 thing
       you  do	not want to do is leave	it sitting on the root partition where
       it might	cause root to fill up or possibly corrupt root in a  crash/re-
       boot situation.

       The  /usr partition holds the bulk of the files required	to support the
       system and a subdirectory within	it called /usr/local holds the bulk of
       the files installed from	the ports(7) hierarchy.	 If  you  do  not  use
       ports  all that much and	do not intend to keep system source (/usr/src)
       on the machine, you can get away	with  a	 1  gigabyte  /usr  partition.
       However,	 if you	install	a lot of ports (especially window managers and
       Linux-emulated binaries), we recommend at least a 2 gigabyte  /usr  and
       if you also intend to keep system source	on the machine,	we recommend a
       3 gigabyte /usr.  Do not underestimate the amount of space you will
       need in this partition; it can creep up and surprise you!

       The /home partition is typically	used to	hold  user-specific  data.   I
       usually size it to the remainder	of the disk.

       Why  partition  at all?	Why not	create one big / partition and be done
       with it?	 Then I	do not have to worry about undersizing things!	 Well,
       there  are several reasons this is not a	good idea.  First, each	parti-
       tion has	different operational characteristics and separating them  al-
       lows  the file system to	tune itself to those characteristics.  For ex-
       ample, the root and /usr	partitions are read-mostly, with  very	little
       writing,	 while	a  lot	of reading and writing could occur in /var and
       /var/tmp.  By properly partitioning your	 system	 fragmentation	intro-
       duced  in  the  smaller	more  heavily write-loaded partitions will not
       bleed over into the mostly-read partitions.  Additionally, keeping  the
       write-loaded  partitions	 closer	 to the	edge of	the disk (i.e.,	before
       the really big partitions instead of after in the partition table) will
       increase	I/O performance	in the partitions where	you need it the	 most.
       Now  it	is true	that you might also need I/O performance in the	larger
       partitions, but they are	so large that shifting them more  towards  the
       edge of the disk	will not lead to a significant performance improvement
       whereas moving /var to the edge can have	a huge impact.	Finally, there
       are safety concerns.  Having a small neat root partition	that is	essen-
       tially read-only	gives it a greater chance of surviving a bad crash in-
       tact.

       Properly	partitioning your system also allows you to tune newfs(8), and
       tunefs(8) parameters.  Tuning newfs(8) requires more experience but can
       lead to significant improvements	in performance.	 There are three para-
       meters  that  are relatively safe to tune: blocksize, bytes/i-node, and
       cylinders/group.

       FreeBSD performs	best when using	8K or 16K  file	 system	 block	sizes.
       The  default file system	block size is 16K, which provides best perfor-
       mance for most applications, with the exception of those	 that  perform
       random  access on large files (such as database server software).  Such
       applications tend to perform better with	a smaller block	size, although
       modern disk characteristics are such that the performance gain from us-
       ing a smaller block size	may not	be worth consideration.	 Using a block
       size larger than	16K can	cause fragmentation of the  buffer  cache  and
       lead to lower performance.

       The  defaults  may be unsuitable	for a file system that requires	a very
       large number of i-nodes or is intended to hold a	large number  of  very
       small  files.   Such  a	file system should be created with an 8K or 4K
       block size.  This also requires you to specify a	smaller	fragment size.
       We recommend always using a fragment size that is 1/8  the  block  size
       (less  testing  has  been  done	on  other fragment size	factors).  The
       newfs(8)	options	for this would be "newfs -f 1024 -b 8192 ...".

       If a large partition is intended	to  be	used  to  hold	fewer,	larger
       files,  such as database	files, you can increase	the bytes/i-node ratio
       which reduces the number	of i-nodes (maximum number of files and	direc-
       tories that can be created) for that partition.	Decreasing the	number
       of  i-nodes  in a file system can greatly reduce	fsck(8)	recovery times
       after a crash.  Do not use this option unless you are actually  storing
       large  files  on	 the  partition, because if you	overcompensate you can
       wind up with a file system that has lots	of free	 space	remaining  but
       cannot  accommodate  any	 more  files.	Using  32768, 65536, or	262144
       bytes/i-node is recommended.  You can go	higher but it will  have  only
       incremental  effects on fsck(8) recovery	times.	For example, "newfs -i
       32768 ...".
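
       Putting the block size, fragment size, and bytes/i-node options
       together, a partition intended for large database files might, for
       example, be created as follows (the device name is illustrative):

             newfs -b 16384 -f 2048 -i 65536 /dev/da1s1e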

       tunefs(8) may be	used to	further	tune a file system.  This command  can
       be  run in single-user mode without having to reformat the file system.
       However,	this is	possibly the most abused program in the	system.	  Many
       people  attempt	to increase available file system space	by setting the
       min-free	percentage to 0.  This can lead	to severe file system fragmen-
       tation and we do	not recommend that  you	 do  this.   Really  the  only
       tunefs(8) option	worthwhile here	is turning on softupdates with "tunefs
       -n  enable  /filesystem".  (Note: in FreeBSD 4.5	and later, softupdates
       can be turned on	using the -U option  to	 newfs(8),  and	 sysinstall(8)
       will  typically enable softupdates automatically	for non-root file sys-
       tems).  Softupdates drastically improves	meta-data performance,	mainly
       file  creation and deletion.  We	recommend enabling softupdates on most
       file systems; however, there are	two limitations	 to  softupdates  that
       you  should  be	aware  of when determining whether to use it on	a file
       system.	First, softupdates guarantees file system consistency  in  the
       case  of	 a  crash  but	could  very  easily be several seconds (even a
       minute!) behind on pending writes to the physical disk.  If you crash
       you  may	 lose  more work than otherwise.  Secondly, softupdates	delays
       the freeing of file system blocks.  If you have a file system (such  as
       the  root  file system) which is	close to full, doing a major update of
       it, e.g.	"make installworld", can run it	out of space and cause the up-
       date to fail.  For this reason, softupdates will	not be enabled on  the
       root file system	during a typical install.  There is no loss of perfor-
       mance since the root file system	is rarely written to.
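
       For example (the device and mount point are illustrative):

             newfs -U /dev/da0s1f    # enable softupdates at creation (4.5+)
             tunefs -n enable /usr   # or enable later, in single-user mode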

       A  number of run-time mount(8) options exist that can help you tune the
       system.	The most obvious and most dangerous one	is async.  Do not ever
       use it; it is far too dangerous.	 A  less  dangerous  and  more	useful
       mount(8)	 option	 is called noatime.  Unix file systems normally	update
       the last-accessed time of a file	or directory whenever it is  accessed.
       This  operation is handled in FreeBSD with a delayed write and normally
       does not	create a burden	on the system.	However, if your system	is ac-
       cessing a huge number of	files on a continuing basis the	 buffer	 cache
       can  wind  up getting polluted with atime updates, creating a burden on
       the system.  For	example, if you	are running a heavily loaded web site,
       or a news server	with lots of readers, you might	want to	consider turn-
       ing off atime updates on	your larger partitions with this mount(8)  op-
       tion.   However,	 you  should  not  gratuitously	turn off atime updates
       everywhere.  For	example, the /var file system customarily holds	 mail-
       boxes,  and  atime  (in	combination  with  mtime) is used to determine
       whether a mailbox has new mail.  You should also leave atime turned on
       for mostly read-only partitions such as / and /usr.  This is
       especially useful for / since some system utilities use the atime
       field for reporting.
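
       For example, noatime is normally set in fstab(5); the devices and
       mount points here are illustrative:

             # Device         Mountpoint   FStype   Options       Dump   Pass#
             /dev/da0s1f      /usr         ufs      rw            2      2
             /dev/da0s1g      /home        ufs      rw,noatime    2      2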

STRIPING DISKS
       In  larger  systems  you	 can stripe partitions from several drives to-
       gether to create	a much larger overall partition.   Striping  can  also
       improve	the  performance  of a file system by splitting	I/O operations
       across two or more disks.  The vinum(8) and ccdconfig(8)	utilities  may
       be  used	 to  create  simple striped file systems.  Generally speaking,
       striping	smaller	partitions such	as the root and	 /var/tmp,  or	essen-
       tially  read-only  partitions such as /usr is a complete	waste of time.
       You should only stripe partitions that require serious I/O performance,
       typically /var, /home, or custom	partitions used	to hold	databases  and
       web  pages.   Choosing  the proper stripe size is also important.  File
       systems tend to store meta-data on power-of-2 boundaries	and  you  usu-
       ally  want  to reduce seeking rather than increase seeking.  This means
       you want	to use a large off-center stripe size such as 1152 sectors  so
       sequential I/O does not seek both disks and so meta-data	is distributed
       across  both  disks  rather than	concentrated on	a single disk.	If you
       really need to get sophisticated, we recommend using  a	real  hardware
       RAID controller from the	list of	FreeBSD	supported controllers.
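
       As a sketch, a two-disk stripe using the 1152-sector stripe size
       mentioned above might be assembled with ccdconfig(8) roughly as
       follows (the device names are illustrative, and the resulting ccd
       device is formatted and mounted like any other disk):

             ccdconfig ccd0 1152 0 /dev/da1s1e /dev/da2s1e
             newfs /dev/ccd0c
             mount /dev/ccd0c /var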

SYSCTL TUNING
       sysctl(8)  variables  permit  system  behavior to be monitored and con-
       trolled at run-time.  Some sysctls simply report	on the behavior	of the
       system; others allow the	system behavior	to be modified;	 some  may  be
       set   at	 boot  time  using  rc.conf(5),	 but  most  will  be  set  via
       sysctl.conf(5).	There are several hundred sysctls in the  system,  in-
       cluding	many  that appear to be	candidates for tuning but actually are
       not.  In	this document we will only cover the ones that have the	great-
       est effect on the system.

       The kern.ipc.shm_use_phys sysctl	defaults to 0 (off) and	may be set  to
       0 (off) or 1 (on).  Setting this	parameter to 1 will cause all System V
       shared  memory  segments	to be mapped to	unpageable physical RAM.  This
       feature only has	an effect if you are either (A)	mapping	small  amounts
       of  shared  memory  across many (hundreds) of processes,	or (B) mapping
       large amounts of	shared memory across any number	 of  processes.	  This
       feature	allows	the  kernel  to	remove a great deal of internal	memory
       management page-tracking	overhead at the	cost of	wiring the shared mem-
       ory into	core, making it	unswappable.
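
       For example, to set this at every boot, add the following line to
       sysctl.conf(5):

             kern.ipc.shm_use_phys=1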

       The vfs.vmiodirenable sysctl defaults to	1 (on).	 This  parameter  con-
       trols  how  directories are cached by the system.  Most directories are
       small and use but a single fragment (typically 1K) in the  file	system
       and even	less (typically	512 bytes) in the buffer cache.	 However, when
       operating  in the default mode the buffer cache will only cache a fixed
       number of directories even if you have a	huge amount of memory.	 Turn-
       ing  on this sysctl allows the buffer cache to use the VM Page Cache to
       cache the directories.  The advantage is	that  all  of  memory  is  now
       available  for caching directories.  The	disadvantage is	that the mini-
       mum in-core memory used to cache	a directory is the physical page  size
       (typically 4K) rather than 512 bytes.  We recommend turning this	option
       off  in memory-constrained environments;	however, when on, it will sub-
       stantially improve the performance of services that manipulate a	 large
       number of files.	 Such services can include web caches, large mail sys-
       tems,  and news systems.	 Turning on this option	will generally not re-
       duce performance	even with the wasted memory but	you should  experiment
       to find out.

       The  vfs.write_behind  sysctl  defaults to 1 (on).  This	tells the file
       system to issue media writes as full clusters are collected, which typ-
       ically occurs when writing large	sequential  files.   The  idea	is  to
       avoid  saturating the buffer cache with dirty buffers when it would not
       benefit I/O performance.	 However, this may stall processes  and	 under
       certain circumstances you may wish to turn it off.

       The vfs.hirunningspace sysctl determines	how much outstanding write I/O
       may be queued to disk controllers system-wide at any given instant.
       The default is usually sufficient but on	machines with  lots  of	 disks
       you  may	 want to bump it up to four or five megabytes.	Note that set-
       ting too	high a value (exceeding	the buffer  cache's  write  threshold)
       can  lead  to  extremely	 bad  clustering performance.  Do not set this
       value arbitrarily high!	Also, higher write queueing values may add la-
       tency to	reads occurring	at the same time.

       There are various other buffer-cache and	VM page	cache related sysctls.
       We do not recommend modifying these values.  As of FreeBSD 4.3, the  VM
       system does an extremely	good job tuning	itself.

       The  net.inet.tcp.sendspace  and	 net.inet.tcp.recvspace	sysctls	are of
       particular interest if you are running network intensive	 applications.
       They  control  the  amount of send and receive buffer space allowed for
       any given TCP connection.  The default sending buffer is	32K;  the  de-
       fault  receiving	 buffer	 is 64K.  You can often	improve	bandwidth uti-
       lization	by increasing the default at the cost of eating	up more	kernel
       memory for each connection.  We do not  recommend  increasing  the  de-
       faults if you are serving hundreds or thousands of simultaneous connec-
       tions  because  it  is possible to quickly run the system out of	memory
       due to stalled connections building up.	But if you need	high bandwidth
       over a smaller number of connections, especially if you have gigabit
       Ethernet, increasing these defaults can make a huge difference.  You
       can
       adjust  the buffer size for incoming and	outgoing data separately.  For
       example,	if your	machine	is primarily doing web serving you may want to
       decrease	the recvspace in order to be able to  increase	the  sendspace
       without	eating	too  much  kernel memory.  Note	that the routing table
       (see route(8)) can be used to introduce route-specific send and receive
       buffer size defaults.
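
       As a sketch, a machine that mostly sends data might use something like
       the following in sysctl.conf(5); the exact values are illustrative and
       should be tested against your workload:

             net.inet.tcp.sendspace=65536
             net.inet.tcp.recvspace=32768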

       As an additional	management tool	you can	use  pipes  in	your  firewall
       rules  (see ipfw(8)) to limit the bandwidth going to or from particular
       IP blocks or ports.  For	example, if you	have a T1 you  might  want  to
       limit  your  web	traffic	to 70% of the T1's bandwidth in	order to leave
       the remainder available for mail	and interactive	use.  Normally a heav-
       ily loaded web server will not  introduce  significant  latencies  into
       other  services	even if	the network link is maxed out, but enforcing a
       limit can smooth	things out and lead to longer  term  stability.	  Many
       people also enforce artificial bandwidth	limitations in order to	ensure
       that they are not charged for using too much bandwidth.
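
       As an illustrative sketch: a T1 is roughly 1.5 Mbit/s, so 70% is about
       1080 Kbit/s.  A dummynet pipe enforcing that limit on outgoing web
       replies might look like:

             ipfw pipe 1 config bw 1080Kbit/s
             ipfw add pipe 1 tcp from any 80 to any out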

       Setting the send or receive TCP buffer to values larger than 65535
       yields only a marginal performance improvement unless both hosts
       support the window scaling extension of the TCP protocol, which is
       controlled
       by the net.inet.tcp.rfc1323 sysctl.  These extensions should be enabled
       and  the	 TCP buffer size should	be set to a value larger than 65536 in
       order to	obtain good performance	from certain types of  network	links;
       specifically,  gigabit  WAN  links  and	high-latency  satellite	links.
       RFC1323 support is enabled by default.

       The net.inet.tcp.always_keepalive sysctl	determines whether or not  the
       TCP implementation should attempt to detect dead	TCP connections	by in-
       termittently  delivering	 "keepalives"  on the connection.  By default,
       this is enabled for all applications; by	setting	this sysctl to 0, only
       applications that specifically request keepalives will  use  them.   In
       most environments, TCP keepalives will improve the management of	system
       state  by expiring dead TCP connections,	particularly for systems serv-
       ing dialup users	who may	not always terminate  individual  TCP  connec-
       tions before disconnecting from the network.  However, in some environ-
       ments,  temporary network outages may be	incorrectly identified as dead
       sessions, resulting in unexpectedly  terminated	TCP  connections.   In
       such environments, setting the sysctl to	0 may reduce the occurrence of
       TCP session disconnections.

       The  net.inet.tcp.delayed_ack  TCP  feature  is	largely	misunderstood.
       Historically speaking, this feature was designed to allow the
       acknowledgement of transmitted data to be returned along with the
       response.
       For example, when you type over a remote	shell, the acknowledgement  to
       the character you send can be returned along with the data representing
       the  echo of the	character.  With delayed acks turned off, the acknowl-
       edgement	may be sent in its own packet, before the remote service has a
       chance to echo the data it just received.  This same concept  also  ap-
       plies  to  any interactive protocol (e.g. SMTP, WWW, POP3), and can cut
       the number of tiny packets flowing across the  network  in  half.   The
       FreeBSD	delayed	 ACK implementation also follows the TCP protocol rule
       that at least every other packet	be acknowledged	even if	 the  standard
       100ms timeout has not yet passed.  Normally the worst a delayed ACK can
       do  is  slightly	 delay the teardown of a connection, or	slightly delay
       the ramp-up of a slow-start TCP connection.  While we are not sure, we
       believe that the several FAQs related to packages such as SAMBA and
       SQUID which advise turning off delayed acks are referring to the
       slow-start issue.  In FreeBSD, it would be more beneficial to increase
       the slow-start  flightsize  via	the  net.inet.tcp.slowstart_flightsize
       sysctl rather than disable delayed acks.

       The  net.inet.tcp.inflight.enable sysctl	turns on bandwidth delay prod-
       uct limiting for	all TCP	connections.  The system will attempt to  cal-
       culate  the  bandwidth  delay product for each connection and limit the
       amount of data queued to	the network to just  the  amount  required  to
       maintain	optimum	throughput.  This feature is useful if you are serving
       data over modems, GigE, or high speed WAN links (or any other link with
       a  high bandwidth*delay product), especially if you are also using win-
       dow scaling or have configured a	large send window.  If you enable this
       option, you should also be sure to set net.inet.tcp.inflight.debug to 0
       (disable	   debugging),	  and	 for	production     use     setting
       net.inet.tcp.inflight.min to at least 6144 may be beneficial.  Note,
       however, that setting high minimums may effectively disable bandwidth
       limiting	 depending  on	the  link.   The  limiting feature reduces the
       amount of data built up in intermediate router and switch packet	queues
       as well as reduces the amount of	data built up in the local host's  in-
       terface	queue.	With fewer packets queued up, interactive connections,
       especially over slow modems, will also be able to  operate  with	 lower
       round  trip  times.   However, note that	this feature only affects data
       transmission (uploading / server-side).	It does	not affect data	recep-
       tion (downloading).
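
       For example, the settings suggested above would appear in
       sysctl.conf(5) as:

             net.inet.tcp.inflight.enable=1
             net.inet.tcp.inflight.debug=0
             net.inet.tcp.inflight.min=6144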

       Adjusting net.inet.tcp.inflight.stab is not recommended.	 This  parame-
       ter  defaults  to 20, representing 2 maximal packets added to the band-
       width delay product window calculation.	The additional window  is  re-
       quired  to stabilize the	algorithm and improve responsiveness to	chang-
       ing conditions, but it can also result in higher	ping times  over  slow
       links  (though still much lower than you	would get without the inflight
       algorithm).  In such cases you may wish to try reducing this  parameter
       to    15,   10,	 or   5,   and	 you   may   also   have   to	reduce
       net.inet.tcp.inflight.min (for example, to 3500)	to get the desired ef-
       fect.  Reducing these parameters	should be done as a last resort	only.

       The net.inet.ip.portrange.* sysctls control the port number ranges  au-
       tomatically  bound  to  TCP and UDP sockets.  There are three ranges: a
       low range, a default range,  and	 a  high  range,  selectable  via  the
       IP_PORTRANGE setsockopt(2) call.	 Most network programs use the default
       range   which   is   controlled	 by   net.inet.ip.portrange.first  and
       net.inet.ip.portrange.last, which default to 49152 and  65535,  respec-
       tively.	Bound port ranges are used for outgoing	connections, and it is
       possible	 to  run  the system out of ports under	certain	circumstances.
       This most commonly occurs when you are running  a  heavily  loaded  web
       proxy.  The port	range is not an	issue when running a server which han-
       dles mainly incoming connections, such as a normal web server, or has a
       limited number of outgoing connections, such as a mail relay.  For sit-
       uations	where  you  may	 run  out  of  ports,  we recommend decreasing
       net.inet.ip.portrange.first modestly.  A	range of 10000 to 30000	 ports
       may  be	reasonable.   You  should  also	consider firewall effects when
       changing	the port range.	 Some firewalls	 may  block  large  ranges  of
       ports  (usually	low-numbered  ports)  and expect systems to use	higher
       ranges	of   ports   for    outgoing	connections.	 By    default
       net.inet.ip.portrange.last is set at the	maximum	allowable port number.
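
       For example, the 10000-30000 range suggested above would be configured
       in sysctl.conf(5) as:

             net.inet.ip.portrange.first=10000
             net.inet.ip.portrange.last=30000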

       The  kern.ipc.somaxconn	sysctl limits the size of the listen queue for
       accepting new TCP connections.  The default value of 128	 is  typically
       too  low	for robust handling of new connections in a heavily loaded web
       server environment.  For	such  environments,  we	 recommend  increasing
       this  value to 1024 or higher.  The service daemon may itself limit the
       listen queue size (e.g. sendmail(8), apache) but	will often have	a  di-
       rective	in its configuration file to adjust the	queue size up.	Larger
       listen queues also do a better job of fending off denial	of service at-
       tacks.
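
       For example, in sysctl.conf(5):

             kern.ipc.somaxconn=1024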

       The kern.maxfiles sysctl	determines how many open files the system sup-
       ports.  The default is typically	a few thousand but  you	 may  need  to
       bump  this up to	ten or twenty thousand if you are running databases or
       large descriptor-heavy daemons.	The  read-only	kern.openfiles	sysctl
       may  be	interrogated  to determine the current number of open files on
       the system.

       The vm.swap_idle_enabled	sysctl is useful in large  multi-user  systems
       where  you  have	lots of	users entering and leaving the system and lots
       of idle processes.  Such	systems	tend to	generate a great deal of  con-
       tinuous	pressure on free memory	reserves.  Turning this	feature	on and
       adjusting   the	 swapout   hysteresis	(in    idle    seconds)	   via
       vm.swap_idle_threshold1	and  vm.swap_idle_threshold2 allows you	to de-
       press the priority of pages associated with idle processes more
       quickly than the normal pageout algorithm.  This gives a helping hand
       to the
       pageout daemon.	Do not turn this option	on unless you need it, because
       the  tradeoff  you  are making is to essentially	pre-page memory	sooner
       rather than later, eating more swap and disk  bandwidth.	  In  a	 small
       system this option will have a detrimental effect but in	a large	system
       that  is	already	doing moderate paging this option allows the VM	system
       to stage	whole processes	into and out of	memory more easily.

LOADER TUNABLES
       Some aspects of the system behavior may not be tunable at  runtime  be-
       cause  memory  allocations  they	 perform  must occur early in the boot
       process.	 To change loader tunables,  you  must	set  their  values  in
       loader.conf(5) and reboot the system.

       kern.maxusers controls the scaling of a number of static	system tables,
       including defaults for the maximum number of open files,	sizing of net-
       work  memory resources, etc.  As	of FreeBSD 4.5,	kern.maxusers is auto-
       matically sized at boot based on	the amount of memory available in  the
       system,	and  may  be determined	at run-time by inspecting the value of
       the read-only kern.maxusers sysctl.  Some sites will require larger  or
       smaller	values	of  kern.maxusers  and may set it as a loader tunable;
       values of 64, 128, and 256 are not uncommon.  We	do not recommend going
       above 256 unless	you need a huge	number of file	descriptors;  many  of
       the  tunable values set to their	defaults by kern.maxusers may be indi-
       vidually	overridden at boot-time	or run-time as described elsewhere  in
       this  document.	Systems	older than FreeBSD 4.4 must set	this value via
       the kernel config(8) option maxusers instead.
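
       For example, to pin the value at boot, add the following to
       loader.conf(5):

             kern.maxusers="256"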

       The kern.dfldsiz	and kern.dflssiz tunables set the default soft	limits
       for  process  data and stack size respectively.	Processes may increase
       these up	to the hard limits by calling setrlimit(2).  The kern.maxdsiz,
       kern.maxssiz, and kern.maxtsiz tunables set the hard limits for process
       data, stack, and	text size respectively;	processes may not exceed these
       limits.	The kern.sgrowsiz tunable controls how much the	stack  segment
       will grow when a	process	needs to allocate more stack.
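
       For example, a hypothetical loader.conf(5) fragment raising the data
       size limits; the values (in bytes) are illustrative only:

             kern.maxdsiz="1073741824"    # 1G hard data size limit
             kern.dfldsiz="134217728"     # 128M default (soft) data size limit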

       kern.ipc.nmbclusters  may be adjusted to	increase the number of network
       mbufs the system	is willing to allocate.	 Each cluster  represents  ap-
       proximately  2K	of  memory, so a value of 1024 represents 2M of	kernel
       memory reserved for network buffers.  You can do	a  simple  calculation
       to  figure out how many you need.  If you have a	web server which maxes
       out at 1000 simultaneous	connections, and each connection  eats	a  16K
       receive	and 16K	send buffer, you need approximately 32MB worth of net-
       work buffers to deal with it.  A	good rule of thumb is to  multiply  by
       2, so 32MBx2 = 64MB/2K =	32768.	So for this case you would want	to set
       kern.ipc.nmbclusters  to	 32768.	  We recommend values between 1024 and
       4096 for machines with moderate amounts of memory, and between 4096
       and 32768 for machines with greater amounts of memory.  Under no
       circumstances should you specify an arbitrarily high value for this
       parameter; it could lead to a boot-time crash.  The -m option to netstat(1)
       may  be used to observe network cluster use.  Older versions of FreeBSD
       do not have this	tunable	and require that the kernel  config(8)	option
       NMBCLUSTERS be set instead.
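
       For example, the web server case worked above would be set in
       loader.conf(5) as follows, and cluster usage can then be watched with
       netstat -m:

             kern.ipc.nmbclusters="32768"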

       More  and more programs are using the sendfile(2) system	call to	trans-
       mit files over the network.  The	kern.ipc.nsfbufs sysctl	 controls  the
       number  of file system buffers sendfile(2) is allowed to	use to perform
       its work.  This parameter nominally scales with	kern.maxusers  so  you
       should  not  need to modify this	parameter except under extreme circum-
       stances.	 See the "TUNING" section in the sendfile(2) manual  page  for
       details.

KERNEL CONFIG TUNING
       There  are  a number of kernel options that you may have	to fiddle with
       in a large-scale	system.	 In order to change these options you need  to
       be able to compile a new	kernel from source.  The config(8) manual page
       and  the	handbook are good starting points for learning how to do this.
       Generally the first thing you do	when creating your own	custom	kernel
       is  to strip out	all the	drivers	and services you do not	use.  Removing
       things like INET6 and drivers you do not	have will reduce the  size  of
       your  kernel,  sometimes	 by  a	megabyte  or more, leaving more	memory
       available for applications.

       SCSI_DELAY may be used to reduce	system boot times.  The	 defaults  are
       fairly  high and	can be responsible for 5+ seconds of delay in the boot
       process.	 Reducing SCSI_DELAY to	something below	5 seconds  could  work
       (especially with	modern drives).
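
       For example, in your kernel configuration file (the value is in
       milliseconds):

             options SCSI_DELAY=5000    # wait 5 seconds for SCSI devices to settle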

       There  are a number of *_CPU options that can be	commented out.	If you
       only want the kernel to run on a	Pentium	class CPU, you can easily  re-
       move I486_CPU, but only remove I586_CPU if you are sure your CPU	is be-
       ing  recognized	as  a Pentium II or better.  Some clones may be	recog-
       nized as	a Pentium or even a 486	and not	be able	to boot	without	 those
       options.	  If  it  works,  great!  The operating	system will be able to
       better use higher-end CPU features for MMU, task	 switching,  timebase,
       and  even device	operations.  Additionally, higher-end CPUs support 4MB
       MMU pages, which	the kernel uses	to map the kernel itself into  memory,
       increasing its efficiency under heavy syscall loads.

IDE WRITE CACHING
       FreeBSD	4.3  flirted with turning off IDE write	caching.  This reduced
       write bandwidth to IDE disks but	was considered necessary due to	 seri-
       ous  data  consistency  issues introduced by hard drive vendors.	 Basi-
       cally the problem is that IDE drives lie	about when a write  completes.
       With  IDE  write	caching	turned on, IDE hard drives will	not only write
       data to disk out	of order, they will sometimes delay some of the	blocks
       indefinitely under heavy	disk load.  A crash or power failure  can  re-
       sult  in	serious	file system corruption.	 So our	default	was changed to
       be safe.	 Unfortunately,	the result was such a huge loss	in performance
       that we caved in	and changed the	default	back to	on after the  release.
       You  should check the default on	your system by observing the hw.ata.wc
       sysctl variable.	 If IDE	write caching is turned	off, you can  turn  it
       back on by setting the hw.ata.wc	loader tunable to 1.  More information
       on tuning the ATA driver	system may be found in the ata(4) manual page.
       If you need performance,	go with	SCSI.
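
       For example, to force IDE write caching on at boot, add the following
       to loader.conf(5):

             hw.ata.wc="1"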

CPU, MEMORY, DISK, NETWORK
       The  type  of tuning you	do depends heavily on where your system	begins
       to bottleneck as	load increases.	 If your system	runs out of CPU	 (idle
       times  are  perpetually 0%) then	you need to consider upgrading the CPU
       or moving to an SMP motherboard (multiple CPUs), or perhaps you need
       to  revisit  the	programs that are causing the load and try to optimize
       them.  If your system is	paging to swap a  lot  you  need  to  consider
       adding  more  memory.   If your system is saturating the	disk you typi-
       cally see high CPU idle times and total disk saturation.	 systat(1) can
       be used to monitor this.	 There are many	solutions to saturated	disks:
       increasing memory for caching, mirroring	disks, distributing operations
       across several machines,	and so forth.  If disk performance is an issue
       and  you	are using IDE drives, switching	to SCSI	can help a great deal.
       While modern IDE	drives compare with SCSI in raw	sequential  bandwidth,
       the moment you start seeking around the disk SCSI drives	usually	win.

       Finally,	 you might run out of network suds.  The first line of defense
       for improving network  performance  is  to  make	 sure  you  are	 using
       switches	 instead of hubs, especially these days	where switches are al-
       most as cheap.  Hubs have severe	problems under heavy loads due to col-
       lision back-off and one bad host	can severely degrade the  entire  LAN.
       Second, optimize	the network path as much as possible.  For example, in
       firewall(7)  we	describe  a  firewall protecting internal hosts	with a
       topology	where the externally visible hosts are not routed through  it.
       Use  100BaseT  rather  than  10BaseT,  or  use  1000BaseT  rather  than
       100BaseT, depending on your needs.  Most	bottlenecks occur at  the  WAN
       link  (e.g. modem, T1, DSL, whatever).  If expanding the	link is	not an
       option it may be	possible to use	the dummynet(4)	feature	 to  implement
       peak  shaving  or  other	 forms of traffic shaping to prevent the over-
       loaded service (such as web services)  from  affecting  other  services
       (such  as  email),  or vice versa.  In home installations this could be
       used to give interactive	traffic	(your browser, ssh(1) logins) priority
       over services you export	from your box (web services, email).

SEE ALSO
       netstat(1), systat(1), sendfile(2), ata(4), dummynet(4),	login.conf(5),
       rc.conf(5), sysctl.conf(5), firewall(7),	 hier(7),  ports(7),  boot(8),
       bsdlabel(8),  ccdconfig(8),  config(8),	fsck(8), ifconfig(8), ipfw(8),
       loader(8),  mount(8),  newfs(8),	 route(8),  sysctl(8),	sysinstall(8),
       tunefs(8), vinum(8)

HISTORY
       The  tuning  manual  page  was originally written by Matthew Dillon and
       first appeared in FreeBSD 4.3, May 2001.

FreeBSD	7.1		       January 17, 2007			     TUNING(7)
