TCP(4)			 BSD Kernel Interfaces Manual			TCP(4)

     tcp -- Internet Transmission Control Protocol

     #include <sys/types.h>
     #include <sys/socket.h>
     #include <netinet/in.h>

     socket(AF_INET, SOCK_STREAM, 0);

     The TCP protocol provides reliable, flow-controlled, two-way transmission
     of	data.  It is a byte-stream protocol used to support the	SOCK_STREAM
     abstraction.  TCP uses the	standard Internet address format and, in addi-
     tion, provides a per-host collection of "port addresses".	Thus, each ad-
     dress is composed of an Internet address specifying the host and network,
     with a specific TCP port on the host identifying the peer entity.

     Sockets utilizing the tcp protocol	are either "active" or "passive".  Ac-
     tive sockets initiate connections to passive sockets.  By default TCP
     sockets are created active; to create a passive socket the	listen(2) sys-
     tem call must be used after binding the socket with the bind(2) system
     call.  Only passive sockets may use the accept(2) call to accept incoming
     connections.  Only	active sockets may use the connect(2) call to initiate
     connections.  TCP also supports a more datagram-like mode,	called Trans-
     action TCP, which is described in ttcp(4).
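
     The active/passive distinction can be sketched as below.  This is a
     minimal illustration over the loopback interface, not a reference
     implementation; the helper name loopback_roundtrip is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper: sends one byte from an active socket to a passive
 * socket over loopback.  Returns the byte read from the accepted
 * connection, or -1 on any failure.
 */
int
loopback_roundtrip(unsigned char byte)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	unsigned char got = 0;
	int lfd = -1, cfd = -1, afd = -1, result = -1;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = 0;		/* let the system assign a port */

	/* Passive side: bind(2) first, then listen(2). */
	if ((lfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
		goto done;
	if (bind(lfd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
	    listen(lfd, 1) < 0 ||
	    getsockname(lfd, (struct sockaddr *)&sin, &len) < 0)
		goto done;

	/* Active side: connect(2) to the listener's address. */
	if ((cfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
		goto done;
	if (connect(cfd, (struct sockaddr *)&sin, sizeof(sin)) < 0)
		goto done;

	/* Only the passive socket may accept(2). */
	if ((afd = accept(lfd, NULL, NULL)) < 0)
		goto done;

	if (write(cfd, &byte, 1) == 1 && read(afd, &got, 1) == 1)
		result = got;
done:
	if (afd >= 0)
		close(afd);
	if (cfd >= 0)
		close(cfd);
	if (lfd >= 0)
		close(lfd);
	return (result);
}
```

     Calling loopback_roundtrip(0x5a) is expected to return 0x5a when a
     loopback interface is available.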

     Passive sockets may "underspecify"	their location to match	incoming con-
     nection requests from multiple networks.  This technique, termed
     "wildcard addressing", allows a single server to provide service to
     clients on	multiple networks.  To create a	socket which listens on	all
     networks, the Internet address INADDR_ANY must be bound.  The TCP port
     may still be specified at this time; if the port is not specified the
     system will assign	one.  Once a connection	has been established the
     socket's address is fixed by the peer entity's location.	The address
     assigned the socket is the	address	associated with	the network interface
     through which packets are being transmitted and received.	Normally this
     address corresponds to the	peer entity's network.
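
     The following sketch illustrates wildcard addressing under the
     assumption that a loopback interface exists.  The helper name
     wildcard_demo and its parameters are hypothetical.  It records what
     getsockname(2) reports for the listener (still the wildcard address)
     and for the accepted connection (a fixed address).

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper: listens on INADDR_ANY with a system-assigned port,
 * connects to it through loopback, and stores the local address of the
 * listener and of the accepted socket.  Returns 0 on success, -1 on error.
 */
int
wildcard_demo(struct sockaddr_in *listener, struct sockaddr_in *accepted)
{
	struct sockaddr_in peer;
	socklen_t len;
	int lfd = -1, cfd = -1, afd = -1, rc = -1;

	memset(listener, 0, sizeof(*listener));
	listener->sin_family = AF_INET;
	listener->sin_addr.s_addr = htonl(INADDR_ANY);	/* all networks */
	listener->sin_port = 0;				/* system picks */

	if ((lfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
		goto done;
	if (bind(lfd, (struct sockaddr *)listener, sizeof(*listener)) < 0 ||
	    listen(lfd, 1) < 0)
		goto done;
	len = sizeof(*listener);
	if (getsockname(lfd, (struct sockaddr *)listener, &len) < 0)
		goto done;

	/* Reach the wildcard listener through the loopback address. */
	peer = *listener;
	peer.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if ((cfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
		goto done;
	if (connect(cfd, (struct sockaddr *)&peer, sizeof(peer)) < 0)
		goto done;
	if ((afd = accept(lfd, NULL, NULL)) < 0)
		goto done;

	/* Once connected, the socket's local address is no longer wild. */
	len = sizeof(*accepted);
	if (getsockname(afd, (struct sockaddr *)accepted, &len) < 0)
		goto done;
	rc = 0;
done:
	if (afd >= 0)
		close(afd);
	if (cfd >= 0)
		close(cfd);
	if (lfd >= 0)
		close(lfd);
	return (rc);
}
```

     In this sketch the listener keeps the address INADDR_ANY, while the
     accepted socket reports the loopback address on the same port.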

     TCP supports a number of socket options which can be set with
     setsockopt(2) and tested with getsockopt(2):

     TCP_NODELAY   Under most circumstances, TCP sends data when it is pre-
		   sented; when	outstanding data has not yet been acknowl-
		   edged, it gathers small amounts of output to	be sent	in a
		   single packet once an acknowledgement is received.  For a
		   small number	of clients, such as window systems that	send a
		   stream of mouse events which	receive	no replies, this pack-
		   etization may cause significant delays.  The	boolean	option
		   TCP_NODELAY defeats this algorithm.

     TCP_MAXSEG	   By default, a sender- and receiver-TCP will negotiate among
		   themselves to determine the maximum segment size to be used
		   for each connection.	 The TCP_MAXSEG	option allows the user
		   to determine	the result of this negotiation,	and to reduce
		   it if desired.

     TCP_NOOPT	   TCP usually sends a number of options in each packet, cor-
		   responding to various TCP extensions	which are provided in
		   this	implementation.	 The boolean option TCP_NOOPT is pro-
		   vided to disable TCP	option use on a	per-connection basis.

     TCP_NOPUSH	   By convention, the sender-TCP will set the "push" bit and
		   begin transmission immediately (if permitted) at the	end of
		   every user call to write(2) or writev(2).  The TCP_NOPUSH
		   option is provided to allow servers to easily make use of
		   Transaction TCP (see	ttcp(4)).  When	the option is set to a
		   non-zero value, TCP will delay sending any data at all un-
		   til either the socket is closed, or the internal send buf-
		   fer is filled.
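
     As a hedged illustration of the options above, the sketch below sets
     TCP_NODELAY and reads integer-valued options back.  The helper names
     tcp_set_nodelay and tcp_get_int_option are hypothetical; only
     TCP_NODELAY and TCP_MAXSEG are exercised because they are widely
     available.

```c
#include <assert.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Enable (on != 0) or disable TCP_NODELAY; returns setsockopt(2)'s result. */
int
tcp_set_nodelay(int fd, int on)
{
	return (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)));
}

/*
 * Read an integer-valued TCP option such as TCP_NODELAY or TCP_MAXSEG.
 * Returns the value, or -1 on error.  For boolean options, treat any
 * non-zero value as "set"; the exact value is not assumed to be 1.
 */
int
tcp_get_int_option(int fd, int optname)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, IPPROTO_TCP, optname, &val, &len) < 0)
		return (-1);
	return (val);
}
```

     A caller would typically create the socket, call tcp_set_nodelay(fd, 1)
     before sending small interactive messages, and may query TCP_MAXSEG
     with tcp_get_int_option(fd, TCP_MAXSEG).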

     The option	level for the setsockopt(2) call is the	protocol number	for
     TCP, available from getprotobyname(3), or IPPROTO_TCP.  All options are
     declared in <netinet/tcp.h>.
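
     A small sketch of obtaining the option level follows.  The helper name
     tcp_option_level is hypothetical, and falling back to IPPROTO_TCP when
     the protocols database has no "tcp" entry is an assumption of this
     example.

```c
#include <assert.h>
#include <stddef.h>
#include <netdb.h>
#include <netinet/in.h>

/*
 * Return the level to pass to setsockopt(2) for TCP options: the protocol
 * number from getprotobyname(3), or IPPROTO_TCP if the lookup fails.
 */
int
tcp_option_level(void)
{
	struct protoent *pe = getprotobyname("tcp");

	return (pe != NULL ? pe->p_proto : IPPROTO_TCP);
}
```

     Both routes are expected to yield the same number, so most programs
     simply use IPPROTO_TCP directly.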

     Options at	the IP transport level may be used with	TCP; see ip(4).	 In-
     coming connection requests	that are source-routed are noted, and the re-
     verse source route	is used	in responding.

     The tcp protocol implements a number of variables in the net.inet branch
     of	the sysctl(3) MIB.
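
     A variable listed below under a short name such as tcp.rfc1323 would be
     read as net.inet.tcp.rfc1323.  The sketch below shows one way to do so
     with sysctlbyname(3).  The helper names tcp_mib_name and
     tcp_mib_get_int are hypothetical, and the call is compiled only on
     FreeBSD; elsewhere the example assumes the interface is unavailable
     and fails with ENOSYS.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#ifdef __FreeBSD__
#include <sys/sysctl.h>
#endif

/* Build "net.inet.<shortname>" in buf; returns 0, or -1 if it does not fit. */
int
tcp_mib_name(const char *shortname, char *buf, size_t buflen)
{
	int n = snprintf(buf, buflen, "net.inet.%s", shortname);

	return ((n < 0 || (size_t)n >= buflen) ? -1 : 0);
}

/* Read an integer MIB variable; returns 0 on success, -1 with errno set. */
int
tcp_mib_get_int(const char *shortname, int *value)
{
	char name[128];

	if (tcp_mib_name(shortname, name, sizeof(name)) < 0) {
		errno = ENAMETOOLONG;
		return (-1);
	}
#ifdef __FreeBSD__
	{
		size_t len = sizeof(*value);

		return (sysctlbyname(name, value, &len, NULL, 0));
	}
#else
	(void)value;
	errno = ENOSYS;		/* assumed unavailable on this platform */
	return (-1);
#endif
}
```

     For example, tcp_mib_get_int("tcp.rfc1323", &v) would store the current
     setting in v on a system that provides the variable.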

     TCPCTL_DO_RFC1323	(tcp.rfc1323) Implement	the window scaling and time-
			stamp options of RFC 1323 (default true).

     TCPCTL_DO_RFC1644	(tcp.rfc1644) Implement	Transaction TCP, as described
			in RFC 1644.

     TCPCTL_MSSDFLT	(tcp.mssdflt) The default value	used for the maximum
			segment	size ("MSS") when no advice to the contrary is
			received from MSS negotiation.

     TCPCTL_SENDSPACE	(tcp.sendspace)	Maximum	TCP send window.

     TCPCTL_RECVSPACE	(tcp.recvspace)	Maximum	TCP receive window.

     tcp.log_in_vain	Log any	connection attempts to ports where there is
			not a socket accepting connections.  A value of 1
			limits the logging to SYN (connection establishment)
			packets only.  A value of 2 logs any TCP packets sent
			to closed ports.  Any other value disables the logging
			(the default is 0, i.e., logging is disabled).

			The number of packets allowed to be in-flight during
			the TCP	slow-start phase on a non-local	network.

			The number of packets allowed to be in-flight during
			the TCP	slow-start phase to local machines in the same

     tcp.msl		The Maximum Segment Lifetime for a packet.

     tcp.keepinit	Timeout	for new, non-established TCP connections.

     tcp.keepidle	Amount of time the connection should be	idle before
			keepalive probes (if enabled) are sent.

     tcp.keepintvl	The interval between keepalive probes sent to remote
			machines.  After TCPTV_KEEPCNT (default	8) probes are
			sent, with no response,	the connection is dropped.

			Assume that SO_KEEPALIVE is set on all TCP connec-
			tions; the kernel will then periodically send a packet
			to the remote host to verify that the connection is
			still up.

     tcp.icmp_may_rst	Certain	ICMP unreachable messages may abort connec-
			tions in SYN-SENT state.

     tcp.do_tcpdrain	Flush packets in the TCP reassembly queue if the sys-
			tem is low on mbufs.

     tcp.blackhole	If enabled, disable sending of RST when	a connection
			is attempted to	a port where there is not a socket ac-
			cepting	connections.  See blackhole(4).

     tcp.delayed_ack	Delay ACK to try and piggyback it onto a data packet.

     tcp.delacktime	Maximum	amount of time before a	delayed	ACK is sent.

     tcp.newreno	Enable TCP NewReno Fast	Recovery algorithm, as de-
			scribed	in RFC 2582.

			Enable Path MTU	Discovery

     tcp.tcbhashsize	Size of	the TCP	control-block hashtable	(read-only).
			This may be tuned using	the kernel option TCBHASHSIZE
			or by setting net.inet.tcp.tcbhashsize in the

     tcp.pcbcount	Number of active protocol control blocks (read-only).

     tcp.syncookies	Determines whether or not syn cookies should be	gener-
			ated for outbound syn-ack packets.  Syn	cookies	are a
			great help during syn flood attacks, and are enabled
			by default.

			The interval (in seconds) specifying how often the se-
			cret data used in RFC 1948 initial sequence number
			calculations should be reseeded.  By default, this
			variable is set	to zero, indicating that no reseeding
			will occur.  Reseeding should not be necessary,	and
			will break TIME_WAIT recycling for a few minutes.

			Adjust the retransmit timer calculation	for TCP.  The
			slop is	typically added	to the raw calculation to take
			into account occasional	variances that the SRTT
			(smoothed round trip time) is unable to accommodate,
			while the minimum specifies an absolute minimum.
			While a number of TCP RFCs suggest a 1 second minimum,
			these RFCs tend to focus on streaming behavior and
			fail to deal with the fact that a 1 second minimum has
			severe detrimental effects over lossy interactive con-
			nections, such as an 802.11b wireless link, and over
			very fast but lossy connections	for those cases	not
			covered	by the fast retransmit code.  For this reason
			we suggest changing the	slop to	200ms and setting the
			minimum	to something out of the	way, like 20ms,	which
			gives you an effective minimum of 200ms	(similar to

			Enable TCP bandwidth delay product limiting.  An at-
			tempt will be made to calculate	the bandwidth delay
			product	for each individual TCP	connection and limit
			the amount of inflight data being transmitted to avoid
			building up unnecessary	packets	in the network.	 This
			option is recommended if you are serving a lot of data
			over connections with high bandwidth-delay products,
			such as	modems,	GigE links, and	fast long-haul WANs,
			and/or you have configured your machine to accommodate
			large TCP windows.  In such situations, without this
			option, you may experience high interactive latencies
			or packet loss due to the overloading of intermediate
			routers and switches.  Note that bandwidth delay prod-
			uct limiting only affects the transmit side of a TCP

			Enable debugging for the bandwidth delay product algo-
			rithm.	This may default to on (1) so if you enable
			the algorithm you should probably also disable debug-
			ging by	setting	this variable to 0.

     tcp.inflight_min	This puts a lower bound on the bandwidth delay prod-
			uct window, in bytes.  A value of 1024 is typically
			used for debugging.  6000-16000	is more	typical	in a
			production installation.  Setting this value too low
			may result in slow ramp-up times for bursty connec-
			tions.	Setting	this value too high effectively	dis-
			ables the algorithm.

     tcp.inflight_max	This puts an upper bound on the	bandwidth delay	prod-
			uct window, in bytes.  This value should not generally
			be modified but	may be used to set a global per-con-
			nection	limit on queued	data, potentially allowing you
			to intentionally set a less than optimum limit to
			smooth data flow over a	network	while still being able
			to specify huge	internal TCP buffers.

     tcp.inflight_stab	The bandwidth delay product algorithm requires a
			slightly larger window than it otherwise calculates
			for stability.	This parameter determines the extra
			window in maximal packets / 10.	 The default value of
			20 represents 2	maximal	packets.  Reducing this	value
			is not recommended but you may come across a situation
			with very slow links where the ping time reduction of
			the default inflight code is not sufficient.  If this
			case occurs you should first try reducing
			tcp.inflight_min and, if that does not work, reduce
			both tcp.inflight_min and tcp.inflight_stab, trying
			values of 15, 10, or 5 for the latter.  Never use a
			value less than 5.  Reducing tcp.inflight_stab can lead to
			upwards	of a 20% underutilization of the link as well
			as reducing the algorithm's ability to adapt to chang-
			ing situations and should only be done as a last re-
			sort.

     A socket operation	may fail with one of the following errors returned:

     [EISCONN]		when trying to establish a connection on a socket
			which already has one;

     [ENOBUFS]		when the system	runs out of memory for an internal
			data structure;

     [ETIMEDOUT]	when a connection was dropped due to excessive re-
			transmissions;

     [ECONNRESET]	when the remote peer forces the connection to be
			closed;

     [ECONNREFUSED]	when the remote	peer actively refuses connection es-
			tablishment (usually because no	process	is listening
			to the port);

     [EADDRINUSE]	when an	attempt	is made	to create a socket with	a port
			which has already been allocated;

     [EADDRNOTAVAIL]	when an	attempt	is made	to create a socket with	a net-
			work address for which no network interface exists.

     [EAFNOSUPPORT]	when an	attempt	is made	to bind	or connect a socket to
			a multicast address.
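
     Two of the errors above can be provoked deliberately, as in the sketch
     below.  The helper names are hypothetical, and the example assumes a
     loopback interface on which refused connections are answered with a
     reset (that is, tcp.blackhole is not enabled).

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Bind fd to loopback on sin->sin_port and report the resulting address. */
static int
bind_loopback(int fd, struct sockaddr_in *sin)
{
	socklen_t len = sizeof(*sin);

	sin->sin_family = AF_INET;
	sin->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (bind(fd, (struct sockaddr *)sin, sizeof(*sin)) < 0)
		return (-1);
	return (getsockname(fd, (struct sockaddr *)sin, &len));
}

/*
 * Connect to a loopback port that nothing listens on.  Returns the errno
 * from connect(2), 0 if it unexpectedly succeeded, or -1 on setup failure.
 */
int
errno_connect_closed_port(void)
{
	struct sockaddr_in sin;
	int fd, err = 0;

	/* Obtain a free port by binding to port 0, then release it. */
	memset(&sin, 0, sizeof(sin));
	if ((fd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
		return (-1);
	if (bind_loopback(fd, &sin) < 0) {
		close(fd);
		return (-1);
	}
	close(fd);

	if ((fd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
		return (-1);
	if (connect(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0)
		err = errno;
	close(fd);
	return (err);
}

/*
 * Bind two sockets to the same address and port.  Returns the errno from
 * the second bind(2), 0 if it unexpectedly succeeded, or -1 on setup
 * failure.
 */
int
errno_bind_twice(void)
{
	struct sockaddr_in sin;
	int fd1, fd2, err = 0;

	memset(&sin, 0, sizeof(sin));
	if ((fd1 = socket(AF_INET, SOCK_STREAM, 0)) < 0)
		return (-1);
	if ((fd2 = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
		close(fd1);
		return (-1);
	}
	if (bind_loopback(fd1, &sin) < 0)
		err = -1;
	else if (bind_loopback(fd2, &sin) < 0)	/* same port, still in use */
		err = errno;
	close(fd2);
	close(fd1);
	return (err);
}
```

     Under the stated assumptions, the first helper is expected to report
     ECONNREFUSED and the second EADDRINUSE.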

     getsockopt(2), socket(2), sysctl(3), blackhole(4),	inet(4), intro(4),
     ip(4), ttcp(4)

     V.	Jacobson, R. Braden, and D. Borman, TCP	Extensions for High
     Performance, RFC 1323.

     R.	Braden,	T/TCP -	TCP Extensions for Transactions, RFC 1644.

     The tcp protocol appeared in 4.2BSD.  The RFC 1323	extensions for window
     scaling and timestamps were added in 4.4BSD.

BSD			       February	14, 1995			   BSD

