summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2019-01-21 12:52:31 +1300
committerLinus Torvalds <torvalds@linux-foundation.org>2019-01-21 12:52:31 +1300
commit7d0ae236ed13d7645fb73b85e7c95deee46c4656 (patch)
tree60ac172dee7a3528df7bfa4deb26bb822192ca5c /Documentation
parentbb617b9b4519b0cef939c9c8e9c41470749f0d51 (diff)
parent6436408e814b81046f4595245c1f9bc4409e945c (diff)
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller: 1) Fix endless loop in nf_tables, from Phil Sutter. 2) Fix cross namespace ip6_gre tunnel hash list corruption, from Olivier Matz. 3) Don't be too strict in phy_start_aneg() otherwise we might not allow restarting auto negotiation. From Heiner Kallweit. 4) Fix various KMSAN uninitialized value cases in tipc, from Ying Xue. 5) Memory leak in act_tunnel_key, from Davide Caratti. 6) Handle chip errata of mv88e6390 PHY, from Andrew Lunn. 7) Remove linear SKB assumption in fou/fou6, from Eric Dumazet. 8) Missing udplite rehash callbacks, from Alexey Kodanev. 9) Log dirty pages properly in vhost, from Jason Wang. 10) Use consume_skb() in neigh_probe() as this is a normal free not a drop, from Yang Wei. Likewise in macvlan_process_broadcast(). 11) Missing device_del() in mdiobus_register() error paths, from Thomas Petazzoni. 12) Fix checksum handling of short packets in mlx5, from Cong Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (96 commits) bpf: in __bpf_redirect_no_mac pull mac only if present virtio_net: bulk free tx skbs net: phy: phy driver features are mandatory isdn: avm: Fix string plus integer warning from Clang net/mlx5e: Fix cb_ident duplicate in indirect block register net/mlx5e: Fix wrong (zero) TX drop counter indication for representor net/mlx5e: Fix wrong error code return on FEC query failure net/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet frames tools: bpftool: Cleanup license mess bpf: fix inner map masking to prevent oob under speculation bpf: pull in pkt_sched.h header for tooling to fix bpftool build selftests: forwarding: Add a test case for externally learned FDB entries selftests: mlxsw: Test FDB offload indication mlxsw: spectrum_switchdev: Do not treat static FDB entries as sticky net: bridge: Mark FDB entries that were added by user as such mlxsw: spectrum_fid: Update dummy FID index mlxsw: pci: Return error on PCI reset timeout mlxsw: pci: Increase PCI SW reset timeout mlxsw: pci: Ring CQ's doorbell before RDQ's MAINTAINERS: update email addresses of liquidio driver maintainers ...
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/networking/index.rst26
-rw-r--r--Documentation/networking/rxrpc.txt45
-rw-r--r--Documentation/networking/snmp_counter.rst130
-rw-r--r--Documentation/networking/timestamping.txt4
4 files changed, 140 insertions, 65 deletions
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 6a47629ef8ed..59e86de662cd 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -11,19 +11,19 @@ Contents:
batman-adv
can
can_ucan_protocol
- dpaa2/index
- e100
- e1000
- e1000e
- fm10k
- igb
- igbvf
- ixgb
- ixgbe
- ixgbevf
- i40e
- iavf
- ice
+ device_drivers/freescale/dpaa2/index
+ device_drivers/intel/e100
+ device_drivers/intel/e1000
+ device_drivers/intel/e1000e
+ device_drivers/intel/fm10k
+ device_drivers/intel/igb
+ device_drivers/intel/igbvf
+ device_drivers/intel/ixgb
+ device_drivers/intel/ixgbe
+ device_drivers/intel/ixgbevf
+ device_drivers/intel/i40e
+ device_drivers/intel/iavf
+ device_drivers/intel/ice
kapi
z8530book
msg_zerocopy
diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt
index c9d052e0cf51..2df5894353d6 100644
--- a/Documentation/networking/rxrpc.txt
+++ b/Documentation/networking/rxrpc.txt
@@ -1000,51 +1000,6 @@ The kernel interface functions are as follows:
size should be set when the call is begun. tx_total_len may not be less
than zero.
- (*) Check to see the completion state of a call so that the caller can assess
- whether it needs to be retried.
-
- enum rxrpc_call_completion {
- RXRPC_CALL_SUCCEEDED,
- RXRPC_CALL_REMOTELY_ABORTED,
- RXRPC_CALL_LOCALLY_ABORTED,
- RXRPC_CALL_LOCAL_ERROR,
- RXRPC_CALL_NETWORK_ERROR,
- };
-
- int rxrpc_kernel_check_call(struct socket *sock, struct rxrpc_call *call,
- enum rxrpc_call_completion *_compl,
- u32 *_abort_code);
-
- On return, -EINPROGRESS will be returned if the call is still ongoing; if
- it is finished, *_compl will be set to indicate the manner of completion,
- *_abort_code will be set to any abort code that occurred. 0 will be
- returned on a successful completion, -ECONNABORTED will be returned if the
- client failed due to a remote abort and anything else will return an
- appropriate error code.
-
- The caller should look at this information to decide if it's worth
- retrying the call.
-
- (*) Retry a client call.
-
- int rxrpc_kernel_retry_call(struct socket *sock,
- struct rxrpc_call *call,
- struct sockaddr_rxrpc *srx,
- struct key *key);
-
- This attempts to partially reinitialise a call and submit it again while
- reusing the original call's Tx queue to avoid the need to repackage and
- re-encrypt the data to be sent. call indicates the call to retry, srx the
- new address to send it to and key the encryption key to use for signing or
- encrypting the packets.
-
- For this to work, the first Tx data packet must still be in the transmit
- queue, and currently this is only permitted for local and network errors
- and the call must not have been aborted. Any partially constructed Tx
- packet is left as is and can continue being filled afterwards.
-
- It returns 0 if the call was requeued and an error otherwise.
-
(*) Get call RTT.
u64 rxrpc_kernel_get_rtt(struct socket *sock, struct rxrpc_call *call);
diff --git a/Documentation/networking/snmp_counter.rst b/Documentation/networking/snmp_counter.rst
index b0dfdaaca512..fe8f741193be 100644
--- a/Documentation/networking/snmp_counter.rst
+++ b/Documentation/networking/snmp_counter.rst
@@ -336,7 +336,26 @@ time client replies ACK, this socket will get another chance to move
to the accept queue.
-TCP Fast Open
+* TcpEstabResets
+Defined in `RFC1213 tcpEstabResets`_.
+
+.. _RFC1213 tcpEstabResets: https://tools.ietf.org/html/rfc1213#page-48
+
+* TcpAttemptFails
+Defined in `RFC1213 tcpAttemptFails`_.
+
+.. _RFC1213 tcpAttemptFails: https://tools.ietf.org/html/rfc1213#page-48
+
+* TcpOutRsts
+Defined in `RFC1213 tcpOutRsts`_. The RFC says this counter indicates
+the 'segments sent containing the RST flag', but in linux kernel, this
+couner indicates the segments kerenl tried to send. The sending
+process might be failed due to some errors (e.g. memory alloc failed).
+
+.. _RFC1213 tcpOutRsts: https://tools.ietf.org/html/rfc1213#page-52
+
+
+TCP Fast Path
============
When kernel receives a TCP packet, it has two paths to handler the
packet, one is fast path, another is slow path. The comment in kernel
@@ -383,8 +402,6 @@ increase 1.
TCP abort
========
-
-
* TcpExtTCPAbortOnData
It means TCP layer has data in flight, but need to close the
connection. So TCP layer sends a RST to the other side, indicate the
@@ -545,7 +562,6 @@ packet yet, the sender would know packet 4 is out of order. The TCP
stack of kernel will increase TcpExtTCPSACKReorder for both of the
above scenarios.
-
DSACK
=====
The DSACK is defined in `RFC2883`_. The receiver uses DSACK to report
@@ -566,13 +582,63 @@ The TCP stack receives an out of order duplicate packet, so it sends a
DSACK to the sender.
* TcpExtTCPDSACKRecv
-The TCP stack receives a DSACK, which indicate an acknowledged
+The TCP stack receives a DSACK, which indicates an acknowledged
duplicate packet is received.
* TcpExtTCPDSACKOfoRecv
The TCP stack receives a DSACK, which indicate an out of order
duplicate packet is received.
+invalid SACK and DSACK
+====================
+When a SACK (or DSACK) block is invalid, a corresponding counter would
+be updated. The validation method is base on the start/end sequence
+number of the SACK block. For more details, please refer the comment
+of the function tcp_is_sackblock_valid in the kernel source code. A
+SACK option could have up to 4 blocks, they are checked
+individually. E.g., if 3 blocks of a SACk is invalid, the
+corresponding counter would be updated 3 times. The comment of the
+`Add counters for discarded SACK blocks`_ patch has additional
+explaination:
+
+.. _Add counters for discarded SACK blocks: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=18f02545a9a16c9a89778b91a162ad16d510bb32
+
+* TcpExtTCPSACKDiscard
+This counter indicates how many SACK blocks are invalid. If the invalid
+SACK block is caused by ACK recording, the TCP stack will only ignore
+it and won't update this counter.
+
+* TcpExtTCPDSACKIgnoredOld and TcpExtTCPDSACKIgnoredNoUndo
+When a DSACK block is invalid, one of these two counters would be
+updated. Which counter will be updated depends on the undo_marker flag
+of the TCP socket. If the undo_marker is not set, the TCP stack isn't
+likely to re-transmit any packets, and we still receive an invalid
+DSACK block, the reason might be that the packet is duplicated in the
+middle of the network. In such scenario, TcpExtTCPDSACKIgnoredNoUndo
+will be updated. If the undo_marker is set, TcpExtTCPDSACKIgnoredOld
+will be updated. As implied in its name, it might be an old packet.
+
+SACK shift
+=========
+The linux networking stack stores data in sk_buff struct (skb for
+short). If a SACK block acrosses multiple skb, the TCP stack will try
+to re-arrange data in these skb. E.g. if a SACK block acknowledges seq
+10 to 15, skb1 has seq 10 to 13, skb2 has seq 14 to 20. The seq 14 and
+15 in skb2 would be moved to skb1. This operation is 'shift'. If a
+SACK block acknowledges seq 10 to 20, skb1 has seq 10 to 13, skb2 has
+seq 14 to 20. All data in skb2 will be moved to skb1, and skb2 will be
+discard, this operation is 'merge'.
+
+* TcpExtTCPSackShifted
+A skb is shifted
+
+* TcpExtTCPSackMerged
+A skb is merged
+
+* TcpExtTCPSackShiftFallback
+A skb should be shifted or merged, but the TCP stack doesn't do it for
+some reasons.
+
TCP out of order
===============
* TcpExtTCPOFOQueue
@@ -662,6 +728,60 @@ unacknowledged number (more strict than `RFC 5961 section 5.2`_).
.. _RFC 5961 section 4.2: https://tools.ietf.org/html/rfc5961#page-9
.. _RFC 5961 section 5.2: https://tools.ietf.org/html/rfc5961#page-11
+TCP receive window
+=================
+* TcpExtTCPWantZeroWindowAdv
+Depending on current memory usage, the TCP stack tries to set receive
+window to zero. But the receive window might still be a no-zero
+value. For example, if the previous window size is 10, and the TCP
+stack receives 3 bytes, the current window size would be 7 even if the
+window size calculated by the memory usage is zero.
+
+* TcpExtTCPToZeroWindowAdv
+The TCP receive window is set to zero from a no-zero value.
+
+* TcpExtTCPFromZeroWindowAdv
+The TCP receive window is set to no-zero value from zero.
+
+
+Delayed ACK
+==========
+The TCP Delayed ACK is a technique which is used for reducing the
+packet count in the network. For more details, please refer the
+`Delayed ACK wiki`_
+
+.. _Delayed ACK wiki: https://en.wikipedia.org/wiki/TCP_delayed_acknowledgment
+
+* TcpExtDelayedACKs
+A delayed ACK timer expires. The TCP stack will send a pure ACK packet
+and exit the delayed ACK mode.
+
+* TcpExtDelayedACKLocked
+A delayed ACK timer expires, but the TCP stack can't send an ACK
+immediately due to the socket is locked by a userspace program. The
+TCP stack will send a pure ACK later (after the userspace program
+unlock the socket). When the TCP stack sends the pure ACK later, the
+TCP stack will also update TcpExtDelayedACKs and exit the delayed ACK
+mode.
+
+* TcpExtDelayedACKLost
+It will be updated when the TCP stack receives a packet which has been
+ACKed. A Delayed ACK loss might cause this issue, but it would also be
+triggered by other reasons, such as a packet is duplicated in the
+network.
+
+Tail Loss Probe (TLP)
+===================
+TLP is an algorithm which is used to detect TCP packet loss. For more
+details, please refer the `TLP paper`_.
+
+.. _TLP paper: https://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01
+
+* TcpExtTCPLossProbes
+A TLP probe packet is sent.
+
+* TcpExtTCPLossProbeRecovery
+A packet loss is detected and recovered by TLP.
examples
=======
diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
index 1be0b6f9e0cb..9d1432e0aaa8 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -417,7 +417,7 @@ is again deprecated and ts[2] holds a hardware timestamp if set.
Hardware time stamping must also be initialized for each device driver
that is expected to do hardware time stamping. The parameter is defined in
-/include/linux/net_tstamp.h as:
+include/uapi/linux/net_tstamp.h as:
struct hwtstamp_config {
int flags; /* no flags defined right now, must be zero */
@@ -487,7 +487,7 @@ enum {
HWTSTAMP_FILTER_PTP_V1_L4_EVENT,
/* for the complete list of values, please check
- * the include file /include/linux/net_tstamp.h
+ * the include file include/uapi/linux/net_tstamp.h
*/
};