Commit Graph

2931 Commits

Author SHA1 Message Date
Jiri Pirko
bf3994d2ed net/sched: introduce Match-all classifier
The matchall classifier matches every packet and allows the user to apply
actions on it. This filter is very useful in usecases where every packet
should be matched, for example, packet mirroring (SPAN) can be setup very
easily using that filter.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24 23:11:59 -07:00
David S. Miller
de0ba9a0d8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Just several instances of overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24 00:53:32 -04:00
Daniel Borkmann
b02b94b331 bpf, elf: add official ELF machine define for eBPF
Add the official BPF ELF e_machine value that was assigned recently [1,2]
and will be propagated to glibc, et al. LLVM is switching to it in 3.9
release.

  [1] 36b9c09330
  [2] http://lists.iovisor.org/pipermail/iovisor-dev/2016-June/000266.html

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20 12:37:39 -07:00
Brenden Blanco
6ce96ca348 bpf: add XDP_TX xdp_action for direct forwarding
XDP enabled drivers must transmit received packets back out on the same
port they were received on when a program returns this action.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 21:46:32 -07:00
Brenden Blanco
d1fdd91386 rtnl: add option for setting link xdp prog
Sets the bpf program represented by fd as an early filter in the rx path
of the netdev. The fd must have been created as BPF_PROG_TYPE_XDP.
Providing a negative value as fd clears the program. Getting the fd back
via rtnl is not possible, therefore reading of this value merely
provides a bool whether the program is valid on the link or not.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 21:46:32 -07:00
Brenden Blanco
6a773a15a1 bpf: add XDP prog type for early driver filter
Add a new bpf prog type that is intended to run in early stages of the
packet rx path. Only minimal packet metadata will be available, hence a
new context type, struct xdp_md, is exposed to userspace. So far only
expose the packet start and end pointers, and only in read mode.

An XDP program must return one of the well known enum values, all other
return codes are reserved for future use. Unfortunately, this
restriction is hard to enforce at verification time, so take the
approach of warning at runtime when such programs are encountered. Out
of bounds return codes should alias to XDP_ABORTED.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 21:46:31 -07:00
Gavin Shan
6389eaa7fa net/ncsi: NCSI command packet handler
The NCSI command packets are sent from MC (Management Controller)
to remote end. They are used for multiple purposes: probe existing
NCSI package/channel, retrieve NCSI channel's capability, configure
NCSI channel etc.

This defines struct to represent NCSI command packets and introduces
function ncsi_xmit_cmd(), which will be used to transmit NCSI command
packet according to the request. The request is represented by struct
ncsi_cmd_arg.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 20:49:16 -07:00
Linus Torvalds
631517032f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
 "A few last-minute updates for the input subsystem"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: ts4800-ts - add missing of_node_put after calling of_parse_phandle
  Input: synaptics-rmi4 - use of_get_child_by_name() to fix refcount
  Revert "Input: wacom_w8001 - drop use of ABS_MT_TOOL_TYPE"
  Input: xpad - validate USB endpoint count during probe
  Input: add SW_PEN_INSERTED define
2016-07-16 07:04:12 +09:00
Daniel Borkmann
555c8a8623 bpf: avoid stack copy and use skb ctx for event output
This work addresses a couple of issues bpf_skb_event_output()
helper currently has: i) We need two copies instead of just a
single one for the skb data when it should be part of a sample.
The data can be non-linear and thus needs to be extracted via
bpf_skb_load_bytes() helper first, and then copied once again
into the ring buffer slot. ii) Since bpf_skb_load_bytes()
currently needs to be used first, the helper needs to see a
constant size on the passed stack buffer to make sure BPF
verifier can do sanity checks on it during verification time.
Thus, just passing skb->len (or any other non-constant value)
wouldn't work, but changing bpf_skb_load_bytes() is also not
the proper solution, since the two copies are generally still
needed. iii) bpf_skb_load_bytes() is just for rather small
buffers like headers, since they need to sit on the limited
BPF stack anyway. Instead of working around in bpf_skb_load_bytes(),
this work improves the bpf_skb_event_output() helper to address
all 3 at once.

We can make use of the passed in skb context that we have in
the helper anyway, and use some of the reserved flag bits as
a length argument. The helper will use the new __output_custom()
facility from perf side with bpf_skb_copy() as callback helper
to walk and extract the data. It will pass the data for setup
to bpf_event_output(), which generates and pushes the raw record
with an additional frag part. The linear data used in the first
frag of the record serves as programmatically defined meta data
passed along with the appended sample.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 14:23:56 -07:00
Mauro Carvalho Chehab
12cb22bb8a uapi: export lirc.h header
This header contains the userspace API for lirc.

This is a fixup for commit b7be755733 ("[media] bz#75751: Move
internal header file lirc.h to uapi/").  It moved the header to the
right place, but it forgot to add it at Kbuild.  So, despite being at
uapi, it is not copied to the right place.

Fixes: b7be755733 ("[media] bz#75751: Move internal header file lirc.h to uapi/")
Link: http://lkml.kernel.org/r/320c765d32bfc82c582e336d52ffe1026c73c644.1468439021.git.mchehab@s-opensource.com
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Cc: Alec Leamas <leamas.alec@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-15 14:54:27 +09:00
Xin Long
826d253d57 sctp: add SCTP_PR_ASSOC_STATUS on sctp sockopt
This patch adds SCTP_PR_ASSOC_STATUS to sctp sockopt, which is used
to dump the prsctp statistics info from the asoc. The prsctp statistics
includes abandoned_sent/unsent from the asoc. abandoned_sent is the
count of the packets we drop packets from retransmit/transmited queue,
and abandoned_unsent is the count of the packets we drop from out_queue
according to the policy.

Note: another option for prsctp statistics dump described in rfc is
SCTP_PR_STREAM_STATUS, which is used to dump the prsctp statistics
info from each stream. But by now, linux doesn't yet have per stream
statistics info, it needs rfc6525 to be implemented. As the prsctp
statistics for each stream has to be based on per stream statistics,
we will delay it until rfc6525 is done in linux.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11 13:25:39 -07:00
Xin Long
f959fb442c sctp: add SCTP_DEFAULT_PRINFO into sctp sockopt
This patch adds SCTP_DEFAULT_PRINFO to sctp sockopt. It is used
to set/get sctp Partially Reliable Policies' default params,
which includes 3 policies (ttl, rtx, prio) and their values.

Still, if we set policy params in sndinfo, we will use the params
of sndinfo against chunks, instead of the default params.

In this patch, we will use 5-8bit of sp/asoc->default_flags
to store prsctp policies, and reuse asoc->default_timetolive
to store their values. It means if we enable and set prsctp
policy, prior ttl timeout in sctp will not work any more.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11 13:25:38 -07:00
Xin Long
28aa4c26fc sctp: add SCTP_PR_SUPPORTED on sctp sockopt
According to section 4.5 of rfc7496, prsctp_enable should be per asoc.
We will add prsctp_enable to both asoc and ep, and replace the places
where it used net.sctp->prsctp_enable with asoc->prsctp_enable.

ep->prsctp_enable will be initialized with net.sctp->prsctp_enable, and
asoc->prsctp_enable will be initialized with ep->prsctp_enable. We can
also modify it's value through sockopt SCTP_PR_SUPPORTED.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11 13:25:38 -07:00
Nikolay Aleksandrov
a65056ecf4 net: bridge: extend MLD/IGMP query stats
As was suggested this patch adds support for the different versions of MLD
and IGMP query types. Since the user visible structure is still in net-next
we can augment it instead of adding netlink attributes.
The distinction between the different IGMP/MLD query types is done as
suggested in Section 7.1, RFC 3376 [1] and Section 8.1, RFC 3810 [2] based
on query payload size and code for IGMP. Since all IGMP packets go through
multicast_rcv() and it uses ip_mc_check_igmp/ipv6_mc_check_mld we can be
sure that at least the ip/ipv6 header can be directly used.

[1] https://tools.ietf.org/html/rfc3376#section-7
[2] https://tools.ietf.org/html/rfc3810#section-8.1

Suggested-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-09 17:40:09 -04:00
Alexei Starovoitov
606274c5ab bpf: introduce bpf_get_current_task() helper
over time there were multiple requests to access different data
structures and fields of task_struct current, so finally add
the helper to access 'current' as-is. Tracing bpf programs will do
the rest of walking the pointers via bpf_probe_read().
Note that current can be null and bpf program has to deal it with,
but even dumb passing null into bpf_probe_read() is still safe.

Suggested-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-09 00:00:16 -04:00
David S. Miller
a90a6e55f3 Merge tag 'mac80211-next-for-davem-2016-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:

====================
One more set of new features:
 * beacon report (for radio measurement) support in cfg80211/mac80211
 * hwsim: allow wmediumd in namespaces
 * mac80211: extend 160MHz workaround to CSA IEs
 * mesh: properly encrypt group-addressed privacy action frames
 * mesh: allow setting peer AID
 * first steps for MU-MIMO monitor mode
 * along with various other cleanups and improvements
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-06 22:32:15 -07:00
David S. Miller
30d0844bdc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/mellanox/mlx5/core/en.h
	drivers/net/ethernet/mellanox/mlx5/core/en_main.c
	drivers/net/usb/r8152.c

All three conflicts were overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-06 10:35:22 -07:00
David S. Miller
ae3e4562e2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next,
they are:

1) Don't use userspace datatypes in bridge netfilter code, from
   Tobin Harding.

2) Iterate only once over the expectation table when removing the
   helper module, instead of once per-netns, from Florian Westphal.

3) Extra sanitization in xt_hook_ops_alloc() to return error in case
   we ever pass zero hooks, xt_hook_ops_alloc():

4) Handle NFPROTO_INET from the logging core infrastructure, from
   Liping Zhang.

5) Autoload loggers when TRACE target is used from rules, this doesn't
   change the behaviour in case the user already selected nfnetlink_log
   as preferred way to print tracing logs, also from Liping Zhang.

6) Conntrack slabs with SLAB_HWCACHE_ALIGN to allow rearranging fields
   by cache lines, increases the size of entries in 11% per entry.
   From Florian Westphal.

7) Skip zone comparison if CONFIG_NF_CONNTRACK_ZONES=n, from Florian.

8) Remove useless defensive check in nf_logger_find_get() from Shivani
   Bhardwaj.

9) Remove zone extension as place it in the conntrack object, this is
   always include in the hashing and we expect more intensive use of
   zones since containers are in place. Also from Florian Westphal.

10) Owner match now works from any namespace, from Eric Bierdeman.

11) Make sure we only reply with TCP reset to TCP traffic from
    nf_reject_ipv4, patch from Liping Zhang.

12) Introduce --nflog-size to indicate amount of network packet bytes
    that are copied to userspace via log message, from Vishwanath Pai.
    This obsoletes --nflog-range that has never worked, it was designed
    to achieve this but it has never worked.

13) Introduce generic macros for nf_tables object generation masks.

14) Use generation mask in table, chain and set objects in nf_tables.
    This allows fixes interferences with ongoing preparation phase of
    the commit protocol and object listings going on at the same time.
    This update is introduced in three patches, one per object.

15) Check if the object is active in the next generation for element
    deactivation in the rbtree implementation, given that deactivation
    happens from the commit phase path we have to observe the future
    status of the object.

16) Support for deletion of just added elements in the hash set type.

17) Allow to resize hashtable from /proc entry, not only from the
    obscure /sys entry that maps to the module parameter, from Florian
    Westphal.

18) Get rid of NFT_BASECHAIN_DISABLED, this code is not exercised
    anymore since we tear down the ruleset whenever the netdevice
    goes away.

19) Support for matching inverted set lookups, from Arturo Borrero.

20) Simplify the iptables_mangle_hook() by removing a superfluous
    extra branch.

21) Introduce ether_addr_equal_masked() and use it from the netfilter
    codebase, from Joe Perches.

22) Remove references to "Use netfilter MARK value as routing key"
    from the Netfilter Kconfig description given that this toggle
    doesn't exists already for 10 years, from Moritz Sichert.

23) Introduce generic NF_INVF() and use it from the xtables codebase,
    from Joe Perches.

24) Setting logger to NONE via /proc was not working unless explicit
    nul-termination was included in the string. This fixes seems to
    leave the former behaviour there, so we don't break backward.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-06 09:15:15 -07:00
Masashi Honma
7d27a0ba7a cfg80211: Add mesh peer AID setting API
Previously, mesh power management functionality works only with kernel
MPM. Because user space MPM did not report mesh peer AID to kernel,
the kernel could not identify the bit in TIM element. So this patch
adds mesh peer AID setting API.

Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-07-06 15:04:52 +02:00
Avraham Stern
1d76250bd3 nl80211: support beacon report scanning
Beacon report radio measurement requires reporting observed BSSs
on the channels specified in the beacon request. If the measurement
mode is set to passive or active, it requires actually performing a
scan (passive or active, accordingly), and reporting the time that
the scan was started and the time each beacon/probe was received
(both in terms of TSF of the BSS of the requesting AP). If the
request mode is table, this information is optional.
In addition, the radio measurement request specifies the channel
dwell time for the measurement.

In order to use scan for beacon report when the mode is active or
passive, add a parameter to scan request that specifies the
channel dwell time, and add scan start time and beacon received time
to scan results information.

Supporting beacon report is required for Multi Band Operation (MBO).

Signed-off-by: Assaf Krauss <assaf.krauss@intel.com>
Signed-off-by: David Spinadel <david.spinadel@intel.com>
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-07-06 14:51:31 +02:00
Aviya Erenfeld
c6e6a0c8be nl80211: Add API to support VHT MU-MIMO air sniffer
add API to support VHT MU-MIMO air sniffer.
in MU-MIMO there are parallel frames on the air while the HW
has only one RX.
add the capability to sniff one of the MU-MIMO parallel frames by
giving the sniffer additional information so it'll know which
of the parallel frames it shall follow.

Add attribute - NL80211_ATTR_MU_MIMO_GROUP_DATA - for getting
a MU-MIMO groupID in order to monitor packets from that group
using VHT MU-MIMO.
And add attribute -NL80211_ATTR_MU_MIMO_FOLLOW_ADDR - for passing
MAC address to monitor mode.
that option will be used by VHT MU-MIMO air sniffer to follow a
station according to it's MAC address using VHT MU-MIMO.

Signed-off-by: Aviya Erenfeld <aviya.erenfeld@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-07-06 14:46:04 +02:00
David S. Miller
b77af26a79 Merge tag 'batadv-next-for-davem-20160704' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:

====================
This feature patchset includes the following changes:

 - Cleanup work by Markus Pargmann and Sven Eckelmann (six patches)

 - Initial Netlink support by Matthias Schiffer (two patches)

 - Throughput Meter implementation by Antonio Quartulli, a kernel-space
   traffic generator to estimate link speeds. This feature is useful on
   low-end WiFi APs where running iperf or netperf from userspace
   gives wrong results due to heavy userspace/kernelspace overhead.
   (two patches)

 - API clean-up work by Antonio Quartulli (one patch)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 23:33:59 -07:00
Daniel Borkmann
13c5c240f7 bpf: add bpf_get_hash_recalc helper
If skb_clear_hash() was invoked due to mangling of relevant headers and
BPF program needs skb->hash later on, we can add a helper to trigger hash
recalculation via bpf_get_hash_recalc().

The helper will return the newly retrieved hash directly, but later access
can also be done via skb context again through skb->hash directly (inline)
without needing to call the helper once more.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 16:08:40 -07:00
Jamal Hadi Salim
ff202ee1ed net sched actions: skbedit add support for mod-ing skb pkt_type
Extremely useful for setting packet type to host so i dont
have to modify the dst mac address using pedit (which requires
that i know the mac address)

Example usage:
tc filter add dev eth0 parent ffff: protocol ip pref 9 u32 \
match ip src 5.5.5.5/32 \
flowid 1:5 action skbedit ptype host

This will tag all packets incoming from 5.5.5.5 with type
PACKET_HOST

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 15:11:14 -07:00
Antonio Quartulli
33a3bb4a33 batman-adv: throughput meter implementation
The throughput meter module is a simple, kernel-space replacement for
throughtput measurements tool like iperf and netperf. It is intended to
approximate TCP behaviour.

It is invoked through batctl: the protocol is connection oriented, with
cumulative acknowledgment and a dynamic-size sliding window.

The test *can* be interrupted by batctl. A receiver side timeout avoids
unlimited waitings for sender packets: after one second of inactivity, the
receiver abort the ongoing test.

Based on a prototype from Edo Monticelli <montik@autistici.org>

Signed-off-by: Antonio Quartulli <antonio.quartulli@open-mesh.com>
Signed-off-by: Sven Eckelmann <sven.eckelmann@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-07-04 12:37:18 +02:00
Matthias Schiffer
5da0aef5e9 batman-adv: add netlink command to query generic mesh information files
BATADV_CMD_GET_MESH_INFO is used to query basic information about a
batman-adv softif (name, index and MAC address for both the softif and
the primary hardif; routing algorithm; batman-adv version).

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
[sven.eckelmann@open-mesh.com: Reduce the number of changes to
BATADV_CMD_GET_MESH_INFO, add missing kerneldoc, add policy for attributes]
Signed-off-by: Sven Eckelmann <sven.eckelmann@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-07-04 12:37:17 +02:00
Matthias Schiffer
09748a22f4 batman-adv: add generic netlink family for batman-adv
debugfs is currently severely broken virtually everywhere in the kernel
where files are dynamically added and removed (see
http://lkml.iu.edu/hypermail/linux/kernel/1506.1/02196.html for some
details). In addition to that, debugfs is not namespace-aware.

Instead of adding new debugfs entries, the whole infrastructure should be
moved to netlink. This will fix the long standing problem of large buffers
for debug tables and hard to parse text files.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
[sven.eckelmann@open-mesh.com: Strip down patch to only add genl family,
add missing kerneldoc]
Signed-off-by: Sven Eckelmann <sven.eckelmann@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-07-04 12:37:17 +02:00
Linus Torvalds
0b295dd5b8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fix from Miklos Szeredi:
 "This makes sure userspace filesystems are not broken by the parallel
  lookups and readdir feature"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: serialize dirops by default
2016-07-03 12:02:00 -07:00
Or Gerlitz
08f4b5918b net/devlink: Add E-Switch mode control
Add the commands to set and show the mode of SRIOV E-Switch, two modes
are supported:

* legacy: operating in the "old" L2 based mode (DMAC --> VF vport)

* switchdev: the E-Switch is referred to as whitebox switch configured
using standard tools such as tc, bridge, openvswitch etc. To allow
working with the tools, for each VF, a VF representor netdevice is
created by the E-Switch manager vendor device driver instance (e.g PF).

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-02 14:40:40 -04:00
Martin KaFai Lau
4a482f34af cgroup: bpf: Add bpf_skb_in_cgroup_proto
Adds a bpf helper, bpf_skb_in_cgroup, to decide if a skb->sk
belongs to a descendant of a cgroup2.  It is similar to the
feature added in netfilter:
commit c38c4597e4 ("netfilter: implement xt_cgroup cgroup2 path match")

The user is expected to populate a BPF_MAP_TYPE_CGROUP_ARRAY
which will be used by the bpf_skb_in_cgroup.

Modifications to the bpf verifier is to ensure BPF_MAP_TYPE_CGROUP_ARRAY
and bpf_skb_in_cgroup() are always used together.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01 16:32:13 -04:00
Martin KaFai Lau
4ed8ec521e cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY
Add a BPF_MAP_TYPE_CGROUP_ARRAY and its bpf_map_ops's implementations.
To update an element, the caller is expected to obtain a cgroup2 backed
fd by open(cgroup2_dir) and then update the array with that fd.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01 16:30:38 -04:00
Andrey Vagin
b1ed4c4fa9 tcp: add an ability to dump and restore window parameters
We found that sometimes a restored tcp socket doesn't work.

A reason of this bug is incorrect window parameters and in this case
tcp_acceptable_seq() returns tcp_wnd_end(tp) instead of tp->snd_nxt. The
other side drops packets with this seq, because seq is less than
tp->rcv_nxt ( tcp_sequence() ).

Data from a send queue is sent only if there is enough space in a
window, so when we restore unacked data, we need to expand a window to
fit this data.

This was in a first version of this patch:
"tcp: extend window to fit all restored unacked data in a send queue"

Then Alexey recommended me to restore window parameters instead of
adjusted them according with data in a sent queue. This sounds resonable.

rcv_wnd has to be restored, because it was reported to another side
and the offered window is never shrunk.
One of reasons why we need to restore snd_wnd was described above.

Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30 08:15:31 -04:00
Miklos Szeredi
5c672ab3f0 fuse: serialize dirops by default
Negotiate with userspace filesystems whether they support parallel readdir
and lookup.  Disable parallelism by default for fear of breaking fuse
filesystems.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 9902af79c0 ("parallel lookups: actual switch to rwsem")
Fixes: d9b3dbdcfd ("fuse: switch to ->iterate_shared()")
2016-06-30 13:10:49 +02:00
Nikolay Aleksandrov
1080ab95e3 net: bridge: add support for IGMP/MLD stats and export them via netlink
This patch adds stats support for the currently used IGMP/MLD types by the
bridge. The stats are per-port (plus one stat per-bridge) and per-direction
(RX/TX). The stats are exported via netlink via the new linkxstats API
(RTM_GETSTATS). In order to minimize the performance impact, a new option
is used to enable/disable the stats - multicast_stats_enabled, similar to
the recent vlan stats. Also in order to avoid multiple IGMP/MLD type
lookups and checks, we make use of the current "igmp" member of the bridge
private skb->cb region to record the type on Rx (both host-generated and
external packets pass by multicast_rcv()). We can do that since the igmp
member was used as a boolean and all the valid IGMP/MLD types are positive
values. The normal bridge fast-path is not affected at all, the only
affected paths are the flooding ones and since we make use of the IGMP/MLD
type, we can quickly determine if the packet should be counted using
cache-hot data (cb's igmp member). We add counters for:
* IGMP Queries
* IGMP Leaves
* IGMP v1/v2/v3 reports

* MLD Queries
* MLD Leaves
* MLD v1/v2 reports

These are invaluable when monitoring or debugging complex multicast setups
with bridges.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30 06:18:24 -04:00
Nikolay Aleksandrov
80e73cc563 net: rtnetlink: add support for the IFLA_STATS_LINK_XSTATS_SLAVE attribute
This patch adds support for the IFLA_STATS_LINK_XSTATS_SLAVE attribute
which allows to export per-slave statistics if the master device supports
the linkxstats callback. The attribute is passed down to the linkxstats
callback and it is up to the callback user to use it (an example has been
added to the only current user - the bridge). This allows us to query only
specific slaves of master devices like bridge ports and export only what
we're interested in instead of having to dump all ports and searching only
for a single one. This will be used to export per-port IGMP/MLD stats and
also per-port vlan stats in the future, possibly other statistics as well.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30 06:15:04 -04:00
Daniel Borkmann
d2485c4242 bpf: add bpf_skb_change_type helper
This work adds a helper for changing skb->pkt_type in a controlled way.
We only allow a subset of possible values and can extend that in future
should other use cases come up. Doing this as a helper has the advantage
that errors can be handeled gracefully and thus helper kept extensible.

It's a write counterpart to pkt_type member we can already read from
struct __sk_buff context. Major use case is to change incoming skbs to
PACKET_HOST in a programmatic way instead of having to recirculate via
redirect(..., BPF_F_INGRESS), for example.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30 05:54:40 -04:00
Daniel Borkmann
6578171a7f bpf: add bpf_skb_change_proto helper
This patch adds a minimal helper for doing the groundwork of changing
the skb->protocol in a controlled way. Currently supported is v4 to
v6 and vice versa transitions, which allows f.e. for a minimal, static
nat64 implementation where applications in containers that still
require IPv4 can be transparently operated in an IPv6-only environment.
For example, host facing veth of the container can transparently do
the transitions in a programmatic way with the help of clsact qdisc
and cls_bpf.

Idea is to separate concerns for keeping complexity of the helper
lower, which means that the programs utilize bpf_skb_change_proto(),
bpf_skb_store_bytes() and bpf_lX_csum_replace() to get the job done,
instead of doing everything in a single helper (and thus partially
duplicating helper functionality). Also, bpf_skb_change_proto()
shouldn't need to deal with raw packet data as this is done by other
helpers.

bpf_skb_proto_6_to_4() and bpf_skb_proto_4_to_6() unclone the skb to
operate on a private one, push or pop additionally required header
space and migrate the gso/gro meta data from the shared info. We do
mark the gso type as dodgy so that headers are checked and segs
recalculated by the gso/gro engine. The gso_size target is adapted
as well. The flags argument added is currently reserved and can be
used for future extensions.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30 05:54:40 -04:00
Daniel Borkmann
6816a7ffce bpf, trace: add BPF_F_CURRENT_CPU flag for bpf_perf_event_read
Follow-up commit to 1e33759c78 ("bpf, trace: add BPF_F_CURRENT_CPU
flag for bpf_perf_event_output") to add the same functionality into
bpf_perf_event_read() helper. The split of index into flags and index
component is also safe here, since such large maps are rejected during
map allocation time.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30 05:54:40 -04:00
David S. Miller
ee58b57100 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Several cases of overlapping changes, except the packet scheduler
conflicts which deal with the addition of the free list parameter
to qdisc_enqueue().

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30 05:03:36 -04:00
Linus Torvalds
32826ac41f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "I've been traveling so this accumulates more than week or so of bug
  fixing.  It perhaps looks a little worse than it really is.

   1) Fix deadlock in ath10k driver, from Ben Greear.

   2) Increase scan timeout in iwlwifi, from Luca Coelho.

   3) Unbreak STP by properly reinjecting STP packets back into the
      stack.  Regression fix from Ido Schimmel.

   4) Mediatek driver fixes (missing malloc failure checks, leaking of
      scratch memory, wrong indexing when mapping TX buffers, etc.) from
      John Crispin.

   5) Fix endianness bug in icmpv6_err() handler, from Hannes Frederic
      Sowa.

   6) Fix hashing of flows in UDP in the ruseport case, from Xuemin Su.

   7) Fix netlink notifications in ovs for tunnels, delete link messages
      are never emitted because of how the device registry state is
      handled.  From Nicolas Dichtel.

   8) Conntrack module leaks kmemcache on unload, from Florian Westphal.

   9) Prevent endless jump loops in nft rules, from Liping Zhang and
      Pablo Neira Ayuso.

  10) Not early enough spinlock initialization in mlx4, from Eric
      Dumazet.

  11) Bind refcount leak in act_ipt, from Cong WANG.

  12) Missing RCU locking in HTB scheduler, from Florian Westphal.

  13) Several small MACSEC bug fixes from Sabrina Dubroca (missing RCU
      barrier, using heap for SG and IV, and erroneous use of async flag
      when allocating AEAD conext.)

  14) RCU handling fix in TIPC, from Ying Xue.

  15) Pass correct protocol down into ipv4_{update_pmtu,redirect}() in
      SIT driver, from Simon Horman.

  16) Socket timer deadlock fix in TIPC from Jon Paul Maloy.

  17) Fix potential deadlock in team enslave, from Ido Schimmel.

  18) Memory leak in KCM procfs handling, from Jiri Slaby.

  19) ESN generation fix in ipv4 ESP, from Herbert Xu.

  20) Fix GFP_KERNEL allocations with locks held in act_ife, from Cong
      WANG.

  21) Use after free in netem, from Eric Dumazet.

  22) Uninitialized last assert time in multicast router code, from Tom
      Goff.

  23) Skip raw sockets in sock_diag destruction broadcast, from Willem
      de Bruijn.

  24) Fix link status reporting in thunderx, from Sunil Goutham.

  25) Limit resegmentation of retransmit queue so that we do not
      retransmit too large GSO frames.  From Eric Dumazet.

  26) Delay bpf program release after grace period, from Daniel
      Borkmann"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (141 commits)
  openvswitch: fix conntrack netlink event delivery
  qed: Protect the doorbell BAR with the write barriers.
  neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit()
  e1000e: keep VLAN interfaces functional after rxvlan off
  cfg80211: fix proto in ieee80211_data_to_8023 for frames without LLC header
  qlcnic: use the correct ring in qlcnic_83xx_process_rcv_ring_diag()
  bpf, perf: delay release of BPF prog after grace period
  net: bridge: fix vlan stats continue counter
  tcp: do not send too big packets at retransmit time
  ibmvnic: fix to use list_for_each_safe() when delete items
  net: thunderx: Fix TL4 configuration for secondary Qsets
  net: thunderx: Fix link status reporting
  net/mlx5e: Reorganize ethtool statistics
  net/mlx5e: Fix number of PFC counters reported to ethtool
  net/mlx5e: Prevent adding the same vxlan port
  net/mlx5e: Check for BlueFlame capability before allocating SQ uar
  net/mlx5e: Change enum to better reflect usage
  net/mlx5: Add ConnectX-5 PCIe 4.0 to list of supported devices
  net/mlx5: Update command strings
  net: marvell: Add separate config ANEG function for Marvell 88E1111
  ...
2016-06-29 11:50:42 -07:00
Douglas Anderson
9a9b6aa6a8 Input: add SW_PEN_INSERTED define
Some devices with a pen may have a switch that can be used to detect
when the pen is inserted or removed to a slot on the device.  Let's add
a define to the input event codes so that everyone can be on the same
page for what event we should generate when the pen is inserted or
removed.

In general the pen switch could be used by the software on the device to
kick off any number of actions when the pen is inserted or removed.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2016-06-29 09:51:03 -07:00
David Ahern
637c841dd7 net: diag: Add support to filter on device index
Add support to inet_diag facility to filter sockets based on device
index. If an interface index is in the filter only sockets bound
to that index (sk_bound_dev_if) are returned.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-28 05:25:04 -04:00
Linus Torvalds
18751e2eb5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: vmmouse - remove port reservation
  Input: elantech - add more IC body types to the list
  Input: wacom_w8001 - ignore invalid pen data packets
  Input: wacom_w8001 - w8001_MAX_LENGTH should be 13
  Input: xpad - fix oops when attaching an unknown Xbox One gamepad
  MAINTAINERS: add Pali Rohár as reviewer of ALPS PS/2 touchpad driver
  Input: add HDMI CEC specific keycodes
  Input: add BUS_CEC type
  Input: xpad - fix rumble on Xbox One controllers with 2015 firmware
2016-06-27 20:34:43 -07:00
Gal Pressman
89da45b8b5 ethtool: Add 50G baseSR2 link mode
Add ETHTOOL_LINK_MODE_50000baseSR2_Full_BIT bit.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: Ben Hutchings <bwh@kernel.org>
Cc: David Decotigny <decot@googlers.com>
Acked-By: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-27 04:10:41 -04:00
Arturo Borrero
0071e184a5 netfilter: nf_tables: add support for inverted logic in nft_lookup
Introduce a new configuration option for this expression, which allows users
to invert the logic of set lookups.

In _init() we will now return EINVAL if NFT_LOOKUP_F_INV is in anyway
related to a map lookup.

The code in the _eval() function has been untangled and updated to sopport the
XOR of options, as we should consider 4 cases:
 * lookup false, invert false -> NFT_BREAK
 * lookup false, invert true -> return w/o NFT_BREAK
 * lookup true, invert false -> return w/o NFT_BREAK
 * lookup true, invert true -> NFT_BREAK

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-24 11:03:29 +02:00
Vishwanath Pai
7643507fe8 netfilter: xt_NFLOG: nflog-range does not truncate packets
li->u.ulog.copy_len is currently ignored by the kernel, we should truncate
the packet to either li->u.ulog.copy_len (if set) or copy_range before
sending it to userspace. 0 is a valid input for copy_len, so add a new
flag to indicate whether this was option was specified by the user or not.

Add two flags to indicate whether nflog-size/copy_len was set or not.
XT_NFLOG_F_COPY_LEN is for XT_NFLOG and NFLOG_F_COPY_LEN for nfnetlink_log

On the userspace side, this was initially represented by the option
nflog-range, this will be replaced by --nflog-size now. --nflog-range would
still exist but does not do anything.

Reported-by: Joe Dollard <jdollard@akamai.com>
Reviewed-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Vishwanath Pai <vpai@akamai.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-24 11:03:23 +02:00
William Tu
b95e5928fc openvswitch: Add packet len info to upcall.
The commit f2a4d086ed ("openvswitch: Add packet truncation support.")
introduces packet truncation before sending to userspace upcall receiver.
This patch passes up the skb->len before truncation so that the upcall
receiver knows the original packet size. Potentially this will be used
by sFlow, where OVS translates sFlow config header=N to a sample action,
truncating packet to N byte in kernel datapath. Thus, only N bytes instead
of full-packet size is copied from kernel to userspace, saving the
kernel-to-userspace bandwidth.

Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Pravin Shelar <pshelar@nicira.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-22 16:34:39 -04:00
Eric Dumazet
20e1954fe2 ipv6: RFC 4884 partial support for SIT/GRE tunnels
When receiving an ICMPv4 message containing extensions as
defined in RFC 4884, and translating it to ICMPv6 at SIT
or GRE tunnel, we need some extra manipulation in order
to properly forward the extensions.

This patch only takes care of Time Exceeded messages as they
are the ones that typically carry information from various
routers in a fabric during a traceroute session.

It also avoids complex skb logic if the data_len is not
a multiple of 8.

RFC states :

   The "original datagram" field MUST contain at least 128 octets.
   If the original datagram did not contain 128 octets, the
   "original datagram" field MUST be zero padded to 128 octets.

In practice routers use 128 bytes of original datagram, not more.

Initial translation was added in commit ca15a078bd
("sit: generate icmpv6 error when receiving icmpv4 error")

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Oussama Ghorbel <ghorbel@pivasoftware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-18 22:11:39 -07:00
Kamil Debski
488326947c Input: add HDMI CEC specific keycodes
Add HDMI CEC specific keycodes to the keycodes definition.

Signed-off-by: Kamil Debski <kamil@wypas.org>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2016-06-18 10:32:29 -07:00
Hans Verkuil
3720b69baf Input: add BUS_CEC type
Inputs can come in over the HDMI CEC bus, so add a new type for this.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2016-06-18 10:32:28 -07:00