Commit Graph

27015 Commits

Author SHA1 Message Date
Jiri Pirko
2429f7ac2e net: introduce netif_addr_lock_nested() and call if when appropriate
dev_uc_sync() and dev_mc_sync() are acquiring netif_addr_lock for
destination device of synchronization. Since netif_addr_lock is
already held at the time for source device, this triggers lockdep
deadlock warning.

There's no way this deadlock can happen so use spin_lock_nested() to
silence the warning.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-09 12:46:58 -08:00
Linus Torvalds
770e1b035d Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm
* 'for-linus' of git://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm: (207 commits)
  ARM: 7267/1: Remove BUILD_BUG_ON from asm/bug.h
  ARM: 7269/1: mach-sa1100: fix sched_clock breakage
  ARM: 7198/1: arm/imx6: add restart support for imx6q
  ARM: restart: remove the now empty arch_reset()
  ARM: restart: remove comments about adding code to arch_reset()
  ARM: restart: lpc32xx & u300: remove unnecessary printk
  ARM: restart: plat-samsung: remove plat/reset.h and s5p_reset_hook
  ARM: restart: w90x900: use new restart hook
  ARM: restart: Versatile Express: use new restart hook
  ARM: restart: versatile: use new restart hook
  ARM: restart: u300: use new restart hook
  ARM: restart: tegra: use new restart hook
  ARM: restart: spear: use new restart hook
  ARM: restart: shark: use new restart hook
  ARM: restart: sa1100: use new restart hook
  ARM: 7252/1: restart: S5PV210: use new restart hook
  ARM: 7251/1: restart: S5PC100: use new restart hook
  ARM: 7250/1: restart: S5P64X0: use new restart hook
  ARM: 7266/1: restart: S3C64XX: use new restart hook
  ARM: 7265/1: restart: S3C24XX: use new restart hook
  ...

Fix up trivial conflict in arch/arm/mm/init.c due to removal of
memblock_init() clashing with the movement of the sorting of the meminfo
array.
2012-01-06 18:15:25 -08:00
Linus Torvalds
d3d0b02434 Merge branch 'amba-modalias' of git://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm
* 'amba-modalias' of git://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm:
  sound: aaci: Enable module alias autogeneration for AMBA drivers
  watchdog: sp805: Enable module alias autogeneration for AMBA drivers
  fbdev: amba: Enable module alias autogeneration for AMBA drivers
  serial: pl011: Enable module alias autogeneration for AMBA drivers
  serial: pl010: Enable module alias autogeneration for AMBA drivers
  spi: pl022: Enable module alias autogeneration for AMBA drivers
  rtc: pl031: Enable module alias autogeneration for AMBA drivers
  rtc: pl030: Enable module alias autogeneration for AMBA drivers
  mmc: mmci: Enable module alias autogeneration for AMBA drivers
  input: ambakmi: Enable module alias autogeneration for AMBA drivers
  gpio: pl061: Enable module alias autogeneration for AMBA drivers
  dmaengine: pl330: Enable module alias autogeneration for AMBA drivers
  dmaengine: pl08x: Enable module alias autogeneration for AMBA drivers
  hwrng: nomadik: Enable module alias autogeneration for AMBA drivers
  ARM: amba: Auto-generate AMBA driver module aliases during modpost
  ARM: amba: Move definition of struct amba_id to mod_devicetable.h
2012-01-06 18:03:30 -08:00
Linus Torvalds
6ed23fd6c0 Merge branch 'pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
* 'pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  pstore: gracefully handle NULL pstore_info functions
  pstore: pass reason to backend write callback
2012-01-06 18:03:02 -08:00
Linus Torvalds
e4e88f31bc Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (185 commits)
  powerpc: fix compile error with 85xx/p1010rdb.c
  powerpc: fix compile error with 85xx/p1023_rds.c
  powerpc/fsl: add MSI support for the Freescale hypervisor
  arch/powerpc/sysdev/fsl_rmu.c: introduce missing kfree
  powerpc/fsl: Add support for Integrated Flash Controller
  powerpc/fsl: update compatiable on fsl 16550 uart nodes
  powerpc/85xx: fix PCI and localbus properties in p1022ds.dts
  powerpc/85xx: re-enable ePAPR byte channel driver in corenet32_smp_defconfig
  powerpc/fsl: Update defconfigs to enable some standard FSL HW features
  powerpc: Add TBI PHY node to first MDIO bus
  sbc834x: put full compat string in board match check
  powerpc/fsl-pci: Allow 64-bit PCIe devices to DMA to any memory address
  powerpc: Fix unpaired probe_hcall_entry and probe_hcall_exit
  offb: Fix setting of the pseudo-palette for >8bpp
  offb: Add palette hack for qemu "standard vga" framebuffer
  offb: Fix bug in calculating requested vram size
  powerpc/boot: Change the WARN to INFO for boot wrapper overlap message
  powerpc/44x: Fix build error on currituck platform
  powerpc/boot: Change the load address for the wrapper to fit the kernel
  powerpc/44x: Enable CRASH_DUMP for 440x
  ...

Fix up a trivial conflict in arch/powerpc/include/asm/cputime.h due to
the additional sparse-checking code for cputime_t.
2012-01-06 17:58:22 -08:00
Linus Torvalds
9753dfe19a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1958 commits)
  net: pack skb_shared_info more efficiently
  net_sched: red: split red_parms into parms and vars
  net_sched: sfq: extend limits
  cnic: Improve error recovery on bnx2x devices
  cnic: Re-init dev->stats_addr after chip reset
  net_sched: Bug in netem reordering
  bna: fix sparse warnings/errors
  bna: make ethtool_ops and strings const
  xgmac: cleanups
  net: make ethtool_ops const
  vmxnet3" make ethtool ops const
  xen-netback: make ops structs const
  virtio_net: Pass gfp flags when allocating rx buffers.
  ixgbe: FCoE: Add support for ndo_get_fcoe_hbainfo() call
  netdev: FCoE: Add new ndo_get_fcoe_hbainfo() call
  igb: reset PHY after recovering from PHY power down
  igb: add basic runtime PM support
  igb: Add support for byte queue limits.
  e1000: cleanup CE4100 MDIO registers access
  e1000: unmap ce4100_gbe_mdio_base_virt in e1000_remove
  ...
2012-01-06 17:22:09 -08:00
Linus Torvalds
0db49b72bc Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  sched/tracing: Add a new tracepoint for sleeptime
  sched: Disable scheduler warnings during oopses
  sched: Fix cgroup movement of waking process
  sched: Fix cgroup movement of newly created process
  sched: Fix cgroup movement of forking process
  sched: Remove cfs bandwidth period check in tg_set_cfs_period()
  sched: Fix load-balance lock-breaking
  sched: Replace all_pinned with a generic flags field
  sched: Only queue remote wakeups when crossing cache boundaries
  sched: Add missing rcu_dereference() around ->real_parent usage
  [S390] fix cputime overflow in uptime_proc_show
  [S390] cputime: add sparse checking and cleanup
  sched: Mark parent and real_parent as __rcu
  sched, nohz: Fix missing RCU read lock
  sched, nohz: Set the NOHZ_BALANCE_KICK flag for idle load balancer
  sched, nohz: Fix the idle cpu check in nohz_idle_balance
  sched: Use jump_labels for sched_feat
  sched/accounting: Fix parameter passing in task_group_account_field
  sched/accounting: Fix user/system tick double accounting
  sched/accounting: Re-use scheduler statistics for the root cgroup
  ...

Fix up conflicts in
 - arch/ia64/include/asm/cputime.h, include/asm-generic/cputime.h
	usecs_to_cputime64() vs the sparse cleanups
 - kernel/sched/fair.c, kernel/time/tick-sched.c
	scheduler changes in multiple branches
2012-01-06 08:44:54 -08:00
Linus Torvalds
35b740e466 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (106 commits)
  perf kvm: Fix copy & paste error in description
  perf script: Kill script_spec__delete
  perf top: Fix a memory leak
  perf stat: Introduce get_ratio_color() helper
  perf session: Remove impossible condition check
  perf tools: Fix feature-bits rework fallout, remove unused variable
  perf script: Add generic perl handler to process events
  perf tools: Use for_each_set_bit() to iterate over feature flags
  perf tools: Unify handling of features when writing feature section
  perf report: Accept fifos as input file
  perf tools: Moving code in some files
  perf tools: Fix out-of-bound access to struct perf_session
  perf tools: Continue processing header on unknown features
  perf tools: Improve macros for struct feature_ops
  perf: builtin-record: Document and check that mmap_pages must be a power of two.
  perf: builtin-record: Provide advice if mmap'ing fails with EPERM.
  perf tools: Fix truncated annotation
  perf script: look up thread using tid instead of pid
  perf tools: Look up thread names for system wide profiling
  perf tools: Fix comm for processes with named threads
  ...
2012-01-06 08:02:58 -08:00
Linus Torvalds
423d091dfe Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
  cpu: Export cpu_up()
  rcu: Apply ACCESS_ONCE() to rcu_boost() return value
  Revert "rcu: Permit rt_mutex_unlock() with irqs disabled"
  docs: Additional LWN links to RCU API
  rcu: Augment rcu_batch_end tracing for idle and callback state
  rcu: Add rcutorture tests for srcu_read_lock_raw()
  rcu: Make rcutorture test for hotpluggability before offlining CPUs
  driver-core/cpu: Expose hotpluggability to the rest of the kernel
  rcu: Remove redundant rcu_cpu_stall_suppress declaration
  rcu: Adaptive dyntick-idle preparation
  rcu: Keep invoking callbacks if CPU otherwise idle
  rcu: Irq nesting is always 0 on rcu_enter_idle_common
  rcu: Don't check irq nesting from rcu idle entry/exit
  rcu: Permit dyntick-idle with callbacks pending
  rcu: Document same-context read-side constraints
  rcu: Identify dyntick-idle CPUs on first force_quiescent_state() pass
  rcu: Remove dynticks false positives and RCU failures
  rcu: Reduce latency of rcu_prepare_for_idle()
  rcu: Eliminate RCU_FAST_NO_HZ grace-period hang
  rcu: Avoid needlessly IPIing CPUs at GP end
  ...
2012-01-06 08:02:40 -08:00
Linus Torvalds
1483b38235 Merge branch 'core-printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'core-printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  printk, lockdep: Switch to tracked irq ops
  printk, lockdep: Remove superfluous preempt_disable()
  printk, lockdep: Disable lock debugging on zap_locks()
2012-01-06 08:01:59 -08:00
Linus Torvalds
4a2164a7db Merge branch 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  memblock: Reimplement memblock allocation using reverse free area iterator
  memblock: Kill early_node_map[]
  score: Use HAVE_MEMBLOCK_NODE_MAP
  s390: Use HAVE_MEMBLOCK_NODE_MAP
  mips: Use HAVE_MEMBLOCK_NODE_MAP
  ia64: Use HAVE_MEMBLOCK_NODE_MAP
  SuperH: Use HAVE_MEMBLOCK_NODE_MAP
  sparc: Use HAVE_MEMBLOCK_NODE_MAP
  powerpc: Use HAVE_MEMBLOCK_NODE_MAP
  memblock: Implement memblock_add_node()
  memblock: s/memblock_analyze()/memblock_allow_resize()/ and update users
  memblock: Track total size of regions automatically
  powerpc: Cleanup memblock usage
  memblock: Reimplement memblock_enforce_memory_limit() using __memblock_remove()
  memblock: Make memblock functions handle overflowing range @size
  memblock: Reimplement __memblock_remove() using memblock_isolate_range()
  memblock: Separate out memblock_isolate_range() from memblock_set_node()
  memblock: Kill memblock_init()
  memblock: Kill sentinel entries at the end of static region arrays
  memblock: Add __memblock_dump_all()
  ...
2012-01-06 07:54:53 -08:00
Linus Torvalds
15f043a65f Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lockdep/waitqueues: Add better annotation
  lockdep, bug: Exclude TAINT_OOT_MODULE from disabling lock debugging
  lockdep: Print lock name in lockdep_init_error()
  init/main.c: Execute lockdep_init() as early as possible
  lockdep, kmemcheck: Annotate ->lock in lockdep_init_map()
  lockdep, rtmutex, bug: Show taint flags on error
  lockdep, bug: Exclude TAINT_FIRMWARE_WORKAROUND from disabling lockdep
  lockdep: Always try to set ->class_cache in register_lock_class() lockdep_init_map()
2012-01-06 07:53:52 -08:00
Linus Torvalds
8c717b72de Merge branch 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timer: Use debugobjects to catch deletion of uninitialized timers
  timer: Setup uninitialized timer with a stub callback
  debugobjects: Extend to assert that an object is initialized
  debugobjects: Be smarter about static objects
2012-01-06 07:53:34 -08:00
Ian Campbell
9f42f12615 net: pack skb_shared_info more efficiently
nr_frags can be 8 bits since 256 is plenty of fragments. This allows it to be
packed with tx_flags.

Also by moving ip6_frag_id and dataref (both 4 bytes) next to each other we can
avoid a hole between ip6_frag_id and frag_list on 64 bit systems.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-05 14:01:21 -05:00
Eric Dumazet
18cb809850 net_sched: sfq: extend limits
SFQ as implemented in Linux is very limited, with at most 127 flows
and limit of 127 packets. [ So if 127 flows are active, we have one
packet per flow ]

This patch brings to SFQ following features to cope with modern needs.

- Ability to specify a smaller per flow limit of inflight packets.
    (default value being at 127 packets)

- Ability to have up to 65408 active flows (instead of 127)

- Ability to have head drops instead of tail drops
  (to drop old packets from a flow)

Example of use : No more than 20 packets per flow, max 8000 flows, max
20000 packets in SFQ qdisc, hash table of 65536 slots.

tc qdisc add ... sfq \
        flows 8000 \
        depth 20 \
        headdrop \
        limit 20000 \
	divisor 65536

Ram usage :

2 bytes per hash table entry (instead of previous 1 byte/entry)
32 bytes per flow on 64bit arches, instead of 384 for QFQ, so much
better cache hit ratio.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-05 14:01:21 -05:00
Neerav Parikh
68bad94ed8 netdev: FCoE: Add new ndo_get_fcoe_hbainfo() call
This adds a new ndo_get_fcoe_hbainfo() call in
net_device_ops for FCoE protocol stack.

If supported by the underlying device, the FCoE protocol
stack will call this to get device specific information
from the underlying device.
This information will then be utilized by the FCoE protocol
stack to register Fiber Channel HBA attributes with the
Fiber Channel Management Service via Fabric Device
Management Interface (FDMI) as per the T11 FC-GS
specification.

Changes in v2:
- As per comments from David Miller aligning the parameters
of the ndo_get_fcoe_hbainfo()

Signed-off-by: Neerav Parikh <Neerav.Parikh@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-05 13:12:04 -05:00
John W. Linville
1032c736e8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	drivers/net/wireless/b43legacy/dma.c
2012-01-05 10:13:24 -05:00
Russell King
2e0e943436 Merge branch 'devel-stable' into for-linus
Conflicts:
	arch/arm/kernel/setup.c
	arch/arm/mach-shmobile/board-kota2.c
2012-01-05 13:24:33 +00:00
David S. Miller
117ff42fd4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-01-04 21:35:43 -05:00
Ilan Elias
288e0713f4 NFC: Export a new attribute nfcid1 in target info
The nfcid1 is the NFC-A identifier.
It is exported as an attribute of the target info
(returned as a response to NFC_CMD_GET_TARGET).

Signed-off-by: Ilan Elias <ilane@ti.com>
Acked-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-01-04 14:30:43 -05:00
Ben Hutchings
6cfb5e759d ethtool: Remove ethtool_ops::set_rx_ntuple operation
All implementations have been converted to implement set_rxnfc
instead.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-04 14:10:28 -05:00
Ben Hutchings
55664f324c ethtool: Allow drivers to select RX NFC rule locations
Define special location values for RX NFC that request the driver to
select the actual rule location.  This allows for implementation on
devices that use hash-based filter lookup, whereas currently the API is
more suited to devices with TCAM lookup or linear search.

In ethtool_set_rxnfc() and the compat wrapper ethtool_ioctl(), copy
the structure back to user-space after insertion so that the actual
location is returned.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-04 14:09:10 -05:00
Javier Martinez Canillas
43c6759e73 net: phy: smsc: Move SMSC PHY constants to <linux/smscphy.h>
SMSC generation 4 LAN chips integrate an IEEE 802.3 ethernet physical layer.
The ethernet driver for this family of devices needs to access the SMSC PHY
registers and bit-fields.

So, this patch moves these constants to a place where it can be used for both
the PHY and LAN drivers.

Signed-off-by: Javier Martinez Canillas <javier@dowhile0.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-03 20:23:18 -05:00
Jan Kara
30e053248d security: Fix security_old_inode_init_security() when CONFIG_SECURITY is not set
Commit 1e39f384bb ("evm: fix build problems") makes the stub version
of security_old_inode_init_security() return 0 when CONFIG_SECURITY is
not set.

But that makes callers such as reiserfs_security_init() assume that
security_old_inode_init_security() has set name, value, and len
arguments properly - but security_old_inode_init_security() left them
uninitialized which then results in interesting failures.

Revert security_old_inode_init_security() to the old behavior of
returning EOPNOTSUPP since both callers (reiserfs and ocfs2) handle this
just fine.

[ Also fixed the S_PRIVATE(inode) case of the actual non-stub
  security_old_inode_init_security() function to return EOPNOTSUPP
  for the same reason, as pointed out by Mimi Zohar.

  It got incorrectly changed to match the new function in commit
  fb88c2b6cb: "evm: fix security/security_old_init_security return
  code".   - Linus ]

Reported-by: Jorge Bastos <mysql.jorge@decimal.pt>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-03 16:12:19 -08:00
John W. Linville
57adc1fcba Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	drivers/net/wireless/b43/dma.c
	drivers/net/wireless/brcm80211/brcmfmac/dhd_linux.c
2012-01-03 15:16:34 -05:00
Pavel Emelyanov
c9da99e647 unix_diag: Fixup RQLEN extension report
While it's not too late fix the recently added RQLEN diag extension
to report rqlen and wqlen in the same way as TCP does.

I.e. for listening sockets the ack backlog length (which is the input
queue length for socket) in rqlen and the max ack backlog length in
wqlen, and what the CINQ/OUTQ ioctls do for established.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-30 16:46:02 -05:00
Pavel Emelyanov
257b529876 unix_diag: Add the MEMINFO extension
[ Fix indentation of sock_diag*() calls. -DaveM ]

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-30 16:44:24 -05:00
Pavel Emelyanov
c0636faa53 inet_diag: Add the SKMEMINFO extension
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-30 16:42:19 -05:00
Pavel Emelyanov
5d2e5f274f sock_diag: Introduce the meminfo nla core (v2)
Add a routine that dumps memory-related values of a socket.
It's made as an array to make it possible to add more stuff
here later without breaking compatibility.

Since v1: The SK_MEMINFO_ constants are in userspace
visible part of sock_diag.h, the rest is under __KERNEL__.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-30 16:42:19 -05:00
Pavel Emelyanov
288461e154 unix_diag: Include unix_diag.h into header-y target
The headers check complains it should include the linux/types.h
withing, thus add this one.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-30 16:42:19 -05:00
Pavel Emelyanov
e6fe2371bd sock_diag: Arrange sock_diag.h such that it is exportable to userspace
Properly toss existing components around the ifdef __KERNEL__
and include the header into the header-y target.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-30 16:42:18 -05:00
David S. Miller
7f8e3234c5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2011-12-30 13:04:14 -05:00
Jan Engelhardt
a4c6f9d363 netfilter: xtables: give xt_ecn its own name
Use the new macro and struct names in xt_ecn.h, and put the old
definitions into a definition-forwarding ipt_ecn.h.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-27 20:31:38 +01:00
Jan Engelhardt
d446a8202c netfilter: xtables: move ipt_ecn to xt_ecn
Prepare the ECN match for augmentation by an IPv6 counterpart. Since
no symbol dependencies to ipv6.ko are added, having a single ecn match
module is the more so welcome.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-27 20:31:31 +01:00
Jan Kiszka
4d25a066b6 KVM: Don't automatically expose the TSC deadline timer in cpuid
Unlike all of the other cpuid bits, the TSC deadline timer bit is set
unconditionally, regardless of what userspace wants.

This is broken in several ways:
 - if userspace doesn't use KVM_CREATE_IRQCHIP, and doesn't emulate the TSC
   deadline timer feature, a guest that uses the feature will break
 - live migration to older host kernels that don't support the TSC deadline
   timer will cause the feature to be pulled from under the guest's feet;
   breaking it
 - guests that are broken wrt the feature will fail.

Fix by not enabling the feature automatically; instead report it to userspace.
Because the feature depends on KVM_CREATE_IRQCHIP, which we cannot guarantee
will be called, we expose it via a KVM_CAP_TSC_DEADLINE_TIMER and not
KVM_GET_SUPPORTED_CPUID.

Fixes the Illumos guest kernel, which uses the TSC deadline timer feature.

[avi: add the KVM_CAP + documentation]

Reported-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Tested-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-12-26 13:27:44 +02:00
David S. Miller
c5e1fd8cca Merge branch 'nf-next' of git://1984.lsi.us.es/net-next 2011-12-25 02:21:45 -05:00
Pablo Neira Ayuso
ceb98d03ea netfilter: xtables: add nfacct match to support extended accounting
This patch adds the match that allows to perform extended
accounting. It requires the new nfnetlink_acct infrastructure.

 # iptables -I INPUT -p tcp --sport 80 -m nfacct --nfacct-name http-traffic
 # iptables -I OUTPUT -p tcp --dport 80 -m nfacct --nfacct-name http-traffic

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-25 02:43:17 +01:00
Pablo Neira Ayuso
9413902796 netfilter: add extended accounting infrastructure over nfnetlink
We currently have two ways to account traffic in netfilter:

- iptables chain and rule counters:

 # iptables -L -n -v
Chain INPUT (policy DROP 3 packets, 867 bytes)
 pkts bytes target     prot opt in     out     source               destination
    8  1104 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0

- use flow-based accounting provided by ctnetlink:

 # conntrack -L
tcp      6 431999 ESTABLISHED src=192.168.1.130 dst=212.106.219.168 sport=58152 dport=80 packets=47 bytes=7654 src=212.106.219.168 dst=192.168.1.130 sport=80 dport=58152 packets=49 bytes=66340 [ASSURED] mark=0 use=1

While trying to display real-time accounting statistics, we require
to pool the kernel periodically to obtain this information. This is
OK if the number of flows is relatively low. However, in case that
the number of flows is huge, we can spend a considerable amount of
cycles to iterate over the list of flows that have been obtained.

Moreover, if we want to obtain the sum of the flow accounting results
that match some criteria, we have to iterate over the whole list of
existing flows, look for matchings and update the counters.

This patch adds the extended accounting infrastructure for
nfnetlink which aims to allow displaying real-time traffic accounting
without the need of complicated and resource-consuming implementation
in user-space. Basically, this new infrastructure allows you to create
accounting objects. One accounting object is composed of packet and
byte counters.

In order to manipulate create accounting objects, you require the
new libnetfilter_acct library. It contains several examples of use:

libnetfilter_acct/examples# ./nfacct-add http-traffic
libnetfilter_acct/examples# ./nfacct-get
http-traffic = { pkts = 000000000000,   bytes = 000000000000 };

Then, you can use one of this accounting objects in several iptables
rules using the new nfacct match (which comes in a follow-up patch):

 # iptables -I INPUT -p tcp --sport 80 -m nfacct --nfacct-name http-traffic
 # iptables -I OUTPUT -p tcp --dport 80 -m nfacct --nfacct-name http-traffic

The idea is simple: if one packet matches the rule, the nfacct match
updates the counters.

Thanks to Patrick McHardy, Eric Dumazet, Changli Gao for reviewing and
providing feedback for this contribution.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-25 02:43:03 +01:00
Eric Dumazet
60b778ce51 rfs: better sizing of dev_flow_table
Aim of this patch is to provide full range of rps_flow_cnt on 64bit arches.

Theorical limit on number of flows is 2^32

Fix some buggy RPS/RFS macros as well.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Tom Herbert <therbert@google.com>
CC: Xi Wang <xi.wang@gmail.com>
CC: Laurent Chavey <chavey@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-24 16:13:27 -05:00
Jamie Iles
c87fb57346 ARM: 7235/1: irqdomain: export irq_domain_simple_ops for !CONFIG_OF
irqdomain support is used in interrupt controller drivers that may not
have device tree support but only need the basic HW->Linux irq
translation.  Rather than having each of these implement their own IRQ
domain, allow them to use the simple ops.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rob Herring <robherring2@gmail.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Jamie Iles <jamie@jamieiles.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-12-23 22:33:58 +00:00
David S. Miller
abb434cb05 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/bluetooth/l2cap_core.c

Just two overlapping changes, one added an initialization of
a local variable, and another change added a new local variable.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-23 17:13:56 -05:00
Ian Campbell
9d4dde5215 net: only use a single page of slop in MAX_SKB_FRAGS
In order to accommodate a 64K buffer we need 64K/PAGE_SIZE plus one more page
in order to allow for a buffer which does not start on a page boundary.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-23 16:51:18 -05:00
Patrick McHardy
cbc9f2f4fc netfilter: nf_nat: export NAT definitions to userspace
Export the NAT definitions to userspace. So far userspace (specifically,
iptables) has been copying the headers files from include/net. Also
rename some structures and definitions in preparation for IPv6 NAT.
Since these have never been officially exported, this doesn't affect
existing userspace code.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-23 14:36:43 +01:00
Pablo Neira Ayuso
3d058d7bc2 netfilter: rework user-space expectation helper support
This partially reworks bc01befdcf
which added userspace expectation support.

This patch removes the nf_ct_userspace_expect_list since now we
force to use the new iptables CT target feature to add the helper
extension for conntracks that have attached expectations from
userspace.

A new version of the proof-of-concept code to implement userspace
helpers from userspace is available at:

http://people.netfilter.org/pablo/userspace-conntrack-helpers/nf-ftp-helper-POC.tar.bz2

This patch also modifies the CT target to allow to set the
conntrack's userspace helper status flags. This flag is used
to tell the conntrack system to explicitly allocate the helper
extension.

This helper extension is useful to link the userspace expectations
with the master conntrack that is being tracked from one userspace
helper.

This feature fixes a problem in the current approach of the
userspace helper support. Basically, if the master conntrack that
has got a userspace expectation vanishes, the expectations point to
one invalid memory address. Thus, triggering an oops in the
expectation deletion event path.

I decided not to add a new revision of the CT target because
I only needed to add a new flag for it. I'll document in this
issue in the iptables manpage. I have also changed the return
value from EINVAL to EOPNOTSUPP if one flag not supported is
specified. Thus, in the future adding new features that only
require a new flag can be added without a new revision.

There is no official code using this in userspace (apart from
the proof-of-concept) that uses this infrastructure but there
will be some by beginning 2012.

Reported-by: Sam Roberts <vieuxtech@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-23 14:36:39 +01:00
Srivatsa S. Bhat
e30e2fdfe5 VFS: Fix race between CPU hotplug and lglocks
Currently, the *_global_[un]lock_online() routines are not at all synchronized
with CPU hotplug. Soft-lockups detected as a consequence of this race was
reported earlier at https://lkml.org/lkml/2011/8/24/185. (Thanks to Cong Meng
for finding out that the root-cause of this issue is the race condition
between br_write_[un]lock() and CPU hotplug, which results in the lock states
getting messed up).

Fixing this race by just adding {get,put}_online_cpus() at appropriate places
in *_global_[un]lock_online() is not a good option, because, then suddenly
br_write_[un]lock() would become blocking, whereas they have been kept as
non-blocking all this time, and we would want to keep them that way.

So, overall, we want to ensure 3 things:
1. br_write_lock() and br_write_unlock() must remain as non-blocking.
2. The corresponding lock and unlock of the per-cpu spinlocks must not happen
   for different sets of CPUs.
3. Either prevent any new CPU online operation in between this lock-unlock, or
   ensure that the newly onlined CPU does not proceed with its corresponding
   per-cpu spinlock unlocked.

To achieve all this:
(a) We introduce a new spinlock that is taken by the *_global_lock_online()
    routine and released by the *_global_unlock_online() routine.
(b) We register a callback for CPU hotplug notifications, and this callback
    takes the same spinlock as above.
(c) We maintain a bitmap which is close to the cpu_online_mask, and once it is
    initialized in the lock_init() code, all future updates to it are done in
    the callback, under the above spinlock.
(d) The above bitmap is used (instead of cpu_online_mask) while locking and
    unlocking the per-cpu locks.

The callback takes the spinlock upon the CPU_UP_PREPARE event. So, if the
br_write_lock-unlock sequence is in progress, the callback keeps spinning,
thus preventing the CPU online operation till the lock-unlock sequence is
complete. This takes care of requirement (3).

The bitmap that we maintain remains unmodified throughout the lock-unlock
sequence, since all updates to it are managed by the callback, which takes
the same spinlock as the one taken by the lock code and released only by the
unlock routine. Combining this with (d) above, satisfies requirement (2).

Overall, since we use a spinlock (mentioned in (a)) to prevent CPU hotplug
operations from racing with br_write_lock-unlock, requirement (1) is also
taken care of.

By the way, it is to be noted that a CPU offline operation can actually run
in parallel with our lock-unlock sequence, because our callback doesn't react
to notifications earlier than CPU_DEAD (in order to maintain our bitmap
properly). And this means, since we use our own bitmap (which is stale, on
purpose) during the lock-unlock sequence, we could end up unlocking the
per-cpu lock of an offline CPU (because we had locked it earlier, when the
CPU was online), in order to satisfy requirement (2). But this is harmless,
though it looks a bit awkward.

Debugged-by: Cong Meng <mc@linux.vnet.ibm.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
2011-12-22 02:02:20 -05:00
Stephane Eranian
c37e17497e perf events: Add PERF_COUNT_HW_REF_CPU_CYCLES generic PMU event
This event counts the number of reference core cpu cycles.
Reference means that the event increments at a constant rate which
is not subject to core CPU frequency adjustments. The event may
not count when the processor is in halted (low power) state.
As such, it may not be equivalent to wall clock time. However,
when the processor is not halted state, the event keeps
a constant correlation with wall clock time.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1323559734-3488-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-21 10:26:37 +01:00
Peter Zijlstra
f07fdec50a lockdep/waitqueues: Add better annotation
-> #2 (&tty->write_wait){-.-...}:

is a lot more informative than:

 -> #2 (key#19){-.....}:

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-8zpopbny51023rdb0qq67eye@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-21 10:07:39 +01:00
Vijay Subramanian
ab56222a32 tcp: Replace constants with #define macros
to record the state of SACK/FACK and DSACK for better readability and maintenance.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-21 01:03:23 -05:00
Linus Torvalds
5fbd305dd2 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time/clocksource: Fix kernel-doc warnings
  rtc: m41t80: Workaround broken alarm functionality
  rtc: Expire alarms after the time is set.
2011-12-20 11:42:38 -08:00
Ingo Molnar
d87f69a16e Merge commit 'v3.2-rc6' into perf/core
Merge reason: Update with the latest fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-20 20:32:11 +01:00