Commit Graph

950437 Commits

Author SHA1 Message Date
Jason Gunthorpe
5dee5872f8 Merge branch 'mlx5_active_speed' into rdma.git for-next
Leon Romanovsky says:

====================
IBTA declares speed as 16 bits, but kernel stores it in u8. This series
fixes in-kernel declaration while keeping external interface intact.
====================

Based on the mlx5-next branch at
     git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.

* branch 'mlx5_active_speed':
  RDMA: Fix link active_speed size
  RDMA/mlx5: Delete duplicated mlx5_ptys_width enum
  net/mlx5: Refactor query port speed functions
2020-09-18 10:31:45 -03:00
Aharon Landau
376ceb31ff RDMA: Fix link active_speed size
According to the IB spec active_speed size should be u16 and not u8 as
before. Changing it to allow further extensions in offered speeds.

Link: https://lore.kernel.org/r/20200917090223.1018224-4-leon@kernel.org
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 10:31:24 -03:00
Leon Romanovsky
c0a6b5ecc5 RDMA: Convert RWQ table logic to ib_core allocation scheme
Move struct ib_rwq_ind_table allocation to ib_core.

Link: https://lore.kernel.org/r/20200902081623.746359-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 14:04:33 -03:00
Leon Romanovsky
d18bb3e152 RDMA: Clean MW allocation and free flows
Move allocation and destruction of memory windows under ib_core
responsibility and clean drivers to ensure that no updates to MW
ib_core structures are done in driver layer.

Link: https://lore.kernel.org/r/20200902081623.746359-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 14:04:32 -03:00
Aharon Landau
e27014bdb4 RDMA/mlx5: Delete duplicated mlx5_ptys_width enum
Combine two same enums to avoid duplication.

Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2020-09-17 19:33:03 +03:00
Aharon Landau
639bf4415c net/mlx5: Refactor query port speed functions
The functions mlx5_query_port_link_width_oper and
mlx5_query_port_ib_proto_oper are always called together, so combine them
to a new function called mlx5_query_port_oper to avoid duplication.

And while the mlx5i_get_port_settings is the same as
mlx5_query_port_oper therefore let's remove it.

According to the IB spec link_width_oper and ib_proto_oper should be u16
and not as written u8, so perform casting as a preparation to cross-RDMA
patch which will fix that type for all drivers in the RDMA subsystem.

Fixes: ada68c31ba ("net/mlx5: Introduce a new header file for physical port functions")
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2020-09-17 19:33:02 +03:00
Jason Gunthorpe
b5de0c60cc RDMA/cma: Fix use after free race in roce multicast join
The roce path triggers a work queue that continues to touch the id_priv
but doesn't hold any reference on it. Futher, unlike in the IB case, the
work queue is not fenced during rdma_destroy_id().

This can trigger a use after free if a destroy is triggered in the
incredibly narrow window after the queue_work and the work starting and
obtaining the handler_mutex.

The only purpose of this work queue is to run the ULP event callback from
the standard context, so switch the design to use the existing
cma_work_handler() scheme. This simplifies quite a lot of the flow:

- Use the cma_work_handler() callback to launch the work for roce. This
  requires generating the event synchronously inside the
  rdma_join_multicast(), which in turn means the dummy struct
  ib_sa_multicast can become a simple stack variable.

- cm_work_handler() used the id_priv kref, so we can entirely eliminate
  the kref inside struct cma_multicast. Since the cma_multicast never
  leaks into an unprotected work queue the kfree can be done at the same
  time as for IB.

- Eliminating the general multicast.ib requires using cma_set_mgid() in a
  few places to recompute the mgid.

Fixes: 3c86aa70bf ("RDMA/cm: Add RDMA CM support for IBoE devices")
Link: https://lore.kernel.org/r/20200902081122.745412-9-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 09:09:25 -03:00
Jason Gunthorpe
3788d2997b RDMA/cma: Consolidate the destruction of a cma_multicast in one place
Two places were open coding this sequence, and also pull in
cma_leave_roce_mc_group() which was called only once.

Link: https://lore.kernel.org/r/20200902081122.745412-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 09:09:24 -03:00
Jason Gunthorpe
1bb5091def RDMA/cma: Remove dead code for kernel rdmacm multicast
There is no kernel user of RDMA CM multicast so this code managing the
multicast subscription of the kernel-only internal QP is dead. Remove it.

This makes the bug fixes in the next patches much simpler.

Link: https://lore.kernel.org/r/20200902081122.745412-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 09:09:24 -03:00
Jason Gunthorpe
7e85bcda8b RDMA/cma: Combine cma_ndev_work with cma_work
These are the same thing, except that cma_ndev_work doesn't have a state
transition. Signal no state transition by setting old_state and new_state
== 0.

In all cases the handler function should not be called once
rdma_destroy_id() has progressed passed setting the state.

Link: https://lore.kernel.org/r/20200902081122.745412-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 09:09:24 -03:00
Jason Gunthorpe
5cfbf9291e RDMA/cma: Remove cma_comp()
The only place that still uses it is rdma_join_multicast() which is only
doing a sanity check that the caller hasn't done something wrong and
doesn't need the spinlock.

At least in the case of rdma_join_multicast() the information it needs
will remain until the ID is destroyed once it enters these
states. Similarly there is no reason to check for these specific states in
the handler callback, instead use the usual check for a destroyed id under
the handler_mutex.

Link: https://lore.kernel.org/r/20200902081122.745412-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 09:09:24 -03:00
Jason Gunthorpe
d490ee52f0 RDMA/cma: Fix locking for the RDMA_CM_LISTEN state
There is a strange unlocked read of the ID state when checking for
reuseaddr. This is because an ID cannot be reusable once it becomes a
listening ID. Instead of using the state to exclude reuse, just clear it
as part of rdma_listen()'s flow to convert reusable into not reusable.

Once a ID goes to listen there is no way back out, and the only use of
reusable is on the bind_list check.

Finally, update the checks under handler_mutex to use READ_ONCE and audit
that once RDMA_CM_LISTEN is observed in a req callback it is stable under
the handler_mutex.

Link: https://lore.kernel.org/r/20200902081122.745412-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 09:09:24 -03:00
Jason Gunthorpe
732d41c545 RDMA/cma: Make the locking for automatic state transition more clear
Re-organize things so the state variable is not read unlocked. The first
attempt to go directly from ADDR_BOUND immediately tells us if the ID is
already bound, if we can't do that then the attempt inside
rdma_bind_addr() to go from IDLE to ADDR_BOUND confirms the ID needs
binding.

Link: https://lore.kernel.org/r/20200902081122.745412-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 09:09:23 -03:00
Jason Gunthorpe
2a7cec5381 RDMA/cma: Fix locking for the RDMA_CM_CONNECT state
It is currently a bit confusing, but the design is if the handler_mutex
is held, and the state is in RDMA_CM_CONNECT, then the state cannot leave
RDMA_CM_CONNECT without also serializing with the handler_mutex.

Make this clearer by adding a direct assertion, fixing the usage in
rdma_connect and generally using READ_ONCE to read the state value.

Link: https://lore.kernel.org/r/20200902081122.745412-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 09:09:23 -03:00
Liu Shixin
3cc30e8dfc RDMA/ipoib: Convert to use DEFINE_SEQ_ATTRIBUTE macro
Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code.

Link: https://lore.kernel.org/r/20200916025022.3992627-1-liushixin2@huawei.com
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-16 13:46:18 -03:00
Parav Pandit
1d7c995820 RDMA/i40iw: Avoid typecast from void to pci_dev
hw_context stores a pci_device pointer. Use the correct type instead of
void * and avoid typecasts.

Link: https://lore.kernel.org/r/20200914123528.270382-1-parav@mellanox.com
Signed-off-by: Parav Pandit <parav@nvidia.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-16 13:40:29 -03:00
Yuval Basson
06e8d1df46 RDMA/qedr: Add support for user mode XRC-SRQ's
Implement the XRC specific verbs.  The additional QP type introduced new
logic to the rest of the verbs that now require distinguishing whether a
QP has an "RQ" or an "SQ" or both.

Link: https://lore.kernel.org/r/20200722102339.30104-1-ybason@marvell.com
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Yuval Basson <ybason@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-16 08:33:31 -03:00
Linus Torvalds
856deb866d Linux 5.9-rc5 2020-09-13 16:06:00 -07:00
Linus Torvalds
5712c3ed54 Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Olof Johansson:
 "A collection of fixes I've been accruing over the last few weeks, none
  of them have been severe enough to warrant flushing the queue but it's
  been long enough now that it's a good idea to send them in.

  A handful of them are fixups for QSPI DT/bindings/compatibles, some
  smaller fixes for system DMA clock control and TMU interrupts on i.MX,
  a handful of fixes for OMAP, including a fix for DSI (display) on
  omap5"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (27 commits)
  arm64: dts: ns2: Fixed QSPI compatible string
  ARM: dts: BCM5301X: Fixed QSPI compatible string
  ARM: dts: NSP: Fixed QSPI compatible string
  ARM: dts: bcm: HR2: Fixed QSPI compatible string
  dt-bindings: spi: Fix spi-bcm-qspi compatible ordering
  ARM: dts: imx6sx: fix the pad QSPI1B_SCLK mux mode for uart3
  arm64: dts: imx8mp: correct sdma1 clk setting
  arm64: dts: imx8mq: Fix TMU interrupt property
  ARM: dts: imx7d-zii-rmu2: fix rgmii phy-mode for ksz9031 phy
  ARM: dts: vfxxx: Add syscon compatible with OCOTP
  ARM: dts: imx6q-logicpd: Fix broken PWM
  arm64: dts: imx: Add missing imx8mm-beacon-kit.dtb to build
  ARM: dts: imx6q-prtwd2: Remove unneeded i2c unit name
  ARM: dts: imx6qdl-gw51xx: Remove unneeded #address-cells/#size-cells
  ARM: dts: imx7ulp: Correct gpio ranges
  ARM: dts: ls1021a: fix QuadSPI-memory reg range
  arm64: defconfig: Enable ptn5150 extcon driver
  arm64: defconfig: Enable USB gadget with configfs
  ARM: configs: Update Integrator defconfig
  ARM: dts: omap5: Fix DSI base address and clocks
  ...
2020-09-13 14:54:40 -07:00
Linus Torvalds
e4c26faa42 Merge tag 'usb-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB/Thunderbolt fixes from Greg KH:
 "Here are some small USB and Thunderbolt driver fixes for 5.9-rc5.

  Nothing huge, just a number of bugfixes and new device ids for
  problems reported:

   - new USB serial driver ids

   - bug fixes for syzbot reported problems

   - typec driver fixes

   - thunderbolt driver fixes

   - revert of reported broken commit

  All of these have been in linux-next with no reported issues"

* tag 'usb-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: typec: intel_pmc_mux: Do not configure SBU and HSL Orientation in Alternate modes
  usb: typec: intel_pmc_mux: Do not configure Altmode HPD High
  usb: core: fix slab-out-of-bounds Read in read_descriptors
  Revert "usb: dwc3: meson-g12a: fix shared reset control use"
  usb: typec: ucsi: acpi: Check the _DEP dependencies
  usb: typec: intel_pmc_mux: Un-register the USB role switch
  usb: Fix out of sync data toggle if a configured device is reconfigured
  USB: serial: option: support dynamic Quectel USB compositions
  USB: serial: option: add support for SIM7070/SIM7080/SIM7090 modules
  thunderbolt: Use maximum USB3 link rate when reclaiming if link is not up
  thunderbolt: Disable ports that are not implemented
  USB: serial: ftdi_sio: add IDs for Xsens Mti USB converter
2020-09-13 09:23:54 -07:00
Linus Torvalds
6c7247f625 Merge tag 'staging-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO driver fixes from Greg KH:
 "Here are a number of staging and IIO driver fixes for 5.9-rc5.

  The majority of these are IIO driver fixes, to resolve a timestamp
  issue that was recently found to affect a bunch of IIO drivers.

  The other fixes in here are:

   - small IIO driver fixes

   - greybus driver fix

   - counter driver fix (came in through the IIO fixes tree)

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'staging-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (23 commits)
  iio: adc: mcp3422: fix locking on error path
  iio: adc: mcp3422: fix locking scope
  iio: adc: meson-saradc: Use the parent device to look up the calib data
  iio:adc:max1118 Fix alignment of timestamp and data leak issues
  iio:adc:ina2xx Fix timestamp alignment issue.
  iio:adc:ti-adc084s021 Fix alignment and data leak issues.
  iio:adc:ti-adc081c Fix alignment and data leak issues
  iio:magnetometer:ak8975 Fix alignment and data leak issues.
  iio:light:ltr501 Fix timestamp alignment issue.
  iio:light:max44000 Fix timestamp alignment and prevent data leak.
  iio:chemical:ccs811: Fix timestamp alignment and prevent data leak.
  iio:proximity:mb1232: Fix timestamp alignment and prevent data leak.
  iio:accel:mma7455: Fix timestamp alignment and prevent data leak.
  iio:accel:bmc150-accel: Fix timestamp alignment and prevent data leak.
  iio:accel:mma8452: Fix timestamp alignment and prevent data leak.
  iio: accel: kxsd9: Fix alignment of local buffer.
  iio: adc: rockchip_saradc: select IIO_TRIGGERED_BUFFER
  iio: adc: ti-ads1015: fix conversion when CONFIG_PM is not set
  counter: microchip-tcb-capture: check the correct variable
  iio: cros_ec: Set Gyroscope default frequency to 25Hz
  ...
2020-09-13 09:15:20 -07:00
Linus Torvalds
20a7b6be05 Merge tag 'driver-core-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
 "Here are some small driver core and debugfs fixes for 5.9-rc5

  Included in here are:

   - firmware loader memory leak fix

   - firmware loader testing fixes for non-EFI systems

   - device link locking fixes found by lockdep

   - kobject_del() bugfix that has been affecting some callers

   - debugfs minor fix

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  test_firmware: Test platform fw loading on non-EFI systems
  PM: <linux/device.h>: fix @em_pd kernel-doc warning
  kobject: Drop unneeded conditional in __kobject_del()
  driver core: Fix device_pm_lock() locking for device links
  MAINTAINERS: Add the security document to SECURITY CONTACT
  driver code: print symbolic error code
  debugfs: Fix module state check condition
  kobject: Restore old behaviour of kobject_del(NULL)
  firmware_loader: fix memory leak for paged buffer
2020-09-13 09:02:59 -07:00
Olof Johansson
a4da411e41 Merge tag 'arm-soc/for-5.9/devicetree-fixes' of https://github.com/Broadcom/stblinux into arm/fixes
This pull request contains Broadcom ARM-based SoCs Device Tree fixes for
5.9, please pull the following:

- Florian fixes the Broadcom QSPI controller binding such that the most
  specific compatible string is the left most one, and all existing
  in-tree users are updated as well.

* tag 'arm-soc/for-5.9/devicetree-fixes' of https://github.com/Broadcom/stblinux:
  arm64: dts: ns2: Fixed QSPI compatible string
  ARM: dts: BCM5301X: Fixed QSPI compatible string
  ARM: dts: NSP: Fixed QSPI compatible string
  ARM: dts: bcm: HR2: Fixed QSPI compatible string
  dt-bindings: spi: Fix spi-bcm-qspi compatible ordering

Link: https://lore.kernel.org/r/20200909211857.4144718-1-f.fainelli@gmail.com
Signed-off-by: Olof Johansson <olof@lixom.net>
2020-09-13 08:57:38 -07:00
Olof Johansson
2aedcb042f Merge tag 'imx-fixes-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes
i.MX fixes for 5.9, round 2:

- Fix the misspelling of 'interrupts' property in i.MX8MQ TMU DT node.
- Correct 'ahb' clock for i.MX8MP SDMA1 in device tree.
- Fix pad QSPI1B_SCLK mux mode for UART3 on i.MX6SX.

* tag 'imx-fixes-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
  ARM: dts: imx6sx: fix the pad QSPI1B_SCLK mux mode for uart3
  arm64: dts: imx8mp: correct sdma1 clk setting
  arm64: dts: imx8mq: Fix TMU interrupt property

Link: https://lore.kernel.org/r/20200909143844.GA25109@dragon
Signed-off-by: Olof Johansson <olof@lixom.net>
2020-09-13 08:56:04 -07:00
Olof Johansson
0e384029e1 Merge tag 'omap-for-v5.9/fixes-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes
Fixes for omaps for v5.9-rc cycle

Few fixes for omap based devices:

- Fix of_clk_get() error handling for omap-iommu

- Fix missing audio pinctrl entries for logicpd boards

- Fix video for logicpd-som-lv after switch to generic panels

- Fix omap5 DSI clocks base

* tag 'omap-for-v5.9/fixes-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: dts: omap5: Fix DSI base address and clocks
  ARM: dts: logicpd-som-lv-baseboard: Fix missing video
  ARM: dts: logicpd-som-lv-baseboard: Fix broken audio
  ARM: dts: logicpd-torpedo-baseboard: Fix broken audio
  ARM: OMAP2+: Fix an IS_ERR() vs NULL check in _get_pwrdm()

Link: https://lore.kernel.org/r/pull-1599132064-54898@atomide.com
Signed-off-by: Olof Johansson <olof@lixom.net>
2020-09-13 08:54:02 -07:00
Linus Torvalds
2a1a4bee5e Merge tag 'char-misc-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc driver fixes from Greg KH:
 "Here are a number of small driver fixes for 5.9-rc5

  Included in here are:

   - habanalabs driver fixes

   - interconnect driver fixes

   - soundwire driver fixes

   - dyndbg fixes for reported issues, and then reverts to fix it all up
     to a sane state.

   - phy driver fixes

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  Revert "dyndbg: accept query terms like file=bar and module=foo"
  Revert "dyndbg: fix problem parsing format="foo bar""
  scripts/tags.sh: exclude tools directory from tags generation
  video: fbdev: fix OOB read in vga_8planes_imageblit()
  dyndbg: fix problem parsing format="foo bar"
  dyndbg: refine export, rename to dynamic_debug_exec_queries()
  dyndbg: give %3u width in pr-format, cosmetic only
  interconnect: qcom: Fix small BW votes being truncated to zero
  soundwire: fix double free of dangling pointer
  interconnect: Show bandwidth for disabled paths as zero in debugfs
  habanalabs: fix report of RAZWI initiator coordinates
  habanalabs: prevent user buff overflow
  phy: omap-usb2-phy: disable PHY charger detect
  phy: qcom-qmp: Use correct values for ipq8074 PCIe Gen2 PHY init
  soundwire: bus: fix typo in comment on INTSTAT registers
  phy: qualcomm: fix return value check in qcom_ipq806x_usb_phy_probe()
  phy: qualcomm: fix platform_no_drv_owner.cocci warnings
2020-09-13 08:52:21 -07:00
Linus Torvalds
84b1349972 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "A bit on the bigger side, mostly due to me being on vacation, then
  busy, then on parental leave, but there's nothing worrisome.

  ARM:
   - Multiple stolen time fixes, with a new capability to match x86
   - Fix for hugetlbfs mappings when PUD and PMD are the same level
   - Fix for hugetlbfs mappings when PTE mappings are enforced (dirty
     logging, for example)
   - Fix tracing output of 64bit values

  x86:
   - nSVM state restore fixes
   - Async page fault fixes
   - Lots of small fixes everywhere"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
  KVM: emulator: more strict rsm checks.
  KVM: nSVM: more strict SMM checks when returning to nested guest
  SVM: nSVM: setup nested msr permission bitmap on nested state load
  SVM: nSVM: correctly restore GIF on vmexit from nesting after migration
  x86/kvm: don't forget to ACK async PF IRQ
  x86/kvm: properly use DEFINE_IDTENTRY_SYSVEC() macro
  KVM: VMX: Don't freeze guest when event delivery causes an APIC-access exit
  KVM: SVM: avoid emulation with stale next_rip
  KVM: x86: always allow writing '0' to MSR_KVM_ASYNC_PF_EN
  KVM: SVM: Periodically schedule when unregistering regions on destroy
  KVM: MIPS: Change the definition of kvm type
  kvm x86/mmu: use KVM_REQ_MMU_SYNC to sync when needed
  KVM: nVMX: Fix the update value of nested load IA32_PERF_GLOBAL_CTRL control
  KVM: fix memory leak in kvm_io_bus_unregister_dev()
  KVM: Check the allocation of pv cpu mask
  KVM: nVMX: Update VMCS02 when L2 PAE PDPTE updates detected
  KVM: arm64: Update page shift if stage 2 block mapping not supported
  KVM: arm64: Fix address truncation in traces
  KVM: arm64: Do not try to map PUDs when they are folded into PMD
  arm64/x86: KVM: Introduce steal-time cap
  ...
2020-09-13 08:34:47 -07:00
Linus Torvalds
b952e97430 Merge tag 'for-linus' of git://github.com/openrisc/linux
Pull OpenRISC fixes from Stafford Horne:
 "Fixes for compile issues pointed out by kbuild and one bug I found in
  initrd with the 5.9 patches"

* tag 'for-linus' of git://github.com/openrisc/linux:
  openrisc: Fix issue with get_user for 64-bit values
  openrisc: Fix cache API compile issue when not inlining
  openrisc: Reserve memblock for initrd
2020-09-12 13:03:49 -07:00
Linus Torvalds
ef2e9a563b Merge tag 'seccomp-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp fixes from Kees Cook:
 "This fixes a rare race condition in seccomp when using TSYNC and
  USER_NOTIF together where a memory allocation would not get freed
  (found by syzkaller, fixed by Tycho).

  Additionally updates Tycho's MAINTAINERS and .mailmap entries for his
  new address"

* tag 'seccomp-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  seccomp: don't leave dangling ->notif if file allocation fails
  mailmap, MAINTAINERS: move to tycho.pizza
  seccomp: don't leak memory when filter install races
2020-09-12 12:58:01 -07:00
Linus Torvalds
4f8b0a5b3f Merge tag 'libnvdimm-fix-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fix from Vishal Verma:
 "Fix detection of dax support for block devices.

  Previous fixes in this area, which only affected printing of debug
  messages, had an incorrect condition for detection of dax. This fix
  should finally do the right thing"

* tag 'libnvdimm-fix-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dax: fix detection of dax support for non-persistent memory block devices
2020-09-12 12:43:58 -07:00
Linus Torvalds
edf6b0e1e4 Merge tag 'for-5.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
 "A few more fixes:

   - regression fix for a crash after failed snapshot creation

   - one more lockep fix: use nofs allocation when allocating missing
     device

   - fix reloc tree leak on degraded mount

   - make some extent buffer alignment checks less strict to mount
     filesystems created by btrfs-convert"

* tag 'for-5.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix NULL pointer dereference after failure to create snapshot
  btrfs: free data reloc tree on failed mount
  btrfs: require only sector size alignment for parent eb bytenr
  btrfs: fix lockdep splat in add_missing_dev
2020-09-12 12:28:39 -07:00
Linus Torvalds
5a3c558a9f Merge tag '5.9-rc4-smb3-fix' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fix from Steve French:
 "A fix for lookup on DFS link when cifsacl or modefromsid is used"

* tag '5.9-rc4-smb3-fix' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix DFS mount with cifsacl/modefromsid
2020-09-12 11:48:04 -07:00
Maxim Levitsky
37f66bbef0 KVM: emulator: more strict rsm checks.
Don't ignore return values in rsm_load_state_64/32 to avoid
loading invalid state from SMM state area if it was tampered with
by the guest.

This is primarly intended to avoid letting guest set bits in EFER
(like EFER.SVME when nesting is disabled) by manipulating SMM save area.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200827171145.374620-8-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-12 12:22:55 -04:00
Maxim Levitsky
3ebb5d2617 KVM: nSVM: more strict SMM checks when returning to nested guest
* check that guest is 64 bit guest, otherwise the SVM related fields
  in the smm state area are not defined

* If the SMM area indicates that SMM interrupted a running guest,
  check that EFER.SVME which is also saved in this area is set, otherwise
  the guest might have tampered with SMM save area, and so indicate
  emulation failure which should triple fault the guest.

* Check that that guest CPUID supports SVM (due to the same issue as above)

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200827162720.278690-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-12 12:21:43 -04:00
Maxim Levitsky
772b81bb2f SVM: nSVM: setup nested msr permission bitmap on nested state load
This code was missing and was forcing the L2 run with L1's msr
permission bitmap

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200827162720.278690-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-12 12:20:53 -04:00
Maxim Levitsky
9883764ad0 SVM: nSVM: correctly restore GIF on vmexit from nesting after migration
Currently code in svm_set_nested_state copies the current vmcb control
area to L1 control area (hsave->control), under assumption that
it mostly reflects the defaults that kvm choose, and later qemu
overrides  these defaults with L2 state using standard KVM interfaces,
like KVM_SET_REGS.

However nested GIF (which is AMD specific thing) is by default is true,
and it is copied to hsave area as such.

This alone is not a big deal since on VMexit, GIF is always set to false,
regardless of what it was on VM entry.  However in nested_svm_vmexit we
were first were setting GIF to false, but then we overwrite the control
fields with value from the hsave area.  (including the nested GIF field
itself if GIF virtualization is enabled).

Now on normal vm entry this is not a problem, since GIF is usually false
prior to normal vm entry, and this is the value that copied to hsave,
and then restored, but this is not always the case when the nested state
is loaded as explained above.

To fix this issue, move svm_set_gif after we restore the L1 control
state in nested_svm_vmexit, so that even with wrong GIF in the
saved L1 control area, we still clear GIF as the spec says.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200827162720.278690-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-12 12:19:06 -04:00
Stafford Horne
d877322bc1 openrisc: Fix issue with get_user for 64-bit values
A build failure was raised by kbuild with the following error.

    drivers/android/binder.c: Assembler messages:
    drivers/android/binder.c:3861: Error: unrecognized keyword/register name `l.lwz ?ap,4(r24)'
    drivers/android/binder.c:3866: Error: unrecognized keyword/register name `l.addi ?ap,r0,0'

The issue is with 64-bit get_user() calls on openrisc.  I traced this to
a problem where in the internally in the get_user macros there is a cast
to long __gu_val this causes GCC to think the get_user call is 32-bit.
This binder code is really long and GCC allocates register r30, which
triggers the issue. The 64-bit get_user asm tries to get the 64-bit pair
register, which for r30 overflows the general register names and returns
the dummy register ?ap.

The fix here is to move the temporary variables into the asm macros.  We
use a 32-bit __gu_tmp for 32-bit and smaller macro and a 64-bit tmp in
the 64-bit macro.  The cast in the 64-bit macro has a trick of casting
through __typeof__((x)-(x)) which avoids the below warning.  This was
barrowed from riscv.

    arch/openrisc/include/asm/uaccess.h:240:8: warning: cast to pointer from integer of different size

I tested this in a small unit test to check reading between 64-bit and
32-bit pointers to 64-bit and 32-bit values in all combinations.  Also I
ran make C=1 to confirm no new sparse warnings came up.  It all looks
clean to me.

Link: https://lore.kernel.org/lkml/202008200453.ohnhqkjQ%25lkp@intel.com/
Signed-off-by: Stafford Horne <shorne@gmail.com>
Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2020-09-12 17:26:00 +09:00
Vitaly Kuznetsov
cc17b22559 x86/kvm: don't forget to ACK async PF IRQ
Merge commit 26d05b368a ("Merge branch 'kvm-async-pf-int' into HEAD")
tried to adapt the new interrupt based async PF mechanism to the newly
introduced IDTENTRY magic but unfortunately it missed the fact that
DEFINE_IDTENTRY_SYSVEC() doesn't call ack_APIC_irq() on its own and
all DEFINE_IDTENTRY_SYSVEC() users have to call it manually.

As the result all multi-CPU KVM guest hang on boot when
KVM_FEATURE_ASYNC_PF_INT is present. The breakage went unnoticed because no
KVM userspace (e.g. QEMU) currently set it (and thus async PF mechanism
is currently disabled) but we're about to change that.

Fixes: 26d05b368a ("Merge branch 'kvm-async-pf-int' into HEAD")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200908135350.355053-3-vkuznets@redhat.com>
Tested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-12 02:22:21 -04:00
Vitaly Kuznetsov
244081f907 x86/kvm: properly use DEFINE_IDTENTRY_SYSVEC() macro
DEFINE_IDTENTRY_SYSVEC() already contains irqentry_enter()/
irqentry_exit().

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200908135350.355053-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-12 02:22:07 -04:00
Wanpeng Li
99b82a1437 KVM: VMX: Don't freeze guest when event delivery causes an APIC-access exit
According to SDM 27.2.4, Event delivery causes an APIC-access VM exit.
Don't report internal error and freeze guest when event delivery causes
an APIC-access exit, it is handleable and the event will be re-injected
during the next vmentry.

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1597827327-25055-2-git-send-email-wanpengli@tencent.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-12 02:21:17 -04:00
Wanpeng Li
e42c68281b KVM: SVM: avoid emulation with stale next_rip
svm->next_rip is reset in svm_vcpu_run() only after calling
svm_exit_handlers_fastpath(), which will cause SVM's
skip_emulated_instruction() to write a stale RIP.

We can move svm_exit_handlers_fastpath towards the end of
svm_vcpu_run().  To align VMX with SVM, keep svm_complete_interrupts()
close as well.

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Paul K. <kronenpj@kronenpj.dyndns.org>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
[Also move vmcb_mark_all_clean before any possible write to the VMCB.
 - Paolo]
2020-09-12 02:19:23 -04:00
Linus Torvalds
729e3d0919 Merge tag 'ceph-for-5.9-rc5' of git://github.com/ceph/ceph-client
Pull ceph fix from Ilya Dryomov:
 "Add missing capability checks in rbd, marked for stable"

* tag 'ceph-for-5.9-rc5' of git://github.com/ceph/ceph-client:
  rbd: require global CAP_SYS_ADMIN for mapping and unmapping
2020-09-11 13:47:29 -07:00
Linus Torvalds
e9287bd248 Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
 "Usual driver bugfixes for the I2C subsystem"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: algo: pca: Reapply i2c bus settings after reset
  i2c: npcm7xx: Fix timeout calculation
  misc: eeprom: at24: register nvmem only after eeprom is ready to use
2020-09-11 13:43:05 -07:00
Linus Torvalds
566e24eeb4 Merge tag 'pm-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
 "These fix three pieces of documentation and add new CPU IDs to the
  Intel RAPL power capping driver.

  Specifics:

   - Add CPU IDs of the TigerLake Desktop, RocketLake and AlderLake
     chips to the Intel RAPL power capping driver (Zhang Rui).

   - Add the missing energy model performance domain item to the struct
     device kerneldoc comment (Randy Dunlap).

   - Fix the struct powercap_control_type kerneldoc comment to match the
     actual definition of that structure and add missing item to the
     struct powercap_zone_ops kerneldoc comment (Amit Kucheria)"

* tag 'pm-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  powercap: make documentation reflect code
  PM: <linux/device.h>: fix @em_pd kernel-doc warning
  powercap/intel_rapl: add support for AlderLake
  powercap/intel_rapl: add support for RocketLake
  powercap/intel_rapl: add support for TigerLake Desktop
2020-09-11 11:59:14 -07:00
Linus Torvalds
7b8731d958 Merge tag 'block-5.9-2020-09-11' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:

 - Fix a regression in bdev partition locking (Christoph)

 - NVMe pull request from Christoph:
      - cancel async events before freeing them (David Milburn)
      - revert a broken race fix (James Smart)
      - fix command processing during resets (Sagi Grimberg)

 - Fix a kyber crash with requeued flushes (Omar)

 - Fix __bio_try_merge_page() same_page error for no merging (Ritesh)

* tag 'block-5.9-2020-09-11' of git://git.kernel.dk/linux-block:
  block: Set same_page to false in __bio_try_merge_page if ret is false
  nvme-fabrics: allow to queue requests for live queues
  block: only call sched requeue_request() for scheduled requests
  nvme-tcp: cancel async events before freeing event struct
  nvme-rdma: cancel async events before freeing event struct
  nvme-fc: cancel async events before freeing event struct
  nvme: Revert: Fix controller creation races with teardown flow
  block: restore a specific error code in bdev_del_partition
2020-09-11 11:55:28 -07:00
Linus Torvalds
e8878ab825 Merge tag 'spi-fix-v5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
 "There's some driver specific fixes here plus one core fix for memory
  leaks that could be triggered by a potential race condition when
  cleaning up after we have split transfers to fit into what the
  controller can support"

* tag 'spi-fix-v5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: stm32: fix pm_runtime_get_sync() error checking
  spi: Fix memory leak on splited transfers
  spi: spi-cadence-quadspi: Fix mapping of buffers for DMA reads
  spi: stm32: Rate-limit the 'Communication suspended' message
  spi: spi-loopback-test: Fix out-of-bounds read
  spi: spi-cadence-quadspi: Populate get_name() interface
  MAINTAINERS: add myself as maintainer for spi-fsl-dspi driver
2020-09-11 11:35:55 -07:00
Linus Torvalds
8b6ce25177 Merge tag 'regulator-fix-v5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
 "The biggest set of fixes here is those from Michał Mirosław fixing
  some locking issues with coupled regulators that are triggered in
  cases where a coupled regulator is used by a device involved in
  fs_reclaim like eMMC storage.

  These are relatively serious for the affected systems, though the
  circumstances where they trigger are very rare"

* tag 'regulator-fix-v5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: pwm: Fix machine constraints application
  regulator: core: Fix slab-out-of-bounds in regulator_unlock_recursive()
  regulator: remove superfluous lock in regulator_resolve_coupling()
  regulator: cleanup regulator_ena_gpio_free()
  regulator: plug of_node leak in regulator_register()'s error path
  regulator: push allocation in set_consumer_device_supply() out of lock
  regulator: push allocations in create_regulator() outside of lock
  regulator: push allocation in regulator_ena_gpio_request() out of lock
  regulator: push allocation in regulator_init_coupling() outside of lock
  regulator: fix spelling mistake "Cant" -> "Can't"
  regulator: cros-ec-regulator: Add NULL test for devm_kmemdup call
2020-09-11 11:25:55 -07:00
Vitaly Kuznetsov
d831de1772 KVM: x86: always allow writing '0' to MSR_KVM_ASYNC_PF_EN
Even without in-kernel LAPIC we should allow writing '0' to
MSR_KVM_ASYNC_PF_EN as we're not enabling the mechanism. In
particular, QEMU with 'kernel-irqchip=off' fails to start
a guest with

qemu-system-x86_64: error: failed to set MSR 0x4b564d02 to 0x0

Fixes: 9d3c447c72 ("KVM: X86: Fix async pf caused null-ptr-deref")
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200911093147.484565-1-vkuznets@redhat.com>
[Actually commit the version proposed by Sean Christopherson. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-11 13:26:47 -04:00
David Rientjes
7be74942f1 KVM: SVM: Periodically schedule when unregistering regions on destroy
There may be many encrypted regions that need to be unregistered when a
SEV VM is destroyed.  This can lead to soft lockups.  For example, on a
host running 4.15:

watchdog: BUG: soft lockup - CPU#206 stuck for 11s! [t_virtual_machi:194348]
CPU: 206 PID: 194348 Comm: t_virtual_machi
RIP: 0010:free_unref_page_list+0x105/0x170
...
Call Trace:
 [<0>] release_pages+0x159/0x3d0
 [<0>] sev_unpin_memory+0x2c/0x50 [kvm_amd]
 [<0>] __unregister_enc_region_locked+0x2f/0x70 [kvm_amd]
 [<0>] svm_vm_destroy+0xa9/0x200 [kvm_amd]
 [<0>] kvm_arch_destroy_vm+0x47/0x200
 [<0>] kvm_put_kvm+0x1a8/0x2f0
 [<0>] kvm_vm_release+0x25/0x30
 [<0>] do_exit+0x335/0xc10
 [<0>] do_group_exit+0x3f/0xa0
 [<0>] get_signal+0x1bc/0x670
 [<0>] do_signal+0x31/0x130

Although the CLFLUSH is no longer issued on every encrypted region to be
unregistered, there are no other changes that can prevent soft lockups for
very large SEV VMs in the latest kernel.

Periodically schedule if necessary.  This still holds kvm->lock across the
resched, but since this only happens when the VM is destroyed this is
assumed to be acceptable.

Signed-off-by: David Rientjes <rientjes@google.com>
Message-Id: <alpine.DEB.2.23.453.2008251255240.2987727@chino.kir.corp.google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-11 13:24:15 -04:00
Huacai Chen
15e9e35cd1 KVM: MIPS: Change the definition of kvm type
MIPS defines two kvm types:

 #define KVM_VM_MIPS_TE          0
 #define KVM_VM_MIPS_VZ          1

In Documentation/virt/kvm/api.rst it is said that "You probably want to
use 0 as machine type", which implies that type 0 be the "automatic" or
"default" type. And, in user-space libvirt use the null-machine (with
type 0) to detect the kvm capability, which returns "KVM not supported"
on a VZ platform.

I try to fix it in QEMU but it is ugly:
https://lists.nongnu.org/archive/html/qemu-devel/2020-08/msg05629.html

And Thomas Huth suggests me to change the definition of kvm type:
https://lists.nongnu.org/archive/html/qemu-devel/2020-09/msg03281.html

So I define like this:

 #define KVM_VM_MIPS_AUTO        0
 #define KVM_VM_MIPS_VZ          1
 #define KVM_VM_MIPS_TE          2

Since VZ and TE cannot co-exists, using type 0 on a TE platform will
still return success (so old user-space tools have no problems on new
kernels); the advantage is that using type 0 on a VZ platform will not
return failure. So, the only problem is "new user-space tools use type
2 on old kernels", but if we treat this as a kernel bug, we can backport
this patch to old stable kernels.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
Message-Id: <1599734031-28746-1-git-send-email-chenhc@lemote.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-11 13:22:52 -04:00