Commit Graph

1052 Commits

Author SHA1 Message Date
Linus Torvalds
f2586d921c Merge tag 'tpmdd-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm updates from Jarkko Sakkinen:

 - Restrict linking of keys to .ima and .evm keyrings based on
   digitalSignature attribute in the certificate

 - PowerVM: load machine owner keys into the .machine [1] keyring

 - PowerVM: load module signing keys into the secondary trusted keyring
   (keys blessed by the vendor)

 - tpm_tis_spi: half-duplex transfer mode

 - tpm_tis: retry corrupted transfers

 - Apply revocation list (.mokx) to an all system keyrings (e.g.
   .machine keyring)

Link: https://blogs.oracle.com/linux/post/the-machine-keyring [1]

* tag 'tpmdd-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  certs: Reference revocation list for all keyrings
  tpm/tpm_tis_synquacer: Use module_platform_driver macro to simplify the code
  tpm: remove redundant variable len
  tpm_tis: Resend command to recover from data transfer errors
  tpm_tis: Use responseRetry to recover from data transfer errors
  tpm_tis: Move CRC check to generic send routine
  tpm_tis_spi: Add hardware wait polling
  KEYS: Replace all non-returning strlcpy with strscpy
  integrity: PowerVM support for loading third party code signing keys
  integrity: PowerVM machine keyring enablement
  integrity: check whether imputed trust is enabled
  integrity: remove global variable from machine_keyring.c
  integrity: ignore keys failing CA restrictions on non-UEFI platform
  integrity: PowerVM support for loading CA keys on machine keyring
  integrity: Enforce digitalSignature usage in the ima and evm keyrings
  KEYS: DigitalSignature link restriction
  tpm_tis: Revert "tpm_tis: Disable interrupts on ThinkPad T490s"
2023-08-29 08:05:18 -07:00
Linus Torvalds
330235e874 Merge tag 'acpi-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
 "These include new ACPICA material, a rework of the ACPI thermal
  driver, a switch-over of the ACPI processor driver to using _OSC
  instead of (long deprecated) _PDC for CPU initialization, a rework of
  firmware notifications handling in several drivers, fixes and cleanups
  for suspend-to-idle handling on AMD systems, ACPI backlight driver
  updates and more.

  Specifics:

   - Update the ACPICA code in the kernel to upstream revision 20230628
     including the following changes:
      - Suppress a GCC 12 dangling-pointer warning (Philip Prindeville)
      - Reformat the ACPI_STATE_COMMON macro and its users (George Guo)
      - Replace the ternary operator with ACPI_MIN() (Jiangshan Yi)
      - Add support for _DSC as per ACPI 6.5 (Saket Dumbre)
      - Remove a duplicate macro from zephyr header (Najumon B.A)
      - Add data structures for GED and _EVT tracking (Jose Marinho)
      - Fix misspelled CDAT DSMAS define (Dave Jiang)
      - Simplify an error message in acpi_ds_result_push() (Christophe
        Jaillet)
      - Add a struct size macro related to SRAT (Dave Jiang)
      - Add AML_NO_OPERAND_RESOLVE flag to Timer (Abhishek Mainkar)
      - Add support for RISC-V external interrupt controllers in MADT
        (Sunil V L)
      - Add RHCT flags, CMO and MMU nodes (Sunil V L)
      - Change ACPICA version to 20230628 (Bob Moore)

   - Introduce new wrappers for ACPICA notify handler install/remove and
     convert multiple drivers to using their own Notify() handlers
     instead of the ACPI bus type .notify() slated for removal (Michal
     Wilczynski)

   - Add backlight=native DMI quirk for Apple iMac12,1 and iMac12,2
     (Hans de Goede)

   - Put ACPI video and its child devices explicitly into D0 on boot to
     avoid platform firmware confusion (Kai-Heng Feng)

   - Add backlight=native DMI quirk for Lenovo Ideapad Z470 (Jiri Slaby)

   - Support obtaining physical CPU ID from MADT on LoongArch (Bibo Mao)

   - Convert ACPI CPU initialization to using _OSC instead of _PDC that
     has been depreceted since 2018 and dropped from the specification
     in ACPI 6.5 (Michal Wilczynski, Rafael Wysocki)

   - Drop non-functional nocrt parameter from ACPI thermal (Mario
     Limonciello)

   - Clean up the ACPI thermal driver, rework the handling of firmware
     notifications in it and make it provide a table of generic trip
     point structures to the core during initialization (Rafael Wysocki)

   - Defer enumeration of devices with _DEP pointing to IVSC (Wentong
     Wu)

   - Install SystemCMOS address space handler for ACPI000E (TAD) to meet
     platform firmware expectations on some platforms (Zhang Rui)

   - Fix finding the generic error data in the ACPi extlog driver for
     compatibility with old and new firmware interface versions
     (Xiaochun Lee)

   - Remove assorted unused declarations of functions (Yue Haibing)

   - Move AMBA bus scan handling into arm64 specific directory (Sudeep
     Holla)

   - Fix and clean up suspend-to-idle interface for AMD systems (Mario
     Limonciello, Andy Shevchenko)

   - Fix string truncation warning in pnpacpi_add_device() (Sunil V L)"

* tag 'acpi-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (66 commits)
  ACPI: x86: s2idle: Add a function to get LPS0 constraint for a device
  ACPI: x86: s2idle: Add for_each_lpi_constraint() helper
  ACPI: x86: s2idle: Add more debugging for AMD constraints parsing
  ACPI: x86: s2idle: Fix a logic error parsing AMD constraints table
  ACPI: x86: s2idle: Catch multiple ACPI_TYPE_PACKAGE objects
  ACPI: x86: s2idle: Post-increment variables when getting constraints
  ACPI: Adjust #ifdef for *_lps0_dev use
  ACPI: TAD: Install SystemCMOS address space handler for ACPI000E
  ACPI: Remove assorted unused declarations of functions
  ACPI: extlog: Fix finding the generic error data for v3 structure
  PNP: ACPI: Fix string truncation warning
  ACPI: Remove unused extern declaration acpi_paddr_to_node()
  ACPI: video: Add backlight=native DMI quirk for Apple iMac12,1 and iMac12,2
  ACPI: video: Put ACPI video and its child devices into D0 on boot
  ACPI: processor: LoongArch: Get physical ID from MADT
  ACPI: scan: Defer enumeration of devices with a _DEP pointing to IVSC device
  ACPI: thermal: Eliminate code duplication from acpi_thermal_notify()
  ACPI: thermal: Drop unnecessary thermal zone callbacks
  ACPI: thermal: Rework thermal_get_trend()
  ACPI: thermal: Use trip point table to register thermal zones
  ...
2023-08-28 17:58:39 -07:00
Linus Torvalds
e5b7ca09e9 Merge tag 's390-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:

 - Add vfio-ap support to pass-through crypto devices to secure
   execution guests

 - Add API ordinal 6 support to zcrypt_ep11misc device drive, which is
   required to handle key generate and key derive (e.g. secure key to
   protected key) correctly

 - Add missing secure/has_secure sysfs files for the case where it is
   not possible to figure where a system has been booted from. Existing
   user space relies on that these files are always present

 - Fix DCSS block device driver list corruption, caused by incorrect
   error handling

 - Convert virt_to_pfn() and pfn_to_virt() from defines to static inline
   functions to enforce type checking

 - Cleanups, improvements, and minor fixes to the kernel mapping setup

 - Fix various virtual vs physical address confusions

 - Move pfault code to separate file, since it has nothing to do with
   regular fault handling

 - Move s390 documentation to Documentation/arch/ like it has been done
   for other architectures already

 - Add HAVE_FUNCTION_GRAPH_RETVAL support

 - Factor out the s390_hypfs filesystem and add a new config option for
   it. The filesystem is deprecated and as soon as all users are gone it
   can be removed some time in the not so near future

 - Remove support for old CEX2 and CEX3 crypto cards from zcrypt device
   driver

 - Add support for user-defined certificates: receive user-defined
   certificates with a diagnose call and provide them via 'cert_store'
   keyring to user space

 - Couple of other small fixes and improvements all over the place

* tag 's390-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (66 commits)
  s390/pci: use builtin_misc_device macro to simplify the code
  s390/vfio-ap: make sure nib is shared
  KVM: s390: export kvm_s390_pv*_is_protected functions
  s390/uv: export uv_pin_shared for direct usage
  s390/vfio-ap: check for TAPQ response codes 0x35 and 0x36
  s390/vfio-ap: handle queue state change in progress on reset
  s390/vfio-ap: use work struct to verify queue reset
  s390/vfio-ap: store entire AP queue status word with the queue object
  s390/vfio-ap: remove upper limit on wait for queue reset to complete
  s390/vfio-ap: allow deconfigured queue to be passed through to a guest
  s390/vfio-ap: wait for response code 05 to clear on queue reset
  s390/vfio-ap: clean up irq resources if possible
  s390/vfio-ap: no need to check the 'E' and 'I' bits in APQSW after TAPQ
  s390/ipl: refactor deprecated strncpy
  s390/ipl: fix virtual vs physical address confusion
  s390/zcrypt_ep11misc: support API ordinal 6 with empty pin-blob
  s390/paes: fix PKEY_TYPE_EP11_AES handling for secure keyblobs
  s390/pkey: fix PKEY_TYPE_EP11_AES handling for sysfs attributes
  s390/pkey: fix PKEY_TYPE_EP11_AES handling in PKEY_VERIFYKEY2 IOCTL
  s390/pkey: fix PKEY_TYPE_EP11_AES handling in PKEY_KBLOB2PROTK[23]
  ...
2023-08-28 17:22:39 -07:00
Linus Torvalds
68cadad11f Merge tag 'rcu.2023.08.21a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:

 - Documentation updates

 - Miscellaneous fixes, perhaps most notably simplifying
   SRCU_NOTIFIER_INIT() as suggested

 - RCU Tasks updates, most notably treating Tasks RCU callbacks as lazy
   while still treating synchronous grace periods as urgent. Also fixes
   one bug that restores the ability to apply debug-objects to RCU Tasks
   and another that fixes a race condition that could result in
   false-positive failures of the boot-time self-test code

 - RCU-scalability performance-test updates, most notably adding the
   ability to measure the RCU-Tasks's grace-period kthread's CPU
   consumption. This proved quite useful for the RCU Tasks work

 - Reference-acquisition/release performance-test updates, including a
   fix for an uninitialized wait_queue_head_t

 - Miscellaneous torture-test updates

 - Torture-test scripting updates, including removal of the
   non-longer-functional formal-verification scripts, test builds of
   individual RCU Tasks flavors, better diagnostics for loss of
   connectivity for distributed rcutorture tests, disabling of reboot
   loops in qemu/KVM-based rcutorture testing, and passing of init
   parameters to rcutorture's init program

* tag 'rcu.2023.08.21a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (64 commits)
  rcu: Use WRITE_ONCE() for assignments to ->next for rculist_nulls
  rcu: Make the rcu_nocb_poll boot parameter usable via boot config
  rcu: Mark __rcu_irq_enter_check_tick() ->rcu_urgent_qs load
  srcu,notifier: Remove #ifdefs in favor of SRCU Tiny srcu_usage
  rcutorture: Stop right-shifting torture_random() return values
  torture: Stop right-shifting torture_random() return values
  torture: Move stutter_wait() timeouts to hrtimers
  torture: Move torture_shuffle() timeouts to hrtimers
  torture: Move torture_onoff() timeouts to hrtimers
  torture: Make torture_hrtimeout_*() use TASK_IDLE
  torture: Add lock_torture writer_fifo module parameter
  torture: Add a kthread-creation callback to _torture_create_kthread()
  rcu-tasks: Fix boot-time RCU tasks debug-only deadlock
  rcu-tasks: Permit use of debug-objects with RCU Tasks flavors
  checkpatch: Complain about unexpected uses of RCU Tasks Trace
  torture: Cause mkinitrd.sh to indicate failure on compile errors
  torture: Make init program dump command-line arguments
  torture: Switch qemu from -nographic to -display none
  torture: Add init-program support for loongarch
  torture: Avoid torture-test reboot loops
  ...
2023-08-28 13:19:28 -07:00
Linus Torvalds
de16588a77 Merge tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
 "This contains the usual miscellaneous features, cleanups, and fixes
  for vfs and individual filesystems.

  Features:

   - Block mode changes on symlinks and rectify our broken semantics

   - Report file modifications via fsnotify() for splice

   - Allow specifying an explicit timeout for the "rootwait" kernel
     command line option. This allows to timeout and reboot instead of
     always waiting indefinitely for the root device to show up

   - Use synchronous fput for the close system call

  Cleanups:

   - Get rid of open-coded lockdep workarounds for async io submitters
     and replace it all with a single consolidated helper

   - Simplify epoll allocation helper

   - Convert simple_write_begin and simple_write_end to use a folio

   - Convert page_cache_pipe_buf_confirm() to use a folio

   - Simplify __range_close to avoid pointless locking

   - Disable per-cpu buffer head cache for isolated cpus

   - Port ecryptfs to kmap_local_page() api

   - Remove redundant initialization of pointer buf in pipe code

   - Unexport the d_genocide() function which is only used within core
     vfs

   - Replace printk(KERN_ERR) and WARN_ON() with WARN()

  Fixes:

   - Fix various kernel-doc issues

   - Fix refcount underflow for eventfds when used as EFD_SEMAPHORE

   - Fix a mainly theoretical issue in devpts

   - Check the return value of __getblk() in reiserfs

   - Fix a racy assert in i_readcount_dec

   - Fix integer conversion issues in various functions

   - Fix LSM security context handling during automounts that prevented
     NFS superblock sharing"

* tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits)
  cachefiles: use kiocb_{start,end}_write() helpers
  ovl: use kiocb_{start,end}_write() helpers
  aio: use kiocb_{start,end}_write() helpers
  io_uring: use kiocb_{start,end}_write() helpers
  fs: create kiocb_{start,end}_write() helpers
  fs: add kerneldoc to file_{start,end}_write() helpers
  io_uring: rename kiocb_end_write() local helper
  splice: Convert page_cache_pipe_buf_confirm() to use a folio
  libfs: Convert simple_write_begin and simple_write_end to use a folio
  fs/dcache: Replace printk and WARN_ON by WARN
  fs/pipe: remove redundant initialization of pointer buf
  fs: Fix kernel-doc warnings
  devpts: Fix kernel-doc warnings
  doc: idmappings: fix an error and rephrase a paragraph
  init: Add support for rootwait timeout parameter
  vfs: fix up the assert in i_readcount_dec
  fs: Fix one kernel-doc comment
  docs: filesystems: idmappings: clarify from where idmappings are taken
  fs/buffer.c: disable per-CPU buffer_head cache for isolated CPUs
  vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing
  ...
2023-08-28 10:17:14 -07:00
Rafael J. Wysocki
0c2ec0f165 Merge branch 'acpi-thermal'
Merge ACPI thermal driver changes for 6.6-rc1:

 - Drop non-functional nocrt parameter from ACPI thermal (Mario
   Limonciello).

 - Clean up the ACPI thermal driver, rework the handling of firmware
   notifications in it and make it provide a table of generic trip point
   structures to the core during initialization (Rafael Wysocki).

* acpi-thermal:
  ACPI: thermal: Eliminate code duplication from acpi_thermal_notify()
  ACPI: thermal: Drop unnecessary thermal zone callbacks
  ACPI: thermal: Rework thermal_get_trend()
  ACPI: thermal: Use trip point table to register thermal zones
  thermal: core: Rework and rename __for_each_thermal_trip()
  ACPI: thermal: Introduce struct acpi_thermal_trip
  ACPI: thermal: Carry out trip point updates under zone lock
  ACPI: thermal: Clean up acpi_thermal_register_thermal_zone()
  thermal: core: Add priv pointer to struct thermal_trip
  thermal: core: Introduce thermal_zone_device_exec()
  thermal: core: Do not handle trip points with invalid temperature
  ACPI: thermal: Drop redundant local variable from acpi_thermal_resume()
  ACPI: thermal: Do not attach private data to ACPI handles
  ACPI: thermal: Drop enabled flag from struct acpi_thermal_active
  ACPI: thermal: Drop nocrt parameter
2023-08-25 20:44:26 +02:00
Jarkko Sakkinen
bff24699b9 tpm_tis: Revert "tpm_tis: Disable interrupts on ThinkPad T490s"
Since for MMIO driver using FIFO registers, also known as tpm_tis, the
default (and tbh recommended) behaviour is now the polling mode, the
"tristate" workaround is no longer for benefit.

If someone wants to explicitly enable IRQs for a TPM chip that should be
without question allowed. It could very well be a piece hardware in the
existing deny list because of e.g. firmware update or something similar.

While at it, document the module parameter, as this was not done in 2006
when it first appeared in the mainline.

Link: https://lore.kernel.org/linux-integrity/20201015214430.17937-1-jsnitsel@redhat.com/
Link: https://lore.kernel.org/all/1145393776.4829.19.camel@localhost.localdomain/
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-08-17 15:53:09 +00:00
Paul E. McKenney
fe24a0b632 Merge branches 'doc.2023.07.14b', 'fixes.2023.08.16a', 'rcu-tasks.2023.07.24a', 'rcuscale.2023.07.14b', 'refscale.2023.07.14b', 'torture.2023.08.14a' and 'torturescripts.2023.07.20a' into HEAD
doc.2023.07.14b:  Documentation updates.
fixes.2023.08.16a:  Miscellaneous fixes.
rcu-tasks.2023.07.24a:  RCU Tasks updates.
rcuscale.2023.07.14b:  RCU (updater) scalability test updates.
refscale.2023.07.14b:  Reference (reader) scalability test updates.
torture.2023.08.14a:  Other torture-test updates.
torturescripts.2023.07.20a:  Other torture-test scripting updates.
2023-08-16 14:31:08 -07:00
Loic Poulain
45071e1c28 init: Add support for rootwait timeout parameter
Add an optional timeout arg to 'rootwait' as the maximum time in
seconds to wait for the root device to show up before attempting
forced mount of the root filesystem.

Use case:
In case of device mapper usage for the rootfs (e.g. root=/dev/dm-0),
if the mapper is not able to create the virtual block for any reason
(wrong arguments, bad dm-verity signature, etc), the `rootwait` param
causes the kernel to wait forever. It may however be desirable to only
wait for a given time and then panic (force mount) to cause device reset.
This gives the bootloader a chance to detect the problem and to take some
measures, such as marking the booted partition as bad (for A/B case) or
entering a recovery mode.

In success case, mounting happens as soon as the root device is ready,
unlike the existing 'rootdelay' parameter which performs an unconditional
pause.

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Message-Id: <20230813082349.513386-1-loic.poulain@linaro.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-15 11:34:23 +02:00
Dietmar Eggemann
5d248bb39f torture: Add lock_torture writer_fifo module parameter
This commit adds a module parameter that causes the locktorture writer
to run at real-time priority.

To use it:
insmod /lib/modules/torture.ko random_shuffle=1
insmod /lib/modules/locktorture.ko torture_type=mutex_lock rt_boost=1 rt_boost_factor=50 nested_locks=3 writer_fifo=1
													^^^^^^^^^^^^^

A predecessor to this patch has been helpful to uncover issues with the
proxy-execution series.

[ paulmck: Remove locktorture-specific code from kernel/torture.c. ]

Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: kernel-team@android.com
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
[jstultz: Include header change to build, reword commit message]
Signed-off-by: John Stultz <jstultz@google.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-08-14 15:01:07 -07:00
Linus Torvalds
64094e7e31 Merge tag 'gds-for-linus-2023-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/gds fixes from Dave Hansen:
 "Mitigate Gather Data Sampling issue:

   - Add Base GDS mitigation

   - Support GDS_NO under KVM

   - Fix a documentation typo"

* tag 'gds-for-linus-2023-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation/x86: Fix backwards on/off logic about YMM support
  KVM: Add GDS_NO support to KVM
  x86/speculation: Add Kconfig option for GDS
  x86/speculation: Add force option to GDS mitigation
  x86/speculation: Add Gather Data Sampling mitigation
2023-08-07 17:03:54 -07:00
Borislav Petkov (AMD)
fb3bd914b3 x86/srso: Add a Speculative RAS Overflow mitigation
Add a mitigation for the speculative return address stack overflow
vulnerability found on AMD processors.

The mitigation works by ensuring all RET instructions speculate to
a controlled location, similar to how speculation is controlled in the
retpoline sequence.  To accomplish this, the __x86_return_thunk forces
the CPU to mispredict every function return using a 'safe return'
sequence.

To ensure the safety of this mitigation, the kernel must ensure that the
safe return sequence is itself free from attacker interference.  In Zen3
and Zen4, this is accomplished by creating a BTB alias between the
untraining function srso_untrain_ret_alias() and the safe return
function srso_safe_ret_alias() which results in evicting a potentially
poisoned BTB entry and using that safe one for all function returns.

In older Zen1 and Zen2, this is accomplished using a reinterpretation
technique similar to Retbleed one: srso_untrain_ret() and
srso_safe_ret().

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-07-27 11:07:14 +02:00
Costa Shulyupin
37002bc6b6 docs: move s390 under arch
and fix all in-tree references.

Architecture-specific documentation is being moved into Documentation/arch/
as a way of cleaning up the top-level documentation directory and making
the docs hierarchy more closely match the source hierarchy.

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20230718045550.495428-1-costa.shul@redhat.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-07-24 12:12:24 +02:00
Daniel Sneddon
553a5c03e9 x86/speculation: Add force option to GDS mitigation
The Gather Data Sampling (GDS) vulnerability allows malicious software
to infer stale data previously stored in vector registers. This may
include sensitive data such as cryptographic keys. GDS is mitigated in
microcode, and systems with up-to-date microcode are protected by
default. However, any affected system that is running with older
microcode will still be vulnerable to GDS attacks.

Since the gather instructions used by the attacker are part of the
AVX2 and AVX512 extensions, disabling these extensions prevents gather
instructions from being executed, thereby mitigating the system from
GDS. Disabling AVX2 is sufficient, but we don't have the granularity
to do this. The XCR0[2] disables AVX, with no option to just disable
AVX2.

Add a kernel parameter gather_data_sampling=force that will enable the
microcode mitigation if available, otherwise it will disable AVX on
affected systems.

This option will be ignored if cmdline mitigations=off.

This is a *big* hammer.  It is known to break buggy userspace that
uses incomplete, buggy AVX enumeration.  Unfortunately, such userspace
does exist in the wild:

	https://www.mail-archive.com/bug-coreutils@gnu.org/msg33046.html

[ dhansen: add some more ominous warnings about disabling AVX ]

Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-07-21 12:59:49 -07:00
Daniel Sneddon
8974eb5882 x86/speculation: Add Gather Data Sampling mitigation
Gather Data Sampling (GDS) is a hardware vulnerability which allows
unprivileged speculative access to data which was previously stored in
vector registers.

Intel processors that support AVX2 and AVX512 have gather instructions
that fetch non-contiguous data elements from memory. On vulnerable
hardware, when a gather instruction is transiently executed and
encounters a fault, stale data from architectural or internal vector
registers may get transiently stored to the destination vector
register allowing an attacker to infer the stale data using typical
side channel techniques like cache timing attacks.

This mitigation is different from many earlier ones for two reasons.
First, it is enabled by default and a bit must be set to *DISABLE* it.
This is the opposite of normal mitigation polarity. This means GDS can
be mitigated simply by updating microcode and leaving the new control
bit alone.

Second, GDS has a "lock" bit. This lock bit is there because the
mitigation affects the hardware security features KeyLocker and SGX.
It needs to be enabled and *STAY* enabled for these features to be
mitigated against GDS.

The mitigation is enabled in the microcode by default. Disable it by
setting gather_data_sampling=off or by disabling all mitigations with
mitigations=off. The mitigation status can be checked by reading:

    /sys/devices/system/cpu/vulnerabilities/gather_data_sampling

Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-07-19 16:45:37 -07:00
Mario Limonciello
5f641174a1 ACPI: thermal: Drop nocrt parameter
The `nocrt` module parameter has no code associated with it and does
nothing.  As `crt=-1` has same functionality as what nocrt should be
doing drop `nocrt` and associated documentation.

This should fix a quirk for Gigabyte GA-7ZX that used `nocrt` and
thus didn't function properly.

Fixes: 8c99fdce30 ("ACPI: thermal: set "thermal.nocrt" via DMI on Gigabyte GA-7ZX")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-07-17 15:24:53 +02:00
Paul E. McKenney
2d7b2b344c rcuscale: Add kfree_by_call_rcu and kfree_mult to documentation
This commit adds the kfree_by_call_rcu and kfree_mult rcuscale module
parameters to kernel-parameters.txt.  While in the area, it updates
to rcuscale.scale_type.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
2023-07-14 15:01:49 -07:00
Paul E. McKenney
7221f493c5 rcuscale: Add minruntime module parameter
By default, rcuscale collects only 100 points of data per writer, but
arranging for all kthreads to be actively collecting (if not recording)
data during the time that any kthread might be recording.  This works
well, but does not allow much time to bring external performance tools
to bear.  This commit therefore adds a minruntime module parameter
that specifies a minimum data-collection interval in seconds.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-07-14 15:01:49 -07:00
Paul E. McKenney
2226f3dc05 rcuscale: Permit blocking delays between writers
Some workloads do isolated RCU work, and this can affect operation
latencies.  This commit therefore adds a writer_holdoff_jiffies module
parameter that causes writers to block for the specified number of
jiffies between each pair of consecutive write-side operations.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-07-14 15:01:48 -07:00
Paul E. McKenney
db13710a03 rcu-tasks: Cancel callback laziness if too many callbacks
The various RCU Tasks flavors now do lazy grace periods when there are
only asynchronous grace period requests.  By default, the system will let
250 milliseconds elapse after the first call_rcu_tasks*() callbacki is
queued before starting a grace period.  In contrast, synchronous grace
period requests such as synchronize_rcu_tasks*() will start a grace
period immediately.

However, invoking one of the call_rcu_tasks*() functions in a too-tight
loop can result in a callback flood, which in turn can exhaust memory
if grace periods are delayed for too long.

This commit therefore sets a limit so that the grace-period kthread
will be awakened when any CPU's callback list expands to contain
rcupdate.rcu_task_lazy_lim callbacks elements (defaulting to 32, set to -1
to disable), the grace-period kthread will be awakened, thus cancelling
any ongoing laziness and getting out in front of the potential callback
flood.

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-07-14 15:00:12 -07:00
Paul E. McKenney
450d461aa6 rcu-tasks: Add kernel boot parameters for callback laziness
This commit adds kernel boot parameters for callback laziness, allowing
the RCU Tasks flavors to be individually adjusted.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-07-14 15:00:12 -07:00
Linus Torvalds
7210de3a32 Merge tag 'docs-6.5-2' of git://git.lwn.net/linux
Pull mode documentation updates from Jonathan Corbet:
 "A half-dozen late arriving docs patches. They are mostly fixes, but we
  also have a kernel-doc tweak for enums and the long-overdue removal of
  the outdated and redundant patch-submission comments at the top of the
  MAINTAINERS file"

* tag 'docs-6.5-2' of git://git.lwn.net/linux:
  scripts: kernel-doc: support private / public marking for enums
  Documentation: KVM: SEV: add a missing backtick
  Documentation: ACPI: fix typo in ssdt-overlays.rst
  Fix documentation of panic_on_warn
  docs: remove the tips on how to submit patches from MAINTAINERS
  docs: fix typo in zh_TW and zh_CN translation
2023-07-06 22:15:38 -07:00
Olaf Hering
57ada2358f Fix documentation of panic_on_warn
The kernel cmdline option panic_on_warn expects an integer, it is not a
plain option as documented. A number of uses in the tree figured this
already, and use panic_on_warn=1 for their purpose.

Adjust a comment which otherwise may mislead people in the future.

Fixes: 9e3961a097 ("kernel: add panic_on_warn")
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-07-04 08:29:32 -06:00
Linus Torvalds
533925cb76 Merge tag 'riscv-for-linus-6.5-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:

 - Support for ACPI

 - Various cleanups to the ISA string parsing, including making them
   case-insensitive

 - Support for the vector extension

 - Support for independent irq/softirq stacks

 - Our CPU DT binding now has "unevaluatedProperties: false"

* tag 'riscv-for-linus-6.5-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (78 commits)
  riscv: hibernate: remove WARN_ON in save_processor_state
  dt-bindings: riscv: cpus: switch to unevaluatedProperties: false
  dt-bindings: riscv: cpus: add a ref the common cpu schema
  riscv: stack: Add config of thread stack size
  riscv: stack: Support HAVE_SOFTIRQ_ON_OWN_STACK
  riscv: stack: Support HAVE_IRQ_EXIT_ON_IRQ_STACK
  RISC-V: always report presence of extensions formerly part of the base ISA
  dt-bindings: riscv: explicitly mention assumption of Zicntr & Zihpm support
  RISC-V: remove decrement/increment dance in ISA string parser
  RISC-V: rework comments in ISA string parser
  RISC-V: validate riscv,isa at boot, not during ISA string parsing
  RISC-V: split early & late of_node to hartid mapping
  RISC-V: simplify register width check in ISA string parsing
  perf: RISC-V: Limit the number of counters returned from SBI
  riscv: replace deprecated scall with ecall
  riscv: uprobes: Restore thread.bad_cause
  riscv: mm: try VMA lock-based page fault handling first
  riscv: mm: Pre-allocate PGD entries for vmalloc/modules area
  RISC-V: hwprobe: Expose Zba, Zbb, and Zbs
  RISC-V: Track ISA extensions per hart
  ...
2023-06-30 09:37:26 -07:00
Linus Torvalds
d35ac6ac0e Merge tag 'iommu-updates-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
 "Core changes:
   - iova_magazine_alloc() optimization
   - Make flush-queue an IOMMU driver capability
   - Consolidate the error handling around device attachment

  AMD IOMMU changes:
   - AVIC Interrupt Remapping Improvements
   - Some minor fixes and cleanups

  Intel VT-d changes from Lu Baolu:
   - Small and misc cleanups

  ARM-SMMU changes from Will Deacon:
   - Device-tree binding updates:
      - Add missing clocks for SC8280XP and SA8775 Adreno SMMUs
      - Add two new Qualcomm SMMUs in SDX75 and SM6375
   - Workarounds for Arm MMU-700 errata:
      - 1076982: Avoid use of SEV-based cmdq wakeup
      - 2812531: Terminate command batches with a CMD_SYNC
      - Enforce single-stage translation to avoid nesting-related errata
   - Set the correct level hint for range TLB invalidation on teardown

  .. and some other minor fixes and cleanups (including Freescale PAMU
  and virtio-iommu changes)"

* tag 'iommu-updates-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (50 commits)
  iommu/vt-d: Remove commented-out code
  iommu/vt-d: Remove two WARN_ON in domain_context_mapping_one()
  iommu/vt-d: Handle the failure case of dmar_reenable_qi()
  iommu/vt-d: Remove unnecessary (void*) conversions
  iommu/amd: Remove extern from function prototypes
  iommu/amd: Use BIT/BIT_ULL macro to define bit fields
  iommu/amd: Fix DTE_IRQ_PHYS_ADDR_MASK macro
  iommu/amd: Fix compile error for unused function
  iommu/amd: Improving Interrupt Remapping Table Invalidation
  iommu/amd: Do not Invalidate IRT when IRTE caching is disabled
  iommu/amd: Introduce Disable IRTE Caching Support
  iommu/amd: Remove the unused struct amd_ir_data.ref
  iommu/amd: Switch amd_iommu_update_ga() to use modify_irte_ga()
  iommu/arm-smmu-v3: Set TTL invalidation hint better
  iommu/arm-smmu-v3: Document nesting-related errata
  iommu/arm-smmu-v3: Add explicit feature for nesting
  iommu/arm-smmu-v3: Document MMU-700 erratum 2812531
  iommu/arm-smmu-v3: Work around MMU-600 erratum 1076982
  dt-bindings: arm-smmu: Add SDX75 SMMU compatible
  dt-bindings: arm-smmu: Add SM6375 GPU SMMU
  ...
2023-06-29 20:51:03 -07:00
Linus Torvalds
b775d6c585 Merge tag 'mips_6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS updates from Thomas Bogendoerfer:

 - add support for TP-Link HC220 G5 v1

 - add support for Wifi/Bluetooth on CI20

 - rework Ralink clock and reset handling

 - cleanups and fixes

* tag 'mips_6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (58 commits)
  MIPS: Loongson64: DTS: Add RTC support to Loongson-2K1000
  MIPS: Loongson64: DTS: Add RTC support to LS7A PCH
  MIPS: OCTEON: octeon-usb: cleanup divider calculation
  MIPS: OCTEON: octeon-usb: introduce dwc3_octeon_{read,write}q
  MIPS: OCTEON: octeon-usb: move gpio config to separate function
  MIPS: OCTEON: octeon-usb: use bitfields for shim register
  MIPS: OCTEON: octeon-usb: use bitfields for host config register
  MIPS: OCTEON: octeon-usb: use bitfields for control register
  MIPS: OCTEON: octeon-usb: add all register offsets
  mips: ralink: match all supported system controller compatible strings
  MIPS: dec: prom: Address -Warray-bounds warning
  MIPS: DTS: CI20: Raise VDDCORE voltage to 1.125 volts
  clk: ralink: mtmips: Fix uninitialized use of ret in mtmips_register_{fixed,factor}_clocks()
  mips: ralink: introduce commonly used remap node function
  mips: pci-mt7620: use dev_info() to log PCIe device detection result
  mips: pci-mt7620: do not print NFTS register value as error log
  MAINTAINERS: add Mediatek MTMIPS Clock maintainer
  mips: ralink: get cpu rate from new driver code
  mips: ralink: remove reset related code
  mips: ralink: mt7620: remove clock related code
  ...
2023-06-29 15:01:51 -07:00
Linus Torvalds
6aeadf7896 Merge tag 'docs-arm64-move' of git://git.lwn.net/linux
Pull arm64 documentation move from Jonathan Corbet:
 "Move the arm64 architecture documentation under Documentation/arch/.

  This brings some order to the documentation directory, declutters the
  top-level directory, and makes the documentation organization more
  closely match that of the source"

* tag 'docs-arm64-move' of git://git.lwn.net/linux:
  perf arm-spe: Fix a dangling Documentation/arm64 reference
  mm: Fix a dangling Documentation/arm64 reference
  arm64: Fix dangling references to Documentation/arm64
  dt-bindings: fix dangling Documentation/arm64 reference
  docs: arm64: Move arm64 documentation under Documentation/arch/
2023-06-27 21:52:15 -07:00
Linus Torvalds
7ab044a4f4 Merge tag 'wq-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:

 - Concurrency-managed per-cpu work items that hog CPUs and delay the
   execution of other work items are now automatically detected and
   excluded from concurrency management. Reporting on such work items
   can also be enabled through a config option.

 - Added tools/workqueue/wq_monitor.py which improves visibility into
   workqueue usages and behaviors.

 - Arnd's minimal fix for gcc-13 enum warning on 32bit compiles,
   superseded by commit afa4bb778e in mainline.

* tag 'wq-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Disable per-cpu CPU hog detection when wq_cpu_intensive_thresh_us is 0
  workqueue: Fix WARN_ON_ONCE() triggers in worker_enter_idle()
  workqueue: fix enum type for gcc-13
  workqueue: Track and monitor per-workqueue CPU time usage
  workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism
  workqueue: Automatically mark CPU-hogging work items CPU_INTENSIVE
  workqueue: Improve locking rule description for worker fields
  workqueue: Move worker_set/clr_flags() upwards
  workqueue: Re-order struct worker fields
  workqueue: Add pwq->stats[] and a monitoring script
  Further upgrade queue_work_on() comment
2023-06-27 16:32:52 -07:00
Linus Torvalds
6f612579be Merge tag 'objtool-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molar:
 "Build footprint & performance improvements:

   - Reduce memory usage with CONFIG_DEBUG_INFO=y

     In the worst case of an allyesconfig+CONFIG_DEBUG_INFO=y kernel,
     DWARF creates almost 200 million relocations, ballooning objtool's
     peak heap usage to 53GB. These patches reduce that to 25GB.

     On a distro-type kernel with kernel IBT enabled, they reduce
     objtool's peak heap usage from 4.2GB to 2.8GB.

     These changes also improve the runtime significantly.

  Debuggability improvements:

   - Add the unwind_debug command-line option, for more extend unwinding
     debugging output
   - Limit unreachable warnings to once per function
   - Add verbose option for disassembling affected functions
   - Include backtrace in verbose mode
   - Detect missing __noreturn annotations
   - Ignore exc_double_fault() __noreturn warnings
   - Remove superfluous global_noreturns entries
   - Move noreturn function list to separate file
   - Add __kunit_abort() to noreturns

  Unwinder improvements:

   - Allow stack operations in UNWIND_HINT_UNDEFINED regions
   - drm/vmwgfx: Add unwind hints around RBP clobber

  Cleanups:

   - Move the x86 entry thunk restore code into thunk functions
   - x86/unwind/orc: Use swap() instead of open coding it
   - Remove unnecessary/unused variables

  Fixes for modern stack canary handling"

* tag 'objtool-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
  x86/orc: Make the is_callthunk() definition depend on CONFIG_BPF_JIT=y
  objtool: Skip reading DWARF section data
  objtool: Free insns when done
  objtool: Get rid of reloc->rel[a]
  objtool: Shrink elf hash nodes
  objtool: Shrink reloc->sym_reloc_entry
  objtool: Get rid of reloc->jump_table_start
  objtool: Get rid of reloc->addend
  objtool: Get rid of reloc->type
  objtool: Get rid of reloc->offset
  objtool: Get rid of reloc->idx
  objtool: Get rid of reloc->list
  objtool: Allocate relocs in advance for new rela sections
  objtool: Add for_each_reloc()
  objtool: Don't free memory in elf_close()
  objtool: Keep GElf_Rel[a] structs synced
  objtool: Add elf_create_section_pair()
  objtool: Add mark_sec_changed()
  objtool: Fix reloc_hash size
  objtool: Consolidate rel/rela handling
  ...
2023-06-27 15:05:41 -07:00
Linus Torvalds
dc43fc753b Merge tag 'x86_mtrr_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mtrr updates from Borislav Petkov:
 "A serious scrubbing of the MTRR code including adding a new map
  mechanism in order to look up the memory type of a region easily.

  Also address memory range lookup issues like returning an invalid
  memory type. Furthermore, this handles the decoupling of PAT from MTRR
  more naturally.

  All work by Juergen Gross"

* tag 'x86_mtrr_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/xen: Set default memory type for PV guests to WB
  x86/mtrr: Unify debugging printing
  x86/mtrr: Remove unused code
  x86/mm: Only check uniform after calling mtrr_type_lookup()
  x86/mtrr: Don't let mtrr_type_lookup() return MTRR_TYPE_INVALID
  x86/mtrr: Use new cache_map in mtrr_type_lookup()
  x86/mtrr: Add mtrr=debug command line option
  x86/mtrr: Construct a memory map with cache modes
  x86/mtrr: Add get_effective_type() service function
  x86/mtrr: Allocate mtrr_value array dynamically
  x86/mtrr: Move 32-bit code from mtrr.c to legacy.c
  x86/mtrr: Have only one set_mtrr() variant
  x86/mtrr: Replace vendor tests in MTRR code
  x86/xen: Set MTRR state when running as Xen PV initial domain
  x86/hyperv: Set MTRR state when running as SEV-SNP Hyper-V guest
  x86/mtrr: Support setting MTRR state for software defined MTRRs
  x86/mtrr: Replace size_or_mask and size_and_mask with a much easier concept
  x86/mtrr: Remove physical address size calculation
2023-06-27 13:11:32 -07:00
Linus Torvalds
a354049532 Merge tag 'docs-6.5' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
 "It's been a relatively calm cycle in docsland. We do have:

   - Some initial page-table documentation from Linus (the other Linus)

   - Regression-handling documentation improvements from Thorsten

   - Addition of kerneldoc documentation for the ERR_PTR() and related
     macros from James Seo

  ... and the usual collection of fixes and updates"

* tag 'docs-6.5' of git://git.lwn.net/linux:
  docs: consolidate storage interfaces
  Documentation: update git configuration for Link: tag
  Documentation: KVM: make corrections to vcpu-requests.rst
  Documentation: KVM: make corrections to ppc-pv.rst
  Documentation: KVM: make corrections to locking.rst
  Documentation: KVM: make corrections to halt-polling.rst
  Documentation: virt: correct location of haltpoll module params
  Documentation/mm: Initial page table documentation
  docs: crypto: async-tx-api: fix typo in struct name
  docs/doc-guide: Clarify how to write tables
  docs: handling-regressions: rework section about fixing procedures
  docs: process: fix a typoed cross-reference
  docs: submitting-patches: Discuss interleaved replies
  MAINTAINERS: direct process doc changes to a dedicated ML
  Documentation: core-api: Add error pointer functions to kernel-api
  err.h: Add missing kerneldocs for error pointer functions
  Documentation: conf.py: Add __force to c_id_attributes
  docs: clarify KVM related kernel parameters' descriptions
  docs: consolidate human interface subsystems
  docs: admin-guide: Add information about intel_pstate active mode
2023-06-27 11:33:47 -07:00
Linus Torvalds
af96134dc8 Merge tag 'rcu.2023.06.22a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:
 "Documentation updates

  Miscellaneous fixes, perhaps most notably:

   - Remove RCU_NONIDLE(). The new visibility of most of the idle loop
     to RCU has obsoleted this API.

   - Make the RCU_SOFTIRQ callback-invocation time limit also apply to
     the rcuc kthreads that invoke callbacks for CONFIG_PREEMPT_RT.

   - Add a jiffies-based callback-invocation time limit to handle
     long-running callbacks. (The local_clock() function is only invoked
     once per 32 callbacks due to its high overhead.)

   - Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs, which
     fixes a bug that can occur on systems with non-contiguous CPU
     numbering.

  kvfree_rcu updates:

   - Eliminate the single-argument variant of k[v]free_rcu() now that
     all uses have been converted to k[v]free_rcu_mightsleep().

   - Add WARN_ON_ONCE() checks for k[v]free_rcu*() freeing callbacks too
     soon. Yes, this is closing the barn door after the horse has
     escaped, but Murphy says that there will be more horses.

  Callback-offloading updates:

   - Fix a number of bugs involving the shrinker and lazy callbacks.

  Tasks RCU updates

  Torture-test updates"

* tag 'rcu.2023.06.22a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (32 commits)
  torture: Remove duplicated argument -enable-kvm for ppc64
  doc/rcutorture: Add description of rcutorture.stall_cpu_block
  rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale
  rcu/rcuscale: Move rcu_scale_*() after kfree_scale_cleanup()
  rcutorture: Correct name of use_softirq module parameter
  locktorture: Add long_hold to adjust lock-hold delays
  rcu/nocb: Make shrinker iterate only over NOCB CPUs
  rcu-tasks: Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs
  rcu: Make rcu_cpu_starting() rely on interrupts being disabled
  rcu: Mark rcu_cpu_kthread() accesses to ->rcu_cpu_has_work
  rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp
  rcu: Employ jiffies-based backstop to callback time limit
  rcu: Check callback-invocation time limit for rcuc kthreads
  rcu: Remove RCU_NONIDLE()
  rcu: Add more RCU files to kernel-api.rst
  rcu-tasks: Clarify the cblist_init_generic() function's pr_info() output
  rcu-tasks: Avoid pr_info() with spin lock in cblist_init_generic()
  rcu/nocb: Recheck lazy callbacks under the ->nocb_lock from shrinker
  rcu/nocb: Fix shrinker race against callback enqueuer
  rcu/nocb: Protect lazy shrinker against concurrent (de-)offloading
  ...
2023-06-27 10:37:01 -07:00
Linus Torvalds
2605e80d34 Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
 "Notable features are user-space support for the memcpy/memset
  instructions and the permission indirection extension.

   - Support for the Armv8.9 Permission Indirection Extensions. While
     this feature doesn't add new functionality, it enables future
     support for Guarded Control Stacks (GCS) and Permission Overlays

   - User-space support for the Armv8.8 memcpy/memset instructions

   - arm64 perf: support the HiSilicon SoC uncore PMU, Arm CMN sysfs
     identifier, support for the NXP i.MX9 SoC DDRC PMU, fixes and
     cleanups

   - Removal of superfluous ISBs on context switch (following
     retrospective architecture tightening)

   - Decode the ISS2 register during faults for additional information
     to help with debugging

   - KPTI clean-up/simplification of the trampoline exit code

   - Addressing several -Wmissing-prototype warnings

   - Kselftest improvements for signal handling and ptrace

   - Fix TPIDR2_EL0 restoring on sigreturn

   - Clean-up, robustness improvements of the module allocation code

   - More sysreg conversions to the automatic register/bitfields
     generation

   - CPU capabilities handling cleanup

   - Arm documentation updates: ACPI, ptdump"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (124 commits)
  kselftest/arm64: Add a test case for TPIDR2 restore
  arm64/signal: Restore TPIDR2 register rather than memory state
  arm64: alternatives: make clean_dcache_range_nopatch() noinstr-safe
  Documentation/arm64: Add ptdump documentation
  arm64: hibernate: remove WARN_ON in save_processor_state
  kselftest/arm64: Log signal code and address for unexpected signals
  docs: perf: Fix warning from 'make htmldocs' in hisi-pmu.rst
  arm64/fpsimd: Exit streaming mode when flushing tasks
  docs: perf: Add new description for HiSilicon UC PMU
  drivers/perf: hisi: Add support for HiSilicon UC PMU driver
  drivers/perf: hisi: Add support for HiSilicon H60PA and PAv3 PMU driver
  perf: arm_cspmu: Add missing MODULE_DEVICE_TABLE
  perf/arm-cmn: Add sysfs identifier
  perf/arm-cmn: Revamp model detection
  perf/arm_dmc620: Add cpumask
  arm64: mm: fix VA-range sanity check
  arm64/mm: remove now-superfluous ISBs from TTBR writes
  Documentation/arm64: Update ACPI tables from BBR
  Documentation/arm64: Update references in arm-acpi
  Documentation/arm64: Update ARM and arch reference
  ...
2023-06-26 17:11:53 -07:00
Linus Torvalds
9244724fbf Merge tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP updates from Thomas Gleixner:
 "A large update for SMP management:

   - Parallel CPU bringup

     The reason why people are interested in parallel bringup is to
     shorten the (kexec) reboot time of cloud servers to reduce the
     downtime of the VM tenants.

     The current fully serialized bringup does the following per AP:

       1) Prepare callbacks (allocate, intialize, create threads)
       2) Kick the AP alive (e.g. INIT/SIPI on x86)
       3) Wait for the AP to report alive state
       4) Let the AP continue through the atomic bringup
       5) Let the AP run the threaded bringup to full online state

     There are two significant delays:

       #3 The time for an AP to report alive state in start_secondary()
          on x86 has been measured in the range between 350us and 3.5ms
          depending on vendor and CPU type, BIOS microcode size etc.

       #4 The atomic bringup does the microcode update. This has been
          measured to take up to ~8ms on the primary threads depending
          on the microcode patch size to apply.

     On a two socket SKL server with 56 cores (112 threads) the boot CPU
     spends on current mainline about 800ms busy waiting for the APs to
     come up and apply microcode. That's more than 80% of the actual
     onlining procedure.

     This can be reduced significantly by splitting the bringup
     mechanism into two parts:

       1) Run the prepare callbacks and kick the AP alive for each AP
          which needs to be brought up.

          The APs wake up, do their firmware initialization and run the
          low level kernel startup code including microcode loading in
          parallel up to the first synchronization point. (#1 and #2
          above)

       2) Run the rest of the bringup code strictly serialized per CPU
          (#3 - #5 above) as it's done today.

          Parallelizing that stage of the CPU bringup might be possible
          in theory, but it's questionable whether required surgery
          would be justified for a pretty small gain.

     If the system is large enough the first AP is already waiting at
     the first synchronization point when the boot CPU finished the
     wake-up of the last AP. That reduces the AP bringup time on that
     SKL from ~800ms to ~80ms, i.e. by a factor ~10x.

     The actual gain varies wildly depending on the system, CPU,
     microcode patch size and other factors. There are some
     opportunities to reduce the overhead further, but that needs some
     deep surgery in the x86 CPU bringup code.

     For now this is only enabled on x86, but the core functionality
     obviously works for all SMP capable architectures.

   - Enhancements for SMP function call tracing so it is possible to
     locate the scheduling and the actual execution points. That allows
     to measure IPI delivery time precisely"

* tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
  trace,smp: Add tracepoints for scheduling remotelly called functions
  trace,smp: Add tracepoints around remotelly called functions
  MAINTAINERS: Add CPU HOTPLUG entry
  x86/smpboot: Fix the parallel bringup decision
  x86/realmode: Make stack lock work in trampoline_compat()
  x86/smp: Initialize cpu_primary_thread_mask late
  cpu/hotplug: Fix off by one in cpuhp_bringup_mask()
  x86/apic: Fix use of X{,2}APIC_ENABLE in asm with older binutils
  x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it
  x86/smpboot: Support parallel startup of secondary CPUs
  x86/smpboot: Implement a bit spinlock to protect the realmode stack
  x86/apic: Save the APIC virtual base address
  cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE
  x86/apic: Provide cpu_primary_thread mask
  x86/smpboot: Enable split CPU startup
  cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism
  cpu/hotplug: Reset task stack state in _cpu_up()
  cpu/hotplug: Remove unused state functions
  riscv: Switch to hotplug core state synchronization
  parisc: Switch to hotplug core state synchronization
  ...
2023-06-26 13:59:56 -07:00
Jonathan Corbet
e4624435f3 docs: arm64: Move arm64 documentation under Documentation/arch/
Architecture-specific documentation is being moved into Documentation/arch/
as a way of cleaning up the top-level documentation directory and making
the docs hierarchy more closely match the source hierarchy.  Move
Documentation/arm64 into arch/ (along with the Chinese equvalent
translations) and fix up documentation references.

Cc: Will Deacon <will@kernel.org>
Cc: Alex Shi <alexs@kernel.org>
Cc: Hu Haowen <src.res@email.cn>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Yantengsi <siyanteng@loongson.cn>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-06-21 08:51:51 -06:00
Suravee Suthikulpanit
66419036f6 iommu/amd: Introduce Disable IRTE Caching Support
An Interrupt Remapping Table (IRT) stores interrupt remapping configuration
for each device. In a normal operation, the AMD IOMMU caches the table
to optimize subsequent data accesses. This requires the IOMMU driver to
invalidate IRT whenever it updates the table. The invalidation process
includes issuing an INVALIDATE_INTERRUPT_TABLE command following by
a COMPLETION_WAIT command.

However, there are cases in which the IRT is updated at a high rate.
For example, for IOMMU AVIC, the IRTE[IsRun] bit is updated on every
vcpu scheduling (i.e. amd_iommu_update_ga()). On system with large
amount of vcpus and VFIO PCI pass-through devices, the invalidation
process could potentially become a performance bottleneck.

Introducing a new kernel boot option:

    amd_iommu=irtcachedis

which disables IRTE caching by setting the IRTCachedis bit in each IOMMU
Control register, and bypass the IRT invalidation process.

Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Co-developed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20230530141137.14376-4-suravee.suthikulpanit@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-09 14:47:09 +02:00
Jiaxun Yang
975fd3c26f MIPS: Select CONFIG_GENERIC_IDLE_POLL_SETUP
hlt,nohlt paramaters are useful when debugging cpuidle
related issues.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2023-06-09 10:34:26 +02:00
Jiaxun Yang
96cb8ae28c MIPS: Rework smt cmdline parameters
Provide a generic smt parameters interface aligned with s390
to allow users to limit smt usage and threads per core.

It replaced previous undocumented "nothreads" parameter for
smp-cps which is ambiguous and does not cover smp-mt.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2023-06-09 10:34:14 +02:00
Paul E. McKenney
2e31da752c Merge branches 'doc.2023.05.10a', 'fixes.2023.05.11a', 'kvfree.2023.05.10a', 'nocb.2023.05.11a', 'rcu-tasks.2023.05.10a', 'torture.2023.05.15a' and 'rcu-urgent.2023.06.06a' into HEAD
doc.2023.05.10a: Documentation updates
fixes.2023.05.11a: Miscellaneous fixes
kvfree.2023.05.10a: kvfree_rcu updates
nocb.2023.05.11a: Callback-offloading updates
rcu-tasks.2023.05.10a: Tasks RCU updates
torture.2023.05.15a: Torture-test updates
rcu-urgent.2023.06.06a: Urgent SRCU fix
2023-06-07 13:44:06 -07:00
Christoph Hellwig
702f3189e4 block: move the code to do early boot lookup of block devices to block/
Create a new block/early-lookup.c to keep the early block device lookup
code instead of having this code sit with the early mount code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531125535.676098-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05 10:57:40 -06:00
Christoph Hellwig
cf056a4312 init: improve the name_to_dev_t interface
name_to_dev_t has a very misleading name, that doesn't make clear
it should only be used by the early init code, and also has a bad
calling convention that doesn't allow returning different kinds of
errors.  Rename it to early_lookup_bdev to make the use case clear,
and return an errno, where -EINVAL means the string could not be
parsed, and -ENODEV means it the string was valid, but there was
no device found for it.

Also stub out the whole call for !CONFIG_BLOCK as all the non-block
root cases are always covered in the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531125535.676098-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05 10:56:46 -06:00
Christoph Hellwig
c0c1a7dcb6 init: move the nfs/cifs/ram special cases out of name_to_dev_t
The nfs/cifs/ram special case only needs to be parsed once, and only in
the boot code.  Move them out of name_to_dev_t and into
prepare_namespace.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531125535.676098-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05 10:55:20 -06:00
Kristina Martsenko
3e1dedb29d arm64: mops: allow disabling MOPS from the kernel command line
Make it possible to disable the MOPS extension at runtime using the
kernel command line. This can be useful for testing or working around
hardware issues. For example it could be used to test new memory copy
routines that do not use MOPS instructions (e.g. from Arm Optimized
Routines).

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Link: https://lore.kernel.org/r/20230509142235.3284028-11-kristina.martsenko@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-06-05 17:05:42 +01:00
Sunil V L
724f4c0df7 RISC-V: Add ACPI initialization in setup_arch()
Initialize the ACPI core for RISC-V during boot.

ACPI tables and interpreter are initialized based on
the information passed from the firmware and the value of
the kernel parameter 'acpi'.

With ACPI support added for RISC-V, the kernel parameter 'acpi'
is also supported on RISC-V. Hence, update the documentation.

Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230515054928.2079268-9-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-01 08:45:03 -07:00
Juergen Gross
a431660353 x86/mtrr: Add mtrr=debug command line option
Add a new command line option "mtrr=debug" for getting debug output
after building the new cache mode map. The output will include MTRR
register values and the resulting map.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20230502120931.20719-13-jgross@suse.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-06-01 15:04:33 +02:00
Yan-Jie Wang
f41dd67da6 docs: clarify KVM related kernel parameters' descriptions
The descriptions of certain KVM related kernel parameters can be
confusing. They state "Disable ...," which may make people think that
setting them to 1 will disable the associated feature when in fact the
opposite is true.

This commit addresses this issue by revising the descriptions of these
parameters by using "Control..." rather than "Enable/Disable...".
1==enabled or 0==disabled can be communicated by the description of
default value such as "1 (enabled)" or "0 (disabled)".

Also update the description of KVM's default value for kvm-intel.nested
as it is enabled by default.

Signed-off-by: Yan-Jie Wang <yanjiewtw@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20230503081530.19956-1-yanjiewtw@gmail.com
2023-05-19 08:56:30 -06:00
Natesh Sharma
f02c20d9f1 docs: admin-guide: Add information about intel_pstate active mode
Information about intel_pstate active mode is added in the doc.
This operation mode could be used to set on the hardware when it's
not activated. Status of the mode could be checked from sysfs file
i.e., /sys/devices/system/cpu/intel_pstate/status.
The information is already available in cpu-freq/intel-pstate.txt
documentation.

Signed-off-by: Natesh Sharma <nsharma@redhat.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
[jc: reformatted for width ]
Link: https://lore.kernel.org/r/20230427083706.49882-1-nsharma@redhat.com
2023-05-19 08:33:27 -06:00
Tejun Heo
6363845005 workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism
Workqueue now automatically marks per-cpu work items that hog CPU for too
long as CPU_INTENSIVE, which excludes them from concurrency management and
prevents stalling other concurrency-managed work items. If a work function
keeps running over the thershold, it likely needs to be switched to use an
unbound workqueue.

This patch adds a debug mechanism which tracks the work functions which
trigger the automatic CPU_INTENSIVE mechanism and report them using
pr_warn() with exponential backoff.

v3: Documentation update.

v2: Drop bouncing to kthread_worker for printing messages. It was to avoid
    introducing circular locking dependency through printk but not effective
    as it still had pool lock -> wci_lock -> printk -> pool lock loop. Let's
    just print directly using printk_deferred().

Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
2023-05-17 17:02:08 -10:00
Tejun Heo
616db8779b workqueue: Automatically mark CPU-hogging work items CPU_INTENSIVE
If a per-cpu work item hogs the CPU, it can prevent other work items from
starting through concurrency management. A per-cpu workqueue which intends
to host such CPU-hogging work items can choose to not participate in
concurrency management by setting %WQ_CPU_INTENSIVE; however, this can be
error-prone and difficult to debug when missed.

This patch adds an automatic CPU usage based detection. If a
concurrency-managed work item consumes more CPU time than the threshold
(10ms by default) continuously without intervening sleeps, wq_worker_tick()
which is called from scheduler_tick() will detect the condition and
automatically mark it CPU_INTENSIVE.

The mechanism isn't foolproof:

* Detection depends on tick hitting the work item. Getting preempted at the
  right timings may allow a violating work item to evade detection at least
  temporarily.

* nohz_full CPUs may not be running ticks and thus can fail detection.

* Even when detection is working, the 10ms detection delays can add up if
  many CPU-hogging work items are queued at the same time.

However, in vast majority of cases, this should be able to detect violations
reliably and provide reasonable protection with a small increase in code
complexity.

If some work items trigger this condition repeatedly, the bigger problem
likely is the CPU being saturated with such per-cpu work items and the
solution would be making them UNBOUND. The next patch will add a debug
mechanism to help spot such cases.

v4: Documentation for workqueue.cpu_intensive_thresh_us added to
    kernel-parameters.txt.

v3: Switch to use wq_worker_tick() instead of hooking into preemptions as
    suggested by Peter.

v2: Lai pointed out that wq_worker_stopping() also needs to be called from
    preemption and rtlock paths and an earlier patch was updated
    accordingly. This patch adds a comment describing the risk of infinte
    recursions and how they're avoided.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
2023-05-17 17:02:08 -10:00
Josh Poimboeuf
89da5a69a8 x86/unwind/orc: Add 'unwind_debug' cmdline option
Sometimes the one-line ORC unwinder warnings aren't very helpful.  Add a
new 'unwind_debug' cmdline option which will dump the full stack
contents of the current task when an error condition is encountered.

Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lore.kernel.org/r/6afb9e48a05fd2046bfad47e69b061b43dfd0e0e.1681331449.git.jpoimboe@kernel.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-05-16 06:31:50 -07:00