rk: revert to v4.19
Signed-off-by: Tao Huang <huangtao@rock-chips.com> Change-Id: I502dce68b639df4ebf5a1688e0dc2e5c5763ebc2
This commit is contained in:
4
.gitignore
vendored
4
.gitignore
vendored
@@ -48,10 +48,6 @@ modules.builtin
|
||||
#
|
||||
# Top-level generic files
|
||||
#
|
||||
/boot.img
|
||||
/kernel.img
|
||||
/resource.img
|
||||
/zboot.img
|
||||
/tags
|
||||
/TAGS
|
||||
/linux
|
||||
|
||||
@@ -6,8 +6,6 @@ Description:
|
||||
This file allows user to read/write the raw NVMEM contents.
|
||||
Permissions for write to this file depends on the nvmem
|
||||
provider configuration.
|
||||
Note: This file is only present if CONFIG_NVMEM_SYSFS
|
||||
is enabled
|
||||
|
||||
ex:
|
||||
hexdump /sys/bus/nvmem/devices/qfprom0/nvmem
|
||||
|
||||
@@ -12,10 +12,6 @@ Date: Dec 2014
|
||||
KernelVersion: 4.0
|
||||
Description: Control descriptors
|
||||
|
||||
All attributes read only:
|
||||
bInterfaceNumber - USB interface number for this
|
||||
streaming interface
|
||||
|
||||
What: /config/usb-gadget/gadget/functions/uvc.name/control/class
|
||||
Date: Dec 2014
|
||||
KernelVersion: 4.0
|
||||
@@ -113,10 +109,6 @@ Date: Dec 2014
|
||||
KernelVersion: 4.0
|
||||
Description: Streaming descriptors
|
||||
|
||||
All attributes read only:
|
||||
bInterfaceNumber - USB interface number for this
|
||||
streaming interface
|
||||
|
||||
What: /config/usb-gadget/gadget/functions/uvc.name/streaming/class
|
||||
Date: Dec 2014
|
||||
KernelVersion: 4.0
|
||||
@@ -168,10 +160,6 @@ Description: Specific MJPEG format descriptors
|
||||
|
||||
All attributes read only,
|
||||
except bmaControls and bDefaultFrameIndex:
|
||||
bFormatIndex - unique id for this format descriptor;
|
||||
only defined after parent header is
|
||||
linked into the streaming class;
|
||||
read-only
|
||||
bmaControls - this format's data for bmaControls in
|
||||
the streaming header
|
||||
bmInterfaceFlags - specifies interlace information,
|
||||
@@ -189,10 +177,6 @@ Date: Dec 2014
|
||||
KernelVersion: 4.0
|
||||
Description: Specific MJPEG frame descriptors
|
||||
|
||||
bFrameIndex - unique id for this framedescriptor;
|
||||
only defined after parent format is
|
||||
linked into the streaming header;
|
||||
read-only
|
||||
dwFrameInterval - indicates how frame interval can be
|
||||
programmed; a number of values
|
||||
separated by newline can be specified
|
||||
@@ -220,10 +204,6 @@ Date: Dec 2014
|
||||
KernelVersion: 4.0
|
||||
Description: Specific uncompressed format descriptors
|
||||
|
||||
bFormatIndex - unique id for this format descriptor;
|
||||
only defined after parent header is
|
||||
linked into the streaming class;
|
||||
read-only
|
||||
bmaControls - this format's data for bmaControls in
|
||||
the streaming header
|
||||
bmInterfaceFlags - specifies interlace information,
|
||||
@@ -244,10 +224,6 @@ Date: Dec 2014
|
||||
KernelVersion: 4.0
|
||||
Description: Specific uncompressed frame descriptors
|
||||
|
||||
bFrameIndex - unique id for this framedescriptor;
|
||||
only defined after parent format is
|
||||
linked into the streaming header;
|
||||
read-only
|
||||
dwFrameInterval - indicates how frame interval can be
|
||||
programmed; a number of values
|
||||
separated by newline can be specified
|
||||
|
||||
@@ -1,16 +0,0 @@
|
||||
What: /proc/uid_concurrent_active_time
|
||||
Date: December 2018
|
||||
Contact: Connor O'Brien <connoro@google.com>
|
||||
Description:
|
||||
The /proc/uid_concurrent_active_time file displays aggregated cputime
|
||||
numbers for each uid, broken down by the total number of cores that were
|
||||
active while the uid's task was running.
|
||||
|
||||
What: /proc/uid_concurrent_policy_time
|
||||
Date: December 2018
|
||||
Contact: Connor O'Brien <connoro@google.com>
|
||||
Description:
|
||||
The /proc/uid_concurrent_policy_time file displays aggregated cputime
|
||||
numbers for each uid, broken down based on the cpufreq policy
|
||||
of the core used by the uid's task and the number of cores associated
|
||||
with that policy that were active while the uid's task was running.
|
||||
@@ -98,42 +98,3 @@ Description:
|
||||
The backing_dev file is read-write and set up backing
|
||||
device for zram to write incompressible pages.
|
||||
For using, user should enable CONFIG_ZRAM_WRITEBACK.
|
||||
|
||||
What: /sys/block/zram<id>/idle
|
||||
Date: November 2018
|
||||
Contact: Minchan Kim <minchan@kernel.org>
|
||||
Description:
|
||||
idle file is write-only and mark zram slot as idle.
|
||||
If system has mounted debugfs, user can see which slots
|
||||
are idle via /sys/kernel/debug/zram/zram<id>/block_state
|
||||
|
||||
What: /sys/block/zram<id>/writeback
|
||||
Date: November 2018
|
||||
Contact: Minchan Kim <minchan@kernel.org>
|
||||
Description:
|
||||
The writeback file is write-only and trigger idle and/or
|
||||
huge page writeback to backing device.
|
||||
|
||||
What: /sys/block/zram<id>/bd_stat
|
||||
Date: November 2018
|
||||
Contact: Minchan Kim <minchan@kernel.org>
|
||||
Description:
|
||||
The bd_stat file is read-only and represents backing device's
|
||||
statistics (bd_count, bd_reads, bd_writes) in a format
|
||||
similar to block layer statistics file format.
|
||||
|
||||
What: /sys/block/zram<id>/writeback_limit_enable
|
||||
Date: November 2018
|
||||
Contact: Minchan Kim <minchan@kernel.org>
|
||||
Description:
|
||||
The writeback_limit_enable file is read-write and specifies
|
||||
eanbe of writeback_limit feature. "1" means eable the feature.
|
||||
No limit "0" is the initial state.
|
||||
|
||||
What: /sys/block/zram<id>/writeback_limit
|
||||
Date: November 2018
|
||||
Contact: Minchan Kim <minchan@kernel.org>
|
||||
Description:
|
||||
The writeback_limit file is read-write and specifies the maximum
|
||||
amount of writeback ZRAM can do. The limit could be changed
|
||||
in run time.
|
||||
|
||||
@@ -199,7 +199,7 @@ Description:
|
||||
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_positionrelative_x_raw
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_positionrelative_y_raw
|
||||
KernelVersion: 4.19
|
||||
KernelVersion: 4.18
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
Relative position in direction x or y on a pad (may be
|
||||
@@ -1559,8 +1559,7 @@ What: /sys/bus/iio/devices/iio:deviceX/in_concentrationX_voc_raw
|
||||
KernelVersion: 4.3
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
Raw (unscaled no offset etc.) reading of a substance. Units
|
||||
after application of scale and offset are percents.
|
||||
Raw (unscaled no offset etc.) percentage reading of a substance.
|
||||
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_resistance_raw
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_resistanceX_raw
|
||||
|
||||
@@ -4,7 +4,7 @@ KernelVersion: 3.10
|
||||
Contact: Samuel Ortiz <sameo@linux.intel.com>
|
||||
linux-mei@linux.intel.com
|
||||
Description: Stores the same MODALIAS value emitted by uevent
|
||||
Format: mei:<mei device name>:<device uuid>:<protocol version>
|
||||
Format: mei:<mei device name>:<device uuid>:
|
||||
|
||||
What: /sys/bus/mei/devices/.../name
|
||||
Date: May 2015
|
||||
|
||||
@@ -7,13 +7,6 @@ Description:
|
||||
The name of devfreq object denoted as ... is same as the
|
||||
name of device using devfreq.
|
||||
|
||||
What: /sys/class/devfreq/.../name
|
||||
Date: November 2019
|
||||
Contact: Chanwoo Choi <cw00.choi@samsung.com>
|
||||
Description:
|
||||
The /sys/class/devfreq/.../name shows the name of device
|
||||
of the corresponding devfreq object.
|
||||
|
||||
What: /sys/class/devfreq/.../governor
|
||||
Date: September 2011
|
||||
Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
|
||||
|
||||
@@ -29,7 +29,7 @@ Contact: Bjørn Mork <bjorn@mork.no>
|
||||
Description:
|
||||
Unsigned integer.
|
||||
|
||||
Write a number ranging from 1 to 254 to add a qmap mux
|
||||
Write a number ranging from 1 to 127 to add a qmap mux
|
||||
based network device, supported by recent Qualcomm based
|
||||
modems.
|
||||
|
||||
@@ -46,5 +46,5 @@ Contact: Bjørn Mork <bjorn@mork.no>
|
||||
Description:
|
||||
Unsigned integer.
|
||||
|
||||
Write a number ranging from 1 to 254 to delete a previously
|
||||
Write a number ranging from 1 to 127 to delete a previously
|
||||
created qmap mux based network device.
|
||||
|
||||
@@ -144,8 +144,7 @@ Description:
|
||||
Access: Read
|
||||
Valid values: "Unknown", "Good", "Overheat", "Dead",
|
||||
"Over voltage", "Unspecified failure", "Cold",
|
||||
"Watchdog timer expire", "Safety timer expire",
|
||||
"Over current", "Warm", "Cool", "Hot"
|
||||
"Watchdog timer expire", "Safety timer expire"
|
||||
|
||||
What: /sys/class/power_supply/<supply_name>/precharge_current
|
||||
Date: June 2017
|
||||
|
||||
@@ -1,76 +0,0 @@
|
||||
What: /sys/class/wakeup/
|
||||
Date: June 2019
|
||||
Contact: Tri Vo <trong@android.com>
|
||||
Description:
|
||||
The /sys/class/wakeup/ directory contains pointers to all
|
||||
wakeup sources in the kernel at that moment in time.
|
||||
|
||||
What: /sys/class/wakeup/.../name
|
||||
Date: June 2019
|
||||
Contact: Tri Vo <trong@android.com>
|
||||
Description:
|
||||
This file contains the name of the wakeup source.
|
||||
|
||||
What: /sys/class/wakeup/.../active_count
|
||||
Date: June 2019
|
||||
Contact: Tri Vo <trong@android.com>
|
||||
Description:
|
||||
This file contains the number of times the wakeup source was
|
||||
activated.
|
||||
|
||||
What: /sys/class/wakeup/.../event_count
|
||||
Date: June 2019
|
||||
Contact: Tri Vo <trong@android.com>
|
||||
Description:
|
||||
This file contains the number of signaled wakeup events
|
||||
associated with the wakeup source.
|
||||
|
||||
What: /sys/class/wakeup/.../wakeup_count
|
||||
Date: June 2019
|
||||
Contact: Tri Vo <trong@android.com>
|
||||
Description:
|
||||
This file contains the number of times the wakeup source might
|
||||
abort suspend.
|
||||
|
||||
What: /sys/class/wakeup/.../expire_count
|
||||
Date: June 2019
|
||||
Contact: Tri Vo <trong@android.com>
|
||||
Description:
|
||||
This file contains the number of times the wakeup source's
|
||||
timeout has expired.
|
||||
|
||||
What: /sys/class/wakeup/.../active_time_ms
|
||||
Date: June 2019
|
||||
Contact: Tri Vo <trong@android.com>
|
||||
Description:
|
||||
This file contains the amount of time the wakeup source has
|
||||
been continuously active, in milliseconds. If the wakeup
|
||||
source is not active, this file contains '0'.
|
||||
|
||||
What: /sys/class/wakeup/.../total_time_ms
|
||||
Date: June 2019
|
||||
Contact: Tri Vo <trong@android.com>
|
||||
Description:
|
||||
This file contains the total amount of time this wakeup source
|
||||
has been active, in milliseconds.
|
||||
|
||||
What: /sys/class/wakeup/.../max_time_ms
|
||||
Date: June 2019
|
||||
Contact: Tri Vo <trong@android.com>
|
||||
Description:
|
||||
This file contains the maximum amount of time this wakeup
|
||||
source has been continuously active, in milliseconds.
|
||||
|
||||
What: /sys/class/wakeup/.../last_change_ms
|
||||
Date: June 2019
|
||||
Contact: Tri Vo <trong@android.com>
|
||||
Description:
|
||||
This file contains the monotonic clock time when the wakeup
|
||||
source was touched last time, in milliseconds.
|
||||
|
||||
What: /sys/class/wakeup/.../prevent_suspend_time_ms
|
||||
Date: June 2019
|
||||
Contact: Tri Vo <trong@android.com>
|
||||
Description:
|
||||
The file contains the total amount of time this wakeup source
|
||||
has been preventing autosleep, in milliseconds.
|
||||
@@ -477,10 +477,6 @@ What: /sys/devices/system/cpu/vulnerabilities
|
||||
/sys/devices/system/cpu/vulnerabilities/spectre_v2
|
||||
/sys/devices/system/cpu/vulnerabilities/spec_store_bypass
|
||||
/sys/devices/system/cpu/vulnerabilities/l1tf
|
||||
/sys/devices/system/cpu/vulnerabilities/mds
|
||||
/sys/devices/system/cpu/vulnerabilities/srbds
|
||||
/sys/devices/system/cpu/vulnerabilities/tsx_async_abort
|
||||
/sys/devices/system/cpu/vulnerabilities/itlb_multihit
|
||||
Date: January 2018
|
||||
Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
|
||||
Description: Information about CPU vulnerabilities
|
||||
@@ -493,7 +489,8 @@ Description: Information about CPU vulnerabilities
|
||||
"Vulnerable" CPU is affected and no mitigation in effect
|
||||
"Mitigation: $M" CPU is affected and mitigation $M is in effect
|
||||
|
||||
See also: Documentation/admin-guide/hw-vuln/index.rst
|
||||
Details about the l1tf file can be found in
|
||||
Documentation/admin-guide/l1tf.rst
|
||||
|
||||
What: /sys/devices/system/cpu/smt
|
||||
/sys/devices/system/cpu/smt/active
|
||||
|
||||
@@ -1,349 +1,214 @@
|
||||
What: /sys/fs/f2fs/<disk>/gc_max_sleep_time
|
||||
Date: July 2013
|
||||
Contact: "Namjae Jeon" <namjae.jeon@samsung.com>
|
||||
Description: Controls the maximum sleep time for gc_thread. Time
|
||||
is in milliseconds.
|
||||
Description:
|
||||
Controls the maximun sleep time for gc_thread. Time
|
||||
is in milliseconds.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/gc_min_sleep_time
|
||||
Date: July 2013
|
||||
Contact: "Namjae Jeon" <namjae.jeon@samsung.com>
|
||||
Description: Controls the minimum sleep time for gc_thread. Time
|
||||
is in milliseconds.
|
||||
Description:
|
||||
Controls the minimum sleep time for gc_thread. Time
|
||||
is in milliseconds.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/gc_no_gc_sleep_time
|
||||
Date: July 2013
|
||||
Contact: "Namjae Jeon" <namjae.jeon@samsung.com>
|
||||
Description: Controls the default sleep time for gc_thread. Time
|
||||
is in milliseconds.
|
||||
Description:
|
||||
Controls the default sleep time for gc_thread. Time
|
||||
is in milliseconds.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/gc_idle
|
||||
Date: July 2013
|
||||
Contact: "Namjae Jeon" <namjae.jeon@samsung.com>
|
||||
Description: Controls the victim selection policy for garbage collection.
|
||||
Setting gc_idle = 0(default) will disable this option. Setting
|
||||
gc_idle = 1 will select the Cost Benefit approach & setting
|
||||
gc_idle = 2 will select the greedy approach.
|
||||
Description:
|
||||
Controls the victim selection policy for garbage collection.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/reclaim_segments
|
||||
Date: October 2013
|
||||
Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com>
|
||||
Description: This parameter controls the number of prefree segments to be
|
||||
reclaimed. If the number of prefree segments is larger than
|
||||
the number of segments in the proportion to the percentage
|
||||
over total volume size, f2fs tries to conduct checkpoint to
|
||||
reclaim the prefree segments to free segments.
|
||||
By default, 5% over total # of segments.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/main_blkaddr
|
||||
Date: November 2019
|
||||
Contact: "Ramon Pantin" <pantin@google.com>
|
||||
Description:
|
||||
Shows first block address of MAIN area.
|
||||
Controls the issue rate of segment discard commands.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/ipu_policy
|
||||
Date: November 2013
|
||||
Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com>
|
||||
Description: Controls the in-place-update policy.
|
||||
updates in f2fs. User can set:
|
||||
0x01: F2FS_IPU_FORCE, 0x02: F2FS_IPU_SSR,
|
||||
0x04: F2FS_IPU_UTIL, 0x08: F2FS_IPU_SSR_UTIL,
|
||||
0x10: F2FS_IPU_FSYNC, 0x20: F2FS_IPU_ASYNC,
|
||||
0x40: F2FS_IPU_NOCACHE.
|
||||
Refer segment.h for details.
|
||||
Description:
|
||||
Controls the in-place-update policy.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/min_ipu_util
|
||||
Date: November 2013
|
||||
Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com>
|
||||
Description: Controls the FS utilization condition for the in-place-update
|
||||
policies. It is used by F2FS_IPU_UTIL and F2FS_IPU_SSR_UTIL policies.
|
||||
Description:
|
||||
Controls the FS utilization condition for the in-place-update
|
||||
policies.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/min_fsync_blocks
|
||||
Date: September 2014
|
||||
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
|
||||
Description: Controls the dirty page count condition for the in-place-update
|
||||
policies.
|
||||
Description:
|
||||
Controls the dirty page count condition for the in-place-update
|
||||
policies.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/min_seq_blocks
|
||||
Date: August 2018
|
||||
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
|
||||
Description: Controls the dirty page count condition for batched sequential
|
||||
writes in writepages.
|
||||
Description:
|
||||
Controls the dirty page count condition for batched sequential
|
||||
writes in ->writepages.
|
||||
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/min_hot_blocks
|
||||
Date: March 2017
|
||||
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
|
||||
Description: Controls the dirty page count condition for redefining hot data.
|
||||
Description:
|
||||
Controls the dirty page count condition for redefining hot data.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/min_ssr_sections
|
||||
Date: October 2017
|
||||
Contact: "Chao Yu" <yuchao0@huawei.com>
|
||||
Description: Controls the free section threshold to trigger SSR allocation.
|
||||
If this is large, SSR mode will be enabled early.
|
||||
Description:
|
||||
Controls the fee section threshold to trigger SSR allocation.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/max_small_discards
|
||||
Date: November 2013
|
||||
Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com>
|
||||
Description: Controls the issue rate of discard commands that consist of small
|
||||
blocks less than 2MB. The candidates to be discarded are cached until
|
||||
checkpoint is triggered, and issued during the checkpoint.
|
||||
By default, it is disabled with 0.
|
||||
Description:
|
||||
Controls the issue rate of small discard commands.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/discard_granularity
|
||||
Date: July 2017
|
||||
Contact: "Chao Yu" <yuchao0@huawei.com>
|
||||
Description: Controls discard granularity of inner discard thread. Inner thread
|
||||
What: /sys/fs/f2fs/<disk>/discard_granularity
|
||||
Date: July 2017
|
||||
Contact: "Chao Yu" <yuchao0@huawei.com>
|
||||
Description:
|
||||
Controls discard granularity of inner discard thread, inner thread
|
||||
will not issue discards with size that is smaller than granularity.
|
||||
The unit size is one block(4KB), now only support configuring
|
||||
in range of [1, 512]. Default value is 4(=16KB).
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/umount_discard_timeout
|
||||
Date: January 2019
|
||||
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
|
||||
Description: Set timeout to issue discard commands during umount.
|
||||
Default: 5 secs
|
||||
The unit size is one block, now only support configuring in range
|
||||
of [1, 512].
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/max_victim_search
|
||||
Date: January 2014
|
||||
Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com>
|
||||
Description: Controls the number of trials to find a victim segment
|
||||
when conducting SSR and cleaning operations. The default value
|
||||
is 4096 which covers 8GB block address range.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/migration_granularity
|
||||
Date: October 2018
|
||||
Contact: "Chao Yu" <yuchao0@huawei.com>
|
||||
Description: Controls migration granularity of garbage collection on large
|
||||
section, it can let GC move partial segment{s} of one section
|
||||
in one GC cycle, so that dispersing heavy overhead GC to
|
||||
multiple lightweight one.
|
||||
Description:
|
||||
Controls the number of trials to find a victim segment.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/dir_level
|
||||
Date: March 2014
|
||||
Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com>
|
||||
Description: Controls the directory level for large directory. If a
|
||||
directory has a number of files, it can reduce the file lookup
|
||||
latency by increasing this dir_level value. Otherwise, it
|
||||
needs to decrease this value to reduce the space overhead.
|
||||
The default value is 0.
|
||||
Description:
|
||||
Controls the directory level for large directory.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/ram_thresh
|
||||
Date: March 2014
|
||||
Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com>
|
||||
Description: Controls the memory footprint used by free nids and cached
|
||||
nat entries. By default, 1 is set, which indicates
|
||||
10 MB / 1 GB RAM.
|
||||
Description:
|
||||
Controls the memory footprint used by f2fs.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/batched_trim_sections
|
||||
Date: February 2015
|
||||
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
|
||||
Description: Controls the trimming rate in batch mode.
|
||||
<deprecated>
|
||||
Description:
|
||||
Controls the trimming rate in batch mode.
|
||||
<deprecated>
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/cp_interval
|
||||
Date: October 2015
|
||||
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
|
||||
Description: Controls the checkpoint timing, set to 60 seconds by default.
|
||||
Description:
|
||||
Controls the checkpoint timing.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/idle_interval
|
||||
Date: January 2016
|
||||
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
|
||||
Description: Controls the idle timing of system, if there is no FS operation
|
||||
during given interval.
|
||||
Set to 5 seconds by default.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/discard_idle_interval
|
||||
Date: September 2018
|
||||
Contact: "Chao Yu" <yuchao0@huawei.com>
|
||||
Contact: "Sahitya Tummala" <stummala@codeaurora.org>
|
||||
Description: Controls the idle timing of discard thread given
|
||||
this time interval.
|
||||
Default is 5 secs.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/gc_idle_interval
|
||||
Date: September 2018
|
||||
Contact: "Chao Yu" <yuchao0@huawei.com>
|
||||
Contact: "Sahitya Tummala" <stummala@codeaurora.org>
|
||||
Description: Controls the idle timing for gc path. Set to 5 seconds by default.
|
||||
Description:
|
||||
Controls the idle timing.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/iostat_enable
|
||||
Date: August 2017
|
||||
Contact: "Chao Yu" <yuchao0@huawei.com>
|
||||
Description: Controls to enable/disable IO stat.
|
||||
Description:
|
||||
Controls to enable/disable IO stat.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/ra_nid_pages
|
||||
Date: October 2015
|
||||
Contact: "Chao Yu" <chao2.yu@samsung.com>
|
||||
Description: Controls the count of nid pages to be readaheaded.
|
||||
When building free nids, F2FS reads NAT blocks ahead for
|
||||
speed up. Default is 0.
|
||||
Description:
|
||||
Controls the count of nid pages to be readaheaded.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/dirty_nats_ratio
|
||||
Date: January 2016
|
||||
Contact: "Chao Yu" <chao2.yu@samsung.com>
|
||||
Description: Controls dirty nat entries ratio threshold, if current
|
||||
ratio exceeds configured threshold, checkpoint will
|
||||
be triggered for flushing dirty nat entries.
|
||||
Description:
|
||||
Controls dirty nat entries ratio threshold, if current
|
||||
ratio exceeds configured threshold, checkpoint will
|
||||
be triggered for flushing dirty nat entries.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/lifetime_write_kbytes
|
||||
Date: January 2016
|
||||
Contact: "Shuoran Liu" <liushuoran@huawei.com>
|
||||
Description: Shows total written kbytes issued to disk.
|
||||
Description:
|
||||
Shows total written kbytes issued to disk.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/features
|
||||
Date: July 2017
|
||||
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
|
||||
Description: Shows all enabled features in current device.
|
||||
Description:
|
||||
Shows all enabled features in current device.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/inject_rate
|
||||
Date: May 2016
|
||||
Contact: "Sheng Yong" <shengyong1@huawei.com>
|
||||
Description: Controls the injection rate of arbitrary faults.
|
||||
Description:
|
||||
Controls the injection rate.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/inject_type
|
||||
Date: May 2016
|
||||
Contact: "Sheng Yong" <shengyong1@huawei.com>
|
||||
Description: Controls the injection type of arbitrary faults.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/dirty_segments
|
||||
Date: October 2017
|
||||
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
|
||||
Description: Shows the number of dirty segments.
|
||||
Description:
|
||||
Controls the injection type.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/reserved_blocks
|
||||
Date: June 2017
|
||||
Contact: "Chao Yu" <yuchao0@huawei.com>
|
||||
Description: Controls target reserved blocks in system, the threshold
|
||||
is soft, it could exceed current available user space.
|
||||
Description:
|
||||
Controls target reserved blocks in system, the threshold
|
||||
is soft, it could exceed current available user space.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/current_reserved_blocks
|
||||
Date: October 2017
|
||||
Contact: "Yunlong Song" <yunlong.song@huawei.com>
|
||||
Contact: "Chao Yu" <yuchao0@huawei.com>
|
||||
Description: Shows current reserved blocks in system, it may be temporarily
|
||||
smaller than target_reserved_blocks, but will gradually
|
||||
increase to target_reserved_blocks when more free blocks are
|
||||
freed by user later.
|
||||
Description:
|
||||
Shows current reserved blocks in system, it may be temporarily
|
||||
smaller than target_reserved_blocks, but will gradually
|
||||
increase to target_reserved_blocks when more free blocks are
|
||||
freed by user later.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/gc_urgent
|
||||
Date: August 2017
|
||||
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
|
||||
Description: Do background GC agressively when set. When gc_urgent = 1,
|
||||
background thread starts to do GC by given gc_urgent_sleep_time
|
||||
interval. It is set to 0 by default.
|
||||
Description:
|
||||
Do background GC agressively
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/gc_urgent_sleep_time
|
||||
Date: August 2017
|
||||
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
|
||||
Description: Controls sleep time of GC urgent mode. Set to 500ms by default.
|
||||
Description:
|
||||
Controls sleep time of GC urgent mode
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/readdir_ra
|
||||
Date: November 2017
|
||||
Contact: "Sheng Yong" <shengyong1@huawei.com>
|
||||
Description: Controls readahead inode block in readdir. Enabled by default.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/gc_pin_file_thresh
|
||||
Date: January 2018
|
||||
Contact: Jaegeuk Kim <jaegeuk@kernel.org>
|
||||
Description: This indicates how many GC can be failed for the pinned
|
||||
file. If it exceeds this, F2FS doesn't guarantee its pinning
|
||||
state. 2048 trials is set by default.
|
||||
Description:
|
||||
Controls readahead inode block in readdir.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/extension_list
|
||||
Date: Feburary 2018
|
||||
Contact: "Chao Yu" <yuchao0@huawei.com>
|
||||
Description: Used to control configure extension list:
|
||||
- Query: cat /sys/fs/f2fs/<disk>/extension_list
|
||||
- Add: echo '[h/c]extension' > /sys/fs/f2fs/<disk>/extension_list
|
||||
- Del: echo '[h/c]!extension' > /sys/fs/f2fs/<disk>/extension_list
|
||||
- [h] means add/del hot file extension
|
||||
- [c] means add/del cold file extension
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/unusable
|
||||
Date April 2019
|
||||
Contact: "Daniel Rosenberg" <drosen@google.com>
|
||||
Description: If checkpoint=disable, it displays the number of blocks that
|
||||
are unusable.
|
||||
If checkpoint=enable it displays the enumber of blocks that
|
||||
would be unusable if checkpoint=disable were to be set.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/encoding
|
||||
Date July 2019
|
||||
Contact: "Daniel Rosenberg" <drosen@google.com>
|
||||
Description: Displays name and version of the encoding set for the filesystem.
|
||||
If no encoding is set, displays (none)
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/free_segments
|
||||
Date: September 2019
|
||||
Contact: "Hridya Valsaraju" <hridya@google.com>
|
||||
Description: Number of free segments in disk.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/cp_foreground_calls
|
||||
Date: September 2019
|
||||
Contact: "Hridya Valsaraju" <hridya@google.com>
|
||||
Description: Number of checkpoint operations performed on demand. Available when
|
||||
CONFIG_F2FS_STAT_FS=y.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/cp_background_calls
|
||||
Date: September 2019
|
||||
Contact: "Hridya Valsaraju" <hridya@google.com>
|
||||
Description: Number of checkpoint operations performed in the background to
|
||||
free segments. Available when CONFIG_F2FS_STAT_FS=y.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/gc_foreground_calls
|
||||
Date: September 2019
|
||||
Contact: "Hridya Valsaraju" <hridya@google.com>
|
||||
Description: Number of garbage collection operations performed on demand.
|
||||
Available when CONFIG_F2FS_STAT_FS=y.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/gc_background_calls
|
||||
Date: September 2019
|
||||
Contact: "Hridya Valsaraju" <hridya@google.com>
|
||||
Description: Number of garbage collection operations triggered in background.
|
||||
Available when CONFIG_F2FS_STAT_FS=y.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/moved_blocks_foreground
|
||||
Date: September 2019
|
||||
Contact: "Hridya Valsaraju" <hridya@google.com>
|
||||
Description: Number of blocks moved by garbage collection in foreground.
|
||||
Available when CONFIG_F2FS_STAT_FS=y.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/moved_blocks_background
|
||||
Date: September 2019
|
||||
Contact: "Hridya Valsaraju" <hridya@google.com>
|
||||
Description: Number of blocks moved by garbage collection in background.
|
||||
Available when CONFIG_F2FS_STAT_FS=y.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/avg_vblocks
|
||||
Date: September 2019
|
||||
Contact: "Hridya Valsaraju" <hridya@google.com>
|
||||
Description: Average number of valid blocks.
|
||||
Available when CONFIG_F2FS_STAT_FS=y.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/mounted_time_sec
|
||||
Date: February 2020
|
||||
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
|
||||
Description: Show the mounted time in secs of this partition.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/data_io_flag
|
||||
Date: April 2020
|
||||
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
|
||||
Description: Give a way to attach REQ_META|FUA to data writes
|
||||
given temperature-based bits. Now the bits indicate:
|
||||
* REQ_META | REQ_FUA |
|
||||
* 5 | 4 | 3 | 2 | 1 | 0 |
|
||||
* Cold | Warm | Hot | Cold | Warm | Hot |
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/node_io_flag
|
||||
Date: June 2020
|
||||
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
|
||||
Description: Give a way to attach REQ_META|FUA to node writes
|
||||
given temperature-based bits. Now the bits indicate:
|
||||
* REQ_META | REQ_FUA |
|
||||
* 5 | 4 | 3 | 2 | 1 | 0 |
|
||||
* Cold | Warm | Hot | Cold | Warm | Hot |
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/iostat_period_ms
|
||||
Date: April 2020
|
||||
Contact: "Daeho Jeong" <daehojeong@google.com>
|
||||
Description: Give a way to change iostat_period time. 3secs by default.
|
||||
The new iostat trace gives stats gap given the period.
|
||||
Description:
|
||||
Used to control configure extension list:
|
||||
- Query: cat /sys/fs/f2fs/<disk>/extension_list
|
||||
- Add: echo '[h/c]extension' > /sys/fs/f2fs/<disk>/extension_list
|
||||
- Del: echo '[h/c]!extension' > /sys/fs/f2fs/<disk>/extension_list
|
||||
- [h] means add/del hot file extension
|
||||
- [c] means add/del cold file extension
|
||||
|
||||
@@ -1,27 +0,0 @@
|
||||
What: /sys/kernel/ion
|
||||
Date: Dec 2019
|
||||
KernelVersion: 4.14.158
|
||||
Contact: Suren Baghdasaryan <surenb@google.com>,
|
||||
Sandeep Patil <sspatil@google.com>
|
||||
Description:
|
||||
The /sys/kernel/ion directory contains a snapshot of the
|
||||
internal state of ION memory heaps and pools.
|
||||
Users: kernel memory tuning tools
|
||||
|
||||
What: /sys/kernel/ion/total_heaps_kb
|
||||
Date: Dec 2019
|
||||
KernelVersion: 4.14.158
|
||||
Contact: Suren Baghdasaryan <surenb@google.com>,
|
||||
Sandeep Patil <sspatil@google.com>
|
||||
Description:
|
||||
The total_heaps_kb file is read-only and specifies how much
|
||||
memory in Kb is allocated to ION heaps.
|
||||
|
||||
What: /sys/kernel/ion/total_pools_kb
|
||||
Date: Dec 2019
|
||||
KernelVersion: 4.14.158
|
||||
Contact: Suren Baghdasaryan <surenb@google.com>,
|
||||
Sandeep Patil <sspatil@google.com>
|
||||
Description:
|
||||
The total_pools_kb file is read-only and specifies how much
|
||||
memory in Kb is allocated to ION pools.
|
||||
@@ -1,16 +0,0 @@
|
||||
What: /sys/kernel/wakeup_reasons/last_resume_reason
|
||||
Date: February 2014
|
||||
Contact: Ruchi Kandoi <kandoiruchi@google.com>
|
||||
Description:
|
||||
The /sys/kernel/wakeup_reasons/last_resume_reason is
|
||||
used to report wakeup reasons after system exited suspend.
|
||||
|
||||
What: /sys/kernel/wakeup_reasons/last_suspend_time
|
||||
Date: March 2015
|
||||
Contact: jinqian <jinqian@google.com>
|
||||
Description:
|
||||
The /sys/kernel/wakeup_reasons/last_suspend_time is
|
||||
used to report time spent in last suspend cycle. It contains
|
||||
two numbers (in seconds) separated by space. First number is
|
||||
the time spent in suspend and resume processes. Second number
|
||||
is the time spent in sleep state.
|
||||
@@ -300,110 +300,4 @@ Description:
|
||||
attempt.
|
||||
|
||||
Using this sysfs file will override any values that were
|
||||
set using the kernel command line for disk offset.
|
||||
|
||||
What: /sys/power/suspend_stats
|
||||
Date: July 2019
|
||||
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
|
||||
Description:
|
||||
The /sys/power/suspend_stats directory contains suspend related
|
||||
statistics.
|
||||
|
||||
What: /sys/power/suspend_stats/success
|
||||
Date: July 2019
|
||||
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
|
||||
Description:
|
||||
The /sys/power/suspend_stats/success file contains the number
|
||||
of times entering system sleep state succeeded.
|
||||
|
||||
What: /sys/power/suspend_stats/fail
|
||||
Date: July 2019
|
||||
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
|
||||
Description:
|
||||
The /sys/power/suspend_stats/fail file contains the number
|
||||
of times entering system sleep state failed.
|
||||
|
||||
What: /sys/power/suspend_stats/failed_freeze
|
||||
Date: July 2019
|
||||
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
|
||||
Description:
|
||||
The /sys/power/suspend_stats/failed_freeze file contains the
|
||||
number of times freezing processes failed.
|
||||
|
||||
What: /sys/power/suspend_stats/failed_prepare
|
||||
Date: July 2019
|
||||
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
|
||||
Description:
|
||||
The /sys/power/suspend_stats/failed_prepare file contains the
|
||||
number of times preparing all non-sysdev devices for
|
||||
a system PM transition failed.
|
||||
|
||||
What: /sys/power/suspend_stats/failed_resume
|
||||
Date: July 2019
|
||||
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
|
||||
Description:
|
||||
The /sys/power/suspend_stats/failed_resume file contains the
|
||||
number of times executing "resume" callbacks of
|
||||
non-sysdev devices failed.
|
||||
|
||||
What: /sys/power/suspend_stats/failed_resume_early
|
||||
Date: July 2019
|
||||
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
|
||||
Description:
|
||||
The /sys/power/suspend_stats/failed_resume_early file contains
|
||||
the number of times executing "early resume" callbacks
|
||||
of devices failed.
|
||||
|
||||
What: /sys/power/suspend_stats/failed_resume_noirq
|
||||
Date: July 2019
|
||||
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
|
||||
Description:
|
||||
The /sys/power/suspend_stats/failed_resume_noirq file contains
|
||||
the number of times executing "noirq resume" callbacks
|
||||
of devices failed.
|
||||
|
||||
What: /sys/power/suspend_stats/failed_suspend
|
||||
Date: July 2019
|
||||
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
|
||||
Description:
|
||||
The /sys/power/suspend_stats/failed_suspend file contains
|
||||
the number of times executing "suspend" callbacks
|
||||
of all non-sysdev devices failed.
|
||||
|
||||
What: /sys/power/suspend_stats/failed_suspend_late
|
||||
Date: July 2019
|
||||
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
|
||||
Description:
|
||||
The /sys/power/suspend_stats/failed_suspend_late file contains
|
||||
the number of times executing "late suspend" callbacks
|
||||
of all devices failed.
|
||||
|
||||
What: /sys/power/suspend_stats/failed_suspend_noirq
|
||||
Date: July 2019
|
||||
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
|
||||
Description:
|
||||
The /sys/power/suspend_stats/failed_suspend_noirq file contains
|
||||
the number of times executing "noirq suspend" callbacks
|
||||
of all devices failed.
|
||||
|
||||
What: /sys/power/suspend_stats/last_failed_dev
|
||||
Date: July 2019
|
||||
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
|
||||
Description:
|
||||
The /sys/power/suspend_stats/last_failed_dev file contains
|
||||
the last device for which a suspend/resume callback failed.
|
||||
|
||||
What: /sys/power/suspend_stats/last_failed_errno
|
||||
Date: July 2019
|
||||
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
|
||||
Description:
|
||||
The /sys/power/suspend_stats/last_failed_errno file contains
|
||||
the errno of the last failed attempt at entering
|
||||
system sleep state.
|
||||
|
||||
What: /sys/power/suspend_stats/last_failed_step
|
||||
Date: July 2019
|
||||
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
|
||||
Description:
|
||||
The /sys/power/suspend_stats/last_failed_step file contains
|
||||
the last failed step in the suspend/resume path.
|
||||
set using the kernel command line for disk offset.
|
||||
@@ -1,180 +0,0 @@
|
||||
================================
|
||||
PSI - Pressure Stall Information
|
||||
================================
|
||||
|
||||
:Date: April, 2018
|
||||
:Author: Johannes Weiner <hannes@cmpxchg.org>
|
||||
|
||||
When CPU, memory or IO devices are contended, workloads experience
|
||||
latency spikes, throughput losses, and run the risk of OOM kills.
|
||||
|
||||
Without an accurate measure of such contention, users are forced to
|
||||
either play it safe and under-utilize their hardware resources, or
|
||||
roll the dice and frequently suffer the disruptions resulting from
|
||||
excessive overcommit.
|
||||
|
||||
The psi feature identifies and quantifies the disruptions caused by
|
||||
such resource crunches and the time impact it has on complex workloads
|
||||
or even entire systems.
|
||||
|
||||
Having an accurate measure of productivity losses caused by resource
|
||||
scarcity aids users in sizing workloads to hardware--or provisioning
|
||||
hardware according to workload demand.
|
||||
|
||||
As psi aggregates this information in realtime, systems can be managed
|
||||
dynamically using techniques such as load shedding, migrating jobs to
|
||||
other systems or data centers, or strategically pausing or killing low
|
||||
priority or restartable batch jobs.
|
||||
|
||||
This allows maximizing hardware utilization without sacrificing
|
||||
workload health or risking major disruptions such as OOM kills.
|
||||
|
||||
Pressure interface
|
||||
==================
|
||||
|
||||
Pressure information for each resource is exported through the
|
||||
respective file in /proc/pressure/ -- cpu, memory, and io.
|
||||
|
||||
The format for CPU is as such:
|
||||
|
||||
some avg10=0.00 avg60=0.00 avg300=0.00 total=0
|
||||
|
||||
and for memory and IO:
|
||||
|
||||
some avg10=0.00 avg60=0.00 avg300=0.00 total=0
|
||||
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
|
||||
|
||||
The "some" line indicates the share of time in which at least some
|
||||
tasks are stalled on a given resource.
|
||||
|
||||
The "full" line indicates the share of time in which all non-idle
|
||||
tasks are stalled on a given resource simultaneously. In this state
|
||||
actual CPU cycles are going to waste, and a workload that spends
|
||||
extended time in this state is considered to be thrashing. This has
|
||||
severe impact on performance, and it's useful to distinguish this
|
||||
situation from a state where some tasks are stalled but the CPU is
|
||||
still doing productive work. As such, time spent in this subset of the
|
||||
stall state is tracked separately and exported in the "full" averages.
|
||||
|
||||
The ratios are tracked as recent trends over ten, sixty, and three
|
||||
hundred second windows, which gives insight into short term events as
|
||||
well as medium and long term trends. The total absolute stall time is
|
||||
tracked and exported as well, to allow detection of latency spikes
|
||||
which wouldn't necessarily make a dent in the time averages, or to
|
||||
average trends over custom time frames.
|
||||
|
||||
Monitoring for pressure thresholds
|
||||
==================================
|
||||
|
||||
Users can register triggers and use poll() to be woken up when resource
|
||||
pressure exceeds certain thresholds.
|
||||
|
||||
A trigger describes the maximum cumulative stall time over a specific
|
||||
time window, e.g. 100ms of total stall time within any 500ms window to
|
||||
generate a wakeup event.
|
||||
|
||||
To register a trigger user has to open psi interface file under
|
||||
/proc/pressure/ representing the resource to be monitored and write the
|
||||
desired threshold and time window. The open file descriptor should be
|
||||
used to wait for trigger events using select(), poll() or epoll().
|
||||
The following format is used:
|
||||
|
||||
<some|full> <stall amount in us> <time window in us>
|
||||
|
||||
For example writing "some 150000 1000000" into /proc/pressure/memory
|
||||
would add 150ms threshold for partial memory stall measured within
|
||||
1sec time window. Writing "full 50000 1000000" into /proc/pressure/io
|
||||
would add 50ms threshold for full io stall measured within 1sec time window.
|
||||
|
||||
Triggers can be set on more than one psi metric and more than one trigger
|
||||
for the same psi metric can be specified. However for each trigger a separate
|
||||
file descriptor is required to be able to poll it separately from others,
|
||||
therefore for each trigger a separate open() syscall should be made even
|
||||
when opening the same psi interface file.
|
||||
|
||||
Monitors activate only when system enters stall state for the monitored
|
||||
psi metric and deactivates upon exit from the stall state. While system is
|
||||
in the stall state psi signal growth is monitored at a rate of 10 times per
|
||||
tracking window.
|
||||
|
||||
The kernel accepts window sizes ranging from 500ms to 10s, therefore min
|
||||
monitoring update interval is 50ms and max is 1s. Min limit is set to
|
||||
prevent overly frequent polling. Max limit is chosen as a high enough number
|
||||
after which monitors are most likely not needed and psi averages can be used
|
||||
instead.
|
||||
|
||||
When activated, psi monitor stays active for at least the duration of one
|
||||
tracking window to avoid repeated activations/deactivations when system is
|
||||
bouncing in and out of the stall state.
|
||||
|
||||
Notifications to the userspace are rate-limited to one per tracking window.
|
||||
|
||||
The trigger will de-register when the file descriptor used to define the
|
||||
trigger is closed.
|
||||
|
||||
Userspace monitor usage example
|
||||
===============================
|
||||
|
||||
#include <errno.h>
|
||||
#include <fcntl.h>
|
||||
#include <stdio.h>
|
||||
#include <poll.h>
|
||||
#include <string.h>
|
||||
#include <unistd.h>
|
||||
|
||||
/*
|
||||
* Monitor memory partial stall with 1s tracking window size
|
||||
* and 150ms threshold.
|
||||
*/
|
||||
int main() {
|
||||
const char trig[] = "some 150000 1000000";
|
||||
struct pollfd fds;
|
||||
int n;
|
||||
|
||||
fds.fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);
|
||||
if (fds.fd < 0) {
|
||||
printf("/proc/pressure/memory open error: %s\n",
|
||||
strerror(errno));
|
||||
return 1;
|
||||
}
|
||||
fds.events = POLLPRI;
|
||||
|
||||
if (write(fds.fd, trig, strlen(trig) + 1) < 0) {
|
||||
printf("/proc/pressure/memory write error: %s\n",
|
||||
strerror(errno));
|
||||
return 1;
|
||||
}
|
||||
|
||||
printf("waiting for events...\n");
|
||||
while (1) {
|
||||
n = poll(&fds, 1, -1);
|
||||
if (n < 0) {
|
||||
printf("poll error: %s\n", strerror(errno));
|
||||
return 1;
|
||||
}
|
||||
if (fds.revents & POLLERR) {
|
||||
printf("got POLLERR, event source is gone\n");
|
||||
return 0;
|
||||
}
|
||||
if (fds.revents & POLLPRI) {
|
||||
printf("event triggered!\n");
|
||||
} else {
|
||||
printf("unknown event received: 0x%x\n", fds.revents);
|
||||
return 1;
|
||||
}
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
Cgroup2 interface
|
||||
=================
|
||||
|
||||
In a system with a CONFIG_CGROUP=y kernel and the cgroup2 filesystem
|
||||
mounted, pressure stall information is also tracked for tasks grouped
|
||||
into cgroups. Each subdirectory in the cgroupfs mountpoint contains
|
||||
cpu.pressure, memory.pressure, and io.pressure files; the format is
|
||||
the same as the /proc/pressure/ files.
|
||||
|
||||
Per-cgroup psi monitors can be specified and used the same way as
|
||||
system-wide ones.
|
||||
@@ -694,12 +694,6 @@ Conventions
|
||||
informational files on the root cgroup which end up showing global
|
||||
information available elsewhere shouldn't exist.
|
||||
|
||||
- The default time unit is microseconds. If a different unit is ever
|
||||
used, an explicit unit suffix must be present.
|
||||
|
||||
- A parts-per quantity should use a percentage decimal with at least
|
||||
two digit fractional part - e.g. 13.40.
|
||||
|
||||
- If a controller implements weight based resource distribution, its
|
||||
interface file should be named "weight" and have the range [1,
|
||||
10000] with 100 as the default. The values are chosen to allow
|
||||
@@ -913,13 +907,6 @@ controller implements weight and absolute bandwidth limit models for
|
||||
normal scheduling policy and absolute bandwidth allocation model for
|
||||
realtime scheduling policy.
|
||||
|
||||
In all the above models, cycles distribution is defined only on a temporal
|
||||
base and it does not account for the frequency at which tasks are executed.
|
||||
The (optional) utilization clamping support allows to hint the schedutil
|
||||
cpufreq governor about the minimum desired frequency which should always be
|
||||
provided by a CPU, as well as the maximum desired frequency, which should not
|
||||
be exceeded by a CPU.
|
||||
|
||||
WARNING: cgroup2 doesn't yet support control of realtime processes and
|
||||
the cpu controller can only be enabled when all RT processes are in
|
||||
the root cgroup. Be aware that system management software may already
|
||||
@@ -979,39 +966,6 @@ All time durations are in microseconds.
|
||||
$PERIOD duration. "max" for $MAX indicates no limit. If only
|
||||
one number is written, $MAX is updated.
|
||||
|
||||
cpu.pressure
|
||||
A read-only nested-key file which exists on non-root cgroups.
|
||||
|
||||
Shows pressure stall information for CPU. See
|
||||
Documentation/accounting/psi.txt for details.
|
||||
|
||||
cpu.uclamp.min
|
||||
A read-write single value file which exists on non-root cgroups.
|
||||
The default is "0", i.e. no utilization boosting.
|
||||
|
||||
The requested minimum utilization (protection) as a percentage
|
||||
rational number, e.g. 12.34 for 12.34%.
|
||||
|
||||
This interface allows reading and setting minimum utilization clamp
|
||||
values similar to the sched_setattr(2). This minimum utilization
|
||||
value is used to clamp the task specific minimum utilization clamp.
|
||||
|
||||
The requested minimum utilization (protection) is always capped by
|
||||
the current value for the maximum utilization (limit), i.e.
|
||||
`cpu.uclamp.max`.
|
||||
|
||||
cpu.uclamp.max
|
||||
A read-write single value file which exists on non-root cgroups.
|
||||
The default is "max". i.e. no utilization capping
|
||||
|
||||
The requested maximum utilization (limit) as a percentage rational
|
||||
number, e.g. 98.76 for 98.76%.
|
||||
|
||||
This interface allows reading and setting maximum utilization clamp
|
||||
values similar to the sched_setattr(2). This maximum utilization
|
||||
value is used to clamp the task specific maximum utilization clamp.
|
||||
|
||||
|
||||
|
||||
Memory
|
||||
------
|
||||
@@ -1317,12 +1271,6 @@ PAGE_SIZE multiple when read back.
|
||||
higher than the limit for an extended period of time. This
|
||||
reduces the impact on the workload and memory management.
|
||||
|
||||
memory.pressure
|
||||
A read-only nested-key file which exists on non-root cgroups.
|
||||
|
||||
Shows pressure stall information for memory. See
|
||||
Documentation/accounting/psi.txt for details.
|
||||
|
||||
|
||||
Usage Guidelines
|
||||
~~~~~~~~~~~~~~~~
|
||||
@@ -1460,12 +1408,6 @@ IO Interface Files
|
||||
|
||||
8:16 rbps=2097152 wbps=max riops=max wiops=max
|
||||
|
||||
io.pressure
|
||||
A read-only nested-key file which exists on non-root cgroups.
|
||||
|
||||
Shows pressure stall information for IO. See
|
||||
Documentation/accounting/psi.txt for details.
|
||||
|
||||
|
||||
Writeback
|
||||
~~~~~~~~~
|
||||
|
||||
@@ -54,9 +54,6 @@ If you make a mistake with the syntax, the write will fail thus::
|
||||
<debugfs>/dynamic_debug/control
|
||||
-bash: echo: write error: Invalid argument
|
||||
|
||||
Note, for systems without 'debugfs' enabled, the control file can be
|
||||
found in ``/proc/dynamic_debug/control``.
|
||||
|
||||
Viewing Dynamic Debug Behaviour
|
||||
===============================
|
||||
|
||||
|
||||
@@ -1,17 +0,0 @@
|
||||
========================
|
||||
Hardware vulnerabilities
|
||||
========================
|
||||
|
||||
This section describes CPU vulnerabilities and provides an overview of the
|
||||
possible mitigations along with guidance for selecting mitigations if they
|
||||
are configurable at compile, boot or run time.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
spectre
|
||||
l1tf
|
||||
mds
|
||||
tsx_async_abort
|
||||
multihit.rst
|
||||
special-register-buffer-data-sampling.rst
|
||||
@@ -1,311 +0,0 @@
|
||||
MDS - Microarchitectural Data Sampling
|
||||
======================================
|
||||
|
||||
Microarchitectural Data Sampling is a hardware vulnerability which allows
|
||||
unprivileged speculative access to data which is available in various CPU
|
||||
internal buffers.
|
||||
|
||||
Affected processors
|
||||
-------------------
|
||||
|
||||
This vulnerability affects a wide range of Intel processors. The
|
||||
vulnerability is not present on:
|
||||
|
||||
- Processors from AMD, Centaur and other non Intel vendors
|
||||
|
||||
- Older processor models, where the CPU family is < 6
|
||||
|
||||
- Some Atoms (Bonnell, Saltwell, Goldmont, GoldmontPlus)
|
||||
|
||||
- Intel processors which have the ARCH_CAP_MDS_NO bit set in the
|
||||
IA32_ARCH_CAPABILITIES MSR.
|
||||
|
||||
Whether a processor is affected or not can be read out from the MDS
|
||||
vulnerability file in sysfs. See :ref:`mds_sys_info`.
|
||||
|
||||
Not all processors are affected by all variants of MDS, but the mitigation
|
||||
is identical for all of them so the kernel treats them as a single
|
||||
vulnerability.
|
||||
|
||||
Related CVEs
|
||||
------------
|
||||
|
||||
The following CVE entries are related to the MDS vulnerability:
|
||||
|
||||
============== ===== ===================================================
|
||||
CVE-2018-12126 MSBDS Microarchitectural Store Buffer Data Sampling
|
||||
CVE-2018-12130 MFBDS Microarchitectural Fill Buffer Data Sampling
|
||||
CVE-2018-12127 MLPDS Microarchitectural Load Port Data Sampling
|
||||
CVE-2019-11091 MDSUM Microarchitectural Data Sampling Uncacheable Memory
|
||||
============== ===== ===================================================
|
||||
|
||||
Problem
|
||||
-------
|
||||
|
||||
When performing store, load, L1 refill operations, processors write data
|
||||
into temporary microarchitectural structures (buffers). The data in the
|
||||
buffer can be forwarded to load operations as an optimization.
|
||||
|
||||
Under certain conditions, usually a fault/assist caused by a load
|
||||
operation, data unrelated to the load memory address can be speculatively
|
||||
forwarded from the buffers. Because the load operation causes a fault or
|
||||
assist and its result will be discarded, the forwarded data will not cause
|
||||
incorrect program execution or state changes. But a malicious operation
|
||||
may be able to forward this speculative data to a disclosure gadget which
|
||||
allows in turn to infer the value via a cache side channel attack.
|
||||
|
||||
Because the buffers are potentially shared between Hyper-Threads cross
|
||||
Hyper-Thread attacks are possible.
|
||||
|
||||
Deeper technical information is available in the MDS specific x86
|
||||
architecture section: :ref:`Documentation/x86/mds.rst <mds>`.
|
||||
|
||||
|
||||
Attack scenarios
|
||||
----------------
|
||||
|
||||
Attacks against the MDS vulnerabilities can be mounted from malicious non
|
||||
priviledged user space applications running on hosts or guest. Malicious
|
||||
guest OSes can obviously mount attacks as well.
|
||||
|
||||
Contrary to other speculation based vulnerabilities the MDS vulnerability
|
||||
does not allow the attacker to control the memory target address. As a
|
||||
consequence the attacks are purely sampling based, but as demonstrated with
|
||||
the TLBleed attack samples can be postprocessed successfully.
|
||||
|
||||
Web-Browsers
|
||||
^^^^^^^^^^^^
|
||||
|
||||
It's unclear whether attacks through Web-Browsers are possible at
|
||||
all. The exploitation through Java-Script is considered very unlikely,
|
||||
but other widely used web technologies like Webassembly could possibly be
|
||||
abused.
|
||||
|
||||
|
||||
.. _mds_sys_info:
|
||||
|
||||
MDS system information
|
||||
-----------------------
|
||||
|
||||
The Linux kernel provides a sysfs interface to enumerate the current MDS
|
||||
status of the system: whether the system is vulnerable, and which
|
||||
mitigations are active. The relevant sysfs file is:
|
||||
|
||||
/sys/devices/system/cpu/vulnerabilities/mds
|
||||
|
||||
The possible values in this file are:
|
||||
|
||||
.. list-table::
|
||||
|
||||
* - 'Not affected'
|
||||
- The processor is not vulnerable
|
||||
* - 'Vulnerable'
|
||||
- The processor is vulnerable, but no mitigation enabled
|
||||
* - 'Vulnerable: Clear CPU buffers attempted, no microcode'
|
||||
- The processor is vulnerable but microcode is not updated.
|
||||
|
||||
The mitigation is enabled on a best effort basis. See :ref:`vmwerv`
|
||||
* - 'Mitigation: Clear CPU buffers'
|
||||
- The processor is vulnerable and the CPU buffer clearing mitigation is
|
||||
enabled.
|
||||
|
||||
If the processor is vulnerable then the following information is appended
|
||||
to the above information:
|
||||
|
||||
======================== ============================================
|
||||
'SMT vulnerable' SMT is enabled
|
||||
'SMT mitigated' SMT is enabled and mitigated
|
||||
'SMT disabled' SMT is disabled
|
||||
'SMT Host state unknown' Kernel runs in a VM, Host SMT state unknown
|
||||
======================== ============================================
|
||||
|
||||
.. _vmwerv:
|
||||
|
||||
Best effort mitigation mode
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
If the processor is vulnerable, but the availability of the microcode based
|
||||
mitigation mechanism is not advertised via CPUID the kernel selects a best
|
||||
effort mitigation mode. This mode invokes the mitigation instructions
|
||||
without a guarantee that they clear the CPU buffers.
|
||||
|
||||
This is done to address virtualization scenarios where the host has the
|
||||
microcode update applied, but the hypervisor is not yet updated to expose
|
||||
the CPUID to the guest. If the host has updated microcode the protection
|
||||
takes effect otherwise a few cpu cycles are wasted pointlessly.
|
||||
|
||||
The state in the mds sysfs file reflects this situation accordingly.
|
||||
|
||||
|
||||
Mitigation mechanism
|
||||
-------------------------
|
||||
|
||||
The kernel detects the affected CPUs and the presence of the microcode
|
||||
which is required.
|
||||
|
||||
If a CPU is affected and the microcode is available, then the kernel
|
||||
enables the mitigation by default. The mitigation can be controlled at boot
|
||||
time via a kernel command line option. See
|
||||
:ref:`mds_mitigation_control_command_line`.
|
||||
|
||||
.. _cpu_buffer_clear:
|
||||
|
||||
CPU buffer clearing
|
||||
^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The mitigation for MDS clears the affected CPU buffers on return to user
|
||||
space and when entering a guest.
|
||||
|
||||
If SMT is enabled it also clears the buffers on idle entry when the CPU
|
||||
is only affected by MSBDS and not any other MDS variant, because the
|
||||
other variants cannot be protected against cross Hyper-Thread attacks.
|
||||
|
||||
For CPUs which are only affected by MSBDS the user space, guest and idle
|
||||
transition mitigations are sufficient and SMT is not affected.
|
||||
|
||||
.. _virt_mechanism:
|
||||
|
||||
Virtualization mitigation
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The protection for host to guest transition depends on the L1TF
|
||||
vulnerability of the CPU:
|
||||
|
||||
- CPU is affected by L1TF:
|
||||
|
||||
If the L1D flush mitigation is enabled and up to date microcode is
|
||||
available, the L1D flush mitigation is automatically protecting the
|
||||
guest transition.
|
||||
|
||||
If the L1D flush mitigation is disabled then the MDS mitigation is
|
||||
invoked explicit when the host MDS mitigation is enabled.
|
||||
|
||||
For details on L1TF and virtualization see:
|
||||
:ref:`Documentation/admin-guide/hw-vuln//l1tf.rst <mitigation_control_kvm>`.
|
||||
|
||||
- CPU is not affected by L1TF:
|
||||
|
||||
CPU buffers are flushed before entering the guest when the host MDS
|
||||
mitigation is enabled.
|
||||
|
||||
The resulting MDS protection matrix for the host to guest transition:
|
||||
|
||||
============ ===== ============= ============ =================
|
||||
L1TF MDS VMX-L1FLUSH Host MDS MDS-State
|
||||
|
||||
Don't care No Don't care N/A Not affected
|
||||
|
||||
Yes Yes Disabled Off Vulnerable
|
||||
|
||||
Yes Yes Disabled Full Mitigated
|
||||
|
||||
Yes Yes Enabled Don't care Mitigated
|
||||
|
||||
No Yes N/A Off Vulnerable
|
||||
|
||||
No Yes N/A Full Mitigated
|
||||
============ ===== ============= ============ =================
|
||||
|
||||
This only covers the host to guest transition, i.e. prevents leakage from
|
||||
host to guest, but does not protect the guest internally. Guests need to
|
||||
have their own protections.
|
||||
|
||||
.. _xeon_phi:
|
||||
|
||||
XEON PHI specific considerations
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The XEON PHI processor family is affected by MSBDS which can be exploited
|
||||
cross Hyper-Threads when entering idle states. Some XEON PHI variants allow
|
||||
to use MWAIT in user space (Ring 3) which opens an potential attack vector
|
||||
for malicious user space. The exposure can be disabled on the kernel
|
||||
command line with the 'ring3mwait=disable' command line option.
|
||||
|
||||
XEON PHI is not affected by the other MDS variants and MSBDS is mitigated
|
||||
before the CPU enters a idle state. As XEON PHI is not affected by L1TF
|
||||
either disabling SMT is not required for full protection.
|
||||
|
||||
.. _mds_smt_control:
|
||||
|
||||
SMT control
|
||||
^^^^^^^^^^^
|
||||
|
||||
All MDS variants except MSBDS can be attacked cross Hyper-Threads. That
|
||||
means on CPUs which are affected by MFBDS or MLPDS it is necessary to
|
||||
disable SMT for full protection. These are most of the affected CPUs; the
|
||||
exception is XEON PHI, see :ref:`xeon_phi`.
|
||||
|
||||
Disabling SMT can have a significant performance impact, but the impact
|
||||
depends on the type of workloads.
|
||||
|
||||
See the relevant chapter in the L1TF mitigation documentation for details:
|
||||
:ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <smt_control>`.
|
||||
|
||||
|
||||
.. _mds_mitigation_control_command_line:
|
||||
|
||||
Mitigation control on the kernel command line
|
||||
---------------------------------------------
|
||||
|
||||
The kernel command line allows to control the MDS mitigations at boot
|
||||
time with the option "mds=". The valid arguments for this option are:
|
||||
|
||||
============ =============================================================
|
||||
full If the CPU is vulnerable, enable all available mitigations
|
||||
for the MDS vulnerability, CPU buffer clearing on exit to
|
||||
userspace and when entering a VM. Idle transitions are
|
||||
protected as well if SMT is enabled.
|
||||
|
||||
It does not automatically disable SMT.
|
||||
|
||||
full,nosmt The same as mds=full, with SMT disabled on vulnerable
|
||||
CPUs. This is the complete mitigation.
|
||||
|
||||
off Disables MDS mitigations completely.
|
||||
|
||||
============ =============================================================
|
||||
|
||||
Not specifying this option is equivalent to "mds=full". For processors
|
||||
that are affected by both TAA (TSX Asynchronous Abort) and MDS,
|
||||
specifying just "mds=off" without an accompanying "tsx_async_abort=off"
|
||||
will have no effect as the same mitigation is used for both
|
||||
vulnerabilities.
|
||||
|
||||
Mitigation selection guide
|
||||
--------------------------
|
||||
|
||||
1. Trusted userspace
|
||||
^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
If all userspace applications are from a trusted source and do not
|
||||
execute untrusted code which is supplied externally, then the mitigation
|
||||
can be disabled.
|
||||
|
||||
|
||||
2. Virtualization with trusted guests
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The same considerations as above versus trusted user space apply.
|
||||
|
||||
3. Virtualization with untrusted guests
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The protection depends on the state of the L1TF mitigations.
|
||||
See :ref:`virt_mechanism`.
|
||||
|
||||
If the MDS mitigation is enabled and SMT is disabled, guest to host and
|
||||
guest to guest attacks are prevented.
|
||||
|
||||
.. _mds_default_mitigations:
|
||||
|
||||
Default mitigations
|
||||
-------------------
|
||||
|
||||
The kernel default mitigations for vulnerable processors are:
|
||||
|
||||
- Enable CPU buffer clearing
|
||||
|
||||
The kernel does not by default enforce the disabling of SMT, which leaves
|
||||
SMT systems vulnerable when running untrusted code. The same rationale as
|
||||
for L1TF applies.
|
||||
See :ref:`Documentation/admin-guide/hw-vuln//l1tf.rst <default_mitigations>`.
|
||||
@@ -1,163 +0,0 @@
|
||||
iTLB multihit
|
||||
=============
|
||||
|
||||
iTLB multihit is an erratum where some processors may incur a machine check
|
||||
error, possibly resulting in an unrecoverable CPU lockup, when an
|
||||
instruction fetch hits multiple entries in the instruction TLB. This can
|
||||
occur when the page size is changed along with either the physical address
|
||||
or cache type. A malicious guest running on a virtualized system can
|
||||
exploit this erratum to perform a denial of service attack.
|
||||
|
||||
|
||||
Affected processors
|
||||
-------------------
|
||||
|
||||
Variations of this erratum are present on most Intel Core and Xeon processor
|
||||
models. The erratum is not present on:
|
||||
|
||||
- non-Intel processors
|
||||
|
||||
- Some Atoms (Airmont, Bonnell, Goldmont, GoldmontPlus, Saltwell, Silvermont)
|
||||
|
||||
- Intel processors that have the PSCHANGE_MC_NO bit set in the
|
||||
IA32_ARCH_CAPABILITIES MSR.
|
||||
|
||||
|
||||
Related CVEs
|
||||
------------
|
||||
|
||||
The following CVE entry is related to this issue:
|
||||
|
||||
============== =================================================
|
||||
CVE-2018-12207 Machine Check Error Avoidance on Page Size Change
|
||||
============== =================================================
|
||||
|
||||
|
||||
Problem
|
||||
-------
|
||||
|
||||
Privileged software, including OS and virtual machine managers (VMM), are in
|
||||
charge of memory management. A key component in memory management is the control
|
||||
of the page tables. Modern processors use virtual memory, a technique that creates
|
||||
the illusion of a very large memory for processors. This virtual space is split
|
||||
into pages of a given size. Page tables translate virtual addresses to physical
|
||||
addresses.
|
||||
|
||||
To reduce latency when performing a virtual to physical address translation,
|
||||
processors include a structure, called TLB, that caches recent translations.
|
||||
There are separate TLBs for instruction (iTLB) and data (dTLB).
|
||||
|
||||
Under this errata, instructions are fetched from a linear address translated
|
||||
using a 4 KB translation cached in the iTLB. Privileged software modifies the
|
||||
paging structure so that the same linear address using large page size (2 MB, 4
|
||||
MB, 1 GB) with a different physical address or memory type. After the page
|
||||
structure modification but before the software invalidates any iTLB entries for
|
||||
the linear address, a code fetch that happens on the same linear address may
|
||||
cause a machine-check error which can result in a system hang or shutdown.
|
||||
|
||||
|
||||
Attack scenarios
|
||||
----------------
|
||||
|
||||
Attacks against the iTLB multihit erratum can be mounted from malicious
|
||||
guests in a virtualized system.
|
||||
|
||||
|
||||
iTLB multihit system information
|
||||
--------------------------------
|
||||
|
||||
The Linux kernel provides a sysfs interface to enumerate the current iTLB
|
||||
multihit status of the system:whether the system is vulnerable and which
|
||||
mitigations are active. The relevant sysfs file is:
|
||||
|
||||
/sys/devices/system/cpu/vulnerabilities/itlb_multihit
|
||||
|
||||
The possible values in this file are:
|
||||
|
||||
.. list-table::
|
||||
|
||||
* - Not affected
|
||||
- The processor is not vulnerable.
|
||||
* - KVM: Mitigation: Split huge pages
|
||||
- Software changes mitigate this issue.
|
||||
* - KVM: Vulnerable
|
||||
- The processor is vulnerable, but no mitigation enabled
|
||||
|
||||
|
||||
Enumeration of the erratum
|
||||
--------------------------------
|
||||
|
||||
A new bit has been allocated in the IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO) msr
|
||||
and will be set on CPU's which are mitigated against this issue.
|
||||
|
||||
======================================= =========== ===============================
|
||||
IA32_ARCH_CAPABILITIES MSR Not present Possibly vulnerable,check model
|
||||
IA32_ARCH_CAPABILITIES[PSCHANGE_MC_NO] '0' Likely vulnerable,check model
|
||||
IA32_ARCH_CAPABILITIES[PSCHANGE_MC_NO] '1' Not vulnerable
|
||||
======================================= =========== ===============================
|
||||
|
||||
|
||||
Mitigation mechanism
|
||||
-------------------------
|
||||
|
||||
This erratum can be mitigated by restricting the use of large page sizes to
|
||||
non-executable pages. This forces all iTLB entries to be 4K, and removes
|
||||
the possibility of multiple hits.
|
||||
|
||||
In order to mitigate the vulnerability, KVM initially marks all huge pages
|
||||
as non-executable. If the guest attempts to execute in one of those pages,
|
||||
the page is broken down into 4K pages, which are then marked executable.
|
||||
|
||||
If EPT is disabled or not available on the host, KVM is in control of TLB
|
||||
flushes and the problematic situation cannot happen. However, the shadow
|
||||
EPT paging mechanism used by nested virtualization is vulnerable, because
|
||||
the nested guest can trigger multiple iTLB hits by modifying its own
|
||||
(non-nested) page tables. For simplicity, KVM will make large pages
|
||||
non-executable in all shadow paging modes.
|
||||
|
||||
Mitigation control on the kernel command line and KVM - module parameter
|
||||
------------------------------------------------------------------------
|
||||
|
||||
The KVM hypervisor mitigation mechanism for marking huge pages as
|
||||
non-executable can be controlled with a module parameter "nx_huge_pages=".
|
||||
The kernel command line allows to control the iTLB multihit mitigations at
|
||||
boot time with the option "kvm.nx_huge_pages=".
|
||||
|
||||
The valid arguments for these options are:
|
||||
|
||||
========== ================================================================
|
||||
force Mitigation is enabled. In this case, the mitigation implements
|
||||
non-executable huge pages in Linux kernel KVM module. All huge
|
||||
pages in the EPT are marked as non-executable.
|
||||
If a guest attempts to execute in one of those pages, the page is
|
||||
broken down into 4K pages, which are then marked executable.
|
||||
|
||||
off Mitigation is disabled.
|
||||
|
||||
auto Enable mitigation only if the platform is affected and the kernel
|
||||
was not booted with the "mitigations=off" command line parameter.
|
||||
This is the default option.
|
||||
========== ================================================================
|
||||
|
||||
|
||||
Mitigation selection guide
|
||||
--------------------------
|
||||
|
||||
1. No virtualization in use
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The system is protected by the kernel unconditionally and no further
|
||||
action is required.
|
||||
|
||||
2. Virtualization with trusted guests
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
If the guest comes from a trusted source, you may assume that the guest will
|
||||
not attempt to maliciously exploit these errata and no further action is
|
||||
required.
|
||||
|
||||
3. Virtualization with untrusted guests
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
If the guest comes from an untrusted source, the guest host kernel will need
|
||||
to apply iTLB multihit mitigation via the kernel command line or kvm
|
||||
module parameter.
|
||||
@@ -1,149 +0,0 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
SRBDS - Special Register Buffer Data Sampling
|
||||
=============================================
|
||||
|
||||
SRBDS is a hardware vulnerability that allows MDS :doc:`mds` techniques to
|
||||
infer values returned from special register accesses. Special register
|
||||
accesses are accesses to off core registers. According to Intel's evaluation,
|
||||
the special register reads that have a security expectation of privacy are
|
||||
RDRAND, RDSEED and SGX EGETKEY.
|
||||
|
||||
When RDRAND, RDSEED and EGETKEY instructions are used, the data is moved
|
||||
to the core through the special register mechanism that is susceptible
|
||||
to MDS attacks.
|
||||
|
||||
Affected processors
|
||||
--------------------
|
||||
Core models (desktop, mobile, Xeon-E3) that implement RDRAND and/or RDSEED may
|
||||
be affected.
|
||||
|
||||
A processor is affected by SRBDS if its Family_Model and stepping is
|
||||
in the following list, with the exception of the listed processors
|
||||
exporting MDS_NO while Intel TSX is available yet not enabled. The
|
||||
latter class of processors are only affected when Intel TSX is enabled
|
||||
by software using TSX_CTRL_MSR otherwise they are not affected.
|
||||
|
||||
============= ============ ========
|
||||
common name Family_Model Stepping
|
||||
============= ============ ========
|
||||
IvyBridge 06_3AH All
|
||||
|
||||
Haswell 06_3CH All
|
||||
Haswell_L 06_45H All
|
||||
Haswell_G 06_46H All
|
||||
|
||||
Broadwell_G 06_47H All
|
||||
Broadwell 06_3DH All
|
||||
|
||||
Skylake_L 06_4EH All
|
||||
Skylake 06_5EH All
|
||||
|
||||
Kabylake_L 06_8EH <= 0xC
|
||||
Kabylake 06_9EH <= 0xD
|
||||
============= ============ ========
|
||||
|
||||
Related CVEs
|
||||
------------
|
||||
|
||||
The following CVE entry is related to this SRBDS issue:
|
||||
|
||||
============== ===== =====================================
|
||||
CVE-2020-0543 SRBDS Special Register Buffer Data Sampling
|
||||
============== ===== =====================================
|
||||
|
||||
Attack scenarios
|
||||
----------------
|
||||
An unprivileged user can extract values returned from RDRAND and RDSEED
|
||||
executed on another core or sibling thread using MDS techniques.
|
||||
|
||||
|
||||
Mitigation mechanism
|
||||
-------------------
|
||||
Intel will release microcode updates that modify the RDRAND, RDSEED, and
|
||||
EGETKEY instructions to overwrite secret special register data in the shared
|
||||
staging buffer before the secret data can be accessed by another logical
|
||||
processor.
|
||||
|
||||
During execution of the RDRAND, RDSEED, or EGETKEY instructions, off-core
|
||||
accesses from other logical processors will be delayed until the special
|
||||
register read is complete and the secret data in the shared staging buffer is
|
||||
overwritten.
|
||||
|
||||
This has three effects on performance:
|
||||
|
||||
#. RDRAND, RDSEED, or EGETKEY instructions have higher latency.
|
||||
|
||||
#. Executing RDRAND at the same time on multiple logical processors will be
|
||||
serialized, resulting in an overall reduction in the maximum RDRAND
|
||||
bandwidth.
|
||||
|
||||
#. Executing RDRAND, RDSEED or EGETKEY will delay memory accesses from other
|
||||
logical processors that miss their core caches, with an impact similar to
|
||||
legacy locked cache-line-split accesses.
|
||||
|
||||
The microcode updates provide an opt-out mechanism (RNGDS_MITG_DIS) to disable
|
||||
the mitigation for RDRAND and RDSEED instructions executed outside of Intel
|
||||
Software Guard Extensions (Intel SGX) enclaves. On logical processors that
|
||||
disable the mitigation using this opt-out mechanism, RDRAND and RDSEED do not
|
||||
take longer to execute and do not impact performance of sibling logical
|
||||
processors memory accesses. The opt-out mechanism does not affect Intel SGX
|
||||
enclaves (including execution of RDRAND or RDSEED inside an enclave, as well
|
||||
as EGETKEY execution).
|
||||
|
||||
IA32_MCU_OPT_CTRL MSR Definition
|
||||
--------------------------------
|
||||
Along with the mitigation for this issue, Intel added a new thread-scope
|
||||
IA32_MCU_OPT_CTRL MSR, (address 0x123). The presence of this MSR and
|
||||
RNGDS_MITG_DIS (bit 0) is enumerated by CPUID.(EAX=07H,ECX=0).EDX[SRBDS_CTRL =
|
||||
9]==1. This MSR is introduced through the microcode update.
|
||||
|
||||
Setting IA32_MCU_OPT_CTRL[0] (RNGDS_MITG_DIS) to 1 for a logical processor
|
||||
disables the mitigation for RDRAND and RDSEED executed outside of an Intel SGX
|
||||
enclave on that logical processor. Opting out of the mitigation for a
|
||||
particular logical processor does not affect the RDRAND and RDSEED mitigations
|
||||
for other logical processors.
|
||||
|
||||
Note that inside of an Intel SGX enclave, the mitigation is applied regardless
|
||||
of the value of RNGDS_MITG_DS.
|
||||
|
||||
Mitigation control on the kernel command line
|
||||
---------------------------------------------
|
||||
The kernel command line allows control over the SRBDS mitigation at boot time
|
||||
with the option "srbds=". The option for this is:
|
||||
|
||||
============= =============================================================
|
||||
off This option disables SRBDS mitigation for RDRAND and RDSEED on
|
||||
affected platforms.
|
||||
============= =============================================================
|
||||
|
||||
SRBDS System Information
|
||||
-----------------------
|
||||
The Linux kernel provides vulnerability status information through sysfs. For
|
||||
SRBDS this can be accessed by the following sysfs file:
|
||||
/sys/devices/system/cpu/vulnerabilities/srbds
|
||||
|
||||
The possible values contained in this file are:
|
||||
|
||||
============================== =============================================
|
||||
Not affected Processor not vulnerable
|
||||
Vulnerable Processor vulnerable and mitigation disabled
|
||||
Vulnerable: No microcode Processor vulnerable and microcode is missing
|
||||
mitigation
|
||||
Mitigation: Microcode Processor is vulnerable and mitigation is in
|
||||
effect.
|
||||
Mitigation: TSX disabled Processor is only vulnerable when TSX is
|
||||
enabled while this system was booted with TSX
|
||||
disabled.
|
||||
Unknown: Dependent on
|
||||
hypervisor status Running on virtual guest processor that is
|
||||
affected but with no way to know if host
|
||||
processor is mitigated or vulnerable.
|
||||
============================== =============================================
|
||||
|
||||
SRBDS Default mitigation
|
||||
------------------------
|
||||
This new microcode serializes processor access during execution of RDRAND,
|
||||
RDSEED ensures that the shared buffer is overwritten before it is released for
|
||||
reuse. Use the "srbds=off" kernel command line to disable the mitigation for
|
||||
RDRAND and RDSEED.
|
||||
@@ -1,769 +0,0 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
Spectre Side Channels
|
||||
=====================
|
||||
|
||||
Spectre is a class of side channel attacks that exploit branch prediction
|
||||
and speculative execution on modern CPUs to read memory, possibly
|
||||
bypassing access controls. Speculative execution side channel exploits
|
||||
do not modify memory but attempt to infer privileged data in the memory.
|
||||
|
||||
This document covers Spectre variant 1 and Spectre variant 2.
|
||||
|
||||
Affected processors
|
||||
-------------------
|
||||
|
||||
Speculative execution side channel methods affect a wide range of modern
|
||||
high performance processors, since most modern high speed processors
|
||||
use branch prediction and speculative execution.
|
||||
|
||||
The following CPUs are vulnerable:
|
||||
|
||||
- Intel Core, Atom, Pentium, and Xeon processors
|
||||
|
||||
- AMD Phenom, EPYC, and Zen processors
|
||||
|
||||
- IBM POWER and zSeries processors
|
||||
|
||||
- Higher end ARM processors
|
||||
|
||||
- Apple CPUs
|
||||
|
||||
- Higher end MIPS CPUs
|
||||
|
||||
- Likely most other high performance CPUs. Contact your CPU vendor for details.
|
||||
|
||||
Whether a processor is affected or not can be read out from the Spectre
|
||||
vulnerability files in sysfs. See :ref:`spectre_sys_info`.
|
||||
|
||||
Related CVEs
|
||||
------------
|
||||
|
||||
The following CVE entries describe Spectre variants:
|
||||
|
||||
============= ======================= ==========================
|
||||
CVE-2017-5753 Bounds check bypass Spectre variant 1
|
||||
CVE-2017-5715 Branch target injection Spectre variant 2
|
||||
CVE-2019-1125 Spectre v1 swapgs Spectre variant 1 (swapgs)
|
||||
============= ======================= ==========================
|
||||
|
||||
Problem
|
||||
-------
|
||||
|
||||
CPUs use speculative operations to improve performance. That may leave
|
||||
traces of memory accesses or computations in the processor's caches,
|
||||
buffers, and branch predictors. Malicious software may be able to
|
||||
influence the speculative execution paths, and then use the side effects
|
||||
of the speculative execution in the CPUs' caches and buffers to infer
|
||||
privileged data touched during the speculative execution.
|
||||
|
||||
Spectre variant 1 attacks take advantage of speculative execution of
|
||||
conditional branches, while Spectre variant 2 attacks use speculative
|
||||
execution of indirect branches to leak privileged memory.
|
||||
See :ref:`[1] <spec_ref1>` :ref:`[5] <spec_ref5>` :ref:`[7] <spec_ref7>`
|
||||
:ref:`[10] <spec_ref10>` :ref:`[11] <spec_ref11>`.
|
||||
|
||||
Spectre variant 1 (Bounds Check Bypass)
|
||||
---------------------------------------
|
||||
|
||||
The bounds check bypass attack :ref:`[2] <spec_ref2>` takes advantage
|
||||
of speculative execution that bypasses conditional branch instructions
|
||||
used for memory access bounds check (e.g. checking if the index of an
|
||||
array results in memory access within a valid range). This results in
|
||||
memory accesses to invalid memory (with out-of-bound index) that are
|
||||
done speculatively before validation checks resolve. Such speculative
|
||||
memory accesses can leave side effects, creating side channels which
|
||||
leak information to the attacker.
|
||||
|
||||
There are some extensions of Spectre variant 1 attacks for reading data
|
||||
over the network, see :ref:`[12] <spec_ref12>`. However such attacks
|
||||
are difficult, low bandwidth, fragile, and are considered low risk.
|
||||
|
||||
Note that, despite "Bounds Check Bypass" name, Spectre variant 1 is not
|
||||
only about user-controlled array bounds checks. It can affect any
|
||||
conditional checks. The kernel entry code interrupt, exception, and NMI
|
||||
handlers all have conditional swapgs checks. Those may be problematic
|
||||
in the context of Spectre v1, as kernel code can speculatively run with
|
||||
a user GS.
|
||||
|
||||
Spectre variant 2 (Branch Target Injection)
|
||||
-------------------------------------------
|
||||
|
||||
The branch target injection attack takes advantage of speculative
|
||||
execution of indirect branches :ref:`[3] <spec_ref3>`. The indirect
|
||||
branch predictors inside the processor used to guess the target of
|
||||
indirect branches can be influenced by an attacker, causing gadget code
|
||||
to be speculatively executed, thus exposing sensitive data touched by
|
||||
the victim. The side effects left in the CPU's caches during speculative
|
||||
execution can be measured to infer data values.
|
||||
|
||||
.. _poison_btb:
|
||||
|
||||
In Spectre variant 2 attacks, the attacker can steer speculative indirect
|
||||
branches in the victim to gadget code by poisoning the branch target
|
||||
buffer of a CPU used for predicting indirect branch addresses. Such
|
||||
poisoning could be done by indirect branching into existing code,
|
||||
with the address offset of the indirect branch under the attacker's
|
||||
control. Since the branch prediction on impacted hardware does not
|
||||
fully disambiguate branch address and uses the offset for prediction,
|
||||
this could cause privileged code's indirect branch to jump to a gadget
|
||||
code with the same offset.
|
||||
|
||||
The most useful gadgets take an attacker-controlled input parameter (such
|
||||
as a register value) so that the memory read can be controlled. Gadgets
|
||||
without input parameters might be possible, but the attacker would have
|
||||
very little control over what memory can be read, reducing the risk of
|
||||
the attack revealing useful data.
|
||||
|
||||
One other variant 2 attack vector is for the attacker to poison the
|
||||
return stack buffer (RSB) :ref:`[13] <spec_ref13>` to cause speculative
|
||||
subroutine return instruction execution to go to a gadget. An attacker's
|
||||
imbalanced subroutine call instructions might "poison" entries in the
|
||||
return stack buffer which are later consumed by a victim's subroutine
|
||||
return instructions. This attack can be mitigated by flushing the return
|
||||
stack buffer on context switch, or virtual machine (VM) exit.
|
||||
|
||||
On systems with simultaneous multi-threading (SMT), attacks are possible
|
||||
from the sibling thread, as level 1 cache and branch target buffer
|
||||
(BTB) may be shared between hardware threads in a CPU core. A malicious
|
||||
program running on the sibling thread may influence its peer's BTB to
|
||||
steer its indirect branch speculations to gadget code, and measure the
|
||||
speculative execution's side effects left in level 1 cache to infer the
|
||||
victim's data.
|
||||
|
||||
Attack scenarios
|
||||
----------------
|
||||
|
||||
The following list of attack scenarios have been anticipated, but may
|
||||
not cover all possible attack vectors.
|
||||
|
||||
1. A user process attacking the kernel
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Spectre variant 1
|
||||
~~~~~~~~~~~~~~~~~
|
||||
|
||||
The attacker passes a parameter to the kernel via a register or
|
||||
via a known address in memory during a syscall. Such parameter may
|
||||
be used later by the kernel as an index to an array or to derive
|
||||
a pointer for a Spectre variant 1 attack. The index or pointer
|
||||
is invalid, but bound checks are bypassed in the code branch taken
|
||||
for speculative execution. This could cause privileged memory to be
|
||||
accessed and leaked.
|
||||
|
||||
For kernel code that has been identified where data pointers could
|
||||
potentially be influenced for Spectre attacks, new "nospec" accessor
|
||||
macros are used to prevent speculative loading of data.
|
||||
|
||||
Spectre variant 1 (swapgs)
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
An attacker can train the branch predictor to speculatively skip the
|
||||
swapgs path for an interrupt or exception. If they initialize
|
||||
the GS register to a user-space value, if the swapgs is speculatively
|
||||
skipped, subsequent GS-related percpu accesses in the speculation
|
||||
window will be done with the attacker-controlled GS value. This
|
||||
could cause privileged memory to be accessed and leaked.
|
||||
|
||||
For example:
|
||||
|
||||
::
|
||||
|
||||
if (coming from user space)
|
||||
swapgs
|
||||
mov %gs:<percpu_offset>, %reg
|
||||
mov (%reg), %reg1
|
||||
|
||||
When coming from user space, the CPU can speculatively skip the
|
||||
swapgs, and then do a speculative percpu load using the user GS
|
||||
value. So the user can speculatively force a read of any kernel
|
||||
value. If a gadget exists which uses the percpu value as an address
|
||||
in another load/store, then the contents of the kernel value may
|
||||
become visible via an L1 side channel attack.
|
||||
|
||||
A similar attack exists when coming from kernel space. The CPU can
|
||||
speculatively do the swapgs, causing the user GS to get used for the
|
||||
rest of the speculative window.
|
||||
|
||||
Spectre variant 2
|
||||
~~~~~~~~~~~~~~~~~
|
||||
|
||||
A spectre variant 2 attacker can :ref:`poison <poison_btb>` the branch
|
||||
target buffer (BTB) before issuing syscall to launch an attack.
|
||||
After entering the kernel, the kernel could use the poisoned branch
|
||||
target buffer on indirect jump and jump to gadget code in speculative
|
||||
execution.
|
||||
|
||||
If an attacker tries to control the memory addresses leaked during
|
||||
speculative execution, he would also need to pass a parameter to the
|
||||
gadget, either through a register or a known address in memory. After
|
||||
the gadget has executed, he can measure the side effect.
|
||||
|
||||
The kernel can protect itself against consuming poisoned branch
|
||||
target buffer entries by using return trampolines (also known as
|
||||
"retpoline") :ref:`[3] <spec_ref3>` :ref:`[9] <spec_ref9>` for all
|
||||
indirect branches. Return trampolines trap speculative execution paths
|
||||
to prevent jumping to gadget code during speculative execution.
|
||||
x86 CPUs with Enhanced Indirect Branch Restricted Speculation
|
||||
(Enhanced IBRS) available in hardware should use the feature to
|
||||
mitigate Spectre variant 2 instead of retpoline. Enhanced IBRS is
|
||||
more efficient than retpoline.
|
||||
|
||||
There may be gadget code in firmware which could be exploited with
|
||||
Spectre variant 2 attack by a rogue user process. To mitigate such
|
||||
attacks on x86, Indirect Branch Restricted Speculation (IBRS) feature
|
||||
is turned on before the kernel invokes any firmware code.
|
||||
|
||||
2. A user process attacking another user process
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
A malicious user process can try to attack another user process,
|
||||
either via a context switch on the same hardware thread, or from the
|
||||
sibling hyperthread sharing a physical processor core on simultaneous
|
||||
multi-threading (SMT) system.
|
||||
|
||||
Spectre variant 1 attacks generally require passing parameters
|
||||
between the processes, which needs a data passing relationship, such
|
||||
as remote procedure calls (RPC). Those parameters are used in gadget
|
||||
code to derive invalid data pointers accessing privileged memory in
|
||||
the attacked process.
|
||||
|
||||
Spectre variant 2 attacks can be launched from a rogue process by
|
||||
:ref:`poisoning <poison_btb>` the branch target buffer. This can
|
||||
influence the indirect branch targets for a victim process that either
|
||||
runs later on the same hardware thread, or running concurrently on
|
||||
a sibling hardware thread sharing the same physical core.
|
||||
|
||||
A user process can protect itself against Spectre variant 2 attacks
|
||||
by using the prctl() syscall to disable indirect branch speculation
|
||||
for itself. An administrator can also cordon off an unsafe process
|
||||
from polluting the branch target buffer by disabling the process's
|
||||
indirect branch speculation. This comes with a performance cost
|
||||
from not using indirect branch speculation and clearing the branch
|
||||
target buffer. When SMT is enabled on x86, for a process that has
|
||||
indirect branch speculation disabled, Single Threaded Indirect Branch
|
||||
Predictors (STIBP) :ref:`[4] <spec_ref4>` are turned on to prevent the
|
||||
sibling thread from controlling branch target buffer. In addition,
|
||||
the Indirect Branch Prediction Barrier (IBPB) is issued to clear the
|
||||
branch target buffer when context switching to and from such process.
|
||||
|
||||
On x86, the return stack buffer is stuffed on context switch.
|
||||
This prevents the branch target buffer from being used for branch
|
||||
prediction when the return stack buffer underflows while switching to
|
||||
a deeper call stack. Any poisoned entries in the return stack buffer
|
||||
left by the previous process will also be cleared.
|
||||
|
||||
User programs should use address space randomization to make attacks
|
||||
more difficult (Set /proc/sys/kernel/randomize_va_space = 1 or 2).
|
||||
|
||||
3. A virtualized guest attacking the host
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The attack mechanism is similar to how user processes attack the
|
||||
kernel. The kernel is entered via hyper-calls or other virtualization
|
||||
exit paths.
|
||||
|
||||
For Spectre variant 1 attacks, rogue guests can pass parameters
|
||||
(e.g. in registers) via hyper-calls to derive invalid pointers to
|
||||
speculate into privileged memory after entering the kernel. For places
|
||||
where such kernel code has been identified, nospec accessor macros
|
||||
are used to stop speculative memory access.
|
||||
|
||||
For Spectre variant 2 attacks, rogue guests can :ref:`poison
|
||||
<poison_btb>` the branch target buffer or return stack buffer, causing
|
||||
the kernel to jump to gadget code in the speculative execution paths.
|
||||
|
||||
To mitigate variant 2, the host kernel can use return trampolines
|
||||
for indirect branches to bypass the poisoned branch target buffer,
|
||||
and flushing the return stack buffer on VM exit. This prevents rogue
|
||||
guests from affecting indirect branching in the host kernel.
|
||||
|
||||
To protect host processes from rogue guests, host processes can have
|
||||
indirect branch speculation disabled via prctl(). The branch target
|
||||
buffer is cleared before context switching to such processes.
|
||||
|
||||
4. A virtualized guest attacking other guest
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
A rogue guest may attack another guest to get data accessible by the
|
||||
other guest.
|
||||
|
||||
Spectre variant 1 attacks are possible if parameters can be passed
|
||||
between guests. This may be done via mechanisms such as shared memory
|
||||
or message passing. Such parameters could be used to derive data
|
||||
pointers to privileged data in guest. The privileged data could be
|
||||
accessed by gadget code in the victim's speculation paths.
|
||||
|
||||
Spectre variant 2 attacks can be launched from a rogue guest by
|
||||
:ref:`poisoning <poison_btb>` the branch target buffer or the return
|
||||
stack buffer. Such poisoned entries could be used to influence
|
||||
speculation execution paths in the victim guest.
|
||||
|
||||
Linux kernel mitigates attacks to other guests running in the same
|
||||
CPU hardware thread by flushing the return stack buffer on VM exit,
|
||||
and clearing the branch target buffer before switching to a new guest.
|
||||
|
||||
If SMT is used, Spectre variant 2 attacks from an untrusted guest
|
||||
in the sibling hyperthread can be mitigated by the administrator,
|
||||
by turning off the unsafe guest's indirect branch speculation via
|
||||
prctl(). A guest can also protect itself by turning on microcode
|
||||
based mitigations (such as IBPB or STIBP on x86) within the guest.
|
||||
|
||||
.. _spectre_sys_info:
|
||||
|
||||
Spectre system information
|
||||
--------------------------
|
||||
|
||||
The Linux kernel provides a sysfs interface to enumerate the current
|
||||
mitigation status of the system for Spectre: whether the system is
|
||||
vulnerable, and which mitigations are active.
|
||||
|
||||
The sysfs file showing Spectre variant 1 mitigation status is:
|
||||
|
||||
/sys/devices/system/cpu/vulnerabilities/spectre_v1
|
||||
|
||||
The possible values in this file are:
|
||||
|
||||
.. list-table::
|
||||
|
||||
* - 'Not affected'
|
||||
- The processor is not vulnerable.
|
||||
* - 'Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers'
|
||||
- The swapgs protections are disabled; otherwise it has
|
||||
protection in the kernel on a case by case base with explicit
|
||||
pointer sanitation and usercopy LFENCE barriers.
|
||||
* - 'Mitigation: usercopy/swapgs barriers and __user pointer sanitization'
|
||||
- Protection in the kernel on a case by case base with explicit
|
||||
pointer sanitation, usercopy LFENCE barriers, and swapgs LFENCE
|
||||
barriers.
|
||||
|
||||
However, the protections are put in place on a case by case basis,
|
||||
and there is no guarantee that all possible attack vectors for Spectre
|
||||
variant 1 are covered.
|
||||
|
||||
The spectre_v2 kernel file reports if the kernel has been compiled with
|
||||
retpoline mitigation or if the CPU has hardware mitigation, and if the
|
||||
CPU has support for additional process-specific mitigation.
|
||||
|
||||
This file also reports CPU features enabled by microcode to mitigate
|
||||
attack between user processes:
|
||||
|
||||
1. Indirect Branch Prediction Barrier (IBPB) to add additional
|
||||
isolation between processes of different users.
|
||||
2. Single Thread Indirect Branch Predictors (STIBP) to add additional
|
||||
isolation between CPU threads running on the same core.
|
||||
|
||||
These CPU features may impact performance when used and can be enabled
|
||||
per process on a case-by-case base.
|
||||
|
||||
The sysfs file showing Spectre variant 2 mitigation status is:
|
||||
|
||||
/sys/devices/system/cpu/vulnerabilities/spectre_v2
|
||||
|
||||
The possible values in this file are:
|
||||
|
||||
- Kernel status:
|
||||
|
||||
==================================== =================================
|
||||
'Not affected' The processor is not vulnerable
|
||||
'Vulnerable' Vulnerable, no mitigation
|
||||
'Mitigation: Full generic retpoline' Software-focused mitigation
|
||||
'Mitigation: Full AMD retpoline' AMD-specific software mitigation
|
||||
'Mitigation: Enhanced IBRS' Hardware-focused mitigation
|
||||
==================================== =================================
|
||||
|
||||
- Firmware status: Show if Indirect Branch Restricted Speculation (IBRS) is
|
||||
used to protect against Spectre variant 2 attacks when calling firmware (x86 only).
|
||||
|
||||
========== =============================================================
|
||||
'IBRS_FW' Protection against user program attacks when calling firmware
|
||||
========== =============================================================
|
||||
|
||||
- Indirect branch prediction barrier (IBPB) status for protection between
|
||||
processes of different users. This feature can be controlled through
|
||||
prctl() per process, or through kernel command line options. This is
|
||||
an x86 only feature. For more details see below.
|
||||
|
||||
=================== ========================================================
|
||||
'IBPB: disabled' IBPB unused
|
||||
'IBPB: always-on' Use IBPB on all tasks
|
||||
'IBPB: conditional' Use IBPB on SECCOMP or indirect branch restricted tasks
|
||||
=================== ========================================================
|
||||
|
||||
- Single threaded indirect branch prediction (STIBP) status for protection
|
||||
between different hyper threads. This feature can be controlled through
|
||||
prctl per process, or through kernel command line options. This is x86
|
||||
only feature. For more details see below.
|
||||
|
||||
==================== ========================================================
|
||||
'STIBP: disabled' STIBP unused
|
||||
'STIBP: forced' Use STIBP on all tasks
|
||||
'STIBP: conditional' Use STIBP on SECCOMP or indirect branch restricted tasks
|
||||
==================== ========================================================
|
||||
|
||||
- Return stack buffer (RSB) protection status:
|
||||
|
||||
============= ===========================================
|
||||
'RSB filling' Protection of RSB on context switch enabled
|
||||
============= ===========================================
|
||||
|
||||
Full mitigation might require a microcode update from the CPU
|
||||
vendor. When the necessary microcode is not available, the kernel will
|
||||
report vulnerability.
|
||||
|
||||
Turning on mitigation for Spectre variant 1 and Spectre variant 2
|
||||
-----------------------------------------------------------------
|
||||
|
||||
1. Kernel mitigation
|
||||
^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Spectre variant 1
|
||||
~~~~~~~~~~~~~~~~~
|
||||
|
||||
For the Spectre variant 1, vulnerable kernel code (as determined
|
||||
by code audit or scanning tools) is annotated on a case by case
|
||||
basis to use nospec accessor macros for bounds clipping :ref:`[2]
|
||||
<spec_ref2>` to avoid any usable disclosure gadgets. However, it may
|
||||
not cover all attack vectors for Spectre variant 1.
|
||||
|
||||
Copy-from-user code has an LFENCE barrier to prevent the access_ok()
|
||||
check from being mis-speculated. The barrier is done by the
|
||||
barrier_nospec() macro.
|
||||
|
||||
For the swapgs variant of Spectre variant 1, LFENCE barriers are
|
||||
added to interrupt, exception and NMI entry where needed. These
|
||||
barriers are done by the FENCE_SWAPGS_KERNEL_ENTRY and
|
||||
FENCE_SWAPGS_USER_ENTRY macros.
|
||||
|
||||
Spectre variant 2
|
||||
~~~~~~~~~~~~~~~~~
|
||||
|
||||
For Spectre variant 2 mitigation, the compiler turns indirect calls or
|
||||
jumps in the kernel into equivalent return trampolines (retpolines)
|
||||
:ref:`[3] <spec_ref3>` :ref:`[9] <spec_ref9>` to go to the target
|
||||
addresses. Speculative execution paths under retpolines are trapped
|
||||
in an infinite loop to prevent any speculative execution jumping to
|
||||
a gadget.
|
||||
|
||||
To turn on retpoline mitigation on a vulnerable CPU, the kernel
|
||||
needs to be compiled with a gcc compiler that supports the
|
||||
-mindirect-branch=thunk-extern -mindirect-branch-register options.
|
||||
If the kernel is compiled with a Clang compiler, the compiler needs
|
||||
to support -mretpoline-external-thunk option. The kernel config
|
||||
CONFIG_RETPOLINE needs to be turned on, and the CPU needs to run with
|
||||
the latest updated microcode.
|
||||
|
||||
On Intel Skylake-era systems the mitigation covers most, but not all,
|
||||
cases. See :ref:`[3] <spec_ref3>` for more details.
|
||||
|
||||
On CPUs with hardware mitigation for Spectre variant 2 (e.g. Enhanced
|
||||
IBRS on x86), retpoline is automatically disabled at run time.
|
||||
|
||||
The retpoline mitigation is turned on by default on vulnerable
|
||||
CPUs. It can be forced on or off by the administrator
|
||||
via the kernel command line and sysfs control files. See
|
||||
:ref:`spectre_mitigation_control_command_line`.
|
||||
|
||||
On x86, indirect branch restricted speculation is turned on by default
|
||||
before invoking any firmware code to prevent Spectre variant 2 exploits
|
||||
using the firmware.
|
||||
|
||||
Using kernel address space randomization (CONFIG_RANDOMIZE_SLAB=y
|
||||
and CONFIG_SLAB_FREELIST_RANDOM=y in the kernel configuration) makes
|
||||
attacks on the kernel generally more difficult.
|
||||
|
||||
2. User program mitigation
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
User programs can mitigate Spectre variant 1 using LFENCE or "bounds
|
||||
clipping". For more details see :ref:`[2] <spec_ref2>`.
|
||||
|
||||
For Spectre variant 2 mitigation, individual user programs
|
||||
can be compiled with return trampolines for indirect branches.
|
||||
This protects them from consuming poisoned entries in the branch
|
||||
target buffer left by malicious software. Alternatively, the
|
||||
programs can disable their indirect branch speculation via prctl()
|
||||
(See :ref:`Documentation/userspace-api/spec_ctrl.rst <set_spec_ctrl>`).
|
||||
On x86, this will turn on STIBP to guard against attacks from the
|
||||
sibling thread when the user program is running, and use IBPB to
|
||||
flush the branch target buffer when switching to/from the program.
|
||||
|
||||
Restricting indirect branch speculation on a user program will
|
||||
also prevent the program from launching a variant 2 attack
|
||||
on x86. All sand-boxed SECCOMP programs have indirect branch
|
||||
speculation restricted by default. Administrators can change
|
||||
that behavior via the kernel command line and sysfs control files.
|
||||
See :ref:`spectre_mitigation_control_command_line`.
|
||||
|
||||
Programs that disable their indirect branch speculation will have
|
||||
more overhead and run slower.
|
||||
|
||||
User programs should use address space randomization
|
||||
(/proc/sys/kernel/randomize_va_space = 1 or 2) to make attacks more
|
||||
difficult.
|
||||
|
||||
3. VM mitigation
|
||||
^^^^^^^^^^^^^^^^
|
||||
|
||||
Within the kernel, Spectre variant 1 attacks from rogue guests are
|
||||
mitigated on a case by case basis in VM exit paths. Vulnerable code
|
||||
uses nospec accessor macros for "bounds clipping", to avoid any
|
||||
usable disclosure gadgets. However, this may not cover all variant
|
||||
1 attack vectors.
|
||||
|
||||
For Spectre variant 2 attacks from rogue guests to the kernel, the
|
||||
Linux kernel uses retpoline or Enhanced IBRS to prevent consumption of
|
||||
poisoned entries in branch target buffer left by rogue guests. It also
|
||||
flushes the return stack buffer on every VM exit to prevent a return
|
||||
stack buffer underflow so poisoned branch target buffer could be used,
|
||||
or attacker guests leaving poisoned entries in the return stack buffer.
|
||||
|
||||
To mitigate guest-to-guest attacks in the same CPU hardware thread,
|
||||
the branch target buffer is sanitized by flushing before switching
|
||||
to a new guest on a CPU.
|
||||
|
||||
The above mitigations are turned on by default on vulnerable CPUs.
|
||||
|
||||
To mitigate guest-to-guest attacks from sibling thread when SMT is
|
||||
in use, an untrusted guest running in the sibling thread can have
|
||||
its indirect branch speculation disabled by administrator via prctl().
|
||||
|
||||
The kernel also allows guests to use any microcode based mitigation
|
||||
they choose to use (such as IBPB or STIBP on x86) to protect themselves.
|
||||
|
||||
.. _spectre_mitigation_control_command_line:
|
||||
|
||||
Mitigation control on the kernel command line
|
||||
---------------------------------------------
|
||||
|
||||
Spectre variant 2 mitigation can be disabled or force enabled at the
|
||||
kernel command line.
|
||||
|
||||
nospectre_v1
|
||||
|
||||
[X86,PPC] Disable mitigations for Spectre Variant 1
|
||||
(bounds check bypass). With this option data leaks are
|
||||
possible in the system.
|
||||
|
||||
nospectre_v2
|
||||
|
||||
[X86] Disable all mitigations for the Spectre variant 2
|
||||
(indirect branch prediction) vulnerability. System may
|
||||
allow data leaks with this option, which is equivalent
|
||||
to spectre_v2=off.
|
||||
|
||||
|
||||
spectre_v2=
|
||||
|
||||
[X86] Control mitigation of Spectre variant 2
|
||||
(indirect branch speculation) vulnerability.
|
||||
The default operation protects the kernel from
|
||||
user space attacks.
|
||||
|
||||
on
|
||||
unconditionally enable, implies
|
||||
spectre_v2_user=on
|
||||
off
|
||||
unconditionally disable, implies
|
||||
spectre_v2_user=off
|
||||
auto
|
||||
kernel detects whether your CPU model is
|
||||
vulnerable
|
||||
|
||||
Selecting 'on' will, and 'auto' may, choose a
|
||||
mitigation method at run time according to the
|
||||
CPU, the available microcode, the setting of the
|
||||
CONFIG_RETPOLINE configuration option, and the
|
||||
compiler with which the kernel was built.
|
||||
|
||||
Selecting 'on' will also enable the mitigation
|
||||
against user space to user space task attacks.
|
||||
|
||||
Selecting 'off' will disable both the kernel and
|
||||
the user space protections.
|
||||
|
||||
Specific mitigations can also be selected manually:
|
||||
|
||||
retpoline
|
||||
replace indirect branches
|
||||
retpoline,generic
|
||||
google's original retpoline
|
||||
retpoline,amd
|
||||
AMD-specific minimal thunk
|
||||
|
||||
Not specifying this option is equivalent to
|
||||
spectre_v2=auto.
|
||||
|
||||
For user space mitigation:
|
||||
|
||||
spectre_v2_user=
|
||||
|
||||
[X86] Control mitigation of Spectre variant 2
|
||||
(indirect branch speculation) vulnerability between
|
||||
user space tasks
|
||||
|
||||
on
|
||||
Unconditionally enable mitigations. Is
|
||||
enforced by spectre_v2=on
|
||||
|
||||
off
|
||||
Unconditionally disable mitigations. Is
|
||||
enforced by spectre_v2=off
|
||||
|
||||
prctl
|
||||
Indirect branch speculation is enabled,
|
||||
but mitigation can be enabled via prctl
|
||||
per thread. The mitigation control state
|
||||
is inherited on fork.
|
||||
|
||||
prctl,ibpb
|
||||
Like "prctl" above, but only STIBP is
|
||||
controlled per thread. IBPB is issued
|
||||
always when switching between different user
|
||||
space processes.
|
||||
|
||||
seccomp
|
||||
Same as "prctl" above, but all seccomp
|
||||
threads will enable the mitigation unless
|
||||
they explicitly opt out.
|
||||
|
||||
seccomp,ibpb
|
||||
Like "seccomp" above, but only STIBP is
|
||||
controlled per thread. IBPB is issued
|
||||
always when switching between different
|
||||
user space processes.
|
||||
|
||||
auto
|
||||
Kernel selects the mitigation depending on
|
||||
the available CPU features and vulnerability.
|
||||
|
||||
Default mitigation:
|
||||
If CONFIG_SECCOMP=y then "seccomp", otherwise "prctl"
|
||||
|
||||
Not specifying this option is equivalent to
|
||||
spectre_v2_user=auto.
|
||||
|
||||
In general the kernel by default selects
|
||||
reasonable mitigations for the current CPU. To
|
||||
disable Spectre variant 2 mitigations, boot with
|
||||
spectre_v2=off. Spectre variant 1 mitigations
|
||||
cannot be disabled.
|
||||
|
||||
Mitigation selection guide
|
||||
--------------------------
|
||||
|
||||
1. Trusted userspace
|
||||
^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
If all userspace applications are from trusted sources and do not
|
||||
execute externally supplied untrusted code, then the mitigations can
|
||||
be disabled.
|
||||
|
||||
2. Protect sensitive programs
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
For security-sensitive programs that have secrets (e.g. crypto
|
||||
keys), protection against Spectre variant 2 can be put in place by
|
||||
disabling indirect branch speculation when the program is running
|
||||
(See :ref:`Documentation/userspace-api/spec_ctrl.rst <set_spec_ctrl>`).
|
||||
|
||||
3. Sandbox untrusted programs
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Untrusted programs that could be a source of attacks can be cordoned
|
||||
off by disabling their indirect branch speculation when they are run
|
||||
(See :ref:`Documentation/userspace-api/spec_ctrl.rst <set_spec_ctrl>`).
|
||||
This prevents untrusted programs from polluting the branch target
|
||||
buffer. All programs running in SECCOMP sandboxes have indirect
|
||||
branch speculation restricted by default. This behavior can be
|
||||
changed via the kernel command line and sysfs control files. See
|
||||
:ref:`spectre_mitigation_control_command_line`.
|
||||
|
||||
3. High security mode
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
All Spectre variant 2 mitigations can be forced on
|
||||
at boot time for all programs (See the "on" option in
|
||||
:ref:`spectre_mitigation_control_command_line`). This will add
|
||||
overhead as indirect branch speculations for all programs will be
|
||||
restricted.
|
||||
|
||||
On x86, branch target buffer will be flushed with IBPB when switching
|
||||
to a new program. STIBP is left on all the time to protect programs
|
||||
against variant 2 attacks originating from programs running on
|
||||
sibling threads.
|
||||
|
||||
Alternatively, STIBP can be used only when running programs
|
||||
whose indirect branch speculation is explicitly disabled,
|
||||
while IBPB is still used all the time when switching to a new
|
||||
program to clear the branch target buffer (See "ibpb" option in
|
||||
:ref:`spectre_mitigation_control_command_line`). This "ibpb" option
|
||||
has less performance cost than the "on" option, which leaves STIBP
|
||||
on all the time.
|
||||
|
||||
References on Spectre
|
||||
---------------------
|
||||
|
||||
Intel white papers:
|
||||
|
||||
.. _spec_ref1:
|
||||
|
||||
[1] `Intel analysis of speculative execution side channels <https://newsroom.intel.com/wp-content/uploads/sites/11/2018/01/Intel-Analysis-of-Speculative-Execution-Side-Channels.pdf>`_.
|
||||
|
||||
.. _spec_ref2:
|
||||
|
||||
[2] `Bounds check bypass <https://software.intel.com/security-software-guidance/software-guidance/bounds-check-bypass>`_.
|
||||
|
||||
.. _spec_ref3:
|
||||
|
||||
[3] `Deep dive: Retpoline: A branch target injection mitigation <https://software.intel.com/security-software-guidance/insights/deep-dive-retpoline-branch-target-injection-mitigation>`_.
|
||||
|
||||
.. _spec_ref4:
|
||||
|
||||
[4] `Deep Dive: Single Thread Indirect Branch Predictors <https://software.intel.com/security-software-guidance/insights/deep-dive-single-thread-indirect-branch-predictors>`_.
|
||||
|
||||
AMD white papers:
|
||||
|
||||
.. _spec_ref5:
|
||||
|
||||
[5] `AMD64 technology indirect branch control extension <https://developer.amd.com/wp-content/resources/Architecture_Guidelines_Update_Indirect_Branch_Control.pdf>`_.
|
||||
|
||||
.. _spec_ref6:
|
||||
|
||||
[6] `Software techniques for managing speculation on AMD processors <https://developer.amd.com/wp-content/resources/90343-B_SoftwareTechniquesforManagingSpeculation_WP_7-18Update_FNL.pdf>`_.
|
||||
|
||||
ARM white papers:
|
||||
|
||||
.. _spec_ref7:
|
||||
|
||||
[7] `Cache speculation side-channels <https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability/download-the-whitepaper>`_.
|
||||
|
||||
.. _spec_ref8:
|
||||
|
||||
[8] `Cache speculation issues update <https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability/latest-updates/cache-speculation-issues-update>`_.
|
||||
|
||||
Google white paper:
|
||||
|
||||
.. _spec_ref9:
|
||||
|
||||
[9] `Retpoline: a software construct for preventing branch-target-injection <https://support.google.com/faqs/answer/7625886>`_.
|
||||
|
||||
MIPS white paper:
|
||||
|
||||
.. _spec_ref10:
|
||||
|
||||
[10] `MIPS: response on speculative execution and side channel vulnerabilities <https://www.mips.com/blog/mips-response-on-speculative-execution-and-side-channel-vulnerabilities/>`_.
|
||||
|
||||
Academic papers:
|
||||
|
||||
.. _spec_ref11:
|
||||
|
||||
[11] `Spectre Attacks: Exploiting Speculative Execution <https://spectreattack.com/spectre.pdf>`_.
|
||||
|
||||
.. _spec_ref12:
|
||||
|
||||
[12] `NetSpectre: Read Arbitrary Memory over Network <https://arxiv.org/abs/1807.10535>`_.
|
||||
|
||||
.. _spec_ref13:
|
||||
|
||||
[13] `Spectre Returns! Speculation Attacks using the Return Stack Buffer <https://www.usenix.org/system/files/conference/woot18/woot18-paper-koruyeh.pdf>`_.
|
||||
@@ -1,279 +0,0 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
TAA - TSX Asynchronous Abort
|
||||
======================================
|
||||
|
||||
TAA is a hardware vulnerability that allows unprivileged speculative access to
|
||||
data which is available in various CPU internal buffers by using asynchronous
|
||||
aborts within an Intel TSX transactional region.
|
||||
|
||||
Affected processors
|
||||
-------------------
|
||||
|
||||
This vulnerability only affects Intel processors that support Intel
|
||||
Transactional Synchronization Extensions (TSX) when the TAA_NO bit (bit 8)
|
||||
is 0 in the IA32_ARCH_CAPABILITIES MSR. On processors where the MDS_NO bit
|
||||
(bit 5) is 0 in the IA32_ARCH_CAPABILITIES MSR, the existing MDS mitigations
|
||||
also mitigate against TAA.
|
||||
|
||||
Whether a processor is affected or not can be read out from the TAA
|
||||
vulnerability file in sysfs. See :ref:`tsx_async_abort_sys_info`.
|
||||
|
||||
Related CVEs
|
||||
------------
|
||||
|
||||
The following CVE entry is related to this TAA issue:
|
||||
|
||||
============== ===== ===================================================
|
||||
CVE-2019-11135 TAA TSX Asynchronous Abort (TAA) condition on some
|
||||
microprocessors utilizing speculative execution may
|
||||
allow an authenticated user to potentially enable
|
||||
information disclosure via a side channel with
|
||||
local access.
|
||||
============== ===== ===================================================
|
||||
|
||||
Problem
|
||||
-------
|
||||
|
||||
When performing store, load or L1 refill operations, processors write
|
||||
data into temporary microarchitectural structures (buffers). The data in
|
||||
those buffers can be forwarded to load operations as an optimization.
|
||||
|
||||
Intel TSX is an extension to the x86 instruction set architecture that adds
|
||||
hardware transactional memory support to improve performance of multi-threaded
|
||||
software. TSX lets the processor expose and exploit concurrency hidden in an
|
||||
application due to dynamically avoiding unnecessary synchronization.
|
||||
|
||||
TSX supports atomic memory transactions that are either committed (success) or
|
||||
aborted. During an abort, operations that happened within the transactional region
|
||||
are rolled back. An asynchronous abort takes place, among other options, when a
|
||||
different thread accesses a cache line that is also used within the transactional
|
||||
region when that access might lead to a data race.
|
||||
|
||||
Immediately after an uncompleted asynchronous abort, certain speculatively
|
||||
executed loads may read data from those internal buffers and pass it to dependent
|
||||
operations. This can be then used to infer the value via a cache side channel
|
||||
attack.
|
||||
|
||||
Because the buffers are potentially shared between Hyper-Threads cross
|
||||
Hyper-Thread attacks are possible.
|
||||
|
||||
The victim of a malicious actor does not need to make use of TSX. Only the
|
||||
attacker needs to begin a TSX transaction and raise an asynchronous abort
|
||||
which in turn potenitally leaks data stored in the buffers.
|
||||
|
||||
More detailed technical information is available in the TAA specific x86
|
||||
architecture section: :ref:`Documentation/x86/tsx_async_abort.rst <tsx_async_abort>`.
|
||||
|
||||
|
||||
Attack scenarios
|
||||
----------------
|
||||
|
||||
Attacks against the TAA vulnerability can be implemented from unprivileged
|
||||
applications running on hosts or guests.
|
||||
|
||||
As for MDS, the attacker has no control over the memory addresses that can
|
||||
be leaked. Only the victim is responsible for bringing data to the CPU. As
|
||||
a result, the malicious actor has to sample as much data as possible and
|
||||
then postprocess it to try to infer any useful information from it.
|
||||
|
||||
A potential attacker only has read access to the data. Also, there is no direct
|
||||
privilege escalation by using this technique.
|
||||
|
||||
|
||||
.. _tsx_async_abort_sys_info:
|
||||
|
||||
TAA system information
|
||||
-----------------------
|
||||
|
||||
The Linux kernel provides a sysfs interface to enumerate the current TAA status
|
||||
of mitigated systems. The relevant sysfs file is:
|
||||
|
||||
/sys/devices/system/cpu/vulnerabilities/tsx_async_abort
|
||||
|
||||
The possible values in this file are:
|
||||
|
||||
.. list-table::
|
||||
|
||||
* - 'Vulnerable'
|
||||
- The CPU is affected by this vulnerability and the microcode and kernel mitigation are not applied.
|
||||
* - 'Vulnerable: Clear CPU buffers attempted, no microcode'
|
||||
- The system tries to clear the buffers but the microcode might not support the operation.
|
||||
* - 'Mitigation: Clear CPU buffers'
|
||||
- The microcode has been updated to clear the buffers. TSX is still enabled.
|
||||
* - 'Mitigation: TSX disabled'
|
||||
- TSX is disabled.
|
||||
* - 'Not affected'
|
||||
- The CPU is not affected by this issue.
|
||||
|
||||
.. _ucode_needed:
|
||||
|
||||
Best effort mitigation mode
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
If the processor is vulnerable, but the availability of the microcode-based
|
||||
mitigation mechanism is not advertised via CPUID the kernel selects a best
|
||||
effort mitigation mode. This mode invokes the mitigation instructions
|
||||
without a guarantee that they clear the CPU buffers.
|
||||
|
||||
This is done to address virtualization scenarios where the host has the
|
||||
microcode update applied, but the hypervisor is not yet updated to expose the
|
||||
CPUID to the guest. If the host has updated microcode the protection takes
|
||||
effect; otherwise a few CPU cycles are wasted pointlessly.
|
||||
|
||||
The state in the tsx_async_abort sysfs file reflects this situation
|
||||
accordingly.
|
||||
|
||||
|
||||
Mitigation mechanism
|
||||
--------------------
|
||||
|
||||
The kernel detects the affected CPUs and the presence of the microcode which is
|
||||
required. If a CPU is affected and the microcode is available, then the kernel
|
||||
enables the mitigation by default.
|
||||
|
||||
|
||||
The mitigation can be controlled at boot time via a kernel command line option.
|
||||
See :ref:`taa_mitigation_control_command_line`.
|
||||
|
||||
.. _virt_mechanism:
|
||||
|
||||
Virtualization mitigation
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Affected systems where the host has TAA microcode and TAA is mitigated by
|
||||
having disabled TSX previously, are not vulnerable regardless of the status
|
||||
of the VMs.
|
||||
|
||||
In all other cases, if the host either does not have the TAA microcode or
|
||||
the kernel is not mitigated, the system might be vulnerable.
|
||||
|
||||
|
||||
.. _taa_mitigation_control_command_line:
|
||||
|
||||
Mitigation control on the kernel command line
|
||||
---------------------------------------------
|
||||
|
||||
The kernel command line allows to control the TAA mitigations at boot time with
|
||||
the option "tsx_async_abort=". The valid arguments for this option are:
|
||||
|
||||
============ =============================================================
|
||||
off This option disables the TAA mitigation on affected platforms.
|
||||
If the system has TSX enabled (see next parameter) and the CPU
|
||||
is affected, the system is vulnerable.
|
||||
|
||||
full TAA mitigation is enabled. If TSX is enabled, on an affected
|
||||
system it will clear CPU buffers on ring transitions. On
|
||||
systems which are MDS-affected and deploy MDS mitigation,
|
||||
TAA is also mitigated. Specifying this option on those
|
||||
systems will have no effect.
|
||||
|
||||
full,nosmt The same as tsx_async_abort=full, with SMT disabled on
|
||||
vulnerable CPUs that have TSX enabled. This is the complete
|
||||
mitigation. When TSX is disabled, SMT is not disabled because
|
||||
CPU is not vulnerable to cross-thread TAA attacks.
|
||||
============ =============================================================
|
||||
|
||||
Not specifying this option is equivalent to "tsx_async_abort=full". For
|
||||
processors that are affected by both TAA and MDS, specifying just
|
||||
"tsx_async_abort=off" without an accompanying "mds=off" will have no
|
||||
effect as the same mitigation is used for both vulnerabilities.
|
||||
|
||||
The kernel command line also allows to control the TSX feature using the
|
||||
parameter "tsx=" on CPUs which support TSX control. MSR_IA32_TSX_CTRL is used
|
||||
to control the TSX feature and the enumeration of the TSX feature bits (RTM
|
||||
and HLE) in CPUID.
|
||||
|
||||
The valid options are:
|
||||
|
||||
============ =============================================================
|
||||
off Disables TSX on the system.
|
||||
|
||||
Note that this option takes effect only on newer CPUs which are
|
||||
not vulnerable to MDS, i.e., have MSR_IA32_ARCH_CAPABILITIES.MDS_NO=1
|
||||
and which get the new IA32_TSX_CTRL MSR through a microcode
|
||||
update. This new MSR allows for the reliable deactivation of
|
||||
the TSX functionality.
|
||||
|
||||
on Enables TSX.
|
||||
|
||||
Although there are mitigations for all known security
|
||||
vulnerabilities, TSX has been known to be an accelerator for
|
||||
several previous speculation-related CVEs, and so there may be
|
||||
unknown security risks associated with leaving it enabled.
|
||||
|
||||
auto Disables TSX if X86_BUG_TAA is present, otherwise enables TSX
|
||||
on the system.
|
||||
============ =============================================================
|
||||
|
||||
Not specifying this option is equivalent to "tsx=off".
|
||||
|
||||
The following combinations of the "tsx_async_abort" and "tsx" are possible. For
|
||||
affected platforms tsx=auto is equivalent to tsx=off and the result will be:
|
||||
|
||||
========= ========================== =========================================
|
||||
tsx=on tsx_async_abort=full The system will use VERW to clear CPU
|
||||
buffers. Cross-thread attacks are still
|
||||
possible on SMT machines.
|
||||
tsx=on tsx_async_abort=full,nosmt As above, cross-thread attacks on SMT
|
||||
mitigated.
|
||||
tsx=on tsx_async_abort=off The system is vulnerable.
|
||||
tsx=off tsx_async_abort=full TSX might be disabled if microcode
|
||||
provides a TSX control MSR. If so,
|
||||
system is not vulnerable.
|
||||
tsx=off tsx_async_abort=full,nosmt Ditto
|
||||
tsx=off tsx_async_abort=off ditto
|
||||
========= ========================== =========================================
|
||||
|
||||
|
||||
For unaffected platforms "tsx=on" and "tsx_async_abort=full" does not clear CPU
|
||||
buffers. For platforms without TSX control (MSR_IA32_ARCH_CAPABILITIES.MDS_NO=0)
|
||||
"tsx" command line argument has no effect.
|
||||
|
||||
For the affected platforms below table indicates the mitigation status for the
|
||||
combinations of CPUID bit MD_CLEAR and IA32_ARCH_CAPABILITIES MSR bits MDS_NO
|
||||
and TSX_CTRL_MSR.
|
||||
|
||||
======= ========= ============= ========================================
|
||||
MDS_NO MD_CLEAR TSX_CTRL_MSR Status
|
||||
======= ========= ============= ========================================
|
||||
0 0 0 Vulnerable (needs microcode)
|
||||
0 1 0 MDS and TAA mitigated via VERW
|
||||
1 1 0 MDS fixed, TAA vulnerable if TSX enabled
|
||||
because MD_CLEAR has no meaning and
|
||||
VERW is not guaranteed to clear buffers
|
||||
1 X 1 MDS fixed, TAA can be mitigated by
|
||||
VERW or TSX_CTRL_MSR
|
||||
======= ========= ============= ========================================
|
||||
|
||||
Mitigation selection guide
|
||||
--------------------------
|
||||
|
||||
1. Trusted userspace and guests
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
If all user space applications are from a trusted source and do not execute
|
||||
untrusted code which is supplied externally, then the mitigation can be
|
||||
disabled. The same applies to virtualized environments with trusted guests.
|
||||
|
||||
|
||||
2. Untrusted userspace and guests
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
If there are untrusted applications or guests on the system, enabling TSX
|
||||
might allow a malicious actor to leak data from the host or from other
|
||||
processes running on the same physical core.
|
||||
|
||||
If the microcode is available and the TSX is disabled on the host, attacks
|
||||
are prevented in a virtualized environment as well, even if the VMs do not
|
||||
explicitly enable the mitigation.
|
||||
|
||||
|
||||
.. _taa_default_mitigations:
|
||||
|
||||
Default mitigations
|
||||
-------------------
|
||||
|
||||
The kernel's default action for vulnerable processors is:
|
||||
|
||||
- Deploy TSX disable mitigation (tsx_async_abort=full tsx=off).
|
||||
@@ -17,12 +17,14 @@ etc.
|
||||
kernel-parameters
|
||||
devices
|
||||
|
||||
This section describes CPU vulnerabilities and their mitigations.
|
||||
This section describes CPU vulnerabilities and provides an overview of the
|
||||
possible mitigations along with guidance for selecting mitigations if they
|
||||
are configurable at compile, boot or run time.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
hw-vuln/index
|
||||
l1tf
|
||||
|
||||
Here is a set of documents aimed at users who are trying to track down
|
||||
problems and bugs in particular.
|
||||
|
||||
@@ -126,7 +126,6 @@ parameter is applicable::
|
||||
NET Appropriate network support is enabled.
|
||||
NUMA NUMA support is enabled.
|
||||
NFS Appropriate NFS support is enabled.
|
||||
OF Devicetree is enabled.
|
||||
OSS OSS sound support is enabled.
|
||||
PV_OPS A paravirtualized kernel is enabled.
|
||||
PARIDE The ParIDE (parallel port IDE) subsystem is enabled.
|
||||
|
||||
@@ -113,7 +113,7 @@
|
||||
the GPE dispatcher.
|
||||
This facility can be used to prevent such uncontrolled
|
||||
GPE floodings.
|
||||
Format: <byte>
|
||||
Format: <int>
|
||||
|
||||
acpi_no_auto_serialize [HW,ACPI]
|
||||
Disable auto-serialization of AML methods
|
||||
@@ -136,10 +136,6 @@
|
||||
dynamic table installation which will install SSDT
|
||||
tables to /sys/firmware/acpi/tables/dynamic.
|
||||
|
||||
acpi_no_watchdog [HW,ACPI,WDT]
|
||||
Ignore the ACPI-based watchdog interface (WDAT) and let
|
||||
a native driver control the watchdog device instead.
|
||||
|
||||
acpi_rsdp= [ACPI,EFI,KEXEC]
|
||||
Pass the RSDP address to the kernel, mostly used
|
||||
on machines running EFI runtime service to boot the
|
||||
@@ -490,14 +486,10 @@
|
||||
cut the overhead, others just disable the usage. So
|
||||
only cgroup_disable=memory is actually worthy}
|
||||
|
||||
cgroup_no_v1= [KNL] Disable cgroup controllers and named hierarchies in v1
|
||||
Format: { { controller | "all" | "named" }
|
||||
[,{ controller | "all" | "named" }...] }
|
||||
cgroup_no_v1= [KNL] Disable one, multiple, all cgroup controllers in v1
|
||||
Format: { controller[,controller...] | "all" }
|
||||
Like cgroup_disable, but only applies to cgroup v1;
|
||||
the blacklisted controllers remain available in cgroup2.
|
||||
"all" blacklists all controllers and "named" disables
|
||||
named mounts. Specifying both "all" and "named" disables
|
||||
all v1 hierarchies.
|
||||
|
||||
cgroup.memory= [KNL] Pass options to the cgroup memory controller.
|
||||
Format: <string>
|
||||
@@ -562,7 +554,7 @@
|
||||
loops can be debugged more effectively on production
|
||||
systems.
|
||||
|
||||
clearcpuid=BITNUM[,BITNUM...] [X86]
|
||||
clearcpuid=BITNUM [X86]
|
||||
Disable CPUID feature X for the kernel. See
|
||||
arch/x86/include/asm/cpufeatures.h for the valid bit
|
||||
numbers. Note the Linux specific bits are not necessarily
|
||||
@@ -905,10 +897,6 @@
|
||||
The filter can be disabled or changed to another
|
||||
driver later using sysfs.
|
||||
|
||||
driver_async_probe= [KNL]
|
||||
List of driver names to be probed asynchronously.
|
||||
Format: <driver_name1>,<driver_name2>...
|
||||
|
||||
drm.edid_firmware=[<connector>:]<file>[,[<connector>:]<file>]
|
||||
Broken monitors, graphic adapters, KVMs and EDIDless
|
||||
panels may send no or incorrect EDID data sets.
|
||||
@@ -1075,7 +1063,7 @@
|
||||
earlyprintk=serial[,0x...[,baudrate]]
|
||||
earlyprintk=ttySn[,baudrate]
|
||||
earlyprintk=dbgp[debugController#]
|
||||
earlyprintk=pciserial[,force],bus:device.function[,baudrate]
|
||||
earlyprintk=pciserial,bus:device.function[,baudrate]
|
||||
earlyprintk=xdbc[xhciController#]
|
||||
|
||||
earlyprintk is useful when the kernel crashes before
|
||||
@@ -1107,10 +1095,6 @@
|
||||
|
||||
The sclp output can only be used on s390.
|
||||
|
||||
The optional "force" to "pciserial" enables use of a
|
||||
PCI device even when its classcode is not of the
|
||||
UART class.
|
||||
|
||||
edac_report= [HW,EDAC] Control how to report EDAC event
|
||||
Format: {"on" | "off" | "force"}
|
||||
on: enable EDAC to report H/W event. May be overridden
|
||||
@@ -1643,15 +1627,6 @@
|
||||
|
||||
initrd= [BOOT] Specify the location of the initial ramdisk
|
||||
|
||||
init_on_alloc= [MM] Fill newly allocated pages and heap objects with
|
||||
zeroes.
|
||||
Format: 0 | 1
|
||||
Default set by CONFIG_INIT_ON_ALLOC_DEFAULT_ON.
|
||||
|
||||
init_on_free= [MM] Fill freed pages and heap objects with zeroes.
|
||||
Format: 0 | 1
|
||||
Default set by CONFIG_INIT_ON_FREE_DEFAULT_ON.
|
||||
|
||||
init_pkru= [x86] Specify the default memory protection keys rights
|
||||
register contents for all processes. 0x55555554 by
|
||||
default (disallow access to all but pkey 0). Can
|
||||
@@ -1967,12 +1942,6 @@
|
||||
Built with CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y,
|
||||
the default is off.
|
||||
|
||||
kpti= [ARM64] Control page table isolation of user
|
||||
and kernel address spaces.
|
||||
Default: enabled on cores which need mitigation.
|
||||
0: force disabled
|
||||
1: force enabled
|
||||
|
||||
kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs.
|
||||
Default is 0 (don't ignore, but inject #GP)
|
||||
|
||||
@@ -1983,25 +1952,6 @@
|
||||
KVM MMU at runtime.
|
||||
Default is 0 (off)
|
||||
|
||||
kvm.nx_huge_pages=
|
||||
[KVM] Controls the software workaround for the
|
||||
X86_BUG_ITLB_MULTIHIT bug.
|
||||
force : Always deploy workaround.
|
||||
off : Never deploy workaround.
|
||||
auto : Deploy workaround based on the presence of
|
||||
X86_BUG_ITLB_MULTIHIT.
|
||||
|
||||
Default is 'auto'.
|
||||
|
||||
If the software workaround is enabled for the host,
|
||||
guests do need not to enable it for nested guests.
|
||||
|
||||
kvm.nx_huge_pages_recovery_ratio=
|
||||
[KVM] Controls how many 4KiB pages are periodically zapped
|
||||
back to huge pages. 0 disables the recovery, otherwise if
|
||||
the value is N KVM will zap 1/Nth of the 4KiB pages every
|
||||
minute. The default is 60.
|
||||
|
||||
kvm-amd.nested= [KVM,AMD] Allow nested virtualization in KVM/SVM.
|
||||
Default is 1 (enabled)
|
||||
|
||||
@@ -2119,13 +2069,10 @@
|
||||
off
|
||||
Disables hypervisor mitigations and doesn't
|
||||
emit any warnings.
|
||||
It also drops the swap size and available
|
||||
RAM limit restriction on both hypervisor and
|
||||
bare metal.
|
||||
|
||||
Default is 'flush'.
|
||||
|
||||
For details see: Documentation/admin-guide/hw-vuln/l1tf.rst
|
||||
For details see: Documentation/admin-guide/l1tf.rst
|
||||
|
||||
l2cr= [PPC]
|
||||
|
||||
@@ -2365,38 +2312,6 @@
|
||||
Format: <first>,<last>
|
||||
Specifies range of consoles to be captured by the MDA.
|
||||
|
||||
mds= [X86,INTEL]
|
||||
Control mitigation for the Micro-architectural Data
|
||||
Sampling (MDS) vulnerability.
|
||||
|
||||
Certain CPUs are vulnerable to an exploit against CPU
|
||||
internal buffers which can forward information to a
|
||||
disclosure gadget under certain conditions.
|
||||
|
||||
In vulnerable processors, the speculatively
|
||||
forwarded data can be used in a cache side channel
|
||||
attack, to access data to which the attacker does
|
||||
not have direct access.
|
||||
|
||||
This parameter controls the MDS mitigation. The
|
||||
options are:
|
||||
|
||||
full - Enable MDS mitigation on vulnerable CPUs
|
||||
full,nosmt - Enable MDS mitigation and disable
|
||||
SMT on vulnerable CPUs
|
||||
off - Unconditionally disable MDS mitigation
|
||||
|
||||
On TAA-affected machines, mds=off can be prevented by
|
||||
an active TAA mitigation as both vulnerabilities are
|
||||
mitigated with the same mechanism so in order to disable
|
||||
this mitigation, you need to specify tsx_async_abort=off
|
||||
too.
|
||||
|
||||
Not specifying this option is equivalent to
|
||||
mds=full.
|
||||
|
||||
For details see: Documentation/admin-guide/hw-vuln/mds.rst
|
||||
|
||||
mem=nn[KMG] [KNL,BOOT] Force usage of a specific amount of memory
|
||||
Amount of memory to be used when the kernel is not able
|
||||
to see the whole system memory or for test.
|
||||
@@ -2554,53 +2469,6 @@
|
||||
in the "bleeding edge" mini2440 support kernel at
|
||||
http://repo.or.cz/w/linux-2.6/mini2440.git
|
||||
|
||||
mitigations=
|
||||
[X86,PPC,S390,ARM64] Control optional mitigations for
|
||||
CPU vulnerabilities. This is a set of curated,
|
||||
arch-independent options, each of which is an
|
||||
aggregation of existing arch-specific options.
|
||||
|
||||
off
|
||||
Disable all optional CPU mitigations. This
|
||||
improves system performance, but it may also
|
||||
expose users to several CPU vulnerabilities.
|
||||
Equivalent to: nopti [X86,PPC]
|
||||
kpti=0 [ARM64]
|
||||
nospectre_v1 [PPC]
|
||||
nobp=0 [S390]
|
||||
nospectre_v1 [X86]
|
||||
nospectre_v2 [X86,PPC,S390,ARM64]
|
||||
spectre_v2_user=off [X86]
|
||||
spec_store_bypass_disable=off [X86,PPC]
|
||||
ssbd=force-off [ARM64]
|
||||
l1tf=off [X86]
|
||||
mds=off [X86]
|
||||
tsx_async_abort=off [X86]
|
||||
kvm.nx_huge_pages=off [X86]
|
||||
no_entry_flush [PPC]
|
||||
no_uaccess_flush [PPC]
|
||||
|
||||
Exceptions:
|
||||
This does not have any effect on
|
||||
kvm.nx_huge_pages when
|
||||
kvm.nx_huge_pages=force.
|
||||
|
||||
auto (default)
|
||||
Mitigate all CPU vulnerabilities, but leave SMT
|
||||
enabled, even if it's vulnerable. This is for
|
||||
users who don't want to be surprised by SMT
|
||||
getting disabled across kernel upgrades, or who
|
||||
have other ways of avoiding SMT-based attacks.
|
||||
Equivalent to: (default behavior)
|
||||
|
||||
auto,nosmt
|
||||
Mitigate all CPU vulnerabilities, disabling SMT
|
||||
if needed. This is for users who always want to
|
||||
be fully mitigated, even if it means losing SMT.
|
||||
Equivalent to: l1tf=flush,nosmt [X86]
|
||||
mds=full,nosmt [X86]
|
||||
tsx_async_abort=full,nosmt [X86]
|
||||
|
||||
mminit_loglevel=
|
||||
[KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this
|
||||
parameter allows control of the logging verbosity for
|
||||
@@ -2889,8 +2757,6 @@
|
||||
|
||||
noefi Disable EFI runtime services support.
|
||||
|
||||
no_entry_flush [PPC] Don't flush the L1-D cache when entering the kernel.
|
||||
|
||||
noexec [IA-64]
|
||||
|
||||
noexec [X86]
|
||||
@@ -2928,21 +2794,18 @@
|
||||
nosmt=force: Force disable SMT, cannot be undone
|
||||
via the sysfs control file.
|
||||
|
||||
nospectre_v1 [X66, PPC] Disable mitigations for Spectre Variant 1
|
||||
(bounds check bypass). With this option data leaks
|
||||
are possible in the system.
|
||||
nospectre_v1 [PPC] Disable mitigations for Spectre Variant 1 (bounds
|
||||
check bypass). With this option data leaks are possible
|
||||
in the system.
|
||||
|
||||
nospectre_v2 [X86,PPC_FSL_BOOK3E,ARM64] Disable all mitigations for
|
||||
the Spectre variant 2 (indirect branch prediction)
|
||||
vulnerability. System may allow data leaks with this
|
||||
option.
|
||||
nospectre_v2 [X86] Disable all mitigations for the Spectre variant 2
|
||||
(indirect branch prediction) vulnerability. System may
|
||||
allow data leaks with this option, which is equivalent
|
||||
to spectre_v2=off.
|
||||
|
||||
nospec_store_bypass_disable
|
||||
[HW] Disable all mitigations for the Speculative Store Bypass vulnerability
|
||||
|
||||
no_uaccess_flush
|
||||
[PPC] Don't flush the L1-D cache after accessing user data.
|
||||
|
||||
noxsave [BUGS=X86] Disables x86 extended register state save
|
||||
and restore using xsave. The kernel will fallback to
|
||||
enabling legacy floating-point and sse state.
|
||||
@@ -3136,12 +2999,6 @@
|
||||
This can be set from sysctl after boot.
|
||||
See Documentation/sysctl/vm.txt for details.
|
||||
|
||||
of_devlink [OF, KNL] Create device links between consumer and
|
||||
supplier devices by scanning the devictree to infer the
|
||||
consumer/supplier relationships. A consumer device
|
||||
will not be probed until all the supplier devices have
|
||||
probed successfully.
|
||||
|
||||
ohci1394_dma=early [HW] enable debugging via the ohci1394 driver.
|
||||
See Documentation/debugging-via-ohci1394.txt for more
|
||||
info.
|
||||
@@ -3619,10 +3476,6 @@
|
||||
before loading.
|
||||
See Documentation/blockdev/ramdisk.txt.
|
||||
|
||||
psi= [KNL] Enable or disable pressure stall information
|
||||
tracking.
|
||||
Format: <bool>
|
||||
|
||||
psmouse.proto= [HW,MOUSE] Highest PS2 mouse protocol extension to
|
||||
probe for; one of (bare|imps|exps|lifebook|any).
|
||||
psmouse.rate= [HW,MOUSE] Set desired mouse report rate, in reports
|
||||
@@ -4027,13 +3880,6 @@
|
||||
Run specified binary instead of /init from the ramdisk,
|
||||
used for early userspace startup. See initrd.
|
||||
|
||||
rdrand= [X86]
|
||||
force - Override the decision by the kernel to hide the
|
||||
advertisement of RDRAND support (this affects
|
||||
certain AMD processors because of buggy BIOS
|
||||
support, specifically around the suspend/resume
|
||||
path).
|
||||
|
||||
rdt= [HW,X86,RDT]
|
||||
Turn on/off individual RDT features. List is:
|
||||
cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
|
||||
@@ -4047,9 +3893,7 @@
|
||||
[[,]s[mp]#### \
|
||||
[[,]b[ios] | a[cpi] | k[bd] | t[riple] | e[fi] | p[ci]] \
|
||||
[[,]f[orce]
|
||||
Where reboot_mode is one of warm (soft) or cold (hard) or gpio
|
||||
(prefix with 'panic_' to set mode for panic
|
||||
reboot only),
|
||||
Where reboot_mode is one of warm (soft) or cold (hard) or gpio,
|
||||
reboot_type is one of bios, acpi, kbd, triple, efi, or pci,
|
||||
reboot_force is either force or not specified,
|
||||
reboot_cpu is s[mp]#### with #### being the processor
|
||||
@@ -4321,13 +4165,9 @@
|
||||
|
||||
spectre_v2= [X86] Control mitigation of Spectre variant 2
|
||||
(indirect branch speculation) vulnerability.
|
||||
The default operation protects the kernel from
|
||||
user space attacks.
|
||||
|
||||
on - unconditionally enable, implies
|
||||
spectre_v2_user=on
|
||||
off - unconditionally disable, implies
|
||||
spectre_v2_user=off
|
||||
on - unconditionally enable
|
||||
off - unconditionally disable
|
||||
auto - kernel detects whether your CPU model is
|
||||
vulnerable
|
||||
|
||||
@@ -4337,12 +4177,6 @@
|
||||
CONFIG_RETPOLINE configuration option, and the
|
||||
compiler with which the kernel was built.
|
||||
|
||||
Selecting 'on' will also enable the mitigation
|
||||
against user space to user space task attacks.
|
||||
|
||||
Selecting 'off' will disable both the kernel and
|
||||
the user space protections.
|
||||
|
||||
Specific mitigations can also be selected manually:
|
||||
|
||||
retpoline - replace indirect branches
|
||||
@@ -4352,48 +4186,6 @@
|
||||
Not specifying this option is equivalent to
|
||||
spectre_v2=auto.
|
||||
|
||||
spectre_v2_user=
|
||||
[X86] Control mitigation of Spectre variant 2
|
||||
(indirect branch speculation) vulnerability between
|
||||
user space tasks
|
||||
|
||||
on - Unconditionally enable mitigations. Is
|
||||
enforced by spectre_v2=on
|
||||
|
||||
off - Unconditionally disable mitigations. Is
|
||||
enforced by spectre_v2=off
|
||||
|
||||
prctl - Indirect branch speculation is enabled,
|
||||
but mitigation can be enabled via prctl
|
||||
per thread. The mitigation control state
|
||||
is inherited on fork.
|
||||
|
||||
prctl,ibpb
|
||||
- Like "prctl" above, but only STIBP is
|
||||
controlled per thread. IBPB is issued
|
||||
always when switching between different user
|
||||
space processes.
|
||||
|
||||
seccomp
|
||||
- Same as "prctl" above, but all seccomp
|
||||
threads will enable the mitigation unless
|
||||
they explicitly opt out.
|
||||
|
||||
seccomp,ibpb
|
||||
- Like "seccomp" above, but only STIBP is
|
||||
controlled per thread. IBPB is issued
|
||||
always when switching between different
|
||||
user space processes.
|
||||
|
||||
auto - Kernel selects the mitigation depending on
|
||||
the available CPU features and vulnerability.
|
||||
|
||||
Default mitigation:
|
||||
If CONFIG_SECCOMP=y then "seccomp", otherwise "prctl"
|
||||
|
||||
Not specifying this option is equivalent to
|
||||
spectre_v2_user=auto.
|
||||
|
||||
spec_store_bypass_disable=
|
||||
[HW] Control Speculative Store Bypass (SSB) Disable mitigation
|
||||
(Speculative Store Bypass vulnerability)
|
||||
@@ -4451,26 +4243,6 @@
|
||||
spia_pedr=
|
||||
spia_peddr=
|
||||
|
||||
srbds= [X86,INTEL]
|
||||
Control the Special Register Buffer Data Sampling
|
||||
(SRBDS) mitigation.
|
||||
|
||||
Certain CPUs are vulnerable to an MDS-like
|
||||
exploit which can leak bits from the random
|
||||
number generator.
|
||||
|
||||
By default, this issue is mitigated by
|
||||
microcode. However, the microcode fix can cause
|
||||
the RDRAND and RDSEED instructions to become
|
||||
much slower. Among other effects, this will
|
||||
result in reduced throughput from /dev/urandom.
|
||||
|
||||
The microcode mitigation can be disabled with
|
||||
the following option:
|
||||
|
||||
off: Disable mitigation and remove
|
||||
performance impact to RDRAND and RDSEED
|
||||
|
||||
srcutree.counter_wrap_check [KNL]
|
||||
Specifies how frequently to check for
|
||||
grace-period sequence counter wrap for the
|
||||
@@ -4789,76 +4561,6 @@
|
||||
marks the TSC unconditionally unstable at bootup and
|
||||
avoids any further wobbles once the TSC watchdog notices.
|
||||
|
||||
tsx= [X86] Control Transactional Synchronization
|
||||
Extensions (TSX) feature in Intel processors that
|
||||
support TSX control.
|
||||
|
||||
This parameter controls the TSX feature. The options are:
|
||||
|
||||
on - Enable TSX on the system. Although there are
|
||||
mitigations for all known security vulnerabilities,
|
||||
TSX has been known to be an accelerator for
|
||||
several previous speculation-related CVEs, and
|
||||
so there may be unknown security risks associated
|
||||
with leaving it enabled.
|
||||
|
||||
off - Disable TSX on the system. (Note that this
|
||||
option takes effect only on newer CPUs which are
|
||||
not vulnerable to MDS, i.e., have
|
||||
MSR_IA32_ARCH_CAPABILITIES.MDS_NO=1 and which get
|
||||
the new IA32_TSX_CTRL MSR through a microcode
|
||||
update. This new MSR allows for the reliable
|
||||
deactivation of the TSX functionality.)
|
||||
|
||||
auto - Disable TSX if X86_BUG_TAA is present,
|
||||
otherwise enable TSX on the system.
|
||||
|
||||
Not specifying this option is equivalent to tsx=off.
|
||||
|
||||
See Documentation/admin-guide/hw-vuln/tsx_async_abort.rst
|
||||
for more details.
|
||||
|
||||
tsx_async_abort= [X86,INTEL] Control mitigation for the TSX Async
|
||||
Abort (TAA) vulnerability.
|
||||
|
||||
Similar to Micro-architectural Data Sampling (MDS)
|
||||
certain CPUs that support Transactional
|
||||
Synchronization Extensions (TSX) are vulnerable to an
|
||||
exploit against CPU internal buffers which can forward
|
||||
information to a disclosure gadget under certain
|
||||
conditions.
|
||||
|
||||
In vulnerable processors, the speculatively forwarded
|
||||
data can be used in a cache side channel attack, to
|
||||
access data to which the attacker does not have direct
|
||||
access.
|
||||
|
||||
This parameter controls the TAA mitigation. The
|
||||
options are:
|
||||
|
||||
full - Enable TAA mitigation on vulnerable CPUs
|
||||
if TSX is enabled.
|
||||
|
||||
full,nosmt - Enable TAA mitigation and disable SMT on
|
||||
vulnerable CPUs. If TSX is disabled, SMT
|
||||
is not disabled because CPU is not
|
||||
vulnerable to cross-thread TAA attacks.
|
||||
off - Unconditionally disable TAA mitigation
|
||||
|
||||
On MDS-affected machines, tsx_async_abort=off can be
|
||||
prevented by an active MDS mitigation as both vulnerabilities
|
||||
are mitigated with the same mechanism so in order to disable
|
||||
this mitigation, you need to specify mds=off too.
|
||||
|
||||
Not specifying this option is equivalent to
|
||||
tsx_async_abort=full. On CPUs which are MDS affected
|
||||
and deploy MDS mitigation, TAA mitigation is not
|
||||
required and doesn't provide any additional
|
||||
mitigation.
|
||||
|
||||
For details see:
|
||||
Documentation/admin-guide/hw-vuln/tsx_async_abort.rst
|
||||
|
||||
turbografx.map[2|3]= [HW,JOY]
|
||||
TurboGraFX parallel port interface
|
||||
Format:
|
||||
@@ -4981,8 +4683,6 @@
|
||||
prevent spurious wakeup);
|
||||
n = USB_QUIRK_DELAY_CTRL_MSG (Device needs a
|
||||
pause after every control message);
|
||||
o = USB_QUIRK_HUB_SLOW_RESET (Hub needs extra
|
||||
delay after resetting its port);
|
||||
Example: quirks=0781:5580:bk,0a5c:5834:gij
|
||||
|
||||
usbhid.mousepoll=
|
||||
@@ -5007,13 +4707,13 @@
|
||||
Flags is a set of characters, each corresponding
|
||||
to a common usb-storage quirk flag as follows:
|
||||
a = SANE_SENSE (collect more than 18 bytes
|
||||
of sense data, not on uas);
|
||||
of sense data);
|
||||
b = BAD_SENSE (don't collect more than 18
|
||||
bytes of sense data, not on uas);
|
||||
bytes of sense data);
|
||||
c = FIX_CAPACITY (decrease the reported
|
||||
device capacity by one sector);
|
||||
d = NO_READ_DISC_INFO (don't use
|
||||
READ_DISC_INFO command, not on uas);
|
||||
READ_DISC_INFO command);
|
||||
e = NO_READ_CAPACITY_16 (don't use
|
||||
READ_CAPACITY_16 command);
|
||||
f = NO_REPORT_OPCODES (don't use report opcodes
|
||||
@@ -5027,20 +4727,18 @@
|
||||
device);
|
||||
j = NO_REPORT_LUNS (don't use report luns
|
||||
command, uas only);
|
||||
k = NO_SAME (do not use WRITE_SAME, uas only)
|
||||
l = NOT_LOCKABLE (don't try to lock and
|
||||
unlock ejectable media, not on uas);
|
||||
unlock ejectable media);
|
||||
m = MAX_SECTORS_64 (don't transfer more
|
||||
than 64 sectors = 32 KB at a time,
|
||||
not on uas);
|
||||
than 64 sectors = 32 KB at a time);
|
||||
n = INITIAL_READ10 (force a retry of the
|
||||
initial READ(10) command, not on uas);
|
||||
initial READ(10) command);
|
||||
o = CAPACITY_OK (accept the capacity
|
||||
reported by the device, not on uas);
|
||||
reported by the device);
|
||||
p = WRITE_CACHE (the device cache is ON
|
||||
by default, not on uas);
|
||||
by default);
|
||||
r = IGNORE_RESIDUE (the device reports
|
||||
bogus residue values, not on uas);
|
||||
bogus residue values);
|
||||
s = SINGLE_LUN (the device has only one
|
||||
Logical Unit);
|
||||
t = NO_ATA_1X (don't allow ATA(12) and ATA(16)
|
||||
@@ -5049,8 +4747,7 @@
|
||||
w = NO_WP_DETECT (don't test whether the
|
||||
medium is write-protected).
|
||||
y = ALWAYS_SYNC (issue a SYNCHRONIZE_CACHE
|
||||
even if the device claims no cache,
|
||||
not on uas)
|
||||
even if the device claims no cache)
|
||||
Example: quirks=0419:aaf5:rl,0421:0433:rc
|
||||
|
||||
user_debug= [KNL,ARM]
|
||||
@@ -5158,6 +4855,12 @@
|
||||
emulate [default] Vsyscalls turn into traps and are
|
||||
emulated reasonably safely.
|
||||
|
||||
native Vsyscalls are native syscall instructions.
|
||||
This is a little bit faster than trapping
|
||||
and makes a few dynamic recompilers work
|
||||
better than they would in emulation mode.
|
||||
It also makes exploits much easier to write.
|
||||
|
||||
none Vsyscalls don't work at all. This makes
|
||||
them quite hard to use for exploits but
|
||||
might break your system.
|
||||
@@ -5289,10 +4992,6 @@
|
||||
the unplug protocol
|
||||
never -- do not unplug even if version check succeeds
|
||||
|
||||
xen_legacy_crash [X86,XEN]
|
||||
Crash from Xen panic notifier, without executing late
|
||||
panic() code such as dumping handler.
|
||||
|
||||
xen_nopvspin [X86,XEN]
|
||||
Disables the ticketlock slowpath using Xen PV
|
||||
optimizations.
|
||||
@@ -5307,14 +5006,6 @@
|
||||
with /sys/devices/system/xen_memory/xen_memory0/scrub_pages.
|
||||
Default value controlled with CONFIG_XEN_SCRUB_PAGES_DEFAULT.
|
||||
|
||||
xen.event_eoi_delay= [XEN]
|
||||
How long to delay EOI handling in case of event
|
||||
storms (jiffies). Default is 10.
|
||||
|
||||
xen.event_loop_timeout= [XEN]
|
||||
After which time (jiffies) the event handling loop
|
||||
should start to delay EOI handling. Default is 2.
|
||||
|
||||
xirc2ps_cs= [NET,PCMCIA]
|
||||
Format:
|
||||
<irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]]
|
||||
|
||||
@@ -405,9 +405,6 @@ time with the option "l1tf=". The valid arguments for this option are:
|
||||
|
||||
off Disables hypervisor mitigations and doesn't emit any
|
||||
warnings.
|
||||
It also drops the swap size and available RAM limit restrictions
|
||||
on both hypervisor and bare metal.
|
||||
|
||||
============ =============================================================
|
||||
|
||||
The default is 'flush'. For details about L1D flushing see :ref:`l1d_flush`.
|
||||
@@ -445,7 +442,6 @@ The default is 'cond'. If 'l1tf=full,force' is given on the kernel command
|
||||
line, then 'always' is enforced and the kvm-intel.vmentry_l1d_flush
|
||||
module parameter is ignored and writes to the sysfs file are rejected.
|
||||
|
||||
.. _mitigation_selection:
|
||||
|
||||
Mitigation selection guide
|
||||
--------------------------
|
||||
@@ -557,7 +553,7 @@ When nested virtualization is in use, three operating systems are involved:
|
||||
the bare metal hypervisor, the nested hypervisor and the nested virtual
|
||||
machine. VMENTER operations from the nested hypervisor into the nested
|
||||
guest will always be processed by the bare metal hypervisor. If KVM is the
|
||||
bare metal hypervisor it will:
|
||||
bare metal hypervisor it wiil:
|
||||
|
||||
- Flush the L1D cache on every switch from the nested hypervisor to the
|
||||
nested virtual machine, so that the nested hypervisor's secrets are not
|
||||
@@ -580,8 +576,7 @@ Default mitigations
|
||||
The kernel default mitigations for vulnerable processors are:
|
||||
|
||||
- PTE inversion to protect against malicious user space. This is done
|
||||
unconditionally and cannot be controlled. The swap storage is limited
|
||||
to ~16TB.
|
||||
unconditionally and cannot be controlled.
|
||||
|
||||
- L1D conditional flushing on VMENTER when EPT is enabled for
|
||||
a guest.
|
||||
@@ -587,64 +587,6 @@ This governor exposes the following tunables:
|
||||
It effectively causes the frequency to go down ``sampling_down_factor``
|
||||
times slower than it ramps up.
|
||||
|
||||
``interactive``
|
||||
----------------
|
||||
|
||||
The CPUfreq governor `interactive` is designed for latency-sensitive,
|
||||
interactive workloads. This governor sets the CPU speed depending on
|
||||
usage, similar to `ondemand` and `conservative` governors, but with a
|
||||
different set of configurable behaviors.
|
||||
|
||||
The tunable values for this governor are:
|
||||
|
||||
``above_hispeed_delay``
|
||||
When speed is at or above hispeed_freq, wait for
|
||||
this long before raising speed in response to continued high load.
|
||||
The format is a single delay value, optionally followed by pairs of
|
||||
CPU speeds and the delay to use at or above those speeds. Colons can
|
||||
be used between the speeds and associated delays for readability. For
|
||||
example:
|
||||
|
||||
80000 1300000:200000 1500000:40000
|
||||
|
||||
uses delay 80000 uS until CPU speed 1.3 GHz, at which speed delay
|
||||
200000 uS is used until speed 1.5 GHz, at which speed (and above)
|
||||
delay 40000 uS is used. If speeds are specified these must appear in
|
||||
ascending order. Default is 20000 uS.
|
||||
|
||||
``boost``
|
||||
If non-zero, immediately boost speed of all CPUs to at least
|
||||
hispeed_freq until zero is written to this attribute. If zero, allow
|
||||
CPU speeds to drop below hispeed_freq according to load as usual.
|
||||
Default is zero.
|
||||
|
||||
``boostpulse``
|
||||
On each write, immediately boost speed of all CPUs to
|
||||
hispeed_freq for at least the period of time specified by
|
||||
boostpulse_duration, after which speeds are allowed to drop below
|
||||
hispeed_freq according to load as usual. Its a write-only file.
|
||||
|
||||
``boostpulse_duration``
|
||||
Length of time to hold CPU speed at hispeed_freq
|
||||
on a write to boostpulse, before allowing speed to drop according to
|
||||
load as usual. Default is 80000 uS.
|
||||
|
||||
``go_hispeed_load``
|
||||
The CPU load at which to ramp to hispeed_freq.
|
||||
Default is 99%.
|
||||
|
||||
``hispeed_freq``
|
||||
An intermediate "high speed" at which to initially ramp
|
||||
when CPU load hits the value specified in go_hispeed_load. If load
|
||||
stays high for the amount of time specified in above_hispeed_delay,
|
||||
then speed may be bumped higher. Default is the maximum speed allowed
|
||||
by the policy at governor initialization time.
|
||||
|
||||
``io_is_busy``
|
||||
If set, the governor accounts IO time as CPU busy time.
|
||||
|
||||
``min_sample_time``
|
||||
The minimum amount of time to spend at the current
|
||||
|
||||
Frequency Boost Support
|
||||
=======================
|
||||
|
||||
@@ -26,35 +26,23 @@ information is helpful. Any exploit code is very helpful and will not
|
||||
be released without consent from the reporter unless it has already been
|
||||
made public.
|
||||
|
||||
Disclosure and embargoed information
|
||||
------------------------------------
|
||||
Disclosure
|
||||
----------
|
||||
|
||||
The security list is not a disclosure channel. For that, see Coordination
|
||||
below.
|
||||
The goal of the Linux kernel security team is to work with the bug
|
||||
submitter to understand and fix the bug. We prefer to publish the fix as
|
||||
soon as possible, but try to avoid public discussion of the bug itself
|
||||
and leave that to others.
|
||||
|
||||
Once a robust fix has been developed, the release process starts. Fixes
|
||||
for publicly known bugs are released immediately.
|
||||
|
||||
Although our preference is to release fixes for publicly undisclosed bugs
|
||||
as soon as they become available, this may be postponed at the request of
|
||||
the reporter or an affected party for up to 7 calendar days from the start
|
||||
of the release process, with an exceptional extension to 14 calendar days
|
||||
if it is agreed that the criticality of the bug requires more time. The
|
||||
only valid reason for deferring the publication of a fix is to accommodate
|
||||
the logistics of QA and large scale rollouts which require release
|
||||
coordination.
|
||||
|
||||
Whilst embargoed information may be shared with trusted individuals in
|
||||
order to develop a fix, such information will not be published alongside
|
||||
the fix or on any other disclosure channel without the permission of the
|
||||
reporter. This includes but is not limited to the original bug report
|
||||
and followup discussions (if any), exploits, CVE information or the
|
||||
identity of the reporter.
|
||||
|
||||
In other words our only interest is in getting bugs fixed. All other
|
||||
information submitted to the security list and any followup discussions
|
||||
of the report are treated confidentially even after the embargo has been
|
||||
lifted, in perpetuity.
|
||||
Publishing the fix may be delayed when the bug or the fix is not yet
|
||||
fully understood, the solution is not well-tested or for vendor
|
||||
coordination. However, we expect these delays to be short, measurable in
|
||||
days, not weeks or months. A release date is negotiated by the security
|
||||
team working with the bug submitter as well as vendors. However, the
|
||||
kernel security team holds the final say when setting a timeframe. The
|
||||
timeframe varies from immediate (esp. if it's already publicly known bug)
|
||||
to a few weeks. As a basic default policy, we expect report date to
|
||||
release date to be on the order of 7 days.
|
||||
|
||||
Coordination
|
||||
------------
|
||||
@@ -80,7 +68,7 @@ may delay the bug handling. If a reporter wishes to have a CVE identifier
|
||||
assigned ahead of public disclosure, they will need to contact the private
|
||||
linux-distros list, described above. When such a CVE identifier is known
|
||||
before a patch is provided, it is desirable to mention it in the commit
|
||||
message if the reporter agrees.
|
||||
message, though.
|
||||
|
||||
Non-disclosure agreements
|
||||
-------------------------
|
||||
|
||||
@@ -6,7 +6,7 @@ TL;DR summary
|
||||
* Use only NEON instructions, or VFP instructions that don't rely on support
|
||||
code
|
||||
* Isolate your NEON code in a separate compilation unit, and compile it with
|
||||
'-march=armv7-a -mfpu=neon -mfloat-abi=softfp'
|
||||
'-mfpu=neon -mfloat-abi=softfp'
|
||||
* Put kernel_neon_begin() and kernel_neon_end() calls around the calls into your
|
||||
NEON code
|
||||
* Don't sleep in your NEON code, and be aware that it will be executed with
|
||||
@@ -87,7 +87,7 @@ instructions appearing in unexpected places if no special care is taken.
|
||||
Therefore, the recommended and only supported way of using NEON/VFP in the
|
||||
kernel is by adhering to the following rules:
|
||||
* isolate the NEON code in a separate compilation unit and compile it with
|
||||
'-march=armv7-a -mfpu=neon -mfloat-abi=softfp';
|
||||
'-mfpu=neon -mfloat-abi=softfp';
|
||||
* issue the calls to kernel_neon_begin(), kernel_neon_end() as well as the calls
|
||||
into the unit containing the NEON code from a compilation unit which is *not*
|
||||
built with the GCC flag '-mfpu=neon' set.
|
||||
|
||||
@@ -178,7 +178,3 @@ HWCAP_ILRCPC
|
||||
HWCAP_FLAGM
|
||||
|
||||
Functionality implied by ID_AA64ISAR0_EL1.TS == 0b0001.
|
||||
|
||||
HWCAP_SSBS
|
||||
|
||||
Functionality implied by ID_AA64PFR1_EL1.SSBS == 0b0010.
|
||||
|
||||
@@ -1,27 +0,0 @@
|
||||
==================
|
||||
ARM64 Architecture
|
||||
==================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
acpi_object_usage
|
||||
arm-acpi
|
||||
booting
|
||||
cpu-feature-registers
|
||||
elf_hwcaps
|
||||
hugetlbpage
|
||||
legacy_instructions
|
||||
memory
|
||||
pointer-authentication
|
||||
silicon-errata
|
||||
sve
|
||||
tagged-address-abi
|
||||
tagged-pointers
|
||||
|
||||
.. only:: subproject and html
|
||||
|
||||
Indices
|
||||
=======
|
||||
|
||||
* :ref:`genindex`
|
||||
@@ -44,8 +44,6 @@ stable kernels.
|
||||
|
||||
| Implementor | Component | Erratum ID | Kconfig |
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
| Allwinner | A64/R18 | UNKNOWN1 | SUN50I_ERRATUM_UNKNOWN1 |
|
||||
| | | | |
|
||||
| ARM | Cortex-A53 | #826319 | ARM64_ERRATUM_826319 |
|
||||
| ARM | Cortex-A53 | #827319 | ARM64_ERRATUM_827319 |
|
||||
| ARM | Cortex-A53 | #824069 | ARM64_ERRATUM_824069 |
|
||||
@@ -58,8 +56,6 @@ stable kernels.
|
||||
| ARM | Cortex-A72 | #853709 | N/A |
|
||||
| ARM | Cortex-A73 | #858921 | ARM64_ERRATUM_858921 |
|
||||
| ARM | Cortex-A55 | #1024718 | ARM64_ERRATUM_1024718 |
|
||||
| ARM | Cortex-A76 | #1463225 | ARM64_ERRATUM_1463225 |
|
||||
| ARM | Neoverse-N1 | #1542419 | ARM64_ERRATUM_1542419 |
|
||||
| ARM | MMU-500 | #841119,#826419 | N/A |
|
||||
| | | | |
|
||||
| Cavium | ThunderX ITS | #22375, #24313 | CAVIUM_ERRATUM_22375 |
|
||||
|
||||
@@ -1,163 +0,0 @@
|
||||
==========================
|
||||
AArch64 TAGGED ADDRESS ABI
|
||||
==========================
|
||||
|
||||
Authors: Vincenzo Frascino <vincenzo.frascino@arm.com>
|
||||
Catalin Marinas <catalin.marinas@arm.com>
|
||||
|
||||
Date: 21 August 2019
|
||||
|
||||
This document describes the usage and semantics of the Tagged Address
|
||||
ABI on AArch64 Linux.
|
||||
|
||||
1. Introduction
|
||||
---------------
|
||||
|
||||
On AArch64 the ``TCR_EL1.TBI0`` bit is set by default, allowing
|
||||
userspace (EL0) to perform memory accesses through 64-bit pointers with
|
||||
a non-zero top byte. This document describes the relaxation of the
|
||||
syscall ABI that allows userspace to pass certain tagged pointers to
|
||||
kernel syscalls.
|
||||
|
||||
2. AArch64 Tagged Address ABI
|
||||
-----------------------------
|
||||
|
||||
From the kernel syscall interface perspective and for the purposes of
|
||||
this document, a "valid tagged pointer" is a pointer with a potentially
|
||||
non-zero top-byte that references an address in the user process address
|
||||
space obtained in one of the following ways:
|
||||
|
||||
- ``mmap()`` syscall where either:
|
||||
|
||||
- flags have the ``MAP_ANONYMOUS`` bit set or
|
||||
- the file descriptor refers to a regular file (including those
|
||||
returned by ``memfd_create()``) or ``/dev/zero``
|
||||
|
||||
- ``brk()`` syscall (i.e. the heap area between the initial location of
|
||||
the program break at process creation and its current location).
|
||||
|
||||
- any memory mapped by the kernel in the address space of the process
|
||||
during creation and with the same restrictions as for ``mmap()`` above
|
||||
(e.g. data, bss, stack).
|
||||
|
||||
The AArch64 Tagged Address ABI has two stages of relaxation depending
|
||||
how the user addresses are used by the kernel:
|
||||
|
||||
1. User addresses not accessed by the kernel but used for address space
|
||||
management (e.g. ``mprotect()``, ``madvise()``). The use of valid
|
||||
tagged pointers in this context is allowed with the exception of
|
||||
``brk()``, ``mmap()`` and the ``new_address`` argument to
|
||||
``mremap()`` as these have the potential to alias with existing
|
||||
user addresses.
|
||||
|
||||
NOTE: This behaviour changed in v5.6 and so some earlier kernels may
|
||||
incorrectly accept valid tagged pointers for the ``brk()``,
|
||||
``mmap()`` and ``mremap()`` system calls.
|
||||
|
||||
2. User addresses accessed by the kernel (e.g. ``write()``). This ABI
|
||||
relaxation is disabled by default and the application thread needs to
|
||||
explicitly enable it via ``prctl()`` as follows:
|
||||
|
||||
- ``PR_SET_TAGGED_ADDR_CTRL``: enable or disable the AArch64 Tagged
|
||||
Address ABI for the calling thread.
|
||||
|
||||
The ``(unsigned int) arg2`` argument is a bit mask describing the
|
||||
control mode used:
|
||||
|
||||
- ``PR_TAGGED_ADDR_ENABLE``: enable AArch64 Tagged Address ABI.
|
||||
Default status is disabled.
|
||||
|
||||
Arguments ``arg3``, ``arg4``, and ``arg5`` must be 0.
|
||||
|
||||
- ``PR_GET_TAGGED_ADDR_CTRL``: get the status of the AArch64 Tagged
|
||||
Address ABI for the calling thread.
|
||||
|
||||
Arguments ``arg2``, ``arg3``, ``arg4``, and ``arg5`` must be 0.
|
||||
|
||||
The ABI properties described above are thread-scoped, inherited on
|
||||
clone() and fork() and cleared on exec().
|
||||
|
||||
Calling ``prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0)``
|
||||
returns ``-EINVAL`` if the AArch64 Tagged Address ABI is globally
|
||||
disabled by ``sysctl abi.tagged_addr_disabled=1``. The default
|
||||
``sysctl abi.tagged_addr_disabled`` configuration is 0.
|
||||
|
||||
When the AArch64 Tagged Address ABI is enabled for a thread, the
|
||||
following behaviours are guaranteed:
|
||||
|
||||
- All syscalls except the cases mentioned in section 3 can accept any
|
||||
valid tagged pointer.
|
||||
|
||||
- The syscall behaviour is undefined for invalid tagged pointers: it may
|
||||
result in an error code being returned, a (fatal) signal being raised,
|
||||
or other modes of failure.
|
||||
|
||||
- The syscall behaviour for a valid tagged pointer is the same as for
|
||||
the corresponding untagged pointer.
|
||||
|
||||
|
||||
A definition of the meaning of tagged pointers on AArch64 can be found
|
||||
in Documentation/arm64/tagged-pointers.rst.
|
||||
|
||||
3. AArch64 Tagged Address ABI Exceptions
|
||||
-----------------------------------------
|
||||
|
||||
The following system call parameters must be untagged regardless of the
|
||||
ABI relaxation:
|
||||
|
||||
- ``prctl()`` other than pointers to user data either passed directly or
|
||||
indirectly as arguments to be accessed by the kernel.
|
||||
|
||||
- ``ioctl()`` other than pointers to user data either passed directly or
|
||||
indirectly as arguments to be accessed by the kernel.
|
||||
|
||||
- ``shmat()`` and ``shmdt()``.
|
||||
|
||||
Any attempt to use non-zero tagged pointers may result in an error code
|
||||
being returned, a (fatal) signal being raised, or other modes of
|
||||
failure.
|
||||
|
||||
4. Example of correct usage
|
||||
---------------------------
|
||||
.. code-block:: c
|
||||
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#include <unistd.h>
|
||||
#include <sys/mman.h>
|
||||
#include <sys/prctl.h>
|
||||
|
||||
#define PR_SET_TAGGED_ADDR_CTRL 55
|
||||
#define PR_TAGGED_ADDR_ENABLE (1UL << 0)
|
||||
|
||||
#define TAG_SHIFT 56
|
||||
|
||||
int main(void)
|
||||
{
|
||||
int tbi_enabled = 0;
|
||||
unsigned long tag = 0;
|
||||
char *ptr;
|
||||
|
||||
/* check/enable the tagged address ABI */
|
||||
if (!prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0))
|
||||
tbi_enabled = 1;
|
||||
|
||||
/* memory allocation */
|
||||
ptr = mmap(NULL, sysconf(_SC_PAGE_SIZE), PROT_READ | PROT_WRITE,
|
||||
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
|
||||
if (ptr == MAP_FAILED)
|
||||
return 1;
|
||||
|
||||
/* set a non-zero tag if the ABI is available */
|
||||
if (tbi_enabled)
|
||||
tag = rand() & 0xff;
|
||||
ptr = (char *)((unsigned long)ptr | (tag << TAG_SHIFT));
|
||||
|
||||
/* memory access to a tagged address */
|
||||
strcpy(ptr, "tagged pointer\n");
|
||||
|
||||
/* syscall with a tagged pointer */
|
||||
write(1, ptr, strlen(ptr));
|
||||
|
||||
return 0;
|
||||
}
|
||||
@@ -1,75 +0,0 @@
|
||||
=========================================
|
||||
Tagged virtual addresses in AArch64 Linux
|
||||
=========================================
|
||||
|
||||
Author: Will Deacon <will.deacon@arm.com>
|
||||
|
||||
Date : 12 June 2013
|
||||
|
||||
This document briefly describes the provision of tagged virtual
|
||||
addresses in the AArch64 translation system and their potential uses
|
||||
in AArch64 Linux.
|
||||
|
||||
The kernel configures the translation tables so that translations made
|
||||
via TTBR0 (i.e. userspace mappings) have the top byte (bits 63:56) of
|
||||
the virtual address ignored by the translation hardware. This frees up
|
||||
this byte for application use.
|
||||
|
||||
|
||||
Passing tagged addresses to the kernel
|
||||
--------------------------------------
|
||||
|
||||
All interpretation of userspace memory addresses by the kernel assumes
|
||||
an address tag of 0x00, unless the application enables the AArch64
|
||||
Tagged Address ABI explicitly
|
||||
(Documentation/arm64/tagged-address-abi.rst).
|
||||
|
||||
This includes, but is not limited to, addresses found in:
|
||||
|
||||
- pointer arguments to system calls, including pointers in structures
|
||||
passed to system calls,
|
||||
|
||||
- the stack pointer (sp), e.g. when interpreting it to deliver a
|
||||
signal,
|
||||
|
||||
- the frame pointer (x29) and frame records, e.g. when interpreting
|
||||
them to generate a backtrace or call graph.
|
||||
|
||||
Using non-zero address tags in any of these locations when the
|
||||
userspace application did not enable the AArch64 Tagged Address ABI may
|
||||
result in an error code being returned, a (fatal) signal being raised,
|
||||
or other modes of failure.
|
||||
|
||||
For these reasons, when the AArch64 Tagged Address ABI is disabled,
|
||||
passing non-zero address tags to the kernel via system calls is
|
||||
forbidden, and using a non-zero address tag for sp is strongly
|
||||
discouraged.
|
||||
|
||||
Programs maintaining a frame pointer and frame records that use non-zero
|
||||
address tags may suffer impaired or inaccurate debug and profiling
|
||||
visibility.
|
||||
|
||||
|
||||
Preserving tags
|
||||
---------------
|
||||
|
||||
Non-zero tags are not preserved when delivering signals. This means that
|
||||
signal handlers in applications making use of tags cannot rely on the
|
||||
tag information for user virtual addresses being maintained for fields
|
||||
inside siginfo_t. One exception to this rule is for signals raised in
|
||||
response to watchpoint debug exceptions, where the tag information will
|
||||
be preserved.
|
||||
|
||||
The architecture prevents the use of a tagged PC, so the upper byte will
|
||||
be set to a sign-extension of bit 55 on exception return.
|
||||
|
||||
This behaviour is maintained when the AArch64 Tagged Address ABI is
|
||||
enabled.
|
||||
|
||||
|
||||
Other considerations
|
||||
--------------------
|
||||
|
||||
Special care should be taken when using tagged pointers, since it is
|
||||
likely that C compilers will not hazard two virtual addresses differing
|
||||
only in the upper byte.
|
||||
@@ -18,9 +18,7 @@ Passing tagged addresses to the kernel
|
||||
--------------------------------------
|
||||
|
||||
All interpretation of userspace memory addresses by the kernel assumes
|
||||
an address tag of 0x00, unless the application enables the AArch64
|
||||
Tagged Address ABI explicitly
|
||||
(Documentation/arm64/tagged-address-abi.rst).
|
||||
an address tag of 0x00.
|
||||
|
||||
This includes, but is not limited to, addresses found in:
|
||||
|
||||
@@ -33,15 +31,13 @@ This includes, but is not limited to, addresses found in:
|
||||
- the frame pointer (x29) and frame records, e.g. when interpreting
|
||||
them to generate a backtrace or call graph.
|
||||
|
||||
Using non-zero address tags in any of these locations when the
|
||||
userspace application did not enable the AArch64 Tagged Address ABI may
|
||||
result in an error code being returned, a (fatal) signal being raised,
|
||||
or other modes of failure.
|
||||
Using non-zero address tags in any of these locations may result in an
|
||||
error code being returned, a (fatal) signal being raised, or other modes
|
||||
of failure.
|
||||
|
||||
For these reasons, when the AArch64 Tagged Address ABI is disabled,
|
||||
passing non-zero address tags to the kernel via system calls is
|
||||
forbidden, and using a non-zero address tag for sp is strongly
|
||||
discouraged.
|
||||
For these reasons, passing non-zero address tags to the kernel via
|
||||
system calls is forbidden, and using a non-zero address tag for sp is
|
||||
strongly discouraged.
|
||||
|
||||
Programs maintaining a frame pointer and frame records that use non-zero
|
||||
address tags may suffer impaired or inaccurate debug and profiling
|
||||
@@ -61,9 +57,6 @@ be preserved.
|
||||
The architecture prevents the use of a tagged PC, so the upper byte will
|
||||
be set to a sign-extension of bit 55 on exception return.
|
||||
|
||||
This behaviour is maintained when the AArch64 Tagged Address ABI is
|
||||
enabled.
|
||||
|
||||
|
||||
Other considerations
|
||||
--------------------
|
||||
|
||||
@@ -177,9 +177,6 @@ These helper barriers exist because architectures have varying implicit
|
||||
ordering on their SMP atomic primitives. For example our TSO architectures
|
||||
provide full ordered atomics and these barriers are no-ops.
|
||||
|
||||
NOTE: when the atomic RmW ops are fully ordered, they should also imply a
|
||||
compiler barrier.
|
||||
|
||||
Thus:
|
||||
|
||||
atomic_fetch_add();
|
||||
|
||||
@@ -1,183 +0,0 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=================
|
||||
Inline Encryption
|
||||
=================
|
||||
|
||||
Objective
|
||||
=========
|
||||
|
||||
We want to support inline encryption (IE) in the kernel.
|
||||
To allow for testing, we also want a crypto API fallback when actual
|
||||
IE hardware is absent. We also want IE to work with layered devices
|
||||
like dm and loopback (i.e. we want to be able to use the IE hardware
|
||||
of the underlying devices if present, or else fall back to crypto API
|
||||
en/decryption).
|
||||
|
||||
|
||||
Constraints and notes
|
||||
=====================
|
||||
|
||||
- IE hardware have a limited number of "keyslots" that can be programmed
|
||||
with an encryption context (key, algorithm, data unit size, etc.) at any time.
|
||||
One can specify a keyslot in a data request made to the device, and the
|
||||
device will en/decrypt the data using the encryption context programmed into
|
||||
that specified keyslot. When possible, we want to make multiple requests with
|
||||
the same encryption context share the same keyslot.
|
||||
|
||||
- We need a way for filesystems to specify an encryption context to use for
|
||||
en/decrypting a struct bio, and a device driver (like UFS) needs to be able
|
||||
to use that encryption context when it processes the bio.
|
||||
|
||||
- We need a way for device drivers to expose their capabilities in a unified
|
||||
way to the upper layers.
|
||||
|
||||
|
||||
Design
|
||||
======
|
||||
|
||||
We add a struct bio_crypt_ctx to struct bio that can represent an
|
||||
encryption context, because we need to be able to pass this encryption
|
||||
context from the FS layer to the device driver to act upon.
|
||||
|
||||
While IE hardware works on the notion of keyslots, the FS layer has no
|
||||
knowledge of keyslots - it simply wants to specify an encryption context to
|
||||
use while en/decrypting a bio.
|
||||
|
||||
We introduce a keyslot manager (KSM) that handles the translation from
|
||||
encryption contexts specified by the FS to keyslots on the IE hardware.
|
||||
This KSM also serves as the way IE hardware can expose their capabilities to
|
||||
upper layers. The generic mode of operation is: each device driver that wants
|
||||
to support IE will construct a KSM and set it up in its struct request_queue.
|
||||
Upper layers that want to use IE on this device can then use this KSM in
|
||||
the device's struct request_queue to translate an encryption context into
|
||||
a keyslot. The presence of the KSM in the request queue shall be used to mean
|
||||
that the device supports IE.
|
||||
|
||||
On the device driver end of the interface, the device driver needs to tell the
|
||||
KSM how to actually manipulate the IE hardware in the device to do things like
|
||||
programming the crypto key into the IE hardware into a particular keyslot. All
|
||||
this is achieved through the :c:type:`struct keyslot_mgmt_ll_ops` that the
|
||||
device driver passes to the KSM when creating it.
|
||||
|
||||
It uses refcounts to track which keyslots are idle (either they have no
|
||||
encryption context programmed, or there are no in-flight struct bios
|
||||
referencing that keyslot). When a new encryption context needs a keyslot, it
|
||||
tries to find a keyslot that has already been programmed with the same
|
||||
encryption context, and if there is no such keyslot, it evicts the least
|
||||
recently used idle keyslot and programs the new encryption context into that
|
||||
one. If no idle keyslots are available, then the caller will sleep until there
|
||||
is at least one.
|
||||
|
||||
|
||||
Blk-crypto
|
||||
==========
|
||||
|
||||
The above is sufficient for simple cases, but does not work if there is a
|
||||
need for a crypto API fallback, or if we are want to use IE with layered
|
||||
devices. To these ends, we introduce blk-crypto. Blk-crypto allows us to
|
||||
present a unified view of encryption to the FS (so FS only needs to specify
|
||||
an encryption context and not worry about keyslots at all), and blk-crypto
|
||||
can decide whether to delegate the en/decryption to IE hardware or to the
|
||||
crypto API. Blk-crypto maintains an internal KSM that serves as the crypto
|
||||
API fallback.
|
||||
|
||||
Blk-crypto needs to ensure that the encryption context is programmed into the
|
||||
"correct" keyslot manager for IE. If a bio is submitted to a layered device
|
||||
that eventually passes the bio down to a device that really does support IE, we
|
||||
want the encryption context to be programmed into a keyslot for the KSM of the
|
||||
device with IE support. However, blk-crypto does not know a priori whether a
|
||||
particular device is the final device in the layering structure for a bio or
|
||||
not. So in the case that a particular device does not support IE, since it is
|
||||
possibly the final destination device for the bio, if the bio requires
|
||||
encryption (i.e. the bio is doing a write operation), blk-crypto must fallback
|
||||
to the crypto API *before* sending the bio to the device.
|
||||
|
||||
Blk-crypto ensures that:
|
||||
|
||||
- The bio's encryption context is programmed into a keyslot in the KSM of the
|
||||
request queue that the bio is being submitted to (or the crypto API fallback
|
||||
KSM if the request queue doesn't have a KSM), and that the ``bc_ksm``
|
||||
in the ``bi_crypt_context`` is set to this KSM
|
||||
|
||||
- That the bio has its own individual reference to the keyslot in this KSM.
|
||||
Once the bio passes through blk-crypto, its encryption context is programmed
|
||||
in some KSM. The "its own individual reference to the keyslot" ensures that
|
||||
keyslots can be released by each bio independently of other bios while
|
||||
ensuring that the bio has a valid reference to the keyslot when, for e.g., the
|
||||
crypto API fallback KSM in blk-crypto performs crypto on the device's behalf.
|
||||
The individual references are ensured by increasing the refcount for the
|
||||
keyslot in the ``bc_ksm`` when a bio with a programmed encryption
|
||||
context is cloned.
|
||||
|
||||
|
||||
What blk-crypto does on bio submission
|
||||
--------------------------------------
|
||||
|
||||
**Case 1:** blk-crypto is given a bio with only an encryption context that hasn't
|
||||
been programmed into any keyslot in any KSM (for e.g. a bio from the FS).
|
||||
In this case, blk-crypto will program the encryption context into the KSM of the
|
||||
request queue the bio is being submitted to (and if this KSM does not exist,
|
||||
then it will program it into blk-crypto's internal KSM for crypto API
|
||||
fallback). The KSM that this encryption context was programmed into is stored
|
||||
as the ``bc_ksm`` in the bio's ``bi_crypt_context``.
|
||||
|
||||
**Case 2:** blk-crypto is given a bio whose encryption context has already been
|
||||
programmed into a keyslot in the *crypto API fallback* KSM.
|
||||
In this case, blk-crypto does nothing; it treats the bio as not having
|
||||
specified an encryption context. Note that we cannot do here what we will do
|
||||
in Case 3 because we would have already encrypted the bio via the crypto API
|
||||
by this point.
|
||||
|
||||
**Case 3:** blk-crypto is given a bio whose encryption context has already been
|
||||
programmed into a keyslot in some KSM (that is *not* the crypto API fallback
|
||||
KSM).
|
||||
In this case, blk-crypto first releases that keyslot from that KSM and then
|
||||
treats the bio as in Case 1.
|
||||
|
||||
This way, when a device driver is processing a bio, it can be sure that
|
||||
the bio's encryption context has been programmed into some KSM (either the
|
||||
device driver's request queue's KSM, or blk-crypto's crypto API fallback KSM).
|
||||
It then simply needs to check if the bio's ``bc_ksm`` is the device's
|
||||
request queue's KSM. If so, then it should proceed with IE. If not, it should
|
||||
simply do nothing with respect to crypto, because some other KSM (perhaps the
|
||||
blk-crypto crypto API fallback KSM) is handling the en/decryption.
|
||||
|
||||
Blk-crypto will release the keyslot that is being held by the bio (and also
|
||||
decrypt it if the bio is using the crypto API fallback KSM) once
|
||||
``bio_remaining_done`` returns true for the bio.
|
||||
|
||||
|
||||
Layered Devices
|
||||
===============
|
||||
|
||||
Layered devices that wish to support IE need to create their own keyslot
|
||||
manager for their request queue, and expose whatever functionality they choose.
|
||||
When a layered device wants to pass a bio to another layer (either by
|
||||
resubmitting the same bio, or by submitting a clone), it doesn't need to do
|
||||
anything special because the bio (or the clone) will once again pass through
|
||||
blk-crypto, which will work as described in Case 3. If a layered device wants
|
||||
for some reason to do the IO by itself instead of passing it on to a child
|
||||
device, but it also chose to expose IE capabilities by setting up a KSM in its
|
||||
request queue, it is then responsible for en/decrypting the data itself. In
|
||||
such cases, the device can choose to call the blk-crypto function
|
||||
``blk_crypto_fallback_to_kernel_crypto_api`` (TODO: Not yet implemented), which will
|
||||
cause the en/decryption to be done via the crypto API fallback.
|
||||
|
||||
|
||||
Future Optimizations for layered devices
|
||||
========================================
|
||||
|
||||
Creating a keyslot manager for the layered device uses up memory for each
|
||||
keyslot, and in general, a layered device (like dm-linear) merely passes the
|
||||
request on to a "child" device, so the keyslots in the layered device itself
|
||||
might be completely unused. We can instead define a new type of KSM; the
|
||||
"passthrough KSM", that layered devices can use to let blk-crypto know that
|
||||
this layered device *will* pass the bio to some child device (and hence
|
||||
through blk-crypto again, at which point blk-crypto can program the encryption
|
||||
context, instead of programming it into the layered device's KSM). Again, if
|
||||
the device "lies" and decides to do the IO itself instead of passing it on to
|
||||
a child device, it is responsible for doing the en/decryption (and can choose
|
||||
to call ``blk_crypto_fallback_to_kernel_crypto_api``). Another use case for the
|
||||
"passthrough KSM" is for IE devices that want to manage their own keyslots/do
|
||||
not have a limited number of keyslots.
|
||||
@@ -156,23 +156,19 @@ Per-device statistics are exported as various nodes under /sys/block/zram<id>/
|
||||
A brief description of exported device attributes. For more details please
|
||||
read Documentation/ABI/testing/sysfs-block-zram.
|
||||
|
||||
Name access description
|
||||
---- ------ -----------
|
||||
disksize RW show and set the device's disk size
|
||||
initstate RO shows the initialization state of the device
|
||||
reset WO trigger device reset
|
||||
mem_used_max WO reset the `mem_used_max' counter (see later)
|
||||
mem_limit WO specifies the maximum amount of memory ZRAM can use
|
||||
to store the compressed data
|
||||
writeback_limit WO specifies the maximum amount of write IO zram can
|
||||
write out to backing device as 4KB unit
|
||||
writeback_limit_enable RW show and set writeback_limit feature
|
||||
max_comp_streams RW the number of possible concurrent compress operations
|
||||
comp_algorithm RW show and change the compression algorithm
|
||||
compact WO trigger memory compaction
|
||||
debug_stat RO this file is used for zram debugging purposes
|
||||
backing_dev RW set up backend storage for zram to write out
|
||||
idle WO mark allocated slot as idle
|
||||
Name access description
|
||||
---- ------ -----------
|
||||
disksize RW show and set the device's disk size
|
||||
initstate RO shows the initialization state of the device
|
||||
reset WO trigger device reset
|
||||
mem_used_max WO reset the `mem_used_max' counter (see later)
|
||||
mem_limit WO specifies the maximum amount of memory ZRAM can use
|
||||
to store the compressed data
|
||||
max_comp_streams RW the number of possible concurrent compress operations
|
||||
comp_algorithm RW show and change the compression algorithm
|
||||
compact WO trigger memory compaction
|
||||
debug_stat RO this file is used for zram debugging purposes
|
||||
backing_dev RW set up backend storage for zram to write out
|
||||
|
||||
|
||||
User space is advised to use the following files to read the device statistics.
|
||||
@@ -224,17 +220,6 @@ line of text and contains the following stats separated by whitespace:
|
||||
pages_compacted the number of pages freed during compaction
|
||||
huge_pages the number of incompressible pages
|
||||
|
||||
File /sys/block/zram<id>/bd_stat
|
||||
|
||||
The stat file represents device's backing device statistics. It consists of
|
||||
a single line of text and contains the following stats separated by whitespace:
|
||||
bd_count size of data written in backing device.
|
||||
Unit: 4K bytes
|
||||
bd_reads the number of reads from backing device
|
||||
Unit: 4K bytes
|
||||
bd_writes the number of writes to backing device
|
||||
Unit: 4K bytes
|
||||
|
||||
9) Deactivate:
|
||||
swapoff /dev/zram0
|
||||
umount /dev/zram1
|
||||
@@ -252,79 +237,11 @@ a single line of text and contains the following stats separated by whitespace:
|
||||
|
||||
= writeback
|
||||
|
||||
With CONFIG_ZRAM_WRITEBACK, zram can write idle/incompressible page
|
||||
With incompressible pages, there is no memory saving with zram.
|
||||
Instead, with CONFIG_ZRAM_WRITEBACK, zram can write incompressible page
|
||||
to backing storage rather than keeping it in memory.
|
||||
To use the feature, admin should set up backing device via
|
||||
|
||||
"echo /dev/sda5 > /sys/block/zramX/backing_dev"
|
||||
|
||||
before disksize setting. It supports only partition at this moment.
|
||||
If admin want to use incompressible page writeback, they could do via
|
||||
|
||||
"echo huge > /sys/block/zramX/write"
|
||||
|
||||
To use idle page writeback, first, user need to declare zram pages
|
||||
as idle.
|
||||
|
||||
"echo all > /sys/block/zramX/idle"
|
||||
|
||||
From now on, any pages on zram are idle pages. The idle mark
|
||||
will be removed until someone request access of the block.
|
||||
IOW, unless there is access request, those pages are still idle pages.
|
||||
|
||||
Admin can request writeback of those idle pages at right timing via
|
||||
|
||||
"echo idle > /sys/block/zramX/writeback"
|
||||
|
||||
With the command, zram writeback idle pages from memory to the storage.
|
||||
|
||||
If there are lots of write IO with flash device, potentially, it has
|
||||
flash wearout problem so that admin needs to design write limitation
|
||||
to guarantee storage health for entire product life.
|
||||
|
||||
To overcome the concern, zram supports "writeback_limit" feature.
|
||||
The "writeback_limit_enable"'s default value is 0 so that it doesn't limit
|
||||
any writeback. IOW, if admin want to apply writeback budget, he should
|
||||
enable writeback_limit_enable via
|
||||
|
||||
$ echo 1 > /sys/block/zramX/writeback_limit_enable
|
||||
|
||||
Once writeback_limit_enable is set, zram doesn't allow any writeback
|
||||
until admin set the budget via /sys/block/zramX/writeback_limit.
|
||||
|
||||
(If admin doesn't enable writeback_limit_enable, writeback_limit's value
|
||||
assigned via /sys/block/zramX/writeback_limit is meaninless.)
|
||||
|
||||
If admin want to limit writeback as per-day 400M, he could do it
|
||||
like below.
|
||||
|
||||
$ MB_SHIFT=20
|
||||
$ 4K_SHIFT=12
|
||||
$ echo $((400<<MB_SHIFT>>4K_SHIFT)) > \
|
||||
/sys/block/zram0/writeback_limit.
|
||||
$ echo 1 > /sys/block/zram0/writeback_limit_enable
|
||||
|
||||
If admin want to allow further write again once the bugdet is exausted,
|
||||
he could do it like below
|
||||
|
||||
$ echo $((400<<MB_SHIFT>>4K_SHIFT)) > \
|
||||
/sys/block/zram0/writeback_limit
|
||||
|
||||
If admin want to see remaining writeback budget since he set,
|
||||
|
||||
$ cat /sys/block/zramX/writeback_limit
|
||||
|
||||
If admin want to disable writeback limit, he could do
|
||||
|
||||
$ echo 0 > /sys/block/zramX/writeback_limit_enable
|
||||
|
||||
The writeback_limit count will reset whenever you reset zram(e.g.,
|
||||
system reboot, echo 1 > /sys/block/zramX/reset) so keeping how many of
|
||||
writeback happened until you reset the zram to allocate extra writeback
|
||||
budget in next setting is user's job.
|
||||
|
||||
If admin want to measure writeback count in a certain period, he could
|
||||
know it via /sys/block/zram0/bd_stat's 3rd column.
|
||||
User should set up backing device via /sys/block/zramX/backing_dev
|
||||
before disksize setting.
|
||||
|
||||
= memory tracking
|
||||
|
||||
@@ -334,17 +251,16 @@ pages of the process with*pagemap.
|
||||
If you enable the feature, you could see block state via
|
||||
/sys/kernel/debug/zram/zram0/block_state". The output is as follows,
|
||||
|
||||
300 75.033841 .wh.
|
||||
301 63.806904 s...
|
||||
302 63.806919 ..hi
|
||||
300 75.033841 .wh
|
||||
301 63.806904 s..
|
||||
302 63.806919 ..h
|
||||
|
||||
First column is zram's block index.
|
||||
Second column is access time since the system was booted
|
||||
Third column is state of the block.
|
||||
(s: same page
|
||||
w: written page to backing store
|
||||
h: huge page
|
||||
i: idle page)
|
||||
h: huge page)
|
||||
|
||||
First line of above example says 300th block is accessed at 75.033841sec
|
||||
and the block's state is huge so it is written back to the backing
|
||||
|
||||
@@ -37,7 +37,7 @@ needs_sphinx = '1.3'
|
||||
extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain', 'kfigure', 'sphinx.ext.ifconfig']
|
||||
|
||||
# The name of the math extension changed on Sphinx 1.4
|
||||
if (major == 1 and minor > 3) or (major > 1):
|
||||
if major == 1 and minor > 3:
|
||||
extensions.append("sphinx.ext.imgmath")
|
||||
else:
|
||||
extensions.append("sphinx.ext.pngmath")
|
||||
|
||||
@@ -99,20 +99,16 @@ Coarse and fast_ns access
|
||||
|
||||
Some additional variants exist for more specialized cases:
|
||||
|
||||
.. c:function:: ktime_t ktime_get_coarse( void )
|
||||
ktime_t ktime_get_coarse_boottime( void )
|
||||
.. c:function:: ktime_t ktime_get_coarse_boottime( void )
|
||||
ktime_t ktime_get_coarse_real( void )
|
||||
ktime_t ktime_get_coarse_clocktai( void )
|
||||
|
||||
.. c:function:: u64 ktime_get_coarse_ns( void )
|
||||
u64 ktime_get_coarse_boottime_ns( void )
|
||||
u64 ktime_get_coarse_real_ns( void )
|
||||
u64 ktime_get_coarse_clocktai_ns( void )
|
||||
ktime_t ktime_get_coarse_raw( void )
|
||||
|
||||
.. c:function:: void ktime_get_coarse_ts64( struct timespec64 * )
|
||||
void ktime_get_coarse_boottime_ts64( struct timespec64 * )
|
||||
void ktime_get_coarse_real_ts64( struct timespec64 * )
|
||||
void ktime_get_coarse_clocktai_ts64( struct timespec64 * )
|
||||
void ktime_get_coarse_raw_ts64( struct timespec64 * )
|
||||
|
||||
These are quicker than the non-coarse versions, but less accurate,
|
||||
corresponding to CLOCK_MONONOTNIC_COARSE and CLOCK_REALTIME_COARSE
|
||||
|
||||
@@ -34,6 +34,10 @@ Configure the kernel with::
|
||||
CONFIG_DEBUG_FS=y
|
||||
CONFIG_GCOV_KERNEL=y
|
||||
|
||||
select the gcc's gcov format, default is autodetect based on gcc version::
|
||||
|
||||
CONFIG_GCOV_FORMAT_AUTODETECT=y
|
||||
|
||||
and to get coverage data for the entire kernel::
|
||||
|
||||
CONFIG_GCOV_PROFILE_ALL=y
|
||||
@@ -165,20 +169,6 @@ b) gcov is run on the BUILD machine
|
||||
[user@build] gcov -o /tmp/coverage/tmp/out/init main.c
|
||||
|
||||
|
||||
Note on compilers
|
||||
-----------------
|
||||
|
||||
GCC and LLVM gcov tools are not necessarily compatible. Use gcov_ to work with
|
||||
GCC-generated .gcno and .gcda files, and use llvm-cov_ for Clang.
|
||||
|
||||
.. _gcov: http://gcc.gnu.org/onlinedocs/gcc/Gcov.html
|
||||
.. _llvm-cov: https://llvm.org/docs/CommandGuide/llvm-cov.html
|
||||
|
||||
Build differences between GCC and Clang gcov are handled by Kconfig. It
|
||||
automatically selects the appropriate gcov format depending on the detected
|
||||
toolchain.
|
||||
|
||||
|
||||
Troubleshooting
|
||||
---------------
|
||||
|
||||
|
||||
@@ -4,25 +4,15 @@ The Kernel Address Sanitizer (KASAN)
|
||||
Overview
|
||||
--------
|
||||
|
||||
KernelAddressSANitizer (KASAN) is a dynamic memory error detector designed to
|
||||
find out-of-bound and use-after-free bugs. KASAN has two modes: generic KASAN
|
||||
(similar to userspace ASan) and software tag-based KASAN (similar to userspace
|
||||
HWASan).
|
||||
KernelAddressSANitizer (KASAN) is a dynamic memory error detector. It provides
|
||||
a fast and comprehensive solution for finding use-after-free and out-of-bounds
|
||||
bugs.
|
||||
|
||||
KASAN uses compile-time instrumentation to insert validity checks before every
|
||||
memory access, and therefore requires a compiler version that supports that.
|
||||
KASAN uses compile-time instrumentation for checking every memory access,
|
||||
therefore you will need a GCC version 4.9.2 or later. GCC 5.0 or later is
|
||||
required for detection of out-of-bounds accesses to stack or global variables.
|
||||
|
||||
Generic KASAN is supported in both GCC and Clang. With GCC it requires version
|
||||
4.9.2 or later for basic support and version 5.0 or later for detection of
|
||||
out-of-bounds accesses for stack and global variables and for inline
|
||||
instrumentation mode (see the Usage section). With Clang it requires version
|
||||
7.0.0 or later and it doesn't support detection of out-of-bounds accesses for
|
||||
global variables yet.
|
||||
|
||||
Tag-based KASAN is only supported in Clang and requires version 7.0.0 or later.
|
||||
|
||||
Currently generic KASAN is supported for the x86_64, arm64, xtensa and s390
|
||||
architectures, and tag-based KASAN is supported only for arm64.
|
||||
Currently KASAN is supported only for the x86_64 and arm64 architectures.
|
||||
|
||||
Usage
|
||||
-----
|
||||
@@ -31,14 +21,12 @@ To enable KASAN configure kernel with::
|
||||
|
||||
CONFIG_KASAN = y
|
||||
|
||||
and choose between CONFIG_KASAN_GENERIC (to enable generic KASAN) and
|
||||
CONFIG_KASAN_SW_TAGS (to enable software tag-based KASAN).
|
||||
and choose between CONFIG_KASAN_OUTLINE and CONFIG_KASAN_INLINE. Outline and
|
||||
inline are compiler instrumentation types. The former produces smaller binary
|
||||
the latter is 1.1 - 2 times faster. Inline instrumentation requires a GCC
|
||||
version 5.0 or later.
|
||||
|
||||
You also need to choose between CONFIG_KASAN_OUTLINE and CONFIG_KASAN_INLINE.
|
||||
Outline and inline are compiler instrumentation types. The former produces
|
||||
smaller binary while the latter is 1.1 - 2 times faster.
|
||||
|
||||
Both KASAN modes work with both SLUB and SLAB memory allocators.
|
||||
KASAN works with both SLUB and SLAB memory allocators.
|
||||
For better bug detection and nicer reporting, enable CONFIG_STACKTRACE.
|
||||
|
||||
To disable instrumentation for specific files or directories, add a line
|
||||
@@ -55,85 +43,85 @@ similar to the following to the respective kernel Makefile:
|
||||
Error reports
|
||||
~~~~~~~~~~~~~
|
||||
|
||||
A typical out-of-bounds access generic KASAN report looks like this::
|
||||
A typical out of bounds access report looks like this::
|
||||
|
||||
==================================================================
|
||||
BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right+0xa8/0xbc [test_kasan]
|
||||
Write of size 1 at addr ffff8801f44ec37b by task insmod/2760
|
||||
BUG: AddressSanitizer: out of bounds access in kmalloc_oob_right+0x65/0x75 [test_kasan] at addr ffff8800693bc5d3
|
||||
Write of size 1 by task modprobe/1689
|
||||
=============================================================================
|
||||
BUG kmalloc-128 (Not tainted): kasan error
|
||||
-----------------------------------------------------------------------------
|
||||
|
||||
CPU: 1 PID: 2760 Comm: insmod Not tainted 4.19.0-rc3+ #698
|
||||
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
|
||||
Disabling lock debugging due to kernel taint
|
||||
INFO: Allocated in kmalloc_oob_right+0x3d/0x75 [test_kasan] age=0 cpu=0 pid=1689
|
||||
__slab_alloc+0x4b4/0x4f0
|
||||
kmem_cache_alloc_trace+0x10b/0x190
|
||||
kmalloc_oob_right+0x3d/0x75 [test_kasan]
|
||||
init_module+0x9/0x47 [test_kasan]
|
||||
do_one_initcall+0x99/0x200
|
||||
load_module+0x2cb3/0x3b20
|
||||
SyS_finit_module+0x76/0x80
|
||||
system_call_fastpath+0x12/0x17
|
||||
INFO: Slab 0xffffea0001a4ef00 objects=17 used=7 fp=0xffff8800693bd728 flags=0x100000000004080
|
||||
INFO: Object 0xffff8800693bc558 @offset=1368 fp=0xffff8800693bc720
|
||||
|
||||
Bytes b4 ffff8800693bc548: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
|
||||
Object ffff8800693bc558: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
|
||||
Object ffff8800693bc568: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
|
||||
Object ffff8800693bc578: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
|
||||
Object ffff8800693bc588: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
|
||||
Object ffff8800693bc598: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
|
||||
Object ffff8800693bc5a8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
|
||||
Object ffff8800693bc5b8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
|
||||
Object ffff8800693bc5c8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
|
||||
Redzone ffff8800693bc5d8: cc cc cc cc cc cc cc cc ........
|
||||
Padding ffff8800693bc718: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
|
||||
CPU: 0 PID: 1689 Comm: modprobe Tainted: G B 3.18.0-rc1-mm1+ #98
|
||||
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
|
||||
ffff8800693bc000 0000000000000000 ffff8800693bc558 ffff88006923bb78
|
||||
ffffffff81cc68ae 00000000000000f3 ffff88006d407600 ffff88006923bba8
|
||||
ffffffff811fd848 ffff88006d407600 ffffea0001a4ef00 ffff8800693bc558
|
||||
Call Trace:
|
||||
dump_stack+0x94/0xd8
|
||||
print_address_description+0x73/0x280
|
||||
kasan_report+0x144/0x187
|
||||
__asan_report_store1_noabort+0x17/0x20
|
||||
kmalloc_oob_right+0xa8/0xbc [test_kasan]
|
||||
kmalloc_tests_init+0x16/0x700 [test_kasan]
|
||||
do_one_initcall+0xa5/0x3ae
|
||||
do_init_module+0x1b6/0x547
|
||||
load_module+0x75df/0x8070
|
||||
__do_sys_init_module+0x1c6/0x200
|
||||
__x64_sys_init_module+0x6e/0xb0
|
||||
do_syscall_64+0x9f/0x2c0
|
||||
entry_SYSCALL_64_after_hwframe+0x44/0xa9
|
||||
RIP: 0033:0x7f96443109da
|
||||
RSP: 002b:00007ffcf0b51b08 EFLAGS: 00000202 ORIG_RAX: 00000000000000af
|
||||
RAX: ffffffffffffffda RBX: 000055dc3ee521a0 RCX: 00007f96443109da
|
||||
RDX: 00007f96445cff88 RSI: 0000000000057a50 RDI: 00007f9644992000
|
||||
RBP: 000055dc3ee510b0 R08: 0000000000000003 R09: 0000000000000000
|
||||
R10: 00007f964430cd0a R11: 0000000000000202 R12: 00007f96445cff88
|
||||
R13: 000055dc3ee51090 R14: 0000000000000000 R15: 0000000000000000
|
||||
|
||||
Allocated by task 2760:
|
||||
save_stack+0x43/0xd0
|
||||
kasan_kmalloc+0xa7/0xd0
|
||||
kmem_cache_alloc_trace+0xe1/0x1b0
|
||||
kmalloc_oob_right+0x56/0xbc [test_kasan]
|
||||
kmalloc_tests_init+0x16/0x700 [test_kasan]
|
||||
do_one_initcall+0xa5/0x3ae
|
||||
do_init_module+0x1b6/0x547
|
||||
load_module+0x75df/0x8070
|
||||
__do_sys_init_module+0x1c6/0x200
|
||||
__x64_sys_init_module+0x6e/0xb0
|
||||
do_syscall_64+0x9f/0x2c0
|
||||
entry_SYSCALL_64_after_hwframe+0x44/0xa9
|
||||
|
||||
Freed by task 815:
|
||||
save_stack+0x43/0xd0
|
||||
__kasan_slab_free+0x135/0x190
|
||||
kasan_slab_free+0xe/0x10
|
||||
kfree+0x93/0x1a0
|
||||
umh_complete+0x6a/0xa0
|
||||
call_usermodehelper_exec_async+0x4c3/0x640
|
||||
ret_from_fork+0x35/0x40
|
||||
|
||||
The buggy address belongs to the object at ffff8801f44ec300
|
||||
which belongs to the cache kmalloc-128 of size 128
|
||||
The buggy address is located 123 bytes inside of
|
||||
128-byte region [ffff8801f44ec300, ffff8801f44ec380)
|
||||
The buggy address belongs to the page:
|
||||
page:ffffea0007d13b00 count:1 mapcount:0 mapping:ffff8801f7001640 index:0x0
|
||||
flags: 0x200000000000100(slab)
|
||||
raw: 0200000000000100 ffffea0007d11dc0 0000001a0000001a ffff8801f7001640
|
||||
raw: 0000000000000000 0000000080150015 00000001ffffffff 0000000000000000
|
||||
page dumped because: kasan: bad access detected
|
||||
|
||||
[<ffffffff81cc68ae>] dump_stack+0x46/0x58
|
||||
[<ffffffff811fd848>] print_trailer+0xf8/0x160
|
||||
[<ffffffffa00026a7>] ? kmem_cache_oob+0xc3/0xc3 [test_kasan]
|
||||
[<ffffffff811ff0f5>] object_err+0x35/0x40
|
||||
[<ffffffffa0002065>] ? kmalloc_oob_right+0x65/0x75 [test_kasan]
|
||||
[<ffffffff8120b9fa>] kasan_report_error+0x38a/0x3f0
|
||||
[<ffffffff8120a79f>] ? kasan_poison_shadow+0x2f/0x40
|
||||
[<ffffffff8120b344>] ? kasan_unpoison_shadow+0x14/0x40
|
||||
[<ffffffff8120a79f>] ? kasan_poison_shadow+0x2f/0x40
|
||||
[<ffffffffa00026a7>] ? kmem_cache_oob+0xc3/0xc3 [test_kasan]
|
||||
[<ffffffff8120a995>] __asan_store1+0x75/0xb0
|
||||
[<ffffffffa0002601>] ? kmem_cache_oob+0x1d/0xc3 [test_kasan]
|
||||
[<ffffffffa0002065>] ? kmalloc_oob_right+0x65/0x75 [test_kasan]
|
||||
[<ffffffffa0002065>] kmalloc_oob_right+0x65/0x75 [test_kasan]
|
||||
[<ffffffffa00026b0>] init_module+0x9/0x47 [test_kasan]
|
||||
[<ffffffff810002d9>] do_one_initcall+0x99/0x200
|
||||
[<ffffffff811e4e5c>] ? __vunmap+0xec/0x160
|
||||
[<ffffffff81114f63>] load_module+0x2cb3/0x3b20
|
||||
[<ffffffff8110fd70>] ? m_show+0x240/0x240
|
||||
[<ffffffff81115f06>] SyS_finit_module+0x76/0x80
|
||||
[<ffffffff81cd3129>] system_call_fastpath+0x12/0x17
|
||||
Memory state around the buggy address:
|
||||
ffff8801f44ec200: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
|
||||
ffff8801f44ec280: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
|
||||
>ffff8801f44ec300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03
|
||||
^
|
||||
ffff8801f44ec380: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
|
||||
ffff8801f44ec400: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
|
||||
ffff8800693bc300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
|
||||
ffff8800693bc380: fc fc 00 00 00 00 00 00 00 00 00 00 00 00 00 fc
|
||||
ffff8800693bc400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
|
||||
ffff8800693bc480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
|
||||
ffff8800693bc500: fc fc fc fc fc fc fc fc fc fc fc 00 00 00 00 00
|
||||
>ffff8800693bc580: 00 00 00 00 00 00 00 00 00 00 03 fc fc fc fc fc
|
||||
^
|
||||
ffff8800693bc600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
|
||||
ffff8800693bc680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
|
||||
ffff8800693bc700: fc fc fc fc fb fb fb fb fb fb fb fb fb fb fb fb
|
||||
ffff8800693bc780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
|
||||
ffff8800693bc800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
|
||||
==================================================================
|
||||
|
||||
The header of the report provides a short summary of what kind of bug happened
|
||||
and what kind of access caused it. It's followed by a stack trace of the bad
|
||||
access, a stack trace of where the accessed memory was allocated (in case bad
|
||||
access happens on a slab object), and a stack trace of where the object was
|
||||
freed (in case of a use-after-free bug report). Next comes a description of
|
||||
the accessed slab object and information about the accessed memory page.
|
||||
The header of the report discribe what kind of bug happened and what kind of
|
||||
access caused it. It's followed by the description of the accessed slub object
|
||||
(see 'SLUB Debug output' section in Documentation/vm/slub.rst for details) and
|
||||
the description of the accessed memory page.
|
||||
|
||||
In the last section the report shows memory state around the accessed address.
|
||||
Reading this part requires some understanding of how KASAN works.
|
||||
@@ -150,24 +138,18 @@ inaccessible memory like redzones or freed memory (see mm/kasan/kasan.h).
|
||||
In the report above the arrows point to the shadow byte 03, which means that
|
||||
the accessed address is partially accessible.
|
||||
|
||||
For tag-based KASAN this last report section shows the memory tags around the
|
||||
accessed address (see Implementation details section).
|
||||
|
||||
|
||||
Implementation details
|
||||
----------------------
|
||||
|
||||
Generic KASAN
|
||||
~~~~~~~~~~~~~
|
||||
|
||||
From a high level, our approach to memory error detection is similar to that
|
||||
of kmemcheck: use shadow memory to record whether each byte of memory is safe
|
||||
to access, and use compile-time instrumentation to insert checks of shadow
|
||||
memory on each memory access.
|
||||
to access, and use compile-time instrumentation to check shadow memory on each
|
||||
memory access.
|
||||
|
||||
Generic KASAN dedicates 1/8th of kernel memory to its shadow memory (e.g. 16TB
|
||||
to cover 128TB on x86_64) and uses direct mapping with a scale and offset to
|
||||
translate a memory address to its corresponding shadow address.
|
||||
AddressSanitizer dedicates 1/8 of kernel memory to its shadow memory
|
||||
(e.g. 16TB to cover 128TB on x86_64) and uses direct mapping with a scale and
|
||||
offset to translate a memory address to its corresponding shadow address.
|
||||
|
||||
Here is the function which translates an address to its corresponding shadow
|
||||
address::
|
||||
@@ -180,38 +162,12 @@ address::
|
||||
|
||||
where ``KASAN_SHADOW_SCALE_SHIFT = 3``.
|
||||
|
||||
Compile-time instrumentation is used to insert memory access checks. Compiler
|
||||
inserts function calls (__asan_load*(addr), __asan_store*(addr)) before each
|
||||
memory access of size 1, 2, 4, 8 or 16. These functions check whether memory
|
||||
access is valid or not by checking corresponding shadow memory.
|
||||
Compile-time instrumentation used for checking memory accesses. Compiler inserts
|
||||
function calls (__asan_load*(addr), __asan_store*(addr)) before each memory
|
||||
access of size 1, 2, 4, 8 or 16. These functions check whether memory access is
|
||||
valid or not by checking corresponding shadow memory.
|
||||
|
||||
GCC 5.0 has possibility to perform inline instrumentation. Instead of making
|
||||
function calls GCC directly inserts the code to check the shadow memory.
|
||||
This option significantly enlarges kernel but it gives x1.1-x2 performance
|
||||
boost over outline instrumented kernel.
|
||||
|
||||
Software tag-based KASAN
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Tag-based KASAN uses the Top Byte Ignore (TBI) feature of modern arm64 CPUs to
|
||||
store a pointer tag in the top byte of kernel pointers. Like generic KASAN it
|
||||
uses shadow memory to store memory tags associated with each 16-byte memory
|
||||
cell (therefore it dedicates 1/16th of the kernel memory for shadow memory).
|
||||
|
||||
On each memory allocation tag-based KASAN generates a random tag, tags the
|
||||
allocated memory with this tag, and embeds this tag into the returned pointer.
|
||||
Software tag-based KASAN uses compile-time instrumentation to insert checks
|
||||
before each memory access. These checks make sure that tag of the memory that
|
||||
is being accessed is equal to tag of the pointer that is used to access this
|
||||
memory. In case of a tag mismatch tag-based KASAN prints a bug report.
|
||||
|
||||
Software tag-based KASAN also has two instrumentation modes (outline, that
|
||||
emits callbacks to check memory accesses; and inline, that performs the shadow
|
||||
memory checks inline). With outline instrumentation mode, a bug report is
|
||||
simply printed from the function that performs the access check. With inline
|
||||
instrumentation a brk instruction is emitted by the compiler, and a dedicated
|
||||
brk handler is used to print bug reports.
|
||||
|
||||
A potential expansion of this mode is a hardware tag-based mode, which would
|
||||
use hardware memory tagging support instead of compiler instrumentation and
|
||||
manual shadow memory manipulation.
|
||||
|
||||
@@ -34,7 +34,6 @@ Profiling data will only become accessible once debugfs has been mounted::
|
||||
|
||||
Coverage collection
|
||||
-------------------
|
||||
|
||||
The following program demonstrates coverage collection from within a test
|
||||
program using kcov:
|
||||
|
||||
@@ -129,7 +128,6 @@ only need to enable coverage (disable happens automatically on thread end).
|
||||
|
||||
Comparison operands collection
|
||||
------------------------------
|
||||
|
||||
Comparison operands collection is similar to coverage collection:
|
||||
|
||||
.. code-block:: c
|
||||
@@ -204,130 +202,3 @@ Comparison operands collection is similar to coverage collection:
|
||||
|
||||
Note that the kcov modes (coverage collection or comparison operands) are
|
||||
mutually exclusive.
|
||||
|
||||
Remote coverage collection
|
||||
--------------------------
|
||||
|
||||
With KCOV_ENABLE coverage is collected only for syscalls that are issued
|
||||
from the current process. With KCOV_REMOTE_ENABLE it's possible to collect
|
||||
coverage for arbitrary parts of the kernel code, provided that those parts
|
||||
are annotated with kcov_remote_start()/kcov_remote_stop().
|
||||
|
||||
This allows to collect coverage from two types of kernel background
|
||||
threads: the global ones, that are spawned during kernel boot in a limited
|
||||
number of instances (e.g. one USB hub_event() worker thread is spawned per
|
||||
USB HCD); and the local ones, that are spawned when a user interacts with
|
||||
some kernel interface (e.g. vhost workers).
|
||||
|
||||
To enable collecting coverage from a global background thread, a unique
|
||||
global handle must be assigned and passed to the corresponding
|
||||
kcov_remote_start() call. Then a userspace process can pass a list of such
|
||||
handles to the KCOV_REMOTE_ENABLE ioctl in the handles array field of the
|
||||
kcov_remote_arg struct. This will attach the used kcov device to the code
|
||||
sections, that are referenced by those handles.
|
||||
|
||||
Since there might be many local background threads spawned from different
|
||||
userspace processes, we can't use a single global handle per annotation.
|
||||
Instead, the userspace process passes a non-zero handle through the
|
||||
common_handle field of the kcov_remote_arg struct. This common handle gets
|
||||
saved to the kcov_handle field in the current task_struct and needs to be
|
||||
passed to the newly spawned threads via custom annotations. Those threads
|
||||
should in turn be annotated with kcov_remote_start()/kcov_remote_stop().
|
||||
|
||||
Internally kcov stores handles as u64 integers. The top byte of a handle
|
||||
is used to denote the id of a subsystem that this handle belongs to, and
|
||||
the lower 4 bytes are used to denote the id of a thread instance within
|
||||
that subsystem. A reserved value 0 is used as a subsystem id for common
|
||||
handles as they don't belong to a particular subsystem. The bytes 4-7 are
|
||||
currently reserved and must be zero. In the future the number of bytes
|
||||
used for the subsystem or handle ids might be increased.
|
||||
|
||||
When a particular userspace proccess collects coverage by via a common
|
||||
handle, kcov will collect coverage for each code section that is annotated
|
||||
to use the common handle obtained as kcov_handle from the current
|
||||
task_struct. However non common handles allow to collect coverage
|
||||
selectively from different subsystems.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
struct kcov_remote_arg {
|
||||
__u32 trace_mode;
|
||||
__u32 area_size;
|
||||
__u32 num_handles;
|
||||
__aligned_u64 common_handle;
|
||||
__aligned_u64 handles[0];
|
||||
};
|
||||
|
||||
#define KCOV_INIT_TRACE _IOR('c', 1, unsigned long)
|
||||
#define KCOV_DISABLE _IO('c', 101)
|
||||
#define KCOV_REMOTE_ENABLE _IOW('c', 102, struct kcov_remote_arg)
|
||||
|
||||
#define COVER_SIZE (64 << 10)
|
||||
|
||||
#define KCOV_TRACE_PC 0
|
||||
|
||||
#define KCOV_SUBSYSTEM_COMMON (0x00ull << 56)
|
||||
#define KCOV_SUBSYSTEM_USB (0x01ull << 56)
|
||||
|
||||
#define KCOV_SUBSYSTEM_MASK (0xffull << 56)
|
||||
#define KCOV_INSTANCE_MASK (0xffffffffull)
|
||||
|
||||
static inline __u64 kcov_remote_handle(__u64 subsys, __u64 inst)
|
||||
{
|
||||
if (subsys & ~KCOV_SUBSYSTEM_MASK || inst & ~KCOV_INSTANCE_MASK)
|
||||
return 0;
|
||||
return subsys | inst;
|
||||
}
|
||||
|
||||
#define KCOV_COMMON_ID 0x42
|
||||
#define KCOV_USB_BUS_NUM 1
|
||||
|
||||
int main(int argc, char **argv)
|
||||
{
|
||||
int fd;
|
||||
unsigned long *cover, n, i;
|
||||
struct kcov_remote_arg *arg;
|
||||
|
||||
fd = open("/sys/kernel/debug/kcov", O_RDWR);
|
||||
if (fd == -1)
|
||||
perror("open"), exit(1);
|
||||
if (ioctl(fd, KCOV_INIT_TRACE, COVER_SIZE))
|
||||
perror("ioctl"), exit(1);
|
||||
cover = (unsigned long*)mmap(NULL, COVER_SIZE * sizeof(unsigned long),
|
||||
PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
|
||||
if ((void*)cover == MAP_FAILED)
|
||||
perror("mmap"), exit(1);
|
||||
|
||||
/* Enable coverage collection via common handle and from USB bus #1. */
|
||||
arg = calloc(1, sizeof(*arg) + sizeof(uint64_t));
|
||||
if (!arg)
|
||||
perror("calloc"), exit(1);
|
||||
arg->trace_mode = KCOV_TRACE_PC;
|
||||
arg->area_size = COVER_SIZE;
|
||||
arg->num_handles = 1;
|
||||
arg->common_handle = kcov_remote_handle(KCOV_SUBSYSTEM_COMMON,
|
||||
KCOV_COMMON_ID);
|
||||
arg->handles[0] = kcov_remote_handle(KCOV_SUBSYSTEM_USB,
|
||||
KCOV_USB_BUS_NUM);
|
||||
if (ioctl(fd, KCOV_REMOTE_ENABLE, arg))
|
||||
perror("ioctl"), free(arg), exit(1);
|
||||
free(arg);
|
||||
|
||||
/*
|
||||
* Here the user needs to trigger execution of a kernel code section
|
||||
* that is either annotated with the common handle, or to trigger some
|
||||
* activity on USB bus #1.
|
||||
*/
|
||||
sleep(2);
|
||||
|
||||
n = __atomic_load_n(&cover[0], __ATOMIC_RELAXED);
|
||||
for (i = 0; i < n; i++)
|
||||
printf("0x%lx\n", cover[i + 1]);
|
||||
if (ioctl(fd, KCOV_DISABLE, 0))
|
||||
perror("ioctl"), exit(1);
|
||||
if (munmap(cover, COVER_SIZE * sizeof(unsigned long)))
|
||||
perror("munmap"), exit(1);
|
||||
if (close(fd))
|
||||
perror("close"), exit(1);
|
||||
return 0;
|
||||
}
|
||||
|
||||
@@ -1,99 +0,0 @@
|
||||
dm_bow (backup on write)
|
||||
========================
|
||||
|
||||
dm_bow is a device mapper driver that uses the free space on a device to back up
|
||||
data that is overwritten. The changes can then be committed by a simple state
|
||||
change, or rolled back by removing the dm_bow device and running a command line
|
||||
utility over the underlying device.
|
||||
|
||||
dm_bow has three states, set by writing ‘1’ or ‘2’ to /sys/block/dm-?/bow/state.
|
||||
It is only possible to go from state 0 (initial state) to state 1, and then from
|
||||
state 1 to state 2.
|
||||
|
||||
State 0: dm_bow collects all trims to the device and assumes that these mark
|
||||
free space on the overlying file system that can be safely used. Typically the
|
||||
mount code would create the dm_bow device, mount the file system, call the
|
||||
FITRIM ioctl on the file system then switch to state 1. These trims are not
|
||||
propagated to the underlying device.
|
||||
|
||||
State 1: All writes to the device cause the underlying data to be backed up to
|
||||
the free (trimmed) area as needed in such a way as they can be restored.
|
||||
However, the writes, with one exception, then happen exactly as they would
|
||||
without dm_bow, so the device is always in a good final state. The exception is
|
||||
that sector 0 is used to keep a log of the latest changes, both to indicate that
|
||||
we are in this state and to allow rollback. See below for all details. If there
|
||||
isn't enough free space, writes are failed with -ENOSPC.
|
||||
|
||||
State 2: The transition to state 2 triggers replacing the special sector 0 with
|
||||
the normal sector 0, and the freeing of all state information. dm_bow then
|
||||
becomes a pass-through driver, allowing the device to continue to be used with
|
||||
minimal performance impact.
|
||||
|
||||
Usage
|
||||
=====
|
||||
dm-bow takes one command line parameter, the name of the underlying device.
|
||||
|
||||
dm-bow will typically be used in the following way. dm-bow will be loaded with a
|
||||
suitable underlying device and the resultant device will be mounted. A file
|
||||
system trim will be issued via the FITRIM ioctl, then the device will be
|
||||
switched to state 1. The file system will now be used as normal. At some point,
|
||||
the changes can either be committed by switching to state 2, or rolled back by
|
||||
unmounting the file system, removing the dm-bow device and running the command
|
||||
line utility. Note that rebooting the device will be equivalent to unmounting
|
||||
and removing, but the command line utility must still be run
|
||||
|
||||
Details of operation in state 1
|
||||
===============================
|
||||
|
||||
dm_bow maintains a type for all sectors. A sector can be any of:
|
||||
|
||||
SECTOR0
|
||||
SECTOR0_CURRENT
|
||||
UNCHANGED
|
||||
FREE
|
||||
CHANGED
|
||||
BACKUP
|
||||
|
||||
SECTOR0 is the first sector on the device, and is used to hold the log of
|
||||
changes. This is the one exception.
|
||||
|
||||
SECTOR0_CURRENT is a sector picked from the FREE sectors, and is where reads and
|
||||
writes from the true sector zero are redirected to. Note that like any backup
|
||||
sector, if the sector is written to directly, it must be moved again.
|
||||
|
||||
UNCHANGED means that the sector has not been changed since we entered state 1.
|
||||
Thus if it is written to or trimmed, the contents must first be backed up.
|
||||
|
||||
FREE means that the sector was trimmed in state 0 and has not yet been written
|
||||
to or used for backup. On being written to, a FREE sector is changed to CHANGED.
|
||||
|
||||
CHANGED means that the sector has been modified, and can be further modified
|
||||
without further backup.
|
||||
|
||||
BACKUP means that this is a free sector being used as a backup. On being written
|
||||
to, the contents must first be backed up again.
|
||||
|
||||
All backup operations are logged to the first sector. The log sector has the
|
||||
format:
|
||||
--------------------------------------------------------
|
||||
| Magic | Count | Sequence | Log entry | Log entry | …
|
||||
--------------------------------------------------------
|
||||
|
||||
Magic is a magic number. Count is the number of log entries. Sequence is 0
|
||||
initially. A log entry is
|
||||
|
||||
-----------------------------------
|
||||
| Source | Dest | Size | Checksum |
|
||||
-----------------------------------
|
||||
|
||||
When SECTOR0 is full, the log sector is backed up and another empty log sector
|
||||
created with sequence number one higher. The first entry in any log entry with
|
||||
sequence > 0 therefore must be the log of the backing up of the previous log
|
||||
sector. Note that sequence is not strictly needed, but is a useful sanity check
|
||||
and potentially limits the time spent trying to restore a corrupted snapshot.
|
||||
|
||||
On entering state 1, dm_bow has a list of free sectors. All other sectors are
|
||||
unchanged. Sector0_current is selected from the free sectors and the contents of
|
||||
sector 0 are copied there. The sector 0 is backed up, which triggers the first
|
||||
log entry to be written.
|
||||
|
||||
@@ -146,13 +146,6 @@ block_size:number
|
||||
Supported values are 512, 1024, 2048 and 4096 bytes. If not
|
||||
specified the default block size is 512 bytes.
|
||||
|
||||
legacy_recalculate
|
||||
Allow recalculating of volumes with HMAC keys. This is disabled by
|
||||
default for security reasons - an attacker could modify the volume,
|
||||
set recalc_sector to zero, and the kernel would not detect the
|
||||
modification.
|
||||
|
||||
|
||||
The journal mode (D/J), buffer_sectors, journal_watermark, commit_time can
|
||||
be changed when reloading the target (load an inactive table and swap the
|
||||
tables with suspend and resume). The other arguments should not be changed
|
||||
|
||||
@@ -75,15 +75,6 @@ its hardware characteristcs.
|
||||
|
||||
* port or ports: same as above.
|
||||
|
||||
* Optional properties for all components:
|
||||
|
||||
* arm,coresight-loses-context-with-cpu : boolean. Indicates that the
|
||||
hardware will lose register context on CPU power down (e.g. CPUIdle).
|
||||
An example of where this may be needed are systems which contain a
|
||||
coresight component and CPU in the same power domain. When the CPU
|
||||
powers down the coresight component also powers down and loses its
|
||||
context. This property is currently only used for the ETM 4.x driver.
|
||||
|
||||
* Optional properties for ETM/PTMs:
|
||||
|
||||
* arm,cp14: must be present if the system accesses ETM/PTM management
|
||||
|
||||
@@ -164,10 +164,6 @@ Rockchip platforms device tree bindings
|
||||
Required root node properties:
|
||||
- compatible = "rockchip,px3-evb", "rockchip,px3", "rockchip,rk3188";
|
||||
|
||||
- Rockchip PX30 Evaluation board (Linux):
|
||||
Required root node properties:
|
||||
- compatible = "rockchip,px30-evb-ddr3-v11-linux", "rockchip,px30";
|
||||
|
||||
- Rockchip PX5 Evaluation board:
|
||||
Required root node properties:
|
||||
- compatible = "rockchip,px5-evb", "rockchip,px5", "rockchip,rk3368";
|
||||
@@ -191,17 +187,10 @@ Rockchip platforms device tree bindings
|
||||
- Rockchip RK3229 Evaluation board:
|
||||
- compatible = "rockchip,rk3229-evb", "rockchip,rk3229";
|
||||
|
||||
- Rockchip RK3229 Echo board:
|
||||
- compatible = "rockchip,rk3229-echo", "rockchip,rk3229";
|
||||
|
||||
- Rockchip RK3288 Fennec board:
|
||||
Required root node properties:
|
||||
- compatible = "rockchip,rk3288-fennec", "rockchip,rk3288";
|
||||
|
||||
- Rockchip RK3326 f863 rkisp1 board:
|
||||
Required root node properties:
|
||||
- compatible = "rockchip,rk3326-863-lp3-v10-rkisp1", "rockchip,rk3326";
|
||||
|
||||
- Rockchip RK3328 evb:
|
||||
Required root node properties:
|
||||
- compatible = "rockchip,rk3328-evb", "rockchip,rk3328";
|
||||
@@ -210,11 +199,6 @@ Rockchip platforms device tree bindings
|
||||
Required root node properties:
|
||||
- compatible = "rockchip,rk3399-evb", "rockchip,rk3399";
|
||||
|
||||
- Rockchip RK3399 evb ind board (Linux):
|
||||
Required root node properties:
|
||||
- compatible = "rockchip,linux", "rockchip,rk3399";
|
||||
- compatible = "rockchip,rk3399-evb-ind-lpddr4-linux";
|
||||
|
||||
- Rockchip RK3399 Sapphire board standalone:
|
||||
Required root node properties:
|
||||
- compatible = "rockchip,rk3399-sapphire", "rockchip,rk3399";
|
||||
@@ -223,15 +207,6 @@ Rockchip platforms device tree bindings
|
||||
Required root node properties:
|
||||
- compatible = "rockchip,rk3399-sapphire-excavator", "rockchip,rk3399";
|
||||
|
||||
- Rockchip RK3399pro evb:
|
||||
Required root node properties:
|
||||
- compatible = "rockchip,rk3399pro-evb-v10", "rockchip,rk3399pro";
|
||||
- compatible = "rockchip,rk3399pro-evb-v10-linux", "rockchip,rk3399pro";
|
||||
- compatible = "rockchip,rk3399pro-evb-v11-linux", "rockchip,rk3399pro";
|
||||
- compatible = "rockchip,rk3399pro-evb-v13-linux", "rockchip,rk3399pro";
|
||||
- compatible = "rockchip,rk3399pro-evb-v13-multi-cam", "rockchip,rk3399pro";
|
||||
- compatible = "rockchip,rk3399pro-evb-v14-linux", "rockchip,rk3399pro";
|
||||
|
||||
- Theobroma Systems RK3368-uQ7 Haikou Baseboard:
|
||||
Required root node properties:
|
||||
- compatible = "tsd,rk3368-uq7-haikou", "rockchip,rk3368";
|
||||
@@ -243,7 +218,3 @@ Rockchip platforms device tree bindings
|
||||
- Tronsmart Orion R68 Meta
|
||||
Required root node properties:
|
||||
- compatible = "tronsmart,orion-r68-meta", "rockchip,rk3368";
|
||||
|
||||
- Rockchip RK3568 NVR Demo V10:
|
||||
Required root node properties:
|
||||
- compatible = "rockchip,rk3568-nvr-demo-ddr4-v10-linux-spi-nand", "rockchip,rk3568";
|
||||
|
||||
@@ -35,7 +35,6 @@ Required standard properties:
|
||||
"ti,sysc-omap3-sham"
|
||||
"ti,sysc-omap-aes"
|
||||
"ti,sysc-mcasp"
|
||||
"ti,sysc-dra7-mcasp"
|
||||
"ti,sysc-usb-host-fs"
|
||||
"ti,sysc-dra7-mcan"
|
||||
|
||||
|
||||
@@ -46,7 +46,7 @@ Required properties:
|
||||
Example (R-Car H3):
|
||||
|
||||
usb2_clksel: clock-controller@e6590630 {
|
||||
compatible = "renesas,r8a7795-rcar-usb2-clock-sel",
|
||||
compatible = "renesas,r8a77950-rcar-usb2-clock-sel",
|
||||
"renesas,rcar-gen3-usb2-clock-sel";
|
||||
reg = <0 0xe6590630 0 0x02>;
|
||||
clocks = <&cpg CPG_MOD 703>, <&usb_extal>, <&usb_xtal>;
|
||||
|
||||
@@ -1,9 +1,7 @@
|
||||
Rockchip Electronics And Security Accelerator
|
||||
|
||||
Required properties:
|
||||
- compatible: should be one of the following:
|
||||
- "rockchip,rk3288-crypto": for rk3288
|
||||
- "rockchip,px30-crypto": for px30
|
||||
- compatible: Should be "rockchip,rk3288-crypto"
|
||||
- reg: Base physical address of the engine and length of memory mapped
|
||||
region
|
||||
- interrupts: Interrupt number
|
||||
|
||||
@@ -1,22 +1,8 @@
|
||||
|
||||
* Rockchip DFI device
|
||||
* Rockchip rk3399 DFI device
|
||||
|
||||
Required properties:
|
||||
- compatible: Should be one of the following.
|
||||
- "rockchip,px30-dfi" - for PX30 SoCs.
|
||||
- "rockchip,rk1808-dfi" - for RK1808 SoCs.
|
||||
- "rockchip,rk3128-dfi" - for RK3128 SoCs.
|
||||
- "rockchip,rk3288-dfi" - for RK3288 SoCs.
|
||||
- "rockchip,rk3328-dfi" - for RK3328 SoCs.
|
||||
- "rockchip,rk3368-dfi" - for RK3368 SoCs.
|
||||
- "rockchip,rk3399-dfi" - for RK3399 SoCs.
|
||||
- "rockchip,rk3568-dfi" - for RK3568 SoCs.
|
||||
- "rockchip,rv1126-dfi" - for RV1126 SoCs.
|
||||
|
||||
Required properties for RK3368:
|
||||
- rockchip,grf: phandle to the syscon managing the "general register files"
|
||||
|
||||
Required properties for RK3399:
|
||||
- compatible: Must be "rockchip,rk3399-dfi".
|
||||
- reg: physical base address of each DFI and length of memory mapped region
|
||||
- rockchip,pmu: phandle to the syscon managing the "pmu general register files"
|
||||
- clocks: phandles for clock specified in "clock-names" property
|
||||
|
||||
213
Documentation/devicetree/bindings/devfreq/rk3399_dmc.txt
Normal file
213
Documentation/devicetree/bindings/devfreq/rk3399_dmc.txt
Normal file
@@ -0,0 +1,213 @@
|
||||
* Rockchip rk3399 DMC (Dynamic Memory Controller) device
|
||||
|
||||
Required properties:
|
||||
- compatible: Must be "rockchip,rk3399-dmc".
|
||||
- devfreq-events: Node to get DDR loading, Refer to
|
||||
Documentation/devicetree/bindings/devfreq/event/
|
||||
rockchip-dfi.txt
|
||||
- clocks: Phandles for clock specified in "clock-names" property
|
||||
- clock-names : The name of clock used by the DFI, must be
|
||||
"pclk_ddr_mon";
|
||||
- operating-points-v2: Refer to Documentation/devicetree/bindings/opp/opp.txt
|
||||
for details.
|
||||
- center-supply: DMC supply node.
|
||||
- status: Marks the node enabled/disabled.
|
||||
|
||||
Optional properties:
|
||||
- interrupts: The CPU interrupt number. The interrupt specifier
|
||||
format depends on the interrupt controller.
|
||||
It should be a DCF interrupt. When DDR DVFS finishes
|
||||
a DCF interrupt is triggered.
|
||||
|
||||
Following properties relate to DDR timing:
|
||||
|
||||
- rockchip,dram_speed_bin : Value reference include/dt-bindings/clock/rk3399-ddr.h,
|
||||
it selects the DDR3 cl-trp-trcd type. It must be
|
||||
set according to "Speed Bin" in DDR3 datasheet,
|
||||
DO NOT use a smaller "Speed Bin" than specified
|
||||
for the DDR3 being used.
|
||||
|
||||
- rockchip,pd_idle : Configure the PD_IDLE value. Defines the
|
||||
power-down idle period in which memories are
|
||||
placed into power-down mode if bus is idle
|
||||
for PD_IDLE DFI clock cycles.
|
||||
|
||||
- rockchip,sr_idle : Configure the SR_IDLE value. Defines the
|
||||
self-refresh idle period in which memories are
|
||||
placed into self-refresh mode if bus is idle
|
||||
for SR_IDLE * 1024 DFI clock cycles (DFI
|
||||
clocks freq is half of DRAM clock), default
|
||||
value is "0".
|
||||
|
||||
- rockchip,sr_mc_gate_idle : Defines the memory self-refresh and controller
|
||||
clock gating idle period. Memories are placed
|
||||
into self-refresh mode and memory controller
|
||||
clock arg gating started if bus is idle for
|
||||
sr_mc_gate_idle*1024 DFI clock cycles.
|
||||
|
||||
- rockchip,srpd_lite_idle : Defines the self-refresh power down idle
|
||||
period in which memories are placed into
|
||||
self-refresh power down mode if bus is idle
|
||||
for srpd_lite_idle * 1024 DFI clock cycles.
|
||||
This parameter is for LPDDR4 only.
|
||||
|
||||
- rockchip,standby_idle : Defines the standby idle period in which
|
||||
memories are placed into self-refresh mode.
|
||||
The controller, pi, PHY and DRAM clock will
|
||||
be gated if bus is idle for standby_idle * DFI
|
||||
clock cycles.
|
||||
|
||||
- rockchip,dram_dll_dis_freq : Defines the DDR3 DLL bypass frequency in MHz.
|
||||
When DDR frequency is less than DRAM_DLL_DISB_FREQ,
|
||||
DDR3 DLL will be bypassed. Note: if DLL was bypassed,
|
||||
the odt will also stop working.
|
||||
|
||||
- rockchip,phy_dll_dis_freq : Defines the PHY dll bypass frequency in
|
||||
MHz (Mega Hz). When DDR frequency is less than
|
||||
DRAM_DLL_DISB_FREQ, PHY DLL will be bypassed.
|
||||
Note: PHY DLL and PHY ODT are independent.
|
||||
|
||||
- rockchip,ddr3_odt_dis_freq : When the DRAM type is DDR3, this parameter defines
|
||||
the ODT disable frequency in MHz (Mega Hz).
|
||||
when the DDR frequency is less then ddr3_odt_dis_freq,
|
||||
the ODT on the DRAM side and controller side are
|
||||
both disabled.
|
||||
|
||||
- rockchip,ddr3_drv : When the DRAM type is DDR3, this parameter defines
|
||||
the DRAM side driver strength in ohms. Default
|
||||
value is DDR3_DS_40ohm.
|
||||
|
||||
- rockchip,ddr3_odt : When the DRAM type is DDR3, this parameter defines
|
||||
the DRAM side ODT strength in ohms. Default value
|
||||
is DDR3_ODT_120ohm.
|
||||
|
||||
- rockchip,phy_ddr3_ca_drv : When the DRAM type is DDR3, this parameter defines
|
||||
the phy side CA line (incluing command line,
|
||||
address line and clock line) driver strength.
|
||||
Default value is PHY_DRV_ODT_40.
|
||||
|
||||
- rockchip,phy_ddr3_dq_drv : When the DRAM type is DDR3, this parameter defines
|
||||
the PHY side DQ line (including DQS/DQ/DM line)
|
||||
driver strength. Default value is PHY_DRV_ODT_40.
|
||||
|
||||
- rockchip,phy_ddr3_odt : When the DRAM type is DDR3, this parameter defines
|
||||
the PHY side ODT strength. Default value is
|
||||
PHY_DRV_ODT_240.
|
||||
|
||||
- rockchip,lpddr3_odt_dis_freq : When the DRAM type is LPDDR3, this parameter defines
|
||||
then ODT disable frequency in MHz (Mega Hz).
|
||||
When DDR frequency is less then ddr3_odt_dis_freq,
|
||||
the ODT on the DRAM side and controller side are
|
||||
both disabled.
|
||||
|
||||
- rockchip,lpddr3_drv : When the DRAM type is LPDDR3, this parameter defines
|
||||
the DRAM side driver strength in ohms. Default
|
||||
value is LP3_DS_34ohm.
|
||||
|
||||
- rockchip,lpddr3_odt : When the DRAM type is LPDDR3, this parameter defines
|
||||
the DRAM side ODT strength in ohms. Default value
|
||||
is LP3_ODT_240ohm.
|
||||
|
||||
- rockchip,phy_lpddr3_ca_drv : When the DRAM type is LPDDR3, this parameter defines
|
||||
the PHY side CA line (including command line,
|
||||
address line and clock line) driver strength.
|
||||
Default value is PHY_DRV_ODT_40.
|
||||
|
||||
- rockchip,phy_lpddr3_dq_drv : When the DRAM type is LPDDR3, this parameter defines
|
||||
the PHY side DQ line (including DQS/DQ/DM line)
|
||||
driver strength. Default value is
|
||||
PHY_DRV_ODT_40.
|
||||
|
||||
- rockchip,phy_lpddr3_odt : When dram type is LPDDR3, this parameter define
|
||||
the phy side odt strength, default value is
|
||||
PHY_DRV_ODT_240.
|
||||
|
||||
- rockchip,lpddr4_odt_dis_freq : When the DRAM type is LPDDR4, this parameter
|
||||
defines the ODT disable frequency in
|
||||
MHz (Mega Hz). When the DDR frequency is less then
|
||||
ddr3_odt_dis_freq, the ODT on the DRAM side and
|
||||
controller side are both disabled.
|
||||
|
||||
- rockchip,lpddr4_drv : When the DRAM type is LPDDR4, this parameter defines
|
||||
the DRAM side driver strength in ohms. Default
|
||||
value is LP4_PDDS_60ohm.
|
||||
|
||||
- rockchip,lpddr4_dq_odt : When the DRAM type is LPDDR4, this parameter defines
|
||||
the DRAM side ODT on DQS/DQ line strength in ohms.
|
||||
Default value is LP4_DQ_ODT_40ohm.
|
||||
|
||||
- rockchip,lpddr4_ca_odt : When the DRAM type is LPDDR4, this parameter defines
|
||||
the DRAM side ODT on CA line strength in ohms.
|
||||
Default value is LP4_CA_ODT_40ohm.
|
||||
|
||||
- rockchip,phy_lpddr4_ca_drv : When the DRAM type is LPDDR4, this parameter defines
|
||||
the PHY side CA line (including command address
|
||||
line) driver strength. Default value is
|
||||
PHY_DRV_ODT_40.
|
||||
|
||||
- rockchip,phy_lpddr4_ck_cs_drv : When the DRAM type is LPDDR4, this parameter defines
|
||||
the PHY side clock line and CS line driver
|
||||
strength. Default value is PHY_DRV_ODT_80.
|
||||
|
||||
- rockchip,phy_lpddr4_dq_drv : When the DRAM type is LPDDR4, this parameter defines
|
||||
the PHY side DQ line (including DQS/DQ/DM line)
|
||||
driver strength. Default value is PHY_DRV_ODT_80.
|
||||
|
||||
- rockchip,phy_lpddr4_odt : When the DRAM type is LPDDR4, this parameter defines
|
||||
the PHY side ODT strength. Default value is
|
||||
PHY_DRV_ODT_60.
|
||||
|
||||
Example:
|
||||
dmc_opp_table: dmc_opp_table {
|
||||
compatible = "operating-points-v2";
|
||||
|
||||
opp00 {
|
||||
opp-hz = /bits/ 64 <300000000>;
|
||||
opp-microvolt = <900000>;
|
||||
};
|
||||
opp01 {
|
||||
opp-hz = /bits/ 64 <666000000>;
|
||||
opp-microvolt = <900000>;
|
||||
};
|
||||
};
|
||||
|
||||
dmc: dmc {
|
||||
compatible = "rockchip,rk3399-dmc";
|
||||
devfreq-events = <&dfi>;
|
||||
interrupts = <GIC_SPI 1 IRQ_TYPE_LEVEL_HIGH>;
|
||||
clocks = <&cru SCLK_DDRCLK>;
|
||||
clock-names = "dmc_clk";
|
||||
operating-points-v2 = <&dmc_opp_table>;
|
||||
center-supply = <&ppvar_centerlogic>;
|
||||
upthreshold = <15>;
|
||||
downdifferential = <10>;
|
||||
rockchip,ddr3_speed_bin = <21>;
|
||||
rockchip,pd_idle = <0x40>;
|
||||
rockchip,sr_idle = <0x2>;
|
||||
rockchip,sr_mc_gate_idle = <0x3>;
|
||||
rockchip,srpd_lite_idle = <0x4>;
|
||||
rockchip,standby_idle = <0x2000>;
|
||||
rockchip,dram_dll_dis_freq = <300>;
|
||||
rockchip,phy_dll_dis_freq = <125>;
|
||||
rockchip,auto_pd_dis_freq = <666>;
|
||||
rockchip,ddr3_odt_dis_freq = <333>;
|
||||
rockchip,ddr3_drv = <DDR3_DS_40ohm>;
|
||||
rockchip,ddr3_odt = <DDR3_ODT_120ohm>;
|
||||
rockchip,phy_ddr3_ca_drv = <PHY_DRV_ODT_40>;
|
||||
rockchip,phy_ddr3_dq_drv = <PHY_DRV_ODT_40>;
|
||||
rockchip,phy_ddr3_odt = <PHY_DRV_ODT_240>;
|
||||
rockchip,lpddr3_odt_dis_freq = <333>;
|
||||
rockchip,lpddr3_drv = <LP3_DS_34ohm>;
|
||||
rockchip,lpddr3_odt = <LP3_ODT_240ohm>;
|
||||
rockchip,phy_lpddr3_ca_drv = <PHY_DRV_ODT_40>;
|
||||
rockchip,phy_lpddr3_dq_drv = <PHY_DRV_ODT_40>;
|
||||
rockchip,phy_lpddr3_odt = <PHY_DRV_ODT_240>;
|
||||
rockchip,lpddr4_odt_dis_freq = <333>;
|
||||
rockchip,lpddr4_drv = <LP4_PDDS_60ohm>;
|
||||
rockchip,lpddr4_dq_odt = <LP4_DQ_ODT_40ohm>;
|
||||
rockchip,lpddr4_ca_odt = <LP4_CA_ODT_40ohm>;
|
||||
rockchip,phy_lpddr4_ca_drv = <PHY_DRV_ODT_40>;
|
||||
rockchip,phy_lpddr4_ck_cs_drv = <PHY_DRV_ODT_80>;
|
||||
rockchip,phy_lpddr4_dq_drv = <PHY_DRV_ODT_80>;
|
||||
rockchip,phy_lpddr4_odt = <PHY_DRV_ODT_60>;
|
||||
};
|
||||
@@ -5,9 +5,7 @@ Required properties for dp-controller:
|
||||
platform specific such as:
|
||||
* "samsung,exynos5-dp"
|
||||
* "rockchip,rk3288-dp"
|
||||
* "rockchip,rk3368-edp"
|
||||
* "rockchip,rk3399-edp"
|
||||
* "rockchip,rk3568-edp"
|
||||
-reg:
|
||||
physical base address of the controller and length
|
||||
of memory mapped region.
|
||||
@@ -23,8 +21,6 @@ Required properties for dp-controller:
|
||||
from general PHY binding: Should be "dp".
|
||||
|
||||
Optional properties for dp-controller:
|
||||
-analogix,video-bist-enable:
|
||||
Enable video bist pattern for DP_TX debugging.
|
||||
-force-hpd:
|
||||
Indicate driver need force hpd when hpd detect failed, this
|
||||
is used for some eDP screen which don't have hpd signal.
|
||||
|
||||
@@ -16,9 +16,6 @@ Required properties:
|
||||
Documentation/devicetree/bindings/graph.txt. This port should be connected
|
||||
to the input port of an attached HDMI or LVDS encoder chip.
|
||||
|
||||
Optional properties:
|
||||
- pinctrl-names: Contain "default" and "sleep".
|
||||
|
||||
Example:
|
||||
|
||||
dpi0: dpi@1401d000 {
|
||||
@@ -29,9 +26,6 @@ dpi0: dpi@1401d000 {
|
||||
<&mmsys CLK_MM_DPI_ENGINE>,
|
||||
<&apmixedsys CLK_APMIXED_TVDPLL>;
|
||||
clock-names = "pixel", "engine", "pll";
|
||||
pinctrl-names = "default", "sleep";
|
||||
pinctrl-0 = <&dpi_pin_func>;
|
||||
pinctrl-1 = <&dpi_pin_idle>;
|
||||
|
||||
port {
|
||||
dpi0_out: endpoint {
|
||||
|
||||
@@ -1,9 +0,0 @@
|
||||
Armadeus ST0700 Adapt. A Santek ST0700I5Y-RBSLW 7.0" WVGA (800x480) TFT with
|
||||
an adapter board.
|
||||
|
||||
Required properties:
|
||||
- compatible: "armadeus,st0700-adapt"
|
||||
- power-supply: see panel-common.txt
|
||||
|
||||
Optional properties:
|
||||
- backlight: see panel-common.txt
|
||||
@@ -5,53 +5,12 @@ panel node
|
||||
----------
|
||||
|
||||
Required properties:
|
||||
- compatible: Should contain one of the following:
|
||||
- "simple-panel": for common simple panel
|
||||
- "simple-panel-dsi": for common simple dsi panel
|
||||
- "vendor,panel": for vendor specific panel
|
||||
- power-supply: See panel-common.txt
|
||||
|
||||
Optional properties:
|
||||
- vsp-supply: positive voltage supply
|
||||
- vsn-supply: negative voltage supply
|
||||
- ddc-i2c-bus: phandle of an I2C controller used for DDC EDID probing
|
||||
- enable-gpios: GPIO pin to enable or disable the panel
|
||||
- reset-gpios: GPIO pin to reset the panel
|
||||
- backlight: phandle of the backlight device attached to the panel
|
||||
- prepare-delay-ms: the time (in milliseconds) that it takes for the panel to
|
||||
become ready and start receiving video data
|
||||
- enable-delay-ms: the time (in milliseconds) that it takes for the panel to
|
||||
display the first valid frame after starting to receive
|
||||
video data
|
||||
- disable-delay-ms: the time (in milliseconds) that it takes for the panel to
|
||||
turn the display off (no content is visible)
|
||||
- unprepare-delay-ms: the time (in milliseconds) that it takes for the panel
|
||||
to power itself down completely
|
||||
- reset-delay-ms: the time (in milliseconds) that it takes for the panel to
|
||||
reset itself completely
|
||||
- init-delay-ms: the time (in milliseconds) that it takes for the panel to
|
||||
send init command sequence after reset deassert
|
||||
- width-mm: width (in millimeters) of the panel's active display area
|
||||
- height-mm: height (in millimeters) of the panel's active display area
|
||||
- bpc: bits per color/component
|
||||
- bus-format: Pixel data format on the wire
|
||||
|
||||
- dsi,lanes: number of active data lanes
|
||||
- dsi,format: pixel format for video mode
|
||||
- dsi,flags: DSI operation mode related flags
|
||||
- panel-init-sequence:
|
||||
- panel-exit-sequence:
|
||||
A byte stream formed by simple multiple dcs packets.
|
||||
byte 0: dcs data type
|
||||
byte 1: wait number of specified ms after dcs command transmitted
|
||||
byte 2: packet payload length
|
||||
byte 3 and beyond: number byte of payload
|
||||
|
||||
- power-invert: power invert control
|
||||
- rockchip,cmd-type: default is DSI cmd, or "spi", "mcu" cmd type
|
||||
- spi-sdi: spi init panel for spi-sdi io
|
||||
- spi-scl: spi init panel for spi-scl io
|
||||
- spi-cs: spi init pael for spi-cs io
|
||||
|
||||
Example:
|
||||
|
||||
|
||||
@@ -3,9 +3,7 @@ Rockchip RK3288 specific extensions to the Analogix Display Port
|
||||
|
||||
Required properties:
|
||||
- compatible: "rockchip,rk3288-dp",
|
||||
"rockchip,rk3368-edp",
|
||||
"rockchip,rk3399-edp";
|
||||
"rockchip,rk3568-edp";
|
||||
|
||||
- reg: physical base address of the controller and length
|
||||
|
||||
@@ -23,6 +21,8 @@ Required properties:
|
||||
|
||||
- reset-names: Must include the name "dp"
|
||||
|
||||
- rockchip,grf: this soc should set GRF regs, so need get grf here.
|
||||
|
||||
- ports: there are 2 port nodes with endpoint definitions as defined in
|
||||
Documentation/devicetree/bindings/media/video-interfaces.txt.
|
||||
Port 0: contained 2 endpoints, connecting to the output of vop.
|
||||
@@ -33,11 +33,6 @@ Optional property for different chips:
|
||||
|
||||
- clock-names: from common clock binding:
|
||||
Required elements: "grf"
|
||||
- rockchip,grf: this soc should set GRF regs, so need get grf here.
|
||||
Optional properties
|
||||
- reset-names: "apb" must control the reset line to the APB interface.
|
||||
- vcc-supply: Regulator for eDP_AVDD_1V0.
|
||||
- vccio-supply: Regulator for eDP_AVDD_1V8.
|
||||
|
||||
For the below properties, please refer to Analogix DP binding document:
|
||||
* Documentation/devicetree/bindings/display/bridge/analogix_dp.txt
|
||||
|
||||
@@ -12,10 +12,7 @@ following device-specific properties.
|
||||
Required properties:
|
||||
|
||||
- compatible: should be one of the following:
|
||||
"rockchip,rk3228-dw-hdmi"
|
||||
"rockchip,rk3288-dw-hdmi"
|
||||
"rockchip,rk3328-dw-hdmi"
|
||||
"rockchip,rk3368-dw-hdmi"
|
||||
"rockchip,rk3399-dw-hdmi"
|
||||
- reg: See dw_hdmi.txt.
|
||||
- reg-io-width: See dw_hdmi.txt. Shall be 4.
|
||||
@@ -26,7 +23,6 @@ Required properties:
|
||||
corresponding to the video input of the controller. The port shall have two
|
||||
endpoints, numbered 0 and 1, connected respectively to the vopb and vopl.
|
||||
- rockchip,grf: Shall reference the GRF to mux vopl/vopb.
|
||||
- rockchip,phy-table: the parameter table of hdmi phy configuration.
|
||||
|
||||
Optional properties
|
||||
|
||||
@@ -38,13 +34,6 @@ Optional properties
|
||||
- clock-names: May contain "cec" as defined in dw_hdmi.txt.
|
||||
- clock-names: May contain "grf", power for grf io.
|
||||
- clock-names: May contain "vpll", external clock for some hdmi phy.
|
||||
- phys: from general PHY binding: the phandle for the PHY device.
|
||||
- phy-names: Should be "hdmi" if phys references an external phy.
|
||||
- max-tmdsclk: hdmi max tmdsclk, if this property is not set, set max tmdsclk to 594M.
|
||||
- unsupported-yuv-input: If yuv input is not supported, set this property.
|
||||
- unsupported-deep-color: If deep color mode is not supported, set this property.
|
||||
- hdcp1x-enable: enable hdcp1.x, enable if defined, disable if not defined
|
||||
- scramble-low-rates: if defined enable scarmble when tmdsclk less than 340Mhz
|
||||
|
||||
Example:
|
||||
|
||||
@@ -71,9 +60,4 @@ hdmi: hdmi@ff980000 {
|
||||
};
|
||||
};
|
||||
};
|
||||
rockchip,phy-table = <74250000 0x8009 0x0004 0x0272>,
|
||||
<165000000 0x802b 0x0004 0x0209>,
|
||||
<297000000 0x8039 0x0005 0x028d>,
|
||||
<594000000 0x8039 0x0000 0x019d>,
|
||||
<000000000 0x0000 0x0000 0x0000>;
|
||||
};
|
||||
|
||||
@@ -4,44 +4,28 @@ Rockchip specific extensions to the Synopsys Designware MIPI DSI
|
||||
Required properties:
|
||||
- #address-cells: Should be <1>.
|
||||
- #size-cells: Should be <0>.
|
||||
- compatible: must be one of:
|
||||
"rockchip,px30-mipi-dsi", "snps,dw-mipi-dsi".
|
||||
"rockchip,rk1808-mipi-dsi", "snps,dw-mipi-dsi".
|
||||
"rockchip,rk3128-mipi-dsi", "snps,dw-mipi-dsi".
|
||||
"rockchip,rk3288-mipi-dsi", "snps,dw-mipi-dsi".
|
||||
"rockchip,rk3368-mipi-dsi", "snps,dw-mipi-dsi".
|
||||
"rockchip,rk3399-mipi-dsi", "snps,dw-mipi-dsi".
|
||||
"rockchip,rk3568-mipi-dsi", "snps,dw-mipi-dsi".
|
||||
"rockchip,rv1126-mipi-dsi", "snps,dw-mipi-dsi".
|
||||
- compatible: "rockchip,rk3288-mipi-dsi", "snps,dw-mipi-dsi".
|
||||
"rockchip,rk3399-mipi-dsi", "snps,dw-mipi-dsi".
|
||||
- reg: Represent the physical address range of the controller.
|
||||
- interrupts: Represent the controller's interrupt to the CPU(s).
|
||||
- power-domains: a phandle to mipi dsi power domain node.
|
||||
- resets: list of phandle + reset specifier pairs, as described in [3].
|
||||
- reset-names: string reset name, must be "apb".
|
||||
- clocks, clock-names: Phandles to the controller's APB clock(pclk),
|
||||
As described in [1].
|
||||
- clocks, clock-names: Phandles to the controller's pll reference
|
||||
clock(ref) and APB clock(pclk). For RK3399, a phy config clock
|
||||
(phy_cfg) and a grf clock(grf) are required. As described in [1].
|
||||
- rockchip,grf: this soc should set GRF regs to mux vopl/vopb.
|
||||
- ports: contain a port node with endpoint definitions as defined in [2].
|
||||
For vopb,set the reg = <0> and set the reg = <1> for vopl.
|
||||
|
||||
Optional properties:
|
||||
- clocks, clock-names:
|
||||
phandle to the SNPS-PHY config clock, name should be "phy_cfg".
|
||||
phandle to the SNPS-PHY PLL reference clock, name should be "ref".
|
||||
phandle to the Non-SNPS PHY high speed clock, name should be "hs_clk".
|
||||
- phys: phandle to Non-SNPS PHY node
|
||||
- phy-names: the string "mipi_dphy" when is found in a node, along with "phys"
|
||||
attribute, provides phandle to MIPI PHY node
|
||||
- rockchip,lane-rate: optional override of the desired bandwidth.
|
||||
- rockchip,dual-channel: for dual-channel mode, phandle to the slave channel.
|
||||
- rockchip,data-swap: for dual-channel mode, swap two channel data of MIPI.
|
||||
- power-domains: a phandle to mipi dsi power domain node.
|
||||
- resets: list of phandle + reset specifier pairs, as described in [3].
|
||||
- reset-names: string reset name, must be "apb".
|
||||
|
||||
[1] Documentation/devicetree/bindings/clock/clock-bindings.txt
|
||||
[2] Documentation/devicetree/bindings/media/video-interfaces.txt
|
||||
[3] Documentation/devicetree/bindings/reset/reset.txt
|
||||
|
||||
Example:
|
||||
dsi: dsi@ff960000 {
|
||||
mipi_dsi: mipi@ff960000 {
|
||||
#address-cells = <1>;
|
||||
#size-cells = <0>;
|
||||
compatible = "rockchip,rk3288-mipi-dsi", "snps,dw-mipi-dsi";
|
||||
@@ -58,18 +42,16 @@ Example:
|
||||
#size-cells = <0>;
|
||||
reg = <1>;
|
||||
|
||||
port {
|
||||
mipi_in: port {
|
||||
#address-cells = <1>;
|
||||
#size-cells = <0>;
|
||||
|
||||
dsi_in_vopb: endpoint@0 {
|
||||
mipi_in_vopb: endpoint@0 {
|
||||
reg = <0>;
|
||||
remote-endpoint = <&vopb_out_dsi>;
|
||||
remote-endpoint = <&vopb_out_mipi>;
|
||||
};
|
||||
|
||||
dsi_in_vopl: endpoint@1 {
|
||||
mipi_in_vopl: endpoint@1 {
|
||||
reg = <1>;
|
||||
remote-endpoint = <&vopl_out_dsi>;
|
||||
remote-endpoint = <&vopl_out_mipi>;
|
||||
};
|
||||
};
|
||||
};
|
||||
|
||||
@@ -11,21 +11,9 @@ Required properties:
|
||||
of vop devices. vop definitions as defined in
|
||||
Documentation/devicetree/bindings/display/rockchip/rockchip-vop.txt
|
||||
|
||||
Optional properties
|
||||
- secure-memory-region: phandle to a node describing secure memory
|
||||
- clocks: include clock specifiers corresponding to entries in the
|
||||
clock-names property.
|
||||
- clock-names: optional include
|
||||
hdmi-tmds-pll: special pll required by some hdmi-vop design,
|
||||
if there is no hdmi plug, also can reuse for
|
||||
common display pll.
|
||||
default-vop-pll: common display pll.
|
||||
|
||||
example:
|
||||
|
||||
display-subsystem {
|
||||
compatible = "rockchip,display-subsystem";
|
||||
ports = <&vopl_out>, <&vopb_out>;
|
||||
clocks = <&cru PLL_VPLL>, <&cru PLL_CPLL>;
|
||||
clock-names = "hdmi-tmds-pll", "default-vop-pll";
|
||||
};
|
||||
|
||||
@@ -1,19 +1,26 @@
|
||||
Rockchip SoCs LVDS interface
|
||||
Rockchip RK3288 LVDS interface
|
||||
================================
|
||||
|
||||
Required properties:
|
||||
- compatible: matching the soc type, one of
|
||||
- "rockchip,px30-lvds",
|
||||
- "rockchip,rk3126-lvds",
|
||||
- "rockchip,rk3288-lvds",
|
||||
- "rockchip,rk3368-lvds";
|
||||
- "rockchip,rk3568-lvds";
|
||||
- phys : phandle for the PHY device
|
||||
- phy-names : should be "phy"
|
||||
- "rockchip,rk3288-lvds";
|
||||
|
||||
- reg: physical base address of the controller and length
|
||||
of memory mapped region.
|
||||
- clocks: must include clock specifiers corresponding to entries in the
|
||||
clock-names property.
|
||||
- clock-names: must contain "pclk_lvds"
|
||||
|
||||
- avdd1v0-supply: regulator phandle for 1.0V analog power
|
||||
- avdd1v8-supply: regulator phandle for 1.8V analog power
|
||||
- avdd3v3-supply: regulator phandle for 3.3V analog power
|
||||
|
||||
- rockchip,grf: phandle to the general register files syscon
|
||||
- rockchip,output: "rgb", "lvds" or "duallvds", This describes the output interface
|
||||
|
||||
Optional properties:
|
||||
- dual-channel: boolean. if it exists, enable dual channel mode
|
||||
- rockchip,data-swap: boolean to enable odd/even data swap in dual channel mode
|
||||
- pinctrl-names: must contain a "lcdc" entry.
|
||||
- pinctrl-0: pin control group to be used for this controller.
|
||||
|
||||
Required nodes:
|
||||
|
||||
@@ -25,36 +32,68 @@ Their connections are modeled using the OF graph bindings specified in
|
||||
- video port 0 for the VOP input, the remote endpoint maybe vopb or vopl
|
||||
- video port 1 for either a panel or subsequent encoder
|
||||
|
||||
the lvds panel described by
|
||||
Documentation/devicetree/bindings/display/panel/simple-panel.txt
|
||||
|
||||
Panel required properties:
|
||||
- ports for remote LVDS output
|
||||
|
||||
Panel optional properties:
|
||||
- data-mapping: should be "vesa-24","jeida-24" or "jeida-18".
|
||||
This describes decribed by:
|
||||
Documentation/devicetree/bindings/display/panel/panel-lvds.txt
|
||||
|
||||
Example:
|
||||
|
||||
&grf {
|
||||
status = "okay";
|
||||
lvds_panel: lvds-panel {
|
||||
compatible = "auo,b101ean01";
|
||||
enable-gpios = <&gpio7 21 GPIO_ACTIVE_HIGH>;
|
||||
data-mapping = "jeida-24";
|
||||
|
||||
lvds: lvds {
|
||||
ports {
|
||||
panel_in_lvds: endpoint {
|
||||
remote-endpoint = <&lvds_out_panel>;
|
||||
};
|
||||
};
|
||||
};
|
||||
|
||||
For Rockchip RK3288:
|
||||
|
||||
lvds: lvds@ff96c000 {
|
||||
compatible = "rockchip,rk3288-lvds";
|
||||
phys = <&video_phy>;
|
||||
phy-names = "phy";
|
||||
status = "disabled";
|
||||
|
||||
rockchip,grf = <&grf>;
|
||||
reg = <0xff96c000 0x4000>;
|
||||
clocks = <&cru PCLK_LVDS_PHY>;
|
||||
clock-names = "pclk_lvds";
|
||||
pinctrl-names = "lcdc";
|
||||
pinctrl-0 = <&lcdc_ctl>;
|
||||
avdd1v0-supply = <&vdd10_lcd>;
|
||||
avdd1v8-supply = <&vcc18_lcd>;
|
||||
avdd3v3-supply = <&vcca_33>;
|
||||
rockchip,output = "rgb";
|
||||
ports {
|
||||
#address-cells = <1>;
|
||||
#size-cells = <0>;
|
||||
|
||||
port@0 {
|
||||
lvds_in: port@0 {
|
||||
reg = <0>;
|
||||
#address-cells = <1>;
|
||||
#size-cells = <0>;
|
||||
|
||||
lvds_in_vopb: endpoint@0 {
|
||||
reg = <0>;
|
||||
remote-endpoint = <&vopb_out_lvds>;
|
||||
};
|
||||
|
||||
lvds_in_vopl: endpoint@1 {
|
||||
reg = <1>;
|
||||
remote-endpoint = <&vopl_out_lvds>;
|
||||
};
|
||||
};
|
||||
|
||||
lvds_out: port@1 {
|
||||
reg = <1>;
|
||||
|
||||
lvds_out_panel: endpoint {
|
||||
remote-endpoint = <&panel_in_lvds>;
|
||||
};
|
||||
};
|
||||
};
|
||||
};
|
||||
};
|
||||
|
||||
@@ -7,16 +7,8 @@ buffer to an external LCD interface.
|
||||
Required properties:
|
||||
- compatible: value should be one of the following
|
||||
"rockchip,rk3036-vop";
|
||||
"rockchip,rk3066-vop";
|
||||
"rockchip,rk3126-vop";
|
||||
"rockchip,px30-vop-big";
|
||||
"rockchip,px30-vop-lit";
|
||||
"rockchip,rk3308-vop";
|
||||
"rockchip,rk1808-vop-lit";
|
||||
"rockchip,rk1808-vop-raw";
|
||||
"rockchip,rv1126-vop";
|
||||
"rockchip,rk3288-vop-lit";
|
||||
"rockchip,rk3288-vop-big";
|
||||
"rockchip,rk3288-vop";
|
||||
"rockchip,rk3368-vop";
|
||||
"rockchip,rk3366-vop";
|
||||
"rockchip,rk3399-vop-big";
|
||||
@@ -24,13 +16,6 @@ Required properties:
|
||||
"rockchip,rk3228-vop";
|
||||
"rockchip,rk3328-vop";
|
||||
|
||||
- reg: Address and length of the register set for the device.
|
||||
- reg-names: The names of register regions. contain following regions:
|
||||
- "regs" : (Required) Base address and size of the controllers.
|
||||
- "gamma_lut" : (Optional) gamma function lut table registers,
|
||||
take care of this register's length, driver would use
|
||||
register's length to decide gamma table size.
|
||||
|
||||
- interrupts: should contain a list of all VOP IP block interrupts in the
|
||||
order: VSYNC, LCD_SYSTEM. The interrupt specifier
|
||||
format depends on the interrupt controller used.
|
||||
@@ -42,7 +27,6 @@ Required properties:
|
||||
aclk_vop: for ddr buffer transfer.
|
||||
hclk_vop: for ahb bus to R/W the phy regs.
|
||||
dclk_vop: pixel clock.
|
||||
dclk_source: optional, dclk sources from display plls.
|
||||
|
||||
- resets: Must contain an entry for each entry in reset-names.
|
||||
See ../reset/reset.txt for details.
|
||||
@@ -55,14 +39,12 @@ Required properties:
|
||||
|
||||
- port: A port node with endpoint definitions as defined in
|
||||
Documentation/devicetree/bindings/media/video-interfaces.txt.
|
||||
- rockchip,dual-channel-swap: boolean, optional config for vop dual channel swap
|
||||
|
||||
Example:
|
||||
SoC specific DT entry:
|
||||
vopb: vopb@ff930000 {
|
||||
compatible = "rockchip,rk3288-vop";
|
||||
reg = <0xff930000 0x19c>, <0xff931000 0x1000>;
|
||||
reg-names = "regs", "gamma_lut";
|
||||
reg = <0xff930000 0x19c>;
|
||||
interrupts = <GIC_SPI 15 IRQ_TYPE_LEVEL_HIGH>;
|
||||
clocks = <&cru ACLK_VOP0>, <&cru DCLK_VOP0>, <&cru HCLK_VOP0>;
|
||||
clock-names = "aclk_vop", "dclk_vop", "hclk_vop";
|
||||
|
||||
@@ -27,7 +27,6 @@ Required properties:
|
||||
"atmel,24c256",
|
||||
"atmel,24c512",
|
||||
"atmel,24c1024",
|
||||
"atmel,24c2048",
|
||||
|
||||
If <manufacturer> is not "atmel", then a fallback must be used
|
||||
with the same <model> and "atmel" as manufacturer.
|
||||
|
||||
@@ -8,7 +8,6 @@ Required properties :
|
||||
- reg : Offset and length of the register set for the device
|
||||
- compatible: should be one of the following:
|
||||
- "rockchip,rv1108-i2c": for rv1108
|
||||
- "rockchip,rv1126-i2c": for rv1126
|
||||
- "rockchip,rk3066-i2c": for rk3066
|
||||
- "rockchip,rk3188-i2c": for rk3188
|
||||
- "rockchip,rk3228-i2c": for rk3228
|
||||
|
||||
@@ -7,7 +7,6 @@ Required properties:
|
||||
- "rockchip,rk3328-saradc", "rockchip,rk3399-saradc": for rk3328
|
||||
- "rockchip,rk3399-saradc": for rk3399
|
||||
- "rockchip,rv1108-saradc", "rockchip,rk3399-saradc": for rv1108
|
||||
- "rockchip,rk1808-saradc", "rockchip,rk3399-saradc": for rk1808
|
||||
|
||||
- reg: physical base address of the controller and length of memory mapped
|
||||
region.
|
||||
|
||||
@@ -11,13 +11,11 @@ New driver handles the following
|
||||
|
||||
Required properties:
|
||||
- compatible: Must be "samsung,exynos-adc-v1"
|
||||
for Exynos5250 controllers.
|
||||
for exynos4412/5250 and s5pv210 controllers.
|
||||
Must be "samsung,exynos-adc-v2" for
|
||||
future controllers.
|
||||
Must be "samsung,exynos3250-adc" for
|
||||
controllers compatible with ADC of Exynos3250.
|
||||
Must be "samsung,exynos4212-adc" for
|
||||
controllers compatible with ADC of Exynos4212 and Exynos4412.
|
||||
Must be "samsung,exynos7-adc" for
|
||||
the ADC in Exynos7 and compatibles
|
||||
Must be "samsung,s3c2410-adc" for
|
||||
@@ -30,8 +28,6 @@ Required properties:
|
||||
the ADC in s3c2443 and compatibles
|
||||
Must be "samsung,s3c6410-adc" for
|
||||
the ADC in s3c6410 and compatibles
|
||||
Must be "samsung,s5pv210-adc" for
|
||||
the ADC in s5pv210 and compatibles
|
||||
- reg: List of ADC register address range
|
||||
- The base address and range of ADC register
|
||||
- The base address and range of ADC_PHY register (every
|
||||
|
||||
@@ -21,7 +21,7 @@ controller state. The mux controller state is described in
|
||||
|
||||
Example:
|
||||
mux: mux-controller {
|
||||
compatible = "gpio-mux";
|
||||
compatible = "mux-gpio";
|
||||
#mux-control-cells = <0>;
|
||||
|
||||
mux-gpios = <&pioA 0 GPIO_ACTIVE_HIGH>,
|
||||
|
||||
@@ -24,10 +24,6 @@ Optional properties:
|
||||
- rockchip,disable-mmu-reset : Don't use the mmu reset operation.
|
||||
Some mmu instances may produce unexpected results
|
||||
when the reset operation is used.
|
||||
- rk_iommu,disable_reset_quirk : Same with above, for compatible with previous code
|
||||
|
||||
- rockchip,skip-mmu-read : Some iommu instances are not able to be read, skip
|
||||
reading operation to make iommu work as normal
|
||||
|
||||
Example:
|
||||
|
||||
|
||||
@@ -73,7 +73,7 @@ Example:
|
||||
};
|
||||
};
|
||||
|
||||
port@a {
|
||||
port@10 {
|
||||
reg = <10>;
|
||||
|
||||
adv7482_txa: endpoint {
|
||||
@@ -83,7 +83,7 @@ Example:
|
||||
};
|
||||
};
|
||||
|
||||
port@b {
|
||||
port@11 {
|
||||
reg = <11>;
|
||||
|
||||
adv7482_txb: endpoint {
|
||||
|
||||
@@ -62,10 +62,6 @@ Optional properties:
|
||||
be referred to mmc-pwrseq-simple.txt. But now it's reused as a tunable delay
|
||||
waiting for I/O signalling and card power supply to be stable, regardless of
|
||||
whether pwrseq-simple is used. Default to 10ms if no available.
|
||||
- supports-cqe : The presence of this property indicates that the corresponding
|
||||
MMC host controller supports HW command queue feature.
|
||||
- disable-cqe-dcmd: This property indicates that the MMC controller's command
|
||||
queue engine (CQE) does not support direct commands (DCMDs).
|
||||
|
||||
*NOTE* on CD and WP polarity. To use common for all SD/MMC host controllers line
|
||||
polarity properties, we have to fix the meaning of the "normal" and "inverted"
|
||||
|
||||
@@ -19,9 +19,6 @@ Optional properties:
|
||||
- interrupt-names: must be "mdio_done_error" when there is a share interrupt fed
|
||||
to this hardware block, or must be "mdio_done" for the first interrupt and
|
||||
"mdio_error" for the second when there are separate interrupts
|
||||
- clocks: A reference to the clock supplying the MDIO bus controller
|
||||
- clock-frequency: the MDIO bus clock that must be output by the MDIO bus
|
||||
hardware, if absent, the default hardware values are used
|
||||
|
||||
Child nodes of this MDIO bus controller node are standard Ethernet PHY device
|
||||
nodes as described in Documentation/devicetree/bindings/net/phy.txt
|
||||
|
||||
@@ -17,7 +17,7 @@ Example:
|
||||
reg = <1>;
|
||||
clocks = <&clk32m>;
|
||||
interrupt-parent = <&gpio4>;
|
||||
interrupts = <13 IRQ_TYPE_LEVEL_HIGH>;
|
||||
interrupts = <13 IRQ_TYPE_EDGE_RISING>;
|
||||
vdd-supply = <®5v0>;
|
||||
xceiver-supply = <®5v0>;
|
||||
};
|
||||
|
||||
@@ -4,7 +4,6 @@ Required properties:
|
||||
- compatible: Should be one of the following:
|
||||
- "microchip,mcp2510" for MCP2510.
|
||||
- "microchip,mcp2515" for MCP2515.
|
||||
- "microchip,mcp25625" for MCP25625.
|
||||
- reg: SPI chip select.
|
||||
- clocks: The clock feeding the CAN controller.
|
||||
- interrupts: Should contain IRQ line for the CAN controller.
|
||||
|
||||
@@ -110,13 +110,6 @@ PROPERTIES
|
||||
Usage: required
|
||||
Definition: See soc/fsl/qman.txt and soc/fsl/bman.txt
|
||||
|
||||
- fsl,erratum-a050385
|
||||
Usage: optional
|
||||
Value type: boolean
|
||||
Definition: A boolean property. Indicates the presence of the
|
||||
erratum A050385 which indicates that DMA transactions that are
|
||||
split can result in a FMan lock.
|
||||
|
||||
=============================================================================
|
||||
FMan MURAM Node
|
||||
|
||||
|
||||
@@ -16,7 +16,7 @@ Required properties:
|
||||
|
||||
Optional properties:
|
||||
- interrupts: interrupt line number for the SMI error/done interrupt
|
||||
- clocks: phandle for up to four required clocks for the MDIO instance
|
||||
- clocks: phandle for up to three required clocks for the MDIO instance
|
||||
|
||||
The child nodes of the MDIO driver are the individual PHY devices
|
||||
connected to this MDIO bus. They must have a "reg" property given the
|
||||
|
||||
@@ -25,7 +25,7 @@ Example (for ARM-based BeagleBone with NPC100 NFC controller on I2C2):
|
||||
clock-frequency = <100000>;
|
||||
|
||||
interrupt-parent = <&gpio1>;
|
||||
interrupts = <29 IRQ_TYPE_LEVEL_HIGH>;
|
||||
interrupts = <29 GPIO_ACTIVE_HIGH>;
|
||||
|
||||
enable-gpios = <&gpio0 30 GPIO_ACTIVE_HIGH>;
|
||||
firmware-gpios = <&gpio0 31 GPIO_ACTIVE_HIGH>;
|
||||
|
||||
@@ -25,7 +25,7 @@ Example (for ARM-based BeagleBone with PN544 on I2C2):
|
||||
clock-frequency = <400000>;
|
||||
|
||||
interrupt-parent = <&gpio1>;
|
||||
interrupts = <17 IRQ_TYPE_LEVEL_HIGH>;
|
||||
interrupts = <17 GPIO_ACTIVE_HIGH>;
|
||||
|
||||
enable-gpios = <&gpio3 21 GPIO_ACTIVE_HIGH>;
|
||||
firmware-gpios = <&gpio3 19 GPIO_ACTIVE_HIGH>;
|
||||
|
||||
@@ -5,18 +5,14 @@ The device node has following properties.
|
||||
Required properties:
|
||||
- compatible: should be "rockchip,<name>-gamc"
|
||||
"rockchip,px30-gmac": found on PX30 SoCs
|
||||
"rockchip,rk1808-gmac": found on RK1808 SoCs
|
||||
"rockchip,rk3128-gmac": found on RK312x SoCs
|
||||
"rockchip,rk3228-gmac": found on RK322x SoCs
|
||||
"rockchip,rk3288-gmac": found on RK3288 SoCs
|
||||
"rockchip,rk3308-mac": found on RK3308 SoCs
|
||||
"rockchip,rk3328-gmac": found on RK3328 SoCs
|
||||
"rockchip,rk3366-gmac": found on RK3366 SoCs
|
||||
"rockchip,rk3368-gmac": found on RK3368 SoCs
|
||||
"rockchip,rk3399-gmac": found on RK3399 SoCs
|
||||
"rockchip,rk3568-gmac": found on RK3568 SoCs
|
||||
"rockchip,rv1108-gmac": found on RV1108 SoCs
|
||||
"rockchip,rv1126-gmac": found on RV1126 SoCs
|
||||
- reg: addresses and length of the register sets for the device.
|
||||
- interrupts: Should contain the GMAC interrupts.
|
||||
- interrupt-names: Should contain the interrupt names "macirq".
|
||||
|
||||
@@ -2,13 +2,10 @@
|
||||
|
||||
Required properties:
|
||||
- compatible: Should be one of the following.
|
||||
- "rockchip,rk1808-efuse" - for RK1808 SoCs.
|
||||
- "rockchip,rk3066a-efuse" - for RK3066a SoCs.
|
||||
- "rockchip,rk3128-efuse" - for RK3128 SoCs.
|
||||
- "rockchip,rk3188-efuse" - for RK3188 SoCs.
|
||||
- "rockchip,rk3228-efuse" - for RK3228 SoCs.
|
||||
- "rockchip,rk3288-efuse" - for RK3288 SoCs.
|
||||
- "rockchip,rk3288-secure-efuse" - for RK3288 SoCs.
|
||||
- "rockchip,rk3328-efuse" - for RK3328 SoCs.
|
||||
- "rockchip,rk3368-efuse" - for RK3368 SoCs.
|
||||
- "rockchip,rk3399-efuse" - for RK3399 SoCs.
|
||||
|
||||
@@ -2,15 +2,10 @@ ROCKCHIP USB2.0 PHY WITH INNO IP BLOCK
|
||||
|
||||
Required properties (phy (parent) node):
|
||||
- compatible : should be one of the listed compatibles:
|
||||
* "rockchip,px30-usb2phy"
|
||||
* "rockchip,rk3128-usb2phy"
|
||||
* "rockchip,rk3228-usb2phy"
|
||||
* "rockchip,rk3308-usb2phy"
|
||||
* "rockchip,rk3328-usb2phy"
|
||||
* "rockchip,rk3366-usb2phy"
|
||||
* "rockchip,rk3368-usb2phy"
|
||||
* "rockchip,rk3399-usb2phy"
|
||||
* "rockchip,rk3568-usb2phy"
|
||||
* "rockchip,rv1108-usb2phy"
|
||||
- reg : the address offset of grf for usb-phy configuration.
|
||||
- #clock-cells : should be 0.
|
||||
@@ -28,10 +23,6 @@ Optional properties:
|
||||
register files". When set driver will request its
|
||||
phandle as one companion-grf for some special SoCs
|
||||
(e.g RV1108).
|
||||
- rockchip,u2phy-tuning: when set, tuning u2phy to improve usb2 SI.
|
||||
- vbus-supply: regulator phandle for vbus power source.
|
||||
- wakeup-source: enable USB irq wakeup when suspend.
|
||||
only work when suspend wakeup-config is not work.
|
||||
|
||||
Required nodes : a sub-node is required for each port the phy provides.
|
||||
The sub-node name is used to identify host or otg port,
|
||||
@@ -54,16 +45,6 @@ Required properties (port (child) node):
|
||||
Optional properties:
|
||||
- phy-supply : phandle to a regulator that provides power to VBUS.
|
||||
See ./phy-bindings.txt for details.
|
||||
- rockchip,bypass-uart: when set, indicates that support usb to
|
||||
bypass uart feature.
|
||||
note: this property can only be added in debug stage.
|
||||
- rockchip,utmi-avalid : when set, the usb2 phy will use avalid
|
||||
status bit to get vbus status. If not, it will use
|
||||
bvalid status bit to get vbus status by default.
|
||||
- rockchip,vbus-always-on: when set, indicates that the otg vbus
|
||||
is always powered on.
|
||||
- rockchip,low-power-mode: when set, the port will enter low power
|
||||
state when suspend.
|
||||
|
||||
Example:
|
||||
|
||||
@@ -95,48 +76,3 @@ grf: syscon@ff770000 {
|
||||
};
|
||||
};
|
||||
};
|
||||
|
||||
Required properties (usb2phy grf node):
|
||||
- compatible : should be one of the listed compatibles:
|
||||
"rockchip,px30-usb2phy-grf", "syscon", "simple-mfd";
|
||||
"rockchip,rk1808-usb2phy-grf", "syscon", "simple-mfd";
|
||||
"rockchip,rk3308-usb2phy-grf", "syscon", "simple-mfd";
|
||||
"rockchip,rk3328-usb2phy-grf", "syscon", "simple-mfd";
|
||||
"rockchip,rk3568-usb2phy-grf", "syscon";
|
||||
- reg : the address offset of grf for usb-phy configuration.
|
||||
- #address-cells : should be 1.
|
||||
- #size-cells : should be 1.
|
||||
|
||||
Required nodes : a sub-node is required for the phy provides.
|
||||
The sub-node name is used to identify each phy,
|
||||
and shall be the following entries:
|
||||
|
||||
Example:
|
||||
|
||||
usb2phy_grf: syscon@ff450000 {
|
||||
compatible = "rockchip,rk3328-usb2phy-grf", "syscon",
|
||||
"simple-mfd";
|
||||
reg = <0x0 0xff450000 0x0 0x10000>;
|
||||
#address-cells = <1>;
|
||||
#size-cells = <1>;
|
||||
|
||||
u2phy: usb2-phy@100 {
|
||||
compatible = "rockchip,rk3328-usb2phy";
|
||||
reg = <0x100 0x10>;
|
||||
clocks = <&xin24m>;
|
||||
clock-names = "phyclk";
|
||||
#clock-cells = <0>;
|
||||
assigned-clocks = <&cru USB480M>;
|
||||
assigned-clock-parents = <&u2phy>;
|
||||
clock-output-names = "usb480m_phy";
|
||||
status = "disabled";
|
||||
|
||||
u2phy_host: host-port {
|
||||
#phy-cells = <0>;
|
||||
interrupts = <GIC_SPI 62 IRQ_TYPE_LEVEL_HIGH>;
|
||||
interrupt-names = "linestate";
|
||||
status = "disabled";
|
||||
};
|
||||
};
|
||||
};
|
||||
|
||||
|
||||
@@ -17,11 +17,6 @@ Required properties:
|
||||
|
||||
Optional properties:
|
||||
- extcon : extcon specifier for the Power Delivery
|
||||
- rockchip,phy-config : A list of voltage swing(mV) and pre-emphasis
|
||||
(dB) pairs. They are 3 blocks of 4 entries and
|
||||
correspond to s0p0 ~ s0p3, s1p0 ~ s1p3,
|
||||
s2p0 ~ s2p3, s3p0 ~ s2p3 swing and pre-emphasis
|
||||
values.
|
||||
|
||||
Required nodes : a sub-node is required for each port the phy provides.
|
||||
The sub-node name is used to identify dp or usb3 port,
|
||||
@@ -55,21 +50,6 @@ Example:
|
||||
<&cru SRST_P_UPHY0_TCPHY>;
|
||||
reset-names = "uphy", "uphy-pipe", "uphy-tcphy";
|
||||
|
||||
rockchip,phy-config = <0x2a 0x00>,
|
||||
<0x1f 0x15>,
|
||||
<0x14 0x22>,
|
||||
<0x02 0x2b>,
|
||||
|
||||
<0x21 0x00>,
|
||||
<0x12 0x15>,
|
||||
<0x02 0x22>,
|
||||
<0 0>,
|
||||
|
||||
<0x15 0x00>,
|
||||
<0x00 0x15>,
|
||||
<0 0>,
|
||||
<0 0>;
|
||||
|
||||
tcphy0_dp: dp-port {
|
||||
#phy-cells = <0>;
|
||||
};
|
||||
@@ -94,21 +74,6 @@ Example:
|
||||
<&cru SRST_P_UPHY1_TCPHY>;
|
||||
reset-names = "uphy", "uphy-pipe", "uphy-tcphy";
|
||||
|
||||
rockchip,phy-config = <0x2a 0x00>,
|
||||
<0x1f 0x15>,
|
||||
<0x14 0x22>,
|
||||
<0x02 0x2b>,
|
||||
|
||||
<0x21 0x00>,
|
||||
<0x12 0x15>,
|
||||
<0x02 0x22>,
|
||||
<0 0>,
|
||||
|
||||
<0x15 0x00>,
|
||||
<0x00 0x15>,
|
||||
<0 0>,
|
||||
<0 0>;
|
||||
|
||||
tcphy1_dp: dp-port {
|
||||
#phy-cells = <0>;
|
||||
};
|
||||
|
||||
@@ -4,17 +4,12 @@ Rockchip specific extensions to the Analogix Display Port PHY
|
||||
Required properties:
|
||||
- compatible : should be one of the following supported values:
|
||||
- "rockchip.rk3288-dp-phy"
|
||||
- "rockchip.rk3368-dp-phy"
|
||||
- clocks: from common clock binding: handle to dp clock.
|
||||
of memory mapped region.
|
||||
- clock-names: from common clock binding:
|
||||
Required elements: "24m"
|
||||
- #phy-cells : from the generic PHY bindings, must be 0;
|
||||
|
||||
Optional properties:
|
||||
- resets : phandle to the reset of eDP 24m clock domain.
|
||||
- reset-names : should be "edp_24m".
|
||||
|
||||
Example:
|
||||
|
||||
grf: syscon@ff770000 {
|
||||
|
||||
@@ -22,7 +22,6 @@ Required properties for iomux controller:
|
||||
- compatible: should be
|
||||
"rockchip,px30-pinctrl": for Rockchip PX30
|
||||
"rockchip,rv1108-pinctrl": for Rockchip RV1108
|
||||
"rockchip,rv1126-pinctrl": for Rockchip RV1126
|
||||
"rockchip,rk2928-pinctrl": for Rockchip RK2928
|
||||
"rockchip,rk3066a-pinctrl": for Rockchip RK3066a
|
||||
"rockchip,rk3066b-pinctrl": for Rockchip RK3066b
|
||||
@@ -30,7 +29,6 @@ Required properties for iomux controller:
|
||||
"rockchip,rk3188-pinctrl": for Rockchip RK3188
|
||||
"rockchip,rk3228-pinctrl": for Rockchip RK3228
|
||||
"rockchip,rk3288-pinctrl": for Rockchip RK3288
|
||||
"rockchip,rk3308-pinctrl": for Rockchip RK3308
|
||||
"rockchip,rk3328-pinctrl": for Rockchip RK3328
|
||||
"rockchip,rk3368-pinctrl": for Rockchip RK3368
|
||||
"rockchip,rk3399-pinctrl": for Rockchip RK3399
|
||||
|
||||
@@ -36,7 +36,6 @@ Required properties:
|
||||
- "rockchip,rk3188-io-voltage-domain" for rk3188
|
||||
- "rockchip,rk3228-io-voltage-domain" for rk3228
|
||||
- "rockchip,rk3288-io-voltage-domain" for rk3288
|
||||
- "rockchip,rk3308-io-voltage-domain" for rk3308
|
||||
- "rockchip,rk3328-io-voltage-domain" for rk3328
|
||||
- "rockchip,rk3368-io-voltage-domain" for rk3368
|
||||
- "rockchip,rk3368-pmu-io-voltage-domain" for rk3368 pmu-domains
|
||||
@@ -44,7 +43,6 @@ Required properties:
|
||||
- "rockchip,rk3399-pmu-io-voltage-domain" for rk3399 pmu-domains
|
||||
- "rockchip,rv1108-io-voltage-domain" for rv1108
|
||||
- "rockchip,rv1108-pmu-io-voltage-domain" for rv1108 pmu-domains
|
||||
- "rockchip,rv1126-pmu-io-voltage-domain" for rv1126 pmu-domains
|
||||
|
||||
Deprecated properties:
|
||||
- rockchip,grf: phandle to the syscon managing the "general register files"
|
||||
@@ -119,17 +117,6 @@ Possible supplies for rk3399:
|
||||
Possible supplies for rk3399 pmu-domains:
|
||||
- pmu1830-supply:The supply connected to PMUIO2_VDD.
|
||||
|
||||
Possible supplies for rv1126 pmu-domains:
|
||||
- vccio1-supply:The supply connected to VCCIO1_VDD.
|
||||
- vccio2-supply:The supply connected to VCCIO2_VDD.
|
||||
- vccio3-supply:The supply connected to VCCIO3_VDD.
|
||||
- vccio4-supply:The supply connected to VCCIO4_VDD.
|
||||
- vccio5-supply:The supply connected to VCCIO5_VDD.
|
||||
- vccio6-supply:The supply connected to VCCIO6_VDD.
|
||||
- vccio7-supply:The supply connected to VCCIO7_VDD.
|
||||
- pmuio1-supply:The supply connected to PMUIO1_VDD.
|
||||
- pmuio2-supply:The supply connected to PMUIO2_VDD.
|
||||
|
||||
Example:
|
||||
|
||||
io-domains {
|
||||
|
||||
@@ -17,9 +17,6 @@ Required properties:
|
||||
- #pwm-cells: must be 2 (rk2928) or 3 (rk3288). See pwm.txt in this directory
|
||||
for a description of the cell format.
|
||||
|
||||
Optional properties:
|
||||
- center-aligned: support pwm output center aligned mode.
|
||||
|
||||
Example:
|
||||
|
||||
pwm0: pwm@20030000 {
|
||||
@@ -27,5 +24,4 @@ Example:
|
||||
reg = <0x20030000 0x10>;
|
||||
clocks = <&cru PCLK_PWM01>;
|
||||
#pwm-cells = <2>;
|
||||
center-aligned;
|
||||
};
|
||||
|
||||
@@ -1,8 +1,7 @@
|
||||
Binding for Fairchild FAN53555 regulators
|
||||
|
||||
Required properties:
|
||||
- compatible: one of "fcs,fan53555", "silergy,syr827", "silergy,syr828",
|
||||
"rockchip,rk8603", "rockchip,rk8604"
|
||||
- compatible: one of "fcs,fan53555", "silergy,syr827", "silergy,syr828"
|
||||
- reg: I2C address
|
||||
|
||||
Optional properties:
|
||||
|
||||
@@ -1,32 +0,0 @@
|
||||
Regulator Proxy Consumer Bindings
|
||||
|
||||
Regulator proxy consumers provide a means to use a default regulator state
|
||||
during bootup only which is removed at the end of boot. This feature can be
|
||||
used in situations where a shared regulator can be scaled between several
|
||||
possible voltages and hardware requires that it be at a high level at the
|
||||
beginning of boot before the consumer device responsible for requesting the
|
||||
high level has probed.
|
||||
|
||||
Optional properties:
|
||||
proxy-supply: phandle of the regulator's own device node.
|
||||
This property is required if any of the three
|
||||
properties below are specified.
|
||||
qcom,proxy-consumer-enable: Boolean indicating that the regulator must be
|
||||
kept enabled during boot.
|
||||
qcom,proxy-consumer-voltage: List of two integers corresponding the minimum
|
||||
and maximum voltage allowed during boot in
|
||||
microvolts.
|
||||
qcom,proxy-consumer-current: Minimum current in microamps required during
|
||||
boot.
|
||||
|
||||
Example:
|
||||
|
||||
foo_vreg: regulator@0 {
|
||||
regulator-name = "foo";
|
||||
regulator-min-microvolt = <1000000>;
|
||||
regulator-max-microvolt = <2000000>;
|
||||
proxy-supply = <&foo_vreg>;
|
||||
qcom,proxy-consumer-voltage = <1500000 2000000>;
|
||||
qcom,proxy-consumer-current = <25000>;
|
||||
qcom,proxy-consumer-enable;
|
||||
};
|
||||
@@ -63,9 +63,6 @@ reusable (optional) - empty property
|
||||
able to reclaim it back. Typically that means that the operating
|
||||
system can use that region to store volatile or cached data that
|
||||
can be otherwise regenerated or migrated elsewhere.
|
||||
inactive (optional) - empty property
|
||||
- Indicates the operating system must not active the region. It's only
|
||||
do active or inactive for region reusable.
|
||||
|
||||
Linux implementation note:
|
||||
- If a "linux,cma-default" property is present, then Linux will use the
|
||||
|
||||
@@ -1,27 +0,0 @@
|
||||
OMAP ROM RNG driver binding
|
||||
|
||||
Secure SoCs may provide RNG via secure ROM calls like Nokia N900 does. The
|
||||
implementation can depend on the SoC secure ROM used.
|
||||
|
||||
- compatible:
|
||||
Usage: required
|
||||
Value type: <string>
|
||||
Definition: must be "nokia,n900-rom-rng"
|
||||
|
||||
- clocks:
|
||||
Usage: required
|
||||
Value type: <prop-encoded-array>
|
||||
Definition: reference to the the RNG interface clock
|
||||
|
||||
- clock-names:
|
||||
Usage: required
|
||||
Value type: <stringlist>
|
||||
Definition: must be "ick"
|
||||
|
||||
Example:
|
||||
|
||||
rom_rng: rng {
|
||||
compatible = "nokia,n900-rom-rng";
|
||||
clocks = <&rng_ick>;
|
||||
clock-names = "ick";
|
||||
};
|
||||
@@ -27,4 +27,4 @@ and valid to enable charging:
|
||||
|
||||
- "abracon,tc-diode": should be "standard" (0.6V) or "schottky" (0.3V)
|
||||
- "abracon,tc-resistor": should be <0>, <3>, <6> or <11>. 0 disables the output
|
||||
resistor, the other values are in kOhm.
|
||||
resistor, the other values are in ohm.
|
||||
|
||||
@@ -51,7 +51,6 @@ Optional properties:
|
||||
- tx-threshold: Specify the TX FIFO low water indication for parts with
|
||||
programmable TX FIFO thresholds.
|
||||
- resets : phandle + reset specifier pairs
|
||||
- wakeup-source: wake up the system when UART receives data.
|
||||
|
||||
Note:
|
||||
* fsl,ns16550:
|
||||
|
||||
@@ -13,7 +13,6 @@ On RK3328 SoCs, the GRF adds a section for USB2PHYGRF,
|
||||
Required Properties:
|
||||
|
||||
- compatible: GRF should be one of the following:
|
||||
- "rockchip,rk1808-grf", "syscon": for rk1808
|
||||
- "rockchip,rk3036-grf", "syscon": for rk3036
|
||||
- "rockchip,rk3066-grf", "syscon": for rk3066
|
||||
- "rockchip,rk3188-grf", "syscon": for rk3188
|
||||
@@ -23,12 +22,7 @@ Required Properties:
|
||||
- "rockchip,rk3368-grf", "syscon": for rk3368
|
||||
- "rockchip,rk3399-grf", "syscon": for rk3399
|
||||
- "rockchip,rv1108-grf", "syscon": for rv1108
|
||||
- "rockchip,rv1126-grf", "syscon": for rv1126
|
||||
- compatible: COREGRF should be one of the following
|
||||
- "rockchip,rk1808-coregrf", "syscon": for rk1808
|
||||
- compatible: PMUGRF should be one of the following:
|
||||
- "rockchip,rv1108-pmugrf", "syscon": for rv1108
|
||||
- "rockchip,rv1126-pmugrf", "syscon": for rv1126
|
||||
- "rockchip,rk3368-pmugrf", "syscon": for rk3368
|
||||
- "rockchip,rk3399-pmugrf", "syscon": for rk3399
|
||||
- compatible: SGRF should be one of the following
|
||||
|
||||
@@ -6,8 +6,6 @@ powered up/down by software based on different application scenes to save power.
|
||||
Required properties for power domain controller:
|
||||
- compatible: Should be one of the following.
|
||||
"rockchip,px30-power-controller" - for PX30 SoCs.
|
||||
"rockchip,rv1126-power-controller" - for RV1126 SoCs
|
||||
"rockchip,rk1808-power-controller" - for RK1808 SoCs
|
||||
"rockchip,rk3036-power-controller" - for RK3036 SoCs.
|
||||
"rockchip,rk3128-power-controller" - for RK3128 SoCs.
|
||||
"rockchip,rk3228-power-controller" - for RK3228 SoCs.
|
||||
@@ -16,7 +14,6 @@ Required properties for power domain controller:
|
||||
"rockchip,rk3366-power-controller" - for RK3366 SoCs.
|
||||
"rockchip,rk3368-power-controller" - for RK3368 SoCs.
|
||||
"rockchip,rk3399-power-controller" - for RK3399 SoCs.
|
||||
"rockchip,rk3568-power-controller" - for RK3568 SoCs.
|
||||
- #power-domain-cells: Number of cells in a power-domain specifier.
|
||||
Should be 1 for multiple PM domains.
|
||||
- #address-cells: Should be 1.
|
||||
@@ -25,8 +22,6 @@ Required properties for power domain controller:
|
||||
Required properties for power domain sub nodes:
|
||||
- reg: index of the power domain, should use macros in:
|
||||
"include/dt-bindings/power/px30-power.h" - for PX30 type power domain.
|
||||
"include/dt-bindings/power/rv1126-power.h" - for RV1126 type power domain.
|
||||
"include/dt-bindings/power/rk1808-power.h" - for RK1808 type power domain.
|
||||
"include/dt-bindings/power/rk3036-power.h" - for RK3036 type power domain.
|
||||
"include/dt-bindings/power/rk3128-power.h" - for RK3128 type power domain.
|
||||
"include/dt-bindings/power/rk3228-power.h" - for RK3228 type power domain.
|
||||
@@ -35,7 +30,6 @@ Required properties for power domain sub nodes:
|
||||
"include/dt-bindings/power/rk3366-power.h" - for RK3366 type power domain.
|
||||
"include/dt-bindings/power/rk3368-power.h" - for RK3368 type power domain.
|
||||
"include/dt-bindings/power/rk3399-power.h" - for RK3399 type power domain.
|
||||
"include/dt-bindings/power/rk3568-power.h" - for RK3568 type power domain.
|
||||
- clocks (optional): phandles to clocks which need to be enabled while power domain
|
||||
switches state.
|
||||
- pm_qos (optional): phandles to qos blocks which need to be saved and restored
|
||||
@@ -108,8 +102,6 @@ containing a phandle to the power device node and an index specifying which
|
||||
power domain to use.
|
||||
The index should use macros in:
|
||||
"include/dt-bindings/power/px30-power.h" - for px30 type power domain.
|
||||
"include/dt-bindings/power/rv1126-power.h" - for RV1126 type power domain.
|
||||
"include/dt-bindings/power/rk1808-power.h" - for rk1808 type power domain.
|
||||
"include/dt-bindings/power/rk3036-power.h" - for rk3036 type power domain.
|
||||
"include/dt-bindings/power/rk3128-power.h" - for rk3128 type power domain.
|
||||
"include/dt-bindings/power/rk3128-power.h" - for rk3228 type power domain.
|
||||
@@ -118,7 +110,6 @@ The index should use macros in:
|
||||
"include/dt-bindings/power/rk3366-power.h" - for rk3366 type power domain.
|
||||
"include/dt-bindings/power/rk3368-power.h" - for rk3368 type power domain.
|
||||
"include/dt-bindings/power/rk3399-power.h" - for rk3399 type power domain.
|
||||
"include/dt-bindings/power/rk3568-power.h" - for rk3568 type power domain.
|
||||
|
||||
Example of the node using power domain:
|
||||
|
||||
|
||||
@@ -3,11 +3,6 @@
|
||||
Required properties:
|
||||
|
||||
- compatible: "rockchip,pdm"
|
||||
- "rockchip,px30-pdm"
|
||||
- "rockchip,rk1808-pdm"
|
||||
- "rockchip,rk3308-pdm"
|
||||
- "rockchip,rk3568-pdm"
|
||||
- "rockchip,rv1126-pdm"
|
||||
- reg: physical base address of the controller and length of memory mapped
|
||||
region.
|
||||
- dmas: DMA specifiers for rx dma. See the DMA client binding,
|
||||
@@ -17,38 +12,11 @@ Required properties:
|
||||
- clock-names: should contain following:
|
||||
- "pdm_hclk": clock for PDM BUS
|
||||
- "pdm_clk" : clock for PDM controller
|
||||
- resets: a list of phandle + reset-specifer paris, one for each entry in reset-names.
|
||||
- reset-names: reset names, should include "pdm-m".
|
||||
- rockchip,no-dmaengine: This is a boolean property. If present, driver will do not
|
||||
register pcm dmaengine, only just register dai. if the dai is part of multi-dais,
|
||||
the property should be present. Please refer to rockchip,multidais.txt about
|
||||
multi-dais usage.
|
||||
- pinctrl-names: Must contain a "default" entry.
|
||||
- pinctrl-N: One property must exist for each entry in
|
||||
pinctrl-names. See ../pinctrl/pinctrl-bindings.txt
|
||||
for details of the property values.
|
||||
|
||||
Optional properties:
|
||||
- rockchip,path-map: This is a variable length array, that shows the mapping
|
||||
of SDIx to PATHx. By default, they are one-to-one mapping as follows:
|
||||
|
||||
path0 <-- sdi0
|
||||
path1 <-- sdi1
|
||||
path2 <-- sdi2
|
||||
path3 <-- sdi3
|
||||
|
||||
e.g. "rockchip,path-map = <3 2 1 0>" means the mapping as follows:
|
||||
|
||||
path0 <-- sdi3
|
||||
path1 <-- sdi2
|
||||
path2 <-- sdi1
|
||||
path3 <-- sdi0
|
||||
|
||||
- rockchip,mclk-calibrate: This is a boolean value, if present, enable
|
||||
clk calibrate and compenation.
|
||||
|
||||
note: the corresponding property 'pdm_clk_root' should be assigned.
|
||||
|
||||
Example for rk3328 PDM controller:
|
||||
|
||||
pdm: pdm@ff040000 {
|
||||
@@ -71,12 +39,3 @@ pdm: pdm@ff040000 {
|
||||
&pdmm0_sdi2_sleep
|
||||
&pdmm0_sdi3_sleep>;
|
||||
};
|
||||
|
||||
Example for RV1126 PDM controller with mclk-calibrate:
|
||||
|
||||
&pdm {
|
||||
status = "okay";
|
||||
clocks = <&cru MCLK_PDM>, <&cru HCLK_PDM>, <&cru PLL_CPLL>;
|
||||
clock-names = "pdm_clk", "pdm_hclk", "pdm_clk_root";
|
||||
rockchip,mclk-calibrate;
|
||||
};
|
||||
|
||||
@@ -8,18 +8,14 @@ Required properties:
|
||||
- compatible: should be one of the following:
|
||||
- "rockchip,rk3066-i2s": for rk3066
|
||||
- "rockchip,px30-i2s", "rockchip,rk3066-i2s": for px30
|
||||
- "rockchip,rk1808-i2s", "rockchip,rk3066-i2s": for rk1808
|
||||
- "rockchip,rk3036-i2s", "rockchip,rk3066-i2s": for rk3036
|
||||
- "rockchip,rk3128-i2s", "rockchip,rk3066-i2s": for rk3128
|
||||
- "rockchip,rk3188-i2s", "rockchip,rk3066-i2s": for rk3188
|
||||
- "rockchip,rk3228-i2s", "rockchip,rk3066-i2s": for rk3228
|
||||
- "rockchip,rk3288-i2s", "rockchip,rk3066-i2s": for rk3288
|
||||
- "rockchip,rk3308-i2s", "rockchip,rk3066-i2s": for rk3308
|
||||
- "rockchip,rk3328-i2s", "rockchip,rk3066-i2s": for rk3328
|
||||
- "rockchip,rk3366-i2s", "rockchip,rk3066-i2s": for rk3366
|
||||
- "rockchip,rk3368-i2s", "rockchip,rk3066-i2s": for rk3368
|
||||
- "rockchip,rk3399-i2s", "rockchip,rk3066-i2s": for rk3399
|
||||
- "rockchip,rv1126-i2s", "rockchip,rk3066-i2s": for rv1126
|
||||
- reg: physical base address of the controller and length of memory mapped
|
||||
region.
|
||||
- interrupts: should contain the I2S interrupt.
|
||||
@@ -30,34 +26,14 @@ Required properties:
|
||||
- clock-names: should contain the following:
|
||||
- "i2s_hclk": clock for I2S BUS
|
||||
- "i2s_clk" : clock for I2S controller
|
||||
- resets: a list of phandle + reset-specifer paris, one for each entry in reset-names.
|
||||
- reset-names: reset names, should include "reset-m", "reset-h", there may be only
|
||||
"reset-m" on some platforms.
|
||||
- rockchip,playback-channels: max playback channels, if not set, 8 channels default.
|
||||
- rockchip,capture-channels: max capture channels, if not set, 2 channels default.
|
||||
- rockchip,bclk-fs: configure the i2s bclk fs.
|
||||
- rockchip,clk-trcm: tx and rx lrck/bclk common use.
|
||||
- 0: both tx_lrck/bclk and rx_lrck/bclk are used
|
||||
- 1: only tx_lrck/bclk is used
|
||||
- 2: only rx_lrck/bclk is used
|
||||
- rockchip,no-dmaengine: This is a boolean property. If present, driver will do not
|
||||
register pcm dmaengine, only just register dai. if the dai is part of multi-dais,
|
||||
the property should be present. Please refer to rockchip,multidais.txt about
|
||||
multi-dais usage.
|
||||
- rockchip,playback-only: Specify that the controller only has playback capability.
|
||||
- rockchip,capture-only: Specify that the controller only has capture capability.
|
||||
|
||||
Required properties for controller which support multi channels
|
||||
playback/capture:
|
||||
|
||||
- rockchip,grf: the phandle of the syscon node for GRF register.
|
||||
|
||||
Optional properties:
|
||||
- rockchip,mclk-calibrate: This is a boolean value, if present, enable
|
||||
clk calibrate and compenation.
|
||||
|
||||
note: the corresponding property 'i2s_clk_root' should be assigned.
|
||||
|
||||
Example for rk3288 I2S controller:
|
||||
|
||||
i2s@ff890000 {
|
||||
@@ -70,14 +46,4 @@ i2s@ff890000 {
|
||||
clocks = <&cru HCLK_I2S0>, <&cru SCLK_I2S0>;
|
||||
rockchip,playback-channels = <8>;
|
||||
rockchip,capture-channels = <2>;
|
||||
rockchip,bclk-fs = <64>;
|
||||
};
|
||||
|
||||
Example for RV1126 I2S controller with mclk-calibrate:
|
||||
|
||||
&i2s1_2ch {
|
||||
status = "okay";
|
||||
clocks = <&cru MCLK_I2S1>, <&cru HCLK_I2S1>, <&cru PLL_CPLL>;
|
||||
clock-names = "i2s_clk", "i2s_hclk", "i2s_clk_root";
|
||||
rockchip,mclk-calibrate;
|
||||
};
|
||||
|
||||
@@ -15,7 +15,6 @@ Required properties:
|
||||
- "rockchip,rk3366-spdif"
|
||||
- "rockchip,rk3368-spdif"
|
||||
- "rockchip,rk3399-spdif"
|
||||
- "rockchip,rk3568-spdif"
|
||||
- reg: physical base address of the controller and length of memory mapped
|
||||
region.
|
||||
- interrupts: should contain the SPDIF interrupt.
|
||||
|
||||
@@ -33,8 +33,6 @@ Optional properties:
|
||||
2: Scale current by 1.0
|
||||
3: Scale current by 1.5
|
||||
|
||||
- spk-con-gpio: speaker amplifier enable/disable control
|
||||
|
||||
Pins on the device (for linking into audio routes) for RT5651:
|
||||
|
||||
* DMIC L1
|
||||
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user