rk: revert to v4.19

Signed-off-by: Tao Huang <huangtao@rock-chips.com>
Change-Id: I502dce68b639df4ebf5a1688e0dc2e5c5763ebc2
This commit is contained in:
Tao Huang
2021-03-17 18:05:39 +08:00
parent 8e964a33a1
commit 251c226c35
10942 changed files with 116283 additions and 538595 deletions

4
.gitignore vendored
View File

@@ -48,10 +48,6 @@ modules.builtin
#
# Top-level generic files
#
/boot.img
/kernel.img
/resource.img
/zboot.img
/tags
/TAGS
/linux

View File

@@ -6,8 +6,6 @@ Description:
This file allows user to read/write the raw NVMEM contents.
Permissions for write to this file depends on the nvmem
provider configuration.
Note: This file is only present if CONFIG_NVMEM_SYSFS
is enabled
ex:
hexdump /sys/bus/nvmem/devices/qfprom0/nvmem

View File

@@ -12,10 +12,6 @@ Date: Dec 2014
KernelVersion: 4.0
Description: Control descriptors
All attributes read only:
bInterfaceNumber - USB interface number for this
streaming interface
What: /config/usb-gadget/gadget/functions/uvc.name/control/class
Date: Dec 2014
KernelVersion: 4.0
@@ -113,10 +109,6 @@ Date: Dec 2014
KernelVersion: 4.0
Description: Streaming descriptors
All attributes read only:
bInterfaceNumber - USB interface number for this
streaming interface
What: /config/usb-gadget/gadget/functions/uvc.name/streaming/class
Date: Dec 2014
KernelVersion: 4.0
@@ -168,10 +160,6 @@ Description: Specific MJPEG format descriptors
All attributes read only,
except bmaControls and bDefaultFrameIndex:
bFormatIndex - unique id for this format descriptor;
only defined after parent header is
linked into the streaming class;
read-only
bmaControls - this format's data for bmaControls in
the streaming header
bmInterfaceFlags - specifies interlace information,
@@ -189,10 +177,6 @@ Date: Dec 2014
KernelVersion: 4.0
Description: Specific MJPEG frame descriptors
bFrameIndex - unique id for this framedescriptor;
only defined after parent format is
linked into the streaming header;
read-only
dwFrameInterval - indicates how frame interval can be
programmed; a number of values
separated by newline can be specified
@@ -220,10 +204,6 @@ Date: Dec 2014
KernelVersion: 4.0
Description: Specific uncompressed format descriptors
bFormatIndex - unique id for this format descriptor;
only defined after parent header is
linked into the streaming class;
read-only
bmaControls - this format's data for bmaControls in
the streaming header
bmInterfaceFlags - specifies interlace information,
@@ -244,10 +224,6 @@ Date: Dec 2014
KernelVersion: 4.0
Description: Specific uncompressed frame descriptors
bFrameIndex - unique id for this framedescriptor;
only defined after parent format is
linked into the streaming header;
read-only
dwFrameInterval - indicates how frame interval can be
programmed; a number of values
separated by newline can be specified

View File

@@ -1,16 +0,0 @@
What: /proc/uid_concurrent_active_time
Date: December 2018
Contact: Connor O'Brien <connoro@google.com>
Description:
The /proc/uid_concurrent_active_time file displays aggregated cputime
numbers for each uid, broken down by the total number of cores that were
active while the uid's task was running.
What: /proc/uid_concurrent_policy_time
Date: December 2018
Contact: Connor O'Brien <connoro@google.com>
Description:
The /proc/uid_concurrent_policy_time file displays aggregated cputime
numbers for each uid, broken down based on the cpufreq policy
of the core used by the uid's task and the number of cores associated
with that policy that were active while the uid's task was running.

View File

@@ -98,42 +98,3 @@ Description:
The backing_dev file is read-write and set up backing
device for zram to write incompressible pages.
For using, user should enable CONFIG_ZRAM_WRITEBACK.
What: /sys/block/zram<id>/idle
Date: November 2018
Contact: Minchan Kim <minchan@kernel.org>
Description:
idle file is write-only and mark zram slot as idle.
If system has mounted debugfs, user can see which slots
are idle via /sys/kernel/debug/zram/zram<id>/block_state
What: /sys/block/zram<id>/writeback
Date: November 2018
Contact: Minchan Kim <minchan@kernel.org>
Description:
The writeback file is write-only and trigger idle and/or
huge page writeback to backing device.
What: /sys/block/zram<id>/bd_stat
Date: November 2018
Contact: Minchan Kim <minchan@kernel.org>
Description:
The bd_stat file is read-only and represents backing device's
statistics (bd_count, bd_reads, bd_writes) in a format
similar to block layer statistics file format.
What: /sys/block/zram<id>/writeback_limit_enable
Date: November 2018
Contact: Minchan Kim <minchan@kernel.org>
Description:
The writeback_limit_enable file is read-write and specifies
eanbe of writeback_limit feature. "1" means eable the feature.
No limit "0" is the initial state.
What: /sys/block/zram<id>/writeback_limit
Date: November 2018
Contact: Minchan Kim <minchan@kernel.org>
Description:
The writeback_limit file is read-write and specifies the maximum
amount of writeback ZRAM can do. The limit could be changed
in run time.

View File

@@ -199,7 +199,7 @@ Description:
What: /sys/bus/iio/devices/iio:deviceX/in_positionrelative_x_raw
What: /sys/bus/iio/devices/iio:deviceX/in_positionrelative_y_raw
KernelVersion: 4.19
KernelVersion: 4.18
Contact: linux-iio@vger.kernel.org
Description:
Relative position in direction x or y on a pad (may be
@@ -1559,8 +1559,7 @@ What: /sys/bus/iio/devices/iio:deviceX/in_concentrationX_voc_raw
KernelVersion: 4.3
Contact: linux-iio@vger.kernel.org
Description:
Raw (unscaled no offset etc.) reading of a substance. Units
after application of scale and offset are percents.
Raw (unscaled no offset etc.) percentage reading of a substance.
What: /sys/bus/iio/devices/iio:deviceX/in_resistance_raw
What: /sys/bus/iio/devices/iio:deviceX/in_resistanceX_raw

View File

@@ -4,7 +4,7 @@ KernelVersion: 3.10
Contact: Samuel Ortiz <sameo@linux.intel.com>
linux-mei@linux.intel.com
Description: Stores the same MODALIAS value emitted by uevent
Format: mei:<mei device name>:<device uuid>:<protocol version>
Format: mei:<mei device name>:<device uuid>:
What: /sys/bus/mei/devices/.../name
Date: May 2015

View File

@@ -7,13 +7,6 @@ Description:
The name of devfreq object denoted as ... is same as the
name of device using devfreq.
What: /sys/class/devfreq/.../name
Date: November 2019
Contact: Chanwoo Choi <cw00.choi@samsung.com>
Description:
The /sys/class/devfreq/.../name shows the name of device
of the corresponding devfreq object.
What: /sys/class/devfreq/.../governor
Date: September 2011
Contact: MyungJoo Ham <myungjoo.ham@samsung.com>

View File

@@ -29,7 +29,7 @@ Contact: Bjørn Mork <bjorn@mork.no>
Description:
Unsigned integer.
Write a number ranging from 1 to 254 to add a qmap mux
Write a number ranging from 1 to 127 to add a qmap mux
based network device, supported by recent Qualcomm based
modems.
@@ -46,5 +46,5 @@ Contact: Bjørn Mork <bjorn@mork.no>
Description:
Unsigned integer.
Write a number ranging from 1 to 254 to delete a previously
Write a number ranging from 1 to 127 to delete a previously
created qmap mux based network device.

View File

@@ -144,8 +144,7 @@ Description:
Access: Read
Valid values: "Unknown", "Good", "Overheat", "Dead",
"Over voltage", "Unspecified failure", "Cold",
"Watchdog timer expire", "Safety timer expire",
"Over current", "Warm", "Cool", "Hot"
"Watchdog timer expire", "Safety timer expire"
What: /sys/class/power_supply/<supply_name>/precharge_current
Date: June 2017

View File

@@ -1,76 +0,0 @@
What: /sys/class/wakeup/
Date: June 2019
Contact: Tri Vo <trong@android.com>
Description:
The /sys/class/wakeup/ directory contains pointers to all
wakeup sources in the kernel at that moment in time.
What: /sys/class/wakeup/.../name
Date: June 2019
Contact: Tri Vo <trong@android.com>
Description:
This file contains the name of the wakeup source.
What: /sys/class/wakeup/.../active_count
Date: June 2019
Contact: Tri Vo <trong@android.com>
Description:
This file contains the number of times the wakeup source was
activated.
What: /sys/class/wakeup/.../event_count
Date: June 2019
Contact: Tri Vo <trong@android.com>
Description:
This file contains the number of signaled wakeup events
associated with the wakeup source.
What: /sys/class/wakeup/.../wakeup_count
Date: June 2019
Contact: Tri Vo <trong@android.com>
Description:
This file contains the number of times the wakeup source might
abort suspend.
What: /sys/class/wakeup/.../expire_count
Date: June 2019
Contact: Tri Vo <trong@android.com>
Description:
This file contains the number of times the wakeup source's
timeout has expired.
What: /sys/class/wakeup/.../active_time_ms
Date: June 2019
Contact: Tri Vo <trong@android.com>
Description:
This file contains the amount of time the wakeup source has
been continuously active, in milliseconds. If the wakeup
source is not active, this file contains '0'.
What: /sys/class/wakeup/.../total_time_ms
Date: June 2019
Contact: Tri Vo <trong@android.com>
Description:
This file contains the total amount of time this wakeup source
has been active, in milliseconds.
What: /sys/class/wakeup/.../max_time_ms
Date: June 2019
Contact: Tri Vo <trong@android.com>
Description:
This file contains the maximum amount of time this wakeup
source has been continuously active, in milliseconds.
What: /sys/class/wakeup/.../last_change_ms
Date: June 2019
Contact: Tri Vo <trong@android.com>
Description:
This file contains the monotonic clock time when the wakeup
source was touched last time, in milliseconds.
What: /sys/class/wakeup/.../prevent_suspend_time_ms
Date: June 2019
Contact: Tri Vo <trong@android.com>
Description:
The file contains the total amount of time this wakeup source
has been preventing autosleep, in milliseconds.

View File

@@ -477,10 +477,6 @@ What: /sys/devices/system/cpu/vulnerabilities
/sys/devices/system/cpu/vulnerabilities/spectre_v2
/sys/devices/system/cpu/vulnerabilities/spec_store_bypass
/sys/devices/system/cpu/vulnerabilities/l1tf
/sys/devices/system/cpu/vulnerabilities/mds
/sys/devices/system/cpu/vulnerabilities/srbds
/sys/devices/system/cpu/vulnerabilities/tsx_async_abort
/sys/devices/system/cpu/vulnerabilities/itlb_multihit
Date: January 2018
Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
Description: Information about CPU vulnerabilities
@@ -493,7 +489,8 @@ Description: Information about CPU vulnerabilities
"Vulnerable" CPU is affected and no mitigation in effect
"Mitigation: $M" CPU is affected and mitigation $M is in effect
See also: Documentation/admin-guide/hw-vuln/index.rst
Details about the l1tf file can be found in
Documentation/admin-guide/l1tf.rst
What: /sys/devices/system/cpu/smt
/sys/devices/system/cpu/smt/active

View File

@@ -1,349 +1,214 @@
What: /sys/fs/f2fs/<disk>/gc_max_sleep_time
Date: July 2013
Contact: "Namjae Jeon" <namjae.jeon@samsung.com>
Description: Controls the maximum sleep time for gc_thread. Time
is in milliseconds.
Description:
Controls the maximun sleep time for gc_thread. Time
is in milliseconds.
What: /sys/fs/f2fs/<disk>/gc_min_sleep_time
Date: July 2013
Contact: "Namjae Jeon" <namjae.jeon@samsung.com>
Description: Controls the minimum sleep time for gc_thread. Time
is in milliseconds.
Description:
Controls the minimum sleep time for gc_thread. Time
is in milliseconds.
What: /sys/fs/f2fs/<disk>/gc_no_gc_sleep_time
Date: July 2013
Contact: "Namjae Jeon" <namjae.jeon@samsung.com>
Description: Controls the default sleep time for gc_thread. Time
is in milliseconds.
Description:
Controls the default sleep time for gc_thread. Time
is in milliseconds.
What: /sys/fs/f2fs/<disk>/gc_idle
Date: July 2013
Contact: "Namjae Jeon" <namjae.jeon@samsung.com>
Description: Controls the victim selection policy for garbage collection.
Setting gc_idle = 0(default) will disable this option. Setting
gc_idle = 1 will select the Cost Benefit approach & setting
gc_idle = 2 will select the greedy approach.
Description:
Controls the victim selection policy for garbage collection.
What: /sys/fs/f2fs/<disk>/reclaim_segments
Date: October 2013
Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com>
Description: This parameter controls the number of prefree segments to be
reclaimed. If the number of prefree segments is larger than
the number of segments in the proportion to the percentage
over total volume size, f2fs tries to conduct checkpoint to
reclaim the prefree segments to free segments.
By default, 5% over total # of segments.
What: /sys/fs/f2fs/<disk>/main_blkaddr
Date: November 2019
Contact: "Ramon Pantin" <pantin@google.com>
Description:
Shows first block address of MAIN area.
Controls the issue rate of segment discard commands.
What: /sys/fs/f2fs/<disk>/ipu_policy
Date: November 2013
Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com>
Description: Controls the in-place-update policy.
updates in f2fs. User can set:
0x01: F2FS_IPU_FORCE, 0x02: F2FS_IPU_SSR,
0x04: F2FS_IPU_UTIL, 0x08: F2FS_IPU_SSR_UTIL,
0x10: F2FS_IPU_FSYNC, 0x20: F2FS_IPU_ASYNC,
0x40: F2FS_IPU_NOCACHE.
Refer segment.h for details.
Description:
Controls the in-place-update policy.
What: /sys/fs/f2fs/<disk>/min_ipu_util
Date: November 2013
Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com>
Description: Controls the FS utilization condition for the in-place-update
policies. It is used by F2FS_IPU_UTIL and F2FS_IPU_SSR_UTIL policies.
Description:
Controls the FS utilization condition for the in-place-update
policies.
What: /sys/fs/f2fs/<disk>/min_fsync_blocks
Date: September 2014
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description: Controls the dirty page count condition for the in-place-update
policies.
Description:
Controls the dirty page count condition for the in-place-update
policies.
What: /sys/fs/f2fs/<disk>/min_seq_blocks
Date: August 2018
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description: Controls the dirty page count condition for batched sequential
writes in writepages.
Description:
Controls the dirty page count condition for batched sequential
writes in ->writepages.
What: /sys/fs/f2fs/<disk>/min_hot_blocks
Date: March 2017
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description: Controls the dirty page count condition for redefining hot data.
Description:
Controls the dirty page count condition for redefining hot data.
What: /sys/fs/f2fs/<disk>/min_ssr_sections
Date: October 2017
Contact: "Chao Yu" <yuchao0@huawei.com>
Description: Controls the free section threshold to trigger SSR allocation.
If this is large, SSR mode will be enabled early.
Description:
Controls the fee section threshold to trigger SSR allocation.
What: /sys/fs/f2fs/<disk>/max_small_discards
Date: November 2013
Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com>
Description: Controls the issue rate of discard commands that consist of small
blocks less than 2MB. The candidates to be discarded are cached until
checkpoint is triggered, and issued during the checkpoint.
By default, it is disabled with 0.
Description:
Controls the issue rate of small discard commands.
What: /sys/fs/f2fs/<disk>/discard_granularity
Date: July 2017
Contact: "Chao Yu" <yuchao0@huawei.com>
Description: Controls discard granularity of inner discard thread. Inner thread
What: /sys/fs/f2fs/<disk>/discard_granularity
Date: July 2017
Contact: "Chao Yu" <yuchao0@huawei.com>
Description:
Controls discard granularity of inner discard thread, inner thread
will not issue discards with size that is smaller than granularity.
The unit size is one block(4KB), now only support configuring
in range of [1, 512]. Default value is 4(=16KB).
What: /sys/fs/f2fs/<disk>/umount_discard_timeout
Date: January 2019
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description: Set timeout to issue discard commands during umount.
Default: 5 secs
The unit size is one block, now only support configuring in range
of [1, 512].
What: /sys/fs/f2fs/<disk>/max_victim_search
Date: January 2014
Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com>
Description: Controls the number of trials to find a victim segment
when conducting SSR and cleaning operations. The default value
is 4096 which covers 8GB block address range.
What: /sys/fs/f2fs/<disk>/migration_granularity
Date: October 2018
Contact: "Chao Yu" <yuchao0@huawei.com>
Description: Controls migration granularity of garbage collection on large
section, it can let GC move partial segment{s} of one section
in one GC cycle, so that dispersing heavy overhead GC to
multiple lightweight one.
Description:
Controls the number of trials to find a victim segment.
What: /sys/fs/f2fs/<disk>/dir_level
Date: March 2014
Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com>
Description: Controls the directory level for large directory. If a
directory has a number of files, it can reduce the file lookup
latency by increasing this dir_level value. Otherwise, it
needs to decrease this value to reduce the space overhead.
The default value is 0.
Description:
Controls the directory level for large directory.
What: /sys/fs/f2fs/<disk>/ram_thresh
Date: March 2014
Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com>
Description: Controls the memory footprint used by free nids and cached
nat entries. By default, 1 is set, which indicates
10 MB / 1 GB RAM.
Description:
Controls the memory footprint used by f2fs.
What: /sys/fs/f2fs/<disk>/batched_trim_sections
Date: February 2015
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description: Controls the trimming rate in batch mode.
<deprecated>
Description:
Controls the trimming rate in batch mode.
<deprecated>
What: /sys/fs/f2fs/<disk>/cp_interval
Date: October 2015
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description: Controls the checkpoint timing, set to 60 seconds by default.
Description:
Controls the checkpoint timing.
What: /sys/fs/f2fs/<disk>/idle_interval
Date: January 2016
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description: Controls the idle timing of system, if there is no FS operation
during given interval.
Set to 5 seconds by default.
What: /sys/fs/f2fs/<disk>/discard_idle_interval
Date: September 2018
Contact: "Chao Yu" <yuchao0@huawei.com>
Contact: "Sahitya Tummala" <stummala@codeaurora.org>
Description: Controls the idle timing of discard thread given
this time interval.
Default is 5 secs.
What: /sys/fs/f2fs/<disk>/gc_idle_interval
Date: September 2018
Contact: "Chao Yu" <yuchao0@huawei.com>
Contact: "Sahitya Tummala" <stummala@codeaurora.org>
Description: Controls the idle timing for gc path. Set to 5 seconds by default.
Description:
Controls the idle timing.
What: /sys/fs/f2fs/<disk>/iostat_enable
Date: August 2017
Contact: "Chao Yu" <yuchao0@huawei.com>
Description: Controls to enable/disable IO stat.
Description:
Controls to enable/disable IO stat.
What: /sys/fs/f2fs/<disk>/ra_nid_pages
Date: October 2015
Contact: "Chao Yu" <chao2.yu@samsung.com>
Description: Controls the count of nid pages to be readaheaded.
When building free nids, F2FS reads NAT blocks ahead for
speed up. Default is 0.
Description:
Controls the count of nid pages to be readaheaded.
What: /sys/fs/f2fs/<disk>/dirty_nats_ratio
Date: January 2016
Contact: "Chao Yu" <chao2.yu@samsung.com>
Description: Controls dirty nat entries ratio threshold, if current
ratio exceeds configured threshold, checkpoint will
be triggered for flushing dirty nat entries.
Description:
Controls dirty nat entries ratio threshold, if current
ratio exceeds configured threshold, checkpoint will
be triggered for flushing dirty nat entries.
What: /sys/fs/f2fs/<disk>/lifetime_write_kbytes
Date: January 2016
Contact: "Shuoran Liu" <liushuoran@huawei.com>
Description: Shows total written kbytes issued to disk.
Description:
Shows total written kbytes issued to disk.
What: /sys/fs/f2fs/<disk>/features
Date: July 2017
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description: Shows all enabled features in current device.
Description:
Shows all enabled features in current device.
What: /sys/fs/f2fs/<disk>/inject_rate
Date: May 2016
Contact: "Sheng Yong" <shengyong1@huawei.com>
Description: Controls the injection rate of arbitrary faults.
Description:
Controls the injection rate.
What: /sys/fs/f2fs/<disk>/inject_type
Date: May 2016
Contact: "Sheng Yong" <shengyong1@huawei.com>
Description: Controls the injection type of arbitrary faults.
What: /sys/fs/f2fs/<disk>/dirty_segments
Date: October 2017
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description: Shows the number of dirty segments.
Description:
Controls the injection type.
What: /sys/fs/f2fs/<disk>/reserved_blocks
Date: June 2017
Contact: "Chao Yu" <yuchao0@huawei.com>
Description: Controls target reserved blocks in system, the threshold
is soft, it could exceed current available user space.
Description:
Controls target reserved blocks in system, the threshold
is soft, it could exceed current available user space.
What: /sys/fs/f2fs/<disk>/current_reserved_blocks
Date: October 2017
Contact: "Yunlong Song" <yunlong.song@huawei.com>
Contact: "Chao Yu" <yuchao0@huawei.com>
Description: Shows current reserved blocks in system, it may be temporarily
smaller than target_reserved_blocks, but will gradually
increase to target_reserved_blocks when more free blocks are
freed by user later.
Description:
Shows current reserved blocks in system, it may be temporarily
smaller than target_reserved_blocks, but will gradually
increase to target_reserved_blocks when more free blocks are
freed by user later.
What: /sys/fs/f2fs/<disk>/gc_urgent
Date: August 2017
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description: Do background GC agressively when set. When gc_urgent = 1,
background thread starts to do GC by given gc_urgent_sleep_time
interval. It is set to 0 by default.
Description:
Do background GC agressively
What: /sys/fs/f2fs/<disk>/gc_urgent_sleep_time
Date: August 2017
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description: Controls sleep time of GC urgent mode. Set to 500ms by default.
Description:
Controls sleep time of GC urgent mode
What: /sys/fs/f2fs/<disk>/readdir_ra
Date: November 2017
Contact: "Sheng Yong" <shengyong1@huawei.com>
Description: Controls readahead inode block in readdir. Enabled by default.
What: /sys/fs/f2fs/<disk>/gc_pin_file_thresh
Date: January 2018
Contact: Jaegeuk Kim <jaegeuk@kernel.org>
Description: This indicates how many GC can be failed for the pinned
file. If it exceeds this, F2FS doesn't guarantee its pinning
state. 2048 trials is set by default.
Description:
Controls readahead inode block in readdir.
What: /sys/fs/f2fs/<disk>/extension_list
Date: Feburary 2018
Contact: "Chao Yu" <yuchao0@huawei.com>
Description: Used to control configure extension list:
- Query: cat /sys/fs/f2fs/<disk>/extension_list
- Add: echo '[h/c]extension' > /sys/fs/f2fs/<disk>/extension_list
- Del: echo '[h/c]!extension' > /sys/fs/f2fs/<disk>/extension_list
- [h] means add/del hot file extension
- [c] means add/del cold file extension
What: /sys/fs/f2fs/<disk>/unusable
Date April 2019
Contact: "Daniel Rosenberg" <drosen@google.com>
Description: If checkpoint=disable, it displays the number of blocks that
are unusable.
If checkpoint=enable it displays the enumber of blocks that
would be unusable if checkpoint=disable were to be set.
What: /sys/fs/f2fs/<disk>/encoding
Date July 2019
Contact: "Daniel Rosenberg" <drosen@google.com>
Description: Displays name and version of the encoding set for the filesystem.
If no encoding is set, displays (none)
What: /sys/fs/f2fs/<disk>/free_segments
Date: September 2019
Contact: "Hridya Valsaraju" <hridya@google.com>
Description: Number of free segments in disk.
What: /sys/fs/f2fs/<disk>/cp_foreground_calls
Date: September 2019
Contact: "Hridya Valsaraju" <hridya@google.com>
Description: Number of checkpoint operations performed on demand. Available when
CONFIG_F2FS_STAT_FS=y.
What: /sys/fs/f2fs/<disk>/cp_background_calls
Date: September 2019
Contact: "Hridya Valsaraju" <hridya@google.com>
Description: Number of checkpoint operations performed in the background to
free segments. Available when CONFIG_F2FS_STAT_FS=y.
What: /sys/fs/f2fs/<disk>/gc_foreground_calls
Date: September 2019
Contact: "Hridya Valsaraju" <hridya@google.com>
Description: Number of garbage collection operations performed on demand.
Available when CONFIG_F2FS_STAT_FS=y.
What: /sys/fs/f2fs/<disk>/gc_background_calls
Date: September 2019
Contact: "Hridya Valsaraju" <hridya@google.com>
Description: Number of garbage collection operations triggered in background.
Available when CONFIG_F2FS_STAT_FS=y.
What: /sys/fs/f2fs/<disk>/moved_blocks_foreground
Date: September 2019
Contact: "Hridya Valsaraju" <hridya@google.com>
Description: Number of blocks moved by garbage collection in foreground.
Available when CONFIG_F2FS_STAT_FS=y.
What: /sys/fs/f2fs/<disk>/moved_blocks_background
Date: September 2019
Contact: "Hridya Valsaraju" <hridya@google.com>
Description: Number of blocks moved by garbage collection in background.
Available when CONFIG_F2FS_STAT_FS=y.
What: /sys/fs/f2fs/<disk>/avg_vblocks
Date: September 2019
Contact: "Hridya Valsaraju" <hridya@google.com>
Description: Average number of valid blocks.
Available when CONFIG_F2FS_STAT_FS=y.
What: /sys/fs/f2fs/<disk>/mounted_time_sec
Date: February 2020
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description: Show the mounted time in secs of this partition.
What: /sys/fs/f2fs/<disk>/data_io_flag
Date: April 2020
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description: Give a way to attach REQ_META|FUA to data writes
given temperature-based bits. Now the bits indicate:
* REQ_META | REQ_FUA |
* 5 | 4 | 3 | 2 | 1 | 0 |
* Cold | Warm | Hot | Cold | Warm | Hot |
What: /sys/fs/f2fs/<disk>/node_io_flag
Date: June 2020
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description: Give a way to attach REQ_META|FUA to node writes
given temperature-based bits. Now the bits indicate:
* REQ_META | REQ_FUA |
* 5 | 4 | 3 | 2 | 1 | 0 |
* Cold | Warm | Hot | Cold | Warm | Hot |
What: /sys/fs/f2fs/<disk>/iostat_period_ms
Date: April 2020
Contact: "Daeho Jeong" <daehojeong@google.com>
Description: Give a way to change iostat_period time. 3secs by default.
The new iostat trace gives stats gap given the period.
Description:
Used to control configure extension list:
- Query: cat /sys/fs/f2fs/<disk>/extension_list
- Add: echo '[h/c]extension' > /sys/fs/f2fs/<disk>/extension_list
- Del: echo '[h/c]!extension' > /sys/fs/f2fs/<disk>/extension_list
- [h] means add/del hot file extension
- [c] means add/del cold file extension

View File

@@ -1,27 +0,0 @@
What: /sys/kernel/ion
Date: Dec 2019
KernelVersion: 4.14.158
Contact: Suren Baghdasaryan <surenb@google.com>,
Sandeep Patil <sspatil@google.com>
Description:
The /sys/kernel/ion directory contains a snapshot of the
internal state of ION memory heaps and pools.
Users: kernel memory tuning tools
What: /sys/kernel/ion/total_heaps_kb
Date: Dec 2019
KernelVersion: 4.14.158
Contact: Suren Baghdasaryan <surenb@google.com>,
Sandeep Patil <sspatil@google.com>
Description:
The total_heaps_kb file is read-only and specifies how much
memory in Kb is allocated to ION heaps.
What: /sys/kernel/ion/total_pools_kb
Date: Dec 2019
KernelVersion: 4.14.158
Contact: Suren Baghdasaryan <surenb@google.com>,
Sandeep Patil <sspatil@google.com>
Description:
The total_pools_kb file is read-only and specifies how much
memory in Kb is allocated to ION pools.

View File

@@ -1,16 +0,0 @@
What: /sys/kernel/wakeup_reasons/last_resume_reason
Date: February 2014
Contact: Ruchi Kandoi <kandoiruchi@google.com>
Description:
The /sys/kernel/wakeup_reasons/last_resume_reason is
used to report wakeup reasons after system exited suspend.
What: /sys/kernel/wakeup_reasons/last_suspend_time
Date: March 2015
Contact: jinqian <jinqian@google.com>
Description:
The /sys/kernel/wakeup_reasons/last_suspend_time is
used to report time spent in last suspend cycle. It contains
two numbers (in seconds) separated by space. First number is
the time spent in suspend and resume processes. Second number
is the time spent in sleep state.

View File

@@ -300,110 +300,4 @@ Description:
attempt.
Using this sysfs file will override any values that were
set using the kernel command line for disk offset.
What: /sys/power/suspend_stats
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats directory contains suspend related
statistics.
What: /sys/power/suspend_stats/success
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats/success file contains the number
of times entering system sleep state succeeded.
What: /sys/power/suspend_stats/fail
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats/fail file contains the number
of times entering system sleep state failed.
What: /sys/power/suspend_stats/failed_freeze
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats/failed_freeze file contains the
number of times freezing processes failed.
What: /sys/power/suspend_stats/failed_prepare
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats/failed_prepare file contains the
number of times preparing all non-sysdev devices for
a system PM transition failed.
What: /sys/power/suspend_stats/failed_resume
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats/failed_resume file contains the
number of times executing "resume" callbacks of
non-sysdev devices failed.
What: /sys/power/suspend_stats/failed_resume_early
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats/failed_resume_early file contains
the number of times executing "early resume" callbacks
of devices failed.
What: /sys/power/suspend_stats/failed_resume_noirq
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats/failed_resume_noirq file contains
the number of times executing "noirq resume" callbacks
of devices failed.
What: /sys/power/suspend_stats/failed_suspend
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats/failed_suspend file contains
the number of times executing "suspend" callbacks
of all non-sysdev devices failed.
What: /sys/power/suspend_stats/failed_suspend_late
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats/failed_suspend_late file contains
the number of times executing "late suspend" callbacks
of all devices failed.
What: /sys/power/suspend_stats/failed_suspend_noirq
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats/failed_suspend_noirq file contains
the number of times executing "noirq suspend" callbacks
of all devices failed.
What: /sys/power/suspend_stats/last_failed_dev
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats/last_failed_dev file contains
the last device for which a suspend/resume callback failed.
What: /sys/power/suspend_stats/last_failed_errno
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats/last_failed_errno file contains
the errno of the last failed attempt at entering
system sleep state.
What: /sys/power/suspend_stats/last_failed_step
Date: July 2019
Contact: Kalesh Singh <kaleshsingh96@gmail.com>
Description:
The /sys/power/suspend_stats/last_failed_step file contains
the last failed step in the suspend/resume path.
set using the kernel command line for disk offset.

View File

@@ -1,180 +0,0 @@
================================
PSI - Pressure Stall Information
================================
:Date: April, 2018
:Author: Johannes Weiner <hannes@cmpxchg.org>
When CPU, memory or IO devices are contended, workloads experience
latency spikes, throughput losses, and run the risk of OOM kills.
Without an accurate measure of such contention, users are forced to
either play it safe and under-utilize their hardware resources, or
roll the dice and frequently suffer the disruptions resulting from
excessive overcommit.
The psi feature identifies and quantifies the disruptions caused by
such resource crunches and the time impact it has on complex workloads
or even entire systems.
Having an accurate measure of productivity losses caused by resource
scarcity aids users in sizing workloads to hardware--or provisioning
hardware according to workload demand.
As psi aggregates this information in realtime, systems can be managed
dynamically using techniques such as load shedding, migrating jobs to
other systems or data centers, or strategically pausing or killing low
priority or restartable batch jobs.
This allows maximizing hardware utilization without sacrificing
workload health or risking major disruptions such as OOM kills.
Pressure interface
==================
Pressure information for each resource is exported through the
respective file in /proc/pressure/ -- cpu, memory, and io.
The format for CPU is as such:
some avg10=0.00 avg60=0.00 avg300=0.00 total=0
and for memory and IO:
some avg10=0.00 avg60=0.00 avg300=0.00 total=0
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
The "some" line indicates the share of time in which at least some
tasks are stalled on a given resource.
The "full" line indicates the share of time in which all non-idle
tasks are stalled on a given resource simultaneously. In this state
actual CPU cycles are going to waste, and a workload that spends
extended time in this state is considered to be thrashing. This has
severe impact on performance, and it's useful to distinguish this
situation from a state where some tasks are stalled but the CPU is
still doing productive work. As such, time spent in this subset of the
stall state is tracked separately and exported in the "full" averages.
The ratios are tracked as recent trends over ten, sixty, and three
hundred second windows, which gives insight into short term events as
well as medium and long term trends. The total absolute stall time is
tracked and exported as well, to allow detection of latency spikes
which wouldn't necessarily make a dent in the time averages, or to
average trends over custom time frames.
Monitoring for pressure thresholds
==================================
Users can register triggers and use poll() to be woken up when resource
pressure exceeds certain thresholds.
A trigger describes the maximum cumulative stall time over a specific
time window, e.g. 100ms of total stall time within any 500ms window to
generate a wakeup event.
To register a trigger user has to open psi interface file under
/proc/pressure/ representing the resource to be monitored and write the
desired threshold and time window. The open file descriptor should be
used to wait for trigger events using select(), poll() or epoll().
The following format is used:
<some|full> <stall amount in us> <time window in us>
For example writing "some 150000 1000000" into /proc/pressure/memory
would add 150ms threshold for partial memory stall measured within
1sec time window. Writing "full 50000 1000000" into /proc/pressure/io
would add 50ms threshold for full io stall measured within 1sec time window.
Triggers can be set on more than one psi metric and more than one trigger
for the same psi metric can be specified. However for each trigger a separate
file descriptor is required to be able to poll it separately from others,
therefore for each trigger a separate open() syscall should be made even
when opening the same psi interface file.
Monitors activate only when system enters stall state for the monitored
psi metric and deactivates upon exit from the stall state. While system is
in the stall state psi signal growth is monitored at a rate of 10 times per
tracking window.
The kernel accepts window sizes ranging from 500ms to 10s, therefore min
monitoring update interval is 50ms and max is 1s. Min limit is set to
prevent overly frequent polling. Max limit is chosen as a high enough number
after which monitors are most likely not needed and psi averages can be used
instead.
When activated, psi monitor stays active for at least the duration of one
tracking window to avoid repeated activations/deactivations when system is
bouncing in and out of the stall state.
Notifications to the userspace are rate-limited to one per tracking window.
The trigger will de-register when the file descriptor used to define the
trigger is closed.
Userspace monitor usage example
===============================
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>
/*
* Monitor memory partial stall with 1s tracking window size
* and 150ms threshold.
*/
int main() {
const char trig[] = "some 150000 1000000";
struct pollfd fds;
int n;
fds.fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);
if (fds.fd < 0) {
printf("/proc/pressure/memory open error: %s\n",
strerror(errno));
return 1;
}
fds.events = POLLPRI;
if (write(fds.fd, trig, strlen(trig) + 1) < 0) {
printf("/proc/pressure/memory write error: %s\n",
strerror(errno));
return 1;
}
printf("waiting for events...\n");
while (1) {
n = poll(&fds, 1, -1);
if (n < 0) {
printf("poll error: %s\n", strerror(errno));
return 1;
}
if (fds.revents & POLLERR) {
printf("got POLLERR, event source is gone\n");
return 0;
}
if (fds.revents & POLLPRI) {
printf("event triggered!\n");
} else {
printf("unknown event received: 0x%x\n", fds.revents);
return 1;
}
}
return 0;
}
Cgroup2 interface
=================
In a system with a CONFIG_CGROUP=y kernel and the cgroup2 filesystem
mounted, pressure stall information is also tracked for tasks grouped
into cgroups. Each subdirectory in the cgroupfs mountpoint contains
cpu.pressure, memory.pressure, and io.pressure files; the format is
the same as the /proc/pressure/ files.
Per-cgroup psi monitors can be specified and used the same way as
system-wide ones.

View File

@@ -694,12 +694,6 @@ Conventions
informational files on the root cgroup which end up showing global
information available elsewhere shouldn't exist.
- The default time unit is microseconds. If a different unit is ever
used, an explicit unit suffix must be present.
- A parts-per quantity should use a percentage decimal with at least
two digit fractional part - e.g. 13.40.
- If a controller implements weight based resource distribution, its
interface file should be named "weight" and have the range [1,
10000] with 100 as the default. The values are chosen to allow
@@ -913,13 +907,6 @@ controller implements weight and absolute bandwidth limit models for
normal scheduling policy and absolute bandwidth allocation model for
realtime scheduling policy.
In all the above models, cycles distribution is defined only on a temporal
base and it does not account for the frequency at which tasks are executed.
The (optional) utilization clamping support allows to hint the schedutil
cpufreq governor about the minimum desired frequency which should always be
provided by a CPU, as well as the maximum desired frequency, which should not
be exceeded by a CPU.
WARNING: cgroup2 doesn't yet support control of realtime processes and
the cpu controller can only be enabled when all RT processes are in
the root cgroup. Be aware that system management software may already
@@ -979,39 +966,6 @@ All time durations are in microseconds.
$PERIOD duration. "max" for $MAX indicates no limit. If only
one number is written, $MAX is updated.
cpu.pressure
A read-only nested-key file which exists on non-root cgroups.
Shows pressure stall information for CPU. See
Documentation/accounting/psi.txt for details.
cpu.uclamp.min
A read-write single value file which exists on non-root cgroups.
The default is "0", i.e. no utilization boosting.
The requested minimum utilization (protection) as a percentage
rational number, e.g. 12.34 for 12.34%.
This interface allows reading and setting minimum utilization clamp
values similar to the sched_setattr(2). This minimum utilization
value is used to clamp the task specific minimum utilization clamp.
The requested minimum utilization (protection) is always capped by
the current value for the maximum utilization (limit), i.e.
`cpu.uclamp.max`.
cpu.uclamp.max
A read-write single value file which exists on non-root cgroups.
The default is "max". i.e. no utilization capping
The requested maximum utilization (limit) as a percentage rational
number, e.g. 98.76 for 98.76%.
This interface allows reading and setting maximum utilization clamp
values similar to the sched_setattr(2). This maximum utilization
value is used to clamp the task specific maximum utilization clamp.
Memory
------
@@ -1317,12 +1271,6 @@ PAGE_SIZE multiple when read back.
higher than the limit for an extended period of time. This
reduces the impact on the workload and memory management.
memory.pressure
A read-only nested-key file which exists on non-root cgroups.
Shows pressure stall information for memory. See
Documentation/accounting/psi.txt for details.
Usage Guidelines
~~~~~~~~~~~~~~~~
@@ -1460,12 +1408,6 @@ IO Interface Files
8:16 rbps=2097152 wbps=max riops=max wiops=max
io.pressure
A read-only nested-key file which exists on non-root cgroups.
Shows pressure stall information for IO. See
Documentation/accounting/psi.txt for details.
Writeback
~~~~~~~~~

View File

@@ -54,9 +54,6 @@ If you make a mistake with the syntax, the write will fail thus::
<debugfs>/dynamic_debug/control
-bash: echo: write error: Invalid argument
Note, for systems without 'debugfs' enabled, the control file can be
found in ``/proc/dynamic_debug/control``.
Viewing Dynamic Debug Behaviour
===============================

View File

@@ -1,17 +0,0 @@
========================
Hardware vulnerabilities
========================
This section describes CPU vulnerabilities and provides an overview of the
possible mitigations along with guidance for selecting mitigations if they
are configurable at compile, boot or run time.
.. toctree::
:maxdepth: 1
spectre
l1tf
mds
tsx_async_abort
multihit.rst
special-register-buffer-data-sampling.rst

View File

@@ -1,311 +0,0 @@
MDS - Microarchitectural Data Sampling
======================================
Microarchitectural Data Sampling is a hardware vulnerability which allows
unprivileged speculative access to data which is available in various CPU
internal buffers.
Affected processors
-------------------
This vulnerability affects a wide range of Intel processors. The
vulnerability is not present on:
- Processors from AMD, Centaur and other non Intel vendors
- Older processor models, where the CPU family is < 6
- Some Atoms (Bonnell, Saltwell, Goldmont, GoldmontPlus)
- Intel processors which have the ARCH_CAP_MDS_NO bit set in the
IA32_ARCH_CAPABILITIES MSR.
Whether a processor is affected or not can be read out from the MDS
vulnerability file in sysfs. See :ref:`mds_sys_info`.
Not all processors are affected by all variants of MDS, but the mitigation
is identical for all of them so the kernel treats them as a single
vulnerability.
Related CVEs
------------
The following CVE entries are related to the MDS vulnerability:
============== ===== ===================================================
CVE-2018-12126 MSBDS Microarchitectural Store Buffer Data Sampling
CVE-2018-12130 MFBDS Microarchitectural Fill Buffer Data Sampling
CVE-2018-12127 MLPDS Microarchitectural Load Port Data Sampling
CVE-2019-11091 MDSUM Microarchitectural Data Sampling Uncacheable Memory
============== ===== ===================================================
Problem
-------
When performing store, load, L1 refill operations, processors write data
into temporary microarchitectural structures (buffers). The data in the
buffer can be forwarded to load operations as an optimization.
Under certain conditions, usually a fault/assist caused by a load
operation, data unrelated to the load memory address can be speculatively
forwarded from the buffers. Because the load operation causes a fault or
assist and its result will be discarded, the forwarded data will not cause
incorrect program execution or state changes. But a malicious operation
may be able to forward this speculative data to a disclosure gadget which
allows in turn to infer the value via a cache side channel attack.
Because the buffers are potentially shared between Hyper-Threads cross
Hyper-Thread attacks are possible.
Deeper technical information is available in the MDS specific x86
architecture section: :ref:`Documentation/x86/mds.rst <mds>`.
Attack scenarios
----------------
Attacks against the MDS vulnerabilities can be mounted from malicious non
priviledged user space applications running on hosts or guest. Malicious
guest OSes can obviously mount attacks as well.
Contrary to other speculation based vulnerabilities the MDS vulnerability
does not allow the attacker to control the memory target address. As a
consequence the attacks are purely sampling based, but as demonstrated with
the TLBleed attack samples can be postprocessed successfully.
Web-Browsers
^^^^^^^^^^^^
It's unclear whether attacks through Web-Browsers are possible at
all. The exploitation through Java-Script is considered very unlikely,
but other widely used web technologies like Webassembly could possibly be
abused.
.. _mds_sys_info:
MDS system information
-----------------------
The Linux kernel provides a sysfs interface to enumerate the current MDS
status of the system: whether the system is vulnerable, and which
mitigations are active. The relevant sysfs file is:
/sys/devices/system/cpu/vulnerabilities/mds
The possible values in this file are:
.. list-table::
* - 'Not affected'
- The processor is not vulnerable
* - 'Vulnerable'
- The processor is vulnerable, but no mitigation enabled
* - 'Vulnerable: Clear CPU buffers attempted, no microcode'
- The processor is vulnerable but microcode is not updated.
The mitigation is enabled on a best effort basis. See :ref:`vmwerv`
* - 'Mitigation: Clear CPU buffers'
- The processor is vulnerable and the CPU buffer clearing mitigation is
enabled.
If the processor is vulnerable then the following information is appended
to the above information:
======================== ============================================
'SMT vulnerable' SMT is enabled
'SMT mitigated' SMT is enabled and mitigated
'SMT disabled' SMT is disabled
'SMT Host state unknown' Kernel runs in a VM, Host SMT state unknown
======================== ============================================
.. _vmwerv:
Best effort mitigation mode
^^^^^^^^^^^^^^^^^^^^^^^^^^^
If the processor is vulnerable, but the availability of the microcode based
mitigation mechanism is not advertised via CPUID the kernel selects a best
effort mitigation mode. This mode invokes the mitigation instructions
without a guarantee that they clear the CPU buffers.
This is done to address virtualization scenarios where the host has the
microcode update applied, but the hypervisor is not yet updated to expose
the CPUID to the guest. If the host has updated microcode the protection
takes effect otherwise a few cpu cycles are wasted pointlessly.
The state in the mds sysfs file reflects this situation accordingly.
Mitigation mechanism
-------------------------
The kernel detects the affected CPUs and the presence of the microcode
which is required.
If a CPU is affected and the microcode is available, then the kernel
enables the mitigation by default. The mitigation can be controlled at boot
time via a kernel command line option. See
:ref:`mds_mitigation_control_command_line`.
.. _cpu_buffer_clear:
CPU buffer clearing
^^^^^^^^^^^^^^^^^^^
The mitigation for MDS clears the affected CPU buffers on return to user
space and when entering a guest.
If SMT is enabled it also clears the buffers on idle entry when the CPU
is only affected by MSBDS and not any other MDS variant, because the
other variants cannot be protected against cross Hyper-Thread attacks.
For CPUs which are only affected by MSBDS the user space, guest and idle
transition mitigations are sufficient and SMT is not affected.
.. _virt_mechanism:
Virtualization mitigation
^^^^^^^^^^^^^^^^^^^^^^^^^
The protection for host to guest transition depends on the L1TF
vulnerability of the CPU:
- CPU is affected by L1TF:
If the L1D flush mitigation is enabled and up to date microcode is
available, the L1D flush mitigation is automatically protecting the
guest transition.
If the L1D flush mitigation is disabled then the MDS mitigation is
invoked explicit when the host MDS mitigation is enabled.
For details on L1TF and virtualization see:
:ref:`Documentation/admin-guide/hw-vuln//l1tf.rst <mitigation_control_kvm>`.
- CPU is not affected by L1TF:
CPU buffers are flushed before entering the guest when the host MDS
mitigation is enabled.
The resulting MDS protection matrix for the host to guest transition:
============ ===== ============= ============ =================
L1TF MDS VMX-L1FLUSH Host MDS MDS-State
Don't care No Don't care N/A Not affected
Yes Yes Disabled Off Vulnerable
Yes Yes Disabled Full Mitigated
Yes Yes Enabled Don't care Mitigated
No Yes N/A Off Vulnerable
No Yes N/A Full Mitigated
============ ===== ============= ============ =================
This only covers the host to guest transition, i.e. prevents leakage from
host to guest, but does not protect the guest internally. Guests need to
have their own protections.
.. _xeon_phi:
XEON PHI specific considerations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The XEON PHI processor family is affected by MSBDS which can be exploited
cross Hyper-Threads when entering idle states. Some XEON PHI variants allow
to use MWAIT in user space (Ring 3) which opens an potential attack vector
for malicious user space. The exposure can be disabled on the kernel
command line with the 'ring3mwait=disable' command line option.
XEON PHI is not affected by the other MDS variants and MSBDS is mitigated
before the CPU enters a idle state. As XEON PHI is not affected by L1TF
either disabling SMT is not required for full protection.
.. _mds_smt_control:
SMT control
^^^^^^^^^^^
All MDS variants except MSBDS can be attacked cross Hyper-Threads. That
means on CPUs which are affected by MFBDS or MLPDS it is necessary to
disable SMT for full protection. These are most of the affected CPUs; the
exception is XEON PHI, see :ref:`xeon_phi`.
Disabling SMT can have a significant performance impact, but the impact
depends on the type of workloads.
See the relevant chapter in the L1TF mitigation documentation for details:
:ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <smt_control>`.
.. _mds_mitigation_control_command_line:
Mitigation control on the kernel command line
---------------------------------------------
The kernel command line allows to control the MDS mitigations at boot
time with the option "mds=". The valid arguments for this option are:
============ =============================================================
full If the CPU is vulnerable, enable all available mitigations
for the MDS vulnerability, CPU buffer clearing on exit to
userspace and when entering a VM. Idle transitions are
protected as well if SMT is enabled.
It does not automatically disable SMT.
full,nosmt The same as mds=full, with SMT disabled on vulnerable
CPUs. This is the complete mitigation.
off Disables MDS mitigations completely.
============ =============================================================
Not specifying this option is equivalent to "mds=full". For processors
that are affected by both TAA (TSX Asynchronous Abort) and MDS,
specifying just "mds=off" without an accompanying "tsx_async_abort=off"
will have no effect as the same mitigation is used for both
vulnerabilities.
Mitigation selection guide
--------------------------
1. Trusted userspace
^^^^^^^^^^^^^^^^^^^^
If all userspace applications are from a trusted source and do not
execute untrusted code which is supplied externally, then the mitigation
can be disabled.
2. Virtualization with trusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The same considerations as above versus trusted user space apply.
3. Virtualization with untrusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The protection depends on the state of the L1TF mitigations.
See :ref:`virt_mechanism`.
If the MDS mitigation is enabled and SMT is disabled, guest to host and
guest to guest attacks are prevented.
.. _mds_default_mitigations:
Default mitigations
-------------------
The kernel default mitigations for vulnerable processors are:
- Enable CPU buffer clearing
The kernel does not by default enforce the disabling of SMT, which leaves
SMT systems vulnerable when running untrusted code. The same rationale as
for L1TF applies.
See :ref:`Documentation/admin-guide/hw-vuln//l1tf.rst <default_mitigations>`.

View File

@@ -1,163 +0,0 @@
iTLB multihit
=============
iTLB multihit is an erratum where some processors may incur a machine check
error, possibly resulting in an unrecoverable CPU lockup, when an
instruction fetch hits multiple entries in the instruction TLB. This can
occur when the page size is changed along with either the physical address
or cache type. A malicious guest running on a virtualized system can
exploit this erratum to perform a denial of service attack.
Affected processors
-------------------
Variations of this erratum are present on most Intel Core and Xeon processor
models. The erratum is not present on:
- non-Intel processors
- Some Atoms (Airmont, Bonnell, Goldmont, GoldmontPlus, Saltwell, Silvermont)
- Intel processors that have the PSCHANGE_MC_NO bit set in the
IA32_ARCH_CAPABILITIES MSR.
Related CVEs
------------
The following CVE entry is related to this issue:
============== =================================================
CVE-2018-12207 Machine Check Error Avoidance on Page Size Change
============== =================================================
Problem
-------
Privileged software, including OS and virtual machine managers (VMM), are in
charge of memory management. A key component in memory management is the control
of the page tables. Modern processors use virtual memory, a technique that creates
the illusion of a very large memory for processors. This virtual space is split
into pages of a given size. Page tables translate virtual addresses to physical
addresses.
To reduce latency when performing a virtual to physical address translation,
processors include a structure, called TLB, that caches recent translations.
There are separate TLBs for instruction (iTLB) and data (dTLB).
Under this errata, instructions are fetched from a linear address translated
using a 4 KB translation cached in the iTLB. Privileged software modifies the
paging structure so that the same linear address using large page size (2 MB, 4
MB, 1 GB) with a different physical address or memory type. After the page
structure modification but before the software invalidates any iTLB entries for
the linear address, a code fetch that happens on the same linear address may
cause a machine-check error which can result in a system hang or shutdown.
Attack scenarios
----------------
Attacks against the iTLB multihit erratum can be mounted from malicious
guests in a virtualized system.
iTLB multihit system information
--------------------------------
The Linux kernel provides a sysfs interface to enumerate the current iTLB
multihit status of the system:whether the system is vulnerable and which
mitigations are active. The relevant sysfs file is:
/sys/devices/system/cpu/vulnerabilities/itlb_multihit
The possible values in this file are:
.. list-table::
* - Not affected
- The processor is not vulnerable.
* - KVM: Mitigation: Split huge pages
- Software changes mitigate this issue.
* - KVM: Vulnerable
- The processor is vulnerable, but no mitigation enabled
Enumeration of the erratum
--------------------------------
A new bit has been allocated in the IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO) msr
and will be set on CPU's which are mitigated against this issue.
======================================= =========== ===============================
IA32_ARCH_CAPABILITIES MSR Not present Possibly vulnerable,check model
IA32_ARCH_CAPABILITIES[PSCHANGE_MC_NO] '0' Likely vulnerable,check model
IA32_ARCH_CAPABILITIES[PSCHANGE_MC_NO] '1' Not vulnerable
======================================= =========== ===============================
Mitigation mechanism
-------------------------
This erratum can be mitigated by restricting the use of large page sizes to
non-executable pages. This forces all iTLB entries to be 4K, and removes
the possibility of multiple hits.
In order to mitigate the vulnerability, KVM initially marks all huge pages
as non-executable. If the guest attempts to execute in one of those pages,
the page is broken down into 4K pages, which are then marked executable.
If EPT is disabled or not available on the host, KVM is in control of TLB
flushes and the problematic situation cannot happen. However, the shadow
EPT paging mechanism used by nested virtualization is vulnerable, because
the nested guest can trigger multiple iTLB hits by modifying its own
(non-nested) page tables. For simplicity, KVM will make large pages
non-executable in all shadow paging modes.
Mitigation control on the kernel command line and KVM - module parameter
------------------------------------------------------------------------
The KVM hypervisor mitigation mechanism for marking huge pages as
non-executable can be controlled with a module parameter "nx_huge_pages=".
The kernel command line allows to control the iTLB multihit mitigations at
boot time with the option "kvm.nx_huge_pages=".
The valid arguments for these options are:
========== ================================================================
force Mitigation is enabled. In this case, the mitigation implements
non-executable huge pages in Linux kernel KVM module. All huge
pages in the EPT are marked as non-executable.
If a guest attempts to execute in one of those pages, the page is
broken down into 4K pages, which are then marked executable.
off Mitigation is disabled.
auto Enable mitigation only if the platform is affected and the kernel
was not booted with the "mitigations=off" command line parameter.
This is the default option.
========== ================================================================
Mitigation selection guide
--------------------------
1. No virtualization in use
^^^^^^^^^^^^^^^^^^^^^^^^^^^
The system is protected by the kernel unconditionally and no further
action is required.
2. Virtualization with trusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If the guest comes from a trusted source, you may assume that the guest will
not attempt to maliciously exploit these errata and no further action is
required.
3. Virtualization with untrusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If the guest comes from an untrusted source, the guest host kernel will need
to apply iTLB multihit mitigation via the kernel command line or kvm
module parameter.

View File

@@ -1,149 +0,0 @@
.. SPDX-License-Identifier: GPL-2.0
SRBDS - Special Register Buffer Data Sampling
=============================================
SRBDS is a hardware vulnerability that allows MDS :doc:`mds` techniques to
infer values returned from special register accesses. Special register
accesses are accesses to off core registers. According to Intel's evaluation,
the special register reads that have a security expectation of privacy are
RDRAND, RDSEED and SGX EGETKEY.
When RDRAND, RDSEED and EGETKEY instructions are used, the data is moved
to the core through the special register mechanism that is susceptible
to MDS attacks.
Affected processors
--------------------
Core models (desktop, mobile, Xeon-E3) that implement RDRAND and/or RDSEED may
be affected.
A processor is affected by SRBDS if its Family_Model and stepping is
in the following list, with the exception of the listed processors
exporting MDS_NO while Intel TSX is available yet not enabled. The
latter class of processors are only affected when Intel TSX is enabled
by software using TSX_CTRL_MSR otherwise they are not affected.
============= ============ ========
common name Family_Model Stepping
============= ============ ========
IvyBridge 06_3AH All
Haswell 06_3CH All
Haswell_L 06_45H All
Haswell_G 06_46H All
Broadwell_G 06_47H All
Broadwell 06_3DH All
Skylake_L 06_4EH All
Skylake 06_5EH All
Kabylake_L 06_8EH <= 0xC
Kabylake 06_9EH <= 0xD
============= ============ ========
Related CVEs
------------
The following CVE entry is related to this SRBDS issue:
============== ===== =====================================
CVE-2020-0543 SRBDS Special Register Buffer Data Sampling
============== ===== =====================================
Attack scenarios
----------------
An unprivileged user can extract values returned from RDRAND and RDSEED
executed on another core or sibling thread using MDS techniques.
Mitigation mechanism
-------------------
Intel will release microcode updates that modify the RDRAND, RDSEED, and
EGETKEY instructions to overwrite secret special register data in the shared
staging buffer before the secret data can be accessed by another logical
processor.
During execution of the RDRAND, RDSEED, or EGETKEY instructions, off-core
accesses from other logical processors will be delayed until the special
register read is complete and the secret data in the shared staging buffer is
overwritten.
This has three effects on performance:
#. RDRAND, RDSEED, or EGETKEY instructions have higher latency.
#. Executing RDRAND at the same time on multiple logical processors will be
serialized, resulting in an overall reduction in the maximum RDRAND
bandwidth.
#. Executing RDRAND, RDSEED or EGETKEY will delay memory accesses from other
logical processors that miss their core caches, with an impact similar to
legacy locked cache-line-split accesses.
The microcode updates provide an opt-out mechanism (RNGDS_MITG_DIS) to disable
the mitigation for RDRAND and RDSEED instructions executed outside of Intel
Software Guard Extensions (Intel SGX) enclaves. On logical processors that
disable the mitigation using this opt-out mechanism, RDRAND and RDSEED do not
take longer to execute and do not impact performance of sibling logical
processors memory accesses. The opt-out mechanism does not affect Intel SGX
enclaves (including execution of RDRAND or RDSEED inside an enclave, as well
as EGETKEY execution).
IA32_MCU_OPT_CTRL MSR Definition
--------------------------------
Along with the mitigation for this issue, Intel added a new thread-scope
IA32_MCU_OPT_CTRL MSR, (address 0x123). The presence of this MSR and
RNGDS_MITG_DIS (bit 0) is enumerated by CPUID.(EAX=07H,ECX=0).EDX[SRBDS_CTRL =
9]==1. This MSR is introduced through the microcode update.
Setting IA32_MCU_OPT_CTRL[0] (RNGDS_MITG_DIS) to 1 for a logical processor
disables the mitigation for RDRAND and RDSEED executed outside of an Intel SGX
enclave on that logical processor. Opting out of the mitigation for a
particular logical processor does not affect the RDRAND and RDSEED mitigations
for other logical processors.
Note that inside of an Intel SGX enclave, the mitigation is applied regardless
of the value of RNGDS_MITG_DS.
Mitigation control on the kernel command line
---------------------------------------------
The kernel command line allows control over the SRBDS mitigation at boot time
with the option "srbds=". The option for this is:
============= =============================================================
off This option disables SRBDS mitigation for RDRAND and RDSEED on
affected platforms.
============= =============================================================
SRBDS System Information
-----------------------
The Linux kernel provides vulnerability status information through sysfs. For
SRBDS this can be accessed by the following sysfs file:
/sys/devices/system/cpu/vulnerabilities/srbds
The possible values contained in this file are:
============================== =============================================
Not affected Processor not vulnerable
Vulnerable Processor vulnerable and mitigation disabled
Vulnerable: No microcode Processor vulnerable and microcode is missing
mitigation
Mitigation: Microcode Processor is vulnerable and mitigation is in
effect.
Mitigation: TSX disabled Processor is only vulnerable when TSX is
enabled while this system was booted with TSX
disabled.
Unknown: Dependent on
hypervisor status Running on virtual guest processor that is
affected but with no way to know if host
processor is mitigated or vulnerable.
============================== =============================================
SRBDS Default mitigation
------------------------
This new microcode serializes processor access during execution of RDRAND,
RDSEED ensures that the shared buffer is overwritten before it is released for
reuse. Use the "srbds=off" kernel command line to disable the mitigation for
RDRAND and RDSEED.

View File

@@ -1,769 +0,0 @@
.. SPDX-License-Identifier: GPL-2.0
Spectre Side Channels
=====================
Spectre is a class of side channel attacks that exploit branch prediction
and speculative execution on modern CPUs to read memory, possibly
bypassing access controls. Speculative execution side channel exploits
do not modify memory but attempt to infer privileged data in the memory.
This document covers Spectre variant 1 and Spectre variant 2.
Affected processors
-------------------
Speculative execution side channel methods affect a wide range of modern
high performance processors, since most modern high speed processors
use branch prediction and speculative execution.
The following CPUs are vulnerable:
- Intel Core, Atom, Pentium, and Xeon processors
- AMD Phenom, EPYC, and Zen processors
- IBM POWER and zSeries processors
- Higher end ARM processors
- Apple CPUs
- Higher end MIPS CPUs
- Likely most other high performance CPUs. Contact your CPU vendor for details.
Whether a processor is affected or not can be read out from the Spectre
vulnerability files in sysfs. See :ref:`spectre_sys_info`.
Related CVEs
------------
The following CVE entries describe Spectre variants:
============= ======================= ==========================
CVE-2017-5753 Bounds check bypass Spectre variant 1
CVE-2017-5715 Branch target injection Spectre variant 2
CVE-2019-1125 Spectre v1 swapgs Spectre variant 1 (swapgs)
============= ======================= ==========================
Problem
-------
CPUs use speculative operations to improve performance. That may leave
traces of memory accesses or computations in the processor's caches,
buffers, and branch predictors. Malicious software may be able to
influence the speculative execution paths, and then use the side effects
of the speculative execution in the CPUs' caches and buffers to infer
privileged data touched during the speculative execution.
Spectre variant 1 attacks take advantage of speculative execution of
conditional branches, while Spectre variant 2 attacks use speculative
execution of indirect branches to leak privileged memory.
See :ref:`[1] <spec_ref1>` :ref:`[5] <spec_ref5>` :ref:`[7] <spec_ref7>`
:ref:`[10] <spec_ref10>` :ref:`[11] <spec_ref11>`.
Spectre variant 1 (Bounds Check Bypass)
---------------------------------------
The bounds check bypass attack :ref:`[2] <spec_ref2>` takes advantage
of speculative execution that bypasses conditional branch instructions
used for memory access bounds check (e.g. checking if the index of an
array results in memory access within a valid range). This results in
memory accesses to invalid memory (with out-of-bound index) that are
done speculatively before validation checks resolve. Such speculative
memory accesses can leave side effects, creating side channels which
leak information to the attacker.
There are some extensions of Spectre variant 1 attacks for reading data
over the network, see :ref:`[12] <spec_ref12>`. However such attacks
are difficult, low bandwidth, fragile, and are considered low risk.
Note that, despite "Bounds Check Bypass" name, Spectre variant 1 is not
only about user-controlled array bounds checks. It can affect any
conditional checks. The kernel entry code interrupt, exception, and NMI
handlers all have conditional swapgs checks. Those may be problematic
in the context of Spectre v1, as kernel code can speculatively run with
a user GS.
Spectre variant 2 (Branch Target Injection)
-------------------------------------------
The branch target injection attack takes advantage of speculative
execution of indirect branches :ref:`[3] <spec_ref3>`. The indirect
branch predictors inside the processor used to guess the target of
indirect branches can be influenced by an attacker, causing gadget code
to be speculatively executed, thus exposing sensitive data touched by
the victim. The side effects left in the CPU's caches during speculative
execution can be measured to infer data values.
.. _poison_btb:
In Spectre variant 2 attacks, the attacker can steer speculative indirect
branches in the victim to gadget code by poisoning the branch target
buffer of a CPU used for predicting indirect branch addresses. Such
poisoning could be done by indirect branching into existing code,
with the address offset of the indirect branch under the attacker's
control. Since the branch prediction on impacted hardware does not
fully disambiguate branch address and uses the offset for prediction,
this could cause privileged code's indirect branch to jump to a gadget
code with the same offset.
The most useful gadgets take an attacker-controlled input parameter (such
as a register value) so that the memory read can be controlled. Gadgets
without input parameters might be possible, but the attacker would have
very little control over what memory can be read, reducing the risk of
the attack revealing useful data.
One other variant 2 attack vector is for the attacker to poison the
return stack buffer (RSB) :ref:`[13] <spec_ref13>` to cause speculative
subroutine return instruction execution to go to a gadget. An attacker's
imbalanced subroutine call instructions might "poison" entries in the
return stack buffer which are later consumed by a victim's subroutine
return instructions. This attack can be mitigated by flushing the return
stack buffer on context switch, or virtual machine (VM) exit.
On systems with simultaneous multi-threading (SMT), attacks are possible
from the sibling thread, as level 1 cache and branch target buffer
(BTB) may be shared between hardware threads in a CPU core. A malicious
program running on the sibling thread may influence its peer's BTB to
steer its indirect branch speculations to gadget code, and measure the
speculative execution's side effects left in level 1 cache to infer the
victim's data.
Attack scenarios
----------------
The following list of attack scenarios have been anticipated, but may
not cover all possible attack vectors.
1. A user process attacking the kernel
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Spectre variant 1
~~~~~~~~~~~~~~~~~
The attacker passes a parameter to the kernel via a register or
via a known address in memory during a syscall. Such parameter may
be used later by the kernel as an index to an array or to derive
a pointer for a Spectre variant 1 attack. The index or pointer
is invalid, but bound checks are bypassed in the code branch taken
for speculative execution. This could cause privileged memory to be
accessed and leaked.
For kernel code that has been identified where data pointers could
potentially be influenced for Spectre attacks, new "nospec" accessor
macros are used to prevent speculative loading of data.
Spectre variant 1 (swapgs)
~~~~~~~~~~~~~~~~~~~~~~~~~~
An attacker can train the branch predictor to speculatively skip the
swapgs path for an interrupt or exception. If they initialize
the GS register to a user-space value, if the swapgs is speculatively
skipped, subsequent GS-related percpu accesses in the speculation
window will be done with the attacker-controlled GS value. This
could cause privileged memory to be accessed and leaked.
For example:
::
if (coming from user space)
swapgs
mov %gs:<percpu_offset>, %reg
mov (%reg), %reg1
When coming from user space, the CPU can speculatively skip the
swapgs, and then do a speculative percpu load using the user GS
value. So the user can speculatively force a read of any kernel
value. If a gadget exists which uses the percpu value as an address
in another load/store, then the contents of the kernel value may
become visible via an L1 side channel attack.
A similar attack exists when coming from kernel space. The CPU can
speculatively do the swapgs, causing the user GS to get used for the
rest of the speculative window.
Spectre variant 2
~~~~~~~~~~~~~~~~~
A spectre variant 2 attacker can :ref:`poison <poison_btb>` the branch
target buffer (BTB) before issuing syscall to launch an attack.
After entering the kernel, the kernel could use the poisoned branch
target buffer on indirect jump and jump to gadget code in speculative
execution.
If an attacker tries to control the memory addresses leaked during
speculative execution, he would also need to pass a parameter to the
gadget, either through a register or a known address in memory. After
the gadget has executed, he can measure the side effect.
The kernel can protect itself against consuming poisoned branch
target buffer entries by using return trampolines (also known as
"retpoline") :ref:`[3] <spec_ref3>` :ref:`[9] <spec_ref9>` for all
indirect branches. Return trampolines trap speculative execution paths
to prevent jumping to gadget code during speculative execution.
x86 CPUs with Enhanced Indirect Branch Restricted Speculation
(Enhanced IBRS) available in hardware should use the feature to
mitigate Spectre variant 2 instead of retpoline. Enhanced IBRS is
more efficient than retpoline.
There may be gadget code in firmware which could be exploited with
Spectre variant 2 attack by a rogue user process. To mitigate such
attacks on x86, Indirect Branch Restricted Speculation (IBRS) feature
is turned on before the kernel invokes any firmware code.
2. A user process attacking another user process
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
A malicious user process can try to attack another user process,
either via a context switch on the same hardware thread, or from the
sibling hyperthread sharing a physical processor core on simultaneous
multi-threading (SMT) system.
Spectre variant 1 attacks generally require passing parameters
between the processes, which needs a data passing relationship, such
as remote procedure calls (RPC). Those parameters are used in gadget
code to derive invalid data pointers accessing privileged memory in
the attacked process.
Spectre variant 2 attacks can be launched from a rogue process by
:ref:`poisoning <poison_btb>` the branch target buffer. This can
influence the indirect branch targets for a victim process that either
runs later on the same hardware thread, or running concurrently on
a sibling hardware thread sharing the same physical core.
A user process can protect itself against Spectre variant 2 attacks
by using the prctl() syscall to disable indirect branch speculation
for itself. An administrator can also cordon off an unsafe process
from polluting the branch target buffer by disabling the process's
indirect branch speculation. This comes with a performance cost
from not using indirect branch speculation and clearing the branch
target buffer. When SMT is enabled on x86, for a process that has
indirect branch speculation disabled, Single Threaded Indirect Branch
Predictors (STIBP) :ref:`[4] <spec_ref4>` are turned on to prevent the
sibling thread from controlling branch target buffer. In addition,
the Indirect Branch Prediction Barrier (IBPB) is issued to clear the
branch target buffer when context switching to and from such process.
On x86, the return stack buffer is stuffed on context switch.
This prevents the branch target buffer from being used for branch
prediction when the return stack buffer underflows while switching to
a deeper call stack. Any poisoned entries in the return stack buffer
left by the previous process will also be cleared.
User programs should use address space randomization to make attacks
more difficult (Set /proc/sys/kernel/randomize_va_space = 1 or 2).
3. A virtualized guest attacking the host
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The attack mechanism is similar to how user processes attack the
kernel. The kernel is entered via hyper-calls or other virtualization
exit paths.
For Spectre variant 1 attacks, rogue guests can pass parameters
(e.g. in registers) via hyper-calls to derive invalid pointers to
speculate into privileged memory after entering the kernel. For places
where such kernel code has been identified, nospec accessor macros
are used to stop speculative memory access.
For Spectre variant 2 attacks, rogue guests can :ref:`poison
<poison_btb>` the branch target buffer or return stack buffer, causing
the kernel to jump to gadget code in the speculative execution paths.
To mitigate variant 2, the host kernel can use return trampolines
for indirect branches to bypass the poisoned branch target buffer,
and flushing the return stack buffer on VM exit. This prevents rogue
guests from affecting indirect branching in the host kernel.
To protect host processes from rogue guests, host processes can have
indirect branch speculation disabled via prctl(). The branch target
buffer is cleared before context switching to such processes.
4. A virtualized guest attacking other guest
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
A rogue guest may attack another guest to get data accessible by the
other guest.
Spectre variant 1 attacks are possible if parameters can be passed
between guests. This may be done via mechanisms such as shared memory
or message passing. Such parameters could be used to derive data
pointers to privileged data in guest. The privileged data could be
accessed by gadget code in the victim's speculation paths.
Spectre variant 2 attacks can be launched from a rogue guest by
:ref:`poisoning <poison_btb>` the branch target buffer or the return
stack buffer. Such poisoned entries could be used to influence
speculation execution paths in the victim guest.
Linux kernel mitigates attacks to other guests running in the same
CPU hardware thread by flushing the return stack buffer on VM exit,
and clearing the branch target buffer before switching to a new guest.
If SMT is used, Spectre variant 2 attacks from an untrusted guest
in the sibling hyperthread can be mitigated by the administrator,
by turning off the unsafe guest's indirect branch speculation via
prctl(). A guest can also protect itself by turning on microcode
based mitigations (such as IBPB or STIBP on x86) within the guest.
.. _spectre_sys_info:
Spectre system information
--------------------------
The Linux kernel provides a sysfs interface to enumerate the current
mitigation status of the system for Spectre: whether the system is
vulnerable, and which mitigations are active.
The sysfs file showing Spectre variant 1 mitigation status is:
/sys/devices/system/cpu/vulnerabilities/spectre_v1
The possible values in this file are:
.. list-table::
* - 'Not affected'
- The processor is not vulnerable.
* - 'Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers'
- The swapgs protections are disabled; otherwise it has
protection in the kernel on a case by case base with explicit
pointer sanitation and usercopy LFENCE barriers.
* - 'Mitigation: usercopy/swapgs barriers and __user pointer sanitization'
- Protection in the kernel on a case by case base with explicit
pointer sanitation, usercopy LFENCE barriers, and swapgs LFENCE
barriers.
However, the protections are put in place on a case by case basis,
and there is no guarantee that all possible attack vectors for Spectre
variant 1 are covered.
The spectre_v2 kernel file reports if the kernel has been compiled with
retpoline mitigation or if the CPU has hardware mitigation, and if the
CPU has support for additional process-specific mitigation.
This file also reports CPU features enabled by microcode to mitigate
attack between user processes:
1. Indirect Branch Prediction Barrier (IBPB) to add additional
isolation between processes of different users.
2. Single Thread Indirect Branch Predictors (STIBP) to add additional
isolation between CPU threads running on the same core.
These CPU features may impact performance when used and can be enabled
per process on a case-by-case base.
The sysfs file showing Spectre variant 2 mitigation status is:
/sys/devices/system/cpu/vulnerabilities/spectre_v2
The possible values in this file are:
- Kernel status:
==================================== =================================
'Not affected' The processor is not vulnerable
'Vulnerable' Vulnerable, no mitigation
'Mitigation: Full generic retpoline' Software-focused mitigation
'Mitigation: Full AMD retpoline' AMD-specific software mitigation
'Mitigation: Enhanced IBRS' Hardware-focused mitigation
==================================== =================================
- Firmware status: Show if Indirect Branch Restricted Speculation (IBRS) is
used to protect against Spectre variant 2 attacks when calling firmware (x86 only).
========== =============================================================
'IBRS_FW' Protection against user program attacks when calling firmware
========== =============================================================
- Indirect branch prediction barrier (IBPB) status for protection between
processes of different users. This feature can be controlled through
prctl() per process, or through kernel command line options. This is
an x86 only feature. For more details see below.
=================== ========================================================
'IBPB: disabled' IBPB unused
'IBPB: always-on' Use IBPB on all tasks
'IBPB: conditional' Use IBPB on SECCOMP or indirect branch restricted tasks
=================== ========================================================
- Single threaded indirect branch prediction (STIBP) status for protection
between different hyper threads. This feature can be controlled through
prctl per process, or through kernel command line options. This is x86
only feature. For more details see below.
==================== ========================================================
'STIBP: disabled' STIBP unused
'STIBP: forced' Use STIBP on all tasks
'STIBP: conditional' Use STIBP on SECCOMP or indirect branch restricted tasks
==================== ========================================================
- Return stack buffer (RSB) protection status:
============= ===========================================
'RSB filling' Protection of RSB on context switch enabled
============= ===========================================
Full mitigation might require a microcode update from the CPU
vendor. When the necessary microcode is not available, the kernel will
report vulnerability.
Turning on mitigation for Spectre variant 1 and Spectre variant 2
-----------------------------------------------------------------
1. Kernel mitigation
^^^^^^^^^^^^^^^^^^^^
Spectre variant 1
~~~~~~~~~~~~~~~~~
For the Spectre variant 1, vulnerable kernel code (as determined
by code audit or scanning tools) is annotated on a case by case
basis to use nospec accessor macros for bounds clipping :ref:`[2]
<spec_ref2>` to avoid any usable disclosure gadgets. However, it may
not cover all attack vectors for Spectre variant 1.
Copy-from-user code has an LFENCE barrier to prevent the access_ok()
check from being mis-speculated. The barrier is done by the
barrier_nospec() macro.
For the swapgs variant of Spectre variant 1, LFENCE barriers are
added to interrupt, exception and NMI entry where needed. These
barriers are done by the FENCE_SWAPGS_KERNEL_ENTRY and
FENCE_SWAPGS_USER_ENTRY macros.
Spectre variant 2
~~~~~~~~~~~~~~~~~
For Spectre variant 2 mitigation, the compiler turns indirect calls or
jumps in the kernel into equivalent return trampolines (retpolines)
:ref:`[3] <spec_ref3>` :ref:`[9] <spec_ref9>` to go to the target
addresses. Speculative execution paths under retpolines are trapped
in an infinite loop to prevent any speculative execution jumping to
a gadget.
To turn on retpoline mitigation on a vulnerable CPU, the kernel
needs to be compiled with a gcc compiler that supports the
-mindirect-branch=thunk-extern -mindirect-branch-register options.
If the kernel is compiled with a Clang compiler, the compiler needs
to support -mretpoline-external-thunk option. The kernel config
CONFIG_RETPOLINE needs to be turned on, and the CPU needs to run with
the latest updated microcode.
On Intel Skylake-era systems the mitigation covers most, but not all,
cases. See :ref:`[3] <spec_ref3>` for more details.
On CPUs with hardware mitigation for Spectre variant 2 (e.g. Enhanced
IBRS on x86), retpoline is automatically disabled at run time.
The retpoline mitigation is turned on by default on vulnerable
CPUs. It can be forced on or off by the administrator
via the kernel command line and sysfs control files. See
:ref:`spectre_mitigation_control_command_line`.
On x86, indirect branch restricted speculation is turned on by default
before invoking any firmware code to prevent Spectre variant 2 exploits
using the firmware.
Using kernel address space randomization (CONFIG_RANDOMIZE_SLAB=y
and CONFIG_SLAB_FREELIST_RANDOM=y in the kernel configuration) makes
attacks on the kernel generally more difficult.
2. User program mitigation
^^^^^^^^^^^^^^^^^^^^^^^^^^
User programs can mitigate Spectre variant 1 using LFENCE or "bounds
clipping". For more details see :ref:`[2] <spec_ref2>`.
For Spectre variant 2 mitigation, individual user programs
can be compiled with return trampolines for indirect branches.
This protects them from consuming poisoned entries in the branch
target buffer left by malicious software. Alternatively, the
programs can disable their indirect branch speculation via prctl()
(See :ref:`Documentation/userspace-api/spec_ctrl.rst <set_spec_ctrl>`).
On x86, this will turn on STIBP to guard against attacks from the
sibling thread when the user program is running, and use IBPB to
flush the branch target buffer when switching to/from the program.
Restricting indirect branch speculation on a user program will
also prevent the program from launching a variant 2 attack
on x86. All sand-boxed SECCOMP programs have indirect branch
speculation restricted by default. Administrators can change
that behavior via the kernel command line and sysfs control files.
See :ref:`spectre_mitigation_control_command_line`.
Programs that disable their indirect branch speculation will have
more overhead and run slower.
User programs should use address space randomization
(/proc/sys/kernel/randomize_va_space = 1 or 2) to make attacks more
difficult.
3. VM mitigation
^^^^^^^^^^^^^^^^
Within the kernel, Spectre variant 1 attacks from rogue guests are
mitigated on a case by case basis in VM exit paths. Vulnerable code
uses nospec accessor macros for "bounds clipping", to avoid any
usable disclosure gadgets. However, this may not cover all variant
1 attack vectors.
For Spectre variant 2 attacks from rogue guests to the kernel, the
Linux kernel uses retpoline or Enhanced IBRS to prevent consumption of
poisoned entries in branch target buffer left by rogue guests. It also
flushes the return stack buffer on every VM exit to prevent a return
stack buffer underflow so poisoned branch target buffer could be used,
or attacker guests leaving poisoned entries in the return stack buffer.
To mitigate guest-to-guest attacks in the same CPU hardware thread,
the branch target buffer is sanitized by flushing before switching
to a new guest on a CPU.
The above mitigations are turned on by default on vulnerable CPUs.
To mitigate guest-to-guest attacks from sibling thread when SMT is
in use, an untrusted guest running in the sibling thread can have
its indirect branch speculation disabled by administrator via prctl().
The kernel also allows guests to use any microcode based mitigation
they choose to use (such as IBPB or STIBP on x86) to protect themselves.
.. _spectre_mitigation_control_command_line:
Mitigation control on the kernel command line
---------------------------------------------
Spectre variant 2 mitigation can be disabled or force enabled at the
kernel command line.
nospectre_v1
[X86,PPC] Disable mitigations for Spectre Variant 1
(bounds check bypass). With this option data leaks are
possible in the system.
nospectre_v2
[X86] Disable all mitigations for the Spectre variant 2
(indirect branch prediction) vulnerability. System may
allow data leaks with this option, which is equivalent
to spectre_v2=off.
spectre_v2=
[X86] Control mitigation of Spectre variant 2
(indirect branch speculation) vulnerability.
The default operation protects the kernel from
user space attacks.
on
unconditionally enable, implies
spectre_v2_user=on
off
unconditionally disable, implies
spectre_v2_user=off
auto
kernel detects whether your CPU model is
vulnerable
Selecting 'on' will, and 'auto' may, choose a
mitigation method at run time according to the
CPU, the available microcode, the setting of the
CONFIG_RETPOLINE configuration option, and the
compiler with which the kernel was built.
Selecting 'on' will also enable the mitigation
against user space to user space task attacks.
Selecting 'off' will disable both the kernel and
the user space protections.
Specific mitigations can also be selected manually:
retpoline
replace indirect branches
retpoline,generic
google's original retpoline
retpoline,amd
AMD-specific minimal thunk
Not specifying this option is equivalent to
spectre_v2=auto.
For user space mitigation:
spectre_v2_user=
[X86] Control mitigation of Spectre variant 2
(indirect branch speculation) vulnerability between
user space tasks
on
Unconditionally enable mitigations. Is
enforced by spectre_v2=on
off
Unconditionally disable mitigations. Is
enforced by spectre_v2=off
prctl
Indirect branch speculation is enabled,
but mitigation can be enabled via prctl
per thread. The mitigation control state
is inherited on fork.
prctl,ibpb
Like "prctl" above, but only STIBP is
controlled per thread. IBPB is issued
always when switching between different user
space processes.
seccomp
Same as "prctl" above, but all seccomp
threads will enable the mitigation unless
they explicitly opt out.
seccomp,ibpb
Like "seccomp" above, but only STIBP is
controlled per thread. IBPB is issued
always when switching between different
user space processes.
auto
Kernel selects the mitigation depending on
the available CPU features and vulnerability.
Default mitigation:
If CONFIG_SECCOMP=y then "seccomp", otherwise "prctl"
Not specifying this option is equivalent to
spectre_v2_user=auto.
In general the kernel by default selects
reasonable mitigations for the current CPU. To
disable Spectre variant 2 mitigations, boot with
spectre_v2=off. Spectre variant 1 mitigations
cannot be disabled.
Mitigation selection guide
--------------------------
1. Trusted userspace
^^^^^^^^^^^^^^^^^^^^
If all userspace applications are from trusted sources and do not
execute externally supplied untrusted code, then the mitigations can
be disabled.
2. Protect sensitive programs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For security-sensitive programs that have secrets (e.g. crypto
keys), protection against Spectre variant 2 can be put in place by
disabling indirect branch speculation when the program is running
(See :ref:`Documentation/userspace-api/spec_ctrl.rst <set_spec_ctrl>`).
3. Sandbox untrusted programs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Untrusted programs that could be a source of attacks can be cordoned
off by disabling their indirect branch speculation when they are run
(See :ref:`Documentation/userspace-api/spec_ctrl.rst <set_spec_ctrl>`).
This prevents untrusted programs from polluting the branch target
buffer. All programs running in SECCOMP sandboxes have indirect
branch speculation restricted by default. This behavior can be
changed via the kernel command line and sysfs control files. See
:ref:`spectre_mitigation_control_command_line`.
3. High security mode
^^^^^^^^^^^^^^^^^^^^^
All Spectre variant 2 mitigations can be forced on
at boot time for all programs (See the "on" option in
:ref:`spectre_mitigation_control_command_line`). This will add
overhead as indirect branch speculations for all programs will be
restricted.
On x86, branch target buffer will be flushed with IBPB when switching
to a new program. STIBP is left on all the time to protect programs
against variant 2 attacks originating from programs running on
sibling threads.
Alternatively, STIBP can be used only when running programs
whose indirect branch speculation is explicitly disabled,
while IBPB is still used all the time when switching to a new
program to clear the branch target buffer (See "ibpb" option in
:ref:`spectre_mitigation_control_command_line`). This "ibpb" option
has less performance cost than the "on" option, which leaves STIBP
on all the time.
References on Spectre
---------------------
Intel white papers:
.. _spec_ref1:
[1] `Intel analysis of speculative execution side channels <https://newsroom.intel.com/wp-content/uploads/sites/11/2018/01/Intel-Analysis-of-Speculative-Execution-Side-Channels.pdf>`_.
.. _spec_ref2:
[2] `Bounds check bypass <https://software.intel.com/security-software-guidance/software-guidance/bounds-check-bypass>`_.
.. _spec_ref3:
[3] `Deep dive: Retpoline: A branch target injection mitigation <https://software.intel.com/security-software-guidance/insights/deep-dive-retpoline-branch-target-injection-mitigation>`_.
.. _spec_ref4:
[4] `Deep Dive: Single Thread Indirect Branch Predictors <https://software.intel.com/security-software-guidance/insights/deep-dive-single-thread-indirect-branch-predictors>`_.
AMD white papers:
.. _spec_ref5:
[5] `AMD64 technology indirect branch control extension <https://developer.amd.com/wp-content/resources/Architecture_Guidelines_Update_Indirect_Branch_Control.pdf>`_.
.. _spec_ref6:
[6] `Software techniques for managing speculation on AMD processors <https://developer.amd.com/wp-content/resources/90343-B_SoftwareTechniquesforManagingSpeculation_WP_7-18Update_FNL.pdf>`_.
ARM white papers:
.. _spec_ref7:
[7] `Cache speculation side-channels <https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability/download-the-whitepaper>`_.
.. _spec_ref8:
[8] `Cache speculation issues update <https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability/latest-updates/cache-speculation-issues-update>`_.
Google white paper:
.. _spec_ref9:
[9] `Retpoline: a software construct for preventing branch-target-injection <https://support.google.com/faqs/answer/7625886>`_.
MIPS white paper:
.. _spec_ref10:
[10] `MIPS: response on speculative execution and side channel vulnerabilities <https://www.mips.com/blog/mips-response-on-speculative-execution-and-side-channel-vulnerabilities/>`_.
Academic papers:
.. _spec_ref11:
[11] `Spectre Attacks: Exploiting Speculative Execution <https://spectreattack.com/spectre.pdf>`_.
.. _spec_ref12:
[12] `NetSpectre: Read Arbitrary Memory over Network <https://arxiv.org/abs/1807.10535>`_.
.. _spec_ref13:
[13] `Spectre Returns! Speculation Attacks using the Return Stack Buffer <https://www.usenix.org/system/files/conference/woot18/woot18-paper-koruyeh.pdf>`_.

View File

@@ -1,279 +0,0 @@
.. SPDX-License-Identifier: GPL-2.0
TAA - TSX Asynchronous Abort
======================================
TAA is a hardware vulnerability that allows unprivileged speculative access to
data which is available in various CPU internal buffers by using asynchronous
aborts within an Intel TSX transactional region.
Affected processors
-------------------
This vulnerability only affects Intel processors that support Intel
Transactional Synchronization Extensions (TSX) when the TAA_NO bit (bit 8)
is 0 in the IA32_ARCH_CAPABILITIES MSR. On processors where the MDS_NO bit
(bit 5) is 0 in the IA32_ARCH_CAPABILITIES MSR, the existing MDS mitigations
also mitigate against TAA.
Whether a processor is affected or not can be read out from the TAA
vulnerability file in sysfs. See :ref:`tsx_async_abort_sys_info`.
Related CVEs
------------
The following CVE entry is related to this TAA issue:
============== ===== ===================================================
CVE-2019-11135 TAA TSX Asynchronous Abort (TAA) condition on some
microprocessors utilizing speculative execution may
allow an authenticated user to potentially enable
information disclosure via a side channel with
local access.
============== ===== ===================================================
Problem
-------
When performing store, load or L1 refill operations, processors write
data into temporary microarchitectural structures (buffers). The data in
those buffers can be forwarded to load operations as an optimization.
Intel TSX is an extension to the x86 instruction set architecture that adds
hardware transactional memory support to improve performance of multi-threaded
software. TSX lets the processor expose and exploit concurrency hidden in an
application due to dynamically avoiding unnecessary synchronization.
TSX supports atomic memory transactions that are either committed (success) or
aborted. During an abort, operations that happened within the transactional region
are rolled back. An asynchronous abort takes place, among other options, when a
different thread accesses a cache line that is also used within the transactional
region when that access might lead to a data race.
Immediately after an uncompleted asynchronous abort, certain speculatively
executed loads may read data from those internal buffers and pass it to dependent
operations. This can be then used to infer the value via a cache side channel
attack.
Because the buffers are potentially shared between Hyper-Threads cross
Hyper-Thread attacks are possible.
The victim of a malicious actor does not need to make use of TSX. Only the
attacker needs to begin a TSX transaction and raise an asynchronous abort
which in turn potenitally leaks data stored in the buffers.
More detailed technical information is available in the TAA specific x86
architecture section: :ref:`Documentation/x86/tsx_async_abort.rst <tsx_async_abort>`.
Attack scenarios
----------------
Attacks against the TAA vulnerability can be implemented from unprivileged
applications running on hosts or guests.
As for MDS, the attacker has no control over the memory addresses that can
be leaked. Only the victim is responsible for bringing data to the CPU. As
a result, the malicious actor has to sample as much data as possible and
then postprocess it to try to infer any useful information from it.
A potential attacker only has read access to the data. Also, there is no direct
privilege escalation by using this technique.
.. _tsx_async_abort_sys_info:
TAA system information
-----------------------
The Linux kernel provides a sysfs interface to enumerate the current TAA status
of mitigated systems. The relevant sysfs file is:
/sys/devices/system/cpu/vulnerabilities/tsx_async_abort
The possible values in this file are:
.. list-table::
* - 'Vulnerable'
- The CPU is affected by this vulnerability and the microcode and kernel mitigation are not applied.
* - 'Vulnerable: Clear CPU buffers attempted, no microcode'
- The system tries to clear the buffers but the microcode might not support the operation.
* - 'Mitigation: Clear CPU buffers'
- The microcode has been updated to clear the buffers. TSX is still enabled.
* - 'Mitigation: TSX disabled'
- TSX is disabled.
* - 'Not affected'
- The CPU is not affected by this issue.
.. _ucode_needed:
Best effort mitigation mode
^^^^^^^^^^^^^^^^^^^^^^^^^^^
If the processor is vulnerable, but the availability of the microcode-based
mitigation mechanism is not advertised via CPUID the kernel selects a best
effort mitigation mode. This mode invokes the mitigation instructions
without a guarantee that they clear the CPU buffers.
This is done to address virtualization scenarios where the host has the
microcode update applied, but the hypervisor is not yet updated to expose the
CPUID to the guest. If the host has updated microcode the protection takes
effect; otherwise a few CPU cycles are wasted pointlessly.
The state in the tsx_async_abort sysfs file reflects this situation
accordingly.
Mitigation mechanism
--------------------
The kernel detects the affected CPUs and the presence of the microcode which is
required. If a CPU is affected and the microcode is available, then the kernel
enables the mitigation by default.
The mitigation can be controlled at boot time via a kernel command line option.
See :ref:`taa_mitigation_control_command_line`.
.. _virt_mechanism:
Virtualization mitigation
^^^^^^^^^^^^^^^^^^^^^^^^^
Affected systems where the host has TAA microcode and TAA is mitigated by
having disabled TSX previously, are not vulnerable regardless of the status
of the VMs.
In all other cases, if the host either does not have the TAA microcode or
the kernel is not mitigated, the system might be vulnerable.
.. _taa_mitigation_control_command_line:
Mitigation control on the kernel command line
---------------------------------------------
The kernel command line allows to control the TAA mitigations at boot time with
the option "tsx_async_abort=". The valid arguments for this option are:
============ =============================================================
off This option disables the TAA mitigation on affected platforms.
If the system has TSX enabled (see next parameter) and the CPU
is affected, the system is vulnerable.
full TAA mitigation is enabled. If TSX is enabled, on an affected
system it will clear CPU buffers on ring transitions. On
systems which are MDS-affected and deploy MDS mitigation,
TAA is also mitigated. Specifying this option on those
systems will have no effect.
full,nosmt The same as tsx_async_abort=full, with SMT disabled on
vulnerable CPUs that have TSX enabled. This is the complete
mitigation. When TSX is disabled, SMT is not disabled because
CPU is not vulnerable to cross-thread TAA attacks.
============ =============================================================
Not specifying this option is equivalent to "tsx_async_abort=full". For
processors that are affected by both TAA and MDS, specifying just
"tsx_async_abort=off" without an accompanying "mds=off" will have no
effect as the same mitigation is used for both vulnerabilities.
The kernel command line also allows to control the TSX feature using the
parameter "tsx=" on CPUs which support TSX control. MSR_IA32_TSX_CTRL is used
to control the TSX feature and the enumeration of the TSX feature bits (RTM
and HLE) in CPUID.
The valid options are:
============ =============================================================
off Disables TSX on the system.
Note that this option takes effect only on newer CPUs which are
not vulnerable to MDS, i.e., have MSR_IA32_ARCH_CAPABILITIES.MDS_NO=1
and which get the new IA32_TSX_CTRL MSR through a microcode
update. This new MSR allows for the reliable deactivation of
the TSX functionality.
on Enables TSX.
Although there are mitigations for all known security
vulnerabilities, TSX has been known to be an accelerator for
several previous speculation-related CVEs, and so there may be
unknown security risks associated with leaving it enabled.
auto Disables TSX if X86_BUG_TAA is present, otherwise enables TSX
on the system.
============ =============================================================
Not specifying this option is equivalent to "tsx=off".
The following combinations of the "tsx_async_abort" and "tsx" are possible. For
affected platforms tsx=auto is equivalent to tsx=off and the result will be:
========= ========================== =========================================
tsx=on tsx_async_abort=full The system will use VERW to clear CPU
buffers. Cross-thread attacks are still
possible on SMT machines.
tsx=on tsx_async_abort=full,nosmt As above, cross-thread attacks on SMT
mitigated.
tsx=on tsx_async_abort=off The system is vulnerable.
tsx=off tsx_async_abort=full TSX might be disabled if microcode
provides a TSX control MSR. If so,
system is not vulnerable.
tsx=off tsx_async_abort=full,nosmt Ditto
tsx=off tsx_async_abort=off ditto
========= ========================== =========================================
For unaffected platforms "tsx=on" and "tsx_async_abort=full" does not clear CPU
buffers. For platforms without TSX control (MSR_IA32_ARCH_CAPABILITIES.MDS_NO=0)
"tsx" command line argument has no effect.
For the affected platforms below table indicates the mitigation status for the
combinations of CPUID bit MD_CLEAR and IA32_ARCH_CAPABILITIES MSR bits MDS_NO
and TSX_CTRL_MSR.
======= ========= ============= ========================================
MDS_NO MD_CLEAR TSX_CTRL_MSR Status
======= ========= ============= ========================================
0 0 0 Vulnerable (needs microcode)
0 1 0 MDS and TAA mitigated via VERW
1 1 0 MDS fixed, TAA vulnerable if TSX enabled
because MD_CLEAR has no meaning and
VERW is not guaranteed to clear buffers
1 X 1 MDS fixed, TAA can be mitigated by
VERW or TSX_CTRL_MSR
======= ========= ============= ========================================
Mitigation selection guide
--------------------------
1. Trusted userspace and guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If all user space applications are from a trusted source and do not execute
untrusted code which is supplied externally, then the mitigation can be
disabled. The same applies to virtualized environments with trusted guests.
2. Untrusted userspace and guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If there are untrusted applications or guests on the system, enabling TSX
might allow a malicious actor to leak data from the host or from other
processes running on the same physical core.
If the microcode is available and the TSX is disabled on the host, attacks
are prevented in a virtualized environment as well, even if the VMs do not
explicitly enable the mitigation.
.. _taa_default_mitigations:
Default mitigations
-------------------
The kernel's default action for vulnerable processors is:
- Deploy TSX disable mitigation (tsx_async_abort=full tsx=off).

View File

@@ -17,12 +17,14 @@ etc.
kernel-parameters
devices
This section describes CPU vulnerabilities and their mitigations.
This section describes CPU vulnerabilities and provides an overview of the
possible mitigations along with guidance for selecting mitigations if they
are configurable at compile, boot or run time.
.. toctree::
:maxdepth: 1
hw-vuln/index
l1tf
Here is a set of documents aimed at users who are trying to track down
problems and bugs in particular.

View File

@@ -126,7 +126,6 @@ parameter is applicable::
NET Appropriate network support is enabled.
NUMA NUMA support is enabled.
NFS Appropriate NFS support is enabled.
OF Devicetree is enabled.
OSS OSS sound support is enabled.
PV_OPS A paravirtualized kernel is enabled.
PARIDE The ParIDE (parallel port IDE) subsystem is enabled.

View File

@@ -113,7 +113,7 @@
the GPE dispatcher.
This facility can be used to prevent such uncontrolled
GPE floodings.
Format: <byte>
Format: <int>
acpi_no_auto_serialize [HW,ACPI]
Disable auto-serialization of AML methods
@@ -136,10 +136,6 @@
dynamic table installation which will install SSDT
tables to /sys/firmware/acpi/tables/dynamic.
acpi_no_watchdog [HW,ACPI,WDT]
Ignore the ACPI-based watchdog interface (WDAT) and let
a native driver control the watchdog device instead.
acpi_rsdp= [ACPI,EFI,KEXEC]
Pass the RSDP address to the kernel, mostly used
on machines running EFI runtime service to boot the
@@ -490,14 +486,10 @@
cut the overhead, others just disable the usage. So
only cgroup_disable=memory is actually worthy}
cgroup_no_v1= [KNL] Disable cgroup controllers and named hierarchies in v1
Format: { { controller | "all" | "named" }
[,{ controller | "all" | "named" }...] }
cgroup_no_v1= [KNL] Disable one, multiple, all cgroup controllers in v1
Format: { controller[,controller...] | "all" }
Like cgroup_disable, but only applies to cgroup v1;
the blacklisted controllers remain available in cgroup2.
"all" blacklists all controllers and "named" disables
named mounts. Specifying both "all" and "named" disables
all v1 hierarchies.
cgroup.memory= [KNL] Pass options to the cgroup memory controller.
Format: <string>
@@ -562,7 +554,7 @@
loops can be debugged more effectively on production
systems.
clearcpuid=BITNUM[,BITNUM...] [X86]
clearcpuid=BITNUM [X86]
Disable CPUID feature X for the kernel. See
arch/x86/include/asm/cpufeatures.h for the valid bit
numbers. Note the Linux specific bits are not necessarily
@@ -905,10 +897,6 @@
The filter can be disabled or changed to another
driver later using sysfs.
driver_async_probe= [KNL]
List of driver names to be probed asynchronously.
Format: <driver_name1>,<driver_name2>...
drm.edid_firmware=[<connector>:]<file>[,[<connector>:]<file>]
Broken monitors, graphic adapters, KVMs and EDIDless
panels may send no or incorrect EDID data sets.
@@ -1075,7 +1063,7 @@
earlyprintk=serial[,0x...[,baudrate]]
earlyprintk=ttySn[,baudrate]
earlyprintk=dbgp[debugController#]
earlyprintk=pciserial[,force],bus:device.function[,baudrate]
earlyprintk=pciserial,bus:device.function[,baudrate]
earlyprintk=xdbc[xhciController#]
earlyprintk is useful when the kernel crashes before
@@ -1107,10 +1095,6 @@
The sclp output can only be used on s390.
The optional "force" to "pciserial" enables use of a
PCI device even when its classcode is not of the
UART class.
edac_report= [HW,EDAC] Control how to report EDAC event
Format: {"on" | "off" | "force"}
on: enable EDAC to report H/W event. May be overridden
@@ -1643,15 +1627,6 @@
initrd= [BOOT] Specify the location of the initial ramdisk
init_on_alloc= [MM] Fill newly allocated pages and heap objects with
zeroes.
Format: 0 | 1
Default set by CONFIG_INIT_ON_ALLOC_DEFAULT_ON.
init_on_free= [MM] Fill freed pages and heap objects with zeroes.
Format: 0 | 1
Default set by CONFIG_INIT_ON_FREE_DEFAULT_ON.
init_pkru= [x86] Specify the default memory protection keys rights
register contents for all processes. 0x55555554 by
default (disallow access to all but pkey 0). Can
@@ -1967,12 +1942,6 @@
Built with CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y,
the default is off.
kpti= [ARM64] Control page table isolation of user
and kernel address spaces.
Default: enabled on cores which need mitigation.
0: force disabled
1: force enabled
kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs.
Default is 0 (don't ignore, but inject #GP)
@@ -1983,25 +1952,6 @@
KVM MMU at runtime.
Default is 0 (off)
kvm.nx_huge_pages=
[KVM] Controls the software workaround for the
X86_BUG_ITLB_MULTIHIT bug.
force : Always deploy workaround.
off : Never deploy workaround.
auto : Deploy workaround based on the presence of
X86_BUG_ITLB_MULTIHIT.
Default is 'auto'.
If the software workaround is enabled for the host,
guests do need not to enable it for nested guests.
kvm.nx_huge_pages_recovery_ratio=
[KVM] Controls how many 4KiB pages are periodically zapped
back to huge pages. 0 disables the recovery, otherwise if
the value is N KVM will zap 1/Nth of the 4KiB pages every
minute. The default is 60.
kvm-amd.nested= [KVM,AMD] Allow nested virtualization in KVM/SVM.
Default is 1 (enabled)
@@ -2119,13 +2069,10 @@
off
Disables hypervisor mitigations and doesn't
emit any warnings.
It also drops the swap size and available
RAM limit restriction on both hypervisor and
bare metal.
Default is 'flush'.
For details see: Documentation/admin-guide/hw-vuln/l1tf.rst
For details see: Documentation/admin-guide/l1tf.rst
l2cr= [PPC]
@@ -2365,38 +2312,6 @@
Format: <first>,<last>
Specifies range of consoles to be captured by the MDA.
mds= [X86,INTEL]
Control mitigation for the Micro-architectural Data
Sampling (MDS) vulnerability.
Certain CPUs are vulnerable to an exploit against CPU
internal buffers which can forward information to a
disclosure gadget under certain conditions.
In vulnerable processors, the speculatively
forwarded data can be used in a cache side channel
attack, to access data to which the attacker does
not have direct access.
This parameter controls the MDS mitigation. The
options are:
full - Enable MDS mitigation on vulnerable CPUs
full,nosmt - Enable MDS mitigation and disable
SMT on vulnerable CPUs
off - Unconditionally disable MDS mitigation
On TAA-affected machines, mds=off can be prevented by
an active TAA mitigation as both vulnerabilities are
mitigated with the same mechanism so in order to disable
this mitigation, you need to specify tsx_async_abort=off
too.
Not specifying this option is equivalent to
mds=full.
For details see: Documentation/admin-guide/hw-vuln/mds.rst
mem=nn[KMG] [KNL,BOOT] Force usage of a specific amount of memory
Amount of memory to be used when the kernel is not able
to see the whole system memory or for test.
@@ -2554,53 +2469,6 @@
in the "bleeding edge" mini2440 support kernel at
http://repo.or.cz/w/linux-2.6/mini2440.git
mitigations=
[X86,PPC,S390,ARM64] Control optional mitigations for
CPU vulnerabilities. This is a set of curated,
arch-independent options, each of which is an
aggregation of existing arch-specific options.
off
Disable all optional CPU mitigations. This
improves system performance, but it may also
expose users to several CPU vulnerabilities.
Equivalent to: nopti [X86,PPC]
kpti=0 [ARM64]
nospectre_v1 [PPC]
nobp=0 [S390]
nospectre_v1 [X86]
nospectre_v2 [X86,PPC,S390,ARM64]
spectre_v2_user=off [X86]
spec_store_bypass_disable=off [X86,PPC]
ssbd=force-off [ARM64]
l1tf=off [X86]
mds=off [X86]
tsx_async_abort=off [X86]
kvm.nx_huge_pages=off [X86]
no_entry_flush [PPC]
no_uaccess_flush [PPC]
Exceptions:
This does not have any effect on
kvm.nx_huge_pages when
kvm.nx_huge_pages=force.
auto (default)
Mitigate all CPU vulnerabilities, but leave SMT
enabled, even if it's vulnerable. This is for
users who don't want to be surprised by SMT
getting disabled across kernel upgrades, or who
have other ways of avoiding SMT-based attacks.
Equivalent to: (default behavior)
auto,nosmt
Mitigate all CPU vulnerabilities, disabling SMT
if needed. This is for users who always want to
be fully mitigated, even if it means losing SMT.
Equivalent to: l1tf=flush,nosmt [X86]
mds=full,nosmt [X86]
tsx_async_abort=full,nosmt [X86]
mminit_loglevel=
[KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this
parameter allows control of the logging verbosity for
@@ -2889,8 +2757,6 @@
noefi Disable EFI runtime services support.
no_entry_flush [PPC] Don't flush the L1-D cache when entering the kernel.
noexec [IA-64]
noexec [X86]
@@ -2928,21 +2794,18 @@
nosmt=force: Force disable SMT, cannot be undone
via the sysfs control file.
nospectre_v1 [X66, PPC] Disable mitigations for Spectre Variant 1
(bounds check bypass). With this option data leaks
are possible in the system.
nospectre_v1 [PPC] Disable mitigations for Spectre Variant 1 (bounds
check bypass). With this option data leaks are possible
in the system.
nospectre_v2 [X86,PPC_FSL_BOOK3E,ARM64] Disable all mitigations for
the Spectre variant 2 (indirect branch prediction)
vulnerability. System may allow data leaks with this
option.
nospectre_v2 [X86] Disable all mitigations for the Spectre variant 2
(indirect branch prediction) vulnerability. System may
allow data leaks with this option, which is equivalent
to spectre_v2=off.
nospec_store_bypass_disable
[HW] Disable all mitigations for the Speculative Store Bypass vulnerability
no_uaccess_flush
[PPC] Don't flush the L1-D cache after accessing user data.
noxsave [BUGS=X86] Disables x86 extended register state save
and restore using xsave. The kernel will fallback to
enabling legacy floating-point and sse state.
@@ -3136,12 +2999,6 @@
This can be set from sysctl after boot.
See Documentation/sysctl/vm.txt for details.
of_devlink [OF, KNL] Create device links between consumer and
supplier devices by scanning the devictree to infer the
consumer/supplier relationships. A consumer device
will not be probed until all the supplier devices have
probed successfully.
ohci1394_dma=early [HW] enable debugging via the ohci1394 driver.
See Documentation/debugging-via-ohci1394.txt for more
info.
@@ -3619,10 +3476,6 @@
before loading.
See Documentation/blockdev/ramdisk.txt.
psi= [KNL] Enable or disable pressure stall information
tracking.
Format: <bool>
psmouse.proto= [HW,MOUSE] Highest PS2 mouse protocol extension to
probe for; one of (bare|imps|exps|lifebook|any).
psmouse.rate= [HW,MOUSE] Set desired mouse report rate, in reports
@@ -4027,13 +3880,6 @@
Run specified binary instead of /init from the ramdisk,
used for early userspace startup. See initrd.
rdrand= [X86]
force - Override the decision by the kernel to hide the
advertisement of RDRAND support (this affects
certain AMD processors because of buggy BIOS
support, specifically around the suspend/resume
path).
rdt= [HW,X86,RDT]
Turn on/off individual RDT features. List is:
cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
@@ -4047,9 +3893,7 @@
[[,]s[mp]#### \
[[,]b[ios] | a[cpi] | k[bd] | t[riple] | e[fi] | p[ci]] \
[[,]f[orce]
Where reboot_mode is one of warm (soft) or cold (hard) or gpio
(prefix with 'panic_' to set mode for panic
reboot only),
Where reboot_mode is one of warm (soft) or cold (hard) or gpio,
reboot_type is one of bios, acpi, kbd, triple, efi, or pci,
reboot_force is either force or not specified,
reboot_cpu is s[mp]#### with #### being the processor
@@ -4321,13 +4165,9 @@
spectre_v2= [X86] Control mitigation of Spectre variant 2
(indirect branch speculation) vulnerability.
The default operation protects the kernel from
user space attacks.
on - unconditionally enable, implies
spectre_v2_user=on
off - unconditionally disable, implies
spectre_v2_user=off
on - unconditionally enable
off - unconditionally disable
auto - kernel detects whether your CPU model is
vulnerable
@@ -4337,12 +4177,6 @@
CONFIG_RETPOLINE configuration option, and the
compiler with which the kernel was built.
Selecting 'on' will also enable the mitigation
against user space to user space task attacks.
Selecting 'off' will disable both the kernel and
the user space protections.
Specific mitigations can also be selected manually:
retpoline - replace indirect branches
@@ -4352,48 +4186,6 @@
Not specifying this option is equivalent to
spectre_v2=auto.
spectre_v2_user=
[X86] Control mitigation of Spectre variant 2
(indirect branch speculation) vulnerability between
user space tasks
on - Unconditionally enable mitigations. Is
enforced by spectre_v2=on
off - Unconditionally disable mitigations. Is
enforced by spectre_v2=off
prctl - Indirect branch speculation is enabled,
but mitigation can be enabled via prctl
per thread. The mitigation control state
is inherited on fork.
prctl,ibpb
- Like "prctl" above, but only STIBP is
controlled per thread. IBPB is issued
always when switching between different user
space processes.
seccomp
- Same as "prctl" above, but all seccomp
threads will enable the mitigation unless
they explicitly opt out.
seccomp,ibpb
- Like "seccomp" above, but only STIBP is
controlled per thread. IBPB is issued
always when switching between different
user space processes.
auto - Kernel selects the mitigation depending on
the available CPU features and vulnerability.
Default mitigation:
If CONFIG_SECCOMP=y then "seccomp", otherwise "prctl"
Not specifying this option is equivalent to
spectre_v2_user=auto.
spec_store_bypass_disable=
[HW] Control Speculative Store Bypass (SSB) Disable mitigation
(Speculative Store Bypass vulnerability)
@@ -4451,26 +4243,6 @@
spia_pedr=
spia_peddr=
srbds= [X86,INTEL]
Control the Special Register Buffer Data Sampling
(SRBDS) mitigation.
Certain CPUs are vulnerable to an MDS-like
exploit which can leak bits from the random
number generator.
By default, this issue is mitigated by
microcode. However, the microcode fix can cause
the RDRAND and RDSEED instructions to become
much slower. Among other effects, this will
result in reduced throughput from /dev/urandom.
The microcode mitigation can be disabled with
the following option:
off: Disable mitigation and remove
performance impact to RDRAND and RDSEED
srcutree.counter_wrap_check [KNL]
Specifies how frequently to check for
grace-period sequence counter wrap for the
@@ -4789,76 +4561,6 @@
marks the TSC unconditionally unstable at bootup and
avoids any further wobbles once the TSC watchdog notices.
tsx= [X86] Control Transactional Synchronization
Extensions (TSX) feature in Intel processors that
support TSX control.
This parameter controls the TSX feature. The options are:
on - Enable TSX on the system. Although there are
mitigations for all known security vulnerabilities,
TSX has been known to be an accelerator for
several previous speculation-related CVEs, and
so there may be unknown security risks associated
with leaving it enabled.
off - Disable TSX on the system. (Note that this
option takes effect only on newer CPUs which are
not vulnerable to MDS, i.e., have
MSR_IA32_ARCH_CAPABILITIES.MDS_NO=1 and which get
the new IA32_TSX_CTRL MSR through a microcode
update. This new MSR allows for the reliable
deactivation of the TSX functionality.)
auto - Disable TSX if X86_BUG_TAA is present,
otherwise enable TSX on the system.
Not specifying this option is equivalent to tsx=off.
See Documentation/admin-guide/hw-vuln/tsx_async_abort.rst
for more details.
tsx_async_abort= [X86,INTEL] Control mitigation for the TSX Async
Abort (TAA) vulnerability.
Similar to Micro-architectural Data Sampling (MDS)
certain CPUs that support Transactional
Synchronization Extensions (TSX) are vulnerable to an
exploit against CPU internal buffers which can forward
information to a disclosure gadget under certain
conditions.
In vulnerable processors, the speculatively forwarded
data can be used in a cache side channel attack, to
access data to which the attacker does not have direct
access.
This parameter controls the TAA mitigation. The
options are:
full - Enable TAA mitigation on vulnerable CPUs
if TSX is enabled.
full,nosmt - Enable TAA mitigation and disable SMT on
vulnerable CPUs. If TSX is disabled, SMT
is not disabled because CPU is not
vulnerable to cross-thread TAA attacks.
off - Unconditionally disable TAA mitigation
On MDS-affected machines, tsx_async_abort=off can be
prevented by an active MDS mitigation as both vulnerabilities
are mitigated with the same mechanism so in order to disable
this mitigation, you need to specify mds=off too.
Not specifying this option is equivalent to
tsx_async_abort=full. On CPUs which are MDS affected
and deploy MDS mitigation, TAA mitigation is not
required and doesn't provide any additional
mitigation.
For details see:
Documentation/admin-guide/hw-vuln/tsx_async_abort.rst
turbografx.map[2|3]= [HW,JOY]
TurboGraFX parallel port interface
Format:
@@ -4981,8 +4683,6 @@
prevent spurious wakeup);
n = USB_QUIRK_DELAY_CTRL_MSG (Device needs a
pause after every control message);
o = USB_QUIRK_HUB_SLOW_RESET (Hub needs extra
delay after resetting its port);
Example: quirks=0781:5580:bk,0a5c:5834:gij
usbhid.mousepoll=
@@ -5007,13 +4707,13 @@
Flags is a set of characters, each corresponding
to a common usb-storage quirk flag as follows:
a = SANE_SENSE (collect more than 18 bytes
of sense data, not on uas);
of sense data);
b = BAD_SENSE (don't collect more than 18
bytes of sense data, not on uas);
bytes of sense data);
c = FIX_CAPACITY (decrease the reported
device capacity by one sector);
d = NO_READ_DISC_INFO (don't use
READ_DISC_INFO command, not on uas);
READ_DISC_INFO command);
e = NO_READ_CAPACITY_16 (don't use
READ_CAPACITY_16 command);
f = NO_REPORT_OPCODES (don't use report opcodes
@@ -5027,20 +4727,18 @@
device);
j = NO_REPORT_LUNS (don't use report luns
command, uas only);
k = NO_SAME (do not use WRITE_SAME, uas only)
l = NOT_LOCKABLE (don't try to lock and
unlock ejectable media, not on uas);
unlock ejectable media);
m = MAX_SECTORS_64 (don't transfer more
than 64 sectors = 32 KB at a time,
not on uas);
than 64 sectors = 32 KB at a time);
n = INITIAL_READ10 (force a retry of the
initial READ(10) command, not on uas);
initial READ(10) command);
o = CAPACITY_OK (accept the capacity
reported by the device, not on uas);
reported by the device);
p = WRITE_CACHE (the device cache is ON
by default, not on uas);
by default);
r = IGNORE_RESIDUE (the device reports
bogus residue values, not on uas);
bogus residue values);
s = SINGLE_LUN (the device has only one
Logical Unit);
t = NO_ATA_1X (don't allow ATA(12) and ATA(16)
@@ -5049,8 +4747,7 @@
w = NO_WP_DETECT (don't test whether the
medium is write-protected).
y = ALWAYS_SYNC (issue a SYNCHRONIZE_CACHE
even if the device claims no cache,
not on uas)
even if the device claims no cache)
Example: quirks=0419:aaf5:rl,0421:0433:rc
user_debug= [KNL,ARM]
@@ -5158,6 +4855,12 @@
emulate [default] Vsyscalls turn into traps and are
emulated reasonably safely.
native Vsyscalls are native syscall instructions.
This is a little bit faster than trapping
and makes a few dynamic recompilers work
better than they would in emulation mode.
It also makes exploits much easier to write.
none Vsyscalls don't work at all. This makes
them quite hard to use for exploits but
might break your system.
@@ -5289,10 +4992,6 @@
the unplug protocol
never -- do not unplug even if version check succeeds
xen_legacy_crash [X86,XEN]
Crash from Xen panic notifier, without executing late
panic() code such as dumping handler.
xen_nopvspin [X86,XEN]
Disables the ticketlock slowpath using Xen PV
optimizations.
@@ -5307,14 +5006,6 @@
with /sys/devices/system/xen_memory/xen_memory0/scrub_pages.
Default value controlled with CONFIG_XEN_SCRUB_PAGES_DEFAULT.
xen.event_eoi_delay= [XEN]
How long to delay EOI handling in case of event
storms (jiffies). Default is 10.
xen.event_loop_timeout= [XEN]
After which time (jiffies) the event handling loop
should start to delay EOI handling. Default is 2.
xirc2ps_cs= [NET,PCMCIA]
Format:
<irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]]

View File

@@ -405,9 +405,6 @@ time with the option "l1tf=". The valid arguments for this option are:
off Disables hypervisor mitigations and doesn't emit any
warnings.
It also drops the swap size and available RAM limit restrictions
on both hypervisor and bare metal.
============ =============================================================
The default is 'flush'. For details about L1D flushing see :ref:`l1d_flush`.
@@ -445,7 +442,6 @@ The default is 'cond'. If 'l1tf=full,force' is given on the kernel command
line, then 'always' is enforced and the kvm-intel.vmentry_l1d_flush
module parameter is ignored and writes to the sysfs file are rejected.
.. _mitigation_selection:
Mitigation selection guide
--------------------------
@@ -557,7 +553,7 @@ When nested virtualization is in use, three operating systems are involved:
the bare metal hypervisor, the nested hypervisor and the nested virtual
machine. VMENTER operations from the nested hypervisor into the nested
guest will always be processed by the bare metal hypervisor. If KVM is the
bare metal hypervisor it will:
bare metal hypervisor it wiil:
- Flush the L1D cache on every switch from the nested hypervisor to the
nested virtual machine, so that the nested hypervisor's secrets are not
@@ -580,8 +576,7 @@ Default mitigations
The kernel default mitigations for vulnerable processors are:
- PTE inversion to protect against malicious user space. This is done
unconditionally and cannot be controlled. The swap storage is limited
to ~16TB.
unconditionally and cannot be controlled.
- L1D conditional flushing on VMENTER when EPT is enabled for
a guest.

View File

@@ -587,64 +587,6 @@ This governor exposes the following tunables:
It effectively causes the frequency to go down ``sampling_down_factor``
times slower than it ramps up.
``interactive``
----------------
The CPUfreq governor `interactive` is designed for latency-sensitive,
interactive workloads. This governor sets the CPU speed depending on
usage, similar to `ondemand` and `conservative` governors, but with a
different set of configurable behaviors.
The tunable values for this governor are:
``above_hispeed_delay``
When speed is at or above hispeed_freq, wait for
this long before raising speed in response to continued high load.
The format is a single delay value, optionally followed by pairs of
CPU speeds and the delay to use at or above those speeds. Colons can
be used between the speeds and associated delays for readability. For
example:
80000 1300000:200000 1500000:40000
uses delay 80000 uS until CPU speed 1.3 GHz, at which speed delay
200000 uS is used until speed 1.5 GHz, at which speed (and above)
delay 40000 uS is used. If speeds are specified these must appear in
ascending order. Default is 20000 uS.
``boost``
If non-zero, immediately boost speed of all CPUs to at least
hispeed_freq until zero is written to this attribute. If zero, allow
CPU speeds to drop below hispeed_freq according to load as usual.
Default is zero.
``boostpulse``
On each write, immediately boost speed of all CPUs to
hispeed_freq for at least the period of time specified by
boostpulse_duration, after which speeds are allowed to drop below
hispeed_freq according to load as usual. Its a write-only file.
``boostpulse_duration``
Length of time to hold CPU speed at hispeed_freq
on a write to boostpulse, before allowing speed to drop according to
load as usual. Default is 80000 uS.
``go_hispeed_load``
The CPU load at which to ramp to hispeed_freq.
Default is 99%.
``hispeed_freq``
An intermediate "high speed" at which to initially ramp
when CPU load hits the value specified in go_hispeed_load. If load
stays high for the amount of time specified in above_hispeed_delay,
then speed may be bumped higher. Default is the maximum speed allowed
by the policy at governor initialization time.
``io_is_busy``
If set, the governor accounts IO time as CPU busy time.
``min_sample_time``
The minimum amount of time to spend at the current
Frequency Boost Support
=======================

View File

@@ -26,35 +26,23 @@ information is helpful. Any exploit code is very helpful and will not
be released without consent from the reporter unless it has already been
made public.
Disclosure and embargoed information
------------------------------------
Disclosure
----------
The security list is not a disclosure channel. For that, see Coordination
below.
The goal of the Linux kernel security team is to work with the bug
submitter to understand and fix the bug. We prefer to publish the fix as
soon as possible, but try to avoid public discussion of the bug itself
and leave that to others.
Once a robust fix has been developed, the release process starts. Fixes
for publicly known bugs are released immediately.
Although our preference is to release fixes for publicly undisclosed bugs
as soon as they become available, this may be postponed at the request of
the reporter or an affected party for up to 7 calendar days from the start
of the release process, with an exceptional extension to 14 calendar days
if it is agreed that the criticality of the bug requires more time. The
only valid reason for deferring the publication of a fix is to accommodate
the logistics of QA and large scale rollouts which require release
coordination.
Whilst embargoed information may be shared with trusted individuals in
order to develop a fix, such information will not be published alongside
the fix or on any other disclosure channel without the permission of the
reporter. This includes but is not limited to the original bug report
and followup discussions (if any), exploits, CVE information or the
identity of the reporter.
In other words our only interest is in getting bugs fixed. All other
information submitted to the security list and any followup discussions
of the report are treated confidentially even after the embargo has been
lifted, in perpetuity.
Publishing the fix may be delayed when the bug or the fix is not yet
fully understood, the solution is not well-tested or for vendor
coordination. However, we expect these delays to be short, measurable in
days, not weeks or months. A release date is negotiated by the security
team working with the bug submitter as well as vendors. However, the
kernel security team holds the final say when setting a timeframe. The
timeframe varies from immediate (esp. if it's already publicly known bug)
to a few weeks. As a basic default policy, we expect report date to
release date to be on the order of 7 days.
Coordination
------------
@@ -80,7 +68,7 @@ may delay the bug handling. If a reporter wishes to have a CVE identifier
assigned ahead of public disclosure, they will need to contact the private
linux-distros list, described above. When such a CVE identifier is known
before a patch is provided, it is desirable to mention it in the commit
message if the reporter agrees.
message, though.
Non-disclosure agreements
-------------------------

View File

@@ -6,7 +6,7 @@ TL;DR summary
* Use only NEON instructions, or VFP instructions that don't rely on support
code
* Isolate your NEON code in a separate compilation unit, and compile it with
'-march=armv7-a -mfpu=neon -mfloat-abi=softfp'
'-mfpu=neon -mfloat-abi=softfp'
* Put kernel_neon_begin() and kernel_neon_end() calls around the calls into your
NEON code
* Don't sleep in your NEON code, and be aware that it will be executed with
@@ -87,7 +87,7 @@ instructions appearing in unexpected places if no special care is taken.
Therefore, the recommended and only supported way of using NEON/VFP in the
kernel is by adhering to the following rules:
* isolate the NEON code in a separate compilation unit and compile it with
'-march=armv7-a -mfpu=neon -mfloat-abi=softfp';
'-mfpu=neon -mfloat-abi=softfp';
* issue the calls to kernel_neon_begin(), kernel_neon_end() as well as the calls
into the unit containing the NEON code from a compilation unit which is *not*
built with the GCC flag '-mfpu=neon' set.

View File

@@ -178,7 +178,3 @@ HWCAP_ILRCPC
HWCAP_FLAGM
Functionality implied by ID_AA64ISAR0_EL1.TS == 0b0001.
HWCAP_SSBS
Functionality implied by ID_AA64PFR1_EL1.SSBS == 0b0010.

View File

@@ -1,27 +0,0 @@
==================
ARM64 Architecture
==================
.. toctree::
:maxdepth: 1
acpi_object_usage
arm-acpi
booting
cpu-feature-registers
elf_hwcaps
hugetlbpage
legacy_instructions
memory
pointer-authentication
silicon-errata
sve
tagged-address-abi
tagged-pointers
.. only:: subproject and html
Indices
=======
* :ref:`genindex`

View File

@@ -44,8 +44,6 @@ stable kernels.
| Implementor | Component | Erratum ID | Kconfig |
+----------------+-----------------+-----------------+-----------------------------+
| Allwinner | A64/R18 | UNKNOWN1 | SUN50I_ERRATUM_UNKNOWN1 |
| | | | |
| ARM | Cortex-A53 | #826319 | ARM64_ERRATUM_826319 |
| ARM | Cortex-A53 | #827319 | ARM64_ERRATUM_827319 |
| ARM | Cortex-A53 | #824069 | ARM64_ERRATUM_824069 |
@@ -58,8 +56,6 @@ stable kernels.
| ARM | Cortex-A72 | #853709 | N/A |
| ARM | Cortex-A73 | #858921 | ARM64_ERRATUM_858921 |
| ARM | Cortex-A55 | #1024718 | ARM64_ERRATUM_1024718 |
| ARM | Cortex-A76 | #1463225 | ARM64_ERRATUM_1463225 |
| ARM | Neoverse-N1 | #1542419 | ARM64_ERRATUM_1542419 |
| ARM | MMU-500 | #841119,#826419 | N/A |
| | | | |
| Cavium | ThunderX ITS | #22375, #24313 | CAVIUM_ERRATUM_22375 |

View File

@@ -1,163 +0,0 @@
==========================
AArch64 TAGGED ADDRESS ABI
==========================
Authors: Vincenzo Frascino <vincenzo.frascino@arm.com>
Catalin Marinas <catalin.marinas@arm.com>
Date: 21 August 2019
This document describes the usage and semantics of the Tagged Address
ABI on AArch64 Linux.
1. Introduction
---------------
On AArch64 the ``TCR_EL1.TBI0`` bit is set by default, allowing
userspace (EL0) to perform memory accesses through 64-bit pointers with
a non-zero top byte. This document describes the relaxation of the
syscall ABI that allows userspace to pass certain tagged pointers to
kernel syscalls.
2. AArch64 Tagged Address ABI
-----------------------------
From the kernel syscall interface perspective and for the purposes of
this document, a "valid tagged pointer" is a pointer with a potentially
non-zero top-byte that references an address in the user process address
space obtained in one of the following ways:
- ``mmap()`` syscall where either:
- flags have the ``MAP_ANONYMOUS`` bit set or
- the file descriptor refers to a regular file (including those
returned by ``memfd_create()``) or ``/dev/zero``
- ``brk()`` syscall (i.e. the heap area between the initial location of
the program break at process creation and its current location).
- any memory mapped by the kernel in the address space of the process
during creation and with the same restrictions as for ``mmap()`` above
(e.g. data, bss, stack).
The AArch64 Tagged Address ABI has two stages of relaxation depending
how the user addresses are used by the kernel:
1. User addresses not accessed by the kernel but used for address space
management (e.g. ``mprotect()``, ``madvise()``). The use of valid
tagged pointers in this context is allowed with the exception of
``brk()``, ``mmap()`` and the ``new_address`` argument to
``mremap()`` as these have the potential to alias with existing
user addresses.
NOTE: This behaviour changed in v5.6 and so some earlier kernels may
incorrectly accept valid tagged pointers for the ``brk()``,
``mmap()`` and ``mremap()`` system calls.
2. User addresses accessed by the kernel (e.g. ``write()``). This ABI
relaxation is disabled by default and the application thread needs to
explicitly enable it via ``prctl()`` as follows:
- ``PR_SET_TAGGED_ADDR_CTRL``: enable or disable the AArch64 Tagged
Address ABI for the calling thread.
The ``(unsigned int) arg2`` argument is a bit mask describing the
control mode used:
- ``PR_TAGGED_ADDR_ENABLE``: enable AArch64 Tagged Address ABI.
Default status is disabled.
Arguments ``arg3``, ``arg4``, and ``arg5`` must be 0.
- ``PR_GET_TAGGED_ADDR_CTRL``: get the status of the AArch64 Tagged
Address ABI for the calling thread.
Arguments ``arg2``, ``arg3``, ``arg4``, and ``arg5`` must be 0.
The ABI properties described above are thread-scoped, inherited on
clone() and fork() and cleared on exec().
Calling ``prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0)``
returns ``-EINVAL`` if the AArch64 Tagged Address ABI is globally
disabled by ``sysctl abi.tagged_addr_disabled=1``. The default
``sysctl abi.tagged_addr_disabled`` configuration is 0.
When the AArch64 Tagged Address ABI is enabled for a thread, the
following behaviours are guaranteed:
- All syscalls except the cases mentioned in section 3 can accept any
valid tagged pointer.
- The syscall behaviour is undefined for invalid tagged pointers: it may
result in an error code being returned, a (fatal) signal being raised,
or other modes of failure.
- The syscall behaviour for a valid tagged pointer is the same as for
the corresponding untagged pointer.
A definition of the meaning of tagged pointers on AArch64 can be found
in Documentation/arm64/tagged-pointers.rst.
3. AArch64 Tagged Address ABI Exceptions
-----------------------------------------
The following system call parameters must be untagged regardless of the
ABI relaxation:
- ``prctl()`` other than pointers to user data either passed directly or
indirectly as arguments to be accessed by the kernel.
- ``ioctl()`` other than pointers to user data either passed directly or
indirectly as arguments to be accessed by the kernel.
- ``shmat()`` and ``shmdt()``.
Any attempt to use non-zero tagged pointers may result in an error code
being returned, a (fatal) signal being raised, or other modes of
failure.
4. Example of correct usage
---------------------------
.. code-block:: c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#define PR_SET_TAGGED_ADDR_CTRL 55
#define PR_TAGGED_ADDR_ENABLE (1UL << 0)
#define TAG_SHIFT 56
int main(void)
{
int tbi_enabled = 0;
unsigned long tag = 0;
char *ptr;
/* check/enable the tagged address ABI */
if (!prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0))
tbi_enabled = 1;
/* memory allocation */
ptr = mmap(NULL, sysconf(_SC_PAGE_SIZE), PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (ptr == MAP_FAILED)
return 1;
/* set a non-zero tag if the ABI is available */
if (tbi_enabled)
tag = rand() & 0xff;
ptr = (char *)((unsigned long)ptr | (tag << TAG_SHIFT));
/* memory access to a tagged address */
strcpy(ptr, "tagged pointer\n");
/* syscall with a tagged pointer */
write(1, ptr, strlen(ptr));
return 0;
}

View File

@@ -1,75 +0,0 @@
=========================================
Tagged virtual addresses in AArch64 Linux
=========================================
Author: Will Deacon <will.deacon@arm.com>
Date : 12 June 2013
This document briefly describes the provision of tagged virtual
addresses in the AArch64 translation system and their potential uses
in AArch64 Linux.
The kernel configures the translation tables so that translations made
via TTBR0 (i.e. userspace mappings) have the top byte (bits 63:56) of
the virtual address ignored by the translation hardware. This frees up
this byte for application use.
Passing tagged addresses to the kernel
--------------------------------------
All interpretation of userspace memory addresses by the kernel assumes
an address tag of 0x00, unless the application enables the AArch64
Tagged Address ABI explicitly
(Documentation/arm64/tagged-address-abi.rst).
This includes, but is not limited to, addresses found in:
- pointer arguments to system calls, including pointers in structures
passed to system calls,
- the stack pointer (sp), e.g. when interpreting it to deliver a
signal,
- the frame pointer (x29) and frame records, e.g. when interpreting
them to generate a backtrace or call graph.
Using non-zero address tags in any of these locations when the
userspace application did not enable the AArch64 Tagged Address ABI may
result in an error code being returned, a (fatal) signal being raised,
or other modes of failure.
For these reasons, when the AArch64 Tagged Address ABI is disabled,
passing non-zero address tags to the kernel via system calls is
forbidden, and using a non-zero address tag for sp is strongly
discouraged.
Programs maintaining a frame pointer and frame records that use non-zero
address tags may suffer impaired or inaccurate debug and profiling
visibility.
Preserving tags
---------------
Non-zero tags are not preserved when delivering signals. This means that
signal handlers in applications making use of tags cannot rely on the
tag information for user virtual addresses being maintained for fields
inside siginfo_t. One exception to this rule is for signals raised in
response to watchpoint debug exceptions, where the tag information will
be preserved.
The architecture prevents the use of a tagged PC, so the upper byte will
be set to a sign-extension of bit 55 on exception return.
This behaviour is maintained when the AArch64 Tagged Address ABI is
enabled.
Other considerations
--------------------
Special care should be taken when using tagged pointers, since it is
likely that C compilers will not hazard two virtual addresses differing
only in the upper byte.

View File

@@ -18,9 +18,7 @@ Passing tagged addresses to the kernel
--------------------------------------
All interpretation of userspace memory addresses by the kernel assumes
an address tag of 0x00, unless the application enables the AArch64
Tagged Address ABI explicitly
(Documentation/arm64/tagged-address-abi.rst).
an address tag of 0x00.
This includes, but is not limited to, addresses found in:
@@ -33,15 +31,13 @@ This includes, but is not limited to, addresses found in:
- the frame pointer (x29) and frame records, e.g. when interpreting
them to generate a backtrace or call graph.
Using non-zero address tags in any of these locations when the
userspace application did not enable the AArch64 Tagged Address ABI may
result in an error code being returned, a (fatal) signal being raised,
or other modes of failure.
Using non-zero address tags in any of these locations may result in an
error code being returned, a (fatal) signal being raised, or other modes
of failure.
For these reasons, when the AArch64 Tagged Address ABI is disabled,
passing non-zero address tags to the kernel via system calls is
forbidden, and using a non-zero address tag for sp is strongly
discouraged.
For these reasons, passing non-zero address tags to the kernel via
system calls is forbidden, and using a non-zero address tag for sp is
strongly discouraged.
Programs maintaining a frame pointer and frame records that use non-zero
address tags may suffer impaired or inaccurate debug and profiling
@@ -61,9 +57,6 @@ be preserved.
The architecture prevents the use of a tagged PC, so the upper byte will
be set to a sign-extension of bit 55 on exception return.
This behaviour is maintained when the AArch64 Tagged Address ABI is
enabled.
Other considerations
--------------------

View File

@@ -177,9 +177,6 @@ These helper barriers exist because architectures have varying implicit
ordering on their SMP atomic primitives. For example our TSO architectures
provide full ordered atomics and these barriers are no-ops.
NOTE: when the atomic RmW ops are fully ordered, they should also imply a
compiler barrier.
Thus:
atomic_fetch_add();

View File

@@ -1,183 +0,0 @@
.. SPDX-License-Identifier: GPL-2.0
=================
Inline Encryption
=================
Objective
=========
We want to support inline encryption (IE) in the kernel.
To allow for testing, we also want a crypto API fallback when actual
IE hardware is absent. We also want IE to work with layered devices
like dm and loopback (i.e. we want to be able to use the IE hardware
of the underlying devices if present, or else fall back to crypto API
en/decryption).
Constraints and notes
=====================
- IE hardware have a limited number of "keyslots" that can be programmed
with an encryption context (key, algorithm, data unit size, etc.) at any time.
One can specify a keyslot in a data request made to the device, and the
device will en/decrypt the data using the encryption context programmed into
that specified keyslot. When possible, we want to make multiple requests with
the same encryption context share the same keyslot.
- We need a way for filesystems to specify an encryption context to use for
en/decrypting a struct bio, and a device driver (like UFS) needs to be able
to use that encryption context when it processes the bio.
- We need a way for device drivers to expose their capabilities in a unified
way to the upper layers.
Design
======
We add a struct bio_crypt_ctx to struct bio that can represent an
encryption context, because we need to be able to pass this encryption
context from the FS layer to the device driver to act upon.
While IE hardware works on the notion of keyslots, the FS layer has no
knowledge of keyslots - it simply wants to specify an encryption context to
use while en/decrypting a bio.
We introduce a keyslot manager (KSM) that handles the translation from
encryption contexts specified by the FS to keyslots on the IE hardware.
This KSM also serves as the way IE hardware can expose their capabilities to
upper layers. The generic mode of operation is: each device driver that wants
to support IE will construct a KSM and set it up in its struct request_queue.
Upper layers that want to use IE on this device can then use this KSM in
the device's struct request_queue to translate an encryption context into
a keyslot. The presence of the KSM in the request queue shall be used to mean
that the device supports IE.
On the device driver end of the interface, the device driver needs to tell the
KSM how to actually manipulate the IE hardware in the device to do things like
programming the crypto key into the IE hardware into a particular keyslot. All
this is achieved through the :c:type:`struct keyslot_mgmt_ll_ops` that the
device driver passes to the KSM when creating it.
It uses refcounts to track which keyslots are idle (either they have no
encryption context programmed, or there are no in-flight struct bios
referencing that keyslot). When a new encryption context needs a keyslot, it
tries to find a keyslot that has already been programmed with the same
encryption context, and if there is no such keyslot, it evicts the least
recently used idle keyslot and programs the new encryption context into that
one. If no idle keyslots are available, then the caller will sleep until there
is at least one.
Blk-crypto
==========
The above is sufficient for simple cases, but does not work if there is a
need for a crypto API fallback, or if we are want to use IE with layered
devices. To these ends, we introduce blk-crypto. Blk-crypto allows us to
present a unified view of encryption to the FS (so FS only needs to specify
an encryption context and not worry about keyslots at all), and blk-crypto
can decide whether to delegate the en/decryption to IE hardware or to the
crypto API. Blk-crypto maintains an internal KSM that serves as the crypto
API fallback.
Blk-crypto needs to ensure that the encryption context is programmed into the
"correct" keyslot manager for IE. If a bio is submitted to a layered device
that eventually passes the bio down to a device that really does support IE, we
want the encryption context to be programmed into a keyslot for the KSM of the
device with IE support. However, blk-crypto does not know a priori whether a
particular device is the final device in the layering structure for a bio or
not. So in the case that a particular device does not support IE, since it is
possibly the final destination device for the bio, if the bio requires
encryption (i.e. the bio is doing a write operation), blk-crypto must fallback
to the crypto API *before* sending the bio to the device.
Blk-crypto ensures that:
- The bio's encryption context is programmed into a keyslot in the KSM of the
request queue that the bio is being submitted to (or the crypto API fallback
KSM if the request queue doesn't have a KSM), and that the ``bc_ksm``
in the ``bi_crypt_context`` is set to this KSM
- That the bio has its own individual reference to the keyslot in this KSM.
Once the bio passes through blk-crypto, its encryption context is programmed
in some KSM. The "its own individual reference to the keyslot" ensures that
keyslots can be released by each bio independently of other bios while
ensuring that the bio has a valid reference to the keyslot when, for e.g., the
crypto API fallback KSM in blk-crypto performs crypto on the device's behalf.
The individual references are ensured by increasing the refcount for the
keyslot in the ``bc_ksm`` when a bio with a programmed encryption
context is cloned.
What blk-crypto does on bio submission
--------------------------------------
**Case 1:** blk-crypto is given a bio with only an encryption context that hasn't
been programmed into any keyslot in any KSM (for e.g. a bio from the FS).
In this case, blk-crypto will program the encryption context into the KSM of the
request queue the bio is being submitted to (and if this KSM does not exist,
then it will program it into blk-crypto's internal KSM for crypto API
fallback). The KSM that this encryption context was programmed into is stored
as the ``bc_ksm`` in the bio's ``bi_crypt_context``.
**Case 2:** blk-crypto is given a bio whose encryption context has already been
programmed into a keyslot in the *crypto API fallback* KSM.
In this case, blk-crypto does nothing; it treats the bio as not having
specified an encryption context. Note that we cannot do here what we will do
in Case 3 because we would have already encrypted the bio via the crypto API
by this point.
**Case 3:** blk-crypto is given a bio whose encryption context has already been
programmed into a keyslot in some KSM (that is *not* the crypto API fallback
KSM).
In this case, blk-crypto first releases that keyslot from that KSM and then
treats the bio as in Case 1.
This way, when a device driver is processing a bio, it can be sure that
the bio's encryption context has been programmed into some KSM (either the
device driver's request queue's KSM, or blk-crypto's crypto API fallback KSM).
It then simply needs to check if the bio's ``bc_ksm`` is the device's
request queue's KSM. If so, then it should proceed with IE. If not, it should
simply do nothing with respect to crypto, because some other KSM (perhaps the
blk-crypto crypto API fallback KSM) is handling the en/decryption.
Blk-crypto will release the keyslot that is being held by the bio (and also
decrypt it if the bio is using the crypto API fallback KSM) once
``bio_remaining_done`` returns true for the bio.
Layered Devices
===============
Layered devices that wish to support IE need to create their own keyslot
manager for their request queue, and expose whatever functionality they choose.
When a layered device wants to pass a bio to another layer (either by
resubmitting the same bio, or by submitting a clone), it doesn't need to do
anything special because the bio (or the clone) will once again pass through
blk-crypto, which will work as described in Case 3. If a layered device wants
for some reason to do the IO by itself instead of passing it on to a child
device, but it also chose to expose IE capabilities by setting up a KSM in its
request queue, it is then responsible for en/decrypting the data itself. In
such cases, the device can choose to call the blk-crypto function
``blk_crypto_fallback_to_kernel_crypto_api`` (TODO: Not yet implemented), which will
cause the en/decryption to be done via the crypto API fallback.
Future Optimizations for layered devices
========================================
Creating a keyslot manager for the layered device uses up memory for each
keyslot, and in general, a layered device (like dm-linear) merely passes the
request on to a "child" device, so the keyslots in the layered device itself
might be completely unused. We can instead define a new type of KSM; the
"passthrough KSM", that layered devices can use to let blk-crypto know that
this layered device *will* pass the bio to some child device (and hence
through blk-crypto again, at which point blk-crypto can program the encryption
context, instead of programming it into the layered device's KSM). Again, if
the device "lies" and decides to do the IO itself instead of passing it on to
a child device, it is responsible for doing the en/decryption (and can choose
to call ``blk_crypto_fallback_to_kernel_crypto_api``). Another use case for the
"passthrough KSM" is for IE devices that want to manage their own keyslots/do
not have a limited number of keyslots.

View File

@@ -156,23 +156,19 @@ Per-device statistics are exported as various nodes under /sys/block/zram<id>/
A brief description of exported device attributes. For more details please
read Documentation/ABI/testing/sysfs-block-zram.
Name access description
---- ------ -----------
disksize RW show and set the device's disk size
initstate RO shows the initialization state of the device
reset WO trigger device reset
mem_used_max WO reset the `mem_used_max' counter (see later)
mem_limit WO specifies the maximum amount of memory ZRAM can use
to store the compressed data
writeback_limit WO specifies the maximum amount of write IO zram can
write out to backing device as 4KB unit
writeback_limit_enable RW show and set writeback_limit feature
max_comp_streams RW the number of possible concurrent compress operations
comp_algorithm RW show and change the compression algorithm
compact WO trigger memory compaction
debug_stat RO this file is used for zram debugging purposes
backing_dev RW set up backend storage for zram to write out
idle WO mark allocated slot as idle
Name access description
---- ------ -----------
disksize RW show and set the device's disk size
initstate RO shows the initialization state of the device
reset WO trigger device reset
mem_used_max WO reset the `mem_used_max' counter (see later)
mem_limit WO specifies the maximum amount of memory ZRAM can use
to store the compressed data
max_comp_streams RW the number of possible concurrent compress operations
comp_algorithm RW show and change the compression algorithm
compact WO trigger memory compaction
debug_stat RO this file is used for zram debugging purposes
backing_dev RW set up backend storage for zram to write out
User space is advised to use the following files to read the device statistics.
@@ -224,17 +220,6 @@ line of text and contains the following stats separated by whitespace:
pages_compacted the number of pages freed during compaction
huge_pages the number of incompressible pages
File /sys/block/zram<id>/bd_stat
The stat file represents device's backing device statistics. It consists of
a single line of text and contains the following stats separated by whitespace:
bd_count size of data written in backing device.
Unit: 4K bytes
bd_reads the number of reads from backing device
Unit: 4K bytes
bd_writes the number of writes to backing device
Unit: 4K bytes
9) Deactivate:
swapoff /dev/zram0
umount /dev/zram1
@@ -252,79 +237,11 @@ a single line of text and contains the following stats separated by whitespace:
= writeback
With CONFIG_ZRAM_WRITEBACK, zram can write idle/incompressible page
With incompressible pages, there is no memory saving with zram.
Instead, with CONFIG_ZRAM_WRITEBACK, zram can write incompressible page
to backing storage rather than keeping it in memory.
To use the feature, admin should set up backing device via
"echo /dev/sda5 > /sys/block/zramX/backing_dev"
before disksize setting. It supports only partition at this moment.
If admin want to use incompressible page writeback, they could do via
"echo huge > /sys/block/zramX/write"
To use idle page writeback, first, user need to declare zram pages
as idle.
"echo all > /sys/block/zramX/idle"
From now on, any pages on zram are idle pages. The idle mark
will be removed until someone request access of the block.
IOW, unless there is access request, those pages are still idle pages.
Admin can request writeback of those idle pages at right timing via
"echo idle > /sys/block/zramX/writeback"
With the command, zram writeback idle pages from memory to the storage.
If there are lots of write IO with flash device, potentially, it has
flash wearout problem so that admin needs to design write limitation
to guarantee storage health for entire product life.
To overcome the concern, zram supports "writeback_limit" feature.
The "writeback_limit_enable"'s default value is 0 so that it doesn't limit
any writeback. IOW, if admin want to apply writeback budget, he should
enable writeback_limit_enable via
$ echo 1 > /sys/block/zramX/writeback_limit_enable
Once writeback_limit_enable is set, zram doesn't allow any writeback
until admin set the budget via /sys/block/zramX/writeback_limit.
(If admin doesn't enable writeback_limit_enable, writeback_limit's value
assigned via /sys/block/zramX/writeback_limit is meaninless.)
If admin want to limit writeback as per-day 400M, he could do it
like below.
$ MB_SHIFT=20
$ 4K_SHIFT=12
$ echo $((400<<MB_SHIFT>>4K_SHIFT)) > \
/sys/block/zram0/writeback_limit.
$ echo 1 > /sys/block/zram0/writeback_limit_enable
If admin want to allow further write again once the bugdet is exausted,
he could do it like below
$ echo $((400<<MB_SHIFT>>4K_SHIFT)) > \
/sys/block/zram0/writeback_limit
If admin want to see remaining writeback budget since he set,
$ cat /sys/block/zramX/writeback_limit
If admin want to disable writeback limit, he could do
$ echo 0 > /sys/block/zramX/writeback_limit_enable
The writeback_limit count will reset whenever you reset zram(e.g.,
system reboot, echo 1 > /sys/block/zramX/reset) so keeping how many of
writeback happened until you reset the zram to allocate extra writeback
budget in next setting is user's job.
If admin want to measure writeback count in a certain period, he could
know it via /sys/block/zram0/bd_stat's 3rd column.
User should set up backing device via /sys/block/zramX/backing_dev
before disksize setting.
= memory tracking
@@ -334,17 +251,16 @@ pages of the process with*pagemap.
If you enable the feature, you could see block state via
/sys/kernel/debug/zram/zram0/block_state". The output is as follows,
300 75.033841 .wh.
301 63.806904 s...
302 63.806919 ..hi
300 75.033841 .wh
301 63.806904 s..
302 63.806919 ..h
First column is zram's block index.
Second column is access time since the system was booted
Third column is state of the block.
(s: same page
w: written page to backing store
h: huge page
i: idle page)
h: huge page)
First line of above example says 300th block is accessed at 75.033841sec
and the block's state is huge so it is written back to the backing

View File

@@ -37,7 +37,7 @@ needs_sphinx = '1.3'
extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain', 'kfigure', 'sphinx.ext.ifconfig']
# The name of the math extension changed on Sphinx 1.4
if (major == 1 and minor > 3) or (major > 1):
if major == 1 and minor > 3:
extensions.append("sphinx.ext.imgmath")
else:
extensions.append("sphinx.ext.pngmath")

View File

@@ -99,20 +99,16 @@ Coarse and fast_ns access
Some additional variants exist for more specialized cases:
.. c:function:: ktime_t ktime_get_coarse( void )
ktime_t ktime_get_coarse_boottime( void )
.. c:function:: ktime_t ktime_get_coarse_boottime( void )
ktime_t ktime_get_coarse_real( void )
ktime_t ktime_get_coarse_clocktai( void )
.. c:function:: u64 ktime_get_coarse_ns( void )
u64 ktime_get_coarse_boottime_ns( void )
u64 ktime_get_coarse_real_ns( void )
u64 ktime_get_coarse_clocktai_ns( void )
ktime_t ktime_get_coarse_raw( void )
.. c:function:: void ktime_get_coarse_ts64( struct timespec64 * )
void ktime_get_coarse_boottime_ts64( struct timespec64 * )
void ktime_get_coarse_real_ts64( struct timespec64 * )
void ktime_get_coarse_clocktai_ts64( struct timespec64 * )
void ktime_get_coarse_raw_ts64( struct timespec64 * )
These are quicker than the non-coarse versions, but less accurate,
corresponding to CLOCK_MONONOTNIC_COARSE and CLOCK_REALTIME_COARSE

View File

@@ -34,6 +34,10 @@ Configure the kernel with::
CONFIG_DEBUG_FS=y
CONFIG_GCOV_KERNEL=y
select the gcc's gcov format, default is autodetect based on gcc version::
CONFIG_GCOV_FORMAT_AUTODETECT=y
and to get coverage data for the entire kernel::
CONFIG_GCOV_PROFILE_ALL=y
@@ -165,20 +169,6 @@ b) gcov is run on the BUILD machine
[user@build] gcov -o /tmp/coverage/tmp/out/init main.c
Note on compilers
-----------------
GCC and LLVM gcov tools are not necessarily compatible. Use gcov_ to work with
GCC-generated .gcno and .gcda files, and use llvm-cov_ for Clang.
.. _gcov: http://gcc.gnu.org/onlinedocs/gcc/Gcov.html
.. _llvm-cov: https://llvm.org/docs/CommandGuide/llvm-cov.html
Build differences between GCC and Clang gcov are handled by Kconfig. It
automatically selects the appropriate gcov format depending on the detected
toolchain.
Troubleshooting
---------------

View File

@@ -4,25 +4,15 @@ The Kernel Address Sanitizer (KASAN)
Overview
--------
KernelAddressSANitizer (KASAN) is a dynamic memory error detector designed to
find out-of-bound and use-after-free bugs. KASAN has two modes: generic KASAN
(similar to userspace ASan) and software tag-based KASAN (similar to userspace
HWASan).
KernelAddressSANitizer (KASAN) is a dynamic memory error detector. It provides
a fast and comprehensive solution for finding use-after-free and out-of-bounds
bugs.
KASAN uses compile-time instrumentation to insert validity checks before every
memory access, and therefore requires a compiler version that supports that.
KASAN uses compile-time instrumentation for checking every memory access,
therefore you will need a GCC version 4.9.2 or later. GCC 5.0 or later is
required for detection of out-of-bounds accesses to stack or global variables.
Generic KASAN is supported in both GCC and Clang. With GCC it requires version
4.9.2 or later for basic support and version 5.0 or later for detection of
out-of-bounds accesses for stack and global variables and for inline
instrumentation mode (see the Usage section). With Clang it requires version
7.0.0 or later and it doesn't support detection of out-of-bounds accesses for
global variables yet.
Tag-based KASAN is only supported in Clang and requires version 7.0.0 or later.
Currently generic KASAN is supported for the x86_64, arm64, xtensa and s390
architectures, and tag-based KASAN is supported only for arm64.
Currently KASAN is supported only for the x86_64 and arm64 architectures.
Usage
-----
@@ -31,14 +21,12 @@ To enable KASAN configure kernel with::
CONFIG_KASAN = y
and choose between CONFIG_KASAN_GENERIC (to enable generic KASAN) and
CONFIG_KASAN_SW_TAGS (to enable software tag-based KASAN).
and choose between CONFIG_KASAN_OUTLINE and CONFIG_KASAN_INLINE. Outline and
inline are compiler instrumentation types. The former produces smaller binary
the latter is 1.1 - 2 times faster. Inline instrumentation requires a GCC
version 5.0 or later.
You also need to choose between CONFIG_KASAN_OUTLINE and CONFIG_KASAN_INLINE.
Outline and inline are compiler instrumentation types. The former produces
smaller binary while the latter is 1.1 - 2 times faster.
Both KASAN modes work with both SLUB and SLAB memory allocators.
KASAN works with both SLUB and SLAB memory allocators.
For better bug detection and nicer reporting, enable CONFIG_STACKTRACE.
To disable instrumentation for specific files or directories, add a line
@@ -55,85 +43,85 @@ similar to the following to the respective kernel Makefile:
Error reports
~~~~~~~~~~~~~
A typical out-of-bounds access generic KASAN report looks like this::
A typical out of bounds access report looks like this::
==================================================================
BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right+0xa8/0xbc [test_kasan]
Write of size 1 at addr ffff8801f44ec37b by task insmod/2760
BUG: AddressSanitizer: out of bounds access in kmalloc_oob_right+0x65/0x75 [test_kasan] at addr ffff8800693bc5d3
Write of size 1 by task modprobe/1689
=============================================================================
BUG kmalloc-128 (Not tainted): kasan error
-----------------------------------------------------------------------------
CPU: 1 PID: 2760 Comm: insmod Not tainted 4.19.0-rc3+ #698
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Disabling lock debugging due to kernel taint
INFO: Allocated in kmalloc_oob_right+0x3d/0x75 [test_kasan] age=0 cpu=0 pid=1689
__slab_alloc+0x4b4/0x4f0
kmem_cache_alloc_trace+0x10b/0x190
kmalloc_oob_right+0x3d/0x75 [test_kasan]
init_module+0x9/0x47 [test_kasan]
do_one_initcall+0x99/0x200
load_module+0x2cb3/0x3b20
SyS_finit_module+0x76/0x80
system_call_fastpath+0x12/0x17
INFO: Slab 0xffffea0001a4ef00 objects=17 used=7 fp=0xffff8800693bd728 flags=0x100000000004080
INFO: Object 0xffff8800693bc558 @offset=1368 fp=0xffff8800693bc720
Bytes b4 ffff8800693bc548: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
Object ffff8800693bc558: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff8800693bc568: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff8800693bc578: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff8800693bc588: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff8800693bc598: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff8800693bc5a8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff8800693bc5b8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff8800693bc5c8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
Redzone ffff8800693bc5d8: cc cc cc cc cc cc cc cc ........
Padding ffff8800693bc718: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
CPU: 0 PID: 1689 Comm: modprobe Tainted: G B 3.18.0-rc1-mm1+ #98
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
ffff8800693bc000 0000000000000000 ffff8800693bc558 ffff88006923bb78
ffffffff81cc68ae 00000000000000f3 ffff88006d407600 ffff88006923bba8
ffffffff811fd848 ffff88006d407600 ffffea0001a4ef00 ffff8800693bc558
Call Trace:
dump_stack+0x94/0xd8
print_address_description+0x73/0x280
kasan_report+0x144/0x187
__asan_report_store1_noabort+0x17/0x20
kmalloc_oob_right+0xa8/0xbc [test_kasan]
kmalloc_tests_init+0x16/0x700 [test_kasan]
do_one_initcall+0xa5/0x3ae
do_init_module+0x1b6/0x547
load_module+0x75df/0x8070
__do_sys_init_module+0x1c6/0x200
__x64_sys_init_module+0x6e/0xb0
do_syscall_64+0x9f/0x2c0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f96443109da
RSP: 002b:00007ffcf0b51b08 EFLAGS: 00000202 ORIG_RAX: 00000000000000af
RAX: ffffffffffffffda RBX: 000055dc3ee521a0 RCX: 00007f96443109da
RDX: 00007f96445cff88 RSI: 0000000000057a50 RDI: 00007f9644992000
RBP: 000055dc3ee510b0 R08: 0000000000000003 R09: 0000000000000000
R10: 00007f964430cd0a R11: 0000000000000202 R12: 00007f96445cff88
R13: 000055dc3ee51090 R14: 0000000000000000 R15: 0000000000000000
Allocated by task 2760:
save_stack+0x43/0xd0
kasan_kmalloc+0xa7/0xd0
kmem_cache_alloc_trace+0xe1/0x1b0
kmalloc_oob_right+0x56/0xbc [test_kasan]
kmalloc_tests_init+0x16/0x700 [test_kasan]
do_one_initcall+0xa5/0x3ae
do_init_module+0x1b6/0x547
load_module+0x75df/0x8070
__do_sys_init_module+0x1c6/0x200
__x64_sys_init_module+0x6e/0xb0
do_syscall_64+0x9f/0x2c0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 815:
save_stack+0x43/0xd0
__kasan_slab_free+0x135/0x190
kasan_slab_free+0xe/0x10
kfree+0x93/0x1a0
umh_complete+0x6a/0xa0
call_usermodehelper_exec_async+0x4c3/0x640
ret_from_fork+0x35/0x40
The buggy address belongs to the object at ffff8801f44ec300
which belongs to the cache kmalloc-128 of size 128
The buggy address is located 123 bytes inside of
128-byte region [ffff8801f44ec300, ffff8801f44ec380)
The buggy address belongs to the page:
page:ffffea0007d13b00 count:1 mapcount:0 mapping:ffff8801f7001640 index:0x0
flags: 0x200000000000100(slab)
raw: 0200000000000100 ffffea0007d11dc0 0000001a0000001a ffff8801f7001640
raw: 0000000000000000 0000000080150015 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
[<ffffffff81cc68ae>] dump_stack+0x46/0x58
[<ffffffff811fd848>] print_trailer+0xf8/0x160
[<ffffffffa00026a7>] ? kmem_cache_oob+0xc3/0xc3 [test_kasan]
[<ffffffff811ff0f5>] object_err+0x35/0x40
[<ffffffffa0002065>] ? kmalloc_oob_right+0x65/0x75 [test_kasan]
[<ffffffff8120b9fa>] kasan_report_error+0x38a/0x3f0
[<ffffffff8120a79f>] ? kasan_poison_shadow+0x2f/0x40
[<ffffffff8120b344>] ? kasan_unpoison_shadow+0x14/0x40
[<ffffffff8120a79f>] ? kasan_poison_shadow+0x2f/0x40
[<ffffffffa00026a7>] ? kmem_cache_oob+0xc3/0xc3 [test_kasan]
[<ffffffff8120a995>] __asan_store1+0x75/0xb0
[<ffffffffa0002601>] ? kmem_cache_oob+0x1d/0xc3 [test_kasan]
[<ffffffffa0002065>] ? kmalloc_oob_right+0x65/0x75 [test_kasan]
[<ffffffffa0002065>] kmalloc_oob_right+0x65/0x75 [test_kasan]
[<ffffffffa00026b0>] init_module+0x9/0x47 [test_kasan]
[<ffffffff810002d9>] do_one_initcall+0x99/0x200
[<ffffffff811e4e5c>] ? __vunmap+0xec/0x160
[<ffffffff81114f63>] load_module+0x2cb3/0x3b20
[<ffffffff8110fd70>] ? m_show+0x240/0x240
[<ffffffff81115f06>] SyS_finit_module+0x76/0x80
[<ffffffff81cd3129>] system_call_fastpath+0x12/0x17
Memory state around the buggy address:
ffff8801f44ec200: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
ffff8801f44ec280: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff8801f44ec300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03
^
ffff8801f44ec380: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
ffff8801f44ec400: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff8800693bc300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8800693bc380: fc fc 00 00 00 00 00 00 00 00 00 00 00 00 00 fc
ffff8800693bc400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8800693bc480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8800693bc500: fc fc fc fc fc fc fc fc fc fc fc 00 00 00 00 00
>ffff8800693bc580: 00 00 00 00 00 00 00 00 00 00 03 fc fc fc fc fc
^
ffff8800693bc600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8800693bc680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8800693bc700: fc fc fc fc fb fb fb fb fb fb fb fb fb fb fb fb
ffff8800693bc780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8800693bc800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
The header of the report provides a short summary of what kind of bug happened
and what kind of access caused it. It's followed by a stack trace of the bad
access, a stack trace of where the accessed memory was allocated (in case bad
access happens on a slab object), and a stack trace of where the object was
freed (in case of a use-after-free bug report). Next comes a description of
the accessed slab object and information about the accessed memory page.
The header of the report discribe what kind of bug happened and what kind of
access caused it. It's followed by the description of the accessed slub object
(see 'SLUB Debug output' section in Documentation/vm/slub.rst for details) and
the description of the accessed memory page.
In the last section the report shows memory state around the accessed address.
Reading this part requires some understanding of how KASAN works.
@@ -150,24 +138,18 @@ inaccessible memory like redzones or freed memory (see mm/kasan/kasan.h).
In the report above the arrows point to the shadow byte 03, which means that
the accessed address is partially accessible.
For tag-based KASAN this last report section shows the memory tags around the
accessed address (see Implementation details section).
Implementation details
----------------------
Generic KASAN
~~~~~~~~~~~~~
From a high level, our approach to memory error detection is similar to that
of kmemcheck: use shadow memory to record whether each byte of memory is safe
to access, and use compile-time instrumentation to insert checks of shadow
memory on each memory access.
to access, and use compile-time instrumentation to check shadow memory on each
memory access.
Generic KASAN dedicates 1/8th of kernel memory to its shadow memory (e.g. 16TB
to cover 128TB on x86_64) and uses direct mapping with a scale and offset to
translate a memory address to its corresponding shadow address.
AddressSanitizer dedicates 1/8 of kernel memory to its shadow memory
(e.g. 16TB to cover 128TB on x86_64) and uses direct mapping with a scale and
offset to translate a memory address to its corresponding shadow address.
Here is the function which translates an address to its corresponding shadow
address::
@@ -180,38 +162,12 @@ address::
where ``KASAN_SHADOW_SCALE_SHIFT = 3``.
Compile-time instrumentation is used to insert memory access checks. Compiler
inserts function calls (__asan_load*(addr), __asan_store*(addr)) before each
memory access of size 1, 2, 4, 8 or 16. These functions check whether memory
access is valid or not by checking corresponding shadow memory.
Compile-time instrumentation used for checking memory accesses. Compiler inserts
function calls (__asan_load*(addr), __asan_store*(addr)) before each memory
access of size 1, 2, 4, 8 or 16. These functions check whether memory access is
valid or not by checking corresponding shadow memory.
GCC 5.0 has possibility to perform inline instrumentation. Instead of making
function calls GCC directly inserts the code to check the shadow memory.
This option significantly enlarges kernel but it gives x1.1-x2 performance
boost over outline instrumented kernel.
Software tag-based KASAN
~~~~~~~~~~~~~~~~~~~~~~~~
Tag-based KASAN uses the Top Byte Ignore (TBI) feature of modern arm64 CPUs to
store a pointer tag in the top byte of kernel pointers. Like generic KASAN it
uses shadow memory to store memory tags associated with each 16-byte memory
cell (therefore it dedicates 1/16th of the kernel memory for shadow memory).
On each memory allocation tag-based KASAN generates a random tag, tags the
allocated memory with this tag, and embeds this tag into the returned pointer.
Software tag-based KASAN uses compile-time instrumentation to insert checks
before each memory access. These checks make sure that tag of the memory that
is being accessed is equal to tag of the pointer that is used to access this
memory. In case of a tag mismatch tag-based KASAN prints a bug report.
Software tag-based KASAN also has two instrumentation modes (outline, that
emits callbacks to check memory accesses; and inline, that performs the shadow
memory checks inline). With outline instrumentation mode, a bug report is
simply printed from the function that performs the access check. With inline
instrumentation a brk instruction is emitted by the compiler, and a dedicated
brk handler is used to print bug reports.
A potential expansion of this mode is a hardware tag-based mode, which would
use hardware memory tagging support instead of compiler instrumentation and
manual shadow memory manipulation.

View File

@@ -34,7 +34,6 @@ Profiling data will only become accessible once debugfs has been mounted::
Coverage collection
-------------------
The following program demonstrates coverage collection from within a test
program using kcov:
@@ -129,7 +128,6 @@ only need to enable coverage (disable happens automatically on thread end).
Comparison operands collection
------------------------------
Comparison operands collection is similar to coverage collection:
.. code-block:: c
@@ -204,130 +202,3 @@ Comparison operands collection is similar to coverage collection:
Note that the kcov modes (coverage collection or comparison operands) are
mutually exclusive.
Remote coverage collection
--------------------------
With KCOV_ENABLE coverage is collected only for syscalls that are issued
from the current process. With KCOV_REMOTE_ENABLE it's possible to collect
coverage for arbitrary parts of the kernel code, provided that those parts
are annotated with kcov_remote_start()/kcov_remote_stop().
This allows to collect coverage from two types of kernel background
threads: the global ones, that are spawned during kernel boot in a limited
number of instances (e.g. one USB hub_event() worker thread is spawned per
USB HCD); and the local ones, that are spawned when a user interacts with
some kernel interface (e.g. vhost workers).
To enable collecting coverage from a global background thread, a unique
global handle must be assigned and passed to the corresponding
kcov_remote_start() call. Then a userspace process can pass a list of such
handles to the KCOV_REMOTE_ENABLE ioctl in the handles array field of the
kcov_remote_arg struct. This will attach the used kcov device to the code
sections, that are referenced by those handles.
Since there might be many local background threads spawned from different
userspace processes, we can't use a single global handle per annotation.
Instead, the userspace process passes a non-zero handle through the
common_handle field of the kcov_remote_arg struct. This common handle gets
saved to the kcov_handle field in the current task_struct and needs to be
passed to the newly spawned threads via custom annotations. Those threads
should in turn be annotated with kcov_remote_start()/kcov_remote_stop().
Internally kcov stores handles as u64 integers. The top byte of a handle
is used to denote the id of a subsystem that this handle belongs to, and
the lower 4 bytes are used to denote the id of a thread instance within
that subsystem. A reserved value 0 is used as a subsystem id for common
handles as they don't belong to a particular subsystem. The bytes 4-7 are
currently reserved and must be zero. In the future the number of bytes
used for the subsystem or handle ids might be increased.
When a particular userspace proccess collects coverage by via a common
handle, kcov will collect coverage for each code section that is annotated
to use the common handle obtained as kcov_handle from the current
task_struct. However non common handles allow to collect coverage
selectively from different subsystems.
.. code-block:: c
struct kcov_remote_arg {
__u32 trace_mode;
__u32 area_size;
__u32 num_handles;
__aligned_u64 common_handle;
__aligned_u64 handles[0];
};
#define KCOV_INIT_TRACE _IOR('c', 1, unsigned long)
#define KCOV_DISABLE _IO('c', 101)
#define KCOV_REMOTE_ENABLE _IOW('c', 102, struct kcov_remote_arg)
#define COVER_SIZE (64 << 10)
#define KCOV_TRACE_PC 0
#define KCOV_SUBSYSTEM_COMMON (0x00ull << 56)
#define KCOV_SUBSYSTEM_USB (0x01ull << 56)
#define KCOV_SUBSYSTEM_MASK (0xffull << 56)
#define KCOV_INSTANCE_MASK (0xffffffffull)
static inline __u64 kcov_remote_handle(__u64 subsys, __u64 inst)
{
if (subsys & ~KCOV_SUBSYSTEM_MASK || inst & ~KCOV_INSTANCE_MASK)
return 0;
return subsys | inst;
}
#define KCOV_COMMON_ID 0x42
#define KCOV_USB_BUS_NUM 1
int main(int argc, char **argv)
{
int fd;
unsigned long *cover, n, i;
struct kcov_remote_arg *arg;
fd = open("/sys/kernel/debug/kcov", O_RDWR);
if (fd == -1)
perror("open"), exit(1);
if (ioctl(fd, KCOV_INIT_TRACE, COVER_SIZE))
perror("ioctl"), exit(1);
cover = (unsigned long*)mmap(NULL, COVER_SIZE * sizeof(unsigned long),
PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if ((void*)cover == MAP_FAILED)
perror("mmap"), exit(1);
/* Enable coverage collection via common handle and from USB bus #1. */
arg = calloc(1, sizeof(*arg) + sizeof(uint64_t));
if (!arg)
perror("calloc"), exit(1);
arg->trace_mode = KCOV_TRACE_PC;
arg->area_size = COVER_SIZE;
arg->num_handles = 1;
arg->common_handle = kcov_remote_handle(KCOV_SUBSYSTEM_COMMON,
KCOV_COMMON_ID);
arg->handles[0] = kcov_remote_handle(KCOV_SUBSYSTEM_USB,
KCOV_USB_BUS_NUM);
if (ioctl(fd, KCOV_REMOTE_ENABLE, arg))
perror("ioctl"), free(arg), exit(1);
free(arg);
/*
* Here the user needs to trigger execution of a kernel code section
* that is either annotated with the common handle, or to trigger some
* activity on USB bus #1.
*/
sleep(2);
n = __atomic_load_n(&cover[0], __ATOMIC_RELAXED);
for (i = 0; i < n; i++)
printf("0x%lx\n", cover[i + 1]);
if (ioctl(fd, KCOV_DISABLE, 0))
perror("ioctl"), exit(1);
if (munmap(cover, COVER_SIZE * sizeof(unsigned long)))
perror("munmap"), exit(1);
if (close(fd))
perror("close"), exit(1);
return 0;
}

View File

@@ -1,99 +0,0 @@
dm_bow (backup on write)
========================
dm_bow is a device mapper driver that uses the free space on a device to back up
data that is overwritten. The changes can then be committed by a simple state
change, or rolled back by removing the dm_bow device and running a command line
utility over the underlying device.
dm_bow has three states, set by writing 1 or 2 to /sys/block/dm-?/bow/state.
It is only possible to go from state 0 (initial state) to state 1, and then from
state 1 to state 2.
State 0: dm_bow collects all trims to the device and assumes that these mark
free space on the overlying file system that can be safely used. Typically the
mount code would create the dm_bow device, mount the file system, call the
FITRIM ioctl on the file system then switch to state 1. These trims are not
propagated to the underlying device.
State 1: All writes to the device cause the underlying data to be backed up to
the free (trimmed) area as needed in such a way as they can be restored.
However, the writes, with one exception, then happen exactly as they would
without dm_bow, so the device is always in a good final state. The exception is
that sector 0 is used to keep a log of the latest changes, both to indicate that
we are in this state and to allow rollback. See below for all details. If there
isn't enough free space, writes are failed with -ENOSPC.
State 2: The transition to state 2 triggers replacing the special sector 0 with
the normal sector 0, and the freeing of all state information. dm_bow then
becomes a pass-through driver, allowing the device to continue to be used with
minimal performance impact.
Usage
=====
dm-bow takes one command line parameter, the name of the underlying device.
dm-bow will typically be used in the following way. dm-bow will be loaded with a
suitable underlying device and the resultant device will be mounted. A file
system trim will be issued via the FITRIM ioctl, then the device will be
switched to state 1. The file system will now be used as normal. At some point,
the changes can either be committed by switching to state 2, or rolled back by
unmounting the file system, removing the dm-bow device and running the command
line utility. Note that rebooting the device will be equivalent to unmounting
and removing, but the command line utility must still be run
Details of operation in state 1
===============================
dm_bow maintains a type for all sectors. A sector can be any of:
SECTOR0
SECTOR0_CURRENT
UNCHANGED
FREE
CHANGED
BACKUP
SECTOR0 is the first sector on the device, and is used to hold the log of
changes. This is the one exception.
SECTOR0_CURRENT is a sector picked from the FREE sectors, and is where reads and
writes from the true sector zero are redirected to. Note that like any backup
sector, if the sector is written to directly, it must be moved again.
UNCHANGED means that the sector has not been changed since we entered state 1.
Thus if it is written to or trimmed, the contents must first be backed up.
FREE means that the sector was trimmed in state 0 and has not yet been written
to or used for backup. On being written to, a FREE sector is changed to CHANGED.
CHANGED means that the sector has been modified, and can be further modified
without further backup.
BACKUP means that this is a free sector being used as a backup. On being written
to, the contents must first be backed up again.
All backup operations are logged to the first sector. The log sector has the
format:
--------------------------------------------------------
| Magic | Count | Sequence | Log entry | Log entry | …
--------------------------------------------------------
Magic is a magic number. Count is the number of log entries. Sequence is 0
initially. A log entry is
-----------------------------------
| Source | Dest | Size | Checksum |
-----------------------------------
When SECTOR0 is full, the log sector is backed up and another empty log sector
created with sequence number one higher. The first entry in any log entry with
sequence > 0 therefore must be the log of the backing up of the previous log
sector. Note that sequence is not strictly needed, but is a useful sanity check
and potentially limits the time spent trying to restore a corrupted snapshot.
On entering state 1, dm_bow has a list of free sectors. All other sectors are
unchanged. Sector0_current is selected from the free sectors and the contents of
sector 0 are copied there. The sector 0 is backed up, which triggers the first
log entry to be written.

View File

@@ -146,13 +146,6 @@ block_size:number
Supported values are 512, 1024, 2048 and 4096 bytes. If not
specified the default block size is 512 bytes.
legacy_recalculate
Allow recalculating of volumes with HMAC keys. This is disabled by
default for security reasons - an attacker could modify the volume,
set recalc_sector to zero, and the kernel would not detect the
modification.
The journal mode (D/J), buffer_sectors, journal_watermark, commit_time can
be changed when reloading the target (load an inactive table and swap the
tables with suspend and resume). The other arguments should not be changed

View File

@@ -75,15 +75,6 @@ its hardware characteristcs.
* port or ports: same as above.
* Optional properties for all components:
* arm,coresight-loses-context-with-cpu : boolean. Indicates that the
hardware will lose register context on CPU power down (e.g. CPUIdle).
An example of where this may be needed are systems which contain a
coresight component and CPU in the same power domain. When the CPU
powers down the coresight component also powers down and loses its
context. This property is currently only used for the ETM 4.x driver.
* Optional properties for ETM/PTMs:
* arm,cp14: must be present if the system accesses ETM/PTM management

View File

@@ -164,10 +164,6 @@ Rockchip platforms device tree bindings
Required root node properties:
- compatible = "rockchip,px3-evb", "rockchip,px3", "rockchip,rk3188";
- Rockchip PX30 Evaluation board (Linux):
Required root node properties:
- compatible = "rockchip,px30-evb-ddr3-v11-linux", "rockchip,px30";
- Rockchip PX5 Evaluation board:
Required root node properties:
- compatible = "rockchip,px5-evb", "rockchip,px5", "rockchip,rk3368";
@@ -191,17 +187,10 @@ Rockchip platforms device tree bindings
- Rockchip RK3229 Evaluation board:
- compatible = "rockchip,rk3229-evb", "rockchip,rk3229";
- Rockchip RK3229 Echo board:
- compatible = "rockchip,rk3229-echo", "rockchip,rk3229";
- Rockchip RK3288 Fennec board:
Required root node properties:
- compatible = "rockchip,rk3288-fennec", "rockchip,rk3288";
- Rockchip RK3326 f863 rkisp1 board:
Required root node properties:
- compatible = "rockchip,rk3326-863-lp3-v10-rkisp1", "rockchip,rk3326";
- Rockchip RK3328 evb:
Required root node properties:
- compatible = "rockchip,rk3328-evb", "rockchip,rk3328";
@@ -210,11 +199,6 @@ Rockchip platforms device tree bindings
Required root node properties:
- compatible = "rockchip,rk3399-evb", "rockchip,rk3399";
- Rockchip RK3399 evb ind board (Linux):
Required root node properties:
- compatible = "rockchip,linux", "rockchip,rk3399";
- compatible = "rockchip,rk3399-evb-ind-lpddr4-linux";
- Rockchip RK3399 Sapphire board standalone:
Required root node properties:
- compatible = "rockchip,rk3399-sapphire", "rockchip,rk3399";
@@ -223,15 +207,6 @@ Rockchip platforms device tree bindings
Required root node properties:
- compatible = "rockchip,rk3399-sapphire-excavator", "rockchip,rk3399";
- Rockchip RK3399pro evb:
Required root node properties:
- compatible = "rockchip,rk3399pro-evb-v10", "rockchip,rk3399pro";
- compatible = "rockchip,rk3399pro-evb-v10-linux", "rockchip,rk3399pro";
- compatible = "rockchip,rk3399pro-evb-v11-linux", "rockchip,rk3399pro";
- compatible = "rockchip,rk3399pro-evb-v13-linux", "rockchip,rk3399pro";
- compatible = "rockchip,rk3399pro-evb-v13-multi-cam", "rockchip,rk3399pro";
- compatible = "rockchip,rk3399pro-evb-v14-linux", "rockchip,rk3399pro";
- Theobroma Systems RK3368-uQ7 Haikou Baseboard:
Required root node properties:
- compatible = "tsd,rk3368-uq7-haikou", "rockchip,rk3368";
@@ -243,7 +218,3 @@ Rockchip platforms device tree bindings
- Tronsmart Orion R68 Meta
Required root node properties:
- compatible = "tronsmart,orion-r68-meta", "rockchip,rk3368";
- Rockchip RK3568 NVR Demo V10:
Required root node properties:
- compatible = "rockchip,rk3568-nvr-demo-ddr4-v10-linux-spi-nand", "rockchip,rk3568";

View File

@@ -35,7 +35,6 @@ Required standard properties:
"ti,sysc-omap3-sham"
"ti,sysc-omap-aes"
"ti,sysc-mcasp"
"ti,sysc-dra7-mcasp"
"ti,sysc-usb-host-fs"
"ti,sysc-dra7-mcan"

View File

@@ -46,7 +46,7 @@ Required properties:
Example (R-Car H3):
usb2_clksel: clock-controller@e6590630 {
compatible = "renesas,r8a7795-rcar-usb2-clock-sel",
compatible = "renesas,r8a77950-rcar-usb2-clock-sel",
"renesas,rcar-gen3-usb2-clock-sel";
reg = <0 0xe6590630 0 0x02>;
clocks = <&cpg CPG_MOD 703>, <&usb_extal>, <&usb_xtal>;

View File

@@ -1,9 +1,7 @@
Rockchip Electronics And Security Accelerator
Required properties:
- compatible: should be one of the following:
- "rockchip,rk3288-crypto": for rk3288
- "rockchip,px30-crypto": for px30
- compatible: Should be "rockchip,rk3288-crypto"
- reg: Base physical address of the engine and length of memory mapped
region
- interrupts: Interrupt number

View File

@@ -1,22 +1,8 @@
* Rockchip DFI device
* Rockchip rk3399 DFI device
Required properties:
- compatible: Should be one of the following.
- "rockchip,px30-dfi" - for PX30 SoCs.
- "rockchip,rk1808-dfi" - for RK1808 SoCs.
- "rockchip,rk3128-dfi" - for RK3128 SoCs.
- "rockchip,rk3288-dfi" - for RK3288 SoCs.
- "rockchip,rk3328-dfi" - for RK3328 SoCs.
- "rockchip,rk3368-dfi" - for RK3368 SoCs.
- "rockchip,rk3399-dfi" - for RK3399 SoCs.
- "rockchip,rk3568-dfi" - for RK3568 SoCs.
- "rockchip,rv1126-dfi" - for RV1126 SoCs.
Required properties for RK3368:
- rockchip,grf: phandle to the syscon managing the "general register files"
Required properties for RK3399:
- compatible: Must be "rockchip,rk3399-dfi".
- reg: physical base address of each DFI and length of memory mapped region
- rockchip,pmu: phandle to the syscon managing the "pmu general register files"
- clocks: phandles for clock specified in "clock-names" property

View File

@@ -0,0 +1,213 @@
* Rockchip rk3399 DMC (Dynamic Memory Controller) device
Required properties:
- compatible: Must be "rockchip,rk3399-dmc".
- devfreq-events: Node to get DDR loading, Refer to
Documentation/devicetree/bindings/devfreq/event/
rockchip-dfi.txt
- clocks: Phandles for clock specified in "clock-names" property
- clock-names : The name of clock used by the DFI, must be
"pclk_ddr_mon";
- operating-points-v2: Refer to Documentation/devicetree/bindings/opp/opp.txt
for details.
- center-supply: DMC supply node.
- status: Marks the node enabled/disabled.
Optional properties:
- interrupts: The CPU interrupt number. The interrupt specifier
format depends on the interrupt controller.
It should be a DCF interrupt. When DDR DVFS finishes
a DCF interrupt is triggered.
Following properties relate to DDR timing:
- rockchip,dram_speed_bin : Value reference include/dt-bindings/clock/rk3399-ddr.h,
it selects the DDR3 cl-trp-trcd type. It must be
set according to "Speed Bin" in DDR3 datasheet,
DO NOT use a smaller "Speed Bin" than specified
for the DDR3 being used.
- rockchip,pd_idle : Configure the PD_IDLE value. Defines the
power-down idle period in which memories are
placed into power-down mode if bus is idle
for PD_IDLE DFI clock cycles.
- rockchip,sr_idle : Configure the SR_IDLE value. Defines the
self-refresh idle period in which memories are
placed into self-refresh mode if bus is idle
for SR_IDLE * 1024 DFI clock cycles (DFI
clocks freq is half of DRAM clock), default
value is "0".
- rockchip,sr_mc_gate_idle : Defines the memory self-refresh and controller
clock gating idle period. Memories are placed
into self-refresh mode and memory controller
clock arg gating started if bus is idle for
sr_mc_gate_idle*1024 DFI clock cycles.
- rockchip,srpd_lite_idle : Defines the self-refresh power down idle
period in which memories are placed into
self-refresh power down mode if bus is idle
for srpd_lite_idle * 1024 DFI clock cycles.
This parameter is for LPDDR4 only.
- rockchip,standby_idle : Defines the standby idle period in which
memories are placed into self-refresh mode.
The controller, pi, PHY and DRAM clock will
be gated if bus is idle for standby_idle * DFI
clock cycles.
- rockchip,dram_dll_dis_freq : Defines the DDR3 DLL bypass frequency in MHz.
When DDR frequency is less than DRAM_DLL_DISB_FREQ,
DDR3 DLL will be bypassed. Note: if DLL was bypassed,
the odt will also stop working.
- rockchip,phy_dll_dis_freq : Defines the PHY dll bypass frequency in
MHz (Mega Hz). When DDR frequency is less than
DRAM_DLL_DISB_FREQ, PHY DLL will be bypassed.
Note: PHY DLL and PHY ODT are independent.
- rockchip,ddr3_odt_dis_freq : When the DRAM type is DDR3, this parameter defines
the ODT disable frequency in MHz (Mega Hz).
when the DDR frequency is less then ddr3_odt_dis_freq,
the ODT on the DRAM side and controller side are
both disabled.
- rockchip,ddr3_drv : When the DRAM type is DDR3, this parameter defines
the DRAM side driver strength in ohms. Default
value is DDR3_DS_40ohm.
- rockchip,ddr3_odt : When the DRAM type is DDR3, this parameter defines
the DRAM side ODT strength in ohms. Default value
is DDR3_ODT_120ohm.
- rockchip,phy_ddr3_ca_drv : When the DRAM type is DDR3, this parameter defines
the phy side CA line (incluing command line,
address line and clock line) driver strength.
Default value is PHY_DRV_ODT_40.
- rockchip,phy_ddr3_dq_drv : When the DRAM type is DDR3, this parameter defines
the PHY side DQ line (including DQS/DQ/DM line)
driver strength. Default value is PHY_DRV_ODT_40.
- rockchip,phy_ddr3_odt : When the DRAM type is DDR3, this parameter defines
the PHY side ODT strength. Default value is
PHY_DRV_ODT_240.
- rockchip,lpddr3_odt_dis_freq : When the DRAM type is LPDDR3, this parameter defines
then ODT disable frequency in MHz (Mega Hz).
When DDR frequency is less then ddr3_odt_dis_freq,
the ODT on the DRAM side and controller side are
both disabled.
- rockchip,lpddr3_drv : When the DRAM type is LPDDR3, this parameter defines
the DRAM side driver strength in ohms. Default
value is LP3_DS_34ohm.
- rockchip,lpddr3_odt : When the DRAM type is LPDDR3, this parameter defines
the DRAM side ODT strength in ohms. Default value
is LP3_ODT_240ohm.
- rockchip,phy_lpddr3_ca_drv : When the DRAM type is LPDDR3, this parameter defines
the PHY side CA line (including command line,
address line and clock line) driver strength.
Default value is PHY_DRV_ODT_40.
- rockchip,phy_lpddr3_dq_drv : When the DRAM type is LPDDR3, this parameter defines
the PHY side DQ line (including DQS/DQ/DM line)
driver strength. Default value is
PHY_DRV_ODT_40.
- rockchip,phy_lpddr3_odt : When dram type is LPDDR3, this parameter define
the phy side odt strength, default value is
PHY_DRV_ODT_240.
- rockchip,lpddr4_odt_dis_freq : When the DRAM type is LPDDR4, this parameter
defines the ODT disable frequency in
MHz (Mega Hz). When the DDR frequency is less then
ddr3_odt_dis_freq, the ODT on the DRAM side and
controller side are both disabled.
- rockchip,lpddr4_drv : When the DRAM type is LPDDR4, this parameter defines
the DRAM side driver strength in ohms. Default
value is LP4_PDDS_60ohm.
- rockchip,lpddr4_dq_odt : When the DRAM type is LPDDR4, this parameter defines
the DRAM side ODT on DQS/DQ line strength in ohms.
Default value is LP4_DQ_ODT_40ohm.
- rockchip,lpddr4_ca_odt : When the DRAM type is LPDDR4, this parameter defines
the DRAM side ODT on CA line strength in ohms.
Default value is LP4_CA_ODT_40ohm.
- rockchip,phy_lpddr4_ca_drv : When the DRAM type is LPDDR4, this parameter defines
the PHY side CA line (including command address
line) driver strength. Default value is
PHY_DRV_ODT_40.
- rockchip,phy_lpddr4_ck_cs_drv : When the DRAM type is LPDDR4, this parameter defines
the PHY side clock line and CS line driver
strength. Default value is PHY_DRV_ODT_80.
- rockchip,phy_lpddr4_dq_drv : When the DRAM type is LPDDR4, this parameter defines
the PHY side DQ line (including DQS/DQ/DM line)
driver strength. Default value is PHY_DRV_ODT_80.
- rockchip,phy_lpddr4_odt : When the DRAM type is LPDDR4, this parameter defines
the PHY side ODT strength. Default value is
PHY_DRV_ODT_60.
Example:
dmc_opp_table: dmc_opp_table {
compatible = "operating-points-v2";
opp00 {
opp-hz = /bits/ 64 <300000000>;
opp-microvolt = <900000>;
};
opp01 {
opp-hz = /bits/ 64 <666000000>;
opp-microvolt = <900000>;
};
};
dmc: dmc {
compatible = "rockchip,rk3399-dmc";
devfreq-events = <&dfi>;
interrupts = <GIC_SPI 1 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&cru SCLK_DDRCLK>;
clock-names = "dmc_clk";
operating-points-v2 = <&dmc_opp_table>;
center-supply = <&ppvar_centerlogic>;
upthreshold = <15>;
downdifferential = <10>;
rockchip,ddr3_speed_bin = <21>;
rockchip,pd_idle = <0x40>;
rockchip,sr_idle = <0x2>;
rockchip,sr_mc_gate_idle = <0x3>;
rockchip,srpd_lite_idle = <0x4>;
rockchip,standby_idle = <0x2000>;
rockchip,dram_dll_dis_freq = <300>;
rockchip,phy_dll_dis_freq = <125>;
rockchip,auto_pd_dis_freq = <666>;
rockchip,ddr3_odt_dis_freq = <333>;
rockchip,ddr3_drv = <DDR3_DS_40ohm>;
rockchip,ddr3_odt = <DDR3_ODT_120ohm>;
rockchip,phy_ddr3_ca_drv = <PHY_DRV_ODT_40>;
rockchip,phy_ddr3_dq_drv = <PHY_DRV_ODT_40>;
rockchip,phy_ddr3_odt = <PHY_DRV_ODT_240>;
rockchip,lpddr3_odt_dis_freq = <333>;
rockchip,lpddr3_drv = <LP3_DS_34ohm>;
rockchip,lpddr3_odt = <LP3_ODT_240ohm>;
rockchip,phy_lpddr3_ca_drv = <PHY_DRV_ODT_40>;
rockchip,phy_lpddr3_dq_drv = <PHY_DRV_ODT_40>;
rockchip,phy_lpddr3_odt = <PHY_DRV_ODT_240>;
rockchip,lpddr4_odt_dis_freq = <333>;
rockchip,lpddr4_drv = <LP4_PDDS_60ohm>;
rockchip,lpddr4_dq_odt = <LP4_DQ_ODT_40ohm>;
rockchip,lpddr4_ca_odt = <LP4_CA_ODT_40ohm>;
rockchip,phy_lpddr4_ca_drv = <PHY_DRV_ODT_40>;
rockchip,phy_lpddr4_ck_cs_drv = <PHY_DRV_ODT_80>;
rockchip,phy_lpddr4_dq_drv = <PHY_DRV_ODT_80>;
rockchip,phy_lpddr4_odt = <PHY_DRV_ODT_60>;
};

View File

@@ -5,9 +5,7 @@ Required properties for dp-controller:
platform specific such as:
* "samsung,exynos5-dp"
* "rockchip,rk3288-dp"
* "rockchip,rk3368-edp"
* "rockchip,rk3399-edp"
* "rockchip,rk3568-edp"
-reg:
physical base address of the controller and length
of memory mapped region.
@@ -23,8 +21,6 @@ Required properties for dp-controller:
from general PHY binding: Should be "dp".
Optional properties for dp-controller:
-analogix,video-bist-enable:
Enable video bist pattern for DP_TX debugging.
-force-hpd:
Indicate driver need force hpd when hpd detect failed, this
is used for some eDP screen which don't have hpd signal.

View File

@@ -16,9 +16,6 @@ Required properties:
Documentation/devicetree/bindings/graph.txt. This port should be connected
to the input port of an attached HDMI or LVDS encoder chip.
Optional properties:
- pinctrl-names: Contain "default" and "sleep".
Example:
dpi0: dpi@1401d000 {
@@ -29,9 +26,6 @@ dpi0: dpi@1401d000 {
<&mmsys CLK_MM_DPI_ENGINE>,
<&apmixedsys CLK_APMIXED_TVDPLL>;
clock-names = "pixel", "engine", "pll";
pinctrl-names = "default", "sleep";
pinctrl-0 = <&dpi_pin_func>;
pinctrl-1 = <&dpi_pin_idle>;
port {
dpi0_out: endpoint {

View File

@@ -1,9 +0,0 @@
Armadeus ST0700 Adapt. A Santek ST0700I5Y-RBSLW 7.0" WVGA (800x480) TFT with
an adapter board.
Required properties:
- compatible: "armadeus,st0700-adapt"
- power-supply: see panel-common.txt
Optional properties:
- backlight: see panel-common.txt

View File

@@ -5,53 +5,12 @@ panel node
----------
Required properties:
- compatible: Should contain one of the following:
- "simple-panel": for common simple panel
- "simple-panel-dsi": for common simple dsi panel
- "vendor,panel": for vendor specific panel
- power-supply: See panel-common.txt
Optional properties:
- vsp-supply: positive voltage supply
- vsn-supply: negative voltage supply
- ddc-i2c-bus: phandle of an I2C controller used for DDC EDID probing
- enable-gpios: GPIO pin to enable or disable the panel
- reset-gpios: GPIO pin to reset the panel
- backlight: phandle of the backlight device attached to the panel
- prepare-delay-ms: the time (in milliseconds) that it takes for the panel to
become ready and start receiving video data
- enable-delay-ms: the time (in milliseconds) that it takes for the panel to
display the first valid frame after starting to receive
video data
- disable-delay-ms: the time (in milliseconds) that it takes for the panel to
turn the display off (no content is visible)
- unprepare-delay-ms: the time (in milliseconds) that it takes for the panel
to power itself down completely
- reset-delay-ms: the time (in milliseconds) that it takes for the panel to
reset itself completely
- init-delay-ms: the time (in milliseconds) that it takes for the panel to
send init command sequence after reset deassert
- width-mm: width (in millimeters) of the panel's active display area
- height-mm: height (in millimeters) of the panel's active display area
- bpc: bits per color/component
- bus-format: Pixel data format on the wire
- dsi,lanes: number of active data lanes
- dsi,format: pixel format for video mode
- dsi,flags: DSI operation mode related flags
- panel-init-sequence:
- panel-exit-sequence:
A byte stream formed by simple multiple dcs packets.
byte 0: dcs data type
byte 1: wait number of specified ms after dcs command transmitted
byte 2: packet payload length
byte 3 and beyond: number byte of payload
- power-invert: power invert control
- rockchip,cmd-type: default is DSI cmd, or "spi", "mcu" cmd type
- spi-sdi: spi init panel for spi-sdi io
- spi-scl: spi init panel for spi-scl io
- spi-cs: spi init pael for spi-cs io
Example:

View File

@@ -3,9 +3,7 @@ Rockchip RK3288 specific extensions to the Analogix Display Port
Required properties:
- compatible: "rockchip,rk3288-dp",
"rockchip,rk3368-edp",
"rockchip,rk3399-edp";
"rockchip,rk3568-edp";
- reg: physical base address of the controller and length
@@ -23,6 +21,8 @@ Required properties:
- reset-names: Must include the name "dp"
- rockchip,grf: this soc should set GRF regs, so need get grf here.
- ports: there are 2 port nodes with endpoint definitions as defined in
Documentation/devicetree/bindings/media/video-interfaces.txt.
Port 0: contained 2 endpoints, connecting to the output of vop.
@@ -33,11 +33,6 @@ Optional property for different chips:
- clock-names: from common clock binding:
Required elements: "grf"
- rockchip,grf: this soc should set GRF regs, so need get grf here.
Optional properties
- reset-names: "apb" must control the reset line to the APB interface.
- vcc-supply: Regulator for eDP_AVDD_1V0.
- vccio-supply: Regulator for eDP_AVDD_1V8.
For the below properties, please refer to Analogix DP binding document:
* Documentation/devicetree/bindings/display/bridge/analogix_dp.txt

View File

@@ -12,10 +12,7 @@ following device-specific properties.
Required properties:
- compatible: should be one of the following:
"rockchip,rk3228-dw-hdmi"
"rockchip,rk3288-dw-hdmi"
"rockchip,rk3328-dw-hdmi"
"rockchip,rk3368-dw-hdmi"
"rockchip,rk3399-dw-hdmi"
- reg: See dw_hdmi.txt.
- reg-io-width: See dw_hdmi.txt. Shall be 4.
@@ -26,7 +23,6 @@ Required properties:
corresponding to the video input of the controller. The port shall have two
endpoints, numbered 0 and 1, connected respectively to the vopb and vopl.
- rockchip,grf: Shall reference the GRF to mux vopl/vopb.
- rockchip,phy-table: the parameter table of hdmi phy configuration.
Optional properties
@@ -38,13 +34,6 @@ Optional properties
- clock-names: May contain "cec" as defined in dw_hdmi.txt.
- clock-names: May contain "grf", power for grf io.
- clock-names: May contain "vpll", external clock for some hdmi phy.
- phys: from general PHY binding: the phandle for the PHY device.
- phy-names: Should be "hdmi" if phys references an external phy.
- max-tmdsclk: hdmi max tmdsclk, if this property is not set, set max tmdsclk to 594M.
- unsupported-yuv-input: If yuv input is not supported, set this property.
- unsupported-deep-color: If deep color mode is not supported, set this property.
- hdcp1x-enable: enable hdcp1.x, enable if defined, disable if not defined
- scramble-low-rates: if defined enable scarmble when tmdsclk less than 340Mhz
Example:
@@ -71,9 +60,4 @@ hdmi: hdmi@ff980000 {
};
};
};
rockchip,phy-table = <74250000 0x8009 0x0004 0x0272>,
<165000000 0x802b 0x0004 0x0209>,
<297000000 0x8039 0x0005 0x028d>,
<594000000 0x8039 0x0000 0x019d>,
<000000000 0x0000 0x0000 0x0000>;
};

View File

@@ -4,44 +4,28 @@ Rockchip specific extensions to the Synopsys Designware MIPI DSI
Required properties:
- #address-cells: Should be <1>.
- #size-cells: Should be <0>.
- compatible: must be one of:
"rockchip,px30-mipi-dsi", "snps,dw-mipi-dsi".
"rockchip,rk1808-mipi-dsi", "snps,dw-mipi-dsi".
"rockchip,rk3128-mipi-dsi", "snps,dw-mipi-dsi".
"rockchip,rk3288-mipi-dsi", "snps,dw-mipi-dsi".
"rockchip,rk3368-mipi-dsi", "snps,dw-mipi-dsi".
"rockchip,rk3399-mipi-dsi", "snps,dw-mipi-dsi".
"rockchip,rk3568-mipi-dsi", "snps,dw-mipi-dsi".
"rockchip,rv1126-mipi-dsi", "snps,dw-mipi-dsi".
- compatible: "rockchip,rk3288-mipi-dsi", "snps,dw-mipi-dsi".
"rockchip,rk3399-mipi-dsi", "snps,dw-mipi-dsi".
- reg: Represent the physical address range of the controller.
- interrupts: Represent the controller's interrupt to the CPU(s).
- power-domains: a phandle to mipi dsi power domain node.
- resets: list of phandle + reset specifier pairs, as described in [3].
- reset-names: string reset name, must be "apb".
- clocks, clock-names: Phandles to the controller's APB clock(pclk),
As described in [1].
- clocks, clock-names: Phandles to the controller's pll reference
clock(ref) and APB clock(pclk). For RK3399, a phy config clock
(phy_cfg) and a grf clock(grf) are required. As described in [1].
- rockchip,grf: this soc should set GRF regs to mux vopl/vopb.
- ports: contain a port node with endpoint definitions as defined in [2].
For vopb,set the reg = <0> and set the reg = <1> for vopl.
Optional properties:
- clocks, clock-names:
phandle to the SNPS-PHY config clock, name should be "phy_cfg".
phandle to the SNPS-PHY PLL reference clock, name should be "ref".
phandle to the Non-SNPS PHY high speed clock, name should be "hs_clk".
- phys: phandle to Non-SNPS PHY node
- phy-names: the string "mipi_dphy" when is found in a node, along with "phys"
attribute, provides phandle to MIPI PHY node
- rockchip,lane-rate: optional override of the desired bandwidth.
- rockchip,dual-channel: for dual-channel mode, phandle to the slave channel.
- rockchip,data-swap: for dual-channel mode, swap two channel data of MIPI.
- power-domains: a phandle to mipi dsi power domain node.
- resets: list of phandle + reset specifier pairs, as described in [3].
- reset-names: string reset name, must be "apb".
[1] Documentation/devicetree/bindings/clock/clock-bindings.txt
[2] Documentation/devicetree/bindings/media/video-interfaces.txt
[3] Documentation/devicetree/bindings/reset/reset.txt
Example:
dsi: dsi@ff960000 {
mipi_dsi: mipi@ff960000 {
#address-cells = <1>;
#size-cells = <0>;
compatible = "rockchip,rk3288-mipi-dsi", "snps,dw-mipi-dsi";
@@ -58,18 +42,16 @@ Example:
#size-cells = <0>;
reg = <1>;
port {
mipi_in: port {
#address-cells = <1>;
#size-cells = <0>;
dsi_in_vopb: endpoint@0 {
mipi_in_vopb: endpoint@0 {
reg = <0>;
remote-endpoint = <&vopb_out_dsi>;
remote-endpoint = <&vopb_out_mipi>;
};
dsi_in_vopl: endpoint@1 {
mipi_in_vopl: endpoint@1 {
reg = <1>;
remote-endpoint = <&vopl_out_dsi>;
remote-endpoint = <&vopl_out_mipi>;
};
};
};

View File

@@ -11,21 +11,9 @@ Required properties:
of vop devices. vop definitions as defined in
Documentation/devicetree/bindings/display/rockchip/rockchip-vop.txt
Optional properties
- secure-memory-region: phandle to a node describing secure memory
- clocks: include clock specifiers corresponding to entries in the
clock-names property.
- clock-names: optional include
hdmi-tmds-pll: special pll required by some hdmi-vop design,
if there is no hdmi plug, also can reuse for
common display pll.
default-vop-pll: common display pll.
example:
display-subsystem {
compatible = "rockchip,display-subsystem";
ports = <&vopl_out>, <&vopb_out>;
clocks = <&cru PLL_VPLL>, <&cru PLL_CPLL>;
clock-names = "hdmi-tmds-pll", "default-vop-pll";
};

View File

@@ -1,19 +1,26 @@
Rockchip SoCs LVDS interface
Rockchip RK3288 LVDS interface
================================
Required properties:
- compatible: matching the soc type, one of
- "rockchip,px30-lvds",
- "rockchip,rk3126-lvds",
- "rockchip,rk3288-lvds",
- "rockchip,rk3368-lvds";
- "rockchip,rk3568-lvds";
- phys : phandle for the PHY device
- phy-names : should be "phy"
- "rockchip,rk3288-lvds";
- reg: physical base address of the controller and length
of memory mapped region.
- clocks: must include clock specifiers corresponding to entries in the
clock-names property.
- clock-names: must contain "pclk_lvds"
- avdd1v0-supply: regulator phandle for 1.0V analog power
- avdd1v8-supply: regulator phandle for 1.8V analog power
- avdd3v3-supply: regulator phandle for 3.3V analog power
- rockchip,grf: phandle to the general register files syscon
- rockchip,output: "rgb", "lvds" or "duallvds", This describes the output interface
Optional properties:
- dual-channel: boolean. if it exists, enable dual channel mode
- rockchip,data-swap: boolean to enable odd/even data swap in dual channel mode
- pinctrl-names: must contain a "lcdc" entry.
- pinctrl-0: pin control group to be used for this controller.
Required nodes:
@@ -25,36 +32,68 @@ Their connections are modeled using the OF graph bindings specified in
- video port 0 for the VOP input, the remote endpoint maybe vopb or vopl
- video port 1 for either a panel or subsequent encoder
the lvds panel described by
Documentation/devicetree/bindings/display/panel/simple-panel.txt
Panel required properties:
- ports for remote LVDS output
Panel optional properties:
- data-mapping: should be "vesa-24","jeida-24" or "jeida-18".
This describes decribed by:
Documentation/devicetree/bindings/display/panel/panel-lvds.txt
Example:
&grf {
status = "okay";
lvds_panel: lvds-panel {
compatible = "auo,b101ean01";
enable-gpios = <&gpio7 21 GPIO_ACTIVE_HIGH>;
data-mapping = "jeida-24";
lvds: lvds {
ports {
panel_in_lvds: endpoint {
remote-endpoint = <&lvds_out_panel>;
};
};
};
For Rockchip RK3288:
lvds: lvds@ff96c000 {
compatible = "rockchip,rk3288-lvds";
phys = <&video_phy>;
phy-names = "phy";
status = "disabled";
rockchip,grf = <&grf>;
reg = <0xff96c000 0x4000>;
clocks = <&cru PCLK_LVDS_PHY>;
clock-names = "pclk_lvds";
pinctrl-names = "lcdc";
pinctrl-0 = <&lcdc_ctl>;
avdd1v0-supply = <&vdd10_lcd>;
avdd1v8-supply = <&vcc18_lcd>;
avdd3v3-supply = <&vcca_33>;
rockchip,output = "rgb";
ports {
#address-cells = <1>;
#size-cells = <0>;
port@0 {
lvds_in: port@0 {
reg = <0>;
#address-cells = <1>;
#size-cells = <0>;
lvds_in_vopb: endpoint@0 {
reg = <0>;
remote-endpoint = <&vopb_out_lvds>;
};
lvds_in_vopl: endpoint@1 {
reg = <1>;
remote-endpoint = <&vopl_out_lvds>;
};
};
lvds_out: port@1 {
reg = <1>;
lvds_out_panel: endpoint {
remote-endpoint = <&panel_in_lvds>;
};
};
};
};
};

View File

@@ -7,16 +7,8 @@ buffer to an external LCD interface.
Required properties:
- compatible: value should be one of the following
"rockchip,rk3036-vop";
"rockchip,rk3066-vop";
"rockchip,rk3126-vop";
"rockchip,px30-vop-big";
"rockchip,px30-vop-lit";
"rockchip,rk3308-vop";
"rockchip,rk1808-vop-lit";
"rockchip,rk1808-vop-raw";
"rockchip,rv1126-vop";
"rockchip,rk3288-vop-lit";
"rockchip,rk3288-vop-big";
"rockchip,rk3288-vop";
"rockchip,rk3368-vop";
"rockchip,rk3366-vop";
"rockchip,rk3399-vop-big";
@@ -24,13 +16,6 @@ Required properties:
"rockchip,rk3228-vop";
"rockchip,rk3328-vop";
- reg: Address and length of the register set for the device.
- reg-names: The names of register regions. contain following regions:
- "regs" : (Required) Base address and size of the controllers.
- "gamma_lut" : (Optional) gamma function lut table registers,
take care of this register's length, driver would use
register's length to decide gamma table size.
- interrupts: should contain a list of all VOP IP block interrupts in the
order: VSYNC, LCD_SYSTEM. The interrupt specifier
format depends on the interrupt controller used.
@@ -42,7 +27,6 @@ Required properties:
aclk_vop: for ddr buffer transfer.
hclk_vop: for ahb bus to R/W the phy regs.
dclk_vop: pixel clock.
dclk_source: optional, dclk sources from display plls.
- resets: Must contain an entry for each entry in reset-names.
See ../reset/reset.txt for details.
@@ -55,14 +39,12 @@ Required properties:
- port: A port node with endpoint definitions as defined in
Documentation/devicetree/bindings/media/video-interfaces.txt.
- rockchip,dual-channel-swap: boolean, optional config for vop dual channel swap
Example:
SoC specific DT entry:
vopb: vopb@ff930000 {
compatible = "rockchip,rk3288-vop";
reg = <0xff930000 0x19c>, <0xff931000 0x1000>;
reg-names = "regs", "gamma_lut";
reg = <0xff930000 0x19c>;
interrupts = <GIC_SPI 15 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&cru ACLK_VOP0>, <&cru DCLK_VOP0>, <&cru HCLK_VOP0>;
clock-names = "aclk_vop", "dclk_vop", "hclk_vop";

View File

@@ -27,7 +27,6 @@ Required properties:
"atmel,24c256",
"atmel,24c512",
"atmel,24c1024",
"atmel,24c2048",
If <manufacturer> is not "atmel", then a fallback must be used
with the same <model> and "atmel" as manufacturer.

View File

@@ -8,7 +8,6 @@ Required properties :
- reg : Offset and length of the register set for the device
- compatible: should be one of the following:
- "rockchip,rv1108-i2c": for rv1108
- "rockchip,rv1126-i2c": for rv1126
- "rockchip,rk3066-i2c": for rk3066
- "rockchip,rk3188-i2c": for rk3188
- "rockchip,rk3228-i2c": for rk3228

View File

@@ -7,7 +7,6 @@ Required properties:
- "rockchip,rk3328-saradc", "rockchip,rk3399-saradc": for rk3328
- "rockchip,rk3399-saradc": for rk3399
- "rockchip,rv1108-saradc", "rockchip,rk3399-saradc": for rv1108
- "rockchip,rk1808-saradc", "rockchip,rk3399-saradc": for rk1808
- reg: physical base address of the controller and length of memory mapped
region.

View File

@@ -11,13 +11,11 @@ New driver handles the following
Required properties:
- compatible: Must be "samsung,exynos-adc-v1"
for Exynos5250 controllers.
for exynos4412/5250 and s5pv210 controllers.
Must be "samsung,exynos-adc-v2" for
future controllers.
Must be "samsung,exynos3250-adc" for
controllers compatible with ADC of Exynos3250.
Must be "samsung,exynos4212-adc" for
controllers compatible with ADC of Exynos4212 and Exynos4412.
Must be "samsung,exynos7-adc" for
the ADC in Exynos7 and compatibles
Must be "samsung,s3c2410-adc" for
@@ -30,8 +28,6 @@ Required properties:
the ADC in s3c2443 and compatibles
Must be "samsung,s3c6410-adc" for
the ADC in s3c6410 and compatibles
Must be "samsung,s5pv210-adc" for
the ADC in s5pv210 and compatibles
- reg: List of ADC register address range
- The base address and range of ADC register
- The base address and range of ADC_PHY register (every

View File

@@ -21,7 +21,7 @@ controller state. The mux controller state is described in
Example:
mux: mux-controller {
compatible = "gpio-mux";
compatible = "mux-gpio";
#mux-control-cells = <0>;
mux-gpios = <&pioA 0 GPIO_ACTIVE_HIGH>,

View File

@@ -24,10 +24,6 @@ Optional properties:
- rockchip,disable-mmu-reset : Don't use the mmu reset operation.
Some mmu instances may produce unexpected results
when the reset operation is used.
- rk_iommu,disable_reset_quirk : Same with above, for compatible with previous code
- rockchip,skip-mmu-read : Some iommu instances are not able to be read, skip
reading operation to make iommu work as normal
Example:

View File

@@ -73,7 +73,7 @@ Example:
};
};
port@a {
port@10 {
reg = <10>;
adv7482_txa: endpoint {
@@ -83,7 +83,7 @@ Example:
};
};
port@b {
port@11 {
reg = <11>;
adv7482_txb: endpoint {

View File

@@ -62,10 +62,6 @@ Optional properties:
be referred to mmc-pwrseq-simple.txt. But now it's reused as a tunable delay
waiting for I/O signalling and card power supply to be stable, regardless of
whether pwrseq-simple is used. Default to 10ms if no available.
- supports-cqe : The presence of this property indicates that the corresponding
MMC host controller supports HW command queue feature.
- disable-cqe-dcmd: This property indicates that the MMC controller's command
queue engine (CQE) does not support direct commands (DCMDs).
*NOTE* on CD and WP polarity. To use common for all SD/MMC host controllers line
polarity properties, we have to fix the meaning of the "normal" and "inverted"

View File

@@ -19,9 +19,6 @@ Optional properties:
- interrupt-names: must be "mdio_done_error" when there is a share interrupt fed
to this hardware block, or must be "mdio_done" for the first interrupt and
"mdio_error" for the second when there are separate interrupts
- clocks: A reference to the clock supplying the MDIO bus controller
- clock-frequency: the MDIO bus clock that must be output by the MDIO bus
hardware, if absent, the default hardware values are used
Child nodes of this MDIO bus controller node are standard Ethernet PHY device
nodes as described in Documentation/devicetree/bindings/net/phy.txt

View File

@@ -17,7 +17,7 @@ Example:
reg = <1>;
clocks = <&clk32m>;
interrupt-parent = <&gpio4>;
interrupts = <13 IRQ_TYPE_LEVEL_HIGH>;
interrupts = <13 IRQ_TYPE_EDGE_RISING>;
vdd-supply = <&reg5v0>;
xceiver-supply = <&reg5v0>;
};

View File

@@ -4,7 +4,6 @@ Required properties:
- compatible: Should be one of the following:
- "microchip,mcp2510" for MCP2510.
- "microchip,mcp2515" for MCP2515.
- "microchip,mcp25625" for MCP25625.
- reg: SPI chip select.
- clocks: The clock feeding the CAN controller.
- interrupts: Should contain IRQ line for the CAN controller.

View File

@@ -110,13 +110,6 @@ PROPERTIES
Usage: required
Definition: See soc/fsl/qman.txt and soc/fsl/bman.txt
- fsl,erratum-a050385
Usage: optional
Value type: boolean
Definition: A boolean property. Indicates the presence of the
erratum A050385 which indicates that DMA transactions that are
split can result in a FMan lock.
=============================================================================
FMan MURAM Node

View File

@@ -16,7 +16,7 @@ Required properties:
Optional properties:
- interrupts: interrupt line number for the SMI error/done interrupt
- clocks: phandle for up to four required clocks for the MDIO instance
- clocks: phandle for up to three required clocks for the MDIO instance
The child nodes of the MDIO driver are the individual PHY devices
connected to this MDIO bus. They must have a "reg" property given the

View File

@@ -25,7 +25,7 @@ Example (for ARM-based BeagleBone with NPC100 NFC controller on I2C2):
clock-frequency = <100000>;
interrupt-parent = <&gpio1>;
interrupts = <29 IRQ_TYPE_LEVEL_HIGH>;
interrupts = <29 GPIO_ACTIVE_HIGH>;
enable-gpios = <&gpio0 30 GPIO_ACTIVE_HIGH>;
firmware-gpios = <&gpio0 31 GPIO_ACTIVE_HIGH>;

View File

@@ -25,7 +25,7 @@ Example (for ARM-based BeagleBone with PN544 on I2C2):
clock-frequency = <400000>;
interrupt-parent = <&gpio1>;
interrupts = <17 IRQ_TYPE_LEVEL_HIGH>;
interrupts = <17 GPIO_ACTIVE_HIGH>;
enable-gpios = <&gpio3 21 GPIO_ACTIVE_HIGH>;
firmware-gpios = <&gpio3 19 GPIO_ACTIVE_HIGH>;

View File

@@ -5,18 +5,14 @@ The device node has following properties.
Required properties:
- compatible: should be "rockchip,<name>-gamc"
"rockchip,px30-gmac": found on PX30 SoCs
"rockchip,rk1808-gmac": found on RK1808 SoCs
"rockchip,rk3128-gmac": found on RK312x SoCs
"rockchip,rk3228-gmac": found on RK322x SoCs
"rockchip,rk3288-gmac": found on RK3288 SoCs
"rockchip,rk3308-mac": found on RK3308 SoCs
"rockchip,rk3328-gmac": found on RK3328 SoCs
"rockchip,rk3366-gmac": found on RK3366 SoCs
"rockchip,rk3368-gmac": found on RK3368 SoCs
"rockchip,rk3399-gmac": found on RK3399 SoCs
"rockchip,rk3568-gmac": found on RK3568 SoCs
"rockchip,rv1108-gmac": found on RV1108 SoCs
"rockchip,rv1126-gmac": found on RV1126 SoCs
- reg: addresses and length of the register sets for the device.
- interrupts: Should contain the GMAC interrupts.
- interrupt-names: Should contain the interrupt names "macirq".

View File

@@ -2,13 +2,10 @@
Required properties:
- compatible: Should be one of the following.
- "rockchip,rk1808-efuse" - for RK1808 SoCs.
- "rockchip,rk3066a-efuse" - for RK3066a SoCs.
- "rockchip,rk3128-efuse" - for RK3128 SoCs.
- "rockchip,rk3188-efuse" - for RK3188 SoCs.
- "rockchip,rk3228-efuse" - for RK3228 SoCs.
- "rockchip,rk3288-efuse" - for RK3288 SoCs.
- "rockchip,rk3288-secure-efuse" - for RK3288 SoCs.
- "rockchip,rk3328-efuse" - for RK3328 SoCs.
- "rockchip,rk3368-efuse" - for RK3368 SoCs.
- "rockchip,rk3399-efuse" - for RK3399 SoCs.

View File

@@ -2,15 +2,10 @@ ROCKCHIP USB2.0 PHY WITH INNO IP BLOCK
Required properties (phy (parent) node):
- compatible : should be one of the listed compatibles:
* "rockchip,px30-usb2phy"
* "rockchip,rk3128-usb2phy"
* "rockchip,rk3228-usb2phy"
* "rockchip,rk3308-usb2phy"
* "rockchip,rk3328-usb2phy"
* "rockchip,rk3366-usb2phy"
* "rockchip,rk3368-usb2phy"
* "rockchip,rk3399-usb2phy"
* "rockchip,rk3568-usb2phy"
* "rockchip,rv1108-usb2phy"
- reg : the address offset of grf for usb-phy configuration.
- #clock-cells : should be 0.
@@ -28,10 +23,6 @@ Optional properties:
register files". When set driver will request its
phandle as one companion-grf for some special SoCs
(e.g RV1108).
- rockchip,u2phy-tuning: when set, tuning u2phy to improve usb2 SI.
- vbus-supply: regulator phandle for vbus power source.
- wakeup-source: enable USB irq wakeup when suspend.
only work when suspend wakeup-config is not work.
Required nodes : a sub-node is required for each port the phy provides.
The sub-node name is used to identify host or otg port,
@@ -54,16 +45,6 @@ Required properties (port (child) node):
Optional properties:
- phy-supply : phandle to a regulator that provides power to VBUS.
See ./phy-bindings.txt for details.
- rockchip,bypass-uart: when set, indicates that support usb to
bypass uart feature.
note: this property can only be added in debug stage.
- rockchip,utmi-avalid : when set, the usb2 phy will use avalid
status bit to get vbus status. If not, it will use
bvalid status bit to get vbus status by default.
- rockchip,vbus-always-on: when set, indicates that the otg vbus
is always powered on.
- rockchip,low-power-mode: when set, the port will enter low power
state when suspend.
Example:
@@ -95,48 +76,3 @@ grf: syscon@ff770000 {
};
};
};
Required properties (usb2phy grf node):
- compatible : should be one of the listed compatibles:
"rockchip,px30-usb2phy-grf", "syscon", "simple-mfd";
"rockchip,rk1808-usb2phy-grf", "syscon", "simple-mfd";
"rockchip,rk3308-usb2phy-grf", "syscon", "simple-mfd";
"rockchip,rk3328-usb2phy-grf", "syscon", "simple-mfd";
"rockchip,rk3568-usb2phy-grf", "syscon";
- reg : the address offset of grf for usb-phy configuration.
- #address-cells : should be 1.
- #size-cells : should be 1.
Required nodes : a sub-node is required for the phy provides.
The sub-node name is used to identify each phy,
and shall be the following entries:
Example:
usb2phy_grf: syscon@ff450000 {
compatible = "rockchip,rk3328-usb2phy-grf", "syscon",
"simple-mfd";
reg = <0x0 0xff450000 0x0 0x10000>;
#address-cells = <1>;
#size-cells = <1>;
u2phy: usb2-phy@100 {
compatible = "rockchip,rk3328-usb2phy";
reg = <0x100 0x10>;
clocks = <&xin24m>;
clock-names = "phyclk";
#clock-cells = <0>;
assigned-clocks = <&cru USB480M>;
assigned-clock-parents = <&u2phy>;
clock-output-names = "usb480m_phy";
status = "disabled";
u2phy_host: host-port {
#phy-cells = <0>;
interrupts = <GIC_SPI 62 IRQ_TYPE_LEVEL_HIGH>;
interrupt-names = "linestate";
status = "disabled";
};
};
};

View File

@@ -17,11 +17,6 @@ Required properties:
Optional properties:
- extcon : extcon specifier for the Power Delivery
- rockchip,phy-config : A list of voltage swing(mV) and pre-emphasis
(dB) pairs. They are 3 blocks of 4 entries and
correspond to s0p0 ~ s0p3, s1p0 ~ s1p3,
s2p0 ~ s2p3, s3p0 ~ s2p3 swing and pre-emphasis
values.
Required nodes : a sub-node is required for each port the phy provides.
The sub-node name is used to identify dp or usb3 port,
@@ -55,21 +50,6 @@ Example:
<&cru SRST_P_UPHY0_TCPHY>;
reset-names = "uphy", "uphy-pipe", "uphy-tcphy";
rockchip,phy-config = <0x2a 0x00>,
<0x1f 0x15>,
<0x14 0x22>,
<0x02 0x2b>,
<0x21 0x00>,
<0x12 0x15>,
<0x02 0x22>,
<0 0>,
<0x15 0x00>,
<0x00 0x15>,
<0 0>,
<0 0>;
tcphy0_dp: dp-port {
#phy-cells = <0>;
};
@@ -94,21 +74,6 @@ Example:
<&cru SRST_P_UPHY1_TCPHY>;
reset-names = "uphy", "uphy-pipe", "uphy-tcphy";
rockchip,phy-config = <0x2a 0x00>,
<0x1f 0x15>,
<0x14 0x22>,
<0x02 0x2b>,
<0x21 0x00>,
<0x12 0x15>,
<0x02 0x22>,
<0 0>,
<0x15 0x00>,
<0x00 0x15>,
<0 0>,
<0 0>;
tcphy1_dp: dp-port {
#phy-cells = <0>;
};

View File

@@ -4,17 +4,12 @@ Rockchip specific extensions to the Analogix Display Port PHY
Required properties:
- compatible : should be one of the following supported values:
- "rockchip.rk3288-dp-phy"
- "rockchip.rk3368-dp-phy"
- clocks: from common clock binding: handle to dp clock.
of memory mapped region.
- clock-names: from common clock binding:
Required elements: "24m"
- #phy-cells : from the generic PHY bindings, must be 0;
Optional properties:
- resets : phandle to the reset of eDP 24m clock domain.
- reset-names : should be "edp_24m".
Example:
grf: syscon@ff770000 {

View File

@@ -22,7 +22,6 @@ Required properties for iomux controller:
- compatible: should be
"rockchip,px30-pinctrl": for Rockchip PX30
"rockchip,rv1108-pinctrl": for Rockchip RV1108
"rockchip,rv1126-pinctrl": for Rockchip RV1126
"rockchip,rk2928-pinctrl": for Rockchip RK2928
"rockchip,rk3066a-pinctrl": for Rockchip RK3066a
"rockchip,rk3066b-pinctrl": for Rockchip RK3066b
@@ -30,7 +29,6 @@ Required properties for iomux controller:
"rockchip,rk3188-pinctrl": for Rockchip RK3188
"rockchip,rk3228-pinctrl": for Rockchip RK3228
"rockchip,rk3288-pinctrl": for Rockchip RK3288
"rockchip,rk3308-pinctrl": for Rockchip RK3308
"rockchip,rk3328-pinctrl": for Rockchip RK3328
"rockchip,rk3368-pinctrl": for Rockchip RK3368
"rockchip,rk3399-pinctrl": for Rockchip RK3399

View File

@@ -36,7 +36,6 @@ Required properties:
- "rockchip,rk3188-io-voltage-domain" for rk3188
- "rockchip,rk3228-io-voltage-domain" for rk3228
- "rockchip,rk3288-io-voltage-domain" for rk3288
- "rockchip,rk3308-io-voltage-domain" for rk3308
- "rockchip,rk3328-io-voltage-domain" for rk3328
- "rockchip,rk3368-io-voltage-domain" for rk3368
- "rockchip,rk3368-pmu-io-voltage-domain" for rk3368 pmu-domains
@@ -44,7 +43,6 @@ Required properties:
- "rockchip,rk3399-pmu-io-voltage-domain" for rk3399 pmu-domains
- "rockchip,rv1108-io-voltage-domain" for rv1108
- "rockchip,rv1108-pmu-io-voltage-domain" for rv1108 pmu-domains
- "rockchip,rv1126-pmu-io-voltage-domain" for rv1126 pmu-domains
Deprecated properties:
- rockchip,grf: phandle to the syscon managing the "general register files"
@@ -119,17 +117,6 @@ Possible supplies for rk3399:
Possible supplies for rk3399 pmu-domains:
- pmu1830-supply:The supply connected to PMUIO2_VDD.
Possible supplies for rv1126 pmu-domains:
- vccio1-supply:The supply connected to VCCIO1_VDD.
- vccio2-supply:The supply connected to VCCIO2_VDD.
- vccio3-supply:The supply connected to VCCIO3_VDD.
- vccio4-supply:The supply connected to VCCIO4_VDD.
- vccio5-supply:The supply connected to VCCIO5_VDD.
- vccio6-supply:The supply connected to VCCIO6_VDD.
- vccio7-supply:The supply connected to VCCIO7_VDD.
- pmuio1-supply:The supply connected to PMUIO1_VDD.
- pmuio2-supply:The supply connected to PMUIO2_VDD.
Example:
io-domains {

View File

@@ -17,9 +17,6 @@ Required properties:
- #pwm-cells: must be 2 (rk2928) or 3 (rk3288). See pwm.txt in this directory
for a description of the cell format.
Optional properties:
- center-aligned: support pwm output center aligned mode.
Example:
pwm0: pwm@20030000 {
@@ -27,5 +24,4 @@ Example:
reg = <0x20030000 0x10>;
clocks = <&cru PCLK_PWM01>;
#pwm-cells = <2>;
center-aligned;
};

View File

@@ -1,8 +1,7 @@
Binding for Fairchild FAN53555 regulators
Required properties:
- compatible: one of "fcs,fan53555", "silergy,syr827", "silergy,syr828",
"rockchip,rk8603", "rockchip,rk8604"
- compatible: one of "fcs,fan53555", "silergy,syr827", "silergy,syr828"
- reg: I2C address
Optional properties:

View File

@@ -1,32 +0,0 @@
Regulator Proxy Consumer Bindings
Regulator proxy consumers provide a means to use a default regulator state
during bootup only which is removed at the end of boot. This feature can be
used in situations where a shared regulator can be scaled between several
possible voltages and hardware requires that it be at a high level at the
beginning of boot before the consumer device responsible for requesting the
high level has probed.
Optional properties:
proxy-supply: phandle of the regulator's own device node.
This property is required if any of the three
properties below are specified.
qcom,proxy-consumer-enable: Boolean indicating that the regulator must be
kept enabled during boot.
qcom,proxy-consumer-voltage: List of two integers corresponding the minimum
and maximum voltage allowed during boot in
microvolts.
qcom,proxy-consumer-current: Minimum current in microamps required during
boot.
Example:
foo_vreg: regulator@0 {
regulator-name = "foo";
regulator-min-microvolt = <1000000>;
regulator-max-microvolt = <2000000>;
proxy-supply = <&foo_vreg>;
qcom,proxy-consumer-voltage = <1500000 2000000>;
qcom,proxy-consumer-current = <25000>;
qcom,proxy-consumer-enable;
};

View File

@@ -63,9 +63,6 @@ reusable (optional) - empty property
able to reclaim it back. Typically that means that the operating
system can use that region to store volatile or cached data that
can be otherwise regenerated or migrated elsewhere.
inactive (optional) - empty property
- Indicates the operating system must not active the region. It's only
do active or inactive for region reusable.
Linux implementation note:
- If a "linux,cma-default" property is present, then Linux will use the

View File

@@ -1,27 +0,0 @@
OMAP ROM RNG driver binding
Secure SoCs may provide RNG via secure ROM calls like Nokia N900 does. The
implementation can depend on the SoC secure ROM used.
- compatible:
Usage: required
Value type: <string>
Definition: must be "nokia,n900-rom-rng"
- clocks:
Usage: required
Value type: <prop-encoded-array>
Definition: reference to the the RNG interface clock
- clock-names:
Usage: required
Value type: <stringlist>
Definition: must be "ick"
Example:
rom_rng: rng {
compatible = "nokia,n900-rom-rng";
clocks = <&rng_ick>;
clock-names = "ick";
};

View File

@@ -27,4 +27,4 @@ and valid to enable charging:
- "abracon,tc-diode": should be "standard" (0.6V) or "schottky" (0.3V)
- "abracon,tc-resistor": should be <0>, <3>, <6> or <11>. 0 disables the output
resistor, the other values are in kOhm.
resistor, the other values are in ohm.

View File

@@ -51,7 +51,6 @@ Optional properties:
- tx-threshold: Specify the TX FIFO low water indication for parts with
programmable TX FIFO thresholds.
- resets : phandle + reset specifier pairs
- wakeup-source: wake up the system when UART receives data.
Note:
* fsl,ns16550:

View File

@@ -13,7 +13,6 @@ On RK3328 SoCs, the GRF adds a section for USB2PHYGRF,
Required Properties:
- compatible: GRF should be one of the following:
- "rockchip,rk1808-grf", "syscon": for rk1808
- "rockchip,rk3036-grf", "syscon": for rk3036
- "rockchip,rk3066-grf", "syscon": for rk3066
- "rockchip,rk3188-grf", "syscon": for rk3188
@@ -23,12 +22,7 @@ Required Properties:
- "rockchip,rk3368-grf", "syscon": for rk3368
- "rockchip,rk3399-grf", "syscon": for rk3399
- "rockchip,rv1108-grf", "syscon": for rv1108
- "rockchip,rv1126-grf", "syscon": for rv1126
- compatible: COREGRF should be one of the following
- "rockchip,rk1808-coregrf", "syscon": for rk1808
- compatible: PMUGRF should be one of the following:
- "rockchip,rv1108-pmugrf", "syscon": for rv1108
- "rockchip,rv1126-pmugrf", "syscon": for rv1126
- "rockchip,rk3368-pmugrf", "syscon": for rk3368
- "rockchip,rk3399-pmugrf", "syscon": for rk3399
- compatible: SGRF should be one of the following

View File

@@ -6,8 +6,6 @@ powered up/down by software based on different application scenes to save power.
Required properties for power domain controller:
- compatible: Should be one of the following.
"rockchip,px30-power-controller" - for PX30 SoCs.
"rockchip,rv1126-power-controller" - for RV1126 SoCs
"rockchip,rk1808-power-controller" - for RK1808 SoCs
"rockchip,rk3036-power-controller" - for RK3036 SoCs.
"rockchip,rk3128-power-controller" - for RK3128 SoCs.
"rockchip,rk3228-power-controller" - for RK3228 SoCs.
@@ -16,7 +14,6 @@ Required properties for power domain controller:
"rockchip,rk3366-power-controller" - for RK3366 SoCs.
"rockchip,rk3368-power-controller" - for RK3368 SoCs.
"rockchip,rk3399-power-controller" - for RK3399 SoCs.
"rockchip,rk3568-power-controller" - for RK3568 SoCs.
- #power-domain-cells: Number of cells in a power-domain specifier.
Should be 1 for multiple PM domains.
- #address-cells: Should be 1.
@@ -25,8 +22,6 @@ Required properties for power domain controller:
Required properties for power domain sub nodes:
- reg: index of the power domain, should use macros in:
"include/dt-bindings/power/px30-power.h" - for PX30 type power domain.
"include/dt-bindings/power/rv1126-power.h" - for RV1126 type power domain.
"include/dt-bindings/power/rk1808-power.h" - for RK1808 type power domain.
"include/dt-bindings/power/rk3036-power.h" - for RK3036 type power domain.
"include/dt-bindings/power/rk3128-power.h" - for RK3128 type power domain.
"include/dt-bindings/power/rk3228-power.h" - for RK3228 type power domain.
@@ -35,7 +30,6 @@ Required properties for power domain sub nodes:
"include/dt-bindings/power/rk3366-power.h" - for RK3366 type power domain.
"include/dt-bindings/power/rk3368-power.h" - for RK3368 type power domain.
"include/dt-bindings/power/rk3399-power.h" - for RK3399 type power domain.
"include/dt-bindings/power/rk3568-power.h" - for RK3568 type power domain.
- clocks (optional): phandles to clocks which need to be enabled while power domain
switches state.
- pm_qos (optional): phandles to qos blocks which need to be saved and restored
@@ -108,8 +102,6 @@ containing a phandle to the power device node and an index specifying which
power domain to use.
The index should use macros in:
"include/dt-bindings/power/px30-power.h" - for px30 type power domain.
"include/dt-bindings/power/rv1126-power.h" - for RV1126 type power domain.
"include/dt-bindings/power/rk1808-power.h" - for rk1808 type power domain.
"include/dt-bindings/power/rk3036-power.h" - for rk3036 type power domain.
"include/dt-bindings/power/rk3128-power.h" - for rk3128 type power domain.
"include/dt-bindings/power/rk3128-power.h" - for rk3228 type power domain.
@@ -118,7 +110,6 @@ The index should use macros in:
"include/dt-bindings/power/rk3366-power.h" - for rk3366 type power domain.
"include/dt-bindings/power/rk3368-power.h" - for rk3368 type power domain.
"include/dt-bindings/power/rk3399-power.h" - for rk3399 type power domain.
"include/dt-bindings/power/rk3568-power.h" - for rk3568 type power domain.
Example of the node using power domain:

View File

@@ -3,11 +3,6 @@
Required properties:
- compatible: "rockchip,pdm"
- "rockchip,px30-pdm"
- "rockchip,rk1808-pdm"
- "rockchip,rk3308-pdm"
- "rockchip,rk3568-pdm"
- "rockchip,rv1126-pdm"
- reg: physical base address of the controller and length of memory mapped
region.
- dmas: DMA specifiers for rx dma. See the DMA client binding,
@@ -17,38 +12,11 @@ Required properties:
- clock-names: should contain following:
- "pdm_hclk": clock for PDM BUS
- "pdm_clk" : clock for PDM controller
- resets: a list of phandle + reset-specifer paris, one for each entry in reset-names.
- reset-names: reset names, should include "pdm-m".
- rockchip,no-dmaengine: This is a boolean property. If present, driver will do not
register pcm dmaengine, only just register dai. if the dai is part of multi-dais,
the property should be present. Please refer to rockchip,multidais.txt about
multi-dais usage.
- pinctrl-names: Must contain a "default" entry.
- pinctrl-N: One property must exist for each entry in
pinctrl-names. See ../pinctrl/pinctrl-bindings.txt
for details of the property values.
Optional properties:
- rockchip,path-map: This is a variable length array, that shows the mapping
of SDIx to PATHx. By default, they are one-to-one mapping as follows:
path0 <-- sdi0
path1 <-- sdi1
path2 <-- sdi2
path3 <-- sdi3
e.g. "rockchip,path-map = <3 2 1 0>" means the mapping as follows:
path0 <-- sdi3
path1 <-- sdi2
path2 <-- sdi1
path3 <-- sdi0
- rockchip,mclk-calibrate: This is a boolean value, if present, enable
clk calibrate and compenation.
note: the corresponding property 'pdm_clk_root' should be assigned.
Example for rk3328 PDM controller:
pdm: pdm@ff040000 {
@@ -71,12 +39,3 @@ pdm: pdm@ff040000 {
&pdmm0_sdi2_sleep
&pdmm0_sdi3_sleep>;
};
Example for RV1126 PDM controller with mclk-calibrate:
&pdm {
status = "okay";
clocks = <&cru MCLK_PDM>, <&cru HCLK_PDM>, <&cru PLL_CPLL>;
clock-names = "pdm_clk", "pdm_hclk", "pdm_clk_root";
rockchip,mclk-calibrate;
};

View File

@@ -8,18 +8,14 @@ Required properties:
- compatible: should be one of the following:
- "rockchip,rk3066-i2s": for rk3066
- "rockchip,px30-i2s", "rockchip,rk3066-i2s": for px30
- "rockchip,rk1808-i2s", "rockchip,rk3066-i2s": for rk1808
- "rockchip,rk3036-i2s", "rockchip,rk3066-i2s": for rk3036
- "rockchip,rk3128-i2s", "rockchip,rk3066-i2s": for rk3128
- "rockchip,rk3188-i2s", "rockchip,rk3066-i2s": for rk3188
- "rockchip,rk3228-i2s", "rockchip,rk3066-i2s": for rk3228
- "rockchip,rk3288-i2s", "rockchip,rk3066-i2s": for rk3288
- "rockchip,rk3308-i2s", "rockchip,rk3066-i2s": for rk3308
- "rockchip,rk3328-i2s", "rockchip,rk3066-i2s": for rk3328
- "rockchip,rk3366-i2s", "rockchip,rk3066-i2s": for rk3366
- "rockchip,rk3368-i2s", "rockchip,rk3066-i2s": for rk3368
- "rockchip,rk3399-i2s", "rockchip,rk3066-i2s": for rk3399
- "rockchip,rv1126-i2s", "rockchip,rk3066-i2s": for rv1126
- reg: physical base address of the controller and length of memory mapped
region.
- interrupts: should contain the I2S interrupt.
@@ -30,34 +26,14 @@ Required properties:
- clock-names: should contain the following:
- "i2s_hclk": clock for I2S BUS
- "i2s_clk" : clock for I2S controller
- resets: a list of phandle + reset-specifer paris, one for each entry in reset-names.
- reset-names: reset names, should include "reset-m", "reset-h", there may be only
"reset-m" on some platforms.
- rockchip,playback-channels: max playback channels, if not set, 8 channels default.
- rockchip,capture-channels: max capture channels, if not set, 2 channels default.
- rockchip,bclk-fs: configure the i2s bclk fs.
- rockchip,clk-trcm: tx and rx lrck/bclk common use.
- 0: both tx_lrck/bclk and rx_lrck/bclk are used
- 1: only tx_lrck/bclk is used
- 2: only rx_lrck/bclk is used
- rockchip,no-dmaengine: This is a boolean property. If present, driver will do not
register pcm dmaengine, only just register dai. if the dai is part of multi-dais,
the property should be present. Please refer to rockchip,multidais.txt about
multi-dais usage.
- rockchip,playback-only: Specify that the controller only has playback capability.
- rockchip,capture-only: Specify that the controller only has capture capability.
Required properties for controller which support multi channels
playback/capture:
- rockchip,grf: the phandle of the syscon node for GRF register.
Optional properties:
- rockchip,mclk-calibrate: This is a boolean value, if present, enable
clk calibrate and compenation.
note: the corresponding property 'i2s_clk_root' should be assigned.
Example for rk3288 I2S controller:
i2s@ff890000 {
@@ -70,14 +46,4 @@ i2s@ff890000 {
clocks = <&cru HCLK_I2S0>, <&cru SCLK_I2S0>;
rockchip,playback-channels = <8>;
rockchip,capture-channels = <2>;
rockchip,bclk-fs = <64>;
};
Example for RV1126 I2S controller with mclk-calibrate:
&i2s1_2ch {
status = "okay";
clocks = <&cru MCLK_I2S1>, <&cru HCLK_I2S1>, <&cru PLL_CPLL>;
clock-names = "i2s_clk", "i2s_hclk", "i2s_clk_root";
rockchip,mclk-calibrate;
};

View File

@@ -15,7 +15,6 @@ Required properties:
- "rockchip,rk3366-spdif"
- "rockchip,rk3368-spdif"
- "rockchip,rk3399-spdif"
- "rockchip,rk3568-spdif"
- reg: physical base address of the controller and length of memory mapped
region.
- interrupts: should contain the SPDIF interrupt.

View File

@@ -33,8 +33,6 @@ Optional properties:
2: Scale current by 1.0
3: Scale current by 1.5
- spk-con-gpio: speaker amplifier enable/disable control
Pins on the device (for linking into audio routes) for RT5651:
* DMIC L1

Some files were not shown because too many files have changed in this diff Show More