Merge tag 'docs-5.7' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "This has been a busy cycle for documentation work.

  Highlights include:

   - Lots of RST conversion work by Mauro, Daniel Almeida, and others.
     Maybe someday we'll get to the end of this stuff...maybe...

   - Some organizational work to bring some order to the core-api
     manual.

   - Various new docs and additions to the existing documentation.

   - Typo fixes, warning fixes, ..."

* tag 'docs-5.7' of git://git.lwn.net/linux: (123 commits)
  Documentation: x86: exception-tables: document CONFIG_BUILDTIME_TABLE_SORT
  MAINTAINERS: adjust to filesystem doc ReST conversion
  docs: deprecated.rst: Add BUG()-family
  doc: zh_CN: add translation for virtiofs
  doc: zh_CN: index files in filesystems subdirectory
  docs: locking: Drop :c:func: throughout
  docs: locking: Add 'need' to hardirq section
  docs: conf.py: avoid thousands of duplicate label warning on Sphinx
  docs: prevent warnings due to autosectionlabel
  docs: fix reference to core-api/namespaces.rst
  docs: fix pointers to io-mapping.rst and io_ordering.rst files
  Documentation: Better document the softlockup_panic sysctl
  docs: hw-vuln: tsx_async_abort.rst: get rid of an unused ref
  docs: perf: imx-ddr.rst: get rid of a warning
  docs: filesystems: fuse.rst: supress a Sphinx warning
  docs: translations: it: avoid duplicate refs at programming-language.rst
  docs: driver.rst: supress two ReSt warnings
  docs: trace: events.rst: convert some new stuff to ReST format
  Documentation: Add io_ordering.rst to driver-api manual
  Documentation: Add io-mapping.rst to driver-api manual
  ...
This commit is contained in:
Linus Torvalds
2020-03-30 12:45:23 -07:00
141 changed files with 4535 additions and 3257 deletions

@@ -272,8 +272,8 @@ STA information lifetime rules
.. kernel-doc:: net/mac80211/sta_info.c
:doc: STA information lifetime rules
Aggregation
===========
Aggregation Functions
=====================
.. kernel-doc:: net/mac80211/sta_info.h
:functions: sta_ampdu_mlme
@@ -284,8 +284,8 @@ Aggregation
.. kernel-doc:: net/mac80211/sta_info.h
:functions: tid_ampdu_rx
Synchronisation
===============
Synchronisation Functions
=========================
TBD

@@ -5,8 +5,8 @@ DMAEngine documentation
DMAEngine documentation provides documents for various aspects of DMAEngine
framework.
DMAEngine documentation
-----------------------
DMAEngine development documentation
-----------------------------------
This book documents the DMAEngine internal APIs and serves as a guide for
DMAEngine device driver writers.

@@ -210,7 +210,7 @@ probed.
While the typical use case for sync_state() is to have the kernel cleanly take
over management of devices from the bootloader, the usage of sync_state() is
not restricted to that. Use it whenever it makes sense to take an action after
all the consumers of a device have probed.
all the consumers of a device have probed::
int (*remove) (struct device *dev);

@@ -1,58 +0,0 @@
.. SPDX-License-Identifier: GPL-2.0
====
EDID
====
In the good old days when graphics parameters were configured explicitly
in a file called xorg.conf, even broken hardware could be managed.
Today, with the advent of Kernel Mode Setting, a graphics board is
either correctly working because all components follow the standards -
or the computer is unusable, because the screen remains dark after
booting or it displays the wrong area. Cases when this happens are:
- The graphics board does not recognize the monitor.
- The graphics board is unable to detect any EDID data.
- The graphics board incorrectly forwards EDID data to the driver.
- The monitor sends no or bogus EDID data.
- A KVM sends its own EDID data instead of querying the connected monitor.
Adding the kernel parameter "nomodeset" helps in most cases, but causes
restrictions later on.
As a remedy for such situations, the kernel configuration item
CONFIG_DRM_LOAD_EDID_FIRMWARE was introduced. It allows to provide an
individually prepared or corrected EDID data set in the /lib/firmware
directory from where it is loaded via the firmware interface. The code
(see drivers/gpu/drm/drm_edid_load.c) contains built-in data sets for
commonly used screen resolutions (800x600, 1024x768, 1280x1024, 1600x1200,
1680x1050, 1920x1080) as binary blobs, but the kernel source tree does
not contain code to create these data. In order to elucidate the origin
of the built-in binary EDID blobs and to facilitate the creation of
individual data for a specific misbehaving monitor, commented sources
and a Makefile environment are given here.
To create binary EDID and C source code files from the existing data
material, simply type "make".
If you want to create your own EDID file, copy the file 1024x768.S,
replace the settings with your own data and add a new target to the
Makefile. Please note that the EDID data structure expects the timing
values in a different way as compared to the standard X11 format.
X11:
HTimings:
hdisp hsyncstart hsyncend htotal
VTimings:
vdisp vsyncstart vsyncend vtotal
EDID::
#define XPIX hdisp
#define XBLANK htotal-hdisp
#define XOFFSET hsyncstart-hdisp
#define XPULSE hsyncend-hsyncstart
#define YPIX vdisp
#define YBLANK vtotal-vdisp
#define YOFFSET vsyncstart-vdisp
#define YPULSE vsyncend-vsyncstart

@@ -17,6 +17,7 @@ available subsections can be seen below.
driver-model/index
basics
infrastructure
ioctl
early-userspace/index
pm/index
clk
@@ -74,11 +75,12 @@ available subsections can be seen below.
connector
console
dcdbas
edid
eisa
ipmb
isa
isapnp
io-mapping
io_ordering
generic-counter
lightnvm-pblk
memory-devices/index

@@ -0,0 +1,97 @@
========================
The io_mapping functions
========================
API
===
The io_mapping functions in linux/io-mapping.h provide an abstraction for
efficiently mapping small regions of an I/O device to the CPU. The initial
usage is to support the large graphics aperture on 32-bit processors where
ioremap_wc cannot be used to statically map the entire aperture to the CPU
as it would consume too much of the kernel address space.
A mapping object is created during driver initialization using::
struct io_mapping *io_mapping_create_wc(unsigned long base,
unsigned long size)
'base' is the bus address of the region to be made
mappable, while 'size' indicates how large a mapping region to
enable. Both are in bytes.
This _wc variant provides a mapping which may only be used
with io_mapping_map_atomic_wc or io_mapping_map_wc.
With this mapping object, individual pages can be mapped either atomically
or not, depending on the necessary scheduling environment. Of course, atomic
maps are more efficient::
void *io_mapping_map_atomic_wc(struct io_mapping *mapping,
unsigned long offset)
'offset' is the offset within the defined mapping region.
Accessing addresses beyond the region specified in the
creation function yields undefined results. Using an offset
which is not page aligned yields an undefined result. The
return value points to a single page in CPU address space.
This _wc variant returns a write-combining map to the
page and may only be used with mappings created by
io_mapping_create_wc.
Note that the task may not sleep while holding this page
mapped.
::
void io_mapping_unmap_atomic(void *vaddr)
'vaddr' must be the value returned by the last
io_mapping_map_atomic_wc call. This unmaps the specified
page and allows the task to sleep once again.
If you need to sleep while holding the page mapped, you can use the
non-atomic variants, although they may be significantly slower.
::
void *io_mapping_map_wc(struct io_mapping *mapping,
unsigned long offset)
This works like io_mapping_map_atomic_wc except it allows
the task to sleep while holding the page mapped.
::
void io_mapping_unmap(void *vaddr)
This works like io_mapping_unmap_atomic, except it is used
for pages mapped with io_mapping_map_wc.
At driver close time, the io_mapping object must be freed::
void io_mapping_free(struct io_mapping *mapping)
Current Implementation
======================
The initial implementation of these functions uses existing mapping
mechanisms and so provides only an abstraction layer and no new
functionality.
On 64-bit processors, io_mapping_create_wc calls ioremap_wc for the whole
range, creating a permanent kernel-visible mapping to the resource. The
map_atomic and map functions add the requested offset to the base of the
virtual address returned by ioremap_wc.
On 32-bit processors with HIGHMEM defined, io_mapping_map_atomic_wc uses
kmap_atomic_pfn to map the specified page in an atomic fashion;
kmap_atomic_pfn isn't really supposed to be used with device pages, but it
provides an efficient mapping for this usage.
On 32-bit processors without HIGHMEM defined, io_mapping_map_atomic_wc and
io_mapping_map_wc both use ioremap_wc, a terribly inefficient function which
performs an IPI to inform all processors about the new mapping. This results
in a significant performance penalty.

@@ -0,0 +1,51 @@
==============================================
Ordering I/O writes to memory-mapped addresses
==============================================
On some platforms, so-called memory-mapped I/O is weakly ordered. On such
platforms, driver writers are responsible for ensuring that I/O writes to
memory-mapped addresses on their device arrive in the order intended. This is
typically done by reading a 'safe' device or bridge register, causing the I/O
chipset to flush pending writes to the device before any reads are posted. A
driver would usually use this technique immediately prior to the exit of a
critical section of code protected by spinlocks. This would ensure that
subsequent writes to I/O space arrived only after all prior writes (much like a
memory barrier op, mb(), only with respect to I/O).
A more concrete example from a hypothetical device driver::
...
CPU A: spin_lock_irqsave(&dev_lock, flags)
CPU A: val = readl(my_status);
CPU A: ...
CPU A: writel(newval, ring_ptr);
CPU A: spin_unlock_irqrestore(&dev_lock, flags)
...
CPU B: spin_lock_irqsave(&dev_lock, flags)
CPU B: val = readl(my_status);
CPU B: ...
CPU B: writel(newval2, ring_ptr);
CPU B: spin_unlock_irqrestore(&dev_lock, flags)
...
In the case above, the device may receive newval2 before it receives newval,
which could cause problems. Fixing it is easy enough though::
...
CPU A: spin_lock_irqsave(&dev_lock, flags)
CPU A: val = readl(my_status);
CPU A: ...
CPU A: writel(newval, ring_ptr);
CPU A: (void)readl(safe_register); /* maybe a config register? */
CPU A: spin_unlock_irqrestore(&dev_lock, flags)
...
CPU B: spin_lock_irqsave(&dev_lock, flags)
CPU B: val = readl(my_status);
CPU B: ...
CPU B: writel(newval2, ring_ptr);
CPU B: (void)readl(safe_register); /* maybe a config register? */
CPU B: spin_unlock_irqrestore(&dev_lock, flags)
Here, the reads from safe_register will cause the I/O chipset to flush any
pending writes before actually posting the read to the chipset, preventing
possible data corruption.

@@ -0,0 +1,253 @@
======================
ioctl based interfaces
======================
ioctl() is the most common way for applications to interface
with device drivers. It is flexible and easily extended by adding new
commands and can be passed through character devices, block devices as
well as sockets and other special file descriptors.
However, it is also very easy to get ioctl command definitions wrong,
and hard to fix them later without breaking existing applications,
so this documentation tries to help developers get it right.
Command number definitions
==========================
The command number, or request number, is the second argument passed to
the ioctl system call. While this can be any 32-bit number that uniquely
identifies an action for a particular driver, there are a number of
conventions around defining them.
``include/uapi/asm-generic/ioctl.h`` provides four macros for defining
ioctl commands that follow modern conventions: ``_IO``, ``_IOR``,
``_IOW``, and ``_IOWR``. These should be used for all new commands,
with the correct parameters:
_IO/_IOR/_IOW/_IOWR
The macro name specifies how the argument will be used.  It may be a
pointer to data to be passed into the kernel (_IOW), out of the kernel
(_IOR), or both (_IOWR).  _IO can indicate either commands with no
argument or those passing an integer value instead of a pointer.
It is recommended to only use _IO for commands without arguments,
and use pointers for passing data.
type
An 8-bit number, often a character literal, specific to a subsystem
or driver, and listed in :doc:`../userspace-api/ioctl/ioctl-number`
nr
An 8-bit number identifying the specific command, unique for a given
value of 'type'
data_type
The name of the data type pointed to by the argument. The command number
encodes the ``sizeof(data_type)`` value in a 13-bit or 14-bit integer,
leading to a limit of 8191 bytes for the maximum size of the argument.
Note: do not pass sizeof(data_type) into _IOR/_IOW/_IOWR, as that
will lead to encoding sizeof(sizeof(data_type)), i.e. sizeof(size_t).
_IO does not have a data_type parameter.
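As an illustration of the four macros, the sketch below defines a small
command set for a made-up driver. The ``foo_`` names, the ``'f'`` type value
and the structure are hypothetical and only serve the example; a real driver
would pick a type value listed in the ioctl-number document. The decoding
macros ``_IOC_SIZE()``, ``_IOC_TYPE()``, ``_IOC_NR()`` and ``_IOC_DIR()`` come
from the same header and are used here only to show what the command number
encodes.

```c
#include <assert.h>
#include <linux/ioctl.h>	/* _IO, _IOR, _IOW, _IOWR and the _IOC_* decoders */
#include <linux/types.h>	/* __u32, __u64 */

/* Hypothetical argument structure; every member is naturally aligned. */
struct foo_config {
	__u32 mode;
	__u32 flags;
	__u64 buffer_size;
};

/* Hypothetical 'type' value, chosen only for this example. */
#define FOO_IOC_MAGIC		'f'

/* No argument: plain _IO. */
#define FOO_IOC_RESET		_IO(FOO_IOC_MAGIC, 0)
/* Kernel writes the structure, user space reads it: _IOR. */
#define FOO_IOC_GET_CONFIG	_IOR(FOO_IOC_MAGIC, 1, struct foo_config)
/* User space writes the structure, kernel reads it: _IOW. */
#define FOO_IOC_SET_CONFIG	_IOW(FOO_IOC_MAGIC, 2, struct foo_config)
/* Data flows in both directions: _IOWR. */
#define FOO_IOC_XCHG_CONFIG	_IOWR(FOO_IOC_MAGIC, 3, struct foo_config)

/* The macros take the type name itself, never sizeof(type). */
static_assert(sizeof(struct foo_config) == 16,
	      "foo_config is expected to have no implicit padding");

/* Return the argument size that is encoded in a command number. */
static inline unsigned int foo_ioc_arg_size(unsigned int cmd)
{
	return _IOC_SIZE(cmd);
}
```

Passing ``struct foo_config`` rather than ``sizeof(struct foo_config)`` is what
makes the encoded size come out as 16 here instead of ``sizeof(size_t)``.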
Interface versions
==================
Some subsystems use version numbers in data structures to overload
commands with different interpretations of the argument.
This is generally a bad idea, since changes to existing commands tend
to break existing applications.
A better approach is to add a new ioctl command with a new number. The
old command still needs to be implemented in the kernel for compatibility,
but this can be a wrapper around the new implementation.
Return code
===========
ioctl commands can return negative error codes as documented in errno(3);
these get turned into errno values in user space. On success, the return
code should be zero. It is also possible but not recommended to return
a positive 'long' value.
When the ioctl callback is called with an unknown command number, the
handler returns either -ENOTTY or -ENOIOCTLCMD, which also results in
-ENOTTY being returned from the system call. Some subsystems return
-ENOSYS or -EINVAL here for historic reasons, but this is wrong.
Prior to Linux 5.5, compat_ioctl handlers were required to return
-ENOIOCTLCMD in order to use the fallback conversion into native
commands. As all subsystems are now responsible for handling compat
mode themselves, this is no longer needed, but it may be important to
consider when backporting bug fixes to older kernels.
Timestamps
==========
Traditionally, timestamps and timeout values are passed as ``struct
timespec`` or ``struct timeval``, but these are problematic because of
incompatible definitions of these structures in user space after the
move to 64-bit time_t.
The ``struct __kernel_timespec`` type can be used instead to be embedded
in other data structures when separate second/nanosecond values are
desired, or passed to user space directly. This is still not ideal though,
as the structure matches neither the kernel's timespec64 nor the user
space timespec exactly. The get_timespec64() and put_timespec64() helper
functions can be used to ensure that the layout remains compatible with
user space and the padding is treated correctly.
As it is cheap to convert seconds to nanoseconds, but the opposite
requires an expensive 64-bit division, a plain __u64 nanosecond value
can be simpler and more efficient.
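The asymmetry between the two conversions can be seen in a small user-space
sketch. The helper names ``foo_to_ns()`` and ``foo_from_ns()`` are invented
for this illustration and are not kernel interfaces; the point is only that
one direction needs a multiplication while the other needs a 64-bit division
and remainder.

```c
#include <assert.h>
#include <stdint.h>

#define FOO_NSEC_PER_SEC	1000000000ULL

/* A nanosecond remainder always fits into 32 bits. */
static_assert(FOO_NSEC_PER_SEC - 1 <= UINT32_MAX,
	      "nanosecond remainder must fit in 32 bits");

/* Cheap direction: one multiplication and one addition. */
static inline uint64_t foo_to_ns(uint64_t sec, uint32_t nsec)
{
	return sec * FOO_NSEC_PER_SEC + nsec;
}

/* Expensive direction: a 64-bit division and a 64-bit remainder. */
static inline void foo_from_ns(uint64_t ns, uint64_t *sec, uint32_t *nsec)
{
	*sec = ns / FOO_NSEC_PER_SEC;
	*nsec = (uint32_t)(ns % FOO_NSEC_PER_SEC);
}
```

An interface that carries a single 64-bit nanosecond count lets the side that
actually needs separate seconds and nanoseconds pay for the division.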
Timeout values and timestamps should ideally use CLOCK_MONOTONIC time,
as returned by ktime_get_ns() or ktime_get_ts64(). Unlike
CLOCK_REALTIME, this makes the timestamps immune from jumping backwards
or forwards due to leap second adjustments and clock_settime() calls.
ktime_get_real_ns() can be used for CLOCK_REALTIME timestamps that
need to be persistent across a reboot or between multiple machines.
32-bit compat mode
==================
In order to support 32-bit user space running on a 64-bit machine, each
subsystem or driver that implements an ioctl callback handler must also
implement the corresponding compat_ioctl handler.
As long as all the rules for data structures are followed, this is as
easy as setting the .compat_ioctl pointer to a helper function such as
compat_ptr_ioctl() or blkdev_compat_ptr_ioctl().
compat_ptr()
------------
On the s390 architecture, 31-bit user space has ambiguous representations
for data pointers, with the upper bit being ignored. When running such
a process in compat mode, the compat_ptr() helper must be used to
clear the upper bit of a compat_uptr_t and turn it into a valid 64-bit
pointer. On other architectures, this macro only performs a cast to a
``void __user *`` pointer.
In a compat_ioctl() callback, the last argument is an unsigned long,
which can be interpreted as either a pointer or a scalar depending on
the command. If it is a scalar, then compat_ptr() must not be used, to
ensure that the 64-bit kernel behaves the same way as a 32-bit kernel
for arguments with the upper bit set.
The compat_ptr_ioctl() helper can be used in place of a custom
compat_ioctl file operation for drivers that only take arguments that
are pointers to compatible data structures.
Structure layout
----------------
Compatible data structures have the same layout on all architectures,
avoiding all problematic members:
* ``long`` and ``unsigned long`` are the size of a register, so
they can be either 32-bit or 64-bit wide and cannot be used in portable
data structures. Fixed-length replacements are ``__s32``, ``__u32``,
``__s64`` and ``__u64``.
* Pointers have the same problem, in addition to requiring the
use of compat_ptr(). The best workaround is to use ``__u64``
in place of pointers, which requires a cast to ``uintptr_t`` in user
space, and the use of u64_to_user_ptr() in the kernel to convert
it back into a user pointer.
* On the x86-32 (i386) architecture, the alignment of 64-bit variables
is only 32-bit, but they are naturally aligned on most other
architectures including x86-64. This means a structure like::
struct foo {
__u32 a;
__u64 b;
__u32 c;
};
has four bytes of padding between a and b on x86-64, plus another four
bytes of padding at the end, but no padding on i386, and it needs a
compat_ioctl conversion handler to translate between the two formats.
To avoid this problem, all structures should have their members
naturally aligned, or explicit reserved fields added in place of the
implicit padding. The ``pahole`` tool can be used for checking the
alignment.
* On ARM OABI user space, structures are padded to multiples of 32-bit,
making some structs incompatible with modern EABI kernels if they
do not end on a 32-bit boundary.
* On the m68k architecture, struct members are not guaranteed to have an
alignment greater than 16-bit, which is a problem when relying on
implicit padding.
* Bitfields and enums generally work as one would expect them to,
but some properties of them are implementation-defined, so it is better
to avoid them completely in ioctl interfaces.
* ``char`` members can be either signed or unsigned, depending on
the architecture, so the __u8 and __s8 types should be used for 8-bit
integer values, though char arrays are clearer for fixed-length strings.
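Two of the rules above, explicit padding and ``__u64`` in place of pointers,
can be sketched as follows. ``struct foo_fixed`` is a hypothetical rework of
the ``struct foo`` example in which the implicit padding is replaced by
reserved members, and ``struct foo_buf`` with its ``foo_buf_set()`` helper is
likewise made up to show the user-space side of passing a pointer as a
``__u64``. The static assertions are expected to hold on both i386 and x86-64,
which is the property that makes a compat conversion handler unnecessary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/types.h>	/* __u32, __u64 */

/* Hypothetical rework of 'struct foo': padding is spelled out explicitly. */
struct foo_fixed {
	__u32 a;
	__u32 reserved1;	/* was implicit padding before 'b' on x86-64 */
	__u64 b;
	__u32 c;
	__u32 reserved2;	/* was implicit tail padding on x86-64 */
};

/* Same layout whether __u64 is aligned to 4 bytes (i386) or 8 (x86-64). */
static_assert(offsetof(struct foo_fixed, b) == 8, "b must sit at offset 8");
static_assert(offsetof(struct foo_fixed, c) == 16, "c must sit at offset 16");
static_assert(sizeof(struct foo_fixed) == 24, "no implicit padding expected");

/* Hypothetical argument that refers to a user buffer without a pointer member. */
struct foo_buf {
	__u64 data_ptr;		/* user pointer stored as a 64-bit integer */
	__u32 len;
	__u32 reserved;
};

static_assert(sizeof(struct foo_buf) == 16, "no implicit padding expected");

/* User-space side: store the pointer through a cast to uintptr_t. */
static inline void foo_buf_set(struct foo_buf *arg, void *data, __u32 len)
{
	arg->data_ptr = (__u64)(uintptr_t)data;
	arg->len = len;
	arg->reserved = 0;
}
```

On the kernel side, the ``data_ptr`` value would be turned back into a user
pointer with u64_to_user_ptr(), as described in the list above.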
Information leaks
=================
Uninitialized data must not be copied back to user space, as this can
cause an information leak, which can be used to defeat kernel address
space layout randomization (KASLR), helping in an attack.
For this reason (and for compat support) it is best to avoid any
implicit padding in data structures.  Where there is implicit padding
in an existing structure, kernel drivers must be careful to fully
initialize an instance of the structure before copying it to user
space.  This is usually done by calling memset() before assigning to
individual members.
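The memset() pattern can be sketched in user space as shown below. The
structure and the ``foo_info_fill()`` helper are hypothetical; in a driver the
filled structure would then be handed to copy_to_user(), which is outside the
scope of this sketch. The example deliberately uses a structure with implicit
padding to show why assigning the members alone is not enough.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <linux/types.h>	/* __u8, __u64 */

/* Hypothetical legacy structure that contains implicit padding. */
struct foo_info {
	__u8  kind;		/* followed by implicit padding */
	__u64 value;
};

static_assert(offsetof(struct foo_info, value) > sizeof(__u8),
	      "this example expects padding after 'kind'");

/*
 * Clear the whole object first, padding included, and only then assign
 * the individual members.
 */
static inline void foo_info_fill(struct foo_info *info, __u8 kind, __u64 value)
{
	memset(info, 0, sizeof(*info));
	info->kind = kind;
	info->value = value;
}
```

Without the memset() call, the padding bytes after ``kind`` would keep
whatever happened to be in that memory before, and copying the structure out
would expose those bytes.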
Subsystem abstractions
======================
While some device drivers implement their own ioctl function, most
subsystems implement the same command for multiple drivers. Ideally the
subsystem has an .ioctl() handler that copies the arguments from and
to user space, passing them into subsystem specific callback functions
through normal kernel pointers.
This helps in various ways:
* Applications written for one driver are more likely to work for
another one in the same subsystem if there are no subtle differences
in the user space ABI.
* The complexity of user space access and data structure layout is handled
in one place, reducing the potential for implementation bugs.
* It is more likely to be reviewed by experienced developers
that can spot problems in the interface when the ioctl is shared
between multiple drivers than when it is only used in a single driver.
Alternatives to ioctl
=====================
There are many cases in which ioctl is not the best solution for a
problem. Alternatives include:
* System calls are a better choice for a system-wide feature that
is not tied to a physical device or constrained by the file system
permissions of a character device node.
* netlink is the preferred way of configuring any network related
objects through sockets.
* debugfs is used for ad-hoc interfaces for debugging functionality
that does not need to be exposed as a stable interface to applications.
* sysfs is a good way to expose the state of an in-kernel object
that is not tied to a file descriptor.
* configfs can be used for more complex configuration than sysfs.
* A custom file system can provide extra flexibility with a simple
user interface but adds a lot of complexity to the implementation.