Skip to content

Conversation

@blktests-ci
Copy link

@blktests-ci blktests-ci bot commented Jul 10, 2025

Pull request for series with
subject: dm-pcache ��� persistent-memory cache for block devices
version: 2
url: https://patchwork.kernel.org/project/linux-block/list/?series=979565

@blktests-ci
Copy link
Author

blktests-ci bot commented Jul 10, 2025

Upstream branch: 8c2e52e
series: https://patchwork.kernel.org/project/linux-block/list/?series=979565
version: 2

@blktests-ci
Copy link
Author

blktests-ci bot commented Jul 10, 2025

Upstream branch: bc9ff19
series: https://patchwork.kernel.org/project/linux-block/list/?series=979565
version: 2

@blktests-ci blktests-ci bot force-pushed the series/975154=>linus-master branch from 5f35bc7 to 72a991a Compare July 10, 2025 17:10
@blktests-ci blktests-ci bot force-pushed the linus-master_base branch from 9a69d4f to e311dd9 Compare July 11, 2025 01:13
@blktests-ci
Copy link
Author

blktests-ci bot commented Jul 11, 2025

Upstream branch: bc9ff19
series: https://patchwork.kernel.org/project/linux-block/list/?series=979565
version: 2

@blktests-ci blktests-ci bot force-pushed the series/975154=>linus-master branch from 72a991a to b554731 Compare July 11, 2025 01:19
@blktests-ci blktests-ci bot force-pushed the linus-master_base branch from e311dd9 to b6b569e Compare July 11, 2025 07:05
@blktests-ci
Copy link
Author

blktests-ci bot commented Jul 11, 2025

Upstream branch: bc9ff19
series: https://patchwork.kernel.org/project/linux-block/list/?series=979565
version: 2

@blktests-ci blktests-ci bot force-pushed the series/975154=>linus-master branch from b554731 to 19dc21a Compare July 11, 2025 07:12
@blktests-ci blktests-ci bot force-pushed the linus-master_base branch from b6b569e to ef2c9cd Compare July 11, 2025 17:45
@blktests-ci
Copy link
Author

blktests-ci bot commented Jul 11, 2025

Upstream branch: 40f92e7
series: https://patchwork.kernel.org/project/linux-block/list/?series=979565
version: 2

@blktests-ci blktests-ci bot force-pushed the series/975154=>linus-master branch from 19dc21a to b1548d9 Compare July 11, 2025 17:52
@blktests-ci blktests-ci bot force-pushed the linus-master_base branch from ef2c9cd to 198825c Compare July 11, 2025 23:25
@blktests-ci
Copy link
Author

blktests-ci bot commented Jul 11, 2025

Upstream branch: 40f92e7
series: https://patchwork.kernel.org/project/linux-block/list/?series=979565
version: 2

@blktests-ci blktests-ci bot force-pushed the series/975154=>linus-master branch from b1548d9 to d5adcfc Compare July 11, 2025 23:33
@blktests-ci blktests-ci bot force-pushed the linus-master_base branch from 198825c to 341e7ed Compare July 14, 2025 02:27
@blktests-ci blktests-ci bot force-pushed the linus-master_base branch from 341e7ed to 81f31a4 Compare July 23, 2025 02:05
Consolidate common PCACHE helpers into a new header so that subsequent
patches can include them without repeating boiler-plate.

- Logging macros with unified prefix and location info.
- Common constants (KB/MB helpers, metadata replica count, CRC seed).
- On-disk metadata header definition and CRC helper.
- Sequence-number comparison that handles wrap-around.
- pcache_meta_find_latest() to pick the newest valid metadata copy.

Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
This patch introduces *backing_dev.{c,h}*, a self-contained layer that
handles all interaction with the *backing block device* where cache
write-back and cache-miss reads are serviced.  Isolating this logic
keeps the core dm-pcache code free of low-level bio plumbing.

* Device setup / teardown
  - Opens the target with `dm_get_device()`, stores `bdev`, file and
    size, and initialises a dedicated `bioset`.
  - Gracefully releases resources via `backing_dev_stop()`.

* Request object (`struct pcache_backing_dev_req`)
  - Two request flavours:
    - REQ-type – cloned from an upper `struct bio` issued to
      dm-pcache; trimmed and re-targeted to the backing LBA.
    - KMEM-type – maps an arbitrary kernel memory buffer
      into a freshly built.
  - Private completion callback (`end_req`) propagates status to the
    upper layer and handles resource recycling.

* Submission & completion path
  - Lock-protected submit queue + worker (`req_submit_work`) let pcache
    push many requests asynchronously, at the same time, allow caller
    to submit backing_dev_req in atomic context.
  - End-io handler moves finished requests to a completion list processed
    by `req_complete_work`, ensuring callbacks run in process context.
  - Direct-submit option for non-atomic context.

* Flush
  - `backing_dev_flush()` issues a flush to persist backing-device data.

Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Add cache_dev.{c,h} to manage the persistent-memory device that stores
all pcache metadata and data segments.  Splitting this logic out keeps
the main dm-pcache code focused on policy while cache_dev handles the
low-level interaction with the DAX block device.

* DAX mapping
  - Opens the underlying device via dm_get_device().
  - Uses dax_direct_access() to obtain a direct linear mapping; falls
    back to vmap() when the range is fragmented.

* On-disk layout
  ┌─ 4 KB ─┐  super-block (SB)
  ├─ 4 KB ─┤  cache_info[0]
  ├─ 4 KB ─┤  cache_info[1]
  ├─ 4 KB ─┤  cache_ctrl
  └─ ...  ─┘  segments
  Constants and macros in the header expose offsets and sizes.

* Super-block handling
  - sb_read(), sb_validate(), sb_init() verify magic, CRC32 and host
    endianness (flag *PCACHE_SB_F_BIGENDIAN*).
  - Formatting zeroes the metadata replicas and initialises the segment
    bitmap when the SB is blank.

* Segment allocator
  - Bitmap protected by seg_lock; find_next_zero_bit() yields the next
    free 16 MB segment.

* Lifecycle helpers
  - cache_dev_start()/stop() encapsulate init/exit and are invoked by
    dm-pcache core.
  - Gracefully handles errors: CRC mismatch, wrong endianness, device
    too small (< 512 MB), or failed DAX mapping.

Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Introduce segment.{c,h}, an internal abstraction that encapsulates
everything related to a single pcache *segment* (the fixed-size
allocation unit stored on the cache-device).

* On-disk metadata (`struct pcache_segment_info`)
  - Embedded `struct pcache_meta_header` for CRC/sequence handling.
  - `flags` field encodes a “has-next” bit and a 4-bit *type* class
    (`CACHE_DATA` added as the first type).

* Initialisation
  - `pcache_segment_init()` populates the in-memory
    `struct pcache_segment` from a given segment id, data offset and
    metadata pointer, computing the usable `data_size` and virtual
    address within the DAX mapping.

* IO helpers
  - `segment_copy_to_bio()` / `segment_copy_from_bio()` move data
    between pmem and a bio, using `_copy_mc_to_iter()` and
    `_copy_from_iter_flushcache()` to tolerate hw memory errors and
    ensure durability.
  - `segment_pos_advance()` advances an internal offset while staying
    inside the segment’s data area.

These helpers allow upper layers (cache key management, write-back
logic, GC, etc.) to treat a segment as a contiguous byte array without
knowing about DAX mappings or persistence details.

Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Introduce *cache_segment.c*, the in-memory/on-disk glue that lets a
`struct pcache_cache` manage its array of data segments.

* Metadata handling
  - Loads the most-recent replica of both the segment-info block
    (`struct pcache_segment_info`) and per-segment generation counter
    (`struct pcache_cache_seg_gen`) using `pcache_meta_find_latest()`.
  - Updates those structures atomically with CRC + sequence rollover,
    writing alternately to the two metadata slots inside each segment.

* Segment initialisation (`cache_seg_init`)
  - Builds a `struct pcache_segment` pointing to the segment’s data
    area, sets up locks, generation counters, and, when formatting a new
    cache, zeroes the on-segment kset header.

* Linked-list of segments
  - `cache_seg_set_next_seg()` stores the *next* segment id in
    `seg_info->next_seg` and sets the HAS_NEXT flag, allowing a cache to
    span multiple segments. This is important to allow other type of
    segment added in future.

* Runtime life-cycle
  - Reference counting (`cache_seg_get/put`) with invalidate-on-last-put
    that clears the bitmap slot and schedules cleanup work.
  - Generation bump (`cache_seg_gen_increase`) persists a new generation
    record whenever the segment is modified.

* Allocator
  - `get_cache_segment()` uses a bitmap and per-cache hint to pick the
    next free segment, retrying with micro-delays when none are
    immediately available.

Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Introduce cache_writeback.c, which implements the asynchronous write-back
path for pcache.  The new file is responsible for detecting dirty data,
organising it into an in-memory tree, issuing bios to the backing block
device, and advancing the cache’s *dirty tail* pointer once data has
been safely persisted.

* Dirty-state detection
  - `__is_cache_clean()` reads the kset header at `dirty_tail`, checks
    magic and CRC, and thus decides whether there is anything to flush.

* Write-back scheduler
  - `cache_writeback_work` is queued on the cache task-workqueue and
    re-arms itself at `PCACHE_CACHE_WRITEBACK_INTERVAL`.
  - Uses an internal spin-protected `writeback_key_tree` to batch keys
    belonging to the same stripe before IO.

* Key processing
  - `cache_kset_insert_tree()` decodes each key inside the on-media
    kset, allocates an in-memory key object, and inserts it into the
    writeback_key_tree.
  - `cache_key_writeback()` builds a *KMEM-type* backing request that
    maps the persistent-memory range directly into a WRITE bio and
    submits it with `submit_bio_noacct()`.
  - After all keys from the writeback_key_tree have been flushed,
    `backing_dev_flush()` issues a single FLUSH to ensure durability.

* Tail advancement
  - Once a kset is written back, `cache_pos_advance()` moves
    `cache->dirty_tail` by the exact on-disk size and the new position is
    persisted via `cache_encode_dirty_tail()`.
  - When the `PCACHE_KSET_FLAGS_LAST` flag is seen, the write-back
    engine switches to the next segment indicated by `next_cache_seg_id`.

Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Introduce cache_gc.c, a self-contained engine that reclaims cache
segments whose data have already been flushed to the backing device.
Running in the cache workqueue, the GC keeps segment usage below the
user-configurable *cache_gc_percent* threshold.

* need_gc() – decides when to trigger GC by checking:
  - *dirty_tail* vs *key_tail* position,
  - kset integrity (magic + CRC),
  - bitmap utilisation against the gc-percent threshold.

* Per-key reclamation
  - Decodes each key in the target kset (`cache_key_decode()`).
  - Drops the segment reference with `cache_seg_put()`, allowing the
    segment to be invalidated once all keys are gone.
  - When the reference count hits zero the segment is cleared from
    `seg_map`, making it immediately reusable by the allocator.

* Scheduling
  - `pcache_cache_gc_fn()` loops until no more work is needed, then
    re-queues itself after *PCACHE_CACHE_GC_INTERVAL*.

Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Add *cache_key.c* which becomes the heart of dm-pcache’s
in-memory index and on-media key-set (“kset”) format.

* Key objects (`struct pcache_cache_key`)
  - Slab-backed allocator & ref-count helpers
  - `cache_key_encode()/decode()` translate between in-memory keys and
    their on-disk representation, validating CRC when
    *cache_data_crc* is enabled.

* Kset construction & persistence
  - Per-kset buffer lives in `struct pcache_cache_kset`; keys are
    appended until full or *force_close* triggers an immediate flush.
  - `cache_kset_close()` writes the kset to the *key_head* segment,
    automatically chaining a *LAST* kset header when rolling over to a
    freshly allocated segment.

* Red-black tree with striping
  - Cache space is divided into *subtrees* to reduce lock
    contention; each subtree owns its own RB-root + spinlock.
  - Complex overlap-resolution logic (`cache_insert_fixup()`) ensures
    newly inserted keys never leave overlapping stale ranges behind
    (head/tail/contain/contained cases handled).

* Replay on start-up
  - `cache_replay()` walks from *key_tail* to *key_head*, re-hydrates
    keys, validates CRC/magic, seamlessly
    skipping placeholder “empty” keys left by read-misses.

* Background maintenance
  - `clean_work` lazily prunes invalidated keys after GC.
  - `kset_flush_work` background thread to close a kset.

With this patch dm-pcache can persistently track cached extents, rebuild
its index after crash, and guarantee non-overlapping key space – paving
the way for functional read/write caching.

Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Introduce cache_req.c, the high-level engine that
drives I/O requests through dm-pcache. It decides whether data is served
from the cache or fetched from the backing device, allocates new cache
space on writes, and flushes dirty ksets when required.

* Read path
  - Traverses the striped RB-trees to locate cached extents.
  - Generates backing READ requests for gaps and inserts placeholder
    “empty” keys to avoid duplicate fetches.
  - Copies valid data directly from pmem into the caller’s bio; CRC and
    generation checks guard against stale segments.

* Write path
  - Allocates space in the current data segment via cache_data_alloc().
  - Copies data from the bio into pmem, then inserts or updates keys,
    splitting or trimming overlapped ranges as needed.
  - Adds each new key to the active kset; forces kset close when FUA is
    requested or the kset is full.

* Miss handling
  - create_cache_miss_req() builds a backing READ, optionally attaching
    an empty key.
  - miss_read_end_req() replaces the placeholder with real data once the
    READ completes, or deletes it on error.

* Flush support
  - cache_flush() iterates over all ksets and forces them to close,
    ensuring data durability when REQ_PREFLUSH is received.

Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Add cache.c and cache.h that introduce the top-level
“struct pcache_cache”. This object glues together the backing block
device, the persistent-memory cache device, segment array, RB-tree
indexes, and the background workers for write-back and garbage
collection.

* Persistent metadata
  - pcache_cache_info tracks options such as cache mode, data-crc flag
    and GC threshold, written atomically with CRC+sequence.
  - key_tail and dirty_tail positions are double-buffered and recovered
    at mount time.

* Segment management
  - kvcalloc()’d array of pcache_cache_segment objects, bitmap for fast
    allocation, refcounts and generation numbers so GC can invalidate
    old extents safely.
  - First segment hosts a pcache_cache_ctrl block shared by all
    threads.

* Request path hooks
  - pcache_cache_handle_req() dispatches READ, WRITE and FLUSH bios to
    the engines added in earlier patches.
  - Per-CPU data_heads support lock-free allocation of space for new
    writes.

* Background workers
  - Delayed work items for write-back (5 s) and GC (5 s).
  - clean_work removes stale keys after segments are reclaimed.

* Lifecycle helpers
  - pcache_cache_start()/stop() bring the cache online, replay keys,
    start workers, and flush everything on shutdown.

With this piece in place dm-pcache has a fully initialised cache object
capable of serving I/O and maintaining its on-disk structures.

Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
Add the top-level integration pieces that make the new persistent-memory
cache target usable from device-mapper:

* Documentation
  - `Documentation/admin-guide/device-mapper/dm-pcache.rst` explains the
    design, table syntax, status fields and runtime messages.

* Core target implementation
  - `dm_pcache.c` and `dm_pcache.h` register the `"pcache"` DM target,
    parse constructor arguments, create workqueues, and forward BIOS to
    the cache core added in earlier patches.
  - Supports flush/FUA, status reporting, and a “gc_percent” message.
  - Dont support discard currently.
  - Dont support table reload for live target currently.

* Device-mapper tables now accept lines like
    pcache <pmem_dev> <backing_dev> writeback <true|false>

Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev>
@blktests-ci
Copy link
Author

blktests-ci bot commented Jul 23, 2025

Upstream branch: 89be9a8
series: https://patchwork.kernel.org/project/linux-block/list/?series=979565
version: 2

@blktests-ci blktests-ci bot force-pushed the series/975154=>linus-master branch from d5adcfc to d6dfc92 Compare July 23, 2025 02:14
@blktests-ci blktests-ci bot force-pushed the linus-master_base branch from 81f31a4 to 87bbbbc Compare July 24, 2025 05:40
@blktests-ci
Copy link
Author

blktests-ci bot commented Jul 24, 2025

Upstream branch: 25fae0b
series: https://patchwork.kernel.org/project/linux-block/list/?series=979565
version: 2

@blktests-ci
Copy link
Author

blktests-ci bot commented Jul 24, 2025

Upstream branch: 25fae0b
series: https://patchwork.kernel.org/project/linux-block/list/?series=979565
version: 2

@blktests-ci
Copy link
Author

blktests-ci bot commented Jul 24, 2025

Github failed to update this PR after force push. Close it.

@blktests-ci blktests-ci bot closed this Jul 24, 2025
@blktests-ci blktests-ci bot deleted the series/975154=>linus-master branch July 31, 2025 04:38
blktests-ci bot pushed a commit that referenced this pull request Aug 4, 2025
The current code that checks for misspelling verifies, in a more
complex regex, if $rawline matches [^\w]($misspellings)[^\w]

Being $rawline a byte-string, a utf-8 character in $rawline can
match the non-word-char [^\w].
E.g.:
	./scripts/checkpatch.pl --git 81c2f05
	WARNING: 'ment' may be misspelled - perhaps 'meant'?
	#36: FILE: MAINTAINERS:14360:
	+M:     Clément Léger <clement.leger@bootlin.com>
	            ^^^^

Use a utf-8 version of $rawline for spell checking.

Link: https://lkml.kernel.org/r/20250616-b4-checkpatch-upstream-v2-1-5600ce4a3b43@foss.st.com
Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
Signed-off-by: Clément Le Goffic <clement.legoffic@foss.st.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
blktests-ci bot pushed a commit that referenced this pull request Aug 31, 2025
After hid_hw_start() is called hidinput_connect() will eventually be
called to set up the device with the input layer since the
HID_CONNECT_DEFAULT connect mask is used. During hidinput_connect()
all input and output reports are processed and corresponding hid_inputs
are allocated and configured via hidinput_configure_usages(). This
process involves slot tagging report fields and configuring usages
by setting relevant bits in the capability bitmaps. However it is possible
that the capability bitmaps are not set at all leading to the subsequent
hidinput_has_been_populated() check to fail leading to the freeing of the
hid_input and the underlying input device.

This becomes problematic because a malicious HID device like a
ASUS ROG N-Key keyboard can trigger the above scenario via a
specially crafted descriptor which then leads to a user-after-free
when the name of the freed input device is written to later on after
hid_hw_start(). Below, report 93 intentionally utilises the
HID_UP_UNDEFINED Usage Page which is skipped during usage
configuration, leading to the frees.

0x05, 0x0D,        // Usage Page (Digitizer)
0x09, 0x05,        // Usage (Touch Pad)
0xA1, 0x01,        // Collection (Application)
0x85, 0x0D,        //   Report ID (13)
0x06, 0x00, 0xFF,  //   Usage Page (Vendor Defined 0xFF00)
0x09, 0xC5,        //   Usage (0xC5)
0x15, 0x00,        //   Logical Minimum (0)
0x26, 0xFF, 0x00,  //   Logical Maximum (255)
0x75, 0x08,        //   Report Size (8)
0x95, 0x04,        //   Report Count (4)
0xB1, 0x02,        //   Feature (Data,Var,Abs)
0x85, 0x5D,        //   Report ID (93)
0x06, 0x00, 0x00,  //   Usage Page (Undefined)
0x09, 0x01,        //   Usage (0x01)
0x15, 0x00,        //   Logical Minimum (0)
0x26, 0xFF, 0x00,  //   Logical Maximum (255)
0x75, 0x08,        //   Report Size (8)
0x95, 0x1B,        //   Report Count (27)
0x81, 0x02,        //   Input (Data,Var,Abs)
0xC0,              // End Collection

Below is the KASAN splat after triggering the UAF:

[   21.672709] ==================================================================
[   21.673700] BUG: KASAN: slab-use-after-free in asus_probe+0xeeb/0xf80
[   21.673700] Write of size 8 at addr ffff88810a0ac000 by task kworker/1:2/54
[   21.673700]
[   21.673700] CPU: 1 UID: 0 PID: 54 Comm: kworker/1:2 Not tainted 6.16.0-rc4-g9773391cf4dd-dirty #36 PREEMPT(voluntary)
[   21.673700] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[   21.673700] Call Trace:
[   21.673700]  <TASK>
[   21.673700]  dump_stack_lvl+0x5f/0x80
[   21.673700]  print_report+0xd1/0x660
[   21.673700]  kasan_report+0xe5/0x120
[   21.673700]  __asan_report_store8_noabort+0x1b/0x30
[   21.673700]  asus_probe+0xeeb/0xf80
[   21.673700]  hid_device_probe+0x2ee/0x700
[   21.673700]  really_probe+0x1c6/0x6b0
[   21.673700]  __driver_probe_device+0x24f/0x310
[   21.673700]  driver_probe_device+0x4e/0x220
[...]
[   21.673700]
[   21.673700] Allocated by task 54:
[   21.673700]  kasan_save_stack+0x3d/0x60
[   21.673700]  kasan_save_track+0x18/0x40
[   21.673700]  kasan_save_alloc_info+0x3b/0x50
[   21.673700]  __kasan_kmalloc+0x9c/0xa0
[   21.673700]  __kmalloc_cache_noprof+0x139/0x340
[   21.673700]  input_allocate_device+0x44/0x370
[   21.673700]  hidinput_connect+0xcb6/0x2630
[   21.673700]  hid_connect+0xf74/0x1d60
[   21.673700]  hid_hw_start+0x8c/0x110
[   21.673700]  asus_probe+0x5a3/0xf80
[   21.673700]  hid_device_probe+0x2ee/0x700
[   21.673700]  really_probe+0x1c6/0x6b0
[   21.673700]  __driver_probe_device+0x24f/0x310
[   21.673700]  driver_probe_device+0x4e/0x220
[...]
[   21.673700]
[   21.673700] Freed by task 54:
[   21.673700]  kasan_save_stack+0x3d/0x60
[   21.673700]  kasan_save_track+0x18/0x40
[   21.673700]  kasan_save_free_info+0x3f/0x60
[   21.673700]  __kasan_slab_free+0x3c/0x50
[   21.673700]  kfree+0xcf/0x350
[   21.673700]  input_dev_release+0xab/0xd0
[   21.673700]  device_release+0x9f/0x220
[   21.673700]  kobject_put+0x12b/0x220
[   21.673700]  put_device+0x12/0x20
[   21.673700]  input_free_device+0x4c/0xb0
[   21.673700]  hidinput_connect+0x1862/0x2630
[   21.673700]  hid_connect+0xf74/0x1d60
[   21.673700]  hid_hw_start+0x8c/0x110
[   21.673700]  asus_probe+0x5a3/0xf80
[   21.673700]  hid_device_probe+0x2ee/0x700
[   21.673700]  really_probe+0x1c6/0x6b0
[   21.673700]  __driver_probe_device+0x24f/0x310
[   21.673700]  driver_probe_device+0x4e/0x220
[...]

Fixes: 9ce12d8 ("HID: asus: Add i2c touchpad support")
Cc: stable@vger.kernel.org
Signed-off-by: Qasim Ijaz <qasdev00@gmail.com>
Link: https://patch.msgid.link/20250810181041.44874-1-qasdev00@gmail.com
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants