Commit Graph

6581 Commits

Author SHA1 Message Date
Ranjan Dutta f9c4e4ce1f This is the 6.1.80 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmXhyaIACgkQONu9yGCS
 aT70JA//Rq81Di6wkFIUi9ZGShriNiGRyBIXBcdltI8WdSDc23QGJ4wDV/HV3t+s
 WD9Rz+/PuQObOVb6dJwK/BQqt/tAmBEA06RePLkAva/hWiTLI39hRYYaGvBFpiZj
 qQlofaaGuVU44GAn+M0iYsUkMlCvhqNO1iN7e+Sv83le1nNJP4rdxosOIhscgc6g
 CmkQBPS+zSjmXwCAVLa1c5Lzs+LNWUBUdoASjaAPguZWl+HpBIfTWAqIPHxc1fiO
 +0f5Yd7Y0K273TTZlckDGo/O8EI79RkKBzcYQzTHkB1agJ7/vJzGuQuPISG3pEqQ
 6eBmQDsBJHS0eOzmUhT2EHW0wO/Qt0A8CMUSQ2/BK54yOqPPgNQlAjklrXWOO4OU
 jXNNNaGSbADIsnN0rFUouRCRvc4XKNF0zjPhPb6DyukOl1m41oAl4tTfMiR4aBi0
 tdZ/UZZ7RzoNkw6p6L7Cmxv/3KbkpBkF778UmqaygM/nG7QpBIeL3to7sUi9fnUU
 jTpQsLoDRCCnBxsHiZUARQ2jroHq3vjEBGQC5XPI4Y9zGTNm+BtR0JZK9a0uU3/O
 ysjRkEQekgR09OxiaaFP1ySuePI9x6//XcuxFKECq+SyP+udQucNWpuRZ5A8B1o7
 x+W5w2Egct1t1VQAjPYqml82BE2prE0gW3sngbfKyXA2fQmPBj0=
 =RyJn
 -----END PGP SIGNATURE-----

Merge tag 'v6.1.80' into lts2022/linux

This is the 6.1.80 stable release

* tag 'v6.1.80': (470 commits)
  Linux 6.1.80
  fs/ntfs3: Enhance the attribute size check
  arp: Prevent overflow in arp_req_get().
  ahci: Extend ASM1061 43-bit DMA address quirk to other ASM106x parts
  ata: ahci: add identifiers for ASM2116 series adapters
  mptcp: add needs_id for netlink appending addr
  mptcp: userspace pm send RM_ADDR for ID 0
  mm: zswap: fix missing folio cleanup in writeback race path
  fs/aio: Restrict kiocb_set_cancel_fn() to I/O submitted via libaio
  mm/damon/reclaim: fix quota stauts loss due to online tunings
  erofs: fix inconsistent per-file compression format
  erofs: simplify compression configuration parser
  i2c: imx: when being a target, mark the last read as processed
  drm/amd/display: Fix memory leak in dm_sw_fini()
  drm/syncobj: call drm_syncobj_fence_add_wait when WAIT_AVAILABLE flag is set
  net: phy: realtek: Fix rtl8211f_config_init() for RTL8211F(D)(I)-VD-CG PHY
  Fix write to cloned skb in ipv6_hop_ioam()
  phonet/pep: fix racy skb_queue_empty() use
  phonet: take correct lock to peek at the RX queue
  net: sparx5: Add spinlock for frame transmission from CPU
  ...
2024-03-22 01:33:06 +08:00
Christian A. Ehrhardt 8fc8087410 block: Fix WARNING in _copy_from_iter
[ Upstream commit 13f3956eb5681a4045a8dfdef48df5dc4d9f58a6 ]

Syzkaller reports a warning in _copy_from_iter because an
iov_iter is supposedly used in the wrong direction. The reason
is that syzcaller managed to generate a request with
a transfer direction of SG_DXFER_TO_FROM_DEV. This instructs
the kernel to copy user buffers into the kernel, read into
the copied buffers and then copy the data back to user space.

Thus the iovec is used in both directions.

Detect this situation in the block layer and construct a new
iterator with the correct direction for the copy-in.

Reported-by: syzbot+a532b03fdfee2c137666@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/0000000000009b92c10604d7a5e9@google.com/t/
Reported-by: syzbot+63dec323ac56c28e644f@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/0000000000003faaa105f6e7c658@google.com/T/
Signed-off-by: Christian A. Ehrhardt <lk@c--e.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240121202634.275068-1-lk@c--e.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01 13:26:25 +01:00
Ranjan Dutta 3d5fcbdb49 Merge branch 'my/v6.1.77' into lts2022/linux
* my/v6.1.77: (1299 commits)
  Revert "drm/vmwgfx: Keep a gem reference to user bos in surfaces"
  Linux 6.1.77
  drm/amdgpu: Fix missing error code in 'gmc_v6/7/8/9_0_hw_init()'
  ASoC: codecs: wsa883x: fix PA volume control
  ASoC: codecs: lpass-wsa-macro: fix compander volume hack
  bonding: remove print in bond_verify_device_path
  gve: Fix use-after-free vulnerability
  LoongArch/smp: Call rcutree_report_cpu_starting() at tlb_init()
  drm/msm/dsi: Enable runtime PM
  Revert "drm/amd/display: Disable PSR-SU on Parade 0803 TCON again"
  mm, kmsan: fix infinite recursion due to RCU critical section
  arm64: irq: set the correct node for shadow call stack
  selftests: bonding: Check initial state
  selftests: team: Add missing config options
  net: sysfs: Fix /sys/class/net/<iface> path
  selftests: net: fix available tunnels detection
  af_unix: fix lockdep positive in sk_diag_dump_icons()
  net: ipv4: fix a memleak in ip_setup_cork
  netfilter: nft_ct: sanitize layer 3 and 4 protocol number in custom expectations
  netfilter: nf_log: replace BUG_ON by WARN_ON_ONCE when putting logger
  ...
2024-02-28 13:19:46 +08:00
Damien Le Moal e7d2e87abc block: fix partial zone append completion handling in req_bio_endio()
[ Upstream commit 748dc0b65ec2b4b7b3dbd7befcc4a54fdcac7988 ]

Partial completions of zone append request is not allowed but if a zone
append completion indicates a number of completed bytes different from
the original BIO size, only the BIO status is set to error. This leads
to bio_advance() not setting the BIO size to 0 and thus to not call
bio_endio() at the end of req_bio_endio().

Make sure a partially completed zone append is failed and completed
immediately by forcing the completed number of bytes (nbytes) to be
equal to the BIO size, thus ensuring that bio_endio() is called.

Fixes: 297db73184 ("block: fix req_bio_endio append error handling")
Cc: stable@kernel.vger.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240110092942.442334-1-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-23 09:12:49 +01:00
Jens Axboe 492e0aba08 block: treat poll queue enter similarly to timeouts
commit 33391eecd6 upstream.

We ran into an issue where a production workload would randomly grind to
a halt and not continue until the pending IO had timed out. This turned
out to be a complicated interaction between queue freezing and polled
IO:

1) You have an application that does polled IO. At any point in time,
   there may be polled IO pending.

2) You have a monitoring application that issues a passthrough command,
   which is marked with side effects such that it needs to freeze the
   queue.

3) Passthrough command is started, which calls blk_freeze_queue_start()
   on the device. At this point the queue is marked frozen, and any
   attempt to enter the queue will fail (for non-blocking) or block.

4) Now the driver calls blk_mq_freeze_queue_wait(), which will return
   when the queue is quiesced and pending IO has completed.

5) The pending IO is polled IO, but any attempt to poll IO through the
   normal iocb_bio_iopoll() -> bio_poll() will fail when it gets to
   bio_queue_enter() as the queue is frozen. Rather than poll and
   complete IO, the polling threads will sit in a tight loop attempting
   to poll, but failing to enter the queue to do so.

The end result is that progress for either application will be stalled
until all pending polled IO has timed out. This causes obvious huge
latency issues for the application doing polled IO, but also long delays
for passthrough command.

Fix this by treating queue enter for polled IO just like we do for
timeouts. This allows quick quiesce of the queue as we still poll and
complete this IO, while still disallowing queueing up new IO.

Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-16 19:06:31 +01:00
Tejun Heo e5dc63f01e blk-iocost: Fix an UBSAN shift-out-of-bounds warning
[ Upstream commit 2a427b49d02995ea4a6ff93a1432c40fa4d36821 ]

When iocg_kick_delay() is called from a CPU different than the one which set
the delay, @now may be in the past of @iocg->delay_at leading to the
following warning:

  UBSAN: shift-out-of-bounds in block/blk-iocost.c:1359:23
  shift exponent 18446744073709 is too large for 64-bit type 'u64' (aka 'unsigned long long')
  ...
  Call Trace:
   <TASK>
   dump_stack_lvl+0x79/0xc0
   __ubsan_handle_shift_out_of_bounds+0x2ab/0x300
   iocg_kick_delay+0x222/0x230
   ioc_rqos_merge+0x1d7/0x2c0
   __rq_qos_merge+0x2c/0x80
   bio_attempt_back_merge+0x83/0x190
   blk_attempt_plug_merge+0x101/0x150
   blk_mq_submit_bio+0x2b1/0x720
   submit_bio_noacct_nocheck+0x320/0x3e0
   __swap_writepage+0x2ab/0x9d0

The underflow itself doesn't really affect the behavior in any meaningful
way; however, the past timestamp may exaggerate the delay amount calculated
later in the code, which shouldn't be a material problem given the nature of
the delay mechanism.

If @now is in the past, this CPU is racing another CPU which recently set up
the delay and there's nothing this CPU can contribute w.r.t. the delay.
Let's bail early from iocg_kick_delay() in such cases.

Reported-by: Breno Leitão <leitao@debian.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 5160a5a53c ("blk-iocost: implement delay adjustment hysteresis")
Link: https://lore.kernel.org/r/ZVvc9L_CYk5LO1fT@slm.duckdns.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-16 19:06:29 +01:00
Ming Lei 1d9c777d3e blk-mq: fix IO hang from sbitmap wakeup race
[ Upstream commit 5266caaf5660529e3da53004b8b7174cab6374ed ]

In blk_mq_mark_tag_wait(), __add_wait_queue() may be re-ordered
with the following blk_mq_get_driver_tag() in case of getting driver
tag failure.

Then in __sbitmap_queue_wake_up(), waitqueue_active() may not observe
the added waiter in blk_mq_mark_tag_wait() and wake up nothing, meantime
blk_mq_mark_tag_wait() can't get driver tag successfully.

This issue can be reproduced by running the following test in loop, and
fio hang can be observed in < 30min when running it on my test VM
in laptop.

	modprobe -r scsi_debug
	modprobe scsi_debug delay=0 dev_size_mb=4096 max_queue=1 host_max_queue=1 submit_queues=4
	dev=`ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/* | head -1 | xargs basename`
	fio --filename=/dev/"$dev" --direct=1 --rw=randrw --bs=4k --iodepth=1 \
       		--runtime=100 --numjobs=40 --time_based --name=test \
        	--ioengine=libaio

Fix the issue by adding one explicit barrier in blk_mq_mark_tag_wait(), which
is just fine in case of running out of tag.

Cc: Jan Kara <jack@suse.cz>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Reported-by: Changhui Zhong <czhong@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20240112122626.4181044-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-05 20:12:59 +00:00
Christoph Hellwig 8ae4201900 block: prevent an integer overflow in bvec_try_merge_hw_page
[ Upstream commit 3f034c374ad55773c12dd8f3c1607328e17c0072 ]

Reordered a check to avoid a possible overflow when adding len to bv_len.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20231204173419.782378-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-05 20:12:53 +00:00
Li Lingfeng 9564767b67 block: Move checking GENHD_FL_NO_PART to bdev_add_partition()
[ Upstream commit 7777f47f2ea64efd1016262e7b59fab34adfb869 ]

Commit 1a721de848 ("block: don't add or resize partition on the disk
with GENHD_FL_NO_PART") prevented all operations about partitions on disks
with GENHD_FL_NO_PART in blkpg_do_ioctl() since they are meaningless.
However, it changed error code in some scenarios. So move checking
GENHD_FL_NO_PART to bdev_add_partition() to eliminate impact.

Fixes: 1a721de848 ("block: don't add or resize partition on the disk with GENHD_FL_NO_PART")
Reported-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Closes: https://lore.kernel.org/all/CAOYeF9VsmqKMcQjo1k6YkGNujwN-nzfxY17N3F-CMikE1tYp+w@mail.gmail.com/
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240118130401.792757-1-lilingfeng@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-31 16:17:11 -08:00
Ranjan Dutta 1258d08042 This is the 6.1.69 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmWDD6YACgkQONu9yGCS
 aT6e3g/9EtbS2fa79p3YHc18vS/Hka3du7ih48MJVpy4XuqCdd9joPTtcG3agD9B
 XQE3Evd7qEDtKJNF91w6m+MtS2UjKYpMKwCtxqEpzf6kLags3lpRNLvW4oMk8pvu
 PqIRATFda5nBtgWM8bnexnvQYCmWH6hUBRHHZ7F4xn3KctEkCfRP6+OvKGxzYWNj
 FfHdvKxzS3akNp+9XV5BEAADcr5irZnvxDeafYgCn3LK5j1XfWNK/5yRixsysLm/
 6ZG4hqpo/9uY7Y84oe0hixcqTt6/9t3cgK8YgGHfEXI0YAlhtcQzMx3388UUuexc
 sYzzuineC7KXExYgyjZqnImVkoutFvoafkNuiWsRsqIG4LOXifcedOUF+d3C3mmN
 Xo66ids1eBots0CNnp+cDhhxMEHQMf2scN1xZDzHk0ZCkc9J7Uyuc/jFHYh07uRa
 awrbf5Q25yfR/94Xao6ALb4RnuSWxnulTY29ERF3edRTppsSzO3tljaWf8xwJiAf
 gKXxiSTV4hVrpyCFifITm41bj6B8hPsnqv+ipEwn2jP2kBjPhnI8z8GPtfwJXLgD
 NJYGA9+5NprX8yzHo+2012CyecNPw92QN9wu22KoM0o44yjRIsFxpx0Jf1UDsTQV
 3WHiLG8XoCCxTaM1u3Wjj5k6TBkXmK01aX04jeDH9WA/alSb580=
 =/f+L
 -----END PGP SIGNATURE-----

Merge tag 'v6.1.69' into lts2022/linux

This is the 6.1.69 stable release

# gpg verification failed.

# By Steven Rostedt (Google) (13) and others
# Via Greg Kroah-Hartman
* tag 'v6.1.69': (411 commits)
  Linux 6.1.69
  r8152: fix the autosuspend doesn't work
  r8152: remove rtl_vendor_mode function
  r8152: avoid to change cfg for all devices
  net: tls, update curr on splice as well
  ring-buffer: Have rb_time_cmpxchg() set the msb counter too
  ring-buffer: Do not try to put back write_stamp
  ring-buffer: Fix a race in rb_time_cmpxchg() for 32 bit archs
  ring-buffer: Fix writing to the buffer with max_data_size
  ring-buffer: Have saved event hold the entire event
  ring-buffer: Do not update before stamp when switching sub-buffers
  tracing: Update snapshot buffer on resize if it is allocated
  ring-buffer: Fix memory leak of free page
  smb: client: fix OOB in smb2_query_reparse_point()
  smb: client: fix NULL deref in asn1_ber_decoder()
  smb: client: fix OOB in receive_encrypted_standard()
  drm/i915: Fix remapped stride with CCS on ADL+
  drm/amd/display: Disable PSR-SU on Parade 0803 TCON again
  drm/amdgpu: fix tear down order in amdgpu_vm_pt_free
  btrfs: don't clear qgroup reserved bit in release_folio
  ...

# Conflicts:
#	drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
#	drivers/gpu/drm/amd/display/dc/dc.h
#	drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c
#	drivers/gpu/drm/i915/display/intel_dp.c
#	drivers/gpu/drm/i915/display/intel_dvo.c
#	drivers/gpu/drm/i915/display/intel_hdmi.c
#	drivers/gpu/drm/i915/display/intel_lvds.c
#	drivers/gpu/drm/i915/display/intel_sdvo.c
#	drivers/gpu/drm/i915/i915_reg.h
#	drivers/net/ethernet/stmicro/stmmac/dwmac5.c
#	drivers/net/ethernet/stmicro/stmmac/dwmac5.h
#	drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
#	drivers/net/ethernet/stmicro/stmmac/hwif.h
2024-01-30 07:39:52 +08:00
Matthew Wilcox (Oracle) 9025ee1079 block: Remove special-casing of compound pages
commit 1b151e2435fc3a9b10c8946c6aebe9f3e1938c55 upstream.

The special casing was originally added in pre-git history; reproducing
the commit log here:

> commit a318a92567d77
> Author: Andrew Morton <akpm@osdl.org>
> Date:   Sun Sep 21 01:42:22 2003 -0700
>
>     [PATCH] Speed up direct-io hugetlbpage handling
>
>     This patch short-circuits all the direct-io page dirtying logic for
>     higher-order pages.  Without this, we pointlessly bounce BIOs up to
>     keventd all the time.

In the last twenty years, compound pages have become used for more than
just hugetlb.  Rewrite these functions to operate on folios instead
of pages and remove the special case for hugetlbfs; I don't think
it's needed any more (and if it is, we can put it back in as a call
to folio_test_hugetlb()).

This was found by inspection; as far as I can tell, this bug can lead
to pages used as the destination of a direct I/O read not being marked
as dirty.  If those pages are then reclaimed by the MM without being
dirtied for some other reason, they won't be written out.  Then when
they're faulted back in, they will not contain the data they should.
It'll take a pretty unusual setup to produce this problem with several
races all going the wrong way.

This problem predates the folio work; it could for example have been
triggered by mmaping a THP in tmpfs and using that as the target of an
O_DIRECT read.

Fixes: 800d8c63b2 ("shmem: add huge pages support")
Cc:  <stable@vger.kernel.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-25 15:27:52 -08:00
Jens Axboe 33cf52b6e5 block: ensure we hold a queue reference when using queue limits
[ Upstream commit 7b4f36cd22a65b750b4cb6ac14804fb7d6e6c67d ]

q_usage_counter is the only thing preventing us from the limits changing
under us in __bio_split_to_limits, but blk_mq_submit_bio doesn't hold
it while calling into it.

Move the splitting inside the region where we know we've got a queue
reference. Ideally this could still remain a shared section of code, but
let's keep the fix simple and defer any refactoring here to later.

Reported-by: Christoph Hellwig <hch@lst.de>
Fixes: 900e080752 ("block: move queue enter logic into blk_mq_submit_bio()")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-25 15:27:49 -08:00
Min Li ef31cc8779 block: add check that partition length needs to be aligned with block size
commit 6f64f866aa1ae6975c95d805ed51d7e9433a0016 upstream.

Before calling add partition or resize partition, there is no check
on whether the length is aligned with the logical block size.
If the logical block size of the disk is larger than 512 bytes,
then the partition size maybe not the multiple of the logical block size,
and when the last sector is read, bio_truncate() will adjust the bio size,
resulting in an IO error if the size of the read command is smaller than
the logical block size.If integrity data is supported, this will also
result in a null pointer dereference when calling bio_integrity_free.

Cc:  <stable@vger.kernel.org>
Signed-off-by: Min Li <min15.li@samsung.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230629142517.121241-1-min15.li@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-25 15:27:42 -08:00
Keith Busch a623d31805 block: make BLK_DEF_MAX_SECTORS unsigned
[ Upstream commit 0a26f327e4 ]

This is used as an unsigned value, so define it that way to avoid
having to cast it.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20230105205146.3610282-2-kbusch@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of: 9a9525de8654 ("null_blk: don't cap max_hw_sectors to BLK_DEF_MAX_SECTORS")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-25 15:27:30 -08:00
Li Nan e765363ecf block: add check of 'minors' and 'first_minor' in device_add_disk()
[ Upstream commit 4c434392c4777881d01beada6701eff8c76b43fe ]

'first_minor' represents the starting minor number of disks, and
'minors' represents the number of partitions in the device. Neither
of them can be greater than MINORMASK + 1.

Commit e338924bd0 ("block: check minor range in device_add_disk()")
only added the check of 'first_minor + minors'. However, their sum might
be less than MINORMASK but their values are wrong. Complete the checks now.

Fixes: e338924bd0 ("block: check minor range in device_add_disk()")
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20231219075942.840255-1-linan666@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-25 15:27:28 -08:00
Li Nan 9f5b79cf12 block: Set memalloc_noio to false on device_add_disk() error path
[ Upstream commit 5fa3d1a00c2d4ba14f1300371ad39d5456e890d7 ]

On the error path of device_add_disk(), device's memalloc_noio flag was
set but not cleared. As the comment of pm_runtime_set_memalloc_noio(),
"The function should be called between device_add() and device_del()".
Clear this flag before device_del() now.

Fixes: 25e823c8c3 ("block/genhd.c: apply pm_runtime_set_memalloc_noio on block devices")
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20231211075356.1839282-1-linan666@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-25 15:27:27 -08:00
Ming Lei f84b0c6445 blk-mq: don't count completed flush data request as inflight in case of quiesce
[ Upstream commit 0e4237ae8d159e3d28f3cd83146a46f576ffb586 ]

Request queue quiesce may interrupt flush sequence, and the original request
may have been marked as COMPLETE, but can't get finished because of
queue quiesce.

This way is fine from driver viewpoint, because flush sequence is block
layer concept, and it isn't related with driver.

However, driver(such as dm-rq) can call blk_mq_queue_inflight() to count &
drain inflight requests, then the wait & drain never gets done because
the completed & not-finished flush request is counted as inflight.

Fix this issue by not counting completed flush data request as inflight in
case of quiesce.

Cc: Mike Snitzer <snitzer@kernel.org>
Cc: David Jeffery <djeffery@redhat.com>
Cc: John Pittman <jpittman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20231201085605.577730-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-20 11:50:04 +01:00
Christoph Hellwig bf223fd4d9 block: update the stable_writes flag in bdev_add
[ Upstream commit 1898efcdbed32bb1c67269c985a50bab0dbc9493 ]

Propagate the per-queue stable_write flags into each bdev inode in bdev_add.
This makes sure devices that require stable writes have it set for I/O
on the block device node as well.

Note that this doesn't cover the case of a flag changing on a live device
yet.  We should handle that as well, but I plan to cover it as part of a
more general rework of how changing runtime paramters on block devices
works.

Fixes: 1cb039f3dc ("bdi: replace BDI_CAP_STABLE_WRITES with a queue and a sb flag")
Reported-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20231025141020.192413-3-hch@lst.de
Tested-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-10 17:10:32 +01:00
Christoph Hellwig b5c8e0ff76 blk-mq: make sure active queue usage is held for bio_integrity_prep()
[ Upstream commit b0077e269f6c152e807fdac90b58caf012cdbaab ]

blk_integrity_unregister() can come if queue usage counter isn't held
for one bio with integrity prepared, so this request may be completed with
calling profile->complete_fn, then kernel panic.

Another constraint is that bio_integrity_prep() needs to be called
before bio merge.

Fix the issue by:

- call bio_integrity_prep() with one queue usage counter grabbed reliably

- call bio_integrity_prep() before bio merge

Fixes: 900e080752 ("block: move queue enter logic into blk_mq_submit_bio()")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Link: https://lore.kernel.org/r/20231113035231.2708053-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-10 17:10:30 +01:00
Sarthak Kukreti 9539e3b56e block: Don't invalidate pagecache for invalid falloc modes
commit 1364a3c391 upstream.

Only call truncate_bdev_range() if the fallocate mode is supported. This
fixes a bug where data in the pagecache could be invalidated if the
fallocate() was called on the block device with an invalid mode.

Fixes: 25f4c41415 ("block: implement (some of) fallocate for block devices")
Cc: stable@vger.kernel.org
Reported-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Fixes: line?  I've never seen those wrapped.
Link: https://lore.kernel.org/r/20231011201230.750105-1-sarthakkukreti@chromium.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-10 17:10:20 +01:00
Ming Lei 94070fd668 blk-cgroup: bypass blkcg_deactivate_policy after destroying
[ Upstream commit e63a57303599b17290cd8bc48e6f20b24289a8bc ]

blkcg_deactivate_policy() can be called after blkg_destroy_all()
returns, and it isn't necessary since blkg_destroy_all has covered
policy deactivation.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20231117023527.3188627-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-20 17:00:21 +01:00
Ming Lei e52d0eb48e blk-throttle: fix lockdep warning of "cgroup_mutex or RCU read lock required!"
[ Upstream commit 27b13e209ddca5979847a1b57890e0372c1edcee ]

Inside blkg_for_each_descendant_pre(), both
css_for_each_descendant_pre() and blkg_lookup() requires RCU read lock,
and either cgroup_assert_mutex_or_rcu_locked() or rcu_read_lock_held()
is called.

Fix the warning by adding rcu read lock.

Reported-by: Changhui Zhong <czhong@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20231117023527.3188627-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-20 17:00:21 +01:00
Ranjan Dutta 382d87438f This is the 6.1.65 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmVsIPcACgkQONu9yGCS
 aT7+WBAAzFMBvadFg+miHsQM+j94gOCSSq4F01gjjchdyeB3ybE/CBfIEa9abfmZ
 X1qaor8H7Khxh0aPr4KiRsmjKXBGJ6lR1RjdOKeLwffs/1iUk1zHqC3V4jGELhAM
 WumR5Lyc1UOMA5oCk/oxGoDZ0YNzXwBwB3hTrhpvuogCw8A3qMiyzo7J928PmNr9
 sPo2TDi8HvQLlOZ8G9omVP9FTK20owJvfAj1u+gJyN/NGVXGqAQSvDpdhZ6BMYNG
 0Z6DlMdCkOF/iSCdsZBCwPXH697Qt4pkPoeYpqNEi9H54B/LQaRDg6K5z7ON+w+7
 jH9gwwSUXZLsohdpVkPWTnUThAQJDK4Wr5Pnf3GN1avePyxW4X7meathyeqP4jxD
 Oc8Igh464VraTunddwHJ03paoZ8/jXkheB0kxIsJ/jeKqUzxb/7gC6aYKZ3+DF3a
 0WicxlLCNTeai2zJCYPiQsxejJmwQ37PU6dcZzLyZefXqIVPBmLJ72HJ8j2zocm0
 zY6ezASdUjzzTQIM3CuzJfTOJ0VSeaUnyqUK64Ye7cKbiAKRbZMiSjaTfoNRo9MP
 8KasX7pEzyEjpO0rtpHKc0hM7imltXsYjcdDfJYkKBXSUMWRTI/wPH9RFE4sJHqh
 NmEG/8bAE0v6HaQJK83lEMHZJFGFTvXWySsXowU4gXpcw82/F54=
 =OY6r
 -----END PGP SIGNATURE-----

Merge tag 'v6.1.65' into lts2022/linux

This is the 6.1.65 stable release

# gpg verification failed.

# By Eric Dumazet (23) and others
# Via Greg Kroah-Hartman
* tag 'v6.1.65': (1185 commits)
  Linux 6.1.65
  io_uring: fix off-by one bvec index
  USB: dwc3: qcom: fix wakeup after probe deferral
  USB: dwc3: qcom: fix software node leak on probe errors
  usb: dwc3: set the dma max_seg_size
  usb: dwc3: Fix default mode initialization
  USB: dwc2: write HCINT with INTMASK applied
  usb: typec: tcpm: Skip hard reset when in error recovery
  USB: serial: option: don't claim interface 4 for ZTE MF290
  USB: serial: option: fix FM101R-GL defines
  USB: serial: option: add Fibocom L7xx modules
  usb: cdnsp: Fix deadlock issue during using NCM gadget
  bcache: fixup lock c->root error
  bcache: fixup init dirty data errors
  bcache: prevent potential division by zero error
  bcache: check return value from btree_node_alloc_replacement()
  dm-delay: fix a race between delay_presuspend and delay_bio
  hv_netvsc: Mark VF as slave before exposing it to user-mode
  hv_netvsc: Fix race of register_netdevice_notifier and VF register
  hv_netvsc: fix race of netvsc and VF register_netdevice
  ...

# Conflicts:
#	drivers/gpu/drm/bridge/lontium-lt8912b.c
#	drivers/gpu/drm/bridge/tc358768.c
#	drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c
#	drivers/net/ethernet/intel/igc/igc.h
#	drivers/net/ethernet/intel/igc/igc_defines.h
#	drivers/net/ethernet/intel/igc/igc_main.c
#	drivers/net/ethernet/intel/igc/igc_ptp.c
#	drivers/net/ethernet/intel/igc/igc_tsn.c
#	drivers/tty/serial/serial_core.c
2023-12-18 11:01:58 +08:00
Yu Kuai 46c541fa66 blk-core: use pr_warn_ratelimited() in bio_check_ro()
[ Upstream commit 1b0a151c10a6d823f033023b9fdd9af72a89591b ]

If one of the underlying disks of raid or dm is set to read-only, then
each io will generate new log, which will cause message storm. This
environment is indeed problematic, however we can't make sure our
naive custormer won't do this, hence use pr_warn_ratelimited() to
prevent message storm in this case.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Fixes: 57e95e4670 ("block: fix and cleanup bio_check_ro")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20231107111247.2157820-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20 11:52:17 +01:00
Khazhismel Kumykov 6a5b845b57 blk-throttle: check for overflow in calculate_bytes_allowed
commit 2dd710d476 upstream.

Inexact, we may reject some not-overflowing values incorrectly, but
they'll be on the order of exabytes allowed anyways.

This fixes divide error crash on x86 if bps_limit is not configured or
is set too high in the rare case that jiffy_elapsed is greater than HZ.

Fixes: e8368b57c0 ("blk-throttle: use calculate_io/bytes_allowed() for throtl_trim_slice()")
Fixes: 8d6bbaada2 ("blk-throttle: prevent overflow while calculating wait time")
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20231020223617.2739774-1-khazhy@google.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-02 09:35:29 +01:00
Ranjan Dutta a2989f3ed0 This is the 6.1.59 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmUxmvQACgkQONu9yGCS
 aT79txAAsdMG7H4n6wai9EsZa6fpenp5MaQdice97FinKvS3El1hmlNOJPY2idAC
 hFnfhebrWsyStCUcUs2KeiosZwKoKSRuIZ2l43P9o4tNbAoaJfp7EOihCKSJyl1K
 qcu1P7AJZH2GfR3wS86grRjlembwxYNaFVS+n4a9X3XfoMOTM9TLwIPyKzXmruUv
 4FjTH6clfQbg7lu80nPBeps1FKd2XXhiLfFH21ilnSY8ESQEKo0x10Vbi0/LqE/F
 QxvP2bFRMXyKl9HHQMAkIIjpQH+hZDzpbGx1hoC2I0xc92dTERHzzZxWkuUt2e/Y
 zGrgQ3gQR8VkpuhPuFRrSu3bZUlEr74zRyp3sXBG0RSOpK13xSYgcdRzBac/L0zT
 aSqvmIYuLnjf6qE85LEp/NAQAgKxQ2S2nGwSoN+cePb/zB0qlyXNfSsPg2mptOTi
 MOxgldZg10oNh0VXEayhtoJGUCJBjk1XUor0bAFj4u4GCiBbFfAt3e5fF0jauMva
 9b2s89qE5444dr98kdAcd79mEI0xkX9SbjCTY9lwTzA7xk+7iw+vOchLC7fQWoyf
 gspg7PdzCaFnDAS3WiIR3NqkSlGpv426Kr3kd/8jrLA3VB81Rb6bFX+E7iRzVJsd
 /YShEGesDA16aZ5BDnOMXMHzFbTEgfNDH1AiJoJcjq/V5cIuSc0=
 =Dwcv
 -----END PGP SIGNATURE-----

Merge tag 'v6.1.59' into lts2022/linux

This is the 6.1.59 stable release

# gpg verification failed.

# By Pablo Neira Ayuso (15) and others
# Via Greg Kroah-Hartman
* tag 'v6.1.59': (694 commits)
  Linux 6.1.59
  ALSA: hda/realtek - Fixed two speaker platform
  powerpc/64e: Fix wrong test in __ptep_test_and_clear_young()
  powerpc/8xx: Fix pte_access_permitted() for PAGE_NONE
  dmaengine: mediatek: Fix deadlock caused by synchronize_irq()
  dmaengine: idxd: use spin_lock_irqsave before wait_event_lock_irq
  x86/alternatives: Disable KASAN in apply_alternatives()
  usb: cdnsp: Fixes issue with dequeuing not queued requests
  usb: gadget: ncm: Handle decoding of multiple NTB's in unwrap call
  usb: gadget: udc-xilinx: replace memcpy with memcpy_toio
  usb: typec: ucsi: Clear EVENT_PENDING bit if ucsi_send_command fails
  usb: typec: altmodes/displayport: Signal hpd low when exiting mode
  counter: microchip-tcb-capture: Fix the use of internal GCLK logic
  counter: chrdev: fix getting array extensions
  scsi: ufs: core: Correct clear TM error log
  pinctrl: avoid unsafe code pattern in find_pinctrl()
  dma-buf: add dma_fence_timestamp helper
  cgroup: Remove duplicates in cgroup v1 tasks file
  usb: typec: ucsi: Use GET_CAPABILITY attributes data to set power supply scope
  nfp: flower: avoid rmmod nfp crash issues
  ...

# Conflicts:
#	drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
#	drivers/gpu/drm/i915/gt/gen8_engine_cs.c
#	sound/soc/intel/boards/sof_es8336.c
#	sound/soc/intel/common/soc-acpi-intel-mtl-match.c
2023-10-28 02:52:56 +08:00
Ranjan Dutta 37245a0cb5 This is the 6.1.54 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmUJd8EACgkQONu9yGCS
 aT7crQ//ZsUDeoTMsQBU6lB2g32LODO3jVPXdGdRjLvpLVMMnKXXwl3uTC20CQ23
 mtlN1mku6OtyPHgorKK9nJoNVTG78v0wXL8iCe5GHEKri45FwmcKlCxtIqboGCcg
 bpRkLqfZ/cNVFeV/81n7kMFI/GHST2qym/lJfUkK0BIewXOrJozHMyCriLhG5uc/
 XPmXN3LlGmT7Gb2KwJeAgJ9IWrVu5ZEWH6CnpjnLPXMA3FGJiBiYPeGaWRsrdjth
 MvACPXKPu5tKAmEs6eyAhB1YbXbswKviDuY+YHeTMoOVYCfJY29VQTI16F6HBGeM
 XVCo1AovZV+B9OrgnzYA8x5iZIKCdk/PzUhBi+uUb3nLJhGpD8ha7wOuBjehINeo
 22YY+7fmB7lZVSAe14hDH7GjKNdYpxntPVpWCMa1yoCUtqKB1O44/10mj0OjZ5j4
 EXKXIe6ho+0Uatubd+3hWRXimz4jzlp7UY1QM9ge5MGp0wOmdLu5Q91T70CrCEJO
 RxXZSkHDKGxokXubl4oF0bYYpB1kRVgsNEc4H5i2k+OheyDBmVv3vRPMzT/2yim/
 BEqwX6x2sE7kvbsyCO5VxIIVsnAystJEKzdVlRxmrcqkV0FCdqHjwZ9cr0mpqOse
 ogdnQgXQpaGUyhdYcpo4U9f+WGi5AHXs3IMbKQN4SDZGDgJHrss=
 =XhWe
 -----END PGP SIGNATURE-----

Merge tag 'v6.1.54' into lts2022/linux

This is the 6.1.54 stable release

# gpg verification failed.

# By Konrad Dybcio (30) and others
# Via Greg Kroah-Hartman
* tag 'v6.1.54': (1179 commits)
  Linux 6.1.54
  drm/amd/display: Fix a bug when searching for insert_above_mpcc
  MIPS: Only fiddle with CHECKFLAGS if `need-compiler'
  kcm: Fix error handling for SOCK_DGRAM in kcm_sendmsg().
  ixgbe: fix timestamp configuration code
  tcp: Fix bind() regression for v4-mapped-v6 non-wildcard address.
  tcp: Fix bind() regression for v4-mapped-v6 wildcard address.
  tcp: Factorise sk_family-independent comparison in inet_bind2_bucket_match(_addr_any).
  ipv6: Remove in6addr_any alternatives.
  ipv6: fix ip6_sock_set_addr_preferences() typo
  net: macb: fix sleep inside spinlock
  net: macb: Enable PTP unicast
  net/tls: do not free tls_rec on async operation in bpf_exec_tx_verdict()
  platform/mellanox: NVSW_SN2201 should depend on ACPI
  platform/mellanox: mlxbf-pmc: Fix reading of unprogrammed events
  platform/mellanox: mlxbf-pmc: Fix potential buffer overflows
  platform/mellanox: mlxbf-tmfifo: Drop jumbo frames
  platform/mellanox: mlxbf-tmfifo: Drop the Rx packet if no more descriptors
  kcm: Fix memory leak in error path of kcm_sendmsg()
  r8152: check budget for r8152_poll()
  ...

# Conflicts:
#	drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
#	drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_hdcp.h
#	drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
#	drivers/gpu/drm/drm_aperture.c
#	drivers/gpu/drm/gma500/psb_drv.c
#	drivers/gpu/drm/i915/gt/gen8_engine_cs.c
#	drivers/gpu/drm/msm/msm_fbdev.c
#	drivers/gpu/drm/virtio/virtgpu_ioctl.c
#	drivers/tty/serial/qcom_geni_serial.c
#	drivers/video/aperture.c
#	drivers/video/fbdev/hyperv_fb.c
2023-10-24 23:06:25 +08:00
Ming Lei 752ec2d93e block: fix use-after-free of q->q_usage_counter
commit d36a9ea5e7 upstream.

For blk-mq, queue release handler is usually called after
blk_mq_freeze_queue_wait() returns. However, the
q_usage_counter->release() handler may not be run yet at that time, so
this can cause a use-after-free.

Fix the issue by moving percpu_ref_exit() into blk_free_queue_rcu().
Since ->release() is called with rcu read lock held, it is agreed that
the race should be covered in caller per discussion from the two links.

Reported-by: Zhang Wensheng <zhangwensheng@huaweicloud.com>
Reported-by: Zhong Jinghua <zhongjinghua@huawei.com>
Link: https://lore.kernel.org/linux-block/Y5prfOjyyjQKUrtH@T590/T/#u
Link: https://lore.kernel.org/lkml/Y4%2FmzMd4evRg9yDi@fedora/
Cc: Hillf Danton <hdanton@sina.com>
Cc: Yu Kuai <yukuai3@huawei.com>
Cc: Dennis Zhou <dennis@kernel.org>
Fixes: 2b0d3d3e4f ("percpu_ref: reduce memory footprint of percpu_ref in fast path")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20221215021629.74870-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Saranya Muruganandam <saranyamohan@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-10 22:00:37 +02:00
Ranjan Dutta d508bf1a37 Merge branch 'my/v6.1.46' into lts2022/linux
# By Eric Dumazet (40) and others
* my/v6.1.46: (1546 commits)
  Revert "igc: set TP bit in 'supported' and 'advertising' fields of ethtool_link_ksettings"
  Revert "drm/client: Send hotplug event after registering a client"
  Revert "Revert "drm/i915: Disable DC states for all commits""
  Linux 6.1.46
  drm/amd/pm/smu7: move variables to where they are used
  sch_netem: fix issues in netem_change() vs get_dist_table()
  alpha: remove __init annotation from exported page_is_ram()
  ACPI: scan: Create platform device for CS35L56
  platform/x86: serial-multi-instantiate: Auto detect IRQ resource for CSC3551
  scsi: qedf: Fix firmware halt over suspend and resume
  scsi: qedi: Fix firmware halt over suspend and resume
  scsi: fnic: Replace return codes in fnic_clean_pending_aborts()
  scsi: core: Fix possible memory leak if device_add() fails
  scsi: snic: Fix possible memory leak if device_add() fails
  scsi: 53c700: Check that command slot is not NULL
  scsi: ufs: renesas: Fix private allocation
  scsi: storvsc: Fix handling of virtual Fibre Channel timeouts
  scsi: core: Fix legacy /proc parsing buffer overflow
  netfilter: nf_tables: report use refcount overflow
  nvme-rdma: fix potential unbalanced freeze & unfreeze
  ...

# Conflicts:
#	drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c
#	drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
#	drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
#	drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
#	drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
#	drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
#	drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
#	drivers/gpu/drm/amd/display/dc/core/dc.c
#	drivers/gpu/drm/amd/display/dc/core/dc_link.c
#	drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
#	drivers/gpu/drm/amd/display/dc/core/dc_resource.c
#	drivers/gpu/drm/amd/display/dc/dc.h
#	drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c
#	drivers/gpu/drm/amd/display/dc/dcn321/dcn321_resource.c
#	drivers/gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.c
#	drivers/gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.h
#	drivers/gpu/drm/amd/display/dc/inc/core_types.h
#	drivers/gpu/drm/amd/display/dc/inc/dc_link_dp.h
#	drivers/gpu/drm/amd/pm/powerplay/inc/hwmgr.h
#	drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
#	drivers/gpu/drm/bridge/tc358768.c
#	drivers/gpu/drm/i915/display/intel_ddi.c
#	drivers/gpu/drm/i915/display/intel_display_types.h
#	drivers/gpu/drm/i915/display/intel_tc.c
#	drivers/gpu/drm/i915/display/intel_tc.h
#	drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
#	drivers/gpu/drm/msm/dsi/dsi_host.c
#	drivers/gpu/drm/msm/msm_gem_submit.c
#	drivers/gpu/drm/ttm/ttm_bo.c
#	drivers/gpu/drm/vkms/vkms_composer.c
#	drivers/gpu/drm/vkms/vkms_formats.c
#	drivers/net/ethernet/intel/igc/igc.h
#	drivers/tty/serial/serial_core.c
#	net/sched/sch_mqprio.c
2023-09-28 09:02:24 +08:00
Christoph Hellwig 00cf1dc13c block: factor out a bvec_set_page helper
[ Upstream commit d58cdfae6a ]

Add a helper to initialize a bvec based of a page pointer.  This will help
removing various open code bvec initializations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20230203150634.3199647-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of: 1f0bbf2894 ("nvmet-tcp: pass iov_len instead of sg->length to bvec_set_page()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-23 11:11:08 +02:00
Yu Kuai 84d5779234 blk-throttle: consider 'carryover_ios/bytes' in throtl_trim_slice()
[ Upstream commit eead005664 ]

Currently, 'carryover_ios/bytes' is not handled in throtl_trim_slice(),
for consequence, 'carryover_ios/bytes' will be used to throttle bio
multiple times, for example:

1) set iops limit to 100, and slice start is 0, slice end is 100ms;
2) current time is 0, and 10 ios are dispatched, those io won't be
   throttled and io_disp is 10;
3) still at current time 0, update iops limit to 1000, carryover_ios is
   updated to (0 - 10) = -10;
4) in this slice(0 - 100ms), io_allowed = 100 + (-10) = 90, which means
   only 90 ios can be dispatched without waiting;
5) assume that io is throttled in slice(0 - 100ms), and
   throtl_trim_slice() update silce to (100ms - 200ms). In this case,
   'carryover_ios/bytes' is not cleared and still only 90 ios can be
   dispatched between 100ms - 200ms.

Fix this problem by updating 'carryover_ios/bytes' in
throtl_trim_slice().

Fixes: a880ae93e5 ("blk-throttle: fix io hung due to configuration updates")
Reported-by: zhuxiaohui <zhuxiaohui.400@bytedance.com>
Link: https://lore.kernel.org/all/20230812072116.42321-1-zhuxiaohui.400@bytedance.com/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230816012708.1193747-5-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-19 12:28:00 +02:00
Yu Kuai fd2420905c blk-throttle: use calculate_io/bytes_allowed() for throtl_trim_slice()
[ Upstream commit e8368b57c0 ]

There are no functional changes, just make the code cleaner.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230816012708.1193747-4-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of: eead005664 ("blk-throttle: consider 'carryover_ios/bytes' in throtl_trim_slice()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-19 12:28:00 +02:00
Li Lingfeng 5e4e9900e6 block: don't add or resize partition on the disk with GENHD_FL_NO_PART
commit 1a721de848 upstream.

Commit a33df75c63 ("block: use an xarray for disk->part_tbl") remove
disk_expand_part_tbl() in add_partition(), which means all kinds of
devices will support extended dynamic `dev_t`.
However, some devices with GENHD_FL_NO_PART are not expected to add or
resize partition.
Fix this by adding check of GENHD_FL_NO_PART before add or resize
partition.

Fixes: a33df75c63 ("block: use an xarray for disk->part_tbl")
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230831075900.1725842-1-lilingfeng@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-13 09:43:02 +02:00
Zhiguo Niu 9183c4fe91 block/mq-deadline: use correct way to throttling write requests
[ Upstream commit d47f9717e5 ]

The original formula was inaccurate:
dd->async_depth = max(1UL, 3 * q->nr_requests / 4);

For write requests, when we assign a tags from sched_tags,
data->shallow_depth will be passed to sbitmap_find_bit,
see the following code:

nr = sbitmap_find_bit_in_word(&sb->map[index],
			min_t (unsigned int,
			__map_depth(sb, index),
			depth),
			alloc_hint, wrap);

The smaller of data->shallow_depth and __map_depth(sb, index)
will be used as the maximum range when allocating bits.

For a mmc device (one hw queue, deadline I/O scheduler):
q->nr_requests = sched_tags = 128, so according to the previous
calculation method, dd->async_depth = data->shallow_depth = 96,
and the platform is 64bits with 8 cpus, sched_tags.bitmap_tags.sb.shift=5,
sb.maps[]=32/32/32/32, 32 is smaller than 96, whether it is a read or
a write I/O, tags can be allocated to the maximum range each time,
which has not throttling effect.

In addition, refer to the methods of bfg/kyber I/O scheduler,
limit ratiois are calculated base on sched_tags.bitmap_tags.sb.shift.

This patch can throttle write requests really.

Fixes: 07757588e5 ("block/mq-deadline: Reserve 25% of scheduler tags for synchronous requests")

Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/1691061162-22898-1-git-send-email-zhiguo.niu@unisoc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13 09:42:42 +02:00
Christoph Hellwig 00c0b2825b block: don't allow enabling a cache on devices that don't support it
[ Upstream commit 43c9835b14 ]

Currently the write_cache attribute allows enabling the QUEUE_FLAG_WC
flag on devices that never claimed the capability.

Fix that by adding a QUEUE_FLAG_HW_WC flag that is set by
blk_queue_write_cache and guards re-enabling the cache through sysfs.

Note that any rescan that calls blk_queue_write_cache will still
re-enable the write cache as in the current code.

Fixes: 93e9d8e836 ("block: add ability to flag write back caching on a device")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230707094239.107968-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13 09:42:40 +02:00
Christoph Hellwig e5e0ec8ff1 block: cleanup queue_wc_store
[ Upstream commit c4e21bcd0f ]

Get rid of the local queue_wc_store variable and handling setting and
clearing the QUEUE_FLAG_WC flag diretly instead the if / else if.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230707094239.107968-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of: 43c9835b14 ("block: don't allow enabling a cache on devices that don't support it")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13 09:42:39 +02:00
Sweet Tea Dorminy 8ad3bfdd22 blk-crypto: dynamically allocate fallback profile
commit c984ff1423 upstream.

blk_crypto_profile_init() calls lockdep_register_key(), which warns and
does not register if the provided memory is a static object.
blk-crypto-fallback currently has a static blk_crypto_profile and calls
blk_crypto_profile_init() thereupon, resulting in the warning and
failure to register.

Fortunately it is simple enough to use a dynamically allocated profile
and make lockdep function correctly.

Fixes: 2fb48d88e7 ("blk-crypto: use dynamic lock class for blk_crypto_profile::lock")
Cc: stable@vger.kernel.org
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20230817141615.15387-1-sweettea-kernel@dorminy.me
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-23 17:52:39 +02:00
Ross Lagerwall 557ea2ff05 blk-mq: Fix stall due to recursive flush plug
[ Upstream commit 7090426351 ]

We have seen rare IO stalls as follows:

* blk_mq_plug_issue_direct() is entered with an mq_list containing two
requests.
* For the first request, it sets last == false and enters the driver's
queue_rq callback.
* The driver queue_rq callback indirectly calls schedule() which calls
blk_flush_plug(). This may happen if the driver has the
BLK_MQ_F_BLOCKING flag set and is allowed to sleep in ->queue_rq.
* blk_flush_plug() handles the remaining request in the mq_list. mq_list
is now empty.
* The original call to queue_rq resumes (with last == false).
* The loop in blk_mq_plug_issue_direct() terminates because there are no
remaining requests in mq_list.

The IO is now stalled because the last request submitted to the driver
had last == false and there was no subsequent call to commit_rqs().

Fix this by returning early in blk_mq_flush_plug_list() if rq_count is 0
which it will be in the recursive case, rather than checking if the
mq_list is empty. At the same time, adjust one of the callers to skip
the mq_list empty check as it is not necessary.

Fixes: dc5fc361d8 ("block: attempt direct issue of plug list")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230714101106.3635611-1-ross.lagerwall@citrix.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-03 10:23:48 +02:00
Eric Biggers 49f6ac6f1c blk-crypto: use dynamic lock class for blk_crypto_profile::lock
[ Upstream commit 2fb48d88e7 ]

When a device-mapper device is passing through the inline encryption
support of an underlying device, calls to blk_crypto_evict_key() take
the blk_crypto_profile::lock of the device-mapper device, then take the
blk_crypto_profile::lock of the underlying device (nested).  This isn't
a real deadlock, but it causes a lockdep report because there is only
one lock class for all instances of this lock.

Lockdep subclasses don't really work here because the hierarchy of block
devices is dynamic and could have more than 2 levels.

Instead, register a dynamic lock class for each blk_crypto_profile, and
associate that with the lock.

This avoids false-positive lockdep reports like the following:

    ============================================
    WARNING: possible recursive locking detected
    6.4.0-rc5 #2 Not tainted
    --------------------------------------------
    fscryptctl/1421 is trying to acquire lock:
    ffffff80829ca418 (&profile->lock){++++}-{3:3}, at: __blk_crypto_evict_key+0x44/0x1c0

                   but task is already holding lock:
    ffffff8086b68ca8 (&profile->lock){++++}-{3:3}, at: __blk_crypto_evict_key+0xc8/0x1c0

                   other info that might help us debug this:
     Possible unsafe locking scenario:

           CPU0
           ----
      lock(&profile->lock);
      lock(&profile->lock);

                    *** DEADLOCK ***

     May be due to missing lock nesting notation

Fixes: 1b26283970 ("block: Keyslot Manager for Inline Encryption")
Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230610061139.212085-1-ebiggers@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-23 13:49:21 +02:00
Michael Schmitz 899cc8f798 block/partition: fix signedness issue for Amiga partitions
commit 7eb1e47696 upstream.

Making 'blk' sector_t (i.e. 64 bit if LBD support is active) fails the
'blk>0' test in the partition block loop if a value of (signed int) -1 is
used to mark the end of the partition block list.

Explicitly cast 'blk' to signed int to allow use of -1 to terminate the
partition block linked list.

Fixes: b6f3f28f60 ("block: add overflow checks for Amiga partition support")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Link: https://lore.kernel.org/r/024ce4fa-cc6d-50a2-9aae-3701d0ebf668@xenosoft.de
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Reviewed-by: Martin Steigerwald <martin@lichtvoll.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19 16:22:17 +02:00
Demi Marie Obenour defc914227 block: increment diskseq on all media change events
commit b90ecc0379 upstream.

Currently, associating a loop device with a different file descriptor
does not increment its diskseq.  This allows the following race
condition:

1. Program X opens a loop device
2. Program X gets the diskseq of the loop device.
3. Program X associates a file with the loop device.
4. Program X passes the loop device major, minor, and diskseq to
   something.
5. Program X exits.
6. Program Y detaches the file from the loop device.
7. Program Y attaches a different file to the loop device.
8. The opener finally gets around to opening the loop device and checks
   that the diskseq is what it expects it to be.  Even though the
   diskseq is the expected value, the result is that the opener is
   accessing the wrong file.

From discussions with Christoph Hellwig, it appears that
disk_force_media_change() was supposed to call inc_diskseq(), but in
fact it does not.  Adding a Fixes: tag to indicate this.  Christoph's
Reported-by is because he stated that disk_force_media_change()
calls inc_diskseq(), which is what led me to discover that it should but
does not.

Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Fixes: e6138dc12d ("block: add a helper to raise a media changed event")
Cc: stable@vger.kernel.org # 5.15+
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230607170837.1559-1-demi@invisiblethingslab.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19 16:21:47 +02:00
Michael Schmitz 40d6a1261a block: add overflow checks for Amiga partition support
commit b6f3f28f60 upstream.

The Amiga partition parser module uses signed int for partition sector
address and count, which will overflow for disks larger than 1 TB.

Use u64 as type for sector address and size to allow using disks up to
2 TB without LBD support, and disks larger than 2 TB with LBD. The RBD
format allows to specify disk sizes up to 2^128 bytes (though native
OS limitations reduce this somewhat, to max 2^68 bytes), so check for
u64 overflow carefully to protect against overflowing sector_t.

Bail out if sector addresses overflow 32 bits on kernels without LBD
support.

This bug was reported originally in 2012, and the fix was created by
the RDB author, Joanne Dow <jdow@earthlink.net>. A patch had been
discussed and reviewed on linux-m68k at that time but never officially
submitted (now resubmitted as patch 1 in this series).
This patch adds additional error checking and warning messages.

Reported-by: Martin Steigerwald <Martin@lichtvoll.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=43511
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Message-ID: <201206192146.09327.Martin@lichtvoll.de>
Cc: <stable@vger.kernel.org> # 5.2
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Link: https://lore.kernel.org/r/20230620201725.7020-4-schmitzmic@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19 16:21:47 +02:00
Michael Schmitz a4c79ea1e9 block: fix signed int overflow in Amiga partition support
commit fc3d092c6b upstream.

The Amiga partition parser module uses signed int for partition sector
address and count, which will overflow for disks larger than 1 TB.

Use sector_t as type for sector address and size to allow using disks
up to 2 TB without LBD support, and disks larger than 2 TB with LBD.

This bug was reported originally in 2012, and the fix was created by
the RDB author, Joanne Dow <jdow@earthlink.net>. A patch had been
discussed and reviewed on linux-m68k at that time but never officially
submitted. This patch differs from Joanne's patch only in its use of
sector_t instead of unsigned int. No checking for overflows is done
(see patch 3 of this series for that).

Reported-by: Martin Steigerwald <Martin@lichtvoll.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=43511
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Message-ID: <201206192146.09327.Martin@lichtvoll.de>
Cc: <stable@vger.kernel.org> # 5.2
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Tested-by: Martin Steigerwald <Martin@lichtvoll.de>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230620201725.7020-2-schmitzmic@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19 16:21:47 +02:00
Yu Kuai aa07e56c6a block: fix blktrace debugfs entries leakage
[ Upstream commit dd7de3704a ]

Commit 99d055b4fd ("block: remove per-disk debugfs files in
blk_unregister_queue") moves blk_trace_shutdown() from
blk_release_queue() to blk_unregister_queue(), this is safe if blktrace
is created through sysfs, however, there is a regression in corner
case.

blktrace can still be enabled after del_gendisk() through ioctl if
the disk is opened before del_gendisk(), and if blktrace is not shutdown
through ioctl before closing the disk, debugfs entries will be leaked.

Fix this problem by shutdown blktrace in disk_release(), this is safe
because blk_trace_remove() is reentrant.

Fixes: 99d055b4fd ("block: remove per-disk debugfs files in blk_unregister_queue")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230610022003.2557284-4-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:20:58 +02:00
Yu Kuai 931bd6758b blk-mq: fix potential io hang by wrong 'wake_batch'
[ Upstream commit 4f1731df60 ]

In __blk_mq_tag_busy/idle(), updating 'active_queues' and calculating
'wake_batch' is not atomic:

t1:			t2:
_blk_mq_tag_busy	blk_mq_tag_busy
inc active_queues
// assume 1->2
			inc active_queues
			// 2 -> 3
			blk_mq_update_wake_batch
			// calculate based on 3
blk_mq_update_wake_batch
/* calculate based on 2, while active_queues is actually 3. */

Fix this problem by protecting them wih 'tags->lock', this is not a hot
path, so performance should not be concerned. And now that all writers
are inside the lock, switch 'actives_queues' from atomic to unsigned
int.

Fixes: 180dccb0db ("blk-mq: fix tag_get wait task can't be awakened")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230610023043.2559121-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:20:55 +02:00
Li Nan 8ceeb3fc86 blk-iocost: use spin_lock_irqsave in adjust_inuse_and_calc_cost
[ Upstream commit 8d21155467 ]

adjust_inuse_and_calc_cost() use spin_lock_irq() and IRQ will be enabled
when unlock. DEADLOCK might happen if we have held other locks and disabled
IRQ before invoking it.

Fix it by using spin_lock_irqsave() instead, which can keep IRQ state
consistent with before when unlock.

  ================================
  WARNING: inconsistent lock state
  5.10.0-02758-g8e5f91fd772f #26 Not tainted
  --------------------------------
  inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
  kworker/2:3/388 [HC0[0]:SC0[0]:HE0:SE1] takes:
  ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: spin_lock_irq
  ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: bfq_bio_merge+0x141/0x390
  {IN-HARDIRQ-W} state was registered at:
    __lock_acquire+0x3d7/0x1070
    lock_acquire+0x197/0x4a0
    __raw_spin_lock_irqsave
    _raw_spin_lock_irqsave+0x3b/0x60
    bfq_idle_slice_timer_body
    bfq_idle_slice_timer+0x53/0x1d0
    __run_hrtimer+0x477/0xa70
    __hrtimer_run_queues+0x1c6/0x2d0
    hrtimer_interrupt+0x302/0x9e0
    local_apic_timer_interrupt
    __sysvec_apic_timer_interrupt+0xfd/0x420
    run_sysvec_on_irqstack_cond
    sysvec_apic_timer_interrupt+0x46/0xa0
    asm_sysvec_apic_timer_interrupt+0x12/0x20
  irq event stamp: 837522
  hardirqs last  enabled at (837521): [<ffffffff84b9419d>] __raw_spin_unlock_irqrestore
  hardirqs last  enabled at (837521): [<ffffffff84b9419d>] _raw_spin_unlock_irqrestore+0x3d/0x40
  hardirqs last disabled at (837522): [<ffffffff84b93fa3>] __raw_spin_lock_irq
  hardirqs last disabled at (837522): [<ffffffff84b93fa3>] _raw_spin_lock_irq+0x43/0x50
  softirqs last  enabled at (835852): [<ffffffff84e00558>] __do_softirq+0x558/0x8ec
  softirqs last disabled at (835845): [<ffffffff84c010ff>] asm_call_irq_on_stack+0xf/0x20

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(&bfqd->lock);
    <Interrupt>
      lock(&bfqd->lock);

   *** DEADLOCK ***

  3 locks held by kworker/2:3/388:
   #0: ffff888107af0f38 ((wq_completion)kthrotld){+.+.}-{0:0}, at: process_one_work+0x742/0x13f0
   #1: ffff8881176bfdd8 ((work_completion)(&td->dispatch_work)){+.+.}-{0:0}, at: process_one_work+0x777/0x13f0
   #2: ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: spin_lock_irq
   #2: ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: bfq_bio_merge+0x141/0x390

  stack backtrace:
  CPU: 2 PID: 388 Comm: kworker/2:3 Not tainted 5.10.0-02758-g8e5f91fd772f #26
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  Workqueue: kthrotld blk_throtl_dispatch_work_fn
  Call Trace:
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0x107/0x167
   print_usage_bug
   valid_state
   mark_lock_irq.cold+0x32/0x3a
   mark_lock+0x693/0xbc0
   mark_held_locks+0x9e/0xe0
   __trace_hardirqs_on_caller
   lockdep_hardirqs_on_prepare.part.0+0x151/0x360
   trace_hardirqs_on+0x5b/0x180
   __raw_spin_unlock_irq
   _raw_spin_unlock_irq+0x24/0x40
   spin_unlock_irq
   adjust_inuse_and_calc_cost+0x4fb/0x970
   ioc_rqos_merge+0x277/0x740
   __rq_qos_merge+0x62/0xb0
   rq_qos_merge
   bio_attempt_back_merge+0x12c/0x4a0
   blk_mq_sched_try_merge+0x1b6/0x4d0
   bfq_bio_merge+0x24a/0x390
   __blk_mq_sched_bio_merge+0xa6/0x460
   blk_mq_sched_bio_merge
   blk_mq_submit_bio+0x2e7/0x1ee0
   __submit_bio_noacct_mq+0x175/0x3b0
   submit_bio_noacct+0x1fb/0x270
   blk_throtl_dispatch_work_fn+0x1ef/0x2b0
   process_one_work+0x83e/0x13f0
   process_scheduled_works
   worker_thread+0x7e3/0xd80
   kthread+0x353/0x470
   ret_from_fork+0x1f/0x30

Fixes: b0853ab4a2 ("blk-iocost: revamp in-period donation snapbacks")
Signed-off-by: Li Nan <linan122@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20230527091904.3001833-1-linan666@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:20:55 +02:00
Greg Kroah-Hartman 4f8e0e8336 driver core: make struct device_type.uevent() take a const *
The uevent() callback in struct device_type should not be modifying the
device that is passed into it, so mark it as a const * and propagate the
function signature changes out into all relevant subsystems that use
this callback.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andreas Noever <andreas.noever@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Bard Liao <yung-chuan.liao@linux.intel.com>
Cc: Chaitanya Kulkarni <kch@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jilin Yuan <yuanjilin@cdjrlc.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Mark Gross <markgross@kernel.org>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Maximilian Luz <luzmaximilian@gmail.com>
Cc: Michael Jamet <michael.jamet@intel.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: Sanyog Kale <sanyog.r.kale@intel.com>
Cc: Sean Young <sean@mess.org>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Won Chung <wonchung@google.com>
Cc: Yehezkel Bernat <YehezkelShB@gmail.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for Thunderbolt
Acked-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Acked-by: Wolfram Sang <wsa@kernel.org>
Acked-by: Vinod Koul <vkoul@kernel.org>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230111113018.459199-6-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-18 13:21:55 +08:00
Tian Lan b64bbe8b1a blk-mq: fix blk_mq_hw_ctx active request accounting
[ Upstream commit ddad59331a ]

The nr_active counter continues to increase over time which causes the
blk_mq_get_tag to hang until the thread is rescheduled to a different
core despite there are still tags available.

kernel-stack

  INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
  Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:inboundIOReacto state:D stack:0  pid:3014879 ppid:4557 flags:0x00000000
    Call Trace:
    <TASK>
    __schedule+0x351/0xa20
    scheduler+0x5d/0xe0
    io_schedule+0x42/0x70
    blk_mq_get_tag+0x11a/0x2a0
    ? dequeue_task_stop+0x70/0x70
    __blk_mq_alloc_requests+0x191/0x2e0

kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
__blk_mq_free_request being called.

  320    320  kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1
         b'__blk_mq_free_request+0x1 [kernel]'
         b'bt_iter+0x50 [kernel]'
         b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
         b'blk_mq_timeout_work+0x7c [kernel]'
         b'process_one_work+0x1c4 [kernel]'
         b'worker_thread+0x4d [kernel]'
         b'kthread+0xe6 [kernel]'
         b'ret_from_fork+0x1f [kernel]'

Signed-off-by: Tian Lan <tian.lan@twosigma.com>
Fixes: 2e315dc07d ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter")
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230513221227.497327-1-tilan7663@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-14 11:15:31 +02:00
Damien Le Moal 2a72e6814f block: fix revalidate performance regression
commit 47fe1c3064 upstream.

The scsi driver function sd_read_block_characteristics() always calls
disk_set_zoned() to a disk zoned model correctly, in case the device
model changed. This is done even for regular disks to set the zoned
model to BLK_ZONED_NONE and free any zone related resources if the drive
previously was zoned.

This behavior significantly impact the time it takes to revalidate disks
on a large system as the call to disk_clear_zone_settings() done from
disk_set_zoned() for the BLK_ZONED_NONE case results in the device
request queued to be frozen, even if there are no zone resources to
free.

Avoid this overhead for non-zoned devices by not calling
disk_clear_zone_settings() in disk_set_zoned() if the device model
was already set to BLK_ZONED_NONE, which is always the case for regular
devices.

Reported by: Brian Bunker <brian@purestorage.com>

Fixes: 508aebb805 ("block: introduce blk_queue_clear_zone_settings()")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230529073237.1339862-1-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-09 10:34:23 +02:00
Loic Poulain 7df6008b87 block: Deny writable memory mapping if block is read-only
[ Upstream commit 69baa3a623 ]

User should not be able to write block device if it is read-only at
block level (e.g force_ro attribute). This is ensured in the regular
fops write operation (blkdev_write_iter) but not when writing via
user mapping (mmap), allowing user to actually write a read-only
block device via a PROT_WRITE mapping.

Example: This can lead to integrity issue of eMMC boot partition
(e.g mmcblk0boot0) which is read-only by default.

To fix this issue, simply deny shared writable mapping if the block
is readonly.

Note: Block remains writable if switch to read-only is performed
after the initial mapping, but this is expected behavior according
to commit a32e236eb9 ("Partially revert "block: fail op_is_write()
requests to read-only partitions"")'.

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230510074223.991297-1-loic.poulain@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-09 10:34:17 +02:00