Commit Graph

1201957 Commits

Author SHA1 Message Date
Dave Airlie dd64d8ae0f - Fix the flow for ignoring GuC SLPC efficient frequency selection (Vinay)
- Fix SDVO panel_type initialization (Jani)
 - Fix display probe for IVB Q and IVB D GT2 server (Jani)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEbSBwaO7dZQkcLOKj+mJfZA7rE8oFAmTeMewACgkQ+mJfZA7r
 E8qILgf/cvNbXxfp0D0oqpMOQxj9b0yBZ1IFNwXtyDBRHpfZaSYOgkxhh0ybcGei
 p15AOwk96hET5wo2KmBsq6JakjBU7GNfRtFISOQBPevo4pFPSKO/XZTOFzp7Wb2M
 eqDI9EVaedGRqV+oEiRfr1XpCPunIDg7jaCC1Fl/aD+iX93GO1ExuiPgHJNdIA36
 KfqVF9bgDupt9foCzPktJpwglG1xIOXCphXUogZ+15FtIfvgFvPhIGnDBVoTH9av
 A1kr8d114K+VPT1YqfdrgNHMw4doHcejcnNxESn0kXI2KptCKhgBHAr/93ytMz9r
 mix3VDSvBIWLtj18ca61BbtLKL7org==
 =Bx6T
 -----END PGP SIGNATURE-----

Merge tag 'drm-intel-fixes-2023-08-17' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- Fix the flow for ignoring GuC SLPC efficient frequency selection (Vinay)
- Fix SDVO panel_type initialization (Jani)
- Fix display probe for IVB Q and IVB D GT2 server (Jani)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZN4yduyBU1Ev9dc7@intel.com
2023-08-18 06:08:43 +10:00
Jakub Kicinski e9bbd60169 mlx5-fixes-2023-08-16
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmTdNAEACgkQSD+KveBX
 +j76qQf+PO4Po2ihQifANqZdJ75PumJxvHw3GUtroq0ELt2CqUx4cFwnVVoy2+6Y
 fYctk0UpFu5yRHkPUv5jVgP8ixDgLiJ8wLR15evCi3iDKxi0Ta/e+8EsTfKBrtI4
 GXK7wYTL+awZz5QDyerkMjqDSDDJ8KBgL4p2B1jpKgPMvd5gYWxqG+6MC5b/AAFe
 jqZAB6vocCYH6TGRBn7Gn2b6jnWhG35HBViv+nUCjZKRpVrkNq4wbh9RfeAgOaqz
 8rEBII19EnbaBj+RpJrLNgSINmv5TQCd4PY0CLkhqFYtJw6nlYEYcuMkdfi8VUO8
 P2UNhznOnCtJqai0Unsz2tR6H20uzQ==
 =BUoK
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2023-08-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2023-08-16

This series provides bug fixes to mlx5 driver.

* tag 'mlx5-fixes-2023-08-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Fix mlx5_cmd_update_root_ft() error flow
  net/mlx5e: XDP, Fix fifo overrun on XDP_REDIRECT
====================

Link: https://lore.kernel.org/r/20230816204108.53819-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-17 11:56:44 -07:00
Marcin Szycik 43d00e102d ice: Block switchdev mode when ADQ is active and vice versa
ADQ and switchdev are not supported simultaneously. Enabling both at the
same time can result in nullptr dereference.

To prevent this, check if ADQ is active when changing devlink mode to
switchdev mode, and check if switchdev is active when enabling ADQ.

Fixes: fbc7b27af0 ("ice: enable ndo_setup_tc support for mqprio_qdisc")
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230816193405.1307580-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-17 11:55:40 -07:00
Manish Chopra 2eb9625a3a qede: fix firmware halt over suspend and resume
While performing certain power-off sequences, PCI drivers are
called to suspend and resume their underlying devices through
PCI PM (power management) interface. However this NIC hardware
does not support PCI PM suspend/resume operations so system wide
suspend/resume leads to bad MFW (management firmware) state which
causes various follow-up errors in driver when communicating with
the device/firmware afterwards.

To fix this driver implements PCI PM suspend handler to indicate
unsupported operation to the PCI subsystem explicitly, thus avoiding
system to go into suspended/standby mode.

Without this fix device/firmware does not recover unless system
is power cycled.

Fixes: 2950219d87 ("qede: Add basic network device support")
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230816150711.59035-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-17 11:55:39 -07:00
Eric Dumazet b616be6b97 net: do not allow gso_size to be set to GSO_BY_FRAGS
One missing check in virtio_net_hdr_to_skb() allowed
syzbot to crash kernels again [1]

Do not allow gso_size to be set to GSO_BY_FRAGS (0xffff),
because this magic value is used by the kernel.

[1]
general protection fault, probably for non-canonical address 0xdffffc000000000e: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000070-0x0000000000000077]
CPU: 0 PID: 5039 Comm: syz-executor401 Not tainted 6.5.0-rc5-next-20230809-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
RIP: 0010:skb_segment+0x1a52/0x3ef0 net/core/skbuff.c:4500
Code: 00 00 00 e9 ab eb ff ff e8 6b 96 5d f9 48 8b 84 24 00 01 00 00 48 8d 78 70 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e ea 21 00 00 48 8b 84 24 00 01
RSP: 0018:ffffc90003d3f1c8 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 000000000001fffe RCX: 0000000000000000
RDX: 000000000000000e RSI: ffffffff882a3115 RDI: 0000000000000070
RBP: ffffc90003d3f378 R08: 0000000000000005 R09: 000000000000ffff
R10: 000000000000ffff R11: 5ee4a93e456187d6 R12: 000000000001ffc6
R13: dffffc0000000000 R14: 0000000000000008 R15: 000000000000ffff
FS: 00005555563f2380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020020000 CR3: 000000001626d000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
udp6_ufo_fragment+0x9d2/0xd50 net/ipv6/udp_offload.c:109
ipv6_gso_segment+0x5c4/0x17b0 net/ipv6/ip6_offload.c:120
skb_mac_gso_segment+0x292/0x610 net/core/gso.c:53
__skb_gso_segment+0x339/0x710 net/core/gso.c:124
skb_gso_segment include/net/gso.h:83 [inline]
validate_xmit_skb+0x3a5/0xf10 net/core/dev.c:3625
__dev_queue_xmit+0x8f0/0x3d60 net/core/dev.c:4329
dev_queue_xmit include/linux/netdevice.h:3082 [inline]
packet_xmit+0x257/0x380 net/packet/af_packet.c:276
packet_snd net/packet/af_packet.c:3087 [inline]
packet_sendmsg+0x24c7/0x5570 net/packet/af_packet.c:3119
sock_sendmsg_nosec net/socket.c:727 [inline]
sock_sendmsg+0xd9/0x180 net/socket.c:750
____sys_sendmsg+0x6ac/0x940 net/socket.c:2496
___sys_sendmsg+0x135/0x1d0 net/socket.c:2550
__sys_sendmsg+0x117/0x1e0 net/socket.c:2579
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7ff27cdb34d9

Fixes: 3953c46c3a ("sk_buff: allow segmenting based on frag sizes")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Xin Long <lucien.xin@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230816142158.1779798-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-17 11:53:40 -07:00
Abel Wu 2d0c88e84e sock: Fix misuse of sk_under_memory_pressure()
The status of global socket memory pressure is updated when:

  a) __sk_mem_raise_allocated():

	enter: sk_memory_allocated(sk) >  sysctl_mem[1]
	leave: sk_memory_allocated(sk) <= sysctl_mem[0]

  b) __sk_mem_reduce_allocated():

	leave: sk_under_memory_pressure(sk) &&
		sk_memory_allocated(sk) < sysctl_mem[0]

So the conditions of leaving global pressure are inconstant, which
may lead to the situation that one pressured net-memcg prevents the
global pressure from being cleared when there is indeed no global
pressure, thus the global constrains are still in effect unexpectedly
on the other sockets.

This patch fixes this by ignoring the net-memcg's pressure when
deciding whether should leave global memory pressure.

Fixes: e1aab161e0 ("socket: initial cgroup code.")
Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Link: https://lore.kernel.org/r/20230816091226.1542-1-wuyun.abel@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-17 11:34:36 -07:00
Edward Cree 54c9016eb8 sfc: don't fail probe if MAE/TC setup fails
Existing comment in the source explains why we don't want efx_init_tc()
 failure to be fatal.  Cited commit erroneously consolidated failure
 paths causing the probe to be failed in this case.

Fixes: 7e056e2360 ("sfc: obtain device mac address based on firmware handle for ef100")
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/aa7f589dd6028bd1ad49f0a85f37ab33c09b2b45.1692114888.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-17 11:31:07 -07:00
Edward Cree fa165e1949 sfc: don't unregister flow_indr if it was never registered
In efx_init_tc(), move the setting of efx->tc->up after the
 flow_indr_dev_register() call, so that if it fails, efx_fini_tc()
 won't call flow_indr_dev_unregister().

Fixes: 5b2e12d51b ("sfc: bind indirect blocks for TC offload on EF100")
Suggested-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/a81284d7013aba74005277bd81104e4cfbea3f6f.1692114888.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-17 11:31:07 -07:00
Mark Brown 2f43f549cd arm64/ptrace: Ensure that the task sees ZT writes on first use
When the value of ZT is set via ptrace we don't disable traps for SME.
This means that when a the task has never used SME before then the value
set via ptrace will never be seen by the target task since it will
trigger a SME access trap which will flush the register state.

Disable SME traps when setting ZT, this means we also need to allocate
storage for SVE if it is not already allocated, for the benefit of
streaming SVE.

Fixes: f90b529bcb ("arm64/sme: Implement ZT0 ptrace support")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: <stable@vger.kernel.org> # 6.3.x
Link: https://lore.kernel.org/r/20230816-arm64-zt-ptrace-first-use-v2-1-00aa82847e28@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-08-17 19:00:03 +01:00
Mark Brown 5d0a8d2fba arm64/ptrace: Ensure that SME is set up for target when writing SSVE state
When we use NT_ARM_SSVE to either enable streaming mode or change the
vector length for a process we do not currently do anything to ensure that
there is storage allocated for the SME specific register state.  If the
task had not previously used SME or we changed the vector length then
the task will not have had TIF_SME set or backing storage for ZA/ZT
allocated, resulting in inconsistent register sizes when saving state
and spurious traps which flush the newly set register state.

We should set TIF_SME to disable traps and ensure that storage is
allocated for ZA and ZT if it is not already allocated.  This requires
modifying sme_alloc() to make the flush of any existing register state
optional so we don't disturb existing state for ZA and ZT.

Fixes: e12310a0d3 ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
Reported-by: David Spickett <David.Spickett@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: <stable@vger.kernel.org> # 5.19.x
Link: https://lore.kernel.org/r/20230810-arm64-fix-ptrace-race-v1-1-a5361fad2bd6@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-08-17 18:59:51 +01:00
Zheng Yejian eecb91b9f9 tracing: Fix memleak due to race between current_tracer and trace
Kmemleak report a leak in graph_trace_open():

  unreferenced object 0xffff0040b95f4a00 (size 128):
    comm "cat", pid 204981, jiffies 4301155872 (age 99771.964s)
    hex dump (first 32 bytes):
      e0 05 e7 b4 ab 7d 00 00 0b 00 01 00 00 00 00 00 .....}..........
      f4 00 01 10 00 a0 ff ff 00 00 00 00 65 00 10 00 ............e...
    backtrace:
      [<000000005db27c8b>] kmem_cache_alloc_trace+0x348/0x5f0
      [<000000007df90faa>] graph_trace_open+0xb0/0x344
      [<00000000737524cd>] __tracing_open+0x450/0xb10
      [<0000000098043327>] tracing_open+0x1a0/0x2a0
      [<00000000291c3876>] do_dentry_open+0x3c0/0xdc0
      [<000000004015bcd6>] vfs_open+0x98/0xd0
      [<000000002b5f60c9>] do_open+0x520/0x8d0
      [<00000000376c7820>] path_openat+0x1c0/0x3e0
      [<00000000336a54b5>] do_filp_open+0x14c/0x324
      [<000000002802df13>] do_sys_openat2+0x2c4/0x530
      [<0000000094eea458>] __arm64_sys_openat+0x130/0x1c4
      [<00000000a71d7881>] el0_svc_common.constprop.0+0xfc/0x394
      [<00000000313647bf>] do_el0_svc+0xac/0xec
      [<000000002ef1c651>] el0_svc+0x20/0x30
      [<000000002fd4692a>] el0_sync_handler+0xb0/0xb4
      [<000000000c309c35>] el0_sync+0x160/0x180

The root cause is descripted as follows:

  __tracing_open() {  // 1. File 'trace' is being opened;
    ...
    *iter->trace = *tr->current_trace;  // 2. Tracer 'function_graph' is
                                        //    currently set;
    ...
    iter->trace->open(iter);  // 3. Call graph_trace_open() here,
                              //    and memory are allocated in it;
    ...
  }

  s_start() {  // 4. The opened file is being read;
    ...
    *iter->trace = *tr->current_trace;  // 5. If tracer is switched to
                                        //    'nop' or others, then memory
                                        //    in step 3 are leaked!!!
    ...
  }

To fix it, in s_start(), close tracer before switching then reopen the
new tracer after switching. And some tracers like 'wakeup' may not update
'iter->private' in some cases when reopen, then it should be cleared
to avoid being mistakenly closed again.

Link: https://lore.kernel.org/linux-trace-kernel/20230817125539.1646321-1-zhengyejian1@huawei.com

Fixes: d7350c3f45 ("tracing/core: make the read callbacks reentrants")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-08-17 13:49:37 -04:00
Mark Brown ab0b5072d1
ASoC: cs35l56: Update ACPI HID and property
Merge series from Richard Fitzgerald <rf@opensource.cirrus.com>:

These two patches add an ACPI HID and update the way the platform-
specific firmware identifier is extracted from the ACPI.
2023-08-17 18:36:28 +01:00
Linus Torvalds 16931859a6 nfsd-6.5 fixes:
- Fix new MSG_SPLICE_PAGES support in server's TCP sendmsg helper
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmTeKW4ACgkQM2qzM29m
 f5dqfRAAjLXcxSixOnFqp49echmMlG1DI493WbiXG05hJ43TspnoThIGI+GpeTBG
 LxADUeZuxbMor0In8qA/GzRGQjJpdL22E1EWfLUlFN7JtWjVyyeuYLSeECGskByP
 VxAiQ/uo50dYPcRI3YN391ec2GZ9wnrNkHbfj3X0pEZoZVDea3GPRXnUQbnb4GVS
 Xj433F1pmPfl7EFNWCSuMcb3OPjG1uzcbYWF6+1RjaLZ02QrgzvwmU4bMePGHRga
 PsFGBaBdVJi6Dkv41NhsAflsg6DjZnzyuHnH8N6EygMz4py3kess9TRux1J9g22f
 j+o/GM+U1r66YXdhMhg1kiYApemRCybrmzFgn0B9dtYC7r4MiHVVRQFgAoa95teo
 g4iDQ/jxJOJMFm8ipXM/bOgMcznJfcHBX2pG2cN+qXZyPQnCn+luh81JI2s5pvqf
 yAJ/uThL6cWdyLoM+ZBvozpfPn9QTwwqDWMtZlbVkIz/nM6kiJjdUrl2NDBpaDhq
 WGd3CP6GWgxoCbURpfBJPLytjhb00Bep7xRm6BZavt/XSOHRWJdhEachVyWt/zFI
 6zLOMHv9A7rPTBigFAIgdfZzIL9c+StdpW3oEafWovwTBKW537W+m680tdLQK44q
 yl2EmeG2Z3hpqbRnDPiV5XCJ7ijArFslBBuZB6mwDCPKcbMs3FQ=
 =jfUp
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fix from Chuck Lever:

 - Fix new MSG_SPLICE_PAGES support in server's TCP sendmsg helper

* tag 'nfsd-6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  sunrpc: set the bv_offset of first bvec in svc_tcp_sendmsg
2023-08-17 16:38:48 +02:00
Trond Myklebust be2fd1560e NFS: Fix a use after free in nfs_direct_join_group()
Be more careful when tearing down the subrequests of an O_DIRECT write
as part of a retransmission.

Reported-by: Chris Mason <clm@fb.com>
Fixes: ed5d588fe4 ("NFS: Try to join page groups before an O_DIRECT retransmission")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-08-17 09:30:38 -04:00
xiaoshoukui 29eefa6d0d btrfs: fix BUG_ON condition in btrfs_cancel_balance
Pausing and canceling balance can race to interrupt balance lead to BUG_ON
panic in btrfs_cancel_balance. The BUG_ON condition in btrfs_cancel_balance
does not take this race scenario into account.

However, the race condition has no other side effects. We can fix that.

Reproducing it with panic trace like this:

  kernel BUG at fs/btrfs/volumes.c:4618!
  RIP: 0010:btrfs_cancel_balance+0x5cf/0x6a0
  Call Trace:
   <TASK>
   ? do_nanosleep+0x60/0x120
   ? hrtimer_nanosleep+0xb7/0x1a0
   ? sched_core_clone_cookie+0x70/0x70
   btrfs_ioctl_balance_ctl+0x55/0x70
   btrfs_ioctl+0xa46/0xd20
   __x64_sys_ioctl+0x7d/0xa0
   do_syscall_64+0x38/0x80
   entry_SYSCALL_64_after_hwframe+0x63/0xcd

  Race scenario as follows:
  > mutex_unlock(&fs_info->balance_mutex);
  > --------------------
  > .......issue pause and cancel req in another thread
  > --------------------
  > ret = __btrfs_balance(fs_info);
  >
  > mutex_lock(&fs_info->balance_mutex);
  > if (ret == -ECANCELED && atomic_read(&fs_info->balance_pause_req)) {
  >         btrfs_info(fs_info, "balance: paused");
  >         btrfs_exclop_balance(fs_info, BTRFS_EXCLOP_BALANCE_PAUSED);
  > }

CC: stable@vger.kernel.org # 4.19+
Signed-off-by: xiaoshoukui <xiaoshoukui@ruijie.com.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-17 15:27:45 +02:00
Chris Mason 09c3717c3a btrfs: only subtract from len_to_oe_boundary when it is tracking an extent
bio_ctrl->len_to_oe_boundary is used to make sure we stay inside a zone
as we submit bios for writes.  Every time we add a page to the bio, we
decrement those bytes from len_to_oe_boundary, and then we submit the
bio if we happen to hit zero.

Most of the time, len_to_oe_boundary gets set to U32_MAX.
submit_extent_page() adds pages into our bio, and the size of the bio
ends up limited by:

- Are we contiguous on disk?
- Does bio_add_page() allow us to stuff more in?
- is len_to_oe_boundary > 0?

The len_to_oe_boundary math starts with U32_MAX, which isn't page or
sector aligned, and subtracts from it until it hits zero.  In the
non-zoned case, the last IO we submit before we hit zero is going to be
unaligned, triggering BUGs.

This is hard to trigger because bio_add_page() isn't going to make a bio
of U32_MAX size unless you give it a perfect set of pages and fully
contiguous extents on disk.  We can hit it pretty reliably while making
large swapfiles during provisioning because the machine is freshly
booted, mostly idle, and the disk is freshly formatted.  It's also
possible to trigger with reads when read_ahead_kb is set to 4GB.

The code has been clean up and shifted around a few times, but this flaw
has been lurking since the counter was added.  I think the commit
24e6c80822 ("btrfs: simplify main loop in submit_extent_page") ended
up exposing the bug.

The fix used here is to skip doing math on len_to_oe_boundary unless
we've changed it from the default U32_MAX value.  bio_add_page() is the
real limit we want, and there's no reason to do extra math when block
layer is doing it for us.

Sample reproducer, note you'll need to change the path to the bdi and
device:

  SUBVOL=/btrfs/swapvol
  SWAPFILE=$SUBVOL/swapfile
  SZMB=8192

  mkfs.btrfs -f /dev/vdb
  mount /dev/vdb /btrfs

  btrfs subvol create $SUBVOL
  chattr +C $SUBVOL
  dd if=/dev/zero of=$SWAPFILE bs=1M count=$SZMB
  sync

  echo 4 > /proc/sys/vm/drop_caches

  echo 4194304 > /sys/class/bdi/btrfs-2/read_ahead_kb

  while true; do
	  echo 1 > /proc/sys/vm/drop_caches
	  echo 1 > /proc/sys/vm/drop_caches
	  dd of=/dev/zero if=$SWAPFILE bs=4096M count=2 iflag=fullblock
  done

Fixes: 24e6c80822 ("btrfs: simplify main loop in submit_extent_page")
CC: stable@vger.kernel.org # 6.4+
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-17 15:27:35 +02:00
Anand Jain b471965fdb btrfs: fix replace/scrub failure with metadata_uuid
Fstests with POST_MKFS_CMD="btrfstune -m" (as in the mailing list)
reported a few of the test cases failing.

The failure scenario can be summarized and simplified as follows:

  $ mkfs.btrfs -fq -draid1 -mraid1 /dev/sdb1 /dev/sdb2 :0
  $ btrfstune -m /dev/sdb1 :0
  $ wipefs -a /dev/sdb1 :0
  $ mount -o degraded /dev/sdb2 /btrfs :0
  $ btrfs replace start -B -f -r 1 /dev/sdb1 /btrfs :1
    STDERR:
    ERROR: ioctl(DEV_REPLACE_START) failed on "/btrfs": Input/output error

  [11290.583502] BTRFS warning (device sdb2): tree block 22036480 mirror 2 has bad fsid, has 99835c32-49f0-4668-9e66-dc277a96b4a6 want da40350c-33ac-4872-92a8-4948ed8c04d0
  [11290.586580] BTRFS error (device sdb2): unable to fix up (regular) error at logical 22020096 on dev /dev/sdb8 physical 1048576

As above, the replace is failing because we are verifying the header with
fs_devices::fsid instead of fs_devices::metadata_uuid, despite the
metadata_uuid actually being present.

To fix this, use fs_devices::metadata_uuid. We copy fsid into
fs_devices::metadata_uuid if there is no metadata_uuid, so its fine.

Fixes: a3ddbaebc7 ("btrfs: scrub: introduce a helper to verify one metadata block")
CC: stable@vger.kernel.org # 6.4+
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-17 15:26:39 +02:00
Arnd Bergmann 6e8d96909a asm-generic: partially revert "Unify uapi bitsperlong.h for arm64, riscv and loongarch"
Unifying the asm-generic headers across 32-bit and 64-bit architectures
based on the compiler provided macros was a good idea and appears to work
with all user space, but it caused a regression when building old kernels
on systems that have the new headers installed in /usr/include, as this
combination trips an inconsistency in the kernel's own tools/include
headers that are a mix of userspace and kernel-internal headers.

This affects kernel builds on arm64, riscv64 and loongarch64 systems that
might end up using the "#define __BITS_PER_LONG 32" default from the old
tools headers. Backporting the commit into stable kernels would address
this, but it would still break building kernels without that backport,
and waste time for developers trying to understand the problem.

arm64 build machines are rather common, and on riscv64 this can also
happen in practice, but loongarch64 is probably new enough to not
be used much for building old kernels, so only revert the bits
for arm64 and riscv.

Link: https://lore.kernel.org/all/20230731160402.GB1823389@dev-arch.thelio-3990X/
Reported-by: Nathan Chancellor <nathan@kernel.org>
Fixes: 8386f58f8d ("asm-generic: Unify uapi bitsperlong.h for arm64, riscv and loongarch")
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-08-17 14:51:20 +02:00
Arnd Bergmann 3c78dbf251 Qualcomm ARM64 fixes for v6.5
This corrects the invalid path specifier for L3 interconnects in the CPU
 nodes of SM8150 and SM8250. It corrects the compatible of the SC8180X L3
 node, to pass the binding check.
 
 The crypto core, and its DMA controller, is disabled on SM8350 to avoid
 the system from crashing at boot while the issue is diagnosed.
 
 A thermal zone node name conflict is resolved for PM8150L, on the RB5
 board.
 
 The UFS vccq voltage is corrected on the SA877P Ride platform, to
 address observed stability issues.
 
 The reg-names of the DSI phy on SC7180 are restored after an accidental
 search-and-replace update.
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCAAzFiEEBd4DzF816k8JZtUlCx85Pw2ZrcUFAmTbiYoVHGFuZGVyc3Nv
 bkBrZXJuZWwub3JnAAoJEAsfOT8Nma3FeCkQAJuHrH8Bb1954UR4yba3P7h4kmLQ
 xWgDzQXfTT6cl9MHmq/8RwoIhGABAbr3HNX0qwVNVNgn+w90LtVFSSL0SqKczkQk
 euwtJf+T2NKqFJnRisx2O78WhPYxw52d6rs+v2gw2AoWATX6zRdXpcHch37bUv+/
 jBRNdas9wttrkxuOtJ7TD4G42KlIrM+6YukDWCtdwpLhBkKZwqu/UuEa8jwWyV8b
 cY9edRqo4P2nem0Nt5PmCGwFdHSBKR7wqDGMkwYP9tlziHfe6WTtQNYM+J7ZRMzg
 tslny8xIXZvo2xy2fwB8JGYD/rhQY3D16wL72mPFxOZelokdU1esX6pURw4JIyiB
 gHFDWEfEWXXbYzV+QDrxdbm28jgUTZinzDiRkkbxoPsA8roBmN9XHHlxm+/I4Yoj
 Vm1ymH1KUDOidkKlM1lfuTaXNxFGZ3xTtHweXD72d+uN+woUwcSM9N/FLugPq0Fb
 rng9FwxpOmr3Wc4NaIneRNOYA57+KaSY1OgMp5X+e9pAkjKoPKqOEPokgWUhqmAG
 PW+P3VJpwqT9xpaidN15Eyb/dIwtPa6IsRkK70l8DRLhLIwl8Zdbch+bV4AjTc7R
 fyxkuazL3Y+G7ddEvRXvgo6i0e1w8TTjn11D2NLn9yINZJhr6JqGbuS1F4dzYmUD
 Zc7WroVo89eep9Wh
 =ZesT
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmTeFRQACgkQYKtH/8kJ
 UifBZxAAg83TGfOxhmmHC/+YsJDK3qFDx8TbG1O6a65zpAOF0+mrDOzjIkaCFBVC
 jxpR8eKTMNVOq/ovfYOzENDik4qIcULyFFyIJDL3KtGQLh97/C6QzALUCDdHUbt5
 CTS7sZ08VRIP7BRt0G9C6rlR4/cuCzEC1k5KgIKV8VLbzAgQN+ylC7CQZCCr/4uo
 mA2yDy2E3X15SA6W1PbH5JUBS1ckFzpwFFm5FsjrNgmAkMXtBshypjW1HFXQQFhg
 XZV+MLH32+Hdy3MQ016w/fHbln61W0DjVIjiABfyqLSgpiDQ5AqvONDcV3YbsdP3
 dTnQQ2UJJ9xrbVT5retKOAs2jjMBZ/hmwZhExaI+m5+ZnSHiuOUctfKDkPL7aHcB
 fTzgc7Hrzalca7UDWRtC31IUlzlaHyXO/WLuAljwTqwFST2r7Jck/FSPDJMyRdJr
 hUixLBgby4W32PouDsPNqxsEO5Rhp8cPAa2Eb3i0L85EbZ3Zua9F36cSuhKE3xP6
 lANsgnbr5he/wZohPIz9pDFUnFdUcIXghX3NGXM1obNxZ8T6/Y1mG3WZBCmQHgOI
 nWttoKhMHknnY6pKuk66IzIAvM7wED6wQgHekcUCK9ZkXr8zVZMzBaIX+0d4zx/F
 asi/g3pnxqt+RL3HjhLMQnIz48wF46rtajCsfEHoGEmoaO1UmD8=
 =VxVu
 -----END PGP SIGNATURE-----

Merge tag 'qcom-arm64-fixes-for-6.5' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes

Qualcomm ARM64 fixes for v6.5

This corrects the invalid path specifier for L3 interconnects in the CPU
nodes of SM8150 and SM8250. It corrects the compatible of the SC8180X L3
node, to pass the binding check.

The crypto core, and its DMA controller, is disabled on SM8350 to avoid
the system from crashing at boot while the issue is diagnosed.

A thermal zone node name conflict is resolved for PM8150L, on the RB5
board.

The UFS vccq voltage is corrected on the SA877P Ride platform, to
address observed stability issues.

The reg-names of the DSI phy on SC7180 are restored after an accidental
search-and-replace update.

* tag 'qcom-arm64-fixes-for-6.5' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
  arm64: dts: qcom: sc7180: Fix DSI0_PHY reg-names
  arm64: dts: qcom: sa8775p-ride: Update L4C parameters
  arm64: dts: qcom: qrb5165-rb5: fix thermal zone conflict
  arm64: dts: qcom: sm8350: fix BAM DMA crash and reboot
  arm64: dts: qcom: sc8180x: Fix OSM L3 compatible
  arm64: dts: qcom: sm8250: Fix EPSS L3 interconnect cells
  arm64: dts: qcom: sm8150: Fix OSM L3 interconnect cells

Link: https://lore.kernel.org/r/20230815142042.2459048-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-08-17 14:39:47 +02:00
Arnd Bergmann 17fd01a243 Fixes for omaps
A fix external abort on non-linefetch for am335x that is fixed with a flush
 of posted write. And two networking fixes for beaglebone mostly for revision
 c3 to do phy reset with a gpio and to fix a boot time warning.
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEEkgNvrZJU/QSQYIcQG9Q+yVyrpXMFAmTcShQRHHRvbnlAYXRv
 bWlkZS5jb20ACgkQG9Q+yVyrpXO/Rw/7BFVAM9bgvrYYD/fqUgr3Y1qGEvhYDv8O
 TCioh2kJ5YfQmhZznYQKNQkOnFiCdyGizLN30k5q1tKsMeCZfEZptyQXlUI3X1hu
 0MFvwOfbHHUlJuSyTkNv3bL3PpWss26Q3HR1Bv+cDIR+2jhhT3yWDPBQuEipBG5Y
 8ItXtndDyXB04tpN+e28N1tBVooc2Kuv9oqI0CLlOMsJEes7qsx4CSGSwoCEmwKd
 Ea9BMyjrmpI2omSU2u8pnPtYRWwFKMgVJfzDJsPe4EvMF638iFW5Djk5x1KcYgq9
 HKkUV0/5X7oqDQCPxqktCuJmwxtMm2bYAqPRoj/jGX+3SS2/9zMas99MfsqstT8F
 xzbrh4rOk3tnQqvhExAW6UzsmVLDfvb/rV1hVfn/ECVs2Qc52mUe31PgaTBtkKgz
 4rPHxGbDZH8PcQwPkuMGzxJI/tDz+MC4KwDhNbi3wds3xoPa3Fn7U5ceuY5acf4B
 zLY8xSvmoUlZqEkr7yN1J9V7mavuThL0qF2Ns9HpYMBd+qPCEi8KZr6mRWuLKWxo
 bZyBbyxwvEwDtPrEL9AJWL3wDWWywL3EvvzdyZ8SYFRo8TkslC9qLkgMmGryJu2N
 flO3hESA/qZYuEkgovorrVgSBwdlZHC/4JwUn+G8vO1kTfCgiqgHuxyo9niZtDgF
 Irn9Y8fYNCs=
 =OJ1M
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmTeFM8ACgkQYKtH/8kJ
 UidhUBAAmehmOaEvyq0M/5vtDd/5S7yjLCuzYe+US+rqsocxYWasFpE5OvrLECcX
 dZdixLmopkak3VqemB8h9F+qahnePEA2x1UA4Ei329VHgWn/Ds4Z3RJ/mJZ1vSfj
 R9cGB+/ivRKpfjJCCUM9UCycbGS+MyUWmj2i9bpK4D6j/UOxqh8K7Zsx2Tq0btAE
 ArIMW23xTyfsmXTrAZFT77sQK3cX8tKtGZrQanMre8QZwH3/pCYc/nZlYZnYtrNS
 +96go1b3+KwSVULYpg3ChmK27nh8YNDYFXMCVpsukc+wM5qsf/IKhyHgl/fgBHiF
 dh2OWKqadxmoeLvUVEn8gFYPiQvQVjPa8iFvF8cDPN/bsNAbwUbF3pGIwVPyTUHm
 9qVxzKZBPtqt2IE6PRlobNSaS+I6THfiJXVBDj9r+ETuG8Dm2e8Lb+vgV7fdVrVu
 AK/bH+tR7ufnzGLkF6zMB/Ub5kFuI5d9Qsv9qmPHwOHbfl85dnDdWqk6fDezGNyn
 eRbay/7Z4VKIFjbrNLEu2wJp+S7S2/u56GGb1sF/EVoIxoexgQLy1GxoALy3+b94
 tBQGO176uXzDAziJTzZTC7arC4pCzIqKY8+QWRX5X88bdiEhrx5Bi8h5Jug7v+9g
 Hr46pW6DJBTUZIZbe89qIxiSUWipaa9CjbhUnPrh4VJfi3kCo1w=
 =1dGT
 -----END PGP SIGNATURE-----

Merge tag 'omap-for-v6.5/fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes

Fixes for omaps

A fix external abort on non-linefetch for am335x that is fixed with a flush
of posted write. And two networking fixes for beaglebone mostly for revision
c3 to do phy reset with a gpio and to fix a boot time warning.

* tag 'omap-for-v6.5/fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: dts: am335x-bone-common: Add vcc-supply for on-board eeprom
  ARM: dts: am335x-bone-common: Add GPIO PHY reset on revision C3 board
  bus: ti-sysc: Flush posted write on enable before reset

Link: https://lore.kernel.org/r/pull-1692158536-457318@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-08-17 14:38:39 +02:00
Arnd Bergmann e7c12167a2 Correct wifi interrupt flags for some boards, fixed wifi on Rock PI4,
disabled hs400 speeds for some boards having problems with data
 intergrity and some dt property/styling fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCAAuFiEE7v+35S2Q1vLNA3Lx86Z5yZzRHYEFAmTdQagQHGhlaWtvQHNu
 dGVjaC5kZQAKCRDzpnnJnNEdga3AB/sFuePDO2RZcM4YAf5IB/Ct5nFgdaVEOvNk
 UxB60rD7nsVuTblc3FT9K6TRgGJfX3VKylDUWoSHlkbjoof0Hu8GJ/5dhiNM5dxc
 GRwTzobweVez3d6CWI78EMbEBa/phHSEFdEy5Dk0npolt91tBzOn1BqZTi3/CLY3
 56cPcBArLqlEWE7ExhNXHbrJ+WCblgjulabVm9aYjSeU8iOHHxnq+Z82M+DDIP2Q
 3I2rsyFqgHYOAsYaBGOyVnEzJiEVEOp/1+4bvlawmWLRj1s48bjn7aWUBaawLd1c
 NMcPRWFRKEDqyP28wxL45Kr8mWZ/97lQ4DY/MsyKtJFF7zj9CPIM
 =gBGI
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmTeD70ACgkQYKtH/8kJ
 Uif5+BAArCGIZsSXNvHiRYrjOJkfy3G/pAa6S+fVxS0H+A9MiJO6Z1wsqE+Qq/B8
 YFUFeWQpiijVfBip8E8vOZsLLykuY32z5LKv3u7fbMdDIThyr1npsiRGJ6I4KGM5
 pnfdqRo3hsIak67jp9idRXppRWfU7t5O2IqpD5+Yrul8cjLu0U+tEHrO30MzzpaJ
 TQfLwFjvWUe2MvI2p6ntPb44vfYuqK7vAuYhqIyMDnO/69r9hTOzf58NxqkQr09B
 f3bgMY+53dR7lKDrvdUSZDlNQmtclrrsCC1UK+eueYWgOMEhD2DOeVCzAIzkABzM
 mUrKHLAldi9zDLFfrXpOatUBlToj8Zq4BR9R6qKI1s4gvuXi70tkDhuK4DLjObad
 ADtTHFikrrcYE0wU9o2fnc5cndvkDZLQ/ploUwnwTfoL/YO2omKCDT59R9O7aDja
 eVkwUE60ZavOIQHvd11uFRtFSNeH5eh/97ggGSXKaYkamiSR9VFHnYnsD38phEMQ
 hcPzocN+eqaIIdhLgTqn+qHPJ20oHDcnjFjyJEDDHMTg5UEY2k4VCwCFVH1n0A2u
 R24XXzzdJHNEBQPL8o2SzgJfvyKyjVVe2+7OeIT6RxaSjRI+M90oA1zp4DMNEeqy
 +bIPCzi7aa48+U0EchM4lt+IJJphGMSOJ5mELaDmF8BDwPhKEzY=
 =yaB5
 -----END PGP SIGNATURE-----

Merge tag 'v6.5-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/fixes

Correct wifi interrupt flags for some boards, fixed wifi on Rock PI4,
disabled hs400 speeds for some boards having problems with data
intergrity and some dt property/styling fixes.

* tag 'v6.5-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  arm64: dts: rockchip: Fix Wifi/Bluetooth on ROCK Pi 4 boards
  arm64: dts: rockchip: minor whitespace cleanup around '='
  arm64: dts: rockchip: Disable HS400 for eMMC on ROCK 4C+
  arm64: dts: rockchip: Disable HS400 for eMMC on ROCK Pi 4
  arm64: dts: rockchip: add missing space before { on indiedroid nova
  arm64: dts: rockchip: correct wifi interrupt flag in Box Demo
  arm64: dts: rockchip: correct wifi interrupt flag in Rock Pi 4B
  arm64: dts: rockchip: correct wifi interrupt flag in eaidk-610
  arm64: dts: rockchip: Drop invalid regulator-init-microvolt property

Link: https://lore.kernel.org/r/4519945.8hzESeGDPO@phil
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-08-17 14:17:01 +02:00
Shenghao Ding 3d521f9f36
ASoC: tas2781: fixed register access error when switching to other chips
fixed register access error when switching to other tas2781 -- refresh the page
inside regmap on the switched tas2781

Signed-off-by: Shenghao Ding <shenghao-ding@ti.com>
Link: https://lore.kernel.org/r/20230817093257.951-1-shenghao-ding@ti.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2023-08-17 13:09:12 +01:00
Simon Trimmer e8500a7027
ASoC: cs35l56: Add an ACPI match table
An ACPI ID has been allocated for CS35L56 ASoC devices so that they can
be instantiated from ACPI Device entries.

Signed-off-by: Simon Trimmer <simont@opensource.cirrus.com>
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20230817112712.16637-3-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2023-08-17 13:05:31 +01:00
Maciej Strozek 897a6b5a03
ASoC: cs35l56: Read firmware uuid from a device property instead of _SUB
Use a device property "cirrus,firmware-uid" to get the unique firmware
identifier instead of using ACPI _SUB. There aren't any products that use
_SUB.

There will not usually be a _SUB in Soundwire nodes. The ACPI can use a
_DSD section for custom properties.

There is also a need to support instantiating this driver using software
nodes. This is for systems where the CS35L56 is a back-end device and the
ACPI refers only to the front-end audio device - there will not be any ACPI
references to CS35L56.

Fixes: e496112529 ("ASoC: cs35l56: Add driver for Cirrus Logic CS35L56")
Signed-off-by: Maciej Strozek <mstrozek@opensource.cirrus.com>
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20230817112712.16637-2-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2023-08-17 13:05:30 +01:00
Jani Nikula 50b6f2c829 Revert "drm/edid: Fix csync detailed mode parsing"
This reverts commit ca62297b20.

Commit ca62297b20 ("drm/edid: Fix csync detailed mode parsing") fixed
EDID detailed mode sync parsing. Unfortunately, there are quite a few
displays out there that have bogus (zero) sync field that are broken by
the change. Zero means analog composite sync, which is not right for
digital displays, and the modes get rejected. Regardless, it used to
work, and it needs to continue to work. Revert the change.

Rejecting modes with analog composite sync was the part that fixed the
gitlab issue 8146 [1]. We'll need to get back to the drawing board with
that.

[1] https://gitlab.freedesktop.org/drm/intel/-/issues/8146

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8789
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8930
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/9044
Fixes: ca62297b20 ("drm/edid: Fix csync detailed mode parsing")
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v6.4+
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230815101907.2900768-1-jani.nikula@intel.com
2023-08-17 14:39:12 +03:00
Peter Zijlstra 5409730962 x86/static_call: Fix __static_call_fixup()
Christian reported spurious module load crashes after some of Song's
module memory layout patches.

Turns out that if the very last instruction on the very last page of the
module is a 'JMP __x86_return_thunk' then __static_call_fixup() will
trip a fault and die.

And while the module rework made this slightly more likely to happen,
it's always been possible.

Fixes: ee88d363d1 ("x86,static_call: Use alternative RET encoding")
Reported-by: Christian Bricart <christian@bricart.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lkml.kernel.org/r/20230816104419.GA982867@hirez.programming.kicks-ass.net
2023-08-17 13:24:09 +02:00
Alfred Lee 23d775f12d net: dsa: mv88e6xxx: Wait for EEPROM done before HW reset
If the switch is reset during active EEPROM transactions, as in
just after an SoC reset after power up, the I2C bus transaction
may be cut short leaving the EEPROM internal I2C state machine
in the wrong state.  When the switch is reset again, the bad
state machine state may result in data being read from the wrong
memory location causing the switch to enter unexpected mode
rendering it inoperational.

Fixes: a3dcb3e7e7 ("net: dsa: mv88e6xxx: Wait for EEPROM done after HW reset")
Signed-off-by: Alfred Lee <l00g33k@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230815001323.24739-1-l00g33k@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16 20:16:51 -07:00
Nathan Lynch 4f3175979e powerpc/rtas_flash: allow user copy to flash block cache objects
With hardened usercopy enabled (CONFIG_HARDENED_USERCOPY=y), using the
/proc/powerpc/rtas/firmware_update interface to prepare a system
firmware update yields a BUG():

  kernel BUG at mm/usercopy.c:102!
  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in:
  CPU: 0 PID: 2232 Comm: dd Not tainted 6.5.0-rc3+ #2
  Hardware name: IBM,8408-E8E POWER8E (raw) 0x4b0201 0xf000004 of:IBM,FW860.50 (SV860_146) hv:phyp pSeries
  NIP:  c0000000005991d0 LR: c0000000005991cc CTR: 0000000000000000
  REGS: c0000000148c76a0 TRAP: 0700   Not tainted  (6.5.0-rc3+)
  MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24002242  XER: 0000000c
  CFAR: c0000000001fbd34 IRQMASK: 0
  [ ... GPRs omitted ... ]
  NIP usercopy_abort+0xa0/0xb0
  LR  usercopy_abort+0x9c/0xb0
  Call Trace:
    usercopy_abort+0x9c/0xb0 (unreliable)
    __check_heap_object+0x1b4/0x1d0
    __check_object_size+0x2d0/0x380
    rtas_flash_write+0xe4/0x250
    proc_reg_write+0xfc/0x160
    vfs_write+0xfc/0x4e0
    ksys_write+0x90/0x160
    system_call_exception+0x178/0x320
    system_call_common+0x160/0x2c4

The blocks of the firmware image are copied directly from user memory
to objects allocated from flash_block_cache, so flash_block_cache must
be created using kmem_cache_create_usercopy() to mark it safe for user
access.

Fixes: 6d07d1cd30 ("usercopy: Restrict non-usercopy caches to size 0")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
[mpe: Trim and indent oops]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230810-rtas-flash-vs-hardened-usercopy-v2-1-dcf63793a938@linux.ibm.com
2023-08-17 09:46:14 +10:00
Peter Zijlstra dbf4600877 objtool/x86: Fixup frame-pointer vs rethunk
For stack-validation of a frame-pointer build, objtool validates that
every CALL instruction is preceded by a frame-setup. The new SRSO
return thunks violate this with their RSB stuffing trickery.

Extend the __fentry__ exception to also cover the embedded_insn case
used for this. This cures:

  vmlinux.o: warning: objtool: srso_untrain_ret+0xd: call without frame pointer save/setup

Fixes: 4ae68b26c3 ("objtool/x86: Fix SRSO mess")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/20230816115921.GH980931@hirez.programming.kicks-ass.net
2023-08-17 00:44:35 +02:00
Karol Herbst 1b254b791d drm/nouveau/disp: fix use-after-free in error handling of nouveau_connector_create
We can't simply free the connector after calling drm_connector_init on it.
We need to clean up the drm side first.

It might not fix all regressions from commit 2b5d1c29f6
("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts"),
but at least it fixes a memory corruption in error handling related to
that commit.

Link: https://lore.kernel.org/lkml/20230806213107.GFZNARG6moWpFuSJ9W@fat_crate.local/
Fixes: 95983aea80 ("drm/nouveau/disp: add connector class")
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230814144933.3956959-1-kherbst@redhat.com
2023-08-16 23:45:04 +02:00
Shay Drory 0fd23db0cc net/mlx5: Fix mlx5_cmd_update_root_ft() error flow
The cited patch change mlx5_cmd_update_root_ft() to work with multiple
peer devices. However, it didn't align the error flow as well.
Hence, Fix the error code to work with multiple peer devices.

Fixes: 222dd18583 ("{net/RDMA}/mlx5: introduce lag_for_each_peer")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-16 13:39:28 -07:00
Dragos Tatulea 34a79876d9 net/mlx5e: XDP, Fix fifo overrun on XDP_REDIRECT
Before this fix, running high rate traffic through XDP_REDIRECT
with multibuf could overrun the fifo used to release the
xdp frames after tx completion. This resulted in corrupted data
being consumed on the free side.

The culplirt was a miscalculation of the fifo size: the maximum ratio
between fifo entries / data segments was incorrect. This ratio serves to
calculate the max fifo size for a full sq where each packet uses the
worst case number of entries in the fifo.

This patch fixes the formula and names the constant. It also makes sure
that future values will use a power of 2 number of entries for the fifo
mask to work.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Fixes: 3f734b8c59 ("net/mlx5e: XDP, Use multiple single-entry objects in xdpi_fifo")
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-16 13:39:28 -07:00
Sven Schnelle c4d6b54381 tracing/synthetic: Allocate one additional element for size
While debugging another issue I noticed that the stack trace contains one
invalid entry at the end:

<idle>-0       [008] d..4.    26.484201: wake_lat: pid=0 delta=2629976084 000000009cc24024 stack=STACK:
=> __schedule+0xac6/0x1a98
=> schedule+0x126/0x2c0
=> schedule_timeout+0x150/0x2c0
=> kcompactd+0x9ca/0xc20
=> kthread+0x2f6/0x3d8
=> __ret_from_fork+0x8a/0xe8
=> 0x6b6b6b6b6b6b6b6b

This is because the code failed to add the one element containing the
number of entries to field_size.

Link: https://lkml.kernel.org/r/20230816154928.4171614-4-svens@linux.ibm.com

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 00cf3d672a ("tracing: Allow synthetic events to pass around stacktraces")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-08-16 16:37:07 -04:00
Sven Schnelle 887f92e09e tracing/synthetic: Skip first entry for stack traces
While debugging another issue I noticed that the stack trace output
contains the number of entries on top:

         <idle>-0       [000] d..4.   203.322502: wake_lat: pid=0 delta=2268270616 stack=STACK:
=> 0x10
=> __schedule+0xac6/0x1a98
=> schedule+0x126/0x2c0
=> schedule_timeout+0x242/0x2c0
=> __wait_for_common+0x434/0x680
=> __wait_rcu_gp+0x198/0x3e0
=> synchronize_rcu+0x112/0x138
=> ring_buffer_reset_online_cpus+0x140/0x2e0
=> tracing_reset_online_cpus+0x15c/0x1d0
=> tracing_set_clock+0x180/0x1d8
=> hist_register_trigger+0x486/0x670
=> event_hist_trigger_parse+0x494/0x1318
=> trigger_process_regex+0x1d4/0x258
=> event_trigger_write+0xb4/0x170
=> vfs_write+0x210/0xad0
=> ksys_write+0x122/0x208

Fix this by skipping the first element. Also replace the pointer
logic with an index variable which is easier to read.

Link: https://lkml.kernel.org/r/20230816154928.4171614-3-svens@linux.ibm.com

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 00cf3d672a ("tracing: Allow synthetic events to pass around stacktraces")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-08-16 16:34:25 -04:00
Sven Schnelle ddeea494a1 tracing/synthetic: Use union instead of casts
The current code uses a lot of casts to access the fields member in struct
synth_trace_events with different sizes.  This makes the code hard to
read, and had already introduced an endianness bug. Use a union and struct
instead.

Link: https://lkml.kernel.org/r/20230816154928.4171614-2-svens@linux.ibm.com

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 00cf3d672a ("tracing: Allow synthetic events to pass around stacktraces")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-08-16 16:33:27 -04:00
Borislav Petkov (AMD) 9dbd23e42f x86/srso: Explain the untraining sequences a bit more
The goal is to eventually have a proper documentation about all this.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814164447.GFZNpZ/64H4lENIe94@fat_crate.local
2023-08-16 21:58:59 +02:00
Peter Zijlstra 864bcaa38e x86/cpu/kvm: Provide UNTRAIN_RET_VM
Similar to how it doesn't make sense to have UNTRAIN_RET have two
untrain calls, it also doesn't make sense for VMEXIT to have an extra
IBPB call.

This cures VMEXIT doing potentially unret+IBPB or double IBPB.
Also, the (SEV) VMEXIT case seems to have been overlooked.

Redefine the meaning of the synthetic IBPB flags to:

 - ENTRY_IBPB     -- issue IBPB on entry  (was: entry + VMEXIT)
 - IBPB_ON_VMEXIT -- issue IBPB on VMEXIT

And have 'retbleed=ibpb' set *BOTH* feature flags to ensure it retains
the previous behaviour and issues IBPB on entry+VMEXIT.

The new 'srso=ibpb_vmexit' option only sets IBPB_ON_VMEXIT.

Create UNTRAIN_RET_VM specifically for the VMEXIT case, and have that
check IBPB_ON_VMEXIT.

All this avoids having the VMEXIT case having to check both ENTRY_IBPB
and IBPB_ON_VMEXIT and simplifies the alternatives.

Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121149.109557833@infradead.org
2023-08-16 21:58:59 +02:00
Peter Zijlstra e7c25c441e x86/cpu: Cleanup the untrain mess
Since there can only be one active return_thunk, there only needs be
one (matching) untrain_ret. It fundamentally doesn't make sense to
allow multiple untrain_ret at the same time.

Fold all the 3 different untrain methods into a single (temporary)
helper stub.

Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121149.042774962@infradead.org
2023-08-16 21:58:59 +02:00
Peter Zijlstra 42be649dd1 x86/cpu: Rename srso_(.*)_alias to srso_alias_\1
For a more consistent namespace.

  [ bp: Fixup names in the doc too. ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121148.976236447@infradead.org
2023-08-16 21:58:53 +02:00
Peter Zijlstra d025b7bac0 x86/cpu: Rename original retbleed methods
Rename the original retbleed return thunk and untrain_ret to
retbleed_return_thunk() and retbleed_untrain_ret().

No functional changes.

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121148.909378169@infradead.org
2023-08-16 21:47:53 +02:00
Peter Zijlstra d43490d0ab x86/cpu: Clean up SRSO return thunk mess
Use the existing configurable return thunk. There is absolute no
justification for having created this __x86_return_thunk alternative.

To clarify, the whole thing looks like:

Zen3/4 does:

  srso_alias_untrain_ret:
	  nop2
	  lfence
	  jmp srso_alias_return_thunk
	  int3

  srso_alias_safe_ret: // aliasses srso_alias_untrain_ret just so
	  add $8, %rsp
	  ret
	  int3

  srso_alias_return_thunk:
	  call srso_alias_safe_ret
	  ud2

While Zen1/2 does:

  srso_untrain_ret:
	  movabs $foo, %rax
	  lfence
	  call srso_safe_ret           (jmp srso_return_thunk ?)
	  int3

  srso_safe_ret: // embedded in movabs instruction
	  add $8,%rsp
          ret
          int3

  srso_return_thunk:
	  call srso_safe_ret
	  ud2

While retbleed does:

  zen_untrain_ret:
	  test $0xcc, %bl
	  lfence
	  jmp zen_return_thunk
          int3

  zen_return_thunk: // embedded in the test instruction
	  ret
          int3

Where Zen1/2 flush the BTB entry using the instruction decoder trick
(test,movabs) Zen3/4 use BTB aliasing. SRSO adds a return sequence
(srso_safe_ret()) which forces the function return instruction to
speculate into a trap (UD2).  This RET will then mispredict and
execution will continue at the return site read from the top of the
stack.

Pick one of three options at boot (evey function can only ever return
once).

  [ bp: Fixup commit message uarch details and add them in a comment in
    the code too. Add a comment about the srso_select_mitigation()
    dependency on retbleed_select_mitigation(). Add moar ifdeffery for
    32-bit builds. Add a dummy srso_untrain_ret_alias() definition for
    32-bit alternatives needing the symbol. ]

Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121148.842775684@infradead.org
2023-08-16 21:47:24 +02:00
Alex Deucher 6ecc10295a Revert "Revert "drm/amdgpu/display: change pipe policy for DCN 2.0""
This reverts commit 27dd79c00a.

It appears MPC_SPLIT_DYNAMIC still causes problems with multiple
displays on DCN2.0 hardware.  Switch back to MPC_SPLIT_AVOID_MULT_DISP.
This increases power usage with multiple displays, but avoids hangs.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2475
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.4.x
2023-08-16 15:46:40 -04:00
Mario Limonciello a7b7d9e8ae drm/amd: flush any delayed gfxoff on suspend entry
DCN 3.1.4 is reported to hang on s2idle entry if graphics activity
is happening during entry.  This is because GFXOFF was scheduled as
delayed but RLC gets disabled in s2idle entry sequence which will
hang GFX IP if not already in GFXOFF.

To help this problem, flush any delayed work for GFXOFF early in
s2idle entry sequence to ensure that it's off when RLC is changed.

commit 4b31b92b14 ("drm/amdgpu: complete gfxoff allow signal during
suspend without delay") modified power gating flow so that if called
in s0ix that it ensured that GFXOFF wasn't put in work queue but
instead processed immediately.

This is dead code due to commit 10cb67eb8a ("drm/amdgpu: skip
CG/PG for gfx during S0ix") because GFXOFF will now not be explicitly
called as part of the suspend entry code.  Remove that dead code.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-08-16 15:46:39 -04:00
Tim Huang f1740b1ab2 drm/amdgpu: skip fence GFX interrupts disable/enable for S0ix
GFX v11.0.1 reported fence fallback timer expired issue on
SDMA and GFX rings after S0ix resume. This is generated by
EOP interrupts are disabled when S0ix suspend but fails to
re-enable when resume because of the GFX is in GFXOFF.

[  203.349571] [drm] Fence fallback timer expired on ring sdma0
[  203.349572] [drm] Fence fallback timer expired on ring gfx_0.0.0
[  203.861635] [drm] Fence fallback timer expired on ring gfx_0.0.0

For S0ix, GFX is in GFXOFF state, avoid to touch the GFX registers
to configure the fence driver interrupts for rings that belong to GFX.
The interrupts configuration will be restored by GFXOFF exit.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-08-16 15:46:39 -04:00
James Zhu b25fdc048c drm/amdgpu: skip xcp drm device allocation when out of drm resource
Return 0 when drm device alloc failed with -ENOSPC in
order to  allow amdgpu drive loading. But the xcp without
drm device node assigned won't be visiable in user space.
This helps amdgpu driver loading on system which has more
than 64 nodes, the current limitation.

The proposal to add more drm nodes is discussed in public,
which will support up to 2^20 nodes totally.
kernel drm:
https://lore.kernel.org/lkml/20230724211428.3831636-1-michal.winiarski@intel.com/T/
libdrm:
https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/305

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-16 15:46:39 -04:00
Asad Kamal d621114ffb drm/amd/pm: Update pci link width for smu v13.0.6
Update addresses of PCIE link width registers,
& link width format used to populate gpu metrics
table for smu v13.0.6

v2:
Removed ESM register update

v3:
Updated patch subject and message

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-16 15:46:39 -04:00
Lijo Lazar 8d036427f0 drm/amd/pm: Fix temperature unit of SMU v13.0.6
Temperature needs to be reported in millidegree Celsius.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-16 15:46:39 -04:00
Umio Yasuno 6a92761a86 drm/amdgpu/pm: fix throttle_status for other than MP1 11.0.7
Use the right metrics table version based on the firmware.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2720
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Umio Yasuno <coelacanth_dream@protonmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-08-16 15:46:39 -04:00
Jiadong Zhu 0d6f374c0c drm/amdgpu: disable mcbp if parameter zero is set
The parameter amdgpu_mcbp shall have priority against the default value
calculated from the chip version.
User could disable mcbp by setting the parameter mcbp as zero.

v2: do not trigger preemption in sw ring muxer when mcbp is disabled.

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-16 15:46:33 -04:00
Zheng Yejian 5450be6bef selftests/ftrace: Add a basic testcase for snapshot
This testcase is constrived to reproduce a problem that the cpu buffers
become unavailable which is due to 'record_disabled' of array_buffer and
max_buffer being messed up.

Local test result after bugfix:
  # ./ftracetest test.d/00basic/snapshot1.tc
  === Ftrace unit tests ===
  [1] Snapshot and tracing_cpumask        [PASS]
  [2] (instance)  Snapshot and tracing_cpumask    [PASS]

  # of passed:  2
  # of failed:  0
  # of unresolved:  0
  # of untested:  0
  # of unsupported:  0
  # of xfailed:  0
  # of undefined(test bug):  0

Link: https://lkml.kernel.org/r/20230805033816.3284594-3-zhengyejian1@huawei.com

Cc: <mhiramat@kernel.org>
Cc: <vnagarnaik@google.com>
Cc: <shuah@kernel.org>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-08-16 15:13:40 -04:00