Felix Kuehling 8b25d39716 drm/amdkfd: Fix lock dependency warning
[ Upstream commit 47bf0f83fc86df1bf42b385a91aadb910137c5c9 ]

======================================================
WARNING: possible circular locking dependency detected
6.5.0-kfd-fkuehlin #276 Not tainted
------------------------------------------------------
kworker/8:2/2676 is trying to acquire lock:
ffff9435aae95c88 ((work_completion)(&svm_bo->eviction_work)){+.+.}-{0:0}, at: __flush_work+0x52/0x550

but task is already holding lock:
ffff9435cd8e1720 (&svms->lock){+.+.}-{3:3}, at: svm_range_deferred_list_work+0xe8/0x340 [amdgpu]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&svms->lock){+.+.}-{3:3}:
       __mutex_lock+0x97/0xd30
       kfd_ioctl_alloc_memory_of_gpu+0x6d/0x3c0 [amdgpu]
       kfd_ioctl+0x1b2/0x5d0 [amdgpu]
       __x64_sys_ioctl+0x86/0xc0
       do_syscall_64+0x39/0x80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd

-> #1 (&mm->mmap_lock){++++}-{3:3}:
       down_read+0x42/0x160
       svm_range_evict_svm_bo_worker+0x8b/0x340 [amdgpu]
       process_one_work+0x27a/0x540
       worker_thread+0x53/0x3e0
       kthread+0xeb/0x120
       ret_from_fork+0x31/0x50
       ret_from_fork_asm+0x11/0x20

-> #0 ((work_completion)(&svm_bo->eviction_work)){+.+.}-{0:0}:
       __lock_acquire+0x1426/0x2200
       lock_acquire+0xc1/0x2b0
       __flush_work+0x80/0x550
       __cancel_work_timer+0x109/0x190
       svm_range_bo_release+0xdc/0x1c0 [amdgpu]
       svm_range_free+0x175/0x180 [amdgpu]
       svm_range_deferred_list_work+0x15d/0x340 [amdgpu]
       process_one_work+0x27a/0x540
       worker_thread+0x53/0x3e0
       kthread+0xeb/0x120
       ret_from_fork+0x31/0x50
       ret_from_fork_asm+0x11/0x20

other info that might help us debug this:

Chain exists of:
  (work_completion)(&svm_bo->eviction_work) --> &mm->mmap_lock --> &svms->lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&svms->lock);
                               lock(&mm->mmap_lock);
                               lock(&svms->lock);
  lock((work_completion)(&svm_bo->eviction_work));

I believe this cannot really lead to a deadlock in practice, because
svm_range_evict_svm_bo_worker only takes the mmap_read_lock if the BO
refcount is non-0. That means it's impossible that svm_range_bo_release
is running concurrently. However, there is no good way to annotate this.

To avoid the problem, take a BO reference in
svm_range_schedule_evict_svm_bo instead of in the worker. That way it's
impossible for a BO to get freed while eviction work is pending and the
cancel_work_sync call in svm_range_bo_release can be eliminated.

v2: Use svm_bo_ref_unless_zero and explain why that's safe. Also
remove redundant checks that are already done in
amdkfd_fence_enable_signaling.
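
The reference-counting pattern described above can be sketched in user-space
C. This is only an illustrative model, not the driver code: struct demo_bo and
all demo_* functions are hypothetical names, the "work" is a plain flag rather
than a real work item, and the model is single-threaded. It shows the idea of
pinning the object when the work is scheduled, so that release never has to
cancel or flush pending work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's BO object; not the real struct. */
struct demo_bo {
    atomic_int refcount;   /* 0 means release is in progress or done */
    bool evict_pending;    /* models queued eviction work */
    bool released;         /* set once the last reference is dropped */
};

/* Take a reference only if the object is still alive (refcount != 0). */
static bool demo_bo_ref_unless_zero(struct demo_bo *bo)
{
    int old = atomic_load(&bo->refcount);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&bo->refcount, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; the last put releases the object. */
static void demo_bo_put(struct demo_bo *bo)
{
    int old = atomic_fetch_sub(&bo->refcount, 1);

    assert(old > 0);
    if (old == 1) {
        /* No cancel or flush needed: pending work would hold a reference. */
        assert(!bo->evict_pending);
        bo->released = true;
    }
}

/* Scheduler side: pin the object before queueing the work. */
static bool demo_schedule_evict(struct demo_bo *bo)
{
    if (!demo_bo_ref_unless_zero(bo))
        return false;          /* already being released; nothing to evict */
    bo->evict_pending = true;  /* stands in for queueing the work item */
    return true;
}

/* Worker side: do the eviction, then drop the scheduler's reference. */
static void demo_evict_worker(struct demo_bo *bo)
{
    /* ... the eviction itself would happen here ... */
    bo->evict_pending = false;
    demo_bo_put(bo);
}
```

In this model, dropping the owner's reference while eviction is pending does
not release the object; the release happens only when the worker drops the
reference taken at schedule time. Scheduling after the count has reached zero
is refused, which mirrors the "unless zero" check named in the v2 note.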

Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-05 20:12:59 +00:00
Documentation ASoC: doc: Fix undefined SND_SOC_DAPM_NOPM argument 2024-02-05 20:12:54 +00:00
LICENSES
arch um: time-travel: fix time corruption 2024-02-05 20:12:57 +00:00
block block: prevent an integer overflow in bvec_try_merge_hw_page 2024-02-05 20:12:53 +00:00
certs
crypto crypto: api - Disallow identical driver names 2024-01-31 16:16:58 -08:00
drivers drm/amdkfd: Fix lock dependency warning 2024-02-05 20:12:59 +00:00
fs 9p: Fix initialisation of netfs_inode for 9p 2024-02-05 20:12:59 +00:00
include PCI: add INTEL_HDA_ARL to pci_ids.h 2024-02-05 20:12:55 +00:00
init rootfs: Fix support for rootfstype= when root= is given 2024-01-25 15:27:43 -08:00
io_uring io_uring/rw: ensure io->bytes_done is always initialized 2024-01-25 15:27:41 -08:00
ipc
kernel bpf: Set uattr->batch.count as zero before batched update or deletion 2024-02-05 20:12:51 +00:00
lib debugobjects: Stop accessing objects after releasing hash bucket lock 2024-02-05 20:12:47 +00:00
mm mm: page_alloc: unreserve highatomic page blocks before oom 2024-01-31 16:17:03 -08:00
net bridge: cfm: fix enum typo in br_cc_ccm_tx_parse 2024-02-05 20:12:54 +00:00
rust
samples fprobe: Pass entry_data to handlers 2023-10-25 12:03:12 +02:00
scripts scripts/get_abi: fix source path leak 2024-01-31 16:17:01 -08:00
security lsm: new security_file_ioctl_compat() hook 2024-01-31 16:17:00 -08:00
sound ALSA: hda/conexant: Fix headset auto detect fail in cx8070 and SN6140 2024-02-05 20:12:57 +00:00
tools libsubcmd: Fix memory leak in uniq() 2024-02-05 20:12:59 +00:00
usr
virt
.clang-format
.cocciconfig
.get_maintainer.ignore
.gitattributes
.gitignore
.mailmap
.rustfmt.toml
COPYING
CREDITS
Kbuild
Kconfig
MAINTAINERS genirq/affinity: Move group_cpus_evenly() into lib/ 2024-01-10 17:10:33 +01:00
Makefile Linux 6.1.76 2024-01-31 16:17:12 -08:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result from upgrading your kernel.