KVM: arm64: Retry fault if vma_lookup() results become invalid
commit 13ec9308a8 upstream.

Read mmu_invalidate_seq before dropping the mmap_lock so that KVM can
detect if the results of vma_lookup() (e.g. vma_shift) become stale
before it acquires kvm->mmu_lock. This fixes a theoretical bug where a
VMA could be changed by userspace after vma_lookup() and before KVM
reads the mmu_invalidate_seq, causing KVM to install page table entries
based on a (possibly) no-longer-valid vma_shift.

Re-order the MMU cache top-up to earlier in user_mem_abort() so that it
is not done after KVM has read mmu_invalidate_seq (i.e. so as to avoid
inducing spurious fault retries).

This bug has existed since KVM/ARM's inception. It's unlikely that any
sane userspace currently modifies VMAs in such a way as to trigger this
race. And even with directed testing I was unable to reproduce it. But
a sufficiently motivated host userspace might be able to exploit this
race.

Fixes: 94f8e6418d ("KVM: ARM: Handle guest faults in KVM")
Cc: stable@vger.kernel.org
Reported-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230313235454.2964067-1-dmatlack@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
[will: Use FSC_PERM instead of ESR_ELx_FSC_PERM]
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
parent d70f63be62
commit e1562cc202
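For context, the mechanism the patch leans on works like this: every MMU-notifier invalidation bumps kvm->mmu_invalidate_seq, the fault handler snapshots that counter while its lookup results are still known-good, and the snapshot is re-checked under kvm->mmu_lock so the fault can simply be retried if anything changed in between. The sketch below is a minimal, single-threaded user-space model of that retry protocol, written purely for illustration; the helper names mirror the kernel's but are simplified stand-ins, not the kernel implementation.

/*
 * Minimal user-space model of the mmu_invalidate_seq retry protocol.
 * Not kernel code: the helpers below are stand-ins for
 * kvm_mmu_invalidate_end() and mmu_invalidate_retry().
 */
#include <stdbool.h>
#include <stdio.h>

static unsigned long mmu_invalidate_seq;	/* bumped on every invalidation */

/* Models an invalidation completing (kvm_mmu_invalidate_end()). */
static void invalidate_end(void)
{
	mmu_invalidate_seq++;
}

/* Models mmu_invalidate_retry(): did anything change since the snapshot? */
static bool mmu_invalidate_retry(unsigned long mmu_seq)
{
	return mmu_invalidate_seq != mmu_seq;
}

/* Models a fault handler, optionally with a racing invalidation injected. */
static int handle_fault(bool race)
{
	/* Snapshot while the lookup results (e.g. vma_shift) are still valid. */
	unsigned long mmu_seq = mmu_invalidate_seq;

	/* ... drop mmap_lock, resolve the pfn, etc. ... */
	if (race)
		invalidate_end();	/* userspace changed the mapping meanwhile */

	/* Under the final lock: install the mapping only if nothing went stale. */
	if (mmu_invalidate_retry(mmu_seq))
		return -1;		/* caller must retry the fault */
	return 0;			/* safe to install the page table entry */
}

int main(void)
{
	printf("no race:   %s\n", handle_fault(false) ? "retry" : "mapped");
	printf("with race: %s\n", handle_fault(true) ? "retry" : "mapped");
	return 0;
}

The patch below changes the point at which the snapshot is taken in user_mem_abort(): it now happens before mmap_read_unlock(), so a VMA change made after the unlock can no longer slip in unnoticed.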
@@ -1178,6 +1178,20 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		return -EFAULT;
 	}
 
+	/*
+	 * Permission faults just need to update the existing leaf entry,
+	 * and so normally don't require allocations from the memcache. The
+	 * only exception to this is when dirty logging is enabled at runtime
+	 * and a write fault needs to collapse a block entry into a table.
+	 */
+	if (fault_status != FSC_PERM ||
+	    (logging_active && write_fault)) {
+		ret = kvm_mmu_topup_memory_cache(memcache,
+						 kvm_mmu_cache_min_pages(kvm));
+		if (ret)
+			return ret;
+	}
+
 	/*
 	 * Let's check if we will get back a huge page backed by hugetlbfs, or
 	 * get block mapping for device MMIO region.
@@ -1234,36 +1248,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		fault_ipa &= ~(vma_pagesize - 1);
 
 	gfn = fault_ipa >> PAGE_SHIFT;
-	mmap_read_unlock(current->mm);
 
-	/*
-	 * Permission faults just need to update the existing leaf entry,
-	 * and so normally don't require allocations from the memcache. The
-	 * only exception to this is when dirty logging is enabled at runtime
-	 * and a write fault needs to collapse a block entry into a table.
-	 */
-	if (fault_status != FSC_PERM || (logging_active && write_fault)) {
-		ret = kvm_mmu_topup_memory_cache(memcache,
-						 kvm_mmu_cache_min_pages(kvm));
-		if (ret)
-			return ret;
-	}
-
-	mmu_seq = vcpu->kvm->mmu_invalidate_seq;
 	/*
-	 * Ensure the read of mmu_invalidate_seq happens before we call
-	 * gfn_to_pfn_prot (which calls get_user_pages), so that we don't risk
-	 * the page we just got a reference to gets unmapped before we have a
-	 * chance to grab the mmu_lock, which ensure that if the page gets
-	 * unmapped afterwards, the call to kvm_unmap_gfn will take it away
-	 * from us again properly. This smp_rmb() interacts with the smp_wmb()
-	 * in kvm_mmu_notifier_invalidate_<page|range_end>.
+	 * Read mmu_invalidate_seq so that KVM can detect if the results of
+	 * vma_lookup() or __gfn_to_pfn_memslot() become stale prior to
+	 * acquiring kvm->mmu_lock.
 	 *
-	 * Besides, __gfn_to_pfn_memslot() instead of gfn_to_pfn_prot() is
-	 * used to avoid unnecessary overhead introduced to locate the memory
-	 * slot because it's always fixed even @gfn is adjusted for huge pages.
+	 * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
+	 * with the smp_wmb() in kvm_mmu_invalidate_end().
 	 */
-	smp_rmb();
+	mmu_seq = vcpu->kvm->mmu_invalidate_seq;
+	mmap_read_unlock(current->mm);
 
 	pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
 				   write_fault, &writable, NULL);
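The consumer side of the snapshot is outside this hunk: later in user_mem_abort(), after kvm->mmu_lock is taken, the saved mmu_seq is compared against the current counter and the fault is simply retried if an invalidation ran in the meantime. Roughly, as a from-memory sketch of the surrounding 6.1-era code (not part of this patch):

	read_lock(&kvm->mmu_lock);
	pgt = vcpu->arch.hw_mmu->pgt;

	/* Did an invalidation run since mmu_seq was snapshotted? If so, retry. */
	if (mmu_invalidate_retry(kvm, mmu_seq))
		goto out_unlock;

	/* ... otherwise it is safe to install the stage-2 mapping ... */

Because mmu_seq is now sampled before the mmap_lock is dropped, any VMA modification that could invalidate vma_shift either completes before the snapshot or trips this retry check.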