acrn-kernel/mm
Charan Teja Kalla 3101b9fd74 mm: page_alloc: unreserve highatomic page blocks before oom
commit ac3f3b0a55518056bc80ed32a41931c99e1f7d81 upstream.

__alloc_pages_direct_reclaim() is called from slowpath allocation where
high atomic reserves can be unreserved after there is a progress in
reclaim and yet no suitable page is found.  Later should_reclaim_retry()
gets called from slow path allocation to decide if the reclaim needs to be
retried before OOM kill path is taken.

should_reclaim_retry() checks the available(reclaimable + free pages)
memory against the min wmark levels of a zone and returns:

a) true, if it is above the min wmark so that slow path allocation will
   do the reclaim retries.

b) false, thus slowpath allocation takes oom kill path.

should_reclaim_retry() can also unreserves the high atomic reserves **but
only after all the reclaim retries are exhausted.**

In a case where there are almost none reclaimable memory and free pages
contains mostly the high atomic reserves but allocation context can't use
these high atomic reserves, makes the available memory below min wmark
levels hence false is returned from should_reclaim_retry() leading the
allocation request to take OOM kill path.  This can turn into a early oom
kill if high atomic reserves are holding lot of free memory and
unreserving of them is not attempted.

(early)OOM is encountered on a VM with the below state:
[  295.998653] Normal free:7728kB boost:0kB min:804kB low:1004kB
high:1204kB reserved_highatomic:8192KB active_anon:4kB inactive_anon:0kB
active_file:24kB inactive_file:24kB unevictable:1220kB writepending:0kB
present:70732kB managed:49224kB mlocked:0kB bounce:0kB free_pcp:688kB
local_pcp:492kB free_cma:0kB
[  295.998656] lowmem_reserve[]: 0 32
[  295.998659] Normal: 508*4kB (UMEH) 241*8kB (UMEH) 143*16kB (UMEH)
33*32kB (UH) 7*64kB (UH) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB
0*4096kB = 7752kB

Per above log, the free memory of ~7MB exist in the high atomic reserves
is not freed up before falling back to oom kill path.

Fix it by trying to unreserve the high atomic reserves in
should_reclaim_retry() before __alloc_pages_direct_reclaim() can fallback
to oom kill path.

Link: https://lkml.kernel.org/r/1700823445-27531-1-git-send-email-quic_charante@quicinc.com
Fixes: 0aaa29a56e ("mm, page_alloc: reserve pageblocks for high-order atomic allocations on demand")
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Reported-by: Chris Goldsworthy <quic_cgoldswo@quicinc.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Chris Goldsworthy <quic_cgoldswo@quicinc.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joakim Tjernlund <Joakim.Tjernlund@infinera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-31 16:17:03 -08:00
..
damon mm/damon/core: make damon_start() waits until kdamond_fn() starts 2024-01-01 12:39:08 +00:00
kasan kasan: disable kasan_non_canonical_hook() for HW tags 2024-01-01 12:38:52 +00:00
kfence mm,kfence: decouple kfence from page granularity mapping judgement 2023-12-03 07:32:08 +01:00
kmsan mm: kmsan: handle alloc failures in kmsan_vmap_pages_range_noflush() 2023-04-26 14:28:41 +02:00
Kconfig mm: introduce new 'lock_mm_and_find_vma()' page fault helper 2023-07-01 13:16:24 +02:00
Kconfig.debug mm: page_table_check: Make it dependent on EXCLUSIVE_SYSTEM_RAM 2023-06-14 11:15:29 +02:00
Makefile
backing-dev.c writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs 2023-04-26 14:28:39 +02:00
balloon_compaction.c
bootmem_info.c
cma.c mm/cma: use nth_page() in place of direct struct page manipulation 2023-11-28 17:07:14 +00:00
cma.h
cma_debug.c
cma_sysfs.c
compaction.c Revert "mm/compaction: fix set skip in fast_find_migrateblock" 2023-02-01 08:34:49 +01:00
debug.c
debug_page_ref.c
debug_vm_pgtable.c
dmapool.c
early_ioremap.c
fadvise.c
failslab.c mm: fix unexpected changes to {failslab|fail_page_alloc}.attr 2022-11-22 18:50:44 -08:00
filemap.c mm: merge folio_has_private()/filemap_release_folio() call pairs 2024-01-10 17:10:31 +01:00
folio-compat.c
frontswap.c
gup.c mm: always expand the stack with the mmap write lock held 2023-07-01 13:16:25 +02:00
gup_test.c
gup_test.h
highmem.c
hmm.c
huge_memory.c mm: merge folio_has_private()/filemap_release_folio() call pairs 2024-01-10 17:10:31 +01:00
hugetlb.c hugetlb: fix null-ptr-deref in hugetlb_vma_lock_write 2023-12-13 18:39:20 +01:00
hugetlb_cgroup.c
hugetlb_vmemmap.c mm: hugetlb_vmemmap: fix a race between vmemmap pmd split 2023-09-19 12:27:56 +02:00
hugetlb_vmemmap.h
hwpoison-inject.c
init-mm.c
internal.h mm, netfs, fscache: stop read optimisation when folio removed from pagecache 2024-01-10 17:10:31 +01:00
interval_tree.c
io-mapping.c
ioremap.c
khugepaged.c mm: merge folio_has_private()/filemap_release_folio() call pairs 2024-01-10 17:10:31 +01:00
kmemleak.c
ksm.c mm/ksm: fix race with VMA iteration and mm_struct teardown 2023-03-30 12:49:29 +02:00
list_lru.c
maccess.c mm: Fix copy_from_user_nofault(). 2023-06-28 11:12:17 +02:00
madvise.c madvise:madvise_free_pte_range(): don't use mapcount() against large folio for sharing check 2023-08-30 16:11:11 +02:00
mapping_dirty_helpers.c
memblock.c Revert "mm: Always release pages to the buddy allocator in memblock_free_late()." 2023-02-22 12:59:50 +01:00
memcontrol.c mm: kmem: drop __GFP_NOFAIL when allocating objcg vectors 2023-11-28 17:07:20 +00:00
memfd.c memfd: check for non-NULL file_seals in memfd_create() syscall 2023-06-28 11:12:27 +02:00
memory-failure.c mm: merge folio_has_private()/filemap_release_folio() call pairs 2024-01-10 17:10:31 +01:00
memory-tiers.c memory tier: release the new_memtier in find_create_memory_tier() 2023-03-10 09:34:27 +01:00
memory.c mm: fix unmap_mapping_range high bits shift bug 2024-01-10 17:10:35 +01:00
memory_hotplug.c mm/memory_hotplug: fix error handling in add_memory_resource() 2024-01-10 17:10:33 +01:00
mempolicy.c mm/mempolicy: fix set_mempolicy_home_node() previous VMA pointer 2023-11-08 14:11:02 +01:00
mempool.c
memremap.c
memtest.c
migrate.c mm: merge folio_has_private()/filemap_release_folio() call pairs 2024-01-10 17:10:31 +01:00
migrate_device.c mm/migrate_device: return number of migrating pages in args->cpages 2022-11-22 18:50:43 -08:00
mincore.c mm: teach mincore_hugetlb about pte markers 2023-03-22 13:34:03 +01:00
mlock.c
mm_init.c
mm_slot.h
mmap.c mmap: fix error paths with dup_anon_vma() 2023-11-08 14:11:03 +01:00
mmap_lock.c
mmu_gather.c mm/khugepaged: fix GUP-fast interaction by sending IPI 2022-11-30 14:49:42 -08:00
mmu_notifier.c
mmzone.c
mprotect.c
mremap.c mm, mremap: fix mremap() expanding for vma's with vm_ops->close() 2023-02-09 11:28:22 +01:00
msync.c
nommu.c xtensa: fix lock_mm_and_find_vma in case VMA not found 2023-07-05 18:27:37 +01:00
oom_kill.c
page-writeback.c filemap: add a per-mapping stable writes flag 2024-01-10 17:10:32 +01:00
page_alloc.c mm: page_alloc: unreserve highatomic page blocks before oom 2024-01-31 16:17:03 -08:00
page_counter.c
page_ext.c mm/page_exit: fix kernel doc warning in page_ext_put() 2022-11-22 18:50:41 -08:00
page_idle.c
page_io.c use less confusing names for iov_iter direction initializers 2023-02-09 11:28:04 +01:00
page_isolation.c
page_owner.c
page_poison.c
page_reporting.c
page_reporting.h
page_table_check.c mm: page_table_check: Ensure user pages are not slab pages 2023-06-14 11:15:29 +02:00
page_vma_mapped.c
pagewalk.c
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c
pgalloc-track.h
pgtable-generic.c
process_vm_access.c use less confusing names for iov_iter direction initializers 2023-02-09 11:28:04 +01:00
ptdump.c
readahead.c vfs: fix readahead(2) on block devices 2023-11-20 11:51:50 +01:00
rmap.c mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON 2023-03-10 09:34:25 +01:00
rodata_test.c
secretmem.c
shmem.c mm/shmem: fix race in shmem_undo_range w/THP 2023-12-20 17:00:26 +01:00
shrinker_debug.c mm: shrinkers: fix deadlock in shrinker debugfs 2023-02-22 12:59:46 +01:00
shuffle.c
shuffle.h
slab.c mm/slab: Fix undefined init_cache_node_node() for NUMA and !SMP 2023-03-30 12:49:23 +02:00
slab.h
slab_common.c mm/slab_common: fix slab_caches list corruption after kmem_cache_destroy() 2023-10-06 14:57:03 +02:00
slob.c
slub.c
sparse-vmemmap.c
sparse.c mm/sparsemem: fix race in accessing memory_section->usage 2024-01-31 16:17:02 -08:00
swap.c
swap.h
swap_cgroup.c
swap_slots.c
swap_state.c
swapfile.c mm/swap: fix swap_info_struct race between swapoff and get_swap_pages() 2023-04-13 16:55:36 +02:00
truncate.c mm: merge folio_has_private()/filemap_release_folio() call pairs 2024-01-10 17:10:31 +01:00
usercopy.c mm: Fix copy_from_user_nofault(). 2023-06-28 11:12:17 +02:00
userfaultfd.c
util.c rcu: dump vmalloc memory info safely 2023-09-13 09:42:59 +02:00
vmalloc.c mm/vmalloc: add a safer version of find_vm_area() for debug 2023-09-13 09:43:00 +02:00
vmpressure.c net-memcg: Fix scope of sockmem pressure indicators 2023-09-13 09:42:33 +02:00
vmscan.c mm: merge folio_has_private()/filemap_release_folio() call pairs 2024-01-10 17:10:31 +01:00
vmstat.c
workingset.c mm/mglru: fix underprotected page cache 2023-12-20 17:00:26 +01:00
z3fold.c
zbud.c
zpool.c
zsmalloc.c zsmalloc: allow only one active pool compaction context 2023-08-23 17:52:40 +02:00
zswap.c zswap: do not shrink if cgroup may not zswap 2023-06-21 16:00:54 +02:00