acrn-kernel/kernel/stackleak.c

178 lines
4.7 KiB
C
Raw Normal View History

x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls The STACKLEAK feature (initially developed by PaX Team) has the following benefits: 1. Reduces the information that can be revealed through kernel stack leak bugs. The idea of erasing the thread stack at the end of syscalls is similar to CONFIG_PAGE_POISONING and memzero_explicit() in kernel crypto, which all comply with FDP_RIP.2 (Full Residual Information Protection) of the Common Criteria standard. 2. Blocks some uninitialized stack variable attacks (e.g. CVE-2017-17712, CVE-2010-2963). That kind of bugs should be killed by improving C compilers in future, which might take a long time. This commit introduces the code filling the used part of the kernel stack with a poison value before returning to userspace. Full STACKLEAK feature also contains the gcc plugin which comes in a separate commit. The STACKLEAK feature is ported from grsecurity/PaX. More information at: https://grsecurity.net/ https://pax.grsecurity.net/ This code is modified from Brad Spengler/PaX Team's code in the last public patch of grsecurity/PaX based on our understanding of the code. Changes or omissions from the original code are ours and don't reflect the original grsecurity/PaX code. Performance impact: Hardware: Intel Core i7-4770, 16 GB RAM Test #1: building the Linux kernel on a single core 0.91% slowdown Test #2: hackbench -s 4096 -l 2000 -g 15 -f 25 -P 4.2% slowdown So the STACKLEAK description in Kconfig includes: "The tradeoff is the performance impact: on a single CPU system kernel compilation sees a 1% slowdown, other systems and workloads may vary and you are advised to test this feature on your expected workload before deploying it". Signed-off-by: Alexander Popov <alex.popov@linux.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-08-17 06:16:58 +08:00
// SPDX-License-Identifier: GPL-2.0
/*
* This code fills the used part of the kernel stack with a poison value
* before returning to userspace. It's part of the STACKLEAK feature
* ported from grsecurity/PaX.
*
* Author: Alexander Popov <alex.popov@linux.com>
*
* STACKLEAK reduces the information which kernel stack leak bugs can
* reveal and blocks some uninitialized stack variable attacks.
*/
#include <linux/stackleak.h>
#include <linux/kprobes.h>
x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls The STACKLEAK feature (initially developed by PaX Team) has the following benefits: 1. Reduces the information that can be revealed through kernel stack leak bugs. The idea of erasing the thread stack at the end of syscalls is similar to CONFIG_PAGE_POISONING and memzero_explicit() in kernel crypto, which all comply with FDP_RIP.2 (Full Residual Information Protection) of the Common Criteria standard. 2. Blocks some uninitialized stack variable attacks (e.g. CVE-2017-17712, CVE-2010-2963). That kind of bugs should be killed by improving C compilers in future, which might take a long time. This commit introduces the code filling the used part of the kernel stack with a poison value before returning to userspace. Full STACKLEAK feature also contains the gcc plugin which comes in a separate commit. The STACKLEAK feature is ported from grsecurity/PaX. More information at: https://grsecurity.net/ https://pax.grsecurity.net/ This code is modified from Brad Spengler/PaX Team's code in the last public patch of grsecurity/PaX based on our understanding of the code. Changes or omissions from the original code are ours and don't reflect the original grsecurity/PaX code. Performance impact: Hardware: Intel Core i7-4770, 16 GB RAM Test #1: building the Linux kernel on a single core 0.91% slowdown Test #2: hackbench -s 4096 -l 2000 -g 15 -f 25 -P 4.2% slowdown So the STACKLEAK description in Kconfig includes: "The tradeoff is the performance impact: on a single CPU system kernel compilation sees a 1% slowdown, other systems and workloads may vary and you are advised to test this feature on your expected workload before deploying it". Signed-off-by: Alexander Popov <alex.popov@linux.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-08-17 06:16:58 +08:00
#ifdef CONFIG_STACKLEAK_RUNTIME_DISABLE
#include <linux/jump_label.h>
#include <linux/sysctl.h>
stackleak: move stack_erasing sysctl to stackleak.c kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move the stack_erasing sysctl from kernel/sysctl.c to kernel/stackleak.c and use register_sysctl() to register the sysctl interface. [mcgrof@kernel.org: commit log update] Link: https://lkml.kernel.org/r/20211124231435.1445213-8-mcgrof@kernel.org Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Antti Palosaari <crope@iki.fi> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: David Airlie <airlied@linux.ie> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: James E.J. Bottomley <jejb@linux.ibm.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Joel Becker <jlbec@evilplan.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Julia Lawall <julia.lawall@inria.fr> Cc: Kees Cook <keescook@chromium.org> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Mark Fasheh <mark@fasheh.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Phillip Potter <phil@philpotter.co.uk> Cc: Qing Wang <wangqing@vivo.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Sebastian Reichel <sre@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Stephen Kitt <steve@sk2.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 14:12:43 +08:00
#include <linux/init.h>
static DEFINE_STATIC_KEY_FALSE(stack_erasing_bypass);
stackleak: move stack_erasing sysctl to stackleak.c kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move the stack_erasing sysctl from kernel/sysctl.c to kernel/stackleak.c and use register_sysctl() to register the sysctl interface. [mcgrof@kernel.org: commit log update] Link: https://lkml.kernel.org/r/20211124231435.1445213-8-mcgrof@kernel.org Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Antti Palosaari <crope@iki.fi> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: David Airlie <airlied@linux.ie> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: James E.J. Bottomley <jejb@linux.ibm.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Joel Becker <jlbec@evilplan.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Julia Lawall <julia.lawall@inria.fr> Cc: Kees Cook <keescook@chromium.org> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Mark Fasheh <mark@fasheh.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Phillip Potter <phil@philpotter.co.uk> Cc: Qing Wang <wangqing@vivo.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Sebastian Reichel <sre@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Stephen Kitt <steve@sk2.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 14:12:43 +08:00
#ifdef CONFIG_SYSCTL
static int stack_erasing_sysctl(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
int ret = 0;
int state = !static_branch_unlikely(&stack_erasing_bypass);
int prev_state = state;
table->data = &state;
table->maxlen = sizeof(int);
ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
state = !!state;
if (ret || !write || state == prev_state)
return ret;
if (state)
static_branch_disable(&stack_erasing_bypass);
else
static_branch_enable(&stack_erasing_bypass);
pr_warn("stackleak: kernel stack erasing is %s\n",
state ? "enabled" : "disabled");
return ret;
}
stackleak: move stack_erasing sysctl to stackleak.c kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move the stack_erasing sysctl from kernel/sysctl.c to kernel/stackleak.c and use register_sysctl() to register the sysctl interface. [mcgrof@kernel.org: commit log update] Link: https://lkml.kernel.org/r/20211124231435.1445213-8-mcgrof@kernel.org Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Antti Palosaari <crope@iki.fi> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: David Airlie <airlied@linux.ie> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: James E.J. Bottomley <jejb@linux.ibm.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Joel Becker <jlbec@evilplan.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Julia Lawall <julia.lawall@inria.fr> Cc: Kees Cook <keescook@chromium.org> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Mark Fasheh <mark@fasheh.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Phillip Potter <phil@philpotter.co.uk> Cc: Qing Wang <wangqing@vivo.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Sebastian Reichel <sre@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Stephen Kitt <steve@sk2.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 14:12:43 +08:00
static struct ctl_table stackleak_sysctls[] = {
{
.procname = "stack_erasing",
.data = NULL,
.maxlen = sizeof(int),
.mode = 0600,
.proc_handler = stack_erasing_sysctl,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
},
{}
};
static int __init stackleak_sysctls_init(void)
{
register_sysctl_init("kernel", stackleak_sysctls);
return 0;
}
late_initcall(stackleak_sysctls_init);
#endif /* CONFIG_SYSCTL */
#define skip_erasing() static_branch_unlikely(&stack_erasing_bypass)
#else
#define skip_erasing() false
#endif /* CONFIG_STACKLEAK_RUNTIME_DISABLE */
#ifndef __stackleak_poison
static __always_inline void __stackleak_poison(unsigned long erase_low,
unsigned long erase_high,
unsigned long poison)
{
while (erase_low < erase_high) {
*(unsigned long *)erase_low = poison;
erase_low += sizeof(unsigned long);
}
}
#endif
static __always_inline void __stackleak_erase(bool on_task_stack)
x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls The STACKLEAK feature (initially developed by PaX Team) has the following benefits: 1. Reduces the information that can be revealed through kernel stack leak bugs. The idea of erasing the thread stack at the end of syscalls is similar to CONFIG_PAGE_POISONING and memzero_explicit() in kernel crypto, which all comply with FDP_RIP.2 (Full Residual Information Protection) of the Common Criteria standard. 2. Blocks some uninitialized stack variable attacks (e.g. CVE-2017-17712, CVE-2010-2963). That kind of bugs should be killed by improving C compilers in future, which might take a long time. This commit introduces the code filling the used part of the kernel stack with a poison value before returning to userspace. Full STACKLEAK feature also contains the gcc plugin which comes in a separate commit. The STACKLEAK feature is ported from grsecurity/PaX. More information at: https://grsecurity.net/ https://pax.grsecurity.net/ This code is modified from Brad Spengler/PaX Team's code in the last public patch of grsecurity/PaX based on our understanding of the code. Changes or omissions from the original code are ours and don't reflect the original grsecurity/PaX code. Performance impact: Hardware: Intel Core i7-4770, 16 GB RAM Test #1: building the Linux kernel on a single core 0.91% slowdown Test #2: hackbench -s 4096 -l 2000 -g 15 -f 25 -P 4.2% slowdown So the STACKLEAK description in Kconfig includes: "The tradeoff is the performance impact: on a single CPU system kernel compilation sees a 1% slowdown, other systems and workloads may vary and you are advised to test this feature on your expected workload before deploying it". Signed-off-by: Alexander Popov <alex.popov@linux.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-08-17 06:16:58 +08:00
{
stackleak: rework stack low bound handling In stackleak_task_init(), stackleak_track_stack(), and __stackleak_erase(), we open-code skipping the STACK_END_MAGIC at the bottom of the stack. Each case is implemented slightly differently, and only the __stackleak_erase() case is commented. In stackleak_task_init() and stackleak_track_stack() we unconditionally add sizeof(unsigned long) to the lowest stack address. In stackleak_task_init() we use end_of_stack() for this, and in stackleak_track_stack() we use task_stack_page(). In __stackleak_erase() we handle this by detecting if `kstack_ptr` has hit the stack end boundary, and if so, conditionally moving it above the magic. This patch adds a new stackleak_task_low_bound() helper which is used in all three cases, which unconditionally adds sizeof(unsigned long) to the lowest address on the task stack, with commentary as to why. This uses end_of_stack() as stackleak_task_init() did prior to this patch, as this is consistent with the code in kernel/fork.c which initializes the STACK_END_MAGIC value. In __stackleak_erase() we no longer need to check whether we've spilled into the STACK_END_MAGIC value, as stackleak_track_stack() ensures that `current->lowest_stack` stops immediately above this, and similarly the poison scan will stop immediately above this. For stackleak_task_init() and stackleak_track_stack() this results in no change to code generation. For __stackleak_erase() the generated assembly is slightly simpler and shorter. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220427173128.2603085-5-mark.rutland@arm.com
2022-04-28 01:31:19 +08:00
const unsigned long task_stack_low = stackleak_task_low_bound(current);
stackleak: rework stack high bound handling Prior to returning to userspace, we reset current->lowest_stack to a reasonable high bound. Currently we do this by subtracting the arbitrary value `THREAD_SIZE/64` from the top of the stack, for reasons lost to history. Looking at configurations today: * On i386 where THREAD_SIZE is 8K, the bound will be 128 bytes. The pt_regs at the top of the stack is 68 bytes (with 0 to 16 bytes of padding above), and so this covers an additional portion of 44 to 60 bytes. * On x86_64 where THREAD_SIZE is at least 16K (up to 32K with KASAN) the bound will be at least 256 bytes (up to 512 with KASAN). The pt_regs at the top of the stack is 168 bytes, and so this cover an additional 88 bytes of stack (up to 344 with KASAN). * On arm64 where THREAD_SIZE is at least 16K (up to 64K with 64K pages and VMAP_STACK), the bound will be at least 256 bytes (up to 1024 with KASAN). The pt_regs at the top of the stack is 336 bytes, so this can fall within the pt_regs, or can cover an additional 688 bytes of stack. Clearly the `THREAD_SIZE/64` value doesn't make much sense -- in the worst case, this will cause more than 600 bytes of stack to be erased for every syscall, even if actual stack usage were substantially smaller. This patches makes this slightly less nonsensical by consistently resetting current->lowest_stack to the base of the task pt_regs. For clarity and for consistency with the handling of the low bound, the generation of the high bound is split into a helper with commentary explaining why. Since the pt_regs at the top of the stack will be clobbered upon the next exception entry, we don't need to poison these at exception exit. By using task_pt_regs() as the high stack boundary instead of current_top_of_stack() we avoid some redundant poisoning, and the compiler can share the address generation between the poisoning and resetting of `current->lowest_stack`, making the generated code more optimal. It's not clear to me whether the existing `THREAD_SIZE/64` offset was a dodgy heuristic to skip the pt_regs, or whether it was attempting to minimize the number of times stackleak_check_stack() would have to update `current->lowest_stack` when stack usage was shallow at the cost of unconditionally poisoning a small portion of the stack for every exit to userspace. For now I've simply removed the offset, and if we need/want to minimize updates for shallow stack usage it should be easy to add a better heuristic atop, with appropriate commentary so we know what's going on. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220427173128.2603085-7-mark.rutland@arm.com
2022-04-28 01:31:21 +08:00
const unsigned long task_stack_high = stackleak_task_high_bound(current);
stackleak: rework poison scanning Currently we over-estimate the region of stack which must be erased. To determine the region to be erased, we scan downwards for a contiguous block of poison values (or the low bound of the stack). There are a few minor problems with this today: * When we find a block of poison values, we include this block within the region to erase. As this is included within the region to erase, this causes us to redundantly overwrite 'STACKLEAK_SEARCH_DEPTH' (128) bytes with poison. * As the loop condition checks 'poison_count <= depth', it will run an additional iteration after finding the contiguous block of poison, decrementing 'erase_low' once more than necessary. As this is included within the region to erase, this causes us to redundantly overwrite an additional unsigned long with poison. * As we always decrement 'erase_low' after checking an element on the stack, we always include the element below this within the region to erase. As this is included within the region to erase, this causes us to redundantly overwrite an additional unsigned long with poison. Note that this is not a functional problem. As the loop condition checks 'erase_low > task_stack_low', we'll never clobber the STACK_END_MAGIC. As we always decrement 'erase_low' after this, we'll never fail to erase the element immediately above the STACK_END_MAGIC. In total, this can cause us to erase `128 + 2 * sizeof(unsigned long)` bytes more than necessary, which is unfortunate. This patch reworks the logic to find the address immediately above the poisoned region, by finding the lowest non-poisoned address. This is factored into a stackleak_find_top_of_poison() helper both for clarity and so that this can be shared with the LKDTM test in subsequent patches. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220427173128.2603085-8-mark.rutland@arm.com
2022-04-28 01:31:22 +08:00
unsigned long erase_low, erase_high;
x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls The STACKLEAK feature (initially developed by PaX Team) has the following benefits: 1. Reduces the information that can be revealed through kernel stack leak bugs. The idea of erasing the thread stack at the end of syscalls is similar to CONFIG_PAGE_POISONING and memzero_explicit() in kernel crypto, which all comply with FDP_RIP.2 (Full Residual Information Protection) of the Common Criteria standard. 2. Blocks some uninitialized stack variable attacks (e.g. CVE-2017-17712, CVE-2010-2963). That kind of bugs should be killed by improving C compilers in future, which might take a long time. This commit introduces the code filling the used part of the kernel stack with a poison value before returning to userspace. Full STACKLEAK feature also contains the gcc plugin which comes in a separate commit. The STACKLEAK feature is ported from grsecurity/PaX. More information at: https://grsecurity.net/ https://pax.grsecurity.net/ This code is modified from Brad Spengler/PaX Team's code in the last public patch of grsecurity/PaX based on our understanding of the code. Changes or omissions from the original code are ours and don't reflect the original grsecurity/PaX code. Performance impact: Hardware: Intel Core i7-4770, 16 GB RAM Test #1: building the Linux kernel on a single core 0.91% slowdown Test #2: hackbench -s 4096 -l 2000 -g 15 -f 25 -P 4.2% slowdown So the STACKLEAK description in Kconfig includes: "The tradeoff is the performance impact: on a single CPU system kernel compilation sees a 1% slowdown, other systems and workloads may vary and you are advised to test this feature on your expected workload before deploying it". Signed-off-by: Alexander Popov <alex.popov@linux.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-08-17 06:16:58 +08:00
stackleak: rework poison scanning Currently we over-estimate the region of stack which must be erased. To determine the region to be erased, we scan downwards for a contiguous block of poison values (or the low bound of the stack). There are a few minor problems with this today: * When we find a block of poison values, we include this block within the region to erase. As this is included within the region to erase, this causes us to redundantly overwrite 'STACKLEAK_SEARCH_DEPTH' (128) bytes with poison. * As the loop condition checks 'poison_count <= depth', it will run an additional iteration after finding the contiguous block of poison, decrementing 'erase_low' once more than necessary. As this is included within the region to erase, this causes us to redundantly overwrite an additional unsigned long with poison. * As we always decrement 'erase_low' after checking an element on the stack, we always include the element below this within the region to erase. As this is included within the region to erase, this causes us to redundantly overwrite an additional unsigned long with poison. Note that this is not a functional problem. As the loop condition checks 'erase_low > task_stack_low', we'll never clobber the STACK_END_MAGIC. As we always decrement 'erase_low' after this, we'll never fail to erase the element immediately above the STACK_END_MAGIC. In total, this can cause us to erase `128 + 2 * sizeof(unsigned long)` bytes more than necessary, which is unfortunate. This patch reworks the logic to find the address immediately above the poisoned region, by finding the lowest non-poisoned address. This is factored into a stackleak_find_top_of_poison() helper both for clarity and so that this can be shared with the LKDTM test in subsequent patches. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220427173128.2603085-8-mark.rutland@arm.com
2022-04-28 01:31:22 +08:00
erase_low = stackleak_find_top_of_poison(task_stack_low,
current->lowest_stack);
x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls The STACKLEAK feature (initially developed by PaX Team) has the following benefits: 1. Reduces the information that can be revealed through kernel stack leak bugs. The idea of erasing the thread stack at the end of syscalls is similar to CONFIG_PAGE_POISONING and memzero_explicit() in kernel crypto, which all comply with FDP_RIP.2 (Full Residual Information Protection) of the Common Criteria standard. 2. Blocks some uninitialized stack variable attacks (e.g. CVE-2017-17712, CVE-2010-2963). That kind of bugs should be killed by improving C compilers in future, which might take a long time. This commit introduces the code filling the used part of the kernel stack with a poison value before returning to userspace. Full STACKLEAK feature also contains the gcc plugin which comes in a separate commit. The STACKLEAK feature is ported from grsecurity/PaX. More information at: https://grsecurity.net/ https://pax.grsecurity.net/ This code is modified from Brad Spengler/PaX Team's code in the last public patch of grsecurity/PaX based on our understanding of the code. Changes or omissions from the original code are ours and don't reflect the original grsecurity/PaX code. Performance impact: Hardware: Intel Core i7-4770, 16 GB RAM Test #1: building the Linux kernel on a single core 0.91% slowdown Test #2: hackbench -s 4096 -l 2000 -g 15 -f 25 -P 4.2% slowdown So the STACKLEAK description in Kconfig includes: "The tradeoff is the performance impact: on a single CPU system kernel compilation sees a 1% slowdown, other systems and workloads may vary and you are advised to test this feature on your expected workload before deploying it". Signed-off-by: Alexander Popov <alex.popov@linux.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-08-17 06:16:58 +08:00
#ifdef CONFIG_STACKLEAK_METRICS
current->prev_lowest_stack = erase_low;
#endif
x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls The STACKLEAK feature (initially developed by PaX Team) has the following benefits: 1. Reduces the information that can be revealed through kernel stack leak bugs. The idea of erasing the thread stack at the end of syscalls is similar to CONFIG_PAGE_POISONING and memzero_explicit() in kernel crypto, which all comply with FDP_RIP.2 (Full Residual Information Protection) of the Common Criteria standard. 2. Blocks some uninitialized stack variable attacks (e.g. CVE-2017-17712, CVE-2010-2963). That kind of bugs should be killed by improving C compilers in future, which might take a long time. This commit introduces the code filling the used part of the kernel stack with a poison value before returning to userspace. Full STACKLEAK feature also contains the gcc plugin which comes in a separate commit. The STACKLEAK feature is ported from grsecurity/PaX. More information at: https://grsecurity.net/ https://pax.grsecurity.net/ This code is modified from Brad Spengler/PaX Team's code in the last public patch of grsecurity/PaX based on our understanding of the code. Changes or omissions from the original code are ours and don't reflect the original grsecurity/PaX code. Performance impact: Hardware: Intel Core i7-4770, 16 GB RAM Test #1: building the Linux kernel on a single core 0.91% slowdown Test #2: hackbench -s 4096 -l 2000 -g 15 -f 25 -P 4.2% slowdown So the STACKLEAK description in Kconfig includes: "The tradeoff is the performance impact: on a single CPU system kernel compilation sees a 1% slowdown, other systems and workloads may vary and you are advised to test this feature on your expected workload before deploying it". Signed-off-by: Alexander Popov <alex.popov@linux.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-08-17 06:16:58 +08:00
/*
stackleak: rework stack high bound handling Prior to returning to userspace, we reset current->lowest_stack to a reasonable high bound. Currently we do this by subtracting the arbitrary value `THREAD_SIZE/64` from the top of the stack, for reasons lost to history. Looking at configurations today: * On i386 where THREAD_SIZE is 8K, the bound will be 128 bytes. The pt_regs at the top of the stack is 68 bytes (with 0 to 16 bytes of padding above), and so this covers an additional portion of 44 to 60 bytes. * On x86_64 where THREAD_SIZE is at least 16K (up to 32K with KASAN) the bound will be at least 256 bytes (up to 512 with KASAN). The pt_regs at the top of the stack is 168 bytes, and so this cover an additional 88 bytes of stack (up to 344 with KASAN). * On arm64 where THREAD_SIZE is at least 16K (up to 64K with 64K pages and VMAP_STACK), the bound will be at least 256 bytes (up to 1024 with KASAN). The pt_regs at the top of the stack is 336 bytes, so this can fall within the pt_regs, or can cover an additional 688 bytes of stack. Clearly the `THREAD_SIZE/64` value doesn't make much sense -- in the worst case, this will cause more than 600 bytes of stack to be erased for every syscall, even if actual stack usage were substantially smaller. This patches makes this slightly less nonsensical by consistently resetting current->lowest_stack to the base of the task pt_regs. For clarity and for consistency with the handling of the low bound, the generation of the high bound is split into a helper with commentary explaining why. Since the pt_regs at the top of the stack will be clobbered upon the next exception entry, we don't need to poison these at exception exit. By using task_pt_regs() as the high stack boundary instead of current_top_of_stack() we avoid some redundant poisoning, and the compiler can share the address generation between the poisoning and resetting of `current->lowest_stack`, making the generated code more optimal. It's not clear to me whether the existing `THREAD_SIZE/64` offset was a dodgy heuristic to skip the pt_regs, or whether it was attempting to minimize the number of times stackleak_check_stack() would have to update `current->lowest_stack` when stack usage was shallow at the cost of unconditionally poisoning a small portion of the stack for every exit to userspace. For now I've simply removed the offset, and if we need/want to minimize updates for shallow stack usage it should be easy to add a better heuristic atop, with appropriate commentary so we know what's going on. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220427173128.2603085-7-mark.rutland@arm.com
2022-04-28 01:31:21 +08:00
* Write poison to the task's stack between 'erase_low' and
* 'erase_high'.
*
* If we're running on a different stack (e.g. an entry trampoline
* stack) we can erase everything below the pt_regs at the top of the
* task stack.
*
* If we're running on the task stack itself, we must not clobber any
* stack used by this function and its caller. We assume that this
* function has a fixed-size stack frame, and the current stack pointer
* doesn't change while we write poison.
x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls The STACKLEAK feature (initially developed by PaX Team) has the following benefits: 1. Reduces the information that can be revealed through kernel stack leak bugs. The idea of erasing the thread stack at the end of syscalls is similar to CONFIG_PAGE_POISONING and memzero_explicit() in kernel crypto, which all comply with FDP_RIP.2 (Full Residual Information Protection) of the Common Criteria standard. 2. Blocks some uninitialized stack variable attacks (e.g. CVE-2017-17712, CVE-2010-2963). That kind of bugs should be killed by improving C compilers in future, which might take a long time. This commit introduces the code filling the used part of the kernel stack with a poison value before returning to userspace. Full STACKLEAK feature also contains the gcc plugin which comes in a separate commit. The STACKLEAK feature is ported from grsecurity/PaX. More information at: https://grsecurity.net/ https://pax.grsecurity.net/ This code is modified from Brad Spengler/PaX Team's code in the last public patch of grsecurity/PaX based on our understanding of the code. Changes or omissions from the original code are ours and don't reflect the original grsecurity/PaX code. Performance impact: Hardware: Intel Core i7-4770, 16 GB RAM Test #1: building the Linux kernel on a single core 0.91% slowdown Test #2: hackbench -s 4096 -l 2000 -g 15 -f 25 -P 4.2% slowdown So the STACKLEAK description in Kconfig includes: "The tradeoff is the performance impact: on a single CPU system kernel compilation sees a 1% slowdown, other systems and workloads may vary and you are advised to test this feature on your expected workload before deploying it". Signed-off-by: Alexander Popov <alex.popov@linux.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-08-17 06:16:58 +08:00
*/
if (on_task_stack)
erase_high = current_stack_pointer;
x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls The STACKLEAK feature (initially developed by PaX Team) has the following benefits: 1. Reduces the information that can be revealed through kernel stack leak bugs. The idea of erasing the thread stack at the end of syscalls is similar to CONFIG_PAGE_POISONING and memzero_explicit() in kernel crypto, which all comply with FDP_RIP.2 (Full Residual Information Protection) of the Common Criteria standard. 2. Blocks some uninitialized stack variable attacks (e.g. CVE-2017-17712, CVE-2010-2963). That kind of bugs should be killed by improving C compilers in future, which might take a long time. This commit introduces the code filling the used part of the kernel stack with a poison value before returning to userspace. Full STACKLEAK feature also contains the gcc plugin which comes in a separate commit. The STACKLEAK feature is ported from grsecurity/PaX. More information at: https://grsecurity.net/ https://pax.grsecurity.net/ This code is modified from Brad Spengler/PaX Team's code in the last public patch of grsecurity/PaX based on our understanding of the code. Changes or omissions from the original code are ours and don't reflect the original grsecurity/PaX code. Performance impact: Hardware: Intel Core i7-4770, 16 GB RAM Test #1: building the Linux kernel on a single core 0.91% slowdown Test #2: hackbench -s 4096 -l 2000 -g 15 -f 25 -P 4.2% slowdown So the STACKLEAK description in Kconfig includes: "The tradeoff is the performance impact: on a single CPU system kernel compilation sees a 1% slowdown, other systems and workloads may vary and you are advised to test this feature on your expected workload before deploying it". Signed-off-by: Alexander Popov <alex.popov@linux.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-08-17 06:16:58 +08:00
else
stackleak: rework stack high bound handling Prior to returning to userspace, we reset current->lowest_stack to a reasonable high bound. Currently we do this by subtracting the arbitrary value `THREAD_SIZE/64` from the top of the stack, for reasons lost to history. Looking at configurations today: * On i386 where THREAD_SIZE is 8K, the bound will be 128 bytes. The pt_regs at the top of the stack is 68 bytes (with 0 to 16 bytes of padding above), and so this covers an additional portion of 44 to 60 bytes. * On x86_64 where THREAD_SIZE is at least 16K (up to 32K with KASAN) the bound will be at least 256 bytes (up to 512 with KASAN). The pt_regs at the top of the stack is 168 bytes, and so this cover an additional 88 bytes of stack (up to 344 with KASAN). * On arm64 where THREAD_SIZE is at least 16K (up to 64K with 64K pages and VMAP_STACK), the bound will be at least 256 bytes (up to 1024 with KASAN). The pt_regs at the top of the stack is 336 bytes, so this can fall within the pt_regs, or can cover an additional 688 bytes of stack. Clearly the `THREAD_SIZE/64` value doesn't make much sense -- in the worst case, this will cause more than 600 bytes of stack to be erased for every syscall, even if actual stack usage were substantially smaller. This patches makes this slightly less nonsensical by consistently resetting current->lowest_stack to the base of the task pt_regs. For clarity and for consistency with the handling of the low bound, the generation of the high bound is split into a helper with commentary explaining why. Since the pt_regs at the top of the stack will be clobbered upon the next exception entry, we don't need to poison these at exception exit. By using task_pt_regs() as the high stack boundary instead of current_top_of_stack() we avoid some redundant poisoning, and the compiler can share the address generation between the poisoning and resetting of `current->lowest_stack`, making the generated code more optimal. It's not clear to me whether the existing `THREAD_SIZE/64` offset was a dodgy heuristic to skip the pt_regs, or whether it was attempting to minimize the number of times stackleak_check_stack() would have to update `current->lowest_stack` when stack usage was shallow at the cost of unconditionally poisoning a small portion of the stack for every exit to userspace. For now I've simply removed the offset, and if we need/want to minimize updates for shallow stack usage it should be easy to add a better heuristic atop, with appropriate commentary so we know what's going on. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220427173128.2603085-7-mark.rutland@arm.com
2022-04-28 01:31:21 +08:00
erase_high = task_stack_high;
x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls The STACKLEAK feature (initially developed by PaX Team) has the following benefits: 1. Reduces the information that can be revealed through kernel stack leak bugs. The idea of erasing the thread stack at the end of syscalls is similar to CONFIG_PAGE_POISONING and memzero_explicit() in kernel crypto, which all comply with FDP_RIP.2 (Full Residual Information Protection) of the Common Criteria standard. 2. Blocks some uninitialized stack variable attacks (e.g. CVE-2017-17712, CVE-2010-2963). That kind of bugs should be killed by improving C compilers in future, which might take a long time. This commit introduces the code filling the used part of the kernel stack with a poison value before returning to userspace. Full STACKLEAK feature also contains the gcc plugin which comes in a separate commit. The STACKLEAK feature is ported from grsecurity/PaX. More information at: https://grsecurity.net/ https://pax.grsecurity.net/ This code is modified from Brad Spengler/PaX Team's code in the last public patch of grsecurity/PaX based on our understanding of the code. Changes or omissions from the original code are ours and don't reflect the original grsecurity/PaX code. Performance impact: Hardware: Intel Core i7-4770, 16 GB RAM Test #1: building the Linux kernel on a single core 0.91% slowdown Test #2: hackbench -s 4096 -l 2000 -g 15 -f 25 -P 4.2% slowdown So the STACKLEAK description in Kconfig includes: "The tradeoff is the performance impact: on a single CPU system kernel compilation sees a 1% slowdown, other systems and workloads may vary and you are advised to test this feature on your expected workload before deploying it". Signed-off-by: Alexander Popov <alex.popov@linux.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-08-17 06:16:58 +08:00
__stackleak_poison(erase_low, erase_high, STACKLEAK_POISON);
x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls The STACKLEAK feature (initially developed by PaX Team) has the following benefits: 1. Reduces the information that can be revealed through kernel stack leak bugs. The idea of erasing the thread stack at the end of syscalls is similar to CONFIG_PAGE_POISONING and memzero_explicit() in kernel crypto, which all comply with FDP_RIP.2 (Full Residual Information Protection) of the Common Criteria standard. 2. Blocks some uninitialized stack variable attacks (e.g. CVE-2017-17712, CVE-2010-2963). That kind of bugs should be killed by improving C compilers in future, which might take a long time. This commit introduces the code filling the used part of the kernel stack with a poison value before returning to userspace. Full STACKLEAK feature also contains the gcc plugin which comes in a separate commit. The STACKLEAK feature is ported from grsecurity/PaX. More information at: https://grsecurity.net/ https://pax.grsecurity.net/ This code is modified from Brad Spengler/PaX Team's code in the last public patch of grsecurity/PaX based on our understanding of the code. Changes or omissions from the original code are ours and don't reflect the original grsecurity/PaX code. Performance impact: Hardware: Intel Core i7-4770, 16 GB RAM Test #1: building the Linux kernel on a single core 0.91% slowdown Test #2: hackbench -s 4096 -l 2000 -g 15 -f 25 -P 4.2% slowdown So the STACKLEAK description in Kconfig includes: "The tradeoff is the performance impact: on a single CPU system kernel compilation sees a 1% slowdown, other systems and workloads may vary and you are advised to test this feature on your expected workload before deploying it". Signed-off-by: Alexander Popov <alex.popov@linux.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-08-17 06:16:58 +08:00
/* Reset the 'lowest_stack' value for the next syscall */
stackleak: rework stack high bound handling Prior to returning to userspace, we reset current->lowest_stack to a reasonable high bound. Currently we do this by subtracting the arbitrary value `THREAD_SIZE/64` from the top of the stack, for reasons lost to history. Looking at configurations today: * On i386 where THREAD_SIZE is 8K, the bound will be 128 bytes. The pt_regs at the top of the stack is 68 bytes (with 0 to 16 bytes of padding above), and so this covers an additional portion of 44 to 60 bytes. * On x86_64 where THREAD_SIZE is at least 16K (up to 32K with KASAN) the bound will be at least 256 bytes (up to 512 with KASAN). The pt_regs at the top of the stack is 168 bytes, and so this cover an additional 88 bytes of stack (up to 344 with KASAN). * On arm64 where THREAD_SIZE is at least 16K (up to 64K with 64K pages and VMAP_STACK), the bound will be at least 256 bytes (up to 1024 with KASAN). The pt_regs at the top of the stack is 336 bytes, so this can fall within the pt_regs, or can cover an additional 688 bytes of stack. Clearly the `THREAD_SIZE/64` value doesn't make much sense -- in the worst case, this will cause more than 600 bytes of stack to be erased for every syscall, even if actual stack usage were substantially smaller. This patches makes this slightly less nonsensical by consistently resetting current->lowest_stack to the base of the task pt_regs. For clarity and for consistency with the handling of the low bound, the generation of the high bound is split into a helper with commentary explaining why. Since the pt_regs at the top of the stack will be clobbered upon the next exception entry, we don't need to poison these at exception exit. By using task_pt_regs() as the high stack boundary instead of current_top_of_stack() we avoid some redundant poisoning, and the compiler can share the address generation between the poisoning and resetting of `current->lowest_stack`, making the generated code more optimal. It's not clear to me whether the existing `THREAD_SIZE/64` offset was a dodgy heuristic to skip the pt_regs, or whether it was attempting to minimize the number of times stackleak_check_stack() would have to update `current->lowest_stack` when stack usage was shallow at the cost of unconditionally poisoning a small portion of the stack for every exit to userspace. For now I've simply removed the offset, and if we need/want to minimize updates for shallow stack usage it should be easy to add a better heuristic atop, with appropriate commentary so we know what's going on. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220427173128.2603085-7-mark.rutland@arm.com
2022-04-28 01:31:21 +08:00
current->lowest_stack = task_stack_high;
x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls The STACKLEAK feature (initially developed by PaX Team) has the following benefits: 1. Reduces the information that can be revealed through kernel stack leak bugs. The idea of erasing the thread stack at the end of syscalls is similar to CONFIG_PAGE_POISONING and memzero_explicit() in kernel crypto, which all comply with FDP_RIP.2 (Full Residual Information Protection) of the Common Criteria standard. 2. Blocks some uninitialized stack variable attacks (e.g. CVE-2017-17712, CVE-2010-2963). That kind of bugs should be killed by improving C compilers in future, which might take a long time. This commit introduces the code filling the used part of the kernel stack with a poison value before returning to userspace. Full STACKLEAK feature also contains the gcc plugin which comes in a separate commit. The STACKLEAK feature is ported from grsecurity/PaX. More information at: https://grsecurity.net/ https://pax.grsecurity.net/ This code is modified from Brad Spengler/PaX Team's code in the last public patch of grsecurity/PaX based on our understanding of the code. Changes or omissions from the original code are ours and don't reflect the original grsecurity/PaX code. Performance impact: Hardware: Intel Core i7-4770, 16 GB RAM Test #1: building the Linux kernel on a single core 0.91% slowdown Test #2: hackbench -s 4096 -l 2000 -g 15 -f 25 -P 4.2% slowdown So the STACKLEAK description in Kconfig includes: "The tradeoff is the performance impact: on a single CPU system kernel compilation sees a 1% slowdown, other systems and workloads may vary and you are advised to test this feature on your expected workload before deploying it". Signed-off-by: Alexander Popov <alex.popov@linux.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-08-17 06:16:58 +08:00
}
/*
* Erase and poison the portion of the task stack used since the last erase.
* Can be called from the task stack or an entry stack when the task stack is
* no longer in use.
*/
stackleak: move skip_erasing() check earlier In stackleak_erase() we check skip_erasing() after accessing some fields from current. As generating the address of current uses asm which hazards with the static branch asm, this work is always performed, even when the static branch is patched to jump to the return at the end of the function. This patch avoids this redundant work by moving the skip_erasing() check earlier. To avoid complicating initialization within stackleak_erase(), the body of the function is split out into a __stackleak_erase() helper, with the check left in a wrapper function. The __stackleak_erase() helper is marked __always_inline to ensure that this is inlined into stackleak_erase() and not instrumented. Before this patch, on x86-64 w/ GCC 11.1.0 the start of the function is: <stackleak_erase>: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax 00 00 48 8b 48 20 mov 0x20(%rax),%rcx 48 8b 80 98 0a 00 00 mov 0xa98(%rax),%rax 66 90 xchg %ax,%ax <------------ static branch 48 89 c2 mov %rax,%rdx 48 29 ca sub %rcx,%rdx 48 81 fa ff 3f 00 00 cmp $0x3fff,%rdx After this patch, on x86-64 w/ GCC 11.1.0 the start of the function is: <stackleak_erase>: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) <--- static branch 65 48 8b 04 25 00 00 mov %gs:0x0,%rax 00 00 48 8b 48 20 mov 0x20(%rax),%rcx 48 8b 80 98 0a 00 00 mov 0xa98(%rax),%rax 48 89 c2 mov %rax,%rdx 48 29 ca sub %rcx,%rdx 48 81 fa ff 3f 00 00 cmp $0x3fff,%rdx Before this patch, on arm64 w/ GCC 11.1.0 the start of the function is: <stackleak_erase>: d503245f bti c d5384100 mrs x0, sp_el0 f9401003 ldr x3, [x0, #32] f9451000 ldr x0, [x0, #2592] d503201f nop <------------------------------- static branch d503233f paciasp cb030002 sub x2, x0, x3 d287ffe1 mov x1, #0x3fff eb01005f cmp x2, x1 After this patch, on arm64 w/ GCC 11.1.0 the start of the function is: <stackleak_erase>: d503245f bti c d503201f nop <------------------------------- static branch d503233f paciasp d5384100 mrs x0, sp_el0 f9401003 ldr x3, [x0, #32] d287ffe1 mov x1, #0x3fff f9451000 ldr x0, [x0, #2592] cb030002 sub x2, x0, x3 eb01005f cmp x2, x1 While this may not be a huge win on its own, moving the static branch will permit further optimization of the body of the function in subsequent patches. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220427173128.2603085-3-mark.rutland@arm.com
2022-04-28 01:31:17 +08:00
asmlinkage void noinstr stackleak_erase(void)
{
if (skip_erasing())
return;
__stackleak_erase(on_thread_stack());
}
/*
* Erase and poison the portion of the task stack used since the last erase.
* Can only be called from the task stack.
*/
asmlinkage void noinstr stackleak_erase_on_task_stack(void)
{
if (skip_erasing())
return;
__stackleak_erase(true);
}
/*
* Erase and poison the portion of the task stack used since the last erase.
* Can only be called from a stack other than the task stack.
*/
asmlinkage void noinstr stackleak_erase_off_task_stack(void)
{
if (skip_erasing())
return;
__stackleak_erase(false);
stackleak: move skip_erasing() check earlier In stackleak_erase() we check skip_erasing() after accessing some fields from current. As generating the address of current uses asm which hazards with the static branch asm, this work is always performed, even when the static branch is patched to jump to the return at the end of the function. This patch avoids this redundant work by moving the skip_erasing() check earlier. To avoid complicating initialization within stackleak_erase(), the body of the function is split out into a __stackleak_erase() helper, with the check left in a wrapper function. The __stackleak_erase() helper is marked __always_inline to ensure that this is inlined into stackleak_erase() and not instrumented. Before this patch, on x86-64 w/ GCC 11.1.0 the start of the function is: <stackleak_erase>: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax 00 00 48 8b 48 20 mov 0x20(%rax),%rcx 48 8b 80 98 0a 00 00 mov 0xa98(%rax),%rax 66 90 xchg %ax,%ax <------------ static branch 48 89 c2 mov %rax,%rdx 48 29 ca sub %rcx,%rdx 48 81 fa ff 3f 00 00 cmp $0x3fff,%rdx After this patch, on x86-64 w/ GCC 11.1.0 the start of the function is: <stackleak_erase>: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) <--- static branch 65 48 8b 04 25 00 00 mov %gs:0x0,%rax 00 00 48 8b 48 20 mov 0x20(%rax),%rcx 48 8b 80 98 0a 00 00 mov 0xa98(%rax),%rax 48 89 c2 mov %rax,%rdx 48 29 ca sub %rcx,%rdx 48 81 fa ff 3f 00 00 cmp $0x3fff,%rdx Before this patch, on arm64 w/ GCC 11.1.0 the start of the function is: <stackleak_erase>: d503245f bti c d5384100 mrs x0, sp_el0 f9401003 ldr x3, [x0, #32] f9451000 ldr x0, [x0, #2592] d503201f nop <------------------------------- static branch d503233f paciasp cb030002 sub x2, x0, x3 d287ffe1 mov x1, #0x3fff eb01005f cmp x2, x1 After this patch, on arm64 w/ GCC 11.1.0 the start of the function is: <stackleak_erase>: d503245f bti c d503201f nop <------------------------------- static branch d503233f paciasp d5384100 mrs x0, sp_el0 f9401003 ldr x3, [x0, #32] d287ffe1 mov x1, #0x3fff f9451000 ldr x0, [x0, #2592] cb030002 sub x2, x0, x3 eb01005f cmp x2, x1 While this may not be a huge win on its own, moving the static branch will permit further optimization of the body of the function in subsequent patches. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220427173128.2603085-3-mark.rutland@arm.com
2022-04-28 01:31:17 +08:00
}
gcc-plugins/stackleak: Use noinstr in favor of notrace While the stackleak plugin was already using notrace, objtool is now a bit more picky. Update the notrace uses to noinstr. Silences the following objtool warnings when building with: CONFIG_DEBUG_ENTRY=y CONFIG_STACK_VALIDATION=y CONFIG_VMLINUX_VALIDATION=y CONFIG_GCC_PLUGIN_STACKLEAK=y vmlinux.o: warning: objtool: do_syscall_64()+0x9: call to stackleak_track_stack() leaves .noinstr.text section vmlinux.o: warning: objtool: do_int80_syscall_32()+0x9: call to stackleak_track_stack() leaves .noinstr.text section vmlinux.o: warning: objtool: exc_general_protection()+0x22: call to stackleak_track_stack() leaves .noinstr.text section vmlinux.o: warning: objtool: fixup_bad_iret()+0x20: call to stackleak_track_stack() leaves .noinstr.text section vmlinux.o: warning: objtool: do_machine_check()+0x27: call to stackleak_track_stack() leaves .noinstr.text section vmlinux.o: warning: objtool: .text+0x5346e: call to stackleak_erase() leaves .noinstr.text section vmlinux.o: warning: objtool: .entry.text+0x143: call to stackleak_erase() leaves .noinstr.text section vmlinux.o: warning: objtool: .entry.text+0x10eb: call to stackleak_erase() leaves .noinstr.text section vmlinux.o: warning: objtool: .entry.text+0x17f9: call to stackleak_erase() leaves .noinstr.text section Note that the plugin's addition of calls to stackleak_track_stack() from noinstr functions is expected to be safe, as it isn't runtime instrumentation and is self-contained. Cc: Alexander Popov <alex.popov@linux.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-04 04:17:54 +08:00
void __used __no_caller_saved_registers noinstr stackleak_track_stack(void)
{
gcc-plugins/stackleak: Use asm instrumentation to avoid useless register saving The kernel code instrumentation in stackleak gcc plugin works in two stages. At first, stack tracking is added to GIMPLE representation of every function (except some special cases). And later, when stack frame size info is available, stack tracking is removed from the RTL representation of the functions with small stack frame. There is an unwanted side-effect for these functions: some of them do useless work with caller-saved registers. As an example of such case, proc_sys_write without() instrumentation: 55 push %rbp 41 b8 01 00 00 00 mov $0x1,%r8d 48 89 e5 mov %rsp,%rbp e8 11 ff ff ff callq ffffffff81284610 <proc_sys_call_handler> 5d pop %rbp c3 retq 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1) 00 00 00 proc_sys_write() with instrumentation: 55 push %rbp 48 89 e5 mov %rsp,%rbp 41 56 push %r14 41 55 push %r13 41 54 push %r12 53 push %rbx 49 89 f4 mov %rsi,%r12 48 89 fb mov %rdi,%rbx 49 89 d5 mov %rdx,%r13 49 89 ce mov %rcx,%r14 4c 89 f1 mov %r14,%rcx 4c 89 ea mov %r13,%rdx 4c 89 e6 mov %r12,%rsi 48 89 df mov %rbx,%rdi 41 b8 01 00 00 00 mov $0x1,%r8d e8 f2 fe ff ff callq ffffffff81298e80 <proc_sys_call_handler> 5b pop %rbx 41 5c pop %r12 41 5d pop %r13 41 5e pop %r14 5d pop %rbp c3 retq 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1) 00 00 Let's improve the instrumentation to avoid this: 1. Make stackleak_track_stack() save all register that it works with. Use no_caller_saved_registers attribute for that function. This attribute is available for x86_64 and i386 starting from gcc-7. 2. Insert calling stackleak_track_stack() in asm: asm volatile("call stackleak_track_stack" :: "r" (current_stack_pointer)) Here we use ASM_CALL_CONSTRAINT trick from arch/x86/include/asm/asm.h. The input constraint is taken into account during gcc shrink-wrapping optimization. It is needed to be sure that stackleak_track_stack() call is inserted after the prologue of the containing function, when the stack frame is prepared. This work is a deep reengineering of the idea described on grsecurity blog https://grsecurity.net/resolving_an_unfortunate_stackleak_interaction Signed-off-by: Alexander Popov <alex.popov@linux.com> Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Link: https://lore.kernel.org/r/20200624123330.83226-5-alex.popov@linux.com Signed-off-by: Kees Cook <keescook@chromium.org>
2020-06-24 20:33:29 +08:00
unsigned long sp = current_stack_pointer;
/*
* Having CONFIG_STACKLEAK_TRACK_MIN_SIZE larger than
* STACKLEAK_SEARCH_DEPTH makes the poison search in
* stackleak_erase() unreliable. Let's prevent that.
*/
BUILD_BUG_ON(CONFIG_STACKLEAK_TRACK_MIN_SIZE > STACKLEAK_SEARCH_DEPTH);
gcc-plugins/stackleak: Use asm instrumentation to avoid useless register saving The kernel code instrumentation in stackleak gcc plugin works in two stages. At first, stack tracking is added to GIMPLE representation of every function (except some special cases). And later, when stack frame size info is available, stack tracking is removed from the RTL representation of the functions with small stack frame. There is an unwanted side-effect for these functions: some of them do useless work with caller-saved registers. As an example of such case, proc_sys_write without() instrumentation: 55 push %rbp 41 b8 01 00 00 00 mov $0x1,%r8d 48 89 e5 mov %rsp,%rbp e8 11 ff ff ff callq ffffffff81284610 <proc_sys_call_handler> 5d pop %rbp c3 retq 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1) 00 00 00 proc_sys_write() with instrumentation: 55 push %rbp 48 89 e5 mov %rsp,%rbp 41 56 push %r14 41 55 push %r13 41 54 push %r12 53 push %rbx 49 89 f4 mov %rsi,%r12 48 89 fb mov %rdi,%rbx 49 89 d5 mov %rdx,%r13 49 89 ce mov %rcx,%r14 4c 89 f1 mov %r14,%rcx 4c 89 ea mov %r13,%rdx 4c 89 e6 mov %r12,%rsi 48 89 df mov %rbx,%rdi 41 b8 01 00 00 00 mov $0x1,%r8d e8 f2 fe ff ff callq ffffffff81298e80 <proc_sys_call_handler> 5b pop %rbx 41 5c pop %r12 41 5d pop %r13 41 5e pop %r14 5d pop %rbp c3 retq 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1) 00 00 Let's improve the instrumentation to avoid this: 1. Make stackleak_track_stack() save all register that it works with. Use no_caller_saved_registers attribute for that function. This attribute is available for x86_64 and i386 starting from gcc-7. 2. Insert calling stackleak_track_stack() in asm: asm volatile("call stackleak_track_stack" :: "r" (current_stack_pointer)) Here we use ASM_CALL_CONSTRAINT trick from arch/x86/include/asm/asm.h. The input constraint is taken into account during gcc shrink-wrapping optimization. It is needed to be sure that stackleak_track_stack() call is inserted after the prologue of the containing function, when the stack frame is prepared. This work is a deep reengineering of the idea described on grsecurity blog https://grsecurity.net/resolving_an_unfortunate_stackleak_interaction Signed-off-by: Alexander Popov <alex.popov@linux.com> Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Link: https://lore.kernel.org/r/20200624123330.83226-5-alex.popov@linux.com Signed-off-by: Kees Cook <keescook@chromium.org>
2020-06-24 20:33:29 +08:00
/* 'lowest_stack' should be aligned on the register width boundary */
sp = ALIGN(sp, sizeof(unsigned long));
if (sp < current->lowest_stack &&
stackleak: rework stack low bound handling In stackleak_task_init(), stackleak_track_stack(), and __stackleak_erase(), we open-code skipping the STACK_END_MAGIC at the bottom of the stack. Each case is implemented slightly differently, and only the __stackleak_erase() case is commented. In stackleak_task_init() and stackleak_track_stack() we unconditionally add sizeof(unsigned long) to the lowest stack address. In stackleak_task_init() we use end_of_stack() for this, and in stackleak_track_stack() we use task_stack_page(). In __stackleak_erase() we handle this by detecting if `kstack_ptr` has hit the stack end boundary, and if so, conditionally moving it above the magic. This patch adds a new stackleak_task_low_bound() helper which is used in all three cases, which unconditionally adds sizeof(unsigned long) to the lowest address on the task stack, with commentary as to why. This uses end_of_stack() as stackleak_task_init() did prior to this patch, as this is consistent with the code in kernel/fork.c which initializes the STACK_END_MAGIC value. In __stackleak_erase() we no longer need to check whether we've spilled into the STACK_END_MAGIC value, as stackleak_track_stack() ensures that `current->lowest_stack` stops immediately above this, and similarly the poison scan will stop immediately above this. For stackleak_task_init() and stackleak_track_stack() this results in no change to code generation. For __stackleak_erase() the generated assembly is slightly simpler and shorter. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220427173128.2603085-5-mark.rutland@arm.com
2022-04-28 01:31:19 +08:00
sp >= stackleak_task_low_bound(current)) {
current->lowest_stack = sp;
}
}
EXPORT_SYMBOL(stackleak_track_stack);