Writeup for the Broken challenge from VolgaCTF 2016 Quals

This is a pretty nice challenge from the VolgaCTF 2016 Quals; sadly, I couldn't join the r/OpenToAllCTFteam and play because I was too busy, but I noticed it was missing a writeup and decided to write one.

The first thing I always do when I want to analyze an executable is to run it inside a disposable virtual machine. You will notice that it doesn't seem to do anything at all, and that it closes itself after half a minute with the following message: “The processing has taken too long, terminating the process…”. Clearly we will not learn anything more from this executable by just launching it. Fire up your favorite disassembler and let's take a look at the entry point.

This is roughly what happens inside the main entry point:

  • A couple of structures are initialized.
  • Four threads are started using a function that wraps pthread_create.
  • The program pauses until the threads have returned (using a wrapper for pthread_join).
  • A printf() call outputs a string.

This is the function that is used to create each thread. We are running under Linux x64, meaning that the vast majority of the functions we will encounter here will use the System V AMD64 calling convention (often shown as __fastcall by disassemblers): the first six integer or pointer arguments are passed in the registers RDI, RSI, RDX, RCX, R8 and R9, and floating-point arguments in XMM0 through XMM7. The start routine is the second argument of this wrapper function, and will therefore end up inside the RSI register.
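To make the register mapping concrete, here is a minimal sketch of what such a wrapper could look like in C. The names create_thread, set_flag and run_wrapper_demo, as well as the exact signature, are my own assumptions for illustration; the real wrapper in the binary may take different arguments.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical wrapper: the start routine is the SECOND argument,
 * so under the System V AMD64 convention it arrives in RSI. */
int create_thread(pthread_t *thread, void *(*start_routine)(void *), void *arg)
{
    /* In pthread_create itself the start routine is the third
     * argument, so the wrapper has to move it from RSI to RDX. */
    return pthread_create(thread, NULL, start_routine, arg);
}

/* Hypothetical start routine, only used to exercise the wrapper. */
static void *set_flag(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}

/* Creates one thread through the wrapper, joins it and returns the
 * flag value, or -1 if the thread could not be created. */
int run_wrapper_demo(void)
{
    pthread_t thread;
    int flag = 0;

    if (create_thread(&thread, set_flag, &flag) != 0)
        return -1;
    pthread_join(thread, NULL);
    return flag;
}
```

Calling run_wrapper_demo() from a small main() (built with -pthread) is enough to try it out. In a disassembler, looking at what gets loaded into RSI right before each call to the wrapper is what reveals the start routines.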

You can now track down the start routines used to create the threads:

  • 0x00400E20: ComputeSHA256Hash
  • 0x00400E60: ComputeSHA512Hash
  • 0x00400F40: Thread3
  • 0x00400EA0: TimeoutThread

The first two threads will compute the hash of a buffer and terminate; remember the two structures that are initialized right at the start of the main entry point? They hold both the input buffer and the pointer where the resulting digest will be stored. They are not particularly interesting, and I will not talk much about them.

The fourth thread is the one that prints the timeout message and terminates the process by calling exit().

Let's take a look at the third thread; you will eventually notice there's something spying on you once you start setting breakpoints around. The program will stop working and the execution will not even reach the main entry point. This is caused by a function that computes the hash of a selected number of functions defined in the program and terminates the process if any of the calculated signatures doesn't match.
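Why does a breakpoint trip this check? A software breakpoint on x86 overwrites the first byte of the target instruction with 0xCC (int3), so the bytes of the function change and so does any hash computed over them. Here is a minimal sketch of the idea; I am using FNV-1a as a stand-in hash and the names checksum and code_is_intact are made up for the example, so this is not the algorithm used by the challenge.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in hash (32-bit FNV-1a) computed over a range of code bytes.
 * The challenge uses its own hashing, this is only an illustration. */
uint32_t checksum(const uint8_t *code, size_t len)
{
    uint32_t hash = 2166136261u;          /* FNV offset basis */

    for (size_t i = 0; i < len; i++) {
        hash ^= code[i];
        hash *= 16777619u;                /* FNV prime */
    }
    return hash;
}

/* Returns 1 if the bytes still match the signature recorded earlier,
 * 0 if something (a patch, a breakpoint) modified them. */
int code_is_intact(const uint8_t *code, size_t len, uint32_t expected)
{
    return checksum(code, len) == expected;
}
```

For example, take the bytes 55 48 89 E5 C3 (push rbp; mov rbp, rsp; ret), record their checksum, then replace the first byte with 0xCC: code_is_intact() goes from 1 to 0, which is the moment the protection would terminate the process.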

If you are curious about how a function can run before the main entry point is reached, you can declare one yourself with the __attribute__((constructor)) attribute in your C or C++ code (GCC and Clang).
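A minimal sketch of such a function, assuming GCC or Clang; the names run_before_main and constructor_has_run are mine and have nothing to do with the challenge binary:

```c
#include <stdio.h>

/* Set by the constructor so that we can observe that it really ran. */
static int init_ran = 0;

/* The constructor attribute asks the toolchain to register this
 * function so that it is called during startup, before main(). */
__attribute__((constructor))
static void run_before_main(void)
{
    init_ran = 1;
}

/* Returns 1 if the constructor has already been executed. */
int constructor_has_run(void)
{
    return init_ran;
}
```

If you call constructor_has_run() as the very first statement of main() it already returns 1. This is why a protection placed in such a function fires before a breakpoint on main is ever reached.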

As you can see, it's calling a couple of initialization functions taken from an array; this array is accessed at virtual address 0x00401591 and contains both the patching protection (0x00400A50) that is interfering with us and a function that initializes the internal state of the program (0x00400DF0). The huge nop instruction at virtual address 0x004015B8 is a clear indication that an opcode has been removed. Did you notice that the second array is referenced but its value is actually never used? Replace the instruction with a call to the function pointer stored inside the second array (0x00400DD0).

Now we have to disable the protection; I have forced the jump at virtual address 0x00401541, but you can probably just skip the whole function.

Let's go back to the thread we were analyzing (0x00400F40). If you step through the code, you will notice that it deadlocks inside a sem_wait call. Do you remember how sem_init works? When you create a new semaphore, you can set the initial value; this value is increased using sem_post and decreased using sem_wait. If you wait on a semaphore that is currently set to 0, you will have to wait until someone increments it using sem_post.
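If you want to see these semantics in isolation, here is a small sketch using an unnamed POSIX semaphore on Linux. The function name semaphore_demo is made up; sem_trywait is used instead of sem_wait for the first check so that the example reports the "would block" condition instead of hanging.

```c
#include <errno.h>
#include <semaphore.h>

/* Returns 1 if the semaphore behaved as described in the text:
 * waiting at 0 would block, and a post lets one wait go through. */
int semaphore_demo(void)
{
    sem_t sem;
    int value = -1;
    int would_block;

    /* Second argument 0: shared between threads of this process.
     * Third argument 0: initial value, like in the challenge. */
    if (sem_init(&sem, 0, 0) != 0)
        return 0;

    /* The value is 0, so sem_wait() would block here. sem_trywait()
     * fails with EAGAIN instead of blocking. */
    would_block = (sem_trywait(&sem) == -1 && errno == EAGAIN);

    sem_post(&sem);   /* 0 -> 1 */
    sem_wait(&sem);   /* 1 -> 0, returns immediately */

    sem_getvalue(&sem, &value);
    sem_destroy(&sem);

    return would_block && value == 0;
}
```

In the challenge nobody ever performs the sem_post that Thread3 is waiting for, which is exactly the situation that the first sem_trywait call detects here.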

The whole situation becomes a lot easier to understand once you give the semaphores a name. I have named them (surprise) semaphore1 (0x00400F77), semaphore2 (0x00400F8D) and semaphore3 (0x00400FA3). They have all been initialized to 0: keep this in mind because it's important.

Once the semaphores are initialized, a couple more threads are created: Thread5 (0x00400ED0) and Thread6 (0x004012C0). Both functions perform (more or less) the same operations:

// pseudo-code for Thread5 and Thread6
void ThreadEntryPoint()
{
	// keep in mind that you have two almost identical threads that
	// are performing the same operations!
	// each thread increments semaphore3 once, so it is incremented
	// twice in total; this will allow Thread3 to call
	// sem_wait(semaphore3) twice after Thread5 and Thread6 are created.
	sem_post(semaphore3); // semaphore3 += 2 (one post per thread)

	do {
		// wait for Thread3 to give us the ok to proceed
		// we obviously need two sem_post(semaphore1) calls in order
		// to unlock both threads
		sem_wait(semaphore1); // semaphore1 -= 2 (one wait per thread)

		// update the program state
		// ...

		// this will tell Thread3 that it can proceed
	} while (sem_post(semaphore3) == 0); // semaphore3 += 2 (one post per thread)
}
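The handshake above can be turned into a small runnable sketch. It is a simplification under my own assumptions: it only uses semaphore1 and semaphore3, the loop is bounded by a made-up ROUNDS constant so that the threads terminate, and the "program state" is just a per-thread counter. The names worker and run_handshake are hypothetical.

```c
#include <pthread.h>
#include <semaphore.h>

#define ROUNDS 3                      /* assumption: bounded loop */

static sem_t semaphore1, semaphore3;
static int state[2];                  /* one counter per worker */

/* Plays the role of Thread5 and Thread6. */
static void *worker(void *arg)
{
    int *slot = arg;

    sem_post(&semaphore3);            /* tell the coordinator we are alive */
    for (int i = 0; i < ROUNDS; i++) {
        sem_wait(&semaphore1);        /* wait for the ok to proceed */
        (*slot)++;                    /* update the (toy) program state */
        sem_post(&semaphore3);        /* tell the coordinator we are done */
    }
    return NULL;
}

/* Plays the role of Thread3; returns the total number of updates. */
int run_handshake(void)
{
    pthread_t threads[2];

    sem_init(&semaphore1, 0, 0);      /* both start at 0, like in the binary */
    sem_init(&semaphore3, 0, 0);
    state[0] = state[1] = 0;

    for (int i = 0; i < 2; i++)
        pthread_create(&threads[i], NULL, worker, &state[i]);

    sem_wait(&semaphore3);            /* wait for both workers to start */
    sem_wait(&semaphore3);

    for (int round = 0; round < ROUNDS; round++) {
        sem_post(&semaphore1);        /* two posts release two waits */
        sem_post(&semaphore1);
        sem_wait(&semaphore3);        /* wait for two completions */
        sem_wait(&semaphore3);
    }

    for (int i = 0; i < 2; i++)
        pthread_join(threads[i], NULL);

    sem_destroy(&semaphore1);
    sem_destroy(&semaphore3);
    return state[0] + state[1];
}
```

With ROUNDS set to 3 the function returns 6 (build with -pthread). Note that with a single shared semaphore1 nothing forces the two workers to alternate: one of them may consume both posts of a round. The total number of posts and waits still balances, so the sketch cannot deadlock; remove one of the sem_post(&semaphore1) calls, or make a worker wait on a semaphore nobody posts, and it hangs just like the challenge does.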

Now that we know how the two threads are working, let's go back to Thread3. We have another nop at virtual address 0x00401073 and we need to replace it with another function call; remember the patch protection function at virtual address 0x00400A50? One of the routines that it was protecting was not referenced by anything else in the program, meaning that it's the one we need to call.

You will notice that this is not enough to fix the program, because it will now deadlock somewhere else inside the code we added; if you take a closer look at where it gets stuck you will realize that the only possible explanation is that one of the semaphores inside the threads #5 and #6 has been changed.

Open the Thread6 function and fix the semaphore inside the loop:

Here's a summary of the patching I have done:

Run it again and you will get the flag: **VolgaCTF{avoid_de@dl0cks_they_br3ak_your_@pp}**