Usually, the first thing you do when running position-independent code (PIC) is find where your execution started. After messy exploitation and with ASLR in play, you need a reliable anchor point to locate the rest of your payload (like your data) so you can begin decoding, extracting, or comparing values.
CALL/POP (6–10 bytes)
The first and most well-known method is the classic CALL/POP technique. It uses a CALL instruction to push the return address (i.e., the address of the instruction right after the CALL) onto the stack, and then a POP retrieves it. This simple trick was the crown jewel of early shellcode and remained a staple technique for a long time.
call get_rip
get_rip:
pop rcx
The moment execution reaches the POP RCX, it retrieves the return address, which is the address of the POP itself and therefore a known location inside your code. From there, you can calculate offsets to access other parts of your shellcode.
However, this method has a major issue: the CALL instruction uses a relative 4-byte offset, and a short forward call has a small positive offset whose upper bytes are zero. Those null bytes are something shellcode often can't tolerate.
To overcome this, you can use a JMP trampoline, like the following:
jmp label_forwards
label_backwards:
jmp label_get_rip
label_forwards:
call label_backwards
label_get_rip:
pop rcx
Now the CALL goes backwards, so its offset is negative (for example 0xFFFFFFF9 when both jumps assemble as short jumps) and contains no null bytes. The two JMPs use 1-byte offsets, so they introduce none either.
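To see where the null bytes come from, here is a minimal Python sketch that hand-encodes both stubs. It assumes the assembler picks the short (2-byte) form for both JMPs; the helper names are made up for illustration.

```python
import struct

def jmp_short(rel8: int) -> bytes:
    """EB cb: JMP rel8, displacement counted from the next instruction."""
    return b"\xEB" + struct.pack("<b", rel8)

def call_rel32(rel32: int) -> bytes:
    """E8 cd: CALL rel32, displacement counted from the next instruction."""
    return b"\xE8" + struct.pack("<i", rel32)

POP_RCX = b"\x59"  # 59: POP RCX

def has_null(code: bytes) -> bool:
    return b"\x00" in code

# Plain CALL/POP: the call targets the very next instruction, so rel32 = 0.
plain = call_rel32(0) + POP_RCX

# Trampoline layout (offsets from the start of the stub):
#   0: jmp  label_forwards    (2 bytes)
#   2: jmp  label_get_rip     (2 bytes)  <- label_backwards
#   4: call label_backwards   (5 bytes)  <- label_forwards
#   9: pop  rcx               (1 byte)   <- label_get_rip
trampoline = (
    jmp_short(4 - 2)       # target 4, next instruction at 2
    + jmp_short(9 - 4)     # target 9, next instruction at 4
    + call_rel32(2 - 9)    # target 2, next instruction at 9 -> -7
    + POP_RCX
)

print(plain.hex(" "), "| null bytes:", has_null(plain))
print(trampoline.hex(" "), "| null bytes:", has_null(trampoline))
```

The plain stub is 6 bytes and the trampoline 10, which matches the size range in the heading.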
FPU/FNSTENV (7 bytes)
Enter the Floating Point Unit (FPU) instructions. They were originally meant for floating-point calculations, but they also offer a way to retrieve EIP/RIP.
This technique involves executing one of about 100 FPU instructions and then FNSTENV, which writes a 28-byte structure, the FPU environment, to memory. This structure includes various fields related to FPU execution, and one of them, the FPU instruction pointer, holds the address of the last executed FPU instruction.
The Shikata-ga-nai encoder used this method to ensure a more randomized stub. With many possible FPU instructions to choose from, and the ability to insert garbage instructions in between, it becomes harder for detection engines to recognize such shellcode.
Nick Hoffman, Jeremy Humble, and Toby Taylor wrote an excellent blog post back in the day detailing the use of FPU instructions in Shikata-ga-nai, including methods for detection and suggestions for improvement.
In x64, FXSAVE64 is used to obtain this state.
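As a rough illustration of the 28-byte environment, the Python sketch below models the 32-bit protected-mode FNSTENV layout as seven 32-bit slots and reads the instruction pointer out of it. The field names and the sample values are made up for the example; only the layout and the stub bytes are meant to be taken literally.

```python
import struct

# 32-bit protected-mode FNSTENV image: seven 32-bit slots, 28 bytes total.
FIELDS = (
    "control_word", "status_word", "tag_word",
    "fpu_ip_offset",            # address of the last executed FPU instruction
    "fpu_ip_selector_opcode",
    "fpu_dp_offset", "fpu_dp_selector",
)
ENV = struct.Struct("<7I")

def parse_fnstenv(image: bytes) -> dict:
    """Split a 28-byte FNSTENV image into named fields."""
    return dict(zip(FIELDS, ENV.unpack(image)))

# Byte offset of the instruction pointer inside the image.
FIP_OFFSET = 4 * FIELDS.index("fpu_ip_offset")

# Synthetic image: pretend the last FPU instruction sat at 0x00401000.
fake_image = ENV.pack(0x037F, 0x3800, 0xFFFF, 0x00401000, 0, 0, 0)
env = parse_fnstenv(fake_image)
print(hex(env["fpu_ip_offset"]), "at offset", FIP_OFFSET)

# A commonly seen 32-bit stub: storing the image at [esp - 0xc] puts the
# instruction-pointer slot exactly at [esp], so a plain POP picks it up.
STUB = (
    b"\xD9\xEE"            # fldz
    b"\xD9\x74\x24\xF4"    # fnstenv [esp - 0xc]
    b"\x5B"                # pop ebx
)
print(STUB.hex(" "), len(STUB), "bytes")
```

The stub comes to 7 bytes with no null bytes, and the FLDZ can be swapped for other FPU instructions, which is where the variability mentioned above comes from.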
LEA (7–11 bytes)
Accessing EIP/RIP directly isn't allowed in the x86 instruction set: you can't MOV it or perform arithmetic on it. If you could, we wouldn't even be talking about these workarounds. The only way to control it is through control-flow instructions like CALL and RET, which rely on known or relative addresses, something you typically don't have once your code is running at an unknown location.
In x86-64, however, RIP-relative addressing was introduced, and one instruction can use it to read RIP directly: the LEA instruction.
LEA works similarly to MOV, but instead of moving data, it calculates the address of the source operand and stores it in the destination, like this:
; rbx = 0x05
mov rax, [rbx + 0x01] ; Load the value at address 0x06 into rax
lea rax, [rbx + 0x01] ; Copy the address 0x06 itself into rax (no memory access)
With MOV, RAX ends up containing the value stored at address 0x06; in other words, it dereferences the address. In contrast, LEA simply assigns the value 0x06 to RAX without dereferencing it.
This also works with RIP, where you can load its effective address into another register using an instruction like:
lea rcx, [rip]
This copies the value of RIP + 0x00 into RCX, where RIP is the address of the instruction that follows the LEA.
To avoid null bytes, you can also use LEA RCX, [RIP - 0x01] and then increment the value afterward as needed.
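The effect of the displacement on the encoding can be checked with a small Python sketch. The helper names and the load address 0x1000 are hypothetical; the opcode bytes are the standard encodings of LEA RCX, [RIP + disp32] and INC RCX.

```python
import struct

LEA_LEN = 7  # 48 8D 0D + 4-byte displacement

def lea_rcx_rip(disp32: int) -> bytes:
    """48 8D 0D disp32: LEA RCX, [RIP + disp32]."""
    return b"\x48\x8D\x0D" + struct.pack("<i", disp32)

INC_RCX = b"\x48\xFF\xC1"  # INC RCX

def rcx_after_lea(load_address: int, disp32: int) -> int:
    """RIP is the address of the instruction following the LEA."""
    return load_address + LEA_LEN + disp32

zero_disp = lea_rcx_rip(0)              # displacement bytes are all zero
minus_one = lea_rcx_rip(-1) + INC_RCX   # displacement bytes are all 0xFF

print(zero_disp.hex(" "), "| null bytes:", b"\x00" in zero_disp)
print(minus_one.hex(" "), "| null bytes:", b"\x00" in minus_one)

# Both variants end with the same value in RCX (hypothetical load address).
base = 0x1000
print(hex(rcx_after_lea(base, 0)), hex(rcx_after_lea(base, -1) + 1))
```

The null-free variant costs three extra bytes for the INC, giving 10 bytes in total.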
Syscall
Enter a new method: SYSCALL, the instruction that transfers execution from user mode to kernel mode. The kernel exposes a set of system services, and a table (like KiServiceTable in Windows or sys_call_table in Linux) uses the index provided in EAX to locate the requested service.
When SYSCALL is executed, the CPU switches context from user mode to kernel mode, updates RFLAGS accordingly, and begins executing the kernel's entry point, which dispatches to the target service. Once the service completes, SYSRET is executed, restoring the user-mode stack and the original caller's RIP, with the return value placed in RAX.
You can see the start of this process in action when ntdll prepares a call to NtTerminateProcess:
[Image: ntdll preparing the NtTerminateProcess system call]
Note that before the transition, RCX is copied into R10 because SYSCALL overwrites RCX.
What happens is that the CPU uses RCX to store the caller's RIP (similar to how the return address is pushed onto the stack during a normal CALL). It also stores the RFLAGS register in R11, along with other operations. The SYSCALL instruction itself does not save the user stack pointer; that is left to the kernel's entry code.
Because of this, the registers RCX and R8-R11 get clobbered during the syscall transition. Once the kernel takes control of execution, it also saves the new RCX onto the kernel stack and restores it at the end.
This behavior happens on both operating systems. In Linux, for example, a POP_REGS macro is called in syscall_return_via_sysret to undo the PUSH_AND_CLEAR_REGS operation performed upon syscall entry.
Knowing this allows us to use a syscall to obtain the current RIP. We can do this either by calling a non-destructive syscall or by invoking a non-existent syscall, hoping the system dispatcher doesn’t raise an exception but instead fails gracefully, returning an error code in RAX.
mov eax, 0x20202020
syscall
; rcx will point to next instruction by now
By executing the previous code, we get 0xC000001C as the return value in RAX, which corresponds to STATUS_INVALID_SYSTEM_SERVICE, indicating that there's no service with that number in the SSDT.
The same concept applies in Linux, where the return value -38 corresponds to ENOSYS: Function not implemented.
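For a sense of size, the stub above can be hand-encoded with a short Python sketch. The helper names and the load address are hypothetical. The getpid number used for comparison (39) is the Linux x86-64 value.

```python
import struct

def mov_eax_imm32(imm32: int) -> bytes:
    """B8 id: MOV EAX, imm32."""
    return b"\xB8" + struct.pack("<I", imm32)

SYSCALL = b"\x0F\x05"  # 0F 05: SYSCALL

def rcx_after_syscall(load_address: int, stub: bytes) -> int:
    """SYSCALL leaves the address of the following instruction in RCX."""
    return load_address + len(stub)

# Non-existent service number taken from the snippet above.
invalid_stub = mov_eax_imm32(0x20202020) + SYSCALL

# A real but harmless service, for comparison: getpid on Linux x86-64.
getpid_stub = mov_eax_imm32(39) + SYSCALL

print(invalid_stub.hex(" "), "| null bytes:", b"\x00" in invalid_stub)
print(getpid_stub.hex(" "), "| null bytes:", b"\x00" in getpid_stub)

# With a hypothetical load address of 0x1000, RCX ends up just past the stub.
print(hex(rcx_after_syscall(0x1000, invalid_stub)))
```

The invalid number happens to encode without null bytes, while a small service number such as getpid's does not in this form, so a null-free variant would need a different way of loading EAX.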
Conclusion
The SYSCALL method gives us another way to get RIP, which can be useful in certain shellcode scenarios. With modern exploitation moving more toward data-only attacks, traditional shellcode tricks are used less often, but it’s still good to explore what’s possible.
This method could be flagged by invalid syscalls, or it might be made less suspicious by using harmless ones like getpid on Linux. Maybe it’s not even new and I could’ve just come across it late. Either way, it’s another option worth being aware of.
Thanks @FJarid for validation, @barakatsoror for valid input.
I found a mention of this method a year ago on HN by @LegionMammal978.
