What is the full information needed to clone assembly instructions?
In order to hook a function you’ll need to:
- Know its interface (calling convention, number and types of parameters, and so on). When optimizing, the compiler may inline the function or alter its interface. If this is the case, I don’t know how best to handle it. You might need to tweak the code so that the function is called via a volatile pointer to a function, convincing the compiler that the pointer may change its value at any time and point to any other function with the same parameters, so it would be unwise to change the function’s prologue and epilogue. Disabling optimization may be another option. All this is needed to avoid a situation where the original and the new functions aren’t compatible in how they receive parameters and return. However, if this is one of the exported functions, the compiler obviously won’t change anything, as that would break code.
- Know its address.
- Minimally disassemble the first instructions of the function, which you are going to overwrite with the jump to your new code. When disassembling, you must find out the instruction length (for this you’ll need to correctly parse all instruction prefixes, all opcode bytes, all Mod/RM/SIB bytes, and all displacement and immediate operand bytes; some logic plus look-up tables will help) and whether the instruction transfers control to, or accesses data at, a location relative to the instruction pointer (e.g. Jcc; JMP near; CALL near; JMP/CALL qword ptr [RIP+something]; MOV EAX, dword ptr [RIP+something]) and, if so, the target address.
- Know the address of the copies of the original instructions. Ideally, you’d allocate memory for the copies after parsing the instructions, but you can (and probably should) preallocate more to simplify your life.
- Copy the original instructions to the new place and, if necessary, adjust the relative addresses in them by the difference between the old and new locations of these instructions. Note that the original instructions may use short relative addresses (e.g. 8-bit, the most common case for Jcc, or even 16-bit), which are too short for simple direct patching. In this case you will need to reassemble such instructions with longer relative addresses (depending on the instruction, this requires switching to a longer opcode, as for Jcc, inserting or changing an instruction prefix, or changing the Mod/RM/SIB bytes). Keep in mind that relative addresses are relative to the instruction’s end (in other words, the beginning of the next instruction), which means that if the adjusted instruction is longer than the original, the relative address has to account for the length difference as well. Ideally, you should also be prepared to handle the case where the original instructions that you overwrite jump to one another. You don’t want their copies to jump back into the overwritten code.
- Add a JMP instruction that jumps to the first instruction in the original function that wasn’t overwritten.
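A minimal sketch of the volatile-function-pointer trick from the first point, assuming GCC or Clang (noinline is their attribute spelling). The names add, add_ptr and call_add are hypothetical. Because add_ptr is volatile, every call has to reload it and the compiler cannot assume which function it will find there, so it has no grounds to inline add() at that call site; and since add’s address is stored in a global, in practice the compiler keeps an out-of-line copy that follows the normal calling convention.

```c
/* noinline is a GCC/Clang attribute; other compilers spell it differently. */
__attribute__((noinline)) int add(int a, int b) {
    return a + b;
}

/* A volatile object holding the function's address. Each call must reload
   the pointer, and the compiler cannot know which function it points to,
   so it cannot inline add() at this call site. */
int (*volatile add_ptr)(int, int) = add;

/* Call site that goes through the volatile pointer. */
int call_add(int a, int b) {
    return add_ptr(a, b);
}
```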
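For the most common relocation case, a 2-byte Jcc rel8, the copy-and-adjust step and the final JMP back might be sketched as follows. It assumes the copy is placed within roughly ±2 GB of the branch target (otherwise rel32 cannot reach it and the function reports failure). relocate_jcc8, emit_jmp32 and rel32_to are hypothetical names, and addresses are passed as plain integers so the example can be tried without touching executable memory. Both displacements are computed from the end of the respective instruction: 2 bytes for the original Jcc, 6 for its longer copy, 5 for the JMP.

```c
#include <stdint.h>

/* Write a 32-bit value in little-endian order, as x86 encodes it. */
static void put_le32(uint8_t *p, int32_t v) {
    uint32_t u = (uint32_t)v;
    p[0] = (uint8_t)u;
    p[1] = (uint8_t)(u >> 8);
    p[2] = (uint8_t)(u >> 16);
    p[3] = (uint8_t)(u >> 24);
}

/* Displacement from next_ip to target; returns -1 if it does not fit in rel32. */
static int rel32_to(uint64_t next_ip, uint64_t target, int32_t *out) {
    int64_t d = target >= next_ip ? (int64_t)(target - next_ip)
                                  : -(int64_t)(next_ip - target);
    if (d < INT32_MIN || d > INT32_MAX)
        return -1;
    *out = (int32_t)d;
    return 0;
}

/* Relocate a 2-byte "Jcc rel8" (0x70..0x7F) that lived at old_addr into a
   6-byte "Jcc rel32" (0x0F 0x80..0x8F) placed at new_addr.
   Returns the number of bytes written, or -1 on failure. */
int relocate_jcc8(const uint8_t insn[2], uint64_t old_addr,
                  uint8_t out[6], uint64_t new_addr) {
    if (insn[0] < 0x70 || insn[0] > 0x7F)
        return -1;
    /* Sign-extend rel8; it is relative to the end of the 2-byte original. */
    int disp = insn[1] < 0x80 ? (int)insn[1] : (int)insn[1] - 256;
    uint64_t target = old_addr + 2 + (uint64_t)(int64_t)disp;

    int32_t rel;
    /* The new rel32 is relative to the end of the 6-byte copy. */
    if (rel32_to(new_addr + 6, target, &rel) != 0)
        return -1;
    out[0] = 0x0F;
    out[1] = (uint8_t)(0x80 + (insn[0] - 0x70)); /* keep the condition code */
    put_le32(out + 2, rel);
    return 6;
}

/* Emit "JMP rel32" (0xE9) located at at_addr that jumps to dest. */
int emit_jmp32(uint8_t out[5], uint64_t at_addr, uint64_t dest) {
    int32_t rel;
    if (rel32_to(at_addr + 5, dest, &rel) != 0)
        return -1;
    out[0] = 0xE9;
    put_le32(out + 1, rel);
    return 5;
}
```

For example, JE +0x10 (74 10) at 0x1000 targets 0x1012; relocated to 0x2000 it becomes 0F 84 0C F0 FF FF, because rel32 = 0x1012 - (0x2000 + 6) = -0xFF4.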
After this, hooking should just work in most situations. Problems will arise if the compiler generated any other code that expects the original instructions to be unchanged and at their original place.
As for the data structure: you replace N bytes of the original code, where N is 5 for a JMP with a 32-bit relative address. Those N bytes will correspond to at most N original instructions. You’ll need to save those 1 to N instructions in their entirety (every x86 instruction is at most 15 bytes long), then parse, possibly adjust, and store them in the new place. You don’t really need a tree here; an array with one element per instruction would suffice. Simple. But it’s quite a bit of code that needs to be carefully written and debugged/tested.
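The array-of-records idea could be sketched like this. struct insn_copy, toy_insn_len and collect_insns are hypothetical, and toy_insn_len is deliberately a toy: it recognizes only a few x86-64 encodings (PUSH r64, MOV RBP,RSP, SUB RSP,imm8, Jcc rel8, CALL/JMP rel32) and does not parse prefixes, Mod/RM/SIB bytes or RIP-relative operands, which a real table-driven decoder would have to do.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HOOK_JMP_LEN 5  /* size of JMP rel32: E9 + 4-byte displacement */
#define MAX_INSN_LEN 15 /* architectural maximum x86 instruction length */

/* One element per original instruction that the JMP will overwrite. */
struct insn_copy {
    uint8_t bytes[MAX_INSN_LEN]; /* original encoding, copied verbatim */
    size_t  len;                 /* decoded length in bytes */
    int     is_relative;         /* nonzero for rel8/rel32 branches */
};

/* Toy length decoder for a handful of encodings; returns 0 if unknown. */
static size_t toy_insn_len(const uint8_t *p, int *is_relative) {
    *is_relative = 0;
    if (p[0] >= 0x50 && p[0] <= 0x57) return 1;                 /* PUSH r64 */
    if (p[0] == 0x48 && p[1] == 0x89 && p[2] == 0xE5) return 3; /* MOV RBP,RSP */
    if (p[0] == 0x48 && p[1] == 0x83 && p[2] == 0xEC) return 4; /* SUB RSP,imm8 */
    if (p[0] >= 0x70 && p[0] <= 0x7F) { *is_relative = 1; return 2; } /* Jcc rel8 */
    if (p[0] == 0xE8 || p[0] == 0xE9) { *is_relative = 1; return 5; } /* CALL/JMP rel32 */
    return 0;
}

/* Record every instruction overlapping the first HOOK_JMP_LEN bytes of code.
   Each instruction is at least 1 byte, so at most HOOK_JMP_LEN entries are
   needed. Returns the entry count, or -1 on an unknown instruction. */
int collect_insns(const uint8_t *code, struct insn_copy out[HOOK_JMP_LEN]) {
    size_t covered = 0;
    int n = 0;
    while (covered < HOOK_JMP_LEN) {
        int rel;
        size_t len = toy_insn_len(code + covered, &rel);
        if (len == 0)
            return -1;
        memcpy(out[n].bytes, code + covered, len);
        out[n].len = len;
        out[n].is_relative = rel;
        covered += len;
        n++;
    }
    return n;
}
```

For the common prologue 55 48 89 E5 48 83 EC 20 (push rbp; mov rbp, rsp; sub rsp, 0x20), the 5-byte JMP overlaps three instructions totalling 8 bytes, so all three must be copied.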
Please see the related questions. There may be valuable details.
EDIT: Answering the main question:
I think the main function to “copy” all instructions (copy_instructions()) may indeed be defined as you’ve defined it. You may want to return an error code from it, though, in case it fails (e.g. to allocate memory or to disassemble an unknown instruction). That may be helpful. I can’t see what else you’d need from or for the caller.
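The question’s own definition of copy_instructions() isn’t shown here, so the signature below is a hypothetical one that only illustrates returning an error code. It preallocates with some slack and fails with distinct codes for an allocation failure and an unknown instruction. It uses malloc just to keep the sketch portable; a real trampoline needs executable memory (e.g. from VirtualAlloc or mmap). Relocation of relative operands is left out, so the copies keep their original sizes, and toy_len is a stand-in decoder that knows only the 1-byte PUSH r64 and NOP.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_INSN_LEN 15  /* x86 architectural limit */
#define JMP_REL32_LEN 5  /* E9 + rel32 */

/* Hypothetical status codes for the caller. */
enum copy_status {
    COPY_OK = 0,
    COPY_ERR_NOMEM,     /* allocating the copy buffer failed */
    COPY_ERR_UNKNOWN_OP /* the length decoder did not recognize an opcode */
};

/* Length decoder: length of the instruction at p, or 0 if unknown. */
typedef size_t (*insn_len_fn)(const uint8_t *p);

/* Stand-in decoder: only PUSH r64 (0x50..0x57) and NOP (0x90), 1 byte each. */
size_t toy_len(const uint8_t *p) {
    return ((p[0] >= 0x50 && p[0] <= 0x57) || p[0] == 0x90) ? 1 : 0;
}

/* Copy whole instructions covering at least min_bytes of src into a new
   buffer. On success *out owns the buffer and *out_len is the byte count;
   on failure *out and *out_len are left untouched. */
enum copy_status copy_instructions(const uint8_t *src, size_t min_bytes,
                                   insn_len_fn insn_len,
                                   uint8_t **out, size_t *out_len) {
    /* The last copied instruction may start at min_bytes - 1 and be up to
       15 bytes long; also reserve room for the JMP back. */
    uint8_t *buf = malloc(min_bytes + MAX_INSN_LEN + JMP_REL32_LEN);
    if (buf == NULL)
        return COPY_ERR_NOMEM;

    size_t done = 0;
    while (done < min_bytes) {
        size_t len = insn_len(src + done);
        if (len == 0) {
            free(buf);
            return COPY_ERR_UNKNOWN_OP;
        }
        memcpy(buf + done, src + done, len);
        done += len;
    }
    *out = buf;
    *out_len = done;
    return COPY_OK;
}
```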
Originally posted 2013-11-10 00:11:07.