- How to stymie a disassembler
- Posted by randyhyde@earthlink.net on February 17th, 2006
Hi All,
I'm collecting little tricks that will stymie a disassembler (that is,
prevent it from disassembling the code correctly) to use in a book
project I'm working on ("The Art of Disassembly"). I've collected a
bunch of tricks over the years (OhMyGosh, it's getting to be decades
now), but chances are pretty good that I've missed some pretty good
ones.
Here are some of the ideas I'm using in the book:
1. Burying data in the code stream
2. Placing code in the middle of data objects (a variant of [1]).
3. Arithmetic expressions involving two relocatable addresses (e.g.,
lbl1-lbl2)
4. Burying instructions within the opcodes of other instructions
5. Using alignment operations in code and data
6. Writing code that does not have well-defined procedure/function
boundaries
7. Overlapping data tables and, in general, making data boundaries
fuzzy.
8. Using unions and variant types to make it difficult to infer a data
object's type
9. Writing interpreters that allow a mixture of 80x86 and interpretive
code in the code stream
10. Using the breakpoint (int 3) and trace flag facilities within the
application
11. Using the machine instructions that correspond to a copyright
notice (or other string) to do useful computations within the program.
12. Using the data at some location as both program data and executable
machine code (a generalization of [11]). This includes, for example,
self-modifying code.
13. Using lots of dynamically-linked libraries to make it difficult (or
even impossible) for a disassembler to infer much about the external
code.
14. Creating wrappers for system APIs to make it difficult for
heuristic analysis to make any headway processing those calls.
My interest in this subject is twofold. I want to be able to discuss
how to overcome these problems when using (or writing) a disassembler;
I also want to discuss how to help obfuscate object code to make it
difficult to disassemble. Any and all constructive comments,
suggestions, and examples are welcome.
Cheers,
Randy Hyde
- Posted by Pete Fenelon on February 17th, 2006
randyhyde@earthlink.net wrote:
Something I've seen done - admittedly for fairly nefarious purposes! -
on a *ix box was replacing system call APIs with the far less friendly
and obvious _syscall macros. Certainly gets rid of a lot of stuff that
helps people understand what you're doing 
pete
--
pete@fenelon.com [Support no2id.net: working to destroy Blair's ID card fraud]
- Posted by Mike Harrison on February 17th, 2006
On 17 Feb 2006 11:43:54 -0800, randyhyde@earthlink.net wrote:
Self-modifying code.
Many levels of indirection when accessing variables.
When discussing disassembly however, bear in mind that a logic analyser will rapidly unravel many
obfuscation tactics...
- Posted by Tauno Voipio on February 17th, 2006
randyhyde@earthlink.net wrote:
The standard trick used by most virii (viruses?) in circulation
is to pack the program file in some weird way, maybe in several
steps, and include the unpacker(s) in the code. This is actually
a variant of self-modifying code.
On many operating systems and processor architectures, self-modifying
code faces the difficulties of processor caches and code / data
privilege separation. Basically, the protected mode Intel architecture
refuses to have the same memory segment executable and writable, though
the limitation is often by-passed with segment aliasing.
--
Tauno Voipio
tauno voipio (at) iki fi
- Posted by Paul Keinanen on February 17th, 2006
On 17 Feb 2006 11:43:54 -0800, randyhyde@earthlink.net wrote:
The RSX-11M system library used quite a lot of this to save a branch
instruction. If two routines were nearly identical with some
alternatives at the end, it was common to start the function with
NAME1:: MOV #5001, R1 ; Move a nonzero flag into R1
other instructions
However, the alternate entry point was at NAME1+2, as if the code had been written:
.WORD 12701 ; The first word of the MOV #5001,R1
NAME2:: CLR R1 ; The 5001 opcode
Without the global symbol table it would have been very hard to detect
the other routine (I hope I remembered the opcodes correctly).
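The overlap is easy to check with a toy decoder. What follows is only a sketch in Python, not a real PDP-11 disassembler: it knows two instructions, MOV #imm,Rn (octal 01270n plus an immediate word) and CLR Rn (octal 00500n), and the function names are made up for illustration. CLR R1 encodes as octal 005001 on the PDP-11, so that is the immediate value used here.

```python
# Toy decoder for just two PDP-11 instructions (octal encodings):
#   01270n + immediate word : MOV #imm, Rn
#   00500n                  : CLR Rn
def decode(words, index):
    """Return (text, length_in_words) for the instruction at words[index]."""
    w = words[index]
    if w & 0o177770 == 0o012700 and index + 1 < len(words):
        return f"MOV #{words[index + 1]:o}, R{w & 7}", 2
    if w & 0o177770 == 0o005000:
        return f"CLR R{w & 7}", 1
    return f".WORD {w:o}", 1          # anything else is shown as data


def sweep(words, start):
    """Decode sequentially from word index `start` to the end."""
    out, i = [], start
    while i < len(words):
        text, n = decode(words, i)
        out.append(text)
        i += n
    return out


code = [0o012701, 0o005001]    # the two words stored at NAME1
from_name1 = sweep(code, 0)    # entering at NAME1
from_name2 = sweep(code, 1)    # entering at NAME1+2 (one word further on)
```

Entering at NAME1 the sweep sees a single two-word MOV; entering one word later the same memory decodes as CLR R1. A disassembler that only follows the first entry point never shows the second routine.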
An alternative would be to unzip/uncrypt a segment of code at run time
into memory.
On architectures that do not support unaligned word or instruction
addresses, include an illegal address trap routine that fixes the illegal
addressing expression in a predefined way and executes the
instruction. If instructions are normally allowed only on word/long
word boundaries, few disassemblers would try to decode instructions at
odd addresses.
A two-pass disassembler is nice to have: the first pass builds just a
symbol table of branch/jump targets and the second pass generates the
output listing and writes pseudo labels only to those locations
referenced. In normal situations, this helps a lot to understand the
code, but it can also be used to alert about branches into
instructions hidden within other instructions or non-word boundary
data.
Alternatively, if the target of a branch appears to be
an illegal instruction, this would suggest that the first pass
incorrectly detected a branch instruction.
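A rough sketch of the two-pass idea, in Python. The instruction set is invented for the example (01 = NOP, 02 imm8 = LOAD, 03 rel8 = BRA, FF = RET), as are the function names; a real disassembler would plug in its own decoder. Pass one records where instructions start and where branches point, pass two labels only the referenced locations, and any branch target that is not an instruction start is reported as suspicious.

```python
# Hypothetical toy instruction set, for illustration only.
LENGTHS = {0x01: 1, 0x02: 2, 0x03: 2, 0xFF: 1}
NAMES = {0x01: "NOP", 0x02: "LOAD", 0x03: "BRA", 0xFF: "RET"}


def first_pass(code):
    """Linear sweep: collect instruction start addresses and branch targets."""
    starts, targets = set(), set()
    pc = 0
    while pc < len(code):
        op = code[pc]
        starts.add(pc)
        if op == 0x03 and pc + 1 < len(code):
            rel = code[pc + 1]
            if rel >= 0x80:                 # sign-extend the 8-bit offset
                rel -= 0x100
            targets.add(pc + 2 + rel)
        pc += LENGTHS.get(op, 1)            # unknown bytes count as 1-byte data
    return starts, targets


def listing(code):
    """Second pass: label referenced addresses, flag targets inside instructions."""
    starts, targets = first_pass(code)
    lines = []
    for pc in sorted(starts):
        op = code[pc]
        label = f"L{pc:04X}:" if pc in targets else ""
        text = NAMES.get(op, f"DB {op:02X}")
        if LENGTHS.get(op) == 2 and pc + 1 < len(code):
            text += f" {code[pc + 1]:02X}"
        lines.append(f"{label:7}{text}")
    suspicious = sorted(targets - starts)
    return lines, suspicious


# LOAD 01 / BRA to address 1 (inside the LOAD) / BRA to address 6 / RET
code = bytes([0x02, 0x01, 0x03, 0xFD, 0x03, 0x00, 0xFF])
lines, suspicious = listing(code)
```

Here the first branch lands on the operand byte of the LOAD, which happens to be a NOP opcode, so address 1 ends up in `suspicious` while the ordinary target at address 6 simply gets a label.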
Paul
- Posted by Jim Stewart on February 17th, 2006
randyhyde@earthlink.net wrote:
Years and years ago I added a subroutine to my x86 library
called print_next. It would print the null-terminated
text string that followed the subroutine's call instruction:
call print_next
db "I'm going to do an ADD next",0
add ax,bx
Of course, the consequences of leaving out the null
terminator are always spectacular (:
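What print_next does with its return address can be modelled in a few lines of Python. This is a sketch of the idea only, not the original library routine: the memory image, the placeholder call displacement, and the function signature are assumptions made for the example.

```python
def print_next(memory, return_address):
    """Model of print_next: the return address points at a NUL-terminated
    string, and execution resumes at the byte after the terminator."""
    end = memory.index(0, return_address)   # raises ValueError if no NUL exists
    text = memory[return_address:end].decode("ascii")
    return text, end + 1


# E8 xx xx is a 16-bit near call (displacement is a placeholder here);
# 01 D8 is add ax,bx.
memory = (bytes([0xE8, 0x00, 0x00])
          + b"I'm going to do an ADD next\x00"
          + bytes([0x01, 0xD8]))
text, resume = print_next(memory, 3)
```

A disassembler that does not know about this convention keeps decoding at offset 3 and turns the string into nonsense instructions. If the terminator is missing, the real routine keeps scanning into whatever follows, which is where the spectacular consequences come from.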
- Posted by Didi on February 17th, 2006
Paul Keinanen wrote:
I remember practically the same technique being used in the 6800
assembler. To branch over one or two bytes when the condition codes
would not have to be preserved, instead of using a two byte "bra"
opcode a cmpa # ($81) or a cpx # ($8c) were used... There were even
macro instructions (called skip1 and skip2) coming with the EQU.SA
file... The days when we had to count every byte are long gone, but
I suppose we have acquired some useful techniques back then which
can be quite an advantage sometimes today.
Dimiter
------------------------------------------------------
Dimiter Popoff Transgalactic Instruments
http://www.tgi-sci.com
------------------------------------------------------
- Posted by randyhyde@earthlink.net on February 17th, 2006
Jim Stewart wrote:
Indeed, this is probably my standard example when someone claims that
they've got a disassembler that can disassemble just about anything.
Cheers,
Randy Hyde
- Posted by randyhyde@earthlink.net on February 17th, 2006
Paul Keinanen wrote:
Actually, most (non-trivial) automatic disassemblies do a control-flow
or data-flow analysis over the code, and it could require several
passes over local areas of the code.
Remember this: ultimately differentiating code and data is an
undecidable problem. But *really good* disassemblers like IDA Pro
*can* do a decent job using static analysis. Rarely perfect (indeed,
usually far from it), but it makes a good "rough estimate" to use
when interactive input from the user begins.
On some architectures, the number of illegal instructions is so small
that this isn't a very useful heuristic. That is, when you get an
illegal instruction, you *know* something went wrong, but the fact that
you've got a legal instruction doesn't imply that the disassembly was
correct (it could still be pointing at data).
I've always wondered if there isn't a way to statistically analyze a
code sequence and determine the probability that the code is actually
doing something useful (i.e., it doesn't contain a sequence of
nonsense, but syntactically correct, instructions).
Cheers,
Randy Hyde
- Posted by David R Brooks on February 17th, 2006
randyhyde@earthlink.net wrote:
[snip]
Certainly, a human can do this quite readily. Several of the tests could
be mechanised: for example, loading a value in a register, then
overwriting it without making any use of it (or its associated condition
codes, if any). In the X86 architecture, one often finds random
occurrences of unexpected floating-point operations.
You can take the output of say, IdaPro, and very quickly exclude blocks
that it has defaulted to assume are code. Observe yourself doing this, &
note what tests you are applying. This should give some leads.
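One of those tests, the register that is loaded and then overwritten without being used, might be mechanised roughly as follows. This Python sketch assumes the instructions have already been decoded into (text, registers written, registers read) tuples, which is a made-up representation for the example; it looks at straight-line code only and ignores condition codes and control flow.

```python
def dead_stores(instructions):
    """Return indices of instructions whose register write is overwritten
    before any later instruction reads that register."""
    pending = {}          # register -> index of a write nobody has read yet
    dead = []
    for i, (_text, written, read) in enumerate(instructions):
        for reg in read:
            pending.pop(reg, None)        # the earlier write was used
        for reg in written:
            if reg in pending:
                dead.append(pending[reg]) # earlier write was never used
            pending[reg] = i
    return sorted(dead)


plausible = [
    ("mov ax, [x]", {"ax"}, set()),
    ("add ax, bx",  {"ax"}, {"ax", "bx"}),
    ("mov [y], ax", set(),  {"ax"}),
]
junk = [
    ("mov ax, 1",  {"ax"}, set()),
    ("mov ax, 2",  {"ax"}, set()),
    ("mov bx, 5",  {"bx"}, set()),
    ("mov bx, ax", {"bx"}, {"ax"}),
]
plausible_hits = dead_stores(plausible)
junk_hits = dead_stores(junk)
```

A block with many such hits per instruction is a candidate for "data that happens to decode", though a high count is only a hint and not proof.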
- Posted by David R Brooks on February 18th, 2006
randyhyde@earthlink.net wrote:
A disassembler strategy could be to start by looking for ASCII strings:
very few architectures will give more than a few printable ASCII
characters in sequence. Find one such string, & you have identified the
"print_next" function at its start. Now you can hunt a string after
every call to that function.
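A sketch of that strategy in Python, assuming 16-bit x86 code with three-byte near calls (E8 followed by a 16-bit displacement). The helper names and the minimum run length are arbitrary choices for the example, and a scan this naive will produce false positives on real binaries.

```python
from collections import Counter


def printable_run(data, start):
    """Length of the printable-ASCII run at `start` if it ends in NUL, else 0."""
    i = start
    while i < len(data) and 0x20 <= data[i] < 0x7F:
        i += 1
    if i < len(data) and data[i] == 0:
        return i - start
    return 0


def candidate_string_calls(data, min_len=4):
    """Find E8 calls that are immediately followed by a NUL-terminated string."""
    hits = []
    for addr in range(len(data) - 3):
        if data[addr] != 0xE8:
            continue
        run = printable_run(data, addr + 3)
        if run >= min_len:
            disp = int.from_bytes(data[addr + 1:addr + 3], "little", signed=True)
            target = (addr + 3 + disp) & 0xFFFF
            hits.append((addr, target, data[addr + 3:addr + 3 + run].decode("ascii")))
    return hits


def call_to(target, at):
    """Encode a near call located at address `at` (helper for the demo image)."""
    disp = (target - (at + 3)) & 0xFFFF
    return bytes([0xE8]) + disp.to_bytes(2, "little")


image = bytearray()
image += call_to(0x0040, len(image)) + b"Hello\x00"
image += bytes([0x01, 0xD8])                      # add ax,bx
image += call_to(0x0040, len(image)) + b"World\x00"
image += bytes([0xC3])                            # ret
hits = candidate_string_calls(bytes(image))
likely_print_next = Counter(t for _, t, _ in hits).most_common(1)[0][0]
```

The call target that shows up most often in front of a string is the best guess for print_next; once it is known, every call to that address can be treated as "string follows, resume after the NUL".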
- Posted by Wilco Dijkstra on February 18th, 2006
<randyhyde@earthlink.net> wrote in message
news:1140218175.931120.263300@g14g2000cwa.googlegroups.com...
This doesn't really pose a problem if you have access to the symbols. Many
tools insert extra symbols that label the bits that follow as code, data,
instruction set X, start/end of a function, which makes correct disassembly
a breeze.
One thing the OP didn't mention would be to use instructions that are
incorrectly disassembled to valid instructions with a different meaning.
On ARM a compare that didn't set the flags(!) was disassembled as a
normal compare. The disassembly looked correct, but behaved differently
from the original code. Unimplemented/reserved instructions, or instructions
with undefined behaviour, are good ways to confuse disassemblers too.
Wilco
- Posted by Heikki Orsila on February 19th, 2006
randyhyde@earthlink.net wrote:
http://en.wikipedia.org/wiki/Trace_vector_decoder
--
Heikki Orsila Barbie's law:
heikki.orsila@iki.fi "Math is hard, let's go shopping!"
http://www.iki.fi/shd
- Posted by Mark McDougall on February 19th, 2006
David R Brooks wrote:
Careful - especially on older embedded systems - this could be ballast code!
Regards,
Mark
- Posted by Jim Stewart on February 19th, 2006
Mark McDougall wrote:
I've also seen the Borland C compiler emit an
occasional 32-bit opcode when compiling to a
16-bit target.
- Posted by Richard H. on February 20th, 2006
randyhyde@earthlink.net wrote:
FYI, Kris Kaspersky's book "Hacker Disassembly Uncovered" has a similar
slant on this topic (though your book will, of course, be different and
better :-) ... http://www.amazon.com/gp/product/1931769222/
For completeness / to one-up his coverage, you might want to review the
techniques covered in his chapters on Counteracting Debuggers and
Counteracting Disassemblers.
Cheers,
Richard
- Posted by randyhyde@earthlink.net on February 21st, 2006
David R Brooks wrote:
Heck yeah. Used it on the 6502 and 680x parts, too.
Well, such a special case is easily stymied by the printf routine I
wrote for the 6502 and 8086:
call printf
byte "I = %d, J=%2d", 0
dword i, j
Of course, the general principle (passing static parameter data in the
code stream) is not something a disassembler is going to be able to
automatically handle (after all, differentiating code and data is an
undecidable problem). Certainly a disassembler can be made smart
enough to recognize some *common* code, like a call to "print", but the
general problem will stump every disassembler out there (well, every
*automatic* disassembler; obviously interactive ones, with the help of a
human operator, munch through this kind of stuff pretty quickly).
Cheers,
Randy Hyde
- Posted by randyhyde@earthlink.net on February 21st, 2006
Richard H. wrote:
Oh, certainly I'm aware of this book (Reversing by Eilam is pretty
good, too).
Cheers,
Randy Hyde
- Posted by randyhyde@earthlink.net on February 21st, 2006
Heikki Orsila wrote:
Yep, similar in principle to "virtual machine" guided disassemblers and
dynamic disassemblers that do an interpretation of the code.
Of course, where these guys fail is when you have lots of code that
doesn't execute on a given run of the program.
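The coverage problem shows up even in a tiny model. The Python sketch below runs a made-up three-opcode machine (01 = NOP, 04 rel8 = branch forward if a flag is set, FF = RET) and records which addresses were executed as opcodes; everything about the machine is an assumption for illustration.

```python
def trace(code, flag):
    """Execute the toy program and return the addresses seen as opcodes."""
    seen, pc = set(), 0
    while pc < len(code) and pc not in seen:   # stop on revisit to avoid loops
        seen.add(pc)
        op = code[pc]
        if op == 0xFF:                         # RET ends the run
            break
        if op == 0x04 and pc + 1 < len(code):  # conditional forward branch
            pc = pc + 2 + (code[pc + 1] if flag else 0)
        else:
            pc += 1                            # NOP and unknown bytes
    return seen


#        0: BR +2   2: NOP  3: RET  4: NOP  5: NOP  6: RET
code = bytes([0x04, 0x02, 0x01, 0xFF, 0x01, 0x01, 0xFF])
run_flag_clear = trace(code, False)
run_flag_set = trace(code, True)
```

Each run proves that the bytes it executed are code, but the other arm of the branch stays unclassified until some input drives execution there.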
Cheers,
Randy Hyde
- Posted by Richard H. on February 21st, 2006
randyhyde@earthlink.net wrote:
:-) I figured as much. Yeah, Eilam's Reversing book is good too, but in
more general terms and techniques. I liked Kaspersky's specific
examples; his book ended up with a whole lot more highlights and
bookmarks in it. :-)
However, he absolutely drove me nuts by moving from one example to the
next with virtually no introduction / transition / etc. - skip a
paragraph and you'll lose content entirely. Suddenly the discussion of
obfuscating login authentication routines turns into an analysis of the
crypto behavior, because he's moved on to a completely different target
program.
My $.02... get a good tech editor to keep you honest. Hopefully they'll
have grammatical skills too, because God knows that run-of-the-mill
editors won't be able to comment on your readability. Oh, and have a
LOT of your material drafted before you commit to a publisher - they'll
put you on a schedule that demands 40+ hours a week, and it'll be tough
to hit your deadlines.
Good luck on your endeavor. If you're in it for the money, stop now.
:-) It's a lot of work, and the return is small unless you're on the
leading edge of the next hot technology. But it's good for the ego and
the resume...
Cheers,
Richard