How to stymie a disassembler
Posted by randyhyde@earthlink.net on February 17th, 2006


Hi All,
I'm collecting little tricks that will stymie a disassembler (that is,
prevent it from disassembling the code correctly) to use in a book
project I'm working on ("The Art of Disassembly"). I've collected a
bunch of tricks over the years (OhMyGosh, it's getting to be decades
now), but chances are pretty good that I've missed some pretty good
ones.

Here are some of the ideas I'm using in the book:


1. Burying data in the code stream
2. Placing code in the middle of data objects (a variant of [1]).
3. Arithmetic expressions involving two relocatable addresses (e.g.,
lbl1-lbl2)
4. Burying instructions within the opcodes of other instructions
5. Using alignment operations in code and data
6. Writing code that does not have well-defined procedure/function
boundaries
7. Overlapping data tables and, in general, making data boundaries
fuzzy.
8. Using unions and variant types to make it difficult to infer a data
object's type
9. Writing interpreters that allow a mixture of 80x86 and interpretive
code in the code stream
10. Using the breakpoint (int 3) and trace flag facilities within the
application
11. Using the machine instructions that correspond to a copyright
notice (or other string) to do useful computations within the program.
12. Using the data at some location as both program data and executable
machine code (a generalization of [11]). (This includes self-modifying
code, for example.)
13. Using lots of dynamically-linked libraries to make it difficult (or
even impossible) for a disassembler to infer much about the external
code.
14. Creating wrappers for system APIs to make it difficult for
heuristic analysis to make any headway processing those calls.
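
As a rough illustration of trick [1], here is a minimal Python sketch of why data buried in the code stream derails a naive linear-sweep disassembler. The decoder models only a handful of one-byte 32-bit x86 opcodes (an assumption to keep it short), and the byte sequence is a made-up example, not taken from any real program.

```python
# Toy decoder: only these few 32-bit x86 opcodes are modeled (assumption for brevity).
LENGTHS = {0x90: 1, 0x40: 1, 0xC3: 1, 0xEB: 2, 0xB8: 5}
NAMES = {0x90: "nop", 0x40: "inc eax", 0xC3: "ret",
         0xEB: "jmp short", 0xB8: "mov eax, imm32"}

def linear_sweep(code):
    """Decode instructions back to back, ignoring control flow."""
    out, pc = [], 0
    while pc < len(code):
        op = code[pc]
        length = LENGTHS.get(op)
        if length is None or pc + length > len(code):
            out.append((pc, "db"))      # unknown or truncated: emit a data byte
            pc += 1
        else:
            out.append((pc, NAMES[op]))
            pc += length
    return out

def follow_flow(code):
    """Decode along the path the CPU would take (unconditional jumps only)."""
    out, pc, seen = [], 0, set()
    while 0 <= pc < len(code) and pc not in seen and code[pc] in LENGTHS:
        seen.add(pc)
        op = code[pc]
        out.append((pc, NAMES[op]))
        if op == 0xC3:                  # ret ends the path
            break
        if op == 0xEB:                  # jmp short: signed 8-bit displacement
            rel = code[pc + 1]
            rel = rel - 256 if rel >= 128 else rel
            pc = pc + 2 + rel
        else:
            pc += LENGTHS[op]
    return out

# jmp +1 ; one junk byte (0xB8, the MOV EAX opcode) ; inc eax x4 ; ret
code = bytes([0xEB, 0x01, 0xB8, 0x40, 0x40, 0x40, 0x40, 0xC3])

for addr, text in linear_sweep(code):
    print("sweep %02x: %s" % (addr, text))
for addr, text in follow_flow(code):
    print("flow  %02x: %s" % (addr, text))
```

The sweep reports a jump, one MOV EAX, imm32, and a RET; the four INC EAX instructions that actually execute are swallowed as the MOV's immediate. Following the jump finds them at offsets 3 through 6.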


My interest in this subject is twofold. I want to be able to discuss
how to overcome these problems when using (or writing) a disassembler;
I also want to discuss how to help obfuscate object code to make it
difficult to disassemble. Any and all constructive comments,
suggestions, and examples are welcome.
Cheers,
Randy Hyde

Posted by Thor Lancelot Simon on February 17th, 2006


In article <1140205184.912955.210180@f14g2000cwb.googlegroups.com>,
randyhyde@earthlink.net <randyhyde@earthlink.net> wrote:
No technique for doing this is likely to prove effective against any
but a very, very casual adversary. That said, one of the better ones
I know is to write one's code directly in a machine language of one's
own invention, and make the "main program" simply a virtual machine
that executes the machine language in question.
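
A minimal sketch of the virtual-machine idea, assuming a made-up stack-machine bytecode. The opcode values and names below are hypothetical and chosen only for illustration; the point is that a native disassembler sees just the interpreter loop, while the real program is an opaque byte string.

```python
# Hypothetical instruction set: these opcode values are invented for the sketch.
PUSH, ADD, MUL, HALT = 0x10, 0x20, 0x21, 0xFF

def run(program):
    """Fetch-decode-execute loop for a tiny stack machine."""
    stack, pc = [], 0
    while True:
        op = program[pc]
        pc += 1
        if op == PUSH:                  # PUSH is followed by one operand byte
            stack.append(program[pc])
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError("bad opcode 0x%02x at %d" % (op, pc - 1))

# The "real" program, written in the invented machine language: (2 + 3) * 4
program = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT])
print(run(program))
```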

--
Thor Lancelot Simon tls@rek.tjls.com

"We cannot usually in social life pursue a single value or a single moral
aim, untroubled by the need to compromise with others." - H.L.A. Hart

Posted by Arthur J. O'Dwyer on February 17th, 2006



On Fri, 17 Feb 2006, randyhyde@earthlink.net wrote:
We heard you the first time, in
Message-ID: <1140127022.351589.57790@g14g2000cwa.googlegroups.com>

Some people even nicely responded with good answers and additions
to your list. But if you keep posting the same question, without giving
any indication that you've read the responses already given, some
people are going to pigeonhole you as someone not worth their time.
So watch out for that, eh?

Have a nice day,
-Arthur

Posted by Andrew Reilly on February 19th, 2006


On Fri, 17 Feb 2006 12:39:44 -0800, randyhyde@earthlink.net wrote:
One I ran into just this week is a variation on that, one that sort of
comes for free on ISAs with variable-length instructions where there's
some advantage to aligning branch targets: the data in the alignment
padding might disassemble into something that overlaps the branch target
address, thus screwing up the disassembly at that address until a run of
short instructions allows the disassembler to re-synchronize.
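
The resynchronization effect can be sketched with a toy decoder. This is an x86-flavored illustration under assumptions: only a few one-byte opcodes are modeled, and the padding bytes are invented for the example rather than taken from any real compiler's output.

```python
# Instruction lengths for the few 32-bit x86 opcodes modeled here (assumption).
LENGTHS = {0x90: 1, 0x40: 1, 0xC3: 1, 0xB8: 5}

def sweep_starts(code, start=0):
    """Offsets that a linear sweep beginning at `start` treats as instruction starts."""
    starts, pc = [], start
    while pc < len(code):
        starts.append(pc)
        pc += LENGTHS.get(code[pc], 1)  # unknown bytes count as 1-byte data
    return starts

code = bytes([0x40, 0xC3,                     # inc eax ; ret (code ends at offset 1)
              0xB8, 0x00,                     # padding that looks like the start of MOV EAX, imm32
              0x40, 0x40, 0x40, 0x90, 0xC3])  # aligned branch target at offset 4
TARGET = 4

swept = sweep_starts(code)              # what a naive disassembler decodes
real = sweep_starts(code, TARGET)       # what executes from the branch target
resync = min(set(swept) & set(real))    # first offset where the two agree again
print(swept, real, resync)
```

Here the padding swallows the three instructions at offsets 4 to 6, and the sweep only lines up with the real stream again at offset 7.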

Dunno why I've never actually seen this on x86 code. Guess I don't do
much x86 disassembling. Maybe the disassemblers pay more attention to
what's going on. I saw this on a ColdFire (68000 derivative) and it
happened quite readily. (The compiler seemed to like aligning branch
targets to multiples of eight, for some reason).

Cheers,

--
Andrew


Posted by Stephen Fuld on February 19th, 2006



<randyhyde@earthlink.net> wrote in message
news:1140205184.912955.210180@f14g2000cwb.googlegroups.com...
Rather than trying to bury all, or a substantial part, of the code in one of
the ways you suggest above, you could encrypt the code and bury just the
decryption routine, which would be much smaller and thus probably easier to
"hide".

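A minimal sketch of the idea, assuming a single-byte XOR as a stand-in for real encryption. The key and the payload bytes are hypothetical; the payload is just INC EAX, INC EAX, RET in 32-bit x86.

```python
KEY = 0x5A  # hypothetical key, for illustration only

def xor_bytes(data, key=KEY):
    """XOR every byte with the key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)

plain = bytes([0x40, 0x40, 0xC3])   # inc eax ; inc eax ; ret
stored = xor_bytes(plain)           # what sits in the executable on disk
restored = xor_bytes(stored)        # what the small decryption routine produces

print(stored.hex(), restored.hex())
```

A static disassembler only ever sees the stored bytes, which no longer decode as the original instructions. Once the routine has run, the plain bytes sit in memory.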
--
- Stephen Fuld
e-mail address disguised to prevent spam



Posted by Terje Mathisen on February 20th, 2006


Andrew Reilly wrote:
The best way to do this is by using a conditional jump that you know
will always be taken, then carefully match up the overlapping opcodes,
and finally make the branch/jump to the real target hard to locate.

On a 486, the fastest strcpy() code I could come up with used a skewed
loop, where the first iteration has to skip the top of the loop:

Instead of using a JMP opcode to do this, I had a TEST EAX,12345678
instruction, where the 4-byte immediate constant was replaced with the
four opcode bytes in the top of the loop.

I.e. I replaced a two-byte immediate jump with a single-byte TEST, while
getting rid of a taken branch: This saved both a byte of code and 3
cycles (1 instead of 4).
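
A toy model of this overlap, not the actual strcpy() code: the loop body bytes are stand-ins, and only a few 32-bit x86 opcodes are modeled. Opcode 0xA9 is TEST EAX, imm32; in this sketch its four immediate bytes are also the first four bytes of the loop, which the back branch targets at offset 1.

```python
# Lengths and names for the few 32-bit x86 opcodes modeled here (assumption).
LENGTHS = {0x90: 1, 0x40: 1, 0xC3: 1, 0x75: 2, 0xA9: 5}
NAMES = {0x90: "nop", 0x40: "inc eax", 0xC3: "ret",
         0x75: "jnz short", 0xA9: "test eax, imm32"}

def decode(code, start):
    """Decode straight through from `start` to the end of the buffer."""
    out, pc = [], start
    while pc < len(code):
        op = code[pc]
        out.append((pc, NAMES[op]))
        pc += LENGTHS[op]
    return out

def branch_target(code, pc):
    """Target of the short branch at `pc` (signed 8-bit displacement)."""
    rel = code[pc + 1]
    rel = rel - 256 if rel >= 128 else rel
    return pc + 2 + rel

code = bytes([0xA9,                     # entry: test eax, imm32
              0x40, 0x40, 0x90, 0x90,   # top of loop, doubling as the immediate
              0x40,                     # rest of the loop body (stand-in)
              0x75, 0xF9,               # jnz back to the top of the loop
              0xC3])                    # ret

entry_view = decode(code, 0)            # first pass: TEST skips the loop top
loop_view = decode(code, 1)             # later passes: the loop top executes
print(entry_view)
print(loop_view)
print(branch_target(code, 6))
```

Decoded from the entry point, the first four loop bytes vanish into the TEST; decoded from the branch target, they are ordinary instructions. A disassembler that only sweeps from offset 0 never shows them.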

Terje

--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

Posted by MitchAlsup@aol.com on February 20th, 2006


One item not seen in your list was a trick we sometimes used in 68010
codes.

Consider an if-then-else statement where the else clause was not null but
could be performed with 1 or 2 bytes of opcode. Instead of having the
then-clause jump over the else-clause, we could insert a move-immediate
of the size of the else clause and consume the else clause as data
instead of as instructions. This saves a handful of cycles by avoiding
a jump instruction and the disruption it causes to the instruction
fetch process.

Posted by randyhyde@earthlink.net on February 21st, 2006



Arthur J. O'Dwyer wrote:
And what makes you think that I'm not responding?
Believe it or not, some people *do* have a life outside the internet
and happen to take off for a couple days over long weekends.
Cheers,
Randy Hyde


Posted by randyhyde@earthlink.net on February 21st, 2006



Stephen Fuld wrote:
But having only one routine, buried or otherwise, makes it *really*
easy for a hacker with a decent debugger or dynamic disassembler to
pick the whole thing apart.
Cheers,
Randy Hyde


Posted by randyhyde@earthlink.net on February 21st, 2006



MitchAlsup@aol.com wrote:
That would be option #4: burying the instruction inside the opcode of
other instructions.
Very nasty trick. Definitely screws up a disassembler. An interactive
disassembler and a human can still pick it apart, but it's not easy.
Cheers,
Randy Hyde


Posted by randyhyde@earthlink.net on February 21st, 2006



Andrew Reilly wrote:
This is actually quite a good idea. Indeed, someone over in
comp.lang.asm.x86 recommended using instruction prefix bytes like $0F
that completely change the meaning of any following instructions. This
can definitely screw up a disassembler and make it difficult for a
human to help get the disassembler back on track.

Certainly I see this all the time between procedures. And branch
targets that follow some control-transfer instruction are often aligned
as well (by compilers; you don't see this *very* often in hand-written
assembly language).

Probably cache-line or bus size it was optimizing for. Just guessing.
Cheers,
Randy Hyde


Posted by glen herrmannsfeldt on February 21st, 2006



MitchAlsup@aol.com wrote:

I used to see it in 6809 code, and it likely is used in 8080 and 6502
code, too.

-- glen


Posted by Andy Glew on February 21st, 2006


Code disassemblers have problems with:

* Executing code at several different, overlapping, alignments:

E.g. branch to label L, and see a 3 byte instruction L,L+1,L+2, followed by ...

But branch to L+1, and see a 5 byte instruction L+1,L+2,L+3,L+4,L+5

* code that works when A20M is asserted, but not otherwise...

* execute code out of a set of hardware memory mapped I/O registers.

e.g. on the 1st pass L,L+1, etc. are a set of instructions I0;
but when you branch back to L, you get a different set of instructions I1, etc.

Posted by Ken Hagan on February 21st, 2006


Andy Glew wrote:
Presumably a disassembler has no trouble listing the instructions
that begin at address L or L+1 and if it can see a branch to both
L and L+1 then it knows it has to list both instruction streams.

It then has the problem that there is no way to represent the
overlapping instruction streams in an ASM file. This *is* a
problem, but there is some sense in which it is a limitation
of the *assembler*, not the disassembler.

Posted by Oliver Wong on February 21st, 2006



<randyhyde@earthlink.net> wrote in message
news:1140498900.993141.177080@g47g2000cwa.googlegroups.com...
The accusation is not that you're not responding; it's that you're
unaware that other people have been responding. The evidence Arthur was
providing that led to this accusation was that you posted the same question
twice, without rephrasing the question to show why the answers you received
thus far were unsatisfactory.

<Analogy>
A: What's 1 + 1?
B: 2.
C: Depends on the base. Could be 10 in binary.
A: What's 1 + 1? Anybody?
B: Ermm... We told you already...
</Analogy>

I'm guessing though that the root of the problem is actually that you
multi-posted instead of crossposted. See
http://smjg.port5.com/faqs/usenet/xpost.html to understand why people don't
like people who multi-post.

- Oliver


Posted by randyhyde@earthlink.net on February 21st, 2006



Oliver Wong wrote:
Go to alt.lang.asm to see why I multipost rather than cross-post. You
wouldn't like the responses that occur in some newsgroups appearing
here.

My apologies if two posts went to this same newsgroup. It may very well
have been a case of cross-posting (as I *did* cross post to a couple of
benign newsgroups).
Cheers,
Randy Hyde


Posted by Morten Reistad on February 21st, 2006


In article <1140499133.716038.252530@g14g2000cwa.googlegroups.com>,
randyhyde@earthlink.net <randyhyde@earthlink.net> wrote:
There are some variants on this trick, and on #9. Examples are
threaded code (as in Forth or PostScript).

Also, generating code to be executed. Two varieties; making small
code snippets in registers and jumping to them (not doable on x86);
and generating small code thunks that can be executed.
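
A rough sketch of threaded code in the Forth style, with Python function references standing in for the addresses a native threaded program would store. The word names here are made up for illustration.

```python
def lit(n):
    """Build a word that pushes the constant n."""
    return lambda stack: stack.append(n)

def dup(stack):
    stack.append(stack[-1])

def add(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

def mul(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)

def run(thread):
    """Inner interpreter: fetch the next word reference and call it."""
    stack = []
    for word in thread:
        word(stack)
    return stack

# The program is a list of references, not a sequence of instructions.
thread = [lit(3), dup, mul, lit(1), add]    # computes 3 * 3 + 1
print(run(thread))
```

In a native implementation the thread is a table of addresses, so a disassembler looking at it sees data words rather than machine instructions.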

-- mrr





Posted by Ron Nicholson on February 22nd, 2006


Stephen Fuld wrote:
This is a variant of #12, which is usually analyzed by just waiting
until the decryption routine returns before dumping memory for
disassembly (often inside a system simulator or emulator).

Slightly more obscure are routines that modify their own opcode stream
both just ahead of and just behind the cache and/or instruction
prefetch queue. Quite implementation dependent of course. Simulators
often don't take caches and prefetch queues into account at this level
of detail, and any interrupts or traps will possibly change the
behavior of this kind of self-modifying code.

A variant of #9 is for an interpreter to also modify its own opcode
lookup table as it executes so that there is no consistent opcode
mapping.
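
A minimal sketch of such a shifting opcode table, under assumptions: the operation names, the four-entry table, and the rotate-by-one rule are all invented for the example. The encoder has to track the same rotation as the interpreter, so the same operation gets a different byte depending on where it appears.

```python
OPS = ["push", "add", "mul", "halt"]    # initial opcode table (index = opcode)

def rotate(table):
    """Hypothetical remapping rule: shift the table by one after each instruction."""
    return table[1:] + table[:1]

def encode(source):
    """Assemble (name, operand...) tuples using the table in effect at each step."""
    table, out = list(OPS), []
    for instr in source:
        out.append(table.index(instr[0]))
        out.extend(instr[1:])
        table = rotate(table)
    return bytes(out)

def run(code):
    """Interpret the byte string, remapping opcodes as execution proceeds."""
    table, stack, pc = list(OPS), [], 0
    while True:
        op = table[code[pc]]
        pc += 1
        if op == "push":
            stack.append(code[pc])
            pc += 1
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:                           # halt
            return stack.pop()
        table = rotate(table)

source = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",), ("halt",)]
code = encode(source)
print(list(code), run(code))
```

The two leading pushes assemble to different opcode bytes, so a static opcode map recovered from one part of the byte stream does not apply to the rest.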

A lot of these obfuscations might be defeated by just tracking register
and memory read/writes inside some sandbox (emulator, simulator,
logic analyzer hooked to an FPGA RTL equivalent CPU, etc.), and
then inferring an architecturally independent instruction stream that
would produce those data transitions.


IMHO. YMMV.
--
rhn A.T nicholson d.0.t C-o-M


Posted by Robert Finch on February 22nd, 2006


<randyhyde@earthlink.net> wrote in message
news:1140205184.912955.210180@f14g2000cwb.googlegroups.com...
I just use a processor that uses instruction block encryption. Unless you
know the encryption key, it's impossible to disassemble.






Posted by Zak on February 23rd, 2006


Ron Nicholson wrote:

Does anyone (ab)use external devices? For example, poke some I/O device
and poll for the result - if running too slowly, the wrong data is read
back or code is overwritten?


Thomas

