* References: * svx/source/msfilter/msvbasic.cxx * Costin Raiu, Kaspersky Labs, 'Apple of Discord' * Virus bulletin's bontchev.pdf, svajcer.pdf Using a hacked libgsf's test-cp-msole.c: * round-trip - works fine, 'security warning from OXP' * deleting the __SRP_* streams: works fine ... 'Simple' stream: * Simple: 'SGetThree' -> 'SGetThack' - no pcode changes ... + no visible changes * Compressed text: + 0d 0a is the VBA test line terminator + Screwing up the compression can cause _serious_ artefacts - ie. Excel loop/lock / crash etc. * VBA 5 -> VBA 6 ... (?) * pcode line table start marker: FE CA 01 00 + following 2 bytes : seemingly not a clobberable version. + blanking (0x00) pcode between 42 81 0C 00 and ff ff ff ff - gives empty 1st macro in spreadsheet. + clobbering with 0xff's results in XP crasher. * clobbering the pcode results in the text changing ie. For i = 0 To 10 -> For i = 0 To 31 - with no other change. * clobbered pcode with 0's from & inc. feca0100 * ie. pcode totally ignored. * Can totally wipe all data from start to "01 ae b0 00'A''t''t'," none of it used (seemingly) * Can totally wipe all data _except_ pre-pended 3 items: 0x01 0xae 0xb0 - as long as we update offsets in 'dir' stream; see later. * re-compressing seems to cause some nasty grief; _looks_ like a length is stored somewhere here. Our version is shrunk by N bytes, - we get N bytes of de-compressed crud at the end of the macro. '_VBA_PROJECT' stream: * clobber byte 3 of _VBA_PROJECT to 0xa8 - also works going backwards: 0x01 + no need to clobber pcode table ... [ presumably not read !? ] * Office XP only requires this data in this file: + cc 61 <01> 00 00 01 + pad to 32 bytes with 0s. * clobber macro name - reflected in re-load in [functions] list - ordering odd there though. Generic stream description: + first byte '1' + 2nd byte == length of compressed data - 1 + 3rd byte == 0xb + 4 bits MSB of compressed data length + maximum recorded compressed size is 0xc0c ie 0x010cbc above which it (appears) that it reads to the EOS. + 4th ... bytes = LZSS compressed macro text. + VBA editor can't cope with unicode - just barfs + VBA editor can cope with 8bit chars - and writes them out as such too. + can insert invalid / broken VBA with no problems. All streams contain compressed text: + Sheet / Workbook / VBA etc. all re-constructed ... [!] 'dir' stream: * Looks like the project stream - but more authoritative ? * Compression is (essentially) LZSS with no Huffman post-coding, and some window size tweakage. * Contains authoritative offsets for VBA text in streams * re-compression a total pain; managed to get Excel to accept a re-compressed 'dir' - by truncating a number of the longer back-matches to < 1/2 the window size. * altering the offsets - works. * Stream format is: + 2 byte tag, 4 byte length, [ repeat ] + the tag 0x9 is broken: it's length is 2 bytes too short. * Each Stream has 4 strings: ascii/unicode pairs + unicode canonical; 1st pair: unknown, 2nd: stream name. + serious problems if all names not in sync (hmm) * Cutting / reduction: + Removing both external linkage records + prepended 2 name pair works fine. + fiddling with the object count - appears to have little effect on OfficeXP - reads to EOF. + removing objects: + removing SHORT_OBJECT_COUNT -> Sheet1's name -> crashes Excel (even with decrementing the count) + If the equivalent PROJECT reference removed - works fine, can remove all Sheets. + can remove workbook if not in PROJECT. + not understood fields: + Fields 0x13, 0x2c - can be 00'd with no apparent problems + Binning all the VBA stream names (ie. not Sheet*,wbk) results in very odd loading - populates project tree just fine, (from OLE stream names?) - but code view / calculate hangs the editor; hmm. * finally realised that the _VBA_PROJECT_CUR/PROJECT and PROJECTwm streams are rather important. * PROJECTwm - looks like an i18n hash; 8bit -> 16bit; simple list, all 0 terminated with a trailing 16bits of 0. + what is VersionCompatible32="393222000" ? * Office 97 quirks: + exports a '_VBA_PROJECT' storage - but can just trash it. + this is for backwards compatibility with Office 95 + Office 95 doesn't have a _VBA_PROJECT stream. + everything very fragile until you delete the __SRP_X streams. + Clobbering the 'byte 3' _VBA_PROJECT version works nicely, [ you can flawlessly export OfficeXP to Office97 macros ]. + Binning the 'PROJECTwm' stream appears to have no effect though Also mangling down the PROJECTwm set to only the macro names works fine. + Serious problems arise when we omit the 'Sheet 1' type references in the PROJECT stream. + we can omit them from the [Workspace] section, + we must have lots of 'Document=' nodes. + [HostExtenderInfo] not necessary. + HelpContextID, HelpFile, Name not necessary. + CMG, DPB, GC: not necessary + VersionCompatible32="393222000" not necessary + Sheet names must be there; Foo1,Foo2,Foo3 no good. + they must exactly match the 'dir' entries [I guess] + The 'dir' entries must be there too ... + Workbook & Sheet streams ... The 'Workbook' stream must contain _at least_: + Attribute VB_Base = "0{00020819-0000-0000-C000-000000000046}" The 'Sheet' stream must contain _at least_: + Attribute VB_Base = "0{00020820-0000-0000-C000-000000000046}" + Naming consistency + 'Workbook' name (stream & PROJECT) _must_ match with Excel 'CODENAME' (0x01ba) record, or SEGV result. + Similarly, Sheet 'CODENAME' records must also match. [ XclCodename record ] + Office 97 also segv's if we don't have a valuid + Attribute VB_Name = "" record in the macro ... ** Appendix 1: Random _VBA_PROJECT observations: + Examining a typical stream; 00000000 cc 61 5e 00 00 01 00 ff 09 08 00 00 09 04 00 00 |Ìa^....ÿ........| ^^^^^^^^^^-1-^^^^^^^^^^ ^^^^-2-^^^^ ^^^^-3-^^^^ 00000010 e4 04 01 00 00 00 00 00 00 00 00 00 01 00 05 00 |ä...............| ^-4-^ ^-A-^ ^-B-^ ^-C-^ ^-D-^ ^-E-^ ^-F-^ ^-G-^ 00000020 02 00 16 01 2a 00 5c 00 47 00 7b 00 30 00 30 00 |....*.\.G.{.0.0.| ^-H-^ [1] - version / magic [2/3] - language identifiers [4] - char-set Lengths: [A] - [B] - [C] - [D] - [E] - [F] - [G] - number of plugins / strings. [H] - ... a number of strings ... VBA section: 00000770 0b 00 02 00 02 00 02 00 02 00 01 00 01 00 01 00 |................| ^-1-^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^-2-^^^^^^^^^^^^ 00000780 01 00 02 04 02 04 02 04 03 00 04 02 00 00 06 02 |................| ^^^^^^^^^^^^^^^^^^^^^^^ ^-3-^ ^^^^^^^^^^^ ^-4-^ 00000790 01 00 08 02 00 00 0a 02 ff ff ff ff ff ff 00 00 |................| ^^^^^^^^^^^^^^^^^ ^^^^^^^^^-5-^^^^^^^^^^^^ -000007a0 00 00 ff ff 00 00 5a f6 67 40 03 00 00 00 01 00 |......Z.g@......| This is all tedious overall project stuff so far ... [1] - count of VBA project elements ... [2] - element types - each USHORT: 0x1 == VBA module, 0x2 == sheet/workbook, 0x402 == class or form [3] - count of int32s - tags of some form (?); effectively magic: [5] - an offset table in 16bit chunks (?) + fixed length of 100bytes