The "Pros" and "Cons" of WordBasic Virus Upconversion


Vesselin Bontchev, anti–virus researcher
FRISK Software International
Postholf 7180, 127 Reykjavik, ICELAND
E–mail: bontchev@complex.is

Abstract: The built–in capability of Microsoft Word 97 to convert WordBasic viruses to VBA5 presents a serious ethical dilemma to anti–virus researchers. On the one hand, they have to protect their users against these converted viruses. On the other hand, in order to implement detection of the converted viruses, the anti–virus researchers must first have them. The obvious way to achieve this is to perform the conversion on all WordMacro viruses in their collections. However, this would mean that the anti–virus producers must create new viruses, an activity they are ethically bound to avoid at any price. This paper describes the problems related to this dilemma in detail, considers the level of threat represented by the upconverted viruses, and also provides a satisfactory technical solution to the problem.

Table of Contents

1. Introduction

1.1. Welcome to Office 97

1.2. Anti–virus Capabilities of Office 97

1.3. New Viruses Produced by Office 97

1.4. Definitions and Terminology

2. The Need for Macro Virus Upconversion

3. The Problems of Macro Virus Upconversion

3.1. Are the Natural Upconverts a Real Threat?

3.2. Ambiguous Upconversion

3.2.1. Empty Lines

3.2.2. White Space

3.2.3. Trailing Blanks in Comments

3.2.4. Letter Case of the Identifiers

3.2.5. Module Renaming

3.2.6. Foreign Characters in Literal Strings

3.2.7. Other Peculiarities

3.3. Damaging the Image of the Anti–Virus Industry

3.4. Testers Creating Viruses

3.5. Stimulation of Virus Creation and Distribution

4. The Solution

4.1. Required Criteria

4.2. The Methods

4.2.1. Scan Strings

4.2.2. Upconversion of Single Macros

4.2.3. Upconversion in Parts

4.2.4. Upconversion of Disabled Viruses

4.2.5. VBA5 Heuristics

4.3. Advantages

4.4. Disadvantages

5. Conclusion

6. References


1. Introduction

At the beginning of 1997, Microsoft introduced a major update of their immensely popular office automation package, Microsoft Office. The new version was called, unimaginatively, Office 97. It was met with mixed feelings by the users: it was much more bloated than the previous version, was slower, and required more memory, but it also crashed significantly less often and had many nice new features. However, it caused nightmares among the anti–virus people.

1.1. Welcome to Office 97

The applications of Microsoft’s Office 97 package sport a common scripting language—Visual Basic for Applications version 5 (VBA5). The word processor of the previous versions of the package, Microsoft Word, used a very different language—WordBasic. Since a large pool of user programs written in WordBasic already existed, Microsoft had to provide some way of porting these programs to the new language, VBA5, which is used by Word 97. (Word 97 can no longer run WordBasic programs.)

Understandably, Microsoft wanted to make this porting as easy and straightforward as possible, in order to facilitate the upgrade to Office 97 and encourage users to move to the new package. In particular, they decided to ship a WordBasic–to–VBA5 converter and make it part of the converter which automatically converts documents created with the old versions of Word into the format used by Word 97. Unfortunately, as usual, Microsoft decided to ignore the security–related aspects of the problem.

1.2. Anti–virus Capabilities of Office 97

Fortunately, not all security–related aspects were ignored. First of all, not all WordBasic programs are converted correctly to equivalent VBA5 ones—mostly due to bugs in the converter and incompatibilities between the two languages (some WordBasic commands simply have no VBA5 equivalents).

Second, at the last moment before releasing Office 97 officially, Microsoft introduced some virus–specific anti–virus protection in the WordBasic–to–VBA5 converter. The converter uses a library (WWINTL32.DLL) which, among other things, contains text strings present in some of the well–known viruses at the time. (With the potential to scare the users if they see a text like "You are now infected by the Alliance" in their files.) WordBasic macros which contain these texts are simply not converted to VBA5. This is relatively useful, in the sense that it stops most of the very widespread viruses—Concept, Wazzu, Npad, NOP, etc.

Unfortunately, the protection described above does not work as well as it should. First of all, it is fixed. It cannot be updated. A much more sensible approach would have been to provide standard hooks and APIs, so that the anti–virus producers could plug in their own virus–specific protections. As it is now, there are hundreds of WordBasic viruses which are not stopped by the protection, either because they appeared after Office 97 was released, or because Microsoft did not know about them at the time when the protection was developed.

Second, it is obvious that the scan strings used have not been selected by a professional anti–virus researcher. Using simple text strings as scan strings is a naïve technique which was made obsolete by developments in anti–virus technology a long time ago, in about 1987. Furthermore, the scan strings are not selected from the most appropriate part of each virus and are not optimized to detect as many new variants of the known viruses as possible.

Third, and probably most important, the protection is present only in the official release of Office 97, because it was introduced at the very last moment. Unfortunately, due to Microsoft’s marketing strategy, zillions of beta versions of Office 97 have been distributed, versions which do not have this protection and would happily convert any WordBasic virus to a VBA5 program. And, once converted, the new virus will spread successfully even on systems which have the protection, because the scan strings are used only by the converter from WordBasic to VBA5, not by Word 97 itself.
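The weakness of a fixed list of plain text strings can be illustrated with a minimal Python sketch. It is only a model of the behavior described above, not the actual logic of WWINTL32.DLL: the function name `converter_would_block` and the exact, case-sensitive matching are assumptions, and the list contains only the one string quoted in the text.

```python
# Hypothetical model of a fixed, text-based block list in the converter.
# Only the string quoted in the paper is included; the real list is not known here.
BLOCKED_STRINGS = (
    "You are now infected by the Alliance",
)


def converter_would_block(macro_source: str) -> bool:
    """Return True if the macro contains one of the fixed text strings.

    Assumption: plain, case-sensitive substring matching.
    """
    return any(text in macro_source for text in BLOCKED_STRINGS)


known = 'MsgBox "You are now infected by the Alliance"'
variant = 'MsgBox "You are now infected by the A11iance"'

print(converter_would_block(known))    # the known text is matched
print(converter_would_block(variant))  # a trivially edited variant is not
```

Under these assumptions, any variant which changes a single character of the displayed message is no longer matched, and the list itself cannot be extended after release.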

Actually, the applications of the Office 97 suite have yet another kind of anti–virus protection. When the user attempts to open a document containing macros, menus, toolbars, key shortcuts, and other "customizations" (as Microsoft calls them), the respective application (e.g., Word 97) displays a warning and allows the user to open the document in read–only mode and with the macros and the customizations disabled.

Unfortunately, this protection is not virus–specific and, therefore, constantly causes false positives—because it is designed to detect macros and customizations; not viruses. This is likely to force the users to switch it off—especially having in mind that Microsoft has made the process of switching it off extremely easy (much easier than switching it back on) by providing the appropriate checkbox right on the dialog which displays the warning. Furthermore, as explained in [Bontchev96], it is possible for a macro virus to bypass the protection—because it has not been designed by professional anti–virus researchers and security experts. (The paper [Bontchev96] discusses the protection used by Word 7.0a, but it is extremely similar to that used by Word 97.)

Finally, the SR–1 patch of Office 97 adds yet another anti–virus feature to Word 97; it is discussed in detail in Section 3.1.

1.3. New Viruses Produced by Office 97

The main problem comes from the fact that useful, legitimate, user–created WordBasic programs are far from being the only WordBasic programs that exist. In particular, at the time Office 97 was released, about a thousand WordBasic macro viruses existed, and new ones kept being created at an alarmingly high rate.

Since computer viruses written in WordBasic are just WordBasic programs like any others, the existence of a converter like the one provided by Microsoft meant that all these viruses would be converted into the new language as well.

We would like to emphasize that WordBasic and VBA5 are not just two dialects of BASIC; they are two completely different languages. VBA5 is object–oriented while WordBasic is not. WordBasic is tokenized and interpreted like most dialects of BASIC, while VBA5 is compiled into p–code for a virtual stack machine and then this p–code is interpreted, a technique which was more popular with some portable Pascal implementations (e.g., UCSD Pascal). Finally, the internal representations of a WordBasic program and its VBA5 conversion are entirely different (see Appendix A).

Due to these facts, the anti–virus researchers are forced to consider the conversion of a known WordBasic virus to a VBA5 program to be a new virus ([VB97]). And indeed, detecting this VBA5 program requires a totally new approach and a complete re–design of the anti–virus products, which is why it took months for the anti–virus producers to come up with versions of their products which were able to support Office 97 documents. Not only can the scan strings used to detect the WordBasic virus not be reused (because the converted program uses completely different opcodes), but scan strings in general have become mostly unusable as a detection technique, because VBA5 programs contain too many variable areas.

A similar situation exists in the DOS virus world. If somebody takes the source of a virus and compiles it with several different assemblers, this will most likely result in several different viruses, and they are considered different by the anti–virus researchers, even though these viruses have only minor differences, are equivalent implementations of one and the same algorithm, and do essentially one and the same thing. According to the commonly used definition, two viruses are different if they differ in at least one bit of their non–modifiable areas. In fact, the differences between a WordBasic virus and the program which results from converting it to VBA5 are even larger. A better analogy is if somebody takes the source of a virus written in BASIC, then rewrites it in C++, following the same algorithm, and compiles it. Of course the resulting virus will be a different, new one.

Because of all this, a naming scheme has been created which indicates that the VBA5 viruses resulting from this conversion are new ones, yet still shows which virus they originated from. For instance, a WordBasic virus can be named WM/Foo.A. If this virus is converted to VBA5 and the resulting program is still a virus, this new virus will be named W97M/Foo.A. A similar procedure is used for the viruses written in the macro programming language of Excel, except that there the prefixes "XM/" and "X97M/" are used instead. These are the short forms of the prefixes which indicate the platform. A long form can be used as well: "WordMacro/", "Word97Macro/", "ExcelMacro/" and "Excel97Macro/".
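The naming rule can be written down as a small lookup. The following Python sketch is illustrative only; the function name `upconvert_name` is hypothetical, and the table contains just the four prefix pairs listed above.

```python
# Platform prefixes from the text: originator platform -> upconvert platform.
UPCONVERT_PREFIX = {
    "WM": "W97M",
    "XM": "X97M",
    "WordMacro": "Word97Macro",
    "ExcelMacro": "Excel97Macro",
}


def upconvert_name(name: str) -> str:
    """Map the name of an originator to the name of its upconvert."""
    platform, separator, rest = name.partition("/")
    if not separator or platform not in UPCONVERT_PREFIX:
        # A name without a known platform part (e.g. just "Wazzu.A")
        # does not say which kind of virus is meant.
        raise ValueError(f"unknown or missing platform prefix: {name!r}")
    return f"{UPCONVERT_PREFIX[platform]}/{rest}"


print(upconvert_name("WM/Foo.A"))  # W97M/Foo.A
```

The name applies only if the converted program still replicates; the sketch does not and cannot check that.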

1.4. Definitions and Terminology

Finally, we would like to introduce some new terms, coined by Richard Ford ([Ford97]). The process of converting a WordBasic or VBA3 program (Excel versions 5.0 and 7.0 use VBA3 as a macro language; it is similar but nevertheless different from VBA5—both as a language and as internal representation) to a VBA5 program using the built–in converters of Office 97 is called upconversion. Similarly, the viruses produced by this process are called upconverted viruses or simply upconverts. We shall call the viruses from which the upconverts were produced originating viruses or simply originators.

2. The Need for Macro Virus Upconversion

Microsoft has made the upconversion process too easy and unnoticeable. When a document created by Word 6.x/7.x is opened with Word 97, the macros in it are automatically and unnoticeably upconverted—no questions asked. If the user attempts to save the document, Word 97 will ask whether the upconverted document (with the VBA5 modules) should be saved. Unfortunately, from the anti–virus point of view, this is too late—by that time the virus in the document would have had the chance to run and infect other documents.

Since the upconversion process is so easy and unnoticeable and the upconverted programs behave mostly like their WordBasic originators, most users are not aware of the fact that just by opening an infected document they have created a new macro virus. Nevertheless, it is so. Of course, we are far from suggesting that the users should be blamed for this virus creation. They are not aware of the fact that the upconverted viruses are different, they are often not aware of the fact that the upconversion occurs at all, and they might not even be aware that their original Word 6.x/7.x document was infected! It can be argued that in those cases the fault is Microsoft’s: for ignoring, as usual, the security aspects of the problem, for not consulting the anti–virus experts before designing and implementing their products, and for putting too much emphasis on convenience at the price of paying too little attention to safety. Nevertheless, this means that new viruses are likely to be created via upconversion when the users upgrade to Office 97.

However, since the users are usually unaware of the fact that these viruses are drastically different from their WordBasic originators, they are unlikely to realize that they require special handling by the anti–virus products, just like any new virus. That is, if a product detects a particular known WordMacro virus, this does not necessarily mean that the product will detect its Word97Macro upconvert. (Also, far from all virus–specific anti–virus products clearly state which WM and W97M viruses they can detect. Many products do not specify the platform part of the virus name, thus misleading the users to believe that both kinds of viruses are detected. For instance, by reporting just Wazzu.A, they imply that they can detect both the WM/Wazzu.A and the W97M/Wazzu.A viruses.) But this upconvert seems reasonably likely to be created (since it is so easy to create it automatically). Therefore, the users are right to expect and demand that their anti–virus product detects these Word97Macro viruses as well as the old WordMacro ones. And, since the users generally lack the knowledge and the expertise necessary to test an anti–virus product properly, this means that independent testing organizations will have to test which anti–virus products satisfy the above requirements, and to what degree.

3. The Problems of Macro Virus Upconversion

Unfortunately, the rather obvious and straightforward reasoning used in the previous section opens a real Pandora’s box. Let us examine in detail the problems related to this upconversion and testing.

3.1. Are the Natural Upconverts a Real Threat?

One of the basic premises of the reasoning used in Section 2 is that the known WordMacro viruses are likely to be upconverted in a natural way by the users who have upgraded to Office 97 and, therefore, become a real problem.

At first glance, such a premise looks quite sound. First, there are a lot of WordMacro viruses out there and a lot of infected documents. Less than a year after the appearance of the first WordMacro virus, this kind of virus topped the lists of reported virus incidents all over the world.

Second, people are gradually upgrading to Office 97; according to Microsoft, millions of copies of it have been shipped all over the world. Even if the reported numbers are taken with a grain of salt (having in mind that Microsoft is using them for marketing purposes, probably "a ton of salt" is a more appropriate metaphor), that is still quite a lot of copies of Office 97 around. So, it seems likely that some of these copies will face documents infected with WordMacro viruses.

Third, as was mentioned in Section 1.2 and described in detail in [Bontchev96], the built–in anti–virus protection of Word 97 (and the other Office 97 applications) leaves a lot to be desired (to put it mildly) in terms of security. Therefore, it is likely to fail to stop macro viruses often enough.

However, the reality seems to belie the above fine reasoning. The number of upconverted Word97Macro viruses found in the wild so far is minuscule compared to the number of known WordMacro viruses. According to the June 1998 issue of Joe Wells’ WildList ([WildList98]), more than a year and a half after Office 97 was officially released, only two WordMacro virus upconverts are being reliably found in the wild. Our own statistics, based on the reports we receive, are only slightly higher (see Appendix B). At the time of writing this paper, our macro virus scanner disinfects infected Word 97 documents by removing all VBA5 modules from them (not just the viral ones), as well as all user menus, toolbars, buttons, key shortcuts, and so on. Yet, we are getting no complaints from the users. As a comparison, when one of the first versions of our product handled WordMacro viruses by removing all macros from the infected documents, we were deluged with complaints. It is unreasonable to think that most users who have upgraded to Office 97 use macros significantly less often than those users who have not. The correct explanation, therefore, seems to be that the Word97Macro viruses in general (and the upconverted viruses in particular) are not a serious problem for the moment.

We are not sure what exactly are the reasons for this phenomenon. Several possible explanations spring into mind. While none of them is sufficient to explain the unnaturally low number of upconverts found in the wild, probably the combination of them provides an adequate explanation.

Probably the most important reason is that Office 97 is not as widespread and popular as Microsoft tries to make us believe. Since it is a major upgrade, the users know from experience that it is bound to introduce many new problems and incompatibilities with the previous versions, to put additional strain on the technical support, and so on. That is why most big organizations seem still reluctant to upgrade their established base of office automation software. This has the consequence that there are fewer environments which are likely to produce upconverted macro viruses.

Another reason is the built–in macro virus protection in the Office 97 applications. As explained in Section 1.2, there are two kinds: generic and virus–specific. Since one of the two upconverts listed in [WildList98] was created by an early beta version of Word 97 (one which did not have the virus–specific macro virus protection), we have to assume that the virus–specific protection is more effective than the generic one in stopping viruses. And, since it stops most of the widespread WordMacro viruses which plague the earlier versions of Word, they do not get the chance to be upconverted.

Yet another reason comes from the fact that the upconvert of an encrypted (execute–only) WordMacro virus is never viral. This is so because upconverting encrypted WordBasic macros results in "protected" VBA5 projects—and it is not allowed to edit or copy modules which reside in protected projects. (Not only copying from outside is disallowed; modules residing in such projects cannot copy themselves either—and, therefore, cannot be viral.)

Finally, Microsoft has released the so–called SR–1 (Service Release One) patch to Office 97. One of the changes it does to Word 97 is extremely relevant to the upconversion issue. Once this patch is applied, it changes the behavior of the VBA5 commands Application.OrganizerCopy, WordBasic.MacroCopy and WordBasic.Organizer .Copy so that they still can copy modules from documents to the global template (thus still allowing the installation of legitimate VBA5 packages) but the attempt to copy modules in the opposite direction (from the global template to documents) silently fails with an untrappable error. Furthermore, a module running in the global template cannot create modules in documents with the command ActiveDocument.VBProject.VBComponents.Add (vbext_ct_StdModule). In practice, this means that a virus can infect the global template but is unable to spread any further.

Admittedly, this does not eliminate the threat entirely. First, if the virus has a destructive payload, this payload may still activate and cause damage when the virus is running in the global template, even if the virus is unable to spread any further. Second, since Microsoft did not consult the anti–virus experts when implementing this feature, it is far from foolproof. It is possible to bypass it, for instance by not infecting the global template at all and only spreading from one document to another (e.g., by infecting only the documents listed on the Most Recently Used file list). Also, VBA5–specific programming techniques exist which allow a virus to replicate from the global template even under Word 97 SR–1, and these techniques have already been discovered by the virus writers.

Nevertheless, the SR–1 patch adds a powerful anti–virus feature to Word 97 (for instance, no upconvert is a virus under Word 97 SR–1) and all Office 97 users are urged to apply it. It essentially solves the upconversion virus problem, fixes several bugs, and will be a precondition for applying the next, SR–2 patch, when it becomes available.

To summarize, it seems that the currently available documented evidence shows that the naturally upconverted macro viruses do not pose a significant threat—at least not yet, regardless of what the purely theoretical reasoning would make us believe. This situation may change in the future, however—although past experience suggests that this is extremely unlikely.

Of course, the unlikelihood of the threat has never prevented some unethical anti–virus vendors from drumming it up in the hope that the scared users will rush to buy their products. We saw it done during the DataCrime virus scare, we saw it done during the Michelangelo virus scare, we saw it done during the Hare virus scare, and we shall, undoubtedly, keep seeing it in the future. (A Spanish anti–virus producer even has the guts to come up with some kind of virus scare every month. They call it "the virus of the month" and regularly publish press releases about such viruses, usually pretty insignificant ones which are extremely unlikely to spread, in order to get public attention for marketing purposes.)

Such vendors are, of course, eager to have their anti–virus products "tested" against the upconversion threat, in order to show how much "better" their products are, compared to those which do not pay sufficient attention to the virtually non–existent virus upconversion problem. Such vendors are, as a rule, not concerned with the ethical problems of having anti–virus people create new viruses (even if it is just for the purposes of a test). Their main concern is gaining commercial advantage for the product they sell, although this is often concealed behind seeming "concern" for the safety of the users.

3.2. Ambiguous Upconversion

Another very serious problem caused by Microsoft’s automatic virus upconverter comes from the fact that the upconversion is ambiguous. That is, one and the same WordBasic macro can be upconverted to different VBA5 programs, depending on some obscure local settings of the user’s system. In the next few subsections we shall present some of the ambiguities which have been discovered during our research. Unfortunately, there is absolutely no guarantee whatsoever that they are the only ones. In particular, there is a serious lack of information concerning the upconversion of corrupted macros, upconversion performed by the different language versions and beta versions of Office 97, and so on.

The implications of these ambiguities are clear. If we do not know how exactly the upconversion is performed and which parts of the resulting VBA5 program are dependent on the user’s environment, we have no guarantee that the detection we have implemented which works for the upconverts created in our laboratories will work for the upconverts which are naturally created in the wild.

3.2.1. Empty Lines

As it turns out, the converters to VBA5 (both from WordBasic and from VBA3) add one empty line at the beginning of the program when converting it. By itself, this is not so bad. Unfortunately, Excel 97 contains a converter in both directions. That is, when it opens an Excel 95 workbook containing VBA3 modules, it converts them into VBA5 modules. However, it also allows Excel 97 workbooks to be saved in Excel 95 format and, unlike Word 97, it then "downconverts" the VBA5 modules in the workbook to VBA3 modules. (The converter of Word 97 simply ignores the VBA5 modules when saving a Word 97 document in Word 6.x/7.x format.)

In practice, it means that you can take a VBA3 module, convert it into a VBA5 module—this adds one blank line at its very beginning. Then you can convert this VBA5 module back to a VBA3 module—the empty line is preserved. If you now convert the resulting VBA3 module to a VBA5 module again, another blank line will be added at its beginning. In the case of a macro virus (e.g., XM/Laroux) and an organization which is just switching to Office 97 and still has lots of Office 95 machines, such "upconversion/downconversion" loops can be performed many times (because the users would want to save the documents in the old format—in order to keep them compatible with the machines which have not been upgraded yet)—therefore, adding many blank lines at the beginning of the virus. Yet this is still the same virus. Therefore, anti–virus programs should ignore the empty lines when identifying VBA (3 or 5) viruses.

Or, at least, it seems that they should ignore them in the case of Excel viruses and if they are at the beginning of the virus. Unfortunately, the situation is a bit more complicated than that.

The first sign that something else was amiss came from the W97M/Gambler.A virus, a native (i.e., not the result of "upconversion" of an existing WM virus) virus for Word 97. This virus contains several user forms, designed to spoof the Tools/Macro dialog box and provide some elementary form of stealth. When we replicated it, we noticed that the checksum of the code contained in the streams containing the user forms was different in the different replicants. As it turned out, each time the virus replicated, one blank line was inserted in the code of one of the user forms. Subsequent replications resulted in additional blank lines being inserted. Worse, the lines were inserted not at the beginning of the code but somewhere in the middle, between the definition of the form and the code implementing the procedures designed to handle the different events (e.g., mouse clicks) associated with the different parts of the form. Why this happens is beyond our understanding. Ask Microsoft.

Anyway, it seems that it would be wise if the empty lines are ignored when identifying a VBA macro virus—regardless of whether it is written in VBA3 or in VBA5, regardless of whether it is an Excel or a Word 97 virus, and regardless of where the empty lines are in the code of the virus.
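The effect of the "upconversion/downconversion" loop, and of ignoring empty lines during identification, can be modelled in a few lines of Python. This is a sketch on source text only: real scanners work on the compiled module, the functions `upconvert` and `downconvert` are simplified stand-ins for the Excel 97 converters, and CRC32 is used merely as an example checksum. Treating whitespace-only lines as empty is an additional assumption.

```python
import zlib


def upconvert(source: str) -> str:
    """Model of the converter to VBA5: one empty line is prepended."""
    return "\n" + source


def downconvert(source: str) -> str:
    """Model of saving in the old format: existing empty lines are preserved."""
    return source


def checksum_ignoring_empty_lines(source: str) -> int:
    """Checksum of the module text with all empty lines removed."""
    lines = [line for line in source.split("\n") if line.strip() != ""]
    return zlib.crc32("\n".join(lines).encode("utf-8"))


original = "Sub Auto_Open()\nMsgBox 1\nEnd Sub"
module = original
for _ in range(3):  # three round trips through a mixed Office 95/97 site
    module = downconvert(upconvert(module))

print(module.count("\n") - original.count("\n"))  # 3 extra line breaks
print(checksum_ignoring_empty_lines(module) == checksum_ignoring_empty_lines(original))
```

Because the filter drops empty lines wherever they occur, the same function also covers lines inserted in the middle of a module, as observed with W97M/Gambler.A.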

3.2.2. White Space

Another problem of the identification of the WM viruses "upconverted" to W97M viruses is caused by the way the converter treats white space in general and tabulation characters in particular. The first report that something is fishy came from Dmitry Gryaznov—an anti–virus researcher working for Dr Solomon’s ([Gryaznov97]). According to him, the first generation of the W97M/Appder.A virus (i.e., produced immediately by the converter; before the new virus has had the chance to replicate) was somehow different from its replicants.

Careful examination revealed that the difference was caused by a statement which contained a tabulation character and an apostrophe–style comment at its end. This prompted us to research how WordBasic macros containing tabulation characters in different places are upconverted to VBA5 form. The results were quite interesting.

It should be noted that the tabulation characters do not exist in VBA5. If you press the Tab key while editing a VBA5 program, a number of spaces is inserted instead. The precise number of spaces inserted depends on the current position of the cursor and on the contents of the Tools/Options/Editor/Tab Width setting of the VBA5 Editor. However, tabulation characters can be freely used in WordBasic—and often are used to indent lines.

Obviously, when converting the WordBasic programs to VBA5, the converter has to do something with these tabulation characters. What it does is quite logical, or at least it seems so at first glance. The converter takes the current setting of the "tab width" field described above and uses it to convert the tabulation characters into the corresponding number of spaces, so that the look of the program (i.e., its indentation) is at least approximately preserved.

Unfortunately, when WM viruses containing such tabulation characters are upconverted, this means trouble. In particular, it means that machines with different settings of the tab width field will produce different W97M viruses from one and the same WM virus, if this virus contains any tabulation characters used as line indents! Furthermore, the user might not even know the contents of this setting, or even that such a setting exists at all.

Clearly, it will be too much of an inconvenience if all these W97M viruses are to be considered different. Therefore, they have to be considered one and the same virus. That is, the indentation of the lines has to be ignored when identifying a viral VBA5 module—because we often do not know whether its originating WM virus had contained any tabulation characters used as line indents.
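A minimal Python sketch of this dependency follows. It models only leading tabulation characters and assumes that each one becomes exactly `tab_width` spaces; the function names are hypothetical and the real converter may compute the number of spaces differently.

```python
def upconvert_indent(line: str, tab_width: int) -> str:
    """Model: replace each leading tab with tab_width spaces."""
    body = line.lstrip("\t")
    tab_count = len(line) - len(body)
    return " " * (tab_width * tab_count) + body


def strip_indent(line: str) -> str:
    """Identification helper: ignore the indentation of the line."""
    return line.lstrip(" \t")


wordbasic_line = "\tx = 2 * 2"
on_machine_a = upconvert_indent(wordbasic_line, 4)  # tab width set to 4
on_machine_b = upconvert_indent(wordbasic_line, 8)  # tab width set to 8

print(on_machine_a == on_machine_b)                              # False
print(strip_indent(on_machine_a) == strip_indent(on_machine_b))  # True
```

The two machines produce different lines from the same originator, and the difference disappears once the indentation is ignored.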

But this is not all. As it turns out, tabulation characters can be used not only as line indents. The only good news is that there are not that many other places where they can be used. In most cases, if the user inserts extra white space in the middle of a statement, both WordBasic and VBA5 will throw it out automatically. The WordBasic editor throws it out when the macro editing window is closed (thus, it becomes apparent that the extra white space has been removed the next time the macro is opened for editing), while the VBA5 Editor throws it out when the cursor leaves the editing line (thus, the change becomes apparent immediately). For example, the line

x  =  2  *  2

is automatically converted (both by WordBasic and VBA5) to

x = 2 * 2

But not always. There are a few exceptions, described below.

First, the white space is preserved in front of the apostrophe–style comments. That is, the lines

x = 2 * 2    ' This is a comment

and

x = 2 * 2 ' This is a comment

generate different code. In VBA5, the position of the beginning of the apostrophe–style comment is contained in the first operand of the "apostrophe–style comment" p–code instruction.

Second, the white space is preserved after the ":" statement separator. That is, the lines

x = 2 : y = 4

and

x = 2 :    y = 4

generate different code. In VBA5, the position of the beginning of the second statement is contained in the argument of the ":" p–code instruction.

Third, the white space is preserved in the Dim statements before the As keyword. That is, the lines

Dim x As Integer

and

Dim x    As Integer

generate different code. This case is a bit more complicated than the previous two. It seems that VBA5 can use two different Dim p–code instructions—one for "Dim without spaces before the As" and another for "Dim with spaces before the As". The second p–code instruction contains one additional operand, containing the position of the As keyword on the line.

Fourth, white space can be used when indenting the different parts of a VBA5 line that is split into multiple lines (with the "_" character at the end of the line indicating the split point). This has a direct equivalent in WordBasic (where "\" is used as a line continuation character) and the subparts of the split line can be indented with tabulation characters. Therefore, the corresponding parts of the VBA5 p–code instruction for line continuation should be skipped when computing the checksum of the VBA5 modules.

Since all the cases described above have direct equivalents in WordBasic, and since their WordBasic equivalents can contain tabulation characters as part of the white space, this means that such lines upconvert differently depending on the tab width settings of the VBA5 Editor. Therefore, the white–space–related operands of the p–code instructions mentioned above should be ignored when identifying VBA macro viruses.
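
The effect of ignoring white space can be illustrated at the source level. The Python sketch below is only an analogy: a real scanner would skip the white–space–related operands inside the p–code rather than edit source text, and the name normalize_line is invented for this illustration. It handles apostrophe–style comments only (not Rem) and does not model line continuation.

```python
def normalize_line(line: str) -> str:
    """Collapse runs of spaces/tabs outside string literals to one space.

    Leading white space is dropped, as is trailing white space outside
    comments. Text inside double-quoted literals and everything after an
    apostrophe comment marker is copied unchanged.
    """
    out = []
    in_string = False
    pending_space = False
    for i, ch in enumerate(line):
        if in_string:
            out.append(ch)
            if ch == '"':
                in_string = False
            continue
        if ch in " \t":
            pending_space = True      # remember the gap, emit it lazily
            continue
        if pending_space and out:
            out.append(" ")           # one space replaces the whole gap
        pending_space = False
        if ch == "'":
            out.append(line[i:])      # comment text is copied verbatim
            break
        out.append(ch)
        if ch == '"':
            in_string = True
    return "".join(out)


variants = [
    "x = 2 * 2    ' This is a comment",
    "x = 2 * 2 ' This is a comment",
    "\tx = 2 * 2\t' This is a comment",
]
# All three variants normalize to the same text, so the set has one element.
print({normalize_line(v) for v in variants})
```

With such a normalization the variants from the four cases above compare equal, while white space inside string literals is left untouched.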

In fact, there is another case in which white space is used; however, it turns out that it does not cause any macro virus identification problems. In particular, tabulation characters can be used inside a string literal. However, the converter handles these cases quite smartly: it finds all such characters in the literal strings and replaces them with Chr$(9). For example, the WordBasic line

x$ = "This is a Tab ->   <-"

is upconverted to the following VBA5 line:

x$ = "This is a Tab ->" + Chr$(9) + "<-"
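
The replacement can be mimicked with a few lines of Python. This is a sketch of the observable result only, not of the converter itself; the function name is hypothetical, and the treatment of leading, trailing or consecutive tabs is an assumption, since the text shows only a single embedded tab.

```python
def upconvert_string_literal(text: str) -> str:
    """Render a string value as a VBA5-style expression with tabs as Chr$(9)."""
    pieces = []
    for i, chunk in enumerate(text.split("\t")):
        if i > 0:
            pieces.append("Chr$(9)")          # one Chr$(9) per tab character
        if chunk:
            pieces.append('"' + chunk + '"')  # non-empty text stays a literal
    return " + ".join(pieces) if pieces else '""'


print("x$ = " + upconvert_string_literal("This is a Tab ->\t<-"))
```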

3.2.3. Trailing Blanks in Comments

VBA3 supports trailing blanks in the comments of the program. That is, if the user enters some spaces at the end of a comment, these spaces will be stored in the compiled image of the VBA3 program, although from the user’s point of view it will not be obvious that the spaces are there. (If the user moves the cursor to the comment line and presses the End key, the cursor will jump after the last non–blank character, not after the last character of the line, which is a space.)

As opposed to that, VBA5 strips the trailing blanks from the comments. This means that if a VBA3 program which contains some comment lines with trailing spaces is upconverted to VBA5 and then downconverted back to VBA3, these spaces will disappear.

3.2.4. Letter Case of the Identifiers

As mentioned in [Bontchev97], all VBA (both VBA3 and VBA5) modules in a file share a common pool of identifiers (variable names, procedure names, etc.). But this is not all. In addition, when a new module is added which uses the same identifier as one of the existing modules, no new identifier is added to the common pool. The problem is that, when deciding whether the new identifier is "the same" as one of the existing ones, the letter case of the said identifier is ignored.

This can have some rather puzzling effects. For instance, if one creates a VBA module containing the line

fOO = Bar

then creates another module, containing the line

bAR = Foo

and opens the first module for editing again, its contents will look like this:

Foo = bAR

which is not quite exactly what the user originally entered. In fact, these two lines can be typed in one and the same module—and, as soon as the second one is entered, the letter case in the first one will change.

This means that the letter case of the identifiers in an upconvert can depend on the letter case of the same identifiers used in other (not related to the virus) modules in the same document and, therefore, should be ignored when deciding whether two VBA viruses are the same or not.
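
The behaviour can be modelled with a toy identifier pool. The Python sketch below is not a description of the real VBA data structures; the class and its methods are hypothetical. It assumes that the most recently entered spelling replaces the pooled one, which is what the fOO/bAR example above suggests. For brevity it also treats words inside string literals and comments as identifiers.

```python
import re

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


class IdentifierPool:
    """Toy model of a case-insensitive identifier pool shared by all modules."""

    def __init__(self):
        self.spelling = {}  # lower-cased name -> spelling currently displayed

    def add_module(self, source):
        """Store a module as text fragments and references into the pool."""
        parts, pos = [], 0
        for match in IDENT.finditer(source):
            parts.append(source[pos:match.start()])
            key = match.group().lower()
            self.spelling[key] = match.group()   # assumed: last spelling wins
            parts.append(("id", key))
            pos = match.end()
        parts.append(source[pos:])
        return parts

    def render(self, parts):
        """Show the module as the editor would, using the pooled spellings."""
        return "".join(
            self.spelling[p[1]] if isinstance(p, tuple) else p for p in parts
        )


pool = IdentifierPool()
first = pool.add_module("fOO = Bar")
second = pool.add_module("bAR = Foo")
print(pool.render(first))
```

Because the stored form refers to identifiers by their lower–cased key, two modules that differ only in the letter case of their identifiers have identical stored forms, which is the comparison an identification routine needs.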

3.2.5. Module Renaming

As discovered by David Chess ([Chess98]), a WordBasic macro named Exit (the letter case does not matter) gets upconverted to a VBA5 module named Exit_. The same name conversion is performed only inconsistently on the contents of the WordBasic program. For instance, calls to a macro named Exit are changed to calls to a module named Exit_, but the operands of MacroCopy, Organizer, ToolsMacro, OnTime and others are not changed accordingly.

The author of the present paper later discovered that every WordBasic macro, the name of which is a keyword in VBA5, gets upconverted to a VBA5 module with an underscore appended to its name. This has negative implications not only on computer viruses (which tend to "lose" modules with the abovementioned special names because the macro copying operations do not address them by their new name)—it can often cause legitimate WordBasic packages to stop working after their upconversion to VBA5.
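
The renaming rule and its side effect can be sketched as follows. The keyword set is a small illustrative subset rather than the full VBA5 keyword list, and both function names are hypothetical. String operands are assumed to have the form "Template:MacroName", as in a MacroCopy call.

```python
# Illustrative subset only; the real VBA5 keyword list is much longer.
VBA_KEYWORDS = {"exit", "stop", "end", "sub", "function", "dim", "loop", "next"}


def upconverted_module_name(macro_name: str) -> str:
    """Append an underscore when the macro name collides with a keyword."""
    if macro_name.lower() in VBA_KEYWORDS:
        return macro_name + "_"
    return macro_name


def stale_references(macro_names, string_operands):
    """Return string operands that still use the old name of a renamed module."""
    renamed = {n.lower() for n in macro_names if upconverted_module_name(n) != n}
    return [s for s in string_operands if s.split(":")[-1].lower() in renamed]


print(upconverted_module_name("Exit"))
print(stale_references(["AutoOpen", "Exit"], ["Global:AutoOpen", "Global:Exit"]))
```

A non–empty result from stale_references indicates a macro package (viral or legitimate) that is likely to "lose" a module after upconversion.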

3.2.6. Foreign Characters in Literal Strings

It seems that non–ASCII (foreign) characters in literal strings are represented differently in the VBA5 program, depending on whether the upconversion of the originating WordBasic program containing these literal strings has been performed on an IBM PC or on a Macintosh. This phenomenon seems to be somehow related to Unicode and the code pages used by Windows and MacOS. If this peculiarity is not taken into account, a virus scanner could easily miss an upconvert, even though that same upconvert has been created in the anti–virus lab and detection of it has been implemented. It is sufficient that in reality the upconvert gets created on the wrong platform.

3.2.7. Other Peculiarities

Some obscure inconsistencies exist between the first generation upconvert and its replicants in several cases. This usually happens when a WordMacro virus for a Far Eastern language version of Word gets upconverted to VBA5, although we have also observed it in several cases of ExcelMacro to Excel97Macro upconversion. The inconsistencies affect such areas as the return values and the argument lists of the functions and subroutines. (The fact that most anti–virus products do not bother to identify these areas exactly and, therefore, do not notice the difference, cannot be an excuse. These areas ought to be identified, because it is perfectly possible to write two different viruses which will look and act differently, yet will differ only in those areas.)

During our almost two years of research into the upconversion issue we have been continuously finding new peculiarities. There is absolutely no reason to believe that we have found all of them. Until the issue is well–researched and perfectly understood, it is simply dangerous to upconvert viruses. There is no guarantee that the viruses created this way will be the same viruses which actually appear during natural upconversion. Nor is there any guarantee that the implemented detection of these laboratory–created viruses will be of any help to the users, since it is quite likely to fail to detect the upconverts which are actually created in the real world via natural upconversion.

3.3. Damaging the Image of the Anti–Virus Industry

There are persistent rumors among the general public that the producers of anti–virus software create and release computer viruses in order to create a market for their products. These rumors have always been vigorously denied by the anti–virus people. It has been pointed out that the number of viruses being created by the virus authors is high enough already (more than high enough, actually)—that the anti–virus producers hardly have the time to cope with the hundreds of new viruses created every month, let alone to create any themselves. Analogies have been put forth, insisting that it does not make more sense for the anti–virus people to create viruses than it does for the doctors to create new diseases and the police to create new criminal organizations for "job security" purposes.

The author of this paper, being an anti–virus researcher himself, has always subscribed to the above line of thought. It was, therefore, a great disappointment to him when some people working in the anti–virus industry began to advocate actively the creation of new macro viruses via upconversion. Initially we genuinely believed that the goal of those people was indeed what they claimed—to provide better protection to the users from a seemingly likely threat. Also, a person not deeply familiar with the internal formats of the Word and Word 97 documents might not realize immediately that the upconverted virus is different from the original—so, people insisting that "WordMacro viruses should be replicated on the Word 97 platform" could have been honestly confused.

However, by now it has been explained multiple times to those people why the upconverts are not the same viruses as the originators. The documented evidence collected during the past year and a half clearly shows that the new viruses are created via natural upconversion significantly less often than expected. Continuous research has demonstrated that there are other, more effective ways to protect the users from the upconverts, even if those eventually are created in a natural way—methods which do not involve the creation of new viruses. (A detailed explanation of these methods will be given in Section 4.2.)

Therefore, by now we are deeply convinced that anybody who openly advocates that the anti–virus community should engage in the creation of new macro viruses via upconversion should consider whether there exist ethical grounds for such activity. There is no good reason for the anti–virus people to create viruses of any kind. Those who do so damage the image of the whole anti–virus industry and bring shame to it. Such practices must not be condoned or tolerated; they must be denounced loudly as being deeply unethical.

This is consistent with the reaction observed in a much milder case. Evidence was provided that some of the distributors of an Israeli anti–virus product (namely, the distributors of this product in New Zealand and the UK) were providing a diskette with live viruses to their customers, so that the latter could better "test" and evaluate the product—as well as some other, competing products. This raised an indignant outcry both among the users and within the anti–virus industry, since giving viruses to people who are not trustworthy and technically qualified to handle them is considered a seriously irresponsible act. The fact that the viruses in question were old and well–known and most products could already detect them did not change the matter. However, one could imagine a much stronger reaction if the offenders had created these viruses themselves. It has been said several times that their action brings shame to the whole anti–virus industry—and most other anti–virus producers have been fast to distance themselves from the company involved and to denounce its activities as unethical and irresponsible.

3.4. Testers Creating Viruses

If the upconverts of the known WordMacro viruses are created and the scanners are made to detect them in a virus–specific way, the users have the right to request that this capability of the anti–virus products is tested—in order to measure the quality of the different anti–virus products in this aspect. Unfortunately, this opens another can of worms.

In order to perform such a test, the reviewers of anti–virus products must have a collection of upconverted viruses. There are essentially two different ways to obtain it.

First, the reviewer could get them from the anti–virus producers who have created them. However, this would mean that the viruses in question would leave the supposedly secure environment of the virus labs and would have to be sent to people, many of whom have proven in the past that their technical competence, as far as viruses are concerned, leaves a lot to be desired, to put it mildly. Clearly, this would increase the danger of a virus outbreak, an outbreak for which the people who have created the virus would be directly responsible.

Second, the reviewer could create those viruses him/herself, using a collection of known WordMacro viruses and a copy of Word 97. Unfortunately, this "solution" is even worse than the first one. As we saw in Section 3.2 (and as is explained in more detail in [Bontchev97]), the upconversion process is far from well–understood. It is full of bizarre peculiarities. A person who is not deeply familiar with all of them (and, since not all of them have been discovered yet, the set of such persons is empty) is very likely to goof up and create a completely new virus, one which would not have been created via natural upconversion from the originating WordMacro virus. And, of course, the dangers of virus leaks and outbreaks due to improper handling mentioned in the previous paragraph are present with this "solution" too.

Furthermore, regardless of how the collection of upconverted macro viruses is obtained, some scanners will fail to detect some of the viruses in it. Then the tester will have to send the missed samples (or replicants of them) to the producers of the respective scanners—so that they can determine where the problem is and update their products. After all, this is standard practice—because one of the main goals of the anti–virus tests is to help the anti–virus producers to improve their products.

Unfortunately, in this particular case it means that new viruses, viruses created by the anti–virus people, will be distributed further to others. As a result, not only would new viruses have been created, but they would also be widely disseminated, something which the anti–virus industry definitely could do without. Enough trouble is caused by the virus writers and the people who run virus exchange sites; it is completely unnecessary for us to cause more of it to ourselves and to everybody else.

3.5. Stimulation of Virus Creation and Distribution

We did not arrive immediately at the conclusion that the creation of the upconverts (in fact, the creation of any computer virus, for whatever purposes) is something wrong; something that must not be done under any circumstances. After all, the stated goal of the proponents of the idea, namely to prepare our products and be able to help the users if these upconverts are created naturally, is very tempting. Initially, we upconverted a few (3–4) ExcelMacro viruses to Excel97Macro viruses and implemented detection of the latter in our scanner (F–MACROW).

Imagine our surprise when, some time later, somebody anonymously uploaded to our ftp site a big collection of macro viruses from some big virus exchange site! In it, we found examples of precisely the same Excel97Macro viruses we had created. We had not given samples of them to anyone. (Even the people who insist that the anti–virus producers should create upconverts for the purposes of implementing detection of them agree that the created viruses must not be distributed to anyone, not even to other anti–virus researchers.) It was also unlikely that exactly those ExcelMacro viruses (all macro viruses of this kind known at the time) would have been upconverted naturally and that the upconverts had found their way to the virus exchange site. Therefore, the only plausible explanation is the following.

The person running the virus exchange site was using our scanner to sort his macro virus collection. This is quite natural: our macro virus scanner provides a level of exact identification of macro viruses which is unsurpassed in the anti–virus world, it has an extremely high virus detection rate, and it is generally designed not just as a user–level anti–virus product but also as a tool for macro virus identification and collection management (we needed a tool for those purposes ourselves, so we made our scanner such a tool). Furthermore, the scanner clearly lists which macro viruses it knows about; usually this pretty much covers all known macro viruses.

Clearly, the person running the virus exchange site saw the Excel97Macro viruses listed among the viruses which our scanner could detect. (It lists the viruses for the different platforms—WordMacro, Word97Macro, etc.—separately, so that the users can clearly see which viruses we claim to be able to detect.) He did not have them in his collection—but he had the originating macro viruses. So, he simply performed the upconversion himself—in order to obtain these "missing" viruses and make his collection more complete. Unfortunately, as is customary for a virus exchange site, the collection of viruses is publicly available to anyone—which means that those upconverts are likely to have received wide dissemination.

In our opinion, we share the blame for that—after all, if we had not created these viruses ourselves and if we did not claim that they existed, the virus exchange guy would not have created and distributed them himself—or at least that would have been less likely. Therefore, the mere fact that the upconverts are created and "officially" listed as existing in some anti–virus product stimulates their creation and distribution by the virus exchange people.

4. The Solution

In this section we shall present a method for implementing detection of the upconverts which, we believe, does not have the problems discussed in the previous section.

4.1. Required Criteria

It is clear that any such method must satisfy the following criteria:

  1. It must ensure that the upconverted viruses, when/if they appear, will be detected reliably.

  2. It must achieve the above goal without forcing the anti–virus producers to get involved in creation and distribution of computer viruses.

  3. It must not stimulate the virus writers, collectors and distributors to do the upconversions themselves.

We believe that we have succeeded in developing a set of methods which satisfy all of the above criteria. They will be described in detail in the next subsection.

4.2. The Methods

In this subsection we shall describe five different methods which can be used to implement detection of the upconverted viruses. The methods are given in order of increasing efficiency. None of these methods involves the creation of new viruses.

4.2.1. Scan Strings

Historically, one of the first methods used for virus–specific detection was to pick a small piece of code from the virus (a scan string) and search all infectable objects for it. If the scan string is selected well, it will not only detect all instances of the virus—it is also very likely to detect future variants of the virus.

The same approach can be used to detect the not–yet–existing upconverts—by considering them as "future variants". One has to examine the existing upconverted viruses, select a good scan string from them—and this scan string can be used to detect other upconverts from the same virus families.

There are several problems with this approach. Probably the most important one is that the scan string approach is not very adequate for detecting VBA viruses in the first place. The internal representation of a VBA (VBA3 or VBA5) program consists of two–byte opcodes, each followed by one or more zero– to eight–byte operands. The operands are very often pointers to different complex data structures and are, therefore, variable. That is, one and the same VBA program can be represented by identical sequences of opcodes whose operands nevertheless have different contents, simply because the data structures they point to reside in different places. (A detailed description of this phenomenon requires, unfortunately, detailed knowledge of the internal formats of the VBA modules. The description of these formats is outside the scope of this paper. Furthermore, most of the information is covered by a non–disclosure agreement we have signed with Microsoft and is, therefore, not publicly available.)

In practice, the above means that a VBA module consists of constant areas which can be as small as two bytes and which are separated by zero– to eight–byte areas with variable contents. Therefore, a scan string for a VBA virus must often be rather long and contain lots of wildcards.

But this is not all. It is possible to have two different VBA programs which consist of one and the same opcodes—and which can be differentiated from each other only if the pointers in the operands of these opcodes are resolved to the data each of them points to. The consequence is that even a carefully chosen scan string of a VBA virus can match some other, legitimate program and therefore cause false positives. That is why we strongly discourage the usage of scan strings for VBA virus detection and recommend exact identification instead.
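
A scan string with wildcards, and the false positive problem it brings, can be illustrated with a short Python sketch. The byte values are made up for the illustration and do not correspond to real VBA opcodes; the function name and the use of None as a wildcard are likewise assumptions.

```python
from typing import Optional, Sequence

WILDCARD = None  # matches any single byte


def find_scan_string(data: bytes, pattern: Sequence[Optional[int]]) -> int:
    """Return the first offset at which the pattern matches, or -1."""
    width = len(pattern)
    for start in range(len(data) - width + 1):
        if all(p is WILDCARD or data[start + i] == p
               for i, p in enumerate(pattern)):
            return start
    return -1


# Two made-up instruction streams: same two-byte opcodes, different operands.
module_a = bytes([0x20, 0x00, 0x10, 0x02, 0x27, 0x00, 0x34, 0x00])
module_b = bytes([0x20, 0x00, 0x58, 0x07, 0x27, 0x00, 0x9C, 0x01])

# The operands are variable, so the scan string has to wildcard them.
pattern = [0x20, 0x00, WILDCARD, WILDCARD, 0x27, 0x00, WILDCARD, WILDCARD]

print(find_scan_string(module_a, pattern), find_scan_string(module_b, pattern))
```

Both streams match at offset 0. If module_b were a legitimate program whose operands point to different data, the scan string would report it as the virus, which is the false positive described above.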

Another problem comes from the fact that it is possible that no variant of a particular family of WordMacro viruses has been upconverted naturally. Therefore, no Word97Macro variant of this family is known and there is nothing to pick a good scan string from.

4.2.2. Upconversion of Single Macros

There is a way of solving at least partially one of the two problems mentioned in the previous subsection. When a particular family of WordMacro viruses has not produced any known Word97Macro upconverts via natural upconversion, one could pick just a single macro from each virus in the family (if possible, a macro which is shared by all variants in the family—or at least by as many of them as possible), and upconvert only that macro. Of course, the macro has to be analyzed extremely carefully beforehand and the anti–virus researcher must be absolutely certain that taken alone, the macro is not able to replicate itself—i.e., is not a virus.

Once this non–viral macro is upconverted, several approaches are possible. For instance, one could pick a scan string for it—then this scan string will detect all upconverts of the viruses in the family which share the original macro. However, as mentioned in the previous subsection, scan strings are not particularly suitable for detection of VBA viruses. It is much better to implement exact identification (via some kind of checksum of the non–variable areas of the whole macro/module body).
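
The idea of a checksum over the non–variable areas can be sketched in Python as follows. The instruction representation (opcode, operand bytes, and a flag marking the operand as variable) is a hypothetical stand–in for the real module format, and CRC–32 is chosen only because it is available in the standard library; the text does not say which checksum is used.

```python
import zlib
from typing import Iterable, Tuple

# (opcode, operand bytes, operand_is_variable) -- a hypothetical representation
Instruction = Tuple[int, bytes, bool]


def identification_checksum(instructions: Iterable[Instruction]) -> int:
    """CRC-32 over opcodes and constant operands; variable operands are skipped."""
    crc = 0
    for opcode, operand, is_variable in instructions:
        crc = zlib.crc32(opcode.to_bytes(2, "little"), crc)
        if not is_variable:
            crc = zlib.crc32(operand, crc)
    return crc


# Same program, pointers resolved to different places: checksums agree.
sample_a = [(0x0020, b"\x10\x02", True), (0x0027, b"\x01", False)]
sample_b = [(0x0020, b"\x58\x07", True), (0x0027, b"\x01", False)]
# A different constant operand: checksums differ.
sample_c = [(0x0020, b"\x10\x02", True), (0x0027, b"\x02", False)]

print(identification_checksum(sample_a) == identification_checksum(sample_b))
print(identification_checksum(sample_a) == identification_checksum(sample_c))
```

Unlike a scan string, the checksum covers the whole module body, so a program that merely shares a short opcode sequence with the virus does not match.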

Some scanners (e.g., ours) go even further—they do not work on a per–macro basis but consider each virus as a set of macros/modules, where each macro/module is identified exactly. For them a slight modification of the above solution is necessary—because otherwise they will think that the upconverted macro is the whole virus and will remove only it on disinfection, with the danger of creating a new variant, as explained in [Bontchev97]. The modification consists of adding to the definition of the upconvert another, bogus module—one which does not exist in any virus. This way the complete definition will never match any upconvert—only parts of the definition will match parts of the upconverts. Therefore, the upconverts, when they appear, will be detected as "new variants," presumably prompting the user to send samples to the anti–virus producer. A slight modification of this approach is used in our scanner to detect new variants of the WM/CAP virus and all possible upconverts of the WM/Muck viruses.
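
The effect of the bogus module can be shown with a small Python sketch. The names, the checksum values and the classification strings are all hypothetical; the sketch only assumes that a virus definition is a set of module checksums and that a full match means exact identification.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical checksum assumed to match no module that exists in any virus.
BOGUS = 0xFFFFFFFF


def classify(found: Set[int], definitions: Dict[str, FrozenSet[int]]) -> str:
    """Match the module checksums found in a document against definitions."""
    for name, modules in definitions.items():
        if modules and modules <= found:      # every module of the virus found
            return name
    for name, modules in definitions.items():
        if modules & found:                   # only some modules found
            return "new variant of " + name
    return "no known virus"


definitions = {
    "Family.A": frozenset({0x1111, 0x2222}),
    # Built from one upconverted non-viral macro plus the bogus module,
    # so this definition can never match completely.
    "Family.Upconvert": frozenset({0x3333, BOGUS}),
}

print(classify({0x3333, 0x4444}, definitions))
```

Because the bogus checksum never occurs in a document, an upconvert containing the shared macro is always reported as a new variant, and the scanner never attempts a disinfection based on an incomplete definition.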

One advantage of this method is that, similar to the one presented in the previous subsection, it allows the detection of new upconverts—even if their originating viruses are not yet known or even not yet created. One disadvantage is that it does not identify the upconverts exactly—they are only detected (as new variants). Another is that it is applicable only to multi–macro viruses. If the originating virus consists of a single macro, then it is obviously impossible to pick one non–viral macro from it.

4.2.3. Upconversion in Parts

The main idea behind the method presented in the previous subsection can be extended further. The idea is to pick one small non–viral part of the virus, upconvert it, and implement detection and/or identification of that. However, nothing forces us always to use parts as big as a single macro.

The solution should be obvious by now. Even if some macro in a virus can be a virus itself, it is always possible to split it into parts, none of which is viral. These parts can then be put between Sub/End Sub "brackets" to satisfy the WordBasic/VBA5 syntax analyzers, each of the parts can be upconverted separately, and identification data for them can be computed, ignoring the "brackets".
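
The bookkeeping involved can be sketched in Python. The sketch treats macro lines as opaque text and uses a harmless four–line body; the actual upconversion of each wrapped part by Word 97, which happens between the two functions, is not modelled. The function names, the Part1/Part2 naming and the use of CRC–32 over text lines are assumptions made for the illustration.

```python
import zlib
from typing import List, Sequence


def wrap_parts(body: Sequence[str], boundaries: Sequence[int],
               name: str = "Part") -> List[str]:
    """Split the body at the given line indexes; wrap each part in Sub/End Sub."""
    cuts = [0, *boundaries, len(body)]
    parts = []
    for n, (lo, hi) in enumerate(zip(cuts, cuts[1:]), start=1):
        parts.append("\n".join([f"Sub {name}{n}", *body[lo:hi], "End Sub"]))
    return parts


def identification_of_parts(parts: Sequence[str]) -> int:
    """Running CRC-32 over all parts, ignoring the Sub/End Sub "brackets".

    The parts are never joined into one module; only the checksum is carried
    over from one part to the next.
    """
    crc = 0
    for part in parts:
        for line in part.split("\n")[1:-1]:
            crc = zlib.crc32(line.encode("utf-8") + b"\n", crc)
    return crc


body = ["x = 1", "y = 2", "z = x + y", "MsgBox Str$(z)"]
parts = wrap_parts(body, [2])          # boundary between whole lines
whole = zlib.crc32("".join(line + "\n" for line in body).encode("utf-8"))
print(identification_of_parts(parts) == whole)
```

The running checksum over the separate parts equals the checksum of the undivided body, so the identification data matches the real upconvert even though that upconvert was never assembled.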

Of course, careful manual analysis is necessary in order to determine the number of parts and the size of each part. During our experiments we have observed that even if the boundary between the parts passes between whole lines (i.e., it never splits a line), no more than two parts per macro are necessary. However, we have been unable to prove formally that two parts are always sufficient to make a viral macro non–viral. But the latter hardly matters—each virus is analyzed manually anyway and the macros are split in as many parts as necessary.

The method presented above is rather time–consuming and requires significant work per virus—although, after some practice one "gets the hang of it". Also, we have developed tools to help us compute the integral identification data from the different upconverted parts (ignoring the "brackets") and so on. Probably the only other disadvantage of the method is that it cannot tell for sure whether the upconvert is a virus—since it is never created. However, the manual analysis usually reveals that pretty reliably.

However, the method is extremely powerful. It is unconditionally applicable to all macro viruses—and it allows detection, recognition, exact identification and surgical removal (i.e., only the modules which belong to the virus are removed on disinfection) of the upconverts when they actually appear. And, of course, since none of the parts is viral and since their upconversions are never put together, at no point is a new virus created.

Once we developed this method, we asked a fellow anti–virus researcher who works for a company that regularly does upconversions to test its efficiency. Using our method, we computed the identification data for all upconverts of all members of the WM/Wazzu family, regardless of the fact that many of them are encrypted and, therefore, their upconverts cannot be viruses. Then our scanner with the updated database of virus definitions was run on that company’s collection of upconverted viruses. Not only did it detect and identify every single virus there, but it even discovered problems in the collection; e.g., some viruses were named incorrectly there.

We strongly recommend this method to all anti–virus producers. It completely fulfils the goal of providing the desired level of protection to their users, yet it does not require them to get involved in virus creation, something every ethical anti–virus person should strive to avoid at any cost.

4.2.4. Upconversion of Disabled Viruses

There is an even easier method than the upconversion in parts described in the previous subsection. Again, we proceed from the basic premise that if the upconversion process does not create a new virus (or, more exactly, that the new program it creates is not viral), then there is nothing wrong in doing it. Therefore, the problem is reduced to modifying the originating WM virus in such a way, that its upconvert is guaranteed not to be a virus—yet still can be used to extract identification data which will permit us to detect, recognize and identify the real upconvert, when/if it appears.

The idea is that while the upconverts of many WM viruses are not viral, the upconvert of every WordBasic program which is not a virus is guaranteed not to be viral. Therefore, we can introduce a small, standard modification to the WM virus, a modification which will disable its ability to replicate. Then we can upconvert the resulting non–virus and compute the identification data of the upconvert using a method which simply skips over the upconverted image of the standard modification introduced by us, as if it did not exist.

In practice, this can be achieved by inserting a Stop statement at the beginning of each MAIN subroutine of each macro of the virus. When the macros modified in this way are upconverted, the Stop statement upconverts to an Exit Sub statement. It is then a simple matter of modifying our tool for extraction of W97M virus identification data, so that it ignores the Exit Sub statement if it is the first statement of the VBA5 program.
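
The adjustment to the extraction tool amounts to very little, as the following Python sketch suggests. It works on the body of one procedure given as a list of text statements, which is a simplification of the real p–code; the function names and the use of CRC–32 are assumptions.

```python
import zlib
from typing import List, Sequence


def strip_disabling_statement(statements: Sequence[str]) -> List[str]:
    """Drop a leading Exit Sub; the same statement elsewhere is kept."""
    body = list(statements)
    if body and body[0].strip().lower() == "exit sub":
        return body[1:]
    return body


def identification_crc(statements: Sequence[str]) -> int:
    """CRC-32 of the procedure body, computed as if the leading Exit Sub were absent."""
    crc = 0
    for statement in strip_disabling_statement(statements):
        crc = zlib.crc32(statement.strip().encode("utf-8") + b"\n", crc)
    return crc


disabled = ["Exit Sub", "x = 1", "y = 2"]   # upconvert of the disabled macro
natural = ["x = 1", "y = 2"]                # what a natural upconvert contains
print(identification_crc(disabled) == identification_crc(natural))
```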

4.2.5. VBA5 Heuristics

Probably the most powerful detection method is a good set of heuristics. By studying the upconversion (and upconverting many non–viral macros), it is possible to come up with a relatively good set of rules which would indicate whether a given WordMacro virus would upconvert to a viral Word97Macro program or not. (As mentioned elsewhere in this paper, very often the upconvert is not able to replicate—i.e., is not a virus.)

These rules do not need to be very precise—it is sufficient that they are made to err on the safe side. That is, they will never fail to indicate that a particular WordMacro virus would upconvert to a Word97Macro virus—although they can occasionally claim so for some cases when the Word97Macro upconvert would not be actually viral.

An example of such a rule could be "IF a WordMacro virus does not contain encrypted macros AND its replication does not depend on ToolsMacro .Edit (or any other WordBasic operator, the VBA5 upconvert of which does not work), THEN the Word97Macro upconvert of this WordMacro virus would be a virus too".
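
Expressed as code, the example rule might look like the Python sketch below. The profile structure and all names are hypothetical, and the set of operators whose upconverts do not work contains only the one operator named in the text; a real list would be longer.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Only "ToolsMacro .Edit" is named in the text; other entries would be added
# as further non-working upconversions are identified.
BROKEN_AFTER_UPCONVERSION = frozenset({"ToolsMacro .Edit"})


@dataclass(frozen=True)
class MacroVirusProfile:
    has_encrypted_macros: bool
    replication_uses: FrozenSet[str]   # operators the replication depends on


def rule_fires(profile: MacroVirusProfile) -> bool:
    """True when the rule predicts that the upconvert would be a virus too.

    False only means that this particular rule does not apply; it is not a
    statement that the upconvert is harmless.
    """
    if profile.has_encrypted_macros:
        return False
    return not (profile.replication_uses & BROKEN_AFTER_UPCONVERSION)


print(rule_fires(MacroVirusProfile(False, frozenset({"MacroCopy"}))))
```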

By further studying the replication mechanisms of the WordMacro viruses which match the above rule, it is possible to come up with a pretty good set of heuristics for detecting the Word97Macro upconverts of the viruses of this kind. We have developed such a set of heuristics and have implemented them in our anti–virus products, F–PROT and F–MACROW.

The analysis shows that these heuristics detect 100% of the known Word97Macro viruses which are upconverts of WordMacro viruses. (Our products contain also other, VBA5–specific heuristics. The average detection rate of Word97Macro viruses of any kind—not just upconverts—is about 98%.) We have no way of testing this, but we have reasons to believe that this detection rate of 100% will remain for all Word97Macro upconverts of all known WordMacro viruses.

The heuristic approach has the additional advantage of providing detection even of upconverts of the WordMacro viruses which are either not known or not yet created—something which the "easy" method of simply creating the upconverts of the known WordMacro viruses does not allow.

Probably the only problem of this method is that it does not provide virus–specific detection, identification and removal. That is, the scanners using it can detect the upconverts when they appear, but cannot recognize them by name, let alone identify them exactly. Also, they can be removed only by removing all macros from the infected document. (Admittedly, it is also possible to create heuristics for determining which modules a heuristically detected virus consists of and for removing only those modules. However, as explained in [Bontchev97], this presents the danger of creating new virus variants during disinfection, so removing all macros from the infected document remains the safest disinfection approach.)

4.3. Advantages

The methods described in the previous section have the advantage of providing to the user the protection against upconverts of the known WordBasic viruses without having any of the disadvantages which the more straightforward method of simply having the anti–virus researchers create new viruses via upconversion has. In particular:

  1. They do not damage the image of the anti–virus industry, because they do not force the anti–virus people to get involved in virus creation and distribution.

  2. They do not force the testers of anti–virus products to get involved in virus creation and distribution either—because the anti–virus products do not claim to detect the upconverted viruses and, therefore, there is nothing to test.

  3. They do not stimulate the virus collectors and distributors to do the upconversions themselves and make these new viruses available in their collections on the virus exchange sites, because the viruses are not listed in the list of viruses detected by the anti–virus products—and, therefore, the virus collectors have no way of even suspecting that these viruses actually exist.

  4. In the (relatively rare) cases when the upconverted viruses actually appear on the users’ machines, these users will have protection offered by the anti–virus product they use.

4.4. Disadvantages

The methods proposed above are not without some disadvantages, when compared to the more straightforward solution of just upconverting and replicating the WordBasic viruses under Word 97. In particular, these disadvantages are:

  1. Additional work. Obviously, implementing some of the methods (particularly the Upconversion in Parts method) means that additional work, time and effort have to be spent on every virus, while straightforward upconversion is easily automated. Furthermore, the anti–virus programs will most likely have to be modified in order to "hide" some entries from the lists of viruses they can detect. However, this is not much additional work, and we are speaking from personal experience.

  2. Uncertainty of whether the upconverts will really be viruses. As explained in Section 3.1, very often when a WordBasic macro virus is upconverted to a VBA5 program, it loses its ability to replicate, i.e., it stops being a virus. The only way to check whether the upconvert is still a virus is to attempt to replicate it in Word 97. However, this can be done only if the upconvert is created as a whole new virus, which is something we want to avoid. So, the anti–virus producer will be forced to keep entries in the database of his product which may detect non–viruses, thus wasting precious space. This is not as bad as it sounds, though. First, even though some viruses stop replicating after being upconverted to VBA5, their payloads might still function perfectly (that is, they will turn into Trojan horses), and this means that the anti–virus products should be able to detect and eliminate them. Second, even the presence of harmless, do–nothing VBA5 modules in their documents is considered unwanted by many users, so they will regard the ability of the anti–virus products to remove them as an advantage.

  3. No solution for the testers’ problems. The methods given in Section 4.2 are usable only by the anti–virus producers—not by the testers of anti–virus software. It is still deemed unacceptable for the testers to do this kind of upconversion and to use the resulting samples in their tests—since this would mean that non–viruses would be used in a virus detection test; something which is not correct. Therefore, the testers will not be able to test how well the different anti–virus products are handling the potential threat. However, this is not really such a big problem. First, as we explained in Section 4.2.3, the anti–virus products using this method should not claim that they can handle the upconverts, until these actually appear—in order to solve the problem described in Section 3.5. And, if no claims are made, there is nothing that has to be tested. Second, the main goal of the anti–virus testers should be to test how well the anti–virus products handle the existing threats—not the potential ones. (Testing the latter is important too—but not as much as testing the former, and certainly not at the price of getting involved in virus creation and distribution. There are ways to test it too—again ways which do not involve the creation of new viruses. For instance, the tester could use an older version of the scanners and test its ability to detect known viruses which did not exist when that version of the scanner was released.) This is addressed very well even now, by the fact that more and more testers put their emphasis on testing how well the anti–virus products handle the computer viruses known to be in–the–wild—as opposed to testing how well they handle all known viruses. And, of course, it is perfectly acceptable (and even recommendable) for the testers to use in their tests samples of upconverted viruses that have been actually found in–the–wild.
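The retrospective test mentioned in point 3 (an older scanner version run against viruses that became known only after its release) can be sketched as follows. This is a minimal illustration in Python, not a description of any real test suite: the `Sample` record, the `retrospective_rate` function and all sample data are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sample:
    name: str          # hypothetical label, not a real virus name
    first_seen: date   # date on which the virus became known
    detected: bool     # verdict reported by the old scanner version


def retrospective_rate(samples, scanner_release):
    """Detection rate over the samples that appeared after the scanner release.

    Returns None when no sample is newer than the scanner, because in that
    case nothing can be said about the handling of unknown viruses.
    """
    unseen = [s for s in samples if s.first_seen > scanner_release]
    if not unseen:
        return None
    return sum(s.detected for s in unseen) / len(unseen)


# Made-up records, used only to exercise the function.
samples = [
    Sample("sample-A", date(1997, 3, 1), True),    # older than the scanner: ignored
    Sample("sample-B", date(1998, 2, 10), True),
    Sample("sample-C", date(1998, 4, 5), False),
    Sample("sample-D", date(1998, 6, 20), True),
]

rate = retrospective_rate(samples, scanner_release=date(1998, 1, 1))
print(f"retrospective detection rate: {rate:.0%}")
```

Only existing, naturally occurring samples enter such a test, so no new virus has to be created in order to estimate how a product handles threats it could not have known about.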

We firmly believe that the advantages of the methods proposed in this paper far outweigh their disadvantages. Therefore, we urge all anti–virus producers to use these methods (or even better methods, if they can come up with any), instead of creating new viruses.

5. Conclusion

The irresponsible way in which Microsoft has handled the problem of WordBasic macro upconversion has created a lot of trouble for users and anti–virus researchers alike. There are security threats following from it and these threats have to be addressed and countered in a satisfactory way. However, the members of the anti–virus community must strive to keep the high ethical standards of their profession and not succumb to the temptation of fast and easy—but unethical—solutions. Satisfactory technical solutions to these problems do exist, as this paper demonstrates, and the anti–virus people should constantly keep in mind that they must never get involved in virus creation and distribution.

6. References

Appendix A. Why Are the Upconverts Different?

Since, from the user's point of view, the upconversion is performed in a rather automatic and unnoticeable manner, and since the upconverted modules usually work pretty much like the originating macros, it might not be immediately obvious to the casual observer that the upconversion creates a new virus from a known one. Let us see why it nevertheless does.

  1. Different language. WordBasic and VBA5 are two completely different languages. Although the upconversion results in a VBA5 program which is an implementation of roughly the same algorithm as the WordBasic program, the two programs are quite different. This difference is immediately obvious to anyone who bothers to look. Here are two snippets from the WM/Concept.A virus and from the W97M/Concept.A virus; it is clear that the two programs are not the same:

    WM/Concept.A (the beginning of the AAAZFS macro):

    Sub MAIN
    'this becomes the FileSaveAs for the global template
    Dim dlg As FileSaveAs
    On Error Goto bail
    GetCurValues dlg
    Dialog dlg
    If dlg .Format = 0 Then dlg .Format = 1

    W97M/Concept.A (the beginning of the AAAZFS module):

    Public Sub MAIN()
    Dim sMe$
    Dim sTMacro$
    'this becomes the FileSaveAs for the global template
    Dim dlg As Object: Set dlg = WordBasic.DialogRecord.FileSaveAs(False)
    On Error GoTo -1: On Error GoTo Bail
    WordBasic.CurValues.FileSaveAs dlg
    WordBasic.Dialog.FileSaveAs dlg
    If dlg.Format = 0 Then dlg.Format = 1

  2. Different representation. A VBA virus (in fact, any VBA program) is represented in documents in a way much different from the way WordBasic programs are represented. The following two pictures illustrate the representation of the WM/Concept.A and W97M/Concept.A viruses respectively:

    WM/Concept.A:

    W97M/Concept.A:

  3. Different structures. This difference is not obvious from the user's point of view. Understanding it requires intimate knowledge of the internal representation of WordBasic and VBA programs in the files. A detailed explanation of these internal formats is, unfortunately, outside the scope of this paper. Suffice it to say that in WordBasic the operands of a token immediately follow the token. By contrast, the p–code instructions of VBA contain pointers to areas in the module which are outside the opcode area, and sometimes even outside the module stream. These areas have very complex formats: they look as if linked lists of complex records had been created in memory and this memory had then been dumped to a file.

  4. Different contents. A WordBasic program is tokenized to WordBasic tokens. A VBA5 program is compiled to VBA5 p–code instructions. The binary images of the two programs are very different; in fact, they have nothing in common:

    WM/Concept.A (the beginning of the AAAZFS macro):

    01 00 64 1B 69 04 4D 41 49 4E 64 6B 33 74 68 69
    73 20 62 65 63 6F 6D 65 73 20 74 68 65 20 46 69
    6C 65 53 61 76 65 41 73 20 66 6F 72 20 74 68 65
    20 67 6C 6F 62 61 6C 20 74 65 6D 70 6C 61 74 65
    64 2F 69 03 64 6C 67 34 67 54 00 64 2C 2D 2A 69
    04 62 61 69 6C 64 3E 69 03 64 6C 67 64 3F 69 03
    64 6C 67 64 1D 69 03 64 6C 67 73 CB 00 0C 6C 00
    00 1E 69 03 64 6C 67 73 CB 00 0C 6C 01 00 64 69

    W97M/Concept.A (the beginning of the code area of the AAAZFS module):

    8F 14 00 00 00 00 00 00 5A 00 EA 04 40 00 00 00
    5A 00 EA 04 58 00 00 00 D8 00 00 00 33 00 74 68
    69 73 20 62 65 63 6F 6D 65 73 20 74 68 65 20 46
    69 6C 65 53 61 76 65 41 73 20 66 6F 72 20 74 68
    65 20 67 6C 6F 62 61 6C 20 74 65 6D 70 6C 61 74
    65 00 FF FF FF FF FF FF 5A 00 EA 04 70 00 00 00
    45 00 00 00 E5 00 AF 00 20 00 30 02 21 00 4C 02
    25 00 4E 02 01 00 2E 00 4A 02 FF FF
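The structural difference can be observed directly in the two dumps above. The following Python sketch is only an illustration and not part of any scanner. It loads the bytes exactly as listed, then searches them for the text of the comment and for the identifier names MAIN, dlg and bail. The helper names `parse_dump` and `first_offsets` are invented for this example, and the reading of the byte in front of each name as a length prefix is an observation made on this dump, not a format specification.

```python
# The two dumps, copied from the listings above.
WM_DUMP = """
01 00 64 1B 69 04 4D 41 49 4E 64 6B 33 74 68 69
73 20 62 65 63 6F 6D 65 73 20 74 68 65 20 46 69
6C 65 53 61 76 65 41 73 20 66 6F 72 20 74 68 65
20 67 6C 6F 62 61 6C 20 74 65 6D 70 6C 61 74 65
64 2F 69 03 64 6C 67 34 67 54 00 64 2C 2D 2A 69
04 62 61 69 6C 64 3E 69 03 64 6C 67 64 3F 69 03
64 6C 67 64 1D 69 03 64 6C 67 73 CB 00 0C 6C 00
00 1E 69 03 64 6C 67 73 CB 00 0C 6C 01 00 64 69
"""

W97M_DUMP = """
8F 14 00 00 00 00 00 00 5A 00 EA 04 40 00 00 00
5A 00 EA 04 58 00 00 00 D8 00 00 00 33 00 74 68
69 73 20 62 65 63 6F 6D 65 73 20 74 68 65 20 46
69 6C 65 53 61 76 65 41 73 20 66 6F 72 20 74 68
65 20 67 6C 6F 62 61 6C 20 74 65 6D 70 6C 61 74
65 00 FF FF FF FF FF FF 5A 00 EA 04 70 00 00 00
45 00 00 00 E5 00 AF 00 20 00 30 02 21 00 4C 02
25 00 4E 02 01 00 2E 00 4A 02 FF FF
"""

COMMENT = b"this becomes the FileSaveAs for the global template"
IDENTIFIERS = (b"MAIN", b"dlg", b"bail")


def parse_dump(text):
    """Turn a whitespace-separated hex dump into a bytes object."""
    return bytes.fromhex("".join(text.split()))


def first_offsets(data, needles):
    """Map each needle to the offset of its first occurrence (-1 if absent)."""
    return {n.decode("ascii"): data.find(n) for n in needles}


wm = parse_dump(WM_DUMP)
w97m = parse_dump(W97M_DUMP)

wm_comment_at = wm.find(COMMENT)
w97m_comment_at = w97m.find(COMMENT)
wm_ids = first_offsets(wm, IDENTIFIERS)
w97m_ids = first_offsets(w97m, IDENTIFIERS)

# In the WordBasic dump every name is stored inline; the byte in front of
# it equals the length of the name (observed here, not taken from a spec).
wm_prefix_is_length = all(
    wm[wm_ids[n.decode("ascii")] - 1] == len(n) for n in IDENTIFIERS
)

print("comment offsets (WM, W97M):", wm_comment_at, w97m_comment_at)
print("identifiers in the WM dump:  ", wm_ids)
print("identifiers in the W97M dump:", w97m_ids)
print("WM names carry a length prefix:", wm_prefix_is_length)
```

In this sample the identifier names appear inline in the WordBasic token stream, each right after its token, while none of them occurs in the VBA5 code area at all, which is consistent with the p–code referring to data stored elsewhere. The literal text of the comment is the only byte sequence from this search that turns up in both dumps.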

By now it should be obvious that the two viruses are quite different. They can be considered as "the same virus" only in the sense of Dr. Fred Cohen’s proof that all computer viruses are equivalent ([Cohen89]). However, we should keep in mind that Cohen’s proof holds only for Turing Machines—not for real computers. Also, according to it all computer viruses are equivalent—therefore, it is practically useless for the purposes of differentiating between any two viruses—not just between the upconvert and its originator. The fact that the upconversion occurs automatically cannot be a relevant factor either—after all, the creation of new variants via random corruption (see [Bontchev97]) happens automatically too (and much more often than the natural upconversion of WordMacro viruses)—and the variants created this way are considered different.

Appendix B. Known Upconverts

The following is a list of all WordMacro viruses known by July 1998, the date of writing this paper. The second column of the table indicates whether a Word97Macro upconvert of the virus is known. As can easily be seen, upconverts are created very rarely in the real world: in the more than a year and a half since the release of Office 97, the 2143 known WordMacro viruses have produced only 94 upconverts. In other words, less than 4.4% of the known WordMacro viruses have actually been upconverted in the wild.

The actual number is even a bit smaller, because some of these WordMacro viruses are encrypted, which means that their natural upconverts are never viral; obviously somebody has produced the respective Word97Macro viruses by upconverting the manually decrypted WordMacro virus (or the not–yet–encrypted first generation of the virus). There is also a handful of known Word97Macro viruses which are obviously upconverts, but whose originating WordMacro viruses are not known. They are irrelevant to the issue discussed here: since the originating viruses are unknown, detection of these upconverts could not have been implemented by simply creating them from their originating viruses.
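The share of upconverted viruses quoted above is easy to verify. The short Python check below uses only the two figures given in the text; the constant names are chosen for this example.

```python
KNOWN_WORDMACRO_VIRUSES = 2143  # figure quoted in the text (July 1998)
KNOWN_UPCONVERTS = 94           # figure quoted in the text

share = KNOWN_UPCONVERTS / KNOWN_WORDMACRO_VIRUSES
print(f"{share:.2%}")  # prints 4.39%
```

The result stays below the 4.4% bound stated above, and, as just explained, the share of genuinely natural upconverts is somewhat lower still.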
