Visual Basic 9.0 Compiler Bug
Last updated 11/23/07
I found a bug in the Visual Basic 9.0 compiler that results in particular lines of your source code not being included in the IL output. That’s extremely serious because this lack of fidelity may cause key operations not to occur and the failure is entirely silent.  Rather than continue posting in my blog, which does not version well, I’m moving to this page. I’ll keep it updated as I understand more and note the changes at the top of this page.
NOTE: This is my current understanding of the problem in real time. I do not anticipate it being entirely correct. Stay tuned for updates!
Please also note there are important caveats in the section How Narrow is Narrow? I understand a lot of what’s going on here, but these questions remain.
Thanks to the VB Team
First, I want to thank the Visual Basic team. I think they’ve done a great job in responding to this bug and in being very open about it. I’ve actually been more conservative in limiting what I’ve said based on potentially NDA emails than the team meant for me to be. No one likes to see a bug in their software, and it’s the response to the bug that matters. Since this bug has the potential to cause a bug in your software, you’ll be glad to know they’ve treated it as the most serious possible type of bug.
They’ll be posting a KB article on the bug, but before they do they have to be sure it is correct, test the mitigation tool extensively, and so on. In the meantime I can give you real-time information on how I understand the bug. Not everything here will be correct, but it’s the best information I have as of the time I update.
Common Symptoms
The symptom is that lines of code are missing from your compiled IL, so the code that runs in the field is not what you wrote. The affected calls stay missing even if you move them, place them in a try block, duplicate them twenty times, etc.
You’ll find this because your code will not do what you said, and someone (you, your testers, or your customers) will notice that the application is misbehaving.
You’ll then start debugging. If you place a breakpoint on the missing line, it will move when you run your application in debug mode. This is the marker behavior. If you place a breakpoint and it is not hit, move up what you believe to be the call stack (if it’s unfamiliar code, use Find All References), placing breakpoints until you find either a moving breakpoint or a logic error in your code. This is normal debugging behavior.
If you have a moving breakpoint, you’ve encountered the bug.
What to do if You Encounter the Bug?
To confirm that you have encountered this bug, open your code in Reflector (or ILDASM if you are macho) and look at whether the line of code exists. If it does not exist, you have encountered this bug.
If you encounter this bug, and your IL does not match the source code, please zip your project in case you have encountered a variation. Do not clean prior to the zip.
The current understanding of the bug is that it has very narrow impact. The team has the compiler to look at, so I trust their judgment on that. But if there are any variations, they need to be submitted as bugs. If you encounter a bug, please don’t take that responsibility lightly. Keep reading to see whether the scenario I describe applies to you, and in particular make sure you understand the diamond plus one reference pattern and the caveats. If your situation is different, please report the bug. Work with your organization to see what agreements it needs from Microsoft before giving it copies of the source code and compiled app. Do not post proprietary stuff on Connect. Feel free to email me and I’ll connect you directly with the right people.
Clicking the Build/Rebuild menu item does not remove the problem. Clicking Build/Clean Solution will probably provide a temporary fix, but is not a permanent fix. Do not ignore the problem. The integrity of your application is at stake.
Is your Project Excluded on the Simple Things?
Before jumping into details of how this happens, there are some basic requirements that exclude many projects. A lot of you can breathe easier after reading this. This bug only occurs when all of the following apply:
- You have at least five projects in your solution
- You have at least two root projects
- You use constrained generics
The five projects are required to build a “diamond plus one” reference pattern.
A root project is a project which no other project references and this extra root is the “one” part of the “diamond plus one” pattern name. All solutions have at least one root. Additional roots are legal for any reason, but are most often used for test projects.
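To make the definition of a root concrete, here is a minimal Python sketch that finds the roots of a solution. It assumes you have already written down which project references which; the `solution` dictionary and its project names are hypothetical and only illustrate the shape of the data.

```python
def find_roots(references):
    """Return the projects that no other project references, sorted by name.

    `references` maps each project name to the names of the projects
    it references (hypothetical data, entered by hand).
    """
    referenced = set()
    for targets in references.values():
        referenced.update(targets)
    all_projects = set(references) | referenced
    return sorted(all_projects - referenced)


# Hypothetical five-project solution with a main application and a test project.
solution = {
    "Project1": ["Project2", "Project3", "Project4"],
    "Project2": ["Project4"],
    "Project3": ["Project4"],
    "Project4": [],
    "Tests": ["Project4"],
}

roots = find_roots(solution)
print(roots)
```

A solution that reports two or more roots meets the second requirement in the list above.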
If you’re using generics, you are probably constraining them. You can find out more about generic constraints in my August column for Visual Studio Magazine. A constrained generic is one where the type parameter has a constraint specifier:
Public Class BaseBiz(Of T As IBiz)
If you meet these three requirements, I think you owe it to yourself and your team to understand the gritty details of this bug so you can determine if it applies.
Project and File (Assembly) References
To encounter this bug you need a mix of file and project references. File references are sometimes called assembly references. A project reference is one you select on the Project tab of the Add Reference dialog and a file reference is one you select on the Browse tab. You change from one type to the other by removing the reference and adding it back from the correct tab.
An important note here is that references from VB to C# are always file references, even if your C# project is part of your solution and you add it with the Project tab.
The simple fix, if the rest of the bug applies, is to replace the file references with project references. I initially thought everyone should run the mitigation tool and only use project references in VB 9.0 pre-SP1. However, I’ve been convinced that there are valid reasons to use file references. If you want to use them, understand the diamond plus one reference pattern to see if you’re still in the running for the bug.
Unfortunately you can’t determine whether they are project or file references within Visual Studio (feel free to suggest this via Help/Report a Bug). You need to open your project in Notepad or an XML editor. A file reference looks like:
<Reference Include="ClassLibrary2, Version=1.0.0.0,
     Culture=neutral, processorArchitecture=MSIL">
   <SpecificVersion>False</SpecificVersion>
   <HintPath>..\ClassLibrary2\bin\Debug\ClassLibrary2.dll</HintPath>
</Reference>
 
Project references appear in a different ItemGroup in the project as:
 
<ProjectReference Include="..\ClassLibrary1\ClassLibrary1.vbproj">
   <Project>{3F2AC8F2-6339-45F7-AF60-C4723AFE1556}</Project>
   <Name>ClassLibrary1</Name>
</ProjectReference>
 
The reason the VB team is working on a mitigation tool is that this will be so hard to check in large projects. To change from a file to a project reference (or vice versa) remove the reference and add it back via the desired tab page. Feel free to suggest a better way to check for reference type in the next version.
The fact that it’s so hard to ensure you aren’t accidentally using file references is one of the insidious things about this bug. The reference I had was from a temporarily obsolete test project.
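Until a tool is available, a short script can do the reading for you. The following is a minimal Python sketch, not the VB team’s mitigation tool. It assumes that a `Reference` item with a `HintPath` child is a file reference added from the Browse tab, and that the project file uses the standard MSBuild 2003 XML namespace; `classify_references` and `SAMPLE` are names made up for this illustration.

```python
import xml.etree.ElementTree as ET

# XML namespace used by Visual Studio 2005/2008 project files.
MSBUILD_NS = "{http://schemas.microsoft.com/developer/msbuild/2003}"


def classify_references(project_xml):
    """Return (file_refs, project_refs) found in the text of a project file.

    file_refs    -> list of (assembly name, HintPath) for <Reference> items
                    that carry a <HintPath> child (assumed to be file references)
    project_refs -> list of Include paths from <ProjectReference> items
    """
    root = ET.fromstring(project_xml)
    file_refs = []
    project_refs = []
    for ref in root.iter(MSBUILD_NS + "Reference"):
        hint = ref.find(MSBUILD_NS + "HintPath")
        if hint is not None and hint.text:
            # Include looks like "ClassLibrary2, Version=1.0.0.0, ..."
            name = ref.get("Include", "").split(",")[0].strip()
            file_refs.append((name, hint.text.strip()))
    for ref in root.iter(MSBUILD_NS + "ProjectReference"):
        project_refs.append(ref.get("Include", ""))
    return file_refs, project_refs


# Trimmed-down project file built from the two fragments shown above.
SAMPLE = r"""<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <ItemGroup>
    <Reference Include="System" />
    <Reference Include="ClassLibrary2, Version=1.0.0.0,
         Culture=neutral, processorArchitecture=MSIL">
      <SpecificVersion>False</SpecificVersion>
      <HintPath>..\ClassLibrary2\bin\Debug\ClassLibrary2.dll</HintPath>
    </Reference>
  </ItemGroup>
  <ItemGroup>
    <ProjectReference Include="..\ClassLibrary1\ClassLibrary1.vbproj">
      <Project>{3F2AC8F2-6339-45F7-AF60-C4723AFE1556}</Project>
      <Name>ClassLibrary1</Name>
    </ProjectReference>
  </ItemGroup>
</Project>"""

file_refs, project_refs = classify_references(SAMPLE)
print("File references:", file_refs)
print("Project references:", project_refs)
```

To check a real project, read the .vbproj file into a string and pass it to `classify_references`. Framework references such as `System` have no `HintPath` and are deliberately skipped.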
Understanding the Diamond Plus One Reference Pattern
If you haven’t been able to exclude your project so far, you need to understand the diamond plus one reference pattern.  You will probably have several more projects in your solution so consider using Visio or PowerPoint and drawing a picture of all of your projects. Then you can drag them around to better understand the relationships. I am not aware of any tool that does this for you.
Project 1 has references to Projects 2, 3, and 4. Projects 1, 2, and 3 each have a reference to Project 4. This makes a diamond reference. There are a few other references, but there’s the diamond. The “plus one” part of the pattern is another root project with a reference to Project 4.
Let’s add some meaningful assembly names to help you relate to my project. Keep in mind that you could have this structure for different reasons.
 
In my talk on generics I introduce a similar pattern because it’s the key pattern for implementing generic refactoring. There are two differences between the diamond plus one pattern and the code I use in that demonstration, which drops 50% of the source code. First, in the demo the generic base classes are still in the same assembly as the business objects. That’s a demoware moment, and if you have used that refactoring pattern, it’s quite likely that you moved the generic base classes into their own assembly. Second, the demo has no second root, so there is no “plus one”. If you’ve seen that talk and built real-world projects using the generic refactoring techniques, check carefully for the diamond plus one pattern.
There are probably many reasons to arrive at this pattern.
If you have this pattern, I think you should understand why this bug occurs, and then follow up by understanding some of the remaining questions.
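If drawing the picture by hand is impractical, the pattern can also be searched for mechanically. The sketch below is a rough Python illustration under stated assumptions: it looks only at direct references, it treats the pattern as exactly the shape described in this section (a top project referencing two others that both reference a shared project, plus a different root that also references the shared project), and the project names are hypothetical. Whether looser variations can also trigger the bug is still an open question.

```python
from itertools import combinations


def find_diamond_plus_one(references):
    """Return (top, left, right, shared, extra_root) tuples for each match.

    `references` maps each project name to the projects it references directly.
    """
    referenced = {t for targets in references.values() for t in targets}
    roots = sorted(set(references) - referenced)
    matches = []
    for top in sorted(references):
        for left, right in combinations(sorted(references[top]), 2):
            # Projects that both sides of the diamond reference.
            shared = set(references.get(left, ())) & set(references.get(right, ()))
            for bottom in sorted(shared):
                for extra in roots:
                    # The "plus one": another root that references the shared project.
                    if extra != top and bottom in references.get(extra, ()):
                        matches.append((top, left, right, bottom, extra))
    return matches


# Hypothetical solution matching the description above, with a test project as the extra root.
solution = {
    "Project1": ["Project2", "Project3", "Project4"],
    "Project2": ["Project4"],
    "Project3": ["Project4"],
    "Project4": [],
    "Tests": ["Project4"],
}

for match in find_diamond_plus_one(solution):
    print(match)
```

An empty result only means this exact shape was not found in the references you entered. It does not prove the solution is safe.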
Why this Bug Occurs
I started out with an attitude of not really caring what the real problem was and just wanting a mitigation strategy and a fix in SP1. I’ve come to believe that if you aren’t excluded so far, and don’t see switching to all project references (meaning reading all of your project files), you need at least some understanding of what goes wrong.
The file reference from the second root loads the copy of the file that was created by a previous build, not the one being created by this build. It does this because it loads first. Common.dll (Project 4) then compiles with updated source code, and the two copies don’t match. When the compiler handles the code in GenericBase.dll (Project 3) that relates to constrained generics, it looks for details on the constraint. It finds two copies in memory and becomes confused. At this point, the local code in the compiler should throw an error. It’s confused, and confused code should always throw an error. Unfortunately it quits instead and fails to output the line of code.
A local fix that produced a compile failure would not make us happy because it would be some error about ambiguity. The fix cannot disallow the combination because C# projects are always referenced as file references. Instead, before the compiler loads the file version, the fix will have to check whether that file will be updated during the build and, if so, use the updated version. This effectively forces a project reference whenever one is needed to avoid two out-of-sync assemblies in memory.
This is also the reason it’s an intermittent error and why Clean Solution is a temporary fix. The bug only occurs when the Common.dll being compiled differs from the file on disk.
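As a rough illustration of that last condition, the Python sketch below flags a file-referenced DLL as out of date when any source file of the project that builds it has been modified since the DLL was written. This is only a timestamp heuristic I am assuming for illustration, not how the compiler decides anything, and `file_reference_is_stale` is a made-up helper name.

```python
import os


def file_reference_is_stale(dll_path, source_paths):
    """Return True if any source file was modified after the referenced DLL was written."""
    if not os.path.exists(dll_path):
        # Nothing on disk to load (for example, right after Clean Solution),
        # so there is no out-of-date copy to collide with the new build.
        return False
    dll_time = os.path.getmtime(dll_path)
    return any(os.path.getmtime(src) > dll_time for src in source_paths)
```

You would call it with the path from a `HintPath` entry and the source files of the project that produces that assembly. A `True` result means the next build will compile a Common.dll that differs from the copy a file reference would load.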
How Narrow is Narrow?
The VB team thinks the impact of this bug will be extremely narrow, although they agree on how serious it can be when it occurs. I’m still asking questions on this and hopefully will have answers for you soon.
But for those of you not in the US, I’m posting this version on Thanksgiving morning. Thanksgiving is a holiday almost everyone spends with family (away from work) until Monday. I’m then at DevTeach Vancouver and doing an all-day session on workflow for the Los Angeles .NET Developers Group on Dec. 1. I plan to stay on top of this while I’m traveling, but need to back off how much time I’ve spent on this.
Here’s my project with the file references marked with dotted lines:
Does it really have to be a second root that loads the file? Can it be a descendant from a second root or some other branching in very complex directory structures? Is there any other way the file version of the Project 4 dll might be in memory?
Does the file reference from Project 1 to Project 2 really need to be a file reference?
Does it really have to be four files in the diamond? Are we certain to avoid the problem if Project 3 and 4 are collapsed together? What if 2 and 3 are collapsed together?
Does the constraint need to be an interface constraint or could it be a base class constraint?
Where does the concrete class that uses the constrained generic need to be? Can it be in either Project 1 or Project 2?
If this raises other technical questions in your mind, feel free to ask them via email or my blog. I’ll try to follow up with the team so people understand this problem and can respond appropriately.
Mitigation
The compiler becomes confused because of the two copies of the assembly – the one loaded from the file and the one just loaded as part of the build. Stop loading the one from file and the problem goes away. So the mitigation is simple – once you know where the file references are.
In the short term, you’ve got to evaluate your build order and examine project files. In the very near future, I anticipate the team posting a mitigation tool to evaluate solutions and offer to update references when needed.
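While waiting for that tool, one way to narrow the search is to list only the file references that point at assemblies your own solution builds, since those are the ones that can go out of sync during a build. The Python sketch below assumes you have already collected the file references per project; every name in it is hypothetical, and it is not the team’s mitigation tool.

```python
def risky_file_references(file_refs_by_project, solution_assemblies):
    """Return (project, assembly) pairs for file references to assemblies built in this solution.

    file_refs_by_project -> maps a project name to the assembly names it references by file
    solution_assemblies  -> set of assembly names produced by projects in the solution
    """
    findings = []
    for project in sorted(file_refs_by_project):
        for assembly in file_refs_by_project[project]:
            if assembly in solution_assemblies:
                findings.append((project, assembly))
    return findings


# Hypothetical input: two projects with file references, one to a third-party DLL.
file_refs_by_project = {
    "Project1": ["Project2"],
    "Tests": ["Project4", "ThirdParty.Logging"],
}
solution_assemblies = {"Project1", "Project2", "Project3", "Project4", "Tests"}

for project, assembly in risky_file_references(file_refs_by_project, solution_assemblies):
    print(project, "has a file reference to", assembly)
```

Each pair it reports is a candidate for switching to a project reference. Remember that a reference from VB to a C# project cannot be switched, because it is always a file reference.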
Summary
Many, probably most, VB programmers will never encounter this bug. Programmers that encounter it will be working with more sophisticated architectures. I want you to know about this bug so you can catch it before you’ve got code that fails to do what you want it to do. It’s nasty enough I certainly wouldn’t want to trust my test systems to find it. That’s like leaning out against a safety line. I care about the relatively small number of people that will encounter this bug, but I care more about working to ensure that every VB developer can guarantee that they will not encounter this bug.
Links
No links yet. I’ll link here to the KB and other resources as they become available.
Copyright (c) 2008 The Source For Information On Code Generation in Microsoft .NET