Localizing linked images in Word (and horrible interfaces)

A couple of weeks back, I was struggling with a set of Word documents that had external links to images (the Word equivalent of the HTML img element). I needed to break those links and embed the binary image data in the documents themselves. There were thousands of documents, so an automated solution was required.

Now, what's wrong with the following Word automation code (pseudocode, but it's the same in C#, VBA or whatever)? More specifically, why did it make the images vanish? For the record, practically the same code had been seen to work elsewhere.

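foreach (Shape s in document.Shapes) {
    if (s.LinkFormat != null) {
        s.LinkFormat.SavePictureWithDocument = true;  // ask Word to embed the image data
        s.LinkFormat.BreakLink();                     // then drop the external link
    }
}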

It took us quite a few hours to figure this out, so I'll spill it for you. It's all about delays. When the links point to an HTTP URI, setting SavePictureWithDocument to true seems to start a background thread that actually retrieves the images from the remote server. Since that takes some time (albeit very little), BreakLink runs and the document is closed before the download completes. Thus, the images disappear.
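The race is easier to see in a toy model. The following is a minimal simulation, not Word automation: FakeLinkedPicture, its methods and the timings are all hypothetical stand-ins, and the assumption that the download lands on a background thread is only my reading of the behavior described above.

```python
import threading
import time


class FakeLinkedPicture:
    """Hypothetical stand-in for a linked Word picture; NOT the real Word API."""

    def __init__(self, download_seconds):
        self.download_seconds = download_seconds
        self.linked = True
        self.embedded_data = None

    def save_picture_with_document(self):
        # Returns immediately; the simulated download continues in the background.
        threading.Thread(target=self._download, daemon=True).start()

    def _download(self):
        time.sleep(self.download_seconds)  # simulated HTTP round trip
        if self.linked:
            self.embedded_data = b"image bytes"

    def break_link(self):
        # Cuts the link at once; a download still in flight has nowhere to land.
        self.linked = False

    def is_visible(self):
        return self.linked or self.embedded_data is not None


def localize(picture, pause_seconds):
    picture.save_picture_with_document()
    time.sleep(pause_seconds)  # 0 mimics the original code; >0 mimics debugger stepping
    picture.break_link()
    return picture.is_visible()


if __name__ == "__main__":
    print(localize(FakeLinkedPicture(0.2), pause_seconds=0.0))  # False: image vanished
    print(localize(FakeLinkedPicture(0.2), pause_seconds=0.6))  # True: image embedded
```

With no pause the link is gone before the data arrives; with a pause longer than the download, the picture survives.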

What made debugging worse was that the code worked fine when stepped through in a debugger: the delays from me hitting the Step button were enough for the HTTP download to complete. I believe you can imagine the confusion until we figured this out. In the end, though, the ugly hack was easy: just add a short delay (we used two seconds) before the BreakLink call. The correct fix would've been significantly more complex, so I never got that far for a one-shot application.
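One step up from a fixed sleep would be a bounded poll: keep checking some "is it done yet?" condition and give up after a deadline. The sketch below shows only that generic pattern. The wait_until helper is hypothetical, and whether Word exposes anything usable as the readiness check is an open assumption; I never found out.

```python
import time


def wait_until(is_ready, timeout_seconds, poll_seconds=0.05):
    """Poll is_ready() until it returns True or the timeout expires.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)


if __name__ == "__main__":
    started = time.monotonic()
    # Hypothetical readiness check: pretend the download takes 0.2 seconds.
    finished = wait_until(lambda: time.monotonic() - started >= 0.2,
                          timeout_seconds=2.0)
    print(finished)
```

Compared with sleeping a flat two seconds per document, this returns as soon as the condition holds and still has an upper bound when the server is slow.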

Now, which part of the SavePictureWithDocument or BreakLink API documentation warns us about this behavior? You guessed it: none whatsoever.

Most coders aren't particularly familiar with threading and parallel computing issues. Any API that starts background threads without an explicit request from the programmer is a risk to software stability. To some extent I can understand this in Word's case, but still, the behavior of the API is far from optimal. If the user has just set SavePictureWithDocument and then calls BreakLink, how likely is it that the user only wanted to trigger the HTTP request but not use its results? BreakLink could quite well block in this case until the requests are done (although granted, this might be an issue with serious server lag). At any rate, a documentation entry on these issues would be very welcome.
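To make the suggestion concrete, here is a rough sketch of what a blocking BreakLink could look like, with a timeout to cover the server lag case. It is a toy model of the behavior I would have preferred, not how Word works; BlockingLinkedPicture and the timeout_seconds parameter are invented for illustration.

```python
import threading
import time


class BlockingLinkedPicture:
    """Hypothetical model of a friendlier API; NOT how Word actually behaves."""

    def __init__(self, download_seconds):
        self.download_seconds = download_seconds
        self.linked = True
        self.embedded_data = None
        self._done = threading.Event()
        self._done.set()  # nothing is pending until a download starts

    def save_picture_with_document(self):
        self._done.clear()
        threading.Thread(target=self._download, daemon=True).start()

    def _download(self):
        time.sleep(self.download_seconds)  # simulated HTTP round trip
        self.embedded_data = b"image bytes"
        self._done.set()

    def break_link(self, timeout_seconds=None):
        # Wait for any in-flight download; keep the link if it does not finish.
        if not self._done.wait(timeout_seconds):
            raise TimeoutError("image download did not finish in time")
        self.linked = False


if __name__ == "__main__":
    picture = BlockingLinkedPicture(download_seconds=0.2)
    picture.save_picture_with_document()
    picture.break_link(timeout_seconds=2.0)    # blocks for roughly 0.2 seconds
    print(picture.embedded_data is not None)   # True
```

On timeout the link is left intact and the caller gets an exception, so a slow server costs a retry rather than a missing image.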

Just another story from a land where lack of proper documentation (and surprisingly few hits from Google, too!) costs quite a few working hours.

April 16, 2005 · Jouni Heikniemi