In the past, when I’ve wanted to do a screen capture of an application, I’ve generally fired up FRAPS or Camtasia or similar third-party software. Screen capture software tends to be a hassle, though: cost, compatibility glitches, poor performance and so on.
Another approach is to generate your screen capture video from code within your app by calling the Windows Media Encoder SDK. Windows Media Encoder can read from a range of input sources including real video devices, pre-existing video files, and the “screen capture” device we’re interested in. Then it can apply a rudimentary transform pipeline, and send the result to an encoder and on to an output sink like a file or a live Windows Media broadcast.
What you’ll need:
- Install Windows Media Encoder 9 Series (32 bit | 64 bit)
- Install Windows Media Encoder SDK
- A .NET app whose output you want to render to video. WinForms or WPF shouldn’t matter too much; I worked with WPF. In fact, I think even a Console application could be rendered this way: basically, anything you can get an HWND for (see below).
Surprising fact #1 – 64 bit is different to 32 bit
I installed the 32 bit edition of the Encoder on my 64 bit Windows 7 RC1 install. I probably could have avoided the problem below by installing the 64 bit version.
When I first instantiated the WMEncoder object from my .NET app, I got a failure to instantiate:
Time to check that the Encoder and Encoder SDK installations succeeded. Looking around for the Windows Media Encoder 9 dlls, I noticed they were in my Program Files (x86) folder, so, on a hunch, I told my .NET app to target 32-bit specifically (rather than being agnostic and just running in whatever platform the target machine is natively).
Before: Platform target set to Any CPU.
After: Platform target set to x86.
And that solved the problem.
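If you would rather fail fast with a readable message than puzzle over an instantiation failure, a small guard can run before the encoder is created. This is only a sketch: `EncoderBitness` and `Require32BitProcess` are hypothetical names, and it assumes, as on my machine, that only the 32 bit Encoder is installed. In the project file, the 32-bit-only setting corresponds to the MSBuild property `<PlatformTarget>x86</PlatformTarget>`.

```csharp
using System;

static class EncoderBitness
{
    // True when the current process is 32-bit: pointers are 4 bytes wide.
    public static bool Is32BitProcess()
    {
        return IntPtr.Size == 4;
    }

    // Hypothetical guard: call before `new WMEncoder()` when only the 32 bit
    // Encoder is installed, so a bitness mismatch fails with a clear message.
    public static void Require32BitProcess()
    {
        if (!Is32BitProcess())
        {
            throw new InvalidOperationException(
                "This process is 64-bit but the installed Windows Media Encoder is 32-bit. " +
                "Set the project's Platform target to x86.");
        }
    }
}
```

Calling `EncoderBitness.Require32BitProcess();` on the line before the encoder is instantiated is enough.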
Surprising fact #2 – COM is fiddly
When you ask the encoder to use the screen as its input stream, you can further tell it whether to grab the whole screen, just a specific part of the screen based on pixel boundaries, or a particular window.
Generally, WMEncoder does a good job of presenting a .NET friendly interface to its underlying COM horror-show. But in this particular case, things get a bit fiddly. The IWMEncVideoSource2 object which represents our screen capture source also implements IPropertyBag, which is not a Windows Media Encoder interface; it’s a generic COM thing, if I’m not mistaken. For this reason, there’s no helpful .NET interop definition for IPropertyBag in the WMEncoder SDK – you’re left to your own devices.
The solution was actually relatively simple: pinvoke.net has entries for both relevant interfaces (IPropertyBag and IErrorLog). I was able to just copy the annotated definitions straight from pinvoke.net down into my application and start coding against them.
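For reference, the declarations look roughly like this. Treat it as a sketch reconstructed from the native COM signatures rather than a verbatim copy of the pinvoke.net entries; the parameter names are my own, and it is worth checking the GUIDs and marshalling attributes against pinvoke.net before relying on them.

```csharp
using System;
using System.Runtime.InteropServices;

// Standard COM interface that a property bag uses to report errors during Read.
[ComImport]
[Guid("3127CA40-446E-11CE-8135-00AA004BB851")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface IErrorLog
{
    void AddError(
        [In, MarshalAs(UnmanagedType.LPWStr)] string pszPropName,
        [In] ref System.Runtime.InteropServices.ComTypes.EXCEPINFO pExcepInfo);
}

// Standard COM interface for reading and writing named VARIANT properties.
// Method order matters: it must match the native vtable (Read, then Write).
[ComImport]
[Guid("55272A00-42CB-11CE-8135-00AA004BB851")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface IPropertyBag
{
    void Read(
        [In, MarshalAs(UnmanagedType.LPWStr)] string pszPropName,
        [In, Out, MarshalAs(UnmanagedType.Struct)] ref object pVar,
        [In] IErrorLog pErrorLog);

    void Write(
        [In, MarshalAs(UnmanagedType.LPWStr)] string pszPropName,
        [In, MarshalAs(UnmanagedType.Struct)] ref object pVar);
}
```

With these in the project, the screen capture source can be cast to `IPropertyBag` and written to with `Write(name, ref value)`.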
Surprising fact #3 – flexible and fail-safe interfaces aren’t discoverable
By “flexible” here, I mean interfaces that rely on an untyped property bag of name/value pairs. By “fail-safe”, I mean interfaces that, when given configuration values that don’t make sense, silently fall back to a sensible default value rather than raising an error and explaining the problem.
As I mentioned above, the screen capture video source can capture either the whole screen, a particular window, or a specified rectangle.
Surfing around the SDK documentation and Google, it’s possible to find a list of property bag keys that the screen capture source takes notice of. But it wasn’t easy to find documentation of which key/value pairs must be set to elicit a certain behaviour (I never managed it). For example, the default behaviour is to capture the entire screen. There’s a “Screen” key/value pair which is #defined as “CaptureScreen” in some samples. If I want some other behaviour than whole-screen capture, do I need to set “Screen” to false? Or zero? Or null? Or “No”? (Hint: the answer’s no).
There’s a “CaptureWindow” key/value pair. If I want to capture a particular window, I guess I’m going to have to set this. Should it be true/1/“Yes”? (Hint: no). There’s also “WindowTitle”, so I’m guessing I specify the window I want by passing the window’s title in as a string to this key. (Hint: wrong).
All the above approaches seem (still seem) reasonable to me, and there are googleable examples of others making them and posting their code on the web, but they’re wrong.
Actually, you just set “CaptureWindow” to the Int32 HWND of the window you want to capture.
Moral of the story: if you’re writing an API for mass consumption, you can either:
- write a tight interface that is discoverable and complains clearly and specifically when violated, or
- write a loose interface and great documentation.
Please choose one.
Surprising fact #4 – WPF’s discoverability irritates me again
WPF is generally a vast improvement over WinForms. The main area it falls down in is discoverability. For example, if I have an Ellipse on a Canvas, and I want to set the Ellipse’s position on the canvas, here is the list of approaches I will try, in order:
- myEllipse.SetPosition(new Point(100, 50)); // Fail
- myCanvas.SetPosition(myEllipse, new Point(100, 50)); // Fail
- Rain dance // Fail
- Prayer // Fail
- Establish working group // Epic fail
- Canvas.SetLeft(myEllipse, 100); Canvas.SetTop(myEllipse, 50); // Success, but need to shower
Today’s WPF-discoverability irritation is a lot lower on the irritation scale than that old chestnut. I needed to obtain the HWND for my app window. In WinForms, the HWND is exposed directly as the Handle property on the Form. In WPF, you need to do the following, which IMHO has the same code reek as the Canvas malarkey above:
new WindowInteropHelper(Application.Current.MainWindow).Handle.ToInt32()
(Thanks to http://blogs.vertigo.com/personal/ralph/Blog/archive/2007/04/12/wpf-get-hwnd-of-window-object.aspx for that one).
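One wrinkle worth guarding against: as far as I can tell, WindowInteropHelper hands back a zero handle if the window hasn’t been shown yet, and ToInt32() throws an OverflowException in a 64-bit process if the handle doesn’t fit in 32 bits. A small helper can make both failures obvious. This is a sketch; `CaptureWindowHandle` and `ToPropertyValue` are hypothetical names, and the helper deliberately takes a plain IntPtr so it works with a WinForms `Handle` as well.

```csharp
using System;

static class CaptureWindowHandle
{
    // Converts a window handle into the boxed Int32 that gets written
    // to the property bag under the "CaptureWindow" key.
    public static object ToPropertyValue(IntPtr hwnd)
    {
        if (hwnd == IntPtr.Zero)
        {
            throw new InvalidOperationException(
                "Window handle is zero; has the window been shown yet?");
        }

        // ToInt32() throws OverflowException in a 64-bit process
        // if the handle value does not fit in 32 bits.
        return hwnd.ToInt32();
    }
}
```

In WPF the call would be `CaptureWindowHandle.ToPropertyValue(new WindowInteropHelper(Application.Current.MainWindow).Handle)`; in WinForms, pass `myForm.Handle` instead.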
Surprising fact #5 – after all that, it works!
[code:c#]
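// Requires a reference to the WMEncoderLib interop assembly, plus:
// using System; using System.Linq; using System.Windows; using System.Windows.Interop; using WMEncoderLib;
// Create an instance of WMEncoder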
_wmEncoder = new WMEncoder();
// Create a source group with an arbitrary name
IWMEncSourceGroup sourceGroup = _wmEncoder.SourceGroupCollection.Add("Screen");
// Add a video source to the source group
IWMEncVideoSource2 videoSource = sourceGroup.AddSource(WMENC_SOURCE_TYPE.WMENC_VIDEO) as IWMEncVideoSource2;
// Tell the video source that it's doing screen capture
videoSource.SetInput("ScreenCap://ScreenCapture1", null, null);
// Get access to the video source's property bag. IPropertyBag is not part of the Windows Media Encoder SDK,
// so you don't get a friendly wrapper for it and its companion IErrorLog interface just by referencing WMEncoderLib.
// I got set up by copying interface definitions with the appropriate magical COM interop annotations from pinvoke.net.
// http://www.pinvoke.net/default.aspx/Interfaces/IPropertyBag.html
// http://www.pinvoke.net/default.aspx/Interfaces/IErrorLog.html
IPropertyBag propBag = videoSource as IPropertyBag;
// I wasn't able to find clear documentation of what properties needed to be stuffed into the property bag in order
// to do screen cap of a specific window. It seems that it's only necessary to set the "CaptureWindow" property to
// the HWND of the window whose content you want to capture.
// That presents a problem - it's easy in WinForms to get at the HWND of a window, but WPF doesn't make it quite as
// discoverable - this blog entry helped:
// http://blogs.vertigo.com/personal/ralph/Blog/archive/2007/04/12/wpf-get-hwnd-of-window-object.aspx
// Note that the call to ToInt32() is essential - without it, my screen capture just grabbed the whole screen.
object val = new WindowInteropHelper(Application.Current.MainWindow).Handle.ToInt32();
propBag.Write("CaptureWindow", ref val);
// Now, we tell the source group what output encoder to use. In this case, I'm using MS's screen capture encoder,
// appropriately enough.
const string EncoderName = "Screen Video High (CBR)";
IWMEncProfile profile = _wmEncoder.ProfileCollection.Cast<IWMEncProfile>().FirstOrDefault(p => p.Name == EncoderName);
if (profile == null)
{
    throw new Exception("Couldn't find desired encoder '" + EncoderName + "'");
}
sourceGroup.set_Profile(profile);
// Set the output file name
_wmEncoder.File.LocalFileName = "ScreenCapture.wmv";
// Begin capture - don't forget to call Stop() when you're done.
_wmEncoder.Start();
[/code]