Digital Compositing, as we are going to be discussing it, attempts primarily to deal with the process of integrating images from multiple sources into a single, seamless whole. While many of these techniques apply to still images, we will be looking at tools and methods which are useful and reasonable for large sequences of images as well.
In the first half of this document, we will deal more with the Science of Digital Compositing. The second half will deal with a few of the more complex (or at least misunderstood) issues on the topic, as well as look at techniques that deal with the Art of Digital Compositing.
As you will see, the skills of a good compositor range from technician to artist. Not only does one need to understand the basic 'tools of the trade', which can include a variety of software, one must also be aware of the visual nature of the process.
Remember, the bottom line is that all the technical considerations are unimportant when confronted with the question of "Does it look right?" Obviously this is a subjective judgement, and a good compositor able to make these decisions will always be in high demand.
By far, the most difficult part of this process is producing the integrated result - an image which doesn't betray that its creation is owed to multiple source elements.
In particular, we are usually attempting to produce (sequences of) images which could have been believably photographed without the use of any post-processing. Colloquially, it should look 'real'. Even if the elements in the scene are obviously not real (huge insects living inside a giant peach, for example), one must be able to believe that everything in the scene was photographed at the same time, by the same camera.
We will be discussing the manipulations needed to achieve this combination, and the various tools necessary to achieve the desired result. In the digital world, which is the world we're interested in for the bulk of today's discussion, these tools are specifically the software needed to create a composite. Keep in mind that compositing was being done long before computers entered the picture (pardon the pun). Optical compositing is still a valid and often-used process, and many of the techniques and skills developed by optical compositors are directly applicable to the digital realm (in many cases, digital techniques owe their origin directly to optical methodologies).
Finally, remember that every person who views an image has a little expert he carries around with him. This expert, the subconscious, has spent a lifetime learning what looks 'real'. Even if, consciously, the viewer is unable to say why a composite looks wrong, the subconscious can notice 'warning signs' of artificiality. Beware.
We'll also be discussing (particularly during the second half of this course) some of the things that should be done before, during and after the creation of the original elements to ensure that they are as useful as possible.
Before we go any further, let's take a look at a still-frame composite and define some naming conventions.
Example 1 shows a composite from the Walt Disney feature film James and the Giant Peach. [Director: Henry Selick, Skellington Productions]
This composite was created from a multitude of different original images. We usually refer to the individual pieces from which we create our final composite as 'Elements'. Elements in this composite include:
There are many, many other elements in this composite as well (reflections, smoke, shadows, spray, etc.). Most elements have had some sort of additional processing performed on them, such as color-correction, scaling or blurring.
You may also commonly hear elements referred to as 'layers'. A subset of elements, called 'plates', usually refers to original scanned sequences. Intermediate elements generated during the creation of a composite are generally not referred to as plates.
As stated, a composite is the 'manipulated' combination of elements. This 'manipulation' is usually some form of digital image processing such as color correction or matte creation. We'll discuss various image processing techniques in Chapter 4. Mattes, which are used either to isolate or remove certain areas of an image before it is combined with another image, will be discussed in Chapter 5.
There is one final piece of business to be dealt with, before we go any further:
Deal with it.
Elements generally come from one of three places: hand-painted, human-generated synthetic elements (which can range in complexity from a simple black-and-white mask to a photo-realistic matte painting of an imaginary scene); computer-generated images (rendered elements from a 2-D or 3-D animation package); or images that have been scanned into the computer from some other source (typically film or video). This is very simplified: CG elements may contain scanned or painted elements (as texture maps, for instance), matte paintings often borrow heavily from other sources, and live-action elements may be augmented with hand-painted 'wire removals' or 'effects animations'.
We'll make the assumption that most everyone here has some idea of how CG elements are rendered. The topic is certainly beyond the scope of this discussion. The major distinguishing factor between CG and original scanned images (for the purposes of compositing) is the fact that CG elements are usually generated with a matte channel. Scanned images, unless they're being used solely as a background, will need to have some work done with them to create a matte channel.
Images which were created by non-digital methods will need to be 'scanned', or 'digitized'.
Sequences of scanned images will probably come from one of two places: video or film. Video images, captured with a video camera, can simply be passed through an encoder to create digital data. High-end video storage formats, such as D-1, are already digitally encoded, and you merely need the equipment to transfer the files to your compositing system.
Since even the best video equipment is limited to less than 8 bits of information per component, it is generally unnecessary to store digital images which came from a video source at more than 24 bits, although as you'll see, there are some special considerations, even when storing 8-bit video data, to ensure the highest color fidelity.
Digitizing images which originated on film necessitates a somewhat different approach: a film scanner is required. Until very recently, film scanners were typically custom-made, proprietary systems that companies built for their own internal use. Within the last few years, companies such as Kodak and Imagica have begun to provide off-the-shelf scanners to whoever wishes to buy them.
There are dozens, probably hundreds, of different ways of storing an image digitally. But in day-to-day usage, most of these methods have several things in common. First, images are stored as an array of dots, or pixels. The larger the number of these pixels, the greater the 'resolution' (more properly, 'spatial resolution') of the image.
Each pixel is composed of 3 components: red, green and blue (usually abbreviated to R, G and B). By combining these 3 primary colors in different proportions, we can represent a very wide range of the colors the eye can see. When we need to refer to the full-sized array of pixels for a single component, it is known as a 'channel'.
Consider an image such as that shown in Example 2.
Example 3 is the 'Green Channel' of this sample image. Note that the brightest areas of the green channel are the areas that have the highest green component in the original image.
Each component of a pixel can be represented by an arbitrary number of bits. The number of bits-per-component is known as the channel's 'Bit-Depth'. Probably the most common bit-depth is 8 bits-per-channel, which is also referred to as a 24-bit image (8 bits each for the Red, Green and Blue channels).
8 bits per component means that each component can have 256 different possible intensities (2^8), and the 3 components together can represent about 16 million (16,777,216 for the geeks) colors. Although this sounds like a lot of colors, we'll discuss later why it is often still not enough. Feature film work, for instance, often represents digital images with as much as 16 bits per component, which gives us a palette of about 281 trillion (2^48) different colors. In contrast, lower-end video games may work with only 4 bits per channel, or less!
Example 4 shows our original image decimated to 4 bits-per-channel. Notice the 'quantizing' that arises. This phenomenon, also known as 'banding' or 'contouring' or 'posterization', arises when we do not have the ability to specify enough values for smooth transitions between colors.
Since different images may be stored at different bit-depths, it is convenient to normalize all values to floating-point numbers in the range of 0 to 1. Throughout this document we will assume that an RGB triplet of (1,1,1) refers to a 100% white pixel, a pixel that is (0,0,0) is pure black, and (0.5, 0.5, 0.5) is 50% gray.
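The bit-depth arithmetic and the banding seen in Example 4 can be checked with a few lines of Python. This is a minimal sketch: the function names are our own, pixel values are plain floats, and quantization simply rounds to the nearest available level.

```python
def color_count(bits_per_component, components=3):
    """Number of distinct colors: 2 ** bits possible values per component."""
    return (2 ** bits_per_component) ** components


def normalize(code, bits):
    """Map an integer code value (0 .. 2**bits - 1) into the 0-1 range."""
    return code / (2 ** bits - 1)


def quantize(value, bits):
    """Snap a normalized 0-1 value to the nearest level available at this bit depth."""
    levels = 2 ** bits - 1          # 255 steps at 8 bits, only 15 at 4 bits
    return round(value * levels) / levels


print(color_count(8))    # 16777216
print(color_count(16))   # 281474976710656

# A smooth 64-step gray ramp, decimated to 4 bits per channel as in Example 4.
ramp = [i / 63 for i in range(64)]
banded = [quantize(v, 4) for v in ramp]
print(len(set(banded)))  # 16 distinct grays remain, so the ramp shows bands
print(normalize(255, 8), normalize(0, 8))   # 1.0 0.0
```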
In addition to the 3 color channels, there is often a 4th channel, the alpha channel, which can be stored with an image. It is used to determine the transparency of various pixels in the image. This channel is also known as the 'matte' channel, and as you'll come to see, it is the concept of 'matte' upon which nearly all compositing is based.
Now that we have a digital representation of an image, we need to store it in a file. There is a huge array of different formats in which one may choose to store an image. Formats vary in their ability to handle things like:
We've included a list of the more popular file formats, along with some of their different characteristics, at the end of this paper.
Because high-resolution images can take up a huge amount of disk space, it is often desirable to compress them. There are a number of techniques for this, some of which decrease the quality of the image (referred to as 'lossy compression') and some of which maintain the full quality of the image ('lossless compression').
Always be aware of whether or not you are storing in an image format, or using a form of compression, which could degrade the quality of your image!
In addition to standard data-compression algorithms, there is also a technique whereby images are pre-processed into a nonlinear color space.
In order to fully understand all of the terms involved with the concept of storing images in 'linear' or 'logarithmic' format, we need to go over a few more basic concepts about how images are stored digitally.
If we had an unlimited number of bits-per-pixel, nonlinear representations would become unnecessary. However, practical considerations of available disk space, memory usage, speed of calculations and even transfer/transmission methods, all dictate that we attempt to store images as efficiently as possible, keeping the minimum amount of data necessary to realize the image quality we find acceptable.
Encoding an image into 'nonlinear' space is driven by the need to store the maximum amount of useful information in the precision or bit-depth we have decided upon. Note that we have made the distinction that we wish to store useful information. How do we decide what is useful and what isn't? The decision is usually based (at least partially) on the knowledge of how the human eye works. In particular, the human eye is far more sensitive to color- and brightness-differences in the low- to mid-range than it is to changes in very bright areas.
Nonlinear encoding is useful in a variety of situations, and whether you work in film or video, you will undoubtedly need to deal with the process. In the video world, this nonlinear conversion is known as a gamma correction. Typically, video images are stored with a gamma correction of 2.2, and re-conversion to linear space is done by merely applying the inverse, gamma 0.45, to the image.
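Here is a minimal sketch of that video round trip, using the gamma operator defined later in this document (O = I^(1/Gamma)). We assume a pure power curve; real video standards such as Rec. 709 also include a short linear segment near black, which this ignores.

```python
def apply_gamma(value, gamma):
    """The gamma operator used later in this document: O = I ** (1 / gamma)."""
    return value ** (1.0 / gamma)


linear = 0.18                                  # a dark mid-tone in linear light
encoded = apply_gamma(linear, 2.2)             # stored value is brighter (about 0.46)
decoded = apply_gamma(encoded, 1.0 / 2.2)      # the inverse gamma, roughly 0.45
print(round(encoded, 3), round(decoded, 3))
```

Black (0.0) and white (1.0) pass through both steps unchanged; only the values in between are redistributed.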
For film images, a more complex conversion is often used, which takes into account various idiosyncrasies of how film stock responds to varying exposure levels. The most common of these conversions is specified by Kodak for their 'Cineon' file format, and is colloquially known as storing images in 'Logarithmic Color Space', or simply 'Log Space'. Kodak's Log Space also includes room for data that may fall outside the range of 'White' or 'Black' when the digital data is actually put back on film, but which needs to be preserved for intermediate color correction. The Cineon format additionally reduces the file's size by packing three 10-bit channels into 32 bits of data.
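Three 10-bit channels need 30 bits, so they fit in a single 32-bit word with 2 bits to spare. The sketch below shows the bit arithmetic; the particular layout (red in the highest bits, two unused bits at the bottom) is an assumption for illustration rather than something taken from the Cineon specification.

```python
def pack_10bit(r, g, b):
    """Pack three 10-bit code values (0-1023) into one 32-bit word.

    Assumed layout: red in bits 31-22, green in bits 21-12, blue in bits 11-2,
    leaving the two lowest bits unused (3 x 10 = 30 of the 32 bits).
    """
    for code in (r, g, b):
        if not 0 <= code <= 1023:
            raise ValueError("10-bit code values must lie in 0..1023")
    return (r << 22) | (g << 12) | (b << 2)


def unpack_10bit(word):
    """Recover the three 10-bit code values from a packed 32-bit word."""
    mask = 0x3FF                                  # ten 1-bits
    return (word >> 22) & mask, (word >> 12) & mask, (word >> 2) & mask


word = pack_10bit(1000, 512, 3)
print(hex(word), unpack_10bit(word))
```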
For our example of nonlinear encoding, we'll look at the extremely simplified case of wishing to take an image which originated as a 4-bit grayscale image and store it as efficiently as possible in a 3-bit file format. Once you understand this scenario, you can mentally extrapolate the process to real-world situations where we deal with greater bit-depths.
If our original image starts as a 4-bit grayscale image, we have 16 different grayscale values that we can represent. Our 3-bit destination image has only eight different grayscale values. The simplest conversion would be merely to take colors 1 and 2 from the input range and convert them to color value 1 in the output image. Colors 3 and 4 would both become color 2, 5 and 6 would become 3, and so on. This mapping is shown as follows:
The problem with this method is that it ignores the fact that the human eye is less sensitive to differences in tone as brightness increases. It is hard to demonstrate with only 16 colors, but consider if we had 100 colors to choose from. You would find that, visually, it would be almost impossible to distinguish the difference between 99% and 100% white. In the darker colors, however, the difference between 0% and 1% would still remain noticeable. Thus, a better way to convert from 16 colors to 8 colors would be to try and consolidate a few more of the upper-range colors together, while preserving as many of the steps as possible in the lower range. The next graph shows this type of mapping.
The small inset shows a graph (solid line) of this mapping, as well as an approximation (dotted line) of a lookup table curve that would accomplish this same color-correction. (Note the similarity between this curve's shape and a gamma-correction curve). If we were to view this new image directly, it would appear to be far too bright, since we've shifted mid-range colors in the original image to be bright colors in our encoded image. To properly view this image, we would either need to re-convert back to linear color space or modify our viewing device (i.e. the video- or computer-monitor) so that it compensates for this compression.
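To make the 16-to-8 example concrete, the sketch below builds both mappings in Python. The linear version is exactly the pairing described above (colors 1 and 2 become 1, 3 and 4 become 2, and so on). For the nonlinear version we assume a gamma-like curve with a gamma of 1.5; that value is our own choice, picked so that every output code is still used, and is not read off the graph.

```python
def linear_map(code):
    """Input codes 1-16 -> output codes 1-8, two inputs per output."""
    return (code + 1) // 2


def nonlinear_map(code, gamma=1.5):
    """Input codes 1-16 -> output codes 1-8 through a gamma-like curve,
    O = I ** (1 / gamma), which spends more output codes on the darks."""
    normalized = (code - 1) / 15             # 0.0 .. 1.0
    return 1 + round(7 * normalized ** (1.0 / gamma))


print([linear_map(c) for c in range(1, 17)])
# [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8]
print([nonlinear_map(c) for c in range(1, 17)])
# [1, 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8]
```

In the nonlinear version the two darkest input colors keep separate output values, while three bright inputs at a time are merged into outputs 6 and 7.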
The bottom line, and the most commonly misunderstood fact about representing data in Logarithmic or other nonlinear formats, is that:
The conversion between Log and Linear is simply a custom color-correction.
It is a method of consolidating certain color ranges together so that they take up less of your color palette, thus leaving room for the other, more important ranges to keep their full resolution, even when reducing the number of bits used to store a given image.
Because log encoding is only a color correction, any file format can be used to store logarithmic data. It also means that there is effectively no way to determine whether an image is in log or linear space, other than by visually comparing it with the original scene.
Although you may have chosen to store images in a nonlinear format, when it comes to actually working with these images, you will almost always want to convert them back to linear space first. The reason has to do with the need to perform additional color-correction on these elements. Here's another warning:
Consider Example 5 , and a simple color correction where the red channel has been multiplied by a constant value of 0.8. The left side of the image was corrected in linear space, and comparing any pixel with the original image in Example 2 will show a red value that has been reduced by 20%. The right side of the image, however, was first encoded into log-space, then the red channel was multiplied by 0.8. The image was then restored to linear space. As you can see, a fairly slight color correction in linear space has become rather drastic when applied to a log-encoded image. In particular, notice how the mid-tones have changed quite a bit more than the darker areas of the image. This problem can be particularly vexing when working with bluescreen elements: Attempts to reduce blue spill may result in undesirable shifts in flesh-tones, for instance.
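To reproduce the effect of Example 5 numerically, the sketch below uses a Cineon-style log conversion with commonly quoted default parameters (reference white 685, reference black 95, 0.002 density per code value, negative gamma 0.6). These values, and the function names, are our assumptions; the exact curve used to make Example 5 is not given here. A factor of 0.8 is applied once in linear space and once to the log-encoded value.

```python
import math

# Commonly quoted default Cineon conversion parameters (assumed here;
# real pipelines often adjust them):
REF_WHITE = 685          # 10-bit code value that maps to linear white (1.0)
REF_BLACK = 95           # 10-bit code value that maps to linear black (0.0)
STEP = 0.002 / 0.6       # density per code value divided by the negative's gamma
OFFSET = 10 ** ((REF_BLACK - REF_WHITE) * STEP)


def lin_to_log(lin):
    """Linear 0-1 value -> unquantized 10-bit log code value."""
    return REF_WHITE + math.log10(lin * (1 - OFFSET) + OFFSET) / STEP


def log_to_lin(code):
    """10-bit log code value -> linear 0-1 value."""
    return (10 ** ((code - REF_WHITE) * STEP) - OFFSET) / (1 - OFFSET)


def multiply_in_log(lin, factor):
    """Encode to log, scale the code value (same as scaling it normalized to 0-1), decode."""
    return log_to_lin(lin_to_log(lin) * factor)


for lin in (0.05, 0.5):
    print(lin, "linear space:", round(lin * 0.8, 3),
          "log space:", round(multiply_in_log(lin, 0.8), 3))
```

Under these assumptions, the log-space multiply takes far more than 20% off both values, and proportionally more off the mid-tone than off the dark value, which is the pattern described for Example 5.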
Here, the X axis represents the value of pixels in the original, source image. The Y axis is the value the pixel has been converted to after applying our color correction operator. Thus, an uncorrected image would look like the above graph.
Let's assume that we take an image and multiply every pixel by a certain 'brightness'. In the case of Example 6, we apply a brightness of 2.0.
The graph of this operation looks like this:
As you can see, a pixel value of 0.5 (mid-tone gray) would map to a pixel value of 1.0 (white) in the new image. This particular example assumes that the same operator is being applied to all 3 channels equally. This need not be the case. We might apply a brightness of 0.1 to the red channel, and a brightness of 1.5 to the green channel and leave the blue channel unmodified (i.e. a brightness of 1.0). This is shown in Example 7 , and the resulting graph would look like this:
As mentioned before, some of the operators we will discuss here may be referred to by different names, depending on the software you are using.
For some of the more simple tools, we may give a brief equation to describe the process. As usual, pixel values are assumed to range from 0 to 1. Any values which are pushed outside of this range would be clipped.
We will not, as a rule, discuss the math behind every operator, but there are certain basic tools which we will cover in greater detail so that they can be used as fully as possible. In these cases, we will use the conventions:
I = Input image
O = Output image
Thus, O = I * 2.0 would refer to the example above where every pixel in the input image is multiplied by a brightness of 2.0 to produce the output image.
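The brightness operator and the clipping rule above translate directly into code. A minimal sketch, with pixels as (R, G, B) tuples of 0-1 floats and a separate factor per channel; the names are our own.

```python
def clip(value):
    """Clamp a value into the legal 0-1 range."""
    return min(max(value, 0.0), 1.0)


def brightness(pixel, r=1.0, g=1.0, b=1.0):
    """O = I * k for each channel of an (R, G, B) pixel, clipped to 0-1."""
    return tuple(clip(c * k) for c, k in zip(pixel, (r, g, b)))


gray = (0.5, 0.5, 0.5)
print(brightness(gray, 2.0, 2.0, 2.0))   # (1.0, 1.0, 1.0): mid-gray becomes white
print(brightness(gray, 0.1, 1.5, 1.0))   # per-channel settings like Example 7
```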
Finally, a couple of warnings:
1) Just about any color correction operation on an image loses some data, if for no other reason than round-off error.
In other words, 'Digital' does not equal 'Lossless'!
A particularly vivid example is shown in Example 8: an image which has had a brightness of 2.0 applied, and then a brightness of 0.5 applied to the result. As you can see, a significant amount of data is lost. In certain situations, some of the newer software may be smart enough to analyze multiple color corrections like this and consolidate them into a single, more precise operation. (In the example above, the two color corrections would cancel each other out.)
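The loss in Example 8 comes from clipping and rounding each intermediate result. The sketch below assumes every intermediate result is stored as an 8-bit value, applies a brightness of 2.0 and then 0.5, and counts how many distinct gray levels survive. The function names are hypothetical.

```python
def store_8bit(value):
    """Clip a 0-1 value and round it to the nearest 8-bit code (0-255)."""
    return round(min(max(value, 0.0), 1.0) * 255)


def load_8bit(code):
    """Convert an 8-bit code back to the 0-1 range."""
    return code / 255


original = [load_8bit(c) for c in range(256)]                   # every 8-bit gray
brighter = [load_8bit(store_8bit(v * 2.0)) for v in original]   # brightness 2.0
restored = [load_8bit(store_8bit(v * 0.5)) for v in brighter]   # then brightness 0.5

print(len(set(original)), len(set(restored)))
print(max(restored))   # everything above mid-gray has collapsed to one value
```

Every value that was brighter than mid-gray before the first correction ends up identical, since it was clipped to white and then halved.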
2) If you create any intermediate, pre-processed images or sequences (known as 'Pre-Comps') which you need to store to disk, be sure you use a file format that has as much precision as your compositing system.
With all of the background information out of the way, we're ready to start discussing the basic tools which we use when compositing a scene.
We'll first discuss unary operators: operators which take a single sequence as input and produce a new, modified output sequence. This is in contrast to operators which take two or more input sequences to produce an output (Multi-Image Operators).
Let's look at some common operators. We will usually consider the case where all 3 color channels of an image are being affected equally, but in the real world it's not uncommon for different channels to be modified independently.
Multiply applies an overall scale factor to the channels of the image. When applied equally to all 3 channels, it is also known as a Brightness. We've already seen a couple of examples of this operator.
Add: Instead of affecting the apparent brightness of an image by multiplying, we add a constant value to (or subtract one from) each pixel.
Example 9 has had a value of 0.2 added to each channel. The graph would look like:
Notice that our blacks have gone to gray, or 'washed out,' something that is often undesirable.
Gamma: Also known as an exponential curve (strictly speaking a power function, since the input is raised to the power 1/Gamma), this operator uses the following function:
O = I^(1/Gamma)
Thus, a pixel whose initial value is 0.5 will end up with a new value of 0.707 if a gamma of 2.0 is applied.
A gamma of 2.0, shown in our graph format, would look like this, with the resulting image shown in Example 10.
The reason that the gamma operator is so useful is that it does not change the white or black point of an image. There is less chance for data loss, and images look more natural. Blacks don't change to gray and whites don't blow out or get stepped on.
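A short check of the gamma formula, compared with the Add operator described earlier. A minimal sketch; the function names are ours, and the add operator here clips its result to 0-1.

```python
def gamma(value, g):
    """Gamma operator from the text: O = I ** (1 / g), for values in 0-1."""
    return value ** (1.0 / g)


def add(value, offset):
    """Add operator, clipped to 0-1: black is lifted toward gray."""
    return min(max(value + offset, 0.0), 1.0)


for v in (0.0, 0.25, 0.5, 1.0):
    print(v, round(gamma(v, 2.0), 3))
# 0.0 and 1.0 are unchanged; 0.25 -> 0.5 and 0.5 -> 0.707

print(add(0.0, 0.2), gamma(0.0, 2.0))   # Add washes black out to 0.2; gamma keeps 0.0
```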
Please, beware of the term 'Gamma' - it is one of the most over-used letters in the Greek alphabet. Ambiguity and confusion can arise because the term is also used to refer to a variety of totally unrelated nonlinear functions.
Contrast is a method of changing the brightness relationship between the upper and lower color ranges of an image.
Increasing the contrast causes dark areas to get darker and bright areas to get brighter. A contrast operator can be as simple as a straight line steepened around a midpoint, though a better system is to apply gamma-like curves to the upper and lower ranges. Ideally, we can also explicitly choose the boundary between what are considered the 'highs' and the 'lows'. This is particularly useful in images which are either very bright or very dark.
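Since the original contrast formulas are not reproduced here, the sketch below shows two hypothetical versions of the ideas just described: a straight line steepened around a pivot, and gamma-like curves applied separately below and above a user-chosen pivot. The function names and the exact curves are our own assumptions, not the formulas of any particular package.

```python
def clip(value):
    """Clamp a value into the legal 0-1 range."""
    return min(max(value, 0.0), 1.0)


def linear_contrast(value, amount, pivot=0.5):
    """Steepen a straight line around `pivot`; values pushed past 0-1 are clipped."""
    return clip((value - pivot) * amount + pivot)


def curve_contrast(value, amount, pivot=0.5):
    """Gamma-like curves on each side of `pivot` (0 < pivot < 1; amount > 1 adds contrast).

    Black, white and the pivot itself stay where they are, so nothing is clipped.
    """
    if value <= pivot:
        return pivot * (value / pivot) ** amount
    return 1.0 - (1.0 - pivot) * ((1.0 - value) / (1.0 - pivot)) ** amount


print([round(curve_contrast(v, 2.0), 3) for v in (0.0, 0.25, 0.5, 0.75, 1.0)])
# A biased contrast for a dark image: the highs/lows boundary moved down to 0.3.
print([round(curve_contrast(v, 2.0, pivot=0.3), 3) for v in (0.1, 0.3, 0.6)])
```

The curve version never pushes values outside 0-1, for the same reason the gamma operator does not: both ends of each curve are pinned.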
Example 11 has had a fair amount of increased contrast applied to it.
There are many different algorithms for blurring an image, all of which produce slightly different results, but the basic concepts remain the same.
Example 12 and Example 13 show progressively more blur.
Certain blur algorithms (those based on integer convolutions) animate very poorly - visual 'steps' between blur levels can be seen - making 'rack-focus' effects unacceptable.
Certain blur algorithms can be very slow, particularly for large amounts of blur.
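Blurring is a convolution: each output pixel is a weighted average of its neighbors. The sketch below uses a Gaussian kernel, one common choice among the many blur algorithms mentioned above, on a single row of pixels. Because sigma is a real number rather than an integer kernel width, the amount of blur can be animated in small, continuous increments. A 2-D Gaussian blur is usually done as two such passes, one over rows and one over columns. The function names are our own.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian weights; sigma may be any positive real number."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    radius = max(1, math.ceil(3 * sigma))          # weights beyond 3 sigma are negligible
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_row(row, sigma):
    """Blur one row of pixel values, repeating the edge pixels past the ends."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    last = len(row) - 1
    out = []
    for i in range(len(row)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)  # clamp the sample position
            acc += w * row[j]
        out.append(acc)
    return out


edge = [0.0] * 6 + [1.0] * 6
for sigma in (1.0, 1.25, 1.5):                     # fractional steps animate smoothly
    print(sigma, [round(v, 3) for v in blur_row(edge, sigma)])
```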
Given the fact that we can blur images, it also becomes desirable to be able to sharpen images. To a certain extent, it is possible, although it's somewhat of a trick. The sharpen operator actually works by increasing the contrast between areas of transition in an image, which the eye perceives as sharpness. Keep in mind that sharpening tools can never actually create or restore lost information. The trick only works up to a certain point, and results can often include undesirable artifacts.
Example 14 has had a slight sharpening applied, with subtle but noticeable results.
Example 15 has had a far greater amount of sharpening applied, to demonstrate the type of problems that can show up. In particular, you will see noticeable 'ringing' along strong transition areas, such as the hair on her forehead.
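One common way to implement sharpening is unsharp masking: subtract a blurred copy from the image and add the difference back in, which exaggerates contrast at transitions. This is a minimal sketch on one row of pixels using a small [1/4, 1/2, 1/4] blur (our choice); the overshoot it produces at a hard edge is the kind of 'ringing' visible in Example 15.

```python
def soft_blur(row):
    """A small [1/4, 1/2, 1/4] blur, clamping at the row's ends."""
    n = len(row)
    return [0.25 * row[max(i - 1, 0)] + 0.5 * row[i] + 0.25 * row[min(i + 1, n - 1)]
            for i in range(n)]


def unsharp(row, amount):
    """Unsharp masking: O = I + amount * (I - blur(I)); results are left unclipped."""
    blurred = soft_blur(row)
    return [v + amount * (v - b) for v, b in zip(row, blurred)]


edge = [0.2] * 4 + [0.8] * 4
print([round(v, 3) for v in unsharp(edge, 1.0)])
# The dark side dips below 0.2 and the bright side overshoots 0.8: ringing.
```

Flat areas come through unchanged, because there the image and its blurred copy are identical; only the transition is altered. No new detail is created.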
HSV - Colorspace and operations:
Up until now, we've always defined the color of an image as being based on R, G and B components. But there is an entirely different method of representing (and manipulating) the colors of an image, known as the HSV color space. It is, in many ways, a much more intuitive method of dealing with color. HSV refers to the Hue, Saturation and Value of a pixel.
The Hue of a pixel refers to its basic color - red or yellow or violet or magenta, for instance. It is usually represented in the range of 0 - 360, referring to the color's location (in degrees) around a circular color palette.
Saturation is the purity, or intensity, of a pixel's color: a fully saturated pixel is a pure hue, while a pixel with zero saturation is a neutral gray.
Value, for the most part, can simply be thought of as the brightness of a pixel.
Example 16 has had its Saturation reduced by about 50%.
Example 17 has had its Hue rotated by 180 degrees through the color spectrum.
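Python's standard colorsys module converts between RGB and HSV, which is enough to sketch both operations shown above. Note that colorsys expresses hue as 0-1 rather than 0-360 degrees, so the rotation is converted; the adjust_hsv function itself is our own illustration.

```python
import colorsys


def adjust_hsv(rgb, hue_degrees=0.0, saturation_scale=1.0):
    """Rotate the hue (in degrees) and scale the saturation of a 0-1 RGB pixel."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)        # colorsys uses 0-1 for hue, not 0-360
    h = (h + hue_degrees / 360.0) % 1.0
    s = min(max(s * saturation_scale, 0.0), 1.0)
    return colorsys.hsv_to_rgb(h, s, v)


print(adjust_hsv((1.0, 0.0, 0.0), hue_degrees=180))        # red -> cyan: (0.0, 1.0, 1.0)
print(adjust_hsv((1.0, 0.0, 0.0), saturation_scale=0.5))   # half-saturated red: (1.0, 0.5, 0.5)
```

A neutral gray has no hue to rotate, so it passes through a hue rotation unchanged.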
The next class of operators we will be discussing all fall under the category of transformations. These would include panning, rotating, and scaling. Panning an image is simply repositioning it in X,Y and sometimes Z space. The pan can either be static or dynamic.
Scale and rotation of an image are usually defined relative to some user-specified center point, to eliminate the need for additional panning.
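Scaling and rotating about a center point amounts to moving that point to the origin, transforming, and moving it back. A minimal 2-D sketch for a single coordinate, assuming a y-up coordinate system and a hypothetical function name; to transform a whole image, a system typically runs the inverse of this mapping for each output pixel and samples the source image there. A negative scale factor produces the mirror images discussed below.

```python
import math


def transform_point(x, y, cx, cy, sx=1.0, sy=1.0, angle_degrees=0.0):
    """Scale by (sx, sy), then rotate counter-clockwise, about a center (cx, cy)."""
    a = math.radians(angle_degrees)
    dx, dy = (x - cx) * sx, (y - cy) * sy          # center -> origin, then scale
    rx = dx * math.cos(a) - dy * math.sin(a)       # rotate about the origin
    ry = dx * math.sin(a) + dy * math.cos(a)
    return rx + cx, ry + cy                        # origin -> center again


print(transform_point(2.0, 1.0, 1.0, 1.0, angle_degrees=90))   # approximately (1.0, 2.0)
print(transform_point(3.0, 5.0, 2.0, 5.0, sx=-1.0))           # mirrored: (1.0, 5.0)
```

The center point itself never moves, which is why no extra pan is needed to keep the image in place.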
In most sophisticated systems, all of these operators can be performed in a 3D space. 3D moves are really just methods of distorting an image so that it appears to have the perspective changes one would expect. Since this process is fairly intuitive, we won't spend a lot of time detailing it. Example 18 demonstrates a few types of transformations on an image.
Note that we can also 'flip' or 'flop' an image by scaling it by negative 1 in the X or Y direction, respectively. Remember that flipping an image is not the same as merely turning it upside-down! Instead, we have produced a mirror image. This can be very useful when faking shadows or reflections in a composite.
Also keep in mind that there are a variety of different algorithms used to compute these transformations. Different types of filtering can produce vast changes in the quality of the resulting image, particularly when dealing with a moving sequence of images. For instance, when animating a pan, be sure to choose a filter which is able to deal with increments of less than a pixel. Otherwise you will see your image pop to whole-pixel boundaries as it moves, a visual artifact that is surprisingly noticeable, even when working at high resolutions.
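Bilinear filtering is one simple filter that handles sub-pixel positions: each sample blends the four nearest pixels according to how close they are. The sketch below, with hypothetical names, applies it to a one-row image to show a half-pixel pan.

```python
import math


def sample_bilinear(image, x, y):
    """Sample a 2-D list of values at a fractional (x, y) position.

    The four surrounding pixels are blended by their distance from (x, y);
    positions outside the image are clamped to the border.
    """
    h, w = len(image), len(image[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = math.floor(x), math.floor(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def pan_row(row, offset):
    """Shift a single row right by `offset` pixels (which may be fractional)."""
    image = [row]
    return [sample_bilinear(image, i - offset, 0) for i in range(len(row))]


row = [0, 0, 1, 0, 0]
print(pan_row(row, 0.5))   # the bright pixel is split across two neighbors
print(pan_row(row, 1.0))   # a whole-pixel move
```

Rounding the offset to whole pixels instead would leave the bright pixel in place for part of the move and then make it jump a full pixel, which is the popping described above.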
An even more sophisticated method of distorting an image is known as Warping. It is usually controlled by either a grid-mesh or a series of splines. Spline-based systems ultimately create a grid as well - they just do a better job of hiding it from the user.
Warping is also great for cheap laughs, as in Example 19.
There may be situations where one actually wishes to re-order the channels that make up an image, placing the blue channel into the green channel, for instance. Reordering within the R, G and B channels may not be common, but it is very likely that you will find a need to move data to and from the matte channel.
Many software packages nowadays allow one to directly manipulate look-up table (or LUT) curves, using control points on splines. This can give an extremely fine amount of control, and often proves useful as a tool for creating mattes.
Example 20 has had all three channels modified by a set of curves similar to those shown here:
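A LUT curve is simply a table that maps each possible input value to an output value. In this sketch (hypothetical names) the table is built from control points joined by straight lines, which is a simplification of the spline curves described above, and is then used as a steep curve of the sort that helps create a matte.

```python
def build_lut(points, size=256):
    """Build a lookup table from (input, output) control points in the 0-1 range.

    Straight lines join the points here; spline curves would give a smoother result.
    """
    points = sorted(points)
    table = []
    for i in range(size):
        x = i / (size - 1)
        if x <= points[0][0]:
            table.append(points[0][1])
        elif x >= points[-1][0]:
            table.append(points[-1][1])
        else:
            for (x0, y0), (x1, y1) in zip(points, points[1:]):
                if x0 <= x <= x1:
                    t = (x - x0) / (x1 - x0) if x1 > x0 else 0.0
                    table.append(y0 + t * (y1 - y0))
                    break
    return table


def apply_lut(value, table):
    """Look up a 0-1 value in the table, using the nearest entry."""
    index = round(min(max(value, 0.0), 1.0) * (len(table) - 1))
    return table[index]


# A steep curve that pushes darks to 0 and brights to 1, handy for pulling a matte.
matte_curve = build_lut([(0.0, 0.0), (0.4, 0.0), (0.6, 1.0), (1.0, 1.0)])
print(apply_lut(0.3, matte_curve), apply_lut(0.5, matte_curve), apply_lut(0.9, matte_curve))
```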
A simple monochrome effect could be obtained by adding the three channels together and dividing by three:
O = (R + G + B) / 3
A more accurate result, which takes into account the fact that the human eye perceives the brightness of Red, Green and Blue differently, would be obtained by weighting each channel according to its perceived brightness before summing them.
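Both approaches are sketched below. The equal-weight version adds the three channels and divides by three, as described above. For the perceptual version we assume the Rec. 601 luma weights (0.299, 0.587, 0.114), one widely used set; the weighting used for the original examples is not shown here, and other standards use slightly different values.

```python
def average_mono(r, g, b):
    """Equal weighting: O = (R + G + B) / 3."""
    return (r + g + b) / 3.0


def weighted_mono(r, g, b):
    """Perceptual weighting using Rec. 601 luma coefficients (one common choice)."""
    return 0.299 * r + 0.587 * g + 0.114 * b


for name, rgb in (("pure green", (0.0, 1.0, 0.0)), ("pure blue", (0.0, 0.0, 1.0))):
    print(name, round(average_mono(*rgb), 3), round(weighted_mono(*rgb), 3))
```

The equal-weight version makes pure green and pure blue the same gray; the weighted version makes green much brighter, as the eye sees it.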
An interesting scenario is shown in Example 21, where only the Blue channel has been modified, using the expression:
B = (B > 1.15 * G) ? 0 : B
In English, that translates as:
If a pixel's Blue value is greater than 1.15 times its Green value, then set the pixel's Blue value to be zero, otherwise leave the pixel's Blue value unchanged. As you can see, well thought-out equations can quickly produce results that would otherwise have required the combination of several different methods.
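The same per-pixel rule can be sketched in Python on a list of (R, G, B) tuples; the function name is hypothetical, and the 1.15 default mirrors the description above.

```python
def suppress_blue(pixel, ratio=1.15):
    """Zero the Blue channel wherever Blue > ratio * Green; Red and Green are untouched."""
    r, g, b = pixel
    return (r, g, 0.0 if b > ratio * g else b)


image = [(0.2, 0.5, 0.9), (0.2, 0.5, 0.5), (0.8, 0.7, 0.6)]
print([suppress_blue(p) for p in image])
```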
Before we can go much further with our discussion of some of these single-input operators, we need to talk about Multi-Image operators.
Dual-input operators are the true heart of digital compositing. We've finally gotten to the point where we are actually able to discuss "The manipulated combination of at least two source images..."!
Many of these operators rely on manipulations to both the color (RGB) channels and to the Matte channel. Where applicable, we will use the convention that, for a given image A:
A = the color (RGB) channels of image A
Am = the matte channel of image A
Adding two images together is, very simply, the process of adding every pixel in image A to its corresponding pixel in image B. Thus:
O = A + B
Example 22 shows 2 images added together.
Subtract: Every pixel in image A is subtracted from its corresponding pixel in image B.
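Note: Be aware of which of these multi-image operators are symmetrical. A + B gives the same result as B + A, but subtracting A from B is not the same as subtracting B from A.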
We'll discuss Subtraction a bit more when we get into Difference Matting.
Finally, we've gotten to the operator that first comes to mind when discussing the term 'compositing'. The Over operator takes an image (combined with a matte) and lays it on top of a second image. Intuitively, people understand compositing with a matte channel as if the matte were a cookie-cutter which removes all excess information from the foreground image, and the result is then pasted on top of the background.
Here's what really happens, mathematically, when we place image A over image B, using image M as the matte image for A. For this first example, assume that images A and B do not already have matte channels associated with them. (In general, when an image without a matte channel is brought into a digital compositing system, a 100% solid matte is assigned to that image.)
The result of our Over is:
O = (A * M) + ((1 - M) * B)
In other words, the foreground image is multiplied by the matte image, which causes everything outside of the matte to go to black. At the same time, the background image is multiplied by the inverted matte, which creates a black 'hole'. These two intermediate images are then added together, creating the final output.
For those of us who 'grew up' on CG-rendered images, we're probably more comfortable with the concept of the foreground image already having a built-in matte channel. If this is the case, it usually means that the foreground's color channels have already been pre-multiplied by that matte channel. (In effect, the foreground has already been 'cut out' by a matte, and that matte is stored in the 4th, alpha, channel.)
Thus, the Over operation is simply:
O = A + ((1 - Am) * B)
(Where 'A' is the foreground image and 'Am' is the foreground image's matte channel).
Incidentally, the output image's matte channel is Om = Am + ((1 - Am) * Bm).
Mix: The weighted, normalized addition of two images.
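The two forms of Over can be sketched on single pixels. For premultiplied (R, G, B, matte) tuples, the color channels use O = A + ((1 - Am) * B) and the matte uses the standard Porter-Duff result Om = Am + ((1 - Am) * Bm), which happens to be the same expression, so one loop covers all four channels. The unpremultiplied form uses a separate matte value M. Function names are our own.

```python
def over(fg, bg):
    """Premultiplied A over B on (R, G, B, matte) tuples.

    Color channels: O  = A  + (1 - Am) * B
    Matte channel:  Om = Am + (1 - Am) * Bm   (same expression, so one loop covers it)
    """
    am = fg[3]
    return tuple(a + (1.0 - am) * b for a, b in zip(fg, bg))


def over_with_matte(a_rgb, m, b_rgb):
    """Unpremultiplied A over B with a separate matte value M: O = (A * M) + ((1 - M) * B)."""
    return tuple(a * m + (1.0 - m) * b for a, b in zip(a_rgb, b_rgb))


half_red = (0.5, 0.0, 0.0, 0.5)               # red at 50% opacity, already premultiplied
blue = (0.0, 0.0, 1.0, 1.0)                   # opaque blue background
print(over(half_red, blue))                   # (0.5, 0.0, 0.5, 1.0)
print(over(half_red, (0.0, 0.0, 0.0, 0.0)))   # matte stays 0.5 over an empty background
```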
In other words, the two images are averaged together, usually with one of the images contributing a larger percentage to the resultant image.
MV = Mix Value
O = (MV * A) + ((1-MV) * B)
To Dissolve between two images, one merely animates the mix so that it initially displays 100% of image A, and then eventually displays 100% of image B.
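A minimal sketch of Mix and a dissolve built on it, for pixels stored as tuples. The function names are hypothetical, and MV runs linearly from 1.0 to 0.0 across the shot here, though in practice the timing of the mix can be animated however the shot requires.

```python
def mix(a, b, mv):
    """O = (MV * A) + ((1 - MV) * B), applied to each channel of two pixels."""
    return tuple(mv * x + (1.0 - mv) * y for x, y in zip(a, b))


def dissolve(a, b, frame, length):
    """Animate MV from 1.0 (all A) on frame 0 down to 0.0 (all B) on the last frame."""
    if length < 2:
        raise ValueError("a dissolve needs at least two frames")
    mv = 1.0 - frame / (length - 1)
    return mix(a, b, mv)


red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
for frame in range(5):
    print(frame, dissolve(red, blue, frame, 5))
```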
In: (A In B)
Out: (A held Out by B)
Multiply: (A Multiplied by B)
Atop: (A Atop B)
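The formulas for these operators are not reproduced above. For reference, the sketch below uses the standard Porter-Duff definitions on premultiplied (R, G, B, matte) pixels: In keeps A only where B's matte is solid, Out keeps A only where B's matte is empty, and Atop places A over B but only inside B's matte. Multiply is shown channel by channel, including the matte; packages differ on how they treat the matte for Multiply, so treat that detail as an assumption. Function names are our own.

```python
def in_op(a, b):
    """A In B: O = A * Bm, applied to all four channels."""
    bm = b[3]
    return tuple(x * bm for x in a)


def out_op(a, b):
    """A held Out by B: O = A * (1 - Bm)."""
    bm = b[3]
    return tuple(x * (1.0 - bm) for x in a)


def atop_op(a, b):
    """A Atop B: color = A * Bm + B * (1 - Am); the result keeps B's matte."""
    am, bm = a[3], b[3]
    rgb = tuple(x * bm + y * (1.0 - am) for x, y in zip(a[:3], b[:3]))
    return rgb + (bm,)


def multiply_op(a, b):
    """A Multiplied by B, channel by channel (matte handling varies between packages)."""
    return tuple(x * y for x, y in zip(a, b))


fg = (0.5, 0.0, 0.0, 0.5)    # premultiplied, half-transparent red
bg = (0.0, 0.0, 1.0, 1.0)    # opaque blue
print(in_op(fg, bg), out_op(fg, bg), atop_op(fg, bg), multiply_op(fg, bg))
```

In and Out are complementary: adding their results back together gives the original A.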
As you can see, most of these operators are really just very simple mathematical calculations mixing images and mattes. More complex operators certainly exist. Morphing, for instance, combines the animated warping of 2 images over time with a controlled dissolve between the two sequences.
Putting it all together:
How one applies all of these operators to achieve the final composite depends on the system you are using. If you are using an 'On-Line' system such as Discreet Logic's Flame, you tend to work in an interactive mode, where operators are applied immediately to produce a new, modified element. 'Batch' systems, such as Wavefront's Composer, usually work by building a large script that will then execute a sequence of operations to produce the final result. The operations applied in either system may be identical; it is merely the work-and-data-flow that is different.
A matte is an image designed to control the transparency and opacity of another image. Mattes are used during compositing when we only wish a portion of one image to be included in the output image. You may also hear the term 'mask' used when referring to mattes, and the two terms are generally interchangeable. 'Mask' is more common when specifying an image being used to control or limit a color-correction (or some other form of image-processing) on another image.
'Mask' and 'Matte' may also be used both as nouns and verbs - the terms can refer to the image itself, or to the process of protecting or excluding a section of an image.
Mattes are generally considered to be single-channel, grayscale images, though they may be stored in any of an image's four channels, as needed. Typically, black or white areas of the matte are used to specify 100% transparency or opacity, while intermediate grays determine partial transparency/opacity.
An example of a situation that would require a very simple matte would be a split-screen composite. The matte would be a fairly simple shape that defines the boundaries of the split, often just a straight line that separates one plate from another. More often, however, we need to place an object whose outline is much more complicated into a scene. We need a matte that accurately represents the boundaries or outline of the object, and fully encloses all solid areas of the object's interior.
In the case where this object does not move, it is still conceivable that such a 'static matte' could be hand-painted. But even with an unmoving subject it can be difficult to accurately paint something that properly captures the edge-quality (transparency and softness) of a given object. Instead, one will tend to use certain software algorithms, discussed below, to help isolate an object from its background.
Situations involving static mattes are fairly rare. Far more often we need to create a matte for an object that is moving within or through the frame, which requires a moving, or 'traveling', matte.
There are two approaches we can take to generate these mattes. The first continues the method we've already discussed: hand-drawing a matte for the object in question on every frame of the sequence. This is still occasionally done, but only after all other options have been exhausted, since hand-drawing a matte for every frame is time-consuming and error-prone. (Errors can include edges that slide, jitter or crawl, as well as visibly darker edges, or 'Matte Lines', around the area we are trying to extract.) Instead, we rely more on procedural techniques, where we determine some initial parameters capable of extracting a matte and then allow software to apply those same parameters over a sequence of images.
One slightly less 'Brute-Force' method of generating a traveling matte involves the use of splines to 'rotoscope' basic outlined shapes for an object. The rotoscope artist can specify certain key-shapes, and the software will smoothly interpolate any in-between frames. Unfortunately, objects which move or change shape a great deal may end up needing a key shape defined for every frame anyway!
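A minimal sketch of key-shape interpolation, assuming every key-shape has the same number of (x, y) control points. It interpolates each point linearly between the surrounding keys; production rotoscoping tools typically use smoother interpolation curves and spline-based shapes, so treat this only as an illustration of the idea.

```python
def lerp_shape(shape_a, shape_b, t):
    """Linearly interpolate two key-shapes that have the same number of points."""
    return [(xa + (xb - xa) * t, ya + (yb - ya) * t)
            for (xa, ya), (xb, yb) in zip(shape_a, shape_b)]


def shape_at_frame(keys, frame):
    """keys maps frame number -> list of (x, y) control points.
    Frames before the first key or after the last key hold that key's shape."""
    frames = sorted(keys)
    if frame <= frames[0]:
        return list(keys[frames[0]])
    if frame >= frames[-1]:
        return list(keys[frames[-1]])
    for f0, f1 in zip(frames, frames[1:]):
        if f0 <= frame <= f1:
            t = (frame - f0) / (f1 - f0)
            return lerp_shape(keys[f0], keys[f1], t)


keys = {1: [(0, 0), (10, 0)], 11: [(10, 0), (20, 10)]}
print(shape_at_frame(keys, 6))  # [(5.0, 0.0), (15.0, 5.0)]
```

If the object changes shape faster than the interpolation can follow, more keys are needed, which is exactly the problem described above.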
A better method, since we (usually) know in advance that we are shooting a subject that will be placed into a different scene, involves photographing this subject in a manner that greatly simplifies the extraction of a matte. By far the most popular (and effective) method of procedurally generating traveling mattes is known as the 'Color Difference Method'.
The Color Difference Method involves shooting the foreground object we wish to isolate in front of a uniform-colored backdrop. Any color for the backdrop (or 'backing') may be used, as long as the foreground is essentially devoid of this color. The term 'Color Difference' refers to the difference in color between the foreground and the colored backing.
The most common backing colors are blue and green, and the choice of which to use is generally determined on the basis of the subject's colors. If someone says that the subject must be wearing a blue shirt, then typically a green-screen shoot would be dictated. Generally, tests are done if the choice is not obvious.
The process of extracting a matte with this method is known as 'keying', and the extracted matte can be referred to as the 'key'.
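As a simplified sketch of a bluescreen color-difference key, one common textbook form measures how much the blue channel exceeds the larger of red and green, then normalizes by that same difference measured on a clean patch of backing. The `backing_level` parameter is an assumption of this sketch; commercial keyers use considerably more elaborate formulas. For a greenscreen, swap the roles of the green and blue channels.

```python
def color_difference_matte(pixel, backing_level):
    """Simplified bluescreen color-difference matte.

    pixel:         (r, g, b) in the 0-1 range
    backing_level: value of (b - max(r, g)) measured on a clean area of the
                   backing, used to normalize the result
    Returns foreground opacity: 0 on the backing, 1 on the subject.
    """
    r, g, b = pixel
    difference = b - max(r, g)      # large on the blue backing, <= 0 on most subjects
    transparency = difference / backing_level
    return min(1.0, max(0.0, 1.0 - transparency))


backing = (0.1, 0.2, 0.8)
level = backing[2] - max(backing[0], backing[1])
print(color_difference_matte(backing, level))           # backing: fully transparent
print(color_difference_matte((0.8, 0.6, 0.5), level))   # skin tone: fully opaque
```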
The software used to 'key' something from its background varies in complexity and capabilities. Most compositing systems allow you to pull mattes based on luminance (Luma-Keying) or chrominance (Chroma-Keying), but more sophisticated software such as Ultimatte™ or Primatte™ can produce far better results, due to the specialized algorithms they employ.
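For comparison, a basic luma key can be sketched as a soft threshold on luminance. The Rec. 709 luma weights used here are an assumption (some systems use Rec. 601 weights instead), and `low`/`high` are hypothetical parameters defining the soft ramp between transparent and opaque.

```python
def luminance(r, g, b):
    # Rec. 709 luma weights (an assumption; other systems differ)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def luma_key(pixel, low, high):
    """Matte value: 0 below `low`, 1 above `high`, linear ramp in between."""
    y = luminance(*pixel)
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return (y - low) / (high - low)


print(luma_key((0.5, 0.5, 0.5), 0.25, 0.75))  # roughly 0.5 for a mid-gray
```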
There is also a method of creating a matte known as 'Difference Matting', where a frame of the scene without the subject is subtracted from a frame with the subject. In theory, all you are left with is the subject. In practice, slight lighting differences, shadows and grain make the difference between the two images unpredictable.
Difference matting is sometimes the only solution available, and it is actually a very useful first-pass that can then be cleaned-up by hand.
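A minimal difference-matte sketch, assuming the plate and the clean (subject-free) frame are same-sized 2-D lists of RGB tuples. The hypothetical `threshold` parameter ignores small differences caused by grain and lighting drift, and `softness` sets a ramp above it; the result is the kind of rough first pass that still needs cleanup.

```python
def difference_matte(plate, clean, threshold, softness):
    """Per pixel, take the largest channel difference between the plate and
    the clean frame, then map it through a soft threshold to a 0-1 matte."""
    matte = []
    for row_p, row_c in zip(plate, clean):
        out_row = []
        for p, c in zip(row_p, row_c):
            d = max(abs(pa - ca) for pa, ca in zip(p, c))
            m = (d - threshold) / softness
            out_row.append(min(1.0, max(0.0, m)))
        matte.append(out_row)
    return matte


plate = [[(0.5, 0.5, 0.5), (0.9, 0.1, 0.1)]]
clean = [[(0.52, 0.5, 0.5), (0.5, 0.5, 0.5)]]
print(difference_matte(plate, clean, 0.05, 0.1))  # grain ignored, subject kept
```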
Even the best tools can have problems with certain images, and real-world situations often deliver plates that are less than perfect: unevenly lit, grainy, or containing objects that are not intended to be seen in the final composite.
To help deal with these, one almost always creates garbage-mattes around the subject. These are loose-fitting shapes designed to quickly remove problem areas from the scene.
Several of the techniques mentioned will often be used in conjunction with one another, combined until as flawless a matte as possible is achieved.
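One plausible way to combine mattes, sketched under the assumption that all mattes are same-sized 2-D lists of values in the 0-1 range: take the maximum of a keyed matte and a rotoscoped shape (so the roto fills holes the key missed), then multiply by a garbage matte (so anything outside the loose shape is removed). The specific combination is illustrative, not a fixed recipe.

```python
def combine_mattes(key, roto, garbage):
    """key:     matte pulled by a keyer (may have holes or noise)
    roto:    hand-drawn shape used to fill holes in the key
    garbage: loose shape, 1 around the subject and 0 over problem areas
    """
    out = []
    for k_row, r_row, g_row in zip(key, roto, garbage):
        # max() lets the roto fill holes; multiplying by the garbage matte
        # removes everything outside the loose shape.
        out.append([max(k, r) * g for k, r, g in zip(k_row, r_row, g_row)])
    return out


key = [[0.0, 0.3, 1.0]]       # a hole in the middle, noise at the frame edge
roto = [[0.0, 1.0, 0.0]]
garbage = [[1.0, 1.0, 0.0]]
print(combine_mattes(key, roto, garbage))
```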
You may come across the term 'Premultiplied Image', which refers to an image whose Red, Green and Blue channels have already been multiplied by a matte channel. This is almost always the type of element produced by 3D rendering software. You may also often wish to produce 'Pre-Comped', 4-channel elements that have already been premultiplied by their matte channel. You need to understand exactly how your compositing system deals with premultiplied elements. Some systems assume that the 'Over' operator will be fed such images by default; others may require that image and matte be brought in separately and recombined before the 'Over' is performed. It is very important to understand that the relationship between image and matte can dramatically affect the results of an Over. Let's look at some different scenarios. We'll be using an example with a very soft-edged matte, since this is the most problematic situation.
Consider Example 23. This is a premultiplied image over a background in a system that assumes you're feeding it premultiplied images.
Example 24 is an unpremultiplied image over a background in a system that assumes you're feeding it premultiplied images. Note that the foreground element, in areas where its matte is supposed to be zero, appears as a 'ghost' image. If you work through the math, you'll see that in those areas the result is exactly the same as if we had simply added the two images together.
Example 25 is a premultiplied image over a background in a system that assumes all images sent to it are not premultiplied. Such a system will automatically multiply the image by its matte channel. In this situation, we have effectively multiplied the image by its matte channel twice, thereby darkening all areas of soft-edged matte. Consequently, there is a dark halo around the foreground.
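The three scenarios can be checked numerically on a single channel of one soft-edge pixel. This sketch assumes the standard premultiplied Over, `FG + BG * (1 - alpha)`, and uses made-up pixel values purely for illustration.

```python
def over_premultiplied(fg, fg_alpha, bg):
    # An Over that expects a premultiplied foreground
    return fg + bg * (1.0 - fg_alpha)


def premultiply(color, alpha):
    return color * alpha


color, alpha, bg = 0.8, 0.25, 0.5   # one channel of a soft-edge pixel

# Example 23: premultiplied image into an Over that expects premultiplied input
correct = over_premultiplied(premultiply(color, alpha), alpha, bg)
# Example 24: unpremultiplied image into the same Over (too bright: 'ghosting')
ghost = over_premultiplied(color, alpha, bg)
# Example 25: premultiplied image into a system that premultiplies again (dark halo)
halo = over_premultiplied(premultiply(premultiply(color, alpha), alpha), alpha, bg)

print(correct, ghost, halo)
# Where the matte is zero, the 'ghost' case is exactly foreground + background.
print(over_premultiplied(color, 0.0, bg) == color + bg)
```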
As you can see, dramatic problems can arise when you feed an Over something it doesn't expect. The same sort of problems can arise when color-correcting premultiplied elements. In a premultiplied, 4-channel image, the image and the matte channels have a specific relationship that can cause compositing artifacts if altered.
Because you will inevitably find yourself needing to color-correct a premultiplied image, most systems provide a tool to temporarily 'undo' the premultiplication, at least in the critical areas of the image. We will refer to this tool by the rather unwieldy name of 'Un-Premultiply'. Essentially, the tool re-divides the image by its own matte channel, which has the effect (except in areas where the matte is solid black, or zero) of boosting the areas which were attenuated by a partially transparent matte back to approximately their original values. If your system does not have an explicit tool to perform this operation, you can hopefully use some other tool to 'fake it'. For example, if there is a simple parsing language, you can do:
R = R/M
G = G/M
B = B/M
(and hope that you don't get divide-by-zero errors).
Once your image has been unpremultiplied, apply any necessary color-corrections. Then you can re-premultiply by the original matte channel, once again producing an element whose image-to-matte relationship is correct.
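A minimal sketch of the unpremultiply, correct, re-premultiply round trip, assuming RGB tuples with a separate alpha value. Pixels whose matte is (near) zero are passed through unchanged to sidestep the divide-by-zero problem. The square-root "gamma" correction is a hypothetical non-linear correction chosen because it shows the failure clearly: applied directly to a premultiplied pixel, it pushes the color above its own matte value.

```python
def unpremultiply(rgb, alpha, eps=1e-6):
    # Leave (nearly) fully transparent pixels alone to avoid divide-by-zero
    if alpha <= eps:
        return rgb
    return tuple(c / alpha for c in rgb)


def premultiply(rgb, alpha):
    return tuple(c * alpha for c in rgb)


def color_correct_premultiplied(rgb, alpha, correction):
    straight = unpremultiply(rgb, alpha)   # restore original color values
    corrected = correction(straight)       # apply the color-correction
    return premultiply(corrected, alpha)   # restore image-to-matte relationship


def gamma_brighten(rgb):
    # Hypothetical non-linear correction: gamma of 2.0
    return tuple(c ** 0.5 for c in rgb)


pixel, alpha = (0.2, 0.1, 0.05), 0.25
print(color_correct_premultiplied(pixel, alpha, gamma_brighten))  # stays <= alpha
print(gamma_brighten(pixel))  # applied directly: red exceeds the matte value
```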
NOTE: For slight color-corrections, or when using images which have very hard-edged mattes, you may get perfectly acceptable results without going through this process. "If it looks correct, it is correct."
When photographing an element to be used for visual-effects work, one sometimes has the ability to specify that the camera be unmoving, or 'locked-off' for the duration of the shot. However, it is often not possible, or even desirable, to do this. Multiple shots without camera moves can become boring and lifeless. In situations where the need arises to composite together elements which were shot without identical camera moves, one must resort to tracking.
Tracking is the process of selecting a particular region of an image and determining that region's movement over time (on a sequence of images). This data is stored as a series of moves or positions.
There are a variety of situations where tracking can be used. One reason would be to 'stabilize' the sequence you are working on. Another reason would be the need to synchronize the movement of an object you are adding to the scene with something already in the scene. (The object in the scene may be moving, or the camera may just be moving relative to the object.) Note: Tracking a still element into a moving scene only works up to a certain point. Your still element will not have any of the perspective shifts associated with the movement of a true 3-D object.
Even if the shot was originally intended to be locked off, any number of factors could conspire to produce a plate which is not 'stable'. The camera might have been bumped, jarred, or even just slightly moved by a strong wind.
The specific steps taken when tracking an area of an image vary depending on what software you are using, but the concepts are fairly universal.
First of all, you need to decide on the specific feature of the image that you wish to track. Generally, tracking software prefers high-contrast areas with noticeable variations between colors and light/dark areas. There should be variations along both the X and Y axes.
Certain software may work better if you pre-process your element to enhance these qualities. You should try to choose an area that doesn't change shape radically or become obscured by something moving in front of it. You will usually be able to specify how large an area to track, as well as a search region that limits how far the tracked area may move from one frame to the next. Carefully choosing these parameters can greatly increase the accuracy and speed of your tracking session.
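To make the "tracked area" and "search region" parameters concrete, here is a sketch of one simple tracking technique, block matching by sum of squared differences, on grayscale frames stored as 2-D lists. It finds only whole-pixel positions; commercial trackers typically add sub-pixel refinement and other matching measures, so this is an illustration rather than how any particular product works.

```python
def ssd(image, x, y, patch):
    """Sum of squared differences between `patch` and the image at (x, y)."""
    total = 0.0
    for j, row in enumerate(patch):
        for i, value in enumerate(row):
            d = image[y + j][x + i] - value
            total += d * d
    return total


def track_patch(image, patch, prev_x, prev_y, search):
    """Best whole-pixel match for `patch` within +/- `search` pixels of its
    previous top-left position, or None if no candidate fits in the frame."""
    h, w = len(patch), len(patch[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = prev_x + dx, prev_y + dy
            if x < 0 or y < 0 or y + h > len(image) or x + w > len(image[0]):
                continue
            score = ssd(image, x, y, patch)
            if best is None or score < best[0]:
                best = (score, x, y)
    return None if best is None else (best[1], best[2])
```

A larger search region tolerates faster motion but costs more comparisons per frame, which is the speed/accuracy trade-off mentioned above.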
Once your area has been tracked, you should have a sequence of either pixel locations or offsets for every frame of the shot. Some software returns the data based on the absolute pixel position on the image you tracked, others will return the data as relative offsets from your original position (now defined to be [0,0]).
If you now wish to have an element exactly match the movement of your tracked area, simply apply these X and Y moves to the new element, along with any additional translations you might need for aesthetic reasons.
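A minimal sketch of that step, assuming the tracker returned absolute (x, y) positions per frame. The hypothetical helper `to_offsets` converts them to offsets relative to the first frame, and `match_move` applies those offsets (plus any extra aesthetic translation) to the new element's starting position.

```python
def to_offsets(track):
    """Convert absolute tracked positions into offsets from the first frame."""
    x0, y0 = track[0]
    return [(x - x0, y - y0) for x, y in track]


def match_move(element_pos, offsets, extra=(0, 0)):
    """Per-frame position of a new element: its start position plus the
    tracked offset plus an optional additional translation."""
    ex, ey = element_pos
    return [(ex + dx + extra[0], ey + dy + extra[1]) for dx, dy in offsets]


track = [(100, 50), (102, 49), (105, 47)]
print(match_move((10, 10), to_offsets(track)))  # [(10, 10), (12, 9), (15, 7)]
```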
When tracking an area in order to insert a new element, it is important to choose a feature in the same depth-plane as what you wish to insert. The reason for this is parallax: the apparent rate at which elements move decreases as their distance from the camera increases. When inserting elements into a scene, matching the rate at which the elements move in relation to the background is the key.
If you are using this data to stabilize the plate, then instead of moving an object on each frame, we will move the plate itself in the opposite direction on each frame. We are effectively subtracting the plate's natural motion from itself, to produce a new plate which no longer moves.
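Stabilization is the same data with the sign flipped. In this sketch (assuming absolute tracked positions, as above), the per-frame move applied to the whole plate is the negated offset of the tracked feature, so the feature lands back on its first-frame position every frame.

```python
def stabilize_moves(track):
    """Per-frame translation for the plate that cancels the tracked motion."""
    x0, y0 = track[0]
    return [(x0 - x, y0 - y) for x, y in track]


track = [(100, 50), (102, 49), (105, 47)]
moves = stabilize_moves(track)
print(moves)  # [(0, 0), (-2, 1), (-5, 3)]
# The tracked feature, moved by the stabilizing translation, no longer moves:
print([(x + dx, y + dy) for (x, y), (dx, dy) in zip(track, moves)])
```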
Obviously, you will want to track to an element that should be locked in place relative to the camera. It won't do you any good if you try to stabilize a plate by tracking a tree branch that is swaying in the breeze! (The exception, of course, would be if you planned to composite a bird into the scene, sitting on that branch...)
So far, we have talked about one-point tracking. (It is also known as single-point tracking). This will give us the X and Y position for a point on an image so we can attach an element to it or stabilize the image based on that point. Single-point tracking only gives us enough information to deal with simple, overall positional changes. If, instead, we track two different points, we can now derive information about the rotational changes that are occurring between the two points, and by measuring the change in distance between the two points, changes in scale can be computed.
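The rotation and scale derived from two tracked points can be written down directly. This sketch assumes point `p0`/`p1` on a reference frame correspond to `q0`/`q1` on the current frame; the translation is measured at the first point, which is treated as the pivot.

```python
import math


def two_point_transform(p0, p1, q0, q1):
    """Rotation (radians), uniform scale and translation taking the segment
    p0->p1 on the reference frame to q0->q1 on the current frame."""
    vx, vy = p1[0] - p0[0], p1[1] - p0[1]
    wx, wy = q1[0] - q0[0], q1[1] - q0[1]
    scale = math.hypot(wx, wy) / math.hypot(vx, vy)   # ratio of distances
    rotation = math.atan2(wy, wx) - math.atan2(vy, vx)
    rotation = math.atan2(math.sin(rotation), math.cos(rotation))  # wrap to (-pi, pi]
    translation = (q0[0] - p0[0], q0[1] - p0[1])
    return rotation, scale, translation


rot, sc, tr = two_point_transform((0, 0), (10, 0), (5, 5), (5, 25))
print(math.degrees(rot), sc, tr)  # rotated 90 degrees, doubled in size
```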
By taking this one more step with 4 point tracking, we now have enough information to calculate simple warps that mimic perspective shifts. We can track the four corners of a moving object and lock an image or element inside those points, creating a match for any transformations caused by object- or camera-moves.
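Four point correspondences are exactly enough to determine a perspective (projective) transform, usually written as a 3x3 homography with its last element fixed at 1. This sketch sets up the standard 8x8 linear system and solves it with plain Gaussian elimination; it assumes the four points are in general position (no three collinear) and is not tied to any particular compositing package.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def four_point_homography(src, dst):
    """3x3 matrix mapping each of four src points onto its dst point."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, x, y):
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Corners of an element (a unit square) pinned to four tracked corners.
corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
tracked = [(10, 10), (50, 12), (48, 40), (8, 38)]
H = four_point_homography(corners, tracked)
print([apply_homography(H, x, y) for x, y in corners])
```

Recomputing `H` from each frame's tracked corners gives the per-frame warp that locks the element inside those points.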
Tools are now becoming available that can take tracking data from even more points and, in conjunction with proper survey-data for the scene in question, recreate the full camera move for the shot. This data can then be fed back to a 3-D animation system which can now render 3-D elements that exactly match the moves from the real scene.
Both Film and Video have situations where you have to deal with non-square pixels. D1 digital video images are always displayed with non-square pixels (NTSC D1 pixels, for example, are roughly 10% narrower than they are tall). Film has a few squeezed or 'anamorphic' formats. We'll look at the most common, the Cinemascope (or C-scope) format. Everything we'll discuss applies equally to squeezed film or video, but we've chosen the C-scope format for our example because it is the most dramatic.
C-scope images are shot using a lens that compresses by half along the X axis. When projected, they are unsqueezed by a similar lens. When digitally compositing images that were shot in this format, there are a few different routes one can take. Initially, your elements will probably have been scanned squeezed. And in fact you may choose to do your work directly on these squeezed images. Particularly when dealing with less extreme squeezes, such as video, this may not be a problem. (The caveat comes when it becomes necessary to rotate a squeezed element - we'll discuss this in a moment.)
Working with squeezed C-scope elements can be visually distracting, and properly judging movement is a skill that can take some time to develop. Ideally you could just unsqueeze the images, but in practice this has drawbacks: you must either halve your Y-resolution (thereby losing 50% of the information you just scanned), or scale your image by 2 times in X (thereby doubling the amount of data to deal with, even though you haven't really created any extra sharpness in your image).
Therefore, the best solution is to find a compositing system that is capable of displaying the images in an unsqueezed format, while still processing the data in a squeezed format.
Finally, there is the issue of rotation which must be dealt with. Simply layering one squeezed element over another is no problem. If, however, we wish to rotate a squeezed element by 90 degrees before pasting it over a squeezed background, we have suddenly generated an element which is visually squeezed by 50% along the Y axis. The only solution is to first unsqueeze the element, rotate it, and then resqueeze it along the X-axis. Again, the best compositing systems will simply allow you to specify a working aspect-ratio and will automatically deal with the unsqueeze/resqueeze when rotating elements.
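The unsqueeze/rotate/resqueeze sequence collapses into a single 2x2 matrix that can be applied directly to squeezed pixel coordinates. In this sketch, `squeeze=0.5` models a 2:1 horizontal squeeze such as C-scope; rotating a squeezed point by 90 degrees this way lands where the same rotation would put it in the unsqueezed (display) image.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def squeezed_rotation(angle_deg, squeeze=0.5):
    """Matrix for squeezed coordinates: unsqueeze X, rotate, resqueeze X."""
    t = math.radians(angle_deg)
    unsqueeze = [[1.0 / squeeze, 0.0], [0.0, 1.0]]
    rotate = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]
    resqueeze = [[squeeze, 0.0], [0.0, 1.0]]
    return matmul(resqueeze, matmul(rotate, unsqueeze))


m = squeezed_rotation(90)
# Squeezed point (1, 0) is (2, 0) on screen; a correct 90-degree rotation puts
# it at (0, 2) on screen, which is also (0, 2) in squeezed coordinates.
x, y = 1.0, 0.0
print(m[0][0] * x + m[0][1] * y, m[1][0] * x + m[1][1] * y)
```

Rotating the squeezed pixels directly would instead send (1, 0) to (0, 1), which is the element visually squashed by 50% along Y described above.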
Compositing scripts have an uncanny ability to grow excessively, and unless great care is taken to simplify whenever possible, you will end up with a script that is not only incomprehensible, but also full of image-degrading problems.
Avoid excessive layering of similar effects. Consolidate image-processing events whenever possible.
Consider the following simple flow-chart.
We have a Foreground element with several effects layered on it, a Background element with a few different effects on it, and then the first is placed over the second.
Close examination of the script will reveal obvious inefficiencies. First of all, there is a 'Move' operator on the foreground element in two different locations, each offsetting by a certain number of pixels. These pixel offsets can be added together to produce a new, consolidated offset value.
[200,100] + [50,-20] = [250,80]
Also, we have both a Brightness and an RGB-Multiply. If you recall the definition of Brightness, you'll remember that it is effectively the same as an RGB-Multiply, only the same value is applied to all 3 channels equally. Therefore, we really have 2 different RGB-Multiply operators, the first which applies a multiplier of [2.0,2.0,2.0] to the image, the second which applies a multiplier of [1.0,0.25,2.0] to the image. These can be multiplied together to get a new RGB-Multiply of [2.0, 0.5, 4.0].
Finally, the 'Blur' operator on both images is the exact same value, so we can simply move it after the 'Over', so that it only needs to happen once, instead of twice.
We've just eliminated 3 operators, producing a faster, more comprehensible, and higher quality script.
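The consolidation arithmetic above is easy to check in code. This sketch assumes the two Moves are pure translations with nothing between them that scales or rotates, and that nothing non-linear (such as a clamp or gamma) sits between the two multipliers; under those assumptions the sequential and consolidated results are identical.

```python
def combine_moves(*moves):
    """Sum a chain of pure translations into one offset."""
    return tuple(sum(axis) for axis in zip(*moves))


def combine_multipliers(*mults):
    """Multiply a chain of per-channel RGB multipliers into one."""
    result = [1.0, 1.0, 1.0]
    for m in mults:
        result = [r * c for r, c in zip(result, m)]
    return tuple(result)


print(combine_moves((200, 100), (50, -20)))                      # (250, 80)
print(combine_multipliers((2.0, 2.0, 2.0), (1.0, 0.25, 2.0)))    # (2.0, 0.5, 4.0)

# Sequential vs consolidated on one pixel:
pixel = (0.1, 0.2, 0.1)
step = tuple(p * 2.0 for p in pixel)
step = tuple(p * m for p, m in zip(step, (1.0, 0.25, 2.0)))
merged = tuple(p * m for p, m in zip(pixel, combine_multipliers((2.0, 2.0, 2.0), (1.0, 0.25, 2.0))))
print(step, merged)
```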
Now, let's assume that your supervisor takes a look at the sample test image your script produced and tells you that it looks twice as bright as it should. The tempting solution would be to simply shoot your supervisor in the head and finish the movie yourself. No, wait, that's not right. The tempting solution would be to simply apply a brightness of 0.5 to the top of the script, after the 'Over', as follows:
But a much better solution, particularly in light of how much we're boosting the image in some of the earlier steps, is to apply a Brightness (or RGB-Multiply) to both elements before the 'Over', where they can be consolidated into the existing operators. Thus:
Of course, not all scripts have such obvious places where you can simplify. You also need to understand the math of various operators before you can make educated decisions on event-consolidation. Remember, many effects are order-dependent as well; simply changing the order of two operators can radically affect the output image.
Incidentally, if you think that this relentless pursuit of simplified scripts is a bit too obsessive, consider the fact that even medium-complexity scripts can include dozens or even hundreds of different layers and operators.
The best composites are those whose elements were planned and photographed with the explicit intention of creating a composited image.
While this may seem like an obvious, elementary statement, you'd be surprised at how often it is ignored in the real world. In just a bit we'll look at what can be done to help 'fix' improperly shot plates, but first let's look at some things that can be done to make everyone's life easier.
Whether you are planning to integrate elements shot with bluescreen, synthetic CG images, or are just going to soft-split two plates together, it is critical that the different pieces look like they were lit with the same lights.
Lights should hit the objects in the scene from the same angle, have the same apparent intensity, be of the same color, etc. This is a particularly difficult task when shooting foreground elements over blue- (or green-) screen. To obtain the best matte-pulling results, the blue backing must be uniformly lit. This means that we will need to introduce additional lights into the scene to illuminate the backing. You should obviously try to direct the lights at only the backing, and avoid having these lights cast additional illumination on your subject. In practice, this is not always easily achieved. In addition, the blue backing itself may reflect light onto the foreground. This is known as 'blue-spill', and is one of the most difficult problems to deal with in bluescreen photography. Blue spill on the subject not only causes the lighting to be wrong, but it makes the job of pulling a matte off of this subject much more difficult, since the spill areas now have similar coloration to the blue backing we're trying to remove.
We may often add some additional yellow lights on the subject in order to nullify the blue (yellow being the complementary color to blue) but once again we have added lights to the foreground element that were not present in the original photography of the background.
Keep in mind issues such as shadows, flickering lights (from fire or candles, for instance), and even bounce-light which would come from other objects in the scene. If your foreground character is going to be standing next to a bright red flag hanging on the wall, you'll want to make sure that somehow you're going to get a bit of soft red light to hit him in the proper place. If your foreground character is going to be standing underneath a tree in bright sunlight, you'll get a much more believable composite if you photograph him with some moving leaf-shadows playing across his body.
Almost as important as the lighting is the synchronization of the camera for all elements. Be aware of the camera's positioning relative to the subject and the height of the camera from the ground plane, and ensure that lenses of the same focal length are used when photographing all plates.
If your camera moves throughout the shot, either plan on shooting it with a mechanical motion-control move (a device that allows the camera to repeatedly execute the exact same move) or be prepared to do a lot of post-processing tracking to try and duplicate the move. A much simpler solution is to shoot all elements 'locked-off'. In other words, make sure that the camera does not move at all during the shoot. This may include isolating the camera from even slight vibrations, such as ground-shake when a truck drives by.
If possible, shoot all elements with the same type and speed of film (and expose and develop them similarly). Different film speeds have widely different grain characteristics, and the discrepancy can be obvious.
To make it easier to achieve the above, there are a number of things that can be done while on-set. First of all, take as many notes as possible. Someone should be in charge of recording information about the light sources (position, intensity, color and what filters or 'gels' were used), camera (position and lens), and often general set measurements. If you're planning on integrating 3-D elements into the scene, you may want to survey various set dimensions.
All of this anal-retentive behavior is necessary because seldom will you be able to shoot the various elements which go to make up your composite on the same stage. By having all of this information at hand, you will be able to recreate the original lighting faithfully at your new location.
Try to shoot a reference stand-in in the scene. This can be later used as a reference for the compositor who works on the scene to give a good idea of how light, colors and shadows are interacting.
Finally, it's an excellent rule-of-thumb that one should:
Always shoot the least controllable element in a scene first. The other elements can then be adjusted to compensate for anomalies in the first plate.
Having said all this, let me make it perfectly clear that it's very likely that you're not going to get it all. Normally it is the job of the visual effects supervisor to see that plates are shot as well as possible, but the realities (and costs) of location shooting often make it impossible or impractical to set-up everything perfectly. Frankly (and unfortunately), it really is sometimes cheaper to just 'fix it in post'.
Remember, every shot has its own problems, and the true test of a good compositor is their ability to come up with efficient and creative solutions to these issues.
It's very common for visual effects to be mixed with practical effects which require things like gag-wires, harnesses, ropes, and other mechanical devices. It is not always possible to fully hide these items from camera, and consequently digital effects may be needed to remove them. There is limited commercially available software for wire removal, and the general consensus is that this software works well on the easier situations, but the more difficult shots are still going to be done at least partially by hand. At some point there may be no other option but to laboriously hand-paint out the offending object, frame by frame. Not only is this process time-consuming, it requires a very skilled individual to be able to produce something that doesn't exhibit chattering or other artifacts when played at speed.
As mentioned earlier, tracking software should be used, either to stabilize the plate or to 'bounce' the new elements to match the background.
Trying to tie together images whose lighting doesn't match can be one of the most frustrating tasks a compositor can undertake. Sometimes, however, you may get lucky, and certain 'easy' fixes may work. For instance, you may have a scenario with strong side-light coming from the left in the background, and strong side-light from the right in the foreground. Assuming there's nothing in the scene to give away the trick, simply flop (mirror along the Y axis) one of the elements. (Remember, as long as you don't introduce a continuity problem with another shot, it is just as valid to change the background as the foreground).
Troublesome highlights on objects can often be decreased via specific color-corrections and masking.
When your A and B plates were accidentally shot from wildly different positions or with lenses of different focal lengths, you may need to compensate by moving and scaling the various elements relative to each other.
Even the timing of the action may not initially work correctly. Not only can this be dealt with by 'slipping' the synchronization between the plates (so that frame 1 of element A is combined with frame 30 of element B, for example), but you can also adjust the speed of action by dropping, duplicating or averaging frames.
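A sketch of speed adjustment by frame averaging, assuming each frame is a flat list of pixel values. Output frame `n` samples the source at time `n * speed` and blends the two nearest source frames; whole-number times simply duplicate (speed below 1) or skip (speed above 1) frames. This is plain averaging, not motion-compensated interpolation, so fast action may show double images.

```python
def retime(frames, speed):
    """Resample a sequence at a new speed (speed < 1 slows the action,
    speed > 1 speeds it up) by blending the two nearest source frames."""
    out = []
    last = len(frames) - 1
    n = 0
    while n * speed <= last:
        t = n * speed
        i = int(t)
        f = t - i
        if f == 0:
            out.append(list(frames[i]))       # exact source frame
        else:
            a, b = frames[i], frames[i + 1]   # average neighboring frames
            out.append([(1 - f) * pa + f * pb for pa, pb in zip(a, b)])
        n += 1
    return out


frames = [[0.0], [1.0], [2.0]]
print(retime(frames, 0.5))  # half speed: five frames
print(retime(frames, 2.0))  # double speed: every other frame
```

Slipping the synchronization between plates, by contrast, needs no blending at all: it is just an offset added to the source frame index.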
Ultimately, for certain problems, there may be no other solution but to hand-paint the offending area - possibly on every frame. This is usually somewhat of a last resort, but the fact remains that it is a totally valid and acceptable solution. Sometimes you may be able to hand-paint a fix on a single frame and then use tracking and warping tools to apply this fix to the rest of the sequence.
Depending on how much creative control you may have over a scene, it is sometimes acceptable to simply cover a problem area with something else. The new element should be, like all elements, well-balanced for the scene, and will hopefully not look out of place. Be careful that you don't introduce continuity issues with other shots.
The more time you spend compositing, the more you'll learn about what things are important in order to fully integrate the elements.
There are a number of techniques (and several tricks) that can be used to trigger the visual cues the eye is accustomed to seeing. There are also a lot of common problems that can be easily addressed if they are identified.
First (and foremost), a good compositor should understand how a real camera behaves. You will often need to mimic artifacts and characteristics of shutter, film, and lens. Many of the items mentioned below are related to this issue.
It is rare that every element in a scene is in sharp focus. Depth of field dictates that objects farther or closer than the focus point will grow more and more unfocused. Determine the distance your element should be from that focus point and blur accordingly.
If there is nothing in the background that is the same distance from camera as your foreground element, you will need to make an educated guess. Remember that scenes with less light will often have been shot with a larger aperture, which causes a narrower depth-of-field.
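When making that educated guess, the thin-lens blur-circle formula gives a starting point for how defocused an element at a given distance should be: c = A · |S2 − S1| / S2 · f / (S1 − f), where A = f / N is the aperture diameter, S1 the focus distance and S2 the element's distance. The camera numbers below and the hypothetical `gate_width_mm` used for the conversion to pixels are assumptions for illustration; the final amount should still be judged by eye.

```python
def blur_circle_mm(focal_mm, f_number, focus_dist_mm, subject_dist_mm):
    """Thin-lens blur-circle diameter on the film plane, in millimeters."""
    aperture = focal_mm / f_number
    return (aperture * abs(subject_dist_mm - focus_dist_mm) / subject_dist_mm
            * focal_mm / (focus_dist_mm - focal_mm))


def blur_circle_pixels(c_mm, gate_width_mm, image_width_px):
    """Convert the blur circle to pixels for a scan of the given gate width."""
    return c_mm / gate_width_mm * image_width_px


# 50mm lens focused at 2m, element at 4m: wide open versus stopped down.
wide = blur_circle_mm(50, 2, 2000, 4000)
stopped = blur_circle_mm(50, 8, 2000, 4000)
print(wide, stopped)  # the larger aperture (f/2) gives about 4x the blur
```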
This focus relationship can change over time - known as a rack focus. Be sure to match the animation timing as closely as possible.
Also, keep in mind that a moving object, when recorded on film or video, will have 'motion blur'. This is due to the distance the object moves while the film is exposed (or the video-camera is recording). If you wish to place a moving object into a scene, and that movement is something that you created (either as a 3D element or with a 2D transformation), you should plan on motion-blurring your element. Most high-end systems allow motion-blur to be added to a 2D move or rendered with the 3D element.
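A minimal sketch of adding motion blur to a 2D move, on a single scanline for brevity: the element is sampled at several positions spread across the time the shutter is open, and the samples are averaged. It assumes horizontal motion, rounds sub-frame positions to whole pixels, and uses `shutter=0.5` to approximate a 180-degree shutter (open for half the frame interval); real systems filter sub-pixel positions properly.

```python
def motion_blur_row(row, shift_per_frame, shutter=0.5, samples=8):
    """Blur a scanline of an element moving `shift_per_frame` pixels per
    frame by averaging copies spread over the open-shutter interval."""
    n = len(row)
    acc = [0.0] * n
    for s in range(samples):
        frac = s / (samples - 1) if samples > 1 else 0.0
        offset = round(shift_per_frame * shutter * frac)
        for x in range(n):
            src = x - offset
            if 0 <= src < n:
                acc[x] += row[src]
    return [v / samples for v in acc]


row = [0, 0, 1, 0, 0, 0, 0, 0]   # a one-pixel highlight
print(motion_blur_row(row, shift_per_frame=8, samples=5))
```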
In the real world, when a bright light source is shined directly into a lens, you will get a 'flare' artifact. It is often desirable to duplicate this flare when creating an image which has bright light sources that were not present in the original elements. Be careful with this effect, as it has become over-used in much of the CG Imagery being created these days.
Another extremely common mistake with adding pure CG elements to a scene is to ignore the amount of grain that the rest of the plate has. The CGI element will consequently appear far too 'clean'. Even two different live elements may not have the same amount of grain, owing to different film stocks, exposure levels, development processes, and any pre-processing (such as scaling or blurring) which you may have applied. In general it's much easier to add grain to an element than it is to remove it, so for this reason effects photographers try to use the least-grainy film they can get away with. Larger format cameras, such as those referred to as 'Vistavision', allow one to expose a bigger piece of the negative to the scene, thus reducing the relative amount of grain.
Incidentally, a little bit of grain added to an element can do a wonderful job of eliminating the contouring or banding artifacts associated with limited bit-depth images.
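The banding remark can be demonstrated with a sketch: a gentle ramp quantized to a few levels collapses to a single flat value, while the same ramp with a little noise added first produces a mix of neighboring levels that averages closer to the original. Real film grain is not plain Gaussian noise (it varies with density and per channel), so `add_grain` here only illustrates the idea; `amount` and `seed` are hypothetical parameters.

```python
import random


def add_grain(pixels, amount, seed=None):
    """Add zero-mean Gaussian noise to values in [0, 1] and clamp."""
    rng = random.Random(seed)
    return [min(1.0, max(0.0, p + rng.gauss(0.0, amount))) for p in pixels]


def quantize(pixels, levels):
    """Simulate a limited bit-depth: snap values to `levels` steps."""
    return [round(p * (levels - 1)) / (levels - 1) for p in pixels]


ramp = [0.5 + 0.001 * i for i in range(100)]
banded = quantize(ramp, 8)
dithered = quantize(add_grain(ramp, 0.05, seed=1), 8)
print(len(set(banded)), len(set(dithered)))  # one flat band vs several levels
```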
If your background scene has flickering or inconstant lighting, (and your foreground plate doesn't), you will need to do your best to match the background. In some cases it may be as simple as adding a fluctuating brightness effect which is synchronized to the background. In other situations you may need to have articulated mattes controlling the light so that it only falls on certain areas. Very complex interactive lighting can be achieved by duplicating your foreground element as a 3D model and applying CG lighting to it. These properly shaped light patches can then be applied to the original 2-D foreground element.
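For the simple case of a fluctuating brightness effect synchronized to the background, one approach is to measure the background's average level on each frame relative to a reference frame and apply that ratio as a gain to the foreground. This sketch assumes grayscale frames stored as flat lists and a background whose flicker is roughly uniform; in practice you would measure a region unaffected by moving objects.

```python
def mean_level(frame):
    return sum(frame) / len(frame)


def flicker_gains(background_frames, reference_index=0):
    """Per-frame brightness ratio of the background to a reference frame."""
    ref = mean_level(background_frames[reference_index])
    return [mean_level(f) / ref for f in background_frames]


def apply_gain(frame, gain):
    return [min(1.0, p * gain) for p in frame]


background = [[0.5, 0.5], [0.6, 0.6], [0.4, 0.4]]
foreground = [[0.3, 0.7]] * 3
gains = flicker_gains(background)
matched = [apply_gain(f, g) for f, g in zip(foreground, gains)]
print(gains)
```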
Examine the quality of the edges of objects which are already in the scene and try to match them as closely as possible. Edges can be sharper or softer depending on the amount of backlight an object is getting or how out-of-focus an object is.
Be particularly aware of how edges are behaving over time. What looks acceptable on a still frame may 'chatter' in the moving sequence.
A common novice mistake is to forget the fact that an object should cast a shadow (or several). Many methods of extracting an object from its background do not allow the object's shadow to be brought along. In this case you'll need to create a shadow yourself. It is often acceptable to simply flop the matte of the foreground element and use it to darken a section of the background. Remember to match the rest of the shadows in the scene, in terms of size, sharpness and density.
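A minimal sketch of a matte-driven shadow, assuming grayscale 2-D lists of equal size. The foreground matte (optionally flopped first, as suggested above) is offset and used to darken the background, with a hypothetical `density` parameter controlling how dark the shadow gets. To match softer shadows in the scene, the matte would normally be blurred before this step.

```python
def flop(matte):
    """Mirror a matte horizontally."""
    return [row[::-1] for row in matte]


def add_shadow(background, matte, offset, density):
    """Darken `background` using `matte` shifted by `offset` (dx, dy).
    density 0 = no shadow, 1 = black where the matte is solid."""
    h, w = len(background), len(background[0])
    dx, dy = offset
    out = [row[:] for row in background]
    for y in range(h):
        for x in range(w):
            sx, sy = x - dx, y - dy
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = background[y][x] * (1.0 - density * matte[sy][sx])
    return out


bg = [[1.0] * 3 for _ in range(3)]
matte = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(add_shadow(bg, matte, (1, 1), 0.5))
```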
The same is true for the brightest parts of a scene, and so the rule should really be "brightness ranges should match". Typically, there is less of a problem with people not matching their highlights, partially because, even with live-action images, whites can often reach 100%.
An excellent trick for ensuring that your element's levels are matched to the background is to adjust the monitor brightness to high and low extremes. This will help to bring areas that the human eye is less sensitive to into a range where differences become more obvious.
This step is particularly important because you can never be certain what will happen to your images once you are finished with them. For instance, it is very common practice to slightly boost the brightness of film images when they are released on videotape, due to video's much lower contrast ratio. Suddenly, areas that looked uniformly black can reveal hidden detail, including compositing artifacts!
The nature of effects photography often requires that the elements of a composite all be shot with an unmoving, 'locked-off' camera.
While this makes for easier composites, it doesn't really make for interesting (or believable) cinematography. Fortunately, it is possible for the compositor to add camera moves after the plates are shot. This move may be as simple as a nearly imperceptible camera shake, or as complex as a tracked, 3D match-move.
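A nearly imperceptible camera shake is the simplest of these moves to add after the fact. The sketch below generates a smoothed random offset for each frame and shifts the frame by it.

It is a hypothetical minimal example, with these simplifications:
- Frames are grayscale lists of rows.
- Offsets are rounded to whole pixels, where a real tool would filter sub-pixel moves.
- The `amplitude`, `smoothing` and `seed` parameters are illustrative choices rather than standard values.

```python
import random

Frame = list[list[float]]  # hypothetical grayscale representation


def shake_offsets(num_frames: int, amplitude: int, smoothing: float = 0.5,
                  seed: int = 0) -> list[tuple[int, int]]:
    """Per-frame (dx, dy) pixel offsets for a random shake.

    Each new offset blends the previous one with a fresh random target, so
    `smoothing` near 1.0 gives a slow drift and near 0.0 a jittery buzz.
    """
    rng = random.Random(seed)  # fixed seed so the same shake can be re-rendered
    dx = dy = 0.0
    offsets = []
    for _ in range(num_frames):
        dx = smoothing * dx + (1.0 - smoothing) * rng.uniform(-amplitude, amplitude)
        dy = smoothing * dy + (1.0 - smoothing) * rng.uniform(-amplitude, amplitude)
        offsets.append((round(dx), round(dy)))
    return offsets


def translate(frame: Frame, dx: int, dy: int, fill: float = 0.0) -> Frame:
    """Shift a frame by whole pixels; uncovered areas are filled with `fill`."""
    h, w = len(frame), len(frame[0])
    return [
        [frame[y - dy][x - dx] if 0 <= y - dy < h and 0 <= x - dx < w else fill
         for x in range(w)]
        for y in range(h)
    ]


def add_shake(frames: list[Frame], amplitude: int, smoothing: float = 0.5,
              seed: int = 0) -> list[Frame]:
    """Apply one shake offset per frame."""
    offsets = shake_offsets(len(frames), amplitude, smoothing, seed)
    return [translate(f, dx, dy) for f, (dx, dy) in zip(frames, offsets)]
```

Shifting a frame uncovers its edges. For that reason, shaken plates are usually scaled up slightly or shot with extra image area around the frame.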
Often there will be fog, smoke or haze in the scene you're integrating your element into. The farther from the camera your element is supposed to be, the more atmosphere you will need to add. Again, examine other elements in the scene that are at the same distance and try to match their levels. If the atmosphere in a scene is very uniform, you can probably get away with just decreasing the contrast of your element and adjusting its color. With more distinctive mist or smoke, you may need to add a separate smoke element explicitly.
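For the uniform-atmosphere case, reducing contrast and shifting color can both be done in one step by mixing the element toward a haze color. In this sketch the mix amount grows with distance using an exponential falloff. That falloff is a common CG fog approximation, assumed here rather than prescribed by the text. `density` and `haze_rgb` are hypothetical parameters you would tune by comparing against objects already in the scene at the same distance.

```python
import math

Pixel = tuple[float, float, float]  # hypothetical RGB pixel, 0.0-1.0 per channel
Image = list[list[Pixel]]


def haze_amount(distance: float, density: float) -> float:
    """Fraction of haze in front of an object: 0.0 at the camera, approaching 1.0 far away."""
    return 1.0 - math.exp(-density * distance)


def add_atmosphere(img: Image, haze_rgb: Pixel, amount: float) -> Image:
    """Mix every pixel toward the haze color; this lowers contrast and tints the element."""
    return [
        [tuple(c * (1.0 - amount) + h * amount for c, h in zip(px, haze_rgb)) for px in row]
        for row in img
    ]
```

For example, mixing black and white pixels halfway toward a mid-grey haze turns a full 0-to-1 range into 0.25 to 0.75.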
File Format             Typical        Bits per        Matte   Zdepth
                        Extension      Component
______________________________________________________________________
Alias                   None or .als   8 bits          No *    No *
Aurora                  .im            8 bits          Yes     No
Avid OMF                none           8 bits          No      No
Explore (TDI)           none           8 or 12 bits    Yes     Yes
GIF                     .gif           indexed         No      No
JPEG                    .jpg           8 bits          No      No
Kodak Cineon            .cin           10 bits         No      No
PICT Version 2          .pict          8 bit indexed   Opt.    No
Pixar                   .pic           8 bits          Yes     No
PostScript              .eps           8 bits          Yes     No
QuickTime Movie         .mov           8 bits          No      No
Silicon Graphics        .rgb or .sgi   8 or 16 bits    Opt.    No
Silicon Graphics Movie  .mv            8 bits          No      No
Softimage               .pic           8 bits          Yes     No
Targa                   .tga           8 bits          Yes     No
TIFF Class R            .tif           8 bits          Yes     No
Vista                   .vst           8 bits          Yes     No
Wavefront (image)       .rla           1 to 16 bits    Yes     Yes
______________________________________________________________________
* Stored as a separate file.