High-Speed Cameras and Slow-Motion Video Terms and Concepts
The term pixel, derived from the words picture and element, is used to represent the smallest element of a digital image, one dot. The term may be used to describe three very different but related things:
- A high-speed CMOS sensor comprises millions of pixels on one integrated circuit, or chip. Each pixel is made up of five to seven tiny transistors and is capable of converting light into an electrical charge that is initially stored, then read out to an analog to digital converter to produce a digital image. In this case a “pixel” is a physical element in an electronic component.
- Once digitized, pixel values read out of the sensor may be stored in electronic memory or mass storage device as part of an image file, or transmitted for display, or some combination of both. Digitized pixels have numerical values such that dark areas of an image have low values and bright areas have higher ones. In this case a “pixel” is a numeric value that is stored on some sort of electronic media.
- Finally, the pixel values are converted back to visible light by the pixels of a digital display such as a computer monitor, television, cell phone, VR headset, etc. In the last case, the pixel is some sort of light or an array of lights, whether a minuscule element of a computer display or a light bulb in an enormous stadium display.
- 1-bit pixel has 2^1 (2) values (black and white)
- 2-bit pixel has 2^2 (4) values (black, two intermediate shades, white)
- 3-bit pixel has 2^3 (8) values
- 8-bit pixel has 2^8 (256) values
- 10-bit pixel has 2^10 (1024) values
- 12-bit pixel has 2^12 (4096) values
- 1-bit pixel has 2^1 (2) “raw” values and 2^3 (8) colorized values
- 2-bit pixel has 2^2 (4) “raw” values and 4^3 (64) colorized values
- 3-bit pixel has 2^3 (8) “raw” values and 8^3 (512) colorized values
- 8-bit pixel has 2^8 (256) “raw” values and 256^3 (16,777,216) colorized values
- 10-bit pixel has 2^10 (1024) “raw” values and 1024^3 (1,073,741,824) colorized values
- 12-bit pixel has 2^12 (4096) “raw” values and 4096^3 (68,719,476,736) colorized values
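The two lists above follow one rule: a pixel with n bits has 2^n raw values, and a colorized pixel with three channels of that depth has (2^n)^3 values. A minimal Python sketch, with function names chosen only for illustration:

```python
def raw_values(bit_depth: int) -> int:
    """Number of distinct raw (mono) values for a pixel of this bit depth."""
    return 2 ** bit_depth


def colorized_values(bit_depth: int) -> int:
    """Number of distinct RGB values when each of 3 channels has this bit depth."""
    return raw_values(bit_depth) ** 3


for bits in (1, 2, 3, 8, 10, 12):
    print(f"{bits}-bit: {raw_values(bits):,} raw, {colorized_values(bits):,} colorized")
```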
Maximum Pixel Rate = Maximum Resolution (max horiz-res x max vert-res) x Maximum fps (@ max resolution)
Thus, if the maximum resolution is 1280 x 1024 and the maximum frame rate at that resolution is 1000fps, then the maximum pixel rate is 1280 x 1024 x 1000, or 1,310,720,000 pixels/sec, or about 1,311Mp/sec.
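The same arithmetic can be checked with a few lines of Python. This is only a sketch; the function name is an assumption made for illustration.

```python
def max_pixel_rate(h_res: int, v_res: int, fps: int) -> int:
    """Pixels per second at the given resolution and frame rate."""
    return h_res * v_res * fps


rate = max_pixel_rate(1280, 1024, 1000)
print(f"{rate:,} pixels/sec, about {rate / 1e6:,.0f} Mp/sec")
```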
Most high-speed digital cameras have global electronic shutters, where “global” refers to a design in which all pixels begin and end their light-to-charge functions in unison. This is a very specialized operation and is one of the ways a bona fide high-speed camera differs from a standard video camera with slow-motion features. It requires a more complex pixel design whereby, immediately following integration (exposure), the charge on each pixel is transferred to a non-light-sensitive structure within the pixel, where it is held until it can be read, thereby freeing the pixel to begin integration for the next frame.
Most other digital cameras use “rolling” electronic shutters, whereby pixels are read out immediately after exposure but are not all read simultaneously. This means that the actual time of the exposure is not the same for every part of the frame, and any movement that occurs while the sensor is being read may show up as a progressive distortion across each image.
The classic image comparison between global and rolling shutters is a picture of fan blades in which the blades appear to shift to one side of the fan. The image to the left was taken with an iPhone; the image on the right was taken with a Fastec IL5 with a 60-microsecond shutter. Both the fan and the wheel were rotating at the same speeds in both images and the same illumination was used.
For a standard movie camera, movement across the field of view is expected to be slightly blurred to give it a realistic look on screen. After all, the moving picture is really a series of still images projected quickly enough to trick the eye. Movie cameras often use the term “shutter angle” for shutter controls, harking back to the days when the shutter was a rotating disk, synchronized to rotate 360° with each frame. The operator would set the angle over which it would open to expose the film. The normal setting was 180°, meaning that the exposure time would be half the frame interval. This setting was chosen because it gave the most natural look. Smaller angles were used to add intensity to the motion for action scenes.
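The relationship between shutter angle and exposure time described above can be sketched as follows. The function name is an assumption, and 24fps is used only as a familiar example frame rate.

```python
def exposure_from_shutter_angle(angle_deg: float, fps: float) -> float:
    """Exposure time in seconds: the open fraction of the disk times the frame interval."""
    return (angle_deg / 360.0) / fps


# A 180-degree shutter at 24fps exposes for half of each 1/24 s frame interval.
print(exposure_from_shutter_angle(180, 24))
```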
For motion analysis purposes, however, it is better to have a crisp, clear, undistorted image for frame-by-frame review. To achieve this, the exposure often needs to be quite short so that the object of interest does not move much during the exposure, which would cause motion blur.
Motion blur can be calculated if the speed at which an object is moving, the dimensions of the field of view, the pixel resolution, and the duration of exposure are known. The frame rate is not important. For example, if an automobile is traveling at 30mph across a field of view that is 50 feet across at a horizontal resolution of 1920 with a shutter duration of 100 microseconds:
- We can compute how far the car will travel during the shutter time of 100µs,
• 30mph = 44 feet/sec. 100µs is .0001 seconds, so the car goes .0044 feet in 100µs
- and then how far in relation to the field of view, 50 feet
• .0044/50 = .000088
- and then how many pixels of the 1920 across the image does that represent:
• .000088 x 1920 = 0.169
The motion blur is, thus, about 0.17 pixels. This number is directly proportional to the shutter duration, so a 1-millisecond shutter would yield about 1.7 pixels of blur. The acceptable amount of blur depends on the size and complexity of the object to be tracked and the accuracy required.
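The steps of the motion blur calculation can be collected into one small function. This is a minimal sketch; the function name and the choice of feet and seconds as units are assumptions made for illustration.

```python
MPH_TO_FT_PER_S = 5280 / 3600  # feet per mile divided by seconds per hour


def motion_blur_pixels(speed_ft_per_s: float, fov_ft: float,
                       h_res: int, shutter_s: float) -> float:
    """Pixels an object crosses during one exposure."""
    distance_ft = speed_ft_per_s * shutter_s   # travel during the shutter time
    fraction_of_fov = distance_ft / fov_ft     # share of the field of view covered
    return fraction_of_fov * h_res             # the same share expressed in pixels


# The automobile example: 30mph, 50-foot field of view, 1920 pixels, 100 microseconds.
blur = motion_blur_pixels(30 * MPH_TO_FT_PER_S, 50, 1920, 100e-6)
print(f"{blur:.3f} pixels")
```

Because blur scales linearly with shutter duration, calling the function with a shutter ten times longer returns a value ten times larger.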
The IL5 image, below to the right, was taken of a 4” 50Hz fan with a 60µs shutter. It has extremely little motion blur compared with the iPhone image to the left.
Exposure in a digital camera system can be defined as the amount of light used for each frame. There are three factors that affect exposure: 1) illumination, the source of light projected onto the sensor, reflected by or emitted from the object to be imaged; 2) the transmittance of the lens, which can be controlled by adjusting the lens aperture; and 3) the integration or “shutter” time of the camera, which is the length of time the sensor is actually collecting light for each frame.
The ideal exposure for any image sensor depends on its light sensitivity and remains constant at any given level of gain. Generally, image quality, especially with respect to noise and dynamic range, is adversely affected as gain is added.
There are tradeoffs as well with the mentioned factors for affecting exposure:
- Adding or adjusting illumination is often necessary to get the best images. This has been elevated to an art for serious photography and movie making. The balance here is one of effort. Proper illumination of a scene can be a painstaking process for which there may not be the time or will. There are many applications for which adding enough light to facilitate getting an ideal exposure is not possible.
- Adjusting lens aperture affects Depth of Field, DOF, which is the depth to which objects will appear in focus. As a lens is “stopped down” to transmit less light, the DOF increases, which is usually beneficial for high-speed shots. Conversely, the more open the aperture is, the shallower the DOF becomes. For more information please refer to the tutorials DOF calculator and DOF vs Pixel Size.
- Adjusting the Shutter duration affects the exposure. The maximum exposure for any given frame rate is an interval slightly less than 1/frame rate. When imaging objects that are moving quickly through the field of view, it is often necessary to reduce the shutter duration to minimize motion blur. The amount of motion blur that is tolerable for a given application is dependent on the shape of the object to be imaged and the clarity required to analyze its behavior.
The recording time of a high-speed video camera depends on the resolution and frame rate selected and the amount of storage available. Fastec cameras may use either high-speed RAM or a high-speed SSD as the recording medium. For record times of Fastec cameras, please refer to the datasheets. The most basic approximation for record time is:
Record Time (seconds) = Storage Space (bits) / ((Horizontal Resolution) x (Vertical Resolution) x (Bit Depth) x (Frame Rate))
There are some caveats:
- Not all storage media can record the full bandwidth of the camera
- There is some small amount of metadata saved with each file. This is practically negligible for megapixel images, but as the image size decreases, the metadata becomes an increasingly important factor.
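The basic record-time approximation can be sketched in Python as below. The function name and the 8GB storage figure are hypothetical, and the sketch ignores both caveats above (media bandwidth and per-file metadata), so it gives an upper bound rather than a datasheet value.

```python
def record_time_seconds(storage_bytes: float, h_res: int, v_res: int,
                        bit_depth: int, fps: float) -> float:
    """Approximate record time: storage in bits divided by bits recorded per second."""
    storage_bits = storage_bytes * 8
    bits_per_second = h_res * v_res * bit_depth * fps
    return storage_bits / bits_per_second


# Hypothetical example: 8GB of memory, 1280 x 1024, 8-bit pixels, 1000fps.
print(f"{record_time_seconds(8e9, 1280, 1024, 8, 1000):.2f} seconds")
```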
Most cameras are simply triggered whenever an image is to be captured. For example, with a still camera, the shot is composed, and then the photographer waits for the right instant and then triggers the camera by pressing the shutter button or a remote switch. For a video camera, a trigger is sent to begin recording, then another is sent to end it.
High speed events, however, may occur too quickly for the photographer to respond. One strategy to ensure capturing a high-speed event is to begin recording well before the event is expected and continue recording until the event has occurred. The problem with this strategy is that because of the high frame rates, the amount of video that is saved may be prohibitively great. There may not be enough memory to store it all, or it may just produce much more data than can be reviewed.
The most common way to accommodate this triggering issue in a high-speed camera is with the use of a circular buffer and what we call an “end” trigger. The circular buffer is set to record some amount of video, usually measured in seconds, although much longer intervals are possible when recording to an SSD. The buffer will record for a pre-set period of time, then begin writing over the oldest images while preserving the newest ones. For example, the buffer could be set to record for 10 seconds before it “wraps.” When the camera gets a trigger, it simply stops recording. If a trigger can be sent to the camera within 10 seconds of the event, it will be captured.
Often video from both before and after an event may be useful. In this case we can move the Trigger Position to any spot in the buffer we want. Using the above example with a trigger set for 50% of the buffer, we would always record 5 seconds before and 5 seconds after a trigger.
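The circular buffer and trigger position can be illustrated with a short Python sketch. The class name, method names, and the convention that `trigger_position` is the fraction of the buffer kept from before the trigger (1.0 for an “end” trigger, 0.5 for the 50% case) are all assumptions made for illustration, not a description of any camera's firmware.

```python
from collections import deque


class CircularRecorder:
    """Keeps the newest frames; a trigger decides when recording stops."""

    def __init__(self, capacity_frames: int, trigger_position: float = 1.0):
        # deque with maxlen silently drops the oldest frame once it is full
        self.buffer = deque(maxlen=capacity_frames)
        # Frames still to record after the trigger arrives
        self.post_trigger_frames = round(capacity_frames * (1.0 - trigger_position))
        self.remaining = None  # None means no trigger has been received yet
        self.stopped = False

    def add_frame(self, frame) -> None:
        if self.stopped:
            return
        self.buffer.append(frame)
        if self.remaining is not None:
            self.remaining -= 1
            if self.remaining <= 0:
                self.stopped = True

    def trigger(self) -> None:
        if self.remaining is None and not self.stopped:
            self.remaining = self.post_trigger_frames
            if self.remaining == 0:  # "end" trigger: stop immediately
                self.stopped = True


# A 10-frame buffer with the trigger at 50%: 5 frames before, 5 frames after.
rec = CircularRecorder(capacity_frames=10, trigger_position=0.5)
for n in range(20):
    rec.add_frame(n)
rec.trigger()
for n in range(20, 30):
    rec.add_frame(n)
print(list(rec.buffer))
```

In this sketch frame numbers stand in for images; scaling the capacity by the frame rate gives the same behavior in seconds.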
Human vision responds to electromagnetic waves with wavelengths between about 400 and 700 nanometers. Starting at 700nm and working backwards, we have the colors of the rainbow: red at 600-700nm down to violet at about 400nm, with orange, yellow, green, and blue in between. The peak human response to light is at about 550nm, which is in the middle of the green band, as seen on the response curve.
Notice the curves for red, green, and blue cones. These three types of photoreceptors give us our ability to see up to about 10,000,000 hues, although some of us have much greater color acuity than others. There is a much wider disparity among animals. Some have up to 15 different types of photoreceptors, usually for highly specialized narrow bands, while other animals are considered color blind.
CMOS sensors used for most digital cameras are sensitive to a wider band than humans and most animals, generally from about 300nm to 900nm, as depicted by the black line on the response curve, above.
This discrepancy in the response of camera sensors vs human and animal sight poses opportunities as well as challenges.
The greatest opportunity comes in the form of using mono cameras with illumination in the Near Infra-Red, NIR, for applications that require intense illumination but must not disturb test subjects. Many studies of zebrafish, bats, some insects, etc. are done using this technique, as these animals are not sensitive to NIR.
More challenging is the reproduction of color by the camera. This is basically a three-step process:
- A filter to eliminate light outside the visible spectrum is placed between the lens and the sensor. This is usually an IR cut filter, which limits light above 650nm. It is necessary because, while there is often a lot of light present in this range, humans are blind to it, so it cannot be used when we are trying to replicate the human experience. An example of this is that there is often a great deal of NIR reflected by certain foliage. Without the IR cut filter a deep green tree might appear to be bright red to the camera!
- A Bayer filter mosaic is applied to color sensors. The Bayer filter mosaic or color filter array, CFA, is the placement of a red, green, or blue filter atop each pixel of the sensor. These are arranged in green/blue/green and red/green/red rows, such that 1/2 of all pixels are green, 1/4 are red and 1/4 are blue. In this way each pixel responds only to the band of illumination that is roughly equivalent to what the red, green, and blue cones of the human eye would see.
- Color interpolation is done on a per-pixel basis to render red, green, and blue values for each. For each pixel, its value as well as the values of surrounding pixels of other colors are used to yield these “RGB” values.
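The last two steps can be illustrated with a small Python sketch. It lays out a mosaic with green/blue/green and red/green/red rows, then fills in the two missing colors at each pixel by averaging same-colored neighbors in the surrounding 3x3 window. This is a deliberately simplified bilinear-style interpolation chosen for illustration; real cameras use more sophisticated algorithms, and all names here are assumptions.

```python
def bayer_color(row: int, col: int) -> str:
    """Filter color at (row, col): even rows run G/B/G, odd rows run R/G/R."""
    if row % 2 == 0:
        return "G" if col % 2 == 0 else "B"
    return "R" if col % 2 == 0 else "G"


def interpolate_rgb(raw, row: int, col: int):
    """Return (R, G, B) for one pixel of a raw mosaic given as a list of rows."""
    sums = {"R": 0.0, "G": 0.0, "B": 0.0}
    counts = {"R": 0, "G": 0, "B": 0}
    for r in range(max(0, row - 1), min(len(raw), row + 2)):
        for c in range(max(0, col - 1), min(len(raw[0]), col + 2)):
            if (r, c) == (row, col):
                continue  # the pixel's own sample is used directly below
            color = bayer_color(r, c)
            sums[color] += raw[r][c]
            counts[color] += 1
    own = bayer_color(row, col)
    rgb = {}
    for ch in "RGB":
        # Keep the measured value for the pixel's own color; average the rest
        rgb[ch] = float(raw[row][col]) if ch == own else sums[ch] / counts[ch]
    return rgb["R"], rgb["G"], rgb["B"]


# Count the filter colors in a 4 x 4 tile of the mosaic.
tile = [bayer_color(r, c) for r in range(4) for c in range(4)]
print({ch: tile.count(ch) for ch in "RGB"})

# A uniformly gray scene should come back with equal R, G, and B values.
flat = [[100] * 4 for _ in range(4)]
print(interpolate_rgb(flat, 1, 1))
```

Because each output pixel borrows two of its three color values from neighbors, the sketch also shows why fine detail and sharp edges are harder for a color sensor to resolve than for a mono sensor.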
While there are cameras that use multiple sensors, each with its own color filtration, for the creation of color, and there are some sensors that have the ability to collect RGB data on a per-pixel basis, the vast majority of color digital cameras use the above techniques. There are some obvious negatives when using color vs mono cameras:
- Almost half of the light to which the sensor is responsive is eliminated by the NIR cut filter. And for each pixel, much of the remaining light outside its own color band is eliminated. This usually results in color cameras having less than 1/2 the sensitivity of mono cameras, depending on the illumination spectrum.
- Interpolation algorithms are optimized for daylight. While there is high-quality white and daylight balanced artificial illumination available, getting good color reproduction in most artificial lighting is a challenge.
- In order to produce RGB values for each pixel, the surrounding pixels must be used, which means that the spatial resolution for the color camera is not as fine as that of the mono camera. That is not to say that extremely sharp color images are not possible, but mono images are generally sharper.
- High-contrast edges will not yield fine enough color information for the interpolation algorithms to resolve, resulting in color aliasing. For example, you will often see red, green, and blue smudges in images of black and white text.