For our end-of-semester project, we explored the capabilities of the Microsoft Kinect. First of all, the specs of the Kinect are as follows:
- Color and depth-sensing cameras
- Multi-microphone array
- Tilt motor
- Field of view
  - Horizontal field of view: 57 degrees
  - Vertical field of view: 43 degrees
  - Physical tilt range: ±27 degrees
  - Depth sensor range: 1.2 m – 3.5 m
- Data streams
  - 320×240 16-bit depth @ 30 frames/sec
  - 640×480 32-bit color @ 30 frames/sec
  - 16-bit audio @ 16 kHz
To do anything interesting with the data returned by the Kinect, specifically the RGB and depth frames, we researched another library called OpenCV, originally built by the open-source branch of Intel (along with numerous other researchers) but now primarily maintained by the robotics company Willow Garage. The code is hosted on a brand-new site (as of 5/7/12) here, while the main wiki and documentation are here. The wiki is being redone at the time of writing, so these links may break in the future. The library covers an extremely wide range of image-processing tasks, far more than could be covered in a single semester, so we chose one algorithm to analyze that was both feasible to learn about and quick to implement.
Basically, when 'tracking' is turned on in our program, it records a history of the last N images (N = 2 in our case, since the frame rate from the Kinect was lower than expected). The absolute differences between these images are accumulated, segmented into connected components (the regions of interest), and small components caused by noise are thrown out. From this, it is simple mathematics to get the x and y location of a component in the plane of the camera screen. From the motion history, it is also possible to estimate the angle at which a connected component is traveling in that plane, but this behaves somewhat erratically and should not be trusted for more than determining simple directions such as left versus right. Finally, knowing the region of interest (ROI), one can simply average over that region in the depth image to get a rough estimate of the depth of the object of interest. In principle, this program can track multiple objects, but that would require algorithms to differentiate between the objects and to guard against the case where two or more distinct objects are close together and are merged into one object by the tracking algorithm.
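The pipeline above (difference two frames, keep the large connected components, then read off each component's centroid and average depth) can be sketched in plain Python. This is a hypothetical illustration, not our actual code: the real project used OpenCV images, while here frames are just lists of lists, and the threshold and minimum-area values are made-up parameters.

```python
# Sketch of the motion-tracking steps described above, using nested lists as
# stand-ins for grayscale RGB-difference frames and a depth frame.

def abs_diff(frame_a, frame_b):
    """Per-pixel absolute difference of two equally sized grayscale frames."""
    return [[abs(a - b) for a, b in zip(ra, rb)]
            for ra, rb in zip(frame_a, frame_b)]

def connected_components(mask):
    """Label 4-connected components in a binary mask; returns a list of
    components, each a list of (y, x) pixel coordinates."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                stack, pixels = [(y, x)], []
                seen[y][x] = True
                while stack:  # iterative flood fill
                    cy, cx = stack.pop()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                components.append(pixels)
    return components

def track(prev_frame, curr_frame, depth_frame, diff_threshold=20, min_area=4):
    """Difference two frames, keep large moving regions, and report each
    region's (x, y) centroid plus its average depth over the same pixels."""
    diff = abs_diff(prev_frame, curr_frame)
    mask = [[v > diff_threshold for v in row] for row in diff]
    results = []
    for pixels in connected_components(mask):
        if len(pixels) < min_area:  # throw out small components due to noise
            continue
        cy = sum(p[0] for p in pixels) / len(pixels)
        cx = sum(p[1] for p in pixels) / len(pixels)
        avg_depth = sum(depth_frame[y][x] for y, x in pixels) / len(pixels)
        results.append({"x": cx, "y": cy, "depth": avg_depth})
    return results
```

With a moving 3×3 bright patch and one isolated noisy pixel, `track` reports a single region: the lone pixel's component falls below `min_area` and is discarded, matching the noise-rejection step described above.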