A dog’s purpose can take on new meaning when humans strap a GoPro camera to her head. Such “dog cam” video clips have helped train computer vision software that could someday give rise to robotic canine companions.

The idea behind DECADE, described as “a dataset of ego-centric videos from a dog’s perspective,” is to directly model the behavior of intelligent beings based on how they see and move around within the real world. Vision and movement data from a single dog—an Alaskan Malamute named Kelp M. Redmon—proved capable of training off-the-shelf deep learning algorithms to predict how dogs might react to different situations, such as seeing the owner holding a bag of treats or throwing a ball.
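
For the technically inclined, the general recipe can be sketched in a few lines of PyTorch: a convolutional backbone encodes a handful of ego-centric frames, and a recurrent layer predicts a short sequence of discretized joint movements. The layer sizes, joint count, and movement classes below are illustrative placeholders, not the exact architecture from the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

class DogActionPredictor(nn.Module):
    """Sketch: encode a few ego-centric frames, predict discretized joint movements."""
    def __init__(self, num_joints=4, num_movement_classes=8, hidden=256):
        super().__init__()
        backbone = models.resnet18(weights=None)       # image encoder
        backbone.fc = nn.Identity()                    # keep the 512-d features
        self.encoder = backbone
        self.rnn = nn.LSTM(input_size=512, hidden_size=hidden, batch_first=True)
        # one classifier head per joint, each predicting a movement bin
        self.heads = nn.ModuleList(
            nn.Linear(hidden, num_movement_classes) for _ in range(num_joints)
        )

    def forward(self, frames):                         # frames: (B, T, 3, H, W)
        b, t, c, h, w = frames.shape
        feats = self.encoder(frames.reshape(b * t, c, h, w)).reshape(b, t, -1)
        out, _ = self.rnn(feats)                       # per-timestep features
        return [head(out) for head in self.heads]      # list of (B, T, classes)

# usage: a two-clip batch of five 224x224 frames from the head-mounted camera
model = DogActionPredictor()
logits_per_joint = model(torch.randn(2, 5, 3, 224, 224))
```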

“The near-term application would be to model the behavior of the dog and try to make an actual robot dog using this data,” said Kiana Ehsani, a PhD student in computer science at the University of Washington in Seattle.

Dogs became the research subjects of choice for modeling “visually intelligent agents” because they have simpler behaviors than humans. But they also display more complex behaviors than many other animals through their social interactions with other dogs and humans.

This research was backed with funding from the U.S. National Science Foundation and the Office of Naval Research. Ehsani and her colleagues at the University of Washington and the Allen Institute for AI (AI2) published the details of their work in a 28 March 2018 paper uploaded to the preprint server arXiv.

The data collection process went beyond just putting a GoPro camera on the Alaskan Malamute’s head. Researchers also attached motion trackers to the dog’s body and joints to record each body part’s absolute position and the relative angles of the dog’s limbs and body. The trackers collected data at an average rate of 20 readings per second and were synchronized through a system based on a Raspberry Pi 3 computer.
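
The paper does not include the logging code itself, but a rough Python sketch of the kind of loop such a Raspberry Pi hub might run, polling each tracker, stamping every sweep with a shared timestamp, and writing rows out at roughly 20 Hz, could look like this. The tracker names and the read_tracker call are hypothetical stand-ins for whatever driver the real hardware exposes.

```python
import csv
import time

SAMPLE_RATE_HZ = 20                      # roughly 20 readings per second
SENSORS = ["back", "tail", "left_front_leg", "right_front_leg",
           "left_back_leg", "right_back_leg"]   # hypothetical tracker names

def read_tracker(name):
    """Placeholder for the real driver call that returns the tracker's reading."""
    raise NotImplementedError

def log_session(path, duration_s=60):
    period = 1.0 / SAMPLE_RATE_HZ
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "sensor", "reading"])
        start = time.monotonic()
        while time.monotonic() - start < duration_s:
            tick = time.monotonic()
            # one shared timestamp per sweep keeps the sensors synchronized
            stamp = time.time()
            for name in SENSORS:
                writer.writerow([stamp, name, read_tracker(name)])
            # sleep out the remainder of the sampling period
            time.sleep(max(0.0, period - (time.monotonic() - tick)))
```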

“We needed the setup to be robust to the dog's movements and shakings and also light-weight and comfortable for the dog such that there is no interference in her normal behavior,” Ehsani explained.

That combination of dog vision and movement data allowed the researchers to train the deep learning algorithms to do more than just anticipate a dog’s future actions. It also enabled them to predict the appropriate sequence of dog limb and joint movements that get a dog from point A to point B: an early step toward programming a robotic dog to perform the same motions and behaviors.
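
That planning problem can also be sketched in simplified form: a shared image encoder turns the starting view and the goal view into features, and a recurrent layer rolls out the intermediate joint movements. As before, the dimensions, step count, and class counts are illustrative assumptions rather than the paper’s actual configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

class DogPlanner(nn.Module):
    """Sketch: from a start frame and a goal frame, roll out joint movements."""
    def __init__(self, num_joints=4, num_movement_classes=8, steps=5, hidden=256):
        super().__init__()
        cnn = models.resnet18(weights=None)
        cnn.fc = nn.Identity()
        self.encoder = cnn                             # shared by both frames
        self.steps = steps
        self.num_joints = num_joints
        self.num_classes = num_movement_classes
        self.rnn = nn.LSTM(input_size=1024, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_joints * num_movement_classes)

    def forward(self, start_frame, goal_frame):        # each: (B, 3, H, W)
        pair = torch.cat([self.encoder(start_frame),
                          self.encoder(goal_frame)], dim=1)    # (B, 1024)
        # feed the same start/goal embedding at every planning step
        seq = pair.unsqueeze(1).repeat(1, self.steps, 1)
        out, _ = self.rnn(seq)
        logits = self.head(out)                        # (B, steps, joints*classes)
        return logits.view(-1, self.steps, self.num_joints, self.num_classes)
```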

The idea of modeling the overall behavior of “visually intelligent agents” differs from traditional computer vision research that trains machine learning algorithms—including more specialized deep learning algorithms—on very specific vision tasks such as object detection or body pose estimation. Such tasks are usually “easier to evaluate and more well-defined” than the more complex doggy behaviors modeled in the recent study. But they tend to be proxy tasks rather than direct representations of the way intelligent beings actually view the world.

Another downside of the traditional approach to computer vision research is that it often requires huge training datasets consisting of many thousands or even millions of images. Each image or video frame in the dataset must be painstakingly hand-labeled by humans to teach algorithms the difference between correct and incorrect examples.

By comparison, Ehsani and her colleagues had no need to manually label the vision and movement data in their doggy dataset. They even showed that data from a single dog’s behavior can train algorithms to perform more generalized computer vision tasks: a demonstration of using dog behavior to supervise representation learning in algorithms.

The more impressive part of the demonstration involved training ResNet-18—a deep learning model for image recognition—once on their dog dataset and once on the standard ImageNet classification dataset, and then applying each version to a particular computer vision task. The task, known as “walkable surface estimation,” required the deep learning model to figure out which parts of any given image represent a walkable surface such as a carpet or floor area.
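
In simplified form, reusing a pretrained ResNet-18 backbone for that kind of task might look like the sketch below: the backbone’s convolutional features feed a small decoder that scores every pixel as walkable or not. The decoder head shown here is a generic choice for illustration and is not the exact architecture used in the study; the pretrained weights could come from either the dog data or ImageNet.

```python
import torch
import torch.nn as nn
from torchvision import models

class WalkableSurfaceNet(nn.Module):
    """Sketch: ResNet-18 features plus a small decoder for per-pixel 'walkable' scores."""
    def __init__(self, pretrained_backbone=None):
        super().__init__()
        resnet = models.resnet18(weights=None)
        if pretrained_backbone is not None:            # e.g. dog- or ImageNet-trained weights
            resnet.load_state_dict(pretrained_backbone, strict=False)
        # keep everything up to the last conv block (drops avgpool and fc)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])
        self.decoder = nn.Sequential(
            nn.Conv2d(512, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=1),            # one "walkable" logit per cell
        )

    def forward(self, image):                           # image: (B, 3, H, W)
        feats = self.backbone(image)                    # (B, 512, H/32, W/32)
        logits = self.decoder(feats)
        # upsample back to the input resolution for a per-pixel mask
        return nn.functional.interpolate(
            logits, size=image.shape[-2:], mode="bilinear", align_corners=False
        )

# training would typically use a binary cross-entropy loss against walkable masks
model = WalkableSurfaceNet()
mask_logits = model(torch.randn(1, 3, 224, 224))        # (1, 1, 224, 224)
```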

When applied to that task, the deep learning model trained on the dog dataset outperformed the same model trained on the ImageNet dataset by about three percent. That may not sound like much, but it’s important to consider that ImageNet consists of 1 million images that were carefully hand-labeled by humans through a time-intensive process. The dog dataset contains just 20,000 images that did not require any manual labeling.

This still represents more of a proof-of-concept demonstration than anything else. The researchers already have big plans to expand their testing beyond their first doggy subject so that they can compare the behaviors of many different dogs and try to find more generalizable patterns.

“One of the core problems of AI is to come up with generalizable models, and generalization requires collecting data from multiple dogs in many different situations,” Ehsani said.

The researchers plan to eventually develop a robot dog based on real-world doggy behavior. But first, they want to collect data from more dog breeds and capture interactions among multiple dogs. That will likely mean more volunteers rigging their dogs with cameras and motion trackers in the name of science.

“This can be scaled up to more dogs because the whole setup is adjustable for different sizes and breeds of dogs,” Ehsani said. “There isn't going to be any additional costs for the dog owners as well: They just need to attach the sensors and start playing with their dog, or go for a walk, and the setup will do the data collection automatically.”

Source: IEEE Spectrum