Scientists Teach AI Cameras to See Depth in Photos Better
An oft-underappreciated feature of human sight is the ability to perceive depth in flat images. While computers can "see" thanks to camera technology, that kind of depth perception has been a challenge -- but perhaps not anymore.
Humans can naturally see depth in photographs from a single point of view, even though the image is flat. Computers struggle with this, but researchers from Simon Fraser University in Canada have developed a way to help computer vision see depth in flat images.
“When we look at a picture, we can tell the relative distance of objects by looking at their size, position, and relation to each other,” says Mahdi Miangoleh, an MSc student who worked on the project. “This requires recognizing the objects in a scene and knowing what size the objects are in real life. This task alone is an active research topic for neural networks.”
The process is called monocular depth estimation, and it involves teaching computers to see depth using machine learning. Basically, the process uses contextual cues, such as the relative sizes of objects, to estimate the structure of the scene. The team started with an existing model that takes the highest-resolution image available and resizes it to fit the model's input, but this method failed to provide consistent results.
The team realized that while neural network technology has advanced significantly over the years, networks still have a relatively small capacity to generate many details at once. They are also limited by how much of the scene they can analyze at a time. This is why simply feeding a high-resolution image into a neural network did not work in previous attempts at depth mapping.
The team explains that during the process, it found strange effects in how the neural network saw images at different resolutions: high-resolution images kept the details of subjects visible but lost some of the overall depth structure, while low-resolution images lost the detail but recovered that depth information.
The researchers explain that a small network that generates the structure of a complex scene cannot also generate fine details. At a low resolution, every pixel can see the edges of a structure in a photo, so the network can judge that it is, for example, a flat wall. But at a high resolution, some pixels do not receive any contextual information due to the limitations of the network's processing capability. This results in large structural inconsistencies.
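The trade-off the researchers describe can be illustrated with simple arithmetic: a network with a fixed receptive field sees a shrinking fraction of the scene as the input resolution grows. A minimal sketch, where the 384-pixel receptive field is an assumed, typical figure rather than a detail from the paper:

```python
# Illustrative sketch (not the authors' code): how much of a scene a
# fixed receptive field covers at different input resolutions.
# The 384-pixel receptive field is an assumed, typical figure.

def context_fraction(image_width_px: int, receptive_field_px: int = 384) -> float:
    """Fraction of the image width visible to one output pixel."""
    return min(1.0, receptive_field_px / image_width_px)

# At low resolution, every pixel sees the whole structure...
low = context_fraction(384)    # 1.0 -> full scene context
# ...at high resolution, each pixel sees only a small slice.
high = context_fraction(3840)  # 0.1 -> 10% of the scene
```

At 384 pixels wide, every output pixel has full scene context and can judge global structure; at 3,840 pixels, each pixel sees only a tenth of the scene, which is why large structural inconsistencies appear.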
So rather than rely on one resolution, the team decided to build a method that uses multiple.
“Our method analyzes an image and optimizes the process by looking at the image content according to the limitations of current architectures,” explains PhD student Sebastian Dille. “We give our input image to our neural network in many different forms, to create as many details as the model allows while preserving a realistic geometry.”
The team published a full report and produced an easy-to-understand video explanation of the method.
#news #software #technology #ai #artificialintelligence #artificialintelligencecamera #computervision #depth #depthestimation #machinelearning #monoculardepthestimation
Engineer Makes an AI Camera Sprinkler to Keep People Off His Lawn
To keep people from walking on his lawn and inhibiting its growth, inventor and YouTuber Ryder of the YouTube channel Ryder Calm Down built a people-detecting smart camera rigged to a sprinkler system that sprays those who get too close.
Ryder explains that the city where he lives inexplicably removed a chunk of sidewalk from the front of his house, leaving only unsightly dirt.
"About a year ago the city dug up the sidewalk in front of my house and I'm not sure why," he says. "So now there is a whole bunch of dirt where the sidewalk used to be and people keep walking over top of it. I'm trying to plant grass, but it doesn't grow when people keep walking on it."
Rather than relying on a sign, Ryder decided to take a smart camera system and rig up a different kind of "solution."
As previously reported, Ryder had built an artificial-intelligence-powered camera in the past, which he trained to recognize dogs and, once one was detected, yell compliments at the dog and its owner. That system was built on a Raspberry Pi and its camera module, which together analyzed subjects using a pre-trained machine learning model able to recognize about 80 different objects, including people, cars, and dogs.
Ryder applied the same idea to a Wyze camera running custom firmware, which he used to recognize a host of objects, including people. When a person was detected, the system activated a sprinkler intended to deter them from walking on his growing grass.
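The trigger logic behind such a build is simple in principle: run each camera frame through an object detector and fire the sprinkler only when a "person" detection clears a confidence threshold. A minimal, hypothetical sketch, where the detection format and the threshold are assumptions rather than details from Ryder's build:

```python
# Hypothetical sketch of the trigger logic, not Ryder's actual code.
# `detections` stands in for the output of an object detector such as
# the ~80-class model described above: (label, confidence) pairs.

PERSON_CONFIDENCE_THRESHOLD = 0.6  # assumed value

def should_activate_sprinkler(detections: list[tuple[str, float]]) -> bool:
    """Return True when a person is detected with enough confidence."""
    return any(
        label == "person" and confidence >= PERSON_CONFIDENCE_THRESHOLD
        for label, confidence in detections
    )
```

A dog at 90% confidence would be ignored, while a person at 80% confidence would trip the sprinkler; in a real build this decision would gate a relay or smart valve on each camera frame.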
"There is no teacher quite like fear," Ryder says, jokingly. "So I'm using artificial intelligence to turn my sprinkler on only when people walk on my lawn."
Is this solution ethical? No, and Ryder recognizes as much. He calls it a "highly unethical waste of his engineering degree."
"I made this for entertainment only. The people you see in the video, even the ones portrayed as the public, have all agreed to be in the video prior to filming," he writes in his video's description. "I wouldn't recommend you actually build anything like this yourself, I just wanted to make a video that I thought was entertaining."
Even though the solution isn't one he actually deployed, for obvious reasons, the application of machine learning through intelligent camera systems is still impressive and, of course, entertaining.
More of Ryder's inventions and videos can be found on his YouTube channel.
#culture #technology #ai #aicamera #artificialintelligence #artificialintelligencecamera #objectrecognition #rydercalmdown