You throw a ball across the room and expect a toddler to catch it. Sounds simple, right? But long before the toddler decides to reach for it, the ball has moved past him. Most toddlers make their first catch, hugging the ball to their chest, by the age of four. It takes them a few years to master hand-eye coordination, reaction speed and motor skills. Now imagine asking a computer to do this. A computer has to see, describe and understand the scene in a tiny fraction of a second before it reacts. Recreating human vision is arguably one of the most difficult problems in computer science.
Computer Vision: Visual Understanding of the World
In 1966, AI pioneer Marvin Minsky instructed an undergraduate student to “connect a camera to a computer and have it describe what it sees.” The summer project never produced a working computer vision system; it was too ambitious for its time. More than 50 years later, we are still looking for answers. Why is vision so difficult?
In part, it is because vision is an inverse problem, as Richard Szeliski, research scientist and founding director of the Computational Photography group at Facebook, states in his book ‘Computer Vision: Algorithms and Applications.’ In an inverse problem, we try to recover unknown properties of a scene from images that do not contain enough information to pin down a single answer. Since many problems in computer vision are inverse problems, they require researchers to estimate unknown quantities from noisy input data. To make those estimates, researchers often rely on machine learning techniques that learn probabilistic models from large amounts of training data.
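To make the idea concrete, here is a minimal Python sketch of an inverse problem, not a method taken from Szeliski's book. It assumes a deliberately simple setup: a single unknown brightness value, Gaussian sensor noise, and a Gaussian prior fitted to training data. The function names learn_prior and estimate_brightness and all the numbers are hypothetical, chosen only for illustration.

```python
import random
import statistics


def learn_prior(training_values):
    """Fit a Gaussian prior (mean, variance) to example values seen in training."""
    mean = statistics.fmean(training_values)
    variance = statistics.pvariance(training_values, mu=mean)
    return mean, variance


def estimate_brightness(readings, noise_var, prior_mean, prior_var):
    """Posterior mean of the true brightness given noisy readings.

    Assumes Gaussian sensor noise and a Gaussian prior, so the estimate is a
    precision-weighted average of the prior mean and the sample mean.
    """
    n = len(readings)
    sample_mean = statistics.fmean(readings)
    prior_precision = 1.0 / prior_var   # how much we trust the learned prior
    data_precision = n / noise_var      # how much we trust the readings
    return (prior_precision * prior_mean + data_precision * sample_mean) / (
        prior_precision + data_precision
    )


if __name__ == "__main__":
    rng = random.Random(42)

    # Hypothetical "training data": brightness values from earlier scenes.
    training = [rng.gauss(0.5, 0.2) for _ in range(1000)]
    prior_mean, prior_var = learn_prior(training)

    # The quantity we never observe directly, plus five noisy sensor readings.
    true_brightness = 0.8
    noise_std = 0.3
    readings = [true_brightness + rng.gauss(0.0, noise_std) for _ in range(5)]

    estimate = estimate_brightness(readings, noise_std ** 2, prior_mean, prior_var)
    print(f"sample mean of readings: {statistics.fmean(readings):.3f}")
    print(f"estimate using prior:    {estimate:.3f}")
```

With only a few noisy readings the learned prior pulls the estimate toward what was typical in training; as more readings arrive, the data dominates. Real vision systems face the same trade-off across millions of unknowns rather than one.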
Computer Vision is a field of Artificial Intelligence that aims to give computers a visual understanding of the world. Today, computer vision is widely used for scene reconstruction, event detection, video tracking, object recognition, 3D pose estimation, motion estimation, learning, indexing, and image restoration.
Last year, Amazon expanded its line-up of pre-trained machine learning tools with the launch of Amazon Rekognition Video, which brings the same advanced computer vision techniques used by Amazon to developers. Amazon is already using the technology to analyze billions of images and videos daily.
Amazon Go, the brand’s unmanned store, uses Amazon Rekognition to track a person, even when they are not in the frame, using a proprietary technique called Skeleton Modelling. Skeleton Modelling, combined with sensors on the shelves, machine learning and the Amazon Go app, allows customers to pick what they want and simply leave the store, with the Go app knowing what they have taken with them.
Similarly, Apple uses advanced computer vision techniques to find and recognise the position of 2D images such as signs, posters, and artwork. ARKit can integrate these real-world images into AR experiences, such as filling a museum with interactive exhibits or bringing a movie poster to life. Facebook is using computer vision to get a deeper understanding of its vast library of digital images and videos.
Computer Vision: Applications for Marketing
The most discussed application of computer vision has to be self-driving cars. Early research predicts that autonomous cars will become hubs for marketing and advertising to a captive audience. But applications of computer vision are not limited to self-driving cars.
Leading social listening tools like Crimson Hexagon, Brandwatch and Talkwalker are using computer vision for social media image analysis. With over 3 billion images shared every day on social media, this new technology helps brands identify brand mentions, visual influencers and product issues, and even track sponsorship ROI. If you are interested in understanding how this technology works, you should read our earlier blog post on visual listening.
Other applications of computer vision include augmented reality, smarter online merchandising, real-world product and content discovery, and many more.
If you are interested in leveraging the capabilities of computer vision, you should start by reading about both Google Cloud Vision and Amazon Rekognition. Similarly, if you are interested in learning about deep learning frameworks, we recommend reading our article on ‘Getting started with Deep Learning Frameworks.’