Explainer Series #5 - Computer Vision - How AI Systems Identify and Categorise Images and Video

BIGPURPLECLOUDS PUBLICATIONS

Computer vision is a field of artificial intelligence (AI) that trains computers to interpret and understand visual data like images and videos. It is an extremely active area of research and development, with applications across many industries including self-driving cars, facial recognition, medical imaging, robotics, and more. In this blog post, we’ll explore some of the key techniques used in computer vision and how AI systems are able to identify and categorise images and video.

How Computer Vision Works

Computer vision relies on pattern recognition and deep learning algorithms to make sense of pixel data from images and video frames. The basic workflow involves feeding labelled image data into a neural network and training it to recognise the patterns and features that are relevant for a specific task. For example, a system designed to recognise dogs would be shown thousands of labelled images, both with and without dogs, during training so it can learn the visual patterns that characterise dogs (like fur, snouts, tails, etc.).
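
To make the train-then-predict workflow concrete, here is a minimal sketch using a single-neuron perceptron, which is far simpler than the deep networks used in practice. Everything in it is an assumption made for illustration: the tiny 3x3 greyscale "images", the "left"/"right" labels, and the function names are hypothetical.

```python
# Toy 3x3 greyscale images, flattened row by row into 9 pixel values (0 = dark, 1 = bright).
# Label +1 means the left column is bright, -1 means the right column is bright.
TRAINING_EXAMPLES = [
    ([1, 0, 0, 1, 0, 0, 1, 0, 0], 1),
    ([0.9, 0.1, 0, 0.8, 0, 0.1, 1, 0.2, 0], 1),
    ([0, 0, 1, 0, 0, 1, 0, 0, 1], -1),
    ([0.1, 0, 0.9, 0, 0.2, 0.8, 0, 0.1, 1], -1),
]


def predict(weights, bias, pixels):
    """Return +1 or -1 depending on the sign of the weighted pixel sum."""
    score = sum(w * p for w, p in zip(weights, pixels)) + bias
    return 1 if score > 0 else -1


def train_perceptron(examples, epochs=10, learning_rate=1.0):
    """Adjust one weight per pixel whenever a training image is misclassified."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for pixels, label in examples:
            if predict(weights, bias, pixels) != label:
                # Nudge the weights towards the correct answer for this image.
                weights = [w + learning_rate * label * p
                           for w, p in zip(weights, pixels)]
                bias += learning_rate * label
                mistakes += 1
        if mistakes == 0:  # every training image is classified correctly
            break
    return weights, bias


def classify(weights, bias, pixels):
    """Map the numeric prediction back to a human-readable label."""
    return "left" if predict(weights, bias, pixels) == 1 else "right"


if __name__ == "__main__":
    weights, bias = train_perceptron(TRAINING_EXAMPLES)
    unseen_image = [0.7, 0, 0.1, 0.9, 0.1, 0, 0.8, 0, 0.2]
    print(classify(weights, bias, unseen_image))
```

The point of the sketch is the shape of the loop: show labelled examples, compare the prediction with the label, and adjust the weights after each mistake. Deep networks follow the same idea with many more weights and a more sophisticated update rule.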

Some of the major components and techniques in computer vision include:

  • Image classification - Assigning an image to a specific class or label (e.g. dog, cat, table, car, etc.). This involves feeding image pixels into a neural network that has been trained on labelled image datasets.

  • Object detection - Identifying where objects are located in an image and drawing bounding boxes around them. Object detectors can both classify objects and locate their position in the image.

  • Image segmentation - Partitioning an image into multiple segments or objects. This allows the system to separate objects like cars and roads in a street scene image.

  • Facial recognition - A specialised application of computer vision that detects faces in images and video and matches them to known identities. Facial recognition systems extract visual features from facial regions such as the eyes, nose, and mouth.

  • Optical character recognition (OCR) - Identifying and extracting text from images and documents. OCR enables computers to read text from scanned documents.

  • Pose estimation - Detecting the position and orientation of objects or people, for example the locations of body joints, in 2D or 3D space. This is critical for applications like body tracking.

  • Image generation - Using generative adversarial networks (GANs) and other deep learning techniques to synthesise realistic images and videos.
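
To make the object detection item more concrete, here is a minimal sketch of intersection over union (IoU), a widely used measure of how closely a predicted bounding box matches a labelled one. The (x_min, y_min, x_max, y_max) box format and the function name are assumptions made for this illustration, not something prescribed by a particular detector.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is assumed to be a tuple (x_min, y_min, x_max, y_max).
    Returns a value between 0.0 (no overlap) and 1.0 (identical boxes).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping region (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0, 0, 2, 2)
    labelled = (1, 1, 3, 3)
    print(iou(predicted, labelled))
```

A detection is often counted as correct only when its IoU with the labelled box exceeds a chosen threshold; the threshold itself varies between benchmarks.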

To perform these tasks, deep convolutional neural networks (CNNs) are most commonly used in computer vision. CNNs analyse image pixels through a series of processing layers, each of which detects different visual features. Through extensive training on labelled image datasets, CNNs build up hierarchical representations of image content.
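
As a rough illustration of what a single convolutional layer computes, the sketch below slides a small filter over a tiny greyscale image and records how strongly each patch matches it. In a real CNN the filter values are learned during training; here a hand-written vertical-edge filter is assumed, and the function name conv2d_valid is hypothetical. Like most deep learning libraries, it does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (both lists of rows) without padding."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1

    output = []
    for top in range(out_h):
        row = []
        for left in range(out_w):
            # Multiply the patch under the kernel element by element and sum the result.
            total = 0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# A 4x5 image that is dark on the left and bright on the right.
IMAGE = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

# Responds strongly where brightness increases from left to right.
VERTICAL_EDGE_KERNEL = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

if __name__ == "__main__":
    for row in conv2d_valid(IMAGE, VERTICAL_EDGE_KERNEL):
        print(row)  # each row prints as [0, 3, 3]
```

The output is zero over the flat dark region and positive where the filter overlaps the dark-to-bright boundary. Stacking many such filters, layer upon layer, is what lets a CNN build up from edges to more complex shapes.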

Key Datasets for Image Recognition

The breakthroughs in computer vision over the past decade have largely been fuelled by the availability of large datasets of labelled images. Here are some of the most important public image datasets used to train vision AI systems:

  • ImageNet - A dataset with over 14 million hand-annotated images organised into more than 20,000 categories. ImageNet helped catalyse the current AI boom in computer vision.

  • MNIST - A database of 70,000 images of handwritten digits (60,000 for training and 10,000 for testing) commonly used to train image classification systems.

  • COCO - The Microsoft Common Objects in Context dataset contains over 200,000 labelled images of common objects in their natural context.

  • YouTube-8M - A large-scale labelled video dataset, originally released with around 8 million YouTube video IDs, whose labels cover thousands of classes.

  • Face recognition datasets - Databases like MegaFace, VGGFace2, and CelebA are used to train facial recognition systems.

  • Medical imaging datasets - Publicly available datasets of X-ray, MRI, and CT scan images are used to train systems that support medical diagnosis.

Real-World Applications of Computer Vision

With the rapid progress in algorithms and available training data, computer vision has become accurate enough on many tasks to enable impactful real-world applications, including:

  • Self-driving cars - Computer vision is the backbone of autonomous vehicle technology, allowing cars to interpret objects, lane markings, signs, and lights.

  • Security and surveillance - AI-enabled camera systems can identify people, objects, behaviours and activities. This has applications in monitoring public spaces or facilities.

  • Healthcare - AI is automating the analysis of medical images to detect disease, diagnose conditions, and support procedures like surgery.

  • Facial recognition - User verification, photo tagging, surveillance systems, and human-robot interaction all leverage computer vision for facial recognition.

  • Manufacturing - Computer vision guides robots, detects defects in products, and automates quality assurance in factories.

  • Retail - Amazon Go's checkout-free stores rely on computer vision, combined with other sensors, to track products and detect what shoppers take.

  • Augmented reality - AR overlays digital information on real-world images and video. Computer vision enables the integration of the real and digital.

  • Photo and video editing - AI techniques power intelligent editing apps that can apply filters, create animations, and automatically tag people.

Computer vision has become one of the most transformative applications of AI. It is revolutionising how machines perceive, interpret and interact with the visual world. While many open challenges and areas for improvement remain, the field has made remarkable progress, unlocking a wide array of useful applications that benefit consumers, businesses and society.

The Big Purple Clouds Team
