What Is Computer Vision?

Computer Vision (CV) gives machines the ability to interpret visual information: images, videos, 3D scans, and live streams. In 2026, CV is a mature technology with concrete business impact, not a promise for the future.

Image Classification

The Basic Problem

Image classification answers a simple question: "What's in this image?"

  • Binary classification: Good/bad, defective/OK, cat/dog
  • Multi-class: Product A / Product B / Product C / Unknown
  • Multi-label: An image can have multiple labels (e.g., "outdoor + sunset + mountains")
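The difference between multi-class and multi-label output can be sketched in a few lines. This is an illustrative example, not a production model: the scores, label names, and the 0.5 threshold are assumptions chosen for the demonstration.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def multi_class_label(scores: dict[str, float]) -> str:
    """Multi-class: exactly one label wins, the one with the highest score."""
    return max(scores, key=scores.get)


def multi_label_tags(logits: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Multi-label: each label is judged independently against a threshold."""
    return [name for name, z in logits.items() if sigmoid(z) >= threshold]


# Hypothetical model outputs, invented for this example.
product_scores = {"Product A": 0.7, "Product B": 0.2, "Unknown": 0.1}
scene_logits = {"outdoor": 2.1, "sunset": 0.4, "mountains": 1.3, "indoor": -3.0}

print(multi_class_label(product_scores))  # one label
print(multi_label_tags(scene_logits))     # zero or more labels
```

Binary classification is the special case with two classes, so either style can express it.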

How It Works

Modern image classification uses Convolutional Neural Networks (CNNs) and increasingly Vision Transformers (ViT):

  1. Feature extraction: The network learns to recognize visual features (edges, textures, shapes, objects)
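  2. Hierarchical abstraction: Early layers respond to edges and textures; deeper layers respond to object parts and whole objects
  3. Classification: A final softmax layer turns the network's raw scores into one probability per class (multi-label models typically use an independent sigmoid per label instead)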
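The classification step can be illustrated without a neural network library. The sketch below applies softmax to three raw scores; the class names and score values are made up for the example, and a real network would produce the scores from the extracted features.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable.
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores from the last layer of a classifier.
class_names = ["cat", "dog", "bird"]
logits = [2.0, 1.0, 0.1]

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)

for name, p in zip(class_names, probs):
    print(f"{name}: {p:.3f}")
print("Predicted class:", class_names[best])
```

The highest raw score always yields the highest probability; softmax only rescales the scores so they can be read as a distribution over classes.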

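State of the art 2026: Models like DINOv2, EVA-02, and SigLIP reach roughly 85–90% top-1 accuracy on ImageNet, depending on model size and evaluation protocol (zero-shot, linear probing, or full fine-tuning). For custom domains with a small number of visually distinct classes, fine-tuning a pretrained model on 100–500 labeled images is often enough to exceed 95% accuracy; harder or fine-grained tasks need more data.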
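Top-1 accuracy is the share of images for which the model's single most likely class matches the true label. A minimal sketch, with invented labels standing in for a real validation set:

```python
def top1_accuracy(predicted: list[str], actual: list[str]) -> float:
    """Fraction of samples whose top prediction equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one sample is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical predictions on five validation images.
predicted = ["cat", "dog", "dog", "bird", "cat"]
actual = ["cat", "dog", "cat", "bird", "cat"]

print(f"Top-1 accuracy: {top1_accuracy(predicted, actual):.0%}")
```

When evaluating a fine-tuned model, the accuracy should be measured on images that were not used for training.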

Business Applications

  • Product recognition: Automatic categorization in e-commerce
  • Damage detection: Automatically assess insurance claims
  • Medical imaging: Skin cancer screening, X-ray analysis
  • Agriculture: Detect plant diseases via drone imagery

Object Detection

Beyond Classification

Object detection answers: "What is in the image, and where?" It returns a bounding box, a class label, and a confidence score for each detected object.
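Bounding boxes are usually compared with Intersection over Union (IoU): the overlap area of two boxes divided by the area they cover together. The sketch below assumes boxes in `(x1, y1, x2, y2)` corner format; detection tools also use other conventions, such as center plus width and height.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero if the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical predicted box and ground-truth box.
predicted_box = (0.0, 0.0, 10.0, 10.0)
true_box = (5.0, 5.0, 15.0, 15.0)

print(f"IoU: {iou(predicted_box, true_box):.3f}")
```

An IoU of 1.0 means the boxes match exactly and 0.0 means they do not overlap. Detection benchmarks typically count a prediction as correct only above a chosen IoU threshold, 0.5 being a common choice.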

Algorithms:

  • YOLO (You Only Look Once): Single-pass, real-time detection; YOLOv9 and YOLOv10 are among the recent versions
  • DETR (Detection Transformer): Transformer-based, very accurate
  • RT-DETR (Real-Time DETR): A DETR variant designed to run at YOLO-like speeds

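Speed: The smallest YOLOv10 variants can process several hundred images per second (up to about 600) on modern GPU hardware, which makes them well suited to real-time applications. Larger, more accurate variants are slower.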
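A throughput figure is easier to plan with once it is converted into a per-image time budget. The back-of-the-envelope calculation below takes the 600 images per second quoted above and an assumed camera frame rate of 30 frames per second; it ignores decoding, data transfer, and batching overhead, so real deployments will land lower.

```python
def ms_per_image(images_per_second: float) -> float:
    """Average processing time per image in milliseconds."""
    return 1000.0 / images_per_second


def max_streams(images_per_second: float, camera_fps: float) -> int:
    """Upper bound on camera streams one GPU could serve, ignoring overhead."""
    return int(images_per_second // camera_fps)


throughput = 600.0  # images per second, the figure quoted in the text
camera_fps = 30.0   # assumed frame rate of each camera

print(f"Time per image: {ms_per_image(throughput):.2f} ms")
print(f"Upper bound on streams: {max_streams(throughput, camera_fps)}")
```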

Business Applications

  • Retail: Shelf monitoring (which products are missing?)
  • Logistics: Package counting and sorting
  • Security: People counting, access control
  • Automotive: Pedestrian, vehicle, and sign detection

Segmentation

Pixel-Level Recognition

Segmentation goes a step further than bounding boxes: every pixel is assigned to a class.

Types:

  • Semantic segmentation: Labels every pixel with a class (e.g., all "road" pixels) without separating individual objects
  • Instance segmentation: Distinguishes individual objects of the same class (Person 1, Person 2, Person 3)
  • Panoptic segmentation: Combines both, giving every pixel a class label and, for countable objects, an instance ID
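The three output types can be compared on a tiny hand-written image. This is a toy illustration with made-up masks of 3 × 6 pixels, where class 1 stands for "person"; real models produce such maps at full image resolution.

```python
from collections import Counter

# Semantic mask: one class ID per pixel (0 = background, 1 = person).
semantic = [
    [0, 1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
]

# Instance mask: one object ID per pixel (0 = no object, 1 and 2 = two people).
instance = [
    [0, 1, 1, 0, 0, 2],
    [0, 1, 1, 0, 0, 2],
    [0, 0, 0, 0, 0, 2],
]


def panoptic(semantic_mask, instance_mask):
    """Pair each pixel's class ID with its instance ID."""
    return [
        list(zip(class_row, instance_row))
        for class_row, instance_row in zip(semantic_mask, instance_mask)
    ]


def pixels_per_class(semantic_mask) -> Counter:
    """Count how many pixels belong to each class."""
    return Counter(class_id for row in semantic_mask for class_id in row)


def count_instances(panoptic_map, class_id: int) -> int:
    """Count distinct objects of one class."""
    ids = {inst for row in panoptic_map for cls, inst in row if cls == class_id and inst != 0}
    return len(ids)


panoptic_map = panoptic(semantic, instance)
print("Person pixels:", pixels_per_class(semantic)[1])
print("Number of people:", count_instances(panoptic_map, class_id=1))
```

The semantic mask alone can say how many pixels show a person, but only the instance information can say how many people there are.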

State of the art: SAM 2 (Segment Anything Model 2, Meta) is a foundation model for segmentation. Given a prompt such as a click or a box, it can segment almost any object in images and videos without task-specific training.

Business Applications

  • Medicine: Mark tumors pixel-precisely in MRI images
  • Autonomous driving: Separate road, sidewalk, obstacles
  • Manufacturing: Precisely locate defects on surfaces
  • Agriculture: Distinguish weeds from crops for precision spraying

Development and Deployment

The CV Stack 2026

  • Frameworks: PyTorch (dominant) and TensorFlow for training; ONNX as an exchange format for deployment
  • Platforms: Roboflow, Encord, V7 for labeling and training
  • Edge deployment: NVIDIA Jetson, Intel OpenVINO, Apple Core ML
  • Cloud APIs: Google Cloud Vision, Amazon Rekognition, Azure AI Vision

Key takeaway: Computer vision is no longer a research project. With pretrained models and modern tools, companies can build production-ready CV solutions in weeks — not years.

Quiz

Question 1 of 3

What distinguishes instance segmentation from semantic segmentation?