Practical Computer Vision with PyTorch

Offered by Antonio Rueda-Toicen

Practical Computer Vision with PyTorch is a comprehensive, hands-on course designed for developers and practitioners eager to explore computer vision using PyTorch. The course covers a wide spectrum of tasks, from image classification and object detection to segmentation and generative modeling. With a strong emphasis on implementation, participants will work through coding demos and build projects using industry-standard tools and libraries. By the end of the course, participants will have the skills to build, fine-tune, and deploy computer vision models for real-world applications.

May 7, 2025 - May 21, 2025
Language: English
Advanced, Big Data and AI, Data Science

Course information

Computer vision technologies are transforming industries, driving innovation across sectors such as healthcare, automotive, retail, and media. However, effectively applying these technologies requires deep practical knowledge. This course provides a comprehensive introduction to modern computer vision methodologies using PyTorch, beginning with foundational concepts like convolutional neural networks (CNNs), progressing to advanced architectures including Vision Transformers (ViT), and exploring cutting-edge vision-language models such as CLIP and Grounding DINO.

In addition to technical skills, the course will guide participants in best practices for evaluating and fine-tuning computer vision models, ensuring proficiency in both development and deployment.

Course Structure
Week 1: Fundamentals and Basic Techniques
  • CNNs, optimization, metrics, and embeddings
  • Code demos: Zero-shot inference, visualization of CNN operations, data augmentation
  • Practical challenges on Kaggle
Week 2: Advanced Techniques
  • Vision transformers, object detection, segmentation
  • Vision-language models (CLIP, Grounding DINO)
  • Generative models and diffusion techniques
  • Practical coding and advanced implementation demos

Scope

The Practical Computer Vision with PyTorch course runs for two weeks with a total workload of approximately 8-10 hours. It includes video lectures and interactive coding demonstrations, each accompanied by multiple-choice assessments. Practical Kaggle challenges are integrated into the course structure to reinforce learning through direct application.

All learning materials (videos, coding demonstrations, assessments) are available at the course start. Students are encouraged to engage actively in Kaggle Community Challenges, which serve as practical assessments to test their understanding and skills. Final assessments and community challenges are released at the end of the first week, giving learners one additional week to complete and submit their solutions.

Prerequisites
  • Intermediate AI/ML understanding
  • Proficiency in Python (writing classes/functions)

What you'll learn

  • Fundamental concepts of computer vision and its applications
  • Building and training neural networks (CNNs, transformers)
  • Loss functions and optimization techniques
  • Performance evaluation metrics
  • Transfer learning and feature extraction techniques
  • Data augmentation and dataset curation methods
  • Specialized architectures: Vision Transformers, Mask R-CNN, etc.
  • Vision-language models and generative modeling techniques (diffusion models, LoRA)

Who this course is for

  • Students with intermediate AI/ML knowledge
  • Practitioners interested in practical computer vision solutions
  • Developers exploring modern vision architectures (ViT, CLIP, etc.)
  • Hands-on learners comfortable with Python programming
  • AI engineers focusing on generative AI and transfer learning techniques

Enroll me for this course

The course is free. Just register for an account on openHPI and take the course!

Certificate Requirements

  • Gain a Record of Achievement by earning at least 50% of the maximum number of points from all graded assignments.
  • Gain a Confirmation of Participation by completing at least 50% of the course material.

Find out more in the certificate guidelines.

This course is offered by

Antonio Rueda-Toicen

Antonio Rueda-Toicen helps companies and individuals use artificial intelligence. He has experience developing and deploying machine learning models in both industry and academia. Currently, he is a researcher in the Artificial Intelligence and Intelligent Systems group at the Hasso Plattner Institut (HPI). He also works as an AI Engineer at Voxel51, where he leads workshops on practical computer vision skills. Antonio is a certified instructor of deep learning and generative models at NVIDIA's Deep Learning Institute.

Since 2019, Antonio has organized the Berlin Computer Vision Group meetup. He has delivered workshops to over 1,000 participants both in person and online. He mentors students at Berlin's Data Science Retreat, helping them transition into industry roles. He enjoys teaching computer vision, MLOps, and neural networks. As an engineer at HPI's AI Service Center, he co-founded the AI Maker Community to support open collaboration.

Antonio is pursuing a PhD at HPI. His focus is on vision-language models and representation learning. He holds degrees in computer science and bioengineering from Universidad Central de Venezuela. Antonio is passionate about making complex technology accessible and useful for everyone.