AI and machine learning are providing a timely boost to the fields of computer vision and computer graphics, enabling computers to massively improve their visual understanding of the world, and to become vastly better at creating new virtual worlds and their depictions. Applications within the field include traditional digital media such as VFX and games, as well as computational design & fabrication, VR/AR/MR, visual understanding and reasoning, multi-modal learning, video analytics, controlled image/video generation, training simulators, context-aware devices and robots, sports analytics, and much more.
Visual computing is of significant importance to our local BC digital media ecosystem, which is experiencing dramatic growth and currently includes over 600 digital media (VFX, games, VR, etc.) companies with $2.3 billion in annual revenues.
At UBC we have developed specific strengths around:
- End-to-end architecture design and training of deep neural networks for a variety of visual tasks, including large-scale supervised settings as well as semi-supervised and weakly-supervised settings, where only a limited amount of labeled training data is available.
- Building AI agents capable of leveraging, modeling and learning from multi-modal (visual, lingual, audio, geometric & photometric) data.
- Leveraging AI in support of the design of shapes, motions and imagery, where the algorithms need to integrate a comprehensive mix of available data, human input to the creative process, and functional design criteria.
Modeling Skilled Movement for Humans, Animals, and Robots: Learning data-driven and/or physics-based models capable of producing high-quality, real-time animated or simulated movements, with applications to simulation & visualization, VFX, games, simulated robotic skills, ergonomics, and biomechanics.
Detecting and Transferring Style: Quantifying notions of style and aesthetics across diverse domains, including fashion, architecture, and interior design, via a combination of domain knowledge and learning from human perception studies.
Perception-Driven Shape Modeling: Modeling algorithms that mimic human perception of shape by combining active learning with Gestalt principles, supporting intuitive modeling and editing of 2D and 3D content.
Intelligent Animation Tools: Artist-driven toolsets that augment or replace the low-level processes that are currently a principal bottleneck in the production of VFX, games, and other graphics applications. New tools leverage learning-based approaches to support efficient creation and editing of motion, with desired styles and constraints.
Neural rendering: Efficient photorealistic rendering of skin and clothing by augmenting low-fidelity renders with learned details.
Self-supervised representation learning: Self-supervision leverages real but unlabelled images for disentangling the intrinsic scene geometry and appearance. The resulting structured, low-dimensional representations enable supervised training on datasets that are orders of magnitude smaller, thereby mitigating laborious manual annotation.
Mobile sensors: Deep learning has revolutionized the reconstruction of 3D geometry from 2D images. We explore alternative sensing modalities, such as disentangling echoes in time series recorded with acoustic cameras; integrating instantaneous measurements from event cameras, which report per-pixel brightness changes asynchronously; and combined sensing with inertial measurement units (IMUs).
Anomaly detection and re-identification: Deep neural networks are known to be overconfident when exposed to test examples that are dissimilar to the training set. This makes decision-making risky, and potentially fatal in safety-critical settings such as detecting unknown objects for autonomous driving or reacting to aggressive behavior in public places under surveillance. We explore how well a trained generative model can explain a new observation.
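As a toy illustration of scoring a new observation by how well a generative model explains it, the sketch below fits an independent Gaussian to a handful of feature vectors and flags inputs whose negative log-likelihood exceeds a threshold. The Gaussian stands in for a trained deep generative model, and the names `fit_gaussian`, `negative_log_likelihood`, and `is_anomaly`, the toy data, and the threshold rule are all assumptions made for illustration, not the group's method.

```python
import math

def fit_gaussian(samples):
    """Fit an independent (diagonal) Gaussian to a list of feature vectors."""
    n = len(samples)
    dim = len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dim)]
    # A small floor keeps every variance strictly positive.
    variances = [
        max(sum((s[d] - means[d]) ** 2 for s in samples) / n, 1e-6)
        for d in range(dim)
    ]
    return means, variances

def negative_log_likelihood(x, means, variances):
    """Score how poorly the model explains x (higher means less familiar)."""
    return sum(
        0.5 * math.log(2 * math.pi * v) + (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, means, variances)
    )

def is_anomaly(x, means, variances, threshold):
    """Flag x when the model explains it worse than the threshold allows."""
    return negative_log_likelihood(x, means, variances) > threshold

# Hypothetical "training set" clustered around the origin.
train = [(0.1, -0.2), (-0.3, 0.1), (0.2, 0.3), (0.0, -0.1), (-0.1, 0.0)]
means, variances = fit_gaussian(train)

# Illustrative threshold: the worst score observed on the training data.
threshold = max(negative_log_likelihood(x, means, variances) for x in train)

print(is_anomaly((0.0, 0.0), means, variances, threshold))
print(is_anomaly((5.0, 5.0), means, variances, threshold))
```

In practice the likelihood (or a reconstruction error) would come from a learned model, and the threshold would be chosen on held-out data rather than on the training set.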
Conditional generative modeling: Integration of convenient user control into high-resolution, expressive generative models (GANs and VAEs) capable of generating diverse but consistent image and video samples and scene layouts.
Multi-modal visual learning and reasoning: End-to-end architectures that can integrate various forms of spatial (image) and sequential (text, audio) information, and that can focus on the aspects of that information relevant to the task through neural attention and memory, are critical for perception and reasoning. We develop novel architectures and learning mechanisms for such generic AI uses, including visual captioning, question answering, and language grounding.
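To make the idea of focusing on task-relevant information concrete, here is a minimal sketch of scaled dot-product attention, one common form of neural attention. It assumes a query vector (for example, an embedding of a question) and a set of key/value vectors (for example, image-region features). The function names and toy vectors are hypothetical and are not taken from any specific architecture described here.

```python
import math

def softmax(scores):
    """Convert raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Return attention weights and the weighted average of the values."""
    scale = math.sqrt(len(query))
    # Similarity between the query and each key, scaled by sqrt(dimension).
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale
        for key in keys
    ]
    weights = softmax(scores)
    dim = len(values[0])
    context = [
        sum(w * v[d] for w, v in zip(weights, values))
        for d in range(dim)
    ]
    return weights, context

# Hypothetical toy inputs: one query, three "regions" with 1-D values.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0], [20.0], [30.0]]

weights, context = attend(query, keys, values)
print(weights, context)
```

The region whose key best matches the query receives the largest weight, so the resulting context vector is pulled toward that region's value.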
Digital Humans: Fast and accurate simulation of human bodies and clothing, using physics-based and data-driven models.
Areas of collaboration between sub-group members:
- Capture and reconstruction
- Analysis and processing of artist imagery
- Reinforcement learning
- Visual perception and understanding
- Digital Human Models