6.3.5 - What does the Future Hold?

Unified Architecture, Multimodal Data, Mixture of Experts

Perceiver: General Perception with Iterative Attention

The Perceiver is an architecture based on attentional principles that scales to high-dimensional inputs such as images, videos, audio, point clouds, and multimodal combinations without making domain-specific assumptions.
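The Perceiver's key trick is a cross-attention bottleneck: a small learned latent array queries the (possibly huge) input array, so attention cost grows linearly with input size rather than quadratically. A minimal numpy sketch of that cross-attention step (the array sizes and random projections here are illustrative assumptions, not the paper's actual configuration):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latents, inputs, d_k):
    """One Perceiver-style cross-attention step.

    latents: (N, D) small learned latent array (N << M)
    inputs:  (M, C) flattened high-dimensional input (pixels, audio samples, ...)
    Returns: (N, d_k) updated latents that have attended over all M inputs.
    """
    rng = np.random.default_rng(0)  # random projections stand in for learned weights
    Wq = rng.normal(size=(latents.shape[1], d_k)) / np.sqrt(latents.shape[1])
    Wk = rng.normal(size=(inputs.shape[1], d_k)) / np.sqrt(inputs.shape[1])
    Wv = rng.normal(size=(inputs.shape[1], d_k)) / np.sqrt(inputs.shape[1])
    Q, K, V = latents @ Wq, inputs @ Wk, inputs @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # (N, M): cost O(N*M), not O(M^2)
    return attn @ V
```

Because only the N latents ever attend to the M inputs, the same module can ingest an image, an audio clip, or a point cloud by simply flattening it into an (M, C) array; the Perceiver then applies this step iteratively, interleaved with self-attention over the latents.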

GPT-3 Scared You? Meet Wu Dao 2.0: A Monster of 1.75 Trillion Parameters

Wu Dao 2.0 holds the record of being the largest of all with a striking 1.75 trillion parameters (10x GPT-3).

Wu Dao 2.0 is multimodal. It can learn from text and images and tackle tasks that include both types of data (something GPT-3 can't do).

Wu Dao 2.0 was trained with FastMoE, a system similar to Google's Mixture of Experts (MoE).
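A Mixture of Experts layer reaches huge parameter counts cheaply because each token is routed to only a few "expert" sub-networks, so most parameters sit idle on any given forward pass. A minimal numpy sketch of top-k gating (the linear experts and the gating matrix here are simplifying assumptions; FastMoE and Google's MoE use full feed-forward experts and learned, load-balanced routers):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def moe_layer(x, experts, gate_w, k=2):
    """Route one token through the top-k experts.

    x:       (d,) token representation
    experts: list of (W, b) pairs, one tiny linear "expert" each
    gate_w:  (num_experts, d) gating matrix producing one logit per expert
    """
    logits = gate_w @ x                    # score every expert for this token
    top = np.argsort(logits)[-k:]          # keep only the k best-scoring experts
    weights = softmax(logits[top])         # renormalize over the chosen experts
    # Only k experts run; the rest contribute no compute for this token.
    return sum(w * (experts[i][0] @ x + experts[i][1]) for w, i in zip(weights, top))
```

With, say, 8 experts and k=2, total parameters scale with 8 experts while per-token compute scales with 2, which is how MoE models like Wu Dao 2.0 grow to trillions of parameters without a proportional increase in training cost.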