The Transformer Principles Series is a three-volume graduate-level treatise that builds a complete mathematical and engineering understanding of modern AI systems, from the foundational attention mechanism to large language models and multimodal architectures.
Volume III, Multimodal AI Systems: Architectures, Training, and Applications, extends the Transformer paradigm beyond text to vision, audio, and video. It covers modality-specific encoders and tokenizers, cross-modal fusion and contrastive alignment (CLIP, SigLIP), diffusion and flow-matching generative models, vision-language architectures (ViT, LLaVA, Q-Former), text-to-image and text-to-video generation, speech and audio processing, efficient inference for multimodal models, long-context scaling, and reasoning agents that perceive and act across modalities.