AI Inference Optimization Engineering

Name: AI Inference Optimization Engineering
Brand: Independently published
SKU: 52770465
Price: 11.79 EUR
Availability: InStock
Author: ChatVariety Team
ISBN: 9798199720021

Quantization, Speculative Decoding, and Hardware-Specific LLM Deployment

ChatVariety Team

Language: English
Binding: Paperback
Libristo code: 52770465
Publisher: Independently published, June 2026

11.79 € VAT included

Expected in stock: 07. 06. 2026

Slash LLM Deployment Costs and Latency

Deploying Large Language Models (LLMs) in production is a massive economic and engineering hurdle. AI Inference Optimization Engineering is your comprehensive, hands-on guide to mastering the full stack of modern LLM optimization techniques. From memory-bandwidth solutions to hardware-specific compilation, this book bridges the gap between research-level models and enterprise-grade execution.

What you will master inside this book:

Hardware-Aware Optimization: Dive deep into KV cache mechanics, autoregressive decoding, and GPU memory hierarchies to eliminate latency bottlenecks.
State-of-the-Art Quantization: Apply GPTQ and AWQ quantization and the GGUF model format to scale down massive neural networks while limiting the loss in model accuracy.
Advanced Acceleration Methods: Implement speculative decoding (including draft-head approaches such as Medusa and EAGLE), PagedAttention, and FlashAttention to boost throughput by 2-3x.
Production-Grade Serving: Build ultra-low-latency deployment infrastructures using vLLM, Triton Inference Server, and continuous batching.
Cross-Platform Deployment: Optimize models for specific target hardware, including NVIDIA H100 (TensorRT-LLM), Apple Silicon (llama.cpp/Metal), and Qualcomm mobile/edge accelerators.

Whether you are an ML infrastructure engineer, an AI platform architect, or a technical leader looking to scale LLMs cost-effectively, this book provides the production-ready code, equations, and architectural patterns you need to build hyper-efficient AI pipelines.

About the book

Full name: AI Inference Optimization Engineering
Author: ChatVariety Team
Language: English
Binding: Book - Paperback
Date of issue: 2026
Number of pages: 96
EAN: 9798199720021
Libristo code: 52770465
Publishers: Independently published
Weight: 142
Dimensions: 152 x 229 x 5
