Local LLM Inference Optimization

Name: Local LLM Inference Optimization
Brand: Independently published
SKU: 52120727
Price: 17.29 EUR
Availability: InStock
Author: Thomas O. Greene
ISBN: 9798258375193

A Comprehensive Guide to Quantization, Hardware Acceleration, and Efficient Private AI Deployment

Thomas O. Greene

Language: English
Binding: Paperback
Libristo code: 52120727
Publisher: Independently published, April 2026

New

17.29 € VAT included

In stock at our supplier. Shipping in 9-15 days.

Delivery to Austria

30-day return policy

Stop Renting Intelligence. Start Optimizing Your Own.
Do you want to run 70B parameter models on a single consumer GPU? Are you tired of high API costs, network latency, and the privacy risks of cloud-based AI?
The "Local LLM Revolution" is here, but running Large Language Models (LLMs) privately is only half the battle. To make them truly useful, you must master Inference Optimization.
In Local LLM Inference Optimization, you will move beyond basic "out-of-the-box" setups and dive into the high-performance engineering required to squeeze every drop of power from your hardware. Whether you are using NVIDIA CUDA, Apple Silicon (MLX), or AMD ROCm, this comprehensive guide provides the technical blueprint for the sovereign engineer.

What You Will Master:

The Quantization Deep-Dive: Learn to navigate the "Quantization Tax" using GGUF, EXL2, AWQ, and GPTQ. Move from FP32 to 4-bit and even 1.58-bit (BitNet) without losing the model's "mind."
Advanced Memory Management: Defeat "Out of Memory" (OOM) errors by mastering KV Cache Management, PagedAttention, and FlashAttention 2 & 3.
The Speed Multipliers: Double your Tokens Per Second (TPS) using Speculative Decoding, Continuous Batching, and Lookahead Heuristics.
Hardware Architecture: Architect high-performance local servers using Multi-GPU Pipeline Parallelism and CPU/GPU offloading strategies.
Context Window Expansion: Use RoPE Scaling, YaRN, and LongRoPE to push 8k models to 128k+ context on consumer hardware.
The Full Local Stack: Step-by-step guides for llama.cpp, Ollama, vLLM, and TGI (Text Generation Inference).
Security & Privacy: Deploy Air-Gapped AI environments and secure your infrastructure using Safetensors and local sandboxing.
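
As a rough illustration of why the quantization topics above matter for fitting a 70B model on one GPU, the sketch below estimates memory for the weights alone at several bit-widths. It is not taken from the book: the function name `weight_memory_gb` is hypothetical, and the estimate assumes every parameter is stored at the stated width. It ignores the KV cache, activations, and per-format overhead such as scales and zero points, so real model files are somewhat larger.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in gigabytes (10^9 bytes).

    Assumption: all parameters use the same bit-width and there is no
    format overhead, which real quantization formats do add.
    """
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Bit-widths named in the description above, applied to 70B parameters.
    for label, bits in [("FP32", 32), ("FP16", 16), ("8-bit", 8),
                        ("4-bit", 4), ("1.58-bit", 1.58)]:
        print(f"{label:>8}: {weight_memory_gb(70e9, bits):6.1f} GB")
```

Under these assumptions, 70B parameters need about 280 GB at FP32 and about 35 GB at 4-bit, which is why low-bit formats and CPU/GPU offloading come up together when targeting consumer hardware.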

Why This Book?
This book focuses on Deployment and Efficiency. It is written for the Lead Engineer, the Privacy-Conscious CTO, and the Prosumer Hobbyist who demands low Time to First Token (TTFT) and maximum Perf/Watt.
Stop paying for tokens. Own your weights. Optimize your future.

About the book

Full name: Local LLM Inference Optimization
Author: Thomas O. Greene
Language: English
Binding: Book - Paperback
Date of issue: 2026
Number of pages: 170
EAN: 9798258375193
Libristo code: 52120727
Publishers: Independently published
Weight: 237 g
Dimensions: 152 x 229 x 9 mm
