As LLMs move into production, #Observability is essential for reliability, performance, and responsible AI.

Learn how to deploy an #opensource observability stack on Kubernetes - using Prometheus, Grafana, Tempo, and OpenTelemetry Collectors - and monitor real #AI workloads with #vLLM & #LlamaStack.

🎥 Watch the #InfoQ video (#transcript included): https://bit.ly/4hlKoDa

#Prometheus #Grafana #OpenTelemetry #Kubernetes
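As a concrete taste of what the stack above scrapes: Prometheus collects metrics in a plain-text exposition format served from an HTTP endpoint such as /metrics. The sketch below renders a few gauges in that format using only the standard library; the "vllm:"-prefixed metric names and their values are illustrative assumptions modeled on vLLM's metric naming style, not a definitive list.

```python
# Sketch: render LLM-serving gauges in the Prometheus text exposition format,
# the plain-text format a Prometheus server scrapes from a /metrics endpoint.
# Metric names/values below are illustrative assumptions in vLLM's naming style.
def render_metrics(gauges: dict) -> str:
    lines = []
    for name, (help_text, value) in gauges.items():
        lines.append(f"# HELP {name} {help_text}")  # human-readable description
        lines.append(f"# TYPE {name} gauge")        # metric type declaration
        lines.append(f"{name} {value}")             # the sample itself
    return "\n".join(lines) + "\n"

page = render_metrics({
    "vllm:num_requests_running": ("Requests currently being processed.", 3),
    "vllm:gpu_cache_usage_perc": ("KV-cache utilization (0-1).", 0.42),
})
print(page)
```

In a real deployment you would not hand-roll this format; an instrumentation library (or vLLM's built-in metrics endpoint) produces it, and the OpenTelemetry Collector or Prometheus scrapes it on a schedule.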

3 things to know about Red Hat AI 3

YouTube
Red Hat Brings Distributed AI Inference to Production AI Workloads with Red Hat AI 3

Introducing vLLM Inference Provider in Llama Stack

We are excited to announce that the vLLM inference provider is now available in Llama Stack, through a collaboration between the Red Hat AI Engineering team and the Llama Stack team at Meta. This article introduces the integration and provides a tutorial to help you get started using it locally or deploying it in a Kubernetes cluster.
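Because vLLM serves an OpenAI-compatible HTTP API, a locally running `vllm serve` instance can be reached at /v1/chat/completions. The stdlib-only sketch below builds such a request; the base URL and model name are placeholder assumptions for a local deployment, not values mandated by Llama Stack.

```python
import json
import urllib.request

# Sketch: build a chat-completion request against vLLM's OpenAI-compatible
# endpoint. The base URL and model name are placeholder assumptions for a
# local `vllm serve` instance.
def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("http://localhost:8000",
                         "meta-llama/Llama-3.1-8B-Instruct",
                         "Hello!")
# Sending it (requires a running server): urllib.request.urlopen(req)
```

When the same model is registered as an inference provider in Llama Stack, the stack routes its inference API calls to this vLLM endpoint instead of your client talking to it directly.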

vLLM Blog

🦙 #LlamaStack: Standardizing #GenerativeAI Development

Defines open API specs for #AI application building blocks

Covers full lifecycle: model training, evaluation, production deployment

Includes APIs for inference, safety, memory, agents, and more

Supports multiple environments: local, hosted, and on-device

🛠️ Features:

#OpenSource API providers and distributions

Mix-and-match capabilities (e.g., local small models, cloud-based large models)

Consistent APIs across platforms (server, mobile, etc.)

🤝 Supported implementations:

API Providers: #Meta Reference, #Fireworks, #AWS Bedrock, #Together, #Ollama, TGI, #Chroma, PG Vector, #PyTorch ExecuTorch

Distributions: Meta Reference, Dell-TGI

📦 Easy installation via pip or from source

🖥️ Includes the 'llama' CLI for managing distributions, models, and more

Learn more: https://github.com/meta-llama/llama-stack
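The "mix-and-match" idea above can be sketched as a simple routing decision: send short prompts to a small local model and longer ones to a larger hosted model. The provider/model names and the length threshold below are illustrative assumptions, not part of the Llama Stack API.

```python
# Sketch of mix-and-match routing: small local model for short prompts,
# larger cloud-hosted model otherwise. Names and threshold are
# illustrative assumptions, not Llama Stack API values.
def pick_provider(prompt: str, threshold: int = 200) -> str:
    if len(prompt) <= threshold:
        return "local/llama-3.2-1b"    # small on-device/local model
    return "hosted/llama-3.1-70b"      # larger cloud-hosted model

print(pick_provider("short question"))
```

Because Llama Stack exposes the same inference API across providers, swapping the target model is a configuration change rather than a code rewrite.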
