Raspberry Pi Wants Generative AI to Stop Relying on the Cloud: Meet the New AI HAT+ 2 for Raspberry Pi 5

For years, “AI at the edge” on Raspberry Pi has mainly been associated with computer vision: detecting objects, estimating poses, or segmenting scenes using a camera and a dedicated accelerator. But in 2026, the conversation has shifted. The popularity of generative models, especially large language models (LLMs) and vision-language models (VLMs), has moved expectations from “recognize what it sees” to “understand, respond, and assist.” That leap, until now, has usually required cloud services, with their recurring costs and privacy implications.

In this context, Raspberry Pi has announced the Raspberry Pi AI HAT+ 2, a new expansion board for Raspberry Pi 5 designed specifically to fill the gap for local generative AI: running inference offline, with low latency, and without subscribing to external APIs. The key change from earlier generations is a new accelerator that offers greater inference power geared toward GenAI and, most notably, dedicated onboard memory.

A shift in focus: from “accelerated vision” to “local generative AI”

The company recalls that the original AI HAT+ was released just over a year ago as an add-on for Raspberry Pi 5, featuring Hailo-8 (26 TOPS) and Hailo-8L (13 TOPS) accelerators, designed to keep all processing “at home,” within the device itself. This design allowed for solutions with increased privacy and independence from cloud services but was optimized for vision neural networks, not for the surge of generative models.

The AI HAT+ 2 is built specifically for this new landscape. It features the Hailo-10H and promises 40 TOPS (INT4) of inference performance, aiming to make generative AI workloads practical on a Raspberry Pi 5. The other crucial addition is 8 GB of dedicated RAM on the board itself, designed to hold larger models than typical edge devices can manage and to relieve some of the memory pressure on the Raspberry Pi. Local execution, with no network connection required, preserves the original goals: low latency, privacy, and cost control compared with per-call API usage.

At the same time, Raspberry Pi emphasizes that the new HAT does not sacrifice its vision capabilities: for models like YOLO, pose estimation, or scene segmentation, performance remains comparable to the AI HAT+ with 26 TOPS, partly thanks to the onboard RAM. Practical compatibility is also preserved: the AI HAT+ 2 continues to integrate with the camera stack (libcamera, rpicam-apps, Picamera2), ensuring that existing users within that ecosystem won’t face disruptive changes.
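
For readers already working within that camera stack, a minimal Picamera2 capture loop illustrates where an accelerator pipeline would plug in. This is a sketch: the capture calls are standard Picamera2 API, but the inference hand-off is only indicated by a comment, since the article does not detail the Hailo runtime interface.

```python
# Minimal Picamera2 capture sketch. The Hailo inference step is a placeholder:
# the exact runtime API for the AI HAT+ 2 is not described in the article.
from picamera2 import Picamera2

picam2 = Picamera2()
# A modest preview-sized RGB stream; vision/VLM pipelines typically consume
# frames of this order rather than full-resolution stills.
config = picam2.create_preview_configuration(
    main={"size": (640, 480), "format": "RGB888"}
)
picam2.configure(config)
picam2.start()

frame = picam2.capture_array()  # NumPy array of shape (height, width, 3)
print(frame.shape)

# A real pipeline would hand `frame` to the accelerator here, e.g. via
# rpicam-apps' Hailo post-processing stages or a Python inference wrapper.
picam2.stop()
```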

Quick comparison table: AI HAT+ vs AI HAT+ 2

Feature | Raspberry Pi AI HAT+ | Raspberry Pi AI HAT+ 2
--- | --- | ---
Accelerator | Hailo-8 / Hailo-8L | Hailo-10H
Inference performance | 26 TOPS / 13 TOPS | 40 TOPS (INT4)
Main focus | Vision models | LLMs, VLMs, and GenAI (without losing vision)
Dedicated onboard RAM | None | 8 GB
Camera integration | Yes | Yes (libcamera, rpicam-apps, Picamera2)
Price announced | N/A | $130

Models available at launch

To concretize the concept, Raspberry Pi details that several LLMs will be available for installation at launch, optimized for edge environments. The initial list includes:

  • DeepSeek-R1-Distill (1.5 billion parameters)
  • Llama 3.2 (1 billion)
  • Qwen 2.5-Coder (1.5 billion)
  • Qwen 2.5-Instruct (1.5 billion)
  • Qwen 2 (1.5 billion)

The company also anticipates more models and larger sizes will become available through future updates.

In demonstrations, Raspberry Pi pairs an LLM backend called hailo-ollama with Open WebUI, a browser-based frontend resembling familiar chat interfaces, so the system can be operated from a browser like a traditional assistant. Examples include general question answering, code-oriented programming tasks, simple translation, and VLM scene description from a real-time camera feed, all running locally on a Raspberry Pi 5 with the AI HAT+ 2.
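
Since hailo-ollama is presented as an Ollama-style backend, client interaction can be sketched against the standard Ollama REST API. That compatibility is an assumption based on the name: the host, port, and model tag below are placeholders and may differ in the actual setup.

```python
# Hypothetical client for an Ollama-compatible endpoint. The standard Ollama
# REST API is assumed; whether hailo-ollama exposes it identically is unconfirmed.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama port (assumption)

resp = requests.post(
    OLLAMA_URL,
    json={
        "model": "llama3.2:1b",  # placeholder model tag
        "prompt": "Summarize the benefits of on-device inference in two sentences.",
        "stream": False,         # return the full answer as one JSON object
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```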

The true edge limit: small models, but adaptable

Raspberry Pi underscores a reality often overlooked: cloud-based LLMs operate at a vastly different scale. According to the company, cloud models from players like OpenAI, Meta, or Anthropic range between 500 billion and 2 trillion parameters. In contrast, models intended for the AI HAT+ 2 typically fall between 1 billion and 7 billion parameters, to match available memory.
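
A back-of-envelope calculation (ours, not Raspberry Pi's) shows why that 1- to 7-billion-parameter range lines up with the 8 GB of onboard RAM: at INT4 quantization each parameter occupies half a byte, and the sketch below deliberately ignores KV-cache and runtime overhead, so real headroom is smaller.

```python
# Back-of-envelope INT4 weight footprints (0.5 bytes per parameter).
# KV cache and activation memory are ignored here.
GIB = 1024**3

for params in (1e9, 1.5e9, 7e9):
    weight_bytes = params * 0.5  # 4 bits = 0.5 bytes per parameter
    print(f"{params / 1e9:>4.1f}B params -> {weight_bytes / GIB:.2f} GiB of weights")

# ~0.47, ~0.70, and ~3.26 GiB respectively: even a 7B model's INT4 weights
# fit comfortably inside the HAT's 8 GB of dedicated RAM.
```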

This creates an inevitable trade-off: these “compact” models are not designed for broad knowledge coverage but are optimized for a narrower dataset and goal. The strategy to compensate is personalization: the AI HAT+ 2 supports fine-tuning with LoRA (Low-Rank Adaptation) and uses the Hailo Dataflow Compiler to compile adapters for specialized tasks such as translation, speech-to-text, or scene analysis. In practice, this brings the product closer to an “industrial,” final-product approach: less promise of broad AI, more emphasis on use-case-specific optimization.
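
To illustrate what the LoRA side of that workflow can look like, here is a generic sketch using the Hugging Face PEFT library. It shows only adapter creation on a host machine; the Hailo Dataflow Compiler step the article mentions is vendor-specific and not reproduced here, and the model ID and hyperparameters are illustrative choices, not Raspberry Pi's documented recipe.

```python
# Generic LoRA setup with Hugging Face PEFT. Compiling the resulting adapter
# for the Hailo-10H (via the Hailo Dataflow Compiler) is a separate,
# vendor-specific step not shown here.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "Qwen/Qwen2.5-1.5B-Instruct"  # a launch-sized model, chosen for illustration
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

lora_cfg = LoraConfig(
    r=8,                                  # low-rank dimension: small, cheap adapters
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections, a common choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()        # typically well under 1% of the base model

# ...train on a task-specific dataset (translation, speech transcripts, etc.)...
model.save_pretrained("qwen2.5-lora-adapter")  # saves adapter weights only
```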

Availability, assembly, and use cases

The Raspberry Pi AI HAT+ 2 is priced at $130 and is explicitly designed for Raspberry Pi 5, partly because it communicates over PCI Express in addition to GPIO. The technical documentation also mentions an optional heatsink recommended for intensive workloads, and the company commits to long-term availability: the board will remain in production until at least January 2036.

Regarding applications, the “GenAI without cloud” approach applies to scenarios where connectivity isn’t always feasible or desirable: offline process control, secure data analysis, facility management, and robotics, among others. These are not coincidental examples—they are precisely the sectors where cloud call costs, latency, and data sensitivity can turn a promising prototype into an unviable solution.


Frequently Asked Questions

What is the purpose of the Raspberry Pi AI HAT+ 2 if the original AI HAT+ already exists?
The original AI HAT+ was optimized for vision models (object detection, segmentation, etc.). The AI HAT+ 2 adds an accelerator designed for generative AI and includes 8 GB of dedicated RAM, enabling local execution of LLMs and VLMs on Raspberry Pi 5.

Which language models can be installed initially, and what sizes do they have?
At launch, models like DeepSeek-R1-Distill (1.5 billion parameters), Llama 3.2 (1 billion), and several Qwen variants (1.5 billion) will be available, with plans to add larger models later.

What are the advantages of running a local assistant on Raspberry Pi versus using a cloud API?
Primarily, data privacy, lower latency, offline operation without relying on internet connectivity, and cost control (no payments per use or dependence on third-party providers). This is especially critical in industrial, educational, or IoT environments.

Can models be “personalized” for specific tasks with the AI HAT+ 2?
Yes. Raspberry Pi supports LoRA for fine-tuning language models and the Hailo Dataflow Compiler to compile adapters and run models tailored for particular needs (e.g., translation, speech-to-text, scene analysis).

via: raspberrypi
