Rebellions Acquires SqueezeBits to Complete the AI Inference Cycle

Rebellions has announced the acquisition of SqueezeBits, a South Korean startup specializing in inference optimization and model compression for artificial intelligence. The deal underscores an increasingly evident trend in the AI infrastructure market: manufacturing chips alone is no longer enough. To compete in real-world deployments, it’s also necessary to control the software that enables models to run quickly, cost-effectively, and reliably.

The Seoul-based company aims to evolve from an NPU manufacturer into an end-to-end AI infrastructure provider. By acquiring SqueezeBits, Rebellions brings in-house a set of capabilities that a close partner had provided until now: model optimization, computational load reduction, serving software, and the adaptation of open frameworks such as vLLM to NPU-based environments. According to Rebellions, the two companies have collaborated since 2024 on model compression technologies and dedicated software for its chips.

The core message is clear. The battle for enterprise AI is shifting toward inference, the moment when a model responds to a real request from a user, application, or system. That is where per-query cost, latency, power consumption, and the ability to scale a service without runaway expenses are decided.

From chips to complete systems

Rebellions does not want to position itself solely as a semiconductor company. After acquiring SqueezeBits, its approach now combines NPU hardware, software optimization, and inference serving on a single platform. This means covering more stages of the process: from the moment a request enters the system to when the model executes and returns a response.

The move aligns with a broader trend. In generative AI, raw chip performance matters, but it does not determine the outcome on its own. An accelerator can look efficient on paper and still underperform if the model isn't well adapted, if the serving layer doesn't make good use of the hardware, if memory management is poor, or if the software stack forces developers to rewrite too much code.

SqueezeBits brings precisely that knowledge. Founded in March 2022, the company focuses on model compression and optimization to reduce deployment and operational costs of AI services. Rebellions highlights that the startup has collaborated with global hardware companies like Intel and NVIDIA, and has developed technologies to accelerate models and cut costs across various computing environments.

Part of the infrastructure | What it adds after the acquisition
Rebellions NPU | Specialized hardware for AI inference
Model compression | Lower memory and compute requirements
Inference optimization | Lower latency and better accelerator utilization
Serving | Running models in production
vLLM and open frameworks | Lower barriers for developers
Full-stack integration | Less manual effort in real-world deployments

The key word here is integration. Many companies seek alternatives to GPUs for inference but avoid a difficult migration. If an NPU demands too many changes in code, tools, deployment, and observability, its cost or power advantages may stay theoretical. That’s why software is as critical as silicon.

vLLM, PyTorch, and the importance of maintaining developer flow

Rebellions and SqueezeBits have previously collaborated within the Korean developer community through workshops focused on vLLM, an open inference framework widely used for serving large language models. In a technical summary published by SqueezeBits, both companies explained that the exercises ran on Rebellions’ ATOM-MAX NPU servers, with Kubernetes as the infrastructure layer and workflows based on PyTorch, Optimum, and vLLM.

That detail matters more than it seems. Adopting new accelerators isn’t just about benchmarks. It depends on whether teams can utilize familiar tools, maintain their deployment patterns, and achieve improvements without overhauling their entire platform. SqueezeBits noted in that workshop that the vLLM-RBLN plugin allowed users to preserve the usual GPU code flow with minimal changes.

For Rebellions, acquiring SqueezeBits places that layer at the core of its product. It is not just a talent acquisition; it is a way to bridge the gap between specialized hardware and real-world applications. In a market dominated by NVIDIA, alternatives need more than efficient chips: developers must be able to run models comfortably, customers need clear support, and performance has to hold up consistently in production.

A piece in Korea’s sovereign AI strategy

The acquisition also has an industrial reading. Rebellions has become one of South Korea’s most visible bets to build a local AI infrastructure ecosystem. In December 2024, it completed its merger with SAPEON Korea, an operation the company described as the birth of Korea’s first AI chip unicorn under the Rebellions brand. That integration combined capabilities from two domestic semiconductor companies and strengthened its international ambitions.

Now, with SqueezeBits, the company broadens its focus. It is no longer only about consolidating chipmakers but also about bringing in inference and optimization software. Rebellions frames the acquisition as part of building a sovereign AI infrastructure and notes that in March 2026 it was selected as the first direct investment of the National Growth Fund, part of Korea’s effort to create a “K-NVIDIA.”

The comparison to NVIDIA shouldn’t be taken literally. NVIDIA’s leadership rests not only on GPUs but also on CUDA, libraries, networking, complete systems, inference software, developer support, and a vast partner ecosystem. If Korea aims to build a local player with real potential, it will need a similar combination, even if on a smaller scale: chips, software, tools, community, systems, and use cases.

SqueezeBits could play a key role here. Inference optimization is one of the layers where competitiveness is won or lost. A model that consumes less memory, responds faster, and makes better use of the hardware gives Rebellions a stronger sales argument than technological sovereignty alone.

Inference becomes the core business focus

This acquisition reflects a growing industry trend: the next phase of AI will be measured not just by training larger models but by executing them cost-effectively. Every enterprise chatbot, agent, internal search engine, copilot, analysis tool, or automated support system relies on continuous inference. This makes operational cost a central issue.

In this context, model compression, quantization, efficient memory management, batching, caching, and serving are no longer just technical details. They are the difference between a viable service and one that’s too expensive to scale.
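Quantization, one of the techniques named above, is the easiest to illustrate. The Python sketch below is a generic example, not the method SqueezeBits or Rebellions actually uses: it applies symmetric per-tensor int8 quantization to a small, made-up list of weights, storing each value in 1 byte instead of the 4 bytes of FP32, at the cost of a rounding error of at most half the scale factor. The functions `quantize_int8` and `dequantize_int8` are hypothetical names chosen for illustration; production toolchains often quantize per channel or per group and calibrate on real data.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (illustrative): w ≈ scale * q."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to 127; guard against an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round each weight to the nearest integer step and clamp to [-127, 127].
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float values from the int8 codes."""
    return [scale * v for v in q]


# Hypothetical weights; FP32 storage uses 4 bytes per value.
weights = array("f", [0.12, -0.5, 0.33, 0.9, -0.07, 0.0, -0.91, 0.44])
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

fp32_bytes = weights.itemsize * len(weights)  # 8 values * 4 bytes = 32
int8_bytes = q.itemsize * len(q)              # 8 values * 1 byte = 8
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print(f"FP32: {fp32_bytes} bytes, int8: {int8_bytes} bytes")
print(f"max reconstruction error: {max_err:.4f} (bound: {scale / 2:.4f})")
```

The same 4x ratio applies to whole models: the weights of a 7-billion-parameter model take about 28 GB in FP32 and about 7 GB in int8, before counting activations or the KV cache.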

Rebellions aims to position itself precisely there. Its goal is for customers to use NPUs for inference infrastructure without facing overly complex optimization layers. The SqueezeBits acquisition is part of its effort to reduce that technical and commercial friction.

The deal alone doesn’t guarantee that Rebellions can compete with industry giants. NVIDIA, AMD, Intel, Google, AWS, Huawei, and others have well-established resources, clients, and platforms. However, it demonstrates that the South Korean company understands a critical point: in AI, hardware without software falls short, and in inference, efficiency only matters if it reaches production.

Frequently Asked Questions

What did Rebellions acquire?
Rebellions acquired SqueezeBits, a startup focused on inference optimization, model compression, and software for more efficient AI deployment.

Why is this acquisition important?
Because it enables Rebellions to integrate NPU hardware, software optimization, and inference serving into a unified platform, simplifying deployment for customers.

What is inference in AI?
Inference is the phase where a trained model receives a request, processes it, and produces a response. Because it runs on every request, it becomes a major and continuous cost once AI services operate at scale.

What was the relationship between Rebellions and SqueezeBits before the purchase?
Since 2024, they have collaborated on model compression, software for Rebellions’ NPUs, and developer activities centered on vLLM.

What does “K-NVIDIA” mean?
It describes South Korea’s ambition to create a national champion in AI infrastructure capable of competing in chips, software, and systems for AI.

via: rebellions.ai
