
dReLU Sparsification: High-Performance 90% Sparsity for Next-Gen LLMs

4 Mar 2026

Learn how dReLU sparsification makes large language models (LLMs) more accessible and environmentally friendly.


TurboSparse Mobile: 22x Faster Mixtral Inference on PowerInfer-2

4 Mar 2026

Learn how PowerInfer-2 leverages extreme sparsity for a 22.2x speedup over llama.cpp.


TurboSparse Inference: 4.6x Faster LLM Decoding via Hybrid GPU-CPU Computing

4 Mar 2026

Achieve up to 2.28x speedup on pure CPU and 4.64x in hybrid GPU-CPU environments compared to llama.cpp baselines.


TurboSparse: Elite Inference Speed via dReLU Sparsity

3 Mar 2026

Achieve 2-5x faster LLM decoding on RTX 4090 and mobile devices using TurboSparse. Experience 97% parameter sparsity without performance loss.


TurboSparse Inference Speedup: PowerInfer Integration for Real-Time LLM Decoding

3 Mar 2026

Learn how neuron-level predictor modules and expert routing enable practical inference acceleration for Mixtral-47B.


TurboSparse Efficiency: Achieving 97% Parameter Sparsity in Mixtral-47B

3 Mar 2026

Discover how TurboSparse-Mistral-7B and TurboSparse-Mixtral-47B leverage ReLUfication to reach up to 90% neuron inactivity, reducing active parameters to just 3%.


TurboSparse-LLM Performance: Outperforming Mixtral and Gemma with Extreme Sparsity

28 Feb 2026

Discover how ReLU-based intrinsic sparsity maintains accuracy while significantly reducing FLOPs.


dReLU Sparsification: Recovering LLM Performance with 150B Token Pretraining

28 Feb 2026

Discover the high-quality pretraining datasets and mixture ratios used to achieve elite activation sparsity.


Sparse Activation in MoE Models: Extending ReLUfication to Mixture-of-Experts

27 Feb 2026

Discover how extending ReLUfication to Mixture-of-Experts architectures enables massive FLOP reductions.