
TurboSparse: Elite Inference Speed via dReLU Sparsity
3 Mar 2026
Achieve 2-5x faster LLM decoding on RTX 4090 and mobile devices using TurboSparse. Experience 97% parameter sparsity without performance loss.

TurboSparse Inference Speedup: PowerInfer Integration for Real-Time LLM Decoding
3 Mar 2026
Learn how neuron-level predictor modules and expert routing enable practical inference acceleration for Mixtral-47B through PowerInfer framework integration.
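
To make the predictor idea concrete, here is a minimal PyTorch sketch of a neuron-level predictor gating a dReLU FFN step, in the spirit of PowerInfer's hot/cold neuron split. The class name, the low-rank size, and the thresholding below are illustrative assumptions, not PowerInfer's actual API.

```python
import torch
import torch.nn as nn

class NeuronPredictor(nn.Module):
    """Low-rank MLP that guesses which FFN neurons will fire for a token.
    Hypothetical sketch -- not PowerInfer's actual predictor class."""

    def __init__(self, hidden_size: int, intermediate_size: int, rank: int = 64):
        super().__init__()
        # A low-rank bottleneck keeps the predictor itself cheap to run.
        self.down = nn.Linear(hidden_size, rank, bias=False)
        self.up = nn.Linear(rank, intermediate_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # A positive logit means "this neuron is predicted non-zero after dReLU".
        return self.up(self.down(x))

def sparse_ffn_step(x, w_gate, w_up, w_down, predictor, threshold=0.0):
    """One gated-MLP step computed only over predicted-active neurons.
    Shapes: x (hidden,), w_gate/w_up (hidden, inter), w_down (inter, hidden)."""
    idx = (predictor(x) > threshold).nonzero(as_tuple=True)[0]
    # dReLU on the selected columns only: ReLU on both the gate and up paths.
    h = torch.relu(x @ w_gate[:, idx]) * torch.relu(x @ w_up[:, idx])
    return h @ w_down[idx, :]
```

Because only the gathered weight rows and columns participate in the computation, a runtime can keep frequently firing neurons in fast GPU memory and fetch the rest on demand.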

TurboSparse Efficiency: Achieving 97% Parameter Sparsity in Mixtral-47B
3 Mar 2026
Discover how TurboSparse-Mistral-7B and TurboSparse-Mixtral-47B leverage ReLUfication to reach up to 90% neuron inactivity, reducing active parameters to just 3% of the total.
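
For intuition on where the inactivity comes from, the sketch below (PyTorch; the function name is hypothetical) measures the fraction of intermediate neurons that dReLU zeroes out. Even random pre-activations give roughly 75% zeros, since a neuron survives only when both projections are positive; trained TurboSparse models push this toward the 90% the articles report.

```python
import torch

def drelu_sparsity(gate_pre: torch.Tensor, up_pre: torch.Tensor) -> float:
    """Fraction of intermediate neurons that dReLU zeroes out.
    A neuron survives only if BOTH pre-activations are positive."""
    active = (gate_pre > 0) & (up_pre > 0)
    return 1.0 - active.float().mean().item()

# Random pre-activations already give ~75% zeros; trained dReLU models
# concentrate activations further, toward the ~90% the articles report.
g, u = torch.randn(1024, 4096), torch.randn(1024, 4096)
print(f"sparsity ≈ {drelu_sparsity(g, u):.1%}")
```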

TurboSparse-LLM Performance: Outperforming Mixtral and Gemma with Extreme Sparsity
28 Feb 2026
Discover how ReLU-based intrinsic sparsity maintains accuracy while significantly reducing FLOPs.
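
A back-of-the-envelope count shows how neuron inactivity maps to compute savings. The sketch assumes a Mistral-7B-sized gated MLP (hidden size 4096, intermediate size 14336); the helper name is hypothetical, and attention FLOPs are ignored.

```python
def gated_mlp_flops(hidden: int, inter: int, active_fraction: float = 1.0) -> float:
    """Approximate per-token FLOPs for a gated MLP's three projections
    (gate, up, down), counting 2 FLOPs per multiply-accumulate."""
    return 3 * 2 * hidden * inter * active_fraction

dense = gated_mlp_flops(4096, 14336)                        # every neuron
sparse = gated_mlp_flops(4096, 14336, active_fraction=0.1)  # ~90% inactive
print(f"~{dense / sparse:.0f}x fewer FFN FLOPs per token")
```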

dReLU Sparsification: Recovering LLM Performance with 150B Token Pretraining
28 Feb 2026
Discover the high-quality pretraining datasets and mixture ratios used to achieve elite activation sparsity.
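
The article's actual dataset list and mixture ratios are not reproduced here; purely to show the mechanism, this sketch samples pretraining sources according to placeholder weights. Every dataset name and ratio below is a made-up stand-in.

```python
import random

# Placeholder names and ratios -- stand-ins, NOT the article's actual mixture.
MIXTURE = {"web": 0.6, "code": 0.2, "math": 0.1, "encyclopedic": 0.1}

def sample_sources(mixture: dict, n_docs: int, seed: int = 0) -> list:
    """Draw dataset names in proportion to their mixture ratios."""
    rng = random.Random(seed)
    names, weights = zip(*mixture.items())
    return rng.choices(names, weights=weights, k=n_docs)

print(sample_sources(MIXTURE, 10))
```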

Sparse Activation in MoE Models: Extending ReLUfication to Mixture-of-Experts
27 Feb 2026
Discover how extending ReLUfication to expert FFNs enables massive FLOP reductions in Mixture-of-Experts models.
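
To illustrate how activation sparsity composes with expert routing, here is a toy PyTorch Mixture-of-Experts layer whose experts use dReLU. The sizes, the naive per-token dispatch loop, and the class names are illustrative, not Mixtral's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DReLUExpert(nn.Module):
    """One expert FFN with dReLU: ReLU on both the gate and up projections."""
    def __init__(self, hidden: int, inter: int):
        super().__init__()
        self.gate = nn.Linear(hidden, inter, bias=False)
        self.up = nn.Linear(hidden, inter, bias=False)
        self.down = nn.Linear(inter, hidden, bias=False)

    def forward(self, x):
        return self.down(F.relu(self.gate(x)) * F.relu(self.up(x)))

class TopKDReLUMoE(nn.Module):
    """Top-k routing over dReLU experts: routing skips whole experts, and
    dReLU additionally zeroes most neurons inside the experts that do run."""
    def __init__(self, hidden=256, inter=1024, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(hidden, n_experts, bias=False)
        self.experts = nn.ModuleList(DReLUExpert(hidden, inter) for _ in range(n_experts))
        self.k = k

    def forward(self, x):                       # x: (tokens, hidden)
        weights, idx = self.router(x).softmax(-1).topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for t in range(x.size(0)):              # naive dispatch, clarity first
            for w, e in zip(weights[t], idx[t]):
                out[t] += w * self.experts[int(e)](x[t])
        return out
```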

dReLU Activation Function: Matching SwiGLU Performance with 90% Sparsity
27 Feb 2026
Achieve superior sparsity and lower validation perplexity without compromising model convergence or performance.
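
For reference, the two gated-MLP variants the article compares differ only in the activations applied before the element-wise product. A minimal sketch, with illustrative weight names:

```python
import torch
import torch.nn.functional as F

def swiglu_ffn(x, w_gate, w_up, w_down):
    """Standard SwiGLU: SiLU on the gate path, dense (unactivated) up path."""
    return (F.silu(x @ w_gate) * (x @ w_up)) @ w_down

def drelu_ffn(x, w_gate, w_up, w_down):
    """dReLU: ReLU on BOTH paths, so a neuron's output is exactly zero
    unless its gate and up pre-activations are both positive."""
    return (F.relu(x @ w_gate) * F.relu(x @ w_up)) @ w_down
```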

Analyzing ReLUfication Limitations: Enhancing LLM Sparsity via Up Projection
27 Feb 2026
Learn why modifying the up projection component is key to achieving higher LLM activation sparsity.
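
The limitation is easy to see numerically: applying ReLU to the gate alone caps the zero fraction near 50%, while also activating the up projection pushes it much higher. A quick check with random weights (all names and sizes illustrative):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(512, 1024)
w_gate, w_up = torch.randn(1024, 4096), torch.randn(1024, 4096)

relufied = F.relu(x @ w_gate) * (x @ w_up)      # ReLU on the gate only
drelu = F.relu(x @ w_gate) * F.relu(x @ w_up)   # ReLU on gate AND up

for name, h in [("gate-only ReLUfication", relufied), ("dReLU (gate + up)", drelu)]:
    print(f"{name}: {(h == 0).float().mean():.1%} zeros")
```

With random inputs this prints roughly 50% zeros for the gate-only variant versus roughly 75% for dReLU, which is why the up projection is the lever for higher sparsity.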

Optimizing LLM Inference: Sparse Activation, MoE, and Gated-MLP Efficiency
27 Feb 2026
Explore advanced strategies for efficient LLM inference, including model compression, intrinsic activation sparsity, and Mixture-of-Experts (MoE).