TurboSparse-LLM Performance: Outperforming Mixtral and Gemma with Extreme Sparsity

28 Feb 2026

Discover how ReLU-based intrinsic sparsity maintains accuracy with significant FLOPs reduction.

dReLU Sparsification: Recovering LLM Performance with 150B Token Pretraining

28 Feb 2026

Discover the high-quality pretraining datasets and mixture ratios used to achieve elite activation sparsity.

Sparse Activation in MoE Models: Extending ReLUfication to Mixture-of-Experts

27 Feb 2026

Discover how extending ReLUfication to Mixture-of-Experts models enables massive FLOP reductions.

dReLU Activation Function: Matching SwiGLU Performance with 90% Sparsity

27 Feb 2026

Achieve superior sparsity and lower validation perplexity without compromising model convergence or performance.
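To make the idea concrete, here is a minimal NumPy sketch contrasting a standard SwiGLU gated MLP with a dReLU variant that applies ReLU to both the gate and up projections, so a hidden unit is zeroed whenever either branch is negative. All names (`swiglu_ffn`, `drelu_ffn`, the toy dimensions) are illustrative assumptions, not the reference implementation.

```python
import numpy as np

def swiglu_ffn(x, w_gate, w_up, w_down):
    """Standard gated MLP: SiLU-gated, hidden activations are mostly dense."""
    silu = lambda z: z / (1.0 + np.exp(-z))
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

def drelu_ffn(x, w_gate, w_up, w_down):
    """dReLU variant: ReLU on BOTH the gate and up projections,
    so the elementwise product is exactly zero wherever either side is negative."""
    hidden = np.maximum(x @ w_gate, 0.0) * np.maximum(x @ w_up, 0.0)
    return hidden @ w_down

# Toy sizes for illustration only.
rng = np.random.default_rng(0)
d_model, d_hidden = 8, 256
x = rng.standard_normal((1, d_model))
w_gate = rng.standard_normal((d_model, d_hidden))
w_up = rng.standard_normal((d_model, d_hidden))
w_down = rng.standard_normal((d_hidden, d_model))

hidden = np.maximum(x @ w_gate, 0.0) * np.maximum(x @ w_up, 0.0)
sparsity = float((hidden == 0.0).mean())  # fraction of zeroed hidden units
```

With random Gaussian weights each branch is positive about half the time, so roughly three quarters of the hidden units zero out; sparse inference kernels can then skip the corresponding rows of the down projection. Trained models reportedly push this fraction much higher.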

Analyzing ReLUfication Limitations: Enhancing LLM Sparsity via Up Projection

27 Feb 2026

Learn why modifying the up projection component is key to achieving higher LLM activation sparsity.

Optimizing LLM Inference: Sparse Activation, MoE, and Gated-MLP Efficiency

27 Feb 2026

Explore advanced strategies for efficient LLM inference, including model compression, intrinsic activation sparsity, and Mixture-of-Experts (MoE) architectures.

TurboSparse-LLM: Accelerating Mixtral and Mistral Inference via dReLU Sparsity

27 Feb 2026

Discover how the dReLU activation function and high-quality data mixtures achieve 90% sparsity in Mistral and Mixtral models without losing performance.

Toto: Time Series Optimized Transformer for Observability

22 Oct 2025

Datadog introduces Toto, an AI model for time series forecasting that improves accuracy and reliability for observability workloads.

Toto AI Model Sets New Benchmark for Time Series Forecasting

22 Oct 2025

Discover how Toto achieved state-of-the-art results in AI forecasting, surpassing leading zero-shot and full-shot models.