
TurboSparse-LLM Performance: Outperforming Mixtral and Gemma with Extreme Sparsity
28 Feb 2026
Discover how ReLU-based intrinsic sparsity maintains accuracy while significantly reducing FLOPs.

dReLU Sparsification: Recovering LLM Performance with 150B Token Pretraining
28 Feb 2026
Discover the high-quality pretraining datasets and mixture ratios used to achieve extreme activation sparsity.

Sparse Activation in MoE Models: Extending ReLUfication to Mixture-of-Experts
27 Feb 2026
Discover how extending ReLUfication to Mixture-of-Experts models enables massive FLOPs reductions.

dReLU Activation Function: Matching SwiGLU Performance with 90% Sparsity
27 Feb 2026
Achieve superior sparsity and lower validation perplexity without compromising model convergence or performance.

Analyzing ReLUfication Limitations: Enhancing LLM Sparsity via Up Projection
27 Feb 2026
Learn why modifying the up projection component is key to achieving higher LLM activation sparsity.

Optimizing LLM Inference: Sparse Activation, MoE, and Gated-MLP Efficiency
27 Feb 2026
Explore advanced strategies for efficient LLM inference, including model compression, intrinsic activation sparsity, and Mixture-of-Experts (MoE).

TurboSparse-LLM: Accelerating Mixtral and Mistral Inference via dReLU Sparsity
27 Feb 2026
Discover how the dReLU activation function and high-quality data mixtures achieve 90% sparsity in Mistral and Mixtral models without losing performance.

Toto: Time Series Optimized Transformer for Observability
22 Oct 2025
Datadog introduces Toto, an AI model for time series forecasting that improves accuracy, reliability, and responsible observability.

Toto AI Model Sets New Benchmark for Time Series Forecasting
22 Oct 2025
Discover how Toto achieved state-of-the-art results in AI forecasting, surpassing leading zero-shot and full-shot models.