
dReLU Activation Function: Matching SwiGLU Performance with 90% Sparsity
27 Feb 2026
Achieve superior sparsity and lower validation perplexity without compromising model convergence or performance.

Analyzing ReLUfication Limitations: Enhancing LLM Sparsity via Up Projection
27 Feb 2026
Learn why modifying the up projection component is key to achieving higher LLM activation sparsity.

Optimizing LLM Inference: Sparse Activation, MoE, and Gated-MLP Efficiency
27 Feb 2026
Explore advanced strategies for efficient LLM inference, including model compression, intrinsic activation sparsity, and Mixture-of-Experts (MoE).

TurboSparse-LLM: Accelerating Mixtral and Mistral Inference via dReLU Sparsity
27 Feb 2026
Discover how the dReLU activation function and high-quality data mixtures achieve 90% sparsity in Mistral and Mixtral models without losing performance.

Toto: Time Series Optimized Transformer for Observability
22 Oct 2025
Datadog introduces Toto, an AI model for time series forecasting that boosts accuracy, reliability, and responsiveness in observability.

Toto AI Model Sets New Benchmark for Time Series Forecasting
22 Oct 2025
Discover how Toto achieved state-of-the-art results in AI forecasting, surpassing leading zero-shot and full-shot models.

How Datadog Turned Noisy Observability Metrics Into AI Gold
22 Oct 2025
How Datadog trained its Toto model using one trillion curated time series points and synthetic data to build smarter, more generalizable AI.

How Toto Reimagines Multi-Head Attention for Multivariate Forecasting
21 Oct 2025
Toto introduces a decoder-only transformer with proportional space-time attention for smarter multivariate time series forecasting.

The Time Series Optimized Transformer Setting New Standards in Observability
21 Oct 2025
Datadog’s Toto sets a new benchmark in time series forecasting with trillion-point training and domain-tuned observability insights.