dReLU Activation Function: Matching SwiGLU Performance with 90% Sparsity

27 Feb 2026

Achieve superior sparsity and lower validation perplexity without compromising model convergence or performance.

Analyzing ReLUfication Limitations: Enhancing LLM Sparsity via Up Projection

27 Feb 2026

Learn why modifying the up projection component is key to achieving higher LLM activation sparsity.

Optimizing LLM Inference: Sparse Activation, MoE, and Gated-MLP Efficiency

27 Feb 2026

Explore advanced strategies for efficient LLM inference, including model compression, intrinsic activation sparsity, and Mixture-of-Experts (MoE).

TurboSparse-LLM: Accelerating Mixtral and Mistral Inference via dReLU Sparsity

27 Feb 2026

Discover how the dReLU activation function and high-quality data mixtures achieve 90% sparsity in Mistral and Mixtral models without losing performance.

Toto: Time Series Optimized Transformer for Observability

22 Oct 2025

Datadog introduces Toto, an AI model for time series forecasting that boosts accuracy, reliability, and responsible observability.

Toto AI Model Sets New Benchmark for Time Series Forecasting

22 Oct 2025

Discover how Toto achieved state-of-the-art results in AI forecasting, surpassing leading zero-shot and full-shot models.

How Datadog Turned Noisy Observability Metrics Into AI Gold

22 Oct 2025

How Datadog trained its Toto model using one trillion curated time series points and synthetic data to build smarter, more generalizable AI.

How Toto Reimagines Multi-Head Attention for Multivariate Forecasting

21 Oct 2025

Toto introduces a decoder-only transformer with proportional space-time attention for smarter multivariate time series forecasting.

Toto: The Time Series Optimized Transformer Setting New Standards in Observability

21 Oct 2025

Datadog’s Toto sets a new benchmark in time series forecasting with trillion-point training and domain-tuned observability insights.