Tech Blogs
Recent engineering posts and AI research from the companies I follow — fetched from their RSS feeds and linked to the originals. Filter by company; newest first.
Statement on the US government directive to suspend access to Fable 5 and Mythos 5
How Preply combines AI and human tutors to personalize learning
Preply uses OpenAI to launch AI-generated lesson summaries, providing personalised feedback and language learning exercises.
New OpenAI Academy courses for the next era of work
OpenAI introduces three Academy courses that help people build practical AI skills, create repeatable workflows, and apply agents in everyday work.
Deploy Long-Context Reasoning and Agentic Workflows with MiniMax M3 on NVIDIA Accelerated Infrastructure
As enterprise AI adoption scales, developers are increasingly forced to stitch together fragmented pipelines—separate models for text, vision, and...
NVIDIA Achieves Leading Agentic Coding Performance on First Agentic AI Benchmark
AI agents have fundamentally changed the complexity of inference workloads. Until now, the industry has struggled to define a standard for measuring how...
Ire identifies another LOTUSLITE specimen
Project Ire examined a timely malware sample and determined its intent through reverse engineering, identifying LOTUSLITE characteristics even as most major EDR tools did not detect it.
olmo-eval: An evaluation workbench for the model development loop
Scaling out Distroless adoption With AI
Distroless adoption at Grab Grab is migrating from heavy base images to Distroless images to reduce security risks. By limiting each container to the application and its runtime dependencies, we shed non-essential binari…
How we made GitHub Copilot CLI more selective about delegation
Better orchestration, fewer handoffs, faster progress, without a single new knob.
How Dropbox uses MCP and Dash to close the design-to-code security gap
Using an agentic AI system to surface threat models during code review and spot gaps between security requirements and implementation.
Scaling Security Insights: how we achieved a 10x increase in global scanning capacity
Cloudflare Security Insights system now processes over 120 scans per second, providing frequent insights for all customers. By optimizing Kafka consumers, Postgres queries, and our API, we scaled our throughput 10x witho…
Results from the first Anthropic Public Record
TCS and Anthropic partner to bring Claude to regulated industries
Stripe Projects adds new agent integrations, more providers, and custom developer controls
Our data shows that agents are now fully capable of independently writing code and integrating with APIs like Stripe’s. And yet, many of the steps adjacent to writing code are still too hard for agents to do on their own…
Agentic Testing: Where Agents Fit in the E2E Testing Stack
Abstract Agent-driven end-to-end (E2E) tests add a new exploratory layer to testing, but should they replace traditional deterministic tests? We ran more than 200 agentic E2E workflows using the Playwright MCP, Playwrigh…
How MuleSoft Is Raising the Trust Bar for AI-Generated Code
In our Engineering Energizers Q&A series, we highlight the engineering minds driving innovation across Salesforce. Today, we spotlight Melissa Cazalet, Senior Vice President of Software Engineering at MuleSoft, whose tea…
OpenAI to acquire Ona
OpenAI plans to acquire Ona to expand Codex with secure, persistent cloud environments, enabling long-running AI agents across enterprise workflows.
BBVA puts AI at the core of banking with OpenAI
Learn how BBVA scaled ChatGPT Enterprise to 100,000 employees and partnered with OpenAI to accelerate AI-powered banking transformation worldwide.
How an astrophysicist uses Codex to help simulate black holes
Discover how astrophysicist Chi-kwan Chan uses Codex to build black hole simulations, helping scientists study extreme physics and test Einstein’s theory of general relativity.
Supporting Europe’s work in ensuring a trustworthy AI ecosystem
OpenAI supports the EU Code of Practice on AI content transparency, advancing provenance standards and tools to help people understand AI-generated content.
One-Click Multi-Tenant Security with NVIDIA Quantum InfiniBand
NVIDIA Quantum InfiniBand now offers intent-based security profiles in Unified Fabric Manager (UFM) that enable multi-tenant fabric security in a single...
Your agent just scaffolded a project from 2020
Your agent ran a scaffold command. Project generated, dependencies resolved, no errors. Everything looks fine. Except it's based on the project structure from 2020, and neither you nor the agent noticed. How npx picks th…
Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP
GitLab Patch Release: 19.0.2, 18.11.5, 18.10.8
Making secret scanning more trustworthy: Reducing false positives at scale
Alerts are more trustworthy and actionable when noise is reduced. See how we improved the verification step with context-aware LLM reasoning.
GitHub availability report: May 2026
In May, we experienced nine incidents that resulted in degraded performance across GitHub services.
Zanele Munyikwa
MiniPIC: Flexible Position-Independent Caching in <100LOC
Retrieval-augmented and agentic workloads repeatedly prefill recurring predictable structured inputs (which we call "spans") such as documents and code files. Yet, prefix caching in engines such as vLLM cannot reuse thei…
HyPE: Category-Aware Hypergraph Encoding with Persistent Edge Embeddings for Persona-Grounded Dialogue
Persona-grounded dialogue systems aim to produce responses consistent with a speaker's persona, yet existing methods treat personas as a flat set of sentences and fail to model the high-order relations among persona attr…
NTS-CoT: Mitigating Hallucinations in LLM-based News Timeline Summarization with Chain-of-Thought Reasoning
The rapid updates of online news make tracking event developments challenging, highlighting the need for timeline summarization (TLS). Hallucinations, where LLM-generated content deviates from source news, still remain a…
Getting Better at Working With You: Compiling User Corrections into Runtime Enforcement for Coding Agents
Interactive LLM agents are becoming part of daily work, but they do not reliably become easier to work with over time: a correction remembered in one session may still be violated in the next. We study this gap between p…
MemRefine: LLM-Guided Compression for Long-Term Agent Memory
Large language model (LLM) agents are increasingly expected to operate over long-term interactions, where information from past dialogues must be preserved and recalled to support future tasks. However, as interactions a…
LAUKIN: A Multi-jurisdictional Common Law Contract Dataset
Multinational companies increasingly require cross-jurisdictional contract review, yet existing legal NLP datasets are largely restricted to a single jurisdiction. We introduce LAUKIN (Legal equivalence dataset of Austra…
A Context-Aware Dataset for Stance Detection in Bioethical Controversies on Reddit
Bioethical debates increasingly unfold on social media, yet stance detection research lacks large-scale, domain-specific resources for modeling such context-dependent discourse. We present BioStance, a context-aware data…
SICI: A Semantic-Pragmatic Complexity Index Reveals Regime Shifts in LLM Stance Detection
Prompt-based LLMs are increasingly used for stance detection, but harder examples are not always repaired by clearer instructions, reasoning prompts, retrieval, or debate. We introduce SICI (Stance Inference Complexity I…
Understanding helpfulness and harmless tension in reward models
Reward models are a key component of reinforcement learning from human feedback (RLHF), aligning language models toward both helpful and harmless behaviour. However, the internal mechanisms underlying these objectives an…
Layer-Resolved Optimal Transport for Hallucination Detection in NMT and Abstractive Summarization
Optimal transport (OT) has been shown to detect hallucinations in neural machine translation (NMT) by measuring the geometric distance between cross-attention distributions and a reference distribution, without any super…
When Similar Means Different: Evaluating LLMs on Arabic--Hebrew Cognates
Arabic and Hebrew, as closely related Semitic languages, share a substantial lexicon of true cognates, misleading false friends, and modern loanwords. This overlap poses a challenge for cross-lingual semantic understandi…
PolyAlign: Conditional Human-Distribution Alignment
Post-training methods such as supervised fine-tuning (SFT) and preference optimization typically align language models toward a single global assistant behavior. While effective for improving average helpfulness, this ca…
ComAct: Reframing Professional Software Manipulation via COM-as-Action Paradigm
Existing computer-use agents remain fundamentally limited in professional software manipulation: GUI-based agents suffer from fragile visual grounding and long-horizon error accumulation, while API-based approaches strugg…
Evaluating Pluralism in LLMs through Latent Perspectives
The growing need to represent diverse perspectives has increased interest in pluralistic LLM generation. Although difficult to operationalize, identifying perspectives expressed in text would provide clear guidance on pl…
TimeLens: On-Device Artifact Recognition with Retrieval-Augmented Question Answering for the Grand Egyptian Museum
TimeLens is an AI-powered bilingual mobile guide for the Grand Egyptian Museum (GEM). Pointing a phone at an exhibit, a visitor sees the artifact recognized in real time and can ask follow-up questions answered in Englis…
Cross-Modal Masked Compositional Concept Modeling for Enhancing Visio-Linguistic Compositionality
Contrastively trained vision-language models like CLIP have made remarkable progress in learning joint image-text representations, but still face challenges in compositional understanding. They often exhibit a "bag-of-w…
RogueAI: A Reverse Turing Test for Detecting Licensed AI Deception in Dialogue
The original Turing Test asks a human judge to distinguish a machine from a person through dialogue. Three quarters of a century later, conversational systems pass this test in casual settings; the interesting epistemolo…
SkillCAT: Contrastive Assessment and Topology-Aware Skill Self-Evolution for LLM Agents
Skill self-evolution methods for LLM agents aim to turn execution trajectories into reusable skill documents, but current pipelines typically learn from one trajectory per task, merge candidate skill patches before check…
Low-Latency Real-Time Audio Game Commentary System via LLM-Based Parallel Text Generation
We present a low-latency real-time audio game commentary system that generates spoken commentary directly from live gameplay video. In this end-to-end setting, a key bottleneck is accumulated waiting time; conventional p…
IVIE: A Neuro-symbolic Approach to Incremental and Validated Generation of Interactive Fiction Worlds
Computational creativity in Interactive Fiction faces a fundamental tension: Large Language Models (LLM) may produce creative narratives but struggle with world coherence, while symbolic systems ensure consistency but la…
From Passive Generation to Investigation: A Proactive Scientific Peer Review Agent
Large language models (LLMs) have shown promise in automating scientific peer review. However, existing approaches often struggle to generate in-depth reviews supported by concrete evidence. We argue that a key limitatio…
An End-to-End Hybrid Framework for Rumour Detection in Low-Resources Algerian Dialect
The rapid growth of social media has intensified the spread of rumours. This issue is more challenging in the Algerian context due to the informal and code-switched nature of dialectal content, the scarcity of annotated…
S-GBT: Smooth Growth Bound Tensor for Certified Robustness Against Word Substitution Attacks in NLP
Despite recent progress in Natural Language Processing (NLP), models remain vulnerable to word substitution attacks. Most existing defenses focus on first order sensitivity and measure how much the output changes when th…
Why Sampling Is Not Choosing: Intentionality, Agency, and Moral Responsibility in Large Language Models
Recent advances in large language models (LLMs) have prompted claims that such systems exhibit agency or qualify as moral agents. This paper argues that these attributions are misguided. We maintain that moral responsibi…
Examining the Cognitive Gap Between Authors and Peer Reviewers on Academic Paper Novelty
Novelty is a crucial metric for assessing the quality of academic papers. Scholars strive to highlight the novel aspects of their work, particularly in the title, abstract, and introduction. Peer review, serving as the g…
Ontology Memory-Augmented ASR Correction for Long Text-Speech Interleaved Conversations
Automatic speech recognition (ASR) correction has traditionally focused on isolated utterances or short local contexts. However, as text and speech become increasingly interleaved in long interactions, ASR correction req…
MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling
We present MaxProof, a population-level test-time scaling framework for competition-level mathematical proof in the MiniMax-M3 series. M3 first trains three proof-oriented capabilities -- proof generation, proof verifica…
SupraBench: A Benchmark for Supramolecular Chemistry
Supramolecular chemistry, which includes the study of non-covalent host-guest assemblies, has advanced various applications. However, designing host-guest systems remains time-consuming, requiring days of dry-lab verific…
Leveraging Audio-LLMs to Filter Speech-to-Speech Training Data
Large-scale mined corpora provide abundant training data for end-to-end speech-to-speech translation (S2ST) but may contain noise, misalignment, and semantic errors. Filtering noisy data is crucial to maintain robust spe…
When Does Mixing Help? Analyzing Query Embedding Interpolation in Multilingual Dense Retrieval
While mixed-language querying is ubiquitous in multilingual communities, the sensitivity of dense retrievers to such queries remains poorly understood. We present a ratio-controlled study on mMARCO that systematically ev…
Adaptive Turn-Taking for Real-time Multi-Party Voice Agents
Turn-taking in multi-party spoken conversations remains a fundamental challenge for voice-based agents, particularly under dynamic floor competition and varying user expectations. We propose ModeratorLM, a role-playing v…
Uncertainty-Aware Hybrid Retrieval for Long-Document RAG
Retrieval augmented generation (RAG) depends critically on the quality and granularity of retrieved evidence. Large retrieval units preserve context but often introduce irrelevant content, which can dilute answer bearing…
Edit the Bits, Diff the Codes: Bitwise Residual Editing for Visual Autoregressive Models
Text-guided image editing with visual autoregressive (VAR) generators requires controlling both what the model samples and where the sampled change is written back into the image code. Existing VAR editors mainly operate…
ArogyaSutra: A Multi-Agent Framework for Multimodal Medical Reasoning in Indic Languages
Multimodal Large Language Models (MLLMs) have shown promising reasoning capabilities in general domains, yet their performance remains limited in specialized settings such as healthcare, especially in multilingual and lo…
LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories
Scientific laboratories increasingly rely on AI systems to reason about experiments, but the physical act of doing science remains largely outside their reach. AI can help read literature, generate hypotheses, and plan p…
The Tone of Awareness: Topic, Sentiment, and Toxicity Maps During Mental Health Month on TikTok
Despite raising concerns about the mental health effects associated with the usage of TikTok, little is known about how related content is framed by creators and received by audiences. We collect the content of 28,341 Ti…
Reward Modeling for Multi-Agent Orchestration
Multi-Agent Systems (MAS) built on Large Language Models (LLMs) require effective orchestration to coordinate specialized agents, yet training such orchestrators is hindered by limited supervision and high computational…
Beyond the Commitment Boundary: Probing Epiphenomenal Chain-of-Thought in Large Reasoning Models
Chain-of-thought (CoT) reasoning is the dominant paradigm for inference-time scaling in language models, yet the causal influence of individual steps on the final answer is poorly understood. We estimate each step's causal…
One Polluted Page Is Enough: Evaluating Web Content Pollution in Generative Recommenders
Search-augmented LLMs increasingly mediate everyday consumer recommendations by retrieving live web content. This creates a new risk: generative recommenders may consume polluted web content, such as fake reviews and pro…
Beyond Uniform Tokens: Adaptive Compression for Time Series Language Models
Large language models (LLMs) have enabled time series (TS) analysis by jointly modeling numerical observations and textual context through a shared token interface. However, TS tokens and prompt tokens exhibit fundamenta…
From Tokens to Faces: Investigating Discrete Speech Representations for 3D Facial Animation
The choice of speech representation is critical in speech-driven 3D facial animation. Representations differ in what they encode: SSL features emphasize segmental and semantic cues, neural codecs yield latents optimized…
Operads for compositional reasoning in LLMs
Question decomposition, i.e. breaking a complex query into simpler sub-queries whose answers are composed to produce a final answer, is a widely used strategy for improving LLM reasoning, yet it currently lacks a rigorou…
Recursive Agent Harnesses
Recursive language models (RLMs) showed that recursion over model calls is an effective strategy for long-context reasoning, and production coding agents have begun to write code that spawns subagents at scale, most rece…
SkMTEB: Slovak Massive Text Embedding Benchmark and Model Adaptation
We introduce SkMTEB, the first comprehensive MTEB-style text embedding benchmark for Slovak, a low-resource West Slavic language, comprising 31 datasets across 7 task types -- nearly 4× the depth of existing multi…
Operadic consistency: a label-free signal for compositional reasoning failures in LLMs
Detecting LLM reasoning failures at inference time without ground-truth labels has motivated a wide range of confidence baselines, including self-consistency, semantic entropy, and P(True), built on within-question sampl…
EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery
LLM-based agents have shown increasing potential in automating scientific discovery. Given an optimizable metric and an execution environment, they can propose, validate, and iterate scientific solutions, and have produc…
HyperTool: Beyond Step-Wise Tool Calls for Tool-Augmented Agents
Tool-augmented LLM agents commonly rely on step-wise atomic tool calls, where each invocation, observation, and value transfer is exposed in the main reasoning trace. This creates an execution-granularity mismatch…
Influcoder: Distilling Decoders' Gradient Influence Rankings into an Encoder for Data Attribution
With the growth of LLMs' (Large Language Models) capabilities, there has been an increasing push to curate high quality datasets by filtering samples in the training data. In general, Data Attribution (DA) methods aim to…
Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning
Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowledge, yet conventional retrieval based on lexical or semantic similarity is poorly suited for complex re…
EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments
Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluations assume static environments. In contrast, real-world deployment is inherently dynamic, requiring agent…
Clipping Makes Distributed and Federated Asynchronous SGD Robust to Stragglers
In modern machine learning, parallelization of training is an important strategy for increasing scale. Asynchronous stochastic gradient descent (ASGD), which maximizes the utilization of available hardware by avoiding wa…
Simultaneous Latent Budget Trees for Stratified Classification
In the era of Explainable Artificial Intelligence, there is a renewed focus on single trees for their ease of interpretation. This paper introduces Simultaneous Latent Budget Trees, a probabilistic machine learning frame…
Quantizing Time-Series Models As Dynamical Systems: Trajectory-Based Quantization Sensitivity Score
We introduce the Trajectory-based Quantization Sensitivity Score (TQS), a metric that reframes post-training quantization (PTQ) through the lens of dynamical-systems stability. By modeling the network's rollout as a disc…
Physics-Guided Spatiotemporal Learning for Coastal Wave Peak Period Estimation from Video
Wave parameters in the nearshore are crucial for coastal engineering, shoreline protection, marine hazard assessment, and coastal management for climate resilience. Traditional monitoring systems like buoys and radar pla…
Rarity-Gated Context Conditioning for Offline Imitation Learning-Based Maritime Anomaly Detection
Contextual anomaly detection aims to identify abnormal behavior conditional on context variables, but practical deployments often face highly imbalanced context distributions where rare regimes can be critical informatio…
Navigating the Safety-Fidelity Trade-off: Massive-Variate Time Series Forecasting for Power Systems via Probabilistic Scenarios
Probabilistic forecasting models are increasingly deployed on multivariate systems with distinct channel physics and operational constraints, but existing benchmarks evaluate neither property at scale. Public canonical m…
Enhanced Low-Density Region Exploration in Classifier-Guided Diffusion Models Through Modified Reverse Diffusion Sampling
Diffusion models have emerged as state-of-the-art generative models for high-fidelity image synthesis, particularly in their classifier-free guided and classifier-guided forms. However, standard classifier guidance conce…
VideoMDM: Towards 3D Human Motion Generation From 2D Supervision
We introduce VideoMDM, a diffusion-based framework that trains 3D human motion priors directly from accurate 2D poses extracted from monocular videos, without any 3D ground truth. A pretrained 2D-to-3D lifter provides ap…
Positional Encoding in the Context of Memristor-Based Analog Computation for Automatic Speech Recognition
Memristors provide a new chance for resource-efficient computation of neural models for natural language processing by enabling analog execution of vector-matrix-multiplication. Yet, computations on these devices are cur…
Hölder++: Improving the Quality-Coherence Trade-off in Multimodal VAEs
Existing approaches for multimodal variational autoencoders (VAEs) face a trade-off between generative quality and coherence, i.e., they struggle to generate realistic and diverse samples that, at the same time, are seman…
PolyFlow: Safe and Efficient Polytope-Constrained Flow Matching with Constraint Embedding and Projection-free Update
While flow-based generative models have demonstrated strong performance across a wide range of domains, deploying them in safety-critical physical systems remains challenging due to strict constraint requirements. Existi…
Foundations of Practical Quantum Advantage in Quantum-Informed Machine Learning for Predicting Chaos
We develop theoretical foundations for a practical quantum-advantage mechanism in quantum-informed machine learning for chaotic dynamical systems. A family of k-indexed higher-order quantum statistical priors (Q-Priors)…
Accelerating Speculative Diffusions via Block Verification
Speculative decoding speeds up LLM inference by using a draft model to generate tokens, with an acceptance-rejection scheme that ensures that the output matches the target distribution. Adapting this to continuous diffus…
How Much Memory Do We Need? Adaptive Memory Gate for Neural Operators
Neural operators have emerged as a powerful data-driven approach for solving time-dependent PDEs. Among recent advances, memory-augmented neural operators explicitly incorporate past states and have achieved remarkable p…
Clustering Node Attributed Networks with Graph Neural Networks and Self Learning
Graph clustering - partitioning the node set of a graph into disjoint subsets that reflect some latent information - is a fundamental problem as it finds applications in a myriad of different scenarios. While this classi…
Uncertainty Estimation for Molecular Diffusion Models
Diffusion models have seen wide adoption for 3D molecular generation, yet they offer no principled signal of when a generated molecule is likely to be of low quality. We propose a post-hoc method for estimating per-sampl…
Optical Implementation of Equilibrium Propagation Using Spatial Photonic Ising Machines
Equilibrium Propagation offers a compelling alternative to traditional machine learning for training energy-based networks. Here we demonstrate a hybrid optical-digital implementation of EP using a Spatial Photonic Ising…
Reinforcement Learning for Neural Model Editing
Editing pretrained neural networks requires specialized algorithms tailored to specific objectives. Designing such algorithms is often time-consuming and demands significant effort. We present an exploratory framework th…
CRAFTIIF: Cross-Resolution Analytic Four-Type Interpretable Isolation Forest for Multivariate Time Series Anomaly Detection
Anomaly detection in multivariate time series is challenged by four structurally distinct anomaly types -- point (isolated spikes), distributional (level shifts), temporal (rhythm changes), and collective (inter-sensor c…
GF-DiT: Scheduling Parallelism for Diffusion Transformer Serving
Diffusion Transformers (DiTs) have become the dominant architecture for image and video generation, creating growing demand for efficient DiT serving. Existing systems assign each request a fixed parallel configuration t…
MaskWAM: Unifying Mask Prompting and Prediction for World-Action Models
World Action Models (WAMs) present a promising paradigm for robotic control via video prediction. However, current WAMs suffer from fundamental spatial bottlenecks: standard text inputs introduce referential ambiguity in…
Ride, Track, and Recover: Pilot Randomized Trial of a Wearable Digital Self-Management Intervention During a Veteran Endurance-Cycling Program
Post-traumatic stress disorder (PTSD) in veterans is characterized by persistent hyperarousal and comorbid anxiety and depressive symptoms that are difficult to monitor and manage outside clinical settings. Thirteen vete…
Graphical Causal Reasoning for Root Cause Analysis in Cloud Networks
Cloud-computing relies on large-scale networks which are inherently complex systems. In this paper, we present a novel approach to root cause analysis (RCA) of cloud network incidents, leveraging graph-based causal disco…
NetCause: Counterfactual Learning for Root Cause Analysis in Large-Scale Networks
Can a learned model capture how faults propagate through a large-scale network and use this knowledge to causally attribute customer impact to its underlying root cause? Existing root cause analysis techniques often rely…
A2D2: Fine-Tuning Any-Length Discrete Diffusion for Adaptive Decoding
Discrete diffusion models offer a simple and stable likelihood-based framework for sequence generation, recently extended to any-length settings via token insertion. Principled reward-guided fine-tuning for any-length di…
Adjusted Cup-Product Neural Layer
Many important observables in physics and geometry are cup products of cochains. The adjusted cup product neural layer has been introduced in this paper. It is a neural primitive that hard wires the cup product with an a…
Existence Precedes Value: Joint Modeling of Observational Existence and Evolving States in Time Series Forecasting
Real-world time series are often highly incomplete and irregular due to sensor dormancy, transmission delays, and event-driven sampling, making reliable forecasting fundamentally challenging. Existing methods have evolve…
Learning with Simulators: No Regret in a Computationally Bounded World
Understanding the minimal assumptions necessary for generalization is the fundamental question in learning theory. Unfortunately, most results rely heavily on independence (or some proxy thereof) of the data-generating p…
Simplex-Constrained Sparse Bagging: Transitioning from Uniform Priors to Sparse Posteriors in Ensemble Learning
We present Simplex-Constrained Sparse Bagging (SCSB), a mathematically rigorous framework for post-training compression and probability calibration of bootstrap-based bagging ensembles. Standard bagging ensembles (such a…
Multiagent Protocols with Aggregated Confidence Signals
Confidence is used for reliability, oversight, and a range of downstream decision tasks in Natural Language Processing (NLP), yet no existing method produces or evaluates a confidence for the output of a multiagent syste…
Multi-Agent Reinforcement Learning from Delayed Marketplace Feedback for Objective-Weight Adaptation in Three-Sided Dispatch
Dispatch in three-sided marketplaces provides a natural setting for reinforcement learning from world feedback: decisions are evaluated by delayed operational outcomes such as delivery speed, courier utilization, and mer…
Distribution-Agnostic Robust Trajectory Optimization via Chance-Constrained Reinforcement Learning
This paper presents a distribution-agnostic robust trajectory-optimization framework based on chance-constrained reinforcement learning. The uncertainty is represented here through initial conditions and process noise, w…
AgentBeats: Agentifying Agent Assessment for Openness, Standardization, and Reproducibility
Agent systems are advancing quickly across domains, but their evaluation remains fragmented. Most benchmarks rely on fixed, LLM-centric harnesses that require heavy integration, create test-production mismatch, and limit…
Majority-of-Three is Optimal
We give a short proof that the majority vote of three independent consistent classifiers is an optimal learner in the realizable PAC setting. This proves optimality for the simplest voting scheme, while simplifying both…
Beyond Runtime Enforcement: Shield Synthesis as Defensibility Analysis for Adversarial Networks
Shielded reinforcement learning is typically presented as a runtime safety mechanism that compiles temporal-logic specifications into automata restricting an agent's actions. We argue this is the wrong product. The same…
Generative Modeling of Bach-Style Symbolic Music: A Comparative Study of Autoregressive, Latent-Variable, and Adversarial Approaches
We study generative modeling of Bach-style symbolic piano music using a shared MIDI corpus and three model families: autoregressive LSTMs with attention, latent-variable models including recurrent VAEs and vector-quantiz…
Valid Inference with Synthetic Data via Task Exchangeability
There is a proliferation of work arguing for the use of synthetic data in scientific research. For example, social scientists are arguing for the use of LLM-generated "silicon samples" in pilot studies; AI evaluations in…
Aerial Wildfire Suppression Planning with a Hybrid CNN-Cellular Automata Fire Model
Aerial wildfire suppression requires not only predicting fire spread, but also designing effective intervention strategies under operational and environmental uncertainty. We present a modeling and optimization framework…
The Stable Recovery Manifold: Geometric Principles Governing Recoverability in Continual Learning
Catastrophic forgetting is often viewed as the destruction of previously learned knowledge during sequential learning. Building on the Accessibility Collapse framework, we investigate the geometric structure of recoverab…
SkMTEB: Slovak Massive Text Embedding Benchmark and Model Adaptation
We introduce SkMTEB, the first comprehensive MTEB-style text embedding benchmark for Slovak, a low-resource West Slavic language, comprising 31 datasets across 7 task types -- nearly 4× the depth of existing multi…
Operadic consistency: a label-free signal for compositional reasoning failures in LLMs
Detecting LLM reasoning failures at inference time without ground-truth labels has motivated a wide range of confidence baselines, including self-consistency, semantic entropy, and P(True), built on within-question sampl…
Dense Supervision, Sparse Updates: On the Sparsity and Geometry of On-Policy Distillation
On-policy distillation (OPD) has recently become a prominent post-training recipe as it combines two desirable ingredients: on-policy student trajectories and dense teacher supervision, yet how this hybrid chang…
Understanding Truncated Positional Encodings for Graph Neural Networks
Positional encodings (PEs) enhance the power of graph neural networks (GNNs), both theoretically and empirically. Two of the most popular families of PEs - spectral (e.g., Laplacian eigenspaces, effective resistance) and…
Mana: Dexterous Manipulation of Articulated Tools
Articulated tool manipulation remains a major challenge in dexterous robotics due to the need to coordinate internal degrees of freedom and contact-rich interactions. While prior work has largely focused on rigid objects…
IterCAD: An Iterative Multimodal Agent for Visually-Grounded CAD Generation and Editing
Computer-Aided Design is pivotal in modern manufacturing, yet existing automated methods predominantly rely on open-loop, one-shot generation, creating a mismatch with iterative real-world practices. In this paper, we pr…
A Quantitative Experimental Repeated Measures Study of Training Dynamics in a Small Llama Style Language Model Under a Compute-Aware Token Budget
This study examines training dynamics in a small Llama-style language model trained under a fixed, compute-constrained token budget. Rather than evaluating efficiency solely through endpoint performance, the study uses a…
An LLM System for Autonomous Variational Quantum Circuit Design
The design of high performing quantum circuits remains largely dependent on human expertise. We introduce an autonomous agentic framework that employs large language models (LLMs) to conduct iterative quantum circuit des…
SmartFont: Dynamic Condition Allocation for Few-Shot Font Generation
Few-shot font generation simultaneously requires global structural completeness and fine-grained local style fidelity. Existing methods usually either rely on global content-style modeling, which is robust but imperfectl…
Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents
Web agents driven by large language models (LLMs) are increasingly deployed in real-world environments, where they operate over untrusted web content and execute actions with direct consequences. This makes them vulnerab…
MiniMax Sparse Attention
Ultra-long-context capability is becoming indispensable for frontier LLMs: agentic workflows, repository-scale code reasoning, and persistent memory all require the model to jointly attend over hundreds of thousands to m…
Mod-Guide: An LLM-based Content Moderation Feedback System to Address Insensitive Speech toward Indigenous Ethnic and Religious Minority Communities
Language operates as a mechanism of both marginalization and resistance, especially for minority communities navigating insensitive and harmful speech online. As content moderation increasingly depends on large language…
PolyFlow: Safe and Efficient Polytope-Constrained Flow Matching with Constraint Embedding and Projection-free Update
While flow-based generative models have demonstrated strong performance across a wide range of domains, deploying them in safety-critical physical systems remains challenging due to strict constraint requirements. Existi…
Neuro-Symbolic Agents for Regulated Process Automation: Challenges and Research Agenda
LLM-based agents are entering regulated industries where they automate judgment-intensive quality management processes. We argue that symbolic structures already embedded in these domains, including regulations, typed pr…
Optimizing Appliance Scheduling for Solar Energy Management Using Metaheuristic Algorithms
Renewable energy is essential for meeting future energy demands; however, solar energy generation, which occurs only during daylight hours, often does not align with household consumption patterns. Appliances such as cook…
OmniDirector: General Multi-Shot Camera Cloning without Cross-Paired Data
Cloning camera motion from reference videos is an important task in video generation, as videos provide intuitive and precise control. Existing methods either directly use parametric representations that fail to handle m…
Evaluation Sovereignty in Metadata-Driven Classification: A Multi-Track Framework for Weakly Supervised Information Systems
Evaluation in machine learning is typically treated as a neutral measurement process. However, in operational information systems, evaluation outcomes are often conditioned by the processes used to generate labels. This…
Why Sampling Is Not Choosing: Intentionality, Agency, and Moral Responsibility in Large Language Models
Recent advances in large language models (LLMs) have prompted claims that such systems exhibit agency or qualify as moral agents. This paper argues that these attributions are misguided. We maintain that moral responsibi…
Toward Instructions-as-Code: Understanding the Impact of Instruction Files on Agentic Pull Requests
AI-agents (e.g., GitHub Copilot) collaborate as teammates in different software engineering tasks, including code generation proposed through pull requests (Agentic-PRs). For better agent efficiency, developers create in…
Ontology Memory-Augmented ASR Correction for Long Text-Speech Interleaved Conversations
Automatic speech recognition (ASR) correction has traditionally focused on isolated utterances or short local contexts. However, as text and speech become increasingly interleaved in long interactions, ASR correction req…
Understanding the Rejection of Fixes Generated by Agentic Pull Requests -- Insights from the AIDev Dataset
AI coding agents are increasingly used to generate pull requests (PRs) that propose code fixes in software projects. From a first exploration of the AIDev dataset, we find that 46.41% of the fixes proposed by the agents…
MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling
We present MaxProof, a population-level test-time scaling framework for competition-level mathematical proof in the MiniMax-M3 series. M3 first trains three proof-oriented capabilities -- proof generation, proof verifica…
SupraBench: A Benchmark for Supramolecular Chemistry
Supramolecular chemistry, which includes the study of non-covalent host-guest assemblies, has advanced various applications. However, designing host-guest systems remains time-consuming, requiring days of dry-lab verific…
CRAFTIIF: Cross-Resolution Analytic Four-Type Interpretable Isolation Forest for Multivariate Time Series Anomaly Detection
Anomaly detection in multivariate time series is challenged by four structurally distinct anomaly types -- point (isolated spikes), distributional (level shifts), temporal (rhythm changes), and collective (inter-sensor c…
Heterogeneous LiDAR Early Fusion and Learned Re-Ranking Strategy for Robust Long-Term Place Recognition in Unstructured Environments
Robust localization in unstructured environments, such as agricultural fields, is a critical challenge for autonomous systems. LiDAR sensors provide detailed 3D information about the environment and are invariant to ligh…
Measurement-Calibrated Multi-Camera Fusion for Vision-Based Indoor Localization
Indoor vision-based localization systems are affected by detection noise, occlusions, and limited camera coverage, leading to uncertainty at multiple stages of the pipeline. While multi-camera data fusion is widely used…
CloudCons: A Comprehensive End-to-End Benchmark for Cloud Resource Consolidation
Driven by conservative over-provisioning to guarantee service reliability, resource utilization in cloud data centers remains at low levels. To mitigate this, the forecast-then-optimize paradigm has emerged to optimize c…
AgentRivet: an automated system for producing Rivet routines from journal publications
Particle physics collider experiments provide Rivet routines as part of the analysis preservation strategy for model-independent measurements. Rivet is a C++ toolkit that allows new theoretical models to be compared to th…
Adaptive Turn-Taking for Real-time Multi-Party Voice Agents
Turn-taking in multi-party spoken conversations remains a fundamental challenge for voice-based agents, particularly under dynamic floor competition and varying user expectations. We propose ModeratorLM, a role-playing v…
Uncertainty-Aware Hybrid Retrieval for Long-Document RAG
Retrieval augmented generation (RAG) depends critically on the quality and granularity of retrieved evidence. Large retrieval units preserve context but often introduce irrelevant content, which can dilute answer bearing…
Is It You or Your Environment? A Bayesian Inference Framework for Genomically-Anchored Personalized Physiological Interpretation
Personalized health AI systems face a fundamental cold-start problem: machine learning models for physiological interpretation require weeks of individual behavioral data before they can distinguish constitutional variat…
Contrast-Informed Augmentation and Domain-Adversarial Training for Adult-to-Neonatal MR Reconstruction Generalization
Purpose: To investigate whether contrast-informed data augmentation and domain-adversarial training improve the adult-to-neonatal generalization of the E2E-VarNet. Methods: Three training regimes were investigated: (1) a…
A Three-Layer Framework for AI in Scientific Discovery
Current discussions of AI in scientific discovery are often dominated by two visible capabilities: search over existing knowledge and execution through optimization, simulation, and automation. Both are important, but ne…
Existence Precedes Value: Joint Modeling of Observational Existence and Evolving States in Time Series Forecasting
Real-world time series are often highly incomplete and irregular due to sensor dormancy, transmission delays, and event-driven sampling, making reliable forecasting fundamentally challenging. Existing methods have evolve…
ArogyaSutra: A Multi-Agent Framework for Multimodal Medical Reasoning in Indic Languages
Multimodal Large Language Models (MLLMs) have shown promising reasoning capabilities in general domains, yet their performance remains limited in specialized settings such as healthcare, especially in multilingual and lo…
EvTexture++: Event-Driven Texture Enhancement for Video Super-Resolution
Event-based vision has drawn increasing attention owing to its distinctive properties, including ultra-high temporal resolution and extreme dynamic range. Recent works have introduced it to video super-resolution (VSR) t…
EpiBench: Verifiable Evaluation of AI Agents on Epigenomics Analysis
We introduce EpiBench, a verifiable benchmark for short-horizon epigenomics analysis. EpiBench evaluates whether agents can make well-defined analysis decisions from realistic workflow states and return deterministically…
Reasoning as Pattern Matching: Shared Mechanisms in Human and LLM Everyday Reasoning
When large language models (LLMs) fail to generalize or make haphazard errors in reasoning, it is often taken as evidence that LLMs are not truly reasoning, but rather performing a kind of pattern matching. The implicati…
One Polluted Page Is Enough: Evaluating Web Content Pollution in Generative Recommenders
Search-augmented LLMs increasingly mediate everyday consumer recommendations by retrieving live web content. This creates a new risk: generative recommenders may consume polluted web content, such as fake reviews and pro…
Before You Think: System 0, AI-Mediated Cognition and Cognitive Colonization
This paper examines three recent frameworks for understanding the cognitive and epistemic consequences of artificial intelligence: Tri-System Theory, Thinkframes, and System 0. It argues that while the first two capture…
EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery
LLM-based agents have shown increasing potential in automating scientific discovery. Given an optimizable metric and an execution environment, they can propose, validate, and iterate scientific solutions, and have produc…
Agents-K1: Towards Agent-native Knowledge Orchestration
Current LLM-based research agents have advanced through agent orchestration, yet largely overlook scientific knowledge orchestration. Existing works often reduce papers to abstracts, surface mentions, and flat \texttt{ci…
Automated reproducibility assessments in the social and behavioral sciences using large language models
Reproducibility in the social and behavioral sciences is typically evaluated by independent researchers who reanalyze the original data to assess whether the published findings can be recovered. However, such approaches…
SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning
Spatial reasoning, the ability to determine where objects are, how they relate, and how they move in 3D, remains a fundamental challenge for vision-language models (VLMs). Tool-augmented agents attempt to address this by…
Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning
Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowledge, yet conventional retrieval based on lexical or semantic similarity is poorly suited for complex re…
Introducing Claude Corps
DXC will integrate Claude into the systems banks, airlines, and other regulated industries rely on
Encoding Your Domain Expert: The Context Layer Behind Spotify's Data Assistant
At Spotify, data problems used to follow a specific pattern. You'd look for the relevant dashboard, there...
From data to decisions: how LSEG is scaling trusted AI
See how LSEG uses OpenAI to scale trusted AI across its global business, accelerating insights, shrinking release cycles, and empowering 4,000 employees.
PRC-linked influence operations are targeting AI debates in the US
A new report from OpenAI details PRC-linked influence operations using AI to target U.S. tech debates, data center narratives, tariffs, and false claims about ChatGPT.
Access OpenAI models and Codex through your Oracle cloud commitment
Access OpenAI models and Codex through Oracle Cloud, using existing commitments to build and deploy AI with enterprise security and governance.
Designing Production-Ready Battery Energy Storage Systems for AI Factories
AI factories are changing what data-center infrastructure must do. Unlike traditional data centers, AI factories are built to manufacture intelligence at scale....
Run DiffusionGemma on NVIDIA for Developer-Ready, High-Throughput Text Generation
Developers building real-time AI—such as chat assistants, copilots, and agentic workflows—are often constrained by token-by-token generation speed. This...
Spec-Driven Development: A Spec-First Approach to AI-Native Engineering
AI has made software delivery faster, but speed alone does not guarantee better outcomes. As teams adopt AI-native development, the real challenge is keeping requirements, design, implementation, and validation aligned s…
Is your agent extension actually working?
This is the third article in a series about Agent Experience (AX): the practice of making AI coding agents work correctly with your technology. The series covers what you can and can't control in the agent stack, how to…
Metric Semantic Layer: How Lyft Governs and Scales Key Data Definitions
Written by Rohit Channe and Simran Mirchandani at Lyft. Motivation: At Lyft, data isn’t just a resource — it’s woven into everything we do. Metrics drive key forecasts, steer operational decisions, and put our boldest hyp…
GitLab on Google Cloud: Fully managed, compliant, and AI-ready
You can now run GitLab as a fully managed platform on Google Cloud, delivered by GitLab-certified managed service providers (MSPs) — with the latest Google AI models built in. Through MSP partners, including Beyond and D…
GitLab: Built for the agentic engineering era
GitLab Transcend, our customer event showcasing our roadmap, success stories, and industry research, just wrapped. Here's what we announced and demonstrated: Next-generation source code management, a Git engine rebuilt f…
GitLab Flex: Commit once, reshape your seats and AI spend
The agentic era made your needs harder to predict, and the way you buy software hasn't caught up. Six months out, you don't know how many seats you'll need, how much AI your teams will consume, or which new capabilities…
Introducing GitLab Orbit: Full code and lifecycle context, in one query
Agents are good at writing code. They're far worse at navigating the system around it: the related code, the pipelines that run it, the deployments that ship it, the work items that asked for it, and the teams that own i…
Give GitHub Copilot CLI real code intelligence with language servers
Install and configure LSP servers for GitHub Copilot CLI, replacing brute-force grep/decompile with real code intelligence.
Game On: Discord Is Backing the Next Generation of Dutch Gaming Founders
Together with Techleap, we’re launching the Gaming Founders Circle: a hands-on program for the most ambitious gaming companies in the Netherlands to provide participants with a direct line to the people, investors, and n…
Updated Requirements to How Apps Access Data in Servers
Discord is updating requirements to how apps access certain data within servers, changing the review threshold and requiring annual review. Here's what's changing and why.
Investing in multi-agent AI safety research
Google DeepMind and partners announce a $10M funding call for multi-agent safety research.
DiffusionGemma: 4x faster text generation
The future of work debate has an evidence problem
Route public traffic to private applications with Cloudflare
Application Services for Private Origins is available now in closed beta. Route public hostnames to private IP origins over your existing IPsec, GRE, CNI, or Cloudflare Mesh paths. No public IPs or extra connector softwa…
Now available: Amazon EC2 M9g and M9gd instances powered by new AWS Graviton5 processors
AWS launches Amazon EC2 M9g and M9gd instances, powered by AWS Graviton5 processors. AWS Graviton5 is the most powerful and most energy-efficient processor AWS has ever built, and offers up to 25% better compute performance co…
How to Build Reliable AI Agents: 5 Engineering Patterns from a Production System
By Tuhin Kanti Sharma and Chirag Ramesh Hegde. If you've built an AI agent that works perfectly in demos but becomes unpredictable in production, you've probably already discovered that reliability is much harder than ca…
Industrial policy for the Intelligence Age
Explore our ambitious, people-first industrial policy ideas for the AI era—focused on expanding opportunity, sharing prosperity, and building resilient institutions as advanced intelligence evolves.
What Codex unlocks for Notion
How Notion uses Codex to one-shot specs, build AI Voice Input for the web, and multiply engineering power across small teams.
How engineers at Nextdoor use Codex to build without limits
How engineers at Nextdoor use Codex with GPT-5.5 to investigate hard-to-reproduce issues, build across platforms, and focus on product outcomes.
Evaluate Clinical ASR Models Faster with Agent Skills and NVIDIA Nemotron Speech
Training a speech AI model to correctly recognize or synthesize clinical terminology is surprisingly difficult. Drug names like Acetaminophen, Amlodipine,...
Accelerating Federated Learning Research with AI Agents and NVIDIA FLARE Auto-FL
Federated learning (FL) research often begins with a deceptively simple question: What should we try next? A new aggregation rule, a FedProx coefficient, a...
Model Quantization: Turn FP8 Checkpoints into High-Performance Inference Engines with NVIDIA TensorRT
Converting a quantized checkpoint into an NVIDIA TensorRT engine bridges the gap between model optimization and production deployment, enabling faster...
Delivering Lifecycle Control for AI Infrastructure at Scale with NVIDIA DGX Spark Enterprise Manageability
As AI infrastructure scales, enterprise expectations for operational maturity are increasing. Organizations expect these systems to be provisionable,...
From Chaos to Clarity: How We Built a Unified, Self-Routing Support Ops Ticketing System at Lyft
Written by Atul Gupta, Analytics Manager — LUS Support Ops, Lyft At Lyft, getting operators and riders connected quickly and reliably depends on more than technology — it depends on the teams working behind the scenes t…
Migrating Your GitHub CI to Hugging Face Jobs
How an Agent Built a 3D Paris Gallery by Chaining Two Hugging Face Spaces
Introducing North Mini Code: Cohere’s First Model For Developers
Mythos-class Claude Fable 5 arrives on GitLab Duo Agent Platform
Important note: As of June 12, 2026, Claude Fable 5 is unavailable on GitLab Duo Agent Platform. Anthropic has suspended access to this model for all customers to comply with a U.S. government directive. All other Claude…
Shai-Hulud copycat campaign targets Python developers through PyPI typosquatting
GitLab's Vulnerability Research team has identified a coordinated supply chain attack on PyPI deploying a copy of the Shai-Hulud malware. We found five malicious packages: four typosquats impersonating Flask, Requests, a…
From one-off prompts to workflows: How to use custom agents in GitHub Copilot CLI
Custom agents let GitHub Copilot CLI understand your stack and team workflows, turning one-off terminal prompts into repeatable, reviewable processes.
How We Moved Discord Voice to the Edge
Moving Discord’s voice and video onto Cloudflare’s edge network. Closer servers, lower ping in most regions, and a few real bugs getting there.
Powering the future of robotics in Europe
Introducing Gemma 4 12B: a unified, encoder-free multimodal model
Fluid, natural voice translation with Gemini 3.5 Live Translate
Gemini 3.5 Live Translate brings near real-time, natural speech translation to Google AI Studio, Google Translate and Google Meet.
North Mini Code: Agentic Coding Model for Developers
Defend against frontier cyber models: Cloudflare's architecture as customer zero
In our post about Project Glasswing, we made the argument that the architecture around a vulnerability matters more than the speed of the patch. Here we walk through what that architecture looks like, the threats it defe…
Anthropic Claude Fable 5 on AWS: Mythos-class capabilities with built-in safeguards now available
AWS announces the availability of Claude Fable 5 on Amazon Bedrock and Claude Platform on AWS. Claude Fable 5 delivers Mythos-level capabilities available to all customers, with strong safeguards designed to make it safe…
Claude Fable 5 and Claude Mythos 5
Scaling beyond one: How Airbnb evolved its data architecture for a multi-product world
How Airbnb’s data engineers and analytics engineers built a consistent and flexible data modeling framework to support the expansion into Homes, Experiences, and Services. By: Patrick Lam, Namrata Lamba, Jamie Stober…
Scaling Zero Copy from 1 Trillion to 120 Trillion Rows with File Federation
In our Engineering Energizers Q&A series, we highlight the engineering minds driving innovation across Salesforce. Today, we spotlight Srini Krishnamoorthy, Vice President of Engineering for Data 360. Srini leads the evo…
Introducing the OpenAI Economic Research Exchange
OpenAI launches the Economic Research Exchange to study AI’s impact on jobs, productivity, and the economy. Applications are now open for selected research projects.
Built to benefit everyone: our plan
A vision for the future of AI, focusing on access, safety, and shared prosperity as OpenAI works to ensure AGI benefits everyone.
Confidential submission of draft S-1 to the SEC
OpenAI confirms a confidential S-1 submission to the SEC and has not yet determined timing for further action.
Train Models Faster with JAX and MaxText Using NVFP4 on NVIDIA Blackwell
Pre-training frontier LLMs comes down to throughput. When training spans trillions of tokens across thousands of accelerators, every percentage point of step...
Microsoft Build 2026 recap: vision, launches, and top sessions
Catch up on Microsoft Build 2026 with the vision lead-off, top developer announcements, and must-watch sessions across the Microsoft developer ecosystem.
The Open Source Community is backing OpenEnv for Agentic RL
GitHub for Beginners: Answers to some common questions
Find the answers to some of the most common GitHub-related questions.
Introducing: You Bar
Introducing the You Bar: an update to the Discord mobile app that celebrates your identity, simpler navigation, and a peek at what’s next.
Measuring the impact of learning with AI in Sierra Leone and beyond
Results from a randomized controlled trial show the potential of Gemini’s Guided Learning feature to boost engagement and accelerate learning.
Turning Cloudflare’s threat indicators into real-time WAF rules
Cloudflare customers can now use Cloudforce One threat intelligence directly within the WAF to block high-risk traffic. By using new cf.intel fields, security teams can automate protection against specific threat actors…
AWS Weekly Roundup: BYOM for Amazon RDS for SQL Server, AWS IoT Device SDK for Swift, and more (June 8, 2026)
This week, the AWS IoT Device SDK for Swift reached general availability. As a member of the Swift Server Workgroup (SSWG), this one caught my attention. The SDK brings production-ready MQTT 5 connectivity, Device Shadow…
Introducing the Third Generation of Apple’s Foundation Models
Our next generation of Apple Intelligence is centered around our users, integrated deeply into our operating systems, and powered by a bold new architecture with privacy at its core. At the heart of this architecture is…
Your AI bill is out of control. Cloudflare can fix it now.
AI Gateway now features real-time spend limits to prevent runaway token bills across multiple AI providers. By integrating with Cloudflare Access, companies can use identity-driven budgets and policies.
Try the new console experience in Amazon Bedrock, optimized for Anthropic- and OpenAI-compatible APIs
You can use the new console experience on Amazon Bedrock to browse and compare the latest AI models side by side, organize work into projects with streamlined evaluation workflows, and access project-aware live documenta…
Rethinking risk in the age of AI
Join senior risk and payments leaders in Seattle to explore how AI is reshaping fraud strategy. Seats are limited.
The future of agentic commerce is here
Explore how AI agents are transforming commerce at Stripe’s Agentic Commerce Next roadshow. Reserve your spot in Seattle.
New ways to turn global demand into revenue
At Sessions 2026, Stripe unveiled dozens of products and capabilities to help businesses turn global demand into revenue. See how to go global faster with localized checkout and Adaptive Pricing, smarter fraud tools, mul…
How Engineering 360 Unified Operations at Scale and Reached 80% Adoption
By Shiva Nimmagadda, Arun Lakshmi Narayanan, and Arun Gangavarapu. Salesforce engineering teams encountered a significant operational hurdle as the organization scaled. Critical data lived across dozens of fragmented das…
Biodefense in the Intelligence Age
An action plan for AI-powered biological resilience
Dreaming: Better memory for a more helpful ChatGPT
ChatGPT introduces a new memory system to better remember preferences, keeping context fresh and relevant across conversations.
How Endava is redesigning software delivery around AI agents
Learn how Endava is using AI agents, ChatGPT Enterprise, and Codex to accelerate software delivery, automate workflows, and build an AI-native culture across the enterprise.
NVIDIA Nemotron 3 Ultra Powers Faster, More Efficient Reasoning for Long-Running Agents
Single-turn chatbots are evolving into long-running agents that can reason, maintain context, use tools, and run efficiently across many turns to complete...
Designing the hf CLI as an agent-optimized way to work with the Hub
Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI
GitHub Universe is back: All together now, in the agentic era
GitHub Universe is back: returning to the historic Fort Mason Center in San Francisco on October 28–29, 2026. The post GitHub Universe is back: All together now, in the agentic era appeared first on The GitHub Blog.
Discord Patch Notes: June 4, 2026
Check out the finer details of the more technical fixes implemented into Discord recently.
VoidZero is joining Cloudflare
VoidZero, the team behind Vite, Vitest, Rolldown, Oxc, and Vite+, is joining Cloudflare. Vite stays open source, vendor-agnostic, and built for everyone.
Sitar-agent: Building a reliable dynamic configuration sidecar at scale
How Airbnb built a Kubernetes sidecar to deliver dynamic configuration reliably at scale. By: Bo Teng, Cosmo Qiu, Siyuan Zhou, Ankur Soni, Xin Huang, Willis Harvey Introduction In our previous post, we explored Ai…
Helping businesses optimize network costs with the Visa Digital Commerce Authentication Program (DCAP)
We moved quickly to help Stripe businesses take advantage of DCAP and capture interchange savings while protecting authorization rates. Here’s what we did.
Coding Is No Longer the Constraint: Scaling Developer Experience to Teams and Agents at Spotify
At Code with Claude, Spotify’s chief architect shared how we make both teams and AI agents more effective. The post Coding Is No Longer the Constraint: Scaling Developer Experience to Teams and Agents at Spotify appeared…
How Agentforce Conversation Client Accelerated Accessibility Remediation by 5x Using AI-Driven Workflows
By Prasanna Krishna Sanagala, Ronak Shah, Sandeep Tailor, and Mani Manjari Velnati. In our Engineering Energizers Q&A series, we highlight the engineering minds driving innovation across Salesforce. Today, we spotlight P…
A blueprint for democratic governance of frontier AI
OpenAI outlines a blueprint for U.S. governance of frontier AI, proposing a federal framework for safety, resilience, and national security.
OpenAI public policy agenda
OpenAI outlines its public policy agenda for AI, including safety, youth protection, workforce transition, and global standards to ensure AI benefits society.
How Wasmer used Codex to build a Node.js runtime for the edge
See how Wasmer used Codex with GPT-5.5 to build a Node.js runtime for the edge, accelerating development 10x to 20x and shipping in weeks instead of months.
Introducing new capabilities to GPT-Rosalind
GPT-Rosalind advances life sciences research with enhanced biological reasoning, medicinal chemistry expertise, genomics analysis, and experimental workflow capabilities.
Dynamic Repartitioning for Time Series Workloads
By Rajiv Shringi, Kaidan Fullerton, Oleksii Tkachuk and Kartik Sathyanarayanan Introduction Netflix’s TimeSeries Abstraction is a scalable system for ingesting and querying petabytes of temporal event data with millise…
Lights Out, Systems On: Validating Instant Power Loss Readiness
We’re introducing Instantaneous PowerLoss Storm, a new testing paradigm within Meta’s infrastructure for handling and mitigating instant or zero-notice power loss in our data centers. We’re sharing: how we built readines…
Adding MCP Tools to Reachy Mini
Direct Preference Optimization Beyond Chatbots
Enforcing the First AS in BGP AS_PATHs
BGP is vulnerable to routing hijacks and path leaks that negatively impact traffic on the Internet. RPKI helps solve some of these problems, but for some forged paths, we need to rely on a simpler mechanism: First AS enf…
Improve your application resilience with Amazon Cognito multi-Region replication
Amazon Cognito now offers multi-Region replication that automatically synchronizes user data, credentials, and pool configurations to a secondary AWS Region, enabling uninterrupted authentication during regional failover…
Introducing the Services Track and Partner Hub of the Claude Partner Network
What we learned mapping a year’s worth of AI-enabled cyber threats
Codex is becoming a productivity tool for everyone
The Next Era of Knowledge Work report explores how Codex is transforming productivity through AI-powered research, data analysis, workflow automation, and content creation.
Advancing youth safety and opportunity through global leadership
OpenAI calls for global action on youth AI safety, proposing an international institute to strengthen safeguards, standards, and opportunities for young people.
Codex for every role, tool, and workflow
Discover new Codex plugins, sites, and annotations that help analysts, marketers, designers, investors, and other teams get more done with AI.
Travelers deploys AI-powered claims countrywide with OpenAI
Travelers built an AI-powered Claim Assistant with OpenAI to guide customers through filing claims, provide 24/7 support, and scale operations during peak demand.
Deploy Agentic-Ready AI at the Edge with Memory Efficiency in NVIDIA JetPack 7.2
As AI agents move from the digital world to the physical environment, they can readily use NVIDIA Jetson to accelerate real-world deployment with optimized...
Deploy Self-Evolving Agents for Faster, More Secure Research with a Hermes Agent and NVIDIA NemoClaw
AI agents are a powerful tool for synthesizing data to accelerate research, summarize information, and help teams make decisions faster. But combining internal...
Build Personal AI Agents on Windows PCs with New Tools from Microsoft and NVIDIA
AI agents are changing how you interact with your PC. Creators, developers, and AI enthusiasts are already using these agents extensively to assist with...
Semantic IDs: Product Understanding at Scale
Key Contributors: Shrikar Archak, Karuna Ahuja, Soroush Sobhkhiz, Marko Avdalovic, Xiyu Wang, JiChao Zhang, Hao Yan, Chris Hartley Introduction Operating a grocery catalog at Instacart’s scale means managing millions of…
From Scoring to Spelling: Rebuilding Ads Retrieval at Instacart
Key Contributors: Karuna Ahuja, Marko Avdalovic, Soroush Sobhkhiz, Shrikar Archak, Xiyu Wang, Ji Chao Zhang, Hao Yan Introduction Every time a user opens Instacart, they see product recommendations: on the retailer home…
Holo3.1: Fast & Local Computer Use Agents
GitHub Copilot app: The agent-native desktop experience
At Microsoft Build 2026, GitHub introduced new tools, updates, and surfaces so agents can work the way you already work. The post GitHub Copilot app: The agent-native desktop experience appeared first on The GitHub Blog…
Expanding Project Glasswing
When history fails you, borrow from geography
How Airbnb used sequential geographic recovery signals and prior propagation to generate reliable corridor-level forecasts when local data was scarce. By: Harrison Katz The problem with unprecedented shocks Almost every…
OpenAI frontier models and Codex are now available on AWS
OpenAI frontier models and Codex are now generally available on AWS, giving enterprises a new path to build with OpenAI through the AWS environments, controls, and procurement workflows they already use. Customers can ge…
Building the infrastructure for the Intelligence Age in Michigan
OpenAI breaks ground on a 1GW data center project in Michigan as part of Stargate, building AI infrastructure to expand access, create jobs, and support communities.
Our views on AI policy and political advocacy
Our approach to AI policy and political advocacy: transparency, support for thoughtful regulation and AI safety, and the principle that no outside political group speaks on the company’s behalf.
NVIDIA DSX OS Delivers Open, Modular Software for Operating AI Factories at Scale
AI is now essential infrastructure, powered by AI factories that generate intelligence in the form of tokens. As demand grows, these factories must scale...
NVIDIA Vera CPU Sets a New Standard for Agentic Workloads in AI Factories
Each wave of AI has created a new scaling law. Pretraining scaled intelligence through larger datasets, more parameters, and massively parallel GPU systems....
Advancing AI Infrastructure for Agentic AI with NVIDIA DOCA In-Silicon Security
The AI era is driving a new class of infrastructure: AI factories that transform data into intelligence for autonomous AI agents operating at unprecedented...
Develop Physical AI Reasoning, World, and Action Models with NVIDIA Cosmos 3
Physical AI systems must understand the real world before they can act within it. Robots, autonomous vehicles, and smart spaces need to understand what's...
How to Post-Train Autonomous Vehicle Models in Closed-Loop with NVIDIA Alpamayo
Developing autonomous vehicle (AV) policies requires bridging an important gap between training and deployment. Vision-language-action (VLA) models that can...
Run Local AI Agents with Faster Models and Multi-Node Clustering on NVIDIA DGX Spark
The rise of autonomous, long-running AI agents has introduced a new class of compute demand, namely tasks that maintain large context windows, spawn concurrent...
Beyond LLMs: Why Scalable Enterprise AI Adoption Depends on Agent Logic
Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains
RWS and Cohere Build Top-Performing AI Language Intelligence for the Enterprise
How we reduced core unit boot time from hours to minutes
We investigated why firmware updates were causing our core servers to take four hours to reboot. By diving into UEFI data structures and iPXE automation, we eliminated unnecessary timeouts and cut boot times back down to…
AWS Weekly Roundup: Claude Opus 4.8 on AWS, Aurora MySQL with Kiro Powers, and more (June 1, 2026)
In my last Week in Review post, I shared what I’d been hearing from customers in the AI-Driven Development Lifecycle (AI-DLC) workshops I’ve been delivering. Last week I was back at it, this time in Denver for a two-day…
Get started with OpenAI GPT-5.5, GPT-5.4 models, and Codex on Amazon Bedrock
OpenAI frontier models GPT-5.5 and GPT-5.4, and Codex, the OpenAI coding agent, are available on Amazon Bedrock. Deploy frontier models on Bedrock's high performance inference engine with built-in security, governance, a…
Anthropic confidentially submits draft S-1 to the SEC
A shared playbook for trustworthy third party evaluations
OpenAI shares guidance on third-party AI evaluations, covering how to assess model capabilities, safeguards, and validity for frontier systems.
Strengthening societal resilience with Rosalind Biodefense
OpenAI launches Rosalind Biodefense, expanding trusted access to GPT-Rosalind for vetted developers and U.S. government partners advancing biodefense, public health, and pandemic preparedness through frontier AI.
How Braintrust turns customer requests into code with Codex
How Braintrust engineers use Codex with GPT-5.5 to run experiments and code faster.
Boston Children’s uses AI to unlock new diagnoses
Boston Children’s Hospital uses OpenAI technology to improve patient care, reduce operational burden, and help diagnose more than 40 rare disease cases.
Run Step 3.7 Flash on NVIDIA GPUs with Enterprise-Ready Multimodal AI
AI applications are moving beyond text generation to multimodal systems that can perceive, search, and reason across images, documents, video, and...
How to Automate AI Model Documentation with the NVIDIA MCG Toolkit
As AI models grow in complexity and regulatory scrutiny intensifies under frameworks including California’s AB-2013 and the EU AI Act, software teams...
DynoSim: Simulating the Pareto Frontier
Modern LLM serving is hard to tune because each deployment is a stack of interacting choices: model backend, tensor-parallel shape, prefill/decode split, worker...
From Silos to Service Topology: Why Netflix Built a Real-Time Service Map
By Parth Jain, Rakesh Sukumar, Yingwu Zhao, Renzo Sanchez, and Nathan Fisher How we built a living map of our distributed infrastructure to help engineers understand dependencies, troubleshoot faster, and keep Netflix runn…
High-Throughput Graph Abstraction at Netflix: Part I
By Oleksii Tkachuk, Kartik Sathyanarayanan, Rajiv Shringi Introduction Netflix has a diverse range of graph use cases, each serving specific business needs with unique functionality and performance requirements. These…
Profiling in PyTorch (Part 1): A Beginner's Guide to torch.profiler
From decentralized Docs-as-Code to a centralized repository: Evolving Grab's documentation strategy
Introduction: The journey of documentation at Grab In early 2021, Grab adopted a Docs-as-Code approach to address gaps in our technical documentation processes, as illustrated in our blog post Embracing a Docs-as-Code.…
Solo founding is at an all-time high: Top performers have these traits in common
In 2025, solo founders in the top decile generated 61 times the revenue of the median solo founder in their first six months. We analyzed the data to understand what drives that gap.
Slack AI: The Path to Multi-Cloud
In early 2023, Slack faced a foundational challenge: serving Large Language Models (LLMs) at enterprise scale with the security, reliability, and performance our customers expect. Over three years, we evolved from basic…
OpenAI’s Frontier Governance Framework
Explore OpenAI’s Frontier Governance Framework and how our AI safety, security, and risk practices align with emerging EU and California regulations.
MUFG aims to become AI-native with OpenAI
MUFG uses ChatGPT Enterprise to build an AI-native organization, improve workflows, and deliver new AI-powered financial services at scale.
How Endava builds an agentic organization with Codex
Learn how Endava uses Codex to build an agentic organization, accelerating software delivery and reducing requirements analysis from weeks to hours.
Data Formulator 0.7: AI-powered data analytics for enterprise data
Data Formulator introduces AI-powered analytics for enterprise data workflows. Data teams can easily bring enterprise data into an AI-ready workspace where users can explore, analyze, and visualize data with AI agents to…
AI Now Summit 2026
Vibe gets to work.
Introducing Search Toolkit
Improve your agentic developer tools by grounding in Microsoft Learn
Development workflows span terminals, IDEs, background agents, and custom assistants. What matters is whether they draw from the same current source. Learn MCP Server gives any MCP-compatible agent direct access to curre…
Agentic coding is only as good as its context
Every week, another coding agent demo shows a prompt turning into a merge request in under five minutes. These demos often highlight a narrow use case not yet in production, and they skip everything that happens after th…
Claude Opus 4.8 on GitLab: Complex agentic work, less disruption
Anthropic’s latest model on GitLab is built for precise execution across complex multi-step agent work. Agents fail most often on complex, multi-step work: tasks that span multiple tools and go from intent to production…
GitLab Patch Release: 19.0.1, 18.11.4, 18.10.7
Still a developer. Just outside. Our latest GitHub Shop collection is here.
The ESC collection lets you escape the confines of your desk and get out into the sun where good ideas are bound to happen. The post Still a developer. Just outside. Our latest GitHub Shop collection is here. appeared fi…
Beyond code generation: rethinking engineering productivity in the age of AI agents
How Dropbox is moving from AI tools that assist engineers to agentic systems that can execute scoped tasks, and how we’re building platforms to support those workflows.
Official Discord Integrations for Steal a Brainrot, Grow a Garden, Brookhaven RP, and more
How some of the most popular Roblox games have integrated Discord account linking to enhance social features and safety capabilities for official community servers.
How we built Cloudflare's data platform and an AI agent on top of it
Here’s how we built Town Lake, Cloudflare's unified analytics platform, alongside Skipper, an internal AI agent running on top of it.
Introducing the next generation of Amazon OpenSearch Serverless for building your agentic AI applications
AWS rebuilt Amazon OpenSearch Serverless from the ground up for agentic AI and dynamic workloads. Get instant autoscaling and up to 60% cost savings.
Introducing the next generation of AWS Resilience Hub for generative AI-based SRE resilience journey
AWS launches the next generation of AWS Resilience Hub with a significantly expanded experience that brings together a new application model, dependency discovery assessment, generative AI-powered failure mode analysis,…
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026
Anthropic raises $65B in Series H funding at $965B post-money valuation
Introducing Claude Opus 4.8
Beyond the Menu Tree: How Yelp Built a Smarter Customer Success Chatbot with AI
The Evolution of Support: From Fixed Phrases to Conversation At Yelp, delivering responsive and accurate customer support is a core priority. For years, our legacy Customer Success (CS) Chatbot provided support by guidin…
Expanding Stripe Radar to protect more of your business
Radar now blocks high-risk transactions across all supported payment methods; defends against new fraud types like multi-account abuse and pay-as-you-go abuse, regardless of which payment processor you use; and gives pla…
Agentforce’s Agent Script: Building Deterministic Control for Enterprise AI Workflows
In our Engineering Energizers Q&A series, we highlight the engineering minds driving innovation across Salesforce. Today, we spotlight Elijah Ben Izzy, Software Engineering Architect at Salesforce. Elijah is building Age…
Election information and safeguards in 2026
Ahead of global elections, we’re helping people access information, supporting cyber defenders, and increasing AI transparency
Warp’s big bet on building open source with GPT-5.5
Warp uses GPT-5.5 and OpenAI models to coordinate coding agents across local, cloud, and open-source development workflows.
Building self-improving tax agents with Codex
See how OpenAI, Thrive, and Crete built a self-improving tax agent with Codex, automating filings, improving accuracy, and accelerating workflows.
Cisco and OpenAI redefine enterprise engineering with Codex
Cisco and OpenAI are redefining enterprise engineering with Codex, helping Cisco scale AI-native development, accelerate AI Defense work, and automate defect remediation.
What’s New for Game Developers in NVIDIA RTX: DLSS 4.5 for UE5 and Multilingual AI Characters
NVIDIA RTX provides game developers with direct paths to AI-driven characters, frame generation, and ray-traced rendering. This post walks through a meaningful...
NVIDIA Blackwell Sets STAC-AI Record for LLM Inference in Finance
Large language models (LLMs) are revolutionizing the financial trading landscape by enabling sophisticated analysis of vast amounts of unstructured data to...
NVIDIA Dynamo Snapshot: Fast Startup for Inference Workloads on Kubernetes
The cold-start problem In production inference deployments, demand fluctuates over time, requiring inference replicas to scale elastically. However,...
Extending Human Intelligence Through AI
Understanding AI as an extension of human intelligence—not a replacement for it—offers a more grounded path for building trustworthy AI systems. The post Extending Human Intelligence Through AI appeared first on Microsof…
Physics AI research that’s shaping the industry.
Introducing physics AI at Mistral: the foundation for engineering acceleration.
How AI coding agents actually use your technology
You ship an SDK, a CLI, an API, and developers use it. Now AI coding agents use it too, except they use it differently than humans do. Most of the time you have no idea what’s actually happening between developer types a…
Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL
Reachy Mini goes fully local
Cohere and Mila partner to advance Quebec French language and cultural context in AI
Iran's Internet is partially restored, Cloudflare Radar data shows
Cloudflare Radar data confirms early indications of a partial Internet restoration in Iran, nearly three months after the shutdown began. Traffic spikes and DNS queries have risen, but network activity is currently just…
Meet Our Newest AWS Heroes – May 2026
We’re excited to welcome four outstanding community leaders as our newest AWS Heroes. These individuals embody the spirit of collaboration and knowledge sharing that makes the AWS community thrive. From building AI-power…
Anthropic opens Milan office to support Italian enterprise, research, and developers
Building an Enterprise Agent Platform: Enforcing Identity, Data, and API Governance
While enterprises deploy AI agents at a rapid pace, their governance strategies often remain fragmented. Most organizations enforce identity, data access, and API security in separate silos, which creates dangerous gaps…
Run Key Genomics and Protein Folding Workloads Faster with NVIDIA RTX PRO 4500 Blackwell
Precision medicine depends on two fundamental capabilities: understanding disease at the genomic level and identifying treatments at the molecular level. ...
NVIDIA CUDA 13.3 Enhances GPU Development with Tile Programming in C++, Compiler Autotuning, and Python Updates
NVIDIA CUDA 13.3 brings new capabilities and performance optimizations to developers across the CUDA ecosystem. The launch of NVIDIA CUDA Tile programming in...
Develop High-Performance GPU Kernels in C++ with NVIDIA CUDA Tile
Developers can now use NVIDIA CUDA Tile programming within large existing C++ GPU codebases to develop highly optimized GPU kernels using tile-based...
Extract More Kernel Performance with NVIDIA CompileIQ Auto-Tuning
NVIDIA CompileIQ tackles one of the hardest problems in performance engineering: finding the compiler options that unlock the best performance for a specific...
SilverTorch: Index as Model — A New Retrieval Paradigm for Recommendation Systems
We’re introducing SilverTorch, a reimagining of recommendation systems that unifies all retrieval components for user generated content under a unified architecture. SilverTorch shows up to 23.7x higher throughput compar…
Reduce supply chain risk with SBOM-based dependency scanning
Third-party code dominates most codebases, and four recent supply chain incidents show how a single compromised package can ripple into every project that depends on it. AI is compounding this problem: Research suggests…
Full security scanner coverage of your codebase in minutes
Across the industry, every CI/CD platform faces the same challenge: As organizations grow, manually configuring scanners to run across every pipeline definition file isn’t scalable. AI is accelerating how fast teams ship…
Shaping Product Understanding with Contrastive Reinforcement Learning
Etsy’s marketplace is defined by the creativity and craftsmanship of our sellers and the hundreds of millions of highly diverse products they offer. You can find silversmiths who cold-forge recycled sterling silver, weav…
Our 2026 Summer Merch Collection Is Here
Anthropic appoints KiYoung Choi as Representative Director of Korea ahead of Seoul office opening
OpenAI, Grupo Folha and Grupo UOL announce strategic content partnership
OpenAI partners with Grupo Folha and Grupo UOL to bring trusted Brazilian journalism to ChatGPT, expanding access to news with attribution and transparency.
Harness, Scaffold, and the AI Agent Terms Worth Getting Right
GitHub for Beginners: Getting started with Git and GitHub in VS Code
Discover how to use VS Code to interact with GitHub and maintain your projects. The post GitHub for Beginners: Getting started with Git and GitHub in VS Code appeared first on The GitHub Blog.
AWS Weekly Roundup: AWS Local Zones in Istanbul, open-source ExtendDB, Kiro Web, and more (May 25, 2026)
There’s something genuinely energizing about working with startups — something I’ve been doing intensely for more than two years now. Startups operate at a different frequency: the urgency is real, the constraints are ti…
Anthropic co-founder Chris Olah's remarks on Pope Leo XIV's encyclical "Magnifica humanitas"
Emmi joins Mistral to accelerate the AI-native industry
Agent Fabric Context Catalog and the Future of AI Governance
Modern agents no longer execute within predictable application boundaries. They invoke APIs dynamically, retrieve enterprise context through MCP servers, orchestrate workflows across multiple platforms and interact with…
OpenAI named a Leader in enterprise coding agents by Gartner
OpenAI is named a leader in the 2026 Gartner Magic Quadrant for Enterprise AI Coding Agents, with Codex recognized for innovation and enterprise-scale deployment.
How Virgin Atlantic ships faster with Codex
How Virgin Atlantic used Codex to ship its revamped mobile app on a fixed holiday travel deadline, reaching near-total unit test coverage and zero P1 defects.
Synthesize Realistic 3D Medical Images at Scale to Ship Pre‑Trained Models
High‑quality 3D medical imaging data is the foundation of modern radiology AI, but access to it is often constrained by data scarcity, privacy restrictions,...
Remote agents in Vibe. Powered by Mistral Medium 3.5.
Connect the dots: Build with built-in and custom MCPs in Studio
How AI Changes the Role of Applied Scientists
Levi Boxell, Tilman Drerup, Alexandr Lenk The Economics Team at Instacart is an applied science team that operates at the intersection of machine learning engineering and economics. Similar to other applied science teams…
The Hugo evolution: Engineering Grab's unified, one-click data ingestion platform with Apache Flink
Introduction Data drives every decision we make at Grab. As our operations scale, so does our need for robust, real-time data ingestion and processing frameworks. Enter Hugo: our self-service data platform that has long…
VSAS-Bench: Real-Time Evaluation of Visual Streaming Assistant Models
Streaming vision-language models (VLMs) continuously generate responses given an instruction prompt and an online stream of input frames. This is a core mechanism for real-time visual assistants. Existing VLM frameworks…
How Partition Access Visualizations Reduced our Data Lake S3 Cost by 33%
Introduction In large analytics environments, data teams often struggle to answer deceptively simple questions, like who their stakeholders are and how their data is being used. At Yelp, we address this by visualizing ac…
Making User-Sequence Data More Cost-Efficient, Faster, and Easier to Use
Authors (listed alphabetically) Ads Feature Engineering Infra team: Ajay Venkatakrishnan, Le Zhang Core ML Infra team: Eric Shang, Pihui Wei ML Data team: Connor Votroubek, Yi He User Understanding team: Camilo Munoz,…
AdventHealth advances whole-person care with OpenAI
AdventHealth is using ChatGPT for Healthcare to streamline workflows, reduce administrative burden, and return more time to patient care.
Building Token‑Metered AI Services on Telco AI Factories
Telcos around the world are building sovereign AI factories based on the NVIDIA Cloud Partner (NCP) reference architecture, giving governments, enterprises, and...
Unlock Exascale Performance on NVIDIA GB200 NVL72 with Slurm Topology-Aware Job Scheduling
As AI models grow in scale and complexity, realizing the full performance of modern accelerated infrastructure depends as much on how workloads are placed as on...
Get Real-Time Visibility into GPU Usage Across Kubernetes Clusters
Maximizing the value of AI infrastructure demands deep visibility into GPU utilization. Yet many platform teams running AI workloads on Kubernetes operate with...
Automating and Optimizing Financial Signal Discovery with Multi-Agent Systems
In quantitative finance, researchers build algorithms to trade assets, derivatives, and other financial instruments. A key part of that work is finding signals:...
Vega: Zero-knowledge proofs for digital identity in the age of AI
Vega turns a full credential into a single proof, sharing only what is needed and nothing more, with performance that works in real apps. The post Vega: Zero-knowledge proofs for digital identity in the age of AI appeare…
MagenticLite, MagenticBrain, Fara1.5: An agentic experience optimized for small models
MagenticLite is an agentic system for small models that works across the browser and local file system in a single workflow. It combines specialized models and orchestration to support efficient agentic performance on ev…
Announcing Web Serial Support in Firefox
Support for Web Serial in Firefox 151 for Desktop Firefox can now connect directly to microcontrollers, development boards, 3D printers, power meters, and other serial-connected hardware from the web. Starting in Firefox…
The AX stack: what’s fixed, where you can win
AI coding agents promise to make you more productive. On the surface they do, but in practice they fall short: agents generate code that doesn’t compile, use a deprecated SDK, or pick the wrong service entirely. Is it yo…
GitLab 19.0 released
More AI models for GitLab Duo Agent Platform Self-Hosted
Customers running GitLab Duo Agent Platform Self-Hosted operate under constraints many software teams don’t face: data residency mandates, air-gapped networks, and compliance regulations that prohibit sending source code…
Manage CI/CD credentials with GitLab Secrets Manager
Many credential leaks start with a developer who needs a credential, doesn’t have a good place to put it, and improvises. It lands in an over-scoped CI/CD variable, a config file, or a .env committed “just for a moment.”…
Track CI component usage across your organization
If your platform team publishes standardized pipeline components, you’ve probably encountered this: once they’re out in the wild, you lose visibility. You can’t see if anyone’s actually using it, who’s on which version,…
Transform MRs from manual tasks to an automated workflow
AI made writing code dramatically faster, but the work between opening a merge request and merging it has stayed almost entirely manual. Assigning reviewers, addressing feedback round after round, untangling conflicts, r…
Introducing Nova, our internal platform for coding agents
Nova lets engineers run multiple coding sessions in parallel and lets internal systems use AI agents as part of automated workflows.
Making It Easier Than Ever to Connect with Friends in League & VAL!
Starting soon, you’ll be able to link your Discord and Riot accounts to sync your friends lists, show your in-game activity as your Discord status, and invite your Discord friends directly to your League or VAL lobby.
We’re launching the Google DeepMind Accelerator program in Asia Pacific to tackle environmental risks
Announcing Claude Compliance API support with Cloudflare CASB
Cloudflare now integrates with the Claude Compliance API, so that security teams can monitor Claude Enterprise activity directly in the Cloudflare Dashboard.
Optimizing Our Build Times by Migrating from Webpack to Rspack
Over the years, Webpack has remained the bundler of choice for many JS projects, including here at Yelp. While it has served us well, its speed has increasingly become a bottleneck as our monorepo continues to grow. Fort…
An OpenAI model has disproved a central conjecture in discrete geometry
An OpenAI model solved the 80-year-old unit distance problem, disproving a major conjecture in discrete geometry and marking a milestone in AI-driven mathematics.
The next phase of OpenAI’s Education for Countries
OpenAI advances Education for Countries, expanding AI adoption in schools with new partnerships, teacher training, and tools to improve global learning outcomes.
How Ramp engineers accelerate code review with Codex
How Ramp engineers use Codex with GPT-5.5 to review code and ship improvements, allowing them to get substantive feedback in minutes instead of hours.
Add a Specialized Deep Research Skill to Agent Harnesses
Agent harnesses like Claude Code, Codex, and LangChain Deep Agents are excellent orchestrators. They manage sessions, chain tools, execute code, and respond to...
Mastering Agentic Techniques: AI Agent Customization
Autonomous AI agents are taking on all types of work for businesses: routing logistics fleets, triaging support tickets, generating code, and orchestrating...
Introducing Command A+: Making sovereign agentic capabilities available to all
Cohere releases Command A+ for agentic AI sovereignty
How Salesforce Built an AI Security Agent for Autonomous Threat Triage
In our Engineering Energizers Q&A series, we highlight the engineering minds driving innovation across Salesforce. Today, we spotlight Mor Levi, Vice President of Detection, Analysis and Response at Salesforce, who leads…
Advancing content provenance for a safer, more transparent AI ecosystem
OpenAI advances AI content provenance with Content Credentials, SynthID, and a verification tool to help people identify and trust AI-generated media.
Introducing OpenAI for Singapore
OpenAI for Singapore launches a multi-year AI partnership to expand deployment, build local talent, and support businesses and public services with AI.
Mastering Agentic Techniques: AI Agent Evaluation
Evaluating an AI model and evaluating an AI agent are related—but they answer fundamentally different questions. A model benchmark tests the capability of a...
NVIDIA-Verified Agent Skills Provide Capability Governance for AI Agents
Autonomous AI agents are becoming more capable. Open models, Model Context Protocol (MCP)-connected tools, and portable skills are also making agents easier to...
Agentic-Agile: Why Agent Development Needs Agile (Not Just Prompts)
“A bad system will beat a good person [or agent] every time.” ~Dr. William Edwards Deming (with apologies). I started vibe coding by writing prompts (often dictated into my phone), refining them with an agent in M365 Copilot…
Introducing the Ettin Reranker Family
OlmoEarth v1.1: A more efficient family of Earth observation models
Cohere acquires Reliant AI to expand sovereign enterprise AI for the global biopharma and healthcare sectors
Announcing Claude Managed Agents on Cloudflare
Cloudflare has integrated with Anthropic's Claude Managed Agents to provide a fast, isolated execution environment for autonomous code delivery. This means builders can scale agent workflows globally while strictly contr…
EpiCache: Episodic KV Cache Management for Long-Term Conversation on Resource-Constrained Environments
Modern large language models (LLMs) extend context lengths to millions of tokens, enabling coherent, personalized responses grounded in long conversational history. However, the Key-Value (KV) cache grows linearly with t…
Widening the conversation on frontier AI
KPMG integrates Claude across its core business and workforce of more than 276,000 in strategic alliance
Scaling Airbnb’s identity graph with a unified knowledge graph infrastructure
How Airbnb shifts from PaaS to an internal knowledge graph infrastructure at scale. By: Lucen Zhao, Shukun Yang, Ashish Jain. Knowledge graphs offer a natural and powerful way to represent relationships between entities…
Better Experiments with LLM Evals — A funnel, not a fork
TL;DR: LLM evals, automated judges that assess relevance, coherence, and quality at scale, are a powerful new...
OpenAI and Dell partner to bring Codex to hybrid and on-premise enterprise environments
OpenAI and Dell partner to bring Codex to hybrid and on-premise environments, helping enterprises deploy AI coding agents securely across data and workflows.
PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend
Codex and GitLab: From code fix to production
Codex, a coding agent, is a lot of fun when you are deep in the terminal. Point it at a repository, give it a focused task, and it gets to work fast. It reads the code, proposes a fix, runs commands, and helps you move f…
Beyond BYOK: Why governance matters for AI agents
GitHub recently announced that Copilot CLI now supports bring-your-own-key (BYOK) and locally running models. Developers can route CLI requests through their own model provider or run a local model entirely offline. But…
GitLab Dedicated for Government now GovRAMP-authorized
State and local agencies now have a faster path to adopting secure, compliant DevSecOps. GitLab Dedicated for Government has achieved GovRAMP Authorization, removing a critical procurement barrier for agencies ready to m…
Every Voice and Video Call on Discord Is Now End-to-End Encrypted
As of March 2026, E2EE is now enforced for every voice and video call on Discord. This represents a multi-year commitment, and Discord’s VP of Engineering is here to talk about why it matters.
Fast-tracking genetic leads to reverse cellular aging
Biologists use Co-Scientist to find novel factors that successfully rejuvenate human cells.
Project Glasswing: what Mythos showed us
In recent weeks, we pointed Mythos and other security-focused LLMs at live code across critical parts of our infrastructure. We share what we observed, the models’ strengths and weaknesses, and what the work around them…
AWS Weekly Roundup: AWS Transform at 1 year, Claude Platform on AWS, EC2 M3 Ultra Mac instances, and more (May 18, 2026)
Just a year ago, we launched AWS Transform for .NET, Mainframe and VMware workloads, the first agentic AI service purpose-built for modernizing enterprise applications at scale. At re:Invent 2025, we introduced AWS Trans…
Anthropic acquires Stainless
Making it easier to understand how content was created and edited
We're expanding our tools to help you understand how content was created and edited across the web.
Gemini for Science: AI experiments and tools for a new era of discovery
A collection of science tools and experiments to expand the scale and precision of scientific exploration.
Introducing Google Antigravity 2.0
Introducing Gemini Omni
Simulate real-world places with Project Genie and Street View
We’re expanding access to Google AI Ultra subscribers globally and introducing a new capability powered by Street View.
How WeatherNext helped the National Hurricane Center better predict Hurricane Melissa’s historic landfall in Jamaica
Learn how our WeatherNext AI model helped forecasters give communities unprecedented time to prepare ahead of the historic Hurricane Melissa.
Uncovering repurposed medicines to fight liver fibrosis
Stanford geneticist uses Co-Scientist to help find new treatments for chronic liver disease and liver fibrosis.
Uniting biological toolkits for a new approach to ALS
Co-Scientist unites Boston Children’s Hospital and MIT’s labs to explore new RNA-based treatments for ALS.
Accelerating discovery of liver disease mechanisms
Filippo Menolascina uses Co-Scientist to identify new liver disease treatments and explain why existing drugs only help certain patients.
Opening new paths in aging research
Calico Life Sciences uses Co-Scientist to connect scattered findings and generate new leads in aging research.
Finding the molecular switches behind new infectious diseases
Clare Bryant uses Co-Scientist to identify genetic triggers in emerging infectious diseases.
Strengthening Singapore’s AI Future: A New National Partnership
Google DeepMind and Singapore partner to apply frontier AI to address complex challenges across health, education, sustainability, and more.
Further Notes on Our Recent Research on AI Delegation and Long-Horizon Reliability
Our recent paper, “LLMs Corrupt Your Documents When You Delegate”, has generated discussion about the reliability of AI systems in delegated workflows. We appreciate the interest in this work and want to clarify several…
Scaling developer experience: How we improved Android Studio in a large monorepo
Introduction: Long integrated development environment (IDE) sync/indexing times can quietly erode developer productivity, making code navigation sluggish, spiking memory usage, and slowing down Jetpack Compose preview upd…
Gemini 3.5: frontier intelligence with action
Gemini 3.5 is built to help you execute complex, agentic workflows.
Creating a Multi-Tenant AI Agent Platform Handling 7K+ Sessions Without Cross-Team Interference
In our Engineering Energizers Q&A series, we highlight the engineering minds driving innovation across Salesforce. Today, we spotlight Priyanka Saraf, Senior Software Engineer on the Agentforce Foundations team. Priyanka…
How the NVIDIA Vera Rubin Platform is Solving Agentic AI’s Scale-Up Problem
Agentic inference has fundamentally changed the runtime dynamics of inference workloads by introducing non-deterministic trajectories—actions, observations,...
Scaling Personalized Marketing for Multi-Tenant Commerce Platforms
TL;DR. Background: Marketing Across Marketplace and Storefront. Instacart operates across two distinct commerce experiences: Instacart Marketplace, our first-party consumer marketplace; Storefront Pro, our white-label e-com…
Unlocking asynchronicity in continuous batching
Granite Embedding Multilingual R2: Open Apache 2.0 Multilingual Embeddings with 32K Context — Best Sub-100M Retrieval Quality
Our billing pipeline was suddenly slow. The culprit was a hidden bottleneck in ClickHouse
When a partitioning change to our petabyte-scale ClickHouse cluster caused critical billing jobs to stall, standard metrics showed no obvious errors. This post explores how we identified severe lock contention in ClickHo…
Amazon Bedrock introduces new advanced prompt optimization and migration tool
Amazon Bedrock Advanced Prompt Optimization enables customers to optimize their prompts for their current model or migrate prompts to new models faster than before with built-in evaluation feedback loops. Optimize your p…
Anthropic forms $200 million partnership with the Gates Foundation
PwC is deploying Claude to build technology, execute deals, and reinvent enterprise functions for clients
Accelerated X-Ray Analysis for Nanoscale Imaging (XANI) of Novel Materials
A massive-scale X-ray free-electron laser (XFEL) enables tracking structural and electron dynamics in novel systems, including fusion materials, semiconductors,...
Transform Video Into Instantly Searchable, Actionable Intelligence with AI Agents and Skills
In today’s data-driven world, organizations increasingly rely on video to capture critical information, yet extracting meaningful, real-time insights from...
GridSFM: A new, small foundation model for the electric grid
Introducing GridSFM, a small foundation model that can predict AC optimal power flow in milliseconds, boosting efficiency and unlocking cost savings. Learn how GridSFM gives grid operators direct visibility into congesti…
mimalloc: A new, high-performance, scalable memory allocator for the modern era
mimalloc is an open-source, modern, scalable memory allocator that is a drop-in replacement for malloc and free. It is relatively small (~12K lines), with clear internal data structures, and is easy to build and integrat…
Reel Friends: Building Social Discovery that Scales to Billions
On its face the new Friend Bubbles feature looks simple enough. It highlights Reels your friends have watched and reacted to. But sometimes the features that seem the most straightforward require the deepest engineering…
Celebrate Discord’s 11th Birthday with an Exclusive Set of Emoji and Wallpapers
Discord is turning 11 this year! To celebrate, we made over twenty Discord-themed emojis, along with over twenty wallpapers, and even a digital poster for everyone to download, for free!
Browser Run: now running on Cloudflare Containers, it’s faster and more scalable
We’ve enabled higher usage limits, faster performance, better reliability, and increased shipping velocity for our Browser Run product by rebuilding on top of Cloudflare’s Containers. Here’s how.
Introducing Claude for Small Business
Viaduct 1.0 and the future of Airbnb’s data mesh
Moving from an internal tool to a community-driven, production-ready data mesh. By: Ryan Tanner, Raymie Stata, Adam Miskiewicz. Introduction: We’re excited to announce the 1.0 release of Viaduct. This release marks…
An Engineer’s Guide to Better AI Skills: Implementing a Testing Process to Optimize Agent…
An Engineer’s Guide to Better AI Skills: Implementing a Testing Process to Optimize Agent Performance in Any Repository or Skill. Author: Daniel Reed. The tech industry is currently seeing a massive overhaul in the way we…
How to Eliminate Pipeline Friction in AI Model Serving
The path from a trained AI model to production should be smooth, but rarely is. Many teams invest weeks fine-tuning models, only to discover that exporting to a...
Advancing AI for materials with MatterSim: experimental synthesis, faster simulation, and multi-task models
MatterSim is expanding what AI can do for materials science, from faster large-scale simulations to MatterSim-MT, a new multi-task model for simulating properties beyond potential energy surfaces alone.
Migrating Data Ingestion Systems at Meta Scale
Meta’s data ingestion system, which our engineering teams leverage for up-to-date snapshots of the social graph, has recently undergone a significant revamp to enhance its reliability at scale. Moving from our legacy sys…
Co-Scientist: A multi-agent AI partner to accelerate research
Introducing Co-Scientist, a collaborative AI partner built with Gemini to help researchers accelerate scientific breakthroughs.
When "idle" isn't idle: how a Linux kernel optimization became a QUIC bug
We investigated a bug where CUBIC's congestion window became pinned at its minimum floor, causing performance to plummet. The fix involved correctly measuring idle periods to distinguish RTT wait times from actual appl…
Amazon Redshift introduces AWS Graviton-based RG instances with an integrated data lake query engine
Amazon Redshift RG instances, powered by AWS Graviton, run data warehouse and data lake workloads up to 2.4x as fast as RA3 instances at 30% lower price per vCPU. Its integrated data lake query engine supports open table…
Five vertical SaaS insights from Sessions 2026
AI is forcing platforms to expand beyond pure software. See how vertical SaaS platforms are using payments, financial services, and agentic commerce to build more durable businesses.
Introducing NVIDIA Fleet Intelligence for Real-Time GPU Fleet Visibility and Optimization
The compute capability of large GPU fleets presents unprecedented opportunities to innovate and provide value to customers in record time. Yet these...
SocialReasoning-Bench: Measuring whether AI agents act in users’ best interests
Using SocialReasoning Bench, we observed a stable pattern across models—agents execute competently, but fail to consistently improve the user’s position, even with explicit instructions to optimize for user interest. The…
Labyrinth 1.1: Making End-to-End Encrypted Backups Even More Reliable
We’re rolling out version 1.1 of Labyrinth, the encrypted storage system and protocol that secures messages and history on Messenger. Labyrinth 1.1 enhances the reliability of end-to-end encrypted backups with a new sub-…
Building Blocks for Foundation Model Training and Inference on AWS
How to Use Nitro: A Beginner’s Guide to Discord’s Premium Subscription
What’s Discord Nitro all about? What perks does it give, and how can you get it? If you’re looking to expand your Nitro knowledge, you’re in the right place.
Nitro Now Comes with Xbox Game Pass and New Benefits. Welcome to Nitro Rewards.
As we hit Nitro’s 10-year anniversary, we’re launching Nitro Rewards: a brand-new benefits program built with some of the biggest names in gaming. See what’s coming for Nitro members, for no added cost.
AWS Weekly Roundup: Amazon Bedrock AgentCore payments, Agent Toolkit for AWS, and more (May 11, 2026)
My most exciting news of last week: Amazon Bedrock AgentCore previewed the first managed payment capabilities enabling AI agents to autonomously access and pay for APIs, MCP servers, web content, and other agents. Built…
BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning
Image captioning is one of the most fundamental tasks in computer vision. Owing to its open-ended nature, it has received significant attention in the era of multimodal large language models (MLLMs). In pursuit of ever m…
Enhancing Ad Relevance: Integrating Real-Time Context into Sequential Recommender Models
Huiqin Xin | Machine Learning Engineer II, Ads Vertical Modeling; Lakshmi Manoharan | Senior Machine Learning Engineer, Ads Vertical Modeling; Karthik Jayasurya | Staff Machine Learning Engineer, Ads Signals; Ziwei Guo |…
Streaming Tokens and Tools: Multi-Turn Agentic Harness Support in NVIDIA Dynamo
An agentic exchange must preserve a structured interaction: assistant turns interleave reasoning with one or more tool calls, and subsequent user turns return...
Improving Bash Generation in Small Language Models with Grammar-Constrained Decoding
Bash is one of the most flexible and powerful interfaces exposed to AI agents. In the right system, a model that emits grep, curl, tar, or a shell pipeline is...
Scaling ArchUnit with Nebula ArchRules
By John Burns and Emily Yuan. Introduction: At Netflix, we operate using a polyrepo strategy with tens of thousands of Java repositories. This means that we need to have ways of sharing common build logic across these repo…
How Discord Automates ScyllaDB Clusters at Scale
You’ve been asked to stand up a brand-new database cluster, meaning a whole day of configuring dozens of nodes, validating replication, wiring up dual-write pipelines… what if this whole ordeal took less than two hours?…
Adaptive Parallel Reasoning: The Next Paradigm in Efficient Inference Scaling
Velox: Learning Representations of 4D Geometry and Appearance
We introduce a framework for learning latent representations of 4D objects which are descriptive, faithfully capturing object geometry and appearance; compressive, aiding in downstream efficiency; and accessible, requiri…
Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures
We propose HeadsUp, a scalable feed-forward method for reconstructing high-quality 3D Gaussian heads from large-scale multi-camera setups. Our method employs an efficient encoder-decoder architecture that compresses inpu…
Apple Workshop on Privacy-Preserving Machine Learning & AI 2026
At Apple, we believe privacy is a fundamental human right. As AI capabilities increase and become more integrated into people’s daily lives, advancing research in privacy-preserving techniques is increasingly important t…
RVPO: Risk-Sensitive Alignment via Variance Regularization
Current critic-less RLHF methods aggregate multi-objective rewards via an arithmetic mean, leaving them vulnerable to constraint neglect: high-magnitude success in one objective can numerically offset critical failures i…
Real-Time Performance Monitoring and Faster Debugging with NCCL Inspector and Prometheus
Distributed deep learning depends on fast, reliable GPU-to-GPU communication using the NVIDIA Collective Communication Library (NCCL). When training slows down,...
Model Quantization: Post-Training Quantization Using NVIDIA Model Optimizer
Model quantization is an effective method to reduce VRAM usage and improve inference performance on consumer devices such as NVIDIA GeForce RTX GPUs. By...
Achieving Peak System and Workload Efficiency on NVIDIA GB200 NVL72 with Slurm Block Scheduling
NVIDIA GB200 NVL72 introduces a fundamentally new way to build GPU clusters by extending NVIDIA NVLink coherence across an entire rack. This design enables...
Behind the Scenes Hardening Firefox with Claude Mythos Preview
Two weeks ago we announced that we had identified and fixed an unprecedented number of latent security bugs in Firefox with the help of Claude Mythos Preview and other AI models. In this post, we’ll go into more detail a…
Enhancing Flink deployment with shadow testing
Introduction: Ensuring the reliability of Apache Flink deployments in Grab is crucial for the availability of our business-critical, real-time applications. While all applications are tested in a staging environment befor…
Stock Up in the New Rust Shop! Enjoy a Discord-Only 20% Sale on Most Items until 5/21
Starting today, you can now browse, purchase, and even gift and wishlist in-game items for Rust directly in Discord! See how it all works, and learn about a hefty two-week Discord-exclusive launch discount on a wide sele…
How Cloudflare responded to the “Copy Fail” Linux vulnerability
When a critical Linux kernel privilege escalation was publicly disclosed, Cloudflare's security and engineering teams detected, investigated, and mitigated the threat across our global fleet, confirming zero customer imp…
Building for the future
This afternoon, we sent the following email to our global team. One of our core values at Cloudflare is transparency, and we believe it's important that you hear this directly from us because it’s a major moment at Cloud…
What Matters in Practical Learned Image Compression
One of the major differentiators unlocked by learned codecs relative to their hard-coded traditional counterparts is their ability to be optimized directly to appeal to the human visual system. Despite this potential, a…
Adding Benchmaxxer Repellant to the Open ASR Leaderboard
vLLM V0 to V1: Correctness Before Corrections in RL
AlphaEvolve: How our Gemini-powered coding agent is scaling impact across fields
Explore how AlphaEvolve's Gemini-powered algorithms are driving impact across business, infrastructure, and science.
When DNSSEC goes wrong: how we responded to the .de TLD outage
On May 5, 2026, DENIC published broken DNSSEC signatures for the .de TLD, making millions of domains unreachable. Here's what 1.1.1.1 saw, how serve stale cushioned the impact, and how we restored resolution.
The AWS MCP Server is now generally available
AWS announces the general availability of the AWS MCP Server, a managed remote Model Context Protocol (MCP) server that gives AI agents and coding assistants secure, authenticated access to all AWS services. The AWS MCP…
Higher usage limits for Claude and a compute deal with SpaceX
From SSH to REST: A Security-Driven Modernization of Slack’s EMR Data Pipelines
By 2024, Slack’s data platform had accumulated 700+ SSH-based operators orchestrating critical data pipelines. We’re talking daily search indexing that processed terabytes of data, analytics jobs powering busines…
Building for the Rising Complexity of Agentic Systems with Extreme Co-Design
Generative AI’s explosive first chapter was defined by humans sending requests and models responding. The agentic chapter is different. Agents don't...
Trustworthy JavaScript for the Open Web
The open web is a critical platform for applications that handle highly sensitive data, from private communications to financial transactions and medical records. Traditionally, servers are trusted to deliver the appropr…
Azure Cosmos DB Conf 2026 Recap: Lessons from Production
A team was running at 100% RU utilization. Throttles were compounding into retries. P99 latency was degrading. The assumption was obvious: provision more throughput. They didn’t. Instead, they found a single logical part…
Modernize your workflows: Amazon WorkSpaces now gives AI agents their own desktop (preview)
Amazon WorkSpaces now lets AI agents securely operate legacy desktop applications—without APIs or modernization—using IAM authentication, MCP support, and computer vision within existing security frameworks.
Agents for financial services
Monitoring reliably at scale
Designing monitoring that works when everything else doesn’t. By: Abdurrahman J. Allawala. Introduction: When an incident hits, teams lean on observability to answer the only questions that matter: what’s broken, and why?…
Democratizing Machine Learning at Netflix: Building the Model Lifecycle Graph
Saish Sali, Nipun Kumar, Sura Elamurugu. Introduction: As Netflix has grown, machine learning continues to support our ability to deliver value to members and drive excellence across multiple areas of our business. When…
Empowering Carrot Ads with Domain Adaptive Learning
Authors: Trey Zhong, Xiyu Wang. Contributors: Joseph Haraldson, Sharad Gupta, Sarah Lamacchia. Introduction: Carrot Ads is Instacart’s omnichannel retail media solution that allows retailer partners to build and scale their…
Discord Patch Notes: May 4, 2026
Check out the finer details of the more technical fixes implemented into Discord recently.
AWS Weekly Roundup: What’s Next with AWS 2026, Amazon Quick, OpenAI partnership, and more (May 4, 2026)
Last week, I took some time off in York, England, often described as the most haunted city in the country. I wandered through the ruins of abbeys that have stood for nearly a thousand years, walked along medieval walls,…
Building a new enterprise AI services company with Blackstone, Hellman & Friedman, and Goldman Sachs
Building a Natural Language Interface to the Spotify Ads API with Claude Code Plugins
Turning OpenAPI spec and Markdown files into a conversational ads management tool, no compiled code required.
Optimizing ML Workload Network Efficiency (Part I): Feature Trimmer
Guangtong Bai | Staff Software Engineer, Product ML Infrastructure*; Shantam Shorewala | Software Engineer II, Product ML Infrastructure*; Chi Zhang | Staff Software Engineer, AI Platform*; Neha Upadhyay | Software Engin…
State of Routing in Model Serving
By Nipun Kumar, Rajat Shah, Peter Chng. Introduction: This is the first blog post in a multi-part series that shares technical insights into how our ML model serving infrastructure powers several personalized experiences…
How Meta Is Strengthening End-to-End Encrypted Backups
The HSM-based Backup Key Vault: Meta’s HSM-based Backup Key Vault provides the foundation for end-to-end encrypted backups for WhatsApp and Messenger. The system allows people to protect their backed-up message history wi…
Code Orange: Fail Small is complete. The result is a stronger Cloudflare network
We have completed a massive engineering effort to make our infrastructure more resilient. Through new tools like Snapstone and the Engineering Codex, we've implemented safer configuration changes and automated best pract…
Data Mesh at Grab (Part II): The foundational tools behind certification
Introduction: In Part I, we discussed why Grab is investing in a data mesh, referred to as the Signals Marketplace within Grab, as part of our evolving data culture. We also explained how data certification aids teams in…
Enabling a new model for healthcare with AI co-clinician
Researching the path to AI-augmented care and development of an AI co-clinician.
Everything we announced at Sessions 2026
We’re making Stripe even more programmable; protecting and propelling your business with the strength of the Stripe network; and building economic infrastructure for AI.
Giving agents the ability to pay
Link’s wallet for agents gives agents programmatic access to Link, including the ability to generate a one-time-use card or Shared Payment Token (SPT) backed by the cards and bank accounts already in your wallet. It’s bu…
DeepInfra on Hugging Face Inference Providers 🔥
Granite 4.1 LLMs: How They’re Built
You’ve Got (Too Much) Mail: Behind the Scenes of the 3/25/26 Voice Outage
On March 25th, voice and video on Discord suffered major degradation beginning at 12:13 PDT, lasting a little over three hours. Learn how the issue originated, how it affected systems across Discord, how we recovered, an…
Introducing NVIDIA Nemotron 3 Nano Omni: Long-Context Multimodal Intelligence for Documents, Audio and Video Agents
Top announcements of the What’s Next with AWS, 2026
At the "What's Next with AWS" 2026 event, AWS launched Amazon Quick—an AI assistant for work with a desktop app and expanded integrations—and expanded Amazon Connect into four agentic AI solutions for supply chain, hirin…
Claude for Creative Work
Skipper: Building Airbnb’s embedded workflow engine
How Airbnb built a lightweight workflow engine to solve durable execution. By: Ricardo Gamba, Andriy Sergiyenko. Introduction: The durable execution problem. Picture this hypothetical flow: A host submits an insurance cl…
From Clicks to Conversions: Architecting Shopping Conversion Candidate Generation at Pinterest
Authors: Richard Huang | Machine Learning Engineer II; Yu Liu | Senior Machine Learning Engineer; Ziwei Guo | Senior Machine Learning Engineer; Andy Mao | Staff Machine Learning Engineer; Supeng Ge | Sr. Staff Machine Le…
Workflows for work that runs the business
How to build scalable web apps with OpenAI's Privacy Filter
Announcing our partnership with the Republic of Korea
Google DeepMind and Korea partner to accelerate scientific breakthroughs using frontier AI models
AWS Weekly Roundup: Anthropic & Meta partnership, AWS Lambda S3 Files, Amazon Bedrock AgentCore CLI, and more (April 27, 2026)
Late March took me to Seattle for the Specialist Tech Conference, one of the most energizing gatherings of AWS specialists from around the world. It was an incredible opportunity to connect with peers, exchange experienc…
Anthropic names Theo Hourmouzis General Manager of Australia & New Zealand and officially opens Sydney office
Scaling Camera File Processing at Netflix
Orchestrating Media Workflows Through Strategic Collaboration. Authors: Eric Reinecke, Bhanu Srikanth. Introduction to Content Hub’s Media Production Suite: At Netflix, we want to provide filmmakers with the tools they nee…
DeepSeek-V4: a million-token context that agents can actually use
Measure Less to Learn More: Using Fewer, Higher-quality Metrics to Capture What Matters
Too many experiment metrics can make meaningful changes harder to detect. Learn how Discord used simulations and Principal Component Analysis to maximize signal and reduce noise.
Cohere Aleph Alpha Join Forces
Anthropic and NEC partner to build AI-native engineering at scale in Japan
LangChain.js for Beginners: A Free Course to Build Agentic AI Apps with JavaScript
Want to build AI agents with JavaScript that go beyond basic chat completions? Agents that reason, call tools, and pull from knowledge bases on their own? We put together a free, open source course to help you get there.…
How We Built a Smarter Pickup Experience for Gated Communities
If you live in a gated community, you’ve been there: You request a ride from your apartment complex, expect your driver to come to you as usual, and then — your driver’s car icon just stops right at the front gate. You w…
How to Use Transformers.js in a Chrome Extension
How Yelp Keeps Server-Driven UI Consistent Across Four Platforms
If you’ve read our earlier post, you already know about CHAOS—the server-driven UI (SDUI) framework we built at Yelp that powers our dynamic views. Until now, we’ve explored its architecture, backend implementation, and…
Background Coding Agents: Supercharging Downstream Consumer Dataset Migrations (Honk, Part 4)
How we used Honk, Backstage, and Fleet Management to ease the pain of migrating thousands of datasets.
Decoupled DiLoCo: A new frontier for resilient, distributed AI training
Modernizing the Facebook Groups Search to Unlock the Power of Community Knowledge
We’ve fundamentally transformed Facebook Groups Search to help people more reliably discover, sort through, and validate community content that’s most relevant to them. We’ve adopted a new hybrid retrieval architecture a…
AI and the Future of Cybersecurity: Why Openness Matters
QIMMA قِمّة ⛰: A Quality-First Arabic LLM Leaderboard
Partnering with industry leaders to accelerate AI transformation
Google DeepMind partners with global consultancies to bring the power of frontier AI to organizations around the world.
Why MoE models get more from speculative decoding
Building a fault-tolerant metrics storage system at Airbnb
How we built a storage system that ingests 50 million samples per second and stores 2.5 petabytes of logical time series data. By: Rishabh Kumar Modern observability practice encourages instrumenting every meaningful co…
Smarter URL Normalization at Scale: How MIQPS Powers Content Deduplication at Pinterest
Shanhai Liao | Senior Software Engineer, Content Acquisition and Media Platform; Di Ruan | Senior Staff Software Engineer, Content Acquisition and Media Platform; Evan Li | Senior Engineering Manager, Content Acquisiti…
Gradient-based Planning for World Models at Longer Horizons
Anthropic and Amazon expand collaboration for up to 5 gigawatts of new compute
The Human Infrastructure: How Netflix Built the Operations Layer Behind Live at Scale
By: Brett Axler, Casper Choffat, and Alo Lowry In the three years since our first Live show, Chris Rock: Selective Outrage, we have witnessed an incredible expansion of our live content slate and the live operations t…
Introducing Claude Design by Anthropic Labs
Post-Quantum Cryptography Migration at Meta: Framework, Lessons, and Takeaways
We’re sharing lessons learned from Meta’s post-quantum cryptography (PQC) migration to help other organizations strengthen their resilience as industry transitions to post-quantum cryptography standards. We’re proposing…
Capacity Efficiency at Meta: How Unified AI Agents Optimize Performance at Hyperscale
We’re sharing insights into Meta’s Capacity Efficiency Program, where we’ve built an AI agent platform that helps automate finding and fixing performance issues throughout our infrastructure. By leveraging encoded domain…
Training and Finetuning Multimodal Embedding & Reranker Models with Sentence Transformers
The PR you would have opened yourself
Ecom-RLVE: Adaptive Verifiable Environments for E-Commerce Conversational Agents
Making Discord on Desktop Look Just Right: Display Settings to Ease the Eyes
Learn all sorts of toggles, options, and features on Discord’s desktop app to help you view media at your pace, lower the strength of colors across the app, and make app content easier to see.
Introducing Claude Opus 4.7
Finding zombies in our systems: A real-world story of CPU bottlenecks
Vaibhav Shankar; Staff Software Engineer | Raymond Lee; Staff Software Engineer | Chia-Wei Chen; Staff Software Engineer | Shunyao Li; Sr. Software Engineer | Yi Li; Staff Software Engineer | Ambud Sharma; Principal Engi…
Meet HoloTab by HCompany. Your AI browser companion.
Inside VAKRA: Reasoning, Tool Use, and Failure Modes of Agents
Gemini 3.1 Flash TTS: the next generation of expressive AI speech
Our newest audio model introduces granular audio tags that give you precise control to direct AI speech for expressive audio generation.
Anthropic’s Long-Term Benefit Trust appoints Vas Narasimhan to Board of Directors
Privacy-first connections: Empowering social experiences at Airbnb
Discover how Airbnb prioritizes user privacy while building a more connected community, empowering guests to engage socially, connect confidently, and maintain control of their personal data. By: Joy Jing ✨ Building a mo…
Managing context in long-run agentic applications
In complex, long-running agentic systems, maintaining alignment and coherent reasoning between agents requires careful design. In this second article of our series, we explore these challenges and the mechanisms…
Scaling Recommendation Systems with Request-Level Deduplication
Authors: Matt Lawhon | Sr. Machine Learning Engineer; Filip Ryzner | Machine Learning Engineer II; Kousik Rajesh | Machine Learning Engineer II; Chen Yang | Sr. Staff Machine Learning Engineer; Saurabh Vishwas Joshi | Pr…
Gemini Robotics-ER 1.6: Powering real-world robotics tasks through enhanced embodied reasoning
Gemini Robotics ER 1.6: Enhancing spatial reasoning and multi-view understanding for autonomous robotics.
Evaluating Netflix Show Synopses with LLM-as-a-Judge
By Gabriela Alessio, Cameron Taylor, and Cameron R. Wolfe Introduction When members log into Netflix, one of the hardest choices is what to watch. The challenge isn’t a lack of options — there are thousands of titles —…
Multimodal Embedding & Reranker Models with Sentence Transformers
Waypoint-1.5: Higher-Fidelity Interactive Worlds for Everyday GPUs
Performance for Everyone
Author: Lin Wang (Android Performance Engineer) Default Feature For mobile apps, performance is considered as the “default feature”, which means apps are expected to run fast and be responsive. It’s just as if we expect…
Safetensors is Joining the PyTorch Foundation
Zero downtime Upgrade: Yelp’s Cassandra 4.x Upgrade Story
The Database Reliability Engineering team at Yelp seamlessly upgraded more than a thousand Cassandra nodes with zero downtime. This post takes you behind the scenes of our upgrade strategy, from planning sessions to flaw…
Evolution of Multi-Objective Optimization at Pinterest Home feed
Homefeed: Jiacong He, Dafang He, Jie Cheng (former), Andreanne Lemay, Mostafa Keikha, Rahul Goutam, Dhruvil Deven Badani, Dylan Wang Content Quality: Jianing Sun, Qinglong Zeng ML Serving: Li Tang Introduction In feed re…
Building a high-volume metrics pipeline with OpenTelemetry and vmagent
A production-tested approach for moving a large-scale metrics pipeline from StatsD to OpenTelemetry and Prometheus. By: Eugene Ma, Natasha Aleksandrova When migrating to a new monitoring system, you’ll want to frontload…
Stop Answering the Same Question Twice: Interval-Aware Caching for Druid at Netflix Scale
By Ben Sykes In a previous post, we described how Netflix uses Apache Druid to ingest millions of events per second and query trillions of rows, providing the real-time insights needed to ensure a high-quality experienc…
Discord Patch Notes: April 6, 2026
Check out the finer details of the more technical fixes implemented into Discord recently.
Welcome Gemma 4: Frontier multimodal intelligence on device
Improving storage efficiency in Magic Pocket, our immutable blob store
By turning compaction into a layered, adaptive pipeline and strengthening our monitoring and controls, we made Magic Pocket more resilient to workload changes.
Gemma 4: Byte for byte, the most capable open models
Gemma 4: Our most intelligent open models to date, purpose-built for advanced reasoning and agentic workflows.
The Enterprise AI Maturity Model
Any Custom Frontend with Gradio's Backend
Falcon Perception
MULTIPLAYER SEQUEL TO ACCLAIMED AAAA GAME “THE LAST MEADOW” ANNOUNCED: PLAYABLE NOW
Band together with Discordians from across the world in Last Meadow Online, the world’s first DBMMIRPG. Available to play until April 7, 2026.
From Custom to Open: Scalable Network Probing and HTTP/3 Readiness with Prometheus
The Problem: Legacy Tooling and Its Limitations Currently, Slack utilizes a hybrid approach to network measurement, incorporating both internal (such as traffic between AWS Availability Zones) and external (monitoring tr…
TRL v1.0: Post-Training Library Built to Move with the Field
Training mRNA Language Models Across 25 Species for $165
Granite 4.0 3B Vision: Compact Multimodal Intelligence for Enterprise Documents
Ensemble Brings Agentic AI to RCM Platform with Cohere
Australian government and Anthropic sign MOU for AI safety and research
Predicting Rider Conversion in Sparse Data Environments with Bayesian Trees
At Lyft, understanding how riders go through our user experience is fundamental to operating a healthy marketplace. Specifically, it is important to have a robust model determining if a rider will actually request a ride…
Reimagining the mouse pointer for the AI era
Google DeepMind is transforming the mouse pointer into a context-aware AI partner. Move beyond the friction of traditional prompting with intuitive AI collaboration in Chrome and beyond.
Building Biz Ask Anything: From Prototype to Product
Introduction Users have access to a wealth of information on Yelp business pages – from reviews and photos to structured information, menus, and Ask the Community feature on the business page, a single business page can…
Liberate your OpenClaw
How Multi-Factor Authentication Helps Keep Your Discord Account Safe
A Discord account is more than just your username and avatar. That’s why it’s important to help keep your account safe and secure by using Multi-Factor Authentication, SMS Backup Authentication & QR Code Login. Learn how…
Gemini 3.1 Flash Live: Making audio AI more natural and reliable
Our latest voice model has improved precision and lower latency to make voice interactions more fluid, natural and precise.
Introducing Cohere Transcribe: a new state-of-the-art in open-source speech recognition
Firefox Developer Edition and Beta: Try out Mozilla’s .rpm package!
In January, we introduced our Nightly package for RPM-based Linux distributions. Today, we are thrilled to announce it is now available for Firefox Beta! Firefox Beta is great for testing your sites in a version of Firef…
Beyond A/B Testing: Using Surrogacy and Region-Splits to Measure Long-Term Effects in Marketplaces
Image generated with Gemini 3 Pro (Google), 2026. Written by Amber Wang and Yoonji Kim at Lyft. Background Whenever you use the Lyft app, there is a complex balancing act happening behind the scenes. Various levers are…
Reducing our monorepo size to improve developer velocity
Monorepos will continue to grow as products evolve, but growth doesn’t have to mean friction.
Lyria 3 Pro: Create longer tracks in more
Introducing Lyria 3 Pro, which unlocks longer tracks with structural awareness. We’re also bringing Lyria to more Google products and surfaces.
Protecting people from harmful manipulation
Google DeepMind researches AI's harmful manipulation risks across areas like finance and health, leading to new safety measures.
A New Framework for Evaluating Voice Agents (EVA)
Discord Update: March 24, 2026 Changelog
Here’s the Discord Changelog from March 24, 2026, so you can stay informed on what’s new in recent app updates!
Speaking of Voxtral
Build a Domain-Specific Embedding Model in Under a Day
Making Ads Count: Using MMoE and Auxiliary Tasks to Better Connect Buyers & Sellers
When buyers search on Etsy, they need to quickly and easily find the perfect item. At the same time, sellers need to be confident their unique products are being seen by the right customers. Our Ads Search ranking model,…
How Slack Rebuilt Notifications 📣
Introduction 🔔 At Slack, notifications are how teams stay in the loop, but they can also become overwhelming when not designed with intention. Our goal was to make staying informed feel effortless. We set out to rebuild…
From firefighting to building: How AI agents restored our team’s core productivity
Abstract Grab’s Analytics Data Warehouse (ADW) team supports over 1,000 users each month. These users support an extensive repository of more than 15,000 tables, which powers approximately 50% of all queries within our d…
Migrating Etsy’s database sharding to Vitess
Etsy has maintained a sharded MySQL architecture since around 2010. This database cluster contains most of Etsy’s online data and is made up of ~1,000 tables distributed across ~1,000 shards. Over the last 16 years, it h…
Introducing Forge
How we optimized Dash's relevance judge with DSPy
We used DSPy to turn prompt engineering for our relevance judge into a measurable, automated optimization loop, improving task performance, cost, and how reliably it works in production.
Measuring progress toward AGI: A cognitive framework
We’re introducing a framework to measure progress toward AGI, and launching a Kaggle hackathon to build the relevant evaluations.
Mistral AI partners with NVIDIA to accelerate open frontier models
Leanstral: Open-Source foundation for trustworthy vibe-coding
Introducing Mistral Small 4
How ROOST is Advancing Online Safety
The threat landscape online has shifted dramatically. Many online platforms are left to reinvent safety tools from scratch. That’s the gap ROOST was built to close — and it’s why open-sourcing battle-tested tools like Os…
Cohere advances sovereign AI capabilities with NVIDIA
Identifying Interactions at Scale for LLMs
Understanding the behavior of complex machine learning systems, particularly Large Language Models (LLMs), is a critical challenge in modern artificial intelligence. Interpretability research aims to make the decisio…
Enabling R8 optimization at scale with AI-assisted debugging
Grab is Southeast Asia’s leading superapp, providing a suite of services that bring essential needs to users throughout the region. Its offerings include ride-hailing, food delivery, parcel delivery, mobile payments, and…
You’re Now Discord Official: Developers, Claim Your Game and Verify Your Server
Developers can now claim and customize their game’s profiles on Discord. Curate your game’s presence on the platform to help people discover more about your game, and get your server verified in the process! Read on to s…
Anthropic invests $100 million into the Claude Partner Network
Rails testing on autopilot: Building an agent that writes what developers won't
Introducing The Anthropic Institute
Sydney will become Anthropic’s fourth office in Asia-Pacific
Building on the Social Layer of Games: What’s New from GDC 2026
At GDC 2026, Discord gives developers more ways to close the gap between connection and play.
From games to biology and beyond: 10 years of AlphaGo’s impact
Ten years since AlphaGo, we explore how it is catalyzing scientific discovery and paving a path to AGI.
Reclaiming Terabytes: Optimizing Android image caching with TLRU
Introduction In a previous post, we discussed Project Bonsai, our initiative to reduce the Grab app’s download size. We successfully reduced the Android Application Package (APK) download size by 26%. This reduction off…
Discord Patch Notes: March 6, 2026
Check out the finer details of the more technical fixes implemented into Discord recently.
Partnering with Mozilla to improve Firefox’s security
Where things stand with the Department of War
Tracing Discord's Elixir Systems (Without Melting Everything)
Join Senior Software Engineer Nick Krichevsky as he explains how Discord added distributed tracing to Elixir’s message passing and optimized it to handle millions of concurrent users.
Gemini 3.1 Flash-Lite: Built for intelligence at scale
Gemini 3.1 Flash-Lite is our fastest and most cost-efficient Gemini 3 series model yet.
Why is WebAssembly a second-class language on the web?
This post is an expanded version of a presentation I gave at the 2025 WebAssembly CG meeting in Munich. WebAssembly has come a long way since its first release in 2017. The first version of WebAssembly was already a grea…
Our Early Journey to Transform Instacart’s Discovery Recommendations with LLMs
Key Contributors: Moein Hasani, Hamidreza Shahidi, Trace Levinson, Guanghua Shu Introduction At Instacart, we are laser-focused on improving the user experience by making shopping feel easy, engaging, and personalized. O…
Using LLMs to amplify human labeling and improve Dash search relevance
How we train Dash's search ranking models with a mix of human and LLM-assisted labeling.
Nano Banana 2: Combining Pro capabilities with lightning-fast speed
Our latest image generation model offers advanced world knowledge, production ready specs, subject consistency and more, all at Flash speed.
Statement from Dario Amodei on our discussions with the Department of War
Goodbye innerHTML, Hello setHTML: Stronger XSS Protection in Firefox 148
Cross-site scripting (XSS) remains one of the most prevalent vulnerabilities on the web. The new standardized Sanitizer API provides a straightforward way for web developers to sanitize untrusted HTML before inserting it…
Getting Global Age Assurance Right: What We Got Wrong and What's Changing
Discord’s CTO addresses community concerns about age assurance: no mass ID collection, new vendor transparency commitments, and a delayed global launch until second half of 2026.
Detecting and preventing distillation attacks
Scaling Localization with AI at Lyft
Written by Stefan Zier For years, Lyft’s localization infrastructure relied exclusively on human translation. While this model usually ensured excellent quality, it was bound by multi-day turnarounds and costs that scale…
How to Change Your Theme to Bring Your Vibe to Discord
Add a splash of personality and make Discord pop by changing Discord’s color scheme! Learn how to adjust the look of Discord on both desktop and mobile.
Osprey: Open Sourcing our Rule Engine
Discord uses Osprey to quickly detect and remove new types of harm from putting our customers at risk. Now we’re open-sourcing this tool so others can do the same.
Gemini 3.1 Pro: A smarter model for your most complex tasks
3.1 Pro is designed for tasks where a simple answer isn’t enough.
A new way to express yourself: Gemini can now create music
The Gemini app now features our most advanced music generation model Lyria 3, empowering anyone to make 30-second tracks using text or images.
Turning Data into Velocity: Caper’s Edge and Cloud Data Flywheel with Capsight
Key Contributors: Youming Luo, Andrew Tanner, Matas Sriubiskis, Sylvia Lin, Sikun Zhu, Lei Li, Xiao Zhou Introduction Caper is Instacart’s AI-powered smart cart that provides customers with a fast, seamless, and intuitiv…
Accelerating discovery in India through AI-powered science and education
Google DeepMind brings National Partnerships for AI initiative to India, scaling AI for science and education
Cohere Labs Launches Tiny Aya, Making Multilingual AI Accessible
Introducing Claude Sonnet 4.6
Anthropic opens Bengaluru office and announces new partnerships across India
Cohere signs world chess champion Magnus Carlsen as brand ambassador
Chris Liddell appointed to Anthropic’s board of directors
Launching Interop 2026
The Interop Project is a cross-browser initiative to improve web compatibility in areas that offer the most benefit to both users and developers. The group, including Apple, Google, Igalia, Microsoft, and Mozilla, takes…
Trusting the Untestable: Validation and Diagnostics for the Doubly Robust Models
written by Ross Chu and Shima Nassiri The Causal Frontier: Measurement Beyond Randomization The gold standard for determining the causal impact of a policy or product change at a company like Lyft is the A/B test (random…
How low-bit inference enables efficient AI
Making products like Dropbox Dash accessible to individuals and businesses means tackling new challenges around efficiency and resource use.
Gemini 3 Deep Think: Advancing science, research and engineering
Our most specialized reasoning mode is now updated to solve modern science, research and engineering challenges.
Anthropic is donating $20 million to Public First Action
Anthropic raises $30 billion in Series G funding at $380 billion post-money valuation
Insights from our executive roundtable on AI and engineering productivity
From Claude Code to Cursor, we're big adopters of AI coding tools at Dropbox. The early results have been promising, but there are still a lot of open questions about how to work with these tools most effectively and whe…
Covering electricity price increases from our data centers
From print to digital: Making weekly flyers shoppable at Instacart through computer vision and LLMs
Key contributors: Prithvi Srinivasan, Shishir Kumar Prasad, Kristen Morgan, Bryan Pham, Rick Shukla, Preeti Chadha, Vipu…
Accelerating Mathematical and Scientific Discovery with Gemini Deep Think
Research papers point to the growing impact of Deep Think across fields
Introducing Claude Opus 4.6
Voxtral transcribes at the speed of sound.
Discord Patch Notes: February 4, 2026
Check out the finer details of the more technical fixes implemented into Discord recently.
Claude is a space to think
Migrating to Jetpack Compose
Migrating to Jetpack Compose: How AI Accelerated Our Journey at Caper Introduction At Instacart, our Caper smart carts bring together AI, computer vision, and real-time data to power the future of in-store shopping. Cust…
Apple’s Xcode now supports the Claude Agent SDK
How Yelp Built a Back-Testing Engine for Safer, Smarter Ad Budget Allocation
Introduction Modern advertising platforms are fast-paced and interconnected: even small adjustments can have ripple effects on how ads are shown, how budgets are spent, and the value advertisers get from their ad spend.…
How to Customize Your Discord Profile
Make your first impression count with a profile that represents you how YOU want to be seen. Learn how to edit and customize your profile to have it rep you the right way.
Anthropic partners with Allen Institute and Howard Hughes Medical Institute to accelerate scientific discovery
Cursor at Grab: Adoption and impact
Adoption overview The illustration below encapsulates how Cursor is scaled across Grab, achieving rapid and widespread adoption that accelerated software development and empowered non-technical teams to build solutions.…
Project Genie: Experimenting with infinite, interactive worlds
Google AI Ultra subscribers in the U.S. can try out Project Genie, an experimental research prototype that lets you create and explore worlds.
Engineering VP Josh Clemm on how we use knowledge graphs, MCP, and DSPy in Dash
Engineering VP Josh Clemm deep-dives into how we think about knowledge graphs, indexes, MCP, and prompt optimization using tools like DSPy.
Model Vault: a private platform for secure model inference
ServiceNow chooses Claude to power customer apps and increase internal productivity
Terminally online Mistral Vibe.
Heaps do lie: debugging a memory leak in vLLM.
How Etsy Uses LLMs to Improve Search Relevance
Ever searched for something specific, only to be met with results that are close, but not quite? On Etsy’s Search Relevance team, that frustration is exactly what we are tackling. Our goal is simple yet ambitious: to he…
D4RT: Teaching AI to see the world in four dimensions
D4RT: Unified, efficient 4D reconstruction and tracking up to 300x faster than prior methods.
How the Cohere Labs open research community turns early-career researchers into global leaders
Veo 3.1 Ingredients to Video: More consistency, creativity and control
Our latest Veo update generates lively, dynamic clips that feel natural and engaging — and supports vertical video generation.
Information-Driven Design of Imaging Systems
An encoder (optical system) maps objects to noiseless images, which noise corrupts into measurements. Our information estimator uses only these noisy measurements and a noise model to quantify how well measurements disti…
Lyft’s Feature Store: Architecture, Optimization, and Evolution
Written by Rohan Varshney, with support from Devon Mittow and Janice Lee. This article expands upon a presentation from the Feature Store Summit 2025, which can be viewed in full here. There is also another video availabl…
Google's year in review: 8 areas with research breakthroughs in 2025
Google 2025 recap: Research breakthroughs of the year
Introducing Mistral OCR 3
Gemini 3 Flash: frontier intelligence built for speed
Gemini 3 Flash offers frontier intelligence built for speed at a fraction of the cost.
Gemma Scope 2: helping the AI safety community deepen understanding of complex language model behavior
Open interpretability tools for language models are now available across the entire Gemma 3 family with the release of Gemma Scope 2.
How a Community Effort is Teaching AI to See Africa’s Richness
From Python3.8 to Python3.10: Our Journey Through a Memory Leak
Image generated with ChatGPT (OpenAI), 2025. Intro When working with Python, memory management often feels like a solved problem. The garbage collector quietly does its job, and unlike C or C++, we rarely think about mal…
How to Make and Use Custom Emoji on Discord
Emojis on Discord are special — you can make a little picture out of almost any symbol, in-joke, or bizarre late-night inspiration.
Improved Gemini audio models for powerful voice experiences
Introducing Rerank 4: Cohere’s most powerful reranker yet
Building Trust in AI: Cohere’s Approach to AI Governance
Introducing: Devstral 2 and Mistral Vibe CLI.
Gift Ideas for the Dedicated Discord User in Your Life
We’ve got plenty of gift ideas, from those who game through the night, to those who decorate their profile just right — let’s dig in!
Discord Patch Notes: December 8, 2025
Check out the finer details of the more technical fixes implemented into Discord recently.
Your Discord Checkpoint is Rolling Out! Celebrate What You Did in 2025
How many messages did you send? How long did you hang out in voice? Who’d you talk with the most? Stop by your end-of-year Checkpoint, a recap of the stuff YOU did on Discord throughout the year!
Introducing Mistral 3
Bringing In-Game Commerce to Discord Communities
By bringing commerce directly to official game communities, we’re giving game developers the opportunity to benefit from these dynamics, creating incremental revenue that complements their existing storefronts.
Save and Display Your Faves: Add Discord Shop & Marvel Rivals Items to Your Profile’s Wishlist
Keep track of all the stuff from the Shop you’ve been wanting to purchase with the new Wishlist feature. Display the stuff you’ve been eyeing on your profile, and if you’re lucky enough, maybe one of your friends may see…
Streamlining Security Investigations with Agents
Slack’s Security Engineering team is responsible for protecting Slack’s core infrastructure and services. Our security event ingestion pipeline handles billions of events per day from a diverse array of data sources. Rev…
Cohere expands partnership with SAP to provide Europe sovereign AI solutions
Reducing experiment duration with predicted control variates
In 2021, we published a blog post titled “Increasing experimentation accuracy and speed by using control variates,” describing how we reduce the variance of metrics using CUPED in our experimentation platform. This is…
Android VPAT journey
Background A Voluntary Product Accessibility Template (VPAT) is a document that outlines how well a product aligns with accessibility (a11y) standards. Its primary purpose is to inform customers about a product’s a11y fe…
Mistral AI - KI für Deutschland
LyftLearn Evolution: Rethinking ML Platform Architecture
Written by Yaroslav Yatsiuk At Lyft, machine learning (ML) is the engine behind our most critical business functions — from dispatch and pricing optimization to fraud detection and support automation. Our ML infrastructu…
How to Share What You’re Playing, Listening to, or Watching as Your Status on Discord
Playing a game right now? Listening to some tunes, or catching up on that one anime your friends won’t stop talking about? Learn how to show off what you’re up to as your Discord status and show @everyone what’s up!
How to Link Discord to Battlefield 6, Marvel Rivals & More
Some of the most popular multiplayer games have added the ability to directly link your Discord account to the game! Learn how to link your Discord account to some big-name titles and see what sorta perks it provides.
Building The Intent Engine: How Instacart is Revamping Query Understanding with LLMs
Authors: Yuanzheng Zhu, Guanghua Shu, Raochuan Fan, Vinesh Gudla, Tejaswi Tenneti Introduction When people search for items on Instacart, they don’t always type perfectly worded phrases. They might write “bread no gluten…
HIPAA Business Associate agreements for custom model development
Making data transfer in LLM systems faster, leaner, and more scalable
A Cornucopia of Updates Make Discord on Desktop Fresher Than a Crisp Fall Breeze
This fall, emoji making gets faster, the Settings page gets a redesign, Group DMs are easier to customize, more games gain support for special Discord-powered capabilities, and Family Center gets some expanded features.…
Discord Update: November 6, 2025 Changelog
Here’s the Discord Changelog from November 6, 2025, so you can stay informed on what’s new in recent app updates!
Discord Patch Notes: November 4, 2025
Check out the finer details of the more technical fixes implemented into Discord recently.
RL without TD learning
In this post, I’ll introduce a reinforcement learning (RL) algorithm based on an “alternative” paradigm: divide and conquer. Unlike traditional methods, this algorithm is not based on temporal difference (TD) learning (…
Our commitment to AI model signing on Hugging Face
Improving performance by prefetching product pages from Etsy Search
Rarely are there opportunities for big, bold, game-changing improvements in web performance. The Speculation Rules API (SRA) is a recent browser development that offers just such an opportunity. This post details a joint…
Introducing Mistral AI Studio.
Understanding Etsy’s Vast Inventory with LLMs
For more than 20 years, Etsy has been the destination for human creativity online. Our marketplace is home to more than 100 million special items made, handpicked and designed by more than 5 million sellers. These items…
From Single-Node to Multi-GPU Clusters: How Discord Made Distributed Compute Easy for ML Engineers
From manual GPU configs to one-command clusters: Join Serrana Aguirregaray and Nathaniel Jenkins as they tell the story of Discord’s journey to build a developer-friendly ML infrastructure on Ray for 200M+ monthly active…
Unlocking Faster Insights with Experimenter-Defined Segmentations
Imagine you have a fabulous idea to drive more sales on Etsy by giving out free ice cream with every purchase. How would you know if it will actually work? One way to test this out is to run an experiment! An experiment…
During October, Treat a Friend to Nitro and Trick Out Your Profile for Halloween 🎃
Do you embrace tricks in the Shop or turn them into treats? Trick out your profile with shockingly sweet decorations, or treat your friend to Nitro and earn something for yourself in return!
Discord Patch Notes: October 7, 2025
Check out the finer details of the more technical fixes implemented into Discord recently.
Announcing the Cohere Partner Program: Boosting enterprise AI
S3 server access logs at scale
Introduction Yelp heavily relies on Amazon S3 (Simple Storage Service) to store a wide variety of data, from images, logs, database backups, and more. Since data is stored on the cloud, we need to carefully manage how th…
New Looks for Nitro, New Looks for You. Get Yourself a Nitro-exclusive Profile Bundle!
Between now and September 30th, 2025, new and existing Nitro members can claim a profile bundle matching Nitro's new look, including an Avatar Decoration, Profile Effect, and a Nameplate! Open this blog to see the detail…
Discord Update: September 25, 2025 Changelog
Here's the Discord Changelog from September 25, 2025, so you can stay informed on what’s new in recent app updates!
Cohere adds $100M in second close to latest round as it scales security-first enterprise AI
Exploring AI in Education
Cohere opens Paris office as EMEA hub
Staff Picks, September 2025: Welcome to Our Video Game Museum
Happy National Video Games Day! We’re hearing from Veronica, Scott, Tyler, and Anni about which games would go in their hypothetical video games museum. Nothing better than honoring a beloved game by putting it behind a…
Mistral AI raises 1.7B€ to accelerate technological progress with AI
Building Etsy Buyer Profiles with LLMs
Every day, shoppers from Etsy's community of nearly 90M buyers visit our marketplace to search for unique, handmade, and vintage items. But with over 100 million listings, how do we help each buyer find exactly what they…
Discord Patch Notes: September 3, 2025
Check out the finer details of the more technical fixes implemented into Discord recently.
Le Chat. Custom MCP connectors. Memories.
Make Memory work for you.
Bringing DAVE to All Discord Platforms
From March 1st, 2026, all clients and apps must be updated to utilize DAVE’s end-to-end encryption support for voice and video calls. In this blog, we explore the challenges we encountered bringing E2EE to browsers, and…
What exactly does word2vec learn?
What exactly does word2vec learn, and how? Answering this question amounts to understanding representation learning in a minimal yet interesting language modeling task. Despite the fact that word2vec is a well-known prec…
Command A Translate: Secure translation for global enterprises
Command A Reasoning: Enterprise-grade control for AI agents
CRLite: Fast, private, and comprehensive certificate revocation checking in Firefox
Firefox is now the first and the only browser to deploy fast and comprehensive certificate revocation checking that does not reveal your browsing activity to anyone (not even to Mozilla). Tens of millions of TLS server c…
Context engineering case studies: Etsy-specific question answering
This post investigates the benefits and limitations of prompt engineering in two instances of AI-assisted onboarding relying on large language model (LLM) technology. Of particular interest is how truthful (and therefore…
Cohere deepens partnership with Government of Canada
Elo ratings beyond arena-style evaluations
Cohere raises $500M at $6.8B valuation to accelerate enterprise efficiency with agentic AI
Modernizing FOI Systems with AI Agents
North by Cohere: Now Widely Available Agentic AI Platform
Unlocking the potential of vision language models on satellite imagery through fine-tuning
Introducing Command A Vision: Multimodal AI built for business
Announcing Codestral 25.08 and the Complete Mistral Coding Stack for Enterprise
Navigating the Global Push for Sovereign AI
Cohere and Bell partner to deliver sovereign AI
Secure and Responsible AI for Europe
Our contribution to a global environmental standard for AI
Le Chat dives deep.
Voxtral
Upgrading agentic coding capabilities with the new Devstral models
Exploring CHAOS: Building a Backend for Server-Driven UI
A little while ago, we published a blog post on CHAOS: Yelp’s Unified Framework for Server-Driven UI. We strongly recommend reading that post first to gain a solid understanding of SDUI and the goals of CHAOS. This post…
What Are AI Benchmarks? A Business Guide for Evaluations
Announcing AI for Citizens
Cohere opens an office in Montréal
Whole-Body Conditioned Egocentric Video Prediction
Cohere achieves ISO 42001 and ISO 27001 certifications
Cohere Partners with Canada and UK Governments on Secure AI
Revenue Automation Series: Testing an Integration with Third-Party System
Background As described in the second blog post of Revenue Automation series, Revenue Data Pipeline processes a large amount of data via complex logic transformations to recognize revenue. Thus, developing a robust produ…
PayPal Releases Agentic Toolkit to Accelerate Commerce
The following is a repost from the PayPal Developer Blog. Building on the release of PayPal’s MCP servers, PayPal is excited to introduce the PayPal Agentic Toolkit*. This toolkit empowers developers to seamlessly int…
Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)
Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications. However, as LLMs have improved, so have the attacks against them. Prompt injection attack is listed as the #1 threat by OWASP t…
Repurposing Protein Folding Models for Generation with Latent Diffusion
PLAID is a multimodal generative model that simultaneously generates protein 1D sequence and 3D structure, by learning the latent space of protein folding models. The awarding of the 2024 Nobel Prize to AlphaFold2 marks…
PayPal Begins Rollout of MCP Servers to Accelerate Agentic Commerce
The following is a repost from the PayPal Developer Blog by Prakhar Mehrotra, SVP of Artificial Intelligence, PayPal At PayPal, we strive to make it easier for developers to access our services. Today, we are taking the…
Behind the Scenes - A Glimpse into Tax Calculations
In the past, sellers were responsible for managing and fulfilling their own tax obligations. However, more and more jurisdictions are now requiring marketplaces such as Etsy to collect the tax from buyers and remit the t…
Improving Firefox Stability in the Enterprise by Reducing DLL Injection
Beginning in version 138, Firefox will offer an alternative to DLL injection for Data Loss Prevention (DLP) deployments in enterprise environments. DLL Injection DLL injection into Firefox is a topic we’ve covered on the…
Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway Deployment
We deployed 100 reinforcement learning (RL)-controlled cars into rush-hour highway traffic to smooth congestion and reduce fuel consumption for everyone. Our goal is…
Estimating Incremental Lift in Customer Value (Delta CV) using Synthetic Control
How we measure the impact of user actions and product adoptions at PayPal In today’s competitive digital landscape, understanding user interactions with your products is essential for driving revenue and building lasting…
Launching Interop 2025
Interop 2025 continues the mission to make the web more consistent across browsers, building on 2024’s 95% interoperability score. This year, 19 focus areas target key developer needs and long-standing issues, including…
Adopting Jetpack Compose for Etsy’s Android App
One of our Guiding Principles at Etsy is that we “commit to our craft.” This means that we have a culture of learning, in which we’re constantly looking for opportunities to improve and learn, adopt industry best practic…
Introducing Uniffi for React Native: Rust-Powered Turbo Modules
Mozilla and Filament have introduced Uniffi for React Native, a tool that allows developers to leverage the safety and performance benefits of Rust in cross-platform React Native apps.
Llamafile v0.8.14: a new UI, performance gains, and more
Discover the latest release of Llamafile 0.8.14, an open-source AI tool by Mozilla Builders. With a new command-line chat interface, enhanced performance, and support for powerful models, Llamafile makes it easy to run l…
0Din: A GenAI Bug Bounty Program – Securing Tomorrow’s AI Together
As AI continues to evolve, so do the threats against it. As these GenAI systems become more sophisticated and widely adopted, ensuring their security and ethical use becomes paramount. 0Din is a groundbreaking GenAI bug…
Announcing Official Puppeteer Support for Firefox
We’re pleased to announce that, as of version 23, the Puppeteer browser automation library now has first-class support for Firefox. This means that it’s now easy to write automation and perform end-to-end testing using P…
Machine Learning in Content Moderation at Etsy
At Etsy, we’re focused on elevating the best of our marketplace to help creative entrepreneurs grow their businesses. We continue to invest in making Etsy a safe and trusted place to shop, so sellers’ extraordinary items…
Snapshots for IPC Fuzzing
Process separation remains one of the most important parts of the Firefox security model and securing our IPC (Inter-Process Communication) interfaces is crucial to keep privileges in the different processes separated. W…
Sponsoring sqlite-vec to enable more powerful Local AI applications
Today we’re proud to announce the next Mozilla Builders project: sqlite-vec. Led by independent developer Alex Garcia, this project brings vector search functionality to the beloved SQLite embedded database. Alex has bee…
Enhancing Cloud Usage Forecasting, Monitoring & Optimizing
In 2020, Etsy concluded its migration from an on-premise data center to the Google Cloud Platform (GCP). During this transition, a dedicated team of program managers ensured the migration's success. Post-migration, this…
Efficient Visual Representation Learning And Evaluation
Etsy features a diverse marketplace of unique handmade and vintage items. It’s a visually diverse marketplace as well, and computer vision has become increasingly important to Etsy as a way of enhancing our users’ shoppi…
Experimenting with local alt text generation in Firefox Nightly
Firefox 130 will introduce an experimental new capability to automatically generate alt-text for images using a fully private on-device AI model. The feature will be available as part of Firefox’s built-in PDF editor, an…
Scaling PayPal’s AI Capabilities with PayPal Cosmos.AI Platform
By Jun Yang, Zhenyin Yang, and Srinivasan Manoharan, based on the AI/ML modernization journey taken by the PayPal Cosmos.AI Platform team in the past three years. AI is a transformative technology tha…
Llamafile’s progress, four months in
When Mozilla’s Innovation group first launched the llamafile project late last year, we were thrilled by the immediate positive response from open source AI developers. It’s become one of Mozilla’s top three most-favorit…
Porting a cross-platform GUI application to Rust
In this blog post, we delve into the motivations for choosing Rust for our crash reporter, outline the unique challenges of designing an application that operates when the main browser has failed, and discuss the new arc…
Prototype even faster with the Gradio UI for Figma component library
In the fast-paced world of generative AI, staying ahead means moving swiftly and smartly. That's why we've embraced Gradio, the low-code prototyping toolkit from Hugging Face, as our go-to for bringing new ideas to life.…
Generative AI to quantify uncertainty in weather forecasting
Posted by Lizao (Larry) Li, Software Engineer, and Rob Carver, Research Scientist, Google Research Accurate weather forecasts can have a direct impact on people’s lives, from helping make routine decisions, like what to…
AutoBNN: Probabilistic time series forecasting with compositional bayesian neural networks
Posted by Urs Köster, Software Engineer, Google Research Time series problems are ubiquitous, from forecasting weather and traffic patterns to understanding economic trends. Bayesian approaches start with an assumption a…
Using AI to expand global access to reliable flood forecasts
Posted by Yossi Matias, VP Engineering Research, and Grey Nearing, Research Scientist, Google Research Floods are the most common natural disaster, and are responsible for roughly $50 billion in annual financial damages…
Computer-aided diagnosis for lung cancer screening
Posted by Atilla Kiraly, Software Engineer, and Rory Pilgrim, Product Manager, Google Research Lung cancer is the leading cause of cancer-related deaths globally with 1.8 million deaths reported in 2020. Late diagnosis d…
SCIN: A new resource for representative dermatology images
Posted by Pooja Rao, Research Scientist, Google Research Health datasets play a crucial role in research and medical education, but it can be challenging to create a dataset that represents the real world. For example, d…
ScreenAI: A visual language model for UI and visually-situated language understanding
Posted by Srinivas Sunkara and Gilles Baechler, Software Engineers, Google Research Screen user interfaces (UIs) and infographics, such as charts, diagrams and tables, play important roles in human communication and huma…
MELON: Reconstructing 3D objects from images with unknown poses
Posted by Mark Matthews, Senior Software Engineer, and Dmitry Lagun, Research Scientist, Google Research A person's prior experience and understanding of the world generally enables them to easily infer what an object lo…
Macramé: Untangling the Knot on the Etsy Android Listing Screen
Easily the most important and complex screen in the Buy on Etsy Android app is the listing screen, where all key information about an item for sale in the Etsy marketplace is displayed to buyers. Far from just a title an…
HEAL: A framework for health equity assessment of machine learning performance
Posted by Mike Schaekermann, Research Scientist, Google Research, and Ivor Horn, Chief Health Equity Officer Director, Google Core Health equity is a major societal concern worldwide with disparities having many causes.…
Cappy: Outperforming and boosting large multi-task language models with a small scorer
Posted by Yun Zhu and Lijuan Liu, Software Engineers, Google Research Large language model (LLM) advancements have led to a new paradigm that unifies various natural language processing (NLP) tasks within an instruction-…
Talk like a graph: Encoding graphs for large language models
Posted by Bahare Fatemi and Bryan Perozzi, Research Scientists, Google Research Imagine all the things around you — your friends, tools in your kitchen, or even the parts of your bike. They are all connected in different…
Chain-of-table: Evolving tables in the reasoning chain for table understanding
Posted by Zilong Wang, Student Researcher, and Chen-Yu Lee, Research Scientist, Cloud AI Team People use tables every day to organize and interpret complex information in a structured, easily accessible format. Due to th…
Health-specific embedding tools for dermatology and pathology
Posted by Dave Steiner, Clinical Research Scientist, Google Health, and Rory Pilgrim, Product Manager, Google Research There’s a worldwide shortage of access to medical imaging expert interpretation across specialties in…
Social learning: Collaborative learning with large language models
Posted by Amirkeivan Mohtashami, Research Intern, and Florian Hartmann, Software Engineer, Google Research Large language models (LLMs) have significantly improved the state of the art for solving tasks specified using n…
Croissant: a metadata format for ML-ready datasets
Posted by Omar Benjelloun, Software Engineer, Google Research, and Peter Mattson, Software Engineer, Google Core ML and President, MLCommons Association Machine learning (ML) practitioners looking to reuse existing datas…
How We Built The Deals Tab in Swift UI
Balancing Engineering Ambition with Product Realism Introduction In July of 2023, Etsy’s App Updates team, responsible for the Updates feed in Etsy’s mobile apps, set off with an ambitious goal: to revamp the Updates tab…
Google at APS 2024
Posted by Kate Weber and Shannon Leon, Google Research, Quantum AI Team Today the 2024 March Meeting of the American Physical Society (APS) kicks off in Minneapolis, MN. A premier conference on topics ranging across phys…
VideoPrism: A foundational visual encoder for video understanding
Posted by Long Zhao, Senior Research Scientist, and Ting Liu, Senior Staff Software Engineer, Google Research An astounding number of videos are available on the Web, covering a variety of content from everyday moments p…
Leveraging Spark 3 and NVIDIA’s GPUs to Reduce Cloud Cost by up to 70% for Big Data Pipelines
By Ilay Chen and Tomer Akirav At PayPal, hundreds of thousands of Apache Spark jobs run on an hourly basis, processing petabytes of data and requiring a high volume of resources. To handle the growth of machine learning…
Advances in private training for production on-device language models
Posted by Zheng Xu, Research Scientist, and Yanxiang Zhang, Software Engineer, Google Language models (LMs) trained to predict the next word given input text are the key technology for many applications [1, 2]. In Gbo…
Learning the importance of training data under concept drift
Posted by Nishant Jain, Pre-doctoral Researcher, and Pradeep Shenoy, Research Scientist, Google Research The constantly changing nature of the world around us poses a significant challenge for the development of AI model…
DP-Auditorium: A flexible library for auditing differential privacy
Posted by Mónica Ribero Díaz, Research Scientist, Google Research Differential privacy (DP) is a property of randomized mechanisms that limit the influence of any individual user’s information while processing and analyz…
Graph neural networks in TensorFlow
Posted by Dustin Zelle, Software Engineer, Google Research, and Arno Eigenwillig, Software Engineer, CoreML Objects and their relationships are ubiquitous in the world around us, and relationships can be as important to…
Intervening on early readouts for mitigating spurious features and simplicity bias
Posted by Rishabh Tiwari, Pre-doctoral Researcher, and Pradeep Shenoy, Research Scientist, Google Research Machine learning models in the real world are often trained on limited data that may contain unintended statistic…
A decoder-only foundation model for time-series forecasting
Posted by Rajat Sen and Yichen Zhou, Google Research Time-series forecasting is ubiquitous in various domains, such as retail, finance, manufacturing, healthcare and natural sciences. In retail use cases, for example, it…
MobileDiffusion: Rapid text-to-image generation on-device
Posted by Yang Zhao, Senior Software Engineer, and Tingbo Hou, Senior Staff Software Engineer, Core ML Text-to-image diffusion models have shown exceptional capabilities in generating high-quality images from text prompt…
Mixed-input matrix multiplication performance optimizations
Posted by Manish Gupta, Staff Software Engineer, Google Research AI-driven technologies are weaving themselves into the fabric of our daily routines, with the potential to enhance our access to knowledge and boost our ov…
Exphormer: Scaling transformers for graph-structured data
Posted by Ameya Velingker, Research Scientist, Google Research, and Balaji Venkatachalam, Software Engineer, Google Graphs, in which objects and their relations are represented as nodes (or vertices) and edges (or links…
Declarative Feature Engineering at PayPal
PayPal supports over 400 million active consumers and merchants worldwide. Every minute there are several thousand payment transactions. To prevent fraud in real-time at such a scale, we need t…
Streamlining Developer Productivity with the PayPal Visual Studio Code Extension
In the ever-evolving landscape of software development, productivity and efficiency have become paramount to success. Developers are constantly juggling multiple tasks, from navigating complex codebases to integrating th…
Managing Recurring Payments with Apple Pay Using PayPal
Recurring payments have become an integral part of the modern digital economy, offering convenience and predictability for both consumers and businesses. Our previous post highlighted different methods of integrating App…
Accept E-Commerce Payments Easily with PayPal’s Buttons Component
Accepting online payments is now a universal must-have, catering to everyone from solo entrepreneurs to massive global corporations. PayPal’s Standard Checkout allows for seamless integration of PayPal’s Payment Buttons…
Why You Should Attend PayPal’s Developer Meetup at Money20/20
The world of technology is constantly evolving, and developers are at the forefront of this dynamic landscape. Staying updated on the latest trends, tools, and innovations is not just a choice but a necessity for those i…
The AR Measuring Box: Etsy's answer to Big Tape Measure
A little while ago, Etsy introduced a new feature in its iOS app that could place Etsy sellers' artwork on a user's wall using Apple's Augmented Reality (AR) tools. It let them visualize how a piece would look in their s…
The So-fine Real-time ML Paradigm
Introduction Each year, Etsy hosts an event known as “CodeMosaic” - an internal hackathon in which Etsy admin propose and build bold advances quickly in our technology across a number of different themes. People across E…
Leveraging Real-Time User Actions to Personalize Etsy Ads
Introduction Personalization is vital to connect our unique marketplace to the right buyer at the right time. Etsy has recently introduced a novel, general approach to personalizing ML models based on encoding and learni…
LinkBERT: Improving Language Model Training with Document Link
Language Model Pretraining Language models (LMs), like BERT and the GPT series, achieve remarkable performance on many natural language processing (NLP) tasks. They are now the foundation of today’s NLP systems. T…
Stanford AI Lab Papers and Talks at ACL 2022
The 60th Annual Meeting of the Association for Computational Linguistics (ACL) 2022 is taking place May 22nd - May 27th. We’re excited to share all the work from SAIL that’s being presented, and you’ll find links to pape…
Stanford AI Lab Papers and Talks at ICLR 2022
The International Conference on Learning Representations (ICLR) 2022 is being hosted virtually from April 25th - April 29th. We’re excited to share all the work from SAIL that’s being presented, and you’ll find links to…
Discovering the systematic errors made by machine learning models
Discovering systematic errors with cross-modal embeddings In this blog post, we introduce Domino, a new approach for discovering systematic errors made by machine learning models. We also discuss a framework for quantita…
Grading Complex Interactive Coding Programs with Reinforcement Learning
[Summary] tl;dr: A tremendous amount of effort has been poured into training AI algorithms to competitively play games that computers have traditionally had trouble with, such as the retro games published by Atari, Go, D…
Understanding Deep Learning Algorithms that Leverage Unlabeled Data, Part 1: Self-training
Deep models require a lot of training examples, but labeled data is difficult to obtain. This motivates an important line of research on leveraging unlabeled data, which is often more readily available. For example, larg…
Stanford AI Lab Papers and Talks at AAAI 2022
The 36th AAAI Conference on Artificial Intelligence (AAAI 2022) is being hosted virtually from February 22nd - March 1st. We’re excited to share all the work from SAIL that’s being presented, and you’ll find links to pap…
How to Improve User Experience (and Behavior): Three Papers from Stanford's Alexa Prize Team
Introduction In 2019, Stanford entered the Alexa Prize Socialbot Grand Challenge 3 for the first time, with its bot Chirpy Cardinal, which went on to win 2nd place in the competition. In our previous post, we discussed…
Reward Isn't Free: Supervising Robot Learning with Language and Video from the Web
This work was conducted as part of SAIL and CRFM. Deep learning has enabled improvements in the capabilities of robots on a range of problems such as grasping and locomotion in recent years. However, building the qu…
BanditPAM: Almost Linear-Time k-medoids Clustering via Multi-Armed Bandits
TL;DR Want something better than k-means? Our state-of-the-art k-medoids algorithm from NeurIPS, BanditPAM, is now publicly available! pip install banditpam and you're good to go! Like the k-mean…
Stanford AI Lab Papers and Talks at NeurIPS 2021
The thirty-fifth Conference on Neural Information Processing Systems (NeurIPS) 2021 is being hosted virtually from Dec 6th - 14th. We’re excited to share all the work from SAIL that’s being presented at the main conferen…
Stanford AI Lab Papers at CoRL 2021
The Conference on Robot Learning (CoRL 2021) will take place next week. We’re excited to share all the work from SAIL that will be presented, and you’ll find links to papers, videos and blogs below. Feel free to reach ou…
Stanford AI Lab Papers at EMNLP/CoNLL 2021
The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021) will take place next week, colocated with CoNLL 2021. We’re excited to share all the work from SAIL that will be presented, and you’ll…
Selective Classification Can Magnify Disparities Across Groups
Selective classification, where models are allowed to “abstain” when they are uncertain about a prediction, is a useful approach for deploying models in settings where errors are costly. For example, in medicine, model e…
Stanford AI Lab Papers at ICCV 2021
The International Conference on Computer Vision (ICCV 2021) will be hosted virtually next week. We’re excited to share all the work from SAIL that will be presented, and you’ll find links to papers, videos and blogs belo…