Tech Blogs

Recent engineering posts and AI research from the companies I follow — fetched from their RSS feeds and linked to the originals. Filter by company; newest first.

Preply uses OpenAI to launch AI-generated lesson summaries, providing personalised feedback and language learning exercises.

Read

June 12, 2026OpenAI

New OpenAI Academy courses for the next era of work

OpenAI introduces three Academy courses that help people build practical AI skills, create repeatable workflows, and apply agents in everyday work.

Read

June 12, 2026NVIDIA

Deploy Long-Context Reasoning and Agentic Workflows with MiniMax M3 on NVIDIA Accelerated Infrastructure

As enterprise AI adoption scales, developers are increasingly forced to stitch together fragmented pipelines—separate models for text, vision, and...

Read

June 12, 2026NVIDIA

NVIDIA Achieves Leading Agentic Coding Performance on First Agentic AI Benchmark

AI agents have fundamentally changed the complexity of inference workloads. Until now, the industry has struggled to define a standard for measuring how...

Read

June 12, 2026Microsoft Research

Ire identifies another LOTUSLITE specimen

Project Ire examined a timely malware sample and determined its intent through reverse engineering—identifying LOTUSLITE characteristics even as most major EDR tools did not detect it. The post Ire identifies another LOT…

Read

June 12, 2026Hugging Face

Read

June 12, 2026Anthropic

Read

June 11, 2026GitLab

June 11, 2026arXiv · ML

Read

June 11, 2026arXiv · ML

Simplex-Constrained Sparse Bagging: Transitioning from Uniform Priors to Sparse Posteriors in Ensemble Learning

We present Simplex-Constrained Sparse Bagging (SCSB), a mathematically rigorous framework for post-training compression and probability calibration of bootstrap-based bagging ensembles. Standard bagging ensembles (such a…

Read

June 11, 2026arXiv · ML

Multiagent Protocols with Aggregated Confidence Signals

Confidence is used for reliability, oversight, and a range of downstream decision tasks in Natural Language Processing (NLP), yet no existing method produces or evaluates a confidence for the output of a multiagent syste…

Read

June 11, 2026arXiv · ML

Reward Modeling for Multi-Agent Orchestration

Read

June 11, 2026arXiv · ML

Read

June 11, 2026arXiv · ML

Language operates as a mechanism of both marginalization and resistance, especially for minority communities navigating insensitive and harmful speech online. As content moderation increasingly depends on large language…

Read

June 11, 2026arXiv · AI

Read

June 11, 2026arXiv · AI

Toward Instructions-as-Code: Understanding the Impact of Instruction Files on Agentic Pull Requests

AI-agents (e.g., GitHub Copilot) collaborate as teammates in different software engineering tasks, including code generation proposed through pull requests (Agentic-PRs). For better agent efficiency, developers create in…

Read

June 11, 2026arXiv · AI

Ontology Memory-Augmented ASR Correction for Long Text-Speech Interleaved Conversations

Read

June 11, 2026arXiv · AI

Understanding the Rejection of Fixes Generated by Agentic Pull Requests -- Insights from the AIDev Dataset

AI coding agents are increasingly used to generate pull requests (PRs) that propose code fixes in software projects. From a first exploration of the AIDev dataset, we find that 46.41% of the fixes proposed by the agents…

Read

June 11, 2026arXiv · AI

MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling

Read

June 11, 2026arXiv · AI

SupraBench: A Benchmark for Supramolecular Chemistry

Read

June 11, 2026arXiv · AI

Read

June 11, 2026arXiv · AI

This paper examines three recent frameworks for understanding the cognitive and epistemic consequences of artificial intelligence: Tri-System Theory, Thinkframes, and System 0. It argues that while the first two capture…

Read

June 11, 2026arXiv · AI

June 10, 2026Cohere

Read

June 9, 2026Hugging Face

How an Agent Built a 3D Paris Gallery by Chaining Two Hugging Face Spaces

Read

June 9, 2026Hugging Face

Read

June 9, 2026Google DeepMind

Introducing Gemma 4 12B: a unified, encoder-free multimodal model

Read

June 9, 2026Google DeepMind

Fluid, natural voice translation with Gemini 3.5 Live Translate

Gemini 3.5 Live Translate brings near real-time, natural speech translation to Google AI Studio, Google Translate and Google Meet.

Product Launch

Read

June 5, 2026Cohere

AI for Developers

Read

June 5, 2026Cohere

Company News

Read

June 5, 2026Cohere

Read

June 4, 2026Hugging Face

Read

June 3, 2026Hugging Face

Read

June 3, 2026Anthropic

Read

June 2, 2026GitHub

GitHub Copilot app: The agent-native desktop experience

At Microsoft Build 2026, GitHub introduced new tools, updates, and surfaces so agents can work the way you already work. The post GitHub Copilot app: The agent-native desktop experience appeared first on The GitHub Blog…

Read

June 2, 2026Anthropic

The rise of autonomous, long-running AI agents has introduced a new class of compute demand, namely tasks that maintain large context windows, spawn concurrent...

Read

June 1, 2026Hugging Face

Beyond LLMs: Why Scalable Enterprise AI Adoption Depends on Agent Logic

Read

June 1, 2026Hugging Face

Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains

Read

June 1, 2026Cohere

May 28, 2026Mistral AI

Vibe gets to work.

Read

May 28, 2026Mistral AI

May 28, 2026Anthropic

Read

May 27, 2026Mistral AI

Introducing physics AI at Mistral: the foundation for engineering acceleration.

Read

May 27, 2026Microsoft

How AI coding agents actually use your technology

You ship an SDK, a CLI, an API, and developers use it. Now AI coding agents use it too, except they use it differently than humans do. Most of the time you have no idea what's actually happening between developer types a…

Read

May 27, 2026Hugging Face

Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL

Read

May 27, 2026Hugging Face

Reachy Mini goes fully local

Read

May 27, 2026Cohere

Read

May 26, 2026Anthropic

Anthropic appoints KiYoung Choi as Representative Director of Korea ahead of Seoul office opening

Read

May 25, 2026OpenAI

OpenAI, Grupo Folha and Grupo UOL announce strategic content partnership

OpenAI partners with Grupo Folha and Grupo UOL to bring trusted Brazilian journalism to ChatGPT, expanding access to news with attribution and transparency.

Read

May 25, 2026Hugging Face

Read

May 23, 2026Mistral AI

Read

May 22, 2026Mistral AI

Read

May 21, 2026Google DeepMind

Read

May 20, 2026Cohere

Read

May 19, 2026Hugging Face

OlmoEarth v1.1: A more efficient family of Earth observation models

Read

May 19, 2026Cohere

Read

May 19, 2026Anthropic

Read

May 17, 2026Google DeepMind

Read

May 14, 2026Hugging Face

Read

May 14, 2026Anthropic

Read

May 11, 2026Hugging Face

Read

May 6, 2026Hugging Face

Read

April 27, 2026Hugging Face

Read

April 24, 2026Netflix

Scaling Camera File Processing at Netflix

Orchestrating Media Workflows Through Strategic Collaboration. Authors: Eric Reinecke, Bhanu Srikanth. Introduction to Content Hub’s Media Production Suite: At Netflix, we want to provide filmmakers with the tools they nee…

Read

April 24, 2026Hugging Face

DeepSeek-V4: a million-token context that agents can actually use

Read

April 24, 2026Discord

Measure Less to Learn More: Using Fewer, Higher-quality Metrics to Capture What Matters

Too many experiment metrics can make meaningful changes harder to detect. Learn how Discord used simulations and Principal Component Analysis to maximize signal and reduce noise.

Read

April 24, 2026Cohere

Cohere Aleph Alpha Join Forces

Read

April 24, 2026Anthropic

Read

April 22, 2026Yelp

How Yelp Keeps Server-Driven UI Consistent Across Four Platforms

If you’ve read our earlier post, you already know about CHAOS—the server-driven UI (SDUI) framework we built at Yelp that powers our dynamic views. Until now, we’ve explored its architecture, backend implementation, and…

Read

April 22, 2026Spotify

Background Coding Agents: Supercharging Downstream Consumer Dataset Migrations (Honk, Part 4)

How we used Honk, Backstage, and Fleet Management to ease the pain of migrating thousands of datasets. The post Background Coding Agents: Supercharging Downstream Consumer Dataset Migrations (Honk, Part 4) appeared first…

Read

April 22, 2026Google DeepMind

Decoupled DiLoCo: A new frontier for resilient, distributed AI training

Read

April 21, 2026Meta

Modernizing the Facebook Groups Search to Unlock the Power of Community Knowledge

We’ve fundamentally transformed Facebook Groups Search to help people more reliably discover, sort through, and validate community content that’s most relevant to them. We’ve adopted a new hybrid retrieval architecture a…

Read

April 21, 2026Hugging Face

AI and the Future of Cybersecurity: Why Openness Matters

Read

April 21, 2026Hugging Face

QIMMA قِمّة ⛰: A Quality-First Arabic LLM Leaderboard

Read

April 21, 2026Google DeepMind

Partnering with industry leaders to accelerate AI transformation

Google DeepMind partners with global consultancies to bring the power of frontier AI to organizations around the world.

Read

April 21, 2026Cohere

Read

April 17, 2026Netflix

The Human Infrastructure: How Netflix Built the Operations Layer Behind Live at Scale

By: Brett Axler, Casper Choffat, and Alo Lowry. In the three years since our first Live show, Chris Rock: Selective Outrage, we have witnessed an incredible expansion of our live content slate and the live operations t…

Read

April 17, 2026Anthropic

Read

April 16, 2026Hugging Face

The PR you would have opened yourself

Read

April 16, 2026Hugging Face

Ecom-RLVE: Adaptive Verifiable Environments for E-Commerce Conversational Agents

Read

April 16, 2026Discord

Making Discord on Desktop Look Just Right: Display Settings to Ease the Eyes

Learn all sorts of toggles, options, and features on Discord’s desktop app to help you view media at your pace, lower the strength of colors across the app, and make app content easier to see.

Read

April 13, 2026Google DeepMind

Gemini Robotics-ER 1.6: Powering real-world robotics tasks through enhanced embodied reasoning

Gemini Robotics ER 1.6: Enhancing spatial reasoning and multi-view understanding for autonomous robotics.

Read

April 10, 2026Netflix

Evaluating Netflix Show Synopses with LLM-as-a-Judge

By Gabriela Alessio, Cameron Taylor, and Cameron R. Wolfe. Introduction: When members log into Netflix, one of the hardest choices is what to watch. The challenge isn’t a lack of options — there are thousands of titles —…

Read

April 9, 2026Hugging Face

Multimodal Embedding & Reranker Models with Sentence Transformers

Read

April 9, 2026Hugging Face

Waypoint-1.5: Higher-Fidelity Interactive Worlds for Everyday GPUs

Read

April 8, 2026Pinterest

Performance for Everyone

Author: Lin Wang (Android Performance Engineer) Default Feature For mobile apps, performance is considered as the “default feature”, which means apps are expected to run fast and be responsive. It’s just as if we expect…

Read

April 8, 2026Hugging Face

Read

April 2, 2026Dropbox

Improving storage efficiency in Magic Pocket, our immutable blob store

By turning compaction into a layered, adaptive pipeline and strengthening our monitoring and controls, we made Magic Pocket more resilient to workload changes.

Read

April 2, 2026Google DeepMind

Gemma 4: Byte for byte, the most capable open models

Gemma 4: Our most intelligent open models to date, purpose-built for advanced reasoning and agentic workflows.

Read

April 2, 2026Cohere

The Enterprise AI Maturity Model

Read

April 1, 2026Hugging Face

Any Custom Frontend with Gradio's Backend

Read

April 1, 2026Hugging Face

Read

March 31, 2026Hugging Face

Training mRNA Language Models Across 25 Species for $165

Read

March 31, 2026Hugging Face

Granite 4.0 3B Vision: Compact Multimodal Intelligence for Enterprise Documents

Read

March 31, 2026Cohere

Ensemble Brings Agentic AI to RCM Platform with Cohere

Read

March 31, 2026Anthropic

Read

March 27, 2026Discord

How Multi-Factor Authentication Helps Keep Your Discord Account Safe

A Discord account is more than just your username and avatar. That’s why it’s important to help keep your account safe and secure by using Multi-Factor Authentication, SMS Backup Authentication & QR Code Login. Learn how…

Read

March 26, 2026Google DeepMind

Gemini 3.1 Flash Live: Making audio AI more natural and reliable

Our latest voice model has improved precision and lower latency to make voice interactions more fluid, natural and precise.

Read

March 26, 2026Cohere

Read

March 24, 2026Discord

Discord Update: March 24, 2026 Changelog

Here’s the Discord Changelog from March 24, 2026, so you can stay informed on what’s new in recent app updates!

Read

March 23, 2026Mistral AI

Speaking of Voxtral

Read

March 20, 2026Hugging Face

Read

March 17, 2026Dropbox

How we optimized Dash's relevance judge with DSPy

We used DSPy to turn prompt engineering for our relevance judge into a measurable, automated optimization loop, improving task performance, cost, and how reliably it works in production.

Read

March 17, 2026Google DeepMind

Measuring progress toward AGI: A cognitive framework

We’re introducing a framework to measure progress toward AGI, and launching a Kaggle hackathon to build the relevant evaluations.

Read

March 16, 2026Mistral AI

Mistral AI partners with NVIDIA to accelerate open frontier models

Read

March 16, 2026Mistral AI

Leanstral: Open-Source foundation for trustworthy vibe-coding

Read

March 16, 2026Mistral AI

Introducing Mistral Small 4

Read

March 16, 2026Discord

How ROOST is Advancing Online Safety

The threat landscape online has shifted dramatically. Many online platforms are left to reinvent safety tools from scratch. That’s the gap ROOST was built to close — and it’s why open-sourcing battle-tested tools like Os…

Read

March 16, 2026Cohere

Read

March 11, 2026Mistral AI

Rails testing on autopilot: Building an agent that writes what developers won't

Read

March 11, 2026Anthropic

Introducing The Anthropic Institute

Read

March 10, 2026Anthropic

Read

March 5, 2026Anthropic

Read

February 24, 2026Mozilla

Goodbye innerHTML, Hello setHTML: Stronger XSS Protection in Firefox 148

Cross-site scripting (XSS) remains one of the most prevalent vulnerabilities on the web. The new standardized Sanitizer API provides a straightforward way for web developers to sanitize untrusted HTML before inserting it…

Read

February 24, 2026Discord

Getting Global Age Assurance Right: What We Got Wrong and What's Changing

Discord’s CTO addresses community concerns about age assurance: no mass ID collection, new vendor transparency commitments, and a global launch delayed until the second half of 2026.

Read

February 23, 2026Anthropic

Read

February 17, 2026Anthropic

Introducing Claude Sonnet 4.6

Read

February 16, 2026Anthropic

Anthropic opens Bengaluru office and announces new partnerships across India

Read

February 13, 2026Cohere

Cohere signs world chess champion Magnus Carlsen as brand ambassador

Read

February 13, 2026Anthropic

Read

February 12, 2026Anthropic

Anthropic raises $30 billion in Series G funding at $380 billion post-money valuation

Read

February 11, 2026Dropbox

Insights from our executive roundtable on AI and engineering productivity

From Claude Code to Cursor, we're big adopters of AI coding tools at Dropbox. The early results have been promising, but there are still a lot of open questions about how to work with these tools most effectively and whe…

Read

February 11, 2026Anthropic

Read

January 29, 2026Grab

Cursor at Grab: Adoption and impact

Adoption overview The illustration below encapsulates how Cursor is scaled across Grab, achieving rapid and widespread adoption that accelerated software development and empowered non-technical teams to build solutions.…

Read

January 29, 2026Google DeepMind

Project Genie: Experimenting with infinite, interactive worlds

Google AI Ultra subscribers in the U.S. can try out Project Genie, an experimental research prototype that lets you create and explore worlds.

Read

January 28, 2026Dropbox

Engineering VP Josh Clemm on how we use knowledge graphs, MCP, and DSPy in Dash

Engineering VP Josh Clemm deep-dives into how we think about knowledge graphs, indexes, MCP, and prompt optimization using tools like DSPy.

Read

January 28, 2026Cohere

Model Vault: a private platform for secure model inference

Read

January 28, 2026Anthropic

ServiceNow chooses Claude to power customer apps and increase internal productivity

Read

January 27, 2026Mistral AI

Terminally online Mistral Vibe.

Read

January 21, 2026Mistral AI

Read

January 13, 2026Google DeepMind

Veo 3.1 Ingredients to Video: More consistency, creativity and control

Our latest Veo update generates lively, dynamic clips that feel natural and engaging — and supports vertical video generation.

Read

January 10, 2026BAIR (Berkeley)

Information-Driven Design of Imaging Systems

An encoder (optical system) maps objects to noiseless images, which noise corrupts into measurements. Our information estimator uses only these noisy measurements and a noise model to quantify how well measurements disti…

Read

January 6, 2026Lyft

Lyft’s Feature Store: Architecture, Optimization, and Evolution

Written by Rohan Varshney, with support from Devon Mittow Janice Lee. This article expands upon a presentation from the Feature Store Summit 2025, which can be viewed in full here. There is also another video availabl…

Read

December 23, 2025Google DeepMind

Google's year in review: 8 areas with research breakthroughs in 2025

Google 2025 recap: Research breakthroughs of the year

Read

December 17, 2025Mistral AI

Read

December 15, 2025Lyft

From Python3.8 to Python3.10: Our Journey Through a Memory Leak

Image generated with ChatGPT (OpenAI), 2025. Intro When working with Python, memory management often feels like a solved problem. The garbage collector quietly does its job, and unlike C or C++, we rarely think about mal…

Read

December 12, 2025Discord

How to Make and Use Custom Emoji on Discord

Emojis on Discord are special — you can make a little picture out of almost any symbol, in-joke, or bizarre late-night inspiration.

Read

December 12, 2025Google DeepMind

Improved Gemini audio models for powerful voice experiences

Read

December 11, 2025Cohere

Introducing Rerank 4: Cohere’s most powerful reranker yet

Read

December 10, 2025Cohere

Building Trust in AI: Cohere’s Approach to AI Governance

Read

December 9, 2025Mistral AI

Read

December 2, 2025Discord

Bringing In-Game Commerce to Discord Communities

By bringing commerce directly to official game communities, we’re giving game developers the opportunity to benefit from these dynamics, creating incremental revenue that complements their existing storefronts.

Read

December 2, 2025Discord

Save and Display Your Faves: Add Discord Shop & Marvel Rivals Items to Your Profile’s Wishlist

Keep track of all the stuff from the Shop you’ve been wanting to purchase with the new Wishlist feature. Display the stuff you’ve been eyeing on your profile, and if you’re lucky enough, maybe one of your friends may see…

Read

December 1, 2025Slack

Streamlining Security Investigations with Agents

Slack’s Security Engineering team is responsible for protecting Slack’s core infrastructure and services. Our security event ingestion pipeline handles billions of events per day from a diverse array of data sources. Rev…

Read

November 27, 2025Cohere

Read

November 18, 2025Lyft

LyftLearn Evolution: Rethinking ML Platform Architecture

Written by Yaroslav Yatsiuk At Lyft, machine learning (ML) is the engine behind our most critical business functions — from dispatch and pricing optimization to fraud detection and support automation. Our ML infrastructu…

Read

November 14, 2025Discord

How to Share What You’re Playing, Listening to, or Watching as Your Status on Discord

Playing a game right now? Listening to some tunes, or catching up on that one anime your friends won’t stop talking about? Learn how to show off what you’re up to as your Discord status and show @everyone what’s up!

Read

November 14, 2025Discord

How to Link Discord to Battlefield 6, Marvel Rivals & More

Some of the most popular multiplayer games have added the ability to directly link your Discord account to the game! Learn how to link your Discord account to some big-name titles and see what sorta perks it provides.

Read

November 13, 2025Instacart

Building The Intent Engine: How Instacart is Revamping Query Understanding with LLMs

Authors: Yuanzheng Zhu, Guanghua Shu, Raochuan Fan, Vinesh Gudla, Tejaswi Tenneti Introduction When people search for items on Instacart, they don’t always type perfectly worded phrases. They might write “bread no gluten…

Read

November 13, 2025Cohere

HIPAA Business Associate agreements for custom model development

Read

November 12, 2025Cohere

Read

October 29, 2025Etsy

Improving performance by prefetching product pages from Etsy Search

Rarely are there opportunities for big, bold, game-changing improvements in web performance. The Speculation Rules API (SRA) is a recent browser development that offers just such an opportunity. This post details a joint…

Read

October 24, 2025Mistral AI

Read

September 26, 2025Yelp

S3 server access logs at scale

Introduction Yelp heavily relies on Amazon S3 (Simple Storage Service) to store a wide variety of data, from images, logs, database backups, and more. Since data is stored on the cloud, we need to carefully manage how th…

Read

September 25, 2025Discord

New Looks for Nitro, New Looks for You. Get Yourself a Nitro-exclusive Profile Bundle!

Between now and September 30th, 2025, new and existing Nitro members can claim a profile bundle matching Nitro’s new look, including an Avatar Decoration, Profile Effect, and a Nameplate! Open this blog to see the detail…

Read

September 25, 2025Discord

Discord Update: September 25, 2025 Changelog

Here’s the Discord Changelog from September 25, 2025, so you can stay informed on what’s new in recent app updates!

Read

September 24, 2025Cohere

Cohere adds $100M in second close to latest round as it scales security-first enterprise AI

Read

September 23, 2025Cohere

Exploring AI in Education

Read

September 15, 2025Cohere

Cohere opens Paris office as EMEA hub

Read

September 12, 2025Discord

Staff Picks, September 2025: Welcome to Our Video Game Museum

Happy National Video Games Day! We’re hearing from Veronica, Scott, Tyler, and Anni about which games would go in their hypothetical video games museum. Nothing better than honoring a beloved game by putting it behind a…

What Are AI Benchmarks? A Business Guide for Evaluations

Read

July 3, 2025Mistral AI

Announcing AI for Citizens

Read

July 3, 2025Cohere

Cohere ouvre un bureau à Montréal

Read

July 1, 2025BAIR (Berkeley)

Whole-Body Conditioned Egocentric Video Prediction

Read

June 27, 2025Cohere

Cohere achieves ISO 42001 and ISO 27001 certifications

Read

June 15, 2025Cohere

The International Conference on Computer Vision (ICCV 2021) will be hosted virtually next week. We’re excited to share all the work from SAIL that will be presented, and you’ll find links to papers, videos and blogs belo…

Read

Statement on the US government directive to suspend access to Fable 5 and Mythos 5

How Preply combines AI and human tutors to personalize learning

olmo-eval: An evaluation workbench for the model development loop

Scaling out Distroless adoption With AI

How we made GitHub Copilot CLI more selective about delegation

How Dropbox uses MCP and Dash to close the design-to-code security gap

Scaling Security Insights: how we achieved a 10x increase in global scanning capacity

Results from the first Anthropic Public Record

TCS and Anthropic partner to bring Claude to regulated industries

Stripe Projects adds new agent integrations, more providers, and custom developer controls

Agentic Testing: Where Agents Fit in the E2E Testing Stack

How MuleSoft Is Raising the Trust Bar for AI-Generated Code

OpenAI to acquire Ona

BBVA puts AI at the core of banking with OpenAI

How an astrophysicist uses Codex to help simulate black holes

Supporting Europe’s work in ensuring a trustworthy AI ecosystem

One-Click Multi-Tenant Security with NVIDIA Quantum InfiniBand

Your agent just scaffolded a project from 2020

Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP

GitLab Patch Release: 19.0.2, 18.11.5, 18.10.8

Making secret scanning more trustworthy: Reducing false positives at scale

GitHub availability report: May 2026

Zanele Munyikwa

MiniPIC: Flexible Position-Independent Caching in <100LOC

HyPE: Category-Aware Hypergraph Encoding with Persistent Edge Embeddings for Persona-Grounded Dialogue

NTS-CoT: Mitigating Hallucinations in LLM-based News Timeline Summarization with Chain-of-Thought Reasoning

Getting Better at Working With You: Compiling User Corrections into Runtime Enforcement for Coding Agents

MemRefine: LLM-Guided Compression for Long-Term Agent Memory

LAUKIN: A Multi-jurisdictional Common Law Contract Dataset

A Context-Aware Dataset for Stance Detection in Bioethical Controversies on Reddit

SICI: A Semantic-Pragmatic Complexity Index Reveals Regime Shifts in LLM Stance Detection

Understanding helpfulness and harmless tension in reward models

Layer-Resolved Optimal Transport for Hallucination Detection in NMT and Abstractive Summarization

When Similar Means Different: Evaluating LLMs on Arabic--Hebrew Cognates

PolyAlign: Conditional Human-Distribution Alignment

ComAct: Reframing Professional Software Manipulation via COM-as-Action Paradigm

Evaluating Pluralism in LLMs through Latent Perspectives

TimeLens: On-Device Artifact Recognition with Retrieval-Augmented Question Answering for the Grand Egyptian Museum

Cross-Modal Masked Compositional Concept Modeling for Enhancing Visio-Linguistic Compositionality

RogueAI: A Reverse Turing Test for Detecting Licensed AI Deception in Dialogue

SkillCAT: Contrastive Assessment and Topology-Aware Skill Self-Evolution for LLM Agents

Low-Latency Real-Time Audio Game Commentary System via LLM-Based Parallel Text Generation

IVIE: A Neuro-symbolic Approach to Incremental and Validated Generation of Interactive Fiction Worlds

From Passive Generation to Investigation: A Proactive Scientific Peer Review Agent

An End-to-End Hybrid Framework for Rumour Detection in Low-Resources Algerian Dialect

S-GBT: Smooth Growth Bound Tensor for Certified Robustness Against Word Substitution Attacks in NLP

Why Sampling Is Not Choosing: Intentionality, Agency, and Moral Responsibility in Large Language Models

Examining the Cognitive Gap Between Authors and Peer Reviewers on Academic Paper Novelty

Leveraging Audio-LLMs to Filter Speech-to-Speech Training Data

When Does Mixing Help? Analyzing Query Embedding Interpolation in Multilingual Dense Retrieval

Adaptive Turn-Taking for Real-time Multi-Party Voice Agents

Uncertainty-Aware Hybrid Retrieval for Long-Document RAG

Edit the Bits, Diff the Codes: Bitwise Residual Editing for Visual Autoregressive Models

ArogyaSutra: A Multi-Agent Framework for Multimodal Medical Reasoning in Indic Languages

LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories

The Tone of Awareness: Topic, Sentiment, and Toxicity Maps During Mental Health Month on TikTok

Beyond the Commitment Boundary: Probing Epiphenomenal Chain-of-Thought in Large Reasoning Models

One Polluted Page Is Enough: Evaluating Web Content Pollution in Generative Recommenders

Beyond Uniform Tokens: Adaptive Compression for Time Series Language Models

From Tokens to Faces: Investigating Discrete Speech Representations for 3D Facial Animation

Operads for compositional reasoning in LLMs

Recursive Agent Harnesses

SkMTEB: Slovak Massive Text Embedding Benchmark and Model Adaptation

Operadic consistency: a label-free signal for compositional reasoning failures in LLMs

EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery

HyperTool: Beyond Step-Wise Tool Calls for Tool-Augmented Agents

Influcoder: Distilling Decoders' Gradient Influence Rankings into an Encoder for Data Attribution

Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning

EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments

Clipping Makes Distributed and Federated Asynchronous SGD Robust to Stragglers

Simultaneous Latent Budget Trees for Stratified Classification

Quantizing Time-Series Models As Dynamical Systems: Trajectory-Based Quantization Sensitivity Score