GPU Papers - Zhuanzhi

GPU

ML-Based Optimum Sub-system Size Heuristic for the GPU Implementation of the Tridiagonal Partition Method

Arxiv

0+ reads · October 31

Accelerating Mixture-of-Experts Inference by Hiding Offloading Latency with Speculative Decoding

Arxiv

0+ reads · October 31

AMD MI300X GPU Performance Analysis

Arxiv

0+ reads · October 31

Intelligent Software System for Low-Cost, Brightfield Segmentation: Algorithmic Implementation for Cytometric Auto-Analysis

Arxiv

0+ reads · October 31

Tokencake: A KV-Cache-centric Serving Framework for LLM-based Multi-Agent Applications

Arxiv

0+ reads · October 31

Dynamic Risk Assessments for Offensive Cybersecurity Agents

Arxiv

0+ reads · October 30

Running VLAs at Real-time Speed

Arxiv

0+ reads · October 30

Inference-Cost-Aware Dynamic Tree Construction for Efficient Inference in Large Language Models

Arxiv

0+ reads · October 30

Analysis and Optimized CXL-Attached Memory Allocation for Long-Context LLM Fine-Tuning

Arxiv

0+ reads · October 30

VeriLLM: A Lightweight Framework for Publicly Verifiable Decentralized Inference

Arxiv

0+ reads · October 30

GPU-Accelerated Primal Heuristics for Mixed Integer Programming

Arxiv

0+ reads · October 30

Oneiros: KV Cache Optimization through Parameter Remapping for Multi-tenant LLM Serving

Arxiv

0+ reads · October 29

mitransient: Transient light transport in Mitsuba 3

Arxiv

0+ reads · October 29

Dynamic Risk Assessments for Offensive Cybersecurity Agents

Arxiv

0+ reads · October 29

Scalable GPU-Based Integrity Verification for Large Machine Learning Models

Arxiv

0+ reads · October 27
