Model Performance Papers - Zhuanzhi

Model Performance

Fixing It in Post: A Comparative Study of LLM Post-Training Data Quality and Model Performance

arXiv

Dec 15

Mixture of Attention Spans: Optimizing LLM Inference Efficiency with Heterogeneous Sliding-Window Lengths

arXiv

Nov 24

MultiBanana: A Challenging Benchmark for Multi-Reference Text-to-Image Generation

arXiv

Nov 28

Can Language Models Discover Scaling Laws?

arXiv

Dec 15

Reflections on the Reproducibility of Commercial LLM Performance in Empirical Software Engineering Studies

arXiv

Nov 14

Rectifying LLM Thought from Lens of Optimization

arXiv

Dec 1

HUME: Measuring the Human-Model Performance Gap in Text Embedding Tasks

arXiv

Dec 4

Bant: Byzantine Antidote via Trial Function and Trust Scores

arXiv

Dec 4

AtomDisc: An Atom-level Tokenizer that Boosts Molecular LLMs and Reveals Structure–Property Associations

arXiv

Nov 28

Adjusting for Heavy Censoring and Double-Dipping to Compare Risk Stratification Abilities of Existing Models for Time to Diagnosis of Huntington Disease

arXiv

Nov 5

Script Gap: Evaluating LLM Triage on Indian Languages in Native vs Roman Scripts in a Real World Setting

arXiv

Dec 11

Datasets, Documents, and Repetitions: The Practicalities of Unequal Data Quality

arXiv

Nov 6

Learning from Sufficient Rationales: Analysing the Relationship Between Explanation Faithfulness and Token-level Regularisation Strategies

arXiv

Nov 20

Extendable Planning via Multiscale Diffusion

arXiv

Nov 16

Mind the Gap... or Not? How Translation Errors and Evaluation Details Skew Multilingual Results

arXiv

Nov 7
