

February 26, 2021

SLAP: A Split Latency Adaptive VLIW pipeline architecture which enables on-the-fly variable SIMD vector-length


Ashish Shrivastava, Alan Gatherer, Tong Sun, Sushma Wokhlu, Alex Chandra

From arXiv; selected for the ICASSP 2021 conference

Over the last decade, the relative latency of shared-memory access in multicore systems has increased, as wire resistance came to dominate latency and low-wire-density layouts pushed multiport memories farther away from their ports. Various techniques were deployed to improve average memory access latency, such as speculative prefetching and branch prediction, often leading to high variance in execution time, which is unacceptable in real-time systems. Smart DMAs can be used to copy data directly into a layer-1 SRAM, but with overhead. The VLIW architecture, the de facto signal processing engine, suffers badly from a breakdown in lockstep execution of scalar and vector instructions. We describe the Split Latency Adaptive Pipeline (SLAP) VLIW architecture, a cache performance improvement technology that requires zero change to object code, while removing smart DMAs and their overhead. SLAP builds on the Decoupled Access and Execute concept by 1) breaking lockstep execution of functional units, 2) enabling variable vector length for variable data-level parallelism, and 3) adding a novel triangular load mechanism. We discuss the SLAP architecture and demonstrate the performance benefits on real traces from a wireless baseband system (where even the most compute-intensive functions suffer from an Amdahl's law limitation due to a mixture of scalar and vector processing).
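The Amdahl's law limitation mentioned at the end of the abstract can be made concrete with a small calculation. The sketch below is illustrative only: the function name `amdahl_speedup`, the 90% vector fraction, and the lane counts are assumptions chosen for the example, not figures from the paper.

```python
def amdahl_speedup(vector_fraction: float, lanes: int) -> float:
    """Ideal speedup when only `vector_fraction` of the work scales with `lanes`.

    The remaining (scalar) fraction is assumed to run at the original speed.
    """
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be in [0, 1]")
    if lanes < 1:
        raise ValueError("lanes must be >= 1")
    scalar_fraction = 1.0 - vector_fraction
    return 1.0 / (scalar_fraction + vector_fraction / lanes)


if __name__ == "__main__":
    # Hypothetical workload: 90% of the work is vectorizable.
    for lanes in (4, 16, 64):
        print(f"{lanes:3d} lanes: {amdahl_speedup(0.9, lanes):.2f}x")
```

Under these assumed numbers the speedup stays below 10x no matter how wide the vector unit is, because the 10% scalar portion bounds it at 1 / (1 - 0.9). This is the sense in which a mixture of scalar and vector processing limits even compute-intensive functions.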


