Sentence embedding models aim to provide general-purpose embeddings for sentences. Most of the models studied in this paper claim to perform well on STS tasks, but they do not report on their suitability for clustering. This paper looks at four recent sentence embedding models: Universal Sentence Encoder (Cer et al., 2018), Sentence-BERT (Reimers and Gurevych, 2019), LASER (Artetxe and Schwenk, 2019), and DeCLUTR (Giorgi et al., 2020). It gives a brief overview of the ideas behind their implementations. It then investigates how well topic classes in two text classification datasets (Amazon Reviews (Ni et al., 2019) and News Category Dataset (Misra, 2018)) map to clusters in their corresponding sentence embedding spaces. While the performance of the resulting classification model is far from perfect, it is better than random. This is interesting because the classification model has been constructed in an unsupervised way. The topic classes in these real-life topic classification datasets can thus be partly reconstructed by clustering the corresponding sentence embeddings.
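The evaluation idea described above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the embeddings here are synthetic Gaussian stand-ins (a real run would embed sentences with one of the four models, e.g. Sentence-BERT), and names such as `cluster_to_class` are our own. Clusters found without labels are mapped to topic classes by majority vote, and the resulting accuracy is compared against the random baseline.

```python
# Sketch: cluster "sentence embeddings" without labels, map each cluster to a
# topic class by majority vote, and compare accuracy against random guessing.
# Embeddings are synthetic blobs standing in for a real embedding model.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_classes, per_class, dim = 3, 100, 16

# One Gaussian blob per topic class, playing the role of an embedding space.
centers = rng.normal(size=(n_classes, dim)) * 3.0
X = np.vstack([centers[c] + rng.normal(size=(per_class, dim))
               for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), per_class)

# Unsupervised step: cluster the embeddings, ignoring the true labels.
clusters = KMeans(n_clusters=n_classes, n_init=10, random_state=0).fit_predict(X)

# Majority vote: each cluster is assigned the most frequent true class in it.
cluster_to_class = {k: np.bincount(y[clusters == k]).argmax()
                    for k in np.unique(clusters)}
pred = np.array([cluster_to_class[k] for k in clusters])

accuracy = (pred == y).mean()
print(f"cluster-based accuracy: {accuracy:.2f} "
      f"(random baseline: {1 / n_classes:.2f})")
```

If the topic classes form (even loose) clusters in the embedding space, this unsupervised classifier beats the random baseline, which is exactly the effect the paper measures on real datasets.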