Sentence embedding - Revision history

RobowaifuDev: Added DiffCSE

2023-01-09T11:21:01Z

← Older revision		Revision as of 04:21, 9 January 2023
Line 3:		Line 3:
	=== State of the art ===		=== State of the art ===
	As of October 2022, [[CLIP]] text embeddings from a 38M parameter model have been found to outperform [[BERT]] and Phrase-BERT 110M parameter models, when using [[domain aware prompting]] on sentences from news articles ([[CoNLL-2003]]), chemical-disease interactions ([[BC5CDR]]), and emerging and rare entity recognition ([[WNUT 2017]]).<ref>An Yan, Jiacheng Li, Wanrong Zhu, Yujie Lu, William Yang Wang, Julian McAuley. "CLIP also Understands Text: Prompting CLIP for Phrase Understanding." 2022; [https://arxiv.org/abs/2210.05836 arXiv:2210.05836]</ref> Without domain aware prompting, CLIP still outperformed other models on sentences from news articles.		As of October 2022, [[CLIP]] text embeddings from a 38M parameter model have been found to outperform [[BERT]] and Phrase-BERT 110M parameter models, when using [[domain aware prompting]] on sentences from news articles ([[CoNLL-2003]]), chemical-disease interactions ([[BC5CDR]]), and emerging and rare entity recognition ([[WNUT 2017]]).<ref>An Yan, Jiacheng Li, Wanrong Zhu, Yujie Lu, William Yang Wang, Julian McAuley. "CLIP also Understands Text: Prompting CLIP for Phrase Understanding." 2022; [https://arxiv.org/abs/2210.05836 arXiv:2210.05836]</ref> Without domain aware prompting, CLIP still outperformed other models on sentences from news articles.

			As of April 2022, [[DiffCSE]] achieves state-of-the-art results in unsupervised sentence representation learning.<ref>Chuang et al. "DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings." 2022. [https://arxiv.org/abs/2204.10298 arXiv:2204.10298]</ref>

	=== Pretrained models ===		=== Pretrained models ===

RobowaifuDev: Added references heading

2022-10-13T19:15:10Z

← Older revision		Revision as of 12:15, 13 October 2022
Line 6:		Line 6:
	=== Pretrained models ===		=== Pretrained models ===
	[https://www.sbert.net/docs/pretrained_models.html Sentence transformers] provides a variety of pretrained models for sentence embeddings.		[https://www.sbert.net/docs/pretrained_models.html Sentence transformers] provides a variety of pretrained models for sentence embeddings.

			=== References ===

RobowaifuDev: Created page with "A '''sentence embedding''' is a technique in natural language processing where sentences are mapped to vectors and can be used for similarity..."

2022-10-13T19:14:31Z

New page

A '''sentence embedding''' is a technique in [[Natural language processing|natural language processing]] where sentences are mapped to vectors and can be used for [[similarity search]]. In transformer models this is usually achieved with a [[classification token]], but it can also be done by taking the hidden state of the first token of a [[Transformer|transformer encoder]] or by mean pooling over all tokens, using the last layer or multiple layers.
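
The pooling strategies described above can be illustrated with a minimal sketch. It does not run a real model: the per-token hidden states are hand-written toy values, and the function names (<code>cls_pooling</code>, <code>mean_pooling</code>, <code>cosine_similarity</code>) are hypothetical helpers chosen for this example, not part of any library. In practice the hidden states would come from a transformer encoder, and the attention mask would mark which positions are padding.

```python
import math

# Toy stand-ins for per-token hidden states from an encoder's last layer.
# Each inner list is one token vector; real models use hundreds of dimensions.
# These values are invented for illustration only.
sentence_a = [[1.0, 0.0], [0.0, 2.0], [2.0, 1.0], [9.0, 9.0]]
mask_a = [1, 1, 1, 0]  # the last position is padding and must be ignored
sentence_b = [[0.5, 0.5], [1.5, 1.5]]
mask_b = [1, 1]


def cls_pooling(hidden_states):
    """Return the vector at the first position (the classification token)."""
    return list(hidden_states[0])


def mean_pooling(hidden_states, attention_mask):
    """Average the token vectors, skipping positions where the mask is 0."""
    dim = len(hidden_states[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(hidden_states, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [t / count for t in total]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, a common similarity score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


emb_a = mean_pooling(sentence_a, mask_a)
emb_b = mean_pooling(sentence_b, mask_b)
print("CLS embedding of A:", cls_pooling(sentence_a))
print("Mean-pooled A:", emb_a)
print("Mean-pooled B:", emb_b)
print("Cosine similarity:", cosine_similarity(emb_a, emb_b))
```

Similarity search then amounts to ranking stored sentence vectors by a score such as this cosine similarity against the query vector. Masking out padding before averaging matters, because padded positions would otherwise pull the mean toward values that carry no meaning.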

=== State of the art ===
As of October 2022, [[CLIP]] text embeddings from a 38M parameter model have been found to outperform [[BERT]] and Phrase-BERT 110M parameter models, when using [[domain aware prompting]] on sentences from news articles ([[CoNLL-2003]]), chemical-disease interactions ([[BC5CDR]]), and emerging and rare entity recognition ([[WNUT 2017]]).<ref>An Yan, Jiacheng Li, Wanrong Zhu, Yujie Lu, William Yang Wang, Julian McAuley. "CLIP also Understands Text: Prompting CLIP for Phrase Understanding." 2022; [https://arxiv.org/abs/2210.05836 arXiv:2210.05836]</ref> Without domain aware prompting, CLIP still outperformed other models on sentences from news articles.

=== Pretrained models ===
[https://www.sbert.net/docs/pretrained_models.html Sentence transformers] provides a variety of pretrained models for sentence embeddings.