
Published on 01 January 2026

Claim2Vec: Embedding Fact-Check Claims for Multilingual Similarity and Clustering

Panchendrarajan, Rrubaa; Zubiaga, Arkaitz

Description

This repository contains the training and test data used in the experiments with the Claim2Vec model, the first multilingual embedding model optimized to represent fact-check claims as vectors in an improved semantic embedding space.

MultiClaim - Train
49K multilingual claim pairs, each annotated as similar or dissimilar using three large language models. All claim pairs belong to topic group 1.
Content:
CID_1, CID_2 - IDs of the fact-checked claims in the original MultiClaimNet dataset
CLAIM_1, CLAIM_2 - Claims in their original language
Translation_1, Translation_2 - English translations of the claims
Language_1, Language_2 - Language of each claim
Label - 1 if the pair is similar, 0 if it is dissimilar

MultiClaim - Test
A subset of the MultiClaimNet clusters that discuss topic group 2. This set comprises 42.4K claims grouped into 16K clusters.

Preprint: https://arxiv.org/abs/2604.09812

References
If you use claim pairs from the Claim2Vec research in any publication, project, tool, or other work, please cite the following paper:

@misc{panchendrarajan2026claim2ve,
      title={Claim2Vec: Embedding Fact-Check Claims for Multilingual Similarity and Clustering},
      author={Rrubaa Panchendrarajan and Arkaitz Zubiaga},
      year={2026},
      eprint={2604.09812},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2604.09812},
}
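The following is a minimal Python sketch for reading the MultiClaim - Train pairs and checking the balance of similar and dissimilar labels. It assumes the pair file is a CSV with a header row whose column names match those listed above (CID_1, CID_2, CLAIM_1, CLAIM_2, Translation_1, Translation_2, Language_1, Language_2, Label); the actual file format and header spelling are not stated here, so adjust the reader if they differ. The names ClaimPair, read_pairs and label_counts are hypothetical helpers, and the sample rows are invented for illustration, not taken from the dataset.

```python
import csv
import io
from collections import Counter
from typing import Iterable, NamedTuple


class ClaimPair(NamedTuple):
    """One annotated pair (hypothetical container for the documented columns)."""
    cid_1: str
    cid_2: str
    claim_1: str        # claim in its original language
    claim_2: str
    translation_1: str  # English translation of claim_1
    translation_2: str  # English translation of claim_2
    language_1: str
    language_2: str
    label: int          # 1 = similar, 0 = dissimilar


def read_pairs(lines: Iterable[str]) -> list[ClaimPair]:
    # Assumption: comma-separated file with a header row using the documented
    # column names. For a TSV file, pass delimiter="\t" to DictReader.
    reader = csv.DictReader(lines)
    return [
        ClaimPair(
            cid_1=row["CID_1"],
            cid_2=row["CID_2"],
            claim_1=row["CLAIM_1"],
            claim_2=row["CLAIM_2"],
            translation_1=row["Translation_1"],
            translation_2=row["Translation_2"],
            language_1=row["Language_1"],
            language_2=row["Language_2"],
            label=int(row["Label"]),
        )
        for row in reader
    ]


def label_counts(pairs: list[ClaimPair]) -> Counter:
    # Number of similar (1) and dissimilar (0) pairs.
    return Counter(p.label for p in pairs)


if __name__ == "__main__":
    # Invented sample rows for illustration only.
    sample = io.StringIO(
        "CID_1,CID_2,CLAIM_1,CLAIM_2,Translation_1,Translation_2,"
        "Language_1,Language_2,Label\n"
        "a1,a2,hola mundo,hello world,hello world,hello world,es,en,1\n"
        "b1,b2,claim one,claim two,claim one,claim two,en,en,0\n"
    )
    pairs = read_pairs(sample)
    print(label_counts(pairs))
    # For a real file: open(path, newline="", encoding="utf-8") and pass the handle.
```

Working on the English translations (Translation_1, Translation_2) alongside the original-language claims makes it easy to compare monolingual and cross-lingual similarity on the same pairs.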


Publication Details

DOI

Publisher: Zenodo