The scDART-seq data used in the Statistical modeling of single-cell epitranscriptomics enabled trajectory and regulatory inference of RNA methylation

Wang, Haozhe;Wang, Yue;Zhou, Jingxian;Song, Bowen;Tu, Gang;Nguyen, Anh;Su, Jionglong;Coenen, Frans;Wei, Zhi;Rigden, Daniel;Meng, Jia

Description

The scDART-seq data used in the study "Statistical Modeling of Single-Cell Epitranscriptomics Enabled Trajectory and Regulatory Inference of RNA Methylation" were obtained from both the SMART-seq2 and 10x Genomics platforms.

For the SMART-seq2 dataset, the aim was to profile the m6A epitranscriptome in 1,382 HEK293T cells, consisting of 991 cells with m6A modifications (APOBEC1-YTH) and 391 negative control cells (APOBEC1-YTHmut). The negative control cells, which carry a mutated YTH domain, cannot induce m6A-associated signals and therefore serve to estimate background noise. After data extraction, 510,554 candidate m6A sites were retained for further analysis. The dataset also includes two case studies using SMART-seq2 data. These case studies used the odds ratio (OR) results from SigRM for nine trajectory-related genes (MCM6, PCNA, and SLBP as markers for the G1 phase; RRM2, MCM5, and DTL associated with the S phase; and TOP2A, CCNB1, and AURKA for the G2/M phase) as well as the expression levels of five genes related to m6A modification (the well-established m6A writers METTL3, METTL14, and WTAP, and the erasers FTO and ALKBH5).

For the 10x Genomics data, the dataset contains read count information for two replicates (frequency_rep_1_processed.rds, frequency_rep_3_processed.rds) and data from 2,000 single cells from another replicate, comprising 1,000 test cells and 1,000 control cells. The read counts for these cells are found in frequency_all_processed.rds, and expression data are provided in expression_TPM.rds. A total of 17,733 candidate m6A sites were identified, with further details available in SNP_all_processed.rds. The code used can be found in the code files. The supplementary file contains the results of case study 2.
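To make the odds ratio mentioned above concrete, here is a minimal sketch of a textbook 2x2 odds ratio comparing methylated and unmethylated read counts at one site between test (APOBEC1-YTH) and control (APOBEC1-YTHmut) cells. This is not SigRM's statistical model, which is not described on this page; the function name, the pooled-count inputs, and the 0.5 pseudocount (a common continuity correction) are assumptions made for illustration only.

```python
def odds_ratio(test_meth: int, test_unmeth: int,
               ctrl_meth: int, ctrl_unmeth: int,
               pseudo: float = 0.5) -> float:
    """Textbook 2x2 odds ratio for one candidate site.

    Hypothetical helper, NOT the SigRM method. `pseudo` is added to every
    cell of the table so that zero counts do not cause division by zero.
    """
    a = test_meth + pseudo    # methylated reads, test cells
    b = test_unmeth + pseudo  # unmethylated reads, test cells
    c = ctrl_meth + pseudo    # methylated reads, control cells
    d = ctrl_unmeth + pseudo  # unmethylated reads, control cells
    return (a * d) / (b * c)


# Toy counts (invented for illustration, not taken from the dataset).
or_plain = odds_ratio(30, 70, 10, 90, pseudo=0.0)  # (30 * 90) / (70 * 10)
or_empty = odds_ratio(0, 10, 0, 10)                # no signal in either group
```

An odds ratio above 1 in this sketch would indicate a higher methylated-to-unmethylated ratio in test cells than in the negative controls; how SigRM actually derives its OR values should be checked against the code files in the dataset.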
The files associated with this study include:

SNP File (SNP_all.rds)
Data Details: Stored in RDS format, containing a GRanges object.
Content: Each entry in the GRanges object represents a specific genomic region corresponding to an m6A modification site detected through scDART-seq.
    seqnames: Factor Rle object containing chromosome or genomic sequence names.
    ranges: IRanges object containing genomic intervals (start and end positions).
    strand: Factor Rle object indicating the strand (directionality) of the genomic region.
    mcols: DataFrame object containing optional metadata columns, such as quality scores, coverage depth, and mutation data.
    seqinfo: Seqinfo object providing information about the genomic sequences present in the GRanges object.
Purpose: Provides detailed genomic information about identified m6A modification sites, facilitating further analysis of their distribution, characteristics, and genomic context in HEK293T cells.

Frequency File (frequency_all.rds)
Data Details: Also stored in RDS format, consisting of a list.
Content: Each item in the list represents a single cell, containing counts of methylated and unmethylated reads for the corresponding m6A modification sites detected in scDART-seq data.
Purpose: Offers quantitative data on the abundance of methylated and unmethylated sequences at each m6A site across individual cells, enabling investigation of m6A modification patterns at the single-cell level.

Expression TPM File (expression_TPM.rds)
Data Details: Stored as an RDS file, comprising a list.
Content: Each item in the list represents a single cell, with the corresponding TPM values for gene expression.
Purpose: Provides information on gene expression levels across individual cells, facilitating examination of potential correlations between m6A modification patterns and gene expression profiles in scDART-seq data from HEK293T cells.

Gene Information File (gene_informations.rds)
Data Details: Stored as an RDS file, comprising a data frame.
Content: Includes information such as Gene ID, Gene Name, Reference, Strand, Start position, End position, and Coverage.
Purpose: Offers additional details about gene expression data, aiding in the interpretation and analysis of gene expression profiles in conjunction with m6A modification patterns.
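As an illustration of how the per-cell counts in the frequency file might be used, the sketch below computes a methylation level (methylated reads divided by total reads) per site and per cell. The nested-dictionary layout, the site and cell identifiers, and the `min_reads` threshold are all hypothetical; the actual file is an R list in RDS format and would first have to be loaded and converted, which is not shown here.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: cell ID -> site ID -> (methylated, unmethylated) reads.
# It only mirrors the description of the frequency file, not its real schema.
CellCounts = Dict[str, Dict[str, Tuple[int, int]]]


def methylation_levels(
    cells: CellCounts, min_reads: int = 1
) -> Dict[str, Dict[str, Optional[float]]]:
    """Return methylated / (methylated + unmethylated) per cell and site.

    Sites covered by fewer than `min_reads` reads in a cell get None,
    because a ratio computed from no (or too few) reads is not informative.
    """
    threshold = max(min_reads, 1)  # never divide by a zero total
    levels: Dict[str, Dict[str, Optional[float]]] = {}
    for cell_id, sites in cells.items():
        levels[cell_id] = {}
        for site_id, (meth, unmeth) in sites.items():
            total = meth + unmeth
            levels[cell_id][site_id] = meth / total if total >= threshold else None
    return levels


# Toy counts (invented for illustration, not taken from the dataset).
toy_counts: CellCounts = {
    "cell_1": {"site_A": (3, 1), "site_B": (0, 0)},
    "cell_2": {"site_A": (1, 3), "site_B": (2, 2)},
}
levels = methylation_levels(toy_counts)
```

Keeping uncovered sites as `None` rather than 0 distinguishes "no reads observed" from "reads observed, none methylated", which matters for sparse single-cell data.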

Metrics

Dataset Index: 0.6
FAIR Score: 85%
Citations: 1
Mentions: 0

Publication Details

DOI

Publisher: figshare

Assigned Domain

Subfield: Molecular Biology
Field: Biochemistry, Genetics and Molecular Biology
Domain: Life Sciences
Confidence Score: 55%
Source: Scholar Data Model

Keywords: Genomics and transcriptomics

Normalization Factors

FT: 30.77
CTw: 1.00
MTw: 1.00