Published on 01 January 2026
A Financial Risk Dataset for Chinese Companies from 2008 to 2020
Description
Enterprise risk prediction is important for identifying potential business crises, optimizing strategic decisions, and achieving sustainable development. However, existing research data consists mostly of structured financial data from a single source; it is rarely integrated with unstructured information such as news and public opinion, and it lacks standardized risk labels for machine learning modeling. This study constructs a risk-annotated dataset of Chinese enterprises covering 2008 to 2020 by integrating financial statements and news headlines for listed companies from the RESSET financial research database. The dataset covers 2,576 enterprises and comprises 30,757 financial samples and 21,499 news samples, with 210 financial indicators in 10 categories. Its content includes basic enterprise information, financial indicator sequences, news headline texts, and risk labels generated from regulatory delisting rules. For data quality control, the financial data are sourced from the statutory disclosure documents of listed companies, and the risk labels are generated automatically from objective definitions to ensure the accuracy and reproducibility of the data. The dataset can support the construction of enterprise risk identification and early-warning models, and it can serve as a reference for enterprise risk management and development decisions.
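The description above outlines joining financial samples with news headlines and attaching rule-based risk labels. A minimal Python sketch of that idea follows. The field names (`company`, `year`, `net_profit`, `title`), the sample values, and the labeling rule (negative net profit in two consecutive years) are all assumptions made for illustration. The description does not specify the dataset's actual schema or the delisting rules used to generate its labels.

```python
from collections import defaultdict

# Hypothetical records: field names and values are illustrative only,
# not the dataset's actual schema.
financials = [
    {"company": "A", "year": 2018, "net_profit": -5.0},
    {"company": "A", "year": 2019, "net_profit": -2.0},
    {"company": "A", "year": 2020, "net_profit": 1.0},
    {"company": "B", "year": 2019, "net_profit": 3.0},
    {"company": "B", "year": 2020, "net_profit": -1.0},
]
news = [
    {"company": "A", "year": 2019, "title": "Company A reports second annual loss"},
    {"company": "B", "year": 2020, "title": "Company B announces restructuring"},
    {"company": "B", "year": 2020, "title": "Company B revises earnings forecast"},
]


def label_risk(rows):
    """Assumed rule (not the dataset's definition): a company-year is
    labeled risky (1) if net profit is negative in that year and in the
    immediately preceding year; otherwise it is labeled 0."""
    profit = {(r["company"], r["year"]): r["net_profit"] for r in rows}
    labels = {}
    for (company, year), value in profit.items():
        prev = profit.get((company, year - 1))
        labels[(company, year)] = int(value < 0 and prev is not None and prev < 0)
    return labels


def build_samples(financial_rows, news_rows):
    """Attach same-year news titles and a risk label to each financial row."""
    titles = defaultdict(list)
    for item in news_rows:
        titles[(item["company"], item["year"])].append(item["title"])
    labels = label_risk(financial_rows)
    samples = []
    for row in financial_rows:
        key = (row["company"], row["year"])
        samples.append({**row, "titles": titles.get(key, []), "risk": labels[key]})
    return samples


samples = build_samples(financials, news)
for sample in samples:
    print(sample["company"], sample["year"], sample["risk"], len(sample["titles"]))
```

Keying both sources on (company, year) keeps the join explicit and lets a financial sample carry zero, one, or several headlines, which is consistent with the news sample count being different from the financial sample count.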