Scholar Data

Chinese-English Biology and Chemistry Abstract Parallel Text

Introduction

Chinese-English Biology and Chemistry Abstract Parallel Text was developed by The MITRE Corporation. It consists of parallel sentences from a collection of chemistry and biology-related scientific article abstracts published in Mandarin and translated into English by translators with particular expertise in the technical area. Translators were instructed to err on the side of literal translation if required, but to maintain the technical writing style of the source and make the resulting English as natural as possible. The translators were given specific guidelines for translation, and those are included in this distribution.

Data

This release contains 2,239 lines of parallel Mandarin and English, with a total of 156,445 characters of Mandarin and 75,515 words of English, presented in a separate UTF-8 plain text file for each language. The sentences were translated in sequential order and presented in scrambled order, such that parallel sentences at identical line numbers are translations. For example, the 31st line of the English file is a translation of the 31st line of the Mandarin file. The original line sequence is not provided.
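Because alignment is purely by line number, pairing the two files is a matter of reading them in lockstep. The following is a minimal sketch, not part of the distribution: the function names and any file paths passed to them are hypothetical, and the only assumption taken from the description is that both files are UTF-8 plain text with one sentence per line.

```python
from typing import List, Tuple


def read_lines(path: str) -> List[str]:
    """Read a UTF-8 text file and return its lines without trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def read_parallel(source_path: str, target_path: str) -> List[Tuple[str, str]]:
    """Pair line i of the source file with line i of the target file.

    Both paths are supplied by the caller; the actual file names in the
    release are not stated in the description.
    """
    source = read_lines(source_path)
    target = read_lines(target_path)
    # A length mismatch means the line-number alignment cannot be trusted.
    if len(source) != len(target):
        raise ValueError(
            f"line count mismatch: {len(source)} source vs {len(target)} target"
        )
    return list(zip(source, target))
```

With the files from this release, the pair at index 30 of the returned list would correspond to the 31st line of each file, since Python indexes from zero.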

Samples

For an example of the data please consult this Chinese sample and English sample.

Updates

None at this time.

Authors

Doran, Christine
Burger, John D.
Henderson, John C.
Zarrella, Guido

0 Citations, 0 Mentions, 35% FAIR, 0.9 Dataset Index

10.35111/w3e2-e464
January 2013

Russian-English Computer Security Parallel Text

Introduction

Russian-English Computer Security Parallel Text was developed by The MITRE Corporation. It consists of parallel sentences from a set of computer security reports published in Russian and translated into English by translators with particular expertise in the technical area. Translators were instructed to err on the side of literal translation if required, but to maintain the technical writing style of the source and to make the resulting English as natural as possible. The translators followed specific guidelines for translation, and those are included in this distribution.

Data

There are 6,276 lines of parallel Russian and English, with a total of 60,059 words of Russian and 76,437 words of English, presented in a separate UTF-8 plain text file for each language. The sentences were translated in sequential order and presented in a scrambled order, such that parallel sentences at identical line numbers are translations. For example, the 31st line of the English file is a translation of the 31st line of the Russian file. The original line sequence is not provided. A further 1,694 untranslated lines (such as code snippets) are included as a separate file.
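The word totals quoted above can be sanity-checked against a local copy of the files. The sketch below is illustrative only: it counts whitespace-delimited tokens, which is an assumption, since the description does not say how the published word counts were produced, so small differences would not be surprising. The function names are hypothetical.

```python
from typing import Iterable


def count_words(lines: Iterable[str]) -> int:
    """Count whitespace-delimited tokens across all lines."""
    return sum(len(line.split()) for line in lines)


def length_ratio(source_words: int, target_words: int) -> float:
    """Return target words per source word."""
    if source_words == 0:
        raise ValueError("source word count must be non-zero")
    return target_words / source_words


# Totals quoted in the description above.
RUSSIAN_WORDS = 60_059
ENGLISH_WORDS = 76_437

# English words per Russian word, computed from the published totals.
ratio = length_ratio(RUSSIAN_WORDS, ENGLISH_WORDS)
```

From the published totals, the English side comes to roughly 1.27 words per Russian word.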

Samples

For a sample of the data please view this English text sample and Russian text sample. Here is the Russian sample as an image.

Updates

None at this time.

Authors

Doran, Christine
Burger, John D.
Henderson, John C.
Zarrella, Guido

0 Citations, 0 Mentions, 35% FAIR, 0.8 Dataset Index

10.35111/fgmp-0035
December 2012

Chinese-English Semiconductor Parallel Text

Introduction

Chinese-English Semiconductor Parallel Text was developed by The MITRE Corporation. It consists of parallel sentences from a collection of abstracts from scientific articles on semiconductors published in Mandarin and translated into English by translators with particular expertise in the technical area. Translators were instructed to err on the side of literal translation if required, but to maintain the technical writing style of the source and to make the resulting English as natural as possible. The translators followed specific guidelines for translation, and those are included in this distribution.

Data

There are 2,169 lines of parallel Mandarin and English, with a total of 125,302 characters of Mandarin and 64,851 words of English, presented in a separate UTF-8 plain text file for each language. The sentences were translated in sequential order and presented in a scrambled order, such that parallel sentences at identical line numbers are translations. For example, the 31st line of the English file is a translation of the 31st line of the Mandarin file. The original line sequence is not provided.
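The Mandarin side is measured in characters rather than words. The description does not say which characters were counted, so the sketch below shows two plausible conventions side by side: Han characters only (restricted here to the basic CJK Unified Ideographs block, an assumption) and all non-whitespace characters. Both function names are hypothetical.

```python
import re
from typing import Iterable

# Basic CJK Unified Ideographs block only; extension blocks are not covered.
HAN = re.compile(r"[\u4e00-\u9fff]")


def count_han(lines: Iterable[str]) -> int:
    """Count Han characters, ignoring Latin letters, digits and punctuation."""
    return sum(len(HAN.findall(line)) for line in lines)


def count_non_space(lines: Iterable[str]) -> int:
    """Count every character that is not whitespace."""
    return sum(1 for line in lines for ch in line if not ch.isspace())
```

Technical abstracts often mix Latin symbols and formulas into Mandarin text, so the two counts can differ noticeably; comparing both against the published figure is one way to guess which convention is closer.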

Samples

Follow these links for Chinese and English samples.

Updates

None at this time.

Authors

Doran, Christine
Burger, John D.
Henderson, John C.
Zarrella, Guido

0 Citations, 0 Mentions, 35% FAIR, 0.8 Dataset Index

10.35111/w9ke-2w41
November 2012

Automated Author Profile
Burger, John D.

