PARALLEL CORPORA FOR AI

Parallel corpora for over 350 language pairs and numerous multilingual combinations, including 9 million bilingual segments and 90 million tokens in twenty languages.

The segments are manually curated full sentences and short phrases paired with their translation equivalents. Our editors and translators worldwide originally wrote them as usage examples for dictionary entries, basing them on corpus evidence and frequency.

Language Service Providers can use the data to boost their performance, to train Machine Learning models, and to enhance their Neural Machine Translation solutions.

The languages include Arabic, Chinese (Simplified), Danish, Dutch, English, French, German, Greek, Hebrew, Italian, Japanese, Korean, Norwegian, Polish, Portuguese (Brazilian and European), Russian, Spanish, Swedish, and Turkish.

In addition to parallel corpora of general vocabulary, we offer corpora for a hundred specific subject domains, listed below. Please contact us to enquire about specific language pairs for any domain.

DOMAINS

Acoustics
Music

Architecture
Cartography

Chemistry
Pharmacology

Culinary
Drinks

Electricity
Energy

Geography
Geology

Grammar
Linguistics

Literature
Publishing

Military
Police

Theology
Religion

Agriculture
Botanics
Environment

Anthropology
Archeology
Philosophy

Culture
History
Politics

Education
School
University

Games
Leisure time & hobbies

Geometry
Mathematics
Statistics

Maritime
Nautical
Oceanography

Mythology
Psychology
Sociology

Journalism
Law
Occupation

Astronomy
Meteorology
Optics
Physics

Clothing
Cosmetics
Dress
Fashion

Radio
Technology
Telephone
Television

Anatomy
Genetics
Health
Medicine
Physiology

Aeronautics
Aviation
Automobiles
Rail
Transportation

Anatomy
Biology
Ecology
Genetics
Physiology
Zoology

Administration
Advertising
Commerce
Economics
Finance
Industry
Marketing

Art
Cinema
Color
Dance
Entertainment
Music
Photography
Theatre

Computers
Data
Electronics
Engineering
Informatics
Internet
IT
Technical
Technology
Telecommunication

Astrology
Construction
Family
Furniture
Hygiene
Measurements & units
Mechanics
Post
Sex
Space
Sport
Time
Tourism

CONTACT