
Russian National Corpus

From Wikipedia, the free encyclopedia

The Russian National Corpus (Russian: Национальный корпус русского языка, lit. 'National Corpus of the Russian Language') is a corpus of the Russian language that has been partially accessible through an online query interface since April 29, 2004. It is being developed by the Institute of the Russian Language, Russian Academy of Sciences.

It currently contains more than 1 billion word forms[1] that are automatically lemmatized and tagged for part of speech (POS) and grammemes, i.e. each orthographic form is assigned all of its possible morphological analyses. Lemmata, POS, grammatical features, and their combinations are searchable. In addition, a subcorpus of 6 million word forms has had its homonymy resolved manually.
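The tagging scheme described above can be illustrated with a minimal sketch. It is not the corpus's actual data model or query interface: the `Analysis` and `Token` classes, the `matches` and `search` helpers, and the tag names are all hypothetical, chosen only to show how one word form can carry several analyses and how a query on a lemma, POS, and grammeme combination might be evaluated against them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    """One possible morphological reading of a word form (hypothetical model)."""
    lemma: str
    pos: str
    grammemes: frozenset


@dataclass
class Token:
    """A word form together with every analysis ascribed to it."""
    form: str
    analyses: list


def matches(token, lemma=None, pos=None, grammemes=()):
    """True if a single analysis of the token satisfies the whole query."""
    required = set(grammemes)
    return any(
        (lemma is None or a.lemma == lemma)
        and (pos is None or a.pos == pos)
        and required <= a.grammemes
        for a in token.analyses
    )


def search(tokens, **query):
    """Return the positions of all tokens that match the query."""
    return [i for i, t in enumerate(tokens) if matches(t, **query)]


# Illustrative data: "стали" is ambiguous between a form of the verb
# "стать" and a form of the noun "сталь", so both analyses are kept.
tokens = [
    Token("они", [Analysis("они", "PRON", frozenset({"nom", "pl"}))]),
    Token("стали", [
        Analysis("стать", "VERB", frozenset({"past", "pl"})),
        Analysis("сталь", "NOUN", frozenset({"gen", "sg"})),
    ]),
    Token("работать", [Analysis("работать", "VERB", frozenset({"inf"}))]),
]
```

In this sketch a query must be satisfied by one analysis as a whole, so `search(tokens, pos="NOUN", grammemes=["pl"])` finds nothing even though "стали" has a noun reading and a plural reading. In a subcorpus with resolved homonymy, each token would keep only the analysis that fits its context.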

The subcorpus with resolved morphological homonymy is also automatically accentuated. The whole corpus carries searchable lexical-semantic (LS) tagging,[2] which covers morphosemantic POS subclasses (proper noun, reflexive pronoun, etc.), LS characteristics proper (thematic class, causativity, evaluation), and derivation (diminutive, adverb formed from adjective, etc.).

The RNC also includes the following subcorpora:

  • a treebank of syntactic dependencies (largely based on Igor Mel'čuk's Meaning-Text Theory);
  • English⇔Russian, German⇒Russian, Ukrainian⇔Russian and Belorussian⇔Russian parallel corpora;
  • a large (100+ million words) separate corpus of modern newspapers (2001–2011);
  • a corpus of Russian poetry, where the rhyming words and poetic prosody (including meter, stanzas, etc.) are additionally tagged;
  • a corpus of Russian dialects with specific dialect grammar tagging;
  • a multimedia corpus with searchable tagged fragments of Russian-language movies;
  • a corpus showing the history of Russian stress;
  • an educational subcorpus reflecting school standards.

All texts carry tags with metatextual information: the author, the author's birth date, the date of creation, the text size, and the text genre (general fiction, detective story, newspaper article, etc.). All of these categories can be browsed and searched separately. Users can also define their own subcorpus and restrict searches for combinations of lemmata, POS/grammeme tags, and semantic tags to that subset.
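The idea of a user-defined subcorpus can be sketched as a metadata filter applied before searching. This is only an illustration under stated assumptions: the `Text` record, its field names, the `subcorpus` and `count_lemma` helpers, and the sample records are hypothetical and do not reflect the corpus's real metadata schema or contents.

```python
from dataclasses import dataclass


@dataclass
class Text:
    """A text with a few metatextual tags (hypothetical field names)."""
    author: str
    created: int   # year of creation
    genre: str
    lemmas: list   # lemmatized content, one lemma per token


def subcorpus(texts, **criteria):
    """Keep only the texts whose metadata equals every given criterion."""
    return [
        t for t in texts
        if all(getattr(t, field) == value for field, value in criteria.items())
    ]


def count_lemma(texts, lemma):
    """Count occurrences of a lemma within the given set of texts."""
    return sum(t.lemmas.count(lemma) for t in texts)


# Made-up sample records, for illustration only.
texts = [
    Text("Author A", 1995, "detective story", ["дом", "ночь", "дом"]),
    Text("Author B", 2005, "newspaper article", ["дом", "город"]),
    Text("Author A", 2005, "general fiction", ["ночь", "город", "дом"]),
]

by_author_a = subcorpus(texts, author="Author A")
```

Here `count_lemma(by_author_a, "дом")` counts only within the two texts by "Author A", while `count_lemma(texts, "дом")` counts over all three.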

References

  1. ^ "Национальный корпус русского языка". Национальный корпус русского языка (in Russian). Archived from the original on March 5, 2022. Retrieved August 28, 2022.
  2. ^ Apresjan, Ju.; Boguslavsky, I.; Iomdin, B.; Iomdin, L.; Sannikov, A.; Sizov, V. (2006). A Syntactically and Semantically Tagged Corpus of Russian: State of the Art and Prospects. Proceedings of LREC. Genova, Italy. pp. 1378–1381. CiteSeerX 10.1.1.111.8165.

This page was last edited on 3 September 2022, at 14:08
Basis of this page is in Wikipedia. Text is available under the CC BY-SA 3.0 Unported License. Non-text media are available under their specified licenses. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc. WIKI 2 is an independent company and has no affiliation with Wikimedia Foundation.