Canterbury corpus

From Wikipedia, the free encyclopedia

The Canterbury corpus is a collection of files intended for use as a benchmark for testing lossless data compression algorithms. It was created in 1997 at the University of Canterbury, New Zealand, and designed to replace the Calgary corpus. The files were selected based on their ability to provide representative performance results.[1]

In its most commonly used form, the corpus consists of 11 files, selected as "average" documents from 11 classes of documents,[2] totaling 2,810,784 bytes as follows.

Size (bytes)  File name     Description
     152,089  alice29.txt   English text
     125,179  asyoulik.txt  Shakespeare
      24,603  cp.html       HTML source
      11,150  fields.c      C source
       3,721  grammar.lsp   LISP source
   1,029,744  kennedy.xls   Excel spreadsheet
     426,754  lcet10.txt    Technical writing
     481,861  plrabn12.txt  Poetry (Paradise Lost)
     513,216  ptt5          CCITT test set
      38,240  sum           SPARC executable
       4,227  xargs.1       GNU manual page
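
The stated total can be checked directly against the table. The short sketch below simply sums the listed sizes; the `SIZES` name is illustrative and not part of the corpus distribution.

```python
# File sizes in bytes, copied from the table above.
SIZES = {
    "alice29.txt": 152_089,
    "asyoulik.txt": 125_179,
    "cp.html": 24_603,
    "fields.c": 11_150,
    "grammar.lsp": 3_721,
    "kennedy.xls": 1_029_744,
    "lcet10.txt": 426_754,
    "plrabn12.txt": 481_861,
    "ptt5": 513_216,
    "sum": 38_240,
    "xargs.1": 4_227,
}

total = sum(SIZES.values())
print(len(SIZES), total)  # prints: 11 2810784
```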

The University of Canterbury also offers the following corpora. Additional files may be added, so results should be reported only for individual files.[3]

  • The Artificial Corpus, a set of files with highly "artificial" data designed to evoke pathological or worst-case behavior. Last updated 2000 (tar timestamp).
  • The Large Corpus, a set of large (megabyte-size) files. Contains an E. coli genome, a King James Bible, and the CIA World Factbook. Last updated 1997 (tar timestamp).
  • The Miscellaneous Corpus. Contains one million digits of pi. Last updated 2000 (tar timestamp).
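
Because results are meant to be reported per file instead of as a corpus-wide aggregate, a benchmark run might look like the minimal sketch below. It is an illustration only, not the corpus authors' tooling: the compressors are Python standard-library codecs chosen for convenience, bits per input byte is just one possible metric, and the names `COMPRESSORS`, `bits_per_byte`, and `benchmark` are hypothetical.

```python
import bz2
import lzma
import zlib
from pathlib import Path

# Standard-library codecs used purely for illustration;
# any lossless compressor could be plugged in here.
COMPRESSORS = {
    "zlib": lambda data: zlib.compress(data, 9),
    "bz2": lambda data: bz2.compress(data, 9),
    "lzma": lzma.compress,
}


def bits_per_byte(data, compress):
    """Return the compressed size in bits per input byte."""
    if not data:
        raise ValueError("cannot compute a ratio for empty input")
    return 8 * len(compress(data)) / len(data)


def benchmark(corpus_dir):
    """Compress each non-empty regular file in corpus_dir separately.

    Returns {file name: {compressor name: bits per byte}}, so every
    file keeps its own row and nothing is averaged across the corpus.
    """
    results = {}
    for path in sorted(Path(corpus_dir).iterdir()):
        if not path.is_file():
            continue
        data = path.read_bytes()
        if not data:
            continue  # skip empty files; no ratio is defined for them
        results[path.name] = {
            name: bits_per_byte(data, compress)
            for name, compress in COMPRESSORS.items()
        }
    return results
```

Calling `benchmark("cantrbry")`, where `cantrbry` is an assumed local directory holding the unpacked files, would return one row per file; if files are later added to a corpus, the existing rows stay comparable.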

References

  1. Witten, Ian H.; Moffat, Alistair; Bell, Timothy C. (1999). Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann. p. 92. ISBN 9781558605701.
  2. Salomon, David (2007). Data Compression: The Complete Reference (Fourth ed.). Springer. p. 12. ISBN 9781846286032.
  3. "The Canterbury Corpus: Descriptions". corpus.canterbury.ac.nz.

This page was last edited on 15 May 2023, at 01:31
Basis of this page is in Wikipedia. Text is available under the CC BY-SA 3.0 Unported License. Non-text media are available under their specified licenses. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc. WIKI 2 is an independent company and has no affiliation with Wikimedia Foundation.