CMU Pronouncing Dictionary

From Wikipedia, the free encyclopedia

Developer(s): Carnegie Mellon University
Stable release: 0.7b / November 19, 2014
Available in: English
License: BSD
Website: www.speech.cs.cmu.edu/cgi-bin/cmudict

The CMU Pronouncing Dictionary (also known as CMUdict) is an open-source pronouncing dictionary originally created by the Speech Group at Carnegie Mellon University (CMU) for use in speech recognition research.

CMUdict provides an orthographic-to-phonetic mapping for English words in their North American pronunciations. It is commonly used to generate representations for speech recognition (ASR), e.g. in the CMU Sphinx system, and for speech synthesis (TTS), e.g. in the Festival system. CMUdict can also be used as a training corpus for building statistical grapheme-to-phoneme (g2p) models[1] that generate pronunciations for words not yet included in the dictionary.

The most recent release is 0.7b; it contains over 134,000 entries. An interactive lookup version is available.[2]

Database format

The database is distributed as a plain text file with one entry per line in the format "WORD  <pronunciation>", with a two-space separator between the parts. If multiple pronunciations are available for a word, the variants are numbered (e.g. WORD(1)). The pronunciation is encoded using a modified form of the ARPABET system, with stress marks of levels 0, 1, and 2 added to vowels. A line-initial ;;; token indicates a comment. A derived format, directly suitable for speech recognition engines, is also available as part of the distribution; this format collapses stress distinctions, which are typically not used in ASR.
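The format described above can be read with a few lines of Python. The following is a minimal sketch, not the distribution's own tooling: the function names parse_cmudict and strip_stress, and the sample lines, are illustrative assumptions. It skips ;;; comments, splits on the two-space separator, folds numbered variants such as WORD(1) into the base word, and shows how stress digits could be collapsed as in the derived ASR format.

```python
import re

# Headword with an optional numbered-variant suffix such as "(1)".
VARIANT_RE = re.compile(r"^(?P<word>.+?)(?:\((?P<n>\d+)\))?$")


def parse_cmudict(lines):
    """Parse CMUdict-style lines into {word: [list of phoneme lists]}."""
    entries = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";;;"):
            continue  # blank line or comment
        head, sep, pron = line.partition("  ")  # two-space separator
        if not sep:
            continue  # not a "WORD  <pronunciation>" line
        word = VARIANT_RE.match(head).group("word")
        entries.setdefault(word, []).append(pron.split())
    return entries


def strip_stress(phones):
    """Drop the trailing 0/1/2 stress digits from each phoneme."""
    return [p.rstrip("012") for p in phones]


# Illustrative sample lines (hypothetical input, written for this sketch).
SAMPLE = [
    ";;; a comment line",
    "ABOUT  AH0 B AW1 T",
    "READ  R EH1 D",
    "READ(1)  R IY1 D",
]

if __name__ == "__main__":
    entries = parse_cmudict(SAMPLE)
    print(entries["READ"])
    print(strip_stress(entries["ABOUT"][0]))
```

Storing every variant in a list under the base word keeps the lookup simple; a caller that needs the variant number can recover it from the list index.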

The following is a table of the phonemes used by the CMU Pronouncing Dictionary.[2]

Vowels
ARPABET  Respelling  IPA    Example
AA       ah          ɑ      odd
AE       a           æ      at
AH0      ə           ə      about
AH       uh          ʌ      hut
AO       aw          ɔ      ought, story
AW       ow          aʊ     cow
AY       eye         aɪ     hide
EH       eh          ɛ      Ed
ER       ur, ər      ɝ, ɚ   hurt
EY       ay          eɪ     ate
IH       i, ih       ɪ      it
IY       ee          i      eat
OW       oh          oʊ     oat
OY       oy          ɔɪ     toy
UH       uu          ʊ      hood
UW       oo          u      two

Stress
ARPABET  Description
0        No stress
1        Primary stress
2        Secondary stress

Consonants
ARPABET  Respelling  IPA    Example
B        b           b      be
CH       ch, tch     tʃ     cheese
D        d           d      dee
DH       dh          ð      thee
F        f           f      fee
G        g           ɡ      green
HH       h           h      he
JH       j           dʒ     gee
K        k           k      key
L        l           l      lee
M        m           m      me
N        n           n      knee
NG       ng          ŋ      ping
P        p           p      pee
R        r           r      read
S        s, ss       s      sea
SH       sh          ʃ      she
T        t           t      tea
TH       th          θ      theta
V        v           v      vee
W        w, wh       w      we
Y        y           j      yield
Z        z           z      zee
ZH       zh          ʒ      seizure
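
The phoneme table above can be turned into a small ARPABET-to-IPA converter. This is a rough sketch under stated assumptions rather than part of CMUdict: the names ARPABET_TO_IPA and to_ipa are hypothetical, the diphthong and affricate symbols (aʊ, aɪ, eɪ, oʊ, tʃ, dʒ) are the standard IPA values for those ARPABET codes, unstressed ER is assumed to map to ɚ, and the stress mark is placed directly before the stressed vowel rather than at the syllable boundary, as full IPA would require.

```python
# ARPABET base symbols (stress digit removed) mapped to IPA.
ARPABET_TO_IPA = {
    # Vowels
    "AA": "ɑ", "AE": "æ", "AH": "ʌ", "AO": "ɔ", "AW": "aʊ", "AY": "aɪ",
    "EH": "ɛ", "ER": "ɝ", "EY": "eɪ", "IH": "ɪ", "IY": "i", "OW": "oʊ",
    "OY": "ɔɪ", "UH": "ʊ", "UW": "u",
    # Consonants
    "B": "b", "CH": "tʃ", "D": "d", "DH": "ð", "F": "f", "G": "ɡ",
    "HH": "h", "JH": "dʒ", "K": "k", "L": "l", "M": "m", "N": "n",
    "NG": "ŋ", "P": "p", "R": "r", "S": "s", "SH": "ʃ", "T": "t",
    "TH": "θ", "V": "v", "W": "w", "Y": "j", "Z": "z", "ZH": "ʒ",
}

# Unstressed variants: AH0 is listed in the table; ER0 is an assumption.
UNSTRESSED = {"AH": "ə", "ER": "ɚ"}

# IPA marks for primary (1) and secondary (2) stress.
STRESS_MARK = {"1": "ˈ", "2": "ˌ"}


def to_ipa(phones):
    """Convert a list of ARPABET phonemes (with stress digits) to IPA."""
    out = []
    for phone in phones:
        base = phone.rstrip("012")
        stress = phone[len(base):]  # "", "0", "1", or "2"
        if stress == "0" and base in UNSTRESSED:
            symbol = UNSTRESSED[base]
        else:
            symbol = ARPABET_TO_IPA[base]
        # Simplification: the mark precedes the vowel, not the syllable.
        out.append(STRESS_MARK.get(stress, "") + symbol)
    return "".join(out)


if __name__ == "__main__":
    print(to_ipa(["AH0", "B", "AW1", "T"]))
```

An unknown symbol raises KeyError, which is a deliberate choice in this sketch so that malformed input is noticed instead of silently dropped.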

History

Version               Release date[3]       License
0.1                   16 September 1993     Public Domain
0.2                   10 March 1994         Public Domain
0.3                   28 September 1994     Public Domain
0.4                   8 November 1995       Public Domain
0.5                   No public release     Public Domain
0.6                   11 August 1998        Public Domain
0.7                   No public release     Public Domain
0.7a                  18 February 2008      2-clause BSD
0.7b                  19 November 2014[4]   2-clause BSD
GitHub (unversioned)  26 May 2021           2-clause BSD

Applications

  • The Unifon converter is based on the CMU Pronouncing Dictionary.
  • The Natural Language Toolkit contains an interface to the CMU Pronouncing Dictionary.
  • The Carnegie Mellon Logios[5] tool incorporates the CMU Pronouncing Dictionary.
  • PronunDict, a pronunciation dictionary of American English, uses the CMU Pronouncing Dictionary as its data source. Pronunciation is transcribed in IPA symbols. This dictionary also supports searching by pronunciation.
  • Some singing voice synthesizers, such as CeVIO Creative Studio and Synthesizer V, use a modified version of the CMU Pronouncing Dictionary to synthesize English singing voices.
  • Transcriber, a tool for full-text phonetic transcription, uses the CMU Pronouncing Dictionary.
  • 15.ai, a real-time text-to-speech tool using artificial intelligence, uses the CMU Pronouncing Dictionary.

References

  1. "Sequitur G2P - A trainable Grapheme-to-Phoneme converter".
  2. "The CMU Pronouncing Dictionary". CMU Pronouncing Dictionary. 2015-07-16. Archived from the original on 2022-06-03. Retrieved 2022-06-04.
  3. ftp://ftp.cs.cmu.edu/project/speech/dict/ [permanent dead link]
  4. "CMUdict". svn.code.sf.net.
  5. "Cmusphinx - Revision 10973: /Trunk/Logios". Archived from the original on 2011-05-20. Retrieved 2009-12-19.

This page was last edited on 20 June 2024, at 11:03