
Persian Speech Corpus

From Wikipedia, the free encyclopedia

The Persian Speech Corpus is a Modern Persian speech corpus for speech synthesis. The corpus contains phonetic and orthographic transcriptions of about 2.5 hours of Persian speech, aligned with the recorded speech at the phoneme level and including annotations of word boundaries.[1] Previous spoken corpora of Persian include FARSDAT, which consists of read-aloud speech from newspaper texts by 100 Persian speakers, and the Telephone FARsi Spoken language DATabase (TFARSDAT), which comprises seven hours of read and spontaneous speech produced by 60 native speakers of Persian from ten regions of Iran.[2]

The Persian Speech Corpus was built using the methodology laid out in Nawar Halabi's doctoral project on Modern Standard Arabic at the University of Southampton. The work was funded by MicroLinkPC, which owns an exclusive license to commercialise the corpus; the corpus remains available for non-commercial use through its website. It is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

The corpus was built for speech synthesis and has been used to build HMM-based voices in Persian. It can also be used to automatically align other speech corpora with their phonetic transcripts, and it could form part of a larger corpus for training speech recognition systems.[1]


Contents

The corpus is downloadable from its website, and contains the following:

  • 396 .wav files containing spoken utterances
  • 396 .lab files containing text utterances
  • 396 .TextGrid files containing the phoneme labels, with time stamps marking the phoneme boundaries in the corresponding .wav files
  • phonetic-transcript.txt, in which every line has the form "[wav_filename]" "[Phoneme Sequence]"
  • orthographic-transcript.txt, in which every line has the form "[wav_filename]" "[Orthographic Transcript]"
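
The two transcript files share the same line layout, so one small reader can handle both. The following Python sketch is illustrative only and is not part of the corpus distribution. It assumes that both fields on each line are enclosed in literal double quotes and separated by whitespace, and that the files are UTF-8 encoded; neither assumption is confirmed by the description above. The function names and the sample filename and phoneme symbols are hypothetical.

```python
import re
from pathlib import Path

# Assumed line shape: "<wav_filename>" "<text>", both fields double-quoted.
LINE_RE = re.compile(r'^"([^"]*)"\s+"([^"]*)"\s*$')


def parse_transcript_line(line):
    """Return (wav_filename, text) for one line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


def load_transcript(path, encoding="utf-8"):
    """Map each wav filename to its transcript text, skipping malformed lines."""
    entries = {}
    for line in Path(path).read_text(encoding=encoding).splitlines():
        parsed = parse_transcript_line(line)
        if parsed is not None:
            filename, text = parsed
            entries[filename] = text
    return entries


# Hypothetical sample line; the real filenames and phoneme symbols may differ.
sample = parse_transcript_line('"utt_001.wav" "s a l A m"')
```

If the real files turn out to use a different quoting or separator convention, only the regular expression needs to change. Calling load_transcript on both phonetic-transcript.txt and orthographic-transcript.txt and comparing the resulting keys would be one way to check that each recording has both transcripts.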

References

  1. Halabi, Nawar (2016). Modern Standard Arabic Phonetics for Speech Synthesis (PhD thesis). University of Southampton, School of Electronics and Computer Science.
  2. Bijankhan, Mahmood; Sheykhzadegan, Javad; Bahrani, Mohammad; Ghayoomi, Masood (2011). "Lessons from building a Persian written corpus: Peykare". Language Resources and Evaluation 45 (2): 143–164.

This page was last edited on 10 May 2024, at 07:48
Basis of this page is in Wikipedia. Text is available under the CC BY-SA 3.0 Unported License. Non-text media are available under their specified licenses. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc. WIKI 2 is an independent company and has no affiliation with Wikimedia Foundation.