Paper on Wikipedia and other internet sources
Published 2014 in Speak Out, the journal of the Pronunciation Special Interest Group of IATEFL
English Pronunciation Reference Sources on the Internet
Peter Roach, University of Reading
We all look things up on the internet. I would like to take a brief look at some of the free reference sources that we and our students are likely to use. The sources I am interested in could be put under two main headings: encyclopedias and dictionaries. The OED defines ‘encyclopedia’ as “A literary work containing extensive information on all branches of knowledge, usually arranged in alphabetical order”, and this definition seems to fit Wikipedia. OED defines ‘dictionary’ as “A book dealing with the individual words of a language … so as to set forth their orthography, pronunciation, signification, and use … the words are arranged in some stated order, now, in most languages, alphabetical”. Several free pronunciation dictionaries are available online, and I will discuss three of them: Forvo, Inogolo and Howjsay.
Wikipedia
It makes quite a plausible science fiction scenario: a world in the near future where the official version of all human knowledge is available only from Wikipedia. Already Wikipedia (henceforth WP) seems to be the first port of call for any student with an essay to write on any subject, and even if you don't want to get your information from WP, it is most likely that your search engine will choose it for you. If, for example, you Google 'phoneme' or 'intonation' you will almost certainly find the WP article comes top of the list. It seems likely that around the world, students and teachers working in the general area of EFL and ESL will be making use of WP when they want to learn about or check some factual matter about English, and those with an interest in pronunciation will look at WP articles concerned with phonetics. I would like to concentrate on two main questions: what material is contained in WP, and who writes and edits this material?
1. What information does WP contain in the field of English pronunciation?
We should start with some basic WP principles that you may not be familiar with. One is the OR (Original Research) principle: WP does not allow itself to publish anything that could be counted as new research; only existing and recognized knowledge can be included. Another is that contributions are anonymous: the cult of anonymity which seems so dear to people who post stuff on the internet is strong in WP, so it is not usually possible to find out who wrote (or changed) a particular item. I still find it bizarre to be in a discussion about a point of phonetics with someone calling themselves FluffyBunny or DeathStarAvenger. When you read an article in WP you can see at the top of the page that, in addition to the article itself, there is a Talk page where people can discuss or question material that is (or could be) in the article. This discussion material is sometimes quite amusing and sometimes rather depressing. At times articles are vandalized by stupid people who write obscene comments or personal abuse, though this is usually removed quickly by editors.
There is a vast list of WP articles on phonetics and pronunciation; the whole list can be found at http://en.wikipedia.org/wiki/List_of_phonetic_topics. We can start by looking at basic articles on subjects of interest. One of the most ambitious is English Phonology, a long article which attempts to cover a lot of ground. In some cases (e.g. Intonation) it gives a brief description of a topic that has a separate article elsewhere. There are many tables (WP authors are very fond of tables) followed by long lists of notes. As far as possible, articles on English pronunciation try to cover at least British (RP/BBC) and American (GA) pronunciation, unless the topic concerns one specific accent. Sometimes writers seem to feel that only these two accents need to be covered, so that until recently WP was stating that English is a stress-timed language, without showing any awareness of the many varieties of English pronunciation around the world which are clearly not stress-timed.
Those of us involved in teaching British English are likely to be interested in the article on Received Pronunciation. This topic tends to attract some very contentious messages from non-specialists on the Talk page, saying how much they hate "posh talk" and wanting to change what is said about the status of RP to fit their political beliefs.
The article Non-native pronunciations of English covers a topic that should be of interest to pronunciation teachers, but its coverage is very uneven. Arabic, French, German, Hebrew, Hungarian, Italian, Japanese, Russian, Spanish and Vietnamese are listed, but some of these offer only one or two sketchy points of pronunciation. I think this article is crying out for some expert input. There is a similar article, Anglophone pronunciation of foreign languages, that contains some very odd information about the problems English speakers encounter in learning specific foreign languages.
WP has a lot of material aimed at teaching contributors how to use phonetic symbols. This is seen as important because WP policy is (laudably) that IPA transcription should be given at the beginning of an article if the pronunciation is likely to be difficult for readers. The page at http://en.wikipedia.org/wiki/IPA_for_English shows that WP does this with a diaphonemic representation, which is supposed to cover all major accents of English with a single transcription instead of separate transcriptions for different accents. For example, ‘New York’ is supposed to be transcribed /njuː ˈjɔrk/, on the understanding that readers will know that some speakers do not pronounce the /j/ in ‘New’ and others do not pronounce the /r/ in ‘York’. I believe that this system is seriously flawed, both in general principle (readers would need to know a lot about English phonology to interpret the diaphonemic transcriptions correctly) and in detail. For example, the ‘Key’ table implies that the ‘i’ in ‘nasturtium’ is an example of a reduced vowel; I have not yet succeeded in persuading people that there is no /i/, reduced or otherwise, in ‘nasturtium’. The guidance on using the syllable-division mark ⟨.⟩ is also virtually unintelligible to me. However, making substantial improvements to the system seems to me to run the risk of putting Original Research into a WP article, since nobody, as far as I know, has actually made such a transcription work.
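The demand that a diaphonemic transcription places on its readers can be made concrete. The following minimal Python sketch spells out, as explicit rules, the two pieces of knowledge a reader needs to get from /njuː ˈjɔrk/ to an approximate British and an approximate American pronunciation. The rules and function names are hypothetical simplifications written only for this illustration; they are not taken from WP's key, and they ignore nearly all other accent differences.

```python
import re

# Hypothetical, deliberately minimal rules covering only the two
# variations mentioned for 'New York'; not WP's actual conventions.
VOWELS = "aeiouæɑɒɔəɛɜɪʊʌ"


def non_rhotic(diaphonemic: str) -> str:
    """Approximate RP/BBC reading: /r/ after ɔ or ɑ is not pronounced
    unless a vowel follows, and the preceding vowel is written long."""
    return re.sub(rf"([ɔɑ])r(?![{VOWELS}])", r"\1ː", diaphonemic)


def yod_dropping(diaphonemic: str) -> str:
    """Approximate GA reading: /j/ is not pronounced between t, d or n
    and a following /uː/."""
    return re.sub(r"(?<=[tdn])j(?=uː)", "", diaphonemic)


def readings(diaphonemic: str) -> dict:
    """Map each (simplified) accent label to its derived form."""
    return {
        "RP/BBC (approx.)": non_rhotic(diaphonemic),
        "GA (approx.)": yod_dropping(diaphonemic),
    }


for accent, form in readings("njuː ˈjɔrk").items():
    print(f"{accent}: /{form}/")
# RP/BBC (approx.): /njuː ˈjɔːk/
# GA (approx.): /nuː ˈjɔrk/
```

Even for this two-word example each accent needs its own rule, and a complete set would have to encode a large part of English accent variation: exactly the knowledge the diaphonemic system assumes its readers already have.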
2. Who writes and edits WP material?
In principle, anyone can become a WP editor. In practice, anyone getting into editing quickly learns that there is a hierarchy of senior editors who police anything being added or changed. The way these senior editors work has a rather hieratic (or perhaps boy-scoutish) feel to it. If you do a lot of editorial work and it is judged to be useful, you get awarded stars (they call them "barnstars") that you can stick on your own home page. If you edit an existing article and someone disagrees with what you have written, they can immediately "revert" it, that is, remove what you wrote. Editors sometimes get into "edit wars" with other editors who disagree with their alterations, and the arguments can get pretty heated. The most unpleasant dispute I have seen concerns the International Phonetic Association. Anyone with a professional interest in phonetics knows that the Chart of the IPA is an important and long-established framework which is maintained by the Association. Recently it was observed that in the WP article on the IPA an editor had changed the IPA Chart to incorporate his/her own improvements. The Secretary of the IPA, Prof. Pat Keating, requested that the official IPA Chart should be the principal chart shown in the article, and received quite a hostile response from editors who believe that WP has the right to make changes to anything they want[1]. Below is some of what was written in response to Pat's request by one senior editor (clearly not a native speaker):
I propose we drop anything & everything their formal chart is (references will do). IPA is just a source, and their charts are outdated anyway. … The Secretary of the IPA (I first thought it was a joke name) troubles with: A): copyright and B) OR. I say: well, then, your Secretary Highness, 1) open your source and 2) cooperate. I myself can write this: we over here at Wikipedia have the best IPA reference anyone can dream of. I state: we have made the best IPA source … Wikipedia is the best, and most actual & accurate, IPA source in the World. I say, publish with the freedom you noted, great. Most likely, we at WP will rephrase it (read: tear it apart and reform it)
[1] The argument can still be read at http://en.wikipedia.org/wiki/Talk:International_Phonetic_Alphabet#Can_we_discuss_adding_the_official_IPA_chart_to_this_article.3F
Wikipedia has strict rules about conflicts of interest and self-promotion in the writing of articles, for reasons that are easy to understand. Articles are often turned down if the author has been personally involved in work on the topic and cites work by others who have been similarly involved. I was given the following advice by a senior editor:
… Disclose your conflict of interest clearly on your user page, and on the talk page of the article once it is in main space … Be very cautious to maintain neutrality and cite other of your colleagues as you cite yourself. If uninvolved editors perceive that you are using Wikipedia as a tool to promote your own work at the expense of the comparable published work of others, objections will be raised …
People may infer from this that you shouldn’t write an article for WP on a topic with which you have been personally involved. As a result, it often seems that articles that should have been written by someone with professional expertise in a subject are instead written by someone who has expertise in writing for WP but only a dilettante’s knowledge of the subject s/he is writing on. I am sure this is not the case in all articles, many of which are extremely impressive, but in our field you can see the amateur nature of the writers’ approach at times.
I believe that the basic idea of a free, constantly updated store of information, open to editing by anyone who feels they can improve it, is a very good one. Given the principles on which WP is based, it is surely better to try to correct what is wrong with it than to dismiss it or belittle it. Nevertheless, I hope I have made it clear that I feel there are quite a few things to be concerned about.
Free Online Pronunciation Dictionaries
Forvo
One of the most interesting sites for pronunciation information is Forvo (www.forvo.com), which lets you look up a word and hear one or more recorded speakers saying it. At present it is far from perfect, but its potential is great. As with WP, the contributors are all unpaid volunteers, and anyone can contribute; this gives it a big advantage over commercial publishing, because it is not saddled with the big legal problems over copyright encountered when publishers try to use “authentic” recorded material. You can vote your approval or disapproval of a particular pronunciation, and Forvo shows on a map the geographical origin of the speakers. It is not limited to English (for which at the time of writing 105,693 words, phrases and names are listed): several hundred languages are covered. It would be a huge step forward if Forvo could also get speakers to record words in context in connected speech.
As an example, the word ‘surveillance’ at the time of writing has just one pronunciation recorded for American English and one for British.
The quality of the recording and of the speaking is not at all bad, and compares quite well with the CD-ROM recordings of the two main English pronunciation dictionaries. A recent addition, I think, is a phonemic transcription in a panel at the upper right of the screen. As far as I can see, this gives only a British (RP/BBC) transcription, even where only American pronunciations are recorded. Thus the word ‘thoroughness’ is given as /ˈθʌrənəs/, but all three of the recorded pronunciations are American and differ from this transcription. The word ‘particular’ has two American speakers and one British speaker; the first American pronounces the word as /pɑːrˈtɪkjəlɚ/, but the transcription given is /pəˈtɪkjələ(r)/.
The major weakness (for me) is that many of the recordings are simply not good enough. In some cases the speaker appears to lack experience of speaking into a microphone, while in others the sound quality is so poor, with so much background noise, that the recording seems to have been made with the built-in microphone of a laptop computer. For example, of the five pronunciations of ‘complimentary’ available at the time of writing, the earlier ones seem acceptable, but the last two are really not good enough for public use. ‘Altruistic’ has three example pronunciations: the first is good, the second is almost drowned by hum, and the third was recorded with a low voice level and a lot of background noise.
Inogolo
Inogolo (www.inogolo.com) is a modest resource, but within its niche it has the potential to be useful. It describes itself as “… the practical, easy-to-use website devoted to the English pronunciation of the names of people, places, and miscellaneous stuff. The site contains a searchable database of names with both phonetic and audio pronunciations in English”. In general the idea seems to be to give (American) English pronunciations of foreign or obscure names. The pronunciation information is given in respelling rather than IPA transcription, which is obviously a disadvantage from my point of view. There is some other pronunciation information given, such as “commonly mispronounced” (e.g. Mozart).
Since it is not obvious how one should pronounce the name of the site itself, I checked this and found it is given in respelling as ih-NO-go-lo. Clicking on the audio button produces a clearly recorded American pronunciation (/ɪˈnɔʊ.ɡə.lɔʊ/). Unfortunately, many words have only the pronunciation in the clunky respelling system, with no audio. No doubt the audio will come in time. I tried a few musical names as a small test. ‘Beethoven’ is given in respelling as BAY-“toe”-vuhn and has good audio. ‘Mozart’ is MOE-tsart. ‘Dvorak’ gives duh-VOR-ak and no audio. ‘Haydn’ is HIDE-un, with no audio. ‘Chopin’ is SHO-pan, and the audio follows this with what sounds like /ˈʃɔʊ.pæn/. There are no entries for Vivaldi or Verdi. If the search comes up with nothing, Inogolo will offer to search elsewhere for the information. The help I got with ‘Schoenberg’ was “Five hotels in Schoenberg” followed by “Schoenberg on Amazon”.
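Judging by the examples above (ih-NO-go-lo, MOE-tsart, duh-VOR-ak, HIDE-un), Inogolo's respellings appear to mark the stressed syllable by writing it in capitals. Assuming that convention holds generally, a minimal Python sketch can recover the stressed syllable from a respelling; the function name is hypothetical, and the sketch recovers nothing about vowel quality, which the respelling leaves much less precise than an IPA transcription would.

```python
from typing import Optional


def stressed_syllable(respelling: str) -> Optional[str]:
    """Return the first hyphen-separated syllable whose letters are all capitals.

    Assumes (from the examples, not from any published Inogolo
    specification) that capitals mark the stressed syllable. Quotation
    marks and other non-letters are ignored when checking case.
    """
    for syllable in respelling.split("-"):
        letters = [c for c in syllable if c.isalpha()]
        if letters and all(c.isupper() for c in letters):
            return syllable
    return None  # no syllable written in capitals


for name, respelling in [("Inogolo", "ih-NO-go-lo"),
                         ("Dvorak", "duh-VOR-ak"),
                         ("Haydn", "HIDE-un")]:
    print(name, stressed_syllable(respelling))
```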
Howjsay
This site (www.howjsay.com) is the nearest thing on the internet to a traditional pronunciation dictionary. You type the word you want into the search box, and when the word has been found it is presented highlighted in a list of alphabetically adjacent words. You only have to move the mouse pointer over the word to hear it.
The pronunciations are RP/BBC or close to it, and are well recorded by a single expert speaker (Tim Bowyer). Most words are given a single pronunciation, but in the best-known cases alternative pronunciations are given, e.g. ‘controversy’, ‘either’, ‘kilometer’, ‘cigarette’, ‘schedule’. The alternatives are often presented as British – American differences, where I would rather see them as age-related British alternatives. Users can request that words not yet in the inventory be added. You can type in several words separated by semicolons so that you can compare and contrast them as you hear them in sequence. The number of words represented is currently over 165,000. Jack Windsor Lewis has written an appreciation of Howjsay which is well worth reading: http://www.yek.me.uk/archive25.html#blog247
Conclusion
The free reference sites that I have talked about make up only a tiny fraction of the material "out there" that English pronunciation teachers can make use of, and I have said nothing at all about the wealth of classroom material that writers have generously put out on the internet for sharing. In the case of the reference material, I hope I have shown that despite the efforts of the people responsible for producing it, the quality is very variable. It seems likely that the quality will improve as more users become involved. One question that interests me personally is what impact this free material will have on works published on the internet that currently have to be paid for. Given the general enthusiasm for "free stuff" on the web, and the difficulty of preventing and policing piracy, it seems likely that sites like Forvo and Howjsay will eventually be the winners over the products of commercial publishers. It is going to be very important for us experts in our field to look at what is being put on free sites and correct the material when it is faulty.