Namespaces, Clear Syntax and Automation

We need Namespaces, Clear Syntax and Automation, or do we?

I've seen the definition for Wiktionary: A Dictionary and a Thesaurus in every language. The goal is great, but the current infrastructure and free-form syntax mean that there will be an enormous amount of redundant manual editing that could be done automatically if we just had a clear syntax and some software. I'm not saying that it would be easy, I'm just saying it could be feasible.

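e.g. To achieve the goal for one meaning of one word in n languages, we have to make n(n-1) = n^2 - n entries: each of the n language editions needs a translation into each of the other n-1 languages. That will frustrate a lot of people, who will think something like: "If we just XML'd this and this and so forth..."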
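
The n(n-1) count can be checked with a small sketch. The language codes and the function names `manual_entries` and `directed_pairs` are made up for illustration and do not correspond to anything in the Wiktionary software; the sketch simply lists every (source language, target language) pair for one meaning of one word.

```python
from itertools import permutations


def manual_entries(n: int) -> int:
    """Number of translation entries for one meaning in n languages: n(n-1)."""
    return n * (n - 1)


def directed_pairs(languages):
    """Every (source, target) pair of distinct languages (hypothetical helper)."""
    return list(permutations(languages, 2))


# Illustrative language codes only; any list of distinct languages works.
langs = ["en", "fi", "es", "de"]
pairs = directed_pairs(langs)

# The enumerated pairs match the closed-form count n^2 - n.
print(len(pairs), manual_entries(len(langs)))  # 12 12
```

With 100 languages the same formula gives 9900 entries for a single meaning, which is the scale of manual editing the comment above is worried about.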

At least in nouns there are a lot of unambiguous words in most languages.

I'm sure that there are lots of people who have thought of this kind of automation scheme: a clearly defined syntax, namespaces for languages and classes of words (noun, verb...) and evolving the underlying software. Please see my page for what I've managed to scribble down on this matter.

I'll iterate on the subject with your help. Cheers.

- Juho 13:26 Feb 22, 2003 (UTC)

IT WON'T WORK! Language is not that well behaved, and I shudder at the thought of bot-generated translations.
Please consider the following POVs:
Automatic entries could go to a special namespace and therefore have a different colour until a human has checked them to be sane and truthful.
The dependency data from the automatic translation would be very useful for detecting, stopping and reversing vandalism.
Let me illustrate this point.
In Wikipedia, when I make a change to an article it takes me some time, lots of concentration, Googling, backtracking through my subscribed RSS feeds and consulting books, which makes it very likely that I will put the article on my Watchlist to see if someone axes my edits or what further info people add on the subject. I believe most people go about this Watchlist matter in the same way, which results in numerous eyeballs ready to catch vandalism, minor puns, POVs and so on.
In Wiktionary, adding a translation usually takes 10-30 seconds, and when you get into the flow you'll do these for half an hour straight. I have no interest in watching these articles (the most likely thing to catch would be someone adding a translation in a language I don't have a clue about). Therefore it is much simpler to vandalise Wiktionary, e.g. just change some translation to an obscenity and put something else in the summary.


When utilising Wikipedia to get information, one can use common sense to filter out possibly unreliable information.
When utilising Wiktionary to get a translation, I'm really vulnerable to practical jokes, puns and obscenities, whether human or bot created.
This vulnerability (and the redundant manual work I've mentioned before) increases the chances that some people who feel almost religiously about the future of XML will fork a separate project from the Wiktionary to illustrate the power of meta-data and alleviate some headaches and frustrations
Comments and further thinking are very welcome.
I'll write more on this subject in my own space. I'll post a link here when I've elaborated and argued my view more precisely


- Juho 11:24 Feb 23, 2003 (UTC)
Sure, there are some words that can easily be mapped on a one-to-one basis between languages, but these are the exception. This mapping works best with modern technical terms. The further one gets from these technical terms, the more connotational baggage a word picks up, and that baggage will not be the same in every language. Distinctions may be made in one language but not in another. Distinguishing between "ser" and "estar" is a problem for a new speaker of Spanish. The use of "the" is a problem for Slavs wanting to learn English. Do we treat each item of a Finnish declension as a separate word?
Regrettably automation gives us the situation where a letter addressed to the Widget Company will begin with the salutation, "Dear Mr. Company".
<xml>
  <company name="Widget"></company>
  <person firstname="John" surname="Doe"></person>
</xml>
Sorry, I just had to put this here (I'm not trying to provoke a fork into "xmltionary") Juho
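As a rough sketch of the point behind that snippet: if the data says whether an addressee is a company or a person, a program can pick the salutation from the element type instead of guessing. The function name `salutation` and the wording of the greetings are assumptions made for illustration; only the element and attribute names come from the snippet above. It handles the cases the markup distinguishes and nothing more, so exceptions would still need human judgment.

```python
import xml.etree.ElementTree as ET

# The snippet from the discussion, unchanged.
RECORDS = """<xml>
  <company name="Widget"></company>
  <person firstname="John" surname="Doe"></person>
</xml>"""


def salutation(element):
    """Choose a greeting from the element type (hypothetical helper)."""
    if element.tag == "person":
        # The snippet carries no title or gender, so use the full name.
        return f"Dear {element.get('firstname')} {element.get('surname')},"
    if element.tag == "company":
        return f"Dear {element.get('name')} Company,"
    # Anything the markup does not describe falls back to a neutral greeting.
    return "Dear Sir or Madam,"


greetings = [salutation(child) for child in ET.fromstring(RECORDS)]
for line in greetings:
    print(line)
```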
I do believe that a fairly uniform format for articles would be an asset, but putting articles in that form will still need to be done mostly by humans, who will be in a position to exercise judgment when exceptions arise.
In my view the vision of a multilingual Wiktionary involves separate Wiktionaries for each participating language. Each of these would be written with the speakers of that language in mind. Even the foreign words on each Wiktionary would be described in a way to benefit the speakers of the base language.
-- Eclecticology 01:50 Feb 23, 2003 (UTC)