A beautiful language

Translated from the original Japanese on 2021-10-02.

When I started working on Established Arka in 2001, I had a secret personal objective for it: to plan a language that sounded lovely when Ridia, my lover, spoke it, and tender and attractive when I spoke it to her. Because Established Arka was an on-demand work, this was a completely personal plan.

However, modern linguistics does not admit the notion that a language can be judged as being beautiful or cute or tender. Although I had known this since the 90s, I wondered whether it was really the case. I could not accept such an idea from linguistics. At the least, I have always thought of the idea as quasi-philanthropic. I wondered whether humans had a common “sense of sound” to some extent.

For example, the words describing blossoming or sprouting in many languages tend to use /p/ to depict the sound of some kind of rupture, because [p] is a stop (plosive). By skillfully using articulatory phonetics, I thought, couldn’t we arrive at the a priori sense of sound shared to some extent by all humans, at least as a greatest common denominator?

When I think of it, I was using a PC-9801 in 1998, when I was creating Old Arka, and I had planned a program to calculate the beauty and cuteness of input words. That is to say, even then I already believed that there was such a thing as a language that would sound beautiful to society. Of course, I also knew that linguistics would not recognize such a belief.

In 1999, I read a book entitled Onsō (『音相』) by Takayuki Kidoori and exchanged a few e-mails with them. However, that book’s theory was based on the sense of sound in Japanese, and it did not cover the sense of sound as understood by languages in general from the view of articulatory phonetics.

Having learned several foreign languages, I knew that which sounds were considered beautiful differed across languages. For instance, Japanese people consider unvoiced sounds to be more beautiful than voiced sounds and perceive sounds such as [t͡ɕ] (the sound in チュ and similar syllables) to be childish. Deutschland was borrowed into Japanese as ドイツ because [t͡ɕ], heard as childish, was avoided.

Furthermore, because culturally speaking, the Japanese admire Westerners, the sounds [v] and [r], which are nonexistent or rare in Japanese, are also often admired.

If we gathered 100 Japanese people and asked them whether [pepːonɡjutʃap] or [vanɡaɹdi] sounded more attractive, almost all of them would choose the latter. They have been instilled with the belief that the latter word, which sounds vaguely Western, is more attractive than the former, which seems to come from Thai or a similar language.

In this way, what sounds seem beautiful is dictated to some extent by one’s mother tongue and culture. Therefore, this ratio [of which word is preferred] should vary from country to country. Perhaps there are countries in which the former word is perceived as more attractive.

Nevertheless, couldn’t we use articulatory phonetics to define a universal sense of sound from a more-or-less neutral point of view? Thus, when I started to work on Established Arka in 2001, I thought to try to define such a sense.

The phonetic level

I first thought about whether general properties could be defined by certain sounds in terms of articulatory phonetics. I focused on the fact that the word for spring in many languages has the sound [p] since it takes part in sounds such as that of sprouting buds and words for spring often have etymologies related to sprouting.

The word for spring in Japanese was originally /paru/, so this holds. This property applies to Korean, English, and French as well. When the word for spring contains [p], it is likely to have an etymology related to sprouting. It seems that [p] brings up images of rupturing or swelling, or of being round or rolling.

[t] and [k], which are also stops, seem to give only a faint impression of rupture. [k] has a feeling of depth because it is pronounced deep in the mouth; [t], compared to the other two, is in the middle.

On one hand, [s], even among fricatives, contains a harsh hissing noise that is particularly noticeable in English-language news shows. Even among fricatives, this noise is pronounced in [s] and [ʃ]. Because such a noise is hard on the ears, frequent use of these sounds is not considered euphonious. Among the same group of fricatives, sounds such as [f] do not have as much of this noise because [f] has a lower sonority than [s].

As a digression, why do Japanese, Koreans, and Americans use similar hushing sounds to tell someone to be quiet? Is it because this “shh!” sound itself is noisy enough to annoy someone? I have a memory of losing my temper when I was young, thinking “your hushing is even louder, you dumbass!”. Perhaps this noisy sound conveys the intention of garnering someone’s attention to tell them to be quiet.

In addition, [s] might be used because it, unlike [p] and friends, can be sustained. If “p!” were the expression to tell someone to be quiet, then it would all be over once the [p] was sounded out, and the sound would not be conspicuous. If the message does not reach the recipient, then it has no meaning. For that reason, the loudest and most conspicuous members of the sustained sounds (such as [f], [s], and [ʃ]) – namely, [s] and [ʃ] – would be chosen.

On the other hand, there are nasals such as [m] and [n]; [m] has a round, bulging sensation and gives an impression of mumbling or chewing. Among nasals, it is the sound that gives the most sluggish feeling.

Because the articulation of [m] closes the mouth for a moment, the airflow from the lungs is gathered inside the oral cavity once. The only place for the air to come out is through the nasal cavity. Thus it has a nasal sound that is suggestive of mumbling or chewing.

[n], also a nasal, is pronounced with the mouth open, such that the airflow from the lungs always leaks slightly through the lips. For that reason, the proportion of air that goes through the nose is lower than in [m]. As a consequence, it does not sound as mumbly as [m].

Nasals have neither rupture nor friction. They are soft, nasal sounds. Therefore, they can produce a tender, lovely, and innocent impression.

In addition, liquids, which also lack closure or friction, do not give the impression of rupture or of friction. Nevertheless, they do not give an image of roundness either, as airflow does not come out of the nose. True to their name, they give an image of flowing.

From such special qualities that arise from articulation, I thought that [t], the neutral element among stops, should be chosen; fricatives, being harsh, avoided; [n], not feeling rotund, chosen as the nasal; and among the liquids, [l] used as the principal element.

When it comes to vowels, the front of the tongue usually gives a bright but soft image, while the back of the tongue gives an obscured, dark image. This quality is felt by every language to some extent, and this at least is observed even by linguistics.

Too many [i]s and the language sounds shrill and loud; too many [u]s and it sounds dark. For that reason, I decided to use the comparatively neutral vowels [a] and [e] extensively.

Incidentally, [p] and [m] are said to be the first consonants acquired by young children, and they might also give an infantile impression. Personally, they give me an impression of a spitting sound, crude and unrefined – that is, their image is not so good. [m] gives the impression of sounding like the texture of chewed pumpkin and I did not want to use much of it. Still, perhaps this feeling of [m] as the mouthfeel of pumpkin is a product of synesthesia.

Thus, [a], [e], [t], [l], and [n] were chosen as the sounds preferred by Established Arka. Now, in 2012, the frequency of each phoneme in New Arka has been calculated, and these sounds do indeed occur often. That is, I have designed the language according to plan.
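The essay does not say how the phoneme frequencies were calculated. A minimal sketch of one way to do such a count follows, assuming (as a simplification) that each letter of the romanized spelling stands for one phoneme. The function name `phoneme_frequencies` and the tiny word list are hypothetical; the list contains only the Arka words quoted in this essay and is far too small to say anything about the real language.

```python
from collections import Counter

def phoneme_frequencies(words):
    """Return each letter's share of all letters in the word list.

    Assumption: one letter of the romanization equals one phoneme.
    """
    counts = Counter(
        ch for word in words for ch in word.lower() if ch.isalpha()
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders the result from most to least frequent.
    return {ph: n / total for ph, n in counts.most_common()}

# Hypothetical sample: only the words quoted in this essay, not a real corpus.
sample = ["non", "nonno", "an-", "sarit"]
freqs = phoneme_frequencies(sample)
for phoneme, share in freqs.items():
    print(f"{phoneme}: {share:.1%}")
```

Run over a full dictionary or a body of running text instead of this sample, the same count would show whether [a], [e], [t], [l], and [n] really dominate.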

From this, we can conclude that Arka has succeeded in gaining the image – in terms of articulatory phonetics – of “not using many plosives, which give a childish feeling of bloating or swelling – that is, which give a crude image”, “avoiding a harsh feeling and the screeches of friction”, “avoiding excessive use of nasals that suggest the sound of chewing”, and “not being too bright or too dark”.

I was not fond of having Ridia speak harshly with numerous [s] sounds, or in a chubby or infantile way with numerous [p]s, nor did I want her to speak as if she were chewing. She, too, wanted to speak in an elegant and lovely manner. Therefore, I left it to [l] to deliver an elegant impression, to [n] to deliver a lovely one, and to [t], [a], and [e] to act as neutral sounds.

Because I wanted Ridia to speak in a lovable manner, I decided to use [n] more frequently in women’s words. Traces of this pattern can be seen in words such as non, nonno, and an-.

On the other hand, my own speech is not as cute. Because it is better for it to be tender and stylish, I devised a way for masculine speech to use more of the neutral [t] and the elegant [l].

The intonational level

Excluding sounds that are heard as phonemically dirty or slovenly is the first step, but the second step is in fact more important.

Generally speaking, Chinese is [considered] raucous. According to the Chinese people I asked when I was in university, this is because it is a tonal language, and thus one is prone to raise one’s voice with violent intonation even when one intends merely to speak.

English is similar – when I hear news in that language, the intonation sounds aggressive. Therefore, I tried examining the range between the lowest and highest sounds, and indeed, English had a larger range than Japanese. That is, the intonation of English is more extreme.
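The essay does not describe how this range was measured. One common way to compare the span between the lowest and highest pitch across speakers is to express it in semitones, which depends only on the ratio of the two frequencies. The sketch below assumes a list of fundamental-frequency readings in hertz, with 0 marking unvoiced frames; the function name `pitch_range_semitones` is hypothetical, and no real measurements are included.

```python
import math

def pitch_range_semitones(f0_hz):
    """Span between the lowest and highest voiced pitch, in semitones.

    Assumption: values <= 0 mark unvoiced frames and are ignored.
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        raise ValueError("no voiced frames in input")
    # One octave (a doubling of frequency) equals 12 semitones.
    return 12 * math.log2(max(voiced) / min(voiced))

# Sanity check with round numbers: 100 Hz to 200 Hz is exactly one octave.
octave = pitch_range_semitones([100.0, 0.0, 150.0, 200.0])
print(octave)
```

Because the measure is a ratio, a high voice and a low voice covering the same musical interval get the same score, which makes it fairer for comparing languages spoken by different people.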

Languages with aggressive intonation generally sound loud. In order to express intonation, one is apt to raise one’s voice, and the intonation itself is rich with fluctuations, making the speech sound louder. I describe this later, but when the intonation is aggressive, the speech sounds silly to an ordinary person. Therefore, when I planned the future of Arka, I decided to keep variations in intonation as low as possible.

Still, there is another reason that the intonation of Arka is so narrow: I wanted to make myself heard as tender and attractive whenever I spoke (in other words, I wanted Ridia to say “Seren, you’re so cool!”).

But why narrow the intonation? Because I have a complex about my voice being high. When I speak Japanese, for instance, I inevitably tend to begin sentences at a high pitch. This happens because catathesis (downstep) is abundant in Japanese: the pitch keeps stepping down over the course of a sentence, so one has to start high to leave room for it.

For that reason, I gave Arka a component called a return point (回復点; sarit). Because Arka has return points, there is no tendency to start the sentence with a high pitch or end one with a low pitch. Thanks to return points, the ambitus is narrow and there is little intonational variation. The pitch fluctuates within the range in a meandering path.

I usually have a high-pitched voice. When I try to show off by speaking Japanese at a lower pitch, catathesis ends up pushing the phrase below my vocal range, and I cannot voice the notes to continue it. In Arka, however, I can keep myself at the lowest pitch in my range and skillfully whisper to Ridia, since the intonational variation is narrower thanks to return points.

The language was designed such that women would enjoy the low-pitched masculine voice from the start. Because I knew this, I wanted to whisper to her with a deep, sweet voice by all means as someone in a long-distance relationship. This secret intention succeeded with flying colors, and I got Ridia to say, “Seren, you sound better in Arka than in Japanese.” I planned the language using my brain, and I therefore succeeded in making myself handsome. This was quite a “got you!” moment as a conlanger.

Arka’s return points exist even for feminine speech, such that the intonational range is narrow for women as well. This is indeed convenient because Ridia, on the contrary, could speak to me like a cute girl without a low tone, maintaining a high, lovely voice. In reality, when Ridia speaks in Japanese, her voice becomes low due to catathesis and she stops sounding dainty, but when she speaks in Arka, her voice remains high and gentle, and I can listen to her while being enamored with her from start to end.

The female voice, when the pitch is high and the intonation is aggressive, sounds hysterical, but this is not the case with Arka, which has a narrow intonational range. For me, this was an excellent decision to have made.

In this manner, using wits and knowledge, I managed to construct a language that gave the impression of sounding lovely when spoken by Ridia and tender and handsome when spoken by me. Indeed, as long as one has the wisdom, one can even manipulate the impression given by a language to one’s own will.

In terms of sound, I like Arka the most out of any language. This preference is unsurprising, given that I have customized it to sound the best to my ears using my knowledge. I would call it a feat to do something such as customizing how a language sounds.

Does anyone consider the project of influencing and designing the sound of a language according to one’s own ideals? Even among ten thousand people, not even one person might think of it. Those with such refined ears – those who deserve to be called language sommeliers – are almost nonexistent.

From the time I was a child, I liked to listen to and compare languages conveying the same information, and I came to play the game of determining which language sounded the most beautiful. I thought that such games would lead to success. For that reason, I acquired the ears of a language sommelier whom no one, whether in Japan or in the rest of the world, would understand, and furthermore used them to create a wine for myself to drink. Perhaps there was no one who had ears on the level of such an artisan, or if there were, they were few in number.

I have said various things, but the point is that when it came to the beauty of a language, I had to depart from the common sense of linguistics and make my own plans. Such an idea is not proper in linguistics, which intentionally ignores the study of sound sense that exists in reality.

Although I think about whether it is okay to recognize notions of the universal sense of sound at the greatest common denominator that can be inferred from linguistics and articulatory phonetics, I currently do so aggressively only up to the idea that [i] gives a bright impression when discussing the theory of onomatopoeias.

This attitude in linguistics is no good, because facts must not only be acknowledged as facts but also dealt with. It is no use to label it as quasi-philanthropism in order to prevent beautiful languages or ugly languages from being made.

The linguistic level

Incidentally, when I was a lecturer, I conducted an experiment. I tried speaking in foreign languages such as French, German, and Chinese to the students. As a rule, the less academically able students are more likely to laugh. The more academically gifted students will often admire you and listen to you in a trance.

When I observed the foolish students who seemed to be laughing at the speaker, I noticed a certain regularity. They seemed to find the r-sound of French[1] amusing and thus often burst out in laughter when they heard it. Chinese itself seems to sound ridiculous to them, and they immediately made fun of it. But they hardly laughed at German.

Perhaps it was because the intonation of German is not as extreme as that of English, there were no phonemes that sounded funny, and it sounded serious. Because this experiment was so interesting, I tried it on the general public. They laughed at Chinese or French, but they did not laugh at German.

By the way, when I spoke in English without telling them that it was English, they laughed, but when I spoke in English after telling them, they said “oh!”. Perhaps this is because of the common notion among the general public that “able to speak English = amazing”. Hence, I decided to conceal the name of the language as much as possible in this experiment. Furthermore, since the students, no matter how uneducated, seemed to recognize katakana English as English, I enunciated the words precisely when I spoke. Otherwise, it would not be an experiment.

Then I considered observing children this way, but I felt quite embarrassed to speak to them in a language with such an aggressive intonation, knowing that they would make fun of me and laugh at me. I know this from experience, so it seems that they would not laugh at me if I spoke Arka, because it has a lowered intonation.

Occasionally, I have been at side shows, speaking in foreign languages to the unlearned public without telling them the language. When I spoke in Chinese, they would laugh at me, saying “what, what language is this?”; when I spoke in Arka, they would say the same thing with a straight face. Thanks to these experiments, I have concluded for now that Arka does not sound funny, even to uneducated ears.


Still, this manuscript does not imply that Arka is a beautiful language. First, it asserts that there might be a universal sense of sound that can be inferred using articulatory phonetics. In addition, it merely applies this idea to Arka and discusses how the language was set to feel comforting to our ears.

That Arka is a beautiful language is my opinion and not a universal fact. I would like for you to avoid misunderstanding this point.

  1. Of course, German has the same sound. This effect might be coming from other sounds in French – most notably, its nasal vowels.