How to create a linguistically consistent conlang

Translated from the original Japanese on 2021-12-31.

  1. How to create a linguistically consistent conlang
    1. The objectives of this manuscript and the intended audience
    2. What is necessary for a conlang is not status but rather effort and literature
    3. The minimum to protect is the set of linguistic universals
    4. If you study the outline of linguistics, focus on typology first
    5. The next essential is phonetics
    6. Grammar is secondary; enquiry has a wide scope
    7. Idiomaticity
    8. Tendencies that produce idiomaticity
    9. The idiomaticity of conlangs
    10. Why conlanging is not a category of linguistics
    11. A conlang without linguistic contradictions
    12. The influence of other languages
    13. Dual histories
    14. Sister article

The objectives of this manuscript and the intended audience

This manuscript discusses how to make a constructed language without linguistic inconsistencies – or, to put it mildly, one that is not unnaturalistic.

Only a priori languages will need this way of creating a language. This guide is not necessary for a posteriori languages because the languages that they borrow words from are natural languages. For instance, Esperanto does not need this manuscript.

Among a priori languages, there are some that pursue realism and others that do not. Of these, the latter do not need this theory. For example, Klingon does not need this manuscript.

The only ones who need this manuscript are researchers wanting to create “a natural language that feels as if it could actually exist” to use in a fictional world to pass off as reality.

Since the following is an explanation regarding this type of conlanging, I would like to note that it does not necessarily coincide with the process of conlanging expressed as “how to make a conlang”.

What is necessary for a conlang is not status but rather effort and literature

Then, assuming that you have the goals mentioned above, the first thing you need is a knowledge of linguistics: an undergraduate-level understanding at the minimum, and preferably a graduate-level one.

Linguistics is typically taught in the Japanese or English program within the department of literature. Given this, you might think you would need to belong to the literature department, but that is not actually true. That is because linguistics, unlike engineering, does not require expensive experimental equipment. Although equipment is expensive in the one subfield of speech recognition, you can get the knowledge you need to make a conlang as long as you can get the right literature.

Therefore, all it takes is one’s own effort and literature. That is, it suffices to study linguistics on one’s own. Furthermore, it is not even necessary to enroll in a university’s literature department to major in linguistics. Although the author of this manuscript has majored in linguistics in graduate school, the same author has deemed it to be not particularly necessary. But why is this the case? That is because conlanging is not considered to be within linguistics. This reason will be discussed below.

During high school, the author studied linguistics independently and noticed that linguistics did not deal with conlanging. Still, in those days when the Internet was not widely available, studying linguistics at the university would be the closest that he would get to conlanging. Thus, he reluctantly studied linguistics there, but because the department’s courses were aimed at those who were unfamiliar with linguistics, he already had nothing to learn there and took graduate courses as well from his undergraduate years onward. With the kindness of his superiors, he started to be affiliated with like-minded people connected to linguistics. However, they never acknowledged his pastime of conlanging.

Therefore, in university, he researched fields that would be the least bit helpful for conlanging. He thought that he would be treated as an oddball for making languages, or rather, if anything, he would even be ridiculed for it. In addition, the classwork did not help at all when it came to Arka, but everything else was greatly illuminating to him.

As far as conlanging itself is concerned, he can say from his own experience that it is not necessary to belong to a university. For this reason, even if you are only an amateur linguist, even if you are no graduate of the literature department, even if you are in the sciences, even if you are a high-school graduate, title does not matter for the goal of creating a conlang; as long as you can put in your own effort and read the literature, you will manage.

By the way, the author enjoyed the school traditions at the university because it fit well for him. He has had good memories, and his classmates, seniors, and faculty alike were kind people.

The importance of literature has been emphasized several times in this text because technical books can easily cost hundreds of dollars. Because the author’s parents were masters of debt, he, of course, paid for graduate school costs with part-time jobs; since he had to buy technical books above that, he had a very hard time. As he went into a cycle of earning money to buy books, he graduated with the minimum amount of acquired credits, and he had never attended a mixer like the ones his peers were attending. Even when he dated a girlfriend, it always consisted of hanging around in Junkudo or something, and he ended up making her think miserable thoughts such as sharing a pack of takoyaki between them instead of buying one for each of them.

It was not enough to simply lecture at a cram school; he had even worked at a live-in agriculture job to earn money for books. Having to carry a 20 kg box with two of his fingers on each hand to load on a truck (i.e. a total of 40 kg per trip) or work under the scorching sun in temperatures over 40 °C with no water made an impression on his mind. Although none of the other farmers thought so, it was unfortunate that the place he went to casually had a cold reception to part-time work. That family would work while drinking water, but they would not pass him any, even if the temperatures were blazing. They would force him to go through the window instead of the front door, and they also would not let him use the toilet. Dinner consisted of rice and eggplants. When relatives of the family were in a car accident, the author’s head started bleeding, and even though he literally had blood over his face, they pretended not to notice and so on. By the way, that blood was cleaned off by the evening rain. Even through these circumstances, however, he always had linguistics and Arka in his mind.

Thinking of that, when he was once transported to a hospital in an ambulance after falling over in an unrelated case, he thought of the faintly visible words of German origin written on the equipment in the ambulance. It seems that he could not get his head away from linguistics and Arka, whatever situation he might be in. He wished that he had more money, so that he would not have to go through these hardships.

Because the author had to go through this trouble to earn money for books in this manner, he has ended up becoming aware of something. He has been happy as a working adult to live comfortably without worrying about money. Now he has gone as far as to be able to invest the profit margin of his speculation into earthquake recovery support.

But in exchange for this economic security, he has unfortunately started to complain about his body breaking down from working too much up to now. Even though he had just enough stamina to travel from Saitama to Kyoto on a bicycle that was not even a roadracer, his body had put up with the work he had done for many years for the sake of conlanging.

Let us return to the discussion at hand. At any rate, if you are not a student but rather a working adult, there is no need to worry about paying for literature. The only thing you need is your own effort.

The minimum to protect is the set of linguistic universals

What must be kept at the very least are the universals of language typology. The following are said to be absolute universals in language:

Your language must adopt these universals, no questions asked. For us, universals are not limitations but rather useful conveniences. When most creators think of creating a realistic language, they wish for something to rely on from linguistic theory. At this time, linguistic universals are felt to be a great convenience.

If you study the outline of linguistics, focus on typology first

However, I am sorry to say that there are very few universals discovered in linguistics. Most are only either strong tendencies or weak tendencies. Still, it is more realistic to prefer to adopt these tendencies.

For example, according to Matsumoto (2006), the most common basic word order in the world is SOV and the least common is OSV. Indeed, SOV and SVO alone account for 84.3% of the world’s languages. If you are seeking realism, then the safest choice is to choose between SOV and SVO. That does not mean that a language is automatically unrealistic if it is OSV. However, because SOV and SVO are “overwhelmingly more likely”, a language that uses one of them is more readily perceived as realistic. To repeat, though, whether to part from the rest and use OSV is your choice.

These sorts of tendencies can be found by the field of linguistic typology. The first thing you should study after linguistics in general is linguistic typology. Not only must you decide which word order to use, but also make choices such as the type of morphology it uses (e.g. agglutinative or fusional) or its morphosyntactic alignment (e.g. accusative or ergative). When you are faced with such choices, linguistic typology is essential.

Furthermore, as you advance forward, you will bump into walls such as the fact that existing languages are almost never 100% agglutinative, and you will find that linguistic classification is not that easy, or to put it in a different way, creating languages is not so simple either.

The next essential is phonetics

Phonetics and phonology should have high priority for study. Languages are made from sounds, and you will need to decide what phonemes your language will use. In this case, it is absolutely essential to avoid choosing sounds that are impossible to pronounce IPA-wise.

Occasionally, a fool claiming to be able to speak Martian will show up on TV, but the human vocal organs are extremely delicate, and even monkeys, who are terribly similar structurally to humans, are unable to imitate human language (just by being unable to walk upright). A living thing with the form of a human could not survive on a planet with a completely different environment. Therefore, even if there were Martians or Venusians, they would have quite a different form from humans. That is, even if they had vocal organs to begin with, they would differ from the corresponding human ones. Consequently, humans would not be able to imitate their voices, either. As a result, articulatory phonetics does not admit the notion that these swindlers are speaking Martian.

Although such things are understood to be common sense, they are also something understood to bring up the delicateness of the IPA. You must limit yourself to the sounds presented in the tables of the IPA and choose which phonemes to include in your language.

There is a reason that typology was mentioned before phonetics. For instance, there is no language that has [z] but not [s]. This is made clear by typology. Even so, if you do not know these sorts of things, then there is a danger of choosing phonemes in a linguistically implausible way.
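
A check of this kind can be sketched in a few lines of Python. This is only a minimal illustration: the rule table, the function name, and the sample inventory are assumptions made for this sketch, and only the [z] and [s] rule is taken from the text above.

```python
# Implicational rules of the form "if a language has X, it also has Y".
# Only the [z] -> [s] rule comes from the text; any further rule should be
# verified against the typological literature before it is added here.
IMPLICATIONS = [
    ("z", "s"),
]


def find_violations(inventory, implications=IMPLICATIONS):
    """Return the (present, missing) pairs that the inventory violates."""
    phonemes = set(inventory)
    return [
        (present, required)
        for present, required in implications
        if present in phonemes and required not in phonemes
    ]


if __name__ == "__main__":
    # A hypothetical draft inventory that includes [z] but forgets [s].
    draft = ["p", "t", "k", "m", "n", "z", "a", "i", "u"]
    for present, required in find_violations(draft):
        print(f"Inventory has [{present}] but lacks [{required}]")
```

The table is kept separate from the checking function so that rules found later in your reading can be added without touching the logic.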

When you choose your phonemes, you should avoid choosing only phonemes that you like or are used to. You must investigate languages from various language families such as Indo-European or Sino-Tibetan and investigate synchronic and diachronic sound changes, and based on those investigations, you must choose the phonemes of your own language.

In his university days, the author frequented the library; not only had he read through from the first volume of The Sanseido Encyclopedia of Linguistics to the supplement in the fifth volume, but he had also skimmed the extra issue, “The Encyclopedia of the World’s Scripts”. In addition, he had purchased a glossary of technical terms, expecting to use it often. Such elaborate investigation is essential.

Incidentally, the writing system that the author thinks is the most polished on Earth is Chinese characters, whatever problems with ease of learning and stroke count it may have.

Grammar is secondary; enquiry has a wide scope

Perhaps you want to hurry up and get to the fun work of creating grammar. However, that is a talk for much later. If you set the grammar in stone before you finish studying typology, then by the time you do finish, whatever blueprint you drew when you were unfamiliar with typology is bound to appear preposterous.

Grammar is a changing thing, and one grammatical feature can influence another. For instance, if a language’s word order is VSO, then adjectives will almost always follow the noun. Under no circumstances should you make decisions haphazardly to suit your own convenience. It is necessary to take into account the influence of not only typology but also the languages of the surrounding peoples, as well as historical changes, when making decisions.

For example, Japanese grammar is not something that belongs only to Japanese. One must take into account the wider area of the Pacific Rim along with the various languages in that region. That is to say that the language of the country that you wish to create must consider the wider area along with the neighboring areas. For these matters, Matsumoto (2007) will be helpful.

I understand the feeling of being impatient and wanting to create the grammar right away. Then the first stage you take when you want to create a language might be to gather the personal pronouns. But because even something as simple as the personal pronouns can vary in its system depending on whether it is a DO-language or a BECOME-language as mentioned below, they cannot be decided that readily. For example, in a BECOME-language such as Japanese, first-person pronouns are generally abundant, while in a DO-language such as English, they are few. To put together even one personal pronoun requires thinking about various other grammatical features collectively. For that reason, the principle of studying comes into effect even before they are put together.

Moreover, Matsumoto (2010) goes into exceeding detail on personal pronouns. There are many good works cited here, but this is one that the creator of Arka wishes had been published before Arka was created. If all of these good works were present, then Arka would have been finished sooner, passing through its twists and turns more quickly.

By the way, since perhaps you are Japanese, you should be familiar with accusative languages. As both Japanese and English are accusative languages, it is probably easy for your language to naturally end up with accusative alignment. However, in the world, there also exist ergative languages. To put it in other words, there is also the option of ergative alignment, but you should not casually choose it just because you like it or “because it seems interesting”.

Furthermore, ergative and accusative languages do not exist as completely segregated categories. There are cases when the alignment of a language can change from ergative to accusative depending on tense. For example, English is accusative, but according to Kondo (2006), the hypothesis that its ancestor Indo-European was ergative is widely supported among linguists.

Moreover, whether a language is accusative or ergative is influenced by its other grammatical features. For instance, Kondo (2006) seeks the origin of the ergative case in the instrumental. It is interesting that one feature of ergative languages is the grammatical relationship between the ergative case and the seemingly-unrelated instrumental case because it means that one must think about the other grammatical features as a whole when choosing between accusativity and ergativity.

In addition, although Kondo seeks the origin of the ergative case in the instrumental, most linguists support a hypothesis that traces it to the genitive case.

Idiomaticity

Typology, phonetics, phonology, and grammar. Once these areas are considered, the skeleton of the language is complete. Naturally, in the course of putting them together, basic vocabulary and function words will have been made. When doing so, you must study morphology separately.

When this work is finished, the language will have advanced to the point that simple things can be expressed in it. At this point, one might notice that there are multiple ways to express the same phenomenon. The sentence “he provided an apple for me”, for instance, has the same meaning as “I got an apple from him” or “he gave me an apple”. However, the first sentence sounds like it had been translated from another language and does not feel natural¹. For some reason, it does not look English-y. In this manner, only when you can express the same information in different wordings does the feeling of “idiomaticity” arise.

Each language has its own standard of “idiomaticity”. For instance, in Japanese, 「雨が降る」 (“rain descends”) is the natural way to say “it’s raining”, while 「雨る」 is ungrammatical. Conversely, “It rains” is the natural way to convey this in English, but “Rain falls” is not the typical expression.

「雨が降る」 is Japanese-y, while 「雨る」 is not. On the other hand, “It rains” is English-y, while “Rain falls” is not that English-y. It is natural in Japanese to use 「雨」 as a noun, and it is natural in English to use “rain” as a verb. In this manner, each language has its own “idiomaticity”.

The following are some examples that are closer to the Japanese expression.

By changing the verb part of the sentence, Japanese and similar languages can show the source of the natural phenomenon more concretely. For instance, Japanese uses 「降り」 with 「霜」 (“frost”) but 「出る」 (“come out”) with 「霧」 (“fog, mist”). In Chinese, 「雨」 uses 「下」, but 「風」 uses 「刮」.

However, according to Matsuse (2007), the Newar language uses WAYE (“come”) for all of these. In this way, even among all of the languages that resemble Japanese at first sight, each language can have different features.

On the other hand, we now look at some languages that are similar to English in this regard.

By the way, there are some languages that can express this idea in both ways. In English, 「雨が急に降りだした」 can be expressed with “rain” as a noun rather than as a verb: “Suddenly rain began to fall.”² Arka tends to use a verb to express raining, as in “eskat im fis” (“it rained today”), but it can also express it as “esk lunat im fis” (“rain came today”).

However, every language has preferences such as for whether to express raining using a noun or using a verb. If a sentence strays from that preference, it becomes unnatural or ungrammatical.

Perhaps even if a sentence is unnatural, it could be understood. For instance, a person might understand what one means by 「雨る」 or “Rain falls”. Still, these sentences are unnatural. That is, each language has its own idiomatic way to express ideas. This appears in the difference between “it rains” and “rain falls”.

Linguists, domestic and foreign, have various technical terms to refer to this “idiomaticity”, and it is discussed by various fields such as linguistic typology and contrastive linguistics. Within Japan, Ikegami (1981) has been the leading force behind the initial research in this topic.

Still, this idiomaticity described here is a property that Ikegami (2006), for instance, presents as follows: “Is it not the case that those who have consciously grappled with foreign languages, or those who have tried objectifying their mother tongue in a cycle of reinterpreting it, have experienced a tendency for each language to have its own preference in regards to verbalization – that is, for each language to compose expressions in its own way?” (own translation) As understood from this expression, this is often mainly a tendency or preference, rather than a universal rule.

Tendencies that produce idiomaticity

According to a series of studies including Kunihiro (1982), Nakamura (2004), and Ikegami (1981, 2007), it is understood that if a language has a certain feature, then it will be more likely to have other specific features.

Ikegami (1981) classifies Japanese as a BECOME-language (「なる」型言語; ナル言語) and English as a DO-language (「する」型言語; スル言語).³

On the other hand, Japanese is a KOTO-language (コト言語) and English is a MONO-language (モノ言語). We cite a good example of KOTO- and MONO- here. As can be seen from (2a), Japanese can be classified as a KOTO-language. (2b) is rather unnatural.

(1) Do you like me?
(2a) 私のことが好きですか。
(2b) ? 私を好きですか。

Interestingly, DO-languages generally tend to be MONO-languages, and BECOME-languages tend to be KOTO-languages. In the same way, DO-languages are usually HAVE-languages, and BECOME-languages are usually BE-languages. In other words, this means that if a certain language is a DO-language, then it has a strong tendency to be a MONO-language and a HAVE-language as well. We now verify this with an example between Japanese and English:

(3) I have a sister.
(4) 私には妹がある(いる)。

Thus, the phenomenon wherein a certain feature implies the existence of others is seen among languages. However, in the end, this is a tendency, not something universal without exception. Nevertheless, the phenomenon in which various features have tendencies toward each other and flock together will be seen in any language; this group of features is considered to constitute the idiomaticity of each language.

The idiomaticity of conlangs

Natlangs have idiomaticity, but what about conlangs?

A posteriori conlangs can acquire their idiomaticity by adopting as a whole the idiomaticity of whichever language they borrow from.

However, because an a priori conlang must be built up from zero, its creator, if lacking this knowledge, is liable to make the conlang quite atypical from a linguistic point of view, for instance by having it be both a DO-language and a BE-language.

If a conlanger, having knowledge of linguistics, wants to intentionally give an unnaturalistic idiomaticity to a conlang as a thought experiment, then there is no problem. However, if they had made that a priori language as one used by a particular group of people in a fictional world, it can be said that this language, with its unnaturalistic idiomaticity, is linguistically unnaturalistic. From a creative point of view, the enquiry done for this conlang can be seen as insufficient or crude.

Of course, even on Earth, DO-languages sometimes do have features that should be difficult for DO-languages to have. In the end, please bear in mind that these are only tendencies.
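
The feature bundles described in this and the previous section can be written down as a small table and compared against a draft conlang. The following Python sketch is only an illustration under stated assumptions: the parameter names ("event", "entity", "possession"), the function name, and the sample profiles are hypothetical labels invented here, not terminology from the cited studies. Because these are tendencies, the function reports deviations to justify or revise rather than errors.

```python
# Bundles described in the text: a DO-language tends to be a MONO-language
# and a HAVE-language; a BECOME-language tends to be a KOTO-language and a
# BE-language. The dictionary keys are hypothetical labels for this sketch.
EXPECTED = {
    "DO": {"entity": "MONO", "possession": "HAVE"},
    "BECOME": {"entity": "KOTO", "possession": "BE"},
}


def deviations(profile):
    """List (parameter, actual, usual) where a profile departs from its bundle."""
    expected = EXPECTED[profile["event"]]
    return [
        (parameter, profile[parameter], usual)
        for parameter, usual in expected.items()
        if profile[parameter] != usual
    ]


if __name__ == "__main__":
    # A draft that is both a DO-language and a BE-language, as in the text.
    draft = {"event": "DO", "entity": "MONO", "possession": "BE"}
    for parameter, actual, usual in deviations(draft):
        print(f"{parameter}: {actual} (tendency: {usual}); justify or revise")
```

A reported deviation is not necessarily a mistake; it marks a place where the conlang needs an explanation, such as historical contact with speakers of another type of language.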

Why conlanging is not a category of linguistics

There are two main reasons that conlanging is not treated as a branch of linguistics.

The first reason is diachronic: in the 19th century, the Société de Linguistique de Paris decided that they would “not accept essays related to language origin theory or universal languages”. Not only universal languages but also conlangs as a whole have been regarded as something to keep quiet about up to today.

The second reason is synchronic: linguistics is “research about the languages cultivated by peoples in daily life, whether spoken today or in the past”. Therefore, conlangs have not been included in this definition from the start.

However, to be frank, these are not the only reasons. I apologize for being too honest, but it is not only because conlangs are a “waiting game”, but also because most conlangs are made by people without linguistic knowledge and end up being unskillful.

And above all, it is because linguists generally lack interest in conlangs in the first place and there is no reward to research conlanging, which is not even within the scope of linguistics, let alone one of its trends. This problem comes before the theory; it is a mere reality.

I should explain what I mean by a “waiting game”.

For instance, the aforementioned finding that “DO-languages are likely to be HAVE-languages” is a tendency established by inductive reasoning based on the inspection of many natural languages.

In addition, according to Matsumoto (2006), the most frequent word order in the world’s languages is SOV, while OSV is the least frequent. This result, too, was found inductively from natural languages. It goes without saying that these results have linguistic significance.

Even so, a conlang can easily overturn these results. One can change the inductive results as one likes, quite like a waiting game. This is because one can make as many OSV languages as one likes.

If conlangs were mixed into natlang linguistics, all of the tendencies that had been discovered hitherto by linguistics would be in danger of being completely toppled. Because of this, it is undesirable for conlangs to be included as a subject of linguistics.
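
The danger can be made concrete with a toy tally. In the Python sketch below, the counts are invented purely for illustration and are not the figures from Matsumoto (2006); only the ranking (SOV most common, OSV least common) follows the text. The function names are likewise assumptions of this sketch.

```python
from collections import Counter

# Toy tallies, invented for illustration only. They are NOT real survey
# data; only the ranking (SOV first, OSV last) follows the text.
natlang_sample = Counter(
    {"SOV": 45, "SVO": 40, "VSO": 9, "VOS": 3, "OVS": 2, "OSV": 1}
)


def most_common_order(tally):
    """Return the word order with the highest count in the tally."""
    return tally.most_common(1)[0][0]


def with_conlangs(tally, order, count):
    """Return a new tally with `count` invented languages of `order` added."""
    merged = Counter(tally)  # copy, so the natlang sample stays untouched
    merged[order] += count
    return merged


if __name__ == "__main__":
    print(most_common_order(natlang_sample))  # SOV
    mixed = with_conlangs(natlang_sample, "OSV", 100)
    print(most_common_order(mixed))  # OSV
```

Nothing limits how many languages a conlanger can add, so any inductive ranking can be reversed at will once conlangs are counted together with natlangs.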

Even if conlangs were within the scope of linguistics, one cannot help limiting them to elaborate a priori languages based on linguistics or completely a posteriori ones. To say even more, even for these conlangs, one cannot help considering them as separate from natlang data.

A conlang without linguistic contradictions

Until now, we have looked at linguistic patterns that were found inductively, and most of these are not universals but rather tendencies. However, these facts are useful enough and should be heeded when creating a conlang.

The way to create a conlang that does not have linguistic contradictions – or, to put it mildly, is not unnaturalistic – is to base it on such considerations.

The influence of other languages

Although it may seem paradoxical with what was mentioned earlier, one must also include the study of diachronics if one is to create a conlang that does not seem unnaturalistic or contradict linguistics in the true sense. For instance, one must not simply give a language every feature associated with DO-languages just because it is a DO-language.

Japanese is a BECOME-language, and by nature, it tends away from using inanimate subjects compared to DO-languages. Even so, Kindaichi (1988) states that “the appearance of inanimate subjects in conjunction with passive voice, as with 「戸は開かれたり」 (‘the door is opened’) and 「賽は投げられた」 (‘the die is cast’), has increased due to influence from Western languages.” (own translation)

In this manner, even though Japanese is a BECOME-language, it has some traits of a DO-language or approves of them due to influence from foreign languages. Of course, the reverse is also true. Therefore, even for conlanging, diachronic considerations based on historical context are necessary as well. To think only of the synchronic factors mentioned above is not sufficient.

Then what does it mean, in more concrete terms, to create a conlang based on historical background?

Arbazard, in which Arka is spoken, has a history spanning thousands of years, and it came to be exposed to threats from various peoples. Many ethnic groups and languages existed within the country, and they left a large amount of influence on Arka⁴.

Although Arka was originally a DO-language, it coexisted with BECOME-language-speaking peoples within Arbazard for a long time, and so Arka started to gradually adopt features of a BECOME-language. Even in the present, it essentially has strong qualities of a DO-language, as can be seen in (5a):

(5a) an til amel (“I have a younger sister”)
(5b) * amel xa an (“A younger sister is at the place where I am”; this is not ungrammatical but has a different meaning.)

Furthermore, DO-languages tend to have few phenomimes, and Arka is the same in this regard.

Moreover, DO-languages tend to lack indirect passives, and Arka is the same in this regard as well.

In this manner, Arka firmly possesses the characteristics of a DO-language at a basic level.

On the other hand, Arka also has some features of a BECOME-language. We give some examples below.

The features of a DO-language are objective, while the features of a BECOME-language are subjective. Related to this, first-person pronouns are diverse in BECOME-languages, while they tend to be fixed in DO-languages. Indeed, first-person pronouns in Japanese are plentiful (e.g. 「俺」, 「私」, 「僕」), but English has only “I” and French has only “je”.

Arka is more like a BECOME-language in this regard; there is an assortment of pronouns such as “an”, “non”, “men”, “yuna”, and “noel”. Formerly, Arka had only “del” and “an” as fitting for a DO-language. At most, “non”, for use by women, was thrown into these. The diversification of personal pronouns only started after the influx of different peoples who spoke BECOME-languages.

In addition, with DO-languages, the focus of the verb is the result, while in BECOME-languages, the focus of the verb is the action as a whole.

For example, in the case of “persuade”, one could say 「説得したが駄目だった」 (“persuaded but it was no use”) in Japanese. But in English, “persuade” has the nuance of succeeding in changing someone’s mind; the focus is on the result. Therefore, phrases such as “persuaded but failed” are not idiomatic. One would rather say “tried to persuade but failed”.

In Arka, the word for “persuade” is soso, but this does not have the connotation of successfully changing someone’s mind. From this, Arka can be understood to place the focus of the verb on the action as a whole. That is, another characteristic of a BECOME-language can be found here.

In this way, Arka is created not only with synchronic linguistic tendencies in mind but also with diachronic circumstances.

In order to create an a priori conlang that does not contradict linguistics – that is, to say the least, is not unnaturalistic – one must combine both synchronic and diachronic study.

The tendencies brought out by features of DO- and BECOME-languages are indispensable for creating conlangs. However, when using these, it is absolutely necessary to think diachronically and think about the historical changes and relationships with other peoples.

Dual histories

On one hand, Arka is used in the conworld Kaldia; on the other hand, it was also used in real life by the creators. Therefore, Arka has two histories: one in fiction and one in reality. It is mainly the former that is mentioned in this manuscript. Let us touch on the latter briefly as a supplement.

Because Arka was originally a language formed when speakers of various natlangs were gathered, it had features of both DO- and BECOME-languages. However, as the days went by, the power relationship between DO and BECOME changed. In general, it was thought to be more BECOME-like during the first half of the 1990s and more DO-like during the second half. Until the first half of the 2000s, the tendency toward DO was considered strong, and it is thought that the BECOME components increased in number in the second half. As a result, in the same way as Arka in the conworld Kaldia, it became a language taking from both sides of the divide.

Sister article

Inquiry into the cognitive linguistics of Arka

(The bibliography is omitted here; it can be found in the original article.)

  1. The original sentences were 「彼は私にリンゴを与えた」, 「彼からリンゴをもらった」, and 「彼がリンゴをくれた」, in that order, in case you want to know. The translations of the first and third sentences are switched here because “he gave me an apple” is indeed the natural way to word the information in English.

  2. One can also say “It suddenly began to rain.” 

  3. There are some slides about this here (in Japanese but with examples) if you’re curious. Thanks to Luke from CDN for finding these. Also see this resource (luckily in English). 

  4. (in original) The ancestor of Arka is called Arbaren, and the ancestor of Arbaren was Lyudiaren; however, to avoid introducing even more technical terms, we refer to all of them as Arka.