A possible Homeland of the Indo-European Languages

And their Migrations according to the extended Separation Level Recovery Method (Separation Level Recovery under Two Distributions, SLR2D)


By Hans J.J.G. Holm

0. Most educated people have at least a rough idea of what 'Indo-European' (IE) languages are: the many languages spoken from the Northwest of Europe to the East of the Indian subcontinent (historically even as far as Xinjiang in the Northwest of China), which are united by their common inherited stock of lexemes (e.g. the numerals or pronouns) as well as by their grammar. For basic information, see any recent encyclopedia; the pertinent Wikipedia pages are substandard. Highly unreliable are pages by non-Indo-Europeanists, who are often unable to assess the special problems of historical linguistics, lexicostatistics, and prehistory, as addressed in Holm 2007b. Such authors can often be recognized by their citing only a few secondary sources, or even by racial nonsense.

1. What is still under discussion, however, is the prehistoric development of these languages, i.e. the stages of subgrouping. A main error in all these discussions was and still is the superficial view that a higher number of agreements automatically and proportionally means a closer relationship. This view fails to notice that the agreements also depend, e.g., upon how much of the original stock each language has retained, that is, upon the number of replacements after its separation (cf. Holm 2003). It should be easy to understand that languages with heavy losses (e.g. Albanian or Armenian), even when closely related, share fewer agreements than so-called big-corpus languages like Greek or Indo-Aryan, simply because of their smaller database. Regrettably, all this is overlooked or ignored most of the time.
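The effect can be illustrated with a small simulation. The following Python sketch is an illustration only, not the author's computation: the stock of 1000 features and the retention counts are invented numbers, and each language is assumed to retain a random subset of the common stock independently of the other.

```python
import random

def shared_agreements(stock_size, kept_a, kept_b, rng):
    """Count features retained by both languages when each keeps a
    random subset of a common stock (hypothetical toy model)."""
    stock = range(stock_size)
    residue_a = set(rng.sample(stock, kept_a))
    residue_b = set(rng.sample(stock, kept_b))
    return len(residue_a & residue_b)

rng = random.Random(42)  # fixed seed so the run is repeatable
STOCK = 1000             # invented size of the common stock at separation

# Same stock at separation in both cases, hence the same closeness,
# but very different amounts of surviving material.
big_pair = shared_agreements(STOCK, 800, 600, rng)    # two big-corpus languages
small_pair = shared_agreements(STOCK, 300, 200, rng)  # two heavy-loss languages

print(big_pair, small_pair)
```

Under these assumptions the pair with heavy losses shows far fewer agreements, although both pairs separated from exactly the same stock.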

1.1. In fact these parameters depend, in mathematical terms, hypergeometrically upon each other, and must be transformed. Only by this necessary SLRD transformation do we recover the original state (the number of features that must have been present in common at the time of separation), the so-called 'separation level'. These figures, for the 91 pairs between 14 attested branches of IE, have been published in Holm 2000.
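As a minimal sketch of such a transformation, assume the standard hypergeometric setting: of N features present at separation, language x retains k_x, language y retains k_y, and a of them are found in both. The maximum likelihood estimate of N is then the integer part of k_x·k_y/a. The function name, the symbols, and the numbers below are illustrative assumptions and are not taken from Holm 2000.

```python
def separation_level(k_x, k_y, agreements):
    """Maximum likelihood estimate of N in a hypergeometric model:
    N features at separation, k_x and k_y of them retained by the two
    languages, `agreements` retained by both. Symbols are assumptions."""
    if agreements <= 0:
        raise ValueError("at least one agreement is needed for an estimate")
    if agreements > min(k_x, k_y):
        raise ValueError("agreements cannot exceed either residue")
    return (k_x * k_y) // agreements

# Invented figures: the raw agreement counts differ by a factor of
# eight, yet both pairs are traced back to the same estimated level.
print(separation_level(800, 600, 480))  # two well-attested languages
print(separation_level(300, 200, 60))   # two languages with heavy losses
```

The point of the sketch is that the raw agreement count alone says nothing about the level: it has to be read together with the two residues.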

1.2. As the number of original features can only decrease in and by historical events, these levels result in an unambiguous sequence of separations (NOT 'glottochronology'), which can be visualized by a >family-tree, here with the representations of the IE words for 'hand'. This is, of course, a simplification. It can and should then be applied to the different hypotheses of a zone of origin ("staging area", "Urheimat") of the speakers of Proto-Indo-European, including the migrations that ended in their final establishment in concrete geographical areas.
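One simple way to read a sequence out of such levels is to sort the pairs by their estimated separation level, highest first, on the reasoning that a higher level belongs to an earlier split. The sketch below assumes exactly that reading. The branch labels A, B, C and the levels are placeholders, not results from Holm 2000, and a real subgrouping would also have to deal with scatter in the estimates.

```python
def separation_sequence(levels):
    """Order language pairs from earliest to latest separation.

    `levels` maps a pair of branch labels to its estimated separation
    level. Assumption: a higher level means an earlier split, because
    the stock of original features only shrinks over time."""
    return sorted(levels, key=levels.get, reverse=True)

# Placeholder data for three hypothetical branches.
levels = {
    ("A", "B"): 620,
    ("A", "C"): 910,
    ("B", "C"): 905,
}

for pair in separation_sequence(levels):
    print(pair, levels[pair])
```

In this toy case C shows nearly the same high level with A and with B, which would be read as C parting first from a still common ancestor of A and B.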

1.3. Linguistically proven contacts between the earliest stages of Indo-European and Uralic strongly suggest a homeland in the forest steppes north of the Black Sea ('Pontus', cf. e.g. Anthony 2007). See the map >IE diversification map. Note that the migration routes have not yet been convincingly proven.

2. Working backward from the safe ground of Hittite historical data, it seems clear that the IE expansion roughly parallels the adoption of bronze metallurgy, of draught oxen and wagons, and of mounded graves (burial mounds). That does not mean (!) that speakers of Indo-European invented these techniques and customs, only that they made extensive use of them. As herding nomads with a high proportion of horses, they had to be good riders, which in turn gave them tactical advantages in raids and warfare. Attempts to prove or disprove horseback riding by tooth wear due to bits overlook that there are dozens of bitless bridles. The migrations could equally have taken far fewer centuries, or have happened somewhat earlier or later.

3. Another point of disagreement and discussion is the question whether the so-called Anatolian languages, in particular Hittite,
- were full members of Proto-IE,
- or whether Proto-IE reached its complete development only after the separation of Hittite.
In general, it is a misunderstanding that methods from bioinformatics could speak in favor of one or the other hypothesis, for reasons addressed in [4]. Furthermore, cladistic researchers simply assume a priori that Hittite did not share the final development of IE, and use this language as a so-called 'outgroup' to define the starting point of their originally unrooted (!) graph, not vice versa!

4. The currently fashionable phylogeny reconstructions by mechanistic misuse of computer packages from the field of biological systematics rest on at least one of two erroneous beliefs:
4.1. The primitive similarity principle that languages are the more closely related the more cognates they share (erroneously regarded as 'evolutionary distance'), which completely ignores the interdependencies outlined in chapter [1] (the so-called 'Proportionality Trap', cf. Holm 2003), or even
4.2. That words in languages change like clocks, at rates 'by' time. This is obviously wrong, a rehash of the obsolete glottochronology: look up any word in an etymological dictionary and find the reason for its existence. It will never be 'time', but historical (e.g. cultural, technical, military) events, which nobody can ever foresee, i.e. compute. For example, English did not replace about 50% of its originally Germanic vocabulary 'by time' but, as educated speakers of English know, through Norman dominance after the Battle of Hastings, alongside a long-lasting educational background of Latin. That the amount of change in a so-called "basic vocabulary" is somewhat lower does not at all change the socio-historical reasons and causes, in particular their uncomputability. Even in the basic 100-word list of English, 6% are loans from Viking dialects, unnoticed by these 'experts' (cf. Holm 2007c). Journalists cannot be blamed for not understanding what is really going on in these computations. But it is regrettable that some scholars are not receptive to such basic knowledge of the causalities of language change.
--------



 5. References:
- Holm, Hans J. (2000): Genealogy of the Main Indo-European Branches Applying the Separation Base Method. In: Journal of Quantitative Linguistics 7-2:73-95.
- Application to the Indo-European Etymological Dictionary of Pokorny; updates see 2007a,b below - [ABSTRACT: In earlier quantitative analyses of genetic relations between languages the stochastic bias caused by replacements was not properly eliminated, which only could lead to wrong results. Only owing to the lexeme count of the huge, and thereby statistically significant, database of J. Pokorny, Indogermanisches etymologisches Wörterbuch (Bern: Francke, 1959) in N. Bird, Distribution of Indo-European root morphemes (Wiesbaden: Harrassowitz, 1982), it was possible, in spite of its known shortcomings, to compute the number of lexemes at the era of separation for any pair of languages, by means of a robust estimator. The results allow us to infer a sequence of separation. The customary black and white hypotheses, e.g. pro or contra an Italo-Celtic relationship, cannot do justice to the real developments and must give way to this more differentiated overall view].
- Holm, Hans J. & Embleton, Sheila (2001): Review of 'Mathematical foundations of Linguistics' (by Hubey, H.Mark, 1999, LINCOM handbooks in Linguistics 10, Muenchen: LINCOM); In: Journal of Quantitative Linguistics 8-2:149-62.
- Holm, Hans J. (2003): The proportionality trap, or: what is wrong with lexicostatistical subgrouping? In: Indogermanische Forschungen 108: 39-47.
- The basics, employing only the hypergeometric distribution; also for non-mathematicians - [ABSTRACT: With the help of an experiment it is shown that the raw amount of agreements (e.g. cognate numbers) between any two languages can never express their degree of genealogical relationship. It is then demonstrated, how, by taking into account all statistical determining parameters, the original level of any pair and further the correct subgroupings can be recovered].
- Holm, Hans J. (2005): Genealogische Verwandtschaft. Chapter 45 in 'QUANTITATIVE LINGUISTICS' [HSK-series, vol. 27], Berlin: de Gruyter.
- The 20th century lexicostatistical attempts in language subgrouping, updated 2008 below - [Contents: 1. When are languages "related"? 2. Data evaluation; 3. Measures of relationship; 3.1. Synchronic ~; 3.2. Diachronic measures of relationship; 4. Structuring genealogical dependencies.]
- Holm, Hans J. (2007a): Language Subgrouping. In: Grzybek, P. & R. Köhler (Editors), Exact Methods in the Study of Language and Text. Dedicated to Professor Gabriel Altmann on the occasion of his 75th birthday. [Quantitative Linguistics 62]. Berlin: de Gruyter: 225-235.
- Handling scatter in multiple subgroupings - [Abstract: After many years of testing, and facing many competing methods, the Separation Level Recovery method (Holm 2000, passim) has been refined in terms of its stochastic and linguistic data requirements. It has been tested on how stochastic scatter can be distinguished from bad data and how data should be improved.]
- Holm, Hans J. (2007b): The new Arboretum of Indo-European "Trees" - Can new Algorithms Reveal the Phylogeny and even Prehistory of IE? In: Journal of Quantitative Linguistics 14-2:167-214 -> Offprints available via http://cats.tfinforma.com/PTS/in?t=rl&m=237780. (For a glance at the MS, click >Arboretum IE trees.pdf.)
- update to 2005, newer lexicostatistical attempts in language subgrouping - [ABSTRACT: Specialization in the fields of linguistics vs. biological informatics leads to growing misunderstandings and false results caused by poor knowledge of the essential conditions of the applied respective methods and material. These are analyzed and the insights used to assess the recent glut of attempts in establishing new phylogenies of Indo-European languages.]
- Holm, Hans J. (2008): The Distribution of Data in Word Lists and its Impact on the Subgrouping of Languages. In: Christine Preisach, Hans Burkhardt, Lars Schmidt-Thieme, Reinhold Decker (eds.): Data Analysis, Machine Learning, and Applications. Proc. of the 31st Annual Conference of the German Classification Society (GfKl), University of Freiburg, March 7-9, 2007. Springer-Verlag, Heidelberg-Berlin: 629-636. For a glance at the raw MS click >SLRD.pdf; - Solving distribution problems in corpora of natural languages -> improved IE "Family Tree" - find the full presentation at holm-ie-subgrouping-by-slrd-freiburg-2007-21155000/;
[Abstract: Linguists tend to assume that languages are more closely related the more features, in particular common innovations, they share. Holm (2003) demonstrated that this assumption is erroneous, because these researchers miss the fact that the number of shared agreements depends stochastically upon three more parameters. Only with the help of the maximum likelihood estimator of the hypergeometric distribution are we able to find the number of features that must have been present in both languages at the era of their separation. In this way we obtain a chain of separation for a family of languages for which the appropriate data is available. When the method was applied to data of the Pokorny IEW, the resulting late separation of Hittite, Albanian and Armenian could well have been caused by their central position and therefore did not appear suspicious. Only when, in a further application to Mixe-Zoquean data, the same observation occurred, namely that poorly documented languages appeared to separate late, could a systematic bias be suspected. This work reveals the reason for this bias, which is peculiar to lists of natural languages, as opposed to stochastically normally distributed test cases like those presented in Holm 2007a. As a more modern and linguistically reliable database, the new "Lexikon der indogermanischen Verben", 2nd ed. (Rix et al. 2001), was the best choice. Indeed the suspicion was confirmed, and it is shown how these biased data can be correctly projected to true separation amounts. The result is a partly new chain of separation for the main Indo-European branches, which fits well with the grammatical facts as well as with the geographical distribution of these branches. In particular it clearly demonstrates that the Anatolian languages did not part first, and thereby refutes the Indo-Hittite hypothesis.]
- Holm, Hans J. (2011): "Swadesh lists" of Albanian Revisited and Consequences for Its Position in the Indo-European Languages. The Journal of Indo-European Studies 39-1&2. - English and updated version (note >Corrigenda).-
[In the last decade, several scholars claimed to have finally solved the subgrouping of Indo-European by new lexicostatistical attempts. The public of course was not able to perceive the questionable outcomes, of which the different and idiosyncratic positions of Albanian are particularly conspicuous. One reason for this is the inadequate methods, simply copied from bioinformatics (cf. Holm, H. J. 2007). That defective data may contribute a great deal to these mistakes, is now first demonstrated here by analysing the Albanian part of three representative lists frequently employed in these studies: Thirteen percent of the data on these lists contains errors and this mixes inextricably with the overlooked stochastic dispersion. Seventeen new etymologies are proposed; however, about thirty per-cent of the list remains unsolved or questionable. Moreover, the high amount of differently changing replacements in Albanian is one more compelling argument against the rate assumption in glottochronology.]

- Holm, Hans J., Review of: Frank Sirocko (Hg.) "Wetter, Klima, Menschheitsentwicklung, Von der Eiszeit bis ins 21. Jahrhundert". (In German, see German page) - Holm, Hans J. (2011): Archäoklimatologie des Holozäns: Ein durchgreifender Vergleich der "Wuchshomogenität" mit der Sonnenaktivität und anderen Klimaanzeigern ("Proxies"). Archäologisches Korrespondenzblatt 41-1:119-132. For the pdf, please click >Archäoklimatologie
[Abstract: Recent approaches upon the validity of both the homogeneity of tree-ring widths of Middle-European oaks as well as two proxies for the activity of the sun do not stand our thorough comparison. This holds in particular regarding their alleged climatic meaning, e.g. regarding precipitation. Better correspondences, on the other hand, seem to be recognizable for the last 9 000 years between the alpine tree lines, as well as temperature evidence of the NGRIP ice core.]
------------
