National Albanian Registry United States of America
13 min read

Paleo-Balkan Peoples and the Origins of Albanians

The question of where Albanians come from, asked often in the diaspora, has no simple answer. Scholars trace lines back to the Illyrians, Thracians, and other pre-Roman peoples, but the evidence is suggestive, not conclusive.

By Enri Zhulati

National Albanian Registry · 501(c)(3) editorial desk

In this article
  1. Who the Paleo-Balkan peoples were
  2. The Illyrians and their territory
  3. The Illyrian-Albanian continuity hypothesis
  4. What linguistic substrate evidence shows
  5. Competing theories: Daco-Thracian and late-Romance
  6. Archaeology and the gap in the historical record
  7. Genetic evidence and what it can and can’t tell us
  8. Why this matters to Albanian Americans
  9. How NAR fits into the conversation

Most Albanian American families inherit one sentence about the deep past. We come from the Illyrians. It gets repeated at weddings, in Sunday schools, in the captions under flag images on Facebook. It is also a sentence that scholars treat with more caution than the average diaspora dinner-table conversation.

The honest version is longer. The peoples who lived in the Balkans before Rome and before the Slavic migrations are called the Paleo-Balkan peoples: Illyrians, Thracians, Dacians, Paeonians, Daco-Mysians, and populations that ancient Greek writers placed under names like Pelasgians. Some of them left fragments of language. Most left almost nothing written. Modern Albanian is the only living Indo-European language whose ancestor was almost certainly spoken among them. Tying that ancestor to a specific Paleo-Balkan group, with confidence, is harder than the popular story suggests.

This article walks through what is actually known and what remains contested. The Illyrian-Albanian continuity hypothesis. The Daco-Thracian alternative. The late-Romance theory. The substrate evidence buried in Albanian vocabulary. The archaeological gap between the last Illyrian kingdoms and the first medieval mention of Albanians. And, finally, why this matters for a US-based community trying to understand its own identity in 2026.

The framing throughout is honest about uncertainty. Scholarly consensus on Albanian origins is partial, not closed. Where it leans, it leans toward continuity in the western Balkans. Where it doesn’t, the gaps are real and worth naming.

Who the Paleo-Balkan peoples were

The Paleo-Balkan peoples are the Indo-European populations who lived in the Balkan peninsula from roughly the second millennium BCE through the Roman conquest, and in some cases into the early medieval period. According to the Wikipedia overview of Paleo-Balkan languages, the grouping is geographic and chronological rather than a single linguistic family with a clean tree.

The major groups, west to east, are these. Illyrians in the western Balkans, along the Adriatic from modern Croatia down through Albania. Thracians in the eastern Balkans, in modern Bulgaria and European Turkey. Dacians north of the Danube in modern Romania. Paeonians in the central Balkans in what is today North Macedonia. Daco-Mysians as a sometimes-separate, sometimes-overlapping group between Dacian and Thracian areas. Phrygians, related but mostly in Anatolia after their migration from the Balkans. And the Pelasgians — a label ancient Greek writers like Herodotus used loosely for pre-Greek populations, more an ethnographic placeholder than a precisely identified group.

The languages spoken across these populations were Indo-European but belonged to separate branches. Illyrian, Thracian, Dacian, Phrygian, and Messapic (spoken by Illyrian migrants in southern Italy) are each treated as their own branch or sub-branch in most classifications. None left enough text for a full grammar. What survives is mostly proper names, glosses in Greek and Roman authors, and short inscriptions.

The result is a frustrating kind of evidence. Enough to know these peoples existed and spoke Indo-European languages. Not quite enough to settle every question about how they related to each other, or to modern populations.

The Illyrians and their territory

The Illyrians occupy the largest chunk of the Albanian-American imagination, and they are also the best documented of the Paleo-Balkan peoples relevant to Albanian origins. According to the Wikipedia article on the Illyrians, they were a collection of related tribes living along the eastern Adriatic coast and inland, from the Istrian peninsula in the north down to what is now central Albania in the south.

Major tribes included the Dalmatae, the Liburnians, the Iapydes, the Ardiaei, the Taulantii, the Dassaretae, and the Enchelei. By the third century BCE, an Illyrian kingdom centered at Shkodra had emerged powerful enough to alarm Rome. Queen Teuta’s wars against the Romans in the 220s BCE opened a long Roman campaign that ended with the defeat of King Gentius in 168 BCE and the absorption of Illyrian territory into the Roman province of Illyricum, later subdivided into Dalmatia and Pannonia.

What did Illyrians speak? Almost nothing in extended text survives. What does survive is hundreds of personal and place names from inscriptions and Roman authors. Modern linguists have grouped these names into northern, central, and southeastern clusters, which has fueled debate over whether Illyrian was one language or several.

Archaeologically, Illyrian material culture is well attested. Hill forts in the Albanian highlands, tumulus burials with grave goods, characteristic bronze fibulae, the city walls of Byllis and Lissus and Antigoneia. The continuity of habitation in some of these sites from Illyrian into late antique and medieval layers is one of the threads scholars pull on when they argue for population continuity in the region.

The Illyrian-Albanian continuity hypothesis

The leading scholarly account of Albanian origins, in most reference works, is what gets called the Illyrian-Albanian continuity hypothesis. According to the Wikipedia overview of the origin of the Albanians, this view holds that Albanian descends from a Paleo-Balkan language spoken in the western Balkans, most plausibly Illyrian or a closely related variety, and that Albanian-speaking populations have lived in or near their current territory continuously since antiquity.

The continuity case rests on a small number of strong points. The geographic argument is the simplest one. Albanian today is spoken in a region that overlaps significantly with ancient Illyrian territory. No historical record describes a mass migration of Albanian speakers into the area from elsewhere. Arbanon, the region where Albanians first appear under their own name in eleventh-century sources and where a medieval principality later took shape, sits inside the old Illyrian zone.

The linguistic argument is more technical. Albanian preserves an inherited layer of Indo-European vocabulary that handles the kind of words people don’t borrow — basic kinship, body parts, numbers, geographic features. Some of this vocabulary fits patterns that look at home in the western Balkans rather than the eastern ones. Scholars including the late American linguist Eric Hamp, one of the most cited modern authorities on Albanian’s place in Indo-European, argued for an early divergence of Albanian from neighboring branches and a long, isolated development in the western Balkans.

The hypothesis is the leading one, but it is not proven. Critics point out that the gap between attested Illyrian names and attested Albanian language is several centuries wide, that some Illyrian onomastic patterns don’t match Albanian neatly, and that the archaeological record alone cannot identify the language a population spoke. Continuity is possible. The case for it is suggestive. The case is not closed.

What linguistic substrate evidence shows

Substrate evidence is the most interesting piece of the puzzle, because it is the kind of evidence that can survive even when written records don’t. According to the Wikipedia article on the Albanian language, Albanian is the sole modern member of its own branch of Indo-European, often called the Albanoid branch. Inside that branch, certain patterns suggest a long pre-history in or near the western Balkans.

A few features carry weight. Albanian has a layer of inherited words for local flora and fauna and landscape that look ancient — words for mountains, rivers, native trees. It has Indo-European cognates that show sound changes consistent with an early divergence from neighboring branches. And it has a pattern of Latin loanwords that, importantly, follow western Romance phonological patterns, not eastern ones. The Albanian word mbret (king), from Latin imperator, and a substantial body of related vocabulary entered the language in a specific period and from a specific contact zone.

What about Slavic loanwords? Albanian has a meaningful Slavic layer, but the layer is structured. Slavic words tend to be later, often medieval, and concentrated in domains like agriculture, certain crafts, and church organization. Earlier core vocabulary is mostly inherited or from older Latin and Greek contact. The Wikipedia overview of the Albanian language notes that this layering is itself an argument for continuity, because it suggests an Albanian-speaking population already in place when Slavic speakers arrived in the sixth and seventh centuries.

The substrate evidence is the strongest single piece of the continuity case. It does not, on its own, identify which specific Paleo-Balkan group Albanian descends from. It does suggest that whatever the answer is, the answer involves long, deep, mostly local development.

Competing theories: Daco-Thracian and late-Romance

The continuity case has serious competition. Two alternative models deserve direct treatment.

The Daco-Thracian theory argues that Albanian descends not from an Illyrian or Illyrian-adjacent variety but from a language closer to ancient Thracian, or to the Daco-Mysian group north and east of the main Illyrian zone. Proponents (the German-speaking scholars Gottfried Schramm and Hermann Ölberg are sometimes cited in this tradition) point to certain shared sound changes between Albanian and Romanian’s Thracian substrate, to the cluster of Albanian-Romanian shared vocabulary that doesn’t come from Latin, and to the geographic plausibility of a population shifting south and west during late antique upheavals.

The Daco-Thracian model has real evidence behind it. It also has problems. Thracian itself is even more poorly attested than Illyrian, which makes “closer to Thracian” hard to test rigorously. The shared Albanian-Romanian vocabulary can be explained by a common Balkan substrate without requiring a single ancestral language. And the geographic argument requires a migration that leaves no clear historical trace, the same absence of evidence that the continuity case counts in its own favor.

The late-Romance theory, associated with some twentieth-century scholarship and revived in different forms more recently, treats Albanian as something closer to a Balkan Romance language whose Romance layer was later overlaid or partly displaced. This view is harder to defend in current scholarship. Albanian’s inherited core vocabulary is genuinely Indo-European but not Romance, and the structure of the language does not pattern with the Romance family the way Romanian or Dalmatian does. The late-Romance theory has fewer active defenders today, but it appears in the literature often enough that any honest survey should mention it.

A fair summary: continuity is the leading view, Daco-Thracian is the strongest live alternative, late-Romance is a minority position. Each has serious scholars on its side. Each has gaps the others exploit.

Archaeology and the gap in the historical record

Archaeology is supposed to be the tiebreaker, but it isn’t. The gap in the historical record between the disappearance of the Illyrians from the sources and the first attested Albanians runs roughly from the third or fourth century CE to the eleventh. That is seven to eight centuries during which the western Balkans were repeatedly disturbed by Gothic, Avar, Slavic, and Bulgar incursions, by the rise of Byzantine themes, and by the contraction and expansion of Roman and post-Roman authority.

What the archaeology shows is mostly continuity of some sites and disruption of others. Hill forts in northern and central Albania show occupation across this period in many cases. Coastal cities, especially Durrës (ancient Dyrrhachium), persisted as urban centers. Burial customs evolve rather than break completely. Place names in the region preserve pre-Slavic forms in many areas. None of this proves linguistic continuity. None of it disproves it either.

The Wikipedia article on the origin of the Albanians treats the historical-record gap as the central evidentiary problem. Albanians enter the record under their own ethnic name in 1078 CE, in the writings of the Byzantine historian Michael Attaleiates, and later in references to the medieval Principality of Arbanon. By that point, the language is already a developed Indo-European language with clear substrate evidence of long local presence. The work of getting from the third century to the eleventh is reconstruction, not direct documentation.

That gap is what keeps the question open. It is also what makes scholarly honesty important. Anyone who tells you the answer is settled is overstating what the sources say.

Genetic evidence and what it can and can’t tell us

Population genetics has entered this debate over the last fifteen years. Studies of modern Albanian and neighboring populations show patterns consistent with long, deep ancestry in the western Balkans, with the kind of demographic continuity from antiquity to the present that you would expect under the continuity hypothesis. They also show the kind of mixing — with Slavic, Greek, and other neighboring populations — that you would expect from any people living in the Balkans for two thousand years.

Genetic evidence has real limits in this kind of question. Genes are not languages. A population can adopt a new language without genetic replacement, and a population can keep its language across major demographic shifts. The interesting genetic finding for Albanian origins is not a smoking gun for Illyrian descent. It is the absence of a recent, large migration event that would be needed for the late-Romance or major-displacement scenarios.

The honest reading is that genetic data is consistent with the continuity hypothesis but does not prove it. It also weakens the alternatives without ruling them out. As with archaeology, genetics is one input into a multi-disciplinary puzzle.

Why this matters to Albanian Americans

The Paleo-Balkan question is not academic trivia for the diaspora. It is woven into how Albanian Americans answer the question every immigrant family eventually faces from a curious neighbor or a school-age child. Where do you come from? The shorthand answer is Albania. The longer answer is that the people now called shqiptar (the Albanian self-name) are part of a story that runs deeper than any modern map.

That story matters in three concrete ways for a US-based community.

First, it matters for cultural confidence. A community that understands its history is better equipped to teach it. Children who learn that the Albanian language preserves an inherited layer of Indo-European vocabulary, that the homeland sits inside a region documented for two and a half millennia, and that scholarly debate is genuine and ongoing, end up with a richer relationship to their identity than children who inherit only slogans.

Second, it matters for honesty. The diaspora is poorly served by overclaiming. Saying “we are the Illyrians” as a closed historical fact sets up children for an uncomfortable encounter with the actual evidence later. Saying “the leading scholarly account ties us to the Illyrians, and the case is strong though not proven” is both more accurate and more durable.

Third, it matters for solidarity across the Albanian-speaking world. The Paleo-Balkan question doesn’t stop at modern borders. The same scholarly debates apply to Albanians in Albania, in Kosovo, in North Macedonia, in Montenegro, to the Italian Arbëresh villages and the Greek Arvanites, and to the families now in New York, Michigan, and Massachusetts. A shared deep history is one of the threads that holds a global community together.

How NAR fits into the conversation

The National Albanian Registry is not a history department. It is a community-led count of Albanian Americans, run as a 501(c)(3) nonprofit, with the goal of producing a real number for a population the US Census has historically undercounted. Roughly 224,000 people self-reported Albanian ancestry in the 2024 American Community Survey. Community estimates put the actual figure closer to one million when second- and third-generation Americans are included.

What does that have to do with the Paleo-Balkan peoples? Two things.

The first is that an accurate count is a way of saying the community continues. The story that runs from the Illyrians and their neighbors through Skanderbeg and Rilindja Kombëtare (the National Renaissance) and the post-1991 diaspora is incomplete without a present-day chapter. The registry is one way to write that chapter clearly.

The second is that a community that takes its history seriously also takes its present seriously. NAR’s count is built with the same disciplined honesty that good Paleo-Balkan scholarship requires. Real numbers, named methodology, gaps acknowledged where they exist. The certificate NAR issues to each registrant is a recognition document — not government identification, not citizenship, not legally binding — but it is a record that the person stood up and was counted.

If the long story matters to you, the short act is simple. Get counted. Each registration is a thread in a record the community owns.

FAQ

Common questions

Who were the Paleo-Balkan peoples?

The Paleo-Balkan peoples were the Indo-European populations of the Balkan peninsula before Roman conquest and before the Slavic migrations. They include the Illyrians of the western Balkans, the Thracians of the east, the Dacians north of the Danube, the Paeonians, the Daco-Mysians, and groups ancient Greek writers called Pelasgians. Their languages, often grouped as Paleo-Balkan, are mostly extinct.

Are Albanians descended from the Illyrians?

Many scholars treat the Illyrian-Albanian continuity hypothesis as the leading account, but it is not proven. The strongest evidence is geographic and linguistic continuity in the western Balkans. Competing models propose Daco-Thracian or mixed origins. The Wikipedia overview of the origin of the Albanians describes the question as open, with consensus partial rather than settled.

What is the Daco-Thracian theory?

The Daco-Thracian theory argues that Albanian descends from a language closer to ancient Thracian or Daco-Mysian rather than Illyrian. Proponents point to shared sound changes and certain place-name patterns. Critics counter that Thracian itself is poorly attested and that the geographic argument for an Illyrian or Illyrian-adjacent origin is at least as strong.

Why does this debate matter for Albanian Americans?

Origin stories shape how a community sees itself. For diaspora families teaching children about heritage, the question of whether Albanians are Illyrian, Thracian, or something in between is not academic trivia. It is part of why the language exists, why the homeland sits where it does, and why a community-led count in the United States is worth doing carefully.

What evidence supports continuity in the western Balkans?

The continuity case rests on three pillars. First, Albanian preserves an old layer of inherited Indo-European vocabulary tied to local geography. Second, Latin loanwords entered Albanian early and follow western, not eastern, Romance patterns. Third, no record exists of a large Albanian-speaking population arriving from elsewhere. Each pillar is suggestive rather than decisive.

When were Albanians first mentioned by name?

Albanians enter the historical record under their own ethnic name in 1078 CE, in the writings of Byzantine historian Michael Attaleiates, and again in later Byzantine and Latin sources describing the medieval Principality of Arbanon. The gap between the disappearance of the Illyrians from sources and the appearance of Albanians is roughly seven to eight centuries.

What does NAR's count have to do with ancient history?

The National Albanian Registry counts Albanian Americans today, not ancient peoples. But the count is a thread in a long story. Each registration is a small act of saying the community continues. Ancient history is the deep context. The registry is the present-day record. Both matter, in different ways, to how the diaspora understands itself.
