Where Albanian sits in the language family tree
Albanian (shqip) is one of the older and stranger branches of the Indo-European family. It is the only surviving member of its sub-branch — called Albanoid or, in older scholarship, Illyric — and it has no close living cousins. About 7 to 10 million people speak it worldwide. Roughly 5 to 6 million live in the Balkans; the rest are diaspora.
For Albanian Americans, this matters in two ways. First, the language is one of the few unbroken threads connecting a family in Detroit or the Bronx to a village in Korçë or Pejë. Second, Albanian is genuinely unusual: a language whose ancestors were already spoken in the Balkans when the Romans arrived, that absorbed Latin without becoming Latin, that came through 500 years of Ottoman rule with its grammar intact, and that holds clues to what the languages of Europe sounded like 2,500 years ago. It is worth understanding, even if you only ever speak it at the dinner table.
This explainer covers where Albanian fits in the Indo-European family, the Tosk-Gheg split, the endangered cousins in Italy and Greece, the alphabet and the 1908 Congress that fixed it, the parts of the grammar that surprise English speakers, the loanwords that record Albanian history layer by layer, and where to learn it today.

Where Albanian fits in Indo-European
The Indo-European family is enormous. Roughly half the world speaks one of its descendants. The big branches are familiar names: Romance (Italian, Spanish, French, Romanian), Germanic (English, German, Dutch, Swedish), Slavic (Russian, Polish, Serbian, Bulgarian), Indo-Iranian (Hindi, Persian, Bengali), Hellenic (Greek), Celtic (Irish, Welsh), Baltic (Lithuanian, Latvian), and Armenian. Albanian is one branch on that list, all by itself.
What makes Albanian unusual is that isolation. Greek and Armenian are also sole survivors of their branches, but Greek has 13 million speakers and Armenian has its own historical literature stretching back to the 5th century. Albanian has neither the speaker base nor a comparable ancient written record. That makes the surviving language a kind of living museum: it preserves words and grammatical patterns that other Indo-European languages lost long ago, and the only way to study them is through Albanian itself.
The current scholarly consensus places Albanian as a descendant of one of the Paleo-Balkan languages — most likely Illyrian, the language of the pre-Roman Adriatic coast, possibly with input from Thracian or Daco-Mysian further inland. The strongest evidence for the Illyrian connection is geographic continuity (Albanians have lived where Illyrians lived) and the close link to Messapic, an extinct language of southeastern Italy that early Illyrian-speaking migrants likely carried across the Adriatic. None of those ancestor languages are well attested. They are mostly known from place names, personal names, and short inscriptions. So the reconstruction has gaps, and linguists argue about the details.
What is not in dispute: Albanian is its own thing, it is old, and its core vocabulary preserves Indo-European roots that turn up nowhere else in the same form. That is why every introductory historical linguistics textbook eventually mentions Albanian, even when it does not have space for much else about it.
Tosk and Gheg: the two main dialects
Albanian has two major dialect groups, divided by a river. The Shkumbin, a 181-kilometer waterway in central Albania that the Romans called the Genusus, runs roughly east to west and cuts the country approximately in half. Tosk is spoken south of the Shkumbin. Gheg is spoken north of it.
The geographic split matches a population split. Tosk speakers live in southern Albania, parts of southern North Macedonia, the Arvanite communities of southern Greece, and the Arbëresh communities of southern Italy. Gheg speakers live in northern Albania, in Kosovo, in northwestern North Macedonia, and in Montenegro’s Albanian areas. Both groups speak Albanian, both consider themselves Albanian, and the dialects are mutually intelligible, especially in their standard forms.
The differences are real but not enormous. Three things stand out.
Vowels. Gheg keeps nasal vowels, which are pronounced partly through the nose, the way French keeps them in bon and vin. Tosk has lost the nasal quality. Gheg also keeps a distinction between long and short vowels that Tosk has lost.
Rotacism. This is the famous one. In Tosk, an older “n” between vowels became “r.” So the word for “sand” is rëra in Tosk and rana in Gheg. The word for “Albanian” itself shows it: the older form is Arbën (preserved in Gheg and in Arbëresh), while the Tosk and standard form is Arbër. Once you notice rotacism, you see it everywhere.
Lexicon and syntax. A few common words differ — Gheg uses kam me for the future tense (“I have to”) while Tosk and the standard use do të. There are slight syntactic divergences, mostly in the verb system.
The 1972 Orthographic Conference in Tirana picked Tosk as the basis for the literary standard. The political context matters: that conference happened under the Hoxha communist government, which was based in Tirana but drew much of its early support from the Tosk south. Many Gheg-speaking Kosovars and northerners felt the standardization had erased part of their dialect from public life. The friction has eased over the decades, and most Albanian writers and broadcasters now use the standard while preserving Gheg in literature, song, and family speech. We avoid taking sides on the debate; both dialects are Albanian, both are valid, and the standard exists because every modern language needs one.
The endangered diaspora variants
Three older offshoots of Albanian live outside the Balkans, all of them descended from medieval migrations and all of them now small and threatened.
Arbëresh (also written Arbërisht) is spoken in about 50 villages in southern Italy and Sicily, by descendants of Albanian Christians who fled Ottoman expansion, mostly from the 15th century onward. Roughly 100,000 people speak it. It is an archaic Tosk dialect, closer to medieval Albanian than any modern variety, preserved by isolation. Arbëresh has absorbed substantial Italian and Sicilian vocabulary, but the grammar remains recognizably Albanian. UNESCO classifies it as definitely endangered.
Arvanitika is spoken by the Arvanites of southern Greece — Attica, Boeotia, and a few islands — by descendants of settlers from roughly the same era as the Arbëresh. Fewer than 50,000 people still speak it, almost all of them elderly. Greek nationalist policy in the 19th and 20th centuries pushed Arvanitika out of public life, and Arvanite communities largely shifted to Greek monolingualism. UNESCO lists it as severely endangered. Without active intervention it will not survive another two generations.
Cham Albanian is the dialect of the Çamëria region of northwestern Greece (Greek: Thesprotia). Cham speakers were ethnically cleansed from Greece between 1944 and 1945 and resettled in Albania, where their dialect survives among descendants of the displaced. A small population still speaks it inside Greece. Cham preserves features of southern medieval Albanian and is closely related to Arvanitika.
These variants are not just curiosities. They are the linguistic record of when and how Albanians moved across the Mediterranean, and what the language sounded like before the modern standard was set.
The alphabet and writing
Before 1908, Albanian was a language without an agreed-upon alphabet. Writing it was a religious and regional decision more than a linguistic one. Catholic priests in the north used the Latin alphabet. Orthodox writers in the south used the Greek alphabet. Muslim writers used a modified Arabic script. And several local intellectuals — frustrated with all of those options — invented their own scripts entirely.
The most ambitious of the home-grown alphabets is the Elbasan script, which appears in an 18th-century manuscript called the Anonimi i Elbasanit (the Anonymous of Elbasan). It has 40 characters, none of them borrowed from Latin or Greek, and was probably designed by a single literate Orthodox Albanian who wanted Albanian Christianity to have its own letters. The Vithkuqi alphabet, devised in 1844 by the merchant Naum Veqilharxhi, was another attempt — 33 letters, also original. There were others: Todhri, Veso Bey, the Beratinus codices. One scholarly count puts the number of distinct Albanian alphabets in serious use over the centuries at ten.
The fragmentation made Albanian publishing nearly impossible. A book printed in Greek script could not be read by a Catholic from Shkodër; a book printed in Arabic script alienated Christians; a Latin-script book offended Ottoman religious authorities. Throughout the 19th-century Rilindja (national awakening), Albanian writers debated alphabet choice as fiercely as any political question.
The matter was settled at the Congress of Manastir (in modern Bitola, North Macedonia), which ran November 14 to 22, 1908. A commission of eleven delegates, drawn from the representatives of Albanian communities across the Ottoman Empire and the diaspora, agreed on a single Latin-based alphabet of 36 letters. The decision was not unanimous in spirit (some Orthodox delegates preferred Greek script, some Muslim delegates preferred Arabic), but the practical pressure for a unified writing system carried the day. The 1908 alphabet is the one Albanian schoolchildren still learn from the abetare (the primer; the same word now used as shorthand for the alphabet itself).
The 36 letters are: a, b, c, ç, d, dh, e, ë, f, g, gj, h, i, j, k, l, ll, m, n, nj, o, p, q, r, rr, s, sh, t, th, u, v, x, xh, y, z, zh. Two have diacritics: ë (a soft schwa, like the e in English “the”) and ç (the “ch” in “church”). Nine are digraphs that count as single letters: dh (the “th” in “this”), gj (a soft palatal “g”), ll (a thicker, darker “l”), nj (the “ny” in “canyon”), rr (a rolled R, distinct from a tapped r), sh (English “sh”), th (the “th” in “thin”), xh (the “j” in “judge”), and zh (the “s” in “measure”). There is no W.
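Because the digraphs count as single letters, splitting an Albanian word into letters, or putting words in alphabetical order, is not the same as working character by character. The following is a minimal Python sketch of that idea, not an official collation rule. The names split_letters and sort_key are made up for illustration, and the code assumes that every occurrence of a digraph sequence (such as n followed by j) is meant as the single letter, which may not hold in every word.

```python
import unicodedata

# The 36 letters in the order listed above; digraphs count as single letters.
ALPHABET = [
    "a", "b", "c", "ç", "d", "dh", "e", "ë", "f", "g", "gj", "h",
    "i", "j", "k", "l", "ll", "m", "n", "nj", "o", "p", "q", "r",
    "rr", "s", "sh", "t", "th", "u", "v", "x", "xh", "y", "z", "zh",
]
DIGRAPHS = {letter for letter in ALPHABET if len(letter) == 2}
RANK = {letter: position for position, letter in enumerate(ALPHABET)}


def split_letters(word):
    """Split a word into Albanian letters, greedily preferring digraphs.

    Assumption (hypothetical simplification): any two characters that
    form a digraph are treated as that one letter.
    """
    # NFC makes sure ë and ç are single code points before matching.
    word = unicodedata.normalize("NFC", word).lower()
    letters = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            letters.append(pair)
            i += 2
        else:
            letters.append(word[i])
            i += 1
    return letters


def sort_key(word):
    """Sort key following alphabet order; unknown characters sort last."""
    return [RANK.get(letter, len(ALPHABET)) for letter in split_letters(word)]


print(split_letters("shqip"))                      # ['sh', 'q', 'i', 'p']
print(sorted(["dhomë", "dorë"], key=sort_key))     # ['dorë', 'dhomë']
```

A plain character-by-character sort would put dhomë ("room") before dorë ("hand") because h comes before o. With the key above, dorë comes first, because d precedes dh in the 36-letter order.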
A second decisive moment was the 1972 Orthographic Conference in Tirana, which standardized the literary form of the language across both dialects and tightened the spelling rules that publishers, schools, and broadcasters use today. As noted above, the standard is Tosk-based, and that choice has been politically debated. Linguistically, what the 1972 conference did was give Albanian a single written register for the first time in its history.
Grammar, briefly, for the curious
A few features of Albanian grammar surprise English speakers and reward attention.
The definite article is suffixed to the noun. Where English puts “the” in front (the book), Albanian glues it to the end. Libër is “book.” Libri is “the book.” Libra is “books.” Librat is “the books.” This is one of the features that links Albanian to a wider Balkan Sprachbund: Romanian and Bulgarian also suffix their definite articles, most likely as a result of long contact with early Albanian (or with whatever Paleo-Balkan substrate Albanian descends from).
Five grammatical cases. Albanian nouns inflect for nominative, accusative, genitive, dative, and ablative; a vocative survives only marginally and is not usually counted as a separate case. Each case has different endings depending on gender and definiteness. This is more cases than German (four) but fewer than Russian (six in practice, more if you count vestiges) or Latin (six). After a few months of practice, learners stop thinking about it.
Two genders, masculine and feminine. Most masculine nouns end in a consonant or in -i; most feminine nouns end in -ë or -a. There are exceptions, and a small set of “ambigender” nouns that change gender between singular and plural — a quirk Albanian shares with Italian.
The admirative mood. This is the showpiece. Albanian verbs have a special set of forms used to express surprise, disbelief, hearsay, or reported information you are not vouching for personally. English handles this with adverbs and tone: “apparently,” “supposedly,” “really?” Albanian conjugates it directly into the verb. Ai punon means “He works.” Ai paska punuar means something like “Wait, he actually worked?” or “He’s said to have worked.” Grammatical marking of this kind is rare among European languages outside the Balkans, where Bulgarian, Macedonian, and Turkish have comparable forms.
Word order is mostly SVO (Subject-Verb-Object), like English, but more flexible. Because case endings carry the grammatical work, Albanian can move words around for emphasis without confusing the listener.
For learners, the takeaway is that Albanian is consistent. The patterns are not random; once you internalize them, the language stops feeling alien and starts feeling rhythmic.
Loanwords as a window into Albanian history
One of the most useful things about Albanian, for historians as much as for linguists, is that the language records its own contact history. Each major period of foreign influence left a layer of loanwords, and the layers are still visible.
Latin (~600 words). The Roman period left a deep imprint. Albanian has mik “friend” from Latin amicus, qytet “city” from civitas, mbret “king” from imperator, kalë “horse” from caballus, peshk “fish” from piscis. The pattern of which Latin words got borrowed and which did not gives clues to how Romans and proto-Albanians actually interacted — agriculture, urban administration, the military, and the church show heavy borrowing; the most basic everyday vocabulary mostly stayed Albanian.
Slavic (~500 words). From the 6th and 7th centuries on, Slavic-speaking populations settled across the Balkans and Albanian absorbed common nouns from them: pop “priest,” gozhdë “nail,” trup “body,” bisedë “conversation.” Place names across modern Albania record the same contact.
Ottoman Turkish (~3,000 words). Ottoman Turkish, absorbed over roughly 500 years of Ottoman rule, is the largest single source of Albanian loanwords. Most are nouns of daily life, administration, and trade: xhep “pocket,” çaj “tea,” dyqan “shop,” çorape “socks,” jastëk “pillow,” sahat “clock,” kafe “coffee.” A 20th-century purification movement scrubbed many Turkish loanwords out of formal Albanian, replacing them with Latin- or Greek-derived neologisms, but the everyday register kept most of them.
Modern Italian, Greek, and English. Italian contributed heavily through Venetian Adriatic trade and again through 20th-century cinema and broadcast: makinë “machine, car,” frigorifer “refrigerator.” Greek contributed older borrowings (krevat “bed”) and many modern technical terms. English now contributes the technological and pop-culture vocabulary you would expect: kompjuter, internet, email.
What survives as native Albanian, inherited from Indo-European rather than borrowed from a neighbor, is the deepest core: numbers (një, dy, tre), most body parts (sy “eye,” dorë “hand,” zemër “heart”), basic kinship (nënë “mother,” babë “father,” vëlla “brother,” motër “sister”), and words for weather, terrain, and basic actions. Linguists treat that core as the load-bearing evidence that Albanian descends directly from a Paleo-Balkan ancestor and is not a creole or a heavily reshaped daughter language.
Speaking, learning, and using Albanian today
For diaspora families, the question is rarely “should we learn Albanian” — it is “how do we keep it alive in our kids.”
Apps: Drops has an Albanian course (vocabulary-focused). Dinolingo targets young children with Albanian as one of its supported languages. Mango Languages offers Albanian through many U.S. public library systems — free with a library card.
Online courses: msoshqip.com and learnalbanian.com are community-run, with structured lessons and grammar references. Both are free or low-cost. Italki and Preply let you book one-on-one sessions with Albanian teachers — useful for adults who want conversation practice.
University programs: UCLA teaches Albanian as part of its Center for European and Russian Studies. The University of Chicago offers Albanian periodically through its Slavic department. Mercy College in New York has had Albanian-language coursework supported by the local diaspora. Other programs come and go depending on enrollment.
Saturday schools: Across the U.S., Albanian-American churches, mosques, and community organizations run weekend shkolla shqipe programs for kids. These exist in New York, Detroit, Boston, Chicago, Worcester, Philadelphia, Houston, and dozens of smaller community centers. They are usually free or pay-what-you-can. The programs vary in quality; the best of them are the most reliable way for U.S.-born Albanian American kids to grow up genuinely bilingual.
Media: Reading and listening matter. Albanian-language news (RTSH, RTK, Top Channel), music (Albanian rap, traditional polyphonic singing, festivals like Festivali i Këngës), and social media (Albanian YouTube, TikTok, podcasts) are all freely available. For a heritage learner, twenty minutes of Albanian audio a day will do more for fluency than any app.
Why this matters to the registry
Language is the most fragile piece of diaspora identity. A grandparent who came over speaking Gheg, a parent who grew up code-switching, a child who hears Albanian at home but answers in English — that is how a language exits a family in three generations. It happens quietly, and it is rarely undone.
The point of counting Albanian Americans is partly to make a community visible to itself. People who know how many of them there are, where they live, and what they are doing tend to invest in the things — Saturday schools, language programs, community centers, Albanian-language media — that keep the language alive for the next generation. Communities that stay invisible to themselves let those things go.
If your family speaks shqip, your kids speak shqip, or you wish they did — get counted. It is free, it takes about a minute, and it tells the next funder, the next school organizer, and the next community leader that you are here.