Introduction to the Soviet census data presented here


Here you will find an electronic version of the most important parts of the census data on ethnicity and language proficiency from 1939, 1959, 1970, 1979 and 1989. For the data from 1970, 1979, 1989, you will find the columns that are explained in the table below. In the file 1979 according to republic, you will find the data for some Autonomous Republics and Areas in the North-Western part of the Soviet Union. The original listed all areas, of course, but I only wrote down the areas directly relevant to my own research (I invite others to carry on the work!). In the file 1989 data for the Peoples of the Northern Areas you will find the data only for these 26 peoples. The last file, covering 1939-89, is explained below.

Since the first release, the following errors have been detected and corrected:

980310 (thanks to John Clifton):
The assimilation percentages for Dungan (dungansk) for 1989 were incorrect (data from another language had crept in instead); they are now corrected.
NMmax and NMmin were mixed up in the Silver formula below, and an erroneous parenthesis was removed.
The formula for ABmax had R2 (correctly: R1) as its logical maximum.

Språk
Name of language (in Norwegian)
gr.
Genetic affiliation of the language
område
Main administrative area where the language is spoken
# etn. ´89
Number of individuals that defined themselves as belonging to the ethnic group in question
#mom=titspr
Number of individuals that defined themselves as speaking the language of their ethnic group as their mother tongue
#mom=russ
Number of individuals that defined themselves as speaking Russian as their mother tongue
#mom=3dje
Number of individuals that defined themselves as speaking some language other than Russian or their titular language as their mother tongue
#2spr=titspr
Number of individuals that defined themselves as speaking the language of their ethnic group as their second language
#2spr=russ
Number of individuals that defined themselves as speaking Russian as their second language
#2spr=3dje
Number of individuals that defined themselves as speaking some language other than Russian or their titular language as their second language



The following 8 columns show what percentage of the minority population is unassimilated, partly assimilated or totally assimilated into the majority population. The columns refer to the so-called Silver formula (Silver 19xx), developed for estimating the degree of assimilation on the basis of Soviet census data (the terms and the formula are explained below):

NMmax

Maximal number of Native monolinguals

NMmin

Minimal number of Native monolinguals

UBmax

Maximal number of Unassimilated bilinguals

UBmin

Minimal number of Unassimilated bilinguals

ABmax

Maximal number of Assimilated bilinguals

ABmin

Minimal number of Assimilated bilinguals

AMmax

Maximal number of Assimilated monolinguals

AMmin

Minimal number of Assimilated monolinguals

Explanation of the Silver formula


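For a thorough presentation of the formula, see Silver's original exposition, or Lallukka's pedagogical exposition. The formula itself is simple, and this explanation should be enough to grasp the point.
N, R and T stand for "Native (to the ethnic group)", "Russian" and "Third", respectively. The digits 1 and 2 mark first language (mother tongue) and second language, so that N1 is the number of people with the native language as their mother tongue, R2 the number with Russian as their second language, and T1 the number with a third language as their mother tongue.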
Native monolinguals (NM) know only the language of their group.
Unassimilated bilinguals (UB) have the language of their group as their mother tongue, but know the majority language as well (in most cases this language is of course Russian).
Assimilated bilinguals (AB) have the majority language as their mother tongue, but speak the language of their ethnic group as a second language. In language shift processes, this group is typically very small.
Assimilated monolinguals (AM) are those who have reported themselves as belonging to a minority group without speaking the language of that group.
Ignoring people knowing any language other than the language of the ethnic group or Russian, the formulas for estimating these four groups are very simple. Taking only people belonging to the ethnic group in question, we get the following formulas:

NM=N1-R2

The ones with the native language as their first language minus the ones with Russian as their second language

UB=R2

The ones with Russian as their second language

AB=N2

The ones with the native language as their second language

AM=R1-N2

The ones with Russian as their first language minus the ones with the native language as their second language
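
The four simplified formulas above can be written out as a small Python sketch. The function name and the input figures are invented for illustration and do not come from the census files; the sketch only restates the arithmetic given above, ignoring third languages.

```python
def silver_simple(n1, r1, n2, r2):
    """Estimate the four groups while ignoring third languages.

    n1, r1: members with the native language / Russian as mother tongue
    n2, r2: members with the native language / Russian as second language
    """
    return {
        "NM": n1 - r2,  # native monolinguals
        "UB": r2,       # unassimilated bilinguals
        "AB": n2,       # assimilated bilinguals
        "AM": r1 - n2,  # assimilated monolinguals
    }


# Invented example figures, not taken from any census table.
groups = silver_simple(n1=800, r1=200, n2=50, r2=600)
print(groups)
print(sum(groups.values()))  # the four groups add up to n1 + r1
```

Under these simplifying assumptions the four groups always add up to N1 + R1, which gives a quick sanity check on typed-in figures.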


In real life we cannot exclude this third language, though, and each formula must be corrected for the interference of the third language.

Formulas and their logical limitations:

1. Native monolinguals
NMmax = N1-R2+T1 (cannot exceed N1)
NMmin = N1-R2 (cannot be less than 0)

2. Unassimilated bilinguals
UBmax = R2 (cannot exceed N1)
UBmin = R2-T1 (cannot be less than 0)

3. Assimilated bilinguals
ABmax = N2 (cannot exceed R1)
ABmin = N2-T1 (cannot be less than 0)

4. Assimilated monolinguals
AMmax = R1-N2+T1 (cannot exceed R1)
AMmin = R1-N2 (cannot be less than 0)

Note that these logical limitations are not worked into the tables (I did not find a way of programming conditional clauses in Excel). So whenever you find a negative percentage (for NMmin, UBmin, ABmin or AMmin), simply replace it by zero. Correspondingly, if you find numbers greater than N1 for NMmax or UBmax, or greater than R1 for ABmax or AMmax, replace them with the corresponding N1 or R1 value.
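
For readers who process the figures outside the spreadsheet, the manual correction described here can be sketched in Python. This is an illustration only: the function name and the example figures are invented, and the code works on raw counts, applying to each formula just the limit stated for it above.

```python
def silver_bounds(n1, r1, t1, n2, r2):
    """Max/min estimates for the four groups, with the logical limits applied.

    n1, r1, t1: native / Russian / third language as mother tongue
    n2, r2:     native / Russian as second language
    """
    return {
        "NMmax": min(n1 - r2 + t1, n1),  # cannot exceed N1
        "NMmin": max(n1 - r2, 0),        # cannot be less than 0
        "UBmax": min(r2, n1),            # cannot exceed N1
        "UBmin": max(r2 - t1, 0),        # cannot be less than 0
        "ABmax": min(n2, r1),            # cannot exceed R1
        "ABmin": max(n2 - t1, 0),        # cannot be less than 0
        "AMmax": min(r1 - n2 + t1, r1),  # cannot exceed R1
        "AMmin": max(r1 - n2, 0),        # cannot be less than 0
    }


# Invented example figures, chosen so that two of the limits take effect.
bounds = silver_bounds(n1=700, r1=250, t1=50, n2=30, r2=680)
for name, value in bounds.items():
    print(name, value)
```

In this example the raw ABmin would be 30 - 50 = -20 and is replaced by 0, while the raw AMmax would be 250 - 30 + 50 = 270 and is replaced by R1 = 250.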

The file 1939-89 contains data on first-language knowledge only. In addition to what can already be read out of the other files, it contains the data from 1959 and 1939 (these censuses did not contain information on second-language proficiency), and a comparison of mother tongue retention for each time span.
Here is the legend to the columns:

Språk

Language name (in Norwegian)

gr.

Genetic group of the language in question

# etn. ´39

Number of people identifying themselves with the ethnic group in question in 1939

# talar ´39

Number of people claiming to have the language in question as mother tongue in 1939

talar-%

Speakers in percent of members of ethnic group


Similarly for the years 1959, 1970, 1979 and 1989.

p:59-39

Change in language proficiency percentage from 1939 to 1959

p:70-59

Change in language proficiency percentage from 1959 to 1970

p:70-39

Change in language proficiency percentage from 1939 to 1970

59av39

Speakers in 1959 as percentage of speakers in 1939

70av59

Speakers in 1970 as percentage of speakers in 1959

70av39

Speakers in 1970 as percentage of speakers in 1939

89av39

Speakers in 1989 as percentage of speakers in 1939
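
As a rough illustration of how the derived columns relate to the raw counts, here is a Python sketch for one pair of census years. The function names and figures are invented, and it is an assumption (suggested by column names such as p:59-39) that the change columns are the later percentage minus the earlier one.

```python
def speaker_percent(speakers, ethnic):
    """Speakers as a percentage of the ethnic group (the talar-% column)."""
    return 100.0 * speakers / ethnic


def retention_columns(ethnic_a, speakers_a, ethnic_b, speakers_b):
    """Derived columns for an earlier census (a) and a later census (b)."""
    pct_a = speaker_percent(speakers_a, ethnic_a)
    pct_b = speaker_percent(speakers_b, ethnic_b)
    return {
        "p_change": pct_b - pct_a,  # assumed meaning of e.g. p:59-39
        "later_av_earlier": 100.0 * speakers_b / speakers_a,  # e.g. 59av39
    }


# Invented figures: 900 of 1000 speak the language in the earlier census,
# 960 of 1200 in the later one.
cols = retention_columns(ethnic_a=1000, speakers_a=900,
                         ethnic_b=1200, speakers_b=960)
print(cols)
```

The two measures can point in different directions: in this invented case the number of speakers grows (the later count is about 107 percent of the earlier one) while the share of speakers within the group drops by 10 percentage points.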



Adm.stat.

Administrative status of the language in question

ssr

Language of a Soviet Socialist Republic

assr

Language of an Autonomous Soviet Socialist Republic

ao

Language of an Autonomous Area

ingen

No administrative status

nabo

Official language of one of the neighbouring countries of the Soviet Union



>´37

Before 1937 there was...

´37>

After 1937 there was...

u

developed a literary language

iu

no literary language developed (or, if developed earlier, not in use)

Kommentar

Short comments for myself. They have not been checked and I did not write down the sources (in some cases Comrie is the source), so they should really be erased. Do not quote these comments.

A note on the reliability and use of the data presented here


Among demographers and sociolinguists, the Soviet census data are generally regarded as a reliable source, with some known exceptions (e.g. the Nganasan, who are reported with too high native language proficiency). In any case they are the only data available, and probably the most extensive demographic database of this size and complexity. Norway, to cite the home country of this site, left out questions of language proficiency from its censuses during the first half of this century, and even in earlier questionnaires, second language proficiency was never asked for.
Another source of errors not to be ignored is the typist, i.e. myself. I typed in these numbers from the published sources. After typing them in I went over them and checked, but I still cannot guarantee that they are error-free. So, in case of unexpected data, this electronic version should be checked against the original. Needless to say, I would appreciate reports on any errors detected in this material.
This material may freely be used for research on Soviet linguistic, sociolinguistic and sociological matters. The reason I now make it available is precisely to promote such research. The material may not be used for commercial purposes. If you use it, I would appreciate it if you mention the source and refer to this site.
In addition to the data given here, more data are available in both the published and the unpublished sources. The data are also broken down by age cohort, urban/rural, rayon by rayon, etc. Rather than aspiring to present the whole material, I present what I have, and hope that this may inspire researchers to go to the archives for more fine-grained material.

Primary sources


Vsesojuznaja perepis´ naselenija 1939 goda. Osnovnye itogi. Rossijskaja Akademija Nauk. Moskva 1992. (1939)
Itogi vsesojuznoj perepisi naselenija 1959 goda (svodnyj tom). Tsentral'noe statistitsjeskoe upravlenie pri sovete ministrov SSSR. Moskva 1962. (1959)
Itogi Vsesojuznoj perepisi naselenija 1970 goda. Moskva. Statistika (1970)
Tsjislennost' i sostav naselenija SSSR Po dannym Vsesojuznoj perepisi naselenija 1979 goda. Moskva. Finansy i statistika. 1984. (1979)
Vestnik Statistiki 10/1990. Moskva. Finansy i statistika. (1989)

Literature:

Lallukka, Seppo 1990: The East Finnic Minorities in the Soviet Union. Annales academiæ scientiarum Fennicæ ser. B tom. 252.
Silver, B. 1975: Methods of Deriving Data on Bilingualism from the 1970 Soviet Census. Soviet Studies 27:4.

Lingvistisk institutt | Humanistisk fakultet | Universitetet i Tromsø