International Adaptation of Measurement Instrument

Going Global: 39 Language Versions of the BFI-2-XS

Beatrice Rammstedt*1,2, Lena Roemer1, Dorothée Behr1, Matthias Bluemke1, Clemens Lechner1, Steve Dept3, Laura Wäyrynen3, Chris Soto4, Oliver P. John5

Measurement Instruments for the Social Sciences, 2025, Vol. 7, Article e14067, https://doi.org/10.5964/miss.14067

Received: 2024-02-28. Accepted: 2024-11-28. Published (VoR): 2025-01-22.

Handling Editor: Ronald Fischer, Victoria University of Wellington, Wellington, New Zealand

*Corresponding author at: Leibniz Institute for the Social Sciences, PO Box 12 21 55, 68072 Mannheim, Germany. E-mail: beatrice.rammstedt@gesis.org

This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

In the 2023 Survey of Adult Skills (PIAAC), the Big Five personality traits were assessed using the 15-item extra-short form of the Big Five Inventory-2 (BFI-2-XS). For this purpose, the instrument was translated into 24 languages and adapted to 29 countries, resulting in 39 language versions. This translation and adaptation process followed state-of-the-art procedures to generate language versions of the BFI-2-XS that are maximally comparable across countries and regions. In the present paper, we describe this general translation procedure from a methodological point of view. We also document each resulting language version and report in detail the decisions taken during the translation process and the adaptations made to preexisting national versions of the BFI-2-XS. Our aim is to share with researchers the resulting BFI-2-XS language versions developed with high quality standards to allow maximal cross-cultural comparability. Our intention in so doing is to enable their wider usage beyond PIAAC.

Keywords: Big Five Inventory, BFI-2-XS, translation and adaptation, PIAAC, cross-cultural

In recent decades, the Big Five personality traits have become widely accepted as a framework to parsimoniously describe personality on a global level (e.g., John et al., 2008; McCrae & Costa, 2008). This has led to broad interest in their assessment, even in fields beyond core personality research, such as sociology, economics, and public health. Numerous cross-cultural surveys, such as the World Values Survey (WVS; Ludeke & Larsen, 2017) and the Survey of Health, Ageing, and Retirement in Europe (SHARE; Levinsky et al., 2019), now include measures of the broad personality dimensions, allowing researchers to study cultural differences in these traits and their relations to other constructs of interest.

Large-scale survey studies such as the WVS or SHARE (a) are multi-theme surveys that focus on topics other than personality and (b) aim to assess the Big Five domains as predictors or correlates at the population level, rather than use them for individual diagnostic purposes (e.g., admission tests). To meet the demands of large-scale surveys, there is a need for ultra-efficient measures, of the Big Five as well as of other constructs of interest, whose psychometric properties allow group comparisons.

Several ultra-short instruments assessing the Big Five have been developed in recent decades. One of the most prominent and widely used instruments in large-scale assessments is the 10-item Big Five Inventory (BFI-10; Rammstedt & John, 2007; for an overview of its usage and psychometric performance, see Rammstedt, Roemer, & Lechner, 2024), an ultra-short version of the established 44-item Big Five Inventory (BFI-44; John et al., 1991, 2008). Such ultra-short measures assess the global Big Five domains, but they cannot reflect their complexity (i.e., the fact that each broad domain subsumes several more-specific facet traits). As researchers have become more and more interested in assessing not only the global domains but also the narrow facets of the Big Five, instruments reflecting this hierarchical structure, such as the 60-item BFI-2 (Soto & John, 2017a), have been developed. The BFI-2 distinguishes between three central facets in each Big Five domain (e.g., Sociability, Assertiveness, and Energy Level within the Extraversion domain). To allow the assessment of both global and facet levels in research contexts with time limitations, Soto and John (2017b) developed and validated two abbreviated forms of the BFI-2: (a) the 30-item BFI-2-S and (b) the BFI-2-XS, comprising a 15-item subset of the 30 BFI-2-S items. By including one item from each of the three facets defining each Big Five domain, the 3-item domain scales of the BFI-2-XS cover the full breadth of the Big Five dimensions as defined in the original BFI-2. A more technical advantage of the BFI-2-XS compared to previous ultra-short Big Five scales (e.g., the BFI-10 with its two items per domain) is that each of its domains can be modeled as a latent variable, because three items are enough for a latent variable to be (just) identified in confirmatory factor analysis.
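The identification argument can be illustrated with a simple parameter count. The sketch below is not taken from the article; it assumes a one-factor model with uncorrelated residuals, a latent variance fixed to 1 for scaling, and no mean structure, and the function name is hypothetical.

```python
def one_factor_cfa_df(n_items: int) -> int:
    """Degrees of freedom of a one-factor CFA under the stated assumptions.

    Assumptions (not from the article): uncorrelated residuals, latent
    variance fixed to 1, covariance structure only (no intercepts).
    """
    # Unique variances and covariances among the observed items.
    unique_moments = n_items * (n_items + 1) // 2
    # One loading and one residual variance per item.
    free_parameters = 2 * n_items
    return unique_moments - free_parameters


for label, n_items in [
    ("two items per domain (e.g., BFI-10)", 2),
    ("three items per domain (BFI-2-XS)", 3),
    ("six items per domain (BFI-2-S)", 6),
]:
    print(f"{label}: df = {one_factor_cfa_df(n_items)}")
```

Under these assumptions, two items leave fewer observed moments than free parameters (negative degrees of freedom), three items match them exactly (zero degrees of freedom, i.e., just identified), and six items leave the model overidentified.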

Assessing the Big Five in the Survey of Adult Skills (PIAAC)

The Programme for the International Assessment of Adult Competencies (PIAAC) is an established international comparative large-scale study program initiated and run by the Organisation for Economic Co-operation and Development (OECD). Its main product, the Survey of Adult Skills, assesses key cognitive skills such as literacy and numeracy in the adult population, allowing for analyses of differences in the level of skills within and between countries as well as the predictors and outcomes of skills.

Complementing its in-depth assessment of adults’ (cognitive) competencies, in its current second cycle (2022/2023), PIAAC has, for the first time, included social-emotional skills as one of several background variables in its extensive background questionnaire. For this purpose, it was decided to employ a measure assessing the Big Five dimensions of personality, which are often used as proxies of social-emotional skills (Lechner, Anger, & Rammstedt, 2019). To investigate which Big Five instrument best fit its needs, the OECD convened an expert panel and conducted comprehensive national and international pilot studies in which the full BFI-2 was assessed. Results of these pilots suggested that the variance explained in numerous outcomes of interest to PIAAC was highly comparable for the 60-item BFI-2, the 30-item BFI-2-S, and the 15-item BFI-2-XS (Rammstedt, Lechner, & Danner, 2024). Because of time constraints in the administration of the background questionnaire, the OECD initially decided to assess only the two dimensions of Conscientiousness and Openness with six items each from the BFI-2-S in the PIAAC Field Trial. Nonetheless, twenty-four countries opted to administer the full 30-item BFI-2-S in the Field Trial. For the Main Study, the decision was taken to cover all dimensions, but to use the shorter BFI-2-XS instrument to save time. National project teams of the participating countries were given the option to administer the additional items from the BFI-2-S if they wished. Out of the 28 countries that opted to participate in the Big Five assessment in the Main Study, 12 decided to test the full 30-item BFI-2-S in the Main Study. In total, the instruments were translated into 24 languages and adapted to 29 countries, resulting in 39 language versions.

Aims of the Present Study

The aims of the present study are to make these language versions of the BFI-2-S and BFI-2-XS available to a wider research audience and to present the general state-of-the-art translation and adaptation procedure followed in PIAAC. In the Results section, we will describe the major decisions taken and adaptations made in the multi-step development of the final BFI-2-XS fielded in the Main Study of PIAAC. In doing so, we will adopt a national or, where necessary, a language-by-country perspective. Details of the adaptation steps by country and item are documented in an online appendix on the Open Science Framework (OSF, see Roemer et al., 2024). Finally, and most importantly, we also share the resulting final translated BFI-2-XS versions (and BFI-2-S versions if available) to enable their use in the broader scientific community. Although OECD guidelines regarding data confidentiality preclude the publication of empirical results from the PIAAC Field Trial, widely sharing these language versions of the BFI-2-S and BFI-2-XS, as well as the procedures used to develop and validate them, will facilitate both within-country and international research on personality traits and social-emotional skills in a broad range of languages and cultures around the world.

Method

The language versions of the BFI-2-XS were produced for the current second cycle of PIAAC (referred to in what follows as PIAAC 2023). Since PIAAC aims to draw conclusions about human capital and its usage in the various participating countries, it is crucial that all measures used be comparable across countries. To achieve this, PIAAC ensures compliance with very high methodological standards for cross-cultural translations and adaptations. PIAAC therefore includes a large-scale Field Trial conducted in all participating countries in the form of a full “dress rehearsal” of the study, including the same instruments and design as the Main Study. The Field Trial was based on approximately 450 to 1,800 respondents per country.1

The OECD commissioned an international consortium to plan the design of PIAAC and to monitor its implementation in the participating countries. The PIAAC Consortium member with primary responsibility for the translation and cultural adaptation of the survey instruments is the language service provider cApStAn. Within each country, national centers were established to implement PIAAC according to a detailed set of methodological standards to ensure comparability of the language versions across countries and the overall quality of the resulting data. Data and results are due to be published in 2024, including a technical report detailing the psychometric properties of the BFI-2-XS (Roth et al., 2024).

Countries and Language Regions

The social-emotional skills module, which comprised the BFI-2-XS, was an international option of the PIAAC 2023 background questionnaire. This means that countries could decide whether to include it or not. Of the 31 participating countries, only three—Korea, Japan and the United States—opted not to do so. In Korea, however, the BFI-2-S instrument was administered during the Field Trial and subsequent changes were implemented, resulting in a validated language version (even though it was not fielded in the Main Study). The Korean adaptation of the BFI-2-S thus fully followed the process described below and is therefore included in the present documentation.

The 29 countries for which the social-emotional skills module for PIAAC 2023 was developed are listed in Table 1. In eight of these countries, the module was administered in multiple languages (e.g., Spain, where regional languages such as Basque, Catalan and Galician are used in addition to Spanish). For the different language groups within these countries, separate instruments were prepared. Some countries (e.g., Germany, Austria, and Switzerland) share an official language. However, as the spoken language differs slightly between these countries, different language versions were prepared, albeit from a common starting point. This resulted in a total of 39 different language versions of the social-emotional skills module in PIAAC 2023.

Table 1

Translation and Adaptation Steps and Decisions for the 39 Language Versions of the BFI-2-(X)S

Columns 3 to 8 concern the adaptations of the BFI-2-S for the PIAAC Field Trial; columns 9 to 12 concern the adaptations of the BFI-2-XS for the PIAAC Main Study.

| Language | Country | BFI-2 version in FT (a) | Pre-existing version? (y/n) | If pre-existing version: changes implemented by cApStAn? (b) | National domain experts in cApStAn translation process? | Change requests during verification (e.g., by NC or cApStAn) (b) | Changes implemented? (b) | Any comments after the Field Trial? (c) | Changes implemented? (c) | Fielded BFI-2-XS version identical to? | Fielded BFI-2-XS version similar to? | Comments |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ara (Arabic) | ISR (Israel) | Only C, O | y | 0 | n | 0 out of 12 | 0 out of 12 | 3 | 2 | | | Changes for Main Study could also refer to items not fielded in the Field Trial. |
| cat (Catalan) | ESP (Spain) | BFI-2-S | y | 10 | n | 0 | 0 | 0 | 0 | | val – ESP (d) | Catalan and Valencian are counted as the same language. BFI-2-S fielded in Main Study. |
| ces (Czech) | CZE (Czech Republic) | BFI-2-S | y | 19 | n | 0 | 0 | 0 | 0 | | | High number of changes aiming at gender-consistent wording. BFI-2-S fielded in Main Study. Changes in response options. |
| dan (Danish) | DNK (Denmark) | BFI-2-S | y | 0 | n | 0 | 1 | 0 | 0 | | | |
| deu (German) | DEU (Germany) | BFI-2-S | y | 3 | y | 1 | 1 | 0 | 0 | deu – AUT | deu – CHE (e) | BFI-2-S fielded in Main Study. Changes in response options. |
| deu (German) | AUT (Austria) | BFI-2-S | see DEU | see DEU | see DEU | 1 | 1 | 0 | 0 | deu – DEU | deu – CHE (e) | Changes in response options. |
| deu (German) | CHE (Switzerland) | BFI-2-S | see DEU | see DEU | see DEU | 1 | 1 | 0 | 0 | | deu – DEU, AUT (e) | Changes in response options. |
| eng (English) | SHP (Singapore) | BFI-2-S | y (master) | n | n | 2 | 2 | 0 | 0 | eng – NZL | eng – CAN, GBR, IRL (f) | |
| eng (English) | NZL (New Zealand) | BFI-2-S | y (master) | n | n | 1 | 1 | 0 | 0 | eng – SHP | eng – CAN, GBR, IRL (f) | BFI-2-S fielded in Main Study. |
| eng (English) | IRL (Ireland) | BFI-2-S | y (master) | n | n | 0 | 0 | 0 | 0 | eng – CAN, GBR | eng – NZL, SHP (f) | |
| eng (English) | CAN (Canada) | BFI-2-S | y (master) | n | n | 1 | 1 | 0 | 0 | eng – IRL, GBR | eng – NZL, SHP (f) | BFI-2-S fielded in Main Study. |
| eng (English) | GBR (Great Britain) | Only C, O | y (master) | n | n | 0 out of 12 | 0 out of 12 | 0 | 0 | eng – IRL, CAN | eng – NZL, SHP (f) | |
| esp (Spanish) | ESP (Spain) | BFI-2-S | y | 5 | n | 0 | 0 | 0 | 0 | | esp – CHL (g) | BFI-2-S fielded in Main Study. |
| esp (Spanish) | CHL (Chile) | BFI-2-S | see ESP (esp) | 9 | n | 3 | 15 | 0 | 0 | | esp – ESP (g) | High number of changes aiming at gender-consistent wording. BFI-2-S fielded in Main Study. |
| est (Estonian) | EST (Estonia) | BFI-2-S | y | 0 | n | 2 | 2 | 1 | 1 | | | BFI-2-S fielded in Main Study. Changes in response options. |
| fin (Finnish) | FIN (Finland) | BFI-2-S | n | n/a | y | 1 | 1 | 4 | 4 | | | Changes in response options. |
| fra (French) | FRA (France) | BFI-2-S | y | 0 | n | 4 | 4 | 5 | 5 | | fra – CHE, CAN (h) | |
| fra (French) | CHE (Switzerland) | BFI-2-S | see FRA | see FRA | see FRA | 1 | 1 | 0 | 0 | | fra – FRA, CAN (h) | |
| fra (French) | CAN (Canada) | BFI-2-S | see FRA | see FRA | see FRA | 7 | 7 | 1 | 1 | | fra – CHE, FRA (h) | BFI-2-S fielded in Main Study. |
| heb (Hebrew) | ISR (Israel) | Only C, O | y | 1 | n | 0 out of 12 | 0 out of 12 | 4 | 4 | | | Changes for Main Study could also refer to items not fielded in the Field Trial. |
| hrv (Croatian) | HRV (Croatia) | BFI-2-S | n | n/a | y | 0 | 0 | 0 | 0 | | | BFI-2-S fielded in Main Study. |
| hun (Hungarian) | SVK (Slovakia) | BFI-2-S | see HUN | see HUN | see HUN | 0 | 0 | 0 | 0 | | hun – HUN (i) | BFI-2-S fielded in Main Study. |
| hun (Hungarian) | HUN (Hungary) | Only C, O | n | n/a | y | 3 out of 12 | 3 out of 12 | 0 | 0 | | hun – SVK (i) | |
| ita (Italian) | ITA (Italy) | BFI-2-S | n | n/a | n | 13 | 13 | 0 | 0 | | ita – CHE (j) | High number of changes aimed at gender-consistent wording. BFI-2-S fielded in Main Study. |
| ita (Italian) | CHE (Switzerland) | BFI-2-S | see ITA | see ITA | see ITA | 0 | 0 | 0 | 0 | | ita – ITA (j) | |
| kor (Korean) | KOR (South Korea) | BFI-2-S | n | n/a | n | 0 | 0 | 3 | 3 | | | Korea fielded neither the BFI-2-XS nor the BFI-2-S in the Main Study. Changes in response options. |
| lav (Latvian) | LVA (Latvia) | BFI-2-S | n | n/a | n | 1 | 1 | 0 | 0 | | | |
| ltu (Lithuanian) | LIT (Lithuania) | Only C, O | n | n/a | n | 1 out of 12 | 1 out of 12 | 1 | 1 | | | Changes for Main Study could also refer to items not fielded in the Field Trial. |
| nld (Dutch) | NLD (Netherlands) | BFI-2-S | y | 0 | n | 2 | 0 | 0 | 0 | | nld – BEL (k) | |
| nld (Dutch) | BEL (Belgium) | BFI-2-S | see NLD | see NLD | see NLD | 1 | 1 | 0 | 0 | | nld – NLD (k) | |
| nob (Norwegian) | NOR (Norway) | BFI-2-S | y | 0 | n | 5 | 5 | 1 | 1 | | | BFI-2-S fielded in Main Study. Changes in response options. |
| pol (Polish) | POL (Poland) | BFI-2-S | y | 0 | n | 24 | 24 | 1 | 1 | | | High number of changes, aimed mainly at gender-consistent wording and correcting typing errors. Changes in response options. |
| por (Portuguese) | PRT (Portugal) | BFI-2-S | n | n/a | n | 0 | 14 | 0 | 0 | | | High number of changes: Country changed translations without requesting changes. BFI-2-S fielded in Main Study. |
| rus (Russian) | EST (Estonia) | BFI-2-S | y | 9 | n | 1 | 0 | 0 | 0 | rus – LVA | | BFI-2-S fielded in Main Study. |
| rus (Russian) | LVA (Latvia) | BFI-2-S | see EST (rus) | see EST (rus) | see EST (rus) | 0 | 0 | 0 | 0 | rus – EST | | |
| slo (Slovakian) | SVK (Slovakia) | BFI-2-S | y | 10 | n | 0 | 0 | 2 | 2 | | | BFI-2-S fielded in Main Study. Changes in response options. |
| swe (Swedish) | SWE (Sweden) | BFI-2-S | n | n/a | y | 0 | 0 | 1 | 1 | | swe – FIN (l) | Changes in response options. |
| swe (Swedish) | FIN (Finland) | BFI-2-S | see SWE | see SWE | see SWE | 0 | 11 | 2 | 8 | | swe – SWE (l) | High number of changes: Country changed translations without requesting changes. Changes in response options. |
| val (Valencian) | ESP (Spain) | BFI-2-S | see ESP (Cat) | see ESP (Cat) | see ESP (Cat) | 0 | 0 | 0 | 0 | | cat – ESP (d) | Catalan and Valencian are counted as the same language. BFI-2-S fielded in Main Study. |

Note. If not otherwise indicated, the BFI-2-XS was fielded in Main Study. FT = Field Trial; NC = national center; y = yes; n = no.

(a) "Only C, O" indicates that only the six Conscientiousness and six Openness items from the BFI-2-S were administered in the Field Trial (i.e., 12 items in total). (b) Refers to the 30 items of the BFI-2-S, if not otherwise specified. (c) Refers to the 15 items of the BFI-2-XS. Thus, in cases where only selected domains were fielded, change requests could also refer to items not fielded in the Field Trial. (d) Similar versions: cat – ESP similar to val – ESP (9x language-specific suffixes; 1x different pronoun translation). (e) Similar versions: deu – DEU/AUT similar to deu – CHE (1x differently translated response options). (f) Similar versions: eng – CAN/GBR/IRL similar to eng – SHP/NZL (1x spelling difference). (g) Similar versions: esp – ESP similar to esp – CHL (3x different item formulations; 2x different adjectives; 3x different types of gendering). (h) Similar versions: fra – FRA similar to fra – CAN (5x different formulations; 3x different punctuation) and similar to fra – CHE (5x different formulations); fra – CHE similar to fra – CAN (1x different formulations; 3x different punctuation). (i) Similar versions: hun – HUN similar to hun – SVK (1x different adjective; 1x different item formulations). (j) Similar versions: ita – ITA similar to ita – CHE (5x different order gendering). (k) Similar versions: nld – NLD similar to nld – BEL (1x different item formulation). (l) Similar versions: swe – SWE similar to swe – FIN (2x different item formulation; 1x spelling difference).

Source Instrument

The source instrument for the social-emotional skills module in PIAAC 2023 was the Anglo-American original version of the BFI-2-XS (Soto & John, 2017b), which consists of 15 short-phrase items—three per Big Five domain. These items are answered on a 5-point rating scale ranging from disagree strongly (1) to agree strongly (5); the neutral category is labeled “neutral; no opinion” (3).

To facilitate translation and comparability, all 15 items of the BFI-2-XS were adapted slightly to form full sentences rather than phrases (see Appendix A in the Supplementary Materials, Roemer et al., 2024). The formulation of the response categories was also adapted slightly to fit the format typically applied in PIAAC. The adapted labels were "strongly disagree," "disagree," "neither agree nor disagree," "agree," and "strongly agree."
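For data handling, the adapted response labels can be mapped to numeric codes. The following is a minimal sketch, not part of the PIAAC materials: it assumes the five adapted labels are coded 1 to 5 in the order listed above, mirroring the 1-to-5 coding of the source instrument, and the names `PIAAC_LABELS` and `encode_responses` are hypothetical.

```python
# Adapted PIAAC response labels in the order given in the text.
# Assumption: they are coded 1..5 in this order, as in the source scale.
PIAAC_LABELS = [
    "strongly disagree",
    "disagree",
    "neither agree nor disagree",
    "agree",
    "strongly agree",
]

LABEL_TO_CODE = {label: code for code, label in enumerate(PIAAC_LABELS, start=1)}


def encode_responses(responses: list[str]) -> list[int]:
    """Convert response labels to numeric codes, ignoring case and padding."""
    return [LABEL_TO_CODE[r.strip().lower()] for r in responses]


print(encode_responses(["Agree", "neither agree nor disagree", "Strongly disagree"]))
```

An unknown label raises a `KeyError` rather than being silently recoded, which seems the safer choice when labels differ across language versions.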

Translation Procedure

State-of-the-art translation procedures were applied for all instruments assessed in PIAAC. The specific procedure applied for the translation of the social-emotional skills module deviated from that applied for the other measures administered in PIAAC, in that the Big Five module was translated centrally by the PIAAC Consortium rather than by teams of the individual participating countries. The intention was to take account of the sensitivity of the Big Five items to even slight meaning shifts and to ensure a monitored double translation and reconciliation design for these items. In preparation for the translations, the PIAAC Consortium (together with Christopher J. Soto, the first author of the BFI-2-XS) compiled a list of existing translations of the BFI-2-XS. These existing (and often already validated) versions (see Table 2) were used as a source for the national adaptations of the BFI-2-XS to be used in PIAAC. Further, a list of one or more content experts for the Big Five in each of the 29 countries participating in the Big Five assessment was compiled, many of whom were authors of existing BFI-2 translations or at least similar personality inventories. These experts were later contacted to serve as expert reviewers for any new translations into the target languages.

Table 2

Languages of the BFI-2-S/XS Adaptations and Sources of Preexisting BFI-2(-S) Versions

| No. | Language | Pre-existing BFI-2(-S) version |
| --- | --- | --- |
| 1 | Arabic | Translation was in progress; no publication has yet resulted. |
| 2 | Catalan/Valencian | Translation was in progress; no publication has yet resulted. |
| 3 | Croatian | |
| 4 | Czech | Hřebíčková et al. (2020) |
| 5 | Danish | Vedel et al. (2021) |
| 6 | Dutch | Denissen et al. (2020) |
| 7 | English | Soto and John (2017a, b) |
| 8 | Estonian | Translation was in progress; no publication has yet resulted. |
| 9 | Finnish | |
| 10 | French | Translation from PIAAC Pilot was used. |
| 11 | German | Danner et al. (2019), Rammstedt et al. (2020) |
| 12 | Hebrew | Translation was in progress; no publication has yet resulted. |
| 13 | Hungarian | |
| 14 | Italian | |
| 15 | Korean | |
| 16 | Latvian | |
| 17 | Lithuanian | |
| 18 | Norwegian | Føllesdal and Soto (2022) |
| 19 | Polish | Translation from PIAAC Pilot was used; translation was in progress; no publication has yet resulted. |
| 20 | Portuguese | |
| 21 | Russian | Shchebetenko et al. (2020) |
| 22 | Slovakian | Halama et al. (2020) |
| 23 | Spanish | Gallardo-Pujol et al. (2022) |
| 24 | Swedish | |

Note. The references in the table are for publications that have resulted from the preexisting versions that were used in the current project.

Translation Notes

To ensure item comparability across language versions, and to prevent mistranslations, translation and adaptation notes (also known as item-specific guidelines or translation annotations) were provided for the BFI-2-S. These notes describe what specific words or phrases mean in measurement terms so that translators can transfer this meaning correctly without having to adhere too closely to the wording and structure of the source instrument. For the present purpose, the translation notes for the BFI-2 provided by Soto and John (personal communication, November 14, 2018) were used and expanded by cApStAn. For instance, the item “I am compassionate, have a soft heart” was accompanied by the note: “‘Has a soft heart’ means ‘is caring and compassionate’.” The adapted translation notes for the BFI-2-S items are provided in Appendix B (see Roemer et al., 2024).

Initial Translated BFI-2-S Versions

In a first step, it was checked whether a translation of the BFI-2-S into the target language already existed (for a full list of translations used see Table 2). This could be a version used in a previous OECD pilot study (see Rammstedt, Roemer, & Lechner, 2024) or a version from an independent translation project (e.g., for Germany, Rammstedt et al., 2020). These translated versions were then used as the basis for the corresponding national adaptations. The respective translators, who were selected, trained, and supervised by cApStAn on behalf of the PIAAC Consortium, reviewed these preexisting translations and suggested edits if needed, for example, if the existing item translation did not conform to the translation notes. The adaptations that had been made to the BFI-2-S source version for PIAAC (full-sentence items and revised response labels) were also carried over into the translated versions at this stage.

If no translation of the BFI-2-S into the target language previously existed, the PIAAC Consortium produced translations of the items following a double translation and reconciliation approach (Lyberg et al., 2021), a slightly modified version of the TRAPD2 procedure (Harkness, 2003). Specifically, two independent translations into the target language were produced by professional translators with extensive experience in translating surveys and psychological assessments. These two versions were then reconciled into one translation by a senior questionnaire translator who merged them by (a) selecting the best components from each version; (b) selecting one version over the other; or, very rarely, (c) proposing a new version in case neither of the provided versions was deemed satisfactory. Problematic issues were discussed and resolved in a subsequent meeting between the initial translators and the reconciler. For all corresponding countries, scholars from the above-mentioned list of domain experts for the Big Five were then contacted. Whenever possible, the resulting translations were reviewed by these experts.

In the case of countries sharing an official language (e.g., German for Germany, Austria, and Switzerland), one language version (either preexisting or newly translated) served as the starting point for all countries and subsequently underwent further country-specific adaptation.

Regardless of the translation approach (use of an existing translation version or translation from scratch), national teams were asked to review the translations provided by the PIAAC Consortium and request changes if problems were identified. These requests were reviewed by independent verifiers (linguists trained to identify potential equivalence issues in translated or adapted questionnaires) commissioned by cApStAn, and in some cases also by further members of the Consortium (e.g., domain experts). This step was called verification. Based on this feedback, the initial translated versions were finalized and assessed in the PIAAC Field Trial.

All steps in the process from translation to finalization of the instrument were rigorously documented, including the different translation versions at each step and additional comments by translators, verifiers, country teams, and domain experts if translation challenges or problems arose.

Final National Versions of the BFI-2-XS

Data from the PIAAC Field Trial were analyzed centrally by the PIAAC Consortium.3 For the BFI-2-XS, the analyses focused on descriptive statistics and distributions of responses as well as dimensionality, scale reliability, and validity of the five domains. Besides overall inspection of predictive validity for several criteria, confirmatory factor analysis was used to inspect within each country the model fit for each dimension and for a joint 5-dimensional measurement model. In addition, multi-group confirmatory factor analysis was applied to inspect the cross-national functioning of the measurement of the five domains.
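The article does not spell out the model specification, so the following is only a generic sketch of the kind of measurement model that multi-group confirmatory factor analysis compares across countries. The notation (item $j$, respondent $i$, country $g$) is assumed here and is not taken from the PIAAC documentation.

```latex
% One-factor measurement model for a three-item domain scale in country g
x_{ijg} = \nu_{jg} + \lambda_{jg}\,\eta_{ig} + \varepsilon_{ijg},
\qquad j = 1, 2, 3

% Metric invariance: equal loadings across countries
\lambda_{jg} = \lambda_{j} \quad \text{for all } g

% Scalar invariance: additionally equal intercepts across countries
\nu_{jg} = \nu_{j} \quad \text{for all } g
```

Here $\nu_{jg}$ is the item intercept, $\lambda_{jg}$ the loading, $\eta_{ig}$ the latent domain score, and $\varepsilon_{ijg}$ the residual. An item whose loading or intercept deviates markedly in one country is a natural candidate for a translation review.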

Country teams received a common international report, as well as country-specific national reports. Based on an alignment procedure within multi-group confirmatory factor analysis, countries could identify any flagged severe national deviations from the international results. Based on these findings, country teams were asked to review their respective item translations for potential translation biases.

Country teams could then request revisions to their national BFI-2-S/BFI-2-XS versions from the PIAAC Consortium. Requests had to be limited to major issues, such as mistranslations that caused errors in the Field Trial responses; preferential changes and minor (inconsistency) requests were not approved. The final BFI-2-XS/BFI-2-S versions administered in the PIAAC Main Study were thus either the initial versions used in the Field Trial or the slightly modified versions incorporating country team feedback based on the Field Trial results. The general translation process is summarized in Figure 1.

Figure 1

Translation and Adaptation Process for the BFI-2-S/BFI-2-XS Followed in PIAAC

Results

Table 1 outlines for each of the 29 countries and 39 language versions the exact translation procedure and adaptation steps followed for the BFI-2-XS. The detailed procedure and adaptation steps per country version and item are described in Appendix C; the final 39 BFI-2-XS language versions are provided in Appendix A (for all appendices see Roemer et al., 2024).

As described above, in PIAAC it was initially intended to assess only two of the Big Five dimensions—Conscientiousness and Openness—with six items each from the BFI-2-S. However, 24 of the 29 countries that opted to assess the Big Five in PIAAC 2023 decided to implement the full BFI-2-S in the PIAAC Field Trial. Thus, for these 24 countries translations and/or adaptations of the full BFI-2-S were prepared, resulting in 34 language versions. These are provided in Appendix A and are also available through an open access repository for measurement instruments (https://zis.gesis.org/en).

Development of the Initial BFI-2-S/BFI-2-XS Language Versions

For 15 of the 24 languages in which the PIAAC social-emotional skills module was administered, BFI-2-S/BFI-2-XS versions already existed, some of which had already been validated (see Table 2). These existing language versions were consulted and adapted for use in PIAAC. For all these country versions, the adaptations applied to the source version (see above) were implemented. In addition, the PIAAC Consortium sometimes also changed item wording. These adaptations were often minor, for example, adding the feminine gender (as done in Czech) or correcting typing errors. However, in some cases, for example in the Spanish and Slovakian adaptations, items were reformulated to better represent the construct in the target language, align with translation notes, or capture common language use.

For nine languages (or 12 language versions), there were no preexisting BFI-2-S adaptations. For these, the PIAAC Consortium conducted translations and, where possible, had them reviewed by Big Five experts in the respective countries.

Review of the Initial BFI-2-S/BFI-2-XS Language Versions (Before the Field Trial)

All language versions were thoroughly reviewed by the respective national teams in the verification phase prior to the Field Trial, and changes were requested when needed. These requests were then discussed within the PIAAC Consortium and implemented where appropriate.

For the purpose of this study, we coded the requests for changes—and thus the errors in the existing translations identified by the national centers—according to a customized MQM (Multidimensional Quality Metrics) error typology, focusing only on relevant top error categories. MQM is a widely used framework for systematic translation quality assessment (https://themqm.org/). We differentiated the following error types: (a) accuracy errors (i.e., meaning-related errors); (b) accuracy errors as flagged in the Field Trial; (c) linguistic convention errors related to the linguistic correctness of the text (spelling, grammar, punctuation, etc.); (d) errors of style that reflect an inappropriate language use (awkward or unidiomatic text, etc.); (e) errors related to survey-specific terminology that reflect inappropriate wording of response scales; (f) gender errors (i.e., missing or inconsistent gendering); and (g) other (e.g., changes to the English source wording or to the wording used in another country). In Appendix D, we provide further details on our coding approach.

Change requests were typically approved, with the exception of three requests that were considered preferential and therefore not implemented. Some countries also requested that the labeling of the response scale options be modified. These cases are documented in Appendix C. In the following, we will therefore concentrate on requests for changes to item formulations that were approved and implemented.

The numbers of requested and implemented changes before the Field Trial, classified by error type, are shown in the left-hand panel of Figure 2. Across all country versions, 49% (n = 37) of the requested and implemented changes to item formulations during the verification phase prior to the Field Trial related to adaptations of item gendering. Twenty-four percent (n = 18 requests) of the changes were due to linguistic conventions, and 21% (n = 16 requests) were due to the accuracy of the item formulations. As an example of the latter, the French team requested that the preexisting French BFI-2 adaptation of the original item “I can be cold and uncaring” be changed. It was originally translated as “Je suis parfois dédaigneux/euse, méprisant(e)” [I am sometimes disdainful and contemptuous]. The national center argued that “dédaigneux/euse” and “méprisant(e)” were stronger terms in French than “cold” and “uncaring” in the source item. This would have compromised the comparability of the translation with the other language versions and could have resulted in comparatively lower item means. The French team therefore requested that the item be reformulated as “Je suis parfois indifférent(e), insensible” [I am sometimes indifferent, insensitive]. Averaged across all country versions, 10% of the BFI-2-S items were changed before the Field Trial, with changes per country version ranging between 0% and 80% of items.
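The reported average and range can be retraced from the "Changes implemented?" column for the Field Trial in Table 1. The sketch below is a reader's check rather than the authors' computation: the counts were transcribed from Table 1, and it assumes that each version's share is taken relative to the items it fielded (30, or 12 where only Conscientiousness and Openness were administered) and that shares are then averaged across all 39 versions.

```python
# Changes implemented before the Field Trial, transcribed from Table 1.
# Both the transcription and the averaging rule are assumptions of this sketch.
N_VERSIONS = 39

# Versions that fielded all 30 BFI-2-S items and had at least one change.
changes_30_items = {
    "dan-DNK": 1, "deu-DEU": 1, "deu-AUT": 1, "deu-CHE": 1,
    "eng-SHP": 2, "eng-NZL": 1, "eng-CAN": 1, "esp-CHL": 15,
    "est-EST": 2, "fin-FIN": 1, "fra-FRA": 4, "fra-CHE": 1,
    "fra-CAN": 7, "ita-ITA": 13, "lav-LVA": 1, "nld-BEL": 1,
    "nob-NOR": 5, "pol-POL": 24, "por-PRT": 14, "swe-FIN": 11,
}
# Versions that fielded only the 12 Conscientiousness and Openness items.
changes_12_items = {"hun-HUN": 3, "ltu-LIT": 1}

shares = [n / 30 for n in changes_30_items.values()]
shares += [n / 12 for n in changes_12_items.values()]
shares += [0.0] * (N_VERSIONS - len(shares))  # versions without any change

mean_share = sum(shares) / N_VERSIONS
print(f"mean: {mean_share:.0%}, min: {min(shares):.0%}, max: {max(shares):.0%}")
```

Under these assumptions, the result agrees with the figures given in the text: an average of about 10% and a range from 0% to 80%, the maximum being the Polish version with 24 of 30 items changed.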

Figure 2

Revisions to the Translations of the BFI-2-XS Before the Field Trial and the Main Study by Multidimensional Quality Metrics (MQM) Category

Testing the BFI-2-S/BFI-2-XS Language Versions in the Field Trial

The finalized BFI-2-S versions (or in some cases only two of the five trait domains) were then administered in a large-scale Field Trial in each country (see Footnote 1) with a total sample size of nearly 30,000 respondents. The resulting data were analyzed centrally by the PIAAC Consortium as described above. Based on these analyses, countries received detailed feedback on the psychometric performance of the national BFI-2-S adaptations. The psychometric evaluation comprised inter alia descriptive statistics on both the scale and the item level, scale reliability, exploratory and confirmatory factor analyses, and measurement invariance analyses across countries. Through these analyses, divergences from the international results were flagged, and country teams were asked to inspect these cases especially.

Review of the BFI-2-S/BFI-2-XS Language Versions Tested in the Field Trial

Based on this psychometric feedback, country teams examined the item formulations, looked for specific causes of item misinterpretations, and suggested reformulations. Country teams made use of the option to request changes for 13 of the 39 language versions. As can be seen in the right panel of Figure 2, nearly half of the changes requested and later implemented were due to specific items being flagged in the Field Trial (46% or n = 15 requests; see Footnote 4). For example, in the Finnish-language version for Finland, the item “I tend to be quiet” was flagged. The national center reviewed the Finnish translation and concluded that the formulation used in the Field Trial, “Olen yleensä hiljainen” [I am usually quiet], was stronger than the source formulation. They thus requested that the item wording be changed to “Olen usein hiljainen” [I am often quiet].

Most of the other requested changes also concerned the accuracy of the item formulations, although in these cases the documentation did not explicitly refer to the item as having been flagged in the Field Trial (27% or n = 9 requests). Further, in a few cases, item translations were adapted to harmonize them with other versions of the same language or to correct typing errors. On average, 6% of the BFI-2-XS items per country version were changed at this point (with a range of 0% to 53%), resulting in the final BFI-2-XS versions administered in the PIAAC Main Study.
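
To make the per-version percentages concrete: the BFI-2-XS comprises 15 items, so revising 8 of them would correspond to 8/15, or about 53%, of the items in a version. The short Python sketch below shows this arithmetic and the averaging across versions. The function name `percent_changed` and the example counts are hypothetical; the counts are made up for illustration and are not taken from the study.

```python
N_ITEMS_XS = 15  # the BFI-2-XS comprises 15 items


def percent_changed(n_changed, n_items=N_ITEMS_XS):
    """Percentage of items revised in one language version."""
    if not 0 <= n_changed <= n_items:
        raise ValueError("n_changed must lie between 0 and n_items")
    return 100 * n_changed / n_items


# Made-up counts of revised items for four hypothetical versions.
changed_per_version = [0, 0, 1, 8]
percents = [percent_changed(n) for n in changed_per_version]

# Average of the per-version percentages (each version weighted equally).
mean_percent = sum(percents) / len(percents)

print([round(p) for p in percents])
print(round(mean_percent))
```

Averaging the per-version percentages weights every language version equally. This matches the phrasing "on average ... per country version" only under the assumption that each version contributes the same number of items to the denominator.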

Discussion

In the present study, we presented adaptations of the BFI-2-XS developed as part of PIAAC Cycle 2 for 29 countries in 24 languages, resulting in a total of 39 language versions. New language versions were developed following a thorough and state-of-the-art procedure with two independent professional translations, reconciliations, expert consultations whenever possible, verifications, empirical pretesting, and, overall, an in-depth review and revision process. Preexisting translations similarly underwent linguistic reviews, verifications, empirical pretesting, review, and revision. All decisions taken in each individual step were documented to promote transparency and future use.

The translation and adaptation procedure was fully standardized and conducted in parallel with the aim of achieving fully comparable BFI-2-XS versions across all countries and languages. This approach of centrally producing adaptations in parallel based on a unified methodological framework and within a joint project structure has an advantage over independent individual translation projects because it has a higher likelihood of achieving comparable language versions. A lack of comparability could bias conclusions drawn from cross-cultural research.

The multi-step procedure described here, which included both domain experts and professional translators, was time and resource intensive. The results of the PIAAC Field Trial and Main Study indicate that this was time and money well spent: The procedure yielded brief measures of the Big Five traits that function well (cf. Roth et al., 2024) and that were developed in a fully standardized way to ensure high comparability across languages and cultures.

The aims of the present study were twofold. From a general point of view, we aimed to present this state-of-the-art translation and adaptation procedure. From a more pragmatic perspective, our aim was to share the different language versions of the BFI-2-XS developed in this process with the research community. We hope that other researchers can now use these translations in their own research, thereby promoting both within-country and international research across a broad range of languages and cultural contexts.

However, there is also a clear need for further research. Although all BFI-2-S and BFI-2-XS versions presented here were developed thoroughly, they have not yet been fully validated. The exact psychometric properties of each adaptation should thus be investigated in future studies and should be compared with those of the original English-language source version. Further, for languages in which alternative BFI-2 versions already existed, or in cases where the preexisting version was modified in the present adaptation process, future studies should compare these different adaptations and advise the research community about which version is best suited for research and applied use.

In sum, the present study describes the development and implementation of a coordinated procedure for developing, revising, and validating a psychological inventory in multiple languages and cultural contexts simultaneously. We hope that it will help enhance the quality of future translation projects, thereby increasing the comparability of the resulting language versions. Of equal importance, by sharing the resulting adaptations of the BFI-2-XS we also hope to support future cross-cultural research using the Big Five.

Notes

1) Due to fieldwork constraints during the COVID-19 pandemic, a small subset of countries employed only simulated data to study the design and emulate the processes.

2) T: Translation, R: Review, A: Adjudication, P: Pretest, D: Documentation.

3) Data of the PIAAC Field Trial are confidential. Thus, no empirical findings can be reported here.

4) The categorization of the requested changes in the category “Accuracy flagged in Field Trial” is a conservative estimate, as we counted only those requests that explicitly referred to such flagging.

Funding

The authors have no funding to report.

Acknowledgments

We thank the OECD for providing valuable feedback on earlier versions of the manuscript.

Competing Interests

Beatrice Rammstedt is Editor-in-Chief of Measurement Instruments for the Social Sciences. While this role includes the technical possibility to intervene in the review process, she declares that she did not make use of this possibility and was not involved in any aspect of the editorial or peer review process for this article. Matthias Bluemke and Clemens Lechner are Associate Editors, but had no technical possibility to influence the review process. The editorial process was independently managed by other members of the editorial team.

Ethics Statement

The study includes no individual data. The country feedback summarized in the appendix has been approved by the respective countries.

Data Availability

Data are available on the OSF site (see Roemer et al., 2024).

Supplementary Materials

For this article, the following Supplementary Materials are available (see Roemer et al., 2024):

  • the fielded language versions of the BFI-2-XS or BFI-2-S

  • the translation notes applied for the BFI-2-XS and BFI-2-S

  • a spreadsheet with language-version-specific comments and decisions taken on item formulations

  • a description of the error coding approach

Index of Supplementary Materials

  • Roemer, L., Rammstedt, B., Soto, C., Bluemke, M., Lechner, C., & John, O. P. (2024). Going global: 39 language versions of the BFI-2-XS [Data, materials]. OSF. https://osf.io/u4f36

References

  • Danner, D., Rammstedt, B., Bluemke, M., Lechner, C., Berres, S., Knopf, T., Soto, C., & John, O. (2019). Das Big Five Inventar 2: Validierung eines Persönlichkeitsinventars zur Erfassung von 5 Persönlichkeitsdomänen und 15 Facetten. Diagnostica, 65(3), 121-132. https://doi.org/10.1026/0012-1924/a000218

  • Denissen, J. J. A., Geenen, R., Soto, C. J., John, O. P., & van Aken, M. A. G. (2020). The Big Five Inventory–2: Replication of psychometric properties in a Dutch adaptation and first evidence for the discriminant predictive validity of the facet scales. Journal of Personality Assessment, 102(3), 309-324. https://doi.org/10.1080/00223891.2018.1539004

  • Føllesdal, H., & Soto, C. J. (2022). The Norwegian adaptation of the Big Five Inventory-2. Frontiers in Psychology, 13, Article 858920. https://doi.org/10.3389/fpsyg.2022.858920

  • Gallardo-Pujol, D., Rouco, V., Cortijos-Bernabeu, A., Oceja, L., Soto, C. J., & John, O. P. (2022). Factor structure, gender invariance, measurement properties, and short forms of the Spanish adaptation of the Big Five Inventory-2. Psychological Test Adaptation and Development, 3(1), 44-69. https://doi.org/10.1027/2698-1866/a000020

  • Halama, P., Kohút, M., Soto, C. J., & John, O. P. (2020). Slovak adaptation of the Big Five Inventory (BFI-2): Psychometric properties and initial validation. Studia Psychologica, 62(1), 74-87. https://doi.org/10.31577/sp.2020.01.792

  • Harkness, J. (2003). Questionnaire translation. In J. Harkness, F. J. R. van de Vijver, & P. P. Mohler (Eds.), Cross-cultural survey methods (pp. 35–56). Wiley.

  • Hřebíčková, M., Jelínek, M., Květon, P., Benkovič, A., Botek, M., Sudzina, F., Soto, C. J., & John, O. P. (2020). Pětifaktorový dotazník BFI-2: Hierarchický model s 15 subškálami [Big Five Inventory 2 (BFI-2): Hierarchical model]. Československá Psychologie: Časopis Pro Psychologickou Teorii a Praxi, 64(4), 437-460.

  • John, O. P., Donahue, E. M., & Kentle, R. L. (1991). The Big Five Inventory: Versions 4a and 54. University of California, Berkeley, Institute of Personality and Social Research.

  • John, O. P., Naumann, L. P., & Soto, C. J. (2008). Paradigm shift to the integrative Big Five trait taxonomy: History, measurement, and conceptual issues. In O. P. John, R. W. Robins, & L. A. Pervin (Eds.), Handbook of personality: Theory and research (3rd ed., pp. 114–158). Guilford.

  • Lechner, C. M., Anger, S., & Rammstedt, B. (2019). Socio-emotional skills in education and beyond: Recent evidence and future research avenues. In R. Becker (Ed.), Research handbook on the sociology of education (pp. 427–453). Edward Elgar. https://doi.org/10.4337/9781788110426.00034

  • Levinsky, M., Litwin, H., & Lechner, C. M. (2019). Personality traits: The ten-item Big Five Inventory (BFI-10). In M. Bergmann, A. Scherpenzeel, & A. Börsch-Supan (Eds.), SHARE Wave 7 Methodology: Panel innovations and life histories (pp. 29–34). MEA, Max Planck Institute for Social Law and Psychology. http://www.share-project.org/fileadmin/pdf_documentation/MFRB_Wave7/SHARE_Methodenband_A4_WEB.pdf

  • Ludeke, S. G., & Larsen, E. G. (2017). Problems with the Big Five assessment in the World Values Survey. Personality and Individual Differences, 112, 103-105. https://doi.org/10.1016/j.paid.2017.02.042

  • Lyberg, L., Pennell, B.-E., Cibelli Hibben, K., de Jong, J., Behr, D., Burnett, J., Fitzgerald, R., Granda, P., Luz Guerrero, L., Gyuzalyan, H., Johnson, T., Kim, J., Mneimneh, Z., Moynihan, P., Robbins, M., Schoua-Glusberg, A., Sha, M., Smith, T. W., Stoop, I., . . . Zechmeister, E. J. (2021). AAPOR/WAPOR task force report on quality in comparative surveys. https://wapor.org/wp-content/uploads/AAPOR-WAPOR-Task-Force-Report-on-Quality-in-Comparative-Surveys_Full-Report.pdf

  • McCrae, R. R., & Costa, P. T., Jr. (2008). Empirical and theoretical status of the five-factor model of personality traits. In G. J. Boyle, G. Matthews, & D. H. Saklofske (Eds.), The SAGE handbook of personality theory and assessment, Vol. 1. Personality theories and models (pp. 273–294). Sage. https://doi.org/10.4135/9781849200462.n13

  • Rammstedt, B., & John, O. P. (2007). Measuring personality in one minute or less: A 10-item short version of the Big Five Inventory in English and German. Journal of Research in Personality, 41(1), 203-212. https://doi.org/10.1016/j.jrp.2006.02.001

  • Rammstedt, B., Danner, D., Soto, C. J., & John, O. P. (2020). Validation of the short and extra-short forms of the Big Five Inventory-2 (BFI-2) and their German adaptations. European Journal of Psychological Assessment, 36(1), 149-161. https://doi.org/10.1027/1015-5759/a000481

  • Rammstedt, B., Lechner, C. M., & Danner, D. (2024). Beyond literacy: The incremental value of non-cognitive skills [OECD Education Working Papers, No. 311]. OECD Publishing. https://doi.org/10.1787/19939019

  • Rammstedt, B., Roemer, L., & Lechner, C. (2024). Consistency of the structural properties of the BFI-10 across 16 samples from eight large-scale surveys in Germany. European Journal of Psychological Assessment, 40(3), 204-215. https://doi.org/10.1027/1015-5759/a000765

  • Roth, M., Singh, R., & Lechner, C. M. (2024). Chapter 11 - Social-emotional skills [Manuscript in preparation]. Technical Report for PIAAC Cycle 2. OECD.

  • Shchebetenko, S., Kalugin, A. Y., Mishkevich, A. M., Soto, C. J., & John, O. P. (2020). Measurement invariance and sex and age differences of the Big Five Inventory–2: Evidence from the Russian version. Assessment, 27(3), 472-486. https://doi.org/10.1177/1073191119860901

  • Soto, C. J., & John, O. P. (2017a). The next Big Five Inventory (BFI-2): Developing and assessing a hierarchical model with 15 facets to enhance bandwidth, fidelity, and predictive power. Journal of Personality and Social Psychology, 113(1), 117-143. https://doi.org/10.1037/pspp0000096

  • Soto, C. J., & John, O. P. (2017b). Short and extra-short forms of the Big Five Inventory–2: The BFI-2-S and BFI-2-XS. Journal of Research in Personality, 68, 69-81. https://doi.org/10.1016/j.jrp.2017.02.004

  • Vedel, A., Wellnitz, K. B., Ludeke, S., Soto, C. J., John, O. P., & Andersen, S. C. (2021). Development and validation of the Danish Big Five Inventory-2: Domain- and facet-level structure, construct validity, and reliability. European Journal of Psychological Assessment, 37(1), 42-51. https://doi.org/10.1027/1015-5759/a000570