<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article
  PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD with MathML3 v1.2 20190208//EN" "JATS-journalpublishing1-mathml3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ali="http://www.niso.org/schemas/ali/1.0/" article-type="research-article" dtd-version="1.2" xml:lang="en">
<front>
<journal-meta><journal-id journal-id-type="publisher-id">MISS</journal-id><journal-id journal-id-type="nlm-ta">Meas Instrum Soc Sci</journal-id>
<journal-title-group>
<journal-title>Measurement Instruments for the Social Sciences</journal-title><abbrev-journal-title abbrev-type="pubmed">Meas. Instrum. Soc. Sci.</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub">2523-8930</issn>
<publisher><publisher-name>PsychOpen</publisher-name></publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">miss.16869</article-id>
<article-id pub-id-type="doi">10.5964/miss.16869</article-id>
<article-categories>
<subj-group subj-group-type="heading"><subject>Advances in Methodology</subject></subj-group>

<subj-group subj-group-type="badge">
<subject>Data</subject>
	<subject>Materials</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Don’t Pull Any Old Personality Taxonomy From the Shelf: The Performance of Historical and Sample Derived Taxonomies in Extracting Personality Information From Text</article-title>
<alt-title alt-title-type="right-running">Sample Derived Personality Taxonomy</alt-title>
<alt-title specific-use="APA-reference-style" xml:lang="en">Don’t pull any old personality taxonomy from the shelf: The performance of historical and sample derived taxonomies in extracting personality information from text</alt-title>
</title-group>
<contrib-group>
	<contrib contrib-type="author" corresp="yes"><contrib-id contrib-id-type="orcid" authenticated="false">https://orcid.org/0000-0001-5166-0728</contrib-id><name name-style="western"><surname>Karl</surname><given-names>Johannes A.</given-names></name><xref ref-type="corresp" rid="cor1">*</xref><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="aff" rid="aff2"><sup>2</sup></xref></contrib>
<contrib contrib-type="author"><contrib-id contrib-id-type="orcid" authenticated="false">https://orcid.org/0000-0002-3055-3955</contrib-id><name name-style="western"><surname>Fischer</surname><given-names>Ronald</given-names></name><xref ref-type="aff" rid="aff3"><sup>3</sup></xref></contrib>
<contrib contrib-type="editor">
<name>
	<surname>Perugini</surname>
	<given-names>Marco</given-names>
</name>
<xref ref-type="aff" rid="aff4"/>
</contrib>
	
<aff id="aff1"><label>1</label><institution content-type="dept">Department of Psychology</institution>, <institution>University of Zurich</institution>, <addr-line><city>Zurich</city></addr-line>, <country country="CH">Switzerland</country></aff>
<aff id="aff2"><label>2</label><institution content-type="dept">School of Psychology</institution>, <institution>Victoria University of Wellington</institution>, <addr-line><city>Wellington</city></addr-line>, <country country="NZ">New Zealand</country></aff>
	
	<aff id="aff3"><label>3</label><institution content-type="dept">Cognitive Neuroscience and Neuroinformatics Unit</institution>, <institution>Institute D’Or for Research and Education</institution>, <addr-line><city>São Paulo</city></addr-line>, <country country="BR">Brazil</country></aff>
	
	<aff id="aff4">University of Milan Bicocca, Milan, <country>Italy</country></aff>
	
	
</contrib-group>
<author-notes>
	<corresp id="cor1"><label>*</label>Binzmuehlestrasse 14, 8050 Zurich, Switzerland. <email xlink:href="Johannes.karl@psychologie.uzh.ch">Johannes.karl@psychologie.uzh.ch</email></corresp>
</author-notes>
<pub-date date-type="pub" publication-format="electronic"><day>16</day><month>04</month><year>2026</year></pub-date>
<pub-date pub-type="collection" publication-format="electronic"><year>2026</year></pub-date>
<volume>8</volume>
<elocation-id>e16869</elocation-id>
<history>
<date date-type="received">
<day>02</day>
<month>02</month>
<year>2025</year>
</date>
<date date-type="accepted">
<day>15</day>
<month>01</month>
<year>2026</year>
</date>
</history>
<permissions><copyright-year>2026</copyright-year><copyright-holder>Karl &amp; Fischer</copyright-holder><license license-type="open-access" specific-use="CC BY 4.0" xlink:href="https://creativecommons.org/licenses/by/4.0/"><ali:license_ref>https://creativecommons.org/licenses/by/4.0/</ali:license_ref><license-p>This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License, CC BY 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p></license></permissions>
<abstract>
<p>Substantial efforts have been made to develop comprehensive taxonomies of personality traits in many languages. Nevertheless, given that what is important and salient in individuals’ lived experience might be changing over time, this raises the question about the long-term usefulness of ‘off-the-shelf’ taxonomies developed decades ago. In the current study we used a bottom-up approach to create a large sample-specific taxonomy of personality terms. We subsequently examined the overlap and sensitivity of this taxonomy compared to an established trait taxonomy in the same language. Overall, we found that the two taxonomies only showed limited overlap with a pronounced divergence in emotionality (Neuroticism) and social aspects (Agreeableness) of personality. In addition to this, we found that while the personality assessment extracted from self-descriptions using the established taxonomy showed alignment with participants’ self-rated personality, especially Extraversion, Agreeableness, and Neuroticism, the sample-specific taxonomy showed a significantly greater alignment between text-based and self-rated personality. In summary, our current study highlights the need to extend our thinking about the psycholexical hypothesis, moving away from assumptions of time invariant language encoding to more explicitly recognizing temporal and sample-specific dynamics underpinning the expression and use of personality trait terms.</p>
</abstract>
<kwd-group kwd-group-type="author"><kwd>lexical hypothesis</kwd><kwd>text-based personality assessment</kwd><kwd>text mining</kwd><kwd>sample specific taxonomy</kwd><kwd>Big Five</kwd></kwd-group>

</article-meta>
</front>
<body>
	<sec sec-type="intro"><title/>
<p>In 1884, Francis <xref ref-type="bibr" rid="r28">Galton (1949)</xref> famously asked: ‘Can we discover landmarks of character to serve as bases for a survey, or is it altogether too indefinite and fluctuating to admit of measurement?’ (pp. 179–180). Galton argued that relevant moral faculties are ‘so intermixed that they are never singly in action’ (p. 181), yet he held that it is possible to identify the most ‘conspicuous aspects of the character’. To do so, Galton examined many pages of Roget’s Thesaurus and estimated that it contained fully one thousand words expressive of character. With this casual statement, he started an active field of inquiry known today as the psycholexical hypothesis (<xref ref-type="bibr" rid="r4">Ashton, Lee, Perugini, et al., 2004</xref>). There is a general consensus that a small number of factors can be used to describe human personality (<xref ref-type="bibr" rid="r2">Ashton &amp; Lee, 2005</xref>; <xref ref-type="bibr" rid="r15">De Raad et al., 2010</xref>; <xref ref-type="bibr" rid="r34">Goldberg, 1993</xref>; <xref ref-type="bibr" rid="r51">Saucier et al., 2014</xref>). One of the core assumptions is that human communities will encode salient and important information about individual traits and character features in single terms in each language. Based on this assumption, taxonomies have been created that can be used to ask respondents to rate targets (<xref ref-type="bibr" rid="r1">Allport &amp; Odbert, 1936</xref>; <xref ref-type="bibr" rid="r3">Ashton, Lee, &amp; Goldberg, 2004</xref>; <xref ref-type="bibr" rid="r33">Goldberg, 1992</xref>; <xref ref-type="bibr" rid="r46">Norman, 1967</xref>; <xref ref-type="bibr" rid="r48">Saucier, 1994</xref>). However, language is dynamic: the semantic content of words and the co-associations of individual words change over time (<xref ref-type="bibr" rid="r59">Xu et al., 2021</xref>). 
Consequently, it is important to ask whether taxonomies developed at some point in time with specific communities retain their usefulness over time. This is particularly important and interesting in the current social media environment with easier and wider access to user-created text that could be analysed with taxonomies as an unobtrusive measure of personality assessment (<xref ref-type="bibr" rid="r11">Boyd &amp; Pennebaker, 2017</xref>; <xref ref-type="bibr" rid="r56">Suedfeld et al., 2011</xref>). Yet, this presumes a relative time-invariance of the taxonomies, an assumption which requires examination. We report the development of a theory-driven bottom-up English taxonomy in one specific sample of native English speakers and compare self-ratings based on this sample-specific taxonomy with both a commonly used off-the-shelf taxonomy and survey-based self-ratings.</p>
<sec sec-type="other1"><title>Psychological Taxonomies to Capture Personality Traits</title>
	<p>The first comprehensive taxonomy in English was developed by <xref ref-type="bibr" rid="r1">Allport and Odbert (1936)</xref>. <xref ref-type="bibr" rid="r46">Norman (1967</xref>) empirically identified a five-factor structure based on a reduced list of taxonomy-derived ratings. Over time a consensus emerged that five or six factors are sufficient to describe the main variability underlying both self- and other-ratings (<xref ref-type="bibr" rid="r3">Ashton, Lee, &amp; Goldberg, 2004</xref>; <xref ref-type="bibr" rid="r49">Saucier &amp; Goldberg, 1996</xref>). The most extensive of these taxonomies is the list of 1,710 personality-descriptive adjectives compiled by <xref ref-type="bibr" rid="r32">Goldberg (1982)</xref>, which has been a foundation for a number of factor-analytical studies (<xref ref-type="bibr" rid="r3">Ashton, Lee, &amp; Goldberg, 2004</xref>). This list was given to undergraduate students in the US and Australia (total <italic>N</italic> = 310). Although initial analyses suggested up to seven factors, the five- and six-factor solutions have been most widely used, and the highest-loading terms have been used as empirical markers for personality traits (<xref ref-type="bibr" rid="r57">Thalmayer et al., 2021</xref>; but see also <xref ref-type="bibr" rid="r50">Saucier &amp; Iurino, 2020</xref>). Our work is guided by the five-factor solution differentiating Conscientiousness, Agreeableness, Neuroticism, Openness/Intellect and Extraversion (<xref ref-type="bibr" rid="r34">Goldberg, 1993</xref>; <xref ref-type="bibr" rid="r43">McCrae &amp; John, 1992</xref>). We prefer this structure for our purposes because it is more parsimonious in describing the higher-order structure of broad personality domains and because the sixth factor (Honesty-Humility) tends to split off from within a broader Agreeableness factor (<xref ref-type="bibr" rid="r15">De Raad et al., 2010</xref>, <xref ref-type="bibr" rid="r16">2014</xref>).</p></sec>
<sec sec-type="other2"><title>Language and Semantic Change</title>
<p>One interesting question is whether the factor structure of this taxonomy has remained stable over the last 40 years and across samples. Emotion research offers some hints (<xref ref-type="bibr" rid="r59">Xu et al., 2021</xref>): emotion terms do change in their semantic meaning, as indicated by changing co-word associations in naturally occurring text over the last 100 years. Why should we be concerned with such changes? First, to the extent that personality traits have some biological foundation (<xref ref-type="bibr" rid="r17">DeYoung, 2014</xref>; <xref ref-type="bibr" rid="r42">McAdams &amp; Pals, 2006</xref>), we should expect stability over time. At the same time, what we communicate as important information, and how we communicate it, is subject to cultural modification and transformation encoded in language (<xref ref-type="bibr" rid="r7">Bernardes et al., 2025</xref>; <xref ref-type="bibr" rid="r13">Christiansen &amp; Chater, 2016</xref>). This argument is compatible with both cross-cultural and anthropological research suggesting that information is conveyed in locally relevant (and thereby temporally bound) ways, which could result in changed factor structures. Such dynamics are relevant for any sample-specific structure, including factor-analytic models, which yield sample-specific factor solutions. Therefore, the question of replicability of such structures across samples and time periods can provide important insights into the time- and sample-variant and invariant components of personality structure.</p>
<p>Second, with the increasing availability of text via social media that could be used for personality assessment at a distance (<xref ref-type="bibr" rid="r19">Eichstaedt et al., 2021</xref>) and the emergence of large language models and chatbots (<xref ref-type="bibr" rid="r14">Cutler &amp; Condon, 2023</xref>; <xref ref-type="bibr" rid="r25">Fischer et al., 2023</xref>), one promising approach has been to rely on a bottom-up analysis of text and then correlate individual terms or combinations of terms with self-rated personality traits (<xref ref-type="bibr" rid="r11">Boyd &amp; Pennebaker, 2017</xref>). For example, the open vocabulary approach has mapped word usage in Facebook status updates to personality self-ratings (<xref ref-type="bibr" rid="r39">Kern et al., 2014</xref>). This requires identification of relevant terms. Language as a communication tool is group and age specific, with slang and ideographic word use serving as identity badges that demarcate group membership along social and age-specific boundaries (<xref ref-type="bibr" rid="r47">Nortier &amp; Svendsen, 2015</xref>). As standard survey development exercises continue to be informed by taxonomies within the lexical tradition (<xref ref-type="bibr" rid="r57">Thalmayer et al., 2021</xref>), it is important to study which personality terms are used by individuals from a specific sample to prevent incorrect results or conclusions.</p>
<p>Our interest is in identifying terms that our participants consensually use and understand to convey personality trait relevant information. We used definitions of the Big Five and asked participants to think of terms that they may use when describing an individual that is high or low on that particular trait. By using this approach, we use an explicit elicitation strategy which is transparent and participant driven and therefore, bottom up. Only terms that are salient for describing an individual with those theoretically meaningful characteristics are likely to be produced. Furthermore, by triangulating the word usage across our sample, we gain insights into the relative distribution of terms in this specific sample. Although the use of person-derived terms may seem anachronistic in times of Large Language Models and machine-learning approaches to natural text for extracting possible personality markers (<xref ref-type="bibr" rid="r30">Giannini et al., 2024</xref>), we believe that the participant-driven approach is a distinct strength over these computational methods. Generally, machine learning and transformer-based approaches need to be trained on specific corpora and rely on a number of unexamined assumptions about the stability and representativeness of the training text (<xref ref-type="bibr" rid="r6">Bender et al., 2021</xref>; <xref ref-type="bibr" rid="r37">Hu et al., 2025</xref>; <xref ref-type="bibr" rid="r44">Mehrabi et al., 2021</xref>), turning them into virtual ‘black-boxes’ that reduce transparency and replicability. For example, to what extent are descriptions of venues good proxies of personality descriptions, unless we want to make certain assumptions about how humans describe both other humans and venues (see <xref ref-type="bibr" rid="r30">Giannini et al., 2024</xref>)? 
The proliferation of training data derived from popular models such as ChatGPT also raises the risk of deterioration of signal (<xref ref-type="bibr" rid="r53">Shumailov et al., 2024</xref>). Using human-derived data with explicit instructions and verifying the consensus and overlap between participants provides a transparent option for creating a list of terms that participants use to describe each other. Furthermore, the black-box nature of transformer-based models makes it difficult to study semantic drift over time, given that how scores are calculated is often neither easily understood nor comparable. Therefore, once replicated across samples and across time periods, our method offers a distinct advantage for more fine-grained contextual analyses.</p>
	<p>To evaluate how relevant those terms are, we utilized an open writing task in which participants had to describe themselves. This task allows us to compare the performance of our sample-specific taxonomy with the published taxonomy. We extracted terms from these self-descriptions and mapped them to a) our bottom-up theory-driven taxonomy and b) the taxonomy by Ashton and colleagues. We also compared the relative correlation of these two text-based scores with each other and with self-ratings using a standard psychology questionnaire (<xref ref-type="bibr" rid="r55">Soto &amp; John, 2017</xref>). Considering possible temporal change, we also examined overlap in these taxonomies—what terms are used by our sample when describing individuals high or low on a personality trait and how well are they captured by classic taxonomies developed roughly 40 years ago.</p></sec></sec>
<sec sec-type="methods"><title>Method</title>
<sec sec-type="subjects"><title>Participants</title>
<p>Our sample consisted of 317 participants who took part in exchange for course credit (mean age = 19.22 years, <italic>SD</italic> = 3.08; 77.9% female). The sample size was determined by logistical constraints of running the study within the context of a university degree. Our actual sample size allowed for a minimum detectable correlation (80% power, α = .05) of <italic>r</italic> = .14. Our study was open to self-enrolment by the target population until a pre-specified cut-off date. All de-identified data are available on the Open Science Framework (<ext-link ext-link-type="uri" xlink:href="https://osf.io/hn69f/overview">https://osf.io/hn69f/overview</ext-link>). The personal narratives of participants were removed due to ethical considerations of anonymity.</p></sec>
<sec><title>Measures</title>
<sec><title>BFI-2</title>
<p>We used the BFI-2 to assess personality (<xref ref-type="bibr" rid="r55">Soto &amp; John, 2017</xref>). The overall scale had 60 items, and participants reported their agreement with each item on a Likert scale from 1 (Disagree strongly) to 5 (Agree strongly). Example items were “I am someone who is outgoing, sociable” and “I am someone who is compassionate, has a soft heart”. All dimensions showed high reliability: ω<sub>Extraversion</sub> = .849 [.826, .872], ω<sub>Agreeableness</sub> = .828 [.802, .854], ω<sub>Conscientiousness</sub> = .850 [.828, .873], ω<sub>Neuroticism</sub> = .909 [.895, .922], ω<sub>Openness</sub> = .817 [.790, .845].</p></sec>
<sec><title>Self-Description</title>
<p>Participants were prompted with the following statement for a self-description: “We would like to ask you to describe yourself in 500 words (this is roughly a single page or 2000 characters). Who are you as a person?” The average response was 1853.09 characters long (<italic>SD</italic> = 182.83, min = 1301, max = 2000). This self-description task was presented in a counterbalanced fashion with the BFI-2 across participants.</p></sec>
<sec><title>Personally Relevant Personality Terms</title>
<p>To create a sample-level taxonomy, participants were finally asked, for each of the five personality factors, to submit 10 terms (5 positive and 5 negative) that they would use to label a person either high or low on the trait. These trait descriptions were based on definitions and descriptions of the Big Five in the literature (<xref ref-type="bibr" rid="r8">Bernardes et al., 2022</xref>; <xref ref-type="bibr" rid="r17">DeYoung, 2014</xref>; <xref ref-type="bibr" rid="r21">Fischer, 2017</xref>; <xref ref-type="bibr" rid="r55">Soto &amp; John, 2017</xref>). For example, for Extraversion participants were prompted: “Persons with high scores on Extraversion tend to be sociable and energetic in social interactions, they get a lot of energy out of being with others. What words would you use to describe such individuals to your friends?”. This task was always presented last. Overall, participants provided 3900 unique personality terms. We excluded terms with fewer than two characters or a frequency below three. This filtering resulted in a list of 703 unique terms. Because a participant could nominate a term for multiple categories, and different participants could nominate the same term for different categories, we assigned each term to the category for which it was most frequently mentioned. We dropped terms with equal nominations across dimensions. Our final taxonomy consisted of 671 terms that were commonly mentioned and clearly attributable to one of the five factors of personality. We show the full taxonomy in the supplementary material in STable 1. We show the terms excluded due to non-distinguishable double-nominations in STable 2. Terms were roughly evenly split between positive (<italic>N</italic> = 328) and negative terms (<italic>N</italic> = 354). 
Within traits, however, the distribution of positive and negative terms differed: participants provided significantly more negative Agreeableness and negative Openness terms (χ<sup>2</sup>(4) = 13.21, <italic>p</italic> &lt; .010; see <xref ref-type="table" rid="t1">Table 1</xref>).</p>
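<p>The filtering and assignment rules described above can be sketched in Python. This is an illustration only and not the analysis code used in this study: the function name <monospace>assign_terms</monospace>, the category labels such as <monospace>E+</monospace>, and the example nominations are hypothetical. The sketch also assumes that the frequency threshold refers to a term’s total number of nominations and that a tie means equal counts for its most frequently nominated categories.</p>
<code language="python"><![CDATA[
```python
from collections import Counter, defaultdict


def assign_terms(nominations, min_length=2, min_frequency=3):
    """Assign each nominated term to its most frequently nominated category.

    nominations: iterable of (term, category) pairs, one pair per nomination.
    Returns (assigned, ambiguous): a dict term -> category, and a sorted list
    of terms dropped because their top categories were tied.
    """
    counts = defaultdict(Counter)
    for term, category in nominations:
        counts[term.strip().lower()][category] += 1

    assigned, ambiguous = {}, []
    for term, by_category in counts.items():
        # Exclusion rules from the text: fewer than two characters,
        # or nominated fewer than three times (assumed: in total).
        if len(term) < min_length or sum(by_category.values()) < min_frequency:
            continue
        ranked = by_category.most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            ambiguous.append(term)  # equal nominations across categories
        else:
            assigned[term] = ranked[0][0]
    return assigned, sorted(ambiguous)


# Hypothetical nominations, for illustration only.
nominations = (
    [("outgoing", "E+")] * 4 + [("outgoing", "A+")]
    + [("kind", "A+")] * 2 + [("kind", "E+")] * 2
    + [("rare", "C+")] * 2
    + [("x", "O+")] * 5
)
assigned, ambiguous = assign_terms(nominations)
print(assigned)   # {'outgoing': 'E+'}
print(ambiguous)  # ['kind']
```
]]></code>
<p>In this toy input, <monospace>rare</monospace> is dropped for low frequency, <monospace>x</monospace> for length, and <monospace>kind</monospace> because its two categories are tied.</p>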
</sec>
<sec><title>Existing Personality Taxonomy</title>
<p>We used the 1710 taxonomy (<xref ref-type="bibr" rid="r3">Ashton, Lee, &amp; Goldberg, 2004</xref>) as a starting point but retained only trait terms that loaded unambiguously in the original study (loadings &gt; .30 and cross-loadings &lt; .20). This resulted in 405 terms. Exploratory analyses with larger word sets (which included more cross-loading terms and lower loading terms) did not substantively change the performance of this taxonomy (see footnote 2). In the final version used here, these terms were roughly evenly split between positive (<italic>N</italic> = 198) and negative terms (<italic>N</italic> = 207). Positive and negative terms were also similarly distributed within traits (χ<sup>2</sup>(4) = 3.496, <italic>p</italic> = .479; see <xref ref-type="table" rid="t1">Table 1</xref>). Importantly, this taxonomy had substantially fewer Openness and Neuroticism terms than terms for the other traits (see <xref ref-type="table" rid="t1">Table 1</xref>).</p>
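<p>The χ<sup>2</sup> statistics for the direction-by-trait distributions can be recomputed from the counts in <xref ref-type="table" rid="t1">Table 1</xref> with a Pearson chi-square test of independence. The Python sketch below is a plain illustration (the function name is hypothetical, and it is not the analysis code used in this study); applied to the two 2 × 5 blocks of Table 1, it should return statistics close to the reported values, each with 4 degrees of freedom.</p>
<code language="python"><![CDATA[
```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a 2D count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of direction and trait.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Counts from Table 1; rows are Negative and Positive, columns are A, C, E, N, O.
sample_derived = [[92, 66, 53, 60, 83], [58, 75, 59, 76, 60]]
historical_1710 = [[57, 53, 51, 35, 11], [48, 63, 52, 24, 11]]

stat_sample, df_sample = chi_square_independence(sample_derived)
stat_hist, df_hist = chi_square_independence(historical_1710)
print(round(stat_sample, 2), df_sample)
print(round(stat_hist, 3), df_hist)
```
]]></code>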
	<table-wrap id="t1" position="anchor" orientation="portrait">
		<label>Table 1</label><caption><title>Terms in Each Taxonomy by Positive and Negative Direction</title></caption><graphic xlink:href="miss.16869-t1" position="anchor" orientation="portrait"/>
<table frame="hsides" rules="groups">
<col width="33.05%" align="left"/>
<col width="13.39%"/>
<col width="13.39%"/>
<col width="13.39%"/>
<col width="13.39%"/>
<col width="13.39%"/>
<thead>
<tr>
<th>Direction</th>
<th>A</th>
<th>C</th>
<th>E</th>
<th>N</th>
<th>O</th>
</tr>
</thead>
<tbody>
<tr>
<th colspan="6">Sample Derived</th>
</tr>
<tr>
<td style="indent">Negative</td>
<td>92</td>
<td>66</td>
<td>53</td>
<td>60</td>
<td>83</td>
</tr>
<tr>
	<td style="indent">Positive</td>
<td>58</td>
<td>75</td>
<td>59</td>
<td>76</td>
<td>60</td>
</tr>
<tr>
<th colspan="6">Historical 1710</th>
</tr>
<tr>
	<td style="indent">Negative</td>
<td>57</td>
<td>53</td>
<td>51</td>
<td>35</td>
<td>11</td>
</tr>
<tr>
	<td style="indent">Positive</td>
<td>48</td>
<td>63</td>
<td>52</td>
<td>24</td>
<td>11</td>
</tr>
</tbody>
</table>
</table-wrap></sec></sec>
<sec><title>Extraction of Term-Based Personality From Text</title>
<p>To extract the personality data from text, we first created a term list for each of the two taxonomies using the <italic>quanteda</italic> package. Prior to extraction, we removed punctuation, numbers, symbols, and common English stopwords, and converted all words to lowercase to allow for unambiguous matching. For each participant, we extracted the total number of words used and the number of personality terms matched in each taxonomy. To increase comparability across participants, we normalized each personality score by dividing it by the participant’s total number of words written and centred the score around the participant’s mean usage of personality terms.</p></sec></sec>
<sec sec-type="results"><title>Results</title>
<sec><title>Overlap of Taxonomy Terms</title>
	<p>We first examined the shared terms between our sample specific taxonomy and the off-the-shelf taxonomy. Overall, we found that the taxonomies had an overlap of 19.75%. The taxonomies had the greatest overlap for Openness (27.27%), Conscientiousness (23.28%), and Extraversion (21.36%), but we found a lower overlap for Neuroticism (15.25%) and Agreeableness (15.24%). We show the overlapping terms in Supplementary Table 3 (see <xref ref-type="bibr" rid="sp1_r2">Karl &amp; Fischer, 2026</xref>).</p></sec>
<sec><title>Overlap of Extracted Personality Between Taxonomies</title>
<p>To examine the overlap in extracted personality between the taxonomies, we correlated participants’ scores between the taxonomies for each dimension and term direction. On average the taxonomies correlated at <italic>r</italic> = .28, and scores were significantly positively correlated across the taxonomies except for negative Neuroticism (we show all correlations in <xref ref-type="fig" rid="f1">Figure 1</xref>; correlation tables are available on the OSF). While some dimensions such as Extraversion showed a substantial correlation (<italic>r</italic> &gt; .50) for both positively and negatively valenced terms, others such as Openness showed a smaller correlation. For Neuroticism, positively valenced terms correlated quite strongly, whereas negatively valenced terms showed virtually no correlation. Taken together, these patterns imply that the extracted personality differed substantially across the taxonomies, which might be due to the terms not shared between the taxonomies. Similar taxonomy-based effects have been reported previously (<xref ref-type="bibr" rid="r7">Bernardes et al., 2025</xref>; <xref ref-type="bibr" rid="r24">Fischer et al., 2020</xref>). In other words, the terms included in taxonomies are idiosyncratic, and the choice of taxonomy may result in different patterns for the same data set.</p>
	
	<fig id="f1" position="anchor" fig-type="figure" orientation="portrait"><label>Figure 1</label><caption>
			<title>Correlation Between Sample-Derived Scores and Historical 1710 Taxonomy Scores</title></caption><graphic xlink:href="miss.16869-f1" position="anchor" orientation="portrait"/></fig>
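<p>The scores correlated here are dictionary match counts normalised by text length, as described under Extraction of Term-Based Personality From Text. A minimal Python sketch of this kind of scoring follows. It does not reproduce the <italic>quanteda</italic> pipeline used in this study: the tokeniser, the tiny stopword list, the function names, and the example taxonomy are hypothetical. It further assumes that the word total is taken after stopword removal and that centring means subtracting a participant’s mean rate across categories, which is only one possible reading of the procedure. Multi-word and hyphenated terms are not handled.</p>
<code language="python"><![CDATA[
```python
import re

# Illustrative stopword list only; a real analysis would use a full list.
STOPWORDS = {"i", "am", "a", "an", "and", "the", "to", "of", "but", "can", "be"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [token for token in tokens if token not in STOPWORDS]


def score_text(text, taxonomy):
    """Return centred, length-normalised match rates per category.

    taxonomy: dict mapping a category label to a set of lowercase terms.
    """
    tokens = tokenize(text)
    if not tokens:
        return {category: 0.0 for category in taxonomy}
    # Matches per category divided by the number of words written.
    rates = {
        category: sum(token in terms for token in tokens) / len(tokens)
        for category, terms in taxonomy.items()
    }
    # Centre on this participant's mean rate across categories (assumption).
    mean_rate = sum(rates.values()) / len(rates)
    return {category: rate - mean_rate for category, rate in rates.items()}


# Hypothetical two-category taxonomy and self-description.
taxonomy = {"E+": {"outgoing", "talkative"}, "E-": {"quiet", "shy"}}
scores = score_text("I am outgoing and talkative, but I can be quiet.", taxonomy)
print(scores)
```
]]></code>
<p>Scoring the same texts with two taxonomies and correlating the resulting per-participant scores category by category gives the kind of comparison summarised in <xref ref-type="fig" rid="f1">Figure 1</xref>.</p>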

</sec>
<sec><title>Self-Report — Text-Based Personality Congruence</title>
	<p>To examine the similarity of self-ratings and text-based personality assessment, we examined the correlation between participants’ text-based personality scores under each taxonomy and their self-ratings on the BFI-2. For ease of interpretation, we report these correlations separately for positive and negative terms. To test whether the correlations differed between the taxonomies, we used <xref ref-type="bibr" rid="r60">Hittner et al.’s (2003</xref>) procedure, a modified Z-test designed to test whether two correlation coefficients derived from the same sample differ significantly from one another. This test is necessary when comparing dependent correlations because the correlations share a common variable (here, the self-rating), violating the independence assumption required for standard correlation comparison tests. The procedure accounts for the intercorrelation between the variables being compared, adjusting the standard error to reflect the dependency structure in the data. This approach provides a more accurate assessment of whether observed differences in correlation magnitudes are statistically meaningful rather than due to chance.</p>
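<p>A minimal Python sketch of such a test for two overlapping dependent correlations follows. It uses the form of the statistic commonly attributed to Hittner et al. (2003): both correlations are Fisher <italic>z</italic>-transformed, and the covariance term is evaluated at the back-transformed average <italic>z</italic>. This form is an assumption on our part and is not taken from the analysis code of this study. The function and argument names are hypothetical: <monospace>r_jk</monospace> and <monospace>r_jh</monospace> are the correlations of the shared variable (the self-rating) with each text-based score, <monospace>r_kh</monospace> is the correlation between the two text-based scores, and the value of .30 used for it in the example is invented for illustration.</p>
<code language="python"><![CDATA[
```python
import math


def dependent_overlapping_z(r_jk, r_jh, r_kh, n):
    """Modified z-test for two correlations that share variable j.

    Returns (z, two-sided p). Assumed form: Fisher z difference with the
    covariance term computed from the back-transformed average z.
    """
    z_jk, z_jh = math.atanh(r_jk), math.atanh(r_jh)
    r_bar = math.tanh((z_jk + z_jh) / 2)  # back-transformed average correlation
    covariance = (
        r_kh * (1 - 2 * r_bar**2)
        - 0.5 * r_bar**2 * (1 - 2 * r_bar**2 - r_kh**2)
    ) / (1 - r_bar**2) ** 2
    z = (z_jk - z_jh) * math.sqrt(n - 3) / math.sqrt(2 - 2 * covariance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# r_jk and r_jh echo the positive Conscientiousness row of Table 2;
# r_kh = .30 is a hypothetical value chosen only for illustration.
z, p = dependent_overlapping_z(r_jk=0.18, r_jh=0.04, r_kh=0.30, n=317)
print(round(z, 3), round(p, 3))
```
]]></code>
<p>The larger the correlation between the two text-based scores, the smaller the standard error of the difference, so the same gap between two correlations yields a larger test statistic.</p>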
<p>As can be seen in <xref ref-type="table" rid="t2">Table 2</xref>, the off-the-shelf taxonomy showed small to medium correlations with self-rated personality (Mean<sub>positive terms</sub>: .124, range: .04 to .21; Mean<sub>negative terms</sub>: -.122, range: -.31 to .02). Descriptively, the sample-specific taxonomy showed higher correlations for positive terms (Mean<sub>positive terms</sub>: .194, range: .11 to .24) and similar correlations for negative terms (Mean<sub>negative terms</sub>: -.118, range: -.27 to -.02). The correlations between sample-specific taxonomy scores and self-ratings differed significantly from the correlations between off-the-shelf taxonomy scores and self-ratings for positive C terms and positive O terms (all <italic>p</italic> &lt; .05).</p>
<table-wrap id="t2" position="anchor" orientation="portrait">
<label>Table 2</label><caption><title>Correlation of BFI Self-Ratings With Sample Derived or Historical Positive and Negative Terms in the Text-Based Extraction</title></caption>
<table frame="hsides" rules="groups" style="compact-1">
<col width="12%" align="left"/>
<col width="13%"/>
<col width="13%"/>
<col width="15%"/>
<col width="15%"/>
<col width="16%"/>
<col width="16%"/>
<thead>
<tr>
<th valign="bottom">Trait (self-report)</th>
	<th valign="bottom">Sample-Derived Positive</th>
	<th valign="bottom">Sample-Derived Negative</th>
	<th valign="bottom">Historical 1710 Positive</th>
	<th valign="bottom">Historical 1710 Negative</th>
	<th valign="bottom">Sample-Derived Positive (Reduced)</th>
	<th valign="bottom">Sample-Derived Negative (Reduced)</th>
</tr>
</thead>
<tbody>
<tr>
<td>E</td>
<td align="char" char=".">0.24***</td>
<td align="char" char=".">-0.27***</td>
<td align="char" char=".">0.21***</td>
<td align="char" char=".">-0.31***</td>
<td align="char" char=".">0.28***</td>
<td align="char" char=".">-0.27***</td>
</tr>
<tr>
<td>A</td>
<td align="char" char=".">0.11*</td>
<td align="char" char=".">-0.12*</td>
<td align="char" char=".">0.14**</td>
<td align="char" char=".">-0.18***</td>
<td align="char" char=".">0.11*</td>
<td align="char" char=".">-0.13**</td>
</tr>
<tr>
<td>C</td>
<td align="char" char="."><bold>0.18***</bold></td>
<td align="char" char=".">-0.10*</td>
<td align="char" char=".">0.04</td>
<td align="char" char=".">-0.09</td>
<td align="char" char=".">0.17**</td>
<td align="char" char=".">-0.10*</td>
</tr>
<tr>
<td>N</td>
<td align="char" char=".">0.20***</td>
<td align="char" char=".">-0.08</td>
<td align="char" char=".">0.17**</td>
<td align="char" char=".">-0.05</td>
<td align="char" char=".">0.17**</td>
<td align="char" char=".">-0.06</td>
</tr>
<tr>
<td>O</td>
<td align="char" char="."><bold>0.24***</bold></td>
<td align="char" char=".">-0.02</td>
<td align="char" char=".">0.06</td>
<td align="char" char=".">0.02</td>
<td>0.16** <sup>a</sup></td>
<td align="char" char=".">-0.03</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Note.</italic> Correlations in bold significantly differ at <italic>p</italic> &lt; .05 between the sample derived and historical taxonomies. Columns marked ‘reduced’ exclude terms that can be found in the researcher provided trait descriptions.</p>
<p><sup>a</sup> indicates significant differences from the original term-self report correlation.</p>
	<p>***<italic>p</italic> &lt; .001. **<italic>p</italic> &lt; .01. *<italic>p</italic> &lt; .05.</p>
</table-wrap-foot>
</table-wrap>
<p>As a final analysis, we compared the overall pattern of correlations with the self-report scores across positive and negative terms for each taxonomy. The overall correlation between the patterns was <italic>r</italic> = .91. This suggests that the taxonomies’ correlation patterns with self-ratings were highly similar, pointing towards problems with specific traits rather than overall non-comparability. In this regard, it was interesting to note that positive terms showed a greater tendency to pick up participants’ self-rated personality.<xref ref-type="fn" rid="fn1"><sup>1</sup></xref><fn id="fn1"><label>1</label>
<p>We explored the difference between the full 1710 historical dictionary and our cleaned version. Overall, compared with the cleaned version, the positive terms of the full 1710 historical dictionary showed lower correlations with self-rated personality for Extraversion (<italic>r</italic> = .11, <italic>p</italic> &lt; .05), Agreeableness (<italic>r</italic> = .10, <italic>p</italic> = .07), Conscientiousness (<italic>r</italic> = .04, <italic>p</italic> = .39), and Openness (<italic>r</italic> = .01, <italic>p</italic> = .79), but a higher correlation for Neuroticism (<italic>r</italic> = .19, <italic>p</italic> &lt; .001).</p></fn> Only E showed medium-sized correlations with self-reports for both positively and negatively valenced terms in both taxonomies. In contrast, N and O showed essentially zero correlations for the negative pole.</p></sec>
<sec><title>Robustness Checks</title>
<p>Half of the participants saw the Big Five measure before the free self-description task, which may have influenced their responses to either measure or the terms they subsequently nominated. To address this potential impact, we conducted five separate analyses. First, we compared how frequently shared terms were nominated by participants who completed the free-text task before the Big Five measure and by those who completed it afterwards. We computed a rank-order correlation and found a significant correlation of .96, <italic>p</italic> &lt; .001, indicating a high degree of similarity in the frequency with which shared terms were nominated. Second, when using all nominated terms and computing the similarity across the imbalanced set, we found a Jaccard similarity of 58.56%, indicating a substantial overlap in the specific terms nominated (even when allowing for rare terms).</p>
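<p>For reference, and assuming the conventional definition of the Jaccard index was applied (the set notation is introduced here purely for illustration), let <italic>A</italic> and <italic>B</italic> denote the sets of unique terms nominated by participants who saw the Big Five measure before and after the free self-description task, respectively. The similarity is then the share of all distinct terms that appear in both sets: <disp-formula><tex-math>J(A, B) = \frac{|A \cap B|}{|A \cup B|}</tex-math></disp-formula> Under this definition, a value of 58.56% would mean that slightly more than half of all distinct terms nominated in either group were nominated in both.</p>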
<p>To examine whether, in the subsequent trait nomination, participants merely replicated terms from their self-description or the Big Five measure, we ran two additional analyses. First, to check that participants’ self-provided personality terms did not overlap excessively with their self-descriptions, we extracted the ratio of terms nominated by participants that could also be found in their self-description. On average, the overlap between self-provided terms and terms used in their description was 2.66 terms, whereas the overlap with all terms provided by participants was 13.28 terms. This indicates that participants were substantially more likely to use terms in their self-description that were not among the terms they nominated later. Finally, to examine the possibility that participants were primed by our trait description to use specific terms, we examined the effect of a term being present in the researcher-provided descriptor on how often it was nominated by participants. Overall, we found that terms in the descriptor were nominated more often, with an incidence rate ratio of 4.19 (95% CI [3.98, 4.42], <italic>p</italic> &lt; .001), suggesting that terms from existing personality descriptor lists were nominated approximately four times more frequently by participants than novel terms. Nevertheless, it is difficult to conclude whether this is due to the prototypicality of the selected terms or to general priming. Therefore, to examine the robustness of our analysis to the exclusion of the terms in our trait description, we reran the analysis excluding terms that could be found in the description of the trait (<xref ref-type="table" rid="t2">Table 2</xref>). We found only one significant difference: the relationship between personality ratings based on extracted terms and BFI self-reports was weaker for positive Openness, but the correlation was still substantially higher than when using the 1,710 terms.</p></sec></sec>
<sec sec-type="discussion"><title>Discussion</title>
	<p>One of the major questions motivating the current research was whether sample-specific taxonomies of personality capture participants’ personality, as benchmarked against self-reports, better than established off-the-shelf taxonomies. Overall, our study shows that sample-specific taxonomies outperform off-the-shelf taxonomies in capturing participants’ personality, especially for Conscientiousness and Openness. This is not to say that off-the-shelf taxonomies are not a valuable research tool, especially if no sample-based taxonomy can be created for logistical reasons (e.g., all members of the study population are deceased).</p>
<p>Our results nevertheless point to a number of challenges in this broad area going forward. The correlation between personality scores extracted from text using previously published taxonomies and sample-specific taxonomies was modest on average (<italic>r</italic> = .28), although this corresponds to a moderate effect size for individual difference research (<xref ref-type="bibr" rid="r31">Gignac &amp; Szodorai, 2016</xref>). This may be somewhat disappointing but is probably not surprising considering that the overlap in terms across taxonomies was less than 20% across all traits. Furthermore, the correlations between self-ratings using standard survey inventories and text-based scores were again low, with a slight advantage for sample-specific taxonomies. These patterns raise questions about (a) whether self-reports using surveys and text-based scores provide complementary or distinct information, (b) which language terms within text may be most indicative of personality traits, (c) whether some traits are better detectable via text and others via self- (or other) reports, and (d) broader concerns about determining the ground truth in relation to human personality (<xref ref-type="bibr" rid="r10">Boyd et al., 2020</xref>; <xref ref-type="bibr" rid="r11">Boyd &amp; Pennebaker, 2017</xref>). We will selectively discuss some of these issues next.</p>
<p>Some specific patterns stood out and may speak to the four questions just mentioned: negative Openness and Neuroticism descriptors showed very low correlations with self-reports. These low correlations contrast with the comparatively high correlations for negative Extraversion. This pattern raises a few intriguing possibilities. Firstly, in lexical approaches terms are treated as equally weighted indicators of the construct, which contrasts with findings that people are more likely to use positive terms than negative terms. At the same time, rarer terms convey more information (<xref ref-type="bibr" rid="r29">Garcia et al., 2012</xref>). It is possible that this ratio of frequency to information density for positive and negative terms varies across traits, affecting the signal ratio. Alternatively, some researchers have highlighted the complex conceptual nature of Openness (<xref ref-type="bibr" rid="r52">Schwaba &amp; Thalmayer, 2025</xref>) and the variability of the trait-behaviour link for Openness and Neuroticism (<xref ref-type="bibr" rid="r54">Soto, 2021</xref>), which might especially manifest in negations, increasing the difficulty of signal detection. Another important point to consider is the number of terms available within a taxonomy, which may increase the ability to detect signals. For example, our sample-specific taxonomy contained more terms for Openness, which may have increased the ability to detect weak signals in text and in turn increased the correlation with self-reports. Yet, removing marker terms significantly decreased this correlation. This again points to the importance for future research of examining the information contained in marker terms vis-à-vis the breadth of personality traits.</p>
<p>Further, our results indicate that while samples agree on a substantial corpus of personality terms, a considerable portion of taxonomy entries may be idiosyncratic. Our sample was culturally similar to the samples that were used to derive the off-the-shelf taxonomy, yet the samples were separated by roughly 40 years. Some traits, such as Neuroticism and Agreeableness, showed a markedly larger shift in content and performance. To speculate on why these traits might have shifted more, both are related to emotional content, which might show an increased rate of change over time (<xref ref-type="bibr" rid="r59">Xu et al., 2021</xref>). Alternatively, socio-cultural changes might have resulted in a different conceptual construction of these terms. Especially in light of recent studies showing an accelerating rise of cognitive distortions that are related to both interpersonal and emotion-regulation processes (<xref ref-type="bibr" rid="r9">Bollen et al., 2021</xref>), we may expect larger divergences in socially and emotionally focused traits. This highlights the possibility that the seemingly greater change in Neuroticism and Agreeableness terms might be temporally specific and that the emergence of different cultural patterns might dampen or exacerbate this trend.</p>
	<p>In our current study we focused on the five-factor model of personality, yet this leaves open the question of how other potential traits, such as Honesty-Humility within the HEXACO (<xref ref-type="bibr" rid="r4">Ashton, Lee, Perugini, et al., 2004</xref>), might perform. Honesty-Humility has been viewed as part of Agreeableness and has shown substantial correlations with it in some studies (<xref ref-type="bibr" rid="r15">De Raad et al., 2010</xref>). An interesting potential example of the ambiguity of meaning can be found in the way participants labelled the term <italic>honest</italic> in our data, which was classified equally often as positive Agreeableness, negative Agreeableness, or negative Openness. In the full 1,710 wordlist the original sample rated this term equally as an indicator of Agreeableness and Conscientiousness.</p>
<p>Importantly, recent studies have challenged the universality of the Big Five theory (<xref ref-type="bibr" rid="r21">Fischer, 2017</xref>, <xref ref-type="bibr" rid="r22">2021</xref>; <xref ref-type="bibr" rid="r41">Laajaj et al., 2019</xref>), suggesting that different trait structures may emerge in different social and ecological settings. Our approach suggests that the terms included within the taxonomies (or items within surveys) may not be representative of the traits within those specific samples. This issue has been identified as the problem of indicator relevance and representativeness (<xref ref-type="bibr" rid="r23">Fischer &amp; Karl, 2019</xref>; <xref ref-type="bibr" rid="r27">Fontaine, 2005</xref>). The issue with the traditional lexical hypothesis is that it assumes time-invariant information mapping. However, linguistic shifts do occur, and taxonomies are unlikely to remain stable. Examining the indicator relevance and representativeness problem from a lexical hypothesis perspective, we could argue that the lexical basis of this hypothesis is more aligned with temporally and sample-specific dynamic indicator-to-construct mappings. Moving away from assumptions of time-invariant language encodings may open ways for a better understanding of what information is relevant to be passed on within specific language communities and how this information maps onto the cognitive schemas that people hold about socially relevant constructs. We believe that such an explicit recognition of temporal and sample-specific information value can open important new insights into both personality structure and personality dynamics over time (<xref ref-type="bibr" rid="r26">Fischer &amp; Rudnev, 2025</xref>).</p>
<sec><title>Limitations and Future Research Directions</title>
<p>To allow for a comparison with established trait taxonomies, our current study was limited to one specific sample in one anglophone context that is culturally and linguistically similar to the original samples used to develop and validate the off-the-shelf taxonomy. This limits our insight into change and similarity in taxonomy performance to the English language. It would be important for future studies to extend this line of research using some of the recently developed trait term taxonomies in diverse language groups and to study their performance with new samples within each language group. Similarly, our study represents a specific sample, which skews female and consists entirely of university students. University students represent a relatively homogeneous group in terms of cognitive ability and socioeconomic status, which could influence both personality manifestations and self-descriptions. At one level, our approach therefore highlights the benefit of tailoring terms to a specific sample, but by necessity it also limits the generalizability of our findings and the resultant taxonomies to other samples. We believe that this limitation highlights a major point of the current study, namely that while existing taxonomies of personality can pick up signals of personality from text, researchers working with specific samples might benefit from expanding these taxonomies by generating bottom-up trait descriptors to capture a clearer signal in their respective sample.</p>
<p>Our study is also limited by its cross-sectional nature and its reliance on a single sample. Although we can gain some insight into the change of personality descriptors in presumed culturally comparable cohorts over time, it would nevertheless be an important future avenue to examine the change of personality descriptors within and across samples.</p>
<p>By comparing a contemporary snapshot with an existing historical dictionary derived roughly forty years ago, our study provides initial evidence that semantic drift may influence the performance of established trait taxonomies. To extend our understanding of semantic change, comparable data should be systematically collected on bottom-up personality descriptors across multiple time periods and cohorts to examine and validate the construction of personality categories over time. Such temporally distributed datasets would allow the use of diachronic word embeddings (<xref ref-type="bibr" rid="r36">Hamilton et al., 2018</xref>; <xref ref-type="bibr" rid="r40">Kutuzov et al., 2018</xref>) to enable researchers to track systematic shifts in the meaning, usage, and semantic associations of trait terms across historical periods in a bottom-up fashion, rather than imposing ahistorical trait definitions. By applying these computational approaches to personality-relevant vocabulary collected across multiple time points, future research could directly quantify the magnitude and nature of semantic drift in trait terms, determine which descriptors remain stable versus which undergo substantial meaning shifts, and identify the cultural and linguistic factors that drive such changes.</p>
<p>Starting with a contemporary personality model such as the Big Five, we presuppose that this structure is applicable and relevant to our sample. Given the current evidence, it seems reasonable to assume the applicability in Western and highly educated samples (<xref ref-type="bibr" rid="r41">Laajaj et al., 2019</xref>; <xref ref-type="bibr" rid="r55">Soto &amp; John, 2017</xref>; <xref ref-type="bibr" rid="r58">Thalmayer et al., 2022</xref>). At the same time, individuals in other cultural and linguistic contexts may share implicit personality structures that diverge from this Big Five model identified in student samples like ours, with either fewer or more factors (<xref ref-type="bibr" rid="r12">Cheung et al., 2001</xref>; <xref ref-type="bibr" rid="r21">Fischer, 2017</xref>; <xref ref-type="bibr" rid="r35">Gurven et al., 2013</xref>; <xref ref-type="bibr" rid="r45">Nel et al., 2012</xref>; <xref ref-type="bibr" rid="r57">Thalmayer et al., 2021</xref>). This clearly requires substantive additional work, in order to identify locally meaningful personality models as well as their relevant marker terms.</p>
<p>By conducting bottom-up analyses with human populations or by using computational methods to identify period-specific word embeddings, it may be possible to identify both more time-invariant (e.g., models and markers that are relatively insensitive to temporal changes) and time-variant personality models and descriptors. Moving beyond human-derived trait lists, researchers may start with seed words from person descriptions in text from different temporal periods and compute word embeddings of the key terms identified. These word embeddings can then be further queried to map systematic changes in valence, salience or breadth (see <xref ref-type="bibr" rid="r5">Baes et al., 2024</xref>). This approach aligns with an emerging historical psychological movement (<xref ref-type="bibr" rid="r38">Jackson &amp; Atari, 2025</xref>) that seeks to understand how psychological constructs themselves evolve across historical periods, recognizing that personality traits are culturally and temporally situated phenomena (<xref ref-type="bibr" rid="r18">Du et al., 2024</xref>; <xref ref-type="bibr" rid="r24">Fischer et al., 2020</xref>). Critically, this historical approach enables a more accurate and comprehensive study of psychological concepts by allowing researchers to examine temporal change and potentially time-invariant features together within a unified framework. Rather than treating historical variation as noise to be controlled away, this method treats both changing and stable aspects of personality as substantive phenomena worthy of investigation, thereby providing a more complete picture of how personality operates across both time and culture. To the extent that it is possible to identify systematic factors that influence the emergence and structuring of personality terms across time, this would open new opportunities for testing evolutionary models of personality.
We see our study as a first stepping stone in this direction, which will require systematic replications and extensions across different cultural samples and time periods.</p>
<p>A further limitation that is shared by most lexical studies is the so-called ground-truth problem, that is, the question of which scores can be considered to capture personality dynamics with the greatest accuracy and validity. We used self-report ratings as comparison standards, but other behaviour-based options need to be explored in future research (<xref ref-type="bibr" rid="r11">Boyd &amp; Pennebaker, 2017</xref>). Finally, we focused on the five-factor model, which leaves an open question about stability and change in personality descriptors related to culture-specific social-relational traits (<xref ref-type="bibr" rid="r20">Fetvadjiev et al., 2015</xref>).</p></sec>
<sec sec-type="conclusions"><title>Conclusion</title>
<p>In summary, our study shows that both off-the-shelf and sample-specific taxonomies can be used to extract personality information from narratives and self-descriptions, but a sample-specific taxonomy might be preferable as it exhibits greater sensitivity and shows patterns more similar to self-report measures. Our study demonstrates the need to move beyond the idea of one personality taxonomy per sample and instead to focus more research on how personality expression changes within samples over time, in order to separate potential time-invariant descriptors of personality from descriptors idiosyncratic to a specific temporal instance of a sample.</p></sec></sec>
</body>
<back>
<ref-list><title>References</title>
<ref id="r1"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Allport</surname>, <given-names>G. W.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Odbert</surname>, <given-names>H. S.</given-names></string-name></person-group> (<year>1936</year>). <article-title>Trait-names: A psycho-lexical study.</article-title> <source>Psychological Monographs</source>, <volume>47</volume>(<issue>1</issue>), <elocation-id>i–171</elocation-id>. <pub-id pub-id-type="doi">10.1037/h0093360</pub-id></mixed-citation></ref>
<ref id="r2"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Ashton</surname>, <given-names>M. C.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Lee</surname>, <given-names>K.</given-names></string-name></person-group> (<year>2005</year>). <article-title>The lexical approach to the study of personality structure: Toward the identification of cross-culturally replicable dimensions of personality variation.</article-title> <source>Journal of Personality Disorders</source>, <volume>19</volume>(<issue>3</issue>), <fpage>303</fpage>–<lpage>308</lpage>. <pub-id pub-id-type="doi">10.1521/pedi.2005.19.3.303</pub-id><pub-id pub-id-type="pmid">16175738</pub-id></mixed-citation></ref>
<ref id="r3"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Ashton</surname>, <given-names>M. C.</given-names></string-name>, <string-name name-style="western"><surname>Lee</surname>, <given-names>K.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Goldberg</surname>, <given-names>L. R.</given-names></string-name></person-group> (<year>2004</year>). <article-title>A hierarchical analysis of 1,710 English personality-descriptive adjectives.</article-title> <source>Journal of Personality and Social Psychology</source>, <volume>87</volume>(<issue>5</issue>), <fpage>707</fpage>–<lpage>721</lpage>. <pub-id pub-id-type="doi">10.1037/0022-3514.87.5.707</pub-id><pub-id pub-id-type="pmid">15535781</pub-id></mixed-citation></ref>
<ref id="r4"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Ashton</surname>, <given-names>M. C.</given-names></string-name>, <string-name name-style="western"><surname>Lee</surname>, <given-names>K.</given-names></string-name>, <string-name name-style="western"><surname>Perugini</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Szarota</surname>, <given-names>P.</given-names></string-name>, <string-name name-style="western"><surname>de Vries</surname>, <given-names>R. E.</given-names></string-name>, <string-name name-style="western"><surname>Di Blas</surname>, <given-names>L.</given-names></string-name>, <string-name name-style="western"><surname>Boies</surname>, <given-names>K.</given-names></string-name>, &amp; <string-name name-style="western"><surname>De Raad</surname>, <given-names>B.</given-names></string-name></person-group> (<year>2004</year>). <article-title>A six-factor structure of personality-descriptive adjectives: Solutions from psycholexical studies in seven languages.</article-title> <source>Journal of Personality and Social Psychology</source>, <volume>86</volume>(<issue>2</issue>), <fpage>356</fpage>–<lpage>366</lpage>. <pub-id pub-id-type="doi">10.1037/0022-3514.86.2.356</pub-id><pub-id pub-id-type="pmid">14769090</pub-id></mixed-citation></ref>
<ref id="r5"><mixed-citation publication-type="preprint">Baes, N., Haslam, N., &amp; Vylomova, E. (2024). <italic>A multidimensional framework for evaluating lexical semantic change with social science applications</italic>. arXiv. <pub-id pub-id-type="doi">10.18653/v1/2024.acl-long.76</pub-id></mixed-citation></ref>
<ref id="r6"><mixed-citation publication-type="confproc">Bender, E. M., Gebru, T., McMillan-Major, A., &amp; Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? 🦜. <italic>Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency</italic>, 610–623. <pub-id pub-id-type="doi">10.1145/3442188.3445922</pub-id></mixed-citation></ref>
<ref id="r7"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Bernardes</surname>, <given-names>G.</given-names></string-name>, <string-name name-style="western"><surname>Bozza</surname>, <given-names>B.</given-names></string-name>, <string-name name-style="western"><surname>Motta</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Mattos</surname>, <given-names>P.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Fischer</surname>, <given-names>R.</given-names></string-name></person-group> (<year>2025</year>). <article-title>Semantic meaning means a lot: Exploring the role of semantics in the development of a Big Five taxonomy.</article-title> <source>Journal of Research in Personality</source>, <volume>115</volume>, <elocation-id>104570</elocation-id>. <pub-id pub-id-type="doi">10.1016/j.jrp.2024.104570</pub-id></mixed-citation></ref>
<ref id="r8"><mixed-citation publication-type="data"><person-group person-group-type="author"><string-name name-style="western"><surname>Bernardes</surname>, <given-names>G.</given-names></string-name>, <string-name name-style="western"><surname>Fischer</surname>, <given-names>R.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Motta</surname>, <given-names>M.</given-names></string-name></person-group> (<year>2022</year>). <data-title><italic>Personality encoded in language: A theory-based dictionary in Brazilian Portuguese</italic>.</data-title> <pub-id pub-id-type="doi">10.17605/OSF.IO/QD4ET</pub-id></mixed-citation></ref>
<ref id="r9"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Bollen</surname>, <given-names>J.</given-names></string-name>, <string-name name-style="western"><surname>ten Thij</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Breithaupt</surname>, <given-names>F.</given-names></string-name>, <string-name name-style="western"><surname>Barron</surname>, <given-names>A. T. J.</given-names></string-name>, <string-name name-style="western"><surname>Rutter</surname>, <given-names>L. A.</given-names></string-name>, <string-name name-style="western"><surname>Lorenzo-Luaces</surname>, <given-names>L.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Scheffer</surname>, <given-names>M.</given-names></string-name></person-group> (<year>2021</year>). <article-title>Historical language records reveal a surge of cognitive distortions in recent decades.</article-title> <source>Proceedings of the National Academy of Sciences of the United States of America</source>, <volume>118</volume>(<issue>30</issue>), <elocation-id>e2102061118</elocation-id>. <pub-id pub-id-type="doi">10.1073/pnas.2102061118</pub-id><pub-id pub-id-type="pmid">34301899</pub-id></mixed-citation></ref>
<ref id="r10"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Boyd</surname>, <given-names>R. L.</given-names></string-name>, <string-name name-style="western"><surname>Pasca</surname>, <given-names>P.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Lanning</surname>, <given-names>K.</given-names></string-name></person-group> (<year>2020</year>). <article-title>The personality panorama: Conceptualizing personality through big behavioural data.</article-title> <source>European Journal of Personality</source>, <volume>34</volume>(<issue>5</issue>), <fpage>599</fpage>–<lpage>612</lpage>. <pub-id pub-id-type="doi">10.1002/per.2254</pub-id></mixed-citation></ref>
<ref id="r11"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Boyd</surname>, <given-names>R. L.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Pennebaker</surname>, <given-names>J. W.</given-names></string-name></person-group> (<year>2017</year>). <article-title>Language-based personality: A new approach to personality in a digital world.</article-title> <source>Current Opinion in Behavioral Sciences</source>, <volume>18</volume>, <fpage>63</fpage>–<lpage>68</lpage>. <pub-id pub-id-type="doi">10.1016/j.cobeha.2017.07.017</pub-id></mixed-citation></ref>
<ref id="r12"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Cheung</surname>, <given-names>F.</given-names></string-name>, <string-name name-style="western"><surname>Leung</surname>, <given-names>K.</given-names></string-name>, <string-name name-style="western"><surname>Zhang</surname>, <given-names>J.-X.</given-names></string-name>, <string-name name-style="western"><surname>Sun</surname>, <given-names>H.-F.</given-names></string-name>, <string-name name-style="western"><surname>Gan</surname>, <given-names>Y.-Q.</given-names></string-name>, <string-name name-style="western"><surname>Song</surname>, <given-names>W.-Z.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Xie</surname>, <given-names>D.</given-names></string-name></person-group> (<year>2001</year>). <article-title>Indigenous Chinese personality constructs: Is the five-factor model complete?</article-title> <source>Journal of Cross-Cultural Psychology</source>, <volume>32</volume>(<issue>4</issue>), <fpage>407</fpage>–<lpage>433</lpage>. <pub-id pub-id-type="doi">10.1177/0022022101032004003</pub-id></mixed-citation></ref>
<ref id="r13"><mixed-citation publication-type="book">Christiansen, M. H., &amp; Chater, N. (2016). <italic>Creating language: Integrating evolution, acquisition, and processing</italic>. MIT Press.</mixed-citation></ref>
<ref id="r14"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Cutler</surname>, <given-names>A.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Condon</surname>, <given-names>D. M.</given-names></string-name></person-group> (<year>2023</year>). <article-title>Deep lexical hypothesis: Identifying personality structure in natural language.</article-title> <source>Journal of Personality and Social Psychology</source>, <volume>125</volume>(<issue>1</issue>), <fpage>173</fpage>–<lpage>197</lpage>. <pub-id pub-id-type="doi">10.1037/pspp0000443</pub-id></mixed-citation></ref>
<ref id="r15"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>De Raad</surname>, <given-names>B.</given-names></string-name>, <string-name name-style="western"><surname>Barelds</surname>, <given-names>D. P. H.</given-names></string-name>, <string-name name-style="western"><surname>Levert</surname>, <given-names>E.</given-names></string-name>, <string-name name-style="western"><surname>Ostendorf</surname>, <given-names>F.</given-names></string-name>, <string-name name-style="western"><surname>Mlačić</surname>, <given-names>B.</given-names></string-name>, <string-name name-style="western"><surname>Di Blas</surname>, <given-names>L.</given-names></string-name>, <string-name name-style="western"><surname>Hřebíčková</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Szirmák</surname>, <given-names>Z.</given-names></string-name>, <string-name name-style="western"><surname>Szarota</surname>, <given-names>P.</given-names></string-name>, <string-name name-style="western"><surname>Perugini</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Church</surname>, <given-names>A. T.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Katigbak</surname>, <given-names>M. S.</given-names></string-name></person-group> (<year>2010</year>). <article-title>Only three factors of personality description are fully replicable across languages: A comparison of 14 trait taxonomies.</article-title> <source>Journal of Personality and Social Psychology</source>, <volume>98</volume>(<issue>1</issue>), <fpage>160</fpage>–<lpage>173</lpage>. <pub-id pub-id-type="doi">10.1037/a0017184</pub-id><pub-id pub-id-type="pmid">20053040</pub-id></mixed-citation></ref>
<ref id="r16"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>De Raad</surname>, <given-names>B.</given-names></string-name>, <string-name name-style="western"><surname>Barelds</surname>, <given-names>D. P. H.</given-names></string-name>, <string-name name-style="western"><surname>Timmerman</surname>, <given-names>M. E.</given-names></string-name>, <string-name name-style="western"><surname>De Roover</surname>, <given-names>K.</given-names></string-name>, <string-name name-style="western"><surname>Mlačić</surname>, <given-names>B.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Church</surname>, <given-names>A. T.</given-names></string-name></person-group> (<year>2014</year>). <article-title>Towards a pan-cultural personality structure: Input from 11 psycholexical studies.</article-title> <source>European Journal of Personality</source>, <volume>28</volume>(<issue>5</issue>), <fpage>497</fpage>–<lpage>510</lpage>. <pub-id pub-id-type="doi">10.1002/per.1953</pub-id></mixed-citation></ref>
<ref id="r17"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>DeYoung</surname>, <given-names>C. G.</given-names></string-name></person-group> (<year>2014</year>). <article-title>A cybernetic Big Five theory for personality psychology.</article-title> <source>Personality and Individual Differences</source>, <volume>60</volume>, <fpage>S18</fpage>. <pub-id pub-id-type="doi">10.1016/j.paid.2013.07.381</pub-id></mixed-citation></ref>
<ref id="r18"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Du</surname>, <given-names>A. H.</given-names></string-name>, <string-name name-style="western"><surname>Karl</surname>, <given-names>J. A.</given-names></string-name>, <string-name name-style="western"><surname>Fetvadjiev</surname>, <given-names>V.</given-names></string-name>, <string-name name-style="western"><surname>Luczak-Roesch</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Pirngruber</surname>, <given-names>R.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Fischer</surname>, <given-names>R.</given-names></string-name></person-group> (<year>2024</year>). <article-title>Tracing the evolution of personality cognition in early human civilisations: A computational analysis of the Gilgamesh epic.</article-title> <source>European Journal of Personality</source>, <volume>38</volume>(<issue>2</issue>), <fpage>274</fpage>–<lpage>290</lpage>. <pub-id pub-id-type="doi">10.1177/08902070231161869</pub-id></mixed-citation></ref>
<ref id="r19"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Eichstaedt</surname>, <given-names>J. C.</given-names></string-name>, <string-name name-style="western"><surname>Kern</surname>, <given-names>M. L.</given-names></string-name>, <string-name name-style="western"><surname>Yaden</surname>, <given-names>D. B.</given-names></string-name>, <string-name name-style="western"><surname>Schwartz</surname>, <given-names>H. A.</given-names></string-name>, <string-name name-style="western"><surname>Giorgi</surname>, <given-names>S.</given-names></string-name>, <string-name name-style="western"><surname>Park</surname>, <given-names>G.</given-names></string-name>, <string-name name-style="western"><surname>Hagan</surname>, <given-names>C. A.</given-names></string-name>, <string-name name-style="western"><surname>Tobolsky</surname>, <given-names>V. A.</given-names></string-name>, <string-name name-style="western"><surname>Smith</surname>, <given-names>L. K.</given-names></string-name>, <string-name name-style="western"><surname>Buffone</surname>, <given-names>A.</given-names></string-name>, <string-name name-style="western"><surname>Iwry</surname>, <given-names>J.</given-names></string-name>, <string-name name-style="western"><surname>Seligman</surname>, <given-names>M. E. P.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Ungar</surname>, <given-names>L. H.</given-names></string-name></person-group> (<year>2021</year>). <article-title>Closed- and open-vocabulary approaches to text analysis: A review, quantitative comparison, and recommendations.</article-title> <source>Psychological Methods</source>, <volume>26</volume>(<issue>4</issue>), <fpage>398</fpage>–<lpage>427</lpage>. <pub-id pub-id-type="doi">10.1037/met0000349</pub-id><pub-id pub-id-type="pmid">34726465</pub-id></mixed-citation></ref>
<ref id="r20"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Fetvadjiev</surname>, <given-names>V. H.</given-names></string-name>, <string-name name-style="western"><surname>Meiring</surname>, <given-names>D.</given-names></string-name>, <string-name name-style="western"><surname>van de Vijver</surname>, <given-names>F. J. R.</given-names></string-name>, <string-name name-style="western"><surname>Nel</surname>, <given-names>J. A.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Hill</surname>, <given-names>C.</given-names></string-name></person-group> (<year>2015</year>). <article-title>The South African Personality Inventory (SAPI): A culture-informed instrument for the country’s main ethnocultural groups.</article-title> <source>Psychological Assessment</source>, <volume>27</volume>(<issue>3</issue>), <fpage>827</fpage>–<lpage>837</lpage>. <pub-id pub-id-type="doi">10.1037/pas0000078</pub-id><pub-id pub-id-type="pmid">25602691</pub-id></mixed-citation></ref>
<ref id="r21"><mixed-citation publication-type="book">Fischer, R. (2017). <italic>Personality, values, culture: An evolutionary approach</italic>. Cambridge University Press. <pub-id pub-id-type="doi">10.1017/9781316091944</pub-id></mixed-citation></ref>
<ref id="r22"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Fischer</surname>, <given-names>R.</given-names></string-name></person-group> (<year>2021</year>). <article-title>Alternative four-factor structure of the Mini-IPIP in Thailand.</article-title> <source>International Journal of Personality Psychology</source>, <volume>7</volume>, <fpage>35</fpage>–<lpage>42</lpage>. <pub-id pub-id-type="doi">10.21827/ijpp.7.37978</pub-id></mixed-citation></ref>
<ref id="r23"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Fischer</surname>, <given-names>R.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Karl</surname>, <given-names>J. A.</given-names></string-name></person-group> (<year>2019</year>). <article-title>A primer to (cross-cultural) multi-group invariance testing possibilities in R.</article-title> <source>Frontiers in Psychology</source>, <volume>10</volume>, <elocation-id>1507</elocation-id>. <pub-id pub-id-type="doi">10.3389/fpsyg.2019.01507</pub-id><pub-id pub-id-type="pmid">31379641</pub-id></mixed-citation></ref>
<ref id="r24"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Fischer</surname>, <given-names>R.</given-names></string-name>, <string-name name-style="western"><surname>Karl</surname>, <given-names>J. A.</given-names></string-name>, <string-name name-style="western"><surname>Luczak-Roesch</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Fetvadjiev</surname>, <given-names>V. H.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Grener</surname>, <given-names>A.</given-names></string-name></person-group> (<year>2020</year>). <article-title>Tracing personality structure in narratives: A computational bottom–up approach to unpack writers, characters, and personality in historical context.</article-title> <source>European Journal of Personality</source>, <volume>34</volume>(<issue>5</issue>), <fpage>917</fpage>–<lpage>943</lpage>. <pub-id pub-id-type="doi">10.1002/per.2270</pub-id></mixed-citation></ref>
<ref id="r25"><mixed-citation publication-type="preprint">Fischer, R., Luczak-Roesch, M., &amp; Karl, J. A. (2023). <italic>What does ChatGPT return about human values? Exploring value bias in ChatGPT using a descriptive value theory</italic>. arXiv. <pub-id pub-id-type="doi">10.48550/arXiv.2304.03612</pub-id></mixed-citation></ref>
<ref id="r26"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Fischer</surname>, <given-names>R.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Rudnev</surname>, <given-names>M.</given-names></string-name></person-group> (<year>2025</year>). <article-title>From MIsgivings to MIse-en-scène: The role of invariance in personality science.</article-title> <source>European Journal of Personality</source>, <volume>39</volume>(<issue>4</issue>), <fpage>662</fpage>–<lpage>673</lpage>. <pub-id pub-id-type="doi">10.1177/08902070241283081</pub-id></mixed-citation></ref>
<ref id="r27"><mixed-citation publication-type="book">Fontaine, J. R. J. (2005). Equivalence. In <italic>Encyclopedia of Social Measurement</italic> (Vol. 1, pp. 803–813).</mixed-citation></ref>
<ref id="r28"><mixed-citation publication-type="book">Galton, F. (1949). The measurement of character. In W. Dennis (Ed.), <italic>Readings in general psychology</italic> (pp. 435–444). Prentice-Hall. <pub-id pub-id-type="doi">10.1037/11352-058</pub-id></mixed-citation></ref>
<ref id="r29"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Garcia</surname>, <given-names>D.</given-names></string-name>, <string-name name-style="western"><surname>Garas</surname>, <given-names>A.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Schweitzer</surname>, <given-names>F.</given-names></string-name></person-group> (<year>2012</year>). <article-title>Positive words carry less information than negative words.</article-title> <source>EPJ Data Science</source>, <volume>1</volume>(<issue>1</issue>), <elocation-id>3</elocation-id>. <pub-id pub-id-type="doi">10.1140/epjds3</pub-id></mixed-citation></ref>
<ref id="r30"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Giannini</surname>, <given-names>F.</given-names></string-name>, <string-name name-style="western"><surname>Marelli</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Stella</surname>, <given-names>F.</given-names></string-name>, <string-name name-style="western"><surname>Monzani</surname>, <given-names>D.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Pancani</surname>, <given-names>L.</given-names></string-name></person-group> (<year>2024</year>). <article-title>Surfing the OCEAN: The machine learning psycholexical approach 2.0 to detect personality traits in texts.</article-title> <source>Journal of Personality</source>, <volume>92</volume>(<issue>6</issue>), <fpage>1602</fpage>–<lpage>1615</lpage>. <pub-id pub-id-type="doi">10.1111/jopy.12915</pub-id><pub-id pub-id-type="pmid">38217359</pub-id></mixed-citation></ref>
<ref id="r31"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Gignac</surname>, <given-names>G. E.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Szodorai</surname>, <given-names>E. T.</given-names></string-name></person-group> (<year>2016</year>). <article-title>Effect size guidelines for individual differences researchers.</article-title> <source>Personality and Individual Differences</source>, <volume>102</volume>, <fpage>74</fpage>–<lpage>78</lpage>. <pub-id pub-id-type="doi">10.1016/j.paid.2016.06.069</pub-id></mixed-citation></ref>
<ref id="r32"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Goldberg</surname>, <given-names>L. R.</given-names></string-name></person-group> (<year>1982</year>). <article-title>From Ace to Zombie: Some explorations in the language of personality.</article-title> <source>Advances in Personality Assessment</source>, <volume>1</volume>, <fpage>203</fpage>–<lpage>234</lpage>.</mixed-citation></ref>
<ref id="r33"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Goldberg</surname>, <given-names>L. R.</given-names></string-name></person-group> (<year>1992</year>). <article-title>The development of markers for the Big-Five factor structure.</article-title> <source>Psychological Assessment</source>, <volume>4</volume>(<issue>1</issue>), <fpage>26</fpage>–<lpage>42</lpage>. <pub-id pub-id-type="doi">10.1037/1040-3590.4.1.26</pub-id></mixed-citation></ref>
<ref id="r34"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Goldberg</surname>, <given-names>L. R.</given-names></string-name></person-group> (<year>1993</year>). <article-title>The structure of phenotypic personality traits.</article-title> <source>The American Psychologist</source>, <volume>48</volume>(<issue>1</issue>), <fpage>26</fpage>–<lpage>34</lpage>. <pub-id pub-id-type="doi">10.1037/0003-066X.48.1.26</pub-id><pub-id pub-id-type="pmid">8427480</pub-id></mixed-citation></ref>
<ref id="r35"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Gurven</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>von Rueden</surname>, <given-names>C.</given-names></string-name>, <string-name name-style="western"><surname>Massenkoff</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Kaplan</surname>, <given-names>H.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Lero Vie</surname>, <given-names>M.</given-names></string-name></person-group> (<year>2013</year>). <article-title>How universal is the Big Five? Testing the five-factor model of personality variation among forager-farmers in the Bolivian Amazon.</article-title> <source>Journal of Personality and Social Psychology</source>, <volume>104</volume>(<issue>2</issue>), <fpage>354</fpage>–<lpage>370</lpage>. <pub-id pub-id-type="doi">10.1037/a0030841</pub-id><pub-id pub-id-type="pmid">23245291</pub-id></mixed-citation></ref>
<ref id="r36"><mixed-citation publication-type="preprint">Hamilton, W. L., Leskovec, J., &amp; Jurafsky, D. (2018). <italic>Diachronic word embeddings reveal statistical laws of semantic change</italic>. arXiv. <pub-id pub-id-type="doi">10.48550/arXiv.1605.09096</pub-id></mixed-citation></ref>
	
<ref id="r60"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Hittner</surname>, <given-names>J. B.</given-names></string-name>, <string-name name-style="western"><surname>May</surname>, <given-names>K.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Silver</surname>, <given-names>N. C.</given-names></string-name></person-group> (<year>2003</year>). <article-title>A Monte Carlo evaluation of tests for comparing dependent correlations.</article-title> <source>The Journal of General Psychology</source>, <volume>130</volume>(<issue>2</issue>), <fpage>149</fpage>–<lpage>168</lpage>. <pub-id pub-id-type="doi">10.1080/00221300309601282</pub-id></mixed-citation></ref>
	
<ref id="r37"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Hu</surname>, <given-names>T.</given-names></string-name>, <string-name name-style="western"><surname>Kyrychenko</surname>, <given-names>Y.</given-names></string-name>, <string-name name-style="western"><surname>Rathje</surname>, <given-names>S.</given-names></string-name>, <string-name name-style="western"><surname>Collier</surname>, <given-names>N.</given-names></string-name>, <string-name name-style="western"><surname>van der Linden</surname>, <given-names>S.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Roozenbeek</surname>, <given-names>J.</given-names></string-name></person-group> (<year>2025</year>). <article-title>Generative language models exhibit social identity biases.</article-title> <source>Nature Computational Science</source>, <volume>5</volume>(<issue>1</issue>), <fpage>65</fpage>–<lpage>75</lpage>. <pub-id pub-id-type="doi">10.1038/s43588-024-00741-1</pub-id><pub-id pub-id-type="pmid">39668254</pub-id></mixed-citation></ref>
<ref id="r38"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Jackson</surname>, <given-names>J. C.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Atari</surname>, <given-names>M.</given-names></string-name></person-group> (<year>2025</year>). <article-title>Historical psychology: How the events of yesterday shaped the minds of today.</article-title> <source>Current Research in Ecological and Social Psychology</source>, <volume>9</volume>, <elocation-id>100247</elocation-id>. <pub-id pub-id-type="doi">10.1016/j.cresp.2025.100247</pub-id></mixed-citation></ref>
<ref id="r39"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Kern</surname>, <given-names>M. L.</given-names></string-name>, <string-name name-style="western"><surname>Eichstaedt</surname>, <given-names>J. C.</given-names></string-name>, <string-name name-style="western"><surname>Schwartz</surname>, <given-names>H. A.</given-names></string-name>, <string-name name-style="western"><surname>Dziurzynski</surname>, <given-names>L.</given-names></string-name>, <string-name name-style="western"><surname>Ungar</surname>, <given-names>L. H.</given-names></string-name>, <string-name name-style="western"><surname>Stillwell</surname>, <given-names>D. J.</given-names></string-name>, <string-name name-style="western"><surname>Kosinski</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Ramones</surname>, <given-names>S. M.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Seligman</surname>, <given-names>M. E. P.</given-names></string-name></person-group> (<year>2014</year>). <article-title>The online social self: An open vocabulary approach to personality.</article-title> <source>Assessment</source>, <volume>21</volume>(<issue>2</issue>), <fpage>158</fpage>–<lpage>169</lpage>. <pub-id pub-id-type="doi">10.1177/1073191113514104</pub-id><pub-id pub-id-type="pmid">24322010</pub-id></mixed-citation></ref>
<ref id="r40"><mixed-citation publication-type="book">Kutuzov, A., Øvrelid, L., Szymanski, T., &amp; Velldal, E. (2018). Diachronic word embeddings and semantic shifts: A survey. In E. M. Bender, L. Derczynski, &amp; P. Isabelle (Eds.), <italic>Proceedings of the 27th International Conference on Computational Linguistics</italic> (pp. 1384–1397). Association for Computational Linguistics. <ext-link ext-link-type="uri" xlink:href="https://aclanthology.org/C18-1117/">https://aclanthology.org/C18-1117/</ext-link></mixed-citation></ref>
<ref id="r41"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Laajaj</surname>, <given-names>R.</given-names></string-name>, <string-name name-style="western"><surname>Macours</surname>, <given-names>K.</given-names></string-name>, <string-name name-style="western"><surname>Pinzon Hernandez</surname>, <given-names>D. A.</given-names></string-name>, <string-name name-style="western"><surname>Arias</surname>, <given-names>O.</given-names></string-name>, <string-name name-style="western"><surname>Gosling</surname>, <given-names>S. D.</given-names></string-name>, <string-name name-style="western"><surname>Potter</surname>, <given-names>J.</given-names></string-name>, <string-name name-style="western"><surname>Rubio-Codina</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Vakis</surname>, <given-names>R.</given-names></string-name></person-group> (<year>2019</year>). <article-title>Challenges to capture the big five personality traits in non-WEIRD populations.</article-title> <source>Science Advances</source>, <volume>5</volume>(<issue>7</issue>), <elocation-id>eaaw5226</elocation-id>. <pub-id pub-id-type="doi">10.1126/sciadv.aaw5226</pub-id><pub-id pub-id-type="pmid">31309152</pub-id></mixed-citation></ref>
<ref id="r42"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>McAdams</surname>, <given-names>D. P.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Pals</surname>, <given-names>J. L.</given-names></string-name></person-group> (<year>2006</year>). <article-title>A new Big Five: Fundamental principles for an integrative science of personality.</article-title> <source>The American Psychologist</source>, <volume>61</volume>(<issue>3</issue>), <fpage>204</fpage>–<lpage>217</lpage>. <pub-id pub-id-type="doi">10.1037/0003-066X.61.3.204</pub-id><pub-id pub-id-type="pmid">16594837</pub-id></mixed-citation></ref>
<ref id="r43"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>McCrae</surname>, <given-names>R. R.</given-names></string-name>, &amp; <string-name name-style="western"><surname>John</surname>, <given-names>O. P.</given-names></string-name></person-group> (<year>1992</year>). <article-title>An introduction to the five-factor model and its applications.</article-title> <source>Journal of Personality</source>, <volume>60</volume>(<issue>2</issue>), <fpage>175</fpage>–<lpage>215</lpage>. <pub-id pub-id-type="doi">10.1111/j.1467-6494.1992.tb00970.x</pub-id><pub-id pub-id-type="pmid">1635039</pub-id></mixed-citation></ref>
<ref id="r44"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Mehrabi</surname>, <given-names>N.</given-names></string-name>, <string-name name-style="western"><surname>Morstatter</surname>, <given-names>F.</given-names></string-name>, <string-name name-style="western"><surname>Saxena</surname>, <given-names>N.</given-names></string-name>, <string-name name-style="western"><surname>Lerman</surname>, <given-names>K.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Galstyan</surname>, <given-names>A.</given-names></string-name></person-group> (<year>2021</year>). <article-title>A survey on bias and fairness in machine learning.</article-title> <source>ACM Computing Surveys</source>, <volume>54</volume>(<issue>6</issue>), <elocation-id>115</elocation-id>. <pub-id pub-id-type="doi">10.1145/3457607</pub-id></mixed-citation></ref>
<ref id="r45"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Nel</surname>, <given-names>J. A.</given-names></string-name>, <string-name name-style="western"><surname>Valchev</surname>, <given-names>V. H.</given-names></string-name>, <string-name name-style="western"><surname>Rothmann</surname>, <given-names>S.</given-names></string-name>, <string-name name-style="western"><surname>van de Vijver</surname>, <given-names>F. J. R.</given-names></string-name>, <string-name name-style="western"><surname>Meiring</surname>, <given-names>D.</given-names></string-name>, &amp; <string-name name-style="western"><surname>de Bruin</surname>, <given-names>G. P.</given-names></string-name></person-group> (<year>2012</year>). <article-title>Exploring the personality structure in the 11 languages of South Africa.</article-title> <source>Journal of Personality</source>, <volume>80</volume>(<issue>4</issue>), <fpage>915</fpage>–<lpage>948</lpage>. <pub-id pub-id-type="doi">10.1111/j.1467-6494.2011.00751.x</pub-id><pub-id pub-id-type="pmid">22091948</pub-id></mixed-citation></ref>
<ref id="r46"><mixed-citation publication-type="book">Norman, W. T. (1967). <italic>2800 personality trait descriptors: Normative operating characteristics for a university population</italic>. University of Michigan, Dept. of Psychology.</mixed-citation></ref>
<ref id="r47"><mixed-citation publication-type="book">Nortier, J., &amp; Svendsen, B. A. (Eds.). (2015). <italic>Language, youth and identity in the 21st century: Linguistic practices across urban spaces</italic>. Cambridge University Press. <pub-id pub-id-type="doi">10.1017/CBO9781139061896</pub-id></mixed-citation></ref>
<ref id="r48"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Saucier</surname>, <given-names>G.</given-names></string-name></person-group> (<year>1994</year>). <article-title>Mini-Markers: A brief version of Goldberg’s unipolar Big-Five markers.</article-title> <source>Journal of Personality Assessment</source>, <volume>63</volume>(<issue>3</issue>), <fpage>506</fpage>–<lpage>516</lpage>. <pub-id pub-id-type="doi">10.1207/s15327752jpa6303_8</pub-id><pub-id pub-id-type="pmid">7844738</pub-id></mixed-citation></ref>
<ref id="r49"><mixed-citation publication-type="book">Saucier, G., &amp; Goldberg, L. R. (1996). The language of personality: Lexical perspectives on the five-factor model. In J. S. Wiggins (Ed.), <italic>The five-factor model of personality: Theoretical perspectives</italic> (pp. 21–50). Guilford Press.</mixed-citation></ref>
<ref id="r50"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Saucier</surname>, <given-names>G.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Iurino</surname>, <given-names>K.</given-names></string-name></person-group> (<year>2020</year>). <article-title>High-dimensionality personality structure in the natural language: Further analyses of classic sets of English-language trait-adjectives.</article-title> <source>Journal of Personality and Social Psychology</source>, <volume>119</volume>(<issue>5</issue>), <fpage>1188</fpage>–<lpage>1219</lpage>. <pub-id pub-id-type="doi">10.1037/pspp0000273</pub-id><pub-id pub-id-type="pmid">31714107</pub-id></mixed-citation></ref>
<ref id="r51"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Saucier</surname>, <given-names>G.</given-names></string-name>, <string-name name-style="western"><surname>Thalmayer</surname>, <given-names>A. G.</given-names></string-name>, <string-name name-style="western"><surname>Payne</surname>, <given-names>D. L.</given-names></string-name>, <string-name name-style="western"><surname>Carlson</surname>, <given-names>R.</given-names></string-name>, <string-name name-style="western"><surname>Sanogo</surname>, <given-names>L.</given-names></string-name>, <string-name name-style="western"><surname>Ole-Kotikash</surname>, <given-names>L.</given-names></string-name>, <string-name name-style="western"><surname>Church</surname>, <given-names>A. T.</given-names></string-name>, <string-name name-style="western"><surname>Katigbak</surname>, <given-names>M. S.</given-names></string-name>, <string-name name-style="western"><surname>Somer</surname>, <given-names>O.</given-names></string-name>, <string-name name-style="western"><surname>Szarota</surname>, <given-names>P.</given-names></string-name>, <string-name name-style="western"><surname>Szirmák</surname>, <given-names>Z.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Zhou</surname>, <given-names>X.</given-names></string-name></person-group> (<year>2014</year>). <article-title>A basic bivariate structure of personality attributes evident across nine languages.</article-title> <source>Journal of Personality</source>, <volume>82</volume>(<issue>1</issue>), <fpage>1</fpage>–<lpage>14</lpage>. <pub-id pub-id-type="doi">10.1111/jopy.12028</pub-id><pub-id pub-id-type="pmid">23301793</pub-id></mixed-citation></ref>
<ref id="r52"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Schwaba</surname>, <given-names>T.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Thalmayer</surname>, <given-names>A. G.</given-names></string-name></person-group> (<year>2025</year>). <article-title>Openness/Intellect: A unique trait requires unique considerations.</article-title> <source>Personality and Social Psychology Review</source>. <comment>Advance online publication</comment>. <pub-id pub-id-type="doi">10.1177/10888683251377227</pub-id><pub-id pub-id-type="pmid">41199691</pub-id></mixed-citation></ref>
<ref id="r53"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Shumailov</surname>, <given-names>I.</given-names></string-name>, <string-name name-style="western"><surname>Shumaylov</surname>, <given-names>Z.</given-names></string-name>, <string-name name-style="western"><surname>Zhao</surname>, <given-names>Y.</given-names></string-name>, <string-name name-style="western"><surname>Papernot</surname>, <given-names>N.</given-names></string-name>, <string-name name-style="western"><surname>Anderson</surname>, <given-names>R.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Gal</surname>, <given-names>Y.</given-names></string-name></person-group> (<year>2024</year>). <article-title>AI models collapse when trained on recursively generated data.</article-title> <source>Nature</source>, <volume>631</volume>(<issue>8022</issue>), <fpage>755</fpage>–<lpage>759</lpage>. <pub-id pub-id-type="doi">10.1038/s41586-024-07566-y</pub-id><pub-id pub-id-type="pmid">39048682</pub-id></mixed-citation></ref>
<ref id="r54"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Soto</surname>, <given-names>C. J.</given-names></string-name></person-group> (<year>2021</year>). <article-title>Do links between personality and life outcomes generalize? Testing the robustness of trait–outcome associations across gender, age, ethnicity, and analytic approaches.</article-title> <source>Social Psychological &amp; Personality Science</source>, <volume>12</volume>(<issue>1</issue>), <fpage>118</fpage>–<lpage>130</lpage>. <pub-id pub-id-type="doi">10.1177/1948550619900572</pub-id></mixed-citation></ref>
<ref id="r55"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Soto</surname>, <given-names>C. J.</given-names></string-name>, &amp; <string-name name-style="western"><surname>John</surname>, <given-names>O. P.</given-names></string-name></person-group> (<year>2017</year>). <article-title>The next Big Five Inventory (BFI-2): Developing and assessing a hierarchical model with 15 facets to enhance bandwidth, fidelity, and predictive power.</article-title> <source>Journal of Personality and Social Psychology</source>, <volume>113</volume>(<issue>1</issue>), <fpage>117</fpage>–<lpage>143</lpage>. <pub-id pub-id-type="doi">10.1037/pspp0000096</pub-id><pub-id pub-id-type="pmid">27055049</pub-id></mixed-citation></ref>
<ref id="r56"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Suedfeld</surname>, <given-names>P.</given-names></string-name>, <string-name name-style="western"><surname>Cross</surname>, <given-names>R. W.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Brcic</surname>, <given-names>J.</given-names></string-name></person-group> (<year>2011</year>). <article-title>Two years of ups and downs: Barack Obama’s patterns of integrative complexity, motive imagery, and values.</article-title> <source>Political Psychology</source>, <volume>32</volume>(<issue>6</issue>), <fpage>1007</fpage>–<lpage>1033</lpage>. <pub-id pub-id-type="doi">10.1111/j.1467-9221.2011.00850.x</pub-id></mixed-citation></ref>
<ref id="r57"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Thalmayer</surname>, <given-names>A. G.</given-names></string-name>, <string-name name-style="western"><surname>Job</surname>, <given-names>S.</given-names></string-name>, <string-name name-style="western"><surname>Shino</surname>, <given-names>E. N.</given-names></string-name>, <string-name name-style="western"><surname>Robinson</surname>, <given-names>S. L.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Saucier</surname>, <given-names>G.</given-names></string-name></person-group> (<year>2021</year>). <article-title>ǂŪsigu: A mixed-method lexical study of character description in Khoekhoegowab.</article-title> <source>Journal of Personality and Social Psychology</source>, <volume>121</volume>(<issue>6</issue>), <fpage>1258</fpage>–<lpage>1283</lpage>. <pub-id pub-id-type="doi">10.1037/pspp0000372</pub-id><pub-id pub-id-type="pmid">33252975</pub-id></mixed-citation></ref>
<ref id="r58"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Thalmayer</surname>, <given-names>A. G.</given-names></string-name>, <string-name name-style="western"><surname>Saucier</surname>, <given-names>G.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Rotzinger</surname>, <given-names>J. S.</given-names></string-name></person-group> (<year>2022</year>). <article-title>Absolutism, relativism, and universalism in personality traits across cultures: The case of the Big Five.</article-title> <source>Journal of Cross-Cultural Psychology</source>, <volume>53</volume>(<issue>7–8</issue>), <fpage>935</fpage>–<lpage>956</lpage>. <pub-id pub-id-type="doi">10.1177/00220221221111813</pub-id></mixed-citation></ref>
<ref id="r59"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name name-style="western"><surname>Xu</surname>, <given-names>A.</given-names></string-name>, <string-name name-style="western"><surname>Stellar</surname>, <given-names>J. E.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Xu</surname>, <given-names>Y.</given-names></string-name></person-group> (<year>2021</year>). <article-title>Evolution of emotion semantics.</article-title> <source>Cognition</source>, <volume>217</volume>, <elocation-id>104875</elocation-id>. <pub-id pub-id-type="doi">10.1016/j.cognition.2021.104875</pub-id><pub-id pub-id-type="pmid">34403985</pub-id></mixed-citation></ref>
	
	
	
	
</ref-list>
	<sec sec-type="data-availability" id="das"><title>Data Availability</title>
		<p>For this article, the data are freely available (see <xref ref-type="bibr" rid="sp1_r1">Karl &amp; Fischer, 2022</xref>).</p>
	</sec>	

	
	
	
	<sec sec-type="supplementary-material" id="sp1"><title>Supplementary Materials</title>
		
		
		<p>For this article, the following Supplementary Materials are available:</p>
		
		
		<list list-type="bullet">
			<list-item><p>Data (see <xref ref-type="bibr" rid="sp1_r1">Karl &amp; Fischer, 2022</xref>)</p></list-item>
			<list-item><p>Additional analyses and results. STable 1 presents the full participant-provided personality trait dictionary, listing the terms generated by participants for each Big Five dimension (Agreeableness, Conscientiousness, Extraversion, Neuroticism, and Openness), organised by positive and negative valence. STable 2 reports terms that were excluded from the dictionary due to double nominations across multiple facets. STable 3 lists the terms that overlap between the participant-provided dictionary and existing personality taxonomies. The document also includes robustness checks of task order effects on the convergent validity correlations, comprising a random-effects mini-meta-analysis and an accompanying forest plot (SFigure 1) (see <xref ref-type="bibr" rid="sp1_r2">Karl &amp; Fischer, 2026</xref>).</p></list-item>
		</list>
		
		
		<ref-list content-type="supplementary-material" id="suppl-ref-list">
			<ref id="sp1_r1">
				<mixed-citation publication-type="supplementary-material">
					<person-group person-group-type="author">
							<name name-style="western">
								<surname>Karl</surname>
								<given-names>J. A.</given-names>
							</name>
							<name name-style="western">
								<surname>Fischer</surname>
								<given-names>R.</given-names>
							</name>
					</person-group> (<year>2022</year>). <source>The performance of off-the-shelf and population derived lexica in extracting implicit personality from self-descriptions</source> <comment>[Data]</comment>. <publisher-name>OSF</publisher-name>. <ext-link ext-link-type="uri" xlink:href="https://osf.io/hn69f">https://osf.io/hn69f</ext-link>		
				</mixed-citation>
			</ref>
			
			
			<ref id="sp1_r2">
				<mixed-citation publication-type="supplementary-material">
					<person-group person-group-type="author">
						<name name-style="western">
							<surname>Karl</surname>
							<given-names>J. A.</given-names>
						</name>
						<name name-style="western">
							<surname>Fischer</surname>
							<given-names>R.</given-names>
						</name>
					</person-group> (<year>2026</year>). <source>Supplementary materials to "Don’t pull any old personality taxonomy from the shelf: The performance of historical and sample derived taxonomies in extracting personality information from text"</source> <comment>[Tables, figures]</comment>. <publisher-name>PsychOpen GOLD</publisher-name>. <pub-id pub-id-type="doi" xlink:href="https://doi.org/10.23668/psycharchives.21826">10.23668/psycharchives.21826</pub-id>		
				</mixed-citation>
			</ref>
			
		</ref-list>
	</sec>
			

<fn-group>
<fn fn-type="financial-disclosure"><p>The authors have no funding to report.</p></fn>
</fn-group>
<fn-group>
<fn fn-type="conflict"><p>The authors have declared that no competing interests exist.</p></fn>
</fn-group>
<ack>
<p>The authors have no additional (i.e., non-financial) support to report.</p>
</ack>
</back>
</article>
