A Computerized Text and Cluster Analysis Approach to Psychotherapy Talk Across Time

This paper illustrates an analytical approach combining LIWC, a computer text-analytic application, with cluster analysis techniques to explore ‘language styles’ in psychotherapy across sessions in time. It categorizes session transcripts into distinct clusters or styles based on linguistic (di)similarity and relates them to sessional progression, thus providing entry points for further qualitative exploration. In the first step, transcripts of four illustrative therapist-client dyads were scored under ten LIWC variables including ‘analytic thinking’, ‘clout’, ‘authenticity’, ‘emotional tone’, and pronoun types. In the next step, agglomerative hierarchical clustering uncovered distinct session clusters that are differently distributed in each dyad. The relationships between these clusters and the chronological progression of sessions were then further discussed in context as contrastive exemplars. Applications, limitations and future directions are highlighted.


Introduction
Psychotherapy, the 'talking cure', is a mental health activity that applies clinical methods and interpersonal stances to modify behaviors, cognitions, emotions, and/or other attributes (Norcross, 1990). Though grounded in psychological principles, its verbal and interactive nature has fostered a tradition of linguistic analysis (Labov & Fanshel, 1977; Pittenger, Hockett & Danehy, 1960). Various types of language features have been analyzed in therapist-client talk. While these features are of inherent interest to linguists, attempts are often made to bring linguistic theoretical perspectives to bear on therapy processes and outcomes. For example, clients' use of first-person pronouns (e.g., I, me, we, myself) has been investigated as a reflection of constructs like client agency and self-focused attention. While some found increased use associated with positive outcomes (Demiray & Gençöz, 2018; Van Staden & Fulford, 2004), others found it predictive of depressive symptoms (Zimmerman, Brockmeyer, Hunn et al., 2016). Another well-known approach is Conversation Analysis, which moves beyond individual words to examine how social actions unfold sequentially as participants take turns to converse. This has also been applied to various aspects of therapy talk including how people (mis)interpret each other, ask questions, build relationships, exhibit empathy, and so on (Ferrara, 1994; Peräkylä, Antaki, Vehviläinen et al., 2011). The use of metaphors to describe abstract things in terms of more concrete things (e.g., HIV is a large dark cloud hanging above me) (Kopp & Craw, 1998) is another widely studied feature. Metaphors perform functions like clarifying concepts and expressing emotions (Cirillo & Crider, 1995; Lyddon, Clay & Sparks, 2001; Tay, 2011) and are helpful for therapist training (Aronov & Brodsky, 2009; Stott, Mansell, Salkovskis et al., 2010; Tay, 2013). More critical discourse analytic perspectives that relate observed language to "broader social structures, meanings, and power relations" (Spong, 2010, p. 69) have also been adopted. An example is how the very notion of psychological difficulties is culturally constructed, prompting critical reflection on attitudes towards therapeutic practice (Avdi & Georgaca, 2007).
In summary, psychotherapy language research examines how therapy talk reflects relevant psychosocial processes to provide insights and prompt critical reflection. A promising tool in this regard is Linguistic Inquiry and Word Count (LIWC), which has been used in descriptive and comparative studies of language across many social contexts (Tausczik & Pennebaker, 2010) including psychotherapy (Huston, Meier, Faith et al., 2019). LIWC is a computer program for word-level sentiment analysis in English and a growing number of other languages. It classifies words used under a range of socio-psychological and linguistic categories with well-validated algorithms. In particular, summary variables can be derived to measure the extent of speakers' analytical thinking, clout, authenticity, and emotional tone. The approach can complement semantic annotation tools well-known to linguists like USAS (Archer, Wilson & Rayson, 2002) that focus more on content categories like 'food and farming' or 'science and technology'. Despite these possibilities, rich accounts of talk fragments are still more forthcoming than automated analyses of large datasets in psychotherapy language research. A key reason is the traditional emphasis on linguistic variability and its resonances with claims that no two therapy sessions are ever alike (Wohl, 1989). Quantitative approaches are thus criticized as capturing larger patterns at the expense of fine-grained analyses believed to be more useful for therapists' critical self-reflection. Therefore, although general psychotherapy research has a rich and continuing tradition of using computerized and/or various forms of quantitative analysis (Althoff, Clark & Leskovec, 2016; He, Veldkamp, Glas et al., 2017; Mergenthaler & Bucci, 1999; Watson & Laffal, 1963), work oriented towards the nature of language in psychotherapy is less inclined to do so. It is thus timely to show that computer tools like LIWC, combined with appropriate forms of statistical analysis, have a role in complementing traditional methods. This paper focuses especially on an approach that can be used by therapists to describe, monitor, and reflect on aspects of their own language, in ways difficult to achieve by qualitative analysis alone.
Pennebaker et al. (2015) detail the approximately 90 variables analyzable with LIWC and their psychometric evaluation. We focus here on two specific sets of psychotherapeutically relevant variables. The first set is the four aforementioned summary variables: analytical thinking, clout, authenticity, and emotional tone. Analytic thinking is based on observed differences between 'categorical' and 'dynamic' language in college admissions essays (Pennebaker, Chung, Frazee et al., 2014). A high score suggests formal, logical, and hierarchical thinking, while a low score suggests informal, personal, here-and-now, and narrative thinking. Clout is based on relationships between word choices and perceived social power in chats, emails, and letters (Kacewicz, Pennebaker, Jeon et al., 2013). A high score suggests speaking/writing with high expertise and confidence, while a low score suggests a more tentative and humble style. Authenticity is based on relationships between language features and deception when expressing opinions on social issues (Newman, Pennebaker, Berry et al., 2003). A high score suggests more honest, personal, and disclosing discourse, while a low score suggests a more guarded and distanced discourse. Emotional tone is based on relationships between language features in journal entries and psychological changes following the events of September 11, 2001 (Cohn, Mehl & Pennebaker, 2014). A high score is linked to a more positive and upbeat style, a low score to anxiety/sadness/hostility, while a neutral score suggests a lack of emotionality or ambivalence.

LIWC and Therapeutically Relevant Variables
These variables offer useful multivariate profiles of how language is used to perform psychotherapy functions. These functions include the way therapist and client narratives are told, the stance of therapists when dispensing advice, the negotiation of relationships, and linguistic displays of emotional well-being. For example, a therapist offering a solution could speak in a highly logical way (analytic thinking), and yet hedge the advice (clout) to come across as more personal (authenticity). Each variable is scored from 0 to 100 based on standardized scores from large comparison samples in the mentioned studies. They are the only 'non-transparent' variables in LIWC in the sense of not being direct frequency measures of specific words.
The second set of variables comprises pronoun types. Pronouns are relatable to psychotherapeutic concerns like self-presentation and attention, ego (Rizzuto, 1993), other persons and entities, and engagement between the dyad. Unlike the four summary variables, pronouns are scored directly: LIWC calculates their frequencies as a proportion (0 to 1) of all words in a text. The pronoun types are personal (first-person singular [e.g., I, me, myself], first-person plural [e.g., we, us, our], second-person [e.g., you, your], third-person singular [e.g., he, she, him, her], third-person plural [e.g., they, them, their]) and impersonal (e.g., it, those). Differences between first-person singular and plural can suggest whether the speaker is being 'exclusive' or 'inclusive' when talking about self-related issues, as often observed in political discourse (de Fina, 1995). The reported internal consistency of the six pronoun types, i.e., the extent to which items of the same type co-occur across texts, ranges from 0.7 to 0.85 (corrected α). LIWC does not detect more pragmatic pronoun meanings that may bear therapeutic implications like agentive I versus non-agentive me (Demiray & Gençöz, 2018; Van Staden & Fulford, 2004). It can nevertheless still give a useful indication of the overall degree of interpersonal reference and engagement in a transcript.
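The frequency-as-proportion scoring described above can be sketched in a few lines of Python. This is only an illustration: the word lists below contain just the example pronouns given in the text, the category names and the function `pronoun_proportions` are hypothetical, and LIWC's actual dictionaries and tokenization are considerably more extensive.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary built from the examples in the text;
# it is NOT the LIWC dictionary, which is far larger.
PRONOUNS = {
    "first_singular": {"i", "me", "myself"},
    "first_plural": {"we", "us", "our"},
    "second": {"you", "your"},
    "third_singular": {"he", "she", "him", "her"},
    "third_plural": {"they", "them", "their"},
    "impersonal": {"it", "those"},
}

def pronoun_proportions(text):
    """Return each pronoun type's share (0 to 1) of all words in the text."""
    # Naive tokenizer: contractions such as "I'd" stay as one token
    # and are therefore not counted as "i".
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, words in PRONOUNS.items():
            if token in words:
                counts[category] += 1
    total = len(tokens)
    return {cat: (counts[cat] / total if total else 0.0) for cat in PRONOUNS}

sample = "I suppose we need to start about how I ended up here."
props = pronoun_proportions(sample)
for category, value in props.items():
    print(f"{category}: {value:.3f}")
```

Because each score is a share of the total word count, transcripts of different lengths remain comparable on these variables.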
We can thus derive multivariate linguistic profiles of different session transcripts according to these variable sets. However, these raw profiles are of limited use because the variable scores are inherently relative. This brings us to the next step of the present approach: to systematically compare and classify sessions within a specific therapist-client dyad. Given that therapeutic progress is seldom linear, how language patterns change dynamically across sessions and their potential therapeutic implications remain underexplored. Huston et al. (2019), for example, used LIWC to compare between good and poor outcomes as well as within each type of outcome, but broadly defined the latter in terms of just two time periods (early vs. late). Cluster analysis, a class of data analytic techniques that classifies similar objects into groups, can be a useful resource to capture language patterns across time in more nuanced ways.
Language and Psychoanalysis, 2020, 9 (1), 4-25. http://dx.doi.org/10.7565/landp.v9i1.1701

Cluster Analysis
Data analytic techniques like principal components, factor, discriminant, and cluster analysis share the key purpose of data reduction; i.e., reducing a large number of cases or variables in some dataset into a smaller number of groups based on similarities and/or relationships. These techniques derive groups where members within each group are maximally similar, and the groups themselves maximally dissimilar from one another. Specific inter-group differences are then often further analyzed. Cluster analysis is particularly used on cases rather than variables. For the present purpose, this means classifying a dyad's sessions into distinct groups/clusters based on their linguistic profiles, with each cluster representing a distinct language style. Cluster analysis methods are either non-hierarchical or hierarchical. The former (e.g., k-means clustering) specifies a hypothesized or desired number of groups beforehand and tries to assign each case to the best-fitting one, while the latter is more exploratory with the eventual groups determined by the data alone. Hierarchical clustering is more relevant here since there is no clear reason to assert a certain number of linguistic styles in any dyad beforehand. There are two sub-types of hierarchical clustering: divisive and agglomerative. As their names suggest, the former starts with all cases as a single cluster and iteratively splits it into smaller ones, while the latter starts with each case as a separate cluster and iteratively merges them into larger ones. Agglomerative clustering is more common due to its lower computational costs and more straightforward application. This paper therefore applies agglomerative hierarchical clustering.
Agglomerative hierarchical clustering in turn involves two key processes which require specific methodological considerations often not critically discussed (Clatworthy, Buick, Hankins et al., 2005). The first is how to measure the similarity/distance between cases. The ideal distance metric depends on the nature of the information characterizing the cases (e.g., quantitative vs. categorical) as well as whether its definition of similarity aligns with theoretical assumptions of the study. The most common distance metrics include Pearson's correlation coefficient, (squared) Euclidean distance, Jaccard's coefficient, and so on. Euclidean distance is presently used because i) the linguistic variables are quantitative, and ii) as opposed to non-Euclidean measures that consider cases as similar if their scores on different variables are correlated, Euclidean measures consider how close the actual variable scores are (Kassambara, 2017). For the present purpose, this would be a better indicator of similar linguistic style; i.e., whether two sessions are closer in terms of analytic thinking/authenticity etc. rather than whether a high/low variable score in one session is also correspondingly high/low in the other.
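The contrast between the two notions of similarity can be made concrete with a small sketch. The three session profiles below are invented for illustration and are not taken from the dataset; the helper functions `euclidean` and `pearson` are written out by hand so that the example needs only the standard library.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two score profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson(a, b):
    """Pearson correlation between two score profiles."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical scores on (analytic thinking, clout, authenticity, emotional tone).
session_x = [10.0, 30.0, 80.0, 30.0]
session_y = [12.0, 33.0, 78.0, 28.0]   # close to X in actual scores
session_z = [30.0, 50.0, 100.0, 50.0]  # X shifted up by 20 on every variable

print("Euclidean X-Y:", round(euclidean(session_x, session_y), 2))
print("Euclidean X-Z:", round(euclidean(session_x, session_z), 2))
print("Pearson   X-Y:", round(pearson(session_x, session_y), 3))
print("Pearson   X-Z:", round(pearson(session_x, session_z), 3))
```

A correlation-based measure treats X and Z as a perfect match because their scores rise and fall together, whereas Euclidean distance identifies X and Y as the more similar pair because their actual scores are closer. The latter is the sense of stylistic similarity intended here.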
The second methodological consideration is the linkage measure. This refers to how the distance between clusters is defined, which cannot rely on Euclidean or other measures between single objects if there is more than one object per cluster. There are again different methods based on how distance between clusters is defined (e.g., minimum distance between permutations of one case from each cluster vs. average of distances between pairs of cases). The most common ones include Ward's method, single linkage, complete linkage, average linkage, and so on, again to be chosen based on theoretical considerations of the study. Ward's method has been applied to linguistic variables (Szmrecsanyi, 2012) and is chosen for its theoretical consistency with standard procedures for computing group differences. By defining the distance between two clusters as the sum of squares between them across all variables, it essentially takes an analysis of variance (ANOVA) approach to clustering. It should also be noted that various studies using different statistical criteria like the Rand index and Cohen's kappa to evaluate clustering outcomes with different types of simulated data have also found Ward's method to outperform others (Blashfield, 1976; Ferreira & Hitchcock, 2009; Hands & Everitt, 1987; Kuiper & Fisher, 1975).
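The sum-of-squares logic behind Ward's method can be illustrated with a minimal sketch. It computes the merge cost as the increase in total within-cluster sum of squared errors (SSE) that would result from joining two clusters, which is the quantity Ward's method minimizes at each step. The toy clusters and the function names are assumptions made for this example only.

```python
def centroid(cluster):
    """Mean of each variable across the cases in a cluster."""
    n = len(cluster)
    return [sum(row[j] for row in cluster) / n for j in range(len(cluster[0]))]

def sse(cluster):
    """Within-cluster sum of squared deviations from the centroid."""
    center = centroid(cluster)
    return sum((v - c) ** 2 for row in cluster for v, c in zip(row, center))

def ward_cost(a, b):
    """Increase in total within-cluster SSE if clusters a and b are merged."""
    return sse(a + b) - sse(a) - sse(b)

# Hypothetical clusters of cases scored on two variables.
cluster_a = [[0.0, 0.0], [0.0, 2.0]]
cluster_b = [[4.0, 0.0], [4.0, 2.0]]
cluster_c = [[1.0, 1.0]]

print("cost of merging A and B:", ward_cost(cluster_a, cluster_b))
print("cost of merging A and C:", round(ward_cost(cluster_a, cluster_c), 4))
```

In this toy case merging A with C inflates the within-cluster variance far less than merging A with B, so an agglomerative procedure using Ward's criterion would join A and C first.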
Different choices of method will generally lead to different clustering outcomes; i.e., group number and/or membership (Saraçli, Doǧan & Doǧan, 2013). As justified above, this paper will only present the outcomes from Euclidean distances and Ward's method, mostly because of its focus on illustrating the utility of clustering in general rather than an empirical comparison between different clustering methods. Readers interested in such comparisons and their implications, as well as accessible technical explanations of different methods beyond the present scope, may refer to Yim & Ramdeen (2015) and Everitt et al. (2011). This paper will detail four exploratory analyses of different therapist-client dyads. It demonstrates the combined use of LIWC and agglomerative hierarchical clustering to i) classify sessions into clusters according to language styles discovered in bottom-up fashion, ii) relate these styles to the chronological progression of sessions, iii) provide entry points for further qualitative analysis, and iv) offer insights for research and therapists' self-reflection.

Data Selection
Transcripts of four illustrative dyads each lasting 20 sessions (472,009 words) were selected from Counselling and Psychotherapy Transcripts, Client Narratives, and Reference Works (www.alexanderstreet.com). This database adheres to the American Psychological Association's ethics guidelines for use and anonymity. The transcripts were checked for formatting and spelling errors, and annotations (e.g., [laughter], [inaudible]) that should not be coded by LIWC were removed. We will see that the overall word count balances the requirements of statistical analysis against tractability for manual discourse analysis.
The four dyads involve different therapists and clients but all clients showed symptoms of depression, anxiety, and low self-esteem. Dyads A and B reportedly followed a client-centered approach where therapists are open, non-judgmental, and empathetic towards clients, and allow them an active role to discover potential solutions (Rogers, 1951). Dyads C and D followed a psychoanalytic approach where therapists explore the unconscious 'inner world' of clients, often in the form of early life experiences, that are deemed to interfere with their present lives (Freud, 1924). It is important to emphasize that this study does not make claims about how language tends to be used with certain therapy modalities, symptoms, or topics. The samples are therefore not controlled as such. It instead showcases an approach that can be used by therapists to analyze the language of their own sessions. For example, a therapist can examine how this language varies within and between clients, over the course of professional development, and/or for research purposes on representative samples. Furthermore, therapist and client utterances will not be separately analyzed, and the language styles therefore reflect a composite of both speakers in the dyad.

LIWC Coding and Agglomerative Hierarchical Clustering
The first phase of the analysis was done with LIWC (2015 version). All transcripts were scored under the four summary variables (analytical thinking, clout, authenticity, and emotional tone) and six pronoun types (first-person singular, first-person plural, second-person, third-person singular, third-person plural, and impersonal) for a total of ten variables. This resulted in four multivariate linguistic profiles with 20 sessions each.
The next phase was to perform agglomerative hierarchical clustering on each profile with the Python programming language (version 3.7.1). As the variables were scored on different scales (the four summary variables from 0 to 100 and pronoun variables from 0 to 1), they were first standardized as z-scores (number of standard deviations from their respective means). Euclidean distances and Ward's minimum variance method were then used as distance and linkage measures respectively. Cophenetic correlation coefficients were used to evaluate the goodness-of-fit of the clustering outcomes. This yielded four dendrograms (Figure 1), one for each therapist-client dyad, which visualize the clusters and the extent of (dis)similarity between them.
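A minimal sketch of this phase is given below. It assumes the SciPy library, which the paper does not name, and uses randomly generated numbers as a stand-in for one dyad's 20 x 10 profile; the variable names are likewise illustrative, so the printed coefficient bears no relation to the reported results.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist
from scipy.stats import zscore

# Synthetic stand-in for one dyad: 20 sessions x 10 variables.
# Real input would be the LIWC scores of the session transcripts.
rng = np.random.default_rng(0)
summary_vars = rng.uniform(0, 100, size=(20, 4))   # scored 0 to 100
pronoun_vars = rng.uniform(0, 0.2, size=(20, 6))   # scored as proportions
profile = np.hstack([summary_vars, pronoun_vars])

# Standardize each variable (column) to z-scores so that scale
# differences do not dominate the distance computation.
z_scores = zscore(profile, axis=0)

# Agglomerative clustering with Euclidean distances and Ward linkage.
merges = linkage(z_scores, method="ward", metric="euclidean")

# Cophenetic correlation: agreement between the dendrogram's merge
# heights and the original pairwise distances.
coph_corr, _ = cophenet(merges, pdist(z_scores, metric="euclidean"))
print(f"cophenetic correlation: {coph_corr:.3f}")
```

The `merges` array can be passed to `scipy.cluster.hierarchy.dendrogram` to draw a tree of the kind shown in Figure 1.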
The final phase was to explore patterns between session sequence and clusters; i.e., how sessions are distributed by language patterns as they progress chronologically. Since each cluster is by definition distinct, it is assumed to represent a distinct language style with a unique combination of variable scores. These variable scores were thus analyzed in detail and supported by illustrative examples from corresponding session transcripts. In this way, we gain a fuller account of how language use in each case evolves across the span of treatment.

Overview of Clustering Solutions
Figure 1 shows the four dendrograms representing the clustering solutions for dyads A-D. The y-axis indicates Euclidean distances, with higher values representing greater dissimilarity. The x-axis indicates session numbers. Euclidean distances are also shown on each node, indicating the similarity between the sessions under that node. The cut-off point of 7.0 indicated by the dotted line yields three color-coded clusters for all four dyads, but the sessions that belong to each cluster expectedly differ for each dyad. Cophenetic correlation coefficients, which measure how well the clustering solution represents the dissimilarities among observations (Sokal & Rohlf, 1962), are all adequate at 0.774, 0.76, 0.786, and 0.733 respectively. We can now claim that each cluster represents a distinct language style, with all sessions in a cluster reflecting that style. Note again that the clusters are not made up of different variables; rather, the ten variables used remain the same throughout, but each cluster is defined by a unique combination of variable scores.

Figure 1
Dendrograms and similarity measures for dyads A-D
Table 1 is an alternative representation of Figure 1 focusing more on the relationship between the clusters and chronological order of sessions. Each color-coded cluster is now represented by a number; for example, the green cluster of dyad A is now called 'style 1', the cyan cluster 'style 2', and the red cluster 'style 3'.
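The step from a dendrogram to a chronological sequence of styles can be sketched as follows. The example again assumes SciPy and uses a made-up, well-separated dataset rather than the actual transcripts; the cut-off of 7.0 is borrowed from the text purely for illustration, and the style numbers assigned by the clustering routine are arbitrary labels.

```python
from itertools import groupby

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical 'true' style of 20 sessions in chronological order.
true_styles = [1, 1, 1, 1, 2, 3, 2, 2, 3, 1, 3, 3, 3, 2, 1, 2, 3, 1, 1, 1]

# Synthetic two-variable profiles: sessions of the same style lie close
# together, sessions of different styles lie far apart.
profiles = np.array(
    [[20.0 * s + 0.1 * (i % 3), 0.1 * (i % 2)] for i, s in enumerate(true_styles)]
)

merges = linkage(profiles, method="ward", metric="euclidean")

# Cut the tree at height 7.0: sessions joined below the cut share a cluster.
labels = fcluster(merges, t=7.0, criterion="distance")

# Collapse the chronological label sequence into runs of the same style.
runs = [(int(style), len(list(group))) for style, group in groupby(labels)]
for style, length in runs:
    print(f"style {style}: {length} consecutive session(s)")
```

Reading the runs in order gives the kind of overview offered by a session-by-style table: long runs indicate stretches of stylistic consistency, and runs of length one indicate intermittent switching.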

Since each dyad was individually clustered, styles 1, 2, and 3 of dyad A are likely not equivalent to their counterparts in the other dyads. Furthermore, since the dyads all discuss different subject matters, it is more useful to contrast styles within rather than across dyads. We may nevertheless begin the interpretation with a broad overview of the latter. In Table 1, the green blocks represent at least three consecutive sessions where the same style is used. The uncolored blocks show an otherwise intermittent switch between styles. We quickly observe that in terms of stylistic consistency, dyad A shows the least, and only towards the beginning of therapy. Dyad B has intermittent blocks, and dyads C and D have relatively more contiguous consistent blocks. We will interpret and exemplify these style distributions with short extracts of actual therapy talk from each dyad below. Together, they outline four different distributions of language patterns that showcase how the present approach is useful for comparative and self-monitoring purposes.

Dyad A
Figure 2 shows the variable scores defining the three styles in dyad A. Although the five personal pronoun types (first-person singular [e.g., I, me, myself], first-person plural [e.g., we, us, our], second-person [e.g., you, your], third-person singular [e.g., he, she, him, her], and third-person plural [e.g., they, them, their]) were analyzed in the clustering process, they are summarized in the following figures as just one category of personal pronouns (ppron), versus impersonal pronouns (ipron) as a whole. The variable scores are also plotted with 95% CI error bars. Inferential statistics to compare them between the three styles were not performed due to the relatively low number of sessions per style.

Figure 2
Style properties of dyad A
It is interesting to compare these scores with mean scores from other discourse contexts like blogs, expressive writing, novels, natural speech, the New York Times, and Twitter (Pennebaker et al., 2015), collected by LIWC developers and reproduced in Table 2. One might expect the scores for psychotherapy talk to be close to natural speech but in general, only analytic thinking and pronoun use in the present dataset are similar. Otherwise, there are noticeably lower levels of clout and emotional tone, and higher levels of authenticity than natural speech. Clout and authenticity are instead more comparable with blogs and expressive writing, and emotional tone with expressive writing and novels. We will not elaborate on these comparisons here, but note that this type of analysis offers a new perspective on the discourse analytic question of just how psychotherapy is different from 'ordinary' conversation (Ferrara, 1994; Mondada, 2010).
Going back to dyad A, recall that style 2 dominates the beginning phase of therapy (sessions 2 to 6). The sessions thereafter reflect an intermittent switching between the three styles. We now see from Figure 2 that compared to styles 1 and 3, style 2 has a mid-level of analytic thinking (17.1), clout (34.8), and authenticity (76.3), low-to-middle pronoun use, and the highest emotional tone (31.4) by a narrow margin. These suggest that on the linguistic level, the present course of therapy begins with a moderate approach. It is relatively informal with a narrative rather than logical style, which appears to be consistent with the avowed client-centred approach of facilitating an open-ended dialogic environment. This will later develop in either direction to become even more informal or more formal. The same goes for clout as we observe a considerable positive correlation between the two variables (r = 0.606, p = 0.005). At the beginning phase the speakers do not immediately assert a high degree of expertise and confidence. Later on, this likewise moves in both directions as therapy proceeds.
Authenticity also starts at mid-level but is thereafter negatively correlated with analytic thinking (r = -0.459, p = 0.042) and clout (r = -0.776, p < 0.001). As with the previous aspects, the language becomes more honest and disclosing at some points, and more guarded and distanced at others, over the course of therapy. Emotional tone fluctuates minimally, but the beginning phase is slightly more positive and upbeat than later. Pronoun use across the styles does not vary by more than 2%.
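As a rough consistency check on the reported correlations, a Pearson r can be converted to a p-value with the usual t-transformation. The sketch below assumes that each correlation was computed across the dyad's 20 sessions with a two-tailed test, which the text implies but does not state; it also assumes SciPy, and `pearson_p_value` is a hypothetical helper name.

```python
import math

from scipy import stats

def pearson_p_value(r, n):
    """Two-tailed p-value for a Pearson correlation r over n paired cases."""
    # Under the null hypothesis of zero correlation,
    # t = r * sqrt((n - 2) / (1 - r^2)) follows a t distribution
    # with n - 2 degrees of freedom.
    t_stat = r * math.sqrt((n - 2) / (1.0 - r ** 2))
    return 2.0 * stats.t.sf(abs(t_stat), df=n - 2)

n_sessions = 20  # assumption: one observation per session in the dyad
for label, r in [
    ("analytic thinking vs. clout", 0.606),
    ("authenticity vs. analytic thinking", -0.459),
    ("authenticity vs. clout", -0.776),
]:
    print(f"{label}: r = {r:+.3f}, p = {pearson_p_value(r, n_sessions):.4f}")
```

Under these assumptions the computed p-values fall in the same range as those reported for dyad A.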
The following extracts provide a brief contextual illustration of these trends. The first extract (style 2) occurs near the beginning of therapy. The mid-level analytic thinking, clout, and authenticity can be discerned from the somewhat vague nature of the discussion. Both therapist and client appear to be figuring things out and exploring what would be important to talk about, not (yet) committing to a specific mode of therapeutic analysis. The emotional tone is noticeably lower than in other discourse contexts like blogs, natural speech, and social media (see Table 2) but slightly higher than in the remainder of the sessions, when the client's issues are discussed in detail.
Therapist: I will go through my usual beginning spiel which is I don't have anything particular that I need to know. I'd like you just to start where you feel like starting.
Client: You want me to start where I feel like starting, wherever I feel like starting I take it. Well, I suppose that's real direct. I don't really know where I'd like to start either. I wouldn't know where to start. I suppose we need to start about how I ended up here.
The next extract illustrates style 1, which is even more informal and personal than style 2 (lower analytic thinking and clout, higher authenticity). As mentioned, style 1 occurs intermittently with styles 2 and 3 after the initial stretch of style 2. We can observe a high level of personal disclosure as the client discusses his drug habits, and a more casual mode of interaction as both therapist and client use highly informal language (bum, whatever, oh god, christ).
Therapist: There's something, it sounds like, about these evenings feeling tired, tense, or whatever.
Client: Oh God I haven't slept in three days. I just... I get in bed and I start worrying about something or other. And I've got to get to sleep. Christ I get up at 5:30 in the morning and I'm used to going to bed at 5:30.
The final extract from dyad A illustrates style 3, which is the converse of style 1 (higher analytic thinking and clout, lower authenticity). Personal pronouns increase but impersonal pronouns decrease, suggesting a shift in focus towards the client's self and relevant others. We can observe a more analytic approach as the therapist performs his institutional role: explicating his inference and interpretation of what the client said, and providing insight on the client's feelings.
Client: Right, I just... well, I can't say I can imagine but I know that Matt must be extremely terrified. Even if it has come to the point where he can take it calmly, he is still scared. He has not talked to me like that but he has talked to Jack Larkin about how terrified he is - constantly. It would almost be proving the non-connection of those things around him, to him, if we were to just throw him out and forget about him.
Therapist: You would be just reinforcing his view of the world is. I guess I also... this is an inference I am making - I don't remember you saying this - but it sounds to me that there is something very horrible to you to think of a person alone with that terror all of the time. Like I said before, you don't know what effect it has - having him staying there - but somehow if it has any, it is worth doing.
In summary, the clustering and contextual analysis of dyad A reveals a particular interactional style: a relatively prolonged and moderate approach at the beginning to set the scene, followed by a more linguistically varied approach where both ends of the variable spectra are manifested. Such information is of potential interest to therapists wanting to reflect on their own interactional style, as we continue to show below for the remaining dyads.

Dyad B
Figure 3 shows the variable scores defining the three styles in dyad B.

Figure 3
Style properties of dyad B
Dyad B begins similarly to dyad A in that one style predominates in the beginning phase (style 1). In dyad B, however, the same style 'returns' towards the end of the 20 sessions (sessions 15 to 18), with intermittent style switches in the middle phase. We see from Figure 3 that style 1 has the highest level of analytic thinking (12.2) and clout (33.6), lowest authenticity (74.5), and low emotional tone (36.5). While impersonal pronoun use is also the lowest for style 1, pronoun use in general does not vary by more than 2% across the styles. It is interesting to further note that similar to dyad A, clout is negatively correlated with authenticity (r = -0.728, p < 0.001), but its positive correlation with analytic thinking is not significant (r = 0.352, p = 0.129).
We can tell from this general overview that language use in dyad B differs from dyad A in two ways that are worth further contextual investigation.Firstly, while dyad A begins with a moderate style, dyad B begins with a more formal and analytic style that was seen only intermittently in the later stages of dyad A. Secondly, consistency is seen at both ends of the treatment span in dyad B, which may dovetail with a therapeutic strategy (deliberate or otherwise) that can be described as 'letting the conclusion mirror the introduction'.These linguistic differences are observed despite both dyads ostensibly following a client-centred approach, cautioning us against overly generalizing linguistic patterns as a function of therapeutic modality without a more comprehensive dataset designed for that purpose.
This consistency at both ends is illustrated by the following two extracts from sessions 1 and 18 respectively. The first extract from session 1 suggests that the therapist immediately adopts an analytical style that is echoed by the client. In the first turn the therapist summarizes what the client had previously said, providing an analysis of the situation. The client concurs and elaborates on her feelings, although in a general way without full disclosure of details. The therapist again provides an interpretation in the following turn. This exchange clearly contrasts with what we saw in dyad A, where the therapist says "I don't have anything particular that I need to know" and the client replies "I don't really know where I'd like to start either".
Therapist: And it sounds like that's a signal of something wrong when you can't make a clear decision to go do something you really love to do. Like that really says that's really sad for me.
Client: Yeah. Because that's - then that's part of the - that just repeats that into the whole problem of I mean part of I'm sure what's causing it is all of the feeling that you know here I am really just sitting around and going over not doing anything that I really feel is worthwhile. And just really wasting time. And as a result I just sit around and waste more time. It's just very strange thing.
Therapist: It's like everything that happened that piles another thing on top of that. As if nothing happens to break it or to break into it or to loosen it at all for you. But it just all becomes an additional weight. Is that what you were saying?
The subsequent middle phase for dyad B is similar to dyad A where we see some cycling between the three styles and the continua of analytic thinking, clout, authenticity, and emotional tone. In many instances this is the phase where therapists engage closely with clients' experiences, memories, thoughts, and feelings, which corresponds to a general picture of inconsistency in language styles. However, the following extract from session 18 illustrates how the previous analytic style returns in the final stretch of sessions.
Client: And in fact I could sort of conceive of some kind of an ideal where you can work through your feelings, the more you can talk them out.But I'm not sure of what it consists of and I'm not sure how to go about reaching out to him.
Therapist: Yeah, and it sounds like right now too you don't see, I mean, you feel like the feelings themselves ought to change before you could start to work them through.At least that's the impression I'm getting is that they ought to be somewhat different maybe in intensity or something.It isn't just a matter of learning how to do it.
Client: Yeah, they ought to be less intense and I ought to express them in different ways.
We observe that, expectedly, the client can now express some insight on how she should work through her feelings. The style of language and interaction nevertheless mirrors the beginning as the therapist still plays an analytic and interpretative role. It would be interesting from the perspective of training and feedback (Claiborn & Goodyear, 2005) to query if this represents a deliberate or subconscious attempt by the dyad to revisit a certain mode of language and interaction.

Dyad C
Figure 4 shows the variable scores defining the three styles in dyad C.

Figure 4
Style properties of dyad C

We first observe that the correlations between analytic thinking, clout, and authenticity seen in dyads A and B do not exist for dyad C. While this implies that the variable scores for dyad C are relatively eclectic, the dyad also has the longest stretch of stylistic consistency so far. Style 1 predominates for up to three quarters of the sessions with a short three-session switch to style 3 in the middle phase (sessions 9-11). The final quarter of sessions is then intermittent in style. Figure 4 shows that style 1 is relatively low in analytic thinking and clout but high in authenticity and emotional tone. Style 3 is also low in analytic thinking and high in authenticity. However, it sees a large increase in clout and decrease in emotional tone from style 1. Style 2 has large error bars as it defines only two sessions. Overall, we can discern interesting similarities and differences with the previous two dyads that underline the comparative import of the present approach. Dyad C bears some structural resemblance to dyad B as it begins and ends with the same style, but differs in that it has an additional consistent stretch in the middle - a 'transition block' of sessions so to speak. However, the nature of the styles is neatly reversed. The beginning and end of dyad B are more formal and analytic, but those of dyad C are less formal and analytic compared to the remaining sessions. On the other hand, while dyad A does not have a consistent beginning and end, the quality of its beginning resembles dyad C.
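The style properties plotted in the figures are per-style means with error bars, so a style that defines only a few sessions is summarized less precisely. The following is a minimal sketch of that summary step, assuming each session already carries a style label from the clustering step. The scores, the `sessions` list, and the `style_profile` helper are hypothetical illustrations and not data or code from the study.

```python
from statistics import mean, stdev

# Hypothetical per-session scores (not data from the study): each session
# has a style label from the clustering step and one LIWC summary score.
sessions = [
    {"style": 1, "clout": 40.0},
    {"style": 1, "clout": 44.0},
    {"style": 1, "clout": 42.0},
    {"style": 1, "clout": 46.0},
    {"style": 2, "clout": 30.0},
    {"style": 2, "clout": 70.0},
    {"style": 3, "clout": 80.0},
    {"style": 3, "clout": 84.0},
    {"style": 3, "clout": 82.0},
]


def style_profile(rows, variable):
    """Return {style: (n, mean, sd)} for one variable, grouped by style."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["style"], []).append(row[variable])
    profile = {}
    for style, values in sorted(grouped.items()):
        # The sample SD needs at least two sessions; report None otherwise.
        sd = stdev(values) if len(values) > 1 else None
        profile[style] = (len(values), mean(values), sd)
    return profile


profile = style_profile(sessions, "clout")
for style, (n, m, sd) in profile.items():
    print(f"style {style}: n={n}, mean={m:.1f}, sd={sd}")
```

In this toy example the two-session style has by far the largest standard deviation, which is the kind of situation that produces wide error bars for a sparsely populated style.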
The following extracts illustrate the general structural character of dyad C - a beginning and end that are relatively informal and personal, sandwiching a transition block in the middle that becomes more authoritative and emotionally negative. The first extract occurs at the beginning (session 1) where the client takes considerable time discussing her recent experience at a career fair. The description is narrative-like, and the therapist follows this conversational style without adopting an expert stance.
The next extract occurs towards the end (session 16) where the conversation returns to events in the client's life. She had been talking about difficulties with opening an account with a brokerage firm after her impending graduation, to which the therapist responds in a similarly informal style, concurring with what was said.
Client: That was hard and so now I feel even worse about not doing it, because now I don't have the excuse of well, I've been under a lot of pressure and I can't - don't have the energy to cope with it.
Therapist: Right. Right, so it - I mean different task, same shit.
Therapist: And ah, yeah, and I guess part of the whole paradigm is that not only is it different task, same shit, but it's different task, same shit, no accounting for the task that you've just gotten done.
The quality of these extracts can be contrasted with the following, taken from the transition phase (session 9), which considerably increases in clout and decreases in emotional tone (style 3). Consistent with the avowed psychoanalytic approach, this is the period when the client discusses her marriage problems in detail, accounting for the substantial drop in tone. The therapist becomes less informal in his responses to the client. Although there is no obvious increase in analytic thinking, he begins to adopt a more explicit expert stance instead of the more agreeable tone observed earlier. We can see this in the final turn where he dispenses concrete advice and makes reference to his professional identity.
Therapist: This is very sad and hurtful. It wouldn't surprise me that you were bitter.
Client: No. But I should get over it. That's what I keep telling myself. I'm like well that could be dumb.
Therapist: For what it's worth I mean, and of course I'm going to say this, you're going to have better luck getting over it talking about it and looking at it than being annoyed with yourself and sweeping it under the rug.I mean I know spoken like a but I do believe it's true.

Dyad D
Figure 5 shows the variable scores defining the three styles in dyad D.

Figure 5
Style properties of dyad D

Dyad D also reflects a distinct pattern of style distribution. The first half (sessions 1 to 11) is dominated by a large block of style 1 followed by three sessions of style 2. The second half then switches to an intermittent series featuring all three styles. We could describe this as a therapeutic 'tale of two halves' where an initial display of consistency is distinct from the subsequent display of inconsistency. The consistent half begins with style 1. It scores relatively high for analytic thinking, emotional tone, and clout, but low for authenticity and personal pronouns. Analytic thinking appears to be negatively correlated with authenticity, but the correlation is not statistically significant (r = -0.323, p = 0.371). However, unlike other dyads, it is positively correlated with emotional tone (r = 0.624, p = 0.003). This suggests that while neither emotional tone nor analyticity is high overall, a more formal/logical way of speaking co-occurs with moments that are relatively positive. The transition to style 2 towards the middle is noteworthy for the abrupt decline in analytic thinking, clout, as well as emotional tone, and a corresponding increase in authenticity. This suggests a switch to an even more narrative style as the client begins to elaborate on his experiences, as expected under a psychoanalytic approach. Unlike dyad C, however, the sharp drop in tone is not accompanied by the therapist using more authoritative language to address the issues.

Language and Psychoanalysis, 2020, 9 (1), 4-25. http://dx.doi.org/10.7565/landp.v9i1.1701
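The distinction drawn above between an apparent correlation and a statistically significant one can be illustrated with a small sketch. It computes Pearson's r across sessions and estimates a two-sided p-value by permutation. The permutation test is swapped in here only to keep the example self-contained; it is not claimed to be the test used in the study. The session scores and the function names `pearson_r` and `permutation_p` are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p(x, y, n_perm=5000, seed=0):
    """Two-sided permutation p-value for the observed Pearson r."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        # Shuffling y breaks any session-wise pairing with x.
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an exact zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical per-session scores for ten sessions (not data from the study).
analytic = [30.1, 25.4, 41.2, 38.0, 22.5, 35.7, 28.9, 44.3, 33.0, 27.6]
tone = [45.0, 38.2, 60.1, 52.3, 35.9, 55.0, 41.7, 63.8, 49.5, 40.2]

r = pearson_r(analytic, tone)
p = permutation_p(analytic, tone)
print(f"r = {r:.3f}, permutation p = {p:.4f}")
```

With only a handful of sessions per dyad, a moderate r can easily fail to reach significance, which is why the sign of a coefficient alone is not interpreted above.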
The following extract from session 9 (style 2) instead suggests that during this phase, the client begins to take extended turns as he narrates his negative situation with observed long pauses. We also observe a greater sense of uncertainty in his reflection on the reasons for his anxiety, accounting for the decline in clout.
Subsequent to the short stretch of style 2 with extended client turns as seen above, the remaining sessions for dyad D cycle between the three styles. The switch between styles 2 and 3 from sessions 12 to 18 is of some interest. From the figure we see that analytic thinking continues to drop and authenticity continues to rise from style 2 to style 3. This suggests a continuation of narrative-like talk, accompanied by a curious rebound in clout. The following extract from session 12 (style 3) offers some explanation. While the client continues to narrate with uncertainty, the therapist begins to assume a more authoritative stance. Crucially, however, this stance is relatively less prolonged than for other dyads as reflected in the intermittent switches between styles 2 and 3.
Client: And then you just couple it all with I'm just - I don't want to say I'm paralyzed, but I'm just so - my motivation is just not there.

Conclusion
This paper demonstrated the combination of computerized text, clustering, and manual analysis to gain insights about evolving language styles in psychotherapy. For more research-oriented purposes, the relationship between style and session sequence is an underexplored perspective on language variation in addition to more familiar factors like therapist/client groups, modalities, and cultural contexts. There is also potential for studies to investigate associations between evolving language styles and different therapy outcomes. In terms of reflection and training on a more personal level, the present approach allows therapists to critically reflect upon their own language use with modest samples collected from personal practice. It gives therapists greater awareness of (linguistic) changes that can take place within the context of a single client rather than across clients.
An important methodological point about this approach is its compatibility with traditional approaches like conversation and discourse analysis. Having determined the clusters, we may choose to qualitatively scrutinize them in different ways. The four dyads illustrated different degrees of consistency and variation in language styles. All four opened with a stylistically consistent stretch of sessions, but the exact nature of these styles varied across dyads in being more or less analytic and authentic. Furthermore, there is variation in how long this consistency is maintained, ranging from several sessions to the entire first half of treatment. In the case of dyad B we also observed a 'return to consistency' as the closing stretch mirrored the initial stretch. These examples showcase an open-ended range of possibilities, each of which warrants further analysis in its own right. It is worth reiterating that the present study makes no claims about language tendencies in particular therapy modalities.
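The notion of a 'stylistically consistent stretch' can be made explicit by collapsing a dyad's chronological sequence of style labels into blocks of consecutive sessions that share a style. The sketch below is one possible way to do this. The `style_blocks` and `longest_block` helpers and the example sequence are hypothetical and are not taken from the study.

```python
from itertools import groupby


def style_blocks(styles):
    """Collapse a per-session style sequence into (style, first, last) blocks.

    Sessions are numbered from 1 in chronological order.
    """
    blocks = []
    session = 1
    for style, run in groupby(styles):
        length = len(list(run))
        blocks.append((style, session, session + length - 1))
        session += length
    return blocks


def longest_block(styles):
    """Return the longest consecutive block; ties go to the earliest one."""
    return max(style_blocks(styles), key=lambda b: b[2] - b[1] + 1)


# Hypothetical style sequence for a 12-session dyad (not taken from the study).
sequence = [1, 1, 1, 1, 2, 3, 3, 3, 1, 2, 1, 1]
print(style_blocks(sequence))
print(longest_block(sequence))
```

Comparing the first and last blocks of such a summary is one simple way to check for a 'return to consistency' of the kind described for dyad B.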
There are several limitations to the study to be addressed in future work. Firstly, given enough sessions, more detailed time series analyses with ARIMA models or other approaches can be conducted to depict how each summary variable changes on a per-session basis (Tay, 2017, 2019). This would complement the clustering approach, which is more concerned with grouping sessions than modeling the entire time series, and offer more clarity to present observations of structural resemblances (e.g., between dyads B and C). Secondly, the present study considers language style as a composite of therapist and client language, but future research could separately examine and compare the two. This would allow more detailed insights into the relationship between language use and the expectations imposed on therapists/clients in particular modalities; for example the 'non-directive' client-centred therapist. It would also be a relevant approach to investigate the notion of 'synchrony' (Koole & Tschacher, 2016); i.e., how therapist and client behavior echo each other. The final limitation is the inherent inability of LIWC to capture figurative language such as metaphor and irony.
Although figurative language occupies a small proportion of general language use (Steen et al., 2010), it was explained in the introduction that metaphors can be particularly impactful in psychotherapy. This is one reason why the present approach complements but does not replace qualitative methods of therapeutic language analysis. More generally, it is also incumbent upon computerized methods to enhance their validity and reliability for more insightful analysis of pragmatically complex contexts like psychotherapy.

Table 1
Overview of style distributions across the dyads

Figure 2 shows the variable scores defining the three styles in dyad A and a reproduced style distribution for easy reference. The mean/standard deviation refer to the average variable scores/standard deviation of the sessions that belong to that style. For clarity of presentation, although all five personal pronoun variables (first-person singular [e.g., I, me, myself], plural [e.g., we, us, our], second-person [e.g., you, your], third-person singular [e.g., he, she, him, her]

Table 2
Average variable scores from other discourse contexts

Therapist: I want to hear what you're saying. You wouldn't like it if you did?
Client: Well, no I wouldn't. I wouldn't, definitely. I've always... One of the reasons that I ever considered it OK to me, for me, to use marijuana, is that it wasn't an escape and I wasn't just freaking out somewhere and letting reality trip on by, and it'd really bum me out if it ever became that to me.
Client: I feel like the projects I worked on for my imaging class and my circuitry class were definitely worth talking about, but they were class projects not a thesis. And I feel like everyone's - I don't know, maybe I shouldn't be going in expecting everyone to ask me about my thesis instead of, you know, what are you interested in or what have you done. But it seems to me that the obvious thing to talk about is grants and your thesis and not like a general project for a class.
Therapist: I don't know, you tell me. You're probably closer to this - surely closer to this than me. But I imagine what they really want to know is how you can talk about what you've done and show that you can think.
Therapist: So, do you know what was making you so anxious that it was hard to work yesterday?
Client: I don't know. I mean, I think it's (pause), since I did the affidavit and sent it off that, I don't know, because I like, I had the meeting and then I did the affidavit and that was enough work for the day. You know, just wanted to kind of go home and sleep. And as I'm driving home I remembered about the envelopes, the letters. And, I don't know, it's not like it's, the anxiety yesterday wasn't as bad as it has been (long pause). It really sucks about the $3,100 for the car. I had to ask my parents for 1,500 for the car. And right now I'm spending all the money for the past tickets. So I'm supposed to have a closing with the bank, I don't know when the hell that's supposed to happen. Just no other business coming through the door, you know.
(long pause) I don't know if I'm so tired because I'm getting up early every day this week; semi-early. I get to sleep in tomorrow. (long pause) I may have to pick up Ian from school today which means that the work I'm supposed to be doing I can barely do. Most rational people would do it when they got home and I, I just can't. Once I hit the apartment I'm, I haven't been bringing my laptop home. I've been leaving it in the office. I've been checking access through home. I've got a desktop that's really slow, but I can, you know, I can go to my PC I can access. And I kind of joked with myself that I would do the letters that way, that I would at least get the addresses done.
Therapist: Well, I think you're really worried about getting deeper and deeper into a hole financially again. I think it's really hard to step back at all from that worry so that you have some space to think and I think you just shut down. You know, you are - trying to think - to me, doing a lot more stuff. I mean you're really able to focus a lot more on work. And, you know, I remember us talking not that long ago about all of this networking stuff. And you're saying like, "Yeah, I know I should be doing this, I know I should be doing that. I just can't get myself to do it." Well, now you can. I mean you really have been - it's been gradual, but you've really been ramping up a lot over the last, I guess, nine months or so, especially in terms of just what you can do. I mean your capacity to work has really increased a lot. I remember when it was, you know - you weren't doing any of the networking. And I guess you're doing E&G but...
a thing online about short sales, and I'm just going to - you know, it's like an hour long; figured I'd watch that because it's free. But just - I don't know. I'm just so - I don't know. I just don't have the motivation to do stuff, to do my work. Put everything off until the last second. Yeah.
Therapist