This essay is a response to Kaufman’s reservations about treating social factors as causes within the restricted potential outcomes framework (RPOF) and about using them for surveillance purposes. Specifically, I respond to Kaufman’s critiques of the statistical modeling and interpretation of social factors, using race as an example. The first section defines social factors. The second section lays out Kaufman’s critiques alongside my rebuttals, which aim to show that these reservations are not well-founded because they rest on inconsistencies.
Section I: Definition
I will define social factors following Durkheim 1895 and then relate the components and implications of the definition to other authors where needed. Durkheim defined social factors (which he called “facts”) as: Every way of acting which is general through a given society while simultaneously existing in its own right, independent of its individual manifestations. A major implication of this definition is that what is present in the parts of a society (individuals or groups of individuals) is present there because it is present in the society as a whole (i.e., the collective). In the context of epidemiology, Diez-Roux 1998 called this holistic determination, or determination of the parts by the whole: an individual’s risk of adopting a certain behavior is influenced by the prevalence of that behavior in their society or social group (Diez-Roux pg 1029).
Another implication is that a given property or characteristic of the collective or social group cannot be replaced with the sum (or some other summary value) of that characteristic across its parts. There are at least two reasons for this lack of equivalence between the whole and the sum of the parts. First, the whole is not just the sum of the parts, as it also includes the “product of actions and reactions” between the parts in the collective (Durkheim pg 9). In the context of epidemiology, this means that when considering the health outcomes of a collective, one has to consider both the manifestations of a social factor in the individuals in the collective and how those individuals influence each other’s manifestations of that factor. Kaufman, a skeptic of social factors as causes within RPOF, noted that interactions between study units (e.g., individuals) are a source of indirect effects of public health policy interventions, and that the full range of indirect effects is challenging to estimate, but that these effects would certainly be different (greater or smaller) from the sum of the intended direct effects of the interventions on the study units (Kaufman pg 362-363). Second, Durkheim noted that the social factor for the social group is distinct from its manifestations in individuals, such that there is a dissociation between the social factor for the whole and that for the parts (Durkheim pg 7: “... none of these can be found entirely reproduced in the applications made of them by the individuals since they can exist even without being actually applied”). Relatedly, for epidemiology, Rose 1985 noted that a shift in the whole distribution of a characteristic requires a mass influence acting on the population as a whole. Hence, the determinants of disease incidence or prevalence for a population are not necessarily the same as the causes of individual cases of disease (Rose pg 34).
From the above arguments, it is evident that what Durkheim called a ‘collective’ is what social epidemiologists, including Diez-Roux and Rose, considered a social group’s population. The individuals in the population will have values for a characteristic (e.g., race or gender), but the population itself will also have a form of the characteristic that might be distinct from the individual manifestations. Such a distinction is easier to see for characteristics like economic inequality, which have no individual-level manifestation and are only meaningfully interpretable at the population level. It is harder to see for characteristics like race, which can have both individual-level and population-level impacts on the outcomes under study. For instance, there might be a general societal experience shared by individuals belonging to a racial group that varies somewhat across individuals within that group based on other individual-level characteristics. On the other hand, the racial group’s population as a whole might face a constraint (in Durkheim’s terms) from the rest of society, regardless of the differences in the individual manifestations. In such a case, the population-level experience would be expected to differ between racial groups.
The sociological view of population differs from that adopted in both ‘causes of effects’ and ‘effects of causes’ frameworks in epidemiology, where a population of study units (e.g., individuals) has a characteristic because at least some of the study units possess that characteristic. Population, in such frameworks, is used interchangeably with groups of individuals. Hence, the sum of some characteristic of the parts would be equal to the aggregate value of the characteristic and it would be considered the “population value”. For the remainder of this essay, I use population strictly to mean a social group’s population per the view adopted by social epidemiologists.
Section II: Critique and Rebuttal
Kaufman’s critiques of social factors as causes are based on RPOF. In the previous essays, I have defined causes and causal effects under RPOF. In brief: causes are humanly feasible, well-defined interventions whose causal effects are identifiable, under certain assumptions, from contrasts of their potential outcomes (see Vandenbroucke et al. 2016 for a summary). The identifiability assumptions include exchangeability, positivity, and consistency of potential outcomes, although others may be needed depending on the specific context, study design, and analysis. As noted by Vandenbroucke et al., RPOF follows an interventional paradigm in that it requires causes to be manipulable by well-defined interventions that are, in turn, actionable by humans. While Kaufman presented several critiques as to why social factors cannot be considered causes within RPOF, here I will focus on the critique of the statistical modeling and interpretation of social factors. For causal inference, the main critique is that social factors such as race cannot be studied as causes under RPOF since they are not manipulable through well-defined interventions. As noted by Vandenbroucke et al., RPOF’s insistence that interventions be humanly feasible is too restrictive even compared to other PO frameworks, including the contrastivist view that is closely related to RPOF in the family tree of causal inference perspectives. This restriction prevents epidemiologists from asking important epidemiological questions and investigating relevant exposures. Further, there is no clear demarcation between what is and is not humanly feasible, since what is infeasible can change over time or differ across contexts. Most importantly, insofar as causal inference is a scientific concern in epidemiology, without any obligation for public health action, there is no need for the interventions to be humanly feasible.
Hypothetical interventions satisfy all the identifiability assumptions for causal effects. If we assume a hypothetical way to manipulate the factor, i.e., an in-principle intervention, under some other POF such as Woodward’s interventionist view (see Vandenbroucke figure 1 on pg 1780), which is closely related to RPOF, we can estimate the causal effects of social factors as causes.
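For readers who prefer notation, the identifiability conditions named above can be written compactly. This is a sketch in standard potential outcomes notation, not notation taken from Kaufman or Vandenbroucke et al. The symbols are my own assumptions: A is a binary exposure, Y the outcome, L a set of measured covariates, and Y^a the potential outcome under exposure level a.

```latex
\begin{align*}
\text{Causal contrast:}\quad  & E[Y^{a=1}] - E[Y^{a=0}] \\
\text{Exchangeability:}\quad  & Y^{a} \perp\!\!\!\perp A \mid L \quad \text{for each } a \\
\text{Positivity:}\quad       & P(A = a \mid L = l) > 0 \quad \text{whenever } P(L = l) > 0 \\
\text{Consistency:}\quad      & Y = Y^{a} \quad \text{if } A = a \\
\text{Identification:}\quad   & E[Y^{a}] = \sum_{l} E[Y \mid A = a, L = l]\, P(L = l)
\end{align*}
```

The last line follows when the three conditions hold together. Nothing in these expressions requires that setting A to a be humanly feasible; they only require that Y^a be well defined.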
More philosophically, RPOF assumes that social factors are not manipulable since a human action cannot assign or change an individual’s race. Such a view removes the social context that, in reality, assigns an individual’s race (and perhaps even manipulates it over time and across settings). Durkheim noted this when he considered ways of existing to emanate from ways of acting that had become more crystallized based on social context: “structure of the society is merely the way in which its component segments have become accustomed to live with one another” (Durkheim pg 12). Such a view implies that the present ‘states’ of social factors have been reached through historical and sociopolitical processes manipulating the past states of the given factor. Regardless, the definition of social factors adopted in this essay includes action (“ways of acting”). Perhaps this sociological view is not entirely lost on Kaufman, who noted that a social factor like race “... is malleable in the perceptions of others” (Kaufman pg 366). Kaufman presented examples of experimental studies that estimated the causal effects of race on several outcomes, including psychiatric diagnoses. In these studies, race was not manipulated in the sense of being changed through human action. Rather, awareness of race was assigned to be ‘present’ or ‘absent’ by the experimenter. Hence, it was an assignment and not a manipulation. By Kaufman’s own admission, then, assignment, and not manipulation, is what is needed for estimating causal effects. I have argued above that, in reality, race is assigned by the local social context. Hence, there can be questions and study designs employing time-dependent or place/setting-dependent contrasts, as exemplified by Harper & Strumpf 2011, that can help estimate the causal effects of social factors such as race.1
For surveillance purposes, Kaufman’s main critique was that the adjusted measure for the health outcome does not represent the distribution of the outcome in a “real world population”. To be clear, the critique was not that the contrast is ill-defined. In fact, Kaufman explicitly stated the interpretation of a ratio measure for black-white contrasts adjusted for covariates or standardized across age categories (Kaufman pg 364-365). The critique was that such a contrast cannot be meaningfully interpreted in the real world since it does not correspond to a real-world population. This critique relies on viewing the population solely as a group of individuals. However, if one considers the population to be a collective, then the contrast is meaningful for the collective. The population-level interpretation of the adjusted measure is for a hypothetical state of that population with a given covariate distribution. This interpretation is no different from the one drawn for individual-level predictions from a multivariable regression model used for associational inference. Hence, Kaufman (or RPOF) does not account for the fact that a characteristic can have an interpretation at the population level that is distinct from its individual-level manifestations, and this omission leads to the incorrect conclusion that adjusted measures for social factors are ill-suited for surveillance purposes.
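To make the "hypothetical state of the population with a given covariate distribution" concrete, here is a minimal sketch of direct age standardization. Everything in it is an assumption for illustration: the group labels, the two age strata, the rates, the population sizes, and the standard weights are all invented and do not come from Kaufman or any dataset.

```python
# Direct standardization sketch. All numbers below are invented for
# illustration only; they are not estimates from any real population.

def crude_rate(rates, sizes):
    """Overall rate in a group, weighting strata by the group's own sizes."""
    total = sum(sizes.values())
    return sum(rates[s] * sizes[s] for s in sizes) / total

def standardized_rate(rates, standard_weights):
    """Rate the group would have under a shared (hypothetical) age distribution."""
    return sum(rates[s] * standard_weights[s] for s in standard_weights)

# Hypothetical stratum-specific outcome rates for two groups.
rates_a = {"young": 0.010, "old": 0.050}
rates_b = {"young": 0.012, "old": 0.060}

# Hypothetical age structures: group B is older than group A.
sizes_a = {"young": 8000, "old": 2000}
sizes_b = {"young": 4000, "old": 6000}

# Hypothetical standard age distribution applied to both groups.
standard = {"young": 0.5, "old": 0.5}

crude_ratio = crude_rate(rates_b, sizes_b) / crude_rate(rates_a, sizes_a)
std_ratio = (standardized_rate(rates_b, standard)
             / standardized_rate(rates_a, standard))

print(f"crude ratio:        {crude_ratio:.2f}")
print(f"standardized ratio: {std_ratio:.2f}")
```

With these made-up inputs the crude ratio is about 2.27 and the standardized ratio is 1.20. The standardized ratio does not describe either group as it actually exists; it describes both groups in a hypothetical state with a common age distribution, which is the population-level reading argued for above.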
Hence, under a sociological view of the population and a POF that is not too restrictive, social factors can be studied as causes within the ‘effects of causes’ framework, and their statistically adjusted population-level measures are meaningfully interpretable.
I am not expanding on this with clear applications or examples since that would need a whole separate post. I plan to do that sometime later.