Cross-cultural Assessment: Making Inferences Based on International Assessment Data
Research team: Kadriye Ercikan, Wolff-Michael Roth, Mustafa Asil
Two key uses of international assessments of achievement have been (a) comparing country performance to identify countries with effective education systems and (b) generating insights about effective policy and practice strategies that are associated with higher learning outcomes in other countries. Do country rankings really reflect the quality of education in different countries? Should we look to higher-performing countries to identify strategies for improving students’ reading proficiency in our own countries? In this research, using illustrative analyses of the PISA 2009 results, we caution against (a) using country rankings as indicators of better education and (b) using correlates of higher performance within other countries as a way of identifying strategies for improving education in the home country. We elaborate on these cautions by discussing methodological limitations and by comparing five countries that scored very differently on the reading literacy scale of the 2009 PISA assessment. A research paper based on this research will be published in an upcoming special issue of the Teachers College Record edited by Nancy Perry and Kadriye Ercikan.
Think Aloud Protocol (TAP) Methods for Validating Assessments (Funded by SSHRC)
Research team: Kadriye Ercikan, Peter Seixas, Wolff-Michael Roth, Marielle Simon, Juliette Lyons-Thomas, Dallie Sandilands
This research investigates the use of the think aloud protocol (TAP) approach for examining the constructs tapped by different assessment tasks. In particular, it focuses on examining the utility of TAP methodology in fairness research, in conjunction with differential item functioning methodology, and for validating assessments of complex constructs.
Assessment of Linguistic Minorities (Funded by SSHRC)
Research team: Kadriye Ercikan, Wolff-Michael Roth, Marielle Simon, Juliette Lyons-Thomas, Dallie Sandilands
Canadian educational systems rely on accountability measures such as large-scale provincial, national, and international assessments to inform educational policy. Most of these assessments are administered in Canada’s two official languages, English and French. Based on comparisons of assessment results, educational systems are rank ordered, schools and districts are labeled as successful or unsuccessful (e.g., the comparisons of school “performance” published by the Fraser Institute), and school practices that are associated with higher scores are identified as effective. This research investigates the role of language in the performance levels of Francophone students who live in minority language contexts (MFS: minority Francophone students) outside of Quebec. These students receive instruction in French, may or may not speak French at home, and live in a societal context where English is the predominant language. For the past three decades, there has been a consistent pattern of lower performance for MFS on educational achievement tests compared to both their Anglophone counterparts and their Francophone counterparts in Quebec, where, in most municipalities, French is the predominant language and where French, as per Bill 101, is the only official language of the province. Our research investigates how and to what extent language affects measurements of MFS achievement in reading, mathematics, and science.
Assessment of Complex Constructs: Assessing Historical Thinking (Funded by CURA-SSHRC)
Research team: Kadriye Ercikan, Peter Seixas, Juliette Lyons-Thomas, Lindsay Gibson
Other collaborators: Kristen Huff, Pamela Kaliski
The primary purpose of this project is to design and validate an assessment of historical thinking that can serve as a model for developing other assessments of historical thinking, as well as a model for designing and validating assessments of complex thinking in other subjects. An evidence-centered design (ECD) approach was used to design and develop the assessment. ECD also guided the validation research, which treats cognitive data from student think aloud protocols as central evidence for examining the constructs captured by the assessment.
Generalizing in Education, Health and Policy Research
Research team: Kadriye Ercikan, Wolff-Michael Roth
Generalization is a critical concept in all research, so it is not surprising that educational and health researchers and policy makers are concerned with generalizability. However, there is both confusion about and misunderstanding of the concept. This research focuses on (a) articulating the structure and limitations of different forms of generalization across the spectrum of quantitative and qualitative research and (b) developing an overarching framework for generalizing that includes population heterogeneity and users of knowledge claims as part of the criteria for generalizations from educational research.
Population Heterogeneity and Validity Research (funded by SSHRC)
Research team: Kadriye Ercikan, Maria Elena Oliveri, Raman Grover, Dallie Sandilands, Juliette Lyons-Thomas
In educational measurement, fairness is defined as (a) consistency of score meaning for key focal groups and (b) equitable treatment of groups (Kane, 2012). Most measurement research on fairness has focused on the former, whereas the latter has been addressed through the standardization of tests and testing situations. Traditionally, comparability of score meaning has been investigated by comparing the psychometric properties of items and tests for reference and focal groups. Focal groups have typically been defined as groups who may be potentially disadvantaged based on societal, historical, and educational conditions. Examples of these groups include girls and ethnic minority groups, such as African American, Latino, Aboriginal, and Native American students, who often score lower on tests than their counterparts (males or students from the mainstream cultures), who are typically identified as the reference groups. The focal groups may also consist of students for whom the test may not have been originally developed with their specific learning, language, and communication needs in mind. These may be students who take tests in a language other than their mother tongue, such as English language learners (ELLs), students with disabilities, or students who are taking tests adapted from a different language.
The focus of this research is item and test fairness investigations for heterogeneous groups. We focus on differential item functioning (DIF) methods used to investigate potential bias in test items when there are large degrees of population heterogeneity in the comparison groups. By population heterogeneity, we mean non-uniformity and diversity among examinees in relation to the construct being measured, as reflected in their response patterns.
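DIF analyses of the kind described above can take several forms; one widely used procedure is the Mantel-Haenszel method, which stratifies examinees by total score and compares the odds of a correct response for the reference and focal groups within each stratum. The sketch below is a minimal illustration of that procedure only; the function name and the counts are hypothetical and are not taken from this project's analyses.

```python
# Minimal sketch of the Mantel-Haenszel DIF statistic.
# All counts below are hypothetical, for illustration only.
import math

def mantel_haenszel_dif(tables):
    """tables: one (A, B, C, D) tuple per total-score stratum, where
    A = reference-group correct,  B = reference-group incorrect,
    C = focal-group correct,      D = focal-group incorrect.
    Returns MH D-DIF on the ETS delta scale (-2.35 * ln of the
    common odds ratio); negative values favor the reference group."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    alpha_mh = num / den          # Mantel-Haenszel common odds ratio
    return -2.35 * math.log(alpha_mh)

# Hypothetical 2x2 tables for three total-score strata
tables = [(40, 10, 30, 20), (30, 20, 20, 30), (20, 30, 10, 40)]
print(round(mantel_haenszel_dif(tables), 2))  # -2.15
```

In this made-up example the reference group has consistently higher odds of answering correctly at every score level, so the statistic is negative; under the common ETS convention, a magnitude above 1.5 would flag the item for further review.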