Teaching and Learning in Communication Sciences & Disorders

Abstract This study sought to scaffold administration performance of a standardized bilingual screener to sufficient levels of accuracy for data collection using principles of Cognitive Load Theory by managing task complexity when training pre-service clinicians. Before training administration skills, two students were given copies of the manual for the Bilingual English Spanish Oral Screener (BESOS) and asked to administer the protocol independently. During the intervention phase, students were scaffolded through administration tasks of increasing complexity and given explicit instruction, which included tailored goals, modeling, and feedback. Performance for four skills was assessed using a fidelity rubric and analyzed using visual analysis. Performance varied per skill, but overall scores were higher during the intervention phases than during the baseline phase for both students. In addition, accuracy of performance was maintained across client participants, showing patterns of generalization. Although the data are limited, scaffolded training for pre-service clinicians appears to support the development of administration skills for bilingual tasks. The level of support may vary per skill and per language. Future research may seek to investigate other clinical skills and tasks.

Experienced speech-language pathologists (SLPs) often serve in supervisory roles for student preservice clinicians who have little to no previous experience working with clients. Unlike expert SLPs, student clinicians' lack of experience means they do not have previously established schemas for how to perform clinical tasks. In particular, they need the skills to accurately administer and interpret assessments for the purposes of screening and evaluating clients (American Speech-Language-Hearing Association [ASHA], 2020). While they develop these schemas, procedural fidelity may be compromised. Few studies have examined how to effectively and efficiently facilitate learners' independence with these evaluation skills. When working with bilingual clients, the use of two languages presents an additional challenge during the evaluation process, particularly for student clinicians who have not used Spanish in a professional or clinical context. Lower language proficiency may compromise fidelity. This pilot study examines the efficacy of explicit training procedures for student clinicians to effectively administer a Spanish-English oral language screening measure to preschool children. Specifically, we ask whether providing explicit, scaffolded instruction supports procedural fidelity in test administration for bilingual student clinicians.

Supervision in Communication Sciences and Disorders
The task of the clinical supervisor is to guide the learner from a direct/active supervision style to an end state of self-supervision with a transitional stage in between these two ends of the continuum (Anderson, 1988; McCrea & Brasseur, 2003). Throughout the traditional clinical education process, a student clinician gains experience first with direct support from their supervisor and then shifts to more self-directed, independent learning. Supervisors must decide how much support to provide, what type of instruction is appropriate, and how much time to spend in each stage. In guiding student clinicians through the diagnostic process, a supervisor could remain in the direct/active supervision stage for a prolonged period of time and not allow the student to make independent diagnostic decisions. In contrast, a supervisor may provide minimal guidance and instruction and incorrectly assume a student is ready to function in a more independent role. Since the supervisor's role is to maximize the learning opportunities for the student while balancing the ethical responsibilities for the client, there is a delicate balance between these two extremes (ASHA, 2013). The type and level of support needed may vary by task. Additional guidance may be required to establish more explicit expectations so as to foster a student clinician's independence without putting the client or clinical process at risk. This process is typically referred to as "scaffolding" (Austin, 2013).
Scaffolding early clinical teaching has been a common recommendation in the field of communication sciences and disorders (CSD) in the last 25 years, as outlined in multiple popular textbooks regarding supervision in the field (Austin, 2013; Dowling, 2001; McCrea & Brasseur, 2020). Examples of scaffolding include the use of scripts (Dowling, 2001; McCrea & Brasseur, 2020) as well as mediated learning opportunities (Gillam & Peña, 1995). Walden (2013) offers practical worksheets for supervisors to guide their students from non-learning responses, defined as presumption or rejection, to reflective learning responses during clinical supervision, defined as reflective practice and experimental learning. Peña and Kiran (2008) offer a multi-step model for teaching clinical skills that is useful for conceptualizing instructional scaffolding. As students move up the five-rung ladder, they are guided towards more self-directed learning. At the highest and fifth rung, Expert, it is anticipated that learners have developed a more complex schema, in which self-directed learning is expected. A key concept of this approach is that initially students are explicitly taught the skills they are expected to master early in the process. While guidance is helpful, scaffolding student learning also requires empirical support as part of evidence-based practice.

Screening Language Skills in Bilingual Children
Language screeners are common assessments that student clinicians are routinely asked to administer and interpret. Language screeners are designed to be brief measures that allow a clinician to determine whether further evaluation is necessary to identify potential language delays or impairments (ASHA, 2004). Screeners offer a valuable opportunity to introduce evaluation skills to the student clinician, as the decision-making component is substantially reduced relative to a full standardized assessment. At the same time, they still require the student to obtain valid assessment data and interpret it accurately.
The Bilingual English Spanish Oral Screener (BESOS) is an example of one such screener. The BESOS is designed to assess the language skills of bilingual children in English and Spanish. There are three variations of the screener: pre-kindergarten, first and second grades, and third grade. The BESOS consists of four sections: Spanish Morphosyntax, Spanish Semantics, English Morphosyntax, and English Semantics. The clinician is responsible for administering all four parts to ensure a complete description of the child's abilities across languages. Scores are norm-referenced and indicate whether further testing is warranted. Importantly, the BESOS utilizes conceptual scoring; skills are compared across languages, and answers are accepted in either. Administration time is generally 20 minutes for an experienced clinician (Lugo-Neris et al., 2015).
Although screeners appear simple on the surface, successful administration and scoring require the integration of a complex set of skills and behaviors. These include procedural knowledge of test administration, language proficiency in bilingual clinicians, and interpersonal skills. For the BESOS, procedural knowledge encompasses knowledge of test administration rules including identifying basal and ceiling scores, providing allowable and appropriate feedback, accurately scoring responses in two languages, and completing the screener within an appropriate time frame. Testing in two languages is a particular challenge in bilingual assessment because of the wide variability in level of proficiency in each language, especially among heritage speakers (Scontras et al., 2015; Giguere & Hoff, 2020). Further, in the United States, heritage speakers often receive most of their education in English and many do not learn to read and write in their home language (Scontras et al., 2015; Giguere & Hoff, 2020). Thus, language proficiency and use will influence how much assistance a student may need in order to successfully carry out a test in two languages. Additionally, students need to demonstrate appropriate interpersonal skills, including establishing rapport, setting expectations, and managing the client's behavior. When combined, the task of administering a screener such as the BESOS evolves into one in which the student is responsible for multiple simultaneously moving parts. Student clinicians must develop schemas for accurate test administration and managing the intricacies of interpersonal communication. Supervisors must guide them in developing these skills.

Scaffolding Training of Early Learners
Prior studies in the field of CSD (e.g., Brightenburg, 2006; Phillips, 2009; Ensslen, 2013) have looked at various student and supervisor experiences. In the current study, we are interested in how supervisors scaffold novice (student) clinician knowledge. We also draw from related fields for insights into student errors and the effects of training novice clinicians in our own field. Scaffolding student instruction has supported mastery in psychologists' administration of cognitive assessment measures. Blakely et al. (1987) compared independent preparation (i.e., self-study) to systematic training. Their systematic training used the MASTERY model by Fantuzzo et al. (1983). In this model, the students received explicit instruction on test administration rules and were required to pass a test on the test manual content. Students then practiced testing in the roles of both examiner and examinee. The MASTERY model produced larger improvements in administration accuracy than studying the manual independently.
While it might seem intuitive, or even rudimentary, that students would respond to systematic training in a positive manner, supervisory decisions should be evidence based (ASHA, 2008). Empirically examining scaffolded instruction in our fields, therefore, becomes particularly important.

Cognitive Load Theory as a Foundation for Scaffolded Instruction
Cognitive principles of learning can provide a framework through which we can understand and implement instruction in early clinical teaching. Theories of cognitive load and learning suggest novel stimuli place high demands on working memory. Austin (2013) offers a clinical model based on the work of van Merriënboer and Sweller (2010) that controls for cognitive load during the supervisory process. In early stages, learners are attempting to manage all of the details of a new task, which likely taxes working memory to a high degree. van Merriënboer and Sweller (2010) divide the load placed on working memory into three parts: intrinsic load, extraneous load, and germane load. Intrinsic load refers to the complexity of the task itself. For example, learning vocabulary is less complex than learning a procedural task, such as administration of an assessment. Extraneous load refers to the manner in which the novel information is presented to the learner. Learning a portion of the assessment process requires less effort than learning the full procedure at once. Germane load refers to the resources available to the learner to manage the cognitive load. As learners develop schemas for the task, the overall load on working memory is reduced, freeing cognitive resources.
Manipulating the nature of the instruction for student clinicians may help to reduce demand on working memory. van Merriënboer and Sweller (2010) suggest using oral explanations in the moment the student clinician is learning new information versus providing written explanation beforehand. For CSD students, this could be the difference between reading the manual of an assessment measure on their own versus receiving verbal explanations as they are practicing. In such situations, the students are no longer expected to hold on to or memorize previously read, pertinent information.
As student clinicians transition to independence, their connection of knowledge helps to develop schemas for how to complete tasks they have experienced (Peña & Kiran, 2008). These developing schemas, in theory, should result in higher accuracy of performance as demands decrease on working memory. Supervisors can also help facilitate development of schemas by providing variability in the tasks (Austin, 2013). One example of variability might be performing the same task on different types of clients. Group work also offers a natural opportunity for variability, as graduate students often work together, and supervisors may wish to foster potential peer collaboration. Administering the same measure on children of different ages is another example of variability in learning. Together, these principles could help to optimize the germane load of early learners.

Summary
Fidelity and accuracy of administration for assessment tasks are of particular importance for student clinicians in speech-language pathology. Teaching bilingual student clinicians how to administer screening tools offers a unique opportunity to understand variables that may impact their assessment skills and ensure student clinicians are prepared for the demands of the field. By providing scaffolded opportunities to increase students' diagnostic skills, we hope to identify variables that may facilitate the novice clinician's transition to being an independent learner. Cognitive load theory offers a framework to guide this process by identifying ways in which scaffolding may help facilitate learning.
This pilot study investigated the role of scaffolded instruction in the education of bilingual student clinicians for the purpose of developing their clinical evaluation skills. Specifically, we sought to answer the following research question: What is the effect of scaffolded instruction to train procedural knowledge on the performance of student clinicians when administering a bilingual standardized measure?
We predicted that by reducing the demands of the task, learners would show higher accuracy in performance and develop schemas that would assist in long-term retention of learned skills. For the BESOS, we anticipated that teaching procedural skills independently from other skills, practicing first in low-fidelity environments, and providing variability in practice would result in higher levels of performance within and across languages.

Methodology
The current data were selected as part of an ongoing study that sought to develop an experimental criterion-based language measure for Spanish-English bilingual children (Bedore, 2020). The data were collected as part of an examiner training protocol for the pilot portion of the larger study, during which undergraduate volunteer research assistants administered a language screening measure to bilingual children to ensure their status as typically developing. The study was approved by the IRB at the University of Texas at Austin. We collected data on the systematic training procedures and amount of effort it took for undergraduate volunteers to consistently and reliably administer the BESOS in terms of following administration rules, giving allowable feedback, accurate scoring, and time management.
Participants. Three undergraduate students were recruited via word of mouth from a pool of students already working as volunteers in our research lab. They were invited to participate as examiners to collect data and signed informed consent. One participant was unable to complete the full training due to graduating and was excluded from the current analyses. Of the remaining participants, one was a leveling (i.e., post-baccalaureate) student, and the other a senior at the time of participation. Each completed an online questionnaire in which they were asked to indicate their previous observation experience, direct time assisting in therapy, and direct time assisting in assessment. Descriptive information about each participant is summarized in Table 1. Pseudonyms are used throughout the document to refer to the participants. In addition, participants were asked to rate their levels of conversational language proficiency in Spanish and English on a three-point scale created by the first author. Both indicated full professional proficiency in English, as indicated by a rating of 3. In terms of Spanish, Participant 1 reported working proficiency (2), and Participant 2 indicated she had elementary proficiency in Spanish (1). Both reported being heritage speakers who learned their Spanish via home and community exposure. Participants did not report having had any formal education in Spanish. Note that Participant 1 had a higher number of observation hours than Participant 2. Participant 2 had some experience with intervention. Neither participant reported prior experience in assessment administration.

Measures.
BESOS. The BESOS is a bilingual language screening measure designed to assess language skills of bilingual children in English and Spanish. It consists of a semantics and morphosyntax subtest in both English and Spanish. The subtests are not translations of each other but designed to include specific markers of language impairment in each language. As part of the protocol established for the larger study, all four subtests had to be administered in order to provide an accurate measure of a child's skills across languages and score reliably. Scores are norm-referenced and indicate whether further testing is warranted. Preliminary validation data indicate concurrent sensitivity of 90% and specificity of 91% (Peña et al., 2018b). Reliable administration and scoring of the BESOS was important given that it was used to screen for potential language delays as an exclusionary factor for the larger project. The present data reflect training data for all versions of the BESOS.

Dependent Variable.
Participants' performance in administering the BESOS was measured using a fidelity protocol developed by the authors (see Table 2). The fidelity protocol included four categories consisting of skills that were easily observable and measurable and that the authors identified as areas in which novice, undergraduate student volunteers typically have the most difficulty.
Each category was scored on a 4-point scale labeled as independent (4), approaching competency (3), emerging (2), and insufficiently present (1). Table 2 presents specifics for each level of the scale.

Procedure.
Design. We utilized a single-subject case design. The study consisted of three phases: baseline, training (multiple trials over 2 sessions), and generalization (two levels). Figure 1 depicts an outline of the study design and related activities.
Baseline. During the baseline phase, participants were given the BESOS manuals and asked to familiarize themselves with the manuals, plates, and protocols. In order to ensure they had completed this task, the participants were asked to take a short quiz on information covered in the manuals using the Canvas Learning Management Platform. Participants were required to receive a score of 80% or higher before proceeding to the next step. Participant 1 passed the first time but Participant 2 was asked to repeat the quiz, as she did not reach criterion on her first attempt.
Upon passing, the participants were then asked to administer a subsection of the BESOS to their peer and were observed and scored using the fidelity rubric presented in Table 2. Baseline data were collected as participants participated in a Round-Robin task in which one participant administered the measure and another served as a client. One data point was collected in each language (i.e., two overall data points) during the baseline phase. Additional data points were not collected due to testing and time constraints associated with the larger study. It is important to note, however, that participants struggled to simply complete any portion of the screener at this phase. On multiple occasions, practice testing was paused because participants stated they did not know what to do. On these occasions, the supervisor re-directed them to give their best attempt.
Instruction. Following baseline, participants attended two instructional sessions one week apart. The first session lasted 60 minutes and the second session lasted 120 minutes. Each of the sessions had an identical training structure. The first session occurred on campus and was restricted in time to accommodate for class schedules. The second session occurred on the weekend.
This instructional portion of the study provided participants with highly structured and scaffolded activities focused on tasks that allowed for practice of a limited number of components at a time. For example, the beginning focused on procedural knowledge and then introduced other aspects to focus on during Round-Robin practice opportunities with peers. Instruction included a presentation in which participants were presented with clear, measurable goals based on the fidelity rubric, teaching of key concepts, and providing strategies for each goal area. Key concepts covered in the presentation included: (a) rules for basals, ceilings, and test discontinuation, (b) acceptable prompts and acceptable feedback on responses, (c) linguistic concepts related to the specific subtests, such as subjunctive, clitics, and imperfect moods in Spanish, and (d) modeling of screener administration. Specific strategies modeled included rote/scripted phrases to praise participation (e.g., "nice sitting", "thank you for answering"), using a visual timer for pacing, and matching child's response to bolded acceptable responses on the test form when scoring. Subsequent to the presentation, participants again engaged in Round-Robin practice where they took on either the role of the examiner or the client and then reversed. Since there was one additional undergraduate participant whose incomplete data was later excluded, there were two working pairs of examiner/client between the 3 participants and the trainer (first author). Data on the fidelity protocol (Table 2) were collected at times when the participant was acting as the examiner to the trainer (acting as a client). Participants were provided verbal feedback from the trainer immediately following their practice Round-Robin administrations. Participants were only allowed to move to the next phase after achieving a score of 3 (approaching competency) on all fidelity rubric criteria.
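The progression rule described above (a score of at least 3 on every fidelity rubric category before moving to the next phase) can be sketched in a few lines of Python. This is only an illustration, not the authors' scoring tool: the level labels and the four skill areas come from the text, while the dictionary keys and function names are hypothetical.

```python
# Rubric levels as labeled in the text (4-point scale).
RUBRIC_LEVELS = {
    4: "independent",
    3: "approaching competency",
    2: "emerging",
    1: "insufficiently present",
}

# The four skill areas reported in the study; the key names are hypothetical.
CATEGORIES = ("protocol_administration", "giving_feedback", "scoring", "time")

# Criterion for moving to the next phase: a score of 3 on all categories.
CRITERION = 3


def below_criterion(scores):
    """Return the categories whose score is still under the criterion."""
    missing = [c for c in CATEGORIES if c not in scores]
    if missing:
        raise ValueError(f"missing rubric categories: {missing}")
    return [c for c in CATEGORIES if scores[c] < CRITERION]


def may_advance(scores):
    """True only when every rubric category is at or above the criterion."""
    return not below_criterion(scores)
```

For example, a participant rated 3 on everything except scoring (rated 2) would not advance, and `below_criterion` would return `["scoring"]` as the skill needing further practice.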
Generalization. We examined whether the trained skills transferred beyond practice activities to more realistic testing situations. As such, the participants then completed two additional testing situations, which we refer to as the generalization phase. A week after the instruction phase, the participants each independently administered the complete BESOS to a preschool-aged child while the trainer watched from an observation room. The participants were allowed to review the test protocols and the manual prior to the session. Each child passed the screener and demonstrated no challenges with attention or behavior. The participants were given verbal feedback regarding the four fidelity criteria after their completion of this administration. In order to provide one more opportunity for practice, we introduced additional variability by having the participants administer the BESOS a week later to a novel graduate student who served as a peer actor. The actor was instructed to act like a five-year-old child and present with behavioral challenges for added difficulty. These challenges included a short attention span, age-typical errors, and a preference for play over the test items. Each participant individually administered the subtests to the peer actor and was not provided with instruction or feedback before or during this session. The trainer again observed and collected data from an observation room. Once finished, feedback was provided.
Analyses. Data for each participant were graphed for each of the four targeted skills. Visual analysis guidelines (Clearinghouse, 2014; Cook et al., 2014; Horner et al., 2005) were followed to identify trends or patterns in the data that would indicate achievement of a given criterion (a score of 3) for each target behavior prior to moving to the next phase. Finally, we wanted to know if the effort to increase skills through training sessions resulted in sufficient change to warrant such intensive instruction. Since there were no published guidelines in this area, we compared the number of training hours to the number of skills that reached criterion.
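Two of the features examined in visual analysis, mean level within a phase and immediacy of change between adjacent phases, can be approximated numerically. The sketch below is a simplification under stated assumptions: the scores are invented for illustration and are not study data, the function names are hypothetical, and immediacy is reduced to a single-point comparison because only one baseline point per language was available.

```python
from statistics import mean

# Hypothetical rubric scores for one skill, grouped by phase (NOT study data).
example_scores = {
    "baseline": [1],
    "training": [2, 3, 3],
    "generalization": [3, 4],
}


def phase_means(scores_by_phase):
    """Mean rubric score within each phase (the 'level' of the data)."""
    return {phase: mean(scores) for phase, scores in scores_by_phase.items()}


def immediacy(scores_by_phase, before, after):
    """Change from the last point of one phase to the first point of the next."""
    return scores_by_phase[after][0] - scores_by_phase[before][-1]


levels = phase_means(example_scores)
jump = immediacy(example_scores, "baseline", "training")
```

A numeric summary of this kind supplements the graphs rather than replacing them; trend and consistency still need to be judged from the plotted data.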

Reliability.
As this was a pilot investigation, we did not collect reliability data for our supervisor ratings. However, we attempted to control for bias through using concrete definitions for each measured behavior and scoring definition. These definitions were consistent with the administration manual for the test.

Results
We graphed performance for both participants by behaviors to visually compare variance in performance between participants and across phases. Specifically, we looked at mean levels of performance, immediacy of effects, any trends in performance, and consistency of those patterns. Results for each behavior are described individually. Results of data for each participant in each skill area are presented in Figures 2-5.

Figure 3
Supervisor Ratings for Giving Feedback

Figure 4
Supervisor Ratings for Scoring

Figure 5
Supervisor Ratings for Time of Administration

Protocol Administration. Both participants started with protocol administration skills that were insufficiently present (score of 1) as judged by the supervisor. We saw administration accuracy scores rise quickly upon moving into the training phase. These skills were generally maintained upon administration with a child and peer actor across languages for Participant 1 with the exception of one subtest in Spanish. Participant 2 showed more variability in performance but within an acceptable range (score of 3-4) with the typically developing child but not with a peer actor. The sharp increase in performance for both participants upon training and the drop for Participant 2 during generalization phases suggest training was responsible for the increase in performance but not sufficient for both participants to completely maintain skills when the additional load of managing behaviors was introduced.
Giving Feedback. Both participants demonstrated skills that were insufficiently present for giving appropriate feedback of the BESOS. Specifically, they initially gave responses that would clue in the child to their performance when correct (e.g., "good job") and incorrect (e.g., "that's close…"). These skills showed improvement with instruction. Participants quickly picked up how feedback could affect results and began erring toward a lack of feedback. As such, we saw performance for appropriate feedback slope quickly upward following instruction and stay constant across phases and languages for both participants. This trend, consistency of performance, and timing of increase suggest that training was supportive for the increased performance in skills.
Scoring. Both participants started off again with insufficiently present scoring skills during initial sessions. Scoring skills were variable across phases. Participants made errors identifying grammatical concepts such as pronouns, subjunctive forms in Spanish, and past tense forms in Spanish. Again, we saw skills improve with instruction. Participant 2, who self-rated her Spanish skills as lower, also demonstrated lower performance on scoring the Spanish subtests. However, both participants did reach criterion scores of 3 in both languages and continued to show growth even when administering to a child and novel peer actor. The slow overall upward trend across phases suggests that training did aid their performance of skills. However, these skills appeared relatively more challenging.
Time. Keeping administration of each subtest within five minutes was an area in which both participants particularly struggled during baseline sessions. Both participants initially took over eight minutes to administer each subtest. Time slowly decreased upon further practice repetitions and inclusion of a timer for self-pacing. Both participants reached criterion of a score of 3 (i.e., subtest administration within two minutes of target time) before proceeding to full administration with a child and peer actors. Performance was maintained in English but we saw variable performance in Spanish, particularly on the Spanish morphosyntax portions for Participant 1. Participant 2's performance dropped initially but then showed some late improvement in Spanish. The timing of the variation in performance, specifically the skill increase with instruction and skill decrease at higher levels of challenge, suggests that our instruction was the primary locus of change.
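The time criterion described above can be written as a simple check. This is a minimal sketch that assumes only overruns count against the examiner (finishing early is treated as acceptable); the text gives the five-minute target and the two-minute allowance for a score of 3 but does not define the other rubric levels here, so none are modeled. The names are hypothetical.

```python
TARGET_MINUTES = 5.0      # per-subtest target stated in the text
ALLOWANCE_MINUTES = 2.0   # "within two minutes of target time" for a score of 3


def meets_time_criterion(minutes_taken):
    """True when a subtest finishes no more than two minutes over the target.

    Assumption: only running over the target is penalized.
    """
    if minutes_taken < 0:
        raise ValueError("administration time cannot be negative")
    return minutes_taken <= TARGET_MINUTES + ALLOWANCE_MINUTES
```

Under this reading, the baseline administrations of over eight minutes per subtest fall outside the criterion, while a 6.5-minute administration would meet it.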
Significance. Separate from our visual analysis of control over target behaviors, we also wanted to assess how much teaching time the supervisor needed to put forth for the participants to achieve the requisite test administration skills. We measured this by taking the total number of trial sessions (28) and dividing by the number of subtests (2 subtests x 2 languages x 3 versions = 12). On average, it took the participants 2.33 training trials (including probe trials) to achieve a score of 3 in both languages with a peer actor, with the exception of one subtest by Participant 1 (Rosa).
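The arithmetic behind the 2.33 figure can be reproduced directly from the numbers reported above. The variable names are illustrative only.

```python
# Figures taken from the text.
total_trial_sessions = 28
subtests_per_language = 2   # semantics and morphosyntax
languages = 2               # English and Spanish
versions = 3                # pre-kindergarten, first/second grade, third grade

subtest_count = subtests_per_language * languages * versions
mean_trials_per_subtest = total_trial_sessions / subtest_count

print(subtest_count)                      # 12
print(round(mean_trials_per_subtest, 2))  # 2.33
```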

Discussion
The ability to accurately administer assessments, including screeners, is essential for acquisition of the knowledge and skills outcomes associated with ASHA required training standards (ASHA, 2008; ASHA, 2021). It is incumbent upon educators to develop training protocols that permit supervisors to train students efficiently and effectively to provide competent services in line with our ethical responsibilities. The purpose of this study was to explore the effect of scaffolded training on the skills of student clinicians when administering a standardized, bilingual screener. The results of this study suggest that our training protocol was supportive for training procedural skills. Both participants showed low baseline performance without instruction and a change across behaviors upon being provided with scaffolded instruction. We also saw skills generalize to additional testing conditions. These findings are consistent with Cognitive Load Theory.
This study uniquely highlights the level of support needed to train clinical skills. Specifically, we saw procedural and scoring skills require less support than training acceptable feedback and time management skills. Time management was a challenge for both participants and remained so throughout all phases. Our observations may be indicative of more general patterns of clinical training. Certain skills and certain measures may require more time and support than others.
This study also highlights the role of language proficiency. Bilingual assessment is not the same as monolingual assessment because language skills vary across domains for clinicians as they do for clients. Specifically, we saw lower language skills that required additional support in the minority language of our participants. Since neither of the participants had any formal Spanish instruction, grammatical concepts and professional jargon were areas of challenge. Understanding how to train bilingual clinicians is crucial if we are to meet the needs of our increasingly diverse linguistic communities.
Limitations. Limitations to this study lie first in its design. As we were unable to collect three data points during baseline, it is possible that skills had not reached stability. However, performance was initially at floor levels, and participants informally expressed being completely lost on what to do from reading the manual, even when taking the pre-test. These qualitative accounts coupled with the quantitative data suggest baseline performance was an accurate representation of skills. Second, we had only two participants, and these may have reflected the most motivated learners. A larger pool of volunteers would help establish our observations as a more general phenomenon and allow findings to be generalized to other groups. Finally, no additional measure of reliability was collected for the data presented. These data were themselves a measure of fidelity for the larger project. Although unlikely, it may be possible that the main author introduced bias upon scoring across phases.

Conclusion
Future research would benefit from extending our research findings with alternative instructional approaches and additional skills. Our next step is to extend our training protocol to intervention skills with tighter experimental controls. In addition, a subjective rating of the amount of mental effort students give to the tasks, or alternatively, how difficult they perceive the task to be (e.g., Van Gog & Paas, 2008), would aid in ensuring accurate levels of scaffolding. Future studies might also replicate this task with additional measures. Understanding the level of support and instruction that is needed for different types of assessment measures, such as the Bilingual English Spanish Assessment (BESA; Peña et al., 2018a) in comparison with the Preschool Language Scale-4 Spanish (PLS-4 Spanish; Zimmerman et al., 2011), might help both developers and supervisors alike in ensuring clinical readiness. Regardless, intervention research that seeks to inform how to train bilingual clinicians is beneficial to all students and the greater needs of our surrounding communities.

Authors Disclosures
Financial Disclosures. Research reported in this publication was supported in part by the National Institute on Deafness and Other Communication Disorders (NIDCD) under award number R01DC015588.