ETEC 500 - Analysis and Critique of a Research Study

See a PDF version of my analysis here.


Research Study to be Analyzed and Critiqued:

Liu, H., & Guo, W. (2025). Effectiveness of AI-driven vocal art tools in enhancing student performance and creativity. European Journal of Education, 60, e70037. https://doi.org/10.1111/ejed.70037



Summary


Purpose


     The purpose of Liu and Guo’s (2025) study was to evaluate how AI-based vocal training tools influence students’ vocal performance and creative output. Specifically, the researchers asked whether integrating AI technologies into vocal rehearsals and instruction would produce greater improvement in technical singing ability and creativity than traditional methods.


Design and Procedure


    The study involved 158 senior voice students from Yulin Normal University and Xiamen University in China. Participants completed preliminary assessments of vocal skills and creative thinking abilities, followed by a post-assessment after 12 weeks. Both pre- and post-tests were two-hour evaluations measuring vocal skill and creativity on five-point scales through both subjective and objective criteria. To establish reliability and validity, Liu and Guo (2025) calculated Cronbach’s alpha (0.88, demonstrating strong reliability) and used Pearson correlation coefficients to examine relationships among test scores.
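As a brief illustration of the two statistics the authors name (the paper does not publish its data or analysis code, so the rating values below are invented placeholders, not the study’s data), Cronbach’s alpha and the Pearson correlation can be computed as follows:

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for internal consistency.
    items: one list of scores per rating item, all the same length."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-student totals
    item_variance = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))

def pearson_r(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical five-point ratings for five students on three rating items
intonation = [3, 4, 5, 3, 4]
diction    = [3, 5, 4, 3, 4]
dynamics   = [4, 4, 5, 2, 4]

alpha = cronbach_alpha([intonation, diction, dynamics])
r = pearson_r(intonation, diction)
```

An alpha near the study’s reported 0.88 would indicate that the rating items measure a common underlying construct consistently.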

    Students were then divided into control and experimental groups, balanced by performance and creativity scores from the pre-tests. Both groups received 90 minutes of individual instruction twice weekly for 12 weeks, covering topics such as technique, style, and musicianship. The control group relied on traditional instructional methods, with feedback primarily from instructors and assigned materials, while the experimental group was split into two subgroups: Subgroup A used the Vocal AI Analyzer, which offered real-time technical analysis of singing performance, and Subgroup B used the Smart Vocal Coach, which provided adaptive training plans and gamified exercises.

    Creativity was assessed with the sociometric rating index (Griffiths et al., 2021), which combined positive performance elements (enhancing creativity) and negative elements (detracting from creativity) into composite scores across originality, improvisation, and emotional impact. Vocal performance skill was rated across five categories: intonation, diction, dynamics, performance technique, and emotional expression.

Findings


    Over the 12 weeks, the experimental group improved by an average of +1.0 in vocal skills, compared with +0.3 for the control group. Within the experimental group, Subgroup B (Smart Vocal Coach) showed a creativity increase of +1.08, while Subgroup A (Vocal AI Analyzer) improved by +0.25. The researchers confirmed the statistical significance of these results using t-tests and ANOVA. As Liu and Guo (2025) report:

“The vocal skills indicators for students in the control group improved from 3.4 to 3.7 [1]… In contrast, students in the experimental group showed a significant improvement from 3.5 to 4.5 [2]… The creativity level for students in the Control Group rose from 2.8 to 3.0 [3]… Significant improvement was observed in the Experimental Group, with a rise from 2.9 to 4.1 [4]… The t-test and ANOVA results indicate that the differences between the groups are statistically significant [5]… with a more pronounced effect in Subgroup B (Smart Vocal Coach) than in Subgroup A (Vocal AI Analyzer) [6].”
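For readers unfamiliar with the tests the authors cite, the sketch below shows how a two-sample t statistic (Welch’s form) and a one-way ANOVA F statistic are computed. The gain scores are invented placeholders, not the study’s data, and obtaining p-values would additionally require the t and F distributions:

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances assumed)."""
    va, vb = variance(a), variance(b)
    return (mean(a) - mean(b)) / (va / len(a) + vb / len(b)) ** 0.5

def anova_f(groups):
    """One-way ANOVA F statistic across any number of groups."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Invented per-student 12-week gain scores for illustration only
control    = [0.2, 0.3, 0.4, 0.3, 0.3]
subgroup_a = [0.9, 1.0, 1.1, 1.0, 1.0]
subgroup_b = [1.0, 1.1, 1.2, 1.0, 1.1]

t = welch_t(subgroup_a, control)            # experimental vs. control
f = anova_f([control, subgroup_a, subgroup_b])  # all three groups at once
```

A large t or F relative to its critical value is what allows the authors to call the between-group differences statistically significant rather than random variation.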


    Overall, the findings demonstrate gains in both vocal skills and creativity for students receiving AI-assisted instruction, with the Smart Vocal Coach subgroup showing the strongest improvements in creativity. The authors speculate that gains in the creativity category are likely due to the AI technologies’ adaptive, gamified design and ongoing technical feedback. While the 12-week findings showed increases in vocal skill and ability among students, questions persist about the longevity of these gains and the usability of these AI tools for achieving similar outcomes in long-term educational contexts.


Evaluation and Critique

Objective


    The study sought to determine whether AI tools improve vocal skill and creativity outcomes compared to traditional methods by tracking participants’ progress across 12 weeks and comparing experimental and control groups.

Alignment and Design


    The researchers employed a quantitative, quasi-experimental, matched comparison group design. Pre- and post-test assessments provided empirical data for statistical analysis of changes in vocal skill and creativity. This design aligns well with the study’s purpose of comparing AI-assisted and traditional instruction.

    While students were allocated to groups to achieve equivalence based on initial assessments, the study did not clearly describe how the distribution was made relative to specific categories or average scores. Greater transparency about group assignment, or the use of randomization among students with similar scores, would have strengthened the study’s internal validity.

    Overall, the study aligns with Creswell’s (2012) postpositivist emphasis on empirical measurement and statistical methods (t-tests, ANOVA) to establish cause-effect relationships between instructional method and outcomes.

Sampling


    One point the researchers make regarding the limitations of the sampling of the study was that “this sample is fairly representative of the specific university, [but that] its size limits the generalizability of the results to a broader population of students from other universities or cultural contexts. Cultural and educational characteristics of the region may influence the outcomes restricting the applicability of the findings to other countries or regions” (Liu & Guo, 2025, p. 6).

    Researchers also rationalized the selection of fourth- and fifth-year music students by stating that they “had trained voices and understood the peculiarities of musical theory and the skills of variation of musical elements in practice” (Liu & Guo, 2025, p. 3). It would be interesting to widen the pool of participants to include students with more diverse musical backgrounds, bringing the study’s objective closer to a more generalized demographic.

    Beyond being a fourth- or fifth-year music student, no other sampling criteria were identified in the study, leaving recruitment details and the potential for self-selection biases (such as motivation to practice or prior experience with the technology) uncertain. As Suter (2012) notes, a clear and transparent sampling rationale is essential to evaluating external validity. While the rationale for targeting advanced students is briefly stated, the lack of detail on the recruitment process weakens the study’s alignment with Creswell’s (2012) postpositivist framework, as it risks uncontrolled variables influencing the findings.

Data Collection and Analysis


    Data collection involved instructor-administered pre- and post-tests measuring vocal skills and creativity. While reliability was reported as high (Cronbach’s alpha of 0.88), the study provided minimal detail on the type of validity measured or the procedures used to establish it.

     While the use of both objective and subjective criteria in the pre- and post-test assessments aligns with the research purpose, concerns arise about the potential for evaluator bias. Because ratings of musical performance are inherently subjective, the study’s failure to report measures of inter-rater reliability may be problematic. Similarly, while the study stated that instructors were trained to use the AI tools, little information was provided about that training process, the instructors’ openness towards the technology, or their personal competence in using the technology to instruct students individually. And while the study reports that the sociometric rating index (Griffiths et al., 2021) was used to help evaluators define and identify creativity, further justification and clarification of this measurement construct would increase the study’s credibility.

    The study also did not clarify whether, or how, the AI algorithms accounted for linguistic or cultural variations, or whether the technology was calibrated for microtonal variations in pitch versus equal-temperament tuning. To mitigate evaluator bias in the analysis of the data, the researchers ensured that pre- and post-tests were scored by instructors not involved in teaching the experimental groups. While this step helped promote impartiality between evaluators, the absence of inter-rater reliability measurements remains a limitation on the generalizability of the findings.

Trustworthiness


    Even though Liu and Guo’s study offers reasonable credibility through its pre-test/post-test design and use of impartial evaluators to reduce internal bias, the non-random allocation of participants to control and experimental groups leaves room for selection bias. As Creswell (2012) points out, without random assignment there are limits to the certainty that improvements are due to the intervention (here, the AI tools) alone. In addition, alternative explanations such as novelty effects or teachers’ enthusiasm for technology are variables that should be considered. Suter (2012) explains that educational interventions can fail to separate the technology effect from other influential factors such as instructor behaviour and attitudes, or changes in student motivation. From a methodological perspective, both Suter’s and Creswell’s emphasis on rigorous research design and transparency highlights limitations in this study. Furthermore, questions about construct validity, particularly whether the study fully captures the subjective nature of “creativity” in vocal performance, may also limit the trustworthiness of the conclusions presented about creativity gains.

Personal Reflection


    In my experience leading a secondary school choir and working with musical theatre voice students, I had not given much attention to AI-based tools in my past practice. In completing this reflection and critique of Liu and Guo (2025), my philosophy towards AI-based tools has shifted towards the possibility of equipping myself with this technology in order to help develop the muscle memory or audiation skills associated with a particular objective. Perhaps the role of AI-based tools in music education is best suited to targeting specific technical areas while the student actively receives visual and audible real-time feedback. I am under the impression that the guided-but-independent practice environment AI-based tools offer can help learners establish consistent routines and refine repetition-based skills, while also helping the student identify and correct poor practice habits. That said, I remain skeptical that an over-reliance on AI-based instruction for vocal performance may create long-term problems with ensemble-based musicality, such as blending, balance, or diction, by favouring the development of the student’s own individualism over traditional styles and genres of vocal music. I would have been curious to read more about students’ own perceptions of their individual growth and their opinions on this research.

    Reflecting on the researchers’ choice of AI-based tools for this study, I had trouble finding any information about either tool outside of the study itself, and I am unsure whether I would ever use these tools myself as a music instructor. That said, as technology continues to advance and researchers test the waters of AI tools replacing traditional instructional techniques in classrooms and individual learning contexts, I remain interested in exploring how instructors can harness technology, motivation, and student engagement to maximize the development of musical proficiency in students.

Peer Feedback


    I was thankful for the opportunity to receive peer feedback as part of this comparative analysis of Liu and Guo (2025), as many of the points my colleagues raised were valuable in editing my work. Their comments helped shape and refine my draft by offering perspectives that challenged my thinking and strengthened the quality of my critical review and academic writing. I appreciate that this assignment allowed for a collaborative editing process, and I found the feedback I received invaluable.

Disclaimer: I disclose that my submission for this graded work includes material generated with the assistance of OpenAI’s ChatGPT version 4o. This AI tool was used for organizing and clarifying ideas and refining my language. All ideas and final wording reflect my own understanding and academic judgments. No AI-generated content has been submitted verbatim.


Footnotes

  1. The control group showed only small improvements in vocal skills (3.4 – 3.7). This seems normal since traditional training often emphasizes repetition over innovation.
  2. The experimental group’s progress (3.5 – 4.5) is a much larger increase. AI-assisted tools may have accelerated improvements by providing more immediate and individualized feedback, potentially helping fine motor skills and muscle memory.
  3. Creativity in the control group improves only slightly (2.8 – 3.0). This suggests that traditional teaching may not nurture creativity as effectively as technical ability.
  4. Creativity in the experimental group increased substantially (2.9 – 4.1), likely due to AI encouraging experimentation without fear of failure. Individual dynamics between instructor and student may also influence creativity growth.
  5. Statistical tests (t-test, ANOVA) confirm the differences between groups are statistically significant and not random.
  6. The experimental subgroup using Smart Vocal Coach improved more than the subgroup using Vocal AI Analyzer, suggesting real-time guidance is more effective than technical feedback alone.