ETEC 500 - Analysis and Critique of a Research Study
See a PDF version of my analysis here.
Research Study to be Analyzed and Critiqued:
Liu, H., & Guo, W. (2025). Effectiveness of AI-driven vocal art tools in enhancing student performance and creativity. European Journal of Education, 60, e70037. https://doi.org/10.1111/ejed.70037
The purpose of Liu and Guo’s (2025) study was to evaluate how AI-based vocal training tools influence students’ vocal performance and creative output. Specifically, the researchers asked whether integrating AI technologies into vocal rehearsals and instruction would produce greater improvement in singing technique and creativity than traditional methods.
The study involved 158 senior voice students from Yulin Normal and Xiamen Universities in China. Participants completed preliminary assessments of vocal skills and creative thinking abilities, followed by a post-assessment after 12 weeks. Both pre- and post-tests were two-hour evaluations measuring vocal skill and creativity on five-point scales through both subjective and objective evaluations. To establish reliability and validity, Liu and Guo (2025) calculated Cronbach’s alpha (0.88, indicating strong internal consistency) and used Pearson correlation coefficients to examine relationships among test scores.
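For readers unfamiliar with the reliability statistic the authors report, Cronbach’s alpha can be computed directly from a student-by-item score matrix. A minimal sketch using entirely hypothetical rubric ratings (not the study’s data):

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (n_students, n_items) score matrix."""
    n_items = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)      # variance of each rubric item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of summed scores
    return (n_items / (n_items - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-point ratings for 6 students on 4 rubric items
ratings = np.array([
    [4, 4, 3, 4],
    [3, 3, 3, 3],
    [5, 4, 5, 5],
    [2, 2, 3, 2],
    [4, 5, 4, 4],
    [3, 3, 2, 3],
], dtype=float)

print(f"alpha = {cronbach_alpha(ratings):.2f}")
```

Values near 0.9, like the one Liu and Guo report, indicate that the rubric items rise and fall together across students, i.e., they measure a coherent underlying construct.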
Students were then divided into control and experimental groups, balanced by performance and creativity scores from the pre-tests. Both groups received 90 minutes of individual instruction twice weekly for 12 weeks, covering topics such as technique, style, and musicianship. The control group relied on traditional instructional methods, with feedback primarily from instructors and assigned materials, while the experimental group was split into two subgroups: Subgroup A used Vocal AI Analyzer, which offered real-time technical analysis of singing performance, while Subgroup B used Smart Vocal Coach, which provided adaptive training plans and gamified exercises.
Creativity was assessed with the sociometric rating index (Griffiths et al., 2021), which combined positive performance elements (enhancing creativity) and negative elements (detracting from creativity) into composite scores across originality, improvisation, and emotional impact. Vocal performance was rated across five categories: intonation, diction, dynamics, performance technique, and emotional expression.
Over the 12 weeks, the experimental group improved by an average of +1.0 in vocal skills relative to the control group. Within the experimental group, Subgroup B (Smart Vocal Coach) showed a creativity increase of +1.08, while Subgroup A (Vocal AI Analyzer) improved by +0.25. The researchers confirmed the statistical significance of these results through t-tests and ANOVA. As Liu and Guo (2025) report:
“The vocal skills indicators for students in the control group improved from 3.4 to 3.7¹… In contrast, students in the experimental group showed a significant improvement from 3.5 to 4.5²… The creativity level for students in the Control Group rose from 2.8 to 3.0³… Significant improvement was observed in the Experimental Group, with a rise from 2.9 to 4.1⁴… The t-test and ANOVA results indicate that the differences between the groups are statistically significant⁵… with a more pronounced effect in Subgroup B (Smart Vocal Coach) than in Subgroup A (Vocal AI Analyzer)⁶.”
Overall, the findings demonstrate gains in both vocal skills and creativity for students receiving AI-assisted instruction, with the Smart Vocal Coach subgroup showing the strongest improvements in creativity. The authors speculate that the creativity gains are likely due to the AI tools’ adaptive, gamified design and ongoing technical feedback. While the 12-week findings showed increases in vocal skill and ability among students, questions persist about the longevity of these gains and the usability of these AI tools for achieving similar outcomes in long-term educational contexts.
The study sought to determine whether AI tools improve vocal skill and creativity outcomes compared to traditional methods by tracking participants’ progress across 12 weeks and comparing experimental and control groups.
The researchers employed a quantitative, quasi-experimental, matched comparison group design. Pre- and post-test assessments provided empirical data for statistical analysis of changes in vocal skill and creativity. This design aligns well with the study’s purpose of comparing AI-assisted and traditional instruction. While students were allocated to groups to achieve equivalence based on initial assessments, the study did not clearly describe how distribution occurred relative to specific categories or average scores. Overall, the study aligns with Creswell’s (2012) postpositivist emphasis on empirical measurement and statistical methods (t-tests, ANOVA) to establish cause-effect relationships between instructional method and outcomes.
One point the researchers make regarding the limitations of the study’s sampling was that “this sample is fairly representative of the specific university, [but that] its size limits the generalizability of the results to a broader population of students from other universities or cultural contexts” (Liu & Guo, 2025, p. 6). The researchers also rationalized the selection of fourth- and fifth-year music students by stating that they “had trained voices and understood the peculiarities of musical theory and the skills of variation of musical elements in practice” (Liu & Guo, 2025, p. 3).
Data collection involved instructor-administered pre- and post-tests measuring vocal skills and creativity. While the reliability of the study was reported as high (with a Cronbach’s alpha around .90), the study provided minimal detail on the type of validity measured or the procedures used to establish it. Concerns also arise about the potential for evaluator bias; the absence of inter-rater reliability measures remains a limitation on the overall generalizability of the findings.
Even though Liu and Guo’s study offers reasonable credibility through its pre-test/post-test design and use of impartial evaluators to reduce internal bias, the non-random participant allocation leaves room for selection bias. As Creswell (2012) points out, without random assignment there are limits to the certainty that improvements are due to the intervention alone. Alternative explanations, such as novelty effects or instructors’ enthusiasm for technology, are additional variables to consider.
In my experience leading a secondary school choir and working with musical theatre voice students, I haven’t given much attention to AI-based tools in my past practice. My philosophy towards AI-based tools has since shifted towards the possibility of equipping myself with this technology to help develop the muscle memory or audiation skills associated with a particular objective. Reflecting on the researchers’ choice of AI-based tools for this study, I had trouble finding any information about either tool outside of this study. I continue to be interested in exploring how instructors can further harness technology to maximize musical proficiency.
I was thankful for the opportunity to receive peer feedback as part of the process of this comparative analysis of Liu and Guo (2025); many of the points my colleagues raised proved valuable in editing my work, and I appreciate that the assignment built in this collaborative process.
Disclaimer: I disclose that my submission to this graded work includes material generated with the assistance of OpenAI’s ChatGPT version 4o. This AI tool was used for organizing and clarifying ideas, and refining my language. All ideas and final wording reflect my own understanding and academic judgments. No AI-generated content has been submitted verbatim.
- The control group showed only small improvements in vocal skills (3.4 – 3.7). This seems normal since traditional training often emphasizes repetition over innovation.
- The experimental group’s progress (3.5 – 4.5) is a much larger increase. AI-assisted tools may have accelerated improvements by providing more immediate and individualized feedback.
- Creativity in the control group improves only slightly (2.8 – 3.0). This suggests that traditional teaching may not nurture creativity as effectively as technical ability.
- Creativity in the experimental group increased substantially (2.9 – 4.1), likely due to AI encouraging experimentation without fear of failure.
- Statistical tests (t-test, ANOVA) confirm the differences between groups are statistically significant and not random.
- The experimental subgroup using Smart Vocal Coach improved more than the subgroup using Vocal AI Analyzer, suggesting real-time guidance is more effective.
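The significance checks the bullets above describe can be illustrated with a short sketch. The gain scores below are hypothetical placeholders that merely echo the reported pattern (small control gains, large Smart Vocal Coach gains), not the study’s raw data:

```python
from scipy import stats

# Hypothetical creativity gains (post minus pre) per participant
control    = [0.1, 0.2, 0.3, 0.2, 0.1, 0.3, 0.2, 0.2]  # traditional instruction
subgroup_a = [0.2, 0.3, 0.2, 0.4, 0.2, 0.3, 0.2, 0.2]  # Vocal AI Analyzer
subgroup_b = [1.0, 1.2, 1.1, 0.9, 1.3, 1.0, 1.1, 1.0]  # Smart Vocal Coach

# Two-sample t-test: control vs. all experimental participants
t, p_t = stats.ttest_ind(control, subgroup_a + subgroup_b)

# One-way ANOVA across the three conditions
f, p_f = stats.f_oneway(control, subgroup_a, subgroup_b)

print(f"t-test p = {p_t:.4f}, ANOVA p = {p_f:.4f}")
```

With group differences this pronounced, both tests return p-values well below .05, which is the sense in which Liu and Guo can claim the between-group differences are unlikely to be random.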
References
Creswell, J. W. (2012). Educational research: Planning, conducting, and evaluating quantitative and qualitative research (4th ed.). Pearson.
Liu, H., & Guo, W. (2025). Effectiveness of AI-driven vocal art tools in enhancing student performance and creativity. European Journal of Education, 60, e70037. https://doi.org/10.1111/ejed.70037
Suter, W. N. (2012). Introduction to educational research: A critical thinking approach (2nd ed.). Sage.