
CAS

A mobile experience designed to help children improve speech and emotional recognition through guided interactions, intuitive UI, and engaging, user-centered design.

problem

Children with CAS need consistent, high-frequency speech practice, but limited access to therapy sessions creates a gap in learning. There is a need for an accessible, engaging solution that supports continuous practice beyond traditional therapy settings.

solution

A hybrid therapy solution combining a 3D avatar with enhanced visual cues, real-time feedback, and guided prompts to improve both engagement and speech learning, especially for mild to moderate CAS.

Introduction

Childhood Apraxia of Speech (CAS) is a congenital motor-speech disorder. It affects a child's ability to learn language and pronounce words correctly.

Fast Facts about CAS Therapy

Considering the pain points, accessibility challenges, and therapy limitations, we arrived at the following “How Might We” statement.

Since many children with Childhood Apraxia of Speech lack consistent access to therapy, we want to design a hybrid solution that makes speech learning more engaging, accessible, and supportive.

Why a 3D Model Was the Answer

Per our initial hypothesis, integrating a 3D model is a promising solution because it can visually cue the child on how to pronounce sounds and words.

How We Built the Model

MetaHuman Creator

We used MetaHuman Creator to build a high-fidelity 3D avatar. We took extra care to ensure that the model looks friendly and does not fall into the uncanny valley.

Challenge: Limited users to test the application. Testing the application with children raises legal complications because they are a sensitive demographic, and obtaining the necessary legal compliance did not fit our timeline. We therefore decided to test with the parents of children with CAS: if the parents themselves are apprehensive about the idea, they will never share it with their children anyway.

Facial Animation

NVIDIA Audio2Face is facial animation software that runs on NVIDIA's TensorRT engine. We used it to drive the 3D model's facial expressions and audio output.

Voice Integration - NVIDIA Audio2Face

The Challenges We Faced

'Users having strong reservations about the application'

Our Guiding Assumptions

Hypothesis: We expect that visual cues from the 3D model will aid our users, augmenting their speech therapy experience.
Hypothesis: We expect that adding a 3D avatar will enhance the experience because it guides the child, making the therapy process more engaging.

Testing and Iterating Early Ideas

We divided the assessment into two phases so that it would answer as many questions as possible about our key areas of focus.

Challenge: Latency and desynchronization between lip movement and audio

User Interviews - Phase I

We conducted the user interviews to understand how readily our participants accept technology, especially when it is used to deliver therapy.

Think-Aloud - Phase II

We designed the think-aloud session around the two key areas of focus to shed as much light on them as possible.

The People Who Made it Happen

Refining through Feedback

The second iteration is grounded in the design directions obtained from the formative assessment. The redesign changes the color palette to make it more welcoming to children. Other changes include storytelling and animations. Challenge: Reservations from the speech-language pathologist (SLP).

Design Directions

Updated the color palette to make the platform friendly and welcoming.
Changed the secondary font to Nunito to improve readability and legibility.
Added a mic button to show users when to start speaking and when to move to the next step.
Removed the bottom bar to increase overall engagement with the application.

Refining through Feedback

Storytelling is an effective way to carry out the last step of the app-based therapy, spontaneous production. The target word is grounded in the story, so it can be prompted later through a question.

The Framework We Created

Style Guide

Color Palette

After the formative assessment, many of our participants highlighted the need for a color palette that is ‘soft’ and ‘easy on the eyes’.

Typography

Fredoka has a playful, rounded design that enhances the friendliness of the app.

Nunito has a generous x-height, which improves its legibility and readability.

Motion Design

Animations are used to engage users and direct their attention to the call to action whenever required.

Iconography

Google Material Design rounded icons were used to keep the fonts and icons in harmony and to provide a friendly user interface for children.

What Comes Next?

Future work aims to build on these foundations, focusing on integrating an LLM to produce speech while animating the MetaHuman to cue the child effectively. Challenge: Increased user hesitance

NVIDIA NeMo - ASR

Due to limited time, we were not able to integrate the pre-trained speech recognition model with our 3D avatar, so we could not test random prompt generation. The third iteration will complete this integration, enhancing the cueing abilities of the 3D avatar.

Full Body Animation

We plan to add full-body animation to the current model so that it can guide the child through hand gestures, making speech production more intuitive to understand.

Lessons We Learned

The year-long study led to many significant discoveries about delivering speech therapy remotely, including the following learnings:

Child-Computer Interaction: Echo primarily deals with children. I was motivated to learn about child psychology and how children perceive digital interfaces. Duolingo Kids served as a good reference while I learned the principles of designing UI for children.

Conversational Animation: Integrating the voice animation (lip movement and voice prompts) taught me how latency and other nuances can affect the user experience.

Understanding the Uncanny

Integrating the 3D MetaHuman introduced the uncanny valley effect. Incorporating subtle animations such as blinking and facial muscle movement (brow muscles) helped the 3D avatar come across as friendly.

year

Oct 23 - May 25

timeframe

1.5 years

tools

Unreal Engine, Figma, Spline and Adobe Illustrator