Earable SSI

Earable-based Silent Speech Interface
(a) ReHEarSSE uses a novel earbud-based ultrasonic sensing method to infer silently spelled words, even if they are not in the training lexicon. (b) ReHEarSSE can be used while interacting with an extended-reality device for hands-free text input. (c) ReHEarSSE can also be used on the go, when users’ hands are unavailable or text entry on a smartwatch or smart eyewear is inconvenient.

Silent speech interaction (SSI) allows users to discreetly input text without using their hands. Existing wearable SSI systems typically require custom devices and support only a small lexicon, limiting their utility to a handful of command words. This work proposes ReHEarSSE, an earbud-based ultrasonic SSI system capable of generalizing to words that do not appear in its training dataset, supporting nearly an entire dictionary’s worth of words (Dong et al., 2024). As a user silently spells words, ReHEarSSE uses autoregressive features to identify subtle changes in ear canal shape. It infers words with a deep learning model trained to optimize connectionist temporal classification (CTC) loss over an intermediate embedding that accounts for individual letters and the transitions between them. We find that ReHEarSSE recognizes unseen words with an accuracy of 89.3 ± 10.9%.
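To illustrate why CTC enables recognition of words outside the training lexicon, here is a minimal sketch of alignment-free letter recognition: a sequence encoder scores each echo-feature frame against a letter alphabet plus a blank symbol, so any word can be decoded letter by letter at inference time. This is not ReHEarSSE's actual architecture; the class name `EchoLetterRecognizer`, the feature dimension, and the model sizes are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation): a CTC-trained recognizer
# that maps per-frame ultrasonic echo features to a letter sequence.
# Feature dimension, model sizes, and the class name are illustrative assumptions.
import torch
import torch.nn as nn

LETTERS = "abcdefghijklmnopqrstuvwxyz"   # output alphabet (index 0 = CTC blank)
NUM_CLASSES = len(LETTERS) + 1

class EchoLetterRecognizer(nn.Module):
    def __init__(self, feat_dim=64, hidden=128):
        super().__init__()
        # Bidirectional GRU encoder over echo-feature frames; the intermediate
        # embedding it produces is what the CTC head scores per frame.
        self.encoder = nn.GRU(feat_dim, hidden, num_layers=2,
                              batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, NUM_CLASSES)

    def forward(self, x):                     # x: (batch, time, feat_dim)
        h, _ = self.encoder(x)
        return self.head(h).log_softmax(-1)   # (batch, time, classes)

# Training step with CTC loss: alignment-free, so unseen words can still be
# spelled letter by letter and decoded greedily or with a lexicon.
model = EchoLetterRecognizer()
ctc = nn.CTCLoss(blank=0, zero_infinity=True)
feats = torch.randn(4, 200, 64)                     # dummy echo features
targets = torch.randint(1, NUM_CLASSES, (4, 5))     # dummy letter indices
logp = model(feats).transpose(0, 1)                 # CTC expects (time, batch, classes)
loss = ctc(logp, targets,
           input_lengths=torch.full((4,), 200),
           target_lengths=torch.full((4,), 5))
loss.backward()
```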

(a, b) Potential threats against SSI usage; (c) HEar-ID uses multi-task learning (MTL) to authenticate user identity and reliably infer silent speech.

Although users tend to prefer SSI over conventional speech recognition in public contexts, prior SSI work rarely considers its security implications. Conventional voice-based authentication mitigates such risks by verifying the speaker’s identity before granting access, but it remains vulnerable to replay and injection attacks (e.g., triggering Siri via a loudspeaker), leaving a pressing need for a reliable SSI system with built-in user verification. Our analysis found that silent speech recognition and speaker authentication are correlated rather than independent tasks: the inherent structural uniqueness of each person’s ear canal creates distinct acoustic propagation paths, so subtle ear canal deformations encode both the utterance content and the speaker’s identity.

In this work, we enable reliable SSI by proposing HEar-ID (Dong et al., 2025), which leverages only a commodity active noise-canceling earbud to emit an inaudible OFDM signal and record both the ultrasonic reflections and whispered audio, enabling silent spelling input (e.g., /iː eɪ ɑːr/ for the word "ear") and user verification with a single machine learning model. In preliminary experiments with 11 participants, HEar-ID consistently delivered promising results: it rejected impostors with a false positive rate (FPR) of 3.2% at a high true positive rate (TPR), alongside 90.25% Top-1 word recognition accuracy for eight of them.
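To make the single-model idea concrete, below is a minimal multi-task sketch, not HEar-ID's actual architecture: one shared encoder over the fused echo/whisper features feeds two heads, a CTC head that decodes the spelled letters and an identity-embedding head used for speaker verification, optimized jointly. All names, feature dimensions, and the loss weighting are assumptions for illustration.

```python
# Minimal multi-task sketch (illustrative, not HEar-ID's implementation):
# one shared encoder with two jointly trained heads -- a CTC head for spelled
# letters and an identity-embedding head for speaker verification.
# Dimensions, loss weight, and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 27      # 26 letters + CTC blank
NUM_SPEAKERS = 11     # enrolled users in the study

class SharedSSIModel(nn.Module):
    def __init__(self, feat_dim=96, hidden=128, emb_dim=64):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, num_layers=2,
                              batch_first=True, bidirectional=True)
        self.ctc_head = nn.Linear(2 * hidden, NUM_CLASSES)       # what was spelled
        self.id_head = nn.Linear(2 * hidden, emb_dim)            # who spelled it
        self.id_classifier = nn.Linear(emb_dim, NUM_SPEAKERS)    # training-time proxy

    def forward(self, x):                          # x: (batch, time, feat_dim)
        h, _ = self.encoder(x)
        letter_logp = self.ctc_head(h).log_softmax(-1)
        identity = F.normalize(self.id_head(h.mean(dim=1)), dim=-1)
        return letter_logp, identity

model = SharedSSIModel()
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

x = torch.randn(4, 150, 96)                        # dummy fused echo/whisper features
letters = torch.randint(1, NUM_CLASSES, (4, 4))    # dummy spelled-letter labels
speaker = torch.randint(0, NUM_SPEAKERS, (4,))     # dummy speaker labels

letter_logp, identity = model(x)
loss_word = ctc(letter_logp.transpose(0, 1), letters,
                torch.full((4,), 150), torch.full((4,), 4))
loss_id = F.cross_entropy(model.id_classifier(identity), speaker)
loss = loss_word + 0.5 * loss_id                   # joint objective (weight assumed)
loss.backward()
```

At verification time, the identity embedding would be compared against a template captured during enrollment (e.g., by cosine similarity) to accept or reject the wearer, while the CTC head decodes what was spelled.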

References

2025

  1. UbiComp’25
    🏆 Poster: Recognizing Hidden-in-the-Ear Private Key for Reliable Silent Speech Interface Using Multi-Task Learning
    Xuefu Dong, Liqiang Xu, Lixing He, and 6 more authors
    In UbiComp Companion ’25, Oct 2025

2024

  1. CHI’24
    ReHEarSSE: Recognizing Hidden-in-the-Ear Silently Spelled Expressions
    Xuefu Dong, Yifei Chen, Yuuki Nishiyama, and 4 more authors
    In Proceedings of the CHI Conference on Human Factors in Computing Systems, Oct 2024
    Acceptance Rate 26.3%