What if? Why not?

Lately, we’ve been pondering a lot of questions, but these are two that we like a lot. They’re kind of at the heart of why we’ve taken Talk to Me, Goose! in the direction that we have.

  • What if we could connect people with an emotionally rich, natural voice using text-to-speech on a mobile device?
  • What if we could enable people to get access to their own personal voice clone with an accessible tool at an affordable cost?
  • What if we eliminated the awkward pause when someone communicates using a text-to-speech device by leveraging AI to create text on the fly?
  • What might it feel like to re-infuse some humanity into text-to-speech conversations using AI as a catalyst?
  • What if we made it dynamic and increasingly personalized and context-aware?
  • Why can’t we make it fun and easy to use?

These were some of the questions that underpinned the drive to create Talk to Me, Goose! and they continue to drive the ongoing innovation. Have we got it right? Probably not entirely, but we think that these questions and what we’ve built are an important start. If you haven’t downloaded it and tried it out, you should. It’s free to try. Do it now.

Emotionally Rich, Natural Speech

Finding ElevenLabs answered a lot of the biggest questions we had when we started down this path. As the leading provider of voice cloning technology, with some of the most realistic-sounding, human-like voices, they were the logical choice for connecting people to a real voice or to their own voice. Their technology is top-notch. Their models render speech so naturally that they even insert filler words into sentences where they make sense. Questions sound like questions. Pacing varies; there is intonation and rhythm. This became the foundation for answering our first question: how to connect people to a real voice.

Eliminating the Awkward Pause

We’ve written a lot about the awkward pause. Merlin came about to help with this. We kept thinking that generative AI had to be the answer: creating text on the fly to help people craft meaningful messages from as little input as possible. And we’ve got a good start. We’re hopeful that Merlin can take on enough of the effort that people can re-engage with one another: to listen, to connect and to talk. Not to type. Let’s let Merlin do the typing. And, over time, if Merlin can get to know you, maybe he can be even more proactive with his suggestions and allow you to reconnect even more. Someone asked if we could let Merlin listen in and formulate a response with no typing at all! I guess that would require that, in addition to giving Merlin a brain, we give Merlin ears. That’s not on the roadmap right now. But, what if? And, why not?

Re-infusing Humanity with Artificial Intelligence

This was the most important driver for us. How might we infuse some humanity back into the interactions when using text-to-speech? The loss of humanity is most notable when the level of effort to create texts is high. If we can reduce that effort by taking very little input and creating reasonable-length texts that convey broader meaning, that’s great. For example, with two clicks and one word, “blanket,” we can turn that one word from a stated demand into a pleasant request: “When you have a moment, could you please bring me a blanket?” We’ve used AI to re-infuse some gratitude and humanity into an interaction that had lost them simply because the effort was too high. That’s critical.

Increasingly Personal, Context-Aware

As Merlin becomes more personal, we’re continuing to ask how to make it more context-aware. How might additional context inform the text suggestions that Merlin makes? This is one area where the questions continue to outnumber the answers. Linking suggestions over time seems like the first step. Suggestion 1 begets Suggestion 2? After that, I don’t know. Even linking temporally seems to be difficult. Linking situationally is also complicated. Nonetheless, Merlin has us thinking. We’re looking for ideas. Give us some below in the comments.

Can’t it be fun?

But, it’s a talking Goose? There’s been some “feedback” about the tone and tenor of the Goose. To this, we say, “Why not?” If we can’t have fun, then what? We think that it’s OK to sometimes use a pirate voice to ask for breakfast, or to speak as a sinister villain when you need to take the dog out. Why not?