What if it were easy to make Amazon’s Echo do anything your computer knows how to do, or can be taught to do? That’s what I’m going to talk about here.
It’ll be worth your time, I promise.
The Echo is a nice device to have around the house; you can ask it some things that Amazon has set up, and you can even launch a few (very few) things via the IFTTT website. But there is a huge, untapped area there: the things that are unique to you and your family.
What if you could leverage the Echo to do… anything!?!?
This post is aimed at both those who want to develop (or are developing) for Amazon Echo, and the actual Amazon Echo development team. Here’s what I’m suggesting:
Basically, this is a port-based I/O API for the Echo. It would work as follows.
Next, the Echo needs a “speak-this” port as well. Right now, looking at my local area network, my Echo is at 192.168.1.106; a port at that address provides a specific place to send some text for the Echo to convert into speech. My computer is at 192.168.1.100, and a port there provides a specific place for the Echo to send messages to me.
So that’s it. A network port on the Echo that we can send text to, and one on our own machine that the Echo can send text to. As a developer, you know this is extremely easy to set up!
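To make the two-port idea concrete, here is a minimal sketch in Python. Nothing in it is an existing Echo API: the port numbers, the use of TCP, and the one-line-of-UTF-8 message format are all my assumptions for illustration. Only the two IP addresses come from my own network as described above.

```python
import socket

# Addresses from my own LAN; the port numbers are hypothetical placeholders.
ECHO_ADDRESS = "192.168.1.106"
SPEAK_THIS_PORT = 5005   # assumed: where the Echo would accept text to speak
CONTACT_ME_PORT = 5006   # assumed: where my computer would accept heard text


def send_text(host, port, text):
    """Send one line of UTF-8 text to host:port over TCP."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(text.encode("utf-8") + b"\n")


def open_listener(host="127.0.0.1", port=0):
    """Open a listening TCP socket; port 0 lets the OS pick a free port."""
    server = socket.create_server((host, port))
    return server, server.getsockname()[1]


def receive_one(server):
    """Accept a single connection and return the text it carried."""
    conn, _addr = server.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:          # sender closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8").strip()


if __name__ == "__main__":
    # Loopback demo: this machine plays both the Echo and the home computer.
    server, port = open_listener()
    send_text("127.0.0.1", port, "turn on my lights")
    print(receive_one(server))
    server.close()
```

On a real setup, the home machine would call something like open_listener("192.168.1.100", CONTACT_ME_PORT) and loop on receive_one, and would answer with send_text(ECHO_ADDRESS, SPEAK_THIS_PORT, "Ok, got it"), assuming the Echo exposed such a port.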
Now here’s what I have in mind to do with these ports.
Right now, when the Echo hears you say something, it sends it to Amazon. It is converted into text along the way somewhere, and then they attempt to parse the text into something they can understand. If they succeed, they do whatever needs to be done.
If they don’t succeed, they have the Echo say “I’m sorry, I didn’t understand what I heard” or something along those lines.
We can make this into a wonderful capability with almost no effort at all — not on our part as developers, and not on Amazon’s part in implementing the feature. This would be the new way this would work:
If you’ve set up the “contact-me” port on your Echo, then instead of telling the Echo to say “I’m sorry, I didn’t understand what I heard”, Amazon has the Echo send the text-converted string it heard to your “contact-me” port. Amazon is already sending that string back; you can see it in your phone- or tablet-based Echo application.
Now you decide what to do.
Attempt to parse the text, and if you can’t, then you tell Echo to say so via the “speak-this” port.
If you can, then you do whatever it is that is required.
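That decision step could look something like the following sketch. The handler table, the lights_on placeholder, and the speak callback are hypothetical names of mine; in practice speak would be whatever function sends text to the “speak-this” port.

```python
import re

SORRY = "I'm sorry, I didn't understand what I heard"


def handle_utterance(text, handlers, speak):
    """Try each (pattern, action) pair in order; speak the first reply.

    Returns True if some handler understood the text, False otherwise.
    """
    for pattern, action in handlers:
        match = pattern.search(text)
        if match:
            reply = action(match)
            if reply:
                speak(reply)
            return True
    # Nothing matched, so we tell the Echo to say so.
    speak(SORRY)
    return False


def lights_on(match):
    """Placeholder action; a real one would talk to your lighting system."""
    return "Ok, lights on"


# Hypothetical handler table: one compiled pattern per thing you want to support.
HANDLERS = [
    (re.compile(r"\bturn on my lights\b", re.IGNORECASE), lights_on),
]
```

Calling handle_utterance("Alexa, turn on my lights", HANDLERS, print) would print the reply locally; swapping print for a function that writes to the Echo is the only change needed to make it talk.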
For instance, I could tell Echo, “Alexa, I’ll be at the gym from 5 pm to 6pm”
Or perhaps “Alexa, local command: I’ll be at the gym from 5 pm to 6pm”
Amazon doesn’t understand this (or, if you said “local command”, it knows it isn’t meant to parse it), so the text gets sent to my defined “contact-me” port, where I parse it and put that info into a local schedule database. I tell Echo to say “Ok, got it” or perhaps “Ok, you’ll be at the gym from 5pm to 6pm” via the “speak-this” port.
Later, at 5:15pm, my kid asks the Echo “Alexa, do you know where dad is?”
Amazon doesn’t understand this, so it gets sent to my defined “contact-me” port, where I parse it and check my schedule database; then, via an automated telnet or SSH session, I tell Echo to say “Dad is at the gym until 6pm.” This is so easy to do from the local machine that it hardly counts as effort at all.
See how cool that would be? A decent parser is not a problem — I already have that kind of code in Python and would be delighted to share it.
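To give a feel for the gym example, here is a small sketch. It is not the parser I mentioned; it is a deliberately tiny regex-based stand-in, and the in-memory list stands in for the schedule database. The function names and the idea of passing in who is speaking are assumptions for illustration.

```python
import re
from datetime import time

# Matches statements like "I'll be at the gym from 5 pm to 6pm".
STATEMENT = re.compile(
    r"i['’]ll be at (?P<place>.+?) from "
    r"(?P<start>\d{1,2}(?::\d{2})?\s*[ap]m) to "
    r"(?P<end>\d{1,2}(?::\d{2})?\s*[ap]m)",
    re.IGNORECASE,
)


def parse_clock(text):
    """Turn '5 pm', '6pm' or '5:30 pm' into a datetime.time."""
    m = re.fullmatch(r"(\d{1,2})(?::(\d{2}))?\s*([ap])m", text.strip().lower())
    hour = int(m.group(1)) % 12          # 12am -> 0, 12pm -> 0 (+12 below)
    if m.group(3) == "p":
        hour += 12
    return time(hour, int(m.group(2) or 0))


def format_clock(t):
    """Format a time the way it would be spoken, e.g. '6pm' or '5:30pm'."""
    hour12 = t.hour % 12 or 12
    suffix = "am" if t.hour < 12 else "pm"
    if t.minute:
        return f"{hour12}:{t.minute:02d}{suffix}"
    return f"{hour12}{suffix}"


def record(schedule, who, text):
    """Store a schedule statement; return the spoken reply, or None if no match."""
    m = STATEMENT.search(text)
    if not m:
        return None
    start, end = parse_clock(m.group("start")), parse_clock(m.group("end"))
    schedule.append((who, m.group("place"), start, end))
    return (f"Ok, you'll be at {m.group('place')} "
            f"from {format_clock(start)} to {format_clock(end)}")


def where_is(schedule, who, now):
    """Answer 'where is <who>?' for the given time of day."""
    for name, place, start, end in schedule:
        if name == who and start <= now < end:
            return f"{who.capitalize()} is at {place} until {format_clock(end)}"
    return f"I don't know where {who} is"
```

With an empty list as the schedule, record(schedule, "dad", "I'll be at the gym from 5 pm to 6pm") stores the entry, and where_is(schedule, "dad", time(17, 15)) produces the sentence to send to the “speak-this” port. How the system knows the speaker is “dad” is left open here; one Echo per household with a configured owner would be the simplest assumption.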
You could say “Alexa, turn on my lights” and, whatever kind of custom automation system you have for the lights, the sprinklers, a setup to alter the pH of your fish tank, etc., you could easily make it work. The possibilities are endless, and useful to a degree that is difficult to exaggerate.
None of this would depend on manufacturers creating “clouds” or websites that have to be up to interact with your devices; you could command or interrogate a local Raspberry Pi or something out on the net, anything you needed to do, no limitations at all. That means when the company that makes your thermostat stops supporting it (cough… Google… always dumping support for this and that) it won’t matter. Because you can control it directly.
So there’s the idea. It’s trivial for Amazon to implement; it is easy to use in any way you like on the development / home end; and it is powerful to the point of being essentially unlimited.
There are obvious elaborations on the idea. For example, Amazon could be more explicit (and just a bit more clunky) by requiring you to say “Alexa, local command: I’ll be at the gym from 5 pm to 6pm”; that way they know not to even try to parse it at Amazon, but just send it along to you. Or the phrase could be optional, with anything Amazon doesn’t understand sent along as well.
One benefit of such an approach is that it lets you say things that would otherwise be consumed directly by the Echo, which could be useful.
For instance, you might have a significant collection of ZZ Top songs on your home media server. So you might say “Alexa, local command: Play ZZ Top”
Now that goes to your own parser, and you play your media.
Otherwise, if you had just said “Alexa, Play ZZ Top”, it would have come from your Amazon library or Prime music. Using “Local Command:” as a required prefatory phrase, you can have both.
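The routing rule for the prefix is simple enough to sketch. This is only my guess at how the decision could work, wherever it ends up running; the function name, the return values, and the assumption that the “Alexa” wake word has already been stripped are all mine.

```python
import re

# Assumed spoken prefix; the transcription may or may not include the colon.
LOCAL_PREFIX = re.compile(r"^\s*local command\s*[:,]?\s*", re.IGNORECASE)


def route(utterance):
    """Decide who handles an utterance (wake word already removed).

    Returns ("local", text_without_prefix) when the prefix is present,
    otherwise ("amazon", utterance) so normal handling applies.
    """
    m = LOCAL_PREFIX.match(utterance)
    if m:
        return ("local", utterance[m.end():])
    return ("amazon", utterance)
```

So route("local command: Play ZZ Top") would hand “Play ZZ Top” to your own parser, while route("Play ZZ Top") leaves it to Amazon’s library or Prime music.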
So. If you like this idea, please nudge Amazon on their Echo social media channel, #AmazonEcho, and tweet to/about them something like:
In fact, I’ll set the tweet up for you: