Interfacing at the Speed of Thought

15 Dec 2015

There are a lot of interfaces that irk me, not because they’re poorly designed in general, but because they don’t interface well with my brain. In particular, they don’t interface well with the speed of brains. The best interfaces become extensions of your body. You gain the same direct control over them that you have over your fingertips, your eyes, your tongue in forming words.

This essay comes in two parts: (1) why this is an issue and (2) advice on how to make the best of what we’ve got.

Feedback and control systems in the body

One thing that characterizes your control over your body is that it (usually) has very, very good feedback, probably including a bunch of kinds you don’t even realize exist. Consider that your muscles don’t actually know anything about location; they simply exert a pulling force. If all of the information you had were your senses of sight and touch-against-skin, and the ability to control those pulling forces, it would be really hard to control your body. But fortunately, you also have proprioception, the sense that lets you know where your body is, even if your eyes are shut and nothing is touching. For example, close your eyes and try to bring your finger to about 2cm (roughly an inch) from your nose. It’s trivially easy.

One more example that I love and then I’ll move on. Compensatory eye movements. Focus your gaze at something at least two feet away, then bobble your head around. Tried it? Your brain has sophisticated systems (approximating calculus that most engineering students would struggle with) that move your eyes exactly opposite to your head, so that whatever you’re looking at remains in the center of your gaze and really quite incredibly stable even while you flail your head. This blew my mind when I first realized it.

The result of all of these control systems is that our bodies kind of just do what we tell them to. As I type this, I don’t have to be constantly monitoring whether my arms are exerting enough force to stay levitated above my keyboard. I just will them to be there. It’s beyond easy: it’s effortless.

Now, try willing your phone to call your friend. You’re allowed to communicate your will using your voice, your hands, whatever. Why does it take so many steps, or so much waiting?

Case Study: Voice & Gesture

A lot of people are trying to design interfaces that are controlled by voice. Well, a lot of people that are mostly concentrated at a few major companies, because voice is hard. Then there are also gestures, with smaller companies releasing products such as the LeapMotion and Myo that promise to let us control… well, basically everything, using hand movements.

This is all great… in theory. But so far, voice and gesture recognition are still not accurate and consistent enough for you to be able to rely on your own ears and the physics of the room, or on your own proprioception, to know that you’ve correctly communicated your desires to the machine. You know you said your friend’s name correctly, but you don’t know if the interface will correctly parse it. If what you’re doing is artistic or fluid, like trying to draw in the air, then gesture control may be fine, but if you’re trying to give a specific command, such as “next slide” or “pause the music”, it’s annoying. You don’t have immediate feedback, so you spend a period wondering if the gesture worked.

On the voice end of things, Google Now is supposed to be able to answer various questions for you, sometimes before you even realize you have them, and so partially replace a personal assistant. And it does this… somewhat. But say you want to know something like “when’s the next bus coming?” Imagine an actual physically-present human personal assistant standing in the same room as you. If that human had the following properties (all of which Google Now has):

knew where you were (obviously)
knew where you were going (from Google Calendar or whatever)
and had a 3G connection from their brain to Google

then they could immediately respond:

“the 7D bus is leaving from King & Victoria in three minutes and even if you sprint, it’ll take you at least five.”

That is what voice commands want to be. Where you don’t have to worry about the other end asking a bunch of questions to clarify. As it is, Google Now can’t handle this, and so it ends up being much worse than a simple app like Transit. Transit is simple: you open it and it shows you the nearest bus times at the stops closest to you.

Taking a more realistic example, you’d basically never have to worry about this hypothetical human assistant not recognizing the name of the person you want to call, assuming that they have access to a list of your contacts, like Google Now does. If you try to ask Google Now to call someone, there’s a long lull where it keeps listening (and picks up background noise), and then it confirms the name, then a 5s delay which I’m pretty sure is just a chance for you to cancel in case it messed up. This is not an interface that works at the speed of thought. It may require less thinking than typing, on a touchscreen at least (or if your hands are occupied) but if this is a friend you call often, you may be better off with a speed-dial menu that shows you 6 faces and you tap one.

It is worth noting that the human assistant I suggested above would use as much energy in a day as an iPhone does in 8 months (thanks Wolfram Alpha) which gives a bit of perspective to the idea that your phone isn’t able to be constantly listening for audio and has to be in certain states to be able to listen. But still. It makes the experience not feel smooth.

How To Become a Cyborg in 2015

The word “cyborg” probably brings up the idea of having hardware connected directly to your neurons. But on top of the technical hurdles to actually doing that, it turns out that human brains are not actually optimized for you having direct control over the behaviour of individual neurons (or even clusters of hundreds of neurons). If we’re going to have interfaces that work at the speed of thought, our best bet is to output those thoughts via our bodies (particularly the hands and mouth) which are already very well-wired to the brain.

So given that part of the interface is going to include hands and mouth, how can you leverage these to get the fastest, most direct control of devices?

Much of this is going to be done by increasing the amount of feedback you can get. Some of it is also just making the interface be better suited for the kinds of things you want to do, so you can issue more intuitive commands.

Computers

First let’s talk about the mouse versus the keyboard. We want the fastest interface between thought/intention and getting the machine to do what we want it to do. Although clicking on graphical user interfaces is much easier to learn, the keyboard (thanks to its rapid feedback) is able to bury itself deeper into your muscle memory in the medium-to-long term, thus shortening the time and number of steps between intention and outcome. That is, you don’t need to wait for things to show up on the screen because you can feel if you’ve pressed a key… by contrast, you can’t just know by the movement of your hand if you’ve moved the mouse to the exact spot on screen. Therefore…

Learn to touch-type. Lots of people have written extensively on this, but I need to include it because it underpins everything else here. In order to have rapid communication with your keyboard, you need to be able to know simply via proprioception and your sense of touch which keys you’ve pressed.

Learn typing shortcuts. Once you can touch type, learn to manipulate not just letters but entire words. I initially started this sentence with the words “Your brain doesn’t” but then I realized that I wanted to start it differently. But at that point, my thought wasn’t “I want to delete those 18 characters (press backspace 18 times).” It was “I want to delete those 3 words (press ctrl+backspace 3 times).” The latter action is one you can just do, and move on. Your brain knows it’s three without having to keep track, so you don’t need to count or check to make sure you got the right number. Similarly, pressing ctrl+left or ctrl+right will move the cursor an entire word in that direction. Ctrl+shift+direction will select entire words, and so on. Home & end (or cmd+left & cmd+right on Macs) will jump to the start/end of a line. Play around with these. They’re amazing.
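To make the "three words, not 18 characters" idea concrete, here is a minimal sketch of what a delete-previous-word command does. It assumes a word is simply a run of non-whitespace characters; real editors differ on how they treat punctuation, so take this as an illustration rather than the behaviour of any particular program. The function names are made up for this example.

```python
def delete_previous_word(text: str, cursor: int) -> tuple[str, int]:
    """Delete the word just before the cursor, roughly like ctrl+backspace."""
    start = cursor
    # Skip any whitespace directly to the left of the cursor.
    while start > 0 and text[start - 1].isspace():
        start -= 1
    # Then skip the run of non-whitespace characters that forms the word.
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    # Keep everything before the word and everything after the cursor.
    return text[:start] + text[cursor:], start


def delete_previous_words(text: str, cursor: int, count: int) -> tuple[str, int]:
    """Apply the word deletion `count` times in a row."""
    for _ in range(count):
        text, cursor = delete_previous_word(text, cursor)
    return text, cursor


if __name__ == "__main__":
    sentence = "Your brain doesn't"
    # Three word-deletions clear what would otherwise take 18 backspaces.
    print(len(sentence), delete_previous_words(sentence, len(sentence), 3))
```

The point of the sketch is that the unit of the command matches the unit of the thought: the caller says "3 words" and never has to count characters.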

Learn more keyboard shortcuts. Say you’re in gmail, writing an email to someone, and you’ve just typed the words “an article I wrote”, and you want to linkify those words so that they point at the article. What do you do? Here’s what I do:

press ctrl+shift+left four times to select the words
press ctrl+k to bring up the “Add hyperlink” dialog
press ctrl+t to open a new tab
type the shortest phrase that will cause the page to be in chrome’s dropdown
press down then enter to start loading the page
press ctrl+l (lowercase L) to select the url (while the page is still loading)
press ctrl+c to copy
press ctrl+w to close the new tab (it probably hasn’t even loaded yet)
press ctrl+v to paste into the linky box
press enter

I just timed myself at 5.6 seconds (using an audio recording where I clapped then did the thing then clapped again). Most people would probably take at least three times as long. But the point isn’t just the time. It’s the fact that for me, it’s about as easy as thinking “I want to linkify those words… what’s the url again?” Whereas for most people, it involves a bunch of poking at stuff, and aiming their mouse so that it hits the right buttons.

Give yourself extra keyboard shortcuts. Want to navigate the web without having to swing your mouse around to click on links? Get vimium, which adds extra hotkeys to chrome. You can then press the letter f and every link or button on the page will have a little 1-2 letter code appear above it. Type those characters, and the link opens. It sounds weird, and it is. But because hands+keyboard has a better feedback loop than hand+mouse, once you’re familiar with it it lets you tell your body “click that link” in a way that you couldn’t before.
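If you're curious how a page full of links can each get a short code, here is a rough sketch of one way to assign them. This is not Vimium's actual algorithm, just an illustration under my own assumptions: the hint alphabet is a hypothetical set of easy-to-reach keys, and every hint on a page gets the same length so that no code is a prefix of another.

```python
from itertools import islice, product

# Hypothetical set of comfortable keys; not taken from any real extension.
HINT_ALPHABET = "sadfjklewcmpgh"


def make_hints(link_count: int, alphabet: str = HINT_ALPHABET) -> list[str]:
    """Return one distinct hint per link, using the shortest uniform length."""
    if link_count <= 0:
        return []
    # Find the smallest length that gives enough distinct codes.
    length = 1
    while len(alphabet) ** length < link_count:
        length += 1
    # Enumerate codes in a fixed order and take as many as there are links.
    codes = ("".join(chars) for chars in product(alphabet, repeat=length))
    return list(islice(codes, link_count))


if __name__ == "__main__":
    print(make_hints(3))
    print(make_hints(20)[:5])
```

With 14 keys, up to 14 links get a single letter and up to 196 get two, which lines up with the "1-2 letter code" experience on most pages.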

You can also use text expansion software like AutoHotkey (Windows), AutoKey (Linux) or TextExpander (Mac) to insert commonly-typed phrases. I use this so that if, e.g., I want to link to my startup’s signup page, instead of thinking “and now I type h t t p colon slash slash w w w dot complice dot co slash signup” I just think “complice site slash signup” and I type hco/signup and the hco gets expanded to https://complice.co/. This is a little different than many of the other suggestions, but I figured it was worth including because of how it makes “insert the complice url” into a single act rather than a bunch of typing.
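Here is a small sketch of the core of text expansion, for anyone who wants to see the mechanism rather than a specific tool. It is not how AutoHotkey, AutoKey or TextExpander are implemented; it only shows the lookup step. The table is hypothetical apart from "hco", and I'm assuming the expansion stops just before the slash so that the "/" you type next completes the URL.

```python
import re

# Hypothetical abbreviation table; only "hco" comes from the post.
# Assumption: the trailing "/" is supplied by the character typed after "hco".
EXPANSIONS = {
    "hco": "https://complice.co",
}


def expand(text: str, expansions: dict[str, str] = EXPANSIONS) -> str:
    """Replace each abbreviation that stands alone as a word with its expansion."""
    if not expansions:
        return text
    # Longest abbreviations first, so a short one never shadows a longer one.
    keys = sorted(expansions, key=len, reverse=True)
    alternatives = "|".join(re.escape(key) for key in keys)
    # The lookarounds make sure the abbreviation is not part of a longer word.
    pattern = re.compile(r"(?<!\w)(" + alternatives + r")(?!\w)")
    return pattern.sub(lambda match: expansions[match.group(1)], text)


if __name__ == "__main__":
    print(expand("hco/signup"))
```

The word-boundary check matters in practice: without it, any word that happens to contain "hco" would get mangled.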

Also related: these simple quicksearches that let you go directly to facebook or gmail search results, bypassing your news feed or inbox. You don’t need to make small-talk with facebook’s interface.

Advanced: automate common commands. I use Beeminder extensively, and their interface is okay but it’s still not cyborgy. I want to be able to tell my computer “I did 30 pushups” without having to bring up the Beeminder website and click on things. For unrelated reasons, I had just created a Beeminder Node.js integration, so I got it set up so that now if I want to tell Beeminder about my pushups I just press a few keys and it’s done.

I don’t really recommend manually automating something unless it’s a serious pain-point for you, but in many cases you can find tools that exist already. For example, there’s also a command-line Twitter client called ‘t’. I wrote a tiny four line script that would let me post a Complice “user visible improvement” to both Twitter and Beeminder simultaneously. Now it’s as simple as saying “hey internet, I just pushed this new feature”.
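For a sense of what such a tiny script might look like, here is a sketch in Python. It is not my original script. It assumes the `t` command-line client is installed and authorized (its `t update` subcommand posts a tweet), and it assumes Beeminder's datapoint endpoint has the shape shown below, which is my understanding of their public API and worth checking against their docs. The user name, goal name and token are placeholders.

```python
import subprocess
import urllib.parse
import urllib.request


def build_tweet_command(message: str) -> list[str]:
    """Command line for the `t` client; assumes `t` is installed and authorized."""
    return ["t", "update", message]


def build_beeminder_request(user: str, goal: str, auth_token: str,
                            value: float, comment: str) -> urllib.request.Request:
    """POST request that adds one datapoint (endpoint shape is an assumption)."""
    url = (f"https://www.beeminder.com/api/v1/users/{user}"
           f"/goals/{goal}/datapoints.json")
    body = urllib.parse.urlencode(
        {"auth_token": auth_token, "value": value, "comment": comment}
    ).encode()
    return urllib.request.Request(url, data=body, method="POST")


def announce(message: str, user: str, goal: str, auth_token: str,
             dry_run: bool = True):
    """Post the same message to Twitter and Beeminder; nothing is sent if dry_run."""
    command = build_tweet_command(message)
    request = build_beeminder_request(user, goal, auth_token, 1, message)
    if not dry_run:
        subprocess.run(command, check=True)
        with urllib.request.urlopen(request) as response:
            response.read()
    return command, request


if __name__ == "__main__":
    # Placeholder values; dry_run=True only builds the command and request.
    print(announce("Added dark mode", "alice", "uvi", "TOKEN")[0])
```

Keeping the builders separate from the sending step means you can check exactly what would be posted before letting the script talk to either service.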

Palette Gear: knobs and sliders for your computer. As I ranted about above, I’m kind of annoyed at gesture startups, but I’m stoked about a company called Palette, which makes extra hardware that you can use to control your computer. For instance, if you’re editing photos, set one slider to control the saturation, a knob to adjust the size of your brush, and a button to toggle between modes. You can use these with your hands and get consistent action-level feedback and take full advantage of muscle memory and so on.

Phones

Unfortunately, I don’t have as many examples here, in large part because there’s no hardware keyboard with really rapid feedback. Here are a few ways I’ve compensated for that:

I got a Pressy. Basically, this is a tiny button that goes in the audio jack of your android phone, plus an app that interprets press-combos on the button and executes arbitrary commands on the phone. What really sold it for me is that you can also use the app with the button on your headset. So for example, when I go to take a nap (which is once or twice daily) I just do two long presses on the button and my nap track starts playing, without even having to take my phone from my pocket, let alone unlock it, open the music app, search for the nap track, and press play.

(Update: I think I lost my Pressy or something since originally drafting this post… at any rate, I at least lost interest in it because it ended up just being a little too inconsistent at doing the right thing, and too inconvenient to keep in the right state, and it required extra attention at other times which I didn’t like. But my ideal smartphone would just have a programmable button (or 3) on the side.)
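The press-combo idea is simple enough to sketch, whether for a Pressy-style button or the programmable side button I wish phones had. This is not Pressy's code: the half-second threshold and every binding except "two long presses starts the nap track" are made up for illustration.

```python
# Hypothetical cutoff between a short and a long press, in seconds.
LONG_PRESS_SECONDS = 0.5

# Hypothetical bindings; only the two-long-presses nap track is from the post.
BINDINGS = {
    ("long", "long"): "play nap track",
    ("short",): "toggle flashlight",
}


def classify(durations):
    """Turn a sequence of press durations into a pattern of 'short'/'long'."""
    return tuple(
        "long" if seconds >= LONG_PRESS_SECONDS else "short"
        for seconds in durations
    )


def command_for(durations, bindings=BINDINGS):
    """Look up the command bound to a press pattern, or None if unbound."""
    return bindings.get(classify(durations))


if __name__ == "__main__":
    print(command_for([0.8, 0.9]))
```

Returning None for unknown patterns is the important design choice here: an unrecognized combo should do nothing rather than guess, since a button that sometimes does the wrong thing is exactly what made me lose interest.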

Get the KeyMonk keyboard. In addition to two-finger swipe, it offers the ability to swipe left on the backspace key to delete an entire word. I just can’t go back to any other keyboard now that I have that (for reasons described in the typing shortcuts section above).

Update late 2018: KeyMonk seems to have been pulled from the Play Store. Having been forced to switch, I am getting used to not having two-finger swipe and swipe-left-to-backspace, but I would happily pay $20 for a modernized version of KeyMonk (the previous one had some issues and hadn’t been updated in years). Or more, in the context of a kickstarter or similar, if it made the difference between this existing and not.

Learn the limits and language of your voice interface. Know what Google Now or Siri can do. It’s painfully little right now, all things considered, but still, for things it can do it’s less work and fairly consistent. On Android, you can get Commandr, which gives you extra voice commands like toggling the flashlight.

I’ve wanted to write this post for a long time. Coming up on two years. Most blog post ideas do not stick with me that long. But while things have improved in that time, human actions are still way more adaptable than machine interpretations, so it’s still more efficient to take a few moments to figure out how to tell the machine exactly what you want directly. This is plugging it into your brain, via your hands.

You want your interfaces to be so smooth that your devices just feel like an extension of your body. So that you can make things happen as fast as you can think them.

If you want more thinking along these lines, you might like Bret Victor’s Brief Rant on the Future of Interaction Design.
