Test-driven development
Monday, June 30. 2008
By Trevor Baca, VP Software Engineering
How do you keep old code from breaking when you write new code? The answer is, and always has been, to run a regression battery of tests. Suppose you have custom classes in your code that are supposed to allow multiplication by positive numbers and zero but not by negative or imaginary numbers. Then writing a couple of tests to make sure that multiplication by positive numbers and zero works is the way to go. Even better, include a couple of tests to make sure that multiplication by negatives, imaginaries and other types of number doesn't. As you write more code, write more tests. The tests together constitute your regression battery. Then run your regression battery and make sure your tests all pass at important times, such as before any major or minor release. And preferably a lot more frequently than that, such as before committing new code to the repository. A good regression battery is a requirement for a good team.

And now a small distinction in when developers write new tests for the battery is having big consequences in the world of software development. Enter test-driven development. Test-driven development (TDD) helps us develop reliable code faster by putting testing at the center of the process. We've been migrating more and more of our development at Jaduka and NetworkIP to a TDD way of doing things, and the results have been exceptional.

So what is TDD? Take a look at this recent post from molecular biologist and Python enthusiast Titus Brown. TDD asks that we write our tests as we go, ideally before the code they exercise, rather than at the end of the process. Given that almost all recent software development models are iterative and incremental, this makes good sense. How often does testing get squeezed in at the very end of a project? Too often, of course. And authoring test cases as we go can help considerably.

Supporting the different Jaduka APIs requires an approach to development that prizes consistency.
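A regression battery is one of the main tools for keeping that kind of consistency. Here is a minimal sketch of the multiplication example from the start of this post, written with Python's standard `unittest` module. The `Quantity` class and its exact rules are hypothetical stand-ins for "custom classes in your code", not code from our systems; in a TDD workflow the test class would be written first, and `__mul__` filled in until the battery passes.

```python
import numbers
import unittest
from decimal import Decimal


class Quantity:
    """Hypothetical custom class: multiplication by zero or a positive real
    number is allowed; negative, imaginary and other number types are not."""

    def __init__(self, value):
        self.value = value

    def __mul__(self, factor):
        # Only real numbers are accepted (int, float, Fraction, ...); complex
        # and Decimal are not registered as numbers.Real, and bool is excluded explicitly.
        if isinstance(factor, bool) or not isinstance(factor, numbers.Real):
            raise TypeError(f"cannot multiply Quantity by {type(factor).__name__}")
        if factor < 0:
            raise ValueError("cannot multiply Quantity by a negative number")
        return Quantity(self.value * factor)

    # Support factor * Quantity as well as Quantity * factor.
    __rmul__ = __mul__


class QuantityMultiplicationTests(unittest.TestCase):
    # Behavior that should work...
    def test_positive_factor(self):
        self.assertEqual((Quantity(3) * 2).value, 6)
        self.assertEqual((2.5 * Quantity(2)).value, 5.0)

    def test_zero_factor(self):
        self.assertEqual((Quantity(3) * 0).value, 0)

    # ...and behavior that should not.
    def test_negative_factor_rejected(self):
        with self.assertRaises(ValueError):
            Quantity(3) * -2

    def test_imaginary_factor_rejected(self):
        with self.assertRaises(TypeError):
            Quantity(3) * 2j

    def test_other_number_types_rejected(self):
        with self.assertRaises(TypeError):
            Quantity(3) * Decimal("2")
```

Saved as, say, `test_quantity.py` (a hypothetical file name), the battery runs with `python -m unittest test_quantity`; every new feature adds a few more test methods, and the whole battery runs before each commit.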
Once any API vendor releases new methods, the public interface to those methods needs to stay pretty much the same, because users like it that way. But working on the back-end systems and networking code that makes Jaduka run requires a different set of tools, and one of the most important of those is turning out to be TDD.

Developer collaboration III
Tuesday, June 17. 2008
By Trevor Baca, VP Software Engineering
Check out the post Ben Collins-Sussman published on his blog on Thursday. Ben is one of the primary authors of Subversion, the version control system we use for all our projects around Jaduka and NetworkIP, and has been at Google for some time now as a lead engineer. Ben's post starts from what he calls "programmer insecurity": why do we as developers always wanna hold off on releasing our own code until we've reached some future state of perfection ... and then why do we wanna cover our tracks when we're done? He uses that as an entry point to talk about why programmers work the way we do, and about how the tools we choose to work with can help us hide or share our results. It's a point well taken: there's no question that "commit early, commit often" helps foster a solid team understanding of what the key subsystems in the codebase are actually *doing*. And this seems to be just as much the case with projects internal to our own teams here as it is with open source initiatives. I've blogged elsewhere about the value of different tools, like UNIX screen and conference bridges, for developer collaboration. Ben's post makes the additional point that, above and beyond our favorite collaborative tools, we have to work to ensure a collaborative workflow on our teams. Well worth checking out.

Google App Engine
Tuesday, April 8. 2008
By Trevor Baca, VP Software Engineering
If you haven't already heard, click over to Google's announcement of the new App Engine development and runtime environment. Summary: write a web app and deploy it to Google's massive array of servers. APIs exist for user authentication, data storage, mail, URL fetching, etc. You get 500MB of storage plus enough CPU and bandwidth for about 5 million pageviews a month. Only free accounts are available at the moment, with the option to buy more computing or storage resources coming in the future. The interesting part? Google App Engine is 100% python. (For now, anyway.) Django (which is to python what Rails is to ruby) is supported out of the box, and you have to write all application code in python (as opposed to PHP or ruby). Oh, and Google announced App Engine last night with 10,000 accounts available. Which are now all taken. Every last one. It makes sense. Guido van Rossum, python's inventor, now works at Google. And python is widely reported to be one of the most important languages in Google's running and management of its network and absolutely enormous arrays of servers. We're just now finishing up a three-year migration of the most important parts of the Jaduka and NetworkIP codebases from C to python. The migration has been challenging (all migrations are) but extremely beneficial. We've realized a 10-to-1 compression in lines of code, far more flexible database access and, most importantly of all, dramatically faster implementation of every realtime telephony service we've put our hands on since the switch. But full disclosure: our use (and love) of python extends only to our core realtime systems code. We manage a complex network of telephony switches, routers, and application, database and statistical-process-control servers all in python. But our web applications? They're all in PHP. It takes an array of languages to make the world go 'round. And we're excited about the announcement of Google's new App Engine. Check it out.

Wii Head-Tracking
Friday, March 28. 2008
By Trevor Baca, VP Software Engineering
So much of games programming is already so virtuosic that it's frequently hard to be impressed when the next new thing rolls around. But this YouTube video on Wii head-tracking is a dramatic exception to that rule. Aaron over on our business apps team brought the video to my attention. It features Johnny Chung Lee of Carnegie Mellon's HCI program, and it runs about 5 minutes. The part you really wanna see starts about 2:50 into the video, where you get the first-person perspective of what IR head-tracking really allows. Way cool. My friend John and I were talking last night about when and how game design will outgrow the conventions of cinema. That's a big question ... but it certainly seems that technical developments like the ones shown here could help provide an answer.

SXSW 2008 in Austin
Monday, March 10. 2008
By Trevor Baca, VP Software Engineering
Austin at night

At Etech
Tuesday, March 4. 2008
By Trevor Baca, VP Software Engineering
San Diego 2008-03-03.

Surprises About Writing
Thursday, February 21. 2008
By Trevor Baca, VP Software Engineering
1. Writing is older than we thought. Google around and you'll find that Egyptian hieroglyphs are some 5000 years old, while written Chinese dates back to 1500 BCE or even 4500 BCE, depending on the source. But even this earliest of dates isn't early enough. In May of last year, researchers reporting to China's Xinhua news agency documented thousands of characters on cliffs in the northwest part of the country dating back 8000 years. A summary appears in the BBC online here.

2. Unicode reserves plenty of space for dead symbols. Need classical Chinese characters that fell out of use centuries ago? They're in there. How about Sumero-Akkadian cuneiform? Check. This seems crazy, but there's nothing else to do: writing is 100% historical and arbitrary, and so information systems developed to model writing will always have to take historical fall-out into account. Today's characters aren't guaranteed to last. And symbols from centuries past sometimes resurface.

3. The Roman letters used in writing English and other western languages look different on the page from Greek and Cyrillic. But all three share a common ancestor in the Phoenician alphabet of 3000 years ago. The real kicker? Written Hebrew and Arabic are related in the same way. The Phoenicians spoke a Semitic language, a close relative of Hebrew and a more distant one of Arabic, and the script they exported on ships across the Mediterranean still shows this relationship today. The three dancing dots floating above Arabic shin are precisely the three points of the trident-shaped character in Hebrew and Cyrillic. All three represent the sh sound, for which we have no single letter in our Roman alphabet.
Image: http://www.flickr.com/photos/patrickjburns/218383188/

4. Internationalizing your apps for Chinese, Japanese and Korean is definitely a pain. But even alphabetic languages can trip you up.
Remember that "IJ" in Dutch and "ch" in Czech count as single letters (digraphs), and traditional Spanish collation treated "ch" and "ll" the same way; all of these alphabetize differently from what your String.sort( ) method might like. The Czech word "chleb" (bread) sorts after both "holub" (pigeon) and "hora" (mountain) in a Czech dictionary, because "ch" is its own letter between "h" and "i"; a naive character-by-character sort puts it first. Oh, and if you're a graphical interface designer, don't forget that Hebrew and Arabic will run right-to-left in your apps, even in your drop-downs, text boxes and radio buttons.

5. The internet may not be all Roman forever. The Guardian reported last month that Russian officials may at some point push for the creation of a Cyrillic internet. Similar reports for Chinese circulate from time to time. Whether driven by politics or culture, the result for developers can only mean even more work when dealing with the intricacies of the written word.

Vocal Inflection, part III
Thursday, February 21. 2008
By Trevor Baca, VP Software Engineering
Whereas part I and part II of this series explored vocal inflection in English, and what makes it so hard for machines to get right, this final post in our series on vocal inflection explores tone in a radically different way. Linguists describe Chinese, Vietnamese and a great many other Southeast Asian and West African languages as tonal. This use of "tonal" contrasts with our use of "intonation" and "inflection" in our exploration of text-to-speech. In English we explored the emotive and discourse meanings behind "up" pronounced with a rising tone, "up" pronounced with a falling tone, and "up" pronounced with a compound falling-then-rising tone. Speakers of tone languages, by contrast, use tones to distinguish different words. One Googleable example in (Mandarin) Chinese runs through the four different tones speakers may place on the syllable "ma". The resulting words mean "mother", "hemp", "horse" or "scold". These four words are distinguished purely by tone and otherwise have nothing to do with each other. That's a very different situation from what we find in English. This much you've likely encountered already somewhere. Less widely known, however, are three facts about the tone languages spoken across mostly emerging markets that together include literally billions of people.

First, most of the world's approximately 6000 languages are, in fact, tonal like Chinese rather than nontonal like English. Chinese has an enormous number of speakers, of course. But note that many other tone languages, such as many of those in Africa, have considerably fewer speakers.
Second, while you'll get much better results getting the correct word-tone out of a text-to-speech system for Chinese (because word-tone in Chinese is so well studied and understood by linguists, by everyday users of the language, and by developers), you're still going to run into the same problem with sentence-tone in Chinese that we explored in our last two posts for English. Yes, Chinese has both word-tone and sentence-tone. Try modelling that correctly in a synthesis system.

Third, and this one's astounding: researchers at the University of Edinburgh reported last year in the Proceedings of the National Academy of Sciences that whether a population speaks a tonal language correlates with how common particular varieties (alleles) of two brain-development genes, ASPM and Microcephalin, are in that population. The newer variants of both genes turn out to be rarer where tone languages are spoken. This is especially surprising given the almost complete lack of known associations between genetic variation and high-level language characteristics in humans (at least at this point in time). The Economist covered the findings last year in an article that closes with the equally spectacular fact that the gene variants in question appear to be both "highly beneficial" to the groups involved and very recent in human evolution (arising only some 5,800 and 37,000 years ago, respectively). Tone in languages like Chinese and vocal inflection (or intonation) in languages like English are spectacularly complex phenomena. We'll have an opportunity to come back to these and other findings when we consider the special things that voice gives us as developers, consultants and problem-solvers that written words do not.
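Word-tone, at least, is the tractable part for a synthesis system, because it can live in the lexicon. Here is a minimal Python sketch, assuming a toy lexicon keyed on (syllable, tone number); `LEXICON`, `mark_tone` and `lookup` are hypothetical names, and the diacritic placement follows the usual pinyin convention. It models only word-tone; the sentence-tone layer described above is exactly what it leaves out.

```python
import unicodedata

# Hypothetical mini-lexicon: the tone number is part of the key, because in
# Mandarin the same syllable carrying a different tone is a different word.
LEXICON = {
    ("ma", 1): "mother",
    ("ma", 2): "hemp",
    ("ma", 3): "horse",
    ("ma", 4): "scold",
}

# Combining diacritics conventionally used in pinyin for tones 1-4.
TONE_MARKS = {
    1: "\u0304",  # macron: high level
    2: "\u0301",  # acute: rising
    3: "\u030C",  # caron: dipping
    4: "\u0300",  # grave: falling
}


def mark_tone(syllable: str, tone: int) -> str:
    """Write a numbered pinyin syllable (e.g. "ma", 3) with its tone mark."""
    if tone not in TONE_MARKS:
        return syllable  # neutral tone: no mark
    # Placement rule: mark "a" or "e" if present, the "o" of "ou",
    # otherwise the last vowel.
    for target in ("a", "e", "ou"):
        index = syllable.find(target)
        if index != -1:
            break
    else:
        index = max(i for i, ch in enumerate(syllable) if ch in "aeiouü")
    marked = syllable[: index + 1] + TONE_MARKS[tone] + syllable[index + 1:]
    # NFC composes, e.g., "a" + U+030C into the single character "ǎ".
    return unicodedata.normalize("NFC", marked)


def lookup(syllable: str, tone: int) -> str:
    """Return the gloss for a (syllable, tone) pair, if the lexicon has one."""
    return LEXICON.get((syllable, tone), "<unknown>")


for tone in range(1, 5):
    print(mark_tone("ma", tone), "=", lookup("ma", tone))
```

The loop prints mā = mother, má = hemp, mǎ = horse and mà = scold: one spelling, four keys, four unrelated words.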
Posted by Trevor Baca in Innovation at 10:07 | Comments (0) | Trackbacks (0)
Defined tags for this entry: african languages, chinese, economist, genetics, inflection, intonation, tone, vietnamese
Vocal Inflection, part II
Wednesday, February 20. 2008
By Trevor Baca, VP Software Engineering
In part I of this series we looked at the three most basic tones in English and checked out the performance of "Mike", the text-to-speech (TTS) robot at AT&T Labs. We discovered that English does in fact have tones. And we discovered that tones are hard to get right in text-to-speech. In this post we look at a different example of vocal inflection in English, and we see how tones interact with sentences. Listen to examples #1a and b, below.
Example #1a (falling then rising): "You downloaded the newest vèrsion, dídn't you?"
Example #1b (falling then falling again): "You downloaded the newest vèrsion, dìdn't you?"
(The examples here follow the presentation of local meanings of rising tones in the second edition of Alan Cruttenden's Intonation, which we introduced in the previous post.) Same sentence, two different tone contours. (Accent marks help us see the tones.) Example #1a ends the first clause with a falling tone on "vèrsion" and then follows that with a rising tone on "dídn't you?". This is a very common pattern in spoken English. The pattern seems to convey genuine uncertainty on the part of the speaker: "Umm, you know, I'm not really sure that you downloaded the newest version and in fact I think there's a chance you may be looking at one of the old, out-dated versions. So let me ask you to make sure: you downloaded the newest version, didn't you?" Example #1b starts off exactly the same as example #1a. But example #1b follows up the first clause with yet another falling tone on "dìdn't you" in the second clause. This is different. Example #1b seems to represent a much stronger degree of certainty on the part of the speaker, so much so that the speaker seems to be asking only for confirmation: "You did download the newest version and I'm quite certain about that fact; now confirm it for me so we can move on to more important things."
So the meaning behind the vocal inflection in example #1a is something like "genuine question, uncertainty", whereas the meaning behind the vocal inflection in example #1b is more like "request for confirmation, relative certainty". Exactly the same words. Only one tone differs. But now let's look at what happens when we let this exact same pair of tone contours interact with a different type of sentence.
Example #2a (falling then rising): "You downloaded the newest vèrsion, díd you?"
Example #2b (falling then falling again): *"You downloaded the newest vèrsion, dìd you?" [wrong]
The sentences in examples #2a, b are almost exactly the same as the sentences in examples #1a, b. The only difference is the change from "didn't" to "did" in the so-called "tag question" at the end of the sentence. But notice what happens. Example #2a (falling then rising) is perfectly acceptable. But example #2b (falling then falling again) is not, at least not for a native speaker. If we stop and think about this for a moment, we realize something quite astounding: the acceptability of English tones is somehow conditioned on (very slight) differences in syntax. Take the "not" (the "n't") out of the tag in examples #1a, b and one tone pattern stays valid while the other becomes completely unacceptable. There are a couple of take-aways here. First, tones are by no means the exclusive province of speaker preference. Yes, when we listen to Clinton, Obama and McCain we hear wildly different patterns of vocal inflection (some probably much more interesting than others). But the choices that different speakers make when they select different patterns of vocal inflection are very strongly conditioned by rules that govern the interaction between tone and syntax. Second, these rules that govern the interaction between tone and syntax are largely hidden. Sure, we teach kids and second-language learners to raise their voices at the end of a question.
But our examples here give perfectly valid situations where you do exactly the opposite and lower the voice at the end of a question (to ask for confirmation rather than to exhibit doubt). How do we, as application designers interested in voice, capture these rules? Better yet, how do we as application designers pick different tone patterns for these sentences given that their written forms are exactly the same? Developers hate hidden rules.
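One way to start capturing such a rule is to write it down explicitly. Here is a minimal Python sketch, assuming a hypothetical `choose_tag_tone` helper whose rule is distilled from examples #1a through #2b only (all of which have a positive main clause); it is not a full account of English tag questions. Note the `speaker_is_certain` argument: it is precisely the information the written sentence does not carry.

```python
from enum import Enum


class Tone(Enum):
    FALL = "fall"  # written with a grave accent, e.g. dìdn't
    RISE = "rise"  # written with an acute accent, e.g. dídn't


def tag_is_negative(tag: str) -> bool:
    """Crude check for a negative tag such as "didn't you?" or "did you not?"."""
    words = tag.lower().rstrip("?").split()
    return any(w.endswith("n't") or w == "not" for w in words)


def choose_tag_tone(tag: str, speaker_is_certain: bool) -> Tone:
    """Hypothetical rule for a positive main clause followed by a tag.

    Negative tag ("didn't you?"): either tone is possible; a rise asks a
    genuine question (#1a), a fall asks for confirmation (#1b).
    Positive tag ("did you?"): only the rise is acceptable (#2a vs. #2b).
    """
    if tag_is_negative(tag) and speaker_is_certain:
        return Tone.FALL  # "request for confirmation, relative certainty"
    return Tone.RISE  # "genuine question"; a fall on "did you?" is ruled out


for tag in ("didn't you?", "did you?"):
    for certain in (False, True):
        print(tag, "certain" if certain else "unsure", choose_tag_tone(tag, certain).value)
```

For "didn't you?" the sketch yields a rise when the speaker is unsure and a fall when certain; for "did you?" it yields a rise either way, so the starred example #2b can never be produced.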
Posted by Trevor Baca in Innovation at 10:12 | Comments (0) | Trackbacks (0)
Defined tags for this entry: alan cruttenden, hidden rules, intonation, syntax, text-to-speech, tts, vocal inflection
Vocal Inflection, part I
Tuesday, February 19. 2008
By Trevor Baca, VP Software Engineering
Communications-enabled business processes (CEBP) take many forms. Think of school- and jobsite-closing messages broadcast simultaneously and automatically to many phones at once some morning when there's bad weather and you get the idea. When we at Jaduka collaborate with clients on a new CEBP improvement project, the question of text-to-speech, or TTS, frequently comes up. Not all CEBP improvement projects need TTS. But some can benefit from careful TTS somewhere. Our general advice is to be smart about TTS: make sure you need it and then use it sparingly. And we find that we sometimes have to go back over this point because executives tend to want TTS even when they don't need it. Think Flash webpage intros in the Web 1.0 bubble. This post introduces vocal inflection as part of our continued series on what makes TTS tricky. Vocal inflection (aka intonation, aka tone of voice, aka prosody) is the combination of pitch, loudness, speed, pauses, stops and starts that modulate words up and down during speech. Speakers of all languages make use of vocal inflection, though unevenly. As far as we have data, English makes greater use of vocal inflection than most other languages (ahead of German, for example, with French probably near the bottom of the list of languages that admit a wide array of different inflection patterns), though this may change as better data appear. The array of different inflection patterns in English is enormous. For reasons of space, we'll go over just a single example set here. And, as you're reading, imagine you're a developer responsible for coming up with some sort of algorithm to handle this sort of thing programmatically ... precisely like a good TTS system. First, here are some tones in English. Click on each of these to listen to the sound of the word "up" in each example:
Example #1: "She put it up."
Example #2: "Did she put it up?"
Example #3: "She didn't put it up but down."
It generally comes as no surprise to native speakers that the "up" in example #2 occurs with an up-tone, because we are taught that the voice rises on a question (though this is far from always the case). But it generally does come as a surprise that the "up" in example #1 occurs with a down-tone. And it is astonishing indeed that English makes very regular use of the compound falling-rising tone on the "up" in example #3. Listen to the examples again and hear the falling tone in #1, the rising tone in #2, and the falling-rising tone in #3. These things are easier to hear side-by-side. So thanks to the wonder of the audio editor, we've cut the three different "ups" out and spliced them together here. We've also added accent marks:
Example #4: "ùp ... úp ... ŭp."
These results are surprising because we're not used to thinking about vocal inflection in English as an independent phenomenon. It's just something we kinda do, but that we are expected to do correctly (and that non-native speakers very frequently get wrong). Alan Cruttenden, in his textbook-length treatment of the subject, identifies the need for at least seven so-called "nuclear tones" for the analysis of spoken English, with an even greater number of tones required in certain special cases. So what does this tell us? If we're a text-to-speech robot, the data tell us that we had better be able to figure out which tone to use when. Let's test "Mike", the TTS robot at AT&T Labs that we introduced in an earlier post.
Example #5: Mike says, "She put it up."
Example #6: Mike says, "Did she put it up?"
Example #7: Mike says, "She didn't put it up but down."
And side by side:
Example #8: Mike says, "ùp ... úp ... ŭp."
So how does Mike do? About 1 1/2 out of 3. Mike knows, like native speakers, that the voice should rise in the question in example #6.
But Mike uses exactly this same inflection in the declarative example #5; this isn't wrong (it might express something akin to "cheerfulness"), but it's less likely. Example #7 Mike gets completely wrong: Mike has no contrastive falling-rising tone at all, it would appear, and the substitution of a flat-low tone looks like Mike's programmers simply sidestepping the problem. These are the most basic cases of vocal inflection in English possible, and AT&T Labs starts off with a score of about 50% relative to a native speaker. The results are guaranteed only to get worse as we consider more tones and more sentence types. The conclusion for voice applications developers? Keep TTS at a healthy distance. Your app probably doesn't need it. But if it does, expect TTS to be at best comprehensible. But not idiomatic.
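To see why even these three cases take real work, here is a toy, rule-based tone chooser in Python for the focused word ("up") in examples #1 through #3. The `nuclear_tone` function and its rules are hypothetical illustrations, not how Mike or any real TTS front end works.

```python
import re


def nuclear_tone(sentence: str) -> str:
    """Toy heuristic (hypothetical) for the tone on the focused word."""
    s = sentence.strip().lower()
    # Contrast ("not X but Y"): the first member takes a fall-rise (example #3).
    if re.search(r"\bnot\b|n't\b", s) and re.search(r"\bbut\b", s):
        return "fall-rise"
    # A question mark gets a rise (example #2) ...
    if s.endswith("?"):
        return "rise"
    # ... and a plain statement falls (example #1).
    return "fall"


for sentence in ("She put it up.",
                 "Did she put it up?",
                 "She didn't put it up but down."):
    print(f"{sentence!r}: {nuclear_tone(sentence)}")
```

The toy handles these three sentences, but it is brittle in exactly the ways described above: the question-mark rule gives a wh-question such as "Where did she put it?" a rise where English usually falls, and "not ... but" is only one of many ways a speaker can signal contrast.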
Posted by Trevor Baca in Innovation at 17:02 | Comments (0) | Trackbacks (0)
Defined tags for this entry: intonation, prosody, text-to-speech, tts, vocal inflection, voice application development
"Welcome to the NetworkIP Systems Engineering Office"Tuesday, February 19. 2008
By Trevor Baca, VP Software Engineering
This is the first of a series of posts where we'll explore why it's hard to get computers to sound like humans. "TTS", or "text-to-speech", sometimes sounds like this: "What makes text-to-speech so tricky?" (Click to listen.) The voice is "Mike". Mike is one of the text-to-speech personae available at AT&T Labs' TTS demo site. You can visit AT&T Labs here. We understand Mike. But there's still something not quite right. We'll come back to what, exactly, is off about Mike later in the series. For now we'll leave "Mike" and switch to Rob. Rob's a human, and an engineer in our Austin offices. Some time back Rob was busy setting up our office phones. Rob: "I need to get a new greeting prompt recorded. Here's the transcript I'm gonna send to the prompt shop. Wanna look?" Now, some background information. NetworkIP is the parent company of Jaduka. And our Austin offices house a couple of different technology teams. Systems Engineering is one of those teams. So Rob's transcript reads like this: "Welcome to the NetworkIP Systems Engineering Office. If you know the extension of the party you wish to reach ..." Now listen to what came back, recorded by a real human:
Example #1: "Welcome to the NetworkIP Systems Engineering Office."
What's going on here? Rob shipped the prompt back and asked for a re-record. What came back the second time was much better:
Example #2: "Welcome to the NetworkIP Systems Engineering Office."
What's going on here is that example #1 pauses significantly between "Systems" and "Engineering". Example #2 doesn't. To us it seems obvious that example #1 is bad and example #2 is good. So why did the prompt shop first ship us example #1? The answer has to do with the real world information we use to group together long strings of nouns as we speak. NetworkIP has many different teams. There's the marketing team. There's the quality assurance team. And there's the systems engineering team. This real world fact binds "Systems" and "Engineering" together.
We can use parentheses to show this relationship as (Systems Engineering). When we then consider that the systems engineering team is a part of the corporate structure of NetworkIP as a company, we get (NetworkIP (Systems Engineering)). And when we finally consider that the NetworkIP systems engineering team is sitting in an office somewhere, we get ((NetworkIP (Systems Engineering)) Office).

But Rob's prompt shop had a different mental model during the recording of example #1. Something like ((NetworkIP Systems) (Engineering Office)). This models the engineering office of some business entity named "NetworkIP Systems". Now, there is no business entity named "NetworkIP Systems". But how are the folks at the prompt shop supposed to know this? And who's to blame them for dividing up the text this way?

The conclusion here is that how we divide up phrases when we speak depends on our knowledge of the people, places and things in the real world. Exactly the sort of information that computers tend not to have. This matters if you're a voice applications developer, at least if you want to keep your clients happy. Let's close by thinking again of "Mike" at AT&T Labs. How is Mike (or Mike's creators) supposed to get access to the real world information about companies like NetworkIP and the different teams at NetworkIP like systems engineering? How, in general, are the creators of text-to-speech systems supposed to incorporate the real world information that humans everywhere draw on when we speak? Put yourself in the position of the programmers trying to solve this issue of real world information (or what linguists like to call "pragmatics") and you'll see that whatever it is that causes us to pause in some places but not others is not at all a trivial concept to model. We'll pick up this and a handful of related questions in later posts.

IVR Best Practices -- Ellipsis & Proximity
Monday, February 18. 2008
By Trevor Baca, VP Software Engineering
Telecom junkies use the term "IVR" to refer to the "press 1 to speak to a representative ..." phone systems we all have to deal with when we dial in to talk to tech support or the airlines or our bank. The letters in the acronym stand for interactive voice response. We've spent a lot of time over the years helping clients tweak their IVRs for a better end-user experience. Before we go further, a secret must be imparted. Your users hate your IVR. You spend time and money getting your IVR up and running. But, alas. Your users hate your IVR. (There are good structural reasons why your users hate your IVR, btw; more about that in a later post.) That being the case, are there IVR best practices? There are. And the two such practices we'll explore here go by the labels ellipsis and proximity. For examples of both we'll turn to CapitalOne's credit card customer service line at 1 (800) 955 - 7070. When we dial in, we hear this. Go ahead. Click and listen. We enter our credit card number and then we hear this. Go ahead. Click again. So notice what's not here. Menu items 3, 4, 5 have no "for" at the beginning. Nor do they have a "press" towards the end. "Ellipsis" means leaving stuff out, just like menu items 3, 4, 5. Take a look at the transcript. Not exciting. But a perceptually faster, and even smoother, user experience than it otherwise would be. By getting rid of six words. "Six words? That's only fractions of a second!" I know. But this sort of stuff winds up having a larger perceptual impact than it should. Leafing through the data from conversation analysis, intonation research and other areas that quantify this stuff doesn't really make it any easier to explain what's going on. But it's there. Use ellipsis to your advantage. And make your users hate you less. We borrow "proximity" from the GUI usability folks, who in turn borrow it from the cognitive psych types who study this stuff. We tend to see stuff that's near other stuff as clumping together.
In our case our users can't see anything. They can only listen. But the length of the pauses we build into the IVR combines with the word order of our prompts to yield something similar to the clumping proximity principle from gestalt psychology. Consider that each of the menu options "to pay by phone, press 1", "for recent transactions, press 2", etc., above, could have been written instead as "press 1 to pay by phone", "press 2 for recent transactions", etc. What's the difference? The menu options as they are today put user categories like "recent transactions" first and verbal invitations like "press 2" last. This is a good thing. Reversing the order, with the verbal invitations first and the user categories last, opens up enough of a window for users to forget which key you wanted them to press by the time they're done listening to the full name of the category. (Why this is the case, btw, has to do with the relative lengths of pauses between what are written above as commas within a menu item and semicolons between different items.) So there they are. Ellipsis and proximity. Slight differences in language and language use that speed up the perception of experience and protect us from our own forgetfulness. These first two are freebies. If there's any interest, we can visit more best practices in later posts.
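Both practices are easy to bake into whatever generates your prompts. Here is a minimal Python sketch, assuming a hypothetical `menu_prompt` helper: proximity puts the category first and the key last, ellipsis drops the lead word and "press" once the first items have set up the pattern, and commas and semicolons mark the short and long pauses. The first two options come from the CapitalOne menu quoted above; options 3 to 5 are made-up placeholders.

```python
def menu_prompt(options, full_items=2):
    """Build an IVR menu prompt from (lead, category) pairs.

    Proximity: category first, key last ("for recent transactions, press 2").
    Ellipsis: after `full_items` items, drop the lead word and "press"
    ("account balance, 3"). Commas separate the parts of an item (short
    pause); semicolons separate items (longer pause).
    """
    items = []
    for key, (lead, category) in enumerate(options, start=1):
        if key <= full_items:
            items.append(f"{lead} {category}, press {key}")
        else:
            items.append(f"{category}, {key}")
    prompt = "; ".join(items) + "."
    return prompt[0].upper() + prompt[1:]


OPTIONS = [
    ("to", "pay by phone"),          # from the menu quoted above
    ("for", "recent transactions"),  # from the menu quoted above
    ("for", "account balance"),      # placeholder
    ("for", "lost or stolen cards"), # placeholder
    ("for", "all other questions"),  # placeholder
]

short = menu_prompt(OPTIONS)
full = menu_prompt(OPTIONS, full_items=len(OPTIONS))
print(short)
print("words saved:", len(full.split()) - len(short.split()))
```

With these options the elided prompt reads "To pay by phone, press 1; for recent transactions, press 2; account balance, 3; lost or stolen cards, 4; all other questions, 5." and the word count drops by six, the same saving as in the CapitalOne menu.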
Posted by Trevor Baca in Innovation at 22:29 | Comment (1) | Trackbacks (0)
Defined tags for this entry: best practices, cognitive psychology, ellipsis, gestalt principles, IVRs, phone trees, proximity, telecom, telephony, user experience