I was interviewed for a podcast last night. The setup: I was in my hotel room, using the woefully over-contended in-room Internet access. The interviewer could only record calls made from his landline phone, so he called me on my SkypeIn number.
The audio quality was OK, but only about that of a typical cellular call. Not ideal for a podcast.
This does, however, provide great fodder for a “Voice 2.0”-ish story. Normally, VoIP uses the UDP protocol for media transmission. If the packet doesn’t get there within 300ms, or whatever, forget it. No point in asking for reliability and re-transmission of lost data. The TCP protocol is used for signalling and other purposes where a reliable, in-sequence connection is required.
Now in this case, when a call is being recorded, the need is rather more subtle. In the real-time portion of the conversation, it just needs to be good enough for us to understand each other to hold a conversation. What we then want is to “fill in the gaps” and allow the person recording to request re-transmission of packets corresponding to gaps and dropouts, but in a way that doesn’t impinge on the bandwidth allocated to the ongoing conversation. So we might send the media over UDP still, but there’s a TCP-like component at the side.
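As a rough illustration of the "fill in the gaps" idea, here is a minimal sketch of how a recorder might work out which packets to ask for again. It assumes each media packet carries an increasing sequence number (as RTP packets do) and ignores sequence-number wraparound; the function names `find_missing` and `to_ranges` are made up for this example and do not come from any real VoIP client.

```python
def find_missing(received_seqs, first_seq, last_seq):
    """Return the sequence numbers in [first_seq, last_seq] that never arrived.

    Assumption: sequence numbers increase by one per packet and do not wrap.
    """
    seen = set(received_seqs)
    return [s for s in range(first_seq, last_seq + 1) if s not in seen]


def to_ranges(missing):
    """Collapse a sorted list of missing numbers into inclusive (start, end) ranges.

    Asking for ranges rather than single packets keeps the side-channel
    request small when a whole burst was lost.
    """
    ranges = []
    for s in missing:
        if ranges and s == ranges[-1][1] + 1:
            # Extends the current run of consecutive losses.
            ranges[-1] = (ranges[-1][0], s)
        else:
            ranges.append((s, s))
    return ranges


# Packets 3, 5, 6 and 8 were dropped during the live call.
gaps = to_ranges(find_missing([1, 2, 4, 7], first_seq=1, last_seq=8))
print(gaps)
```

The live conversation carries on over UDP regardless; only the list of ranges would go over the reliable side channel, at whatever rate the spare bandwidth allows.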
Another alternative is for my VoIP client to record the conversation in wideband audio, and then upload that (possibly after the conversation has finished). It would embed suitable synchronisation data so that the audio from both ends of the conversation can easily be mixed into one stream.
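To make the mixing step concrete, here is a small sketch under some loud assumptions: both recordings are mono, share one sample rate, are held as lists of 16-bit integer samples, and the synchronisation data boils down to a single offset in milliseconds. The helpers `ms_to_samples` and `mix_tracks` are hypothetical names for illustration only.

```python
def ms_to_samples(offset_ms, sample_rate):
    """Convert a time offset in milliseconds to a whole number of samples."""
    return round(offset_ms * sample_rate / 1000)


def mix_tracks(local, remote, remote_offset, limit=32767):
    """Mix two mono tracks into one, starting the remote track
    `remote_offset` samples after the local one.

    Samples are summed and then clipped to the signed 16-bit range.
    """
    if remote_offset < 0:
        raise ValueError("remote_offset must be non-negative")
    length = max(len(local), remote_offset + len(remote))
    mixed = [0] * length
    for i, sample in enumerate(local):
        mixed[i] += sample
    for i, sample in enumerate(remote):
        mixed[i + remote_offset] += sample
    # Clip so that loud overlapping passages do not overflow 16 bits.
    return [max(-limit - 1, min(limit, s)) for s in mixed]


# A 250 ms offset at a 16 kHz wideband sample rate.
offset = ms_to_samples(250, 16000)
combined = mix_tracks([100, 200, 300], [10, 20], remote_offset=2)
```

Keeping each end as its own track until this final step is what lets the offset be adjusted after the call, rather than being baked in by network delay.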
This highlights a case of user needs often being subtle and unexpected. As always, the purpose of the “stupid network” is to enable crazy new things, not connectivity arbitrage. Sometimes, a phone conversation is more than a phone call.
Posted by Martin Geddes at 02:02 PM
Right, this is my contention about knowing when you really care about latency made manifest. The iChat AV/GarageBand combination is good at this - you start a multiway audio conversation in iChat, and hit record in GarageBand and it records a separate track for each speaker, the local one at max quality, the remote ones decoded as heard.
This permits independent equalisation and synch adjustment after finishing.
GarageBand and iChat AV are Mac-only, but the AV chat interoperates with AOL Instant Messenger on the Windows side.
Another example: when using QuickTime Broadcaster to send live AV to a remote server for relay, TCP is a better choice than UDP. A dropped compressed packet cascading across multiple listeners is worse than any induced latency for this essentially one-way transmission (the back channel is commonly text chat).
Yes, I heard about this on one of Adam Curry's DSCs (Charlies?). Somebody (I can't be more helpful than that) recorded one end, the other guy recorded the other, and they stitched them together and it sounded fine.
PS: if he was paying the bill, get him to call your mobile, or just ring the hotel and transfer to your room. Last time I checked, you don't pay for incoming calls on hotel phones.
Posted by: at September 11, 2006 10:24 PM
I went through the same process some time back. Initially I tried to record using Gizmo's record feature, but Gizmo (OS X version) kept crashing.
Then I did this insane thing:
http://www.petrovic.org/blog/2006/07/12/developing-podcast-audio-from-rtp-voip-packets/
which required that I stitch the channels together using Audacity. I had to offset one of the channels with respect to the other by a few hundred ms to reproduce the original synchrony as I remembered it during the call.
Then D. Beckemeyer stepped in and added a record feature to PhoneGnome.
http://www.petrovic.org/blog/2006/07/14/phonegnome-supports-podcasters/
Posted by: at September 12, 2006 01:10 AM