ACR, New Techs Vie to Meet Sync Challenge in Device ITV

John Gilles, EVP, sales & marketing, Coincident

January 27, 2012 – Interactive TV is roaring into commercial reality as a growing number of content suppliers, often working independently of service providers, pursue a mind-boggling array of technology options aimed at synching up connected device apps and TV programs.
“Synchronization is huge,” says Bill Sheppard, chief digital media officer for Oracle’s Java Development Group. “Just the core ability to know what’s being watched and exactly where in the show you are shouldn’t be that hard to get, but it relies on some standards to do that.”

Standards, of course, don’t exist at this point, so it’s a technological free-for-all as startups such as Coincident, Zeebox, Miso, Umami, Zeitera, Yahoo!’s recently purchased IntoNow and others vie with established players like Audible Magic, Civolution, Shazam, Technicolor, Nielsen and Cisco Systems to win sufficient market allegiance to their methods to at least create a de facto standard. The question seems to boil down to whether some kind of time-stamping method allowing a cloud-based system to sync a companion device precisely with a broadcast stream will win the day or whether something more complex such as forensic fingerprinting or watermarking will be needed.

There’s much to recommend a method that would simply utilize the data already in the broadcast stream, Sheppard notes. “Whether it’s over the air or cable or satellite, it’s there in the stream every few milliseconds,” he says. “So clearly the software stack in the box could be making that information available to other devices in the home. Once you know what you’re watching and where you are in the program, you don’t need a lot of other overhead to do all kinds of interesting things with that.”

But there’s the rub. A built-in mode of standardized synching between set-tops and connected devices would deleverage the service provider’s opportunity to share in the potentially lucrative benefits cooked up by content providers, advertisers and app developers, which leaves the latter group the choice of coming up with solutions that work around the middlemen or partnering with them on solutions. So far, the momentum appears to be with the workarounds.

Three bellwethers to what’s taking shape along competing tech tracks are Coincident, Audible Magic and U.K.-based Zeebox, which is set to bring its companion device platform to the U.S. on the heels of announcing a big investment and partnership deal with News Corp.’s BSkyB. In one way or another each company is working with major content players to make it possible for viewers to socialize and access a wealth of information flowing in real time with what they’re watching on TV.


Successes scored by two-year-old Coincident are particularly impressive, although, so far, they are confined to interactive experiences tied to TV content when it’s available online rather than with live broadcasts. But, as John Gilles, Coincident executive vice president of sales and marketing, makes clear, the firm is leveraging its position as a provider of online ITV functions to the likes of NBC, ABC, CBS, Fox and MTV to win support for a new element to its platform, ScreenSync TV, that is meant to address demand for companion device experiences with live TV.

Coincident’s approach could become a standardized way to achieve synchronization and integrate app triggers, insofar as it starts with a simple-to-use, video-optimized coding method the company calls “ITV Markup Language” (ITVML). The XML-based tool “enables video authors to connect any frame of any video to anything on the Internet,” Gilles says.

By selectively identifying key frames they want to use as app triggers for a given segment, producers can give viewers options to get more information about personalities appearing in the video, react to products in the stream such as an actress’s dress, interact with ads, engage in games, network socially or whatever else someone might dream up to enhance user experience. “On the Web, where everything is state based, video is put in a rectangle and held hostage,” Gilles says. “We’re making it possible for video authors to create a variety of interactive opportunities over time by linking frames to different URLs.”
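The idea of linking points in a video to URLs can be sketched in a few lines of Python. This is an illustration only: the trigger list, the example.com URLs and the function name are hypothetical, and the sketch does not reproduce ITVML’s actual XML syntax, which is not described here.

```python
from bisect import bisect_right

# Hypothetical trigger list: (start time in seconds, URL to offer the viewer).
# This is plain Python data, not ITVML syntax.
TRIGGERS = [
    (0.0, "https://example.com/show/cast"),
    (312.5, "https://example.com/show/song"),
    (1450.0, "https://example.com/show/bonus-video"),
]

def active_link(position_s, triggers=TRIGGERS):
    """Return the URL of the most recent trigger at or before position_s."""
    starts = [start for start, _ in triggers]
    i = bisect_right(starts, position_s)  # number of triggers already reached
    return triggers[i - 1][1] if i else None
```

A player would call active_link with the current playback position each time it refreshes the on-screen bar, so the link on offer changes as the video moves past each marked point.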

Typically there might be a dozen or so link triggers tied to an hour’s worth of video, although the number is up to the producers. Gilles, characterizing the use of ITVML and Coincident’s other components as part of the “post, post production” process, says the template is very simple to use, freeing producers to quickly enhance the user experience however they please.

The SuperFan Experience

Showing how this has been done by the producers of the hit Fox show Glee, Gilles points to an unobtrusive bar that appears at the bottom of the screen as a viewer watches the online time-delayed showing of an episode where options exist for people to click to bonus videos offering interviews with actors and directors and behind-the-scenes views of productions. Or, for example, in the case of a song from West Side Story performed by the Glee cast, the viewer can be connected to a purchase option on iTunes or to a clip of the original cast performance of the song, or get an opportunity to purchase a ticket to a current performance of West Side Story.

Results from the use of the Coincident platform with multiple shows from major broadcast networks show what this type of engagement can mean to monetizing content, Gilles notes. “In 2011 we did over 30 million video sessions, most with full episodes of the programs,” he says. “We found the ‘superfan’ engagement option increased the completion rate of full episode viewing by 33 percent with the average viewing time lasting ten minutes longer compared to what you see with shows that don’t have the interactive option.”

Moreover, he notes, the amount of effective video time resulting from the second screen bonus video option for a 46-minute episode was 75 minutes. “We’re seeing millions of additional ad impressions,” he says.

Synching Companion Devices

Now, with use of the ScreenSync component for companion devices, the deeper engagement option can be implemented with programming as it’s broadcast, so long as the receiving set-top or connected TV has a Wi-Fi connection that can communicate with the companion device. “ScreenSync TV is the future for video infrastructure providers, both with the companion app capability and with supplying the cloud-based systems that deliver the ancillary content,” says Coincident co-founder and CEO David Kaiser. Significantly, he notes, the platform allows users to avoid the “often problematic automatic content recognition feature embedded within competitive products in the marketplace.”

That’s because, once the companion device knows what channel the viewer is tuned to, it can communicate with the cloud to pick up the time-based frame cues that relate to the timing of the specific broadcast feed the user is on, which can vary greatly depending on how the viewer is accessing the feed, whether from cable, satellite, broadcast or over the top. “We are currently engaging in talks with potential partners and expect full solutions for consumers powered by ScreenSync TV later this year,” Kaiser says.
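The feed-specific timing described here can be illustrated with a rough Python sketch. It assumes, purely for illustration, that the cloud holds one fixed delay per delivery path; the delay values, cue names and function names are all hypothetical, and nothing here is drawn from ScreenSync TV itself.

```python
# Hypothetical delays, in seconds, of each delivery path relative to the origin feed.
FEED_DELAY_S = {"broadcast": 0.0, "cable": 4.0, "satellite": 6.5, "ott": 30.0}

def program_position(now_s, origin_start_s, feed):
    """Seconds into the program that a viewer on `feed` sees at wall-clock time now_s."""
    return now_s - origin_start_s - FEED_DELAY_S[feed]

def due_cues(cues, now_s, origin_start_s, feed):
    """Names of the cues whose program timestamp has been reached on this feed."""
    pos = program_position(now_s, origin_start_s, feed)
    return [name for t, name in cues if t <= pos]
```

With cues at 10 and 35 seconds into a show, a cable viewer 40 seconds after the origin start would already have both, while an over-the-top viewer running 30 seconds behind would have only the first.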

But while the Coincident method for attaining synchronization overcomes potential complications tied to fingerprinting, watermarking or speech recognition, it requires that specific programmers employ ITVML to set up specific modes of engagement tied to frame sequences in a given broadcast feed. “We have to involve the content supplier,” Gilles says.


Meanwhile, since its public launch in the U.K. last fall, zeebox has taken off on the strength of deals with Channel 4’s Desperate Scousewives program series and, more recently, with BSkyB for a much deeper engagement extending to its full programming lineup. But although involvement of content suppliers and service providers in fashioning interactive options is a priority at zeebox, the company’s platform is designed to support program-synchronized apps for companion devices independently of such participation.

As a result, the company has been able to set up zeebox as a free downloadable app now available to iPhones, iPads and computers with a vast range of enhanced content experiences that sync up with whatever a person is watching on TV regardless of where the programming is coming from. The company’s engineers have even gone so far as to figure out the channel-changing mechanisms of leading brands of connected TVs, allowing viewers to use their iPads and iPhones as remote controls before the relationships zeebox is pursuing with some of these manufacturers are firmed up.

“Part of the magic is in the application’s ability to tell the personal device, via the DLNA-enabled home network, to switch HDMI inputs on the connected TV if required, and then to select the appropriate channel and program seamlessly from within the application,” comments David Mercer of Strategy Analytics in a recent blog. “Specifically [zeebox co-founder and CTO Anthony] Rose claims that the first implementations will be compatible with Samsung, Sony, LG and Panasonic 2011, and some 2010.”

“zeebox has the potential to transform the commercial model of the television industry through its ability to understand viewer decision-making and tastes,” says CEO Ernesto Schmitt, the firm’s other co-founder. “zeebox also creates a direct channel between advertisers and consumers with powerful transactional capabilities.”

zeebox is a complex product because it spans multiple disciplines, Schmitt says. These include broadcast TV ingest, natural language processing, video and audio fingerprinting, social, product and user experience design, TV schedules and metadata, real-time presence and chat service and more.

Synching via Speech Recognition

Natural language processing is the starting point for allowing zeebox to operate without requiring use of manual processes to mark frames as in the case of Coincident or the complex matrix of content marking, communications and infrastructure required with use of fingerprinting and watermarking. Instead, the processors running on company servers “listen” for keywords tied to a vast catalog of names, places, objects and what have you as over-the-air and premium content is broadcast in real time.

When the system’s speech recognition technology identifies a keyword in a given program, the platform generates a “zeetag” over the Web for display via the app UI running on computers, iPads and iPhones of users who are tuned to that program. These tags, appearing in the right-hand column of the UI, will link them to any variety of Web-based material associated with the keyword, such as products from a performer that are available for sale on Amazon, information about a topic from Wikipedia, news stories, etc.
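In spirit, the keyword-to-tag step resembles the following Python sketch, which assumes the speech recognizer has already produced a text transcript. The catalog entries, URLs and function name are hypothetical stand-ins, not zeebox’s actual data or code.

```python
import re

# Hypothetical catalog mapping a keyword or phrase to a related Web link.
CATALOG = {
    "west side story": "https://example.com/wiki/west-side-story",
    "saturday night live": "https://example.com/wiki/saturday-night-live",
}

def tags_for(transcript, catalog=CATALOG):
    """Return (keyword, url) pairs for catalog phrases found as whole words."""
    text = re.sub(r"[^a-z0-9 ]+", " ", transcript.lower())  # drop punctuation
    padded = " " + " ".join(text.split()) + " "             # normalize spacing
    return [(kw, url) for kw, url in catalog.items() if f" {kw} " in padded]
```

Matching on bare keywords with no sense of context is also what leaves this style of tagging open to off-target links.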

At the same time, because users can log in to Facebook or Twitter accounts, the app lets them see what their friends are watching and communicate about shared viewing experiences. It also runs a dynamic graphic showing how the most-watched programs among friends line up in real time.

But the basic app level of tagging, relying on recognition of keywords and Google-like “spidering” to find keyword-related links on the Web, is fairly random compared to what can be done with fingerprinting. Moreover, without some means of automated channel identification such as fingerprinting provides, users must manually identify which channel they’re watching on the UI to enable the app. And the keyword search mechanism can be error prone, producing highly irrelevant and, according to some reviewers, irritating links in the stream of zeetags sent to the UI.

Premium Level Synching

From an advertising perspective there’s huge opportunity in companion device apps tied to live viewing, especially in the way zeebox works, says Rex Harris, media innovations supervisor for the Video Innovations Group at Starcom MediaVest Group’s SMGx research operation. “I think they’re the best example of a second-screen app that I’ve seen,” Harris says. “I think 2012 is going to be an interesting year for them when they come to the States.

“What’s really interesting is they have a real-time feed of tags relevant to the show and what’s happening in the show,” he continues. “What you’ll see on the right hand side of the tablet if you’re watching, say, Saturday Night Live, as people make cameos, they’ll show a tag for that actor’s or actress’s name. Click on it and you get more information.

“As the ads started happening, that would also be tags coming through that stream. You could click on that to get more information, go to our website, that kind of thing.”

But to do this zeebox must go beyond the basic level of service to employ audio and video fingerprinting. Those techniques come into play in relationships with programmers and advertisers where, as Harris puts it, “curated experiences can come from the content providers themselves, so the people who actually produce the show can say this is the kind of link we want to happen.”

zeebox has developed a plug-in architecture it calls “Showtime” that allows broadcasters and advertisers to take over and augment zeebox with their own content. The template includes a Twitter visualization mini-app that can deliver tweets from the cast, videos, information on the soundtrack, a Google Map of the show’s key locations and, in the case of its first announced programming customer, Channel 4’s Desperate Scousewives, a “Scouse Glossary” for viewers who are struggling with some of the slang.

It’s unclear to what extent, if any, Sky, which has taken a 10 percent stake in zeebox, will leverage the more advanced options. The provider will launch the new “augmented TV” service in the first half of this year and says zeebox apps will be integrated with other apps such as Sky Go, which allows users to access programming from mobile phones, and Sky+, the service’s whole-home DVR app. Eventually zeebox-equipped devices will work as remote controls with the Sky set-tops, the companies say.

“Sky took an early position of leadership with companion devices, having recognized the demand from our customers to use second screens to discover, enjoy and interact with their favorite content,” says Emma Lloyd, Sky’s director of emerging products. “The integration of zeebox’s innovative technology will enable us to make the companion device experience even richer and more engaging.”

zeebox officials confirm Harris’s comment that they are headed to the U.S. later this year. “Thanks to Sky’s backing we now also have the resources and expertise to set our sights firmly on international expansion alongside further innovation here in the UK and Ireland,” Schmitt says.


Stateside, the fingerprinting and content tagging service Audible Magic operates as a wholesale provider to services like Miso and YapTV could give zeebox a leg up as well. U.K. press reports have quoted Audible Magic officials as acknowledging they’re in discussions with zeebox.

With years of experience providing fingerprinting as a major tool in the battle against content theft, Audible Magic has been able to get a fast start in automatic content recognition for entertainment and advertising. “We know we can do millions of matches per day based on the [digital rights] compliance work we do with customers like Facebook and over 200 universities,” says Jeff Vinson, vice president of market development at Audible Magic.

When it comes to the crucial synchronization required to pair companion apps with TV viewing, “we’re time accurate and frame accurate,” Vinson says. “We not only know what you’re watching but when. So we can deal with all these developers who have to be very in sync for polling and other things.”

The company has a deal with Rovi whereby it uses that supplier’s massive metadata base to create an environment where app developers can deliver specific information and action options to companion devices directly related to what viewers are watching on their TVs. “We have listening posts on the East Coast of the U.S. listening to 100 different networks, which, in real time, are creating a tiny audio fingerprint and tagging that to the Rovi ID of that particular show,” Vinson explains.

Those fingerprints, consisting of bits of code from a fragment of the audio track, are deposited into the Audible Magic database. The app running on the iPad uses the device’s microphone to monitor the sound and creates a small fingerprint following the same algorithmic instructions that are used with creation of fingerprints at the listening posts.

The user’s device sends the newly created fingerprint over the Wi-Fi connection to the user’s broadband network and from there to the Audible Magic database. When the system comes up with a match of the stored fingerprint with the received fingerprint, the system then knows the Rovi ID of the program and goes out to the Rovi cloud to retrieve the metadata and drive relevant information to the user’s iPad screen.
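The match-and-lookup flow can be pictured with a deliberately simplified Python sketch. The fingerprint here is a toy (one bit per frame boundary, set when audio energy rises) and is not Audible Magic’s algorithm; the program ID, metadata and function names are hypothetical placeholders for the Rovi ID and Rovi metadata mentioned above.

```python
def fingerprint(samples, frame=4):
    """Toy fingerprint: one bit per frame boundary, 1 when frame energy rises."""
    energies = [sum(x * x for x in samples[i:i + frame])
                for i in range(0, len(samples) - frame + 1, frame)]
    return tuple(int(b > a) for a, b in zip(energies, energies[1:]))

def identify(mic_samples, fp_db, metadata):
    """Fingerprint the device's audio, find the program ID, return its metadata."""
    program_id = fp_db.get(fingerprint(mic_samples))
    return metadata.get(program_id) if program_id is not None else None

# "Listening post" side: fingerprint the broadcast audio and store it by program ID.
broadcast = [0, 1, 0, 1, 2, 3, 2, 3, 1, 0, 1, 0, 4, 4, 4, 4]
fp_db = {fingerprint(broadcast): "prog-123"}          # hypothetical ID
metadata = {"prog-123": {"title": "Closing Bell"}}    # hypothetical metadata

# Device side: the microphone hears the same audio at half the volume.
mic = [0.5 * x for x in broadcast]
match = identify(mic, fp_db, metadata)
```

Because both sides run the same fingerprinting routine, the quieter microphone capture still produces the stored fingerprint and resolves to the same program. A production system would also have to tolerate room noise and timing offsets, which this exact-match toy does not attempt.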

Massively Scalable Architecture

Demonstrating the system with viewing of CNBC’s Closing Bell, Vinson shows how the iPad syncs up with the program being viewed and immediately displays information about the presenters along with options to communicate about the show over Facebook and Twitter and to dig deeper into the background of people involved with the show. Just what the apps are is up to Audible Magic’s customers.

“We don’t set categories or otherwise determine what the end user sees,” Vinson says. “Our role is to provide the backend services and license the technology to create the fingerprints and check them against our database.”

This system allows multiple app providers to participate in the infrastructure and therefore contribute to scaling efficiently to a mass audience. This is crucial, Vinson says, because if companion device apps are to catch on, the options for end users must become pervasive. “You need the cooperation of everyone,” he says.

The ability to leverage a single infrastructure across myriad content suppliers, CE manufacturers and app providers, thereby relieving participants of having to deal with complexities on an individual basis, is a big draw for Audible Magic, Vinson notes. Helping drive participation is the fact that at least two chip manufacturers, Broadcom and Trident, are embedding the fingerprinting and other Audible Magic functionalities in their products, he adds.

“This changes the game by making it easier for developers to access the fingerprinting capabilities,” he says. “The TV, for example, can do all the fingerprinting and send them to the device, which can then bring up the information from the cloud.”

Prospects for Liftoff

Will such universally positioned platforms lead to a mode of synchronization that can be reliably and practically deployed on a mass scale? Or, as Oracle’s Sheppard terms these forensic solutions, are they unnecessarily complicated “hacks” that wouldn’t be needed if some more direct way could be found through standardized use of signals embedded in every TV frame to create the synchronization essential for companion apps to thrive?

With the growing number of players moving to the platforms provided by the likes of Coincident, zeebox and Audible Magic, it looks like the market is too impatient to wait for a simpler, universal solution. If consumers like what they see, the solutions appear suited to driving the new interactive marketplace sooner rather than later.