Transcoding Advances Intensify Debate over Hardware Strategies

Kevin Wirick, VP & GM, video processing, Motorola Mobility

October 20, 2012 – The vendor-driven battle over digital video encoding strategies has taken a new turn with the introduction of new hardware platforms touting massive processing capacity, even as software-based systems continue to post new gains in bitrate and distribution efficiencies.

Motorola Mobility and Imagine Communications are publicizing as-yet-unavailable transcoding systems running on purpose-built ASICs (application-specific integrated circuits) at unprecedented processing rates of 3 gigapixels and 20 gigapixels per second per rack unit, respectively, with prospects for major savings in power and space consumption as well as cost-effective approaches to expanding live multiscreen channel counts into the thousands. Meanwhile, transcoding systems designed to run on generic processors continue to make great strides, not only as a function of ever-greater processing power but as a result of advances in encoding and other software-based techniques.

Software-based systems like Elemental’s, which uses a combination of individual or hybrid CPUs and GPUs (graphics processing units), and Envivio’s, designed for Intel CPUs, have so far dominated the multiscreen streaming environment, prompting some traditional hardware-based encoder suppliers like Harmonic to develop software-based systems. But as multiscreen streaming moves from the over-the-top domain into the premium service provider space, Motorola and Imagine have gambled on developing hardware systems with massive processing capabilities. These systems are meant to consolidate and cost-effectively expand the range of multiscreen streaming options to include all live TV channels, local broadcast as well as nationally based, which can add up to two or three thousand channels in the case of a Tier 1 MSO.

At a moment when most operators haven’t even begun streaming live channels to connected devices and when those that have in most cases are delivering only a handful of channels, there’s general agreement the channel count is going to go up, amid a great deal of uncertainty about how that can be accomplished cost-effectively. With all the devices in play, comprehensive coverage of all types of Apple iOS and Android smartphones and tablets, PCs, Macs, game consoles and smart TVs requires up to 16 encoded profiles per live channel or on-demand file, meaning that handling all requirements for local as well as national programming from a regional headend could require capacity to generate many thousands of profiles at once.
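The scale implied here is simple multiplication, and a rough sketch makes it concrete. The snippet below assumes the worst case of 16 profiles on every channel and uses the two-to-three-thousand-channel lineup cited above for a Tier 1 MSO. The function name is illustrative only, and a real lineup would mix SD and HD channels, many of them with fewer profiles.

```python
def total_profiles(channels, profiles_per_channel=16):
    """Upper bound on simultaneously encoded output streams for a live lineup.

    Assumes every channel gets the same number of profiles, which is a
    simplification made for this example.
    """
    return channels * profiles_per_channel


# Lineup sizes taken from the article's Tier 1 MSO figure.
for channels in (2000, 3000):
    print(channels, "channels ->", total_profiles(channels), "profiles")
# 2000 channels -> 32000 profiles
# 3000 channels -> 48000 profiles
```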

Moreover, there’s a lot of processing required beyond the basic encoding of each stream for each type of device. The transcoder must be able to de-interlace each encoded NTSC file to progressive mode, add IDR (instantaneous decoder refresh) frames to enable SCTE 35-based ad insertion and perform GOP (group of pictures) alignment to ensure smooth transition between fragments sent from adaptive bitrate (ABR) streaming packagers.
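To illustrate what GOP alignment means in practice, here is a minimal sketch that checks whether every profile of a channel places its IDR frames at the same timestamps, and whether ad splice points land on an IDR frame. The data layout, the function names and the two-second GOP are assumptions made for this example; timestamps are expressed in 90 kHz ticks, the clock used for MPEG transport stream presentation timestamps.

```python
def gops_aligned(idr_times_by_profile):
    """True if all profiles carry IDR frames at identical timestamps.

    idr_times_by_profile maps a profile name to a list of IDR timestamps.
    An ABR player can only switch cleanly between profiles at points
    where every profile starts a new GOP.
    """
    positions = [tuple(times) for times in idr_times_by_profile.values()]
    return all(p == positions[0] for p in positions)


def splice_points_on_idr(splice_times, idr_times):
    """True if every splice point (e.g. from an SCTE 35 cue) is an IDR frame."""
    idr = set(idr_times)
    return all(t in idr for t in splice_times)


# Hypothetical example: 2-second GOPs at a 90 kHz clock.
aligned = {
    "720p": [0, 180000, 360000],
    "360p": [0, 180000, 360000],
}
misaligned = {
    "720p": [0, 180000, 360000],
    "360p": [0, 180000, 363000],  # last GOP starts late
}

print(gops_aligned(aligned))      # True
print(gops_aligned(misaligned))   # False
print(splice_points_on_idr([180000], aligned["720p"]))  # True
print(splice_points_on_idr([200000], aligned["720p"]))  # False
```

A transcoder that inserts an extra IDR frame at a splice point has to do so in every profile at once, otherwise the first check above would fail.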

Proprietary hardware system advocates assert that pushing the envelope on hardware density and processing power lowers the amount of space and power consumed for a given volume of transcoding, with savings that far outweigh any cost penalty to be paid for proprietary hardware. Equally, if not more important, the very high processing power of an ASIC purpose-built for encoding enables more efficient compression. No matter how many streams a stack of transcoder modules might deliver, the lower the bitrate per stream for a given level of quality, the more efficient the use of bandwidth, which is the most expensive commodity to cope with in the move to multiscreen services.

In fact, notes Kevin Wirick, vice president and general manager of video processing at Motorola Mobility, “the secret of the GT-3 is the latest video technology and our custom video processing algorithms that allow us to get the best video quality in a very small efficient package. An operator can now process a lot more video at different resolutions and provide a higher resolution for different screen formats using our product than with our competitors.”

For example, he explains, the company’s advanced video processing algorithms can exploit the latest processing capabilities of purpose-built ASICs to do much more motion prediction across multiple frames than was previously possible. “So if an operator can only get one megabit through their cable bandwidth and over Wi-Fi to your iPad, with the GT-3 you can have a higher resolution picture than using other people’s transcoders,” he says.

The ability to process video in a single rack unit at 3 gigapixels per second, more than tripling the highest levels of current-generation hardware-based encoders, translates into capacity to process the equivalent of about 48 1080p/30 HD channels. Input versus output configurations vary, depending on the types of channels on the input side and the number of profiles per channel to be delivered from the box. Motorola is spec’ing the 1RU unit as supporting up to 24 input channels with up to 16 encoded profiles per channel on the output.
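The channel-equivalent figure follows directly from the pixel rate of a 1080p/30 stream. The sketch below reproduces that arithmetic; the function names are illustrative, and the calculation counts only output pixels, ignoring decoding, de-interlacing and any other per-stream overhead a real transcoder would carry.

```python
def pixel_rate(width, height, fps):
    """Pixels per second for one video stream."""
    return width * height * fps


def channel_equivalents(capacity_pixels_per_sec, width=1920, height=1080, fps=30):
    """Whole number of streams of the given format that fit in the capacity."""
    return capacity_pixels_per_sec // pixel_rate(width, height, fps)


print(pixel_rate(1920, 1080, 30))            # 62208000 pixels per second
print(channel_equivalents(3_000_000_000))    # 48  (3 gigapixels per second)
print(channel_equivalents(20_000_000_000))   # 321 (20 gigapixels per second)

# Maximum output streams per 1RU under the stated Motorola spec:
# 24 input channels, each with up to 16 encoded profiles.
print(24 * 16)                               # 384
```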

“Compared to server-based approaches we’re at about ten times more density,” Wirick says. “So we get about ten times more video with the same amount of power as somebody using an Intel-based x86 server would get to do adaptive stream transcoding.”

The GT-3 is slated for general availability in the first quarter. “We have interest from the top tier operators who are doing deployments now and are planning new services coming up over the next year,” he says.

Imagine Communications, which hopes to have its new super high-power transcoding product, dubbed “next:,” available for commercial deployments by the end of the second quarter next year, has been more focused on the hardware aspects than the algorithmic aspects of the platform at this point, acknowledges Chris Gordon, vice president for product and marketing at Imagine. “We’re still working on motion extension and mode decisioning tracking, but that’s not our primary focus,” Gordon says, noting the next: platform benefits from the major encoding advances that have made Imagine’s first-generation product a factor in over half the digital premium channel encoding performed in the U.S.

Along with motion extension, more commonly known as motion estimation, which is to say the predictive encoding processes referenced by Motorola’s Kevin Wirick, mode decisioning is one of the major areas of improvement in encoding efficiency enabled by more advanced processors. It’s a process by which the results of different decision paths are compared to determine what is optimal for a given level of resolution, thereby avoiding overuse of resources.
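The comparison of decision paths described above can be sketched with a Lagrangian rate-distortion cost, J = D + λR, a widely used way for encoders to weigh picture error against bits spent. The article does not say which cost function Imagine or Motorola use, so the cost model, the mode names and the numbers below are assumptions chosen purely for illustration.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    mode: str          # hypothetical coding mode label
    distortion: float  # error versus the source block (lower is better)
    bits: float        # bits needed to code the block in this mode


def choose_mode(candidates, lam):
    """Pick the candidate with the lowest cost J = distortion + lam * bits."""
    return min(candidates, key=lambda c: c.distortion + lam * c.bits)


# Made-up measurements for one block under three coding modes.
candidates = [
    Candidate("skip", 900, 1),
    Candidate("inter_16x16", 400, 40),
    Candidate("intra_4x4", 150, 220),
]

print(choose_mode(candidates, lam=0.5).mode)  # intra_4x4   (bits are cheap)
print(choose_mode(candidates, lam=5).mode)    # inter_16x16
print(choose_mode(candidates, lam=50).mode)   # skip        (bits are expensive)
```

The more candidate modes a processor can evaluate per block in real time, the more often it finds a cheaper way to hit the same quality, which is where added processing power pays off.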

“We’ll continue to tweak our software capabilities,” Gordon says. “But right now our resources are devoted to supporting customer trialing and bringing the product to market on time.”

Imagine’s next: platform will be available in 2RU, 4RU and 10RU iterations. In an apples-to-apples comparison with the 1RU specs of the Motorola GT-3, Gordon notes the 20 gigapixel processing power of next: is the equivalent of 320 HD channels compared to the 48 represented by 3 gigapixels per second. What this means in terms of practical proportions of input channels versus output channels depends, as always, on the number of profiles supported on the output and whether HD or SD channels are in play.

In Imagine’s case there’s no limit on the number of video stream profiles per ABR group, which facilitates adjustments to ongoing changes in multiscreen requirements, Gordon notes. The platform also supports all current profiles, including 1080p60. And, like many transcoding platforms, it comes with support for packaging in multiple ABR streaming modes.

Imagine is able to race ahead with this kind of capacity on next-generation ASICs by virtue of its ability to leverage its accomplishments in software, says Richard Stanfield, the company’s CEO. “What the ASIC can’t do, we do in software,” he says. “We can take all the software from our last generation and create a new product, so our time to market is quick.”

The next: platform promises to open markets beyond North America for Imagine, Stanfield notes. “Our business has traditionally been with large Tier 1 North American MSOs,” he says. “This product takes us to the next level with the rest of the world where there’s a strong demand for linear transcoding in IPTV as well as cable. We’ll be able to price below the current market to capture market share.”

Imagine views IPTV operators’ need for gear to replace aging encoders deployed with initial rollouts six or so years ago as the lowest hanging fruit. Right behind that is the demand for multiscreen streaming support from both IPTV and cable operators here and abroad.

Gordon stresses the flexibility of the new platform when it comes to the type of hardware packaging it’s compatible with and the ways in which built-in storage can be employed. Because most of the firm’s MSO customers have deployed the first-generation platform on HP BladeSystem c7000 enclosures, the next: system will frequently be added as another blade on that chassis.

More generally, availability of the next: system with the Imagine ASICs embedded in PCI cards creates an opportunity to place the transcoding on edge servers operators are deploying to support their own CDN (content delivery network) infrastructures. In such cases, the 1 gigabyte of onboard storage in the 2RU version of the platform could be used to accommodate local time-shifted programming, Gordon suggests.

Advances at Elemental

Support for distributed as well as centralized transcoding architectures, of course, is a major selling point of software-based systems with their ability to leverage low-cost COTS (commercial off-the-shelf) servers. How those purported cost advantages stack up against the forthcoming Motorola and Imagine transcoding machines, given the density and power consumption benefits of the latter, remains to be seen.

But it’s clear the software system providers aren’t sitting still, even when it comes to gaining improvements that could impact MPEG-2 encoding for rapidly increasing volumes of on-demand content. Elemental, for example, which built its MPEG-2 encoding algorithms from the ground up as it has with MPEG-4, VC-1 and the emerging HEVC (High Efficiency Video Coding) standard, believes it can get the MPEG-2 rate to below 10 Mbps and possibly down to 8 Mbps without sacrificing quality. The result for an MSO aggressively expanding its VOD file count could be infrastructure savings approaching $1 billion, says Keith Wymbs, vice president of marketing at Elemental.

Along with encoding know-how, Elemental achieves a high level of performance efficiency on its core Linux-based Elemental Server platform through a unique blend of parallel processing utilizing Intel CPUs and GPUs from NVIDIA or the new Sandy Bridge hybrid CPU/GPU from Intel, resulting in a three to seven times density improvement over CPU-only systems, according to Elemental officials. Rather than processing individual macroblocks within each video frame serially, the technology processes all the macroblocks in a frame concurrently.
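The difference between serial and concurrent macroblock processing can be shown with a toy example. This is not Elemental’s implementation: the per-block task chosen here (sum of absolute differences against a reference frame), the frame data and all names are assumptions for illustration, and Python threads only demonstrate that the blocks can be handled independently; they do not deliver the speedup a GPU would.

```python
from concurrent.futures import ThreadPoolExecutor

MB = 16  # macroblock edge length in pixels


def macroblocks(width, height):
    """Top-left corners of every macroblock, in raster order."""
    return [(x, y) for y in range(0, height, MB) for x in range(0, width, MB)]


def block_sad(cur, ref, origin):
    """Sum of absolute differences between two frames over one macroblock."""
    x0, y0 = origin
    return sum(
        abs(cur[y][x] - ref[y][x])
        for y in range(y0, y0 + MB)
        for x in range(x0, x0 + MB)
    )


def sad_serial(cur, ref, width, height):
    """Process macroblocks one after another."""
    return [block_sad(cur, ref, o) for o in macroblocks(width, height)]


def sad_concurrent(cur, ref, width, height, workers=4):
    """Submit every macroblock at once; results come back in raster order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda o: block_sad(cur, ref, o),
                             macroblocks(width, height)))


# Tiny synthetic frames: the current frame is the reference plus 3 everywhere.
w, h = 64, 32
ref = [[(x + y) % 256 for x in range(w)] for y in range(h)]
cur = [[(x + y + 3) % 256 for x in range(w)] for y in range(h)]

# Both paths must agree, because no block depends on another block's result.
assert sad_serial(cur, ref, w, h) == sad_concurrent(cur, ref, w, h)
print(sad_serial(cur, ref, w, h))  # eight blocks, each 3 * 256 = 768
```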

As previously reported (September 2011, p. 10), Comcast is using Elemental’s on-demand transcoding platform for its Xfinity online and mobile service initiatives. Trading out a previous encoding system, Comcast reduced the physical footprint for its Xfinity servers by 75 percent.

Where HEVC is concerned, while the standard is not slated to be completed until well into next year, Elemental believes it has an edge when it comes to having a product that will be ready for commercial deployment once the standard is finalized. “We’re watching the spec closely and implementing aspects as they stabilize,” Wymbs says, noting its implementations so far have achieved a 40 percent reduction in bitrates compared to H.264 (MPEG-4). “Our customers will be able to implement the code with software upgrades on Elemental technology they deployed a year ago.”

Elemental also is now delivering a new product, Elemental Stream, which offers premium service providers a way to lower costs of high-volume streaming of live and on-demand content over their networks. Stream, which can be deployed at the encoding location or with CDN resources, allows content to be delivered from the transcoder in a single encoded video format for each bitrate profile by applying the specific DRMs and ABR formats to each user’s stream on the fly.

Elemental Stream supports Apple HTTP Live Streaming (HLS), Adobe HTTP Dynamic Streaming (HDS), Microsoft Smooth Streaming and MPEG-DASH and can apply content protection such as Microsoft PlayReady, Verimatrix VCAS and Motorola SecureMedia, Wymbs says, noting additional profiles could be added in response to new developments. The platform also supports SCTE-35 advertising triggers, closed captioning and subtitle conversion and allows international broadcasters and operators to associate a single video with multiple audio tracks.

Envivio Achieves Big Density Gains

Envivio, too, has been racing ahead with its Intel CPU-based system, which recently scored a big win with a still unnamed Tier 1 U.S. MSO for its multiscreen service. The firm’s advances include the 4Caster G4, the latest version of its fully packaged 2RU encoding platform, representing a 6x density improvement over its previous version. That translates to power to transcode into multiple bitrate formats up to 12 HD channels per 2RU chassis, according to Julien Signès, president and CEO of Envivio.

“Envivio 4Caster G4 is the most powerful encoding platform that we have ever offered,” Signès says. “We are providing a broader range of interfaces, the largest number of output formats and the option of high quality or high density configurations.”

The 4Caster G4 platform houses Muse Live, the core Envivio software system supporting multiple codecs, including the capability to encode in HEVC as that standard takes shape, to transcode premium content into profiles for live and on-demand multiscreen services on all types of distribution networks. By virtue of its support for IP, ASI and SD/HD-SDI interfaces along with redundant power supplies and hot-swappable nodes, the new platform can be used for all premium service environments, Signès notes.

Muse also runs on HP BladeSystem c7000 and ProLiant BL460c series servers and supports a wide range of additional features such as picture-in-picture, alternative audio languages, closed captions, DVB-Subtitles and DVB-Teletext. This allows Envivio to support a wide range of distributed architectures and pure OTT plays as a complement to the more centralized 4Caster option.

Envivio’s solution for distributed positioning of the stream fragmentation and DRM packaging process for ABR-based services is the Halo Network Media Processor, which the company recently upgraded to support time-shifted service models, such as catch-up TV, start over and network PVR. Signès says the Halo “TV Anytime” functionalities are in trials with multiple operators in Europe and North America, representing still another sign of how all the on-demand services common to the traditional TV realm are now moving into the multiscreen space.

“The new TV Anytime capabilities available on Halo further enhance the multiscreen user experience by allowing operators to deliver time-shifted TV and customized assets,” Signès says. He notes that a key element now available on Halo is Personalized Index Creation (PIC), a new approach enabling dynamic asset creation in the network, including highlights creation and time-shifted TV assets.

This streamlined solution utilizes bits of content already cached in the network to deliver a unique stream per user, he explains. By leveraging the existing caching infrastructure, PIC does not require expensive storage and processing and opens up possibilities for new personalized service offerings.

It will be interesting to see what impact the massive ASIC-based transcoding solutions from Motorola and Imagine have on service providers’ decision making as they ramp up for all-encompassing next-generation multiscreen services. Whether or not the new platforms trigger a swing back to hardware-based systems will depend a lot on the software system suppliers’ ability to build compelling, market-leading software solutions. But those suppliers will also have to sustain what has been a winning argument about the merits of relying on Moore’s law to generate commodity hardware options that make reliance on proprietary hardware a risky proposition.