Media servers: from 4K to MP4, media server software keeps innovating our industry toward its OTT future.
The following highlights of the past year cover just a few key things to consider when making a buying decision regarding media server software. And if you're looking to host your media service in the cloud, or you want to use a managed service instead of buying the media server software outright, you'll be glad to know that all of the companies mentioned in this Buyers' Guide offer options as flexible as your budget.
If the problems of years past centered on converting one streaming protocol to another--from RTMP, say, to Apple HTTP Live Streaming (HLS) or Smooth Streaming--the new challenge is how to handle frequent changes to the manifest file required by scalability challenges and changing network infrastructure environments.
A manifest file acts as a stated playlist for the end-user's video player to follow from one piece of content to another. The file itself is often text or extensible markup language (XML) for ease of readability by both a human engineer and the media server(s) required to serve up content listed on the manifest file.
In its simplest form, a manifest--also known as a Media Presentation Description (MPD) in MPEG parlance--is similar to a Spotify playlist, where the end user picks which songs she would like to listen to, and in what order.
The manifest file, though, is invisible to the user and contains more than just the songs chosen for playback. Both manifests and MPDs contain information about media segments--the chunks of data that make up Apple HTTP Live Streaming (HLS) or MPEG Dynamic Adaptive Streaming over HTTP (MPEG-DASH)--as well as information necessary to choose between the multiple data rates and resolutions offered to a device as part of the manifest file.
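To make the idea concrete, here is a minimal Python sketch of what a player or server might do with an HLS master playlist: read the text, find each variant stream, and note its data rate and resolution. The playlist contents, URIs, and bitrates below are invented for illustration, and the parser handles only the handful of attributes shown, not the full HLS specification.

```python
import re

# Hypothetical HLS master playlist; URIs and bitrates are invented for illustration.
MASTER_PLAYLIST = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360,CODECS="avc1.4d401e,mp4a.40.2"
low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720,CODECS="avc1.4d401f,mp4a.40.2"
mid/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"
high/index.m3u8
"""

# Matches KEY=value pairs, where the value is either quoted or runs to the next comma.
ATTRIBUTE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_variants(playlist_text):
    """Return one dict per variant stream listed in a master playlist."""
    lines = [line.strip() for line in playlist_text.splitlines() if line.strip()]
    variants = []
    for index, line in enumerate(lines):
        if not line.startswith("#EXT-X-STREAM-INF:"):
            continue
        attribute_text = line.split(":", 1)[1]
        attributes = {
            key: value.strip('"') for key, value in ATTRIBUTE.findall(attribute_text)
        }
        # The URI of a variant is the line that follows its tag.
        uri = lines[index + 1] if index + 1 < len(lines) else None
        variants.append({
            "uri": uri,
            "bandwidth": int(attributes["BANDWIDTH"]),
            "resolution": attributes.get("RESOLUTION"),
            "codecs": attributes.get("CODECS"),
        })
    return variants


variants = parse_variants(MASTER_PLAYLIST)
for variant in variants:
    print(variant["bandwidth"], variant["resolution"], variant["uri"])
```

Each entry points to a separate media playlist that lists the actual segments, which is where the "chunks of data" described above are enumerated.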
In addition, as ad-based over-the-top (OTT) video continues to grow as an industry offering, the need to manipulate manifest files grows. As such, the density of the manifest file grows as content publishers look to customize ad playback, multiple data rates, various screen resolutions, and other pertinent metadata necessary to refine ad-serving options to a highly granular level.
Why are multiple data rates offered, not just for the primary content, but also for the ads served alongside the user-chosen premium content in her individualized playlist?
The primary reason is user feedback: Consumers would rather watch or listen to streamed content at lower bitrates than wait for the content to buffer (or play haltingly) at higher data rates. The ability to adapt a stream to the current realities of network congestion or intermittent delivery is known as adaptive bitrate (ABR) streaming, and it adds a high level of complexity and density to any particular manifest file.
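The trade-off can be sketched in a few lines of Python. This is a simplified selection rule, not the logic of any particular player: it picks the highest data rate that fits within a fraction of the measured throughput, and drops to the lowest rate rather than stalling. The bitrate ladder, the function name, and the 0.8 safety factor are all assumptions chosen for illustration.

```python
def pick_rendition(renditions_bps, measured_throughput_bps, safety_factor=0.8):
    """Pick the highest bitrate that fits within a fraction of measured throughput.

    Falls back to the lowest bitrate so playback continues instead of buffering.
    """
    budget = measured_throughput_bps * safety_factor
    affordable = [rate for rate in renditions_bps if rate <= budget]
    return max(affordable) if affordable else min(renditions_bps)


# Hypothetical bitrate ladder, in bits per second.
LADDER = [800_000, 2_500_000, 6_000_000]

good = pick_rendition(LADDER, 10_000_000)      # plenty of headroom
congested = pick_rendition(LADDER, 3_000_000)  # 2.5 Mbps exceeds the 2.4 Mbps budget
poor = pick_rendition(LADDER, 500_000)         # nothing fits, so use the lowest rate

print(good, congested, poor)
```

Real players typically also weigh buffer level and recent switching history, which is part of why ABR adds so much to the manifest and to player logic.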
From Encode Once to Mix Once
Even with the complexities of manifest density, media servers have grown into the challenge. Most media servers can handle robust manifest files, including the transcoding (codec conversion) or transrating (resolution or data rate downscaling) of a single high-resolution stream. This combination is necessary for proper ABR delivery, especially for live content that leaves the acquisition point as a high-bitrate, high-resolution stream, such as 4K, and needs to be downscaled to 1080p or even 720p for streaming delivery.
The industry slogan for this has been "encode once, stream everywhere" for several years now. But at least one media server company is mixing up that slogan, replacing it with "mix once, play everywhere."
"Manifest manipulation requires bespoke changes to encoders and origin servers as well as manifest modifications for each and every delivery format," says Arjen Wagenaar, CTO of Unified Streaming, adding that these changes typically use client SDKs that are complicated to maintain.
"There is always that nagging uncertainty: Will players, web browsers, and TVs be able to play the stream from the manipulated manifest?" he asks.
The company's alternative, which it calls Unified Remix, stitches content upstream from the origin server, creating a reference MP4 file for the company's origin server (Unified Origin) to use as the source, which is then delivered as a single stream.
This approach to integrating disparate pieces of content, including primary show content and ads, has been around for some time, but only recently has it found its way into media server applications that the general public can buy to stream to a wide variety of devices. In essence, Unified Streaming is manipulating the streams rather than the manifest file so that "the stream functions as though it has a single origin and a single timeline."
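As a toy illustration of what "a single timeline" means, the Python sketch below lays a show and an ad break end to end and computes where each piece starts and stops. It is not Unified Remix's implementation, which the article does not describe at this level; the clip names and durations are hypothetical.

```python
from itertools import accumulate

# Hypothetical clips as (name, duration in seconds), in playback order.
CLIPS = [("show_part_1", 600.0), ("ad_break", 30.0), ("show_part_2", 900.0)]


def build_timeline(clips):
    """Place clips back to back and return (name, start, end) on one shared timeline."""
    durations = [duration for _, duration in clips]
    # Each clip starts where the running total of earlier durations ends.
    starts = [0.0] + list(accumulate(durations))[:-1]
    return [
        (name, start, start + duration)
        for (name, duration), start in zip(clips, starts)
    ]


timeline = build_timeline(CLIPS)
for name, start, end in timeline:
    print(f"{name}: {start:.0f}s to {end:.0f}s")
```

Because the player sees one continuous run of timestamps, it has no need to know that the ad came from a different source than the show.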
Old Formats Are New Again
The move toward MP4 files isn't just limited to one media server company. The ISO Base Media File Format (ISO-BMFF) is the MP4 file container, and this is key to being able to stream fragmented MP4 (fMP4) using byte-range addressing without the need to create hundreds, thousands, or even hundreds of thousands of standalone segments or chunks before streaming commences.
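Byte-range addressing comes down to simple arithmetic on offsets. The Python sketch below builds the HTTP Range header value a player would send to fetch one fragment out of a single fMP4 file. The fragment offsets and lengths are invented, and the function name is an assumption; in practice the index would come from the file itself or from the manifest.

```python
def range_header(offset, length):
    """Build an HTTP Range header value for `length` bytes starting at `offset`.

    HTTP byte ranges are inclusive at both ends, hence the minus one.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return f"bytes={offset}-{offset + length - 1}"


# Hypothetical fragment index: (byte offset, byte length) of each fragment in one file.
FRAGMENTS = [(0, 1200), (1200, 48000), (49200, 51500)]

headers = [range_header(offset, length) for offset, length in FRAGMENTS]
for header in headers:
    print("Range:", header)
```

One file on disk can therefore serve every request, which is what removes the need to cut thousands of standalone segments in advance.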
But what about streaming actual MP4 files, in a way that allows legacy devices to receive streams? It turns out that DDVTECH, makers of the MistServer, have approached legacy device streaming in exactly this way for both video on demand (VOD) and live streaming.
"Back at the beginning of 2016 when we released Live MP4, it had a latency of about 2 keyframe intervals," says Jaron Vietor, CTO of DDVTECH.
What that means, in real-world terms, is that the "live" MP4 stream would lag by twice the keyframe interval. So if the keyframes were set 5 seconds apart, the MistServer stream would be delayed by at least 10 seconds, on top of the actual delay from the encoder itself. In some ways, this was similar to the way that HLS works, in that HLS needs at least two segments--of approximately 2-10 seconds in length, based on Apple's best practice guidance--to download to the end-user's device before "live" playback begins.
What a difference a year makes, though, as Vietor says that the most recent version of MistServer, released in mid-January 2017, has reduced latency down to a static ~1500 milliseconds.
"That's right, a static ~1500 ms, regardless of key frame interval," says Vietor. "You can watch a stream with a key frame every 20 seconds, with only 1.5 seconds latency, without graphical glitches, without stuttering, and in any quality, with no plug-ins or scripts whatsoever."
This 1.5 second latency is, of course, in addition to encoder latency, which a media server cannot control. But the goal of the DDVTECH team is to reduce it further, down to around a static ~200 ms range, moving much closer to real-time delivery through the media server.
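The figures quoted above are easy to compare side by side. The Python sketch below simply restates them as arithmetic: two keyframe intervals for the early-2016 release, a flat 1.5 seconds for the January 2017 release, a 0.2 second goal, and two buffered segments for HLS. The function and constant names are hypothetical, and encoder latency is left out because the article gives no number for it.

```python
STATIC_LATENCY_2017_S = 1.5  # the "~1500 ms" quoted for the January 2017 release
TARGET_LATENCY_S = 0.2       # the "~200 ms" stated goal


def live_mp4_latency_2016(keyframe_interval_s):
    """Server-side delay of about two keyframe intervals, per the early-2016 figure."""
    return 2 * keyframe_interval_s


def hls_startup_delay(segment_duration_s, segments_buffered=2):
    """Delay before playback when a player waits for a number of whole segments."""
    return segment_duration_s * segments_buffered


for interval in (2, 5, 20):
    old = live_mp4_latency_2016(interval)
    print(
        f"keyframe every {interval}s: 2016 = {old}s, "
        f"2017 = {STATIC_LATENCY_2017_S}s, goal = {TARGET_LATENCY_S}s"
    )

for segment in (2, 10):
    print(f"HLS with {segment}s segments: at least {hls_startup_delay(segment)}s")
```

The 20-second keyframe case shows the change most clearly: a delay that once scaled with the keyframe interval no longer depends on it.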
How Now, Latency!
With additional work being done by all media server companies to reduce the latency of their core products, it's no surprise that the Wowza Streaming Engine is also being optimized for a variety of new formats.
We covered a number of Wowza features in the recent "Latency Sucks!" article (go2sm.com/latency), but one of the key takeaways was Wowza's focus on reducing latency in two key ways: WebRTC and WebSockets.
WebSockets are geared toward problems faced when using TCP to deliver streaming via HTTP (using technology such as Apple's HLS or MPEG-DASH). Essentially, WebSockets keep a single TCP connection open between the player and the server so that smaller segment sizes (of 1 second or less) can be delivered to an OTT device without requiring a separate HTTP request and response for each one.
Because a WebSocket connection is full-duplex, the server and end-user player can both freely send messages in either direction without needing to wait for a request from the other side. Internally at the browser, WebSockets use a TCP connection, meaning that they overcome any issues with delivering segments that may be smaller than the TCP windowing size set by the ISP, default browser settings, or even the media server itself.
On the other side of the basic delivery protocol spectrum is UDP, an approach that doesn't require any messaging acknowledgment at all. UDP has traditionally been used by standards-based, real-time streaming protocols such as RTSP. Given the aggressive nature of UDP, many publishers have shunned it in favor of RTMP, which offered low latency streaming across TCP.
The emergence of WebRTC, which we also covered at length in the "Latency Sucks!" article, essentially allows the benefits of UDP--including the ability to traverse firewalls and other potential network obstacles--to be used directly within the browser. According to one industry source, the combination of UDP and WebRTC "thus theoretically can achieve the absolute best latency of any in-browser method, hands down."
Media servers have offered support for closed-captioning solutions, but newer advances in live captioning are beginning to emerge. To understand how live captioning realities differ from traditional timed text--such as SAMI, SMIL, and Timed-Text Markup Language (TTML)--and the need for traditional CEA-608/CEA-708 compliance from acquisition through the media server or transcoder and on to the end-user's player, see "Captioning Live Video" on page 220.
Web broadcasters and CE manufacturers alike face a challenge standardizing on a common timed-text standard, which means that media servers will play a significant role in converting between the various types of timed text for the foreseeable future.
In some ways, this support for CEA-708 is a chicken-and-egg conundrum. Even if the live captions can be acquired, if either the media server or the end-user player doesn't support the standard fully, this important metadata won't make it through the delivery process.
"YouTube and Ustream are the only big players that support [CEA 708] that I've found in the live space," Hurford told me in an interview at Streaming Media West 2016. "Others will say they support it, but they don't actually support the 708 standard, and they offer a different solution which oftentimes isn't a great user experience, [such as] a scrolling transcript that pops up in a separate window."
Far from being relegated to the fate of older streaming media technologies, the growth in media server options continues to be strong. Innovation, offered by newer companies with novel ideas, coupled with enhancements of legacy protocols and formats, which allow live content to scale to television-sized audiences with fixed latency timing, means that 2017 should prove to be a year in which several key problems are solved.
Watch closely for the ratification of updated WebRTC standards by the Internet Engineering Task Force, and expect to see media server companies continue to push hard to limit the amount of delay between over-the-air broadcasts and OTT delivery to streaming devices.
Finally, now that 4K streams have been shown for key sporting events over the past several years, expect to see 4K live streaming become more commonplace. We have to thank media server companies for this too, as their research and development makes it possible for online video platforms to offer up adaptive bitrate streams at a variety of resolutions, from Ultra HD to 1080p to 720p.
By Tim Siglin
Tim Siglin is a streaming industry veteran and longtime contributing editor to Streaming Media magazine.
Comments? Email us at email@example.com, or check the masthead for other ways to contact us.
Title Annotation: buyers' guides
Date: Mar 1, 2017