Performance Testing Results of Adaptive Media Streaming over HTTP/2

Posted by Lucas Pardue on 26 Jul 2015, last updated 2 Nov 2015

Following a successful public trial involving HTTP/2 we present the results from some comparative performance testing carried out in an internal controlled environment.

HTTP/2 research recap

The main design objective of the HTTP/2 protocol is to improve client-side web page load times. The focus of our research is on finding out whether HTTP/2 is a good fit for media streaming applications. We are particularly interested in measuring server-side performance to understand the benefits and costs of this new protocol in the media streaming context compared with HTTP/1.1.

In order to gather data we ran a public trial over Christmas 2014 and reported the initial findings in our previous blog post. The data set obtained through the public trial was very interesting but despite serving over 25,000 requests, our analysis showed that it was not large or consistent enough to satisfy our research objectives.

Comparative performance testing

To get a large enough set of data we realised we would need to measure server performance under high load conditions, and the only way we could realistically achieve this was by simulating large numbers of concurrent client requests. We therefore decided to create an internal testbed that would allow us to run a series of performance comparison experiments under controlled 'lab' conditions. The diagram at the top shows the general arrangement.

The testbed allows us to more fairly and accurately compare the performance of cleartext and encrypted variants of HTTP/1.1 and HTTP/2. Cleartext HTTP/2 is not well supported by the major web browsers and in our public trial we saw very little traffic of this type.

We followed a very similar methodology to the public trial and developed experiments around the streaming of MPEG-DASH. This time we wanted to measure the performance of the server when under sustained load from a large number of clients. The experiments again made use of the on-demand BBC Testcard stream across the HTTP variants shown below.

HTTP Version	Cleartext	Encrypted
1.1	HTTP/1.1	HTTPS/1.1
2	h2c	h2

Reverse Proxy

Based on our experience of nghttpx in the public trial, we continued to use it as a Reverse Proxy in front of the media origin. nghttpx captures properties of the HTTP protocol interaction in its access logs, with each line representing a request/response pair. We were most interested in the Request Time value, which is the interval from receiving the first byte of the request, through sending the complete response, to writing the log line. We modified nghttpx to report the Request Time in milliseconds with microsecond resolution.

The Request Time gives us a measurement of the relative performance of the server allowing fair comparisons to be made between the four protocol variants. Note that the values cannot be directly compared with those reported in other sources, not least because the testbed provides an "ideal" environment free from issues, such as packet loss and delay, encountered in typical Internet usage.

Test client

In order to perform the tests with a large number of clients we surveyed the HTTP load testing tools available but failed to identify one that satisfied all of our criteria (support for all four protocol variants, support for MPEG-DASH-like behaviour and support for scalable testing with minimal management overhead). In the end we took h2load, part of the nghttp2 project, and customised it to add HTTP/1.1 support, add timing script playback capability and add client ramp-up capability.

We created a timing script format that allows us to run tests with repeatable DASH-like client behaviour, essential for fair comparison of the results. We define the process of capturing timing scripts as training. The training session was run from Google Chrome Canary 64-bit build version 44, loaded with the DASH-IF JavaScript Reference Client version 1.3.0 playing the BBC Testcard stream via the Reverse Proxy over HTTP/2.

The training session lasted for 34 minutes, a realistic length for media streaming. For the Testcard stream this produces a test script containing 1083 requests. In the absence of resource contention or network issues the training client naturally drifted toward the highest available audio and video Representations.
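As an illustration of how a timing script can give repeatable client behaviour, here is a minimal Python sketch. It is not the customised h2load code: the script layout (a list of offset and path pairs), the `replay` function and the request paths are all hypothetical, chosen only to show the idea of issuing each request at its recorded offset from the start of playback.

```python
import time


def replay(script, send_request, clock=time.monotonic, sleep=time.sleep):
    """Issue each request at its recorded offset from the start of playback.

    script: iterable of (offset_seconds, path) pairs, sorted by offset.
    send_request: callable taking a path (hypothetical stand-in for an HTTP client).
    """
    start = clock()
    for offset_s, path in script:
        delay = offset_s - (clock() - start)
        if delay > 0:
            sleep(delay)  # wait until the recorded request time
        send_request(path)


# Hypothetical excerpt of a timing script; real scripts come from a training session.
example_script = [
    (0.0, "/testcard/manifest.mpd"),
    (0.4, "/testcard/audio/init.m4s"),
    (0.5, "/testcard/video/init.m4s"),
    (2.3, "/testcard/video/segment-1.m4s"),
]

# Dry run: print the paths without actually waiting between requests.
replay(example_script, print, sleep=lambda seconds: None)
```

Because every client replays the same offsets, differences in measured Request Time can be attributed to the protocol variant rather than to client behaviour.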

Deployment and execution

The experiments were run on the testbed architecture shown above. h2load and nghttpx (built as part of nghttp2 0.6.7-DEV linked against OpenSSL 1.1.0-dev) both run on the same test server, which has eight physical CPU cores and runs a v3.10 Linux kernel. The CPU supports hardware acceleration via AES-NI, meaning that our tests involving encryption benefit from the NISTZ optimisations in OpenSSL.

Testing during the development phase identified the potential for extremely poor performance if either the Reverse Proxy or the clients are resource constrained. Due to the nature of the deployment the main resource under contention is the CPU, and the worst case in this regard is serving h2, even with AES-NI hardware acceleration. We determined by trial and error an allocation of resources that would allow both the Reverse Proxy and clients to perform well, dedicating a single core to nghttpx and spreading the h2load clients over the remaining seven cores. We settled on 13 client threads per core for a total of 91 concurrent clients. The start time of clients was staggered so as to avoid a "stampeding herd" effect on the Reverse Proxy.
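The client allocation and staggered start can be sketched as follows. The core and thread counts come from the text above; the two-second stagger interval and the `ramp_up_schedule` helper are assumptions made purely for illustration and do not reflect the actual ramp-up settings used in the tests.

```python
CLIENT_CORES = 7          # cores left for h2load after dedicating one to nghttpx
THREADS_PER_CORE = 13     # client threads per core
STAGGER_SECONDS = 2.0     # assumed gap between consecutive client starts


def ramp_up_schedule(cores, threads_per_core, stagger):
    """Return a (core, thread, start_offset_seconds) tuple for every client.

    Clients are started round-robin across cores so load builds up evenly.
    """
    schedule = []
    client_index = 0
    for thread in range(threads_per_core):
        for core in range(cores):
            schedule.append((core, thread, client_index * stagger))
            client_index += 1
    return schedule


schedule = ramp_up_schedule(CLIENT_CORES, THREADS_PER_CORE, STAGGER_SECONDS)
print(f"{len(schedule)} clients, last one starts at {schedule[-1][2]:.0f} s")
```

With these assumed values the last of the 91 clients starts three minutes after the first, which is short relative to the 34 minute script.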

In order to provide a large data set resilient to transient effects and variability, three separate test runs were performed per test variant, totalling 12 test runs. Each run is executed with a fresh instance of nghttpx and h2load.

Analysing the data

The experiments created a lot of raw data (over 2 million log lines!) and in order to help us process this efficiently we developed an in-house tool to condense the data into a format more suitable for analysis.

Due to the way MPEG-DASH works within the ideal network conditions in the testbed, we saw the majority of HTTP requests to be for media segments belonging to two Representations:

Audio – 128 kbit/s AAC-LC, 48 kHz sampling rate, 2 channels, 60 kBytes median content size
Video – 1920 × 1080, 25 Hz frame rate, Interlaced, High 4.0 profile, 8 Mbit/s approximate bit rate, 3.7 MBytes median content size

For clarity, we present the data only for media segments; we have deliberately excluded the MPD and Initialisation Segments from consideration due to their relative statistical insignificance, both in terms of request count and content size. Similarly, we have chosen to analyse only successful requests, defined as requests receiving HTTP Status code with value 200. Fewer than 0.004% of the total requests were unsuccessful.

During the analysis of the data, it became apparent that the Request Time was susceptible to variation, most notably with some large maximum values observed. In order to avoid misrepresentation of results we have concentrated on median values and Interquartile Ranges (IQR).
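To show why median and IQR are more robust than the mean here, the sketch below filters a handful of made-up records to successful (status 200) requests and computes all three statistics. The record layout and the sample values are hypothetical and are not taken from our logs or from our in-house tool; the quartiles use Python's standard `statistics.quantiles` with the "inclusive" method, which is one of several common quartile definitions.

```python
import statistics

# Hypothetical (status, request_time_ms) pairs standing in for parsed log lines.
records = [
    (200, 4.1), (200, 4.6), (200, 4.9), (200, 5.3),
    (200, 6.0), (200, 7.2), (200, 250.0), (404, 0.8),
]


def median_and_iqr(times_ms):
    """Return (median, interquartile range) of a list of request times."""
    q1, median, q3 = statistics.quantiles(times_ms, n=4, method="inclusive")
    return median, q3 - q1


# Keep only successful requests, as in the analysis described above.
successful = [ms for status, ms in records if status == 200]

median, iqr = median_and_iqr(successful)
mean = statistics.mean(successful)
print(f"mean={mean:.1f} ms  median={median:.1f} ms  IQR={iqr:.2f} ms")
```

In this toy sample the single 250 ms value pulls the mean to roughly 40 ms, while the median stays at 5.3 ms.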

Results

The table below presents the results for the four test variants:

Variant	Audio median (ms)	Audio IQR (ms)	Video median (ms)	Video IQR (ms)
HTTP/1.1	4.9	3.7	204.7	119.8
h2c	6.9	5.8	305.2	230.4
HTTPS/1.1	10.3	9.3	535.2	414.2
h2	28.9	30.5	1259.5	421.4

For HTTP/1.1, encryption increases the median Request Time by a factor of 2.1 for Audio and 2.6 for Video. For HTTP/2, encryption increases the median Request Time by a factor of 4.2 for Audio and 4.1 for Video. Similarly, we see the IQR (a measure of the variability) increase. The results for HTTP/2 stand out as being noticeably worse and spurred us to investigate the possible cause.
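The encryption factors can be recomputed directly from the median values in the table. The short sketch below does this; the dictionary layout and the `encryption_factor` helper are just illustrative names.

```python
# Median Request Times in milliseconds, copied from the results table.
medians_ms = {
    "HTTP/1.1": {"audio": 4.9, "video": 204.7},
    "h2c": {"audio": 6.9, "video": 305.2},
    "HTTPS/1.1": {"audio": 10.3, "video": 535.2},
    "h2": {"audio": 28.9, "video": 1259.5},
}


def encryption_factor(cleartext, encrypted, kind):
    """Ratio of the encrypted median to the cleartext median."""
    return medians_ms[encrypted][kind] / medians_ms[cleartext][kind]


for cleartext, encrypted in (("HTTP/1.1", "HTTPS/1.1"), ("h2c", "h2")):
    for kind in ("audio", "video"):
        factor = encryption_factor(cleartext, encrypted, kind)
        print(f"{cleartext} -> {encrypted} ({kind}): x{factor:.1f}")
```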

Additional experimentation

Keen to understand the cause of the HTTP/2 observation, we pursued some further lines of investigation including analysis of the nghttp2 code base and correspondence with the project maintainer. This yielded two possible factors that may have contributed to the poor performance; namely dynamic TLS record sizing and HTTP/2 flow control.

Dynamic TLS record sizing is a page load time optimisation method that works by improving the interaction between TCP and TLS. However, due to the nature of MPEG-DASH this dynamic behaviour could result in poor data throughput. Our analysis revealed that this optimisation was only enabled in nghttpx for HTTP/2 connections. To test if dynamic TLS record sizing was contributing to the poor HTTP/2 performance we modified the code to effectively disable it. Tests run with this change are labelled "h2-R".
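A rough back-of-the-envelope calculation shows why record size could matter for large media segments. The sketch below compares how many TLS records a median video segment needs at the maximum TLS record payload (16,384 bytes) and at a small, roughly packet-sized payload. The 1,300 byte small record size, the 29 byte per-record overhead (a TLS 1.2 AES-GCM cipher suite is assumed) and the treatment of 3.7 MBytes as 3,700,000 bytes are all assumptions for illustration; they are not values taken from nghttpx.

```python
import math

MAX_TLS_RECORD_PAYLOAD = 16_384   # maximum plaintext bytes in one TLS record
SMALL_RECORD_PAYLOAD = 1_300      # assumed "small" record, roughly one TCP segment
PER_RECORD_OVERHEAD = 29          # assumed: 5 header + 8 nonce + 16 tag (TLS 1.2 AES-GCM)
VIDEO_SEGMENT_BYTES = 3_700_000   # approximate median video segment size


def records_needed(body_bytes, payload_per_record):
    """Number of TLS records needed to carry body_bytes."""
    return math.ceil(body_bytes / payload_per_record)


for label, payload in (("full-size", MAX_TLS_RECORD_PAYLOAD), ("small", SMALL_RECORD_PAYLOAD)):
    count = records_needed(VIDEO_SEGMENT_BYTES, payload)
    overhead = count * PER_RECORD_OVERHEAD
    print(f"{label} records: {count} records, {overhead} bytes of framing overhead")
```

Dynamic sizing moves between small and large records during a connection, so real behaviour sits somewhere between these two extremes. Each record also carries its own encryption and authentication work, so a higher record count means more per-record processing on a CPU-bound server.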

HTTP/2 flow control is a scheme designed to ensure that multiplexed HTTP/2 streams sharing the same transport connection do not destructively interfere with each other. Our version of the h2load test client used the default window size (defined in the HTTP/2 specification) and we wondered whether a larger window would improve performance. To test this hypothesis we modified the runtime configuration of h2load to specify the largest value it allows. Tests run with this change are labelled "h2-F" (flow control only) and "h2-RF" (flow control plus dynamic TLS record size).
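To give a feel for the numbers behind this hypothesis, the sketch below estimates how many full flow-control windows are needed to deliver one median video segment. The default initial window (65,535 bytes) and the protocol maximum (2^31 - 1 bytes) are taken from the HTTP/2 specification; the largest value h2load itself accepts may be lower, so using the protocol maximum here is an assumption, as is treating 3.7 MBytes as 3,700,000 bytes.

```python
import math

DEFAULT_WINDOW_BYTES = 65_535       # HTTP/2 default initial flow-control window
MAX_WINDOW_BYTES = 2**31 - 1        # largest window the protocol permits
VIDEO_SEGMENT_BYTES = 3_700_000     # approximate median video segment size


def windows_to_send(body_bytes, window_bytes):
    """Number of full windows' worth of credit needed to send body_bytes."""
    return math.ceil(body_bytes / window_bytes)


print("default window:", windows_to_send(VIDEO_SEGMENT_BYTES, DEFAULT_WINDOW_BYTES))
print("maximum window:", windows_to_send(VIDEO_SEGMENT_BYTES, MAX_WINDOW_BYTES))
```

This is an upper bound on how often a sender could stall. In practice a receiver typically replenishes the window with WINDOW_UPDATE frames before it is exhausted, so the sender need not stall at every boundary.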

Additional Results

In our analysis of the additional test runs we observed that dynamic TLS record sizing was the cause of the poor HTTP/2 behaviour, while HTTP/2 flow control did not significantly affect results one way or the other. The table below captures the best HTTP/2 results observed across all tests (where "best" is defined as having the lowest median and IQR values). For the best HTTP/2, encryption increases median Request Time by a factor of 2.1 for Audio and 2.4 for Video.

Variant	Audio median (ms)	Audio IQR (ms)	Video median (ms)	Video IQR (ms)
HTTP/1.1	4.9	3.7	204.7	119.8
h2c-F	6.4	5.3	278.2	190.2
HTTPS/1.1	10.3	9.3	535.2	414.2
h2-RF	13.2	15.6	671.0	764.7

The final chart presents all of the results as cumulative distributions. There are two main groups, relating to the Audio and Video Representations. The poorest performers are the right-most curves within their group. The gradient represents the performance profile.
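For readers unfamiliar with cumulative distributions, the sketch below builds an empirical cumulative distribution function (ECDF) from a list of request times. The `ecdf` helper and the sample values are illustrative only and are not the code or data used to draw the chart.

```python
from bisect import bisect_right


def ecdf(samples):
    """Return a function giving the fraction of samples less than or equal to x."""
    ordered = sorted(samples)
    count = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / count

    return cdf


# Hypothetical request times in milliseconds.
request_times_ms = [4.1, 4.6, 4.9, 5.3, 6.0, 7.2, 250.0]
cdf = ecdf(request_times_ms)
for threshold in (5.0, 10.0, 300.0):
    print(f"fraction of requests completed within {threshold} ms: {cdf(threshold):.2f}")
```

A curve that sits further to the right reaches a given fraction at a larger Request Time, which is why the right-most curves correspond to the poorest performers.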

The following chart presents all of the results in the form of box and whisker plots. The dotted lines indicate the fence between acceptable outliers and statistical anomalies.
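Fences in box and whisker plots are conventionally derived from the quartiles. The sketch below uses the common Tukey rule of placing fences at a multiple k of the IQR beyond the first and third quartiles. The value k = 1.5 is an assumption (3.0 is also widely used for "far out" values), and the sample data is hypothetical; the chart may have used a different rule.

```python
import statistics


def tukey_fences(samples, k=1.5):
    """Return (lower_fence, upper_fence) at k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical request times in milliseconds.
request_times_ms = [4.1, 4.6, 4.9, 5.3, 6.0, 7.2, 250.0]
lower, upper = tukey_fences(request_times_ms)
beyond = [t for t in request_times_ms if t < lower or t > upper]
print(f"fences: {lower:.3f} ms to {upper:.3f} ms, values beyond fences: {beyond}")
```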

Conclusion

In a series of controlled experiments we showed that for MPEG-DASH media segments:

Encrypted HTTP versions perform worse than their cleartext equivalents.
HTTP/2 performs worse than HTTP/1.1 for both cleartext and encrypted traffic.
HTTP/2 and encryption increase the variability of server request processing times.

Contact

For more information please contact us at the address http2@rd.bbc.co.uk

This post is part of the Broadcast and Connected Systems section
