Network World Lab Test: Web Front-End Device Performance
Results
Published 16 January 2006
Test Methodology
Version 5.60. Copyright
2005-2006 by Network Test Inc. Vendors are encouraged to comment on this
document and any other aspect of test methodology. Network Test reserves the
right to change test parameters at any time.
This document describes the procedures used to benchmark the
performance of Web front-end devices. In addition to boosting performance
through conventional L4-L7 load-balancing functions, these devices also help
improve server farm performance with features like layer-7 switching, TCP
connection multiplexing, and HTTP compression.
While an earlier test in
Network World assessed Web front-end devices in terms of features, the
primary focus of this evaluation is performance. Using a variety of Web content
types, we will evaluate devices in terms of:
A key differentiator of products in this area is their layer-7 switching capabilities. In all tests described here, path selection decisions are made on the basis of information embedded in URLs. This approach offers finer-grained control over application performance than switching based on criteria at layer 4 or below.
Some of our tests model traffic loads representing two classes of concurrent users: clients on dial-up connections and clients on DSL/cable connections. Each class is emulated with a representative access rate and latency.
A key assumption is that the Web front-end devices will
handle all offered loads with no failed transactions during the measurement
period. While it may be possible to achieve higher performance with some number
of failed transactions, there is no consensus as to what constitutes
"acceptable loss." In addition to reporting on metrics such as response times
and transaction rates, we also plan to report on failed transactions, if any.
This document is organized as follows. This section
introduces the project. Section 2 describes the test bed. Section 3 describes
test procedures. Section 4 logs changes to this document.
Figure 1 below illustrates the physical configuration of the test bed, and Figure 2 shows the logical configuration.

Figure 1: Web front-end device physical test bed

Figure 2: Web front-end device logical test bed
In the logical configuration, up to 2.35 million clients
running Microsoft Internet Explorer request objects from a single virtual IP
address. Behind that virtual IP, the device under test (DUT) forwards requests
to 16 servers running Microsoft IIS.
The servers are divided into Groups 1 and 2. In all tests, the DUT will be configured to forward requests to Server Group 1 or 2 depending on information in the URL. Section 2.2.3.1 gives the criteria for server group selection.
For tests of transaction rates and response times, the hosts in this diagram represent two classes of users: those on dial-up and those on DSL/cable connections. The section "User Classes" below describes the rate and delay characteristics for these user groups.
While clients may come from any of 2.35 million unique IP
addresses, the actual number of concurrent clients will be much smaller. For
example, if a DUT can handle 10,000 transactions per second, and each client
requests a page that contains 100 objects, the number of concurrent clients at
any given instant would be 100 (10,000 / 100). The actual totals will vary
depending on workload and DUT horsepower.
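The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration of the division described in the text; the function name and the one-page-per-client simplification are assumptions, not part of the test tooling.

```python
def concurrent_clients(transactions_per_second: float, objects_per_page: int) -> float:
    """Rough client count at one instant: transaction rate divided by objects per page.

    Assumes (as the example in the text does) that every client is fetching
    exactly one page and that each object is one transaction.
    """
    if objects_per_page <= 0:
        raise ValueError("objects_per_page must be positive")
    return transactions_per_second / objects_per_page


# The example from the text: 10,000 transactions/s, 100 objects per page.
print(concurrent_clients(10_000, 100))  # 100.0
```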
To prevent ARP cache overload, clients will sit behind
virtual routers emulated on the Avalanche traffic generators. Thus, the DUT
will see only a handful of MAC addresses on its external interface(s).
The device under test should be equipped with at least 2 gigabit Ethernet interfaces, preferably copper (1000Base-T). We can accommodate devices with multiple interfaces on either the external or internal side, or both.
We also can accommodate up to four multimode fiber gigabit
Ethernet interfaces on the DUT in place of copper interfaces.
The device under test should use the following IPv4
addresses on external and internal interfaces:
| Side | IPv4 address | Prefix length | Gateway |
| Internal 1 | 2.1.1.1 | 24 | 2.1.1.1 |
| External 1 | 2.1.2.1 | 24 | 2.1.2.1 |
| Internal additional (optional) | 2.1.1.[102,103,104...] | 24 | 2.1.1.1 |
| External additional (optional) | 2.1.2.[2,3,4...] | 24 | 2.1.2.[2,3,4...] |
The device under test requires static routes to reach the
following networks:
| Network/prefix length | Gateway |
| 3.0.0.0/16, 4.0.0.0/16, ... 11.0.0.0/16, 43.0.0.0/8 | 2.1.2.101 |
| 13.0.0.0/16, 14.0.0.0/16, ... 21.0.0.0/16 | 2.1.2.102 |
| 23.0.0.0/16, 24.0.0.0/16, ... 31.0.0.0/16 | 2.1.2.103 |
| 33.0.0.0/16, 34.0.0.0/16, ... 41.0.0.0/16 | 2.1.2.104 |
| 43.0.0.0/8 | 2.1.2.101 |
Devices under test should be configured to make forwarding decisions based on URL contents.
As shown above in Section 2.1, Figure 2, we divide servers into Groups 1 and 2. The following table lists rules for forwarding to each server group:
| If URL ends with... | ...forward to server group: |
| 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, css, htm, html, js | 1 |
| 0_, 1_, 2_, 3_, 4_, 5_, 6_, 7_, 8_, 9_, gif, jpe, jpg | 2 |
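As an illustration of the forwarding rules above, the following Python sketch maps a URL to a server group by suffix. It is not any vendor's configuration syntax. The function name, the choice to match against the URL path only (ignoring any query string), the case-sensitive comparison, and the `None` result for unmatched URLs are all assumptions made for the example.

```python
from urllib.parse import urlsplit

# Suffixes taken from the rule table; digits and "digit + underscore" are literal endings.
GROUP_1_SUFFIXES = tuple("0123456789") + ("css", "htm", "html", "js")
GROUP_2_SUFFIXES = tuple(f"{d}_" for d in "0123456789") + ("gif", "jpe", "jpg")


def select_server_group(url: str):
    """Return 1 or 2 according to the URL suffix rules, or None if no rule matches."""
    path = urlsplit(url).path  # assumption: rules apply to the path, not the query string
    if path.endswith(GROUP_2_SUFFIXES):
        return 2
    if path.endswith(GROUP_1_SUFFIXES):
        return 1
    return None


print(select_server_group("http://2.1.2.1/index.html"))  # 1
print(select_server_group("http://2.1.2.1/logo.gif"))    # 2
```

The two suffix sets do not overlap (a path ending in `7_` does not end in a digit), so the order of the two checks does not affect the result.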
In the tests described below in section 3.3, we compare performance results with HTTP compression enabled and disabled. We would expect HTTP compression to be beneficial for dial-up and DSL/cable users, and advise against enabling HTTP compression for LAN users.
Some front-end devices support URL rewriting, for example to hide the topology of an internal network or to hide the type of Web objects in use (e.g., .asp pages). If supported, URL rewriting should be DISABLED. Vendors should supply details as to rewrite syntax if not already called out in documentation.
If supported, caching should be DISABLED. This is not a commentary on the utility or desirability of caching as a feature. We regard caching as a vitally important feature, and plan to discuss caching features in a sidebar article accompanying this test. The traffic load used in this test is not large enough, in terms of object size or object count, to represent a meaningful exercise of caching capacity.
TCP multiplexing should be enabled whenever possible. We can test with TCP multiplexing disabled if requested, but only in addition to the with-multiplexing case, and then only if time permits.
If supported, DDOS protection should be ENABLED for all
tests.
The primary test instruments for this project will be the Spirent Avalanche 2500 and Reflector 2500. These appliances generate and analyze layer 4-7 traffic at very high volumes. Each Avalanche/Reflector 2500 pair is capable of sustaining loads of up to 50,000 HTTP transactions per second and up to 2 million concurrent connections.
We run Avalanche version 6.55 on two pairs of Avalanche 2500 and Reflector 2500 appliances.
Avalanche emulates clients using Microsoft Internet Explorer and Reflector emulates servers running Microsoft IIS. Wherever possible, we configure Avalanche and Reflector to use the default networking characteristics of both IE and IIS. For example, IE closes most TCP connections with a RST rather than a FIN, and thus so will Avalanche.
There are some differences between the default Reflector IE and IIS configurations for our tests, including the following:
More information about Avalanche and Reflector is available
here:
http://www.spirentcom.com/analysis/product_line.cfm?pl=32&wt=2
We use an Extreme Summit7i switch to connect the external and internal sides of the test bed, assigning different L2 untagged VLANs to each side. We have verified the Extreme switch does not leak traffic between VLANs, even under line-rate loads.
The Summit switch is equipped with 1000Base-T copper and 1000Base-SX fiber gigabit Ethernet interfaces.
We use an Apcon Intellapatch virtual patch panel to tie various vendors' devices under test into the test bed. The Intellapatch unit allows us to connect multiple vendors' devices to the test bed without the need for recabling.
In tests of transaction rates and response times (Sections 3.3, 3.5 and 3.6), we will configure the Avalanche test instruments to emulate two classes of users: those on dial-up and those on DSL/cable connections[1]. This section describes the rate and delay characteristics for these user groups. Note that the delay we introduce is in addition to the reduced transmission rate.
Dial-up users:
Access speed: 53 kbit/s
Delay (send/receive): 50 milliseconds
DSL/cable users:
Access speed: 1.5 Mbit/s
Delay (send/receive): 15 milliseconds
In tests where we model these user classes, we will do so in a ratio of approximately 1:1, measured in terms of the number of users in each class.
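To give a feel for how far apart the two user classes are, the sketch below estimates a lower bound on the time needed to deliver one object to each class, using the 500-kbyte object from the transaction-rate test as the example. It assumes 1 kbyte = 1,000 bytes, ignores protocol overhead and TCP slow start, and counts the configured delay once; these simplifications, and all names in the code, are assumptions for illustration only.

```python
# Access rate and added delay for each user class, as listed above.
USER_CLASSES = {
    "dial-up": {"rate_bps": 53_000, "delay_s": 0.050},
    "dsl/cable": {"rate_bps": 1_500_000, "delay_s": 0.015},
}


def min_transfer_time(object_bytes: int, rate_bps: float, delay_s: float) -> float:
    """Lower bound on delivery time: serialization time at the access rate plus added delay."""
    return delay_s + (object_bytes * 8) / rate_bps


for name, params in USER_CLASSES.items():
    seconds = min_transfer_time(500_000, params["rate_bps"], params["delay_s"])
    print(f"{name}: at least {seconds:.2f} s for a 500-kbyte object")
```

Under these assumptions the access rate, not the added delay, dominates the delivery time for large objects.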
We segregate users by source IP subnets. The following table lists the source subnets for each user class:
| User class | Source IP subnets |
| Dial | [3,6,9,13,16,19,23,26,29,33,36,39].0.0.0/16 |
| DSL/cable | [4,7,10,14,17,20,24,27,30,34,37,40].0.0.0/16 |
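The subnet-to-class mapping in the table can be expressed with Python's standard `ipaddress` module. This is a minimal sketch for checking which class a given source address falls into; the function and variable names are hypothetical and the code is not part of the Avalanche configuration.

```python
import ipaddress

# First octets from the table; each entry denotes a <octet>.0.0.0/16 subnet.
DIAL_OCTETS = [3, 6, 9, 13, 16, 19, 23, 26, 29, 33, 36, 39]
DSL_CABLE_OCTETS = [4, 7, 10, 14, 17, 20, 24, 27, 30, 34, 37, 40]


def build_networks(first_octets):
    """Expand a list of first octets into /16 network objects."""
    return [ipaddress.ip_network(f"{octet}.0.0.0/16") for octet in first_octets]


CLASS_NETWORKS = {
    "dial": build_networks(DIAL_OCTETS),
    "dsl/cable": build_networks(DSL_CABLE_OCTETS),
}


def user_class(source_ip: str):
    """Return the user class for a source address, or None if it is in neither class."""
    address = ipaddress.ip_address(source_ip)
    for name, networks in CLASS_NETWORKS.items():
        if any(address in network for network in networks):
            return name
    return None


print(user_class("3.0.1.5"))    # dial
print(user_class("4.0.200.1"))  # dsl/cable
```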
This section documents the procedures we use for each test.
For each event, we list the objective, test bed configuration, procedure, and
test metrics.
To determine the maximum number of concurrent client TCP connections
a device can sustain with zero failed connections.
The test bed is configured as shown in Section 2.1 of this document, and devices are configured with IP addresses as given in Section 2.2.2.
We configure the Avalanche (client emulator) to request objects from Reflector (server emulator) with a variation on the "SPI_Open_Conns" test supplied with the test instrument.
Our configuration uses HTTP 1.1 to increase the connection count. Unlike the stock test, our configuration uses the client and server IP addresses given in section 2 of this document.
We enable the "SLB binning" feature of Avalanche. This feature tracks the efficiency of load-balancing by measuring the number of requests sent to each origin server. Since the client request load should be evenly distributed across all servers in this test, variation among transaction counts should be minimal.
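One way to quantify "variation among transaction counts" is the coefficient of variation of the per-server request counts. The sketch below is only an assumption about how such a figure could be computed from exported counts; it is not the formula Avalanche uses, and the sample counts are made up.

```python
from statistics import mean, pstdev


def distribution_spread(requests_per_server):
    """Coefficient of variation (population std dev / mean) of per-server request counts.

    0.0 means a perfectly even distribution; larger values mean more imbalance.
    """
    average = mean(requests_per_server)
    if average == 0:
        raise ValueError("no requests recorded")
    return pstdev(requests_per_server) / average


# Hypothetical counts for illustration only, not measured results.
print(distribution_spread([100] * 16))  # 0.0
print(distribution_spread([90, 110]))   # approximately 0.1
```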
| | Phase 0 | Phase 1 | Phase 2 | Phase 3 |
| Label | Delay | Stair Step | Steady State | Ramp Down |
| Pattern | Flat | Stair | Stair | Flat |
| Time Scale | Default | Default | Default | Default |
| Repetitions | NA | 10 | 1 | NA |
| Height | 0 | 400,000 | 0 | 0 |
| Ramp Time | 0 | 60 | 0 | 0 |
| Steady Time | 28 | 28 | 60 | 28 |
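The load parameters in the table can be turned into a peak load and a total run time with a short calculation. The sketch below assumes that each repetition of a stair step adds "Height" connections over "Ramp Time" seconds and then holds for "Steady Time" seconds, that "NA" repetitions count as one, and that times are in seconds. These are assumptions about how the parameters combine, not a statement of Avalanche's exact behavior.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    label: str
    repetitions: int  # "NA" in the table is treated as 1 (assumption)
    height: int       # load added per repetition
    ramp_time: int    # seconds spent ramping per repetition
    steady_time: int  # seconds held per repetition


PHASES = [
    Phase("Delay", 1, 0, 0, 28),
    Phase("Stair Step", 10, 400_000, 60, 28),
    Phase("Steady State", 1, 0, 0, 60),
    Phase("Ramp Down", 1, 0, 0, 28),
]


def summarize(phases):
    """Return (peak load, total duration in seconds) under the assumptions above."""
    peak_load = sum(p.repetitions * p.height for p in phases)
    duration = sum(p.repetitions * (p.ramp_time + p.steady_time) for p in phases)
    return peak_load, duration


print(summarize(PHASES))  # (4000000, 996)
```

Under this reading, the stair steps top out at 4 million connections, which matches the stated capacity of two Avalanche/Reflector pairs at 2 million concurrent connections each.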
Maximum concurrent TCP connections with zero failures
To determine the savings in TCP connections between the DUT and servers when handling a large number of client connections
Configuration is similar to the previous section on maximum concurrent client connections. There are two major changes:
| | Phase 0 | Phase 1 | Phase 2 | Phase 3 |
| Label | Delay | Stair Step | Steady State | Ramp Down |
| Pattern | Flat | Stair | Stair | Flat |
| Time Scale | Default | Default | Default | Default |
| Repetitions | NA | 10 | 1 | NA |
| Height | 0 | 10,000 | 0 | 0 |
| Ramp Time | 0 | 60 | 0 | 0 |
| Steady Time | 28 | 28 | 60 | 28 |
Concurrent client-side TCP connections
Concurrent server-side TCP connections
Client-side transaction rate
Failed transactions (if any)
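The first two metrics above can be combined into a single figure for connection savings. The sketch below shows one plausible way to do so; the function names and the sample inputs are hypothetical and are not measured results.

```python
def multiplexing_ratio(client_connections: int, server_connections: int) -> float:
    """Client-side connections carried per server-side connection."""
    if server_connections <= 0:
        raise ValueError("server_connections must be positive")
    return client_connections / server_connections


def connection_savings(client_connections: int, server_connections: int) -> float:
    """Fraction of connections the servers are spared (0.0 means no savings)."""
    if client_connections <= 0:
        raise ValueError("client_connections must be positive")
    return 1 - server_connections / client_connections


# Made-up inputs for illustration only.
print(multiplexing_ratio(100_000, 1_000))  # 100.0
print(f"{connection_savings(100_000, 1_000):.0%}")  # 99%
```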
To determine transaction rates and page and object response times
The test bed is configured as shown in Section 2.1 of this document, and devices are configured with IP addresses as given in Section 2.2.2. Clients and servers use HTTP 1.1.
The Avalanche load spec for this test is "SimUsers" (simulated users). Clients in this test emulate the two user classes described in section 2.5 "User Classes." Again, here are the characteristics for the user classes:
Dial-up users:
Access speed: 53 kbit/s
Delay (send/receive): 50 milliseconds
User think time[2]: 60 seconds
DSL/cable users:
Access speed: 1.5 Mbit/s
Delay (send/receive): 15 milliseconds
User think time: 60 seconds
We model the two user classes in a ratio of 1:1. Note that these ratios refer to the number of active users for each class, not to their rates.
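Because each simulated user pauses for the think time between top-level page requests, the offered page-request rate depends on both the user count and the think time. A common closed-loop approximation is rate = users / (think time + page response time). The sketch below applies that approximation; it is not Avalanche's internal calculation, and the user count and response time shown are hypothetical inputs.

```python
def page_request_rate(active_users: int, think_time_s: float, page_response_time_s: float) -> float:
    """Approximate top-level page requests per second for a closed population of users."""
    cycle_time = think_time_s + page_response_time_s
    if cycle_time <= 0:
        raise ValueError("think time plus response time must be positive")
    return active_users / cycle_time


# Hypothetical example: 1,000 users, 60-second think time, 2.5-second page response.
print(page_request_rate(1_000, 60, 2.5))  # 16.0
```

With a 60-second think time, the response time has little effect on the offered rate unless pages take many seconds to load, which is more likely for the dial-up class.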
We segregate users by source IP subnets. The following table lists the source subnets for each user class:
| User class | Source IP subnets |
| Dial | [3,6,9,13,16,19,23,26,29,33,36,39].0.0.0/16 |
| DSL/cable | [4,7,10,14,17,20,24,27,30,34,37,40].0.0.0/16 |
In separate test runs, clients will request the following content types from servers:
TEST 3.3A: 500-kbyte HTML text object
TEST 3.3B: 5 popular Web sites (Amazon, BBC, UCLA, White House, Yahoo!)
For both content types, we run the tests twice: once with HTTP compression disabled, and again with HTTP compression enabled (for devices that support this feature).
A 500-kbyte text object is relatively large and should be highly compressible. We would expect traffic in the 5-site test to be compressed to a lesser degree, especially given the large percentage of small objects and noncompressible objects such as image files.
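The difference in compressibility can be demonstrated with Python's standard `zlib` module, which implements DEFLATE, the algorithm underlying gzip-style HTTP compression. The payloads below are synthetic stand-ins and are assumptions for illustration: a highly repetitive HTML fragment (which overstates how well real pages compress) and pseudo-random bytes standing in for already-compressed image data.

```python
import random
import zlib


def compression_ratio(payload: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(payload)) / len(payload)


# Roughly 500 kbytes of repetitive HTML-like text (synthetic, not the test object).
html_like = b"<tr><td>row</td><td>value</td></tr>\n" * 14_000

# Pseudo-random bytes approximate the entropy of GIF/JPEG image data.
rng = random.Random(0)
image_like = bytes(rng.getrandbits(8) for _ in range(50_000))

print(f"text:  {compression_ratio(html_like):.3f}")
print(f"image: {compression_ratio(image_like):.3f}")
```

The text payload shrinks to a small fraction of its original size, while the random payload does not shrink at all, which is why a mix of small objects and images benefits less from HTTP compression.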
| | Phase 0 | Phase 1 | Phase 2 | Phase 3 |
| Label | Delay | Stair Step | Steady State | Ramp Down |
| Pattern | Flat | Stair | Stair | Flat |
| Time Scale | Default | Default | Default | Default |
| Repetitions | NA | 1 | 1 | NA |
| Height | 0 | 42 | 0 | 0 |
| Ramp Time | 0 | 300 | 0 | 0 |
| Steady Time | 28 | 0 | 60 | 28 |
We run this test twice: once with
HTTP compression disabled on the DUT and (if supported) again with HTTP
compression enabled.
| | Phase 0 | Phase 1 | Phase 2 | Phase 3 |
| Label | Delay | Stair Step | Steady State | Ramp Down |
| Pattern | Flat | Stair | Stair | Flat |
| Time Scale | Default | Default | Default | Default |
| Repetitions | NA | 1 | 1 | NA |
| Height | 0 | 500 | 0 | 0 |
| Ramp Time | 0 | 300 | 0 | 0 |
| Steady Time | 28 | 0 | 60 | 28 |
We run this test twice: once with
HTTP compression disabled on the DUT and (if supported) again with HTTP
compression enabled.
| | Phase 0 | Phase 1 | Phase 2 | Phase 3 |
| Label | Delay | Stair Step | Steady State | Ramp Down |
| Pattern | Flat | Stair | Stair | Flat |
| Time Scale | Default | Default | Default | Default |
| Repetitions | NA | 1 | 1 | NA |
| Height | 0 | 4200 | 0 | 0 |
| Ramp Time | 0 | 300 | 0 | 0 |
| Steady Time | 28 | 0 | 60 | 28 |
We run this test twice: once with
HTTP compression disabled on the DUT and (if supported) again with HTTP
compression enabled.
Transactions per second
Response times (equivalent to time to last byte)
Failed transactions (if any)
To determine the maximum forwarding rate of the DUT when handling HTTP transactions.
The test bed is configured as shown in Section 2.1 of this document, and devices are configured with IP addresses as given in Section 2.2.2. Clients and servers use HTTP 1.1.
Unlike the previous test, all clients in this event operate at native LAN rates.
Clients will request a 1-Mbyte object from the servers.
The Avalanche load specification will be in SimUsers. We run this test with 100 concurrent SimUsers emulated by the Avalanche units.
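For context, the sketch below estimates the theoretical ceiling on HTTP payload throughput across a single gigabit Ethernet interface, and the corresponding number of 1-Mbyte objects per second. It assumes a 1,500-byte MTU, full-size TCP segments with no TCP options, 1 Mbyte = 1,000,000 bytes, and no HTTP header overhead; these assumptions and all names are illustrative and are not part of the test procedure.

```python
LINE_RATE_BPS = 1_000_000_000  # gigabit Ethernet

# Per full-size segment: 1,460 bytes of TCP payload occupy 1,538 bytes on the wire
# (20 TCP + 20 IP + 14 Ethernet header + 4 FCS + 8 preamble + 12 interframe gap).
TCP_PAYLOAD_BYTES = 1_460
WIRE_BYTES_PER_FRAME = 1_538


def max_goodput_bps(line_rate_bps: float = LINE_RATE_BPS) -> float:
    """Upper bound on TCP payload throughput for one interface."""
    return line_rate_bps * TCP_PAYLOAD_BYTES / WIRE_BYTES_PER_FRAME


def max_objects_per_second(object_bytes: int, goodput_bps: float) -> float:
    """Upper bound on whole objects delivered per second at a given goodput."""
    return goodput_bps / (object_bytes * 8)


goodput = max_goodput_bps()
print(f"{goodput / 1e6:.1f} Mbit/s")                            # 949.3 Mbit/s
print(f"{max_objects_per_second(1_000_000, goodput):.1f} objects/s")  # 118.7 objects/s
```

A DUT with several gigabit interfaces could exceed the single-interface figure, so the bound scales with the number of interfaces in use.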
| | Phase 0 | Phase 1 | Phase 2 | Phase 3 |
| Label | Delay | Stair Step | Steady State | Ramp Down |
| Pattern | Flat | Stair | Stair | Flat |
| Time Scale | Default | Default | Default | Default |
| Repetitions | NA | 1 | 1 | NA |
| Height | 0 | 1,000 | 0 | 0 |
| Ramp Time | 0 | 300 | 0 | 0 |
| Steady Time | 28 | 0 | 60 | 28 |
Maximum forwarding rate
Failed transactions (if any)
To determine the performance cost, if any, of access control lists on the device under test
The test bed configuration is identical to that in Section 3.3B, in the scenario with HTTP compression enabled (if supported). In this test, the device under test also should be configured with 20 access control rules.
All rules should deny access from specific IP addresses and to specific URLs.
The table below lists the access control rules:
| Condition | Action |
| IP src: 101.0.0.0/8 | Deny |
| IP src: 103.0.0.0/8 | Deny |
| ... | Deny |
| IP src: 119.0.0.0/8 | Deny |
| URL: http://www.x.com | Deny |
| URL: http://www.xx.com | Deny |
| ... | Deny |
| | Deny |
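The sketch below shows how rules of this kind might be evaluated for one request. It includes only the rows spelled out in the table; the elided rows are left out rather than guessed. The default-permit behavior, the host-based interpretation of the URL rules, and all names are assumptions for illustration, not any vendor's ACL semantics.

```python
import ipaddress
from urllib.parse import urlsplit

# Only the rules listed explicitly in the table.
DENY_SOURCES = [
    ipaddress.ip_network(prefix)
    for prefix in ("101.0.0.0/8", "103.0.0.0/8", "119.0.0.0/8")
]
DENY_HOSTS = {"www.x.com", "www.xx.com"}  # hosts from the URL rules


def acl_action(source_ip: str, url: str) -> str:
    """Return "deny" if any rule matches the request, otherwise "permit" (assumed default)."""
    address = ipaddress.ip_address(source_ip)
    if any(address in network for network in DENY_SOURCES):
        return "deny"
    if urlsplit(url).hostname in DENY_HOSTS:
        return "deny"
    return "permit"


print(acl_action("101.2.3.4", "http://2.1.2.1/index.html"))  # deny
print(acl_action("3.0.0.1", "http://2.1.2.1/index.html"))    # permit
```

None of the client subnets used in the test fall inside the denied source ranges, so under this reading every legitimate request is checked against all rules before being permitted.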
Transactions per second
Page response times (equivalent to time to last byte)
URL response times (equivalent to time to last byte per object)
Failed transactions (if any)
To determine the performance cost, if any, of DDOS attacks on the device under test
The test bed configuration is identical to that in Section 3.3B, in the scenario with HTTP compression enabled (if supported). In this test, the Avalanche test instrument also will be configured to offer one or more DDOS attacks concurrent with client requests.
On the principle that "attackers don't make appointments," we do not advertise the type or rate of DDOS attacks to be used, other than to say we use attack(s) available in the Avalanche test tool.
Transactions per second
Page response times (equivalent to time to last byte)
URL response times (equivalent to time to last byte per object)
Failed transactions (if any)
Version: 5.60
Date: 19 January 2006
Final version
Header: Noted publication date
Section 3.3.3, step 5: Added description of 12,000-user test
Version: 5.50
Date: 27 November 2005
Executive summary: reword discussion of traffic classes
Section 2.1: Eliminated LAN traffic class (three-class test is now two-class test)
Section 2.3: Cut bullet point d on "override internal timeout calculation." We now use the default setting of no override, which delivers higher performance.
Section 2.5: Reworded to indicate two traffic classes, not three
Section 3.1: Deleted SLB binning metrics
Section 3.2.3: Distinguished between tests with 3-second client think time (test 3.2A) and 60-second think time (test 3.2B)
Section 3.3: Now with two user classes (dial-up and cable/DSL), and with 500-kbyte and 5-site tests
Section 3.5: Now with two user classes (dial-up and cable/DSL), and with 500-kbyte and 5-site tests
Section 3.6: Now with two user classes (dial-up and cable/DSL), and with 500-kbyte and 5-site tests
Version 5.00
Date: 18 November 2005
Section 2.2.2: Added net-43 to static routing table
Section 2.2.3.2: Reworded HTTP compression discussion to reflect newly added tests comparing compression disabled and enabled
Section 2.2.3.3: Header rewriting, if supported, may be disabled
Section 2.2.3.6: Specified that DDOS protection should be enabled for all tests
Section 2.3a-d: Noted differences from default Avalanche and Reflector values
Section 2.5: Revised text about ratios (now based on transaction rates rather than user counts) and LAN rates (now 10 Mbit/s rather than 1 Gbit/s)
Section 3.1.3: Specify that we use a search (not step) algorithm to find maximum concurrent connection count
Sections 3.2.2-3.2.3: Added 3, 30, and 60-second user think times to test
Sections 3.3.2-3.3.3: Replaced 5-site test with 11-kbyte HTML text test and added notation that tests are run with HTTP compression disabled and enabled
Sections 3.5.2-3.5.3: Replaced 5-site test with 11-kbyte HTML text test and added notation that tests are run with HTTP compression enabled
Sections 3.6.2-3.6.3: Replaced 5-site test with 11-kbyte HTML text test and added notation that tests are run with HTTP compression enabled
Version 4.10
Date: 26 October 2005
Global: Changed erroneous "TCP compression" to "HTTP compression"
Section 2.1: Added Apcon virtual patch panel back to Figure 1
Section 2.3: Updated Avalanche version to 6.55
Section 2.4: Reinstated Apcon patch panel description
Section 3.1.2: Described SLB efficiency metrics
Section 3.1.3: Clarified that measurements are taken only during steady-state phase
Section 3.1.4: Added SLB efficiency metrics
Version 4.00
Date: 19 September 2005
Global: Changed all tests to use L7 criteria for forwarding decisions; was L3 and L4 criteria
Executive summary: Added notation about L7 selection
Section 2.1: Revised Figure 2 to show two groups of servers
Section 2.1: Revised text to indicate URL switching for choice of server group
Section 2.2.3: Added section 2.2.3.1 and table showing URL switching criteria
Sections 3.2.4, 3.3.4, 3.4.4, 3.5.4, 3.6.4: Added "failed transactions (if any)" as a metric
Version 3.20
Date: 14 September 2005
Section 2.1: In Figure 2, corrected number of emulated clients (64K on each of 36 client subnets)
Section 2.3: Added notation specifying that Reflector maximum requests per connection is set to 4,294,967,295
Sections 3.1.3 and 3.2.3: Changed Phase 2 duration from 28 to 60 seconds
Sections 3.x.3 (all tests): Changed Phase 3 duration from 60 to 28 seconds
Section 3.4.3, step 1: Clarified that clients request objects one after another
Section 3.5.3 and 3.6.3: Clarified that test repeats 5-site, 100,000-concurrent-SimUser test from Section 3.3
Version 3.10
Date: 8 September 2005
Sections 3.4.2 and 3.4.3: Changed concurrent SimUser count from 1,000 to 100
Version 3.00
Date: 16 August 2005
Title: Deleted publication date
Section 2.1: Deleted Apcon patch panel from Figure 1; changed client subnet prefix lengths in Figure 2 from /12 to /16; reduced client count from 36 million to 2.35 million; deleted reference to loss in "User Classes reference"
Section 2.2.2: Corrected IP addresses for optional additional internal interfaces; in static routes table, changed client subnet prefix lengths from /12 to /16
Section 2.2.3: Clarified configuration information regarding caching and TCP multiplexing
Section 2.3: Updated Avalanche software version
Section 2.4: Deleted references to ClearSight analyzer and Apcon Intellapatch panel
Section 2.5: Deleted references to packet loss (packet loss is now 0); changed client subnet prefix lengths from /12 to /16
Section 3.1.3: Changed object size from 8 kbytes to 1 kbyte; moved 60-second delay from server side to client side; added table with traffic load parameters; clarified that measurement period is 60-second steady-state phase only; changed granularity from nearest 1,000 to nearest 100,000 connections
Section 3.2.3: Added table with traffic load parameters
Section 3.3.2: Eliminated loads involving 20-, 50-, and 100-kbyte text objects (kept 5-site and 500-kbyte text loads)
Section 3.3.3: Added tables with traffic load parameters for 5-site and 500-kbyte text loads
Section 3.4.3: Added table with traffic load parameters
Section 3.5.1: Corrected typo (test goal is measurement under DDOS attack)
Section 3.5.3: Clarified 5-site load (was maximum transaction count, now follows load used in section 3.3)
Version 2.01
Date: 11 April 2005
Section 2.2.3: Clarified that HTTP compression is recommended only for dial-up and DSL/cable users
Section 2.3: Changed Avalanche software version to 6.51
Version 2.0
Date: 6 April 2005
Section 2.1: Added second Reflector and Apcon virtual patch panel to physical test bed diagram
Changed client subnet count to 36 in logical test bed diagram
Changed virtual host count to 36 million
Added language about multiple user classes in some tests
Section 2.2.2: Changed client subnet count to 36 in table
Section 2.2.3: Added language about caching and DDOS protection.
Section 2.4: Noted that Extreme Summit7i has fiber and copper interfaces; added language about Apcon Intellapatch virtual patch panel
Section 2.5: Added section on user classes
Section 3.1: Changed to HTTP 1.1
Section 3.2: Changed objects requested to 5 popular sites' home pages
Section 3.3: Renamed "Transaction Rates and Response Times" to emphasize importance of response-time metrics
Section 3.3.1: Added response time objectives
Section 3.3.3: Changed load spec to Simusers, and added language about controlling transaction rate with user think time
Section 3.3.4: Added metrics for response time with 100 and 1000 Simusers
Added sections 3.5 and 3.6 on ACL and DDOS testing (both time permitting)
Version 1.0
Date: 25 February 2005
Initial public release
[1] Previous versions of this methodology added a third user class at native LAN rates. The problem with this approach was that even a very small amount of LAN traffic overwhelmed the other user classes because of the huge differential in traffic rates.
[2] User think time represents the time a simulated user waits before requesting top-level URLs. For example, in the 5-site test, a user would request the Amazon home page, retrieve all objects from that page, wait 60 seconds, and then request the BBC home page.