Tuning web server for more connections
Recently, I was assigned a task: figure out how to tune a web server so it can handle more connections at once. I know how to install a web server, and I know how to set one up, but I had never actually thought about how to tune one for better performance. This is a really good chance to get to know your web server better, so let's dig in!
When you're tuning any parameter of your service/application, it's important to know what that parameter means. Any tuning is useless unless you know how that parameter changes your service/application. Do this at your own risk, and only change one parameter at a time. Check whether a change makes your service/application unstable; if it does, roll back, and proceed to the next change if everything works fine.
The task I was assigned was to improve performance for an Amazon EC2 instance running Amazon Linux 2, but any Linux distro should work fine, too.
The results in this post came from Ubuntu Server 20.04. Unless specified, these options are going to be written in the sysctl configuration file.
Every program in the system has its own virtual memory, and it's the operating system's job to manage that memory.
We have four parameters to study here; let's take a look at each of them.
This represents the kernel's tendency to swap memory pages out to disk. The lower the value, the less swapping is used and the more memory pages are kept in physical memory.
The default value of vm.swappiness is usually 60. Using swap may make your program less responsive, so if you need your program to be responsive at all times, it's better to lower the value of this parameter.
For MariaDB, it's recommended to set the vm.swappiness value to 1. For a web server, setting this value to 10 should work.
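As a quick sketch (assuming a standard sysctl setup; the file name under /etc/sysctl.d/ is my own choice), you can check the current value and lower it like this:

```shell
# Check the current swappiness value
sysctl vm.swappiness

# Lower it for the running system (needs root)
sysctl -w vm.swappiness=10

# Persist it across reboots (file name is an arbitrary example)
echo 'vm.swappiness = 10' >> /etc/sysctl.d/99-tuning.conf
```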
Before we get to vm.dirty_ratio, we need to talk about what dirty means first.
Dirty memory is data that sits in physical memory but still needs to be written to disk. Because a hard drive's write speed is usually much slower than RAM's, and there are lots of write tasks in the system, it's understandable that the system first writes data into RAM temporarily; then, when there's too much data in memory or the system is idle, it writes that data to the hard drive and releases those resources.
When your program creates too much dirty memory, it may prevent other programs from getting memory, resulting in lower system performance, so it's a good idea to write dirty memory to the drive before it consumes too much RAM.
The difference between the two is that when the vm.dirty_ratio threshold is reached, writeback runs in the foreground (blocking the writing process), while vm.dirty_background_ratio triggers writeback in the background. So the value of vm.dirty_background_ratio should always be lower than vm.dirty_ratio.
An example of these values is vm.dirty_ratio = 75.
You need to consider your machine's specs for these values; whether you have lots of RAM or a RAID card makes a big difference when applying them.
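As an illustrative sketch (the background value here is my own assumption, not from the original example), the pair of settings might look like this in a sysctl configuration file:

```ini
# Start background writeback early; block writers only at a high threshold
vm.dirty_background_ratio = 10
vm.dirty_ratio = 75
```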
This percentage value, vm.vfs_cache_pressure, controls the tendency of the kernel to reclaim the memory being used for caching of directory and inode objects.
If your program is going to open lots of files or access lots of directories, it's a good idea to set this value lower; otherwise, set it higher.
The default value of this option is 100; a higher value makes the kernel reclaim this cache more aggressively, and a lower value makes it less aggressive.
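For a web server touching many files, keeping this cache around longer could help (the value 50 is just an assumed illustration, not a recommendation from the original):

```ini
vm.vfs_cache_pressure = 50
```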
BBR is a TCP congestion control algorithm developed by Google. It won't make your network link itself any faster, but it can make better use of the bandwidth you already have.
For more detailed information about BBR, please refer to TCP BBR congestion control comes to GCP - your Internet just got faster.
In short, BBR can bring higher throughput and lower latency to your system. It uses recent measurements of the network's delivery rate and round-trip time to build a model, which is then used to control how fast it sends data and the maximum amount of data it's willing to allow in the network at any time.
BBR's code has been merged into the Linux kernel since version 4.9, so any recent Linux distro should have the tcp_bbr kernel module.
Use the command below to load the BBR module on your system.
# modprobe tcp_bbr
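Loading the module alone isn't enough; you also need to tell the kernel to use BBR as its congestion control algorithm. A minimal sketch (BBR is usually paired with the fq queueing discipline):

```shell
# Switch the queueing discipline and congestion control (needs root)
sysctl -w net.core.default_qdisc=fq
sysctl -w net.ipv4.tcp_congestion_control=bbr

# Verify that BBR is now active
sysctl net.ipv4.tcp_congestion_control
```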
Connection, socket buffer and other tunables
We are going to talk about these parameters.
This option, net.core.somaxconn, sets the maximum number of queued connections for a listening socket, and it affects your web server's performance a lot. The default value of this option is 128; it was raised to 4096 in kernel 5.4.
If you type sysctl net.ipv4.tcp_rmem, you should see three numbers, like this:
$ sysctl net.ipv4.tcp_rmem
The first number is the minimum TCP buffer size, the second number is the default TCP buffer size, and the last number is the maximum TCP buffer size.
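On a stock Ubuntu 20.04 kernel, the output typically looks something like this (your defaults may differ):

```
net.ipv4.tcp_rmem = 4096 131072 6291456
```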
Why do you need a bigger buffer? There are two kinds of scenarios: one is a really fast network environment, and the other is communicating over a high-latency WAN. Both scenarios can benefit from a larger TCP buffer size.
First, let's refresh our memory on the TCP three-way handshake.
Until the server knows the client got the packets it has already sent, it really can't send any new packets to the client, which makes the network less efficient. By increasing the buffer size, you can keep more data in flight at once, increasing the efficiency of your system.
For this option, you need to calculate the value for your system yourself. The value should be your network speed in bytes per second times your round-trip delay time in seconds (the bandwidth-delay product). For example, for a 1 Gbps network with a 4 ms delay, the maximum value of net.core.rmem_max should be about 500 kilobytes, or 512000. Setting the maximum buffer too big will just result in network congestion.
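The arithmetic can be sketched in shell (the 1 Gbps link and 4 ms RTT are just the example numbers from above):

```shell
# Bandwidth-delay product: link speed (bits/s) / 8 * RTT (s)
LINK_BPS=1000000000   # 1 Gbps
RTT_MS=4              # 4 ms round-trip time

BDP_BYTES=$(( LINK_BPS / 8 * RTT_MS / 1000 ))
echo "BDP is ${BDP_BYTES} bytes"   # 500000 bytes, roughly 500 KB
```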
This option, net.core.optmem_max, affects the memory allocated to the cmsg list maintained by the kernel, which contains "extra" packet information. Increasing this option allows the kernel to allocate more memory as needed for control messages that need to be sent for each connected socket (including IPC sockets/pipes).
The content above is quoted from In Linux, how do I determine optimal value of optmem_max? (a nice answer).
Yeah… we need to talk about the TCP three-way handshake, again.
In a normal TCP three-way handshake, data can only be sent after the client sends the final "ACK" to the server; only then can the server send data back to the client. This is not efficient. Is there any way we can send data earlier?
TCP Fast Open solves this issue. When the client creates a connection to the server for the first time, the server issues a cookie and sends it along with the "SYN-ACK" back to the client, so now the client has the cookie.
Now, when the client creates another connection to the server, it also sends the cookie. The server can identify the client using that cookie and knows this client has connected before, so it can send data right away. Instead of sending just "SYN-ACK", the server sends "SYN-ACK + data", and the client doesn't have to wait for a full round trip before data starts flowing.
The available values for net.ipv4.tcp_fastopen are
1, enabled only on outgoing connections (client)
2, available only on listening sockets (server)
3, enabled on both outgoing and listening sockets
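For a machine that acts as both client and server, you would typically enable both sides; a sketch:

```shell
# Enable TCP Fast Open for outgoing and listening sockets (needs root)
sysctl -w net.ipv4.tcp_fastopen=3
```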
This will enable reuse of TIME-WAIT sockets for new connections when it is safe from the protocol viewpoint. Basically, this means the server can reuse a socket that was already created, has been used, and is now idle. By reusing it, the system doesn't need to create a new socket, resulting in faster connection creation.
The available values of this option are
0, disabled
1, global enable
2, enable for loopback traffic only
This option, net.ipv4.tcp_max_syn_backlog, tells the kernel how many half-open connections can be kept in the connection queue. If you have lots of clients connecting to the server and the server can't handle connections fast enough, putting those connections into a queue instead of refusing them is a good idea.
This option, net.ipv4.tcp_window_scaling, allows the kernel to change the TCP window size. If it is enabled, programs can increase the size of their socket buffers and the window scaling option will be employed. It has two available values: 0 (disabled) and 1 (enabled).
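Putting the connection-related tunables together, a sysctl fragment might look like this (the backlog numbers are illustrative assumptions; size them for your own traffic):

```ini
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_window_scaling = 1
```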
This will affect the maximum number of files your system/user/program can open at once.
This option, fs.file-max, controls the maximum number of files your system can open at the same time. Bigger is usually better.
This file, /etc/security/limits.conf, is read by pam_limits, and it limits how many resources a user can get.
There are two kinds of limits, soft and hard. You can think of soft as the default value and hard as the upper limit.
If I want to change how many files the user nginx can open at the same time, I need to write a line like this into /etc/security/limits.conf:
nginx soft nofile 1048576
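Usually you raise the hard limit as well, so the soft limit can actually be lifted up to it (a sketch using the same illustrative number):

```
nginx soft nofile 1048576
nginx hard nofile 1048576
```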
Systemd will also limit how many files a service can open at the same time.
To change the maximum number of files a service can open, add a LimitNOFILE=<AMOUNT> option to the [Service] block of the unit file.
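For example, a drop-in override for nginx might look like this (the path and the number are illustrative assumptions):

```ini
# /etc/systemd/system/nginx.service.d/limits.conf
[Service]
LimitNOFILE=1048576
```

After adding it, run systemctl daemon-reload and restart the service so the new limit takes effect.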
RAM & swap
If you have enough RAM but your system is still using swap, then besides changing the value of vm.swappiness, you can also use zram to create swap.
Zram is basically swap in RAM. It creates a swap device backed by RAM, and it also supports compression, making for a much faster and more efficient swap.
The commands below are an example of creating a zram swap device.
# modprobe zram
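After loading the module, the remaining steps are roughly like this (a sketch; the 1 GB size and lz4 algorithm are assumptions, so adjust them to your machine):

```shell
# Pick a compression algorithm and a size for the zram device
echo lz4 > /sys/block/zram0/comp_algorithm
echo 1G > /sys/block/zram0/disksize

# Format it as swap and enable it with a higher priority than disk swap
mkswap /dev/zram0
swapon -p 100 /dev/zram0

# Check the result
swapon --show
```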
For my task, I was asked to tune Apache Web Server and nginx, and these are the options I found helpful when tuning for performance.
This option will set the maximum number of child processes that Apache can have.
This option will set the maximum number of threads a child process can handle at the same time.
mpm_event_module was created around the time of Apache 2.0; this module is designed to allow more requests to be served simultaneously by passing off some processing work to the listening threads, freeing up the worker threads to serve new requests.
There are some options for this module:
StartServers: number of child processes created at startup.
MinSpareThreads: minimum number of idle threads available to handle request spikes.
MaxSpareThreads: maximum number of idle threads available to handle request spikes.
MaxRequestWorkers: maximum number of connections that will be processed simultaneously.
MaxConnectionsPerChild: limit on the number of connections that an individual child server will handle during its life.
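Put together, an event MPM configuration might look like this (the numbers are illustrative assumptions; size them to your RAM and CPU):

```apache
<IfModule mpm_event_module>
    StartServers             4
    MinSpareThreads          25
    MaxSpareThreads          75
    ThreadsPerChild          25
    MaxRequestWorkers        400
    MaxConnectionsPerChild   10000
</IfModule>
```

Note that MaxRequestWorkers should be a multiple of ThreadsPerChild, as it is here (400 = 16 × 25).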
This option, worker_rlimit_nofile, sets the maximum number of files an nginx worker can open. Don't set this any larger than the system limit.
This option, worker_connections, sets the maximum number of connections a worker can handle at once.
This option, gzip, enables gzip compression for HTTP responses.
This option, open_file_cache, enables caching of file descriptors, which is very helpful when serving static files.
This option, the fastopen parameter of the listen directive, tells nginx to use TCP Fast Open for connections; remember to enable kernel support for TCP Fast Open first.
This option, the backlog parameter of the listen directive, limits the maximum length of the queue of pending connections. You can set it to the same value as net.core.somaxconn.
This option, reuseport, tells nginx to create an individual listening socket for each worker process, allowing the kernel to distribute incoming connections between the worker processes.
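A sketch of these options in an nginx configuration (the numbers are illustrative assumptions, not tuned recommendations):

```nginx
worker_rlimit_nofile 1048576;

events {
    worker_connections 65535;
}

http {
    gzip on;
    open_file_cache max=10000 inactive=30s;

    server {
        listen 8080 fastopen=256 backlog=4096 reuseport;
        # ... the rest of your site configuration
    }
}
```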
We have seen lots of tunables, and now it’s time to see the numbers!
The testing VMs are running Ubuntu Server 20.04.
Both server and client are equipped with 2 vCPUs and 2 GB RAM.
Tests were done using hey.
Let's start by testing our web servers with 1500 concurrent clients creating 15,000 requests in total.
$ hey -c 1500 -n 15000 http://192.168.133.142:8080
We got some errors with Nginx; hopefully there won't be any errors after tuning.
Using the same testing params as for Nginx.
$ hey -c 1500 -n 15000 http://192.168.133.142:8081
Although Apache takes more time, it completes all the requests. Good job!
I changed the params to 9000 clients and a total of 90,000 requests, 6x the amount of before. Let's see the result.
$ hey -c 9000 -n 90000 http://192.168.133.142:8080
When trying 9000 clients and 90,000 requests, Apache failed lots of requests, so I decreased the load to 6000 clients and 60,000 requests.
$ hey -c 6000 -n 60000 http://192.168.133.142:8081
I think there are still more options/parameters to tune, and the logs from the web servers should be helpful when we want to push the number of connections even higher. But after some simple tweaks, the web servers can perform 4x to 6x better than before; I think it's worth the tuning and studying.