I have a Debian system (kernel 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5) running a server application with 200-1000 simultaneously connected clients.
Communication uses UDP. An average client session consists of anywhere from 5 to hundreds of packets, mainly incoming (client to server). The packet size histogram from iptraf is:
1-73: 36%
74-146: 47%
147-219: 16%
Bandwidth usage ranges from 128 kbit/s to 1 Mbit/s. The packet rate is high (right now, in the evening, it is 113 kbit/s and 105 packets/s).
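As a quick consistency check, the bandwidth and packet rate quoted above imply an average packet size that can be compared with the iptraf histogram. This is only a sketch: the helper name is made up for illustration, it assumes 1 kbit = 1000 bits, and whether iptraf counts headers the same way is not known here.

```python
def avg_packet_bytes(kbit_per_s, packets_per_s):
    """Average bytes per packet, assuming 1 kbit = 1000 bits."""
    return kbit_per_s * 1000 / 8 / packets_per_s


# Evening figures from the post: 113 kbit/s at 105 packets/s.
print(round(avg_packet_bytes(113, 105), 1))  # 134.5
```

About 135 bytes per packet falls in the 74-146 bucket, which is also the largest one in the histogram.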
It does not use many system resources (currently 1.5% CPU); all client requests are processed on the fly.
Occasionally I have noticed a "server did not reply" problem on the client side, but it happens rarely and is very hard to reproduce.
Recently I read some papers about DNS configuration and related high-load problems, and found that after more than a month of server uptime without restarts, netstat -s shows extremely high error counts, e.g.

    Udp:
        4629113825 packets received
        818260 packets to unknown port received.
        10461880 packet receive errors
        5309885 packets sent
        RcvbufErrors: 10461880

That is roughly 4 months after the last reboot, with an average of about 487 pps and 1 error per second. The errors occur irregularly: while I have been writing these lines the count has not changed, so I guess they happen when a large number of clients are connected.
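For reference, the cumulative counters above can be turned into an approximate uptime and error rate. This is a rough sketch, not a measurement: it relies on the estimated 487 pps average, the function name is hypothetical, and the loss fraction assumes that "packets received" does not include the datagrams counted as receive errors.

```python
def rates_from_counters(packets_received, receive_errors, avg_pps):
    """Estimate elapsed time and error rate from cumulative UDP counters."""
    elapsed_s = packets_received / avg_pps
    return {
        "elapsed_days": elapsed_s / 86400,
        "errors_per_s": receive_errors / elapsed_s,
        # Assumption: dropped datagrams are not part of packets_received.
        "loss_fraction": receive_errors / (packets_received + receive_errors),
    }


stats = rates_from_counters(4629113825, 10461880, 487)
print(stats)
```

With these inputs the estimate comes out at roughly 110 days of uptime, about 1.1 errors per second, and a loss fraction of around 0.2%.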
As I have three identically configured VPSes at two providers running in parallel, with the same stats and the same problem on each of them, I started tuning one of them as a test. After I set

    net.core.netdev_max_backlog=10000
    net.core.wmem_max = 33554432
    net.core.rmem_max = 33554432
    net.core.rmem_default = 8388608
    net.core.wmem_default = 4194394

25.5 hours passed and I see:
    Udp:
        44912609 packets received
        16204 packets to unknown port received.
        53012 packet receive errors
        272450 packets sent
        RcvbufErrors: 52993
        InCsumErrors: 15

Average 489 pps, one error every 1.7 seconds! Again, packets should not be waiting in a queue. They are read from the UDP socket by a separate thread; some are answered immediately, others are stored in the local database. There are no bottlenecks in storage speed or the like. Earlier I thought that the hosting provider was filtering some packets or had network problems, but now I see it is related to the socket or to the system's network configuration.
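One thing that may be worth verifying is which receive buffer size the server's UDP socket actually ends up with. On Linux, net.core.rmem_default is applied when a socket is created, and a size requested with SO_RCVBUF is capped by net.core.rmem_max. The sketch below only opens a fresh test socket, not the server's own one, and the function name is hypothetical.

```python
import socket


def effective_rcvbuf(requested=None):
    """Return the receive buffer size the kernel reports for a new UDP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        if requested is not None:
            # The kernel may cap (and on Linux, double) the requested value.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


print("default:", effective_rcvbuf())
print("after requesting 8 MiB:", effective_rcvbuf(8 * 1024 * 1024))
```

If the server was started before the sysctl values were changed, or if it sets SO_RCVBUF itself, its socket may still be using a smaller buffer than the new default suggests.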
Does anybody have an idea why these packets are sometimes lost and how to avoid it?