One of my client has server node located at north America, Asia and Europe data centers. All servers are connected using 1000Mbps links. They transfers lots of data between all nodes over ssh session using scp / sftp. However, performance was horrible. After some research I came across High Performance SSH/SCP - HPN-SSH patch for OpenSSH:
SCP and the underlying SSH2 protocol implementation in OpenSSH is network performance limited by statically defined internal flow control buffers. These buffers often end up acting as a bottleneck for network throughput of SCP, especially on long and high bandwith network links.
Modifying the ssh code to allow the buffers to be defined at run time eliminates this bottleneck. We have created a patch that will remove the bottlenecks in OpenSSH and is fully interoperable with other servers and clients. In addition HPN clients will be able to download faster from non HPN servers, and HPN servers will be able to receive uploads faster from non HPN clients. However, the host receiving the data must have a properly tuned TCP/IP stack.
The amount of improvement any specific user will see is dependent on a number of issues. Transfer rates cannot exceed the capacity of the network nor the throughput of the I/O subsystem including the disk and memory speed. The improvement will also be highly influenced by the capacity of the processor to perform the encryption and decryption. Less computational expensive ciphers will often provide better throughput than more complex ciphers.
You can download HPN-SSH patch here. This patch improved our performance. You also need to tweak Linux TCP/IP networking settings. Here is my sysctl.conf file ( read this TCP tunning Linux guide for detailed explanation) :
# optimization start
# increase TCP max buffer size setable using setsockopt()
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 87380 8388608
# increase Linux auto tuning TCP buffer limits
# min, default, and max number of bytes to use
# set max to at least 4MB, or higher if you use very high BDP paths
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.core.netdev_max_backlog = 5000
net.ipv4.tcp_window_scaling = 1
# optimization end