||# of thread activations/sec
||Eliminated. Transport-layer encryption is
stripped directly in the Pumper; individual messages are
then processed on the job queue.
|NTCP Pumper, NTCP Reader, NTCP Writer, NTCP
||Combine 13 threads into one. Optimize
execution order. Execute afterSend and delayed close in
tight loops. Do not activate SimpleTimer.
|Tunnel GW Pumper
||Eliminate the thread pool and run the code in
parallel directly on the calling threads. Batch traffic for
the same tunnel together. Always flush a tunnel if no
further traffic is queued concurrently while pumping. Keep
delayed flushes through the SimpleTimer, with its 100 ms
wait for local traffic, to an absolute minimum.
|Job Queue System, Client message pool
||Run outbound local traffic as parallelized
priority jobs. Standard jobs run after them. Standard jobs
activate a thread only if no outbound job is running. They
are raised to priority after 2 ms if no priority job comes
in. Timed jobs run as piggybacks only on threads that would
otherwise go idle.
||Eliminate PacketHandler thread. Run code
directly from the UDP Receiver, forwarding into the central
job queue. Eliminate UDP Message Receiver.
||Tune thread count and other properties to OS
and hardware capabilities.
||Handle fragment expiration inline while
tunnel is live. Reuse expire timers. Restrict SimpleTimer
use to end of tunnel lifetime where possible. Completely
lockless message reception. Debugged.
||Use setRemoveOnCancelPolicy(true) so that
timers do not fire for tasks that have already been canceled.
||Bypass the UDP Sender when no bandwidth limits are currently in effect
|PRNG, DH / XDH / YK Precalc
||Run all crypto generators often enough that
all buffers are always filled and no worker thread has to
wait for precalculated values. Wait times are chosen so
that, on average, every run generates one item or slightly more.
||Complete overhaul. Fewer retransmissions and
higher connection speeds. Contact us for detailed questions.
||Removing several bottlenecks, together with
the changes above, results in clearly higher throughput.
||Double max. number of fast and high capacity
||Substitute random tunnel selection by round
||Access the system clock only 2 times per ms
on average, instead of 50+ times before. There is too much
talk about precision timing. I2P survives a 1000 ms heap dump
without logging any error, so it does not hurt if the clock
is off by a few µs.
||Do not lock the random number generator
(a major bottleneck). Retrieve random numbers lock-free. Use
spinlocks to handle concurrency.
|Queues and caches, various places
||Changed to lock-free ring buffers where they were identified as a bottleneck. Use spinlocks to handle concurrency.|
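
The `setRemoveOnCancelPolicy(true)` entry in the table refers to a setting on Java's `ScheduledThreadPoolExecutor`. The sketch below is only an illustration of what that setting changes, not code from the router: the class name `RemoveOnCancelDemo` and the helper `queuedAfterCancel` are made up for this example. It schedules one far-future task, cancels it, and reports how many entries remain in the executor's queue.

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of the I2P code base.
final class RemoveOnCancelDemo {

    /** Schedules one far-future task, cancels it, and returns the queue size afterwards. */
    static int queuedAfterCancel(boolean removeOnCancel) {
        ScheduledThreadPoolExecutor timer = new ScheduledThreadPoolExecutor(1);
        timer.setRemoveOnCancelPolicy(removeOnCancel);
        try {
            // The task would fire in an hour, so it is still queued when we cancel it.
            ScheduledFuture<?> future = timer.schedule(() -> { }, 1, TimeUnit.HOURS);
            future.cancel(false);
            return timer.getQueue().size();
        } finally {
            timer.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("default policy, still queued:   " + queuedAfterCancel(false));
        System.out.println("remove-on-cancel, still queued: " + queuedAfterCancel(true));
    }
}
```

With the default policy the canceled task stays in the queue until its delay elapses; with the policy enabled it is removed at cancellation time, so the timer thread has nothing left to wake up for.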
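
The system clock entry in the table (about 2 reads per ms instead of 50+) can be pictured as a cached timestamp that many callers read while only a ticker touches the real clock. The following is a minimal sketch under that assumption; the source does not say how the router actually does it. `CoarseClock`, `refresh`, `now`, and `sourceReads` are hypothetical names, and the 500 µs refresh period in `main` is chosen only to match the "2 times per ms" figure.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;
import java.util.function.LongSupplier;

// Hypothetical coarse clock; readers never touch the real time source.
final class CoarseClock {
    private final LongSupplier source;                   // the expensive clock, e.g. System::currentTimeMillis
    private final AtomicLong sourceReads = new AtomicLong();
    private volatile long cachedMillis;

    CoarseClock(LongSupplier source) {
        this.source = source;
        refresh();
    }

    /** Called by the ticker only: reads the real clock once and publishes the value. */
    void refresh() {
        cachedMillis = source.getAsLong();
        sourceReads.incrementAndGet();
    }

    /** Called by everyone else: a plain volatile read, possibly slightly stale. */
    long now() {
        return cachedMillis;
    }

    long sourceReads() {
        return sourceReads.get();
    }

    public static void main(String[] args) throws InterruptedException {
        CoarseClock clock = new CoarseClock(System::currentTimeMillis);
        Thread ticker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                clock.refresh();
                LockSupport.parkNanos(500_000L);         // roughly two refreshes per millisecond
            }
        });
        ticker.setDaemon(true);
        ticker.start();

        long calls = 0;
        long end = System.nanoTime() + 50_000_000L;      // read the cached time for about 50 ms
        while (System.nanoTime() < end) {
            clock.now();
            calls++;
        }
        ticker.interrupt();
        System.out.println(calls + " cached reads, " + clock.sourceReads() + " real clock reads");
    }
}
```

The trade-off is the one the table entry names: a reader may see a value that is up to one refresh period old, which is acceptable when the consumers do not need sub-millisecond precision.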
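
The last two table entries pair ring buffers with spinlocks. One possible reading, sketched below, is a fixed-size ring buffer whose head and size are protected by a compare-and-set spin flag instead of a blocking monitor, so a contending thread busy-waits briefly rather than being parked by the OS. This is an assumption about what is meant, not the router's actual data structure; `SpinRingBuffer`, `offer`, and `poll` are illustrative names.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical bounded ring buffer guarded by a CAS spinlock (no synchronized, no parking).
final class SpinRingBuffer<T> {
    private final Object[] slots;
    private final AtomicBoolean busy = new AtomicBoolean(false);
    private int head;   // index of the oldest element; only touched while holding the spinlock
    private int size;   // number of stored elements; only touched while holding the spinlock

    SpinRingBuffer(int capacity) {
        slots = new Object[capacity];
    }

    private void acquire() {
        while (!busy.compareAndSet(false, true)) {
            Thread.onSpinWait();            // Java 9+: hint that we are busy-waiting
        }
    }

    private void release() {
        busy.set(false);                    // volatile write publishes head, size, and slots
    }

    /** Adds an item; returns false instead of blocking when the buffer is full. */
    boolean offer(T item) {
        acquire();
        try {
            if (size == slots.length) {
                return false;
            }
            slots[(head + size) % slots.length] = item;
            size++;
            return true;
        } finally {
            release();
        }
    }

    /** Removes the oldest item; returns null instead of blocking when the buffer is empty. */
    @SuppressWarnings("unchecked")
    T poll() {
        acquire();
        try {
            if (size == 0) {
                return null;
            }
            T item = (T) slots[head];
            slots[head] = null;             // drop the reference so it can be collected
            head = (head + 1) % slots.length;
            size--;
            return item;
        } finally {
            release();
        }
    }
}
```

A spinlock like this only pays off when the critical section is a few instructions long, as it is here; under long hold times or heavy oversubscription a blocking lock would waste less CPU.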