The last thing I can recommend testing is another modem-router, the most basic one you can find, with no port forwarding configured and with STUN enabled on your ATA.
What you described looks so much like a NAT/STUN issue that I would insist on testing around this point. Why? Because the problem occurs on the incoming audio stream, which is exactly what NAT and STUN are involved in... and exactly the kind of issue I have seen most of the time.
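If you can capture the SIP messages your ATA sends, one quick thing to look at is the address it advertises for audio in the SDP "c=" line. When that address is a private LAN address, the other side has nowhere reachable to send its audio, which would match missing incoming audio. Here is a rough Python sketch of that check. The function names and the sample SDP are made up for illustration, and note that `is_private` also flags loopback and link-local addresses, not only the usual LAN ranges:

```python
import ipaddress
from typing import List


def sdp_connection_addresses(sdp: str) -> List[str]:
    """Return the addresses found on the SDP 'c=' lines."""
    addresses = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("c="):
            # Format: c=<nettype> <addrtype> <connection-address>
            parts = line[2:].split()
            if len(parts) == 3:
                # Drop an optional "/ttl" suffix (used with multicast).
                addresses.append(parts[2].split("/")[0])
    return addresses


def advertises_private_address(sdp: str) -> bool:
    """True if any 'c=' line carries a private (non-routable) IP."""
    for addr in sdp_connection_addresses(sdp):
        try:
            if ipaddress.ip_address(addr).is_private:
                return True
        except ValueError:
            # A hostname instead of a literal IP: cannot judge it here.
            continue
    return False


# Hypothetical SDP body, as an ATA behind NAT without STUN might send it.
EXAMPLE_SDP = """v=0
o=- 0 0 IN IP4 192.168.1.20
s=-
c=IN IP4 192.168.1.20
t=0 0
m=audio 16384 RTP/AVP 8 0
"""

if __name__ == "__main__":
    print(sdp_connection_addresses(EXAMPLE_SDP))
    print(advertises_private_address(EXAMPLE_SDP))
```

If this reports a private address while STUN is supposedly enabled, STUN is probably not being applied to the audio part of the call.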
Just one more thing: make sure both ends have the same codecs activated! At least one in common; I would suggest alaw (G.711a) / ulaw (G.711u).
For example: if your device has only G.729 activated and your callee has other codec(s) activated but not G.729, I think you could get the same result.
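To make the codec point concrete, here is a small Python sketch that compares the two codec lists. The lists are invented for the example (they are not taken from your setup), and `common_codecs` is just a helper name I picked. PCMA and PCMU are the names under which alaw and ulaw appear in SDP:

```python
from typing import List


def common_codecs(local: List[str], remote: List[str]) -> List[str]:
    """Return the codecs enabled on both ends, in local preference order."""
    remote_set = {codec.lower() for codec in remote}
    return [codec for codec in local if codec.lower() in remote_set]


# Hypothetical settings: the device only offers G729, the callee does not.
device_only_g729 = ["G729"]
callee = ["PCMA", "PCMU", "GSM"]

# Same device after enabling alaw (PCMA) as well.
device_with_alaw = ["G729", "PCMA"]

if __name__ == "__main__":
    print(common_codecs(device_only_g729, callee))   # nothing in common
    print(common_codecs(device_with_alaw, callee))   # PCMA is shared
```

An empty result means the two ends have nothing they can both encode and decode, so enabling alaw or ulaw on both sides rules that out.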
While testing, since it also happens from Voxalot to Voxalot, I would continue testing without a VSP, since it adds complexity and thus other potential issues at the same time.
Good luck...