When you need to test that a code change has made an improvement or, at least, didn't reduce the current level of performance, it is called regression testing. A level of statistical analysis is needed to ensure that perceived differences are not the result of random chance. The rigorous way to accomplish this is to use the Student's t-test to compare the results. The t-test can tell you the probability that a regression exists, but it doesn't tell you which regressions should be ignored and which must be pursued. However, provided the cost of running additional tests a cost-benefit analysis could tell you that.
Check out the Apache Commons Mathematics Library (TTest) for an implementation for Student's t-test.