N-body integrations are used to model a wide range of astrophysical dynamics, but they suffer from errors which make their orbits diverge exponentially in time from the correct orbits. Over long time-scales, their reliability needs to be established. We address this reliability by running a three-body planetary system over about 200 e-folding times. Using nearby initial conditions, we can construct statistics of the long-term phase-space structure and compare to rough estimates of resonant widths of the system. We compared statistics for a wide range of numerical methods, including a Runge-Kutta method, Wisdom-Holman method, symplectic corrector methods, and a method by Laskar and Robutel. “Improving” an integrator did not increase the phase space accuracy, but simply increasing the number of initial conditions did. In fact, the statistics of a higher order symplectic corrector method were inconsistent with the other methods in one test.