=== Assistant ===
I got the speedup by changing the work you do per query, not by chasing tiny instruction-level tweaks.

0xfaded’s trig-free curvature/evolute iteration is a great starting point because it’s stable and doesn’t need sin/cos. Your “optimized” version already applies his key micro-opt:
* compute r/q with one sqrt: rq = sqrt(r² / q²) instead of sqrt(r²) / sqrt(q²).
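To make the one-sqrt trick concrete, here is a minimal Python sketch. The function names and the (rx, ry), (qx, qy) arguments are my own labels for the two vectors measured from the center of curvature, not names taken from your code, and the sketch assumes q is not the zero vector.

```python
import math

def rq_two_sqrt(rx, ry, qx, qy):
    # Baseline: two square roots (one per hypot), then a divide.
    return math.hypot(rx, ry) / math.hypot(qx, qy)

def rq_one_sqrt(rx, ry, qx, qy):
    # Same ratio from the squared lengths: one divide, one sqrt.
    # Assumes (qx, qy) != (0, 0); no guard is included in this sketch.
    r2 = rx * rx + ry * ry
    q2 = qx * qx + qy * qy
    return math.sqrt(r2 / q2)
```

Both return |r| / |q| up to rounding; the second form just gets there with a single square root.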

Then the big change:
* Instead of doing the full 3 curvature iterations, I do only 2.

That alone drops a whole iteration’s worth of math (one rq sqrt + one renorm sqrt + clamps/mults).

After 2 curvature steps you’re already very close to the correct point. The curvature method is basically giving you an excellent “nearby” point on the ellipse.

So for the last bit of accuracy, I do one Newton refinement step, but in the rotation-trick form:
* Maintain (c, s) ≈ (cos θ, sin θ) (here we just use the current (tx, ty) as the unit direction).
* Take one Newton step on f(θ) = 0, where f is the derivative of the squared distance with respect to θ: dt = -f / f' (so θ ← θ + dt).
* Update (c, s) using a small-angle rotation approximation and renormalize (no trig calls).
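The three bullets above can be sketched in Python as follows. This is an illustration under my own assumptions, not your exact code: the name newton_snap is hypothetical, the query point (px, py) is assumed to be already folded into the first quadrant, and I take f to be half the derivative of the squared distance from (px, py) to (a cos θ, b sin θ), which gives f = (b² − a²)·c·s + a·px·s − b·py·c and f' = (b² − a²)·(c² − s²) + a·px·c + b·py·s.

```python
import math

def newton_snap(a, b, px, py, c, s):
    """One Newton step on f(theta) = 0, with (c, s) ~ (cos theta, sin theta)."""
    d = b * b - a * a
    # f: half the derivative of the squared distance w.r.t. theta.
    f = d * c * s + a * px * s - b * py * c
    # f': derivative of f; note cos(2*theta) = c^2 - s^2, so no trig calls.
    fp = d * (c * c - s * s) + a * px * c + b * py * s
    # Newton step. Assumes fp != 0, which holds near the true minimum
    # for typical query points; no guard is included in this sketch.
    dt = -f / fp
    # First-order (small-angle) rotation of (c, s) by dt, then renormalize.
    c, s = c - s * dt, s + c * dt
    inv = 1.0 / math.sqrt(c * c + s * s)
    return c * inv, s * inv
```

The only expensive operations are one divide and one sqrt; everything else is multiplies and adds.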

This is cheaper than another full curvature iteration because:
* it reuses the already-good angle estimate,
* it does not recompute evolute geometry (the ex/ey, rx/ry, qx/qy machinery),
* it’s mostly a handful of FMAs plus one divide for f / f' and one sqrt for the renorm.

So the hybrid is:
* 2× (curvature step with 1-sqrt rq + renorm)
* 1× (Newton rotation snap + renorm)

Instead of:
* 3× (curvature step with 1-sqrt rq + renorm)

That’s why it comes out faster in the benchmark: you remove an entire curvature iteration and replace it with a lighter “snap” step.
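Put together, the hybrid might look like the Python sketch below. Treat it as an illustration of the iteration mix rather than the benchmarked code: the function name, the curvature_steps parameter, and the clamp after the Newton step are my assumptions, and degenerate inputs (a query point exactly at a center of curvature, or f' = 0) are not guarded.

```python
import math

def closest_point_hybrid(a, b, px, py, curvature_steps=2):
    """Closest point on x^2/a^2 + y^2/b^2 = 1 to (px, py): a sketch."""
    ax, ay = abs(px), abs(py)        # solve in the first quadrant
    tx = ty = math.sqrt(0.5)         # start at 45 degrees
    d = a * a - b * b
    for _ in range(curvature_steps):
        # Center of curvature (point on the evolute) for the current estimate.
        ex = d * tx ** 3 / a
        ey = -d * ty ** 3 / b
        rx, ry = a * tx - ex, b * ty - ey    # center -> current ellipse point
        qx, qy = ax - ex, ay - ey            # center -> query point
        # One sqrt for |r| / |q| (assumes q is not the zero vector).
        rq = math.sqrt((rx * rx + ry * ry) / (qx * qx + qy * qy))
        tx = min(1.0, max(0.0, (qx * rq + ex) / a))
        ty = min(1.0, max(0.0, (qy * rq + ey) / b))
        inv = 1.0 / math.sqrt(tx * tx + ty * ty)
        tx, ty = tx * inv, ty * inv
    # Newton snap: treat (tx, ty) as (cos theta, sin theta).
    f = -d * tx * ty + a * ax * ty - b * ay * tx
    fp = -d * (tx * tx - ty * ty) + a * ax * tx + b * ay * ty
    dt = -f / fp                     # assumes fp != 0
    # Small-angle rotation; the clamp keeps the estimate in the first quadrant.
    tx, ty = max(0.0, tx - ty * dt), max(0.0, ty + tx * dt)
    inv = 1.0 / math.sqrt(tx * tx + ty * ty)
    tx, ty = tx * inv, ty * inv
    # Restore the signs of the original query point.
    return math.copysign(a * tx, px), math.copysign(b * ty, py)
```

As a sanity check, for a = 3, b = 2 and the query point (4, 3) the exact closest point is (2.4, 1.2), since (4, 3) − (2.4, 1.2) is parallel to the ellipse normal there; `closest_point_hybrid(3.0, 2.0, 4.0, 3.0)` should land very close to that.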

You’re not lowering accuracy to get speed. You’re changing the iteration mix:
* curvature = good global behavior / basin-finding
* Newton snap = good local convergence when you’re already close

In other words: global-ish step twice, then local step once.

If you want, I can give you a one-paragraph “HN-ready” explanation and a tiny ASCII diagram of the idea (curvature gets you near, Newton snaps to the true closest-point condition).