The results are very clear now:

- With -ffast-math: 4x speedup for the curvature method
- Without -ffast-math: 2.6-3x speedup for the curvature method

The circle case is interesting: Newton converges very fast for circles (1-2 iterations), so it's nearly tied there. Let me also create a Rust version to verify these results with another compiler.

Excellent! Let me also create a Rust version for cross-validation:

Good, I've given the user the files. Let me now provide a summary of the results.

**C++ Benchmark Results**

The results are decisive. 0xfaded's curvature method is 3-4x faster than Newton's method in compiled C++:

| Configuration | Curvature | Newton | Speedup |
|---|---|---|---|
| Moderate eccentricity | 43 ns | 173 ns | 4.0x |
| High eccentricity | 44 ns | 192 ns | 4.4x |
| Extreme eccentricity | 47 ns | 194 ns | 4.1x |
| Near-degenerate | 42 ns | 193 ns | 4.6x |
| Circle | 44 ns | 70 ns | 1.6x |

**Why It's Faster**

The key is avoiding trig functions in the inner loop:

| Operation | CPU cycles |
|---|---|
| sqrt | ~10-15 |
| sin/cos | ~50-100+ |
| atan2 | ~100-150+ |

- Curvature: 3 iterations × 3 sqrt = 9 sqrt calls
- Newton: 1 atan2 + ~5 iterations × 2 sin/cos = 11+ trig calls

Even without -ffast-math, the curvature method is still 2.6-3x faster.

**The Circle Edge Case**

For circles (a = b), Newton converges in just 1-2 iterations because the initial atan2 guess is perfect. That's why the speedup drops to 1.6x for that case.

**Accuracy**

Both achieve identical machine precision (~4×10⁻¹⁶ deviation from the ellipse). The C++ and Rust source files are included if you want to verify on your own machine.

Attachments: Benchmark results (Document · MD), Ellipse benchmark (CPP), Ellipse benchmark (RS)
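To make the sqrt-only inner loop concrete, here is a minimal sketch of the curvature (osculating-circle) iteration in the shape of 0xfaded's published method. This is a reconstruction for illustration, not the benchmarked source file; the fixed three iterations, first-quadrant reduction, and three square roots per pass follow the description above.

```cpp
#include <algorithm>
#include <cmath>

// Sketch of the curvature (osculating-circle) iteration -- reconstructed for
// illustration, not the exact benchmarked source. Distance from (px, py) to
// the ellipse x^2/a^2 + y^2/b^2 = 1, with a, b > 0.
double ellipseDistCurvature(double a, double b, double px, double py) {
    // Symmetry: solve in the first quadrant.
    double qx = std::abs(px), qy = std::abs(py);

    // (tx, ty) = (cos t, sin t); start at t = 45 degrees.
    double tx = 0.7071067811865476, ty = 0.7071067811865476;

    for (int i = 0; i < 3; ++i) {            // fixed 3 iterations
        double x = a * tx, y = b * ty;       // current point on the ellipse

        // Center of the osculating circle (point on the evolute).
        double ex = (a * a - b * b) * tx * tx * tx / a;
        double ey = (b * b - a * a) * ty * ty * ty / b;

        double rx = x - ex, ry = y - ey;     // ellipse point rel. to center
        double dx = qx - ex, dy = qy - ey;   // query point rel. to center

        double r = std::sqrt(rx * rx + ry * ry);  // sqrt #1
        double q = std::sqrt(dx * dx + dy * dy);  // sqrt #2

        // Project the query point onto the osculating circle, map back to
        // parameter space, and clamp to the first quadrant.
        tx = std::clamp((dx * r / q + ex) / a, 0.0, 1.0);
        ty = std::clamp((dy * r / q + ey) / b, 0.0, 1.0);

        double t = std::sqrt(tx * tx + ty * ty);  // sqrt #3: renormalize
        tx /= t;
        ty /= t;
    }

    double dx = a * tx - qx, dy = b * ty - qy;
    return std::sqrt(dx * dx + dy * dy);
}
```

Each pass projects the query point onto the circle that best matches the ellipse's local curvature, which is where the table's 9-sqrt count comes from: three square roots per iteration and no sin, cos, or atan2 anywhere in the loop.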
how does the speedup compare against the python benchmark

8:04 PM

The user wants me to compare the Python benchmark results against the C++ benchmark results. Let me look back at what we found:

Python results (scalar):
- Curvature was actually slower than Newton in scalar Python
- Newton was about 1.2x faster than curvature in pure Python scalar code

Python results (vectorized NumPy):
- Curvature was 2-4x faster than Newton when vectorized

C++ results:
- Curvature was 3-4x faster than Newton consistently

So the interesting comparison is:
- In scalar Python, Newton wins slightly (Python's math.sin/cos are highly optimized C)
- In vectorized Python (NumPy), curvature wins by 2-4x
- In C++, curvature wins by 3-4x

The Python scalar result is misleading because Python's interpreter overhead dominates, and the C math library's trig functions are very optimized. The "real" performance difference shows up in compiled code or vectorized operations.

Great question - the comparison reveals something important about microbenchmarking:

**Python vs C++ Speedup Comparison**

| Scenario | Curvature vs Newton |
|---|---|
| Python scalar | Newton wins ~1.2x |
| Python vectorized (NumPy) | Curvature wins 2-4x |
| C++ (-O3 -ffast-math) | Curvature wins 3-4x |
| C++ (-O3, strict IEEE) | Curvature wins 2.6-3x |

**Why Python Scalar Was Misleading**

In scalar Python, Newton appeared faster because:

- Python's math.sin/math.cos are C library calls: highly optimized, and the Python→C call overhead is the same whether you call sqrt or sin
- Interpreter overhead dominates: the Python bytecode dispatch cost dwarfs the actual math operations
- Newton does fewer iterations: 4-5 versus a fixed 3, so fewer Python loop iterations

**The Real Picture**

```
                   Python Scalar   Python NumPy   C++ Compiled
                   -------------   ------------   ------------
Curvature method:      6.4 μs         1.2 ms          43 ns
Newton method:         5.2 μs         3.7 ms         173 ns
                         ↓               ↓               ↓
                    Newton wins     Curv wins       Curv wins
                       1.2x            3.0x            4.0x
```

The C++ results are the ground truth for the algorithm comparison. Python scalar benchmarks are measuring Python overhead, not the algorithms themselves.

**Key Takeaway**

If you're benchmarking algorithms in Python, either:

- Use NumPy/vectorized operations (which bypass the interpreter), or
- Use compiled code (C/C++/Rust)

Scalar Python microbenchmarks can give you the opposite conclusion from reality, as we saw here.

Take the C++ benchmarks we have, and compare against these. Models: C, G, O:

C: Great question! Here are several optimizations for game use:

```cpp
#include <algorithm>
#include <cmath>

struct Ellipse {
    float cx, cy;
    float a, b;
    // Precomputed values (set once when the orbit is defined)
    float a2, b2;        // a², b²
    float ab_diff;       // b² - a²
    float inv_a, inv_b;  // 1/a, 1/b

    void precompute() {
        a2 = a * a;
        b2 = b * b;
        ab_diff = b2 - a2;
        inv_a = 1.0f / a;
        inv_b = 1.0f / b;
    }
};

// Optimized version: ~3-4x faster
float distanceToEllipseFast(const Ellipse& e, float px, float py) {
    // Work in the first quadrant
    float qx = std::abs(px - e.cx);
    float qy = std::abs(py - e.cy);

    // Better initial guess using normalized coordinates
    float nx = qx * e.inv_a;
    float ny = qy * e.inv_b;
    float inv_len = 1.0f / std::sqrt(nx * nx + ny * ny + 1e-10f);
    float cosT = nx * inv_len;
    float sinT = ny * inv_len;

    // Fixed 4 iterations, no branches, no trig after init.
    // Use sin² + cos² = 1 to update without calling sin/cos.
    for (int i = 0; i < 4; ++i) {
        float f1 = e.ab_diff * sinT * cosT + qx * sinT * e.a - qy * cosT * e.b;
        float f2 = e.ab_diff * (cosT * cosT - sinT * sinT)
                 + qx * cosT * e.a + qy * sinT * e.b;
        float dt = f1 / f2;

        // Update sin/cos using small-angle approximation + renormalization:
        // cos(t-dt) ≈ cos(t) + dt·sin(t), sin(t-dt) ≈ sin(t) - dt·cos(t)
        float newCos = cosT + dt * sinT;
        float newSin = sinT - dt * cosT;

        // Renormalize to stay on the unit circle
        inv_len = 1.0f / std::sqrt(newCos * newCos + newSin * newSin);
        cosT = newCos * inv_len;
        sinT = newSin * inv_len;
    }

    float dx = e.a * cosT - qx;
    float dy = e.b * sinT - qy;
    return std::sqrt(dx * dx + dy * dy);
}

// Even faster: if you just need "inside/outside orbit" checks
float signedDistanceToEllipseCheap(const Ellipse& e, float px, float py) {
    float dx = (px - e.cx) * e.inv_a;
    float dy = (py - e.cy) * e.inv_b;
    float d = std::sqrt(dx * dx + dy * dy);
    // Approximate distance (exact on a circle, approximate on an ellipse)
    return (d - 1.0f) * std::min(e.a, e.b);
}
```

Key optimizations:

1. No trig in the loop: update sin/cos via rotation + renormalization
2. Fixed iteration count: no branches, predictable for the CPU pipeline
3. Precomputed constants: avoid repeated division
4. float instead of double: 2x faster, plenty precise for games
5. Cheap fallback: if you just need approximate distance or inside/outside tests
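As a quick sanity check of Model C's interface above, here is a minimal usage sketch. It assumes the Ellipse struct and both functions above are in scope; the orbit and query values are made up for illustration. Model C's batch-processing suggestion continues below.

```cpp
#include <cstdio>

// Hypothetical driver: define an orbit once, precompute the constants,
// then query distances every frame.
int main() {
    Ellipse orbit;
    orbit.cx = 0.0f; orbit.cy = 0.0f;  // center (made-up values)
    orbit.a = 5.0f;  orbit.b = 3.0f;   // semi-axes
    orbit.precompute();                // one-time setup

    float px = 4.0f, py = 2.5f;        // query point
    std::printf("exact:  %f\n", distanceToEllipseFast(orbit, px, py));
    std::printf("cheap:  %f\n", signedDistanceToEllipseCheap(orbit, px, py));
}
```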
If you need batch processing (many points):

```cpp
#include <immintrin.h>

// Process 8 points at once with AVX
void distanceToEllipseSIMD(const Ellipse& e,
                           const float* px, const float* py,
                           float* out, int count) {
    __m256 va  = _mm256_set1_ps(e.a);
    __m256 vb  = _mm256_set1_ps(e.b);
    __m256 vcx = _mm256_set1_ps(e.cx);
    __m256 vcy = _mm256_set1_ps(e.cy);
    // ... vectorized version of the above
}
```

For most games, the 4-iteration float version is accurate to sub-pixel precision and should handle tens of thousands of queries per frame without issue.

G: paste1

O: paste2

8:08 PM

The user wants me to compare the curvature-based method (0xfaded's) against these game-optimized Newton variants. Let me create a comprehensive benchmark that includes:

1. Original curvature method (0xfaded)
2. Original Newton method
3. Model C's optimized Newton (sin/cos rotation trick)
4. Model G's fast Newton (Eberly-style t equation)
5. Model O's fast Newton (similar Eberly-style)

The key differences to test:
- Curvature: 3 iterations, no trig in the loop, uses sqrt
- Newton variants: 2-4 iterations, various optimizations

Let me write a comprehensive C++ benchmark.

Let me create a comprehensive benchmark comparing all these approaches:

```bash
cd /home/claude && g++ -O3 -march=native -ffast-math -std=c++17 ellipse_benchmark_all.cpp -o bench_all -lm && ./bench_all
```

Output
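For readers reproducing nanosecond-scale numbers like these, here is a minimal timing-harness sketch. It is an assumed structure, not the actual ellipse_benchmark_all.cpp (which is not shown in this transcript); the essential detail is accumulating results into a volatile sink so -O3 cannot dead-code-eliminate the calls under test.

```cpp
#include <chrono>
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// Time `fn(x, y)` over a point cloud and report ns per call.
// Sketch only: the real benchmark's structure is assumed, not reproduced.
template <typename Fn>
double benchNs(Fn fn, const std::vector<float>& xs,
               const std::vector<float>& ys, int reps) {
    volatile float sink = 0.0f;  // checksum defeats dead-code elimination
    auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r)
        for (std::size_t i = 0; i < xs.size(); ++i)
            sink = sink + fn(xs[i], ys[i]);
    auto t1 = std::chrono::steady_clock::now();
    double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    (void)sink;
    return ns / (double(reps) * xs.size());
}

int main() {
    std::mt19937 rng(42);  // fixed seed for repeatable inputs
    std::uniform_real_distribution<float> d(-10.0f, 10.0f);
    std::vector<float> xs(10000), ys(10000);
    for (std::size_t i = 0; i < xs.size(); ++i) { xs[i] = d(rng); ys[i] = d(rng); }

    // Placeholder workload; swap in the curvature/Newton variants to compare.
    auto dist = [](float x, float y) { return std::sqrt(x * x + y * y); };
    std::printf("%.1f ns/call\n", benchNs(dist, xs, ys, 100));
}
```

Using a fixed seed keeps the point cloud identical across methods, so any timing difference comes from the algorithm rather than the inputs.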