=== Assistant ===

Got it — below is a complete, ready-to-run Python script that does everything you asked:

* Builds the triangle from the Origin, the Global max-price point, and the Final-date baseline.
* Uses the half-orthogonal rule to build a global ray (global half-height) and also computes per-day local half-heights.
* Creates two tetrahedron families for each day:
** Local tetra: apex uses that day's half-orthogonal (local h_i).
** Global-height tetra: apex uses the global half-height (same h for all days).
* Computes tetrahedral volumes, triangle areas, hypotenuse lengths, etc., for each day.
* Bins prices (Freedman–Diaconis), aggregates tetra volumes per bin, and charts % of overall tetra volume vs % of days in each bin.
* Backtests the volumetric indicators (single-feature logistic classification for the next-day up move), ranks them by AUC, and plots ROC curves.
* Saves CSV outputs and plots so you can inspect them.

Copy–paste this into a local Jupyter notebook or Python script and run it. Replace the prices array if you want to use your own series.

<syntaxhighlight lang="python"># tetrahedral_analysis.py
"""
Tetrahedral volumetric analysis + backtest for price series.

Requirements:
    pip install numpy pandas matplotlib scikit-learn

Usage:
  - Edit 'prices' if you want to use real data (must align with dates,
    or adjust the date range).
  - Run in Jupyter or Python; plots will display and CSV outputs are
    written to ./tetra_outputs
"""

import math
import os

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve

# -------------------------
# Parameters / generate data
# -------------------------
np.random.seed(42)
dates = pd.date_range(start="2025-01-01", end="2025-12-31", freq="D")
n = len(dates)

# Default synthetic prices (replace with your series if desired)
prices = np.random.uniform(50, 150, n)

# Put into a DataFrame
df = pd.DataFrame({"date": dates, "price": prices})
df["d_index"] = np.arange(n)
df["doy"] = df["date"].dt.dayofyear.astype(float)

# -------------------------
# Global triangle points:
#   O = (0, 0)
#   M = (day_of_max_price, max_price)
#   F = (day_of_final_date, 0)
# -------------------------
imax = df["price"].idxmax()
d_m = df.loc[imax, "doy"]
p_m = df.loc[imax, "price"]
d_f = df["doy"].iloc[-1]  # final day-of-year

def point_to_line_distance_origin(x1, y1, x2, y2):
    # Distance from (0, 0) to the line through (x1, y1)-(x2, y2).
    num = abs(x1 * y2 - y1 * x2)
    den = math.hypot(y2 - y1, x2 - x1)
    return num / den if den != 0 else 0.0

dist_O_to_MF = point_to_line_distance_origin(d_m, p_m, d_f, 0.0)
h_global = dist_O_to_MF / 2.0
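
# Illustrative sanity check (added example, not part of the pipeline):
# for a right triangle with legs a along the x-axis and b along the
# y-axis, the origin-to-hypotenuse distance is a*b / hypot(a, b).
# With a = 3 and b = 4 that distance is 12/5 = 2.4, so the
# half-orthogonal rule would give h = 1.2.
assert abs(point_to_line_distance_origin(3.0, 0.0, 0.0, 4.0) - 2.4) < 1e-9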

# -------------------------
# Per-day calculations. For each day i:
#   A = (doy_i, 0)
#   B = (0, price_i)
#   dist_i = distance from origin to line AB
#   h_local_i = dist_i / 2
#   apex_local = (doy_i, price_i, h_local_i)
#   apex_global = (doy_i, price_i, h_global)
#   vol_local_i = |det([A, B, apex_local])| / 6
#   vol_global_i = |det([A, B, apex_global])| / 6
#   tri_area = area(O, A, B)
#   hyp_len = length(A - B)
# -------------------------
d_i = df["doy"].values
p_i = df["price"].values
n = len(df)

h_local = np.zeros(n)
vol_local = np.zeros(n)
vol_global = np.zeros(n)
tri_area = np.zeros(n)
hyp_len = np.zeros(n)

for i in range(n):
    x1, y1 = d_i[i], 0.0
    x2, y2 = 0.0, p_i[i]
    hyp_len[i] = math.hypot(x1 - x2, y1 - y2)
    dist = point_to_line_distance_origin(x1, y1, x2, y2)
    h = dist / 2.0
    h_local[i] = h
    A_vec = np.array([x1, y1, 0.0])
    B_vec = np.array([x2, y2, 0.0])
    D_local = np.array([x1, y2, h])
    D_global = np.array([x1, y2, h_global])
    M_local = np.column_stack((A_vec, B_vec, D_local))
    M_global = np.column_stack((A_vec, B_vec, D_global))
    vol_local[i] = abs(np.linalg.det(M_local)) / 6.0
    vol_global[i] = abs(np.linalg.det(M_global)) / 6.0
    tri_area[i] = abs(x1 * y2 - x2 * y1) / 2.0

df["h_local"] = h_local
df["tetra_vol_local"] = vol_local
df["tetra_vol_global"] = vol_global
df["tri_area"] = tri_area
df["hyp_len"] = hyp_len

# -------------------------
# Bin prices using the Freedman–Diaconis rule, aggregate tetra volumes
# per bin, and compute % of total volume and % of days.
# -------------------------
def freedman_diaconis_nbins(arr):
    q75, q25 = np.percentile(arr, [75, 25])
    iqr = q75 - q25
    n = len(arr)
    if iqr == 0:
        return 10
    bin_width = 2 * iqr / (n ** (1 / 3))
    if bin_width <= 0:
        return 10
    nbins = int(np.ceil((arr.max() - arr.min()) / bin_width))
    return max(1, nbins)

nbins = freedman_diaconis_nbins(df["price"].values)
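
# Worked example of the FD rule (added for clarity; numbers are
# approximate): with n = 365 draws from Uniform(50, 150) the IQR is
# roughly 50, so bin_width ≈ 2*50 / 365**(1/3) ≈ 14 and the rule gives
# about ceil(100 / 14) = 8 bins over the 50-150 range.
print("FD rule selected", nbins, "bins for", len(df), "prices")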
bin_edges = np.histogram_bin_edges(df["price"].values, bins=nbins)
df["price_bin_fd"] = pd.cut(df["price"], bins=bin_edges, include_lowest=True)

agg = df.groupby("price_bin_fd").agg(
    count_days=("price", "size"),
    vol_sum_local=("tetra_vol_local", "sum"),
    vol_sum_global=("tetra_vol_global", "sum"),
    price_min=("price", "min"),
    price_max=("price", "max"),
).reset_index()

total_vol_local = df["tetra_vol_local"].sum()
total_vol_global = df["tetra_vol_global"].sum()
agg["pct_total_vol_local"] = 100 * agg["vol_sum_local"] / total_vol_local if total_vol_local > 0 else 0.0
agg["pct_total_vol_global"] = 100 * agg["vol_sum_global"] / total_vol_global if total_vol_global > 0 else 0.0
agg["pct_days"] = 100 * agg["count_days"] / agg["count_days"].sum()

# Print and save the aggregation
print("\nAggregated volume by FD price bins:\n",
      agg[["price_min", "price_max", "count_days", "pct_days",
           "pct_total_vol_local", "pct_total_vol_global"]])
os.makedirs("tetra_outputs", exist_ok=True)
agg.to_csv("tetra_outputs/agg_by_price_bin.csv", index=False)

# Plot % of total local volume vs % of days per bin
plt.figure(figsize=(10, 5))
x = np.arange(len(agg))
width = 0.4
plt.bar(x - width / 2, agg["pct_total_vol_local"], width=width, label="% total LOCAL tetra volume")
plt.bar(x + width / 2, agg["pct_days"], width=width, label="% days in bin")
plt.xticks(x, [f"{int(a)}-{int(b)}" for a, b in zip(agg["price_min"], agg["price_max"])], rotation=45)
plt.ylabel("Percentage (%)")
plt.title("Percent of Total Local Tetra Volume vs % Days per Price Bin (FD bins)")
plt.legend()
plt.tight_layout()
plt.savefig("tetra_outputs/volume_vs_days_by_price_bin.png")
plt.show()

# -------------------------
# Backtest indicators: single-feature logistic for the next-day up move.
# Indicators: tetra_vol_local, tetra_vol_global, h_local, tri_area, hyp_len.
# Compute AUC, correlation with up_next, the prior, and the max quartile
# posterior lift.
# -------------------------
df["next_price"] = df["price"].shift(-1)
df["up_next"] = (df["next_price"] > df["price"]).astype(int)
df_bt = df.iloc[:-1].copy()

indicators = ["tetra_vol_local", "tetra_vol_global", "h_local", "tri_area", "hyp_len"]
results = []
roc_curves = {}

for ind in indicators:
    X = df_bt[ind].values.reshape(-1, 1)
    y = df_bt["up_next"].values
    if np.allclose(X, X.mean()):
        auc = float("nan"); corr = float("nan"); fpr = tpr = None
    else:
        clf = LogisticRegression(solver="liblinear")
        clf.fit(X, y)
        probs = clf.predict_proba(X)[:, 1]
        auc = roc_auc_score(y, probs)
        fpr, tpr, _ = roc_curve(y, probs)
        corr = np.corrcoef(df_bt[ind].values, y)[0, 1]
    try:
        df_bt["quartile"] = pd.qcut(df_bt[ind], 4, labels=False, duplicates="drop")
        post = df_bt.groupby("quartile")["up_next"].mean()
        max_post = post.max()
    except Exception:
        max_post = float("nan")
    prior = y.mean()
    max_lift = max_post - prior if not math.isnan(max_post) else float("nan")
    results.append({"indicator": ind, "auc": auc, "corr_with_up": corr,
                    "prior": prior, "max_quartile_post": max_post,
                    "max_lift_over_prior": max_lift})
    roc_curves[ind] = (fpr, tpr)

res_df = pd.DataFrame(results).sort_values(by="auc", ascending=False).reset_index(drop=True)
print("\nBacktest indicator ranking (by AUC):\n", res_df)
res_df.to_csv("tetra_outputs/backtest_indicator_ranking.csv", index=False)

# ROC plot
plt.figure(figsize=(8, 6))
for ind in indicators:
    fpr, tpr = roc_curves[ind]
    if fpr is None or tpr is None:
        continue
    auc_val = res_df.loc[res_df["indicator"] == ind, "auc"].values[0]
    plt.plot(fpr, tpr, label=f"{ind} (AUC={auc_val:.3f})")
plt.plot([0, 1], [0, 1], linestyle="--", color="gray")
plt.xlabel("FPR"); plt.ylabel("TPR")
plt.title("ROC curves for volumetric indicators")
plt.legend()
plt.tight_layout()
plt.savefig("tetra_outputs/roc_curves.png")
plt.show()

# Save detailed per-day metrics
df.to_csv("tetra_outputs/tetra_metrics_per_day.csv", index=False)
print("\nOutputs saved to ./tetra_outputs/")
print("Global max price day-of-year:", d_m, "price:", p_m)
print("Global orthogonal distance from origin to MF hypotenuse:", dist_O_to_MF)
print("Global half-height h_global:", h_global)
</syntaxhighlight>

Notes:

* tetra_vol_local uses each day's own half-orthogonal height; tetra_vol_global uses the global half-height derived from the triangle formed by the global max-price point and the final-date baseline.
* pct_total_vol_local per price bin shows how much of the sum of local tetra volumes falls into each price range. Overlaying % days reveals bins that contribute disproportionately to tetra volume.
* The backtest uses a single-feature logistic regression and AUC to see which volumetric indicator best signals next-day price increases. AUC ≈ 0.5 is random; >0.6 is modest; >0.7 indicates a useful signal (but watch for overfitting).
* If you have real price data (or higher-frequency data), replace prices with your array and keep the date range consistent, or change dates; a minimal loading sketch follows at the end of this message.

If you want, I can:

* Tweak the script to perform a walk-forward retrain and report time-varying AUCs (recommended),
* Add regression tests (predict next-day price magnitude, not only direction),
* Or modify the binning (KMeans or quantile) and show interactive plots.

Which of those next?
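
For the last note above, here is a minimal sketch of swapping in real data. It assumes a hypothetical prices.csv with date and close columns; the file name and column names are placeholders, so adapt them to your data.

<syntaxhighlight lang="python"># Minimal real-data swap (sketch; "prices.csv", "date", and "close" are
# hypothetical placeholders - point them at your actual file/columns).
import pandas as pd

raw = pd.read_csv("prices.csv", parse_dates=["date"])
raw = raw.sort_values("date").reset_index(drop=True)

# These two assignments replace the synthetic 'dates' and 'prices' near
# the top of the script; everything downstream stays unchanged.
dates = pd.DatetimeIndex(raw["date"])
prices = raw["close"].to_numpy(dtype=float)
</syntaxhighlight>

One caveat: the script keys its geometry off day-of-year, so for a series spanning more than one calendar year you would want df["d_index"] rather than df["doy"] as the x-coordinate.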