
# Model size / hidden layers - More neurons per layer → more capacity to represent complex relationships between ideas.
# Number of layers - More sequential transformations → deeper reasoning chains (like multiple passes of thought).
# Number of attention heads - Determines how many relationships between tokens the model can track simultaneously.
# Training on reasoning tasks - Instruction-tuning and chain-of-thought training make models better at:
#* Step-by-step deduction
#* Hypothetical reasoning
#* Multi-step problem solving
# Ability to handle long context windows
#* Models with large context windows (70k–130k tokens or more) let you feed in huge lists or structured content.
#* This is crucial if you plan to supply all the context externally rather than relying on the model’s internal knowledge.
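
To make the factors above concrete, here is a minimal Python sketch of how hidden size, layer count, head count, and context window relate numerically. The <code>TransformerConfig</code> class, its field names, and the helper functions are hypothetical and not taken from any library. The parameter estimate assumes a standard transformer block (four attention projections plus a feed-forward layer with 4x expansion) and ignores embeddings, biases, and layer norms, so treat it as a rough approximation only.

```python
from dataclasses import dataclass


@dataclass
class TransformerConfig:
    """Hypothetical config; field names are illustrative only."""
    d_model: int         # hidden size (neurons per layer)
    n_layers: int        # number of stacked transformer blocks
    n_heads: int         # attention heads per block
    context_window: int  # maximum tokens the model accepts


def approx_block_params(cfg: TransformerConfig) -> int:
    """Rough weight count for the transformer blocks.

    Assumes 4*d^2 weights for the attention projections (Q, K, V, output)
    and 8*d^2 for a feed-forward layer with 4x expansion, per block.
    """
    return 12 * cfg.n_layers * cfg.d_model ** 2


def head_dim(cfg: TransformerConfig) -> int:
    """Width of each attention head when d_model is split evenly."""
    if cfg.d_model % cfg.n_heads != 0:
        raise ValueError("d_model must be divisible by n_heads")
    return cfg.d_model // cfg.n_heads


def fits_in_context(cfg: TransformerConfig, prompt_tokens: int,
                    reserved_for_output: int = 1024) -> bool:
    """Check whether externally supplied context leaves room for a reply."""
    return prompt_tokens + reserved_for_output <= cfg.context_window


# Illustrative values only, not the configuration of any specific model.
cfg = TransformerConfig(d_model=4096, n_layers=32, n_heads=32,
                        context_window=128_000)
print(approx_block_params(cfg))       # 6442450944
print(head_dim(cfg))                  # 128
print(fits_in_context(cfg, 100_000))  # True
```

Under these assumptions, capacity grows with the square of the hidden size but only linearly with the number of layers. Adding heads at a fixed hidden size does not add weights; it splits the same width into more, narrower heads.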