Pratham Patel

Blog

Long-form articles, tutorials, and my thoughts on software development.

MaxRL: From REINFORCE to Maximum Likelihood

Why dividing by the number of successes instead of the batch size changes what your gradient estimator optimizes — and how this connects REINFORCE, maximum likelihood, and pass@k through one clean mathematical identity.

reinforcement-learning, machine-learning, policy-gradient

Reinforcement Learning from Scratch

Building RL from the ground up — actions, rewards, policies, expected reward, the policy gradient theorem, and REINFORCE — all derived step by step with concrete examples.

reinforcement-learning, machine-learning, policy-gradient

Mathematical Prerequisites for Reinforcement Learning

Building the math foundations you need for RL — probability, expected value, derivatives, the log trick, and Monte Carlo estimation — all through one consistent example.

reinforcement-learning, mathematics, machine-learning

Manifold-Constrained Hyper-Connections: Stabilizing Deep Networks Beyond ResNets (with the actual math)

From residual identity paths to Hyper-Connections and mHC, now with the paper's exact equations, fully unrolled products, and concrete numeric examples.

deep-learning, neural-networks, linear-algebra

Manifold-Constrained Hyper-Connections: Stabilizing Deep Networks Beyond ResNets

A deep dive into why residual connections work, how Hyper-Connections generalize them, and why constraining learned skip paths to doubly stochastic matrices solves the instability problem.

deep-learning, neural-networks, linear-algebra

Gradient Boosting: A Complete Guide

A deep dive into Gradient Boosting: from intuition and geometry to the math behind pseudo-residuals, stage-wise corrections, and practical implementation considerations.

machine-learning, gradient-boosting, ensemble-methods

Hello, World!

A quick introduction to who I am and what I do.

personal, introduction