Labelbox · Robotics Simulation Engineer
Most of what I build begins in mathematics: not as a formal starting point, but as a way of understanding what kind of problem something actually is. I work across robotics simulation, world-sim / world-model evaluation, autonomous systems, AI, quantum algorithms and quantitative finance. At Labelbox, I work on confidential robotics simulation and model-evaluation tasks involving MuJoCo and world-sim workflows. I am drawn to problems where models have to respect physics, uncertainty and real constraints — not just produce plausible-looking answers.
Working with an international team (Rangers-Intrinsic) on a robotics + ML competition to train a UR5e robot arm to perform dexterous cable insertion in simulation. My focus: ACT (Action Chunking with Transformers) model training, flow-matching policy, and RL fine-tuning. Running on Gazebo + ROS 2 + Isaac Sim on AWS EC2 GPU instances.
A real-time multi-modal social co-pilot for neurodivergent users. Three concurrent threads — YOLOv11 pose estimation on NVIDIA A10G GPU, a custom FERNet CNN for facial emotion including compound states, and ElevenLabs live transcription — synthesised by Claude every 15 seconds into spoken advice.
Privacy-preserving cross-chain stablecoin settlement protocol. Three privacy layers: ephemeral wallet identities, aggregation-based omnibus settlement, and QAOA-optimised multi-hop routing on the Flare Network with Merkle proof verification.
Avellaneda-Stoikov framework with XGBoost/LightGBM directional predictors and mean-variance portfolio optimisation. Extended with Multi-Agent Reinforcement Learning. Maximised Sharpe ratio across 4 assets; PnL evaluated in under a minute.
Ongoing participation in IMC's Prosperity algorithmic trading competition. Full write-up coming soon.
CAD design for the drone airframe and components, plus a computer vision pipeline for autonomous target detection and tracking.
A web application built using Perplexity's API with a custom frontend in Tailwind CSS — exploring what a well-designed AI search experience could look like beyond the default interface. Inspired by Sienar Industries' approach to product-quality AI tooling.
Built entirely from scratch — no external chess libraries. Hybrid Minimax + Alpha-Beta Pruning + MCTS, with a pre-trained Transformer handling simulation. FEN notation parsing, SQLite-backed game history, endgame tablebases, and a full Pygame interface.
Upload any photograph to simulate anamorphic and spherical mirror distortions using ray-optics geometry. Also built a separate ray-optics simulation in MATLAB and Python with OpenCV. Awarded Gold in the BPhO Computational Challenge.
Only team of first-years. Two-stage stochastic optimisation with MILP for flexible compute scheduling, deployed via the Beckn protocol. 4th place internationally from 100+ teams; sponsored by Google Cloud and the University of Cambridge.
Graduated from a postgraduate-level quantum programme — one of few undergrads selected. Focused on Carleman Linearisation, QFT, Phase Kickback, and HHL. GitHub portfolio: quantum fundamentals in PennyLane, applied research in quantum ML, Monte Carlo methods, and quantitative finance.
Year-long programme building a fully autonomous medical aid delivery robot. PWM for motor control, ultrasonic distance sensing, GPS navigation, and OpenCV for camera-based obstacle avoidance. Won Industrial Cadets Gold Award and Project of the Year.
A CPU emulator written in C++ — implementing the fetch-decode-execute cycle, registers, memory addressing, and a basic instruction set from scratch. A deep-dive into computer architecture motivated by wanting to understand what actually happens below the abstraction layer.
Read the original 1966 Weizenbaum papers on ELIZA in depth and built an extended chatbot therapist. The starting point for a self-directed journey into natural language processing — pattern matching, response generation, and the surprising depth of rule-based systems before neural approaches dominated.
Detailed technical write-ups of each project — architecture decisions, problems encountered, how they were solved, and what they produced. Each section draws directly from the project README and source code.
The Intrinsic AI for Industry Challenge is a robotics + ML competition hosted by Intrinsic (Google/Alphabet) and Open Robotics, with a $180,000 prize pool. The task is deceptively precise: get a UR5e robot arm to insert SFP and NIC cables into a connector board in a simulated environment running Gazebo (Open Robotics) and Isaac Lab (NVIDIA). Scoring rewards task completion (~75 points) and trajectory smoothness (~24 points), and penalises collisions and excessive force. The challenge is non-trivial because cable physics in simulation is notoriously unpredictable: cables buckle, twist, and get physically stuck on ports in ways that deterministic policies cannot reliably handle.
My contributions are on the ML side of the stack — the team has specialists handling simulation infrastructure (Enrique), data collection (Ali), and cloud infrastructure (Evan). My focus:
experiment_8 dataset from HuggingFace; investigating a persistent GPU utilisation bottleneck (15–25% observed, 90%+ desired; almost certainly a CPU DataLoader bottleneck addressable with pin_memory and prefetching).
Development runs on a dual-monitor setup: RViz and Gazebo Sim side by side, with four concurrent terminals managing simulation, recording, teleoperation, and episode control. Training runs remotely on AWS EC2 g6e instances (NVIDIA L40S GPU; the g4dn.xlarge is insufficient, and the L40S is mandatory for Isaac Lab).
Left: Gazebo Sim + Isaac Lab environment. Right: RViz showing UR5e model with TF frame tree.
The UR5e arm performing the NIC card cable insertion task. These clips are from the data collection phase — the oracle cheat-code policy executing the insertion that the trained model is learning to replicate.
UR5e arm inserting a cable into the NIC card connector board. The policy must learn to replicate this motion from demonstration data.
The full stack spans four layers. The simulation layer runs a UR5e arm in Gazebo (Open Robotics) / ROS 2 / Isaac Lab (NVIDIA) on AWS EC2 g6e instances (NVIDIA L40S GPU required). A data recorder captures bag files per trial at 20Hz: wrench, image, forward kinematics, and motion commands. A scenario generator randomises task board configurations for training variety.
Above that sits the policy training infrastructure: ACT and flow-matching models trained on the experiment_8 HuggingFace dataset, tracked on Weights & Biases. A non-trivial engineering problem: the standard AIC action space is 6-dimensional, but the team's recordings are 19-dimensional (extra metadata). Similarly, the standard observation space is 26-dimensional; recordings capture 48. Resolving this dimensionality mismatch without losing useful context is an open design question.
The team has analysed the wider competition and identified three key techniques to adopt. Force/Torque tare calibration (from competitor jlamperez): the raw wrist wrench reads ~20N because it measures the gripper's own weight, so taring before each episode makes the policy weight-agnostic. Recording at 20Hz not 30Hz: cameras are capped at 20fps, so 30Hz recording introduces ~33% duplicate frames. Diffusion Policy over vanilla ACT (from ALOHA Unleashed): diffusion handles multimodal action distributions (multiple valid insertion approaches) without collapsing to a bad average. Sanjay leads this track.
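The tare step can be sketched in a few lines. This is a minimal illustration, not the team's or jlamperez's code: the function names `compute_tare` and `apply_tare`, the six-component wrench layout (fx, fy, fz, tx, ty, tz) and the sample readings are all assumptions made for the example.

```python
from statistics import fmean


def compute_tare(samples):
    """Average 6-component wrench readings taken while the arm is at rest."""
    if not samples:
        raise ValueError("need at least one wrench sample to compute a tare offset")
    # zip(*samples) groups the readings axis by axis (all fx, all fy, ...).
    return [fmean(axis) for axis in zip(*samples)]


def apply_tare(wrench, offset):
    """Subtract the at-rest offset so only contact forces remain."""
    return [w - o for w, o in zip(wrench, offset)]


# Hypothetical at-rest readings: roughly 20 N on fz from the gripper's own weight.
rest_samples = [
    [0.1, -0.2, 20.1, 0.0, 0.01, 0.0],
    [-0.1, 0.2, 19.9, 0.0, -0.01, 0.0],
]
offset = compute_tare(rest_samples)

# Hypothetical reading during insertion: 25 N raw on fz.
tared = apply_tare([0.0, 0.0, 25.0, 0.0, 0.0, 0.0], offset)
```

With these made-up numbers the tared fz comes out near 5 N, which is the contact force the policy should see regardless of which gripper is mounted.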
Rangers-Intrinsic is distributed across three time zones. Tommy Ly (NY) leads the project and runs ML alongside a full-time day job. Shi Hao (London) handles RL and ACT training; 17+ W&B experiments logged. His independent portfolio and Project Automaton write-up: sienarindustries.com. Enrique (London) built the simulation environment, data recorder, and HuggingFace dataset. Ali (London) built the scenario generator and ran 1,000 brute-force collection trials. Vladimir is a Senior ML Scientist at Booking.com specialising in RL for LLMs/VLMs. Sanjay (Illinois) leads diffusion policy. Justin (Boston) leads PPO exploration. Evan (London) manages AWS org-level infrastructure.
Social situations are genuinely difficult to navigate for many neurodivergent people — not because of any deficiency, but because the implicit communication layer of human interaction (body language, microexpressions, tone) is processed differently. No tool currently addresses all three channels simultaneously in real time. NeuroCue is a quiet, non-intrusive scaffolding system — not a replacement for social instinct, but a support layer for those who need it.
Three concurrent Python threads share a single threading.Lock state object, fully decoupling data production from LLM consumption:
Trained from scratch during the hackathon. ResNet-style residual blocks with Squeeze-Excite channel attention. 7-class softmax output with smart compound expression mapping — the model doesn't just output a single emotion, it reads the probability distribution: high happy + high fear → "smiling but may be masking discomfort"; surprise + fear → "looks confused." Graceful degradation built in — if fer_model_best.pt fails to load, the system continues on body language and transcript only, no crash.
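The compound mapping can be illustrated with a small rule table over the softmax output. This is a sketch under assumptions: the function name `describe_expression`, the class label strings and the 0.3 threshold are hypothetical, and the real mapping may use different rules or cut-offs.

```python
def describe_expression(probs, high=0.3):
    """Map a class-probability dict to a short description.

    `high` is an assumed threshold for treating a class as strongly present.
    """
    if not probs:
        return "unknown"
    happy = probs.get("happy", 0.0)
    fear = probs.get("fear", 0.0)
    surprise = probs.get("surprise", 0.0)

    # Compound states: two classes are strongly present at the same time.
    if happy >= high and fear >= high:
        return "smiling but may be masking discomfort"
    if surprise >= high and fear >= high:
        return "looks confused"

    # Otherwise fall back to the single most likely class.
    return max(probs, key=probs.get)


masked = describe_expression({"happy": 0.45, "fear": 0.40, "neutral": 0.15})
confused = describe_expression({"surprise": 0.40, "fear": 0.35, "neutral": 0.25})
plain = describe_expression({"neutral": 0.80, "happy": 0.20})
```

Reading the whole distribution rather than the argmax is what lets two moderately strong classes produce a description that neither would produce alone.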
The three data streams run on entirely different cadences — GPU inference has variable network latency, FER runs every 5s locally, and audio records continuously. The naive approach (waiting for synchronous responses from all three) would have made the system fragile and latency-bound. The solution was to treat the shared state object as a bulletin board: each thread posts its latest output whenever it's ready, and the LLM call reads the whole board atomically on its own 15-second clock. This decoupling is what makes the system robust in live conditions.
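The bulletin-board pattern can be sketched with the standard library alone. The class name `SharedState`, the keys and the short sleep periods are assumptions for illustration; the real producers are model inference and audio capture, and the real consumer runs on a 15-second clock rather than after the producers finish.

```python
import threading
import time


class SharedState:
    """A lock-protected board holding the latest output of each producer."""

    def __init__(self):
        self._lock = threading.Lock()
        self._board = {"pose": None, "emotion": None, "transcript": None}

    def post(self, key, value):
        # Producers overwrite their own slot whenever a new result is ready.
        with self._lock:
            self._board[key] = value

    def snapshot(self):
        # The consumer copies the whole board in one critical section.
        with self._lock:
            return dict(self._board)


def producer(state, key, values, period):
    """Stand-in for a sensor thread: post each value, then wait."""
    for value in values:
        state.post(key, value)
        time.sleep(period)


state = SharedState()
threads = [
    threading.Thread(target=producer, args=(state, "pose", ["arms crossed", "leaning in"], 0.01)),
    threading.Thread(target=producer, args=(state, "emotion", ["neutral", "happy"], 0.02)),
    threading.Thread(target=producer, args=(state, "transcript", ["hello", "hello there"], 0.005)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

latest = state.snapshot()
```

Because no producer ever waits on another, a slow or failed stream only leaves its slot stale; the consumer still gets a consistent copy of whatever is on the board.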
The NeuroCue session report interface — showing emotion timeline, body language summary, and voice analysis in a single view. Built with the B2B enterprise use case in mind: exportable session reports, multi-signal visualisation, and a clean dashboard for reviewing what the system detected.
Left: Session report view — 15-minute session, 7 tips given, dominant emotion Neutral. Body language and voice analysis panels. Right: Emotions panel — real-time probability distribution and arousal/valence metrics.
Developing into a B2B SaaS product — target markets: HR departments, autism support organisations, and enterprise training platforms.
A standard on-chain stablecoin transfer publicly reveals the buyer's address, the merchant's address, the exact payment amount, and the precise timestamp. This metadata leakage creates real risks. FZAP ensures there is never a direct blockchain transaction between a buyer and a merchant — without becoming an arbitrary privacy mixer (which would be a regulatory problem).
1. One-time-use ephemeral identities. Each transaction generates a fresh set of cryptographically secure wallet identifiers, unique within the protocol lifetime. No reuse, no linkage.
2. Aggregation-based settlement. Buyer deposits are pooled before settlement — conceptually an omnibus account. Once aggregated, individual payment amounts lose their one-to-one correspondence with specific wallets. A deliberate propagation delay further weakens timing-based correlation attacks.
3. Multi-hop QAOA-optimised routing. Aggregated funds are converted across stablecoins and chains. Possible transitions are modelled as a graph — nodes represent currencies, edge weights represent swap and bridge fees. QAOA (via Qiskit) searches this cost landscape to identify routing configurations that minimise aggregate loss. Real-time pricing from the Flare Time Series Oracle (FTSO) API — temporary stablecoin depegs are modelled as negative cost offsets to preserve pooled value.
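To make the routing graph concrete, here is a classical baseline over the same kind of cost graph. It is explicitly not QAOA and not the protocol's code: it uses Bellman-Ford, chosen because depeg offsets can make edge costs negative, and the node names, fees and function name `cheapest_route` are hypothetical.

```python
def cheapest_route(edges, source, target):
    """Bellman-Ford shortest path; edges are (from_node, to_node, cost)."""
    inf = float("inf")
    nodes = {n for u, v, _ in edges for n in (u, v)}
    cost = {n: inf for n in nodes}
    cost[source] = 0.0
    prev = {}

    # Relax every edge up to |V| - 1 times.
    for _ in range(len(nodes) - 1):
        for u, v, w in edges:
            if cost[u] + w < cost[v]:
                cost[v] = cost[u] + w
                prev[v] = u

    # One more pass: any further improvement means a negative-cost cycle.
    for u, v, w in edges:
        if cost[u] + w < cost[v]:
            raise ValueError("negative-cost cycle in routing graph")

    if cost.get(target, inf) == inf:
        raise ValueError("no route from source to target")

    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return cost[target], path[::-1]


# Hypothetical fees in basis points; the negative edge models a temporary depeg offset.
edges = [
    ("USDC:chainA", "USDT:chainA", 5.0),
    ("USDT:chainA", "USDT:chainB", 12.0),
    ("USDC:chainA", "USDC:chainB", 20.0),
    ("USDC:chainB", "USDT:chainB", -4.0),
]
total_cost, route = cheapest_route(edges, "USDC:chainA", "USDT:chainB")
```

A baseline like this is useful mainly as a reference point: any routing produced by a QAOA run on the same graph can be compared against it for aggregate loss.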
For each settlement, an attestation request references the transaction hash, destination chain, and payment type. This is processed during a Flare Data Connector (FDC) voting round. Once finalised, the result is committed to a Merkle tree. Merchants submit a Merkle proof to the settlement verification smart contract, which validates inclusion against the published FDC root. A settlement is only considered complete once verified on-chain — no linkage is ever created between buyers, intermediate hops, and the final merchant payment.
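The inclusion check at the heart of this flow can be sketched off-chain. This is an illustration only: SHA-256 is used because it ships with Python, whereas the hash function and pair ordering used by the on-chain contract are not specified in this write-up, and the function names and leaf payloads are hypothetical.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root_and_proof(leaves, index):
    """Return (root, proof) for leaves[index].

    The proof is a list of (sibling_hash, sibling_is_left) pairs, bottom-up.
    """
    if not leaves:
        raise ValueError("cannot build a Merkle tree with no leaves")
    level = [_h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify(leaf, proof, root):
    """Recompute the path from leaf to root and compare with the published root."""
    node = _h(leaf)
    for sibling, sibling_is_left in proof:
        node = _h(sibling + node) if sibling_is_left else _h(node + sibling)
    return node == root


attestations = [b"settlement-1", b"settlement-2", b"settlement-3"]
root, proof = merkle_root_and_proof(attestations, 2)
ok = verify(b"settlement-3", proof, root)
forged = verify(b"settlement-forged", proof, root)
```

The verifier needs only the leaf, the proof and the published root, which is why a merchant can prove inclusion without the contract ever seeing the other settlements in the round.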
Unlike traditional privacy mixers, FZAP provides no arbitrary anonymisation or user-controlled withdrawal paths. All value flows are explicitly tied to merchant payments; withdrawals are restricted to predefined settlement addresses. This is what makes it commercially deployable rather than a regulatory liability — and the distinction the sponsors found compelling.
Instead of using python-chess, all core mechanics — valid move generation, checkmate detection, board state management — are implemented from the ground up. This gives full control over the game tree and was the point: I wanted to understand exactly what was happening at every level, not call a function and trust it.
The engine uses a Minimax algorithm to navigate the decision tree — identifying the move that maximises its advantage while assuming the opponent plays optimally to minimise it. Alpha-Beta Pruning cuts off branches that are mathematically proven to be worse than previously explored options, significantly reducing computation time. The Negamax variant is used where applicable, handling evaluation from a single perspective by negating scores for the opponent.
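A minimal sketch of negamax with alpha-beta pruning on a toy tree shows the mechanics. It is not the engine's code: the tree is a nested list whose leaves are scores from the perspective of the side to move at that leaf, and the `visited` list exists only to show which leaves were evaluated.

```python
def negamax(node, alpha, beta, visited):
    """Return the value of `node` for the side to move, pruning with alpha-beta."""
    if isinstance(node, (int, float)):
        visited.append(node)  # record evaluated leaves to make pruning visible
        return node
    best = float("-inf")
    for child in node:
        # The child's value is from the opponent's perspective, so negate it
        # and swap/negate the search window.
        score = -negamax(child, -beta, -alpha, visited)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # the opponent already has a better option elsewhere
    return best


# Two replies for the root player; each reply leads to two leaf positions.
tree = [[3, 5], [2, 9]]
visited = []
value = negamax(tree, float("-inf"), float("inf"), visited)
```

Here the root value is 3, and the leaf scored 9 is never evaluated: once the second branch is known to yield at most 2 for the root player, the rest of it cannot change the decision.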
Traditional tree searches are limited by the exponential growth of possible moves. MCTS is integrated to balance exploration vs. exploitation. Selection and expansion use Minimax logic to prioritise high-value nodes. Simulation uses a pre-trained Transformer to estimate game outcomes from a given state — far more efficient than searching the entire subtree. Results back-propagate up the tree to update the value of each move.
FEN notation parsing — Forsyth-Edwards Notation is the standard for encoding board states. Implemented a full BNF-grammar-based parser so the engine can import, export, and resume any position.
Endgame tablebases — pre-computed optimal play for positions with few pieces remaining, used to guide the engine in the endgame rather than searching from scratch.
SQLite-backed history — user accounts and game history stored relationally, enabling game replay, analysis, and statistical aggregation across sessions.
Cycle detection — the search tree identifies and ignores repetitive move sequences to prevent infinite loops and enforce threefold repetition rules correctly.
Pygame interface — custom graphical board with piece loading, move highlighting, drag-and-drop input, and a real-time evaluation bar.
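As an illustration of the FEN handling listed above, here is a simple field-splitting parser. It is a sketch, not the engine's BNF-grammar-based parser: the function name `parse_fen` and the returned dictionary layout are assumptions, and validation is limited to field counts and piece characters.

```python
def parse_fen(fen):
    """Parse a FEN string into a board (8 rows of 8 squares) plus game state."""
    fields = fen.split()
    if len(fields) != 6:
        raise ValueError("FEN must have six space-separated fields")
    placement, side, castling, en_passant, halfmove, fullmove = fields

    ranks = placement.split("/")
    if len(ranks) != 8:
        raise ValueError("piece placement must describe eight ranks")

    board = []
    for rank in ranks:  # FEN lists rank 8 first, rank 1 last
        row = []
        for ch in rank:
            if ch in "12345678":
                row.extend([None] * int(ch))  # a digit means that many empty squares
            elif ch in "pnbrqkPNBRQK":
                row.append(ch)
            else:
                raise ValueError(f"unexpected character in placement: {ch!r}")
        if len(row) != 8:
            raise ValueError("each rank must describe exactly eight squares")
        board.append(row)

    if side not in ("w", "b"):
        raise ValueError("side to move must be 'w' or 'b'")

    return {
        "board": board,
        "side_to_move": side,
        "castling": castling,
        "en_passant": None if en_passant == "-" else en_passant,
        "halfmove_clock": int(halfmove),
        "fullmove_number": int(fullmove),
    }


start = parse_fen("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")
```

Exporting is the reverse walk over the same structure, which is what makes resuming an arbitrary position straightforward.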
Formulate a high-frequency trading strategy to trade 4 assets, maximising the Sharpe ratio using market making and arbitrage strategies, such that PnL can be evaluated in under a minute.
Instead of quoting around the raw mid-price, the engine calculates a reservation price: mid + predicted short-term drift (μᵢ) + inventory penalty (k_inv). This skews quotes away from the direction of inventory exposure, reducing position risk. Spreads are dynamically widened or narrowed based on estimated volatility (σᵢ) and current risk-aversion parameters — wider in high-volatility regimes, tighter in stable ones.
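The quoting logic can be sketched as follows. The formulas are simplifying assumptions that follow the description rather than the exact Avellaneda-Stoikov closed form: the inventory penalty is taken as `k_inv * inventory` and subtracted, so a long position pushes quotes down, and the half-spread is taken as linear in volatility. All parameter names and values are hypothetical.

```python
def make_quotes(mid, drift, inventory, k_inv, sigma, base_half_spread, risk_aversion):
    """Return (bid, ask) around an inventory- and drift-adjusted reservation price."""
    # Long inventory (positive) lowers the reservation price, short raises it.
    reservation = mid + drift - k_inv * inventory
    # Wider quotes in high-volatility regimes, tighter in stable ones.
    half_spread = base_half_spread + risk_aversion * sigma
    return reservation - half_spread, reservation + half_spread


flat_bid, flat_ask = make_quotes(100.0, 0.0, 0, 0.01, 0.5, 0.05, 0.1)
long_bid, long_ask = make_quotes(100.0, 0.0, 10, 0.01, 0.5, 0.05, 0.1)
calm_bid, calm_ask = make_quotes(100.0, 0.0, 0, 0.01, 0.1, 0.05, 0.1)
```

With a long position both quotes sit lower than in the flat case, which makes the ask more likely to be hit and the bid less likely, so fills tend to reduce the exposure.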
Short-horizon predictors (XGBoost / LightGBM) forecast price movement over the next k ticks. Aggressive market orders are only executed when the signal-to-noise ratio E[R] / σ_est exceeds a calibrated threshold. Quotes are also aggressively skewed away from the predicted direction to avoid being filled on the wrong side of a trend (adverse selection avoidance).
Capital is allocated across the four assets by solving a mean-variance proxy: minimise J(w) = −wᵀμ + λwᵀΣw, subject to leverage caps, dollar neutrality, and maximum per-asset exposure. Recomputed periodically via scipy.optimize.
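For intuition, the unconstrained version of this objective has a closed-form minimiser. The derivation below assumes Σ is symmetric positive definite and λ > 0, and it ignores the leverage, neutrality and exposure constraints, which is why the full problem is solved numerically.

```latex
\begin{aligned}
J(w) &= -w^{\top}\mu + \lambda\, w^{\top}\Sigma w \\
\nabla_w J(w) &= -\mu + 2\lambda\,\Sigma w = 0 \\
w^{\star} &= \frac{1}{2\lambda}\,\Sigma^{-1}\mu
\end{aligned}
```

As a worked example with made-up numbers: for two uncorrelated assets with μ = (0.02, 0.01), Σ = diag(0.04, 0.01) and λ = 1, Σ⁻¹μ = (0.5, 1.0), so w* = (0.25, 0.5). The lower-return asset receives the larger weight because its variance is four times smaller.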
Raw Limit Order Book snapshots (top 5 levels) transformed into: mid-price, microprice, spread, multi-level imbalance, depth sums, queue imbalances, order-flow deltas, short-term realised volatility, VWAP, and rolling means. These features feed both the market-making reservation price and the directional predictors.
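A few of these features can be computed from a single snapshot as below. This is a sketch under assumptions: the function name `lob_features`, the `(price, size)` tuple layout with the best level first, and the sample book are hypothetical, and only four of the listed features are shown.

```python
def lob_features(bids, asks):
    """Compute basic features from one order book snapshot.

    `bids` and `asks` are lists of (price, size), best level first.
    """
    best_bid, bid_size = bids[0]
    best_ask, ask_size = asks[0]

    mid = (best_bid + best_ask) / 2
    spread = best_ask - best_bid
    # Microprice: the mid weighted by opposite-side size, so a heavy bid
    # queue pulls the estimate towards the ask.
    microprice = (best_bid * ask_size + best_ask * bid_size) / (bid_size + ask_size)

    bid_depth = sum(size for _, size in bids)
    ask_depth = sum(size for _, size in asks)
    imbalance = (bid_depth - ask_depth) / (bid_depth + ask_depth)

    return {
        "mid": mid,
        "spread": spread,
        "microprice": microprice,
        "imbalance": imbalance,
    }


features = lob_features(
    bids=[(99.0, 30), (98.0, 10)],
    asks=[(101.0, 10), (102.0, 10)],
)
```

In this example the bid side is heavier, so the microprice sits above the mid and the imbalance is positive, both pointing to short-term upward pressure.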
Later extended using Multi-Agent Reinforcement Learning, inspired by Google DeepMind's research on cooperative MARL. Each asset is treated as a separate agent with its own policy, sharing a global Sharpe ratio reward signal — agents learn to coordinate rather than compete for capital.
Many details are highly confidential. Simulating a drone test flight as a proof of concept, to investigate closed-loop behaviour under nominal and perturbed wind fields.
Working on confidential robotics simulation and model-evaluation tasks across MuJoCo and world-sim / world-model workflows. The work includes assessing model-generated MuJoCo code and responses, reviewing simulation-oriented reasoning, and contributing to production-level robotics simulation evaluation without disclosing proprietary task details.
Insight into quantitative trading and research at SIG. 2nd place in Susquehanna's Guesstimathon — a structured estimation competition testing rapid probabilistic reasoning under uncertainty.
Below 1% acceptance rate. Software: incident management via a real-time simulation. Trading: completed the training activities given to quant trading interns, including Jane Street's in-house designed games: Figgie and the Estimathon. Came 2nd in both of these. I thoroughly enjoyed it and want to further my understanding of the trading mindset.
Graduated from a postgraduate-level quantum computing programme at 17. Focused on quantum algorithms for non-linear PDEs: Carleman Linearisation, QFT, Phase Kickback. I sought this out independently; it stems from a pure curiosity about what quantum computing can actually do that classical computing cannot.
Founded a Code Club in a local library where I designed and ran weekly sessions on Algorithmic Thinking, Principles of Programming, and Python for children aged 9–13. An environment purely focused on building curiosity for computing is something I find energising to create as well as to be in.
Year-long industry placement designing and building an autonomous medical aid delivery robot. PWM for motor control, ultrasonic sensing, GPS navigation, and OpenCV for obstacle avoidance. Won the Project of the Year award.
Selected to participate in the Summer of Quantum Camp, run by MIT and Caltech researchers. First formal exposure to quantum computing — sparked the self-directed study that led to the Womanium programme two years later.
Moments, setups, and events — the texture of the work, not just the outputs. An honest record of what it actually looks like.
Gazebo Sim (Open Robotics) and RViz running simultaneously with the UR5e arm rendered in the connector board environment. The right panel is NVIDIA's Isaac Sim. This image is a demonstration of a typical setup.
The first thing that becomes clear when you join a team of professional ML engineers mid-competition is how much you don't know — and, in my case, how quickly you need to decide whether that is a reason to hesitate or a reason to move faster. I joined Rangers-Intrinsic in April 2026, two weeks before the submission deadline for the Intrinsic AI for Industry Challenge: a robotics competition hosted by Intrinsic (Google/Alphabet) and Open Robotics, with a $180,000 prize pool, asking teams to train a UR5e robot arm to insert SFP and NIC cables into a connector board in a simulated environment. The team had already been running for weeks. I was not there at the start.
The GitHub repository has branches for the evaluator and for training; the experiment tracker on Weights and Biases has seventeen logged runs. I came in with a strong ML background but essentially no robotics experience, which meant I had to learn the simulation stack (ROS 2, Gazebo from Open Robotics, Isaac Lab from NVIDIA, the AIC evaluation engine) while simultaneously being useful on the ML side, which is where I was actually needed.
The learning curve is steep in a specific way I had not anticipated. Some concepts were already familiar: ACT (Action Chunking with Transformers) is a transformer-based imitation learning method, and flow-matching is a way of training continuous normalising flows, both of which I can work through from first principles. The difficulty was that the gap between understanding an architecture and getting it to run correctly inside a specific evaluation pipeline, on a specific dataset format, with a specific action space, is substantial. The team's recording pipeline captures 19-dimensional actions; the AIC standard is 6-dimensional. The observation space is 48 dimensions vs. the standard 26. Every discrepancy is a decision, and every decision has downstream consequences for training.
RViz: the UR5e model with its full TF frame tree. The overlapping labels are a sign of a misconfigured transform tree — one of those bugs that takes considerably longer to find than it should.
What I have found genuinely surprising is how much of the work at this level is not about the models at all. It is about infrastructure: getting checkpoints to load, debugging why GPU utilisation is 15% instead of 90% (almost certainly a CPU DataLoader bottleneck — more workers, pin_memory, prefetching), building the evaluation loop so that a trained model can actually be scored by the competition engine. The people who are effective on this team are effective because they can hold the full stack in their head simultaneously and move between layers without losing the thread. That is a skill I am actively developing.
My specific contributions: ACT and flow-matching policy training, W&B experiment tracking, data curation (filtering successful insertions from a brute-force collected dataset — the oracle cheat-code policy fails non-deterministically because cables get physically stuck in Zone 2 of the connector board). I am also the person most likely to be investigating the DataLoader bottleneck at 1am, which is, I think, a reasonable description of where I currently sit in the project hierarchy. Moving up that hierarchy, quickly, is the goal.
Forty seconds after launch. The contrail is already crossing the frame on all three feeds simultaneously.
I set up three simultaneous live feeds — laptop, external monitor, tablet, arranged in a shallow arc across my desk — to watch the Artemis II launch. This is not a remotely efficient use of a morning and I am not apologetic about it. Some things are worth the hours they cost.
Artemis II carries four astronauts on a lunar flyby: the first human beings to travel to the Moon's vicinity since Apollo 17 in December 1972, a gap of more than fifty years that is either a remarkable fact about the difficulty of the undertaking or a dispiriting fact about institutional priorities, depending on your mood. The Space Launch System is not the most elegant vehicle that has ever been pointed at the sky — SpaceX would, and does, have opinions about this — but the 8.8 million pounds of thrust at ignition produces a particular quality of awe that is not really about elegance.
The same desk, after launch. Three screens still running, the Gonville and Caius card visible on the wall to the right. The cracked monitor has been through a lot.
The reason I care about this — beyond the obvious spectacle of it, which is considerable — is the same reason I care about quantum computing and robotics: there are engineering problems that exist at the genuine frontier of what is physically possible, and watching them get solved, slowly and expensively and in full public view, feels like one of the more honest ways to spend an afternoon. The RS-25 engines at the base of the SLS operate on the same combustion cycle as the Space Shuttle main engines, built decades ago, but manufactured now to tolerances that did not exist when the original specification was written. That compression of time — the same fundamental design, made incrementally more precise across half a century — is what real engineering progress actually looks like, as opposed to what it looks like in a press release.
The wall behind the monitors, if you can make it out: a poster from the Goethe Institut — Das Leben ist kein Ponyhof, life is no pony farm — a card from the Cambridge CS department, a collection of origami flowers I have been folding at my desk while thinking about difficult problems. A reasonably honest cross-section of what goes on in this room on a Saturday morning.
Jane Street Europe. The circular logo is the same one on the cup on my shelf. The building is exactly what you'd expect and somehow still impressive.
I came second in Jane Street's Estimathon during FOCUS week, and I want to be precise about what that means and, more importantly, what it doesn't — because the Estimathon is not the kind of competition where second place is a comfortable shorthand for anything in particular.
The format is fifteen estimation problems in thirty minutes, scored by the inverse of your error across all of them. The questions span several orders of magnitude of difficulty, from questions that reward basic numerical intuition to ones that require a genuine understanding of physical constants, biological scales, or historical data that most people have never thought to commit to memory. The people who perform well are not, in my observation, the people who happen to know the most. They are the people who have learned to commit quickly to an estimate that is defensible, update it cleanly when new information arrives, and resist the specific kind of paralysis that comes from wanting to be more exact than the question actually requires. Approximation, wielded deliberately, is the entire game.
What surprised me, and has stayed with me since, is how closely the disposition that does well in the Estimathon resembles the disposition that seems to do well in research more generally. The Figgie card game — Jane Street's own invention, built explicitly to teach probability and market-making — makes this structure explicit in a way that is almost pedagogical: you are buying and selling contracts whose underlying value is uncertain, updating your expected value in real time as the market reveals information. It is Bayesian reasoning conducted under time pressure and competitive incentive. I found it completely absorbing in a way that told me something about myself that was useful to know.
What I have been sitting with since is a cleaner sense of where, within a firm like Jane Street, I would actually want to operate. The quant research side — building models, forming and testing hypotheses about market structure, working at the boundary between mathematics and financial reality — is where my interests genuinely lie. The pure execution side is a different kind of problem, and an important one, but it is not the problem that wakes me up in the morning. That distinction is worth being honest about when thinking carefully about where to direct the next several years of effort.
The Royal Exchange at night, walking back from Jane Street. The City looks best in the rain at 8pm. I stand by this.
Certificate from the Qubit by Qubit High School Quantum Computing Camp, July 2023. Instructors drawn from MIT and Caltech. I was fifteen.
The Summer of Quantum programme — run by The Coding School, with instructors and researchers drawn from MIT and Caltech — was my first formal encounter with quantum computing, and the thing I most remember about it is how quickly I became convinced that this was a field worth taking seriously. Not because of the applications, which were at that point largely hypothetical, but because of the mathematics. Quantum mechanics makes contact with linear algebra in a way that is both completely natural and genuinely surprising, and the moment that connection clarified — that a quantum state is a vector in Hilbert space, that a quantum gate is a unitary transformation, that measurement is projection — I understood that I was looking at something that was going to occupy a significant portion of my thinking for a long time.
I was fifteen. I know that sounds like a detail one mentions to impress, and I want to be honest that I am mentioning it partly for that reason, but also because it matters to the shape of what happened next. I was young enough that I did not yet have a clear sense of what was within reach and what was not, which meant I had not yet developed the particular habit of caution that tends to accumulate with experience. So when the programme ended, I wrote a paper about what I had learned.
The paper was not, if I am being truthful, a very good piece of academic writing. I was fifteen and I had been doing quantum computing for the better part of two weeks. What it was, was sincere — a genuine attempt to work through the material carefully, to understand not just the results but the derivations, to articulate what was clear and to identify honestly what was not. I wrote it because the act of writing something down is how I find out whether I actually understand it. The paper was the test, not the certificate.
What I did not anticipate was where the paper would lead. I submitted it to an essay competition — one with thousands of entries — not with any particular expectation, but because it existed and it seemed like the thing to do with it. Winning was a surprise. Being invited to Cambridge on the strength of it was a larger one. I met the Director of Studies in Computer Science at Gonville and Caius College, and visited the college properly — the courtyard, the sundial tower, the Hawking plaque set into the stones. I want to be careful not to overstate what the meeting was — it was a conversation, not an admission offer — but it was the first time someone at that level of academic seniority had taken my thinking seriously on the basis of something I had produced myself, and that is its own category of thing regardless of what formally followed.
What I took from it, practically: the paper mattered not because it was good, but because it existed and I submitted it. Writing something down is one act; sending it somewhere is a second and distinct act, with its own nonzero probability of producing something unexpected. Most of the time the probability is low. Fifteen-year-old me, writing a quantum computing paper in a school holiday and submitting it to a competition with thousands of entries, had a better intuition about this than she was consciously aware of.
Gonville and Caius College, Cambridge: the Stephen Hawking plaque, with the Bekenstein-Hawking entropy formula S = kc³A/(4ℏG). It is set into the courtyard floor. Remember to look up at the stars and not down at your feet.
Gonville and Caius College. The sundial tower dates from the seventeenth century. The college has been here longer than most of the ideas I had come to discuss.
The workshop. Safety glasses on, metal frame clamped to the bench, arguing about whether the holes are in the right place. They were not.
BABY — Battlefield Aid Brought to You — was a year-long project conducted under Leonardo Defence Systems' Industrial Cadets programme, with a brief that was refreshingly unambiguous: design and build a fully autonomous ground vehicle capable of navigating to a target location, detecting and avoiding obstacles, and delivering a medical payload under its own guidance. No templates, no reference designs, no starter kit. A sheet of aluminium and a list of constraints.
The gap between understanding how pulse-width modulation works — which is a matter of an afternoon and a textbook — and getting two motors to spin smoothly and in coordination without overheating the driver board, is not a small gap. It is, in practice, a gap that takes weeks to close, and the closing of it involves a particular kind of learning that no amount of reading accelerates. The ultrasonic sensor triggered false positives from reflective floors. The GPS module had a five-metre accuracy radius, which is an enormous circle when the corridor you are navigating is three metres wide, so we built a visual landmark detection system as a fallback. Every subsystem, tested in isolation, behaved exactly as it should. Every subsystem, integrated with the others, produced behaviour that none of us had anticipated. That is engineering. Not the clean version described in lectures — the actual version, which is mostly a negotiation between what you designed and what the world does to it.
The chassis before electronics. Tank tracks, differential drive motors, aluminium frame. Heavier than expected, which became a tuning problem.
Internals: ultrasonic sensor, motor controllers, wiring harness. Eventually this got tidied up.
BABY navigating the test corridor autonomously. The moment this worked without intervention was legitimately exciting.
We won Project of the Year nationally, from thousands of submissions across schools in the United Kingdom, and received the Industrial Cadets Gold Award under the patronage of HM King Charles III. I am genuinely proud of both. But the thing I carry from that year is not the award — it is the specific quality of knowledge that comes from having built something physical, something that either works or doesn't, in a domain where there is no compiler error to blame and no stack trace to follow. The constraints are physics, time, and the tools you have to hand. That is a different kind of problem-solving from what I had practised before, and it changed the way I approach every problem I have worked on since.
CSES awards ceremony. The Project of the Year trophy was presented by the Mayor of Chelmsford. The trophy is much larger in person.
Receiving the Arkwright Engineering Scholarship on stage from two RAF officers. The slide reads: Royal Air Force. The hall is full. This was not something I had rehearsed for.
The Arkwright Engineering Scholarship is awarded each year to approximately two hundred students nationally, selected from a substantially larger pool by the Smallpeice Trust, with sponsoring organisations that include the Royal Air Force. The formal benefits are well-documented — mentorship, industry access, a sustained relationship with engineers working at the frontier of aerospace and defence technology. One of the less-expected benefits was a session in a Boeing 737 full-motion flight simulator, which is where most of what follows comes from.
The Boeing 737 simulator. Every switch and dial is functional. The simulation is indistinguishable from real flight for the scenarios you would encounter in training.
A cockpit is, among other things, a physical argument about how to organise information under conditions of time pressure and high consequence — what needs to be within arm's reach and why, what must be legible at a glance and in what sequence, how redundancy is layered into every critical system so that no single point of failure is permitted to become catastrophic. The overhead panel alone contains more switches than most people interact with in a week, each one corresponding to a specific failure mode that someone, at some point, encountered in a real aircraft and decided should be addressable from a standardised position. The entire cockpit is a compressed history of disasters that were survived and lessons that were learned from them.
The overhead panel: IRS alignment, electrical systems, hydraulics, flight controls. Understanding why each thing is where it is takes longer than one session.
What I took away from the session is not a desire to fly — though the sensation of a full-motion simulator banking through a crosswind approach is genuinely something — but a lasting appreciation for what interface design looks like when the consequences of getting it wrong are not a user complaint but a loss of life. Every decision in a cockpit has been reviewed, challenged, revised, and reviewed again. Nothing is arbitrary. The distance between that standard and the standard applied to most software interfaces is considerable, and I think about it often when I am building something that other people will depend on.
What I have not written about, and probably should, is what happened after the ceremony. The scholarship came with a named RAF sponsor — a senior officer who had presented the award on stage — and I made the decision to reach out to him directly, which is not, I suspect, what most scholars do. He responded. Over the following two years, that initial contact became a genuine mentoring relationship: conversations about engineering, about career, about what the RAF looks for in the people it develops, about the gap between what is taught in classrooms and what is required in operational environments. It was, in a quiet way, one of the more formative relationships of those years.
Through that relationship came access to things I would not otherwise have encountered. Flight lessons — actual lessons, in a light aircraft, not just a simulator session. A visit to RAF headquarters and to one of their operational bases. The 737 full-motion simulator session I have already described, which was, technically, part of this same extended access. All of it came from a single decision to send an email to someone whose address was on a certificate I had just been handed. I think about that occasionally when I am deciding whether or not to reach out to someone whose work I find interesting. The worst outcome is no reply.
Papers, challenges, and resources that have occupied my attention recently. Shared not as passive bookmarks but as genuine recommendations — each has informed my thinking in some measurable way.
One of those papers that ought to be mandatory reading for anyone working in deep learning. Language model loss scales as a power law with model size, dataset size, and compute — trends holding across seven orders of magnitude. Architectural details like depth and width matter remarkably little within a wide range.
The key insight: larger models are dramatically more sample-efficient. The optimal strategy under a fixed compute budget is to train a very large model on comparatively modest data and stop well before convergence. This is counterintuitive, but the empirical evidence is unambiguous. Much of how frontier training runs are planned today traces back to this paper.
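To make the power-law claim concrete, here is a minimal sketch of how such a trend is usually checked: fit a straight line in log-log space. The data are synthetic, and the constants `TRUE_A` and `TRUE_ALPHA` are illustrative assumptions, not values taken from the paper.

```python
import math
import random


def fit_power_law(sizes, losses):
    """Least-squares fit of log L = log a - alpha * log N. Returns (a, alpha)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic "loss vs. parameter count" points with a little multiplicative noise.
TRUE_A, TRUE_ALPHA = 12.0, 0.08  # illustrative only, not the paper's constants
rng = random.Random(0)
sizes = [10 ** k for k in range(3, 10)]
losses = [TRUE_A * n ** -TRUE_ALPHA * math.exp(rng.gauss(0.0, 0.01)) for n in sizes]

a_hat, alpha_hat = fit_power_law(sizes, losses)
print(f"fitted a = {a_hat:.2f}, fitted alpha = {alpha_hat:.3f}")
```

A power law is a straight line on log-log axes, so the fitted slope recovers the exponent. A small exponent like this one means each tenfold increase in model size buys only a modest, but very predictable, drop in loss.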
An open research challenge I am actively participating in. Train the best language model within a 16 MB artifact — weights and training code combined — in under 10 minutes on 8×H100 GPUs. Performance is measured in bits per byte on a held-out FineWeb validation set, making it tokeniser-agnostic.
This is L(N) optimisation in the most constrained form: the limit forces genuinely creative thinking — depth recurrence, aggressive parameter tying, quantisation-aware training from scratch, novel tokenisers, test-time compute tricks. OpenAI are offering $1M in compute credits through RunPod; standout participants may be invited to interview for research positions. The current leaderboard sits around 1.08 BPB.
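Why bits per byte is tokeniser-agnostic is easiest to see in code. The sketch below is my own illustration, not the challenge's evaluation script: the function name and the two hypothetical tokenisations are assumptions. It divides the total negative log-likelihood, converted to bits, by the UTF-8 byte count of the text.

```python
import math


def bits_per_byte(token_nll_nats, text):
    """Total negative log-likelihood in bits divided by the UTF-8 byte count."""
    total_bits = sum(token_nll_nats) / math.log(2)
    n_bytes = len(text.encode("utf-8"))
    return total_bits / n_bytes


text = "hello world"  # 11 bytes in UTF-8

# Two hypothetical models that assign the same total probability to the text
# but split it into different numbers of tokens.
coarse_tokens = [5.5 * math.log(2)] * 2   # 2 tokens, 5.5 bits each
fine_tokens = [1.0 * math.log(2)] * 11    # 11 tokens, 1 bit each

print(bits_per_byte(coarse_tokens, text), bits_per_byte(fine_tokens, text))
```

Per-token loss differs by a factor of 5.5 between the two models, yet both score the same bits per byte, which is why the metric allows comparison across tokenisers.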
The direct ancestor of Parameter Golf. The community has taken Andrej Karpathy's 45-minute GPT-2 replication and driven it to approximately 1.45 minutes through 75 successive records — an extraordinary collaborative effort spanning optimiser design, architectural innovation, and systems engineering.
The techniques catalogue here is invaluable: Muon optimiser with Newton-Schulz orthogonalisation, RoPE, QK-Norm, ReLU² activations, value embeddings, FlexAttention with sliding window warmup, fused Triton kernels, multi-token prediction, bigram hash embeddings, and partitioned hyperconnections. If you care about efficient training at all, study the record history. Every entry is a masterclass in extracting marginal gains.
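As a rough illustration of what Newton-Schulz orthogonalisation does, here is the textbook cubic iteration in plain Python. This is a sketch under assumptions: as far as I know, Muon itself uses a tuned higher-order polynomial and runs on GPU tensors, so this is not its implementation. The helper names are mine, and the input is assumed to be a nonzero, full-rank matrix.

```python
import math


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def newton_schulz_orthogonalise(g, steps=20):
    """Push every singular value of g towards 1 while keeping its singular vectors.

    Uses the classical update X <- 1.5 X - 0.5 X X^T X after scaling by the
    Frobenius norm, which puts all singular values inside the convergence region.
    """
    norm = math.sqrt(sum(v * v for row in g for v in row))
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * x[i][j] - 0.5 * xxtx[i][j] for j in range(len(x[0]))]
             for i in range(len(x))]
    return x


q = newton_schulz_orthogonalise([[2.0, 1.0], [0.0, 1.0]])
qtq = matmul(transpose(q), q)  # should be close to the identity matrix
print(qtq)
```

The appeal for an optimiser is that the whole procedure is matrix multiplications, with no explicit singular value decomposition.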
Two simultaneous results that warrant serious attention. A Caltech team (Bluvstein, Cain, Preskill) designed a quantum architecture that could theoretically break RSA using roughly 100,000 neutral-atom qubits. Separately, Google's Craig Gidney published an implementation of Shor's algorithm 10× more efficient than prior methods, potentially breaking elliptic curve cryptography with under 500,000 qubits.
Neither machine exists yet. But both results compress the timeline considerably — fault-tolerant quantum computers capable of breaking widely-deployed cryptography may be years away rather than decades. The implications for post-quantum cryptography transitions are urgent, and I do not think the broader technology industry is taking this seriously enough.
A round-up of NVIDIA's robotics ecosystem: Isaac GR00T open models for natural language robot control, Cosmos world foundation models for synthetic data generation, the open-source Newton 1.0 physics engine, and Isaac Sim 6.0 now generally available.
Featured use cases include surgical robotics (PeritasAI), underwater simulation (OceanSim), agricultural automation (Aigen's solar-powered weed-removal rovers), and autonomous solar installation (Maximo). NVIDIA's strategy is transparent — embed their tooling so deeply into the robotics development pipeline that it becomes infrastructure — but the engineering output is nonetheless impressive.
A multi-month hackathon with over $100K in prizes — including a humanoid robot as the grand prize. Ran on NVIDIA's Isaac Sim / Omniverse stack, covering manipulation, perception, and agentic control across Pro, Amateur, and Junior tracks. Winning teams were flown to GTC 2026 in San Jose.
This is fundamentally a talent pipeline play. NVIDIA are embedding their tools with the next generation of robotics engineers whilst simultaneously identifying strong candidates for their ecosystem. The structure is clever: the competition is the recruitment process.
A detailed walkthrough of the Jane Street quant trader online assessment — 4 questions in 30 minutes. No algorithms, no coding. Entirely probability, expected value, and Bayesian reasoning: parity of prime sums, a dice game EV optimisation, a Bayesian coin-flip problem, and an urn strategy problem.
Essential reading not for the specific questions (which rotate) but for calibrating the type and depth of probabilistic thinking expected. The time pressure is the real filter — the questions are tractable, the clock is not.
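To give a flavour of the Bayesian reasoning involved, here is a hypothetical coin problem of my own; it is not one of the assessment's questions. A bag holds one fair coin and one double-headed coin, you draw one at random and see only heads. Exact fractions keep the arithmetic honest.

```python
from fractions import Fraction


def posterior_two_headed(n_heads, prior=Fraction(1, 2)):
    """P(coin is double-headed | n_heads heads in a row), by Bayes' rule."""
    like_two_headed = Fraction(1)            # a double-headed coin always shows heads
    like_fair = Fraction(1, 2) ** n_heads    # a fair coin shows n heads with prob 2^-n
    evidence = prior * like_two_headed + (1 - prior) * like_fair
    return prior * like_two_headed / evidence


posterior = posterior_two_headed(3)
# Probability the next flip is heads, averaging over which coin we hold.
next_heads = posterior * 1 + (1 - posterior) * Fraction(1, 2)
print(posterior, next_heads)  # 8/9 17/18
```

The skill being tested is doing exactly this update in your head, quickly, without losing track of the denominator.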
A well-curated reading list spanning ML (ESL, Bishop, Russell & Norvig), econometrics (Wooldridge, Shumway), quantitative investing (Paleologo's Elements of Quantitative Investing, Dama's On Automated Trading), and interview preparation (Zhou, Crack's Heard on the Street, Mosteller's 50 Probability Problems). Also covers competitive programming — USACO, Codeforces, Project Euler.
Good curated lists are genuinely rare. Most "awesome" lists on the internet are bloated beyond utility. This one is not.
Free introductory lectures covering risk-adjusted performance (Sharpe ratio, diversification, portfolio construction), the quant research process (backtesting, overfitting pitfalls), and a case study on a cryptocurrency strategy designed to profit from large liquidation events.
A reasonable entry point for those curious about quantitative finance — the production quality is decent and the content avoids the hand-waving that plagues most introductory material.
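For readers new to the Sharpe ratio, a minimal sketch of the usual calculation follows. The function name, the sample returns, and the annualisation factor of 252 trading days are my assumptions, not taken from the lectures.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_per_period=0.0, periods_per_year=252):
    """Annualised Sharpe ratio: mean excess return over its sample standard deviation."""
    excess = [r - risk_free_per_period for r in returns]
    spread = statistics.stdev(excess)
    if spread == 0:
        raise ValueError("returns have zero variance; Sharpe ratio is undefined")
    return statistics.fmean(excess) / spread * math.sqrt(periods_per_year)


daily_returns = [0.010, -0.005, 0.007, 0.002, -0.003]  # made-up daily returns
print(round(sharpe_ratio(daily_returns), 2))
```

With a zero risk-free rate the ratio is unchanged if every return is doubled, which is the sense in which it is risk-adjusted: leverage alone does not improve it.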
A structured programme from Google covering foundational security concepts, tools, and practical skills. Included here because cybersecurity literacy is increasingly non-optional — even for those whose primary work lies elsewhere. Understanding threat models, cryptographic primitives, and network security is part of building serious systems, not a separate specialism.
A thorough tutorial covering both Python and C++ implementations of findContours() and drawContours(), the two approximation methods, and the four retrieval modes with hierarchy relationships. Real-world applications: motion detection, unattended object detection, image segmentation.
Computer vision fundamentals never go out of fashion. The retrieval mode hierarchy — RETR_LIST, RETR_EXTERNAL, RETR_CCOMP, RETR_TREE — is one of those things one looks up repeatedly until it finally sticks. This tutorial made it stick.
Last updated: April 2026
Research papers and books I have read, am reading, or keep returning to — with my own perspective on each. These range from foundational to recent; the criterion for inclusion is that they changed how I think about something.
A formal academic survey written as part of the WISER + Womanium Quantum Computing Programme. This was a postgraduate-level programme I attended before starting my undergraduate degree. Covers quantum computing foundations through to algorithms for nonlinear PDEs and differential equations: qubits, quantum gates, phase kickback, QFT, QPE, Grover's algorithm, and quantum linear solvers including HHL, VQLS, functional quantum linear solvers, and Schrödingerisation.
The HHL section gives honest treatment of the exponential speedup's caveats — state preparation cost, readout limitations, condition number dependence — alongside near-term alternatives like VQLS. The goal was to document the full arc from foundations to current research frontiers, written for a reader who wants to understand not just what the algorithms do but where the gaps are.
Submission for the British Physics Olympiad Computational Challenge, in which I was awarded Gold and Best in Cohort. The project builds a ray tracer for spherical mirrors from first principles: applying the law of reflection iteratively to simulate image formation, distortion, and multi-bounce behaviour. The writeup covers the physics derivation, implementation decisions, and analysis of where the paraxial approximation breaks down.
The interesting part is the edge cases — what happens near the focal point, where the mirror equations become singular, and how to handle rays that don't converge. The simulation makes these visible in a way that standard optics diagrams don't.
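The breakdown of the paraxial approximation can be shown with a few lines of vector reflection. This is a simplified 2D sketch rather than the submitted ray tracer. It assumes a concave mirror of radius `R` with its centre of curvature at the origin, an incoming ray parallel to the axis at height `h`, and a single bounce. The function names are mine.

```python
import math


def reflect(direction, normal):
    """Law of reflection for a unit normal: r = d - 2 (d . n) n."""
    dot = direction[0] * normal[0] + direction[1] * normal[1]
    return (direction[0] - 2 * dot * normal[0], direction[1] - 2 * dot * normal[1])


def axis_crossing_distance(h, R):
    """Distance from the mirror vertex to where the reflected ray crosses the axis."""
    if not 0 < abs(h) < R:
        raise ValueError("ray height must satisfy 0 < |h| < R")
    px, py = math.sqrt(R * R - h * h), h   # point where the ray hits the mirror
    normal = (-px / R, -py / R)            # unit normal pointing at the centre
    rx, ry = reflect((1.0, 0.0), normal)   # incoming ray travels along +x
    t = -py / ry                           # parameter at which the ray reaches y = 0
    x_cross = px + t * rx
    return R - x_cross                     # the vertex sits at x = R


for h in (0.01, 0.2, 0.5, 0.8):
    print(h, round(axis_crossing_distance(h, 1.0), 4))
```

Near the axis the crossing point sits at R/2, the paraxial focal length. As `h` grows it slides towards the mirror, which is spherical aberration. Beyond roughly `h = 0.87 R` the reflected ray reaches the axis behind the vertex, so a second bounce would be needed and the single-bounce assumption no longer holds.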
Technical writeup for the chess engine project, covering board representation, move generation, search algorithm design (minimax with alpha-beta pruning), and evaluation function construction. The writeup documents the design choices and the performance trade-offs at each stage: why bitboards over piece lists, how the search tree is pruned, what the evaluation function captures and what it misses.
Chess engines are a good forcing function for understanding search under constraints, because the game tree is deep and the branching factor is high. The gap between a naive implementation and a competitive one is often almost entirely about algorithmic efficiency rather than brute force.
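Alpha-beta pruning is easiest to see on a toy tree. The sketch below is a generic illustration, not code from the engine. It assumes the game tree is given as nested lists with integer leaf scores, and the `stats` counter is a hypothetical addition to show how many leaves are actually evaluated.

```python
import math


def alphabeta(node, maximising, alpha=-math.inf, beta=math.inf, stats=None):
    """Minimax value of a nested-list game tree, skipping branches that cannot matter."""
    if not isinstance(node, list):          # leaf: return its static evaluation
        if stats is not None:
            stats["leaves"] += 1
        return node
    if maximising:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, stats))
            alpha = max(alpha, best)
            if alpha >= beta:               # the minimiser above will never allow this line
                break
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta, stats))
        beta = min(beta, best)
        if alpha >= beta:                   # the maximiser above already has something better
            break
    return best


tree = [[3, 5], [2, 9], [0, 1]]             # maximiser to move, then minimiser replies
stats = {"leaves": 0}
value = alphabeta(tree, True, stats=stats)
print(value, stats["leaves"])               # 3 4
```

Only four of the six leaves are evaluated, and the result is identical to plain minimax. In a real engine the saving depends heavily on move ordering, since cutoffs come earlier when strong moves are searched first.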
A research survey on competitive approaches to the OpenAI Parameter Golf challenge: build the best-performing language model within a 16 MB weight budget, evaluated in bits per byte on FineWeb and trained on 8×H100s. The survey covers the competitive landscape: architecture compression techniques, tokenisation strategies, distillation approaches, and the trade-offs between model capacity and training efficiency under extreme size constraints.
I particularly enjoyed learning how TurboQuant (research from DeepMind) quantises the KV cache.
The central result — that loss scales as a power law with model size, dataset size, and compute across seven orders of magnitude — is deceptively clean, but the implications are enormous. The insight that changed the field is that larger models are dramatically more sample-efficient: the optimal strategy under a fixed compute budget is to train a very large model on comparatively modest data and stop well before convergence. This is counterintuitive, and the evidence is unambiguous.
What I find worth dwelling on is what this paper says about the nature of the problem. The fact that architectural details like depth and width matter remarkably little within a wide range suggests that something more fundamental is being learned than the architecture would imply. The scaling laws feel less like engineering results and more like physics — as if intelligence itself has a thermodynamics.
Weizenbaum built ELIZA as a demonstration of the superficiality of natural language understanding — and was disturbed when users formed genuine emotional attachments to it anyway. He considered this a damning result. What strikes me now, reading it in 2023 after building my own ELIZA-inspired chatbot, is how little has changed in the fundamental dynamic: the illusion of understanding is sufficient to produce the effect of understanding, at least in interaction.
The paper is also a starting point for a serious question that LLMs make urgent again: is there a meaningful difference between a system that appears to understand and a system that does? Weizenbaum thought yes, clearly. The current evidence is less comfortable.
The paper that every subsequent LLM descends from — GPT, Claude, Gemini, all of it. The core idea (replacing recurrence with self-attention entirely) seems obvious in retrospect, but the combination of multi-head attention, positional encoding, and the encoder-decoder structure was genuinely novel. What the paper doesn't tell you is that the implications would take several years to become apparent, requiring scale results like the Kaplan paper to unlock.
Worth reading alongside the NanoGPT speedrun community's work, which strips the architecture to its minimum and demonstrates that most of the original design choices can be substantially improved.
Read this directly before the ETH Oxford hackathon where we used QAOA for routing optimisation in FZAP. The algorithm's elegance is in how it maps a combinatorial optimisation problem onto a parameterised quantum circuit — alternating cost and mixing unitaries — then optimises the parameters classically. It's a hybrid approach, which makes it practical on NISQ hardware where circuit depth is severely limited.
The honest assessment is that QAOA's quantum advantage over classical methods hasn't been demonstrated at the scales that matter yet. But it's the most tractable near-term algorithm for problems with combinatorial structure, and the Max-Cut mapping we used in FZAP worked cleanly.
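The alternating cost-and-mixer structure fits in a small statevector simulation. The sketch below is a toy, not the FZAP code: the 4-node ring graph is a hypothetical example, the circuit has a single layer (p = 1), and a coarse grid search stands in for the classical optimiser.

```python
import cmath
import math


def cut_value(z, edges):
    """Number of edges whose endpoints get different bits in bitstring z."""
    return sum(((z >> u) & 1) != ((z >> v) & 1) for u, v in edges)


def qaoa_expectation(n, edges, gamma, beta):
    """Expected cut size after one cost layer and one mixer layer on |+>^n."""
    dim = 1 << n
    costs = [cut_value(z, edges) for z in range(dim)]
    # Cost unitary: each basis state picks up a phase set by its cut value.
    state = [cmath.exp(-1j * gamma * c) / math.sqrt(dim) for c in costs]
    # Mixer: apply exp(-i * beta * X) to every qubit in turn.
    c, s = math.cos(beta), math.sin(beta)
    for q in range(n):
        mask = 1 << q
        for z in range(dim):
            if (z & mask) == 0:
                a0, a1 = state[z], state[z | mask]
                state[z] = c * a0 - 1j * s * a1
                state[z | mask] = -1j * s * a0 + c * a1
    return sum(abs(a) ** 2 * costs[z] for z, a in enumerate(state))


def grid_search(n, edges, steps=16):
    """Classical outer loop, here just an exhaustive grid over (gamma, beta)."""
    best = (-1.0, 0.0, 0.0)
    for k in range(2 * steps):
        for j in range(steps):
            gamma, beta = k * math.pi / steps, j * math.pi / steps
            value = qaoa_expectation(n, edges, gamma, beta)
            if value > best[0]:
                best = (value, gamma, beta)
    return best


RING = [(0, 1), (1, 2), (2, 3), (3, 0)]   # hypothetical 4-node ring, max cut = 4
best_value, best_gamma, best_beta = grid_search(4, RING)
print(round(best_value, 3), "vs random guessing:", len(RING) / 2)
```

On this ring a single layer improves on random guessing but does not reach the true maximum cut of 4. That is the depth limitation in miniature: more layers help, at the cost of circuit depth.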
HHL claims exponential speedup for solving linear systems Ax=b over classical methods — but the caveats are what make it interesting. The speedup is real only when the matrix A is sparse and well-conditioned, when state preparation is efficient, and when you only need to read off certain properties of the solution rather than the full solution vector. Each of these qualifications is significant, and together they considerably narrow the set of problems where the speedup materialises.
I spent considerable time on this for my quantum survey paper. The honest treatment matters: the algorithm is a landmark result in quantum complexity theory, but the path from HHL to practical quantum advantage in linear algebra is longer than many popular accounts suggest.
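For reference, the runtime usually quoted for HHL makes these caveats visible in a single expression. The notation is my own shorthand: N is the dimension of A, s its sparsity (nonzero entries per row), κ its condition number, and ε the target precision.

```latex
T_{\mathrm{HHL}} = \tilde{O}\!\left(\frac{\log(N)\, s^{2}\, \kappa^{2}}{\epsilon}\right)
```

The logarithmic dependence on N is the exponential speedup. The polynomial factors in s and κ are why sparsity and conditioning matter. The expression also leaves out the cost of preparing the state for b and of reading out the solution, and recovering all N entries of x would by itself take time at least linear in N.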
How to Think Like a Mathematician · ESL (Hastie, Tibshirani, Friedman) · Book of Integrals · A Mathematician's Apology (Hardy) · The Man Who Loved Only Numbers (Hoffman)
Talks, conferences, and events that were worth attending — and worth writing about. Not a list of appearances, but an account of what actually landed.
Peter Steinberger — founder of OpenClaw, the open-source agentic coding framework — gave the headline keynote at the UK AI Agent Hackathon at Imperial. The slides were sparse. He spoke without the particular brand of performed confidence that keynotes at technology events tend to produce, which was, I think, the point.
The substance of what he said was not complicated, though the implications of it are. He built things in a way that most people in his position did not — slowly, carefully, according to his own technical instincts rather than the consensus of the industry around him — and he continued building that way for long enough that the work accumulated into something of genuine quality and genuine originality. He did not optimise for what was fashionable or fundable or externally impressive. The results, he said, followed from the work itself, not the other way around. "Don't just read about stuff. Play with it and actually go and build things. It doesn't even matter if you end up using it or not. It's really more like the road that's important."
What made the talk land was not the content in isolation — which is, in the abstract, advice that most people have heard in some form — but the texture of the specificity with which he described it. He was not reciting a framework. He was describing, in the particular detail that only lived experience produces, a way of working that he had actually inhabited for years. The distinction matters. There is a large difference between someone who believes that intrinsic motivation produces better work and someone who has organised their professional life around that belief and can tell you exactly what it costs and what it returns.
We placed fourth internationally at this hackathon — the only team of first-years in the competition. But the talk was the thing I kept returning to on the train home.
Three screens, a cracked monitor, a Raspberry Pi in the foreground that I did not bother to move. The countdown read T-00:00:02 when this photograph was taken. I had arranged multiple live feeds — NASASpaceflight's external coverage, NASA's own official stream, and a third tracking ground operations — because I wanted more than one camera angle, and I was not willing to miss the ignition sequence from any of them.
I find human spaceflight genuinely moving in a way that is somewhat difficult to articulate without crossing into the territory of the mawkish, which I would prefer to avoid. But there is something about the specific nature of the engineering problem involved — the extraordinary and unforgiving precision required to sustain human life in an environment that is, in every physical sense, hostile to it — that I find clarifying in a way that very few things are. It is one of the few domains in which the phrase "good enough" is not a category that can meaningfully exist; where the acceptable margin of error is not a business decision but a physical constant.
The thought I kept returning to, watching the SLS clear the launch tower: the systems that matter most are the ones in which failure is not a recoverable state. Most of what I build carries the quiet luxury of iteration — deploy, observe, correct, repeat. Artemis carries no such luxury. That asymmetry in consequence produces a different quality of rigour, and I think it is worth deliberately importing that quality into work that operates under lower stakes, precisely because the lower stakes make it easier to become careless.
The Arkwright Engineering Scholarship ceremony takes place in a formal hall of the kind that produces, without any explicit instruction, the instinct to stand slightly straighter. Approximately two hundred scholars are selected nationally each year — from a considerably larger pool — and the awards are presented on stage by representatives of the sponsoring organisations, in the presence of the other scholars, their teachers, and whichever family members have made the journey. Mine was sponsored by the Royal Air Force. The award was handed over by a former RAF Commander-in-Chief.
The certificate reads, in the precise and old-fashioned language that formal awards tend to use: in recognition of outstanding potential as a future leader in Engineering. There is something about receiving that form of words from someone who has spent a career at the operational frontier of British engineering — where the tolerances are measured in millimetres and the consequences of exceeding them are measured in lives — that carries a different weight than a certificate that arrives in the post. I don't think that weight is entirely ceremonial.
The scholarship brought mentorship, industry visits, and a sustained relationship with engineers working at the edge of aerospace and defence technology — all of which mattered. But more practically than any of those: it was the first time an institution I genuinely respected told me, without qualification, that the trajectory I was on was a considered and worthwhile one. External validation is not, in itself, the point of anything. But there are moments when it is useful to have confirmed that you have not been completely misreading the map.
Imperial College London. Academic record from school — the formal substrate beneath everything else on this site.
| Subject | Grade | Notes |
|---|---|---|
| Mathematics | A* | — |
| Further Mathematics | A* | — |
| Physics | A* | — |
| Computer Science | A* | — |
| German | A* | — |
Five A-levels — the vast majority of students take three. All five at A*.
A record of recognitions received across engineering, mathematics, physics, and computing. Some you frame. Some you earn in rooms where you can barely breathe.
I'm looking for research collaborations in ML: Physical AI, sim-to-real robotics, Reinforcement Learning, Language Models, and any intriguing interdisciplinary ideas in Mathematics and Physics.
I am also still applying for, and taking part in, spring weeks, internships, programmes and research fellowships. I think it's a fantastic way to learn: throwing yourself into the deep end with a high learning rate and, hopefully, decreasing loss over time!
I also love having conversations about quantum computing, ML systems, or anything at the intersection of mathematics and computer science (and even Physics!).
I strive to inspire and be inspired — so if you're working on something challenging and/or revolutionary, I'd love to hear about it.
Currently based in London · Imperial College London
Things I've been watching, reading, and thinking about — written up while still fresh. Sources linked throughout. These are notes in the genuine sense: not essays, just thinking out loud.