The Robotic Dexterity Deadlock
February 12, 2026 · Quanting Xie, Tongzhou Liao, Yonatan Bisk

> TL;DR: Robot dexterity is stuck behind many unsolved problems. This post focuses on one that we think is underappreciated: the gearbox. High-ratio gearboxes break sim-to-real transfer, destroy force transparency, and are the first thing to wear out. We explain why, and what we did about it.

Contents

1. Introduction: What is a dexterity deadlock?
   The gap between locomotion and manipulation
2. The Problem
   The Geometric Curse · Sim-to-Real Gap · N² Impedance · Chaos Term · Information Wall · Weakest Link
3. Why Manipulation Needs Better
   Contact patterns · Impedance control · QDD for hands
4. Our Solution
   From 288:1 to 15:1 · Axial flux motors · Thermal optimization
5. Validation
   Hardware testing · Backdrivability · Force sensing through motor current
6. Conclusion
   Hardware fixes for software problems

You’ve probably seen the video: dozens of humanoid robots dancing in sync at the 2026 Chinese New Year celebration. Pretty cool. Legged robots can sprint across rough terrain and do all those fancy flips. The locomotion problem isn’t entirely solved, but it’s clearly on the right trajectory.

But we haven’t seen anything close to that for manipulation. Why?

Watch what human hands do every day:

- Folding paper: precise control, constant tactile feedback, coordinated finger motion.
- Soldering wires: tool manipulation, thermal awareness, millinewton force precision.

Think about what folding origami actually requires: crease the paper along an exact line without tearing, apply just enough force to make a sharp fold, coordinate multiple fingers to hold and guide simultaneously, and constantly adjust based on tactile feedback as the paper’s stiffness changes with each fold. You do this without thinking. A robot hand slowly struggles with every single step.
And that’s just paper; forget about soldering tiny wires, tying shoelaces, threading needles, or basically anything your eight-year-old does without thinking. The gap between locomotion and manipulation isn’t just large. It’s widening.

We kept asking ourselves: why? Many obstacles stand in the way of true dexterity: occlusions, multi-finger coordination and planning, contact modeling. We started calling the whole tangle the Dexterity Deadlock. Today we want to look at one specific piece of it: the gearbox.

Why mount motors in the fingers at all? You _could_ place motors on the forearm and route power through cables, as human anatomy does. Tendon-driven designs have real advantages: low reflected inertia, fast response, and they keep the heavy actuators off the hand. Many recent systems take this path. But robot tendons introduce friction and play at every guide point, and they stretch over time, which makes the system hard to keep reliable. Cable-driven hands need constant recalibration as tension drifts, friction varies with temperature and wear, and backlash accumulates in the routing. We won’t address tendon-driven architectures in this post; they deserve their own deep dive.

The other common path: mount small motors directly in the fingers, then use high-ratio gearboxes to amplify their weak torque. But those gearboxes destroy the very things dexterous manipulation needs: accurate simulation, force transparency, mechanical reliability.

You can’t get dexterity _with_ the gearbox. You can’t get torque _without_ it. Nearly every robot hand on the market is stuck in this trap: packed with 100:1, 200:1, sometimes 288:1 gearboxes that poison everything downstream. They make simulation inaccurate. They block force information from reaching the motor. And they’re the first thing to break.

When a learned policy fails to transfer from sim to real, the instinct is to blame the algorithm. Train a bigger network. Crank up domain randomization.
Those approaches have made real progress; we don’t deny that. But at some point we started wondering: are we treating the symptom or the disease? What if the transmission itself is the bottleneck?

At Origami, we’re taking a different approach: instead of patching the software to work around bad hardware, we’re redesigning the hardware to need less patching. A key part of this is dramatically reducing the gear ratio. What follows is the story of why that matters, what we had to invent to make it possible, and what it unlocked.

The Problem
-----------

### The Geometric Curse

First, you have to understand why the gearbox is there in the first place. Nobody _wants_ a 288:1 gearbox in a finger. So why do nearly all robot hands have one?

Figure: A leg motor (gear ratio 6) vs. a finger servo (gear ratio 288). Torque scales with $r^3$.

Look at a leg motor. Big radius, big lever arm, $\tau = F \times r$. It’s naturally strong. Barely needs gearing. Ratio 6:1. Transparent.

Now look at a finger. There’s no room. You have to shrink the motor until it fits, and torque collapses. Here is the cruel math: for geometrically similar motors (where all dimensions including length shrink equally, like comparing a 1 cm³ motor to a 10 cm³ motor), torque scales with the cube of the linear dimension ($r^3$). Both the cross-sectional area and the lever arm shrink together. Make a motor 10× smaller in each dimension and it gets roughly 1,000× weaker. Torque doesn’t just drop. It vanishes. (Clever winding and magnet choices can soften the blow, but the scaling trend is real and punishing.)

So engineers compensate with massive gear ratios: 200:1, 288:1. The motor fits. The finger moves. Problem solved? No. This is where the problems _start_. This trade-off is the birthplace of the sim-to-real gap.

### The Sim-to-Real Gap

[Interactive figure: torque error ~23% with the real-gearbox model selected]

In simulation, motors are ideal torque sources. You command 5 Newton-meters, the joint applies exactly 5 Newton-meters, instantly.
Clean:

$$\tau_{out} = \tau_{in}$$

But inside a real gearbox, physics fights back:

$$\tau_{out} = \tau_{in} - \underbrace{f(\dot{\theta})}_{\text{Friction}} - \underbrace{\delta(\theta)}_{\text{Backlash}} - \epsilon_{\text{chaos}}$$

(We’re even omitting the reflected inertia term $N^2 J_{rotor}\ddot{\theta}$, which we’ll get to; the static losses alone are enough to break the sim-to-real bridge.) Every one of these terms is a nail in the coffin.

Play with the chart below. The black curve is the real friction characteristic (a _Stribeck curve_) with a sharp discontinuity at zero velocity where stiction kicks in. The dashed red curve is what simulators use: a smooth approximation. Toggle each and watch how they diverge, especially near velocity reversals where the sim completely misses the stiction jump.

[Interactive chart: Stribeck friction curve vs. smooth simulator approximation; model gap near v ≈ 0 is ~75%; at reversal the sim misses stiction]

The Structural Gap $f(\cdot)$. Here’s what took us a while to see. Stiction and backlash aren’t just “noise”; they’re _discontinuous_ functions. Step changes. Dead zones. Continuous approximations exist (see bristle models of friction), but these sharp discontinuities remain difficult to model in simulation and nearly impossible for neural networks to learn accurately. You’re trying to approximate a jagged cliff with a smooth curve. A policy trained on smooth physics will hallucinate when it hits the jagged reality of a gearbox. Software fixes introduce side-effects and approximations that compound over time. A hardware fix (reducing the discontinuity at the source) is more fundamental.

Seeing It in Motion. Equations are one thing. Now watch what a gear train actually does. Toggle between low-ratio and high-ratio: dead zones appear, energy vanishes into friction, and backdrivability dies.

[Interactive demo: gear train; backlash minimal, friction loss ~2%, backdrivable: yes]

The $N^2$ Impedance Mismatch. This one is brutal.
Reflected inertia doesn’t scale linearly with the gear ratio. It scales with the _square_:

$$J_{reflected} = N^2 \cdot J_{rotor}$$

[Interactive demo: reflected inertia 225×; output feels moderate]

As Russ Tedrake explains in his MIT Robotic Manipulation course, this is why most commercial robots are position-controlled rather than torque-controlled: the gearbox makes the motor’s dynamics dominate the world’s dynamics. Standard hands use $N \approx 100$. That means $N^2 \approx 10{,}000$. The simulator thinks the finger is light and backdrivable. The real finger hits with the momentum of a sledgehammer. Delicate manipulation becomes very difficult when the finger itself has that much inertia.

The Chaos Term $\epsilon_{\text{chaos}}$. This one might be the most insidious. It represents time-variant dynamics that no static model can capture: grease viscosity changing as the motor heats up mid-experiment, microscopic wear on gear teeth after 100 hours of operation, manufacturing tolerances that make every single unit slightly different from the last. It’s difficult to simulate because it’s a moving target. It’s difficult to calibrate away because it drifts. It’s not noise. It’s entropy.

Consider OpenAI’s Rubik’s Cube project: a policy for a Shadow Hand (tendon-driven, high-ratio) that took _thousands of years_ of simulated experience and Automatic Domain Randomization (ADR). The task is genuinely hard, but ADR exists because the sim-to-real gap is massive, and much of that gap traces back to the tendon transmission. The hardware didn’t cause the entire problem, but it made it vastly more expensive to brute-force through.

The broader response follows the same pattern: model the gearbox, add domain randomization, or train a compensation network like the Unsupervised Actuator Net from Fey et al. at MIT. These approaches can work, but they’re specific to one physical unit and drift as the gearbox wears.
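To put rough numbers on the $N^2$ scaling described earlier in this section, here is a minimal Python sketch. The rotor inertia value is a hypothetical placeholder, not a measurement from any real motor; only the ratios between rows matter.

```python
def reflected_inertia(gear_ratio: float, rotor_inertia: float) -> float:
    """Rotor inertia as felt at the gearbox output: J_reflected = N^2 * J_rotor."""
    return gear_ratio ** 2 * rotor_inertia


# Hypothetical rotor inertia in kg*m^2, chosen only for illustration.
ROTOR_INERTIA = 1e-7

# Gear ratios mentioned in this post: leg motor, low-ratio finger,
# a typical hand, and the highest-ratio finger servo.
for ratio in (6, 15, 100, 288):
    multiplier = reflected_inertia(ratio, 1.0)  # inertia relative to the bare rotor
    j_out = reflected_inertia(ratio, ROTOR_INERTIA)
    print(f"{ratio:>3}:1 -> {multiplier:>8,.0f}x rotor inertia ({j_out:.2e} kg*m^2)")
```

Under these assumptions, a 15:1 ratio multiplies rotor inertia by 225, while 288:1 multiplies it by 82,944, a difference of roughly 370-fold for the same rotor.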
What works in the lab may need retuning on the next unit off the line. For a prototype, that’s manageable. For a fleet, it’s a maintenance problem disguised as a modeling problem.

The visualization below shows why. With a low-ratio transmission (left), even modest domain randomization covers the sim-to-real gap. With a high-ratio gearbox (right), friction, backlash, and chaos push reality far from the simulation’s mean output. Crank the DR width to maximum; reality stays stubbornly outside, and the training cost explodes.

[Interactive visualization: DR width 10%, training cost 7×; low-ratio gap: not covered; high-ratio gap: not covered]

We think there’s a simpler path: reduce the hardware complexity so there’s less to model in the first place.

### The Information Wall

Everything above focuses on actuation: commanding forces and motions. But manipulation requires information flow in _both_ directions. The hand must also sense: read contact forces, detect slip, feel compliance. The DDHand project at CMU framed this explicitly:

_“We view the gripper as a signal transmission channel, and seek high-bandwidth, high-fidelity transmission of force and motion signals in both directions.”_

A hand is a full-duplex communication channel. It receives information from the world through torque feedback, informing the policy how to act. It transmits information to the world, instructing how the environment should evolve in response.

What does channel capacity have to do with manipulation? Think of a phone call over a bad connection: the bandwidth is too low (voice cuts out), and the SNR is too low (static drowns out words). You can barely communicate. Now consider Shannon’s AWGN channel capacity formula: $C = B \log_2(1 + \text{SNR})$, where $C$ is information capacity (bits/second), $B$ is bandwidth (Hz), and $\text{SNR}$ is signal-to-noise ratio. High capacity means more information per second.
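As a quick numeric illustration of the capacity formula, the Python sketch below evaluates $C = B \log_2(1 + \text{SNR})$ for two made-up channels. The bandwidth and SNR values are hypothetical and are not measurements of any actuator; the point is only that shrinking either term shrinks capacity.

```python
import math


def db_to_linear(snr_db: float) -> float:
    """Convert a power SNR in decibels to a linear ratio."""
    return 10.0 ** (snr_db / 10.0)


def channel_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Shannon AWGN capacity in bits/second: C = B * log2(1 + SNR)."""
    if bandwidth_hz < 0 or snr_linear < 0:
        raise ValueError("bandwidth and SNR must be non-negative")
    return bandwidth_hz * math.log2(1.0 + snr_linear)


# Hypothetical channels: (label, bandwidth in Hz, SNR in dB).
channels = [
    ("wide band, clean signal", 100.0, 20.0),
    ("narrow band, noisy signal", 10.0, 0.0),
]

for label, bandwidth_hz, snr_db in channels:
    capacity = channel_capacity(bandwidth_hz, db_to_linear(snr_db))
    print(f"{label}: {capacity:.1f} bits/s")
```

Because bandwidth enters linearly and SNR only logarithmically, losing bandwidth hurts capacity faster than losing the same factor of SNR, but a channel needs both to carry much information.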
For a robot hand, high _actuator information capacity_ means the ability to execute and sense complex, responsive behaviors: a clear conversation between hand and world.

A high-ratio gearbox degrades both terms. Bandwidth collapses: reflected inertia scales as $N^2$, limiting how fast the system can respond. SNR collapses: external force reflected to the motor is divided by $N$, while friction noise stays constant. The signal-to-noise ratio degrades at least as fast as $1/N$:

$$\text{SNR} \;\approx\; \frac{\tau_{ext} / N}{\tau_{friction}} \;=\; \frac{\tau_{ext}}{N \cdot \tau_{friction}}$$

By keeping reflected inertia low, we maximize bandwidth. By keeping friction, backlash, and chaos low, we maximize SNR. A low-ratio transmission preserves actuator information capacity in both directions: fast, precise actuation _and_ high-fidelity force sensing. At $N \approx 100$, the reflected force from a gentle touch is smaller than the friction noise floor. The channel is effectively closed, like trying to have a conversation through a wall.

[Interactive demo: SNR 13 dB, reflected inertia 225×, transparency good]

And stiction makes it worse. It doesn’t just add noise: it creates a hard _dead zone_. Below the stiction threshold, force information isn’t degraded. It’s _erased_. The controller receives nothing until the force is large enough to overcome the static friction of the entire gear train. For delicate manipulation, where the hand must sense millinewtons of contact, this is devastating.

This is why most geared robot hands need external force/torque sensors at every fingertip, adding cost, fragility, and wiring complexity. A transparent actuator doesn’t need them. The motor _is_ the sensor. Sangbae Kim, whose MIT Biomimetic Robotics Lab pioneered this approach for legs, calls it _proprioceptive actuation_, deliberately echoing the biological term.
In biology, a surprising amount of force information comes not from the skin but from the muscles themselves, through low-friction tendons that preserve the signal. A low-ratio gearbox does the same thing for a robot.

### The Weakest Link

There’s a third problem that doesn’t get enough attention in papers but matters enormously in practice: gearboxes break.

Figure: A high-ratio gearbox small enough to sit on a fingertip. The teeth are tiny, and fragile under impact.

Look at the size of those teeth. A high-ratio gearbox is the most mechanically complex component in a servo actuator, and it has to fit inside a finger. Gear teeth mesh under load, grease degrades, backlash increases over time, and because the teeth themselves are so small, they’re easy to break under impact. We’ve tested and taken apart a lot of hands. The gearbox is almost always the first thing to fail. And when it fails, the entire finger is dead.

This isn’t just an engineering annoyance. It’s a scaling problem. If you want to deploy robot hands in the real world, reliability matters as much as performance. Every gear tooth is a potential failure point. Going from 288:1 to 15:1 doesn’t just improve the dynamics. It dramatically simplifies the mechanical system, reduces wear surfaces, and extends the operational lifetime of the hand.

Why Manipulation Needs Better
-----------------------------

Quasi-direct drive (QDD) isn’t a new idea. The MIT Cheetah proved it for legs. UC Berkeley’s Blue robot proved it for arms. Low gear ratios made sim-to-real transfer work and gave these robots the compliance to survive real-world impacts.

So the obvious question: why hasn’t anyone done this for hands? The short answer is the geometric curse: fingers are just too small for powerful motors. But there’s a deeper question worth asking first: does manipulation actually _need_ transparent actuators? Could you get away with position control behind a high-ratio gearbox, as long as you solve the sim-to-real problem some other way?

We don’t think so. And it comes down to how differently manipulation and locomotion relate to contact.

A walking robot touches the ground in brief, periodic, predictable patterns. Foot strikes, pushes off, lifts. Contact is mostly a _side effect_ of the trajectory.
A position controller behind a stiff gearbox can handle this; QDD buys you better sim-to-real transfer and energy efficiency, but locomotion _can_ work without it.

Manipulation is different. Contact isn’t a side effect. It _is_ the task. When a hand grasps a cup, threads a needle, or turns a screwdriver, every finger is in sustained, force-rich contact with the object. The quality of that contact (how much force, how compliant, how responsive) determines whether the task succeeds or fails. As Neville Hogan argued in his foundational work on impedance control, manipulation requires controlling the _relationship_ between force and motion, not just one or the other.

[Interactive figure: locomotion ~12% contact time vs. manipulation ~100% contact time]

There’s a subtler point too. As Kim has observed, people assume you need tactile sensors to manipulate, but in biology, a large share of force information comes not from skin mechanoreceptors but from the muscles themselves, through proprioception. The low-friction tendon acts as a transparent channel. A low-ratio actuator does the same: it lets the motor feel what the finger feels. A high-ratio gearbox walls that channel off.

Locomotion proved that QDD works. Manipulation is where we believe it matters most. The gearbox doesn’t just make simulation harder. It makes the task itself harder. It blinds the hand, stiffens the fingers, and destroys the force control that manipulation fundamentally requires.

What We Built
-------------

### From 288 to 15

So the question became: can we build a motor small enough for a finger but strong enough to barely need a gearbox? As far as we know, nobody had done it. The geometric curse made it seem impossible.

The key insight was rethinking the motor topology itself. Conventional servo motors use a radial flux design: compact, well-understood, but the lever arm is limited by the rotor radius. We switched to an axial flux architecture.
It’s flatter, which fits better inside a finger, and the magnets sit at a larger effective radius: longer lever arm, more torque per unit volume. Same physics that makes the geometric curse so punishing, now working in our favor.

Figure: Radial flux (conventional) vs. axial flux (ours): smaller volume, longer lever arm.

But a better topology alone isn’t enough. A motor that produces 1.6× more torque per volume still can’t sustain that torque if the windings melt. So we also optimized the power electronics and thermal management, making sure the steady-state temperature stays within safe limits even under continuous load. It’s the combination of both that lets us drop the gear ratio from 288:1 all the way down to 15:1 while maintaining roughly the same output torque. (Full technical details in our paper.)

That single number; 288 to