
Humanoid Robot Hardware: The Ultimate Guide

An engineer's teardown of 2026 humanoid robot hardware — actuators, hands, legs, sensing, power, compute — with real DoF, mass, torque, and cost numbers, plus an honest read on teleop demos.

By Robo2u Editorial · 38 min read

A humanoid robot is the hardest commodity in robotics: a bipedal, two-armed, dexterous machine that has to balance, walk, manipulate, perceive, and think — all inside a power and mass budget roughly the size of a person. Every subsystem fights every other one. Make the actuators stronger and you add mass, which needs stronger actuators. Add battery for runtime and you add mass, which cuts runtime. The whole discipline is an exercise in not losing that fight too badly.

This guide is the long version, subsystem by subsystem: the 2026 roster and what's actually shipping, degrees of freedom and how they're spent, the actuator problem (which is the problem), hands, legs, sensing, power, compute, and the uncomfortable truth about teleoperation. Real numbers with units, real robots, and opinions with reasons. The goal is that you finish able to look at a humanoid spec sheet — or a glossy launch video — and know what's real, what's marketing, and what's quietly being left out.

The take: In 2026, humanoid hardware is far ahead of humanoid autonomy. The bodies can walk, balance, and grasp; the actuators are good enough; the bill of materials is on a credible path to under $50k. What is not solved is letting the robot decide what to do on its own in an unstructured environment. A large fraction of the impressive "autonomous" manipulation demos you have seen are teleoperated, or are narrow policies trained on exactly that scene. Read every demo with that prior. The bottleneck is not motors anymore — it's the software stack and the data to train it.

Companion reading: robot actuators, brushless DC motors, gearboxes (harmonic & cycloidal), and legged & quadruped robot hardware.

Table of contents

  1. Key takeaways
  2. Why humanoids now
  3. The 2026 humanoid roster
  4. Degrees of freedom & kinematics
  5. The actuator problem
  6. Hands & manipulation hardware
  7. Bipedal locomotion hardware
  8. The sensing suite
  9. Power & thermal
  10. Onboard compute
  11. The teleoperation reality
  12. Manufacturing & cost
  13. The 2026→2027 outlook
  14. Frequently asked questions

Why humanoids now

The question is not "can we build a human-shaped robot" — we have for decades, going back to Honda's P2 in 1996 and ASIMO in 2000. The question is "why is everyone building them now, with serious money." Three things changed.

The form-factor argument

The world is full of infrastructure designed for a 1.7 m bipedal primate with two five-fingered hands: 0.7–0.9 m countertops, 0.8 m door openings, stair risers around 0.18 m, steering wheels and pedals, tools with handles sized for a human grip. A wheeled arm can't climb the stairs; a fixed cell can't move to the work. A humanoid is a general-purpose physical adapter to all of that without re-engineering the environment.

Rule of thumb: The humanoid form is rarely the optimal shape for any single task. A wheeled base beats legs on a flat warehouse floor; a fixed gantry beats an arm for repetitive pick-place. The humanoid bet is that one body that can do everything passably beats ten special-purpose machines — because deployment, retraining, and capital flexibility dominate at scale.

That's a real argument and also a convenient one for raising capital. Be honest about which half is talking.

The software unlock

The body was buildable in 2010. What wasn't buildable was a controller that could decide what to do. Classical robotics scripted every motion; that doesn't generalize to "tidy this room." Two developments cracked the ceiling:

  • Large language / multimodal models that can take a goal in natural language and produce a plan, and can ground that plan in what a camera sees.
  • Vision-language-action (VLA) models — policies that map pixels and a language goal directly to motor commands, trained on large demonstration datasets. This is the architecture behind most 2026 manipulation work (Figure's Helix, Physical Intelligence's π-series, Google's RT-2 lineage, NVIDIA's GR00T).

Suddenly a humanoid had a plausible path to general behavior. That's why the money showed up.

The honest state

Here's the part the videos don't say out loud. The hardware is capable — a 2026 humanoid can physically perform almost any single human task you'd show in a demo. The autonomy is immature — letting it choose and chain those tasks reliably in an environment it hasn't been trained on is unsolved.

The honest take: We have working bodies and toddler brains. Progress in 2026–2027 is gated by data and learning algorithms, not by torque density or DoF. Anyone selling you "the hardware is the hard part, and we've cracked it" is half right and using it to skip the half they haven't.

This guide is about the hardware. Just don't confuse a great body for a finished product.

The 2026 humanoid roster

The field is crowded. Below is the serious tier as of mid-2026. Numbers are best-available public figures; vendors disclose selectively and "spec" often means "target" or "best demo unit," so treat anything to two significant figures as approximate and anything about price as aspirational.

| Robot | Height | Mass | DoF (approx.) | Payload | Runtime | Price target | Notable actuation |
|---|---|---|---|---|---|---|---|
| Tesla Optimus (Gen 2/3) | ~1.73 m | ~57–73 kg | ~28 body + ~11–22/hand | ~9 kg (claimed ~20 kg) | ~2–5 hr | <$20–30k (target) | Mixed rotary + linear; in-house actuators |
| Figure 02 / 03 | ~1.68 m | ~60–70 kg | ~30+ body | ~20 kg | ~4–5 hr | undisclosed | In-house actuators; Helix VLA |
| 1X Neo | ~1.65 m | ~30 kg | ~30+ | small | ~2–4 hr | ~$20k / subscription | Tendon-driven, deliberately low-force/soft |
| Boston Dynamics Atlas (electric) | ~1.75–1.9 m | ~90 kg | ~56 (incl. hands) | ~30 kg sustained | ~4 hr | not for sale | All-electric custom actuators; extreme range of motion (360° hip/waist/neck joints) |
| Unitree H1 | ~1.8 m | ~47 kg | ~19 (no hands) | ~30 kg rated | ~2 hr | ~$90k+ | QDD joint motors; fast walker/runner |
| Unitree G1 | ~1.27 m | ~35 kg | ~23–43 | small | ~2 hr | ~$16k+ | QDD; aggressively cheap |
| Apptronik Apollo | ~1.73 m | ~73 kg | ~28+ | ~25 kg | ~4 hr (swap pack) | ~$50k (target) | Linear actuators, modular, hot-swap battery |
| Agility Digit | ~1.75 m | ~65 kg | ~16–20 | ~16 kg | ~2–4 hr | lease/RaaS | Bird-like legs (rearward knee), warehouse-tuned |
| Sanctuary Phoenix | ~1.7 m | ~70 kg | ~20+ (rich hands) | ~25 kg | undisclosed | undisclosed | Hydraulic-ish high-DoF hands, teleop data focus |

A few honest observations:

  • DoF counts are slippery. Some vendors count hand joints, some don't; some count coupled tendon joints as one DoF, some as several. A "43-DoF G1" and a "19-DoF H1" are not as far apart as they sound once you normalize for hands.
  • Mass spans ~30–90 kg. 1X Neo at ~30 kg made a deliberate choice to be light and weak (safer around people, tendon-driven, lower torque); Atlas electric at ~90 kg made the opposite choice (force and range of motion for spectacular dynamics). Both are defensible; they're solving different problems.
  • Price targets are mostly fiction until volume. Unitree G1's ~$16k is real and shipping (it's a research/education platform, not a labor robot). Optimus's "<$20–30k at scale" is a manufacturing thesis, not a 2026 price.
  • Agility Digit is the outlier worth respecting: it deliberately isn't anthropomorphic in the legs (reversed knees, like an ostrich) and is the furthest along in real paid warehouse deployments precisely because it picked a narrow, structured job.

The honest take: The most commercially advanced humanoid in 2026 is the least "general." Digit makes money moving totes in warehouses because the task is bounded. The robots with the flashiest home demos make the least money. That ordering tells you where the technology actually is.

Degrees of freedom & kinematics

Degrees of freedom (DoF) are the independently actuated joints — the count that sets how many ways the robot can move. A human has roughly 230 DoF if you count everything including the spine and each finger joint; a humanoid robot dramatically simplifies that. For motion planning across all those joints, see the motion planning & kinematics guide.

Typical DoF budget

A capable 2026 humanoid lands around 28–60 actuated DoF. Here's a representative split for a ~30-DoF body (hands counted separately, which is the honest way to do it):

DoF accounting — representative ~30-DoF humanoid (excl. hands)

Each leg:          6 DoF  ×2 = 12   (hip 3, knee 1, ankle 2)
Each arm:          7 DoF  ×2 = 14   (shoulder 3, elbow 1, wrist 3)
Torso/waist:       1–3 DoF        (yaw, sometimes pitch/roll)
Neck/head:         2–3 DoF        (pan, tilt, sometimes roll)
                  ----------
Body total:       ~29–32 DoF

Hands (optional): 6–22 DoF each — often DOUBLES the whole count

The structure is near-universal because it mirrors human kinematics:

  • 6 DoF per leg is the minimum for placing the foot at an arbitrary position and orientation in space — 3 at the hip, 1 at the knee, 2 at the ankle (pitch + roll). Drop the ankle roll and you lose the ability to keep the foot flat on uneven ground.
  • 7 DoF per arm gives a redundant arm: 6 DoF reach any pose, the 7th lets the elbow swing without moving the hand (reconfiguration around obstacles). Cheaper humanoids use 6 DoF arms and accept the loss.
  • Torso yaw matters more than people expect — it dramatically extends reach and lets the robot twist to place a load without stepping.

Why not more DoF?

Every DoF is an actuator: a motor, a gearbox, a driver, an encoder, wiring, mass, cost, and a failure point. The marginal DoF has to earn its place. This is why hands are contentious — going from a 6-DoF gripper-hand to a 22-DoF anthropomorphic hand can add more actuators than the entire rest of the arm, for capability you can't yet reliably control.

Rule of thumb: Count DoF excluding hands when comparing locomotion-and-reach capability, and count hands separately. A vendor quoting "40+ DoF" is almost always front-loading finger joints to inflate the headline.
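
To make that normalization concrete, here is a minimal sketch that keeps body and hand joints in separate buckets. The `DofBudget` class and both example robots are hypothetical; the per-limb counts come from the representative split above, and the hand counts are the two ends of the 6–22 range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DofBudget:
    """Actuated joint counts for one robot (illustrative, not vendor data)."""
    per_leg: int
    per_arm: int
    torso: int
    neck: int
    per_hand: int = 0  # 0 means no hand joints are counted

    @property
    def body(self) -> int:
        # Locomotion-and-reach DoF: legs, arms, torso, neck. Hands excluded.
        return 2 * self.per_leg + 2 * self.per_arm + self.torso + self.neck

    @property
    def hands(self) -> int:
        return 2 * self.per_hand

    @property
    def headline(self) -> int:
        # The single number a spec sheet tends to quote.
        return self.body + self.hands


# Same body, two different hand choices (both hypothetical).
simple = DofBudget(per_leg=6, per_arm=7, torso=1, neck=2, per_hand=6)
dexterous = DofBudget(per_leg=6, per_arm=7, torso=1, neck=2, per_hand=22)

for name, robot in (("simple hands", simple), ("dexterous hands", dexterous)):
    print(f"{name}: body={robot.body} hands={robot.hands} headline={robot.headline}")
```

Both robots have the same body count; only the headline differs. That is the comparison to make before reading anything into a spec-sheet total.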

The actuator problem

If you remember one thing from this guide: the actuator is the hardware problem. Not sensors, not compute — those ride Moore's-law-adjacent curves and are largely commoditized. The actuator is where physics pushes back hardest, and it's the single biggest cost, mass, and capability driver in the machine. Start with the robot actuators guide, the BLDC motors guide, and the gearboxes guide for the fundamentals; here's how they specialize for humanoids.

What a humanoid actuator must do

A humanoid joint actuator has a brutal spec: high peak torque (to lift, to catch a fall), high torque density (because mass at the joint is mass the robot must also carry and accelerate), backdrivability and force control (for safe contact and balance), high bandwidth (to react to disturbances in milliseconds), and decent efficiency (so the battery lasts). No single technology nails all of these, which is why the field is split.

Torque density — the figure of merit

τ/m  = joint torque per actuator mass   [N·m / kg]

A good 2026 humanoid hip/knee actuator:
  peak torque ~150–360 N·m, mass ~1.5–4 kg
  → ~60–120 N·m/kg peak, ~20–50 N·m/kg continuous

Thermal, not torque, is usually the real ceiling:
  continuous τ is limited by I²R heating in the windings,
  peak τ is limited by demagnetization and structure.
  You can hit peak for ~seconds; continuous is what you live on.
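
As a quick sanity check on those ranges, the sketch below divides torque by mass at the four corners of the quoted peak-torque and mass bands. The function name is ours, and pairing the extremes is an assumption made only to bracket the result.

```python
def torque_density(torque_nm: float, mass_kg: float) -> float:
    """Joint torque per unit actuator mass, in N·m/kg."""
    if mass_kg <= 0:
        raise ValueError("actuator mass must be positive")
    return torque_nm / mass_kg


# Corners of the quoted bands: peak torque 150-360 N·m, mass 1.5-4 kg.
corners = [(150.0, 1.5), (150.0, 4.0), (360.0, 1.5), (360.0, 4.0)]

for torque_nm, mass_kg in corners:
    density = torque_density(torque_nm, mass_kg)
    print(f"{torque_nm:5.0f} N·m / {mass_kg:3.1f} kg = {density:6.1f} N·m/kg")
```

The realistic pairings are the diagonal ones, since torque and mass rise together: a light 150 N·m unit at 1.5 kg and a heavy 360 N·m unit at 4 kg both land inside the ~60–120 N·m/kg peak band. The off-diagonal corners (a 4 kg actuator that only makes 150 N·m, or a 1.5 kg one that makes 360 N·m) fall well outside it.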

The rotary QDD camp

Quasi-direct-drive (QDD) uses a high-torque BLDC motor with a low single-stage gear ratio (typically 6:1 to 10:1). The low ratio means low reflected inertia and friction, which gives you backdrivability and clean proprioceptive force estimation from motor current — no force sensor needed. This is the MIT Cheetah lineage and is what makes Unitree's quadrupeds and humanoids so dynamic.

  • Pros: transparent, backdrivable, great for impacts and balance, force control "for free," mechanically simple, robust.
  • Cons: low ratio means you need a big motor for high torque, which is heavy and draws a lot of current to hold a static load (no mechanical advantage to lean on). Holding a heavy arm extended is thermally expensive.
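
Two textbook relations sit behind those pros and cons: rotor inertia reflected to the joint scales with the square of the gear ratio, and joint torque can be estimated as motor current times torque constant times ratio. The sketch below applies both. The rotor inertia and torque constant are hypothetical values chosen only to show the scaling, and friction is ignored unless you pass an efficiency.

```python
def reflected_inertia(rotor_inertia_kgm2: float, gear_ratio: float) -> float:
    """Rotor inertia as felt at the joint output (scales with ratio squared)."""
    return rotor_inertia_kgm2 * gear_ratio ** 2


def joint_torque_from_current(current_a: float, kt_nm_per_a: float,
                              gear_ratio: float, efficiency: float = 1.0) -> float:
    """Proprioceptive torque estimate: motor torque (Kt * I) times the ratio."""
    return current_a * kt_nm_per_a * gear_ratio * efficiency


ROTOR_INERTIA = 1e-4  # kg·m², hypothetical
KT = 0.1              # N·m/A, hypothetical

for ratio in (8, 100):  # a QDD-style ratio vs. a high-ratio drive
    inertia = reflected_inertia(ROTOR_INERTIA, ratio)
    torque = joint_torque_from_current(20.0, KT, ratio)
    print(f"ratio {ratio:3d}: reflected inertia {inertia:.4f} kg·m², "
          f"torque at 20 A {torque:.0f} N·m")
```

Going from 8:1 to 100:1 multiplies the reflected inertia by about 156. That is why the low-ratio joint feels transparent and the high-ratio one does not. At high ratios gear friction also corrupts the current-based estimate, which this ideal model does not capture.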

The linear ball-screw camp

A linear actuator — a BLDC motor driving a ball-screw or roller-screw, pushing a rod that levers the joint — trades transparency for efficiency at high static loads. The screw provides huge mechanical advantage, so holding a load draws little current, and the package can be compact and very high-force.

  • Pros: excellent force density, efficient at holding static loads, compact, naturally high stiffness.
  • Cons: poor backdrivability (the screw resists being driven backward), so force control needs a load cell; the screw and its bearings wear; impact loads go straight into the screw nut.
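
The mechanical advantage of a screw can be put in numbers with the standard ideal relation F = 2π·η·T / lead. The sketch below uses it and then converts rod force to joint torque through a lever arm. The 5 mm lead, 90% efficiency, and 50 mm lever arm are assumptions for illustration, and a real lever arm changes with joint angle.

```python
import math


def screw_force_n(motor_torque_nm: float, lead_m: float,
                  efficiency: float = 0.9) -> float:
    """Axial force from a screw driven with the given motor torque."""
    if lead_m <= 0:
        raise ValueError("screw lead must be positive")
    return 2.0 * math.pi * efficiency * motor_torque_nm / lead_m


def joint_torque_nm(rod_force_n: float, lever_arm_m: float) -> float:
    """Joint torque when the rod pushes on a lever of fixed effective length."""
    return rod_force_n * lever_arm_m


force = screw_force_n(motor_torque_nm=1.0, lead_m=0.005)  # 5 mm lead (assumed)
torque = joint_torque_nm(force, lever_arm_m=0.05)         # 50 mm arm (assumed)
print(f"1 N·m at the motor -> {force:.0f} N at the rod -> {torque:.1f} N·m at the joint")
```

Under these assumptions, one newton-metre at the motor becomes over a kilonewton at the rod, which is why a modest motor can hold a heavy static load with little current.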

Optimus's deliberate mix

Tesla's Optimus is the cleanest public example of refusing to pick a side. It reportedly uses both — rotary actuators where backdrivability and range of motion matter, and linear actuators where high static force in a compact envelope matters (notably knees and other high-load joints). Tesla designed its actuators in-house specifically to optimize this mix per-joint, which is a manufacturing and integration bet as much as a control one.

| Approach | Torque/force density | Backdrivable | Static-hold efficiency | Force sensing | Best joints |
|---|---|---|---|---|---|
| Rotary QDD (BLDC + 6–10:1) | High (rotary) | Yes (good) | Poor (current-hungry) | From motor current | Hips, shoulders, ankles, dynamic joints |
| Rotary high-ratio (harmonic) | High, compact | No | Good | Needs torque sensor | Wrists, neck, low-speed precision joints |
| Linear ball/roller-screw | Very high (force) | No (poor) | Excellent | Needs load cell | Knees, high-load lever joints |
| Series-elastic (SEA) | Moderate | Yes | Moderate | From spring deflection | Legs/ankles where impact tolerance matters |

The honest take: There is no universal winner. The right answer is per-joint: QDD where you need to feel the world and survive impacts, screws where you need to hold a heavy static load efficiently, harmonic drives where you need compact precision at low speed. A vendor that uses one technology everywhere has optimized for manufacturing simplicity, not performance.

The thermal trap

The most common field failure mode isn't a torque limit — it's heat. Continuous torque is capped by I²R losses heating the windings; exceed it and you cook the motor or trip thermal derating. A humanoid holding a 5 kg object at arm's length can be drawing near-continuous-limit current with the arm not moving at all. This is why static poses, not dynamic motion, often dominate the thermal budget, and why screw drives (which hold cheaply) are attractive for load-bearing joints.
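
A rough sketch of that 5 kg case shows why a motionless arm can still cook a motor. It computes the gravity torque at the shoulder, the motor current needed to hold it, and the resulting I²R heat. The 0.6 m reach, torque constant, and winding resistance are hypothetical, and the arm's own mass and gear losses are ignored.

```python
G = 9.81  # m/s²


def holding_torque_nm(payload_kg: float, reach_m: float) -> float:
    """Static gravity torque at the shoulder with the arm held horizontal."""
    return payload_kg * G * reach_m


def winding_heat_w(joint_torque_nm: float, gear_ratio: float,
                   kt_nm_per_a: float, resistance_ohm: float) -> float:
    """I²R heat in the windings while holding the given joint torque."""
    current_a = joint_torque_nm / (gear_ratio * kt_nm_per_a)
    return current_a ** 2 * resistance_ohm


KT = 0.1  # N·m/A, hypothetical
R = 0.1   # ohm, hypothetical

torque = holding_torque_nm(payload_kg=5.0, reach_m=0.6)  # 0.6 m reach assumed
for ratio in (8, 100):  # same motor behind a QDD-style vs. a high ratio
    heat = winding_heat_w(torque, ratio, KT, R)
    print(f"ratio {ratio:3d}: hold {torque:.1f} N·m -> {heat:.1f} W of winding heat")
```

Heat scales with the inverse square of the ratio, so the low-ratio joint dissipates roughly 156 times more for the same static load in this idealized comparison. The absolute watts depend entirely on the assumed motor constants.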

Hands & manipulation hardware

The hand is where humanoids go to die. It is simultaneously the highest-value subsystem (manipulation is the point) and the hardest, most expensive, least mature one. See the end-effectors & grippers guide and the robot sensors guide for the broader landscape; here's the humanoid-specific picture.

Why hands are so hard

A human hand has ~27 DoF, dozens of muscles, thousands of mechanoreceptors, and a control system tuned over a lifetime. It does fine force control, in-hand manipulation, and tactile inference simultaneously. Replicating even a fraction of that inside a ~0.5 kg package the size of a real hand, while routing actuation and sensing, is genuinely at the frontier.

The tradeoffs stack against you: more fingers and joints mean more actuators (and you can't fit motors in the fingers — they're too small), so you move actuation to the forearm and transmit it down. Both transmission methods have costs.

Tendon vs. linkage drives

  • Tendon-driven (cables routed over pulleys, motors in the forearm) — this is how human hands work and how most high-DoF robot hands work (Shadow Hand, many research hands, 1X Neo). Pros: compact fingers, biomimetic, can be lightweight and compliant. Cons: cables stretch, fray, and need tensioning; friction and routing make precise force control hard; maintenance is real.
  • Linkage-driven (rigid four-bar and gear linkages) — motors drive mechanical linkages directly. Pros: stiff, precise, durable, no cable maintenance. Cons: bulkier, fewer independent DoF for the volume, less compliant.

Most production humanoid hands deliberately keep DoF low: a 6-DoF hand (one actuator per finger plus thumb opposition) covers a large share of everyday grasps at a fraction of the cost and control burden of a 16–22-DoF hand. Capability per dollar falls off steeply past simple grasping.

Tactile sensing

Vision alone cannot tell you grip force, slip, or contact location when the hand occludes the object. Tactile sensing is essential for dexterous manipulation and is itself immature:

  • Force/torque at the wrist — cheap, coarse, common.
  • Fingertip force sensors — strain gauges or barometric/MEMS sensors per fingertip.
  • High-resolution optical tactile (GelSight-style, where a camera images a deformable gel) — rich contact geometry and slip detection, but bulky and adds a camera per fingertip.

Cost reality

| Subsystem | Rough share of a humanoid BoM | Why |
|---|---|---|
| Two dexterous hands | 15–30% | High DoF, tiny precision actuators, tactile sensing, low-volume |
| Leg actuators (×2 legs) | 20–30% | High-torque motors + gearboxes/screws, the most mass |
| Arm actuators (×2 arms) | 10–20% | 7 DoF each, moderate torque |
| Battery pack | 5–10% | Cells + BMS + thermal |
| Compute | 5–10% | AI SoC/GPU + RT controller |
| Sensors (cameras/IMU/F-T) | 5–10% | Mostly commoditized |
| Structure/skin/wiring/assembly | 15–25% | Frame, covers, harness, labor |

The honest take: A pair of genuinely dexterous hands can cost as much as both legs. That's why almost every shipping humanoid runs simplified hands and saves the 20-DoF marvel for the demo reel. If a robot is doing real work in 2026, look at its hands — they're probably grippers wearing finger-shaped covers.

Bipedal locomotion hardware

Bipedal walking is the canonical humanoid party trick, and it is both more solved and less solved than it looks. For the broader legged landscape and where quadrupeds win, see the legged & quadruped robot hardware guide.

The leg

A humanoid leg is typically 6 DoF: 3 at the hip (yaw, roll, pitch), 1 at the knee (pitch), 2 at the ankle (pitch, roll). The hip and knee carry the highest torque demands — a knee actuator on a 70 kg robot may need 150–360 N·m peak to stand up from a squat or absorb a landing. This is exactly where linear screw actuators earn their place: high static-hold force, efficiently.

The ankle is special. Two DoF (pitch + roll) let the foot stay flat on uneven ground and let the robot shift its center of pressure within the foot — the primary fine balance authority. Some designs put the ankle actuators up near the knee and use linkages to keep distal mass (and thus leg inertia) low, which improves swing dynamics. Distal mass is the enemy: every kg at the ankle is a kg the hip must accelerate every step.
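
The distal-mass penalty follows from the point-mass inertia relation I = m·r²: the same kilogram costs more swing torque the further it sits from the hip. The sketch below compares two placements. The distances and the swing acceleration are hypothetical.

```python
def swing_inertia_kgm2(mass_kg: float, distance_from_hip_m: float) -> float:
    """Inertia of a point mass about the hip pitch axis."""
    return mass_kg * distance_from_hip_m ** 2


def swing_torque_nm(inertia_kgm2: float, angular_accel_rad_s2: float) -> float:
    """Hip torque needed just to accelerate that inertia."""
    return inertia_kgm2 * angular_accel_rad_s2


ALPHA = 40.0  # rad/s², hypothetical swing-leg acceleration

for label, distance in (("at the ankle", 0.80), ("near the knee", 0.45)):
    inertia = swing_inertia_kgm2(1.0, distance)
    print(f"1 kg {label} ({distance} m): {inertia:.3f} kg·m², "
          f"{swing_torque_nm(inertia, ALPHA):.1f} N·m extra at the hip")
```

Moving the kilogram from 0.80 m to 0.45 m cuts its inertia to about a third, which is the payoff from relocating ankle actuators up the shank.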

Why "solved" walking isn't robust walking

Flat-floor walking with known geometry is a controls exercise that's been demonstrated for years. Robust walking — over debris, slopes, stairs, soft ground, while carrying a variable load and being shoved by a person — is where humanoids still fall. The hardware needs:

  • Fast, backdrivable joints to react to disturbances within milliseconds (QDD or SEA help here).
  • Good foot force sensing to know when and how hard each foot contacts.
  • Whole-body control (WBC) running at high rate to coordinate all ~28 joints to keep the center of mass over a viable support region.

ZMP, WBC, and what the hardware must enable

Classical bipeds used the Zero Moment Point (ZMP) criterion — keep the point where ground-reaction forces produce no horizontal moment inside the support polygon (the foot, or the convex hull of both feet). ZMP gives the flat-footed, knees-bent, slightly robotic gait of older humanoids. It's reliable and conservative.
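
The ZMP criterion can be sketched with the common cart-table simplification: constant center-of-mass height, no angular momentum, so ZMP = CoM position − (height / g) × CoM acceleration. This is a textbook approximation, not any vendor's controller. The foot dimensions and CoM values below are hypothetical.

```python
G = 9.81  # m/s²


def zmp_cart_table(com_xy, com_acc_xy, com_height_m):
    """ZMP on flat ground under the cart-table (constant-height CoM) model."""
    return tuple(p - (com_height_m / G) * a for p, a in zip(com_xy, com_acc_xy))


def inside_convex_polygon(point, vertices):
    """True if point lies in a convex polygon given counter-clockwise vertices."""
    px, py = point
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        # A negative cross product means the point is to the right of this edge.
        if (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1) < 0:
            return False
    return True


# Single-support polygon: one 0.25 m x 0.10 m foot centred at the origin.
FOOT = [(-0.125, -0.05), (0.125, -0.05), (0.125, 0.05), (-0.125, 0.05)]

for accel in (1.0, 2.0):  # forward CoM acceleration in m/s²
    zmp = zmp_cart_table((0.0, 0.0), (accel, 0.0), com_height_m=0.9)
    stable = inside_convex_polygon(zmp, FOOT)
    print(f"accel {accel} m/s²: ZMP x = {zmp[0]:+.3f} m, inside foot: {stable}")
```

With a 0.9 m CoM height, 1 m/s² keeps the ZMP inside the foot and 2 m/s² pushes it past the heel. That narrow margin is why ZMP gaits look so cautious.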

Modern dynamic humanoids use whole-body control and model-predictive control (MPC), treating the whole robot as a coupled dynamic system and planning ground-reaction forces over a short horizon. This allows toe-off, heel-strike, running, and recovery from large pushes — but it demands hardware that classical methods didn't: torque-controllable joints (not just position), fast force sensing, and the real-time compute to solve the optimization at 100–1000 Hz. See the real-time control systems guide for why that timing budget is unforgiving.

Rule of thumb: If a humanoid walks flat-footed with permanently bent knees, it's running a conservative ZMP-style controller. If it heel-strikes, toes-off, and recovers from a shove, it's running torque-level WBC/MPC — and its joints can do force control. The gait tells you the control stack.

The sensing suite

A humanoid's sensing needs split into two jobs: proprioception (knowing its own body state, for balance and control) and exteroception (perceiving the world, for navigation and manipulation). For the full taxonomy see the robot sensors guide and, for the cameras specifically, the LiDAR & depth cameras guide.

Proprioception (the fast, essential layer)

  • Joint position encoders — one per joint, usually magnetic absolute encoders, feeding the kHz control loop. Non-negotiable.
  • Joint torque sensing — either dedicated torque sensors (harmonic-drive joints) or estimated from motor current (QDD joints). This is what enables force control and compliance.
  • IMU(s) — a 6- or 9-axis inertial measurement unit (often in the torso/pelvis) gives body orientation and angular rate, the backbone of balance. High-end designs run multiple IMUs for redundancy and to estimate limb states.
  • Foot force / contact sensors — load cells or pressure arrays in the soles to detect contact timing and force distribution. Critical for walking; surprisingly often skimped on.

Exteroception (the slow, AI-facing layer)

  • RGB cameras — multiple, for the VLA model's eyes. Figure and Tesla lean heavily on cameras over LiDAR (the Tesla "vision-first" philosophy carried over).
  • Depth — stereo cameras or structured-light/ToF depth in the head and sometimes chest, for obstacle and object geometry. Some humanoids add a head LiDAR for mapping; many skip it to save mass and cost.
  • Hand/wrist cameras — close-range cameras for manipulation, since the head camera is occluded by the robot's own arms during a grasp.

Sensing rate budget (representative)

Joint encoders / IMU:     1–10 kHz   → real-time control loop
Foot force / joint torque: 1 kHz     → balance / WBC
Depth cameras:             30–90 Hz  → perception / mapping
RGB to VLA model:          1–30 Hz   → high-level policy

The control loop is ~1000× faster than the "thinking" loop.
That split is the whole architecture of the machine.

The honest take: Proprioception is mature and cheap; you can buy excellent encoders and IMUs. The hard, expensive, immature sensing is tactile (covered with hands) and the fusion of vision into reliable action. Adding more cameras is easy; making the robot reliably understand what it sees is not.

Power & thermal

Runtime is the constraint that the launch videos quietly omit. A humanoid is a power-hungry machine carrying its own battery, and the physics is unforgiving. See the robot power & batteries guide for the cell-level detail.

The numbers

Power budget — representative 60–70 kg humanoid

Standing / idle (holding pose):   ~150–500 W
Walking (no load):                ~500–1500 W
Manipulation under load / lifting: ~1–3 kW peak
Compute (AI SoC + controllers):    ~100–500 W (constant!)

Battery pack:                      ~1.0–2.3 kWh
→ Runtime: ~1–5 hr depending on duty cycle

Energetics check:
  2 kWh pack / 600 W average draw ≈ 3.3 hr
  2 kWh pack / 1500 W heavy work  ≈ 1.3 hr

Two things stand out. First, compute is a constant tax — a few hundred watts that never stops, even standing still, which is why an idle humanoid still drains. Second, standing is not free: holding a pose draws real current in QDD joints (the thermal trap again), so even "doing nothing" costs watts. Atlas-class robots doing dynamic motion can spike to several kW.
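
The energetics check generalizes to any duty cycle: weight each mode's power by the fraction of time spent in it, add the constant compute draw, and divide the pack energy by the result. In the sketch below, the mode powers are picked from within the ranges above, and the time fractions are hypothetical.

```python
def average_power_w(duty_cycle: dict, compute_w: float) -> float:
    """Time-weighted actuation power plus the always-on compute draw.

    duty_cycle maps a mode name to (fraction_of_time, watts).
    """
    total_fraction = sum(fraction for fraction, _ in duty_cycle.values())
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("duty-cycle fractions must sum to 1")
    return compute_w + sum(fraction * watts for fraction, watts in duty_cycle.values())


def runtime_hr(pack_kwh: float, avg_power_w: float) -> float:
    """Hours of operation, ignoring usable-capacity and conversion losses."""
    return pack_kwh * 1000.0 / avg_power_w


# Hypothetical warehouse-style shift mix.
shift = {
    "standing": (0.5, 300.0),
    "walking": (0.3, 1000.0),
    "lifting": (0.2, 2000.0),
}
avg = average_power_w(shift, compute_w=200.0)
print(f"average draw {avg:.0f} W -> {runtime_hr(2.0, avg):.1f} hr on a 2 kWh pack")
```

The same `runtime_hr` reproduces the two reference points above: 600 W gives about 3.3 hr and 1500 W about 1.3 hr from a 2 kWh pack.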

Why runtime is hard to fix

You can't just add battery — every kWh of lithium-ion is ~5–7 kg of mass the robot must then carry and accelerate, which raises every actuator's load, which raises power draw. There's a point of diminishing returns around 2–2.5 kWh for a human-sized robot. The practical answers are:

  • Hot-swappable packs (Apptronik Apollo's approach) — a human or a dock swaps a fresh pack in under a minute, so the robot's duty cycle approaches 24/7 even if a single charge is ~4 hr.
  • Opportunity charging / docking — the robot returns to a charger between tasks.
  • Tethering — viable for fixed industrial cells, useless for mobile work.
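
The mass spiral behind those workarounds can be illustrated with a deliberately crude model: average power is a fixed compute draw plus a term proportional to total mass, and every kWh added brings its own kilograms. Only the 6 kg/kWh figure comes from the range quoted above; the other parameters are hypothetical. The linear form also understates the penalty, since it ignores heavier structure and larger actuators.

```python
def runtime_hr(pack_kwh: float,
               base_mass_kg: float = 60.0,  # robot without its pack (hypothetical)
               kg_per_kwh: float = 6.0,     # middle of the ~5-7 kg/kWh range above
               fixed_w: float = 200.0,      # compute draw, mass-independent (hypothetical)
               w_per_kg: float = 8.0) -> float:  # draw per kg carried (hypothetical)
    """Runtime when average power grows linearly with total carried mass."""
    total_mass_kg = base_mass_kg + kg_per_kwh * pack_kwh
    avg_power_w = fixed_w + w_per_kg * total_mass_kg
    return pack_kwh * 1000.0 / avg_power_w


previous = 0.0
for pack_kwh in (1.0, 2.0, 3.0, 4.0):
    hours = runtime_hr(pack_kwh)
    print(f"{pack_kwh:.0f} kWh: {hours:.2f} hr (+{hours - previous:.2f} hr for this kWh)")
    previous = hours
```

Each added kWh buys less runtime than the one before it. This toy model does not reproduce a sharp knee at 2–2.5 kWh; it only shows the direction of the effect.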

Thermal management

Beyond batteries, the actuators and compute generate heat that must go somewhere. Most 2026 humanoids use a mix of passive conduction through the structure, forced-air fans, and (increasingly) liquid cooling loops for the highest-power leg actuators and the AI compute. Thermal derating — the controller throttling torque to protect a hot motor — is a real and under-discussed limit on sustained work.

The honest take: "It walked for the whole demo" usually means ~1–4 hours of mixed activity, not a shift. Anyone promising all-day continuous operation from a single charge in a human-sized package is fighting energy density, and energy density isn't improving fast enough to win that fight in 2026. The realistic model is swap-and-charge, not run-forever.

Onboard compute

A humanoid runs two fundamentally different computers, often physically separate, because their requirements conflict. See the real-time control systems guide for why you cannot run both jobs on one stack.

The split

  • Real-time control layer — runs the joint loops, balance, and whole-body control at 1–10 kHz with hard deadlines. A missed deadline can mean a fall. This runs on microcontrollers (per-joint) and a central real-time SoC or RTOS host, deterministically. It does not run a general-purpose OS for the critical path.
  • AI inference layer — runs the VLA model, perception, and planning at 1–30 Hz, soft real-time, on a GPU/AI SoC. Latency matters but a hiccup degrades behavior rather than dropping the robot.

This is the classic "fast reflexes, slow deliberation" architecture, and it mirrors the sensing-rate split from earlier: the control loop is ~1000× faster than the thinking loop.
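
A single-threaded toy simulation makes the rate split visible: the fast loop runs every tick and simply reuses the latest target from the slow loop. Real robots run the two layers on separate processors with hard deadlines, which this sketch does not model. The policy and controller below are hypothetical stand-ins, and the rates are picked from the bands quoted above.

```python
CONTROL_HZ = 1000  # within the 1-10 kHz band quoted above
POLICY_HZ = 10     # within the 1-30 Hz band quoted above


def slow_policy(tick: int) -> float:
    """Stand-in for VLA inference: emits a new joint target (hypothetical)."""
    return 0.001 * tick


def fast_control(target: float, position: float) -> float:
    """Stand-in for a joint loop: one proportional step toward the target."""
    return position + 0.05 * (target - position)


def simulate(seconds: float) -> tuple:
    """Return (control_steps, policy_steps) executed over the interval."""
    ticks_per_policy = CONTROL_HZ // POLICY_HZ
    control_steps = policy_steps = 0
    target = position = 0.0
    for tick in range(round(seconds * CONTROL_HZ)):
        if tick % ticks_per_policy == 0:
            target = slow_policy(tick)  # slow layer updates the goal
            policy_steps += 1
        position = fast_control(target, position)  # fast layer always runs
        control_steps += 1
    return control_steps, policy_steps


control_steps, policy_steps = simulate(1.0)
print(f"{control_steps} control steps per {policy_steps} policy updates")
```

At 10 Hz the ratio is 100:1; drop the policy to 1 Hz and it becomes the 1000:1 figure. Either way, the control layer must stay stable on a stale target between updates.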

The silicon

The AI layer in 2026 commonly runs on NVIDIA Jetson Thor class hardware (high TOPS, automotive/robotics-grade, ~tens to low-hundreds of watts) or custom in-house silicon (Tesla, for instance, leverages its own inference accelerators). The numbers vendors care about:

  • TOPS / FLOPS for VLA inference throughput.
  • Memory bandwidth and capacity — modern VLA models are large; getting them on-device and fast is a real constraint.
  • Power and thermal — every watt of compute is a watt off the battery and heat to reject (see the power section).

The real-time layer is unglamorous by comparison — ARM Cortex-R/M class microcontrollers and a deterministic bus (EtherCAT, CAN-FD, or a custom high-rate link) tying the joints together.

Rule of thumb: If a humanoid's AI compute is on-board (not streamed to a server), it's spending 100–500 W continuously and rejecting that as heat. Cloud-offloading the AI saves power and heat but adds latency and a connectivity dependency that's unacceptable for balance-critical loops — which is why the control layer is always local, no matter what.

The teleoperation reality

This is the section the rest of the industry would prefer you skip. Teleoperation — a human remotely driving the robot, often via a VR headset and hand-tracking gloves or a motion-capture rig — is pervasive in humanoid robotics, and it plays two very different roles.

The legitimate role: data collection

VLA models need demonstrations — thousands of hours of a robot doing the task, with the exact sensor inputs and motor outputs. The cleanest way to generate that data is to have a human teleoperate the actual robot through the task many times. The robot's body experiences the real physics; the human provides the intelligence; the recordings train the policy. This is honest, necessary, and how most current manipulation policies are bootstrapped. Sanctuary, 1X, Figure, and Tesla all run large teleop data operations.

The dishonest role: faking autonomy

The same teleop rig, pointed at a camera, produces a video of a robot "autonomously" folding laundry or fetching a drink — when in fact a person in the next room is driving every motion. Sometimes it's disclosed in fine print; often it isn't. Other times the demo is genuinely autonomous but is a narrow policy that only works on that exact scene, lighting, and object set, and would fail if you moved a cup 10 cm.

How to read a humanoid demo critically

The honest take — the teleop tell-sheet:

  • Smooth, confident, human-paced manipulation with no hesitation? Likely teleoperated. Autonomous policies in 2026 are jerky, slow, and pause to "think."
  • A single uncut take of a long task chain? Strong autonomy signal — or strong teleop signal. Look closer.
  • No mention of autonomy in the caption? Assume teleop. Companies that achieve autonomy say so loudly and specifically.
  • The robot recovers from an unexpected perturbation (someone moves an object mid-task)? That's hard to fake and a real autonomy signal.
  • Cuts between every action? Each segment may be a separate take, retried until it worked.
  • "X% autonomous" or "speed 1.0x" captions? Companies started adding these because the credibility problem got bad enough to address. Reward the disclosure; don't assume its absence means autonomy.
  • Same scene, same objects, same lighting every time? Probably a scene-specific policy, not generalization.

None of this means teleop is bad — it's a vital tool. It means you should never infer autonomy from a demo without explicit, specific disclosure. The gap between "the robot can physically do this" and "the robot decided to do this by itself" is the entire unsolved problem, and demos are designed to blur it.

Manufacturing & cost

The thesis that makes humanoids an investable category is cost at volume: that a useful humanoid can be built for under $50k, and eventually under $20k, putting it below the multi-year cost of the human labor it might augment. Whether that's true is a manufacturing question, and manufacturing is where Tesla and the automakers think they have an edge.

Where the money goes

From the BoM table earlier, actuators and hands dominate — together commonly 50–70% of hardware cost. This is the opposite of consumer electronics, where silicon dominates. A humanoid is an electromechanical product, so its cost curve is set by motors, gearboxes, screws, bearings, and precision assembly — not by chips, which are comparatively cheap and commoditized.

The levers to <$50k

  • Vertical integration of actuators. Buying off-the-shelf harmonic drives and servo motors is expensive at low volume. Designing your own actuators (Tesla, Figure, Boston Dynamics) lets you optimize per-joint, remove margin stacking, and design for high-volume production. This is the single biggest cost lever.
  • Design for manufacture (DfM). Reducing part count, using castings/stampings over machined parts, standardizing actuators across joints (one or two actuator "sizes" reused everywhere), and minimizing fasteners and wiring.
  • Volume. Most of the <$20k story is amortization — tooling, automation, and supply-chain scale that only pay off at tens of thousands of units per year. At hundreds of units, every humanoid is effectively hand-built and costs 5–10× the target.
  • Simplify the hard parts. The fastest way to cut the BoM is to ship simpler hands and fewer DoF. Much of the price spread between robots is a hand-complexity decision.
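
The volume lever above can be made concrete with a minimal amortization sketch. The dollar figures below are illustrative placeholders, not numbers from this guide or from any vendor, and `unit_cost` is a hypothetical helper. The only thing the sketch models is a fixed annual cost (tooling, automation, engineering) spread across the units built that year.

```python
def unit_cost(variable_cost, fixed_cost_per_year, units_per_year):
    """Per-unit cost: variable cost plus annual fixed cost spread across units."""
    if units_per_year <= 0:
        raise ValueError("units_per_year must be positive")
    return variable_cost + fixed_cost_per_year / units_per_year


# Placeholder inputs, NOT figures from any vendor or from this guide.
VARIABLE_COST = 15_000            # USD of parts and assembly labor per robot (assumed)
FIXED_COST_PER_YEAR = 60_000_000  # USD of tooling, automation, engineering (assumed)
TARGET_PRICE = 20_000             # USD, the long-run target discussed above

for volume in (500, 5_000, 50_000):
    cost = unit_cost(VARIABLE_COST, FIXED_COST_PER_YEAR, volume)
    print(f"{volume:>6} units/yr: ${cost:,.0f} per unit "
          f"({cost / TARGET_PRICE:.1f}x target)")
```

With these assumed inputs the per-unit cost at hundreds of units lands several times above the target, and only approaches it at tens of thousands of units per year. A real model would also let the variable cost fall with volume, which this sketch deliberately ignores.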

What does not drive cost down

Exotic materials and clever lightweighting are mostly a distraction at this stage: carbon fiber and titanium add cost rather than remove it. The robots winning on cost (Unitree) win through aggressive supply-chain leverage and by accepting lower-end performance, not through materials science.

The honest take: The <$20k humanoid is a volume claim, not a technology claim. The technology to build a $20k humanoid exists today; the volume to make it cost $20k does not. Until someone is shipping tens of thousands per year, treat sub-$30k price tags as roadmap, not reality. Unitree's ~$16k G1 is real, but it's a lightweight research platform, not a 25 kg-payload labor robot — different product, different cost basis.

The 2026→2027 outlook

Putting the subsystems together, here's a defensible read on where this goes near-term.

What's real

  • The hardware works. Walking, balancing, two-arm coordination, basic grasping, and dynamic recovery are demonstrated and reproducible across multiple vendors. The body is no longer the blocker.
  • Structured commercial deployment. Warehouses, fixed manufacturing cells, and other bounded environments will see real, paid humanoid (and humanoid-adjacent) work expand. Agility Digit is the template: pick a narrow job, nail it, scale it.
  • Teleop-driven data flywheels. The companies collecting the most real-robot demonstration data are building a genuine moat, because that data trains the policies that close the autonomy gap.

What's hype

  • The general home robot. A humanoid that autonomously handles arbitrary household tasks reliably is not a 2026–2027 product. The unstructured home is the hardest environment and the furthest from being solved.
  • Sub-$20k price tags at useful capability. Roadmap, not reality, until volume manufacturing exists.
  • Most "autonomous" manipulation reels. See the teleop section. Discount accordingly.

Where the bottlenecks are

The bottleneck has moved off the actuator and onto software and data:

  1. Generalization — policies that work outside their training distribution. This is the big one.
  2. Manipulation reliability — dexterous, robust grasping of arbitrary objects, which needs better hands and better tactile-informed policies.
  3. Data — enough high-quality real-robot demonstrations to train general policies, which is why teleop data ops are a strategic asset.
  4. Cost-at-volume — a manufacturing and capital problem, downstream of demand that depends on (1)–(3).

The honest take for 2026→2027: Expect impressive but narrow-scope commercial deployments and continued spectacular demos. Expect the autonomy gap to close gradually, not in a single breakthrough. The companies that win will be the ones quietly grinding on data and reliability in boring structured environments, not the ones with the best laundry-folding video. The hardware race is largely over; the data-and-software race is just getting started.

Frequently asked questions

How many degrees of freedom does a typical humanoid robot have? Most capable 2026 humanoids have 28–60 actuated DoF. The body (legs, arms, torso, neck) is usually ~28–32 DoF; hands can add anywhere from 12 (two simple 6-DoF hands) to 40+ (two anthropomorphic hands), which is why total counts vary so widely. When comparing robots, separate body DoF from hand DoF — vendors inflate headline numbers with finger joints.

What is the hardest part of building a humanoid robot? The hardware answer is actuators (torque density, efficiency, backdrivability, thermal limits) and hands (dexterity in a tiny, expensive package). The system answer is autonomy — letting the robot reliably decide and execute tasks in unstructured environments. In 2026 the body is largely solved; the brain and the data to train it are the bottleneck.

Are humanoid robot demos real or teleoperated? Many are teleoperated, either openly (as legitimate data collection) or misleadingly (faking autonomy). Smooth, fast, confident manipulation with no hesitation is a teleop tell; jerky, slow, pausing behavior and recovery from unexpected perturbations are autonomy signals. Never infer autonomy without explicit, specific disclosure.

Why rotary vs. linear actuators in humanoids? Rotary quasi-direct-drive (QDD) actuators are backdrivable and give force control "for free" from motor current — great for dynamic, contact-rich joints (hips, ankles, shoulders). Linear ball-screw actuators give very high force density and hold static loads efficiently — great for high-load joints like knees. Tesla's Optimus deliberately uses both, choosing per-joint. There's no single winner.

How long can a humanoid robot run on one charge? Typically 1–5 hours, depending on duty cycle, from a ~1–2.3 kWh battery. Standing draws a few hundred watts (including constant compute), walking ~0.5–1.5 kW, and heavy manipulation can spike to several kW. Continuous all-day operation realistically requires hot-swappable battery packs or docking, not a single charge.
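
The runtime range in that answer follows from dividing battery energy by average power draw. Here is a minimal sketch of that arithmetic. The duty-cycle split and the exact per-mode power values are assumptions picked from inside the ranges quoted above, and `runtime_hours` and `usable_fraction` are hypothetical names introduced for illustration.

```python
def runtime_hours(battery_kwh, duty_cycle, usable_fraction=1.0):
    """Estimate runtime from battery energy and a time-weighted power draw.

    duty_cycle maps a mode name to (fraction_of_time, power_kw).
    usable_fraction is an assumed derating for reserve and conversion losses.
    """
    total = sum(fraction for fraction, _ in duty_cycle.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("duty-cycle fractions must sum to 1")
    average_kw = sum(fraction * power_kw
                     for fraction, power_kw in duty_cycle.values())
    return battery_kwh * usable_fraction / average_kw


# Assumed mix: mostly standing and walking, with occasional heavy manipulation.
mixed_shift = {
    "standing": (0.5, 0.3),      # "a few hundred watts", taken here as 0.3 kW
    "walking": (0.4, 1.0),       # inside the ~0.5-1.5 kW range
    "manipulation": (0.1, 2.0),  # assumed average during heavy manipulation
}

print(f"{runtime_hours(2.3, mixed_shift):.1f} h on a 2.3 kWh pack")
print(f"{runtime_hours(1.0, {'walking': (1.0, 1.0)}):.1f} h walking on a 1 kWh pack")
```

Shifting the mix toward walking or manipulation shortens runtime quickly, which is why duty cycle matters more than the headline battery capacity.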

How much do humanoid robots cost in 2026? Research platforms like Unitree G1 start around $16k; capable labor-oriented humanoids cost far more (Unitree H1 ~$90k+; others undisclosed). Targets of <$50k and eventually <$20k are volume-manufacturing claims that depend on producing tens of thousands of units per year; they are roadmap, not 2026 pricing for a high-payload robot.

What sensors does a humanoid robot use? Two layers. Proprioception (fast, essential): joint encoders, joint torque sensing or motor-current estimation, one or more IMUs, and foot force/contact sensors. Exteroception (for AI): multiple RGB cameras, depth (stereo/ToF, sometimes head LiDAR), and wrist/hand cameras for manipulation. Proprioception is mature and cheap; tactile sensing and vision-to-action fusion are the hard, immature parts.

Why are robot hands so difficult and expensive? You can't fit motors in human-sized fingers, so actuation moves to the forearm and transmits via tendons (compact but maintenance-heavy) or linkages (durable but bulky). Add tactile sensing, high DoF, and low production volume, and a pair of dexterous hands can cost as much as both legs. Most shipping humanoids use simplified hands precisely because the cost-and-control burden of full dexterity isn't yet worth it.

Is bipedal walking a solved problem? Flat-floor walking is essentially solved and has been for years. Robust walking (over debris, slopes, and stairs, while carrying a load and resisting pushes) is not. It requires torque-controllable joints, fast foot-force sensing, and whole-body/model-predictive control running at a high rate. If a robot heel-strikes and recovers from shoves, it's running modern torque-level control; if it walks flat-footed with bent knees, it's running a conservative ZMP-style controller.

What compute does a humanoid need? Two computers. A real-time control layer (1–10 kHz, hard deadlines, on MCUs/RTOS) handles balance and joint control, and an AI inference layer (1–30 Hz, soft real-time, on a GPU/SoC like NVIDIA Jetson Thor or custom silicon) runs the VLA model and planning. The control loop runs roughly 100–1000× faster than the thinking loop, and the AI layer draws 100–500 W continuously.
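
The two-rate split can be illustrated with a small single-threaded simulation. This is only a sketch under stated assumptions: a real robot runs the two layers on separate processors with hard deadlines, and the 1 kHz and 10 Hz rates, the `slow_policy` stand-in, and the first-order tracking law are all hypothetical choices made for illustration.

```python
import math

CONTROL_HZ = 1000  # assumed control-loop rate, within the 1-10 kHz range
POLICY_HZ = 10     # assumed policy rate, within the 1-30 Hz range


def slow_policy(t):
    """Stand-in for the AI layer: returns a new joint setpoint at time t."""
    return math.sin(t)


def run(duration_s, control_hz=CONTROL_HZ, policy_hz=POLICY_HZ):
    """Simulate a fast control loop tracking setpoints from a slow policy."""
    if control_hz % policy_hz != 0:
        raise ValueError("control_hz must be a multiple of policy_hz")
    ticks_per_policy = control_hz // policy_hz
    target = 0.0
    position = 0.0
    control_steps = 0
    policy_steps = 0
    for tick in range(round(duration_s * control_hz)):
        if tick % ticks_per_policy == 0:
            # Slow layer: refresh the setpoint only every ticks_per_policy ticks.
            target = slow_policy(tick / control_hz)
            policy_steps += 1
        # Fast layer: every tick, move a little toward the latest setpoint
        # (placeholder first-order tracking, not a real joint controller).
        position += 0.05 * (target - position)
        control_steps += 1
    return control_steps, policy_steps


control_steps, policy_steps = run(2.0)
print(f"{control_steps} control ticks, {policy_steps} policy updates, "
      f"{control_steps // policy_steps} ticks per update")
```

The point of the structure is that the fast loop never waits on the slow one: it keeps acting on the most recent setpoint, so a late policy update degrades task performance but not balance.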

Which humanoid robot is the most advanced? "Advanced" depends on the axis. Boston Dynamics Atlas (electric) leads on dynamic athleticism and range of motion; Tesla Optimus and Figure lead on the manufacturing-and-AI integration thesis; Unitree leads on cost and accessibility. Commercially, Agility Digit is furthest along in paid real-world deployment precisely because it targets a narrow, structured warehouse job rather than general capability.

Will humanoids replace human workers in 2026–2027? Not broadly. Expect them in bounded, structured commercial settings (warehouses, fixed manufacturing cells) where the task is well-defined, and slow progress in open-ended environments like homes. The bottleneck is autonomy and reliability, not bodies. Treat near-term deployment as task-specific augmentation, not general labor replacement.

Related guides