
Robot Simulation & Digital Twins: The Ultimate Guide

A working roboticist's deep guide to robot simulation and digital twins in 2026: physics engines, Gazebo vs Isaac Sim vs MuJoCo vs PyBullet, GPU-parallel sim, sensor models, the reality gap, sim-to-real, and how to choose.

By Robo2u Editorial · 39 min read

Every robot you have ever shipped was simulated first, whether you admit it or not. The cheap version is a spreadsheet of torque-speed curves and a back-of-the-envelope battery estimate. The expensive version is a multi-body dynamics engine running a contact solver at 1 kHz, feeding synthetic lidar returns and camera frames into the exact same ROS 2 stack that will run on the robot. The gap between those two is the subject of this guide.

This is about robot simulation — modeling a robot and its environment in software well enough to design, test, and train on it — and its overhyped cousin, the digital twin. We will start from why you simulate at all, go down into the physics engines (rigid-body dynamics, the contact problem, solvers, timestep), compare the simulators engineers actually run (Gazebo, NVIDIA Isaac Sim and Isaac Lab, MuJoCo, PyBullet, Webots, CoppeliaSim), look at fidelity-versus-speed and the real-time factor, work through sensor and rendering simulation, then the thing that changed robot learning — GPU-accelerated massively-parallel sim — and finally the hard part: the reality gap, sim-to-real, what a digital twin actually is versus what the marketing says, and when the simulator is quietly lying to you.

The take: in 2026 simulation is not optional and it is not one tool. You will run at least two simulators — a high-throughput GPU sim (Isaac Lab or MuJoCo) to train policies on millions of trajectories, and a higher-fidelity, ROS-native sim (Gazebo or Isaac Sim) to integrate and regression-test the full software stack before it touches hardware. The single biggest source of sim-to-real failure is not the renderer and not the robot model; it is contact and friction, because that is the one part of the physics every engine approximates differently and none gets exactly right. Spend your fidelity budget there. And stop calling an offline simulation a "digital twin" — a twin is synchronized with a real asset in real time, and if yours is not, it is just a sim with a nicer dashboard.

Companion reading: reinforcement learning for robotics, motion planning & kinematics, ROS 2, legged & quadruped robot hardware, humanoid robot hardware, and LiDAR & depth cameras.

Table of contents

  1. Key takeaways
  2. Why simulate at all
  3. Physics engines: rigid-body dynamics
  4. The contact problem (why sims disagree)
  5. The major simulators compared
  6. Fidelity vs speed and the real-time factor
  7. Rendering and sensor simulation
  8. GPU-accelerated massively-parallel sim
  9. The reality gap and sim-to-real
  10. Digital twins: what the word actually means
  11. When the simulation lies
  12. Validation and CI in simulation
  13. Selecting a simulation stack
  14. Frequently asked questions

Why simulate at all

Before the tools, the motivation. There are five reasons to simulate, and they are not equally important for every team.

Cost. Robots are expensive and fragile. A 7 kg quadruped that falls off a ledge because of a controller bug is a 5,000 USD repair and a week of downtime. In sim, that same fall costs you a log file. The asymmetry is largest in early-stage development, when the controller is most likely to be buggy.

Safety. Some failures you cannot afford to discover on hardware: a 30 kg industrial arm swinging through where a person stands, a humanoid losing balance near a workbench, a mobile robot at 2 m/s testing its emergency stop. You validate the dangerous envelope in sim first, then narrow the hardware test to the cases that passed.

Scale. You cannot run 1,000 robots in a lab. You can run 1,000 — or 4,096, or 16,384 — simulated robots on one GPU. Scale matters for two things: statistical coverage of edge cases (run the docking maneuver 10,000 times with randomized start poses) and, more importantly, for learning.

Reinforcement-learning data. This is the reason simulation went from "useful" to "indispensable" in the last several years. RL needs millions to billions of environment steps. You cannot collect that on hardware — it would take years and destroy the robot. GPU sim generates it in hours. See reinforcement learning for robotics for the policy side; this guide is the environment side.

Regression testing. Once a system works, the job becomes keeping it working as the code changes. A simulation gives you a repeatable environment to re-run the same scenarios on every commit. This is the least glamorous reason and arguably the highest-value one for a shipping product.

Rule of thumb: if a test is dangerous, slow to set up, hard to repeat, or needs to run thousands of times, it belongs in simulation. If it depends on the exact physics your sim approximates worst — fine contact, deformables, real sensor noise — keep a hardware version too.

What simulation does not do is replace hardware testing. It de-risks it, front-loads it, and amplifies it. The teams that get burned are the ones who treat a green sim run as a ship signal. Sim tells you the logic is right and the gross dynamics are plausible. Hardware tells you the truth.

Physics engines: rigid-body dynamics

A physics engine integrates the equations of motion of a system of bodies forward in time. For robots that system is almost always articulated rigid bodies — links connected by joints — plus contacts with the ground and objects.

The core loop, every timestep dt:

  1. Compute forces and torques (gravity, actuators, springs, external).
  2. Resolve constraints (joints keep links connected; contacts keep bodies from interpenetrating).
  3. Integrate accelerations to velocities and velocities to positions.

The hard part is step 2. Joints are equality constraints — relatively easy. Contacts are inequality constraints (bodies may push apart but not pull together) plus friction (which is itself a constraint coupling normal and tangential forces). That makes the dynamics non-smooth: velocities jump discontinuously at impact, and the system switches between sticking and sliding.

Two broad formulations:

  • Maximal coordinates. Each body has 6 degrees of freedom; joints are enforced as constraints. Simple to implement, used by ODE and Bullet historically. Drift in the joint constraints is a real issue and gets stabilized with hacks (Baumgarte stabilization, error-reduction parameters).
  • Generalized (reduced) coordinates. The system state is the joint angles directly; the kinematic tree is built in, so joints can never drift apart. MuJoCo, DART, and PhysX's articulation system use this. It is more accurate for articulated robots and is why MuJoCo feels so clean on arms and legs.

The solver that resolves the constraints is where engines diverge:

  • Projected Gauss-Seidel (PGS) — iterative, fast, the classic ODE/Bullet approach. Cheap per iteration but converges slowly; under-iterated PGS makes contacts feel spongy and joints slightly loose.
  • Sequential impulse — Bullet's contact solver; impulse-based, robust, fast, the game-physics standard.
  • TGS (Temporal Gauss-Seidel) — PhysX's improved solver (sub-stepping the constraint solve), much better at stiff stacks and high mass ratios.
  • Convex optimization / Newton solvers — MuJoCo solves contact as a convex optimization problem each step, which is why it is stable at large timesteps and high stiffness where PGS would explode.

Here is the comparison engineers actually need.

| Engine | Coordinates | Contact solver | Strengths | Weaknesses | Used in |
|---|---|---|---|---|---|
| ODE | Maximal | PGS (LCP) | Mature, stable for simple scenes, ROS legacy | Slow, spongy contacts, dated | Gazebo (default historically) |
| Bullet | Maximal (+ Featherstone) | Sequential impulse / PGS | Fast, broad adoption, soft-body option | Contact stiffness tuning is fiddly | PyBullet, Gazebo, Isaac (early) |
| PhysX 5 | Generalized articulations | TGS | GPU-accelerated, stiff stacks, scales | NVIDIA-centric, less transparent | Isaac Sim / Isaac Lab |
| MuJoCo | Generalized | Convex (Newton/PGS option) | Best-in-class articulated accuracy & stability, large dt, soft contacts | Primitive geoms preferred, smaller sensor suite | DeepMind MuJoCo, MJX |
| DART | Generalized | LCP / Featherstone | Accurate analytical dynamics, research-grade | Smaller community, slower | Gazebo (default in current releases), research |

Opinion with reason: for articulated-robot dynamics — arms, legs, humanoids — MuJoCo and PhysX articulations are the right choice over ODE/Bullet, because generalized coordinates eliminate joint drift and the modern solvers stay stable at the large stiffness and mass ratios real robots have (a 0.1 kg foot pushing a 30 kg torso). ODE's age shows exactly here.

The integration scheme matters too. Explicit Euler is cheap and unstable for stiff systems; semi-implicit (symplectic) Euler is the common default; implicit / Runge-Kutta variants buy stability at the cost of per-step compute. MuJoCo's implicit integration is a big part of why it tolerates a 5 ms step where Bullet wants 1 ms.
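
To make the difference between explicit and semi-implicit Euler concrete, here is a minimal sketch on an undamped mass-spring, which stands in for a stiff contact or joint. The function name, the spring constant, and the step count are illustrative assumptions, not taken from any engine.

```python
def simulate_spring(dt, steps, k=1000.0, m=1.0, scheme="semi_implicit"):
    """Integrate an undamped mass-spring and return the final total energy (J)."""
    x, v = 0.01, 0.0  # 1 cm initial stretch, starting at rest
    for _ in range(steps):
        a = -k / m * x
        if scheme == "explicit":
            # Explicit Euler: the position update uses the OLD velocity.
            x, v = x + dt * v, v + dt * a
        else:
            # Semi-implicit (symplectic) Euler: update velocity first,
            # then advance the position with the NEW velocity.
            v = v + dt * a
            x = x + dt * v
    return 0.5 * m * v * v + 0.5 * k * x * x


e0 = 0.5 * 1000.0 * 0.01 ** 2  # initial energy of the spring
e_explicit = simulate_spring(dt=0.001, steps=5000, scheme="explicit")
e_symplectic = simulate_spring(dt=0.001, steps=5000, scheme="semi_implicit")
print(f"initial {e0:.4f} J, explicit {e_explicit:.4f} J, semi-implicit {e_symplectic:.4f} J")
```

With these values the explicit scheme injects energy every step and ends more than a hundredfold above the initial energy, while the semi-implicit scheme stays within a few percent of it. The only difference between the two branches is the order of the two updates.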

The contact problem (why sims disagree)

If you take one idea from this guide, take this: simulators agree on flight and disagree on contact. Throw a ball with no spin and every engine gives nearly the same parabola. Drop a stack of blocks, push a box across a floor, or close a gripper on a cylinder, and the engines diverge — sometimes the box slides differently, sometimes the stack topples in one engine and stands in another.

Why? Three approximations that every engine makes differently.

1. Contact detection and penetration. Engines detect contact by collision geometry, then must decide what to do about the small interpenetration that numerically always occurs. Penalty methods model contact as a stiff spring-damper (push proportional to penetration depth) — simple but requires tiny timesteps or it oscillates. Constraint methods solve for the impulse that exactly prevents penetration (an LCP or convex program) — stable but expensive and approximate when under-iterated. The choice changes how "hard" a floor feels.

2. The friction cone. Coulomb friction says the tangential force magnitude is bounded by μ times the normal force, in any tangential direction — a cone. Solving the true cone is a nonlinear problem, so most engines linearize it into a pyramid (4 or 8 facets). A pyramidized cone makes friction slightly anisotropic: a box pushed at 45° behaves differently from one pushed along an axis. MuJoCo can use an elliptic (true-cone) model, which is one reason its sliding behaves better.

3. Restitution and simultaneous contacts. Multiple contacts resolved at once (a box on a floor has 4 corners) are order-dependent in iterative solvers, so the result depends on solver iterations and ordering. Bouncing (restitution) is even less consistent across engines.
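
The anisotropy from point 2 is easy to quantify. The sketch below assumes the cone is replaced by an inscribed regular polygon with one vertex on the x axis; real engines build their pyramids in different ways, so treat the construction and the function names as illustrative only.

```python
import math


def max_friction_cone(mu, normal_force):
    """True Coulomb cone: the same limit in every tangential direction."""
    return mu * normal_force


def max_friction_pyramid(mu, normal_force, direction_rad, facets=4):
    """Largest tangential force along `direction_rad` when the cone is
    approximated by an inscribed regular polygon with `facets` sides
    (assumed construction: one vertex on the x axis)."""
    sector = 2.0 * math.pi / facets
    # Angle measured from the midpoint of the polygon edge being crossed.
    offset = (direction_rad % sector) - sector / 2.0
    return mu * normal_force * math.cos(sector / 2.0) / math.cos(offset)


mu, normal = 0.6, 10.0
along_axis = max_friction_pyramid(mu, normal, 0.0)
diagonal = max_friction_pyramid(mu, normal, math.radians(45.0))
print(f"cone {max_friction_cone(mu, normal):.2f} N, "
      f"pyramid on axis {along_axis:.2f} N, pyramid at 45 deg {diagonal:.2f} N")
```

Under this construction a 4-facet pyramid gives the full μN along an axis but only about 71% of it on the diagonal, so the same box starts sliding earlier when pushed at 45°. Raising `facets` to 8 shrinks the worst-case loss to under 8%.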

The practical consequence:

Same robot, same gripper, same 50 mm cylinder, μ = 0.6:
  Engine A: grasp holds, object stays put
  Engine B: object slowly rotates out of the fingers
  Engine C: object squirts out at contact (penetration recovery impulse)

None is "wrong" — they make different contact approximations.
The policy you train on B may fail on hardware AND on A.

This is why contact-rich manipulation has the worst sim-to-real transfer of any robotics task, and why legged locomotion — which is also contact-rich but more forgiving because feet are points and gaits self-stabilize — transfers better than you'd expect. It is also why you should never tune a grasp controller to a single engine's contact behavior and call it done.

Rule: treat friction coefficients, contact stiffness, and restitution as uncertain parameters to randomize, not as physical constants you can measure once. The number you measure on one surface at one speed is not the number the solver wants.

The major simulators compared

Seven tools cover almost the entire field. Here is the honest comparison, then notes on each.

| Simulator | Physics | Rendering | GPU parallel | ROS 2 | Best at | Weakness |
|---|---|---|---|---|---|---|
| Gazebo (Harmonic/Ionic) | DART (default), Bullet, ODE | OGRE 2 (raster) | No (multi-process) | First-class | ROS integration, system testing, sensors | Not built for massive parallel RL; rendering is functional, not photoreal |
| Isaac Sim | PhysX 5 | RTX ray-tracing | Yes | Bridge | Photoreal sensors, digital twins, USD pipelines | Heavy, NVIDIA RTX GPU required, steep setup |
| Isaac Lab | PhysX 5 (GPU) | RTX (optional) | Yes (thousands) | Via Isaac Sim | GPU-parallel RL training at scale | Learning-focused; not a general integration sim |
| MuJoCo / MJX | MuJoCo (CPU + GPU via MJX) | Built-in (basic) + MuJoCo-Warp | Yes (MJX/JAX) | Community | Articulated dynamics accuracy, fast RL, research | Sparse sensor/rendering suite; primitive geoms preferred |
| PyBullet | Bullet | OpenGL / TinyRenderer | Limited | Community | Fast prototyping, free, hackable, huge tutorial base | Aging, contact tuning fiddly, no massive parallel |
| Webots | Fork of ODE (custom) | OpenGL | No | Bridge | Education, batteries-included robot library, cross-platform | Smaller ecosystem, less used in industry RL |
| CoppeliaSim (V-REP) | ODE/Bullet/Vortex/Newton (4 engines) | OpenGL | No | Bridge | Swappable physics, scripting, sensors, prototyping | Closed-core, smaller modern community |

Gazebo (formerly Ignition), versions Harmonic and Ionic. The default ROS simulator. If your robot runs ROS 2 and you want to test the whole stack — controllers, nav, perception, the lot — against simulated sensors and physics, this is the tool. It is modular (separate physics, rendering, sensor, GUI processes), DART is the default physics, and the sensor simulation is solid. It is not the tool for training a policy on 4,096 parallel environments; it was never designed for that. Strength: realism of the software interface. Weakness: throughput and photorealism.

NVIDIA Isaac Sim. Built on Omniverse and USD (Universal Scene Description), PhysX 5 physics, RTX ray-traced rendering. This is the high-fidelity end: photoreal cameras, physically-based materials, accurate-ish sensor models, and a real path to a digital twin of a physical cell because USD is a proper scene-description and data-interchange format. It is heavy — you need an RTX GPU and patience for setup — but nothing else gives you sensor realism at this level with this much physics behind it.

NVIDIA Isaac Lab (the successor to Isaac Gym and the older Orbit/Isaac Sim RL workflows). This is the GPU-parallel learning framework that sits on Isaac Sim's physics. It runs thousands of environments on a single GPU and is the production path for training locomotion and manipulation policies. Think of Isaac Sim as the simulator and Isaac Lab as the training harness on top of it.

MuJoCo (DeepMind, open-source since 2021/2022). The connoisseur's choice for articulated-robot dynamics: generalized coordinates, a convex contact solver, stable at large timesteps. MJX is the JAX reimplementation that runs on GPU/TPU for massively-parallel RL, and MuJoCo Playground is the curated suite of RL environments on top. If you are doing locomotion or whole-body control research, MuJoCo's dynamics fidelity per unit of compute is hard to beat. The trade is a thinner sensor and rendering story.

PyBullet. The Python binding to Bullet. Free, fast enough, runs anywhere, and has the largest collection of tutorials and research code of any of these. It is the right tool for a quick prototype, a class, or reproducing a paper. It is showing its age against the GPU sims for training and against Isaac Sim for fidelity, but for "I need a robot in a sim by tonight," it still wins.

Webots (open-source, Cyberbotics). Batteries-included: a big library of robot and sensor models, cross-platform, friendly. Heavily used in education and competitions. Custom physics (ODE-derived). A solid all-rounder; less common in industrial RL pipelines.

CoppeliaSim (formerly V-REP). Notable for letting you swap among four physics engines (ODE, Bullet, Vortex, Newton) in the same scene, strong scripting, good sensor models. A capable prototyping and education tool with a smaller modern community than the others.

Opinion with reason: most serious 2026 programs run two of these — a GPU sim (Isaac Lab or MuJoCo/MJX) to train, and a ROS-native sim (Gazebo, or Isaac Sim if you need fidelity) to integrate and regression-test. One tool optimized for throughput and one optimized for stack realism. Trying to do both jobs in one simulator is where teams waste months.

Fidelity vs speed and the real-time factor

Every simulation choice is a trade between fidelity and speed, and the single number that captures it is the real-time factor.

RTF = simulated_time / wall_clock_time

RTF = 1.0  → sim runs at real speed (1 sim-second per wall-second)
RTF = 10   → 10x faster than reality (great for batch testing)
RTF = 0.1  → 10x slower than reality (heavy contact / sensors)

Computing it from the timestep and per-step cost:

Let dt        = physics timestep        (e.g. 0.001 s = 1 kHz)
    t_step    = wall time per step       (e.g. 0.0002 s = 200 µs)

steps_per_sim_second = 1 / dt            = 1000 steps
wall_time_per_sim_sec = steps * t_step   = 1000 * 200e-6 = 0.2 s
RTF = 1 / 0.2 = 5.0   → 5x real-time on one CPU core
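
The same arithmetic as a small helper, useful for budgeting before a profiling run. The function name is hypothetical; `t_step` is whatever per-step wall time you measure on your own machine.

```python
def real_time_factor(dt, t_step):
    """RTF from the physics timestep `dt` (s) and wall time per step `t_step` (s)."""
    steps_per_sim_second = 1.0 / dt
    wall_time_per_sim_second = steps_per_sim_second * t_step
    return 1.0 / wall_time_per_sim_second


rtf_1khz = real_time_factor(dt=0.001, t_step=200e-6)   # the worked example above
rtf_2khz = real_time_factor(dt=0.0005, t_step=200e-6)  # same step cost, half the dt
print(f"RTF at 1 kHz: {rtf_1khz:.2f}, RTF at 2 kHz: {rtf_2khz:.2f}")
```

The expression reduces to `dt / t_step`, so halving the timestep at a fixed per-step cost halves the RTF.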

Levers that change RTF, through either t_step or the number of steps per simulated second:

  • Timestep dt. Halving dt doubles steps per sim-second → halves RTF. But too large a dt and stiff contacts go unstable. This is the central tension.
  • Solver iterations. More PGS iterations = more accurate contacts = slower. Fewer = spongy but fast.
  • Collision complexity. Convex primitives (box, sphere, capsule) are cheap; full triangle meshes are expensive. Decompose meshes into convex hulls.
  • Sensor rendering. A 1080p RTX camera at 30 Hz can dominate the entire step budget. Lidar ray-casts scale with beam count.
  • Number of bodies and contacts. Contact count drives solver cost super-linearly in bad cases.

A useful mental model of the fidelity-speed spectrum:

| Use case | Typical dt | Fidelity priority | Target RTF | Tool |
|---|---|---|---|---|
| RL training (parallel) | 4–10 ms (substepped) | Throughput, "good enough" contact | thousands (aggregate) | Isaac Lab, MJX |
| Controller-in-the-loop | 1 ms | Dynamics + actuator model | ~1 (real-time) | MuJoCo, Gazebo |
| Full-stack integration | 1–4 ms | Sensor + ROS interface realism | 0.3–2 | Gazebo, Isaac Sim |
| Photoreal perception | 1–4 ms | Rendering / sensor realism | 0.05–0.5 | Isaac Sim |
| Contact-rich manipulation | 0.5–2 ms | Contact/friction fidelity | 0.1–1 | MuJoCo, Isaac Sim |

Note the aggregate RTF for parallel training: a single environment might run at RTF 2, but 4,096 of them in lockstep on one GPU produce an aggregate throughput equivalent to thousands of times real-time. That aggregate number is what makes RL tractable, and it is the subject of the next-but-one section.

Rule: real-time (RTF ≈ 1) only matters when a human or real hardware is in the loop. For batch testing run as fast as you can; for training run as parallel as you can; for hardware-in-the-loop you are pinned to RTF = 1 and must drop fidelity to hit it.

Rendering and sensor simulation

A robot does not perceive ground-truth state; it perceives sensors. If your sim hands the policy perfect joint angles and noise-free depth, you have trained on a robot that does not exist. Sensor simulation is a fidelity axis entirely separate from dynamics, and for perception-driven robots it is the more important one.

Cameras. Two rendering paths. Rasterization (OGRE in Gazebo, OpenGL in PyBullet/Webots) is fast and fine for geometry and basic appearance. Ray-tracing (Isaac Sim's RTX) gives physically-based lighting, reflections, soft shadows, and global illumination — which matters when your perception net was trained to expect realistic light. The gap between a rasterized and a ray-traced frame is exactly the gap a vision model notices.

Depth cameras. Easy to simulate naively (read the depth buffer) and hard to simulate well. Real depth sensors have characteristic artifacts: missing returns on dark/shiny/transparent surfaces, edge fattening, quantization, and — for stereo and structured light — failure in low texture. A depth image without those artifacts is too clean and will not transfer. See LiDAR & depth cameras for the real sensor physics you are trying to mimic.

Lidar. Simulated by ray-casting against the collision/visual geometry: one ray per beam per angular step, returning range. Good lidar sim adds intensity (material- and angle-dependent return strength), dropout (no return on absorptive or specular surfaces), range noise (a few mm to cm), and motion distortion for spinning sensors. GPU ray-casting (Isaac Sim's RTX lidar) makes high-beam-count sensors affordable; CPU ray-casting a 128-beam lidar at 20 Hz is a real cost in Gazebo.
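
As a rough sketch of the post-processing step, the function below takes ideal ray-cast ranges and adds Gaussian range noise and dropouts. It is deliberately simplified: dropout here is a flat probability per beam, whereas a better model would make it depend on material and incidence angle. The function name and all default values are assumptions, not any simulator's API.

```python
import random


def corrupt_lidar_scan(ranges, rng, noise_std=0.01, dropout_prob=0.02, max_range=30.0):
    """Return a copy of `ranges` (metres) with range noise and dropouts.

    A dropped or out-of-range beam is reported as float('inf'), meaning no return.
    """
    noisy = []
    for r in ranges:
        if r > max_range or rng.random() < dropout_prob:
            noisy.append(float("inf"))
        else:
            # Clamp at zero so noise never produces a negative range.
            noisy.append(max(0.0, r + rng.gauss(0.0, noise_std)))
    return noisy


ideal_scan = [1.0, 5.0, 12.5, 40.0]  # last beam is beyond max_range
print(corrupt_lidar_scan(ideal_scan, random.Random(0)))
```

Passing the random generator in explicitly keeps the corruption reproducible per seed, which matters once the scan feeds a regression test.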

IMU. The cheapest sensor to simulate badly and a common transfer killer. A real IMU has bias (slowly drifting offset), random walk, white noise, scale-factor error, and misalignment. Integrate a noise-free simulated IMU and your state estimator looks heroic; feed it a properly modeled one and you discover your filter tuning was fantasy. Model bias and noise, and randomize them.
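
A minimal sketch of one gyro axis with the error terms listed above: a bias that performs a random walk, white noise, and a scale-factor error. The class name and the default magnitudes are illustrative assumptions; real values come from the sensor datasheet or an Allan-variance measurement.

```python
import random


class SimulatedGyroAxis:
    """One gyro axis (rad/s) with bias random walk, white noise, and scale error."""

    def __init__(self, rng, noise_std=0.002, bias_walk_std=0.0001,
                 initial_bias=0.0, scale_error=0.0):
        self.rng = rng
        self.noise_std = noise_std          # white noise per sample
        self.bias_walk_std = bias_walk_std  # bias drift per sample (assumed fixed rate)
        self.bias = initial_bias
        self.scale_error = scale_error      # 0.01 means the axis reads 1% high

    def measure(self, true_rate):
        # The bias drifts a little on every sample, then corrupts the reading.
        self.bias += self.rng.gauss(0.0, self.bias_walk_std)
        white = self.rng.gauss(0.0, self.noise_std)
        return (1.0 + self.scale_error) * true_rate + self.bias + white


# A stationary robot: integrate the gyro for 100 s at 100 Hz and watch heading drift.
gyro = SimulatedGyroAxis(random.Random(0), initial_bias=0.01)
heading = 0.0
for _ in range(10_000):
    heading += gyro.measure(0.0) * 0.01
print(f"heading drift after 100 s at rest: {heading:.3f} rad")
```

Even with the robot at rest, the integrated heading drifts by roughly the bias times the elapsed time, which is exactly the error a noise-free simulated IMU hides from your estimator.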

Contact and force/torque sensors. As accurate as the contact solver, which — per the contact section — means treat them with suspicion for absolute values and trust them more for events (contact made/broken) than magnitudes.

A compact view of what to model:

| Sensor | Cheap to fake | Must model for transfer |
|---|---|---|
| RGB camera | Geometry, color | PBR lighting, exposure, motion blur, lens distortion, sensor noise |
| Depth | Depth buffer | Dropouts on shiny/dark/clear, edge artifacts, quantization |
| Lidar | Range via ray-cast | Intensity, dropout, range noise, motion distortion |
| IMU | Ground-truth accel/gyro | Bias, random walk, white noise, scale/misalignment |
| Wheel encoder | Joint angle | Quantization, slip, backlash |
| Force/torque | Solver contact force | Solver-dependent magnitudes; trust events over values |

Opinion with reason: for perception-driven robots, spend your fidelity budget on sensor noise models before renderer photorealism. A perfectly ray-traced but noise-free depth image transfers worse than a rasterized one with realistic dropouts, because the policy learns to trust depth edges that the real sensor never produces. Noise models are cheap and high-leverage; photorealism is expensive and only pays off for appearance-based perception.

GPU-accelerated massively-parallel sim

This is the development that changed robot learning, so it gets its own section.

The old way: one simulation per CPU core. A workstation with 32 cores runs 32 environments. To collect the ~10⁹ environment steps a locomotion policy needs, you rented a CPU cluster and waited days to weeks. Robot RL was a big-lab activity because the data collection was a big-lab cost.

The new way (Isaac Gym → Isaac Lab, and MuJoCo MJX): put thousands of independent environments on a single GPU, stepping them all in lockstep as batched tensor operations, with observations and actions never leaving GPU memory. The simulation, the neural-network policy, and the gradient updates all live on the same device. No CPU-GPU transfer bottleneck.

The throughput math is the whole story:

Single CPU env:
  ~1,000–5,000 steps/s per core
  32 cores ≈ 100k steps/s

GPU parallel (one modern data-center / high-end GPU):
  N = 4,096 environments
  per-env step rate ≈ 5,000 steps/s   (substepped, simplified contact)
  aggregate ≈ N * 5,000 ≈ 20,000,000 steps/s

→ ~200x the 32-core CPU node, on one GPU.
Wall-clock to collect 1e9 steps:
  CPU node (100k steps/s):      1e9 / 1e5  = 10,000 s ≈ 2.8 hours ... per node
                                 (and you needed many nodes / days end-to-end)
  GPU parallel (2e7 steps/s):   1e9 / 2e7  = 50 s

Quadruped locomotion that took days now trains in minutes-to-hours.
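
The same throughput arithmetic in executable form, so you can plug in your own environment count and measured step rates. The function names are hypothetical, and the inputs are the round figures used above rather than benchmarks.

```python
def aggregate_steps_per_second(num_envs, per_env_steps_per_second):
    """Total environment steps per second across all parallel environments."""
    return num_envs * per_env_steps_per_second


def seconds_to_collect(total_steps, steps_per_second):
    """Wall-clock seconds needed to gather `total_steps` environment steps."""
    return total_steps / steps_per_second


cpu_rate = 100_000                                   # 32-core CPU baseline from above
gpu_rate = aggregate_steps_per_second(4096, 5000)    # one GPU, 4,096 environments
speedup = gpu_rate / cpu_rate

print(f"GPU aggregate: {gpu_rate:,} steps/s ({speedup:.0f}x the CPU baseline)")
print(f"1e9 steps: CPU {seconds_to_collect(1e9, cpu_rate) / 3600:.1f} h, "
      f"GPU {seconds_to_collect(1e9, gpu_rate):.0f} s")
```

The exact product is 20.48 million steps per second, which the text rounds to 20 million. This counts physics steps only; policy inference and gradient updates add to the real training time.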

That collapse — days to hours — is why the 2020s wave of legged robots (and now humanoids) learned to walk, run, and recover in simulation. The famous ANYmal and quadruped results, and the locomotion stacks behind today's commercial quads and humanoids, were trained this way: thousands of parallel environments, heavy domain randomization, then zero-shot transfer to hardware. See legged & quadruped robot hardware and humanoid robot hardware for the machines, and reinforcement learning for robotics for the algorithms that consume this firehose of data.

The catch: GPU sim trades contact fidelity for throughput. To run thousands of environments fast you simplify collision geometry, substep the solver, and accept softer contacts. That is fine for locomotion (gaits are robust) and acceptable for many manipulation tasks with enough domain randomization, but it is not the tool for validating a delicate contact interaction. Train on the GPU sim, then validate the trickiest contacts on a higher-fidelity sim or hardware.

Rule: use GPU-parallel sim to train (throughput is king, fidelity is "good enough + randomization"); use a higher-fidelity sim to validate the contact-critical cases the fast sim glosses over. They are different jobs.

The reality gap and sim-to-real

The reality gap is the difference between your simulation and the real world. A policy or controller that works in sim and fails on hardware fell into the gap. Closing it is the central engineering problem of simulation-based development.

Where the gap actually lives — ranked by how often it bites:

  1. Contact and friction (the contact section — this is #1 for a reason).
  2. Actuator dynamics. Real motors have torque limits, current limits, electrical and mechanical lag, gearbox backlash, and friction. A sim that commands ideal torque instantly is modeling a motor that does not exist. Model the actuator (a first-order lag plus torque saturation is a cheap, high-value start).
  3. Latency. Sensing-to-actuation delay in sim is often zero; on hardware it is 1–20 ms through the stack. A controller tuned with zero latency can be unstable with real latency.
  4. Compliance and flexibility. Real links flex, real joints have series elasticity, real cables tug. Rigid-body sim assumes none of it.
  5. Sensor noise and artifacts (the sensor section).
  6. Mass and inertia errors. Your CAD-derived inertia is wrong by some percent; the real robot's mass distribution shifted when someone added a cable harness.

Three families of technique close the gap, and mature programs use all three.

System identification (sysID). Make the sim match this robot by measuring real parameters and fitting the model: run the real actuator through a chirp, fit the motor model; measure the real friction and inertia; calibrate sensor noise. SysID narrows the gap by making the sim center on reality. It is necessary but never sufficient — you cannot measure everything, and parameters drift.
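
As one small instance of fitting a motor model, the sketch below estimates the time constant of a first-order lag from a recorded step response. It uses a step rather than a chirp to keep the math short, and it assumes the response is already normalized to the range 0 to 1. The function name and the synthetic data are illustrative.

```python
import math


def fit_time_constant(times, responses):
    """Least-squares fit of tau in y(t) = 1 - exp(-t / tau).

    `responses` must be normalized so the steady-state value is 1.0.
    """
    num = 0.0
    den = 0.0
    for t, y in zip(times, responses):
        if t <= 0.0 or not 0.0 < y < 0.99:
            continue  # skip samples where log(1 - y) is undefined or noise-dominated
        num += t * math.log(1.0 - y)
        den += t * t
    if den == 0.0:
        raise ValueError("no usable samples in the step response")
    slope = num / den  # slope of ln(1 - y) against t, which equals -1 / tau
    return -1.0 / slope


# Synthetic "measurement": a 50 ms lag sampled every 5 ms.
true_tau = 0.05
times = [i * 0.005 for i in range(1, 40)]
responses = [1.0 - math.exp(-t / true_tau) for t in times]
print(f"fitted tau: {fit_time_constant(times, responses) * 1000:.1f} ms")
```

On real data the fit will be noisier, and the spread across repeated runs is a useful estimate of how wide to randomize that parameter later.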

Domain randomization (DR). Instead of one precise sim, train across a distribution of sims: randomize masses (±10–30%), friction coefficients (e.g. 0.4–1.2), actuator gains, latencies (0–20 ms), sensor noise, and — for vision — textures, lighting, and camera pose. The policy that survives all of them treats the real world as just one more sample from the training distribution. DR is the workhorse of modern sim-to-real and the reason zero-shot transfer works at all.

Dynamics randomization is DR applied specifically to the physics parameters (mass, friction, damping, latency) as opposed to the visuals. Visual domain randomization randomizes appearance so a vision policy ignores texture and lighting it will never see again. Both matter; which dominates depends on whether your policy is proprioceptive (legs) or perceptive (vision-based manipulation).
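
A minimal per-episode sampler for dynamics randomization might look like the sketch below. The friction and latency ranges are the ones quoted above and the mass range sits inside the ±10–30% band; the actuator-gain and IMU-noise ranges, and all the names, are assumptions for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class EpisodeParams:
    mass_scale: float           # multiplier on nominal link masses
    friction: float             # contact friction coefficient
    actuator_gain_scale: float  # multiplier on nominal actuator gains
    latency_s: float            # sensing-to-actuation delay in seconds
    imu_noise_std: float        # white-noise level fed to the IMU model


def sample_episode_params(rng):
    """Draw one set of physics parameters; call once per environment reset."""
    return EpisodeParams(
        mass_scale=rng.uniform(0.8, 1.2),
        friction=rng.uniform(0.4, 1.2),
        actuator_gain_scale=rng.uniform(0.9, 1.1),  # assumed range
        latency_s=rng.uniform(0.0, 0.020),
        imu_noise_std=rng.uniform(0.0, 0.01),       # assumed range
    )


rng = random.Random(0)
for _ in range(3):
    print(sample_episode_params(rng))
```

Sampling at reset rather than once per training run is the point: every episode the policy meets a slightly different robot.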

Domain adaptation. When randomization alone leaves a gap, adapt: fine-tune on a little real data, learn a model that maps sim observations to real ones (or vice versa), or use online system identification where the policy infers the real dynamics parameters from a short history and adjusts. "Rapid motor adaptation" and similar techniques — estimate the environment's latent parameters on the fly — are how the best legged policies handle terrain and payloads they never saw in training.

The sim-to-real recipe that actually works in 2026:

  1. sysID the big things   → center the sim on the real robot
  2. model the actuator     → lag + torque/current limits + backlash
  3. add latency            → match the real sensing→actuation delay
  4. domain-randomize wide  → mass, friction, gains, latency, noise, visuals
  5. train at scale         → GPU parallel, millions–billions of steps
  6. adapt online (optional)→ infer latent dynamics, adjust on hardware
  7. validate on hardware   → narrow the gap on the cases that still fail

Opinion with reason: if you can only do two things, do actuator modeling and wide domain randomization. Actuator modeling fixes the most common single cause of "works in sim, falls over on hardware," and wide DR buys robustness to everything you failed to model. Photorealistic rendering is a distant third for anything that isn't vision-dominated.

Digital twins: what the word actually means

"Digital twin" is the most abused term in the field, so let's be precise.

A digital twin is a virtual model of a specific physical asset that is kept synchronized with that asset in real time via a live data link. The defining property is the synchronization: telemetry flows from the physical robot/cell into the model, and (often) commands or predictions flow back. The twin reflects the current state of that one machine — its wear, its calibration, its current payload — not a generic model of its type.

Contrast with a plain simulation: a model of a robot or cell used offline for design, testing, or training. It might be extremely detailed. It is not a twin, because it is not synchronized with a specific live asset.

The useful distinction is the data link:

| Property | Offline simulation | Digital twin |
|---|---|---|
| Tied to a specific physical asset | No (a model of a type) | Yes (a model of that unit) |
| Live data sync | No | Yes (continuous telemetry) |
| Reflects wear/calibration/state | No | Yes |
| Primary use | Design, test, train | Monitor, predict, optimize that asset |
| Runs when asset is off | Yes | Usually paired with the running asset |

What a real digital twin is good for: predictive maintenance (the twin runs ahead of the real machine and flags an impending bearing failure), what-if on the live system (test a new cycle on the twin before pushing it to the running cell), anomaly detection (real telemetry diverges from twin prediction → something is wrong), and operator training / monitoring on the actual deployed configuration.

The honest take: most products marketed as "digital twins" are offline simulations with a telemetry dashboard. That is still useful — a good sim of your cell plus a live data view is valuable — but if there is no real-time model running in step with the physical asset and being corrected by its data, it is not a twin in the meaningful sense. Isaac Sim with USD is one of the few stacks built to do the real thing, because USD is a proper bidirectional scene/data format and Omniverse is designed for live synchronization. Gazebo can be wired into a twin-like loop with ROS 2 telemetry, but you are building the sync layer yourself.

Rule: before you call something a digital twin, ask "what is the live data link, and does the model state change when the real asset's state changes?" No link, no twin. It's a sim — which is fine, just name it correctly.
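
To show what "the model state changes when the real asset's state changes" means in the smallest possible form, here is a toy twin of a single joint. It predicts from commands, gets corrected by telemetry, and flags an anomaly when the two disagree. The class, the blending gain, and the threshold are all hypothetical; a production twin would use a proper state estimator and a real telemetry transport.

```python
class JointTwin:
    """Toy twin of one joint: predict from commands, correct from telemetry."""

    def __init__(self, position=0.0, correction_gain=0.2, anomaly_threshold=0.05):
        self.position = position                    # twin's estimate (rad)
        self.correction_gain = correction_gain      # how hard telemetry pulls the model
        self.anomaly_threshold = anomaly_threshold  # residual that counts as "wrong" (rad)

    def predict(self, commanded_velocity, dt):
        """Advance the model with the same command the real joint received."""
        self.position += commanded_velocity * dt

    def correct(self, measured_position):
        """Blend in live telemetry; return True if the residual looks anomalous."""
        residual = measured_position - self.position
        self.position += self.correction_gain * residual
        return abs(residual) > self.anomaly_threshold


twin = JointTwin()
twin.predict(commanded_velocity=1.0, dt=0.1)
print("healthy reading flagged:", twin.correct(measured_position=0.1))
print("stalled-joint reading flagged:", twin.correct(measured_position=0.5))
```

Delete the `correct` call and what remains is an offline simulation: it still runs, but nothing about the real joint can ever change its state.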

When the simulation lies

Every simulator lies. The professional skill is knowing which lies yours tells so you don't trust a result it can't support.

Contact lies. Already covered, and the biggest one. Stacking, grasping, pushing, and any task where the exact contact behavior matters is suspect. The friction your gripper relies on, the precise moment a foot slips, the way a peg jams in a hole — these are where rigid-body engines are weakest.

Deformables lie. Cables, fabric, foam, food, skin, soft grippers — rigid-body engines either skip them or fake them with simplified models (mass-spring, position-based dynamics, or finite-element add-ons that are slow). If your task involves a deformable object and your sim is a rigid-body engine, the sim's behavior is decorative. Specialized FEM/soft-body sims exist but are slow and narrow.

Friction lies. Coulomb friction with a single coefficient is a model, not reality. Real friction is velocity-dependent (static > kinetic), surface-dependent, contamination-dependent, and wears over time. The linearized friction cone (the pyramid) adds directional bias on top. Never trust a single friction number.
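The directional bias of a linearized friction cone is easy to see numerically. The sketch below assumes the simplest four-sided "box" pyramid, where the limit is applied per axis (|f_x| ≤ μN and |f_y| ≤ μN); engines differ in how they orient and scale the pyramid, so treat the exact numbers as illustrative. The function names are hypothetical.

```python
import math


def cone_limit(mu, normal_force):
    """Max tangential force under an isotropic Coulomb cone (same in every direction)."""
    return mu * normal_force


def pyramid_limit(mu, normal_force, angle_rad):
    """Max tangential force along `angle_rad` under a per-axis box pyramid.

    Assumes the constraints |f_x| <= mu*N and |f_y| <= mu*N.
    """
    bound = mu * normal_force
    c, s = abs(math.cos(angle_rad)), abs(math.sin(angle_rad))
    # A force of magnitude f along the direction has components f*c and f*s;
    # the larger component is the one that hits the per-axis bound first.
    return bound / max(c, s)


mu, normal = 0.5, 10.0
for deg in (0, 15, 30, 45):
    ratio = pyramid_limit(mu, normal, math.radians(deg)) / cone_limit(mu, normal)
    print(f"{deg:2d} deg: pyramid allows {ratio:.3f}x the cone limit")
```

Under this assumption the same contact holds about 41% more tangential force along the diagonal than along an axis, so a grasp can pass or fail in sim depending only on how the object is rotated relative to the contact frame.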

Sensor artifact lies. Default sensors are too clean. Depth has no dropouts, cameras have no motion blur or rolling shutter, lidar has no intensity falloff, IMUs have no bias. Each missing artifact is a way the real sensor will surprise your perception stack.
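Two of those missing artifacts can be added back as a post-processing step on clean simulated readings. This is a rough sketch using only the standard library; the function names, dropout probability, and noise magnitudes are placeholder assumptions, not measured values for any real sensor.

```python
import random


def corrupt_depth(depth_m, rng, dropout_prob=0.05, noise_std_m=0.01):
    """Return depth readings with Gaussian noise and random dropouts (None)."""
    out = []
    for d in depth_m:
        if rng.random() < dropout_prob:
            out.append(None)  # dropout: the sensor returned no valid range
        else:
            out.append(d + rng.gauss(0.0, noise_std_m))
    return out


def corrupt_gyro(rates, rng, dt, bias_walk_std=0.001, noise_std=0.002):
    """Add white noise plus a slowly drifting bias (random walk) to gyro rates."""
    bias = 0.0
    out = []
    for w in rates:
        # The bias random walk is scaled by sqrt(dt) so drift does not depend on step size.
        bias += rng.gauss(0.0, bias_walk_std) * dt ** 0.5
        out.append(w + bias + rng.gauss(0.0, noise_std))
    return out


rng = random.Random(0)
print(corrupt_depth([1.0, 1.0, 1.0, 1.0], rng, dropout_prob=0.25))
print(corrupt_gyro([0.0, 0.0, 0.0], rng, dt=0.01))
```

Real artifacts such as rolling shutter or lidar intensity falloff need sensor-specific models; the point here is only that a perception stack should never see perfectly clean input.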

Numerical lies. Energy can leak or be injected by the integrator; under-iterated solvers make joints feel loose; large timesteps make stiff contacts bouncy or unstable; penetration-recovery impulses launch objects ("the object squirts out"). These are artifacts of how the sim computes, not of any physics.
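Integrator energy injection can be reproduced in a few lines. The sketch below integrates an undamped mass on a spring, whose true energy is constant, with explicit Euler and with semi-implicit (symplectic) Euler. The spring constant, timestep, and function name are arbitrary choices for illustration and do not correspond to any particular simulator's integrator.

```python
def spring_energy_after(steps, dt, method, k=100.0, m=1.0):
    """Integrate an undamped spring from x=1, v=0 and return total energy."""
    x, v = 1.0, 0.0
    for _ in range(steps):
        a = -k / m * x
        if method == "explicit_euler":
            # Position and velocity both advance using the old state.
            x, v = x + v * dt, v + a * dt
        elif method == "semi_implicit_euler":
            # Velocity is updated first, then position uses the new velocity.
            v = v + a * dt
            x = x + v * dt
        else:
            raise ValueError(f"unknown method: {method}")
    return 0.5 * m * v * v + 0.5 * k * x * x


initial_energy = 0.5 * 100.0 * 1.0 ** 2  # 50 J at x=1, v=0
for method in ("explicit_euler", "semi_implicit_euler"):
    energy = spring_energy_after(steps=1000, dt=0.01, method=method)
    print(f"{method}: {energy / initial_energy:.2f}x initial energy")
```

For this system explicit Euler multiplies the energy by (1 + (k/m)·dt²) every step, so the spring gains energy from nothing, while the semi-implicit update keeps it bounded near the true value. Neither behavior says anything about the physics; both are properties of the update rule.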

The determinism trap. A sim can be perfectly deterministic — same seed, same result — and perfectly wrong. Determinism is great for CI and debugging; it is not evidence of physical accuracy. A reproducible lie is still a lie.

Rule: maintain a written list of "things our sim does not model" (deformables, exact friction, sensor X's artifact, cable drag) and gate every sim-only claim against it. The result you should distrust most is the one that depends on the physics your engine approximates worst.

Validation and CI in simulation

Simulation's most underused superpower is continuous integration. A sim is a repeatable environment; a repeatable environment is testable; a testable system can be guarded against regressions automatically. Most teams build a sim and never wire it into CI. That is leaving the best value on the table.

What a sim CI pipeline looks like:

  • Headless, containerized sim. No GUI, runs in a Docker container on a CI runner. Gazebo runs headless cleanly; Isaac Sim has headless modes; MuJoCo/PyBullet are trivial to run headless.
  • Deterministic seeds. Fix the random seed so a failure is reproducible. (Remember the determinism trap: this makes the test repeatable, not physically authoritative.)
  • Scripted scenarios. "Navigate from A to B avoiding the obstacle," "pick the part from this pose," "recover from this push." Each scenario is a test case.
  • Quantitative pass/fail metrics. Not "did it look right" but "final position error < 5 cm," "no collision events," "task completed within 12 s," "joint torque stayed under limit." Numbers, with units, and thresholds.
  • Run on every merge. The point is to catch the regression in the PR, not in the field.
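The bullets above can be sketched as a single scenario test. The "simulator" here is a toy point robot with a clamped proportional controller, standing in for a real headless Gazebo, MuJoCo, or PyBullet run; `run_scenario`, `evaluate`, the obstacle, and the controller gains are hypothetical. The thresholds mirror the example metrics in the list.

```python
import math
import random

# Thresholds taken from the example metrics in the text; tune per scenario.
MAX_FINAL_ERROR_M = 0.05
MAX_DURATION_S = 12.0


def run_scenario(seed, goal=(2.0, 1.0), dt=0.02):
    """Toy stand-in for one headless sim episode; returns a metrics dict."""
    rng = random.Random(seed)  # fixed seed makes a failure reproducible
    obstacle, obstacle_radius = (1.0, -1.0), 0.3
    x = y = t = 0.0
    collisions = 0
    while t < MAX_DURATION_S + 1.0:
        ex, ey = goal[0] - x, goal[1] - y
        if math.hypot(ex, ey) < 0.03:
            break  # goal reached
        # Clamped proportional controller plus small actuation noise.
        vx = max(-1.0, min(1.0, 1.5 * ex)) + rng.gauss(0.0, 0.01)
        vy = max(-1.0, min(1.0, 1.5 * ey)) + rng.gauss(0.0, 0.01)
        x, y, t = x + vx * dt, y + vy * dt, t + dt
        if math.hypot(x - obstacle[0], y - obstacle[1]) < obstacle_radius:
            collisions += 1
    final_error = math.hypot(goal[0] - x, goal[1] - y)
    return {"final_error_m": final_error, "collisions": collisions, "duration_s": t}


def evaluate(metrics):
    """Return a list of failed checks; an empty list means the scenario passed."""
    failures = []
    if metrics["final_error_m"] >= MAX_FINAL_ERROR_M:
        failures.append("final position error >= 5 cm")
    if metrics["collisions"] > 0:
        failures.append("collision events detected")
    if metrics["duration_s"] >= MAX_DURATION_S:
        failures.append("task not completed within 12 s")
    return failures


metrics = run_scenario(seed=0)
print(metrics, evaluate(metrics) or "PASS")
```

In a real pipeline `run_scenario` would launch the containerized sim and collect the same three numbers, and the CI job would fail whenever `evaluate` returns a non-empty list.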

A staged validation ladder, cheapest to most expensive:

  1. Unit / logic tests — no physics, just code. Milliseconds.
  2. Fast sim regression — PyBullet/MuJoCo headless, scripted scenarios, deterministic. Seconds to minutes. Runs on every commit.
  3. Full-stack sim — Gazebo or Isaac Sim with the real ROS 2 stack and realistic sensors. Minutes. Runs nightly or per-merge on key branches. See ROS 2 for the stack this exercises.
  4. Hardware-in-the-loop (HIL) — real controller/compute, simulated plant, RTF pinned to 1. Catches timing and latency bugs sim misses.
  5. Hardware test — the truth. Reserved for what passed everything above.

The reason to invest here is the same as for any test suite: it converts "we think it still works" into "we know it still works, here's the green run." For robotics that conversion is worth more than usual, because the alternative way to discover a regression is a robot driving into a wall.

Opinion with reason: put a fast deterministic sim regression suite in CI before you build anything fancier. It is the cheapest tier and catches the most bugs per dollar — logic errors, broken interfaces, obvious controller breakage — long before you spend GPU time on a photoreal twin.

Selecting a simulation stack

Choose by the job in front of you. The honest decision tree:

"I need to test my ROS 2 stack against simulated sensors and physics." → Gazebo (Harmonic or Ionic). First-class ROS 2 integration, good sensor sim, DART physics. The default for system and integration testing.

"I need to train a locomotion or manipulation policy with RL, fast." → Isaac Lab (if you have NVIDIA RTX hardware and want the full Omniverse ecosystem) or MuJoCo MJX / Playground (if you want open-source, cleaner articulated dynamics, and JAX). Both give GPU-parallel throughput. See reinforcement learning for robotics.

"I need photoreal sensors and/or a real digital twin of a physical cell." → Isaac Sim. RTX rendering, PhysX 5, USD pipeline, the only one of these built for live synchronization at scale. Budget for the GPU and the setup time.

"I need a quick prototype, a teaching tool, or to reproduce a paper." → PyBullet. Free, fast, hackable, enormous tutorial base. Or MuJoCo if the paper used it (much robotics RL research does).

"I want batteries-included with a big robot library for education or competition." → Webots or CoppeliaSim.

A selection matrix on the axes that actually decide it:

If your priority is...                    | Pick
ROS 2 integration & system testing        | Gazebo
GPU-parallel RL training                  | Isaac Lab or MuJoCo MJX
Articulated-dynamics fidelity / research  | MuJoCo
Photoreal sensors & digital twins         | Isaac Sim
Fast free prototyping                     | PyBullet
Education, batteries-included             | Webots / CoppeliaSim
Swappable physics engines in one scene    | CoppeliaSim

And the meta-decision most teams get wrong:

Opinion with reason: do not try to make one simulator do every job. Run a GPU sim for training and a ROS-native sim for integration. The cost of running two tools is far lower than the cost of fighting a training framework to do integration testing, or an integration sim to do parallel RL. Specialize the tools; share the robot model (URDF/USD/MJCF) across them as much as you can, and budget for the fact that model formats and contact behavior will not perfectly match between them, which is itself a small reality gap to manage.

The model-format reality: URDF is the ROS lingua franca (Gazebo, and importable elsewhere), MJCF is MuJoCo's native format, and USD is the Isaac/Omniverse format. Converters exist and mostly work for kinematics and visuals; they do not reliably carry contact parameters, friction, and actuator models across. Re-tune physics per simulator. Treat a clean cross-tool import as a bonus, not a guarantee.
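One way to make "re-tune physics per simulator" auditable is to pull the physics-relevant joint values out of the source URDF and compare them by hand after each import. The sketch below uses only the standard library and reads the standard URDF `<dynamics>` and `<limit>` joint attributes; the inline `demo_arm` model and the `physics_checklist` helper are made up for illustration, and contact parameters that live in simulator-specific extensions are not covered.

```python
import xml.etree.ElementTree as ET

# Tiny hypothetical URDF used only for this example.
URDF = """<robot name="demo_arm">
  <link name="base"/>
  <link name="upper_arm"/>
  <joint name="shoulder" type="revolute">
    <parent link="base"/>
    <child link="upper_arm"/>
    <axis xyz="0 0 1"/>
    <limit lower="-1.57" upper="1.57" effort="40" velocity="2.0"/>
    <dynamics damping="0.5" friction="0.1"/>
  </joint>
</robot>"""

# Joint attributes worth re-checking after any format conversion.
FIELDS = (("dynamics", ("damping", "friction")), ("limit", ("effort", "velocity")))


def physics_checklist(urdf_text):
    """List (joint, field, value) triples; value is None when the URDF omits it."""
    root = ET.fromstring(urdf_text)
    items = []
    for joint in root.iter("joint"):
        name = joint.get("name")
        for tag, attrs in FIELDS:
            element = joint.find(tag)
            for attr in attrs:
                value = element.get(attr) if element is not None else None
                items.append((name, f"{tag}.{attr}", value))
    return items


for joint_name, field, value in physics_checklist(URDF):
    print(f"{joint_name:10s} {field:18s} {value}")
```

The output is a checklist, not a converter: each row is a number to look up in the imported MJCF or USD model and re-tune if it was dropped or reinterpreted.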

Frequently asked questions

Which simulator should a beginner start with? PyBullet for the gentlest on-ramp (free, Python, huge tutorial base), or Gazebo if you are already in ROS 2. Move to MuJoCo or Isaac Lab once you hit RL and need throughput. Starting with Isaac Sim is a steep first climb unless photorealism or a digital twin is the actual goal.

Is Gazebo the same as Ignition? Yes. The project formerly called Ignition Gazebo was renamed back to "Gazebo" (the original Gazebo Classic is now legacy). Current releases are named alphabetically — Harmonic and Ionic are the recent ones. If a tutorial says "Ignition," it means modern Gazebo.

Why do my grasp results differ between PyBullet and Isaac Sim? Different physics engines (Bullet vs PhysX), different contact and friction models, different solver settings, and likely different friction parameters after import. Contact-rich tasks are exactly where engines disagree most. Re-tune friction and contact stiffness per engine and never assume a grasp tuned in one transfers to another — let alone to hardware.

Do I really need a GPU for robot simulation? Not for everything. Gazebo, PyBullet, MuJoCo (CPU), Webots, and CoppeliaSim run fine on CPU for single-environment integration and prototyping. You need a GPU for two things: photoreal rendering (Isaac Sim's RTX) and GPU-parallel RL training (Isaac Lab, MuJoCo MJX). If you're doing large-scale RL, the GPU is not optional.

What timestep should I use? Start at 1 ms (1 kHz) for contact-rich or stiff systems; you can often go to 2–5 ms with MuJoCo's stable solver, or substep in PhysX/Isaac. If contacts get bouncy, joints feel loose, or the sim explodes, the timestep is too large or the solver is under-iterated. Compute cost scales roughly linearly with 1/dt, so halving dt roughly halves your RTF.
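The timestep-versus-RTF trade is simple arithmetic. The sketch below assumes the wall-clock cost of one physics step does not depend on dt, which is only roughly true in practice; the 0.5 ms per-step cost and both function names are made up for illustration.

```python
def steps_per_sim_second(dt_s):
    """Number of physics steps needed to advance one simulated second."""
    return 1.0 / dt_s


def estimated_rtf(dt_s, wall_time_per_step_s):
    """Real-time factor = simulated time advanced per unit of wall-clock time."""
    return dt_s / wall_time_per_step_s


# Hypothetical engine that spends 0.5 ms of wall-clock time per step.
wall_per_step = 0.0005
for dt in (0.005, 0.002, 0.001, 0.0005):
    print(f"dt={dt * 1e3:.1f} ms: {steps_per_sim_second(dt):6.0f} steps per sim second, "
          f"RTF ~ {estimated_rtf(dt, wall_per_step):.1f}")
```

Measure the per-step wall time on your own scene before trusting the estimate, since contact count and solver iterations change it substantially.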

How do I actually close the reality gap? In order: model the actuator (lag + torque/current limits + backlash), add realistic sensing-to-actuation latency, run wide domain randomization over masses/frictions/gains/latency/noise, train at scale, and optionally adapt online. SysID centers the sim on your robot; randomization makes the policy robust to what you couldn't measure. Then validate on hardware.
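The first three steps of that recipe can be sketched without any simulator. The code below is an assumption-heavy illustration: `ActuatorModel` combines a first-order torque lag, a hard torque limit, and a fixed command delay (backlash is left out), and `sample_randomized_params` draws one episode's parameters from ranges that are placeholders rather than recommendations. Neither name comes from any real framework.

```python
import random
from collections import deque


def sample_randomized_params(rng):
    """Draw one episode's physical parameters; ranges are illustrative placeholders."""
    return {
        "mass_scale": rng.uniform(0.8, 1.2),
        "friction": rng.uniform(0.4, 1.2),
        "kp_scale": rng.uniform(0.9, 1.1),
        "latency_steps": rng.randint(1, 4),
        "obs_noise_std": rng.uniform(0.0, 0.02),
    }


class ActuatorModel:
    """First-order torque lag, a hard torque limit, and a fixed command delay."""

    def __init__(self, time_constant_s, torque_limit, latency_steps, dt):
        self.alpha = dt / (time_constant_s + dt)  # discrete low-pass coefficient
        self.torque_limit = torque_limit
        self.queue = deque([0.0] * latency_steps)  # commands still in flight
        self.torque = 0.0

    def step(self, commanded_torque):
        # Delay: the command applied now is the one issued latency_steps ago.
        self.queue.append(commanded_torque)
        delayed = self.queue.popleft()
        # Saturate, then low-pass toward the saturated command.
        clipped = max(-self.torque_limit, min(self.torque_limit, delayed))
        self.torque += self.alpha * (clipped - self.torque)
        return self.torque


rng = random.Random(0)
params = sample_randomized_params(rng)
actuator = ActuatorModel(time_constant_s=0.02, torque_limit=5.0,
                         latency_steps=params["latency_steps"], dt=0.002)
print(params)
print([round(actuator.step(100.0), 3) for _ in range(8)])
```

Resample the parameters at every episode reset and rebuild the actuator with the new latency, so the policy never trains against one fixed version of the robot.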

Is domain randomization always the right move? For sim-to-real transfer of learned policies, almost always yes — it trades a little peak sim performance for robustness, which is the correct trade for deployment. The exception is when you have a very accurate model and a precise, repeatable environment (some industrial cells), where tight sysID can beat wide randomization. For anything operating in the messy real world, randomize.

Can a digital twin replace hardware testing? No. Even a real, synchronized twin is a model corrected by data; it cannot discover physics it doesn't model. A twin reduces, predicts, and monitors — it does not eliminate the need to validate on the physical asset. Anyone selling a twin as a hardware-test replacement is overselling.

Why does MuJoCo feel more stable than ODE or Bullet? Generalized coordinates (joints can't drift apart) plus a convex contact solver and implicit integration. That combination stays stable at larger timesteps and at the high stiffness and mass ratios real articulated robots have, where iterative PGS solvers in maximal coordinates struggle. It's a genuinely better fit for arms, legs, and humanoids.

What's the difference between Isaac Sim, Isaac Gym, and Isaac Lab? Isaac Sim is the full simulator (PhysX + RTX + USD). Isaac Gym was the original standalone GPU-parallel RL environment (now deprecated). Isaac Lab is the current GPU-parallel learning framework, built on Isaac Sim's physics, that replaced Isaac Gym and the earlier Orbit workflow. For new RL work, use Isaac Lab.

How fast can simulation actually run? A single contact-heavy, sensor-rich environment can run far below real-time (RTF < 0.1). A simple environment runs many times real-time on one CPU core. GPU-parallel sim runs thousands of environments at once, for an aggregate throughput equivalent to thousands of times real-time, which is why RL data collection that used to take days now takes hours.

Should sensor noise be modeled even for non-learning controllers? Yes, if perception feeds the controller. A state estimator or perception stack tuned against noise-free simulated sensors is tuned against a fantasy. At minimum model the noise and bias of the sensors your control loop depends on, so your filter tuning and failure handling face something resembling reality.

Related guides