Compiling Ruby to machine language

### Beyond the Interpreter: The Quest to Compile Ruby to Machine Code
For decades, the Ruby community has cherished the language for its elegance, developer happiness, and metaprogramming magic. It’s a language designed for humans first. This design, however, has traditionally placed it in the “interpreted” camp, alongside languages like Python and JavaScript. When you run a Ruby script, an interpreter reads your code line by line (or, more accurately, bytecode by bytecode) and executes it.
This stands in contrast to “compiled” languages like C, Go, or Rust, where a compiler translates the entire source code into native machine language—the raw 1s and 0s the CPU understands—before the program is ever run. This ahead-of-time compilation almost always results in a significant performance advantage.
This raises the question that has tantalized Ruby developers for years: Can we get the best of both worlds? Can we compile our beautiful Ruby code directly to high-performance machine language?
The answer is not a simple yes or no. It’s a fascinating journey through different strategies, trade-offs, and groundbreaking projects.
#### The Standard: How Ruby (MRI) Actually Runs
Before we talk about compiling, let’s clarify the default process. The standard Ruby implementation, MRI (Matz’s Ruby Interpreter), doesn’t directly interpret your `.rb` file. Instead, it follows a multi-step process:
1. **Parsing:** Your code is read and turned into an Abstract Syntax Tree (AST), a tree-like representation of your code’s structure.
2. **Compilation to Bytecode:** The AST is then compiled into a set of instructions for Ruby’s own virtual machine, called YARV (Yet Another Ruby VM). This isn’t machine code; it’s a portable, intermediate language that YARV understands.
3. **Interpretation:** YARV then executes this bytecode.
This process is highly flexible and enables Ruby’s famous dynamic nature, but the interpretation step is where the performance overhead lies.
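You can watch this pipeline at work using `RubyVM::InstructionSequence`, an MRI-specific class that exposes the bytecode compiler. A minimal sketch (the exact instruction names in the output can vary between Ruby versions):

```ruby
# Compile a snippet of Ruby to YARV bytecode and print the instructions.
# RubyVM::InstructionSequence is specific to MRI; other implementations
# (JRuby, TruffleRuby) do not provide it.
iseq = RubyVM::InstructionSequence.compile("1 + 2")
puts iseq.disasm
```

The disassembly shows the intermediate language YARV interprets: instructions like `putobject` and `leave`, not x86 or ARM machine code.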
#### The Modern Boost: Just-In-Time (JIT) Compilation
The most significant and practical step towards machine code in the mainline Ruby world has been the introduction of JIT compilers. A JIT compiler doesn’t compile everything upfront. Instead, it watches the program as it runs and identifies “hot spots”—methods or loops that are executed frequently. It then compiles *only these hot spots* into native machine code on the fly.
**MJIT (Method-based Just-In-Time Compiler)**
Introduced in Ruby 2.6, MJIT was the first official attempt. Its approach was clever but had limitations. It would write the Ruby method out as C code to a temporary file, then invoke a standard C compiler (like GCC or Clang) to turn it into a shared library, which was then loaded back into the running process. While it provided speedups for some CPU-intensive workloads, the overhead of calling out to an external compiler was significant, and MJIT was eventually removed in Ruby 3.3.
**YJIT (Yet Another Ruby JIT)**
The real game-changer is YJIT. Developed at Shopify and merged into Ruby 3.1, YJIT is a new, in-process JIT. It doesn’t rely on an external C compiler. It generates machine code directly within the Ruby process itself, making it far faster and more efficient.
YJIT has demonstrated impressive performance gains (30-40% on real-world applications like Shopify’s storefront) and is the primary focus for performance improvements in the Ruby ecosystem today. For most Rubyists, enabling YJIT is the most direct way to get parts of their application running as native machine code.
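A minimal check of whether YJIT is active in the current process, assuming a Ruby 3.1+ build that was compiled with YJIT support (enable it by starting Ruby with the `--yjit` flag):

```ruby
# RubyVM::YJIT is only defined on Ruby builds that include YJIT (3.1+),
# so guard with defined? before calling into it.
yjit_on = defined?(RubyVM::YJIT) && RubyVM::YJIT.enabled?

if yjit_on
  puts "YJIT is compiling hot methods to native machine code"
else
  puts "YJIT is off; start Ruby with --yjit to enable it"
end
```

Because the JIT works transparently, this check is the only code change involved: the rest of your application runs unmodified, and YJIT decides on its own which methods are hot enough to compile.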
#### The Holy Grail: Ahead-of-Time (AOT) Compilation
JIT is fantastic, but what about compiling an *entire* Ruby application into a single, standalone executable, just like you would with Go or Rust? This is Ahead-of-Time (AOT) compilation, and it’s a much harder problem for a language as dynamic as Ruby.
The challenge lies in Ruby’s core features:
* **Metaprogramming:** How do you compile `define_method` when the method doesn’t exist until runtime?
* **Dynamic Typing:** The compiler doesn’t know if a variable `x` is an Integer, a String, or a custom object. This ambiguity makes it incredibly difficult to generate optimized machine code.
* **`eval`:** The ability to execute a string as code is the ultimate AOT compiler’s nightmare.
Despite these hurdles, several projects have tackled this challenge:
**Sorbet Compiler**
Stripe’s Sorbet is a static type checker for Ruby. Building on it, the Sorbet team developed an experimental AOT compiler. The key idea is that it compiles a *statically-typed subset* of Ruby: Sorbet’s type annotations give the compiler the information it needs to generate efficient native code. It’s a powerful demonstration that if you are willing to trade some of Ruby’s dynamism for type safety, you can unlock substantial performance.
**Crystal Lang**
Crystal is not a Ruby compiler, but a separate language that is heavily inspired by Ruby’s syntax. It was born from the question, “What if Ruby was statically typed and AOT compiled?” The result is a language that feels remarkably like Ruby to write but compiles down to a single, blazing-fast native binary. For developers looking for C-like performance with Ruby-like syntax, Crystal is often the answer.
### Conclusion: A Blurring Line
So, can you compile Ruby to machine language?
**Yes, absolutely.** The modern Ruby virtual machine, through YJIT, is already doing it for the hottest parts of your code at runtime. This is the most practical and widely adopted approach today for speeding up existing Rails and Ruby applications.
For those seeking the absolute peak of performance, AOT compilation is the goal. While compiling the full, dynamic Ruby language remains a monumental task, projects like the Sorbet Compiler show that by adopting static types, you can compile large subsets of Ruby into native executables. And for new projects, languages like Crystal offer a “compiled Ruby” experience from the ground up.
The line between interpreted and compiled is no longer a rigid wall. It’s a spectrum, and Ruby is steadily moving towards a future where developer joy and machine-level performance are no longer mutually exclusive.
