## Tuesday, June 17, 2008

### Here's why dynamic languages are slow and how to fix it

Dynamic languages are emerging as the Next Big Thing. They are known for making development faster, being more powerful and more flexible. Today, more and more people are using them in production environments. However, one problem stands in the way of mass adoption: SPEED. There is an urban legend that dynamic programs are way slower than their static counterparts. Here's my take on it.

Why are dynamic languages slow TODAY?

The purpose of a dynamic language is to have as few static elements as possible. The idea is that this offers more flexibility. For example, in Python method calls are never static. This means that the actual code that will be executed is known only at run time. This is what makes monkey patching possible. This is what allows you to have great unit testing frameworks.

# globals.pyA = 2
# main.pyfrom globals import AA = []
Dynamic languages leave as many decisions as possible to run time. What is the type of A? You can only know for sure when the code runs because it can be changed at any point in the program.
The result is that it is hard to analyse dynamic languages in order to make optimizations. Compared to static languages - which offer plenty of opportunities for optimization - dynamic languages are hard to optimize. Thus their implementations are usually slow.

The problem with dynamic languages is that it isn't trivial to optimize an addition. You can hardly know what '+' will be binded to at runtime. You probably can't even infer the types of the operands. This is the result of mutation. In Python, almost everything is mutable. This leaves few information the compiler can rely on.

Does mutability hurt performance and why?

It can, depending on the case. Let me illustrate how by comparing the factorial function in C and Python. Don't think of this as a benchmark. This is just an example.

Compiling the factorial function in C with LLVM-GCC will generate efficient machine code.

// Factorial in Cint fac(int n) {  if (n == 0) return 1;  return n*fac(n-1);}int main(){  return fac(30);}
; Assembly generated by LLVM-GCC_main:  movl $1, %eax xorl %ecx, %ecx movl$30, %edx.align 4,0x90LBB1_1: ## bb4.i  imull %edx, %eax  decl %edx  incl %ecx  cmpl \$30, %ecx  jne LBB1_1 ## bb4.iLBB1_2: ## fac.exit  ret

The compiler was able to infer many properties from the source code. For example, it concluded that the fac function referenced in main was the fac defined at compile time. This allowed the compiler to replace the assembly call instruction with fac's code. The function was then specialized for the call site and thanks to static typing, the compiler was able to transform each arithmetic operations into direct machine instructions.
Can you notice the other optimizations?

Let's look at how CPython executes the factorial.

# fac.pydef fac(n):  return 1 if n == 0 else n * fac(n -1)fac(30)
First, fac.py is parsed and translated to bytecode instructions. Then the bytecode instructions are interpreted by the CPython Virtual Machine.
# CPython Bytecode for fac.py# Think of this as an interpreted language which Python is translated into.# See http://docs.python.org/lib/bytecodes.html# fac11           0 LOAD_FAST                0 (n)    3 LOAD_CONST               1 (0)    6 COMPARE_OP               2 (==)    9 JUMP_IF_FALSE            7 (to 19)   12 POP_TOP      13 LOAD_CONST               2 (1)   16 JUMP_FORWARD            18 (to 37)>>   19 POP_TOP      20 LOAD_FAST                0 (n)   23 LOAD_GLOBAL              0 (fac)   26 LOAD_FAST                0 (n)   29 LOAD_CONST               2 (1)   32 BINARY_SUBTRACT   33 CALL_FUNCTION            1   36 BINARY_MULTIPLY>>   37 RETURN_VALUE# main14           0 LOAD_GLOBAL              0 (fac)    3 LOAD_CONST               1 (30)    6 CALL_FUNCTION            1    9 RETURN_VALUE

CPython could not inline the call to fac because this would violate the language's semantics. In Python, fac.py could be imported at run time by another module. It cannot inline fac into main because a sub-module could change the binding of fac and thus invalidate main. And because main doesn't have it's own copy of fac, the code cannot be specialized for this particular call. This hurts because it would be very beneficial to specialize the function for an integer argument.

Notice that there are no references to machine addresses. CPython adds a layer of indirection to access every object in order to implement the dynamism of Python. For example, main is found by a look-up in a table. Even constant numbers are found through look-ups. This adds a significant amount of slow memory read/writes and indirect jumps.

Python doesn't even contain any explicit hints you can give to help the compiler. This makes the problem of optimizing Python non-trivial.

What about type inference?

The problem of type inference in dynamic languages remains unsolved. Type inference is a form of static analysis. Static analysis is the analysis of source code at compile time to derive some "truths" about it. You can imagine how this falls short for dynamic languages.

Michael Salib attempted to solve this problem with StarKiller. The compiler manages type inference by collecting more information than usual and using the CTA algorithm. Instead of compiling each module separatly, like most compilers, the whole program is analyzed and compiled in one pass. The knowledge of the complete program opens the door to more optimizations. The fac function of the previous example can be specialized by Starkiller because it knows how it will be used.

Though the work seems very promising, it has three major flaws. First, the compiler accepts only a subset of the Python language. Advanced functions like eval and exec aren't supported. Second, whole-program analysis doesn't scale with bigger projects. Compiling 100,000 LOC would take a prohibitive amount of time. Third, the compiler violates Python's semantics by doing whole-program analysis. Like most dynamic languages, the import mechanism of Python is done at runtime. The language doesn't guarantee that the module available at compile time is the same as the module available at run time.

Read this for more.

Virtual Machines are a natural fit for dynamic languages. VM with JIT compilers are able to optimize a dynamic program without having to guess it's behavior in advance. This saves a lot of heavy lifting. Programs are optimized simply by observing their behavior while they run. This is known as dynamic analysis. For instance, noticing that fac is often called with an integer argument, the VM could create a new version of that function specialized for integers and use it instead.

In my opinion Virtual Machines are not a long-term solution.

1. Self-hosting a VM is prohibitive.
2. A VM sets a limit on the kinds of programs you can make. No Operating Systems, no drivers, no real-time systems, etc.
3. Optimizing a program run through a VM is hard because you cannot know exactly what is going on behind the hood. There are many layers and many corners where performance can slip out.

For most projects, these problems aren't an issue. But I believe their existence would restrain dynamic languages. They are enough to prevent a dynamic language from being a general purpose tool. And that is what people want: no restrictions, no surprises, pure freedom.

How would I make them faster?
from types import ModuleTypeimport redeclare(re, type=ModuleType, constant=True, inline=True)

A compiler helped by Static Annotations is the way to go. Please don't put all static annotations in the same bag. Static annotations like type declarations don't have to be as painful as JAVA's. Annotations are painful in Java because they are pervasive and often useless. They restrict the programmer. Programmers have to fight them. Annotations can be just the opposite. They can give the programmer more freedom! With them, programmers can set constraints to their code where it matters. Because they have the choice, static annotations become a tool that offers MORE flexibility.

A savvy programmer could reduce the dynamism of his code at a few key points. Just enough to allow type inference and the likes to do their job well. Optimizing some code would usually just become a matter of expressing explicitly the natural constraints that apply to it.

# Just an example.def fac(n):  assert type(n) in [float, int]  return 1 if n == 0 else n * fac(n -1)

There are a many ways to implement static annotations in dynamic languages. I believe the flexibility of dynamic languages can allow static annotations to be very convenient. How would you do it?

Reilly said...

Just one quibble. Static types don't prevent kick-ass testing frameworks. QuickCheck for Haskell is the equal of any of the testing frameworks for dynamically typed languages.

Maht said...

As interesting as your analysis is, it is not true that you can't make a VM that's also an OS.

Here's one that's mature and is in use today :

http://www.vitanuova.com/

irc://ircfreenode.org/#inferno

Kid Meier said...

To further Maht's point: while you can't get down to the hardware directly in a VM, its really a moot point since that is only a concern in a small number of applications/systems.

The widespread success of the Java VM is proof of this. Yes, I don't see any OSes with significant market share written in Java, but I would have to question if that is really a bad thing.

You will get almost nowhere trying to make a single tool that is designed to solve every single problem in computing.

Yann said...

@reilly
Yes, there are some good testing frameworks for static languages but they are very different from what I was talking about.
Quickcheck automatically generates tests, right? I don't know any dynamic language with a framework like this. But with monkey patching - for instance - dynamic languages offer other kinds of possibilities. I really recommend you take a look at the link I mentioned:
http://www.djangoproject.com/documentation/testing/

@maht:
Very cool link. Also I didn't mean to say it was impossible to make an OS with a VM.
Microsoft is even supporting similar effort:
http://en.wikipedia.org/wiki/Singularity_(operating_system)

michaelw said...

@Yann:
Quickcheck automatically generates tests, right? I don't know any dynamic language with a framework like this.

Quviq QuickCheck is a port to Erlang, with some interesting added capabilities (shrinking test cases).

Anothing framework for automatically generating test cases is an RT extension by pfdietz.

riffraff said...

there are quickcheck ports for ruby (rushcheck) and for perl (test::lectrotest, IIRC).
I recall one for python too, but not sure about the name.

But I can't see why you could not have testing DSLs a-la BDD in haskell, type classes seem enough, and Scala seem to have a nice framework for that.

Luis said...

You mentioned Starkiller, which is just vaporware (announced with fanfare but never released). For the real thing check Mark Dufour's Shedskin (http://shed-skin.blogspot.com/). It is a python to c++ compiler which can compile a whole program or generate extensions for cpython. It works right now and is very usable.
It works by restricting your coding style in a static way (not changing the type of variables at any time within your code), it performs a type inference analisis and generates equivalent c++ code.

As for your prefered way of doing it, you may want to check Boo (http://boo.codehaus.org/) or Cobra (http://cobra-language.com/). Both languages are very similar and are python-like languages for the .NET framework. They are static, but with type inference, allowing you to write code without declaring types.

Rick said...
This comment has been removed by a blog administrator.
Karthika Shree said...

It's interesting that many of the bloggers to helped clarify a few things for me as well as giving.Most of ideas can be nice content.The people to give them a good shake to get your point and across the command.
Java Training in Chennai

sunitha vishnu said...

It is amazing and wonderful to visit your site.Thanks for sharing this information,this is useful to me...
Android Training in Chennai
Ios Training in Chennai