Monday, June 30, 2008

The difference between Smalltalk and the rest

Smalltalk programs are ecosystems.

A program behaves like an ecosystem when the focus is put on run time - not compile time. This is a major shift.

People coming from static languages complain that Smalltalk doesn't have NetBeans, Eclipse or whatever. Smalltalk - and potentially other dynamic languages - has something different.

Smalltalk provides an environment where you can edit, run and analyze code in real time. Imagine being able to grow a program. Imagine being able to observe it grow. Imagine being able to painlessly debug and analyze it. This is what it means to focus on run time.

This is hard to understand if you're coding in a glorified notepad.

When you're coding in Smalltalk, your program is running persistently in the background. It is alive. Inspecting it is just a click away. When you create an object, you can actually right-click on it and get a list of its methods. You can just as easily change the implementations of those methods. Without restarting anything.
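
As a rough analogy in Python (all the names here are made up, and an interactive session is only a pale imitation of a Smalltalk image), you can inspect a live object and swap in a new method implementation while instances keep running:

# Rough Python analogy: inspect a live object and replace a method
# without restarting anything. 'Account' and 'deposit' are made-up names.

class Account(object):
    def __init__(self, balance):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount

account = Account(100)

# Inspect the live object: list its methods and peek at its state.
print([name for name in dir(account) if not name.startswith('_')])
print(account.__dict__)

# Change the implementation of deposit on the fly; the existing
# 'account' instance picks up the new behavior immediately.
def deposit_with_log(self, amount):
    print('depositing %s' % amount)
    self.balance += amount

Account.deposit = deposit_with_log
account.deposit(50)       # prints: depositing 50
print(account.balance)    # 150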

It's a major cultural shift. Smalltalk programmers never fight the compiler; they spend their time debugging their programs. This is a different way of developing a program.

Don't take my word for it. Take two minutes and see something interesting.

Tuesday, June 17, 2008

Here's why dynamic languages are slow and how to fix it

Dynamic languages are emerging as the Next Big Thing. They are known for making development faster and for being more powerful and flexible. Today, more and more people are using them in production environments. However, one problem stands in the way of mass adoption: SPEED. There is an urban legend that dynamic programs are way slower than their static counterparts. Here's my take on it.


Why are dynamic languages slow TODAY?

The purpose of a dynamic language is to have as few static elements as possible; the idea is that this offers more flexibility. For example, in Python, method calls are never resolved statically. The actual code that will be executed is known only at run time. This is what makes monkey patching possible, and it is what allows you to have great unit testing frameworks.
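
For instance, here is a small sketch of the kind of monkey patching that unit testing frameworks rely on: replacing a method at run time with a stub. All the names (HttpClient, fetch_price) are made up for illustration.

# Sketch: monkey patching for a unit test. 'HttpClient' and 'fetch_price'
# are made-up names; the point is only that the binding of 'get' can be
# changed at run time, so the stub is what actually gets called.

class HttpClient(object):
    def get(self, url):
        raise RuntimeError('no network access in tests')

def fetch_price(client, ticker):
    return float(client.get('http://example.com/price/' + ticker))

def test_fetch_price():
    # Replace the method at run time; no recompilation, no subclassing.
    HttpClient.get = lambda self, url: '42.0'
    assert fetch_price(HttpClient(), 'ACME') == 42.0

test_fetch_price()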


# globals.py
A = 2

# main.py
from globals import A
A = []

Dynamic languages leave as many decisions as possible to run time. What is the type of A? You can only know for sure when the code runs, because it can be changed at any point in the program.

The result is that dynamic languages are hard to analyze, and therefore hard to optimize. Static languages offer plenty of opportunities for optimization; dynamic languages offer few, so their implementations are usually slow.

The problem with dynamic languages is that even an addition isn't trivial to optimize. You can hardly know what '+' will be bound to at run time; you probably can't even infer the types of the operands. This is the result of mutation: in Python, almost everything is mutable, which leaves little information the compiler can rely on.
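
To make this concrete, here is a made-up example: what '+' means depends on the operands' classes, and those classes can themselves be changed while the program runs, so the compiler cannot safely replace the addition with a machine instruction.

# Made-up example: the meaning of '+' is only known at run time.

class Money(object):
    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        return Money(self.cents + other.cents)

def total(a, b):
    return a + b   # int addition? float addition? Money.__add__? Unknown until run time.

print(total(1, 2))                          # plain integer addition
print(total(Money(100), Money(50)).cents)   # dispatches to Money.__add__

# Even Money's '+' can be rebound later, invalidating any static assumption:
Money.__add__ = lambda self, other: Money(self.cents + other.cents + 1)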


Does mutability hurt performance and why?

It can, depending on the case. Let me illustrate how by comparing the factorial function in C and Python. Don't think of this as a benchmark. This is just an example.

Compiling the factorial function in C with LLVM-GCC will generate efficient machine code.


// Factorial in C
int fac(int n) {
    if (n == 0) return 1;
    return n * fac(n - 1);
}

int main() {
    return fac(30);
}

; Assembly generated by LLVM-GCC
_main:
movl $1, %eax
xorl %ecx, %ecx
movl $30, %edx
.align 4,0x90
LBB1_1: ## bb4.i
imull %edx, %eax
decl %edx
incl %ecx
cmpl $30, %ecx
jne LBB1_1 ## bb4.i
LBB1_2: ## fac.exit
ret

The compiler was able to infer many properties from the source code. For example, it concluded that the fac function referenced in main was the fac defined at compile time. This allowed it to replace the assembly call instruction with fac's code. The function was then specialized for the call site and, thanks to static typing, the compiler was able to turn each arithmetic operation into direct machine instructions.
Can you notice the other optimizations?

Let's look at how CPython executes the factorial.

# fac.py
def fac(n):
    return 1 if n == 0 else n * fac(n - 1)

fac(30)

First, fac.py is parsed and translated to bytecode instructions. Then the bytecode instructions are interpreted by the CPython Virtual Machine.

# CPython Bytecode for fac.py
# Think of this as an interpreted language which Python is translated into.
# See http://docs.python.org/lib/bytecodes.html
# fac
11 0 LOAD_FAST 0 (n)
3 LOAD_CONST 1 (0)
6 COMPARE_OP 2 (==)
9 JUMP_IF_FALSE 7 (to 19)
12 POP_TOP
13 LOAD_CONST 2 (1)
16 JUMP_FORWARD 18 (to 37)
>> 19 POP_TOP
20 LOAD_FAST 0 (n)
23 LOAD_GLOBAL 0 (fac)
26 LOAD_FAST 0 (n)
29 LOAD_CONST 2 (1)
32 BINARY_SUBTRACT
33 CALL_FUNCTION 1
36 BINARY_MULTIPLY
>> 37 RETURN_VALUE
# main
14 0 LOAD_GLOBAL 0 (fac)
3 LOAD_CONST 1 (30)
6 CALL_FUNCTION 1
9 RETURN_VALUE
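
You can reproduce a listing like this yourself with the standard library's dis module (the exact opcodes vary between CPython versions):

# Disassemble fac with the standard dis module.
import dis

def fac(n):
    return 1 if n == 0 else n * fac(n - 1)

dis.dis(fac)   # prints the bytecode of fac, one instruction per line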

CPython could not inline the call to fac because that would violate the language's semantics. In Python, fac.py could be imported at run time by another module, and that module could change the binding of fac, which would invalidate an inlined copy in main. And because main doesn't have its own copy of fac, the code cannot be specialized for this particular call site. This hurts, because specializing the function for an integer argument would be very beneficial.
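
As an illustration (the wrapping module's name is made up), any code that imports fac.py can rebind fac after the fact, say to wrap it with a cache. Every subsequent LOAD_GLOBAL of fac must see the new binding, which is exactly what an inlined or specialized copy would miss.

# memo_patch.py (made-up name): rebind fac at run time.
import fac as fac_module

def memoized(f, cache={}):
    def wrapper(n):
        if n not in cache:
            cache[n] = f(n)
        return cache[n]
    return wrapper

# From now on, every look-up of the global 'fac' - including the
# recursive calls inside fac itself - finds the wrapper instead.
fac_module.fac = memoized(fac_module.fac)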

Notice that there are no references to machine addresses. CPython adds a layer of indirection to every object access in order to implement Python's dynamism. For example, in main, fac is found by a look-up in a table. Even constant numbers are reached through look-ups. This adds a significant amount of slow memory reads/writes and indirect jumps.

Python doesn't even provide explicit hints you could give to help the compiler. This makes optimizing Python a non-trivial problem.


What about type inference?

The problem of type inference in dynamic languages remains unsolved. Type inference is a form of static analysis. Static analysis is the analysis of source code at compile time to derive some "truths" about it. You can imagine how this falls short for dynamic languages.
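
For example (a made-up function), even a trivial piece of Python can have a return type that no static analysis can pin down, because it depends on data that only exists at run time:

# Made-up example: the return type depends on the run-time value.
def parse(value):
    # Returns an int for digit strings, an upper-cased str otherwise.
    return int(value) if value.isdigit() else value.upper()

print(parse('42'))      # 42 (an int)
print(parse('hello'))   # 'HELLO' (a str)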

Michael Salib attempted to solve this problem with Starkiller. The compiler manages type inference by collecting more information than usual and using the CTA algorithm. Instead of compiling each module separately, like most compilers do, the whole program is analyzed and compiled in one pass. Knowledge of the complete program opens the door to more optimizations. The fac function of the previous example can be specialized by Starkiller because the compiler knows how the function will be used.

Though the work seems very promising, it has three major flaws. First, the compiler accepts only a subset of the Python language; advanced features like eval and exec aren't supported. Second, whole-program analysis doesn't scale to bigger projects: compiling 100,000 LOC would take a prohibitive amount of time. Third, whole-program analysis itself violates Python's semantics. Like most dynamic languages, Python resolves imports at run time, and the language doesn't guarantee that the module available at compile time is the same as the one available at run time.

Read this for more.


What about VMs?

Virtual Machines are a natural fit for dynamic languages. VMs with JIT compilers are able to optimize a dynamic program without having to guess its behavior in advance, which saves a lot of heavy lifting. Programs are optimized simply by observing their behavior while they run. This is known as dynamic analysis. For instance, after noticing that fac is usually called with an integer argument, the VM could create a new version of that function specialized for integers and use it instead.
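
Here is a toy sketch, in Python itself, of that specialization idea. Everything in it is made up: a real JIT observes types at the machine level and emits native code, but the dispatch-on-observed-types pattern is the same.

# Toy sketch of type specialization; not a real JIT, just the idea.
def specialize(generic, specialized_versions, threshold=100):
    counts = {}
    def dispatch(n):
        t = type(n)
        counts[t] = counts.get(t, 0) + 1
        # Once a type has been seen often enough, use its specialized version.
        if counts[t] > threshold and t in specialized_versions:
            return specialized_versions[t](n)
        return generic(n)
    return dispatch

def fac_generic(n):
    return 1 if n == 0 else n * fac_generic(n - 1)

def fac_int(n):
    # Imagine this compiled down to the tight integer loop LLVM produced.
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

fac = specialize(fac_generic, {int: fac_int})
print(fac(10))   # 3628800, via fac_generic until int becomes "hot"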

In my opinion Virtual Machines are not a long-term solution.

  1. Self-hosting a VM is prohibitive.
  2. A VM sets a limit on the kinds of programs you can make. No Operating Systems, no drivers, no real-time systems, etc.
  3. Optimizing a program run through a VM is hard because you cannot know exactly what is going on under the hood. There are many layers and many corners where performance can slip away.

For most projects, these problems aren't an issue. But I believe they would hold dynamic languages back. They are enough to prevent a dynamic language from becoming a general-purpose tool. And that is what people want: no restrictions, no surprises, pure freedom.


How would I make them faster?

# Hypothetical example: 'declare' is not a real Python function, just an
# illustration of what a static annotation could look like.
from types import ModuleType
import re
declare(re, type=ModuleType, constant=True, inline=True)

A compiler helped by static annotations is the way to go. Please don't put all static annotations in the same bag. Static annotations like type declarations don't have to be as painful as Java's. They are painful in Java because they are pervasive and often redundant. They restrict the programmer, so programmers have to fight them. Annotations can be just the opposite. They can give the programmer more freedom! With them, programmers can put constraints on their code where it matters. Because they have the choice, static annotations become a tool that offers MORE flexibility.

A savvy programmer could reduce the dynamism of his code at a few key points, just enough to allow type inference and the like to do their job well. Optimizing some code would usually become a matter of explicitly expressing the natural constraints that already apply to it.


# Just an example.
def fac(n):
    assert type(n) in [float, int]
    return 1 if n == 0 else n * fac(n - 1)

There are many ways to implement static annotations in dynamic languages. I believe the flexibility of dynamic languages can make static annotations very convenient. How would you do it?

Saturday, June 14, 2008

What everybody ought to know about RESEARCH

Why do so few scientists make significant contributions and so many are forgotten in the long run?
Richard Hamming asked himself and some of the greatest scientists of the 20th century this very question. In his classic "You and Your Research" talk, he relates what led him to the discovery of the Hamming Code and the Hamming Distance among other things. The following is my humble attempt to summarize it to make it more accessible.

1) Research is not just a matter of luck. Consider Einstein for example. Can luck explain that he discovered Special Relativity and - 10 years later - the General Theory of Relativity? One after another, you see people setting a pattern of Great Science.

2) Successful scientists are courageous. Once you get your courage up and believe that you can do important problems, then you can. If you think you can't, you almost surely won't. Research is not easy; if you always give up early, you won't get anywhere. Think, and continue to think, under any circumstance.

3) Don't work on big problems right away. Research is hard. Expect to be paralyzed if you skip the stepping stones and attack a big problem directly. Build some background knowledge by working on smaller problems first.

4) Work hard. Given two people of approximately the same ability, one working 10% more than the other, the one who works more will outproduce the other more than twice over the course of a lifetime. The more you know, the more you learn; the more you learn, the more you can do; the more you can do, the more the opportunity.

5) It's important to cultivate ambiguity. Believe in your theory enough to push forward; doubt it enough to notice the flaws and the errors. If you don't believe, you will never get started. If you don't doubt, you may lose a lot of time working on something wrong. Noticing and fixing flaws will make your theory stronger.

6) You have to want to do something significant. To quote Pasteur, "Luck favors the prepared mind". You can't win Lotto without participating. If you never try to work on anything significant, the odds are against you. Newton used to say "If others would think as hard as I did, then they would get similar results". You have to try.

If you enjoyed this, I recommend the original talk.

Friday, June 13, 2008

The secret of the LLVM C bindings

Ever wanted to use LLVM from C? Can't find any documentation? Welcome.

Since I'm considering retargeting CLISP's JIT compiler, I've been experimenting with LLVM. LLVM is an optimizing compiler for a virtual instruction set. Technically, it is very interesting. And this year, with Apple and Clang in the game, it seems to be here to stay.

A Factorial in C with LLVM
Let's make a factorial function using the C bindings of LLVM 2.3+.
The function we will describe in LLVM instructions is illustrated below.

I inserted the phi instruction manually to make things more interesting.
Paste this in your favorite editor and save it as "fac.c":


// Headers required by LLVM
#include <llvm-c/Core.h>
#include <llvm-c/Analysis.h>
#include <llvm-c/ExecutionEngine.h>
#include <llvm-c/Target.h>
#include <llvm-c/Transforms/Scalar.h>


// General stuff
#include <stdlib.h>
#include <stdio.h>


int main (int argc, char const *argv[])
{
    char *error = NULL; // Used to retrieve messages from functions
    LLVMModuleRef mod = LLVMModuleCreateWithName("fac_module");
    LLVMTypeRef fac_args[] = { LLVMInt32Type() };
    LLVMValueRef fac = LLVMAddFunction(mod, "fac", LLVMFunctionType(LLVMInt32Type(), fac_args, 1, 0));
    LLVMSetFunctionCallConv(fac, LLVMCCallConv);
    LLVMValueRef n = LLVMGetParam(fac, 0);

    LLVMBasicBlockRef entry = LLVMAppendBasicBlock(fac, "entry");
    LLVMBasicBlockRef iftrue = LLVMAppendBasicBlock(fac, "iftrue");
    LLVMBasicBlockRef iffalse = LLVMAppendBasicBlock(fac, "iffalse");
    LLVMBasicBlockRef end = LLVMAppendBasicBlock(fac, "end");
    LLVMBuilderRef builder = LLVMCreateBuilder();

    LLVMPositionBuilderAtEnd(builder, entry);
    LLVMValueRef If = LLVMBuildICmp(builder, LLVMIntEQ, n, LLVMConstInt(LLVMInt32Type(), 0, 0), "n == 0");
    LLVMBuildCondBr(builder, If, iftrue, iffalse);

    LLVMPositionBuilderAtEnd(builder, iftrue);
    LLVMValueRef res_iftrue = LLVMConstInt(LLVMInt32Type(), 1, 0);
    LLVMBuildBr(builder, end);

    LLVMPositionBuilderAtEnd(builder, iffalse);
    LLVMValueRef n_minus = LLVMBuildSub(builder, n, LLVMConstInt(LLVMInt32Type(), 1, 0), "n - 1");
    LLVMValueRef call_fac_args[] = {n_minus};
    LLVMValueRef call_fac = LLVMBuildCall(builder, fac, call_fac_args, 1, "fac(n - 1)");
    LLVMValueRef res_iffalse = LLVMBuildMul(builder, n, call_fac, "n * fac(n - 1)");
    LLVMBuildBr(builder, end);

    LLVMPositionBuilderAtEnd(builder, end);
    LLVMValueRef res = LLVMBuildPhi(builder, LLVMInt32Type(), "result");
    LLVMValueRef phi_vals[] = {res_iftrue, res_iffalse};
    LLVMBasicBlockRef phi_blocks[] = {iftrue, iffalse};
    LLVMAddIncoming(res, phi_vals, phi_blocks, 2);
    LLVMBuildRet(builder, res);

    LLVMVerifyModule(mod, LLVMAbortProcessAction, &error);
    LLVMDisposeMessage(error); // Handler == LLVMAbortProcessAction -> No need to check errors

    LLVMExecutionEngineRef engine;
    LLVMModuleProviderRef provider = LLVMCreateModuleProviderForExistingModule(mod);
    error = NULL;
    LLVMCreateJITCompiler(&engine, provider, &error);
    if (error) {
        fprintf(stderr, "%s\n", error);
        LLVMDisposeMessage(error);
        abort();
    }

    LLVMPassManagerRef pass = LLVMCreatePassManager();
    LLVMAddTargetData(LLVMGetExecutionEngineTargetData(engine), pass);
    LLVMAddConstantPropagationPass(pass);
    LLVMAddInstructionCombiningPass(pass);
    LLVMAddPromoteMemoryToRegisterPass(pass);
    // LLVMAddDemoteMemoryToRegisterPass(pass); // Demotes every possible value to memory
    LLVMAddGVNPass(pass);
    LLVMAddCFGSimplificationPass(pass);
    LLVMRunPassManager(pass, mod);
    LLVMDumpModule(mod);

    LLVMGenericValueRef exec_args[] = {LLVMCreateGenericValueOfInt(LLVMInt32Type(), 10, 0)};
    LLVMGenericValueRef exec_res = LLVMRunFunction(engine, fac, 1, exec_args);
    fprintf(stderr, "\n");
    fprintf(stderr, "; Running fac(10) with JIT...\n");
    fprintf(stderr, "; Result: %d\n", (int)LLVMGenericValueToInt(exec_res, 0));

    LLVMDisposePassManager(pass);
    LLVMDisposeBuilder(builder);
    LLVMDisposeExecutionEngine(engine);
    return 0;
}

Compiling the code
Generating the object file is a no-brainer:

gcc `llvm-config --cflags` -c fac.c

Linking is a little trickier. Even though you are writing C code, you have to use a C++ linker.

g++ `llvm-config --libs --cflags --ldflags core analysis executionengine jit interpreter native` fac.o -o fac

All set!