NP Contemplation: Neural Networks, Machine Learning and Artificial Intelligence. By Yann N. Dauphin.<br />
<br />
<span style="font-weight: bold;">A machine that can dream</span> (2012-02-16)<br />
<br />
<script src="http://ynd.github.com/rbm_faces.js/js/jdataview.js" type="text/javascript">
</script><script charset="utf-8" src="http://ynd.github.com/rbm_faces.js/js/jquery.prettyLoader.js" type="text/javascript">
</script><link charset="utf-8" href="http://ynd.github.com/rbm_faces.js/css/prettyLoader.css" media="screen" rel="stylesheet" type="text/css"></link><script src="http://ynd.github.com/rbm_faces.js/js/rbm.js" type="text/javascript">
</script><script type="text/javascript">
function rbm_start() {
    if (window.rbm1) {
        // Already running: restart the Gibbs chain instead of rebuilding the RBMs.
        window.rbm1.reset = true;
        return;
    }
    var speed = 5;
    // First layer: 2304 visible units (a 48x48 image), 1024 hidden units.
    rbm0 = new RBM(2304,
                   1024,
                   "http://c297721.r21.cf1.rackcdn.com/W0.bin",
                   "http://c297721.r21.cf1.rackcdn.com/b0.bin",
                   "http://c297721.r21.cf1.rackcdn.com/c0.bin",
                   $("#visibles-canvas"),
                   $("#0-hiddens-canvas"));
    // Second layer: stacked on the first layer's hidden units.
    rbm1 = new RBM(1024,
                   512,
                   "http://c297721.r21.cf1.rackcdn.com/W1.bin",
                   "http://c297721.r21.cf1.rackcdn.com/b1.bin",
                   "http://c297721.r21.cf1.rackcdn.com/c1.bin",
                   $("#0-hiddens-canvas"),
                   $("#1-hiddens-canvas"));
    function gibbs() {
        var begin = new Date();
        if (rbm1.ready()) {
            // One Gibbs step at the top layer, then project down to the pixels.
            rbm1.gibbs();
            rbm0.sample_v(rbm1.v, true);
        }
        var end = new Date();
        // Aim for roughly ten steps per second, with a 5 ms floor.
        setTimeout(gibbs, Math.max(5, 100 - (end - begin)));
    }
    setTimeout(gibbs, speed);
    $('#start-rbm').text("Restart!");
}
</script><br />
<div style="margin-left: 5em; width: 320px;">
<canvas height="40" id="1-hiddens-canvas" style="background-image: url('http://ynd.github.com/rbm_faces.js/images/h1.png');" width="320"><br />
</canvas><br />
<br />
<canvas height="80" id="0-hiddens-canvas" style="background-image: url('http://ynd.github.com/rbm_faces.js/images/h0.png'); margin-top: -1em; position: relative;" width="320"><br />
</canvas><br />
<br />
<div>
<canvas height="144" id="visibles-canvas" style="background-image: url('http://ynd.github.com/rbm_faces.js/images/v.png');" width="144"><br />
</canvas><br />
<button id="start-rbm" onclick="rbm_start();" style="margin-left: 20em; position: relative; top: -7em;" type="button">Start!</button></div>
</div>
<div style="text-align: center;">
<span style="font-size: x-small;">Note: this takes 20-30 s to load. Supported browsers are Chrome, Safari 4.0+, </span><br />
<span style="font-size: x-small;">Firefox 4+, and Opera 10.0+. The demo performs heavy computation.</span></div>
<div style="text-align: center;">
<br /></div>
This is a live demo of a type of Boltzmann machine in JavaScript. The evolving image at the bottom is what the machine is thinking, and the flickering lights are the states of its neurons. This particular machine has been shown thousands of faces, and now it imagines faces when it <i>dreams</i>.<br />
<br />
Boltzmann machines have a remarkable ability similar to dreaming. They were first introduced by Geoff Hinton and Terry Sejnowski in 1983 as a model of the brain. They can discover patterns when they are learning from data, and when run in a closed loop they can generate, or <i>dream</i>, new examples based on what they have learned.<br />
<br />
<br />
<br />
<br />
How do they work? The full answer is beyond the scope of this post, but for motivated readers, here's a quick explanation focusing on the restricted Boltzmann machine (RBM). It is defined by its so-called <i>energy</i> function (bias terms omitted for simplicity)<br />
<div style="text-align: center;">
$$E({\bf v}, {\bf h}) = - \sum\limits_{i,j} v_i h_j w_{ij}$$</div>
This function measures the energy between a sensory input vector \({\bf v}\) and the state of each neuron \({\bf h}\). The parameters \(w_{ij}\) weight correlations in the data. This is used to define the probability<br />
<div style="text-align: center;">
$$p({\bf v}, {\bf h}) = \frac{e^{-E({\bf v}, {\bf h})}}{\sum\limits_{{\bf v}',{\bf h}'} e^{-E({\bf v}', {\bf h'})}}$$</div>
where the denominator sums the factor \(e^{-E({\bf v}', {\bf h}')}\) over all possible configurations of inputs and brain states, so that the probabilities sum to one.<br />
<br />
Learning consists of adjusting \(w_{ij}\) to maximize the probability the RBM assigns to the examples you show it. This makes the neurons detect patterns in the sensory input. Dreaming consists of wandering among probable sensory inputs and brain states using Markov chain Monte Carlo (MCMC), in practice by alternating Gibbs sampling between \({\bf v}\) and \({\bf h}\).<br />
<br />
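To make the sampling concrete, here is a minimal sketch of the dreaming loop in Python with NumPy. The weights here are random placeholders rather than a trained model, so this chain only produces noise; with trained weights and biases, repeated Gibbs steps wander among probable inputs, which is what the demo above does.

```python
# Minimal sketch of RBM "dreaming" (alternating Gibbs sampling).
# W, b, c are random placeholders standing in for a trained model.
import numpy as np

rng = np.random.default_rng(0)

n_visible, n_hidden = 16, 8
W = rng.normal(0, 0.1, (n_visible, n_hidden))  # weights w_ij
b = np.zeros(n_visible)                        # visible biases
c = np.zeros(n_hidden)                         # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p):
    # Draw binary states, each on with probability p.
    return (rng.random(p.shape) < p).astype(float)

def gibbs_step(v):
    # v -> h -> v': one round trip of the Markov chain.
    h = sample(sigmoid(v @ W + c))
    return sample(sigmoid(h @ W.T + b))

v = sample(np.full(n_visible, 0.5))  # random starting "sensory input"
for _ in range(100):                 # run the chain: the "dream"
    v = gibbs_step(v)
```

With a trained model, plotting `v` after each step would give exactly the evolving image shown in the demo.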
If you want to know more about Boltzmann machines and Deep Learning, you should check out this <a href="http://www.youtube.com/watch?v=AyzOUbkUf3M">excellent talk</a> by Geoff Hinton, or you can read this <a href="http://www.iro.umontreal.ca/~lisa/bib/pub_subject/language/pointeurs/bengio+lecun-chapter2007.pdf">introductory paper</a> by Yoshua Bengio and Yann LeCun.<br />
<br />
You can also find <a href="http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.BernoulliRBM.html#sklearn.neural_network.BernoulliRBM">here</a> a pythonic implementation of the binary restricted Boltzmann machine (RBM) that I wrote.<br />
<br />
Yann N. Dauphin<br />
<br />
<span style="font-weight: bold;">Self-Organization and Conway's Game of Life (With Interactive JavaScript Canvas)</span> (2009-08-23)<br />
<br />
<span style="font-weight: bold;">Introduction</span> <br />
<br />
I want to show you that self-organization is the magic of life. Conway's Game of Life defines a simple universe governed by three simple laws, in which creative, intelligent and stable organisms emerge and die. <br />
<br />
<br />
<span style="font-weight: bold;">Conway's Game of Life in JavaScript</span> <br />
<style type="text/css">
#life-grid-1 { border: 1px solid black; }
#life-grid-1-big-bang { width: 10em; margin-top: 0.5em; padding: 0.5em; background: black; text-align: center; float: left; cursor: pointer; font-size: 2em; color: #90EE90; font-family: arial, sans-serif; }
</style><script type="text/javascript">
$(document).ready(function() {
    var canvas = document.getElementById("life-grid-1");
    var context = canvas.getContext("2d");
    var cellSize = 10;
    var gridWidth = canvas.width / cellSize;
    var gridHeight = canvas.height / cellSize;
    var isRunning = false;
    var isEditing = false;
    var bigBangButton = document.getElementById("life-grid-1-big-bang");
    var refreshRate = 100;

    // Start from an all-dead (white) grid.
    context.fillStyle = "white";
    context.fillRect(0, 0, canvas.width, canvas.height);

    function onMouseMove(e) {
        var x = e.pageX - $(canvas).offset().left;
        var y = e.pageY - $(canvas).offset().top;
        var cellX = Math.floor(x / cellSize);
        var cellY = Math.floor(y / cellSize);
        RenderLiveCell(cellX, cellY);
    }

    $(canvas).mouseup(function(e) {
        $(canvas).unbind("mousemove", onMouseMove);
        isEditing = false;
    });

    $(canvas).mousedown(function(e) {
        if (!isRunning) { setInterval(ApplyGameLogic, refreshRate); isRunning = true; }
        onMouseMove(e);
        isEditing = true;
        $(canvas).mousemove(onMouseMove);
    });

    // Seed the grid with random live cells.
    $(bigBangButton).click(function() {
        if (!isRunning) { setInterval(ApplyGameLogic, refreshRate); isRunning = true; }
        for (var x = 0; x < gridWidth; x++) {
            for (var y = 0; y < gridHeight; y++) {
                var l = Math.round(Math.random() * 10);
                if (l >= 4 && l <= 5) { RenderLiveCell(x, y); } else { RenderDeadCell(x, y); }
            }
        }
    });

    function RenderCell(x, y, color) {
        context.fillStyle = color;
        context.fillRect(x * cellSize, y * cellSize, cellSize, cellSize);
    }
    function RenderLiveCell(x, y) { RenderCell(x, y, "black"); }
    function RenderDeadCell(x, y) { RenderCell(x, y, "white"); }

    function IsCellAlive(pixels, x, y) {
        // Row stride is the canvas width in pixels, 4 bytes (RGBA) per pixel.
        var offset = y * cellSize * canvas.width * 4 + x * cellSize * 4;
        return pixels[offset] == 0 && pixels[offset + 1] == 0 && pixels[offset + 2] == 0;
    }

    // Apply the three rules to every cell, reading the current state
    // directly from the canvas pixels.
    function ApplyGameLogic() {
        if (isEditing) { return; }
        var pixels = context.getImageData(0, 0, canvas.width, canvas.height).data;
        for (var y = 0; y < gridHeight; y++) {
            for (var x = 0; x < gridWidth; x++) {
                var alive = 0;
                for (var yy = y - 1; yy <= y + 1; yy++) {
                    for (var xx = x - 1; xx <= x + 1; xx++) {
                        if (yy >= 0 && xx >= 0 && yy < gridHeight && xx < gridWidth && !(yy == y && xx == x)) {
                            if (IsCellAlive(pixels, xx, yy)) { alive++; }
                        }
                    }
                }
                if (IsCellAlive(pixels, x, y)) {
                    if (alive < 2 || alive > 3) { RenderDeadCell(x, y); }
                } else if (alive == 3) {
                    RenderLiveCell(x, y);
                }
            }
        }
    }
});
</script> <br />
<canvas height="450" id="life-grid-1" width="450"></canvas><br />
<div id="life-grid-1-big-bang">Generate Big bang!</div><br />
<br />
<br />
<br />
<br />
<span style="font-weight: bold;"> <br />
Note:</span> You can also paint on the canvas using the mouse. <br />
<br />
Supported browsers are Safari 2.0+, Opera 9.0+, Firefox 1.5+ and Chrome. Use Safari or Chrome for a much better experience. Internet Explorer does not support this technology yet. <br />
<br />
<br />
<span style="font-weight: bold;">Explanation</span> <br />
<br />
The universe of the game is a two-dimensional grid of cells. Each cell can be either dead or alive. Each cell interacts with its 8 direct neighbors in the following way: <br />
<br />
1. <span style="font-weight: bold;">Birth</span>. Any dead cell with exactly 3 live neighbors becomes alive.<br />
2. <span style="font-weight: bold;">Survival</span>. Any live cell with exactly 2 or 3 live neighbors survives.<br />
3. <span style="font-weight: bold;">Death</span>. Any live cell with fewer than 2 or more than 3 live neighbors dies.<br />
<br />
The behaviors that emerge from these simple rules may be considered creative and beautiful. <br />
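The three rules translate almost directly into code. Here is a minimal sketch in Python (plain lists of 0/1 cells, with cells outside the grid counted as dead); the canvas demo above runs the same logic on pixels:

```python
def step(grid):
    """Apply one generation of the three rules to a grid of 0/1 cells."""
    h, w = len(grid), len(grid[0])

    def live_neighbors(x, y):
        # Count live cells among the (up to) 8 direct neighbors.
        return sum(grid[yy][xx]
                   for yy in range(max(0, y - 1), min(h, y + 2))
                   for xx in range(max(0, x - 1), min(w, x + 2))
                   if (xx, yy) != (x, y))

    new = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            n = live_neighbors(x, y)
            if grid[y][x]:
                new[y][x] = 1 if n in (2, 3) else 0  # survival / death
            else:
                new[y][x] = 1 if n == 3 else 0       # birth
    return new

# A "blinker": the horizontal bar becomes vertical and back, period 2.
blinker = [[0, 0, 0],
           [1, 1, 1],
           [0, 0, 0]]
```

Calling `step(blinker)` yields the vertical bar, and stepping twice returns the original grid.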
<br />
<br />
<span style="font-weight: bold;">Take away</span> <br />
<br />
Here's a quote from <a href="http://www.amazon.com/Simpler-Way-Margaret-J-Wheatley/dp/1881052958">A Simpler Way</a> by Margaret J. Wheatley. <br />
<br />
<div style="-moz-background-clip: border; -moz-background-inline-policy: continuous; -moz-background-origin: padding; background: transparent url(http://www.netlash2.be/bloggertheme/bquote.png) no-repeat scroll left top; margin: 1em 5px; padding: 0pt 0pt 0pt 25px;">The tendency to organize is not just found in living beings. While it is increasingly difficult in science to distinguish the living from the non-living, few of us would categorize light bulbs as alive. Yet light bulbs have exhibited a breathtaking tendency to self-organize when wired together with other bulbs. Building on earlier work, theoretical biologist Stuart Kauffman conducted a light bulb experiment in the 1960s. <br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJYW8V16CAEiX9Z96rgnH7AuIXcHIMoYUHc20AmgZl3h5HjeeAelFPZU8LbG-bSLiT-FZMWWJCPHeVMWKNGCQlae5eqbqlryIB2prrn2u9OcfvsL1zQGAeRrLz-E1-m2cdGVpn8xSBrSQ/s1600-h/Game_of_life_toad.gif" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5373368591222571410" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJYW8V16CAEiX9Z96rgnH7AuIXcHIMoYUHc20AmgZl3h5HjeeAelFPZU8LbG-bSLiT-FZMWWJCPHeVMWKNGCQlae5eqbqlryIB2prrn2u9OcfvsL1zQGAeRrLz-E1-m2cdGVpn8xSBrSQ/s320/Game_of_life_toad.gif" style="cursor: pointer; float: left; height: 98px; margin: 0pt 10px 10px 0pt; width: 98px;" /></a><span style="clear: left; color: grey; float: left; margin: 0pt 0pt 10px 10px; text-align: center; width: 98px;">Toad</span>Kauffman was interested in exploring how the complex network of human genes had developed, but he used light bulbs to demonstrate that self-organization is a fundamental process found everywhere. He wired together a network of two hundred light bulbs. Each bulb was assigned a relationship with two other bulbs. It was to turn on or off based only on the behavior of either of its two assigned partners. Even with such simple conditions, the number of possible states of on and off bulbs is 10<sup>30</sup>. The human imagination cannot begin to comprehend this number of possibilities. [...] <br />
<br />
But the pattern of organization appeared instantly. After exploring only thirteen states, the system of bulbs settled into a repeatable pattern, flashing on and off in a repetitive cycle of four configurations. [...] <br />
<br />
<a href="http://upload.wikimedia.org/wikipedia/en/f/f2/Game_of_life_animated_glider.gif" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><img alt="" border="0" src="http://upload.wikimedia.org/wikipedia/en/f/f2/Game_of_life_animated_glider.gif" style="cursor: pointer; float: right; height: 84px; margin: 0pt 0pt 10px 10px; width: 84px;" /></a><span style="clear: right; color: grey; float: right; margin: 0pt 0pt 10px 10px; text-align: center; width: 84px;">Glider</span>We live in a universe which seeks organization. When simple relationships are created, patterns of organization emerge. Networks, living or not, have the capacity to self-organize. Global order arises from local connections. It was these cooperative structures that first created life. Life linked with other life and discovered how to continue discovering itself. [...] </div><br />
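Kauffman's light-bulb experiment is easy to reproduce in code. Here is a minimal sketch of a random Boolean network in Python; the parameters are illustrative stand-ins (16 bulbs rather than 200, each driven by two partners through a random truth table), but the dynamics are the same kind Kauffman studied:

```python
# A small random Boolean network in the spirit of Kauffman's experiment.
# N and the wiring are illustrative assumptions, not his exact setup.
import random

random.seed(1)
N = 16
inputs = [random.sample(range(N), 2) for _ in range(N)]               # two partners per bulb
rules = [[random.randint(0, 1) for _ in range(4)] for _ in range(N)]  # random truth table per bulb

def step(state):
    # Each bulb's next value depends only on its two partners.
    return tuple(rules[i][2 * state[a] + state[b]]
                 for i, (a, b) in enumerate(inputs))

# Iterate until the network revisits a state: that closes the attractor cycle.
state = tuple(random.randint(0, 1) for _ in range(N))
seen = {}
t = 0
while state not in seen:
    seen[state] = t
    state = step(state)
    t += 1
cycle_length = t - seen[state]
print("settled into a cycle of length", cycle_length)
```

Because the dynamics are deterministic over a finite state space, the network must eventually repeat itself; the striking part of Kauffman's observation is how quickly it does, and how short the cycle is compared to the astronomical number of possible states.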
To me this explains how life can emerge from the inanimate. <br />
<br />
Yann N. Dauphin<br />
<br />
<span style="font-weight: bold;">Clojure: Genetic Mona Lisa problem in 250 beautiful lines</span> (2009-01-25)<br />
<br />
<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmpRJFs8_JAgTSSg82M1zrLaRNUoRbOX7eeIlsxke99SEMznxkkCZkI_9BE0M6ssNBJ1Wzf43Qr-ODIG5doVSdCXJRRps1H3qTkcQFSl-PKaV6M_Z3Sd9a2WPdNXqsss1jqoe1FdifuQ4/s1600-h/mona_last.png"><img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 200px; height: 178px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmpRJFs8_JAgTSSg82M1zrLaRNUoRbOX7eeIlsxke99SEMznxkkCZkI_9BE0M6ssNBJ1Wzf43Qr-ODIG5doVSdCXJRRps1H3qTkcQFSl-PKaV6M_Z3Sd9a2WPdNXqsss1jqoe1FdifuQ4/s320/mona_last.png" alt="" id="BLOGGER_PHOTO_ID_5295423007404676690" border="0" /></a><br /><a href="http://clojure.org/rationale">Clojure</a> is surrounded by hype these days. The word on the street is that Clojure is the Next Big Thing. It has access to Java's huge library of code, and it proposes a nice solution to the concurrency problem. Lots more has been said...<br /><br />But I haven't seen a lot of code.<br /><br />So I set out to make a small but meaningful program in Clojure to get a sense of its potential.<br /><br />I give Clojure two thumbs up, and I think you will, too.<br /><br /><span style="font-weight: bold;">The Mona Lisa Problem</span><br /><br />The program I present tries to paint the Mona Lisa with a small number of semi-transparent colored polygons. It does so by using Darwin's theory of evolution to evolve programs that draw the Mona Lisa.<br /><br />Here's the simplified algorithm:<br />1. Generate an initial population of programs<br />2. Take the n best programs of the current population<br />3. Create children from the best programs by mating and mutating them<br />4.
Replace the current population with the n best and the children<br />5. Repeat from 2 until satisfied<br /><br />See my more complete <a href="http://github.com/ynd/genetic-drawing/tree/master">Java version</a> for details, and don't miss <a href="http://rogeralsing.com/2008/12/07/genetic-programming-evolution-of-mona-lisa/">Roger Alsing's seminal post</a>.<br /><br /><span style="font-weight: bold;">Clojure is Lisp</span><br /><br />Lisp code can be treated as data. That makes evolving programs painless: my genetic algorithm simply evolves lambdas, and running the evolved programs is a matter of calling '<span style="font-style: italic;">eval</span>.<br /><br />The program is <span style="font-weight: bold;">side-effect free</span> (almost!). The majority of the program is functional. There are only two sources of side effects:<br />1. Drawing on the canvas<br />2. Handling the GUI<br /><br /><span style="font-weight: bold;">Clojure is Java</span><br /><br />Clojure is compiled to Java bytecode, so it can be distributed and run anywhere.<br /><br />Clojure can use Java objects directly, without wrappers. I was able to create a cross-platform GUI in a few lines with Swing.<br /><br />Let me illustrate this by creating an object deriving from JPanel that overrides the paint method to draw a green rectangle.<br /><pre name="code" class="python"><br />(def rec-panel (proxy [JPanel] []<br /> (paint [graphics]<br /> (doto graphics<br /> (.setColor Color/green)<br /> (.fillRect 0 0 10 10)))))<br /></pre><br /><span style="font-weight: bold;">Parallelism</span><br /><br />I painlessly parallelized my code because it is side-effect free. Clojure provides primitives that can parallelize functional code.<br /><br />The bottleneck of the application is calculating the fitness of each individual of a population.
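In outline, the five-step evolutionary loop looks like this in Python. The `fitness` and `mutate` functions here are toy placeholders (evolving a number toward 42 rather than polygons toward an image), and mating is omitted, as it is in the Clojure listing below:

```python
# Generic sketch of the simplified genetic algorithm above.
# fitness() and mutate() are hypothetical stand-ins for the real
# draw-and-compare-to-Mona-Lisa versions.
import random

random.seed(0)

def fitness(x):
    # Placeholder: distance to an arbitrary target; smaller is better.
    return abs(x - 42)

def mutate(x):
    # Placeholder mutation: a small random perturbation.
    return x + random.randint(-3, 3)

def evolve(population, n_best=4, generations=200):
    for _ in range(generations):
        # 2. Take the n best programs of the current population.
        best = sorted(population, key=fitness)[:n_best]
        # 3. Create children by mutating the best.
        children = [mutate(x) for x in best]
        # 4. Replace the population with the n best and the children.
        population = best + children
        # 5. Repeat until satisfied.
    return min(population, key=fitness)

print(evolve([random.randint(0, 100) for _ in range(8)]))
```

Because the best individuals are always kept (elitism), the population's best fitness never gets worse, and it steadily drifts toward the target.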
In functional terms, it is expressed by mapping each individual to its fitness using '<span style="font-style: italic;">map</span> with the '<span style="font-style: italic;">fitness</span> function as its first argument and the population as its second argument.<br /><br />Clojure provides the '<span style="font-style: italic;">pmap</span> function to make that mapping parallel. It divides the work among worker threads, using as many of them as you have CPU cores.<br /><br />Thus, writing functionally allowed me to parallelize my code by adding one '<span style="font-style: italic;">p</span>' character.<br /><br />See <a href="http://clojure.org/api#toc571">clojure.parallel</a>.<br /><br /><span style="font-weight: bold;">Performance</span><br /><br />Performance wasn't a concern when I wrote the application. I tried to keep it simple. After all, that is the purpose of using a high-level language.<br /><br />Surprisingly, the fitness function (the bottleneck) runs faster in Clojure than in Java. Unfortunately, I don't have time to dig into this now.<br /><br />Here's a graph comparing the run time of the fitness function in Java and Clojure.
The measure is the average of 25 samples of 100 runs of the fitness function in each language.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpXs1JRoQeT7IehZJjwVVw5iyflquQVR_ESh2npyGHjih_MMKvJ-T4zWYsHOWehbGDDGTiCl9aOySS84c7LbQO6Oixk4vTgYbwx0MR1HG0Et0r3VzQj-e8T4uhdmfT4qPBbmKZNOeQBS4/s1600-h/bench.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 193px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpXs1JRoQeT7IehZJjwVVw5iyflquQVR_ESh2npyGHjih_MMKvJ-T4zWYsHOWehbGDDGTiCl9aOySS84c7LbQO6Oixk4vTgYbwx0MR1HG0Et0r3VzQj-e8T4uhdmfT4qPBbmKZNOeQBS4/s320/bench.png" alt="" id="BLOGGER_PHOTO_ID_5295346320774992018" border="0" /></a><br /><br />Here's the <a href="http://github.com/ynd/mona-clojure/tree/master/bench">benchmark</a> for reference.<br /><br /><span style="font-weight: bold;">A deal breaker</span><br /><br />Lambdas are not garbage collected.<br /><br />Yes. That means lambdas can be the cause of memory leaks.<br /><br />As <a href="http://blog.headius.com/2008_09_11_archive.html">described by Charles Nutter</a>, each lambda in Clojure is an anonymous class in Java. The problem is that classes are never garbage collected. They reside in a special place called the PermGen.<br /><br />Needless to say, my program quickly fills up the PermGen.<br /><br />The only solution for now is to extend the PermGen.<br /><pre name="code" class="python"><br />java -XX:MaxPermSize=1024m -cp clojure.jar clojure.lang.Repl mona-clojure.clj<br /></pre><br />I don't think this is a problem for most applications though.<br /><br />EDIT: As of r1232, lambdas created in eval can be GCed. Thanks to Christophe Grand for pointing it out.<br /><br /><span style="font-weight: bold;">The Mona Lisa Challenge</span><br /><br />Let's see what your favorite language can do.
The challenge is to write a small program that solves the Mona Lisa problem using Genetic Programming.<br /><br />Show us some code!<br /><br />Some of the languages I'd like to see are Haskell, Factor, Potion, Ioke, Erlang among lots of others.<br /><br />Don't forget to leave a link in the comment section of this post.<br /><br /><span style="font-weight: bold;">The code</span><br /><br />Here's the <a href="http://github.com/ynd/mona-clojure/tree/master/">github</a> repository.<br /><br />Please read the <a href="http://github.com/ynd/mona-clojure/blob/master/mona-clojure.clj">source code with syntax highlighting</a> on github.<br /><br />The following is the <span style="font-weight: bold;">full</span> code listing for the <i>impatient</i> only.<br /><br /><pre name="code" class="python"><br />(import<br /> '(java.awt Graphics Graphics2D Color Polygon)<br /> '(java.awt.image BufferedImage PixelGrabber)<br /> '(java.io File)<br /> '(javax.imageio ImageIO)<br /> '(javax.swing JFrame JPanel JFileChooser))<br /><br />; ---------------------------------------------------------------------<br />; This section defines the building blocks of the genetic programs.<br /><br />; color :: Integer -> Integer -> Integer -> Integer -> Color<br />(defn color [red blue green alpha] {:type :Color :red red :blue blue :green green :alpha alpha})<br /><br />; point :: Integer -> Integer -> Point<br />(defn point [x y] {:type :Point :x x :y y})<br /><br />; polygon :: Color -> [Point] -> Polygon<br />(defn polygon [color points] {:type :Polygon :color color :points points})<br /><br />; draw-polygon :: Graphics -> Polygon -> Nothing<br />(defn draw-polygon [graphics polygon]<br /> (doto graphics<br /> (.setColor (new Color (:red (:color polygon)) <br /> (:blue (:color polygon))<br /> (:green (:color polygon))<br /> (:alpha (:color polygon))))<br /> (.fillPolygon (let [jpolygon (new Polygon)]<br /> (doseq [p (:points polygon)] (. 
jpolygon (addPoint (:x p) (:y p))))<br /> jpolygon)))<br /> nil)<br /><br />; ----------------------------------------------------------------------<br />; This sections defines helper functions.<br /><br />; random-double :: Double<br />(defn random-double<br /> "Returns a double between -1.0 and 1.0."<br /> []<br /> (- (* 2 (rand)) 1))<br /><br />; remove-item :: Sequence -> Integer -> Sequence<br />(defn remove-item<br /> "Returns a sequence without the n-th item of s."<br /> [s n]<br /> (cond<br /> (vector? s) (into (subvec s 0 n)<br /> (subvec s (min (+ n 1) (count s)) (count s)))<br /> (list? s) (concat (take n s)<br /> (drop (inc n) s))))<br /><br />; replace-item :: [a] -> Integer -> a -> [a]<br />(defn replace-item<br /> "Returns a list with the n-th item of l replaced by v."<br /> [l n v]<br /> (concat (take n l) (list v) (drop (inc n) l)))<br /><br />; grab-pixels :: BufferedImage -> [Integer]<br />(defn grab-pixels<br /> "Returns an array containing the pixel values of image."<br /> [image]<br /> (let [w (. image (getWidth))<br /> h (. image (getHeight))<br /> pixels (make-array (. 
Integer TYPE) (* w h))]<br /> (doto (new PixelGrabber image 0 0 w h pixels 0 w)<br /> (.grabPixels))<br /> pixels))<br /><br />; ----------------------------------------------------------------------<br />; This sections define the primitives of the genetic algorithm.<br /><br />; program :: S-Expression -> Maybe Integer -> Maybe BufferedImage -> Program<br />(defn program [code fitness image] {:type :Program :code code :fitness fitness :image image})<br /><br />; initial-program :: Program<br />(def initial-program (program '(fn [graphics]) nil nil))<br /><br />; program-header :: Program -> S-Expression<br />(defn program-header [p] (take 2 (:code p)))<br /><br />; program-expressions :: Program -> S-Expression<br />(defn program-expressions [p] (drop (count (program-header p)) (:code p)))<br /><br />; mutate :: a -> Map -> a<br />(defmulti mutate :type)<br /><br />; mutate :: Color -> Map -> Color<br />(defmethod mutate :Color [c settings]<br /> (let [dr (int (* (:red c) (random-double)))<br /> dg (int (* (:green c) (random-double)))<br /> db (int (* (:blue c) (random-double)))<br /> da (int (* (:alpha c) (random-double)))]<br /> (assoc c :red (max (min (- (:red c) dr) 255) 0)<br /> :green (max (min (- (:green c) dg) 255) 0)<br /> :blue (max (min (- (:blue c) db) 255) 0)<br /> :alpha (max (min (- (:alpha c) da) 255) 0))))<br /><br />; mutate :: Point -> Map -> Point<br />(defmethod mutate :Point [p settings]<br /> (let [dx (int (* (:x p) (random-double)))<br /> dy (int (* (:y p) (random-double)))]<br /> (assoc p :x (max (min (- (:x p) dx) (:image-width settings)) 0)<br /> :y (max (min (- (:y p) dy) (:image-height settings)) 0))))<br /><br />; mutate :: Polygon -> Map -> Polygon<br />(defmethod mutate :Polygon [p settings] <br /> ; mutate-point :: Polygon -> Map -> Polygon<br /> (defn mutate-point [p settings]<br /> (let [n (rand-int (count (:points p)))]<br /> (update-in p [:points n] (fn [point] (mutate point settings)))))<br /><br /> ; mutate-color :: Polygon 
-> Map -> Polygon<br /> (defn mutate-color [p settings] (assoc p :color (mutate (:color p) settings)))<br /> <br /> (let [roulette (rand-int 2)]<br /> (cond<br /> (= 0 roulette) (mutate-point p settings)<br /> (= 1 roulette) (mutate-color p settings))))<br /><br />; mutate :: Program -> Map -> Program<br />(defmethod mutate :Program [p settings]<br /> ; add-polygon :: Program -> Map -> Program<br /> (defn add-polygon [p settings]<br /> (assoc p :code <br /> (concat (:code p)<br /> [(list 'draw-polygon<br /> (first (nth (:code initial-program) 1))<br /> (polygon<br /> (color (rand-int 255) (rand-int 255) (rand-int 255) (rand-int 255))<br /> (vec (map <br /> (fn [n]<br /> (point <br /> (rand-int (:image-width settings))<br /> (rand-int (:image-height settings))))<br /> (range 5)))))])<br /> :fitness nil :image nil))<br /><br /> ; remove-polygon :: Program -> Map -> Program<br /> (defn remove-polygon [p settings]<br /> (let [n (rand-int (count (program-expressions p)))]<br /> (assoc p :code (concat (program-header p)<br /> (remove-item (program-expressions p) n))<br /> :fitness nil :image nil)))<br /><br /> ; mutate-polygon :: Program -> Map -> Program<br /> (defn mutate-polygon [p settings]<br /> (let [expressions (program-expressions p)<br /> n (rand-int (count expressions))<br /> target (nth expressions n)]<br /> (assoc p :code<br /> (concat (program-header p)<br /> (replace-item expressions<br /> n<br /> (list (nth target 0)<br /> (nth target 1)<br /> (mutate (nth target 2) settings))))<br /> :fitness nil :image nil)))<br /> <br /> (let [polygon-count (count (program-expressions p))<br /> roulette (cond<br /> (empty? 
(program-expressions p)) 4<br /> (>= polygon-count (:max-polygons settings)) (rand-int 4)<br /> :else (rand-int 5))]<br /> (cond<br /> (> 3 roulette) (mutate-polygon p settings)<br /> (= 3 roulette) (remove-polygon p settings)<br /> (= 4 roulette) (add-polygon p settings))))<br /><br />; fitness :: Program -> Map -> Program<br />(defn fitness [individual settings]<br /> (if (:fitness individual)<br /> individual<br /> (let [gen-image (new BufferedImage (:image-width settings)<br /> (:image-height settings)<br /> BufferedImage/TYPE_INT_ARGB)<br /> src-pixels (:source-pixels settings)]<br /> (apply (eval (:code individual)) [(. gen-image (createGraphics))])<br /> (def gen-pixels (grab-pixels gen-image))<br /> (loop [i (int 0)<br /> lms (int 0)]<br /> (if (< i (alength gen-pixels))<br /> (let [src-color (new Color (aget src-pixels i))<br /> gen-color (new Color (aget gen-pixels i))<br /> dr (- (. src-color (getRed)) (. gen-color (getRed)))<br /> dg (- (. src-color (getGreen)) (. gen-color (getGreen)))<br /> db (- (. src-color (getBlue)) (. 
gen-color (getBlue)))]<br /> (recur (unchecked-inc i) (int (+ lms (* dr dr) (* dg dg) (* db db )))))<br /> (assoc individual :fitness lms :image gen-image))))))<br /><br />; select :: [Program] -> Map -> [Program]<br />(defn select [individuals settings]<br /> (take (:select-rate settings)<br /> (sort-by :fitness<br /> (pmap (fn [i] (fitness i settings))<br /> individuals))))<br /><br />; evolve :: Map -> Nothing<br />(defn evolve [settings]<br /> (loop [i 0<br /> population (list initial-program)]<br /> (let [fittest (select population settings)<br /> newborns (map (fn [i] (mutate i settings)) fittest)]<br /> ((:new-generation-callback settings (fn [a b])) i fittest)<br /> (when-not (= (first population) (first fittest))<br /> ((:new-fittest-callback settings (fn [a b])) i fittest))<br /> (recur (inc i) (concat fittest newborns)))))<br /><br />; ----------------------------------------------------------------------<br />; This sections defines the graphical interface.<br /><br />; main :: Nothing<br />(defn main []<br /> (def file-chooser (new JFileChooser))<br /> (doto file-chooser<br /> (.setCurrentDirectory (new File "."))<br /> (.showOpenDialog nil))<br /> <br /> (let [jframe (new JFrame "Fittest Program")<br /> fittest (atom (list initial-program))<br /> image (ImageIO/read (. file-chooser (getSelectedFile)))<br /> settings {:image-width (. image (getWidth))<br /> :image-height (. image (getHeight))<br /> :source-pixels (grab-pixels image)<br /> :select-rate 1 :max-polygons 50<br /> :new-fittest-callback (fn [i f]<br /> (swap! fittest (fn [o n] n) f)<br /> (. jframe (repaint)))}]<br /> (doto jframe<br /> (.setSize (. image (getWidth)) (. image (getHeight)))<br /> (.add (proxy [JPanel] []<br /> (paint [g]<br /> (doto g <br /> (.setColor Color/white)<br /> (.fillRect 0 0 (. image (getWidth)) (. image (getHeight)))<br /> (.drawImage (:image (first @fittest)) nil 0 0)))))<br /> (.setVisible true))<br /> (evolve settings)))<br /><br />(main)<br /></pre>Yann N. 
Dauphin<br />
<br />
<span style="font-weight: bold;">One-Time Pet Project</span> (2008-11-23)<br />
<br />
<a style="font-weight: bold;" href="http://github.com/ynd/boni-asm/tree/master">Boni-Asm</a><br /><br />I just finished a new project. It's an <span style="font-weight: bold;">assembler</span> that can easily support new architectures.<br /><br />I wrote it in Python for ease of development. I chose <a href="http://theory.stanford.edu/%7Eamitp/yapps/">Yapps</a> as a parser generator because it is the most Pythonic one. The Yapps grammar language is pretty easy to pick up, even though the documentation is a bit incomplete. Yapps produces LL(1) parsers, so I had to be a little creative.<br /><br />Boni's best feature is its flexibility: the assembler doesn't depend on the format of the machine code being generated. That means Boni can be retargeted by changing one configuration file.<br /><br />I had a lot of fun writing it. I don't know if it'll ever be useful, though. At the very least it will show how Yapps is used.<br /><br />Anyway, have a look:<br /><a href="http://github.com/ynd/boni-asm/tree/master">http://github.com/ynd/boni-asm/tree/master</a><br /><br />Yann N. Dauphin<br />
<br />
<span style="font-weight: bold;">Object Oriented Programming is not Procedural Programming with Structures</span> (2008-07-19)<br />
<br />
<span style="font-style: italic;">For beginners.</span><br /><br />Object-oriented programming is not procedural programming with structures. They are two different ways to think about programs.
This is a brief explanation of why the two approaches are different.<br /><br /><br /><span style="font-weight: bold;">Procedural model</span><br /><br />A <a href="http://en.wikipedia.org/wiki/Procedural_programming">procedural</a> program is a series of steps. This is the most natural and widespread style of programming. The first <a href="http://en.wikipedia.org/wiki/High-level_programming_language">high-level</a> language, <a href="http://en.wikipedia.org/wiki/Plankalk%C3%BCl" title="Plankalkül">Plankalkül</a>, was a procedural language.<br /><br />I don't know why procedural programming is natural... but think about the first program you were taught. Nope, not "Hello World!". More like the program to tie your shoes, or the program to use the big boy's potty. The program they taught you was a series of sequential actions: a procedure.<br /><br />We use procedural programming all the time. <a href="http://www.wikihow.com/Bake-a-Cake">Cooking recipes</a>, for example. Procedural programming is easier to grasp.<br /><br /><br /><span style="font-weight: bold;">The OO model</span><br /><br /><a href="http://en.wikipedia.org/wiki/Object_oriented_programming">Object-oriented</a> programming shifts the focus away from the procedure to the definition and combination of high-level constructs. These constructs are called objects.<br /><br />Instead of being too theoretical, I'll jump into an example to illustrate why this is a major paradigm shift.<br /><br /><br /><span style="font-weight: bold;">Example: Procedural approach</span><br /><br />I found a <a href="http://dev-logger.blogspot.com/2008/07/ruby-programming-is-art.html">good snippet</a> to illustrate how they differ.
Many thanks to Martin Carel.<br /><pre name="code" class="python"><br />#!/usr/bin/env python<br /># Thanks to Martin Carel from http://dev-logger.blogspot.com/<br /><br />import time<br />import urllib<br />from elementtree import ElementTree<br /><br />feed_link = "http://feeds.feedburner.com/37signals/beMH"<br />title, published_date = "", ""<br />TITLE_PATH = ".//item/title"<br />DATE_PATH = ".//item/pubDate"<br /><br />while True:<br /> feed = urllib.urlopen(feed_link).read()<br /> tree = ElementTree.fromstring(feed)<br /> fetched_title = tree.findtext(TITLE_PATH)<br /> fetched_published_date = tree.findtext(DATE_PATH)<br /><br /> if title != fetched_title:<br /> print fetched_title, fetched_published_date<br /> title, published_date = fetched_title, fetched_published_date<br /><br /> time.sleep(5 * 60)<br /></pre><br />Writing this procedure was straightforward. I simply listed the steps needed to get the desired result. I didn't spend a lot of time analyzing my problem.<br /><br /><br /><span style="font-weight: bold;">How am I going to write the OO program?</span><br /><br />First, I have to think about the task at hand. Analysis is important. You cannot have a good OO program without understanding your problem space well.<br /><br />For example, I have to identify what concepts of the problem domain I will model as objects. 
Then I will define the relations between objects and what operations will be possible on them.<br /><br />Finally, I have to express the program as a combination of the building blocks I defined earlier.<br /><br /><br /><span style="font-weight: bold;">Example: Hybrid functional/OO approach</span><br /><pre name="code" class="python"><br />#!/usr/bin/env python<br /># Thanks to Martin Carel from http://dev-logger.blogspot.com/<br /><br />import time<br />import urllib<br />from elementtree import ElementTree<br /><br />class Feed:<br /> class Entry:<br /> def __init__(self, elem):<br /> self.title = elem.findtext("./title")<br /> self.last_updated = elem.findtext("./pubDate")<br /><br /> def __eq__(self, other):<br /> return self.title == other.title<br /><br /> def __init__(self, url):<br /> self.url = url<br /> self.entries = []<br /><br /> def fetch(self):<br /> feed = urllib.urlopen(self.url).read()<br /> tree = ElementTree.fromstring(feed)<br /> self.entries = [Feed.Entry(e) for e in tree.findall(".//item")]<br /><br /> def has_recent_post(self):<br /> old = self.entries[:1]<br /> self.fetch()<br /> return old != self.entries[:1]<br /><br /># High-level functionality<br />feed = Feed("http://feeds.feedburner.com/37signals/beMH")<br />while True:<br /> if feed.has_recent_post():<br /> print feed.entries[0].title, feed.entries[0].last_updated<br /> time.sleep(5 * 60)<br /></pre><br /><br /><span style="font-weight: bold;">Why is this better?</span><br /><br />You know instantly what the program does by looking at the high-level functionality. That's because I was able to match my program to the problem definition by defining the right constructs.<br /><br />The OO approach forces you to create black boxes. A black box is an abstract element that you can use through its input and output without having to know its implementation. Engineers use them every day.<br /><br />Black boxes reduce complexity dramatically. 
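To make the black-box idea concrete, here is a small sketch of my own (the class names are hypothetical, not from the post): two stacks share one push/pop interface, and the caller works with either without knowing its internals.

```python
# Hypothetical illustration: a black box is used only through its interface.

class ListStack(object):
    """Stack backed by a plain Python list."""
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()


class LinkedStack(object):
    """Same interface, different internals: a chain of (item, rest) pairs."""
    def __init__(self):
        self._head = None

    def push(self, item):
        self._head = (item, self._head)

    def pop(self):
        item, self._head = self._head
        return item


def reversed_copy(items, stack):
    # The caller relies only on push/pop - the box stays black.
    for item in items:
        stack.push(item)
    return [stack.pop() for _ in items]
```

Swapping ListStack for LinkedStack changes nothing for the caller - exactly the property that lets black boxes be composed into layers.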
It's easy to reason about them. First, each has an isolated and simple function. Second, the interactions between black boxes are explicit. They also reduce complexity by restricting the number of possible interactions. And since a black box can be made of other black boxes, they can organize your program into neat and coherent layers.<br /><br />Black box designs are easier to understand. You can even choose to know only the boxes and layers of boxes that matter to you. You cannot do the same with a procedural program because nothing is isolated into a component that you can understand independently. You always have to understand how everything interacts.<br /><br />The OO program is easier to extend because it has well-defined extension points. New functionality can be added by adding methods to the class, and the main program can use those methods right away.<br /><br />Lastly, having high-level components allows you to perform high-level operations like late binding and reflection. This is the source of OOP's real power and prowess. It is a very wide and interesting topic, so I won't cover it here.Yann N. Dauphinhttp://www.blogger.com/profile/04848263488684342076noreply@blogger.com6tag:blogger.com,1999:blog-7588060738595136253.post-11880807682034997112008-06-30T22:34:00.000-04:002008-06-30T23:11:17.304-04:00The difference between smalltalk and the rest<a href="http://en.wikipedia.org/wiki/Smalltalk">Smalltalk</a> programs are ecosystems.<br /><br />A program behaves like an ecosystem when the focus is put on run time - not compile time. This is a major shift.<br /><br />People coming from static languages complain that Smalltalk doesn't have Netbeans, Eclipse or whatever. Smalltalk - and potentially other dynamic languages - has something different.<br /><br />Smalltalk provides an environment where you can edit, run and analyze code in real time. Imagine being able to grow a program. Imagine being able to observe it grow. 
Imagine being able to painlessly debug and analyze it. This is what it means to focus on run time.<br /><br />This is hard to understand if you're coding in a glorified notepad.<br /><br />When you're coding in Smalltalk, your program is running persistently in the background. It is <span style="font-weight: bold;">alive</span>. Inspecting it is just a click away. When you create an object, you can actually right-click on it and get a list of its methods. You can just as easily change the implementations of these methods. Without restarting anything.<br /><br />It's a major cultural shift. Smalltalk programmers never fight the compiler; they spend their time debugging their programs. This is a different way of developing a program.<br /><br />Don't trust me. <a href="http://www.youtube.com/watch?v=yH-ZSoRXx-c&feature=related">Take 2 minutes and see something interesting.</a>Yann N. Dauphinhttp://www.blogger.com/profile/04848263488684342076noreply@blogger.com4tag:blogger.com,1999:blog-7588060738595136253.post-83274308171831824892008-06-17T19:15:00.000-04:002008-06-25T18:51:38.897-04:00Here's why dynamic languages are slow and how to fix it<p>Dynamic languages are emerging as the Next Big Thing. They are known for making development faster and being more powerful and flexible. Today, <a href="http://www.itasoftware.com/">more</a> <a href="http://www.python.org/about/success/usa/">and</a> <a href="http://www.python.org/about/success/">more</a> people are using them in production environments. However, one problem <a href="http://java.sys-con.com/read/193146.htm">stands in the way</a> of mass adoption: SPEED. There is an urban legend that dynamic programs are way slower than their static counterparts. Here's my take on it.</p><br /><span style="font-weight: bold;">Why are dynamic languages slow TODAY?</span><p>The purpose of a dynamic language is to have as few static elements as possible. The idea is that this offers more flexibility. 
For example, in Python method calls are never static. This means that the actual code that will be executed is known only at run time. This is what makes monkey patching possible. This is what allows you to have <a href="http://www.djangoproject.com/documentation/testing/">great unit testing frameworks</a>.</p><p></p><pre name="code" class="python"><br /># globals.py<br />A = 2<br /></pre><pre name="code" class="python"><br /># main.py<br />from globals import A<br />A = []<br /></pre>Dynamic languages leave as many decisions as possible to run time. What is the type of <span style="font-style: italic;">A</span>? You can only know for sure when the code runs because it can be changed at any point in the program.<br />The result is that it is <a href="http://www.iro.umontreal.ca/%7Efeeley/papers/ifl07-feeley.pdf">hard to analyze dynamic languages</a> in order to optimize them. Static languages offer plenty of opportunities for optimization; dynamic languages offer few, so their implementations are usually slow.<p></p><p>The problem with dynamic languages is that it isn't trivial to optimize even an addition. You can hardly know what '+' will be bound to at runtime. You probably can't even infer the types of the operands. This is the result of <span style="font-style: italic;">mutation</span>. In Python, almost everything is mutable. This leaves little information the compiler can rely on.</p><br /><span style="font-weight: bold;">Does mutability hurt performance and why?</span><p>It can, depending on the case. Let me illustrate how by comparing the factorial function in C and Python. Don't think of this as a <a href="http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=gcc&lang2=python">benchmark</a>. 
This is just an example.</p><p>Compiling the factorial function in C with <a href="http://www.llvm.org/">LLVM-GCC</a> will generate efficient machine code.</p><pre name="code" class="cpp"><br />// Factorial in C<br />int fac(int n) {<br /> if (n == 0) return 1;<br /> return n*fac(n-1);<br />}<br />int main(){<br /> return fac(30);<br />}<br /></pre><pre name="code" class="cpp"><br />; Assembly generated by LLVM-GCC<br />_main:<br /> movl $1, %eax<br /> xorl %ecx, %ecx<br /> movl $30, %edx<br />.align 4,0x90<br />LBB1_1: ## bb4.i<br /> imull %edx, %eax<br /> decl %edx<br /> incl %ecx<br /> cmpl $30, %ecx<br /> jne LBB1_1 ## bb4.i<br />LBB1_2: ## fac.exit<br /> ret<br /></pre><p></p><p>The compiler was able to infer many properties from the source code. For example, it concluded that the <span style="font-style: italic;">fac</span> function referenced in <span style="font-style: italic;">main</span> was the <span style="font-style: italic;">fac</span> defined at compile time. This allowed the compiler to replace the assembly <span style="font-style: italic;">call</span> instruction with <span style="font-style: italic;">fac</span>'s code. The function was then specialized for the call site and, thanks to static typing, the compiler was able to transform each arithmetic operation into direct machine instructions.<br />Can you notice the other optimizations?</p><p>Let's look at how CPython executes the factorial.</p><pre name="code" class="python"># fac.py<br />def fac(n):<br /> return 1 if n == 0 else n * fac(n - 1)<br />fac(30)<br /></pre>First, <i>fac.py</i> is parsed and translated to <a href="http://en.wikipedia.org/wiki/Bytecode">bytecode</a> instructions. 
Then the bytecode instructions are interpreted by the <a href="http://www.devshed.com/c/a/Python/How-Python-Runs-Programs/">CPython Virtual Machine</a>.<pre name="code" class="python"><br /># CPython Bytecode for fac.py<br /># Think of this as an interpreted language which Python is translated into.<br /># See http://docs.python.org/lib/bytecodes.html<br /># fac<br />11 0 LOAD_FAST 0 (n)<br /> 3 LOAD_CONST 1 (0)<br /> 6 COMPARE_OP 2 (==)<br /> 9 JUMP_IF_FALSE 7 (to 19)<br /> 12 POP_TOP <br /> 13 LOAD_CONST 2 (1)<br /> 16 JUMP_FORWARD 18 (to 37)<br />>> 19 POP_TOP <br /> 20 LOAD_FAST 0 (n)<br /> 23 LOAD_GLOBAL 0 (fac)<br /> 26 LOAD_FAST 0 (n)<br /> 29 LOAD_CONST 2 (1)<br /> 32 BINARY_SUBTRACT<br /> 33 CALL_FUNCTION 1<br /> 36 BINARY_MULTIPLY<br />>> 37 RETURN_VALUE<br /># main<br />14 0 LOAD_GLOBAL 0 (fac)<br /> 3 LOAD_CONST 1 (30)<br /> 6 CALL_FUNCTION 1<br /> 9 RETURN_VALUE<br /><br /></pre><p></p><p>CPython could not inline the call to <i>fac</i> because this would violate the language's semantics. In Python, <i>fac.py</i> could be imported at run time by another module, and a sub-module could change the binding of <i>fac</i>, which would invalidate an inlined <i>main</i>. And because <i>main</i> doesn't have its own copy of <i>fac</i>, the code cannot be specialized for this particular call. This hurts because it would be <a href="http://portal.acm.org/citation.cfm?id=277652.277743">very beneficial</a> to specialize the function for an integer argument.</p><p></p><p>Notice that there are no references to machine addresses. CPython adds a layer of indirection to access every object in order to implement the dynamism of Python. For example, <i>main</i> is found by a look-up in a table. Even constant numbers are found through look-ups. 
This adds a significant amount of slow memory reads/writes and <a href="http://citeseer.ist.psu.edu/cache/papers/cs/32018/http:zSzzSzwww.jilp.orgzSzvol5zSzv5paper12.pdf/ertl03structure.pdf">indirect jumps</a>.</p><p></p><p>Python doesn't even provide explicit hints you can give to help the compiler. This makes the problem of optimizing Python non-trivial.</p><br /><span style="font-weight: bold;">What about type inference?</span><p>The problem of type inference in dynamic languages remains unsolved. Type inference is a form of static analysis. Static analysis is the analysis of source code at compile time to derive some "truths" about it. You can imagine how this falls short for dynamic languages.<br /></p><p>Michael Salib attempted to solve this problem with <a href="http://www.salib.com/writings/thesis/thesis.pdf">StarKiller</a>. The compiler manages type inference by collecting more information than usual and using the CTA algorithm. Instead of compiling each module separately, like most compilers do, the whole program is analyzed and compiled in one pass. The knowledge of the complete program opens the door to more optimizations. The <span style="font-style: italic;">fac</span> function of the previous example can be specialized by StarKiller because it knows how it will be used.</p><p>Though the work seems very promising, it has three major flaws. First, the compiler accepts only a subset of the Python language. Advanced functions like <span style="font-style: italic;">eval</span> and <span style="font-style: italic;">exec</span> aren't supported. Second, whole-program analysis doesn't scale to bigger projects. Compiling 100,000 LOC would take a prohibitive amount of time. Third, the compiler violates Python's semantics by doing whole-program analysis. As in most dynamic languages, Python's import mechanism operates at run time. 
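The run-time rebinding problem can be demonstrated in a few lines. Here is a minimal sketch of my own (not taken from the StarKiller thesis) showing why a compiler cannot safely inline <i>fac</i> into its caller:

```python
# Sketch: the global binding of fac can change at any time,
# and main() must see the new binding when it does.

def fac(n):
    return 1 if n == 0 else n * fac(n - 1)

def main():
    # If a compiler had inlined the fac above here,
    # the rebinding below would silently be ignored.
    return fac(5)

before = main()   # uses the original definition: 5! = 120

def fake_fac(n):  # e.g. a test double installed by monkey patching
    return -1

fac = fake_fac    # an ordinary assignment rebinds the global at run time
after = main()    # main() now reaches fake_fac
```

Whole-program analysis has to assume no such rebinding ever happens, which is precisely where it departs from Python's semantics.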
The language doesn't guarantee that the module available at compile time is the same as the module available at run time.</p><p>Read <a href="http://www.ocf.berkeley.edu/%7Ebac/thesis.pdf">this</a> for <a href="http://www.iro.umontreal.ca/%7Efeeley/papers/ifl07-feeley.pdf">more</a>.</p><br /><span style="font-weight: bold;">What about VMs?</span><p>Virtual Machines are a natural fit for dynamic languages. VMs with JIT compilers are able to optimize a dynamic program without having to guess its behavior in advance. This saves a lot of heavy lifting. Programs are optimized simply by observing their behavior while they run. This is known as <span style="font-style: italic;">dynamic analysis</span>. For instance, noticing that <span style="font-style: italic;">fac</span> is often called with an integer argument, the VM could create a new version of that function specialized for integers and use it instead.</p><p>In my <i>opinion</i>, Virtual Machines are not a long-term solution.<br /></p><ol><li><a href="http://en.wikipedia.org/wiki/Self-hosting">Self-hosting</a> a VM <a href="http://codespeak.net/pypy/dist/pypy/doc/home.html">is</a> <a href="http://www.squeak.org/Features/TheSqueakVM/">prohibitive</a>.</li><li>A VM sets a limit on the kinds of programs you can make. No Operating Systems, no drivers, no real-time systems, etc.</li><li>Optimizing a program run through a VM is hard because you cannot know exactly what is going on under the hood. There are many layers and many corners where performance can slip away.</li></ol><p>For most projects, these problems aren't an issue. But I believe their existence would restrain dynamic languages. They are enough to prevent a dynamic language from being a general purpose tool. 
And that is what people want: no restrictions, no <a href="http://theocacao.com/document.page/421">surprises</a>, pure freedom.</p><br /><span style="font-weight: bold;">How would I make them faster?</span><pre name="code" class="python"><br />from types import ModuleType<br />import re<br />declare(re, type=ModuleType, constant=True, inline=True)<br /></pre><p>A compiler helped by Static Annotations is the way to go. Please don't put all static annotations in the same bag. Static annotations like type declarations don't have to be as painful as Java's. Annotations are painful in Java because they are pervasive and often useless. They restrict the programmer. Programmers have to fight them. Annotations can be just the opposite. They can give the programmer more freedom! With them, programmers can set constraints on their code where it matters. Because they have the choice, static annotations become a tool that offers MORE flexibility.</p><p>A savvy programmer could reduce the dynamism of his code at a few key points. Just enough to allow type inference and the like to do their job well. Optimizing some code would usually just become a matter of explicitly expressing the natural constraints that apply to it.</p><pre name="code" class="python"><br /># Just an example.<br />def fac(n):<br /> assert type(n) in [float, int]<br /> return 1 if n == 0 else n * fac(n - 1)<br /></pre><p></p><p>There are many ways to implement static annotations in dynamic languages. I believe the flexibility of dynamic languages can allow static annotations to be very convenient. How would you do it?</p>Yann N. 
Dauphinhttp://www.blogger.com/profile/04848263488684342076noreply@blogger.com92tag:blogger.com,1999:blog-7588060738595136253.post-24464095330108644912008-06-14T14:19:00.000-04:002008-06-16T12:12:51.572-04:00What everybody ought to know about RESEARCH<span style="font-style: italic;"></span><blockquote><span style="font-style: italic;">Why do so few scientists make significant contributions and so many are forgotten in the long run?</span></blockquote><span style="font-size:85%;"><a style="font-style: italic;" href="http://en.wikipedia.org/wiki/Richard_Hamming">Richard Hamming</a></span><span style="font-style: italic;font-size:85%;" > asked himself and some of the <a href="http://en.wikipedia.org/wiki/Bell_Labs">greatest scientists of the 20th century</a> this very question. In his classic "</span><span style="font-size:85%;"><a style="font-style: italic;" href="http://www.cs.virginia.edu/%7Erobins/YouAndYourResearch.html">You and Your Research</a></span><span style="font-style: italic;font-size:85%;" >" talk, he relates what led him to the discovery of the </span><span style="font-size:85%;"><a style="font-style: italic;" href="http://en.wikipedia.org/wiki/Hamming_code">Hamming Code</a></span><span style="font-style: italic;font-size:85%;" > and the </span><span style="font-size:85%;"><a style="font-style: italic;" href="http://en.wikipedia.org/wiki/Hamming_distance">Hamming Distance</a></span><span style="font-style: italic;font-size:85%;" > among other things. The following is my humble attempt to summarize it to make it more accessible.</span><br /><br /><span style="font-weight: bold;">1) Research is not just a matter of luck.</span> Consider Einstein for example. Can luck explain that he discovered Special Relativity and - 10 years later - the General Theory of Relativity? 
One after another, you <a href="http://en.wikipedia.org/wiki/Alan_Turing">see</a> <a href="http://en.wikipedia.org/wiki/Isaac_Newton">people</a> <a href="http://en.wikipedia.org/wiki/John_von_Neumann">setting</a> a pattern of Great Science.<br /><br /><span style="font-weight: bold;">2) Successful scientists are courageous.</span> Once you get your courage up and believe that you can do important problems, then you can. If you think you can't, almost surely you are not going to. Research is not easy. If you always give up early on, you won't get anywhere. Think, and continue to think, <a href="http://en.wikipedia.org/wiki/Charles_Darwin">under any circumstance</a>.<br /><br /><span style="font-weight: bold;">3) Don't work on big problems right away.</span> <a href="http://en.wikipedia.org/wiki/Riemann_hypothesis">Research is hard.</a> Expect to be paralyzed if you skip stepping stones to work on a big problem. Build some background knowledge by working on smaller problems first.<br /><br /><span style="font-weight: bold;">4) Work hard.</span> Given two people of approximately the same ability, with one working 10% more than the other, the former will outproduce the latter more than twice over the course of a lifetime. The more you know, the more you learn; the more you learn, the more you can do; the more you can do, the more the opportunity.<br /><br /><span style="font-weight: bold;">5) It's important to cultivate ambiguity.</span> Believe in your theory enough to push forward. Doubt it enough to notice the flaws and the errors. If you don't believe, you will never get started. If you don't doubt, you may lose a lot of time working on something wrong. 
Noticing and fixing flaws will make your theory stronger.<br /><br /><span style="font-weight: bold;">6) You have to want to do something significant.</span> To quote Pasteur, "<a href="http://en.wikipedia.org/wiki/Discoveries_of_anti-bacterial_effects_of_penicillium_moulds_before_Fleming">Luck favors the prepared mind</a>". You can't win the lottery without playing. If you never try to work on anything significant, the odds are against you. Newton used to say, "If others would think as hard as I did, then they would get similar results". You have to try.<br /><br />If you enjoyed this, I recommend <a href="http://www.cs.virginia.edu/%7Erobins/YouAndYourResearch.html">the original talk</a>.Yann N. Dauphinhttp://www.blogger.com/profile/04848263488684342076noreply@blogger.com6tag:blogger.com,1999:blog-7588060738595136253.post-43765353348223637122008-06-13T18:10:00.000-04:002008-06-16T00:33:06.277-04:00The secret of the LLVM C bindingsEver wanted to use <a href="http://www.llvm.org/">LLVM</a> from C? Can't find any documentation? Welcome.<br /><br />Since I'm considering retargeting <a href="http://sourceforge.net/projects/clisp/">CLISP</a>'s JIT Compiler, I've been experimenting with LLVM. LLVM is an optimizing compiler for a virtual instruction set. Technically, it is very interesting. 
And this year, with Apple and CLANG in the game, it seems to be here to stay.<br /><span style="font-size:180%;"><br /></span><span style="font-weight: bold;font-size:100%;" >A Factorial in C with LLVM</span><br />Let's make a factorial function using the C bindings of LLVM 2.3+.<br />The function we will describe in LLVM instructions is illustrated below.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWyoAzPAag8DIxJz_YKcNK_nHpc27WzSda9g00o3iR7pq54itATK8HtXBHwhCAnUnwUwHhaz9CYQzS7bDuaqxOQEe_VZKCiJa98YClmac49cUHko7u71kGvuLPE3vdoxJvpwOuciqXVZI/s1600-h/fac.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 282px; height: 349px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWyoAzPAag8DIxJz_YKcNK_nHpc27WzSda9g00o3iR7pq54itATK8HtXBHwhCAnUnwUwHhaz9CYQzS7bDuaqxOQEe_VZKCiJa98YClmac49cUHko7u71kGvuLPE3vdoxJvpwOuciqXVZI/s320/fac.png" alt="" id="BLOGGER_PHOTO_ID_5211503703711110722" border="0" /></a>I inserted the <span style="font-style: italic;">phi</span> instruction manually to make things more interesting.<br />Paste this in your favorite editor and save it as "fac.c":<br /><pre name="code" class="cpp"><br />// Headers required by LLVM<br />#include <llvm-c/Core.h><br />#include <llvm-c/Analysis.h><br />#include <llvm-c/ExecutionEngine.h><br />#include <llvm-c/Target.h><br />#include <llvm-c/Transforms/Scalar.h><br /><br /><br />// General stuff<br />#include <stdlib.h><br />#include <stdio.h><br /><br /><br />int main (int argc, char const *argv[])<br />{<br /> char *error = NULL; // Used to retrieve messages from functions<br /> LLVMModuleRef mod = LLVMModuleCreateWithName("fac_module");<br /> LLVMTypeRef fac_args[] = { LLVMInt32Type() };<br /> LLVMValueRef fac = LLVMAddFunction(mod, "fac", LLVMFunctionType(LLVMInt32Type(), fac_args, 1, 0));<br /> LLVMSetFunctionCallConv(fac, LLVMCCallConv);<br /> 
LLVMValueRef n = LLVMGetParam(fac, 0);<br /><br /> LLVMBasicBlockRef entry = LLVMAppendBasicBlock(fac, "entry");<br /> LLVMBasicBlockRef iftrue = LLVMAppendBasicBlock(fac, "iftrue");<br /> LLVMBasicBlockRef iffalse = LLVMAppendBasicBlock(fac, "iffalse");<br /> LLVMBasicBlockRef end = LLVMAppendBasicBlock(fac, "end");<br /> LLVMBuilderRef builder = LLVMCreateBuilder();<br /><br /> LLVMPositionBuilderAtEnd(builder, entry);<br /> LLVMValueRef If = LLVMBuildICmp(builder, LLVMIntEQ, n, LLVMConstInt(LLVMInt32Type(), 0, 0), "n == 0");<br /> LLVMBuildCondBr(builder, If, iftrue, iffalse);<br /><br /> LLVMPositionBuilderAtEnd(builder, iftrue);<br /> LLVMValueRef res_iftrue = LLVMConstInt(LLVMInt32Type(), 1, 0);<br /> LLVMBuildBr(builder, end);<br /><br /> LLVMPositionBuilderAtEnd(builder, iffalse);<br /> LLVMValueRef n_minus = LLVMBuildSub(builder, n, LLVMConstInt(LLVMInt32Type(), 1, 0), "n - 1");<br /> LLVMValueRef call_fac_args[] = {n_minus};<br /> LLVMValueRef call_fac = LLVMBuildCall(builder, fac, call_fac_args, 1, "fac(n - 1)");<br /> LLVMValueRef res_iffalse = LLVMBuildMul(builder, n, call_fac, "n * fac(n - 1)");<br /> LLVMBuildBr(builder, end);<br /><br /> LLVMPositionBuilderAtEnd(builder, end);<br /> LLVMValueRef res = LLVMBuildPhi(builder, LLVMInt32Type(), "result");<br /> LLVMValueRef phi_vals[] = {res_iftrue, res_iffalse};<br /> LLVMBasicBlockRef phi_blocks[] = {iftrue, iffalse};<br /> LLVMAddIncoming(res, phi_vals, phi_blocks, 2);<br /> LLVMBuildRet(builder, res);<br /><br /> LLVMVerifyModule(mod, LLVMAbortProcessAction, &error);<br /> LLVMDisposeMessage(error); // Handler == LLVMAbortProcessAction -> No need to check errors<br /><br /><br /> LLVMExecutionEngineRef engine;<br /> LLVMModuleProviderRef provider = LLVMCreateModuleProviderForExistingModule(mod);<br /> error = NULL;<br /> LLVMCreateJITCompiler(&engine, provider, &error);<br /> if(error) {<br /> fprintf(stderr, "%s\n", error);<br /> LLVMDisposeMessage(error);<br /> abort();<br /> }<br /><br /> 
LLVMPassManagerRef pass = LLVMCreatePassManager();<br /> LLVMAddTargetData(LLVMGetExecutionEngineTargetData(engine), pass);<br /> LLVMAddConstantPropagationPass(pass);<br /> LLVMAddInstructionCombiningPass(pass);<br /> LLVMAddPromoteMemoryToRegisterPass(pass);<br /> // LLVMAddDemoteMemoryToRegisterPass(pass); // Demotes every possible value to memory<br /> LLVMAddGVNPass(pass);<br /> LLVMAddCFGSimplificationPass(pass);<br /> LLVMRunPassManager(pass, mod);<br /> LLVMDumpModule(mod);<br /><br /> LLVMGenericValueRef exec_args[] = {LLVMCreateGenericValueOfInt(LLVMInt32Type(), 10, 0)};<br /> LLVMGenericValueRef exec_res = LLVMRunFunction(engine, fac, 1, exec_args);<br /> fprintf(stderr, "\n");<br /> fprintf(stderr, "; Running fac(10) with JIT...\n");<br /> fprintf(stderr, "; Result: %d\n", LLVMGenericValueToInt(exec_res, 0));<br /><br /> LLVMDisposePassManager(pass);<br /> LLVMDisposeBuilder(builder);<br /> LLVMDisposeExecutionEngine(engine);<br /> return 0;<br />}<br /></pre><br /><span style="font-weight: bold;font-size:100%;" >Compiling the code</span><br />Generating the object file is a no-brainer:<br /><pre name="code" class="cpp" cols="440"><br />gcc `llvm-config --cflags` -c fac.c<br /></pre>Linking is a little trickier. Even though you are writing C code, you <a href="http://www.parashift.com/c++-faq-lite/mixing-c-and-cpp.html">have to use a C++ linker</a>.<br /><pre name="code" class="cpp"><br />g++ `llvm-config --libs --cflags --ldflags core analysis executionengine jit interpreter native` fac.o -o fac<br /></pre>All set!Yann N. Dauphinhttp://www.blogger.com/profile/04848263488684342076noreply@blogger.com17