Some email for Ra, the R just-in-time compiler

Some Ra email           Ra version 1.0.7
-------------

These are fragments of representative email from various people with
my edited replies. It's a prototype FAQ.

If you are interested, there is more documentation, also informal, in
the NOTES.txt file included in the release tar file.


----------------------------------------------------------------

> what do you compile into and what architectures do you support?

The Ra web page has some details.  The code is currently compiled to
instructions for an interpreter written in C.  For example,
"x <- 12 + 34" becomes, assuming x is of length 1,

    push_r          operand double length 1 [1] 12
    push_r          operand double length 1 [1] 34
    add_r1_r1       result double length 1
    assign_r1_r1    operand symbol "x"

So all architectures supported by R are supported by the jitter.
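
To make the listing above concrete, here is a minimal sketch of a
stack-based executor for such an instruction stream.  The opcode names
mirror the listing; the struct layout, the fixed stack size, the HALT
opcode and the function names are assumptions made for illustration,
not Ra's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes modeled on the listing; not Ra's real encoding. */
typedef enum { PUSH_R, ADD_R1_R1, ASSIGN_R1, HALT } Opcode;

typedef struct {
    Opcode  op;
    double  operand;  /* used by PUSH_R only */
    double *target;   /* used by ASSIGN_R1 only: storage for the symbol */
} Instr;

/* Execute instructions until HALT; return how many were executed. */
int run(const Instr *prog)
{
    double stack[16];
    int sp = 0;       /* index of the next free stack slot */
    int n = 0;

    for (const Instr *ip = prog; ip->op != HALT; ip++, n++) {
        switch (ip->op) {
        case PUSH_R:                      /* push a length-1 double */
            assert(sp < 16);
            stack[sp++] = ip->operand;
            break;
        case ADD_R1_R1:                   /* pop two, push their sum */
            assert(sp >= 2);
            stack[sp - 2] += stack[sp - 1];
            sp--;
            break;
        case ASSIGN_R1:                   /* pop into the target symbol */
            assert(sp >= 1);
            *ip->target = stack[--sp];
            break;
        case HALT:                        /* not reached; loop stops first */
            break;
        }
    }
    return n;
}

/* Build and run the program for "x <- 12 + 34"; return the value of x. */
double demo_assign(void)
{
    double x = 0.0;
    const Instr prog[] = {
        { PUSH_R,    12.0, NULL },
        { PUSH_R,    34.0, NULL },
        { ADD_R1_R1,  0.0, NULL },
        { ASSIGN_R1,  0.0, &x   },
        { HALT,       0.0, NULL },
    };
    run(prog);
    return x;
}
```

Nothing here depends on the machine architecture, which is the point
of compiling to interpreter instructions rather than to native code.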

The current version of the jit interpreter is very slow but that can
be "easily" fixed and there are more important things to take care of
first.  There is a clean division between the jitted code _generator_
and the jitted code _executor_.  This means any kind of code executor
could be used, including native machine code, if someone has the time
to write that.  The next version of the interpreter will use threaded
code, but there will be a few more Ra releases before that happens.
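
As a rough illustration of the idea, one portable flavour of threaded
code stores the address of each instruction's handler in the
instruction itself, so the executor makes an indirect call instead of
decoding an opcode in a switch.  The sketch below assumes that design;
all the names (VM, TInstr, op_push and so on) are hypothetical, and the
scheme Ra eventually adopts may well differ.

```c
#include <assert.h>

typedef struct VM VM;
typedef struct TInstr TInstr;
typedef void (*Handler)(VM *vm, const TInstr *ins);

struct TInstr { Handler fn; double operand; };
struct VM     { double stack[16]; int sp; double x; int running; };

/* Each handler does the work of one instruction. */
void op_push(VM *vm, const TInstr *ins)
{
    assert(vm->sp < 16);
    vm->stack[vm->sp++] = ins->operand;
}

void op_add(VM *vm, const TInstr *ins)
{
    (void)ins;                            /* no operand needed */
    assert(vm->sp >= 2);
    vm->stack[vm->sp - 2] += vm->stack[vm->sp - 1];
    vm->sp--;
}

void op_assign_x(VM *vm, const TInstr *ins)
{
    (void)ins;
    assert(vm->sp >= 1);
    vm->x = vm->stack[--vm->sp];
}

void op_halt(VM *vm, const TInstr *ins)
{
    (void)ins;
    vm->running = 0;
}

/* The executor: no opcode decoding, just one indirect call per step. */
double run_threaded(const TInstr *prog)
{
    VM vm = { {0}, 0, 0.0, 1 };
    for (const TInstr *ip = prog; vm.running; ip++)
        ip->fn(&vm, ip);
    return vm.x;
}

/* "x <- 12 + 34" expressed as a sequence of handler addresses. */
double demo_threaded(void)
{
    const TInstr prog[] = {
        { op_push,     12.0 },
        { op_push,     34.0 },
        { op_add,       0.0 },
        { op_assign_x,  0.0 },
        { op_halt,      0.0 },
    };
    return run_threaded(prog);
}
```

Because only run_threaded() knows how instructions are dispatched, the
code generator would not need to change if the executor were swapped
for something faster.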


> But what is your real focus?  'Simple' loops?  Doesn't vectorisation
> always win?  And are you restricted in terms of data types?

The principal objective is to make arithmetic faster.  Within that,
the project is open ended.

Vectorization is often the way to go but there are still lots of
things in packages that are done in C or Fortran that would have been
done in R if R were faster.  We know that ideally you shouldn't have to
leave R.  The jitter takes us closer to the ideal.

Only a restricted set of data types is currently supported: logical,
integer, and double vectors.  But that covers a lot of ground for the
type of code where you say "I wish this could be faster in R so I
don't have to write it in C".

There is a compelling reason why the jitter should not be too
comprehensive: maintenance.  Code that compiles a limited subset of
R is robust against changes from version to version of R, especially
since the subset the jitter compiles (arithmetic expressions and
loops) is a part of R that seldom (never?) changes.


> writing a good JIT requires a lot more work, e.g.
> you need to know which functions have side effects and which don't so
> it's quite a big task (e.g. tagging all functions in base etc.). Do
> you have any plans for those issues? Do you support method dispatch? I
> think the idea of JIT for R is quite intriguing, but gave up on the
> complexity - it works fine for test examples, but real-life problems
> are quite hard...

The jitter works by keeping a trace of what eval() does and thus
automatically sidesteps a lot of the complexity you mention.

For example, if someone redefines sin() for whatever weird reason,
eval() will call the new sin and the jitter will thus generate code
for the new sin.  Easy peasy.  But not easy with static analysis.

Likewise with side effects -- the side effects get jitted just like
anything else evaluated by eval().
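
A toy model of the tracing idea, to show why a redefinition costs
nothing extra: the evaluator resolves the name at call time and the
trace simply records whatever function was actually called.  Everything
in this sketch (Binding, TraceStep, eval_call, the stand-in functions)
is hypothetical and far simpler than R's eval().

```c
#include <assert.h>
#include <string.h>

typedef double (*Fn)(double);

/* A tiny "environment": names bound to functions. */
typedef struct { const char *name; Fn fn; } Binding;

/* One recorded step: the function the evaluator really called. */
typedef struct { Fn fn; } TraceStep;

/* Toy eval: look the name up now, record what was found, call it. */
double eval_call(const Binding *env, int nenv, const char *name,
                 double arg, TraceStep *trace)
{
    for (int i = 0; i < nenv; i++) {
        if (strcmp(env[i].name, name) == 0) {
            trace->fn = env[i].fn;        /* this is the "jitted" step */
            return env[i].fn(arg);
        }
    }
    assert(!"unbound name");
    return 0.0;
}

/* Replaying the trace needs no lookup and no knowledge of the name. */
double replay(const TraceStep *trace, double arg)
{
    return trace->fn(arg);
}

/* Stand-ins: a placeholder for the base sin and a user redefinition. */
double base_sin_stub(double x) { return x; }
double user_sin(double x)      { return 2.0 * x; }

/* Trace one call of "sin" under the given binding, then replay it. */
double trace_and_replay(Fn binding_of_sin, double arg)
{
    Binding env[] = { { "sin", binding_of_sin } };
    TraceStep t;
    eval_call(env, 1, "sin", 1.0, &t);
    return replay(&t, arg);
}
```

A static analyser would have to prove which sin is in scope; the trace
just observes it.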

I've taken the tack of only compiling arithmetic code in R loops.
That simplifies the implementation immensely and will give very useful
results until we consider something more ambitious.

Here are some line counts:

$ wc -l ../main/jit.c ../main/jitops.c ../main/jitlabels.c \
          ../include/jit.h ../library/ra/R/jit.R \
          ../../ra/tests/test-jit.R

  2482 ../main/jit.c
  1598 ../main/jitops.c
   110 ../main/jitlabels.c
   237 ../include/jit.h
    23 ../library/ra/R/jit.R
  3313 ../../ra/tests/test-jit.R  # testing code
  7763 total

In comparison, Python's Psyco has 32000 lines of .c and .h, and 9000
lines of .p*.


> Unfortunately your example below doesn't say anything - what do you
> mean by 30% - for what na,nb, mode of a/b etc.?

The convolution code is the example in the R extensions document i.e.
a, b and ab are double vectors.  The 30% is for any na and nb above
about 100.  I was intentionally vague in my initial email to r-devel
because the time figure is really just a transient figure for the
current state of the jitter.  I wanted to pique people's interest
without a lengthy email.  I've subsequently added a timing chart to
the Ra web page.
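
For readers without the manual to hand, the C form of that convolution
example is essentially the double loop below.  This is a sketch along
the lines of the manual's .C interface version and its exact text may
differ; the assert is an added guard, not part of the manual's code.
It is shown only to make clear what na, nb, a, b and ab refer to.

```c
#include <assert.h>

/* Convolve a (length *na) with b (length *nb) into ab,
   which must have room for *na + *nb - 1 doubles. */
void convolve(double *a, int *na, double *b, int *nb, double *ab)
{
    assert(*na > 0 && *nb > 0);

    int nab = *na + *nb - 1;

    for (int i = 0; i < nab; i++)
        ab[i] = 0.0;

    for (int i = 0; i < *na; i++)
        for (int j = 0; j < *nb; j++)
            ab[i + j] += a[i] * b[j];
}
```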


> Interesting work -- how does it compare to the in-progress
> stuff by Luke ?

There is some documentation about the relationship of Luke's stuff to
the jitter on the Ra web page.  Briefly, the two projects approach
the speed problem from two different ends.  The fastest results would
combine Luke's compiler and the jitter.


> I wonder what results you get on Luke's examples.

There are now some timing results on the Ra webpage.

We can expect that jitted code will be faster than code compiled by
Luke's compiler.  Jitted code simply has a lot less to do than code
that has to call findVar(), allocate memory, check types at run time,
etc.  But once again, remember that only arithmetic code in loops gets
jitted.  Luke's compiler is much more comprehensive.
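
The difference in work can be pictured with a toy comparison.  The
generic routine below checks types and allocates a result on every
call; the specialized one assumes that types and lengths were settled
when the code was generated.  The Value type and generic_add() are
hypothetical, and the body given to add_r1_r1() is only a guess at what
an instruction of that name would do.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { T_INT, T_DOUBLE } Tag;

typedef struct {
    Tag tag;
    int length;
    union { int i; double d; } v;
} Value;

/* Generic path: run-time type checks plus an allocation per call.
   Returns NULL if memory cannot be allocated; caller frees the result. */
Value *generic_add(const Value *a, const Value *b)
{
    assert(a->length == 1 && b->length == 1);

    Value *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->length = 1;

    if (a->tag == T_INT && b->tag == T_INT) {
        r->tag = T_INT;
        r->v.i = a->v.i + b->v.i;
    } else {
        /* promote any integer operand to double */
        double x = (a->tag == T_INT) ? (double)a->v.i : a->v.d;
        double y = (b->tag == T_INT) ? (double)b->v.i : b->v.d;
        r->tag = T_DOUBLE;
        r->v.d = x + y;
    }
    return r;
}

/* Specialized path: both operands already known to be length-1 doubles. */
double add_r1_r1(double a, double b)
{
    return a + b;
}
```

Both give the same answer for two doubles; the specialized version
simply skips everything that was already decided ahead of time.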


> Nice site, esp the reading list is impressive and I like the fact that
> you mention my favourite tool RCpp by Dominick. Do you use it?  Earth
> is all in C, and it appears jit is the
> same. No C++ love at your end? ;-)

I haven't needed to use RCpp yet but I looked at it when I was
considering different approaches to speed up R.

The language I usually use is C++ (although I have no special love for
the language) but I wanted the jitting C code to be compilable with no
new compiler flags in the standard R makefiles.


> ... and the C++ style comments ...

My use of // comments is non-standard for R's C code but supported by
the gcc compiler with no new flags in the R makefiles.  I use /* */ in
existing files i.e. I respect the coding style of the original author.
In fact my general approach is "minimum force" when I make changes to
existing code.

There is an apparent contradiction in the R docs (but maybe not):

R extensions section 1.7 says: R is still used on platforms where the C
compiler does not accept C++/C99 comments (starting //).

R internals section 4 says: The following tools can "safely be
assumed" for R extensions.  An ISO C99 C compiler...  Packages will be
more portable if written assuming only C89, but this should not be
done where using C99 features will make for cleaner or more robust
code.

BTW gnuwin32/malloc.c has some // comments but I think they may be in code
that is #if'ed out.


> Many years ago (10+ at least) there was an R/S utility around called
> Scompile by which one could take a whole function and compile it. I've
> attached as a zip file; just change the extension. Maybe you can use
> that for something in your work?

Thanks for the code; I will take a close look at it.  Scompile is
quite limited as far as I remember.



www.milbo.users.sonic.net
4 Feb 2008