Thursday, November 02, 2006

Anyone interested in GCC-CIL?

Is anyone out there interested in GCC-CIL? It might be possible to prod me into finishing it. If you're interested, send a note to <jey#kottalam!net> with the proper symbols substituted.

If you're wondering what happened to Summer of Code 2005: I broke my arm in August and couldn't spare any keystrokes to update the blog. In the end, I collected the $4500 (thanks Google!), had a lot of fun (thanks Miguel!), then quit college and got a job.

Sunday, August 07, 2005

Crunch Time

No, I'm not dead. I've just been a bit busy with my day job.

I'm buckling down into crunch mode now. TODO list for the immediate future:
  • Implement passing and returning large types by value
  • Generate pinvoke declarations for native code interop
    • Prerequisite: work out details of a magical mechanism to fake the types of the pinvoked functions' arguments
  • Fix/Implement initialized variables

Three weeks remaining.

Thursday, July 21, 2005

Types and linkers

The compiler is coming along well. Lately I've been hacking on aggregate types and the linker, along with the usual work on the CIL instruction generation.

Aggregate Types
For the first stage, I've just implemented aggregate types as chunks of memory of the appropriate size. That is, both char foo[28]; and struct X { int i; float f; char s[20]; }; are simply represented in the IL as an object of type "S28", or 28-byte type. I am planning to eventually emit the equivalent CIL types for each 'native' type. This would offer a number of advantages over the current technique: we could probably take advantage of CIL function overload resolution (currently I simply use the mangled names), and we could allow for proper interoperability with other CIL languages, for example by letting C structs and C++ classes be used from C#, VB, etc. I had originally planned to implement this from the start, but later realized that it would be difficult to get right given my poor understanding of GCC and the CIL type system, and decided that my time would be better spent getting everything working first.
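To make the size-only naming concrete, here is a minimal Python sketch. It is not code from GCC-CIL: it uses Python's ctypes module to lay out the two declarations above, and opaque_type_name is a hypothetical helper that derives an "S<size>" name from the byte size alone.

```python
import ctypes


class X(ctypes.Structure):
    # Mirrors: struct X { int i; float f; char s[20]; };
    _fields_ = [
        ("i", ctypes.c_int),
        ("f", ctypes.c_float),
        ("s", ctypes.c_char * 20),
    ]


# Mirrors: char foo[28];
Foo = ctypes.c_char * 28


def opaque_type_name(ctype):
    """Return a size-only type name such as 'S28' (hypothetical helper)."""
    return "S%d" % ctypes.sizeof(ctype)


print(opaque_type_name(Foo))
print(opaque_type_name(X))
```

On a platform where int and float are both 4 bytes, the two declarations get the same name. That is exactly the limitation described above: the field names and field types are lost, so nothing distinguishes one 28-byte aggregate from another.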

The Linker
Last week I wrote the first iteration of the linker in Python. The compiler emits "linker annotations" with information on types, methods, strings, and global variables, along with references to each of these. The linker itself is a fairly simple Python script which checks the references, removes duplicates, and rewrites its input. In the near future I plan to add support to the linker for automatically generating P/Invoke declarations for calls to native methods, and for doing some fixups on function calls where the prototype is not known ahead of time. Once the linker functionality stabilizes a bit, I think I'll reimplement it in C#.
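The two core linker steps, removing duplicates and checking references, can be sketched roughly as follows. The (kind, category, name) record shape is an assumption made up for this illustration. The real annotation format is not shown in this post, and the rewriting step is left out.

```python
def link(annotations):
    """Deduplicate definitions and find references that do not resolve.

    `annotations` is a list of (kind, category, name) tuples, where kind is
    "def" or "ref". This record shape is hypothetical.
    """
    defined = []   # unique definitions, in first-seen order
    seen = set()
    for kind, category, name in annotations:
        key = (category, name)
        if kind == "def" and key not in seen:
            seen.add(key)
            defined.append(key)

    unresolved = []  # referenced but never defined, each reported once
    reported = set()
    for kind, category, name in annotations:
        key = (category, name)
        if kind == "ref" and key not in seen and key not in reported:
            reported.add(key)
            unresolved.append(key)
    return defined, unresolved


annotations = [
    ("def", "type", "S28"),
    ("def", "method", "main"),
    ("def", "type", "S28"),     # duplicate from a second compilation unit
    ("ref", "method", "main"),
    ("ref", "method", "puts"),  # never defined in the input
]
defined, unresolved = link(annotations)
print(defined)
print(unresolved)
```

In this sketch, an unresolved method reference such as puts is the kind of entry for which a P/Invoke declaration would later need to be generated.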

Sunday, July 03, 2005

"OK, so pointers work more or less... I think?"

Today's title comes from the log message for revision 9. I implemented loads and stores through pointers.

I also restructured and cleaned up both the compiler code and the generated code, now that I have a better idea of things thanks to David Hanson's paper on lcc.NET.

I should get rid of those fixed size buffers sometime soon.

Wednesday, June 29, 2005

Progress so far

I began hacking on GCC-CIL shortly after the Summer of Code was announced. Right now enough has been implemented to compile some programs, but the backend is still far from complete. For example, here's a C program I submitted with my Google Summer of Code application that solves the N-Queens Problem for N=8 (standard chessboard), and the corresponding output from the June 14 snapshot of GCC-CIL at optimization levels 0 and 1: nqueens.c nqueens.0.s nqueens.1.s You'll need ilasm from Mono, .NET or DotGNU to assemble and run the program. It was extremely satisfying to see a "real" program actually working. :-)

Hello, World!

Hello, world. My name is Jeyasankar "Jey" Kottalam, and I'm working on a CIL backend for GCC for The Mono Project as part of Google's Summer of Code.

"What??"
GCC is a retargetable compiler (loosely, software that "translates" high-level computer programs into machine-level computer programs) developed by the GNU Project. GCC takes programs written in a variety of languages as input, and produces assembler output for any of dozens of target architectures. My task is to add support for generating code for the ECMA Common Language Infrastructure (CLI), more commonly known through the popular implementations of this standard, Microsoft's .NET and Ximian's Mono. The low-level IL used for expressing programs for the ECMA CLI is called "CIL", or the Common Intermediate Language. My modifications to GCC allow it to emit CIL.

This is somewhat unusual because the CLI is a stack architecture, and the target backend infrastructure in GCC is designed for register machines. My current approach is to generate CIL instructions directly from the optimized GIMPLE trees, which are one of the internal representations used by GCC. Traditionally, GCC expands these GIMPLE trees into RTL (Register Transfer Language) instructions, on which further optimization, register allocation, and code generation are performed. In other words, the current approach bypasses nearly all RTL-related portions of GCC. Some GCC hackers suggested that it may be possible to use RTL up to the register allocation stage, and emit CIL from the RTL instructions at that point... but that's for later.
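To illustrate why expression trees map naturally onto a stack architecture, here is a toy Python sketch. It is not GCC code and does not use real GIMPLE: the nested-tuple tree format and the emit function are hypothetical. A post-order walk pushes both operands and then emits the operator, so no register allocation is needed.

```python
# Binary operators mapped to the CIL arithmetic opcodes of the same meaning.
OPCODES = {"+": "add", "-": "sub", "*": "mul"}


def emit(node, out):
    """Append stack-machine instructions for `node` to `out` (post-order).

    A node is an int constant, a str naming a local variable, or an
    (operator, left, right) tuple. This tree shape is made up for the sketch.
    """
    if isinstance(node, int):
        out.append("ldc.i4 %d" % node)   # push a 32-bit integer constant
    elif isinstance(node, str):
        out.append("ldloc %s" % node)    # push the value of a local variable
    else:
        op, left, right = node
        emit(left, out)                  # operands are pushed first...
        emit(right, out)
        out.append(OPCODES[op])          # ...then the operator pops both
    return out


# a + b * 2
code = emit(("+", "a", ("*", "b", 2)), [])
print("\n".join(code))
```

The multiplication leaves its result on the stack, where the addition then finds it as its second operand.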

"OK, so what?"
This [theoretically] allows programmers to easily port existing code written in C, C++, Objective-C, FORTRAN, Ada, Pascal, D, or anything else that has a GCC front end to the ECMA CLI platform.