Monday, March 1, 2010

Source code as data

The von Neumann architecture opened the era of modern programming, in which program instructions became data. Later, programming languages, compilers and source code were invented to help programmers cope with the growing size and complexity of programs. Today, 65 years later, people manage terabytes of data and literally build clouds out of computers, yet still struggle with programs of millions of lines of code, far beyond any one person's comprehension.

In the programming world, source code is treated like sacred knowledge: stored in text files, written in programming languages and understandable mostly by its authors. Most difficulties in code comprehension and research stem from the fact that code is not treated as data (data as in a database). Every programming language, technology or framework has its own syntax and program structure. Source code search is implemented with generic text search engines, which can search only by terms such as the names of classes or methods. It is impossible to formulate a precise query that combines names, program structure and references in a generic way. For example, a generic text search engine cannot list all functions called, directly or indirectly, from a given function, or find recursively called functions.
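
To make the contrast concrete, here is a minimal sketch of the kind of structural query a text search cannot answer. It assumes Python source and uses the standard `ast` module. The sample program and the helper names `build_call_graph` and `reachable` are invented for illustration, and only plain `name(...)` calls between top-level functions are tracked.

```python
import ast

# Hypothetical sample program, used only to have something to query.
SOURCE = '''
def parse(text):
    return tokenize(text)

def tokenize(text):
    return text.split()

def walk(node):
    for child in node:
        walk(child)

def main():
    walk(parse("a b"))
'''

def build_call_graph(source):
    """Map each top-level function name to the names it calls directly."""
    graph = {}
    for func in ast.parse(source).body:
        if isinstance(func, ast.FunctionDef):
            callees = graph.setdefault(func.name, set())
            for node in ast.walk(func):
                # Only simple calls like f(x); method calls such as
                # text.split() are ignored in this sketch.
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    callees.add(node.func.id)
    return graph

def reachable(graph, start):
    """All functions called from `start`, directly or transitively."""
    seen, stack = set(), [start]
    while stack:
        for callee in graph.get(stack.pop(), ()):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen

graph = build_call_graph(SOURCE)
calls_from_main = reachable(graph, "main")
# A function is recursive if it can reach itself through the call graph.
recursive = {name for name in graph if name in reachable(graph, name)}
```

Here `calls_from_main` contains `parse`, `tokenize` and `walk`, and `recursive` contains only `walk`. Neither answer can be obtained by matching terms in the text, because both depend on the references between functions.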

It is clear that the exponential growth of program size can be managed only by applying the principles of structured data organization to source code. Program artifacts must be indexed and stored in a database in a generic, technology-independent form. Such a program database must allow artifacts to be queried and modified transactionally. A generic form of artifacts would also allow algorithms and business logic to be transferred between programming languages and technologies.
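
As a rough sketch of what such a program database might look like, the following uses Python's built-in `sqlite3` module. The two-table schema (`artifact`, `calls`), the sample rows and the helper `callees_of` are assumptions made for illustration, not a proposed standard. A real system would need a much richer model of artifacts.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Hypothetical, language-independent schema: named artifacts
    -- plus the call references between them.
    CREATE TABLE artifact (id INTEGER PRIMARY KEY, kind TEXT, name TEXT UNIQUE);
    CREATE TABLE calls (caller INTEGER REFERENCES artifact(id),
                        callee INTEGER REFERENCES artifact(id));
""")

with db:  # one transaction: committed on success, rolled back on error
    db.executemany(
        "INSERT INTO artifact (id, kind, name) VALUES (?, ?, ?)",
        [(1, "function", "main"), (2, "function", "parse"),
         (3, "function", "tokenize"), (4, "function", "walk")])
    db.executemany(
        "INSERT INTO calls (caller, callee) VALUES (?, ?)",
        [(1, 2), (1, 4), (2, 3), (4, 4)])

# Recursive query: everything called from a given function, directly
# or transitively. UNION drops duplicates, so call cycles terminate.
TRANSITIVE_CALLEES = """
    WITH RECURSIVE reach(id) AS (
        SELECT callee FROM calls
        WHERE caller = (SELECT id FROM artifact WHERE name = ?)
        UNION
        SELECT calls.callee FROM calls JOIN reach ON calls.caller = reach.id
    )
    SELECT name FROM artifact WHERE id IN (SELECT id FROM reach) ORDER BY name
"""

def callees_of(name):
    return [row[0] for row in db.execute(TRANSITIVE_CALLEES, (name,))]

before = callees_of("main")

# Modify an artifact transactionally: rename a function in one place.
# References use ids, so no caller has to be edited.
with db:
    db.execute("UPDATE artifact SET name = ? WHERE name = ?", ("lex", "tokenize"))

after = callees_of("main")
```

With these sample rows, `before` lists `parse`, `tokenize` and `walk`, and `after` lists `lex`, `parse` and `walk`. Because references are stored by id rather than by name, the rename is a single-row update instead of a search-and-replace across text files.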
