The pedestrian technology underneath the hype


We argue that, underneath all the large language model hype, there is a pedestrian technology with the potential to automate a class of problems that classical software can never solve.

To study the capabilities of large language models for writing code, we look at two extreme cases: first, a critical project with high cognitive complexity, consisting of 40 million lines of code; and second, a simple from-scratch toy project.

For the first study, I tried incorporating the technology into everyday contributions to LLVM. Over 95% of the suggestions get rejected; the good ones appear when you're editing one instance of a mechanical pattern and want to change all instances.

For example, consider the following transformation:

    if (match(I, m_c_Xor(
                     m_CombineOr(m_ZextOrTrunc(m_Specific(P1)), m_Specific(P1)),
                     m_CombineOr(m_ZextOrTrunc(m_Specific(P2)), m_Specific(P2)))))

to:

    if (match(I, m_c_Xor(m_CastOrSelf(m_Specific(P1)),
                         m_CastOrSelf(m_Specific(P2)))))

This can, in theory, be done by a find-and-replace with a regex, but nobody ever does that for a couple of instances.
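As a rough sketch of what that regex would look like, here is a small Python script that rewrites the pattern in a source string. It assumes both arms of each m_CombineOr name the same value and that the wrapped matcher is a single-level call such as m_Specific(P1); the function name rewrite_cast_or_self is made up for this illustration.

```python
import re

# Matches m_CombineOr(m_ZextOrTrunc(X), X), where both X are the same
# single-level matcher call such as m_Specific(P1). Nested parentheses
# inside X are deliberately not handled (assumption of this sketch).
PATTERN = re.compile(
    r"m_CombineOr\(\s*m_ZextOrTrunc\((\w+\(\w+\))\),\s*\1\s*\)"
)


def rewrite_cast_or_self(source: str) -> str:
    """Replace each matched m_CombineOr(...) with m_CastOrSelf(X)."""
    return PATTERN.sub(r"m_CastOrSelf(\1)", source)


before = (
    "if (match(I, m_c_Xor("
    "m_CombineOr(m_ZextOrTrunc(m_Specific(P1)), m_Specific(P1)), "
    "m_CombineOr(m_ZextOrTrunc(m_Specific(P2)), m_Specific(P2)))))"
)
after = rewrite_cast_or_self(before)
print(after)
```

The backreference \1 is what keeps the rewrite safe: an m_CombineOr whose two arms name different values is left alone. Getting that detail right is the kind of care that makes a hand-written regex not worth it for two or three call sites.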

As another example, consider updating argument names and corresponding comments like:

    /// Checks that \p L and \p R are used together in an XOR in the use-def chain
    /// of \p SI's condition, ignoring any casts. The purpose of this function is to
    /// ensure that LHSAux from the SimpleRecurrence is used correctly in the CRC
    /// computation. We cannot check the correctness of casts at this point, and
    /// rely on the KnownBits propagation to check correctness of the CRC
    /// computation.
    static bool isConditionalOnXorOfPHIs(const SelectInst *SI, const PHINode *L,
                                         const PHINode *R, const Loop &Lp);

to:

    /// Checks that \p P1 and \p P2 are used together in an XOR in the use-def chain
    /// of \p SI's condition, ignoring any casts. The purpose of this function is to
    /// ensure that LHSAux from the SimpleRecurrence is used correctly in the CRC
    /// computation. We cannot check the correctness of casts at this point, and
    /// rely on the KnownBits propagation to check correctness of the CRC
    /// computation.
    static bool isConditionalOnXorOfPHIs(const SelectInst *SI, const PHINode *P1,
                                         const PHINode *P2, const Loop &L);

A find-and-replace would also match the L in Lp, the name of the Loop argument, and erroneously replace it with P1.
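To make the failure concrete, here is a small Python sketch run on the declaration line only. The helper names naive_rename and ordered_rename are hypothetical. The second helper shows that a word-boundary regex can handle this one instance, but only if someone writes the patterns and picks the order by hand for each rename.

```python
import re

signature = (
    "static bool isConditionalOnXorOfPHIs(const SelectInst *SI, "
    "const PHINode *L, const PHINode *R, const Loop &Lp);"
)


def naive_rename(text: str) -> str:
    """Plain textual replacement: rewrites every capital L it finds."""
    return text.replace("L", "P1")


def ordered_rename(text: str, renames: list[tuple[str, str]]) -> str:
    """Apply whole-word renames one after another, in the given order."""
    for old, new in renames:
        text = re.sub(rf"\b{re.escape(old)}\b", new, text)
    return text


# The naive version also rewrites the L in "Loop" and in "Lp".
naive = naive_rename(signature)

# Whole-word renames work here, but only in this order.
good = ordered_rename(signature, [("L", "P1"), ("R", "P2"), ("Lp", "L")])

# Renaming Lp to L first means the next step turns that L into P1 as well.
bad_order = ordered_rename(signature, [("Lp", "L"), ("L", "P1"), ("R", "P2")])

print(naive)
print(good)
print(bad_order)
```

The naive output contains "P1oop" and "P1p", and the wrongly ordered run ends up with two parameters named P1.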

You could say that all the automation in this study boils down to a smart, automatic find-and-replace, but it is important to note that this class of problems can never be automated by classical software.


For the second study, I tried applying the technology to writing a tree-sitter-based LLVM IR parser. The entire task is a mechanical chore of reading docs and ample examples in the test suite, and encoding the knowledge in the parser. To study the limitations, I tried generating the entire parser using a prompt, but the result was entirely unusable. When writing the parser from a clean slate, however, the suggestions were actually quite good. Consider auto-completing the following snippet:

    binary_instruction: ($) =>

to:

    binary_instruction: ($) =>
      seq($._binary_op, $._type, $.ssa_value, $.ssa_value),
    _binary_op: () =>
      token(
        choice(
          "add", "sub", "mul", "div", "mod",
          "shl", "shr", "and", "or", "xor",
        ),
      ),
    call_instruction: ($) =>
      seq(
        optional("tail"),
        "call",
        $._type,
        $.atname,
        "(",
        optional($.fn_args),
        ")",
      ),
    compare_instruction: ($) =>
      seq($._compare_op, $._type, $.ssa_value, $.ssa_value),
    _compare_op: () =>
      token(
        choice(
          "eq", "ne", "slt", "sgt", "sle",
          "sge", "ult", "ugt", "ule", "uge",
        ),
      ),
    cast_instruction: ($) =>
      seq($._cast_op, $._type, $.ssa_value, optional(seq("to", $._type))),
    _cast_op: () =>
      token(
        choice(
          "trunc", "zext", "sext", "fptrunc", "fpext", "fp_to_ui",
          "fp_to_si", "uitofp", "sitofp", "ptrtoint", "inttoptr", "bitcast",
        ),
      ),
    extract_instruction: ($) =>
      seq(token("extractvalue"), $._type, $.ssa_value, "[", $.ssa_value, "]"),
    insert_instruction: ($) =>
      seq(
        token("insertvalue"),
        $._type,
        $.ssa_value,
        $.ssa_value,
        "[",
        $.ssa_value,
        "]",
      ),
    load_instruction: ($) => seq(token("load"), $._type, $.ssa_value),
    phi_instruction: ($) => seq(token("phi"), $._type, $.ssa_value, $.ssa_value),
    store_instruction: ($) =>
      seq(token("store"), $._type, $.ssa_value, $.ssa_value),
    switch_instruction: ($) =>
      seq(token("switch"), $._type, $.ssa_value, $.ssa_value),
    branch_instruction: ($) => seq(token("br"), $._type, $.ssa_value),
    return_instruction: ($) => seq(token("ret"), $._type, $.ssa_value),
    unreachable_instruction: ($) => token("unreachable"),

which is actually a smarter version of a copy-paste with some knowledge from the internet. Of course the code needs editing, but this is the kind of copy-pasting you'd normally do before making edits.

The technology was also able to generate 300 tests to exercise the parser, and the tests needed little tweaking to pass: this task boils down to copying reduced-case LLVM IR from the test suite and matching it with parser nodes. Of course the generated tests aren't high-quality, but about half of them are usable as-is, and a lot of tedium has been eliminated.
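For a sense of how mechanical each test is, here is a minimal Python sketch that formats one entry, assuming the tests use tree-sitter's plain-text corpus layout: a name between two rules of equals signs, the input, a line of dashes, then the expected tree. The helper name corpus_entry and the expected tree are placeholders; the real node names depend on the grammar.

```python
def corpus_entry(name: str, source: str, expected_tree: str) -> str:
    """Format one test case in tree-sitter's corpus layout."""
    rule = "=" * 18
    return (
        f"{rule}\n{name}\n{rule}\n\n"
        f"{source.strip()}\n\n"
        "---\n\n"
        f"{expected_tree.strip()}\n"
    )


# The IR line is ordinary LLVM IR; the S-expression is a hypothetical
# tree, not the output of the grammar discussed in the text.
entry = corpus_entry(
    "add instruction",
    "%sum = add i32 %a, %b",
    "(module (binary_instruction (ssa_value) (ssa_value)))",
)
print(entry)
```

Writing hundreds of these by hand means pairing an IR snippet with the tree it should produce, over and over, which is the tedium being automated.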

To conclude, the technology is at the alpha stage: the automation comes at the cost of putting up with bad visual feedback and fighting the auto-complete nearly all the time. Still, it promises a new class of mechanical automation, whose utility is higher in small codebases that follow mechanical patterns.
