<?xml version="1.0" encoding='ISO-8859-1'?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">

<article id="tree-hacking">
<artheader>
	<title>The A,B of GCC tree hacking</title>

	<authorgroup>
		<author>
			<firstname>John</firstname>
			<surname>Levon</surname>
			<affiliation>
				<address><email>levon@movementarian.org</email></address>
			</affiliation>
		</author>
	</authorgroup>

	<copyright>
		<year>2003</year>
		<holder>John Levon</holder>
	</copyright>

	<legalnotice>
	<para>This document can be freely translated and distributed. It's released
	under the LDP License.</para>
	</legalnotice>

	<keywordset>
		<keyword>GCC</keyword>
		<keyword>tree</keyword>
		<keyword>compiler</keyword>
	</keywordset>
</artheader>

<chapter id="introduction"><title>Introduction</title> 
 
<para>
I made a rather lame and simple patch to GCC 3.4 CVS to detect always-false
expressions, as in <computeroutput>if (!ua &amp; 0x4)</computeroutput>. Whilst
the change was relatively simple, even for someone as clueless as I am, I found
a couple of things quite awkward and not well documented, so I wrote this brief note.
</para>
</chapter>

<chapter id="building"><title>Getting and building the source</title>

<para>
This is obviously the first step. You want to build current CVS and install it, so that you have
a baseline to compare against when something goes wrong. Follow the instructions for
<ulink url="http://gcc.gnu.org/cvs.html">getting the source from CVS</ulink> and build it as
<ulink url="http://gcc.gnu.org/install/">described</ulink>. In my case, I did:
</para>
<para>
<programlisting>
cvs co -r gcc-3_3-branch -d gcc-3.3 gcc
mkdir gcc-3.3/obj &amp;&amp; cd gcc-3.3/obj
../configure --prefix=/usr/local/gcc-3.3cvs
make bootstrap
make install
cd ../ &amp;&amp; rm -rf obj &amp;&amp; mkdir obj &amp;&amp; cd obj
../configure --prefix=/usr/local/gcc-3.3mine
</programlisting>
</para>

<para>
Note that I was working on the 3.3 branch, because
at the time of writing 3.4 is unusable.
Now I have an untouched GCC installed in <filename>/usr/local/gcc-3.3cvs</filename>, and I'm free
to start hacking. This was a stupid way of doing it, though: see the comment about
<command>make restrap</command> below.
</para>

</chapter>

<chapter id="frontend"><title>Hacking the frontend code</title>

<para>
OK, so here I was, with no idea where to start looking, so I just nosed around.
The <filename>gcc</filename> directory contains most of the "interesting" code for what I needed.
<filename>gcc/cp</filename> is the C++ frontend, which shares quite a lot of code with
the C frontend, which lives directly in <filename>gcc</filename>.
</para>
<para>
The first thing to remember is that after making a change, you should rebuild with <command>make restrap</command>.
That way you won't have to bootstrap all over again.
</para>
<para>
Remember the case I wanted to fix was <computeroutput>if (!ua &amp; 0x4)</computeroutput>
not giving a warning. Because <computeroutput>!</computeroutput> binds more tightly than
<computeroutput>&amp;</computeroutput>, this parses as <computeroutput>(!ua) &amp; 0x4</computeroutput>;
<computeroutput>!ua</computeroutput> is only ever 0 or 1, so the result is always false. The author undoubtedly meant
<computeroutput>if (!(ua &amp; 0x4))</computeroutput>. So, I guessed that the code related to
binary operations might be relevant. I chose an existing warning about binary operations (namely,
"comparison between pointer and integer") and grepped for it. I found it inside the
function <function>build_binary_op()</function> in <filename>c-typeck.c</filename>.
</para>
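<para>
To make the always-false claim concrete, here is a minimal sketch in plain C that puts the
two spellings side by side. The function names <function>as_written()</function> and
<function>as_intended()</function> are made up for illustration; nothing here comes from GCC itself.
</para>
<para>
<programlisting><![CDATA[
```c
/* What was written: '!' binds tighter than '&', so this is (!ua) & 0x4.
   !ua is either 0 or 1, and neither value has bit 2 set, so the
   result is 0 for every possible ua. */
int as_written(unsigned ua)
{
    return !ua & 0x4;
}

/* What was almost certainly meant: "is bit 2 of ua clear?" */
int as_intended(unsigned ua)
{
    return !(ua & 0x4);
}
```
]]></programlisting>
</para>
<para>
Calling <function>as_written()</function> with any argument at all gives 0, whereas
<function>as_intended(4)</function> gives 0 and <function>as_intended(3)</function> gives 1.
</para>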
<para>
This function referred to two "codes". A bit of grepping and reading code showed me that I needed
to deal with the "tree" concept as described (all too briefly) in the GCC internals manual.
A tree describes (part of) an expression, a type, or a value. Examples are the literal constant 0,
<constant>COND_EXPR</constant> (<computeroutput>a ? b : c</computeroutput>), and the integer type. The header file <filename>tree.h</filename>
contains a lot of tree-related functions and definitions, and there are some useful comments there
too.
</para>
<para>
You can get the code of a tree with the helper <function>TREE_CODE(tree)</function>. This returns an enum value, one
of a fixed set of codes. Expressions have codes like <constant>LT_EXPR</constant>
for a binary &lt; expression. I soon found out that constants are represented similarly,
with codes like <constant>INTEGER_CST</constant>.
</para>
<para>
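Every tree (or sub-tree) also has a <emphasis>type</emphasis>, as distinct from its code.
The type is itself a tree, which you get with <function>TREE_TYPE(tree)</function>; to find out
what kind of type it is, use
<function>TREE_CODE(TREE_TYPE(tree))</function>. For example, a <constant>LT_EXPR</constant>
tree will have a type of <constant>BOOLEAN_TYPE</constant> (in the C++ frontend, at least, where
a comparison yields a bool).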
</para>
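<para>
The difference between a node's code and its type is easier to see in a toy model. What follows
is a sketch under my own assumptions, not GCC's real data structure: every
<computeroutput>toy_</computeroutput> and <computeroutput>TOY_</computeroutput> name is invented,
and the real definitions in <filename>tree.h</filename> are far richer.
</para>
<para>
<programlisting><![CDATA[
```c
#include <stddef.h>

/* Invented codes, loosely mirroring a few of GCC's. */
enum toy_code {
    TOY_INTEGER_TYPE, TOY_BOOLEAN_TYPE,  /* codes used by type nodes */
    TOY_INTEGER_CST, TOY_LT_EXPR         /* codes used by value/expression nodes */
};

struct toy_node {
    enum toy_code code;     /* what kind of node this is */
    struct toy_node *type;  /* the node's type, itself a node (NULL on type nodes here) */
};

/* Toy counterparts of TREE_CODE() and TREE_TYPE(). */
#define TOY_CODE(n) ((n)->code)
#define TOY_TYPE(n) ((n)->type)

/* Nonzero if the node's *type* is boolean, whatever the node's own code is. */
int toy_has_boolean_type(const struct toy_node *n)
{
    return TOY_TYPE(n) != NULL && TOY_CODE(TOY_TYPE(n)) == TOY_BOOLEAN_TYPE;
}
```
]]></programlisting>
</para>
<para>
A comparison node built with code <constant>TOY_LT_EXPR</constant> and a type pointer to a
<constant>TOY_BOOLEAN_TYPE</constant> node answers <constant>TOY_LT_EXPR</constant> to
<function>TOY_CODE()</function>, but <constant>TOY_BOOLEAN_TYPE</constant> to
<function>TOY_CODE(TOY_TYPE())</function>.
</para>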
<para>
There are undoubtedly many caveats to understanding how trees work, and I've only looked at a tiny subset.
But here's one: <constant>NOP_EXPR</constant>. I was encountering this expression in
the C++ frontend in the code <computeroutput>bool b; if (!b &amp; 0x4) ..</computeroutput>.
The tree looked a bit like <computeroutput>&lt;bit_and_expr&lt;nop_expr&lt;truth_not_expr&lt;var_decl(bool)&gt;&gt; &amp; &lt;integer_cst(0x4)&gt;&gt;&gt;</computeroutput>.
What's going on here is that the "!b" is getting wrapped inside a <constant>NOP_EXPR</constant>,
which essentially means "no conversion code is needed" (because a boolean value can become
an integer one, as required by the bitwise and, without anything special needing to be done).
As it turned out, this case needed special-casing in my eventual patch.
</para>
<para>
The "tree" type is just a pointer, and every tree is garbage-collected. Some helpers
like <function>STRIP_NOPS</function> (which looks below any <constant>NOP_EXPR</constant>s
in the given tree) merely move the pointer to a lower level in the tree structure. The actual
tree is a simple "C object orientation" struct, where each sub-class shares a common prefix
of fields, and its class is determined by the tree code. I was finding all this out in a very
laborious manner until a developer on IRC pointed me to <function>debug_tree()</function>,
an extremely handy function that prints out a text representation of a tree. You can use
this with breakpoints in GDB to see exactly what is happening, simply by typing
<computeroutput>call debug_tree(op1)</computeroutput> or whatever. (Remember, the "gcc" executable
is just a driver; you can run the actual compilers under GDB directly. They are called
<command>cc1</command> and <command>cc1plus</command>, and reside inside <filename>prefix/lib/gcc-lib/platform/version/</filename>.)
</para>
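<para>
The "common prefix" trick and the pointer-stepping behaviour of <function>STRIP_NOPS</function>
can be sketched in a few lines of standalone C. This is only an illustration under my own
assumptions: all the <computeroutput>toy_</computeroutput> names are invented, the real
<function>STRIP_NOPS</function> is a macro that updates its argument in place and has further
conditions, and garbage collection is not modelled at all.
</para>
<para>
<programlisting><![CDATA[
```c
#include <stddef.h>

/* Invented codes standing in for a handful of GCC's. */
enum toy_code { TOY_INTEGER_CST, TOY_VAR_DECL, TOY_NOP_EXPR, TOY_TRUTH_NOT_EXPR };

/* The prefix every node "sub-class" starts with. */
struct toy_common {
    enum toy_code code;  /* selects which sub-class this node really is */
};

/* "Sub-class" for integer constants: prefix first, then its own fields. */
struct toy_int_cst {
    struct toy_common common;
    long value;
};

/* "Sub-class" for one-operand expressions such as a NOP_EXPR. */
struct toy_unary {
    struct toy_common common;
    struct toy_common *operand;
};

/* Like "tree", the handle that gets passed around is just a pointer. */
typedef struct toy_common *toy_tree;

/* Rough analogue of STRIP_NOPS: walk the pointer down past any NOP
   wrappers.  The nodes themselves are left untouched. */
toy_tree toy_strip_nops(toy_tree t)
{
    while (t != NULL && t->code == TOY_NOP_EXPR)
        t = ((struct toy_unary *)t)->operand;  /* valid: the code says it is a toy_unary */
    return t;
}
```
]]></programlisting>
</para>
<para>
Because the prefix is the first member of each struct, a pointer to any node can be treated as a
<computeroutput>toy_tree</computeroutput>, and the code tells you which struct to cast back to.
</para>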
<para>
I found the helper <function>integer_onep()</function> (note the LISPy heritage ...), which
tells me whether a tree is the integer constant 1. Along with <function>truth_value_p()</function>
and the check for <constant>NOP_EXPR</constant> mentioned above, my patch was good to go!
</para>
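<para>
To give a feel for the shape of such a check, here is a sketch on a toy tree. It is
<emphasis>not</emphasis> the patch itself: the node layout and every
<computeroutput>toy_</computeroutput> name are invented, and the rule used (a 0-or-1 operand
and-ed with a constant whose lowest bit is clear) is simply one plausible way to decide "always false".
</para>
<para>
<programlisting><![CDATA[
```c
#include <stddef.h>

enum toy_code { TOY_INTEGER_CST, TOY_VAR_DECL, TOY_NOP_EXPR,
                TOY_TRUTH_NOT_EXPR, TOY_BIT_AND_EXPR };

struct toy_node {
    enum toy_code code;
    struct toy_node *op0, *op1;  /* operands; unused ones are NULL */
    unsigned long value;         /* only meaningful for TOY_INTEGER_CST */
};

/* Toy analogue of truth_value_p(): can this code only yield 0 or 1? */
static int toy_truth_value_p(enum toy_code code)
{
    return code == TOY_TRUTH_NOT_EXPR;
}

/* Nonzero if the BIT_AND expression can never be nonzero. */
int toy_and_always_false(const struct toy_node *e)
{
    const struct toy_node *lhs, *rhs;

    if (e == NULL || e->code != TOY_BIT_AND_EXPR)
        return 0;
    lhs = e->op0;
    rhs = e->op1;

    /* Look through NOP wrappers, as in the bool example above. */
    while (lhs != NULL && lhs->code == TOY_NOP_EXPR)
        lhs = lhs->op0;
    if (lhs == NULL || rhs == NULL)
        return 0;

    /* lhs is 0 or 1, so only bit 0 of the constant can ever matter. */
    return toy_truth_value_p(lhs->code)
        && rhs->code == TOY_INTEGER_CST
        && (rhs->value & 1UL) == 0;
}
```
]]></programlisting>
</para>
<para>
With this rule, a toy tree for <computeroutput>!b &amp; 0x4</computeroutput> is flagged, while
<computeroutput>!b &amp; 1</computeroutput> and <computeroutput>b &amp; 0x4</computeroutput> are not.
</para>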
</chapter>

<chapter id="submitting"><title>Making the fix into a patch</title>

<para>
Now, I'd done the work, and even found a bug with it (inside the Linux kernel's pnpbios code),
so I might as well submit it and see if it's good enough to go in. There are two main requirements
here: testsuite entries, and documentation. Documentation was easy: edit the
<filename>gcc/.texi</filename> files to mention the new warning. The testsuite changes were slightly
less obvious, but they basically amounted to adding <filename>Wboolean-bitwise-and-{1,2}.c</filename>
in <filename>gcc/testsuite/gcc.dg</filename>, and <filename>boolean-bitwise-and-{1,2}.c</filename>
in <filename>gcc/testsuite/g++.dg</filename>. As these were testing a warning, I made
the header say <computeroutput>dg-options "-Wboolean-bitwise-and"</computeroutput>, and for
each line that should cause a warning, added <computeroutput>/* { dg-warning "bitwise and expression is always false" } */</computeroutput>.
</para>
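<para>
A test file along those lines might look roughly like the sketch below. It is not a copy of the
submitted files: the function names are invented, and the <computeroutput>dg-do compile</computeroutput>
line is my assumption. The option name and warning text are the ones quoted above.
</para>
<para>
<programlisting><![CDATA[
```c
/* { dg-do compile } */
/* { dg-options "-Wboolean-bitwise-and" } */

/* The directive sits on the same line as the code expected to warn. */
int always_false(unsigned ua)
{
  if (!ua & 0x4) /* { dg-warning "bitwise and expression is always false" } */
    return 1;
  return 0;
}

/* Correctly parenthesised, so no directive: any warning here would
   count as an unexpected failure. */
int bit_clear(unsigned ua)
{
  if (!(ua & 0x4))
    return 1;
  return 0;
}
```
]]></programlisting>
</para>
<para>
To an ordinary compiler the directives are just comments, so the file stays valid C; only the
testsuite harness interprets them.
</para>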
<para>
Two of these test files were expected to PASS, which means that the testsuite reports a failure if they go wrong
for any reason (internal compiler error (ICE), extra warnings, etc.). However, my code doesn't
handle <computeroutput>if (!(a ? b : c) &amp; 0x4)</computeroutput>, so on a suggestion from a
developer, I added this case as XFAIL testcases (which means: should pass, but we know
it currently won't, so don't whine).
</para>
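<para>
An expected failure is marked by adding an <computeroutput>xfail</computeroutput> selector to the
directive. The sketch below shows the general form as I understand the DejaGnu syntax used by the
GCC testsuite (the empty string is the optional comment field, which has to be present before a
selector); the function name is invented and this is not the submitted testcase.
</para>
<para>
<programlisting><![CDATA[
```c
/* { dg-do compile } */
/* { dg-options "-Wboolean-bitwise-and" } */

/* The warning ought to fire here but is known not to yet, so the
   directive is marked as an expected failure on all targets. */
int known_miss(int a, int b, int c)
{
  if (!(a ? b : c) & 0x4) /* { dg-warning "bitwise and expression is always false" "" { xfail *-*-* } } */
    return 1;
  return 0;
}
```
]]></programlisting>
</para>
<para>
Once the compiler learns to handle the case, the test shows up as an unexpected pass, which is the
cue to drop the <computeroutput>xfail</computeroutput>.
</para>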
<para>
You run the testsuite with <command>make -k check</command> in the <filename>obj/</filename> directory.
This creates <filename>.log</filename> and <filename>.sum</filename> files, which you can look
at to see what happened if your changes break a testcase. Note that you need to use the
<option>-k</option> option, as otherwise the test run stops at the first test that fails unexpectedly
(which seems to happen often with GCC CVS). The testsuite can take forever, so a couple of times I ran
<command>make -k check RUNTESTFLAGS=dg.exp</command> instead, which only runs
a subset of the tests.
</para>
<para>
So now that it's tested and documented, the only thing left to do is to make a diff and send
it to gcc-patches@gcc.gnu.org, with a suitable ChangeLog entry in the mail. You'll also need
a copyright assignment of the correct sort from assign@gnu.org, of course.
</para>
</chapter>
</article>
