Say you want to investigate part of the code you use from a rather large library, but it's buried among all kinds of #defines, classes, etc. that you're never going to use in your code. Is there a tool that strips those unused or unnecessary parts away (like the preprocessor and compiler do) without converting the result to machine language? That would make it much easier to see what happens in the code, and which code does what. It would be a great learning tool, imho.
3 Answers
The compiler is called in optimization mode (you can even specify how aggressively it optimizes). It removes variables and functions that are not used in your code, and it can replace variables that never change with literals to make the code smaller. This already happens on every build, so you don't need to worry about it.
About the #defines: everything with a # at the start is interpreted by the preprocessor. That means these things are handled before the compiler even gets to see the code. Let's say you define a pin number via #define pin 6; the preprocessor will then replace every occurrence of pin with 6 in the code before it hands the result over to the compiler. The compiler never sees any defines.
I can't tell you exactly how the compiler optimizes the code; that is probably very complex. However, the avr-gcc compiler is open source, so if you really want to learn what it does, you can always study its source code.
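As a small illustration of that substitution, here is a sketch in plain C++ (no Arduino headers, so it also builds on a PC). The names ledPin, pinAsText, STR and STR_IMPL are made up for this example and do not come from any library.

```cpp
// The same macro as in the text above.
#define pin 6

// Hypothetical helper macros: STR(x) expands x first, then turns the
// result into a string literal, which makes the substitution visible.
#define STR_IMPL(x) #x
#define STR(x) STR_IMPL(x)

// After preprocessing, this body reads `return 6;`.
int ledPin() { return pin; }

// After preprocessing, this body reads `return "6";`.
const char* pinAsText() { return STR(pin); }

// Removing the macro here changes nothing for the two functions above:
// the replacement already happened when their text was read.
#undef pin
```

Calling ledPin() gives 6 and pinAsText() gives the text "6", even though the word pin no longer exists by the time the compiler proper runs.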
I thought if a compiler can be smart enough to strip all unused variables and functions away and convert what's left in machine language
While I don't know much about how the optimization works, I don't think it happens in that order. I find it very unlikely that the compiler optimizes the C++ source text itself. It might do that at the assembler level instead (avr-gcc first converts the C++ code to assembler and then assembles that into machine code). Searching through the assembler equivalent of the C++ code isn't a real option either.
You are probably stuck with the full-text search of a program of your choice. Most operating systems also provide that out of the box. In Windows Explorer you can search all files of a directory for a term. On systems like Ubuntu, Debian, CentOS and similar you can use the grep command in recursive mode to search for a term. grep is available on macOS as well.
The compiler doesn't quite work like that. It compiles each individual .cpp file into an object file (the .ino file becomes a .cpp file after a bit of pre-processing). I have another answer about the exact process the compiler uses.
The libraries are also collections of .cpp files which also get turned into object files.
Optimization of unused variables would occur at that point. In other words, the compiler can optimize one compilation unit (one source file) because it can see what parts are used and what are not.
Then the linking phase commences, which joins all the object files together and looks for places where a function in a library is called. For example, if you call Serial.begin() somewhere, the linker will try to find an object file which implements it.
The linker is smart enough not to include code that is not required. For example, if you never called Serial.begin(), it would not copy in that part of the Serial library.
In the process of doing that it resolves "jumps" and subroutine calls, so that your calls to library functions end up pointing to the place in the program memory where it has put them.
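To make the linker's behaviour concrete, here is a minimal sketch. It assumes the build uses -ffunction-sections when compiling and -Wl,--gc-sections when linking, which as far as I know is what the Arduino AVR core's platform.txt does; check your own verbose build output to be sure. The names usedHelper, unusedHelper and result are hypothetical.

```cpp
// Assumption: compiled with -ffunction-sections and linked with
// -Wl,--gc-sections, so each function sits in its own section and
// unreferenced sections can be discarded at link time.

// Called from setup() below, so its code (possibly inlined by the
// optimizer) ends up in the final program.
int usedHelper(int x) { return x * 2; }

// Never called from anywhere: the linker is free to leave it out.
int unusedHelper(int x) { return x + 100; }

int result = 0;

void setup() { result = usedHelper(21); }

void loop() {}
```

If the assumption holds, a search for unusedHelper in the disassembly of the final .elf file should come up empty, while the source file itself still contains both functions.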
Because this is done at a late stage (after the source has been compiled) you don't really ever have a copy of the source "as linked".
However, what you can do is disassemble the object file, which gives you the machine code that has actually made it into the final executable. Done properly, it will also include the lines of source (C++) code that generated each batch of assembler lines.
If you turn on "verbose compiling" you should see, near the end of the compilation output, the path of an "ELF" file. You can then pass that file to avr-objdump to turn it back into assembler:
avr-objdump -S xxx.elf
It's not great if you aren't used to assembler, but it gives you a clue as to what made it into the executable file, and where. For a single line of C++ code you might see something like this:
reg = portModeRegister(port);
310: 90 e0 ldi r25, 0x00 ; 0
312: 88 0f add r24, r24
314: 99 1f adc r25, r25
316: fc 01 movw r30, r24
318: e8 59 subi r30, 0x98 ; 152
31a: ff 4f sbci r31, 0xFF ; 255
31c: a5 91 lpm r26, Z+
31e: b4 91 lpm r27, Z+
The -E option of gcc generates preprocessed C and C++ source.
I didn't find a way to integrate it into the Arduino build process. The only thing I could do was edit platform.txt and change the compile command so that it generates the preprocessed .cpp of each compilation unit instead of .o files. Linking then failed, of course.
The resulting .cpp was, I think, less readable than the original, because it contained many generated comments and line-number directives pointing back to the original files.
The changed cpp recipe in platform.txt:
recipe.cpp.o.pattern="{compiler.path}{compiler.cpp.cmd}" {compiler.cpp.flags} -mmcu={build.mcu} -DF_CPU={build.f_cpu} -DARDUINO={runtime.ide.version} -DARDUINO_{build.board} -DARDUINO_ARCH_{build.arch} {compiler.cpp.extra_flags} {build.extra_flags} -E {includes} "{source_file}" -o "{object_file}.cpp"
I added the -E flag and the .cpp file extension to the value of the -o option.
You can see the result in the temporary build folder. You can get the location of that folder from the IDE console when you build the sketch with Verify (you may need to turn on verbose compilation in Preferences).