Are there any open-source projects or examples that use the performance counters of the PMU (Performance Monitoring Unit) in the Raspberry Pi's ARM CPU?
It can be either a bare-metal example or a user-space application.
I have seen this and that (which I don't understand how to use to profile bare-metal code). This link is a work in progress.