Note: this is an attempt to answer the question as asked. This
answer is unlikely to be of any use to the original poster, who
presumably asked the wrong question. I am writing this only as a way to
explore the limits on how fast a modest AVR can sample a port. For an
answer that genuinely attempts to address the OP’s problem, see
Majenko’s answer.
I read the question as follows: can we sample a digital port at
2 MHz on an Arduino Nano clocked at 8 MHz? Can we do so while
storing the values in a RAM-based buffer?
The answer is yes, but it is nontrivial, and it requires some assembly.
To see the problem, let’s start by trying to do it in C++:
    uint8_t buffer[1024];

    void fill_buffer()
    {
        cli();
        for (size_t i = 0; i < sizeof buffer; i++)
            buffer[i] = PINB;
        sei();
    }
Note that the loop runs with interrupts disabled; otherwise the timer
interrupt would wreak havoc with the loop timing. gcc translates this
into assembly equivalent to the following:
        cli
        ldi  r30, lo8(buffer)        ; load the buffer address into pointer Z
        ldi  r31, hi8(buffer)        ; ditto
    0:  in   r24, 0x03               ; read the port
        st   Z+, r24                 ; store into buffer, increment the pointer
        ldi  r24, hi8(buffer+1024)   ; save (buffer+1024)>>8 in r24
        cpi  r30, lo8(buffer+1024)   ; compare the pointer with buffer+1024
        cpc  r31, r24                ; ditto
        brne 0b                      ; loop back
        sei
        ret
The loop takes 8 cycles per iteration. With an 8 MHz clock, that
is one reading per microsecond: too slow by a factor of two.
One cycle could be saved by keeping the port data and the end-of-loop
constant in different registers, which lets the third ldi move out of
the loop. Another cycle could be saved by testing only the high byte of
the Z pointer, but that would require aligning the buffer to a
256-byte boundary. With those two optimizations, we still need
6 CPU cycles per iteration, i.e. 0.75 µs at 8 MHz.
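To make the cycle counting above easy to check, here is a small host-side C++ sketch (plain arithmetic, not meant to run on the AVR). It assumes the usual cycle costs of a classic AVR core such as the ATmega328P: 1 cycle for in, ldi, cpi and cpc, 2 cycles for st Z+ and for a taken brne. All the names in it are made up for this illustration.

```cpp
// Host-side arithmetic only; nothing here touches AVR hardware.
// Assumed per-instruction cycle costs (classic AVR core, e.g. ATmega328P).
namespace cost {
constexpr unsigned in_port    = 1;  // in   r24, 0x03
constexpr unsigned st_postinc = 2;  // st   Z+, r24
constexpr unsigned ldi        = 1;  // ldi  r24, hi8(buffer+1024)
constexpr unsigned cpi        = 1;  // cpi  r30, lo8(buffer+1024)
constexpr unsigned cpc        = 1;  // cpc  r31, r24
constexpr unsigned brne_taken = 2;  // brne 0b, when the branch is taken
}

// The loop as compiled by gcc.
constexpr unsigned compiled_loop_cycles =
    cost::in_port + cost::st_postinc + cost::ldi +
    cost::cpi + cost::cpc + cost::brne_taken;               // 8

// Same loop with the ldi moved out of it.
constexpr unsigned hoisted_ldi_cycles =
    compiled_loop_cycles - cost::ldi;                       // 7

// Additionally comparing only the high byte of Z: one cpi, no cpc.
constexpr unsigned high_byte_only_cycles =
    cost::in_port + cost::st_postinc + cost::cpi + cost::brne_taken;  // 6

// Cycles available per sample for a given clock and sampling rate.
constexpr unsigned cycle_budget(unsigned long f_cpu_hz, unsigned long f_sample_hz)
{
    return static_cast<unsigned>(f_cpu_hz / f_sample_hz);
}

// Sampling period in microseconds for a loop of the given length.
constexpr double sample_period_us(unsigned cycles, double f_cpu_hz)
{
    return cycles * 1e6 / f_cpu_hz;
}

static_assert(compiled_loop_cycles == 8, "loop as compiled");
static_assert(high_byte_only_cycles == 6, "hand-optimized loop");
static_assert(cycle_budget(8000000UL, 2000000UL) == 4, "2 MHz at 8 MHz");
```

The budget for 2 MHz sampling at 8 MHz is 4 cycles, so even the 6-cycle loop is 2 cycles over, and the branch alone accounts for exactly those 2 cycles.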
In order to make this faster, the only solution is to unroll the loop.
This can be done in assembly by using the .rept (meaning “repeat”)
directive:
    void fill_buffer()
    {
        uint8_t *p = buffer;  // Z is modified by "st Z+", so pass it as a read-write operand
        cli();
        asm volatile(
            ".rept %[count]\n"   // repeat (count) times:
            "in r0, %[pin]\n"    // read the port input register
            "st Z+, r0\n"        // store in RAM
            "nop\n"              // 1 cycle delay
            ".endr"
            : "+z" (p)
            : [count] "i" (sizeof buffer),
              [pin]   "I" (_SFR_IO_ADDR(PINB))
            : "r0", "memory"     // the asm writes to buffer
        );
        sei();
    }
This takes 4 cycles, or 0.5 µs, per iteration, which is exactly the
requested 2 MHz. Note that a delay cycle had to be introduced; otherwise
the sampling would be too fast: 3 cycles, or 0.375 µs, per iteration.
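The price of unrolling is flash space. As a rough estimate, here is another host-side C++ sketch (arithmetic only, with names invented for the illustration). It assumes that in, st Z+ and nop each assemble to a single 16-bit opcode, and that the buffer holds 1024 samples as in the code above.

```cpp
#include <cstddef>

// Host-side arithmetic only; nothing here runs on the AVR.
constexpr std::size_t instructions_per_sample = 3;  // in, st Z+, nop
constexpr std::size_t bytes_per_instruction   = 2;  // one 16-bit opcode each
constexpr unsigned    cycles_per_sample       = 1 + 2 + 1;  // in + st Z+ + nop

// Flash consumed by the unrolled body for a given number of samples.
constexpr std::size_t unrolled_flash_bytes(std::size_t samples)
{
    return samples * instructions_per_sample * bytes_per_instruction;
}

// Time needed to fill the whole buffer, in microseconds.
constexpr double capture_time_us(std::size_t samples, double f_cpu_hz)
{
    return samples * cycles_per_sample * 1e6 / f_cpu_hz;
}

static_assert(unrolled_flash_bytes(1024) == 6144, "6 KiB of flash");
```

For the 1024-byte buffer this comes to 6144 bytes of flash and a capture window of 512 µs at 8 MHz. On an ATmega328P-based Nano (assumed here, with its 32 KiB of flash) that is affordable, but it is far from free.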
This is not the fastest one can get. It is possible to take one sample
per CPU cycle with something like this:
        in r0, 0x03
        in r1, 0x03
        in r2, 0x03
        ...
However, this technique is limited to burst readings of at most
32 samples, since the AVR has only 32 general-purpose registers
(r0 to r31) to read into.