- Lab 4 - ARC4 Stream Cipher
- On-Chip vs Off-Chip
- Embedded Memory vs Flip Flops
- Read Write Port Configurations
- Memory in Verilog
Lab 4 - ARC4 Stream Cipher
The encryption block generates a bunch of random bits which is seeded by a given key. The encrypted message is
XOR‘d with the corresponding bits from the bitstream to obtain the decrypted message. Since
XOR is symmetric, the same process can be used to encrypt.
In order to use the array, we need to use the on-board memory.
The first part is to initilize the array. The second part is to decrypt where the switches act as the secret key. The third part is to crack the message by brute force. With 24 bits, we need about 10 minutes, but 40 bits would need 400+ years.
Finally, the last part is to instantiate multiple cores to see who can crack the fastest.
On-Chip vs Off-Chip
The DE1-SoC there are local storage on-chip like the 4M embedded SRAM. But there also exist off-chip memories such as SDRAM, they are used for larger datasets but they’re typically much slower. Some other off-chip memory include the 1GB DDR3 RAM and 512MB FLASH storage. For much larger storage, connect external storage elements or interface to internet and utilize cloud storage.
Embedded Memory vs Flip Flops
Embedded memory blocks are used because they’re much more efficient. They are spread all over the FPGA so that they can be accessed easily from anywhere. In the DE1-SoC board, we have about 4Mbit of memory.
Some other advantages include:
- Much more efficient for large memories
- Address decoders, MUXs, are already implemented as part of the memory blocks; usually they would be wasting logic blocks if implemented otherwise.
Some disadvantages include:
- Limited to fixed configurations and less flexible
- read and write time might be longer than using registers for small memories
Flip flops can be used to implement memory, the DE1-SoC has about 128K flip-flops. They are not very efficient to use as larger memory storage in terms of space.
Advantages of using flip-flops:
- Very flexible and we can combine registers to fit desired design
- Fast connections between logic and memory - since registers are in the “logic fabric”
A third way is to use configuration bits in the FPGA’s LUT MUX as actual memory bits.
Read Write Port Configurations
Fixed Read and Write Port in Embedded Memories
Recall that we can configure the memories in several ways:
- Single port memory: one read/write per cycle
- Simple dual-port memory: simultaneous read/write
- True dual port memory: has two ports, we can choose to read/write mode for each port
- ROM: read only memory (no write)
- FIFO buffers: FIFO queue
Important: these memory blocks are very flexile so use the appropriate mode based on application to make the most efficient use of blocks
Single Port Memory
There is a
address bus for writing to memory. The
address bus will specify which part of memory to access to read/write. There exists a
wren (write enable) signal to specify if we want to read or write.
page 25 (what it looks like internally)
The output mux can be configured to have the output register enabled/disabled. The output register is certainly used for timing purposes.
Dual Port Memory
In this configuration, there are two separate address inputs; one is for read, the other is for write. Thus we can read and write at the same time. This is useful for implementing a FIFO queue.
True Dual Port Memory
There are two ports, but both can read or write.
Creating Read and Write ports in Flip Flops
Recall that flip flops are not very efficient. This is because the 8-1 mux is a hierarchy of smaller 2 input mux’s. Thus for an 8-1 mux uses 7 2 input muxes. So a 256 bit x 8 bit memories would need 2000+ logic blocks.
Long story short, super inefficient - try to avoid them as much as you can.
Memory in Verilog
- Explicitly indicate that we want to use memory using a pre-defined Altera macro. This way, we can specify the ports, size, width, output registers, etc.
- It can be done directly through a port map or using a Wizard
altsyncram altsyncram_component( .address_a(address), .clock0(clock), .data_a(data), .wren_a(wren), .q_a(sub_wire0) ); defparam // memory here...
Alternatively, the IP Log on the right-hand side of Quartus launches a wizard to configure memory. We can choose specific parameters such as width, depth, clocks, etc. The Wizard will create a module that will be instantiated when we compile in Quartus.
To read and write, just specify the inputs and data may be obtained in the next clock cycle.
We can have as much memory as the design requires as long as it doesn’t exceed the number of logic blocks in the FPGA.
- Write behavior of memory
But you’ll never know what you’ll really get exactly.
To do it in Verilog, we can use an array construct to describe a memory:
reg [7:0] mem [128:0];
To read from array behavior:
var = mem[i];
To write to array behavior:
mem[i] = var;
However, if the behavior for the array/memory is purely combinational, the synthesizer might not infer a proper memory. Another problem is that there is not enough ports. Quartus might also synthesize flip flops (bad).
In general, the more ports, the more simultaneous read and write you can do. But you don’t always want to do this:
- In a custom chip, dual port memory are a bit slower and larger
- You might want to use the port to do something else (such as snooping the memory in Quartus)
It is important to realize the trade-offs of using more read and write ports.
We need to come up with some kind of schedule, cycle by cycle, which we will read and write from memory. Thus, first we need to identify all the read and writes to the memory. (We use dual port memory so that we can look into the memory to debug.)
Note that reading and writing to memory actually happens on the next cycle due to the latches. However, we don’t care if writing to memory happens to the next cycle because there is no confirmation. However, this does affect the cycles used for reading values.
The value of the read is not ‘returned’ in the same cycle as the cycle where we give the address. The value of read is given in the next cycle.
In a nutshell, 2 cycles are required to read from memory. Be very careful about memory timing.