On-Chip Memories

Updated 2018-02-19

Lab 4 - ARC4 Stream Cipher
On-Chip vs Off-Chip
Embedded Memory vs Flip Flops
Read Write Port Configurations
- Fixed Read and Write Port in Embedded Memories
- Creating Read and Write ports in Flip Flops
  - Read Ports
  - Write Ports
Memory in Verilog
- Explicit Instantiation
- Inferred Memories
Scheduling

Lab 4 - ARC4 Stream Cipher

page 3

The encryption block generates a bunch of random bits which is seeded by a given key. The encrypted message is XOR‘d with the corresponding bits from the bitstream to obtain the decrypted message. Since XOR is symmetric, the same process can be used to encrypt.

In order to use the array, we need to use the on-board memory.

The first part is to initilize the array. The second part is to decrypt where the switches act as the secret key. The third part is to crack the message by brute force. With 24 bits, we need about 10 minutes, but 40 bits would need 400+ years.

Finally, the last part is to instantiate multiple cores to see who can crack the fastest.

On-Chip vs Off-Chip

page 11

The DE1-SoC there are local storage on-chip like the 4M embedded SRAM. But there also exist off-chip memories such as SDRAM, they are used for larger datasets but they’re typically much slower. Some other off-chip memory include the 1GB DDR3 RAM and 512MB FLASH storage. For much larger storage, connect external storage elements or interface to internet and utilize cloud storage.

Embedded Memory vs Flip Flops

Embedded memory blocks are used because they’re much more efficient. They are spread all over the FPGA so that they can be accessed easily from anywhere. In the DE1-SoC board, we have about 4Mbit of memory.

Some other advantages include:

Much more efficient for large memories
Address decoders, MUXs, are already implemented as part of the memory blocks; usually they would be wasting logic blocks if implemented otherwise.

Some disadvantages include:

Limited to fixed configurations and less flexible
read and write time might be longer than using registers for small memories

Flip flops can be used to implement memory, the DE1-SoC has about 128K flip-flops. They are not very efficient to use as larger memory storage in terms of space.

Advantages of using flip-flops:

Very flexible and we can combine registers to fit desired design
Fast connections between logic and memory - since registers are in the “logic fabric”

A third way is to use configuration bits in the FPGA’s LUT MUX as actual memory bits.

Read Write Port Configurations

Fixed Read and Write Port in Embedded Memories

Recall that we can configure the memories in several ways:

Single port memory: one read/write per cycle
Simple dual-port memory: simultaneous read/write
True dual port memory: has two ports, we can choose to read/write mode for each port
ROM: read only memory (no write)
FIFO buffers: FIFO queue

Important: these memory blocks are very flexile so use the appropriate mode based on application to make the most efficient use of blocks

Single Port Memory

page 24

There is a data and address bus for writing to memory. The address bus will specify which part of memory to access to read/write. There exists a wren (write enable) signal to specify if we want to read or write.

page 25 (what it looks like internally)

The output mux can be configured to have the output register enabled/disabled. The output register is certainly used for timing purposes.

Dual Port Memory

page 26

In this configuration, there are two separate address inputs; one is for read, the other is for write. Thus we can read and write at the same time. This is useful for implementing a FIFO queue.

True Dual Port Memory

page 28

There are two ports, but both can read or write.

Creating Read and Write ports in Flip Flops

Read Ports

Recall that flip flops are not very efficient. This is because the 8-1 mux is a hierarchy of smaller 2 input mux’s. Thus for an 8-1 mux uses 7 2 input muxes. So a 256 bit x 8 bit memories would need 2000+ logic blocks.

Write Ports

page 35

Long story short, super inefficient - try to avoid them as much as you can.

Memory in Verilog

Explicit Instantiation

Explicitly indicate that we want to use memory using a pre-defined Altera macro. This way, we can specify the ports, size, width, output registers, etc.
It can be done directly through a port map or using a Wizard

Use ALTSYNCRAM macro:

altsyncram altsyncram_component(
    .address_a(address),
    .clock0(clock),
    .data_a(data),
    .wren_a(wren),
    .q_a(sub_wire0)
);

defparam
	// memory here...

Alternatively, the IP Log on the right-hand side of Quartus launches a wizard to configure memory. We can choose specific parameters such as width, depth, clocks, etc. The Wizard will create a module that will be instantiated when we compile in Quartus.

To read and write, just specify the inputs and data may be obtained in the next clock cycle.

We can have as much memory as the design requires as long as it doesn’t exceed the number of logic blocks in the FPGA.

Inferred Memories

Write behavior of memory

But you’ll never know what you’ll really get exactly.

To do it in Verilog, we can use an array construct to describe a memory:

reg [7:0] mem [128:0];

To read from array behavior:

var = mem[i];

To write to array behavior:

mem[i] = var;

However, if the behavior for the array/memory is purely combinational, the synthesizer might not infer a proper memory. Another problem is that there is not enough ports. Quartus might also synthesize flip flops (bad).

In general, the more ports, the more simultaneous read and write you can do. But you don’t always want to do this:

In a custom chip, dual port memory are a bit slower and larger

You might want to use the port to do something else (such as snooping the memory in Quartus)

It is important to realize the trade-offs of using more read and write ports.

Scheduling

We need to come up with some kind of schedule, cycle by cycle, which we will read and write from memory. Thus, first we need to identify all the read and writes to the memory. (We use dual port memory so that we can look into the memory to debug.)

Note that reading and writing to memory actually happens on the next cycle due to the latches. However, we don’t care if writing to memory happens to the next cycle because there is no confirmation. However, this does affect the cycles used for reading values.

page 68

The value of the read is not ‘returned’ in the same cycle as the cycle where we give the address. The value of read is given in the next cycle.

page 71

In a nutshell, 2 cycles are required to read from memory. Be very careful about memory timing.