Sunday, February 28, 2021

MIPS Processor Design using Verilog: Part 1

Let us design a simple MIPS-based processor and write the Verilog code for it.

Let us first individually examine the typical components of a generic processor and then put them all together to build the complete design of the processor.

We will be designing a 32-bit processor. This means that we will be handling one 32-bit word (4 bytes, called a Dword in some architectures) of data at a time.
We will use a RISC Instruction Set Architecture (MIPS). In this design, each instruction executes in exactly 1 clock cycle.

Let us build a simple processor that supports only the following instructions:
AND, OR, ADD, SUB, SLT, NOR, LW, SW, BEQ.

The processor broadly consists of a Datapath and a Control Unit.

The Datapath refers to all the elements that hold data and process it. We will look at all the datapath components in detail in this section.

Control Unit generates control signals in order to direct the processor operation under different situations. We will see this in the next section.

Single clock cycle vs Pipelined execution:
Not all instructions take the same time to execute. Hence, for single clock cycle execution, the clock period must be long enough to accommodate the slowest instruction.
To speed up execution, pipelining is commonly used, where multiple instructions are overlapped in execution. Pipelining, however, is prone to different 'hazards', which we will not discuss here.

Here, we have considered single clock cycle execution.

Components of the processor datapath:

1. Instruction Memory: 

  • To store the instructions of a program.
  • Given the address, it must supply the instruction located at that address.
  • We can load instruction memory from a mem file using the $readmemh system task. The contents of the mem file will be explained in a later section of the tutorial.
    Verilog Code:

module Instruction_Memory(
	instrn_address,
	instrn
    );

input [31:0] instrn_address; //byte address; the memory below holds 32 bytes = 8 instructions of 32-bit width
output wire [31:0] instrn;
reg [7:0] instrn_mem [31:0];

initial begin
$readmemh("instrn_memory.mem", instrn_mem);	//load initial values
end

assign instrn = {instrn_mem[instrn_address+3],instrn_mem[instrn_address+2],	
                 instrn_mem[instrn_address+1],instrn_mem[instrn_address]};

endmodule


2. Program Counter (PC):
  • Holds the address of the current instruction.
  • Since each instruction is 32 bits (4 bytes) wide, we must increment the address by 4 to fetch the next instruction. An adder module performing this increment will be connected to the PC.
  • The PC must be updated at every clock cycle, and hence it is implemented as a register (a bank of D flip-flops).
  • A new instruction is executed every clock cycle.
    Verilog Code:

module Program_Counter(
	clk,
	rst_n,
	in_address,
	out_address
    );

input clk, rst_n;
input [31:0] in_address;
output reg [31:0] out_address;

always @ (posedge clk or negedge rst_n)
begin
  if(!rst_n)
  out_address <= 32'd0;
  else
  out_address <= in_address;
end

endmodule
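
The adder mentioned above is not listed in this post, so here is a minimal sketch of what it could look like. The module name PC_Adder and its port names are assumptions made for illustration, not names taken from the original design.

```verilog
// Hypothetical helper (name and ports are assumptions, not from the original post):
// computes the address of the next sequential instruction.
module PC_Adder(
	pc_in,
	pc_next
    );

input [31:0] pc_in;          // current PC value (byte address)
output wire [31:0] pc_next;  // address of the next sequential instruction

assign pc_next = pc_in + 32'd4; // each instruction is 4 bytes wide

endmodule
```

When no branch is taken, pc_next would typically drive the in_address port of the Program_Counter module.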


3. Register File:
  • This module will enclose all the independent registers of the processor, to perform write and read operations.
  • MIPS has 32 general-purpose registers, as shown in the table below. We will use the same configuration for our design.

  • R-Format instructions have three operands: two source registers and one destination register. So the register file needs 2 read ports (outputs), each supplying one 32-bit word, and 1 write port (input), along with a write enable signal that indicates when the data has to be written.
  • Example Instruction: add $t1, $t2, $t3
    To execute this instruction, we read the two source registers $t2 and $t3, add them, and then write the result to the destination register $t1.
  • In the Verilog code below, the read is combinational (done with assign statements), so read data is available in the same cycle. In designs that use a synchronous (flopped) memory, read data would appear only 1 clock cycle later.
  • We can load register memory from a mem file using the $readmemh system task.
    Verilog Code:

module Register_File(
	clk,
	rst_n,
	read_addr1,
	read_addr2,
	write_en,
	write_addr,
	write_data,
	read_data1,
	read_data2
    );

input clk;
input rst_n;
input [4:0] read_addr1;
input [4:0] read_addr2;
input write_en;
input [4:0] write_addr;
input [31:0] write_data;

output wire [31:0] read_data1;
output wire [31:0] read_data2;

reg [31:0] reg_mem [31:0];

initial begin
$readmemh("reg_memory.mem", reg_mem); //Load initial values
end

assign read_data1 = reg_mem[read_addr1];
assign read_data2 = reg_mem[read_addr2];

//Synchronous write. reg_mem is preloaded by $readmemh and is not cleared on reset;
//writes are ignored while rst_n is low.
always @ (posedge clk)
begin
  if (rst_n && write_en)
    reg_mem[write_addr] <= write_data;
end

endmodule

4. ALU:
  • The ALU performs the required arithmetic and logic operations on the data provided to it.
  • For our processor, we will need the following operations: add, subtract, and, or, nor, less than (for SLT).
  • The post on ALU design using MIPS Instruction set explains this ALU design. We will use the same ALU here.

The above modules should be sufficient to design a processor with our supported R-Type Instructions:
AND, OR, ADD, SUB, SLT, NOR
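
For readers who have not seen the ALU post, the six operations listed in item 4 could be sketched roughly as follows. This is only an illustration: the module name ALU_Sketch, the port names and the 4-bit alu_ctrl encoding are assumptions (the encoding follows one commonly used in textbook MIPS designs), and the ALU from the referenced post may differ.

```verilog
// Illustrative sketch only; the ALU from the referenced post may use
// different port names and control encodings.
module ALU_Sketch(
	a,
	b,
	alu_ctrl,
	result,
	zero
    );

input [31:0] a;
input [31:0] b;
input [3:0] alu_ctrl;      // assumed encoding, see the case items below
output reg [31:0] result;
output wire zero;          // high when result is 0 (useful for BEQ)

always @ (*)
begin
  case (alu_ctrl)
    4'b0000: result = a & b;                                      // AND
    4'b0001: result = a | b;                                      // OR
    4'b0010: result = a + b;                                      // ADD
    4'b0110: result = a - b;                                      // SUB
    4'b0111: result = ($signed(a) < $signed(b)) ? 32'd1 : 32'd0;  // SLT (signed compare)
    4'b1100: result = ~(a | b);                                   // NOR
    default: result = 32'd0;                                      // unused codes
  endcase
end

assign zero = (result == 32'd0);

endmodule
```

The zero output is included because a subtraction that yields zero is a convenient equality check for branch instructions.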

For Load Word (LW) and Store Word (SW) Instructions, we will additionally require two more components.

5. Sign Extension Unit:
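  • See the format of the lw and sw instructions:
    lw $t1, $t2, offset
    sw $t1, $t2, offset
    where offset is a signed 16-bit value. (The operands are written here in instruction-field order: base register, data register, offset. Standard MIPS assembly writes the same instructions as lw $t2, offset($t1) and sw $t2, offset($t1).)
  • Since we are working with 32-bit values, we need to sign extend the 16-bit offset to 32 bits, and so we require a Sign Extension Unit.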
    Verilog Code:

module Sign_Extension(
	bits16_in,
	bits32_out
    );

input [15:0] bits16_in;
output wire [31:0] bits32_out;

assign bits32_out = {{16{bits16_in[15]}} , bits16_in[15:0]};

endmodule

6. Data Memory:
  • For lw and sw instructions, we are computing a data memory address from which we have to either fetch the data or store the data.
  • lw $t1, $t2, offset means
    Fetch the data from this data memory address (value present in $t1 + sign-extended offset value) and store it in $t2 
  • sw $t1, $t2, offset means
    The data present in $t2 has to be stored in the calculated data memory address  (value present in $t1 + sign-extended offset value)
  • Below is the Verilog code. It is similar to the other memories we have designed above.
  • We can load data memory from a mem file using the $readmemh system task.
    Verilog Code:

module Data_Memory(
	clk,
	address,
	write_en,
	write_data,
	read_data
    );
	 
input clk;
input [31:0] address;
input write_en;
input [31:0] write_data;
output wire [31:0] read_data;

//Byte-addressable data memory: 32 bytes (8 words), least significant byte at the lowest address
reg [7:0] data_mem [31:0];
										
initial begin
$readmemh("data_memory.mem", data_mem);
end

assign read_data = {data_mem[address+3],data_mem[address+2],
		     data_mem[address+1],data_mem[address]};

//Synchronous write: store the word one byte at a time, least significant byte first
always @ (posedge clk)
begin
  if (write_en)
  begin
    data_mem[address]   <= write_data[7:0];
    data_mem[address+1] <= write_data[15:8];
    data_mem[address+2] <= write_data[23:16];
    data_mem[address+3] <= write_data[31:24];
  end
end

endmodule
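
The address calculation described for lw and sw (base register value + sign-extended offset) can be sketched as a small combinational module. In the actual processor this addition would most likely be performed by the ALU; the standalone module Mem_Address_Sketch and its port names are assumptions used only to make the arithmetic concrete.

```verilog
// Illustrative sketch (hypothetical module, not part of the original design):
// effective address for lw/sw = base register value + sign-extended offset.
module Mem_Address_Sketch(
	base,
	offset16,
	address
    );

input [31:0] base;           // value read from the base register
input [15:0] offset16;       // signed 16-bit offset from the instruction
output wire [31:0] address;  // byte address presented to the data memory

wire [31:0] offset32;

assign offset32 = {{16{offset16[15]}}, offset16}; // sign extend to 32 bits
assign address  = base + offset32;                // two's complement add

endmodule
```

For example, a base value of 16 with an offset of -4 (16'hFFFC) gives address 12, and a base value of 8 with an offset of 4 also gives address 12.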

The final remaining instruction is the Branch on Equal (BEQ) instruction.
Format: beq $t1, $t2, offset
Operation: 
  • Compare the two register values $t1 and $t2 to check for equality (this can be done by subtracting them in the ALU and checking for a zero result).
  • If the condition is false, we simply execute the next instruction (no issues here).
  • If the condition is true, we branch to the instruction at address = next instruction address + (sign-extended 16-bit offset shifted left by 2 bits). To enable this shifting, we require a shifter module as well. For the address addition, we require another small ALU that only performs addition.
  • We shift left by 2 bits because the offset counts instructions (words), while addresses count bytes, and each instruction is 4 bytes wide.
    Example: Offset 1 means 1 << 2 = 4, so the branch target is next instruction address + 4.
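
The branch target arithmetic described above can be put together in one small sketch. The module name Branch_Target_Sketch and its port names are assumptions for illustration; in the actual design these steps are spread over the sign extension unit, the shifter and a separate adder.

```verilog
// Illustrative sketch (hypothetical module, not part of the original design):
// branch target = (PC + 4) + (sign-extended offset << 2).
module Branch_Target_Sketch(
	pc_plus4,
	offset16,
	branch_target
    );

input [31:0] pc_plus4;             // address of the next sequential instruction
input [15:0] offset16;             // signed 16-bit offset, counted in instructions
output wire [31:0] branch_target;  // byte address to branch to

wire [31:0] offset32;
wire [31:0] offset_bytes;

assign offset32      = {{16{offset16[15]}}, offset16}; // sign extend to 32 bits
assign offset_bytes  = offset32 << 2;                  // instructions -> bytes
assign branch_target = pc_plus4 + offset_bytes;

endmodule
```

With pc_plus4 = 8, an offset of 1 gives a target of 12, while an offset of -1 (16'hFFFF) gives a target of 4, so backward branches work because the offset is sign extended before the shift.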

7. Shifter (for BEQ):

    Verilog Code:

module Shifter(
	indata,
	shift_amt,
	shift_left,
	outdata
    );

input [31:0] indata;
input [1:0] shift_amt;
input shift_left;
output wire [31:0] outdata;

assign outdata = shift_left ? indata<<shift_amt : indata>>shift_amt;

endmodule

The above modules will be sufficient to design the datapath of the desired processor. 
Now the job is to generate the control logic and create the top module.

Let us proceed to part 2 of the Processor Design Series.
