Simulation and Test Benches
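A large portion of the language is dedicated to test benches and testing. In this chapter we will look at some common techniques for writing efficient test benches for hardware designs.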
6.1 How SystemVerilog Simulator Works
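Before we delve into how to write a proper test bench, we need a solid understanding of how the simulator works and how events are scheduled. This will help us debug failures and errors in the future.
A standard-compliant SystemVerilog simulator follows a discrete event execution model, in which simulation time advances as values are updated. Hardware designs are inherently parallel, with processes such as always_comb and always_ff executing concurrently. Every time the value of a net/variable changes, we have an update event, and any process sensitive to that event needs to be evaluated as well, which is called an evaluation event. At each "time stamp", the simulator first computes the update events, evaluates the processes they trigger, and loops to check whether more update events were triggered by the previous ones.
In SystemVerilog, the term "time stamp" refers to simulation time. It can be converted back to real time using the timescale compiler directive introduced earlier in the book. To avoid confusion, we use simulation time, or simply time, throughout the chapter.
Although the design and the test bench are parallel in nature, most simulators are single-threaded and follow certain rules to evaluate the code so that the result is conceptually correct. Typically, a simulator divides each time slot into multiple regions in which events are scheduled in a predefined order. Within each region, events can be scheduled arbitrarily, which lets the simulator optimize as it sees fit. Figure 12 shows how a time slot is divided into regions and how execution flows between them.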
Figure 12: Event scheduling regions. Image taken from SystemVerilog LRM Figure 4-1
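We will discuss the PLI regions in detail later in the book. For now, it is enough to know that some regions are reserved for third-party libraries, which can be loaded into the simulator and can access the simulator state directly.
A complete description of every region requires a lot of detail; readers are encouraged to read through the language LRM and even try to implement a simple interpreter-based simulator. We will focus on three major regions: the active event region, the inactive event region, and the NBA event region.
In general, any event specified in always_comb and continuous assignments (e.g. blocking assignments) is evaluated in the active event region. The simulator keeps evaluating events in the active event region in a loop until no events remain there. If a process contains an explicit timing control (e.g. a #0 delay control), the process is suspended and its subsequent events are scheduled into the inactive event region. The simulator then loops again to clear the events in the inactive event region.
The NBA event region contains the nonblocking assignment updates. It is executed only after the preceding active and inactive regions have been cleared.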
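The interplay of the three regions can be observed with a small experiment. The following is a minimal sketch, not taken from the book's code directory; the module name `region_ex` and the signal names are illustrative assumptions.

```systemverilog
module region_ex;
    logic a, b;

    initial begin
        a = 0;
        b = 0;
        a = 1;    // blocking: a is updated immediately in the active region
        b <= 1;   // nonblocking: the update of b is deferred to the NBA region
        $display("active:   a = %0d, b = %0d", a, b);  // a = 1, b = 0
        #0;       // suspend; the process resumes by way of the inactive region
        $display("after #0: a = %0d, b = %0d", a, b);  // a = 1, b = 0 (NBA region not reached yet)
        #1;       // by the next time step the NBA update has been applied
        $display("after #1: a = %0d, b = %0d", a, b);  // a = 1, b = 1
    end
endmodule
```

The second printout still shows `b = 0` because the inactive region is drained before the simulator moves on to the NBA region.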
6.1.1 Simulation order
The SystemVerilog LRM guarantees a certain scheduling order. Any simulator that claims to be standard compliant should obey the following execution order:
- Statements within a begin-end block shall be executed in lexical order, i.e., the order in which they appear in the source code.
- NBAs shall be performed in the order in which the statements were executed.
To understand the second requirement, let’s consider the following example:
initial begin
a <= 0;
a <= 1;
end
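At the end of the time step, variable a is first assigned 0 and then assigned 1, so its final value is 1.
As one might guess, this ordering poses a hard restriction on compiler optimizations that involve reordering. Simulation vendors typically employ various kinds of optimization that preserve the semantics but not necessarily the actual execution order. For example, if no third-party entity is expected to observe the exact simulation order (such as a debugger that allows step-through), statements can be reordered as long as doing so has no side effects and conforms to the ordering semantics. This speeds up simulation significantly, but an extra flag is needed if users wish to debug and step through the code, e.g. the -line_debug flag in Xcelium. Verilator, on the other hand, only offers the reordered simulation order for the sake of performance. As a result, it is not standard compliant.
However, the SystemVerilog LRM does not specify the order in which processes are evaluated. It is therefore up to the simulator to decide which process to execute first, which introduces nondeterminism among simulators. Another source of nondeterminism is that a simulator may suspend a process when it encounters a timing control statement and place the partially completed events in the event region as pending events. This typically happens in test benches rather than in RTL designs, since synthesizable RTL does not allow timing controls, with the exception of always_ff.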
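The effect of unspecified process ordering can be sketched with two processes that touch the same variable in the same time step. This is an illustrative example with an assumed module name (`race_ex`), not code from the book.

```systemverilog
module race_ex;
    logic a;

    // process 1: writes a at time 0
    initial a = 1;

    // process 2: reads a at time 0
    // prints 1 if process 1 happened to run first and x otherwise;
    // both results are legal because the LRM does not order the two processes
    initial $display("a = %b", a);
endmodule
```

Code whose result depends on which process runs first is best avoided; the synchronization constructs later in this chapter exist for exactly this reason.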
6.2 Timing Controls
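Timing is one of the most important factors to consider when writing a test bench. Should a signal be stable before the clock edge, and for how long should it stay valid? What does a delay mean? This section covers the various aspects of timing control.
The compiler directive `timescale specifies the precision at which the simulator runs. Since different modules may have different timescales, the simulator needs to decide how to represent simulation time. In most simulators, in fact in any simulator that supports the VPI standard (discussed later), simulation time is represented as an unsigned 64-bit integer, even though the RTL model may expect time to be a floating-point number. To make this work, time is rounded to the specified precision and then scaled to the simulation time unit. Consider the following example: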
`timescale 1ns/10ps
module A;
    logic a;
    initial begin
        #1.2 a = 1;
    end
endmodule

`timescale 1us/10ns
module B;
    logic b;
    initial begin
        #3.4 b = 1;
    end
endmodule
For all modules, 10ps is the finest precision, so 1 simulation time unit corresponds to 10ps. Before we convert every delay into simulation time, we first round the delay to the module's precision. So 1.2 in module A becomes 1.2 ns = 120 × 10 ps, i.e. 120 10-picosecond units; 3.4 in module B becomes 3.4 µs = 340 × 10 ns, i.e. 340 10-nanosecond units. Then we scale everything into simulation time. Hence 1.2 in module A becomes 120 10-picosecond units and 3.4 in module B becomes 340000 10-picosecond units.
To obtain the simulation time, we can use $time, which can be printed out either via %d or %t in the $display function.
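A minimal sketch of printing the simulation time is shown below. It additionally uses `$realtime` and `$timeformat`, two standard system tasks that the text above does not cover; the module name `time_print_ex` is an assumption made for illustration.

```systemverilog
`timescale 1ns/10ps
module time_print_ex;
    initial begin
        // $timeformat(unit, precision, suffix, min_width):
        // make %t print in nanoseconds with two decimal digits
        $timeformat(-9, 2, " ns", 10);
        #1.2;
        // $time returns an integer, rounded to the module's 1ns time unit
        $display("integer time: %0d", $time);
        // $realtime keeps the fractional part of the current time
        $display("real time:    %t", $realtime);
    end
endmodule
```

Using `%t` together with `$timeformat` keeps the printed unit explicit, which helps when modules with different timescales are mixed in one simulation.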
The most common usage of timing control is setting the clock. A standard code style is shown below:
module top;
// initialize the clock: an x clock would stay x forever since ~x is x
logic clk = 0;
always clk = #10 ~clk;
Notice that the clock changes its value every 10 units of time, hence the clock period is 20 units of time. Because this always block runs forever, we have to terminate the simulation with the built-in SystemVerilog task $finish, as shown below:
initial begin
    // test bench logic
    $finish;
end
To synchronize values against the clock, we strongly discourage readers from setting delays by hand, which is error-prone and reduces readability. Instead, we recommend using event controls (@) directly. Here is an example:
initial begin
    input1 = 1;
    input2 = 2;
    // wait for the rising edge instead of a hand-computed delay
    @(posedge clk);
    input2 = 3;
end
In this way, we are guaranteed that signals input1 and input2 are set before the rising edge of the clock signal, regardless of the clock period! If you have checking/assertion logic, you can place it after the negative edge of the clock, assuming no synchronous logic in your design depends on the negative edge of the clock (dual-edge triggering typically appears in some high-performance designs), as shown below:
initial begin
    // input logic
    input1 = 1;
    @(posedge clk);
    @(negedge clk);
    // checking logic
    assert(output1 == 1);
    // input logic
    input1 = 2;
    @(posedge clk);
    @(negedge clk);
    // checking logic
    assert(output1 == 2);
    //...
end
We will discuss more complex but reusable test bench design pattern later in the chapter.
6.2.1 Fork and Join
Because hardware is inherently concurrent, in many cases we want multiple threads performing tasks at the same time, either driving or checking different parts of the design. SystemVerilog offers fork and join semantics that are similar to those of software programming languages, e.g. std::thread in C++.
The general syntax for fork and join is shown below. Notice that each statement inside the fork-join is an individual thread, so if you want complex logic, you need to enclose it in a begin-end block.
fork
    // thread 1
    // thread 2
    // thread 3
join
Here is a simple example to illustrate how to use fork and join:
module fork_join_ex;
initial begin
    fork
        #10 $display("Thread 1 finished at %t", $time);
        begin
            // thread 2
            #5 $display("Thread 2 finished at %t", $time);
        end
        #20 $display("Thread 3 finished at %t", $time);
    join
end
endmodule
Running the file (code/06/fork_join_ex.sv) with xrun, we get:
Thread 2 finished at 5
Thread 1 finished at 10
Thread 3 finished at 20
Notice that you can even have nested fork-join, i.e. one thread can spawn multiple threads as well. Although the fork-join semantics are similar to those of software programming languages, there are some properties we need to keep in mind:
- All statements are executed concurrently, regardless of whether the design is simulated on a single CPU core or not.
- Timing controls are local to each fork block and are computed relative to the simulation time when entering the block.
- It is always good practice to name the fork block, especially when you're creating variables inside, as shown below:
    fork
        begin: blk_1
            // logic
        end: blk_1
        begin: blk_2
            // logic
        end: blk_2
    join
- Since fork and join are part of SystemVerilog's timing control, they are not allowed inside a function. You need to use a task instead.
- Any objects declared inside the fork-join block are managed by the simulator, so we don't need to worry about dangling references or memory leaks. However, they should be declared as automatic so that they are local to the block.
- You cannot put fork-join inside always_comb.
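Several of these properties can be seen together in one small example. The sketch below is illustrative only; the module name `fork_task_ex`, the task `run_two`, and the variable `waited` are hypothetical names that do not appear in the book's code.

```systemverilog
module fork_task_ex;
    // fork-join contains timing control, so it is placed in a task, not a function
    task automatic run_two(int d1, int d2);
        fork
            begin: blk_1
                automatic int waited = d1;  // local to blk_1
                #waited;
                $display("blk_1 done @ %0t", $time);
            end: blk_1
            begin: blk_2
                automatic int waited = d2;  // separate variable, local to blk_2
                #waited;
                $display("blk_2 done @ %0t", $time);
            end: blk_2
        join
    endtask

    initial begin
        #100;             // the fork is entered at time 100
        run_two(10, 5);   // delays are counted from time 100
        $display("joined @ %0t", $time);
    end
endmodule
```

With the default time settings this should report blk_2 at time 105, blk_1 at time 110, and the join at time 110, which shows that the delays are relative to the time the fork block was entered.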
6.2.1.1 Different Join Semantics
There are three different join keywords we can use in SystemVerilog, and each has different semantics:
- join: this keyword blocks the execution until all the forked processes finish. This is similar to join() in software threads.
- join_any: this keyword blocks until any of the forked processes finishes. As a result, some processes may still be running when the execution of the main thread continues.
- join_none: this keyword does not block; execution continues immediately while the forked processes keep executing in the background.
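The difference between the join variants is easiest to see from the time at which the parent thread resumes. The following is a minimal sketch with an assumed module name (`join_ex`); it also uses `wait fork`, a standard statement not covered in the text above, to wait for the remaining background threads.

```systemverilog
module join_ex;
    initial begin
        fork
            #10 $display("A done @ %0t", $time);
            #20 $display("B done @ %0t", $time);
        join_any
        // resumes as soon as the first thread (A) finishes; B is still running
        $display("after join_any @ %0t", $time);

        fork
            #5 $display("C done @ %0t", $time);
        join_none
        // resumes immediately; C keeps running in the background
        $display("after join_none @ %0t", $time);

        // block until every child thread spawned by this process has finished
        wait fork;
        $display("all threads done @ %0t", $time);
    end
endmodule
```

Here the parent continues at time 10 after join_any, does not wait at all after join_none, and only reaches the last printout at time 20 once B and C have both completed.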
6.3 Standard Data Structures
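SystemVerilog introduces many common data structures to help designers build complex test logic. The interfaces of these data structures are heavily influenced by the C++ standard library. We will briefly cover some of the commonly used ones; interested readers should refer to the LRM for more information. Keep in mind that none of the data structures covered here is synthesizable, and the same holds for every construct discussed in this chapter.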
6.3.1 Dynamic Array
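Most arrays in SystemVerilog are fixed-size, and their dimensions cannot be changed at run time. A dynamic array, as the name suggests, is an unpacked array whose dimension can be changed at run time. To declare a dynamic array, we can use the following syntax: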
// data_type name[];
integer a[];
logic[15:0] b[];
You can also combine it with other arrays, as shown below, which declares a fixed-size array of dynamic arrays.
integer a[1:0][];
To initialize the dynamic array, we can use the keyword new with the targeted dimension:
integer a[];
a = new[10];
Keep in mind that even though we have initialized the dynamic array, the content of each array element is still uninitialized. As a result, you can get x when reading the element values.
To loop through the array, we can simply do
integer a[];
a = new[4];
foreach (a[i]) begin
    $display("a[%0d] = %0d", i, a[i]);
end
Notice that we implicitly create an index variable i with the foreach keyword.
Below is a list of methods associated with the dynamic array:
- size(): in addition to the standard system function $size(), a dynamic array has a method that returns the size of the array.
- delete(): clears all the elements so that the array becomes empty.
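The methods above can be combined with resizing, as in the following minimal sketch. The module name `dyn_array_ex` is an assumption; the example uses the 2-state type int so that new elements read as 0 rather than x, and the `new[N](old)` form, which resizes an array while copying the old contents.

```systemverilog
module dyn_array_ex;
    int a[];

    initial begin
        a = new[4];                       // allocate four elements
        foreach (a[i]) a[i] = i * 10;     // a = {0, 10, 20, 30}
        a = new[6](a);                    // grow to six elements, keeping the old values
        $display("size = %0d, a[3] = %0d, a[5] = %0d", a.size(), a[3], a[5]);
        a.delete();                       // the array is empty again
        $display("size after delete = %0d", a.size());
    end
endmodule
```

After the resize, `a[3]` still holds 30 and the two new elements take the default value of int, which is 0.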
6.3.2 Queue
A queue is SystemVerilog's equivalent of vector in C++. To declare a queue, we can use the following syntax:
// type name[$];
string names[$];
integer values[$];
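Like normal arrays, queues support slicing:
- As with the usual slicing operator, the indices are inclusive, that is, queue[a:b] returns b - a + 1 elements.
- If the slice is out of range or malformed, e.g. queue[1:0], an empty queue is returned.
- If any 4-state value containing x or z is used for slicing, an empty queue is returned.
Looping through a queue is the same as looping through a dynamic array:
integer a[$];
foreach (a[i]) begin
    $display("a[%0d] = %d", i, a[i]);
end
Below is a list of methods associated with queues:
- size(): in addition to the standard system function $size(), size() returns the size of the queue.
- delete(index): deletes the element at the given index; if no index is provided as an argument, clears the queue.
- insert(index, value): inserts the value at the given index.
- push_back(value): puts the element at the end of the queue.
- pop_back(): removes and returns the last element of the queue. If the queue is empty, the default value of the data type is returned and a warning may be issued.
- push_front(value): puts the element at the front of the queue.
- pop_front(): removes and returns the first element of the queue. If the queue is empty, the default value of the data type is returned and a warning may be issued.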
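The slicing rules and methods above can be exercised as follows. This is a small illustrative sketch; the module name `queue_ex` and the variable names are assumptions.

```systemverilog
module queue_ex;
    int q[$];
    int slice[$];
    int v;

    initial begin
        q.push_back(2);      // q = {2}
        q.push_back(3);      // q = {2, 3}
        q.push_front(1);     // q = {1, 2, 3}
        q.insert(1, 9);      // q = {1, 9, 2, 3}
        slice = q[1:2];      // inclusive indices: slice = {9, 2}
        $display("size = %0d, slice size = %0d", q.size(), slice.size());
        v = q.pop_front();   // v = 1, q = {9, 2, 3}
        q.delete(0);         // q = {2, 3}
        $display("v = %0d, q[0] = %0d, size = %0d", v, q[0], q.size());
        slice = q[1:0];      // malformed slice: the result is an empty queue
        $display("malformed slice size = %0d", slice.size());
    end
endmodule
```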
6.3.3 Associative Array
An associative array is SystemVerilog's equivalent of the map containers in C++. The index expression can be of any legal SystemVerilog type, and the container grows as more elements are inserted. To declare an associative array, we can use the following syntax:
// data_type name [index_type]
integer array1[string];
logic[15:0] array2[ClassA]; // ClassA is a class
// * implies any integral expression of any size
// more details below
logic array3[*];
SystemVerilog supports using * as a wildcard for the index type, with the following restrictions:
- The index type must be an integral type, but indices can be of different sizes. The "true value" is used for indexing; that is, SystemVerilog needs to resolve two values with different sizes to the same index location if their values match.
- 4-state values containing x or z are illegal.
- Non-integral index types/values are illegal and will result in an error.
- Strings can be used, but will be cast to integral values.
To initialize the associative array when declaring it, we can use the following syntax (note the leading apostrophe of the assignment pattern):
string map[integer] = '{0: "a", 1: "b"};
Similar to other data structures, we can loop through the associative array using the foreach keyword:
string map[integer] = '{0: "a", 1: "b"};
foreach (map[key]) begin
    automatic string value = map[key];
end
Below is a list of useful methods for associative arrays:
- size(): in addition to the standard system function $size(), size() returns the number of elements in the associative array.
- delete([index]): if index is provided, deletes the index and its associated value from the array. If no index is provided as an argument, clears the entire array.
- exists(index): returns 1 if an element with the given index exists and 0 otherwise.
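The following minimal sketch shows these methods on a string-indexed array. The module name `assoc_ex` and the array contents are assumptions chosen for illustration.

```systemverilog
module assoc_ex;
    int score[string];

    initial begin
        score = '{"alice": 90, "bob": 75};
        score["carol"] = 82;              // inserting a new key grows the array
        if (score.exists("bob"))
            $display("bob -> %0d", score["bob"]);
        score.delete("bob");              // remove a single entry
        $display("size = %0d, bob exists = %0d", score.size(), score.exists("bob"));
        foreach (score[name])             // visits the remaining keys
            $display("%s -> %0d", name, score[name]);
        score.delete();                   // clear the entire array
        $display("size after clear = %0d", score.size());
    end
endmodule
```

Checking `exists()` before reading avoids the warning and default value that a read of a missing key would otherwise produce.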
6.4 Event Control and Synchronization
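Since the programming model of a standard RTL test bench requires concurrency, and therefore synchronization, SystemVerilog provides various constructs and keywords to help programmers reason about concurrency.
The basic synchronization unit is the event, which can be either named or unnamed. Unnamed events are created implicitly by detecting value changes on nets and variables. There are three kinds of value changes that can trigger an event: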
- posedge: it happens when the net becomes non-zero from zero or goes from x/z to 1, e.g. 0 -> 1 or 0 -> x
- negedge: it happens when the net becomes non-one from one or goes from x/z to 0, e.g. 1 -> 0 or 1 -> x
- edge: it happens whenever posedge or negedge happens.
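Only integral values or strings can be used in implicit events.
To synchronize logic with edge-triggered events, we use the @ operator, as shown below. Notice that we have already seen event controls in always_ff and in the earlier section on how to write a simple test bench!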
@(posedge clk);
@(negedge clk);
Events can also be OR-ed together so that the code can be synchronized by any of the events, as shown below:
@(posedge clk or posedge clk2);
Notice that SystemVerilog also offers syntactic sugar that uses a comma (,) as the OR operator in events, which we have seen in always_ff earlier:
always_ff @(posedge clk, negedge rst_n);
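Another way to synchronize is to block execution until a condition becomes true. This is called level-sensitive, as opposed to edge-sensitive when using @. To do so, we use the wait keyword to evaluate the specified condition. If the condition is false, the procedural statements that follow are blocked until the condition becomes true. Here is an example of wait used with fork (code/06/wait_ex.sv):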
module wait_ex;
logic a;
initial begin
    fork
        begin
            #10;
            a = 1;
        end
        begin
            wait(a);
            $display("@(%0t) a = %d", $time, a);
        end
    join
end
endmodule
After running the example we will see the following printout, which is expected.
@(10) a = 1
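Although @ and wait look similar, they are fundamentally different, since one is edge-triggered and the other is level-sensitive. This directly leads to differences in how they are scheduled in the simulator.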
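The practical consequence shows up when a process arrives after the signal has already changed. The sketch below is illustrative; the module name `edge_vs_level_ex` is an assumption, and `disable fork` is a standard statement, not covered above, that terminates the remaining child threads.

```systemverilog
module edge_vs_level_ex;
    logic a = 0;

    // the only rising edge of a happens at time 5
    initial #5 a = 1;

    initial begin
        #10;         // arrive after the edge has already passed
        wait (a);    // level-sensitive: a is already 1, so this does not block
        $display("wait(a) passed @ %0t", $time);
        fork
            begin
                @(posedge a);  // edge-sensitive: the edge at time 5 was missed
                $display("posedge seen @ %0t", $time);
            end
            #20 $display("gave up waiting for posedge @ %0t", $time);
        join_any
        disable fork;  // terminate the thread still blocked on @(posedge a)
    end
endmodule
```

The wait statement falls through at time 10, whereas the @(posedge a) thread never wakes up and the timeout thread ends the experiment at time 30.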
Named events can be constructed with SystemVerilog's built-in type event, which allows aliasing, as shown below.
event e1; // declare a new event called e1
event e2 = e1; // e2 becomes an alias of e1
event e3 = null; // event e3 does not hold any synchronization object
To trigger a named event, we can use -> and ->>. ->> is the non-blocking version of ->. To wait for an event to be triggered, we can use triggered with the wait keyword, as shown below.
module event_trigger_ex;
event e;
initial begin
    fork
        begin
            #10;
            -> e;
        end
        begin
            wait(e.triggered);
            $display("@(%0t) e is triggered", $time);
        end
    join
end
endmodule
We should expect similar output as the wait example:
@(10) e is triggered
There are several advantages of using events compared to using normal signals:
- Events can be passed into tasks and other hierarchies due to their aliasing semantics.
- Events avoid a common case of race condition. Consider the following example:
    wait (a);
    wait (e.triggered);
  If the simulator evaluates the wait statement and updates the value of a at the same time, the ordering of execution is undetermined since this is a race condition. Using triggered, however, is guaranteed to work properly regardless of the ordering of execution.
6.4.1 Semaphore: How to Avoid Race Conditions
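A natural challenge in concurrent software systems is the race condition. Since hardware simulation is usually done in software, race conditions can occur if we are not careful. SystemVerilog provides a construct called semaphore to facilitate the synchronization of shared resources. In this chapter, we assume readers have some basic knowledge of POSIX threads (pthread). If not, we highly recommend reading the Linux manual page pthreads(7) and other related pages.
To initialize a semaphore, we can use the syntax below, which declares and initializes a semaphore s with 10 initial resources: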
semaphore s;
int num_resources = 10;
s = new(num_resources);
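To obtain a certain number of resources from the semaphore, we can use the get() method. Notice that this method is blocking, meaning the next procedural statement is evaluated only after the call returns, i.e. after the requested resources have been acquired.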
s.get(1); // blocking
To release resources back to the semaphore, we can use the put() method. This will unblock threads that are waiting for resources:
s.put(1);
A best-effort attempt to obtain resources can be made via try_get(). Notice that this method is non-blocking, so the caller should check the return value: it is a positive integer if the requested resources were obtained and 0 if they were not available, in which case nothing is allocated.
int num = s.try_get(5);
Here is an example (code/06/semaphore_ex.sv) of semaphore with fork-join:
module semaphore_ex;
semaphore s;
initial begin
    s = new(10);
    fork
        begin
            s.get(5);
            #10 s.put(5);
            $display("Thread 1 finished @ %0t", $time);
        end
        begin
            s.get(5);
            #20 s.put(5);
            $display("Thread 2 finished @ %0t", $time);
        end
        begin
            #1;
            s.get(10);
            $display("Thread 3 finished @ %0t", $time);
        end
    join
end
endmodule
We should expect the following output:
Thread 1 finished @ 10
Thread 2 finished @ 20
Thread 3 finished @ 20
Although SystemVerilog does not provide a mutex construct, a mutex can easily be implemented by setting the initial number of resources to 1.
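A minimal sketch of such a mutex is shown below. The module name `mutex_ex`, the task `worker`, and the shared variable `shared_counter` are hypothetical names used only for illustration.

```systemverilog
module mutex_ex;
    semaphore mutex;
    int shared_counter = 0;

    task automatic worker(int id);
        int snapshot;
        mutex.get(1);                   // lock: only one thread may hold the key
        snapshot = shared_counter;      // read ...
        #1;                             // ... simulated work in the critical section ...
        shared_counter = snapshot + 1;  // ... and write back
        $display("worker %0d left the critical section @ %0t", id, $time);
        mutex.put(1);                   // unlock
    endtask

    initial begin
        mutex = new(1);                 // a single key makes the semaphore a mutex
        fork
            worker(0);
            worker(1);
            worker(2);
        join
        $display("shared_counter = %0d", shared_counter);
    end
endmodule
```

Because the read-modify-write sequence is protected, the final count is 3. Without the get()/put() pair, all three workers would read 0 at time 0 and the final count would be 1.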
6.4.2 Mailboxes: Thread-safe Messaging Passing
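A mailbox is a message-passing construct that allows messages to be exchanged between processes. As the name suggests, its design follows the real-life concept of a mailbox. That is, a mailbox has a fixed capacity, and if the mailbox is full, mail is rejected: the carrier has to come back later and try again. Similarly, a mailbox in SystemVerilog is a fixed-capacity container that blocks a process's delivery attempt when it is full.
To create a mailbox, we can use the constructors below: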
mailbox box1 = new();
mailbox box2 = new(10);
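Notice that the default constructor sets the capacity to 0, which means the capacity is unbounded. In that case, the mailbox behaves like a FIFO with infinite capacity.
To put a message into the mailbox, we simply use put(obj), where obj can be any expression or object handle. To retrieve a message from the mailbox, we use the get() method. Both put() and get() follow FIFO ordering, which is a nice property for verification work. Notice that put() and get() are both blocking: if the mailbox is full, put() blocks the current process until a slot becomes available, and get() blocks until a message is available.
If non-blocking calls are needed, we can use try_put() and try_get(). try_put() returns 0 if the mailbox is full and a positive integer if the operation succeeds. Since try_get() assigns the message to a variable whose type may be incompatible, it returns a negative integer if a type error occurs.
To check the number of messages in the mailbox, we can use the num() method. Notice that a race condition arises if a process calls num() first and then uses the result to decide whether to put/get a message. Since the two operations are not atomic, another process may perform an operation in between, so that by the time the message is put or retrieved, the result previously obtained from num() is no longer accurate! Designers should consider using try_get()/try_put() instead!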
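The return values of the non-blocking calls can be checked as in the following sketch. It uses a parameterized mailbox, `mailbox #(int)`, a standard form not discussed above that restricts the message type so that mismatches are caught at compile time; the module name `mailbox_try_ex` is an assumption.

```systemverilog
module mailbox_try_ex;
    mailbox #(int) mb;   // parameterized mailbox: only int messages are accepted
    int value;

    initial begin
        mb = new(1);     // capacity of a single message
        if (mb.try_put(42))
            $display("first put accepted, num = %0d", mb.num());
        if (!mb.try_put(43))
            $display("second put rejected: the mailbox is full");
        if (mb.try_get(value) > 0)
            $display("got %0d", value);
        if (mb.try_get(value) == 0)
            $display("nothing to get: the mailbox is empty");
    end
endmodule
```

Each try call both tests and acts in a single step, which is why it avoids the num()-then-act race described above.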
Here is an example (code/06/mailbox_ex.sv) of using various methods of the mailbox:
module mailbox_ex;
mailbox mb;
initial begin
    mb = new(2);
    fork
        begin
            for (int i = 0; i < 4; i++) begin
                #10 mb.put(i);
                $display("[0]: @(%0t) put in value: %0d", $time, i);
            end
        end
        begin
            for (int i = 0; i < 2; i++) begin
                int value;
                mb.get(value);
                $display("[1]: @(%0t) get value: %0d", $time, value);
            end
        end
        begin
            for (int i = 0; i < 2; i++) begin
                int value;
                automatic int attempt = 0;
                while (mb.try_get(value) <= 0) begin
                    #1;
                    attempt++;
                end
                $display("[2]: @(%0t) get value: %0d after %0d attempts", $time, value, attempt);
            end
        end
    join
end
endmodule
We will see the following output:
[0]: @(10) put in value: 0
[1]: @(10) get value: 0
[0]: @(20) put in value: 1
[1]: @(20) get value: 1
[0]: @(30) put in value: 2
[2]: @(30) get value: 2 after 30 attempts
[0]: @(40) put in value: 3
[2]: @(40) get value: 3 after 10 attempts
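Notice that you may see slightly different output depending on the simulator's scheduling, since thread 1 and thread 2 may swap the values they receive. This is because thread 1 and thread 2 race to get messages from the same mailbox.
Astute readers may have noticed that we can implement a semaphore using a mailbox. We encourage readers to try it out!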
6.5 Generator, Driver, Monitor, and Scoreboard Design Pattern
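As designs get more complex, they typically go through several generations, each of which tweaks the micro-architecture for better performance. To avoid wasting effort on rebuilding the test infrastructure for each generation, designers adopt what is called the generator-driver-monitor-scoreboard pattern. The main idea behind this test pattern is to reuse as many components as possible when testing different designs, commonly referred to as the design under test (DUT).
- Generator: generates test stimulus at the transaction level.
- Driver: takes the test stimulus and drives the DUT. It performs the conversion from transaction level to signal level.
- Monitor: monitors the DUT interface and extracts the relevant signals.
- Scoreboard: compares transaction-level information against a golden model and reports any errors.
As long as the interface between components is well defined, we can swap or reuse components depending on the DUT. Components typically communicate through a thread-safe channel such as a mailbox, and the messages in the mailbox are usually transaction classes or structs. We will first go through the components one by one and then show a complete test environment in which all the components interact.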
6.5.1 A Simple Ready-Valid Design
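Before explaining the test environment, let's take a brief look at our DUT, a simple multiplier with a ready-valid interface. Ideally, the multiplier would actually be implemented in a multi-cycle fashion. Since that is not the focus of this chapter, we substitute a combinational multiplier followed by several fake pipeline stages. Readers unfamiliar with the ready-valid design pattern should refer to the relevant chapter on design. Here is the code for the DUT (code/06/mult_ex.sv). Notice that for simplicity we only mimic a pipelined multiplier.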
module mult_ex #(
    parameter WIDTH = 32
) (
    input  logic clk,
    input  logic rst_n,
    input  logic[WIDTH-1:0] a,
    input  logic[WIDTH-1:0] b,
    output logic[WIDTH-1:0] lo,
    output logic[WIDTH-1:0] hi,
    // ready-valid interface
    // input channel
    input  logic valid_in,
    output logic ready_out,
    // output channel
    output logic valid_out,
    input  logic ready_in
);

logic[WIDTH-1:0] data_lo;
logic[WIDTH-1:0] data_hi;

// we will implement it in a simple FSM
typedef enum logic[1:0] {
    IDLE,
    WORKING1,
    WORKING2,
    FINISH
} state_e;
state_e state;

// set FSM state and values
always_ff @(posedge clk, negedge rst_n) begin
    if (!rst_n) begin
        data_lo <= 'd0;
        data_hi <= 'd0;
        state <= IDLE;
    end
    else begin
        unique case (state)
            IDLE: begin
                // if the input is valid
                if (valid_in) begin
                    // we do some work here since it's simple enough
                    {data_hi, data_lo} <= a * b;
                    // switch to the next state
                    state <= WORKING1;
                end
            end
            WORKING1: begin
                state <= WORKING2;
            end
            WORKING2: begin
                state <= FINISH;
            end
            FINISH: begin
                // only if the ready in is hi, otherwise we hold the output
                // values
                if (ready_in) state <= IDLE;
            end
            default: state <= IDLE;
        endcase
    end
end

// set outputs based on internal state
always_comb begin
    ready_out = 1'b0;
    valid_out = 1'b0;
    lo = 0;
    hi = 0;
    unique case (state)
        IDLE: begin
            // we are ready
            ready_out = 1'b1;
        end
        FINISH: begin
            // the output is valid
            valid_out = 1'b1;
            lo = data_lo;
            hi = data_hi;
        end
        default: begin
            // nothing
        end
    endcase
end

endmodule
Our mult_ex listens to valid_in and uses the operands a and b to produce hi and lo. Once the result is ready, it sets valid_out high. Everything is controlled by a simple 2-block FSM.
Another design aspect we need to take care of is the interface. We will connect each component directly using an interface to make the code simpler and easier to maintain. Here is a simple interface design we will use (code/06/mult_io_interface.sv):
interface mult_io_interface #(
    parameter WIDTH=32
) (
    input logic clk,
    input logic rst_n
);

logic[WIDTH-1:0] a, b, lo, hi;
logic ready_in, ready_out, valid_in, valid_out;

// define driver and monitor modport interface
modport driver (
    output a,
    output b,
    output valid_in,
    output ready_in,
    input clk,
    input rst_n,
    input ready_out,
    input valid_out
);

// monitor just passively sample signals
modport monitor (
    input a,
    input b,
    input lo,
    input hi,
    input clk,
    input rst_n,
    input ready_out,
    input ready_in,
    input valid_out,
    input valid_in
);

endinterface
Notice that we use modport to directly connect some outputs of the driver to the monitor. If needed, we can split the interface into two separate ones should the design get complex, i.e. one interface for the driver and one for the monitor.
6.5.2 Generator Design
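The role of the generator is to provide input stimulus for our test. There are many ways to produce the desired inputs; here are a few commonly used ones:
1. Constrained random. This approach uses the simulator's solver to produce a large amount of random but valid inputs to test the dut. We will discuss it later in the book.
2. Trace replay. This approach is used to recreate a realistic test environment in which all inputs come from real-world usage.
3. Manually generated input sequences. For small designs, the test sequence can be coded directly by the designer. Although this does not scale, it is often used for directed tests that target corner cases which are hard to cover with constrained random.
To design the generator, we first need to write an input class that encapsulates all the input information of a transaction. In this example, since we are dealing with a multiplier, we only need to put the operand values into the class: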
class GeneratorXact #(
    parameter WIDTH=32
);
    logic[WIDTH-1:0] a;
    logic[WIDTH-1:0] b;
endclass
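Notice that the GeneratorXact class is parameterized by the operand width. This way, we can reuse the same class for multipliers of different widths.
To communicate with the driver, we use a mailbox inside the generator, which is passed in through the constructor. We use a simple random number generator to produce the operands; constrained random is covered later in the book. The generator is shown below: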
class mult_generator;
    // communication channel to the driver
    mailbox gen2driver;
    // number of transaction to generate
    int num_xact;
    // the packet
    GeneratorXact xact;

    function new(mailbox mb, int num);
        this.gen2driver = mb;
        this.num_xact = num;
    endfunction

    // entry point of this generator
    task main();
        repeat (this.num_xact) begin
            xact = new();
            xact.a = $random();
            xact.b = $random();
            this.gen2driver.put(xact);
        end
    endtask
endclass
Notice that we have a public task main() that's used to produce input transactions. This task will be called inside the test environment.
6.5.3 Driver Design
The role of the driver is to serialize the input stimulus onto the interface bus. Unlike the generator, it needs to understand the interface protocol our dut is using, in this case, a simple ready-valid handshake. It pulls the transaction from the mailbox used by the generator, and then drives the net, as shown in the code below (code/06/mult_driver.sv):
class mult_driver;
    mailbox gen2driver;
    // virtual interface handle
    virtual mult_io_interface.driver driver;
    GeneratorXact xact;

    function new(mailbox gen2driver, virtual mult_io_interface.driver driver);
        this.gen2driver = gen2driver;
        this.driver = driver;
    endfunction

    task reset();
        // reset the driver interface
        wait (!driver.rst_n);
        driver.a = 0;
        driver.b = 0;
        driver.valid_in = 0;
        driver.ready_in = 0;
        wait(driver.rst_n);
    endtask

    // entry point
    task main();
        // loop forever
        // we are always ready to receive data
        driver.ready_in = 1'b1;
        forever begin
            this.gen2driver.get(xact);
            // drive the bus. need to make sure that the dut is ready
            // block until we have successfully put one transaction in
            while (1) begin
                @(posedge driver.clk);
                if (driver.ready_out) begin
                    // dut is ready
                    driver.a = xact.a;
                    driver.b = xact.b;
                    driver.valid_in = 1'b1;
                    break;
                end
                else begin
                    driver.valid_in = 1'b0;
                end
            end
        end
    endtask
endclass
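Notice that besides the main() task, we also have a reset task that is responsible for initializing the dut. We also need to obey the ready-valid protocol, that is, we have to wait until the dut is ready; otherwise we hold the pending transaction and wait. The driver does not need to know details such as the total number of transactions. All it does is grab a transaction from the mailbox, if there is one, and drive the interface.
Also notice that we use a new piece of syntax, the keyword virtual, to obtain a reference to the interface. This lets us set values on the interface directly, as if the interface were an object.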
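The virtual interface mechanism can be tried in isolation with a much smaller setup. The sketch below is independent of the multiplier test bench; `simple_if`, `simple_driver`, and `virtual_if_ex` are hypothetical names invented for this illustration.

```systemverilog
interface simple_if (input logic clk);
    logic [7:0] data;
    logic       valid;
endinterface

class simple_driver;
    // handle that refers to an interface instance created elsewhere
    virtual simple_if vif;

    function new(virtual simple_if vif);
        this.vif = vif;
    endfunction

    // drive one value for a single clock cycle
    task send(logic [7:0] value);
        @(posedge vif.clk);
        vif.data  = value;
        vif.valid = 1'b1;
        @(posedge vif.clk);
        vif.valid = 1'b0;
    endtask
endclass

module virtual_if_ex;
    logic clk = 0;
    always #5 clk = ~clk;

    simple_if bus(clk);   // the actual interface instance
    simple_driver drv;

    initial begin
        drv = new(bus);   // hand the instance to the class as a virtual interface
        drv.send(8'hA5);
        $display("data = %h, valid = %b @ %0t", bus.data, bus.valid, $time);
        $finish;
    end
endmodule
```

The class never names the instance `bus` directly; it only sees the handle it was given, which is what allows the same driver class to be reused with any instance of the interface.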
6.5.4 Monitor Design
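The monitor taps into the interface bus and converts the signals into high-level transaction objects. Similar to the driver, it needs to understand the interface protocol; it then puts the transaction object into a mailbox shared with the scoreboard. Since collecting all the data typically takes multiple cycles, a monitor usually keeps internal state to store the information.
Below is a monitor that listens to the interface ports and collects data when the dut is ready (code/06/mult_monitor.sv):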
class mult_monitor;
    mailbox monitor2score;
    ScoreBoardXact xact;
    // virtual interface handle
    virtual mult_io_interface.monitor monitor;

    function new(mailbox mb, virtual mult_io_interface.monitor monitor);
        this.monitor2score = mb;
        this.monitor = monitor;
    endfunction

    // entry point
    task main();
        forever begin
            xact = new();
            @(posedge monitor.clk);
            wait (monitor.valid_in);
            // grab signals from the bus
            xact.a = monitor.a;
            xact.b = monitor.b;
            @(posedge monitor.clk);
            // wait until valid out is high
            wait (monitor.valid_out);
            // grab the output from the bus
            xact.lo = monitor.lo;
            xact.hi = monitor.hi;
            // put it into the mailbox
            monitor2score.put(xact);
        end
    endtask
endclass
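Similar to the driver, it uses a mailbox to interact with the scoreboard. It waits for the input valid signal to go high and then grabs the input signals a and b. It then waits for the output valid signal to go high, i.e. the dut has successfully computed the output, and grabs the outputs lo and hi. Once we have everything needed for the packet, we assemble it and put it into the mailbox. Notice that the monitor does not care whether the computed output is correct!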
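The ScoreBoardXact class used by the monitor is not listed in this section. A plausible definition, assuming it simply mirrors GeneratorXact and adds the two result halves, could look like the sketch below; the actual class in the book's code directory may differ, and the module `scoreboard_xact_ex` exists only to exercise it.

```systemverilog
// Hypothetical definition: the real ScoreBoardXact is not shown in this section.
class ScoreBoardXact #(
    parameter WIDTH = 32
);
    // operands sampled on the input channel
    logic[WIDTH-1:0] a;
    logic[WIDTH-1:0] b;
    // result halves sampled on the output channel
    logic[WIDTH-1:0] lo;
    logic[WIDTH-1:0] hi;
endclass

module scoreboard_xact_ex;
    ScoreBoardXact #(32) xact;

    initial begin
        xact = new();
        xact.a = 3;
        xact.b = 4;
        // fill in the result fields the way a 32x32 multiplier would
        {xact.hi, xact.lo} = xact.a * xact.b;
        $display("a = %0d, b = %0d, hi = %0d, lo = %0d",
                 xact.a, xact.b, xact.hi, xact.lo);
    end
endmodule
```

Keeping inputs and outputs in one object lets the scoreboard recompute the expected result from `a` and `b` without any side channel to the generator.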
6.5.5 Scoreboard Design
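Once we have a complete packet, we can compare the output against the model. We can also find out whether any packet is missing or malformed. A scoreboard typically interfaces with a high-level functional model written in C/C++ or SystemVerilog. We will cover how to interface with a C/C++ model later. For now, we simply compute the golden output in SystemVerilog.
Below is the scoreboard for our multiplier (code/06/mult_scoreboard.sv):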
class mult_scoreboard;
    mailbox monitor2score;
    int num_xact;
    ScoreBoardXact xact;
    logic[31:0] lo, hi;

    function new(mailbox mb);
        this.monitor2score = mb;
        this.num_xact = 0;
    endfunction

    task main();
        forever begin
            monitor2score.get(xact);
            // assertion part
            // simplified
            this.num_xact++;
            {hi, lo} = xact.a * xact.b;
            assert (hi == xact.hi);
            assert (lo == xact.lo);
        end
    endtask
endclass
Notice that we also keep track of the number of transactions, in case some protocol bug drops transaction packets. Once we get the transaction from the mailbox, we simply compute the golden output and assert the result.
6.5.6 Test Environment Setup
Now that we have all the major components written, the next step is to set up the test environment. The role of the environment is to instantiate and run the test suites. Below is an example of a test environment (code/06/mult_env.sv):
class mult_env;
    // instances
    mult_generator gen;
    mult_driver driver;
    mult_monitor monitor;
    mult_scoreboard scoreboard;

    // mailboxes
    mailbox gen2driver;
    mailbox monitor2score;

    function new(int num_xact, virtual mult_io_interface io);
        // initial mail box first
        this.gen2driver = new();
        this.monitor2score = new();
        this.gen = new(gen2driver, num_xact);
        this.driver = new(gen2driver, io.driver);
        this.monitor = new(monitor2score, io.monitor);
        this.scoreboard = new(monitor2score);
    endfunction

    task reset();
        this.driver.reset();
    endtask

    task test();
        fork
            gen.main();
            driver.main();
            monitor.main();
            scoreboard.main();
        join_any
    endtask

    task finish();
        wait(gen.num_xact == scoreboard.num_xact);
    endtask

    task run();
        reset();
        test();
        finish();
        $finish();
    endtask
endclass
In the test environment, we instantiate the test components as well as the mailboxes. Notice that the constructor takes the full interface and uses the modports when instantiating the test components. The main test task, test(), uses fork so that each component runs concurrently. We finish the test when the number of transactions received by the scoreboard equals the number generated by the generator. The entry task is run(), which first resets the dut, then calls test(), and eventually finish().
To use the test environment, we need the following test bench code (code/06/mult_top):
module mult_top;

// env
mult_env env;
// interface
logic clk, rst_n;
// num of xacts
localparam num_xact = 42;

mult_io_interface io(.*);

// dut
mult_ex dut (
    .clk(io.clk),
    .rst_n(io.rst_n),
    .a(io.a),
    .b(io.b),
    .hi(io.hi),
    .lo(io.lo),
    .valid_in(io.valid_in),
    .valid_out(io.valid_out),
    .ready_in(io.ready_in),
    .ready_out(io.ready_out)
);

// clocking
initial clk = 0;
always clk = #5 ~clk;

// reset sequence
initial begin
    rst_n = 1;
    #1;
    rst_n = 0;
    #1;
    rst_n = 1;
end

// start the test
initial begin
    env = new(num_xact, io);
    env.run();
end

// in case of bug, terminate after certain times
initial #(num_xact * 10 * 5) $finish;

endmodule
The test bench top drives the clock as well as the reset signal. Notice that in order to avoid an infinite loop when a packet goes missing (the end condition would never trigger), we set a termination condition based on the number of cycles run.