1 Overview
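Tap devices are commonly used in virtualization; consider the following scenario:
The figure marks the key functions and the direction of data flow.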
tun has two data interfaces:
- file, used by user space;
- socket, used inside the kernel, for example by vhost.
2 Asynchronous processing
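In the figure, the blue lines mark network traffic sent out by the VM. On the tap side this path involves no asynchronous processing; see the code: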
tun_sendmsg() / tun_chr_write_iter()
-> tun_get_user()
-> tun_rx_batched()
-> netif_receive_skb()
The red lines mark network traffic delivered into the VM. On the tap side this path is asynchronous and requires wait and wakeup; see the code:
tun_recvmsg() / tun_chr_read_iter()
-> tun_do_read()
-> tun_ring_recv()
---
ptr = ptr_ring_consume(&tfile->tx_ring);
if (ptr)
    goto out;
if (noblock) {
    error = -EAGAIN;
    goto out;
}
add_wait_queue(&tfile->socket.wq.wait, &wait);
while (1) {
    set_current_state(TASK_INTERRUPTIBLE);
    ptr = ptr_ring_consume(&tfile->tx_ring);
    if (ptr)
        break;
    ...
    schedule();
}
__set_current_state(TASK_RUNNING);
remove_wait_queue(&tfile->socket.wq.wait, &wait);
---
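The wait/wakeup pattern of `tun_ring_recv()`, together with the enqueue-then-wake step of `tun_net_xmit()`, can be mimicked in user space. The following is a minimal sketch assuming a fixed-size ring protected by a pthread mutex, with a condition variable standing in for `tfile->socket.wq.wait`; `struct ring`, `RING_SIZE`, `ring_produce()` and `ring_recv()` are hypothetical names, not kernel APIs. The kernel instead relies on ptr_ring's own locking and closes the lost-wakeup window with `add_wait_queue()` and `set_current_state()` before re-checking the ring, so only the control flow carries over.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define RING_SIZE 8   /* hypothetical capacity */

/* User-space stand-in for tfile->tx_ring plus the reader's wait queue. */
struct ring {
    void *slot[RING_SIZE];
    size_t head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t readable;   /* plays the role of socket.wq.wait */
};

void ring_init(struct ring *r)
{
    r->head = r->tail = r->count = 0;
    pthread_mutex_init(&r->lock, NULL);
    pthread_cond_init(&r->readable, NULL);
}

/* Caller holds r->lock. Returns NULL when empty, like ptr_ring_consume(). */
static void *ring_consume_locked(struct ring *r)
{
    void *p;

    if (r->count == 0)
        return NULL;
    p = r->slot[r->head];
    r->head = (r->head + 1) % RING_SIZE;
    r->count--;
    return p;
}

/* Producer side: enqueue, then wake a sleeping reader.
 * Returns -ENOSPC when full, as ptr_ring_produce() does. */
int ring_produce(struct ring *r, void *p)
{
    pthread_mutex_lock(&r->lock);
    if (r->count == RING_SIZE) {
        pthread_mutex_unlock(&r->lock);
        return -ENOSPC;
    }
    r->slot[r->tail] = p;
    r->tail = (r->tail + 1) % RING_SIZE;
    r->count++;
    pthread_cond_signal(&r->readable);
    pthread_mutex_unlock(&r->lock);
    return 0;
}

/* Same shape as tun_ring_recv(): fast path, noblock -> -EAGAIN, else sleep. */
void *ring_recv(struct ring *r, int noblock, int *err)
{
    void *p;

    *err = 0;
    pthread_mutex_lock(&r->lock);
    p = ring_consume_locked(r);            /* fast path */
    if (!p && noblock)
        *err = -EAGAIN;
    while (!p && !noblock) {
        /* The mutex makes "check ring" and "start waiting" atomic, so a
         * wakeup between them cannot be lost; the loop re-checks after
         * every (possibly spurious) wakeup. */
        pthread_cond_wait(&r->readable, &r->lock);
        p = ring_consume_locked(r);
    }
    pthread_mutex_unlock(&r->lock);
    return p;
}
```

`pthread_cond_wait()` releases the mutex and sleeps as one step, which is the user-space counterpart of setting the task state before the final ring check in the kernel loop.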
tun_net_xmit()
---
if (ptr_ring_produce(&tfile->tx_ring, skb))
    goto drop;
/* NETIF_F_LLTX requires to do our own update of trans_start */
queue = netdev_get_tx_queue(dev, txq);
queue->trans_start = jiffies;
/* Notify and wake up reader process */
if (tfile->flags & TUN_FASYNC)
    kill_fasync(&tfile->fasync, SIGIO, POLL_IN);
tfile->socket.sk->sk_data_ready(tfile->socket.sk);
---
sock_def_readable()
---
rcu_read_lock();
wq = rcu_dereference(sk->sk_wq);
if (skwq_has_sleeper(wq))
    wake_up_interruptible_sync_poll(&wq->wait, EPOLLIN | EPOLLPRI |
                                    EPOLLRDNORM | EPOLLRDBAND);
sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN);
rcu_read_unlock();
---
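By default, sk->sk_wq points to &socket->wq and sk->sk_data_ready is sock_def_readable(); see sock_init_data(). The wait queue that tun_ring_recv() sleeps on, tfile->socket.wq.wait, is therefore the same one that sock_def_readable() wakes.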
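To see why the `sk_data_ready()` call in `tun_net_xmit()` reaches the queue the reader sleeps on, the sketch below models the two pointers that `sock_init_data()` sets up. All struct and function names are hypothetical simplifications: a counter stands in for the wait queue head in `struct socket_wq`, and the real `skwq_has_sleeper()` also issues a memory barrier, which this model does not reproduce.

```c
/* Hypothetical stand-in for struct socket_wq: sleepers replaces the wait list. */
struct wq_model {
    int sleepers;
    int wakeups;
};

/* Hypothetical stand-in for struct sock: only the two fields discussed here. */
struct sock_model {
    struct wq_model *sk_wq;
    void (*sk_data_ready)(struct sock_model *sk);
};

/* Hypothetical stand-in for struct socket, which embeds its wq. */
struct socket_model {
    struct wq_model wq;
    struct sock_model *sk;
};

/* Like sock_def_readable(): wake only if someone is waiting on sk->sk_wq. */
static void def_readable(struct sock_model *sk)
{
    struct wq_model *wq = sk->sk_wq;

    if (wq && wq->sleepers > 0)
        wq->wakeups++;
}

/* Like the relevant part of sock_init_data(): point sk_wq at the socket's
 * embedded wq and install the default data_ready callback. */
void init_data(struct socket_model *sock, struct sock_model *sk)
{
    sock->wq.sleepers = 0;
    sock->wq.wakeups = 0;
    sock->sk = sk;
    sk->sk_wq = &sock->wq;
    sk->sk_data_ready = def_readable;
}
```

Because the producer only calls through the function pointer, a socket owner could install a different callback without touching `tun_net_xmit()`; tun keeps the default here.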
vhost uses the socket interface only for the data itself; to wait for data from tap, it uses poll:
vhost_net_enable_vq()
---
sock = vhost_vq_get_backend(vq);
if (!sock)
    return 0;
return vhost_poll_start(poll, sock->file);
---
tun_chr_poll()
---
sk = tfile->socket.sk;
poll_wait(file, sk_sleep(sk), wait);
...
---
vhost_poll_init()
---
init_waitqueue_func_entry(&poll->wait, vhost_poll_wakeup);
---
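sk_sleep(sk) returns &sk->sk_wq->wait, the wait queue head inside sk->sk_wq. vhost_poll_start() calls vfs_poll() on the tap file, which runs tun_chr_poll(); its poll_wait() invokes vhost's queue callback vhost_poll_func() (installed by init_poll_funcptr() in vhost_poll_init()), which adds poll->wait to that queue. When sock_def_readable() later wakes the queue, it calls vhost_poll_wakeup(), which queues a vhost work item that performs handle_rx.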
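The vhost side hinges on a wait queue entry whose callback queues work instead of waking a task. The sketch below models that chain in user space: a poll-table callback that registers the entry (the role of `vhost_poll_func()`), a wake function that filters on the event mask and marks work pending (the role of `vhost_poll_wakeup()`), and a start routine that, like `vhost_poll_start()`, fires immediately if the file is already readable. Every type and function here is a hypothetical simplification; real entries live on `list_head` lists under a spinlock, and vhost hands a `vhost_work` to its worker rather than setting a flag.

```c
#include <stddef.h>

#define POLLIN_BIT 0x1u   /* hypothetical stand-in for EPOLLIN */

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Models of wait_queue_entry / wait_queue_head: each entry has its own callback. */
struct wait_entry;
typedef int (*wake_fn)(struct wait_entry *e, unsigned mask);
struct wait_entry { wake_fn func; struct wait_entry *next; };
struct wait_head  { struct wait_entry *first; };

/* Model of poll_table: qproc decides what poll_wait() does with a queue head. */
struct poll_table;
typedef void (*qproc_fn)(struct wait_head *h, struct poll_table *pt);
struct poll_table { qproc_fn qproc; };

/* Model of struct vhost_poll. */
struct vhost_poll_model {
    struct wait_entry wait;
    struct poll_table table;
    unsigned mask;
    int work_queued;
};

static void add_wait(struct wait_head *h, struct wait_entry *e)
{
    e->next = h->first;
    h->first = e;
}

/* Like a wake-up with a poll key: run every entry's callback. */
void wake_up_mask(struct wait_head *h, unsigned mask)
{
    for (struct wait_entry *e = h->first; e; e = e->next)
        e->func(e, mask);
}

/* Like vhost_poll_wakeup(): ignore unrelated events, otherwise queue work. */
static int poll_wakeup(struct wait_entry *e, unsigned mask)
{
    struct vhost_poll_model *p = container_of(e, struct vhost_poll_model, wait);

    if (!(mask & p->mask))
        return 0;
    p->work_queued = 1;   /* vhost would queue the handle_rx work here */
    return 0;
}

/* Like vhost_poll_func(): the qproc reached through poll_wait(). */
static void poll_func(struct wait_head *h, struct poll_table *pt)
{
    struct vhost_poll_model *p = container_of(pt, struct vhost_poll_model, table);
    add_wait(h, &p->wait);
}

/* Like vhost_poll_init(): custom wake function plus custom qproc. */
void vpoll_init(struct vhost_poll_model *p, unsigned mask)
{
    p->wait.func = poll_wakeup;   /* init_waitqueue_func_entry() */
    p->wait.next = NULL;
    p->table.qproc = poll_func;   /* init_poll_funcptr() */
    p->mask = mask;
    p->work_queued = 0;
}

/* Like tun_chr_poll(): poll_wait() on sk_sleep(sk), then report readiness. */
unsigned file_poll(struct wait_head *sk_sleep, int ring_empty, struct poll_table *pt)
{
    if (pt && pt->qproc)
        pt->qproc(sk_sleep, pt);
    return ring_empty ? 0 : POLLIN_BIT;
}

/* Like vhost_poll_start(): register once; if already readable, wake now. */
void vpoll_start(struct vhost_poll_model *p, struct wait_head *sk_sleep, int ring_empty)
{
    unsigned m = file_poll(sk_sleep, ring_empty, &p->table);

    if (m)
        poll_wakeup(&p->wait, m);
}
```

The indirection through `qproc` is what lets the same `tun_chr_poll()` serve both ordinary poll/epoll callers and vhost: the caller, not the driver, decides what "waiting" means.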