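Introduction
This article walks through the e1000e NIC driver, which is used mainly for Intel network adapters. It follows the driver's design flow to analyse how the whole driver receives and transmits packets.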
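First, four pieces of background that e1000e relies on:
1) PCIe configuration space initialisation: every PCIe card follows the same standard. On x86 the configuration space can be reached through two I/O ports, an address port at 0xCF8 and a data port at 0xCFC. The address port tells the host bridge which bus/device/function and which register offset to access, and the data port then reads or writes that register. The operating system uses exactly this mechanism to detect whether a slot holds a card, to check that the card is valid, and to write the card's configuration registers (in effect, to initialise it).
2) MSI-X and its initialisation: while initialising the configuration space, the OS sets up the card's MSI-X capability in the PCI capability list. The driver only has to read that capability and request MSI-X vectors from the OS before it can use them. MSI-X is a special kind of interrupt that needs no interrupt line, but the PCIe device must have the MSI-X capability and the host must support the APIC. At boot the system also initialises two pieces of host hardware, the I/O APIC and the local APIC, reserves an address range for interrupt messages, and assigns interrupt vectors to each CPU. From then on a device raises an interrupt by writing its message to an address in that range. The write is recognised as an interrupt from a peripheral rather than an ordinary memory access and is routed to the local APIC of the target CPU, which sees from the vector that the interrupt is its own and handles it.
3) NAPI: NAPI is a mechanism for network devices. The device's napi structure is linked onto the system's poll list, a softirq is raised, and the softirq calls the driver's poll callback.
4) DMA: e1000 sends and receives automatically. Packets move from the NIC's FIFO into an skb, or from an skb into the NIC's FIFO, by DMA without CPU copying, and an MSI-X interrupt is raised when the transfer completes.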
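To make item 1 concrete, here is a minimal sketch of how the value written to the address port is laid out in the legacy x86 configuration mechanism. The function name `pci_config_address` and the macro names are illustrative only, not kernel APIs; the sketch just computes the 32-bit value and does not touch any hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Legacy x86 PCI configuration access: I/O port numbers. */
#define PCI_CONFIG_ADDRESS 0xCF8u  /* write the target address here */
#define PCI_CONFIG_DATA    0xCFCu  /* then read or write the data here */

/*
 * Build the 32-bit value written to PCI_CONFIG_ADDRESS.
 * Layout: bit 31 enable, bits 23-16 bus, bits 15-11 device,
 * bits 10-8 function, bits 7-2 dword-aligned register offset.
 */
static uint32_t pci_config_address(uint8_t bus, uint8_t dev,
                                   uint8_t func, uint8_t offset)
{
    assert(dev < 32 && func < 8);
    return 0x80000000u
         | ((uint32_t)bus  << 16)
         | ((uint32_t)dev  << 11)
         | ((uint32_t)func << 8)
         | ((uint32_t)offset & 0xFCu);
}
```

For example, register 0x10 (the first BAR) of bus 1, device 2, function 3 is selected by writing `pci_config_address(1, 2, 3, 0x10)` to the address port.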
Now for the source code analysis.
1. Registering the NIC driver:
static int __init e1000_init_module(void)
{
    return pci_register_driver(&e1000_driver);
}
module_init(e1000_init_module);
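e1000_init_module() does only one thing: it registers a pci_driver structure on the PCI driver list. Once the driver is registered, the PCI driver framework calls its probe function for each device that matches.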
static struct pci_driver e1000_driver = {
    .name     = e1000e_driver_name,
    .id_table = e1000_pci_tbl,
    .probe    = e1000_probe,
    .remove   = e1000_remove,
    .driver   = {
        .pm = &e1000_pm_ops,
    },
};
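The probe function mainly does the following:
1. Negotiates the DMA address width with the bus.
2. Claims the device's memory regions and maps them into virtual address space.
3. Allocates and initialises the network device, which describes the hardware, the device memory, scheduling control and so on.
4. Initialises the NIC's transmit and receive queues (net_queue), the MAC address list, the name-space hook into the kernel lists, the maximum packet length, NAPI, the device name and the hardware frame length; maps BAR 0; sets up the interrupt information; allocates the adapter's ring structures; reads the EEPROM; and so on.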
static int e1000_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
{
    struct net_device *netdev;
    struct e1000_adapter *adapter;
    struct e1000_hw *hw;
    const struct e1000_info *ei = e1000_info_tbl[ent->driver_data];
    resource_size_t mmio_start, mmio_len;
    resource_size_t flash_start, flash_len;
    static int cards_found;
    int bars, err, pci_using_dac;

    /* Excerpt: several error checks and the err_* cleanup labels are omitted. */

    /* negotiate the DMA address width: try 64-bit, fall back to 32-bit */
    err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
    if (!err) {
        pci_using_dac = 1;
    } else {
        err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
        if (err) {
            dev_err(&pdev->dev, "No usable DMA configuration, aborting\n");
            goto err_dma;
        }
    }

    /* select the memory BARs and claim them exclusively */
    bars = pci_select_bars(pdev, IORESOURCE_MEM);
    err = pci_request_selected_regions_exclusive(pdev, bars,
                                                 e1000e_driver_name);

    pci_set_master(pdev);
    /* PCI config space info */
    err = pci_save_state(pdev);

    /* allocate and initialise the network device */
    netdev = alloc_etherdev(sizeof(struct e1000_adapter));
    if (!netdev)
        goto err_alloc_etherdev;

    SET_NETDEV_DEV(netdev, &pdev->dev);
    netdev->irq = pdev->irq;
    pci_set_drvdata(pdev, netdev);
    adapter = netdev_priv(netdev);
    hw = &adapter->hw;

    /* map BAR 0 (the device registers) */
    mmio_start = pci_resource_start(pdev, 0);
    mmio_len = pci_resource_len(pdev, 0);
    adapter->hw.hw_addr = ioremap(mmio_start, mmio_len);
    if (!adapter->hw.hw_addr)
        goto err_ioremap;

    if ((adapter->flags & FLAG_HAS_FLASH) &&
        (pci_resource_flags(pdev, 1) & IORESOURCE_MEM) &&
        (hw->mac.type < e1000_pch_spt)) {
        flash_start = pci_resource_start(pdev, 1);
        flash_len = pci_resource_len(pdev, 1);
        adapter->hw.flash_address = ioremap(flash_start, flash_len);
        if (!adapter->hw.flash_address)
            goto err_flashmap;
    }

    /* Set default EEE advertisement */
    if (adapter->flags2 & FLAG2_HAS_EEE)
        adapter->eee_advert = MDIO_EEE_100TX | MDIO_EEE_1000T;

    /* construct the net_device struct */
    netdev->netdev_ops = &e1000e_netdev_ops;
    e1000e_set_ethtool_ops(netdev);
    netdev->watchdog_timeo = 5 * HZ;
    netif_napi_add(netdev, &adapter->napi, e1000e_poll, 64);
    strlcpy(netdev->name, pci_name(pdev), sizeof(netdev->name));

    netdev->mem_start = mmio_start;
    netdev->mem_end = mmio_start + mmio_len;

    adapter->bd_number = cards_found++;

    e1000e_check_options(adapter);

    /* setup adapter struct */
    e1000_sw_init(adapter);

    if (hw->phy.ops.check_reset_block && hw->phy.ops.check_reset_block(hw))
        dev_info(&pdev->dev,
                 "PHY reset is blocked due to SOL/IDER session.\n");

    e1000_eeprom_checks(adapter);   /* read and check the EEPROM contents */

    /* copy the MAC address */
    e1000e_read_mac_addr(&adapter->hw);
    memcpy(netdev->dev_addr, adapter->hw.mac.addr, netdev->addr_len);

    /* initialise e1000_adapter members */
    init_timer(&adapter->watchdog_timer);
    adapter->watchdog_timer.function = e1000_watchdog;
    adapter->watchdog_timer.data = (unsigned long)adapter;

    /* reset the hardware with the new settings */
    e1000e_reset(adapter);

    /* If the controller has AMT, do not set DRV_LOAD until the interface
     * is up.  For all other cases, let the f/w know that the h/w is now
     * under the control of the driver.
     */
    if (!(adapter->flags & FLAG_HAS_AMT))
        e1000e_get_hw_control(adapter);

    strlcpy(netdev->name, "eth%d", sizeof(netdev->name));
    err = register_netdev(netdev);  /* register the network device */

    /* carrier off reporting is important to ethtool even BEFORE open */
    netif_carrier_off(netdev);

    e1000_print_device_info(adapter);

    return 0;
}
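Two things are worth noting about this stage:
- The basic network device information is allocated, but the structures and memory used for actual data transfer are not, and no interrupt is requested. In other words, this stage only instantiates an empty shell of a NIC and does not yet occupy real hardware resources.
- The operations table of struct net_device is initialised, so when the interface is later brought up or down the kernel can call the driver's ops directly. This is a very good design: resources are allocated when the interface is in use and take up nothing when it is not.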
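The "fill in the callbacks at probe, allocate at open" pattern can be sketched in plain user-space C. Everything here (`toy_dev`, `toy_ops`, `toy_probe` and so on) is hypothetical and only mirrors the shape of the idea; it is not the driver's code, and the 16-byte element size is an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_dev;

/* Callback table, filled in once at probe time (compare netdev_ops). */
struct toy_ops {
    int  (*open)(struct toy_dev *dev);
    void (*stop)(struct toy_dev *dev);
};

struct toy_dev {
    const struct toy_ops *ops;
    unsigned int ring_count;   /* configuration only, cheap to keep */
    void *ring;                /* allocated only while the device is up */
};

static int toy_open(struct toy_dev *dev)
{
    assert(dev->ring == NULL);             /* must not already be up */
    dev->ring = calloc(dev->ring_count, 16);
    return dev->ring ? 0 : -1;
}

static void toy_stop(struct toy_dev *dev)
{
    free(dev->ring);
    dev->ring = NULL;
}

static const struct toy_ops toy_netdev_ops = {
    .open = toy_open,
    .stop = toy_stop,
};

/* "probe": record configuration and callbacks, allocate nothing heavy. */
static void toy_probe(struct toy_dev *dev)
{
    dev->ops = &toy_netdev_ops;
    dev->ring_count = 256;
    dev->ring = NULL;
}
```

After `toy_probe` the device owns no ring memory; the ring exists only between `ops->open` and `ops->stop`.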
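2. Bringing the interface up
Bringing the interface up calls e1000_open. It allocates coherent DMA memory for the descriptors of adapter->tx_ring, 256 descriptors in total, and initialises tx_ring. The descriptors are what is actually handed to the DMA engine: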
int e1000e_setup_tx_resources(struct e1000_ring *tx_ring)
{
    struct e1000_adapter *adapter = tx_ring->adapter;
    int err = -ENOMEM, size;

    /* per-descriptor bookkeeping, ordinary kernel memory */
    size = sizeof(struct e1000_buffer) * tx_ring->count;
    tx_ring->buffer_info = vzalloc(size);
    if (!tx_ring->buffer_info)
        goto err;

    /* round up to nearest 4K */
    tx_ring->size = tx_ring->count * sizeof(struct e1000_tx_desc);
    tx_ring->size = ALIGN(tx_ring->size, 4096);

    /* the descriptor ring itself, in coherent DMA memory */
    err = e1000_alloc_ring_dma(adapter, tx_ring);
    if (err)
        goto err;

    return 0;
err:
    vfree(tx_ring->buffer_info);
    return err;
}
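It then allocates the buffer_info array for the receive ring, again 256 entries. buffer_info is only a bookkeeping structure that describes each buffer: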
int e1000e_setup_rx_resources(struct e1000_ring *rx_ring)
{
    struct e1000_adapter *adapter = rx_ring->adapter;
    struct e1000_buffer *buffer_info;
    int i, size, desc_len, err = -ENOMEM;

    size = sizeof(struct e1000_buffer) * rx_ring->count;
    rx_ring->buffer_info = vzalloc(size);
    if (!rx_ring->buffer_info)
        goto err;

    for (i = 0; i < rx_ring->count; i++) {
        buffer_info = &rx_ring->buffer_info[i];
        buffer_info->ps_pages = kcalloc(PS_PAGE_BUFFERS,
                                        sizeof(struct e1000_ps_page),
                                        GFP_KERNEL);
        if (!buffer_info->ps_pages)
            goto err_pages;
    }

    desc_len = sizeof(union e1000_rx_desc_packet_split);

    /* Round up to nearest 4K */
    rx_ring->size = rx_ring->count * desc_len;
    rx_ring->size = ALIGN(rx_ring->size, 4096);

    err = e1000_alloc_ring_dma(adapter, rx_ring);
    if (err)
        goto err_pages;

    return 0;
err_pages:
    for (i = 0; i < rx_ring->count; i++)
        kfree(rx_ring->buffer_info[i].ps_pages);
err:
    vfree(rx_ring->buffer_info);
    return err;
}
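Both setup functions size the descriptor ring as count times descriptor length and then round up to a 4 KiB multiple. The arithmetic can be checked with a small stand-alone sketch; `ALIGN_UP` and `ring_bytes` are illustrative names, and the 16-byte and 32-byte descriptor lengths used in the example are assumptions chosen for the demonstration.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to a multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Bytes of DMA memory needed for a ring of `count` descriptors. */
static size_t ring_bytes(size_t count, size_t desc_len)
{
    return ALIGN_UP(count * desc_len, 4096);
}
```

With 256 descriptors of 16 bytes each the ring is exactly one 4096-byte page; one more descriptor pushes it to two pages.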
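e1000_configure mainly does the following:
1. Sets the receive mode, the management settings and so on.
2. Programs the DMA addresses used on the transmit side.
3. Installs the receive-side buffer allocation callback and cleanup callback, and programs the DMA addresses used on the receive side.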
static void e1000_configure(struct e1000_adapter *adapter)
{
    struct e1000_ring *rx_ring = adapter->rx_ring;

    e1000e_set_rx_mode(adapter->netdev);
    e1000_restore_vlan(adapter);
    e1000_init_manageability_pt(adapter);

    e1000_configure_tx(adapter);    /* program the transmit DMA addresses */

    if (adapter->netdev->features & NETIF_F_RXHASH)
        e1000e_setup_rss_hash(adapter);
    e1000_setup_rctl(adapter);
    /* install the receive alloc and clean callbacks, program the receive DMA addresses */
    e1000_configure_rx(adapter);
    adapter->alloc_rx_buf(rx_ring, e1000_desc_unused(rx_ring), GFP_KERNEL);
}
It then calls e1000_configure_tx to set up the transmit side:
static void e1000_configure_tx(struct e1000_adapter *adapter)
{
    struct e1000_hw *hw = &adapter->hw;
    struct e1000_ring *tx_ring = adapter->tx_ring;
    u64 tdba;
    u32 tdlen, tctl, tarc;

    /* Setup the HW Tx Head and Tail descriptor pointers */
    tdba = tx_ring->dma;
    /* ... the rest of the function is omitted in this excerpt ... */
}
Finally it calls e1000_configure_rx to set up the receive side:
static void e1000_configure_rx(struct e1000_adapter *adapter)
{
    struct e1000_hw *hw = &adapter->hw;
    struct e1000_ring *rx_ring = adapter->rx_ring;
    u64 rdba;
    u32 rdlen, rctl, rxcsum, ctrl_ext;

    rdlen = rx_ring->count * sizeof(union e1000_rx_desc_extended);
    adapter->clean_rx = e1000_clean_rx_irq;          /* cleanup callback */
    adapter->alloc_rx_buf = e1000_alloc_rx_buffers;  /* buffer allocation callback */

    rdba = rx_ring->dma;
    /* ... the rest of the function is omitted in this excerpt ... */
}
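e1000_clean_rx_irq is the receive routine. It unmaps the skb of each received packet, has a new skb allocated and DMA-mapped onto the descriptor, and passes the packet to e1000_receive_skb, which hands it to the upper-layer packet processing in the kernel: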
static bool e1000_clean_rx_irq(struct e1000_ring *rx_ring, int *work_done,
                               int work_to_do)
{
    /* ... declarations and the loop over completed descriptors are omitted ... */

    dma_rmb();  /* read descriptor and rx_buffer_info after status DD */

    skb = buffer_info->skb;
    buffer_info->skb = NULL;

    cleaned_count++;
    dma_unmap_single(&pdev->dev, buffer_info->dma,
                     adapter->rx_buffer_len, DMA_FROM_DEVICE);
    buffer_info->dma = 0;

    length = le16_to_cpu(rx_desc->wb.upper.length);

    if (length < copybreak) {
        struct sk_buff *new_skb = napi_alloc_skb(&adapter->napi, length);
        /* ... a small packet is copied into new_skb so the original buffer can be reused ... */
    }

    e1000_receive_skb(adapter, netdev, skb, staterr, rx_desc->wb.upper.vlan);

    /* ... */
}
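e1000_alloc_rx_buffers allocates the receive skbs for the 256-descriptor ring, passes each skb's DMA address to the hardware, and tells the DMA engine that it may start receiving at descriptor 0 and keep going for up to 256 packets: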
static void e1000_alloc_rx_buffers(struct e1000_ring *rx_ring,
                                   int cleaned_count, gfp_t gfp)
{
    struct e1000_adapter *adapter = rx_ring->adapter;
    struct net_device *netdev = adapter->netdev;
    struct pci_dev *pdev = adapter->pdev;
    union e1000_rx_desc_extended *rx_desc;
    struct e1000_buffer *buffer_info;
    struct sk_buff *skb;
    unsigned int i;
    unsigned int bufsz = adapter->rx_buffer_len;

    i = rx_ring->next_to_use;
    buffer_info = &rx_ring->buffer_info[i];

    while (cleaned_count--) {
        skb = buffer_info->skb;
        if (skb) {
            /* a recycled skb is still attached: reuse it */
            skb_trim(skb, 0);
            goto map_skb;
        }

        skb = __netdev_alloc_skb_ip_align(netdev, bufsz, gfp);
        if (!skb) {
            /* Better luck next round */
            adapter->alloc_rx_buff_failed++;
            break;
        }

        buffer_info->skb = skb;
map_skb:
        buffer_info->dma = dma_map_single(&pdev->dev, skb->data,
                                          adapter->rx_buffer_len,
                                          DMA_FROM_DEVICE);
        /* ... writing the address into the descriptor and advancing i are omitted ... */
    }
}
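The refill count passed to the allocation callback comes from e1000_desc_unused, whose body the article does not show. Below is a sketch of how such a helper could work for a ring tracked by two indices. The names `toy_ring`, `toy_desc_unused`, `toy_refill` and `toy_clean` are hypothetical, and the convention of always leaving one slot empty (so that a full ring can be told apart from an empty one) is an assumption of this sketch.

```c
#include <assert.h>

struct toy_ring {
    unsigned int count;          /* number of descriptors */
    unsigned int next_to_use;    /* next slot the driver will fill */
    unsigned int next_to_clean;  /* next slot the driver will reclaim */
};

/* Free slots; one slot always stays empty so that full != empty. */
static unsigned int toy_desc_unused(const struct toy_ring *r)
{
    if (r->next_to_clean > r->next_to_use)
        return r->next_to_clean - r->next_to_use - 1;
    return r->count + r->next_to_clean - r->next_to_use - 1;
}

/* Hand n fresh buffers to the hardware, wrapping at the ring end. */
static void toy_refill(struct toy_ring *r, unsigned int n)
{
    assert(n <= toy_desc_unused(r));
    r->next_to_use = (r->next_to_use + n) % r->count;
}

/* Reclaim n descriptors that the hardware has finished with. */
static void toy_clean(struct toy_ring *r, unsigned int n)
{
    r->next_to_clean = (r->next_to_clean + n) % r->count;
}
```

Under this convention a freshly initialised 256-entry ring reports 255 unused slots, and every cleaned descriptor makes exactly one more slot available for refilling.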
3. Requesting MSI-X and regular interrupts
static int e1000_request_irq(struct e1000_adapter *adapter)
{
    struct net_device *netdev = adapter->netdev;
    int err;

    if (adapter->msix_entries) {
        err = e1000_request_msix(adapter);
        if (!err)
            return err;
        /* fall back to MSI */
        e1000e_reset_interrupt_capability(adapter);
        adapter->int_mode = E1000E_INT_MODE_MSI;
        e1000e_set_interrupt_capability(adapter);
    }
    if (adapter->flags & FLAG_MSI_ENABLED) {
        err = request_irq(adapter->pdev->irq, e1000_intr_msi, 0,
                          netdev->name, netdev);
        if (!err)
            return err;
        /* fall back to legacy interrupt */
        e1000e_reset_interrupt_capability(adapter);
        adapter->int_mode = E1000E_INT_MODE_LEGACY;
    }
    err = request_irq(adapter->pdev->irq, e1000_intr, IRQF_SHARED,
                      netdev->name, netdev);
    return err;
}
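The function above tries the most capable interrupt mode first and steps down one level on each failure. Stripped of the kernel calls, the decision chain can be sketched as follows; `toy_platform`, `toy_int_mode` and `toy_request_irq` are hypothetical names, and the three booleans stand in for whether each real request would succeed.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_int_mode {
    TOY_INT_NONE,
    TOY_INT_LEGACY,
    TOY_INT_MSI,
    TOY_INT_MSIX
};

/* What the (hypothetical) platform is able to grant. */
struct toy_platform {
    bool msix_ok;
    bool msi_ok;
    bool legacy_ok;
};

/* Try MSI-X, then MSI, then the shared legacy interrupt line. */
static enum toy_int_mode toy_request_irq(const struct toy_platform *p)
{
    if (p->msix_ok)
        return TOY_INT_MSIX;
    if (p->msi_ok)
        return TOY_INT_MSI;
    if (p->legacy_ok)
        return TOY_INT_LEGACY;
    return TOY_INT_NONE;
}
```

The sketch leaves out the reset and re-setup of the interrupt capability that the driver performs between attempts.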
e1000_request_msix registers three interrupt handlers: e1000_intr_msix_rx, e1000_intr_msix_tx and e1000_msix_other.
static int e1000_request_msix(struct e1000_adapter *adapter)
{
    struct net_device *netdev = adapter->netdev;
    int err = 0, vector = 0;

    /* first vector: receive */
    err = request_irq(adapter->msix_entries[vector].vector,
                      e1000_intr_msix_rx, 0, adapter->rx_ring->name, netdev);
    if (err)
        return err;
    vector++;

    /* second vector: transmit */
    err = request_irq(adapter->msix_entries[vector].vector,
                      e1000_intr_msix_tx, 0, adapter->tx_ring->name, netdev);
    if (err)
        return err;
    adapter->tx_ring->itr_register = adapter->hw.hw_addr +
                                     E1000_EITR_82574(vector);
    adapter->tx_ring->itr_val = adapter->itr;
    vector++;

    /* third vector: everything else */
    err = request_irq(adapter->msix_entries[vector].vector,
                      e1000_msix_other, 0, netdev->name, netdev);
    if (err)
        return err;

    /* configure how often, and for which causes, the device raises interrupts */
    e1000_configure_msix(adapter);

    return 0;
}
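e1000_intr_msix_rx is the receive interrupt handler. By scheduling NAPI it raises a softirq, which tells the kernel to run the NIC's NAPI poll function (called e1000_clean here; the probe code above registers it as e1000e_poll). The poll function clears the mapping information of the skbs in the receive queue, allocates the same number of new skbs, and decides from the traffic volume whether to re-enable the regular interrupt: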
static irqreturn_t e1000_intr_msix_rx(int __always_unused irq, void *data)
{
    struct net_device *netdev = data;
    struct e1000_adapter *adapter = netdev_priv(netdev);
    struct e1000_ring *rx_ring = adapter->rx_ring;

    /* Write the ITR value calculated at the end of the
     * previous interrupt.
     */
    if (rx_ring->set_itr) {
        u32 itr = rx_ring->itr_val ?
                  1000000000 / (rx_ring->itr_val * 256) : 0;

        writel(itr, rx_ring->itr_register);
        rx_ring->set_itr = 0;
    }

    /* hand the real work to NAPI: the poll function runs in softirq context */
    if (napi_schedule_prep(&adapter->napi)) {
        adapter->total_rx_bytes = 0;
        adapter->total_rx_packets = 0;
        __napi_schedule(&adapter->napi);
    }
    return IRQ_HANDLED;
}
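The ITR expression in the handler converts a target interrupt rate (interrupts per second) into an interval measured in 256 ns units: one second is 1000000000 ns, so the interval is 1000000000 / (rate * 256). A stand-alone version of the same arithmetic, with the hypothetical name `toy_itr_to_reg`:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert an interrupt rate (interrupts per second) into a throttle
 * interval in 256 ns units. A rate of 0 means "no throttling" and
 * maps to 0, as in the handler above.
 */
static uint32_t toy_itr_to_reg(uint32_t ints_per_sec)
{
    return ints_per_sec ? 1000000000u / (ints_per_sec * 256u) : 0;
}
```

For a target of 20000 interrupts per second the result is 195, that is 195 * 256 ns, roughly 50 microseconds between interrupts.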
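e1000_intr_msix_tx is the transmit interrupt handler. It resets the per-interrupt counters and calls e1000_clean_tx_irq, which removes the DMA mapping of each skb that has finished transmitting and frees its memory: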
static irqreturn_t e1000_intr_msix_tx(int __always_unused irq, void *data)
{
    struct net_device *netdev = data;
    struct e1000_adapter *adapter = netdev_priv(netdev);
    struct e1000_hw *hw = &adapter->hw;
    struct e1000_ring *tx_ring = adapter->tx_ring;

    adapter->total_tx_bytes = 0;
    adapter->total_tx_packets = 0;

    if (!e1000_clean_tx_irq(tx_ring))
        /* Ring was not completely cleaned, so fire another interrupt */
        ew32(ICS, tx_ring->ims_val);

    if (!test_bit(__E1000_DOWN, &adapter->state))
        ew32(IMS, adapter->tx_ring->ims_val);

    return IRQ_HANDLED;
}
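Finally, the transmit function. The stack sends a packet by calling the driver's e1000_xmit_frame, whose main job is to DMA-map the packet to be sent into the transmit ring and tell the hardware to send it: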
static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
                                    struct net_device *netdev)
{
    struct e1000_adapter *adapter = netdev_priv(netdev);
    struct e1000_ring *tx_ring = adapter->tx_ring;
    unsigned int first;
    unsigned int tx_flags = 0;
    unsigned int len = skb_headlen(skb);
    unsigned int nr_frags;
    unsigned int mss;
    int count = 0;
    int tso;
    unsigned int f;

    /* ... checks, TSO and checksum setup are omitted in this excerpt ... */
    nr_frags = skb_shinfo(skb)->nr_frags;
    first = tx_ring->next_to_use;

    /* if count is 0 then a mapping error has occurred */
    count = e1000_tx_map(tx_ring, skb, first, adapter->tx_fifo_limit,
                         nr_frags);
    if (count) {
        netdev_sent_queue(netdev, skb->len);
        e1000_tx_queue(tx_ring, tx_flags, count);
        /* Make sure there is space in the ring for the next send. */
        e1000_maybe_stop_tx(tx_ring,
                            (MAX_SKB_FRAGS *
                             DIV_ROUND_UP(PAGE_SIZE,
                                          adapter->tx_fifo_limit) + 2));
    } else {
        /* mapping failed: drop the packet here; a queued skb is freed
         * later by e1000_clean_tx_irq
         */
        dev_kfree_skb_any(skb);
    }

    return NETDEV_TX_OK;
}
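Summary
1. Rough receive flow:
- Allocate skbs, DMA-map them, and let the hardware start receiving.
- After packets have arrived by DMA the hardware raises an interrupt. The handler runs, the DMA mappings are removed, the same number of new skbs are allocated and mapped again, which refills the free DMA slots.
- Depending on the traffic volume, decide whether to re-enable the regular interrupt (its handling is described above).
- Repeat: e1000_intr_msix_rx -> e1000_clean -> e1000_clean_rx_irq -> e1000_receive_skb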
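The interplay between the interrupt and the polling loop in this flow can be modelled in a few lines. This is a generic NAPI-style sketch, not the driver's code: `toy_napi`, `toy_rx_interrupt` and `toy_poll` are hypothetical names, and the rule "stop polling and re-enable the interrupt when fewer packets than the budget were processed" is the usual NAPI convention assumed here.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_napi {
    unsigned int pending;   /* packets waiting in the receive ring */
    bool irq_enabled;       /* is the receive interrupt unmasked? */
    bool scheduled;         /* is a poll queued for the softirq? */
};

/* Interrupt handler: mask further rx interrupts and schedule a poll. */
static void toy_rx_interrupt(struct toy_napi *n)
{
    if (!n->scheduled) {
        n->irq_enabled = false;
        n->scheduled = true;
    }
}

/* Poll: process at most `budget` packets; return how many were handled. */
static unsigned int toy_poll(struct toy_napi *n, unsigned int budget)
{
    unsigned int done = n->pending < budget ? n->pending : budget;

    n->pending -= done;
    if (done < budget) {
        /* Ring drained: stop polling and go back to interrupts. */
        n->scheduled = false;
        n->irq_enabled = true;
    }
    return done;
}
```

With 100 packets pending and a budget of 64 (the weight passed to netif_napi_add in the probe code), the first poll handles 64 packets and stays scheduled; the second handles the remaining 36 and re-enables the interrupt.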
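2. Rough transmit flow:
- The stack calls the driver's e1000_xmit_frame to send an skb.
- When transmission completes, the hardware raises an interrupt and e1000_intr_msix_tx runs, which removes the skb's DMA mapping.
- Repeat: e1000_xmit_frame -> e1000_tx_map -> e1000_tx_queue -> e1000_intr_msix_tx -> e1000_clean_tx_irq -> (skb_dma_unmap, dev_kfree_skb_any)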
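The key point of the transmit flow is ownership: a buffer belongs to the ring from the moment it is queued until the completion interrupt releases it. A small user-space model of that lifecycle follows. The names `toy_tx_ring`, `toy_xmit` and `toy_clean_tx` are hypothetical, the ring is shrunk to 4 slots for readability, and storing or clearing a pointer stands in for DMA mapping and unmapping.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_TX_COUNT 4u

struct toy_tx_ring {
    const void *skb[TOY_TX_COUNT]; /* buffers owned by the ring in flight */
    unsigned int next_to_use;
    unsigned int next_to_clean;
    unsigned int freed;            /* buffers released after completion */
};

/* Transmit: park the buffer in the ring; 0 on success, -1 if full. */
static int toy_xmit(struct toy_tx_ring *r, const void *skb)
{
    unsigned int next = (r->next_to_use + 1) % TOY_TX_COUNT;

    if (next == r->next_to_clean)
        return -1;                 /* caller should stop the queue */
    r->skb[r->next_to_use] = skb;  /* stands in for the DMA mapping */
    r->next_to_use = next;
    return 0;
}

/* Completion interrupt: release up to `done` in-flight buffers. */
static void toy_clean_tx(struct toy_tx_ring *r, unsigned int done)
{
    while (done-- && r->next_to_clean != r->next_to_use) {
        r->skb[r->next_to_clean] = NULL;  /* stands in for unmap + free */
        r->next_to_clean = (r->next_to_clean + 1) % TOY_TX_COUNT;
        r->freed++;
    }
}
```

When `toy_xmit` reports a full ring, the sender has to wait until a completion interrupt has cleaned some slots, which is the role e1000_maybe_stop_tx plays in the driver.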