3 CPU caches · 软件开发

https://lwn.net/Articles/252125/ ## 3.3.3 Write Behavior Before we start looking at the cache behavior when multiple execution contexts (threads or processes) use the same memory we have to explore a detail of cache implementations. Caches are supposed to be coherent and this coherency is supposed to be completely transparent for the userlevel code. Kernel code is a different story; it occasionally requires cache flushes. 在我们开始研究多个线程或进程同时使用相同内存之前，先来看一下缓存实现的一些细节。我们要求缓存是一致的，而且这种一致性必须对用户级代码完全透明。而内核代码则有所不同，它有时候需要对缓存进行转储(flush)。 ***** This specifically means that, if a cache line is modified, the result for the system after this point in time is the same as if there were no cache at all and the main memory location itself had been modified. This can be implemented in two ways or policies: 这意味着，如果对缓存线进行了修改，那么在这个时间点之后，系统的结果应该是与没有缓存的情况下是相同的，即主存的对应位置也已经被修改的状态。这种要求可以通过两种方式或策略实现： ***** * write-through cache implementation; * write-back cache implementation. * 写通(write-through) * 写回(write-back) ***** The write-through cache is the simplest way to implement cache coherency. If the cache line is written to, the processor immediately also writes the cache line into main memory. This ensures that, at all times, the main memory and cache are in sync. The cache content could simply be discarded whenever a cache line is replaced. This cache policy is simple but not very fast. A program which, for instance, modifies a local variable over and over again would create a lot of traffic on the FSB even though the data is likely not used anywhere else and might be short-lived. 写通比较简单。当修改缓存线时，处理器立即将它写入主存。这样可以保证主存与缓存的内容永远保持一致。当缓存线被替代时，只需要简单地将它丢弃即可。这种策略很简单，但是速度比较慢。如果某个程序反复修改一个本地变量，可能导致FSB上产生大量数据流，而不管这个变量是不是有人在用，或者是不是短期变量。 ***** The write-back policy is more sophisticated. Here the processor does not immediately write the modified cache line back to main memory. Instead, the cache line is only marked as dirty. When the cache line is dropped from the cache at some point in the future the dirty bit will instruct the processor to write the data back at that time instead of just discarding the content. 写回比较复杂。当修改缓存线时，处理器不再马上将它写入主存，而是打上已弄脏(dirty)的标记。当以后某个时间点缓存线被丢弃时，这个已弄脏标记会通知处理器把数据写回到主存中，而不是简单地扔掉。 ***** Write-back caches have the chance to be significantly better performing, which is why most memory in a system with a decent processor is cached this way. The processor can even take advantage of free capacity on the FSB to store the content of a cache line before the line has to be evacuated. This allows the dirty bit to be cleared and the processor can just drop the cache line when the room in the cache is needed. 写回有时候会有非常不错的性能，因此较好的系统大多采用这种方式。采用写回时，处理器们甚至可以利用FSB的空闲容量来存储缓存线。这样一来，当需要缓存空间时，处理器只需清除脏标记，丢弃缓存线即可。 ***** But there is a significant problem with the write-back implementation. When more than one processor (or core or hyper-thread) is available and accessing the same memory it must still be assured that both processors see the same memory content at all times. If a cache line is dirty on one processor (i.e., it has not been written back yet) and a second processor tries to read the same memory location, the read operation cannot just go out to the main memory. Instead the content of the first processor's cache line is needed. In the next section we will see how this is currently implemented. 但写回也有一个很大的问题。当有多个处理器(或核心、超线程)访问同一块内存时，必须确保它们在任何时候看到的都是相同的内容。如果缓存线在其中一个处理器上弄脏了(修改了，但还没写回主存)，而第二个处理器刚好要读取同一个内存地址，那么这个读操作不能去读主存，而需要读第一个处理器的缓存线。在下一节中，我们将研究如何实现这种需求。 ***** Before we get to this there are two more cache policies to mention: * write-combining; and * uncacheable. 在此之前，还有其它两种缓存策略需要提一下: * 写入合并 * 不可缓存 ***** Both these policies are used for special regions of the address space which are not backed by real RAM. The kernel sets up these policies for the address ranges (on x86 processors using the Memory Type Range Registers, MTRRs) and the rest happens automatically. The MTRRs are also usable to select between write-through and write-back policies. 这两种策略用于真实内存不支持的特殊地址区，内核为地址区设置这些策略(x86处理器利用内存类型范围寄存器MTRR)，余下的部分自动进行。MTRR还可用于写通和写回策略的选择。 ***** Write-combining is a limited caching optimization more often used for RAM on devices such as graphics cards. Since the transfer costs to the devices are much higher than the local RAM access it is even more important to avoid doing too many transfers. Transferring an entire cache line just because a word in the line has been written is wasteful if the next operation modifies the next word. One can easily imagine that this is a common occurrence, the memory for horizontal neighboring pixels on a screen are in most cases neighbors, too. As the name suggests, write-combining combines multiple write accesses before the cache line is written out. In ideal cases the entire cache line is modified word by word and, only after the last word is written, the cache line is written to the device. This can speed up access to RAM on devices significantly. 写入合并是一种有限的缓存优化策略，更多地用于显卡等设备之上的内存。由于设备的传输开销比本地内存要高的多，因此避免进行过多的传输显得尤为重要。如果仅仅因为修改了缓存线上的一个字，就传输整条线，而下个操作刚好是修改线上的下一个字，那么这次传输就过于浪费了。而这恰恰对于显卡来说是比较常见的情形——屏幕上水平邻接的像素往往在内存中也是靠在一起的。顾名思义，写入合并是在写出缓存线前，先将多个写入访问合并起来。在理想的情况下，缓存线被逐字逐字地修改，只有当写入最后一个字时，才将整条线写入内存，从而极大地加速内存的访问。 ***** Finally there is uncacheable memory. This usually means the memory location is not backed by RAM at all. It might be a special address which is hardcoded to have some functionality outside the CPU. For commodity hardware this most often is the case for memory mapped address ranges which translate to accesses to cards and devices attached to a bus (PCIe etc). On embedded boards one sometimes finds such a memory address which can be used to turn an LED on and off. Caching such an address would obviously be a bad idea. LEDs in this context are used for debugging or status reports and one wants to see this as soon as possible. The memory on PCIe cards can change without the CPU's interaction, so this memory should not be cached. 最后来讲一下不可缓存的内存。一般指的是不被RAM支持的内存位置，它可以是硬编码的特殊地址，承担CPU以外的某些功能。对于商用硬件来说，比较常见的是映射到外部卡或设备的地址。在嵌入式主板上，有时也有类似的地址，用来开关LED。对这些地址进行缓存显然没有什么意义。比如上述的LED，一般是用来调试或报告状态，显然应该尽快点亮或关闭。而对于那些PCI卡上的内存，由于不需要CPU的干涉即可更改，也不该缓存。