https://stackoverflow.com/questions/48360238/how-can-the-l1-l2-l3-cpu-caches-be-turned-off-on-modern-x86-amd64-chips
Every modern high-performance CPU of the x86/x86_64 architecture has a hierarchy of data caches: L1, L2, and sometimes L3 (and L4 in very rare cases). Data loaded from or stored to main RAM is cached in some of them.
Sometimes the programmer may want some data not to be cached in some or all cache levels (for example, when wanting to memset 16 GB of RAM while keeping other data in the cache). There are non-temporal (NT) instructions for this, like MOVNTDQA (https://stackoverflow.com/a/37092; http://lwn.net/Articles/255364/).
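For the memset case, note that MOVNTDQA is the non-temporal load (SSE4.1); filling memory uses the non-temporal store MOVNTDQ, which C exposes as _mm_stream_si128. Below is a minimal sketch, assuming x86-64 with SSE2 and a hypothetical helper name nt_memset. Unaligned head and tail bytes use ordinary stores, the 16-byte-aligned middle uses streaming stores, and a final SFENCE orders the weakly-ordered NT stores before any later stores. The NT hint only reduces cache pollution; it does not turn the caches off.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_set1_epi8, _mm_stream_si128; also pulls in _mm_sfence */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill n bytes at dst with value, using
   non-temporal (cache-bypassing) stores for the aligned middle part. */
void nt_memset(void *dst, unsigned char value, size_t n) {
    assert(dst != NULL || n == 0);
    unsigned char *p = dst;

    /* Head: ordinary byte stores until p is 16-byte aligned. */
    while (n > 0 && ((uintptr_t)p & 15u) != 0) {
        *p++ = value;
        n--;
    }

    /* Middle: MOVNTDQ writes whole 16-byte chunks through write-combining buffers. */
    const __m128i v = _mm_set1_epi8((char)value);
    for (; n >= 16; p += 16, n -= 16)
        _mm_stream_si128((__m128i *)p, v);

    /* Tail: remaining bytes with ordinary stores. */
    while (n > 0) {
        *p++ = value;
        n--;
    }

    /* NT stores are weakly ordered; SFENCE makes them globally visible
       before any store that follows the call. */
    _mm_sfence();
}
```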
But is there a programmatic way (for some AMD or Intel CPU families like P3, P4, Core, Core i*, ...) to completely (but temporarily) turn off some or all levels of the cache, changing how every memory access instruction (globally, or for some applications or regions of RAM) uses the memory hierarchy? For example: turn off L1, or turn off L1 and L2? Or change every memory access type to "uncached" (UC)? (The CD and NW bits of CR0? See SDM vol. 3A, pages 423, 424 and 425, and "Third-Level Cache Disable flag, bit 6 of the IA32_MISC_ENABLE MSR (Available only in processors based on Intel NetBurst microarchitecture): Allows the L3 cache to be disabled and enabled, independently of the L1 and L2 caches.")
I think such an action would help protect data from cache side-channel attacks and leaks (stealing AES keys, covert cache channels, Meltdown/Spectre), although disabling the caches would carry an enormous performance cost.
PS: I remember such a program posted many years ago on some technical news website, but can't find it now. It was just a Windows exe to write some magical values into an MSR and make every Windows program running after it very slow. The caches were turned off until reboot or until starting the program with the "undo" option.
x86 intel cpu-cache memory-access msr
edited Jan 21 '18 at 21:19 by Boann
asked Jan 20 '18 at 19:26 by osgx
Hello! Check "Disabling and Enabling the L3 Cache" (and around) and "MTRR" sections of the Intel SDM vol. 3a software.intel.com/en-us/articles/intel-sdm - xem.github.io/minix86/manual/intel-x86-and-64-manual-vol3/… xem.github.io/minix86/manual/intel-x86-and-64-manual-vol3/… "The third-level cache disable flag (bit 6 of the IA32_MISC_ENABLE MSR) allows the L3 cache to be disabled and enabled, independently of the L1 and L2 caches", and IA32_MISC_ENABLE, page 424 "Table 11-5. Cache Operating Modes" CD flag of CR0 reg – osgx Jan 20 '18 at 19:36
Possible duplicate of enable/disable cache on intel 64bit machine: CD bit always set? and system becomes extremely slow after disable cache. Also: Disabling cache was used to attack SGX enclave: Georgia/MS 2016 1611.06952v1 "Inferring Fine-grained Control Flow Inside SGX Enclaves with Branch Shadowing" "Disabling cache. If we want to attack .. short loop" – osgx Jan 20 '18 at 19:49
Also: linuxquestions.org/questions/linux-kernel-70/… and memtest's cache_on/cache_off functions: github.com/vathpela/memtest86-/blob/master/test.h#L206; software.intel.com/en-us/forums/… "The caches may not be used but they are not disabled." and software.intel.com/en-us/forums/… "CR0.CD has a scope of "core".". Also for partial disable: PCD "page-level cache disable (bit 4 of cr3)" – osgx Jan 20 '18 at 19:52
1 Answer
Intel's manual 3A, Section 11.5.3, provides an algorithm to globally disable the caches:
11.5.3 Preventing Caching
To disable the L1, L2, and L3 caches after they have been enabled and have received cache fills, perform the following steps:
1. Enter the no-fill cache mode. (Set the CD flag in control register CR0 to 1 and the NW flag to 0.)
2. Flush all caches using the WBINVD instruction.
3. Disable the MTRRs and set the default memory type to uncached or set all MTRRs for the uncached memory type (see the discussion of the TYPE field and the E flag in Section 11.11.2.1, “IA32_MTRR_DEF_TYPE MSR”).
The caches must be flushed (step 2) after the CD flag is set to ensure system memory coherency. If the caches are not flushed, cache hits on reads will still occur and data will be read from valid cache lines.
The intent of the three separate steps listed above addresses three distinct requirements: (i) discontinue new data replacing existing data in the cache (ii) ensure data already in the cache are evicted to memory, (iii) ensure subsequent memory references observe UC memory type semantics. Different processor implementation of caching control hardware may allow some variation of software implementation of these three requirements. See note below.
NOTES Setting the CD flag in control register CR0 modifies the processor’s caching behaviour as indicated in Table 11-5, but setting the CD flag alone may not be sufficient across all processor families to force the effective memory type for all physical memory to be UC nor does it force strict memory ordering, due to hardware implementation variations across different processor families. To force the UC memory type and strict memory ordering on all of physical memory, it is sufficient to either program the MTRRs for all physical memory to be UC memory type or disable all MTRRs.
For the Pentium 4 and Intel Xeon processors, after the sequence of steps given above has been executed, the cache lines containing the code between the end of the WBINVD instruction and before the MTRRs have actually been disabled may be retained in the cache hierarchy. Here, to remove code from the cache completely, a second WBINVD instruction must be executed after the MTRRs have been disabled.
That's a long quote, but it boils down to this code:
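;Must run at CPL 0 (kernel mode). 32-bit code: in 64-bit mode use rax instead of eax for CR0.
IA32_MTRR_DEF_TYPE equ 0x2FF
;Step 1 - Enter no-fill mode
mov eax, cr0
or eax, 1<<30 ; Set bit CD
and eax, ~(1<<29) ; Clear bit NW
mov cr0, eax
;Step 2 - Invalidate all the caches
wbinvd
;All memory accesses happen from/to memory now, but UC memory ordering may still not be enforced.
;For Atom processors, we are done, UC semantics are automatically enforced.
;Step 3 - Disable the MTRRs (E=0) with default memory type UC (0)
xor eax, eax
xor edx, edx
mov ecx, IA32_MTRR_DEF_TYPE ;MSR number is 2FFH
wrmsr
;Pentium 4/Xeon only: flush the lines holding this code out of the cache hierarchy
wbinvd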
Most of this code is privileged and cannot be executed from user mode.
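The bit manipulation in step 1 can be sketched in C, assuming the current CR0 value has already been read in kernel mode (the mov to and from CR0 is privileged and is left out). The four CD/NW combinations follow Table 11-5, "Cache Operating Modes", of the SDM cited in the comments above; the enum and function names are hypothetical, and 0x80050033 in the usage below is only an example CR0 value.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_NW (UINT64_C(1) << 29) /* Not Write-through */
#define CR0_CD (UINT64_C(1) << 30) /* Cache Disable */

/* Cache operating modes selected by CR0.CD and CR0.NW (Intel SDM Vol. 3A, Table 11-5). */
typedef enum {
    MODE_NORMAL,       /* CD=0, NW=0: normal cache mode */
    MODE_INVALID,      /* CD=0, NW=1: invalid, MOV to CR0 raises #GP(0) */
    MODE_NO_FILL,      /* CD=1, NW=0: no-fill cache mode, coherency maintained */
    MODE_NO_COHERENCY  /* CD=1, NW=1: memory coherency is not maintained */
} cache_mode;

cache_mode cr0_cache_mode(uint64_t cr0) {
    int cd = (cr0 & CR0_CD) != 0;
    int nw = (cr0 & CR0_NW) != 0;
    if (!cd)
        return nw ? MODE_INVALID : MODE_NORMAL;
    return nw ? MODE_NO_COHERENCY : MODE_NO_FILL;
}

/* Step 1 of the procedure: set CD, clear NW, leave every other CR0 bit unchanged. */
uint64_t cr0_enter_no_fill(uint64_t cr0) {
    uint64_t next = (cr0 | CR0_CD) & ~CR0_NW;
    assert(cr0_cache_mode(next) == MODE_NO_FILL);
    return next;
}
```

With the example value, cr0_enter_no_fill(0x80050033) yields 0xC0050033: only bit 30 changes, which is exactly what the "or eax, 1<<30" / "and eax, ~(1<<29)" pair does.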
Volume 2 of AMD's manual (AMD64 Architecture Programmer's Manual) provides a similar algorithm in Section 7.6.2:
7.6.2 Cache Control Mechanisms
The AMD64 architecture provides a number of mechanisms for controlling the cacheability of memory. These are described in the following sections.
Cache Disable. Bit 30 of the CR0 register is the cache-disable bit, CR0.CD. Caching is enabled when CR0.CD is cleared to 0, and caching is disabled when CR0.CD is set to 1. When caching is disabled, reads and writes access main memory.
Software can disable the cache while the cache still holds valid data (or instructions). If a read or write hits the L1 data cache or the L2 cache when CR0.CD=1, the processor does the following:
Writes the cache line back if it is in the modified or owned state.
Invalidates the cache line.
Performs a non-cacheable main-memory access to read or write the data.
If an instruction fetch hits the L1 instruction cache when CR0.CD=1, some processor models may read the cached instructions rather than access main memory. When CR0.CD=1, the exact behavior of L2 and L3 caches is model-dependent, and may vary for different types of memory accesses.
The processor also responds to cache probes when CR0.CD=1. Probes that hit the cache cause the processor to perform Step 1. Step 2 (cache-line invalidation) is performed only if the probe is performed on behalf of a memory write or an exclusive read.
Writethrough Disable. Bit 29 of the CR0 register is the not writethrough disable bit, CR0.NW. In early x86 processors, CR0.NW is used to control cache writethrough behavior, and the combination of CR0.NW and CR0.CD determines the cache operating mode.
[...]
In implementations of the AMD64 architecture, CR0.NW is not used to qualify the cache operating mode established by CR0.CD.
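To make the rules quoted above concrete, here is a toy model in C (not hardware code) of what the processor does when a line in a given MOESI state is hit while CR0.CD=1. The state names follow AMD's MOESI protocol; the struct, enum and function names are hypothetical, and the model-dependent L2/L3 and instruction-fetch behavior mentioned in the quote is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* MOESI cache-line states. */
typedef enum { LINE_INVALID, LINE_SHARED, LINE_EXCLUSIVE, LINE_OWNED, LINE_MODIFIED } line_state;

typedef struct {
    bool write_back;      /* dirty data is written back to memory first */
    bool invalidate;      /* the line is removed from the cache */
    bool uncached_access; /* the access itself is performed as non-cacheable */
} cd_actions;

static bool is_dirty(line_state s) { return s == LINE_MODIFIED || s == LINE_OWNED; }

/* The processor's own read or write hitting L1D or L2 while CR0.CD=1. */
cd_actions on_access_hit(line_state s) {
    assert(s != LINE_INVALID); /* a hit implies a valid line */
    cd_actions a = { is_dirty(s), true, true };
    return a;
}

typedef enum { PROBE_READ, PROBE_EXCLUSIVE_READ, PROBE_WRITE } probe_kind;

/* An external probe hitting the cache while CR0.CD=1: the write-back step
   always applies, invalidation only for writes and exclusive reads. */
cd_actions on_probe_hit(line_state s, probe_kind k) {
    assert(s != LINE_INVALID);
    cd_actions a = { is_dirty(s), k != PROBE_READ, false };
    return a;
}
```

For example, an ordinary read probe that hits an Owned line writes the data back but leaves the line valid, while a write probe to the same line would also invalidate it.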
This translates to the following code (very similar to Intel's):
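;Must run at CPL 0 (kernel mode)
MTRRdefType equ 0x2FF
;Step 1 - Disable the caches
mov eax, cr0
or eax, 1<<30 ; Set bit CD (NW does not qualify the cache mode on AMD64)
mov cr0, eax
;For some models we need to invalidate the L1I
wbinvd
;Step 2 - Disable speculative accesses (MTRRs off, default memory type UC)
xor eax, eax
xor edx, edx
mov ecx, MTRRdefType ;MSR number is 2FFH
wrmsr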
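Both versions end by writing 0 to MSR 2FFH. A small C decoder shows why that single WRMSR is enough. It assumes the IA32_MTRR_DEF_TYPE layout from the Intel SDM: default type in bits 7:0, FE (fixed-range enable) in bit 10, E (MTRR enable) in bit 11. With E=0 the MTRRs are disabled and UC applies to all of physical memory. The struct and function names are hypothetical, and 0xC06 in the usage below is just an example value (MTRRs enabled, default type WB).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_MTRR_DEF_TYPE 0x2FFu /* IA32_MTRR_DEF_TYPE (Intel) / MTRRdefType (AMD) */

/* Memory-type encodings valid in MTRRs. */
enum { MT_UC = 0, MT_WC = 1, MT_WT = 4, MT_WP = 5, MT_WB = 6 };

typedef struct {
    uint8_t type;       /* bits 7:0, default memory type */
    bool fixed_enabled; /* bit 10, FE */
    bool enabled;       /* bit 11, E */
} mtrr_def_type;

mtrr_def_type decode_def_type(uint64_t msr) {
    mtrr_def_type d = { (uint8_t)(msr & 0xFF), (msr >> 10) & 1, (msr >> 11) & 1 };
    return d;
}

uint64_t encode_def_type(mtrr_def_type d) {
    return (uint64_t)d.type
         | ((uint64_t)d.fixed_enabled << 10)
         | ((uint64_t)d.enabled << 11);
}

/* Type applied to memory not covered by any range: with E=0 everything is UC,
   regardless of the TYPE field. */
uint8_t effective_default(mtrr_def_type d) {
    return d.enabled ? d.type : (uint8_t)MT_UC;
}
```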
Caches can also be selectively disabled at:
Page level, with the attribute bits PCD (Page Cache Disable) and PWT (Page Write-Through) [Only for Pentium Pro and Pentium II].
When both are clear, the memory type from the relevant MTRR is used; if PCD is set, the page is not cached.
Page level, with the PAT (Page Attribute Table) mechanism.
By filling IA32_PAT with caching types and using the bits PAT, PCD and PWT as a 3-bit index, it's possible to select one of the six caching types (UC-, UC, WC, WT, WP, WB).
Using the MTRRs (fixed or variable).
By setting the caching type to UC for specific physical areas (UC- is only available as a PAT type, not as an MTRR type).
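The PAT lookup can be sketched in C. This assumes the Intel SDM layout: IA32_PAT (MSR 277H) holds eight 1-byte entries whose low 3 bits encode the type, the index is PAT*4 + PCD*2 + PWT, and in a 4 KiB page-table entry PWT, PCD and PAT sit at bits 3, 4 and 7. PAT_RESET_VALUE is the documented power-on value (WB, WT, UC-, UC, repeated); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_IA32_PAT 0x277u

/* Memory-type encodings in IA32_PAT entries (2 and 3 are reserved). */
enum { PAT_UC = 0, PAT_WC = 1, PAT_WT = 4, PAT_WP = 5, PAT_WB = 6, PAT_UC_MINUS = 7 };

/* Power-on value: entries 0..7 = WB, WT, UC-, UC, WB, WT, UC-, UC. */
#define PAT_RESET_VALUE UINT64_C(0x0007040600070406)

/* PAT is index bit 2, PCD bit 1, PWT bit 0. */
unsigned pat_index(unsigned pat, unsigned pcd, unsigned pwt) {
    assert(pat <= 1 && pcd <= 1 && pwt <= 1);
    return (pat << 2) | (pcd << 1) | pwt;
}

/* In a 4 KiB page-table entry: PWT is bit 3, PCD bit 4, PAT bit 7. */
unsigned pat_index_from_pte4k(uint64_t pte) {
    return pat_index((unsigned)((pte >> 7) & 1),
                     (unsigned)((pte >> 4) & 1),
                     (unsigned)((pte >> 3) & 1));
}

/* Each entry occupies one byte of the MSR; only the low 3 bits select the type. */
unsigned pat_entry_type(uint64_t pat_msr, unsigned index) {
    assert(index < 8);
    return (unsigned)(pat_msr >> (index * 8)) & 0x7u;
}
```

With the reset value, a page with only PCD set (index 2) is UC-, and a page with PCD and PWT set (index 3) is UC. This is why legacy PCD/PWT settings keep their meaning when the PAT is left at its default.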
Of these options only the page attributes can be exposed to user mode programs (see for example this).
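For the MTRR option, a variable range is described by a PhysBase/PhysMask pair (IA32_MTRR_PHYSBASEn at MSR 200H+2n, IA32_MTRR_PHYSMASKn at 201H+2n). The sketch below computes the two values for a UC range. It assumes the SDM layout: type in PhysBase bits 7:0, valid flag in PhysMask bit 11, address bits 12 up to MAXPHYADDR-1 in both, a power-of-two size of at least 4 KiB, and a size-aligned base. MAXPHYADDR (reported by CPUID.80000008H:EAX[7:0]) is passed in as a parameter; the names are hypothetical and nothing is written to an MSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MTRR_TYPE_UC 0u
#define MTRR_PHYSMASK_VALID (UINT64_C(1) << 11)

typedef struct { uint64_t physbase, physmask; } var_mtrr;

static bool is_pow2(uint64_t x) { return x != 0 && (x & (x - 1)) == 0; }

var_mtrr make_var_mtrr(uint64_t base, uint64_t size, unsigned type, unsigned maxphyaddr) {
    assert(is_pow2(size) && size >= 4096);  /* ranges are power-of-two sized */
    assert((base & (size - 1)) == 0);       /* ...and naturally aligned */
    assert(maxphyaddr > 12 && maxphyaddr < 64);
    uint64_t addr_bits = (UINT64_C(1) << maxphyaddr) - 1;
    var_mtrr m;
    m.physbase = (base & addr_bits & ~UINT64_C(0xFFF)) | type;
    m.physmask = (~(size - 1) & addr_bits & ~UINT64_C(0xFFF)) | MTRR_PHYSMASK_VALID;
    return m;
}

/* An address is covered when (addr & mask) == (base & mask). */
bool var_mtrr_matches(var_mtrr m, uint64_t addr) {
    uint64_t mask = m.physmask & ~UINT64_C(0xFFF);
    return (m.physmask & MTRR_PHYSMASK_VALID) != 0
        && (addr & mask) == (m.physbase & mask);
}
```

For example, marking 1 GiB at 0xC0000000 as UC on a CPU with 36 physical address bits gives PhysBase = 0xC0000000 and PhysMask = 0xFC0000800.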
answered Jan 20 '18 at 22:02 by Margaret Bloom
So, do the PCD bit and the MTRRs disable all levels of cache? What about selective disabling? Will bit 6 of the IA32_MISC_ENABLE MSR disable only L3, keeping L1 and L2 online, and is it documented? Is there usable Linux kernel module source to test CR0.CD disabling? – osgx Jan 21 '18 at 0:47
@osgx Bit 6 of IA32_MISC_ENABLE should be present only in the NetBurst architecture (Pentium 4 and the Xeons of that time). However, recent Xeons have CAT (Cache Allocation Technology), described in Section 17.18, to assign chunks of the LLC to cores (including no chunks at all). The MTRRs disable all the caches (the processor doesn't even respond to snoops). The PCD bit does the same, but due to page aliasing I'm not sure whether hits go all the way to memory (I believe so: any line that hits should be invalidated on access and eventually refilled when accessed from an aliasing page with caching enabled). – Margaret Bloom Jan 21 '18 at 9:50
@osgx I'm not aware of any kernel module, but there is an example module here. I'll see if I can turn it into a cache-disabling module, if you don't mind compiling it. – Margaret Bloom Jan 21 '18 at 9:54
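Short of a kernel module, MSRs can be read from user space on Linux through the msr driver: /dev/cpu/N/msr, where the file offset selects the MSR (this needs root and the msr module loaded). A minimal read-only sketch, assuming that interface, checks bit 6 of IA32_MISC_ENABLE (MSR 1A0H). As the comment above says, the bit is only meaningful on NetBurst; elsewhere treat it as reserved. The helper names are hypothetical, and the code deliberately never writes the MSR.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define MSR_IA32_MISC_ENABLE 0x1A0u
#define MISC_ENABLE_L3_DISABLE (UINT64_C(1) << 6) /* NetBurst only */

/* Reads one MSR on the given CPU via the Linux msr driver.
   Returns false if the device cannot be opened or the read fails. */
bool read_msr(int cpu, uint32_t msr, uint64_t *out) {
    char path[64];
    snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return false;
    ssize_t n = pread(fd, out, sizeof *out, (off_t)msr); /* offset = MSR index */
    close(fd);
    return n == (ssize_t)sizeof *out;
}

bool l3_disabled(uint64_t misc_enable) {
    return (misc_enable & MISC_ENABLE_L3_DISABLE) != 0;
}
```

Typical use would be: if read_msr(0, MSR_IA32_MISC_ENABLE, &v) succeeds, report l3_disabled(v); on a non-NetBurst CPU the result carries no meaning.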