How to get macOS power metrics with Rust?

Posted at (1467 words, 8 minutes to read)
Also read on: medium

In this post, I'm going to talk about how macmon works, specifically where it gets system metrics.

macOS has a built-in powermetrics utility that can show the current CPU core frequencies, utilisation, and power consumption. It has to read these values from somewhere in the system, so the first thing to check is which shared libraries and function calls (symbols) it uses.

Several programs can show this, for example otool and nm:

> otool -L /usr/bin/powermetrics
 /usr/lib/libIOReport.dylib (compatibility version 1.0.0, current version 1.0.0)
 /usr/lib/libpmsample.dylib (compatibility version 1.0.0, current version 2.0.0)
 /usr/lib/libpmenergy.dylib (compatibility version 1.0.0, current version 2.0.0)
 /System/Library/Frameworks/IOKit.framework/Versions/A/IOKit (compatibility version 1.0.0, current version 275.0.0)
...

> nm -a /usr/bin/powermetrics
/usr/bin/powermetrics (for architecture arm64e):
  ...
  U _IOPMCopyPowerStateInfo
  U _IORegistryEntryCreateCFProperties
  U _IORegistryEntryCreateCFProperty
  U _IORegistryEntryFromPath
  ...

This gives a starting point. Next, I searched GitHub to see how these libraries are used in other repositories.

Getting CPU / GPU Usage

The first thing I found is that macOS has an IOReport shared library that returns a lot of hardware information, including sampled CPU/GPU and energy usage.

IOReport works on a subscription basis: you select the data channels you are interested in (or all available ones) and sample them to obtain actual metrics. Each channel has a group and a subgroup name (the subgroup may be missing). I'm interested in "Energy Model", "CPU Stats" / "CPU Core Performance States" and "GPU Stats" / "GPU Performance States". These names were found by comparing the available channel names with the strings used in powermetrics (strings /usr/bin/powermetrics).

To subscribe to channels and receive updates, you first get the internal channel descriptions as CFDictionary objects with IOReportCopyChannelsInGroup and merge them into one dictionary with IOReportMergeChannels. The merged dictionary is then used to create a subscription with IOReportCreateSubscription. Once the subscription exists, updates can be received with IOReportCreateSamples and IOReportCreateSamplesDelta. Put together, it looks something like this:

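// simplified code without error handling; the IOReport* functions are the extern fns declared below
use std::mem::MaybeUninit;
use std::ptr::null;
use core_foundation::base::{kCFAllocatorDefault, CFShow, TCFType};
use core_foundation::dictionary::{CFDictionaryCreateMutableCopy, CFDictionaryGetCount, CFMutableDictionaryRef};
use core_foundation::string::CFString;

unsafe {
  // IOReport expects CFStringRef arguments, not Rust string literals
  let energy = CFString::new("Energy Model");
  let cpu = CFString::new("CPU Stats");
  let cpu_states = CFString::new("CPU Core Performance States");

  let mut channels = vec![];
  channels.push(IOReportCopyChannelsInGroup(energy.as_concrete_TypeRef(), null(), 0, 0, 0));
  channels.push(IOReportCopyChannelsInGroup(cpu.as_concrete_TypeRef(), cpu_states.as_concrete_TypeRef(), 0, 0, 0));

  // merge all other channel dicts into the first one
  let chan = channels[0];
  for i in 1..channels.len() {
    IOReportMergeChannels(chan, channels[i], null());
  }

  // the subscription API takes a mutable dictionary, so make a mutable copy
  let size = CFDictionaryGetCount(chan);
  let chan = CFDictionaryCreateMutableCopy(kCFAllocatorDefault, size, chan);

  // the return value is the subscription handle; the out-param receives the subscribed channels
  let mut subscribed: MaybeUninit<CFMutableDictionaryRef> = MaybeUninit::uninit();
  let subs = IOReportCreateSubscription(null(), chan, subscribed.as_mut_ptr(), 0, null());

  let sample1 = IOReportCreateSamples(subs, chan, null());
  std::thread::sleep(std::time::Duration::from_millis(100)); // 100 ms
  let sample2 = IOReportCreateSamples(subs, chan, null());

  let rs = IOReportCreateSamplesDelta(sample1, sample2, null());
  CFShow(rs as _);

  // rs is a CFDictionary, so metrics can be read from it by key
}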

The basic dictionary functions come from the core_foundation crate. The IOReport functions are C bindings to a private macOS API:

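use core_foundation::base::CFTypeRef;
use core_foundation::dictionary::{CFDictionaryRef, CFMutableDictionaryRef};
use core_foundation::string::CFStringRef;
type CVoidRef = *const std::ffi::c_void;
#[repr(C)] struct IOReportSubscription { _private: [u8; 0] } // opaque handle
type IOReportSubscriptionRef = *const IOReportSubscription;

#[link(name = "IOReport", kind = "dylib")]
extern "C" {
  fn IOReportCopyChannelsInGroup(a: CFStringRef, b: CFStringRef, c: u64, d: u64, e: u64) -> CFDictionaryRef;
  fn IOReportMergeChannels(a: CFDictionaryRef, b: CFDictionaryRef, nil: CFTypeRef);
  fn IOReportCreateSubscription(a: CVoidRef, b: CFMutableDictionaryRef, c: *mut CFMutableDictionaryRef, d: u64, e: CFTypeRef) -> IOReportSubscriptionRef;
  fn IOReportCreateSamples(a: IOReportSubscriptionRef, b: CFMutableDictionaryRef, c: CFTypeRef) -> CFDictionaryRef;
  fn IOReportCreateSamplesDelta(a: CFDictionaryRef, b: CFDictionaryRef, c: CFTypeRef) -> CFDictionaryRef;
}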

Full implementation can be found here.

Note: for the CPU I read values separately for each core. There is also a "CPU Complex Performance States" group that returns values for a whole cluster at once, but it sometimes reports 100% cluster load when there is actually no load, apparently because of an internal bug. For the GPU there is no per-core information, so the load is read for the whole GPU at once; fortunately, no such errors have been noticed there.

Parsing CPU / GPU Usage

IOReportCreateSamplesDelta returns an object with a single IOReportChannels field, which contains an array of metric objects. Each metric object has the group and subgroup (the ones the subscription was created with) plus extra fields: the channel name, the value unit, and the value itself.

The values for "CPU Stats" / "GPU Stats" are returned as an array of (string, int) tuples: a state name and the time spent in that state (its residency).

Internally, a CPU can run at different frequencies and voltages. These frequency levels are known in advance and are always the same. Under load, the CPU runs at higher levels (higher frequency, higher voltage). When idle, it constantly jumps between modes (the OS always has some background activity). Simple tasks are mostly executed on the E-cluster, while more demanding ones are moved to the P-cluster.

Each cluster has its own DVFS (Dynamic Voltage and Frequency Scaling) config, but all cores within the same cluster share it. GPUs follow the same approach.

GUIs usually show a single number as the current core frequency, but in reality it is an average: the frequency of each DVFS level weighted by the share of active (non-idle) time the core spent at that level. So the average frequency of a core can be calculated like this:

// predefined DVFS frequencies (MHz) for the E-cluster
let frequencies: [u32; 7] = [600, 912, 1284, 1752, 2004, 2256, 2424];
// time spent at each level; the first value is idle time, so there is one extra item
let residencies: [u64; 8] = [17563069, 0, 5787276, 156965, 419037, 100647, 106031, 41735];

let total_time = residencies.iter().sum::<u64>();
let usage_time = residencies.iter().skip(1).sum::<u64>(); // all non-idle time

// weight each frequency by the share of active time spent at that level
let mut freq = 0f64;
for i in 0..frequencies.len() {
  let percent = residencies[i + 1] as f64 / usage_time as f64;
  freq += percent * frequencies[i] as f64;
}

let usage = usage_time as f64 / total_time as f64 * 100.0;
println!("E-cluster freq: {:.2} MHz, usage: {:.2}%", freq, usage);
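The same calculation can be wrapped in a small reusable function. This is a sketch, not macmon's code: the name cluster_freq_usage is hypothetical, usage is returned as a fraction, and a fully idle sample returns zeros, since otherwise the active time would be 0 and the division would produce NaN.

```rust
/// Average frequency (MHz) and usage (0.0..=1.0) from DVFS residencies.
/// `residencies[0]` is idle time; `residencies[i + 1]` is the time spent at `freqs[i]`.
fn cluster_freq_usage(freqs: &[u32], residencies: &[u64]) -> (f64, f64) {
  assert_eq!(residencies.len(), freqs.len() + 1, "expected one idle slot plus one per frequency");
  let total: u64 = residencies.iter().sum();
  let active: u64 = residencies[1..].iter().sum();
  if active == 0 {
    // fully idle (or empty) sample: avoid dividing by zero
    return (0.0, 0.0);
  }
  // weighted average: each frequency times its share of active time
  let freq: f64 = freqs
    .iter()
    .zip(&residencies[1..])
    .map(|(&f, &r)| f as f64 * r as f64 / active as f64)
    .sum();
  (freq, active as f64 / total as f64)
}
```

With the E-cluster numbers above, it gives roughly 1021.79 MHz at about 27.35% usage.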

DVFS values for the target system can be read from AppleARMIODevice. Full code can be found here. On M1/M2/M3 processors the property names are: voltage-states1-sram for the E-cluster, voltage-states5-sram for the P-cluster, and voltage-states9 for the GPU. Max / Ultra chips also have voltage-states11-sram or something similar, because these chips have two or more P-clusters; fortunately, the values are the same for all clusters of the same type (at least for now).
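How the voltage-states properties are encoded is not covered above. The sketch below assumes (an assumption, not something stated in this post) that a property is a raw byte array of consecutive 8-byte records, each a little-endian u32 frequency in Hz followed by a u32 voltage value. The name parse_dvfs_mhz is hypothetical, and reading the property from the I/O Registry is left out, so only the decoding step is shown.

```rust
/// Decodes a DVFS table into (frequencies in MHz, raw voltage values).
/// ASSUMED layout: consecutive 8-byte records, each a little-endian u32
/// frequency in Hz followed by a little-endian u32 voltage value.
fn parse_dvfs_mhz(bytes: &[u8]) -> (Vec<u32>, Vec<u32>) {
  let mut freqs_mhz = Vec::new();
  let mut volts = Vec::new();
  // chunks_exact skips a trailing partial record instead of panicking
  for rec in bytes.chunks_exact(8) {
    let freq_hz = u32::from_le_bytes([rec[0], rec[1], rec[2], rec[3]]);
    let volt = u32::from_le_bytes([rec[4], rec[5], rec[6], rec[7]]);
    freqs_mhz.push(freq_hz / 1_000_000); // Hz -> MHz
    volts.push(volt);
  }
  (freqs_mhz, volts)
}
```

If the real layout differs on a given chip (for example, a different frequency scale), only the conversion line needs to change.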

Parsing Energy Usage

When creating the subscription earlier, we also requested the "Energy Model" group. These metrics are easier to read because each one is just a number. The unit can be mJ, uJ or nJ, so the value only needs to be converted to watts with P(W) = E(J) / t(s), where t is the sampling time in seconds. Full code here.

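// metric_dict is one channel entry from the delta sample, metric_unit its unit label;
// IOReportSimpleGetIntegerValue is another IOReport function, declared like the ones above
let joules_raw = IOReportSimpleGetIntegerValue(metric_dict, 0) as f32;
let joules = match metric_unit.as_str() {
  "mJ" => joules_raw / 1e3f32,
  "uJ" => joules_raw / 1e6f32,
  "nJ" => joules_raw / 1e9f32,
  // early return: the enclosing function returns Result<_, Box<dyn std::error::Error>>
  _ => return Err(format!("Unknown energy unit: {}", metric_unit).into()),
};

let sec = duration as f32 / 1000.0; // sampling duration: milliseconds -> seconds
let watts = joules / sec;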
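A standalone version of the same conversion makes the arithmetic easy to check: for example, 1500 mJ over a 1000 ms window is 1.5 J / 1 s = 1.5 W. The function name energy_to_watts is hypothetical; using f64 and rejecting a zero-length window are choices made for this sketch.

```rust
/// Average power (W) for an energy reading taken over `duration_ms` milliseconds.
/// Supports the units mentioned above: "mJ", "uJ" and "nJ".
fn energy_to_watts(value: f64, unit: &str, duration_ms: u64) -> Result<f64, String> {
  let joules = match unit {
    "mJ" => value / 1e3,
    "uJ" => value / 1e6,
    "nJ" => value / 1e9,
    _ => return Err(format!("Unknown energy unit: {}", unit)),
  };
  if duration_ms == 0 {
    return Err("sampling duration must be non-zero".to_string());
  }
  let seconds = duration_ms as f64 / 1000.0;
  Ok(joules / seconds) // P(W) = E(J) / t(s)
}
```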

Getting RAM Usage

Getting RAM / swap usage is done via libc, in principle much like on any other UNIX-like system. There is nothing macOS-specific here except working out how to calculate the value in bytes, since memory is counted in pages internally.

macOS uses the vm_statistics64 structure, which has quite a few fields. By experimentation, I found a combination of them that is close to what Activity Monitor shows. The current values come from a host_statistics64 call; for total memory and swap usage, a sysctl call is enough. Full code can be found here. A short example with the libc crate:

unsafe {
  let mut count: u32 = libc::HOST_VM_INFO64_COUNT as _;
  let mut stats = std::mem::zeroed::<libc::vm_statistics64>();

  libc::host_statistics64(
    libc::mach_host_self(),
    libc::HOST_VM_INFO64,
    &mut stats as *mut _ as *mut _, // cast to the pointer type host_statistics64 expects
    &mut count,
  );

  let page_size = libc::sysconf(libc::_SC_PAGESIZE) as u64; // page size in bytes
  // used memory in bytes; this combination of counters is close to Activity Monitor
  let usage = page_size * (
    stats.active_count as u64
    + stats.inactive_count as u64
    + stats.wire_count as u64
    + stats.speculative_count as u64
    + stats.compressor_page_count as u64
    - stats.purgeable_count as u64
    - stats.external_page_count as u64);
  println!("used: {} bytes", usage);
}
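The pages-to-bytes arithmetic can be separated from the unsafe FFI call and checked on its own. PageCounts and used_bytes are hypothetical names mirroring the vm_statistics64 fields used above, and saturating_sub is an addition of this sketch so that inconsistent counters yield 0 instead of a u64 underflow. On Apple Silicon the page size is typically 16384 bytes.

```rust
/// Hypothetical subset of the vm_statistics64 page counters used in the formula above.
struct PageCounts {
  active: u64,
  inactive: u64,
  wire: u64,
  speculative: u64,
  compressor: u64,
  purgeable: u64,
  external: u64,
}

/// Used memory in bytes; `page_size` is in bytes (as returned by sysconf(_SC_PAGESIZE)).
fn used_bytes(page_size: u64, c: &PageCounts) -> u64 {
  let added = c.active + c.inactive + c.wire + c.speculative + c.compressor;
  let removed = c.purgeable + c.external;
  // saturating_sub clamps at 0 instead of underflowing if counters are inconsistent
  page_size * added.saturating_sub(removed)
}
```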

Getting Temperature Values

This turned out to be the most complicated part of the program. Initially I wanted to get values for each core separately and show them in a separate tab of the interface. There are GUI programs on macOS (Stats, Macs Fan Control) that do this, but after a detailed study I realised that they show these values by guessing which sensor corresponds to which core.

Temperature values can be obtained from two places: IOHID for M1 on macOS 12–13, and SMC for M1/M2/M3 on macOS 14–15.

The IOHID approach is simpler: it's just a function call that returns a CFDictionary with values. There seems to be only one person in open source who understands how it works, because every other repository I found references their code.

SMC (System Management Controller) is more complicated. It works through something similar to RPC calls with large structures, where fields are filled in depending on which RPC method is called. SMC looks like the more classic approach, but it seems Apple didn't have time to implement all of its functionality during the migration to ARM. To make macmon work everywhere, I had to implement both approaches.

The IOHID implementation can be found here. There is really nothing to discuss technically, except that the parameter values were found "magically".

In SMC, a sensor's value is obtained by key. The key uses FourCC notation: it is essentially a 32-bit integer, but it reads as four characters in code (for example, Tp01 -> 0x54703031, Tp02 -> 0x54703032, etc.). The key is used to obtain KeyInfo, which is then needed to retrieve KeyData. So reading a sensor value takes two RPC calls, which is inefficient and costly; that is why the KeyInfo values should be cached beforehand.
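FourCC packing is plain byte arithmetic: the four ASCII bytes are read as a big-endian u32. A small std-only sketch (the function names are illustrative) reproduces the values above in both directions:

```rust
/// Packs a 4-character ASCII key such as "Tp01" into its big-endian u32 FourCC value.
fn fourcc(key: &str) -> Option<u32> {
  let b = key.as_bytes();
  if b.len() != 4 {
    return None; // FourCC keys are exactly four bytes
  }
  Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Reverse direction, useful when printing keys read back from the SMC.
fn fourcc_to_string(code: u32) -> String {
  code.to_be_bytes().iter().map(|&b| b as char).collect()
}
```

The same function gives 0x234b4559 for "#KEY", the special key used during initialisation.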

SMC initialisation works like this: obtain all available keys using the special key #KEY (0x234b4559), keep only the keys starting with Tp (CPU-related) and Tg (GPU-related), cache their KeyInfo, and from then on request only those keys, which results in one RPC call per sensor.

Since I don't know which sensor corresponds to which core (or to other parts of the chip), the program simply calculates the average of all values per type.
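The prefix filter and the per-type average can be sketched over readings that have already been fetched. This is an illustration, not macmon's code: per the description above, the real program filters once during initialisation and then reads only the cached keys, while this sketch filters and averages in one pass. The "cpu"/"gpu" labels, the function name, and keys such as Tg0f and TB0T in the usage example are made up.

```rust
use std::collections::HashMap;

/// Averages temperature readings per sensor type using the key prefixes:
/// "Tp" for CPU-related and "Tg" for GPU-related sensors; other keys are skipped.
fn average_by_type(readings: &[(&str, f32)]) -> HashMap<&'static str, f32> {
  // running (sum, count) per type
  let mut acc: HashMap<&'static str, (f32, u32)> = HashMap::new();
  for &(key, value) in readings {
    let kind = if key.starts_with("Tp") {
      "cpu"
    } else if key.starts_with("Tg") {
      "gpu"
    } else {
      continue;
    };
    let entry = acc.entry(kind).or_insert((0.0, 0));
    entry.0 += value;
    entry.1 += 1;
  }
  acc.into_iter().map(|(k, (sum, n))| (k, sum / n as f32)).collect()
}
```

For instance, readings Tp01 = 40.0 and Tp02 = 50.0 average to 45.0 for the CPU, while a key outside both prefixes is ignored.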

SMC implementation can be found here (utility to perform RPC calls), here (initialisation) and here (actual metrics gathering).

Combine all together

There's nothing special about this part. Once per second, the current metric values are read from the sources described above. The only thing worth noting is that SMC is used by default; if its value is not available, IOHID is used instead (the fallback for M1 on macOS 13). The values are then combined into a single structure and returned to the caller.
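The SMC-first rule and the once-per-second loop can be sketched with plain std types. pick_temp and poll are hypothetical names; the real sources (SMC, IOHID, IOReport) are stubbed out as closures, and a real sampler would return the combined metrics structure rather than a single value. Passing IOHID as a closure means it is only queried when SMC has nothing.

```rust
use std::thread;
use std::time::Duration;

/// SMC value if present, otherwise fall back to IOHID.
/// `or_else` only calls the IOHID closure when `smc` is None.
fn pick_temp(smc: Option<f32>, iohid: impl FnOnce() -> Option<f32>) -> Option<f32> {
  smc.or_else(iohid)
}

/// Takes `n` samples, one per `interval`, and hands each to `on_sample`.
fn poll<T>(n: usize, interval: Duration, mut sample: impl FnMut() -> T, mut on_sample: impl FnMut(T)) {
  for i in 0..n {
    on_sample(sample());
    // no need to sleep after the last sample
    if i + 1 < n {
      thread::sleep(interval);
    }
  }
}
```

In the program described above, the interval would be one second and the sample closure would gather CPU/GPU usage, power, memory, and temperatures.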

These values are then aggregated in the "frontend" code and rendered in the console with the ratatui crate, or printed as plain text in raw mode. But that is material for a separate article.


You can try macmon with Homebrew:

brew install vladkens/tap/macmon