
· 12 min read

I've been kicking around an idea for a while to create a CAN bus communication stack for Rust, and it is finally taking shape. It's still a work-in-progress, but I'm getting ready to publish a rough prototype, so I want to write about what I want it to be. I'm calling it Zencan, and it's an implementation of the CANopen protocol, with a few planned extensions.

The repo for zencan is at https://github.com/mcbridejc/zencan.

An example project using zencan can be found at https://github.com/mcbridejc/can-io-firmware.

Background

What's a CAN bus?

The Controller Area Network (CAN) bus, first popularized in automotive applications, is widely used for industrial automation and motion control. If you've ever used an OBD-II reader to read engine error codes, you were using a CAN bus. I've personally used it in UAVs and on rocket engine controllers, but I feel that it does not get as much use in the open-source world as it deserves, so I want to make CAN easy with Rust in hopes that it will encourage more use.

The CAN bus operates on a single differential pair, connected to all of the nodes. This means wiring multiple devices requires only two wires! It also means a single connection point can be used to plug in a computer and monitor and control all of the MCUs on the bus. It keeps wiring easy.

Many MCUs come with CAN controllers, and they handle a lot of the communications overhead without any CPU involvement. For example, bus arbitration, errors and re-transmission, prioritized message queuing, and received message filtering are all often supported by hardware.

CAN is fairly slow, with traditional CAN 2.0 buses running at a max bitrate of 1 Mbit/s. However, these days some MCUs are shipping with CAN-FD capable controllers, and I intend for Zencan to support CAN-FD. A CAN-FD bus allows for longer messages and higher bitrates -- usually up to 5-8 Mbit/s, although I have heard of buses as fast as 20 Mbit/s.

The CAN protocol includes framing, so everything is sent as a message. A message has an ID and a data payload. The ID identifies what's in the message and, implicitly, who it's for (this is where message filtering comes in: if a node on the bus knows which messages it is interested in, it can set up hardware filters to drop all others). I think this can be confusing sometimes, so I will restate it: generally, nodes on a CAN bus do not have addresses or node IDs. Instead, the messages themselves are tagged with IDs. If one wants to assign addresses to specific nodes, this has to be built on top of the CAN protocol!

Why I like CAN

  • It is smaller and cheaper than Ethernet.
  • On projects which have multiple MCUs, it is convenient to plug into a single bus and talk to all of them.
  • There are standard tools for monitoring and plotting data on a CAN bus; with tools like SavvyCAN and the DBC file format, one can plot data without writing any code.
  • It is robust! With its low speed, smart controller hardware, and differential signalling, in my experience CAN is very tolerant of less-than-ideal wiring scenarios.

Zencan Goals

Rust

Zencan is built in Rust because that's what I'm into lately, and there's a lack of CANopen stacks out there for Rust.

I've been writing C/C++ for embedded systems for a long time, and although there are definitely some trade-offs, I see Rust as the path forward for me. Look, if you catch me at the wrong time -- like when I've just spent an hour appeasing the borrow checker, or I'm frustrated by the difficulty of viewing static variables in a debugger -- I might offer a different opinion. But on the whole...

Configurability

I want configurability! I want to be able to build a set of components, put them all on a CAN bus, and make them talk to each other, without having to change their firmware to do it. One way that a CAN bus has traditionally been designed is to keep a "master spreadsheet" of all the data that has to be transferred on the bus (motor temperature, motor current, velocity command, etc), cram these all into a set of messages, assign them each a CAN ID, and then write the appropriate software for each node to send and receive the relevant messages. This is fine, but I want to grab a generic motor controller board, and a generic IO board with an analog joystick plugged into it, and then configure the joystick to command the motor controller without modifying the software.

Observability

I want to standardize the interactions one might want to have with MCUs on a bus, so that I can solve them well once and re-use the solution. I want to plug into the bus and:

  • Report all the devices on the bus
  • Program any device with new firmware
  • Trivially plot values being generated by that device over time, or log for plotting later
  • View metadata about the device such as software versions, serial numbers, etc

Ease

A lot of code is required to get all of that, but I shouldn't have to think about it every time I start a new project. I should just be able to instantiate a node that works with a basic configuration file and a small amount of boilerplate.

Architecture

CANOpen

There is an existing protocol, built on top of CAN, which supports a LOT of what I want to do. It's called CANopen. I think that Zencan is actually going to be a Rust implementation of a CANopen stack, with some extra features layered on top. I have some concerns about this, like the fact that CANopen is managed by CiA (CAN in Automation), which restricts access to the specification documents to members only. And even if you manage to find these documents, you may not find them as helpful or clear as you would like. But the concept of the Object Dictionary and the Electronic Data Sheet (EDS) describing the objects in it, combined with the ability to map arbitrary object data into PDO messages, solves a lot of the configurability and observability requirements. CANopen isn't exactly what I would design from scratch, but it is pretty close, and it has the benefit of being an established standard with existing devices and software tools. And, as far as I can find, there isn't yet a full-featured, mature CANopen stack available for Rust. So I'm going to start here, and see if I can get everything I want out of it while maintaining CANopen compatibility.

Specific Goals

Specific goals of the project are:

  • Support creating nodes on embedded (no_std) or Linux targets
  • No heap; all data statically allocated.
  • Be CANopen compatible
  • Support device discovery
  • Easy node configuration and object definition via a "device config" TOML file
  • Support CAN-FD
  • Support bulk data transfer -- e.g. a device I am targeting is an e-ink display, which requires transferring frames of pixels
  • Standardized device software updates and version reporting
  • A GUI and CLI interface for device management via socketcan

Configuration Files

I am still hashing this out, and it may change!

There is a file format called EDS, or Electronic Data Sheet, which is used to describe the objects in a CANopen object dictionary. I initially started out using an EDS file as an input to the code generation, but found that this was not ideal. For one thing, the format is fairly denormalized and redundant, which means it can contain inconsistencies and is difficult to edit by hand. For another, I want the ability to specify Zencan-specific options, and had no good way to work those into an EDS file. So instead, I've created a TOML schema for device configuration files, and EDS files will be generated as an output, for use with tools which support them.
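
I haven't settled on a final schema yet, but to make it concrete, here's a purely illustrative sketch of what a device config might look like. Every field name here is an assumption for illustration, not the actual Zencan schema:

# Hypothetical zencan device config -- field names are illustrative only
[device]
name = "example1"

# A hypothetical application-defined object in the object dictionary
[[objects]]
index = 0x2000
name = "motor_current"
data_type = "u32"
access = "ro"
pdo_mapping = "tpdo"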

Static Data, shared between objects and threads, without heap

The first challenge was sharing data between different contexts. Maybe it's just code that doesn't know about each other (e.g. the zencan-node crate and the user's application code) sharing access to the object dictionary. Maybe it really is on other threads. With alloc available, one might simply use an Arc to wrap the shared objects. Without Arc, we have to pass references, and this can quickly lead to lifetime hell. Instead of managing these lifetimes with lots of generics, I decided to require many of the data structures to be static. In the embedded context, this is almost always the desired case anyway, and only one instance of a node is expected to exist. In other contexts, like hosting a node on Linux, or in tests, this may not be the case. These can be addressed using Box::leak to make a heap-allocated object effectively static.
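
As a minimal sketch of that escape hatch -- using the generated NodeState type that appears later in this post, with the construction details being illustrative:

// On a host target with alloc, leak a Box to get a &'static reference,
// satisfying APIs that expect statically-allocated data. The memory is
// never freed, which is fine for a long-lived node or a test.
fn make_static_state() -> &'static NodeState<4, 4> {
    Box::leak(Box::new(NodeState::new()))
}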

Dynamic Code Generation

Some code will be generated in build.rs, using the zencan-build crate, based on a TOML file in the project. This is somewhat similar to the way the CANopenNode C stack does it, where the EDS file -- or more specifically, the C# application used to edit the EDS file -- generates a C source and header file for inclusion in your application. I think that Rust tooling provides a good mechanism to auto-generate this code as part of the build process. It is still auto-generated code, and that has its downsides -- especially readability -- but I don't see how to get around it. The generated code is saved to the compilation OUT_DIR, and its name is stored in an env var, so that it can be included via a macro somewhere in the application. This concept is modeled after slint, which does a similar thing for including generated code from a .slint file which defines the GUI.

In build.rs, the code is generated from the device config, with a name (e.g. 'EXAMPLE1'):

fn main() {
    if let Err(e) = zencan_build::build_node_from_device_config(
        "EXAMPLE1",
        "device_configs/example1.toml",
    ) {
        eprintln!("Error building node from example1.toml: {}", e);
        std::process::exit(1);
    }
}

Then in your application, the generated code can be included wherever you like as:

zencan_node::include_modules!(EXAMPLE1);

The name allows multiple nodes to be instantiated in a single application, although I do not yet have a use in mind for this, other than tests.
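
For example (the module and config names here are purely illustrative), two generated nodes could live side by side in one binary:

// Each module pulls in the code generated from its own device config
mod node_a {
    zencan_node::include_modules!(NODE_A);
}
mod node_b {
    zencan_node::include_modules!(NODE_B);
}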

Here's an example of what the generated code looks like for a single object, which might help in understanding what's being generated:

#[allow(dead_code)]
#[derive(Debug)]
pub struct Object1000 {
    pub value: AtomicCell<u32>,
}
#[allow(dead_code)]
impl Object1000 {
    pub fn set_value(&self, value: u32) {
        self.value.store(value);
    }
    pub fn get_value(&self) -> u32 {
        self.value.load()
    }
    const fn default() -> Self {
        Object1000 {
            value: AtomicCell::new(0i64 as u32),
        }
    }
}
impl ObjectRawAccess for Object1000 {
    fn write(&self, sub: u8, offset: usize, data: &[u8]) -> Result<(), AbortCode> {
        if sub == 0 {
            if offset != 0 {
                return Err(zencan_node::common::sdo::AbortCode::UnsupportedAccess);
            }
            let value = u32::from_le_bytes(data.try_into().map_err(|_| {
                if data.len() < size_of::<u32>() {
                    zencan_node::common::sdo::AbortCode::DataTypeMismatchLengthLow
                } else {
                    zencan_node::common::sdo::AbortCode::DataTypeMismatchLengthHigh
                }
            })?);
            self.set_value(value);
            Ok(())
        } else {
            Err(AbortCode::NoSuchSubIndex)
        }
    }
    fn read(&self, sub: u8, offset: usize, buf: &mut [u8]) -> Result<(), AbortCode> {
        if sub == 0 {
            let bytes = self.get_value().to_le_bytes();
            if offset + buf.len() > bytes.len() {
                return Err(zencan_node::common::sdo::AbortCode::DataTypeMismatchLengthHigh);
            }
            buf.copy_from_slice(&bytes[offset..offset + buf.len()]);
            Ok(())
        } else {
            Err(AbortCode::NoSuchSubIndex)
        }
    }
    fn sub_info(&self, sub: u8) -> Result<SubInfo, AbortCode> {
        if sub != 0 {
            return Err(AbortCode::NoSuchSubIndex);
        }
        Ok(SubInfo {
            access_type: zencan_node::common::objects::AccessType::Const,
            data_type: zencan_node::common::objects::DataType::UInt32,
            size: 4usize,
            pdo_mapping: zencan_node::common::objects::PdoMapping::None,
            persist: false,
        })
    }
    fn object_code(&self) -> zencan_node::common::objects::ObjectCode {
        zencan_node::common::objects::ObjectCode::Var
    }
}
pub static OBJECT1000: Object1000 = Object1000::default();
pub static NODE_STATE: NodeState<4usize, 4usize> = NodeState::new();
pub static NODE_MBOX: NodeMbox = NodeMbox::new(NODE_STATE.rpdos());
pub static OD_TABLE: [ODEntry; 31usize] = [
    ODEntry {
        index: 0x1000,
        data: ObjectData::Storage(&OBJECT1000),
    },
    // ... remaining entries elided ...
];

Each object gets a struct definition and an implementation of the ObjectRawAccess trait. All objects are instantiated statically and stored in a table, OD_TABLE. This is the object dictionary. NODE_STATE holds some static state used by the instantiated Node, and NODE_MBOX provides a Sync data structure for passing received messages, so that messages can be received in an IRQ handler.

Threading

The expectation is that a single thread will own the Node object, and that most of the node behavior will happen on this thread in the form of calls to Node::process. A separate object, the NodeMbox, allows for reception of messages on another thread. The expected use is to push received messages into the mailbox object from an IRQ handler. The object dictionary is Sync, using the critical_section crate to protect data access. critical_section allows embedded applications to implement critical sections by disabling interrupts, and Linux applications to do so using global locks.
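
The intended shape is roughly the sketch below. To be clear, this is pseudocode: the mailbox method, the HAL call, and the Node::process signature are assumptions for illustration, not the actual zencan-node API.

// In the CAN RX interrupt handler: hand the received frame to the
// node's mailbox (method name hypothetical).
#[interrupt]
fn CAN_RX() {
    let frame = read_frame_from_hardware(); // hypothetical HAL call
    NODE_MBOX.store_message(frame);         // hypothetical method name
}

// On the main thread: let the node do its processing periodically.
fn main_loop(node: &mut Node) {
    loop {
        node.process(/* arguments assumed */);
        // ... other application work ...
    }
}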

I tried to use crossbeam's AtomicCell for atomic access, but it does not currently support thumbv6 (i.e. Cortex-M0) targets at all, because those targets lack CAS instructions, and I intend to use this on M0 targets.

The object dictionary and all of the objects are Sync, as they may need to be accessed by application code running in various thread contexts.

Object Storage vs Callback

Zencan supports two object types:

  • Storage objects have statically allocated storage for their data and support simple read/write operations
  • Callback objects rely on callback functions to implement their access, allowing for validation and dynamic data handling during read or write (a rough sketch follows this list)
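
As a hedged sketch of what a callback-style object could look like -- written directly against the ObjectRawAccess trait shown in the generated code above, so the real zencan callback type may well differ -- here is a read-only object whose value comes from an application-supplied function instead of static storage:

// Sketch only: serves reads by calling back into the application. The
// trait, error, and info types are taken from the generated code above;
// everything else is illustrative.
struct UptimeObject {
    read_fn: fn() -> u32, // hypothetical application-supplied callback
}

impl ObjectRawAccess for UptimeObject {
    fn write(&self, _sub: u8, _offset: usize, _data: &[u8]) -> Result<(), AbortCode> {
        // Reject writes; this object is read-only
        Err(AbortCode::UnsupportedAccess)
    }
    fn read(&self, sub: u8, offset: usize, buf: &mut [u8]) -> Result<(), AbortCode> {
        if sub != 0 {
            return Err(AbortCode::NoSuchSubIndex);
        }
        // Fetch a fresh value from the application on every read
        let bytes = (self.read_fn)().to_le_bytes();
        if offset + buf.len() > bytes.len() {
            return Err(AbortCode::DataTypeMismatchLengthHigh);
        }
        buf.copy_from_slice(&bytes[offset..offset + buf.len()]);
        Ok(())
    }
    fn sub_info(&self, sub: u8) -> Result<SubInfo, AbortCode> {
        if sub != 0 {
            return Err(AbortCode::NoSuchSubIndex);
        }
        Ok(SubInfo {
            access_type: AccessType::Const,
            data_type: DataType::UInt32,
            size: 4,
            pdo_mapping: PdoMapping::None,
            persist: false,
        })
    }
    fn object_code(&self) -> ObjectCode {
        ObjectCode::Var
    }
}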

Control tools

The zencan-cli crate comes with a REPL-style shell for interacting with devices over socketcan in real time. A GUI version is planned as well.

Summary

I consider Zencan a prototype for now, and it is still evolving. It needs more examples and documentation, and I need to use it in more devices to flesh things out. There are still important features missing, and I expect some churn in the architecture/API. Over the next few months I hope to integrate it into a few more projects, while continuing to develop it into something that someone besides myself might want to use! There are a few loose ends to tie up before I push a first release to crates.io, but I expect to be doing that soon.

· 2 min read

This is for future-me, and for future internet searchers.

Using probe-rs to attach to an STM32G0 family MCU to read log messages via defmt over RTT, I encountered the following error:

WARN probe_rs::util::rtt::client: RTT read pointer changed, re-attaching
WARN probe_rs::rtt::channel: RTT channel name points to unrecognized memory. Bad target description?
WARN probe_rs::rtt::channel: RTT channel name points to unrecognized memory. Bad target description?

The same error occurred whether using defmt_rtt or rtt_target.

Reading through the probe-rs logs, it seemed that at some point after correctly reading the RTT memory block, it would start reading junk where it expected to read a SEGGER RTT identifier.

I am using lilos, and its executor uses the WFI instruction to sleep. By trying different configurations, I found that the error only occurs when the lilos executor runs.

On many Cortex-M processors, some options have to be enabled to keep clocks running for the debug core during sleep. For example, on the STM32G0, the following bits have to be set:

pac::DBGMCU.cr().modify(|w| {
    w.set_dbg_standby(true);
    w.set_dbg_stop(true);
});

What I learned today is that while this may keep the debug core running, either the SRAM or something in the bus matrix still gets disabled unless there is at least one active bus master, and apparently the debug core does not count. So to keep SRAM accessible to the debugger during sleep, you can enable a DMA controller:

pac::RCC.ahbenr().modify(|w| w.set_dma1en(true));

With all three of these bits set, I can reliably read from SRAM over SWD and thus RTT is happy.
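
Putting it all together, a small init helper can combine both pieces (using the same PAC-style register access as the snippets above):

/// Keep SRAM reachable over SWD while the core sleeps in WFI on an STM32G0
fn enable_debug_during_sleep() {
    // Keep the debug core clocked in stop and standby modes
    pac::DBGMCU.cr().modify(|w| {
        w.set_dbg_standby(true);
        w.set_dbg_stop(true);
    });
    // Keep at least one bus master active so SRAM stays accessible
    pac::RCC.ahbenr().modify(|w| w.set_dma1en(true));
}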

This probe-rs issue discusses the problem.

· 10 min read

How to poll an async function manually, and then use the Rust compiler as a finite-state-machine generator

2025-05-16

Introduction

An interesting thing about Rust async functions is that what the async keyword actually does is tell the compiler to automatically convert your function into a state machine, so that it can pick up where it left off, returning a core::task::Poll result every time its poll method is called. The future object gets packed with any data it needs to keep track of its execution state.

Normally, one doesn't call this poll method directly, but uses an executor like tokio or lilos to poll it. The executor won't necessarily poll it in a busy loop; it may end up blocking the OS task waiting for file handles to be ready, or waiting for an interrupt to fire to flag the task as ready to run. But there's nothing stopping you from calling poll yourself!

I've done this a few times, in a couple of basic scenarios, and I think it can be useful. At least, it's useful to know how to do it. The basic premise is that sometimes you need to build a state machine that gets called periodically and steps through a process. For complicated state machines, this can mean keeping track of a lot of state transitions with big match statements. It's fine, but sometimes I find a state machine written that way hard to reason about, compared to performing the same sequence of events in a blocking manner.

The full demo code from this article can be found on GitHub.

Linear sequence vs state machine

As a simple comparison, consider a sequence of register initializations:

/// A somewhat pseudo-codey demonstration of sending a series of commands
/// to a device and waiting for an acknowledgement
fn init_device() {
    let device = ImaginaryDeviceCommander::new();
    device.command("write REG1 10");
    while !device.is_ready() {}
    device.command("write REG2 20");
    while !device.is_ready() {}
    device.command("write REG3 30");
    while !device.is_ready() {}
}

But now imagine that our process can't block. Maybe it is part of a "super loop": an application with a single thread that cyclically calls many modules so they can update themselves. Now the device initialization process needs to be refactored so that it can be performed iteratively over a series of function calls which always return quickly, something like this:

struct DeviceInitter {
    command_sent: bool,
    step: usize,
    device: ImaginaryDeviceCommander,
}

impl DeviceInitter {
    pub fn run(&mut self) -> bool {
        if self.step < 3 {
            match self.command_sent {
                true => {
                    if self.device.is_ready() {
                        self.step += 1;
                        self.command_sent = false;
                    }
                }
                false => {
                    match self.step {
                        0 => self.device.command("write REG1 10"),
                        1 => self.device.command("write REG2 20"),
                        2 => self.device.command("write REG3 30"),
                        _ => (),
                    }
                    self.command_sent = true;
                }
            }
        }
        // Complete once all three commands have been sent and acknowledged
        self.step == 3
    }
}

The first version certainly seems nicer to read and understand!

How to manually poll an async function

There are two main things to know:

  1. You have to pin it
  2. You have to create a dummy context to pass to poll

Both are easy to do!

Finally, you need to make sure your future will yield at appropriate times. Normally, async functions yield back to the executor when they stop to wait on some IO operation, such as reading a network socket. But you can also use the pending!() macro to yield at any point.

Here is a function to run a future once, and a simple state machine demoing its use:

use core::future::Future;
use core::pin::Pin;

/// Poll a future one time, and return its result if it completes
pub fn poll_once<T>(mut f: Pin<&mut dyn Future<Output = T>>) -> Option<T> {
    let mut cx = futures::task::Context::from_waker(futures::task::noop_waker_ref());

    match f.as_mut().poll(&mut cx) {
        core::task::Poll::Ready(result) => Some(result),
        core::task::Poll::Pending => None,
    }
}

use core::pin::pin;
use futures::pending;

fn main() {
    let mut future = pin!(async {
        let mut i = 0;
        loop {
            if i == 2 {
                return 42;
            } else {
                i += 1;
                // Yield back to the executor. This means that the future's
                // `poll` function will return `Poll::Pending` and the
                // subsequent call will pick up here
                pending!()
            }
        }
    });

    // i = 0. Not done.
    assert_eq!(poll_once(future.as_mut()), None);
    // i = 1. Not done.
    assert_eq!(poll_once(future.as_mut()), None);
    // i = 2. Done!
    assert_eq!(poll_once(future.as_mut()), Some(42));
}

There's no async runtime like tokio or async-io here; the only dependency is the futures crate.

Use case: chunked serialization

One reason I wrote this article is that a library I am working on needed to serialize a handful of data structures in an embedded context, to persist them to flash. I want to impose minimal requirements on the application, so:

  • No heap usage, which means no returning a Vec<u8>.
  • Do not assume the serialized output will fit in RAM. It has to be able to be written to flash in small chunks. Only the application knows what "small" means here.

This means that the serializer has to expect a series of read(buf: &mut [u8]) calls, copy the next N bytes into the buffer, and then keep track of where it was so that it can pick up where it left off on the next call to read. This is kind of a pain in the ass! It means, e.g., that when serializing a u32, it may end up writing the first byte on one call to read, and the next 3 bytes on the next.
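
To make the pain concrete, here's an illustrative sketch (the names are made up for this article) of the bookkeeping a single split u32 forces on a hand-written serializer -- and this has to be multiplied across every field of every object:

// Manual state tracking for one u32 whose bytes may straddle read() calls
struct U32WriteState {
    bytes: [u8; 4],  // the serialized value
    written: usize,  // how many bytes have been emitted so far
}

impl U32WriteState {
    /// Copy as many remaining bytes as fit into `buf`, remembering where we
    /// left off for the next call
    fn read(&mut self, buf: &mut [u8]) -> usize {
        let remaining = &self.bytes[self.written..];
        let n = remaining.len().min(buf.len());
        buf[..n].copy_from_slice(&remaining[..n]);
        self.written += n;
        n
    }
}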

I started writing a state machine to do this, and then decided to try it as an async function. As an example for this article, I simplified the serialization a bit, but it is essentially the same. We are going to serialize some objects that look like this:

/// Just an example of an object to be serialized.
/// Each object has a type, and a block of bytes to describe it.
struct Object {
    pub object_type: u8,
    pub data: Vec<u8>,
}

They will get serialized into the flash as [<length:u16> <object_type:u8> <data:n>], where length is the length of the data + object type byte.
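
For example, an Object with object_type = 1 and two data bytes [0xAA, 0xBB] would serialize to:

// [0x03, 0x00,   0x01,    0xAA, 0xBB]
//  length = 3    object   data
//  (u16, LE)     type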

So first, I need to create an async function that serializes the data, but yields after writing out each byte:

/// Utility to write bytes to a provided function, returning a Poll::Pending between each byte
async fn write_bytes(src: &[u8], mut write_fn: impl FnMut(u8)) {
    for i in 0..src.len() {
        write_fn(src[i]);
        pending!()
    }
}

/// Implements a serializer for a list of objects.
///
/// Implementing this as an async function allows the sequence of writes to be written linearly and
/// simply, allowing rustc to compile it into a state machine so that the serialization can be
/// broken up into arbitrary chunks
async fn polling_write(objects: &[Object], mut write_fn: impl FnMut(u8)) {
    for obj in objects {
        // First serialize the size of the object, which is the length of the data + 1 byte for the
        // object type
        let len = (obj.data.len() + 1) as u16;
        write_bytes(&len.to_le_bytes(), &mut write_fn).await;
        // Serialize the object type
        write_bytes(&[obj.object_type], &mut write_fn).await;
        // Serialize the object data
        write_bytes(&obj.data, &mut write_fn).await;
    }
}

Great, that wasn't so bad.

Now, let's wrap the future in a struct that can provide the required read function:

use core::cell::RefCell;

pub struct AsyncSerializer<'a, 'b, F: Future<Output = ()>> {
    fut: Pin<&'a mut F>,
    reg: &'b RefCell<u8>,
}

impl<'a, 'b, F> AsyncSerializer<'a, 'b, F>
where
    F: Future<Output = ()>,
{
    /// Create a serializer using a future and a shared data buffer
    ///
    /// The future should write to reg, and it *must yield after writing each byte*.
    pub fn new(fut: Pin<&'a mut F>, reg: &'b RefCell<u8>) -> Self {
        Self { fut, reg }
    }

    pub fn read(&mut self, buf: &mut [u8]) -> usize {
        let mut pos = 0;
        loop {
            if pos >= buf.len() {
                return pos;
            }

            if poll_once(self.fut.as_mut()).is_some() {
                // Serialization is complete
                return pos;
            } else {
                buf[pos] = *self.reg.borrow();
                pos += 1;
            }
        }
    }
}

And finally, put it together and use it:

fn main() {
    // Two example objects; these produce the serialized bytes asserted below
    let objects = [
        Object { object_type: 0, data: vec![1, 2, 3, 4] },
        Object { object_type: 1, data: vec![1, 2, 3, 4, 5, 6, 7, 8] },
    ];

    // `byte_buf` serves as a temporary register for communication between the future which
    // implements serialization and the AsyncSerializer wrapper
    let byte_buf = RefCell::new(0u8);
    // Create the future from an async function. Each time it is polled, it will write one byte to
    // `byte_buf`
    let future = pin!(polling_write(&objects, |b| *byte_buf.borrow_mut() = b));
    // Instantiate the wrapper which will drive the future
    let mut async_serializer = AsyncSerializer::new(future, &byte_buf);

    // A vec to store the fully written data
    let mut output = Vec::new();
    // A temporary small buffer. Data will be serialized one 3-byte chunk at a time into this buffer.
    let mut buf = [0; 3];
    loop {
        let write_size = async_serializer.read(&mut buf);
        output.extend_from_slice(&buf[0..write_size]);
        if write_size < buf.len() {
            break;
        }
    }

    assert_eq!(output, [5, 0, 0, 1, 2, 3, 4, 9, 0, 1, 1, 2, 3, 4, 5, 6, 7, 8]);
}

This is made a little more complicated by the requirement that it not use the heap. If we could use the heap, the future could be wrapped by Box::pin, which makes managing lifetimes easier. As it is, we need to pin the future on the stack, which means that the stack frame it is created on has to outlive the serialization process. To make this a little easier on the user, we can create a function that creates the future and the shared RefCell, wraps them up, and provides them to a user-provided callback:

/// Create a serializer for objects, and pass it to the provided callback
///
/// This allows for pinning the required data on the stack for the duration of the serializer
/// lifetime
pub fn serialize_objects(objects: &[Object], mut cb: impl FnMut(&mut dyn PersistSerializer)) {
    let reg = RefCell::new(0u8);
    let fut = pin!(polling_write(objects, |b| *reg.borrow_mut() = b));
    let mut serializer = AsyncSerializer::new(fut, &reg);
    cb(&mut serializer);
}

And then the usage becomes:

// A vec to store the fully written data
let mut async_output = Vec::new();

async_serialize::serialize_objects(&objects, |serializer| {
    // A temporary small buffer. Data will be serialized one 3-byte chunk at a time into this buffer.
    let mut buf = [0; 3];
    loop {
        let write_size = serializer.read(&mut buf);
        async_output.extend_from_slice(&buf[0..write_size]);
        if write_size < buf.len() {
            break;
        }
    }
});

Trade-offs

To be honest, I'm not 100% convinced this is a great idea. There are certainly times when it's better to write out a sequence of linear actions, but there are downsides! There is added complexity in creating and understanding the async function, and in dealing with the pinning (which is especially painful without alloc); also, stepping through async functions in the debugger can be unpleasant.

There is another, more complicated example in the repo, and splitting off the processing into the future adds a layer of communication: the "main" task cannot pass data into the future once it is created, so they have to communicate through a shared structure, which has to be Sync (at least if you want to avoid unsafe code), even though we can be sure the future will never actually be polled on another thread.