Using a Rust async function as a polled state machine

How to poll an async function manually, and then use the Rust compiler as a finite-state-machine generator

2025-05-16

Introduction

An interesting thing about Rust async functions is that the async keyword tells the compiler to automatically convert your function into a state machine, so that it can pick up where it left off, returning a core::task::Poll result every time its poll method is called. The resulting future object gets packed with any data it needs to keep track of its execution state.
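
As a rough illustration, here is a hand-written sketch of the kind of state machine this produces: an enum of suspension points that implements Future. This is not the compiler's actual output (which also captures local variables and handles pinning), just the general shape:

use core::future::Future;
use core::pin::Pin;
use core::task::{Context, Poll};

/// A hand-written stand-in for the state machine rustc generates from a
/// simple async fn with one suspension point
enum TwoStep {
    Start,
    Suspended,
    Done,
}

impl Future for TwoStep {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        // TwoStep is Unpin, so we can safely take a plain &mut
        let this = self.get_mut();
        match this {
            TwoStep::Start => {
                // First poll: record where we stopped and yield
                *this = TwoStep::Suspended;
                Poll::Pending
            }
            TwoStep::Suspended => {
                // Second poll: pick up where we left off and finish
                *this = TwoStep::Done;
                Poll::Ready(42)
            }
            TwoStep::Done => panic!("future polled after completion"),
        }
    }
}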

Normally, one doesn't call this poll method directly, but uses an executor like tokio or lilos to poll it. The executor doesn't necessarily poll it in a busy loop; it may end up blocking the OS thread while waiting for file handles to be ready, or waiting for an interrupt to fire to flag the task as ready to run. But there's nothing stopping you from calling poll yourself!
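
To make that concrete, here is a minimal sketch (busy_block_on is my name, not a standard API) that drives a future to completion by polling in a loop. Real executors park the thread instead of spinning like this:

use core::future::Future;
use core::pin::pin;
use core::task::Poll;

/// Busy-poll a future until it completes. Only a sketch: with a no-op
/// waker there is no wakeup notification, so this spins between polls.
fn busy_block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let mut cx = futures::task::Context::from_waker(futures::task::noop_waker_ref());
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}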

I've done this a few times, in a couple of basic scenarios, and I think it can be useful. At least, it's useful to know how to do it. The basic premise is that sometimes you need to build a state machine that gets called periodically and steps through a process. For complicated state machines, this can mean keeping track of a lot of state transitions with big match statements. It's fine, but sometimes I find that a state machine written that way is hard to reason about, compared to performing the same sequence of events in a blocking manner.

The full demo code from this article can be found on GitHub

Linear sequence vs state machine

As a simple comparison, consider a sequence of register initializations:

/// A somewhat pseudo-codey demonstration of sending a series of commands
/// to a device and waiting for an acknowledgement
fn init_device() {
    let device = ImaginaryDeviceCommander::new();
    device.command("write REG1 10");
    while !device.is_ready() {}
    device.command("write REG2 20");
    while !device.is_ready() {}
    device.command("write REG3 30");
    while !device.is_ready() {}
}

But now imagine that our process can't block. Maybe it is part of a "super loop": an application with a single thread that cyclically calls many modules so they can update themselves. Now the device initialization process needs to be refactored so that it can be performed iteratively over a series of function calls that always return quickly, something like this:

struct DeviceInitter {
    command_sent: bool,
    step: usize,
    device: ImaginaryDeviceCommander,
}

impl DeviceInitter {
    /// Advance the initialization by one step. Returns true once complete.
    pub fn run(&mut self) -> bool {
        if self.step < 3 {
            match self.command_sent {
                true => {
                    if self.device.is_ready() {
                        self.step += 1;
                        self.command_sent = false;
                    }
                }
                false => {
                    match self.step {
                        0 => self.device.command("write REG1 10"),
                        1 => self.device.command("write REG2 20"),
                        2 => self.device.command("write REG3 30"),
                        _ => unreachable!(),
                    }
                    self.command_sent = true;
                }
            }
        }
        // Complete once all three commands have been acknowledged
        self.step == 3
    }
}

The first version certainly seems nicer to read and understand!

How to manually poll an async function

There are two main things to know:

  1. You have to pin it
  2. You have to create a dummy context to pass to poll

Both are easy to do!

Finally, you need to make sure your future yields at appropriate times. Normally, async functions yield back to the executor when they stop to wait on some I/O operation from the OS, such as reading a network socket. But you can also use the pending!() macro from the futures crate to yield at any point.

Here is a function to run a future once, and a simple state machine demoing its use:

use core::future::Future;
use core::pin::Pin;

/// Poll a future one time, and return its result if it completes
pub fn poll_once<T>(
    mut f: Pin<&mut dyn Future<Output = T>>,
) -> Option<T> {
    // A no-op waker: we poll on our own schedule, so nothing needs waking
    let mut cx = futures::task::Context::from_waker(
        futures::task::noop_waker_ref(),
    );

    match f.as_mut().poll(&mut cx) {
        core::task::Poll::Ready(result) => Some(result),
        core::task::Poll::Pending => None,
    }
}

use core::pin::pin;
use futures::pending;

fn main() {
    let mut future = pin!(async {
        let mut i = 0;
        loop {
            if i == 2 {
                return 42;
            } else {
                i += 1;
                // Yield back to the executor. This means that
                // the future's `poll` function will return
                // `Poll::Pending` and the subsequent call will
                // pick up here
                pending!()
            }
        }
    });

    // i = 0. Not done.
    assert_eq!(poll_once(future.as_mut()), None);
    // i = 1. Not done.
    assert_eq!(poll_once(future.as_mut()), None);
    // i = 2. Done!
    assert_eq!(poll_once(future.as_mut()), Some(42));
}

There's no async runtime like tokio or async-io here; the only dependency is the futures crate.
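
As a sketch of how this applies to the earlier example, the device initialization sequence could be written as an async fn (using the pseudo-codey ImaginaryDeviceCommander from before) and stepped from the super loop:

/// The device init from earlier as an async fn: each pending!() hands
/// control back to the super loop, and the next poll resumes right here
async fn init_device_async(device: &ImaginaryDeviceCommander) {
    device.command("write REG1 10");
    while !device.is_ready() { pending!() }
    device.command("write REG2 20");
    while !device.is_ready() { pending!() }
    device.command("write REG3 30");
    while !device.is_ready() { pending!() }
}

Each super-loop iteration then just calls poll_once on the pinned future until it returns Some(()), with no hand-rolled step counter in sight.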

Use case: chunked serialization

One reason I wrote this article is that a library I am working on needed to serialize a handful of data structures in an embedded context to persist them to flash. I want to impose minimal requirements on the application, so:

  • No heap usage, which means no returning a Vec<u8>.
  • Do not assume the serialized output will fit in RAM. It has to be writable to flash in small chunks; only the application knows what "small" means here.

This means the serializer has to expect a series of read(buf: &mut [u8]) calls, copy the next N bytes into the buffer, and then keep track of where it was so that it can pick up where it left off on the next call to read. This is kind of a pain in the ass! It means, e.g., that when serializing a u32, it may end up writing the first byte on one call to read, and the remaining three bytes on the next.
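
For a taste of the bookkeeping this implies, here is a hypothetical fragment (the names are mine, not from the library) for resuming just a single u32 that straddles a chunk boundary:

/// Manual resume state for one u32 write. The async approach makes
/// this bookkeeping disappear into the generated state machine.
struct U32WriteState {
    bytes: [u8; 4],
    next: usize, // index of the next byte still to be emitted
}

impl U32WriteState {
    /// Copy the remaining bytes into buf; returns how many were written
    fn resume(&mut self, buf: &mut [u8]) -> usize {
        let n = buf.len().min(4 - self.next);
        buf[..n].copy_from_slice(&self.bytes[self.next..self.next + n]);
        self.next += n;
        n
    }
}

And that is just one field; every struct and list in the data model needs similar resume logic.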

I started writing a state machine to do this, and then decided to try it as an async function. As an example for this article, I simplified the serialization a bit, but it is essentially the same. We are going to serialize some objects that look like this:

/// Just an example of an object to be serialized.
/// Each object has a type, and a block of bytes to describe it.
struct Object {
    pub object_type: u8,
    pub data: Vec<u8>,
}

They will get serialized into the flash as [<length:u16> <object_type:u8> <data:n>], where length is the length of the data plus the object type byte. For example, an object with object_type 7 and data [0xAA, 0xBB] serializes to [0x03, 0x00, 0x07, 0xAA, 0xBB]: a little-endian length of 3, the type byte, then the data.

So first, I need to create an async function that serializes the data, but yields after writing out each byte:

/// Utility to write bytes to a provided function, returning a Poll::Pending between each byte
async fn write_bytes(src: &[u8], mut write_fn: impl FnMut(u8)) {
    for &byte in src {
        write_fn(byte);
        pending!()
    }
}

/// Implements a serializer for a list of objects.
///
/// Implementing this as an async function allows the sequence of writes to be written linearly and
/// simply, allowing rustc to compile it into a state machine so that the serialization can be
/// broken up into arbitrary chunks
async fn polling_write(objects: &[Object], mut write_fn: impl FnMut(u8)) {
    for obj in objects {
        // First serialize the size of the object, which is the length of the data + 1 byte for the
        // object type
        let len = (obj.data.len() + 1) as u16;
        write_bytes(&len.to_le_bytes(), &mut write_fn).await;
        // Serialize the object type
        write_bytes(&[obj.object_type], &mut write_fn).await;
        // Serialize the object data
        write_bytes(&obj.data, &mut write_fn).await;
    }
}

Great, that wasn't so bad.

Now, let's wrap the future up in a struct that can provide the required read function:

use core::cell::RefCell;

pub struct AsyncSerializer<'a, 'b, F: Future<Output = ()>> {
    fut: Pin<&'a mut F>,
    reg: &'b RefCell<u8>,
}

impl<'a, 'b, F> AsyncSerializer<'a, 'b, F>
where
    F: Future<Output = ()>,
{
    /// Create a serializer using a future and a shared data buffer
    ///
    /// The future should write to reg, and it *must yield after writing each byte*.
    pub fn new(fut: Pin<&'a mut F>, reg: &'b RefCell<u8>) -> Self {
        Self { fut, reg }
    }

    pub fn read(&mut self, buf: &mut [u8]) -> usize {
        let mut pos = 0;
        loop {
            if pos >= buf.len() {
                return pos;
            }

            if poll_once(self.fut.as_mut()).is_some() {
                // Serialization is complete
                return pos;
            } else {
                // Pending means the future wrote exactly one byte to reg
                buf[pos] = *self.reg.borrow();
                pos += 1;
            }
        }
    }
}

And finally, put it together and use it:

fn main() {
    // Example objects to serialize (these match the expected output asserted below)
    let objects = [
        Object { object_type: 0, data: vec![1, 2, 3, 4] },
        Object { object_type: 1, data: vec![1, 2, 3, 4, 5, 6, 7, 8] },
    ];

    // `byte_buf` serves as a temporary register for communication between the future which
    // implements serialization and the AsyncSerializer wrapper
    let byte_buf = RefCell::new(0u8);
    // Create the future from an async function; each time it is polled, it will write one byte to
    // `byte_buf`
    let future = pin!(polling_write(&objects, |b| *byte_buf.borrow_mut() = b));
    // Instantiate the wrapper which will drive the future
    let mut async_serializer = AsyncSerializer::new(future, &byte_buf);

    // A vec to store the fully written data
    let mut output = Vec::new();
    // A temporary small buffer. Data will be serialized one 3-byte chunk at a time into this buffer.
    let mut buf = [0; 3];
    loop {
        let write_size = async_serializer.read(&mut buf);
        output.extend_from_slice(&buf[0..write_size]);
        if write_size < buf.len() {
            break;
        }
    }

    assert_eq!(output, [5,0,0,1,2,3,4, 9,0,1,1,2,3,4,5,6,7,8]);
}

This is made a little more complicated by the requirement that it not use the heap. If we could use the heap, the future could be wrapped in Box::pin, which makes managing lifetimes easier. As it is, we need to pin the future on the stack, which means that the stack frame it is created in has to outlive the serialization process. To make this a little easier on the user, we can create a function that creates the future and the shared RefCell, wraps them up, and provides them to a user-provided callback:

/// Create a serializer for objects, and pass it to the provided callback
///
/// This allows for pinning the required data on the stack for the duration of the serializer
/// lifetime
pub fn serialize_objects(objects: &[Object], mut cb: impl FnMut(&mut dyn PersistSerializer)) {
    let reg = RefCell::new(0u8);
    let fut = pin!(polling_write(objects, |b| *reg.borrow_mut() = b));
    let mut serializer = AsyncSerializer::new(fut, &reg);
    cb(&mut serializer);
}
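
The PersistSerializer trait isn't defined in this article (it comes from the demo repo), but a minimal sketch that matches this usage might look like:

use core::future::Future;

/// A sketch of the chunked-read interface assumed by serialize_objects
pub trait PersistSerializer {
    /// Copy up to buf.len() serialized bytes into buf, returning the count written
    fn read(&mut self, buf: &mut [u8]) -> usize;
}

impl<F: Future<Output = ()>> PersistSerializer for AsyncSerializer<'_, '_, F> {
    fn read(&mut self, buf: &mut [u8]) -> usize {
        // Delegate to the inherent read method defined earlier
        AsyncSerializer::read(self, buf)
    }
}

Passing &mut dyn PersistSerializer to the callback also hides the future's concrete type, which would otherwise leak the unnameable async fn type into the caller's signature.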

And then the usage becomes:

// A vec to store the fully written data
let mut async_output = Vec::new();

async_serialize::serialize_objects(&objects, |serializer| {
    // A temporary small buffer. Data will be serialized one 3-byte chunk at a time into this buffer.
    let mut buf = [0; 3];
    loop {
        let write_size = serializer.read(&mut buf);
        async_output.extend_from_slice(&buf[0..write_size]);
        if write_size < buf.len() {
            break;
        }
    }
});

Trade-offs

To be honest, I'm not 100% convinced this is a great idea. There are for sure times when it's better to write out a sequence of linear actions. But there are downsides! There is added complexity in creating and understanding the async function, dealing with the pinning (which is especially painful without alloc), and stepping through async functions in the debugger can be unpleasant.

There is another, more complicated example in the repo. Splitting the processing off into the future adds a layer of communication: the "main" task cannot pass data into the future once it is created, so the two have to communicate through a shared structure, which has to be Sync (at least if you want to avoid unsafe code), even though we can actually be sure the future will never be executed on another thread.
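
As a sketch of that shared-structure pattern (a minimal single-threaded version of my own, where a plain Cell suffices; the repo's example needs stronger bounds), the main loop writes into a shared cell that the future reads each time it is resumed:

use core::cell::Cell;
use core::pin::pin;
use futures::pending;

fn main() {
    // Shared mailbox: the main loop writes, the future reads when resumed
    let input = Cell::new(0u32);

    let mut fut = pin!(async {
        let mut total = 0;
        loop {
            pending!(); // wait for the main loop to provide the next value
            total += input.get();
            if total >= 10 {
                return total;
            }
        }
    });

    let mut n = 0;
    let result = loop {
        // poll_once is the helper defined earlier in this article
        if let Some(r) = poll_once(fut.as_mut()) {
            break r;
        }
        n += 1;
        input.set(n); // pass new data in through the shared cell
    };
    assert_eq!(result, 10); // 1 + 2 + 3 + 4
}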