bugadani

This is a fairly low effort solution to #1010 mostly to try to motivate some discussion around this issue. The idea is that the Device trait, and everything around it might change less often than the smoltcp API surface and codebase. This means Device implementors can support multiple smoltcp versions with a single implementation - the biggest benefit is not being forced to react to breaking changes, unless they include changes to the device crate.

If the approach sounds viable, we can discuss how to make the split - whether the workspace approach is all right, or the device crate should live separately, is just a detail for me.

@bugadani bugadani force-pushed the device branch 2 times, most recently from 668ef07 to c89df35 Compare August 11, 2025 14:39
@bugadani bugadani force-pushed the device branch 2 times, most recently from 88d2939 to bb4b13d Compare August 11, 2025 14:45

codecov bot commented Aug 11, 2025

Codecov Report

❌ Patch coverage is 92.59259% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.41%. Comparing base (ac32e64) to head (34d0962).

Files with missing lines       Patch %   Lines
smoltcp-device/src/lib.rs      86.11%    5 Missing ⚠️
src/iface/interface/mod.rs     75.00%    2 Missing ⚠️
src/phy/fault_injector.rs      0.00%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1080      +/-   ##
==========================================
+ Coverage   80.33%   81.41%   +1.08%     
==========================================
  Files          81       77       -4     
  Lines       24339    24015     -324     
==========================================
+ Hits        19552    19553       +1     
+ Misses       4787     4462     -325     


@whitequark
Contributor

whitequark commented Aug 12, 2025

The end goal of supporting multiple incompatible smoltcp versions within one hardware abstraction layer crate is self-evidently desirable and reasonable to pursue.

I don't think splitting off the smoltcp-device crate mechanically is the way to go though. The reason this PR is as gnarly as it is, is that it attempts to pretend that a crate that was never designed to cleave at any module boundaries includes inter-module APIs that can be stabilized as-is. In this situation what you want to do is to define a smoltcp-device API independently of the things smoltcp happens to do today. This is not a trivial problem!

I'm also not sure what exactly would happen if you released e.g. smoltcp 0.13 and smoltcp-device 1.0 today and then needed to do a breaking change and release smoltcp-device 2.0 tomorrow. What is the procedure for doing that? Does smoltcp itself have to care beyond bumping the version number of the smoltcp-device crate concurrently with a new smoltcp crate release?

It sounds like the answer is "yes" since you want to support consumers that are lagging on a smoltcp 0.x version without any upgrades to them at all, meaning that if somebody pinned the patch version of smoltcp you'd expect that to still build and be usable in a project together with other users of the Device trait. This appears to me (I may be missing something) to limit the implementation strategies for smoltcp-device such that the only remaining viable one is "smoltcp-device N.x is implemented in terms of smoltcp-device (N+1).x whenever the latter becomes available".

This sounds like a very significant maintenance and testing burden! Who are the people who can't update their smoltcp crate versions? Do they have money? Are they going to put that money towards maintenance? While I can't speak for @Dirbaio, I will not sign up to do this sort of release engineering without being very well compensated for it.

@whitequark
Contributor

Or to summarize in a single sentence: it sounds to me like this PR comes with an implied request to support a (partially sparse) N:M mapping of smoltcp-device API variants to smoltcp API variants, but why would we accept that?

@bugadani
Author

bugadani commented Aug 12, 2025

I'm also not sure what exactly would happen if you released e.g. smoltcp 0.13 and smoltcp-device 1.0 today and then needed to do a breaking change and release smoltcp-device 2.0 tomorrow.

In that case we would need to implement support for smoltcp-device 2.0 in esp-radio along with 1.0. That is a relatively small burden that we can do fairly easily, and wouldn't require us to include all the code that comes with smoltcp 0.13, 14, 15, or any of the relevant releases - nor would it require us to come up with an alternative way to support separate smoltcp versions.

If compatibility needs to be broken, we can handle that. The goal is to reduce the number of times compatibility needs to be broken. Look at embassy-net and embassy-net-driver, which this PR tries to mimic: the driver crate was released 2 years ago, and hasn't seen relevant changes since. In the meantime, embassy-net went through 5 or so major releases. While smoltcp itself is relatively stable, the only relevant change I see in the past 2 years is #924.

It's obvious that if a user is stuck with a smoltcp-device 1.0 implementation, they would be locked out of future smoltcp improvements. Just like they are right now, if their network device only implements smoltcp 0.11's Device trait.


Supporting multiple versions of smoltcp itself isn't a big problem for us. We can easily create smoltcp-012, -013, ... features and copy-paste the Device/RxToken/TxToken implementations fairly easily. We would be forced to release esp-radio for each new smoltcp major release, which isn't a terribly hard problem either. But users will need to match the feature version to smoltcp, and smoltcp itself can be buried behind embassy-net. Updating embassy-net means the users need to figure out they need to change an esp-radio feature flag, which is pretty hard to explain without tying unrelated crates' documentations together in some way.
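The copy-paste approach described above can be sketched as follows. This is only an illustration: `stack_v012` and `stack_v013` are hypothetical stand-ins for two semver-incompatible stack releases, `Radio` is a made-up driver type, and the trait is reduced to a single method rather than the real `Device` API. The point is that each release brings its own, unrelated trait, so the driver has to carry one impl per release.

```rust
// Hypothetical stand-ins for two semver-incompatible releases of a stack crate.
// Each defines its own `Device` trait, so the two traits are unrelated types.
mod stack_v012 {
    pub trait Device {
        fn transmit(&mut self, frame: &[u8]) -> usize;
    }
    pub fn send<D: Device>(dev: &mut D, frame: &[u8]) -> usize {
        dev.transmit(frame)
    }
}

mod stack_v013 {
    pub trait Device {
        fn transmit(&mut self, frame: &[u8]) -> usize;
    }
    pub fn send<D: Device>(dev: &mut D, frame: &[u8]) -> usize {
        dev.transmit(frame)
    }
}

/// Made-up driver type that counts the bytes it was asked to send.
pub struct Radio {
    pub sent: usize,
}

// In a real driver crate this impl would sit behind a feature such as
// `#[cfg(feature = "smoltcp-012")]` (feature name taken from the comment above).
impl stack_v012::Device for Radio {
    fn transmit(&mut self, frame: &[u8]) -> usize {
        self.sent += frame.len();
        frame.len()
    }
}

// The same body again, pasted for the next release; it would sit behind
// `#[cfg(feature = "smoltcp-013")]`.
impl stack_v013::Device for Radio {
    fn transmit(&mut self, frame: &[u8]) -> usize {
        self.sent += frame.len();
        frame.len()
    }
}
```

With a shared trait crate, both stack releases would import the same trait and a single impl would be enough, which is the motivation given for the split.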

@whitequark
Contributor

It's obvious that if a user is stuck with a smoltcp-device 1.0 implementation, they would be locked out of future smoltcp improvements. Just like they are right now, if their network device only implements smoltcp 0.11's Device trait.

I'm not convinced it's obvious: right now smoltcp ends up being an implementation detail, a contract between two sub-components, and issues arise because there's a version disagreement. The problem of what to do in case of incompatibility doesn't go anywhere (the end user would still have to somehow figure out that it's a smoltcp version compatibility issue, get the dependency tree to be updated properly, etc), the only real difference with this proposal is that the frequency of this becomes much lower. I agree that this is desirable and that some sort of solution is needed! I'm just concerned about what happens when we bump the major of smoltcp-driver and the whole sub-ecosystem somehow breaks because people have made unsound assumptions.

@whitequark
Contributor

Anyway, let's discuss the possible implementation strategies:

  1. smoltcp to smoltcp-device is 1:1, lives in this repo, we get rid of the features in smoltcp-device and bump the minor or the major whenever we add protocol-specific fields
  2. same as (1) but we make smoltcp-device a separate repo under the org
  3. exists to make smoltcp to smoltcp-device N:M (e.g. an adapter crate); we put smoltcp-device and that other thing in their own repos with independent history

It sounds like you're not actually interested in (3) so that's out. Personally I'm leaning towards (2) specifically because we are creating a new stable, versioned API surface. Workspaces and monorepos work well when you expect to be doing cross-cutting changes to many components at once. But we very much do not expect to be doing cross-cutting changes to smoltcp + smoltcp-device: if we did it would defeat the purpose of your request.

Thoughts?

@Dirbaio
Member

Dirbaio commented Aug 12, 2025

I do think this is something we want to do. The "small crate with just the trait" has worked quite well in Embassy to ease the pain of releases, and the problem here is exactly the same.

It makes me sad that the API surface of smoltcp-driver has to be so large though, including time and DeviceCapabilities.

There's also the ipv4/ipv6 and medium Cargo features, which could cause trouble. If you enable medium-ethernet and medium-ip in smoltcp-device, but only medium-ethernet in smoltcp, the build will fail because lots of matches in smoltcp are no longer exhaustive. embassy-net-driver doesn't have any Cargo features, it just includes all enum cases unconditionally, and then embassy-net just panics if you use a device that has a not-enabled medium. This has a runtime cost though, not sure if we're willing to accept that here.
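A minimal sketch of the "all enum cases, check at runtime" approach described above. The constants, `stack_supports` and `attach` are hypothetical names invented for this illustration; a real stack would presumably derive the flags from its own `medium-*` features via `cfg!`, and this is not how embassy-net is actually written.

```rust
/// Every medium is always present in the shared driver crate (no Cargo features).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Medium {
    Ethernet,
    Ip,
    Ieee802154,
}

// Stand-ins for the stack's own `medium-*` features. Plain constants keep the
// sketch self-contained; which media are "enabled" here is arbitrary.
const STACK_HAS_ETHERNET: bool = true;
const STACK_HAS_IP: bool = false;
const STACK_HAS_IEEE802154: bool = false;

/// Reports whether the stack was built with support for `medium`.
pub fn stack_supports(medium: Medium) -> bool {
    match medium {
        Medium::Ethernet => STACK_HAS_ETHERNET,
        Medium::Ip => STACK_HAS_IP,
        Medium::Ieee802154 => STACK_HAS_IEEE802154,
    }
}

/// Runtime check performed when a device is attached. The match above stays
/// exhaustive however the stack is configured; the price is this branch and
/// its panic path in the final binary.
pub fn attach(medium: Medium) -> Medium {
    assert!(
        stack_supports(medium),
        "medium {:?} is not enabled in the stack",
        medium
    );
    medium
}
```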

the only real difference with this proposal is that the frequency of this becomes much lower

yes, the goal is just to reduce the frequency of breaking changes. If there's some legitimate need to change the driver contract we'd still change it.

smoltcp to smoltcp-device is 1:1, lives in this repo, we get rid of the features in smoltcp-device and bump the minor or the major whenever we add protocol-specific fields

This is what I'd do.

  • Separate repos is more work to manage (two histories, two CIs, syncing things between them). It's true most changes will be to smoltcp only, but when there is a change to smoltcp-device it is cross-cutting, it's nice to be able to review+merge everything atomically.
  • I wouldn't bother with adapter crates. When we do a smoltcp-device major release we can just do a clean break, new smoltcps will be compatible with the new smoltcp-device only.

@bugadani
Author

bugadani commented Aug 12, 2025

It sounds like you're not actually interested in (3) so that's out.

I agree with you (at least if I understood your stance on this correctly) that attempting this would be probably unreasonably difficult. I think only abandoned implementations would be left behind, the others should have a reasonable upgrade path.

and bump the minor or the major whenever we add protocol-specific fields

This part worries me a bit. I agree we can remove the features, but I'd personally prefer adding new protocols/media to be a non-breaking change. I think the API can be formulated in a way to allow this.

The feature problem could be split in two:

  • A set of requires-* features enabled by smoltcp
  • A set of provides-* features enabled by the implementor.

If these sets don't match, the driver crate can emit a nice build error. If the user naughtily enables a feature that shouldn't be, the build would fail in some other way, but that's the user's fault and problem.
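One way the build-time check could look, as a rough sketch. The feature names are hypothetical, and treating any difference between the two sets as an error is an assumption taken from the literal wording above; a real policy might only reject one direction. With neither feature enabled, the `compile_error!` is compiled out.

```rust
// Hypothetical feature names; nothing here is taken from the PR's actual code.
// The stack would enable `requires-medium-ethernet`, a device implementor
// would enable `provides-medium-ethernet`. Because Cargo unifies features, the
// shared crate sees the union of both sets and can compare them.
#[cfg(any(
    all(feature = "requires-medium-ethernet", not(feature = "provides-medium-ethernet")),
    all(feature = "provides-medium-ethernet", not(feature = "requires-medium-ethernet"))
))]
compile_error!("medium-ethernet: the `requires-*` and `provides-*` feature sets do not match");

/// The same rule as a plain function, so the policy can be unit-tested
/// without toggling Cargo features.
pub const fn sets_match(requires: bool, provides: bool) -> bool {
    requires == provides
}
```

One such block per medium would be needed. The error message is the only place where the user learns which side to fix, so it is worth spelling out both feature names there.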

One thing that may go wrong, is multiple device implementors in a project. For example, esp-radio may provide a device that supports the ieee802154 medium, but the user may use an external ethernet chip, too, which doesn't. The two device impl crates should somehow end up building together without knowing about each other.

same as (1) but we make smoltcp-device a separate repo under the org

I don't have strong opinions about where the code should live, but a single repo is easier to work with when the device crate needs to be changed.

including time and DeviceCapabilities.

I'm not sure there is a way around DeviceCapabilities, it's after all a property of the underlying device. time, however, makes me sad as well - we only need Instant in fns transmit and receive. We could remove that part by making micros: u64 the API (although the docs on the functions say milliseconds 🤔), but I dislike losing the meaning of the type.

@whitequark
Contributor

whitequark commented Aug 12, 2025

I agree with you (at least if I understood your stance on this correctly) that attempting this would be probably unreasonably difficult.

To clarify my position a bit: while it is difficult, the reason I don't want to do it is because I find it hard to believe that anybody will fund the development of the necessary infrastructure. If someone wants it enough I'm more or less completely fine with this proposal; I just don't want to have a half-baked version of it bitrotting for the next 5 years while becoming a major tech support problem.

I think only abandoned implementations would be left behind, the others should have a reasonable upgrade path.

To clarify my conclusions that I think are confirmed by this statement: Would you agree that the change we're discussing keeps the nature of the problem intact (crate with Device trait has more than one incompatible version installed) but reduces the perceived/experienced/felt severity of consequences by making the problem occur less often? Which is to say, you are prepared for smoltcp-device to do semver-breaking changes every once in a while, and the crux of your proposal is that there should simply be fewer of them; is this correct?

I agree we can remove the features, but I'd personally prefer adding new protocols/media to be a non-breaking change. I think the API can be formulated in a way to allow this.

To clarify my position here: I think it is not worth the complexity and headache (I think Cargo still can't quite manage features right when cross-compiling) to have features in smoltcp-device whose purpose is to make some transient data structures marginally smaller.

I am not opposed to other modifications of the feature system being used, and I think the requires/provides system you're describing here is both novel and very promising. (At least, novel to me. If you've seen it in action that's even more interesting.)

One thing that may go wrong, is multiple device implementors in a project. For example, esp-radio may provide a device that supports the ieee802154 medium, but the user may use an external ethernet chip, too, which doesn't. The two device impl crates should somehow end up building together without knowing about each other.

As far as I understand, the fundamental point or the driving force of this PR is this exact situation: having two or more implementers. In light of this we should try to analyze the proposed system by assuming that there will always be multiple incompatible implementers, otherwise we could easily miss some "rare" issues that in practice will be common because people don't necessarily upgrade their dependencies at any regular cadence, or sometimes ever. (Of course it should also work for one implementer, it goes without saying.)

I don't have strong opinions about where the code should live, but a single repo is easier to work with when the device crate needs to be changed.

I think if you are aiming for a separate and distinct public API you should "show your work" by doing the changes separately; not because I want you to suffer but because doing so inherently validates, by way of procedure, the case where mismatching versions of the library are present. It also makes it abundantly clear to contributors (and to some extent downstream users) that the crates are meant to change at their own separate timescales.

(You could say that I want you to suffer exactly the same as external users would, though the idea is of course to make sure nobody involved suffers more than a trivial amount.)

time, however, makes me sad as well - we only need Instant in fns transmit and receive. We could remove that part by making micros: u64 the API (although the docs on the functions say milliseconds 🤔), but I dislike losing the meaning of the type.

How about a hard dependency on embedded-time? It is a bit ugly but with the current solution the ugliness is just moved to the firmware... (Actually, maybe not? Last release 4 years ago, version 0.12.1, who knows if we'll paint ourselves into a corner with this module too...)

Can you show me an example where simply using smoltcp::time::Instant the way it currently is causes problems? It's only used transiently, and not really typically shared between crates, so the considerations for Device do not apply.

I'm not sure there is a way around DeviceCapabilities

What I meant is that I think all of the fields in DeviceCapabilities should be present regardless of the features used. But now that I looked it over, even if we fix the interface, the implementation still needs to know which features are enabled. I suspect that the right solution to this problem would be to restructure the code in smoltcp::phy so you no longer need to do this in smoltcp-device.

ip_mtu() is only used via the Interface::ip_mtu() anyway, which makes it easy to remove from the Device trait. So what do we have after that?

  • Device
  • RxToken/TxToken
  • DeviceCapabilities
  • ChecksumCapabilities
  • Checksum
  • Medium
  • PacketMeta
  • Instant or whatever we do with it

is that all?
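To get a feel for how small that surface is, here is a simplified sketch of the listed items together with a toy implementor. It is loosely modelled on the shape of smoltcp's `Device` trait, but the field sets are trimmed, `ChecksumCapabilities`, `Checksum` and `PacketMeta` are left out, the `Instant` unit is an assumption, and `Loopback`, `LoopRx` and `LoopTx` are illustrative names rather than the PR's code.

```rust
use std::collections::VecDeque;

/// Timestamp newtype; microseconds are an assumption of this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Instant(pub i64);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Medium {
    Ethernet,
    Ip,
    Ieee802154,
}

/// Trimmed-down capabilities; the real struct carries more fields.
#[derive(Debug, Clone, Copy)]
pub struct DeviceCapabilities {
    pub medium: Medium,
    pub max_transmission_unit: usize,
}

pub trait RxToken {
    fn consume<R, F: FnOnce(&[u8]) -> R>(self, f: F) -> R;
}

pub trait TxToken {
    fn consume<R, F: FnOnce(&mut [u8]) -> R>(self, len: usize, f: F) -> R;
}

pub trait Device {
    type RxToken<'a>: RxToken where Self: 'a;
    type TxToken<'a>: TxToken where Self: 'a;

    fn receive(&mut self, now: Instant) -> Option<(Self::RxToken<'_>, Self::TxToken<'_>)>;
    fn transmit(&mut self, now: Instant) -> Option<Self::TxToken<'_>>;
    fn capabilities(&self) -> DeviceCapabilities;
}

/// Toy implementor: every transmitted frame is queued for reception.
#[derive(Default)]
pub struct Loopback {
    queue: VecDeque<Vec<u8>>,
}

pub struct LoopRx(Vec<u8>);
pub struct LoopTx<'a>(&'a mut VecDeque<Vec<u8>>);

impl RxToken for LoopRx {
    fn consume<R, F: FnOnce(&[u8]) -> R>(self, f: F) -> R {
        f(&self.0)
    }
}

impl TxToken for LoopTx<'_> {
    fn consume<R, F: FnOnce(&mut [u8]) -> R>(self, len: usize, f: F) -> R {
        let LoopTx(queue) = self;
        let mut frame = vec![0u8; len];
        let result = f(&mut frame); // let the caller fill the frame
        queue.push_back(frame); // "send" it by queueing it for reception
        result
    }
}

impl Device for Loopback {
    type RxToken<'a> = LoopRx;
    type TxToken<'a> = LoopTx<'a>;

    fn receive(&mut self, _now: Instant) -> Option<(Self::RxToken<'_>, Self::TxToken<'_>)> {
        let frame = self.queue.pop_front()?;
        Some((LoopRx(frame), LoopTx(&mut self.queue)))
    }

    fn transmit(&mut self, _now: Instant) -> Option<Self::TxToken<'_>> {
        Some(LoopTx(&mut self.queue))
    }

    fn capabilities(&self) -> DeviceCapabilities {
        DeviceCapabilities { medium: Medium::Ethernet, max_transmission_unit: 1514 }
    }
}
```

Nothing in the implementor refers to sockets, interfaces or protocol parsing, which is what would let such a crate change more slowly than the stack.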

@bugadani
Author

bugadani commented Aug 12, 2025

To clarify my position a bit: while it is difficult, the reason I don't want to do it is because I find it hard to believe that anybody will fund the development of the necessary infrastructure. If someone wants it enough I'm more or less completely fine with this proposal; I just don't want to have a half-baked version of it bitrotting for the next 5 years while becoming a major tech support problem.

I understand and agree.

To clarify my conclusions that I think are confirmed by this statement: Would you agree that the change we're discussing keeps the nature of the problem intact (crate with Device trait has more than one incompatible version installed) but reduces the perceived/experienced/felt severity of consequences by making the problem occur less often? Which is to say, you are prepared for smoltcp-device to do semver-breaking changes every once in a while, and the crux of your proposal is that there should simply be fewer of them; is this correct?

Just as Dario said above my comment, reducing the frequency of breaking changes is the goal, yes.

As far as I understand, the fundamental point or the driving force of this PR is this exact situation: having two or more implementers.

The part you're quoting is related to the provides/requires pattern. The point I was trying to make is that IF we decide to have media (or any other) features in the device crate, we'll have to make sure they are additive as they should be. Right now, the Medium enum's Default implementation is probably the only thing affected by this - the value returned would depend on the crates present in a project.

This may be an issue even right now, if implementor crates depend on smoltcp while enabling any of the medium-* features.

Can you show me an example where simply using smoltcp::time::Instant the way it currently is causes problems?

I don't think I understand what you are asking here. There are no problems with it, except that moving it out into the device crate is an unfortunate side-effect.

What I meant is that I think all of the fields in DeviceCapabilities should be present regardless of the features used.

I'm not opposed to simplifying the problem. I agree with you that if the purpose of proto-ipv4/6 in the device crate would be to cut out a single field in smoltcp-device, it's not worth having.

ip_mtu() is only used via the Interface::ip_mtu() anyway, which makes it easy to remove from the Device trait.

Indeed, I should have even noticed this before opening this PR.

is that all?

Besides the types you've listed, what remains is the packetmeta-id and std features. std can be easily forwarded from smoltcp, but packetmeta-id is visible from the implementor crate. I admit I don't understand its purpose, and I don't have an idea what to do with it without a big mess of accessor functions.
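For reference, the accessor route does not have to be a big mess if the metadata stays small. The sketch below is a guess at the general shape only: the private field and the `id`/`set_id` names are assumptions, not the API the PR ended up with.

```rust
/// Opaque per-packet metadata. Because the field is private, implementor
/// crates never name it directly, so whether and how it is stored can change
/// without breaking them.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct PacketMeta {
    id: u32,
}

impl PacketMeta {
    /// Returns the packet identifier (0 when none was set).
    pub fn id(&self) -> u32 {
        self.id
    }

    /// Sets the packet identifier.
    pub fn set_id(&mut self, id: u32) {
        self.id = id;
    }
}
```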


Updates:

  • Both Instant and Duration will need to stay in the driver crate, to allow implementing Instant - Instant = Duration.
  • packetmeta-id resolved by replacing the field with accessors.
  • Implemented the provides-*/requires-* split. provides enables Media variants, requires is used to cross-check the capabilities with what's enabled in smoltcp.
  • Extracted the unix-specific and test-specific interfaces to new crates. They should also serve as examples of what an implementation might look like.
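The first update above follows from how operator traits work: the subtraction impl has to live in the crate that defines the timestamp type, and its output type has to be nameable from there. A minimal sketch, assuming microsecond fields and constructor names that may differ from the actual crate:

```rust
use core::ops::Sub;

/// Point in time, stored as microseconds (an assumption of this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Instant {
    micros: i64,
}

/// Non-negative span of time in microseconds.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Duration {
    micros: u64,
}

impl Instant {
    pub const fn from_micros(micros: i64) -> Self {
        Instant { micros }
    }

    pub const fn from_millis(millis: i64) -> Self {
        Instant { micros: millis * 1000 }
    }
}

impl Duration {
    pub const fn from_micros(micros: u64) -> Self {
        Duration { micros }
    }

    pub const fn total_micros(&self) -> u64 {
        self.micros
    }
}

// This impl must be written in the crate that owns `Instant`, and it names
// `Duration` as its output. If `Duration` stayed in the stack crate, the
// device crate would have to depend on the stack, defeating the split.
impl Sub for Instant {
    type Output = Duration;

    fn sub(self, rhs: Instant) -> Duration {
        // Absolute difference, so the result is valid in either order.
        Duration::from_micros(self.micros.abs_diff(rhs.micros))
    }
}
```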

@bugadani bugadani force-pushed the device branch 15 times, most recently from a2c3a65 to 46fba9f Compare August 29, 2025 15:43
@bugadani bugadani force-pushed the device branch 9 times, most recently from 85a476b to 42bf1ed Compare August 29, 2025 16:34
@bugadani bugadani marked this pull request as ready for review August 29, 2025 16:43
@bugadani bugadani changed the title Move Device and time to new crate Decouple device implementations via smoltcp-device Aug 29, 2025
@MabezDev
Contributor

MabezDev commented Sep 2, 2025

Gentle ping here :)

We think this is a good idea, and the PR is now in a good spot (from our perspective). We'd be very open to helping with the maintenance/infra, though I don't know how that would look. To reiterate, we're looking for solutions to avoid a million smoltcp X feature flags once we stabilize our wifi driver. This is just one solution, which we think is a good one and which has proven to work well in embassy-net, but we're open to suggestions.

4 participants