Less than ideal codegen for iteration

The way query iteration is currently implemented leads to inefficient generated code. On my machine, `iterate_mut_100k` runs in around 29 µs, while the same iteration written against explicit archetypes runs in around 10 µs. Looking at the assembly, the faster version benefits from loop unrolling and better auto-vectorization.

From previous experience writing code like this in Rust, I'd say the issue is caused by having to handle the case where we need to move to the next archetype while iterating over the query. I haven't tested it yet, but I'd expect similar or even worse performance if the archetypes were `Iterator::chain`ed together.
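To make the suspected cause concrete, here is a minimal, std-only sketch of the two loop shapes. `Chunk` and `FlatQuery` are hypothetical stand-ins for an archetype's columns and a query iterator; this is not hecs's actual implementation. The flat iterator has to check for a chunk boundary inside every `next()` call, whereas the nested version exposes a plain slice loop to the optimizer.

```rust
use std::iter::Zip;
use std::slice::{Iter, IterMut};

/// Hypothetical stand-in for one archetype's component columns.
#[derive(Clone, Debug, PartialEq)]
struct Chunk {
    pos: Vec<f32>,
    vel: Vec<f32>,
}

/// Hypothetical flat iterator over all chunks (assumed shape, not hecs code).
struct FlatQuery<'a> {
    chunks: IterMut<'a, Chunk>,
    current: Zip<IterMut<'a, f32>, Iter<'a, f32>>,
}

impl<'a> FlatQuery<'a> {
    fn new(chunks: &'a mut [Chunk]) -> Self {
        FlatQuery {
            chunks: chunks.iter_mut(),
            // Start with an empty inner iterator; the first `next()` advances.
            current: <&mut [f32]>::default()
                .iter_mut()
                .zip(<&[f32]>::default().iter()),
        }
    }
}

impl<'a> Iterator for FlatQuery<'a> {
    type Item = (&'a mut f32, &'a f32);

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // This branch runs for every element, not once per chunk.
            if let Some(item) = self.current.next() {
                return Some(item);
            }
            // Current chunk exhausted: move on to the next one, or finish.
            let chunk = self.chunks.next()?;
            self.current = chunk.pos.iter_mut().zip(chunk.vel.iter());
        }
    }
}

/// Flat shape: one loop driven by `FlatQuery::next`.
fn integrate_flat(chunks: &mut [Chunk]) {
    for (pos, vel) in FlatQuery::new(chunks) {
        *pos += *vel;
    }
}

/// Nested shape: the inner loop is a plain zip over two slices.
fn integrate_nested(chunks: &mut [Chunk]) {
    for chunk in chunks.iter_mut() {
        for (pos, vel) in chunk.pos.iter_mut().zip(chunk.vel.iter()) {
            *pos += *vel;
        }
    }
}
```

Both functions compute the same result; only the loop structure differs. Whether a compiler vectorizes either one depends on the toolchain and flags, so this sketch illustrates the shape of the problem rather than reproducing the benchmark numbers.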

Here is the code that runs ~2.9x faster:
```rust
fn iterate_mut_100k_archetypes(b: &mut Bencher) {
    let mut world = World::new();
    for i in 0..100_000 {
        world.spawn((Position(-(i as f32)), Velocity(i as f32)));
    }
    b.iter(|| {
        for archetype in world.archetypes() {
            if let (Some(mut pos), Some(vel)) = (archetype.get::<&mut Position>(), archetype.get::<&Velocity>()) {
                for (pos, vel) in pos.iter_mut().zip(vel.iter()) {
                    pos.0 += vel.0;
                }
            }
        }
    })
}
```

I'm opening this issue to start a discussion about how some of this performance could be tapped into without having to rely on explicitly iterating through all the archetypes.
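As one possible starting point for that discussion, here is a sketch of internal iteration: overriding `Iterator::fold` so that `for_each` (which the standard library implements on top of `fold` by default) runs as nested per-chunk loops, while `next()` keeps working for external iteration. `Chunk` and `ChunkedQuery` are hypothetical names for illustration; this is an assumption about a direction that might help, not a description of how hecs works or a measured fix.

```rust
use std::iter::Zip;
use std::slice::{Iter, IterMut};

/// Hypothetical stand-in for one archetype's component columns.
struct Chunk {
    pos: Vec<f32>,
    vel: Vec<f32>,
}

/// Hypothetical query iterator with a specialized `fold` (assumed design).
struct ChunkedQuery<'a> {
    chunks: IterMut<'a, Chunk>,
    current: Zip<IterMut<'a, f32>, Iter<'a, f32>>,
}

impl<'a> ChunkedQuery<'a> {
    fn new(chunks: &'a mut [Chunk]) -> Self {
        ChunkedQuery {
            chunks: chunks.iter_mut(),
            current: <&mut [f32]>::default()
                .iter_mut()
                .zip(<&[f32]>::default().iter()),
        }
    }
}

impl<'a> Iterator for ChunkedQuery<'a> {
    type Item = (&'a mut f32, &'a f32);

    // External iteration: still pays the per-element boundary check.
    fn next(&mut self) -> Option<Self::Item> {
        loop {
            if let Some(item) = self.current.next() {
                return Some(item);
            }
            let chunk = self.chunks.next()?;
            self.current = chunk.pos.iter_mut().zip(chunk.vel.iter());
        }
    }

    // Internal iteration: one tight loop per chunk, no boundary check
    // inside the inner loop.
    fn fold<B, F>(self, init: B, mut f: F) -> B
    where
        F: FnMut(B, Self::Item) -> B,
    {
        // Finish whatever is left of a partially consumed chunk first.
        let mut acc = self.current.fold(init, &mut f);
        for chunk in self.chunks {
            acc = chunk.pos.iter_mut().zip(chunk.vel.iter()).fold(acc, &mut f);
        }
        acc
    }
}

/// `for_each` goes through the `fold` override above.
fn integrate_for_each(chunks: &mut [Chunk]) {
    ChunkedQuery::new(chunks).for_each(|(pos, vel)| *pos += *vel);
}
```

With this shape, callers who write `query.for_each(...)` would get the nested loops without touching archetypes directly, while plain `for` loops would still go through `next()`. Whether this recovers the explicit-archetype timings would need to be benchmarked.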

Less than ideal codegen for iteration #351

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Less than ideal codegen for iteration #351

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions