foreach modifications #1404

ghost · 2018-03-21T09:01:28Z

ghost
Mar 21, 2018

Using foreach guarantees that the compiler will not add the array bounds check. But foreach can only be used to read array values, which makes it unsuitable for code that modifies the array.
So, I suggest allowing us to use ref with the foreach variable, so that it can be used to modify the array like this:

int[] a = { 1, 2, 3, 4, 5, 6 };
foreach (ref int item in a)
{
   item += 1;
}

Also, foreach will cover more ground if we can control the start and length.

foreach (ref int item in a<1, 4>)
{
   item += 1;
}

which can be done like this:

var b = new ArraySegment<int>(a, 1, 4);
foreach (ref int item in b)
{
    item += 1;
}

Note that there is no risk if the start and length are not valid for the array, because they will be checked when the ArraySegment is created. I mean it is safe because it will throw an exception if they are out of bounds.
I also want a way to get the corresponding index for each item, without having to declare a variable and increment it. This is useful in many cases, such as a nested loop that starts from the outer loop counter. Maybe:

foreach (ref int item in a; int i)
{
   item += i;
}

which is the shortcut for:

int i = 0;
foreach (ref int item in a)
{
   item += i;
   i++;
}

It is also important for foreach to be able to iterate over two arrays:

foreach (ref int item1 in a<2, 4>; int i, ref int item2 in b<3>; int j)
{
   item1 = item2 * 2;
}

The iteration over the b array can be done by calling the MoveNext method inside the foreach loop over the a array.
The compiler can generate an additional check to ensure that b has 4 or more items starting from index 3.

if (3 + 4 > b.Length)
   throw ……;
var c = new ArraySegment<int>(a, 1, 4);
int i = 1;
var bEnum = b.GetEnumerator( );
int j = 3;
foreach (ref int item1 in c)
{
    bEnum.MoveNext( );
    ref int item2 = ref (int)bEnum.Current;
    item1 = item2 * 2;
    i++;
    j++;
}

Conclusion:
This way we can use foreach to handle most for loop cases, so we have a high-level, safe way to tell the compiler not to add bounds checks.
IL optimization can't recognize many cases of for statements where bounds checks are not necessary. Using foreach is our way to tell the compiler to mark this code as fixed. It can convert it to a for loop in an unsafe fixed block. My goal is optimization, not compact syntax, this time. It can be used with large arrays in critical situations. I have also asked many times for a class for array operations like the Vector class provides. Its methods can be written with pointers to aggregate one array or do math operations on two arrays, so it will be faster.

svick · 2018-03-21T11:39:36Z

svick
Mar 21, 2018
Collaborator

I suggest to allow us to use ref with the foreach variable,

This has already been proposed, see #461 and #1046.

Foreach will cover more ground if we can control the start and end

This has also already been proposed, see #185. Though the syntax is different.

Not that the compiler will check that the given start and end are valid for the array.

The compiler can do no such thing, at least in general, because it doesn't know the size of the array.

I also want a way to get the corresponding index for each item, without having to declare a variable and increase it

I would find it useful to be able to get the index when iterating an IEnumerable<T>. But when you're iterating an array, you can just use a for loop.
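
A minimal sketch of both situations, assuming C# 7.0 or later for the tuple deconstruction. The type and method names here are hypothetical, chosen only for illustration. For an array, the for loop already exposes the index; for a general IEnumerable<T>, the index-aware overload of LINQ's Select is one existing way to get it without a hand-maintained counter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class IndexedIterationSketch
{
    // Array case: a plain for loop already provides the index.
    public static void AddIndexInPlace(int[] a)
    {
        for (int i = 0; i < a.Length; i++)
            a[i] += i;
    }

    // General IEnumerable<T> case: Select has an overload whose
    // delegate receives the element and its zero-based index.
    public static List<string> Describe(IEnumerable<string> items)
    {
        var result = new List<string>();
        foreach (var (item, index) in items.Select((item, index) => (item, index)))
            result.Add($"{index}: {item}");
        return result;
    }
}
```

The Select version allocates a delegate and an iterator, so it is a convenience rather than an optimization.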

It is important also to "for each" to iterate on two arrays

Similarly, I think this use case is better served by using a for loop.

ghost · 2018-03-21T13:32:54Z

ghost
Mar 21, 2018

@svick

The compiler can do no such thing, at least in general, because it doesn't know the size of the array.

Sure it does. I tried the code that I suggest to be generated and I got an exception when the length was out of bounds. By the way, I meant the length, not the end, but I was sleepy then. Sorry 😉

About for loops: the IL optimization can't recognize many cases where bounds checks are not necessary. Using foreach is our way to tell the compiler to mark this code as fixed. It can convert it to a for loop in an unsafe fixed block. My goal is optimization, not compact syntax, this time. It can be used with large arrays in critical situations. I have also asked many times for a class for array operations like the Vector class provides. Its methods can be written with pointers to aggregate one array or do math operations on two arrays, so it will be faster.

jnm2 · 2018-03-21T15:05:40Z

jnm2
Mar 21, 2018
Collaborator

@MohammadHamdyGhanem

Sure it does. I tried the code that I suggest to be generated and I got an exception when the length was out of bounds.

That wasn't the compiler, that was the runtime. I think you meant to say, "Note that the compiler will generate a check that the given start and end are valid for the array." I don't see any need for it to do that, though. Slicing an array will already do a check in library code, and if it didn't, the runtime would step in.

ghost · 2018-03-21T15:12:46Z

ghost
Mar 21, 2018

@jnm2
You are right. I don't have deep knowledge of the inner details, so thanks for making things clear. I corrected this in the proposal.

svick · 2018-03-21T21:17:34Z

svick
Mar 21, 2018
Collaborator

@MohammadHamdyGhanem

the IL optimization can't realize many cases where bound checks are not necessary. Using foreach is our way to tell the compiler to mark this code as fixed. It can convert it to for loop in unsafe fixed block.

Except the compiler does no such thing. If you have an array, the IL for iterating it using foreach or for is almost the same.

I don't think it's a good idea to add features to the language that only make sense if the compiler does some thing, as long as no compiler does that thing. Maybe you should open an issue in the Roslyn repo asking for the compiler to behave this way? Though I'm not sure if you would have success with that issue, especially considering that using fixed can make performance worse (because it prevents the GC from doing a good job).

ghost · 2018-03-21T21:26:43Z

ghost
Mar 21, 2018

It behaves like this, and eliminates the bounds checks in foreach if the C# code was compiled to IL with the /optimize flag. We can ask to make this the default later. I read this info here, and then this proposal came to me:
https://blogs.msdn.microsoft.com/clrcodegeneration/2009/08/13/array-bounds-check-elimination-in-the-clr/

ghost · 2018-03-21T21:32:40Z

ghost
Mar 21, 2018

But, I also mentioned another way:
C# can compile these foreach statements to low-level unsafe code that uses pointers directly. This code needs no bounds checks, because foreach statements guarantee that.

svick · 2018-03-21T21:33:07Z

svick
Mar 21, 2018
Collaborator

@MohammadHamdyGhanem You'll notice that the article talks about eliminating bounds checks both in for loops and in foreach loops. You don't need a foreach loop to get bounds check elimination in the normal case, so you shouldn't need special foreach loops to get bounds check elimination in the special cases you're talking about.

ghost · 2018-03-21T22:01:29Z

ghost
Mar 21, 2018

I read many articles about this. The compiler leaves out many obvious cases just to be careful. I.e., if you use any variable that carries the length of the array, the loop will not be optimized! This is also the case when using a sub-range of the array!
Using foreach is a way to tell the compiler to trust us, because we don't access elements by index directly. The new syntaxes I suggested can allow you to write complex loops that are still safe, and that I don't expect to be optimized with normal for loops.
In short: this is a high-level way to avoid writing unsafe blocks directly. It can also be implemented in VB.NET, which can't use pointers.
If we agree on the concept, it is easy for the C# compiler and the JIT to implement it. I already suggested two ways to implement it: one with safe code that uses iterators, and another with unsafe code using pointers, which I think will be faster and independent of how the JIT deals with bounds checks.

CyrusNajmabadi · 2018-03-21T22:09:26Z

CyrusNajmabadi
Mar 21, 2018
Collaborator

Using foreach is a way to tell the compiler to trust us,

The jit has no way of knowing which compiler generated the code. Nor can it just "trust" that the compiler did things correctly. What if the compiler did not? The jit must do things correctly first, looking for and exercising performance optimizations it can prove are sound to make.

CyrusNajmabadi · 2018-03-21T22:10:47Z

CyrusNajmabadi
Mar 21, 2018
Collaborator

because we don't access elements by index directly

The jit doesn't need the compiler to tell it this. The jit can see this from its own analysis of the IL presented to it. Or, if it cannot tell, no amount of assertion from the C# compiler will change that, as the C# compiler itself cannot do things like see into the implementations of ref-assemblies and the like.

mikedn · 2018-03-21T22:26:53Z

mikedn
Mar 21, 2018

Using foreach guarantees that the compiler will not add the array bounds check.

This (and a bunch of stuff that follows from it) is a bunch of nonsense. foreach is a high level language construct while range check elimination is a low level optimization performed by the JIT compiler. The JIT doesn't care (and doesn't even know) that it was a foreach loop or a for loop.

foreach does happen to have an advantage in this regard but it's more of a side effect of the syntax - if the array reference comes from a class field then it is copied to a local variable first to avoid loading the field multiple times. It happens so that this local variable makes life easier for the JIT.
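
A rough sketch of the pattern described here, with a hypothetical Accumulator class that is not from the thread. The for version copies the field into a local by hand, which is approximately what foreach over an array field does implicitly. Whether a given JIT version actually drops the range checks in either method is an assumption that would need to be confirmed by inspecting the generated code.

```csharp
using System;

sealed class Accumulator
{
    private int[] data;

    public Accumulator(int[] data) { this.data = data; }

    // foreach reads the field once and then iterates over that reference.
    public int SumWithForeach()
    {
        int sum = 0;
        foreach (int x in data)
            sum += x;
        return sum;
    }

    // Manual equivalent: copy the field to a local first, then bound
    // the loop by the local's Length.
    public int SumWithFor()
    {
        int[] local = data;
        int sum = 0;
        for (int i = 0; i < local.Length; i++)
            sum += local[i];
        return sum;
    }
}
```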

Foreach will cover more ground if we can control the start and length.

Yeah. You can do that already:

foreach (var x in new Span<int>(a, 4, 2))
{
    …
}
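
For the modifying case from the original proposal, a minimal sketch assuming C# 7.3 or later (which allows ref iteration variables) and a runtime where Span<T> and the AsSpan extension are available. The class and method names are hypothetical.

```csharp
using System;

static class RefForeachSketch
{
    // Adds 1 to 'length' elements of 'a' starting at 'start'.
    // AsSpan validates start and length at run time and throws
    // ArgumentOutOfRangeException if they do not fit the array.
    public static void IncrementSlice(int[] a, int start, int length)
    {
        Span<int> slice = a.AsSpan(start, length);
        foreach (ref int item in slice)
            item += 1; // writes through to the underlying array
    }
}
```

Calling IncrementSlice(a, 1, 4) on { 1, 2, 3, 4, 5, 6 } leaves the array as { 1, 3, 4, 5, 6, 6 }.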

if you use any variable that carries the length of the array, the loop will not be optimized!

Not sure what that means. If it means what I think it means then it's not an accurate description of what happens.

ghost · 2018-03-22T08:54:26Z

ghost
Mar 22, 2018

1- Is it difficult to mark IL code with some attribute, or put it in some block, to say to JIT: "don't optimize this"?
2- In the unsafe approach, I think the JIT will not do anything with the pointers, because this is what "unsafe" means. It is the responsibility of the programmer, not the compiler.

Once again: the implementation is not the issue here. If we agreed on the concept, anything can be done.
The concept is to treat foreach as a mark of a loop that is guaranteed to be in bounds, because the programmer doesn't use indexes directly. The C# compiler can do "something" to put that fact in the IL, so the JIT doesn't add bounds checks. One way is to convert the foreach code to an unsafe fixed block that uses pointers instead of the high-level array indexes.

mikedn · 2018-03-22T09:12:09Z

mikedn
Mar 22, 2018

If we agreed on the concept, anything can be done. The concept is to treat foreach as a mark of a loop that is guaranteed to be in bounds, because the programmer doesn't use indexes directly.

None of this makes any sense. The JIT compiler already eliminates range checks from loops produced by foreach on an array/span. You're proposing a solution for a problem that does not exist.

ghost · 2018-03-22T09:29:44Z

ghost
Mar 22, 2018

I know about foreach, and said this in the first line of this proposal.

Using foreach guarantees that the compiler will not add the array bounds check. But....

But foreach usage is limited, even with the use of Span or ArraySegment. I suggested adding new syntaxes to foreach to widen its use, so we can write more complex for loops with it. I expect you can suggest more syntaxes to include more cases.
In short: I want to be able to use foreach in all loops that don't access array elements far away from the current index (I can use one variable in foreach to access the previous element, but other cases will need a for loop. I was thinking of some sort of foreach variable that has some properties of its own, like current index, current element and previous element, so we don't need to define more variables).
Hope I made it clear this time.
Thanks.

mikedn · 2018-03-22T09:40:56Z

mikedn
Mar 22, 2018

I suggested adding new syntaxes to foreach to widen its use, so we can write more complex for loops with it

You suggested many things. Some things don't make sense, some things have already been suggested, some things might be new/useful. You're also mixing convenience language features with optimization that might be better discussed in the context of JIT/IL optimizations. There were plenty of such discussions in the past and they always get shot down because ultimately it's not the job of the C# language compiler proper to make such optimizations.

In short, it's difficult to make head or tail of this discussion.

ghost · 2018-03-22T11:08:43Z

ghost
Mar 22, 2018

Why should you throw all the load on the JIT optimization, while you can tell it from the beginning that this is a safe for loop? We can write safe foreach code, the C# compiler translates it to already optimal IL, and the JIT lives happily ever after :)
What I propose here is related to C#, and if it is done, we can ask the same thing from VB.NET and F#.

mikedn · 2018-03-22T12:18:08Z

mikedn
Mar 22, 2018

Why should you throw all the load on the JIT optimization, while you can tell it from the beginning that this is a safe for loop?

What load? Do you have any numbers to show that the cost of removing such trivial range checks is significant to the JIT?

And why should anyone spend time implementing something that already works? Or why should the C# compiler suddenly start producing unsafe/unverifiable IL for code that's obviously safe/verifiable? Why should we ask the same thing from 3 languages when we could ask this from an IL optimizer that doesn't care about the language the code was originally written in?

ghost · 2018-03-22T13:10:51Z

ghost
Mar 22, 2018

Will JIT optimize this:

for (int i = x; i < foo(); i++)
{
      a[i] = b[i + 2];
}

If not, I suggest this:

foreach (ref int item1 in a<x, foo()-x>; int item2 in b<x+2> )
{
   item1 = item2;
}

HaloFour · 2018-03-22T13:34:12Z

HaloFour
Mar 22, 2018

@MohammadHamdyGhanem

Randomly suggesting C# syntax doesn't help. The question is the IL and what the JIT does with it. Having to use a completely different syntax to somehow suggest to the JIT what to actually do seems pretty inappropriate. If there are common patterns that could be optimized by the JIT then that is (and should be) a CoreCLR request.

ghost · 2018-03-22T13:47:12Z

ghost
Mar 22, 2018

@HaloFour
I am just sharing my ideas. I can't know what is appropriate or inappropriate, and this is why we have a discussion. I also can't be aware of all previous proposals.
By the way: why is there no backward foreach statement? Maybe a foreachbw?

HaloFour · 2018-03-22T13:52:26Z

HaloFour
Mar 22, 2018

@MohammadHamdyGhanem

Which is fine, but the criticism is part of that discussion.

Enumeration is unidirectional. An IEnumerator only has one possible direction. As such a backwards foreach makes no sense in most contexts. For the special case that is arrays it doesn't make sense to have a new dedicated syntax. Just use for which gives you complete control over how you manipulate the index.
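
A minimal sketch of the backward for loop, using removal from a List<int> as a hypothetical example. The helper name and signature are assumptions for illustration. Walking from the end means a removal never shifts the items that are still to be visited, and each removed item is available to the caller.

```csharp
using System;
using System.Collections.Generic;

static class BackwardLoopSketch
{
    // Removes 'count' items starting at 'start', visiting them from
    // last to first, and returns the removed items in visit order.
    public static List<int> RemoveRangeBackward(List<int> items, int start, int count)
    {
        var removed = new List<int>();
        for (int i = start + count - 1; i >= start; i--)
        {
            removed.Add(items[i]);
            items.RemoveAt(i);
        }
        return removed;
    }
}
```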

mikedn · 2018-03-22T13:53:22Z

mikedn
Mar 22, 2018

for (int i = 1; i < 4; i++) { a[i] = b[i + 2]; }

The JIT optimizes this using loop cloning. It's not great but there's little else that can be done for such code.

foreach (ref int item1 in a<1, 3>; int item2 in b<3> )

You can achieve the same effect (no range checks inside the loop) using this code:

Span<int> sa = new Span<int>(a, 1, 3);
Span<int> sb = new Span<int>(b, 3, 3);
for (int i = 0; i < sa.Length; i++)
{
    sa[i] = sb[i];
}

Sure, it's kind of verbose. Some special language syntax for dealing with the case of iterating over "parallel" arrays might be worth considering. But mixing that with optimization issues is not a good idea.

To be more precise:

it's certainly worth considering how a language feature can be implemented in the most efficient manner
but attempting to sell a language feature using a very specific and "exotic" implementation will simply distract from the actual language feature and derail the whole discussion

jnm2 · 2018-03-22T14:00:07Z

jnm2
Mar 22, 2018
Collaborator

My honest reaction to this:

foreach (ref int item1 in a<x, foo()-x>; int item2 in b<x+2> )

is that it's cryptic and visually complex, doesn't look like it tries to fit the style of anything in C#, and doesn't appeal to me. The for loop version looks more clean to me. If performance is of the essence, @mikedn's sample looks the clearest for that.

mikedn · 2018-03-22T14:01:20Z

mikedn
Mar 22, 2018

foreach (ref int item1 in a<1, 3>; int item2 in b<3> )

I'd say that this syntax is not very readable. Maybe it should be built on tuple (and slice) syntax:

foreach ((ref int dst, int src) in (a[1..3], b[3..5]))
    dst = src;

ghost · 2018-03-22T17:40:23Z

ghost
Mar 22, 2018

Thanks all for this useful discussion.
@HaloFour
Backward iteration is helpful in situations where you remove a range of items from a collection. I know it can be done with a loop that keeps removing the first item in the range, but this can look like mysterious code to beginners. A backward sliced foreach looks more reasonable, or a RemoveRange method, but that will not help if you want to do something with the removed items.
@mikedn
I don't mind using any other suitable syntax. My focus was on the goal, so I came up with this syntax on the fly.

DarthVella · 2018-03-22T22:37:57Z

DarthVella
Mar 22, 2018

@mikedn A foreach loop enumerating two separate IEnumerables at the same time? I could get behind that!

@MohammadHamdyGhanem foreach works on anything that has a specific method signature - IEnumerator GetEnumerator(). It does not care about the specific implementations of how that enumerator gets its items. The underlying functionality may not have a notion of "backwards;" for example, an enumerator that just generates random numbers. Making a foreachbackwards keyword would not be possible without fundamentally changing the meaning of foreach.
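
For reference, a sketch of what the existing LINQ operators already offer for both ideas, with hypothetical helper names. Enumerable.Zip walks two sequences in step and stops at the shorter one. Enumerable.Reverse gives backward order for any sequence, but it has to buffer the whole sequence first, which is consistent with the point that an enumerator itself only moves forward.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SequenceHelpersSketch
{
    // Iterates two sequences at the same time via Zip.
    public static List<int> PairwiseSums(IEnumerable<int> first, IEnumerable<int> second)
    {
        var sums = new List<int>();
        foreach (var (x, y) in first.Zip(second, (x, y) => (x, y)))
            sums.Add(x + y);
        return sums;
    }

    // Backward order for any finite sequence; buffers all items first.
    public static List<int> Backward(IEnumerable<int> source)
    {
        return Enumerable.Reverse(source).ToList();
    }
}
```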
