What is the best way to use IEqualityComparer to return distinct values?

c# distinct entity-framework iequalitycomparer

Question

An L2E (LINQ to Entities) query I have returns data that contains duplicate objects, and I need to remove them. Two objects should be treated as duplicates if their IDs are the same. I tried q.Distinct(), but duplicate items were still returned. I then tried implementing my own IEqualityComparer and passing it to the Distinct() method. The call failed with the following error:

LINQ to Entities does not recognize the method 'System.Linq.IQueryable`1[DAL.MyDOClass] Distinct[MyDOClass](System.Linq.IQueryable`1[DAL.MyDOClass], System.Collections.Generic.IEqualityComparer`1[DAL.MyDOClass])' method, and this method cannot be translated into a store expression.

This is the EqualityComparer I am currently using:

    internal class MyDOClassComparer : EqualityComparer<MyDOClass>
    {
        public override bool Equals(MyDOClass x, MyDOClass y)
        {
            return x.Id == y.Id;
        }

        public override int GetHashCode(MyDOClass obj)
        {
            return obj == null ? 0 : obj.Id;
        }
    }

So how do I write my own IEqualityComparer properly?

1
51
12/20/2011 8:08:38 AM

Accepted Answer

An EqualityComparer is not the way to go here, since it can only filter your result set in memory, for example:

var objects = yourResults.AsEnumerable().Distinct(yourEqualityComparer);
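A minimal, self-contained sketch of this in-memory route follows. It uses LINQ to Objects rather than a real Entity Framework context: `MyDOClass` is a stand-in with a hypothetical `Name` property, `InMemoryDistinctDemo` is an illustrative name, and the `IQueryable<T>` argument only imitates a query coming from the database. The comparer also adds null checks that the question's version does not have.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the entity; Name is a hypothetical property used for illustration.
public class MyDOClass
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Treats two objects as equal when their Ids match.
public class MyDOClassComparer : EqualityComparer<MyDOClass>
{
    public override bool Equals(MyDOClass x, MyDOClass y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id == y.Id;
    }

    public override int GetHashCode(MyDOClass obj)
    {
        return obj == null ? 0 : obj.Id;
    }
}

public static class InMemoryDistinctDemo
{
    public static List<MyDOClass> DistinctById(IQueryable<MyDOClass> query)
    {
        // AsEnumerable() switches from the query provider to LINQ to Objects,
        // so the comparer-based Distinct runs in memory, not in the database.
        return query.AsEnumerable()
                    .Distinct(new MyDOClassComparer())
                    .ToList();
    }
}
```

Everything before `AsEnumerable()` is still evaluated by the query provider; everything after it runs on the rows already fetched, duplicates included.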

Instead, you can use the GroupBy method to group by ID and the First method so that the database returns only one entry per ID, for example:

var objects = yourResults.GroupBy(o => o.Id).Select(g => g.First());
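As a rough illustration of the query shape, here is a self-contained sketch that runs against an in-memory `IQueryable<T>`. `MyDOClass` is a stand-in with a hypothetical `Name` property and `GroupByDistinctDemo` is an illustrative name; against a real context, the same expression would be handed to the provider for translation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the entity; Name is a hypothetical property used for illustration.
public class MyDOClass
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class GroupByDistinctDemo
{
    public static IQueryable<MyDOClass> DistinctById(IQueryable<MyDOClass> query)
    {
        // No comparer is involved: the key selector alone defines equality,
        // which is something a query provider can translate.
        return query.GroupBy(o => o.Id)
                    .Select(g => g.First());
    }
}
```

The result is still an `IQueryable<MyDOClass>`, so further operators can be composed onto it before it is executed. Some Entity Framework versions are known to reject `First()` inside a projection and ask for `FirstOrDefault()` instead; whether that applies depends on the provider and version in use.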
126
12/20/2011 8:07:27 AM

Popular Answer

Ladislav Mrnka and rich.okelly are both correct, in different ways.

Both of their answers deal with the fact that the IEqualityComparer<T> will not be translated to SQL.

I think it's worth looking at the pros and cons of each, which takes more than a comment.

rich's approach rewrites the query as a different query that produces the same result. Their code should perform more or less as well as hand-written SQL would.

Ladislav's pulls the data out of the database at the point just before the Distinct, after which an in-memory approach will work.

Since the database is very good at the kind of grouping and filtering that rich's approach relies on, it will probably be the most performant option in this case. However, you may find that, depending on the complexity of what happens before this grouping, LINQ to Entities does not produce a single query but instead issues several queries and then does part of the work in memory, which can be quite nasty.

In in-memory cases, grouping is generally more expensive than Distinct (particularly if you bring the data into memory with ToList() rather than AsEnumerable()). So if you were already going to bring the data into memory at this point because of some other requirement, the in-memory Distinct would be the more efficient choice.

It would also be your only option if your definition of equality did not map well onto what is available in the database alone, and of course it lets you switch equality definitions, if you wish, based on an IEqualityComparer<T> passed in as a parameter.
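To make the last point concrete, here is a small sketch of a method that takes the comparer as a parameter, so the caller decides what "duplicate" means. All names here (`SwappableEqualityDemo`, `LoadDistinct`, `IdComparer`, `NameComparer`, and the `Name` property) are hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the entity; Name is a hypothetical property used for illustration.
public class MyDOClass
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Equality by Id.
public sealed class IdComparer : IEqualityComparer<MyDOClass>
{
    public bool Equals(MyDOClass x, MyDOClass y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id == y.Id;
    }

    public int GetHashCode(MyDOClass obj)
    {
        return obj == null ? 0 : obj.Id;
    }
}

// Equality by Name, ignoring case: a rule that may not map cleanly to SQL.
public sealed class NameComparer : IEqualityComparer<MyDOClass>
{
    public bool Equals(MyDOClass x, MyDOClass y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return StringComparer.OrdinalIgnoreCase.Equals(x.Name, y.Name);
    }

    public int GetHashCode(MyDOClass obj)
    {
        if (obj == null || obj.Name == null) return 0;
        return StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Name);
    }
}

public static class SwappableEqualityDemo
{
    // The caller supplies the equality definition; the Distinct runs in memory.
    public static List<MyDOClass> LoadDistinct(
        IQueryable<MyDOClass> query, IEqualityComparer<MyDOClass> comparer)
    {
        return query.AsEnumerable().Distinct(comparer).ToList();
    }
}
```

Calling `LoadDistinct(query, new IdComparer())` and `LoadDistinct(query, new NameComparer())` on the same query can return different result sets, which is flexibility the GroupBy rewrite does not offer without changing the query itself.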

Overall, rich's is the answer I'd say is most likely to be the best choice here, but Ladislav's has different pros and cons compared with rich's, which makes it well worth studying and considering too.

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow