Why Entity Framework 4.1’s Linq capabilities are inferior to Linq-to-Sql

I have a small-ish website at work that contains several tools we use internally. I originally coded the database layer using Linq-to-Sql, as it was too small to overcome the learning curve of NHibernate and Entity Framework version 4 was not out at the time.

However, lately database schema changes have been harder to incorporate into the Linq-to-Sql edmx, mostly due to the naming differences between database tables and relationships and their C# class names. I decided to convert my database layer to Entity Framework 4.1 CodeFirst based on an existing database, as I use EF CodeFirst in a current project and already familiar with it.

The process of converting my Linq-to-sql was *supposed* to be a simple matter of replacing my calls to my L2S data context to my EF DbContext, and allow Linq to take care of the rest. Unfortunately, this is not the case due to Entity Framework’s Linq capabilities being extremely limited.

Very Limited Function Support

In Linq-to-Sql it was common for me to perform transformations into a view model or an anonymous data type right in the linq. Two examples of this in my code are:

var client = (from c in _context.Clients
                where c.id == id
                select ClientViewModel.ConvertFromEntity(c)).First();

var clients = (from c in _context.Clients
                orderby c.name ascending
                select new
                {
                    id = c.id,
                    name = c.name,
                    versionString = Utils.GetVersionString(c.ProdVersion),
                    versionName = c.ProdVersion.name,
                    date = c.prod_deploy_date.ToString()
                })
                .ToList();

These fail when run in Entity Framework with NotSupportedExceptions. The first one fails because it claims EF has no idea how to deal with ClientViewModel.ConvertFromEntity(). This would make sense, as this is a custom method, except this works perfectly in Linq-to-Sql. The 2nd query fails not only for Utils.GetVersionString(), but it also fails because EF has no idea how to handle the ToString() off of a DateTime, even though this is all core functionality.

In order to fix this, I must return the results from the database and locally do the transformations, such as:

var clients = _context.Clients.OrderBy(x => x.name)
                              .ToList()
                              .Select(x => new
                              {
                                  id = c.id,
                                  name = c.name,
                                  versionString = Utils.GetVersionString(c.ProdVersion),
                                  versionName = c.ProdVersion.name,
                                  date = c.prod_deploy_date.ToString()
                              })
                              .ToList();

No More Entity to Entity Direct Comparisons

In Linq-to-Sql I could compare one entity directly to another in the query. So for example the following line of code worked in L2S:

context.TfsWorkItemTags.Where(x => x.TfsWorkItem == TfsWorkItemEntity).ToList();

This fails in Entity Framework and throws an exception because it can’t figure out how to compare these in Sql. Instead I had to change it to explicitly check on the ID values themselves, such as:

context.TfsWorkItemTags.Where(x => x.TfsWorkItem.id == tfsWorkItemEntity.id).ToList();

It’s a minor change, but I find it annoying that EF isn’t smart enough to figure out how to compare entities directly, especially when it has full knowledge of how the entities are mapped and designed. Yes, I could have gone straight through using TfsWorkitemEntity.Tags, but this is a simple example to illustrate lost functionality.

Cannot Use Arrays In Queries

This issue really caught me by surprise, and I don’t understand why this was omitted. In my database versions consist of 4 parts: major, minor, build, and revision numbers, and is usually represented in string form as AA.BB.CC.DD. I have a utility method that converts the string into an array of ints, which I then used in my Linq-to-Sql query:

int[] ver = Utils.GetVersionNumbersFromString(versionString);
return context.ReleaseVersions.Any(x => x.major_version == ver[0] && x.minor_version == ver[1]
                                    && x.build_version == ver[2] && x.revision_version == ver[3]);

Under L2S, this query works fine, but (as the common theme in this post) fails in Entity Framework 4.1 with a NotSupportedException with the message “The LINQ expression node type ‘ArrayIndex’ is not supported in LINQ to Entities.”.

In order to fix this I had to split my version into individual ints instead.

Dealing With Date Arithmetic

My final issue I have come across is probably the most annoying, mostly because I can’t find a good solution except to do most of the work on the client side. In my database I store test requests, and those requests can be scheduled to run at a certain time on a daily basis. This scheduled time is stored in a time(7) field in the database. When I query for any outstanding test requests one of the criteria in the query is to make sure that the current date and time is greater than today’s date at the scheduled time. In L2S I had:

var reqs = _context.TestRequests.Where(x => DateTime.Now > (DateTime.Now.Date + x.scheduled_time.Value)).ToList();

This fails in Entity Framework for 2 reasons. The first is that DateTime.Now.Date isn’t supported, so i had to create a variable to hold that and use that in the query.

The 2nd issue is then that EF can’t make a query out of the current date added to a specific time value. This causes an ArgumentException with the message “DbArithmeticExpression arguments must have a numeric common type.”.

I have not found a way to do this in EF, and instead had to resort to pulling a list of all TestRequest entities from the database, and locally pull out only the ones that fit that criteria.

Conclusion

I am utterly baffled by how limited Entity Framework is in it’s Linq abilities. While I will not go back to L2S for my internal tool I am definitely having second thoughts of using EF 4.1 in my more complex personal projects. It definitely seems that Microsoft is going more for feature count in their frameworks rather than functional coverage lately.

Allowing Eager Loading In Business Logic via View Models

Today I am going to write a post about a library I have been thinking about for the past week. If I end up actually developing it I plan to make it open source as I think it solves an issue that a lot of systems may encounter in regards to eager loading database entities, and dealing with a separation of the data entities from the domain model. I am going to use this post to bring all of my thoughts together to make sure what’s in my head actually makes sense, as well as to get any feedback (if this is actually read by anyone but me)

The Problem

My web application is currently using the Entity Framework 4.1 CodeFirst system. The original issue came about when I was looking at document-oriented databases and the possibility of switching my backend to RavenDB. It wasn’t a serious idea, as it would be a huge hassle to convert the production data from sql server to a document oriented database for little benefit (at this time at least). However, it did get me to realize that even though my architecture allows me to change database backends and ORMs pretty painlessly, it does not allow me to change my data model without changing my business entity model to match. This is mostly due to my use of Linq to specify queries in my business layer.

My thoughts then went to switching to use a repository pattern for data access. The repository pattern would allow me to more easily organize my queries, abstract away data access from the business layer, and would mean that my data model could be completely different from my business entity model and changes can be made to one without affecting the other. Unfortunately, the repository pattern has a big issue when it comes to eager loading in that it usually ends up with many repeating methods, such as:

public class EFUserRepository : IUserRepository
{
    public User GetUserById(int id) { }
    public User GetUserByIdWithContacts(int id) { }
    public User GetUserByIdWithContactsWithCompanies(int id) { }
    ....
}

I have not found a better solution to handle eager loading with the repository pattern and the previous examples has too much repeating code for my liking.

The VMQA (View Model Querying Assistant) Library

After thinking about this problem for a bit, I later realized that what I ultimately want is to pass a view model data structure into a repository query method, for the repository to come up with (and execute) an eager loading strategy based on the view model, and then to map all the properties of the view model with the data retrieved from the database. To accomplish this I want to rewrite my repository methods to look something like:

public class EFUserRepository : IUserRepository
{
    public T GetUserById<T>(int id) where T : class 
    { 
        // Form the query
        var query = _context.Users.Where(x => x.Id == id);

        // Determine the eager loading strategy
        var includeStrings = vmqa.GetPropertiesToEagerLoadForClass(T).ConvertToEFStrings();
        foreach (string inc in includeStrings)
            query.Include(inc);

        // Execute the query and map results into the specified data structure
        var user = query.First();

        // Return an instance of the view model with the data filled in from the user
        return vmqa.MapEntityToViewModel<User, T>(user);
    }
}

The design I have floating around in my head to accomplish this has two parts.

Eager Loading

The first part is coming up with an eager loading strategy based on the specified data structure, a data structure which the library has no previous knowledge of. I envision this working by building a map of how all the data entities fit together with an interface that is similar to how configuring EF CodeFirst models is done. An example mapping would be:

public void MapEntities()
{
    vmqa.Entity<User>().HasMany(typeof(Contact), x => x.ContactsProperty);
    vmqa.Entity<Contact>().HasOne(typeof(Company), x => x.CompanyProperty);
}

This would tell the system that the User entity maps to Contacts via the User.ContactsProperty.

In my repository I would then call upon the VMQA library to look at the data structure and resolve the eager loading strategies. To allow the VMQA library to understand how to map an arbitrary view model data structure to a data entity, I would use custom attributes that would decorate a class such as:

[MapsToEntity(typeof(User))]
public class UserViewModel
{
    public int Id { get; set; }
    public string Name { get; set; }

    [MapsToEntity(typeof(Contact))]
    public IList Contacts { get; set; }
}

Using reflection, the vmqa.GetPropertiesToEagerLoadForClass should be able to figure out that the core entity maps to the User, that Contacts collection maps to the Contact. Since it knows that the view model explicitly specifies a view model for the Contacts entity it can determine that the Contacts property needs to be eager loaded, and will then return a data structure containing the Contact type and the name of the property it maps to from the User entity.

It should be easy to allow this system to allow complex mappings, so if the user view model has a property that maps to a collection of companies, the system can automatically find a path from User to for eager loading.

The ConvertToEFStrings() extension method would then convert that data structure into a collection of strings that Entity Framework can use for eager loading. This would be an extension method so that the library is not restricted to just Entity Framework, and can be used for any ORM or database system.

Entity Mapping

The second part of the whole system deals with automatically mapping the returned data entities into the view model. This seems like a natural extension of the library to eager loading to me, as if you are passing the view model into the repository to describe what data you want to retrieve, it should automatically copy that data into the view model for your usage. No matter what, the developer is going to want to do this anyway and it seem sensible to automatically do this with VMQA, since it already has knowledge about the relationships.

My first version of the library would probably do a straight name for name mapping, so that UserViewModel.Name maps to User.Name. Eventually I would probably add property mapping attributes so more complex mappings can be performed.

Conclusion

After finally getting this design written down, I feel more confident than ever that this would be an extremely valuable library to have, and that it actually seems relatively straight-forward to create. Now I just need the time!