A ViewModel Based Data Access Layer – Persisting Data

So far in my design of a Data Access Layer that I outlined in my last two posts dealt with retrieving data from the database, so now I want to explore the second half of a data access layer, how you persist data in the database.

Facade Of Simplicity

Contrary to what the repository pattern would have you believe, saving data in a data store isn’t always a simple process that is the same in all situations. Sometimes persisting data in the database requires special optimizations that are only valid in specific scenarios. To illustrate this, let’s again look at a process of saving a question at Stack Overflow with a traditional ORM.

When a user wants to add a question the process is simple, we just create a new instance of the question POCO and tell our ORM to save it to the database. We want to save all of it’s data because it’s all new. However what about when the use wants to edit his question? In all ORMs that I know of you must run a query to retrieve the record from the database, populate your POCO with the data, save the changes the user made to the question, then send the modified object back to the database for saving.

This has several inefficiencies to it. For starters it requires you to completely retrieve the data for the question from the database when none of it is actually relevant to us. To save the edits for a question we don’t care about how many upvotes it has, how many downvotes, the original date of posting, the user who made the original post of the question, we don’t even care what the original question was. All we care about is the new title, text, and tags for the question and we want those persisted, so why retrieve all that data. This may seem insignificant but when your application gets a lot of traffic this can cause a lot of repeated and unneeded traffic between your database and webserver. Since the database is usually the bottleneck in a web application, and there are situations you can be in with Sql Azure where you pay for DB bandwidth, this can end up costing you in the long run. Also consider the effect of having to mass update multiple records.

Most ORMs (or any one worth it’s salt) have some methods around this. The first usually involves creating a new POCO object with the modified data and telling the ORM that it should save the object as is. This is usually bad because it requires the POCO to already know all the data it probably shouldn’t, such as the number of upvotes, date it was created, etc.. If any of these properties aren’t set, then using this method will cause them to be nulled or zeroed out and will most likely cause data issues. It is very risky to do this, at least with Entity Framework. Another way around the inefficiency is to tell the ORM to use straight SQL to update the record, thus bypassing the the safety-net and security of ORM.

Both of these methods have their individual situations where they are beneficial, but rarely do you want to use one or the other all the time. Trying to abstract each of these situations into a data access layer that is database agnostic isn’t simple.

Persisting Data

So now the question becomes, how can I persist data in my ViewModel based DAL. To do this I have come up with the following interface:

public interface INonQuery<TViewModel, TReturn>
{
	TReturn Execute(TViewModel viewMode);
}

This interface states that you want to take the data from a specific view model and save it to the database, with a requested return type. This allows the developer to do performance optimizations on a situation by situation basis, but if need-be they can keep the implementation consolidated until they feel they need that optimization without breaking the application layer using the API

A ViewModel Based Data Access Layer – Optionally Consolidated Query Classes

After thinking upon my data access pattern outlined in my last post, I came up with the following interface to use for queries:

public interface IQuery<TViewModel, TCriteria>
{
	TViewModel Execute(TCriteria criteria);
}

I am very happy with this idea, as it seems to have multiple advantages over current data access patterns I have evaluated.

Optionally Consolidated Queries

One advantage that I see of this pattern is that it’s trivial to combine similar queries into one class, while keeping the business layer ignorant of the grouping.

To show this let’s use the example of Stack Overflow. If you look at the homepage and your user’s page you will notice that in these pages there are 2 queries that are retrieving data for a specific user, but each query returns different information. The homepage only requires the user’s username, reputation, and badge data. However, when you view a user’s page it needs that to query for that information as well as questions and answers related to the user. Even though both queries deal with retrieving data for a specific user it would be inefficient to use the latter query for the homepage, as it would have to hit multiple tables when that data isn’t used.

An example of creating an MVC controller that uses my ViewModel DAL would be:

public class ExampleController : Controller
{
	protected IQuery<UserHomepageViewModel, UserByIdCriteria> _userHomepageQuery;
	protected IQuery<UserDashboardViewModel, UserByIdCriteria> _userDashboardQuery;
	protected int _currentUserId;
	
	public ExampleController(
		IQuery<UserHomepageViewModel, UserByIdCriteria> userHomepageQuery,
		IQuery<UserDashboardViewModel, UserByIdCriteria> userDashboardQuery)
	{
		_userHomepageQuery = userHomepageQuery;
		_userDashboardQuery = userDashboardQuery;
		_currentUserId = (int)Membership.GetUser().UserProviderKey;
	}
	
	public ActionResult Index()
	{
		var criteria = new UserByIdCriteria { UserId = _currentUserId };
		var model = _userHomepageQuery.Execute(criteria);
		return View(model);
	}
	
	public ActionResult UserDashboard()
	{
		var criteria = new UserByIdCriteria { UserId = _currentUserId };
		var model = _userDashboardQuery.Execute(criteria);
		return View(model);
	}	
}

As far as the programmer in charge of this controller is concerned, two completely separate classes are used for these queries. However, the developer can save some effort by implementing these queries into one consolidated class. For example:

public class UserByIdQueries : IQuery<UserHomepageViewModel, UserByIdCriteria>, IQuery<UserDashboardViewModel, UserByIdCriteria>
{
	protected DbContext _context;
	
	public UserByIdQueries(DbContext context)
	{
		_context = context;
	}
	
	public UserHomepageViewModel Execute(UserByIdCriteria criteria)
	{
		var user = GetUserById(criteria.UserId, false);
		return Mapper.Map<User, UserHomepageViewModel>(user);
	}
	
	public UserDashboardViewModel Execute(UserByIdCriteria criteria)
	{
		var user = GetUserById(criteria.UserId, true);
		return Mapper.Map<User, UserDashboardViewModel>(user);
	}
	
	protected User GetUserById(int id, bool includeDashboardData)
	{
		var query = _context.Users.Where(x => x.Id == id);
		
		if (includeDashboardData)
			query.Include(x => x.Questions).Include(x => x.Answers);
			
		return query.SingleOrDefault();
	}
}

To me, this gives the perfect balance of easily retrieving data from the DAL based on how I am going to use the data and still give me full flexibility on how I organize and create the DAL’s implementation.

Allowing Eager Loading In Business Logic via View Models

Today I am going to write a post about a library I have been thinking about for the past week. If I end up actually developing it I plan to make it open source as I think it solves an issue that a lot of systems may encounter in regards to eager loading database entities, and dealing with a separation of the data entities from the domain model. I am going to use this post to bring all of my thoughts together to make sure what’s in my head actually makes sense, as well as to get any feedback (if this is actually read by anyone but me)

The Problem

My web application is currently using the Entity Framework 4.1 CodeFirst system. The original issue came about when I was looking at document-oriented databases and the possibility of switching my backend to RavenDB. It wasn’t a serious idea, as it would be a huge hassle to convert the production data from sql server to a document oriented database for little benefit (at this time at least). However, it did get me to realize that even though my architecture allows me to change database backends and ORMs pretty painlessly, it does not allow me to change my data model without changing my business entity model to match. This is mostly due to my use of Linq to specify queries in my business layer.

My thoughts then went to switching to use a repository pattern for data access. The repository pattern would allow me to more easily organize my queries, abstract away data access from the business layer, and would mean that my data model could be completely different from my business entity model and changes can be made to one without affecting the other. Unfortunately, the repository pattern has a big issue when it comes to eager loading in that it usually ends up with many repeating methods, such as:

public class EFUserRepository : IUserRepository
{
    public User GetUserById(int id) { }
    public User GetUserByIdWithContacts(int id) { }
    public User GetUserByIdWithContactsWithCompanies(int id) { }
    ....
}

I have not found a better solution to handle eager loading with the repository pattern and the previous examples has too much repeating code for my liking.

The VMQA (View Model Querying Assistant) Library

After thinking about this problem for a bit, I later realized that what I ultimately want is to pass a view model data structure into a repository query method, for the repository to come up with (and execute) an eager loading strategy based on the view model, and then to map all the properties of the view model with the data retrieved from the database. To accomplish this I want to rewrite my repository methods to look something like:

public class EFUserRepository : IUserRepository
{
    public T GetUserById<T>(int id) where T : class 
    { 
        // Form the query
        var query = _context.Users.Where(x => x.Id == id);

        // Determine the eager loading strategy
        var includeStrings = vmqa.GetPropertiesToEagerLoadForClass(T).ConvertToEFStrings();
        foreach (string inc in includeStrings)
            query.Include(inc);

        // Execute the query and map results into the specified data structure
        var user = query.First();

        // Return an instance of the view model with the data filled in from the user
        return vmqa.MapEntityToViewModel<User, T>(user);
    }
}

The design I have floating around in my head to accomplish this has two parts.

Eager Loading

The first part is coming up with an eager loading strategy based on the specified data structure, a data structure which the library has no previous knowledge of. I envision this working by building a map of how all the data entities fit together with an interface that is similar to how configuring EF CodeFirst models is done. An example mapping would be:

public void MapEntities()
{
    vmqa.Entity<User>().HasMany(typeof(Contact), x => x.ContactsProperty);
    vmqa.Entity<Contact>().HasOne(typeof(Company), x => x.CompanyProperty);
}

This would tell the system that the User entity maps to Contacts via the User.ContactsProperty.

In my repository I would then call upon the VMQA library to look at the data structure and resolve the eager loading strategies. To allow the VMQA library to understand how to map an arbitrary view model data structure to a data entity, I would use custom attributes that would decorate a class such as:

[MapsToEntity(typeof(User))]
public class UserViewModel
{
    public int Id { get; set; }
    public string Name { get; set; }

    [MapsToEntity(typeof(Contact))]
    public IList Contacts { get; set; }
}

Using reflection, the vmqa.GetPropertiesToEagerLoadForClass should be able to figure out that the core entity maps to the User, that Contacts collection maps to the Contact. Since it knows that the view model explicitly specifies a view model for the Contacts entity it can determine that the Contacts property needs to be eager loaded, and will then return a data structure containing the Contact type and the name of the property it maps to from the User entity.

It should be easy to allow this system to allow complex mappings, so if the user view model has a property that maps to a collection of companies, the system can automatically find a path from User to for eager loading.

The ConvertToEFStrings() extension method would then convert that data structure into a collection of strings that Entity Framework can use for eager loading. This would be an extension method so that the library is not restricted to just Entity Framework, and can be used for any ORM or database system.

Entity Mapping

The second part of the whole system deals with automatically mapping the returned data entities into the view model. This seems like a natural extension of the library to eager loading to me, as if you are passing the view model into the repository to describe what data you want to retrieve, it should automatically copy that data into the view model for your usage. No matter what, the developer is going to want to do this anyway and it seem sensible to automatically do this with VMQA, since it already has knowledge about the relationships.

My first version of the library would probably do a straight name for name mapping, so that UserViewModel.Name maps to User.Name. Eventually I would probably add property mapping attributes so more complex mappings can be performed.

Conclusion

After finally getting this design written down, I feel more confident than ever that this would be an extremely valuable library to have, and that it actually seems relatively straight-forward to create. Now I just need the time!