Asp.Net Membership Provider’s Lifetime Considerations- Part 2

Previously I made a post about issues I encountered with the Asp.Net Membership Provider.

My fix was a bit short-sighted, as it only addressed the issue in one location: when the account controller performs membership operations. The problem with my solution is that it did not deal with the places where Asp.Net creates the membership provider itself and keeps that instance alive for the application’s lifetime.

This meant that all calls to the membership provider (e.g. calling Membership.GetUser()) were using the same database session, even across multiple web requests. Not only did this cause hidden caching issues, it also meant that if the database connection was broken, every subsequent call to the membership provider would throw an exception, and the only way to recover was to restart the Asp.Net instance.

In order to fix this I had to move the database session creation out of my custom membership provider’s constructor and instead retrieve a fresh database session from my IoC system inside each method.
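A rough sketch of the shape of that change. The `IoC.Resolve<T>()` helper, the `IDatabaseSession` abstraction, and `HashPassword` are placeholder names for illustration, not the actual code from my project:

```csharp
public class MyMembershipProvider : MembershipProvider
{
    // No session field and no session creation in the constructor:
    // Asp.Net keeps this provider instance alive for the entire
    // application, so anything created here would never be refreshed.

    public override bool ValidateUser(string username, string password)
    {
        // Resolve the current request's database session on each call;
        // the container owns its lifetime, so it is not disposed here.
        var session = IoC.Resolve<IDatabaseSession>();

        var user = session.Query<User>()
                          .SingleOrDefault(x => x.Username == username);

        return user != null && user.PasswordHash == HashPassword(password);
    }

    // ... the remaining MembershipProvider overrides follow the same pattern
}
```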

This seems to have fixed not only my original issue, but also several other “random” exceptions that I could not reproduce on a regular basis.

Beware of Asp.Net’s Membership Provider Lifetime, Entity Framework Caching, and Dependency Injection

I recently struggled while dealing with a bug I encountered in my Asp.Net MVC application, and I thought I would write about it to hopefully help someone else.

I have a method in my business layer that users use to change their password. After implementing unit tests to verify the method worked properly I implemented a call to it in my account controller. Everything seemed perfect at first, as I was able to change my password successfully and even validate that the user entered their correct current password, which is required to perform a password change.

However, when I logged out and tried to log back in using the new password the login attempt failed. Yet when I used my previous password I was able to log in! To make things even more confusing, since I require users to enter their current password to change their password I was able to confirm that the password change did actually take effect. Finally, when I made a debugging change and re-ran the app, the new password worked when logging in!

A few irritating hours later, I finally figured out what was wrong. It came down to the difference of lifetimes in my MVC application between different classes. My AccountMembershipService class, which was mostly based on the default code that came with MVC, had the following constructor:

        public AccountMembershipService()
            : this(null)
        {
        }

        public AccountMembershipService(MembershipProvider provider)
        {
            _provider = provider ?? Membership.Provider;
        }

The problem is that my application loads its service classes through Castle Windsor, and Entity Framework is resolved from Windsor with a per-web-request lifetime. Even though my custom membership provider was retrieving the database context from Windsor, Asp.Net creates the membership provider once and keeps it for the lifetime of the whole application.

So I would log in using the Membership Provider’s database context, but when I changed my password the service class would use a different, request-scoped context. The password was correctly changed in the database (and in the request’s context), but the Membership Provider was still holding the original context. Since Entity Framework caches previously materialized entities, when I went back to log in, Entity Framework returned the user entity from that cache rather than the database, and thus the old password is what passed the Membership Provider’s validation check.
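The stale read is easy to reproduce with nothing but two contexts, since each Entity Framework context performs identity resolution against the entities it has already loaded. This is an illustrative sketch (`AppDbContext` and the entity shape are made up for the example):

```csharp
using (var providerContext = new AppDbContext())  // long-lived, like the provider's
using (var requestContext = new AppDbContext())   // per-request, like the service's
{
    // Login: the provider's context loads (and starts tracking) the user.
    var cachedUser = providerContext.Users.First(u => u.Name == "jdoe");

    // Password change: performed through the request-scoped context.
    var freshUser = requestContext.Users.First(u => u.Name == "jdoe");
    freshUser.PasswordHash = "new-hash";
    requestContext.SaveChanges();

    // Next login: the query still executes against the database, but
    // identity resolution hands back the already-tracked instance,
    // which still carries the old hash.
    var staleUser = providerContext.Users.First(u => u.Name == "jdoe");
}
```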

The fix was to replace the previous code with:

        public AccountMembershipService(MembershipProvider provider)
        {
            _provider = provider ?? new MyMembershipProvider();
        }

This forces a new membership provider instance to be created for that web request, and thus guarantees that the database context will not be holding a stale user entity.

Hopefully this saves someone from the irritation and wasted time that I went through.


Update: I have written another post adding on to the issues I wrote about here.

Keeping Asp.NET MVC Controller Constructors Clean In a Dependency Injection World – Part 2

Previously I wrote an article about a way to keep Asp.net MVC constructors clean with heavy dependency injection. In that post, which can be found here, I proposed creating the following class to resolve dependencies on demand:

public class WindsorServiceFactory : IServiceFactory
{
    protected IWindsorContainer _container;

    public WindsorServiceFactory(IWindsorContainer windsorContainer)
    {
        _container = windsorContainer;
    }

    public ServiceType GetService<ServiceType>() where ServiceType : class
    {
        // Use windsor to resolve the service class.  If the dependency can't be resolved throw an exception
        try { return _container.Resolve<ServiceType>(); }
        catch (ComponentNotFoundException) { throw new ServiceNotFoundException(typeof(ServiceType)); }
    }
}

This seemed like a good idea in theory, even though I was told by people here and on Ayende’s blog that it was a bad idea. After using it in a project, I can now agree that it is a bad idea.

Dependency Ambiguity

One problem is that it doesn’t make it clear what dependencies a class has; those dependencies are hidden behind implementation details. At first this doesn’t seem like a big deal, but it can cause confusion down the road. As an example, take a method that retrieves data about a post in your blog system. A unit test against it might fail for a private blog because the method calls into another class to do a security check (making sure the user has access to the blog or post). However, there is no way to know which service class will be used for this call without knowing how the IServiceFactory.GetService() calls are made internally. When the service interface is instead passed into the constructor via IoC, it is clear to any outside user exactly which services the class requires.
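The difference is easy to see side by side. This is an illustrative sketch (the `IBlogSecurityService` name and its `AssertCanView` method are made up for the example):

```csharp
// With the service factory, nothing in the signature reveals that a
// security check happens; the dependency hides in the method body.
public class HiddenDependencyPostQueries
{
    private readonly IServiceFactory _serviceFactory;

    public HiddenDependencyPostQueries(IServiceFactory serviceFactory)
    {
        _serviceFactory = serviceFactory;
    }

    public Post GetPost(int postId)
    {
        // Surprise dependency: only discoverable by reading this body.
        _serviceFactory.GetService<IBlogSecurityService>().AssertCanView(postId);
        // ... retrieve and return the post ...
        return null;
    }
}

// With constructor injection, the class declares exactly what it needs.
public class ExplicitDependencyPostQueries
{
    private readonly IBlogSecurityService _security;

    public ExplicitDependencyPostQueries(IBlogSecurityService security)
    {
        _security = security;
    }

    public Post GetPost(int postId)
    {
        _security.AssertCanView(postId);
        // ... retrieve and return the post ...
        return null;
    }
}
```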

IoC Testing

Another (major, in my opinion) issue is that you cannot easily test that ALL dependencies can be resolved. In order to unit test that all of my dependencies can be resolved by IoC, I have to explicitly create a unit test for each type of class I *could* possibly use in my business layer or presentation layer. I say could because, since these dependencies are not being passed in through constructors, I don’t know exactly which IServiceFactory.GetService calls will be made. The only way to be 100% sure I registered everything with my IoC container is to manually run through every path of my application. This is error prone and just bad.

Instead, when you pass dependencies into your constructors, testing that ALL dependencies can be resolved is simple. All you have to do is create one unit test per MVC constructor that looks like:


[TestMethod]
public void Windsor_Can_Resolve_HomeController_Dependencies()
{
	// Setup
	WindsorContainer container = new WindsorContainer();
	container.Kernel.ComponentModelBuilder.AddContributor(new SingletonLifestyleEqualizer());
	container.Install(FromAssembly.Containing<HomeController>());

	// Act
	container.Kernel.Resolve(typeof(HomeController));
}

This will not only verify that all my HomeController dependencies can be resolved, but that all child and grandchild dependencies can be resolved as well. This is MUCH less error prone.

So What Now

So now we come to the question that brought me here in the first place: how do I control constructor bloat when using dependency injection? After thinking about it, it really feels like I have been trying to squash an ant with a bulldozer. The REAL problem with constructor bloat is that your MVC controllers are doing too much. Once you realize this, the stupidity of my ServiceFactory scheme becomes clear. Controllers that do too much have more issues than just constructor bloat; they also make it hard to navigate through controller actions.

So what is the real solution? Utilize areas and spread your responsibilities among multiple MVC controllers instead of trying to fit all the actions into one controller, even if they seem vaguely related. This will keep your MVC application much more maintainable in the long run, and easier to debug.

A ViewModel Based Data Access Layer – Persisting Data

So far my design of a Data Access Layer, outlined in my last two posts, has dealt with retrieving data from the database. Now I want to explore the second half of a data access layer: how you persist data to the database.

Facade Of Simplicity

Contrary to what the repository pattern would have you believe, saving data in a data store isn’t always a simple process that is the same in all situations. Sometimes persisting data in the database requires special optimizations that are only valid in specific scenarios. To illustrate this, let’s again look at a process of saving a question at Stack Overflow with a traditional ORM.

When a user wants to add a question the process is simple: we just create a new instance of the question POCO and tell our ORM to save it to the database. We want to save all of its data because it’s all new. But what about when the user wants to edit his question? In every ORM I know of, you must run a query to retrieve the record from the database, populate your POCO with the data, apply the changes the user made to the question, then send the modified object back to the database for saving.

This has several inefficiencies. For starters, it requires you to retrieve the complete question from the database when almost none of that data is actually relevant. To save edits to a question we don’t care how many upvotes it has, how many downvotes, the original date of posting, or the user who originally posted it; we don’t even care what the original question text was. All we care about are the new title, text, and tags for the question, so why retrieve everything else? This may seem insignificant, but when your application gets a lot of traffic it causes a lot of repeated, unneeded traffic between your database and web server. Since the database is usually the bottleneck in a web application, and there are Sql Azure plans where you pay for DB bandwidth, this can end up costing you in the long run. Also consider the effect of having to mass-update multiple records.

Most ORMs (or any one worth its salt) have some ways around this. The first usually involves creating a new POCO object containing the modified data and telling the ORM to save the object as-is. This is usually bad because it requires the POCO to already hold data it probably shouldn’t, such as the number of upvotes, the creation date, etc. If any of these properties aren’t set, saving this way will null or zero them out and most likely cause data issues. It is very risky to do this, at least with Entity Framework. Another way around the inefficiency is to tell the ORM to use raw SQL to update the record, thus bypassing the safety net and security of the ORM.

Both of these methods have their individual situations where they are beneficial, but rarely do you want to use one or the other all the time. Trying to abstract each of these situations into a data access layer that is database agnostic isn’t simple.

Persisting Data

So now the question becomes: how can I persist data in my ViewModel-based DAL? To do this I have come up with the following interface:

public interface INonQuery<TViewModel, TReturn>
{
	TReturn Execute(TViewModel viewModel);
}

This interface states that you want to take the data from a specific view model and save it to the database, with a requested return type. This allows the developer to do performance optimizations on a situation-by-situation basis, but if need be they can keep the implementation consolidated until they feel they need that optimization, all without breaking the application layer that uses the API.
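For example, an edit-question command could implement the interface by attaching a stub entity and setting only the edited properties, so no SELECT is issued first. This is my own illustrative sketch against Entity Framework, not code from the post (`Question` and the view model shape are made up):

```csharp
public class EditQuestionViewModel
{
    public int QuestionId { get; set; }
    public string Title { get; set; }
    public string Text { get; set; }
    public string Tags { get; set; }
}

public class EditQuestionCommand : INonQuery<EditQuestionViewModel, bool>
{
    private readonly DbContext _context;

    public EditQuestionCommand(DbContext context)
    {
        _context = context;
    }

    public bool Execute(EditQuestionViewModel viewModel)
    {
        // Attach a stub with only the key set -- no query is issued.
        var question = new Question { Id = viewModel.QuestionId };
        _context.Set<Question>().Attach(question);

        // Properties changed after Attach are tracked as modified.
        question.Title = viewModel.Title;
        question.Text = viewModel.Text;
        question.Tags = viewModel.Tags;

        // Only the three edited columns are written; vote counts,
        // dates, and the rest of the row are left untouched.
        return _context.SaveChanges() == 1;
    }
}
```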

A ViewModel Based Data Access Layer – Optionally Consolidated Query Classes

After thinking upon my data access pattern outlined in my last post, I came up with the following interface to use for queries:

public interface IQuery<TViewModel, TCriteria>
{
	TViewModel Execute(TCriteria criteria);
}

I am very happy with this idea, as it seems to have multiple advantages over current data access patterns I have evaluated.

Optionally Consolidated Queries

One advantage that I see of this pattern is that it’s trivial to combine similar queries into one class, while keeping the business layer ignorant of the grouping.

To show this, let’s use the example of Stack Overflow. If you look at the homepage and at a user’s page, you will notice that each page runs a query retrieving data for a specific user, but each query returns different information. The homepage only requires the user’s username, reputation, and badge data. A user’s page needs that same information, but also the questions and answers related to the user. Even though both queries deal with retrieving data for a specific user, it would be inefficient to use the latter query for the homepage, as it would hit multiple tables for data that is never used.

An example of creating an MVC controller that uses my ViewModel DAL would be:

public class ExampleController : Controller
{
	protected IQuery<UserHomepageViewModel, UserByIdCriteria> _userHomepageQuery;
	protected IQuery<UserDashboardViewModel, UserByIdCriteria> _userDashboardQuery;
	protected int _currentUserId;
	
	public ExampleController(
		IQuery<UserHomepageViewModel, UserByIdCriteria> userHomepageQuery,
		IQuery<UserDashboardViewModel, UserByIdCriteria> userDashboardQuery)
	{
		_userHomepageQuery = userHomepageQuery;
		_userDashboardQuery = userDashboardQuery;
		_currentUserId = (int)Membership.GetUser().ProviderUserKey;
	}
	
	public ActionResult Index()
	{
		var criteria = new UserByIdCriteria { UserId = _currentUserId };
		var model = _userHomepageQuery.Execute(criteria);
		return View(model);
	}
	
	public ActionResult UserDashboard()
	{
		var criteria = new UserByIdCriteria { UserId = _currentUserId };
		var model = _userDashboardQuery.Execute(criteria);
		return View(model);
	}	
}

As far as the programmer in charge of this controller is concerned, two completely separate classes are used for these queries. However, the developer can save some effort by implementing these queries into one consolidated class. For example:

public class UserByIdQueries : IQuery<UserHomepageViewModel, UserByIdCriteria>, IQuery<UserDashboardViewModel, UserByIdCriteria>
{
	protected DbContext _context;
	
	public UserByIdQueries(DbContext context)
	{
		_context = context;
	}
	
	// Explicit interface implementations are required here, since two
	// Execute overloads cannot differ only by their return type.
	UserHomepageViewModel IQuery<UserHomepageViewModel, UserByIdCriteria>.Execute(UserByIdCriteria criteria)
	{
		var user = GetUserById(criteria.UserId, false);
		return Mapper.Map<User, UserHomepageViewModel>(user);
	}
	
	UserDashboardViewModel IQuery<UserDashboardViewModel, UserByIdCriteria>.Execute(UserByIdCriteria criteria)
	{
		var user = GetUserById(criteria.UserId, true);
		return Mapper.Map<User, UserDashboardViewModel>(user);
	}
	
	protected User GetUserById(int id, bool includeDashboardData)
	{
		var query = _context.Set<User>().Where(x => x.Id == id);
		
		// Include() returns a new query, so its result must be reassigned
		if (includeDashboardData)
			query = query.Include(x => x.Questions).Include(x => x.Answers);
			
		return query.SingleOrDefault();
	}
}

To me, this gives the perfect balance of easily retrieving data from the DAL based on how I am going to use the data and still give me full flexibility on how I organize and create the DAL’s implementation.

Design of a ViewModel Based Data Access Pattern

I have been doing a lot of reading about data access patterns, and I haven’t really been happy with any of them, so I have been doing a lot of thinking about a good way to do data access.

The Repository Pattern

The repository pattern seems to be the most popular data access pattern. The problem I have with it is that it is very rigid for querying. Generic repositories are pointless in my opinion, as the business logic has to have intimate knowledge of the data layer in order to make efficient use of it, and if you are tying the two together you might as well deal with the data access manually in your business layer.

Non-generic repositories get tricky as you get into more complex querying scenarios. Most business processes your application goes through require different information, even if the base entity they need is the same. Taking StackOverflow as an example: on one screen you need to query for the user’s information and all related questions, but if you click the “responses” link you now need the user’s information with all responses the user has made. The only way to do this without relying on lazy loading (which you don’t want, as lazy loading can too easily become a performance nightmare) is to give your repository one method per query. You then end up with something like:

public interface IUserRepository
{
	User GetUserById(int id);
	User GetUserWithResponses(int id);
	User GetUserWithQuestions(int id);
	
	// etc...
}

Now say you need to implement a query that retrieves the user’s latest activity, which combines information from their questions and responses. You can’t use the previous two methods, so you have to implement yet another. This can very easily get out of hand as you require more and more types of queries. It also makes it confusing to determine which repository a query belongs in: for example, if you want all the questions for a user, is that in the user repository (user with associated questions) or in the questions repository (questions by user)?

Query Objects

Another pattern for querying is Query Objects. I first learned about these from the second half of this post by Ayende. The essence is that each class represents a single query, and to get the information you need you just call the Execute() method of the relevant class. While you now have a lot of classes rather than a lot of methods, I find classes easier to organize and search through than repository methods.

The problem is that you still have to keep track of how you name your query classes and remember exactly what data each one returns. How do you distinguish between a query class that returns a user with his questions and one that returns a user with his activity? It’s hard to do without keeping a very strict naming convention, and even then it can get a bit confusing.

Querying By ViewModel

Let’s stop thinking about the code for a second and think of the actual business/application logic of what we are trying to do. All we (or at least I) really want is to run a query to retrieve specific data. When working with my UI or business logic I don’t care about the specifics of how that happens, I just want the database to give me the information I need.

My idea is a slight extension of query objects. While each class is for one specific query, that class is not only for retrieving data from the database, but for placing that data in a specific non-persisted data class (aka a view model). The returned class doesn’t have to resemble the database’s data model at all, and thus this would give us total flexibility to retrieve data exactly as how our application wants it rather than relying on how the database stores it.

I would accomplish this via the following interface:

public interface IQuery<TViewModel>
{
	TViewModel Execute();
}

Now, let’s say I want to retrieve the user’s info with his questions. My UI might use the following view model:

public class UserQuestionsPageViewModel
{
	public int UserId { get; set; }
	public string Name { get; set; }
	public IList<QuestionSummaryViewModel> Questions { get; set; }
}

The idea is that my Asp.Net MVC’s controller code to run this query would look like:

public class UserController : Controller
{
	protected IQuery<UserQuestionsPageViewModel> _query;
	
	public UserController(IQuery<UserQuestionsPageViewModel> query)
	{
		_query = query;
	}

	public ActionResult UserQuestions(int userId)
	{
		var model = _query.Execute();
		return View(model);
	}
}

The interesting aspect of this is that we don’t care about how we named the query object, that part is already covered by dependency injection. All we have to specify is that we have a specific view model and we want to fill it with data from our data store, and the query commences. This makes our data access very simple and intuitive to use, and minimizes how many layers it takes to get data from the database into the view model for the UI.
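Wiring this up in Castle Windsor (which the earlier posts use) is then just a component registration; the fluent registration style below is Windsor 3’s API and is my own sketch, not code from the post:

```csharp
public class QueryInstaller : IWindsorInstaller
{
    public void Install(IWindsorContainer container, IConfigurationStore store)
    {
        // Register every query class against the IQuery<T> interfaces it
        // implements, so a controller can simply take a dependency on
        // IQuery<UserQuestionsPageViewModel> and get the right class.
        container.Register(
            Classes.FromThisAssembly()
                   .BasedOn(typeof(IQuery<>))
                   .WithServiceAllInterfaces()
                   .LifestylePerWebRequest());
    }
}
```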

Criteria

However, there is one weakness in this pattern: how do you specify your query criteria? In the previous example we still need some way to specify exactly which user to run the query for. I don’t know the best way to overcome this issue. The best solution I have come up with so far is a super criteria object that holds all the possible query criteria in one structure. An example would be:

public struct QueryCriteria
{
	public UserCriteria User { get; set; }
	public QuestionCriteria Question { get; set; }

	public struct UserCriteria
	{
		public int? ByIdNumber { get; set; }
		public string ByName { get; set; }
	}
	
	public struct QuestionCriteria
	{
		public int? ByIdNumber { get; set; }
		public int? ByQuestioningUserId { get; set; }
	}
}

This structure would be created, the requested criteria would be set, and then it would be passed into the query’s Execute() method. Each query would then look for any criteria relevant to it and retrieve the data using those values. If none of the criteria a query is written to support are set, it throws an exception. One good thing about this method is that you now have one place where all query criteria can be found, which gives you a good reference for how queries are done in the system.
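As a sketch of how a query might consume this, assuming Execute() is extended to take the criteria structure (the class and entity names are illustrative):

```csharp
public class UserQuestionsQuery : IQuery<UserQuestionsPageViewModel>
{
    private readonly DbContext _context;

    public UserQuestionsQuery(DbContext context)
    {
        _context = context;
    }

    public UserQuestionsPageViewModel Execute(QueryCriteria criteria)
    {
        // This query only knows how to look a user up by id; any other
        // criteria combination is unsupported and fails loudly.
        if (criteria.User.ByIdNumber == null)
            throw new NotSupportedException(
                "UserQuestionsQuery requires User.ByIdNumber to be set.");

        var user = _context.Set<User>()
                           .Include(x => x.Questions)
                           .Single(x => x.Id == criteria.User.ByIdNumber.Value);

        return Mapper.Map<User, UserQuestionsPageViewModel>(user);
    }
}
```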

Final Thoughts

This is somewhat similar to a previous post I made, but after writing that post I realized the solution presented there was convoluted and needlessly complicated. This seems like a much better and simpler overall solution for supporting many complicated queries in an organized fashion.

Why Entity Framework 4.1’s Linq capabilities are inferior to Linq-to-Sql

I have a small-ish website at work that contains several tools we use internally. I originally coded the database layer using Linq-to-Sql, as it was too small to overcome the learning curve of NHibernate and Entity Framework version 4 was not out at the time.

However, lately database schema changes have been harder to incorporate into the Linq-to-Sql dbml, mostly due to the naming differences between database tables and relationships and their C# class names. I decided to convert my database layer to Entity Framework 4.1 Code First against the existing database, as I use EF Code First in a current project and am already familiar with it.

The conversion from Linq-to-Sql was *supposed* to be a simple matter of replacing the calls to my L2S data context with calls to my EF DbContext, and letting Linq take care of the rest. Unfortunately, that is not the case, because Entity Framework’s Linq support is extremely limited.

Very Limited Function Support

In Linq-to-Sql it was common for me to perform transformations into a view model or an anonymous type right inside the Linq query. Two examples from my code:

var client = (from c in _context.Clients
                where c.id == id
                select ClientViewModel.ConvertFromEntity(c)).First();

var clients = (from c in _context.Clients
                orderby c.name ascending
                select new
                {
                    id = c.id,
                    name = c.name,
                    versionString = Utils.GetVersionString(c.ProdVersion),
                    versionName = c.ProdVersion.name,
                    date = c.prod_deploy_date.ToString()
                })
                .ToList();

These fail when run under Entity Framework with NotSupportedExceptions. The first one fails because EF claims it has no idea how to deal with ClientViewModel.ConvertFromEntity(). That would make sense, as it is a custom method, except the same query works perfectly in Linq-to-Sql. The 2nd query fails not only on Utils.GetVersionString(), but also because EF has no idea how to handle ToString() on a DateTime, even though this is all core functionality.

In order to fix this, I must return the results from the database and locally do the transformations, such as:

var clients = _context.Clients.OrderBy(x => x.name)
                              .ToList()
                              .Select(x => new
                              {
                                  id = x.id,
                                  name = x.name,
                                  versionString = Utils.GetVersionString(x.ProdVersion),
                                  versionName = x.ProdVersion.name,
                                  date = x.prod_deploy_date.ToString()
                              })
                              .ToList();

No More Entity to Entity Direct Comparisons

In Linq-to-Sql I could compare one entity directly to another in a query. For example, the following line of code worked in L2S:

context.TfsWorkItemTags.Where(x => x.TfsWorkItem == tfsWorkItemEntity).ToList();

This fails in Entity Framework and throws an exception because it can’t figure out how to compare these in Sql. Instead I had to change it to explicitly check on the ID values themselves, such as:

context.TfsWorkItemTags.Where(x => x.TfsWorkItem.id == tfsWorkItemEntity.id).ToList();

It’s a minor change, but I find it annoying that EF isn’t smart enough to compare entities directly, especially when it has full knowledge of how the entities are mapped. Yes, I could have gone through tfsWorkItemEntity.Tags instead, but this is a simple example to illustrate the lost functionality.

Cannot Use Arrays In Queries

This issue really caught me by surprise, and I don’t understand why it was omitted. In my database, versions consist of 4 parts: major, minor, build, and revision numbers, usually represented in string form as AA.BB.CC.DD. I have a utility method that converts the string into an array of ints, which I then used in my Linq-to-Sql query:

int[] ver = Utils.GetVersionNumbersFromString(versionString);
return context.ReleaseVersions.Any(x => x.major_version == ver[0] && x.minor_version == ver[1]
                                    && x.build_version == ver[2] && x.revision_version == ver[3]);

Under L2S this query works fine but, as is the common theme of this post, it fails in Entity Framework 4.1 with a NotSupportedException carrying the message “The LINQ expression node type ‘ArrayIndex’ is not supported in LINQ to Entities.”.

In order to fix this I had to split my version into individual ints instead.
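The workaround, roughly, is just to copy the array elements into individual local variables before building the query (variable names follow the original snippet):

```csharp
int[] ver = Utils.GetVersionNumbersFromString(versionString);

// Captured locals translate fine into the query's expression tree,
// while ver[0] would produce an unsupported ArrayIndex node.
int major = ver[0], minor = ver[1], build = ver[2], revision = ver[3];

return context.ReleaseVersions.Any(x => x.major_version == major
                                     && x.minor_version == minor
                                     && x.build_version == build
                                     && x.revision_version == revision);
```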

Dealing With Date Arithmetic

My final issue is probably the most annoying, mostly because I can’t find a good solution except to do most of the work on the client side. In my database I store test requests, and those requests can be scheduled to run at a certain time on a daily basis. This scheduled time is stored in a time(7) field in the database. When I query for outstanding test requests, one criterion is that the current date and time is past today’s date at the scheduled time. In L2S I had:

var reqs = _context.TestRequests.Where(x => DateTime.Now > (DateTime.Now.Date + x.scheduled_time.Value)).ToList();

This fails in Entity Framework for 2 reasons. The first is that DateTime.Now.Date isn’t supported, so I had to create a variable to hold today’s date and use that in the query.

The 2nd issue is that EF cannot translate adding the current date to a specific time value into a query. This causes an ArgumentException with the message “DbArithmeticExpression arguments must have a numeric common type.”.

I have not found a way to do this in EF, and instead had to resort to pulling the list of TestRequest entities from the database and locally filtering to only the ones that fit the criteria.
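What that workaround looks like, roughly (a sketch; the column and property names follow the earlier snippet):

```csharp
var today = DateTime.Now.Date;  // computed client-side; EF can't translate it
var now = DateTime.Now;

// Materialize the candidates first, then do the date arithmetic in
// memory, since EF throws on (date + time) arithmetic inside a query.
var reqs = _context.TestRequests
                   .ToList()
                   .Where(x => x.scheduled_time.HasValue
                            && now > today + x.scheduled_time.Value)
                   .ToList();
```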

Conclusion

I am utterly baffled by how limited Entity Framework is in its Linq abilities. While I will not go back to L2S for my internal tool, I am definitely having second thoughts about using EF 4.1 in my more complex personal projects. It seems that lately Microsoft is going more for feature count in their frameworks than for functional coverage.

Keeping Asp.NET MVC Controller Constructors Clean In a Dependency Injection World

Update: After playing around with the concepts I talked about in this post I now pretty much disagree with what I wrote here. To see why, see part 2.


In my experience, dependency injection has proven to be invaluable when creating applications. It has allowed me to easily follow test driven development practices, be confident that my application functions as expected, and it gives me confidence that any bugs I had previously found do not crop up again.

However, one trend I have noticed is that as my Asp.NET MVC application becomes more complex, my controller constructors become bloated with all of the service objects the controller might use. I say might because not all of the service objects are used in every action, yet they must all be passed into the controller’s constructor to facilitate unit testing. This constructor bloat is made worse by my business/service layer being made up of many small query and command classes. Here is an example of one of my controllers:


public class ContactController : MyAppBaseController
{
    public ContactController(ContactsByCompanyQuery contactsByCompanyQuery,
                             ContactByIdQuery contactByIdQuery,
                             CreateContactCommand createContactCommand,
                             EditContactCommand editContactCommand,
                             ISearchProvider searchProvider,
                             IEmailProvider emailProvider)
    {
        // copy parameters into variables
    }
}

As you can see, as I add more actions and need more service classes, I have to add more parameters to my controller, which also means updating every unit test to pass the new parameter when instantiating the controller. It also strikes me as inefficient, since the inversion of control system must always instantiate every service class even when the executing MVC action needs only one or two.

I needed a system that would allow me to minimize the number of parameters in my controller constructors while still giving me full flexibility with dependency injection and retaining the ability to unit test my controllers and mock my service classes. After thinking about this problem for a bit I came up with the idea of creating a factory class that instantiates service classes on demand directly from the IoC system. As I use Castle Windsor for IoC I created the following factory class:

    public class WindsorServiceFactory : IServiceFactory
    {
        protected IWindsorContainer _container;

        public WindsorServiceFactory(IWindsorContainer windsorContainer)
        {
            _container = windsorContainer;
        }

        public ServiceType GetService<ServiceType>() where ServiceType : class
        {
            // Use windsor to resolve the service class.  If the dependency can't be resolved throw an exception
            try { return _container.Resolve<ServiceType>(); }
            catch (ComponentNotFoundException) { throw new ServiceNotFoundException(typeof(ServiceType)); }
        }
    }

It turned out to be a much simpler solution than I thought it would be. Now my constructors look like:

public class ContactController : MyAppBaseController
{
    protected IServiceFactory _serviceFactory;

    public ContactController(IServiceFactory factory)
    {
        _serviceFactory = factory;
    }

    public ActionResult Search(string query)
    {
        var results = _serviceFactory.GetService<ISearchProvider>().Search(query);
        return View(results);
    }
}

This is much cleaner in my opinion and it also allows me to completely ignore my constructors even as I add more functionality into my controllers.

I am also able to easily use this in my unit tests with mocks. Here is an example using Moq:

[TestMethod]
public void Can_Search_Contacts()
{
    // Setup
    var factory = new Mock<IServiceFactory>();
    var searchProvider = new Mock<ISearchProvider>();
    factory.Setup( x => x.GetService<ISearchProvider>()).Returns(searchProvider.Object);
    var controller = new ContactController(factory.Object);

    // Act
    var result = controller.Search("test query");

    // Add verifications here
}

Performing a fuzzy search with multiple terms through multiple Lucene.Net document fields

Recently I have been trying to implement Lucene.Net in my web application to allow users to search through their data. My core requirements were that the search allows minor misspellings and that search terms can be found across different fields. The code to accomplish this wasn’t immediately obvious, and it took me a few days and a few questions posted to StackOverflow to finally figure out how to do it. Therefore, I thought it would be beneficial to others to show how to accomplish it.

To illustrate my scenario let’s assume I am searching for data indexed based on the following class:

public class Contact
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

When I index this structure in Lucene I decided to store the FirstName and LastName properties into separate fields in the Lucene document. This is to allow more advanced searching at a later time.
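As a sketch of how those separate fields might be written to the index (using the Lucene.Net 2.9-era API; the `AddContactToIndex` helper name and the writer setup are my own assumptions, not code from the application):

```csharp
using Lucene.Net.Documents;
using Lucene.Net.Index;

public static class ContactIndexer
{
    // Builds a Lucene document with one field per Contact property and adds it to the index.
    // Keeping FirstName and LastName in separate fields allows per-field searching later.
    public static void AddContactToIndex(IndexWriter writer, Contact contact)
    {
        var doc = new Document();
        doc.Add(new Field("FirstName", contact.FirstName, Field.Store.YES, Field.Index.ANALYZED));
        doc.Add(new Field("LastName", contact.LastName, Field.Store.YES, Field.Index.ANALYZED));
        writer.AddDocument(doc);
    }
}
```

Storing the values (`Field.Store.YES`) also makes it possible to display the names straight from the search hits without a database round trip.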

So for this scenario I want to allow a user to search for a contact with the name “Jon Doe”, but also allow the user to find the contact by using the search strings “John Doe”, “Jon”, or “Doe”.

The first thing that has to be done is to use a MultiFieldQueryParser, which allows Lucene.Net to search for the search terms in all of the specified fields of a document. The MultiFieldQueryParser can be used with the following code:

public SearchResults Search(string searchString)
{
    // Define the document fields to search through
    string[] searchFields = new string[] { "FirstName", "LastName" };

    // Generate the parser
    var parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_29,
                                           searchFields,
                                           new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
    parser.SetDefaultOperator(QueryParser.Operator.AND);

    // Perform the search
    var query = parser.Parse(searchString);
    var directory = FSDirectory.Open(new DirectoryInfo(LuceneIndexBaseDirectory));
    var searcher = new IndexSearcher(directory, true);
    var hits = searcher.Search(query, MAX_RESULTS);

    // Process the hits and return results
}

The parser will split the search string into individual terms and look for each term in all of the specified fields (I used QueryParser.Operator.AND so that all search terms must be found for a document to be a match). This code allows me to perform successful searches with a search string of “Jon”, “Jon Doe”, or “Doe”, but not “John”, as this method does not contain any code to enable fuzzy searches.

Implementing fuzzy search turned out to be the confusing part. The Lucene documentation’s only explanation of how to perform a fuzzy search is to add a tilde (~) to the end of a search term. With this knowledge, my first attempt was to add a tilde to the end of each word in the search string. Unfortunately, it turns out that when Lucene is given multiple words in a search string, each ending with a tilde, a fuzzy search is only performed on the last word in the query; all the others seem to be ignored and must match exactly to produce a hit.

After asking around, it turns out that in order to allow each and every term to be matched non-exactly, you have to create a separate Lucene query for each term in the search string and combine them all using the BooleanQuery class. My implementation of this is:

public SearchResults Search(string searchString)
{
    // Setup the fields to search through
    string[] searchFields = new string[] { "FirstName", "LastName" };

    // Build a BooleanQuery that combines the queries for each individual search term
    var finalQuery = new BooleanQuery();
    var parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_29, searchFields, CreateAnalyzer());

    // Split the search string into separate search terms by word
    string[] terms = searchString.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
    foreach (string term in terms)
        finalQuery.Add(parser.Parse(term.Replace("~", "") + "~"), BooleanClause.Occur.MUST);

    // Perform the search
    var directory = FSDirectory.Open(new DirectoryInfo(LuceneIndexBaseDirectory));
    var searcher = new IndexSearcher(directory, true);
    var hits = searcher.Search(finalQuery, MAX_RESULTS);

    // Process the hits and return results
}

The BooleanClause.Occur.MUST ensures that all term queries must produce a match in order for the document to be a hit for the search.

The purpose of the term.Replace("~", "") + "~" code is to add a tilde to the end of the search term so Lucene knows to perform a fuzzy search with this term. We must remove any tildes the user entered in the search string to prevent an exception that will occur if a search term ends with two tildes (e.g. “John~~”).

This will successfully allow you to perform a fuzzy search with multiple terms across multiple document fields. The algorithm does eventually need to be improved so that it does not split search terms that are inside quotation marks (which signify a search for a phrase), but I hope this helps others figure out fuzzy searching more quickly than I did.
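As a rough idea of what that quotation-aware improvement might look like, here is a sketch that splits a search string while keeping quoted phrases together; the regex approach and the `SearchTermSplitter` class name are my own, not code from the application:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class SearchTermSplitter
{
    // Matches either a quoted phrase ("jon doe") or a single non-whitespace word
    private static readonly Regex TermPattern = new Regex("\"([^\"]+)\"|(\\S+)");

    // Splits the search string into terms, keeping quoted phrases as a single term
    // (with the surrounding quotes stripped off).
    public static string[] Split(string searchString)
    {
        return TermPattern.Matches(searchString)
                          .Cast<Match>()
                          .Select(m => m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value)
                          .ToArray();
    }
}
```

For example, `SearchTermSplitter.Split("\"Jon Doe\" acme")` yields the two terms `Jon Doe` and `acme`; the phrase term could then be parsed as an exact phrase query while the single words get the tilde treatment.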

Allowing Eager Loading In Business Logic via View Models

Today I am going to write about a library I have been thinking about for the past week. If I end up actually developing it I plan to make it open source, as I think it solves an issue that a lot of systems may encounter regarding eager loading database entities and separating the data entities from the domain model. I am going to use this post to bring all of my thoughts together, to make sure what’s in my head actually makes sense, and to get any feedback (if anyone besides me actually reads this).

The Problem

My web application currently uses the Entity Framework 4.1 Code First system. The original issue came about when I was looking at document-oriented databases and the possibility of switching my backend to RavenDB. It wasn’t a serious idea, as it would be a huge hassle to convert the production data from SQL Server to a document-oriented database for little benefit (at this time at least). However, it did make me realize that even though my architecture allows me to change database backends and ORMs fairly painlessly, it does not allow me to change my data model without changing my business entity model to match. This is mostly due to my use of Linq to specify queries in my business layer.

My thoughts then went to switching to use a repository pattern for data access. The repository pattern would allow me to more easily organize my queries, abstract away data access from the business layer, and would mean that my data model could be completely different from my business entity model and changes can be made to one without affecting the other. Unfortunately, the repository pattern has a big issue when it comes to eager loading in that it usually ends up with many repeating methods, such as:

public class EFUserRepository : IUserRepository
{
    public User GetUserById(int id) { }
    public User GetUserByIdWithContacts(int id) { }
    public User GetUserByIdWithContactsWithCompanies(int id) { }
    ....
}

I have not found a better solution to handle eager loading with the repository pattern, and the previous example has too much repeated code for my liking.

The VMQA (View Model Querying Assistant) Library

After thinking about this problem for a bit, I later realized that what I ultimately want is to pass a view model data structure into a repository query method, for the repository to come up with (and execute) an eager loading strategy based on the view model, and then to map all the properties of the view model with the data retrieved from the database. To accomplish this I want to rewrite my repository methods to look something like:

public class EFUserRepository : IUserRepository
{
    public T GetUserById<T>(int id) where T : class
    {
        // Form the query
        var query = _context.Users.Where(x => x.Id == id);

        // Determine the eager loading strategy
        var includeStrings = vmqa.GetPropertiesToEagerLoadForClass(typeof(T)).ConvertToEFStrings();
        foreach (string inc in includeStrings)
            query = query.Include(inc);

        // Execute the query
        var user = query.First();

        // Return an instance of the view model with the data filled in from the user
        return vmqa.MapEntityToViewModel<User, T>(user);
    }
}

The design I have floating around in my head to accomplish this has two parts.

Eager Loading

The first part is coming up with an eager loading strategy based on the specified data structure, a data structure which the library has no previous knowledge of. I envision this working by building a map of how all the data entities fit together with an interface that is similar to how configuring EF CodeFirst models is done. An example mapping would be:

public void MapEntities()
{
    vmqa.Entity<User>().HasMany(typeof(Contact), x => x.ContactsProperty);
    vmqa.Entity<Contact>().HasOne(typeof(Company), x => x.CompanyProperty);
}

This would tell the system that the User entity maps to a collection of Contact entities via the User.ContactsProperty, and that each Contact maps to a single Company via Contact.CompanyProperty.

In my repository I would then call upon the VMQA library to look at the data structure and resolve the eager loading strategies. To allow the VMQA library to understand how to map an arbitrary view model data structure to a data entity, I would use custom attributes that would decorate a class such as:

[MapsToEntity(typeof(User))]
public class UserViewModel
{
    public int Id { get; set; }
    public string Name { get; set; }

    [MapsToEntity(typeof(Contact))]
    public IList Contacts { get; set; }
}

Using reflection, vmqa.GetPropertiesToEagerLoadForClass should be able to figure out that the core entity maps to User and that the Contacts collection maps to the Contact entity. Since the view model explicitly specifies a mapping for the Contacts property, the library can determine that the property needs to be eager loaded, and will then return a data structure containing the Contact type and the name of the property it maps to on the User entity.

It should be easy to allow this system to handle complex mappings, so if the user view model has a property that maps to a collection of companies, the system can automatically find a path from User to Company for eager loading.

The ConvertToEFStrings() extension method would then convert that data structure into a collection of strings that Entity Framework can use for eager loading. This would be an extension method so that the library is not restricted to just Entity Framework, and can be used for any ORM or database system.
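Since the library only exists in my head, here is one way such an extension method might be sketched, assuming the eager loading strategy is represented as lists of property names forming a path from the root entity (everything here is hypothetical design, not an existing API):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EagerLoadPathExtensions
{
    // Converts eager load paths (lists of property names) into the dotted
    // strings Entity Framework accepts in Include(), e.g. a path of
    // ["Contacts", "Company"] becomes "Contacts.Company".
    public static IEnumerable<string> ConvertToEFStrings(this IEnumerable<IList<string>> paths)
    {
        return paths.Select(path => string.Join(".", path.ToArray()));
    }
}
```

A RavenDB or NHibernate adapter would be a sibling extension method over the same path data, which is the point of keeping the strategy ORM-agnostic.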

Entity Mapping

The second part of the system deals with automatically mapping the returned data entities into the view model. This seems like a natural extension of the eager loading functionality: if you are passing the view model into the repository to describe what data you want to retrieve, the library should automatically copy that data into the view model for you. The developer is going to want to do this anyway, and it seems sensible for VMQA to handle it automatically, since it already has knowledge of the relationships.

My first version of the library would probably do a straight name-for-name mapping, so that UserViewModel.Name maps to User.Name. Eventually I would probably add property mapping attributes so that more complex mappings can be performed.
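The name-for-name pass could be sketched with plain reflection; this is only an illustration of the idea, and the `NameForNameMapper` class is my own invention rather than part of any existing library:

```csharp
using System.Reflection;

public static class NameForNameMapper
{
    // Copies values from source onto a new TTarget wherever a property with the
    // same name and a compatible type exists on both sides; everything else is skipped.
    public static TTarget Map<TTarget>(object source) where TTarget : new()
    {
        var target = new TTarget();
        foreach (PropertyInfo targetProp in typeof(TTarget).GetProperties())
        {
            PropertyInfo sourceProp = source.GetType().GetProperty(targetProp.Name);
            if (sourceProp != null
                && targetProp.CanWrite
                && targetProp.PropertyType.IsAssignableFrom(sourceProp.PropertyType))
            {
                targetProp.SetValue(target, sourceProp.GetValue(source, null), null);
            }
        }
        return target;
    }
}
```

So `NameForNameMapper.Map<UserViewModel>(user)` would copy User.Id and User.Name onto the matching view model properties, leaving unmatched properties (like the Contacts collection) to the relationship-aware part of the library.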

Conclusion

After finally getting this design written down, I feel more confident than ever that this would be an extremely valuable library to have, and it actually seems relatively straightforward to create. Now I just need the time!