ADO.Net Entity Framework takes a different approach to persistence than most of us are used to. It wants us to add objects (entities) to our data context instead of saving them, and it doesn’t write those objects to the database until we call SaveChanges(). We don’t save specific entities directly; instead, EF tracks the entities we’ve loaded and, when we call SaveChanges(), writes to the database any entities it thinks have changed.
When I first realized how different these concepts were from my usual way of saving data, I hated it (this actually happened with LINQ to SQL, which I still don’t care for because of the way it handles sprocs). But the promise of rapid application development and more maintainable code kept me coming back. I started reading up on architectures that use ORMs (mostly in the Java world), and I discovered that most of the things I initially disliked about Entity Framework and LINQ to SQL are actually accepted design patterns from the ORM world, developed by people much smarter than me who have spent years working on the object-relational impedance mismatch problem. So I thought it might be helpful to talk about some of these patterns and how Entity Framework handles them. The first one we’ll look at is Identity Map.
Identity Map Definition
In Martin Fowler’s book Patterns of Enterprise Application Architecture, he defines Identity Map in two sentences:
Ensures that each object gets loaded only once by keeping every loaded object in a map. Looks up objects using the map when referring to them.
So what does this mean? It’s probably better to demonstrate than to explain, so let’s look at the characteristics of Identity Map through some code examples.
There Can Be Only One
Let’s start by looking at the other way of doing things: the non-Identity Map approach. If we have an app with a simple persistence layer that runs a database query and returns a DataTable, we might see code like the following:
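```csharp
using System;
using System.Data;

// BAL.Person.GetPersonByEmail is this app's own data layer:
// every call runs a query and returns a brand-new DataTable.
DataTable personData1 = BAL.Person.GetPersonByEmail("bill@gates.com");
DataTable personData2 = BAL.Person.GetPersonByEmail("bill@gates.com");

// != compares references: two tables that happen to hold the same data
if (personData1 != personData2)
{
    Console.WriteLine("We have 2 different objects");
}
```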
In this example, personData1 and personData2 each contain a separate copy of the data for Bill Gates. If we change the data in personData2, it has no effect on personData1. They are completely separate objects that happen to contain the same data. If we change both and then save them back to the database, nothing coordinates the changes; one simply overwrites the other. Our persistence framework (ADO.Net DataTables) doesn’t know that personData1 and personData2 both hold data for the same entity. The thing to remember about this scenario is that multiple separate objects holding data for the same entity lead to concurrency problems when it’s time to save.
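To make the lost-update problem concrete, here is a small C# sketch that simulates it without a real database. Everything in it is hypothetical: PersonRow and CopyingStore stand in for a DataTable-based data layer, and a Dictionary plays the part of the Person table. Each load returns a fresh copy and each save writes the whole row back, so the last save wins.

```csharp
using System.Collections.Generic;

// Hypothetical row type standing in for one record of a person DataTable.
public class PersonRow
{
    public string Email = "";
    public string Name = "";
    public string PhoneHome = "";

    public PersonRow Copy() =>
        new PersonRow { Email = Email, Name = Name, PhoneHome = PhoneHome };
}

// Hypothetical data layer with no identity map; the dictionary is the "table".
public class CopyingStore
{
    private readonly Dictionary<string, PersonRow> table =
        new Dictionary<string, PersonRow>();

    public void Insert(PersonRow row) => table[row.Email] = row.Copy();

    // Every load hands back a separate copy, like a new DataTable per query.
    public PersonRow GetPersonByEmail(string email) => table[email].Copy();

    // Saving writes the whole row back, so the last save wins.
    public void Save(PersonRow row) => table[row.Email] = row.Copy();
}

public static class LostUpdateDemo
{
    public static PersonRow Run()
    {
        var store = new CopyingStore();
        store.Insert(new PersonRow { Email = "bill@gates.com", Name = "Bill Gates", PhoneHome = "(111) 111-1111" });

        PersonRow copy1 = store.GetPersonByEmail("bill@gates.com");
        PersonRow copy2 = store.GetPersonByEmail("bill@gates.com");

        copy1.Name = "The Billster";         // change made through the first copy
        copy2.PhoneHome = "(999) 999-9999";  // change made through the second copy

        store.Save(copy1);
        store.Save(copy2);  // overwrites the row, including copy1's name change

        return store.GetPersonByEmail("bill@gates.com");
    }
}
```

LostUpdateDemo.Run() returns a row whose PhoneHome is the new number but whose Name is still "Bill Gates": the name change saved through copy1 was silently overwritten by the save of copy2.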
Now let’s look at the Identity Map way of doing things. Below, we have some ADO.Net Entity Framework code where we create two different object queries that both get data for the same person, and then we use those queries to load three different person entity objects.
```csharp
using System;
using System.Linq;

using (EFEntities context = new EFEntities())
{
    // Two different queries that both find Bill Gates
    var query1 = from p in context.PersonSet
                 where p.email == "bill@gates.com"
                 select p;
    Person person1 = query1.FirstOrDefault();
    Person person2 = query1.FirstOrDefault();  // runs query1 against the database again

    var query2 = from p in context.PersonSet
                 where p.name == "Bill Gates"
                 select p;
    Person person3 = query2.FirstOrDefault();

    // == on entity classes is reference equality
    if (person1 == person2 && person1 == person3)
    {
        Console.WriteLine("Identity Map gives us 3 refs to a single object");
    }

    person1.name = "The Billster";
    Console.WriteLine(person3.name); // writes The Billster
}
```
When I run the code above, all three references are in fact equal. Plus, when I change the name property on person1, I see that same change on person3. What’s going on here? They’re all references to a single object that is managed by the ObjectContext. Entity Framework does some work behind the scenes so that, regardless of how many times or how many different ways we load an entity, only one object is created for it, and the multiple entities we load are really just multiple references to that one object. That means we can have 10 entity variables in our code, and if they represent the same entity, they will all be references to the same object. The result is that, within a single context, there are no conflicting in-memory copies to reconcile at save time: every change is made to the same object, so all of them get saved. So how does this work?
Every entity type has a key that uniquely identifies each entity of that type. If we look at one of our Person entities in the debugger, we notice that it has a property named EntityKey that Entity Framework created for us. EntityKey holds the entity’s key values (for our Person entity the key field is PersonGuid) and the entity set it belongs to: basically all the information Entity Framework needs to uniquely identify and manage our Person entity.
The EntityKey property is used by the ObjectContext (or just context) that Entity Framework generates for us. In our example the context class is EFEntities. The context does a number of things, and one of them is maintaining an Identity Map. Think of the map as a cache that contains one and only one instance of each object, identified by its EntityKey. In fact, you will rarely hear the term Identity Map used; most .Net developers just call it the object cache, or simply the cache. So, in our example, when we get person1 from our context, it runs the query, creates an instance of Person (which the context knows is uniquely identified by the PersonGuid field), stores that object in the cache, and gives us back a reference to it. When we get person2, the context does run the query again and pulls data from our database, but then it sees that it already has a Person entity with the same EntityKey in the cache, so it throws out the data and returns a reference to the entity that’s already in the cache. The same thing happens for person3.
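Here is a minimal C# sketch of that map, reduced to a Dictionary keyed by PersonGuid. It is a toy model of the behavior described above, not Entity Framework’s implementation (EF tracks entities through its ObjectStateManager using full EntityKey values); PersonEntity, PersonIdentityMap, and Materialize are hypothetical names. Materialize is meant to be called once for each row a query returns.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the generated Person entity.
public class PersonEntity
{
    public Guid PersonGuid;
    public string Name = "";
    public string PhoneHome = "";
}

// Toy identity map: at most one PersonEntity instance per PersonGuid.
public class PersonIdentityMap
{
    private readonly Dictionary<Guid, PersonEntity> cache =
        new Dictionary<Guid, PersonEntity>();

    public int Count => cache.Count;

    // Called for each row a query returns.
    public PersonEntity Materialize(Guid key, string name, string phoneHome)
    {
        // Already cached: discard the row's values and return the cached object.
        if (cache.TryGetValue(key, out PersonEntity existing))
        {
            return existing;
        }

        // First time this key is seen: create the object and remember it.
        var person = new PersonEntity { PersonGuid = key, Name = name, PhoneHome = phoneHome };
        cache.Add(key, person);
        return person;
    }
}
```

If you call Materialize twice with the same key, first with the phone number "(111) 111-1111" and then with "(999) 999-9999", both calls return the same instance, and its PhoneHome is still "(111) 111-1111". Changing Name through one reference is visible through the other, just like person1 and person3 above.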
Quiz: What Happens To Cached Entities When the Database Changes?
So here’s a question. Suppose we run the code sample above that loads person1, person2, and person3 from our context, but this time we set a breakpoint to pause execution right after we load person1. While execution is paused, we manually update the database, changing the phone_home field on Bill Gates’ record to “(999) 999-9999”, and then we continue executing the rest of the code. What value will we see for phone_home on person1, person2, and person3: the original value or the new one? Remember that all three are really just three references to the same entity object in the cache. Our first database hit, for person1, pulled the original phone_home value, but the queries for person2 and person3 also hit the database and pulled the new data. How does Entity Framework handle that? The answer is shown in the debugger watch window below: it throws the new data out, so all three references still show the original value.
This can lead to some really unexpected behavior if you don’t know to look for it, especially if you have a long-running context that’s kept around and reused for multiple requests. It is important to think about this when you decide when to create a context, how long to keep it alive, and what you want to happen when data on the backend changes. You can change this behavior for an individual query by setting its ObjectQuery.MergeOption property (the default is MergeOption.AppendOnly), but we still need to remember and plan for the default behavior.
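The difference between the default behavior and an overwriting merge can be sketched by adding a merge flag to a toy map. ToyMergeOption and MergingIdentityMap are hypothetical; the two enum values are named after MergeOption.AppendOnly and MergeOption.OverwriteChanges, but this sketch ignores change tracking entirely and leaves out EF’s other two options, PreserveChanges and NoTracking.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical options, named after two members of EF's MergeOption enum.
public enum ToyMergeOption
{
    AppendOnly,       // keep what is already cached (EF's default)
    OverwriteChanges  // copy the freshly queried values onto the cached object
}

// Hypothetical stand-in for the generated Person entity.
public class PersonEntity
{
    public Guid PersonGuid;
    public string PhoneHome = "";
}

public class MergingIdentityMap
{
    private readonly Dictionary<Guid, PersonEntity> cache =
        new Dictionary<Guid, PersonEntity>();

    // Called for each row a query returns, using that query's merge option.
    public PersonEntity Materialize(Guid key, string phoneHome, ToyMergeOption option)
    {
        if (cache.TryGetValue(key, out PersonEntity existing))
        {
            if (option == ToyMergeOption.OverwriteChanges)
            {
                existing.PhoneHome = phoneHome; // database value wins
            }
            return existing; // same instance either way
        }

        var person = new PersonEntity { PersonGuid = key, PhoneHome = phoneHome };
        cache.Add(key, person);
        return person;
    }
}
```

With AppendOnly, a second load of the same key keeps the first phone number; with OverwriteChanges, the cached object picks up the value from the database. In real EF, OverwriteChanges also overwrites any unsaved changes you have made to that object, so it is not a free fix.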
If There’s a Cache, Why Am I Hitting The Database?
Remember the second part of Martin Fowler’s definition, where he said that the Identity Map looks up objects using the map when referring to them? The natural question is: if I’m loading an object that already exists in my cache, and Entity Framework is just going to return a reference to that cached object and throw away whatever data the database query returns, can’t I get the object directly from the cache and skip the database query altogether? That could really reduce database load.
Unfortunately, the answer is kind of, but not really. In Entity Framework v1, you can get an entity directly from the cache without hitting the database, but only if you use a special method that looks the entity up by its EntityKey. Having to use the EntityKey is a big limitation, since most of the time you want to look up data by some other field. For example, in a login situation I need to get a Person entity by email or username; I don’t have the PersonGuid. I’m hoping we get more options for loading entities from the cache in v2, but for now, if you do have the key value, this is how you do it:
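```csharp
using System;
using System.Data;  // EntityKey lives in the System.Data namespace in EF v1

Guid billsGuid = new Guid("0F3087DB-6A83-4BAE-A1C8-B1BD0CE230C0");
// Arguments: qualified entity set name, key property name, key value
EntityKey key = new EntityKey("EFEntities.PersonSet", "PersonGuid", billsGuid);
Person bill = (Person)context.GetObjectByKey(key);
```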
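There are a few things I want to point out. First, when we create the key, the first parameter is the name of the entity set we’re pulling from, and it must be qualified with the entity container name, which in our model is also the name of our ObjectContext class (hence EFEntities.PersonSet). Second, GetObjectByKey() returns an object, so we have to cast the return value to Person. Third, the database is skipped only when the entity is already in the cache; otherwise GetObjectByKey() queries the database, and it throws an ObjectNotFoundException if no matching entity exists (TryGetObjectByKey() returns false instead).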
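To see why a key-based lookup can skip the database while a lookup by email cannot, here is a toy C# model that counts database round trips. KeyLookupContext, GetByKey, and GetByEmail are hypothetical: GetByKey plays the role of GetObjectByKey(), and the dictionary passed to the constructor stands in for the Person table.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the generated Person entity.
public class PersonEntity
{
    public Guid PersonGuid;
    public string Email = "";
}

// Toy context that counts how often it has to go to the "database".
public class KeyLookupContext
{
    private readonly Dictionary<Guid, PersonEntity> cache = new Dictionary<Guid, PersonEntity>();
    private readonly Dictionary<Guid, string> personTable; // PersonGuid -> email

    public int DatabaseHits { get; private set; }

    public KeyLookupContext(Dictionary<Guid, string> personTable)
    {
        this.personTable = personTable;
    }

    // Like GetObjectByKey(): check the map first and query only on a miss.
    // A missing key throws KeyNotFoundException here (EF throws ObjectNotFoundException).
    public PersonEntity GetByKey(Guid key)
    {
        if (cache.TryGetValue(key, out PersonEntity cached))
        {
            return cached;
        }
        DatabaseHits++;
        return Track(key, personTable[key]);
    }

    // The map is keyed by PersonGuid, so a lookup by email cannot use it:
    // the query always runs, and its result is then resolved against the map.
    public PersonEntity GetByEmail(string email)
    {
        DatabaseHits++;
        foreach (KeyValuePair<Guid, string> row in personTable)
        {
            if (row.Value == email)
            {
                return Track(row.Key, row.Value);
            }
        }
        return null;
    }

    // Return the cached instance if there is one; otherwise cache a new one.
    private PersonEntity Track(Guid key, string email)
    {
        if (cache.TryGetValue(key, out PersonEntity existing))
        {
            return existing;
        }
        var person = new PersonEntity { PersonGuid = key, Email = email };
        cache.Add(key, person);
        return person;
    }
}
```

Looking Bill up by email costs one round trip. A later GetByKey with his PersonGuid is served from the map and returns the same object, while a second lookup by email goes back to the database even though the object it ends up returning was already cached.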
Conclusion
So that’s one pattern down. Hopefully discussing some of these differences in approaching persistence helps ease your transition to using Entity Framework a bit. Next time we’ll cover another key pattern, Unit of Work.