nHibernate performance issue when loading large collections

| | August 8, 2015

(just to make clear: my app isn’t really about employees and departments. I just use these terms for example’s sake).
Each department has an employees collection, which is lazily-loaded. Whenever I add a new employee I want to make sure It doesn’t already exist in the collection, so I load the collection to memory and perform the check on it.
Problem is- in production environment, I have some departments with 10,000+ employees.
I found that fetching the collection and then saving the new employee takes A LOT of time.
I’ve done a little experment, in which I copied the exact same select statement generated by nH to ADO.Net SQLDataAdapter. Here are the results:

***16:04:50:437*** DEBUG NHibernate.SQL - SELECT ... FROM dbo.[Employee] emp0_ left outer join dbo.[Department] department1_ on emp0_.Department_id=department1_.Id left outer join dbo.[TableC] TableC2_ on department1_.TableC_id=TableC2_.Id WHERE emp0_.SomeField_id=@p0;@p0 = 2
***16:05:00:250*** DEBUG NHibernate.SQL - SELECT ... FROM dbo.TableD codeshared0_ left outer join dbo.[Department] department1_ on codeshared0_.Department_id=department1_.Id left outer join dbo.[TableC] TableC2_ on department1_.TableC_id=TableC2_.Id WHERE codeshared0_.Employee_id in (select emp0_.Id FROM dbo.[Employee] emp0_ left outer join dbo.[Department] department1_ on emp0_.Department_id=department1_.Id left outer join dbo.[TableC] TableC2_ on department1_.TableC_id=TableC2_.Id WHERE emp0_.SomeField_id=@p0);@p0 = 2
16:05:04:984 DEBUG NHibernate.SQL - Reading high value:select next_hi from dbo._uniqueKey with (updlock, rowlock)
16:05:05:078 DEBUG NHibernate.SQL - Updating high value:update dbo._uniqueKey set next_hi = @p0 where next_hi = @p1;@p0 = 10686, @p1 = 10685
***16:05:05:328*** DEBUG MyApp.Managers - commiting
16:05:12:000 DEBUG NHibernate.SQL - INSERT INTO dbo.[Employee] (...) VALUES (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8, @p9);@p0 = 23/04/2011 04:04:49, @p1 = 23/04/2011 03:34:49, @p2 = 23/04/2011 04:04:49, @p3 = 23/04/2011 03:34:49, @p4 = '', @p5 = False, @p6 = 433, @p7 = NULL, @p8 = 2, @p9 = 10685
16:05:12:140 DEBUG NHibernate.SQL - UPDATE dbo.[Employee] SET Department_id = @p0 WHERE Id = @p1;@p0 = 2, @p1 = 10685
16:05:12:343 DEBUG MyApp.Managers - success
16:05:12:359 DEBUG MyApp.Tests - ------------------------------------------------------------
16:05:12:359 DEBUG MyApp.Tests - Finished nHib stuff- now switching to ADO 
16:05:12:359 DEBUG MyApp.Tests - starting SQL: SELECT ... FROM dbo.[Employee] emp0_ left outer join dbo.[Department] department1_ on emp0_.Department_id=department1_.Id left outer join dbo.[TableC] TableC2_ on department1_.TableC_id=TableC2_.Id WHERE emp0_.SomeField_id=2
16:05:14:750 DEBUG MyApp.Tests - total rows received: 10036
16:05:14:750 DEBUG MyApp.Tests - SQL: SELECT ... FROM dbo.TableD codeshared0_ left outer join dbo.[Department] department1_ on codeshared0_.Department_id=department1_.Id left outer join dbo.[TableC] TableC2_ on department1_.TableC_id=TableC2_.Id WHERE codeshared0_.Employee_id in (select emp0_.Id FROM dbo.[Employee] emp0_ left outer join dbo.[Department] department1_ on emp0_.Department_id=department1_.Id left outer join dbo.[TableC] TableC2_ on department1_.TableC_id=TableC2_.Id WHERE emp0_.SomeField_id=2)
16:05:15:250 DEBUG MyApp.Tests - total rows received: 2421

as you can see- fetching takes ~15 secs with nH, compared to ~2 secs with ADO.Net.
From researching around a bit I know that nH probably isn’t meant to be used to store that many items in session. Can you think of any other possible reason for this problem, or of another suggestion other than filtering the Employees at the DB level?

thanks

EDIT
Following the below suggestions i’ve tried using Reflection Optimizer (made no difference), and IStatelessSession for loading my collection (throws an exception- collections cannot be fetched by a stateless session.).
I think my code in the Department class will have to change from the clean:

if (this.Employees.Contains(emp))
{
  ...
}  

to this ‘dirtier’ version:

var employeesRepository = IOCContainer.Get<IEmployeesRepository>();  
if (employeesRepository.EmployeeExists(this,emp))
{
  ...
}  

anyone has a better suggestion?

3 Responses to “nHibernate performance issue when loading large collections”

  1. Nadav the Gahbon on November 30, -0001 @ 12:00 AM

    hmmmm. may be its too much, and actually – i’d expect an “lazy=extra” ISet to behave like this, but you can write your own “extra lazy” ISet.
    if you didn’t encounter an extra lazy collection – its a collection that, for example, when you ask its count, it doesn’t fetch everything but issues a count query.
    your extra lazy ISet could issue an exists query whenever you try to add something.

    If you implement this, your code would be clean and you could submit the code to nhibernate core.

    you should however think about add range, and be careful not to issue N queries

    good luck.

  2. There is no reason for you to load all the empoyees to memory. you should write a query
    using HQL/Critiria API/Linq to NHibernate to check if the employee already existing in the DB.
    for example:

    var existingEmpoyee = session.Query<Employee>()
                                 .Where(e => e.Equals(newEmployee))
                                 .FirstOrDefault();
    if(existingEmployee != null)
       // Insert new employee to DB
    
  3. I would use a StatelessSession and batch optimization.

    The session will keep track of all the
    loaded objects, and if we load a lot
    of data, it will eventually blow out
    with an out of memory exception.
    Luckily, NHibernate has a ready made
    solution for this, the stateless
    session. The code now looks like this:

    using (IStatelessSession s = sessionFactory.OpenStatelessSession())
    {
        var books = new ActionableList<Book>(book => Console.WriteLine(book.Name));
        s.CreateQuery("from Book")
            .List(books);
    
    }
    

    The stateless session, unlike the
    normal NHibernate session, doesn’t
    keep track of loaded objects, so the
    code here and the data reader code are
    essentially the same thing.

    For batch optimization and more: NHibernate performance tricks.

Leave a Reply