Friday 6 December 2013

MongoDB For Analytics C# Example


I was one of the Co-founders of Appacts Open Source Mobile Analytics. I've wrote the original version on top of SQL Server 2008 R2. It was something I felt very comfortable with. I've heard of MongoDB about 4 years prior to Appacts. As many developers who got used to RDBMS systems I didn't see much point in MongoDB, I thought I get it and there is no point to it. It's a Document storage thing, who cares, right?

The bias towards technology has impaired my judgment, I even made assumptions that I understood how it works. It's a shame that for so many years I ignored it, and stayed away from it. It's so powerful. Let's find out what made me get out my comfort zone.

The Problem
I am getting thousands of raw data requests coming through (per second). I need to record event data in a cost effective way, additionally I need to be able to look this data up within milliseconds to give user an amazing experience.

This is how a raw data object looks like:
    
  public class Event
    {
        public string Data { get; set; }
        public EventType EventTypeId { get; set; }
        public string EventName { get; set; }
        public Int64 Length { get; set; }
        public string ScreenName { get; set; }
        public DateTime DateCreated { get; set; }
        public DateTime DateCreatedOnDevice { get; set; }
        public DateTime Date { get; set; }
        public Guid SessionId { get; set; }
        public string Version { get; set; }
        public Guid ApplicationId { get; set; }
        public Guid DeviceId { get; set; }
        public PlatformType PlatformId { get; set; }
    }

Imagine storing this for every request and then trying to query this over months of data.


The Solution
MongoDB has the following functions that are incredibly powerful:

These little functions change everything and document databases are great for data portioning.
Take the raw data example above, what do we really need to get from it? We don't actually need to store it in a pure raw format and we don't need to aggregate it. Instead we can create a new portioned aggregate object.

 public class ContentLoadSummary : Summary
    {
        public List<ContentDurationAggregate> Loads { get; set; }

        public ContentLoadSummary()
        {

        }

        public ContentLoadSummary(Event eventItem)
            : base(eventItem)
        {
            this.Loads = new List<ContentDurationAggregate>();
            this.Loads.Add(new ContentDurationAggregate(eventItem.ScreenName, 
                eventItem.EventName, eventItem.Length / 1000));
        }

    }

    public class ContentDurationAggregate : DurationAggregate
    {
        public string Content { get; set; }
        public string ScreenContent { get; set; }

        public ContentDurationAggregate(string screen, string content, long seconds)
            : base(screen, seconds)
        {
            this.Content = content;
            this.ScreenContent = String.Concat(screen, content);
        }

        public ContentDurationAggregate CopyOnlyKey()
        {
            return new ContentDurationAggregate(this.Key, this.Content, 0);
        }
    }

    public class DurationAggregate : Aggregate<string>
    {
        public long Seconds { get; set; }

        public DurationAggregate(string key, long seconds)
            : base(key)
        {
            this.Seconds = seconds;
        }
    }

 public class Aggregate<TKey>
    {
        public TKey Key { get; set; }
        public int Count { get; set; }

        public Aggregate()
        {

        }

        public Aggregate(TKey key)
        {
            this.Key = key;
        }

        public Aggregate<TKey> CopyOnlyKeys()
        {
            return new Aggregate<TKey>(this.Key);
        }
    }


Summary is partition base, this allows us to create new or append existing document with different dates, application versions, application platforms, etc.
    
   public abstract class Summary
    {
        public ObjectId Id { get; set; }
        public string Version { get; set; }
        public DateTime Date { get; set; }
        public Guid ApplicationId { get; set; }
        public PlatformType PlatformId { get; set; }
        public int Count { get; set; }

        public Summary()
        {

        }

        public Summary(Item item)
        {
            this.Version = item.Version;
            this.Date = DateTime.SpecifyKind(item.Date, DateTimeKind.Utc);
            this.ApplicationId = item.ApplicationId;
            this.PlatformId = item.PlatformId;
            this.Count++;
        }

        public Summary(DeviceInfo deviceInfo, ApplicationInfo applicationInfo)
        {
            this.Date = DateTime.SpecifyKind(deviceInfo.Date, DateTimeKind.Utc);
            this.ApplicationId = applicationInfo.ApplicationId;
            this.PlatformId = deviceInfo.PlatformType;
            this.Version = applicationInfo.Version;
            this.Count++;
        }
    }

The Example
 

 //You create a query, you are trying to see if you can find the data in the collection that matches your parameters.
                IMongoQuery queryBase = Query.And
                    (
                        Query<ContentLoadSummary>.EQ<DateTime>(mem => mem.Date, entity.Date),
                        Query<ContentLoadSummary>.EQ<Guid>(mem => mem.ApplicationId, entity.ApplicationId),
                        Query<ContentLoadSummary>.EQ<string>(mem => mem.Version, entity.Version),
                        Query<ContentLoadSummary>.EQ<PlatformType>(mem => mem.PlatformId, entity.PlatformId)
                    );

                // If you query and data doesn't exist then you will need to create a new document with the parameters, if it does exist then you just want to .Inc(=>) (Increment) a count
                IMongoUpdate update = Update<ContentLoadSummary>
                    .SetOnInsert(x => x.Version, entity.Version)
                    .SetOnInsert(x => x.Date, entity.Date)
                    .SetOnInsert(x => x.ApplicationId, entity.ApplicationId)
                    .SetOnInsert(x => x.PlatformId, entity.PlatformId)
                    .SetOnInsert(x => x.Loads, new List<ContentDurationAggregate>())
                    .Inc(mem => mem.Count, entity.Count);

                this.GetCollection<ContentLoadSummary>().FindAndModify(queryBase, SortBy.Descending("Date"), update, false, true);


//You need to extend your original query to also do a look up on the collection.
IMongoQuery queryLoadInsert = Query.And
                    (
                        queryBase,
                        Query.NE("Loads.ScreenContent", BsonValue.Create(entity.Loads.First().ScreenContent))
                    );

                IMongoUpdate insertLoad = Update
                    .Push("Loads", BsonValue.Create(entity.Loads.First().CopyOnlyKey().ToBsonDocument()));

//Does a look up on the query and then pushes a record in to the collection if it doesn't find the record
                this.GetCollection<ContentLoadSummary>().Update(queryLoadInsert, insertLoad);

                //Does a look on the record, if it finds it then it Increments Seconds and Count
                IMongoQuery queryLoadUpdate = Query.And
                    (
                        queryBase,
                        Query.EQ("Loads.ScreenContent", BsonValue.Create(entity.Loads.First().ScreenContent))
                    );

                IMongoUpdate updateContentLoad = Update
                    .Inc("Loads.$.Seconds", entity.Loads.First().Seconds)
                    .Inc("Loads.$.Count", 1);

                this.GetCollection<ContentLoadSummary>().Update(queryLoadUpdate, updateContentLoad);

The Output
 
{ 
    "ApplicationId" : BinData(3,"JyVgNisUD0S/P4VvblnRpg=="), 
    "Count" : 3, 
    "Date" : ISODate("2013-05-25T00:00:00Z"), 
    "Loads" : 
           [ 
              {       "Key" : "Searching", 
                      "Count" : 10,    
                      "Seconds" : NumberLong(102),      
                      "Content" : "ContentLookup", 
                       "ScreenContent" : "SearchingContentLookup" 
              },
              {       "Key" : "Saving", 
                      "Count" : 3,    
                      "Seconds" : NumberLong(7),      
                      "Content" : "CalledNumber", 
                      "ScreenContent" : "SavingCalledNumber" 
              } 
           ], 
    "PlatformId" : 6, 
    "Version" : "1.6", 
    "_id" : ObjectId("51a26322fef1ae002dd60a73") 
}



The Conclusion
MongoDB is very powerful and it's here to stay. Polyglot persistence is the way forward and companies/individuals need to embrace it. It's all about using the right tool for the right job and trying not to be too biased. It's time to get out of your comfort zone.

No comments:

Post a Comment