Sunday, 3 December 2017

ASP.NET MVC Multitenancy, Part 3 - NHibernate Tenant Data Filtering and Shared Database

In part 1 we have established tenant context. In part 2 we have configured authentication. Now we will be getting tenant data out of our RDBMS. Before we start storing and retrieving data it's important that you figure out what kind of multitenancy data segregation model you are going to go for. This is an important point and it really depends on your business requirements. To help you choose check out this Microsoft article: Multi-tenant SaaS database tenancy patterns.

My sample app is called “Stop The Line”, it is very basic. It holds very little data, and I have no idea if it's going to take off or not. I have decided to go with "Multi-tenant app with a single multi-tenant database" pattern. Put simply, this means that all my tenants are going to share the same database and I am going to use foreign key to discriminate between them. This is one the cheapest and fastest ways to implement tenancy.

When it comes to development what I don't want to do is to specify tenant id in each query and I don't want to manually specify tenant id when I create and update my domain objects. Remember the onion architecture, as far as I am concerned tenancy is an infrastructure concern and it should stay out of my domain. After few minutes of searching I have found this great blog post that meets these requirements: Bolt-on Multi-Tenancy in ASP.NET MVC With Unity and NHibernate: Part II – Commingled Data. Let's incorporate this in to my sample app.

Implementation

In my sample app it's possible to query in multitenancy context and without. For example, if a new tenant is signing up for an account then we would not apply the multitenancy context. This is because TenantContext will not exist until tenant was created in the first place.

Also in this implementation I have tried to keep infrastructure decoupled so that it would not be too difficult to move to another database tenancy pattern.

Base Configuration

As there are going to be two different contexts, non-multitenant and multitenant. I have decided to create NHConfiguration base class.

NHConfiguration.cs
   
public abstract class NHConfiguration
{
    public abstract global::NHibernate.Cfg.Configuration Config { get; }
}

Below is the standard configuration that is going to be used for non-multitenant read/write.

Configuration.cs
   
public class Configuration : NHConfiguration
{
    public override NHibernate.Cfg.Configuration Config => config;

    private NHibernate.Cfg.Configuration config;

    public Configuration()
    {
        this.config = Fluently.Configure()
            .Database(MsSqlConfiguration.MsSql2012
                .ConnectionString(c => c.FromConnectionStringWithKey("default"))
#if DEBUG
                .ShowSql()
                .FormatSql()
#endif
                .AdoNetBatchSize(50)
            ).Mappings(m =>
                m.FluentMappings.AddFromAssembly(Assembly.GetExecutingAssembly())
            )
            .ExposeConfiguration(c => SchemaMetadataUpdater.QuoteTableAndColumns(c))
            .BuildConfiguration();
    }
}

This configuration will get injected into the Unit of Work. If you are new to this, then you should read Unit Of Work Abstraction For NHibernate or Entity Framework C# Example. I have omitted the class implementation.

   
public class NHUnitOfWork : IUnitOfWork
{
    //..
    private NHConfiguration nhConfiguration;

    public NHUnitOfWork(NHConfiguration nhConfiguration)
    {
        this.nhConfiguration = nhConfiguration;
        //..
    }
    //..
}

Unfortunately with this solution you will still need to add the private "tenantId" field to your domain classes.

   
public class Some 
{
    //..
    private Guid tenantId;
}

The good news is that you don't have to do with it anything inside your domain. It will be handled by infrastructure. The bad news is that it still there. I have spent few hours trying to remove it, I have used some reflection and proxy class generation techniques, unfortunately to no avail. If you find an elegant solution please do share it.

Multitenant Writes

This is where standard configuration is expanded and interceptor is set. Interceptor will be used when some object is updated or saved.

ConfigurationMultiTenancy.cs
   
public class ConfigurationMultiTenancy : Configuration
{
    public ConfigurationMultiTenancy(NHSharedDatabaseMultiTenancyInterceptor interceptor)
    {
        this.Config.SetInterceptor(interceptor);
    }
}

Above you will notice that NHSharedDatabaseMultiTenancyInterceptor is injected in to ConfigurationMultiTenancy. Here is the actual interceptor.

NHSharedDatabaseMultiTenancyInterceptor.cs
   
public class NHSharedDatabaseMultiTenancyInterceptor : EmptyInterceptor
{
    readonly TenantContext tenantContext;

    public NHSharedDatabaseMultiTenancyInterceptor(TenantContext tenantContext)
    {
        this.tenantContext = tenantContext;
    }

    public override bool OnSave(object entity, object id, object[] state, string[] propertyNames, IType[] types)
    {
        int index = Array.IndexOf(propertyNames, "tenantId");

        if (index == -1)
            return false;

        state[index] = this.tenantContext.ID;

        entity.GetType()
                .GetField("tenantId", BindingFlags.Instance | BindingFlags.NonPublic)
                .SetValue(entity, tenantContext.ID);

        return base.OnSave(entity, id, state, propertyNames, types);
    }
}

Multitenant Reads

Configuring reads is much simpler. You just need to enable a filter and provide the parameter. First you need to define the actual filter.

MultitenantFilter.cs
   
public class MultitenantFilter : FilterDefinition
{
    public MultitenantFilter()
    {
        WithName("MultitenantFilter").AddParameter("TenantId", NHibernateUtil.Guid);
    }
}

Then you need to enable it.

NHUnitOfWorkMultitenancy.cs
   
public class NHUnitOfWorkMultitenancy : NHUnitOfWork
{
    public NHUnitOfWorkMultitenancy(NHConfiguration nhConfiguration, TenantContext tenantContext) 
        : base(nhConfiguration)
    {
        this.Session.EnableFilter("MultitenantFilter").SetParameter("TenantId", tenantContext.ID);
    }
}

Finally you just need to apply it.

   
public class SomeMap : ClassMap<Some>
{
    public SomeMap()
    {
        //..
        //This is used for writes
        Map(Reveal.Member<Some>("tenantId")).Not.Nullable();

        //This is used for reads
        ApplyFilter<MultitenantFilter>("TenantId = :TenantId");
    }
}

*Note: Code in this article is not production ready and is used for prototyping purposes only. If you have suggestions or feedback please do comment.

Thursday, 30 November 2017

ASP.NET MVC Multitenancy, Part 2 - OpenID Connect

In part 1 we have established tenant context. Most business applications need authentication. I have decided to use OpenID Connect, luckily for me it comes with awesome middleware and it's very easy to configure. Here is how you would do it:
public class Startup
{
    public void Configuration(IAppBuilder app)
    {
        //...
        app.UseOpenIdConnectAuthentication(new OpenIdConnectAuthenticationOptions
        {
            ClientId = "clientid",
            Authority = 'authority",
            RedirectUri = "http://mysite.com/"
        });
    }
}

However, not so fast, for each tenant you will need to configure client id, authority, url, etc. This also needs to be done at runtime as you don't want to ship new code every time new tenant registers to use your software. I have looked around for a solution and it turns out that this is a known problem, just take a look at the The Grand Auth Redesign. After a surprising amount of searching I have stumbled upon this amazing Multi-tenant middleware pipelines in ASP.NET Core blog post by Ben Foster. This solution is so damn good. Please read it. Unfortunately it did not work for me out of the box I had to port it over from ASP.NET Core to ASP.NET.

Hopefully by blogging and linking to Ben's article it will be easier for others to find it.

Example Usage

public class Startup
{
    public void Configuration(IAppBuilder app)
    {
        //..
        app.UsePerTenant((tenantContext, appBranch) =>
        {
            appBranch.SetDefaultSignInAsAuthenticationType(CookieAuthenticationDefaults.AuthenticationType);

            appBranch.UseCookieAuthentication(new CookieAuthenticationOptions
            { 
                CookieName = $"OAuthCookie.{tenantContext.FriendlyName}"
            });

            appBranch.UseOpenIdConnectAuthentication(new OpenIdConnectAuthenticationOptions
            {
                ClientId = tenantContext.AuthClientId,
                Authority = tenantContext.AuthAuthority,
                RedirectUri = $"http://localhost:2295/{tenantContext.FriendlyName}/"
            });
        });
    }
}

ASP.NET Implementation


TenantPipelineMiddleware.cs
   
/// <summary>
/// Workaround for the known OpenID multitenancy issue https://github.com/aspnet/Security/issues/1179
/// based on http://benfoster.io/blog/aspnet-core-multi-tenant-middleware-pipelines and https://weblogs.asp.net/imranbaloch/conditional-middleware-in-aspnet-core designs 
/// </summary>
public class TenantPipelineMiddleware : OwinMiddleware
{
    readonly IAppBuilder rootApp;
    readonly Action<TenantContext, IAppBuilder> newBranchAppConfig;
    readonly ConcurrentDictionary<TenantContext, Lazy<Func<IDictionary<string, object>, Task>>> branches;

    public TenantPipelineMiddleware(OwinMiddleware next, IAppBuilder rootApp, Action<TenantContext, IAppBuilder> newBranchAppConfig)
        : base(next)
    {
        this.rootApp = rootApp;
        this.newBranchAppConfig = newBranchAppConfig;
        this.branches = new ConcurrentDictionary<TenantContext, Lazy<Func<IDictionary<string, object>, Task>>>();
    }

    public override async Task Invoke(IOwinContext context)
    {
        TenantContext tenantContext = context.GetTenantContext();

        if (tenantContext == null || tenantContext.IsEmpty())
        {
            await this.Next.Invoke(context);
            return;
        }

        Lazy<Func<IDictionary<string, object>, Task>> branch = 
            branches.GetOrAdd(tenantContext, new Lazy<Func<IDictionary<string, object>, Task>>(() =>
            {
                IAppBuilder newAppBuilderBranch = rootApp.New();
                newBranchAppConfig(tenantContext, newAppBuilderBranch);
                newAppBuilderBranch.Run((oc) => this.Next.Invoke(oc));
                return newAppBuilderBranch.Build();
            }));

        await branch.Value(context.Environment);
    }
}

OwinContextExtensions.cs
   
public static class OwinContextExtensions
{
    static string tenantContextKey = "tenantcontext";

    public static TenantContext GetTenantContext(this IOwinContext context)
    {
        if (context.Environment.ContainsKey(tenantContextKey))
        {
            return (TenantContext)context.Environment[tenantContextKey];
        }
        return null;
    }
}

AppBuilderExtensions.cs
   
public static class AppBuilderExtensions
{
    public static IAppBuilder UsePerTenant(this IAppBuilder app, Action<TenantContext, IAppBuilder> newBranchAppConfig)
    {
        return app.Use<TenantPipelineMiddleware>(app, newBranchAppConfig);
    }
}

*Note: Code in this article is not production ready and is used for prototyping purposes only. If you have suggestions or feedback please do comment.

Sunday, 20 August 2017

ASP.NET MVC Multitenancy, Part 1 - Routing with OWIN

This blog post was rewritten on 29/11/2017.

Recently I have been migrating one of my multitenant ASP.NET MVC application's to the OWIN middleware. This has presented me with an opportunity to change my initial tenant resolution and pipeline registration implementation.

I've started by doing some research to see if I could find something I could just download and use. However, I could not find anything that would meet the following requirements:
  • ASP.NET Core and ASP.NET compatibility
  • Optional multitenant processing i.e. all requests don't have be multitenant
  • Lifecycle delegates for missing tenant identifier and tenant record, and tenant context creation
  • Runtime tenant context dependency injection registration
  • Compatible with multitenant URL paths mysite.com/sometenant/ and could work with subdomains sometenant.mysite.com

After much deliberation, I have decided to go with my own implementation. However this implementation was massively influenced by Ben Foster's Building multi-tenant applications with ASP.NET Core (ASP.NET 5) and OpenID Connect Authentication Middleware designs.

Example Usage

public class Startup
{
    public void Configuration(IAppBuilder app)
    {
        //..
        app.UseMultitenancy(new MultitenancyNotifications<YourTenantRecord>
        {
            TenantIdentifierNotFound = context =>
            {
                throw new HttpException(404, "Tenant identifier must be provided");
            },
            TenantRecordNotFound = context =>
            {
                context.Response.Redirect("/signup/tenant/");
                return Task.FromResult(0);
            },
            CreateTenantContext = (context, tenantRecord) =>
            {
                ITenantContextFactory tenantContextFactory = ServiceLocator.Resolve<ITenantContextFactory>();
                TenantContext tenantContext = tenantContextFactory.Create(tenantRecord.Id, tenantRecord.NameFriendly, tenantRecord.AuthClientId, tenantRecord.AuthAuthority);
                return Task.FromResult(tenantContext);
            }
        });
    }
}
Please note that "CreateTenantContext" is a factory delegate. It allows you to control tenant record to tenant context mapping. It also allows you to perform other actions such as runtime tenant context dependency injection registration. By doing this, downstream objects such as controllers will get tenant context injected in to them.
public class SomeController : Controller
{
    public SomeController(TenantContext tenantContext)
    {
        //..
    }
}

Implementation


Tenant Identifier Extraction

In my implementation have decided to use MVC routing with data tokens for tenant discrimination.

RouteConfig.cs
   
public class RouteConfig
{
    public static void RegisterRoutes(RouteCollection routes)
    {
        //...
        routes.MapRoute(
            "multi",
            "{tenant}/{controller}/{action}/{id}",
            new {controller = "Home", action = "Index", id = UrlParameter.Optional}
        ).DataTokens.Add("name", "webclient_multitenancy");
    }
}
By using data tokens, I am able to check the request route. In this case if request route has data token of name "webclient_multitenancy" I will know that I will be able to extract tenant identifier from the route. As you have probably gathered, in this case I am getting multitenancy identifier from the URL path mysite.com/sometenant/.

ITenantIdentifierExtractor.cs
   
public interface ITenantIdentifierExtractor
{
    bool CanExtract(IOwinContext context);
    string GetIdentifier(IOwinContext context);
}

RouteDataTokensTenantIdentifierExtractor.cs
   
public class RouteDataTokensTenantIdentifierExtractor : ITenantIdentifierExtractor
{
    public bool CanExtract(IOwinContext context)
    {
        RouteData routeData = this.getRouteData(context);
        return routeData != null && routeData.DataTokens.ContainsValue("webclient_multitenancy");
    }

    public string GetIdentifier(IOwinContext context)
    {
        if (!this.CanExtract(context))
            return null;

        RouteData routeData = this.getRouteData(context);
        return routeData.GetRequiredString("tenant");
    }

    private RouteData getRouteData(IOwinContext context)
    {
        HttpContextBase httpContext = (HttpContextBase)context.Environment["System.Web.HttpContextBase"];
        return RouteTable.Routes.GetRouteData(httpContext);
    }
}

You don’t have to use data token approach in your application, you can get multitenancy data out of the HTTP request manually.

Tenant Record Resolution

Resolution is where you call service or repository to obtain tenant record, this tenant record will be used later on to create tenant context.

ITenantRecordResolver.cs
   
public interface ITenantRecordResolver<TTenantRecord>
{
    TTenantRecord GetTenant(string tenantIdentifier);
}

In this implementation, I have decided to call the ITenantService to get the TenantDto object.

TenantRecordResolver.cs
   
public class TenantRecordResolver : ITenantRecordResolver<TenantDto>
{
    readonly ITenantService tenantService;

    public TenantRecordResolver(ITenantService tenantService)
    {
        this.tenantService = tenantService;
    }

    public TenantDto GetTenant(string tenantIdentifier)
    {
        return this.tenantService.Get(tenantIdentifier);
    }
}

Most of the time on each request you will be resolving tenant record. At scale this will become very inefficient so it is highly likely that you will want to cache tenant record, to do this you can use decorator pattern.

TenantRecordResolverCacheDecorator.cs
   
public class TenantRecordResolverCacheDecorator : ITenantRecordResolver<TenantDto>
{
    readonly TenantRecordResolver tenantRecordResolver;

    public TenantRecordResolverCacheDecorator(TenantRecordResolver tenantRecordResolver)
    {
        this.tenantRecordResolver = tenantRecordResolver;
    }

    public TenantDto GetTenant(string tenantIdentifier)
    {
        string cacheKey = $"tenantIdentifier:{tenantIdentifier}";

        TenantDto tenant = (TenantDto)MemoryCache.Default[cacheKey];

        if (tenant != null)
        {
            return tenant;
        }

        tenant = this.tenantRecordResolver.GetTenant(tenantIdentifier);
        if (tenant == null)
        {
            return null;
        }

        MemoryCache.Default.Set(cacheKey, tenant, new CacheItemPolicy
        {
            AbsoluteExpiration = DateTimeOffset.Now.AddMinutes(10)
        });
    
        return tenant;
    }
}

Middleware & Extensions

We now have all the key components in place and it is time to implement the actual middleware.

MultitenancyMiddleware.cs
   
public class MultitenancyMiddleware<TTenantRecord>
    where TTenantRecord : class
{
    readonly ITenantIdentifierExtractor tenantIdentifierExtractor;
    readonly ITenantRecordResolver<TTenantRecord> tenantRecordResolver;
    readonly MultitenancyNotifications<TTenantRecord> notifications;
    readonly Func<Task> next;

    public MultitenancyMiddleware(Func<Task> next, ITenantIdentifierExtractor tenantIdentifierExtractor, 
        ITenantRecordResolver<TTenantRecord> tenantRecordResolver,  MultitenancyNotifications<TTenantRecord> notifications)
    {
        this.next = next;
        this.tenantIdentifierExtractor = tenantIdentifierExtractor;
        this.tenantRecordResolver = tenantRecordResolver;
        this.notifications = notifications;
    }

    public async Task Invoke(IOwinContext context)
    {
        if (!this.tenantIdentifierExtractor.CanExtract(context))
        {
            await this.next();
            return;
        }

        string identifier = this.tenantIdentifierExtractor.GetIdentifier(context);

        if (string.IsNullOrEmpty(identifier))
        {
            await this.notifications.TenantIdentifierNotFound(context);
            return;
        }

        TTenantRecord tenantRecord = this.tenantRecordResolver.GetTenant(identifier);
        if (tenantRecord == null)
        {
            await this.notifications.TenantRecordNotFound(context);
            return;
        }

        TenantContext tenantContext = await this.notifications.CreateTenantContext(context, tenantRecord);

        context.SetTenantContext(tenantContext);

        await this.next();
    }
}

MultitenancyNotifications.cs
   
public class MultitenancyNotifications<TTenantRecord>
    where TTenantRecord : class
{
    public Func<IOwinContext, Task> TenantIdentifierNotFound { get; set; }
    public Func<IOwinContext, Task> TenantRecordNotFound { get; set; }
    public Func<IOwinContext, TTenantRecord, Task<TenantContext>> CreateTenantContext { get; set; }

    public MultitenancyNotifications()
    {
        this.TenantIdentifierNotFound = context => Task.FromResult(0);
        this.TenantRecordNotFound = context => Task.FromResult(0);
        this.CreateTenantContext = (context, tenantRecord) => Task.FromResult<TenantContext>(null);
    }
}

OwinContextExtensions.cs
   
public static class OwinContextExtensions
{
    static string tenantContextKey = "tenantcontext";

    public static void SetTenantContext(this IOwinContext context, TenantContext tenantContext)
    {
        context.Environment.Add(tenantContextKey, tenantContext);
    }
}
AppBuilderExtensions.cs
   
public static class AppBuilderExtensions
{
    public static IAppBuilder UseMultitenancy<TTenantRecord>(this IAppBuilder app, 
        MultitenancyNotifications<TTenantRecord> notifications)
        where TTenantRecord : class
    {
        return app.Use((context, next) =>
        {
            MultitenancyMiddleware<TTenantRecord> multitenancyMiddleware = 
                ServiceLocator.Resolve<MultitenancyMiddleware<TTenantRecord>>(new { next, notifications });
            return multitenancyMiddleware.Invoke(context);
        });
    }
}


*Note: Code in this article is not production ready and is used for prototyping purposes only. If you have suggestions or feedback please do comment.

Sunday, 2 July 2017

Testing Data for Randomness

Movie "Contact 1997"
When I was a young man I watched “Contact 1997” movie, I don’t know why but it really stuck with me. I have recently re-watched it. There were few scenes where Eleanor Arroway listened to space noise on certain frequency, eventually she heard something. It was awesome. Now, in reality I imagine what she was doing would not work at all. There are many dishes recording data on different frequencies, there would be simply too much data for one person to go through. For fun, I have decided to write a program that can detect non-randomness automatically, this way Eleanor Arroway would not need to waste hours listening to space noise.

Non Randomness Detection

To make non randomness detection work I will use pattern discovery algorithm and birthday problem calculation that I have blogged about earlier. Pattern discovery algorithm will discover patterns in the provided data. Birthday problem calculation will be given three parameters, size of the data, all possible pattern permutations given the number of unique characters and length in identified pattern, and number of unique pattern matches. If probability of a given pattern is not in the range then it’s highly likely that pattern is not random.

   

public class RandomnessDetection
{
    public static List<KeyValuePair<string, List<int>>> FindNonRandomPatterns(Dictionary<string, List<int>> patterns, string raw)
    {
        if (patterns == null || patterns.Count == 0)
            throw new Exception("No patterns found, cannot test.");

        int uniqueChars = raw.Distinct().Count();
        int rawLength = raw.Length;

        if (uniqueChars == 1)
            throw new Exception("Need to have more than one character");

        if (rawLength < 29)
            throw new Exception("String needs to be at least 30 chars long");

        List<KeyValuePair<string, List<int>>> nonRandomPatterns = new List<KeyValuePair<string, List<int>>>();

        foreach (KeyValuePair<string, List<int>> pattern in patterns)
        {
            double possibleCombinations = Math.Pow(uniqueChars, pattern.Key.Length);
            double prob = BirthdayProblemPoissonApprox.ProbabilityOfPair(rawLength - 2, possibleCombinations, pattern.Value.Count);

            if (Math.Truncate(prob * 100) / 100 <= 0.01)
                nonRandomPatterns.Add(pattern);
        }

        return nonRandomPatterns;
    }
}


Command-line interface

Now we have all the key ingredients. We just need create a program that we can interact with and we can start testing data for randomness. I have decided to use Command Line Parser Library to parse args for me, it’s awesome. You will notice that I have allowed for binary content checking. Unfortunately when I have designed pattern discovery algorithm I have not designed it to work with the binary data, so to make that work I had to encode data to base64.

   

class Program
{
    class Options
    {
        [Option('f', "file", Required = true, HelpText = "Input file to be processed.")]
        public string InputFile { get; set; }

        [Option('b', "fileisbinary", DefaultValue = false, HelpText = "If file is binary it will be encoded before pattern search.")]
        public bool FileIsBinary { get; set; }

        [Option('s', "showallpatterns", DefaultValue = false, HelpText = "Shows all patterns random and non-random.")]
        public bool ShowAllPatterns { get; set; }

        [Option('v', "verbose", DefaultValue = true,
          HelpText = "Prints all messages to standard output.")]
        public bool Verbose { get; set; }

        [ParserState]
        public IParserState LastParserState { get; set; }

        [HelpOption]
        public string GetUsage()
        {
            return HelpText.AutoBuild(this,
              (HelpText current) => HelpText.DefaultParsingErrorsHandler(this, current));
        }
    }

    static void Main(string[] args)
    {
        var options = new Options();
        if (CommandLine.Parser.Default.ParseArguments(args, options))
        {
            if (options.Verbose)
            {
                Console.WriteLine("file: " + options.InputFile);
                Console.WriteLine("fileisbinary: " + options.FileIsBinary);
                Console.WriteLine("showallpatterns: " + options.ShowAllPatterns);
            }

            try
            {
                if (!File.Exists(options.InputFile))
                {
                    Console.WriteLine("File path doesn't exist, can't do pattern discovery");
                    return;
                }

                string raw = string.Empty;

                if (options.FileIsBinary)
                    raw = System.Convert.ToBase64String(File.ReadAllBytes(options.InputFile));
                else
                    raw = File.ReadAllText(options.InputFile);

                if (String.IsNullOrEmpty(raw))
                {
                    Console.WriteLine("Input File is empty");
                    return;
                }

                Console.WriteLine("Starting to discover patterns");

                Dictionary<string, List<int>> patterns = RepeatingPatternDiscovery.DiscoverPatterns(raw);

                Console.WriteLine($"Discovered {patterns.Count} patterns, testing them for randomness");

                List<KeyValuePair<string, List<int>>> nonRandomPatterns = RandomnessDetection.FindNonRandomPatterns(patterns, raw);

                if (nonRandomPatterns.Count != 0)
                {
                    Console.WriteLine($"Data is not random, {nonRandomPatterns.Count} non random patterns found:");

                    foreach (KeyValuePair<string, List<int>> pattern in nonRandomPatterns)
                    {
                        Console.WriteLine($"Pattern: {pattern.Key}, found: {pattern.Value.Count} times, Indexes: [{String.Join(", ", pattern.Value)}]");
                    }
                }
                else
                {
                    Console.WriteLine("Data is random");
                }

                if (options.ShowAllPatterns)
                {
                    Console.WriteLine("All patterns found:");

                    foreach (KeyValuePair<string, List<int>> pattern in patterns)
                    {
                        Console.WriteLine($"Pattern: {pattern.Key}, found: {pattern.Value.Count} times, Indexes: [{String.Join(", ", pattern.Value)}]");
                    }
                }

            }
            catch (Exception ex)
            {
                Console.WriteLine($"Error: {ex.Message}");
            }
        }
    }


Testing text for randomness

To start, I am going to test text_blogpost.txt and text_pi.txt for randomness. Is π random?
C:\data>randomnesstest -f C:\data\text_pi.txt
file: C:\data\text_pi.txt
fileisbinary: False
showallpatterns: False
Starting to discover patterns
Discovered 228688 patterns, testing them for randomness
Data is random

Apparently π is random. How about one of my blog posts?
C:\data>randomnesstest -f C:\data\text_blogpost.txt
file: C:\data\text_blogpost.txt
fileisbinary: False
showallpatterns: False
Starting to discover patterns
Discovered 559 patterns, testing them for randomness
Data is not random, non random patterns found:
Pattern:
    public Purchase Checkout(Cart cart)
    {
        Purchase purchase = new Purchase()
        {
             Customer = this,
             Products = cart.Products,
             Created = DateTime.Now
        };
        this.purchases.Add(purchase);
        return purchase;
    }
}

, found: 2 times, Indexes: [6718, 8033]
...
My blog post is not random. That’s fantastic news! Sometimes I feel like it is.

Testing pictures for randomness

Random noise picture:
white noise
C:\data>randomnesstest -f C:\data\pic_noise.png -b
file: C:\data\pic_noise.png
fileisbinary: True
showallpatterns: False
Starting to discover patterns
Discovered 7685 patterns, testing them for randomness
Data is random

Van Gogh, Bridge In The Rain picture:
van gogh rain
C:\data>randomnesstest -f C:\data\pic_vangogh.jpg -b
file: C:\data\pic_vangogh.jpg
fileisbinary: True
showallpatterns: False
Starting to discover patterns
Discovered 432021 patterns, testing them for randomness
Data is not random, non random patterns found:
Pattern: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA, found: 2 times, Indexes: [1311, 1312]
Pattern: CgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCg, found: 2 times, Indexes: [3388, 3392]
Pattern: DQ0NDQ0NDQ0NDQ0NDQ0NDQ0NDQ0NDQ0NDQ0NDQ0NDQ0N, found: 2 times, Indexes: [3508, 3512]
Pattern: AAAAAWFlaIAAAAAAAA, found: 2 times, Indexes: [1383, 3319]
Pattern: TWF1cmljZSBUcm9tc, found: 2 times, Indexes: [100, 548]
Pattern: ZGM6Y3JlYXRvcj4gP, found: 2 times, Indexes: [508, 596]
Pattern: EBAQAAAAAAAAAAAA, found: 2 times, Indexes: [3594, 3742]
...

Testing audio for randomness

For a final test I am going to test white noise and building noise for randomness. I have downloaded white noise file from here.
C:\data>randomnesstest -f C:\data\sound_whitenoise.wav -b
file: C:\data\sound_whitenoise.wav
fileisbinary: True
showallpatterns: False
Starting to discover patterns
Discovered 204117 patterns, testing them for randomness
Data is random
C:\data>randomnesstest -f C:\data\sound_buildingnoise.m4a -b
file: C:\data\sound_buildingnoise.m4a
fileisbinary: True
showallpatterns: False
Starting to discover patterns
Discovered 61762 patterns, testing them for randomness
Data is not random, non random patterns found:
Pattern: AwMDAwMDAwMCAwMDAwMDAwMCAwMDAwMDAwMCAwMDAwMDAwMCAwMDAwMDAwMCAwMDAwMDAwMCAwMDAwMDAwM, found: 2 times, Indexes: [262938, 262950]
Pattern: AAAAAAQAAAAAAAAAAAAAAAAAAAAEAAAAAAAAAAAAAAAAAAEAAAAAAAAAAAAAAAAAA, found: 2 times, Indexes: [256227, 256387]
Pattern: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA, found: 2 times, Indexes: [256274, 256275]
Pattern: aGQAAAAA0v3TfNL903wAAKxEAB, found: 2 times, Indexes: [256180, 256468]
Pattern: AAAAAAAAAAQAAAAAAAAAAAA, found: 2 times, Indexes: [256223, 256691]
Pattern: AAACvAAAAqgAAAKoAAAC, found: 2 times, Indexes: [256999, 261399]
Pattern: AAAArwAAAKwAAAClAAAA, found: 2 times, Indexes: [259132, 260604]
Pattern: AAACrAAAArwAAAKwAAAC, found: 2 times, Indexes: [259159, 260599]
...


Conclusion

So, there you go, we have just saved Eleanor Arroway centuries worth of work. Just to be clear, I’ve done this for fun, I am not expecting this to be super precise and to be used in production systems, there are many other much more sophisticated and proven ways that you can use to test for randomness.

Friday, 23 June 2017

Birthday Problem More Than 2 People and Reverse

More Than 2 People Sharing The Same Birthday


There are very few coding examples out there when it comes to calculating the probability of more than 2 people sharing the same birthday. To make this work I have used this Poisson approximation equation, in generalised form this equation looks like this:

n = random people
g = pairs
d = days
r = probability



Here is the source code:
  public static double ProbabilityOfPair(int numOfRandomInstances, int uniquePossibilities, int numOfPairs = 2)
{
    if (numOfPairs < 2)
        throw new Exception("Cannot have less than two pairs");

    if (numOfRandomInstances < 1 || uniquePossibilities < 1)
        throw new Exception("Number of random instances or unique possibilities needs to be greater than 0");

    double bionomial = calcBinomialCoefficient(numOfRandomInstances, numOfPairs);
    double totalPairPossibilities = Math.Pow(uniquePossibilities, numOfPairs - 1);
    return 1 - Math.Exp(-(bionomial / totalPairPossibilities));
}

static double calcBinomialCoefficient(double n, double k)
{
    if (k > n)
        throw new Exception("n must be greater than k");

    double result = 1;
    for(int i = 1; i <= k; i++)
    {
        result *= n - (k - i);
        result /= i;
    }
    return result;
}

Birthday Problem Reverse


Now what do you do if you want to reverse the birthday problem and instead get back number of random people required to reach certain probability? For example, instead of this:
  ProbabilityOfPair(23, 365, 2) // returns 0.5000017521827107
You would like to do something like this:
  FindNumOfRandomInstancesRequired(365, 2, 0.50) // returns 23

To begin with I’ve used ProbabilityOfPair function and just did a linear search. This did not scale very well at all, so I have decided to use binary search algorithm instead. This has worked very well, however it still felt inefficient. So, I have decided to rearrange the above equation, this is what I have ended up with:

n = random people
g = pairs
d = days
r = probability



Here is the source code:
  public static int FindNumOfRandomInstancesRequired(int uniquePossibilities, int numOfPairs, double requiredProbability)
{
    if (requiredProbability < 0.00 || requiredProbability >= 1.00)
        throw new Exception("Required probability must be between 0.00 and 1.00");

    double inverse = -Math.Log(1.0 - requiredProbability);
    double bionomialFact = factorial(numOfPairs);
    double totalPairPossibilities = Math.Pow(uniquePossibilities, numOfPairs - 1);
    double precision = (numOfPairs - 1.0) / 2.0;
    double numOfRandomInstancesRequired = Math.Pow(bionomialFact * totalPairPossibilities * inverse, 1.0 / numOfPairs) + precision;
            
    return (int)Math.Round(numOfRandomInstancesRequired, 0, MidpointRounding.AwayFromZero);
}

static int factorial(int i)
{
    if (i <= 1)
        return 1;
    return i * factorial(i - 1);
}

Full Source Code (licensed under the GNU licence v3.0)

  public class BirthdayProblemPoissonApprox
{
    /// <summary>
    /// Generalised birthday problem probability calculation using poisson approximation.  
    /// </summary>
    /// <example>
    ///  ProbabilityOfPair(23, 365), returns 0.5000017521827107
    ///  ProbabilityOfPair(30, 365, 3), returns 0.030015086543320635
    /// </example>
    /// <param name="numOfRandomInstances">Number of random instances in the group e.g. number of random people in the group</param>
    /// <param name="uniquePossibilities">Number of unique possibilities that random instances can fit into e.g. number of unique days</param>
    /// <param name="numOfPairs">Number of pairs</param>
    /// <see cref="https://en.wikipedia.org/wiki/Birthday_problem"/>
    /// <remarks>To make this scale beyond 2 pairs, I have used equations from here: https://math.stackexchange.com/questions/25876/probability-of-3-people-in-a-room-of-30-having-the-same-birthday </remarks>
    /// <returns>Probability of pair</returns>
    public static double ProbabilityOfPair(int numOfRandomInstances, int uniquePossibilities, int numOfPairs = 2)
    {
        if (numOfPairs < 2)
            throw new Exception("Cannot have less than two pairs");

        if (numOfRandomInstances < 1 || uniquePossibilities < 1)
            throw new Exception("Number of random instances or unique possibilities needs to be greater than 0");

        double bionomial = calcBinomialCoefficient(numOfRandomInstances, numOfPairs);
        double totalPairPossibilities = Math.Pow(uniquePossibilities, numOfPairs - 1);
        return 1 - Math.Exp(-(bionomial / totalPairPossibilities));
    }
    /// <summary>
    /// Generalised birthday problem reversed, find number of random instances required to fit certain probability.
    /// </summary>
    /// <example>
    /// FindNumOfRandomInstancesRequired(365, 2, 0.50), returns 23
    /// FindNumOfRandomInstancesRequired(365, 3, 0.03), returns 30
    /// </example>
    /// <param name="uniquePossibilities">Number of unique possibilities that random instances can fit into e.g. number of unique days</param>
    /// <param name="numOfPairs">Number of pairs</param>
    /// <param name="requiredProbability">Probability that must be reached</param>
    /// <returns>Number of random instances required</returns>
    public static int FindNumOfRandomInstancesRequired(int uniquePossibilities, int numOfPairs, double requiredProbability)
    {
        if (requiredProbability < 0.00 || requiredProbability >= 1.00)
            throw new Exception("Required probability must be between 0.00 and 1.00");

        double inverse = -Math.Log(1.0 - requiredProbability);
        double bionomialFact = factorial(numOfPairs);
        double totalPairPossibilities = Math.Pow(uniquePossibilities, numOfPairs - 1);
        double precision = (numOfPairs - 1.0) / 2.0;
        double numOfRandomInstancesRequired = Math.Pow(bionomialFact * totalPairPossibilities * inverse, 1.0 / numOfPairs) + precision;
            
        return (int)Math.Round(numOfRandomInstancesRequired, 0, MidpointRounding.AwayFromZero);
    }

    /// <summary>
    /// Calculates efficiently number of possible combinations, "n choose k". 
    /// </summary>
    /// <example>For example, suppose you have a deck of 5 cards (n = 5) and you want to know how many different hands of 3 cards you can make (k = 3). If you number the cards 1 through 5, then you will get 10 possible combinations.</example>
    /// <param name="n">Total set of items</param>
    /// <param name="k">Number of items to pick from n</param>
    /// <see cref="http://www.vb-helper.com/howto_calculate_n_choose_k.html"/>
    /// <returns></returns>
    static double calcBinomialCoefficient(double n, double k)
    {
        if (k > n)
            throw new Exception("n must be greater than k");

        double result = 1;
        for(int i = 1; i <= k; i++)
        {
            result *= n - (k - i);
            result /= i;
        }
        return result;
    }

    static int factorial(int i)
    {
        if (i <= 1)
            return 1;
        return i * factorial(i - 1);
    }
}