Sunday, 20 August 2017

ASP.NET MVC Multitenant routing with OWIN

Recently I have been migrating one of my multitenant ASP.NET MVC application's to the OWIN middleware. This has presented me with an opportunity to change my initial implementation. In this blog post I am going to explore a possible solution.

Requirement

Provide a simple to understand, use and maintain multitenant OWIN middleware.

Solution

This solution was inspired by the OpenIDConnect middleware. Here is how it’s going to look like when we are done:
public class Startup
{
    public void Configuration(IAppBuilder app)
    {
      //...
      app.UseMultitenancy(
          //under what conditions multitenancy middleware should be used
          new WhenMiddleware((context) =>
          {
              HttpContextBase httpContext = (HttpContextBase)context.Environment["System.Web.HttpContextBase"];
              RouteData routeData = RouteTable.Routes.GetRouteData(httpContext);
              return Task.FromResult(routeData != null && routeData.DataTokens.ContainsValue("multi"));
          }),
          new MultitenancyNotifications
          {
              TenantNameCouldNotBeFound = context =>
              {
                  //Do what you need to do... redirect, throw exception, etc.
                  throw new HttpException(400, "Tenant name must be provided");
              },
              TenantCouldNotBeResolved = context =>
              {
                  //Do what you need to do... redirect, throw exception, etc.
                  context.Response.Redirect("https://your_url_goes_here");
                  return Task.FromResult(0);
              },
              TenantResolved = (context, tenant) =>
              {
                  //Do what you need to do... update DI object, redirect, etc. 
                  ServiceLocator.Resolve<Tenant>(new {id = tenant.ID, name = tenant.Name});
                  return Task.FromResult(0);
              }
          }
      );
    }
}

Theory

When it comes to multitenancy we need to do 3 things:
  • Conditional Check - Check the context, make sure that conditions are right for the multitenancy execution.
  • Extraction - Extract tenant name from the payload, this payload can be anything, Http Request, Queue Message, etc.
  • Resolution - Resolve tenant settings by using tenant name. These setting properties can include tenant connection string (if you are creating database per tenant), authentication secrets, etc.

Implementation

To make my app work I have decided to use MVC routing with data tokens for tenant discremintation:

RouteConfig.cs
   
public class RouteConfig
{
    public static void RegisterRoutes(RouteCollection routes)
    {
        //...
        routes.MapRoute(
          name: "multi",
          url: "{tenant}/{controller}/{action}/{id}",
          defaults: new { controller = "Login", action = "Index", id = UrlParameter.Optional }
        ).DataTokens.Add("name", "multi"); 
    }
}
You don’t have to use MVC routing with data tokens in your application you can use subdomains, headers, etc.

Conditional Check

OWIN pipeline supports MapWhen, MapWhen creates pipeline branching. I have decided to avoid that unnecessary complexity at this stage. Instead I have chosen to use linear conditional approach, if conditions are correct execute the middleware otherwise just skip it. To make this work I have decided to create my own middleware:

WhenMiddleware.cs
   
public class WhenMiddleware
{
    private Func<IOwinContext, Task<bool>> condition;

    public WhenMiddleware(Func<IOwinContext, Task<Boolean>> condition)
    {
        this.condition = condition;
    }

    public async Task Invoke(IOwinContext context, Func<IOwinContext, Func<Task>, Task> conditionalNext, Func<Task> next)
    {
        if (condition(context).Result)
        {
            await conditionalNext(context, next);
        }
        else
        {
            await next();
        }
    }
}
It will be used like this later on:
   
Func<IOwinContext, Func<Task>, Task> conditionalNext = (context, next) =>
    new SomeMiddleware.Invoke(context, next);

app.Use((context, next) => when.Invoke(context,  conditionalNext, next));

Extraction

There are lots of different payloads that you might want to extract data out of. As such I have decided to introduce this interface:

ITenantNameExtractor.cs
   
public interface ITenantNameExtractor
{
    string GetName(IOwinContext context);
}

In my example I am going to extract tenant name from the MVC route using data tokens, here is the concrete implementation:

RouteDataTokensTenantNameExtractor.cs
   
public class RouteDataTokensTenantNameExtractor : ITenantNameExtractor
{
    public string GetName(IOwinContext context)
    {
        HttpContextBase httpContext = (HttpContextBase) context.Environment["System.Web.HttpContextBase"];
        RouteData routeData = RouteTable.Routes.GetRouteData(httpContext);
        if (routeData != null && routeData.DataTokens.ContainsValue("multi"))
            return routeData.GetRequiredString("tenant");

        return null;
    }
}

If I wanted to extract tenant name from the header then I would have implemented something like this:

HeaderTenantNameExtractor.cs
   
public class HeaderTenantNameExtractor : ITenantNameExtractor
{
    public string GetName(IOwinContext context)
    {
        string tenantName = context.Request.Headers["tenantname"];
        //...
    }
}

Resolution

Resolution is where you call some service or repository and you obtain all data that is required to hydrate tenant object. You might get a connection string, authentication client id and secret etc.

Tenant.cs
   
public class Tenant
{
    public  Guid ID { get; private set; }
    public string Name { get; private set; }
    //.. add your properties

    public Tenant(Guid id, string name, //..add your properties)
    {
        this.ID = id;
        this.Name = name;
        //.. add your properties 
    }
}

ITenantResolver.cs
   
public interface ITenantResolver
{
    Tenant GetTenant(string tenantName);
}

TenantResolver.cs
   
public class TenantResolver : ITenantResolver
{
    private readonly ITenantService tenantService;

    public TenantResolver(ITenantService tenantService)
    {
        this.tenantService = tenantService;
    }

    public Tenant GetTenant(string tenantName)
    {
        TenantDto tenantDto = this.tenantService.Get(tenantName).Value;
        if (tenantDto != null)
        {
            return new Tenant(tenantDto.Id, tenantDto.Name, ...);
        }
        return null;
    }
}

It’s highly likely that you will want to cache tenant resolution, to do this you can use decorator pattern and cache your tenant data:

CacheTenantResolver.cs
   
public class CacheTenantResolver : ITenantResolver
{
    private TenantResolver tenantResolver;

    public CacheTenantResolver(TenantResolver tenantResolver)
    {
        this.tenantResolver = tenantResolver;
    }

    public Tenant GetTenant(string tenantName)
    {
        string cacheKey = $"tenantName:{tenantName}";

        Tenant tenant = (Tenant)MemoryCache.Default[cacheKey];

        if (tenant != null)
            return tenant;

        tenant = this.tenantResolver.GetTenant(tenantName);
        if (tenant == null)
            return null;

        MemoryCache.Default.Set(cacheKey, tenant, new CacheItemPolicy
        {
            AbsoluteExpiration = DateTimeOffset.Now.AddMinutes(10)
        });
    
        return tenant;
    }
}

Bringing it all together

Now that we have all key components we need to bring it all together.

MultitenancyMiddleware.cs
   
public class MultitenancyMiddleware
{
    ITenantNameExtractor tenantNameExtractor;
    ITenantResolver tenantResolver;
    MultitenancyNotifications notifications;

    public MultitenancyMiddleware(ITenantNameExtractor tenantNameExtractor, 
        ITenantResolver tenantResolver, MultitenancyNotifications notifications)
    {
        this.tenantNameExtractor = tenantNameExtractor;
        this.tenantResolver = tenantResolver;
        this.notifications = notifications;
    }

    public async Task Invoke(IOwinContext context, Func<Task> next)
    {
        string name = this.tenantNameExtractor.GetName(context);

        if (string.IsNullOrEmpty(name))
        {
            await this.notifications.TenantNameCouldNotBeFound(context);
        }
        else
        {
            Tenant tenant = this.tenantResolver.GetTenant(name);
            if (tenant == null)
            {
                await this.notifications.TenantCouldNotBeResolved(context);
            }
            else
            {
                await this.notifications.TenantResolved(context, tenant);
                await next();
            }
        }

    }
}

MultitenancyNotifications.cs
   
public class MultitenancyNotifications
{
    public Func<IOwinContext, Task> TenantCouldNotBeResolved { get; set; }
    public Func<IOwinContext, Task> TenantNameCouldNotBeFound { get; set; }
    public Func<IOwinContext, Tenant, Task> TenantResolved { get; set; }

    public MultitenancyNotifications()
    {
        this.TenantCouldNotBeResolved = context => Task.FromResult(0);
        this.TenantNameCouldNotBeFound = context => Task.FromResult(0);
        this.TenantResolved = (context, tenant) => Task.FromResult(0);
    }
}

AppBuilderExtensions.cs
   
public static IAppBuilder UseMultitenancy(this IAppBuilder app, WhenMiddleware whenMiddleware, MultitenancyNotifications notifications)
{
    Func<IOwinContext, Func<Task>, Task> conditionalNext = (context, next) =>
        ServiceLocator.Resolve<MultitenancyMiddleware>(new {notifications = notifications})
            .Invoke(context, next);

    app.Use((context, next) => whenMiddleware.Invoke(context,  conditionalNext, next));

    return app;
}
Service locator is using Castle Windsor DI. Castle Windsor allows you to resolve services with anonymous types:
 
((IWindsorContainer)container).Resolve<T>(argumentsAsAnonymousType);

*Note: Code in this article is not production ready and is used for prototyping purposes only. If you have suggestions or feedback please do comment.

Sunday, 2 July 2017

Testing Data for Randomness

Movie "Contact 1997"
When I was a young man I watched “Contact 1997” movie, I don’t know why but it really stuck with me. I have recently re-watched it. There were few scenes where Eleanor Arroway listened to space noise on certain frequency, eventually she heard something. It was awesome. Now, in reality I imagine what she was doing would not work at all. There are many dishes recording data on different frequencies, there would be simply too much data for one person to go through. For fun, I have decided to write a program that can detect non-randomness automatically, this way Eleanor Arroway would not need to waste hours listening to space noise.

Non Randomness Detection

To make non randomness detection work I will use pattern discovery algorithm and birthday problem calculation that I have blogged about earlier. Pattern discovery algorithm will discover patterns in the provided data. Birthday problem calculation will be given three parameters, size of the data, all possible pattern permutations given the number of unique characters and length in identified pattern, and number of unique pattern matches. If probability of a given pattern is not in the range then it’s highly likely that pattern is not random.

   

public class RandomnessDetection
{
    public static List<KeyValuePair<string, List<int>>> FindNonRandomPatterns(Dictionary<string, List<int>> patterns, string raw)
    {
        if (patterns == null || patterns.Count == 0)
            throw new Exception("No patterns found, cannot test.");

        int uniqueChars = raw.Distinct().Count();
        int rawLength = raw.Length;

        if (uniqueChars == 1)
            throw new Exception("Need to have more than one character");

        if (rawLength < 29)
            throw new Exception("String needs to be at least 30 chars long");

        List<KeyValuePair<string, List<int>>> nonRandomPatterns = new List<KeyValuePair<string, List<int>>>();

        foreach (KeyValuePair<string, List<int>> pattern in patterns)
        {
            double possibleCombinations = Math.Pow(uniqueChars, pattern.Key.Length);
            double prob = BirthdayProblemPoissonApprox.ProbabilityOfPair(rawLength - 2, possibleCombinations, pattern.Value.Count);

            if (Math.Truncate(prob * 100) / 100 <= 0.01)
                nonRandomPatterns.Add(pattern);
        }

        return nonRandomPatterns;
    }
}


Command-line interface

Now we have all the key ingredients. We just need create a program that we can interact with and we can start testing data for randomness. I have decided to use Command Line Parser Library to parse args for me, it’s awesome. You will notice that I have allowed for binary content checking. Unfortunately when I have designed pattern discovery algorithm I have not designed it to work with the binary data, so to make that work I had to encode data to base64.

   

class Program
{
    class Options
    {
        [Option('f', "file", Required = true, HelpText = "Input file to be processed.")]
        public string InputFile { get; set; }

        [Option('b', "fileisbinary", DefaultValue = false, HelpText = "If file is binary it will be encoded before pattern search.")]
        public bool FileIsBinary { get; set; }

        [Option('s', "showallpatterns", DefaultValue = false, HelpText = "Shows all patterns random and non-random.")]
        public bool ShowAllPatterns { get; set; }

        [Option('v', "verbose", DefaultValue = true,
          HelpText = "Prints all messages to standard output.")]
        public bool Verbose { get; set; }

        [ParserState]
        public IParserState LastParserState { get; set; }

        [HelpOption]
        public string GetUsage()
        {
            return HelpText.AutoBuild(this,
              (HelpText current) => HelpText.DefaultParsingErrorsHandler(this, current));
        }
    }

    static void Main(string[] args)
    {
        var options = new Options();
        if (CommandLine.Parser.Default.ParseArguments(args, options))
        {
            if (options.Verbose)
            {
                Console.WriteLine("file: " + options.InputFile);
                Console.WriteLine("fileisbinary: " + options.FileIsBinary);
                Console.WriteLine("showallpatterns: " + options.ShowAllPatterns);
            }

            try
            {
                if (!File.Exists(options.InputFile))
                {
                    Console.WriteLine("File path doesn't exist, can't do pattern discovery");
                    return;
                }

                string raw = string.Empty;

                if (options.FileIsBinary)
                    raw = System.Convert.ToBase64String(File.ReadAllBytes(options.InputFile));
                else
                    raw = File.ReadAllText(options.InputFile);

                if (String.IsNullOrEmpty(raw))
                {
                    Console.WriteLine("Input File is empty");
                    return;
                }

                Console.WriteLine("Starting to discover patterns");

                Dictionary<string, List<int>> patterns = RepeatingPatternDiscovery.DiscoverPatterns(raw);

                Console.WriteLine($"Discovered {patterns.Count} patterns, testing them for randomness");

                List<KeyValuePair<string, List<int>>> nonRandomPatterns = RandomnessDetection.FindNonRandomPatterns(patterns, raw);

                if (nonRandomPatterns.Count != 0)
                {
                    Console.WriteLine($"Data is not random, {nonRandomPatterns.Count} non random patterns found:");

                    foreach (KeyValuePair<string, List<int>> pattern in nonRandomPatterns)
                    {
                        Console.WriteLine($"Pattern: {pattern.Key}, found: {pattern.Value.Count} times, Indexes: [{String.Join(", ", pattern.Value)}]");
                    }
                }
                else
                {
                    Console.WriteLine("Data is random");
                }

                if (options.ShowAllPatterns)
                {
                    Console.WriteLine("All patterns found:");

                    foreach (KeyValuePair<string, List<int>> pattern in patterns)
                    {
                        Console.WriteLine($"Pattern: {pattern.Key}, found: {pattern.Value.Count} times, Indexes: [{String.Join(", ", pattern.Value)}]");
                    }
                }

            }
            catch (Exception ex)
            {
                Console.WriteLine($"Error: {ex.Message}");
            }
        }
    }


Testing text for randomness

To start, I am going to test text_blogpost.txt and text_pi.txt for randomness. Is π random?
C:\data>randomnesstest -f C:\data\text_pi.txt
file: C:\data\text_pi.txt
fileisbinary: False
showallpatterns: False
Starting to discover patterns
Discovered 228688 patterns, testing them for randomness
Data is random

Apparently π is random. How about one of my blog posts?
C:\data>randomnesstest -f C:\data\text_blogpost.txt
file: C:\data\text_blogpost.txt
fileisbinary: False
showallpatterns: False
Starting to discover patterns
Discovered 559 patterns, testing them for randomness
Data is not random, non random patterns found:
Pattern:
    public Purchase Checkout(Cart cart)
    {
        Purchase purchase = new Purchase()
        {
             Customer = this,
             Products = cart.Products,
             Created = DateTime.Now
        };
        this.purchases.Add(purchase);
        return purchase;
    }
}

, found: 2 times, Indexes: [6718, 8033]
...
My blog post is not random. That’s fantastic news! Sometimes I feel like it is.

Testing pictures for randomness

Random noise picture:
white noise
C:\data>randomnesstest -f C:\data\pic_noise.png -b
file: C:\data\pic_noise.png
fileisbinary: True
showallpatterns: False
Starting to discover patterns
Discovered 7685 patterns, testing them for randomness
Data is random

Van Gogh, Bridge In The Rain picture:
van gogh rain
C:\data>randomnesstest -f C:\data\pic_vangogh.jpg -b
file: C:\data\pic_vangogh.jpg
fileisbinary: True
showallpatterns: False
Starting to discover patterns
Discovered 432021 patterns, testing them for randomness
Data is not random, non random patterns found:
Pattern: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA, found: 2 times, Indexes: [1311, 1312]
Pattern: CgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCg, found: 2 times, Indexes: [3388, 3392]
Pattern: DQ0NDQ0NDQ0NDQ0NDQ0NDQ0NDQ0NDQ0NDQ0NDQ0NDQ0N, found: 2 times, Indexes: [3508, 3512]
Pattern: AAAAAWFlaIAAAAAAAA, found: 2 times, Indexes: [1383, 3319]
Pattern: TWF1cmljZSBUcm9tc, found: 2 times, Indexes: [100, 548]
Pattern: ZGM6Y3JlYXRvcj4gP, found: 2 times, Indexes: [508, 596]
Pattern: EBAQAAAAAAAAAAAA, found: 2 times, Indexes: [3594, 3742]
...

Testing audio for randomness

For a final test I am going to test white noise and building noise for randomness. I have downloaded white noise file from here.
C:\data>randomnesstest -f C:\data\sound_whitenoise.wav -b
file: C:\data\sound_whitenoise.wav
fileisbinary: True
showallpatterns: False
Starting to discover patterns
Discovered 204117 patterns, testing them for randomness
Data is random
C:\data>randomnesstest -f C:\data\sound_buildingnoise.m4a -b
file: C:\data\sound_buildingnoise.m4a
fileisbinary: True
showallpatterns: False
Starting to discover patterns
Discovered 61762 patterns, testing them for randomness
Data is not random, non random patterns found:
Pattern: AwMDAwMDAwMCAwMDAwMDAwMCAwMDAwMDAwMCAwMDAwMDAwMCAwMDAwMDAwMCAwMDAwMDAwMCAwMDAwMDAwM, found: 2 times, Indexes: [262938, 262950]
Pattern: AAAAAAQAAAAAAAAAAAAAAAAAAAAEAAAAAAAAAAAAAAAAAAEAAAAAAAAAAAAAAAAAA, found: 2 times, Indexes: [256227, 256387]
Pattern: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA, found: 2 times, Indexes: [256274, 256275]
Pattern: aGQAAAAA0v3TfNL903wAAKxEAB, found: 2 times, Indexes: [256180, 256468]
Pattern: AAAAAAAAAAQAAAAAAAAAAAA, found: 2 times, Indexes: [256223, 256691]
Pattern: AAACvAAAAqgAAAKoAAAC, found: 2 times, Indexes: [256999, 261399]
Pattern: AAAArwAAAKwAAAClAAAA, found: 2 times, Indexes: [259132, 260604]
Pattern: AAACrAAAArwAAAKwAAAC, found: 2 times, Indexes: [259159, 260599]
...


Conclusion

So, there you go, we have just saved Eleanor Arroway centuries worth of work. Just to be clear, I’ve done this for fun, I am not expecting this to be super precise and to be used in production systems, there are many other much more sophisticated and proven ways that you can use to test for randomness.

Friday, 23 June 2017

Birthday Problem More Than 2 People and Reverse

More Than 2 People Sharing The Same Birthday


There are very few coding examples out there when it comes to calculating the probability of more than 2 people sharing the same birthday. To make this work I have used this Poisson approximation equation, in generalised form this equation looks like this:

n = random people
g = pairs
d = days
r = probability



Here is the source code:
  public static double ProbabilityOfPair(int numOfRandomInstances, int uniquePossibilities, int numOfPairs = 2)
{
    if (numOfPairs < 2)
        throw new Exception("Cannot have less than two pairs");

    if (numOfRandomInstances < 1 || uniquePossibilities < 1)
        throw new Exception("Number of random instances or unique possibilities needs to be greater than 0");

    double bionomial = calcBinomialCoefficient(numOfRandomInstances, numOfPairs);
    double totalPairPossibilities = Math.Pow(uniquePossibilities, numOfPairs - 1);
    return 1 - Math.Exp(-(bionomial / totalPairPossibilities));
}

static double calcBinomialCoefficient(double n, double k)
{
    if (k > n)
        throw new Exception("n must be greater than k");

    double result = 1;
    for(int i = 1; i <= k; i++)
    {
        result *= n - (k - i);
        result /= i;
    }
    return result;
}

Birthday Problem Reverse


Now what do you do if you want to reverse the birthday problem and instead get back number of random people required to reach certain probability? For example, instead of this:
  ProbabilityOfPair(23, 365, 2) // returns 0.5000017521827107
You would like to do something like this:
  FindNumOfRandomInstancesRequired(365, 2, 0.50) // returns 23

To begin with I’ve used ProbabilityOfPair function and just did a linear search. This did not scale very well at all, so I have decided to use binary search algorithm instead. This has worked very well, however it still felt inefficient. So, I have decided to rearrange the above equation, this is what I have ended up with:

n = random people
g = pairs
d = days
r = probability



Here is the source code:
  public static int FindNumOfRandomInstancesRequired(int uniquePossibilities, int numOfPairs, double requiredProbability)
{
    if (requiredProbability < 0.00 || requiredProbability >= 1.00)
        throw new Exception("Required probability must be between 0.00 and 1.00");

    double inverse = -Math.Log(1.0 - requiredProbability);
    double bionomialFact = factorial(numOfPairs);
    double totalPairPossibilities = Math.Pow(uniquePossibilities, numOfPairs - 1);
    double precision = (numOfPairs - 1.0) / 2.0;
    double numOfRandomInstancesRequired = Math.Pow(bionomialFact * totalPairPossibilities * inverse, 1.0 / numOfPairs) + precision;
            
    return (int)Math.Round(numOfRandomInstancesRequired, 0, MidpointRounding.AwayFromZero);
}

static int factorial(int i)
{
    if (i <= 1)
        return 1;
    return i * factorial(i - 1);
}

Full Source Code (licensed under the GNU licence v3.0)

  public class BirthdayProblemPoissonApprox
{
    /// <summary>
    /// Generalised birthday problem probability calculation using poisson approximation.  
    /// </summary>
    /// <example>
    ///  ProbabilityOfPair(23, 365), returns 0.5000017521827107
    ///  ProbabilityOfPair(30, 365, 3), returns 0.030015086543320635
    /// </example>
    /// <param name="numOfRandomInstances">Number of random instances in the group e.g. number of random people in the group</param>
    /// <param name="uniquePossibilities">Number of unique possibilities that random instances can fit into e.g. number of unique days</param>
    /// <param name="numOfPairs">Number of pairs</param>
    /// <see cref="https://en.wikipedia.org/wiki/Birthday_problem"/>
    /// <remarks>To make this scale beyond 2 pairs, I have used equations from here: https://math.stackexchange.com/questions/25876/probability-of-3-people-in-a-room-of-30-having-the-same-birthday </remarks>
    /// <returns>Probability of pair</returns>
    public static double ProbabilityOfPair(int numOfRandomInstances, int uniquePossibilities, int numOfPairs = 2)
    {
        if (numOfPairs < 2)
            throw new Exception("Cannot have less than two pairs");

        if (numOfRandomInstances < 1 || uniquePossibilities < 1)
            throw new Exception("Number of random instances or unique possibilities needs to be greater than 0");

        double bionomial = calcBinomialCoefficient(numOfRandomInstances, numOfPairs);
        double totalPairPossibilities = Math.Pow(uniquePossibilities, numOfPairs - 1);
        return 1 - Math.Exp(-(bionomial / totalPairPossibilities));
    }
    /// <summary>
    /// Generalised birthday problem reversed, find number of random instances required to fit certain probability.
    /// </summary>
    /// <example>
    /// FindNumOfRandomInstancesRequired(365, 2, 0.50), returns 23
    /// FindNumOfRandomInstancesRequired(365, 3, 0.03), returns 30
    /// </example>
    /// <param name="uniquePossibilities">Number of unique possibilities that random instances can fit into e.g. number of unique days</param>
    /// <param name="numOfPairs">Number of pairs</param>
    /// <param name="requiredProbability">Probability that must be reached</param>
    /// <returns>Number of random instances required</returns>
    public static int FindNumOfRandomInstancesRequired(int uniquePossibilities, int numOfPairs, double requiredProbability)
    {
        if (requiredProbability < 0.00 || requiredProbability >= 1.00)
            throw new Exception("Required probability must be between 0.00 and 1.00");

        double inverse = -Math.Log(1.0 - requiredProbability);
        double bionomialFact = factorial(numOfPairs);
        double totalPairPossibilities = Math.Pow(uniquePossibilities, numOfPairs - 1);
        double precision = (numOfPairs - 1.0) / 2.0;
        double numOfRandomInstancesRequired = Math.Pow(bionomialFact * totalPairPossibilities * inverse, 1.0 / numOfPairs) + precision;
            
        return (int)Math.Round(numOfRandomInstancesRequired, 0, MidpointRounding.AwayFromZero);
    }

    /// <summary>
    /// Calculates efficiently number of possible combinations, "n choose k". 
    /// </summary>
    /// <example>For example, suppose you have a deck of 5 cards (n = 5) and you want to know how many different hands of 3 cards you can make (k = 3). If you number the cards 1 through 5, then you will get 10 possible combinations.</example>
    /// <param name="n">Total set of items</param>
    /// <param name="k">Number of items to pick from n</param>
    /// <see cref="http://www.vb-helper.com/howto_calculate_n_choose_k.html"/>
    /// <returns></returns>
    static double calcBinomialCoefficient(double n, double k)
    {
        if (k > n)
            throw new Exception("n must be greater than k");

        double result = 1;
        for(int i = 1; i <= k; i++)
        {
            result *= n - (k - i);
            result /= i;
        }
        return result;
    }

    static int factorial(int i)
    {
        if (i <= 1)
            return 1;
        return i * factorial(i - 1);
    }
}

Sunday, 11 June 2017

Repeating String Pattern Search

In this post I am going to introduce basic repeating string pattern search algorithm that I have written for fun. This algorithm’s average running time is Θ(N), worst running time is O(N^2) (see analysis below). This algorithm takes ~1 minute to search 10 million digits of Pi.

Example usage

  
string raw = "091283091283091831093810933";
Dictionary<string, List<int>> patterns = DiscoverPatterns(raw);
foreach (KeyValuePair<string, List<int>> pattern in patterns)
{
    Console.WriteLine($"Pattern: {pattern.Key}, Indexes: {String.Join(", ", pattern.Value)}");
}

Example output

Pattern: 091283091, Indexes: 0, 6
Pattern: 1093, Indexes: 17, 22

Source Code (licensed under the GNU licence v3.0)

  
public static Dictionary<string, List<int>> DiscoverPatterns(string raw, int patternSizeMin = 2)
{
    if (patternSizeMin < 1)
        throw new Exception("Minimum pattern size should not be less than 1.");

    Dictionary<string, HashSet<int>> initialPatternSearch = new Dictionary<string, HashSet<int>>();
    Dictionary<string, HashSet<int>> discoveredPatterns = new Dictionary<string, HashSet<int>>();

    int numberOfInitialSearches = raw.Length - (patternSizeMin - 1);
    for (int i = 0; i < numberOfInitialSearches; i++)
    {
        string pattern = raw.Substring(i, patternSizeMin);

        HashSet<int> patternIndexes;
        if (initialPatternSearch.TryGetValue(pattern, out patternIndexes))
        {
            discoveredPatterns[pattern] = patternIndexes;
        }
        else
        {
            patternIndexes = new HashSet<int>();
            initialPatternSearch.Add(pattern, patternIndexes);
        }

        patternIndexes.Add(i);
    }

    initialPatternSearch.Clear();

    Dictionary<string, HashSet<int>> basePatterns = discoveredPatterns;
    do
    {
        Dictionary<string, HashSet<int>> superPatterns = discoverSuperPatterns(raw, ++patternSizeMin, basePatterns);

        foreach (KeyValuePair<string, HashSet<int>> superPattern in superPatterns)
        {
            string basePattern = superPattern.Key.Substring(0, patternSizeMin - 1);
            discoveredPatterns.Remove(basePattern);

            string baseInnerPattern = superPattern.Key.Substring(1, patternSizeMin - 1);
            discoveredPatterns.Remove(baseInnerPattern);

            discoveredPatterns.Add(superPattern.Key, superPattern.Value);
        }

        basePatterns = superPatterns;
    } while (basePatterns.Count != 0);

    return discoveredPatterns.OrderByDescending(pattern => pattern.Key.Length).ToDictionary(pattern => pattern.Key, x => x.Value.ToList());
}

private static Dictionary<string, HashSet<int>> discoverSuperPatterns(string raw, int patternSize, Dictionary<string, HashSet<int>> basePatterns)
{
    Dictionary<string, HashSet<int>> initialSuperPatternSearch = new Dictionary<string, HashSet<int>>();
    Dictionary<string, HashSet<int>> discoveredSuperPatterns = new Dictionary<string, HashSet<int>>();

    foreach (KeyValuePair<string, HashSet<int>> basePattern in basePatterns)
    {
        foreach (int patternIndex in basePattern.Value)
        {
            if (raw.Length >= (patternIndex + patternSize))
            {
                string superPattern = raw.Substring(patternIndex, patternSize);

                HashSet<int> superPatternIndexes;
                if (initialSuperPatternSearch.TryGetValue(superPattern, out superPatternIndexes))
                {
                    discoveredSuperPatterns[superPattern] = superPatternIndexes;
                }
                else
                {
                    superPatternIndexes = new HashSet<int>();
                    initialSuperPatternSearch.Add(superPattern, superPatternIndexes);
                }

                superPatternIndexes.Add(patternIndex);
            }
        }
    }

    return discoveredSuperPatterns;
}

Analysis


Worst Case

If you provide input such as 010101…01 then running time will be ((N(N-1))/2)-4, this gives us a triangular number. This is because this input is just one big continuous pattern and it’s discovering patterns one character at the time i.e. (N-1)+(N-2)+(N-3)…+(N-(N-2)). If we drop all terms that grow slowly we end up with O(N^2).

Average Case

If input is not a big continuous pattern but has many small sub patterns then running time will be X*N. If we drop all terms that grow slowly we end up with average running time Θ(N).

Thursday, 9 March 2017

Cloud Architecture & Engineering, B2B Revolution and B2C Evolution


This year I will be posting series of articles about Cloud architecture, it will be my attempt to make multitenancy, cloud and SaaS less theoretical. Articles that have been / being written:

Few years ago I was having a conversation with one of my friends about cloud, SaaS and multitenancy architecture. I was telling him how different it was to normal “hosted” app architecture. That we can setup new business tenant with custom url in seconds. He has said, well, instant tenant provisioning and custom URLs are not new. So what’s so special about it? Blogger and WordPress have been doing it for years. At that point in time I have struggled to answer his question, there was too much to say, ideas in my head were too raw. In this article I am going to explain why cloud, SaaS and multitenancy is revolutionary.



Before we start, let’s get few definitions out of the way:

Model
Description
Example
B2B, Business Applications Business sells to businesses Oracle, Visual Studio Team Services, Sage & Salesforce
B2C, Consumer Applications Business sells to consumers Spotify, Netflix & Blogger.

Your solution’s technical requirements will differ a lot depending on what business model you follow, we will be compare typical B2C and B2B requirements shortly.

Term
Somewhat My Definition
Software as a Service (SaaS) This is where you don’t care about software versioning, upgrades, deployments, where it sits, etc. You just pay for it and use it. If you are using Spotify, you are using SaaS.
Cloud Before cloud we had hosting companies, where you could rent a VM and host your site. Now we have Cloud, which is the same thing but it’s just much smarter and comes with lots of tools. Companies such as Microsoft offer prepackaged services that seamlessly scale your apps horizontally and all you have to do is just upload your binaries via FTP to a folder.
Multitenancy Traditionally each company would get their own “installation”, with multitenancy there is only one installation and everyone is seamlessly using it.

These concepts are very important as they will feed of each other, as I will illustrate this later on.
People use cloud, multitenancy and SaaS as synonyms and this is a mistake (I make this mistake as well).

B2C, Consumer Applications


When it comes to selling directly to customers things are easier, there are a lot less hoops to jump through. Here are some technical requirements that you might need to follow when you build B2C app:

Requirement Description
Availability Consumer facing sites, need to be highly available, if they are not, this can cause reputation damage and loss in revenue.
Authentication Consumers need to be able to login using some form of login, Google, Facebook, etc.
PCI DSS When customers make payments it’s important that card data is used & stored securely.
Recovery Point & Time Objectives Similiar to availability it’s critical that if data was corrupted or upgrade has gone wrong that app can recover quickly with as much data as possible.


B2C architecture before cloud:


B2C architecture after cloud:



This list is not complete, however the point is that typical B2C apps don’t have lots of technical requirements enforced upon them from consumers. This is why B2C service is so much cheaper, how much do you pay for Netflix and Spotify?


B2B, Business Applications


Here are some technical requirements that businesses often enforce on to other businesses:

Requirement Description
Corporate Identity Integration (Single Sign On Authentication) Medium/Large size companies have their own identity apps such as AzureAD, they don’t want to use your built in authentication mechanism if they can enable Single Sign On. This is where Federated identity comes in.
Audit They would like assurance and full trail of what’s happening inside the system, so they will ask for audit.
Data restore & Backup If company makes a mistake in their environment they will want to restore their data back to how it was.
Integration Often they have their own systems and they will want to integrate their existing applications with your application.
Security clearance Depending on the industry your business customers might want your ops staff to be security cleared, if this is not possible then sensitive data needs to be hidden or encrypted.
Escrow Businesses need to ensure businesses continuity, to do this they need to be able keep going even your business will go out of business.
ISO 27001 Key management, Antivirus, Firewalls, Segregation of duties, Pen testing, etc.
Reporting Access to data to be able to report on it.
Customisation & Configuration Data retention, authorisation, workflow, etc.
Availability Business sites don’t necessarily need to be as highly available as in a lot of industries people work 9am-5pm.
PCI DSS When businesses make payments it’s important that card data is used & stored securely.
Recovery Point & Time It’s critical that if data was corrupted or upgrade has gone wrong that app can recover quickly, business finance or patient data can’t be lost, huge amount of productivity can be lost if this not designed correctly.
Support Be able to troubleshoot issues quickly.
On-prem installation Company is able to install your software in their own environment.

This list is not complete, however if you compare B2B to B2C it’s easy to see that there is a lot more to it. So when companies charge other companies a lot more for their services we can appreciate why this happens.


B2B architecture before cloud and multitenancy:




B2B architecture after cloud and multitenancy:


So where is the revolution?


Consumer facing sites were doing SaaS with multitenancy for years. Websites like YouTube and Ebay have been scaling their solutions for a long time. Cloud has just made it easier for them to scale and in turn it has reduced their hosting cost. Business applications were doing SaaS for a long time, however multitenancy and cloud was just not a thing. Most service providers painfully installed software for each customer manually, then, they upgraded each customer one at the time and they monitored each customer one at the time. You get the picture. Customer provisioning took weeks and maybe months. Multitenancy offers something truly awesome, it removes slow per customer installation and upgrade problem and in turn it dramatically speeds up the software delivery and customer onboarding process. Cloud removes the infrastructure maintenance and provisioning burden. Together they simply transform business in a remarkable way. B2B apps can finally compete with B2C apps in terms of feature delivery speed and cost.

Conclusion


Let’s come back to the conversation with my friend. He thought that nothing was new here because he was thinking of B2C evolution. While I was thinking of B2B where multitenancy and cloud is transforming businesses and revolutionising business applications.

Sunday, 12 February 2017

Azure App Services Web App Antivirus - Concept

Introduction


Good web application antivirus solution needs to meet the following acceptance criteria:
  • Real-time inbound and outbound file scanning
  • Low latency, sub second performance
  • Transparent, no need to make changes to your web app
  • Requires no maintenance and is easy to setup

If you are using PaaS service such as Azure Web Apps your antivirus options are:

Option Limitation Installation Effort Maintenance Effort Cost Meets criteria
Firewall with ICAP integration Expense and ongoing maintenance High Medium High Yes
Send files to the antivirus API from your app Extra unnecessary development work Medium/High Medium Low No
Background service that scans your files Not real time, high possibility that viruses will be uploaded and downloaded Medium Low Low No
Reverse proxy binary content forwarding Scans only uploads Medium Low Low No
IIS Filter (this solution) PoC, not production ready Medium Low Low Yes

In depth exploration of different antivirus options is beyond the scope this article.


IIS Filter Solution, inspired by ModSecurity





This article will only cover the interesting parts of the solution, if you have questions please do comment.


AVFilter, File Upload/Download Filter


Filtering uploads and downloads is relatively straight forward thanks to IIS http modules, read more about them here.


Filtering Uploads


Here is an example of how you can detect file upload inside the http module:
 private void Context_PreRequestHandlerExecute(object sender, EventArgs e)
  {
      HttpApplication httpApplication = (HttpApplication)sender;
      HttpRequest httpRequest = httpApplication.Context.Request;
      
      if (httpRequest.RequestType == WebRequestMethods.Http.Post)
      {
          if (httpRequest.Files.Count != 0)
          {
              //Check files
          }
      }
  }
Take a look at the full implementation here.


Filtering Downloads


When it comes to intercepting downloads it’s not as simple. Unfortunately outgoing traffic is written to the System.Web.HttpResponseStreamFilterSink and this stream is write only, so you can’t read it. So the only option that I and Google can think of is to proxy the outgoing stream. To make this happen I have created proxy class that changes write only steam to write/read stream, I have conveniently called it ReadWriteProxyStream, take a look at the implementation here.

Now that we have ReadWriteProxyStream class, we need to hook it up response filter, this is done when web request is first received:
    private void Context_PreRequestHandlerExecute(object sender, EventArgs e)
    {
        HttpApplication httpApplication = (HttpApplication)sender;
        
        httpApplication.Context.Response.Filter = new ReadWriteProxyStream(httpApplication.Context.Response.Filter);
    }
Take a look at the full implementation here.

Before web response is returned back to user we need to check it, we can do this by reading the stream:
    private void Context_PreSendRequestContent(object sender, EventArgs e)
    {
        HttpApplication httpApp = (HttpApplication)sender;
        HttpResponse httpResponse = httpApp.Context.Response;
        
        if(httpResponse.ContentType == "application/octet-stream")
        {
            ReadWriteProxyStream filter = httpResponse.Filter as ReadWriteProxyStream;

            if (filter != null)
            {
                byte[] data = new byte[filter.Length];
                filter.Position = 0;
                filter.Read(data, 0, data.Length);

                //Check files
            }
        }
    }
Take a look at the full implementation here.


AVFileReceiver, Virus Scanning


For this proof of concept I have used Symantec virus scanner which comes with CLI, and to test it I have used EICAR test virus. EICAR test virus is harmless, it can’t actually do anything to your machine.

Symantec CLI will scan the specified file synchronously, so if the file is no longer there you know that it was not safe:
  using (Process process = Process.Start(
        "C:\\Program Files (x86)\\Symantec\\Symantec Endpoint Protection\\DoScan.exe",
        "/ScanFile " + filePath))
    {
        process.WaitForExit();
    }

    if (!File.Exists(filePath))
        throw new Exception("File was not safe!")
Take a look at the full implementation here.



Final Thoughts


If you like this concept then please do share and rate this blog post. If this post will generate enough interest I will productionise this concept. However, if you just can’t wait, then here is what you need to do to make this production ready:
  • Secure communication between AVFilter and AVFileReceiver
  • Secure AVFileReceiver access (authentication, network security group, etc)
  • Reduce VM maintenance through automation, use PowerShell files provided and place Cloud Service inside availability and upgrade group
  • Reduce memory consumption by intercepting binary responses only
  • Make debugging easier by logging exceptions
  • Improve performance by forwarding binary content using MTOM encoding to the AVFileReceiver and remove AVFilter temp storage
  • Improve scalability and responsiveness by using Async in the AVFileReceiver and AVFilter