Sunday, 3 December 2017

ASP.NET MVC Multitenancy, Part 3 - NHibernate Tenant Data Filtering and Shared Database

In part 1 we have established tenant context. In part 2 we have configured authentication. Now we will be getting tenant data out of our RDBMS. Before we start storing and retrieving data it's important that you figure out what kind of multitenancy data segregation model you are going to go for. This is an important point and it really depends on your business requirements. To help you choose check out this Microsoft article: Multi-tenant SaaS database tenancy patterns.

My sample app is called “Stop The Line”, it is very basic. It holds very little data, and I have no idea if it's going to take off or not. I have decided to go with "Multi-tenant app with a single multi-tenant database" pattern. Put simply, this means that all my tenants are going to share the same database and I am going to use foreign key to discriminate between them. This is one the cheapest and fastest ways to implement tenancy.

When it comes to development what I don't want to do is to specify tenant id in each query and I don't want to manually specify tenant id when I create and update my domain objects. Remember the onion architecture, as far as I am concerned tenancy is an infrastructure concern and it should stay out of my domain. After few minutes of searching I have found this great blog post that meets these requirements: Bolt-on Multi-Tenancy in ASP.NET MVC With Unity and NHibernate: Part II – Commingled Data. Let's incorporate this in to my sample app.

Implementation

In my sample app it's possible to query in multitenancy context and without. For example, if a new tenant is signing up for an account then we would not apply the multitenancy context. This is because TenantContext will not exist until tenant was created in the first place.

Also in this implementation I have tried to keep infrastructure decoupled so that it would not be too difficult to move to another database tenancy pattern.

Base Configuration

As there are going to be two different contexts, non-multitenant and multitenant. I have decided to create NHConfiguration base class.

NHConfiguration.cs
   
public abstract class NHConfiguration
{
    public abstract global::NHibernate.Cfg.Configuration Config { get; }
}

Below is the standard configuration that is going to be used for non-multitenant read/write.

Configuration.cs
   
public class Configuration : NHConfiguration
{
    public override NHibernate.Cfg.Configuration Config => config;

    private NHibernate.Cfg.Configuration config;

    public Configuration()
    {
        this.config = Fluently.Configure()
            .Database(MsSqlConfiguration.MsSql2012
                .ConnectionString(c => c.FromConnectionStringWithKey("default"))
#if DEBUG
                .ShowSql()
                .FormatSql()
#endif
                .AdoNetBatchSize(50)
            ).Mappings(m =>
                m.FluentMappings.AddFromAssembly(Assembly.GetExecutingAssembly())
            )
            .ExposeConfiguration(c => SchemaMetadataUpdater.QuoteTableAndColumns(c))
            .BuildConfiguration();
    }
}

This configuration will get injected into the Unit of Work. If you are new to this, then you should read Unit Of Work Abstraction For NHibernate or Entity Framework C# Example. I have omitted the class implementation.

   
public class NHUnitOfWork : IUnitOfWork
{
    //..
    private NHConfiguration nhConfiguration;

    public NHUnitOfWork(NHConfiguration nhConfiguration)
    {
        this.nhConfiguration = nhConfiguration;
        //..
    }
    //..
}

Unfortunately with this solution you will still need to add the private "tenantId" field to your domain classes.

   
public class Some 
{
    //..
    private Guid tenantId;
}

The good news is that you don't have to do with it anything inside your domain. It will be handled by infrastructure. The bad news is that it still there. I have spent few hours trying to remove it, I have used some reflection and proxy class generation techniques, unfortunately to no avail. If you find an elegant solution please do share it.

Multitenant Writes

This is where standard configuration is expanded and interceptor is set. Interceptor will be used when some object is updated or saved.

ConfigurationMultiTenancy.cs
   
public class ConfigurationMultiTenancy : Configuration
{
    public ConfigurationMultiTenancy(NHSharedDatabaseMultiTenancyInterceptor interceptor)
    {
        this.Config.SetInterceptor(interceptor);
    }
}

Above you will notice that NHSharedDatabaseMultiTenancyInterceptor is injected in to ConfigurationMultiTenancy. Here is the actual interceptor.

NHSharedDatabaseMultiTenancyInterceptor.cs
   
public class NHSharedDatabaseMultiTenancyInterceptor : EmptyInterceptor
{
    readonly TenantContext tenantContext;

    public NHSharedDatabaseMultiTenancyInterceptor(TenantContext tenantContext)
    {
        this.tenantContext = tenantContext;
    }

    public override bool OnSave(object entity, object id, object[] state, string[] propertyNames, IType[] types)
    {
        int index = Array.IndexOf(propertyNames, "tenantId");

        if (index == -1)
            return false;

        state[index] = this.tenantContext.ID;

        entity.GetType()
                .GetField("tenantId", BindingFlags.Instance | BindingFlags.NonPublic)
                .SetValue(entity, tenantContext.ID);

        return base.OnSave(entity, id, state, propertyNames, types);
    }
}

Multitenant Reads

Configuring reads is much simpler. You just need to enable a filter and provide the parameter. First you need to define the actual filter.

MultitenantFilter.cs
   
public class MultitenantFilter : FilterDefinition
{
    public MultitenantFilter()
    {
        WithName("MultitenantFilter").AddParameter("TenantId", NHibernateUtil.Guid);
    }
}

Then you need to enable it.

NHUnitOfWorkMultitenancy.cs
   
public class NHUnitOfWorkMultitenancy : NHUnitOfWork
{
    public NHUnitOfWorkMultitenancy(NHConfiguration nhConfiguration, TenantContext tenantContext) 
        : base(nhConfiguration)
    {
        this.Session.EnableFilter("MultitenantFilter").SetParameter("TenantId", tenantContext.ID);
    }
}

Finally you just need to apply it.

   
public class SomeMap : ClassMap<Some>
{
    public SomeMap()
    {
        //..
        //This is used for writes
        Map(Reveal.Member<Some>("tenantId")).Not.Nullable();

        //This is used for reads
        ApplyFilter<MultitenantFilter>("TenantId = :TenantId");
    }
}

*Note: Code in this article is not production ready and is used for prototyping purposes only. If you have suggestions or feedback please do comment.

Thursday, 30 November 2017

ASP.NET MVC Multitenancy, Part 2 - OpenID Connect

In part 1 we have established tenant context. Most business applications need authentication. I have decided to use OpenID Connect, luckily for me it comes with awesome middleware and it's very easy to configure. Here is how you would do it:
public class Startup
{
    public void Configuration(IAppBuilder app)
    {
        //...
        app.UseOpenIdConnectAuthentication(new OpenIdConnectAuthenticationOptions
        {
            ClientId = "clientid",
            Authority = 'authority",
            RedirectUri = "http://mysite.com/"
        });
    }
}

However, not so fast, for each tenant you will need to configure client id, authority, url, etc. This also needs to be done at runtime as you don't want to ship new code every time new tenant registers to use your software. I have looked around for a solution and it turns out that this is a known problem, just take a look at the The Grand Auth Redesign. After a surprising amount of searching I have stumbled upon this amazing Multi-tenant middleware pipelines in ASP.NET Core blog post by Ben Foster. This solution is so damn good. Please read it. Unfortunately it did not work for me out of the box I had to port it over from ASP.NET Core to ASP.NET.

Hopefully by blogging and linking to Ben's article it will be easier for others to find it.

Example Usage

public class Startup
{
    public void Configuration(IAppBuilder app)
    {
        //..
        app.UsePerTenant((tenantContext, appBranch) =>
        {
            appBranch.SetDefaultSignInAsAuthenticationType(CookieAuthenticationDefaults.AuthenticationType);

            appBranch.UseCookieAuthentication(new CookieAuthenticationOptions
            { 
                CookieName = $"OAuthCookie.{tenantContext.FriendlyName}"
            });

            appBranch.UseOpenIdConnectAuthentication(new OpenIdConnectAuthenticationOptions
            {
                ClientId = tenantContext.AuthClientId,
                Authority = tenantContext.AuthAuthority,
                RedirectUri = $"http://localhost:2295/{tenantContext.FriendlyName}/"
            });
        });
    }
}

ASP.NET Implementation


TenantPipelineMiddleware.cs
   
/// <summary>
/// Workaround for the known OpenID multitenancy issue https://github.com/aspnet/Security/issues/1179
/// based on http://benfoster.io/blog/aspnet-core-multi-tenant-middleware-pipelines and https://weblogs.asp.net/imranbaloch/conditional-middleware-in-aspnet-core designs 
/// </summary>
public class TenantPipelineMiddleware : OwinMiddleware
{
    readonly IAppBuilder rootApp;
    readonly Action<TenantContext, IAppBuilder> newBranchAppConfig;
    readonly ConcurrentDictionary<TenantContext, Lazy<Func<IDictionary<string, object>, Task>>> branches;

    public TenantPipelineMiddleware(OwinMiddleware next, IAppBuilder rootApp, Action<TenantContext, IAppBuilder> newBranchAppConfig)
        : base(next)
    {
        this.rootApp = rootApp;
        this.newBranchAppConfig = newBranchAppConfig;
        this.branches = new ConcurrentDictionary<TenantContext, Lazy<Func<IDictionary<string, object>, Task>>>();
    }

    public override async Task Invoke(IOwinContext context)
    {
        TenantContext tenantContext = context.GetTenantContext();

        if (tenantContext == null || tenantContext.IsEmpty())
        {
            await this.Next.Invoke(context);
            return;
        }

        Lazy<Func<IDictionary<string, object>, Task>> branch = 
            branches.GetOrAdd(tenantContext, new Lazy<Func<IDictionary<string, object>, Task>>(() =>
            {
                IAppBuilder newAppBuilderBranch = rootApp.New();
                newBranchAppConfig(tenantContext, newAppBuilderBranch);
                newAppBuilderBranch.Run((oc) => this.Next.Invoke(oc));
                return newAppBuilderBranch.Build();
            }));

        await branch.Value(context.Environment);
    }
}

OwinContextExtensions.cs
   
public static class OwinContextExtensions
{
    static string tenantContextKey = "tenantcontext";

    public static TenantContext GetTenantContext(this IOwinContext context)
    {
        if (context.Environment.ContainsKey(tenantContextKey))
        {
            return (TenantContext)context.Environment[tenantContextKey];
        }
        return null;
    }
}

AppBuilderExtensions.cs
   
public static class AppBuilderExtensions
{
    public static IAppBuilder UsePerTenant(this IAppBuilder app, Action<TenantContext, IAppBuilder> newBranchAppConfig)
    {
        return app.Use<TenantPipelineMiddleware>(app, newBranchAppConfig);
    }
}

*Note: Code in this article is not production ready and is used for prototyping purposes only. If you have suggestions or feedback please do comment.

Sunday, 20 August 2017

ASP.NET MVC Multitenancy, Part 1 - Routing with OWIN

This blog post was rewritten on 29/11/2017.

Recently I have been migrating one of my multitenant ASP.NET MVC application's to the OWIN middleware. This has presented me with an opportunity to change my initial tenant resolution and pipeline registration implementation.

I've started by doing some research to see if I could find something I could just download and use. However, I could not find anything that would meet the following requirements:
  • ASP.NET Core and ASP.NET compatibility
  • Optional multitenant processing i.e. all requests don't have be multitenant
  • Lifecycle delegates for missing tenant identifier and tenant record, and tenant context creation
  • Runtime tenant context dependency injection registration
  • Compatible with multitenant URL paths mysite.com/sometenant/ and could work with subdomains sometenant.mysite.com

After much deliberation, I have decided to go with my own implementation. However this implementation was massively influenced by Ben Foster's Building multi-tenant applications with ASP.NET Core (ASP.NET 5) and OpenID Connect Authentication Middleware designs.

Example Usage

public class Startup
{
    public void Configuration(IAppBuilder app)
    {
        //..
        app.UseMultitenancy(new MultitenancyNotifications<YourTenantRecord>
        {
            TenantIdentifierNotFound = context =>
            {
                throw new HttpException(404, "Tenant identifier must be provided");
            },
            TenantRecordNotFound = context =>
            {
                context.Response.Redirect("/signup/tenant/");
                return Task.FromResult(0);
            },
            CreateTenantContext = (context, tenantRecord) =>
            {
                ITenantContextFactory tenantContextFactory = ServiceLocator.Resolve<ITenantContextFactory>();
                TenantContext tenantContext = tenantContextFactory.Create(tenantRecord.Id, tenantRecord.NameFriendly, tenantRecord.AuthClientId, tenantRecord.AuthAuthority);
                return Task.FromResult(tenantContext);
            }
        });
    }
}
Please note that "CreateTenantContext" is a factory delegate. It allows you to control tenant record to tenant context mapping. It also allows you to perform other actions such as runtime tenant context dependency injection registration. By doing this, downstream objects such as controllers will get tenant context injected in to them.
public class SomeController : Controller
{
    public SomeController(TenantContext tenantContext)
    {
        //..
    }
}

Implementation


Tenant Identifier Extraction

In my implementation have decided to use MVC routing with data tokens for tenant discrimination.

RouteConfig.cs
   
public class RouteConfig
{
    public static void RegisterRoutes(RouteCollection routes)
    {
        //...
        routes.MapRoute(
            "multi",
            "{tenant}/{controller}/{action}/{id}",
            new {controller = "Home", action = "Index", id = UrlParameter.Optional}
        ).DataTokens.Add("name", "webclient_multitenancy");
    }
}
By using data tokens, I am able to check the request route. In this case if request route has data token of name "webclient_multitenancy" I will know that I will be able to extract tenant identifier from the route. As you have probably gathered, in this case I am getting multitenancy identifier from the URL path mysite.com/sometenant/.

ITenantIdentifierExtractor.cs
   
public interface ITenantIdentifierExtractor
{
    bool CanExtract(IOwinContext context);
    string GetIdentifier(IOwinContext context);
}

RouteDataTokensTenantIdentifierExtractor.cs
   
public class RouteDataTokensTenantIdentifierExtractor : ITenantIdentifierExtractor
{
    public bool CanExtract(IOwinContext context)
    {
        RouteData routeData = this.getRouteData(context);
        return routeData != null && routeData.DataTokens.ContainsValue("webclient_multitenancy");
    }

    public string GetIdentifier(IOwinContext context)
    {
        if (!this.CanExtract(context))
            return null;

        RouteData routeData = this.getRouteData(context);
        return routeData.GetRequiredString("tenant");
    }

    private RouteData getRouteData(IOwinContext context)
    {
        HttpContextBase httpContext = (HttpContextBase)context.Environment["System.Web.HttpContextBase"];
        return RouteTable.Routes.GetRouteData(httpContext);
    }
}

You don’t have to use data token approach in your application, you can get multitenancy data out of the HTTP request manually.

Tenant Record Resolution

Resolution is where you call service or repository to obtain tenant record, this tenant record will be used later on to create tenant context.

ITenantRecordResolver.cs
   
public interface ITenantRecordResolver<TTenantRecord>
{
    TTenantRecord GetTenant(string tenantIdentifier);
}

In this implementation, I have decided to call the ITenantService to get the TenantDto object.

TenantRecordResolver.cs
   
public class TenantRecordResolver : ITenantRecordResolver<TenantDto>
{
    readonly ITenantService tenantService;

    public TenantRecordResolver(ITenantService tenantService)
    {
        this.tenantService = tenantService;
    }

    public TenantDto GetTenant(string tenantIdentifier)
    {
        return this.tenantService.Get(tenantIdentifier);
    }
}

Most of the time on each request you will be resolving tenant record. At scale this will become very inefficient so it is highly likely that you will want to cache tenant record, to do this you can use decorator pattern.

TenantRecordResolverCacheDecorator.cs
   
public class TenantRecordResolverCacheDecorator : ITenantRecordResolver<TenantDto>
{
    readonly TenantRecordResolver tenantRecordResolver;

    public TenantRecordResolverCacheDecorator(TenantRecordResolver tenantRecordResolver)
    {
        this.tenantRecordResolver = tenantRecordResolver;
    }

    public TenantDto GetTenant(string tenantIdentifier)
    {
        string cacheKey = $"tenantIdentifier:{tenantIdentifier}";

        TenantDto tenant = (TenantDto)MemoryCache.Default[cacheKey];

        if (tenant != null)
        {
            return tenant;
        }

        tenant = this.tenantRecordResolver.GetTenant(tenantIdentifier);
        if (tenant == null)
        {
            return null;
        }

        MemoryCache.Default.Set(cacheKey, tenant, new CacheItemPolicy
        {
            AbsoluteExpiration = DateTimeOffset.Now.AddMinutes(10)
        });
    
        return tenant;
    }
}

Middleware & Extensions

We now have all the key components in place and it is time to implement the actual middleware.

MultitenancyMiddleware.cs
   
public class MultitenancyMiddleware<TTenantRecord>
    where TTenantRecord : class
{
    readonly ITenantIdentifierExtractor tenantIdentifierExtractor;
    readonly ITenantRecordResolver<TTenantRecord> tenantRecordResolver;
    readonly MultitenancyNotifications<TTenantRecord> notifications;
    readonly Func<Task> next;

    public MultitenancyMiddleware(Func<Task> next, ITenantIdentifierExtractor tenantIdentifierExtractor, 
        ITenantRecordResolver<TTenantRecord> tenantRecordResolver,  MultitenancyNotifications<TTenantRecord> notifications)
    {
        this.next = next;
        this.tenantIdentifierExtractor = tenantIdentifierExtractor;
        this.tenantRecordResolver = tenantRecordResolver;
        this.notifications = notifications;
    }

    public async Task Invoke(IOwinContext context)
    {
        if (!this.tenantIdentifierExtractor.CanExtract(context))
        {
            await this.next();
            return;
        }

        string identifier = this.tenantIdentifierExtractor.GetIdentifier(context);

        if (string.IsNullOrEmpty(identifier))
        {
            await this.notifications.TenantIdentifierNotFound(context);
            return;
        }

        TTenantRecord tenantRecord = this.tenantRecordResolver.GetTenant(identifier);
        if (tenantRecord == null)
        {
            await this.notifications.TenantRecordNotFound(context);
            return;
        }

        TenantContext tenantContext = await this.notifications.CreateTenantContext(context, tenantRecord);

        context.SetTenantContext(tenantContext);

        await this.next();
    }
}

MultitenancyNotifications.cs
   
public class MultitenancyNotifications<TTenantRecord>
    where TTenantRecord : class
{
    public Func<IOwinContext, Task> TenantIdentifierNotFound { get; set; }
    public Func<IOwinContext, Task> TenantRecordNotFound { get; set; }
    public Func<IOwinContext, TTenantRecord, Task<TenantContext>> CreateTenantContext { get; set; }

    public MultitenancyNotifications()
    {
        this.TenantIdentifierNotFound = context => Task.FromResult(0);
        this.TenantRecordNotFound = context => Task.FromResult(0);
        this.CreateTenantContext = (context, tenantRecord) => Task.FromResult<TenantContext>(null);
    }
}

OwinContextExtensions.cs
   
public static class OwinContextExtensions
{
    static string tenantContextKey = "tenantcontext";

    public static void SetTenantContext(this IOwinContext context, TenantContext tenantContext)
    {
        context.Environment.Add(tenantContextKey, tenantContext);
    }
}
AppBuilderExtensions.cs
   
public static class AppBuilderExtensions
{
    public static IAppBuilder UseMultitenancy<TTenantRecord>(this IAppBuilder app, 
        MultitenancyNotifications<TTenantRecord> notifications)
        where TTenantRecord : class
    {
        return app.Use((context, next) =>
        {
            MultitenancyMiddleware<TTenantRecord> multitenancyMiddleware = 
                ServiceLocator.Resolve<MultitenancyMiddleware<TTenantRecord>>(new { next, notifications });
            return multitenancyMiddleware.Invoke(context);
        });
    }
}


*Note: Code in this article is not production ready and is used for prototyping purposes only. If you have suggestions or feedback please do comment.

Sunday, 2 July 2017

Testing Data for Randomness

Movie "Contact 1997"
When I was a young man I watched “Contact 1997” movie, I don’t know why but it really stuck with me. I have recently re-watched it. There were few scenes where Eleanor Arroway listened to space noise on certain frequency, eventually she heard something. It was awesome. Now, in reality I imagine what she was doing would not work at all. There are many dishes recording data on different frequencies, there would be simply too much data for one person to go through. For fun, I have decided to write a program that can detect non-randomness automatically, this way Eleanor Arroway would not need to waste hours listening to space noise.

Non Randomness Detection

To make non randomness detection work I will use pattern discovery algorithm and birthday problem calculation that I have blogged about earlier. Pattern discovery algorithm will discover patterns in the provided data. Birthday problem calculation will be given three parameters, size of the data, all possible pattern permutations given the number of unique characters and length in identified pattern, and number of unique pattern matches. If probability of a given pattern is not in the range then it’s highly likely that pattern is not random.

   

public class RandomnessDetection
{
    public static List<KeyValuePair<string, List<int>>> FindNonRandomPatterns(Dictionary<string, List<int>> patterns, string raw)
    {
        if (patterns == null || patterns.Count == 0)
            throw new Exception("No patterns found, cannot test.");

        int uniqueChars = raw.Distinct().Count();
        int rawLength = raw.Length;

        if (uniqueChars == 1)
            throw new Exception("Need to have more than one character");

        if (rawLength < 29)
            throw new Exception("String needs to be at least 30 chars long");

        List<KeyValuePair<string, List<int>>> nonRandomPatterns = new List<KeyValuePair<string, List<int>>>();

        foreach (KeyValuePair<string, List<int>> pattern in patterns)
        {
            double possibleCombinations = Math.Pow(uniqueChars, pattern.Key.Length);
            double prob = BirthdayProblemPoissonApprox.ProbabilityOfPair(rawLength - 2, possibleCombinations, pattern.Value.Count);

            if (Math.Truncate(prob * 100) / 100 <= 0.01)
                nonRandomPatterns.Add(pattern);
        }

        return nonRandomPatterns;
    }
}


Command-line interface

Now we have all the key ingredients. We just need create a program that we can interact with and we can start testing data for randomness. I have decided to use Command Line Parser Library to parse args for me, it’s awesome. You will notice that I have allowed for binary content checking. Unfortunately when I have designed pattern discovery algorithm I have not designed it to work with the binary data, so to make that work I had to encode data to base64.

   

class Program
{
    class Options
    {
        [Option('f', "file", Required = true, HelpText = "Input file to be processed.")]
        public string InputFile { get; set; }

        [Option('b', "fileisbinary", DefaultValue = false, HelpText = "If file is binary it will be encoded before pattern search.")]
        public bool FileIsBinary { get; set; }

        [Option('s', "showallpatterns", DefaultValue = false, HelpText = "Shows all patterns random and non-random.")]
        public bool ShowAllPatterns { get; set; }

        [Option('v', "verbose", DefaultValue = true,
          HelpText = "Prints all messages to standard output.")]
        public bool Verbose { get; set; }

        [ParserState]
        public IParserState LastParserState { get; set; }

        [HelpOption]
        public string GetUsage()
        {
            return HelpText.AutoBuild(this,
              (HelpText current) => HelpText.DefaultParsingErrorsHandler(this, current));
        }
    }

    static void Main(string[] args)
    {
        var options = new Options();
        if (CommandLine.Parser.Default.ParseArguments(args, options))
        {
            if (options.Verbose)
            {
                Console.WriteLine("file: " + options.InputFile);
                Console.WriteLine("fileisbinary: " + options.FileIsBinary);
                Console.WriteLine("showallpatterns: " + options.ShowAllPatterns);
            }

            try
            {
                if (!File.Exists(options.InputFile))
                {
                    Console.WriteLine("File path doesn't exist, can't do pattern discovery");
                    return;
                }

                string raw = string.Empty;

                if (options.FileIsBinary)
                    raw = System.Convert.ToBase64String(File.ReadAllBytes(options.InputFile));
                else
                    raw = File.ReadAllText(options.InputFile);

                if (String.IsNullOrEmpty(raw))
                {
                    Console.WriteLine("Input File is empty");
                    return;
                }

                Console.WriteLine("Starting to discover patterns");

                Dictionary<string, List<int>> patterns = RepeatingPatternDiscovery.DiscoverPatterns(raw);

                Console.WriteLine($"Discovered {patterns.Count} patterns, testing them for randomness");

                List<KeyValuePair<string, List<int>>> nonRandomPatterns = RandomnessDetection.FindNonRandomPatterns(patterns, raw);

                if (nonRandomPatterns.Count != 0)
                {
                    Console.WriteLine($"Data is not random, {nonRandomPatterns.Count} non random patterns found:");

                    foreach (KeyValuePair<string, List<int>> pattern in nonRandomPatterns)
                    {
                        Console.WriteLine($"Pattern: {pattern.Key}, found: {pattern.Value.Count} times, Indexes: [{String.Join(", ", pattern.Value)}]");
                    }
                }
                else
                {
                    Console.WriteLine("Data is random");
                }

                if (options.ShowAllPatterns)
                {
                    Console.WriteLine("All patterns found:");

                    foreach (KeyValuePair<string, List<int>> pattern in patterns)
                    {
                        Console.WriteLine($"Pattern: {pattern.Key}, found: {pattern.Value.Count} times, Indexes: [{String.Join(", ", pattern.Value)}]");
                    }
                }

            }
            catch (Exception ex)
            {
                Console.WriteLine($"Error: {ex.Message}");
            }
        }
    }


Testing text for randomness

To start, I am going to test text_blogpost.txt and text_pi.txt for randomness. Is π random?
C:\data>randomnesstest -f C:\data\text_pi.txt
file: C:\data\text_pi.txt
fileisbinary: False
showallpatterns: False
Starting to discover patterns
Discovered 228688 patterns, testing them for randomness
Data is random

Apparently π is random. How about one of my blog posts?
C:\data>randomnesstest -f C:\data\text_blogpost.txt
file: C:\data\text_blogpost.txt
fileisbinary: False
showallpatterns: False
Starting to discover patterns
Discovered 559 patterns, testing them for randomness
Data is not random, non random patterns found:
Pattern:
    public Purchase Checkout(Cart cart)
    {
        Purchase purchase = new Purchase()
        {
             Customer = this,
             Products = cart.Products,
             Created = DateTime.Now
        };
        this.purchases.Add(purchase);
        return purchase;
    }
}

, found: 2 times, Indexes: [6718, 8033]
...
My blog post is not random. That’s fantastic news! Sometimes I feel like it is.

Testing pictures for randomness

Random noise picture:
white noise
C:\data>randomnesstest -f C:\data\pic_noise.png -b
file: C:\data\pic_noise.png
fileisbinary: True
showallpatterns: False
Starting to discover patterns
Discovered 7685 patterns, testing them for randomness
Data is random

Van Gogh, Bridge In The Rain picture:
van gogh rain
C:\data>randomnesstest -f C:\data\pic_vangogh.jpg -b
file: C:\data\pic_vangogh.jpg
fileisbinary: True
showallpatterns: False
Starting to discover patterns
Discovered 432021 patterns, testing them for randomness
Data is not random, non random patterns found:
Pattern: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA, found: 2 times, Indexes: [1311, 1312]
Pattern: CgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCg, found: 2 times, Indexes: [3388, 3392]
Pattern: DQ0NDQ0NDQ0NDQ0NDQ0NDQ0NDQ0NDQ0NDQ0NDQ0NDQ0N, found: 2 times, Indexes: [3508, 3512]
Pattern: AAAAAWFlaIAAAAAAAA, found: 2 times, Indexes: [1383, 3319]
Pattern: TWF1cmljZSBUcm9tc, found: 2 times, Indexes: [100, 548]
Pattern: ZGM6Y3JlYXRvcj4gP, found: 2 times, Indexes: [508, 596]
Pattern: EBAQAAAAAAAAAAAA, found: 2 times, Indexes: [3594, 3742]
...

Testing audio for randomness

For a final test I am going to test white noise and building noise for randomness. I have downloaded white noise file from here.
C:\data>randomnesstest -f C:\data\sound_whitenoise.wav -b
file: C:\data\sound_whitenoise.wav
fileisbinary: True
showallpatterns: False
Starting to discover patterns
Discovered 204117 patterns, testing them for randomness
Data is random
C:\data>randomnesstest -f C:\data\sound_buildingnoise.m4a -b
file: C:\data\sound_buildingnoise.m4a
fileisbinary: True
showallpatterns: False
Starting to discover patterns
Discovered 61762 patterns, testing them for randomness
Data is not random, non random patterns found:
Pattern: AwMDAwMDAwMCAwMDAwMDAwMCAwMDAwMDAwMCAwMDAwMDAwMCAwMDAwMDAwMCAwMDAwMDAwMCAwMDAwMDAwM, found: 2 times, Indexes: [262938, 262950]
Pattern: AAAAAAQAAAAAAAAAAAAAAAAAAAAEAAAAAAAAAAAAAAAAAAEAAAAAAAAAAAAAAAAAA, found: 2 times, Indexes: [256227, 256387]
Pattern: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA, found: 2 times, Indexes: [256274, 256275]
Pattern: aGQAAAAA0v3TfNL903wAAKxEAB, found: 2 times, Indexes: [256180, 256468]
Pattern: AAAAAAAAAAQAAAAAAAAAAAA, found: 2 times, Indexes: [256223, 256691]
Pattern: AAACvAAAAqgAAAKoAAAC, found: 2 times, Indexes: [256999, 261399]
Pattern: AAAArwAAAKwAAAClAAAA, found: 2 times, Indexes: [259132, 260604]
Pattern: AAACrAAAArwAAAKwAAAC, found: 2 times, Indexes: [259159, 260599]
...


Conclusion

So, there you go, we have just saved Eleanor Arroway centuries worth of work. Just to be clear, I’ve done this for fun, I am not expecting this to be super precise and to be used in production systems, there are many other much more sophisticated and proven ways that you can use to test for randomness.

Friday, 23 June 2017

Birthday Problem More Than 2 People and Reverse

More Than 2 People Sharing The Same Birthday


There are very few coding examples out there when it comes to calculating the probability of more than 2 people sharing the same birthday. To make this work I have used this Poisson approximation equation, in generalised form this equation looks like this:

n = random people
g = pairs
d = days
r = probability



Here is the source code:
  public static double ProbabilityOfPair(int numOfRandomInstances, int uniquePossibilities, int numOfPairs = 2)
{
    if (numOfPairs < 2)
        throw new Exception("Cannot have less than two pairs");

    if (numOfRandomInstances < 1 || uniquePossibilities < 1)
        throw new Exception("Number of random instances or unique possibilities needs to be greater than 0");

    double bionomial = calcBinomialCoefficient(numOfRandomInstances, numOfPairs);
    double totalPairPossibilities = Math.Pow(uniquePossibilities, numOfPairs - 1);
    return 1 - Math.Exp(-(bionomial / totalPairPossibilities));
}

static double calcBinomialCoefficient(double n, double k)
{
    if (k > n)
        throw new Exception("n must be greater than k");

    double result = 1;
    for(int i = 1; i <= k; i++)
    {
        result *= n - (k - i);
        result /= i;
    }
    return result;
}

Birthday Problem Reverse


Now what do you do if you want to reverse the birthday problem and instead get back number of random people required to reach certain probability? For example, instead of this:
  ProbabilityOfPair(23, 365, 2) // returns 0.5000017521827107
You would like to do something like this:
  FindNumOfRandomInstancesRequired(365, 2, 0.50) // returns 23

To begin with I’ve used ProbabilityOfPair function and just did a linear search. This did not scale very well at all, so I have decided to use binary search algorithm instead. This has worked very well, however it still felt inefficient. So, I have decided to rearrange the above equation, this is what I have ended up with:

n = random people
g = pairs
d = days
r = probability



Here is the source code:
  public static int FindNumOfRandomInstancesRequired(int uniquePossibilities, int numOfPairs, double requiredProbability)
{
    if (requiredProbability < 0.00 || requiredProbability >= 1.00)
        throw new Exception("Required probability must be between 0.00 and 1.00");

    double inverse = -Math.Log(1.0 - requiredProbability);
    double bionomialFact = factorial(numOfPairs);
    double totalPairPossibilities = Math.Pow(uniquePossibilities, numOfPairs - 1);
    double precision = (numOfPairs - 1.0) / 2.0;
    double numOfRandomInstancesRequired = Math.Pow(bionomialFact * totalPairPossibilities * inverse, 1.0 / numOfPairs) + precision;
            
    return (int)Math.Round(numOfRandomInstancesRequired, 0, MidpointRounding.AwayFromZero);
}

static int factorial(int i)
{
    if (i <= 1)
        return 1;
    return i * factorial(i - 1);
}

Full Source Code (licensed under the GNU licence v3.0)

  public class BirthdayProblemPoissonApprox
{
    /// <summary>
    /// Generalised birthday problem probability calculation using poisson approximation.  
    /// </summary>
    /// <example>
    ///  ProbabilityOfPair(23, 365), returns 0.5000017521827107
    ///  ProbabilityOfPair(30, 365, 3), returns 0.030015086543320635
    /// </example>
    /// <param name="numOfRandomInstances">Number of random instances in the group e.g. number of random people in the group</param>
    /// <param name="uniquePossibilities">Number of unique possibilities that random instances can fit into e.g. number of unique days</param>
    /// <param name="numOfPairs">Number of pairs</param>
    /// <see cref="https://en.wikipedia.org/wiki/Birthday_problem"/>
    /// <remarks>To make this scale beyond 2 pairs, I have used equations from here: https://math.stackexchange.com/questions/25876/probability-of-3-people-in-a-room-of-30-having-the-same-birthday </remarks>
    /// <returns>Probability of pair</returns>
    public static double ProbabilityOfPair(int numOfRandomInstances, int uniquePossibilities, int numOfPairs = 2)
    {
        if (numOfPairs < 2)
            throw new Exception("Cannot have less than two pairs");

        if (numOfRandomInstances < 1 || uniquePossibilities < 1)
            throw new Exception("Number of random instances or unique possibilities needs to be greater than 0");

        double bionomial = calcBinomialCoefficient(numOfRandomInstances, numOfPairs);
        double totalPairPossibilities = Math.Pow(uniquePossibilities, numOfPairs - 1);
        return 1 - Math.Exp(-(bionomial / totalPairPossibilities));
    }
    /// <summary>
    /// Generalised birthday problem reversed, find number of random instances required to fit certain probability.
    /// </summary>
    /// <example>
    /// FindNumOfRandomInstancesRequired(365, 2, 0.50), returns 23
    /// FindNumOfRandomInstancesRequired(365, 3, 0.03), returns 30
    /// </example>
    /// <param name="uniquePossibilities">Number of unique possibilities that random instances can fit into e.g. number of unique days</param>
    /// <param name="numOfPairs">Number of pairs</param>
    /// <param name="requiredProbability">Probability that must be reached</param>
    /// <returns>Number of random instances required</returns>
    public static int FindNumOfRandomInstancesRequired(int uniquePossibilities, int numOfPairs, double requiredProbability)
    {
        if (requiredProbability < 0.00 || requiredProbability >= 1.00)
            throw new Exception("Required probability must be between 0.00 and 1.00");

        double inverse = -Math.Log(1.0 - requiredProbability);
        double bionomialFact = factorial(numOfPairs);
        double totalPairPossibilities = Math.Pow(uniquePossibilities, numOfPairs - 1);
        double precision = (numOfPairs - 1.0) / 2.0;
        double numOfRandomInstancesRequired = Math.Pow(bionomialFact * totalPairPossibilities * inverse, 1.0 / numOfPairs) + precision;
            
        return (int)Math.Round(numOfRandomInstancesRequired, 0, MidpointRounding.AwayFromZero);
    }

    /// <summary>
    /// Calculates efficiently number of possible combinations, "n choose k". 
    /// </summary>
    /// <example>For example, suppose you have a deck of 5 cards (n = 5) and you want to know how many different hands of 3 cards you can make (k = 3). If you number the cards 1 through 5, then you will get 10 possible combinations.</example>
    /// <param name="n">Total set of items</param>
    /// <param name="k">Number of items to pick from n</param>
    /// <see cref="http://www.vb-helper.com/howto_calculate_n_choose_k.html"/>
    /// <returns></returns>
    static double calcBinomialCoefficient(double n, double k)
    {
        if (k > n)
            throw new Exception("n must be greater than k");

        double result = 1;
        for(int i = 1; i <= k; i++)
        {
            result *= n - (k - i);
            result /= i;
        }
        return result;
    }

    static int factorial(int i)
    {
        if (i <= 1)
            return 1;
        return i * factorial(i - 1);
    }
}

Sunday, 11 June 2017

Repeating String Pattern Search

In this post I am going to introduce basic repeating string pattern search algorithm that I have written for fun. This algorithm’s average running time is Θ(N), worst running time is O(N^2) (see analysis below). This algorithm takes ~1 minute to search 10 million digits of Pi.

Example usage

  
string raw = "091283091283091831093810933";
Dictionary<string, List<int>> patterns = DiscoverPatterns(raw);
foreach (KeyValuePair<string, List<int>> pattern in patterns)
{
    Console.WriteLine($"Pattern: {pattern.Key}, Indexes: {String.Join(", ", pattern.Value)}");
}

Example output

Pattern: 091283091, Indexes: 0, 6
Pattern: 1093, Indexes: 17, 22

Source Code (licensed under the GNU licence v3.0)

  
public static Dictionary<string, List<int>> DiscoverPatterns(string raw, int patternSizeMin = 2)
{
    if (patternSizeMin < 1)
        throw new Exception("Minimum pattern size should not be less than 1.");

    Dictionary<string, HashSet<int>> initialPatternSearch = new Dictionary<string, HashSet<int>>();
    Dictionary<string, HashSet<int>> discoveredPatterns = new Dictionary<string, HashSet<int>>();

    int numberOfInitialSearches = raw.Length - (patternSizeMin - 1);
    for (int i = 0; i < numberOfInitialSearches; i++)
    {
        string pattern = raw.Substring(i, patternSizeMin);

        HashSet<int> patternIndexes;
        if (initialPatternSearch.TryGetValue(pattern, out patternIndexes))
        {
            discoveredPatterns[pattern] = patternIndexes;
        }
        else
        {
            patternIndexes = new HashSet<int>();
            initialPatternSearch.Add(pattern, patternIndexes);
        }

        patternIndexes.Add(i);
    }

    initialPatternSearch.Clear();

    Dictionary<string, HashSet<int>> basePatterns = discoveredPatterns;
    do
    {
        Dictionary<string, HashSet<int>> superPatterns = discoverSuperPatterns(raw, ++patternSizeMin, basePatterns);

        foreach (KeyValuePair<string, HashSet<int>> superPattern in superPatterns)
        {
            string basePattern = superPattern.Key.Substring(0, patternSizeMin - 1);
            discoveredPatterns.Remove(basePattern);

            string baseInnerPattern = superPattern.Key.Substring(1, patternSizeMin - 1);
            discoveredPatterns.Remove(baseInnerPattern);

            discoveredPatterns.Add(superPattern.Key, superPattern.Value);
        }

        basePatterns = superPatterns;
    } while (basePatterns.Count != 0);

    return discoveredPatterns.OrderByDescending(pattern => pattern.Key.Length).ToDictionary(pattern => pattern.Key, x => x.Value.ToList());
}

private static Dictionary<string, HashSet<int>> discoverSuperPatterns(string raw, int patternSize, Dictionary<string, HashSet<int>> basePatterns)
{
    Dictionary<string, HashSet<int>> initialSuperPatternSearch = new Dictionary<string, HashSet<int>>();
    Dictionary<string, HashSet<int>> discoveredSuperPatterns = new Dictionary<string, HashSet<int>>();

    foreach (KeyValuePair<string, HashSet<int>> basePattern in basePatterns)
    {
        foreach (int patternIndex in basePattern.Value)
        {
            if (raw.Length >= (patternIndex + patternSize))
            {
                string superPattern = raw.Substring(patternIndex, patternSize);

                HashSet<int> superPatternIndexes;
                if (initialSuperPatternSearch.TryGetValue(superPattern, out superPatternIndexes))
                {
                    discoveredSuperPatterns[superPattern] = superPatternIndexes;
                }
                else
                {
                    superPatternIndexes = new HashSet<int>();
                    initialSuperPatternSearch.Add(superPattern, superPatternIndexes);
                }

                superPatternIndexes.Add(patternIndex);
            }
        }
    }

    return discoveredSuperPatterns;
}

Analysis


Worst Case

If you provide input such as 010101…01 then running time will be ((N(N-1))/2)-4, this gives us a triangular number. This is because this input is just one big continuous pattern and it’s discovering patterns one character at the time i.e. (N-1)+(N-2)+(N-3)…+(N-(N-2)). If we drop all terms that grow slowly we end up with O(N^2).

Average Case

If input is not a big continuous pattern but has many small sub patterns then running time will be X*N. If we drop all terms that grow slowly we end up with average running time Θ(N).

Thursday, 9 March 2017

Cloud Architecture & Engineering, B2B Revolution and B2C Evolution


Few years ago I was having a conversation with one of my friends about cloud, SaaS and multitenancy architecture. I was telling him how different it was to normal “hosted” app architecture. That we can setup new business tenant with custom url in seconds. He has said, well, instant tenant provisioning and custom URLs are not new. So what’s so special about it? Blogger and WordPress have been doing it for years. At that point in time I have struggled to answer his question, there was too much to say, ideas in my head were too raw. In this article I am going to explain why cloud, SaaS and multitenancy is revolutionary.



Before we start, let’s get few definitions out of the way:

Model
Description
Example
B2B, Business Applications Business sells to businesses Oracle, Visual Studio Team Services, Sage & Salesforce
B2C, Consumer Applications Business sells to consumers Spotify, Netflix & Blogger.

Your solution’s technical requirements will differ a lot depending on what business model you follow, we will be compare typical B2C and B2B requirements shortly.

Term
Somewhat My Definition
Software as a Service (SaaS) This is where you don’t care about software versioning, upgrades, deployments, where it sits, etc. You just pay for it and use it. If you are using Spotify, you are using SaaS.
Cloud Before cloud we had hosting companies, where you could rent a VM and host your site. Now we have Cloud, which is the same thing but it’s just much smarter and comes with lots of tools. Companies such as Microsoft offer prepackaged services that seamlessly scale your apps horizontally and all you have to do is just upload your binaries via FTP to a folder.
Multitenancy Traditionally each company would get their own “installation”, with multitenancy there is only one installation and everyone is seamlessly using it.

These concepts are very important as they will feed of each other, as I will illustrate this later on.
People use cloud, multitenancy and SaaS as synonyms and this is a mistake (I make this mistake as well).

B2C, Consumer Applications


When it comes to selling directly to customers things are easier, there are a lot less hoops to jump through. Here are some technical requirements that you might need to follow when you build B2C app:

Requirement Description
Availability Consumer facing sites, need to be highly available, if they are not, this can cause reputation damage and loss in revenue.
Authentication Consumers need to be able to login using some form of login, Google, Facebook, etc.
PCI DSS When customers make payments it’s important that card data is used & stored securely.
Recovery Point & Time Objectives Similiar to availability it’s critical that if data was corrupted or upgrade has gone wrong that app can recover quickly with as much data as possible.


B2C architecture before cloud:


B2C architecture after cloud:



This list is not complete, however the point is that typical B2C apps don’t have lots of technical requirements enforced upon them from consumers. This is why B2C service is so much cheaper, how much do you pay for Netflix and Spotify?


B2B, Business Applications


Here are some technical requirements that businesses often enforce on to other businesses:

Requirement Description
Corporate Identity Integration (Single Sign On Authentication) Medium/Large size companies have their own identity apps such as AzureAD, they don’t want to use your built in authentication mechanism if they can enable Single Sign On. This is where Federated identity comes in.
Audit They would like assurance and full trail of what’s happening inside the system, so they will ask for audit.
Data restore & Backup If company makes a mistake in their environment they will want to restore their data back to how it was.
Integration Often they have their own systems and they will want to integrate their existing applications with your application.
Security clearance Depending on the industry your business customers might want your ops staff to be security cleared, if this is not possible then sensitive data needs to be hidden or encrypted.
Escrow Businesses need to ensure businesses continuity, to do this they need to be able keep going even your business will go out of business.
ISO 27001 Key management, Antivirus, Firewalls, Segregation of duties, Pen testing, etc.
Reporting Access to data to be able to report on it.
Customisation & Configuration Data retention, authorisation, workflow, etc.
Availability Business sites don’t necessarily need to be as highly available as in a lot of industries people work 9am-5pm.
PCI DSS When businesses make payments it’s important that card data is used & stored securely.
Recovery Point & Time It’s critical that if data was corrupted or upgrade has gone wrong that app can recover quickly, business finance or patient data can’t be lost, huge amount of productivity can be lost if this not designed correctly.
Support Be able to troubleshoot issues quickly.
On-prem installation Company is able to install your software in their own environment.

This list is not complete, however if you compare B2B to B2C it’s easy to see that there is a lot more to it. So when companies charge other companies a lot more for their services we can appreciate why this happens.


B2B architecture before cloud and multitenancy:




B2B architecture after cloud and multitenancy:


So where is the revolution?


Consumer facing sites were doing SaaS with multitenancy for years. Websites like YouTube and Ebay have been scaling their solutions for a long time. Cloud has just made it easier for them to scale and in turn it has reduced their hosting cost. Business applications were doing SaaS for a long time, however multitenancy and cloud was just not a thing. Most service providers painfully installed software for each customer manually, then, they upgraded each customer one at the time and they monitored each customer one at the time. You get the picture. Customer provisioning took weeks and maybe months. Multitenancy offers something truly awesome, it removes slow per customer installation and upgrade problem and in turn it dramatically speeds up the software delivery and customer onboarding process. Cloud removes the infrastructure maintenance and provisioning burden. Together they simply transform business in a remarkable way. B2B apps can finally compete with B2C apps in terms of feature delivery speed and cost.

Conclusion


Let’s come back to the conversation with my friend. He thought that nothing was new here because he was thinking of B2C evolution. While I was thinking of B2B where multitenancy and cloud is transforming businesses and revolutionising business applications.

Sunday, 12 February 2017

Azure App Services Web App Antivirus - Concept

Introduction


Good web application antivirus solution needs to meet the following acceptance criteria:
  • Real-time inbound and outbound file scanning
  • Low latency, sub second performance
  • Transparent, no need to make changes to your web app
  • Requires no maintenance and is easy to setup

If you are using PaaS service such as Azure Web Apps your antivirus options are:

Option Limitation Installation Effort Maintenance Effort Cost Meets criteria
Firewall with ICAP integration Expense and ongoing maintenance High Medium High Yes
Send files to the antivirus API from your app Extra unnecessary development work Medium/High Medium Low No
Background service that scans your files Not real time, high possibility that viruses will be uploaded and downloaded Medium Low Low No
Reverse proxy binary content forwarding Scans only uploads Medium Low Low No
IIS Filter (this solution) PoC, not production ready Medium Low Low Yes

In depth exploration of different antivirus options is beyond the scope this article.


IIS Filter Solution, inspired by ModSecurity





This article will only cover the interesting parts of the solution, if you have questions please do comment.


AVFilter, File Upload/Download Filter


Filtering uploads and downloads is relatively straight forward thanks to IIS http modules, read more about them here.


Filtering Uploads


Here is an example of how you can detect file upload inside the http module:
 private void Context_PreRequestHandlerExecute(object sender, EventArgs e)
  {
      HttpApplication httpApplication = (HttpApplication)sender;
      HttpRequest httpRequest = httpApplication.Context.Request;
      
      if (httpRequest.RequestType == WebRequestMethods.Http.Post)
      {
          if (httpRequest.Files.Count != 0)
          {
              //Check files
          }
      }
  }
Take a look at the full implementation here.


Filtering Downloads


When it comes to intercepting downloads it’s not as simple. Unfortunately outgoing traffic is written to the System.Web.HttpResponseStreamFilterSink and this stream is write only, so you can’t read it. So the only option that I and Google can think of is to proxy the outgoing stream. To make this happen I have created proxy class that changes write only steam to write/read stream, I have conveniently called it ReadWriteProxyStream, take a look at the implementation here.

Now that we have ReadWriteProxyStream class, we need to hook it up response filter, this is done when web request is first received:
    private void Context_PreRequestHandlerExecute(object sender, EventArgs e)
    {
        HttpApplication httpApplication = (HttpApplication)sender;
        
        httpApplication.Context.Response.Filter = new ReadWriteProxyStream(httpApplication.Context.Response.Filter);
    }
Take a look at the full implementation here.

Before web response is returned back to user we need to check it, we can do this by reading the stream:
    private void Context_PreSendRequestContent(object sender, EventArgs e)
    {
        HttpApplication httpApp = (HttpApplication)sender;
        HttpResponse httpResponse = httpApp.Context.Response;
        
        if(httpResponse.ContentType == "application/octet-stream")
        {
            ReadWriteProxyStream filter = httpResponse.Filter as ReadWriteProxyStream;

            if (filter != null)
            {
                byte[] data = new byte[filter.Length];
                filter.Position = 0;
                filter.Read(data, 0, data.Length);

                //Check files
            }
        }
    }
Take a look at the full implementation here.


AVFileReceiver, Virus Scanning


For this proof of concept I have used Symantec virus scanner which comes with CLI, and to test it I have used EICAR test virus. EICAR test virus is harmless, it can’t actually do anything to your machine.

Symantec CLI will scan the specified file synchronously, so if the file is no longer there you know that it was not safe:
  using (Process process = Process.Start(
        "C:\\Program Files (x86)\\Symantec\\Symantec Endpoint Protection\\DoScan.exe",
        "/ScanFile " + filePath))
    {
        process.WaitForExit();
    }

    if (!File.Exists(filePath))
        throw new Exception("File was not safe!")
Take a look at the full implementation here.



Final Thoughts


If you like this concept then please do share and rate this blog post. If this post will generate enough interest I will productionise this concept. However, if you just can’t wait, then here is what you need to do to make this production ready:
  • Secure communication between AVFilter and AVFileReceiver
  • Secure AVFileReceiver access (authentication, network security group, etc)
  • Reduce VM maintenance through automation, use PowerShell files provided and place Cloud Service inside availability and upgrade group
  • Reduce memory consumption by intercepting binary responses only
  • Make debugging easier by logging exceptions
  • Improve performance by forwarding binary content using MTOM encoding to the AVFileReceiver and remove AVFilter temp storage
  • Improve scalability and responsiveness by using Async in the AVFileReceiver and AVFilter