![]() |
Movie "Contact 1997" |
Non Randomness Detection
To make non randomness detection work I will use pattern discovery algorithm and birthday problem calculation that I have blogged about earlier. Pattern discovery algorithm will discover patterns in the provided data. Birthday problem calculation will be given three parameters, size of the data, all possible pattern permutations given the number of unique characters and length in identified pattern, and number of unique pattern matches. If probability of a given pattern is not in the range then it’s highly likely that pattern is not random.public class RandomnessDetection { public static List<KeyValuePair<string, List<int>>> FindNonRandomPatterns(Dictionary<string, List<int>> patterns, string raw) { if (patterns == null || patterns.Count == 0) throw new Exception("No patterns found, cannot test."); int uniqueChars = raw.Distinct().Count(); int rawLength = raw.Length; if (uniqueChars == 1) throw new Exception("Need to have more than one character"); if (rawLength < 29) throw new Exception("String needs to be at least 30 chars long"); List<KeyValuePair<string, List<int>>> nonRandomPatterns = new List<KeyValuePair<string, List<int>>>(); foreach (KeyValuePair<string, List<int>> pattern in patterns) { double possibleCombinations = Math.Pow(uniqueChars, pattern.Key.Length); double prob = BirthdayProblemPoissonApprox.ProbabilityOfPair(rawLength - 2, possibleCombinations, pattern.Value.Count); if (Math.Truncate(prob * 100) / 100 <= 0.01) nonRandomPatterns.Add(pattern); } return nonRandomPatterns; } }
Command-line interface
Now we have all the key ingredients. We just need create a program that we can interact with and we can start testing data for randomness. I have decided to use Command Line Parser Library to parse args for me, it’s awesome. You will notice that I have allowed for binary content checking. Unfortunately when I have designed pattern discovery algorithm I have not designed it to work with the binary data, so to make that work I had to encode data to base64.class Program { class Options { [Option('f', "file", Required = true, HelpText = "Input file to be processed.")] public string InputFile { get; set; } [Option('b', "fileisbinary", DefaultValue = false, HelpText = "If file is binary it will be encoded before pattern search.")] public bool FileIsBinary { get; set; } [Option('s', "showallpatterns", DefaultValue = false, HelpText = "Shows all patterns random and non-random.")] public bool ShowAllPatterns { get; set; } [Option('v', "verbose", DefaultValue = true, HelpText = "Prints all messages to standard output.")] public bool Verbose { get; set; } [ParserState] public IParserState LastParserState { get; set; } [HelpOption] public string GetUsage() { return HelpText.AutoBuild(this, (HelpText current) => HelpText.DefaultParsingErrorsHandler(this, current)); } } static void Main(string[] args) { var options = new Options(); if (CommandLine.Parser.Default.ParseArguments(args, options)) { if (options.Verbose) { Console.WriteLine("file: " + options.InputFile); Console.WriteLine("fileisbinary: " + options.FileIsBinary); Console.WriteLine("showallpatterns: " + options.ShowAllPatterns); } try { if (!File.Exists(options.InputFile)) { Console.WriteLine("File path doesn't exist, can't do pattern discovery"); return; } string raw = string.Empty; if (options.FileIsBinary) raw = System.Convert.ToBase64String(File.ReadAllBytes(options.InputFile)); else raw = File.ReadAllText(options.InputFile); if (String.IsNullOrEmpty(raw)) { Console.WriteLine("Input File is empty"); return; } Console.WriteLine("Starting to discover patterns"); Dictionary<string, List<int>> patterns = RepeatingPatternDiscovery.DiscoverPatterns(raw); Console.WriteLine($"Discovered {patterns.Count} patterns, testing them for randomness"); List<KeyValuePair<string, List<int>>> nonRandomPatterns = RandomnessDetection.FindNonRandomPatterns(patterns, raw); if (nonRandomPatterns.Count != 0) { Console.WriteLine($"Data is not random, {nonRandomPatterns.Count} non random patterns found:"); foreach (KeyValuePair<string, List<int>> pattern in nonRandomPatterns) { Console.WriteLine($"Pattern: {pattern.Key}, found: {pattern.Value.Count} times, Indexes: [{String.Join(", ", pattern.Value)}]"); } } else { Console.WriteLine("Data is random"); } if (options.ShowAllPatterns) { Console.WriteLine("All patterns found:"); foreach (KeyValuePair<string, List<int>> pattern in patterns) { Console.WriteLine($"Pattern: {pattern.Key}, found: {pattern.Value.Count} times, Indexes: [{String.Join(", ", pattern.Value)}]"); } } } catch (Exception ex) { Console.WriteLine($"Error: {ex.Message}"); } } }
Testing text for randomness
To start, I am going to test text_blogpost.txt and text_pi.txt for randomness. Is π random?C:\data>randomnesstest -f C:\data\text_pi.txt file: C:\data\text_pi.txt fileisbinary: False showallpatterns: False Starting to discover patterns Discovered 228688 patterns, testing them for randomness Data is random
Apparently π is random. How about one of my blog posts?
C:\data>randomnesstest -f C:\data\text_blogpost.txt file: C:\data\text_blogpost.txt fileisbinary: False showallpatterns: False Starting to discover patterns Discovered 559 patterns, testing them for randomness Data is not random, non random patterns found: Pattern: public Purchase Checkout(Cart cart) { Purchase purchase = new Purchase() { Customer = this, Products = cart.Products, Created = DateTime.Now }; this.purchases.Add(purchase); return purchase; } } , found: 2 times, Indexes: [6718, 8033] ...My blog post is not random. That’s fantastic news! Sometimes I feel like it is.
Testing pictures for randomness
Random noise picture:
C:\data>randomnesstest -f C:\data\pic_noise.png -b file: C:\data\pic_noise.png fileisbinary: True showallpatterns: False Starting to discover patterns Discovered 7685 patterns, testing them for randomness Data is random
Van Gogh, Bridge In The Rain picture:

C:\data>randomnesstest -f C:\data\pic_vangogh.jpg -b file: C:\data\pic_vangogh.jpg fileisbinary: True showallpatterns: False Starting to discover patterns Discovered 432021 patterns, testing them for randomness Data is not random, non random patterns found: Pattern: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA, found: 2 times, Indexes: [1311, 1312] Pattern: CgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCg, found: 2 times, Indexes: [3388, 3392] Pattern: DQ0NDQ0NDQ0NDQ0NDQ0NDQ0NDQ0NDQ0NDQ0NDQ0NDQ0N, found: 2 times, Indexes: [3508, 3512] Pattern: AAAAAWFlaIAAAAAAAA, found: 2 times, Indexes: [1383, 3319] Pattern: TWF1cmljZSBUcm9tc, found: 2 times, Indexes: [100, 548] Pattern: ZGM6Y3JlYXRvcj4gP, found: 2 times, Indexes: [508, 596] Pattern: EBAQAAAAAAAAAAAA, found: 2 times, Indexes: [3594, 3742] ...
Testing audio for randomness
For a final test I am going to test white noise and building noise for randomness. I have downloaded white noise file from here.C:\data>randomnesstest -f C:\data\sound_whitenoise.wav -b file: C:\data\sound_whitenoise.wav fileisbinary: True showallpatterns: False Starting to discover patterns Discovered 204117 patterns, testing them for randomness Data is random
C:\data>randomnesstest -f C:\data\sound_buildingnoise.m4a -b file: C:\data\sound_buildingnoise.m4a fileisbinary: True showallpatterns: False Starting to discover patterns Discovered 61762 patterns, testing them for randomness Data is not random, non random patterns found: Pattern: AwMDAwMDAwMCAwMDAwMDAwMCAwMDAwMDAwMCAwMDAwMDAwMCAwMDAwMDAwMCAwMDAwMDAwMCAwMDAwMDAwM, found: 2 times, Indexes: [262938, 262950] Pattern: AAAAAAQAAAAAAAAAAAAAAAAAAAAEAAAAAAAAAAAAAAAAAAEAAAAAAAAAAAAAAAAAA, found: 2 times, Indexes: [256227, 256387] Pattern: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA, found: 2 times, Indexes: [256274, 256275] Pattern: aGQAAAAA0v3TfNL903wAAKxEAB, found: 2 times, Indexes: [256180, 256468] Pattern: AAAAAAAAAAQAAAAAAAAAAAA, found: 2 times, Indexes: [256223, 256691] Pattern: AAACvAAAAqgAAAKoAAAC, found: 2 times, Indexes: [256999, 261399] Pattern: AAAArwAAAKwAAAClAAAA, found: 2 times, Indexes: [259132, 260604] Pattern: AAACrAAAArwAAAKwAAAC, found: 2 times, Indexes: [259159, 260599] ...
No comments:
Post a Comment