You have been tasked with decoding several batches of coded messages that have been intercepted.
Based on previous intelligence that has been gathered, you can be confident that the messages were all encoded using a simple Caesar cipher (a type of substitution cipher), an example of which is the ROT13 cipher (discussed in Cody Challenge Problem 78). As in the Cody Challenge Problem, uppercase and lowercase letters are handled independently of one another, and all other characters (e.g. punctuation, numbers, accented letters) are unchanged. Unlike the Cody Challenge Problem, here the number of letters to shift is not known in advance, and will vary between (not within) batches — also, here you need to decode, not encode.
These latest activities that you are investigating have been nicknamed 'Operation Xiangliu' by your own organisation. A few decoding options are at your organisation's disposal:
The third option will be faster than the first option, and more reliable than the second option, so this is the approach you will be taking.
You have been careful to avoid using 'lemmatised' lists (which contain e.g. the verb "to be", rather than the various inflected forms such as "am", "is", "are") like those based on the OEC or COCA, and after setting aside another list you finally choose the list based on the BNC as the most reliable, and will use the first 100 words on that list. This list will be available for you to access as an input variable, bncWordlist, to your function. Note: (i) some entries on the list are morphemes (e.g. " n't " and " 's ") rather than words; (ii) some entries appear more than once (representing different grammatical word classes). Of course, in the original messages any capitalisation might be used.
You have been instructed that your 'certitude' (degree of confidence) in the decoding must be reported for each batch, and shall depend proportionally upon the sum of the characters for the words/morphemes in the decoded message that match words/morphemes in bncWordlist. Being able to match words/morphemes accounting for three-tenths of the characters (excluding spaces) shall correspond to 100% certitude; matching three-twentieths would be 50% certitude, and so on. Certitude shall be reported as a percentage, rounded up to the nearest integer, not greater than 100. You need to maximise your certitude for each batch by appropriate choice of the shifting parameter. If there are multiple shifts of equal certitude, report both options on different rows (arranged in order of ascending shift parameter).
Your task is to crack the codes and report back in a structure array: (1) the shifting parameter that had been used in the encoding [as uint8] (usually scalar, but may be column vector); (2) the decoded messages [as a cell array (containing character arrays)] (usually an array with a single row, but occasionally with multiple rows); (3) your 'certitude' in the decoding [as uint8] (always scalar). The name of the structure array shall be s, with respective fields shift, message, and certitude.
EXAMPLE 1
Suppose the batch contained two encoded messages — "Vomftt qvstvfe, pqfo op eppst." and "Ffmt dbo ljmm, opu pomz xpvoe." (provided as character arrays within a cell array) — and a (right-shifting) ROT1 cipher had been applied. In that case A→B, B→C, ..., Y→Z, and Z→A; similarly, a→b, b→c, ..., y→z, and z→a. Thus the original messages would have been: "Unless pursued, open no doors." and "Eels can kill, not only wound." . Twelve of the 51 characters have been matched: "no", "can", "not", and "only".
The correct answer would therefore comprise:
s.shift = uint8(1) s.message = {'Unless pursued, open no doors.', 'Eels can kill, not only wound.'} s.certitude = uint8(79)
EXAMPLE 2
Suppose the batch contained one encoded message — "Oa oqvvq'u cnycau dggp: "Ctu itcvkc ctvku"." (provided as a character array within a cell array) — and a (right-shifting) ROT2 cipher had been applied. In that case A→C, B→D, ..., Y→A, and Z→B; similarly, a→c, b→d, ..., y→a, and z→b. Thus the original message would have been: "My motto's always been: "Ars gratia artis"." . Eight of the 37 characters have been matched: "My", " 's ", and "been". Note carefully that: " 's " should only be matched once; "to" (in motto), "be" (in been), "at" (in gratia), "is" (in artis) and "a" (passim) should not be matched at all; and " ' " will only ever be used as an apostrophe (never as a quotation mark).
The correct answer would therefore comprise:
s.shift = uint8(2) s.message = {'My motto''s always been: "Ars gratia artis".'} s.certitude = uint8(73)
If the batch of messages cannot be decoded when following the above assumptions, then return the original encoded message(s) unchanged, with a shift of zero and a certitude of zero.
Note: Many Cody solutions are hard to read and not necessarily computationally efficient. To direct attention toward efficient runtime speed of execution, timings are measured in the Test Suite, and reported back (if the submission is runnable). Although the timings are not perfectly reproducible, they do provide an indication of computational resource demand. (Try resubmitting on another day if your code runs slowly, in case it is caused by a server issue.)
To provide some extra motivation for you to get your code to run efficiently, it will fail the Test Suite if it is deemed "too slow".
----------
Previous problem: Operation Orthos. Next problem: TBA.
I don't understand the computing of the certitude parameter.
In Example1: ...Twelve of the 51 characters have been matched.. Why 51 characters ?
In Example2:... Eight of the 37 characters have been matched: "My", " 's ", and "been". 7 not 8 characters ?
Could you be more specific ?
Thank you for these great problems !
Hello, Jean-Marie Sainthillier. Thanks for your feedback. Regarding Ex. 1, the key information is given in a preceding paragraph about the general calculation of 'certitude', which states: "[...] accounting for [...] of the characters (excluding spaces)". I excluded spaces because there are none in the 'dictionary', so they would skew the results. E.g. if the phrase "They didn't understand" were padded with one hundred spaces before encoding, then we would always get a low 'certitude' if spaces weren't excluded (viz. 28%), suggesting only low–moderate confidence in the decryption, despite decoding half of the matchable characters, which should have given us high confidence (viz. 100%) given the small size of our 'dictionary'. Arguably punctuation could have been excluded too, but I opted to _include_ punctuation, not least because apostrophes do occur in the 'dictionary'. That is why there are 8 matched characters in Ex. 2: the apostrophe is counted (in _'s_). In a bigger 'dictionary', hyphens and full stops could also occur. Some punctuation, such as commas and semicolons, would perhaps never occur in any practical 'dictionary', but their effect on the 'certitude' is relatively small, and segregating punctuation according to function is beyond the present scope!! —DIV
Thank you :)
Alfonso Nieto-Castanon's solution passed all tests in the Test Suite on 03 February 2019.
Wall time for 1193 message batches = 6.14 s. (CPU time = 8.24 seconds.)
Mu's solution passed all tests in the Test Suite on 08 January 2019.
Wall time for 1170 message batches = 4.26 s. (CPU time = 6.25 seconds.)
William's solution passed all tests in the Test Suite on 25 September 2018.
Wall time for 1080 message batches = 7.17 s. (CPU time = 9.01 seconds.)
William's solution passed all tests in the Test Suite on 25 September 2018.
Wall time for 1080 message batches = 7.30 s. (CPU time = 9.21 seconds.)
Jean-Marie Sainthillier's solution passed all tests in the Test Suite on 13 July 2018.
Wall time for 1021 message batches = 6.77 s. (CPU time = 7.09 seconds.)
David Verrelli's solution passed all tests in the Test Suite on 18 June 2018.
Wall time for 1001 message batches = 7.05 s. (CPU time = 9.01 seconds.)
Binbin Qi's solution passed all tests in the Test Suite on 17 June 2018.
Wall time for 1001 message batches = 3.81 s. (CPU time = 4.73 seconds.)
J. S. Kowontan's solution passed all tests in the Test Suite on 17 June 2018.
Wall time for 1001 message batches = 3.36 s. (CPU time = 4.06 seconds.)
Remove any row in which a NaN appears
5981 Solvers
Create a square matrix of multiples
330 Solvers
Return elements unique to either input
478 Solvers
We love vectorized solutions. Problem 1 : remove the row average.
385 Solvers
Sums of Multiple Pairs of Triangular Numbers
179 Solvers