Breaking Google’s reCAPTCHA
By Douglas Mechaber, July 22, 2012 11:10 AM

Table of contents
1. Stiltwalker Project and dc949
2. reCAPTCHA Analysis
3. Improving the Solution: reCAPTCHA
4. Word Mashing – Get Fuzzy: reCAPTCHA Hacking
5. Google’s Reaction: The Lady or The Tiger?

3. Improving the Solution: reCAPTCHA

Using fairly complex mathematics, Adam built a neural network across dc949’s computers, with up to 5000 nodes and multiple outputs for reCAPTCHA analysis. As he explained, the machine learning is very similar to linear regression, and he set up neural networks to reduce 2048 inputs to a much smaller number of outputs (58), the set of words used by reCAPTCHA. Plotting frequency versus amplitude and trying to match that curve exactly has a drawback: a sample would only match that one inflection and that specific background.

Figure 4. Red marks represent a broken frequency “map” of the word red, blue marks represent the word blue, and green marks represent the word green. What word is represented by the black marks? No, not green. It’s red.

Figure 5. Amplitude versus frequency of target words, showing a good, but not perfect, curve fit. This looser curve fits general cases of the word red better than a more exacting fit would.

Stiltwalker measures the distance between an unknown word and the various curves generated by known words to find a match. By solving many CAPTCHAs beforehand, dc949 coached the neural network into making better choices. Then Adam and the team created backup solvers, more neural nets trained on different combinations of words, inflections, and background noises, to handle more variations in spoken inflection and background noise. The best combination used 13 different solvers, chained together.
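The distance-matching idea can be illustrated with a small sketch. This is not Stiltwalker’s code: the Euclidean distance, the averaging of solved samples into one curve per word, the toy 4-bin curves, and every function name here are assumptions chosen only to show the general technique of comparing an unknown curve against known ones.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length amplitude curves."""
    if len(a) != len(b):
        raise ValueError("curves must have the same number of frequency bins")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_curve(samples):
    """Average several solved samples of one word into a general-purpose curve."""
    count = len(samples)
    return [sum(bin_values) / count for bin_values in zip(*samples)]

def closest_word(unknown, templates):
    """Return (word, distance) for the known curve nearest to the unknown curve."""
    return min(
        ((word, euclidean_distance(unknown, curve)) for word, curve in templates.items()),
        key=lambda pair: pair[1],
    )

# Hypothetical toy data: 4 frequency bins per sample instead of 2048 inputs.
solved = {
    "red": [[0.9, 0.1, 0.2, 0.0], [0.7, 0.3, 0.2, 0.2]],
    "blue": [[0.1, 0.8, 0.1, 0.4], [0.1, 1.0, 0.3, 0.4]],
}
templates = {word: average_curve(samples) for word, samples in solved.items()}

# An unknown sample that resembles "red" without matching any solved sample exactly.
word, distance = closest_word([0.75, 0.25, 0.2, 0.1], templates)
```

Averaging several solved samples is one simple way to get the looser, more general curve that Figure 5 describes, as opposed to matching a single recording exactly.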
What they did, essentially, was measure the distance from the x’s to the known good (solved) word and assign that a certainty. A given sample could have a 1% certainty of being boat but a 97% certainty of being kettle, so the guess would be kettle. One of the challenges was that reCAPTCHA used three simultaneous, intertwined instances of background noise. Like humans, dc949 simply ignored the background noise, and it worked!
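One way to turn distances into certainties is sketched below. The article does not say which formula dc949 used, so the exponential weighting, the `sharpness` parameter, the function names, and the sample distances are all assumptions for illustration only.

```python
import math

def certainties(distances, sharpness=1.0):
    """Map {word: distance} to {word: certainty}; smaller distance, higher certainty.

    Assumed formula: weight = exp(-sharpness * distance), normalized to sum to 1.
    """
    weights = {word: math.exp(-sharpness * d) for word, d in distances.items()}
    total = sum(weights.values())
    return {word: weight / total for word, weight in weights.items()}

def best_guess(distances, sharpness=1.0):
    """Return the most certain word together with its certainty."""
    scores = certainties(distances, sharpness)
    word = max(scores, key=scores.get)
    return word, scores[word]

# Hypothetical distances from one unknown sample to three known words.
distances = {"boat": 5.0, "kettle": 0.5, "red": 4.0}
guess, certainty = best_guess(distances)
```

With these made-up distances the sample sits far closer to kettle than to boat, so kettle receives almost all of the certainty and becomes the guess.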