I could be wrong, but I don't think you need to do that. There is an easier approach if you think outside the box. You don't want to turn what you've got into perfect letters. No, what you need to do is create a new alphabet and recognize what you've got, that is, decide which letter in your new alphabet best matches your unknown/test/mystery letter. So you don't need a perfect binary mask of the letter D, for example. If you can assume that the lighting is the same for all images (illuminated at a glancing angle from the lower right), then you can define a D as that gray scale pattern. It doesn't matter what the pattern is; it only matters that you have defined it as a D. Whenever the program sees that same pattern of bright, dark, and gray pixels, it will say it's a D.

So you make up a library of all the letters and numbers with those actual patterns, and associate each pattern with the letter that produces it. For example, you cut out the bounding box of that shadow-cast letter D and call it "D.png". Do the same for all the other letters and numbers. OK, now you have your library.
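Here is a rough sketch of the "cut out the bounding box" step in Python, in case it helps. It is only an illustration under some assumptions of mine: the image is a plain list of rows of gray levels, the background has a known, roughly uniform gray level, and the function name, the tolerance, and the tiny made-up scene are all hypothetical. A dictionary stands in for the "D.png" files.

```python
def crop_to_bounding_box(image, background, tolerance):
    """Return the smallest rectangle holding every pixel whose gray level
    differs from the background by more than the tolerance."""
    rows = [r for r, row in enumerate(image)
            if any(abs(v - background) > tolerance for v in row)]
    cols = [c for c in range(len(image[0]))
            if any(abs(row[c] - background) > tolerance for row in image)]
    if not rows or not cols:
        return []  # nothing stands out from the background
    return [row[cols[0]:cols[-1] + 1] for row in image[rows[0]:rows[-1] + 1]]

# Made-up scene: background 120, bright highlight 230, dark shadow 30.
scene = [
    [120, 120, 120, 120, 120],
    [120, 230, 120,  30, 120],
    [120, 230, 230,  30, 120],
    [120, 120, 120, 120, 120],
]

# The library maps each letter to its cropped gray-scale pattern.
library = {"D": crop_to_bounding_box(scene, background=120, tolerance=20)}
```

Note that the crop keeps the highlights and the shadows together, since the whole gray scale pattern is what defines the letter.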
Next, compute the Hu moments of each letter image in your library. Then isolate a blob that represents a letter, by any reasonable technique, and compute its Hu moments. Finally, compare that letter's Hu moments to the Hu moments of each letter in your library and see which library letter is the closest match to your unknown letter.
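The matching step could look something like the Python sketch below. Treat it as a minimal illustration, not a definitive implementation: the function names and the small "I" and "L" patterns are made up, the images are plain lists of rows with a background of 0, and using a plain Euclidean distance between the seven Hu values is my own choice of "closest". In practice a library routine would likely be used for the moments (OpenCV, for instance, has cv2.moments and cv2.HuMoments).

```python
import math

def hu_moments(image):
    """Seven Hu moments of a gray-scale image (list of rows, not all zero)."""
    pixels = [(x, y, float(v)) for y, row in enumerate(image)
              for x, v in enumerate(row)]
    m00 = sum(v for _, _, v in pixels)
    cx = sum(x * v for x, _, v in pixels) / m00
    cy = sum(y * v for _, y, v in pixels) / m00

    def eta(p, q):
        # Central moment about the centroid, normalized for scale.
        mu = sum((x - cx) ** p * (y - cy) ** q * v for x, y, v in pixels)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]

def best_match(unknown, library):
    """Name of the library letter whose Hu moments are nearest the unknown's."""
    target = hu_moments(unknown)
    return min(library, key=lambda name: math.dist(target, hu_moments(library[name])))

# Made-up patterns: bright stroke (200) with a gray shadow (50).
library = {
    "I": [[0, 200, 50]] * 5,
    "L": [[200, 50, 0]] * 4 + [[200, 200, 200]],
}

# The same "L" pattern, shifted one pixel down and to the right.
unknown = [[0, 0, 0, 0]] + [[0, 200, 50, 0]] * 4 + [[0, 200, 200, 200]]

print(best_match(unknown, library))
```

Because the higher-order Hu values are much smaller than the first two, a log scaling of each value before taking the distance is a common refinement. Also note that with gray scale images the values change if the overall brightness changes, which is one more reason the same-lighting assumption matters.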
Please give it a try - it should not be too difficult.