Re-creating Scrabble, pt. 2

This is more of an update to pt. 1. I decided to improve my Python code. I discovered that yes, you can iterate through the alphabet using “import ascii_uppercase,” thanks to this StackOverflow article. It took me 28 minutes to improve the code and 9 minutes to edit another newspaper, but this time the newspaper’s OCR was much more crisp and I took a few more shortcuts with the editing without sacrificing characters.

#Evening star. (Washington, D.C.), April 05, 1934,
import re
from string import ascii_uppercase
f = open("EveningStar040534-OCR.txt", "r")
teststring=(f.read())
caps = teststring.upper()
regex = re.findall("[A-Z]",caps)
newspaperCharLst = []
for letter in regex:
    newspaperCharLst.append(letter)
for letter in ascii_uppercase:
    print(letter,": ",newspaperCharLst.count(letter))
# 28 minutes
# 9 minutes to edit OCR

I was pleased to find that my code worked about the same either way, and I only had one major discrepancy between the two newspapers at the letter K, which was pretty significant, but a quick scan of the Evening Star didn’t reveal any glaring mistakes, so I’m going to let it go:

The new data from the Evening Star didn’t substantially change the chart relative to points and distribution:

That jagged line in the middle of our letters still bothers me. I tried out using standard deviation to see if a Bell curve helped me discover why the points and distribution creates invert one another, but that math did not work out… or, I just don’t know when or how to apply standard deviation. Very possible. I never took a stats class. It was fun to learn how to do it, though.

https://chroniclingamerica.loc.gov/lccn/sn83045462/1934-04-05/ed-1/seq-1/

Leave a comment