Hi! Great post, very interesting stuff. I was wondering whether you think training a model like the one suggested here (which would essentially let you compute the probability of a word appearing next given the preceding context) would help you classify a word as the beginning of a sentence, presumably to improve performance in the “no punctuation” case?