Language Detection

Who is talking about the French Open?

I don’t think rOpenSci’s Jeroen Ooms can ever top the coolness of his magick package, but I have to admit the other things he’s developed are not bad at all. He’s recently been working on interfaces to Google’s Compact Language Detectors 2 and 3 (the latter being more experimental). I saw this cool use case and started thinking about other possible applications of the packages.

I was very sad when I realized it was too late to try and download tweets about the Eurovision Song Contest, but then I remembered there’s this famous tennis tournament going on right now, about which people probably tweet in various languages. I don’t follow the French Open myself, but it seemed interesting to find out which languages were the most prevalent, whether the results from the cld2 and cld3 packages agree with each other, and whether they agree with the language detection results from Twitter itself.