It’s October, time for spooky Twitter names! If you’re on this social
media platform, you might have noticed some of your friends switching
their names to something spooky and punny. Last year I was “Maelstrom
Salmon”, which I find scary but is arguably not that funny. Anyhow, what
if you want to switch your name but have no inspiration? In this post,
we shall explore how R can help with that, using
webscraping, phonetic spelling and string distance algorithms, and the
magic of randomness!
What can a kaka, a kakapo, a European rabbit and a grey heron have in
common? Well, they might co-habit in the bookshelf of an R user, since
they’re all animals on the covers of popular R books: “R
Packages”, “R for Data
Science”, “Text mining with
R” and “Efficient R
programming”, respectively.
Their publisher, O’Reilly, has built its brand on covers featuring
beautiful engravings of animals.
Recently, while wondering what the name of the R for Data Science bird was
again (I thought it was a kea!), I was thrilled to find the whole
O’Reilly menagerie, i.e. a list of
books and corresponding animals! The website also features a link to “A
short history of the O’Reilly
animals”
that was an amazing read. It notes that “The animals are in
trouble”, citing a few examples of endangered species. That inspired me to
try and assess the conservation status of O’Reilly animals
using responsible webscraping, taxonomic name resolution and IUCN Red List
API querying…
Until recently I was subscribed to an email list, ALLSTAT, “A UK-based worldwide e-mail broadcast system for the statistical community, operated by ICSE for HEA Statistics.” created in 1998. That’s how I saw the ad for my previous job in Barcelona! Now, I dislike emails more and more so I unsubscribed, but I’d still check out the archives any time I need a job, since many messages are related to openings. Nowadays, I probably identify more as a research software engineer or data scientist than a statistician… which made me wonder, when did ALLSTAT start featuring data scientist jobs? How does their frequency compare to that of statistician jobs?
In this post, I’ll webscrape and analyse metadata of ALLSTAT emails. It’ll also be an occasion for me to take the wonderful new polite
package, which helps with respectful webscraping, for a ride!
It’s nearly been two years since I defended my PhD thesis! On top of allowing me to call myself doctor, having a PhD in statistics gives me the honour to feature in the data of the Mathematics Genealogy Project. Today, I decided to webscrape my mathematical ancestors.
I couldn’t miss the fun Twitter hashtag #BadStockPhotosOfMyJob thanks to a tweet by Julia Silge and another one by Colin Fay. The latter inspired me to actually go and look for what makes a data science photo… What characterizes “data science” stock photos?
My husband and I recently started watching the wonderful series “Parks and Recreation”, which was recommended to me by my fellow R-Lady Jennifer Thompson in this very convincing thread. The series was even endorsed by other R-Ladies. Jennifer told me the first two seasons are not as good as the following ones, but that it was worth it to make it through them. We actually started enjoying the humor and characters right away!
Then, this week while watching the show, one of the characters did a very basic text analysis that made me feel like imitating him for a blog post – my husband told me it was very Leslie of me to plan something while doing something else, which made me very proud. I tested my idea on other Leslie fans, and they seemed to think it was a great idea… and that this post should be the beginning of a series of R-Ladies blog posts about Parks and Recreation!
In this two-short-part blog post, I’ll therefore inaugurate this series, what an honor!
I’ve recently been binge-reading The Guardian Experience columns. I’m a big fan of The Guardian life and style section regulars: the blind dates to which I dedicated a blog post, Oliver Burkeman’s This column will change your life, etc. Experience is another regular that I enjoy a lot. In each column, someone recounts something remarkable that happened to them. It can really be anything.
I was thinking of maybe scraping the titles to get a sense of the most common topics. The final push was my husband telling me about this article by
Gabriella Paiella about the best Guardian Experience columns. She wrote “the “Experience” column does often touch on heavier topics”. Can one find out the most prevalent “weight” of Experience columns by scraping all their titles?