It’s October, time for spooky Twitter names! If you’re on this social
media platform, you might have noticed some of your friends switching
their names to something spooky and punny. Last year I was “Maelstrom
Salmon”, which I find scary but is arguably not that funny. Anyhow, what
if you want to switch your name but have no inspiration? In this post,
we shall explore how R can help us with that, using webscraping,
phonetic spelling and string distance algorithms, and the magic of
randomness!
What can a kaka, a kakapo, a European rabbit and a grey heron have in
common? Well, they might co-habit in the bookshelf of an R user, since
they’re all animals on the covers of popular R books: “R
Packages”, “R for Data
Science”, “Text mining with
R” and “Efficient R
Their publisher, O’Reilly, has now based its brand on covers featuring
beautiful engravings of animals.
Recently, while wondering what the name of the R for Data Science bird was
again (I thought it was a kea!), I was thrilled to find the whole
O’Reilly menagerie, i.e. a list of
books and corresponding animals! The website also features a link to “A
short history of the O’Reilly
that was an amazing read. In it was noted that “The animals are in
trouble.”, with a few examples of endangered species. It inspired me to
actually try and assess the conservation status of O’Reilly animals
using responsible webscraping, taxonomic name resolving and IUCN Redlist
Until recently I was subscribed to an email list, ALLSTAT, “A UK-based worldwide e-mail broadcast system for the statistical community, operated by ICSE for HEA Statistics.”, created in 1998. That’s how I saw the ad for my previous job in Barcelona! Now, I dislike emails more and more so I unsubscribed, but I’d still check out the archives any time I need a job, since many messages are related to openings. Nowadays, I probably identify more as a research software engineer or data scientist than a statistician… which made me wonder, when did ALLSTAT start featuring data scientist jobs? How does their frequency compare to that of statistician jobs?
In this post, I’ll webscrape and analyse meta-data of ALLSTAT emails. It’ll also be the occasion for me to take the wonderful new polite package, which helps with respectful webscraping, for a ride!
It’s nearly been two years since I defended my PhD thesis! On top of allowing me to call myself doctor, having a PhD in statistics gives me the honour to feature in the data of the Mathematics Genealogy Project. Today, I decided to webscrape my mathematical ancestors.
I’ve now done a few collages from R using magick: the faces of #rstats Twitter, We R-Ladies with Lucy D’Agostino McGowan, and a holiday card for R-Ladies. The faces of #rstats Twitter and holiday card collages were arranged at random, while the We R-Ladies one was a mosaic forming the R-Ladies logo. I got the idea to up my collage skills by trying to learn how to arrange pics by their main colour, like a rainbow. The verb rainbow doesn’t exist, and “rainbowing” doesn’t mean ordering by colour, but I didn’t let this stop me.
It was the occasion to grab some useful knowledge about colours, not useless for someone who did not even know about Pantone’s Colors of the Year a few weeks ago…
This post has nothing to do with Kesha’s new album. However, you can listen to it while reading since it’s so good, but maybe switch to something older from her when I use “$”.
My husband and I recently started watching the wonderful series “Parks and recreation”, which was recommended to me by my fellow R-Lady Jennifer Thompson in this very convincing thread. The series was even endorsed by other R-Ladies. Jennifer told me the first two seasons are not as good as the following ones, but that it was worth it to make it through them. We actually started enjoying the humor and characters right away!
Then, this week while watching the show, one of the characters did a very basic text analysis that made me feel like imitating him for a blog post – my husband told me it was very Leslie of me to plan something while doing something else which made me very proud. I tested my idea on other Leslie fans, and they seemed to think it was a great idea… and that this post should be the beginning of a series of R-Ladies blog posts about Parks and recreation!
In this two-short-part blog post, I’ll therefore inaugurate this series, what an honor!
It was recently brought to my attention by Hannah Frick that there are now sooo many R-Ladies chapters around the world! R-Ladies is a world-wide organization to promote gender diversity in the R community, and I’m very grateful to be part of this community through which I met so many awesome ladies! Since we’re all connected, it has now happened quite a few times that R-Ladies gave talks at chapters outside of their hometowns. An R-Lady from Taiwan giving a talk in Madrid while on a trip in Europe and another one doing the same in Lisbon, an R-Lady from San Francisco presenting at the London and Barcelona chapters thanks to a conference on the continent, an R-Lady from Uruguay sharing her experience for the New York City and San Francisco chapters… It’s like rock stars on tour!
Therefore we R-Ladies often joke about doing an exhaustive global tour. Hannah made me think about this tour again… If someone were to really visit all of the chapters, what would be the shortest itinerary? And could we do a cool gif with the results? These are the problems we solve here.
I’ve recently been binge-reading The Guardian Experience columns. I’m a big fan of The Guardian life and style section regulars: the blind dates to which I dedicated a blog post, Oliver Burkeman’s This column will change your life, etc. Experience is another regular that I enjoy a lot. In each column, someone recounts something remarkable that happened to them. It can really be anything.
I was thinking of maybe scraping the titles to get a sense of the most common topics. The final push was my husband telling me about this article by Gabriella Paiella about the best Guardian Experience columns. She wrote “the “Experience” column does often touch on heavier topics”. Can one find out the most prevalent “weight” of Experience columns by scraping all their titles?
I am a runner but also a Body Pump enthusiast. Body Pump is a group fitness class of the Les Mills company, in which you train different muscle groups using a weighted bar – whose total weight you modulate with plates in order to adapt it to your fitness level and to the muscle group. Like R, Body Pump was created in New Zealand, what a wonderful country! Every three months, a new class is released, with new songs and choreographies. What doesn’t change is the muscle group trained in each of the 10 songs of each class.
I’ve thought of analysing Body Pump data for a long time now but could never find what I was looking for, which was a dataset of the number of “reps” by song, e.g. how many squats you do in each squats song. Then I realized I could also play with other data, like a quite comprehensive list of songs used in releases. I decided to cross this information with information about the style of the corresponding artist on Spotify. Here is what I came up with!
Last week I published a post about scraping the Radio Swiss Classic program. After that, Bob Rudis wrote an extremely useful post improving my code a lot and teaching me cool stuff. I don’t know why I forgot to add pauses between requests… Really bad behaviour! I will use his code today for re-scraping the data.
Why re-scrape the data? I mentioned broken links in my post. In fact, each time I hit a broken page, the Radio Swiss Classic webmaster received an email. That person received a lot of emails because of me. They fixed the bug behind these broken pages and contacted me because someone had turned me in (I feel super famous or spied on now), very kindly mentioning they had fixed all pages, and not holding any grudge against me. So let’s scrape everything again!
I am not a classical music expert at all, but I happen to have friends who are, and am even married to someone who plays the cello (and the ukulele!). I appreciate listening to such music from time to time, in particular Baroque music. A friend made me discover Radio Swiss Classic, an online radio playing classical music all day and all night long, with a quite nice variety, very little speaking between pieces, and no ads (thank you, funders of the radio!). Besides, the voices telling me which piece has just been played are really soothing, so Radio Swiss Classic is a good one in my opinion.
Today, instead of anxiously waiting for the results of the French presidential elections, I decided to download the radio’s program for the last few years and have a quick look at it, since after all, the website says that the radio aims at relaxing people.
One of my more or less guilty pleasures is reading The Guardian blind date each week. I think I started doing this when living in Cambridge, England for five months. I would buy i every weekday and The Guardian week-end every week-end. I wasn’t even dating at the time I discovered The Guardian blind dates but I’ve always liked their format.
I get so much into each date report that seeing both participants say they want to meet again makes me ridiculously happy. I like wondering how matches were made, but today I just want to look into the contents of post-date interviews.
I like Wikipedia. My husband likes it even more: he included it in his PhD thesis acknowledgements! I appreciate the efforts made to share knowledge, and also the apparently random stuff you can find on the website. In particular, I’ve been intrigued by the monthly lists of notable deaths such as this one. Who are the people (or dogs, yes, dogs) whose lives were deemed notable enough to be listed there? Also, using the numbers of such deaths, can I judge whether 2016 was really worse than previous years? The first step in answering these questions was to scrape the data. I’ll describe the process in this post. In another post I’ll have a look at my study population and in a third post I’ll analyse the time series of death counts.
In this post, I’m sharing a brand-new analysis! The reason for this is my blog being added to R-bloggers by Tal Galili after I filled this form. R-bloggers is a collection of blogs about R, whose new posts get added to the website via the magic of RSS feeds. R-bloggers even has a Twitter account. As a reader of R-bloggers you get exposed to many different analyses and ideas, and as an R-blogger you reach a wider audience, so really it’s a useful website. Tal does a great job maintaining R-bloggers and understandably likes seeing R-bloggers mentioning the website on their blog, which I already do in the About section, and in one article, which I’ve consistently failed to do in the last two posts because I got too caught up in the article at hand to think about anything else. So I figured the best way not to forget to thank Tal for his work was to do an analysis about R-bloggers! Genius, I know. I’ve scraped the full list of contributing blogs and had a look at their names and addresses.