You might have heard of the Advent of Code,
a 25-day challenge involving a programming puzzle a day, to be solved
with the language of your choice. I’ve noted the popularity of this
activity in my Twitter timeline but also in my GitHub timeline where
I’ve seen the creation of a few advent-of-code or so repositories.
AoC is largely an exercise in figuring how to write your favourite language as if were C or C++ 😁, which can be fun ... in moderation
If I were to participate one year, I’d probably use R. Jenny Bryan’s
tweet above inspired me to try and gauge the popularity of languages
used in the Advent of Code. To do that, in this post, I shall use the
search endpoint of GitHub V3 API to identify Advent of Code 2018 repos.
Remember my blog post about automatic tools for improving R packages? One of these tools is Jim Hester’s lintr, a package that performs static code analysis. In my experience it mostly helps identifying too long code lines and missing space, although it’s a bit more involved than that. In any case, lintr helps you maintain good code style, and as mentioned in that now old post of mine, you can add a lintr unit test to your package which will ensure you don’t get lazy over time.
Now say your package has a lintr unit test and lives on GitHub. What happens if someone makes a pull request and writes looong code lines? Continuous integration builds will fail but not only that… The contributor will get to know Lintr Bot, lintr’s Hester (Easter) egg!
Recently I needed to count lines of code for a project at work work (this is an expression of the person honored in this post), and happened to discover that Bob Rudis had started an R package wrapping the Perl CLOC script. Of course! He has packages for a lot of things! And he’s always ready to help: after I asked him a question about the package, and made a pull request to renew its wrapped CLOC script, he made it all pretty and ready-to-go!
He himself defined his Stack Overflow Driven-Development (SODD) workflow in a blog post: someone will ask him a question on Stack Overflow, and he’ll write a long answer eventually becoming a package, that will or will not make it to CRAN… Which is the motivation of this blog post. How can I output a list of all packages Bob has on GitHub?
On my pretty and up-to-date CV, one of the first things one sees is my Github username, linking to my Github profile. What does a potential employer look at there? Hopefully not my non informative commit messages… My imitating a red Ampelmann, my being part of several organizations, my pinned repositories described with emojis… But how would they know where&how I’ve mostly been active without too much effort?
A considerable part of my Github work happens in organizations: I’m a co-editor at rOpenSci onboarding of packages, I contribute content to the R Weekly newsletter, etc. Although my profile shows the organizations I belong to, one would need to dig into them for a while before seeing how much or how little I’ve done. Which is fine most of the time but less so when trying to profile myself for jobs, right? Let’s try and fetch some Github data to create a custom profile.
Note: yep I’m looking for a job and ResearchGate’s suggestions are not helpful! Do you need an enthusiastic remote data scientist or research software engineer for your team? I’m available up to 24 hours a week! I care a lot about science, health, open source and community. Ideally I’d like to keep working in something close to public research but we can talk!
Do you know Lucy? She is a very talented biostatistics PhD candidate that I had the chance to e-meet thanks to R-Ladies. One maybe superficial reason to admire her, on top of her other achievements, is her emoji game in git commits. Looking at Lucy’s git history (find her on Github), one wants to start using version control because she makes it look fun!
In this post, I will download many git commit messages of Lucy’s from Github’s API via the gh package, and have a look at the emojis she uses the most frequently.
When you do simulations, for instance in R, e.g. drawing samples from a distribution, it’s best to set a random seed via the function set.seed in order to have reproducible results. The function has no default value. I think I mostly use set.seed(1). Last week I received an R script from a colleague in which he used a weird number in set.seed (maybe a phone number? or maybe he let his fingers type randomly?), which made me curious about the usual seed values. As in my blog post about initial commit messages I used the Github API via the gh package to get a very rough answer (an answer seedling from the question seed?).
When I create a new .git repository, my first commit message tends to be “1st commit”. I’ve been wondering what other people use as initial commit message. Today I used the gh package to get first commits of all repositories of the ropensci and ropenscilabs organizations.