git and GitHub in R for the casual user
If you’ve been taught git and GitHub but practice so rarely that you’re discouraged, what should you do to restart more easily? Let’s imagine you have to, or really want to, use git and GitHub for your next analysis project. Here’s what I would recommend… I assume you already have a GitHub account. If not, refer to the happygitwithr guidance. Thanks to the people who shared recommendations on Mastodon, whose names are acknowledged in the rest of the post!
8 (octo!) GitHub Tips
I’m spending quite a lot of my working time on GitHub, so I have picked up some habits. Maybe some of them can be useful to you!
1: How to get started
I’ve never actually taught git and GitHub, but I like sharing these useful links: Happy Git and GitHub for the useR by Jenny Bryan, the STAT 545 TAs and Jim Hester. It includes a big picture section “Why Git? Why GitHub?”
Advent of Code: Most Popular Languages
You might have heard of the Advent of Code, a 25-day challenge involving one programming puzzle a day, to be solved in the language of your choice. I’ve noticed the popularity of this activity in my Twitter timeline, but also in my GitHub timeline, where I’ve seen the creation of a few advent-of-code (or similarly named) repositories.
“AoC is largely an exercise in figuring how to write your favourite language as if were C or C++ 😁, which can be fun ... in moderation”
Jenny Bryan (@JennyBryan), December 12, 2018
If I were to participate one year, I’d probably use R. Jenny Bryan’s tweet above inspired me to try and gauge the popularity of the languages used in the Advent of Code. To do that, in this post, I shall use the search endpoint of the GitHub V3 API to identify Advent of Code 2018 repositories.
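To give an idea of the counting step, here is a minimal R sketch. It assumes the repository records have already been retrieved from the search endpoint (for instance with the gh package) and that each record carries a language field, which can be empty. The repos list is hypothetical toy data, not actual search results.

```r
# Hypothetical toy data standing in for repository records returned by
# the GitHub search endpoint; a repository's language can be NULL.
repos <- list(
  list(full_name = "user1/advent-of-code-2018", language = "Python"),
  list(full_name = "user2/aoc2018", language = "R"),
  list(full_name = "user3/advent-of-code-2018", language = "Python"),
  list(full_name = "user4/aoc-2018", language = NULL)
)

# Extract each repository's language, mapping a missing one to NA
languages <- vapply(
  repos,
  function(repo) if (is.null(repo$language)) NA_character_ else repo$language,
  character(1)
)

# Count repositories per language (keeping NA), most popular first
language_counts <- sort(table(languages, useNA = "ifany"), decreasing = TRUE)
print(language_counts)
```

Mapping a missing language to NA rather than dropping it keeps repositories without a detected language visible in the tally.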
Lintr Bot, lintr's Hester egg
Remember my blog post about automatic tools for improving R packages? One of these tools is Jim Hester’s lintr, a package that performs static code analysis. In my experience it mostly helps identify overly long code lines and missing spaces, although it does more than that. In any case, lintr helps you maintain a good code style and, as mentioned in that now old post of mine, you can add a lintr unit test to your package, which will ensure you don’t get lazy over time.
Now say your package has a lintr unit test and lives on GitHub. What happens if someone makes a pull request and writes looong code lines? Continuous integration builds will fail, but not only that: the contributor will get to know Lintr Bot, lintr’s Hester (Easter) egg!
hrbrpkgs: list Bob Rudis' packages
Recently I needed to count lines of code for a project at work work (an expression used by the person honored in this post), and happened to discover that Bob Rudis had started an R package wrapping the Perl CLOC script. Of course! He has packages for a lot of things! And he’s always ready to help: after I asked him a question about the package and made a pull request to update its bundled CLOC script, he made it all pretty and ready to go!
He has described his Stack Overflow Driven-Development (SODD) workflow in a blog post: someone asks him a question on Stack Overflow, and he writes a long answer that eventually becomes a package, which may or may not make it to CRAN… Hence this blog post: how can I output a list of all the packages Bob has on GitHub?
Where have you been? Getting my GitHub activity
On my pretty and up-to-date CV, one of the first things one sees is my GitHub username, linking to my GitHub profile. What does a potential employer look at there? Hopefully not my uninformative commit messages… My imitation of a red Ampelmann, my membership in several organizations, my pinned repositories described with emojis… But how would they know where and how I’ve mostly been active without too much effort?
A considerable part of my GitHub work happens in organizations: I’m a co-editor for rOpenSci’s package onboarding, I contribute content to the R Weekly newsletter, etc. Although my profile shows the organizations I belong to, one would need to dig into them for a while before seeing how much or how little I’ve done. That is fine most of the time, but less so when I’m trying to present myself to potential employers, right? Let’s try and fetch some GitHub data to create a custom profile.
Note: yep I’m looking for a job and ResearchGate’s suggestions are not helpful! Do you need an enthusiastic remote data scientist or research software engineer for your team? I’m available up to 24 hours a week! I care a lot about science, health, open source and community. Ideally I’d like to keep working in something close to public research but we can talk!
A tribute to Lucy D'Agostino McGowan's git commit emoji game
Do you know Lucy? She is a very talented biostatistics PhD candidate whom I had the chance to e-meet thanks to R-Ladies. One maybe superficial reason to admire her, on top of her other achievements, is her emoji game in git commits. Looking at Lucy’s git history (find her on GitHub), one wants to start using version control because she makes it look fun!
In this post, I will download many of Lucy’s git commit messages from GitHub’s API via the gh package and look at the emojis she uses most frequently.
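A minimal sketch of the counting part, assuming the emojis appear in commit messages as GitHub shortcodes such as :tada: (Unicode emoji characters would need a different pattern). The messages vector is made up for illustration and is not Lucy’s actual history.

```r
# Hypothetical commit messages, for illustration only
messages <- c(
  ":tada: first commit",
  "fix typo :pencil2:",
  ":sparkles: add new plot :tada:",
  "update README"
)

# Extract every shortcode of the form :name: from each message
shortcodes <- unlist(regmatches(messages, gregexpr(":[a-z0-9_+-]+:", messages)))

# Tabulate shortcodes and sort them by frequency
emoji_counts <- sort(table(shortcodes), decreasing = TRUE)
print(emoji_counts)
```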
Sow the seeds, know the seeds
When you run simulations, for instance drawing samples from a distribution in R, it’s best to set a random seed via the set.seed function in order to get reproducible results. The function has no default value. I think I mostly use set.seed(1). Last week I received an R script from a colleague in which he used a weird number in set.seed (maybe a phone number? or maybe he let his fingers type randomly?), which made me curious about the usual seed values. As in my blog post about initial commit messages, I used the GitHub API via the gh package to get a very rough answer (an answer seedling from the question seed?).
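Two small R sketches. The first shows why a fixed seed makes draws reproducible. The second tallies seed values found in code snippets, assuming the snippets are already available as character strings; the code_lines vector is hypothetical and not data retrieved from GitHub.

```r
# Same seed, same draws: the basis of reproducible simulations
set.seed(1)
first_draw <- rnorm(3)
set.seed(1)
second_draw <- rnorm(3)
identical(first_draw, second_draw)  # TRUE

# Without resetting the seed, the stream continues and the draws differ
third_draw <- rnorm(3)
identical(first_draw, third_draw)   # FALSE

# Hypothetical code snippets in which to look for seed values
code_lines <- c("set.seed(1)", "set.seed(42)", "set.seed( 1 )",
                "x <- rnorm(10)", "set.seed(20181210)")

# Capture the integer passed to set.seed(), tolerating spaces
pattern <- "set\\.seed\\([[:space:]]*([0-9]+)[[:space:]]*\\)"
matched <- regmatches(code_lines, regexec(pattern, code_lines))

# Keep lines that matched and take the captured group (2nd element)
seeds <- vapply(Filter(length, matched), `[`, character(1), 2)
seed_counts <- sort(table(seeds), decreasing = TRUE)
print(seed_counts)
```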
First commit or initial commit?
When I create a new .git repository, my first commit message tends to be “1st commit”. I’ve been wondering what other people use as their initial commit message. Today I used the gh package to get the first commits of all repositories of the ropensci and ropenscilabs organizations.
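A minimal sketch of the “first commit” logic, assuming the commits of each repository have already been fetched into a data frame; the commits data frame below is hypothetical toy data. Ordering by date explicitly, then keeping the earliest commit per repository, avoids relying on the order in which an API returns commits.

```r
# Hypothetical commits from three repositories, for illustration only
commits <- data.frame(
  repo = c("pkgA", "pkgA", "pkgB", "pkgB", "pkgC"),
  date = as.POSIXct(c("2017-05-02 10:00:00", "2017-05-01 09:00:00",
                      "2018-01-10 12:00:00", "2018-01-09 08:30:00",
                      "2016-11-20 15:45:00"), tz = "UTC"),
  message = c("add tests", "Initial commit", "fix bug",
              "first commit", "initial commit"),
  stringsAsFactors = FALSE
)

# Sort by repository, then by date, and keep the earliest commit per repo
commits <- commits[order(commits$repo, commits$date), ]
first_commits <- commits[!duplicated(commits$repo), ]

# Normalize case and surrounding whitespace before counting messages
first_messages <- trimws(tolower(first_commits$message))
message_counts <- sort(table(first_messages), decreasing = TRUE)
print(message_counts)
```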