Edit: now there is an R package for server-side math rendering, katex by Jeroen Ooms.
Whilst I most certainly do not write LaTeX formulas on the regular anymore, I got curious about their MathJax rendering on websites.[1] In brief: your website source contains LaTeX code, and the MathJax JS library (self-hosted or hosted on a CDN) transforms it into something humans can understand: some HTML with inline CSS but also some MathML for screen-reader users.[2] As I quite like the idea of moving things from the client-side to the server-side, I started wondering whether the processing of LaTeX code could happen before a user opens a website. Searching for “server-side MathJax rendering” on the web gave a few hits. A few hits only, sure, which shows how niche the topic is, and meant reading resources was not too overwhelming. 😁 In this post I am reporting on my findings.
Why render MathJax on the server-side
Let me be honest, there is no extremely strong reason for me. Sure, loading MathJax from a CDN might be a privacy problem (as the CDN measures traffic) or make your website potentially fragile if the CDN goes down (remember the Fastly outage last week). Now you could still self-host MathJax to mitigate that (and to know exactly who to blame in case of an outage, i.e. yourself 😉). Not having MathJax might mean the page takes less time to load… if having the MathJax-rendered HTML weren’t making the page bigger than the same page with LaTeX code instead. But hey, I still think it’s an interesting problem.
Something I realized whilst working on this post is that one probably does not want to remove all MathJax JS from the browser, as it includes accessibility options! You need some JS in order to have a menu etc. That’s not something I’ve tackled here: I think that to extract the right MathJax JS, keeping only the components for the menu but not those for processing input, one would need to understand MathJax much better than I do.
How to render MathJax on the server-side
The recipe would be:
- extract LaTeX code. E.g. use XPath with xml2 for HTML; or regular expressions if that’s your jam.
- transform the LaTeX string into HTML somehow.
- replace the LaTeX code with the HTML.
The first and third steps are close to e.g. the HTML tweaking pkgdown does. They might well entail (un)escaping problems.
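As a rough illustration of the first and third steps with regular expressions, here is a minimal base R sketch. It assumes formulas are delimited by `\( ... \)` (inline) and `$$ ... $$` (display) and never span several lines; `fake_render()` is a hypothetical placeholder for the real LaTeX-to-HTML step, not an existing function.

```r
# Toy page source with one inline and one display formula.
src <- "<p>Inline \\(a^2 + b^2 = c^2\\) and display $$x = {-b \\pm \\sqrt{b^2-4ac} \\over 2a}.$$</p>"

# Step 1: locate the formulas (assumed delimiters, single-line formulas only).
inline_re  <- "\\\\\\(.*?\\\\\\)"
display_re <- "\\$\\$.*?\\$\\$"
inline  <- regmatches(src, gregexpr(inline_re,  src, perl = TRUE))[[1]]
display <- regmatches(src, gregexpr(display_re, src, perl = TRUE))[[1]]

# Step 2 (placeholder): a hypothetical renderer standing in for MathJax.
fake_render <- function(tex, class) {
  sprintf('<span class="%s">[rendered: %d characters of TeX]</span>', class, nchar(tex))
}

# Step 3: swap each formula for its rendered HTML.
out <- src
regmatches(out, gregexpr(display_re, out, perl = TRUE)) <-
  list(fake_render(display, "display-math"))
regmatches(out, gregexpr(inline_re, out, perl = TRUE)) <-
  list(fake_render(inline, "inline-math"))
out
```

Keeping inline and display matches separate matters because the two kinds of math end up in differently structured HTML once rendered.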
Now the second step, how to transform LaTeX code into HTML? The best summary of the state-of-the-art I found is the blog post Math Rendering is Wrong.
Solutions I saw are:
- a Node module called mathjax-node-cli. I know there are ways to use JS from R but I got lazy once I saw I had to update my Node installation or whatever the error message was.
- a way to run a MathJax API, which I have not tried.
- rendering math on a browser and extracting the result.
I will now focus on the last solution as it works well from R these days.
MathJax, but not in the viewer’s browser!
Let’s take a minimal HTML with MathJax loaded from a CDN and an empty paragraph with class “mathp”:
html <- xml2::read_html('<!DOCTYPE html>
<html>
<head>
<title>MathJax</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1">
<script type="text/javascript" src="https://mathjax.rstudio.com/2.7.2/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
</head>
<body>
<p class="mathp">
</p>
</body>
</html>
')
Now let’s add math to it. Note that a real solution would have to differentiate inline and block math, whose resulting HTML sits at different XPaths.
mathp <- xml2::xml_find_first(html, "//p[@class='mathp']")
xml2::xml_text(mathp) <- r"($$x = {-b \pm \sqrt{b^2-4ac} \over 2a}.$$)"
as.character(html)
#> [1] "<!DOCTYPE html>\n<html>\n<head>\n<title>MathJax</title>\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">\n<meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">\n<script type=\"text/javascript\" src=\"https://mathjax.rstudio.com/2.7.2/MathJax.js?config=TeX-AMS-MML_HTMLorMML\"></script>\n</head>\n<body>\n\n<p class=\"mathp\">$$x = {-b \\pm \\sqrt{b^2-4ac} \\over 2a}.$$</p>\n\n</body>\n</html>\n"
file <- withr::local_tempfile(fileext = ".html")
#> Setting deferred event(s) on global environment.
#> * Execute (and clear) with `withr::deferred_run()`.
#> * Clear (without executing) with `withr::deferred_clear()`.
xml2::write_html(html, file)
Now we shall load the file in a browser via chromote[3] and extract the HTML MathJax produced. To find the right selector, I examined the HTML with the DevTools of my browser.
library("chromote")
b <- ChromoteSession$new()
b$Page$navigate(sprintf("file://%s", file))
#> $frameId
#> [1] "7FC1F01C63F746D0D66CC7E2DC40EF87"
#>
#> $loaderId
#> [1] "A3031F97EEF09D8BF745DF8C089F448F"
# Make sure we wait long enough
Sys.sleep(2)
doc <- b$DOM$getDocument()
x <- b$DOM$querySelector(doc$root$nodeId, ".MathJax_Display")
(math_html <- b$DOM$getOuterHTML(x$nodeId))
#> $outerHTML
#> [1] "<div class=\"MathJax_Display\" style=\"text-align: center;\"><span class=\"MathJax\" id=\"MathJax-Element-1-Frame\" tabindex=\"0\" data-mathml=\"<math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><mi>x</mi><mo>=</mo><mrow class="MJX-TeXAtom-ORD"><mfrac><mrow><mo>&#x2212;</mo><mi>b</mi><mo>&#x00B1;</mo><msqrt><msup><mi>b</mi><mn>2</mn></msup><mo>&#x2212;</mo><mn>4</mn><mi>a</mi><mi>c</mi></msqrt></mrow><mrow><mn>2</mn><mi>a</mi></mrow></mfrac></mrow><mo>.</mo></math>\" role=\"presentation\" style=\"text-align: center; position: relative;\"><nobr aria-hidden=\"true\"><span class=\"math\" id=\"MathJax-Span-1\" style=\"width: 10.144em; display: inline-block;\"><span style=\"display: inline-block; position: relative; width: 9.552em; height: 0px; font-size: 106%;\"><span style=\"position: absolute; clip: rect(0.307em, 1009.47em, 3.054em, -1000em); top: -2.182em; left: 0em;\"><span class=\"mrow\" id=\"MathJax-Span-2\"><span class=\"mi\" id=\"MathJax-Span-3\" style=\"font-family: MathJax_Math-italic;\">x</span><span class=\"mo\" id=\"MathJax-Span-4\" style=\"font-family: MathJax_Main; padding-left: 0.278em;\">=</span><span class=\"texatom\" id=\"MathJax-Span-5\" style=\"padding-left: 0.278em;\"><span class=\"mrow\" id=\"MathJax-Span-6\"><span class=\"mfrac\" id=\"MathJax-Span-7\"><span style=\"display: inline-block; position: relative; width: 7.116em; height: 0px; margin-right: 0.12em; margin-left: 0.12em;\"><span style=\"position: absolute; clip: rect(2.879em, 1007em, 4.443em, -1000em); top: -4.753em; left: 50%; margin-left: -3.498em;\"><span class=\"mrow\" id=\"MathJax-Span-8\"><span class=\"mo\" id=\"MathJax-Span-9\" style=\"font-family: MathJax_Main;\">−</span><span class=\"mi\" id=\"MathJax-Span-10\" style=\"font-family: MathJax_Math-italic;\">b</span><span class=\"mo\" id=\"MathJax-Span-11\" style=\"font-family: MathJax_Main; padding-left: 0.222em;\">±</span><span class=\"msqrt\" id=\"MathJax-Span-12\" style=\"padding-left: 0.222em;\"><span 
style=\"display: inline-block; position: relative; width: 4.567em; height: 0px;\"><span style=\"position: absolute; clip: rect(3.073em, 1003.54em, 4.268em, -1000em); top: -4.009em; left: 1em;\"><span class=\"mrow\" id=\"MathJax-Span-13\"><span class=\"msubsup\" id=\"MathJax-Span-14\"><span style=\"display: inline-block; position: relative; width: 0.858em; height: 0px;\"><span style=\"position: absolute; clip: rect(3.139em, 1000.42em, 4.197em, -1000em); top: -4.009em; left: 0em;\"><span class=\"mi\" id=\"MathJax-Span-15\" style=\"font-family: MathJax_Math-italic;\">b</span><span style=\"display: inline-block; width: 0px; height: 4.009em;\"></span></span><span style=\"position: absolute; top: -4.298em; left: 0.429em;\"><span class=\"mn\" id=\"MathJax-Span-16\" style=\"font-size: 70.7%; font-family: MathJax_Main;\">2</span><span style=\"display: inline-block; width: 0px; height: 4.009em;\"></span></span></span></span><span class=\"mo\" id=\"MathJax-Span-17\" style=\"font-family: MathJax_Main; padding-left: 0.222em;\">−</span><span class=\"mn\" id=\"MathJax-Span-18\" style=\"font-family: MathJax_Main; padding-left: 0.222em;\">4</span><span class=\"mi\" id=\"MathJax-Span-19\" style=\"font-family: MathJax_Math-italic;\">a</span><span class=\"mi\" id=\"MathJax-Span-20\" style=\"font-family: MathJax_Math-italic;\">c</span></span><span style=\"display: inline-block; width: 0px; height: 4.009em;\"></span></span><span style=\"position: absolute; clip: rect(3.56em, 1003.57em, 3.958em, -1000em); top: -4.69em; left: 1em;\"><span style=\"display: inline-block; position: relative; width: 3.567em; height: 0px;\"><span style=\"position: absolute; font-family: MathJax_Main; top: -4.009em; left: -0.084em;\">−<span style=\"display: inline-block; width: 0px; height: 4.009em;\"></span></span><span style=\"position: absolute; font-family: MathJax_Main; top: -4.009em; left: 2.873em;\">−<span style=\"display: inline-block; width: 0px; height: 4.009em;\"></span></span><span 
style=\"font-family: MathJax_Main; position: absolute; top: -4.009em; left: 0.388em;\">−<span style=\"display: inline-block; width: 0px; height: 4.009em;\"></span></span><span style=\"font-family: MathJax_Main; position: absolute; top: -4.009em; left: 0.885em;\">−<span style=\"display: inline-block; width: 0px; height: 4.009em;\"></span></span><span style=\"font-family: MathJax_Main; position: absolute; top: -4.009em; left: 1.382em;\">−<span style=\"display: inline-block; width: 0px; height: 4.009em;\"></span></span><span style=\"font-family: MathJax_Main; position: absolute; top: -4.009em; left: 1.879em;\">−<span style=\"display: inline-block; width: 0px; height: 4.009em;\"></span></span><span style=\"font-family: MathJax_Main; position: absolute; top: -4.009em; left: 2.376em;\">−<span style=\"display: inline-block; width: 0px; height: 4.009em;\"></span></span></span><span style=\"display: inline-block; width: 0px; height: 4.009em;\"></span></span><span style=\"position: absolute; clip: rect(2.983em, 1001.02em, 4.536em, -1000em); top: -4.103em; left: 0em;\"><span style=\"font-family: MathJax_Size1;\">√</span><span style=\"display: inline-block; width: 0px; height: 4.009em;\"></span></span></span></span></span><span style=\"display: inline-block; width: 0px; height: 4.009em;\"></span></span><span style=\"position: absolute; clip: rect(3.167em, 1001.01em, 4.196em, -1000em); top: -3.323em; left: 50%; margin-left: -0.514em;\"><span class=\"mrow\" id=\"MathJax-Span-21\"><span class=\"mn\" id=\"MathJax-Span-22\" style=\"font-family: MathJax_Main;\">2</span><span class=\"mi\" id=\"MathJax-Span-23\" style=\"font-family: MathJax_Math-italic;\">a</span></span><span style=\"display: inline-block; width: 0px; height: 4.009em;\"></span></span><span style=\"position: absolute; clip: rect(0.811em, 1007.12em, 1.238em, -1000em); top: -1.281em; left: 0em;\"><span style=\"display: inline-block; overflow: hidden; vertical-align: 0em; border-top: 1.3px solid; width: 7.116em; height: 
0px;\"></span><span style=\"display: inline-block; width: 0px; height: 1.061em;\"></span></span></span></span></span></span><span class=\"mo\" id=\"MathJax-Span-24\" style=\"font-family: MathJax_Main;\">.</span></span><span style=\"display: inline-block; width: 0px; height: 2.182em;\"></span></span></span><span style=\"display: inline-block; overflow: hidden; vertical-align: -0.8em; border-left: 0px solid; width: 0px; height: 2.662em;\"></span></span></nobr><span class=\"MJX_Assistive_MathML MJX_Assistive_MathML_Block\" role=\"presentation\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><mi>x</mi><mo>=</mo><mrow class=\"MJX-TeXAtom-ORD\"><mfrac><mrow><mo>−</mo><mi>b</mi><mo>±</mo><msqrt><msup><mi>b</mi><mn>2</mn></msup><mo>−</mo><mn>4</mn><mi>a</mi><mi>c</mi></msqrt></mrow><mrow><mn>2</mn><mi>a</mi></mrow></mfrac></mrow><mo>.</mo></math></span></span></div>"
The math_html above is the MathJax-rendered math HTML we were after!
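The fixed `Sys.sleep(2)` in the session above is a guess at how long MathJax needs. A possible alternative, sketched here with a hypothetical helper `wait_until()`, is to poll until a condition holds or a timeout is reached. The chromote usage in the trailing comment is an untested assumption about how one might plug the helper in.

```r
# Poll `check()` until it returns TRUE or `timeout` seconds have passed.
# Returns TRUE on success and FALSE on timeout.
wait_until <- function(check, timeout = 10, interval = 0.25) {
  deadline <- Sys.time() + timeout
  repeat {
    if (isTRUE(check())) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# With the chromote session `b`, the check could ask the page whether
# MathJax has added its display container yet (not run here):
# ready <- wait_until(function() {
#   res <- b$Runtime$evaluate(
#     "document.querySelector('.MathJax_Display') !== null"
#   )
#   res$result$value
# })
```

Polling makes the wait as short as the rendering allows, and a `FALSE` return value gives an explicit signal that the math never appeared.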
Now we create a minimal HTML without MathJax and add the math HTML to it.
html <- xml2::read_html('<!DOCTYPE html>
<html>
<head>
<title>MathJax</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body>
<p id="mathp" class="mathp">
</p>
</body>
</html>
')
mathp <- xml2::xml_find_first(html, "//p[@id='mathp']")
xml2::xml_text(mathp) <- math_html$outerHTML
Sadly the step above means the math HTML will have been escaped. One could find an actual fix, but my fix will be to unescape the HTML à la pkgdown i.e. literally using an internal function of pkgdown’s.
# from https://github.com/r-lib/pkgdown/blob/23eb05ceceda1c44573b254dd8b96e92cd91f825/R/html-build.R#L49
unescape_html <- function(x) {
  x <- gsub("&lt;", "<", x)
  x <- gsub("&gt;", ">", x)
  x <- gsub("&amp;", "&", x)
  x
}
html <- xml2::read_html(unescape_html(as.character(html)))
xml2::write_html(html, "example.html")
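One way to sidestep the escaping altogether might be to parse the rendered string as an HTML fragment and attach the resulting node, rather than setting it as text. This is only a sketch: the short `rendered` string is a stand-in for `math_html$outerHTML`, and it has not been tried on the full MathJax output.

```r
library(xml2)

# Stand-in for math_html$outerHTML (the real string is much longer).
rendered <- '<div class="MathJax_Display"><span class="MathJax">x</span></div>'

page <- read_html('<!DOCTYPE html>
<html>
<head><title>MathJax</title></head>
<body>
<p id="mathp" class="mathp"></p>
</body>
</html>')

# Parse the fragment; read_html() wraps it in <html><body>,
# so pick the div back out.
fragment <- read_html(rendered)
math_node <- xml_find_first(fragment, "//div[@class='MathJax_Display']")

# Attach a copy of the node (not its text) to the target paragraph.
mathp <- xml_find_first(page, "//p[@id='mathp']")
xml_add_child(mathp, math_node)

as.character(page)
```

Because the math enters the document as nodes, no `&lt;` or `&gt;` entities are created and nothing needs unescaping afterwards.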
Now we can look at the example and see that it has indeed some math! It’s probably lacking fonts, and the absence of any MathJax JS means that we see the math twice, one of the copies being an element meant for assistive technology.
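To keep the assistive MathML in the page while hiding its visual duplicate, one option could be a small “visually hidden” CSS rule. The sketch below relies on the class name `MJX_Assistive_MathML` seen in the extracted HTML. The CSS itself is a generic pattern and an assumption here, not MathJax’s own stylesheet, and its effect on screen readers has not been verified.

```r
library(xml2)

# Minimal page with a stand-in for the assistive copy of the formula.
page <- read_html('<html><head><title>MathJax</title></head><body>
<p><span class="MJX_Assistive_MathML">x</span></p>
</body></html>')

# Generic "visually hidden" rule (assumed, not copied from MathJax).
css <- ".MJX_Assistive_MathML {
  position: absolute;
  width: 1px;
  height: 1px;
  overflow: hidden;
  clip: rect(1px, 1px, 1px, 1px);
}"

# Add a <style> element holding the rule to the document head.
head_node <- xml_find_first(page, "//head")
xml_add_child(head_node, "style", css)

as.character(page)
```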
Conclusion
In this post I explored server-side MathJax rendering. I have not created a workable solution: an actually acceptable solution would require more knowledge of MathJax, be far more detail-oriented, and check the accessibility of the produced document! Nevertheless, it was interesting to me to use chromote to extract HTML produced by MathJax and to learn more about MathJax. I don’t think I’ll feel like writing more LaTeX than now but I am sure curious to also check out MathJax “competitors” like KaTeX.[4]
[1] MathJax is not the only way to render math for the web but it is the most popular one.
[2] I found the video Accessible Math on the Web: A Server/Client Solution interesting, albeit a few years old already. Yes, I watched a video about SAS documentation.
[3] One could also choose to use crrri but this is left as an exercise to the reader.
[4] If you use KaTeX you might be interested in a related PR to rmarkdown.