Thursday, August 27, 2020

The joy of buying a standing desk during the pandemic

When my employer announced back in March that we were all going to work remotely, I had no semblance of a home office and had to scramble to figure out how to set up a space in my small urban apartment that would be suitable for days dominated by videoconferencing.  Thinking it would only be a few weeks, I figured I could ride it out using my guest bedroom's writing desk, but by the time June rolled around and it was clear we were not returning before 2021, it was time to give up the fantasy and set up a real home office.  Priority #1 was to get a real desk.

Real desks are expensive, and if I was going to spend a few hundred dollars on one, I wanted it to be at the right ergonomic height.  I type at a height of 28 inches from the ground, which turns out to be a non-standard desktop height, so my attention quickly turned to adjustable-height desks.  From there it wasn't long before I was looking at standing desks, which cost quite a bit more than a couple hundred dollars, and being stingy, I spent weeks doing my homework and agonizing over exactly which model to order to get the most out of the $900+ investment.  For the benefit of anyone else facing a similar situation and wanting to agonize over the details of standing desks, I decided to document my adventure.

Wednesday, May 20, 2020

Exascale's long shadow and the HPC being left behind

The recent delivery of Japan's all-CPU Fugaku machine and the disclosure of the UK's all-CPU ARCHER 2 system--both solidly "pre-Exascale" machines with pre-exascale budgets--are opening old wounds around the merits of deploying all-CPU systems in the context of leadership HPC.  Whether a supercomputer can truly be "leadership" if it is addressing the needs of today using power-inefficient, low-throughput technologies (rather than the needs of tomorrow, optimized for efficiency) is a very fair question to ask, and Filippo took this head-on.

Of course, the real answer depends on your definition of "leadership HPC."  Does a supercomputer qualify as "leadership" by definition if its budget is leadership-level?  Or does it need to enable science at a scale that was previously unavailable?  And does that science necessarily have to require dense floating point operations, as the Gordon Bell Prize has historically incentivized?  Does simulation size even have anything to do with the actual impact of the scientific output?

While I do genuinely believe that the global exascale effort has brought nearly immeasurable good to the HPC industry, it's now casting a very stark shadow that highlights the growing divide between energy-efficient, accelerated computing (and the science that can make use of it) and all the applications and science domains that do not map neatly onto dense linear algebra.  This growing divide causes me to lose sleep at night because it's splitting the industry into two parts with unequal shares of capital.  The future is not bright for publicly funded infrastructure for long-tail HPC, especially since the cloud is aggressively eating up this market.

Because this causes a lot of personal anxiety about the future of the industry in which I am employed, I submitted the following whitepaper in response to an NSCI RFI issued in 2019 titled "Request for Information on Update to Strategic Computing Objectives."  To be clear, I wrote this entirely on my personal time and without the permission or knowledge of anyone who pays me--to that extent, I did not write this as a GPU- or DOE-apologist company man, and I did not use this as a springboard to advance my own research agenda as often happens with these things.  I just care about my own future and am continually trying to figure out how much runway I've got.

The TL;DR is that I am very supportive of efforts such as Fugaku and Crossroads (contrary to accusations otherwise), which are looking to do the hard thing and advance the state of the art in HPC technology without leaving wide swaths of traditional HPC users and science domains behind. Whether or not efforts like Fugaku or Crossroads are enough to keep the non-Exascale HPC industry afloat remains unclear.  For what it's worth, I never heard of any follow-up to my response to this RFI and expect it fell on deaf ears.

Wednesday, April 1, 2020

Understanding random read performance along the RAIDZ data path

Although I've known many of the parameters and features surrounding ZFS since its relatively early days, I never really understood why ZFS had the quirks that it had.  ZFS is coming to the forefront of HPC these days though--for example, the first exabyte file system will use ZFS--so a few years ago I spent two days at the OpenZFS Developer Summit in San Francisco learning how ZFS works under the hood.

Two of the biggest mysteries to me at the time were
  1. What exactly does a "variable stripe size" mean in the context of a RAID volume?
  2. Why does ZFS have famously poor random read performance?
It turns out that the answers to these two questions are interrelated, and what follows are notes I took in 2018 as I was working through this.  I hope it's all accurate and of value to some budding storage architect out there.
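To make the first question a little more concrete, here is a back-of-the-envelope sketch (mine, not part of the original notes) of how many sectors a single block consumes on a RAIDZ vdev.  It follows my understanding of the allocation arithmetic in OpenZFS's vdev_raidz_asize(): each block carries its own parity, so the stripe width varies with the block size, and the allocation is padded up to a multiple of (nparity + 1) sectors.  The raidz_sectors() helper and the 10-disk RAIDZ2 example below are purely illustrative.

import math

def raidz_sectors(block_bytes, ndisks, nparity, ashift=12):
    """Return (data, parity, padding) sectors consumed by one block."""
    sector = 1 << ashift                        # e.g., ashift=12 -> 4 KiB sectors
    data = math.ceil(block_bytes / sector)      # data sectors needed for the block
    # nparity parity sectors for every row of (ndisks - nparity) data sectors
    parity = nparity * math.ceil(data / (ndisks - nparity))
    # allocations are rounded up to a multiple of (nparity + 1) sectors
    total = math.ceil((data + parity) / (nparity + 1)) * (nparity + 1)
    return data, parity, total - data - parity

# 10-wide RAIDZ2 (8 data + 2 parity disks), 4 KiB sectors:
for size in (4 << 10, 16 << 10, 128 << 10):
    print(f"{size >> 10:>4} KiB block -> {raidz_sectors(size, ndisks=10, nparity=2)}")

A 4 KiB block ends up as 1 data sector plus 2 parity sectors, while a 128 KiB record becomes 32 data plus 8 parity plus 2 padding sectors--a very different "stripe" for the same vdev.  The 128 KiB case also hints at the answer to the second question: because a single record is spread across every data disk in the vdev, a random read of one block engages the full stripe width, so the random read IOPS of a RAIDZ vdev look more like those of a single disk than of all its members combined.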