Linux Support Philosophy

From TAMUQ Research Computing User Documentation Wiki

The Challenges of Desktop Linux Support

It is challenging to provide robust support for researchers who run one of the "free" Linux distributions on hardware that is not vendor-certified for Linux, and who use open source science and engineering applications on such a platform. Because the challenge is greater than is generally realized, we would like to help our users understand and appreciate it. When a user starts to encounter problematic behavior in such a setup, the source of the problem is often difficult (i.e. time consuming) to identify. Even without defining "problematic behavior" precisely (it covers a broad range of issues), we can say that the cause is likely to fall within one of the following categories (the list is not comprehensive):

  1. Malfunctioning or failing hardware.
  2. Apparently functional, but incompatible hardware (incompatible with the overlying operating system).
  3. Misconfiguration or sub-optimal configuration of the OS.
  4. Bugs in the user application code.
  5. Misconfiguration or faulty installation of the user application.
  6. Incorrect invocation of or input to the user application.

As a support team, the more we can shorten this list of potential problem sources, the more manageable our support challenge becomes.

Which Linux?

Let's examine one aspect of the challenge of supporting "Linux". Linux is often described as being available in many flavors; more precisely, these are called "distributions" (or "distros") within the Linux community. There are roughly a dozen popular and actively maintained distributions available to us today (although technically there are over 250 distributions, counting all the minor variants). All distributions are built around the same Linux "kernel" (core operating system code), so they are all still considered Linux at the end of the day. The differences between distributions arise from the particular assortment of applications, utilities, and tools bundled into a coherent installable environment in each case. Important differentiators include the windowing and desktop environments available in a distro, the package management toolset the distro relies on, and the breadth and diversity of the software repositories available to it. Some subsets of distros do share similar or identical toolsets and system applications, so knowing how to manage and use one distro goes a long way toward knowing how to deal with other distros in the same subset (e.g. knowing Debian helps with Ubuntu, but much less with Red Hat; knowing Fedora helps with CentOS, but not so much with Ubuntu, etc). But the differences -- particularly between distros from dissimilar subsets -- are significant enough that, from a support perspective, it is very difficult for a single person to efficiently support more than two or perhaps three distros simultaneously.
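As an illustration of the "subsets" idea, most modern distributions describe themselves in a small text file, /etc/os-release, whose ID field names the distro and whose optional ID_LIKE field names the distros it closely resembles. The following is a minimal sketch, not a tool we provide: the function names and the sample text are hypothetical and chosen only for illustration.

```python
import shlex

# Hypothetical sample in the os-release KEY=value format (not read from a real system).
SAMPLE = 'NAME="Ubuntu"\nID=ubuntu\nID_LIKE=debian\n'


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # Values may be shell-quoted; shlex.split strips the quotes.
        parts = shlex.split(raw)
        fields[key] = parts[0] if parts else ""
    return fields


def distro_family(fields):
    """Return the distro ID followed by the IDs it declares itself similar to."""
    family = [fields.get("ID", "unknown")]
    family.extend(fields.get("ID_LIKE", "").split())
    return family


print(distro_family(parse_os_release(SAMPLE)))
```

To try this on a real machine, pass the contents of /etc/os-release to parse_os_release instead of SAMPLE. A distro that reports a familiar name in ID_LIKE will usually share its package management tools with that family.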

Hardware Compatibility

Another challenge is the wide variety of laptops and desktops from various vendors on which one or more Linux distros can now be installed successfully (at least apparently). This leads users to believe that the hardware fully supports Linux because "Hey, I installed it and it boots up!" In reality, there are good reasons why system vendors do not automatically claim that their products (most of which are certified to run Windows) also support the Linux OS. Such claims, when they are made, typically oblige responsible vendors to carry out extensive testing and certification of the target Linux distro on their product.

In this context, it is useful to look more closely at how hardware and software interact. A computing platform such as your laptop or desktop is a complex system composed of numerous integrated subsystems working (hopefully) in perfect harmony to serve your needs. The complexity of these systems is managed using layers of abstraction, whereby the working details of a particular subsystem are hidden from the layers above and exposed to them only through standardized interfaces. This design principle facilitates interoperability, among other things.

One application of this is how a computer's hardware functionality is exposed to the computer's operating system (OS) software through device drivers. Device drivers are computer programs that operate hardware devices, providing software interfaces that enable the OS to interact with that hardware. Drivers are hardware dependent and operating system specific, and are typically written by hardware development companies. Bugs in driver code may cause system instability or result in problems that are difficult to diagnose -- and in some cases even to reproduce. In the open source and Linux world, non-vendors (i.e. volunteer contributors, enthusiasts, "the community", etc) have also written drivers for much hardware, either by reverse engineering or, more often, on the basis of information officially released by hardware vendors that are otherwise unwilling to write and support Linux drivers on their own. This state of affairs means that if correct and sufficiently robust Linux device drivers are not available for all the hardware present in your laptop or desktop, your bare-metal Linux installation on that system may (a) not have access to certain hardware components but be otherwise stable, or (b) have buggy or incomplete access to certain hardware components and features, limiting functionality and perhaps also exhibiting instability under certain circumstances. This cannot be a happy situation for any support organization.
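Case (a) above can sometimes be spotted from the command line. On Linux, the lspci -k command lists PCI devices and, for each device the kernel has bound a driver to, prints a "Kernel driver in use:" line beneath it. The sketch below scans text in that format and reports devices with no such line. The function name and the sample output are hypothetical, and the check is only a rough first look: some devices may legitimately have no driver bound, and a bound driver says nothing about case (b), where the driver is present but buggy or incomplete.

```python
# Hypothetical sample in the layout printed by "lspci -k":
# one unindented line per device, followed by indented detail lines.
SAMPLE = (
    "00:02.0 VGA compatible controller: Example GPU\n"
    "\tKernel driver in use: i915\n"
    "\tKernel modules: i915\n"
    "03:00.0 Network controller: Example Wi-Fi adapter\n"
    "\tSubsystem: Example vendor\n"
)


def devices_without_driver(lspci_output):
    """Return the device lines that have no 'Kernel driver in use' entry."""
    devices = []  # each entry is [device description, driver seen?]
    for line in lspci_output.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new device record.
            devices.append([line.strip(), False])
        elif devices and line.strip().startswith("Kernel driver in use:"):
            # An indented detail line belongs to the most recent device.
            devices[-1][1] = True
    return [desc for desc, has_driver in devices if not has_driver]


for device in devices_without_driver(SAMPLE):
    print("No driver bound:", device)
```

On a real system, the text would come from running lspci -k and capturing its output rather than from SAMPLE.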

Every researcher who comes to the support organization with a favorite Linux distro installed on a unique combination of hardware, running an open source application of their choice, and who expects efficient and timely support for the latest problem encountered, exposes the organization to the challenges described above.

Limited Resources

The Research Computing team consists of four technical staff members, apart from the director. Two of these staff members are devoted entirely to the support of the High Performance Computing (HPC) infrastructure. At any given time the HPC infrastructure consists of a sizable cluster as well as a storage system, all shared by over a hundred researchers from a diverse set of disciplines using a diverse set of applications. In periods of transition, there may even be two HPC systems. Of the remaining two staff members, one is dedicated to advanced programming support and visualization technologies, often collaborating with (rather than just supporting) researchers on their projects. The fourth staff member participates in general user support for all our users, user account processing, and maintenance of user documentation, and also provides limited support for Linux outside of the HPC context (i.e. on desktops and laptops). This is a small team, and there are no hard barriers; we often help each other in fulfilling the group's collective mission and responsibilities. However, common sense and the dictates of efficiency and productivity require each member to maintain an area of focus pertinent to his or her role. In the end, whichever way one looks at it, Research Computing does not have even one full-time equivalent (FTE) to support Linux on the desktop and in the labs.

Support Philosophy

In light of this critical limitation, we have adopted only the support philosophy that we feel we can actually sustain with reasonable effort. To put it another way, we will not promise what we know we are not in a position to deliver. And while this means there are many things we cannot do for you, we promise to do the best we can in areas that fit within our human resource envelope. In the past, we have characterized this approach as "best effort," but without sufficiently detailing what that means. Here, we would like to do just that. The thrust of this support philosophy is that (a) we will steer you away from common pitfalls in decision making concerning Linux at the planning/brainstorming stages, (b) we will make you aware of the operational "costs" and responsibilities of managing Linux on your own, and (c) we will enable you to help yourself with good step-by-step documentation for foundational tasks such as installing Linux. Furthermore, we will -- out of necessity -- also spell out what we will not do for you: we will not perform any system administration duties for your Linux systems (e.g. installing systems, applying updates, performing data backups/restores, or configuring web services, database services, firewalls, or other such applications or functionality).

So then, how best can we leverage our time and expertise in service of your non-HPC Linux support needs? We believe we can do the following for our users:

  • After discussing the needs of your research project, we can offer assessments on various technology options and implementation routes you could pursue.
    • "Does Linux fit into the picture given what you are trying to do?"
    • "If you pursue a Linux solution, are you aware of the potential operational costs and challenges?"
    • "Do you really need to purchase hardware to implement the solution?"
    • "Do you really need a dual-boot system with both Linux and Windows on it? What are the pitfalls if you do?"
    • "Can virtual machines and/or containers help you achieve what you seek?"
    • "Is the best solution in the cloud?"
    • "Which hardware should you choose? Which Linux distribution is best?"
  • If your solution requires hardware that will run the Linux OS, we can recommend what hardware to buy.
    • We will ensure the hardware vendor supports one or more Linux distributions officially.
    • We will try to ensure the hardware can be procured easily in Qatar (and perhaps through TAMUQ IT).
  • We can offer self-help instructions on...
    • Bare-metal installs
    • Dual-boot installs
    • Linux guest VMs on a Windows host
    • Linux containers
    • How to manually back up important data from a Linux host
    • Other common tasks
  • We can answer basic questions about Linux system management, or point you to resources that help.
  • When we become aware of common challenges affecting multiple Linux users, we can attempt to discover and publish solutions in the form of knowledge-base articles.

In the end, despite the few things we can do for you, there is no doubt that, given the paucity of our resources, you must be largely self-reliant if you are going to make desktop Linux a centerpiece of your research. If this reality is accepted up front, then we can proceed with more clarity and direction.