Reports to: System Architect
DUG runs multiple large clustered storage environments, based primarily on Lustre and ZFS. This role helps to administer and support the entire DUG storage environment, with a particular emphasis on planning, predictive monitoring, and troubleshooting. You’ll also work closely with the security, network, and senior system admins to diagnose problems that relate to storage.
- Overall responsibility for the planning, maintenance, and operation of network storage systems globally, both online and archive
- Troubleshooting the entire I/O path (from the desktop software all the way down to the disk platter) and performing root-cause analysis
- Helping the monitoring/automation/DevOps engineers design useful tools that simplify or otherwise improve operations
- Storage-related performance tuning
- Providing guidance to management on all storage infrastructure.
- Background as a solid Linux system administration generalist
- True Lustre expertise, with experience deploying, administering, and troubleshooting Lustre at scale
- Knowledge of modern Linux NFS theory and practice; we don’t use it a great deal, but it has its place
- The tenacity and attention to detail to pursue a challenging, complex issue to its root cause.
- Experience with a large-scale, high-performance ZFS deployment, particularly on Linux
- Experience diagnosing and solving problems in the Linux kernel.