Aerospace Corp. Principal Director of Technical Computing Stuart Kerr
A talk with The Aerospace Corp's Principal Director of Technical Computing, Stuart Kerr, who runs a large R&D Linux infrastructure using a variety of open source technologies (Cobbler, Puppet, Spacewalk, etc.).
Mikhael Felker for Tom's IT Pro (MF): Tell us about your role and how it came about.
Stuart Kerr (SK): Sure, a little background to help set the stage. I am currently with The Aerospace Corporation. Our primary role is to assure space mission success for DoD, the intelligence community, and civil and commercial customers. Our population is composed primarily of engineers and scientists. Around the time of the personal computer, technical computation started to decentralize, not a particularly bad thing, but within our company, this continued to evolve over decades into a fragmented and unsustainable technical computing landscape of mini-data centers. I say unsustainable, for the following reasons:
- Power and Cooling: A significant number of CPUs are quiescent, but still require power and cooling.
- Software licensing: There was substantial duplication, multiple vendor contacts, a myriad of organizational budgets with similar licenses.
- Security: As security threats evolved, protecting our systems became a higher priority. However, the landscape of mini-data centers was being administered inconsistently, and our security, to put it in a sound bite, is only as strong as our weakest link.
- Our engineers and scientists are some of the best in their fields. They were not hired against a job description that included systems administration. But as these mini-data centers grew more complex, key engineering staff became distracted from their primary science by tending to their organization’s IT infrastructure.
About five years ago, The Aerospace Corporation’s CIO hired me from the Computer Systems Research organization to architect and deploy a High Performance Computing (HPC) infrastructure addressing the previously mentioned unsustainable issues. We are currently wrapping up what I call Phase 1 and are planning and prototyping the next phase.
In addition, we are responsible for enterprise application development and operations. This ranges from operating and maintaining proprietary legacy systems and architecture, to executing our current application development roadmap addressing a more mobile workforce.
MF: Tell us about the infrastructure (data centers, servers, apps, customers, etc.) that open-source supports.
SK: Our High Performance Computing (HPC) resources and services are used by approximately 2500 engineers and scientists located throughout the United States. Our architecture supports unclassified and very sensitive computations. I mention that because it is very important for us not to architecturally bifurcate these two environments. Operating similar architectures reduces overall costs, and more importantly, our customers do not have to context switch and learn another computational system as they transition between environments.
Our infrastructure consists of HPC clusters composed primarily of commodity 1-Unit nodes, though we operate one large blade-based cluster. These handle our parallelized computations, non-linear optimization problems, and Monte Carlo analyses. We recognize there are pros and cons between blades and 1-Unit hardware platforms, but we have some unique requirements. In addition, we operate what we call “dense core” servers; these are nominally 4-Unit systems with as many cores and as much RAM as our budget allows. Our latest purchases have 1 TB of RAM. All these systems run open source operating systems and server management software.
From an application development perspective, I would say about 95% of the tools and frameworks are Open Source Software. This is particularly true of our mobile application development. With the plethora of mobile devices and the BYOD pressures, we chose to implement a web-based application architecture rather than device-specific applications and their associated “app-stores”. This decision was also driven by developer resource availability.
Mikhael Felker is an IT pro who has worked in Defense, Healthcare, High-Tech and Non-Profits. He teaches, writes, and speaks at numerous Southern California venues about technology.