What happened at Idaho National Laboratory?


A story of aliens, time bots, and mind control

Part I, The Department of Energy's Science Undergraduate Laboratory Internships

I was an undergraduate intern at INL, hired in August 2019. I made my way in through the SULI program, which at the time received little attention. My internship mentor, Dr. Gorakh Pawar, said that he chose me because he was impressed by my coursework. This was surprising to me because I assumed that more attention would have been given to my experience through projects and other extracurricular activities. After orientation, I was assigned a cubicle with a slow desktop computer and briefed on some of the activities that the material science department was working on. I was told that if I needed more computing power they could get me a better computer, but that I also had access to the supercomputers.

For the first four months, I took loosely defined projects and was permitted to play with them and see what I could do. I have NDA restrictions that prevent me from giving specific details, but I worked on simulations of lithium and carbon atoms and analyzed SEM images of material samples and chemical responses. For some of this, I am most grateful for the discrete mathematics and machine learning courses I took. Through basic data manipulation, I found some features in the simulations that had not been noticed before. I automated certain analysis processes and created an image analysis GUI for the SEM images.

I found out that some locals perceived Idaho National Laboratory as a highly secretive government facility, and that in the most speculative minds it was the stuff of Area 51 and sci-fi novels. I neither deny nor confirm this... I am joking. INL is primarily an energy research site. It also deals with engineering and national security. As I see it, the non-disclosures exist for much the same reasons that corporations hold trade secrets. INL is a leader in clean energy research and its various projects, perhaps especially those in security, are often in competition with foreign governments. The nuclear reactors INL was founded on have shaped its culture into one with the highest standards of safety. This culture has some impact on other projects, where safety mostly translates to security. This is not necessarily a bad thing, but I think that this, combined with the naturally differing levels of secrecy between projects, is what results in the heightened perception among some people. INL is open about many of its activities. Research on improving wind energy, biofuels, and electric vehicles, studying geology, and the military's use of the Advanced Test Reactor are all examples of things that you can get a great deal of information on from an open tour.

At the end of 2019, Dr. Pawar and I decided to extend the internship. The SULI program was over and I became an intern funded more directly by INL. We began focusing on more specific objectives and worked closely with Dr. Boryann Liaw on his projects. Dr. Yuxiao Lin was also part of our small team. Lin was pretty good at Python and MATLAB. While we set the internship to end in April 2020, we later extended it to July, August, September, then December before realizing that we could not continue after November since I was no longer in school. Having spent plenty of time in academics, I am ready to move on to other things and further prove myself in a commercial setting.

Part II, 2020

So, the internship lasted through November 2020; what happened during that time?

I worked mostly on studying the data of lithium atoms in molecular dynamics simulations generated with LAMMPS. During the simulation, discrete timesteps may be saved with the desired details of energy levels. Recording the state of each atom at every timestep provides more valuable information. Visualization and feature engineering allow for good insight. Sometimes the amount of data I wanted to process for generating values, assigning predictions, or otherwise was more than could fit in memory. (By now I had a much better work computer than the one I started with.) Most of the time, I would run Jupyter Notebook for this on the servers. Sometimes that would still not provide enough memory, so I would process what I could in chunks or sample the data in a way that was representative. I would further reduce processing time with the usual techniques: vectorization, smarter code, multiprocessing, storing data in a binary format, and possibly switching to a data.table backend.
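To make the chunking idea concrete, here is a minimal sketch in Python. It is not the code I used at INL. The function names and the made-up "energies" array are purely illustrative, and in practice the chunks would come from a file reader rather than from an array that is already in memory. The running sum-of-squares formula is also the naive one, so treat it as a sketch rather than a numerically careful implementation.

```python
import numpy as np


def iter_chunks(values, chunk_size):
    """Yield consecutive slices of `values`, each at most `chunk_size` long."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def chunked_mean_std(chunks):
    """Return the mean and population standard deviation over all chunks.

    Only running totals are kept, so no more than one chunk needs to be
    in memory at a time. Each chunk is reduced with vectorized NumPy calls.
    """
    count = 0
    total = 0.0
    total_sq = 0.0
    for chunk in chunks:
        values = np.asarray(chunk, dtype=np.float64)
        count += values.size
        total += values.sum()
        total_sq += np.square(values).sum()
    if count == 0:
        raise ValueError("no values to summarize")
    mean = total / count
    # Clamp at zero: rounding can make the difference slightly negative.
    variance = max(total_sq / count - mean ** 2, 0.0)
    return mean, variance ** 0.5


# Made-up stand-in for a column of per-atom energies (hypothetical data).
energies = np.linspace(-5.0, 5.0, 1001)
mean, std = chunked_mean_std(iter_chunks(energies, chunk_size=100))
print(f"mean={mean:.4f}, std={std:.4f}")
```

The same pattern works for any statistic that can be built from running totals. Statistics that cannot (a median, for example) are where representative sampling becomes the more practical route.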

I researched existing analysis methods and applied them in various combinations with my own algorithms to accomplish things. I typically dove in and used whatever would work, which was a great strategy when helping another project nearing its deadline. However, I wanted my work to be easy to reproduce and I started opting for simpler models, which were also easier to interpret. I often created a long exploratory process to extract some feature, then used my results to train a classic regression model to recognize or quantify it. I had a tendency to like models with a high "accuracy" (high adjusted R-squared or low AIC), but I could validate them to see just how much predictive power is actually lost when simplifying terms, and so strike a fair balance. Dr. Liaw suggested I look more into fuzzy logic systems and this was useful in a case where I had a qualitative value that I wanted to convert to a continuous scale. I wrote an R library (not public) with lots of documentation that provides the necessary analysis features.
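As a rough illustration of that balancing act, the sketch below fits a fuller and a simpler linear regression to synthetic data and compares adjusted R-squared, AIC, and held-out error. Everything here is an assumption made for the example: the data is randomly generated, the helper names are hypothetical, and the AIC is the common least-squares form n*ln(RSS/n) + 2k, which is only defined up to a constant that is the same for every model fit to the same data. My actual work was in R and on different data.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def adjusted_r2(y, y_hat, n_predictors):
    """Adjusted R-squared; n_predictors excludes the intercept."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    tss = np.sum((y - np.mean(y)) ** 2)
    r2 = 1.0 - rss / tss
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


def aic(y, y_hat, n_params):
    """Least-squares AIC up to an additive constant; n_params counts the intercept."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + 2 * n_params


# Synthetic data: the third column has no real effect on y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=200)
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

for name, cols in [("full", [0, 1, 2]), ("simplified", [0, 1])]:
    coef = fit_ols(X_train[:, cols], y_train)
    fitted = predict(coef, X_train[:, cols])
    held_out = predict(coef, X_test[:, cols])
    rmse = np.sqrt(np.mean((y_test - held_out) ** 2))
    print(name,
          "adj_R2=%.4f" % adjusted_r2(y_train, fitted, len(cols)),
          "AIC=%.2f" % aic(y_train, fitted, len(cols) + 1),
          "held-out RMSE=%.4f" % rmse)
```

The held-out error is the number that answers the question in the paragraph above: if dropping a term barely changes it, the simpler model is probably the fairer choice.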

Like many people, I transitioned to telecommuting. This was a mostly painless process with our technical support. Because everyone was so busy, there were many problems I was left to work around: X required approval, so I set up Y instead. Y stopped working, but by then I'd gotten permission for X. That kind of thing. Sometimes I would not have much left to do because I had to wait on a computation or a management matter. Once I had maxed out the multitasking that could effectively be done, I'd switch to general research. One time, I didn't have access to a computer at all, so I looked things up on my phone and made algorithm concept notes. Those notes, which I created in October 2019, helped for several months after.

Part III, Communication

I was the only data scientist of the four of us, and possibly in all of the material science and engineering department. Further, I come from a business mindset, not an academic, "publish or perish" one. So, we didn't share the same background. This is something to expect anywhere. Cultural differences are even more apparent between departments in college. I like to think computer science is the most tethered to the real world, but an observant study of forums proves otherwise. Surely the business department is down to earth, yet it's obvious they're all loons. No, actually, business and English majors were among the most fun to be with, although they don't talk much about math and science. We bond over other things.

I've digressed. It seemed Dr. Liaw had the best sense about machine learning and the motivations of data science. He understood that the solution to one problem was, in principle, the solution to another even when from a completely different field. And while Dr. Pawar was initially suggested as the one to translate between us, it seemed that Liaw was the real translator. He asked good questions that caused me to clarify a plot or process. So did Dr. Lin. I don't mean that any of them were less competent. They all had different strengths. Dr. Pawar is perhaps the most friendly and dedicated. As one example, he and his wife took all of us to dinner one night. I suppose it's interesting that while talking about communication I keep talking about relationships.

While I may have had fine communication skills, they showed me that I needed to emphasize certain details and explain some things that I had come to take for granted when working with other data science students. Sometimes we used the same words to talk about different things and we didn't realize it until later, even after explaining things back to each other. We had to directly address words that created confusion. "Cluster" is the best example, as the way mathematicians and ML engineers (like me) use that word doesn't match how it appears in the dictionary. Sometimes I would slip up and then amend my statement, clarifying which definition I meant.

Part IV, The End

My internship ended more abruptly than I or the other scientists would have liked. One of the most important things to me was the perpetuation of the work, so I left with some details about projects to pursue in the future. I also suggested a place to start for reducing the computation cost of the simulations. That was something I didn't cover in the work I did, but it will matter in the future. Knowing that the ease of transition plays a strong role in whether an innovation is adopted, I provided a tutorial for getting acquainted with and picking up from my work.

I don't have much control over INL's future, but there is something I want everyone who reads this to understand about data science. No matter who you are, you should learn more about it because data is a major part of all of our lives. When you are on social media, you are being fed data and the most important fact checker is you. If you are a scientist, you collect and analyze data, so you are already using data science. Wouldn't knowing some more techniques or refreshing your knowledge of statistics be helpful?

Believe that you can learn because you can. The industry is broad and fast growing, but you don't need to let how much there is to learn about it impede you because you don't have to learn everything at once. The value you gain from the time you put into knowing data science may be something like a log function:

Plot: log(Time spent learning data science) = (Usefulness of that knowledge)

I don't mean to suggest that learning is useless after some finite point, but that you can get a lot by learning just a little.
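To put rough numbers on that, take the log curve literally for a moment. The symbols here are my own shorthand and not a measured law: U is usefulness and t is time spent learning, both in arbitrary units.

```latex
U(t) = \log t, \qquad \frac{dU}{dt} = \frac{1}{t}, \qquad U(10t) - U(t) = \log 10
```

The slope 1/t is steepest at the start, and every tenfold increase in time adds the same fixed amount of usefulness. Under this assumption, going from 1 hour to 10 hours gains as much as going from 10 hours to 100, and the curve never stops rising.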

And the idea that you don't learn much once you are old is much less true than you might think it is.

One more thing

If you are used to scientific and academic research, I may need to tell you that for data science and most computery fields, you will want to use Google more often than you use Google Scholar. The peer reviewed papers are great, but there's much more out there. I suggest becoming comfortable with forums, blogs, and YouTube at 2x speed. Use StackExchange and get familiar with GitHub. Be sure to check that the code works as intended. That's it.