I started outlining Book III as a change in direction from Book II and a way to salvage parts of Book I.

Book I made it to the proposal stage (rejected with honors). Book II was a rethink of Book I and got a tiny amount of mental traction.

The working titles were (approximately)
I: Revitalizing Legacy Scientific Software
II: Writing Legacy Scientific Software (and Why You Would Want To)
III: Analysis Software and Engineering Practice

I wanted to avoid "A Holistic Approach" for III because while it's accurate, it sounds vague and "holistic medicine" really taints the term.

The quick summary is that the practice of engineering analysis revolves around the production of technical reports and calculations as artifacts of the analysis process. We first need to understand the purpose and practice of engineering analysis, define the role of analysis software, and develop the critical characteristics of analysis software that ensure it supports engineering practice.

It's a long way of saying we need to understand the work philosophy, intent, practice, and artifacts to ensure software tools support good work practice. We also need to rethink how we teach software development to engineers, to integrate it with practice rather than isolating and simplifying it as a generic task suitable for computer science educators and a general audience. Software development requires context to understand good practice in a specific environment.

Honestly, I should just write 30-40 pages on work practice and artifacts, leaving all the computer stuff aside. Just focus on providing context for later discussion of software design, requirements, and characteristics.
@arclight It does seem that context is so critical to understanding why the computer stuff ends up being so different to Software As She Is Taught. I can’t even come up with a good analogy.

@curtosis This is coming straight from subject matter expert territory. We have plenty of books on numerical coding and "scientific" programming and basically none on characteristics of software that works well in its domain. This is the forgotten land of "nonfunctional requirements" and usability beyond half-assed web/app design.

An example is having a file "manifest", a list of every file read or written for a given computer run. This helps with review and can be used to archive cases and delete stale output. Building a file manifest within an application is very easy; a simple three-column CSV file of name, description, and classification (code, configuration, content, or crap) makes archiving and cleanup trivial. In turn that keeps systems available by reducing buildup of garbage files and ensures work is current and properly archived for review or later revision.
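A minimal sketch of what that could look like in Python. The class name, field names, and classification labels here are assumptions based on the description above (name, description, classification), not an existing library:

```python
import csv

# Hypothetical in-application file manifest: every file the run reads or
# writes gets recorded with a description and a classification
# (code, configuration, content, or crap), then dumped as a 3-column CSV.
class FileManifest:
    CLASSES = {"code", "configuration", "content", "crap"}

    def __init__(self):
        self.entries = []

    def record(self, name, description, classification):
        if classification not in self.CLASSES:
            raise ValueError(f"unknown classification: {classification}")
        self.entries.append((name, description, classification))

    def write(self, path):
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["name", "description", "classification"])
            writer.writerows(self.entries)

    def files_to_archive(self):
        # Everything except scratch/garbage files is worth keeping.
        return [name for name, _, cls in self.entries if cls != "crap"]

# Usage: record the files a run touched, then emit the manifest.
m = FileManifest()
m.record("input.cfg", "run configuration", "configuration")
m.record("results.dat", "primary output", "content")
m.record("scratch.tmp", "solver scratch space", "crap")
m.write("manifest.csv")
```

A cleanup script can then delete everything classified as "crap" and archive the rest, without guessing which files a run actually produced.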

Having a standard set of command line options reduces the cognitive burden on workers, especially having --help, --version, --usage, and --dry-run (show me what the code is going to do without writing output).
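With Python's argparse, most of that option set falls out nearly for free; a sketch (the program name and option wiring are illustrative assumptions):

```python
import argparse

# --help and usage output come for free with argparse; --version uses the
# built-in "version" action; --dry-run is a plain flag the application
# checks before writing any output.
def build_parser():
    p = argparse.ArgumentParser(
        prog="analyze", description="Example analysis driver.")
    p.add_argument("--version", action="version", version="analyze 1.0")
    p.add_argument("--dry-run", action="store_true",
                   help="show what the run would do without writing output")
    p.add_argument("case", nargs="?", default="case.cfg",
                   help="case configuration file")
    return p

args = build_parser().parse_args(["--dry-run", "mycase.cfg"])
if args.dry_run:
    print(f"dry run: would process {args.case} and write results")
```

The point is less the library than the consistency: once every in-house tool answers --help, --version, and --dry-run the same way, nobody has to relearn the basics per program.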

An idea I stole from Doxygen and Postfix - generate a default commented config file, strip comments/make canonical, and show only changes from default configuration. These get people working quickly and help users & mentors debug systems. The coding isn't difficult and it saves people a ton of time especially when debugging configs.
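The Doxygen/Postfix pattern can be sketched in a few lines of Python with configparser. The section names, keys, and comment strings below are invented for illustration:

```python
import configparser
import io

# Invented example defaults and per-setting comments.
DEFAULTS = {"solver": {"tolerance": "1e-6", "max_iterations": "100"},
            "output": {"format": "csv"}}

COMMENTS = {"tolerance": "convergence tolerance for the iterative solver",
            "max_iterations": "give up after this many iterations",
            "format": "output file format (csv or json)"}

def write_default_config(stream):
    """Emit a default config file with a comment above each setting."""
    for section, options in DEFAULTS.items():
        stream.write(f"[{section}]\n")
        for key, value in options.items():
            stream.write(f"# {COMMENTS[key]}\n{key} = {value}\n")
        stream.write("\n")

def changed_from_default(config):
    """Return {(section, key): value} for settings that differ from defaults."""
    changes = {}
    for section in config.sections():
        for key, value in config.items(section):
            if DEFAULTS.get(section, {}).get(key) != value:
                changes[(section, key)] = value
    return changes

# Usage: a user config that overrides exactly one default.
user = configparser.ConfigParser()
user.read_string("[solver]\ntolerance = 1e-9\nmax_iterations = 100\n")
print(changed_from_default(user))  # only the tolerance override shows up
```

When a run goes wrong, "show only changes from default" collapses a hundred-line config file into the two or three lines a mentor actually needs to look at.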

@arclight Are you writing a book (series)?

@amoroso I'm trying to write one book that has a chance of being salable. I've spent a lot of time thinking through the scientific software recovery and refactoring process. People who care about that have been encouraging but I had a nice meeting with Bill Pollock of No Starch Press and we couldn't find a big enough market to make it pay for itself.

I tried to reframe the problem as understanding the big picture of what these codes do and why that is (or was) valuable. Show legacy and modern ways of handling the same needs so the reader gets something useful whether they are writing new code or refactoring something old.

That seemed to be missing a lot of context as to the core philosophy of why we need certain functions or facilities and why systems need to be built or documented or tested a specific way (e.g. unless they are well-documented and comprehensive, unit tests aren't adequate as acceptance tests. Unit testing is a development activity, not a qualification or V&V activity.)

So I backed up a further level, trying to summarize the process of engineering analysis (vs design or operations), looking at the artifacts of that process (technical reports and calculations), and explaining how aspects of the software support the creation of those artifacts. There are hard rules that apply to all aspects of analysis including software: cite your sources, state your assumptions, show your work, clearly define terms and show units of measure, state limits of applicability, ensure the work can be independently verified _in practice_.

That last one is key for large spreadsheets. In theory one could independently check the correctness of every spreadsheet cell (each of which is a line of code). In practice, a reviewer can't verify more than a few thousand cells, so the spreadsheet should be converted to a traditional auditable application. Once you lay out the characteristics of dependable, trustworthy analysis software, it becomes glaringly obvious that Excel is not fit for purpose. The convenience of construction is offset by the intractability of testing, review, configuration management, documentation, etc., etc.

The book idea shifted from being purely about computation to primarily being about the engineering analysis process and how to design and implement software for that environment. I think that makes the book applicable to a much wider audience and helps bridge the gap between developers and analysts. It's especially important for researchers writing code because often those systems take on a life of their own and get used in environments the code authors don't understand. Knowing the restrictions and requirements on production engineering code would help avoid a lot of serious preventable problems when creating research software.