Posts Tagged ‘excel’

High-performing teams vs. not-invented-here syndrome

A few months ago, being particularly frustrated by yet-another-bug and yet-another-limitation of a library used in one of my team’s systems, I remembered a story about the Excel dev team and dug up In Defense of Not-Invented-Here Syndrome, which I read years ago. I didn’t think much of the essay when I first read it but now, having been in the industry for a while, I have a greater appreciation for it.

NIH syndrome is generally looked at in a negative light and for good reason; companies and teams that are too insular and reject ideas or technologies from the outside can find themselves behind the curve. However, there’s a spectrum here and, at the opposite end, heedless adoption of things from the outside can put companies and teams in an equally precarious position.

So, back to the story of the Excel development team:

“The Excel development team will never accept it,” he said. “You know their motto? ‘Find the dependencies — and eliminate them.’ They’ll never go for something with so many dependencies.”

Dealing with dependencies is a reality of software engineering, perhaps even more-so now than in the past, and for good reason, there’s a world of functionality that can simply be plugged into a project, saving significant amounts of time and energy. However, there’s a number of downsides as well:

  • Your team doesn’t control control the evolution or lifecycle of that dependency
  • Your team doesn’t control the quality of that that dependency
  • Your team doesn’t have knowledge of how that dependency does what it does

When something breaks or you hit a limitation, your team is suddenly spending a ton of time trying to debug an issue that originates from a codebase they’re not familiar with and, once there’s an understanding of the issue, coding some ugly hack to get the dependency to behave in a more reasonable way. So when a team has the resources it’s not unreasonable to target elimination of dependencies for:

  • A healthier codebase
  • A codebase that is more easily understood and can be reasoned about

These 2 points invariably lead to a higher performing team. In the case of the Excel dev team:

The Excel team’s ruggedly independent mentality also meant that they always shipped on time, their code was of uniformly high quality, and they had a compiler which, back in the 1980s, generated pcode and could therefore run unmodified on Macintosh’s 68000 chip as well as Intel PCs.

Finally, Joel’s recommendation on what shouldn’t be a dependency and be done in-house:

Pick your core business competencies and goals, and do those in house.

This makes sense and resonates with me. Though there is a subtle requirement here that I’ve seen overlooked: engineering departments and teams need to distill business competencies and goals (hopefully, these exist and are sensible) into technical competencies and goals. Without that distillation, engineering is rudderless; teams pull in dependencies for things that should be built internally, while others sink time into building things from scratch that will never get the business resources to be properly developed or maintained.

Are spreadsheets databases?

A few weeks ago, news that using Excel resulted in the loss of ~16,000 coronavirus cases in England due to a 65K row limit in XLS files, sparked a number of tweets around how Excel isn’t a database. There’s a lot to cringe at here, but saying Excel or, more precisely, spreadsheets aren’t databases and using a “proper database” would have prevented a failure is incredibly reductive.

excel is not a database

Definition

To start, it’s worth looking at what the definition of “database” actually is:

definition of database

At least from that definition, I think it’s fair to say spreadsheets are databases. They’re primitive, there’s little-to-nothing in the way of concurrency, security, constraints, etc., but they are databases.

In software engineering, the term “database” is typically shorthand for a relational database, but that feels more and more problematic, as we now see many more databases in use now that aren’t relational (Cassandra, Mongo, DynamoDB, etc.). Even S3 now serves as a foundational basis for many databases.

System Considerations

Beyond definitions, what’s also interesting here is that tooling and database choice is never a simple equation for a non-trivial data system; there’s a host of considerations that come into play. Here’s a few that pop into my mind:

  • Interfacing: Who’s accessing and/or manipulating data in the system? Sophisticated data systems and bespoke interfaces are powerful, but they require training and expertise, and there’s typically a higher maintenance burden. Leveraging common and ubiquitous tooling can be beneficial when it comes to interfacing needs for a larger audience. This comment on ArsTechnica points out that the “proper tools” are inordinately complex and not something a typical end-user can pick-up and understand easily, and I think that’s a fair assessment of the product landscape.
  • System limits: every data system has limits, some are explicit and obvious, some are not. It’s also not surprising to bump into limits due how a database is setup or how a schema is designed. Hitting a 65K row limit is frustrating and problematic, but so is discovering a value is truncated because the field length was set too small or an incorrect type was used.
  • Cost: What is the cost of the technical infrastructure? What about the cost of the people needed maintain the system? Unsurprisingly, more sophisticated and complex systems will cost more.
  • Failure modes: What are some common ways this system fails? What does it take to recover and get back to normal operations? With simple systems you tend to just hit hard limits, but more complex systems fail in a multitude of ways.
  • Time: When does this need to be shipped and what compromises need to be made? When you don’t have weeks or months to design and prototype, leveraging per-existing and proven method is typically the path of least resistance.

It’s perhaps easy to point to some of the issues that come into play with Excel and spreadsheets, but any data system will have its fair share of limits and risks, along with any potential benefits.

CSV file detected as SYLK file in Excel

Working on some CSV export code and encountered this weird error when opening the CSV file in Excel,

Excel has detected that 'test.csv' is a SYLK file, but cannot load it. Either the file has errors or it is not a SYLK file format. Click OK to try to open the file in a different format.

The file was certainly a CSV file; I had no clue what an SYLK file was.

The issue, as I discovered here, is due to the first 2 letters in the file being “ID” (which was the name of the first column); change this to something else (e.g. “Id”) and the file loads correctly without any warnings or errors.