Archive for the ‘Random’ Category

A look at S.M.A.R.T.

What is S.M.A.R.T.?

Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T) is a monitoring system on computer hard drives and solid state drives. SMART primarily consists of a set of a attributes, with the disk monitoring current and worst values for said attributes. The attributes, associated ID number, and the range for normalized values (1 to 253) are standardized across disks. Unfortunately, there’s no standardization around which attributes are implemented, the range for raw values, what raw values actually represent, or what the threshold is for normalized value. Despite the lack of standardization around the attribute values, SMART still provides some value and offers a significant degree of observability into the state and behavior of the disk.

SMART is something I’ve been aware of for a while, but it never seemed to really matter all that much for my consumer desktop needs. There are a bunch of Windows apps to check the SMART attributes and I vaguely recall having something installed to check for out-of-range values but regularly monitoring/checking wasn’t something I took seriously; just having a backup was typically good enough and I could deal with a drive failure if/when it happened. Recently, motivated by re-using an old motherboard and CPU for a NAS server, making use of a batch of old hard disks I had accumulated, and maximizing the storage capacity of the box, I decided to take another look at SMART and its efficacy in predicting drive failures. Ideally, I could have a system where a drive could be replaced before a failure resulting in data loss.

SMART attributes correlated to hard drive failure

Google

A Google study from 2007, “Failure Trends in a Large Disk Drive Population“, lists 4 SMART attributes that are highly correlated with failure, these are:

  • SMART 5: Reallocated Sectors Count
  • SMART 187: Reported Uncorrectable Errors
  • SMART 197: Current Pending Sector Count
  • SMART 198: Uncorrectable Sector Count

It’s also worth noting that any change in SMART 187 was seen to be highly predictive of failure:

… after their first scan error (i.e. when a positive value for 187 is observed for the first time), drives are 39 times more likely to fail within 60 days than drives with no such errors.

Other attributes were looked at but results were not always consistent across models and manufacturers. The study also found using SMART parameters to predict failure was severely limited, as a large number of failed drives shown no SMART errors whatsoever:

Out of all failed drives, over 56% of them have no count in any of the four strong SMART signals, namely scan errors, reallocation count, offline reallocation, and probational count. In other words, models based only on those signals can never predict more than half of the failed drives.

… even when we add all remaining SMART parameters (except temperature) we still find that over 36% of all failed drives had zero counts on all variables.

… failure prediction models based on SMART parameters alone are likely to be severely limited in their prediction accuracy, given that a large fraction of our failed drives have shown no SMART error signals whatsoever.

This is incredibly important as correlating SMART attributes to failure means little if the correlation simply doesn’t matter for a significant percentage of drives. From skimming a few other papers, this is also something that I don’t always see being addressed/re-addressed, which is disappointing.

Backblaze

Backblaze conducted an analysis on their drives in 2016 which also showed some interesting results. In addition to the 4 SMART attributes identified in the Google study, Backblaze also found another attribute highly correlated to failure:

  • SMART 188: Command Timeout

Similar to the Google study, Backblaze also found a significant number of failed drives reporting no SMART errors for these 5 attributes but, interestingly, it was a smaller percentage that that in the Google study:

Failed drives with one or more of our five SMART stats greater than zero: 76.7%.

That means that 23.3% of failed drives showed no warning from the SMART stats we record.

Another study utilizing Backblaze’s data, “Lifespan and Failures of SSDs and HDDs: Similarities, Differences, and Prediction Models“, was also interesting as it points to another attribute highly correlated with failure:

  • SMART 240: Head Flying Hours

We examine all SMART features for HDDs, and find out that head flying hours (HFH, SMART 240) is highly related to failures even if it is not correlated with other HDD features.

A more recent post from Backblaze looks at the paper “Interpretable predictive maintenance for hard drives” which utilizes data published from Backblaze. I didn’t have a clear takeaway from the paper it did highlight the limitation of previous studies:

The analyses from Backblaze and Google were univariate and only considered correlation between failures and a single metric at a time. As such, they would not be able to detect any nonlinear interactions between metrics that affected the chance of failure. Another limitation of this analysis is that it leaves humans to choose the cutoff values that will raise alerts if exceeded.

So, for hard drives, SMART is interesting. We can say that there are maybe 6 attributes we should definitely be looking at when observing for drive failure but, unfortunately, a drive may still fail without showing any anomalous values for these attributes.

SMART attributes correlated to solid-state drive failure

While their price-per-gigabyte is still much higher than that of a hard drive, solid-state drives are increasingly commonplace. While SSDs do support SMART, I couldn’t as much research done on SSDs. The study I mentioned above, “Lifespan and Failures of SSDs and HDDs: Similarities, Differences, and Prediction Models“, didn’t use SMART attributes but instead daily performance logs in a proprietary format:

… daily performance logs for three MLC SSD models collected at a Google data center over a period of six years. All three models are manufactured by the same vendor and have a 480GB capacity … they utilize custom firmware and drivers, meaning that error reporting is done in a proprietary format rather than through standard SMART features.

Another study, “An In-Depth Study of Correlated Failures in Production SSD-Based Data Centers” didn’t find any correlation with SMART attributes:

Intra-node and intra-rack failures have limited correlations with the SMART attributes and have no significant differences of correlations with each SMART attribute. Thus, the SMART attributes are not good indicators for detecting the existence of intra-node and intra-rack failures in practice.

Finally, I looked at “SSD Failures in Datacenters: What? When? and Why?” which had a similar conclusion:

… even though tracking [SMART] symptoms is important, prognosis of whether a SSD will fail(-stop) or not, cannot be made entirely based on the symptoms. This motivates us to study other factors, beyond SMART symptoms, to better understand the characteristics of failed devices.

So it seems that for SSDs, SMART isn’t an effective tool when it comes to predicting failure.

Reading SMART attributes

On Windows, there’s a number of tools to read SMART attribute, these posts on superuser list a bunch. On Linux, smartmontools seems to be available for most distros; it’s fairly easy to install and use (from the command line, something like sudo smartctl --all /dev/sda).

As for reading the SMART data programmatically, information was more sparse. For Windows, this article provides some code and points to using DeviceIoControl() to communicate with the device driver to retrieve the attributes. For Linux, this post provides a lot of good information and code on how to read the attributes. I haven’t tried implementing either of these approaches myself, but something I might play around with for a future project.

Deployments with git tags + npm publish

Git tags and npm

Deployment workflows can vary a lot, but what I’ve tended to find ideal is to tag releases on GitHub (or whatever platform, as most have some mechanism to handle releases), the tag itself being the version number of whatever is being deployed, and having a deployment pipeline orchestrate and perform whatever steps are necessary to deploy the application, service, library, etc. This flow is well supported with git, well supported on platforms like GitHub, and is dead simple for developers to pick up and work with (in GitHub, this means filling out a form and hitting “Publish release”).

npm doesn’t play nicely with this workflow. With npm version numbers aren’t tied to git tags, or any external mechanism, but instead to the value defined in the project’s package.json file. So, trying to publish a package via tagging requires some additional steps. The typical solutions seem to be:

  • Update the version in package.json first, then create the tag
  • Use some workflow that include npm version patch to have npm handle the update to package.json and creating the git tag
  • Use an additional tool (e.g. standard-version), that tries to abstract away management of version numbers from both package.json and git tag

None of these options are great; versioning responsibility and authority is pulled away from git and, in the process, additional workflow complexity and, in the latter case, additional dependencies are introduced.

Version 0.0.0

In order to publish with npm, keep versioning authority with git, and maintain a simple workflow that doesn’t include additional steps or dependencies, the following has been working well in my projects:

  • In package.json, set the version number to “0.0.0”; this value is never changed within any git branch and, conceptually, it can be viewed as representing the “dev version” of the library. package.json only has a “non-dev version” for code published to our package repository.
  • In the deployment pipeline (triggered by tagging a release), update package.json with the version from the git tag.

    Most CI systems have some way of getting the tag being processed and working with it. For example, in CircleCI, working with tags formatted like vMAJOR.MINOR.PATCH, we can reference the tag, remove the “v” prefix, and set the version in package.json using npm version as follows:

    npm --no-git-tag-version version ${CIRCLE_TAG:1}

    Note that this update to package.json is only done within the checked-out copy of the code used in the pipeline. The change is never committed to the repo nor pushed upstream.
  • Finally, within the deployment pipeline, publish as usual via npm publish

Limitations

I haven’t run across any major limitations with this workflow. There is some loss of information captured in the git repository, as the version in package.json is fixed at 0.0.0, but I’ve yet to come across that being an issue. I could potentially see issues if you want to allow developers to do deployments locally via npm publish but, in general, I view local deployments as an anti-pattern when done for anything beyond toy projects.

Writing to the Bitfenix ICON display with Rust (part 2, writing text)

Bitmap Fonts

Picking up from being able to successfully write images to the Bitfenix ICON display, I started looking into how to render textual content. Despite their lack of versatility, using a bitmap font was a natural option: they’re easy to work with, they’re a good fit for fixed-resolution displays, and rendering is fast.

I pulled up the following bitmap font from an old project (I don’t remember the source used to create this):

A few points about this font:

  • Each character glyph is 16×16, with a total of 256 characters in the 256×256 bitmap
  • Characters are arranged/indexed left-to-right, top-to-bottom, and the index of a 16×16 block will correspond with an ASCII or UTF8 codepoint value (e.g. the character at index 33 = “!”, which also maps to UTF8 codepoint 33)
  • Despite space for 256 characters, there’s a very limited set of characters here, but there’s enough for simple US English strings
  • While each character is 16×16 pixels, this is not a monospaced font, there is data for accompanying widths for each character
  • The glyphs are simply black and white (i.e. there’s no antialiasing on the character glyphs, we can simply ignore black pixels and not worry about blending into the background)

Rendering Strings

We need to render characters from the bitmap font onto something. We could create a new image but, building upon what was done in part 1, I decided to render atop this background image. The composite of the background image + characters will be the image written to the ICON display.

To start, we’ll load the background image just as we did in part 1, but we need to make it mutable, as we’ll be writing character pixels direct to it:

// Background image needs to be 240x320 (24bpp, no alpha channel) let mut background_image = reduce_image_to_16bit_color(&load_png_image("assets/1.png"));

Next, we’ll load the font PNG in the same manner (it doesn’t need to be mutable, as we’re not modifying the character pixels):

let font_image = reduce_image_to_16bit_color(&load_png_image("assets/fonts/font1.png"));

To handle text rendering a TextRenderer struct is declared with the rendering logic encapsulated in TextRenderer.render_string(), which loops through each character in the input string, looks up the location of the character in the bitmap font, and renders the 16×16 block of pixels for the character onto the background image. The x-location of where to render a character is incremented by the width of the previous character (found via lookup into TextRenderer::font_widths vector).

pub struct TextRenderer { font_widths: Vec<u8>, } impl TextRenderer { pub fn new() -> TextRenderer { TextRenderer { font_widths: build_font_width_vec() } } pub fn render_string(&self, txt: &str, x: u64, y: u64, fontimg: &[u8], outbuf: &mut [u8]) { // x position of where character should be rendered let mut cur_x = x; for ch in txt.chars() { // From codepoint, lookup x, y of character in font and width of character from self.font_widths let ch_idx = ch as u64; let ch_width = self.font_widths[(ch as u32 % 256) as usize]; let ch_x = (ch_idx % 16) * 16; let ch_y = ((ch_idx as f32 / 16.0) as u64) * 16; // For each character, copy the 16x16 block of pixels into outbuf for fy in ch_y..ch_y+16 { for fx in ch_x..ch_x+16 { let fidx: usize = ((fx + fy*256) * 2) as usize; let fdx = fx - ch_x; let fdy = fy - ch_y; let outbuf_idx: usize = (((cur_x + fdx) + (y + fdy)*240) * 2) as usize; // If the pixel from the font bitmap is not black, write it out to outbuf if fontimg[fidx] != 0x00 && fontimg[fidx+1] != 0x00 { outbuf[outbuf_idx] = fontimg[fidx]; outbuf[outbuf_idx + 1] = fontimg[fidx + 1]; } } } cur_x = cur_x + (ch_width as u64); } } }

The TextRenderer::font_widths vector is built from the build_font_width_vec() method, which simply builds and returns a vector of hardcoded values for the character widths:

fn build_font_width_vec() -> Vec<u8> { let result = vec![ 7, 7, 7, 7, 7, 7, 7, 7, 7, 30, 0, 7, 7, 0, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 5, 3, 5, 9, 7, 18, 8, 3, 5, 5, 7, 9, 4, 8, 3, 5, 8, 4, 7, 7, 8, 7, 7, 7, 7, 7, 3, 4, 6, 9, 6, 7, 9, 8, 8, 8, 8, 8, 8, 8, 8, 5, 7, 8, 7, 9, 9, 9, 8, 9, 8, 8, 9, 8, 9, 10, 9, 9, 8, 4, 6, 4, 8, 9, 5, 7, 7, 6, 7, 7, 6, 7, 7, 3, 5, 7, 3, 9, 7, 7, 7, 7, 6, 6, 6, 7, 7, 10, 7, 7, 6, 5, 3, 5, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 3, 3, 5, 5, 3, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 3, 6, 7, 7, 13, 3, 11, 10, 13, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, ]; result }

Writing to the ICON display

The final piece is simply creating a TextRenderer instance and calling render_string() with the necessary parameters. Here we’ll write out “Hello world!” to position 10, 20 on the display:

let tr = TextRenderer::new(); tr.render_string("Hello world!", 10, 20, &font_image, &mut background_image);
Bitfenix ICON image display with text

All of the code presented is up on the bitfenix-icon-sysstatus repo.

Next, I’m looking to play around with the systemstat library to print out something useful to the display.

Writing to the Bitfenix ICON display with Rust (part 1, writing images)

Bitfenix ICON

I love the Bitfenix Pandora case and used it for my desktop PC build a few years ago. One interesting aspect about this case is that it has an attached LCD on the front, the Bitfenix ICON display; I’ve never done anything with the display (I just left the default logo on it), but recently I discovered the source code was available and it was possible to programmatically write to the display, and became interested in how this is done and what I could do with the display.

Instead of using the code provided by Bitfenix, I used the modified source produced by Rizvi R. I fired up Visual C++, dealt with downloading and pointing the compiler + linker to necessary libraries, I hit a stumbling block with libjpeg, but stripped out the JPEG support and was able to get something up and running. The code loads an image, resizes it to fit the display, reduces the color depth to match the display, and then writes to the display using HIDAPI (which is really cool and something I didn’t realize existed). The code was ok, but I decided to see if I could modularize it further and rewrite it in Rust (avoiding some of the less modern things in C++ world, like package management) as a learning exercise.

The Display

The ICON display is far from any sort of high quality panel, but it’s adequate for something put on a case. I haven’t come across official specs, but here’s what I’ve been able to gather:

  • Resolution of 240×320
  • 16-bit color, 5-6-5 format (5 bits for red, 6 bits for green, 5 bits for blue)
  • Super slow refresh rate, like over 1 second (so not suitable for any sort of animation)

Rust Dependencies

We’ll make use of 2 Rust packages, png and hidapi via cargo:

[dependencies] png = "0.15.3" hidapi = "1.1.0"

Loading PNG images & converting to 16bpp

We can load a PNG image and get a pixel buffer as follows:

fn load_png_image(filepath: &str) -> Vec<u8> { let decoder = png::Decoder::new(File::open(filepath).unwrap()); let (info, mut reader) = decoder.read_info().unwrap(); // Allocate the output buffer. let mut buf = vec![0; info.buffer_size()]; reader.next_frame(&mut buf).unwrap(); buf }

This will work well for 24bpp images and return a buffer with R, G, and B components, 8 bits each. As the ICON is a 5-6-5 16bpp display, we’ll need to reduce the color depth by chopping off bits with some bit shift operations (for more details on what’s happening here, this is a good resource):

fn reduce_image_to_16bit_color(image_buf: &[u8]) -> Vec<u8> { let mut result: Vec<u8> = Vec::with_capacity(240 * 320 * 2); for i in (0..image_buf.len()).step_by(3) { let b: u16 = ((image_buf[i + 2] as u16) >> 3) & 0x001F; let g: u16 = (((image_buf[i + 1] as u16) >> 2) << 5) & 0x07E0; let r: u16 = (((image_buf[i] as u16) >> 3) << 11) & 0xF800; let rgb: u16 = r | g | b; result.push( ((rgb >> 8) & 0x00FF) as u8 ); result.push( (rgb & 0x00FF) as u8 ); } result }

We can now load an image and reduce the color depth as follows:

// Image needs to be 240x320 (24bpp, no alpha channel) let src_image = load_png_image("assets/1.png"); let reduced_color_img = reduce_image_to_16bit_color(&src_image);

Open the ICON device

Using the HIDAPI, we can open the device with it’s vendor ID (0x1fc9) and product ID (0x100b):

let hid = HidApi::new().unwrap(); let bitfenix_icon_device = hid.open(0x1fc9, 0x100b).unwrap();

I thought it might be useful to query and surface product and manufacturer info of the device as follows:

let device_manuf_string = bitfenix_icon_device.get_manufacturer_string().unwrap().unwrap(); let device_prod_string = bitfenix_icon_device.get_product_string().unwrap().unwrap(); println!("{}: {}", device_manuf_string.as_str(), device_prod_string.as_str());

However, this isn’t terribly insightful. We get a product string equal to “NXP Semiconductors” manufacturer string equal to “LPC11Uxx HID”, which is the microcontroller used on the device.

Writing to the display

With the pixel buffer and an open device, we can now write a new image to the display, in a 3 step process:

  • Clear the display
  • Write the buffer to the display
  • Refresh the display

I’m somewhat confused as to why the display needs to be cleared, but if I skip this step I end up with weird overdraw of pixels on top of the existing image. In any case, clearing the display is simple enough and involves writing a 6 byte code to the display:

fn clear_display(bitfenix_icon_device: &HidDevice) { let erase_flash_code: [u8; 6] = [0x0, 0x1, 0xde, 0xad, 0xbe, 0xef]; bitfenix_icon_device.write(&erase_flash_code).unwrap(); }

Writing the buffer to the display involves copying up to 64 bytes at a time to the display, this is a max of 61 bytes from the pixel buffer + a 3 byte header (indicating the operation and number of bytes being written):

fn write_image_to_display(bitfenix_icon_device: &HidDevice, image_buf: &[u8]) { let num_image_bytes_per_write = 61; /* +3 bytes for the header, note that the device only accepts writes in 64 bytes chunks */ let num_writes = ((image_buf.len() as f64 / num_image_bytes_per_write as f64).ceil()) as usize; for i in 0..num_writes { let start = i * num_image_bytes_per_write; let mut length = num_image_bytes_per_write; if i == (num_writes-1) { length = image_buf.len() - ((num_writes - 1) * num_image_bytes_per_write); } let mut image_data_with_header: Vec<u8> = Vec::with_capacity(length + 3); image_data_with_header.push(0x0); image_data_with_header.push(0x2); image_data_with_header.push(length as u8); for image_byte_idx in start..start+length { image_data_with_header.push(image_buf[image_byte_idx]); } bitfenix_icon_device.write(&image_data_with_header).unwrap(); } }

Refreshing the display requires writing a 2 bytes code to the display:

fn refresh_display(bitfenix_icon_device: &HidDevice) { let refresh_code: [u8; 2] = [0x0, 0x3]; bitfenix_icon_device.write(&refresh_code).unwrap(); }

Pulling it all together we have a few straightforward function calls:

println!("Writing new image..."); clear_display(&bitfenix_icon_device); // needs to be done or you end up with weird overwriting on top of exiting image write_image_to_display(&bitfenix_icon_device, &reduced_color_img); println!("Refreshing display..."); refresh_display(&bitfenix_icon_device);
Bitfenix ICON image

Code and future work

I have the above code up on GitHub. This was fun and a great learning experience. Next, I think it would be interesting to see if the display could be utilized to reflect information about the computer, maybe showing CPU usage or internet connection status. There could be some utility in providing such metrics to the user and it would give the display a purpose beyond that of a digital case badge.

Reel

I wrote a little desktop application to capture short videos and turn them into GIFs. I call it Reel. It’s still rough around the edges but you can grab an early version of it below.

Reel 0.1 (Windows Install)

I’ll have a Linux/Ubuntu version soon. Maybe an OS X version… I have to jump through a few extra hoops here as Apple still refuses to allow OS X to be virtualized.

Reel - Drinking Bird

Aside from its utility, this was also an experiment piecing together some technologies I’ve written about here before: XUL + XPCOM + SocketBridge, video capture using web tech and, in general, using web technologies for desktop applications.

Logging to stdout and a file

A simple way to log to both stdout and a file (using a pipe and tee):

./myapp 2>&1 | tee -a myapp.log

A more relational dictionary

As I started looking to add more functionality to Lexiio, I realized the Wiktionary definitions database dump I was using wasn’t going to cut it; specifically, I needed a normalized schema, or I’d have data duplication all over the place. I started normalizing in MySQL, but whether it was MySQL or MySQL Workbench, I kept running into character encoding issues. Using a simple INSERT-SELECT, in MySQL 5.7, to transfer words from the existing table to a new table resulted losing characters:

MySQL losing characters

I dumped the data into PostgreSQL, didn’t encounter the issue, and just kept working from there.

The normalized schema can be downloaded here: LexiioDB normalized
(released under the Creative Commons Attribution-ShareAlike License)

LexiioDB schema

The unknown_words and unknown_to_similar_words tables is specific to Lexiio and serve as a place to store unknown words entered by the user and close/similar matches to known words (via the Levenshtein distance).

Lexiio

Another little experiment of mine: Lexiio, a web-based CLI dictionary.

Lexiio

A few takeaways:

  • Part of the reason for building this was that I wanted to actually make use of the Wiktionary data set snapshot in a real project. The data set is pretty comprehensive, and easy to parse and work with.
  • This was also a learning exercise for Golang. There’s nothing complex here but, so far, working with Go has been enjoyable. I like that I’m building a native application, types are enforced, and the HTTP server included as part of the standard library is incredibly easy to setup and work with.
  • I wanted to experiment a bit with what a web-based CLI would look and feel like. For something like a dictionary, where user interaction revolves around textual input/output, a command-line interface seems to work really well.

Function argument tricks

I came across this “trick” on coderwall for avoiding an if statement when you have a a function argument that is allowed to be either an array or a scalar value (something that seems oddly common in loosely typed languages).

function example(ids) {
;[].concat(ids).forEach(
function (id) {
// ...
})
}

The trick in question is just concatenating the argument with an empty array so that, within the function, you’re always dealing with an array and elements of the array.

Taking a step back, the bigger question is why do this?
In my experience, it’s almost always better to keep the code within the function straightforward and force the caller to adapt and give the function what it needs. In this case, that means simply forcing the caller to always pass an array.

The Future of Programming

This is a great talk given by Bret Victor that I came across a while ago:

Bret Victor – The Future of Programming from Bret Victor on Vimeo.

All four ideas presented resonate with me any my work, particularly direct manipulation of data, as I’m continually disappointed when I see markup languages and frameworks seen as the go-to solution in places where proper tools would yield better and faster results.